PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Peng Li1, Wangguandong Zheng2, Yuan Liu1, Tao Yu3, Yangguang Li4, Xingqun Qi1, Mengfei Li1, Xiaowei Chi1, Siyu Xia2, Wei Xue1, Wenhan Luo1, Qifeng Liu1, Yike Guo1

1HKUST, 2Southeast University, 3Tsinghua University, 4VAST

PSHuman facilitates photorealistic and identity-preserved human digitization from a single image. 

Abstract

Photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, existing methods for monocular full-body reconstruction, which typically rely on the front view and/or a predicted back view, still fall short of satisfactory performance because the problem is ill-posed and self-occlusions are complex. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes using priors from a multiview diffusion model. We find that directly applying multiview diffusion to single-view human images leads to severe geometric distortions, especially on generated faces. To address this, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling detailed and identity-preserved novel-view generation without any geometric distortion. Moreover, to enhance cross-view body shape consistency across varied human poses, we condition the generative model on parametric models such as SMPL-X, which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multi-view normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experimental results and quantitative evaluations on the CAPE and THuman2.1 datasets demonstrate PSHuman’s superiority in geometric detail, texture fidelity, and generalization capability.



Pipeline. Given a single full-body human image, PSHuman recovers a textured human mesh in two stages: 1) Body-face enhanced and SMPL-X conditioned multi-view generation. The input image and the predicted SMPL-X are fed into a multi-view image diffusion model to generate six views of global full-body images and local face images. 2) SMPLX-initialized explicit human carving. The generated normal and color maps are used to deform and remesh the SMPL-X mesh with differentiable rasterization.
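As a rough illustration of the "global full-body plus local face" input used in stage 1, the sketch below expands a face bounding box into a square crop that stays inside the image. This is not the authors' implementation: the function name `square_face_crop`, the `margin` parameter, and the assumption that a face box is already available (for example from a detector or from the SMPL-X head) are all hypothetical.

```python
from typing import Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def square_face_crop(face_box: Box, image_size: Tuple[int, int],
                     margin: float = 0.25) -> Box:
    """Expand a face box into a square crop clamped to the image bounds.

    Hypothetical helper: `margin` is an illustrative choice, not a value
    taken from the paper.
    """
    width, height = image_size
    left, top, right, bottom = face_box
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("face_box must lie inside the image")

    # Side of the square: the longer face-box edge plus a relative margin,
    # never larger than the image itself.
    side = int(round(max(right - left, bottom - top) * (1.0 + margin)))
    side = min(side, width, height)

    # Centre the square on the face box, then shift it back inside the image.
    cx = (left + right) / 2.0
    cy = (top + bottom) / 2.0
    x0 = max(0, min(int(round(cx - side / 2.0)), width - side))
    y0 = max(0, min(int(round(cy - side / 2.0)), height - side))
    return (x0, y0, x0 + side, y0 + side)


if __name__ == "__main__":
    # A face box in the upper part of a 512x768 full-body image.
    print(square_face_crop((200, 40, 280, 140), (512, 768)))
```

The resulting crop could then be resized to the model's input resolution and passed alongside the full-body image, so that the face is seen at a finer scale than the body.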

SMPLX-initialized explicit human mesh carving and appearance fusion (panels: mesh carving, appearance fusion, textured mesh).
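To make the idea of normal-guided carving more concrete, here is a minimal sketch of a normal-agreement term that such an optimization might minimize. It is a simplification under stated assumptions: PSHuman compares rendered maps through differentiable rasterization, whereas this sketch compares per-face normals directly against target normals and includes no rasterizer, optimizer, or remeshing step. The names `face_normals` and `normal_loss` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def _normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length, or the zero vector if v is zero."""
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate face: no defined normal
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normals(vertices: Sequence[Vec3], faces: Sequence[Face]) -> List[Vec3]:
    """Unit normal of each triangle, using counter-clockwise winding."""
    normals = []
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        e1 = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        e2 = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
        cross = (e1[1] * e2[2] - e1[2] * e2[1],
                 e1[2] * e2[0] - e1[0] * e2[2],
                 e1[0] * e2[1] - e1[1] * e2[0])
        normals.append(_normalize(cross))
    return normals


def normal_loss(predicted: Sequence[Vec3], target: Sequence[Vec3]) -> float:
    """Mean of (1 - cosine similarity) between predicted and target normals."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for p, t in zip(predicted, target):
        t = _normalize(t)
        total += 1.0 - (p[0] * t[0] + p[1] * t[1] + p[2] * t[2])
    return total / len(predicted)


if __name__ == "__main__":
    # One triangle in the xy-plane; its normal points along +z.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    normals = face_normals(verts, [(0, 1, 2)])
    print(normal_loss(normals, [(0.0, 0.0, 1.0)]))
```

In a full pipeline, a term like this would be evaluated on rendered normal maps for each generated view and back-propagated to the vertex positions of the SMPL-X-initialized mesh.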

Comparisons


Geometry comparison between implicit and explicit methods.



Appearance comparison with methods that produce textures.



Qualitative comparisons with optimization-based methods for single-view reconstruction.

(a) Magic123, (b) DreamGaussian, (c) Chupa, (d) TeCH, and (e) Ours.



Reconstruction quality on self-occluded images. We present the generated back, left, and right views of normal maps and the corresponding meshes.



Generalization to anime characters. We present the generated multiview color and normal images and the corresponding meshes (in blue).

More results

SHHQ Fashion Data Digitization
Anime Character Digitization