GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation

Jie Wang1, Jiu-Cheng Xie1, Xianyan Li1, Feng Xu2, Chi-Man Pun3, Hao Gao1
1Nanjing University of Posts and Telecommunications,
2Tsinghua University,
3University of Macau

Figure 1. Built on anisotropic 3D Gaussians and a learnable derivation strategy, our method learns an identity-specific head avatar from a monocular video of the corresponding subject. The proposed GaussianHead demonstrates outstanding performance in self-reconstruction, novel-view synthesis, and cross-identity reenactment tasks.

Abstract

Creating lifelike 3D head avatars and generating compelling animations for diverse subjects remain challenging in computer vision. This paper presents GaussianHead, which models the dynamic head with anisotropic 3D Gaussians. Our method integrates a motion deformation field and a single-resolution tri-plane to capture the head's intricate dynamics and detailed texture. Notably, we introduce a customized derivation scheme for each 3D Gaussian, facilitating the generation of multiple "doppelgangers" through learnable parameters for precise position transformation. This approach enables an efficient and accurate representation of diverse Gaussian attributes. Additionally, we propose an inherited derivation strategy for newly added Gaussians to expedite training. Extensive experiments demonstrate GaussianHead's efficacy, achieving high-fidelity visual results with a remarkably compact model size (≈ 12 MB). Our method outperforms state-of-the-art alternatives in tasks such as reconstruction, cross-identity reenactment, and novel-view synthesis.

Method

Figure 2. GaussianHead uses a set of 3D Gaussians with learnable attributes controlling their shape and appearance to model the subject's head. A motion deformation field is first set up to represent the dynamic head geometry; conditioned on pre-acquired expression parameters e, it converts structureless Gaussians G_R into structured core Gaussians G_C in a canonical space. Next, a single-resolution tri-plane serves as the feature container storing appearance-related attributes. Notably, a derivation mechanism based on learnable rotations is applied to each core Gaussian, yielding several doppelgangers of it. The integration of sub-features, obtained by projecting those doppelgangers onto the planes, is taken as the final canonical feature f of the core Gaussian. Two separate tiny MLPs then decode opacity α and spherical harmonic coefficients (SHs), from which we generate the final rendering via differentiable rasterization. The notations ⊙ and ∪ denote the Hadamard product and concatenation operations, respectively.
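
To illustrate the derivation step described above, the sketch below shows one plausible PyTorch realization: each core Gaussian carries a small set of learnable quaternions, each of which rotates its canonical position to spawn a doppelganger, and newly densified Gaussians inherit their parents' rotation parameters. The module name LearnableDerivation, the quaternion parameterization, and all tensor shapes are our own illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def quat_to_rotmat(q):
    # Convert quaternions (..., 4) to rotation matrices (..., 3, 3),
    # normalizing first so each result is a valid rotation.
    w, x, y, z = F.normalize(q, dim=-1).unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(*q.shape[:-1], 3, 3)

class LearnableDerivation(nn.Module):
    # Hypothetical module: per-Gaussian learnable rotations that spawn
    # K doppelganger positions from each core Gaussian center.
    def __init__(self, num_gaussians, k=4):
        super().__init__()
        quats = torch.zeros(num_gaussians, k, 4)
        quats[..., 0] = 1.0  # initialize every rotation to the identity
        self.quats = nn.Parameter(quats)

    def forward(self, positions):
        # positions: (N, 3) core Gaussian centers in canonical space.
        rot = quat_to_rotmat(self.quats)                     # (N, K, 3, 3)
        return torch.einsum('nkij,nj->nki', rot, positions)  # (N, K, 3)

    def inherit(self, parent_idx):
        # Inherited derivation strategy (assumed form): Gaussians added
        # during densification copy their parents' rotation parameters
        # instead of restarting from the identity.
        child = self.quats.data[parent_idx].clone()
        self.quats = nn.Parameter(torch.cat([self.quats.data, child], dim=0))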
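
The feature-gathering and decoding stages could then look like the following hedged sketch: each doppelganger is projected onto the three orthogonal planes, the bilinearly sampled sub-features are integrated into one canonical feature f per core Gaussian, and two tiny MLPs decode opacity α and the SH coefficients. The class TriPlaneField, the mean-based integration (the paper combines sub-features with Hadamard products and concatenation, as noted in the caption), and the layer sizes are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneField(nn.Module):
    # Hypothetical single-resolution tri-plane container with two tiny
    # decoding MLPs; sh_dim=48 corresponds to degree-3 SH over RGB.
    def __init__(self, res=256, feat_dim=32, sh_dim=48):
        super().__init__()
        self.planes = nn.Parameter(0.1 * torch.randn(3, feat_dim, res, res))  # XY, XZ, YZ
        self.opacity_mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.sh_mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, sh_dim))

    def sample_plane(self, plane, coords2d):
        # coords2d: (M, 2) in [-1, 1]; returns (M, C) bilinear samples.
        grid = coords2d.view(1, -1, 1, 2)
        feat = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)  # (1, C, M, 1)
        return feat.squeeze(0).squeeze(-1).transpose(0, 1)

    def forward(self, dopp_xyz):
        # dopp_xyz: (N, K, 3) doppelganger positions, assumed pre-scaled to [-1, 1].
        n, k, _ = dopp_xyz.shape
        p = dopp_xyz.reshape(-1, 3)
        # Project every doppelganger onto the three planes and sum the sub-features.
        f = (self.sample_plane(self.planes[0], p[:, [0, 1]])
             + self.sample_plane(self.planes[1], p[:, [0, 2]])
             + self.sample_plane(self.planes[2], p[:, [1, 2]]))
        # Integrate over doppelgangers into one canonical feature per core Gaussian.
        f = f.view(n, k, -1).mean(dim=1)
        alpha = torch.sigmoid(self.opacity_mlp(f))  # (N, 1) opacity
        shs = self.sh_mlp(f)                        # (N, sh_dim) SH coefficients
        return alpha, shs

The decoded opacity and SH coefficients, together with each Gaussian's position, scale, and rotation, would then be handed to a differentiable Gaussian rasterizer to produce the rendered image.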

Demo Video

BibTeX

@misc{wang2024gaussianhead,
      title={GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation}, 
      author={Jie Wang and Jiu-Cheng Xie and Xianyan Li and Feng Xu and Chi-Man Pun and Hao Gao},
      year={2024},
      eprint={2312.01632},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}