Texformer - MMLab@NTU

The Framework

Figure: Overview of the Texformer framework.

We propose a Transformer-based framework, termed Texformer, for 3D human texture estimation from a single image. Built on the attention mechanism, the network can effectively exploit global information from the input image. It naturally overcomes the limitations of existing algorithms that rely solely on CNNs and enables higher-quality 3D human texture reconstruction.

The Query is a pre-computed color encoding of the UV space, obtained by mapping the 3D coordinates of a standard human body mesh to the UV space. The Key is a concatenation of the input image and its 2D part-segmentation map. The Value is a concatenation of the input image and its 2D coordinates. We first feed the Query, Key, and Value into three CNNs that transform them into feature space. The resulting multi-scale features are then sent to the Transformer units to generate the Output features. The multi-scale Output features are processed and fused in another CNN, which produces the RGB UV map T, the texture flow F, and the fusion mask M. The final UV map is obtained by blending T with the textures sampled from the input image via F, weighted by the fusion mask M, as sketched below.
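To make the pipeline concrete, here is a minimal single-scale sketch in PyTorch of the Query/Key/Value encoding, the cross-attention Transformer unit, and the final fusion step. The module name TexformerSketch, the channel counts, and the single-scale simplification are illustrative assumptions, not the authors' released code; the actual model uses multi-scale features and additional processing.

```python
# A minimal, single-scale sketch of the computation described above.
# Shapes, channel counts, and module names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TexformerSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Three CNN encoders map Query / Key / Value into feature space.
        # Query: 3-channel UV-space color encoding; Key: image + part
        # segmentation (3 + 1 channels assumed); Value: image + 2D coords (3 + 2).
        self.enc_q = nn.Conv2d(3, dim, 3, padding=1)
        self.enc_k = nn.Conv2d(4, dim, 3, padding=1)
        self.enc_v = nn.Conv2d(5, dim, 3, padding=1)
        # Head producing RGB UV map T (3ch), texture flow F (2ch), mask M (1ch).
        self.head = nn.Conv2d(dim, 6, 3, padding=1)

    def forward(self, query, key, value, image):
        q = self.enc_q(query)   # (B, C, Hq, Wq) -- UV-space features
        k = self.enc_k(key)     # (B, C, Hi, Wi) -- image-space features
        v = self.enc_v(value)   # (B, C, Hi, Wi)

        B, C, Hq, Wq = q.shape
        # Scaled dot-product cross-attention: every UV-space query position
        # attends to all image-space positions, giving a global receptive field.
        qf = q.flatten(2).transpose(1, 2)              # (B, Hq*Wq, C)
        kf = k.flatten(2).transpose(1, 2)              # (B, Hi*Wi, C)
        vf = v.flatten(2).transpose(1, 2)              # (B, Hi*Wi, C)
        attn = torch.softmax(qf @ kf.transpose(1, 2) / C ** 0.5, dim=-1)
        out = (attn @ vf).transpose(1, 2).reshape(B, C, Hq, Wq)

        # Decode the Output features into T, flow, and M.
        T, flow, M = self.head(out).split([3, 2, 1], dim=1)
        flow = torch.tanh(flow)   # flow in [-1, 1], grid_sample convention
        M = torch.sigmoid(M)      # fusion mask in [0, 1]

        # Sample input-image colors with the texture flow, then fuse with T.
        sampled = F.grid_sample(image, flow.permute(0, 2, 3, 1),
                                align_corners=False)
        uv_map = M * T + (1 - M) * sampled
        return uv_map


# Example usage with illustrative sizes:
# model = TexformerSketch()
# uv = model(query=torch.rand(1, 3, 64, 64), key=torch.rand(1, 4, 128, 128),
#            value=torch.rand(1, 5, 128, 128), image=torch.rand(1, 3, 128, 128))
```

In this blending scheme, the mask M can favor colors sampled directly from visible regions of the input (preserving high-frequency detail) while falling back to the regressed texture T where the body is occluded.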
