[Paper Notes] MeshInversion: Monocular 3D Object Reconstruction with GAN Inversion (ECCV 2022)

YuQiao0303

2024-03-26

Overview

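Project page: https://www.mmlab-ntu.com/project/meshinversion/
Method name: MeshInversion
Input: a monocular image (in the wild, with background, not matted)
Output: textured 3D mesh
Key challenge: no 3D or multi-view supervision
Core idea: first pre-train a 3D GAN (ConvMesh, where a mesh is represented as deformation and texture maps) that generates a textured mesh from a latent code z. Then, at inference time, search for the z that best matches the input image. (This is an inference-time optimization method!) The generated mesh is rendered with the predicted camera parameters, and the render is supervised against the input image with a Chamfer texture loss and a Chamfer mask loss.
Main networks used or referenced: ConvMesh and PatchGAN; masks are obtained with an off-the-shelf segmentation tool (PointRend).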

Related Work

Single View 3D Reconstruction

image-3D object pairs [46,35,32,39]
multi-view images [33,28,51,47,34]
SMPL for humans and 3DMM for faces [8,40,18]

CMR [19] reconstructs category-specific textured mesh

There are two common ways to predict texture. One is direct regression of pixel values in the UV texture map, which is often blurry, but it is what the authors use.
The mainstream approach is learning the texture flow, which generalizes poorly to novel views.

GAN inversion

GAN inversion means first training a GAN and then finding a latent code z whose output, when fed to the GAN, matches the requirement as closely as possible.

Common approaches:
Gradient descent (omitted)

Learning an encoder:
Bau, D., Strobelt, H., Peebles, W., Zhou, B., Zhu, J.Y., Torralba, A., et al.: Semantic photo manipulation with a generative image prior. In: SIGGRAPH (2019)

Or a combination of the two:
Zhu, J., Shen, Y., Zhao, D., Zhou, B.: In-domain GAN inversion for real image editing. In: ECCV (2020)

Recent work in 3D includes point cloud completion with GAN inversion:
Zhang, J., Chen, X., Cai, Z., Pan, L., Zhao, H., Yi, S., Yeo, C.K., Dai, B., Loy, C.C.: Unsupervised 3D shape completion through GAN inversion. In: CVPR (2021)
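
The gradient-descent flavour of GAN inversion can be illustrated with a toy example. The sketch below is not MeshInversion itself: the "generator" is a hypothetical frozen linear map `W`, the loss is a plain squared error, and the names `invert_linear_generator`, `steps`, and `lr` are made up for illustration. It only shows the pattern of keeping the generator fixed and optimizing z.

```python
import numpy as np

def invert_linear_generator(W, target, steps=300, lr=0.05, seed=0):
    """Find z such that the frozen toy generator G(z) = W @ z reproduces `target`."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=W.shape[1])        # random initial latent code
    for _ in range(steps):
        residual = W @ z - target          # G(z) - target
        grad = 2.0 * W.T @ residual        # gradient of ||G(z) - target||^2 w.r.t. z
        z = z - lr * grad                  # only z is updated; W stays fixed
    return z

# Hypothetical frozen generator weights (4 outputs, 3 latent dimensions).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
z_true = np.array([0.5, -1.0, 2.0])
z_hat = invert_linear_generator(W, W @ z_true)
```

In the real method the generator is the pre-trained ConvMesh network and the squared error is replaced by the rendering-based losses described in the Method section.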

textured mesh generation

[6] Learning to predict 3D objects with an interpolation-based differentiable renderer. In: NeurIPS (2019)
The reconstructed mesh is rendered differentiably, and the rendered multi-view images are used for discriminative supervision.

[13] Leveraging 2D data to learn textured 3D mesh generation. In: CVPR (2020)
A VAE method that uses face colors instead of texture maps.

[38] Convolutional generation of textured 3D meshes
Topology-aligned texture maps and deformation maps in the UV space. (This paper uses its pretrained model.)

Method

The overall approach appears to be: a generator produces geometry and texture from a latent code, supervised by a Chamfer mask loss and a Chamfer texture loss.

Preliminaries

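A mesh is represented as O = (V, F, T), i.e. vertices, faces, and a texture map.
Since

An individual mesh is isomorphic to a 2-pole sphere.

the vertex positions can be expressed as a deformation $\Delta \mathbf{V}$ of a sphere:
$\mathbf{V} = \mathbf{V}_{sphere} + \Delta \mathbf{V}$
Most previous methods regress $\Delta \mathbf{V}$ with an MLP; this paper uses a CNN.

Rendering uses weak-perspective projection (distinct from both full perspective and orthographic projection). Its parameters π consist of scale s, translation t, and rotation r.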
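
A small sketch may help make the two ideas above concrete: vertices as a deformed sphere, and weak-perspective projection with π = (s, t, r). It assumes the rotation is given as a 3×3 matrix (the actual pipeline may parameterize it differently, e.g. as a quaternion), and the function name `weak_perspective_project` is hypothetical.

```python
import numpy as np

def weak_perspective_project(vertices, scale, translation, rotation):
    """Project (N, 3) vertices to (N, 2) image coordinates.

    Rotate the mesh, drop the depth axis, then apply one global scale
    and a 2D translation. Depth does not affect the projected size.
    """
    rotated = vertices @ rotation.T        # apply R to every vertex
    return scale * rotated[:, :2] + translation

# V = V_sphere + delta_V, with three toy "sphere" vertices.
v_sphere = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
delta_v = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, -0.5]])
vertices = v_sphere + delta_v

projected = weak_perspective_project(
    vertices, scale=2.0, translation=np.array([0.1, -0.1]), rotation=np.eye(3)
)
```

Because a single scale replaces the per-point division by depth, the camera has only a handful of parameters, which keeps it cheap to optimize jointly with z.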

3.1 Reconstruction with Generative Prior

Pre-training Stage

This stage trains a 3D GAN.
The generator mainly follows ConvMesh:
- It operates in UV space.
- It outputs a deformation map and a texture map.
The discriminator mainly follows PatchGAN.
Losses include:
- generator loss
- discriminator loss in UV space
- discriminator loss in image space (following PatchGAN)

Inversion Stage

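Goal: find the z that best recovers the 3D object from the input image $\mathbf{I}_{in}$.
Required: the original image, its mask, and the camera parameters used to render the 3D shape.
- The mask is obtained with an off-the-shelf segmentation tool (PointRend).
  - The rationale is given at https://github.com/junzhezhang/mesh-inversion/issues/5: fair comparison, and to emphasize that this is test-time optimization.
- ConvMesh generates the mesh (shape) from the latent code z, and CMR predicts the camera parameters π.
  - How π is predicted: regressing the camera pose from scratch suffers from camera-shape ambiguity [24], so CMR is used to initialize the camera.
- The predicted mesh is rendered into a 2D image with the predicted camera parameters, and the losses are computed on it (see below).

Because the camera pose keeps being optimized, the images cannot be perfectly aligned, so a robust texture loss is needed (see below).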

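3.2 Chamfer texture loss (key reference)

Treat an image as a 2D point cloud in which every point has a 2D coordinate and a 3D RGB color value.
The dissimilarity between two images is then expressed as a Chamfer distance.
- The distance D is decomposed into an appearance term and a spatial term, both using the l2 distance.
- Important: since the loss should only be tolerant of local misalignment, an exponent is applied to the spatial term to penalize points that are spatially too far apart.
- Explanation: Da and Ds are multiplied.
  - An epsilon is added so that two points at the same position (Ds is zero) but with very different colors (Da is large) still count as different points, instead of the product collapsing to zero;
  - the exponent on Ds penalizes large distances, because only small misalignment should be tolerated;
  - finally a max is taken.
- Note: the Ds term is not differentiable; it only acts as a weight for training Da (texture).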
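
To make the structure of this distance concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes squared L2 distances for both terms, matches in one direction only (set A to set B), and uses the form D = max((eps_s + k·Ds)^alpha, 1) · (Da + eps_a); the parameter names `eps_s`, `eps_a`, `k`, and `alpha` and their default values are hypothetical.

```python
import numpy as np

def chamfer_texture_loss(pos_a, rgb_a, pos_b, rgb_b,
                         eps_s=0.5, eps_a=1e-3, k=1.0, alpha=2.0):
    """One-directional Chamfer texture distance from pixel set A to pixel set B.

    pos_*: (N, 2) pixel coordinates, rgb_*: (N, 3) colors.
    """
    # Pairwise squared distances, shape (N_a, N_b).
    d_s = ((pos_a[:, None, :] - pos_b[None, :, :]) ** 2).sum(-1)   # spatial term Ds
    d_a = ((rgb_a[:, None, :] - rgb_b[None, :, :]) ** 2).sum(-1)   # appearance term Da
    # Spatial weight: exactly 1 for small offsets, growing quickly for large ones.
    spatial_weight = np.maximum((eps_s + k * d_s) ** alpha, 1.0)
    dist = spatial_weight * (d_a + eps_a)
    # Best match in B for every point of A, averaged over A.
    return float(dist.min(axis=1).mean())

pos = np.array([[0.0, 0.0], [1.0, 0.0]])
rgb = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])   # one red, one blue pixel
same = chamfer_texture_loss(pos, rgb, pos, rgb)
near = chamfer_texture_loss(pos, rgb, pos + np.array([0.1, 0.0]), rgb)
far = chamfer_texture_loss(pos, rgb, pos + np.array([5.0, 0.0]), rgb)
```

In this toy case a small shift of the target leaves the loss unchanged, because the spatial weight is clamped to 1, while a large shift is penalized even though the colors still match.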

This loss is quite useful; see the ablation study.

Besides the pixel-level CD loss, there is also a feature-level CD loss:
Specifically, we apply the Chamfer texture loss between the (foreground) feature maps extracted with a pre-trained VGG-19 network [42] from the rendered image and the input image.
This resembles the contextual loss (The contextual loss for image transformation with non-aligned data), with one difference:

Feature-level Chamfer texture loss: takes location into account but does not require exact alignment;
Contextual loss: ignores location entirely.

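Loss ablation

CT denotes the Chamfer texture loss;
LpCT is the pixel-level variant and LfCT the feature-level variant.

In the middle three rows of the ablation table,
contextual loss is the worst,
followed by L1 alone;
L1 + perceptual is somewhat better;
the CT loss is still the best.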

3.3 Chamfer Mask Loss

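A conventional mask loss usually quantizes the 3D shape into a grid of pixels (a mask) and then computes an l1 or IoU loss against the GT mask.
- Getting a mask from a 3D shape requires rasterization that discretizes the mesh into a grid of pixels. This step loses information and introduces error, which hurts especially with a pre-trained ConvMesh.
The authors therefore propose the Chamfer mask loss Lcm (CD instead of L1, so no quantization error).
- Instead of rendering the mesh into a binary mask, the mesh vertices are projected directly onto the image plane, giving Sv.
- The coordinates of the foreground pixels from the off-the-shelf segmentation are normalized to [-1, 1], giving Sf.
- The Chamfer distance between Sv and Sf is then computed.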
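
The three steps above can be sketched in NumPy as follows. This is an illustration under assumptions, not the paper's code: it uses a symmetric Chamfer distance with squared L2, assumes the mask is larger than one pixel in each dimension, and the helper names `mask_to_points` and `chamfer_mask_loss` are hypothetical.

```python
import numpy as np

def mask_to_points(mask):
    """Foreground pixel coordinates of a binary (H, W) mask as (x, y) in [-1, 1]."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    x = 2.0 * xs / (w - 1) - 1.0
    y = 2.0 * ys / (h - 1) - 1.0
    return np.stack([x, y], axis=1)

def chamfer_mask_loss(projected_vertices, mask):
    """Symmetric Chamfer distance between projected vertices Sv and mask pixels Sf."""
    s_v = projected_vertices               # (N, 2), already in [-1, 1]
    s_f = mask_to_points(mask)             # (M, 2)
    sq = ((s_v[:, None, :] - s_f[None, :, :]) ** 2).sum(-1)
    return float(sq.min(axis=1).mean() + sq.min(axis=0).mean())

mask = np.zeros((3, 3), dtype=int)
mask[1, 1] = 1                             # single foreground pixel at the center
aligned = chamfer_mask_loss(np.array([[0.0, 0.0]]), mask)
shifted = chamfer_mask_loss(np.array([[0.5, 0.0]]), mask)
```

Since the vertices are never rasterized, the loss changes continuously as they move, which is the point of replacing the pixel-grid mask comparison.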

Total loss

pixel-level chamfer texture loss (appearance)
feature-level chamfer texture loss (appearance)
chamfer mask loss (geometry)
smooth loss (neighboring faces to have similar normals i.e. low cosine)
latent space loss (L2 norm of z to ensure Gaussian distribution)
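
The last two regularizers are simple enough to sketch. The code below is one plausible reading of the descriptions above, not taken from the official repo: the latent loss is assumed to be the mean squared value of z, the smoothness loss is assumed to be 1 minus the cosine similarity between normals of adjacent faces, and the function names and the `adjacent_pairs` argument are hypothetical.

```python
import numpy as np

def latent_l2_loss(z):
    """Mean squared magnitude of z; keeps z near the origin of the Gaussian prior."""
    return float(np.mean(z ** 2))

def face_normals(vertices, faces):
    """Unit normals of triangles; vertices is (V, 3), faces is (F, 3) integer indices."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def smoothness_loss(vertices, faces, adjacent_pairs):
    """Mean (1 - cosine similarity) between the normals of adjacent faces."""
    n = face_normals(vertices, faces)
    cos = (n[adjacent_pairs[:, 0]] * n[adjacent_pairs[:, 1]]).sum(axis=1)
    return float(np.mean(1.0 - cos))

# Two triangles sharing an edge: first flat, then folded along that edge.
faces = np.array([[0, 1, 2], [0, 2, 3]])
pairs = np.array([[0, 1]])
flat = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
folded = flat.copy()
folded[3] = [0.0, 1.0, 1.0]

flat_loss = smoothness_loss(flat, faces, pairs)
folded_loss = smoothness_loss(folded, faces, pairs)
```

The flat pair has identical normals and incurs no penalty, while the folded pair is penalized in proportion to how far the normals diverge.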

TODO: read the code carefully later, especially the latent space loss,
and how the feature-level loss is implemented.

Experiments

Datasets:
- CUB-200-2011 (birds)
- PASCAL3D: cars
Pre-training ConvMesh: pseudo ground truths. This presumably refers to the outputs of the segmentation and camera pose prediction networks mentioned above.
GAN inversion at inference: apparently also pseudo ground truths.
Evaluation: uses real GT.
- geometry accuracy: 2D mask IoU between rendered masks and GT masks
- appearance quality: the image synthesis metric FID (single view and multi view), which reflects how similar the distributions of GT images and generated images are.
- user study: 40 users were asked to give scores.
- (PASCAL3D only: approximated 3D CAD shapes are available, so 3D IoU can be used.)
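
For reference, the 2D mask IoU used for geometry accuracy is the standard intersection-over-union of two binary masks. A minimal sketch follows; the function name `mask_iou` and the convention of returning 1.0 for two empty masks are choices made here, not taken from the paper.

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0                         # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

rendered = np.array([[1, 1], [0, 0]])
ground_truth = np.array([[1, 0], [1, 0]])
iou = mask_iou(rendered, ground_truth)     # 1 shared pixel out of 3 covered pixels
```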

Texture Flow vs. Texture Regression

Texture flow is more common, but it tends to fail in invisible regions, because it easily copies foreground pixels, including obstacles.

Implementation (mostly from the supplementary material)

Time, memory, and GPU hardware

Pre-training:
600 epochs, with a batch size of 128,
15 hours on four Nvidia V100 GPUs.

Network architecture: same as ConvMesh.

convolutional generator G with 2 branches.
- Input: latent code z (dimension 64)
- Output: deformation map S, 32×32; texture map T, 512×512
UV space discriminator
- deformation map
- texture map
image space discriminator (PatchGAN)

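Chamfer texture loss implementation notes

Explanation: Da and Ds are multiplied.
- An epsilon is added so that two points at the same position (Ds is zero) but with very different colors (Da is large) still count as different points, instead of the product collapsing to zero;
- the exponent on Ds penalizes large distances, because only small misalignment should be tolerated;
- finally a max is taken.
Note: the Ds term is not differentiable; it only acts as a weight for training Da (texture).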

mesh_inversion.py

# Excerpt from a method in mesh_inversion.py. It assumes `import torch` and
# `import torch.nn.functional as F` at module level; mask2proj, grid_sample_from_vtx
# and distChamfer_downsample are helpers defined elsewhere in the repo.
if self.args.chamfer_texture_pixel_loss:
    # NOTE: batch size should be one
    # Rendered foreground pixels as a point set: positions and their colors.
    pix_pos_pred = mask2proj(mask_pred)
    pix_pred = grid_sample_from_vtx(pix_pos_pred, image_pred)
    # Pairwise color distance (Da), then position distance (Ds) reusing the indices from the first call.
    dist_map_c, idx_a, idx_b = distChamfer_downsample(pix_pred, color_target, resolution=self.args.chamfer_resolution)
    dist_map_p, _, _ = distChamfer_downsample(pix_pos_pred, vtx_target, resolution=self.args.chamfer_resolution, idx_a=idx_a, idx_b=idx_b)

    xy_threshold = self.args.xy_threshold
    k = self.args.xy_k
    alpha = self.args.xy_alpha
    eps = 1 - (2*k*xy_threshold)**2
    rgb_eps = self.args.rgb_eps
    if eps == 1:
        xy_term = torch.pow(1 + k*dist_map_p, alpha)
    else:
        # relu(x - 1) + 1 is max(x, 1): the spatial weight never drops below 1.
        xy_term = F.relu(torch.pow(eps + k*dist_map_p, alpha) - 1) + 1
    dist_map = xy_term * (dist_map_c + rgb_eps)

    # Best match in the target for every predicted pixel, then average.
    dist_min_ab = dist_map.min(-1)[0]
    dist_mean_ab = dist_min_ab.mean(-1)

    loss += dist_mean_ab * self.args.chamfer_texture_pixel_loss_wt

    # Collect the matched points in the target for visualization.
    indices = dist_map.argmin(dim=-1)
    self.matched_pos = torch.stack([vtx_target[i, indices[i]] for i in range(indices.shape[0])], 0)
    self.matched_clr = torch.stack([color_target[i, indices[i]] for i in range(indices.shape[0])], 0)
    # v2 from: grid sample
    self.matched_clr_v2 = grid_sample_from_vtx(self.matched_pos, target)  # NOTE that back vertices color shown as well

my kaolin:
