
PointPillars: Paper Notes and Code Walkthrough

Author: CVplayer111

Paper: https://arxiv.org/pdf/1812.05784.pdf

Code: https://github.com/nutonomy/second.pytorch and https://github.com/open-mmlab/OpenPCDet

1. Motivation

1. Projecting the point cloud into a bird's-eye view loses most of the spatial information and leaves the features very sparse, so applying a convolutional network directly does not work well.

2. To address this, VoxelNet was proposed on top of PointNet, arguably the first truly end-to-end 3D detection method. Despite its strong accuracy, its inference speed is only 4.4 Hz, too slow for real-time deployment. SECOND improved on it, but 3D convolution remains the real-time bottleneck.

2. Method

1. A novel point cloud encoder is proposed together with the detection network.

2. With the 3D convolutions removed, it is very fast, running at 62 Hz.

3. It operates directly on pillars instead of voxels, so everything can be done with 2D convolutions, which are extremely efficient on GPUs.

3. Network Architecture

(Figure: PointPillars network architecture — Pillar Feature Net, 2D CNN backbone, and SSD detection head)

Block 1: partition the point cloud into pillars, decorate each point to 9 features (x, y, z, r, xc, yc, zc, xp, yp), lift the features with a simplified PointNet, max-pool to get one feature vector per pillar, and scatter the pillars into a pseudo-image.

Block 2: process the pseudo-image features with a 2D CNN and an RPN-style multi-scale backbone to obtain better localization accuracy and semantic features.

Block 3: regress and classify boxes on the resulting feature map relative to the anchor boxes.

4. Loss Function and Other Innovations

4.1 Loss Function

As in VoxelNet, each class's anchor is described by a width, length, height, and z center, and is applied at two orientations: 0° and 90°. The localization residuals (x, y, z, w, l, h, θ) use a SmoothL1 loss, with the angle encoded as a sine residual; heading is handled by a softmax direction-classification loss, and object classification uses focal loss. Positive/negative assignment in PointPillars is done per class: each class's GT boxes are matched only against that class's anchors. The procedure: for each GT, the anchor with the highest IoU is marked positive outright; then each anchor takes the GT with which its IoU is highest and is marked positive if that IoU exceeds the positive threshold and negative if it falls below the negative threshold. The first rule prevents GTs from going unmatched: a GT whose best anchor IoU is, say, 0.3 would otherwise fail the threshold and get no anchor at all.
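Concretely, the regression targets and the total loss are the ones from the paper. Anchor quantities carry a superscript $a$, ground truth a superscript $gt$; the focal loss uses $\alpha=0.25$, $\gamma=2$:

$$\Delta x = \frac{x^{gt}-x^{a}}{d^{a}},\quad \Delta y = \frac{y^{gt}-y^{a}}{d^{a}},\quad \Delta z = \frac{z^{gt}-z^{a}}{h^{a}},\qquad d^{a}=\sqrt{(w^{a})^{2}+(l^{a})^{2}}$$

$$\Delta w = \log\frac{w^{gt}}{w^{a}},\quad \Delta l = \log\frac{l^{gt}}{l^{a}},\quad \Delta h = \log\frac{h^{gt}}{h^{a}},\quad \Delta\theta = \sin(\theta^{gt}-\theta^{a})$$

$$\mathcal{L} = \frac{1}{N_{pos}}\big(\beta_{loc}\mathcal{L}_{loc} + \beta_{cls}\mathcal{L}_{cls} + \beta_{dir}\mathcal{L}_{dir}\big),\qquad \beta_{loc}=2,\ \beta_{cls}=1,\ \beta_{dir}=0.2$$

where $\mathcal{L}_{loc}$ sums SmoothL1 over the seven residuals above and $N_{pos}$ is the number of positive anchors.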

Classification loss: the positive/negative labels are converted with scatter_ into one-hot vectors of shape (batch_size, 321408, 4), where the 4 columns are background plus the three classes; the model's predictions (batch_size, 248, 216, 18) are reshaped to (batch_size, 321408, 3), and the focal loss is computed over the three class columns only.
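A minimal sketch of that conversion, assuming the shapes quoted above (tensor names here are illustrative, not pcdet's):

```python
import torch

batch_size, num_anchors, num_class = 2, 321408, 3
# per-anchor labels: 0 = background, 1..3 = class index
cls_labels = torch.randint(0, num_class + 1, (batch_size, num_anchors))

# one-hot via scatter_: (batch_size, 321408, 4) = background + 3 classes
one_hot = torch.zeros(batch_size, num_anchors, num_class + 1)
one_hot.scatter_(-1, cls_labels.unsqueeze(-1).long(), 1.0)
one_hot = one_hot[..., 1:]  # drop the background column -> (batch_size, 321408, 3)

# predictions (batch_size, 248, 216, 18) reshaped to (batch_size, 321408, 3);
# the focal loss is then computed between cls_preds and one_hot
cls_preds = torch.randn(batch_size, 248, 216, 18).view(batch_size, num_anchors, num_class)
```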

4.2 Data Augmentation

Data augmentation brings a very noticeable performance gain; a config sketch follows the list below.

1. Following SECOND, build a database of ground-truth boxes and randomly insert samples from it into each training point cloud.

2. Rotate and translate individual ground-truth boxes.

3. Global point cloud augmentation: random mirror flip, global scaling and rotation, and global translation to simulate localization noise.
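In OpenPCDet these correspond to the DATA_AUGMENTOR section of the dataset config. The sketch below is reproduced from memory of the KITTI pointpillar.yaml, so treat the keys and values as indicative rather than authoritative:

```python
# sketch of the DATA_AUGMENTOR config (values from memory, verify against the repo)
DATA_AUGMENTOR = {
    'AUG_CONFIG_LIST': [
        # 1. GT sampling: paste boxes from a pre-built ground-truth database into the scene
        {'NAME': 'gt_sampling', 'SAMPLE_GROUPS': ['Car:15', 'Pedestrian:15', 'Cyclist:15']},
        # 3. global augmentations (the per-box rotation/translation of item 2 is a
        #    SECOND-style noise augmentation and is not listed here)
        {'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x']},
        {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]},
        {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]},
    ]
}
```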

4.3 Design Tricks

VoxelNet's encoder uses two PointNet layers; PointPillars slims this down to one, which cuts the runtime by 2.5 ms in PyTorch. Halving the output channels of the upsampled feature maps to 128 saves another 3.9 ms. Neither change hurts detection performance.

4.4 Inference

Each location on the feature map carries 6 anchors (3 sizes × 2 orientations). For every anchor the network predicts three class probabilities and seven box parameters. The xyz offsets must first be multiplied by their scale factors: for x and y the scale is the anchor's bottom diagonal d^a = sqrt((w^a)^2 + (l^a)^2), for z it is the anchor height h^a, and the angle is recovered with arcsin. The three class scores go through a sigmoid and the maximum is taken as the anchor's score. A score threshold removes most anchors, and class-agnostic NMS is then applied: first keep the top-k scoring anchors, then run NMS.
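Spelled out, decoding inverts the encoding from Section 4.1; the direction classifier then resolves the 180° flip that the sine residual cannot distinguish:

$$x = \Delta x \, d^{a} + x^{a},\quad y = \Delta y \, d^{a} + y^{a},\quad z = \Delta z \, h^{a} + z^{a}$$

$$w = w^{a} e^{\Delta w},\quad l = l^{a} e^{\Delta l},\quad h = h^{a} e^{\Delta h},\quad \theta = \theta^{a} + \arcsin(\Delta\theta)$$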

5. Code Walkthrough

5.1 Pillar Feature Net

The input point cloud is divided into pillars with a 0.16 m × 0.16 m footprint, giving a (432, 496) grid. The non-empty pillars are kept, yielding a (M, 32, 4) tensor of points and the (M, 3) grid coordinates of each pillar. Each point is then decorated with its offsets from the mean xyz of the points in its pillar and from the pillar's geometric center, giving (M, 32, 10). A simplified PointNet lifts the point features to (M, 32, 64), max-pooling reduces them to (M, 64), and the M pillars are scattered back into the (432, 496) grid to form the pseudo-image.

Pillar generation from the point cloud: pcdet/datasets/processor/data_processor.py

```python
import numpy as np
from functools import partial

# tensorview ships with spconv v2 / cumm; only needed for the spconv 2.x path below
try:
    import cumm.tensorview as tv
except ImportError:
    tv = None


def transform_points_to_voxels(self, data_dict=None, config=None):
    """
    Convert the point cloud into pillars using spconv's VoxelGenerator.
    A pillar can be viewed as the stack of all voxels along the z axis, so it is
    enough to set the voxel height to the full height of the KITTI point cloud range.
    """
    # initialize the parameters needed for the point-to-pillar conversion
    if data_dict is None:
        # the KITTI point cloud is cropped to the range [0, -39.68, -3, 69.12, 39.68, 1]
        # [69.12, 79.36, 4] / [0.16, 0.16, 4] = [432, 496, 1]
        grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)
        self.grid_size = np.round(grid_size).astype(np.int64)
        self.voxel_size = config.VOXEL_SIZE
        # just bind the config, we will create the VoxelGeneratorWrapper later,
        # to avoid pickling issues in multiprocess spawn
        return partial(self.transform_points_to_voxels, config=config)

    if self.voxel_generator is None:
        self.voxel_generator = VoxelGeneratorWrapper(
            # size of each pillar: [0.16, 0.16, 4]
            vsize_xyz=config.VOXEL_SIZE,
            # point cloud range: [0, -39.68, -3, 69.12, 39.68, 1]
            coors_range_xyz=self.point_cloud_range,
            # number of features per point: x, y, z, r (r is the LiDAR reflectance)
            num_point_features=self.num_point_features,
            # maximum number of points per pillar: 32
            max_num_points_per_voxel=config.MAX_POINTS_PER_VOXEL,
            # maximum number of pillars to keep; most generated pillars are empty,
            # so only the non-empty ones are needed
            max_num_voxels=config.MAX_NUMBER_OF_VOXELS[self.mode],  # 16000
        )

    points = data_dict['points']
    # generate the pillars
    voxel_output = self.voxel_generator.generate(points)
    # For an N*4 point cloud the generator returns three arrays:
    # voxels:      the points of each pillar, shape [M, 32, 4]
    # coordinates: the zyx grid coordinates of each pillar, shape [M, 3] (z is always 0)
    # num_points:  the number of valid points per pillar, shape [M,]
    #              (pillars with fewer than 32 points are zero-padded)
    voxels, coordinates, num_points = voxel_output

    if not data_dict['use_lead_xyz']:
        voxels = voxels[..., 3:]  # remove xyz in voxels(N, 3)

    data_dict['voxels'] = voxels
    data_dict['voxel_coords'] = coordinates
    data_dict['voxel_num_points'] = num_points
    return data_dict


# below: the spconv-based pillar generation wrapper
class VoxelGeneratorWrapper():
    def __init__(self, vsize_xyz, coors_range_xyz, num_point_features, max_num_points_per_voxel, max_num_voxels):
        try:
            from spconv.utils import VoxelGeneratorV2 as VoxelGenerator
            self.spconv_ver = 1
        except ImportError:
            try:
                from spconv.utils import VoxelGenerator
                self.spconv_ver = 1
            except ImportError:
                from spconv.utils import Point2VoxelCPU3d as VoxelGenerator
                self.spconv_ver = 2

        if self.spconv_ver == 1:
            self._voxel_generator = VoxelGenerator(
                voxel_size=vsize_xyz,
                point_cloud_range=coors_range_xyz,
                max_num_points=max_num_points_per_voxel,
                max_voxels=max_num_voxels
            )
        else:
            self._voxel_generator = VoxelGenerator(
                vsize_xyz=vsize_xyz,
                coors_range_xyz=coors_range_xyz,
                num_point_features=num_point_features,
                max_num_points_per_voxel=max_num_points_per_voxel,
                max_num_voxels=max_num_voxels
            )

    def generate(self, points):
        if self.spconv_ver == 1:
            voxel_output = self._voxel_generator.generate(points)
            if isinstance(voxel_output, dict):
                voxels, coordinates, num_points = \
                    voxel_output['voxels'], voxel_output['coordinates'], voxel_output['num_points_per_voxel']
            else:
                voxels, coordinates, num_points = voxel_output
        else:
            assert tv is not None, "Unexpected error, library 'cumm' wasn't imported properly."
            voxel_output = self._voxel_generator.point_to_voxel(tv.from_numpy(points))
            tv_voxels, tv_coordinates, tv_num_points = voxel_output
            # make copy with numpy(), since numpy_view() will disappear as soon as the generator is deleted
            voxels = tv_voxels.numpy()
            coordinates = tv_coordinates.numpy()
            num_points = tv_num_points.numpy()
        return voxels, coordinates, num_points
```
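A quick shape sanity check for this stage (a sketch run in the same module scope as VoxelGeneratorWrapper above, with the KITTI settings quoted in the comments):

```python
import numpy as np

# random points inside the KITTI range: x, y, z, reflectance
points = np.random.rand(10000, 4).astype(np.float32)
points[:, 0] *= 69.12                        # x in [0, 69.12)
points[:, 1] = points[:, 1] * 79.36 - 39.68  # y in [-39.68, 39.68)
points[:, 2] = points[:, 2] * 4.0 - 3.0      # z in [-3, 1)

gen = VoxelGeneratorWrapper(
    vsize_xyz=[0.16, 0.16, 4.0],
    coors_range_xyz=[0, -39.68, -3, 69.12, 39.68, 1],
    num_point_features=4,
    max_num_points_per_voxel=32,
    max_num_voxels=16000,
)
voxels, coords, num_points = gen.generate(points)
print(voxels.shape, coords.shape, num_points.shape)  # (M, 32, 4) (M, 3) (M,)
```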

Point feature decoration and the simplified PointNet: pcdet/models/backbones_3d/vfe/pillar_vfe.py

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from .vfe_template import VFETemplate


class PFNLayer(nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 use_norm=True,
                 last_layer=False):
        super().__init__()

        self.last_vfe = last_layer
        self.use_norm = use_norm
        if not self.last_vfe:
            out_channels = out_channels // 2

        if self.use_norm:
            # Initialization of the simplified PointNet layer from the paper.
            # The paper phrases this channel lift as a 1x1 convolution (in theory faster);
            # here a linear layer maps the 10 decorated point features to 64 channels.
            self.linear = nn.Linear(in_channels, out_channels, bias=False)
            # 1D BatchNorm layer
            self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, out_channels, bias=True)

        self.part = 50000

    def forward(self, inputs):
        if inputs.shape[0] > self.part:
            # nn.Linear performs randomly when batch size is too large
            num_parts = inputs.shape[0] // self.part
            part_linear_out = [self.linear(inputs[num_part * self.part:(num_part + 1) * self.part])
                               for num_part in range(num_parts + 1)]
            x = torch.cat(part_linear_out, dim=0)
        else:
            # x is lifted from (M, 32, 10) to (M, 32, 64)
            x = self.linear(inputs)
        torch.backends.cudnn.enabled = False
        # BatchNorm1d normalizes over the channel dimension (for images the default
        # layout is [N, C, H*W] with channels second), so permute
        # (pillars, num_points, channels) -> (pillars, channels, num_points),
        # apply BN, then permute back
        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1) if self.use_norm else x
        torch.backends.cudnn.enabled = True
        x = F.relu(x)
        # PointNet max pooling: keep the most representative point feature per pillar
        # x_max shape: (M, 1, 64)
        x_max = torch.max(x, dim=1, keepdim=True)[0]

        if self.last_vfe:
            # return the pillar features produced by the simplified PointNet
            return x_max
        else:
            x_repeat = x_max.repeat(1, inputs.shape[1], 1)
            x_concatenated = torch.cat([x, x_repeat], dim=2)
            return x_concatenated


class PillarVFE(VFETemplate):
    """
    model_cfg: NAME: PillarVFE
               WITH_DISTANCE: False
               USE_ABSLOTE_XYZ: True
               USE_NORM: True
               NUM_FILTERS: [64]
    num_point_features: 4
    voxel_size: [0.16, 0.16, 4]
    POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
    """

    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range, **kwargs):
        super().__init__(model_cfg=model_cfg)

        self.use_norm = self.model_cfg.USE_NORM
        self.with_distance = self.model_cfg.WITH_DISTANCE
        self.use_absolute_xyz = self.model_cfg.USE_ABSLOTE_XYZ
        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        # the linear layers that lift the 10-dim features to 64 dims
        self.pfn_layers = nn.ModuleList(pfn_layers)

        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        """
        Compute the padding indicator.
        Args:
            actual_num: number of real points in each voxel, shape (M,)
            max_num: maximum number of points per voxel (32)
        Returns:
            paddings_indicator: marks which entries of each pillar are real data
                                and which are zero padding
        """
        # add a dimension -> (M, 1)
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        # [1, 1]
        max_num_shape = [1] * len(actual_num.shape)
        # [1, -1]
        max_num_shape[axis + 1] = -1
        # (1, 32)
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        # (M, 32)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator

    def forward(self, batch_dict, **kwargs):
        """
        batch_dict:
            points: (N, 5) --> (batch_index, x, y, z, r); batch_index is the sample index in the batch
            frame_id: (4,) --> frame IDs, e.g. (003877, 001908, 006616, 005355)
            gt_boxes: (4, 40, 8) --> (x, y, z, dx, dy, dz, ry, class)
            use_lead_xyz: (4,) --> (1, 1, 1, 1)
            voxels: (M, 32, 4) --> (x, y, z, r)
            voxel_coords: (M, 4) --> (batch_index, z, y, x)
            voxel_num_points: (M,)
            image_shape: (4, 2) resolution of camera 2's image for each point cloud
            batch_size: 4
        """
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict[
            'voxel_coords']
        # Sum the points in each pillar: (M, 32, 3) -> (M, 1, 3) (keepdim=True keeps the dims),
        # then divide by the point count to get the per-pillar mean; points_mean: (M, 1, 3)
        points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(
            -1, 1, 1)
        # subtract the pillar mean from each point to get xc, yc, zc
        f_cluster = voxel_features[:, :, :3] - points_mean

        # offsets of each point from the pillar's geometric center: xp, yp, zp
        f_center = torch.zeros_like(voxel_features[:, :, :3])
        # coords holds grid indices on the [432, 496, 1] grid; multiply by the pillar size
        # to get metric coordinates, add half a pillar to reach the pillar center, and
        # subtract that center from each point's x, y, z to get the offsets
        f_center[:, :, 0] = voxel_features[:, :, 0] - (
                coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, 1] - (
                coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        # the z offset here is extra; the paper has no z offset
        f_center[:, :, 2] = voxel_features[:, :, 2] - (
                coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

        # with absolute coordinates, concatenate directly
        if self.use_absolute_xyz:
            features = [voxel_features, f_cluster, f_center]
        # otherwise drop the first 3 dims of voxel_features before concatenating
        else:
            features = [voxel_features[..., 3:], f_cluster, f_center]

        # optionally append the distance feature
        if self.with_distance:
            # torch.norm: the first 2 is the L2 norm, the second 2 is the dimension
            points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
            features.append(points_dist)
        # concatenate along the last dimension -> (M, 32, 10)
        features = torch.cat(features, dim=-1)
        # maximum number of points per pillar
        voxel_count = features.shape[1]
        """
        Pillars with fewer than 32 points are zero-padded, and the computation above
        gives those padded rows nonzero xc, yc, zc and xp, yp, zp values. They must be
        reset to zero, so get_paddings_indicator is used to tell real data apart from
        padding.
        """
        # mask: (M, 32), marks the entries of each pillar that should be kept
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        # (M, 32) -> (M, 32, 1)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        # zero out all features of the padded entries
        features *= mask

        for pfn in self.pfn_layers:
            features = pfn(features)
        # (M, 64): one 64-dim feature per pillar
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict
```
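To make get_paddings_indicator concrete, here is the same broadcast-comparison trick as a tiny standalone example (hypothetical numbers):

```python
import torch

# two pillars holding 3 and 1 valid points, at most 5 points per pillar
actual_num = torch.tensor([3, 1]).unsqueeze(1)  # (M, 1)
max_num = torch.arange(5).view(1, -1)           # (1, 5)
mask = actual_num > max_num                     # broadcasts to (M, 5)
print(mask)
# tensor([[ True,  True,  True, False, False],
#         [ True, False, False, False, False]])
```

Entries that are True are real points; everything else is zero padding whose decorated features must be masked out.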

Scattering the M pillars back into the grid to form the pseudo-image: pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py

```python
import torch
import torch.nn as nn


class PointPillarScatter(nn.Module):
    """
    The "stacked pillars" step of the paper: scatter the pillar features
    back into their grid positions to form the pseudo-image.
    """

    def __init__(self, model_cfg, grid_size, **kwargs):
        super().__init__()

        self.model_cfg = model_cfg
        self.num_bev_features = self.model_cfg.NUM_BEV_FEATURES  # 64
        self.nx, self.ny, self.nz = grid_size  # [432, 496, 1]
        assert self.nz == 1

    def forward(self, batch_dict, **kwargs):
        """
        Args:
            pillar_features: (M, 64)
            coords: (M, 4), first column is batch_index, the rest are z, y, x
        Returns:
            batch_spatial_features: (batch_size, 64, 496, 432)
        """
        # pillar features produced by the PointNet stage and each pillar's grid coordinates
        # pillar_features: (M, 64); coords: (M, 4)
        pillar_features, coords = batch_dict['pillar_features'], batch_dict['voxel_coords']

        # list collecting the pseudo-image of each sample
        batch_spatial_features = []
        batch_size = coords[:, 0].max().int().item() + 1

        # process each sample in the batch independently
        for batch_idx in range(batch_size):
            # Create an empty canvas to receive the pillar features.
            # self.num_bev_features is 64; self.nz * self.nx * self.ny is the flattened
            # grid size, 1 * 432 * 496 = 214272, so spatial_feature is (64, 214272)
            spatial_feature = torch.zeros(
                self.num_bev_features,
                self.nz * self.nx * self.ny,
                dtype=pillar_features.dtype,
                device=pillar_features.device)  # (64, 214272)

            # mask selecting this sample's pillars from coords[:, 0]
            batch_mask = coords[:, 0] == batch_idx
            # extract the coordinates with the mask
            this_coords = coords[batch_mask, :]
            # this_coords stores (z, y, x) with a single z layer, so the flattened
            # index is computed as follows
            """
            The canvas was flattened from a (496, 432) image into 496*432 = 214272
            entries. To place a pillar (which lives in the 2D plane; z is always 0)
            into the flattened array, count the full rows before it (y * nx) and add
            its column (x).
            """
            # flattened pseudo-image index of every non-empty pillar
            indices = this_coords[:, 1] + this_coords[:, 2] * self.nx + this_coords[:, 3]
            # convert the type
            indices = indices.type(torch.long)
            # extract this sample's pillar features with the mask
            pillars = pillar_features[batch_mask, :]
            pillars = pillars.t()
            # write the pillars at their indices
            spatial_feature[:, indices] = pillars
            # append to the list; each element is (64, 214272)
            batch_spatial_features.append(spatial_feature)

        # stack along dimension 0
        batch_spatial_features = torch.stack(batch_spatial_features, 0)
        # reshape back to the (pseudo-image) space: (4, 64, 214272) --> (4, 64, 496, 432)
        batch_spatial_features = batch_spatial_features.view(batch_size, self.num_bev_features * self.nz, self.ny,
                                                             self.nx)
        # store the result
        batch_dict['spatial_features'] = batch_spatial_features
        return batch_dict
```
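As a worked example of the flattened index: a pillar at grid position (z=0, y=2, x=5) on the 432-wide grid lands at 0 + 2·432 + 5 = 869:

```python
import torch

nx = 432  # grid width
this_coords = torch.tensor([[0, 0, 2, 5]])  # (batch_index, z, y, x)
indices = this_coords[:, 1] + this_coords[:, 2] * nx + this_coords[:, 3]
print(indices)  # tensor([869])
```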

5.2 2D CNN

The pseudo-image features (batch_size, 64, 496, 432) are fed into a multi-scale 2D CNN (three downsampling branches plus transposed-convolution upsampling, FPN-style). Each of the three upsampled maps is (batch_size, 128, 248, 216), and concatenating them gives (batch_size, 384, 248, 216); the corresponding config is sketched below.
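For orientation, the backbone hyperparameters in OpenPCDet's KITTI pointpillar config are roughly the following (quoted from memory as a sketch; verify against the yaml in the repo):

```python
# BACKBONE_2D section of pointpillar.yaml (indicative values)
BACKBONE_2D = {
    'NAME': 'BaseBEVBackbone',
    'LAYER_NUMS': [3, 5, 5],                  # conv layers per downsampling block
    'LAYER_STRIDES': [2, 2, 2],               # 496x432 -> 248x216 -> 124x108 -> 62x54
    'NUM_FILTERS': [64, 128, 256],
    'UPSAMPLE_STRIDES': [1, 2, 4],            # bring all three scales back to 248x216
    'NUM_UPSAMPLE_FILTERS': [128, 128, 128],  # concatenated -> 384 channels
}
```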

pcdet/models/backbones_2d/base_bev_backbone.py

```python
import numpy as np
import torch
import torch.nn as nn


class BaseBEVBackbone(nn.Module):
    def __init__(self, model_cfg, input_channels):
        super().__init__()
        self.model_cfg = model_cfg
        # read the downsampling block parameters
        if self.model_cfg.get('LAYER_NUMS', None) is not None:
            assert len(self.model_cfg.LAYER_NUMS) == len(self.model_cfg.LAYER_STRIDES) == len(
                self.model_cfg.NUM_FILTERS)
            layer_nums = self.model_cfg.LAYER_NUMS
            layer_strides = self.model_cfg.LAYER_STRIDES
            num_filters = self.model_cfg.NUM_FILTERS
        else:
            layer_nums = layer_strides = num_filters = []
        # read the upsampling block parameters
        if self.model_cfg.get('UPSAMPLE_STRIDES', None) is not None:
            assert len(self.model_cfg.UPSAMPLE_STRIDES) == len(self.model_cfg.NUM_UPSAMPLE_FILTERS)
            num_upsample_filters = self.model_cfg.NUM_UPSAMPLE_FILTERS
            upsample_strides = self.model_cfg.UPSAMPLE_STRIDES
        else:
            upsample_strides = num_upsample_filters = []

        num_levels = len(layer_nums)  # 3 for PointPillars
        # input channels of each block; for PointPillars: [64, 64, 128]
        c_in_list = [input_channels, *num_filters[:-1]]
        self.blocks = nn.ModuleList()
        self.deblocks = nn.ModuleList()
        for idx in range(num_levels):  # channel flow: (64, 64) --> (64, 128) --> (128, 256)
            # first layer of cur_layers, with stride 2
            cur_layers = [
                nn.ZeroPad2d(1),
                nn.Conv2d(
                    c_in_list[idx], num_filters[idx], kernel_size=3,
                    stride=layer_strides[idx], padding=0, bias=False
                ),
                nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                nn.ReLU()
            ]
            for k in range(layer_nums[idx]):  # stack conv layers according to layer_nums
                cur_layers.extend([
                    nn.Conv2d(num_filters[idx], num_filters[idx], kernel_size=3, padding=1, bias=False),
                    nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                    nn.ReLU()
                ])
            # add this block; * unpacks the list into separate positional arguments
            # (similarly, ** unpacks a dict into keyword arguments)
            self.blocks.append(nn.Sequential(*cur_layers))
            if len(upsample_strides) > 0:  # build the upsampling blocks, strides (1, 2, 4)
                stride = upsample_strides[idx]
                if stride >= 1:
                    self.deblocks.append(nn.Sequential(
                        nn.ConvTranspose2d(
                            num_filters[idx], num_upsample_filters[idx],
                            upsample_strides[idx],
                            stride=upsample_strides[idx], bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))
                else:
                    stride = int(np.round(1 / stride))
                    self.deblocks.append(nn.Sequential(
                        nn.Conv2d(
                            num_filters[idx], num_upsample_filters[idx],
                            stride,
                            stride=stride, bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))

        c_in = sum(num_upsample_filters)  # 384 for PointPillars
        if len(upsample_strides) > num_levels:
            self.deblocks.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_in, upsample_strides[-1], stride=upsample_strides[-1], bias=False),
                nn.BatchNorm2d(c_in, eps=1e-3, momentum=0.01),
                nn.ReLU(),
            ))

        self.num_bev_features = c_in

    def forward(self, data_dict):
        """
        Args:
            data_dict:
                spatial_features: (4, 64, 496, 432)
        Returns:
        """
        spatial_features = data_dict['spatial_features']
        ups = []
        ret_dict = {}
        x = spatial_features
        for i in range(len(self.blocks)):
            x = self.blocks[i](x)

            stride = int(spatial_features.shape[2] / x.shape[2])
            ret_dict['spatial_features_%dx' % stride] = x
            # downsampled maps: (4, 64, 248, 216) --> (4, 128, 124, 108) --> (4, 256, 62, 54)
            if len(self.deblocks) > 0:
                ups.append(self.deblocks[i](x))
            else:
                ups.append(x)

        # if upsampling blocks exist, concatenate their outputs
        if len(ups) > 1:
            """
            The three upsampled feature maps all have shape (batch_size, 128, 248, 216);
            concatenating them along dim 1 gives x with shape (batch_size, 384, 248, 216).
            """
            x = torch.cat(ups, dim=1)
        elif len(ups) == 1:
            x = ups[0]

        # False for PointPillars
        if len(self.deblocks) > len(self.blocks):
            x = self.deblocks[-1](x)

        # store the result in spatial_features_2d and return
        data_dict['spatial_features_2d'] = x

        return data_dict
```

5.3 SSD Detection Head

For the anchors there are three classes; each class has a single size and two orientations (0° and 90°).
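The standard KITTI anchor settings look roughly like this (sizes are [l, w, h] in meters; quoted from memory as a sketch, verify against pointpillar.yaml):

```python
# ANCHOR_GENERATOR_CONFIG sketch (indicative values)
ANCHOR_GENERATOR_CONFIG = [
    {'class_name': 'Car', 'anchor_sizes': [[3.9, 1.6, 1.56]],
     'anchor_rotations': [0, 1.57], 'anchor_bottom_heights': [-1.78],
     'matched_threshold': 0.6, 'unmatched_threshold': 0.45},
    {'class_name': 'Pedestrian', 'anchor_sizes': [[0.8, 0.6, 1.73]],
     'anchor_rotations': [0, 1.57], 'anchor_bottom_heights': [-0.6],
     'matched_threshold': 0.5, 'unmatched_threshold': 0.35},
    {'class_name': 'Cyclist', 'anchor_sizes': [[1.76, 0.6, 1.73]],
     'anchor_rotations': [0, 1.57], 'anchor_bottom_heights': [-0.6],
     'matched_threshold': 0.5, 'unmatched_threshold': 0.35},
]
```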

 pcdet/models/dense_heads/anchor_head_single.py

```python
import numpy as np
import torch.nn as nn

from .anchor_head_template import AnchorHeadTemplate


class AnchorHeadSingle(AnchorHeadTemplate):
    """
    Args:
        model_cfg: config of AnchorHeadSingle
        input_channels: 384, number of input channels
        num_class: 3
        class_names: ['Car', 'Pedestrian', 'Cyclist']
        grid_size: (432, 496, 1)
        point_cloud_range: (0, -39.68, -3, 69.12, 39.68, 1)
        predict_boxes_when_training: False
    """

    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range,
                 predict_boxes_when_training=True, **kwargs):
        super().__init__(
            model_cfg=model_cfg, num_class=num_class, class_names=class_names, grid_size=grid_size,
            point_cloud_range=point_cloud_range,
            predict_boxes_when_training=predict_boxes_when_training
        )
        # each location has anchors for 3 classes, each with two orientations (0° and 90°)
        # num_anchors_per_location: [2, 2, 2]
        self.num_anchors_per_location = sum(self.num_anchors_per_location)  # sum([2, 2, 2]) = 6
        # Conv2d(384, 18, kernel_size=(1, 1), stride=(1, 1))
        self.conv_cls = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.num_class,
            kernel_size=1
        )
        # Conv2d(384, 42, kernel_size=(1, 1), stride=(1, 1))
        self.conv_box = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.box_coder.code_size,
            kernel_size=1
        )
        # if a direction loss is used, add the direction conv layer
        # Conv2d(384, 12, kernel_size=(1, 1), stride=(1, 1))
        if self.model_cfg.get('USE_DIRECTION_CLASSIFIER', None) is not None:
            self.conv_dir_cls = nn.Conv2d(
                input_channels,
                self.num_anchors_per_location * self.model_cfg.NUM_DIR_BINS,
                kernel_size=1
            )
        else:
            self.conv_dir_cls = None
        self.init_weights()

    # initialize the parameters
    def init_weights(self):
        pi = 0.01
        # initialize the classification conv bias
        nn.init.constant_(self.conv_cls.bias, -np.log((1 - pi) / pi))
        # initialize the box regression conv weights
        nn.init.normal_(self.conv_box.weight, mean=0, std=0.001)

    def forward(self, data_dict):
        # features produced by the 2D backbone
        # spatial_features_2d: (batch_size, 384, 248, 216)
        spatial_features_2d = data_dict['spatial_features_2d']
        # class predictions for the 6 anchors at each location --> (batch_size, 18, 248, 216)
        cls_preds = self.conv_cls(spatial_features_2d)
        # box predictions for the 6 anchors at each location --> (batch_size, 42, 248, 216);
        # each anchor predicts 7 parameters: (x, y, z, w, l, h, θ)
        box_preds = self.conv_box(spatial_features_2d)
        # move the class dim to the end: [N, H, W, C] --> (batch_size, 248, 216, 18)
        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()
        # move the box dim to the end: [N, H, W, C] --> (batch_size, 248, 216, 42)
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()
        # store the class and box predictions in the forward dict
        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds
        # direction classification
        if self.conv_dir_cls is not None:
            # each anchor picks one of the two direction bins --> (batch_size, 12, 248, 216)
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            # move the direction dim to the end: [N, H, W, C] --> (batch_size, 248, 216, 12)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            # store the direction predictions in the forward dict
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

        """
        In training mode, each anchor must be assigned a GT box to compute the loss.
        """
        if self.training:
            # targets_dict = {
            #     'box_cls_labels': cls_labels,     # (4, 321408)
            #     'box_reg_targets': bbox_targets,  # (4, 321408, 7)
            #     'reg_weights': reg_weights        # (4, 321408)
            # }
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']  # (4, 39, 8)
            )
            # store the assignment results in the forward dict
            self.forward_ret_dict.update(targets_dict)

        # outside training, directly generate the box predictions
        if not self.training or self.predict_boxes_when_training:
            # decode the predictions into final boxes
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds  # (1, 321408, 3); 248*216*6 = 321408
            data_dict['batch_box_preds'] = batch_box_preds  # (1, 321408, 7)
            data_dict['cls_preds_normalized'] = False

        return data_dict
```
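To close the loop with Section 4.4, here is a simplified post-processing sketch: sigmoid the class scores, take the max per anchor, threshold, keep the top-k, then NMS. For brevity it uses plain axis-aligned BEV NMS from torchvision in place of pcdet's rotated NMS (pcdet.ops.iou3d_nms), so it is illustrative only:

```python
import torch
from torchvision.ops import nms


def simple_post_process(batch_cls_preds, batch_box_preds,
                        score_thresh=0.1, topk=4096, iou_thresh=0.01):
    """batch_cls_preds: (N, 3) raw logits; batch_box_preds: (N, 7) decoded (x, y, z, dx, dy, dz, heading)."""
    scores, labels = torch.sigmoid(batch_cls_preds).max(dim=-1)  # best class per anchor
    keep = scores > score_thresh                                 # discard most anchors
    scores, labels, boxes = scores[keep], labels[keep], batch_box_preds[keep]
    scores, idx = scores.topk(min(topk, scores.numel()))         # class-agnostic top-k
    labels, boxes = labels[idx], boxes[idx]
    # axis-aligned BEV rectangles (x1, y1, x2, y2), ignoring heading -- a simplification
    bev = torch.stack([boxes[:, 0] - boxes[:, 3] / 2, boxes[:, 1] - boxes[:, 4] / 2,
                       boxes[:, 0] + boxes[:, 3] / 2, boxes[:, 1] + boxes[:, 4] / 2], dim=-1)
    keep_idx = nms(bev, scores, iou_thresh)
    return boxes[keep_idx], scores[keep_idx], labels[keep_idx]
```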

6. Reference

https://blog.csdn.net/qq_41366026/article/details/123006401
