        A Survey of 3D Detection Methods Based on LiDAR Point Clouds (LiDAR only)


        2022-02-20 23:28


        Author: 柒柒 @ Zhihu
        Source: https://zhuanlan.zhihu.com/p/436452723
        Reposted from 3D視覺工坊; this article is shared for academic purposes only.

        I have been fairly busy recently and let this series sit for a while; updates should resume soon.

        This article surveys recent progress in 3D detection, grouping and listing the works that I consider important and representative.

        Part (1) of the survey covers 3D detection methods based on LiDAR point clouds (LiDAR only). Additions and corrections are welcome.


        I. Overview of Paper Categories

        1. 3D detection based on LiDAR point clouds (LiDAR only)

        2. 3D detection based on multi-modal fusion (LiDAR + RGB)

        3. 3D detection based on monocular images (Monocular)

        4. 3D detection based on stereo images (Stereo)

        5. 3D detection based on view-based feature extraction

        6. 3D detection based on feature completion / pseudo point cloud generation (pseudo augment)

        7. 3D detection based on Transformers (Transformer)

        8. 3D detection based on semi-supervised learning (Semi-supervised)


        II. Papers by Category

        Due to space constraints, Part (1) covers only 3D detection methods based on LiDAR point clouds (LiDAR only). "LiDAR only" means these methods take the point cloud as their sole input; they differ mainly in how features are extracted from the point cloud.

        1. Part-A^2 (TPAMI 2020)

        Link: https://zhuanlan.zhihu.com/p/436797682

        Paper: https://arxiv.org/pdf/1907.03670.pdf
        Affiliation: The Chinese University of Hong Kong
        Code: https://github.com/open-mmlab/OpenPCDet (OpenPCDet Toolbox for LiDAR-based 3D Object Detection)
        One-sentence summary: The ground-truth boxes of 3D object detection not only automatically provide accurate segmentation mask because of the fact that 3D objects are naturally separated in 3D scenes, but also imply the relative locations for each foreground 3D point within the ground truth boxes.

        Network architecture

        KITTI test set results

        The overall network consists of two parts: a part-aware stage and a part-aggregation stage.

        • Part-aware stage: the authors argue that the relative position of a foreground point inside its object (the intra-object part location) encodes the object's shape. By learning to estimate these part locations, the network therefore obtains more discriminative features (a toy sketch of these part-location targets is given after this list).

        The part-aware network aims to extract discriminative features from the point cloud by learning to estimate the intra-object part locations of foreground points, since these part locations implicitly encode the 3D object’s shapes by indicating the relative locations of surface points of 3D objects.

        • Part-aggregation stage: since this is an aggregation mechanism, which features does it aggregate? The paper fuses two kinds of features, the point-wise part locations and the point-wise semantic features, and uses the fused features to predict the confidence and refined location of every candidate box.

        By considering the spatial distribution of the predicted intra-object part locations and the learned point-wise part features in a 3D box proposal from stage-I, it is reasonable to aggregate all the information within a proposal for box proposal scoring and refinement.
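
        The part-location targets come for free from the box annotations. As a rough illustration, here is a minimal NumPy sketch (not the authors' code; the box is assumed to be parameterized as center, size, and yaw): each point is expressed in the ground-truth box's local frame and normalized to [0, 1]^3, which simultaneously yields a foreground mask.

        ```python
        import numpy as np

        def intra_object_part_location(points, gt_box):
            """Free supervision from one ground-truth box: a foreground mask and,
            for points inside the box, their intra-object part location in [0, 1]^3.

            points: (N, 3) xyz coordinates.
            gt_box: (cx, cy, cz, dx, dy, dz, yaw) -- assumed parameterization.
            """
            cx, cy, cz, dx, dy, dz, yaw = gt_box
            shifted = points - np.array([cx, cy, cz])
            # Rotate by -yaw about z so the box axes align with the coordinate axes.
            c, s = np.cos(yaw), np.sin(yaw)
            rot = np.array([[ c,  s, 0.0],
                            [-s,  c, 0.0],
                            [0.0, 0.0, 1.0]])
            local = shifted @ rot.T
            part = local / np.array([dx, dy, dz]) + 0.5     # [0, 1]^3 for interior points
            foreground = np.all((part >= 0.0) & (part <= 1.0), axis=1)
            return part, foreground

        # Toy usage: a point near the box center gets a part location close to (0.5, 0.5, 0.5).
        pts = np.array([[10.1, 5.0, 0.0], [30.0, 30.0, 5.0]])
        part, fg = intra_object_part_location(pts, (10.0, 5.0, 0.0, 4.0, 1.8, 1.6, 0.3))
        print(part[fg], fg)   # only the first point is foreground
        ```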

        2. Point RCNN (CVPR 2019)

        Link: https://zhuanlan.zhihu.com/p/390767889

        Link: https://zhuanlan.zhihu.com/p/436419513

        Paper: https://arxiv.org/pdf/1812.04244.pdf

        Affiliation: The Chinese University of Hong Kong

        Code: https://github.com/sshaoshuai/PointRCNN

        One-sentence summary: The learned point representation from segmentation is not only good at proposal generation but is also helpful for the later box refinement.

        Network architecture

        KITTI test set results

        PointRCNN is a two-stage framework: stage one generates proposals (Bottom-up 3D Proposal Generation), and stage two refines the proposals to produce the final detections (Canonical 3D Box Refinement).

        • Bottom-up 3D Proposal Generation: its purpose is proposal generation. For every point, a point-wise feature is extracted and used to predict the probability that the point is foreground together with the corresponding proposal. The large set of generated proposals is then filtered with NMS, and only 300 of them are kept and passed to the second stage for refinement.

        We propose a novel bottom-up point cloud-based 3D bounding box proposal generation algorithm, which generates a small number of high-quality 3D proposals via segmenting the point cloud into foreground objects and background. The learned point representation from segmentation is not only good at proposal generation but is also helpful for the later box refinement.

        • Canonical 3D Box Refinement: extracts finer-grained features of the stage-one proposals for classification and regression. These finer features combine point features, spatial-location features, and RoI features (a sketch of the canonical transformation used here follows below).

        The proposed canonical 3D bounding box refinement takes advantage of our high-recall box proposals generated from stage-1 and learns to predict box coordinate refinements in the canonical coordinates with robust bin-based losses.
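
        The "canonical" part simply means that the points pooled for each proposal are re-expressed in the proposal's own coordinate frame before refinement. A minimal sketch of that transformation (assuming a center/size/heading box parameterization; not the released code):

        ```python
        import numpy as np

        def canonical_transform(points, proposal):
            """Express the points pooled for one proposal in the proposal's canonical
            frame: origin at the box center, x-axis along the box heading, so the
            refinement head sees a pose-normalized input.

            points:   (N, 3) xyz of points belonging to this proposal.
            proposal: (cx, cy, cz, dx, dy, dz, heading) -- assumed parameterization.
            """
            cx, cy, cz, _, _, _, heading = proposal
            shifted = points - np.array([cx, cy, cz])
            c, s = np.cos(heading), np.sin(heading)
            # Rotate by -heading about z so the heading direction maps onto +x.
            rot = np.array([[ c,  s, 0.0],
                            [-s,  c, 0.0],
                            [0.0, 0.0, 1.0]])
            return shifted @ rot.T
        ```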

        3. STD (ICCV 2019)

        Paper: https://arxiv.org/pdf/1907.10471v1.pdf
        Affiliation: Youtu Lab, Tencent, et al.
        One-sentence summary: They propose a point-based proposal generation paradigm on point cloud with spherical anchors.

        Network architecture

        KITTI test set results

        The overall framework is again a two-stage network: stage one generates proposals, and stage two extracts finer proposal features (point + voxel) for refinement. Compared with other works, the most distinctive part of STD is that spherical anchors are used during proposal generation. How are proposals obtained from spherical anchors? The steps are:

        Set a spherical anchor for every point → predict each point's probability of being a foreground point → filter redundant anchors with NMS → predict a proposal from each remaining anchor.

        According to the authors, the advantages of spherical anchors are (a back-of-the-envelope sketch follows this list):

        • Spherical anchors do not need to enumerate object orientations, which greatly reduces the amount of computation;

        Considering that a 3D object could be with any orientations, we design spherical anchors rather than traditional cuboid anchors. As a result, the number of spherical anchors is not proportional to the number of pre-defined reference box orientation, leading to about 50% less anchors. With computation much reduced, we surprisingly achieve a much higher recall with spherical anchors than with traditional ones.
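
        To make the "about 50% fewer anchors" claim concrete, here is a toy count (the numbers below are hypothetical, not taken from the paper): a cuboid anchor set multiplies with the number of reference headings, while a spherical set does not.

        ```python
        # Toy anchor-count comparison (hypothetical numbers, not from the paper):
        # spherical anchors carry no orientation, so their count does not scale
        # with the number of pre-defined reference headings.
        num_seed_points = 16000   # points kept as anchor centers (made up)
        num_scales = 1            # anchor sizes per point (made up)
        num_headings = 2          # e.g. 0 and pi/2 for cuboid anchors

        cuboid_anchors = num_seed_points * num_scales * num_headings
        spherical_anchors = num_seed_points * num_scales
        print(cuboid_anchors, spherical_anchors)  # 32000 vs 16000 -> ~50% fewer
        ```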

        4. PV-RCNN/PV-RCNN++(CVPR 2020)

        Paper: https://arxiv.org/pdf/2102.00463.pdf
        Affiliation: The Chinese University of Hong Kong
        Code: https://github.com/open-mmlab/OpenPCDet
        One-sentence summary: They propose a novel two-stage detection network for accurate 3D object detection through a two-step strategy of point-voxel feature aggregation.

        Network architecture

        KITTI test set results

        The framework is two-stage and has two core steps: first, voxel features → keypoint features; second, keypoint features → proposal/grid features.

        • Voxel feature → keypoint feature: a large number of voxel features are aggregated onto a small set of keypoints. The aggregation combines the raw point features, multi-scale voxel features, and BEV features;

        Our proposed PV-RCNN-v1 first aggregates the voxel-wise scene features at multiple neural layers of 3D voxel CNN into a small number of keypoints, which bridge the 3D voxel CNN feature encoder and the proposal refinement network.

        • Keypoint feature → proposal/grid feature: this step uses the previously aggregated keypoint features to perform RoI grid pooling for each proposal. Note that the grouping radius of the grid is multi-scale, which the authors argue extracts richer proposal features (a rough sketch of this grid pooling follows below).

        In this step, we propose keypoint-to-grid RoI feature abstraction to generate accurate proposal-aligned features from the keypoint features for fine-grained proposal refinement.
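
        A rough NumPy sketch of the keypoint-to-grid idea (not the paper's implementation, which uses set-abstraction MLPs with max pooling; here neighbor features are simply averaged, and all sizes are made up):

        ```python
        import numpy as np

        def keypoint_to_grid_pool(keypoints, keypoint_feats, box, grid_size=6,
                                  radii=(0.8, 1.6)):
            """Sample a regular grid inside one proposal and, for each grid point,
            average the features of keypoints within each query radius. The
            multi-radius grouping is what yields multi-scale proposal features.

            keypoints:      (K, 3) keypoint xyz, assumed already in the box frame.
            keypoint_feats: (K, C) features aggregated from the voxel backbone.
            box:            (dx, dy, dz) proposal size; the grid spans the box.
            Returns (grid_size**3, C * len(radii)) pooled features.
            """
            dx, dy, dz = box
            lin = (np.arange(grid_size) + 0.5) / grid_size - 0.5
            gx, gy, gz = np.meshgrid(lin * dx, lin * dy, lin * dz, indexing="ij")
            grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)      # (G, 3)

            pooled = []
            for r in radii:
                dist = np.linalg.norm(grid[:, None, :] - keypoints[None, :, :], axis=-1)
                mask = (dist <= r).astype(np.float32)                  # (G, K)
                denom = np.clip(mask.sum(axis=1, keepdims=True), 1.0, None)
                pooled.append(mask @ keypoint_feats / denom)           # mean over neighbors
            return np.concatenate(pooled, axis=1)

        # Toy usage: 128 keypoints with 32-dim features, 6x6x6 grid, two radii.
        kp = np.random.randn(128, 3)
        feats = np.random.randn(128, 32)
        print(keypoint_to_grid_pool(kp, feats, box=(4.0, 1.8, 1.6)).shape)  # (216, 64)
        ```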

        6. PointPillars(CVPR 2019)

        Paper: https://arxiv.org/pdf/1812.05784.pdf
        Affiliation: nuTonomy: an Aptiv company
        One-sentence summary: A novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars).

        Link: https://zhuanlan.zhihu.com/p/389034609

        Network architecture

        KITTI test set results

        The core of this paper is the Pillar Feature Network (pillar-based point cloud feature extraction). Its job is to turn the point cloud into a standard dense tensor, i.e., point cloud → C×H×W. The concrete pipeline is point cloud → D×P×N → C×P×N → C×P → C×H×W (a sketch of the whole pipeline follows the list below), where:

        1. D = 9: for each point, its coordinates (x, y, z), its offsets (x_c, y_c, z_c) from the arithmetic mean of all points in its pillar, its offsets (x_p, y_p) from the pillar's x-y center, and the reflectance r. P = 12000 is the number of pillars in a scene, and N = 100 is the number of points sampled per pillar.

        The points in each pillar are then augmented with xc, yc, zc, xp and yp where the c subscript denotes distance to the arithmetic mean of all points in the pillar and the p subscript denotes the offset from the pillar x, y center.

        2. D×P×N → C×P×N is implemented with a simplified PointNet: a linear layer (equivalently a 1×1 convolution) applied to every point, followed by BatchNorm and ReLU.

        Next, we use a simplified version of PointNet where, for each point, a linear layer is applied followed by BatchNorm and ReLU to generate a C×P×N sized tensor.

        3. C×P×N → C×P takes a max over the N dimension, similar to max pooling.

        This is followed by a max operation over the channels to create an output tensor of size C×P.

        4. C×P → C×H×W scatters the P dimension back onto a 2D canvas. Since P = 12000 is the number of pillars in the scene and roughly covers it, placing each pillar feature back at its original location yields the familiar C×H×W dense tensor (a pseudo-image).

        Once encoded, the features are scattered back to the original pillar locations to create a pseudo-image of size C×H×W where H and W indicate the height and width of the canvas.
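
        Putting the four steps together, here is a minimal PyTorch sketch of the pipeline (a toy stand-in with made-up layer sizes, not the released implementation):

        ```python
        import torch
        import torch.nn as nn

        class TinyPillarEncoder(nn.Module):
            """Sketch of D x P x N -> C x P x N -> C x P -> C x H x W.
            The linear + BatchNorm + ReLU layer stands in for the simplified PointNet."""

            def __init__(self, d=9, c=64):
                super().__init__()
                self.linear = nn.Linear(d, c)
                self.bn = nn.BatchNorm1d(c)

            def forward(self, pillars, coords, h, w):
                # pillars: (P, N, D) decorated points; coords: (P, 2) integer (row, col)
                # pillar locations on the H x W BEV canvas.
                p, n, d = pillars.shape
                x = self.linear(pillars.reshape(p * n, d))          # D -> C per point
                x = torch.relu(self.bn(x)).reshape(p, n, -1)        # (P, N, C)
                x = x.max(dim=1).values                             # max over N -> (P, C)
                canvas = pillars.new_zeros(x.shape[1], h, w)        # (C, H, W) pseudo-image
                canvas[:, coords[:, 0], coords[:, 1]] = x.t()       # scatter back
                return canvas

        # Toy usage: 12000 pillars, 100 points each, 9 features per point.
        # (Real pillars have unique coords; these random coords may collide.)
        enc = TinyPillarEncoder()
        pillars = torch.randn(12000, 100, 9)
        coords = torch.randint(0, 200, (12000, 2))
        pseudo_image = enc(pillars, coords, h=200, w=200)           # (64, 200, 200)
        ```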

        At this point the core encoding is finished; the resulting pseudo-image can then be processed with the familiar 2D detection machinery.

        7. MVP(NeurIPS 2021)

        Paper: https://arxiv.org/pdf/2111.06881.pdf
        Affiliation: The University of Texas at Austin (https://www.utexas.edu/)
        Code: https://tianweiy.github.io/mvp/
        One-sentence summary: The approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point cloud.

        Network architecture

        nuScenes test set results

        The core of this paper is how to use 2D information to densify the 3D point cloud, i.e., 2D instance segmentation → virtual 3D points. The concrete procedure is: 3D raw points → 2D points → 3D virtual points (a sketch of both steps follows below).

        1. 3D raw points → 2D points: all LiDAR points are projected onto the 2D image, but only projected points that fall inside the corresponding instance segmentation mask are kept.

        We start by projecting the 3D Lidar point cloud onto our detection. Specifically, we transform each Lidar point into the reference frame of the RGB camera, then project it into image coordinates with associated depth using a perspective projection. The frustum only considers projected 3D points that fall within a detection mask. Any Lidar measurement outside detection masks is discarded.

        2. 2D points → 3D virtual points: projecting from 3D to 2D drops a dimension, so the inverse mapping from 2D back to 3D is one-to-many; without additional depth information, a 2D point corresponds to infinitely many 3D points, which is clearly unusable. The authors resolve this by completing the depth of each sampled 2D point, turning the one-to-many mapping into a one-to-one mapping: for a randomly sampled 2D point, its depth is taken from the nearest projected 3D raw point.

        We start by randomly sampling 2D points from each instance mask. We sample points uniformly at random without repetition. For each sampled point, we retrieve a depth estimate from its nearest neighbor in the frustum.
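
        A compact NumPy sketch of the two steps (a simplification, not the released code: camera extrinsics are omitted and the LiDAR points are assumed to already be in the camera frame):

        ```python
        import numpy as np

        def generate_virtual_points(lidar_xyz, K, mask, num_virtual=50, seed=0):
            """Sketch of: 3D raw points -> 2D points -> 3D virtual points.

            lidar_xyz: (N, 3) points in the camera frame, z is depth.
            K:         (3, 3) camera intrinsic matrix.
            mask:      (H, W) boolean instance-segmentation mask from the 2D detector.
            Returns (M, 3) virtual 3D points in the camera frame.
            """
            rng = np.random.default_rng(seed)
            h, w = mask.shape

            # Step 1: project lidar points into the image, keep those inside the mask.
            front = lidar_xyz[lidar_xyz[:, 2] > 0]
            uv = (front @ K.T)[:, :2] / front[:, 2:3]
            u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
            in_img = (u >= 0) & (u < w) & (v >= 0) & (v < h)
            in_mask = np.zeros(len(front), dtype=bool)
            in_mask[in_img] = mask[v[in_img], u[in_img]]
            frustum_uv, frustum_depth = uv[in_mask], front[in_mask, 2]
            assert len(frustum_depth) > 0, "need at least one lidar point inside the mask"

            # Step 2: sample random mask pixels, borrow depth from the nearest projected
            # lidar point, and unproject back to 3D to obtain virtual points.
            ys, xs = np.nonzero(mask)
            pick = rng.choice(len(xs), size=min(num_virtual, len(xs)), replace=False)
            sampled = np.stack([xs[pick], ys[pick]], axis=1).astype(float)
            d2 = np.linalg.norm(sampled[:, None, :] - frustum_uv[None, :, :], axis=-1)
            depth = frustum_depth[d2.argmin(axis=1)]                 # nearest-neighbor depth
            homo = np.concatenate([sampled, np.ones((len(sampled), 1))], axis=1)
            return (homo @ np.linalg.inv(K).T) * depth[:, None]      # lift back to 3D
        ```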

        8. SE-SSD(CVPR 2021)

        Paper: arxiv.org/pdf/2104.0980
        Affiliation: The Chinese University of Hong Kong
        Code: https://github.com/Vegeta2020/SE-SSD (SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud, CVPR 2021)
        One-sentence summary: The key focus is on exploiting both soft and hard targets with formulated constraints to jointly optimize the model, without introducing extra computation in the inference.

        Link: https://zhuanlan.zhihu.com/p/371520457

        Link: https://zhuanlan.zhihu.com/p/390167950

        Network architecture

        KITTI test set results

        At heart, this paper is about augmentation: its two core modules, training with generated soft targets (label-space augmentation) and shape-aware data augmentation (data-space augmentation), both fall under augmentation. Through the self-ensembling (teacher-student) setup, the information in both soft targets and hard targets is exploited at the same time.

        a) Hence, we exploit both soft and hard targets with our formulated constraints to jointly optimize the model, while incurring no extra inference time.
        b) On the other hand, to enable the student SSD to effectively explore a larger data space, we design a new augmentation scheme on top of conventional augmentation strategies to produce augmented object samples in a shape-aware manner.

        More specifically, two points deserve attention:

        1. First, how is shape-aware data augmentation done? The problem it tries to address is "same samples but with different point representations": in the authors' words, because of occlusion, distance, and shape diversity, the same kind of object can appear with very different point cloud patterns.

        Our insight comes from the observation that the point cloud patterns of ground-truth objects could vary significantly due to occlusions, changes in distance, and diversity of object shapes in practice.

        2. Second, note that different losses supervise the student SSD: a consistency loss aligns the predictions with the soft targets, while an orientation-aware DIoU loss supervises the predictions against the hard targets (a toy sketch of this dual supervision follows below).

        Our consistency loss is to align the student predictions with the soft targets and when we augment the input, we bring along its hard targets to supervise the student with our orientation-aware distance-IoU loss.
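
        A toy sketch of that dual supervision (not the paper's exact formulation: box/target matching is assumed to be done already, and the orientation-aware DIoU term is replaced by a generic 1 − IoU placeholder computed elsewhere):

        ```python
        import torch
        import torch.nn.functional as F

        def student_losses(student_boxes, student_cls, teacher_boxes, teacher_cls, iou_with_gt):
            """Soft targets come from the teacher SSD; hard targets are the (augmented)
            ground-truth boxes.

            student_boxes, teacher_boxes: (M, 7) matched box parameters.
            student_cls, teacher_cls:     (M, num_classes) classification logits.
            iou_with_gt:                  (M,) IoU between student boxes and their GT boxes.
            """
            # Consistency losses: align student predictions with the teacher's soft targets.
            box_consistency = F.smooth_l1_loss(student_boxes, teacher_boxes)
            cls_consistency = F.kl_div(F.log_softmax(student_cls, dim=-1),
                                       F.softmax(teacher_cls, dim=-1),
                                       reduction="batchmean")
            # Hard-target loss: placeholder for the orientation-aware distance-IoU loss.
            hard_loss = (1.0 - iou_with_gt).mean()
            return box_consistency + cls_consistency + hard_loss

        # Toy usage with random tensors.
        m, c = 8, 3
        loss = student_losses(torch.randn(m, 7), torch.randn(m, c),
                              torch.randn(m, 7), torch.randn(m, c),
                              torch.rand(m))
        ```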

        9. SA-SSD(CVPR 2020)

        Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/SA-SSD.pdf
        Affiliation: The Hong Kong Polytechnic University, et al.
        Code: https://github.com/skyhehe123/SA-SSD
        One-sentence summary: The auxiliary network is introduced, which is jointly optimized by two point-level supervisions, to guide the convolutional features in the backbone network to be aware of the object structure.

        Link: https://zhuanlan.zhihu.com/p/378017015

        Link: https://zhuanlan.zhihu.com/p/390452834

        Network architecture

        KITTI test set results

        The core of this paper is to use an auxiliary network to guide the backbone toward learning richer structural information. The auxiliary task has two parts: foreground/background point classification and object center estimation.

        We propose a structure-aware single-stage 3D object detector, which employs a detachable auxiliary network to learn structure information and exhibits better localization performance without extra cost.

        More specifically:

        1. Foreground/background classification: outputs a foreground/background score for every point. The benefit is that it pushes the backbone to learn sharper object boundaries.

        Specifically, we employ a sigmoid function to the segmentation branch to predict the foreground/background probability of each point. The segmentation task enables the backbone network to more precisely detect the object boundary.

        2. Center estimation: predicts every object's center by aggregating its intra-object points. The benefit is that it pushes the backbone to learn more accurate shape and scale information (a toy sketch of both auxiliary heads follows below).

        To further improve the localization accuracy, we employ another auxiliary task to learn the relative position of each object point to the object center. This intra-object relationship can help determine the scale and shape of the object, resulting in more precise localization.
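
        A toy PyTorch sketch of the two detachable auxiliary heads (layer sizes are made up for illustration; the real model attaches these to interpolated backbone features and simply drops them at inference):

        ```python
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class AuxiliaryHead(nn.Module):
            """Point-wise foreground/background classification plus regression of each
            foreground point's offset to its object center."""

            def __init__(self, in_channels=64):
                super().__init__()
                self.seg_head = nn.Linear(in_channels, 1)       # foreground probability
                self.ctr_head = nn.Linear(in_channels, 3)       # offset to object center

            def forward(self, point_feats):
                # point_feats: (N, C) per-point features interpolated from the backbone.
                return self.seg_head(point_feats).squeeze(-1), self.ctr_head(point_feats)

            def loss(self, point_feats, fg_label, center_offset):
                seg_logit, ctr_pred = self(point_feats)
                seg_loss = F.binary_cross_entropy_with_logits(seg_logit, fg_label.float())
                fg = fg_label > 0                               # regress centers only on foreground
                ctr_loss = (F.smooth_l1_loss(ctr_pred[fg], center_offset[fg])
                            if fg.any() else seg_loss * 0)
                return seg_loss + ctr_loss

        # Toy usage: 1024 points with 64-dim features.
        head = AuxiliaryHead()
        aux_loss = head.loss(torch.randn(1024, 64),
                             (torch.rand(1024) > 0.8).long(),
                             torch.randn(1024, 3))
        ```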

        10. HVPR(CVPR 2021)

        Paper: https://arxiv.org/pdf/2104.00902.pdf
        Affiliation: School of Electrical and Electronic Engineering, Yonsei University
        Code: https://cvlab.yonsei.ac.kr/projects/HVPR/
        One-sentence summary: It proposes a novel single-stage 3D detection method having the merit of both voxel-based and point-based features.

        Link: https://zhuanlan.zhihu.com/p/373069090

        Network architecture

        KITTI test set results

        11. LiDAR R-CNN(CVPR 2021)

        Paper: https://arxiv.org/pdf/2103.15297.pdf
        Affiliation: TuSimple
        Code: https://github.com/TuSimple/LiDAR_RCNN (LiDAR R-CNN: An Efficient and Universal 3D Object Detector)
        One-sentence summary: The authors analyze the size ambiguity problem in detail and propose several methods to remedy it.

        Link: https://zhuanlan.zhihu.com/p/372199358

        Network architecture

        KITTI test set results

        12. SECOND(Sensors 2018)

        Paper: https://www.mdpi.com/1424-8220/18/10/3337
        Affiliation: Chongqing University, et al.
        Code: https://github.com/traveller59/second.pytorch
        One-sentence summary: It investigates an improved sparse convolution method, which significantly increases the speed of both training and inference.


        Network architecture

        KITTI test set results

        13. 3DIoUMatch(CVPR 2021)

        Paper: https://arxiv.org/pdf/2012.04355.pdf
        Affiliation: Stanford University, et al.
        Code: https://thu17cyz.github.io/3DIoUMatch/
        One-sentence summary: It proposes to use the estimated 3D IoU as a localization metric and set category-aware self-adjusted thresholds to filter poorly localized proposals.

        Link: https://zhuanlan.zhihu.com/p/390114438

        Network architecture

        ScanNet & SUN-RGBD results

        14. CenterPoint(CVPR 2021)

        Paper: https://arxiv.org/pdf/2006.11275.pdf
        Affiliation: The University of Texas at Austin (https://www.utexas.edu/)
        Code: https://github.com/tianweiy/CenterPoint
        One-sentence summary: It first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity.

        Network architecture

        nuScenes test set results

        15. 3DSSD(CVPR 2020)

        Paper: https://arxiv.org/pdf/2002.10187.pdf
        Affiliation: The Chinese University of Hong Kong, et al.
        Code: https://github.com/dvlab-research/3DSSD
        One-sentence summary: It proposes a fusion sampling strategy in downsampling process to make detection on less representative points feasible.

        Link: https://zhuanlan.zhihu.com/p/380595350

        Network architecture

        KITTI test set results


        This article is shared for academic purposes only; if there is any infringement, please contact us to have it removed.

        —THE END—