# CLUSTER-BASED 3D KEYPOINT DETECTION FOR CATEGORY-AGNOSTIC 6D POSE TRACKING

## Long Tian, Andrea Cavallaro, Changjae Oh

### Centre for Intelligent Sensing, Queen Mary University of London, United Kingdom

ABSTRACT: We present a model for category-agnostic 6D pose tracking. We tackle object pose tracking as a 3D keypoint detection and matching task that does not require ground-truth annotation of the keypoints. Using RGB-D data and the target object mask as inputs, we spatially segment the point cloud of the object into clusters. Each 3D point in a cluster is characterised by features encoding appearance and geometric information. We use these features to detect one keypoint per cluster, and thereby obtain a keypoint set for the object. Given the keypoint sets detected in two frames, the inter-frame pose change is recovered through least-squares optimisation. The loss functions are designed to ensure that the detected keypoints are consistent across the two frames and suitable for pose tracking.
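The least-squares step mentioned in the abstract can be illustrated with a minimal sketch. The text does not state which solver is used, so the closed-form SVD solution (Kabsch) below is an assumption, as are the function name `estimate_rigid_transform`, the argument names `kp_prev` and `kp_curr`, and the premise that the two keypoint sets are already in one-to-one correspondence (the i-th keypoint in both frames comes from the same cluster index).

```python
import numpy as np


def estimate_rigid_transform(kp_prev, kp_curr):
    """Least-squares rigid transform between two matched 3D keypoint sets.

    Finds R (3x3 rotation) and t (3,) minimising
        sum_i || R @ kp_prev[i] + t - kp_curr[i] ||^2
    using the closed-form SVD (Kabsch) solution. Assumed solver, not
    necessarily the one used by the authors.
    """
    kp_prev = np.asarray(kp_prev, dtype=float)  # shape (M, 3)
    kp_curr = np.asarray(kp_curr, dtype=float)  # shape (M, 3)

    # Centre both sets so that only the rotation remains to be solved.
    c_prev = kp_prev.mean(axis=0)
    c_curr = kp_curr.mean(axis=0)

    # 3x3 cross-covariance between the centred keypoint sets.
    H = (kp_prev - c_prev).T @ (kp_curr - c_curr)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation that maps the previous centroid onto the current one.
    t = c_curr - R @ c_prev
    return R, t
```

Calling `estimate_rigid_transform(K_prev, K_curr)` on two `(M, 3)` arrays returns the rotation and translation that best map the first set onto the second. At least three non-collinear keypoints are needed for the rotation to be uniquely determined.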

Overview of the 3D keypoint detection pipeline, which consists of three modules: a feature estimation module, a split module, and a 3D keypoint detection module. Given RGB and depth images and a segmentation mask of the target object as inputs, the network outputs a 3D keypoint set, $K^t = \{k_1^t, k_2^t, \ldots, k_M^t\}$. We use ResNet to extract the appearance feature of the object from the RGB image and the segmentation mask. We also extract the geometric feature by feeding the point cloud, after outlier removal, to PointNet. The appearance and geometric features are fused by an encoder, following the architecture of DenseFusion. The split module uses the farthest point sampling algorithm to sample initial points, finds the $N_c$ nearest points of each initial point in 3D space, and groups each initial point and its neighbours into a cluster. The target object is thus separated into $M$ clusters in total, and the keypoint detection module detects one keypoint from each cluster.
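The split module described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the choice of the first sampled point (`start_index=0`), and the reading that each cluster holds the initial point plus its $N_c$ neighbours (so $N_c + 1$ points) are all hypothetical. Clusters produced this way may also share points, which the text neither confirms nor rules out.

```python
import numpy as np


def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedily pick num_samples indices that are mutually far apart.

    start_index is an assumption; the text does not say how the first
    point is chosen. Requires num_samples <= len(points).
    """
    points = np.asarray(points, dtype=float)  # shape (N, 3)
    selected = np.empty(num_samples, dtype=int)
    selected[0] = start_index
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start_index], axis=1)
    for i in range(1, num_samples):
        # The next sample is the point farthest from the selected set.
        selected[i] = int(np.argmax(min_dist))
        dist = np.linalg.norm(points - points[selected[i]], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return selected


def split_into_clusters(points, num_clusters, num_neighbours):
    """Group each sampled initial point with its nearest neighbours.

    Returns (seeds, clusters): seeds has shape (M,), clusters has shape
    (M, num_neighbours + 1) and stores point indices, with the initial
    point included in its own cluster (assumed interpretation of N_c).
    """
    points = np.asarray(points, dtype=float)
    seeds = farthest_point_sampling(points, num_clusters)
    # Pairwise distances from each initial point to all points: (M, N).
    dists = np.linalg.norm(
        points[seeds][:, None, :] - points[None, :, :], axis=2
    )
    # The initial point is at distance 0 from itself, so taking the
    # num_neighbours + 1 closest indices keeps it plus its neighbours.
    clusters = np.argsort(dists, axis=1, kind="stable")[:, : num_neighbours + 1]
    return seeds, clusters
```

For example, `split_into_clusters(cloud, num_clusters=8, num_neighbours=15)` on an `(N, 3)` array returns 8 initial-point indices and an `(8, 16)` index array. The brute-force distance matrix costs O(MN) memory, which is acceptable for a single masked object but would usually be replaced by a spatial index for large clouds.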