Zhihao Li, Kexue Fu, Haoran Wang, Manning Wang†
31st ACM International Conference on Multimedia (ACMMM 2023)
In recent years, Neural Radiance Fields (NeRF) have been used as a map of a 3D scene for estimating the 6-DoF pose of newly observed images: given an image, the task is to estimate the relative rotation and translation of the camera using a trained NeRF. However, existing NeRF-based pose estimation methods have a small convergence region and must be optimized iteratively from a given initial pose, which makes them slow and sensitive to that initial pose. In this paper, we propose PI-NeRF, which directly outputs the pose of a given image without pose initialization or iterative optimization. This is achieved by integrating NeRF with an invertible neural network (INN). Our method employs INNs to establish a bijective mapping between rays and pixel features, which allows us to directly estimate the ray corresponding to each image pixel from the feature map extracted by an image encoder. Based on these rays, we can directly estimate the pose of the image using the PnP algorithm. Experiments conducted on both synthetic and real-world datasets demonstrate that our method is two orders of magnitude faster than existing NeRF-based methods while achieving competitive accuracy without an initial pose. Our method also outperforms NeRF-free absolute pose regression methods in accuracy by a large margin.
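To illustrate what a bijective mapping between rays and features means in practice, the following is a minimal sketch of an affine coupling layer, a standard INN building block. It is not the PI-NeRF architecture: the function names, the toy conditioner `_scale_shift`, and the 6-D ray layout (origin followed by direction) are all assumptions made for illustration. The point is only that the same parameters define both the forward map (ray to feature) and its exact inverse (feature to ray).

```python
import math


def _scale_shift(x_a):
    """Hypothetical conditioner: computes a log-scale and a shift from x_a.

    In a real INN this would be a small learned network. Invertibility of
    the coupling layer does not depend on what this function computes.
    """
    s = [math.tanh(0.5 * v) for v in x_a]
    t = [0.3 * v + 0.1 for v in x_a]
    return s, t


def coupling_forward(x):
    """Map a vector (here: a ray) to a feature vector of the same length."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length input"
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = _scale_shift(x_a)
    # The first half passes through unchanged; the second half is scaled
    # and shifted using values computed from the first half.
    y_b = [b * math.exp(si) + ti for b, si, ti in zip(x_b, s, t)]
    return x_a + y_b


def coupling_inverse(y):
    """Recover the input of coupling_forward from its output."""
    assert len(y) % 2 == 0, "this sketch assumes an even-length input"
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    # y_a equals x_a, so the same scale and shift can be recomputed.
    s, t = _scale_shift(y_a)
    x_b = [(b - ti) * math.exp(-si) for b, si, ti in zip(y_b, s, t)]
    return y_a + x_b


# Assumed ray layout: origin (3 values) followed by direction (3 values).
ray = [0.1, -0.2, 0.3, 0.0, 0.6, 0.8]
feature = coupling_forward(ray)
recovered = coupling_inverse(feature)
```

Here `recovered` matches `ray` up to floating-point error. In a pipeline like the one described above, the inverse direction would be the one used at test time: features from an image encoder go in, and per-pixel rays come out for the subsequent PnP step.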