In SLAM (Simultaneous Localization and Mapping) problems, Pose Graph Optimization (PGO) is a technique to refine an initial estimate of a set of poses (positions and orientations) from a set of pairwise relative measurements. The optimization procedure can be negatively affected by even a single outlier measurement, with possibly catastrophic and meaningless results. Although recent work on robust optimization aims to mitigate the presence of outlier measurements, robust solutions capable of handling large numbers of outliers are yet to come. This paper presents IPC (Incremental Probabilistic Consensus), a method that approximates the solution to the combinatorial problem of finding the maximally consistent set of measurements in an incremental fashion. It evaluates the consistency of each loop-closure measurement through a consensus-based procedure, possibly applied to a subset of the global problem, in which all previously integrated inlier measurements have veto power. We evaluated IPC on standard benchmarks against several state-of-the-art methods. Although it is simple and relatively easy to implement, IPC competes with or outperforms the other tested methods in handling outliers while providing online performance. We release with this paper an open-source implementation of the proposed method.
https://arxiv.org/abs/2405.08503
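The veto mechanism described in the abstract above can be caricatured in a few lines. This is a hedged toy sketch, not IPC's actual probabilistic test: measurements are reduced to scalars, and the pairwise consistency check is a plain distance threshold; `consistent`, `incremental_consensus`, and the threshold value are all illustrative assumptions.

```python
# Toy sketch of veto-based incremental consensus: each new loop-closure
# candidate is accepted only if it is consistent with every previously
# accepted inlier, so any established inlier can veto it.

def consistent(m_a, m_b, threshold=1.0):
    """Toy pairwise test: two measurements agree if their values are close."""
    return abs(m_a - m_b) < threshold

def incremental_consensus(candidates, threshold=1.0):
    """Integrate candidates one by one; established inliers have veto power."""
    inliers = []
    for m in candidates:
        if all(consistent(m, prev, threshold) for prev in inliers):
            inliers.append(m)
    return inliers

print(incremental_consensus([0.1, 0.3, 5.0, 0.2]))  # the 5.0 outlier is vetoed
```

The incremental structure is what keeps the check cheap: a candidate is only ever compared against the current inlier set, never against all subsets of measurements.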
Place recognition is the foundation for enabling autonomous systems to achieve independent decision-making and safe operation. It is also crucial in tasks such as loop closure detection and global localization within SLAM. Previous methods typically take mundane point cloud representations as input, while deep learning-based LiDAR Place Recognition (LPR) approaches employ different point cloud image inputs with convolutional neural networks (CNNs) or transformer architectures. However, the recently proposed Mamba deep learning model, combined with state space models (SSMs), holds great potential for long sequence modeling. Therefore, we developed OverlapMamba, a novel network for place recognition, which represents input range views (RVs) as sequences. In a novel way, we employ a stochastic reconstruction approach to build shift state space models, compressing the visual representation. Evaluated on three different public datasets, our method effectively detects loop closures, showing robustness even when traversing previously visited locations from different directions. Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed, indicating strong place recognition capabilities and real-time efficiency.
https://arxiv.org/abs/2405.07966
We present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multi-view depth estimation, LiDAR completion, (dense) RGB-D/RGB-L/Mono/Depth-only reconstruction, and SLAM. The workflow-centric design uses multiple blocks as the basis for building different production lines. The supported applications, i.e., productions, avoid redundancy in their designs; thus, the focus is on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks in SceneFactory: (1) Mono-SLAM, (2) depth estimation, (3) flexion, and (4) scene reconstruction. Furthermore, we propose an unposed & uncalibrated multi-view depth estimation model (U2-MVD) to estimate dense geometry. U2-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. Then a semantic-aware ScaleCov step is introduced to complete the multi-view depth. Relying on U2-MVD, SceneFactory both supports user-friendly 3D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose dual-purpose Multi-resolutional Neural Points (DM-NPs) for the first surface-accessible Surface Color Field design, where we introduce Improved Point Rasterization (IPR) for point-cloud-based surface query. We implement and experiment with SceneFactory to demonstrate its broad practicability and high flexibility. Its quality also competes with or exceeds tightly-coupled state-of-the-art approaches in all tasks. We contribute the code to the community (this https URL).
https://arxiv.org/abs/2405.07847
Visual localization of unmanned aerial vehicles (UAVs) in planetary scenes aims to estimate the absolute pose of the UAV in the world coordinate system from satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning are reduced. To accurately determine the position of a UAV in a planetary scene in the absence of a global navigation satellite system (GNSS), this paper proposes JointLoc, which estimates the real-time UAV position in the world coordinate system by adaptively fusing the absolute 2-degree-of-freedom (2-DoF) pose and the relative 6-degree-of-freedom (6-DoF) pose. Extensive comparative experiments were conducted on a proposed planetary UAV image cross-modal localization dataset, which contains three types of typical Martian topography generated via a simulation engine as well as real Martian UAV images from the Ingenuity helicopter. JointLoc achieved a root-mean-square error of 0.237 m on trajectories of up to 1,000 m, compared to 0.594 m and 0.557 m for ORB-SLAM2 and ORB-SLAM3 respectively. The source code will be available at this https URL.
https://arxiv.org/abs/2405.07429
Accurate and robust camera tracking in dynamic environments presents a significant challenge for visual SLAM (Simultaneous Localization and Mapping). Recent progress in this field often involves the use of deep learning techniques to generate masks for dynamic objects, which usually requires GPUs to operate in real time (30 fps). Therefore, this paper proposes a novel visual SLAM system for dynamic environments that obtains real-time performance on a CPU by incorporating a mask prediction mechanism, which allows the deep learning method and the camera tracking to run entirely in parallel at different frequencies such that neither waits for the result from the other. Based on this, it further introduces a dual-stage optical flow tracking approach and employs a hybrid usage of optical flow and ORB features, which significantly enhances the efficiency and robustness of the system. Compared with state-of-the-art methods, this system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 56 fps on a single laptop CPU without any hardware acceleration, thus proving that deep learning methods are still feasible for dynamic SLAM even without GPU support. Based on the available information, this is the first SLAM system to achieve this.
https://arxiv.org/abs/2405.07392
Guided by the hologram technology of the famous Star Wars franchise, I present an application that creates real-time holographic overlays using LiDAR-augmented 3D reconstruction. Prior attempts involve SLAM or NeRFs, which either require highly calibrated scenes, incur steep computation costs, or fail to render dynamic scenes. I propose three high-fidelity reconstruction tools that can run on a portable device, such as an iPhone 14 Pro, and allow for metrically accurate facial reconstructions. My systems enable interactive and immersive holographic experiences that can be used for a wide range of applications, including augmented reality, telepresence, and entertainment.
https://arxiv.org/abs/2405.07178
The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. Existing methods introduce monocular depth priors to jointly optimize the camera poses and NeRF, but they fail to fully exploit the depth priors and neglect the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at this https URL.
https://arxiv.org/abs/2405.07027
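The truncated-normal ray sampling idea from point 1) above can be sketched generically: instead of spacing sample depths uniformly in [near, far], draw them from a normal centered on the monocular depth prior and truncated to the ray bounds. This is a hedged illustration using simple rejection sampling; the sigma value, sample count, and sampling schedule are assumptions, not TD-NeRF's actual implementation.

```python
import numpy as np

# Depth-guided ray sampling sketch: sample depths concentrate around the
# monocular depth prior instead of being spread uniformly over the ray,
# implemented here by rejection-sampling a normal restricted to [near, far].

def truncated_normal_depths(prior_depth, sigma, near, far, n_samples, rng):
    """Draw n_samples sorted depths from N(prior, sigma^2) restricted to [near, far]."""
    samples = []
    while len(samples) < n_samples:
        d = rng.normal(prior_depth, sigma, size=n_samples)
        samples.extend(d[(d >= near) & (d <= far)].tolist())
    return np.sort(np.asarray(samples[:n_samples]))

rng = np.random.default_rng(0)
depths = truncated_normal_depths(prior_depth=2.0, sigma=0.2, near=0.1, far=6.0,
                                 n_samples=64, rng=rng)
print(depths.min() >= 0.1 and depths.max() <= 6.0)  # samples stay in bounds
```

Concentrating samples near the prior is what buys the faster pose convergence claimed in the abstract: most volume-rendering samples land where the surface is likely to be.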
This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently, Gaussian Splatting-based SLAM has yielded promising results, but it relies on RGB-D input and is weak in tracking. To address these limitations, we uniquely integrate advanced sparse visual odometry with a dense Gaussian Splatting scene representation for the first time, thereby eliminating the dependency on depth maps typical of Gaussian Splatting-based SLAM systems and enhancing tracking robustness. Here, the sparse visual odometry tracks camera poses in the RGB stream, while Gaussian Splatting handles map reconstruction. These components are interconnected through a Multi-View Stereo (MVS) depth estimation network, and we propose a depth smooth loss to reduce the negative effect of estimated depth maps. Furthermore, the consistency in scale between the sparse visual odometry and the dense Gaussian map is preserved by a Sparse-Dense Adjustment Ring (SDAR). We have evaluated our system across various synthetic and real-world datasets. The accuracy of our pose estimation surpasses existing methods and achieves state-of-the-art performance. Additionally, it outperforms previous monocular methods in terms of novel view synthesis fidelity, matching the results of neural SLAM systems that utilize RGB-D input.
https://arxiv.org/abs/2405.06241
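The depth smooth loss mentioned above can be illustrated with a minimal first-order smoothness penalty on a depth map. This is a generic sketch under stated assumptions: the paper's exact formulation (e.g. edge-aware weighting by image gradients) may differ, and `depth_smooth_loss` is an illustrative name.

```python
import numpy as np

# Sketch of a depth smoothness loss: penalize the mean absolute first-order
# differences of the estimated depth map, so noisy MVS depth estimates are
# pulled toward locally smooth surfaces during optimization.

def depth_smooth_loss(depth):
    """Mean absolute horizontal and vertical depth gradients."""
    dx = np.abs(np.diff(depth, axis=1))  # differences along columns
    dy = np.abs(np.diff(depth, axis=0))  # differences along rows
    return float(dx.mean() + dy.mean())

flat = np.full((8, 8), 2.0)  # perfectly smooth depth plane
noisy = flat + np.random.default_rng(0).normal(0.0, 0.1, (8, 8))
print(depth_smooth_loss(flat) < depth_smooth_loss(noisy))  # True
```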
Implicit neural representations and neural rendering have gained increasing attention for bathymetry estimation from sidescan sonar (SSS). These methods incorporate multiple observations of the same place from SSS data to constrain the elevation estimate, converging to a globally-consistent bathymetric model. However, the quality and precision of the bathymetric estimate are limited by the positioning accuracy of the autonomous underwater vehicle (AUV) equipped with the sonar. The global positioning estimate of the AUV relying on dead reckoning (DR) has an unbounded error due to the absence of a geo-reference system like GPS underwater. To address this challenge, we propose in this letter a modern and scalable framework, NeuRSS, for SSS SLAM based on DR and loop closures (LCs) over large timescales, with an elevation prior provided by the bathymetric estimate using neural rendering from SSS. This framework is an iterative procedure that improves localization and bathymetric mapping. Initially, the bathymetry estimated from SSS using the DR estimate, though crude, can provide an important elevation prior in the nonlinear least-squares (NLS) optimization that estimates the relative pose between two loop-closure vertices in a pose graph. Subsequently, the global pose estimate from the SLAM component improves the positioning estimate of the vehicle, thus improving the bathymetry estimation. We validate our localization and mapping approach on two large surveys collected with a surface vessel and an AUV, respectively. We evaluate their localization results against the ground truth and compare the bathymetry estimation against data collected with multibeam echo sounders (MBES).
https://arxiv.org/abs/2405.05807
Gaussian Splatting has garnered widespread attention due to its exceptional performance. Consequently, SLAM systems based on Gaussian Splatting have emerged, leveraging its capabilities for rapid real-time rendering and high-fidelity mapping. However, current Gaussian Splatting SLAM systems usually struggle with large scene representation and lack effective loop closure adjustments and scene generalization capabilities. To address these issues, we introduce NGM-SLAM, the first GS-SLAM system that utilizes neural radiance field submaps for progressive scene expression, effectively integrating the strengths of neural radiance fields and 3D Gaussian Splatting. We develop neural implicit submaps as supervision and achieve high-quality scene expression and online loop closure adjustments through Gaussian rendering of fused submaps. Our results on multiple real-world scenes and large-scale scene datasets demonstrate that our method achieves accurate gap filling and high-quality scene expression, supporting monocular, stereo, and RGB-D inputs, and achieving state-of-the-art scene reconstruction and tracking performance.
https://arxiv.org/abs/2405.05702
Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception and understanding of the environment are pivotal, NeRF holds immense promise for improving performance. In this paper, we present a comprehensive survey and analysis of the state-of-the-art techniques for utilizing NeRF to enhance the capabilities of autonomous robots. We especially focus on the perception, localization and navigation, and decision-making modules of autonomous robots and delve into tasks crucial for autonomous operation, including 3D reconstruction, segmentation, pose estimation, simultaneous localization and mapping (SLAM), navigation and planning, and interaction. Our survey meticulously benchmarks existing NeRF-based methods, providing insights into their strengths and limitations. Moreover, we explore promising avenues for future research and development in this domain. Notably, we discuss the integration of advanced techniques such as 3D Gaussian splatting (3DGS), large language models (LLMs), and generative AI, envisioning enhanced reconstruction efficiency, scene understanding, and decision-making capabilities. This survey serves as a roadmap for researchers seeking to leverage NeRFs to empower autonomous robots, paving the way for innovative solutions that can navigate and interact seamlessly in complex environments.
https://arxiv.org/abs/2405.05526
In the realm of robotics, the quest for achieving real-world autonomy, capable of executing large-scale and long-term operations, has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields like computer vision and robotics, the development of PR methods that sufficiently support real-world robotic systems remains a challenge. This paper aims to bridge this gap by highlighting the crucial role of PR within the framework of Simultaneous Localization and Mapping (SLAM) 2.0. This new phase in robotic navigation calls for scalable, adaptable, and efficient PR solutions by integrating advanced artificial intelligence (AI) technologies. For this goal, we provide a comprehensive review of the current state-of-the-art (SOTA) advancements in PR, alongside the remaining challenges, and underscore its broad applications in robotics. This paper begins with an exploration of PR's formulation and key research challenges. We extensively review the literature, focusing on related methods for place representation and solutions to various PR challenges. Applications showcasing PR's potential in robotics, key PR datasets, and open-source libraries are discussed. We also highlight our open-source package, aimed at new development and benchmarking for general PR. We conclude with a discussion on PR's future directions, accompanied by a summary of the literature covered and access to our open-source library, available to the robotics community at: this https URL.
https://arxiv.org/abs/2405.04812
Existing neural field-based SLAM methods typically employ a single monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a neural mapping framework which anchors lightweight neural fields to the pose graph of a sparse visual SLAM system. Our approach shows the ability to integrate large-scale loop closures, while limiting necessary reintegration. Furthermore, we verify the scalability of our approach by demonstrating successful building-scale mapping taking multiple loop closures into account during the optimization, and show that our method outperforms existing state-of-the-art approaches on large scenes in terms of quality and runtime. Our code is available at this https URL.
https://arxiv.org/abs/2405.03633
This paper explores how deep learning techniques can improve visual-based SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weak-texture areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how to combine visual SLAM with deep learning methods to inform other research. Through extensive experiments on both public datasets and self-sampled data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make the source code public at this https URL.
https://arxiv.org/abs/2405.03413
We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not just the gradient, but also higher-order differentiation. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of camera relocalization and the efficiency of robotic navigation achieved through our task-aware optimization. The code and data are available at this https URL.
https://arxiv.org/abs/2405.02187
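The complex-step finite difference (CSFD) trick at the core of the abstract above is compact enough to demonstrate directly. For an analytic function f and a tiny step h, the Taylor expansion in the complex domain gives f(x + ih) = f(x) + ih f'(x) - (h²/2) f''(x) + ..., so Im(f(x + ih))/h approximates f'(x) with no subtractive cancellation. The sketch below is a generic illustration of the technique, not X-SLAM's code; the function and step size are illustrative choices.

```python
import cmath

# Complex-step finite difference: evaluate f at a tiny imaginary perturbation
# and read the first derivative off the imaginary part. Unlike the classical
# forward difference (f(x+h) - f(x)) / h, there is no subtraction of nearly
# equal numbers, so h can be made extremely small without losing precision.

def csfd_derivative(f, x, h=1e-30):
    """First derivative of an analytic f at real x via a complex-step evaluation."""
    return f(complex(x, h)).imag / h

# Example: the derivative of sin is cos.
x = 0.7
approx = csfd_derivative(cmath.sin, x)
exact = cmath.cos(x).real
print(abs(approx - exact))  # agreement at machine-precision level
```

This is why the abstract can claim derivative calculation without a large-scale computational graph: each derivative costs one complex-valued function evaluation.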
The majority of visual SLAM systems are not robust in dynamic scenarios. Those that deal with dynamic objects in the scene usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. Also, experiments were performed using a quadruped robot with an RGB-D camera to test the applicability of our method in real-world scenarios. The tests were validated against ground truth created with a motion capture system.
https://arxiv.org/abs/2405.02177
We propose RTG-SLAM, a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. RTG-SLAM features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of real large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.
https://arxiv.org/abs/2404.19706
Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and the projection onto the constraints can be calculated by normalization. Then a proximal linearized Riemannian alternating direction method of multipliers (PieADMM) is developed to solve the proposed model, which not only has low memory requirements, but also can update the poses in parallel. Furthermore, we establish the iteration complexity of $O(1/\epsilon^{2})$ of PieADMM for finding an $\epsilon$-stationary solution of our model. The efficiency of our proposed algorithm is demonstrated by numerical experiments on two synthetic and four 3D SLAM benchmark datasets.
https://arxiv.org/abs/2404.18560
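The projection step mentioned in the abstract above is simple enough to show concretely: the unit-quaternion constraint set is the sphere S³, and the Euclidean projection of any nonzero 4-vector onto it is plain normalization. This is a minimal illustration of that one step, not the PieADMM solver itself; the function name is an illustrative choice.

```python
import math

# Projection onto the unit-quaternion constraint: divide by the norm.
# The sphere S^3 is the set {q : ||q|| = 1}, and for any nonzero q the
# closest point on the sphere is q / ||q||.

def project_to_unit_sphere(q):
    """Project a quaternion (w, x, y, z) onto the unit sphere by normalization."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("the zero quaternion has no unique projection")
    return tuple(c / norm for c in q)

print(project_to_unit_sphere((2.0, 0.0, 0.0, 0.0)))  # (1.0, 0.0, 0.0, 0.0)
```

The cheapness of this projection is exactly what the abstract exploits: each ADMM subproblem can re-enter the feasible set with a single normalization, and the per-pose updates decouple for parallel execution.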
This paper proposes a photorealistic real-time dense 3D mapping system that utilizes a learning-based image enhancement method and a mesh-based map representation. Due to the characteristics of the underwater environment, where problems such as hazing and low contrast occur, it is hard to apply conventional simultaneous localization and mapping (SLAM) methods. Furthermore, for sensitive tasks like inspecting cracks, photorealistic mapping is very important. However, the Autonomous Underwater Vehicle (AUV) is computationally constrained. In this paper, we utilize a neural network-based image enhancement method to improve pose estimation and mapping quality and apply a sliding-window-based mesh expansion method to enable lightweight, fast, and photorealistic mapping. To validate our results, we utilize real-world and indoor synthetic datasets. We performed qualitative validation with the real-world dataset and quantitative validation by modeling images from the indoor synthetic dataset as underwater scenes.
https://arxiv.org/abs/2404.18395
Multi-robot simultaneous localization and mapping (SLAM) enables a robot team to achieve coordinated tasks relying on a common map. However, centralized processing of robot observations is undesirable because it creates a single point of failure and requires pre-existing infrastructure and significant multi-hop communication throughput. This paper formulates multi-robot object SLAM as a variational inference problem over a communication graph. We impose a consensus constraint on the objects maintained by different nodes to ensure agreement on a common map. To solve the problem, we develop a distributed mirror descent algorithm with a regularization term enforcing consensus. Using Gaussian distributions in the algorithm, we derive a distributed multi-state constraint Kalman filter (MSCKF) for multi-robot object SLAM. Experiments on real and simulated data show that our method improves the trajectory and object estimates, compared to individual-robot SLAM, while achieving better scaling to large robot teams, compared to centralized multi-robot SLAM. Code is available at this https URL.
https://arxiv.org/abs/2404.18331
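The consensus regularization described above can be caricatured with a scalar toy problem: each robot takes a gradient step on its local objective while a penalty term pulls its estimate toward its neighbors' over the communication graph. This is a hedged sketch, not the paper's mirror-descent/MSCKF algorithm; the step sizes, the line graph, and the 1-D object state are all illustrative assumptions.

```python
# Toy consensus-regularized distributed estimation: each node blends a pull
# toward its own observation with a pull toward its graph neighbors, so the
# team's estimates of a shared object converge toward agreement without any
# central server.

def consensus_step(estimates, observations, neighbors, step=0.5, rho=0.25):
    new = []
    for i, x in enumerate(estimates):
        grad = x - observations[i]                           # pull toward local data
        cons = sum(x - estimates[j] for j in neighbors[i])   # pull toward neighbors
        new.append(x - step * grad - rho * cons)
    return new

# Three robots on a line graph observe the same object at slightly different
# positions; iterating the update shrinks their disagreement.
obs = [1.0, 1.2, 0.8]
nbrs = {0: [1], 1: [0, 2], 2: [1]}
xs = obs[:]
for _ in range(100):
    xs = consensus_step(xs, obs, nbrs)
print(max(xs) - min(xs) < max(obs) - min(obs))  # True: disagreement shrinks
```

Note that a soft penalty only shrinks disagreement rather than enforcing exact equality; the paper's formulation instead imposes consensus as a constraint, which is the distinction the regularization term in the mirror descent algorithm addresses.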