Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Relying only on classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not comply by design with the problem's physical constraints, especially in states that are not densely represented in the training data. In this work, we explore the use of physics-informed neural networks to learn an end-to-end model of the multirotor-slung-load system and, at a given time, estimate a sequence of future system states. An LSTM encoder-decoder with an attention mechanism is used to capture the dynamics of the system. To guarantee cohesiveness between the multiple predicted states of the system, we propose the use of a physics-based term in the loss function, which includes a discretized physical model derived from first principles, together with slack variables that allow a small mismatch between expected and predicted values. To train the model, a dataset using a real-world quadrotor carrying a slung load was curated and is made available. Prediction results are presented and corroborate the feasibility of the approach. The proposed method outperforms both the first-principles physical model and a comparable neural network model trained without the proposed physics regularization.
https://arxiv.org/abs/2405.09428
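The physics-based loss term with slack variables can be sketched as follows. This is a minimal illustration assuming a forward-Euler discretization and a soft slack margin; the names (`f_physics`, `slack`) and the exact form are assumptions, not the paper's implementation:

```python
import numpy as np

def physics_informed_loss(pred, target, f_physics, dt, lam=1.0, slack=0.05):
    """Sketch of a loss combining a data term with a discretized physics residual.

    pred, target: (T, D) arrays of predicted / ground-truth state sequences.
    f_physics: callable returning state derivatives from first principles.
    Residuals smaller than `slack` are forgiven (soft constraint).
    """
    data_term = np.mean((pred - target) ** 2)
    # Forward-Euler consistency between consecutive predicted states.
    residual = pred[1:] - (pred[:-1] + dt * f_physics(pred[:-1]))
    violation = np.maximum(np.abs(residual) - slack, 0.0)
    physics_term = np.mean(violation ** 2)
    return data_term + lam * physics_term
```

The slack margin keeps the physics term from penalizing the small mismatch that a discretized first-principles model inevitably has against real data.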
This research reports VascularPilot3D, the first fully autonomous 3D endovascular robot navigation system. As an exploration toward autonomous guidewire navigation, VascularPilot3D is developed as a complete navigation system based on intra-operative imaging (fluoroscopic X-ray in this study) and typical endovascular robots. VascularPilot3D adopts previously researched fast 3D-2D vessel registration algorithms and guidewire segmentation methods as its perception modules. We additionally propose three modules: a topology-constrained 2D-3D instrument end-point lifting method, a tree-based fast path planning algorithm, and a prior-free endovascular navigation strategy. VascularPilot3D is compatible with most mainstream endovascular robots. Ex-vivo experiments validate that VascularPilot3D achieves a 100% success rate across 25 trials and reduces the human surgeon's overall control loops by 18.38%. VascularPilot3D is promising for general clinical autonomous endovascular navigation.
https://arxiv.org/abs/2405.09375
To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments. To fill this void, we introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms. It comprises three key components: 1) World model backbone: CarDreamer has integrated some state-of-the-art WMs, which simplifies the reproduction of RL algorithms. The backbone is decoupled from the rest and communicates using the standard Gym interface, so that users can easily integrate and test their own algorithms. 2) Built-in tasks: CarDreamer offers a comprehensive set of highly configurable driving tasks which are compatible with Gym interfaces and are equipped with empirically optimized reward functions. 3) Task development suite: This suite streamlines the creation of driving tasks, enabling easy definition of traffic flows and vehicle routes, along with automatic collection of multi-modal observation data. A visualization server allows users to trace real-time agent driving videos and performance metrics through a browser. Furthermore, we conduct extensive experiments using built-in tasks to evaluate the performance and potential of WMs in autonomous driving. Thanks to the richness and flexibility of CarDreamer, we also systematically study the impact of observation modality, observability, and sharing of vehicle intentions on AV safety and efficiency. All code and documents are accessible on this https URL.
https://arxiv.org/abs/2405.09111
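The decoupling via the standard Gym interface can be illustrated with a minimal environment skeleton. This is a generic sketch of the interface contract, not CarDreamer's actual task code; the observation fields and reward are invented for illustration:

```python
class DrivingTaskEnv:
    """Minimal Gym-style environment skeleton (hypothetical driving task).

    The standard contract -- reset() -> (obs, info) and step(action) ->
    (obs, reward, terminated, truncated, info) -- is what lets a world-model
    backbone stay decoupled from the task implementation.
    """

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        obs = {"speed": 0.0, "lane_offset": 0.0}
        return obs, {}

    def step(self, action):
        self.t += 1
        obs = {"speed": float(action), "lane_offset": 0.0}
        # Reward shaping is where the "empirically optimized" functions live.
        reward = 1.0 - abs(obs["lane_offset"])
        terminated = False
        truncated = self.t >= self.max_steps
        return obs, reward, terminated, truncated, {}
```

Any agent or world model that speaks this interface can be swapped in without touching the task code, which is the design point the abstract makes.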
This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the ''cold start problem,'' while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.
https://arxiv.org/abs/2405.09049
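One simple way to promote diversity in the selected training pool is greedy farthest-point sampling over per-trajectory feature vectors. This is an illustrative sketch, not necessarily the paper's exact clustering or sampling strategy:

```python
import numpy as np

def farthest_point_sampling(features, k, seed=0):
    """Greedily pick k mutually distant trajectory feature vectors.

    features: (N, D) array, e.g. per-trajectory summary statistics
    (curvature, speed profile, heading change, ...).
    Returns the indices of the k selected samples.
    """
    rng = np.random.default_rng(seed)
    n = len(features)
    chosen = [int(rng.integers(n))]
    # Track each point's distance to its nearest chosen sample.
    dist = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(features - features[idx], axis=1))
    return chosen
```

Starting from a typical (dense) region and then maximizing novelty mirrors the abstract's finding: typical data first to beat the cold start, novelty later as the pool grows.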
Autonomous systems often encounter environments and scenarios beyond the scope of their training data, which underscores a critical challenge: the need to generalize and adapt to unseen scenarios in real time. This challenge necessitates new mathematical and algorithmic tools that enable adaptation and zero-shot transfer. To this end, we leverage the theory of function encoders, which enables zero-shot transfer by combining the flexibility of neural networks with the mathematical principles of Hilbert spaces. Using this theory, we first present a method for learning a space of dynamics spanned by a set of neural ODE basis functions. After training, the proposed approach can rapidly identify dynamics in the learned space using an efficient inner product calculation. Critically, this calculation requires no gradient calculations or retraining during the online phase. This method enables zero-shot transfer for autonomous systems at runtime and opens the door for a new class of adaptable control algorithms. We demonstrate state-of-the-art system modeling accuracy for two MuJoCo robot environments and show that the learned models can be used for more efficient MPC control of a quadrotor.
https://arxiv.org/abs/2405.08954
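The gradient-free identification step can be sketched as a projection onto the learned basis: with the basis functions evaluated at sample states, the coefficients follow from inner products alone. A minimal sketch, with a hand-crafted basis standing in for the neural ODE basis functions:

```python
import numpy as np

def identify_coefficients(basis_values, observations):
    """Project observed dynamics onto a learned basis via inner products.

    basis_values: (k, N) -- k basis functions evaluated at N sample states
                  (neural ODE basis functions in the paper).
    observations: (N,) -- observed dynamics at the same states.
    Solves the Gram system <g_i, g_j> c = <g_i, f>; no gradient steps or
    retraining are needed at runtime.
    """
    n = basis_values.shape[1]
    gram = basis_values @ basis_values.T / n   # <g_i, g_j>
    proj = basis_values @ observations / n     # <g_i, f>
    return np.linalg.solve(gram, proj)
```

Because identification reduces to one small linear solve, new dynamics in the learned space can be recognized online, which is what enables the zero-shot transfer claimed above.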
Autonomous tuning of particle accelerators is an active and challenging field of research with the goal of enabling novel accelerator technologies for cutting-edge high-impact applications, such as physics discovery, cancer research and material sciences. A key challenge with autonomous accelerator tuning remains that the most capable algorithms require an expert in optimisation, machine learning or a similar field to implement the algorithm for every new tuning task. In this work, we propose the use of large language models (LLMs) to tune particle accelerators. We demonstrate on a proof-of-principle example the ability of LLMs to successfully and autonomously tune a particle accelerator subsystem based on nothing more than a natural language prompt from the operator, and compare the performance of our LLM-based solution to state-of-the-art optimisation algorithms, such as Bayesian optimisation (BO) and reinforcement learning-trained optimisation (RLO). In doing so, we also show how LLMs can perform numerical optimisation of a highly non-linear real-world objective function. Ultimately, this work represents yet another complex task that LLMs are capable of solving and promises to help accelerate the deployment of autonomous tuning algorithms to the day-to-day operations of particle accelerators.
https://arxiv.org/abs/2405.08888
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition threw down the gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this area.
https://arxiv.org/abs/2405.08816
Datasets labelled by human annotators are widely used in the training and testing of machine learning models. In recent years, researchers have paid increasing attention to label quality. However, it is not always possible to objectively determine whether an assigned label is correct. The present work investigates this ambiguity in the annotation of autonomous driving datasets as an important dimension of data quality. Our experiments show that excluding highly ambiguous data from training improves model performance of a state-of-the-art pedestrian detector in terms of LAMR, precision and F1 score, thereby saving training time and annotation costs. Furthermore, we demonstrate that, in order to safely remove ambiguous instances and ensure the retained representativeness of the training data, an understanding of the properties of the dataset and class under investigation is crucial.
https://arxiv.org/abs/2405.08794
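A simple ambiguity proxy of the kind the abstract alludes to is the entropy of annotator votes per instance; highly ambiguous instances can then be excluded before training. The measure and threshold here are illustrative assumptions, not the paper's exact criterion:

```python
import math

def label_entropy(votes):
    """Shannon entropy of annotator votes for one instance (an ambiguity proxy).

    votes: dict mapping label -> number of annotators choosing it.
    0 bits = unanimous; 1 bit = a 50/50 split between two labels.
    """
    total = sum(votes.values())
    return -sum((n / total) * math.log2(n / total) for n in votes.values() if n)

def filter_ambiguous(dataset, threshold=0.9):
    """Keep only instances whose vote entropy stays below the threshold."""
    return [x for x in dataset if label_entropy(x["votes"]) < threshold]
```

As the abstract cautions, the threshold cannot be chosen blindly: it should be tuned per dataset and class so that removing ambiguous instances does not hollow out the representativeness of the remaining data.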
With the proliferation of edge devices, there is a significant increase in the attack surface of these devices. The decentralized deployment of threat intelligence on edge devices, coupled with adaptive machine learning techniques such as the in-context learning feature of large language models (LLMs), represents a promising paradigm for enhancing cybersecurity on low-powered edge devices. This approach involves the deployment of lightweight machine learning models directly onto edge devices to analyze local data streams, such as network traffic and system logs, in real-time. Additionally, distributing computational tasks to an edge server reduces latency and improves responsiveness while also enhancing privacy by processing sensitive data locally. LLM servers can enable these edge servers to autonomously adapt to evolving threats and attack patterns, continuously updating their models to improve detection accuracy and reduce false positives. Furthermore, collaborative learning mechanisms facilitate peer-to-peer secure and trustworthy knowledge sharing among edge devices, enhancing the collective intelligence of the network and enabling dynamic threat mitigation measures such as device quarantine in response to detected anomalies. The scalability and flexibility of this approach make it well-suited for diverse and evolving network environments, as edge devices only send suspicious information such as network traffic and system log changes, offering a resilient and efficient solution to combat emerging cyber threats at the network edge. Thus, our proposed framework can improve edge computing security by isolating compromised edge devices from the network, providing better cyber threat detection and mitigation.
https://arxiv.org/abs/2405.08755
Autonomous intersection management (AIM) poses significant challenges due to the intricate nature of real-world traffic scenarios and the need for a highly expensive centralised server in charge of simultaneously controlling all the vehicles. This study addresses such issues by proposing a novel distributed approach to AIM utilizing multi-agent reinforcement learning (MARL). We show that by leveraging the 3D surround view technology for advanced assistance systems, autonomous vehicles can accurately navigate intersection scenarios without needing any centralised controller. The contributions of this paper thus include a MARL-based algorithm for the autonomous management of a 4-way intersection and also the introduction of a new strategy called prioritised scenario replay for improved training efficacy. We validate our approach as an innovative alternative to conventional centralised AIM techniques, ensuring the full reproducibility of our results. Specifically, experiments conducted in virtual environments using the SMARTS platform highlight its superiority over benchmarks across various metrics.
https://arxiv.org/abs/2405.08655
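The prioritised scenario replay strategy can be sketched as a buffer that samples training scenarios with probability weighted by how poorly the agents last performed on them. This is one plausible reading of the strategy; the weighting scheme is an assumption, not the paper's definition:

```python
import random

class PrioritisedScenarioReplay:
    """Sketch of prioritised scenario replay for MARL training.

    Scenarios with the lowest recent episode return get the highest
    probability of being replayed next (weighting scheme assumed).
    """

    def __init__(self):
        self.scores = {}  # scenario id -> latest episode return

    def update(self, scenario, episode_return):
        self.scores[scenario] = episode_return

    def sample(self, rng=random):
        scenarios = list(self.scores)
        best = max(self.scores.values())
        # Lower return -> larger weight; epsilon keeps every scenario reachable.
        weights = [best - self.scores[s] + 1e-3 for s in scenarios]
        return rng.choices(scenarios, weights=weights, k=1)[0]
```

Concentrating training on the intersection scenarios the agents currently handle worst is the usual motivation for such prioritisation.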
Robust road surface estimation is required for autonomous ground vehicles to navigate safely. Despite becoming one of the main targets of autonomous mobility research in recent years, it remains an open problem, although cameras and LiDAR sensors have been demonstrated to be adequate for predicting the position, size and shape of the road a vehicle is driving on in different environments. In this work, a novel Convolutional Neural Network model is proposed for the accurate estimation of the roadway surface. Furthermore, an ablation study has been conducted to investigate how different encoding strategies affect model performance, testing 6 slightly different neural network architectures. Our model is based on a Twin Encoder-Decoder Neural Network (TEDNet) for independent camera and LiDAR feature extraction, and has been trained and evaluated on the Kitti-Road dataset. Bird's Eye View projections of the camera and LiDAR data are used in this model to perform semantic segmentation on whether each pixel belongs to the road surface. The proposed method performs on par with other state-of-the-art methods and operates at the same frame rate as the LiDAR and cameras, so it is adequate for use in real-time applications.
https://arxiv.org/abs/2405.08429
Autonomous Vehicles (AVs) heavily rely on sensors and communication networks like Global Positioning System (GPS) to navigate autonomously. Prior research has indicated that networks like GPS are vulnerable to cyber-attacks such as spoofing and jamming, thus posing serious risks like navigation errors and system failures. These threats are expected to intensify with the widespread deployment of AVs, making it crucial to detect and mitigate such attacks. This paper proposes GPS Intrusion Detection System, or GPS-IDS, an Anomaly Behavior Analysis (ABA)-based intrusion detection framework to detect GPS spoofing attacks on AVs. The framework uses a novel physics-based vehicle behavior model where a GPS navigation model is integrated into the conventional dynamic bicycle model for accurate AV behavior representation. Temporal features derived from this behavior model are analyzed using machine learning to detect normal and abnormal navigation behavior. The performance of the GPS-IDS framework is evaluated on the AV-GPS-Dataset - a real-world dataset collected by the team using an AV testbed. The dataset has been publicly released for the global research community. To the best of our knowledge, this dataset is the first of its kind and will serve as a useful resource to address such security challenges.
https://arxiv.org/abs/2405.08359
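The physics-based behavior model can be illustrated with a kinematic bicycle step: the model predicts where the vehicle should be, and a large residual against the reported GPS position flags a possible spoofing attack. The paper integrates a GPS navigation model into a dynamic bicycle model; this kinematic sketch only conveys the idea:

```python
import math

def bicycle_step(state, v, steer, dt, wheelbase=2.7):
    """One step of a kinematic bicycle model (simplified stand-in for the
    paper's dynamic model; wheelbase value is illustrative).

    state: (x, y, heading); v: speed [m/s]; steer: steering angle [rad].
    """
    x, y, th = state
    x += v * math.cos(th) * dt
    y += v * math.sin(th) * dt
    th += v / wheelbase * math.tan(steer) * dt
    return (x, y, th)

def gps_residual(pred_xy, gps_xy):
    """Distance between model-predicted and GPS-reported position; a large
    value is one spoofing indicator."""
    return math.hypot(pred_xy[0] - gps_xy[0], pred_xy[1] - gps_xy[1])
```

In the paper, temporal features derived from such a behavior model feed a machine learning classifier rather than a fixed residual threshold.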
Despite significant technological advancements, the process of programming robots for adaptive assembly remains labor-intensive, demanding expertise in multiple domains and often resulting in task-specific, inflexible code. This work explores the potential of Large Language Models (LLMs), like ChatGPT, to automate this process, leveraging their ability to understand natural language instructions, generalize examples to new tasks, and write code. In this paper, we suggest how these abilities can be harnessed and applied to real-world challenges in the manufacturing industry. We present a novel system that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of this system and strategies for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a real-world project.
https://arxiv.org/abs/2405.08216
Mixed-integer quadratic programs (MIQPs) are a versatile way of formulating vehicle decision making and motion planning problems, where the prediction model is a hybrid dynamical system that involves both discrete and continuous decision variables. However, even the most advanced MIQP solvers can hardly account for the challenging requirements of automotive embedded platforms. Thus, we use machine learning to simplify and hence speed up optimization. Our work builds on recent ideas for solving MIQPs in real-time by training a neural network to predict the optimal values of integer variables and solving the remaining problem by online quadratic programming. Specifically, we propose a recurrent permutation equivariant deep set that is particularly suited for imitating MIQPs that involve many obstacles, which is often the major source of computational burden in motion planning problems. Our framework comprises also a feasibility projector that corrects infeasible predictions of integer variables and considerably increases the likelihood of computing a collision-free trajectory. We evaluate the performance, safety and real-time feasibility of decision-making for autonomous driving using the proposed approach on realistic multi-lane traffic scenarios with interactive agents in SUMO simulations.
https://arxiv.org/abs/2405.08122
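Once a network has predicted the integer variables, the remaining problem is a plain QP. For the equality-constrained case it even has a closed-form KKT solution, sketched below; real motion-planning QPs also carry inequality constraints and would use an online QP solver:

```python
import numpy as np

def solve_qp_fixed_integers(Q, c, A, b):
    """Solve min 0.5 x'Qx + c'x  s.t.  Ax = b -- the continuous subproblem
    left over once the integer variables have been fixed by a predictor.

    Solved via the KKT system [[Q, A'], [A, 0]] [x; lam] = [-c; b]
    (illustrative; assumes Q positive definite and A full row rank).
    """
    n, m = Q.shape[0], A.shape[0]
    kkt = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n]
```

This is why predicting the integers well pays off: the hard combinatorial part disappears and what remains is a fast, convex solve suitable for embedded platforms.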
Global Positioning System (GPS) navigation provides accurate positioning with global coverage, making it a reliable option in open areas with unobstructed sky views. However, signal degradation may occur in indoor spaces and urban canyons. In contrast, Inertial Measurement Units (IMUs) consist of gyroscopes and accelerometers that offer relative motion information such as acceleration and rotational changes. Unlike GPS, IMUs do not rely on external signals, making them useful in GPS-denied environments. Nonetheless, IMUs suffer from drift over time due to the accumulation of errors while integrating acceleration to determine velocity and position. Therefore, fusing GPS and IMU data is crucial for enhancing the reliability and precision of navigation systems in autonomous vehicles, especially in environments where GPS signals are compromised. To ensure smooth navigation and overcome the limitations of each sensor, the proposed method fuses GPS and IMU data using the Unscented Kalman Filter (UKF) Bayesian filtering technique. The proposed navigation system is designed to be robust, delivering the continuous and accurate positioning critical for the safe operation of autonomous vehicles, particularly in GPS-denied environments. This project uses KITTI GNSS and IMU datasets for experimental validation, showing that the GNSS-IMU fusion technique reduces the RMSE of GNSS-only data: the RMSE decreased from 13.214, 13.284, and 13.363 to 4.271, 5.275, and 0.224 for the x-axis, y-axis, and z-axis, respectively. The experimental results using the UKF show a promising direction for improving autonomous vehicle navigation through GPS-IMU sensor fusion, leveraging the strengths of both sensors in GPS-denied environments.
https://arxiv.org/abs/2405.08119
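The predict-with-IMU / correct-with-GPS structure can be sketched with a linear Kalman filter on a 1-D position-velocity state. The paper uses an Unscented Kalman Filter for the full nonlinear problem; this sketch, with assumed noise parameters, keeps only the fusion structure:

```python
import numpy as np

def kf_fuse(accels, gps_meas, dt, q=0.1, r=1.0):
    """Linear Kalman filter fusing IMU acceleration (prediction step) with
    GPS position (correction step) on a 1-D state [position, velocity].

    q, r: assumed process / measurement noise parameters.
    Returns the list of filtered position estimates.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity dynamics
    B = np.array([0.5 * dt**2, dt])             # acceleration input
    H = np.array([[1.0, 0.0]])                  # GPS observes position only
    x = np.zeros(2)
    P = np.eye(2)
    out = []
    for a, z in zip(accels, gps_meas):
        x = F @ x + B * a                       # predict with IMU
        P = F @ P @ F.T + q * np.eye(2)
        if z is not None:                       # correct when GPS is available
            K = P @ H.T / (H @ P @ H.T + r)
            x = x + (K * (z - H @ x)).ravel()
            P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return out
```

Passing `None` for dropped GPS fixes shows the key property: the filter coasts on IMU integration through outages and re-anchors as soon as GPS returns, which bounds the IMU drift described above.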
Even as technology and performance gains are made in the sphere of automated driving, safety concerns remain. Vehicle simulation has long been seen as a tool to overcome the cost associated with the massive amount of on-road testing required for the development and discovery of safety-critical "edge-cases". However, purely software-based vehicle models may leave a large realism gap relative to their real-world counterparts in terms of dynamic response, while highly realistic vehicle-in-the-loop (VIL) simulations that encapsulate a virtual world around a physical vehicle may still be quite expensive to produce and similarly time-intensive as on-road testing. In this work, we demonstrate an AV simulation test bed that combines the realism of VIL simulation with the ease of implementation of model-in-the-loop (MIL) simulation. The setup demonstrated in this work allows for response diagnosis for the VIL simulations. By observing causal links between the virtual weather and lighting conditions that surround the virtual depiction of our vehicle, the vision-based perception model and controller of Openpilot, and the dynamic response of our physical vehicle under test, we can draw conclusions regarding how the perceived environment contributed to vehicle response. Conversely, we also demonstrate response prediction for the MIL setup, where a physical vehicle is not required to draw richer conclusions about the impact of environmental conditions on AV performance than could be obtained with VIL simulation alone. These combine for a simulation setup with accurate real-world implications for edge-case discovery that is both cost-effective and time-efficient to implement.
https://arxiv.org/abs/2405.07981
Place recognition is the foundation for enabling autonomous systems to achieve independent decision-making and safe operations. It is also crucial in tasks such as loop closure detection and global localization within SLAM. Previous deep learning-based LiDAR Place Recognition (LPR) approaches utilize point cloud representations as input, employing different point cloud image inputs with convolutional neural networks (CNNs) or transformer architectures. However, the recently proposed Mamba deep learning model, combined with state space models (SSMs), holds great potential for long sequence modeling. Therefore, we developed OverlapMamba, a novel network for place recognition, which represents input range views (RVs) as sequences. In a novel way, we employ a stochastic reconstruction approach to build shift state space models, compressing the visual representation. Evaluated on three different public datasets, our method effectively detects loop closures, showing robustness even when traversing previously visited locations from different directions. Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed, indicating strong place recognition capabilities and real-time efficiency.
https://arxiv.org/abs/2405.07966
The scale-up of autonomous vehicles depends heavily on their ability to deal with anomalies, such as rare objects on the road. In order to handle such situations, it is necessary to detect anomalies in the first place. Anomaly detection for autonomous driving has made great progress in the past years but suffers from poorly designed benchmarks with a strong focus on camera data. In this work, we propose AnoVox, the largest benchmark for ANOmaly detection in autonomous driving to date. AnoVox incorporates large-scale multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. We propose a formal definition of normality and provide a compliant training dataset. AnoVox is the first benchmark to contain both content and temporal anomalies.
https://arxiv.org/abs/2405.07865
Autonomous driving systems require a quick and robust perception of the nearby environment to carry out their routines effectively. With the aim of avoiding collisions and driving safely, autonomous driving systems rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D object detectors try to solve this problem by directly predicting 3D bounding boxes and object velocities given a camera image. Recent research estimates time-to-contact in a per-pixel manner and suggests that it is a more effective measure than velocity and depth combined. However, per-pixel time-to-contact requires object detection to serve its purpose effectively and hence increases overall computational requirements, as two different models need to run. To address this issue, we propose per-object time-to-contact estimation by extending object detection models to additionally predict the time-to-contact attribute for each object. We compare our proposed approach with existing time-to-contact methods and provide benchmarking results on well-known datasets. Our proposed approach achieves higher precision than prior art while using a single image.
https://arxiv.org/abs/2405.07698
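The classic geometric relation behind time-to-contact is TTC ≈ h / (dh/dt): an object's apparent size divided by its rate of growth in the image. The paper instead predicts TTC directly as an extra attribute of each detected object; the sketch below only conveys the geometric intuition:

```python
def time_to_contact(h_prev, h_curr, dt):
    """Per-object time-to-contact from bounding-box scale change.

    h_prev, h_curr: object bounding-box heights in consecutive frames.
    dt: time between the frames [s].
    Returns float('inf') when the object is not approaching.
    """
    dh = (h_curr - h_prev) / dt
    if dh <= 0:
        return float("inf")
    return h_curr / dh
```

Note that this estimate needs neither absolute depth nor absolute velocity, which is the core argument for treating TTC as the more direct safety signal.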
Current multi-modality driving frameworks typically fuse representations by applying attention between single-modality branches. However, existing networks still limit driving performance, as the image and LiDAR branches are independent and lack a unified observation representation. Thus, this paper proposes MaskFuser, which tokenizes various modalities into a unified semantic feature space and provides a joint representation for further behavior cloning in driving contexts. Given the unified token representation, MaskFuser is the first work to introduce cross-modality masked auto-encoder training. The masked training enhances the fusion representation by reconstruction on masked tokens. Architecturally, a hybrid-fusion network is proposed to combine the advantages of both early and late fusion: in the early fusion stage, modalities are fused by performing monotonic-to-BEV translation attention between branches; late fusion is performed by tokenizing the various modalities into a unified token space with shared encoding on it. MaskFuser reaches a driving score of 49.05 and route completion of 92.85% on the CARLA LongSet6 benchmark evaluation, improving on the best of previous baselines by 1.74 and 3.21%, respectively. The introduced masked fusion increases driving stability under damaged sensory inputs: MaskFuser outperforms the best of previous baselines on driving score by 6.55 (27.8%), 1.53 (13.8%), and 1.57 (30.9%) at sensory masking ratios of 25%, 50%, and 75%, respectively.
https://arxiv.org/abs/2405.07573