Aerial imagery is increasingly used in Earth science and natural resource management as a complement to labor-intensive ground-based surveys. Aerial systems can collect overlapping images that provide multiple views of each location from different perspectives. However, most prediction approaches (e.g. for tree species classification) use a single, synthesized top-down "orthomosaic" image as input that contains little to no information about the vertical aspects of objects and may include processing artifacts. We propose an alternate approach that generates predictions directly on the raw images and accurately maps these predictions into geospatial coordinates using semantic meshes. This method, released as a user-friendly open-source toolkit, enables analysts to use the highest quality data for predictions, capture information about the sides of objects, and leverage multiple viewpoints of each location for added robustness. We demonstrate the value of this approach on a new benchmark dataset of four forest sites in the western U.S. that consists of drone images, photogrammetry results, predicted tree locations, and species classification data derived from manual surveys. We show that our proposed multiview method improves classification accuracy from 53% to 75% relative to an orthomosaic baseline on a challenging cross-site tree species classification task.
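The abstract does not spell out how per-image predictions are fused on the semantic mesh, but the core multiview idea can be pictured as a majority vote over the faces of the mesh: each raw image that sees a face casts one class vote. This is a minimal sketch under that assumption; the function and species names are illustrative, not from the toolkit.

```python
from collections import Counter

def aggregate_face_labels(view_predictions):
    """Majority-vote the per-view class predictions for each mesh face.

    view_predictions: dict mapping face_id -> list of class labels, one
    label per raw drone image in which that face is visible.
    """
    labels = {}
    for face_id, votes in view_predictions.items():
        # most_common(1) returns the single most frequent label;
        # ties resolve deterministically by first appearance.
        labels[face_id] = Counter(votes).most_common(1)[0][0]
    return labels

# Three drone images observe face 7; two of them call it "ponderosa".
views = {7: ["ponderosa", "ponderosa", "lodgepole"], 8: ["fir"]}
print(aggregate_face_labels(views))  # {7: 'ponderosa', 8: 'fir'}
```

Leveraging several viewpoints this way is what makes the method robust to a single bad prediction, which the single-orthomosaic baseline cannot recover from.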
https://arxiv.org/abs/2405.09544
In this paper, we present an innovative technique for the path planning of flying robots in a 3D environment in Rough Mereology terms. The main goal was to construct an algorithm that generates mereological potential fields in 3-dimensional space. To avoid falling into local minima, the algorithm is assisted by a weighted Euclidean distance. Moreover, a path search from the start point to the target that avoids obstacles was applied. The environment was created by connecting two cameras working in real time. The Python library OpenCV [1], which recognizes shapes and colors, was responsible for determining the gate and the elements of the world inside the map. The main purpose of this paper is to apply the given results to drones.
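The abstract's ingredients — a potential field over 3D space, a weighted Euclidean distance, and obstacle-avoiding search toward a target — can be illustrated with a toy greedy descent on a 3D grid. The weights, the grid world, and the reduction of the mereological field to a plain distance term are all simplifying assumptions for the sketch.

```python
import math

def weighted_dist(p, goal, w=(1.0, 1.0, 2.0)):
    # Weighted Euclidean distance; the heavier z-weight (illustrative)
    # discourages needless altitude changes.
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, p, goal)))

def plan(start, goal, obstacles, max_steps=200):
    """Greedy potential-field descent over the 6-neighbourhood of a 3D grid.

    Plain greedy descent can stall in concave obstacle pockets; the
    paper's weighted distance is what mitigates such local minima.
    """
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    path, p = [start], start
    for _ in range(max_steps):
        if p == goal:
            break
        candidates = [tuple(a + d for a, d in zip(p, m)) for m in moves]
        candidates = [c for c in candidates if c not in obstacles]
        p = min(candidates, key=lambda c: weighted_dist(c, goal))
        path.append(p)
    return path

route = plan((0, 0, 0), (3, 0, 0), obstacles={(1, 0, 0)})
print(route[-1])  # (3, 0, 0): the goal is reached while skirting the obstacle
```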
https://arxiv.org/abs/2405.09282
Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, adversarial patches generated by existing algorithms in the UAV domain pay very little attention to the naturalness of adversarial patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches that appear natural to the human eye while ensuring a high attack success rate. We notice that patches are natural looking when their overall color is consistent with the environment. Therefore, we propose a new method named Environmental Matching Attack (EMA) to address the issue of optimizing the adversarial patch under the constraints of color. To the best of our knowledge, this paper is the first to consider natural patches in the domain of UAVs. The EMA method exploits strong prior knowledge of a pretrained stable diffusion to guide the optimization direction of the adversarial patch, where the text guidance can restrict the color of the patch. To better match the environment, the contrast and brightness of the patch are appropriately adjusted. Instead of optimizing the adversarial patch itself, we optimize an adversarial perturbation patch which initializes to zero so that the model can better trade off attacking performance and naturalness. Experiments conducted on the DroneVehicle and Carpk datasets have shown that our work can reach nearly the same attack performance in the digital attack (a gap of no more than 2 points in mAP%), surpass the baseline method in specific physical scenarios, and exhibit a significant advantage in naturalness, both in visualization and in color difference with the environment.
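The contrast/brightness adjustment step can be pictured as blending the patch's pixel statistics toward those of its surroundings. This is a hedged stand-in for the paper's actual procedure, using made-up grayscale intensities and a made-up `strength` parameter.

```python
def match_environment(patch, env, strength=0.5):
    """Shift patch brightness (mean) and contrast (std) part-way toward
    the environment's pixel statistics. Values are grayscale in 0..255."""
    mean = lambda xs: sum(xs) / len(xs)
    std = lambda xs: (sum((x - mean(xs)) ** 2 for x in xs) / len(xs)) ** 0.5
    pm, ps, em, es = mean(patch), std(patch), mean(env), std(env)
    # Blend the patch statistics toward the environment's.
    tm = pm + strength * (em - pm)
    ts = ps + strength * (es - ps)
    scale = ts / ps if ps else 1.0
    return [max(0.0, min(255.0, (x - pm) * scale + tm)) for x in patch]

bright_patch = [200.0, 210.0, 220.0]
dark_env = [40.0, 50.0, 60.0]
print(match_environment(bright_patch, dark_env))  # [120.0, 130.0, 140.0]
```

The patch keeps its internal structure (the 10-unit spacing between pixels) while its overall brightness moves halfway toward the dark environment, which is the intuition behind "natural-looking" color consistency.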
https://arxiv.org/abs/2405.07595
Berry picking has long-standing traditions in Finland, yet it is challenging and can potentially be dangerous. The integration of drones equipped with advanced imaging techniques represents a transformative leap forward, optimising harvests and promising sustainable practices. We propose WildBe, the first image dataset of wild berries captured in peatlands and under the canopy of Finnish forests using drones. Unlike previous and related datasets, WildBe includes new varieties of berries, such as bilberries, cloudberries, lingonberries, and crowberries, captured under severe light variations and in cluttered environments. WildBe features 3,516 images, including a total of 18,468 annotated bounding boxes. We carry out a comprehensive analysis of WildBe using six popular object detectors, assessing their effectiveness in berry detection across different forest regions and camera types. We will release WildBe publicly.
https://arxiv.org/abs/2405.07550
Consumer-grade drones have become effective multimedia collection tools, spring-boarded by rapid development in embedded CPUs, GPUs, and cameras. They are best known for their ability to cheaply collect high-quality aerial video, 3D terrain scans, infrared imagery, etc., with respect to manned aircraft. However, users can also create and attach custom sensors, actuators, or computers, so the drone can collect different data, generate composite data, or interact intelligently with its environment, e.g., autonomously changing behavior to land in a safe way, or choosing further data collection sites. Unfortunately, developing custom payloads is prohibitively difficult for many researchers outside of engineering. We provide guidelines for how to create a sophisticated computational payload that integrates a Raspberry Pi 5 into a DJI Matrice 350. The payload fits into the Matrice's case like a typical DJI payload (but is much cheaper), is easy to build and expand (3D-printed), uses the drone's power and telemetry, can control the drone and its other payloads, can access the drone's sensors and camera feeds, and can process video and stream it to the operator via the controller in real time. We describe the difficulties and proprietary quirks we encountered, how we worked through them, and provide setup scripts and a known-working configuration for others to use.
https://arxiv.org/abs/2405.06176
This article presents the world's first rapid drone flocking control using natural language through generative AI. The described approach enables the intuitive orchestration of a flock of any size to achieve the desired geometry. The key feature of the method is the development of a new interface based on Large Language Models to communicate with the user and to generate the target geometry descriptions. Users can interactively modify or provide comments during the construction of the flock geometry model. By combining flocking technology and defining the target surface using a signed distance function, smooth and adaptive movement of the drone swarm between target states is achieved. Our user study on FlockGPT confirmed a high level of intuitive control over drone flocking by users. Subjects who had never previously controlled a swarm of drones were able to construct complex figures in just a few iterations and were able to accurately distinguish the formed swarm drone figures. The results revealed a high recognition rate for six different geometric patterns generated through the LLM-based interface and performed by a simulated drone flock (a mean of 80%, with a maximum of 93% for cube and tetrahedron patterns). Users commented on low temporal demand (19.2 score in NASA-TLX), high performance (26 score in NASA-TLX), attractiveness (1.94 UEQ score), and hedonic quality (1.81 UEQ score) of the developed system. The FlockGPT demo code repository can be found at: coming soon
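The "target surface as a signed distance function" idea can be sketched for a single drone and a spherical target: the drone descends the numerical SDF gradient until its distance to the surface reaches zero. The sphere, step size, and finite-difference gradient are illustrative choices, not FlockGPT internals.

```python
def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere centred at the origin:
    negative inside, zero on the surface, positive outside."""
    return (p[0] ** 2 + p[1] ** 2 + p[2] ** 2) ** 0.5 - radius

def step_toward_surface(p, sdf, lr=1.0, eps=1e-5):
    """Move a drone along the numerical SDF gradient toward the surface."""
    d = sdf(p)
    grad = tuple(
        (sdf(tuple(c + (eps if i == j else 0.0) for j, c in enumerate(p))) - d) / eps
        for i in range(3)
    )
    # Stepping by -d * grad lands (approximately) on the zero level set.
    return tuple(c - lr * d * g for c, g in zip(p, grad))

p = (3.0, 0.0, 0.0)
for _ in range(5):
    p = step_toward_surface(p, sphere_sdf)
print(abs(sphere_sdf(p)) < 1e-3)  # True: the drone sits on the target surface
```

Running one such update per drone per tick is what gives the smooth, adaptive motion between target shapes that the abstract describes.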
https://arxiv.org/abs/2405.05872
This work presents a drone detector with a modified backbone and a multiple pyramid feature maps enhancement structure (MDDPE). Novel feature map improvement modules that use different levels of information to produce more robust and discriminative features are proposed. These modules include the feature maps supplement function and the feature maps recombination enhancement function. To effectively handle the drone characteristics, auxiliary supervisions, implemented in the early stages by employing tailored anchors, are utilized. To further improve the modeling of real drone detection scenarios and the initialization of the regressor, an updated anchor matching technique is introduced to match anchors and ground-truth drones as closely as feasible. To demonstrate the proposed MDDPE's superiority over the most advanced detectors, extensive experiments are carried out using well-known drone detection benchmarks.
https://arxiv.org/abs/2405.02882
Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Demonstrated through exhaustive evaluations, our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on the Next Generation Simulation (NGSIM), Highway Drone (HighD), and Macao Connected Autonomous Driving (MoCAD) datasets across both short and extended temporal spans. This performance underscores the model's unparalleled adaptability and efficacy in navigating complex traffic scenarios, including highways, urban streets, and intersections.
https://arxiv.org/abs/2405.02145
Rapid advancements of deep learning are accelerating adoption in a wide variety of applications, including safety-critical applications such as self-driving vehicles, drones, robots, and surveillance systems. These advancements include applying variations of sophisticated techniques that improve the performance of models. However, such models are not immune to adversarial manipulations, which can cause the system to misbehave and remain unnoticed by experts. The frequency of modifications to existing deep learning models necessitates thorough analysis to determine the impact on models' robustness. In this work, we present an experimental evaluation of the effects of model modifications on deep learning model robustness using adversarial attacks. Our methodology involves examining the robustness of variations of models against various adversarial attacks. By conducting our experiments, we aim to shed light on the critical issue of maintaining the reliability and safety of deep learning models in safety- and security-critical applications. Our results indicate the pressing demand for an in-depth assessment of the effects of model changes on the robustness of models.
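A concrete example of the kind of adversarial attack such an evaluation uses is the Fast Gradient Sign Method (FGSM). The sketch below applies it to a tiny logistic model with a hand-derived gradient, rather than to a deep network; the weights, input, and epsilon are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method on a logistic model p = sigmoid(w.x + b).
    The cross-entropy gradient w.r.t. the input is (p - y) * w, and each
    input feature is nudged by eps in the direction of that sign."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi + eps * math.copysign(1.0, (p - y) * wi)
            for xi, wi in zip(x, w)]

w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1.0                 # clean input: logit = 2*1 - 0.5 = 1.5
x_adv = fgsm(x, y, w, b, eps=1.0)
logit = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print(logit)  # -1.5: a single FGSM step flips the prediction
```

Re-running such attacks after each model modification, and comparing how far the decision flips, is the essence of the robustness evaluation the paper performs at deep-network scale.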
https://arxiv.org/abs/2405.01934
Invasive plant species are detrimental to the ecology of both agricultural and wildland areas. Euphorbia esula, or leafy spurge, is one such plant that has spread through much of North America from Eastern Europe. When paired with contemporary computer vision systems, unmanned aerial vehicles, or drones, offer the means to track expansion of problem plants, such as leafy spurge, and improve chances of controlling these weeds. We gathered a dataset of leafy spurge presence and absence in grasslands of western Montana, USA, then surveyed these areas with a commercial drone. We trained image classifiers on these data, and our best performing model, a pre-trained DINOv2 vision transformer, identified leafy spurge with 0.84 accuracy (test set). This result indicates that classification of leafy spurge is tractable, but not solved. We release this unique dataset of labelled and unlabelled, aerial drone imagery for the machine learning community to explore. Improving classification performance of leafy spurge would benefit the fields of ecology, conservation, and remote sensing alike. Code and data are available at our website: this http URL.
https://arxiv.org/abs/2405.03702
In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The proposed technique relies upon a lightweight neural network backbone augmented with capabilities for continual learning. The envisaged approach amalgamates predictions of the gates' positional coordinates, distance, and orientation, encapsulating them into a cohesive pose tuple. A comprehensive number of tests serve to underscore the efficacy of this approach in confronting diverse and challenging scenarios, specifically those involving variable lighting conditions. The proposed methodology exhibits notable robustness in the face of illumination variations, thereby substantiating its effectiveness.
https://arxiv.org/abs/2405.01054
The availability of high-quality datasets is crucial for the development of behavior prediction algorithms in autonomous vehicles. This paper highlights the need for standardizing the use of certain datasets for motion forecasting research to simplify comparative analysis and proposes a set of tools and practices to achieve this. Drawing on extensive experience and a comprehensive review of current literature, we summarize our proposals for preprocessing, visualizing, and evaluation in the form of an open-sourced toolbox designed for researchers working on trajectory prediction problems. The clear specification of necessary preprocessing steps and evaluation metrics is intended to alleviate development efforts and facilitate the comparison of results across different studies. The toolbox is available at: this https URL.
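Standardized evaluation for trajectory prediction usually centers on Average Displacement Error (ADE) and Final Displacement Error (FDE); the abstract does not name the toolbox's exact metrics, so this is a sketch of the two most common ones, assuming 2D waypoints.

```python
import math

def ade(pred, truth):
    """Average Displacement Error: mean Euclidean error over all timesteps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Final Displacement Error: Euclidean error at the last timestep."""
    return math.dist(pred[-1], truth[-1])

truth = [(0, 0), (1, 0), (2, 0)]
pred = [(0, 0), (1, 1), (2, 2)]
print(ade(pred, truth), fde(pred, truth))  # 1.0 2.0
```

Pinning down exactly these formulas (and the preprocessing that feeds them) is what makes results from different studies comparable, which is the toolbox's stated goal.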
https://arxiv.org/abs/2405.00604
A vision-based drone-to-drone detection system is crucial for various applications like collision avoidance, countering hostile drones, and search-and-rescue operations. However, detecting drones presents unique challenges, including small object sizes, distortion, occlusion, and real-time processing requirements. Current methods integrating multi-scale feature fusion and temporal information have limitations in handling extreme blur and minuscule objects. To address this, we propose a novel coarse-to-fine detection strategy based on vision transformers. We evaluate our approach on three challenging drone-to-drone detection datasets, achieving F1 score enhancements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Additionally, we demonstrate real-time processing capabilities by deploying our model on an edge-computing device. Our code will be made publicly available.
https://arxiv.org/abs/2404.19276
The multi-drone cooperative transport (CT) problem has been widely studied in the literature. However, limited work exists on the control of such systems in the presence of time-varying uncertainties, such as a time-varying center of gravity (CG). This paper presents a leader-follower approach for the control of a multi-drone CT system with a time-varying CG. The leader uses a traditional Proportional-Integral-Derivative (PID) controller, whereas the follower uses a deep reinforcement learning (RL) controller using only local information and minimal leader information. Extensive simulation results are presented, showing the effectiveness of the proposed method over a previously developed adaptive controller and under variations in the mass of the transported objects and in CG speeds. Preliminary experimental work also demonstrates ball balance (depicting a moving CG) on a stick/rod lifted cooperatively by two Crazyflie drones.
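The leader's controller is a textbook PID, which is easy to sketch in full. The gains, timestep, and the first-order altitude plant below are illustrative choices, not values from the paper.

```python
class PID:
    """Textbook PID controller: u = kp*e + ki*integral(e) + kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Drive a toy first-order altitude model toward a 1 m setpoint for 20 s.
pid, z = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.05), 0.0
for _ in range(400):
    z += pid.update(1.0, z) * 0.05
print(round(z, 3))  # settles close to the 1 m setpoint
```

The follower replaces this fixed-gain law with a learned policy precisely because a moving CG changes the plant faster than fixed gains can track.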
https://arxiv.org/abs/2404.19070
We propose a novel failure-aware reactive UAV delivery service composition framework. A skyway network infrastructure is presented for the effective provisioning of services in urban areas. We present a formal drone delivery service model and a system architecture for reactive drone delivery services. We develop radius-based, cell density-based, and two-phased algorithms to reduce the search space and perform reactive service compositions when a service failure occurs. We conduct a set of experiments with a real drone dataset to demonstrate the effectiveness of our proposed approach.
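The radius-based algorithm's core move — shrinking the composition search space to services near the failure — can be sketched as a simple geometric filter. The service records, coordinate scheme, and field names here are assumptions for illustration.

```python
import math

def radius_filter(failed_at, services, radius):
    """Radius-based pruning: keep only candidate delivery services whose
    skyway station lies within `radius` of the failure location, so the
    reactive composition search runs over a much smaller set."""
    return [s for s in services
            if math.dist(s["station"], failed_at) <= radius]

services = [
    {"id": "s1", "station": (0.0, 1.0)},
    {"id": "s2", "station": (5.0, 5.0)},
    {"id": "s3", "station": (1.0, 0.5)},
]
nearby = radius_filter((0.0, 0.0), services, radius=2.0)
print([s["id"] for s in nearby])  # ['s1', 's3']
```

The cell-density-based and two-phased variants described in the abstract refine the same idea: first cut the candidate set down cheaply, then compose only over what remains.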
https://arxiv.org/abs/2404.18363
Drones are increasingly used in fields like industry, medicine, research, disaster relief, defense, and security. Technical challenges, such as navigation in GPS-denied environments, hinder further adoption. Research in visual odometry is advancing, potentially solving GPS-free navigation issues. Traditional visual odometry methods use geometry-based pipelines which, while popular, often suffer from error accumulation and high computational demands. Recent studies utilizing deep neural networks (DNNs) have shown improved performance, addressing these drawbacks. Deep visual odometry typically employs convolutional neural networks (CNNs) and sequence modeling networks like recurrent neural networks (RNNs) to interpret scenes and deduce visual odometry from video sequences. This paper presents a novel real-time monocular visual odometry model for drones, using a deep neural architecture with a self-attention module. It estimates the ego-motion of a camera on a drone, using consecutive video frames. An inference utility processes the live video feed, employing deep learning to estimate the drone's trajectory. The architecture combines a CNN for image feature extraction and a long short-term memory (LSTM) network with a multi-head attention module for video sequence modeling. Tested on two visual odometry datasets, this model converged 48% faster than a previous RNN model and showed a 22% reduction in mean translational drift and a 12% improvement in mean translational absolute trajectory error, demonstrating enhanced robustness to noise.
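The multi-head attention module at the heart of the sequence model builds on scaled dot-product attention, which the sketch below implements for tiny frame embeddings in plain Python (a single head, made-up vectors; the paper's module wraps this in a trained network).

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a softmax-weighted
    mix of the value vectors, weighted by query-key similarity."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                      # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Two frame embeddings; the second frame attends mostly to itself.
x = [[1.0, 0.0], [0.0, 1.0]]
y = attention(x, x, x)
print([round(v, 3) for v in y[1]])
```

In the visual odometry model, such attention lets the LSTM's summary of a frame draw selectively on the frames most relevant to the camera's motion, which is where the reported drift reductions come from.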
https://arxiv.org/abs/2404.17745
As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. It represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the Macao Connected Autonomous Driving (MoCAD) dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.
https://arxiv.org/abs/2404.17520
Health monitoring of remote critical infrastructure is a complex and expensive activity due to the limited infrastructure accessibility. Inspection drones are ubiquitous assets that enhance the reliability of critical infrastructures through improved accessibility. However, due to the harsh operation environment, it is crucial to monitor their health to ensure successful inspection operations. The battery is a key component that determines the overall reliability of the inspection drones and, with an appropriate health management approach, contributes to reliable and robust inspections. In this context, this paper presents a novel hybrid probabilistic approach for battery end-of-discharge (EOD) voltage prediction of Li-Po batteries. The hybridization is achieved in an error-correction configuration, which combines physics-based discharge and probabilistic error-correction models to quantify the aleatoric and epistemic uncertainty. The performance of the hybrid probabilistic methodology was empirically evaluated on a dataset comprising EOD voltage under varying load conditions. The dataset was obtained from real inspection drones operated on different flights, focused on offshore wind turbine inspections. The proposed approach has been tested with different probabilistic methods and demonstrates 14.8% improved performance in probabilistic accuracy compared to the best probabilistic method. In addition, aleatoric and epistemic uncertainties provide robust estimations to enhance the diagnosis of battery health-states.
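The error-correction hybridization can be pictured as a physics-based discharge curve plus a data-driven residual term whose spread doubles as a crude uncertainty estimate. Everything below — the linear discharge model, the residual values, the capacity and load figures — is a toy stand-in for the paper's physics and probabilistic models.

```python
def physics_eod(capacity_ah, load_a, v_full=12.6, v_cutoff=9.0):
    """Toy physics-based discharge model: voltage falls linearly with the
    fraction of capacity drawn under a constant load."""
    def voltage(t_hours):
        drawn = min(load_a * t_hours / capacity_ah, 1.0)
        return v_full - (v_full - v_cutoff) * drawn
    return voltage

def corrected(voltage, residuals):
    """Error-correction step: add the mean residual observed between the
    physics model and past flights, and report the residual spread as a
    crude (aleatoric) uncertainty band."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    return lambda t: (voltage(t) + mean, std)

model = physics_eod(capacity_ah=5.0, load_a=10.0)
hybrid = corrected(model, residuals=[-0.20, -0.10, -0.15])
v, band = hybrid(0.25)  # physics says 10.8 V; correction pulls it down ~0.15 V
print(v, band)
```

Separating the two terms this way is what lets the paper attribute uncertainty: the residual spread is aleatoric, while disagreement between candidate correction models would be epistemic.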
https://arxiv.org/abs/2405.00055
This paper presents a novel control strategy for drone networks to improve the quality of 3D structures reconstructed from aerial images by drones. Unlike the existing coverage control strategies for this purpose, our proposed approach simultaneously controls both the camera orientation and drone translational motion, enabling more comprehensive perspectives and enhancing the map's overall quality. Subsequently, we present a novel problem formulation, including a new performance function to evaluate the drone positions and camera orientations. We then design a QP-based controller with a control barrier-like function for a constraint on the decay rate of the objective function. The present problem formulation poses a new challenge, requiring significantly greater computational efforts than the case involving only translational motion control. We approach this issue technologically, namely by introducing JAX, utilizing just-in-time (JIT) compilation and Graphics Processing Unit (GPU) acceleration. We finally conduct extensive verification through simulation in ROS (Robot Operating System) and show the real-time feasibility of the proposed controller and its superiority over the conventional method.
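The "control barrier-like constraint on the decay rate" admits a closed-form solution in the scalar case, which makes the QP's behaviour easy to see: minimize the deviation from a nominal input subject to the objective V decaying at rate at least gamma. The one-dimensional reduction below is an illustration, not the paper's multi-drone QP.

```python
def cbf_qp_scalar(u_nom, grad_v, v, gamma):
    """Minimise (u - u_nom)^2 subject to grad_v * u <= -gamma * v.
    In one dimension the QP solution is a projection onto the half-line
    satisfying the decay-rate constraint (assumes grad_v != 0)."""
    if grad_v * u_nom <= -gamma * v:
        return u_nom                  # nominal input already decays V fast enough
    return -gamma * v / grad_v        # otherwise sit exactly on the constraint

# The nominal command would increase V (grad_v * u_nom = 2 > 0), so it is clipped.
u = cbf_qp_scalar(u_nom=1.0, grad_v=2.0, v=4.0, gamma=0.5)
print(u)  # -1.0: enforces dV/dt = grad_v * u = -gamma * V = -2
```

In the full formulation this projection becomes a QP over all drones' translational and camera-orientation inputs at once, which is the computational load the JAX/JIT/GPU tooling addresses.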
https://arxiv.org/abs/2404.13915
Current methods for 3D reconstruction and environmental mapping frequently face challenges in achieving high precision, highlighting the need for practical and effective solutions. In response to this issue, our study introduces FlyNeRF, a system integrating Neural Radiance Fields (NeRF) with drone-based data acquisition for high-quality 3D reconstruction. Utilizing an unmanned aerial vehicle (UAV) to capture images and corresponding spatial coordinates, the obtained data is subsequently used for the initial NeRF-based 3D reconstruction of the environment. Further evaluation of the reconstruction render quality is accomplished by the image evaluation neural network developed within the scope of our system. According to the results of the image evaluation module, an autonomous algorithm determines the position for additional image capture, thereby improving the reconstruction quality. The neural network introduced for render quality assessment demonstrates an accuracy of 97%. Furthermore, our adaptive methodology enhances the overall reconstruction quality, resulting in an average improvement of 2.5 dB in Peak Signal-to-Noise Ratio (PSNR) for the 10% quantile. FlyNeRF demonstrates promising results, offering advancements in such fields as environmental monitoring, surveillance, and digital twins, where high-fidelity 3D reconstructions are crucial.
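The PSNR metric the improvement is reported in is standard and worth pinning down; the sketch below computes it for flattened grayscale pixel lists (the toy values are made up, and real renders would be RGB arrays).

```python
import math

def psnr(reference, render, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE).
    Higher is better; identical images give infinity."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, render)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

ref = [50.0, 100.0, 150.0, 200.0]
render = [52.0, 99.0, 151.0, 198.0]
print(round(psnr(ref, render), 2))  # 44.15
```

Because PSNR is logarithmic, the reported +2.5 dB on the 10% quantile means the worst renders' mean squared error dropped by roughly 44%, a substantial gain where the reconstruction was weakest.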
https://arxiv.org/abs/2404.12970