Aerial imagery is increasingly used in Earth science and natural resource management as a complement to labor-intensive ground-based surveys. Aerial systems can collect overlapping images that provide multiple views of each location from different perspectives. However, most prediction approaches (e.g. for tree species classification) use a single, synthesized top-down "orthomosaic" image as input that contains little to no information about the vertical aspects of objects and may include processing artifacts. We propose an alternate approach that generates predictions directly on the raw images and accurately maps these predictions into geospatial coordinates using semantic meshes. This method, released as a user-friendly open-source toolkit, enables analysts to use the highest quality data for predictions, capture information about the sides of objects, and leverage multiple viewpoints of each location for added robustness. We demonstrate the value of this approach on a new benchmark dataset of four forest sites in the western U.S. that consists of drone images, photogrammetry results, predicted tree locations, and species classification data derived from manual surveys. We show that our proposed multiview method improves classification accuracy from 53% to 75% relative to an orthomosaic baseline on a challenging cross-site tree species classification task.
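The multiview fusion idea above can be illustrated with a minimal sketch: assume each mesh face collects one class prediction per camera view that observed it, and the fused label is a simple majority vote. This is an illustrative rule, not necessarily the toolkit's exact aggregation scheme, and all names below are hypothetical.

```python
from collections import Counter

def aggregate_multiview(face_predictions):
    """Fuse per-view class predictions for each mesh face by majority vote.

    face_predictions: dict mapping face_id -> list of class labels, one
    label per camera view that observed that face.
    Returns a dict mapping face_id -> fused label.
    """
    fused = {}
    for face_id, votes in face_predictions.items():
        # most_common(1) returns the single most frequent label
        fused[face_id] = Counter(votes).most_common(1)[0][0]
    return fused

# Three views observed face 0; two of them agree on "ponderosa".
views = {0: ["ponderosa", "ponderosa", "white_fir"], 1: ["white_fir"]}
print(aggregate_multiview(views))  # {0: 'ponderosa', 1: 'white_fir'}
```

With the predictions anchored to mesh faces, each fused label inherits the face's geospatial coordinates, which is what lets per-image predictions be mapped back into map space.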
https://arxiv.org/abs/2405.09544
3D content creation plays a vital role in various applications, such as gaming, robotics simulation, and virtual reality. However, the process is labor-intensive and time-consuming, requiring skilled designers to invest considerable effort in creating a single 3D asset. To address this challenge, text-to-3D generation technologies have emerged as a promising solution for automating 3D creation. Leveraging the success of large vision language models, these techniques aim to generate 3D content based on textual descriptions. Despite recent advancements in this area, existing solutions still face significant limitations in terms of generation quality and efficiency. In this survey, we conduct an in-depth investigation of the latest text-to-3D creation methods. We provide a comprehensive background on text-to-3D creation, including discussions on datasets employed in training and evaluation metrics used to assess the quality of generated 3D models. Then, we delve into the various 3D representations that serve as the foundation for the 3D generation process. Furthermore, we present a thorough comparison of the rapidly growing literature on generative pipelines, categorizing them into feedforward generators, optimization-based generation, and view reconstruction approaches. By examining the strengths and weaknesses of these methods, we aim to shed light on their respective capabilities and limitations. Lastly, we point out several promising avenues for future research. With this survey, we hope to inspire researchers further to explore the potential of open-vocabulary text-conditioned 3D content creation.
https://arxiv.org/abs/2405.09431
We investigate the problem of pixelwise correspondence for deformable objects, namely cloth and rope, by comparing both classical and learning-based methods. We choose cloth and rope because they are traditionally some of the most difficult deformable objects to analytically model given their large configuration space, and they are meaningful in the context of robotic tasks like cloth folding, rope knot-tying, T-shirt folding, curtain closing, etc. The correspondence problem is heavily motivated in robotics, with wide-ranging applications including semantic grasping, object tracking, and manipulation policies built on top of correspondences. We present an exhaustive survey of existing classical methods for doing correspondence via feature-matching, including SIFT, SURF, and ORB, and two recently published learning-based methods, TimeCycle and Dense Object Nets. We make three main contributions: (1) a framework for simulating and rendering synthetic images of deformable objects, with qualitative results demonstrating transfer between our simulated and real domains, (2) a new learning-based correspondence method extending Dense Object Nets, and (3) a standardized comparison across state-of-the-art correspondence methods. Our proposed method provides a flexible, general formulation for learning temporally and spatially continuous correspondences for nonrigid (and rigid) objects. We report root mean squared error statistics for all methods and find that Dense Object Nets outperforms baseline classical methods for correspondence, while our proposed extension of Dense Object Nets performs similarly.
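The RMSE statistic reported above can be sketched as follows, assuming predicted and ground-truth correspondences are paired pixel coordinates. This is an illustrative formulation, not the paper's exact evaluation code.

```python
import math

def correspondence_rmse(predicted, ground_truth):
    """Root mean squared pixel error between predicted and true correspondences.

    predicted, ground_truth: lists of (x, y) pixel coordinates, paired by index.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("point lists must be paired")
    sq_errors = [(px - gx) ** 2 + (py - gy) ** 2
                 for (px, py), (gx, gy) in zip(predicted, ground_truth)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

pred = [(10.0, 10.0), (20.0, 24.0)]
true = [(10.0, 10.0), (20.0, 20.0)]
print(correspondence_rmse(pred, true))  # sqrt(8) ~= 2.828
```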
https://arxiv.org/abs/2405.08996
In recent years, street view imagery has grown to become one of the most important sources of geospatial data collection and urban analytics, which facilitates generating meaningful insights and assisting in decision-making. Synthesizing a street-view image from its corresponding satellite image is a challenging task due to the significant differences in appearance and viewpoint between the two domains. In this study, we screened 20 recent research papers to provide a thorough review of the state-of-the-art of how street-view images are synthesized from their corresponding satellite counterparts. The main findings are: (i) novel deep learning techniques are required for synthesizing more realistic and accurate street-view images; (ii) more datasets need to be collected for public usage; and (iii) more specific evaluation metrics need to be investigated for evaluating the generated images appropriately. We conclude that, due to applying outdated deep learning techniques, the recent literature failed to generate detailed and diverse street-view images.
https://arxiv.org/abs/2405.08961
Since the release of ChatGPT and GPT-4, large language models (LLMs) and multimodal large language models (MLLMs) have garnered significant attention due to their powerful and general capabilities in understanding, reasoning, and generation, thereby offering new paradigms for the integration of artificial intelligence with medicine. This survey comprehensively overviews the development background and principles of LLMs and MLLMs, as well as explores their application scenarios, challenges, and future directions in medicine. Specifically, this survey begins by focusing on the paradigm shift, tracing the evolution from traditional models to LLMs and MLLMs, summarizing the model structures to provide detailed foundational knowledge. Subsequently, the survey details the entire process from constructing and evaluating to using LLMs and MLLMs with a clear logic. Following this, to emphasize the significant value of LLMs and MLLMs in healthcare, we survey and summarize 6 promising applications in healthcare. Finally, the survey discusses the challenges faced by medical LLMs and MLLMs and proposes a feasible approach and direction for the subsequent integration of artificial intelligence with medicine. Thus, this survey aims to provide researchers with a valuable and comprehensive reference guide from the perspectives of the background, principles, and clinical applications of LLMs and MLLMs.
https://arxiv.org/abs/2405.08603
In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (ViT)-based approaches showcasing superior performance in generality and efficiency. This survey presents a timely overview of ViT-based deepfake detection models, categorized into standalone, sequential, and parallel architectures. Furthermore, it succinctly delineates the structure and characteristics of each model. By analyzing existing research and addressing future directions, this survey aims to equip researchers with a nuanced understanding of ViT's pivotal role in deepfake detection, serving as a valuable reference for both academic and practical pursuits in this domain.
https://arxiv.org/abs/2405.08463
Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyberharassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential learning educational materials that engage students in this emerging area of social cybersecurity in the era of AI. Experiential learning opportunities are usually provided through capstone projects and engineering design courses in STEM programs such as computer science. While capstone projects are an excellent example of experiential learning, given the interdisciplinary nature of this emerging social cybersecurity problem, it can be challenging to use them to engage non-computing students without prior knowledge of AI. Because of this, we were motivated to develop a hands-on lab platform that provides experiential learning to non-computing students with little or no background knowledge in AI, and we discuss the lessons learned in developing this lab. In this lab, used by social science students at North Carolina A&T State University across two semesters (spring and fall) in 2022, students are given a detailed lab manual and complete a set of well-defined tasks. Through this process, students learn AI concepts and the application of AI for cyberharassment detection. Using pre- and post-surveys, we asked students to rate their knowledge or skills in AI and their understanding of the concepts learned. The results revealed that the students moderately understood the concepts of AI and cyberharassment.
https://arxiv.org/abs/2405.08125
This paper presents BARKPLUG V.2, a Large Language Model (LLM)-based chatbot system built using Retrieval Augmented Generation (RAG) pipelines to enhance the user experience and access to information within academic settings. The objective of BARKPLUG V.2 is to provide information to users about various campus resources, including academic departments, programs, campus facilities, and student resources at a university setting in an interactive fashion. Our system leverages university data as an external data corpus and ingests it into our RAG pipelines for domain-specific question-answering tasks. We evaluate the effectiveness of our system in generating accurate and pertinent responses for Mississippi State University, as a case study, using quantitative measures, employing frameworks such as Retrieval Augmented Generation Assessment (RAGAS). Furthermore, we evaluate the usability of this system via subjective satisfaction surveys using the System Usability Scale (SUS). Our system demonstrates impressive quantitative performance, with a mean RAGAS score of 0.96, and a positive user experience, as validated by the usability assessments.
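The retrieval half of a RAG pipeline like the one described above can be sketched, assuming document embeddings are precomputed, as nearest-neighbor search by cosine similarity. The toy vectors and campus snippets below are invented for illustration; a real system would use a sentence-embedding model and a vector store.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, corpus, k=1):
    """Return the k corpus texts whose embeddings are most similar to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

# Toy 3-d "embeddings" standing in for real model outputs.
corpus = [
    {"text": "Library hours: 8am-10pm", "vec": [0.9, 0.1, 0.0]},
    {"text": "CS department office: Butler Hall", "vec": [0.0, 0.2, 0.9]},
]
print(retrieve([1.0, 0.0, 0.1], corpus))  # ['Library hours: 8am-10pm']
```

The retrieved snippets are then placed into the LLM prompt as context, which is the step RAGAS-style metrics (faithfulness, answer relevance) evaluate downstream.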
https://arxiv.org/abs/2405.08120
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, we demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. We hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, we discuss the implications of these trends, their limitations, and counterexamples to our analysis.
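One way to make "measuring distance between datapoints in a more and more alike way" concrete is to correlate two models' pairwise-distance vectors over the same inputs. This is a simplified stand-in for the alignment metrics such work uses, and the embeddings below are synthetic.

```python
import math
from itertools import combinations

def pairwise_distances(embeddings):
    """Upper triangle of the Euclidean distance matrix, flattened."""
    return [math.dist(a, b) for a, b in combinations(embeddings, 2)]

def alignment(emb_a, emb_b):
    """Pearson correlation between two models' pairwise-distance vectors.

    A high value means the models order and scale distances between the
    same datapoints similarly, i.e. their representations are aligned.
    """
    da, db = pairwise_distances(emb_a), pairwise_distances(emb_b)
    ma, mb = sum(da) / len(da), sum(db) / len(db)
    cov = sum((x - ma) * (y - mb) for x, y in zip(da, db))
    sa = math.sqrt(sum((x - ma) ** 2 for x in da))
    sb = math.sqrt(sum((y - mb) ** 2 for y in db))
    return cov / (sa * sb)

# Two toy "models" embedding the same three datapoints; model B is a
# scaled-and-shifted copy of A, so their distance structures align perfectly.
model_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
model_b = [[1.0, 1.0], [3.0, 1.0], [1.0, 5.0]]
print(round(alignment(model_a, model_b), 6))  # 1.0
```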
https://arxiv.org/abs/2405.07987
As ground-based all-sky astronomical surveys will gather millions of images in the coming years, a critical requirement emerges for the development of fast deconvolution algorithms capable of efficiently improving the spatial resolution of these images. By successfully recovering clean and high-resolution images from these surveys, our objective is to help deepen our understanding of galaxy formation and evolution through accurate photometric measurements. We introduce a two-step deconvolution framework using a Swin Transformer architecture. Our study reveals that the deep learning-based solution introduces a bias, constraining the scope of scientific analysis. To address this limitation, we propose a novel third step relying on the active coefficients in the sparsity wavelet framework. By conducting a performance comparison between our deep learning-based method and Firedec, a classical deconvolution algorithm, we analyze a subset of the EDisCS cluster samples. We demonstrate the advantage of our method in terms of resolution recovery, generalization to different noise properties, and computational efficiency. Not only does the analysis of this cluster sample assess the efficiency of our method, but it also enables us to quantify the number of clumps within these galaxies in relation to their disc colour. This robust technique holds promise for identifying structures in the distant universe from ground-based images.
https://arxiv.org/abs/2405.07842
Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey discussing the progress made on different aspects of this area, outstanding challenges, and promising future directions, is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, as well as reports the performance of current state-of-the-art methods on these benchmarks, thereby facilitating readers in selecting the most suitable method for their application. Finally, the survey identifies key challenges, reviews prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep tracing the latest works at this https URL.
https://arxiv.org/abs/2405.07801
Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information. This paper provides a comprehensive overview of the landscape of medical LLM evaluation, synthesizing insights from existing studies and highlighting evaluation data sources, task scenarios, and evaluation methods. Additionally, it identifies key challenges and opportunities in medical LLM evaluation, emphasizing the need for continued research and innovation to ensure the responsible integration of LLMs into clinical practice.
https://arxiv.org/abs/2405.07468
Retrieval-Augmented Generation (RAG) has emerged as a pivotal innovation in natural language processing, enhancing generative models by incorporating external information retrieval. Evaluating RAG systems, however, poses distinct challenges due to their hybrid structure and reliance on dynamic knowledge sources. We therefore conduct an extensive survey and propose an analysis framework for RAG benchmarks, RGAR (Retrieval, Generation, Additional Requirement), designed to systematically analyze RAG benchmarks by focusing on measurable outputs and established truths. Specifically, we scrutinize and contrast multiple quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness, across the internal links within current RAG evaluation methods, covering the possible output and ground-truth pairs. We also analyze how different works integrate additional requirements, discuss the limitations of current benchmarks, and propose potential directions for further research to address these shortcomings and advance the field of RAG evaluation. In conclusion, this paper collates the challenges associated with RAG evaluation and presents a thorough analysis and examination of existing methodologies for RAG benchmark design based on the proposed RGAR framework.
https://arxiv.org/abs/2405.07437
As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning makes a trained model remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning methods and discuss their differences, connections, and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we introduce it in two parts: first, we classify centralized unlearning into exact unlearning and approximate unlearning; second, we offer a detailed introduction to the techniques of these methods. Beyond centralized unlearning, we note several studies on distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as the two representative directions. After introducing unlearning methods, we review studies on unlearning verification. Moreover, we consider the privacy and security issues essential to machine unlearning and organize the latest related literature. Finally, we discuss the challenges of various unlearning scenarios and address the potential research directions.
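Exact unlearning, one of the two centralized categories above, can be sketched with a toy "model" whose training is cheap enough to redo from scratch. The mean-model here is purely illustrative; real systems amortize retraining cost, e.g. via sharded schemes such as SISA.

```python
def train(dataset):
    """Stand-in 'model': just the mean of the training points.

    Chosen so that the effect of unlearning is easy to verify; real exact
    unlearning retrains the actual model on the retained data.
    """
    return sum(dataset) / len(dataset)

def exact_unlearn(dataset, erase):
    """Exact unlearning: retrain from scratch on the retained data only."""
    retained = [x for x in dataset if x not in erase]
    return train(retained)

data = [1.0, 2.0, 3.0, 10.0]
full = train(data)                    # 4.0, pulled up by the point 10.0
scrubbed = exact_unlearn(data, {10.0})
print(full, scrubbed)                 # 4.0 2.0 -- 10.0 no longer contributes
```

Approximate unlearning methods instead update the trained model directly to remove the erased subset's influence, trading this retraining cost for weaker removal guarantees, which is why verification (the third scenario above) matters.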
https://arxiv.org/abs/2405.07406
The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls, and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties, similar to human norms, but the data from both LLMs, when using specific demographic profiles, show poor psychometric properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question answering tasks.
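One standard psychometric property such studies examine is internal consistency, commonly summarized by Cronbach's alpha. A minimal sketch follows, on synthetic questionnaire scores (not data from the study; the exact battery of measures used is not specified here).

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha, a standard internal-consistency statistic.

    item_scores: list of per-item score lists (one inner list per
    questionnaire item, one value per respondent).
    """
    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(item_scores)
    sum_item_var = sum(var(item) for item in item_scores)
    # Per-respondent total score across all items
    totals = [sum(col) for col in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_var / var(totals))

# Three items answered by four respondents; values near 1 indicate the
# items measure the same underlying construct consistently.
items = [[4, 3, 5, 2], [4, 2, 5, 3], [5, 3, 4, 2]]
print(round(cronbach_alpha(items), 3))  # 0.892
```

Flat or incoherent alpha values across persona conditions are one concrete way "poor psychometric properties" can show up in LLM-generated questionnaire responses.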
https://arxiv.org/abs/2405.07248
Humans and other animal species can gather, transfer, process, fine-tune, and generate information throughout their lifetimes. This ability to learn over an entire lifespan, grounded in neurocognitive mechanisms, is referred to as continuous learning. Consequently, real-world computational systems of incrementally learning autonomous agents also need such a continuous learning mechanism, providing information retrieval and long-term memory consolidation. However, a main challenge in artificial intelligence is the incremental learning of an autonomous agent when it is confronted with new data. In such scenarios, the main concern is catastrophic forgetting (CF): when trained sequentially, a neural network's performance on old data degrades as it learns from new data. Numerous studies have been proposed to tackle this CF problem, but it is very difficult to compare their performance due to dissimilarity in their evaluation mechanisms. Here we focus on comparing algorithms that share a similar type of evaluation mechanism, covering three types of incremental learning methods: (1) exemplar-based methods, (2) memory-based methods, and (3) network-based methods. This survey paper presents a methodology-oriented study of catastrophic forgetting in incremental deep neural networks. Furthermore, it contains a mathematical overview of impactful methods that can help researchers deal with CF.
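Catastrophic forgetting, and one exemplar-based remedy (rehearsal of stored samples), can be demonstrated with a one-parameter toy model trained on two tasks in sequence. This is illustrative only; the surveyed methods operate on deep networks.

```python
def train_step(w, batch, lr=0.5):
    """One gradient step on mean squared error for a one-parameter model
    that simply predicts w (a toy proxy for a neural network)."""
    grad = sum(w - y for y in batch) / len(batch)
    return w - lr * grad

def sequential(tasks, replay=False):
    """Train on tasks in order; optionally rehearse one stored exemplar per task."""
    w, buffer = 0.0, []
    for task in tasks:
        for _ in range(20):
            w = train_step(w, task + (buffer if replay else []))
        buffer.append(task[0])  # exemplar kept for later rehearsal
    return w

old_task, new_task = [1.0, 1.0], [5.0, 5.0]
w_naive = sequential([old_task, new_task])         # ends near 5.0: task 1 forgotten
w_replay = sequential([old_task, new_task], True)  # ends near 3.67: a compromise
print(round(w_naive, 2), round(w_replay, 2))       # 5.0 3.67
```

Without rehearsal the parameter drifts entirely to the new task's target; mixing even one stored exemplar back into each batch keeps the solution anchored between both tasks, the core idea behind exemplar-based CF mitigation.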
https://arxiv.org/abs/2405.08015
While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems before showcasing the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.
https://arxiv.org/abs/2405.06909
Despite comprising one-third of global languages, African languages are critically underrepresented in Artificial Intelligence (AI), threatening linguistic diversity and cultural heritage. Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages, identifying methodologies, datasets, and techniques employed. Additionally, we create a detailed roadmap outlining challenges, best practices, and future directions, aiming to improve accessibility for researchers. This work serves as a foundational resource for Ghanaian NLP research and underscores the critical need for integrating global linguistic diversity into AI development.
https://arxiv.org/abs/2405.06818
Graphs are an essential data structure utilized to represent relationships in real-world scenarios. Prior research has established that Graph Neural Networks (GNNs) deliver impressive outcomes in graph-centric tasks, such as link prediction and node classification. Despite these advancements, challenges like data sparsity and limited generalization capabilities continue to persist. Recently, Large Language Models (LLMs) have gained attention in natural language processing. They excel in language comprehension and summarization. Integrating LLMs with graph learning techniques has attracted interest as a way to enhance performance in graph learning tasks. In this survey, we conduct an in-depth review of the latest state-of-the-art LLMs applied in graph learning and introduce a novel taxonomy to categorize existing methods based on their framework design. We detail four unique designs: i) GNNs as Prefix, ii) LLMs as Prefix, iii) LLMs-Graphs Integration, and iv) LLMs-Only, highlighting key methodologies within each category. We explore the strengths and limitations of each framework, and emphasize potential avenues for future research, including overcoming current integration challenges between LLMs and graph learning techniques, and venturing into new application areas. This survey aims to serve as a valuable resource for researchers and practitioners eager to leverage large language models in graph learning, and to inspire continued progress in this dynamic field. We consistently maintain the related open-source materials at \url{this https URL}.
https://arxiv.org/abs/2405.08011
The rapid advancements in generative artificial intelligence have opened up new avenues for enhancing various aspects of research, including the design and evaluation of survey questionnaires. However, the recent pioneering applications have not considered questionnaire pretesting. This article explores the use of GPT models as a useful tool for pretesting survey questionnaires, particularly in the early stages of survey design. Illustrated with two applications, the article suggests incorporating GPT feedback as an additional stage before human pretesting, potentially reducing successive iterations. The article also emphasizes the indispensable role of researchers' judgment in interpreting and implementing AI-generated feedback.
https://arxiv.org/abs/2405.06329