Objective: Federated Learning (FL) enables collaborative model training while keeping data local. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate the specific measures taken to overcome these hurdles, leaving a significant knowledge gap. Regarding efforts to implement real-world FL, there is a notable lack of comprehensive assessments comparing FL to less complex alternatives. Materials & Methods: We extensively reviewed the FL literature, categorizing the insights, together with our own findings, according to their nature and the phase of establishing an FL initiative, and summarized them into a comprehensive guide. We developed our own FL infrastructure within the German Radiological Cooperative Network (RACOON) and demonstrated its functionality by training FL models on lung pathology segmentation tasks across six university hospitals. We extensively evaluated FL against less complex alternatives in three distinct evaluation scenarios. Results: The proposed guide outlines essential steps, identified hurdles, and proposed solutions for establishing successful FL initiatives that conduct real-world experiments. Our experimental results show that FL outperforms the less complex alternatives in all evaluation scenarios, justifying the effort required to translate FL into real-world applications. Discussion & Conclusion: Our proposed guide aims to aid future FL researchers in circumventing pitfalls and accelerating the translation of FL into radiological applications. Our results underscore the value of the effort needed to translate FL into real-world applications by demonstrating advantageous performance over alternatives, and emphasize the importance of strategic organization and robust management of distributed data and infrastructure in real-world settings.
https://arxiv.org/abs/2405.09409
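As a companion to the abstract above, here is a minimal federated-averaging (FedAvg) sketch in plain NumPy. It is not the RACOON training code: the linear model, the six simulated sites, and the sample-count weighting are illustrative assumptions, but the structure (local updates followed by a weighted server-side average) is the generic FL loop that most federated systems build on.

```python
# Minimal FedAvg sketch in NumPy; all sites, model, and weights are illustrative.
import numpy as np

def local_update(weights, data, targets, lr=0.1, epochs=1):
    """One site trains locally (here: a linear model with squared loss)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - targets) / len(data)
        w -= lr * grad
    return w

def fed_avg(site_updates, site_sizes):
    """Server aggregates site models, weighted by local sample counts."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_updates, site_sizes))

rng = np.random.default_rng(0)
global_w = np.zeros(5)
sites = [(rng.normal(size=(40, 5)), rng.normal(size=40)) for _ in range(6)]  # six simulated sites

for communication_round in range(10):
    updates = [local_update(global_w, X, y) for X, y in sites]
    global_w = fed_avg(updates, [len(X) for X, _ in sites])
```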
In recent years, street view imagery has grown to become one of the most important sources of geospatial data collection and urban analytics, facilitating the generation of meaningful insights and assisting in decision-making. Synthesizing a street-view image from its corresponding satellite image is a challenging task due to the significant differences in appearance and viewpoint between the two domains. In this study, we screened 20 recent research papers to provide a thorough review of the state of the art in synthesizing street-view images from their corresponding satellite counterparts. The main findings are: (i) novel deep learning techniques are required for synthesizing more realistic and accurate street-view images; (ii) more datasets need to be collected for public usage; and (iii) more specific evaluation metrics need to be investigated for evaluating the generated images appropriately. We conclude that, due to applying outdated deep learning techniques, the recent literature has failed to generate detailed and diverse street-view images.
https://arxiv.org/abs/2405.08961
The increasing complexity of automated driving functions and their growing operational design domains imply more demanding requirements on their validation. Classical methods such as field tests or formal analyses are not sufficient anymore and need to be complemented by simulations. For simulations, the standard approach is scenario-based testing, as opposed to distance-based testing primarily performed in field tests. Currently, the time evolution of specific scenarios is mainly described using trajectories, which limit or at least hamper generalizations towards variations. As an alternative, maneuver-based approaches have been proposed. We shed light on the state of the art and available foundations for this new method through a literature review of early and recent works related to maneuver-based scenario description. It includes related modeling approaches originally developed for other applications. Current limitations and research gaps are identified.
https://arxiv.org/abs/2405.08626
Neural Radiance Fields (NeRF) are a novel implicit method for achieving high-resolution 3D reconstruction and representation. Since the first NeRF work was proposed, NeRF has gained strong momentum and is booming in the areas of 3D modeling, representation, and reconstruction. However, the original work and most follow-up research projects based on NeRF are static, which limits their practical applicability. Therefore, more researchers have become interested in and focused on dynamic NeRF, which is more feasible and useful in practical applications and situations. Compared with static NeRF, implementing dynamic NeRF is more difficult and complex, but it holds greater potential for the future and even forms the basis of editable NeRF. In this review, we give a detailed and comprehensive account of the development and key implementation principles of dynamic NeRF. Our analysis of the main principles and developments of dynamic NeRF covers work from 2021 to 2023, including most dynamic NeRF projects. Moreover, with specially designed figures and tables, we provide a detailed comparison and analysis of the different features of the various dynamic NeRF methods. In addition, we analyze and discuss the key methods for implementing a dynamic NeRF. The body of referenced papers is large, and the statements and comparisons are multidimensional. After reading this review, the overall development history and most of the main design methods and principles of dynamic NeRF can be easily understood.
https://arxiv.org/abs/2405.08609
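For readers new to the area, the following is a minimal sketch of one common dynamic-NeRF design discussed in such reviews: a deformation-field approach in which a time-conditioned MLP warps sample points into a canonical space that a standard NeRF MLP then queries. The layer sizes, the single-MLP canonical model, and the toy inputs are assumptions for illustration, not taken from any specific paper.

```python
# Deformation-field dynamic-NeRF sketch: warp (x, y, z, t) into canonical space,
# then predict color and density there. Shapes and sizes are illustrative only.
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),   # input: (x, y, z, t)
            nn.Linear(hidden, 3))              # output: offset added to xyz

    def forward(self, xyz, t):
        return self.net(torch.cat([xyz, t], dim=-1))

class CanonicalNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))              # output: (r, g, b, sigma)

    def forward(self, xyz):
        return self.net(xyz)

deform, canonical = DeformationField(), CanonicalNeRF()
xyz = torch.rand(1024, 3)                      # sampled points along rays
t = torch.full((1024, 1), 0.5)                 # normalized time stamp
rgb_sigma = canonical(xyz + deform(xyz, t))    # warp to canonical, then query
```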
Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet, ML tools can only be as capable as the quality of the data they are trained on allows them. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities and may discriminate against them, we know surprisingly little about the provenance of such bias. To fill this gap, we present a systematic review of the datasets for the automated detection of hateful communication introduced over the past decade, and unpack the quality of the datasets in terms of the identities that they embody: those of the targets of hateful communication that the data curators focused on, as well as those unintentionally included in the datasets. We find, overall, a skewed representation of selected target identities and mismatches between the targets that research conceptualizes and ultimately includes in datasets. Yet, by contextualizing these findings in the language and location of origin of the datasets, we highlight a positive trend towards the broadening and diversification of this research space.
https://arxiv.org/abs/2405.08562
As humans, we aspire to create media content that is both freely willed and readily controlled. Thanks to the prominent development of generative techniques, we now can easily utilize 2D diffusion methods to synthesize images controlled by raw sketch or designated human poses, and even progressively edit/regenerate local regions with masked inpainting. However, similar workflows in 3D modeling tasks are still unavailable due to the lack of controllability and efficiency in 3D generation. In this paper, we present a novel controllable and interactive 3D assets modeling framework, named Coin3D. Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes, and introduces an interactive generation workflow to support seamless local part editing while delivering responsive 3D object previewing within a few seconds. To this end, we develop several techniques, including the 3D adapter that applies volumetric coarse shape control to the diffusion model, proxy-bounded editing strategy for precise part editing, progressive volume cache to support responsive preview, and volume-SDS to ensure consistent mesh reconstruction. Extensive experiments of interactive generation and editing on diverse shape proxies demonstrate that our method achieves superior controllability and flexibility in the 3D assets generation task.
https://arxiv.org/abs/2405.08054
Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem: human-based review can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called the "UA Free Living Study", which uses an egocentric wearable camera, the AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches from the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.
https://arxiv.org/abs/2405.07827
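The abstract does not spell out the two stages, so the sketch below shows one plausible reading of a fine-tuning/transfer-learning recipe with a class-weighted loss for imbalance: stage 1 freezes a pretrained backbone and trains only the classifier head, and stage 2 unfreezes everything at a lower learning rate. The ResNet-18 backbone, the class counts, and the learning rates are assumptions, not the paper's settings.

```python
# Hedged two-stage transfer-learning / fine-tuning sketch with class weighting.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5                                    # hypothetical number of environments
class_counts = torch.tensor([500., 120., 60., 30., 10.])
class_weights = class_counts.sum() / (len(class_counts) * class_counts)

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
criterion = nn.CrossEntropyLoss(weight=class_weights)  # used in both stages

# Stage 1 (transfer learning): freeze the pretrained features, train the head only.
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone.fc.parameters():
    p.requires_grad = True
head_optim = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

# Stage 2 (fine-tuning): unfreeze everything and continue with a smaller learning rate.
for p in backbone.parameters():
    p.requires_grad = True
full_optim = torch.optim.Adam(backbone.parameters(), lr=1e-5)
```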
Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey discussing the progress made on different aspects of this area, outstanding challenges, and promising future directions, is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, as well as reports the performance of current state-of-the-art methods on these benchmarks, thereby facilitating readers in selecting the most suitable method for their application. Finally, the survey identifies key challenges, reviews prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep tracing the latest works at this https URL.
https://arxiv.org/abs/2405.07801
"Human-aware" has become a popular keyword used to describe a particular class of AI systems that are designed to work and interact with humans. While there exists a surprising level of consistency among the works that use the label human-aware, the term itself mostly remains poorly understood. In this work, we retroactively try to provide an account of what constitutes a human-aware AI system. We see that human-aware AI is a design-oriented paradigm, one that focuses on the need for modeling the humans it may interact with. Additionally, we see that this paradigm offers us intuitive dimensions to understand and categorize the kinds of interactions these systems might have with humans. We show the pedagogical value of these dimensions by using them as a tool to understand and review the current landscape of work related to human-AI systems that purport some form of human modeling. To fit the scope of a workshop paper, we specifically narrowed our review to papers that deal with sequential decision-making and were published in a major AI conference in the last three years. Our analysis helps identify the space of potential research problems that are currently being overlooked. We perform additional analysis on the degree to which these works make explicit reference to results from social science and whether they actually perform user-studies to validate their systems. We also provide an accounting of the various AI methods used by these works.
"Human-aware"已成为用于描述一类旨在与人类互动和工作的AI系统的流行关键词。尽管在使用该标签的作品之间存在相当程度的一致性,但该术语本身仍然存在很大的误解。在这篇工作中,我们试图通过回顾来提供关于如何定义一个 human-aware AI 系统的说明。我们发现,human-aware AI 是一个设计导向的范式,该范式关注于可能与其互动的人类的需求建模。此外,我们还发现,这个范式提供了一个直观的维度来理解并分类这些系统与人类之间的互动。我们用这些维度作为工具来了解和审查声称某种形式的人类建模的现有工作格局。为了符合一场研讨会的论文范围,我们特别将分析缩小到在最近三年内在大型AI会议上发表的涉及序列决策的论文。我们的分析有助于识别当前被忽视的研究问题领域。我们进一步分析了这些作品明确提到社会科学结果以及实际上进行用户研究来验证其系统的程度。我们还提供了这些作品中使用的各种AI方法的详细说明。
https://arxiv.org/abs/2405.07773
Over the course of the recent decade, tremendous progress has been made in the areas of machine learning and natural language processing, which opened up vast areas of potential application use cases, including hiring and human resource management. We review the use cases for text analytics in the realm of human resources/personnel management, including actually realized as well as potential but not yet implemented ones, and we analyze the opportunities and risks of these.
https://arxiv.org/abs/2405.07766
The development of generative artificial intelligence for human motion generation has expanded rapidly, necessitating a unified evaluation framework. This paper presents a detailed review of eight evaluation metrics for human motion generation, highlighting their unique features and shortcomings. We propose standardized practices through a unified evaluation setup to facilitate consistent model comparisons. Additionally, we introduce a novel metric that assesses diversity in temporal distortion by analyzing warping diversity, thereby enhancing the evaluation of temporal data. We also conduct experimental analyses of three generative models using a publicly available dataset, offering insights into the interpretation of each metric in specific case scenarios. Our goal is to offer a clear, user-friendly evaluation framework for newcomers, complemented by publicly accessible code.
https://arxiv.org/abs/2405.07680
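The paper's warping-diversity metric is not defined in the abstract, so the following is only a hedged illustration of the underlying idea: measuring how differently generated motion sequences warp against each other, approximated here by the spread of pairwise dynamic-time-warping (DTW) distances. The 1-D toy motions and the mean-distance summary are assumptions, not the proposed metric.

```python
# Illustrative DTW-based spread over generated sequences (not the paper's metric).
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def pairwise_warping_spread(sequences):
    """Mean pairwise DTW distance over a set of generated sequences."""
    dists = [dtw_distance(s1, s2)
             for i, s1 in enumerate(sequences) for s2 in sequences[i + 1:]]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
motions = [np.cumsum(rng.normal(size=60)) for _ in range(8)]  # toy 1-D motions
print(pairwise_warping_spread(motions))
```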
High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously, papers describing dehazing architectures applicable to various Remote Sensing (RS) domains are also being published. This review goes beyond the traditional focus on benchmarked haze datasets, as we also explore the application of dehazing techniques to remote sensing and UAV datasets, providing a comprehensive overview of both deep learning and prior-based approaches in these domains. We identify key challenges, including the lack of large-scale RS datasets and the need for more robust evaluation metrics, and outline potential solutions and future research directions to address them. This review is the first, to our knowledge, to provide comprehensive discussions on both existing and very recent dehazing approaches (as of 2024) on benchmarked and RS datasets, including UAV-based imagery.
https://arxiv.org/abs/2405.07520
As the right to be forgotten has been legislated worldwide, many studies have attempted to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning makes a trained model remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning methods and discuss their differences, connections, and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we introduce it in two parts: first, we classify centralized unlearning into exact unlearning and approximate unlearning; second, we offer a detailed introduction to the techniques used by these methods. Beyond centralized unlearning, we note several studies on distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as two representative directions. After introducing unlearning methods, we review studies on unlearning verification. Moreover, we consider the privacy and security issues essential to machine unlearning and organize the latest related literature. Finally, we discuss the challenges of the various unlearning scenarios and outline potential research directions.
https://arxiv.org/abs/2405.07406
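To make the exact-versus-approximate distinction concrete, here is a hedged sketch of one well-known exact unlearning strategy (SISA-style sharded retraining): an ensemble is trained over disjoint data shards, and forgetting a sample only requires retraining the shard that contained it. The logistic-regression models, the shard count, and the toy data are illustrative choices, not taken from the survey.

```python
# SISA-style exact unlearning sketch: retrain only the shard holding the erased sample.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)

n_shards = 5
shard_idx = [np.arange(300)[i::n_shards] for i in range(n_shards)]
models = [LogisticRegression().fit(X[idx], y[idx]) for idx in shard_idx]

def predict(x):
    """Majority vote over the per-shard models."""
    votes = np.array([m.predict(x) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def unlearn(sample_id):
    """Erase one sample: drop it from its shard and retrain only that shard."""
    for s, idx in enumerate(shard_idx):
        if sample_id in idx:
            shard_idx[s] = idx[idx != sample_id]
            models[s] = LogisticRegression().fit(X[shard_idx[s]], y[shard_idx[s]])
            return

unlearn(sample_id=42)
print(predict(X[:5]))
```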
Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, the drug repurposing approaches proposed thus far have yet to meet expectations. Therefore, it is crucial to offer a systematic approach to drug repurposing to achieve cost savings and improve human lives. In recent years, biological network-based methods for drug repurposing have generated promising results. Nevertheless, these methods have limitations. Primarily, their scope is generally limited with respect to the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which often has to be converted into homogeneous data, leading to a loss of information. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge at certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach for knowledge graph-based drug repurposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demonstrate HGTDR's capability to predict other types of relations, such as drug-protein and disease-protein relations, through numerical and experimental validation.
https://arxiv.org/abs/2405.08031
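As a rough illustration of HGTDR's third step (computing relationship scores with a fully connected network), the sketch below scores candidate drug-disease pairs from entity representations. For brevity, the heterogeneous graph transformer encoder of steps 1-2 is replaced by a plain learnable embedding table, and the entity counts, dimensions, and candidate pairs are made-up assumptions.

```python
# Fully connected relation scorer over entity embeddings (graph encoder omitted).
import torch
import torch.nn as nn

n_drugs, n_diseases, dim = 100, 50, 64
drug_emb = nn.Embedding(n_drugs, dim)        # stand-in for graph-transformer outputs
disease_emb = nn.Embedding(n_diseases, dim)

scorer = nn.Sequential(                      # fully connected relation scorer
    nn.Linear(2 * dim, 128), nn.ReLU(),
    nn.Linear(128, 1))

def score(drug_ids, disease_ids):
    """Return a logit per (drug, disease) candidate pair."""
    pair = torch.cat([drug_emb(drug_ids), disease_emb(disease_ids)], dim=-1)
    return scorer(pair).squeeze(-1)

# Example: rank three candidate drugs for one disease; training against known
# treat/not-treat pairs with binary cross-entropy would follow the same call.
logits = score(torch.tensor([0, 1, 2]), torch.tensor([7, 7, 7]))
print(torch.sigmoid(logits))
```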
Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches we identify the biases inherent in each, and question the reliance on human-coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways humans describe themselves, agreeing well with prior specialist work, but with interesting differences characteristic of the medium used to express identity.
https://arxiv.org/abs/2405.07278
We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and the lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, uses a semantic-similarity heuristic to generate labelled data, and employs a multi-task insight extraction architecture by fine-tuning an LLM. InsightNet identifies granular, actionable topics with customer sentiments and verbatims for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy, and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, an improvement of 11% in F1 score over the previous best results. Additionally, InsightNet generalises well to unseen aspects and suggests new topics to be added to the taxonomy.
https://arxiv.org/abs/2405.07195
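The abstract mentions a semantic-similarity heuristic for generating labelled data; the sketch below illustrates that general idea by assigning review snippets to taxonomy topics whose similarity exceeds a threshold. TF-IDF similarity, the two hypothetical topics, and the threshold are stand-ins chosen for a self-contained example; the paper's actual embeddings and heuristic are not specified here.

```python
# Similarity-threshold labelling sketch (TF-IDF stands in for semantic embeddings).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

topics = {                                   # hypothetical leaf topics of a taxonomy
    "battery life": "battery drains quickly, short battery life, charging issues",
    "shipping": "late delivery, damaged packaging, slow shipping",
}
reviews = ["Shipping was slow and the package arrived late.",
           "Battery barely lasts half a day after the update."]

vec = TfidfVectorizer().fit(list(topics.values()) + reviews)
topic_mat = vec.transform(topics.values())
review_mat = vec.transform(reviews)
sims = cosine_similarity(review_mat, topic_mat)

threshold = 0.1
for review, row in zip(reviews, sims):
    labels = [t for t, s in zip(topics, row) if s >= threshold]
    print(review, "->", labels)
```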
Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that show classroom dynamics, students' drawings related to the learning content, textbook illustrations, etc. Unquestionably, most qualitative analysis of and explanation on image data has been conducted by human researchers, without machine-based automation. This was partially because most image processing artificial intelligence models were not accessible to general educational scholars, or were not explainable due to their complex deep neural network architectures. However, the recent development of Visual Question Answering (VQA) techniques has produced usable visual language models, which receive from the user a question about a given image and return an answer, both in natural language. In particular, GPT-4V, released by OpenAI, has opened up state-of-the-art visual language model services so that VQA can be used for a variety of purposes. However, VQA and GPT-4V have not yet been widely applied in educational studies. In this position paper, we suggest that GPT-4V contributes to realizing VQA for education. By 'realizing' VQA, we denote two meanings: (1) GPT-4V realizes the utilization of VQA techniques by any educational scholar without technical or accessibility barriers, and (2) GPT-4V makes educational scholars realize the usefulness of VQA for educational research. Given these, this paper aims to introduce VQA for educational studies so that it provides a milestone for educational research methodology. Chapter II reviews the development of VQA techniques, culminating in the release of GPT-4V. Chapter III reviews the use of image analysis in educational studies. Chapter IV demonstrates how GPT-4V can be used for each research usage reviewed in Chapter III, with operating prompts provided. Finally, Chapter V discusses future implications.
https://arxiv.org/abs/2405.07163
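For scholars who want to try VQA on their own image data, a minimal call to a hosted vision-language model might look like the sketch below, using the OpenAI Python SDK's chat-completions interface. The model identifier, image URL, and question are placeholders (model names change over time), and this is not one of the operating prompts provided in the paper.

```python
# Hedged sketch: asking a vision-language model a question about an image via the
# OpenAI Python SDK (openai >= 1.0). Model id, image URL, and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # placeholder vision-capable model id; check current models
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What classroom activity is shown in this photo?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/classroom.jpg"}},
        ],
    }],
    max_tokens=200,
)
print(response.choices[0].message.content)
```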
This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics. We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values, adapting to varied cultural contexts to promote transparency and trust among users. The methodology involves detailed modeling of emotions, classification of linguistic behaviors, and implementation of ethical guardrails. Our innovative approaches include mapping emotions and behaviors using self-supervised learning techniques, refining these guardrails through adversarial reviews, and systematically adjusting outputs to ensure ethical alignment. This framework establishes a robust foundation for AI systems to operate with ethical integrity and cultural sensitivity, paving the way for more responsible and context-aware AI interactions.
https://arxiv.org/abs/2405.07076
While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems before showcasing the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.
https://arxiv.org/abs/2405.06909
LaTeX is one of the main tools for writing academic papers. When writing a paper, abundant, well-chosen visual or graphic components convey more information than textual data alone. However, most LaTeX graphic items are implemented as static elements, which limits their ability to present more informative figures or tables with an interactive reading experience. To address this problem, we propose LIVE, a novel design idea for interactive LaTeX graphic items. To give a lucid presentation of the main idea of LIVE, we designed several representative interactive implementations and explain their basic principles. Using LIVE, one can design more graphic items, which we call Gitems, and easily and automatically obtain the mutual-citation relationships within a specific range of papers, which adds more vitality and expressiveness to the writing of traditional papers, especially review papers. To vividly demonstrate the functions of LIVE, we use the NeRF papers as the example reference papers. The code of the implementation project is open source.
https://arxiv.org/abs/2405.06762