A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in interesting phenomena related to learned features in deep learning and transfer learning, known as Neural Collapse (NC). Within the framework of NC, we establish an error bound indicating that the misclassification error is independent of dimension when the distance between actual features and the ideal ones is smaller than a threshold. Additionally, the quality of the features in the last layer is empirically evaluated under different pre-trained models within the framework of NC, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust compared to fine-tuning without DP, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or employing dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by conducting PCA on the last-layer features.
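The PCA step on last-layer features described above can be sketched with a plain SVD-based projection. This is a generic illustration of dimension reduction on a feature matrix, not the paper's exact pipeline; the feature dimensions and component count are arbitrary toy values.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project features onto their top principal components.

    features: (n_samples, dim) array of last-layer features.
    Returns the (n_samples, n_components) projection.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # SVD of the centered data; rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy example: 100 synthetic 512-dim features reduced to 32 dimensions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 512))
reduced = pca_reduce(feats, 32)
```

The reduced features would then feed the DP fine-tuning step in place of the raw high-dimensional ones.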
https://arxiv.org/abs/2405.08920
Purpose: To introduce a deep learning model capable of multi-organ segmentation in MRI scans, offering a solution to the current limitations in MRI analysis due to challenges in resolution, standardized intensity values, and variability in sequences. Materials and Methods: The model was trained on 1,200 manually annotated MRI scans from the UK Biobank, 221 in-house MRI scans, and 1,228 CT scans, leveraging cross-modality transfer learning from CT segmentation models. A human-in-the-loop annotation workflow was employed to efficiently create high-quality segmentations. The model's performance was evaluated on the NAKO and AMOS22 datasets, containing 600 and 60 MRI examinations, respectively. Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) were used to assess segmentation accuracy. The model will be open sourced. Results: The model showcased high accuracy in segmenting well-defined organs, achieving DSC scores of 0.97 for the right and left lungs, and 0.95 for the heart. It also demonstrated robustness in organs that present more variability, such as the liver (DSC: 0.96) and kidneys (DSC: 0.95 left, 0.95 right). However, segmentation of smaller and more complex structures, such as the portal and splenic veins (DSC: 0.54) and adrenal glands (DSC: 0.65 left, 0.61 right), revealed the need for further model optimization. Conclusion: The proposed model is a robust tool for accurate segmentation of 40 anatomical structures in MRI and CT images. By leveraging cross-modality learning and interactive annotation, the model achieves strong performance and generalizability across diverse datasets, making it a valuable resource for researchers and clinicians. It is open source and can be downloaded from this https URL.
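The Dice Similarity Coefficient used for evaluation above has a short, standard definition: twice the overlap of two binary masks divided by their total size. A minimal sketch on toy masks (the mask shapes here are illustrative, not from the paper):

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    # Convention: two empty masks are a perfect match.
    return 2.0 * intersection / denom if denom else 1.0

# Two overlapping 4x4 squares on an 8x8 grid: 16 px each, 9 px overlap.
a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True
score = dice_coefficient(a, b)  # 2 * 9 / 32 = 0.5625
```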
https://arxiv.org/abs/2405.06463
In 2020, prostate cancer saw a staggering 1.4 million new cases, resulting in over 375,000 deaths. The accurate identification of clinically significant prostate cancer is crucial for delivering effective treatment to patients. Consequently, there has been a surge in research exploring the application of deep neural networks to predict clinical significance based on magnetic resonance images. However, these networks demand extensive datasets to attain optimal performance. Recently, transfer learning emerged as a technique that leverages acquired features from a domain with richer data to enhance the performance of a domain with limited data. In this paper, we investigate the improvement of clinically significant prostate cancer prediction in T2-weighted images through transfer learning from breast cancer. The results demonstrate a remarkable improvement of over 30% in leave-one-out cross-validation accuracy.
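Leave-one-out cross-validation, the evaluation protocol cited above, trains on all samples but one and tests on the held-out sample, cycling through the whole dataset. A minimal sketch with a nearest-centroid classifier standing in for the paper's network (the classifier and toy data are illustrative assumptions, not the authors' model):

```python
import numpy as np

def loo_cv_accuracy(X, y):
    """Leave-one-out cross-validation accuracy with a nearest-centroid classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        X_tr, y_tr = X[mask], y[mask]
        # Fit: one centroid per class from the training fold.
        classes = np.unique(y_tr)
        centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
        # Predict the held-out sample and score it.
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(pred == y[i])
    return correct / n

# Two well-separated toy clusters -> LOOCV accuracy should be perfect.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
acc = loo_cv_accuracy(X, y)
```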
https://arxiv.org/abs/2405.07869
Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called ``UA Free Living Study", which uses an egocentric wearable camera, AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.
https://arxiv.org/abs/2405.07827
In Magnetic Resonance Imaging (MRI), image acquisitions are often undersampled in the measurement domain to accelerate the scanning process, at the expense of image quality. However, image quality is a crucial factor that influences the accuracy of clinical diagnosis; hence, high-quality image reconstruction from undersampled measurements has been a key area of research. Recently, deep learning (DL) methods have emerged as the state-of-the-art for MRI reconstruction, typically involving deep neural networks to transform undersampled MRI images into high-quality MRI images through data-driven processes. Nevertheless, there is clear and significant room for improvement in undersampled DL MRI reconstruction to meet the high standards required for clinical diagnosis, in terms of eliminating aliasing artifacts and reducing image noise. In this paper, we introduce a self-supervised pretraining procedure using contrastive learning to improve the accuracy of undersampled DL MRI reconstruction. We use contrastive learning to transform the MRI image representations into a latent space that maximizes mutual information among different undersampled representations and optimizes the information content at the input of the downstream DL reconstruction models. Our experiments demonstrate improved reconstruction accuracy across a range of acceleration factors and datasets, both quantitatively and qualitatively. Furthermore, our extended experiments validate the proposed framework's robustness under adversarial conditions, such as measurement noise, different k-space sampling patterns, and pathological abnormalities, and also prove the transfer learning capabilities on MRI datasets with completely different anatomy. Additionally, we conducted experiments to visualize and analyze the properties of the proposed MRI contrastive learning latent space.
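Contrastive pretraining of the kind described above is commonly formulated as an InfoNCE loss: paired views of the same image should be more similar in the latent space than views of different images. The sketch below is the standard InfoNCE formulation in numpy, not necessarily the paper's exact loss; the embedding sizes and temperature are arbitrary.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss for paired embeddings (row i of z1 matches row i of z2)."""
    # L2-normalize so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_probs))

# Two "views" per sample: the positive is a slightly perturbed anchor.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 16))
positive = anchor + 0.01 * rng.normal(size=(4, 16))
loss = info_nce_loss(anchor, positive)
```

With nearly identical views the positives dominate the softmax, so the loss is close to zero; dissimilar pairs would push it up.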
https://arxiv.org/abs/2306.00530
In the realm of practical fine-grained visual classification applications rooted in deep learning, a common scenario involves training a model using a pre-existing dataset. Subsequently, a new dataset becomes available, prompting a pivotal decision for achieving enhanced inference performance on both datasets: should one train from scratch, or fine-tune the model trained on the initial dataset using the newly released dataset? The existing literature reveals a lack of methods to systematically determine the optimal training strategy, a decision that also calls for explainability. To this end, we present an automatic best-suited training solution searching framework, the Dual-Carriageway Framework (DCF), to fill this gap. DCF benefits from a dual-direction search design (starting from either the pre-existing or the newly released dataset) in which five different training settings are enforced. In addition, DCF is not only capable of determining the optimal training strategy while avoiding overfitting but also yields built-in quantitative and visual explanations derived from the actual inputs and weights of the trained model. We validated DCF's effectiveness through experiments with three convolutional neural networks (ResNet18, ResNet34, and Inception-v3) on two temporally continued commercial product datasets. Results showed that fine-tuning pathways outperformed training-from-scratch ones by up to 2.13% and 1.23% in mean accuracy on the pre-existing and new datasets, respectively. Furthermore, DCF identified reflection padding as the superior padding method, enhancing testing accuracy by 3.72% on average. This framework stands out for its potential to guide the development of robust and explainable AI solutions in fine-grained visual classification tasks.
https://arxiv.org/abs/2405.05853
Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models. Contemporary MI attacks have achieved impressive attack performance, posing serious threats to privacy. Meanwhile, all existing MI defense methods rely on regularization that is in direct conflict with the training objective, resulting in noticeable degradation in model utility. In this work, we take a different perspective, and propose a novel and simple Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models. Particularly, by leveraging TL, we limit the number of layers encoding sensitive information from private training dataset, thereby degrading the performance of MI attack. We conduct an analysis using Fisher Information to justify our method. Our defense is remarkably simple to implement. Without bells and whistles, we show in extensive experiments that TL-DMI achieves state-of-the-art (SOTA) MI robustness. Our code, pre-trained models, demo and inverted data are available at: this https URL
https://arxiv.org/abs/2405.05588
Composing poetry or lyrics involves several creative factors, but a challenging aspect of generation is the adherence to a more or less strict metric and rhyming pattern. To address this challenge specifically, previous work on the task has mainly focused on reverse language modeling, which brings the critical selection of each rhyming word to the forefront of each verse. On the other hand, reversing the word order requires that models be trained from scratch with this task-specific goal and cannot take advantage of transfer learning from a Pretrained Language Model (PLM). We propose a novel fine-tuning approach that prepends the rhyming word at the start of each lyric, which allows the critical rhyming decision to be made before the model commits to the content of the lyric (as during reverse language modeling), but maintains compatibility with the word order of regular PLMs as the lyric itself is still generated in left-to-right order. We conducted extensive experiments to compare this fine-tuning against the current state-of-the-art strategies for rhyming, finding that our approach generates more readable text and better rhyming capabilities. Furthermore, we furnish a high-quality dataset in English and 12 other languages, analyse the approach's feasibility in a multilingual context, provide extensive experimental results shedding light on good and bad practices for lyrics generation, and propose metrics to compare methods in the future.
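The preprocessing idea above (surface the rhyming word before the lyric so a left-to-right model commits to the rhyme first) amounts to a simple string transformation of the training data. A minimal sketch; the `<rhyme: …>` tag format is an illustrative assumption, not the paper's exact markup:

```python
def prepend_rhyme(lyric_lines):
    """Prefix each lyric with its final (rhyming) word so a left-to-right
    language model sees the rhyme decision before generating the content."""
    formatted = []
    for line in lyric_lines:
        words = line.split()
        # Strip trailing punctuation so the tag holds just the rhyme word.
        rhyme = words[-1].strip(".,!?;:").lower()
        formatted.append(f"<rhyme: {rhyme}> {line}")
    return formatted

verse = ["The night is falling slow", "And the city lights below"]
out = prepend_rhyme(verse)
```

The lyric itself is still generated left-to-right after the tag, which is what keeps the format compatible with ordinary pretrained language models.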
https://arxiv.org/abs/2405.05176
Wound healing is a complex process involving changes in collagen fibers. Accurate monitoring of these changes is crucial for assessing the progress of wound healing and has significant implications for guiding clinical treatment strategies and drug screening. However, traditional quantitative analysis methods focus on spatial characteristics such as collagen fiber alignment and variance, lacking threshold standards to differentiate between different stages of wound healing. To address this issue, we propose an innovative approach based on deep learning to predict the progression of wound healing by analyzing collagen fiber features in histological images of wound tissue. Leveraging the unique learning capabilities of deep learning models, our approach captures the feature variations of collagen fibers in histological images from different categories and classifies them into various stages of wound healing. To overcome the limited availability of histological image data, we employ a transfer learning strategy. Specifically, we fine-tune a VGG16 model pretrained on the ImageNet dataset to adapt it to the classification task of histological images of wounds. Through this process, our model achieves 82% accuracy in classifying six stages of wound healing. Furthermore, to enhance the interpretability of the model, we employ a class activation mapping technique called LayerCAM. LayerCAM reveals the image regions on which the model relies when making predictions, providing transparency to the model's decision-making process. This visualization not only helps us understand how the model identifies and evaluates collagen fiber features but also enhances trust in the model's prediction results. To the best of our knowledge, our proposed model is the first deep learning-based classification model used for predicting wound healing stages.
https://arxiv.org/abs/2405.05297
To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning, aiming to extract useful knowledge from the image domain and apply it to the motion domain. These motion patches, created by dividing and sorting skeleton joints based on body parts in motion sequences, are robust to varying skeleton structures, and can be regarded as color image patches in ViT. We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis, presenting a promising direction for addressing the issue of limited motion data. Our extensive experiments show that the proposed motion patches, used jointly with ViT, achieve state-of-the-art performance in the benchmarks of text-to-motion retrieval, and other novel challenging tasks, such as cross-skeleton recognition, zero-shot motion classification, and human interaction recognition, which are currently impeded by the lack of data.
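Structurally, a "motion patch" groups joint trajectories by body part so each group can be treated like an image patch by a ViT. The sketch below shows that grouping step only; the 15-joint skeleton and the body-part assignment are hypothetical, not the paper's exact layout:

```python
import numpy as np

# Hypothetical grouping of 15 skeleton joints into 5 body parts.
BODY_PARTS = {
    "torso":     [0, 1, 2],
    "left_arm":  [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg":  [9, 10, 11],
    "right_leg": [12, 13, 14],
}

def motion_patches(motion):
    """Split a (T, J, 3) joint-trajectory array into per-body-part patches.

    Each patch is (T, joints_in_part, 3): frames as one axis, joints as
    another, and xyz coordinates playing the role of RGB channels.
    """
    return [motion[:, idx, :] for idx in BODY_PARTS.values()]

rng = np.random.default_rng(0)
seq = rng.normal(size=(16, 15, 3))   # 16 frames, 15 joints, xyz
patches = motion_patches(seq)
```

Because the grouping is by body part rather than by joint index, the same patch layout can be produced from skeletons with different joint counts, which is what gives the representation its cross-skeleton robustness.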
https://arxiv.org/abs/2405.04771
The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.
https://arxiv.org/abs/2405.04760
With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge, organized as a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results. Initially, we enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref. Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG, OpenAlex, and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process and thereby facilitating researchers in efficiently locating relevant resources.
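The weighted F1-score reported above is the per-class F1 averaged with weights proportional to each class's support, which keeps large classes from being drowned out by the 123-class long tail. A minimal reference implementation on toy labels:

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    classes, support = np.unique(y_true, return_counts=True)
    score = 0.0
    for c, n_c in zip(classes, support):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_c / len(y_true)) * f1
    return score

y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
f1w = weighted_f1(y_true, y_pred)  # (3*0.8 + 2*0.8 + 1*1.0) / 6 = 5/6
```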
https://arxiv.org/abs/2405.04136
Air pollution is a significant health concern worldwide, contributing to various respiratory diseases. Advances in air quality mapping, driven by the emergence of smart cities and the proliferation of Internet-of-Things sensor devices, have led to an increase in available data, fueling momentum in air pollution forecasting. The objective of this study is to devise an integrated approach for predicting air quality using image data and subsequently assessing lung disease severity based on the Air Quality Index (AQI). The aim is to refine existing techniques to improve accuracy in predicting AQI and lung disease severity, and to forecast additional atmospheric pollutants (AQI, PM10, O3, CO, SO2, NO2) in addition to PM2.5 levels. Additionally, the study compares the proposed approach with existing methods to show its effectiveness. The approach uses a VGG16 model for feature extraction from images and a neural network for predicting AQI. For predicting lung disease severity, Support Vector Classifier (SVC) and K-Nearest Neighbors (KNN) algorithms are utilized. The neural network model for predicting AQI achieved a training accuracy of 88.54% and a testing accuracy of 87.44%, while the KNN model used for predicting lung disease severity achieved a training accuracy of 98.4% and a testing accuracy of 97.5%. In conclusion, the integrated approach presented in this study forecasts air quality and evaluates lung disease severity, achieving high testing accuracies of 87.44% for AQI and 97.5% for lung disease severity using neural network, KNN, and SVC models. Future work involves implementing transfer learning and advanced deep learning modules to enhance prediction capabilities. While the current study focuses on India, the objective is to expand its scope to global coverage.
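The KNN step above classifies a sample by majority vote among its k nearest training samples. A minimal sketch; the AQI values and severity classes below are illustrative toy data, not the study's dataset:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority-vote class of the k nearest training samples (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Toy AQI readings mapped to severity classes 0 (low), 1 (moderate), 2 (severe).
X = np.array([[40.0], [55.0], [120.0], [140.0], [310.0], [350.0]])
y = np.array([0, 0, 1, 1, 2, 2])
pred = knn_predict(X, y, np.array([130.0]), k=3)  # neighbors 120, 140, 55 -> class 1
```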
https://arxiv.org/abs/2405.03981
Generative foundation models like Stable Diffusion comprise a diverse spectrum of knowledge in computer vision with the potential for transfer learning, e.g., via generating data to train student models for downstream tasks. This could circumvent the necessity of collecting labeled real-world data, thereby presenting a form of data-free knowledge distillation. However, the resultant student models show a significant drop in accuracy compared to models trained on real data. We investigate possible causes for this drop and focus on the role of the different layers of the student model. By training these layers using either real or synthetic data, we reveal that the drop mainly stems from the model's final layers. Further, we briefly investigate other factors, such as differences in data normalization between synthetic and real data, the impact of data augmentations, texture vs. shape learning, and assuming oracle prompts. While we find that some of these factors can have an impact, they are not sufficient to close the gap towards real data. Building upon our insight that mainly the later layers are responsible for the drop, we investigate the data-efficiency of fine-tuning a synthetically trained model with real data applied to only those last layers. Our results suggest an improved trade-off between the amount of real training data used and the model's accuracy. Our findings contribute to the understanding of the gap between synthetic and real data and indicate solutions to mitigate the scarcity of labeled real data.
https://arxiv.org/abs/2405.03243
Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.
https://arxiv.org/abs/2405.02698
Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images plays a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground, and they can effectively transfer the knowledge to the target fruit dataset.
https://arxiv.org/abs/2405.02556
In this paper, we propose a novel model for a malware classification system based on Application Programming Interface (API) calls and opcodes, to improve classification accuracy. The system uses a novel design combining a Convolutional Neural Network and Long Short-Term Memory. We extract opcode sequences and API calls from Windows malware samples for classification, and transform these features into N-gram sequences (N = 2, 3, and 10). Our experiments on a dataset of 9,749,57 samples produce a high accuracy of 99.91% using the 8-gram sequences. Our method significantly improves malware classification performance relative to a wide range of recent deep learning architectures, leading to state-of-the-art performance. In particular, we experiment with ConvNeXt-T, ConvNeXt-S, RegNetY-4GF, RegNetY-8GF, RegNetY-12GF, EfficientNetV2, Sequencer2D-L, Swin-T, ViT-G/14, ViT-Ti, ViT-S, ViT-B, ViT-L, and MaxViT-B. Among these architectures, Swin-T and Sequencer2D-L achieved high accuracies of 99.82% and 99.70%, respectively, comparable to our CNN-LSTM architecture although not surpassing it.
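N-gram extraction over a token sequence (here, opcodes) is a short sliding-window operation. A minimal sketch with illustrative opcode tokens:

```python
def ngrams(sequence, n):
    """All contiguous n-grams of a token sequence (opcodes or API calls)."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

opcodes = ["push", "mov", "call", "mov", "ret"]
bigrams = ngrams(opcodes, 2)
# [('push', 'mov'), ('mov', 'call'), ('call', 'mov'), ('mov', 'ret')]
```

The resulting n-gram sequences are what a CNN-LSTM style model would consume, typically after hashing or vocabulary-indexing each n-gram into an integer ID.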
https://arxiv.org/abs/2405.02548
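The opcode and API-call features described above are turned into N-gram sequences before classification. A minimal, stdlib-only sketch of just that extraction step, using a made-up opcode sequence (the CNN-LSTM classifier itself is omitted):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Slide a window of size n over an opcode/API-call sequence and count n-grams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy opcode sequence from a hypothetical disassembled sample.
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]

bigrams = ngram_counts(opcodes, 2)
print(bigrams[("push", "mov")])   # the pair "push mov" occurs twice
```

In the paper's pipeline these n-gram sequences, rather than raw token streams, are what the neural architectures consume.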
Currently, foundation models, exemplified by large language models, have made dramatic progress and are used across a very wide range of domains, including 2D and 3D vision. As one of the important application domains of foundation models, earth observation has attracted attention and various approaches have been developed. When earth observation is treated as a single capture, the imagery can be processed as an image with three or more channels; when multiple captures with different timestamps exist at one location, the temporal observations can be treated as a sequence of images resembling video frames or medical scan slices. This paper presents Spatio-Temporal SwinMAE (ST-SwinMAE), an architecture that focuses on representation learning for spatio-temporal image processing. Specifically, it uses a hierarchical Masked Auto-encoder (MAE) with Video Swin Transformer blocks. With this architecture, we present a pretrained model named Degas 100M as a geospatial foundation model. We also propose an approach for transfer learning with Degas 100M in which both the pretrained encoder and decoder of the MAE are utilized, with skip connections added between them to enable multi-scale information exchange, forming an architecture named Spatio-Temporal SwinUNet (ST-SwinUNet). Our approach shows significant performance improvements over existing state-of-the-art foundation models. Specifically, for transfer learning on the land-cover downstream task of the PhilEO Bench dataset, it achieves 10.4% higher accuracy on average than other geospatial foundation models.
Currently, foundation models represented by large language models have made remarkable progress and are widely applied in fields including 2D and 3D vision. As an important application domain of foundation models, earth observation has attracted attention and various methods have been developed. When earth observation is treated as a single image capture, earth observation imagery can be processed as an image with three or more channels; when multiple captures with different timestamps exist at one location, the temporal observations can be treated as a series of continuous images, similar to video frames or medical scan slices. This paper introduces an architecture named Spatio-Temporal SwinMAE (ST-SwinMAE), which focuses on representation learning for spatio-temporal image processing. Specifically, it uses a hierarchical Masked Auto-encoder (MAE) with Video Swin Transformer blocks. With this architecture, we present a pretrained model named Degas 100M as a geospatial foundation model. We also propose a transfer learning approach with Degas 100M in which both the pretrained encoder and decoder of the MAE are used, with skip connections added between them to achieve multi-scale information exchange, forming an architecture named Spatio-Temporal SwinUNet (ST-SwinUNet). Our method shows significant performance improvements over existing foundation models. Specifically, for transfer learning on the land-cover downstream task on the PhilEO Bench dataset, it is on average 10.4% more accurate than other geospatial foundation models.
https://arxiv.org/abs/2405.02512
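The core mechanism of a masked auto-encoder is hiding most patch tokens from the encoder. The sketch below shows only that masking step for a spatio-temporal token grid, using an assumed 75% mask ratio and a toy grid size; it is not the paper's ST-SwinMAE implementation.

```python
import random

def mask_spatiotemporal_tokens(t_frames, h_patches, w_patches, mask_ratio, seed=0):
    """Randomly choose which (time, row, col) patch tokens an MAE hides from the encoder."""
    tokens = [(t, h, w) for t in range(t_frames)
                        for h in range(h_patches)
                        for w in range(w_patches)]
    rng = random.Random(seed)          # seeded for reproducibility
    rng.shuffle(tokens)
    n_masked = int(len(tokens) * mask_ratio)
    masked = set(tokens[:n_masked])    # reconstructed by the decoder
    visible = tokens[n_masked:]        # the only tokens the encoder sees
    return masked, visible

# 4 timestamps of an 8x8 patch grid, with 75% of tokens masked.
masked, visible = mask_spatiotemporal_tokens(4, 8, 8, 0.75)
print(len(masked), len(visible))       # 192 masked, 64 visible
```

Because masking is drawn jointly over time and space, the decoder must reconstruct a patch from both neighboring locations and neighboring timestamps, which is what pushes the encoder toward spatio-temporal representations.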
The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, there is still room to improve the performance of these methods. In this paper, we present GMP-ATL (Gender-augmented Multi-scale Pseudo-label Adaptive Transfer Learning), a novel HuBERT-based adaptive transfer learning framework for SER. Specifically, GMP-ATL initially employs the pre-trained HuBERT, implementing multi-task learning and multi-scale k-means clustering to acquire frame-level gender-augmented multi-scale pseudo-labels. Then, to fully leverage both the obtained frame-level and utterance-level emotion labels, we incorporate model retraining and fine-tuning methods to further optimize GMP-ATL. Experiments on IEMOCAP show that our GMP-ATL achieves superior recognition performance, with a WAR of 80.0% and a UAR of 82.0%, surpassing state-of-the-art unimodal SER methods while also yielding results comparable to multimodal SER approaches.
The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, these methods still have considerable room for improvement. In this paper, we propose GMP-ATL (Gender-augmented Multi-scale Pseudo-label Adaptive Transfer Learning), a novel HuBERT-based adaptive transfer learning framework for SER. Specifically, GMP-ATL first employs the pre-trained HuBERT, applying multi-task learning and multi-scale k-means clustering to obtain frame-level gender-augmented multi-scale pseudo-labels. Then, to fully exploit the obtained frame-level and utterance-level emotion labels, we introduce model retraining and fine-tuning methods to further optimize GMP-ATL. Experiments on IEMOCAP show that GMP-ATL achieves excellent recognition performance, with a weighted average recall (WAR) of 80.0% and an unweighted average recall (UAR) of 82.0%, surpassing current state-of-the-art unimodal SER methods while remaining comparable to multimodal SER approaches.
https://arxiv.org/abs/2405.02151
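To make the multi-scale pseudo-label idea concrete, here is a stdlib-only sketch that clusters toy scalar frame-level features with plain k-means at two scales (k = 2 and k = 3) and uses the cluster ids as pseudo-labels. The real GMP-ATL clusters high-dimensional HuBERT features, so the data and feature dimensionality here are illustrative assumptions.

```python
def kmeans_1d(values, k, iters=20):
    """Plain k-means on scalar frame-level features; cluster ids act as pseudo-labels."""
    # Spread the initial centers across the sorted value range.
    srt = sorted(values)
    centers = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center wins.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers

# Toy frame-level features from one utterance; clustering at two scales
# (coarse k=2 and finer k=3) yields multi-scale frame-level pseudo-labels.
frames = [0.1, 0.12, 0.09, 0.95, 1.0, 0.98, 0.5]
coarse, _ = kmeans_1d(frames, 2)
fine, _ = kmeans_1d(frames, 3)
print(coarse)
print(fine)
```

Each frame thus receives one pseudo-label per clustering scale, which is the shape of supervision the framework combines with utterance-level emotion labels.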
Soft robotics has emerged as a promising field with the potential to revolutionize industries such as healthcare and manufacturing. However, designing effective soft robots presents challenges, particularly in managing the complex interplay of material properties, structural design, and control strategies. Traditional design methods are often time-consuming and may not yield optimal designs. In this paper, we explore the use of generative AI to create 3D models of soft actuators. We create a dataset of over 70 text-shape pairings of soft pneumatic robot actuator designs, and adapt a latent diffusion model (SDFusion) to learn the data distribution and generate novel designs from it. By employing transfer learning and data augmentation techniques, we significantly improve the performance of the diffusion model. These findings highlight the potential of generative AI in designing complex soft robotic systems, paving the way for future advancements in the field.
Soft robotics has emerged as a promising field with the potential to revolutionize industries such as healthcare and manufacturing. However, designing effective soft robots is challenging, particularly when managing the complex interplay of material properties, structural design, and control strategies. Traditional design methods are often time-consuming and may not yield optimal designs. In this paper, we explore the use of generative AI to create 3D models of soft actuators. We build a dataset of over 70 text-shape pairs of soft pneumatic robot actuator designs and use a latent diffusion model (SDFusion) to learn the data distribution and generate novel designs from it. By employing transfer learning and data augmentation techniques, we significantly improve the performance of the diffusion model. These findings highlight the potential of generative AI for designing complex soft robotic systems, paving the way for future advances in the field.
https://arxiv.org/abs/2405.01824
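As a toy illustration of the data augmentation mentioned above, the sketch below generates rotated and mirrored variants of a 2-D occupancy slice of a design. The actual work augments 3-D shape representations for SDFusion, so the grid, its size, and the choice of symmetries here are stand-in assumptions.

```python
def rotate90(grid):
    """Rotate a 2-D occupancy/SDF slice by 90 degrees (rows become columns)."""
    return [list(row) for row in zip(*grid[::-1])]

def augment(grid):
    """Generate the 4 rotations of one design slice plus their mirror images."""
    variants = []
    g = grid
    for _ in range(4):
        variants.append(g)
        variants.append([row[::-1] for row in g])  # horizontal flip
        g = rotate90(g)
    return variants

# A tiny 2x2 stand-in for one slice of an actuator's voxelized shape.
slice_ = [[1, 0],
          [0, 0]]
print(len(augment(slice_)))  # 8 variants (not necessarily all distinct)
```

Augmentations like these multiply a small dataset (70-odd pairs here) without new annotation effort, which matters most when fine-tuning a generative model on scarce domain data.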