This study evaluates the performance of Recurrent Neural Network (RNN) and Transformer architectures in replicating cross-language structural priming: a key indicator of abstract grammatical representations in human language processing. Focusing on Chinese-English priming, which involves two typologically distinct languages, we examine how these models handle the robust phenomenon of structural priming, where exposure to a particular sentence structure increases the likelihood of selecting a similar structure subsequently. Additionally, we utilize large language models (LLMs) to measure the cross-lingual structural priming effect. Our findings indicate that Transformers outperform RNNs in generating primed sentence structures, challenging the conventional belief that human sentence processing primarily involves recurrent and immediate processing, and suggesting a role for cue-based retrieval mechanisms. Overall, this work contributes to our understanding of how computational models may reflect human cognitive processes in multilingual contexts.
https://arxiv.org/abs/2405.09508
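One common way to quantify a structural priming effect with a language model (an illustrative formulation, not necessarily the exact metric used in the paper above) is a difference of log-probability preferences: does an A-structured prime raise the model's relative score of an A-structured target? A minimal sketch, where `score` is any user-supplied log-probability function:

```python
def priming_effect(score, prime_a, prime_b, target_a, target_b):
    """Structural priming effect as a difference of log-probability
    preferences. `score(context, sentence)` returns the model's
    log-probability of `sentence` given `context`.

    A positive value means the model prefers an A-structured target
    more after an A-structured prime than after a B-structured prime.
    """
    pref_after_a = score(prime_a, target_a) - score(prime_a, target_b)
    pref_after_b = score(prime_b, target_a) - score(prime_b, target_b)
    return pref_after_a - pref_after_b
```

For cross-language priming, the prime would be, e.g., a Chinese sentence and the targets the two English structural alternants, with `score` backed by an RNN, Transformer, or LLM.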
Recent advances in aerial robotics have enabled the use of multirotor vehicles for autonomous payload transportation. Resorting only to classical methods to reliably model a quadrotor carrying a cable-slung load poses significant challenges. On the other hand, purely data-driven learning methods do not, by design, comply with the problem's physical constraints, especially in states that are not densely represented in training data. In this work, we explore the use of physics-informed neural networks to learn an end-to-end model of the multirotor-slung-load system and, at a given time, estimate a sequence of future system states. An LSTM encoder-decoder with an attention mechanism is used to capture the dynamics of the system. To guarantee cohesiveness between the multiple predicted states of the system, we propose the use of a physics-based term in the loss function, which includes a discretized physical model derived from first principles, together with slack variables that allow for a small mismatch between expected and predicted values. To train the model, a dataset using a real-world quadrotor carrying a slung load was curated and is made available. Prediction results are presented and corroborate the feasibility of the approach. The proposed method outperforms both the first-principles physical model and a comparable neural network model trained without the proposed physics regularization.
https://arxiv.org/abs/2405.09428
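The physics-based loss with slack variables described above can be sketched as follows. This is a minimal NumPy illustration assuming a forward-Euler discretization and illustrative weights `lam` and `mu`; the paper derives its own discretized model from first principles:

```python
import numpy as np

def physics_informed_loss(pred, target, f_dynamics, dt, slack, lam=1.0, mu=10.0):
    """Data term plus a discretized-physics consistency term with slack.
    pred, target: (T, d) state sequences; f_dynamics(x) -> dx/dt."""
    data_term = np.mean((pred - target) ** 2)
    # Forward-Euler consistency between consecutive predicted states,
    # relaxed by slack variables that absorb a small model mismatch.
    residual = pred[1:] - (pred[:-1] + dt * f_dynamics(pred[:-1])) - slack
    physics_term = np.mean(residual ** 2)
    slack_penalty = np.mean(slack ** 2)  # keep the slacks small
    return data_term + lam * physics_term + mu * slack_penalty
```

With predictions that exactly satisfy the discretized dynamics and zero slack, the loss reduces to the pure data term.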
Synthesizing high-quality photorealistic images conditioned on textual descriptions is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between images and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GANs to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT with a recurrent neural network (RAT) to ensure that different layers can access global information. We then introduce shuffle attention between RAT layers to mitigate the information forgetting characteristic of recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model CLIP, which has been extensively employed for establishing associations between text and images through the learning of multimodal representations in latent space. The discriminator utilizes CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments have been conducted on the CUB, Oxford, and CelebA-tiny datasets to demonstrate the superiority of the proposed model over current state-of-the-art models. The code is this https URL.
https://arxiv.org/abs/2405.08114
Mamba, an architecture with an RNN-like token mixer, the state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification aligns with neither characteristic, we hypothesize that Mamba is not necessary for this task; detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks. To empirically verify our hypotheses, we construct a series of models named \emph{MambaOut} through stacking Mamba blocks while removing their core token mixer, SSM. Experimental results strongly support our hypotheses. Specifically, our MambaOut model surpasses all visual Mamba models on ImageNet image classification, indicating that Mamba is indeed unnecessary for this task. As for detection and segmentation, MambaOut cannot match the performance of state-of-the-art visual Mamba models, demonstrating the potential of Mamba for long-sequence visual tasks. The code is available at this https URL
https://arxiv.org/abs/2405.07992
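A MambaOut-style block, i.e. a Mamba block with its SSM token mixer removed, is essentially a gated convolution block. A minimal NumPy sketch of the general recipe (the projection shapes, sigmoid gating, and depthwise causal convolution are illustrative assumptions, not the paper's exact block):

```python
import numpy as np

def mambaout_block(x, w_in, w_gate, w_out, kernel):
    """Gated CNN block: a Mamba block with the SSM token mixer removed.
    x: (T, d) token sequence; kernel: 1-D causal depthwise filter taps."""
    h = x @ w_in                        # value branch
    g = x @ w_gate                      # gate branch
    # Depthwise causal convolution along the token axis (minimal mixing).
    T, d = h.shape
    k = len(kernel)
    padded = np.vstack([np.zeros((k - 1, d)), h])
    conv = sum(kernel[i] * padded[i : i + T] for i in range(k))
    gated = conv * (1.0 / (1.0 + np.exp(-g)))  # sigmoid (GLU-style) gating
    return x + gated @ w_out            # residual connection
```

Without the SSM, only the short causal convolution mixes tokens, which is the point of the ablation: classification accuracy barely needs more.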
The present document delineates the analysis, design, implementation, and benchmarking of various neural network architectures within a short-term frequency prediction system for the foreign exchange market (FOREX). Our aim is to simulate the judgment of the human expert (technical analyst) using a system that responds promptly to changes in market conditions, thus enabling the optimization of short-term trading strategies. We designed and implemented a series of LSTM neural network architectures, which take the exchange rate values as input and generate a short-term market trend forecasting signal, as well as a custom ANN architecture based on technical analysis indicator simulators. We performed a comparative analysis of the results and came to useful conclusions regarding the suitability of each architecture and the cost, in terms of time and computational power, of implementing them. The custom ANN architecture produces better prediction quality with higher sensitivity while using fewer resources and less time than the LSTM architectures. The custom ANN architecture appears to be ideal for use in low-power computing systems and for use cases that need fast decisions at the least possible computational cost.
https://arxiv.org/abs/2405.08045
Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture that draws on both KANs and LSTMs: the Temporal Kolmogorov-Arnold Network (TKAN). TKANs combine the strengths of both networks: they are composed of Recurrent Kolmogorov-Arnold Network (RKAN) layers with embedded memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more-than-one-step-ahead forecasting.
https://arxiv.org/abs/2405.07344
Spelling correction is the task of identifying spelling mistakes, typos, and grammatical mistakes in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction that uses different seq2seq model architectures, such as the Recurrent Neural Network (RNN) and the Transformer, with artificial data generation for error injection, trained on more than 6.9 million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved a 4.8% word error rate (WER) and a 1.11% character error rate (CER), against labeled data with 29.72% WER and 5.03% CER. Our approach also achieved 10.65% WER and 2.9% CER, against labeled data with 50.94% WER and 10.02% CER. Both results were obtained on a test set of 100K sentences.
https://arxiv.org/abs/2405.06981
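WER and CER, the two metrics reported above, are both normalized Levenshtein distances computed over word and character sequences respectively; a self-contained sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds the diagonal cell d[i-1][j-1]; d[j] is d[i-1][j].
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(ref, hyp):
    """Word error rate: word-level edits normalized by reference length."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: character-level edits normalized likewise."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```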
This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories. Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks. Initial experiments are conducted using a dataset of 900 audio clips, labeled according to the emotional quadrants. We compare the performance of our neural network models against a set of baseline classifiers and analyze their effectiveness in capturing the temporal dynamics inherent in musical expression. The results indicate that simpler RNN architectures may perform comparably or even superiorly to more complex models, particularly on smaller datasets. We also applied these experiments to larger datasets: one augmented from our original dataset, and another drawn from other sources. This research not only enhances our understanding of the emotional impact of music but also demonstrates the potential of neural networks in creating more personalized and emotionally resonant music recommendation and therapy systems.
https://arxiv.org/abs/2405.06747
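Russell's model places emotions on a valence-arousal plane, and the four quadrants used for labeling can be sketched as below. The quadrant names are illustrative placeholders; the paper's exact labels may differ:

```python
def russell_quadrant(valence, arousal):
    """Map a (valence, arousal) pair to one of Russell's four quadrants.
    Values are assumed centered at 0 (e.g. scaled to [-1, 1])."""
    if valence >= 0 and arousal >= 0:
        return "happy/excited"   # Q1: positive valence, high arousal
    if valence < 0 and arousal >= 0:
        return "angry/tense"     # Q2: negative valence, high arousal
    if valence < 0:
        return "sad/depressed"   # Q3: negative valence, low arousal
    return "calm/content"        # Q4: positive valence, low arousal
```

An RNN classifier over Librosa features would then predict one of these four labels per clip.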
Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by proposing novel time-mixing and gating architectures, but pre-training large language models requires significant data and compute investments. Thus, the search for subquadratic architectures is limited by the availability of compute and quality pre-training datasets. As a cost-effective alternative to pre-training linear transformers, we propose Scalable UPtraining for Recurrent Attention (SUPRA). We present a method to uptrain existing large pre-trained transformers into Recurrent Neural Networks (RNNs) with a modest compute budget. This allows us to leverage the strong pre-training data and performance of existing transformer LLMs, while requiring 5% of the training cost. We find that our linearization technique leads to competitive performance on standard benchmarks, but we identify persistent in-context learning and long-context modeling shortfalls for even the largest linear models. Our code and models can be found at this https URL.
https://arxiv.org/abs/2405.06640
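The reason a transformer can be uptrained into an RNN at all is that causal linear attention admits a fixed-size recurrent state. A NumPy sketch of the generic recurrence (SUPRA's actual feature map and normalization details differ; this only illustrates the equivalence):

```python
import numpy as np

def linear_attention_recurrent(q, k, v):
    """Causal linear attention computed as an RNN with fixed-size state.
    q, k: (T, d) feature-mapped queries/keys (assumed non-negative),
    v: (T, d_v). S accumulates outer products k_t v_t^T; z accumulates k_t."""
    T, d = q.shape
    d_v = v.shape[1]
    S = np.zeros((d, d_v))
    z = np.zeros(d)
    out = np.empty((T, d_v))
    for t in range(T):
        S += np.outer(k[t], v[t])          # constant-size recurrent state
        z += k[t]                          # running normalizer
        out[t] = (q[t] @ S) / (q[t] @ z + 1e-9)
    return out
```

The loop produces exactly the same outputs as materializing the full causal attention matrix, but with O(1) state per step, which is what lowers inference cost.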
This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in simulation environments. Our methodology leverages a hierarchical forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. This approach is grounded in multimodal trajectory prediction, using a deep neural network with LSTM-based social pooling to predict the trajectories of surrounding vehicles. These trajectories are used to compute forward-looking risk assessments along the ego vehicle's path, guiding its navigation. Our method aims to replicate human driving behaviors by learning parameters that emulate human decision-making during driving. We ensure that our model exhibits robust generalization capabilities by conducting simulations, employing real-world driving data to validate the accuracy of our approach in modeling human behavior. The results reveal that our model effectively captures human behavior, showcasing its versatility in modeling human drivers in diverse highway scenarios.
https://arxiv.org/abs/2405.06578
The task of medical image recognition is notably complicated by the presence of varied and multiple pathological indications, presenting a unique challenge in multi-label classification with unseen labels. This complexity underlines the need for computer-aided diagnosis methods employing multi-label zero-shot learning. Recent advancements in pre-trained vision-language models (VLMs) have showcased notable zero-shot classification abilities on medical images. However, these methods have limitations on leveraging extensive pre-trained knowledge from broader image datasets, and often depend on manual prompt construction by expert radiologists. By automating the process of prompt tuning, prompt learning techniques have emerged as an efficient way to adapt VLMs to downstream tasks. Yet, existing CoOp-based strategies fall short in performing class-specific prompts on unseen categories, limiting generalizability in fine-grained scenarios. To overcome these constraints, we introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP). Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on prior knowledge of multi-modal features. Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts. Comparative evaluations on various multi-label chest radiograph datasets affirm the superiority of our approach against leading medical vision-language and multi-label prompt learning methods. The source code is available at this https URL
https://arxiv.org/abs/2405.06468
In certain situations, neural networks will represent environment states in their hidden activations. Our goal is to visualize which environment states the networks are representing. We experiment with a recurrent neural network (RNN) architecture with a decoder network at the end. After training, we apply the decoder to the intermediate representations of the network to visualize what they represent. We define a quantitative interpretability metric and use it to demonstrate that hidden states can be highly interpretable on a simple task. We also develop autoencoder and adversarial techniques and show that they benefit interpretability.
https://arxiv.org/abs/2405.06409
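The probing pipeline above can be sketched as: run the trained decoder over every intermediate hidden state, then score the decoded states against ground truth. The mean-squared decoding error used here is an illustrative stand-in for the paper's own interpretability metric:

```python
import numpy as np

def decode_hidden_states(hidden_states, decoder):
    """Apply the trained decoder to each intermediate hidden state to
    visualize what environment state it represents.
    hidden_states: (T, d_h); decoder maps a (d_h,) vector to a state."""
    return np.array([decoder(h) for h in hidden_states])

def interpretability_score(decoded, true_states):
    """Quantitative interpretability metric (an assumption, not the
    paper's exact definition): mean-squared decoding error, where
    lower error means more interpretable hidden states."""
    return float(np.mean((decoded - true_states) ** 2))
```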
In the digital era, with escalating privacy concerns, it's imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformers architecture. Each model presents unique strengths: LSTM models long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models, and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, when compared alongside the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
https://arxiv.org/abs/2405.06709
This paper proposes a system for chord generation for monophonic symbolic melodies, using an LSTM-based model trained on chroma histogram representations of chords. Chroma representations promise harmonically richer generation than chord-label-based approaches, whilst maintaining a small number of dimensions in the dataset. This system is shown to be suitable for limited real-time use. While it does not meet the state of the art for coherent long-term generation, it does show diatonic generation with cadential chord relationships. The need for further study into chroma histograms as an extracted feature in chord generation tasks is highlighted.
https://arxiv.org/abs/2405.05240
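A chroma histogram folds all pitches onto the 12 pitch classes, which is why it stays low-dimensional while remaining harmonically informative; a minimal sketch over MIDI note numbers:

```python
def chroma_histogram(midi_notes):
    """12-bin chroma histogram: fold MIDI note numbers onto pitch classes
    (C=0 ... B=11) and normalize the counts to sum to 1."""
    bins = [0.0] * 12
    for note in midi_notes:
        bins[note % 12] += 1.0
    total = sum(bins)
    return [b / total for b in bins] if total else bins
```

A chord such as a C major triad thus becomes a sparse 12-dimensional vector regardless of octave or voicing, a compact input for the LSTM.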
Recent social media posts on the cholera outbreak in Hammanskraal have highlighted the diverse range of emotions people experienced in response to such an event. The extent of people's opinions varies greatly depending on their level of knowledge and information about the disease. The documented research about cholera lacks investigations into the classification of emotions. This study aims to examine the emotions expressed in social media posts about cholera. A dataset of 23,000 posts was extracted and pre-processed. The Python Natural Language Toolkit (NLTK) sentiment analyzer library was applied to determine the emotional significance of each text. Additionally, Machine Learning (ML) models were applied for emotion classification, including Long Short-Term Memory (LSTM), logistic regression, decision trees, and the Bidirectional Encoder Representations from Transformers (BERT) model. The results of this study demonstrated that LSTM achieved the highest accuracy, of 75%. Emotion classification presents a promising tool for gaining a deeper understanding of the impact of cholera on society. The findings of this study might contribute to the development of effective interventions in public health strategies.
https://arxiv.org/abs/2405.04897
Policymakers frequently analyze air quality and climate change in isolation, disregarding their interactions. This study explores the influence of specific climate factors on air quality by contrasting a regression model with K-Means Clustering, Hierarchical Clustering, and Random Forest techniques. We employ Physics-based Deep Learning (PBDL) and Long Short-Term Memory (LSTM) to examine the air pollution predictions. Our analysis utilizes ten years (2009-2018) of daily traffic, weather, and air pollution data from three major cities in Norway. Findings from feature selection reveal a correlation between rising heating degree days and heightened air pollution levels, suggesting increased heating activities in Norway are a contributing factor to worsening air quality. PBDL demonstrates superior accuracy in air pollution predictions compared to LSTM. This paper contributes to the growing literature on PBDL methods for more accurate air pollution predictions using environmental variables, aiding policymakers in formulating effective data-driven climate policies.
https://arxiv.org/abs/2405.04716
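Heating degree days, the feature the selection step flags above, measure how far the daily mean temperature falls below a heating threshold, summed over days. A minimal sketch (the 17 °C base is a common convention and an assumption here, not necessarily the study's choice):

```python
def heating_degree_days(daily_mean_temps_c, base_c=17.0):
    """Sum of daily shortfalls below the base temperature, in degree-days.
    Days at or above the base contribute nothing; colder days contribute
    (base - temperature), a proxy for heating demand."""
    return sum(max(0.0, base_c - t) for t in daily_mean_temps_c)
```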
We propose a novel tensor network language model based on the simplest tensor network (i.e., tensor trains), called the `Tensor Train Language Model' (TTLM). TTLM represents sentences in an exponential space constructed by the tensor product of words, while computing the probabilities of sentences in a low-dimensional fashion. We demonstrate that the architectures of second-order RNNs, Recurrent Arithmetic Circuits (RACs), and Multiplicative Integration RNNs are, essentially, special cases of TTLM. Experimental evaluations on real language modeling tasks show that the proposed variants of TTLM (i.e., TTLM-Large and TTLM-Tiny) outperform vanilla Recurrent Neural Networks (RNNs) with small numbers of hidden units. (The code is available at this https URL.)
https://arxiv.org/abs/2405.04590
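The low-dimensional trick is that each token indexes a tensor-train core, so scoring a sentence is just a chain of small matrix-vector products, the same multiplicative recurrence that underlies second-order RNNs. An illustrative sketch (the core shapes, initial state, and linear readout are assumptions, not the paper's parameterization):

```python
import numpy as np

def ttlm_log_prob(token_cores, h0, readout, sequence):
    """Score a token sequence with a tensor-train-style recurrence:
    each token x_t selects a core A[x_t], and the hidden state is
    contracted through the chain, h_t = A[x_t] @ h_{t-1}."""
    h = h0
    for x in sequence:
        h = token_cores[x] @ h          # one small matrix-vector product
    return float(np.log(max(readout @ h, 1e-300)))  # clamp avoids log(0)
```

The exponential tensor-product space is never materialized; only the d-dimensional state is carried through the sequence.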
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
https://arxiv.org/abs/2405.04517
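Exponential gating needs a stabilizer to keep `exp` finite; a minimal scalar sketch of one sLSTM-style step with the max-based log-scale stabilizer (the normalizer/stabilizer bookkeeping follows the formulation described above in broad strokes and is an assumption, not a verbatim transcription of the paper's equations):

```python
import math

def slstm_step(c, n, m, z, i_pre, f_pre, o):
    """One scalar sLSTM cell step with exponential gating.
    c: cell state, n: normalizer state, m: stabilizer (running log-scale),
    z: cell input, i_pre/f_pre: pre-activation input/forget gates,
    o: output gate in (0, 1)."""
    m_new = max(f_pre + m, i_pre)        # track the dominant log-scale
    i = math.exp(i_pre - m_new)          # stabilized exponential gates:
    f = math.exp(f_pre + m - m_new)      # arguments are always <= 0
    c_new = f * c + i * z
    n_new = f * n + i                    # normalizer follows the same update
    h = o * (c_new / n_new)              # normalized hidden output
    return c_new, n_new, m_new, h
```

Even with pre-activations of 100, the stabilized gates stay in [0, 1], so the state never overflows.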
We report on the development of an implementable physics-data hybrid dynamic model for an articulated manipulator, to plan and operate in various scenarios. Both physics-based and data-driven dynamic models are studied in this research to select the best model for planning. The physics-based model is constructed using the Lagrangian method, and the loss terms include inertia loss, viscous loss, and friction loss. As for the data-driven model, three methods are explored: DNN, LSTM, and XGBoost. Our modeling results demonstrate that, after comprehensive hyperparameter optimization, the XGBoost architecture outperforms DNN and LSTM in accurately representing manipulator dynamics. The hybrid model with physics-based and data-driven terms has the best performance among all models by the RMSE criterion, and it needs only about 24k training samples. In addition, we developed a virtual force sensor for the manipulator using the observed external torque derived from the dynamic model, and designed a motion planner through the physics-data hybrid dynamic model. The external torque contributes to forces and torque on the end effector, facilitating interaction with the surroundings, while the internal torque governs manipulator motion dynamics and compensates for internal losses. By estimating external torque via the difference between measured joint torque and internal losses, we implement a sensorless control strategy, which is demonstrated through a peg-in-hole task. Lastly, a learning-based motion planner built on the hybrid dynamic model assists in planning time-efficient trajectories for the manipulator. This comprehensive approach underscores the efficacy of integrating physics-based and data-driven models for advanced manipulator control and planning in industrial environments.
https://arxiv.org/abs/2405.04503
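The virtual force sensor idea above reduces to a residual computation: external torque is measured joint torque minus the model's internal torque. A minimal NumPy sketch with a simplified internal-torque model (the inertia/viscous/Coulomb terms here stand in for the paper's full Lagrangian hybrid model):

```python
import numpy as np

def internal_torque(q, dq, ddq, inertia, viscous, coulomb):
    """Simplified Lagrangian-style internal torque: inertia term plus
    viscous and Coulomb friction losses (illustrative stand-in model)."""
    return inertia * ddq + viscous * dq + coulomb * np.sign(dq)

def external_torque(tau_measured, q, dq, ddq, **params):
    """Virtual force sensor: external torque is the residual between the
    measured joint torque and the modeled internal torque."""
    return tau_measured - internal_torque(q, dq, ddq, **params)
```

With no contact, the residual is near zero; a nonzero residual indicates interaction forces at the end effector, enabling sensorless tasks such as peg-in-hole insertion.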
The malware boom poses a danger to cyberspace comparable to the effect of climate change on ecosystems. Despite significant investments in cybersecurity technologies and staff training, the global community remains locked in a perpetual war with cybersecurity threats. The multiform, ever-changing faces of malware continually push the boundaries of the detection and mitigation approaches that cybersecurity practitioners employ to cope with this issue. Older techniques such as signature-based detection and behavioral analysis are slow to adapt to the rapid evolution of malware types. Consequently, this paper proposes the utilization of deep learning models, LSTM networks, and GANs to amplify malware detection accuracy and speed. Leveraging raw bytestream-based data and deep learning architectures, this fast-growing, state-of-the-art approach provides better accuracy and performance than traditional methods. The LSTM and GAN models are integrated for the synthetic generation of data, leading to the expansion of the training datasets and, as a result, improved detection accuracy. The paper uses the VirusShare dataset, which contains more than one million unique malware samples, as the training and evaluation set for the presented models. Through thorough data preparation, including tokenization and augmentation, as well as model training, the LSTM and GAN models outperform straightforward classifiers on these tasks. The research achieves 98% accuracy, showing that deep learning plays a decisive role in proactive cybersecurity defense. Aside from that, the paper studies ensemble learning and model fusion methods as a way to reduce bias and manage model complexity.
https://arxiv.org/abs/2405.04373