Large multimodal models (LMMs) have proven flexible and generalisable across many tasks and fields. Although they have strong potential to aid scientific research, their capabilities in this domain are not well characterised. A key aspect of scientific research is the ability to understand and interpret figures, which serve as a rich, compressed source of complex information. In this work, we present SciFIBench, a scientific figure interpretation benchmark. Our main benchmark consists of a 1000-question gold set of multiple-choice questions split between two tasks across 12 categories. The questions are curated from CS arXiv paper figures and captions, using adversarial filtering to find hard negatives and human verification for quality control. We evaluate 26 LMMs on SciFIBench, finding it to be a challenging benchmark. Finally, we investigate the alignment and reasoning faithfulness of the LMMs on augmented question sets from our benchmark. We release SciFIBench to encourage progress in this domain.
https://arxiv.org/abs/2405.08807
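A minimal sketch of how accuracy over such a multiple-choice benchmark is typically computed, assuming each record carries an image path, a prompt, and a gold option letter; the field names and the `query_lmm` callable are placeholders, not the released SciFIBench interface.

```python
# Hedged sketch of a multiple-choice evaluation loop for a figure-interpretation
# benchmark. Record fields and `query_lmm` are assumptions for illustration only.
import re
from typing import Callable, Iterable

def extract_choice(response: str, letters: str = "ABCDE") -> str | None:
    """Pull the first standalone option letter out of a free-form model reply."""
    match = re.search(rf"\b([{letters}])\b", response.upper())
    return match.group(1) if match else None

def evaluate(questions: Iterable[dict], query_lmm: Callable[[str, str], str]) -> float:
    """Each question dict is assumed to hold: image_path, prompt, answer ('A'..'E')."""
    correct, total = 0, 0
    for q in questions:
        reply = query_lmm(q["image_path"], q["prompt"])  # placeholder LMM call
        correct += int(extract_choice(reply) == q["answer"])
        total += 1
    return correct / max(total, 1)
```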
Reports regarding the misuse of $\textit{Generative AI}$ ($\textit{GenAI}$) to create harmful deepfakes are emerging daily. Recently, defensive watermarking, which enables $\textit{GenAI}$ providers to hide fingerprints in their images to later use for deepfake detection, has been on the rise. Yet, its potential has not been fully explored. We present $\textit{UnMarker}$ -- the first practical $\textit{universal}$ attack on defensive watermarking. Unlike existing attacks, $\textit{UnMarker}$ requires no detector feedback, no unrealistic knowledge of the scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, building on an in-depth analysis of the watermarking paradigm which reveals that robust schemes must construct their watermarks in the spectral amplitudes, $\textit{UnMarker}$ employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against the $\textit{SOTA}$ prove its effectiveness, not only defeating traditional schemes while retaining superior quality compared to existing attacks but also breaking $\textit{semantic}$ watermarks that alter the image's structure, reducing the best detection rate to $43\%$ and rendering them useless. To our knowledge, $\textit{UnMarker}$ is the first practical attack on $\textit{semantic}$ watermarks, which have been deemed the future of robust watermarking. $\textit{UnMarker}$ casts doubt on the very potential of this countermeasure and exposes its paradoxical nature: designing schemes for robustness inevitably compromises other robustness aspects.
https://arxiv.org/abs/2405.08363
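To make the spectral-amplitude idea concrete, the hedged sketch below optimizes a perturbation that pushes an image's FFT magnitude spectrum away from the original while an L2 penalty limits visible change. It only illustrates the paradigm; UnMarker's actual two adversarial optimizations and constraints differ.

```python
# Toy sketch of spectral-amplitude disruption: move the FFT magnitudes away from
# the original while keeping pixel-space distortion small. Illustrative only.
import torch

def spectral_attack(image: torch.Tensor, steps: int = 200, lr: float = 1e-2,
                    lam: float = 10.0) -> torch.Tensor:
    """image: float tensor in [0, 1], shape (C, H, W)."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ref_mag = torch.fft.fft2(image).abs().detach()
    for _ in range(steps):
        adv = (image + delta).clamp(0.0, 1.0)
        mag = torch.fft.fft2(adv).abs()
        # Push spectral amplitudes away from the original, penalize visible change.
        loss = -(mag - ref_mag).abs().mean() + lam * delta.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + delta).clamp(0.0, 1.0).detach()
```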
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on the spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10%, respectively, when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures reduce the attack success significantly.
https://arxiv.org/abs/2405.08317
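A hedged sketch of the white-box setting described above: a small waveform perturbation is found by projected gradient descent against a placeholder loss `slm_loss` that measures how strongly the speech-language model is pushed toward the attacker's target response. The real models and objectives in the paper are more elaborate.

```python
# Sketch of a PGD-style perturbation on a speech input. `slm_loss` is a placeholder
# for the model's loss of emitting the attacker's chosen response.
import torch

def pgd_on_audio(waveform: torch.Tensor, slm_loss, eps: float = 0.002,
                 alpha: float = 4e-4, steps: int = 100) -> torch.Tensor:
    """waveform: (T,) float tensor; slm_loss(wave) -> scalar loss to minimize."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    for _ in range(steps):
        loss = slm_loss(waveform + delta)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend on the target-response loss
            delta.clamp_(-eps, eps)              # keep the perturbation small
        delta.grad.zero_()
    return (waveform + delta).detach()
```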
The uses of machine learning (ML) have snowballed in recent years. In many cases, ML models are highly complex, and their operation is beyond the understanding of human decision-makers. Nevertheless, some uses of ML models involve high-stakes and safety-critical applications. Explainable artificial intelligence (XAI) aims to help human decision-makers in understanding the operation of such complex ML models, thus eliciting trust in their operation. Unfortunately, the majority of past XAI work is based on informal approaches that offer no guarantees of rigor. Unsurprisingly, there exists comprehensive experimental and theoretical evidence confirming that informal methods of XAI can provide human decision-makers with erroneous information. Logic-based XAI represents a rigorous approach to explainability; it is model-based and offers the strongest guarantees of rigor of computed explanations. However, a well-known drawback of logic-based XAI is the complexity of logic reasoning, especially for highly complex ML models. Recent work proposed distance-restricted explanations, i.e. explanations that are rigorous provided the distance to a given input is small enough. Distance-restricted explainability is tightly related to adversarial robustness, and it has been shown to scale for moderately complex ML models, but the number of inputs still represents a key limiting factor. This paper investigates novel algorithms for scaling up the performance of logic-based explainers when computing and enumerating ML model explanations with a large number of inputs.
https://arxiv.org/abs/2405.08297
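To make distance-restricted explanations concrete, the toy check below fixes the features in a candidate explanation and randomly perturbs the remaining features within an L-infinity ball; finding a prediction change falsifies the explanation. Sampling can only falsify, never certify, which is exactly why the logic-based reasoning discussed above is needed; `model_predict` is a placeholder classifier.

```python
# Illustrative falsification check for a distance-restricted explanation claim.
import numpy as np

def falsify_explanation(model_predict, x: np.ndarray, expl: set[int],
                        eps: float, trials: int = 10_000, seed: int = 0) -> bool:
    """Return True if a counterexample is found (so `expl` is not an explanation)."""
    rng = np.random.default_rng(seed)
    base = model_predict(x[None])[0]
    free = [i for i in range(x.shape[0]) if i not in expl]
    for _ in range(trials):
        z = x.copy()
        z[free] += rng.uniform(-eps, eps, size=len(free))  # perturb only free features
        if model_predict(z[None])[0] != base:
            return True
    return False
```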
Deep learning-based medical image segmentation models often face performance degradation when deployed across various medical centers, largely due to the discrepancies in data distribution. Test Time Adaptation (TTA) methods, which adapt pre-trained models to test data, have been employed to mitigate such discrepancies. However, existing TTA methods primarily focus on manipulating Batch Normalization (BN) layers or employing prompt and adversarial learning, which may not effectively rectify the inconsistencies arising from divergent data distributions. In this paper, we propose a novel Human-in-the-loop TTA (HiTTA) framework that stands out in two significant ways. First, it capitalizes on the largely overlooked potential of clinician-corrected predictions, integrating these corrections into the TTA process to steer the model towards predictions that coincide more closely with clinical annotation preferences. Second, our framework conceives a divergence loss, designed specifically to diminish the prediction divergence instigated by domain disparities, through the careful calibration of BN parameters. Our HiTTA is distinguished by its dual-faceted capability to acclimatize to the distribution of test data whilst ensuring the model's predictions align with clinical expectations, thereby enhancing its relevance in a medical context. Extensive experiments on a public dataset underscore the superiority of our HiTTA over existing TTA methods, emphasizing the advantages of integrating human feedback and our divergence loss in enhancing the model's performance and adaptability across diverse medical centers.
https://arxiv.org/abs/2405.08270
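A minimal sketch of the human-in-the-loop adaptation recipe under simplifying assumptions: only BatchNorm affine parameters are updated, driven by a supervised loss on clinician-corrected masks plus a penalty that keeps BN parameters near their source values (a stand-in for the paper's divergence loss, which is defined differently).

```python
# Hedged sketch: adapt only BN affine parameters at test time using clinician
# corrections; the regularizer below is a simplified stand-in for the divergence loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapt_bn(model: nn.Module, image: torch.Tensor, corrected_mask: torch.Tensor,
             steps: int = 10, lr: float = 1e-3, lam: float = 0.1) -> nn.Module:
    """corrected_mask: (N, H, W) long tensor of clinician-corrected class labels."""
    bn_params = [p for m in model.modules()
                 if isinstance(m, nn.BatchNorm2d) and m.affine
                 for p in (m.weight, m.bias)]
    src_params = [p.detach().clone() for p in bn_params]
    opt = torch.optim.Adam(bn_params, lr=lr)
    for _ in range(steps):
        logits = model(image)                          # (N, C, H, W) segmentation logits
        sup = F.cross_entropy(logits, corrected_mask)  # fit clinician corrections
        reg = sum(F.mse_loss(p, s) for p, s in zip(bn_params, src_params))
        loss = sup + lam * reg                         # stay close to source BN params
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```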
Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization and instance normalization, have been applied to different layers of GANs to control content synthesis in images. CAT is a multi-layer perceptron that independently predicts data based on batch statistics between neighboring layers, with global textual information unavailable to other layers. To address this issue, we first model CAT and a recurrent neural network (RAT) to ensure that different layers can access global information. We then introduce shuffle attention between RATs to mitigate the information forgetting characteristic of recurrent neural networks. Moreover, both our generator and discriminator utilize the powerful pre-trained model CLIP, which has been extensively employed for establishing associations between text and images through the learning of multimodal representations in latent space. The discriminator utilizes CLIP's ability to comprehend complex scenes to accurately assess the quality of the generated images. Extensive experiments have been conducted on the CUB, Oxford, and CelebA-tiny datasets to demonstrate the superiority of the proposed model over current state-of-the-art models. The code is available at this https URL.
https://arxiv.org/abs/2405.08114
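A sketch of a conditional batch-normalization layer, the kind of conditional affine transformation (CAT) the abstract refers to: the post-normalization scale and shift are predicted from a text embedding. Shapes and naming are illustrative, not the paper's exact implementation.

```python
# Illustrative conditional batch normalization: scale/shift predicted from text.
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_features: int, cond_dim: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.to_gamma = nn.Linear(cond_dim, num_features)
        self.to_beta = nn.Linear(cond_dim, num_features)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); text_emb: (N, cond_dim), e.g. a sentence embedding
        h = self.bn(x)
        gamma = self.to_gamma(text_emb).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        beta = self.to_beta(text_emb).unsqueeze(-1).unsqueeze(-1)
        return (1.0 + gamma) * h + beta
```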
Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99\% or higher). However, very few of these detectors are evaluated on shared benchmark datasets and even when they are, the datasets used for evaluation are insufficiently challenging -- lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open- and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our dataset and tools to encourage further exploration into detector robustness.
https://arxiv.org/abs/2405.07940
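A small sketch of the kind of robustness breakdown such a benchmark enables: detector accuracy is computed separately per attack, domain, generator, or decoding strategy. The record fields and the `detector` callable are assumptions about the data layout, not the released RAID tooling.

```python
# Hedged sketch: per-condition accuracy breakdown for a machine-text detector.
from collections import defaultdict

def accuracy_breakdown(records, detector, key: str = "attack"):
    """records: iterable of dicts with 'text', 'is_machine', and metadata fields."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        pred = detector(r["text"])               # True if flagged as machine-generated
        hits[r[key]] += int(pred == r["is_machine"])
        totals[r[key]] += 1
    return {k: hits[k] / totals[k] for k in totals}

# Usage sketch: accuracy_breakdown(raid_records, my_detector, key="decoding")
```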
In recent years, diffusion models (DMs) have become a popular method for generating synthetic data. By achieving samples of higher quality, they quickly surpassed generative adversarial networks (GANs) and became the current state-of-the-art method in generative modeling. However, their potential has not yet been exploited in radar, where the lack of available training data is a long-standing problem. In this work, a specific type of DM, namely the denoising diffusion probabilistic model (DDPM), is adapted to the SAR domain. We investigate the network choice and specific diffusion parameters for conditional and unconditional SAR image generation. In our experiments, we show that DDPM qualitatively and quantitatively outperforms state-of-the-art GAN-based methods for SAR image generation. Finally, we show that DDPM profits from pretraining on large-scale clutter data, generating SAR images of even higher quality.
https://arxiv.org/abs/2405.07776
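For reference, a generic sketch of the DDPM training objective being adapted here: sample a timestep, add the scheduled Gaussian noise to a clean image, and train the network to predict that noise. This is standard DDPM, not the paper's specific network or SAR conditioning.

```python
# Standard DDPM noise-prediction loss, shown generically.
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0: torch.Tensor, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """x0: clean images (N, C, H, W); alphas_cumprod: (T,) precomputed schedule."""
    n, device = x0.shape[0], x0.device
    a_bar_all = alphas_cumprod.to(device)
    t = torch.randint(0, a_bar_all.shape[0], (n,), device=device)
    a_bar = a_bar_all[t].view(n, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise   # forward diffusion
    return F.mse_loss(model(x_t, t), noise)                  # predict the added noise
```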
Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim, respectively, to correctly label malicious samples with provable guarantees and to issue warnings for malicious samples predicted to non-benign labels with provable guarantees. However, existing certified detection defenders leave the protected labels subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach by cross-checking two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings, with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with slightly lower performance than ViP and comparable performance to PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.
https://arxiv.org/abs/2405.07668
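A highly simplified sketch of the cross-checking idea: each certified-recovery defender reports a label and whether that label is certified for the sample; agreement between two certified defenders yields a label without a warning, anything else raises one. The paper's actual certification conditions are more involved.

```python
# Toy cross-check over two certified-recovery defenders.
from dataclasses import dataclass

@dataclass
class DefenderOutput:
    label: int
    certified: bool

def cross_check(a: DefenderOutput, b: DefenderOutput) -> tuple[int, bool]:
    """Return (label, warning)."""
    if a.certified and b.certified and a.label == b.label:
        return a.label, False          # both certify the same label: no warning
    # fall back to the first defender's label but warn the user
    return a.label, True
```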
Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, adversarial patches generated by existing algorithms in the UAV domain pay very little attention to the naturalness of adversarial patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches that appear natural to the human eye while ensuring a high attack success rate. We notice that patches look natural when their overall color is consistent with the environment. Therefore, we propose a new method named Environmental Matching Attack (EMA) to address the issue of optimizing the adversarial patch under color constraints. To the best of our knowledge, this paper is the first to consider natural patches in the domain of UAVs. The EMA method exploits the strong prior knowledge of a pretrained Stable Diffusion model to guide the optimization direction of the adversarial patch, where the text guidance can restrict the color of the patch. To better match the environment, the contrast and brightness of the patch are appropriately adjusted. Instead of optimizing the adversarial patch itself, we optimize an adversarial perturbation patch initialized to zero so that the model can better trade off attack performance and naturalness. Experiments conducted on the DroneVehicle and Carpk datasets show that our method reaches nearly the same attack performance in the digital attack (within 2 mAP$\%$), surpasses the baseline method in specific physical scenarios, and exhibits a significant advantage in naturalness, both in visualization and in color difference from the environment.
https://arxiv.org/abs/2405.07595
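A hedged sketch of the optimization trick mentioned above: rather than optimizing patch pixels directly, a zero-initialized perturbation is optimized on top of a base patch whose color already matches the environment. `detector_loss` (lower means the detector misses the object) and the `apply_patch` compositing function are placeholders.

```python
# Toy sketch: optimize a zero-initialized perturbation over an environment-matched patch.
import torch

def optimize_patch(base_patch: torch.Tensor, apply_patch, detector_loss,
                   steps: int = 300, lr: float = 0.01, budget: float = 0.1):
    delta = torch.zeros_like(base_patch, requires_grad=True)   # zero-initialized
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        patch = (base_patch + delta).clamp(0.0, 1.0)
        loss = detector_loss(apply_patch(patch))    # render patch into the scene
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)           # stay close to the natural base
    return (base_patch + delta).clamp(0.0, 1.0).detach()
```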
Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed PotatoGANs. In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception Score as a measure, our experiments show the better quality and realism of the images created by PotatoGANs, emphasizing their capacity to closely resemble real disease images. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, achieving higher Inception Scores (IS) of 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, ResNet152 V2, InceptionResNet V2) for potato disease classification.
https://arxiv.org/abs/2405.07332
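As an illustration of the gradient-based XAI component, the sketch below implements a GradCAM-style heatmap for a torchvision classifier; the layer choice and backbone are illustrative, not the paper's exact configuration.

```python
# GradCAM-style heatmap sketch; backbone and target layer are illustrative.
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, target_layer, image, class_idx=None):
    """image: (1, 3, H, W); returns an (H, W) heatmap in [0, 1]."""
    acts, grads = {}, {}
    fh = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    bh = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    logits = model(image)
    idx = int(logits.argmax(1)) if class_idx is None else class_idx
    model.zero_grad()
    logits[0, idx].backward()
    fh.remove(); bh.remove()
    weights = grads["v"].mean(dim=(2, 3), keepdim=True)           # GAP over gradients
    cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))  # weighted activations
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Example with a DenseNet169 backbone (one of the architectures named above):
# model = models.densenet169(weights=None).eval()
# heat = grad_cam(model, model.features.denseblock4, torch.randn(1, 3, 224, 224))
```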
Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, where the neural networks lose proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains a significant challenge, particularly in complex, changing environments. This challenge is even more pronounced in cross-domain adaptation under the continual learning (CL) setting, a more challenging and realistic scenario that is under-explored. To this end, this article proposes a cross-domain CL approach that makes it possible to deploy a single model in such environments without additional labelling costs. Our approach, namely the continual learning approach for many processes (CLAMP), integrates a class-aware adversarial domain adaptation strategy to align a source domain and a target domain. An assessor-guided learning process is put forward to navigate the learning of a base model: it assigns a set of weights to every sample, controlling each sample's influence and the interactions of each loss function so as to balance the stability-plasticity dilemma and thus prevent the CF problem. The first assessor focuses on the negative transfer problem, rejecting irrelevant samples of the source domain, while the second assessor prevents noisy pseudo labels of the target domain. Both assessors are trained in a meta-learning approach using random transformation techniques and similar samples of the source domain. Theoretical analysis and extensive numerical validations demonstrate that CLAMP significantly outperforms established baseline algorithms across all experiments by a margin of at least $10\%$.
https://arxiv.org/abs/2405.07142
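A sketch of the adversarial domain-alignment ingredient under simplifying assumptions: a gradient-reversal layer trains a domain classifier normally while the feature extractor receives reversed gradients, and per-sample weights (standing in for the assessor scores) modulate each sample's influence.

```python
# Gradient-reversal layer plus sample-weighted domain-adversarial loss (sketch).
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def domain_adv_loss(features, domain_labels, domain_head, sample_weights, lam=1.0):
    """features: (N, D); domain_labels: (N,) 0=source, 1=target; weights from assessors."""
    logits = domain_head(GradReverse.apply(features, lam))
    per_sample = F.cross_entropy(logits, domain_labels, reduction="none")
    return (sample_weights * per_sample).mean()
```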
This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics. We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values, adapting to varied cultural contexts to promote transparency and trust among users. The methodology involves detailed modeling of emotions, classification of linguistic behaviors, and implementation of ethical guardrails. Our innovative approaches include mapping emotions and behaviors using self-supervised learning techniques, refining these guardrails through adversarial reviews, and systematically adjusting outputs to ensure ethical alignment. This framework establishes a robust foundation for AI systems to operate with ethical integrity and cultural sensitivity, paving the way for more responsible and context-aware AI interactions.
https://arxiv.org/abs/2405.07076
Large Language Models (LLMs) enable a new ecosystem with many downstream applications, called LLM applications, with different natural language processing tasks. The functionality and performance of an LLM application highly depend on its system prompt, which instructs the backend LLM on what task to perform. Therefore, an LLM application developer often keeps a system prompt confidential to protect its intellectual property. As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property. Existing prompt leaking attacks primarily rely on manually crafted queries, and thus achieve limited effectiveness. In this paper, we design a novel, closed-box prompt leaking attack framework, called PLeak, to optimize an adversarial query such that when the attacker sends it to a target LLM application, its response reveals its own system prompt. We formulate finding such an adversarial query as an optimization problem and solve it approximately with a gradient-based method. Our key idea is to break down the optimization goal by optimizing adversarial queries for system prompts incrementally, i.e., starting from the first few tokens of each system prompt and extending step by step until the entire length of the system prompt is targeted. We evaluate PLeak in both offline settings and for real-world LLM applications, e.g., those on Poe, a popular platform hosting such applications. Our results show that PLeak can effectively leak system prompts and significantly outperforms not only baselines that manually curate queries but also baselines with optimized queries that are modified and adapted from existing jailbreaking attacks. We responsibly reported the issues to Poe and are still waiting for their response. Our implementation is available at this repository: this https URL.
https://arxiv.org/abs/2405.06823
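A toy illustration of the incremental idea only: the adversarial query is tuned so a shadow model reproduces the first k tokens of its system prompt, with k grown stage by stage until the whole prompt is targeted. The `prefix_loss` oracle and vocabulary are placeholders, and the random token substitution here stands in for the paper's gradient-guided optimization.

```python
# Toy incremental prompt-leaking search: grow the targeted prefix stage by stage.
import random

def optimize_query(query: list[int], vocab: list[int], prefix_loss,
                   full_len: int, k_step: int = 8, iters_per_stage: int = 200,
                   seed: int = 0):
    """prefix_loss(tokens, k): loss of the shadow model emitting the first k
    system-prompt tokens when given the candidate query (placeholder oracle)."""
    rng = random.Random(seed)
    stages = list(range(k_step, full_len, k_step)) + [full_len]
    for k in stages:                                  # incrementally longer targets
        best = prefix_loss(query, k)
        for _ in range(iters_per_stage):
            cand = list(query)
            cand[rng.randrange(len(cand))] = rng.choice(vocab)   # mutate one token
            loss = prefix_loss(cand, k)
            if loss < best:
                query, best = cand, loss
    return query
```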
Large Language Models (LLMs) are becoming vital tools that help us solve and understand complex problems by acting as digital assistants. LLMs can generate convincing explanations, even when only given the inputs and outputs of these problems, i.e., in a ``black-box'' approach. However, our research uncovers a hidden risk tied to this approach, which we call *adversarial helpfulness*. This happens when an LLM's explanations make a wrong answer look right, potentially leading people to trust incorrect solutions. In this paper, we show that this issue affects not just humans, but also LLM evaluators. Digging deeper, we identify and examine key persuasive strategies employed by LLMs. Our findings reveal that these models employ strategies such as reframing the questions, expressing an elevated level of confidence, and cherry-picking evidence to paint misleading answers in a credible light. To examine if LLMs are able to navigate complex-structured knowledge when generating adversarially helpful explanations, we create a special task based on navigating through graphs. Some LLMs are not able to find alternative paths along simple graphs, indicating that their misleading explanations aren't produced by only logical deductions using complex knowledge. These findings shed light on the limitations of the black-box explanation setting. We provide some advice on how to use LLMs as explainers safely.
https://arxiv.org/abs/2405.06800
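A small sketch of the kind of graph-navigation probe described above: the model is asked for an alternative route that avoids an edge on the shortest path, and the answer is checked programmatically. The exact task construction in the paper may differ.

```python
# Toy alternative-path probe built with networkx.
import networkx as nx

def make_probe():
    g = nx.cycle_graph(6)                      # a 6-node ring always has two routes
    g.add_edge(0, 3)                           # add a chord for extra structure
    path = nx.shortest_path(g, 0, 3)
    blocked = (path[0], path[1])               # forbid the first edge of that path
    question = (f"Nodes: {list(g.nodes)}. Edges: {list(g.edges)}. "
                f"Give a path from 0 to 3 that does not use edge {blocked}.")
    return g, blocked, question

def check_answer(g, blocked, answer_path: list[int]) -> bool:
    edges = list(zip(answer_path, answer_path[1:]))
    if any(set(e) == set(blocked) for e in edges):
        return False
    return (all(g.has_edge(u, v) for u, v in edges)
            and answer_path[0] == 0 and answer_path[-1] == 3)
```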
Denoising diffusion models (DDM) have gained recent traction in medical image translation given improved training stability over adversarial models. DDMs learn a multi-step denoising transformation to progressively map random Gaussian-noise images onto target-modality images, while receiving stationary guidance from source-modality images. As this denoising transformation diverges significantly from the task-relevant source-to-target transformation, DDMs can suffer from weak source-modality guidance. Here, we propose a novel self-consistent recursive diffusion bridge (SelfRDB) for improved performance in medical image translation. Unlike DDMs, SelfRDB employs a novel forward process with start- and end-points defined based on target and source images, respectively. Intermediate image samples across the process are expressed via a normal distribution with mean taken as a convex combination of start-end points, and variance from additive noise. Unlike regular diffusion bridges that prescribe zero variance at start-end points and high variance at mid-point of the process, we propose a novel noise scheduling with monotonically increasing variance towards the end-point in order to boost generalization performance and facilitate information transfer between the two modalities. To further enhance sampling accuracy in each reverse step, we propose a novel sampling procedure where the network recursively generates a transient-estimate of the target image until convergence onto a self-consistent solution. Comprehensive analyses in multi-contrast MRI and MRI-CT translation indicate that SelfRDB offers superior performance against competing methods.
https://arxiv.org/abs/2405.06789
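A sketch of the bridge's forward process as the abstract describes it: each intermediate sample is Gaussian with mean given by a convex combination of the start point (target image) and end point (source image), and a noise level that grows monotonically toward the end point. The linear schedule below is an assumption for illustration.

```python
# Illustrative forward sample from a diffusion bridge between target and source images.
import torch

def bridge_sample(x_target: torch.Tensor, x_source: torch.Tensor,
                  t: float, sigma_max: float = 0.5) -> torch.Tensor:
    """t in [0, 1]: t=0 is the target (start point), t=1 is the source (end point)."""
    mean = (1.0 - t) * x_target + t * x_source   # convex combination of endpoints
    sigma = sigma_max * t                        # monotonically increasing noise level
    return mean + sigma * torch.randn_like(mean)
```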
We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 x 64) and LSUN Church (64 x 64). Furthermore, we demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.
https://arxiv.org/abs/2405.06780
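A sketch of one MMD gradient-flow step: particles move along the negative gradient of the squared MMD between the particle set and target samples, here with a plain RBF kernel rather than the learned noise-adaptive kernel the paper trains.

```python
# MMD gradient flow with a fixed RBF kernel (illustrative simplification).
import torch

def rbf_kernel(x, y, bandwidth: float = 1.0):
    d2 = torch.cdist(x, y).pow(2)
    return torch.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(particles, target, bandwidth: float = 1.0):
    kxx = rbf_kernel(particles, particles, bandwidth).mean()
    kyy = rbf_kernel(target, target, bandwidth).mean()
    kxy = rbf_kernel(particles, target, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

def mmd_flow(particles, target, steps: int = 500, step_size: float = 0.1):
    x = particles.clone().requires_grad_(True)
    for _ in range(steps):
        loss = mmd2(x, target)
        (grad,) = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= step_size * grad        # move particles down the MMD gradient
    return x.detach()
```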
In certain situations, neural networks will represent environment states in their hidden activations. Our goal is to visualize what environment states the networks are representing. We experiment with a recurrent neural network (RNN) architecture with a decoder network at the end. After training, we apply the decoder to the intermediate representations of the network to visualize what they represent. We define a quantitative interpretability metric and use it to demonstrate that hidden states can be highly interpretable on a simple task. We also develop autoencoder and adversarial techniques and show that they benefit interpretability.
https://arxiv.org/abs/2405.06409
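A minimal sketch of the probing setup: a GRU processes observation sequences, a decoder is trained on its final hidden state, and the same decoder is later applied to every intermediate hidden state to visualize what it encodes. Dimensions are illustrative.

```python
# Probing an RNN's intermediate hidden states with a trained decoder (sketch).
import torch
import torch.nn as nn

class ProbedRNN(nn.Module):
    def __init__(self, obs_dim: int = 16, hidden: int = 64, state_dim: int = 4):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.decoder = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                     nn.Linear(64, state_dim))

    def forward(self, obs_seq: torch.Tensor):
        # obs_seq: (N, T, obs_dim) -> hidden states at every timestep: (N, T, hidden)
        hs, _ = self.rnn(obs_seq)
        return self.decoder(hs[:, -1])          # train the decoder on the final state

    def decode_all(self, obs_seq: torch.Tensor):
        hs, _ = self.rnn(obs_seq)
        return self.decoder(hs)                 # probe every intermediate state
```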
Integrated sensing and communications (ISAC) is pivotal for 6G communications and is boosted by the rapid development of reconfigurable intelligent surfaces (RISs). Using the channel state information (CSI) across multiple frequency bands, RIS-aided multi-band ISAC systems can potentially track users' positions with high precision. Though tracking with CSI is desirable as no communication overheads are incurred, it faces challenges due to the multi-modalities of CSI samples, irregular and asynchronous data traffic, and sparse labeled data for learning the tracking function. This paper proposes the X2Track framework, where we model the tracking function by a hierarchical architecture, jointly utilizing multi-modal CSI indicators across multiple bands, and optimize it in a cross-domain manner, tackling the sparsity of labeled data for the target deployment environment (namely, target domain) by adapting the knowledge learned from another environment (namely, source domain). Under X2Track, we design an efficient deep learning algorithm to minimize tracking errors, based on transformer neural networks and adversarial learning techniques. Simulation results verify that X2Track achieves decimeter-level axial tracking errors even under scarce UL data traffic and strong interference conditions and can adapt to diverse deployment environments with fewer than 5% of the training data, or equivalently 5 minutes of UE tracks, being labeled.
https://arxiv.org/abs/2405.06299
Graph Neural Networks (GNNs) have emerged as potent models for graph learning. Distributing the training process across multiple computing nodes is the most promising solution to address the challenges of ever-growing real-world graphs. However, current adversarial attack methods on GNNs neglect the characteristics and applications of the distributed scenario, leading to suboptimal performance and inefficiency in attacking distributed GNN training. In this study, we introduce Disttack, the first framework of adversarial attacks for distributed GNN training that leverages the characteristics of frequent gradient updates in a distributed system. Specifically, Disttack corrupts distributed GNN training by injecting adversarial attacks into one single computing node. The attacked subgraphs are precisely perturbed to induce an abnormal gradient ascent in backpropagation, disrupting gradient synchronization between computing nodes and thus leading to a significant performance decline of the trained GNN. We evaluate Disttack on four large real-world graphs by attacking five widely adopted GNNs. Compared with the state-of-the-art attack method, experimental results demonstrate that Disttack amplifies model accuracy degradation by 2.75$\times$ and achieves an average speedup of 17.33$\times$ while maintaining unnoticeability.
https://arxiv.org/abs/2405.06247
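A toy sketch of the threat model under strong simplifications: per-worker gradients are averaged, and the attacker perturbs only its own worker's inputs (one gradient-ascent step on node features) before contributing its gradient, disturbing the synchronized update. The model call ignores graph structure for brevity; the GNNs, partitioning, and budgets in the paper differ.

```python
# Simplified simulation of a single poisoned worker in synchronized gradient averaging.
import torch

def averaged_grads(model, loss_fn, worker_batches, attacked: int, eps: float = 0.05):
    """worker_batches: list of (features, labels) per worker; attacked: index of the
    compromised worker. Returns the averaged per-parameter gradients."""
    params = [p for p in model.parameters() if p.requires_grad]
    per_worker = []
    for i, (feats, labels) in enumerate(worker_batches):
        if i == attacked:
            feats = feats.clone().requires_grad_(True)
            (g,) = torch.autograd.grad(loss_fn(model(feats), labels), feats)
            feats = (feats + eps * g.sign()).detach()   # perturb only this worker's inputs
        loss = loss_fn(model(feats), labels)
        per_worker.append(torch.autograd.grad(loss, params))
    return [torch.stack(gs).mean(0) for gs in zip(*per_worker)]
```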