Anomaly detection and localization without any manual annotations or prior knowledge is a challenging task under the unsupervised learning setting. Existing works achieve excellent performance in anomaly detection, but with complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture for anomaly detection. It consists of a well-pretrained encoder that extracts hierarchical feature representations and a decoder that reconstructs these intermediate features from the encoder. In particular, it requires neither data augmentation nor anomalous images for training. Anomalies are detected when the decoder fails to reconstruct features well, and the errors of hierarchical feature reconstruction are then aggregated into an anomaly map to achieve anomaly localization. Comparing features between the encoder and the decoder leads to more accurate and robust localization than the single-feature or pixel-by-pixel comparisons of conventional works. Experimental results show that the proposed method outperforms state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets for both anomaly detection and localization.
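As a rough illustration of the scoring step (a minimal sketch, not the authors' code; `cosine_distance`, `anomaly_map`, and `image_score` are hypothetical helpers, and all layers are assumed already upsampled to a common spatial size):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv + 1e-8)

def anomaly_map(enc_feats, dec_feats):
    """Aggregate per-layer feature reconstruction errors into one map.

    enc_feats / dec_feats: list of layers; each layer is a list of
    per-location feature vectors. All layers are assumed spatially
    aligned here; real pipelines upsample each layer's error map to
    a common resolution before averaging.
    """
    n_loc = len(enc_feats[0])
    amap = [0.0] * n_loc
    for e_layer, d_layer in zip(enc_feats, dec_feats):
        for i, (e, d) in enumerate(zip(e_layer, d_layer)):
            amap[i] += cosine_distance(e, d)
    return [a / len(enc_feats) for a in amap]

def image_score(amap):
    """Image-level anomaly score: the maximum of the anomaly map."""
    return max(amap)
```

Locations whose features the decoder reconstructs faithfully score near zero; locations whose features it cannot reconstruct dominate both the map and the image-level score.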
https://arxiv.org/abs/2405.09148
Tattoos have been used effectively as soft biometrics to assist law enforcement in the identification of offenders and victims, as they contain discriminative information, and are a useful indicator to locate members of a criminal gang or organisation. Due to various privacy issues in the acquisition of images containing tattoos, only a limited number of databases exist. This lack of databases has delayed the development of new methods to effectively retrieve a potential suspect's tattoo images from a candidate gallery. To mitigate this issue, in our work, we use an unsupervised generative approach to create a balanced database consisting of 28,550 semi-synthetic images with tattooed subjects from 571 tattoo categories. Further, we introduce a novel Tattoo Template Reconstruction Network (TattTRN), which learns to map the input tattoo sample to its respective tattoo template to enhance the distinguishing attributes of the final feature embedding. Experimental results with real data, i.e., the WebTattoo and BIVTatt databases, demonstrate the soundness of the presented approach: an accuracy of up to 99% is achieved when checking at most the first 20 entries of the candidate list.
https://arxiv.org/abs/2405.07571
Anomaly detection in time series data is crucial across various domains. The scarcity of labeled data for such tasks has increased attention to unsupervised learning methods. These approaches, often relying solely on reconstruction error, typically fail to detect subtle anomalies in complex datasets. To address this, we introduce RESTAD, an adaptation of the Transformer model that incorporates a layer of Radial Basis Function (RBF) neurons within its architecture. This layer fits a non-parametric density in the latent representation, such that a high RBF output indicates similarity with the predominantly normal training data. RESTAD integrates the RBF similarity scores with the reconstruction errors to increase sensitivity to anomalies. Our empirical evaluations demonstrate that RESTAD outperforms various established baselines across multiple benchmark datasets.
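The fusion idea can be sketched as follows (an assumption-laden simplification: `rbf_similarity` and `anomaly_score` are hypothetical names, a fixed set of centers stands in for the learned RBF layer, and multiplying the reconstruction error by the RBF dissimilarity is one plausible fusion rule, not necessarily RESTAD's exact one):

```python
import math

def rbf_similarity(z, centers, gamma=1.0):
    """Max RBF activation of latent z over a set of centers: high for
    points close to the (predominantly normal) training data."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(math.exp(-gamma * sq_dist(z, c)) for c in centers)

def anomaly_score(recon_error, z, centers, gamma=1.0):
    """Fuse reconstruction error with RBF dissimilarity (1 - similarity),
    so a sample far from the normal latent density scores high even when
    its reconstruction error alone is unremarkable."""
    return recon_error * (1.0 - rbf_similarity(z, centers, gamma))
```

In the actual model the centers are parameters trained jointly with the Transformer, rather than fixed as here.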
https://arxiv.org/abs/2405.07509
In the character animation field, modern supervised keyframe interpolation models have demonstrated exceptional performance in constructing natural human motions from sparse pose definitions. As supervised models, they require large motion datasets to facilitate the learning process; however, since motion is represented with fixed hierarchical skeletons, such datasets are incompatible with skeletons outside the datasets' native configurations. Consequently, the need for a motion dataset matching the desired skeleton severely hinders the feasibility of learned interpolation in practice. To combat this limitation, we propose Point Cloud-based Motion Representation Learning (PC-MRL), an unsupervised approach to enabling cross-compatibility between skeletons for motion interpolation learning. PC-MRL consists of a skeleton obfuscation strategy using temporal point cloud sampling, and an unsupervised method for reconstructing skeletal motion from point clouds. We devise a temporal point-wise K-nearest neighbors loss for unsupervised learning. Moreover, we propose First-frame Offset Quaternion (FOQ) and Rest Pose Augmentation (RPA) strategies to overcome necessary limitations of our unsupervised point cloud-to-skeletal motion process. Comprehensive experiments demonstrate the effectiveness of PC-MRL in motion interpolation for desired skeletons without supervision from native datasets.
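A one-directional, single-frame version of such a point-wise k-NN loss might look like this (a sketch under simplifying assumptions; the paper's temporal variant also matches across time, and `knn_distance`/`pointwise_knn_loss` are hypothetical names):

```python
def knn_distance(p, cloud, k=3):
    """Mean squared distance from point p to its k nearest neighbors in cloud."""
    d = sorted(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)
    return sum(d[:k]) / k

def pointwise_knn_loss(pred_points, target_cloud, k=3):
    """Chamfer-style point-wise k-NN loss, averaged over predicted points:
    each point reconstructed from the skeleton is pulled toward its k
    nearest neighbors in the sampled target point cloud."""
    return sum(knn_distance(p, target_cloud, k) for p in pred_points) / len(pred_points)
```

Because the loss matches points to neighborhoods rather than to fixed correspondences, it needs no joint-to-point pairing between skeleton configurations.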
https://arxiv.org/abs/2405.07444
Emerging unsupervised reconstruction techniques based on implicit neural representation (INR), such as NeRP, CoIL, and SCOPE, have shown unique capabilities in CT linear inverse imaging. In this work, we propose a novel unsupervised density neural representation (Diner) to tackle the challenging problem of CT metal artifacts when scanned objects contain metals. The drastic variation of the linear attenuation coefficients (LACs) of metals over X-ray spectra leads to a nonlinear beam hardening effect (BHE) in CT measurements. Recovering CT images from metal-affected measurements therefore poses a complicated nonlinear inverse problem. Existing metal artifact reduction (MAR) techniques mostly formulate MAR as an image inpainting task, which ignores the energy-induced BHE and produces suboptimal performance. Instead, our Diner introduces an energy-dependent polychromatic CT forward model to the INR framework, addressing the nonlinear nature of the MAR problem. Specifically, we decompose the energy-dependent LACs into energy-independent densities and energy-dependent mass attenuation coefficients (MACs) by fully considering the physical model of X-ray absorption. Using the densities as pivot variables and the MACs as known prior knowledge, the LACs can be accurately reconstructed from the raw measurements. Technically, we represent the unknown density map as an implicit function of coordinates. Combined with a novel differentiable forward model simulating the physical acquisition from the densities to the measurements, our Diner optimizes a multi-layer perceptron network to approximate the implicit function by minimizing the errors between the estimated and real measurements. Experimental results on simulated and real datasets confirm the superiority of our unsupervised Diner against popular supervised techniques in MAR performance and robustness.
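The energy-dependent forward model for a single ray through one material can be sketched as follows (a simplification assuming discrete energy bins and a single homogeneous object; `polychromatic_measurement` is a hypothetical name):

```python
import math

def polychromatic_measurement(spectrum, macs, density, length):
    """Polychromatic CT measurement along one ray through one material.

    spectrum: source intensity per discrete energy bin.
    macs: mass attenuation coefficient per energy bin (known prior).
    density: energy-independent density (the pivot variable Diner recovers).
    length: intersection length of the ray with the object.
    """
    total = sum(w * math.exp(-mac * density * length)
                for w, mac in zip(spectrum, macs))
    return -math.log(total / sum(spectrum))
```

With a single energy bin this reduces to the familiar linear model; with several bins of differing MACs the measurement falls below the linear prediction for the mean MAC, which is precisely the beam hardening effect.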
https://arxiv.org/abs/2405.07047
Source-free Unsupervised Domain Adaptation (SFDA) aims to classify target samples by only accessing a pre-trained source model and unlabelled target samples. Since no source data is available, transferring the knowledge from the source domain to the target domain is challenging. Existing methods normally exploit the pair-wise relation among target samples and attempt to discover their correlations by clustering these samples based on semantic features. The drawbacks of these methods include: 1) the pair-wise relation is limited to two samples and cannot expose the underlying correlations among larger groups of samples, hindering the exploration of the structural information embedded in the target domain; 2) the clustering process only relies on the semantic feature, while overlooking the critical effect of domain shift, i.e., the distribution differences between the source and target domains. To address these issues, we propose a new SFDA method that exploits the high-order neighborhood relation and explicitly takes the domain shift effect into account. Specifically, we formulate SFDA as a hypergraph learning problem and construct hyperedges to explore the local group and context information among multiple samples. Moreover, we integrate a self-loop strategy into the constructed hypergraph to elegantly introduce the domain uncertainty of each sample. By clustering these samples based on hyperedges, both the semantic feature and domain shift effects are considered. We then describe an adaptive relation-based objective to tune the model with soft attention levels for all samples. Extensive experiments are conducted on the Office-31, Office-Home, VisDA, and PointDA-10 datasets. The results demonstrate the superiority of our method over state-of-the-art counterparts.
https://arxiv.org/abs/2405.06916
Anomaly localization is a practical technology for improving industrial production line efficiency. Because anomalies are manifold and hard to collect, existing unsupervised approaches are usually equipped with anomaly synthesis methods. However, most of them are biased towards structural defect synthesis while ignoring the underlying logical constraints. To fill the gap and boost anomaly localization performance, we propose an edge-manipulation-based anomaly synthesis framework, named LogicAL, that produces photo-realistic logical and structural anomalies. We introduce a logical anomaly generation strategy that is adept at breaking logical constraints and a structural anomaly generation strategy that complements structural defect synthesis. We further improve anomaly localization performance by introducing edge reconstruction into the network structure. Extensive experiments on the challenging MVTecLOCO, MVTecAD, VisA and MADsim datasets verify the advantage of the proposed LogicAL on both logical and structural anomaly localization.
https://arxiv.org/abs/2405.06875
This study explores the integration of advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to analyze and interpret Persian literature, focusing on the poetry of Forough Farrokhzad. Utilizing computational methods, we aim to unveil thematic, stylistic, and linguistic patterns in Persian poetry. Specifically, the study employs AI models including transformer-based language models for clustering of the poems in an unsupervised framework. This research underscores the potential of AI in enhancing our understanding of Persian literary heritage, with Forough Farrokhzad's work providing a comprehensive case study. This approach not only contributes to the field of Persian Digital Humanities but also sets a precedent for future research in Persian literary studies using computational techniques.
https://arxiv.org/abs/2405.06760
As Transformers have become state-of-the-art models for natural language processing (NLP) tasks, the need to understand and explain their predictions is increasingly apparent. Especially in unsupervised applications, such as information retrieval tasks, similarity models built on top of foundation model representations have been widely applied. However, their inner prediction mechanisms have mostly remained opaque. Recent advances in explainable AI have made it possible to mitigate these limitations by leveraging improved explanations for Transformers through layer-wise relevance propagation (LRP). Using BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, we investigate which feature interactions drive similarity in NLP models. We validate the resulting explanations and demonstrate their utility in three corpus-level use cases, analyzing grammatical interactions, multilingual semantics, and biomedical text retrieval. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
https://arxiv.org/abs/2405.06604
Online social media platforms, such as Twitter, provide valuable information during disaster events. Existing tweet disaster summarization approaches provide a summary of these events to aid government agencies, humanitarian organizations, etc., in ensuring an effective disaster response. In the literature, there are two types of approaches for disaster summarization, namely supervised and unsupervised approaches. Although supervised approaches are typically more effective, they require a sizable number of disaster event summaries for training and testing. However, good disaster summary datasets for training and evaluation are scarce. This motivates us to add more datasets to make supervised learning approaches more effective. In this paper, we present ADSumm, which adds annotated ground-truth summaries for eight disaster events, comprising both natural and man-made disasters from seven different countries. Our experimental analysis shows that the newly added datasets improve the performance of supervised summarization approaches by 8-28% in terms of ROUGE-N F1-score. Moreover, in the newly annotated dataset we have added a category label for each input tweet, which helps ensure good coverage of the different categories in a summary. Additionally, we have added two other features, a relevance label and a key-phrase, which provide information about the quality of a tweet and an explanation of its inclusion in the summary, respectively. For ground-truth summary creation, we describe the adopted annotation procedure in detail, which has not been described in the existing literature. Experimental analysis shows that the quality of the ground-truth summaries is very good in terms of coverage, relevance, and diversity.
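For reference, the ROUGE-N F1 metric used in the evaluation can be computed as follows (a minimal sketch assuming whitespace tokenization and no stemming, unlike full ROUGE implementations; `rouge_n_f1` is a hypothetical name):

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between whitespace-tokenized candidate and reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    c, r = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((c & r).values())   # multiset intersection of n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

An 8-28% gain in this score means the generated summaries share substantially more n-gram content with the annotated ground-truth summaries.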
https://arxiv.org/abs/2405.06551
The state-of-the-art model for zero-shot cross-lingual spoken language understanding performs cross-lingual unsupervised contrastive learning to achieve label-agnostic semantic alignment between each utterance and its code-switched data. However, it ignores the precious intent/slot labels, whose information is promising for capturing the label-aware semantic structure and thus for leveraging supervised contrastive learning to improve the semantics of both the source and target languages. In this paper, we propose Hybrid and Cooperative Contrastive Learning to address this problem. Apart from cross-lingual unsupervised contrastive learning, we design a holistic approach that exploits source-language supervised contrastive learning, cross-lingual supervised contrastive learning, and multilingual supervised contrastive learning to perform label-aware semantic alignment in a comprehensive manner. Each supervised contrastive learning mechanism covers both single-task and joint-task scenarios. In our model, each contrastive learning mechanism's input is enhanced by the others, so the four mechanisms cooperate to learn more consistent and discriminative representations in a virtuous cycle during training. Experiments show that our model obtains consistent improvements over 9 languages, achieving new state-of-the-art performance.
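A single supervised contrastive mechanism of the kind combined here can be sketched as follows (a plain-Python rendition of the standard supervised contrastive loss; `sup_con_loss` is a hypothetical name, and the paper's variants add cross-lingual and joint-task structure on top):

```python
import math

def sup_con_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, pull same-label
    samples together and push different-label samples apart in the
    normalized embedding space."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    # L2-normalize so the dot product is cosine similarity
    z = [[a / math.sqrt(dot(e, e)) for a in e] for e in embeddings]
    n, loss, anchors = len(z), 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = sum(math.exp(dot(z[i], z[j]) / tau) for j in range(n) if j != i)
        loss += -sum(math.log(math.exp(dot(z[i], z[p]) / tau) / denom)
                     for p in pos) / len(pos)
        anchors += 1
    return loss / anchors
```

The loss is low when embeddings cluster by label and high when same-label samples are scattered, which is exactly the label-aware alignment signal the unsupervised objective lacks.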
https://arxiv.org/abs/2405.06204
Unsupervised Visible-Infrared Person Re-identification (USVI-ReID) is a formidable task that aims to match pedestrian images across visible and infrared modalities without any annotations. Recently, clustered pseudo-label methods have become predominant in USVI-ReID, although the inherent noise in pseudo-labels presents a significant obstacle. Most existing works primarily focus on shielding the model from the harmful effects of noise, neglecting to calibrate the noisy pseudo-labels usually associated with hard samples, which compromises the robustness of the model. To address this issue, we design a Robust Pseudo-label Learning with Neighbor Relation (RPNR) framework for USVI-ReID. To be specific, we first introduce a straightforward yet potent Noisy Pseudo-label Calibration module to correct noisy pseudo-labels. Due to the high intra-class variations, noisy pseudo-labels are difficult to calibrate completely. Therefore, we introduce a Neighbor Relation Learning module to reduce high intra-class variations by modeling potential interactions between all samples. Subsequently, we devise an Optimal Transport Prototype Matching module to establish reliable cross-modality correspondences. On that basis, we design a Memory Hybrid Learning module to jointly learn modality-specific and modality-invariant information. Comprehensive experiments conducted on two widely recognized benchmarks, SYSU-MM01 and RegDB, demonstrate that RPNR outperforms the current state-of-the-art GUR with an average Rank-1 improvement of 10.3%. The source codes will be released soon.
https://arxiv.org/abs/2405.05613
Our work tackles the fundamental challenge of image segmentation in computer vision, which is crucial for diverse applications. While supervised methods demonstrate proficiency, their reliance on extensive pixel-level annotations limits scalability. In response to this challenge, we present an enhanced unsupervised Convolutional Neural Network (CNN)-based algorithm called DynaSeg. Unlike traditional approaches that rely on a fixed weight factor to balance feature similarity and spatial continuity, requiring manual adjustments, our novel, dynamic weighting scheme automates parameter tuning, adapting flexibly to image details. We also introduce the novel concept of a Silhouette Score Phase that addresses the challenge of dynamic clustering during iterations. Additionally, our methodology integrates both CNN-based and pre-trained ResNet feature extraction, offering a comprehensive and adaptable approach. We achieve state-of-the-art results on diverse datasets, with a notable 12.2% and 14.12% mIOU improvement compared to the current benchmarks on COCO-All and COCO-Stuff, respectively. The proposed approach unlocks the potential for unsupervised image segmentation and addresses scalability concerns in real-world scenarios by obviating the need for meticulous parameter tuning.
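The silhouette score underlying the proposed phase can be computed as follows (a minimal sketch assuming Euclidean distance and at least two points per cluster; how DynaSeg uses the score during iterations is not reproduced here):

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) per point, where
    a is the mean distance to the point's own cluster and b the mean
    distance to the nearest other cluster. Near +1 means well-clustered;
    negative means likely mis-assigned."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
    score = 0.0
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for j in range(len(points)) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        score += (b - a) / max(a, b)
    return score / len(points)
```

Tracking this score across iterations gives an annotation-free signal for when the dynamic clustering has settled into well-separated segments.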
https://arxiv.org/abs/2405.05477
Despite their effectiveness, current deep learning models face challenges with images coming from different domains with varying appearance and content. We introduce SegCLR, a versatile framework designed to segment volumetric images across different domains, employing supervised and contrastive learning simultaneously to effectively learn from both labeled and unlabeled data. We demonstrate the superior performance of SegCLR through a comprehensive evaluation involving three diverse clinical datasets of retinal fluid segmentation in 3D Optical Coherence Tomography (OCT), various network configurations, and verification across 10 different network initializations. In an unsupervised domain adaptation context, SegCLR achieves results on par with a supervised upper-bound model trained on the intended target domain. Notably, we discover that the segmentation performance of SegCLR framework is marginally impacted by the abundance of unlabeled data from the target domain, thereby we also propose an effective zero-shot domain adaptation extension of SegCLR, eliminating the need for any target domain information. This shows that our proposed addition of contrastive loss in standard supervised training for segmentation leads to superior models, inherently more generalizable to both in- and out-of-domain test data. We additionally propose a pragmatic solution for SegCLR deployment in realistic scenarios with multiple domains containing labeled data. Accordingly, our framework pushes the boundaries of deep-learning based segmentation in multi-domain applications, regardless of data availability - labeled, unlabeled, or nonexistent.
https://arxiv.org/abs/2405.05336
Accurately estimating a Health Index (HI) from condition monitoring (CM) data is essential for reliable and interpretable prognostics and health management (PHM) in complex systems. In most scenarios, complex systems operate under varying operating conditions and can exhibit different fault modes, making unsupervised inference of an HI from CM data a significant challenge. Hybrid models combining prior knowledge about degradation with deep learning models have been proposed to overcome this challenge. However, previously suggested hybrid models for HI estimation usually rely heavily on system-specific information, limiting their transferability to other systems. In this work, we propose an unsupervised hybrid method for HI estimation that integrates general knowledge about degradation into the convolutional autoencoder's model architecture and learning algorithm, enhancing its applicability across various systems. The effectiveness of the proposed method is demonstrated in two case studies from different domains: turbofan engines and lithium batteries. The results show that the proposed method outperforms other competitive alternatives, including residual-based methods, in terms of HI quality and their utility for Remaining Useful Life (RUL) predictions. The case studies also highlight the comparable performance of our proposed method with a supervised model trained with HI labels.
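One simple way general degradation knowledge can enter such a learning algorithm is a soft monotonicity penalty on the inferred HI trajectory (an illustrative assumption, not necessarily the constraint used in the paper; `monotonicity_penalty` is a hypothetical name):

```python
def monotonicity_penalty(hi):
    """Penalize increases in a health index that is expected to decay
    over a unit's life: the sum of positive first differences is zero
    for a non-increasing trajectory and grows with every recovery."""
    return sum(max(0.0, b - a) for a, b in zip(hi, hi[1:]))
```

Added to the autoencoder's training loss, such a term steers the latent HI toward physically plausible, non-recovering degradation curves without requiring any HI labels.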
https://arxiv.org/abs/2405.04990
Facial feature tracking is essential in imaging ballistocardiography for accurate heart rate estimation and enables motor degradation quantification in Parkinson's disease through skin feature tracking. While deep convolutional neural networks have shown remarkable accuracy in tracking tasks, they typically require extensive labeled data for supervised training. Our proposed pipeline employs a convolutional stacked autoencoder to match image crops with a reference crop containing the target feature, learning deep feature encodings specific to the object category in an unsupervised manner, thus reducing data requirements. To overcome edge effects that make performance dependent on crop size, we introduce a Gaussian weight on the residual errors of the pixels when calculating the loss function. Training the autoencoder on facial images and validating its performance on manually labeled face and hand videos, our Deep Feature Encodings (DFE) method demonstrated superior tracking accuracy with a mean error ranging from 0.6 to 3.3 pixels, outperforming traditional methods such as SIFT, SURF, and Lucas-Kanade, as well as the latest transformers such as PIPs++ and CoTracker. Overall, our unsupervised learning approach excels in tracking various skin features under significant motion conditions, providing superior feature descriptors for tracking, matching, and image registration compared to both traditional and state-of-the-art supervised learning methods.
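The Gaussian-weighted residual idea can be sketched as follows (a simplified single-channel version; `gaussian_weighted_loss` and the default `sigma` are illustrative assumptions, and in the pipeline the weight is applied inside the autoencoder's training loss):

```python
import math

def gaussian_weighted_loss(pred, target, sigma=0.5):
    """MSE over a square crop with a Gaussian weight centered on the crop,
    down-weighting border pixels so edge effects (and hence crop size)
    matter less.

    pred/target: 2D lists of pixel values; sigma is in units of the
    half crop width.
    """
    n = len(pred)
    c = (n - 1) / 2.0
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            r2 = ((i - c) / (c + 1e-8)) ** 2 + ((j - c) / (c + 1e-8)) ** 2
            w = math.exp(-r2 / (2 * sigma ** 2))
            num += w * (pred[i][j] - target[i][j]) ** 2
            den += w
    return num / den
```

Residuals near the crop center, where the tracked feature sits, dominate the loss, while identical errors at the borders contribute far less.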
https://arxiv.org/abs/2405.04943
Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct a pretext task of synthesized-image reconstruction to extract image quality information in an unsupervised manner based on ViT blocks. The pretrained encoder of Cross-IQA is used to fine-tune a linear regression model for score prediction. Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing the low-frequency degradation information (e.g., color change, blurring, etc.) of images compared with classical full-reference IQA and NR-IQA methods on the same datasets.
https://arxiv.org/abs/2405.04311
In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined along the reverse diffusion trajectory. A measurement consistency criterion enforces the fidelity of the generated speech with the reverberant measurement, while an unconditional diffusion model implements a strong prior for clean speech generation. Without any knowledge of the room impulse response nor any coupled reverberant-anechoic data, we can successfully perform dereverberation in various acoustic scenarios. Our method significantly outperforms previous blind unsupervised baselines, and we demonstrate its increased robustness to unseen acoustic conditions in comparison to blind supervised methods. Audio samples and code are available online.
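The exponential-decay parameterization of one subband's reverberation filter can be sketched as follows (a magnitude-envelope-only simplification; `subband_reverb_filter`, the T60-based parameterization, and the frame-rate sampling are illustrative assumptions rather than the paper's exact operator):

```python
import math

def subband_reverb_filter(t60, n_taps, frame_rate):
    """Exponentially decaying magnitude envelope for one frequency subband.

    t60: reverberation time (seconds) for this subband, i.e. the time over
    which the envelope drops by 60 dB. Returns n_taps envelope values
    sampled at frame_rate Hz.
    """
    # Amplitude decay rate: exp(-decay * t60) == 10**(-60/20)
    decay = math.log(10 ** (60 / 20)) / t60
    return [math.exp(-decay * k / frame_rate) for k in range(n_taps)]
```

During posterior sampling, a per-subband decay parameter like `t60` would be re-estimated at each step of the reverse diffusion trajectory as the speech estimate is refined.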
https://arxiv.org/abs/2405.04272
The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may not improve, and may even harm, generalization to authentic image quality assessment. To solve this challenge, we propose distortion-guided unsupervised domain adaptation for BIQA (DGQA), a novel framework that leverages adaptive multi-domain selection via prior knowledge from distortion to match the data distribution between the source domains and the target domain, thereby reducing negative transfer from outlier source domains. Extensive experiments on two cross-domain settings (synthetic distortion to authentic distortion and synthetic distortion to algorithmic distortion) have demonstrated the effectiveness of our proposed DGQA. Besides, DGQA is orthogonal to existing model-based BIQA methods, and can be used in combination with such models to improve performance with less training data.
https://arxiv.org/abs/2405.04167
Cancer, a leading cause of death globally, occurs due to genomic changes and manifests heterogeneously across patients. To advance research on personalized treatment strategies, the effectiveness of various drugs on cells derived from cancers (`cell lines') is experimentally determined in laboratory settings. Nevertheless, variations in the distribution of genomic data and drug responses between cell lines and humans arise due to biological and environmental differences. Moreover, while genomic profiles of many cancer patients are readily available, the scarcity of corresponding drug response data limits the ability to train machine learning models that can predict drug response in patients effectively. Recent cancer drug response prediction methods have largely followed the paradigm of unsupervised domain-invariant representation learning followed by a downstream drug response classification step. Introducing supervision in both stages is challenging due to heterogeneous patient response to drugs and limited drug response data. This paper addresses these challenges through a novel representation learning method in the first phase and weak supervision in the second. Experimental results on real patient data demonstrate the efficacy of our method (WISER) over state-of-the-art alternatives on predicting personalized drug response.
https://arxiv.org/abs/2405.04078