The primary color profile of a given identity is assumed to remain consistent in typical Person Re-identification (Person ReID) tasks. However, this assumption may not hold in real-world situations, where images of the same identity exhibit varying color profiles due to cross-modality cameras or clothing changes. To address this issue, we propose Color Space Learning (CSL) for such Cross-Color Person ReID problems. Specifically, CSL guides the model to be less color-sensitive with two modules: Image-level Color-Augmentation and Pixel-level Color-Transformation. The first module increases the color diversity of the inputs and guides the model to focus more on non-color information. The second module projects every pixel of the input images onto a new color space. In addition, we introduce a new Person ReID benchmark across RGB and Infrared modalities, NTU-Corridor, which is the first with privacy agreements from all participants. To assess the effectiveness and robustness of the proposed CSL, we evaluate it on several Cross-Color Person ReID benchmarks, where it consistently surpasses state-of-the-art methods. The code and benchmark are available at: this https URL
https://arxiv.org/abs/2405.09487
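A minimal sketch of image-level color augmentation in the spirit of CSL's first module. The transform set here (channel shuffle, grayscale, identity) and the function name are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def color_augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly perturb the color profile of an (H, W, 3) RGB image.

    One of three transforms is applied with equal probability: channel
    shuffle, grayscale conversion, or identity. Diversifying colors this
    way pushes the model toward non-color cues. (Illustrative sketch.)
    """
    op = rng.integers(3)
    if op == 0:                      # channel shuffle: permute R, G, B
        return img[..., rng.permutation(3)]
    if op == 1:                      # grayscale: replicate luminance over channels
        gray = img.mean(axis=-1, keepdims=True)
        return np.repeat(gray, 3, axis=-1)
    return img                       # identity: keep original colors
```

Applied on the fly during training, every epoch sees the same identity under different color profiles.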
Unsupervised Visible-Infrared Person Re-identification (USVI-ReID) is a formidable task that aims to match pedestrian images across visible and infrared modalities without any annotations. Recently, clustering-based pseudo-label methods have become predominant in USVI-ReID, although the inherent noise in pseudo-labels presents a significant obstacle. Most existing works primarily focus on shielding the model from the harmful effects of this noise, neglecting to calibrate the noisy pseudo-labels usually associated with hard samples, which compromises the robustness of the model. To address this issue, we design a Robust Pseudo-label Learning with Neighbor Relation (RPNR) framework for USVI-ReID. Specifically, we first introduce a straightforward yet potent Noisy Pseudo-label Calibration module to correct noisy pseudo-labels. Because of high intra-class variations, noisy pseudo-labels are difficult to calibrate completely. Therefore, we introduce a Neighbor Relation Learning module that reduces high intra-class variations by modeling potential interactions between all samples. Subsequently, we devise an Optimal Transport Prototype Matching module to establish reliable cross-modality correspondences. On that basis, we design a Memory Hybrid Learning module to jointly learn modality-specific and modality-invariant information. Comprehensive experiments conducted on two widely recognized benchmarks, SYSU-MM01 and RegDB, demonstrate that RPNR outperforms the current state-of-the-art GUR with an average Rank-1 improvement of 10.3%. The source code will be released soon.
https://arxiv.org/abs/2405.05613
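The Optimal Transport Prototype Matching step above can be approximated with entropy-regularized optimal transport (Sinkhorn iterations) between visible and infrared cluster prototypes. This is a generic Sinkhorn sketch under uniform marginals, not the paper's exact formulation:

```python
import numpy as np

def sinkhorn(cost: np.ndarray, reg: float = 0.05, n_iter: int = 200) -> np.ndarray:
    """Entropy-regularized optimal transport with uniform marginals.

    Returns a transport plan whose row/column sums match uniform
    distributions over visible / infrared prototypes; the largest entry
    per row gives a soft cross-modality correspondence.
    """
    K = np.exp(-cost / reg)              # Gibbs kernel
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])   # uniform row marginal
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])   # uniform column marginal
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale columns to match b
        u = a / (K @ v)                  # scale rows to match a
    return u[:, None] * K * v[None, :]   # transport plan
```

The marginal constraints are what make the correspondence reliable: every prototype must send and receive mass, so no identity is silently dropped.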
Text-to-image person re-identification (ReID) retrieves pedestrian images according to textual descriptions. Manually annotating textual descriptions is time-consuming, restricting the scale of existing datasets and therefore the generalization ability of ReID models. As a result, we study the transferable text-to-image ReID problem, where we train a model on our proposed large-scale database and directly deploy it to various datasets for evaluation. We obtain substantial training data via Multi-modal Large Language Models (MLLMs). Moreover, we identify and address two key challenges in utilizing the obtained textual descriptions. First, an MLLM tends to generate descriptions with similar structures, causing the model to overfit specific sentence patterns. Thus, we propose a novel method that uses MLLMs to caption images according to various templates. These templates are obtained using a multi-turn dialogue with a Large Language Model (LLM). Therefore, we can build a large-scale dataset with diverse textual descriptions. Second, an MLLM may produce incorrect descriptions. Hence, we introduce a novel method that automatically identifies words in a description that do not correspond with the image. This method is based on the similarity between each word and all patch token embeddings in the image. Then, we mask these words with a larger probability in the subsequent training epoch, alleviating the impact of noisy textual descriptions. The experimental results demonstrate that our methods significantly boost the direct transfer text-to-image ReID performance. Benefiting from the pre-trained model weights, we also achieve state-of-the-art performance in the traditional evaluation settings.
https://arxiv.org/abs/2405.04940
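The noisy-word identification step described above rests on text-to-patch similarity. A hedged sketch, assuming pre-computed word and patch embeddings; the function name and the max-over-patches rule are illustrative:

```python
import numpy as np

def word_noise_scores(word_emb: np.ndarray, patch_emb: np.ndarray) -> np.ndarray:
    """Score how poorly each word matches the image.

    For each word embedding (n_words, d), take the maximum cosine
    similarity over all patch token embeddings (n_patches, d); words with
    low max-similarity are more likely to be hallucinated and would be
    masked with higher probability in the next epoch.
    """
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    sim = w @ p.T                     # (n_words, n_patches) cosine similarities
    return 1.0 - sim.max(axis=1)      # high score -> mask more often
```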
The quest for robust person re-identification (Re-ID) systems capable of accurately identifying subjects across diverse scenarios remains a formidable challenge in surveillance and security applications. This study presents a novel methodology that significantly enhances Re-ID by integrating Uncertainty Feature Fusion (UFFM) with Wise Distance Aggregation (WDA). Tested on the benchmark datasets Market-1501, DukeMTMC-ReID, and MSMT17, our approach demonstrates substantial improvements in Rank-1 accuracy and mean Average Precision (mAP). Specifically, UFFM capitalizes on the power of feature synthesis from multiple images to overcome the limitations imposed by the variability of subject appearances across different views. WDA further refines the process by intelligently aggregating similarity metrics, thereby enhancing the system's ability to discern subtle but critical differences between subjects. The empirical results affirm the superiority of our method over existing approaches, achieving new performance benchmarks across all evaluated datasets. Code is available on GitHub.
https://arxiv.org/abs/2405.01101
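As a rough illustration of the two ideas, the sketch below synthesizes a feature from a query's nearest gallery neighbors and aggregates two distance measures with a weighted sum. The specific choices (k nearest neighbors, a fixed weight alpha) are assumptions, not the paper's UFFM/WDA formulas:

```python
import numpy as np

def fuse_and_rank(query: np.ndarray, gallery: np.ndarray,
                  k: int = 3, alpha: float = 0.5) -> np.ndarray:
    """Rank gallery images for one query.

    The raw query-to-gallery distance is mixed with the distance to a
    synthesized feature (the mean of the query's k nearest gallery
    features), so a single unusual view does not dominate the ranking.
    query: (d,); gallery: (n, d). Returns gallery indices, best first.
    """
    d = np.linalg.norm(gallery - query, axis=1)           # raw distances
    fused = gallery[np.argsort(d)[:k]].mean(axis=0)       # synthesized multi-image feature
    d_fused = np.linalg.norm(gallery - fused, axis=1)     # distance to the synthesis
    return np.argsort(alpha * d + (1 - alpha) * d_fused)  # weighted aggregation
```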
Clothes-changing person re-identification (CC-ReID) aims to retrieve images of the same person wearing different outfits. Mainstream research focuses on designing advanced model structures and strategies to capture identity information independent of clothing. However, same-clothes discrimination, the standard ReID learning objective, has been persistently ignored in previous CC-ReID research. In this study, we dive into the relationship between the standard and clothes-changing (CC) learning objectives, and bring the inner conflicts between these two objectives to the fore. We magnify the proportion of CC training pairs by supplementing high-fidelity clothes-varying synthesis produced by our proposed Clothes-Changing Diffusion model. By incorporating the synthetic images into CC-ReID model training, we observe a significant improvement under the CC protocol. However, this improvement sacrifices performance under the standard protocol, owing to the inner conflict between the standard and CC objectives. To mitigate the conflict, we decouple these objectives and re-formulate CC-ReID learning as a multi-objective optimization (MOO) problem. By effectively regularizing the gradient curvature across multiple objectives and introducing preference restrictions, our MOO solution surpasses the single-task training paradigm. Our framework is model-agnostic and demonstrates superior performance under both the CC and standard ReID protocols.
https://arxiv.org/abs/2404.12611
Current clothes-changing person re-identification (re-id) approaches usually perform retrieval based on clothes-irrelevant features, while neglecting the potential of clothes-relevant features. However, we observe that relying solely on clothes-irrelevant features for clothes-changing re-id is limited, since they often lack adequate identity information and suffer from large intra-class variations. On the contrary, clothes-relevant features can be used to discover same-clothes intermediaries that possess informative identity clues. Based on this observation, we propose a Feasibility-Aware Intermediary Matching (FAIM) framework to additionally utilize clothes-relevant features for retrieval. Firstly, an Intermediary Matching (IM) module is designed to perform an intermediary-assisted matching process. This process involves using clothes-relevant features to find informative intermediates, and then using clothes-irrelevant features of these intermediates to complete the matching. Secondly, in order to reduce the negative effect of low-quality intermediaries, an Intermediary-Based Feasibility Weighting (IBFW) module is designed to evaluate the feasibility of intermediary matching process by assessing the quality of intermediaries. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on several widely-used clothes-changing re-id benchmarks.
https://arxiv.org/abs/2404.09507
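The intermediary-assisted matching process can be sketched as a two-hop lookup: clothes-relevant features select an intermediary, and the intermediary's clothes-irrelevant features score the gallery. Function and argument names are illustrative, and the feasibility weighting (IBFW) is omitted:

```python
import numpy as np

def intermediary_match(q_rel: np.ndarray, g_rel_bank: np.ndarray,
                       g_irr_bank: np.ndarray) -> np.ndarray:
    """Two-hop query-to-gallery distances via an intermediary.

    Hop 1 uses clothes-relevant features (q_rel: (d,), g_rel_bank: (n, d))
    to pick the same-clothes intermediary; hop 2 uses that intermediary's
    clothes-irrelevant features (g_irr_bank: (n, d)) to score every
    gallery image. Returns (n,) combined distances, smaller = better.
    """
    hop1 = np.linalg.norm(g_rel_bank - q_rel, axis=1)          # same-clothes candidates
    m = int(np.argmin(hop1))                                   # best intermediary index
    hop2 = np.linalg.norm(g_irr_bank - g_irr_bank[m], axis=1)  # identity-based second hop
    return hop1[m] + hop2                                      # combined two-hop distance
```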
Visible-infrared person re-identification (VI-ReID) aims at matching cross-modality pedestrian images captured by disjoint visible or infrared cameras. Existing methods alleviate the cross-modality discrepancies by designing different kinds of network architectures. Departing from these methods, in this paper we propose a novel parameter-optimizing paradigm, the parameter hierarchical optimization (PHO) method, for the task of VI-ReID. It allows part of the parameters to be directly optimized without any training, which narrows the search space of parameters and makes the whole network easier to train. Specifically, we first divide the parameters into different types, and then introduce a self-adaptive alignment strategy (SAS) to automatically align the visible and infrared images through transformation. Considering that features in different dimensions have varying importance, we develop an auto-weighted alignment learning (AAL) module that can automatically weight features according to their importance. Importantly, in the alignment process of SAS and AAL, all the parameters are immediately optimized with optimization principles rather than by training the whole network, which yields a better parameter training manner. Furthermore, we establish the cross-modality consistent learning (CCL) loss to extract discriminative person representations with translation consistency. We provide both theoretical justification and empirical evidence that our proposed PHO method outperforms existing VI-ReID approaches.
https://arxiv.org/abs/2404.07930
Unsupervised visible-infrared person re-identification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intra-modality clustering and cross-modality feature matching to achieve UVI-ReID. However, there exist two challenges: 1) noisy pseudo labels might be generated in the clustering process, and 2) cross-modality feature alignment via matching the marginal distributions of the visible and infrared modalities may misalign the different identities from the two modalities. In this paper, we first conduct a theoretical analysis in which an interpretable generalization upper bound is introduced. Based on the analysis, we then propose a novel unsupervised cross-modality person re-identification framework (PRAISE). Specifically, to address the first challenge, we propose a pseudo-label correction strategy that utilizes a Beta Mixture Model to predict the probability of mis-clustering based on the network's memory effect, and rectifies the correspondence by adding a perceptual term to contrastive learning. Next, we introduce a modality-level alignment strategy that generates paired visible-infrared latent features and reduces the modality gap by aligning the labeling functions of the visible and infrared features, learning identity-discriminative and modality-invariant features. Experimental results on two benchmark datasets demonstrate that our method achieves state-of-the-art performance compared with existing unsupervised VI-ReID methods.
https://arxiv.org/abs/2404.06683
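The pseudo-label correction idea, modeling per-sample losses with a two-component Beta mixture so that high-loss samples are flagged as likely mis-clustered, can be sketched as follows. The mixture parameters here are fixed for illustration; the paper would fit them to the observed loss distribution:

```python
import math
import numpy as np

def beta_pdf(x: np.ndarray, a: float, b: float) -> np.ndarray:
    """Beta(a, b) density, via the Gamma function."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * x ** (a - 1) * (1 - x) ** (b - 1)

def noise_posterior(losses: np.ndarray,
                    clean: tuple = (2.0, 5.0),
                    noisy: tuple = (5.0, 2.0),
                    pi: float = 0.5) -> np.ndarray:
    """Posterior probability that each pseudo-label is noisy.

    Normalized per-sample losses in (0, 1) are modeled as a mixture of a
    low-loss "clean" Beta component and a high-loss "noisy" one; by the
    memory effect, clean samples are fit early and keep low losses.
    The component parameters above are illustrative, not fitted.
    """
    losses = np.clip(losses, 1e-4, 1 - 1e-4)
    p_clean = (1 - pi) * beta_pdf(losses, *clean)   # low-loss component
    p_noisy = pi * beta_pdf(losses, *noisy)         # high-loss component
    return p_noisy / (p_clean + p_noisy)
```

Samples with posterior above a threshold would then have their pseudo-labels corrected rather than merely down-weighted.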
The memory-dictionary-based contrastive learning method has achieved remarkable results in the field of unsupervised person Re-ID. However, updating the memory based on all samples does not fully utilize the hardest samples to improve the generalization ability of the model, while methods based on hardest-sample mining inevitably introduce false-positive samples that are incorrectly clustered in the early stages of training. Clustering-based methods usually discard a significant number of outliers, leading to the loss of valuable information. To address these issues, we propose an adaptive intra-class variation contrastive learning algorithm for unsupervised Re-ID, called AdaInCV. The algorithm quantitatively evaluates the learning ability of the model for each class by considering the intra-class variation after clustering, which helps in selecting appropriate samples during training. More specifically, two new strategies are proposed: Adaptive Sample Mining (AdaSaM) and Adaptive Outlier Filter (AdaOF). The first gradually creates more reliable clusters to dynamically refine the memory, while the second identifies and filters out valuable outliers as negative samples.
https://arxiv.org/abs/2404.04665
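A simple stand-in for the per-class statistic AdaInCV relies on: the mean distance of a cluster's members to its centroid, one number per class. The exact measure and how it drives sample selection are the paper's; this shows only the flavor:

```python
import numpy as np

def intra_class_variation(features: np.ndarray, labels: np.ndarray) -> dict:
    """Mean member-to-centroid distance per cluster.

    features: (n, d) embeddings; labels: (n,) cluster assignments.
    A small value suggests the model has learned the class well, so
    harder samples can be mined from it; a large value suggests caution.
    """
    out = {}
    for c in np.unique(labels):
        members = features[labels == c]
        centroid = members.mean(axis=0)
        out[int(c)] = float(np.linalg.norm(members - centroid, axis=1).mean())
    return out
```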
The goal of occluded person re-identification (ReID) is to retrieve specific pedestrians in occluded situations. However, occluded person ReID still suffers from background clutter and low-quality local feature representations, which limits model performance. In our research, we introduce a new framework called PAB-ReID, a novel ReID model incorporating part-attention mechanisms to tackle the aforementioned issues effectively. Firstly, we introduce human parsing labels to guide the generation of more accurate human part attention maps. In addition, we propose a fine-grained feature focuser for generating fine-grained human local feature representations while suppressing background interference. Moreover, we design a part triplet loss to supervise the learning of human local features, which optimizes intra-/inter-class distances. We conducted extensive experiments on specialized occlusion and regular ReID datasets, showing that our approach outperforms the existing state-of-the-art methods.
https://arxiv.org/abs/2404.03443
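The part triplet loss can be sketched directly from the standard triplet margin loss, averaged over part-level features; the margin value and array layout below are illustrative assumptions:

```python
import numpy as np

def part_triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                      negative: np.ndarray, margin: float = 0.3) -> float:
    """Triplet loss averaged over P part features of shape (P, d).

    For each body part, the anchor-positive distance should be at least
    `margin` smaller than the anchor-negative distance, pulling intra-class
    part features together and pushing inter-class ones apart.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=1)   # per-part positive distance
    d_an = np.linalg.norm(anchor - negative, axis=1)   # per-part negative distance
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())
```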
Occlusion remains one of the major challenges in person re-identification (ReID) as a result of the diversity of poses and the variation of appearances. Developing novel architectures to improve the robustness of occlusion-aware person Re-ID requires new insights, especially on low-resolution edge cameras. We propose a deep ensemble model that harnesses both CNN and Transformer architectures to generate robust feature representations. To achieve robust Re-ID without the need to manually label occluded regions, we take an ensemble learning-based approach derived from the analogy between arbitrarily shaped occluded regions and robust feature representation. Using the orthogonality principle, our deep CNN model makes use of a masked autoencoder (MAE) and global-local feature fusion for robust person identification. Furthermore, we present a part occlusion-aware transformer capable of learning a feature space that is robust to occluded regions. Experimental results are reported on several Re-ID datasets to show the effectiveness of our ensemble model, named orthogonal fusion with occlusion handling (OFOH). Compared to competing methods, the proposed OFOH approach achieves competitive rank-1 and mAP performance.
https://arxiv.org/abs/2404.00107
Unsupervised person re-identification aims to retrieve images of a specified person without identity labels. Many recent unsupervised Re-ID approaches adopt clustering-based methods that measure cross-camera feature similarity to roughly divide images into clusters. They ignore the feature distribution discrepancy induced by the camera domain gap, resulting in unavoidable performance degradation. Camera information is usually available, and the feature distribution within a single camera usually focuses more on the appearance of the individual and has less intra-identity variance. Inspired by this observation, we introduce a Camera-Aware Label Refinement (CALR) framework that reduces camera discrepancy by clustering intra-camera similarities. Specifically, we employ intra-camera training to obtain reliable local pseudo labels within each camera, then refine the global labels generated by inter-camera clustering and train the discriminative model using the more reliable global pseudo labels in a self-paced manner. Meanwhile, we develop a camera-alignment module to align feature distributions under different cameras, which helps deal with camera variance further. Extensive experiments validate the superiority of our proposed method over state-of-the-art approaches. The code is accessible at this https URL.
https://arxiv.org/abs/2403.16450
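The label-refinement idea, using reliable intra-camera clusters to clean up inter-camera pseudo labels, can be sketched as a majority vote inside each intra-camera cluster. This omits the self-paced schedule and is only a simplified reading of CALR:

```python
import numpy as np

def refine_global_labels(global_labels: np.ndarray, camera_ids: np.ndarray,
                         local_labels: np.ndarray) -> np.ndarray:
    """Refine inter-camera (global) pseudo labels with intra-camera clusters.

    Within each camera, every intra-camera cluster votes, and its members
    are reassigned the majority global label; because intra-camera clusters
    are more reliable, this corrects global labels split by the camera gap.
    All inputs are (n,) arrays over the same samples.
    """
    refined = global_labels.copy()
    for cam in np.unique(camera_ids):
        for loc in np.unique(local_labels[camera_ids == cam]):
            mask = (camera_ids == cam) & (local_labels == loc)
            vals, counts = np.unique(global_labels[mask], return_counts=True)
            refined[mask] = vals[np.argmax(counts)]   # majority vote
    return refined
```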
Lifelong Person Re-Identification (LReID) aims to continuously learn from successive data streams, matching individuals across multiple cameras. The key challenge for LReID is how to effectively preserve old knowledge while incrementally learning new information. Task-level domain gaps and limited old-task datasets are key factors leading to catastrophic forgetting in LReID, and they are overlooked in existing methods. To alleviate this problem, we propose a novel Diverse Representation Embedding (DRE) framework for LReID. The proposed DRE preserves old knowledge while adapting to new information based on instance-level and task-level layouts. Concretely, an Adaptive Constraint Module (ACM) is proposed to implement integration and push-away operations between multiple representations, obtaining a dense embedding subspace for each instance that improves matching ability on limited old-task datasets. Based on the processed diverse representations, we exchange knowledge between the adjustment model and the learner model through Knowledge Update (KU) and Knowledge Preservation (KP) strategies at the task-level layout, which reduces the task-wise domain gap on both old and new tasks and exploits the diverse representation of each instance in limited datasets from old tasks, improving model performance over extended periods. Extensive experiments were conducted on eleven Re-ID datasets, including five seen datasets for training in two orders (order-1 and order-2) and six unseen datasets for inference. Compared to state-of-the-art methods, our method achieves significantly improved performance on holistic, large-scale, and occluded datasets.
https://arxiv.org/abs/2403.16003
Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques. However, the existing benchmark datasets lack diversity, and models trained on these data cannot generalize well to dynamic wild scenarios. To meet the goal of improving the explicit generalization of ReID models, we develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features. 1) Diverse collection scenes: multiple independent open-world and highly dynamic collecting scenes, including streets, intersections, shopping malls, etc. 2) Diverse lighting variations: long time spans from daytime to nighttime with abundant illumination changes. 3) Diverse person status: multiple camera networks in all seasons with normal/adverse weather conditions and diverse pedestrian appearances (e.g., clothes, personal belongings, poses, etc.). 4) Protected privacy: invisible faces for privacy-critical applications. To improve the implicit generalization of ReID, we further propose a Latent Domain Expansion (LDE) method to develop the potential of source data, which decouples discriminative identity-relevant and trustworthy domain-relevant features and implicitly enforces domain-randomized identity feature space expansion with richer domain diversity to facilitate domain-invariant representations. Our comprehensive evaluations on most benchmark datasets in the community are crucial for progress, although this work is still far from the grand goal of open-world and dynamic wild applications.
https://arxiv.org/abs/2403.15119
Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. To alleviate the disruption of discriminative identity representations by dramatic view discrepancy, the most significant challenge in AGPReID, we propose the view-decoupled transformer (VDT), a simple yet effective framework. Two major components are designed in VDT to decouple view-related and view-unrelated features, namely hierarchical subtractive separation and orthogonal loss: the former separates these two features inside the VDT, and the latter constrains them to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank-1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID, while keeping the same magnitude of computational complexity. Our project is available at this https URL
https://arxiv.org/abs/2403.14513
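VDT's orthogonal loss, which pushes view-related and view-unrelated features toward independence, can be sketched as the mean squared cosine similarity between the two feature sets; the exact form used in the paper may differ:

```python
import numpy as np

def orthogonal_loss(view_feat: np.ndarray, id_feat: np.ndarray) -> float:
    """Penalize correlation between view-related and view-unrelated features.

    Both inputs are (n, d) batches; the loss is the mean squared dot
    product of the L2-normalized feature pairs. Driving it to zero makes
    the two subspaces orthogonal, so identity cues are freed from view bias.
    """
    v = view_feat / np.linalg.norm(view_feat, axis=1, keepdims=True)
    u = id_feat / np.linalg.norm(id_feat, axis=1, keepdims=True)
    return float(np.mean(np.sum(v * u, axis=1) ** 2))
```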
Person re-identification (re-id), which aims to retrieve from a database images of the same person as in a given query image, is one of the most practical image recognition applications. In the real world, however, the environments that the images are taken in change over time. This causes a distribution shift between training and testing and degrades re-id performance. To maintain performance, models should keep adapting to the test environment's temporal changes. Test-time adaptation (TTA), which aims to adapt models to the test environment with only unlabeled test data, is a promising way to handle this problem because TTA can adapt models instantly in the test environment. However, previous TTA methods are designed for classification and cannot be directly applied to re-id. This is because in re-id the set of people's identities differs between training and testing, whereas the set of classes is fixed in current TTA methods designed for classification. To improve re-id performance in changing test environments, we propose TEst-time similarity Modification for Person re-identification (TEMP), a novel TTA method for re-id. TEMP is the first fully TTA method for re-id, requiring no modification to pre-training. Inspired by TTA methods that refine the prediction uncertainty in classification, we aim to refine the uncertainty in re-id. However, the uncertainty cannot be computed in the same way as in classification, since re-id is an open-set task that does not share person labels between training and testing. Hence, we propose re-id entropy, an alternative uncertainty measure for re-id computed from the similarity between feature vectors. Experiments show that re-id entropy can measure the uncertainty of re-id and that TEMP improves re-id performance in online settings where the distribution changes over time.
https://arxiv.org/abs/2403.14114
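Re-id entropy as described, an uncertainty measure built from feature similarities rather than a fixed class set, can be sketched as the entropy of a softmax over query-gallery cosine similarities; the temperature is an illustrative knob:

```python
import numpy as np

def reid_entropy(query: np.ndarray, gallery: np.ndarray,
                 temperature: float = 0.1) -> float:
    """Shannon entropy of the softmax over query-gallery cosine similarities.

    query: (d,); gallery: (n, d). No shared label set is needed, so the
    measure fits the open-set re-id task; a confident (peaked) similarity
    profile yields low entropy, which TTA-style adaptation can minimize.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    logits = (g @ q) / temperature
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return float(-(p * np.log(p + 1e-12)).sum())
```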
Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task, owing to significant intra-class variations and cross-modal discrepancies among different cameras. Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features. They only seek distinctive information within these shared features, while ignoring the identity-aware useful information that is implicit in the modality-specific features. To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features. First, we extract modality-specific and modality-shared features using a novel dual-stream network. Then, the modality-specific features undergo purification to reduce their modality-style discrepancies while preserving identity-aware discriminative knowledge. Subsequently, this implicit knowledge is distilled into the modality-shared feature to enhance its distinctiveness. Finally, an alignment loss is proposed to minimize the modality discrepancy on the enhanced modality-shared features. Extensive experiments on multiple public datasets demonstrate the superiority of the IDKL network over state-of-the-art methods. Code is available at this https URL.
https://arxiv.org/abs/2403.11708
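The IDKL abstract says identity-aware knowledge implicit in the modality-specific features is distilled into the modality-shared feature. A common way to realize such a step is temperature-scaled KL-divergence distillation between the two branches' class distributions; the sketch below assumes that formulation (the function names and the choice of KL are illustrative, not taken from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over identity logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_div(p, q):
    """KL(p || q) between two probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(specific_logits, shared_logits, temperature=2.0):
    """KL(teacher || student): the modality-specific branch acts as the
    teacher and the modality-shared branch as the student, pushing the
    identity knowledge implicit in specific features into shared ones."""
    teacher = softmax(specific_logits, temperature)
    student = softmax(shared_logits, temperature)
    return kl_div(teacher, student)
```

The loss is zero when the shared branch already reproduces the specific branch's identity distribution, and positive otherwise, so gradient descent on the shared branch transfers the teacher's discriminative structure.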
Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, numerous variants of ReID models were developed to solve a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function in isolation and cannot be applied to other challenges. To the best of our knowledge, there is no versatile ReID model that can handle various ReID challenges at the same time. This work makes the first attempt at learning a versatile ReID model to solve such a problem. Our main idea is to form a two-stage prompt-based twin modeling framework called VersReID. VersReID first leverages the scene label to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts are used to encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank for adaptively solving the ReID of different scenes, eliminating the demand for scene labels during the inference stage. To facilitate training VersReID, we further introduce the multi-scene properties into self-supervised learning of ReID via a multi-scene prioris data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate the success of learning an effective and versatile ReID model for handling ReID tasks under multi-scene conditions without manual assignment of scene labels in the inference stage, including general, low-resolution, clothing-change, occlusion, and cross-modality scenes. Codes and models are available at this https URL.
https://arxiv.org/abs/2403.11121
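The VersReID abstract describes a ReID Bank whose scene-specific prompt groups are selected by scene label in the first training stage. A minimal sketch of that mechanism, assuming a ViT-style backbone where prompt tokens are prepended to the patch-token sequence; all names (`build_prompt_bank`, `prepend_scene_prompts`) and initialization choices are illustrative, not the paper's code:

```python
import random

def build_prompt_bank(scenes, prompts_per_scene, dim, seed=0):
    """One group of learnable prompt vectors per scene label,
    initialized with small Gaussian noise."""
    rng = random.Random(seed)
    return {s: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(prompts_per_scene)]
            for s in scenes}

def prepend_scene_prompts(patch_tokens, scene, bank):
    """Stage 1: pick the prompt group for the labeled scene and prepend
    it to the patch-token sequence fed to the transformer backbone."""
    return bank[scene] + patch_tokens
```

In the second stage the paper distills a single V-Branch with versatile prompts from this bank, which is what removes the need for the scene label at inference time.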
A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model capable of effectively addressing the significant discrepancies across modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, as this generated domain may not adequately capture sufficient common discriminant information. This paper introduces Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across diverse modalities. BMDG creates multiple virtual intermediate domains by finding and aligning body part features extracted from both I and V modalities. Indeed, BMDG aims to reduce the modality gaps in two steps. First, it aligns modalities in feature space by learning shared and modality-invariant body part prototypes from V and I images. Then, it generalizes the feature representation by applying bidirectional multi-step learning, which progressively refines feature representations in each step and incorporates more prototypes from both modalities. In particular, our method minimizes the cross-modal gap by identifying and aligning shared prototypes that capture key discriminative features across modalities, then uses multiple bridging steps based on this information to enhance the feature representation. Experiments conducted on challenging V-I ReID datasets indicate that our BMDG approach outperforms state-of-the-art part-based models and methods that generate an intermediate domain for V-I person ReID.
https://arxiv.org/abs/2403.10782
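BMDG creates multiple virtual intermediate domains between the aligned part prototypes of the V and I modalities. One simple instantiation of such a multi-step bridge is linear interpolation between aligned prototypes at progressively increasing mixing ratios; the sketch below assumes that formulation and is not the paper's actual generation scheme:

```python
def mix_prototypes(v_proto, i_proto, num_steps):
    """Create virtual intermediate domains by linearly interpolating
    between aligned visible and infrared part prototypes. Each step
    moves the representation one bridge closer to the other modality,
    giving num_steps intermediate domains strictly between V and I."""
    domains = []
    for k in range(1, num_steps + 1):
        t = k / (num_steps + 1)  # mixing ratio for this bridging step
        domains.append([(1 - t) * v + t * i
                        for v, i in zip(v_proto, i_proto)])
    return domains
```

Training on a sequence of such bridges, rather than a single generated intermediate domain, is what the abstract credits with preserving more common discriminant information across the modality gap.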
Lifelong person re-identification (LReID) assumes a practical scenario where the model is sequentially trained on continuously incoming datasets while alleviating catastrophic forgetting on the old datasets. However, not only the training datasets but also the gallery images are incrementally accumulated, which requires a huge amount of computation and storage space to extract the features at the inference phase. In this paper, we address the above-mentioned problem by incorporating backward-compatibility into LReID for the first time. We train the model using the continuously incoming datasets while maintaining the model's compatibility with the previously trained old models, without re-computing the features of the old gallery images. To this end, we devise a cross-model compatibility loss based on contrastive learning with respect to the replay features across all the old datasets. Moreover, we also develop a knowledge consolidation method based on part classification to learn the shared representation across different datasets for backward-compatibility. We also suggest a more practical methodology for performance evaluation, where all the gallery and query images are considered together. Experimental results demonstrate that the proposed method achieves significantly higher backward-compatibility performance than the existing methods. It is a promising tool for more practical scenarios of LReID.
https://arxiv.org/abs/2403.10022
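The abstract describes a cross-model compatibility loss based on contrastive learning over replay features from old models. A minimal sketch of one plausible form, an InfoNCE-style objective comparing a new-model feature against old-model replay features (the function names, the cosine similarity choice, and the `temperature` parameter are assumptions, not the paper's definition):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def compat_loss(new_feat, old_replay_feats, positive_idx, temperature=0.1):
    """InfoNCE-style cross-model compatibility loss: a feature from the
    new model is pulled toward the old model's replay feature of the
    same identity (positive_idx) and pushed away from other identities'
    replay features, so old gallery features remain directly comparable
    with new query features without re-extraction."""
    sims = [cosine(new_feat, f) / temperature for f in old_replay_feats]
    m = max(sims)  # stabilize the log-sum-exp
    z = sum(math.exp(s - m) for s in sims)
    return -(sims[positive_idx] - m - math.log(z))
```

The loss is small when the new model embeds a person near the old model's feature for the same identity, which is exactly the backward-compatibility property: new queries can be matched against the old, never re-extracted gallery.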