A fundamental tenet of pattern recognition is that overlap between training and testing sets causes an optimistic accuracy estimate. Deep CNNs for face recognition are trained for N-way classification of the identities in the training set. Accuracy is commonly estimated as average 10-fold classification accuracy on image pairs from test sets such as LFW, CALFW, CPLFW, CFP-FP and AgeDB-30. Because train and test sets have been independently assembled, images and identities in any given test set may also be present in any given training set. In particular, our experiments reveal a surprising degree of identity and image overlap between the LFW family of test sets and the MS1MV2 training set. Our experiments also reveal identity label noise in MS1MV2. We compare the accuracy achieved with same-size MS1MV2 subsets that are and are not identity-disjoint with LFW, to reveal the size of the optimistic bias. Using more challenging test sets from the LFW family, we find that the size of the optimistic bias is larger for more challenging test sets. Our results highlight both the absence of, and the need for, an identity-disjoint train and test methodology in face recognition research.
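The 10-fold pair-classification protocol referred to above can be sketched as follows. This is a simplified, hypothetical implementation (the function name and threshold-selection rule are illustrative, not taken from the paper): for each fold, a distance threshold is tuned on the remaining nine folds and then applied to the held-out fold.

```python
import numpy as np

def ten_fold_pair_accuracy(distances, labels, n_folds=10):
    """Sketch of LFW-style verification accuracy: per fold, pick the
    distance threshold maximizing accuracy on the other folds, then
    score the held-out fold; report the mean over folds."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels, dtype=bool)   # True = same identity
    folds = np.array_split(np.arange(len(distances)), n_folds)
    accs = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        best_t, best_acc = 0.0, -1.0
        for t in np.unique(distances[train_idx]):   # candidate thresholds
            acc = np.mean((distances[train_idx] < t) == labels[train_idx])
            if acc > best_acc:
                best_t, best_acc = t, acc
        accs.append(np.mean((distances[test_idx] < best_t) == labels[test_idx]))
    return float(np.mean(accs))
```

When identities or images overlap between the training set and the test pairs, an accuracy estimated this way overstates generalization, which is exactly the bias the paper measures.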
https://arxiv.org/abs/2405.09403
In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving and recognizing the driver's identity. However, the traditional parallel-style approach of combining multiple single-task models tends to waste resources when dealing with similar tasks. Therefore, we propose a novel tree-style approach to multi-task modeling: rooted at a shared backbone, increasingly dedicated, separate module branches are appended as the model pipeline goes deeper. Following the tree-style approach, we propose a multi-task learning model for simultaneously performing driver fatigue detection and face recognition for identifying a driver. This model shares a common feature extraction backbone module, with further separated feature extraction and classification module branches. The dedicated branches exploit and combine spatial and channel attention mechanisms to generate space-channel fused-attention enhanced features, leading to improved detection performance. As only single-task datasets are available, we introduce techniques including alternating updating and gradient accumulation for training our multi-task model using only the single-task datasets. The effectiveness of our tree-style multi-task learning model is verified through extensive validations.
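The alternating-update and gradient-accumulation idea can be illustrated with a minimal sketch. This is a toy linear model in plain NumPy with illustrative names, not the paper's actual training loop: each outer step selects one single-task dataset round-robin, accumulates gradients over several micro-batches, and applies a single update to the shared weights.

```python
import numpy as np

def train_alternating(w_init, tasks, steps=100, accum=4, lr=0.1):
    """Toy sketch of alternating updates with gradient accumulation.
    `tasks` is a list of (X, y) single-task datasets; each step trains
    on one task, mimicking multi-task training from single-task data."""
    w = np.array(w_init, dtype=float)
    for step in range(steps):
        X, y = tasks[step % len(tasks)]          # alternate between tasks
        grad = np.zeros_like(w)
        chunk = len(X) // accum
        for k in range(accum):                   # gradient accumulation
            Xk = X[k * chunk:(k + 1) * chunk]
            yk = y[k * chunk:(k + 1) * chunk]
            err = Xk @ w - yk                    # squared-error residual
            grad += Xk.T @ err / len(Xk)
        w -= lr * grad / accum                   # one update per accumulation
    return w
```

Accumulating before updating keeps the effective batch size large even when each task's data arrives in small single-task batches.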
https://arxiv.org/abs/2405.07845
Assistive technologies for the visually impaired have evolved to facilitate interaction with a complex and dynamic world. In this paper, we introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users. AIris combines a sophisticated camera mounted on eyewear with a natural language processing interface, enabling users to receive real-time auditory descriptions of their surroundings. We have created a functional prototype system that operates effectively in real-world conditions. AIris demonstrates the ability to accurately identify objects and interpret scenes, providing users with a sense of spatial awareness previously unattainable with traditional assistive devices. The system is designed to be cost-effective and user-friendly, supporting general and specialized tasks: face recognition, scene description, text reading, object recognition, money counting, note-taking, and barcode scanning. AIris marks a transformative step, bringing AI enhancements to assistive technology, enabling rich interactions with a human-like feel.
https://arxiv.org/abs/2405.07606
Masked face recognition (MFR) has emerged as a critical domain in biometric identification, especially since the global COVID-19 pandemic introduced widespread face masks. This survey paper presents a comprehensive analysis of the challenges and advancements in recognising and detecting individuals with masked faces, a problem that has seen innovative shifts due to the necessity of adapting to new societal norms. Advanced through deep learning techniques, MFR, along with Face Mask Recognition (FMR) and Face Unmasking (FU), represent significant areas of focus. These methods address unique challenges posed by obscured facial features, from fully to partially covered faces. Our comprehensive review delves into the various deep learning-based methodologies developed for MFR, FMR, and FU, highlighting their distinctive challenges and the solutions proposed to overcome them. Additionally, we explore benchmark datasets and evaluation metrics specifically tailored for assessing performance in MFR research. The survey also discusses the substantial obstacles still facing researchers in this field and proposes future directions for the ongoing development of more robust and effective masked face recognition systems. This paper serves as an invaluable resource for researchers and practitioners, offering insights into the evolving landscape of face recognition technologies in the face of global health crises and beyond.
https://arxiv.org/abs/2405.05900
Numerous studies have shown that existing Face Recognition Systems (FRS), including commercial ones, often exhibit biases toward certain ethnicities due to under-represented data. In this work, we explore ethnicity alteration and skin-tone modification using synthetic face image generation methods to increase the diversity of datasets. We conduct a detailed analysis by first constructing a balanced face image dataset representing three ethnicities: Asian, Black, and Indian. We then make use of existing Generative Adversarial Network-based (GAN) image-to-image translation and manifold learning models to alter the ethnicity from one to another. A systematic analysis is further conducted to assess the suitability of such datasets for FRS by studying the realistic skin-tone representation using the Individual Typology Angle (ITA). Further, we also analyze the quality characteristics using existing Face Image Quality Assessment (FIQA) approaches. We then provide a holistic FRS performance analysis using four different systems. Our findings pave the way for future research works in (i) developing both specific-ethnicity and general (any-to-any) ethnicity alteration models, (ii) expanding such approaches to create databases with diverse skin tones, and (iii) creating datasets representing various ethnicities, which can further help in mitigating bias while addressing privacy concerns.
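The Individual Typology Angle (ITA) used above to assess skin-tone realism is computed from the CIELAB L* (lightness) and b* (yellow-blue) channels as ITA = arctan((L* − 50)/b*) × 180/π. A small sketch with the commonly cited category thresholds follows; the helper names are illustrative, and `atan2` is used in place of the plain arctangent to stay defined when b* is zero.

```python
import math

def individual_typology_angle(L_star, b_star):
    """ITA in degrees from CIELAB L* and b*; higher means lighter."""
    return math.degrees(math.atan2(L_star - 50.0, b_star))

def ita_category(ita):
    """Commonly used ITA skin-tone bands."""
    if ita > 55:  return "very light"
    if ita > 41:  return "light"
    if ita > 28:  return "intermediate"
    if ita > 10:  return "tan"
    if ita > -30: return "brown"
    return "dark"
```

For example, a pixel with L* = 70 and b* = 20 has ITA = 45°, falling in the "light" band.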
https://arxiv.org/abs/2405.01273
Face Recognition (FR) models are trained on large-scale datasets, which raise privacy and ethical concerns. Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed. While promising results have been obtained, it still remains unclear if generative models can yield diverse enough data for such tasks. In this work, we introduce a new method, inspired by the physical motion of soft particles subjected to stochastic Brownian forces, allowing us to sample identity distributions in a latent space under various constraints. With this in hand, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previous GAN-based datasets and achieves competitive performance with state-of-the-art diffusion-based synthetic datasets. We also show that this method can be used to mitigate leakage from the generator's training set and explore the ability of generative models to generate data beyond it.
https://arxiv.org/abs/2405.00228
In the field of personalized image generation, the ability to create images preserving concepts has significantly improved. Creating an image that naturally integrates multiple concepts in a cohesive and visually appealing composition can indeed be challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions. Additionally, our masked cross-attention mechanism enables the precise control of multi-ID and composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID generation problems. Additionally, our model achieves state-of-the-art performance in both single-ID and multi-ID preservation. Furthermore, our model exhibits remarkable scalability, preserving a greater number of IDs than it was originally trained with.
https://arxiv.org/abs/2404.19427
The recent progress in generative models has revolutionized the synthesis of highly realistic images, including face images. This technological development has undoubtedly helped face recognition, for example through training data augmentation for higher recognition accuracy and improved data privacy. However, it has also introduced novel challenges concerning the responsible use and proper attribution of computer-generated images. We investigate the impact of digital watermarking, a technique for embedding ownership signatures into images, on the effectiveness of face recognition models. We propose a comprehensive pipeline that integrates face image generation, watermarking, and face recognition to systematically examine this question. The proposed watermarking scheme, based on an encoder-decoder architecture, successfully embeds and recovers signatures from both real and synthetic face images while preserving their visual fidelity. Through extensive experiments, we unveil that while watermarking enables robust image attribution, it results in a slight decline in face recognition accuracy, particularly evident for face images with challenging poses and expressions. Additionally, we find that directly training face recognition models on watermarked images offers only a limited alleviation of this performance decline. Our findings underscore the intricate trade-off between watermarking and face recognition accuracy. This work represents a pivotal step towards the responsible utilization of generative models in face recognition and serves to initiate discussions regarding the broader implications of watermarking in biometrics.
https://arxiv.org/abs/2404.18890
Modern face recognition systems utilize deep neural networks to extract salient features from a face. These features denote embeddings in latent space and are often stored as templates in a face recognition system. These embeddings are susceptible to data leakage and, in some cases, can even be used to reconstruct the original face image. To prevent compromising identities, template protection schemes are commonly employed. However, these schemes may still not prevent the leakage of soft biometric information such as age, gender and race. To alleviate this issue, we propose a novel technique that combines Fully Homomorphic Encryption (FHE) with an existing template protection scheme known as PolyProtect. We show that the embeddings can be compressed and encrypted using FHE and transformed into a secure PolyProtect template using polynomial transformation, for additional protection. We demonstrate the efficacy of the proposed approach through extensive experiments on multiple datasets. Our proposed approach ensures irreversibility and unlinkability, effectively preventing the leakage of soft biometric attributes from face embeddings without compromising recognition accuracy.
https://arxiv.org/abs/2404.16255
Face Recognition Systems (FRS) are widely used in commercial environments, such as e-commerce and e-banking, owing to their high accuracy in real-world conditions. However, these systems are vulnerable to facial morphing attacks, which are generated by blending face color images of different subjects. This paper presents a new method for generating 3D face morphs from two bona fide point clouds. The proposed method first selects bona fide point clouds with neutral expressions. The two input point clouds are then registered using Bayesian Coherent Point Drift (BCPD) without optimization, and the geometry and color of the registered point clouds are averaged to generate a face-morphing point cloud. The proposed method generates 388 face-morphing point clouds from 200 bona fide subjects. The effectiveness of the method was demonstrated through extensive vulnerability experiments, achieving a Generalized Morphing Attack Potential (G-MAP) of 97.93%, which is superior to the existing state-of-the-art (SOTA) with a G-MAP of 81.61%.
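Once registration has established a one-to-one point correspondence, the morph-generation step described above reduces to a simple per-point interpolation of geometry and color. A minimal sketch (the function name is hypothetical, and the sketch assumes BCPD registration has already aligned the clouds):

```python
import numpy as np

def morph_point_clouds(xyz_a, rgb_a, xyz_b, rgb_b, alpha=0.5):
    """Blend two registered point clouds into a morph by averaging
    per-point geometry (xyz) and color (rgb in [0, 1]); alpha=0.5
    gives the symmetric average used for face-morph generation."""
    xyz_m = alpha * xyz_a + (1.0 - alpha) * xyz_b
    rgb_m = alpha * rgb_a + (1.0 - alpha) * rgb_b
    return xyz_m, np.clip(rgb_m, 0.0, 1.0)
```

The averaged cloud inherits geometric and color traits of both subjects, which is what makes the result effective as a morphing attack.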
https://arxiv.org/abs/2404.15765
Face recognition applications have grown in parallel with the size of datasets, the complexity of deep learning models, and computational power. However, while deep learning models evolve to become more capable and computational power keeps increasing, the datasets available are being retracted and removed from public access. Privacy and ethical concerns are relevant topics within these domains. Through generative artificial intelligence, researchers have put efforts into the development of completely synthetic datasets that can be used to train face recognition systems. Nonetheless, the recent advances have not been sufficient to achieve performance comparable to the state-of-the-art models trained on real data. To study the drift between the performance of models trained on real and synthetic datasets, we leverage a massive attribute classifier (MAC) to create annotations for four datasets: two real and two synthetic. From these annotations, we conduct studies on the distribution of each attribute within all four datasets. Additionally, we further inspect the differences between real and synthetic datasets on the attribute set. When comparing through the Kullback-Leibler divergence we have found differences between real and synthetic samples. Interestingly enough, we have verified that while real samples suffice to explain the synthetic distribution, the opposite could not be further from the truth.
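The asymmetry observed in the last sentence falls directly out of the definition of the Kullback-Leibler divergence, which is not symmetric in its arguments: D(p‖q) can be small while D(q‖p) is large. A minimal sketch over discrete attribute histograms (illustrative, not the paper's exact estimator):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete attribute distributions.
    The small eps guards against log(0); p and q are renormalized."""
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p = p / p.sum(); q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Comparing D(real‖synthetic) against D(synthetic‖real) per attribute is one way to quantify whether one distribution "explains" the other.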
https://arxiv.org/abs/2404.15234
The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere with the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a one-time watermark embedding to deceive unauthorized FR models and allows authorizers to perform identity verification by extracting the watermark. Specifically, we propose an information-guided adversarial attack against FR models. The encoder embeds an identity-specific watermark into the deep feature space of the carrier, guiding recognizable features of the image to deviate from the source identity. We further adopt a collaborative meta-optimization strategy compatible with sub-tasks, which regularizes the joint optimization direction of the encoder and decoder. This strategy enhances the representation of universal carrier features, mitigating multi-objective optimization conflicts in watermarking. Experiments confirm that DPG achieves significant attack success rates and traceability accuracy on state-of-the-art FR models, exhibiting remarkable robustness that outperforms the existing privacy protection methods using adversarial attacks and deep watermarking, or simple combinations of the two. Our work potentially opens up new insights into proactive protection for FR privacy.
https://arxiv.org/abs/2404.14693
Face recognition technology has become an integral part of modern security systems and user authentication processes. However, these systems are vulnerable to spoofing attacks and can easily be circumvented. Most prior research in face anti-spoofing (FAS) approaches it as a two-class classification task where models are trained on real samples and known spoof attacks and tested for detection performance on unknown spoof attacks. However, in practice, FAS should be treated as a one-class classification task where, while training, one cannot assume any knowledge regarding the spoof samples a priori. In this paper, we reformulate the face anti-spoofing task from a one-class perspective and propose a novel hyperbolic one-class classification framework. To train our network, we use a pseudo-negative class sampled from the Gaussian distribution with a weighted running mean and propose two novel loss functions: (1) Hyp-PC: Hyperbolic Pairwise Confusion loss, and (2) Hyp-CE: Hyperbolic Cross Entropy loss, which operate in the hyperbolic space. Additionally, we employ Euclidean feature clipping and gradient clipping to stabilize the training in the hyperbolic space. To the best of our knowledge, this is the first work extending hyperbolic embeddings for face anti-spoofing in a one-class manner. With extensive experiments on five benchmark datasets: Rose-Youtu, MSU-MFSD, CASIA-MFSD, Idiap Replay-Attack, and OULU-NPU, we demonstrate that our method significantly outperforms the state-of-the-art, achieving better spoof detection performance.
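Two of the stabilization ingredients mentioned above, Euclidean feature clipping and the mapping of features into hyperbolic space, can be sketched with the standard Poincaré-ball formulas (exponential map at the origin with curvature c). The function names are illustrative and this is not the authors' code:

```python
import numpy as np

def clip_features(x, max_norm=0.99):
    """Euclidean feature clipping: bound the feature norm before the
    hyperbolic map, a common trick to stabilize training."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    factor = np.minimum(1.0, max_norm / np.maximum(norm, 1e-12))
    return x * factor

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with
    curvature c: exp_0(v) = tanh(sqrt(c)||v||) * v / (sqrt(c)||v||).
    The image always lies strictly inside the unit ball."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)
```

Keeping embeddings away from the ball's boundary (where distances blow up) is what makes both clipping steps necessary in practice.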
https://arxiv.org/abs/2404.14406
Heterogeneous Face Recognition (HFR) aims to expand the applicability of Face Recognition (FR) systems to challenging scenarios, enabling the matching of face images across different domains, such as matching thermal images to visible spectra. However, the development of HFR systems is challenging because of the significant domain gap between modalities and the lack of large-scale paired multi-channel data. In this work, we leverage a pretrained face recognition model as a teacher network to learn domain-invariant network layers called Domain-Invariant Units (DIU) to reduce the domain gap. The proposed DIU can be trained effectively even with a limited amount of paired training data, in a contrastive distillation framework. This proposed approach has the potential to enhance pretrained models, making them more adaptable to a wider range of variations in data. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods.
https://arxiv.org/abs/2404.14343
Heterogeneous Face Recognition (HFR) focuses on matching faces from different domains, for instance, thermal to visible images, making Face Recognition (FR) systems more versatile for challenging scenarios. However, the domain gap between these domains and the limited large-scale datasets in the target HFR modalities make it challenging to develop robust HFR models from scratch. In our work, we view different modalities as distinct styles and propose a method to modulate feature maps of the target modality to address the domain gap. We present a new Conditional Adaptive Instance Modulation (CAIM) module that seamlessly fits into existing FR networks, turning them into HFR-ready systems. The CAIM block modulates intermediate feature maps, efficiently adapting to the style of the source modality and bridging the domain gap. Our method enables end-to-end training using a small set of paired samples. We extensively evaluate the proposed approach on various challenging HFR benchmarks, showing that it outperforms state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.
https://arxiv.org/abs/2404.14247
Facial biometrics are an essential component of smartphones to ensure reliable and trustworthy authentication. However, face biometric systems are vulnerable to Presentation Attacks (PAs), and the availability of more sophisticated presentation attack instruments such as 3D silicone face masks will allow attackers to deceive face recognition systems easily. In this work, we propose a novel Presentation Attack Detection (PAD) algorithm based on 3D point clouds captured using the frontal camera of a smartphone to detect presentation attacks. The proposed PAD algorithm, VoxAtnNet, voxelizes 3D point clouds to preserve their spatial structure. The voxelized 3D samples are then used to train a novel convolutional attention network to detect PAs on the smartphone. Extensive experiments were carried out on a newly constructed 3D face point cloud dataset comprising bona fide samples and two different 3D PAIs (3D silicone face mask and wrap photo mask), resulting in 3480 samples. The performance of the proposed method was compared with existing methods to benchmark the detection performance using three different evaluation protocols. The experimental results demonstrate the improved performance of the proposed method in detecting both known and unknown face presentation attacks.
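The voxelization step that preserves spatial structure before the convolutional attention network can be sketched as a simple binary occupancy grid. This is illustrative only; the paper's exact voxel resolution and per-voxel features are not specified here.

```python
import numpy as np

def voxelize(points, grid=(32, 32, 32)):
    """Map an (N, 3) point cloud into a binary occupancy grid:
    normalize coordinates into the grid's index range, then mark
    every voxel that contains at least one point."""
    points = np.asarray(points, dtype=float)
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scale = (np.array(grid) - 1) / np.maximum(maxs - mins, 1e-12)
    idx = ((points - mins) * scale).astype(int)
    vox = np.zeros(grid, dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox
```

Unlike flattening the cloud into an unordered feature list, the occupancy grid keeps neighborhood relations intact, which is what lets a 3D convolutional network exploit local geometry.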
https://arxiv.org/abs/2404.12680
Face-morphing attacks are a growing concern for biometric researchers, as they can be used to fool face recognition systems (FRS). These attacks can be generated at the image level (supervised) or representation level (unsupervised). Previous unsupervised morphing attacks have relied on generative adversarial networks (GANs). More recently, researchers have used linear interpolation of StyleGAN-encoded images to generate morphing attacks. In this paper, we propose a new method for generating high-quality morphing attacks using StyleGAN disentanglement. Our approach, called MLSD-GAN, spherically interpolates the disentangled latents to produce realistic and diverse morphing attacks. We evaluate the vulnerability of MLSD-GAN on two deep-learning-based FRS techniques. The results show that MLSD-GAN poses a significant threat to FRS, as it can generate morphing attacks that are highly effective at fooling these systems.
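Spherical linear interpolation (slerp) of latent codes, the operation applied above to the disentangled latents, is a standard formula; a minimal sketch in plain NumPy (the function name is illustrative):

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherically interpolate between latent codes z0 and z1 at
    fraction t in [0, 1], interpolating along the great-circle arc
    between their directions rather than the straight chord."""
    z0 = np.asarray(z0, dtype=float); z1 = np.asarray(z1, dtype=float)
    cos_omega = np.dot(z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):            # nearly parallel: fall back to lerp
        return (1.0 - t) * z0 + t * z1
    so = np.sin(omega)
    return np.sin((1.0 - t) * omega) / so * z0 + np.sin(t * omega) / so * z1
```

Compared with linear interpolation, slerp keeps intermediate codes at a more natural norm in the latent space, which tends to produce more realistic in-between faces.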
https://arxiv.org/abs/2404.12679
Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation with an intuitive table-based interface. Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challenging, as non-technical users need to look up and understand non-trivial formula syntax. To address this pain point, we leverage the observation that there is often an abundance of similar-looking spreadsheets in the same organization, which not only have similar data, but also share similar computation logic encoded as formulas. We develop an Auto-Formula system that can accurately predict formulas that users want to author in a target spreadsheet cell, by learning and adapting formulas that already exist in similar spreadsheets, using contrastive-learning techniques inspired by "similar-face recognition" from computer vision. Extensive evaluations on over 2K test formulas extracted from real enterprise spreadsheets show the effectiveness of Auto-Formula over alternatives. Our benchmark data is available at this https URL to facilitate future research.
https://arxiv.org/abs/2404.12608
In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology. With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating. Existing FAS technologies primarily focus on intercepting physically forged faces and lack a robust solution for cross-domain FAS challenges. Moreover, determining an appropriate threshold to achieve optimal deployment results remains an issue for intra-domain FAS. To address these issues, we propose a visualization method that intuitively reflects the training outcomes of models by visualizing the prediction results on datasets. Additionally, we demonstrate that employing data augmentation techniques, such as downsampling and Gaussian blur, can effectively enhance performance on cross-domain tasks. Building upon our data visualization approach, we also introduce a methodology for setting threshold values based on the distribution of the training dataset. Ultimately, our methods secured us second place in both the Unified Physical-Digital Face Attack Detection competition and the Snapshot Spectral Imaging Face Anti-spoofing contest. The training code is available at this https URL.
https://arxiv.org/abs/2404.12602
Face Image Quality Assessment (FIQA) estimates the utility of face images for automated face recognition (FR) systems. In this work, we propose a novel approach to assess the quality of face images based on inspecting the required changes in the pre-trained FR model weights to minimize differences between testing samples and the distribution of the FR training dataset. To achieve that, we propose quantifying the discrepancy in Batch Normalization statistics (BNS), including mean and variance, between those recorded during FR training and those obtained by processing testing samples through the pretrained FR model. We then generate gradient magnitudes of pretrained FR weights by backpropagating the BNS through the pretrained model. The cumulative absolute sum of these gradient magnitudes serves as the FIQ for our approach. Through comprehensive experimentation, we demonstrate the effectiveness of our training-free and quality-labeling-free approach, achieving performance competitive with recent state-of-the-art FIQA approaches without relying on quality labeling, training regression networks, specialized architectures, or designing and optimizing specific loss functions.
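The Batch Normalization statistics (BNS) discrepancy at the heart of this approach can be illustrated in simplified form. The sketch below computes only the statistic-gap term for a single layer's activations; the paper goes further and backpropagates this discrepancy through the pretrained model to obtain gradient magnitudes, which is not reproduced here (the function name and exact distance are illustrative):

```python
import numpy as np

def bns_discrepancy(train_mean, train_var, features, eps=1e-5):
    """Gap between BN running statistics recorded at training time
    (train_mean, train_var, per channel) and the statistics of a test
    sample's activations `features` of shape (n, channels). A larger
    gap suggests the sample lies farther from the training distribution."""
    mu = features.mean(axis=0)
    var = features.var(axis=0)
    return float(np.sum((mu - train_mean) ** 2)
                 + np.sum((np.sqrt(var + eps) - np.sqrt(train_var + eps)) ** 2))
```

In-distribution samples should score near zero, while degraded or out-of-distribution faces shift the activation statistics and score higher, which is the signal the full method converts into a quality value.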
https://arxiv.org/abs/2404.12203