Unveiling the real appearance of retouched faces to deter malicious users from deceptive advertising and economic fraud has become an increasing concern in the era of the digital economy. This article makes the first attempt to investigate the face retouching reversal (FRR) problem. We first collect an FRR dataset, named deepFRR, which contains 50,000 StyleGAN-generated high-resolution (1024×1024) facial images and their counterparts retouched by a commercial online API. To the best of our knowledge, deepFRR is the first FRR dataset tailored for training deep FRR models. We then propose a novel diffusion-based approach (FRRffusion) for the FRR task. Our FRRffusion consists of a coarse-to-fine two-stage network: a diffusion-based Facial Morpho-Architectonic Restorer (FMAR) generates the basic contours of low-resolution faces in the first stage, while a Transformer-based Hyperrealistic Facial Detail Generator (HFDG) creates high-resolution facial details in the second stage. Tested on deepFRR, our FRRffusion surpasses the GP-UNIT and Stable Diffusion methods by a large margin on four widely used quantitative metrics. In particular, in a qualitative evaluation with 85 subjects, the images de-retouched by our FRRffusion are visually much closer to the raw face images than both the retouched face images and those restored by GP-UNIT and Stable Diffusion. These results validate the efficacy of our work, bridging the gap between the FRR and generic image restoration tasks. The dataset and code are available at this https URL.
https://arxiv.org/abs/2405.07582
Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex because of deformation properties that include elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method comprising soft contact simulation, manipulation learning, and sim-to-real transfer. We propose a novel approach that uses Vision-Based Tactile Sensors (VBTSs) as the end-effector in simulation to produce observations such as relative position, squeezed area, and object contour, which are transferable to real robots. For more realistic contact simulation, a new simulation environment covering elastic, plastic, and elastoplastic deformations is created. We use RL strategies to train agents in the simulation and apply expert demonstrations for challenging tasks. Finally, we build a real experimental platform to complete the sim-to-real transfer and achieve a 90% success rate on difficult tasks such as the cylinder and sphere tasks. To test the robustness of our method, we repeat these tasks with plasticine of different hardness values and sizes. The experimental results show the superior performance of the proposed method on deformable object manipulation.
https://arxiv.org/abs/2405.07237
The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a Taiwan corpus of spontaneous conversations, using the generalized additive regression model, and focusing on the rise-fall tone pattern, that after controlling for effects of speaker and context, word type is a stronger predictor of pitch realization than all the previously established word-form related predictors combined. Importantly, the addition of information about meaning in context improves prediction accuracy even further. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data, and that context-sensitive, token-specific embeddings can predict the shape of pitch contours with 30% accuracy. These accuracies, which are an order of magnitude above chance level, suggest that the relation between words' pitch contours and their meanings is sufficiently strong to be functional for language users. The theoretical implications of these empirical findings are discussed.
https://arxiv.org/abs/2405.07006
To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. Ideally, the system should be able to handle any object and not be restricted to segmenting a limited set of object classes, especially in robotic applications. Addressing this need, we introduce a diffusion model with cumulative occlusion learning designed for sequential amodal segmentation of objects with uncertain categories. This model iteratively refines the prediction using the cumulative mask strategy during diffusion, effectively capturing the uncertainty of invisible regions and adeptly reproducing the complex distribution of shapes and occlusion orders of occluded objects. It is akin to the human capability for amodal perception, i.e., to decipher the spatial ordering among objects and accurately predict complete contours for occluded objects in densely layered visual scenes. Experimental results across three amodal datasets show that our method outperforms established baselines.
https://arxiv.org/abs/2405.05791
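The cumulative mask strategy above can be illustrated outside the diffusion model: once the amodal masks of a scene are ordered front-to-back, the occlusion accumulated from nearer layers determines each object's visible region. A minimal numpy sketch (the function name and boolean-mask layout are illustrative, not from the paper):

```python
import numpy as np

def visible_from_amodal(amodal_masks):
    """Given amodal (full-shape) masks ordered front-to-back, derive each
    object's visible mask by accumulating the occlusion from nearer layers."""
    occluded = np.zeros_like(amodal_masks[0], dtype=bool)
    visible = []
    for m in amodal_masks:
        visible.append(m & ~occluded)  # keep only the part not covered by nearer objects
        occluded |= m                  # grow the cumulative occlusion mask
    return visible
```

The paper works in the opposite, far harder direction (predicting amodal masks and order from a single image), but this forward pass shows why the cumulative mask encodes the occlusion ordering.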
In recent years, convolutional neural networks (CNNs) have achieved remarkable advances in remote sensing image super-resolution. The textures and structures of remote sensing images (RSIs) are complex and variable, often repeating within the same image yet differing across images. Current deep learning-based super-resolution models focus less on high-frequency features, which leads to suboptimal performance in capturing contours, textures, and spatial information. State-of-the-art CNN-based methods now focus on the feature extraction of RSIs using attention mechanisms. However, these methods are still incapable of effectively identifying and utilizing key content attention signals in RSIs. To solve this problem, we propose an advanced feature extraction module called Channel and Spatial Attention Feature Extraction (CSA-FE), which effectively extracts features by using channel and spatial attention incorporated with the standard vision transformer (ViT). The proposed method was trained on the UCMerced dataset at scales 2, 3, and 4. The experimental results show that our method helps the model focus on the specific channels and spatial locations that contain high-frequency information, so that the model attends to relevant features and suppresses irrelevant ones, enhancing the quality of the super-resolved images. Our model achieved superior performance compared to various existing models.
https://arxiv.org/abs/2405.04595
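As a rough sketch of how channel and spatial attention gate a feature map (this is a generic toy version, not the actual CSA-FE module; the learned layers and ViT integration are not reproduced, and `w_c` stands in for a learned channel-attention weight):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def csa_attention(feat, w_c):
    """Toy channel-then-spatial attention over a (C, H, W) feature map."""
    # channel attention: squeeze spatial dims, gate each channel
    squeeze = feat.mean(axis=(1, 2))                    # (C,)
    ch_gate = sigmoid(w_c @ squeeze)                    # (C,) gates in (0, 1)
    feat = feat * ch_gate[:, None, None]
    # spatial attention: pool across channels, gate each location
    sp = sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W)
    return feat * sp[None, :, :]
```

Because both gates lie in (0, 1), the module can only suppress, never amplify, which is the "focus on relevant features and suppress irrelevant ones" behaviour described above.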
This paper aims to create a deep learning framework that can estimate the deformation vector field (DVF) for directly registering abdominal MRI-CT images. The proposed method assumed a diffeomorphic deformation. By using topology-preserved deformation features extracted from the probabilistic diffeomorphic registration model, abdominal motion can be accurately obtained and utilized for DVF estimation. The model integrated Swin transformers, which have demonstrated superior performance in motion tracking, into the convolutional neural network (CNN) for deformation feature extraction. The model was optimized using a cross-modality image similarity loss and a surface matching loss. To compute the image loss, a modality-independent neighborhood descriptor (MIND) was used between the deformed MRI and CT images. The surface matching loss was determined by measuring the distance between the warped coordinates of the surfaces of contoured structures on the MRI and CT images. The deformed MRI image was assessed against the CT image using the target registration error (TRE), Dice similarity coefficient (DSC), and mean surface distance (MSD) between the deformed contours of the MRI image and manual contours of the CT image. When compared to only rigid registration, DIR with the proposed method resulted in an increase of the mean DSC values of the liver and portal vein from 0.850 and 0.628 to 0.903 and 0.763, a decrease of the mean MSD of the liver from 7.216 mm to 3.232 mm, and a decrease of the TRE from 26.238 mm to 8.492 mm. The proposed deformable image registration method based on a diffeomorphic transformer provides an effective and efficient way to generate an accurate DVF from an MRI-CT image pair of the abdomen. It could be utilized in the current treatment planning workflow for liver radiotherapy.
https://arxiv.org/abs/2405.02692
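The surface matching loss described above, i.e. the distance between warped MRI surface points and CT surface points, can be sketched as follows (a simplified one-way nearest-neighbour version; `dvf` holds one displacement per MRI point and is an illustrative stand-in for the network's output):

```python
import numpy as np

def surface_matching_loss(mri_pts, ct_pts, dvf):
    """Warp MRI surface points by their displacement vectors, then average the
    nearest-neighbour distance to the CT surface points. Shapes: (N, 3), (M, 3), (N, 3)."""
    warped = mri_pts + dvf
    # pairwise distances between every warped point and every CT point
    d = np.linalg.norm(warped[:, None, :] - ct_pts[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

A perfect DVF maps the MRI surface exactly onto the CT surface and drives this loss to zero; the paper combines it with the MIND cross-modality image loss, which is not reproduced here.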
Background and purpose: Deformable image registration (DIR) is a crucial tool in radiotherapy for extracting and modelling organ motion. However, when significant changes and sliding boundaries are present, its accuracy and uncertainty are compromised, which affects the subsequent contour propagation and dose accumulation procedures. Materials and methods: We propose an implicit neural representation (INR)-based approach that models motion continuously in both space and time, named Continuous-sPatial-Temporal DIR (CPT-DIR). The method uses a multilayer perceptron (MLP) network to map a 3D coordinate (x,y,z) to its corresponding velocity vector (vx,vy,vz). The displacement vectors (dx,dy,dz) are then calculated by integrating the velocity vectors over time. The MLP's parameters can rapidly adapt to new cases without pre-training, enhancing optimisation. The DIR's performance was tested on the DIR-Lab dataset of 10 lung 4DCT cases, using metrics of landmark accuracy (TRE), contour conformity (Dice) and image similarity (MAE). Results: The proposed CPT-DIR reduces the landmark TRE from 2.79mm to 0.99mm, outperforming B-splines' results for all cases. The MAE of the whole-body region improves from 35.46HU to 28.99HU. Furthermore, CPT-DIR surpasses B-splines in accuracy in the sliding boundary region, lowering the MAE and raising the Dice coefficient for the ribcage from 65.65HU and 90.41% to 42.04HU and 90.56%, versus 75.40HU and 89.30% without registration. Meanwhile, CPT-DIR offers significant speed advantages, completing in under 15 seconds compared to a few minutes with the conventional B-splines method. Conclusion: Leveraging continuous representations, the CPT-DIR method significantly enhances registration accuracy, automation and speed, outperforming traditional B-splines in landmark and contour precision, particularly in challenging areas.
https://arxiv.org/abs/2405.00430
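The core CPT-DIR mechanics, an MLP mapping (x,y,z) to (vx,vy,vz) and a displacement obtained by integrating the velocity field over time, can be sketched with a toy random-weight network. Forward Euler is one possible integrator; the paper's exact network size and integration scheme are assumptions here:

```python
import numpy as np

rng = np.random.default_rng(0)
# tiny stand-in MLP: 3 -> 32 -> 3 (the real network is trained per case)
W1, b1 = rng.normal(size=(32, 3)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(3, 32)) * 0.1, np.zeros(3)

def velocity(xyz):
    """Map a 3D coordinate to a velocity vector (vx, vy, vz)."""
    h = np.tanh(W1 @ xyz + b1)
    return W2 @ h + b2

def displacement(xyz, steps=16):
    """Integrate the velocity field over unit time with forward Euler,
    returning the displacement vector (dx, dy, dz)."""
    p = np.asarray(xyz, dtype=float)
    dt = 1.0 / steps
    for _ in range(steps):
        p = p + dt * velocity(p)
    return p - np.asarray(xyz, dtype=float)
```

Parameterizing velocity rather than displacement is what makes the resulting motion continuous in time, and (for well-behaved fields) tends to keep the transform diffeomorphic.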
Existing X-ray based pre-trained vision models are usually trained on relatively small-scale datasets (fewer than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficult miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model, trained on our newly collected large-scale dataset of more than 1 million X-ray images. Our model follows the masked auto-encoder framework: the tokens remaining after high-rate mask processing are used as input, and the masked image patches are reconstructed by the Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that utilizes the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks: X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models of this paper will be released on this https URL.
https://arxiv.org/abs/2404.17926
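The high-rate masking step of the masked auto-encoder framework can be sketched as below. This shows only generic random masking over patch tokens; the paper's context-aware, chest-contour-guided masking is not reproduced:

```python
import numpy as np

def random_masking(tokens, mask_ratio=0.75, rng=None):
    """MAE-style masking: keep a random subset of the (N, D) patch tokens and
    return the kept tokens plus the bookkeeping needed to reconstruct the rest."""
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False          # True = masked, i.e. to be reconstructed
    return tokens[keep_idx], keep_idx, mask
```

Only the kept 25% of tokens enter the encoder, which is what makes high-resolution (1280×1280) pre-training tractable; the decoder then reconstructs the pixels of the masked patches.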
Insufficient overlap between the melt pools produced during Laser Powder Bed Fusion (L-PBF) can lead to lack-of-fusion defects and deteriorated mechanical and fatigue performance. In-situ monitoring of the melt pool subsurface morphology requires specialized equipment that may not be readily accessible or scalable. Therefore, we introduce a machine learning framework to correlate in-situ two-color thermal images observed via high-speed color imaging to the two-dimensional profile of the melt pool cross-section. Specifically, we employ a hybrid CNN-Transformer architecture to establish a correlation between single bead off-axis thermal image sequences and melt pool cross-section contours measured via optical microscopy. In this architecture, a ResNet model embeds the spatial information contained within the thermal images to a latent vector, while a Transformer model correlates the sequence of embedded vectors to extract temporal information. Our framework is able to model the curvature of the subsurface melt pool structure, with improved performance in high energy density regimes compared to analytical melt pool models. The performance of this model is evaluated through dimensional and geometric comparisons to the corresponding experimental melt pool observations.
https://arxiv.org/abs/2404.17699
A positive margin may result in an increased risk of local recurrence after breast-conserving surgery for any malignant tumour. Reducing the number of positive margins requires offering the surgeon real-time intra-operative information on the presence of positive resection margins. This study aims to design an intra-operative tumour margin evaluation scheme that uses specimen mammography in breast-conserving surgery. A total of 30 cases were evaluated and compared with contours manually determined by experienced physicians and with the pathology reports. The proposed method uses image thresholding to extract regions of interest and then applies a deep learning model, SegNet, to segment the tumour tissue. The margin width of the surrounding normal tissue is evaluated as the result. The desired margin size around the tumour was set to 10 mm. The smallest average difference from the manually sketched margin was 6.53 mm ± 5.84 mm. In all cases, the SegNet architecture was used to obtain the tissue specimen boundary and the tumour contour, respectively. The simulation results indicate that this technology is helpful in discriminating positive from negative margins in the intra-operative setting. The proposed scheme is intended as a potential procedure for an intra-operative measurement system. The experimental results reveal that deep learning techniques can produce results consistent with pathology reports.
https://arxiv.org/abs/2404.10600
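The margin evaluation reduces to measuring the distance between the segmented tumour contour and the specimen boundary and comparing it with the 10 mm target. A hedged sketch on 2D point sets (contours are assumed to be sampled as (x, y) points in millimetres; the actual pipeline derives them from SegNet masks):

```python
import numpy as np

def min_margin_mm(tumour_pts, boundary_pts):
    """Smallest distance from any tumour-contour point to the specimen boundary."""
    d = np.linalg.norm(tumour_pts[:, None, :] - boundary_pts[None, :, :], axis=-1)
    return d.min()

def margin_positive(tumour_pts, boundary_pts, desired_mm=10.0):
    """Flag the margin as (potentially) positive when the narrowest strip of
    normal tissue is thinner than the desired margin."""
    return min_margin_mm(tumour_pts, boundary_pts) < desired_mm
```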
This study explores F0 entrainment in second language (L2) English speech imitation during an Alternating Reading Task (ART). Participants with Italian, French, and Slovak native languages imitated English utterances, and their F0 entrainment was quantified using the Dynamic Time Warping (DTW) distance between the parameterized F0 contours of the imitated utterances and those of the model utterances. Results indicate a nuanced relationship between L2 English proficiency and entrainment: speakers with higher proficiency generally exhibit less entrainment in pitch variation and declination. However, within dyads, the more proficient speakers demonstrate a greater ability to mimic pitch range, leading to increased entrainment. This suggests that proficiency influences entrainment differently at individual and dyadic levels, highlighting the complex interplay between language skill and prosodic adaptation.
https://arxiv.org/abs/2404.10440
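The DTW distance between two parameterized F0 contours, as used above to quantify entrainment, can be computed with the classic dynamic-programming recurrence (absolute difference as the local cost is an assumption; the study's exact contour parameterization is not reproduced):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D contours: the minimum
    accumulated local cost over all monotone alignments of a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because DTW aligns contours non-linearly in time, two utterances with the same pitch shape but different speech rates score a small distance, which is why it suits imitation tasks.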
Navigation for thoracoabdominal puncture surgery is used to locate the needle entry point on the patient's body surface. With the traditional reflective-ball navigation method, it is difficult to position the needle entry point on the soft, irregular, smooth chest and abdomen. Because the body surface lacks clear characteristic points under structured light, it is also difficult to identify and locate arbitrary needle insertion points. To meet the high stability and high accuracy requirements of surgical navigation, this paper proposes a novel multi-modal 3D small-object medical marker detection method, which identifies the center of a small single ring as the needle insertion point. Moreover, this method leverages Fourier transform enhancement to augment the dataset, enrich image details, and enhance the network's capability. The method extracts the Region of Interest (ROI) of the feature image from both enhanced and original images and then generates a mask map. Subsequently, the point cloud of the ROI is obtained from the depth map through registration with ROI point-cloud contour fitting. In addition, this method employs the Tukey loss for optimal precision. The experimental results show that the novel method proposed in this paper not only achieves high-precision, high-stability positioning but also enables the positioning of arbitrary needle insertion points.
https://arxiv.org/abs/2404.08990
Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution is a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by a detector, LLE is primarily designed for human vision rather than machines and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in the dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal-light datasets. The code and dataset will be released.
https://arxiv.org/abs/2404.08965
Existing angle-based contour descriptors suffer from lossy representation of non-star-convex shapes. By and large, this is the result of registering the shape with a single global inner center and a set of radii corresponding to a polar-coordinate parameterization. In this paper, we propose AdaContour, an adaptive contour descriptor that uses multiple local representations to faithfully characterize complex shapes. After hierarchically encoding the object shapes in a training set and constructing a contour matrix of all subdivided regions, we compute a robust low-rank subspace and approximate each local contour by linearly combining the shared basis vectors to represent an object. Experiments show that AdaContour is able to represent shapes more accurately and robustly than other descriptors while retaining effectiveness. We validate AdaContour by integrating it into off-the-shelf detectors to enable instance segmentation, which demonstrates faithful performance. The code is available at this https URL.
https://arxiv.org/abs/2404.08292
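The shared-subspace step can be sketched with an SVD: columns of the contour matrix are flattened local contours, and each contour is approximated by a linear combination of the leading left singular vectors. Plain SVD here stands in for the robust low-rank subspace computation the paper actually uses:

```python
import numpy as np

def contour_basis(contour_matrix, rank):
    """Shared basis from a (D, K) matrix whose K columns are flattened local
    contours; the first `rank` left singular vectors span the subspace."""
    U, _, _ = np.linalg.svd(contour_matrix, full_matrices=False)
    return U[:, :rank]

def approximate(contour, basis):
    """Least-squares combination of the shared (orthonormal) basis vectors."""
    coeffs = basis.T @ contour
    return basis @ coeffs
```

A descriptor then only needs to store the per-contour coefficient vector, which is how the shared basis keeps the representation compact.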
Deep learning-based medical image processing algorithms require representative data during development. In particular, surgical data might be difficult to obtain, and high-quality public datasets are limited. To overcome this limitation and augment datasets, a widely adopted solution is the generation of synthetic images. In this work, we employ conditional diffusion models to generate knee radiographs from contour and bone segmentations. Notably, two distinct strategies are presented for incorporating the segmentation as a condition, one into the sampling process and one into the training process: conditional sampling and conditional training. The results demonstrate that both methods can generate realistic images while adhering to the conditioning segmentation. The conditional training method outperforms both the conditional sampling method and the conventional U-Net.
https://arxiv.org/abs/2404.03541
Microvascular networks are challenging to model because these structures are currently near the diffraction limit for most advanced three-dimensional imaging modalities, including confocal and light sheet microscopy. This makes semantic segmentation difficult, because individual components of these networks fluctuate within the confines of individual pixels. Level set methods are ideally suited to solve this problem by providing surface and topological constraints on the resulting model, however these active contour techniques are extremely time intensive and impractical for terabyte-scale images. We propose a reformulation and implementation of the region-scalable fitting (RSF) level set model that makes it amenable to three-dimensional evaluation using both single-instruction multiple data (SIMD) and single-program multiple-data (SPMD) parallel processing. This enables evaluation of the level set equation on independent regions of the data set using graphics processing units (GPUs), making large-scale segmentation of high-resolution networks practical and inexpensive. We tested this 3D parallel RSF approach on multiple data sets acquired using state-of-the-art imaging techniques to acquire microvascular data, including micro-CT, light sheet fluorescence microscopy (LSFM) and milling microscopy. To assess the performance and accuracy of the RSF model, we conducted a Monte-Carlo-based validation technique to compare results to other segmentation methods. We also provide a rigorous profiling to show the gains in processing speed leveraging parallel hardware. This study showcases the practical application of the RSF model, emphasizing its utility in the challenging domain of segmenting large-scale high-topology network structures with a particular focus on building microvascular models.
https://arxiv.org/abs/2404.02813
Some early violins have been reduced during their history to fit imposed morphological standards, while more recent ones have been built directly to these standards. We can observe differences between reduced and unreduced instruments, particularly in their contour lines and channel of minima. In a recent preliminary work, we computed and highlighted those two features for two instruments using triangular 3D meshes acquired by photogrammetry, whose fidelity has been assessed and validated with sub-millimetre accuracy. We propose here an extension to a corpus of 38 violins, violas and cellos, and introduce improved procedures, leading to a stronger discussion of the geometric analysis. We first recall the material we are working with. We then discuss how to derive the best reference plane for the violin alignment, which is crucial for the computation of contour lines and channel of minima. Finally, we show how to compute efficiently both characteristics and we illustrate our results with a few examples.
https://arxiv.org/abs/2404.01995
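Deriving a reference plane for instrument alignment, as discussed above, is commonly done as a least-squares plane fit to selected mesh vertices; one standard SVD-based sketch (this generic fit is an assumption, not necessarily the authors' exact procedure):

```python
import numpy as np

def fit_reference_plane(points):
    """Least-squares plane through an (N, 3) point cloud: returns (centroid,
    unit normal). The normal is the right singular vector of the centred
    points with the smallest singular value."""
    c = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - c)
    return c, Vt[-1]
```

Once the plane is fixed, contour lines are simply the intersections of the mesh with planes offset along the normal, which is why a stable plane fit matters for comparing reduced and unreduced instruments.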
Due to the uncertainty of traffic participants' intentions, generating behaviour that is safe but not overly cautious in interactive driving scenarios remains a formidable challenge for autonomous driving. In this paper, we address this issue by combining a deep learning-based trajectory prediction model with risk potential field-based motion planning. To comprehensively predict the possible future trajectories of other vehicles, we propose a target-region based trajectory prediction model (TRTP) that considers every region a vehicle may arrive at in the future. We then construct a risk potential field at each future time step based on the prediction results of TRTP and integrate the risk value into the objective function of Model Predictive Contouring Control (MPCC). This allows the uncertainty of other vehicles to be taken into account during the planning process. Balancing risk against progress along the reference path achieves both driving safety and efficiency at the same time. We also demonstrate the safety and effectiveness of our method in the CARLA simulator.
https://arxiv.org/abs/2404.00893
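The risk-augmented objective described above can be sketched as follows. This is a simplified illustration under assumed forms, not the paper's implementation: it uses a Gaussian potential around each predicted position (one per target region, weighted by mode probability) and adds the risk to a generic MPCC-style stage cost. All function names, weights, and the Gaussian shape are assumptions.

```python
import numpy as np

def risk_potential(ego_xy, predicted_modes, mode_probs, sigma=2.0):
    """Gaussian risk potential at an ego position for one future time
    step, summed over the predicted positions of another vehicle.

    predicted_modes: (M, 2) predicted positions, one per target region.
    mode_probs: (M,) probability of each mode.
    """
    d2 = np.sum((np.asarray(predicted_modes) - np.asarray(ego_xy)) ** 2, axis=1)
    return float(np.sum(np.asarray(mode_probs) * np.exp(-d2 / (2.0 * sigma ** 2))))

def mpcc_stage_cost(contour_err, lag_err, progress, risk,
                    q_c=1.0, q_l=1.0, q_p=0.5, q_r=5.0):
    """One stage of an MPCC-style objective: penalise contouring and lag
    errors, reward progress along the reference path, penalise risk."""
    return q_c * contour_err**2 + q_l * lag_err**2 - q_p * progress + q_r * risk
```

Raising `q_r` relative to `q_p` shifts the planner toward caution; lowering it favors progress, which is exactly the risk/progress balance the abstract describes.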
Lumbar disc degeneration, a progressive structural wear and tear of the lumbar intervertebral discs, is regarded as playing an essential role in low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images would enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds that are unsuitable for medical parameter measurement. In this work, we present TransDeformer, a novel attention-based deep learning approach that reconstructs the contours of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of TransDeformer for error estimation. Specifically, we devise new attention modules with a new attention formula that integrates image features and tokenized contour features to predict the displacements of the points on a shape template, without the need for image segmentation. The deformed template reveals the lumbar spine geometry in the input image. We develop a multi-stage training strategy to enhance model robustness with respect to template initialization. Experiment results show that our TransDeformer generates artifact-free geometry outputs, and its variant predicts the error of a reconstructed geometry. Our code is available at this https URL.
https://arxiv.org/abs/2404.00231
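The core idea above (contour tokens attending to image features to predict per-point template displacements) can be sketched as a plain cross-attention layer. This is a generic sketch, not the paper's "new attention formula", and all shapes, names, and weight matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def predict_displacements(contour_tokens, image_tokens, w_q, w_k, w_v, w_out):
    """Cross-attention sketch: tokenized contour points (queries) attend
    to image feature tokens (keys/values); a linear head maps the
    attended features to per-point 2D displacements for the template.

    contour_tokens: (P, d) one token per template point.
    image_tokens:   (T, d) flattened image feature map tokens.
    Returns (P, 2) displacements to add to the template coordinates.
    """
    q = contour_tokens @ w_q                          # (P, d)
    k = image_tokens @ w_k                            # (T, d)
    v = image_tokens @ w_v                            # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (P, T), rows sum to 1
    return (attn @ v) @ w_out                         # (P, 2)
```

Because the template topology is fixed and only point positions move, the output mesh keeps the same point-to-point correspondence across patients, which is what makes downstream parameter measurement straightforward.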
Time-optimal quadrotor flight is an extremely challenging problem due to the limited control authority available at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a leading model-based approach for time-optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly into the cost function, creating a multi-objective optimization that continuously trades off maximizing progress against tracking the path accurately. This paper introduces three key components that enhance the MPCC approach for drone racing. First and foremost, we provide safety guarantees in the form of a constraint and terminal set. The safety set is designed as a spatial constraint that prevents gate collisions while leaving time optimization solely to the cost function. Second, we augment the existing first-principles dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real-world data. Third, we use Trust Region Bayesian Optimization (TuRBO), a state-of-the-art global Bayesian optimization algorithm, to tune the hyperparameters of the MPC controller given a sparse reward based on lap-time minimization. The proposed approach achieves lap times similar to the best state-of-the-art RL methods and outperforms the best time-optimal controller while satisfying constraints. In both simulation and the real world, our approach consistently prevents gate crashes with a 100% success rate while pushing the quadrotor to its physical limit, reaching speeds of more than 80 km/h.
https://arxiv.org/abs/2403.17551
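Two of the components above (the spatial safety set around a gate, and first-principles dynamics augmented by a learned residual) can be sketched as follows. This is an illustrative simplification under assumed forms, not the paper's formulation: the safety set is modelled here as an axis-aligned box in the gate frame, and the dynamics as a point mass. All names and the residual interface are hypothetical.

```python
import numpy as np

def in_gate_corridor(pos, gate_center, gate_rot, half_widths):
    """Spatial safety-set check: is a world-frame position inside the
    box-shaped corridor around a gate? gate_rot maps world to gate frame.
    In an MPC this membership test becomes a hard state constraint."""
    local = gate_rot @ (np.asarray(pos) - np.asarray(gate_center))
    return bool(np.all(np.abs(local) <= half_widths))

def augmented_dynamics(state, thrust_acc, residual_model, dt=0.01):
    """One Euler step of point-mass dynamics plus a learned residual
    acceleration capturing unmodelled aerodynamic and thrust effects.

    state: (6,) position and velocity; thrust_acc: (3,) commanded
    specific thrust in the world frame; residual_model: state -> (3,).
    """
    pos, vel = state[:3], state[3:]
    g = np.array([0.0, 0.0, -9.81])
    acc = thrust_acc + g + residual_model(state)  # first principles + residual
    return np.concatenate([pos + vel * dt, vel + acc * dt])
```

Keeping the gate constraint out of the cost function, as the abstract argues, leaves the cost free to optimize time alone while the constraint set alone guarantees collision-free passage.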