Underwater imaging often suffers from low quality due to factors affecting light propagation and absorption in water. To improve image quality, underwater image enhancement (UIE) methods based on convolutional neural networks (CNNs) and Transformers have been proposed. However, CNN-based UIE methods are limited in modeling long-range dependencies, and Transformer-based methods involve large numbers of parameters and complex self-attention mechanisms, posing efficiency challenges. Considering computational complexity and severe underwater image degradation, a state space model (SSM) with linear computational complexity for UIE, named WaterMamba, is proposed. We propose spatial-channel omnidirectional selective scan (SCOSS) blocks comprising spatial-channel coordinate omnidirectional selective scan (SCCOSS) modules and a multi-scale feedforward network (MSFFN). The SCOSS block models information flow across both pixels and channels, capturing their dependencies, while the MSFFN adjusts the information flow and keeps the SCCOSS modules operating in concert. Extensive experiments show that WaterMamba achieves cutting-edge performance with fewer parameters and less computation, outperforming state-of-the-art methods on various datasets and validating its effectiveness and generalizability. The code will be released on GitHub after acceptance.
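As a concrete illustration of the scanning idea, the following is a minimal PyTorch sketch of a four-direction spatial selective scan: the feature map is flattened into four 1D orders, each of which would pass through a selective state-space layer before merging. The abstract does not disclose WaterMamba's exact scan layout, so the function names and the averaging merge are illustrative assumptions.

```python
import torch

def omnidirectional_scan(x: torch.Tensor) -> torch.Tensor:
    """Flatten a feature map (B, C, H, W) into four 1D scan orders:
    row-major, reversed row-major, column-major, reversed column-major."""
    rows = x.flatten(2)                     # (B, C, H*W), row-major
    cols = x.transpose(2, 3).flatten(2)     # (B, C, W*H), column-major
    return torch.stack([rows, rows.flip(-1), cols, cols.flip(-1)], dim=1)

def merge_scans(scans: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Undo each scan order (after per-direction SSM processing) and
    average the four directional outputs back into a (B, C, H, W) map."""
    rows, rows_r, cols, cols_r = scans.unbind(1)
    b, c, _ = rows.shape
    out = (rows + rows_r.flip(-1)).view(b, c, h, w)
    out = out + (cols + cols_r.flip(-1)).view(b, c, w, h).transpose(2, 3)
    return out / 4.0
```

A channel-direction scan, as the SCCOSS naming suggests, can be built the same way by flattening along the channel axis instead of the spatial axes.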
https://arxiv.org/abs/2405.08419
As an important subtopic of image enhancement, color transfer aims to enhance the color scheme of a source image according to a reference one while preserving the semantic context. To implement color transfer, the palette-based color mapping framework was proposed. It is a classical solution that does not depend on complex semantic analysis to generate a new color scheme. However, the framework usually requires manual settings, reducing its practicality. The quality of traditional palette generation depends on the degree of color separation. In this paper, we propose a new palette-based color transfer method that can automatically generate a new color scheme. With a redesigned palette-based clustering method, pixels can be classified into different segments according to color distribution with better applicability. By combining deep learning-based image segmentation and a new color mapping strategy, color transfer can be implemented on foreground and background parts independently while maintaining semantic consistency. The experimental results indicate that our method exhibits significant advantages over peer methods in terms of natural realism, color consistency, generality, and robustness.
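For context, here is a minimal sketch of the classical palette pipeline this method builds on: cluster pixels into a k-color palette, then shift each pixel by the offset between its nearest source-palette color and the corresponding reference-palette color. The paper's redesigned clustering and segmentation-aware mapping go beyond this; all names below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_palette(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Cluster the pixels of an (H, W, 3) uint8 image into a k-color palette."""
    pixels = img.reshape(-1, 3).astype(np.float64)
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels).cluster_centers_

def transfer_colors(src: np.ndarray, src_pal: np.ndarray, ref_pal: np.ndarray) -> np.ndarray:
    """Shift each pixel by the offset between its nearest source-palette
    entry and the paired reference-palette entry (hard assignment)."""
    pixels = src.reshape(-1, 3).astype(np.float64)
    dists = np.linalg.norm(pixels[:, None, :] - src_pal[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    shifted = pixels + (ref_pal[labels] - src_pal[labels])
    return shifted.clip(0, 255).reshape(src.shape).astype(np.uint8)
```

In practice the two palettes must first be put in correspondence (e.g., sorted by luminance), which is exactly the kind of manual step an automatic method aims to remove.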
https://arxiv.org/abs/2405.08263
This paper introduces a groundbreaking multi-modal neural network model designed for resolution enhancement, which innovatively leverages inter-diagnostic correlations within a system. Traditional approaches have primarily focused on uni-modal enhancement strategies, such as pixel-based image enhancement or heuristic signal interpolation. In contrast, our model employs a novel methodology by harnessing the diagnostic relationships within the physics of fusion plasma. Initially, we establish the correlations among diagnostics within the tokamak. Subsequently, we utilize these correlations to substantially enhance the temporal resolution of the Thomson scattering diagnostic, which measures plasma density and temperature. By increasing its resolution from the conventional 200 Hz to 500 kHz, we facilitate a new level of insight into plasma behavior, previously attainable only through computationally intensive simulations. This enhancement goes beyond simple interpolation, offering novel perspectives on the underlying physical phenomena governing plasma dynamics.
https://arxiv.org/abs/2405.05908
In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflectance components but struggle with noise management and detail preservation under low-light conditions. Retinexformer enhances illumination estimation through traditional self-attention mechanisms but faces challenges with insufficient interpretability and suboptimal enhancement effects. To overcome these limitations, this paper introduces the RetinexMamba architecture. RetinexMamba not only captures the physical intuitiveness of traditional Retinex methods but also integrates the deep learning framework of Retinexformer, leveraging the computational efficiency of State Space Models (SSMs) to increase processing speed. This architecture features innovative illumination estimator and damage restorer mechanisms that maintain image quality during enhancement. Moreover, RetinexMamba replaces the IG-MSA (Illumination-Guided Multi-head Self-Attention) in Retinexformer with a Fused-Attention mechanism, improving the model's interpretability. Experimental evaluations on the LOL dataset show that RetinexMamba outperforms existing Retinex-based deep learning approaches in both quantitative and qualitative metrics, confirming its effectiveness and superiority in enhancing low-light images.
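As background for the decomposition both methods share, here is a minimal sketch of the classical Retinex split using the common max-channel illumination prior. RetinexMamba's learned illumination estimator and damage restorer replace these hand-crafted steps, so this shows the physical intuition only, not the paper's model.

```python
import torch

def retinex_decompose(img: torch.Tensor, eps: float = 1e-4):
    """Classical Retinex split of an sRGB image (B, 3, H, W) in [0, 1]:
    illumination L = per-pixel max over channels, reflectance R = I / L."""
    illum = img.max(dim=1, keepdim=True).values.clamp(min=eps)
    reflect = img / illum
    return illum, reflect

def enhance(img: torch.Tensor, gamma: float = 0.45) -> torch.Tensor:
    """Brighten the illumination (here a simple gamma curve) and recompose;
    learned methods instead denoise R and predict the illumination map."""
    illum, reflect = retinex_decompose(img)
    return (illum.pow(gamma) * reflect).clamp(0, 1)
```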
https://arxiv.org/abs/2405.03349
This paper proposes a photorealistic real-time dense 3D mapping system that utilizes a learning-based image enhancement method and a mesh-based map representation. Due to the characteristics of the underwater environment, where problems such as haze and low contrast occur, it is hard to apply conventional simultaneous localization and mapping (SLAM) methods. Furthermore, for sensitive tasks like inspecting cracks, photorealistic mapping is very important. However, Autonomous Underwater Vehicles (AUVs) are computationally constrained. In this paper, we utilize a neural network-based image enhancement method to improve pose estimation and mapping quality, and apply a sliding-window-based mesh expansion method to enable lightweight, fast, and photorealistic mapping. To validate our results, we use real-world and indoor synthetic datasets: we perform qualitative validation with the real-world dataset and quantitative validation by modeling images from the indoor synthetic dataset as underwater scenes.
https://arxiv.org/abs/2404.18395
Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial-domain enhancement, neglecting the frequency-domain information inherent in the images. However, the degradation factors of underwater images are closely intertwined in the spatial domain. Although certain methods focus on enhancing images in the frequency domain, they overlook the inherent relationship between the image degradation factors and the information present in the frequency domain. As a result, these methods frequently enhance certain attributes of the improved image while inadequately addressing or even exacerbating other attributes. Moreover, many existing methods heavily rely on prior knowledge to address color shift problems in underwater images, limiting their flexibility and robustness. To overcome these limitations, we propose the Embedding Frequency and Dual Color Encoder Network (FDCE-Net). The FDCE-Net consists of two main structures: (1) the Frequency Spatial Network (FS-Net) achieves initial enhancement by using our designed Frequency Spatial Residual Block (FSRB) to decouple image degradation factors in the frequency domain and enhance different attributes separately; (2) to tackle the color shift issue, we introduce the Dual-Color Encoder (DCE), which establishes correlations between color and semantic representations through cross-attention and leverages multi-scale image features to guide the optimization of adaptive color queries. The final enhanced images are generated by combining the outputs of FS-Net and DCE through a fusion network. These images exhibit rich details, clear textures, low noise, and natural colors.
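A minimal sketch of the frequency-domain decoupling that a block like the FSRB operates on: the 2D FFT separates a feature map into amplitude (strongly tied to brightness and contrast) and phase (structure), which independent branches can then enhance before recombination. The code shows only the split-and-merge plumbing in PyTorch; the FSRB's internal design is not specified here.

```python
import torch

def split_freq(x: torch.Tensor):
    """Decompose a real feature map (B, C, H, W) into amplitude and phase."""
    spec = torch.fft.rfft2(x, norm="ortho")
    return spec.abs(), spec.angle()

def merge_freq(amp: torch.Tensor, pha: torch.Tensor, hw) -> torch.Tensor:
    """Rebuild the spatial signal after amplitude and phase have been
    processed by separate enhancement branches."""
    spec = torch.polar(amp, pha)                    # amp * exp(i * pha)
    return torch.fft.irfft2(spec, s=hw, norm="ortho")
```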
https://arxiv.org/abs/2404.17936
Underwater scenes intrinsically involve degradation problems owing to heterogeneous ocean elements. Prevailing underwater image enhancement (UIE) methods stick to straightforward feature modeling to learn the mapping function, which leads to limited vision gain as it lacks more explicit physical cues (e.g., depth). In this work, we investigate injecting a depth prior into the deep UIE model for more precise scene enhancement capability. To this end, we present a novel depth-guided perception UIE framework, dubbed underwater variable zoom (UVZ). Specifically, UVZ resorts to a two-stage pipeline. First, a depth estimation network is designed to generate critical depth maps, combined with an auxiliary supervision network introduced to suppress estimation differences during training. Second, UVZ parses near and far scenarios by harnessing the predicted depth maps, enabling local and non-local perception in different regions. Extensive experiments on five benchmark datasets demonstrate that UVZ achieves superior visual gain and delivers promising quantitative metrics. Besides, UVZ is confirmed to exhibit good generalization in some visual tasks, especially in unusual lighting conditions. The code, models, and results are available at: this https URL.
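A minimal sketch of one plausible reading of depth-guided near-far parsing, assuming (this is not stated in the abstract) that a local branch handles near content and a non-local branch handles far, heavily scattered content, blended by a soft depth gate. The pivot and temperature values are illustrative.

```python
import torch

def depth_guided_fuse(feat_local: torch.Tensor,
                      feat_nonlocal: torch.Tensor,
                      depth: torch.Tensor,
                      pivot: float = 0.5,
                      temp: float = 10.0) -> torch.Tensor:
    """Blend two perception branches with a soft gate derived from a
    predicted depth map in [0, 1] of shape (B, 1, H, W)."""
    gate = torch.sigmoid((depth - pivot) * temp)    # ~0 near, ~1 far
    return (1.0 - gate) * feat_local + gate * feat_nonlocal
```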
https://arxiv.org/abs/2404.17883
Imaging through fog significantly impacts fields such as object detection and recognition. In conditions of extremely low visibility, essential image information can be obscured, rendering standard extraction methods ineffective. Traditional digital processing techniques, such as histogram stretching, aim to mitigate fog effects by enhancing the object-light contrast diminished by atmospheric scattering. However, these methods often suffer reduced effectiveness under inhomogeneous illumination. This paper introduces a novel approach that adaptively filters background illumination under extremely low visibility and preserves only the essential signal information. Additionally, we employ a visual optimization strategy based on image gradients to eliminate grayscale banding. Finally, the image is transformed through maximum histogram equalization to achieve high contrast while maintaining fidelity to the original information. Our proposed method significantly enhances signal clarity in conditions of extremely low visibility and outperforms existing algorithms.
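For reference, the textbook histogram-equalization step that this family of methods refines: remap intensities through the normalized cumulative histogram. The paper's adaptive illumination filtering, gradient-based de-banding, and maximum histogram equalization are more involved; this sketch shows only the baseline transform.

```python
import numpy as np

def equalize(gray: np.ndarray) -> np.ndarray:
    """Plain histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf_min = cdf[cdf > 0].min()
    lut = np.round(255.0 * (cdf - cdf_min) / (cdf[-1] - cdf_min + 1e-9))
    return lut.astype(np.uint8)[gray]
```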
https://arxiv.org/abs/2404.17503
Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-processing. To obtain a high-resolution B-scan with a precise flow map, densely sampled A-scans are required in current methods, causing both computational and storage burdens. To address this issue, in this paper we propose a novel sparse reconstruction framework with four main sequential steps: 1) early magnitude-phase fusion that encourages rich interaction of the complementary information in magnitude and phase, 2) State Space Model (SSM)-based representation learning, inspired by recent successes in Mamba and VMamba, to naturally capture both the intra-A-scan sequential information and between-A-scan interactions, 3) an Inception-based Feedforward Network module (IncFFN) to further boost the SSM-module, and 4) a B-line Pixel Shuffle (BPS) layer to effectively reconstruct the final results. In the experiments on real-world animal data, our method shows clear effectiveness in reconstruction accuracy. As the first application of SSM for image reconstruction tasks, we expect our work to inspire related explorations in not only efficient ODT imaging techniques but also generic image enhancement.
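A minimal sketch of what a B-line pixel shuffle could look like, assuming it is the 1D analogue of the standard pixel shuffle applied along the width (B-line) axis only, so that upsampling happens exactly along the sparsely sampled direction. The channel-to-subpixel ordering is an assumption.

```python
import torch

def bline_pixel_shuffle(x: torch.Tensor, r: int) -> torch.Tensor:
    """Rearrange (B, C*r, H, W) into (B, C, H, W*r), upsampling width only."""
    b, cr, h, w = x.shape
    assert cr % r == 0, "channel count must be divisible by the upscale factor"
    c = cr // r
    x = x.view(b, c, r, h, w)         # split channels into (C, r) groups
    x = x.permute(0, 1, 3, 4, 2)      # (B, C, H, W, r)
    return x.reshape(b, c, h, w * r)  # interleave the r subpixels along width
```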
https://arxiv.org/abs/2404.17484
Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often seek assistance from a quality factor prediction network to help their networks restore compressed images. However, the predicted numerical quality factor lacks spatial information, preventing network adaptability toward image contents. Recent studies in prompt-learning-based image restoration have showcased the potential of prompts to generalize across varied degradation types and degrees. This motivated us to design a prompt-learning-based compressed image restoration network, dubbed PromptCIR, which can effectively restore images from various compression levels. Specifically, PromptCIR exploits prompts to encode compression information implicitly, where prompts directly interact with soft weights generated from image features, thus providing dynamic content-aware and distortion-aware guidance for the restoration process. The lightweight prompts enable our method to adapt to different compression levels while introducing minimal parameter overhead. Overall, PromptCIR leverages a powerful transformer-based backbone with the dynamic prompt module to proficiently handle blind CIR tasks, winning first place in the blind compressed image enhancement track of the NTIRE 2024 challenge. Extensive experiments have validated the effectiveness of our proposed PromptCIR. The code is available at this https URL.
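A minimal sketch of a prompt module in the style the description evokes (close to PromptIR-like designs): a bank of learnable prompt tensors is mixed by soft weights predicted from the image features and concatenated back as degradation-aware guidance. Layer sizes and the fusion scheme are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptBlock(nn.Module):
    def __init__(self, dim: int, n_prompts: int = 5, size: int = 16):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim, size, size))
        self.to_weights = nn.Linear(dim, n_prompts)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Global descriptor -> soft weights over the prompt bank.
        w = self.to_weights(feat.mean(dim=(2, 3))).softmax(dim=-1)   # (B, N)
        prompt = torch.einsum("bn,nchw->bchw", w, self.prompts)
        prompt = F.interpolate(prompt, size=feat.shape[2:],
                               mode="bilinear", align_corners=False)
        return torch.cat([feat, prompt], dim=1)  # fuse with a conv afterwards
```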
https://arxiv.org/abs/2404.17433
Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity leads to extensive long-range correlations in the spatial domain of remote sensing images. Convolutional neural networks, which rely on local correlations, struggle to establish such long-range correlations when modeling these images. On the other hand, Transformer-based methods that focus on global information face high computational complexity when processing high-resolution remote sensing images. From another perspective, the Fourier transform can compute global information without introducing a large number of parameters, enabling the network to capture the overall image structure more efficiently and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we design an information fusion affine block that combines data from different phases and scales. Additionally, we construct two dark-light remote sensing datasets to address the current lack of datasets for dark-light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at this https URL.
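A small Fourier experiment that motivates the amplitude-first design: pairing a low-light image's phase with a normal-light amplitude largely restores brightness while preserving the low-light image's structure. A hedged sketch of the observation, not of DFFN itself:

```python
import torch

def amplitude_swap(low: torch.Tensor, normal: torch.Tensor) -> torch.Tensor:
    """Combine the normal-light amplitude with the low-light phase; the
    result is bright but keeps the low-light image's spatial structure."""
    f_low = torch.fft.rfft2(low, norm="ortho")
    f_norm = torch.fft.rfft2(normal, norm="ortho")
    mixed = torch.polar(f_norm.abs(), f_low.angle())
    return torch.fft.irfft2(mixed, s=low.shape[-2:], norm="ortho")
```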
https://arxiv.org/abs/2404.17400
Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale, high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. Thus, we constructed the first large-scale high-resolution underwater video enhancement benchmark (UVEB) to promote the development of underwater vision. UVEB contains 1,308 pairs of video sequences and more than 453,000 high-resolution frame pairs, 38% of which are Ultra-High-Definition (UHD) 4K. UVEB comes from multiple countries and contains various scenes and video degradation types to adapt to diverse and complex underwater environments. We also propose the first supervised underwater video enhancement method, UVE-Net. UVE-Net converts the current frame's information into convolutional kernels and passes them to adjacent frames for efficient inter-frame information exchange. By fully utilizing the redundant degraded information of underwater videos, UVE-Net achieves better video enhancement. Experiments show the effective network design and good performance of UVE-Net.
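A minimal sketch of the kernel-passing idea, under the assumption that the current frame's features are squeezed into per-sample depthwise 3x3 kernels and convolved over a neighboring frame's features; the pooling-based kernel generator below is an illustrative stand-in for UVE-Net's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameToKernel(nn.Module):
    """Turn current-frame features into depthwise kernels for a neighbor frame."""
    def __init__(self, c: int):
        super().__init__()
        self.make_kernel = nn.Sequential(
            nn.AdaptiveAvgPool2d(3),   # squeeze features to (B, C, 3, 3)
            nn.Conv2d(c, c, 1),
        )

    def forward(self, cur: torch.Tensor, nbr: torch.Tensor) -> torch.Tensor:
        b, c, h, w = nbr.shape
        k = self.make_kernel(cur).view(b * c, 1, 3, 3)
        # Grouped-conv trick: apply per-sample, per-channel kernels in one call.
        out = F.conv2d(nbr.reshape(1, b * c, h, w), k, padding=1, groups=b * c)
        return out.view(b, c, h, w)
```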
https://arxiv.org/abs/2404.14542
This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A notable total of 428 participants registered for the challenge, with 22 teams ultimately making valid submissions. This paper meticulously evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field.
https://arxiv.org/abs/2404.14248
Extremely low-light text images are common in natural scenes, making scene text detection and recognition challenging. One solution is to enhance these images using low-light image enhancement methods before text extraction. However, previous methods often fail to specifically address the significance of low-level features, which are crucial for optimal performance on downstream scene text tasks. Further research is also hindered by the lack of extremely low-light text datasets. To address these limitations, we propose a novel encoder-decoder framework with an edge-aware attention module to focus on scene text regions during enhancement. Our proposed method uses novel text detection and edge reconstruction losses to emphasize low-level scene text features, leading to successful text extraction. Additionally, we present a Supervised Deep Curve Estimation (Supervised-DCE) model to synthesize extremely low-light images based on publicly available scene text datasets such as ICDAR15 (IC15). We also labeled the text in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets to allow for objective assessment of extremely low-light image enhancement through scene text tasks. Extensive experiments show that our model outperforms state-of-the-art methods in terms of both image quality and scene text metrics on the widely used LOL, SID, and synthetic IC15 datasets. Code and dataset will be released publicly at this https URL.
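A minimal sketch of an edge-reconstruction loss of the kind described, using fixed Sobel filters to compare gradient maps of the enhanced output and the ground truth so that faint text strokes are preserved; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def edge_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance between Sobel gradient maps of prediction and target."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    weight = torch.stack([kx, kx.t()]).unsqueeze(1).to(pred)  # (2, 1, 3, 3)

    def grads(x: torch.Tensor) -> torch.Tensor:
        gray = x.mean(dim=1, keepdim=True)            # (B, 1, H, W)
        return F.conv2d(gray, weight, padding=1)      # (B, 2, H, W)

    return F.l1_loss(grads(pred), grads(target))
```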
https://arxiv.org/abs/2404.14135
In real-world scenarios, captured images often suffer from blurring, noise, and other forms of degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. However, performing only a single type of image enhancement still cannot yield satisfactory images. To address this challenge, we propose the Composite Refinement Network (CRNet), which tackles this issue using multiple exposure images. By fully integrating information-rich multiple exposure inputs, CRNet can perform unified image restoration and enhancement. To improve the quality of image details, CRNet explicitly separates and strengthens high- and low-frequency information through pooling layers, using specially designed Multi-Branch Blocks to fuse these frequencies effectively. To increase the receptive field and fully integrate input features, CRNet employs a High-Frequency Enhancement Module, which includes large kernel convolutions and an inverted-bottleneck ConvFFN. Our model secured third place in the first track of the Bracketing Image Restoration and Enhancement Challenge, surpassing previous SOTA models in both testing metrics and visual quality.
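A minimal sketch of the pooling-based frequency separation described: an average-pooled copy keeps the low frequencies, and the residual carries the high-frequency detail, after which each band can be strengthened by its own branch and re-summed. The kernel size is illustrative.

```python
import torch
import torch.nn.functional as F

def split_frequencies(x: torch.Tensor, k: int = 5):
    """Return (low, high) frequency components of a (B, C, H, W) map."""
    low = F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)
    return low, x - low
```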
https://arxiv.org/abs/2404.14132
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored. In addition, combining CNNs and Transformers can effectively combine global and local information for enhancement. However, this approach is still affected by the quadratic complexity of the Transformer and cannot maximize performance. Recently, the state space model (SSM)-based architecture Mamba has been proposed, which excels at modeling long distances while maintaining linear complexity. This paper explores the potential of this SSM-based model for UIE from both efficiency and effectiveness perspectives. However, directly applying Mamba performs poorly because local fine-grained features, which are crucial for image enhancement, cannot be fully utilized. To this end, we customize the MambaUIE architecture for efficient UIE. Specifically, we introduce visual state space (VSS) blocks to capture global contextual information at the macro level while mining local information at the micro level. For these two kinds of information, we also propose a Dynamic Interaction Block (DIB) and a Spatial feed-forward Network (SGFN) for intra-block feature aggregation. MambaUIE is able to efficiently synthesize global and local information and maintains a very small number of parameters with high accuracy. Experiments on the UIEB dataset show that our method reduces GFLOPs by 67.4% (2.715G) relative to the SOTA method. To the best of our knowledge, this is the first UIE model constructed based on SSM that breaks the limitation of FLOPs on accuracy in UIE. The official repository of MambaUIE is at this https URL.
https://arxiv.org/abs/2404.13884
Underwater images taken from autonomous underwater vehicles (AUVs) often suffer from low light, high turbidity, poor contrast, motion blur, and excessive light scattering, and hence require image enhancement techniques for object recognition. Machine learning methods are increasingly being used for object recognition under such adverse conditions. These enhanced object recognition methods for images taken from AUVs have potential applications in underwater pipeline and optical fibre surveillance, ocean bed resource extraction, ocean floor mapping, underwater species exploration, etc. While classical machine learning methods are very efficient in terms of accuracy, they require large datasets and high computational time for image classification. In the current work, we use quantum-classical hybrid machine learning methods for real-time underwater object recognition on board an AUV for the first time. We use real-time motion-blurred and low-light images taken from the on-board camera of an AUV built in-house and apply existing hybrid machine learning methods for object recognition. Our hybrid methods consist of quantum encoding and flattening of classical images using quantum circuits and sending them to classical neural networks for image classification. The results of hybrid methods carried out using PennyLane-based quantum simulators, both on GPU and using pre-trained models on an on-board NVIDIA GPU chipset, are compared with results from the corresponding classical machine learning methods. We observe that the hybrid quantum machine learning methods show an efficiency greater than 65%, reduce run-time by one-third, and require 50% smaller dataset sizes for training the models compared to classical machine learning methods. We hope that our work opens up further possibilities in quantum-enhanced real-time computer vision in autonomous vehicles.
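A minimal PennyLane sketch of the quantum-encode-then-classify pattern that such hybrid methods follow: angle-encode a small patch of pixel values on a few qubits and read out Pauli-Z expectations as features for a classical network. This is a generic illustration, not the circuit used on the AUV.

```python
import pennylane as qml
import torch

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def quantum_encode(x):
    """Rotate each qubit by one input value, then measure Z expectations."""
    qml.AngleEmbedding(x, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# Encode one 2x2 image patch into 4 quantum features for a classical head.
patch = torch.tensor([0.1, 0.7, 0.3, 0.9])
features = torch.stack([torch.as_tensor(v) for v in quantum_encode(patch)])
```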
https://arxiv.org/abs/2404.13130
Accurate localization is fundamental for autonomous underwater vehicles (AUVs) to carry out precise tasks, such as manipulation and construction. Vision-based solutions using fiducial markers are promising but extremely challenging underwater because of harsh lighting conditions. This paper introduces a gradient-based active camera exposure control method to tackle sharp lighting variations during image acquisition, which can establish a better foundation for subsequent image enhancement procedures. Considering a typical underwater operation scenario where visual tags are used, we conducted several experiments comparing our method with other state-of-the-art exposure control methods, including Active Exposure Control (AEC) and Gradient-based Exposure Control (GEC). Results show a significant improvement in the accuracy of robot localization. This method is an important component that can be used in a vision-based state estimation pipeline to improve the overall localization accuracy.
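A minimal sketch of the gradient-based principle: score each frame by its gradient energy (a well-exposed tag region produces strong gradients, while over- or under-exposure flattens them) and adjust exposure to climb that score. The hill-climbing update below is a crude illustrative stand-in for the paper's controller.

```python
import numpy as np

def gradient_score(gray: np.ndarray) -> float:
    """Mean squared gradient magnitude of a grayscale frame."""
    g = gray.astype(np.float64)
    gx, gy = np.diff(g, axis=1), np.diff(g, axis=0)
    return float(np.mean(gx[:-1, :] ** 2 + gy[:, :-1] ** 2))

def update_exposure(exposure: float, score: float, prev_score: float,
                    step: float = 0.05) -> float:
    """Keep moving exposure in the direction that increased the score."""
    return exposure * (1 + step) if score >= prev_score else exposure * (1 - step)
```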
https://arxiv.org/abs/2404.12055
This study addresses the evolving challenges in urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle density. Traditional monitoring methods, which rely on static cameras with narrow fields of view, are ineffective in dynamic urban environments, necessitating the installation of multiple cameras, which raises costs. Recently introduced fisheye lenses provide wide and omnidirectional coverage in a single frame, making them a transformative solution. However, issues such as distorted views and blurriness arise, preventing accurate object detection on these images. Motivated by these challenges, this study proposes a novel approach that combines a Transformer-based image enhancement framework and an ensemble learning technique to address these challenges and improve traffic monitoring accuracy, making significant contributions to the future of intelligent traffic management systems. Our proposed methodological framework won 5th place in the 2024 AI City Challenge, Track 4, with an F1 score of 0.5965 on experimental validation data. The experimental results demonstrate the effectiveness, efficiency, and robustness of the proposed system. Our code is publicly available at this https URL.
https://arxiv.org/abs/2404.10078
Image restoration, which aims to recover high-quality images from their corrupted counterparts, often faces the challenge of being an ill-posed problem that admits multiple solutions for a single input. However, most deep learning-based works simply employ an l1 loss to train their network in a deterministic way, resulting in over-smoothed predictions with inferior perceptual quality. In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective, emphasizing the learning of distributions rather than individual pixel values. The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets. To make this spatial entropy differentiable, we employ kernel density estimation (KDE) to approximate the probabilities of specific intensity values for each pixel using its neighboring areas. Specifically, we equip the entropy with diffusion models and aim for superior accuracy and enhanced perceptual quality over the l1-based noise-matching loss. In the experiments, we evaluate the proposed method for low-light enhancement on two datasets and in the NTIRE 2024 challenge. All these results illustrate the effectiveness of our statistic-based entropy loss. Code is available at this https URL.
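A minimal sketch of a differentiable spatial entropy via Gaussian KDE in the spirit described: soft-assign intensities to bins so that the histogram, and hence the entropy, admits gradients. The bin count, bandwidth, and the simple entropy-gap loss are illustrative simplifications of the paper's distribution comparison.

```python
import torch

def spatial_entropy(x: torch.Tensor, n_bins: int = 32, bw: float = 0.05) -> torch.Tensor:
    """Differentiable Shannon entropy of pixel intensities in [0, 1].
    x: (B, C, H, W); returns one entropy value per image."""
    centers = torch.linspace(0.0, 1.0, n_bins, device=x.device)
    # Gaussian soft assignment of every pixel to every bin: (B, N, n_bins).
    w = torch.exp(-0.5 * ((x.flatten(1).unsqueeze(-1) - centers) / bw) ** 2)
    p = w.sum(dim=1)
    p = p / p.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    return -(p * (p + 1e-8).log()).sum(dim=-1)

def entropy_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between predicted and target intensity entropies."""
    return (spatial_entropy(pred) - spatial_entropy(target)).abs().mean()
```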
https://arxiv.org/abs/2404.09735