Current video summarization methods primarily depend on supervised computer vision techniques, which demand time-consuming manual annotations. Moreover, the annotations are always subjective, which makes this task more challenging. To address these issues, we analyze the feasibility of transforming video summarization into a text summarization task and leverage Large Language Models (LLMs) to boost video summarization. This paper proposes a novel self-supervised framework for video summarization guided by LLMs. Our method begins by generating captions for video frames, which are then synthesized into a text summary by LLMs. Subsequently, we measure the semantic distance between the frame captions and the text summary. Notably, we propose a novel loss function that optimizes our model according to the diversity of the video. Finally, the summarized video is generated by selecting the frames whose captions are most similar to the text summary. Our model achieves competitive results against other state-of-the-art methods and paves a novel pathway in video summarization.
https://arxiv.org/abs/2405.08890
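A minimal sketch of the frame-selection step described above: given per-frame captions and an LLM-generated text summary of those captions, keep the frames whose captions are semantically closest to the summary. The captioning and LLM steps are assumed to have already run; the encoder name and the 15% frame budget are illustrative choices, not the paper's.

```python
from sentence_transformers import SentenceTransformer, util

def select_keyframes(frame_captions, text_summary, budget_ratio=0.15):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    caption_emb = encoder.encode(frame_captions, convert_to_tensor=True)
    summary_emb = encoder.encode(text_summary, convert_to_tensor=True)
    # Cosine similarity between each frame caption and the whole text summary.
    scores = util.cos_sim(caption_emb, summary_emb).squeeze(-1)
    k = max(1, int(budget_ratio * len(frame_captions)))
    top = scores.topk(k).indices.sort().values  # keep temporal order
    return top.tolist()

if __name__ == "__main__":
    captions = ["a man opens a car door", "the car drives down a road", "a dog sleeps on a sofa"]
    summary = "A man gets into his car and drives away."
    print(select_keyframes(captions, summary, budget_ratio=0.67))
```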
The paper discusses the creation of a multimodal dataset of Russian-language scientific papers and the testing of existing language models on the task of automatic text summarization. A distinguishing feature of the dataset is its multimodal data, which include texts, tables, and figures. The paper presents the results of experiments with two language models: Gigachat from SBER and YandexGPT from Yandex. The dataset consists of 420 papers and is publicly available on this https URL.
https://arxiv.org/abs/2405.07886
The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games. We detail the methods involved in the curation of this dataset and the integration of ASR. We also highlight the implications of a multimodal approach in sports analytics, and how the enriched dataset can support diverse applications, thus broadening the scope of research and development in the field of sports analytics.
https://arxiv.org/abs/2405.07354
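A minimal sketch of the ASR step described above, using the open-source `whisper` package. The paper pairs Whisper transcription with Google Translate; here Whisper's built-in `task="translate"` option (translation to English) stands in for that second step, and the file name is a placeholder.

```python
import whisper

def transcribe_commentary(audio_path: str, translate: bool = False):
    model = whisper.load_model("base")
    result = model.transcribe(audio_path, task="translate" if translate else "transcribe")
    # Each segment carries start/end timestamps, which is what lets the transcript be
    # aligned with the broadcast video for action spotting or highlight generation.
    return [(seg["start"], seg["end"], seg["text"]) for seg in result["segments"]]

if __name__ == "__main__":
    for start, end, text in transcribe_commentary("first_half.mp3"):
        print(f"[{start:7.1f}-{end:7.1f}] {text}")
```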
Event relation extraction (ERE) is a critical and fundamental challenge for natural language processing. Existing work mainly focuses on directly modeling the entire document, which cannot effectively handle long-range dependencies and information redundancy. To address these issues, we propose a cluster-aware compression method for improving event relation extraction (TacoERE), which explores a compression-then-extraction paradigm. Specifically, we first introduce document clustering for modeling event dependencies. It splits the document into intra- and inter-clusters, where intra-clusters aim to enhance the relations within the same cluster, while inter-clusters attempt to model the related events at arbitrary distances. Secondly, we utilize cluster summarization to simplify and highlight important text content of clusters for mitigating information redundancy and event distance. We have conducted extensive experiments on both pre-trained language models, such as RoBERTa, and large language models, such as ChatGPT and GPT-4, on three ERE datasets, i.e., MAVEN-ERE, EventStoryLine and HiEve. Experimental results demonstrate that TacoERE is an effective method for ERE.
https://arxiv.org/abs/2405.06890
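A rough sketch of the compression-then-extraction idea: cluster the document's sentences so that related (possibly distant) events end up together, then summarize each cluster so the relation extractor reads a much shorter, focused text. The sentence encoder, KMeans, and the generic summarizer below are stand-ins, not the actual TacoERE components.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from transformers import pipeline

def compress_document(sentences, n_clusters=3):
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    compressed = []
    for c in range(n_clusters):
        cluster_text = " ".join(s for s, l in zip(sentences, labels) if l == c)
        if not cluster_text.strip():
            continue
        compressed.append(summarizer(cluster_text, max_length=60, min_length=5)[0]["summary_text"])
    return compressed  # these cluster summaries replace the full document as ERE input
```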
Patient hand-off and triage are two fundamental problems in health care. Often doctors must painstakingly summarize complex findings to efficiently communicate with specialists and quickly make decisions on which patients have the most urgent cases. In pursuit of these challenges, we present (1) a model with state-of-the-art radiology report summarization performance using (2) a novel method for augmenting medical data, and (3) an analysis of the model's limitations and radiology knowledge gain. We also provide a data processing pipeline for future models developed on the MIMIC-CXR dataset. Our best performing model was a fine-tuned BERT-to-BERT encoder-decoder with a ROUGE-L F1 of 58.75/100, which outperformed specialized checkpoints with more sophisticated attention mechanisms. We investigate these aspects in this work.
https://arxiv.org/abs/2405.06802
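A minimal sketch of the BERT-to-BERT setup mentioned above, built with Hugging Face's EncoderDecoderModel. This only wires up the architecture and a single generation call; the fine-tuning on MIMIC-CXR report pairs, the data augmentation, and the reported ROUGE-L result are not reproduced here, so the untrained output will be meaningless.

```python
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

# Generation needs to know which token starts the decoder sequence and which one is padding.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

findings = "Heart size is normal. Lungs are clear. No pleural effusion or pneumothorax."
inputs = tokenizer(findings, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```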
Graphs are an essential data structure utilized to represent relationships in real-world scenarios. Prior research has established that Graph Neural Networks (GNNs) deliver impressive outcomes in graph-centric tasks, such as link prediction and node classification. Despite these advancements, challenges like data sparsity and limited generalization capabilities continue to persist. Recently, Large Language Models (LLMs) have gained attention in natural language processing. They excel in language comprehension and summarization. Integrating LLMs with graph learning techniques has attracted interest as a way to enhance performance in graph learning tasks. In this survey, we conduct an in-depth review of the latest state-of-the-art LLMs applied in graph learning and introduce a novel taxonomy to categorize existing methods based on their framework design. We detail four unique designs: i) GNNs as Prefix, ii) LLMs as Prefix, iii) LLMs-Graphs Integration, and iv) LLMs-Only, highlighting key methodologies within each category. We explore the strengths and limitations of each framework, and emphasize potential avenues for future research, including overcoming current integration challenges between LLMs and graph learning techniques, and venturing into new application areas. This survey aims to serve as a valuable resource for researchers and practitioners eager to leverage large language models in graph learning, and to inspire continued progress in this dynamic field. We consistently maintain the related open-source materials at \url{this https URL}.
https://arxiv.org/abs/2405.08011
Online social media platforms, such as Twitter, provide valuable information during disaster events. Existing tweet disaster summarization approaches provide a summary of these events to aid government agencies, humanitarian organizations, etc., to ensure effective disaster response. In the literature, there are two types of approaches for disaster summarization, namely, supervised and unsupervised approaches. Although supervised approaches are typically more effective, they necessitate a sizable number of disaster event summaries for training and testing. However, there is a shortage of disaster summary datasets for training and evaluation. This motivates us to add more datasets to make supervised learning approaches more effective. In this paper, we present ADSumm, which adds annotated ground-truth summaries for eight disaster events, comprising both natural and man-made disasters across seven different countries. Our experimental analysis shows that the newly added datasets improve the performance of supervised summarization approaches by 8-28% in terms of ROUGE-N F1-score. Moreover, in the newly annotated dataset we have added a category label for each input tweet, which helps ensure good coverage of different categories in the summary. Additionally, we have added two other features, a relevance label and a key-phrase, which provide information about the quality of a tweet and an explanation for the tweet's inclusion in the summary, respectively. For ground-truth summary creation, we describe the adopted annotation procedure in detail, which has not been done in the existing literature. Experimental analysis shows that the quality of the ground-truth summaries is very good in terms of Coverage, Relevance, and Diversity.
https://arxiv.org/abs/2405.06551
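The 8-28% gains above are reported in ROUGE-N F1. A minimal sketch of that metric using the `rouge-score` package, scoring a system summary against an annotated ground-truth summary; the texts below are placeholders, not from the ADSumm data.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)
reference = "Flood waters entered the city; two shelters opened and rescue teams were deployed."
system = "Rescue teams were deployed after flood waters entered the city."
scores = scorer.score(reference, system)
for name, score in scores.items():
    print(f"{name}: F1={score.fmeasure:.3f} P={score.precision:.3f} R={score.recall:.3f}")
```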
The abundance of situational information on Twitter poses a challenge for users to manually discern vital and relevant information during disasters. A concise and human-interpretable overview of this information helps decision-makers in implementing efficient and quick disaster response. Existing abstractive summarization approaches can be categorized as sentence-based or key-phrase-based approaches. This paper focuses on the sentence-based approach, which is typically implemented as a dual-phase procedure in the literature. The initial phase, known as the extractive phase, involves identifying the most relevant tweets. The subsequent phase, referred to as the abstractive phase, entails generating a more human-interpretable summary. In this study, we adopt the methodology from prior research for the extractive phase. For the abstractive phase of summarization, most existing approaches employ deep learning-based frameworks, which can either be pre-trained or require training from scratch. However, to achieve the appropriate level of performance, it is imperative to have substantial training data for both methods, which is not readily available. This work presents an Abstractive Tweet Summarizer (ATSumm) that effectively addresses the issue of data sparsity by using auxiliary information. We introduce the Auxiliary Pointer Generator Network (AuxPGN) model, which utilizes a unique attention mechanism called key-phrase attention. This attention mechanism incorporates auxiliary information in the form of key-phrases and their corresponding importance scores from the input tweets. We evaluate the proposed approach by comparing it with 10 state-of-the-art approaches across 13 disaster datasets. The evaluation results indicate that ATSumm achieves superior performance compared to state-of-the-art approaches, with improvements of 4-80% in ROUGE-N F1-score.
https://arxiv.org/abs/2405.06541
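The key idea in AuxPGN is to let key-phrase importance scores influence where the decoder attends. One plausible, simplified way to realize that is sketched below: the importance score of each source token is added as a learned bias on the attention logits. This is an illustrative reading, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class KeyPhraseBiasedAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        self.key_proj = nn.Linear(hidden_size, hidden_size)
        self.importance_weight = nn.Parameter(torch.tensor(1.0))

    def forward(self, decoder_state, encoder_states, importance_scores):
        # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
        # importance_scores: (batch, src_len), e.g. key-phrase scores in [0, 1]
        q = self.query_proj(decoder_state).unsqueeze(1)        # (batch, 1, hidden)
        k = self.key_proj(encoder_states)                      # (batch, src_len, hidden)
        logits = (q * k).sum(-1) / k.size(-1) ** 0.5           # (batch, src_len)
        logits = logits + self.importance_weight * importance_scores
        attn = torch.softmax(logits, dim=-1)
        context = torch.bmm(attn.unsqueeze(1), encoder_states).squeeze(1)
        return context, attn

attn_layer = KeyPhraseBiasedAttention(hidden_size=16)
ctx, weights = attn_layer(torch.randn(2, 16), torch.randn(2, 7, 16), torch.rand(2, 7))
print(ctx.shape, weights.shape)
```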
Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs, placing their trust in the collective wisdom of the public. However, there can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern. Typically, CQA forums feature a single top-voted answer as a representative summary for each query. However, a single answer overlooks the alternative solutions and other information frequently offered in other responses. Our research focuses on aspect-based summarization of health answers to address this limitation. Summarization of responses under different aspects such as suggestions, information, personal experiences, and questions can enhance the usability of the platforms. We formalize a multi-stage annotation guideline and contribute a unique dataset comprising aspect-based human-written health answer summaries. We build an automated multi-faceted answer summarization pipeline with this dataset based on task-specific fine-tuning of several state-of-the-art models. The pipeline leverages question similarity to retrieve relevant answer sentences, subsequently classifying them into the appropriate aspect type. Following this, we employ several recent abstractive summarization models to generate aspect-based summaries. Finally, we present a comprehensive human analysis and find that our summaries rank high in capturing relevant content and a wide range of solutions.
https://arxiv.org/abs/2405.06295
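A sketch of the three-stage pipeline described above: retrieve answer sentences relevant to the question, bucket them into aspects, then summarize each bucket. The paper fine-tunes task-specific models on its dataset; here a generic sentence encoder, a zero-shot classifier, and an off-the-shelf summarizer stand in for those components.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

ASPECTS = ["suggestion", "information", "personal experience", "question"]

def summarize_answers(question, answer_sentences, top_k=10):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    q_emb = encoder.encode(question, convert_to_tensor=True)
    s_emb = encoder.encode(answer_sentences, convert_to_tensor=True)
    sims = util.cos_sim(q_emb, s_emb)[0]
    relevant = [answer_sentences[i] for i in sims.topk(min(top_k, len(answer_sentences))).indices]

    # Classify each retrieved sentence into one of the four aspect types.
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    buckets = {a: [] for a in ASPECTS}
    for sent in relevant:
        buckets[classifier(sent, candidate_labels=ASPECTS)["labels"][0]].append(sent)

    # One abstractive summary per non-empty aspect bucket.
    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    return {a: summarizer(" ".join(sents), max_length=60, min_length=5)[0]["summary_text"]
            for a, sents in buckets.items() if sents}
```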
This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals. Our framework transforms any offline resilient single-agent $(\alpha-\epsilon)$-approximation algorithm, having a complexity of $\tilde{\mathcal{O}}(\frac{\psi}{\epsilon^\beta})$, where the logarithm is omitted, for some function $\psi$ and constant $\beta$, into an online multi-agent algorithm with $m$ communicating agents and an $\alpha$-regret of no more than $\tilde{\mathcal{O}}(m^{-\frac{1}{3+\beta}} \psi^\frac{1}{3+\beta} T^\frac{2+\beta}{3+\beta})$. This approach not only eliminates the $\epsilon$ approximation error but also ensures sublinear growth with respect to the time horizon $T$ and demonstrates a linear speedup with an increasing number of communicating agents. Additionally, the algorithm is notably communication-efficient, requiring only a sublinear number of communication rounds, quantified as $\tilde{\mathcal{O}}\left(\psi T^\frac{\beta}{\beta+1}\right)$. Furthermore, the framework has been successfully applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. We empirically validate our approach on a stochastic data summarization problem, illustrating the effectiveness of the proposed framework, even in single-agent scenarios.
https://arxiv.org/abs/2405.05950
Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users' information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization.
https://arxiv.org/abs/2405.05109
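A sketch of the serialization and prompting flow described above: each table is flattened to text, concatenated with the user's query, and handed to an LLM. The serialization format and prompt wording are illustrative, and the paper's summarization controller (which decides what to keep from each table) is reduced here to simple row truncation.

```python
import pandas as pd

def serialize_table(name: str, df: pd.DataFrame, max_rows: int = 20) -> str:
    header = " | ".join(df.columns)
    rows = [" | ".join(str(v) for v in row) for row in df.head(max_rows).itertuples(index=False)]
    return f"Table: {name}\n{header}\n" + "\n".join(rows)

def build_prompt(query: str, tables: dict) -> str:
    serialized = "\n\n".join(serialize_table(n, df) for n, df in tables.items())
    return f"{serialized}\n\nQuery: {query}\nWrite a concise summary that answers the query."

tables = {
    "sales_2023": pd.DataFrame({"region": ["EU", "US"], "revenue": [1.2, 3.4]}),
    "sales_2022": pd.DataFrame({"region": ["EU", "US"], "revenue": [1.0, 2.9]}),
}
print(build_prompt("How did EU revenue change from 2022 to 2023?", tables))
# The resulting prompt would then be passed to the LLM of choice.
```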
Long text understanding is important yet challenging for natural language processing. A long article or document usually contains many redundant words that are not pertinent to its gist and sometimes can be regarded as noise. With recent advances in abstractive summarization, we propose our \emph{Gist Detector} to leverage the gist detection ability of a summarization model and integrate the extracted gist into downstream models to enhance their long text understanding ability. Specifically, Gist Detector first learns the gist detection knowledge distilled from a summarization model, and then produces gist-aware representations to augment downstream models. We evaluate our method on three different tasks: long document classification, distantly supervised open-domain question answering, and non-parallel text style transfer. The experimental results show that our method can significantly improve the performance of baseline models on all tasks.
https://arxiv.org/abs/2405.04955
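A simplified sketch of how gist-aware representations could augment a downstream model: per-token gist scores (in the paper, distilled from a summarization model) are used to pool a gist vector, which is then fused with the downstream model's own representation. Fusion by concatenation is one simple choice here, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class GistFusion(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, token_states, gist_scores, downstream_repr):
        # token_states: (batch, seq, hidden); gist_scores: (batch, seq); downstream_repr: (batch, hidden)
        weights = torch.softmax(gist_scores, dim=-1).unsqueeze(-1)
        gist_vector = (weights * token_states).sum(dim=1)  # gist-weighted pooling
        return self.proj(torch.cat([downstream_repr, gist_vector], dim=-1))

fusion = GistFusion(hidden_size=8)
out = fusion(torch.randn(2, 5, 8), torch.randn(2, 5), torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 8])
```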
Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. Research on NER is centered around English and a few other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present a human-annotated named entity corpus of 40K sentences for 4 Indian languages from two of the major Indian language families. Additionally, we present a multilingual model fine-tuned on our dataset, which achieves an average F1 score of 0.80 on our dataset. We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.
https://arxiv.org/abs/2405.04829
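A minimal sketch of fine-tuning a multilingual encoder for NER as a token-classification task, in the spirit of the model described above. The backbone (xlm-roberta-base), label set, and the single toy training example are placeholders; the paper's 40K-sentence corpus and exact configuration are not reproduced here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-base", num_labels=len(labels))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One toy training step on a single sentence. Real training would align word-level labels
# with subword tokens; here every subword simply gets the "O" label so the example stays short.
sentence = "मुंबई में रिलायंस का मुख्यालय है"
enc = tokenizer(sentence, return_tensors="pt")
enc["labels"] = torch.zeros_like(enc["input_ids"])
loss = model(**enc).loss
loss.backward()
optimizer.step()
print(float(loss))
```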
In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance. However, no current studies specifically address the shared knowledge within KGC. To bridge this gap, we introduce a multi-level Shared Knowledge Guided learning method (SKG) that operates at both the dataset and task levels. On the dataset level, SKG-KGC broadens the original dataset by identifying shared features within entity sets via text summarization. On the task level, for the three typical KGC subtasks - head entity prediction, relation prediction, and tail entity prediction - we present an innovative multi-task learning architecture with dynamically adjusted loss weights. This approach allows the model to focus on more challenging and underperforming tasks, effectively mitigating the imbalance of knowledge sharing among subtasks. Experimental results demonstrate that SKG-KGC outperforms existing text-based methods significantly on three well-known datasets, with the most notable improvement on WN18RR.
https://arxiv.org/abs/2405.06696
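The multi-task part above hinges on dynamically adjusted loss weights that push the model toward the currently weaker subtasks. One simple weighting scheme in that spirit, not necessarily the exact rule used by SKG-KGC, is to make each subtask's weight proportional to its recent average loss, so harder tasks contribute more to the total objective.

```python
import torch

def combine_losses(losses: dict, running_avg: dict, momentum: float = 0.9):
    """losses / running_avg map subtask name -> scalar loss tensor / float."""
    for name, loss in losses.items():
        running_avg[name] = momentum * running_avg.get(name, float(loss)) + (1 - momentum) * float(loss)
    total_avg = sum(running_avg.values())
    weights = {name: len(losses) * running_avg[name] / total_avg for name in losses}
    total = sum(weights[name] * loss for name, loss in losses.items())
    return total, weights

# Example with the three KGC subtasks: head prediction, relation prediction, tail prediction.
running = {}
losses = {"head": torch.tensor(1.8), "relation": torch.tensor(0.4), "tail": torch.tensor(1.1)}
total, w = combine_losses(losses, running)
print(total, w)  # "head" receives the largest weight because its loss is highest
```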
Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classification. However, they are also known for being slow at inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each data instance. By automatically deciding where to skip and how to balance quality and computation cost via constrained optimization, our dynamic neural generation networks enforce an efficient inference path and determine the optimized trade-off. Experiments across question answering, summarization, and classification benchmarks show that our method reduces computation cost during inference while maintaining the same accuracy. Extensive experiments and ablation studies demonstrate that our method is general, effective, and beneficial for many NLP tasks.
https://arxiv.org/abs/2405.04513
This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved medical text summarization. In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on fragment score conditioned only on the downstream task's reference summaries. Unlike previous works on vocabulary adaptation (limited only to classification tasks), optimizing vocabulary based on summarization tasks requires an extremely costly intermediate fine-tuning step on large summarization datasets. To that end, our novel fragment score-based hyperparameter search very significantly reduces this fine-tuning time -- from 450 days to less than 2 days on average. Furthermore, while previous works on vocabulary adaptation are often primarily tied to single PLMs, MEDVOC is designed to be deployable across multiple PLMs (with varying model vocabulary sizes, pre-training objectives, and model sizes) -- bridging the limited vocabulary overlap between the biomedical literature domain and PLMs. MEDVOC outperforms baselines by 15.74% in terms of Rouge-L in zero-shot setting and shows gains of 17.29% in high Out-Of-Vocabulary (OOV) concentrations. Our human evaluation shows MEDVOC generates more faithful medical summaries (88% compared to 59% in baselines). We make the codebase publicly available at this https URL.
https://arxiv.org/abs/2405.04163
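MEDVOC selects vocabulary based on a fragment score computed over the downstream task's reference summaries. A plausible minimal version of that signal, sketched here, is the average number of subword pieces a tokenizer needs per word of the references: the higher the score, the worse the vocabulary fits the target domain. The exact definition in the paper may differ, and the example reference text is a placeholder.

```python
from transformers import AutoTokenizer

def fragment_score(tokenizer, reference_summaries):
    words, pieces = 0, 0
    for summary in reference_summaries:
        for word in summary.split():
            words += 1
            pieces += len(tokenizer.tokenize(word))
    return pieces / max(words, 1)

refs = ["Bilateral pleural effusions with bibasilar atelectasis."]
for name in ["facebook/bart-base", "google/pegasus-xsum"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, round(fragment_score(tok, refs), 2))
```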
In this research, we use the DistilBERT model to generate extractive summaries and the T5 model to generate abstractive summaries. We also generate hybrid summaries by combining the DistilBERT and T5 models. Central to our research is the implementation of a GPT-based refining process to minimize hallucinations, a common problem in AI-generated summaries. We evaluate the unrefined summaries and, after refining, assess the refined summaries using a range of traditional and novel metrics, demonstrating marked improvements in the accuracy and reliability of the summaries. Results highlight significant reductions in hallucinatory content, thereby increasing the factual integrity of the summaries.
https://arxiv.org/abs/2405.04039
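A sketch of the hybrid pipeline described above: a DistilBERT-based extractive pass picks the most central sentences, T5 rewrites them abstractively, and a GPT-style refinement prompt is then applied to the draft. The model choices and the centroid-ranking heuristic are stand-ins for the paper's exact configuration, and the GPT call itself is left as a prompt placeholder.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

def hybrid_summary(sentences, top_k=3):
    # Extractive pass: rank sentences by similarity to the document centroid.
    encoder = SentenceTransformer("distilbert-base-nli-stsb-mean-tokens")  # DistilBERT backbone
    emb = encoder.encode(sentences, convert_to_tensor=True)
    centrality = util.cos_sim(emb, emb.mean(dim=0)).squeeze(-1)
    picked = sorted(centrality.topk(min(top_k, len(sentences))).indices.tolist())
    extract = [sentences[i] for i in picked]

    # Abstractive pass over the extracted sentences.
    t5 = pipeline("summarization", model="t5-small")
    draft = t5(" ".join(extract), max_length=60, min_length=10)[0]["summary_text"]

    # Refinement prompt to be sent to a GPT-style model to reduce hallucinations.
    refine_prompt = (
        "Rewrite the summary below so that it only states facts supported by the source sentences.\n"
        f"Source: {' '.join(extract)}\nSummary: {draft}\nRefined summary:"
    )
    return draft, refine_prompt
```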
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up to 70% sparsity. We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning method and sparse pretraining of those models on a subset of the SlimPajama dataset mixed with a Python subset of The Stack dataset. We exhibit training acceleration due to sparsity on Cerebras CS-3 chips that closely matches theoretical scaling. In addition, we establish inference acceleration of up to 3x on CPUs by utilizing Neural Magic's DeepSparse engine and 1.7x on GPUs through Neural Magic's nm-vllm engine. The above gains are realized via sparsity alone, thus enabling further gains through additional use of quantization. Specifically, we show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x. We demonstrate these results across diverse, challenging tasks, including chat, instruction following, code generation, arithmetic reasoning, and summarization to prove their generality. This work paves the way for rapidly creating smaller and faster LLMs without sacrificing accuracy.
https://arxiv.org/abs/2405.03594
With the deluge of information delivered by the daily news cycle, there is a growing need to effectively and efficiently summarize news feeds for quick consumption. We leverage large language models (LLMs), with their advanced learning and generative abilities as compared to conventional language models, to generate concise and coherent summaries for news articles from the XSum dataset. Our paper focuses on two key aspects of LLMs: Efficient in-context Learning (ELearn) and Parameter Efficient Fine-tuning (EFit). Under ELearn, we find that increasing the number of shots in prompts and utilizing simple templates generally improve the quality of summaries. We also find that utilizing relevant examples in few-shot learning for ELearn does not improve model performance. In addition, we studied EFit using different methods and demonstrate that fine-tuning the first layer of LLMs produces better outcomes as compared to fine-tuning other layers or utilizing LoRA. We also find that leveraging more relevant training samples using selective layers does not result in better performance. By combining ELearn and EFit, we create a new model (ELearnFit) that leverages the benefits of both few-shot learning and fine-tuning and produces superior performance to either model alone. We also use ELearnFit to highlight the trade-offs between prompting and fine-tuning, especially for situations where only a limited number of annotated samples are available. Ultimately, our research provides practical techniques to optimize news summarization during the prompting and fine-tuning stages and enhances the synthesis of news articles.
https://arxiv.org/abs/2405.02710
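The ELearn finding above is mostly about prompt construction: more shots and a simple template tend to help, while carefully retrieved "relevant" shots do not. Below is a minimal sketch of such a simple k-shot prompt for XSum-style one-sentence summaries; the template wording and example pairs are illustrative.

```python
def build_fewshot_prompt(examples, article, k=4):
    """examples: list of (document, one_sentence_summary) pairs taken from the training set."""
    parts = [f"Article: {doc}\nSummary: {summary}" for doc, summary in examples[:k]]
    parts.append(f"Article: {article}\nSummary:")
    return "\n\n".join(parts)

shots = [
    ("The council approved the new cycle lanes after a year of consultation.",
     "A city council has approved new cycle lanes."),
    ("Heavy rain closed two roads in the valley overnight.",
     "Flooding has closed roads in the valley."),
]
print(build_fewshot_prompt(shots, "The library will extend its opening hours from May."))
```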
While automatic summarization techniques have made significant advancements, their primary focus has been on summarizing short news articles or documents that have clear structural patterns, such as scientific articles or government reports. There has not been much exploration into developing efficient methods for summarizing financial documents, which often contain complex facts and figures. Here, we study the problem of bullet-point summarization of long Earnings Call Transcripts (ECTs) using the recently released ECTSum dataset. We leverage an unsupervised question-based extractive module followed by a parameter-efficient instruction-tuned abstractive module to solve this task. Our proposed model FLAN-FinBPS achieves new state-of-the-art performance, outperforming the strongest baseline with a 14.88% average ROUGE score gain, and is capable of generating factually consistent bullet-point summaries that capture the important facts discussed in the ECTs.
https://arxiv.org/abs/2405.06669
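A sketch of the two-stage idea described above: an unsupervised, question-driven extractive pass pulls out ECT sentences that answer a few analyst-style questions, and an instruction-tuned model then turns them into bullet points. The question list, model names, and prompt are illustrative stand-ins, not the paper's exact components.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

QUESTIONS = [
    "What was the revenue this quarter?",
    "How did profit or margins change?",
    "What guidance was given for next quarter?",
]

def bullet_point_summary(ect_sentences, per_question=2):
    # Unsupervised extractive pass: pick the sentences most similar to each question.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    sent_emb = encoder.encode(ect_sentences, convert_to_tensor=True)
    picked = []
    for q in QUESTIONS:
        sims = util.cos_sim(encoder.encode(q, convert_to_tensor=True), sent_emb)[0]
        picked += [ect_sentences[i] for i in sims.topk(min(per_question, len(ect_sentences))).indices]

    # Instruction-tuned abstractive pass over the de-duplicated extracted sentences.
    generator = pipeline("text2text-generation", model="google/flan-t5-base")
    prompt = ("Summarize the following earnings call excerpts as short bullet points:\n"
              + "\n".join(dict.fromkeys(picked)))
    return generator(prompt, max_length=128)[0]["generated_text"]
```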