Recent advances in knowledge graph embedding (KGE) rely on Euclidean/hyperbolic orthogonal relation transformations to model intrinsic logical patterns and topological structures. However, existing approaches are confined to rigid relational orthogonalization with restricted dimension and homogeneous geometry, leading to deficient modeling capability. In this work, we move beyond these approaches in terms of both dimension and geometry by introducing a powerful framework named GoldE, which features a universal orthogonal parameterization based on a generalized form of Householder reflection. Such parameterization can naturally achieve dimensional extension and geometric unification with theoretical guarantees, enabling our framework to simultaneously capture crucial logical patterns and inherent topological heterogeneity of knowledge graphs. Empirically, GoldE achieves state-of-the-art performance on three standard benchmarks. Codes are available at this https URL.
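For reference, the classical Householder reflection that GoldE's parameterization generalizes maps any nonzero vector $v$ to an orthogonal matrix (the paper's generalized form, which extends this across dimensions and geometries, is not reproduced here):

```latex
H(v) \;=\; I - 2\,\frac{v v^{\top}}{v^{\top} v},
\qquad
H(v)^{\top} H(v) = I,
\qquad
H(v)\,v = -v .
```

By the Cartan–Dieudonné theorem, composing at most $d$ such reflections yields any element of the orthogonal group $O(d)$, which is why Householder-style parameterizations are a natural basis for learning orthogonal relation transformations.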
https://arxiv.org/abs/2405.08540
Traditional recommendation approaches, including content-based and collaborative filtering, usually focus on similarity between items or users. Existing approaches lack ways of introducing unexpectedness into recommendations, prioritizing globally popular items over exposing users to unforeseen items. This investigation aims to design and evaluate a novel layer on top of recommender systems suited to incorporate relational information and suggest items with a user-defined degree of surprise. We propose a Knowledge Graph (KG) based recommender system that encodes user interactions on item catalogs. Our study explores whether network-level metrics on KGs can influence the degree of surprise in recommendations. We hypothesize that surprisingness correlates with certain network metrics, treating user profiles as subgraphs within a larger catalog KG. The resulting solution reranks recommendations based on their impact on structural graph metrics, optimizing recommendations to reflect these metrics. We experimentally evaluate our approach on two datasets: LastFM listening histories and synthetic Netflix viewing profiles. We find that reranking items based on complex network metrics leads to a more unexpected and surprising composition of recommendation lists.
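The reranking idea can be sketched in a few lines: score each candidate item by how much adding it to the user's profile subgraph perturbs a structural metric. This is a hypothetical illustration (the function names, the adjacency-dict representation, and the choice of average clustering as the metric are all assumptions, not the authors' code):

```python
def clustering(adj, nodes):
    """Average clustering coefficient of the subgraph induced by `nodes`
    in the undirected adjacency-set dict `adj`."""
    total = 0.0
    for v in nodes:
        nbrs = [u for u in adj.get(v, ()) if u in nodes]
        k = len(nbrs)
        if k < 2:
            continue  # clustering undefined for degree < 2; count as 0
        links = sum(1 for i, a in enumerate(nbrs)
                    for b in nbrs[i + 1:] if b in adj.get(a, ()))
        total += 2.0 * links / (k * (k - 1))
    return total / len(nodes) if nodes else 0.0

def rerank_by_surprise(adj, profile, candidates):
    """Order candidates by how strongly adding each item perturbs a
    structural metric of the user's profile subgraph."""
    base = clustering(adj, profile)
    delta = {c: abs(clustering(adj, profile | {c}) - base) for c in candidates}
    return sorted(candidates, key=lambda c: delta[c], reverse=True)
```

Any other graph-level metric (density, assortativity, path lengths) slots into the same scheme by swapping the `clustering` callable.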
https://arxiv.org/abs/2405.08465
Grasp generation for dexterous hands often requires a large number of grasping annotations, especially for functional grasping, which requires the grasp pose to be convenient for the subsequent use of the object. However, annotating high-DoF dexterous hand poses is rather challenging. This prompts us to explore how people achieve manipulation of new objects based on past grasp experiences. We find that people are adept at discovering and leveraging various similarities between objects when grasping new items, including shape, layout, and grasp type. In light of this, we analyze and collect grasp-related similarity relationships among 51 common tool-like object categories and annotate semantic grasp representations for 1768 objects. These data are organized into a knowledge graph, which supports our proposed cross-category functional grasp synthesis. Through extensive experiments, we demonstrate that this grasp-related knowledge indeed contributes to achieving functional grasp transfer across unknown or entirely new categories of objects. We will publicly release the dataset and code to facilitate future research.
https://arxiv.org/abs/2405.08310
Geospatial knowledge graphs have emerged as a novel paradigm for representing and reasoning over geospatial information. In this framework, entities such as places, people, events, and observations are depicted as nodes, while their relationships are represented as edges. This graph-based data format lays the foundation for creating a "FAIR" (Findable, Accessible, Interoperable, and Reusable) environment, facilitating the management and analysis of geographic information. This entry first introduces key concepts in knowledge graphs along with their associated standardization and tools. It then delves into the application of knowledge graphs in geography and environmental sciences, emphasizing their role in bridging symbolic and subsymbolic GeoAI to address cross-disciplinary geospatial challenges. At the end, new research directions related to geospatial knowledge graphs are outlined.
https://arxiv.org/abs/2405.07664
Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, the drug repurposing approaches proposed thus far have yet to meet expectations. It is therefore crucial to offer a systematic approach to drug repurposing that achieves cost savings and improves human lives. In recent years, biological network-based methods for drug repurposing have generated promising results. Nevertheless, these methods have limitations. Primarily, their scope is generally limited in the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which must either be handled directly or converted into homogeneous data, leading to a loss of information. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge at certain stages. Results: We propose a new solution, HGTDR (Heterogeneous Graph Transformer for Drug Repurposing), to address the challenges associated with drug repurposing. HGTDR is a three-step approach to knowledge graph-based drug repurposing: 1) constructing a heterogeneous knowledge graph, 2) utilizing a heterogeneous graph transformer network, and 3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method's top ten drug repurposing suggestions, which have exhibited promising results. We also demonstrate HGTDR's capability to predict other types of relations, such as drug-protein and disease-protein interactions, through numerical and experimental validation.
https://arxiv.org/abs/2405.08031
Large language models (LLMs) have demonstrated remarkable capabilities across various domains, although their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare. To address this issue, retrieving relevant facts from knowledge graphs (KGs) is considered a promising method. Existing KG-augmented approaches tend to be resource-intensive, requiring multiple rounds of retrieval and verification for each factoid, which impedes their application in real-world scenarios. In this study, we propose Self-Refinement-Enhanced Knowledge Graph Retrieval (Re-KGR) to augment the factuality of LLMs' responses with less retrieval effort in the medical field. Our approach leverages the attribution of next-token predictive probability distributions across different tokens and model layers to identify tokens with a high potential for hallucination, reducing verification rounds by refining only the knowledge triples associated with these tokens. Moreover, we rectify inaccurate content using retrieved knowledge in a post-processing stage, which improves the truthfulness of generated responses. Experimental results on a medical dataset demonstrate that our approach can enhance the factual capability of LLMs across various foundation models, as evidenced by the highest scores on truthfulness.
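The core filtering step can be illustrated with a minimal sketch. Re-KGR attributes probability distributions across tokens and model layers; the stand-in below keeps only a single log-probability threshold, and `triple_index` is a hypothetical data structure mapping tokens to the triples that mention them:

```python
import math

def flag_hallucination_prone(tokens, logprobs, threshold=math.log(0.5)):
    """Return tokens whose next-token predictive log-probability falls
    below `threshold` (a simplified stand-in for the paper's multi-layer
    attribution)."""
    return [t for t, lp in zip(tokens, logprobs) if lp < threshold]

def refine_triples(flagged, triple_index):
    """Keep only the knowledge triples tied to flagged tokens, shrinking
    the set that needs retrieval-based verification."""
    return {trip for t in flagged for trip in triple_index.get(t, ())}
```

Only the triples surviving `refine_triples` would then go through KG retrieval and the post-processing rectification described above.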
https://arxiv.org/abs/2405.06545
Although Large Language Models (LLMs) are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analysed the effects of different types of non-parametric knowledge, including textual passage and knowledge graphs (KGs). Since LLMs have probably seen the majority of factual question-answering datasets already, to facilitate our analysis, we proposed a fully automatic pipeline for creating a benchmark that requires knowledge of long-tail facts for answering the involved questions. Using this pipeline, we introduce the LTGen benchmark. We evaluate state-of-the-art LLMs in different knowledge settings using the proposed benchmark. Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required. Nonetheless, the performance of the same models improved significantly when they were prompted with non-parametric knowledge. We observed that, in most cases, prompting LLMs with KG triples surpasses passage-based prompting using a state-of-the-art retriever. In addition, while prompting LLMs with both KG triples and documents does not consistently improve knowledge coverage, it can dramatically reduce hallucinations in the generated content.
https://arxiv.org/abs/2405.06524
While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.
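For orientation, bounds of this kind instantiate the standard McAllester-style PAC-Bayesian template (the ReED-specific bound from the paper is not reproduced here): for a prior $P$ over hypotheses, any posterior $Q$, and an i.i.d. sample $S$ of size $m$, with probability at least $1-\delta$,

```latex
\mathbb{E}_{h \sim Q}\!\left[ L(h) \right]
\;\le\;
\mathbb{E}_{h \sim Q}\!\left[ \hat{L}_{S}(h) \right]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}} ,
```

where $L$ is the true risk and $\hat{L}_S$ the empirical risk. The KL term is where architectural choices such as parameter-sharing and weight normalization enter: they shrink the effective distance between posterior and prior, tightening the bound.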
https://arxiv.org/abs/2405.06418
Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models (LMs) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability. Some studies have explored combining LMs with Knowledge Graphs (KGs) by coarsely fusing the two modalities to perform Graph Neural Network (GNN)-based reasoning, which lacks a profound interaction between the heterogeneous modalities. In this paper, we propose a novel Graph-based Structure-Aware Prompt Learning Model for commonsense reasoning, named G-SAP, aiming to maintain a balance between heterogeneous knowledge and enhance the cross-modal interaction within the LM+GNNs model. In particular, an evidence graph is constructed by integrating multiple knowledge sources, i.e., ConceptNet, Wikipedia, and Cambridge Dictionary, to boost performance. Afterward, a structure-aware frozen PLM is employed to fully incorporate the structured and textual information from the evidence graph, where the generation of prompts is driven by graph entities and relations. Finally, a heterogeneous message-passing reasoning module is used to facilitate deep interaction of knowledge between the LM and graph-based networks. Empirical validation, conducted through extensive experiments on three benchmark datasets, demonstrates the notable performance of the proposed model. The results reveal a significant advancement over existing models, most notably a 6.12% improvement over the SoTA LM+GNNs model on the OpenbookQA dataset.
https://arxiv.org/abs/2405.05616
Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK (Dynamic Co-Augmentation of LLMs and KG) to address this limitation and demonstrate its ability in studying Alzheimer's Disease (AD), a specialized sub-field of biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage the LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at this https URL.
https://arxiv.org/abs/2405.04819
In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance. However, no current studies specifically address the shared knowledge within KGC. To bridge this gap, we introduce a multi-level Shared Knowledge Guided learning method (SKG) that operates at both the dataset and task levels. On the dataset level, SKG-KGC broadens the original dataset by identifying shared features within entity sets via text summarization. On the task level, for the three typical KGC subtasks - head entity prediction, relation prediction, and tail entity prediction - we present an innovative multi-task learning architecture with dynamically adjusted loss weights. This approach allows the model to focus on more challenging and underperforming tasks, effectively mitigating the imbalance of knowledge sharing among subtasks. Experimental results demonstrate that SKG-KGC outperforms existing text-based methods significantly on three well-known datasets, with the most notable improvement on WN18RR.
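One way to realize "dynamically adjusted loss weights" over the three subtasks is to reweight by current loss magnitude, so harder, underperforming tasks receive more gradient signal. This is a hypothetical weighting rule illustrating the idea; SKG-KGC's exact scheme may differ:

```python
import math

def dynamic_weights(losses, temperature=1.0):
    """Softmax-style weights over per-subtask losses: currently harder
    (higher-loss) subtasks get larger weight. `temperature` controls how
    sharply weight concentrates on the hardest task (an assumed knob)."""
    exps = {t: math.exp(l / temperature) for t, l in losses.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def combined_loss(losses, weights):
    """Weighted multi-task objective over head, relation, and tail prediction."""
    return sum(weights[t] * l for t, l in losses.items())
```

Recomputing the weights each epoch (or each step, from running averages) gives the dynamic behavior described in the abstract.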
https://arxiv.org/abs/2405.06696
Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which has a significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective they are for eliminating social biases. In this work, we propose a new methodology for attacking language models with knowledge graph augmented generation. We refactor natural language stereotypes into a knowledge graph, and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. This demonstrates the need for further research in AI safety, and further work in this new adversarial space.
https://arxiv.org/abs/2405.04756
Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types, as well as requiring expertise in model design and tuning. To address these limitations, we seek to utilize Large Language Models (LLMs), which have achieved enormous success on a broad range of tasks given their exceptional capabilities in both language understanding and zero-shot task fulfillment. Thus, we propose a fully automatic LLM-based framework for constructing attack knowledge graphs, named AttacKG+. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by instruction prompting and in-context learning empowered by LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation: a behavior graph, MITRE TTP labels, and a state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs of threat event analysis, 2) our construction framework is effective in faithfully and accurately extracting the information defined by AttacKG+, and 3) our attack graphs directly benefit downstream security practices such as attack reconstruction. All code and datasets will be released upon acceptance.
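The four-module chain can be sketched as a simple pipeline over any prompt-to-text callable. The prompt wording below is an illustrative placeholder, not the paper's actual prompts, and the function/key names are assumptions:

```python
def attackg_plus_pipeline(report, llm):
    """Chain the four LLM-prompted modules named in the abstract.
    `llm` is any callable mapping a prompt string to generated text."""
    rewritten = llm("Rewrite this CTI report into clear, ordered sentences:\n" + report)
    behavior  = llm("Extract (subject, action, object) triples:\n" + rewritten)
    ttp       = llm("Label each triple with a MITRE ATT&CK TTP:\n" + behavior)
    state     = llm("Summarize the attacker/system state after these steps:\n" + ttp)
    # One temporal step of the three-layer representation described above.
    return {"behavior_graph": behavior, "ttp_labels": ttp, "state_summary": state}
```

Because each stage consumes only the previous stage's text, the framework stays fully automatic: no per-report model tuning, just instruction prompting.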
https://arxiv.org/abs/2405.04753
Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG) with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while simultaneously preserving decent old knowledge. However, the explicit graph structure in KGs, which is critical for the above goal, has been heavily ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which makes full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on graph structure features. Second, to preserve the old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
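A generic embedding-distillation term of the kind this mechanism builds on penalizes drift of current entity representations away from those learned at the previous layer/step. This is a minimal sketch of that generic idea, not IncDE's exact formulation:

```python
def incremental_distillation_loss(prev_embs, curr_embs):
    """Mean squared distance pulling current entity embeddings toward the
    representations learned previously, so old knowledge is preserved
    while new triples are being fit. Inputs: parallel lists of vectors."""
    n = len(prev_embs)
    return sum((p - c) ** 2
               for pv, cv in zip(prev_embs, curr_embs)
               for p, c in zip(pv, cv)) / n
```

Added to the task loss with a trade-off coefficient, this term is zero when entity representations are unchanged and grows as new training pulls them away from their old positions.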
https://arxiv.org/abs/2405.04453
With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I addresses this challenge, organized as a competition. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results. Initially, we enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs optimized for scientific tasks, including SciBERT, SciNCL, and SPECTER2. We conduct hyperparameter tuning and investigate the impact of data augmentation from bibliographic databases such as OpenAlex, Semantic Scholar, and Crossref. Our results demonstrate that fine-tuning pre-trained models substantially enhances classification performance, with SPECTER2 emerging as the most accurate model. Moreover, enriching the dataset with additional metadata improves classification outcomes significantly, especially when integrating information from S2AG, OpenAlex and Crossref. Our best-performing approach achieves a weighted F1-score of 0.7415. Overall, our study contributes to the advancement of reliable automated systems for scholarly publication categorization, offering a potential solution to the laborious manual curation process, thereby facilitating researchers in efficiently locating relevant resources.
https://arxiv.org/abs/2405.04136
This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at this https URL under the MIT License.
https://arxiv.org/abs/2405.03932
Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance. We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction.
https://arxiv.org/abs/2405.03734
The rapid advancement in artificial intelligence (AI), particularly through deep neural networks, has catalyzed significant progress in fields such as vision and text processing. Nonetheless, the pursuit of AI systems that exhibit human-like reasoning and interpretability continues to pose a substantial challenge. The Neural-Symbolic paradigm, which integrates the deep learning prowess of neural networks with the reasoning capabilities of symbolic systems, presents a promising pathway toward developing more transparent and comprehensible AI systems. Within this paradigm, the Knowledge Graph (KG) emerges as a crucial element, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, predominantly utilizing the triple (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, elucidating how KG underpins this integration across three key categories: enhancing the reasoning and interpretability of neural networks through the incorporation of symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes directions for future research in the domain of Neural-Symbolic AI.
https://arxiv.org/abs/2405.03524
Image-based retrieval in large Earth observation archives is challenging because one needs to navigate across thousands of candidate matches only with the query image as a guide. By using text as information supporting the visual query, the retrieval system gains in usability, but at the same time faces difficulties due to the diversity of visual signals that cannot be summarized by a short caption only. For this reason, as a matching-based task, cross-modal text-image retrieval often suffers from information asymmetry between texts and images. To address this challenge, we propose a Knowledge-aware Text-Image Retrieval (KTIR) method for remote sensing images. By mining relevant information from an external knowledge graph, KTIR enriches the text scope available in the search query and alleviates the information gaps between texts and images for better matching. Moreover, by integrating domain-specific knowledge, KTIR also enhances the adaptation of pre-trained vision-language models to remote sensing applications. Experimental results on three commonly used remote sensing text-image retrieval benchmarks show that the proposed knowledge-aware method leads to varied and consistent retrievals, outperforming state-of-the-art retrieval methods.
https://arxiv.org/abs/2405.03373
Knowledge Graphs have been widely used to represent facts in a structured format. Due to their large-scale applications, knowledge graphs suffer from incompleteness. The relation prediction task addresses knowledge graph completion by assigning one or more possible relations to each pair of nodes. In this work, we make use of the knowledge graph node names to fine-tune a large language model for the relation prediction task. By utilizing node names only, we enable our model to operate effectively in inductive settings. Our experiments show that we achieve new scores on a widely used knowledge graph benchmark.
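The name-only setup can be sketched as a simple prompt/example formatter. Because the input contains entity *names* rather than learned entity IDs, a model fine-tuned on such examples can score relations for entities never seen at training time, which is what enables the inductive setting. The template below is a hypothetical illustration, not the paper's actual format:

```python
def relation_prompt(head_name, tail_name, relations):
    """Format a text-only relation-prediction example from entity names;
    the fine-tuned LLM completes the final line with its predicted relation."""
    options = ", ".join(relations)
    return (f"Head entity: {head_name}\n"
            f"Tail entity: {tail_name}\n"
            f"Possible relations: {options}\n"
            f"Relation:")
```

Pairing each such prompt with the gold relation string yields the supervised fine-tuning data for the task.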
https://arxiv.org/abs/2405.02738