Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, alongside traditional machine learning methods, we developed a nuanced approach to identifying homophobic content on X/Twitter. This research is pivotal given the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance, underscoring the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance, and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content, improving on previous research results, but also lays the groundwork for future advancements in hate speech analysis.
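For illustration, a traditional machine learning baseline of the kind the abstract compares against BERT can be sketched as a pure-Python multinomial Naive Bayes classifier over bag-of-words features; the toy posts and labels below are invented and are not drawn from the released dataset:

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels):
    """Multinomial Naive Bayes over bag-of-words with add-one smoothing."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    label_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, doc):
    """Pick the label maximising log prior + summed log likelihoods."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_lp = None, -math.inf
    for label, count in label_counts.items():
        lp = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in doc.lower().split():
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

# Invented toy posts, not drawn from the released dataset.
docs = ["you people are disgusting", "love wins today",
        "so proud of the community", "they should be banned"]
labels = ["hateful", "benign", "benign", "hateful"]
model = train_nb(docs, labels)
print(predict_nb(model, "love and pride today"))  # -> benign
```

A BERT model would replace the bag-of-words likelihoods with contextual representations, which is what gives it the edge on nuanced cases the abstract describes.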
https://arxiv.org/abs/2405.09221
Stickers are increasingly used on social media to express sentiment and intent; when typing feels troublesome, people often send a sticker instead. Despite the significant impact of stickers on sentiment analysis and intent recognition, little research has examined them. To address this gap, we propose a new task: Multimodal chat Sentiment Analysis and Intent Recognition involving Stickers (MSAIRS). Additionally, we introduce a novel multimodal dataset containing Chinese chat records and stickers excerpted from several mainstream social media platforms. Our dataset includes paired data with the same text but different stickers, and various stickers consisting of the same images with different texts, allowing us to better understand the impact of stickers on chat sentiment and intent. We also propose an effective multimodal joint model, MMSAIR, for our task; validation on our dataset shows that the visual information in stickers matters. Our dataset and code will be publicly available.
https://arxiv.org/abs/2405.08427
We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions: the absence of structure for identified topics, non-standard aspect names, and the lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, generates labelled data via a semantic-similarity heuristic, and employs a multi-task insight extraction architecture obtained by fine-tuning an LLM. InsightNet identifies granular, actionable topics together with customer sentiment and supporting verbatims for each topic. Evaluations on real-world customer review data show that InsightNet outperforms existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet surpasses the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, an 11% improvement over the previous best results. Additionally, InsightNet generalises well to unseen aspects and suggests new topics to be added to the taxonomy.
https://arxiv.org/abs/2405.07195
As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English-language news articles from the BBC and the New York Times, combined with Google Trends data spanning 2018 to 2022, to explore how public interest in geoengineering fluctuates in response to news coverage of broader climate issues. Using BERT-based topic modeling, sentiment analysis, and time-series regression models, we found that positive sentiment in energy-related news serves as a good predictor of heightened public interest in geoengineering, a trend that persists over time. Our findings suggest that public engagement with geoengineering and climate action is not uniform, with some topics being more potent in shaping interest over time, such as climate news related to energy, disasters, and politics. Understanding these patterns is crucial for scientists, policymakers, and educators aiming to craft effective strategies for engaging with the public and fostering dialogue around emerging climate technologies.
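The time-series step described above — testing whether earlier news sentiment predicts later search interest — reduces to a lagged regression; a minimal one-predictor OLS sketch over invented weekly values (not the study's actual data or model specification):

```python
def lagged_slope(x, y, lag=1):
    """OLS slope of y_t regressed on x_(t-lag):
    does earlier sentiment predict later interest?"""
    xs, ys = x[:-lag], y[lag:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# Invented weekly series: positive-sentiment share of energy news
# vs. a search-interest index of the Google Trends kind.
sentiment = [0.1, 0.4, 0.2, 0.6, 0.5, 0.8]
interest = [10, 12, 18, 14, 22, 21]
print(round(lagged_slope(sentiment, interest, lag=1), 2))  # -> 20.81
```

A positive slope at a positive lag is the pattern the study reports: sentiment this period moving with interest the next.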
https://arxiv.org/abs/2405.07010
The chess domain is well-suited for creating an artificial intelligence (AI) system that mimics real-world challenges, including decision-making. Throughout the years, minimal attention has been paid to investigating insights derived from unstructured chess data sources. In this study, we examine the complicated relationships between multiple referenced moves in a chess-teaching textbook, and propose a novel method designed to encapsulate chess knowledge derived from move-action phrases. This study investigates the feasibility of using a modified sentiment analysis method as a means for evaluating chess moves based on text. Our proposed Aspect-Based Sentiment Analysis (ABSA) method represents an advancement in evaluating the sentiment associated with referenced chess moves. By extracting insights from move-action phrases, our approach aims to provide a more fine-grained and contextually aware `chess move'-based sentiment classification. Through empirical experiments and analysis, we evaluate the performance of our fine-tuned ABSA model, presenting results that confirm the efficiency of our approach in advancing aspect-based sentiment classification within the chess domain. This research contributes to the area of game-playing by machines and shows the practical applicability of leveraging NLP techniques to understand the context of strategic games.
https://arxiv.org/abs/2405.06499
Generative approaches have significantly influenced Aspect-Based Sentiment Analysis (ABSA), garnering considerable attention. However, existing studies often predict target text components monolithically, neglecting the benefits of utilizing single elements for tuple prediction. In this paper, we introduce Element to Tuple Prompting (E2TP), employing a two-step architecture. The former step focuses on predicting single elements, while the latter step completes the process by mapping these predicted elements to their corresponding tuples. E2TP is inspired by human problem-solving, breaking down tasks into manageable parts, using the first step's output as a guide in the second step. Within this strategy, three types of paradigms, namely E2TP($diet$), E2TP($f_1$), and E2TP($f_2$), are designed to facilitate the training process. Beyond in-domain task-specific experiments, our paper addresses cross-domain scenarios, demonstrating the effectiveness and generalizability of the approach. By conducting a comprehensive analysis on various benchmarks, we show that E2TP achieves new state-of-the-art results in nearly all cases.
https://arxiv.org/abs/2405.06454
In this paper, we introduce SaudiBERT, a monodialect Arabic language model pretrained exclusively on Saudi dialectal text. To demonstrate the model's effectiveness, we compared SaudiBERT with six different multidialect Arabic language models across 11 evaluation datasets, which are divided into two groups: sentiment analysis and text classification. SaudiBERT achieved average F1-scores of 86.15\% and 87.86\% in these groups respectively, significantly outperforming all other comparative models. Additionally, we present two novel Saudi dialectal corpora: the Saudi Tweets Mega Corpus (STMC), which contains over 141 million tweets in Saudi dialect, and the Saudi Forums Corpus (SFC), which includes 15.2 GB of text collected from five Saudi online forums. Both corpora are used in pretraining the proposed model, and they are the largest Saudi dialectal corpora ever reported in the literature. The results confirm the effectiveness of SaudiBERT in understanding and analyzing Arabic text expressed in Saudi dialect, achieving state-of-the-art results in most tasks and surpassing the other language models included in the study. The SaudiBERT model is publicly available at \url{this https URL}.
https://arxiv.org/abs/2405.06239
Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis that aims to extract aspects and predict their sentiments. Most existing studies focus on improving performance on the target domain by fine-tuning domain-specific models (trained on source domains) on the target-domain dataset. Few works propose continual learning tasks for ABSA, which aim to acquire the target domain's ability while retaining the abilities of previously seen domains. In this paper, we propose a Large Language Model-based Continual Learning (\texttt{LLM-CL}) model for ABSA. First, we design a domain knowledge decoupling module that learns a domain-invariant adapter and separate domain-variant adapters under an orthogonal constraint. Then, we introduce a domain knowledge warmup strategy to align the representations of domain-invariant and domain-variant knowledge. In the test phase, we index the corresponding domain-variant knowledge via domain positioning, so each sample's domain ID is not required. Extensive experiments over 19 datasets indicate that our \texttt{LLM-CL} model obtains new state-of-the-art performance.
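An orthogonal constraint between adapters, as named in the abstract, is commonly implemented as a Frobenius-norm penalty on the product of the two parameter matrices; a toy pure-Python sketch of that penalty (illustrative only, not the paper's code):

```python
def matmul_T(a, b):
    """a^T @ b for matrices stored as lists of rows."""
    rows, inner, cols = len(a[0]), len(a), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def orthogonality_penalty(invariant, variant):
    """Squared Frobenius norm of invariant^T @ variant: zero when the
    two adapters' column spaces are mutually orthogonal."""
    prod = matmul_T(invariant, variant)
    return sum(x * x for row in prod for x in row)

# Toy 2x2 parameter matrices (invented for illustration).
invariant = [[1.0, 0.0], [0.0, 1.0]]
variant_orthogonal = [[0.0, 0.0], [0.0, 0.0]]
variant_overlapping = [[1.0, 0.0], [0.0, 0.0]]
print(orthogonality_penalty(invariant, variant_orthogonal))   # -> 0.0
print(orthogonality_penalty(invariant, variant_overlapping))  # -> 1.0
```

Adding such a penalty to the training loss pushes domain-variant adapters away from the directions the domain-invariant adapter already covers.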
https://arxiv.org/abs/2405.05496
Online commerce relies heavily on user-generated reviews to provide unbiased information about products that buyers have not physically seen. The importance of reviews has attracted a range of exploitative online behaviours, creating a need for methods to monitor and detect reviews. We present a machine learning methodology for review detection and extraction, and demonstrate that it generalises to websites not contained in the training data. This method promises to drive applications for the automatic detection and evaluation of reviews, regardless of their source. Furthermore, we showcase the versatility of our method by implementing and discussing three key applications for analysing reviews: Sentiment Inconsistency Analysis, which detects and filters out unreliable reviews based on inconsistencies between ratings and comments; multi-language support, enabling the extraction and translation of reviews from various languages without relying on HTML scraping; and fake review detection, achieved by integrating a trained NLP model to distinguish between genuine and fake reviews.
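Sentiment Inconsistency Analysis can be sketched as flagging reviews whose star rating contradicts the polarity of the comment text; the lexicon, threshold, and examples below are illustrative stand-ins for a trained sentiment model, not the paper's implementation:

```python
POSITIVE = {"great", "excellent", "love", "perfect", "good"}
NEGATIVE = {"broken", "terrible", "awful", "refund", "bad"}

def comment_polarity(text):
    """Crude lexicon polarity in [-1, 1]; a stand-in for a trained model."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos == neg else (pos - neg) / (pos + neg)

def is_inconsistent(stars, text, threshold=0.5):
    """Flag a high rating paired with clearly negative text, or vice versa."""
    polarity = comment_polarity(text)
    if stars >= 4 and polarity <= -threshold:
        return True
    if stars <= 2 and polarity >= threshold:
        return True
    return False

print(is_inconsistent(5, "Arrived broken, terrible quality, want a refund"))  # -> True
print(is_inconsistent(5, "Excellent product, love it"))                       # -> False
```

Flagged reviews can then be dropped or down-weighted before any aggregate rating is computed.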
https://arxiv.org/abs/2405.06704
Recent social media posts on the cholera outbreak in Hammanskraal have highlighted the diverse range of emotions people experienced in response to such an event. The extent of people's opinions varies greatly depending on their level of knowledge and information about the disease. The documented research about cholera lacks investigations into the classification of emotions. This study aims to examine the emotions expressed in social media posts about cholera. A dataset of 23,000 posts was extracted and pre-processed. The Python Natural Language Toolkit (NLTK) sentiment analyzer library was applied to determine the emotional significance of each text. Additionally, machine learning (ML) models were applied for emotion classification, including long short-term memory (LSTM), logistic regression, decision trees, and the Bidirectional Encoder Representations from Transformers (BERT) model. The results of this study demonstrated that LSTM achieved the highest accuracy of 75%. Emotion classification presents a promising tool for gaining a deeper understanding of the impact of cholera on society. The findings of this study might contribute to the development of effective interventions in public health strategies.
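NLTK's sentiment analyzer (VADER) returns a compound polarity score that is then thresholded into labels; the sketch below reproduces that output shape with a tiny invented lexicon, using VADER's actual normalisation form x / sqrt(x^2 + 15), so it runs without the lexicon download — it is a stand-in, not the real `SentimentIntensityAnalyzer`:

```python
import math

# Tiny invented lexicon; the real analyzer ships with the full VADER lexicon.
LEXICON = {"outbreak": -1.5, "fear": -2.0, "safe": 1.5,
           "recovered": 1.8, "death": -3.0, "hope": 2.0}

def polarity_scores(text):
    """VADER-style output shape: raw lexicon sum squashed into [-1, 1]."""
    total = sum(LEXICON.get(t.strip(".,!?").lower(), 0.0) for t in text.split())
    return {"compound": round(total / math.sqrt(total * total + 15), 4)}

def label(text, pos=0.05, neg=-0.05):
    """Threshold the compound score into coarse labels, as commonly done with VADER."""
    c = polarity_scores(text)["compound"]
    return "positive" if c >= pos else "negative" if c <= neg else "neutral"

posts = ["The outbreak brings fear and death",
         "Patients recovered, there is hope",
         "Water tanker schedule for Tuesday"]
print([label(p) for p in posts])  # -> ['negative', 'positive', 'neutral']
```

With the real library, `SentimentIntensityAnalyzer().polarity_scores(text)` yields the same `compound` key (plus `neg`/`neu`/`pos`), and the same thresholding step applies.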
https://arxiv.org/abs/2405.04897
Inspired by the 'Bias Considerations in Bilingual Natural Language Processing' report by Statistics Canada, this study delves into potential biases in multilingual sentiment analysis between English and French. Given a 50-50 dataset of French and English, we aim to determine if there exists a language bias and explore how the incorporation of more diverse datasets in the future might affect the equity of multilingual Natural Language Processing (NLP) systems. By employing Support Vector Machine (SVM) and Naive Bayes models on three balanced datasets, we reveal potential biases in multilingual sentiment classification. Utilizing Fairlearn, a tool for assessing bias in machine learning models, our findings indicate nuanced outcomes: French data outperforms English across accuracy, recall, and F1-score in both models, hinting at a language bias favoring French. However, Fairlearn's metrics suggest that the SVM approaches equitable levels, with demographic parity ratios of 0.963, 0.989, and 0.985 for the three separate datasets, indicating near-equitable treatment across languages. In contrast, Naive Bayes demonstrates greater disparities, evidenced by demographic parity ratios of 0.813, 0.908, and 0.961. These findings reveal the importance of developing equitable multilingual NLP systems, particularly as we anticipate the inclusion of more datasets in various languages in the future.
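The demographic parity ratio reported above is the minimum selection rate across groups divided by the maximum (1.0 means perfect parity), which is the definition behind `fairlearn.metrics.demographic_parity_ratio`; a self-contained sketch with invented predictions:

```python
def demographic_parity_ratio(y_pred, groups):
    """Minimum selection rate across groups divided by the maximum
    (1.0 = perfect parity), mirroring the definition used by
    fairlearn.metrics.demographic_parity_ratio."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)  # fraction predicted positive
    return min(rates.values()) / max(rates.values())

# Invented predictions (1 = positive sentiment), tagged by review language.
y_pred = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
groups = ["fr", "fr", "fr", "fr", "fr", "en", "en", "en", "en", "en"]
print(demographic_parity_ratio(y_pred, groups))  # -> 0.25
```

A ratio near the study's 0.963-0.989 range means the two languages receive positive predictions at nearly the same rate; the 0.25 here illustrates a strongly disparate case.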
https://arxiv.org/abs/2405.06692
Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All code is publicly available.
https://arxiv.org/abs/2405.04128
The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature. Yet, resource constraints and inadequate documentation can make running replications particularly challenging. Our work explores the potential of using downstream citation contexts as a signal of reproducibility. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges in order to interpret the positive or negative outcomes of reproduction attempts. Our contributions include training classifiers for reproducibility-related contexts and sentiment analysis, and exploring correlations between citation context sentiment and reproducibility scores. Study data, software, and an artifact appendix are publicly available at this https URL.
https://arxiv.org/abs/2405.03977
Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities, an issue few studies have considered. To address the problem, this study proposes the first domain-specific LLM and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs according to their relationship to the physical and social impacts of earthquakes, and a dataset comprising 7282 earthquake-related microblogs from twenty earthquakes in different locations is developed. Then, with a systematic analysis of various influential factors, QuakeBERT, a domain-specific large language model (LLM), is developed and fine-tuned for accurate classification and filtering of microblogs. Meanwhile, an integrated method combining public opinion trend analysis, sentiment analysis, and keyword-based physical impact quantification is introduced to assess both the physical and social impacts of earthquakes from social media texts. Experiments show that data diversity and data volume dominate the performance of QuakeBERT, increasing the macro average F1 score by 27%, and that the best classification model, QuakeBERT, outperforms CNN- and RNN-based models by improving the macro average F1 score from 60.87% to 84.33%. Finally, the proposed approach is applied to assess two earthquakes with the same magnitude and focal depth. Results show that the proposed approach can effectively enhance the impact assessment process through accurate detection of noisy microblogs, enabling effective post-disaster emergency responses to create more resilient cities.
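The macro average F1 used to score the classifiers above is the unweighted mean of per-class F1 scores; a minimal sketch with invented labels (not the study's data):

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

# Invented labels: microblogs tagged as earthquake impact vs. noise.
y_true = ["impact", "impact", "noise", "noise", "noise"]
y_pred = ["impact", "noise", "noise", "noise", "impact"]
print(round(macro_f1(y_true, y_pred), 3))  # -> 0.583
```

Because every class counts equally regardless of frequency, macro F1 rewards a model that handles the rarer impact-related classes well, which is why it suits noisy, imbalanced microblog filtering.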
https://arxiv.org/abs/2405.06684
This study delves into the relationship between emotional trends from X platform data and the market dynamics of the well-known cryptocurrencies Cardano, Binance, Fantom, Matic, and Ripple over the period from October 2022 to March 2023. Leveraging SenticNet, we identified emotions like Fear and Anxiety, Rage and Anger, Grief and Sadness, Delight and Pleasantness, Enthusiasm and Eagerness, and Delight and Joy. Following data extraction, we segmented each month into bi-weekly intervals, replicating this process for price data obtained from Finance-Yahoo. A comparative analysis was then conducted, establishing connections between emotional trends observed across bi-weekly intervals and cryptocurrency prices, uncovering significant correlations between emotional sentiments and coin valuations.
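The comparative step amounts to correlating bi-weekly emotion series with price series; a Pearson-correlation sketch over invented values (the study's actual data and significance testing are not reproduced here):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

# Invented bi-weekly series: share of joyful tweets vs. a coin's closing price.
joy_share = [0.10, 0.15, 0.12, 0.20, 0.25, 0.22]
price = [0.31, 0.35, 0.33, 0.40, 0.46, 0.44]
print(round(pearson(joy_share, price), 2))  # strongly positive for this toy pair
```

A coefficient near +1 or -1 across intervals is the kind of emotion-price connection the abstract reports.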
https://arxiv.org/abs/2405.03084
People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone. Despite this linguistic diversity, research on Sentiment Analysis has predominantly focused on English text data, resulting in a disproportionate availability of sentiment resources for English. This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages. We also discuss the shortcomings and potential for future work towards the end.
https://arxiv.org/abs/2405.02887
Sentiment analysis is one of the most widely used techniques in text analysis. Recent advancements with Large Language Models have made it more accurate and accessible than ever, allowing researchers to classify text with only a plain English prompt. However, "sentiment" entails a wide variety of concepts depending on the domain and tools used. It has been used to mean emotion, opinions, market movements, or simply a general "good-bad" dimension. This raises a question: What exactly are language models doing when prompted to label documents by sentiment? This paper first overviews how sentiment is defined across different contexts, highlighting that it is a confounded measurement construct in that it entails multiple variables, such as emotional valence and opinion, without disentangling them. I then test three language models across two datasets with prompts requesting sentiment, valence, and stance classification. I find that sentiment labels most strongly correlate with valence labels. I further find that classification improves when researchers more precisely specify their dimension of interest rather than using the less well-defined concept of sentiment. I conclude by encouraging researchers to move beyond "sentiment" when feasible and use a more precise measurement construct.
https://arxiv.org/abs/2405.02454
Language technologies have made enormous progress, especially with the introduction of large language models (LLMs). On traditional tasks such as machine translation and sentiment analysis, these models perform at near-human level. These advances can, however, exacerbate a variety of issues that models have traditionally struggled with, such as bias, evaluation, and risks. In this position paper, we argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates, which we call social awareness. While NLP is getting better at solving the formal linguistic aspects, limited progress has been made in adding the social awareness required for language applications to work in all situations for all users. Integrating social awareness into NLP models will make applications more natural, helpful, and safe, and will open up new possibilities. Thus we argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.
https://arxiv.org/abs/2405.02411
Sentiment or mood can express itself on various levels in music. In automatic analysis, the actual audio data is usually analyzed, but the lyrics can also play a crucial role in the perception of mood. We first evaluate various models for sentiment analysis based on lyrics and audio separately. The corresponding approaches already show satisfactory results, but they also exhibit weaknesses, whose causes we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (sometimes intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.
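Combining audio and lyrics predictions is often done by weighted late fusion of per-mood probabilities, which also makes audio-lyrics contradictions visible; a sketch with invented probabilities and an assumed fusion weight (not necessarily the paper's exact scheme):

```python
def fuse(audio_probs, lyrics_probs, w_audio=0.6):
    """Weighted late fusion of per-mood probabilities from the two modalities."""
    return {mood: w_audio * audio_probs[mood] + (1 - w_audio) * lyrics_probs[mood]
            for mood in audio_probs}

def top_mood(probs):
    """Mood with the highest fused probability."""
    return max(probs, key=probs.get)

# Invented case: upbeat audio with sad lyrics (an intentional contradiction).
audio = {"happy": 0.7, "sad": 0.2, "angry": 0.1}
lyrics = {"happy": 0.1, "sad": 0.8, "angry": 0.1}
print(top_mood(fuse(audio, lyrics)))               # audio-weighted -> happy
print(top_mood(fuse(audio, lyrics, w_audio=0.4)))  # lyrics-weighted -> sad
```

When the two modalities disagree this sharply, the fused label flips with the weight, which is exactly the kind of contradiction the misclassification analysis examines.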
https://arxiv.org/abs/2405.01988
The Internet has become an essential tool for people in the modern world. Humans, like all living organisms, have essential requirements for survival: access to atmospheric oxygen, potable water, protective shelter, and sustenance. The constant flux of the world is making our existence less complicated, and a significant portion of the population now utilizes online food ordering services to have meals delivered to their residences. Although there are numerous ways to order food, customers sometimes experience disappointment with the food they receive. Our endeavor was to establish a model that could determine whether food is of good or poor quality. We compiled an extensive dataset of over 1484 online reviews from prominent food ordering platforms, including Food Panda and HungryNaki. Leveraging the collected data, a rigorous assessment of various deep learning and machine learning techniques was performed to determine the most accurate approach for predicting food quality. Of all the algorithms evaluated, logistic regression emerged as the most accurate, achieving an impressive 90.91% accuracy. Such review-based predictions offer valuable insights that can guide users in deciding whether to order the food.
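A bag-of-words logistic regression of the kind that performed best here can be sketched in pure Python with per-sample gradient descent; the toy reviews, labels, and hyperparameters are invented for illustration:

```python
import math

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Logistic regression trained by minimising log loss sample by sample."""
    vocab = {t: i for i, t in enumerate(
        sorted({w for x in texts for w in x.lower().split()}))}
    X = [featurize(t, vocab) for t in texts]
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y  # gradient of log loss w.r.t. the logit
            b -= lr * g
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return vocab, w, b

def predict(model, text):
    """1 = good quality, 0 = poor quality."""
    vocab, w, b = model
    x = featurize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Invented toy reviews: 1 = good quality, 0 = poor quality.
texts = ["fresh and tasty food", "cold stale food",
         "tasty biryani fresh", "stale rice cold delivery"]
labels = [1, 0, 1, 0]
model = train_logreg(texts, labels)
print(predict(model, "fresh tasty delivery"))  # -> 1
```

In practice, a TF-IDF vectorizer and a regularized solver (as in scikit-learn) would replace this hand-rolled loop, but the decision rule is the same thresholded linear score.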
https://arxiv.org/abs/2405.06667