[2023/09/25 ~ 10/01] Top ML Papers of the Week

(discuss.pytorch.kr)

5 points | posted by ninebow 2023-10-02 | 2 comments

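Overview

This post is a machine translation of the ML paper roundup that DAIR.AI publishes every week.
Most of this week's papers focus on LLMs (Large Language Models). They cover a range of topics, including improved algorithms for making LLM processing more efficient in various settings, Graph Neural Prompting for LLMs, and applying logical reasoning processes.
Some of the selected papers, such as "Boolformer" and "Vision Transformers Need Registers", also show a trend of research advancing by combining with other areas of AI.
This suggests that progress in AI comes not only from digging deeper into each field individually, but also from combining multiple fields to find new approaches and solutions.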

The Reversal Curse

Paper introduction

Finds that LLMs trained on sentences of the form “a is b” will not automatically generalize to the reverse direction “b is a”, i.e., the reversal curse; shows the effect by finetuning LLMs on fictitious statements and demonstrates its robustness across model sizes and model families. #llm-reasoning

Paper link

https://owainevans.github.io/reversal_curse.pdf

Further reading

https://x.com/OwainEvans_UK/status/1705285631520407821
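
To make the Reversal Curse setup concrete, here is a minimal sketch of how forward ("a is b") training sentences and reverse ("b is a") test questions could be built from fictitious facts. The names and descriptions are made-up placeholders rather than data from the paper, and `make_reversal_split` is a hypothetical helper.

```python
def make_reversal_split(facts):
    """Return (train_sentences, reverse_queries) from (name, description) pairs."""
    train, test = [], []
    for name, description in facts:
        # Forward direction, seen during finetuning: "a is b".
        train.append(f"{name} is {description}.")
        # Reverse direction, only asked at test time: "b is a".
        test.append({"prompt": f"Who is {description}?", "answer": name})
    return train, test


# Placeholder facts (invented for illustration only).
facts = [
    ("Alice Example", "the composer of the fictional symphony Night Tide"),
    ("Bob Sample", "the first person to map the imaginary island of Orlu"),
]
train, test = make_reversal_split(facts)
```

A model that suffers from the reversal curse would fit the `train` sentences yet fail to produce the `answer` for the `test` prompts.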

Effective Long-Context Scaling of Foundation Models

Paper introduction

Proposes a 70B variant that can already surpass gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks. This involves a cost-effective instruction tuning procedure that does not require human-annotated long instruction data. #1b-context-window #100k-context-window

Abstract

We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis on the individual components of our method. We delve into Llama's position encodings and discuss its limitation in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
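
The abstract mentions a limitation of Llama's position encodings without spelling out the details. As an illustration only, the sketch below shows how rotary position embeddings (the scheme Llama uses) depend on their base frequency. The two base values (10,000 and 500,000) and the head dimension of 128 are assumptions chosen for the example, not settings confirmed by the text above.

```python
import math


def rope_angles(position, dim, base):
    """Rotation angle of each 2-D feature pair at `position` for a `dim`-wide head."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def rotate_pair(x, y, angle):
    """Rotate one (x, y) feature pair by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c


# Angles at the last position of a 32,768-token window for two assumed bases.
small_base = rope_angles(position=32768, dim=128, base=10_000.0)
large_base = rope_angles(position=32768, dim=128, base=500_000.0)
```

With the larger base, every pair except the first rotates more slowly, which is one way to keep distant positions distinguishable.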

Paper link

https://arxiv.org/abs/2309.16039

Further reading

https://x.com/omarsar0/status/1707780482178400261

Graph Neural Prompting with Large Language Models

Paper introduction

Proposes a plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from knowledge graphs (KGs); the design includes a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. #knowledge-graph

Abstract

Large Language Models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs to enhance language modeling via joint training and customized model architectures, applying this to LLMs is problematic owing to their large number of parameters and high computational cost. In addition, how to leverage the pre-trained LLMs and avoid training a customized model from scratch remains an open question. In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. GNP encompasses various designs, including a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings.
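
The components listed in the abstract can be pictured as a small pipeline from a knowledge-graph subgraph to a prompt vector. This is a toy sketch in plain Python under strong simplifications: one round of mean aggregation stands in for the GNN encoder, plain mean pooling stands in for the cross-modality pooling module, and a fixed matrix stands in for the learned domain projector. The link prediction objective is omitted, and all function names are hypothetical.

```python
def gnn_layer(node_feats, edges):
    """One round of mean aggregation over each node and its neighbours."""
    neighbours = {i: [i] for i in range(len(node_feats))}
    for src, dst in edges:
        neighbours[dst].append(src)
        neighbours[src].append(dst)
    out = []
    for i in range(len(node_feats)):
        group = [node_feats[j] for j in neighbours[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def mean_pool(node_feats):
    """Collapse node embeddings into one graph embedding (stand-in for pooling)."""
    return [sum(col) / len(node_feats) for col in zip(*node_feats)]


def project(vec, weight):
    """Linear projector: maps the graph embedding to the LLM embedding size."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy entity features
edges = [(0, 1), (1, 2)]                        # toy KG edges
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 2-d graph space -> 3-d "LLM" space
graph_prompt = project(mean_pool(gnn_layer(nodes, edges)), weight)
```

In a real system `graph_prompt` would be prepended to the LLM's input embeddings as a soft prompt, with the LLM itself left frozen or lightly tuned.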

Paper link

https://arxiv.org/abs/2309.15427

Further reading

https://x.com/omarsar0/status/1707211751354212382

Vision Transformers Need Registers

Paper introduction

Identifies artifacts in the feature maps of vision transformer networks, corresponding to tokens that are repurposed for internal computations, and proposes providing additional tokens in the input sequence to fill that role. The solution fixes the problem, leads to smoother feature and attention maps, and sets new state-of-the-art results on dense visual prediction tasks. #vision-transformer #transformer

Abstract

Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role. We show that this solution fixes that problem entirely for both supervised and self-supervised models, sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and most importantly leads to smoother feature maps and attention maps for downstream visual processing.
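
The proposed fix amounts to a change in how the input sequence is assembled. The sketch below uses strings in place of token vectors and skips the transformer itself. Placing the extra tokens right after [CLS] and discarding their outputs afterwards are assumptions made for this illustration, since the abstract only says that additional tokens are provided to the input sequence.

```python
def add_registers(cls_token, patch_tokens, register_tokens):
    """Build the input sequence: [CLS] + registers + patches."""
    return [cls_token] + list(register_tokens) + list(patch_tokens)


def strip_registers(output_tokens, num_registers):
    """Drop register outputs, keeping the [CLS] output and the patch outputs."""
    cls_out = output_tokens[0]
    patch_out = output_tokens[1 + num_registers:]
    return cls_out, patch_out


cls = "CLS"
patches = [f"patch{i}" for i in range(4)]
registers = ["reg0", "reg1"]  # learnable vectors in a real model

seq = add_registers(cls, patches, registers)
# A real ViT would run its transformer blocks here; the identity stands in for them.
cls_out, patch_out = strip_registers(seq, num_registers=len(registers))
```

Downstream code sees the same number of patch outputs as before, so the registers add scratch space without changing the shape of the feature map.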

Paper link

https://arxiv.org/abs/2309.16588

Further reading

https://x.com/TimDarcet/status/1707769575981424866

Boolformer: Symbolic Regression of Logic Functions with Transformers

Paper introduction

Presents the first transformer architecture trained to perform end-to-end symbolic regression of Boolean functions; it can predict compact formulas for complex functions and can be applied to modeling the dynamics of gene regulatory networks. #transformer

Abstract

In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. Then, we demonstrate its ability to find approximate expressions when provided incomplete and noisy observations. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modelling the dynamics of gene regulatory networks. Using a recent benchmark, we show that Boolformer is competitive with state-of-the art genetic algorithms with a speedup of several orders of magnitude. Our code and models are available publicly.
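
To show what symbolic regression of a Boolean function is asked to do, the sketch below scores a candidate formula against a truth table, first in the clean setting and then with incomplete, noisy observations. It does not involve a transformer. The multiplexer target and the hand-written candidate are illustrative assumptions standing in for a model's prediction.

```python
from itertools import product


def truth_table(fn, num_inputs):
    """Enumerate every input combination together with the function's output."""
    return [(bits, fn(*bits)) for bits in product([False, True], repeat=num_inputs)]


def accuracy(formula, observations):
    """Fraction of observations that a candidate formula reproduces."""
    hits = sum(formula(*bits) == out for bits, out in observations)
    return hits / len(observations)


# Target: a 3-input multiplexer, "if s then a else b".
target = lambda s, a, b: a if s else b
# Candidate compact formula of the kind a model might predict.
candidate = lambda s, a, b: (s and a) or (not s and b)

table = truth_table(target, 3)
clean_score = accuracy(candidate, table)

# Noisy, incomplete setting: keep 6 of the 8 rows and flip one label.
noisy = table[:6]
noisy[0] = (noisy[0][0], not noisy[0][1])
noisy_score = accuracy(candidate, noisy)
```

Here the candidate matches the clean table exactly, while the flipped label lowers its score on the noisy subset, which is why the noisy setting calls for approximate rather than exact expressions.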

Paper link

https://arxiv.org/abs/2309.12207

Further reading

https://x.com/stephanedascoli/status/1706235856778834015

Aligning Large Multimodal Models with Factually Augmented RLHF

Paper introduction

Adapts factually augmented RLHF to aligning large multimodal models; this approach alleviates reward hacking in RLHF and reaches 94% of the performance level of the text-only GPT-4 on the LLaVA-Bench dataset. #llm-alignment #multimodal #rlhf

Abstract

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context. To address the multimodal misalignment issue, we adapt the Reinforcement Learning from Human Feedback (RLHF) from the text domain to the task of vision-language alignment, where human annotators are asked to compare two responses and pinpoint the more hallucinated one, and the vision-language model is trained to maximize the simulated human rewards. We propose a new alignment algorithm called Factually Augmented RLHF that augments the reward model with additional factual information such as image captions and ground-truth multi-choice options, which alleviates the reward hacking phenomenon in RLHF and further improves the performance. We also enhance the GPT-4-generated training data (for vision instruction tuning) with previously available human-written image-text pairs to improve the general capabilities of our model. To evaluate the proposed approach in real-world scenarios, we develop a new evaluation benchmark MMHAL-BENCH with a special focus on penalizing hallucinations. As the first LMM trained with RLHF, our approach achieves remarkable improvement on the LLaVA-Bench dataset with the 94% performance level of the text-only GPT-4 (while previous best methods can only achieve the 87% level), and an improvement by 60% on MMHAL-BENCH over other baselines. We opensource our code, model, data at https://llava-rlhf.github.io.
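
Two ideas from the abstract can be sketched briefly: giving the reward model extra factual context, and training it from pairwise comparisons. The prompt template below is hypothetical. The loss is the standard Bradley-Terry style pairwise objective commonly used for RLHF reward models, assumed here rather than taken from the paper.

```python
import math


def augment_reward_input(question, response, caption):
    """Attach factual context (here an image caption) to what the reward model sees."""
    return f"Caption: {caption}\nQuestion: {question}\nResponse: {response}"


def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss: -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward_preferred - reward_rejected
    return math.log1p(math.exp(-margin))


# Placeholder strings, invented for illustration.
text = augment_reward_input(
    question="What animal is in the picture?",
    response="A dog on a beach.",
    caption="a dog running on sand",
)
good_order = preference_loss(2.0, 0.0)  # less hallucinated response scored higher
bad_order = preference_loss(0.0, 2.0)   # more hallucinated response scored higher
```

The loss is small when the less hallucinated response receives the higher reward and grows when the order is reversed. The caption gives the reward model something to check the response against.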

Paper link

https://arxiv.org/abs/2309.14525

Further reading

https://x.com/arankomatsuzaki/status/1706839311306621182

Large Language Model Alignment: A Survey

Paper introduction

A comprehensive survey paper on LLM alignment; topics include outer alignment, inner alignment, mechanistic interpretability, attacks on aligned LLMs, alignment evaluation, future directions, and discussions. #survey-paper #llm-alignment

Abstract

Recent years have witnessed remarkable progress made in large language models (LLMs). Such advancements, while garnering significant attention, have concurrently elicited various concerns. The potential of these models is undeniably vast; however, they may yield texts that are imprecise, misleading, or even detrimental. Consequently, it becomes paramount to employ alignment techniques to ensure these models to exhibit behaviors consistent with human values. This survey endeavors to furnish an extensive exploration of alignment methodologies designed for LLMs, in conjunction with the extant capability research in this domain. Adopting the lens of AI alignment, we categorize the prevailing methods and emergent proposals for the alignment of LLMs into outer and inner alignment. We also probe into salient issues including the models' interpretability, and potential vulnerabilities to adversarial attacks. To assess LLM alignment, we present a wide variety of benchmarks and evaluation methodologies. After discussing the state of alignment research for LLMs, we finally cast a vision toward the future, contemplating the promising avenues of research that lie ahead. Our aspiration for this survey extends beyond merely spurring research interests in this realm. We also envision bridging the gap between the AI alignment research community and the researchers engrossed in the capability exploration of LLMs for both capable and safe LLMs.

Paper link

https://arxiv.org/abs/2309.15025

Further reading

https://x.com/omarsar0/status/1706845285064818905

Qwen Technical Report

Paper introduction

Proposes a series of LLMs demonstrating the strength of RLHF on tasks involving tool use and planning capabilities for creating language agents. #qwen-vl #rlhf

Abstract

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.

Paper link

https://arxiv.org/abs/2309.16609

Further reading

https://x.com/omarsar0/status/1707776749042364729

MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models

Paper introduction

An open-source LLM series for interpretable mental health analysis with instruction-following capability; the paper also proposes a multi-task and multi-source interpretable mental health instruction dataset on social media with 105K data samples. #medical #llm-for-clinical-task #llama

Abstract

With the development of web technology, social media texts are becoming a rich source for automatic mental health analysis. As traditional discriminative methods bear the problem of low interpretability, the recent large language models have been explored for interpretable mental health analysis on social media, which aims to provide detailed explanations along with predictions. The results show that ChatGPT can generate approaching-human explanations for its correct classifications. However, LLMs still achieve unsatisfactory classification performance in a zero-shot/few-shot manner. Domain-specific finetuning is an effective solution, but faces 2 challenges: 1) lack of high-quality training data. 2) no open-source LLMs for interpretable mental health analysis were released to lower the finetuning cost. To alleviate these problems, we build the first multi-task and multi-source interpretable mental health instruction (IMHI) dataset on social media, with 105K data samples. The raw social media data are collected from 10 existing sources covering 8 mental health analysis tasks. We use expert-written few-shot prompts and collected labels to prompt ChatGPT and obtain explanations from its responses. To ensure the reliability of the explanations, we perform strict automatic and human evaluations on the correctness, consistency, and quality of generated data. Based on the IMHI dataset and LLaMA2 foundation models, we train MentalLLaMA, the first open-source LLM series for interpretable mental health analysis with instruction-following capability. We also evaluate the performance of MentalLLaMA on the IMHI evaluation benchmark with 10 test sets, where their correctness for making predictions and the quality of explanations are examined. The results show that MentalLLaMA approaches state-of-the-art discriminative methods in correctness and generates high-quality explanations.
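
The data construction described in the abstract (few-shot prompts plus collected labels, then explanations packed into instruction data) could look roughly like the sketch below. The prompt layout, the record field names, and the placeholder posts are all assumptions for illustration. The actual IMHI format is not given in the text above.

```python
def build_fewshot_prompt(examples, post, label):
    """Few-shot prompt asking for an explanation of an already-known label."""
    blocks = [
        f"Post: {ex['post']}\nLabel: {ex['label']}\nExplanation: {ex['explanation']}"
        for ex in examples
    ]
    blocks.append(f"Post: {post}\nLabel: {label}\nExplanation:")
    return "\n\n".join(blocks)


def to_instruction_sample(task, post, label, explanation):
    """Pack one record in an instruction-tuning style (field names are assumptions)."""
    return {
        "instruction": f"Task: {task}. Read the post, give a label, and explain why.",
        "input": post,
        "output": f"{label}. Reasoning: {explanation}",
    }


# Placeholder content only; no real social media data.
examples = [{"post": "placeholder post A", "label": "no", "explanation": "placeholder reason"}]
prompt = build_fewshot_prompt(examples, "placeholder post B", "yes")
sample = to_instruction_sample(
    "stress detection", "placeholder post B", "yes", "placeholder explanation"
)
```

The `prompt` would be sent to the explanation-generating model, and its response, after the quality checks the abstract describes, would fill the `explanation` argument.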

Paper link

https://arxiv.org/abs/2309.13567

Further reading

https://x.com/SAnaniadou/status/1707668936634794442

Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic

Paper introduction

A new neurosymbolic framework to improve zero-shot chain-of-thought reasoning in LLMs; it leverages principles from symbolic logic to verify and revise reasoning processes, improving the reasoning capabilities of LLMs. #chain-of-thought

Abstract

Recent advancements in large language models have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their behavior, particularly in terms of reasoning, often fails to effectively utilize this knowledge to establish a coherent thinking paradigm. Generative language models sometimes show hallucinations as their reasoning procedures are unconstrained by logical principles. Aiming to improve the zero-shot chain-of-thought reasoning ability of large language models, we propose Logical Chain-of-Thought (LogiCoT), a neurosymbolic framework that leverages principles from symbolic logic to verify and revise the reasoning processes accordingly. Experimental evaluations conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the efficacy of the enhanced reasoning paradigm by logic.
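
The verify-and-revise idea can be sketched as a loop over reasoning steps. In the framework both checks would be carried out by prompting an LLM. Here a toy arithmetic checker stands in so that the example runs on its own, and `verify_and_revise`, `check_sum`, and `fix_sum` are hypothetical names.

```python
def verify_and_revise(steps, verify, revise):
    """Check each reasoning step in order and replace any step that fails."""
    accepted = []
    for step in steps:
        if not verify(accepted, step):
            # The step contradicts the check, so swap in a revised version.
            step = revise(accepted, step)
        accepted.append(step)
    return accepted


def check_sum(_accepted, step):
    """Toy verifier: a step (a, b, claimed) is valid when a + b == claimed."""
    a, b, claimed = step
    return a + b == claimed


def fix_sum(_accepted, step):
    """Toy reviser: recompute the claimed result."""
    a, b, _ = step
    return (a, b, a + b)


chain = [(2, 3, 5), (5, 4, 10), (9, 1, 10)]  # the middle step is wrong
revised = verify_and_revise(chain, check_sum, fix_sum)
```

Only the faulty middle step is rewritten. A fuller version would also regenerate the steps that depended on it, which this sketch leaves out.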

2 comments

alstjr7375 2023-10-02

Wow... what a thoughtfully written post. I really enjoyed reading it.

ninebow 2023-10-03

Thank you ^^;

[2023/09/25 ~ 10/01] Top ML Papers of the Week
