Academic Code Concept Paper Series

Deep dive: LLM priority for RAG, AIOS, More Agents Is All You Need

Ever wondered if large language models (LLMs) stick to the information they retrieve or if they rely on their internal knowledge? In this video, we dive into a recent paper from Stanford University that explores this very question. Discover the experiments they conducted, the surprising insights they uncovered, and what it means for the future of AI. We’ll also touch on AIOS, a revolutionary LLM Agent Operating System, and how it’s changing the way we interact with machines. Stay tuned to the end for the most interesting revelations about the performance of LLMs with manipulated data!

Are LLMs faithful to RAG?

A recent paper from Stanford University explores how well LLMs respect retrieved information versus falling back on their internal knowledge. The goal is to quantify the tension between an LLM’s internal knowledge and the retrieved information.

𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁𝘀:

  • Create a dataset from six domains using web content with questions, answers, and k references
  • Manipulate samples and introduce errors in reference documents (e.g. 20mg → 60mg)
  • Create a baseline using the LLM without context
  • Iterate over the dataset with manipulated documents (0 errors, 1 error, 2 errors, …)
  • Determine if LLMs prefer internal knowledge or RAG data.
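The perturbation step can be sketched in a few lines. This is an illustrative reconstruction, not the authors’ code: the `perturb_dosage` helper and the regex are assumptions, built around the paper’s own `20mg → 60mg` example.

```python
# Hypothetical sketch of the manipulation step: scale every "NNmg" dosage
# in a reference document to inject a controlled error.
import re

def perturb_dosage(doc: str, factor: float) -> str:
    """Multiply every 'NNmg' dosage found in the document by `factor`."""
    def repl(m: re.Match) -> str:
        return f"{round(int(m.group(1)) * factor)}mg"
    return re.sub(r"(\d+)mg", repl, doc)

reference = "The recommended adult dose is 20mg once daily."
corrupted = perturb_dosage(reference, 3.0)
print(corrupted)  # -> "The recommended adult dose is 60mg once daily."
```

The corrupted document is then supplied as RAG context, and the model’s answer is compared against both the original fact and the injected error.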

𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀:

  • Using correct retrieved information increases accuracy from 35.7% to 94%.
  • Manipulated retrieved information led to LLMs citing the wrong information.
  • Stronger LLMs (GPT-4) resist incorrect retrieved information better.
  • “Strict” prompts (YOU MUST) helped models respect retrieved information.
  • RAG yields the largest improvements for questions about up-to-date information.
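To make the “strict prompt” finding concrete, here is a hedged sketch of what a loose versus a strict RAG instruction might look like. The exact wording the paper tested is not reproduced here; these templates are illustrative.

```python
# Two illustrative RAG prompt templates: a loose one that merely offers the
# context, and a strict one that demands the model rely on it exclusively.
LOOSE_PROMPT = (
    "Here is some context that may help:\n{context}\n\n"
    "Question: {question}"
)

STRICT_PROMPT = (
    "YOU MUST answer using ONLY the context below. "
    "Do not rely on prior knowledge.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)

print(STRICT_PROMPT.format(context="Dose: 60mg", question="What is the dose?"))
```

Per the paper’s finding, the stricter the instruction, the more likely the model is to echo the retrieved value, even when that value has been manipulated.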

AIOS: The First LLM Agent Operating System

AIOS (LLM Agent Operating System) is a new agent orchestration framework that embeds large language models into operating systems, creating an OS with a “brain” that can “understand”.

AIOS is designed for optimal resource allocation, facilitating context switches, concurrent execution, tool services for agents, access control, and providing a rich toolkit for developers.

AIOS is built around several key modules that orchestrate its agents. It consists of:

  • an Agent Scheduler for prioritizing agent requests,
  • a Context Manager for managing interaction context,
  • a Memory Manager for short-term memory,
  • a Storage Manager for long-term data retention,
  • a Tool Manager for managing external API tools,
  • and an Access Manager for enforcing privacy and access control policies.

These modules communicate with the AIOS SDK in an interactive mode, alongside non-LLM tasks coming from the OS kernel (with its process scheduler, memory manager, etc.). This architecture lets AIOS integrate complex AI functionality into a traditional operating system, enabling more intelligent, responsive, and efficient applications that leverage the full power of LLMs alongside conventional OS resources and capabilities.
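A minimal sketch of what an agent scheduler like the one above might look like, using a priority queue. The class and field names are illustrative assumptions, not the actual AIOS SDK.

```python
# Toy agent scheduler: agent requests are queued by priority and the
# lowest-priority-number request is dispatched first.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class AgentRequest:
    priority: int
    agent_name: str = field(compare=False)
    prompt: str = field(compare=False)

class AgentScheduler:
    def __init__(self) -> None:
        self._queue: list[AgentRequest] = []

    def submit(self, req: AgentRequest) -> None:
        heapq.heappush(self._queue, req)

    def next_request(self) -> AgentRequest:
        # Pop the highest-priority (smallest number) pending request.
        return heapq.heappop(self._queue)

sched = AgentScheduler()
sched.submit(AgentRequest(2, "travel-agent", "book a flight"))
sched.submit(AgentRequest(1, "math-agent", "solve 2+2"))
print(sched.next_request().agent_name)  # -> math-agent
```

In AIOS the real scheduler also juggles context switching and concurrency; this sketch only shows the prioritization idea.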


This approach shifts the way we interact with machines: agents deployed at the operating system level accomplish complex tasks on our behalf. The trend is also visible in Apple’s ReALM models, which understand not only the conversation but also on-screen and background information. This marks a new era of intelligent computing.

“Looking at LLMs as chatbots is the same as looking at early computers as calculators. We’re seeing an emergence of a whole new computing paradigm, and it is very early.” — Andrej Karpathy (09/23)

More Agents Is All You Need

The paper demonstrates that adding more agents increases LLM accuracy using a simple sampling-and-voting technique. Tests across various LLM benchmarks show the approach improves performance significantly, especially on complex tasks.

For instance, with 15 agents, Llama2-13B equals Llama2-70B’s accuracy, and with 15 to 20 agents, Llama2-70B and GPT-3.5-Turbo reach the accuracy of their more powerful versions. The authors share the code publicly.
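The sampling-and-voting idea reduces to a majority vote over independent samples of the same model. A minimal sketch, with a hard-coded list standing in for the actual sampled generations:

```python
# Sampling-and-voting in miniature: query the model N times, then return
# the most common answer among the samples.
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among sampled generations."""
    return Counter(answers).most_common(1)[0][0]

# Suppose five independent samples of the same model for "2 + 2 = ?":
samples = ["4", "3", "4", "4", "5"]
print(majority_vote(samples))  # -> 4
```

The intuition: even if each individual sample is only modestly reliable, independent errors tend to scatter while the correct answer repeats, so the vote converges on it as the number of agents grows.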

“We realize that the LLM performance may likely be improved by a brute-force scaling up the number of agents instantiated.”
