Introduction to Retrieval Augmented Generation

In this first topic, I want to explore retrieval augmented generation, or RAG. It's a bold choice to begin with since it's not the traditional starting point, but it takes us to storage, similarity search, and other fun things.

What is the problem we want to solve? I remember when ChatGPT first came out, everyone was amazed at how it generated text from a very small prompt with little context. It could write rap songs on a given topic in the style of your favorite artist. Soon, though, users started crafting elaborate prompts to produce something more useful, like a proposal document for work. What became apparent early on was that LLM responses improved greatly with additional, relevant context.

LLMs' responses are only as good as the data they have been trained on. This leads to a few limitations. First, there is a cutoff date for an LLM's knowledge: any data or information produced after training is unknown to the LLM. Second, an LLM has general knowledge but not your specific knowledge. These two limitations severely impede an LLM's ability to generate anything useful for your specific use case. For instance, if I want to understand why house prices are high in a town, the LLM can provide some information based on historical town records. However, it won't be able to use newer information, like a new town development or new businesses, that may be driving house prices up.

RAG was developed to address these limitations. The core idea is simple. RAG stands for retrieval augmented generation, and it works as follows:

First, a user prompt is used to "retrieve" any similar information from a data store.

Second, this information is used to "augment" the original user prompt before providing it to the LLM.

Third, the LLM then "generates" the response to the user prompt based on the augmented prompt, as the sketch below illustrates.
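To make these three steps concrete, here is a minimal, self-contained Python sketch of the retrieve-augment-generate loop. Everything in it is a toy stand-in for illustration: the `DOCUMENTS` list, the bag-of-words `embed` function, and the `call_llm` placeholder are assumptions of mine, not real library APIs. A production system would use a learned embedding model, a vector database, and an actual model API.

```python
import math
from collections import Counter

# Toy document store. In practice this would be a vector database
# holding embeddings of your own documents.
DOCUMENTS = [
    "The town approved a new business park near the highway in March.",
    "Average house prices in the town rose 12% over the last year.",
    "The town's historical records date back to 1890.",
]

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count. Real systems use a learned
    # embedding model that maps text to a dense vector.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(prompt: str, k: int = 2) -> list[str]:
    # Step 1: "retrieve" the k documents most similar to the prompt.
    prompt_vec = embed(prompt)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine_similarity(prompt_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]

def augment(prompt: str, context: list[str]) -> str:
    # Step 2: "augment" the original prompt with the retrieved context.
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + prompt

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in a call to your actual model API.
    return f"(LLM response to a {len(prompt)}-character augmented prompt)"

def generate(augmented_prompt: str) -> str:
    # Step 3: the LLM "generates" an answer from the augmented prompt.
    return call_llm(augmented_prompt)

question = "Why are house prices high in the town?"
print(generate(augment(question, retrieve(question))))
```

Note that the LLM itself never changes. RAG only enriches the prompt, which is why it can surface information the model never saw during training.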

Next, we will focus on the retrieval step.