What Is RAG? Building Your First RAG System from Scratch

Maxwell Timothy

on Sep 9, 2024

Large language models (LLMs) have enjoyed explosive popularity in the last few years. Since the breakout success of LLM-powered tools like ChatGPT, interest in building AI-powered apps has been astronomically high. One of the key technologies that has helped push the development of AI-powered tools forward is RAG.

If you are in the AI space, there’s a good chance you’ve heard RAG mentioned quite a lot. If you are wondering what it is, how it works, what you can do with it, and how to build your first RAG app, you are in the right spot.

What Is RAG?

Retrieval-augmented generation (RAG) is a hybrid concept that combines two key elements of AI—retrieval and generation—to improve the quality and accuracy of responses in natural language processing tasks. It might sound a bit complex, but at its core, it is basically a clever combo of two things: finding useful info and making it sound better.

So, basically, a RAG system starts by retrieving relevant information or documents from a large database or knowledge source. This could be external sources like web pages or internal data like company documents, depending on the use case.

After retrieval, the system uses a generative model (like GPT-4, Llama, or Claude) to synthesize and generate a response based on the retrieved information. This step combines the power of large language models to generate coherent text with factual data from the retrieval phase.

So, all together, a RAG system retrieves information from a knowledge source and then uses this information in combination with a large language model (LLM) to produce a clear, human-like response based on that retrieved info.

It’s the difference between "I think I know" and "Here’s the exact page with the info you need."
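To see the shape of the whole thing in code, here's a toy sketch. Both functions are deliberately naive, hypothetical stand-ins (a word-overlap lookup and a canned reply template) just to show the retrieve-then-generate flow; we'll build a real version step by step later in this article.

def retrieve(query, docs):
    # Toy retriever: pick the document sharing the most words with the query.
    query_words = set(query.lower().split())
    return max(docs, key=lambda doc: len(query_words & set(doc.lower().split())))

def generate(query, doc):
    # Stand-in for an LLM call: wrap the retrieved fact in a canned reply.
    return f"Here's what I found: {doc}"

docs = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 3-5 business days.",
]
question = "How many days do I have to return an item?"
print(generate(question, retrieve(question, docs)))
# -> Here's what I found: Returns are accepted within 30 days of purchase.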

Why Should You Care About RAG?

Alright, so why should you care about RAG? Isn’t regular AI already smart enough? Well, yes and no. Standard AI models like GPT-4 are brilliant at generating text, but they have a little problem: hallucinations. Sometimes they make things up, like confidently telling you a banana is a type of fish, or they miss the context of the question you are asking.

Before we proceed, let’s see an example.

I asked Google Gemini for the meaning of the acronym RAG twice, and both times I got the wrong response.

Now, does that mean Google’s Gemini is not smart enough to figure out the meaning? Of course not. The acronym has several possible meanings, and Gemini, lacking the context to pick between them, went with the wrong definition in both instances.

To fix the issues of hallucination and missing contextual data, RAG comes to the rescue.

Here’s why RAG is worth your attention:

1. Leverage Your Own Data: Imagine having a virtual assistant that actually knows the ins and outs of your company’s policies, product specs, or customer history. Instead of relying solely on generalized data (like many AI models do), RAG lets you feed in your documents, FAQs, or manuals. It’s like giving your AI an all-access pass to the library of information that matters most to you.

Think of a customer support chatbot. With RAG, it can directly pull answers from your company’s knowledge base. If someone asks, “What’s the return policy on your new product line?” the system can instantly retrieve and deliver the right answer, straight from the latest version of your documentation.

2. Cutting Down on AI Hallucinations: Ever asked an AI a simple question and received a completely off-the-wall answer? That’s a hallucination. AI models sometimes fill in gaps with stuff that sounds plausible but is completely wrong. RAG fixes this by making sure the AI pulls from a reliable source.

Imagine you’re a doctor and ask an AI for treatment options based on new medical research. You wouldn’t want it to guess, right? With RAG, it would fetch relevant documents from medical journals and then generate a response based on those actual sources. You get the right answer, grounded in the right facts.

3. Better Response Accuracy: Ever get frustrated by vague or incomplete answers from an AI? With RAG, your AI isn’t pulling info out of thin air; it’s using real documents. This makes the responses not only more accurate but also more specific. Whether you’re dealing with legal questions, technical product details, or policy queries, RAG makes it far more likely that you get the exact answer you need.

In technical support, for instance, if a user asks how to reset a product, RAG won’t just generate some random guess—it will pull the exact reset instructions from your product manual and deliver it in a clear, helpful format.

The Building Blocks of a RAG System

You might be thinking, “Okay, sounds great, but how does this thing actually work?” A RAG system is made up of a few key components that work together, kind of like a factory assembly line. Each part has a specific job, and when they all come together, they produce a well-oiled information retrieval machine. Let’s break down the main building blocks:

1. Document Collection (Corpus)

First things first: the corpus is the foundation of your RAG system. It’s essentially a collection of documents that your system will retrieve information from. This can be anything—customer service guidelines, research papers, articles, user manuals, FAQs, or even emails. The more complete and relevant your corpus, the better your RAG system will perform.

Think of it like stocking a library. If you’ve got a bunch of outdated books, your system will give outdated answers. But if you regularly update your collection with fresh, reliable documents, your AI will always be on top of the latest info.

For example, imagine you’re running an online store. Your corpus might include product descriptions, return policies, and customer reviews. When someone asks a question about shipping details, the RAG system will scan those documents and deliver the exact information from the most relevant source.

2. User Input and Document Similarity

When a user asks a question, the RAG system doesn’t just look through the corpus willy-nilly. It uses something called similarity measures to figure out which documents are the most relevant. One simple method is Jaccard similarity, which looks at how many words overlap between the query and the documents. It’s like playing a game of “which sentence is most like the one you just said?”

Let’s say someone asks, “What’s the best way to troubleshoot my router?” The system will use similarity measures to find documents that contain similar words or phrases (like "troubleshoot," "router," "best way"). The ones with the most overlap get selected as potential answers. This is an overly simplistic explanation, but that’s the core of it.

While Jaccard similarity is a straightforward approach, more advanced methods—like cosine similarity and vector embeddings—can also be used for greater precision. These techniques help the system find the most relevant info, even if the exact words aren’t used.
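To make that advanced option concrete, here's a minimal sketch of cosine similarity computed over plain word-count vectors. (Real systems usually compute cosine over learned embedding vectors instead, which is what lets them match meaning rather than exact words; the word-count version below is just to show the math.)

import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Represent each text as a bag-of-words vector of word counts.
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    # Product of the two vector lengths (Euclidean norms).
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("troubleshoot my router", "how to troubleshoot a router"))  # about 0.52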

3. Post-processing with LLMs

So, you’ve got the relevant documents. Now what? Here’s where the Large Language Model (LLM) comes in. The LLM takes the information from those documents and turns it into a natural, understandable response.

Think of it like asking a librarian for help. They might hand you a stack of books, but instead of just dumping all that info on you, the LLM reads through it and gives you a nice, concise summary that directly answers your question. This makes the whole interaction faster, smoother, and a lot more user-friendly.

For example, if the corpus contains a 10-page manual on how to reset a product, the LLM can extract just the few steps you need to follow, summarize them, and present them in a friendly tone. No need to dig through a giant document—the RAG system does the heavy lifting for you!

Yep, it's really that straightforward.

To begin exploring and grasping RAG-based systems, you don’t need a vector store or even a large language model—not at first, anyway. You can learn the concepts without all the complex tools. While it’s often made to seem more complicated than it is, the basics are actually pretty simple.

Building Your Own Basic RAG System

Now that we’ve covered what RAG is and how it works, let's walk through how you could actually build a simple RAG system from scratch. Don’t worry, you don’t need a PhD in AI to get started. Just follow these steps, and you’ll have a basic system up and running in no time.

Now, before I continue, I want to stress that this is a very basic approach to building a RAG system. You aren’t going to build the next big thing this way. But this is going to give you a clear foundational understanding of how a RAG system works.

1. Create a Simple Corpus

First, you need to gather your documents. Think of this like creating your AI's knowledge base. These could be text files, PDFs, web articles—whatever relevant information you need your system to pull from. For simplicity, you could start with a few text files containing key information, like company FAQs or product manuals. The better and more organized your corpus is, the better your RAG system will perform.

Imagine you’re building a customer support bot for a tech company. Your corpus could include user manuals, common troubleshooting steps, and policy documents. This way, when someone asks a question like “How do I reset my router?” the bot will search through the manual and give the exact reset steps.

2. Use Similarity for Document Retrieval

Next, let’s tackle document retrieval. A simple way to do this is by using Jaccard similarity. This method checks how many words in the user’s query match the words in the documents. It’s quick, easy, and surprisingly effective for basic tasks.

For example, if the user types, “How do I update my software?” the system will compare that sentence with all the documents in the corpus, find the ones with the most similar words, and rank them by relevance. While Jaccard is a basic tool, it works well enough to get your feet wet.

3. Handle User Queries

Once you’ve set up your corpus and similarity measures, you need to write functions that handle user input. Essentially, when a user types a question, the function needs to process it, compare it to your corpus using the similarity measure, and return the top relevant documents. This retrieved information can then be passed to an LLM (like GPT-4) to generate a natural-sounding answer.

Let’s say the user asks, “What’s your return policy?” Your function should identify the document or section of text that talks about returns, and then let the LLM generate a conversational response based on that document. This whole process happens in the blink of an eye!

Running Your Basic RAG System

Now that we've got a grasp on the steps, let’s apply them to a practical scenario—answering a business-related question like, "What is your return policy?"

We’ll start by defining a collection of policy-related documents, and then build a simple retrieval system using Jaccard similarity to match the user’s query to the closest document.

Setting Up Your Document Collection

Let’s create a basic collection of policy-related responses:

return_policy_docs = [
    "Our standard return policy allows returns within 30 days of purchase with proof of receipt.",
    "Refunds are processed within 5-7 business days after the returned item is received.",
    "We offer exchanges on defective or damaged products within 14 days of receipt.",
    "For any return, the product must be in its original packaging and unused.",
    "Customers are responsible for return shipping fees unless the product is defective.",
    "Clearance items are final sale and not eligible for return or exchange.",
    "You can initiate a return by contacting our customer support team.",
    "Returns due to change of mind are accepted only if the item is unopened.",
    "Please include the original receipt and a brief reason for return in your package.",
    "Gift returns can be exchanged for store credit within 30 days of purchase."
]

Measuring Similarity Between Query and Documents

Next, we’ll implement a similarity measure to compare the user’s query with our document collection. We’ll use the Jaccard similarity, which compares word overlap between the query and the documents.

First, we need to pre-process the text before comparing: lowercase it and split it into words, stripping punctuation so that, for example, "defective?" still matches "defective".

import re

def compute_jaccard_similarity(query, doc):
    # Tokenize: lowercase and keep only word characters, so punctuation doesn't block matches.
    query_set = set(re.findall(r"\w+", query.lower()))
    doc_set = set(re.findall(r"\w+", doc.lower()))
    intersection = query_set.intersection(doc_set)
    union = query_set.union(doc_set)
    # Jaccard similarity: shared words divided by total distinct words.
    return len(intersection) / len(union) if union else 0.0
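A quick sanity check on two short strings:

print(compute_jaccard_similarity("return a product", "product returns"))
# 1 shared word ("product") out of 4 distinct words -> 0.25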

Returning the Best-Matched Document

Now, we define a function that takes the user's query and our document collection, and returns the document with the highest similarity score.

def find_best_match(query, docs):
    # Compute a similarity score for every document in the collection.
    similarities = [compute_jaccard_similarity(query, doc) for doc in docs]
    # Return the document with the highest score.
    return docs[similarities.index(max(similarities))]

Running the System

Let’s run our system with a sample user query about the return policy:

user_query = "Can I exchange a damaged or defective product?"
best_response = find_best_match(user_query, return_policy_docs)
print(best_response)

This would output:

We offer exchanges on defective or damaged products within 14 days of receipt.

Congratulations! You've just built the retrieval core of a simple RAG system tailored for answering return policy questions. Of course, this basic approach has its limitations: it’s purely based on word overlap and doesn’t account for the actual meaning of the query. But it’s a good starting point.

Handling Bad Matches

As with any basic similarity measure, the system might sometimes return irrelevant results. For example, suppose the user says the opposite of what they want:

user_query = "I don't want to exchange my defective or damaged product"
print(find_best_match(user_query, return_policy_docs))

This might still return:

We offer exchanges on defective or damaged products within 14 days of receipt.

This happens because our system is based purely on the words in common between the query and documents, and it doesn't understand the negative sentiment in "don't." We’ll deal with this and improve the system by incorporating a large language model (LLM) in the next step.
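Before we get there, one cheap guardrail worth adding is a minimum-score threshold: if even the best match barely overlaps the query, it is often better to admit that than to guess. This will not catch the negation problem above (that match genuinely has high word overlap), but it filters out questions your corpus simply does not cover. Here's a sketch, with an arbitrary cutoff picked purely for illustration:

def find_best_match_with_threshold(query, docs, min_score=0.1):
    # Score every document, but refuse to answer on a weak best match.
    scores = [compute_jaccard_similarity(query, doc) for doc in docs]
    best_score = max(scores)
    if best_score < min_score:
        return "Sorry, I couldn't find anything relevant. Could you rephrase your question?"
    return docs[scores.index(best_score)]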

Enhancing RAG with LLMs

So you’ve retrieved relevant documents—now what?

While the basic RAG setup works fine, integrating it with a large language model (LLM) like GPT-4 or Llama 2 can supercharge the results. This is where things get really exciting.

For instance, instead of just showing a chunk of legalese from your terms of service, the LLM will summarize it, turning the answer into something like, “In short, you can return any item within 30 days, no questions asked.”

Setting Up LLMs

Let’s take Llama 2 as an example. To integrate it into your RAG system, you’ll send the user query along with the retrieved documents to the LLM. The LLM will then process the documents, synthesize the information, and generate a response that’s clear, accurate, and easy to follow. Think of it as your AI translator, turning complex or technical text into a simple, useful answer.

To set this up, you can either install an LLM locally or use an API (like those provided by OpenAI or Hugging Face) that allows your RAG system to send the query and documents to the LLM for processing.
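For example, here's a minimal sketch using OpenAI's Python client (the same chat-completions pattern works with other OpenAI-compatible APIs, such as Groq's). The model name is just an example, and the client assumes your API key is set in the OPENAI_API_KEY environment variable:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

def generate_answer(query, retrieved_doc):
    # Hand the LLM both the retrieved context and the user's question.
    response = client.chat.completions.create(
        model="gpt-4o",  # example model name; use whichever you have access to
        messages=[
            {"role": "system", "content": "Answer the customer's question using only the provided context."},
            {"role": "user", "content": f"Context: {retrieved_doc}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content

print(generate_answer(user_query, best_response))  # reusing the query and match from earlier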

If you choose to run a local LLM, you’ll need a very powerful computer to match the speed you get from APIs like those from Hugging Face and OpenAI.

APIs and Prompt Engineering

Here’s where prompt engineering comes in. It’s all about how you ask the LLM to do its job. A well-crafted prompt can make all the difference. Instead of just dumping the documents on the LLM, you give it specific instructions like, “Summarize the following documents for a customer asking about product returns.”

Prompt engineering is like giving your AI specific marching orders—it ensures the LLM knows exactly what you want and how to deliver it. You’ll also use APIs to send the user query and retrieved documents to the LLM and get a processed answer back.
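As an illustration, here's one way you might assemble such a prompt from the user's query and the retrieved documents. The wording and structure are just one reasonable choice, not a canonical template:

def build_prompt(query, docs):
    # Number the retrieved documents so the model can reference them.
    context = "\n".join(f"{i + 1}. {doc}" for i, doc in enumerate(docs))
    # Spell out the role, the source material, and the constraints.
    return (
        "You are a customer support assistant.\n"
        "Answer the question using ONLY the documents below. "
        "If they don't contain the answer, say you don't know.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {query}"
    )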

Handling Complex Queries

Sometimes, users don’t just ask simple questions. They want detailed, nuanced answers. This is where the LLM really shines. It can take multiple documents, extract the most relevant info, and synthesize it into a response that covers all the bases.

For example, if a user asks, “How does your warranty policy compare with competitors’?” the LLM can pull together details from several documents and give a balanced, well-rounded response. Instead of just stating facts, it can provide context, making the response both accurate and insightful.
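Supporting this kind of question mostly comes down to retrieving more than one document. A small tweak to our earlier retriever returns the top k matches instead of just the best one, so they can all be packed into the prompt (this sketch reuses compute_jaccard_similarity and the build_prompt helper from above):

def find_top_k_matches(query, docs, k=3):
    # Rank all documents by similarity and keep the k best ones.
    ranked = sorted(docs, key=lambda doc: compute_jaccard_similarity(query, doc), reverse=True)
    return ranked[:k]

top_docs = find_top_k_matches("How does your warranty compare?", return_policy_docs)
prompt = build_prompt("How does your warranty compare?", top_docs)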

In the course of this write-up, I’ve tried several different LLMs and APIs, both free and paid. I found Groq’s API particularly useful: it’s free and fast! If you have the budget to pay, then I’d recommend GPT-4 and the Claude series, from OpenAI and Anthropic respectively.

Wrapping It Up

To sum things up, RAG (retrieval-augmented generation) offers a strong balance between fact-based accuracy and conversational fluency. By pulling from a specific, reliable set of documents and using the power of LLMs to generate responses, RAG makes it far more likely that you’re getting the right answer.

Whether you’re building a customer support chatbot, a research assistant, or any other AI-powered tool, RAG gives you the flexibility to customize it to your data and needs. The system reduces errors, cuts down on hallucinations, and produces answers that sound like they were written by a knowledgeable human—not a random guess.

At Chatbase, we’ve taken years of expertise in building RAG-based systems and transformed it into something you can easily harness: the Chatbase platform. With our API and intuitive visual chatbot builder, you can effortlessly create and train chatbots using your own data—whether it’s files, website content, or internal documents. We’ve taken care of the complex engineering so you can focus on what matters: building powerful tools for customer support, internal knowledge bases, AI agents, and more. Best of all, your chatbot can be embedded directly on your website or integrated into your app via API, making it seamless to deploy and scale.

Ready to build your own chatbot with Chatbase? Try Chatbase today and unlock the power of RAG for your business!
