Exploring RAGs: Enhancing Chatbots with Retrieval Augmented Generation
Since ChatGPT was first unveiled, one of my persistent questions has been how it can be used in applications that require proprietary or post-2022 data.
One approach to solving this is to build what's called a RAG.
Here I will outline the ideas behind it as I understand them, and then build my own version using Langchain and other tools (that part comes in the next post).
So first of all, in this part: what is a RAG all about? The history and theory behind it, in as non-technical terms as I can manage.
What is a RAG?
Suppose you want to develop a chatbot or assistant that deals with proprietary or highly specific data: data that is new, and certainly not within the web-crawled general data the model has been trained on.
One option is to fine-tune the model, but that takes a massive amount of compute resources every time you do it.
What if language models could refer to a set of data, similar to a student using books for an assignment, rather than learning the entire book's content?
This approach is what a RAG, or Retrieval Augmented Generation system, implements.
A possible workflow for making such a system would be as follows (bear with me through the jargon):
We first take an LLM (large language model) and connect it to our code through an API.
Next, we convert the corpus of data into numerical vector representations using an embedding model. These representations are then stored in a vector database.
When the user asks a query, that query is converted into a vector representation in the same way, and a semantic search is done to find the most relevant data within the vector database.
That relevant data is then passed to the LLM along with the query, and the LLM uses it to return a tailored response.
Which basically means: convert the text into numbers, and when retrieving, check how similar the query's numbers are to each document's numbers.
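To make that workflow concrete, here is a minimal sketch of the embed, store, and search loop. The bag-of-words "embedding" and the plain NumPy matrix standing in for a vector database are toy stand-ins I've made up for illustration; a real system would use a learned embedding model and a proper vector store.

import numpy as np

corpus = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Go for a hike and admire the natural scenery.",
    "Attend a live music concert and feel the rhythm.",
]

# Build a vocabulary and "embed" each sentence as a vector of word counts
vocab = sorted({w.lower().strip(".") for doc in corpus for w in doc.split()})

def embed(text):
    words = [w.lower().strip(".") for w in text.split()]
    return np.array([words.count(v) for v in vocab], dtype=float)

# The "vector database": one embedding per document, stored as rows of a matrix
doc_vectors = np.stack([embed(doc) for doc in corpus])

def retrieve(query):
    # Embed the query the same way, then rank documents by cosine similarity
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    return corpus[int(np.argmax(sims))]

print(retrieve("a walk outdoors in the park"))  # -> the leisurely walk sentence

A real embedding model maps sentences with similar meanings to nearby vectors even when they share no words, which is exactly what the word-count version above cannot do.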
Making a very simple version of it
For a very simple version of RAG, without any of the fancy vector databases and embeddings, we use a basic similarity measure directly on the corpus and the user query.
The one we use is called Jaccard similarity, which measures the intersection over union of the words in the two texts: the number of words they share, divided by the total number of distinct words across both. As we will see later, this is not a very accurate measure.
# Description: This file contains the code for the RAG model
# Example: reuse your existing OpenAI setup
from openai import OpenAI

corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

# This function calculates the similarity as the intersection over union of the word sets
def jaccard_similarity(query, document):
    query = query.lower().split()
    document = document.lower().split()
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection) / len(union)

# Score every document against the query and return the best match
def return_response(query, corpus):
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    most_viable_response = similarities.index(max(similarities))
    return corpus[most_viable_response]

user_prompt = "What is a leisure activity that you like?"
user_input = input(user_prompt + "\n")

# Retrieval step: pick the document most similar to the user's input
relevant_document = return_response(user_input, corpus_of_documents)

# Augmentation step: put the retrieved document and the user input into the prompt
prompt = """
You are a bot that makes recommendations for activities. You answer in very short sentences and do not include extra information.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
"""
prompt = prompt.format(relevant_document=relevant_document, user_input=user_input)

# Generation step: point to the local server and ask the LLM for the final answer
client = OpenAI(base_url="http://localhost:5555/v1", api_key="lm-studio")
completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": prompt},
        {"role": "user", "content": "Compile a recommendation to the user based on the recommended activity and the user input."},
    ],
    model="Mistral7b",
    temperature=0.7,
)

output = completion.choices[0].message.content
print(output)
This is adapted from the tutorial here: https://learnbybuilding.ai/tutorials/rag-from-scratch. I had to change some of the API parts so that I could use a local LLM through LMStudio and its local server API system.
This sends the retrieved sentence to the LLM along with the user's query, and the LLM generates the rest of the response.
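As an illustration (using the functions defined above), if the user had typed "go for a hike", the retrieval step on its own would pick the hiking sentence before anything is sent to the model:

print(jaccard_similarity("go for a hike",
                         "Go for a hike and admire the natural scenery."))
# 4 shared words out of 9 distinct words -> about 0.44
print(return_response("go for a hike", corpus_of_documents))
# -> "Go for a hike and admire the natural scenery."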
See you in the next one
Kept this one short so that the different stages of the process can be clearly seen.
In the next part, I will delve into using real standard tools to build a RAG system, documenting my learnings and providing detailed guides. Stay tuned!