
14.12.2023 | Blog

Retrieval Augmented Generation (RAG) – What's it all about?

"Retrieval Augmented Generation" (RAG) is a concept that is becoming increasingly common in the field of enterprise search. In his blog post, IntraFind AI expert Tim Vossen explains what the core idea of RAG is and what issues need to be considered when implementing it.

What does RAG stand for?

In this context, "RAG" stands for "Retrieval Augmented Generation". "Generation" refers to generating an answer. "Retrieval" refers to finding relevant documents in a data source. "Augmented" means that these documents are used when creating the answer in order to augment, i.e. improve, the result.

Ask questions and get answers

The starting point for this idea is that users have questions in their daily work, want to ask these questions directly to their IT system in natural language, and should then receive answers that are as helpful and correct as possible.

LLMs

Ever since OpenAI's ChatGPT was released in November 2022 and (rightly) generated a huge media response, so-called Large Language Models (LLMs) have increasingly been used to generate such answers.

Generally speaking, these LLMs have been around for some time and exist in a huge number of forms and variants. The most successful, however, are the "Generative Pretrained Transformers" (GPT), which also give ChatGPT its name. These models are based on the Transformer architecture, which has been used with great success in many areas (besides natural language processing, for example in computer vision or recommender systems) since its introduction in 2017. During training, they usually first consume huge amounts of text ("pre-training") before being trained to behave in a task-specific manner ("fine-tuning") on smaller amounts of data.

Limits of LLMs

Because these models have seen large amounts of text on the internet, they are often able to accurately reproduce facts about the world. This works all the better the more frequently and clearly these facts appear in the training material. As a result, they can usually answer questions that focus on generally known facts (e.g. "What is the capital of Germany called?") very well. This also tends to work for more exotic knowledge such as "What is the capital of Bhutan?" (it's "Thimphu") or questions about popular fiction such as "Which planet did Luke Skywalker grow up on?" (it's "Tatooine").

However, if the required knowledge is too specialized or too domain-specific, it will appear either too rarely in the training material or not at all (e.g. in the case of non-public information). The knowledge may also be too new, so that it was not yet available at the time of training. But since LLMs were primarily trained to complete texts, they often do just that: they generate an answer that sounds plausible but is not correct. This phenomenon is known as "hallucination".

Avoid hallucinations

As these hallucinations are generally undesirable, considerable effort goes into improving the training processes so that, instead of hallucinating, the LLMs indicate that they do not have enough information to answer a question. For example, the leading large models (e.g. from OpenAI or Anthropic) use RLHF (Reinforcement Learning from Human Feedback), in which human feedback is used, for example, to reward appropriate refusals to answer.

These techniques bring great benefits but are not yet perfect. In addition, it currently takes around 6-12 months before corresponding techniques become available for open source models - which offer many advantages of their own.

Use organisation's own data sources

Of course, "no answer" is better than a wrong answer. But a correct answer would be even better. And users often do have access to documents that would make it easier or even possible to answer their questions - be it up-to-date information from the public Internet or internal organizational data sources such as document management systems, drive directories or databases.

And this is exactly where the RAG approach comes into play: it attempts to put the LLM in a better position to generate answers by providing it with relevant documents in addition to the user's query. Equipped with this additional knowledge, the LLM is then often able to answer the question accurately.
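In Python, this basic flow might look roughly as follows - a minimal sketch in which `search_documents` and `call_llm` are hypothetical placeholders for whatever retrieval engine and LLM interface are actually in use:

```python
# Minimal RAG sketch (illustrative only): retrieve relevant passages,
# add them to the prompt, and let the LLM generate an answer from them.
# `search_documents` and `call_llm` are hypothetical placeholders, and
# the "text" field of a passage is an assumed structure.

def answer_question(question: str, user_id: str) -> str:
    # 1. Retrieval: find passages relevant to the question
    #    (respecting the user's access rights, see below).
    passages = search_documents(query=question, user=user_id, top_k=5)

    # 2. Augmentation: add the retrieved passages to the prompt.
    context = "\n\n".join(p["text"] for p in passages)
    prompt = (
        "Answer the question using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the LLM produces the answer from the augmented prompt.
    return call_llm(prompt)
```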

Information Retrieval improves answers

The challenge here is to filter the relevant information out of the large amount of data. To do this, proven and efficient information retrieval techniques such as lexical search are today often supplemented by so-called "semantic" search; this combination is known as "hybrid" search.
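Both components are explained in the following paragraphs. One common way to merge their two result lists into a single hybrid ranking - a generic technique, not specific to any particular product - is reciprocal rank fusion, sketched here:

```python
# Reciprocal rank fusion (RRF): a simple, widely used way to merge the
# ranked result lists of a lexical and a semantic search into one list.
# The document IDs and the constant k=60 are illustrative.

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Documents that rank highly in either list get a high fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical_hits = ["doc3", "doc1", "doc7"]    # e.g. from a keyword index
semantic_hits = ["doc1", "doc5", "doc3"]   # e.g. from an embedding search
print(reciprocal_rank_fusion([lexical_hits, semantic_hits]))
# doc1 and doc3 rise to the top because both searches consider them relevant.
```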

The lexical search basically uses search terms and supplements them with thesauri (for example, anyone searching for "ebike" should also find "pedelec") and lemmatization (anyone searching for "children" should also find "child").
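A toy sketch of such lexical query expansion; the thesaurus and lemma mappings are of course only illustrative examples, and real search engines use far richer, language-specific dictionaries and analyzers:

```python
# Toy lexical query expansion with a thesaurus and lemmatization.
THESAURUS = {"ebike": ["pedelec"]}
LEMMAS = {"children": "child"}

def expand_query(terms: list[str]) -> set[str]:
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded.add(LEMMAS.get(term, term))      # reduce to the base form
        expanded.update(THESAURUS.get(term, []))  # add synonyms
    return expanded

print(expand_query(["eBike", "children"]))
# {'ebike', 'pedelec', 'children', 'child'}  (order may vary)
```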

The semantic search is completely detached from individual search terms (sequences of letters) and instead relies on so-called "embeddings". These are vectors that usually consist of several hundred numerical values and which, put simply, represent the content (i.e. the semantics) of a text (or text section). This makes it possible to identify documents (or individual passages) that are closely related in terms of content to the question posed. For example, if you search for "How can I order a notebook for a working student?" you will ideally also find information on "Hardware procurement for interns and external employees".

User authorizations must also be considered when searching. This means that only documents that the respective users are authorized to see may be included in the search results. Otherwise, users could indirectly access information for which they have no permissions.
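A minimal sketch of how such an embedding-based search with a permission filter might look; `embed` stands in for a real embedding model, and the `allowed_users` and `embedding` fields are an assumed, simplified representation of the indexed documents:

```python
import math

# Illustrative embedding search with a permission filter.
# `embed` is a hypothetical placeholder for a real embedding model.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(question: str, passages: list[dict], user: str, top_k: int = 5) -> list[dict]:
    query_vec = embed(question)  # e.g. embed("How can I order a notebook ...?")
    # Only passages the user is allowed to see take part in the search.
    allowed = [p for p in passages if user in p["allowed_users"]]
    ranked = sorted(
        allowed,
        key=lambda p: cosine_similarity(query_vec, p["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]
```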

It is a great advantage of RAG that authorizations can be taken into account during retrieval. This is not directly possible, for example, with the alternative approach of having a model learn the content of all available internal organizational data sources through training (or fine-tuning).

Benefit of the hits

If the information found in this way is made available to the LLM, it can use it when answering user questions. Depending on what the retrieved documents contain, this can greatly simplify the LLM's task and thus significantly increase the quality of the answers.

The LLM can also indicate which hits it has used to generate the answers. This enables users to check the plausibility of the answers themselves.
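A sketch of how the retrieved hits might be passed to the LLM together with an instruction to cite them; the prompt wording and the hit fields are only examples:

```python
# Illustrative prompt construction: the retrieved hits are numbered so the
# LLM can reference them, which lets users verify the answer themselves.

def build_prompt(question: str, hits: list[dict]) -> str:
    sources = "\n".join(
        f"[{i}] {hit['title']}: {hit['text']}" for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the numbered sources below and "
        "cite the sources you used, e.g. [1]. If the sources are not "
        "sufficient, say that you cannot answer.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )
```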

Implementation challenges

This basic idea is simple and obvious. However, there are many challenges that arise during implementation to achieve optimal results. This starts with the selection of models for both retrieval and generative AI and extends to many detailed search-specific questions and security aspects: For example, how can it be ensured that only the data that individual users are authorized to see is used to answer questions?

However, organizations do not necessarily have to deal with such details themselves but can entrust themselves to experts who have already developed detailed concepts and can implement appropriate solutions.

Conclusion

With the RAG approach, the flexibility and intelligence of modern LLMs and the value of public and internal data sources can be combined to create a powerful tool that can support users in their daily work.

As an AI and search specialist, IntraFind has corresponding products and solutions in its portfolio and offers a secure framework for the use of LLMs and RAG. We advise and support our customers on the best solution for their use case.

Related articles


How organizations benefit from enterprise search with genAI

Find out more about how companies and authorities can use large language models in compliance with data protection regulations: A categorisation of the technologies and possible use cases.

Semantic Search

With the advent of large language models and deep learning, there arises a need to redefine search engines. The era of relying solely on word-based searches is over, making way for semantic search.

ChatGPT: The future of search?

ChatGPT became popular within a very short time. Our Head of Research tried out ChatGPT right at the beginning and thought about its effects on search engines.

The author

Tim Vossen
Senior Software Engineer
Tim Vossen has been active in software development for over 15 years and has been working as a senior software engineer at IntraFind since 2019. His focus there is on text classification and machine learning.