Can you really have both good and quick RAG? The answer depends on what you think ‘good’ and ‘quick’ look like. You can have good RAG. And you can have quick RAG. But having both at the same time is not so straightforward, and this is why.
Good RAG
Good RAG behaves in several key ways:
- Retrieves the right information, to provide accurate, correct answers, whereas bad RAG would retrieve text that only sounds similar
- Links every factual answer to a retrieved document; it doesn’t invent missing details but responds with, for example, “I don’t know”
- Addresses ambiguity head on and prioritises safety: it asks for clarification rather than guessing at an interpretation, declines to answer rather than hallucinate, and recognises when retrieval is weak or conflicting, bringing this uncertainty to the fore
- Knows the latest versions of documents, which documents should override others, and which are the most authoritative for a specific question
- Gives the correct answer no matter how the question is asked, whether casually, using synonyms or in a long-winded way
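The grounding and "I don't know" behaviours above can be sketched as a simple gate in front of the answer generator. This is a minimal illustration, not a definitive implementation: the `Hit` type, the `grounded_answer` function, the `generate` callable and the `min_score` threshold of 0.75 are all hypothetical names and values chosen for the example, and it assumes the retriever returns similarity scores where higher means a closer match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    """One retrieved passage (hypothetical shape for this sketch)."""
    doc_id: str
    text: str
    score: float  # assumed: similarity score, higher means closer


def grounded_answer(
    question: str,
    hits: List[Hit],
    generate: Callable[[str, List[str]], str],
    min_score: float = 0.75,  # assumed threshold; tune on real data
) -> Dict[str, object]:
    """Answer only from sufficiently strong passages, otherwise abstain."""
    strong = sorted(
        (h for h in hits if h.score >= min_score),
        key=lambda h: h.score,
        reverse=True,
    )
    if not strong:
        # Retrieval is weak: decline rather than let the model guess.
        return {"answer": "I don't know.", "sources": []}
    # The generator only ever sees the passages that passed the gate.
    answer = generate(question, [h.text for h in strong])
    # Every answer is returned with the documents it was built from.
    return {"answer": answer, "sources": [h.doc_id for h in strong]}
```

In practice `generate` would wrap a language model call; keeping it as a parameter means the abstain logic can be checked without one.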
Quick RAG
When it comes to deployment, getting a RAG system live can be achieved very quickly, given a certain scope.
Choosing a simple set of tools, limiting the data to be ingested, organising the content for quick search, connecting it to a secure access point, and then making it available to users can all be carried out in hours.
A quick RAG deployment will have its limitations and sacrifices. Speed as the priority will result in the absence of fine-tuning, and customisation and optimisation will have to be addressed down the road. Other sacrifices will be that answers may be okay but not perfect, and the understanding of complex questions may be limited.
Quick RAG – forgoing the good
It is entirely possible and feasible to set up a RAG system very quickly.
By simply using website crawling as the data source and putting it into the vector store, a chatbot can be live within hours. However, websites are not designed or structured with RAG in mind. Herein lies the difficulty of having both good and quick RAG at the same time.
Websites are built for user experience, ease of navigation, and in many instances to generate sales. They are narrative based, hierarchical and spread across multiple pages with drop downs, click throughs and the like. The ‘R’ in RAG stands for retrieval, and a RAG system is built to retrieve. However, content designed for navigation and content designed for retrieval differ, and that is the rub.
Spending time organising the content is the basis for good RAG, which means quick RAG deployment will come with compromises. For example, when using website data as the content source for quick RAG deployment, the website design will affect the quality of answers for a number of reasons:
- Chunking website data for a RAG system often loses critical context, brings together unrelated topics, and includes ‘noise’ such as navigation bars, calls to action, and cookie text.
- On website pages there are many paragraphs that may look related but aren’t. There are also paragraphs that may be repeated across many pages. Both scenarios have implications when it comes to delivering quick and good RAG. Multiple pages of similar or identical text result in duplicate embeddings and confusing similarity-based matches, so recall may be high while retrieval precision is low.
- Website pages are optimised for SEO. There are repeated words, vague language, and specifics are put into PDFs. This structure and design doesn’t work well for RAG. RAG works on statements, facts and clear scope and constraints. Without this, guesses are made to fill the gaps. RAG needs grounding to avoid hallucinations.
- Most websites have common navigation elements that appear at the bottom of every page. During the website crawl these get embedded hundreds of times, pushing out real content and becoming the dominant content in the vector store. This results in the loss of both quality and speed of retrieval.
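Two of the problems above, repeated navigation text and duplicate paragraphs, can be reduced with a light cleaning pass before anything is embedded. The sketch below is one possible approach under stated assumptions: crawled pages are available as plain text keyed by URL, boilerplate shows up as identical lines on many pages, and the function names and the `max_share` threshold of 0.5 are hypothetical choices for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def strip_repeated_lines(pages: Dict[str, str], max_share: float = 0.5) -> Dict[str, str]:
    """Drop lines (menus, footers, cookie text) that recur across many pages."""
    # Count on how many pages each distinct non-empty line appears.
    page_count: Counter = Counter()
    for text in pages.values():
        for line in {l.strip() for l in text.splitlines() if l.strip()}:
            page_count[line] += 1

    limit = max_share * len(pages)

    def is_boilerplate(line: str) -> bool:
        # Must appear on more than one page and on more than max_share of pages.
        return page_count[line] > 1 and page_count[line] > limit

    cleaned = {}
    for url, text in pages.items():
        lines = [l.strip() for l in text.splitlines() if l.strip()]
        cleaned[url] = "\n".join(l for l in lines if not is_boilerplate(l))
    return cleaned


def dedupe_chunks(chunks: List[str]) -> List[str]:
    """Keep the first copy of each chunk, ignoring case and whitespace differences."""
    seen = set()
    unique = []
    for chunk in chunks:
        normalised = " ".join(chunk.lower().split())
        key = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```

Exact-match hashing only catches identical text after normalisation; near-duplicates with small wording changes would need a fuzzier comparison, which is outside this sketch.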
Quick RAG deployment does not correlate to good RAG unless the heavy lifting of content management has been done prior, because the content used is the foundation of good RAG.
Good and quick RAG
As mentioned before, it is entirely possible and feasible to get a RAG system up and running quickly. However, with the quick there must be the sacrifice of good. That is, unless the heavy lifting with regard to content management (cleaning, processing, indexing) is already done.
Combining quick RAG with good RAG, on the fly, is only beneficial to organisations when certain conditions prevail:
- Answer scope is limited
- Content is restricted, e.g. to internal documents or product manuals
- All information shared as a fact is accompanied by a specific, verifiable source
The most desirable “quick and good RAG” scenario is one where time has been taken to ensure content is correctly structured, safeguarding is built-in, and all data is verifiable. This results in the highest levels of accuracy, trustworthiness, and transparency. The next step can be quick RAG deployment.
Good RAG must be a priority for organisations that value reputation, trust, and the customer experience. This means:
- Data has already been verified and curated
- Content is organised into clear, digestible pieces
- Embeddings are created upfront
- Indexes are built for fast searching
Good data requires maintenance
Good RAG needs good data. Data changes over time and it must be maintained to ensure that good RAG doesn’t turn into bad RAG. Over time, new documents and content will be created, and old documents will be updated or will become obsolete and be removed. Models are also constantly improving, which affects data retrieval.
When content maintenance is not carried out, both quality and speed are affected. Wrong or outdated information may be retrieved, resulting in wrong or outdated answers being shared, and retrieval may be slowed because of incorrect indexes or data that is disconnected.
Organisations need to invest in maintenance to ensure the continuity of good and quick RAG. This involves:
- Removing outdated and obsolete documents
- Updating embeddings when models are updated
- Reindexing as appropriate
- Monitoring for accuracy of retrieval
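The maintenance tasks above can be sketched as a periodic check over index metadata. This is an illustrative outline only: the `IndexedDoc` fields, the `plan_maintenance` and `precision_at_k` functions, and the idea that each index entry records its embedding model and dates are assumptions made for the example, not features of any particular vector store.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set


@dataclass
class IndexedDoc:
    """Metadata assumed to be stored alongside each indexed document."""
    doc_id: str
    embedding_model: str      # model used when the embedding was created
    embedded_on: date         # when the embedding was created
    source_updated_on: date   # when the source document last changed
    retired: bool = False     # marked outdated or obsolete by a content owner


def plan_maintenance(docs: List[IndexedDoc], current_model: str) -> Dict[str, List[str]]:
    """Work out which documents to remove and which to re-embed."""
    plan: Dict[str, List[str]] = {"remove": [], "re_embed": []}
    for doc in docs:
        if doc.retired:
            plan["remove"].append(doc.doc_id)
        elif doc.embedding_model != current_model or doc.source_updated_on > doc.embedded_on:
            # Either the model changed or the content changed since embedding.
            plan["re_embed"].append(doc.doc_id)
    return plan


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k retrieved document IDs that are actually relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k
```

Running `precision_at_k` over a small, hand-labelled set of test questions after each reindex is one way to notice retrieval quality slipping before users do.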
RAG systems that have access to clean, well-organised data deliver fast and accurate responses. This is what good and quick RAG is about – fast, accurate and trustworthy answers. The caveat is that good RAG takes time.
Good and quick RAG takes time
Good RAG is dependent on the quality of the data pool it is retrieving from. Achieving enterprise-level data quality to ensure accurate, trustworthy answers is a time-intensive process. And it requires ongoing commitment to maintenance.
Bad RAG damages business because it erodes the trust customers have in a brand. A chatbot is only as good as its RAG system and design. Customers also want their answers quickly, so fast RAG is critical too: no matter how accurate the answers, customers don’t want to wait for them.
Organisations that deploy RAG chatbots that are well designed invest in the time required to get the content into the best state possible to deliver quality responses. This approach results in higher customer satisfaction and better customer experiences.
RAG needs to be done right to deliver real business value. RAG done quickly will come with compromises. For enterprise-grade chatbots, RAG takes time because ensuring the content is fit for purpose takes time.
Reputation, trust and accuracy are at stake when customers engage with chatbots, voicebots and virtual assistants. Taking the time to ensure the safeguarding of consumers and the business must be a priority. This starts with the data.
