What this blog covers:
- Challenges faced by organizations using LLMs
- What is RAG and how RAG enriches LLM responses
- How does RAG work?
- How Kyvos empowers business users with context-rich LLM responses
- The combined benefits of Kyvos Gen AI-Powered semantic layer and RAG for conversational analytics
Before LLMs, search engines were the go-to tool for finding information online. Search engines are good, but they can only point in the direction of information. We need to review search results, evaluate their credibility and piece together the answers ourselves. Then, large language models (LLMs) were introduced as a complementary tool that can understand and respond to our questions, summarize complex topics or even generate creative text formats. These models are adept at identifying patterns in language and generating human-quality text in response to a user query, referred to as a prompt.
However, these models sometimes return wrong answers. For instance, suppose a user asks an LLM, “What is the capital of France?”
The user expects a specific, current answer, i.e., Paris. If the LLM responds with something generic, like,
“The capital of a country is usually the most important city,”
the response is of no use. Similarly, if the model generates a response like,
“The capital of France is Lyon,”
the answer is simply incorrect; Lyon is not, and never has been, the capital of France.
These issues occur when the training data is not up to date or when the knowledge base does not contain the required information. So even though LLMs can generate grammatically correct, human-like text and code with remarkable fluency, they can still hallucinate. They also struggle to incorporate domain-specific knowledge and real-time data, which limits their applicability in many industries.
Another challenge for businesses using LLMs is that the responses these models generate sometimes draw on non-authoritative sources that were included in their training data. Consider how much the source matters when looking for information online: no one would trust a random blog post by someone with no medical background over information provided by a doctor on a reputable medical website. For instance, if a person asks an LLM,
“What’s the cure for the common cold?” the model should ideally rely on credible medical websites or journals and return an accurate response. But if it relies on unreliable sources such as personal blogs, social media posts or outdated websites, its response can be inaccurate or misleading.
The next challenge is the lack of a shared vocabulary between different domains. Sometimes, when an analyst asks a question, the model does not retrieve the intended data because the question uses domain-specific terms that are not reflected in the model’s training data or the knowledge base. For instance, suppose a data analyst asks an LLM to write an SQL query that displays each customer’s full name by combining the first name and last name from the customer table. The LLM might generate a syntactically correct query based on patterns it has observed in SQL code:
SELECT CONCAT(firstname, ' ', lastname) AS full_name FROM customer;
The query above is syntactically valid, but the customer table has no column named “firstname”; the actual column is “first_name.” Models trained on vast amounts of general text, whether the LLM itself or the embedding models used for retrieval, do not know this organization’s naming conventions, so when the query is executed, the system returns an error.
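As a minimal sketch of this failure mode, the snippet below builds a toy SQLite customer table (the table name comes from the example above; the sample data is invented) and shows the generated query failing while a query written against the real schema succeeds:

```python
import sqlite3

# Toy schema: the real column names use underscores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO customer VALUES ('Ada', 'Lovelace')")

# The LLM-generated query guesses 'firstname' and fails
# (SQLite uses || for string concatenation).
try:
    conn.execute("SELECT firstname || ' ' || lastname AS full_name FROM customer")
except sqlite3.OperationalError as err:
    print("Generated query failed:", err)  # no such column: firstname

# A query written against the actual schema succeeds.
print(conn.execute(
    "SELECT first_name || ' ' || last_name AS full_name FROM customer"
).fetchall())  # [('Ada Lovelace',)]
```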
To address these challenges, Meta (then Facebook) introduced retrieval-augmented generation (RAG) in a 2020 research paper.
What is RAG?
Retrieval-augmented generation (RAG) is a technique that enables LLMs to access external information sources, such as an enterprise data repository, to enhance their responses to prompts. Large language models have demonstrated impressive capabilities in natural language processing (NLP). However, their effectiveness in specific business scenarios can be limited. For instance, if a business user asks an LLM:
“What was the revenue in Q1 from customers in the northeast region?”
This question is the prompt. While the LLM might understand the general meaning of the words (“revenue,” “Q1,” “customers,” etc.), it cannot interpret the prompt within the context of a specific business’s database structure. Let’s break this prompt into two sections:
What was the revenue?
This part of the question is relatively straightforward for an LLM to understand, and it will typically generate an accurate response.
However, the second part of the question, “in Q1 from customers in the northeast region,” introduces complexities that challenge LLMs. It specifies a timeframe (Q1), filters by customer location (the northeast region) and implies a connection to a specific data source (revenue figures). LLMs are not inherently trained to understand these nuances, the structure of the underlying database schema or the user’s particular intent when answering questions like this.
In such situations, RAG acts as a bridge between the LLM and a pre-determined external source that contains the information required to answer the complete prompt. It can access and process relevant data from various sources and extract the information most relevant to the specific business context. In this case, details about Q1 revenue for northeast customers are injected into the prompt before it is fed to the LLM. This is how RAG enriches the prompt with additional context and enables LLMs to generate more comprehensive and accurate responses.
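As a rough illustration, prompt augmentation can be as simple as prepending the retrieved context to the user’s question before it reaches the model. The build_augmented_prompt helper and the context snippets below are hypothetical, not the API of any particular RAG framework:

```python
def build_augmented_prompt(user_prompt: str, retrieved_context: list[str]) -> str:
    """Inject retrieved business context into the prompt before it reaches the LLM."""
    context_block = "\n".join(f"- {snippet}" for snippet in retrieved_context)
    return (
        "Answer the question using the business context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {user_prompt}"
    )

# Hypothetical context a retriever might pull for this prompt.
context = [
    "Table sales(revenue, order_date, region) holds revenue per order.",
    "Q1 covers order_date between 2024-01-01 and 2024-03-31.",
    "The 'Northeast' region maps to region codes NE-01 through NE-06.",
]
print(build_augmented_prompt(
    "What was the revenue in Q1 from customers in the northeast region?",
    context,
))
```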
How Does RAG Work?
While LLMs are trained on massive datasets, without RAG they lack the ability to access and integrate real-time information. Their responses are restricted to the knowledge available up to the day they were trained. RAG is not limited to that training data; it makes LLMs more capable by giving them access to current information. It introduces an information retrieval component that uses the prompt to pull information from an external data source and passes the retrieved content, along with the query, back to the LLM so it can generate accurate responses. But before moving to how RAG works, let’s understand the concept of vector databases.
Addressing the challenges of LLMs requires accessing and integrating data from multiple sources, such as PDFs, other business applications or images. The system must aggregate and utilize the appropriate data points to provide an accurate and relevant response. This is where vector databases offer a solution for integrating diverse data sources with LLMs. These databases store data as numerical vectors, arrays of numbers that machine learning models can process efficiently.
Most importantly, vector databases can handle both structured data, which is organized in a predefined format and typically stored in rows and columns within a database table, and unstructured data, such as emails and social media posts, which follows no predefined format. While they provide a powerful way to integrate diverse data with LLMs, they do not inherently understand the meaning of the information they store. And there are situations where the relevant information resides in complex unstructured formats that challenge the LLM’s ability to grasp the context, or that require domain-specific knowledge not captured in the vector database.
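As a toy illustration of this idea, the sketch below keeps a numerical vector alongside each record’s payload, whether that payload came from a structured row or an unstructured note; a real vector database would store model-generated embeddings with hundreds or thousands of dimensions:

```python
from dataclasses import dataclass

@dataclass
class Record:
    vector: list[float]   # numerical representation used for similarity search
    payload: str          # the original content, structured or unstructured
    kind: str             # "structured" or "unstructured"

# Toy store: structured rows and free-text notes live side by side,
# each indexed by a vector rather than by exact keywords.
store = [
    Record([0.9, 0.1], "customer_id=42, region=Northeast, q1_revenue=125000", "structured"),
    Record([0.2, 0.8], "Support note: customer 42 asked about renewal pricing in March.", "unstructured"),
]

for rec in store:
    print(rec.kind, rec.vector, rec.payload)
```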
To see this limitation in practice, consider a healthcare scenario in which a doctor uses an LLM to get answers about a specific patient’s medical history. Here, the vector database contains structured information such as demographics, diagnoses and medication history (stored in rows and columns), as well as unstructured data such as doctors’ notes and clinical trial reports.
Now, suppose the LLM is trained on a massive dataset of medical text, including research papers, textbooks and even some clinical notes, and can also query the vector database to retrieve relevant patient information based on keywords or search terms. It can find specific data points such as medications or diagnoses in the structured data and may recognize some keywords in the unstructured notes, but it might not understand the medical terminology or abbreviations the doctor uses. As a result, the LLM may miss the full context of the notes and lack the domain-specific knowledge needed to assess the patient’s condition, such as whether the patient needs surgery and what risks that surgery would carry.
This is where RAG acts as a complementary approach alongside vector databases. Now, let’s see how RAG works:
The process starts with the user’s prompt, a question or an instruction. RAG receives this query, which can arrive in various formats, such as text or code, making a relevance search against external sources difficult. To perform that search, RAG converts both the document collection (the knowledge base) and the user-submitted query into numerical representations called “embeddings,” dense vectors that capture the meaning of the text, using embedding language models. This creates a common ground for comparing the user’s query embedding with the pre-computed embeddings of all the documents in the knowledge base.
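The sketch below uses a deliberately crude bag-of-words embedding as a stand-in for a real embedding language model, just to show how the knowledge-base documents and the user’s query both end up as vectors in the same space:

```python
import math

VOCAB = ["revenue", "q1", "northeast", "customer", "region", "cold", "cure"]

def embed(text: str) -> list[float]:
    """Toy embedding: count vocabulary words, then normalize to unit length.
    A real system would call an embedding language model instead."""
    tokens = [t.strip("?,.").lower() for t in text.split()]
    counts = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

# The knowledge-base documents are embedded once, ahead of time...
documents = [
    "Q1 revenue by region for all customer accounts",
    "Home remedies and the cure for the common cold",
]
doc_embeddings = [embed(doc) for doc in documents]

# ...and the user's query is embedded into the same vector space at question time.
query_embedding = embed("What was the revenue in Q1 from customers in the northeast region")
print(query_embedding)
```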
Then, RAG performs a semantic search that goes beyond simple keyword matching. It focuses on understanding the meaning and intent behind the prompt to identify the most relevant data objects within the knowledge base, including semantically similar information that might be phrased differently but carries the same meaning. After relevant data points are found through semantic search, the retrieved content from the knowledge base is combined with the original user prompt.
Finally, the enriched prompt, equipped with this additional information, is fed to the LLM to generate a more comprehensive and accurate response to the user’s prompt. Overall, RAG empowers LLMs to generate more informative and reliable responses to complex user queries by utilizing semantic search, embeddings and the knowledge base.
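Putting these steps together, a minimal retrieval loop can rank pre-computed document embeddings by cosine similarity to the query embedding, keep the best match and splice it into the prompt. The embeddings below are hand-written placeholders for what an embedding model would produce, and sending the enriched prompt to the LLM is represented by a simple print:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Pre-computed embeddings (hand-written here for brevity; in practice they
# come from an embedding model and live in a vector database).
knowledge_base = [
    ("Q1 revenue by region, including Northeast customer accounts", [0.9, 0.1, 0.3]),
    ("Employee onboarding checklist and HR policies",               [0.1, 0.9, 0.2]),
]
query = "What was the revenue in Q1 from customers in the northeast region?"
query_embedding = [0.8, 0.2, 0.4]  # produced by the same embedding model

# Semantic search: rank documents by similarity to the query embedding.
best_doc, _ = max(knowledge_base, key=lambda item: cosine_similarity(item[1], query_embedding))

# Enrich the prompt with the retrieved content and hand it to the LLM.
enriched_prompt = f"Context: {best_doc}\n\nQuestion: {query}"
print(enriched_prompt)  # in practice, this string is sent to the LLM API
```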
How Kyvos Helps Improve LLM Responses
Kyvos Copilot empowers business users to interact with their metrics in natural language to obtain summaries, contextualized insights and much more. It goes beyond simple keyword matching and leverages semantic search and RAG to unlock deeper insights and context. Here’s how it elevates LLM capabilities:
- Semantic search and information retrieval: When a user submits a question or prompt, Kyvos Copilot, with the help of the LLM, analyzes the query to understand its underlying meaning and intent. It uses semantic search to compare the user’s query vector against the vectors of Kyvos semantic model metadata in the knowledge base, looking for semantic similarity rather than identical keywords. This helps identify the most relevant dimensions and metrics within the vector database to accurately address the user’s query.
- Context enrichment for LLMs: The identified dimensions and metrics become the context the LLM uses to respond to the user’s query, enabling context-aware data exploration. Guided by contextual prompts and intelligent data selection, Kyvos ensures a seamless journey from data to insights. Additionally, Kyvos’ semantic layer provides definitions, relationships between data points and other relevant details that deepen the LLM’s understanding. It also keeps track of previous questions and responses within the conversation, so it can recognize the specific data point (KPI) being referred to even when it is not explicitly mentioned in the current question. Based on the ongoing conversation flow, Kyvos tailors its responses to the user’s intent. This comprehensive information is then fed into the LLM.
- Improved visualizations: Kyvos Copilot intelligently selects the chart type that best communicates trends, comparisons or proportions. Users can also choose from a pre-defined set of chart types (bar chart, line chart, pie chart, etc.) instead of relying solely on Kyvos’ automatic selection. Additionally, Kyvos provides natural language summaries that help users grasp the key patterns behind their queries.
The Combined Benefits of Kyvos Gen AI-Powered Semantic Layer and RAG for Conversational Analytics
Kyvos’ Gen AI-powered semantic layer, working in tandem with RAG, empowers conversational analytics with the following benefits:
Using the Kyvos semantic layer, businesses can define their KPIs and the relationships between them in a clear and consistent way. The layer maps data to meaningful business definitions and lets LLMs query data through it, giving them a unified representation of the data. So, when an LLM encounters a query, RAG and the semantic layer can provide context for each part of the question. With this context, LLMs can formulate accurate responses, reducing the likelihood of misinterpreting the question or misunderstanding the relationships between data points, which means far fewer hallucinations.
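For illustration only, and not in Kyvos’ actual definition format, a semantic-layer entry for a KPI might capture its business name, underlying columns and the dimensions it can be sliced by, which is exactly the kind of context that can be handed to the LLM:

```python
# Hypothetical, simplified semantic-layer definition; field names are illustrative.
revenue_kpi = {
    "name": "Revenue",
    "description": "Total revenue recognized from customer orders.",
    "source": {"table": "sales", "column": "revenue", "aggregation": "SUM"},
    "dimensions": ["order_date", "region", "customer_id"],
    "synonyms": ["sales", "turnover"],
}

def describe_kpi(kpi: dict) -> str:
    """Turn a KPI definition into plain-language context for the LLM prompt."""
    src = kpi["source"]
    return (
        f"{kpi['name']}: {kpi['description']} "
        f"Computed as {src['aggregation']}({src['table']}.{src['column']}); "
        f"can be filtered or grouped by {', '.join(kpi['dimensions'])}."
    )

print(describe_kpi(revenue_kpi))
```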
The Kyvos semantic layer also acts as a front door that helps organizations establish strong data governance. When LLMs query data through Kyvos, the risk of unauthorized data access is minimized. Its multi-layered security architecture gives organizations granular, row- and column-level control so that data is accessible only to authorized users based on their roles. This prevents unauthorized access and data leaks, fosters trust in using RAG and LLMs, and keeps the organization’s sensitive information protected.
While LLMs are powerful, they can struggle with context and outdated information. RAG, with a semantic layer, bridges this gap. Techniques like semantic search can enable context enrichment to empower LLMs to access and understand relevant information and generate accurate, informative and trustworthy responses. RAG paves the way for a future where LLMs become invaluable tools for unlocking deeper insights and knowledge.