5.2 Review RAG Design
The solution leverages the Retrieval Augmented Generation (RAG) design pattern to ensure the copilot's responses are grounded in the (private) data maintained by Woodgrove Bank. RAG works by retrieving relevant information from your data stores and augmenting the model prompt to create a composite prompt. This enhanced prompt is used by the LLM to generate the final response or completion.
To understand how the RAG design pattern works in the context of the Woodgrove Bank Contract Management application, select each tab in order and review the sequence of events shown in the figure above.
The user query arrives at our copilot implementation via the chat API endpoint.
User queries are submitted via the copilot interface of the Woodgrove Bank Contract Management portal. These queries are sent to the backend API's `/chat` endpoint. The incoming user query has three components:

- The user's question (text input)
- An optional chat history (object array)
- A max history setting, which indicates the number of chat history messages to include in the prompt

The API extracts these parameters from the incoming request and starts the workflow that follows the RAG design pattern.
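As a rough sketch, the incoming payload could be modeled as follows. The field names and default value here are illustrative assumptions, not the actual API contract:

```python
from dataclasses import dataclass, field


@dataclass
class ChatRequest:
    """Illustrative shape of the /chat request body (names are assumptions)."""
    message: str                                             # the user's question
    chat_history: list = field(default_factory=list)         # prior messages (object array)
    max_history: int = 6                                     # history messages to keep in the prompt


def trim_history(request: ChatRequest) -> list:
    """Keep only the most recent `max_history` chat messages for the prompt."""
    if request.max_history <= 0:
        return []
    return request.chat_history[-request.max_history:]
```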
Embeddings representing the user query are generated.
When performing similarity searches to find data, the `/chat` endpoint sends the text of the user's request to Azure OpenAI, where it is vectorized using an embedding model (e.g., Azure OpenAI's `text-embedding-ada-002`). This vector is then used to retrieve similar records in the next step.
Generate Embeddings
The `__create_query_embeddings` function below in the Woodgrove Bank API uses the LangChain `AzureOpenAIEmbeddings` class to generate embeddings for the provided user query.
`src/api/app/functions/chat_functions.py`
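The listing itself is not reproduced here, but the idea can be sketched as follows, with the embeddings client injected so the LangChain dependency stays outside the function. The actual `__create_query_embeddings` implementation may differ:

```python
# In the actual API, the client would be LangChain's AzureOpenAIEmbeddings, e.g.:
#   from langchain_openai import AzureOpenAIEmbeddings
#   embeddings_client = AzureOpenAIEmbeddings(azure_deployment="text-embedding-ada-002")


def create_query_embeddings(user_query: str, embeddings_client) -> list:
    """Vectorize the user's query text so it can be compared, via cosine
    distance, against the embedding columns in the database."""
    # embed_query is the LangChain Embeddings interface method for a single text.
    return embeddings_client.embed_query(user_query)
```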
This step may be skipped in cases where raw data is requested, such as a list of vendors.
Queries are executed against the database to retrieve (private) data.
This step retrieves data from the PostgreSQL database to "augment" the prompt. Depending on the user query, hybrid search and direct data retrieval techniques may be used to obtain relevant records.
Improve RAG accuracy
The accuracy of the RAG pattern can be improved by using techniques like semantic ranking to order the returned results and GraphRAG to identify relationships between data. You will learn about these techniques in the next task.
Select each tab below to learn more about the implementation of Hybrid Search and Direct Data Retrieval in the context of the Woodgrove Bank Contract Management application!
Hybrid search in Azure Database for PostgreSQL combines traditional full-text search functionality with the vector similarity search capabilities enabled by the `azure_ai` and `vector` extensions to deliver highly relevant results. This dual approach leverages the precision of keyword matching with full-text search and the contextual understanding of vector search, ensuring that users obtain exact matches and semantically related content. This synergy enhances search efficiency, provides a richer user experience, and supports diverse use cases—from technical document retrieval to broad content discovery—making it an invaluable tool for modern copilots.
Hybrid Search Example Code
The `find_invoice_validation_results` function below provides an example of the hybrid search technique used in the Woodgrove Bank API.
`src/api/app/functions/chat_functions.py`
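The original listing is not reproduced here, but the technique it uses can be sketched as follows. The table and column names (`invoice_validation_results`, `result`, `embedding`) come from the explanation of the query; the asyncpg-style `$n` placeholders are an assumption, used here instead of string interpolation for safety:

```python
def build_hybrid_search_sql(invoice_id=None, vendor_id=None, sow_id=None):
    """Assemble a hybrid-search statement combining an exact ILIKE match with
    a cosine-distance vector search ($1 = user query text, $2 = query
    embeddings). Names and placeholder style are illustrative assumptions."""
    sql = (
        "SELECT result,\n"
        "       CASE\n"
        "           WHEN result ILIKE '%' || $1 || '%' THEN 0\n"  # exact text match ranks highest
        "           ELSE embedding <=> $2\n"                      # otherwise, cosine distance
        "       END AS rank\n"
        "FROM invoice_validation_results\n"
    )
    # Optionally narrow the search to a specific invoice, or vendor + SOW.
    filters = []
    if invoice_id is not None:
        filters.append("invoice_id = $3")
    elif vendor_id is not None and sow_id is not None:
        filters.append("vendor_id = $3 AND sow_id = $4")
    if filters:
        sql += "WHERE " + " AND ".join(filters) + "\n"
    return sql + "ORDER BY rank;"  # most relevant rows first
```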
In the code above:
- The `CASE` statement handles the hybrid search:
    - `WHEN result ILIKE '%{user_query}%'` performs a case-insensitive search for the exact text of the user query. If a match is found, a `rank` of `0` is assigned to the record, ranking it among the highest (most similar) matches.
    - When an exact match is not found, the `ELSE` branch executes a vector similarity search using the cosine distance function, as indicated by the `<=>` vector operator, comparing the embedding representation of the user query (`query_embeddings`) from the previous step to values in the `embedding` column of the `invoice_validation_results` table. The similarity score of each match is assigned as its `rank`; scores closer to zero indicate a more semantically similar result.
- If values are provided, the query is further refined to filter on a specific `invoice_id`, or on a `vendor_id` and `sow_id`.
- Finally, the results are ordered by `rank` to ensure the most relevant search results appear first in the rows returned.
In other cases, such as getting a list of all vendors, query vectorization is unnecessary. Direct data retrieval is handled via simple `SELECT` queries against the database, avoiding the overhead of generating embeddings and querying vector fields.
Direct Data Retrieval Example Code
The `get_vendors` function below provides an example of a direct data retrieval technique used in the Woodgrove Bank API.
`src/api/app/functions/chat_functions.py`
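Since the listing is omitted, here is a minimal sketch of the idea: a plain `SELECT`, with no embedding generation involved. The connection interface (an asyncpg-style `fetch()`) and the `ORDER BY` column are assumptions:

```python
VENDORS_QUERY = "SELECT * FROM vendors ORDER BY name;"


async def get_vendors(conn) -> list:
    """Direct data retrieval: fetch every vendor with a simple SELECT,
    skipping query vectorization entirely. `conn` is assumed to expose an
    asyncpg-style async fetch() method."""
    rows = await conn.fetch(VENDORS_QUERY)
    return [dict(row) for row in rows]
```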
The copilot creates a composite prompt with the retrieved data.
The `/chat` API combines the user's original question with the results returned from the database to create an enhanced, or composite, model prompt, augmenting the model with additional data to use when generating a response.
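Conceptually, this augmentation step amounts to assembling a message list. The system prompt text and message shapes below are illustrative, not Woodgrove Bank's actual prompt:

```python
def build_composite_prompt(user_query, retrieved_data, chat_history, max_history):
    """Combine the user's question, recent chat history, and retrieved
    database records into a single composite prompt for the chat model."""
    # Hypothetical system prompt; the real Woodgrove Bank prompt is not shown here.
    messages = [{"role": "system",
                 "content": "Answer using only the provided Woodgrove Bank data."}]
    if max_history > 0:
        messages.extend(chat_history[-max_history:])  # honor the max history setting
    # Augment the user's question with the data retrieved from PostgreSQL.
    messages.append({
        "role": "user",
        "content": f"{user_query}\n\nRelevant data:\n{retrieved_data}",
    })
    return messages
```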
The chat model uses the prompt to generate a grounded response.
The composite prompt, grounded with (private) data, is sent to a large language "chat" completion model, such as Azure OpenAI's `gpt-4o`. The completion model uses the enhanced prompt (hybrid search results and chat history) as grounding context when generating the completion, improving the quality (e.g., relevance, groundedness) of the response returned by the Woodgrove Bank copilot.
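The final call can be sketched like this, with the client injected. The real API may wire this up differently; the sketch assumes a client following the Azure OpenAI `chat.completions.create(...)` interface, with `gpt-4o` as the deployment named above:

```python
def generate_grounded_response(client, messages, deployment="gpt-4o"):
    """Send the composite prompt to the chat completion model and return the
    grounded completion. `client` is assumed to follow the AzureOpenAI
    chat.completions.create(...) interface."""
    response = client.chat.completions.create(model=deployment, messages=messages)
    return response.choices[0].message.content
```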