5.2 Review RAG Design
The solution leverages the Retrieval Augmented Generation (RAG) design pattern to ensure the copilot's responses are grounded in the (private) data maintained by Woodgrove Bank. RAG works by retrieving relevant information from your data stores and augmenting the model prompt to create a composite prompt. This enhanced prompt is used by the LLM to generate the final response or completion.
To understand how the RAG design pattern works in the context of the Woodgrove Bank Contract Management application, select each tab in order and review the sequence of events shown in the figure above.
The user query arrives at our copilot implementation via the chat API endpoint.
User queries are submitted via the copilot interface of the Woodgrove Bank Contract Management portal. These queries are sent to the backend API's `/chat` endpoint. The incoming user query has three components:

- The user's question (text input)
- An optional chat history (object array)
- A max history setting, which indicates the number of chat history messages to include in the prompt

The API extracts these parameters from the incoming request and starts the workflow that follows the RAG design pattern.
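As a rough sketch, the incoming payload could be modeled as follows. The field names and default value here are illustrative assumptions, not the actual API contract:

```python
from dataclasses import dataclass, field


@dataclass
class ChatRequest:
    """Illustrative shape of the /chat request body (names are assumptions)."""
    message: str                                             # the user's question
    chat_history: list = field(default_factory=list)         # prior messages (object array)
    max_history: int = 6                                     # history messages to keep in the prompt


def trim_history(request: ChatRequest) -> list:
    """Keep only the most recent `max_history` chat messages for the prompt."""
    if request.max_history <= 0:
        return []
    return request.chat_history[-request.max_history:]
```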
Embeddings representing the user query are generated.
When performing similarity searches to find data, the `/chat` endpoint sends the text of the user's request to Azure OpenAI, where it is vectorized using an embedding model (e.g., Azure OpenAI's `text-embedding-ada-002`). This vector is then used to retrieve similar records in the next step.
Generate Embeddings
The `__create_query_embeddings` function below in the Woodgrove Bank API uses the LangChain `AzureOpenAIEmbeddings` class to generate embeddings for the provided user query.
`src/api/app/functions/chat_functions.py`
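The listing itself is not reproduced here, but the idea can be sketched as follows, with the embeddings client injected so the LangChain dependency stays outside the function. The actual `__create_query_embeddings` implementation may differ:

```python
# In the actual API, the client would be LangChain's AzureOpenAIEmbeddings, e.g.:
#   from langchain_openai import AzureOpenAIEmbeddings
#   embeddings_client = AzureOpenAIEmbeddings(azure_deployment="text-embedding-ada-002")


def create_query_embeddings(user_query: str, embeddings_client) -> list:
    """Vectorize the user's query text so it can be compared, via cosine
    distance, against the embedding columns in the database."""
    # embed_query is the LangChain Embeddings interface method for a single text.
    return embeddings_client.embed_query(user_query)
```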
This step may be skipped in cases where raw data is requested, such as a list of vendors.
Queries are executed against the database to retrieve (private) data.
This step retrieves data from the PostgreSQL database to "augment" the prompt. Depending on the user query, hybrid search and direct data retrieval techniques may be used to obtain relevant records.
Improve RAG accuracy
The accuracy of the RAG pattern can be improved by using techniques like semantic ranking to order the returned results and GraphRAG to identify relationships between data. You will learn about these techniques in the next task.
Select each tab below to learn more about the implementation of Hybrid Search and Direct Data Retrieval in the context of the Woodgrove Bank Contract Management application!
Hybrid search in Azure Database for PostgreSQL combines traditional full-text search functionality with the vector similarity search capabilities enabled by the `azure_ai` and `vector` extensions to deliver highly relevant results. This dual approach leverages the precision of keyword matching with full-text search and the contextual understanding of vector search, ensuring that users obtain exact matches and semantically related content. This synergy enhances search efficiency, provides a richer user experience, and supports diverse use cases—from technical document retrieval to broad content discovery—making it an invaluable tool for modern copilots.
Hybrid Search Example Code
The `find_invoice_validation_results` function below provides an example of the hybrid search technique used in the Woodgrove Bank API.
`src/api/app/functions/chat_functions.py`
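The original listing is not reproduced here, but the technique it uses can be sketched as follows. The table and column names (`invoice_validation_results`, `result`, `embedding`) come from the explanation of the query; the asyncpg-style `$n` placeholders are an assumption, used here instead of string interpolation for safety:

```python
def build_hybrid_search_sql(invoice_id=None, vendor_id=None, sow_id=None):
    """Assemble a hybrid-search statement combining an exact ILIKE match with
    a cosine-distance vector search ($1 = user query text, $2 = query
    embeddings). Names and placeholder style are illustrative assumptions."""
    sql = (
        "SELECT result,\n"
        "       CASE\n"
        "           WHEN result ILIKE '%' || $1 || '%' THEN 0\n"  # exact text match ranks highest
        "           ELSE embedding <=> $2\n"                      # otherwise, cosine distance
        "       END AS rank\n"
        "FROM invoice_validation_results\n"
    )
    # Optionally narrow the search to a specific invoice, or vendor + SOW.
    filters = []
    if invoice_id is not None:
        filters.append("invoice_id = $3")
    elif vendor_id is not None and sow_id is not None:
        filters.append("vendor_id = $3 AND sow_id = $4")
    if filters:
        sql += "WHERE " + " AND ".join(filters) + "\n"
    return sql + "ORDER BY rank;"  # most relevant rows first
```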
In the code above:
- The `CASE` statement handles the hybrid search:
    - `WHEN result ILIKE '%{user_query}%'` performs a case-insensitive search for the exact text of the user query. If a match is found, a `rank` of `0` is assigned to the record, ranking it among the highest (most similar) matches.
    - When an exact match is not found, the `ELSE` branch executes a vector similarity search using the cosine distance function, as indicated by the `<=>` vector operator, comparing the embedding representation of the user query (`query_embeddings`) from the previous step to values in the `embedding` column of the `invoice_validation_results` table. The similarity score of each match is assigned as its `rank`; scores closer to zero indicate a more semantically similar result.
- If values are provided, the query is further refined to filter on a specific `invoice_id`, or on a `vendor_id` and `sow_id`.
- Finally, the results are ordered by `rank` to ensure the most relevant search results appear first in the rows returned.
In other cases, such as getting a list of all vendors, query vectorization is unnecessary. Direct data retrieval is handled via simple `SELECT` queries against the database, avoiding the overhead of generating embeddings and querying vector fields.
Direct Data Retrieval Example Code
The `get_vendors` function below provides an example of a direct data retrieval technique used in the Woodgrove Bank API.
`src/api/app/functions/chat_functions.py`
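Since the listing is omitted, here is a minimal sketch of the idea: a plain `SELECT`, with no embedding generation involved. The connection interface (an asyncpg-style `fetch()`) and the `ORDER BY` column are assumptions:

```python
VENDORS_QUERY = "SELECT * FROM vendors ORDER BY name;"


async def get_vendors(conn) -> list:
    """Direct data retrieval: fetch every vendor with a simple SELECT,
    skipping query vectorization entirely. `conn` is assumed to expose an
    asyncpg-style async fetch() method."""
    rows = await conn.fetch(VENDORS_QUERY)
    return [dict(row) for row in rows]
```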
The copilot creates a composite prompt with the retrieved data.
The `/chat` API combines the user's original question with the results returned from the database to create an enhanced, or composite, model prompt, augmenting the model with additional data to use when generating a response.
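Conceptually, this augmentation step amounts to assembling a message list. The system prompt text and message shapes below are illustrative, not Woodgrove Bank's actual prompt:

```python
def build_composite_prompt(user_query, retrieved_data, chat_history, max_history):
    """Combine the user's question, recent chat history, and retrieved
    database records into a single composite prompt for the chat model."""
    # Hypothetical system prompt; the real Woodgrove Bank prompt is not shown here.
    messages = [{"role": "system",
                 "content": "Answer using only the provided Woodgrove Bank data."}]
    if max_history > 0:
        messages.extend(chat_history[-max_history:])  # honor the max history setting
    # Augment the user's question with the data retrieved from PostgreSQL.
    messages.append({
        "role": "user",
        "content": f"{user_query}\n\nRelevant data:\n{retrieved_data}",
    })
    return messages
```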
The chat model uses the prompt to generate a grounded response.
The composite prompt, grounded with (private) data, is sent to a large language "chat" completion model, such as Azure OpenAI's `gpt-4o`. The completion model uses the enhanced prompt (hybrid search results and chat history) as grounding context when generating the completion, improving the quality (e.g., relevance, groundedness) of the response returned by the Woodgrove Bank copilot.
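The final call can be sketched like this, with the client injected. The real API may wire this up differently; the sketch assumes a client following the Azure OpenAI `chat.completions.create(...)` interface, with `gpt-4o` as the deployment named above:

```python
def generate_grounded_response(client, messages, deployment="gpt-4o"):
    """Send the composite prompt to the chat completion model and return the
    grounded completion. `client` is assumed to follow the AzureOpenAI
    chat.completions.create(...) interface."""
    response = client.chat.completions.create(model=deployment, messages=messages)
    return response.choices[0].message.content
```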