Skip to content

3.5 Vectorize data

By leveraging an embedding model, such as Azure OpenAI's text-embedding-ada-002, you can generate vector representations of textual data and store them in a vector store like Azure Database for PostgreSQL - Flexible Server. This approach facilitates efficient and accurate similarity searches, significantly enhancing the copilot's ability to retrieve relevant information and provide contextually rich interactions.

The azure_openai schema installed by the azure_ai extension contains the create_embeddings() function. This function enables you to generate embeddings for text input by invoking an embedding model deployed in Azure OpenAI directly from a query.

Function signatures for the create_embeddings() function
1
2
3
4
5
-- Single text input
azure_openai.create_embeddings(deployment_name text, input text, timeout_ms integer DEFAULT 3600000, throw_on_error boolean DEFAULT true, max_attempts integer DEFAULT 1, retry_delay_ms integer DEFAULT 1000)

-- Array of input text
azure_openai.create_embeddings(deployment_name text, input text[], batch_size integer DEFAULT 100, timeout_ms integer DEFAULT 3600000, throw_on_error boolean DEFAULT true, max_attempts integer DEFAULT 1, retry_delay_ms integer DEFAULT 1000)

Learn more about the create_embeddings() function, its overloads, and expected arguments in the function documentation.

Generate embeddings

The azure_ai extension makes calling the Azure OpenAI embedding API trivial. In its simplest form, the create_embeddings() function can be called with two arguments, deployment_name and input, as shown below:

SQL
1
SELECT azure_openai.create_embeddings(deployment_name, input)

To demonstrate how to generate vector embeddings through a SQL query, execute the following query in pgAdmin.

  1. Return to the open instance of pgAdmin on your local machine and ensure it is connected to your PostgreSQL database.

  2. In the pgAdmin Object Explorer, expand databases under your PostgreSQL server.

  3. Right-click the contracts database and select Query Tool from the context menu.

  4. Run the following query, which creates a vector embedding for the result field in the invoice_validation_results table. The deployment_name parameter in the function is set to embeddings, which is the name of the deployment of the text-embedding-ada-002 model in your Azure OpenAI service (it was created with that name by the Bicep deployment script):

    Execute the SQL command below!

    SQL
    1
    2
    3
    4
    5
    SELECT 
        invoice_id,
        azure_openai.create_embeddings('embeddings', result) AS embedding
    FROM invoice_validation_results
    LIMIT 1;
    

    You can view the deployment name of your embedding model in Azure AI Foundry.

    1. Open Azure AI Foundry from the landing page of your Azure OpenAI service.
    2. In Azure AI Foundry, select the Deployments option from the resource navigation menu.
    3. Observe the Name associated with the text-embedding-ada-002 model.

      Screenshot of Azure AI Foundry, with the Deployments page selected and highlighted. The embeddings model deployment name is highlighted.

  5. The results of the query will look similar to this:

    SQL
    1
    2
    3
       id |                              embedding
    ------+-------------------------------------------------------------------------
        1 | {-0.031766646,-0.033289704,0.0009468119,...,0.016508864,0.031440277}
    

    The 1,536 dimensions in the vector are abbreviated in the above output for brevity.

Vectorize existing data

You added vector columns to the deliverables, invoice_line_items, invoice_validation_results,  sow_chunks, and sow_validation_results tables. You will now use the azure_openai.create_embeddings() function in SQL UPDATE statements to generate embeddings for the text data already in each table.

Each table query may take several minutes, depending on the configured TPM limits.

Using pgAdmin, execute the SQL statement for each table.

Select the tab for each table below and execute the UPDATE statement to create embeddings for the specified column.

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query.

SQL
1
2
3
UPDATE deliverables
SET embedding = azure_openai.create_embeddings('embeddings', description, max_attempts => 5, retry_delay_ms => 500)
WHERE embedding IS NULL;

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query.

SQL
1
2
3
UPDATE invoice_line_items
SET embedding = azure_openai.create_embeddings('embeddings', description, max_attempts => 5, retry_delay_ms => 500)
WHERE embedding IS NULL;

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query.

SQL
1
2
3
UPDATE invoice_validation_results
SET embedding = azure_openai.create_embeddings('embeddings', result, max_attempts => 5, retry_delay_ms => 500)
WHERE embedding IS NULL;

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query.

SQL
1
2
3
UPDATE sow_chunks
SET embedding = azure_openai.create_embeddings('embeddings', content, max_attempts => 5, retry_delay_ms => 500)
WHERE embedding IS NULL;

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query.

SQL
1
2
3
UPDATE sow_validation_results
SET embedding = azure_openai.create_embeddings('embeddings', result, max_attempts => 5, retry_delay_ms => 500)
WHERE embedding IS NULL;

Be sure to run the UPDATE statements for each table to create all the embeddings before moving on to the next step.

Generate Embeddings on Database INSERT using a Trigger

The code for this application performs the calls to the azure_openai extension to generate the vector embedding directly within the database INSERT command. There are times where it's more useful to use an INSERT trigger on the table. The trigger will automatically generate the embeddings anytime a row is inserted into table.

The following is an example of what an INSERT trigger for generating vector embedding on the sow_chunks table might look:

Example INSERT Trigger to Generate Embeddings
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- create function that generates embeddings
CREATE OR REPLACE FUNCTION sow_chunks_insert_trigger_fn()
RETURNS trigger AS $$
BEGIN
  IF NEW.content IS NOT NULL THEN
    NEW.embeddings := azure_openai.create_embeddings('embeddings', NEW.content, throw_on_error => FALSE, max_attempts => 1000, retry_delay_ms => 2000);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- setup INSERT TRIGGER to call function
CREATE TRIGGER sow_chunks_insert_trigger
BEFORE INSERT ON sow_chunks
FOR EACH ROW
EXECUTE FUNCTION sow_chunks_insert_trigger_fn();