Skip to content

3.4 Optimize Vector Query Performance

The vector extension adds vector storage and similarity search capabilities to Azure Database for PostgreSQL flexible server. To improve efficiency when executing vector searches, you can use the pg_diskann extension, which allows you to leverage DiskANN to create indexes on tables containing a vector column. Indexing can help improve the performance of vector queries against the database. Without indexes, the vector extension performs an exact search, which provides perfect recall at the expense of performance. To perform approximate nearest neighbor searches, you can create DiskANN indexes on your data, which offers high recall, high queries per second, and low query latency, even for billion-point datasets.

Always load your data before indexing it.

Loading data before indexing creates the index faster and results in a more optimal layout.

Index Types

There are three supported vector index types, each with its own tradeoffs. You can learn more about the index types using these links:

The IVFFlat index has faster build times and uses less memory than HNSW, but it has lower query performance (in terms of the speed-recall tradeoff). DiskANN minimizes these tradeoffs and provides a superior balance between highly accurate query performance and fast build times.

Index Type Limitations
  • To index a column, it must have dimensions defined. Attempting to index a column defined as col vector will result in the error: ERROR: column does not have dimensions.
  • When using ivfflat and hnsw, you are limited to indexing columns with 2000 dimensions or less. Attempting to index a column with more dimensions results in an error: ERROR: column cannot have more than 2000 dimensions for INDEX_TYPE index where INDEX_TYPE is either ivfflat or hnsw.

Index Access and Vector Distance Functions

The vector type added to your database by the vector extension allows you to perform three types of searches on the stored vectors. You need to select the correct access function for your index to have the database consider your index when executing your queries.

Distance Function Index Access Function Vector Operator Description
Cosine distance vector_cosine_ops <=> Measures similarity between vectors by calculating the cosine of the angle between them.
L2 (or Euclidean) distance vector_l2_ops <-> Measures the straight-line distance between two vectors in multi-dimensional space.
Inner (dot) product vector_ip_ops <#> Measures the similarity between two vectors by summing the products of their corresponding components.

Vector Operators

The vector operators in the table above indicate the "distance operator" used to calculate the distances between two vectors in a multi-dimensional space. They are used when writing queries to calculate similarity. For example, in the following query, records are selected based on the similarity to the provided vector, where the cosine distance similarity is less than 0.5:

1
2
SELECT * FROM sow_chunks
WHERE embedding <=> '[-0.031766646,-0.033289704,...,0.016508864,0.031440277]' < 0.5

Create Vector Indexes

In the previous tasks, you added vector columns to several tables in your database and populated them with embeddings using the azure_ai extension and Azure OpenAI. To ensure queries over those embeddings are efficient, you will add diskann indexes to each table to which you added a vector column.

Select the tab for each table below and execute the CREATE INDEX statement to create an diskann index.

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.

SQL
1
2
-- Create a diskann index by using Cosine distance operator
CREATE INDEX deliverables_diskann_idx ON deliverables USING diskann (embedding vector_cosine_ops);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.

SQL
1
2
-- Create a diskann index by using Cosine distance operator
CREATE INDEX line_items_diskann_idx ON invoice_line_items USING diskann (embedding vector_cosine_ops);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.

SQL
1
2
-- Create a diskann index by using Cosine distance operator
CREATE INDEX invoice_validation_results_diskann_idx ON invoice_validation_results USING diskann (embedding vector_cosine_ops);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.

SQL
1
2
-- Create a diskann index by using Cosine distance operator
CREATE INDEX sow_chunks_diskann_idx ON sow_chunks USING diskann (embedding vector_cosine_ops);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.

SQL
1
2
-- Create a diskann index by using Cosine distance operator
CREATE INDEX sow_validation_results_diskann_idx ON sow_validation_results USING diskann (embedding vector_cosine_ops);

Be sure to run the CREATE INDEX statements for each of the tables before moving on to the next step.