3.4 Optimize Vector Query Performance¶
The vector
extension adds vector storage and similarity search capabilities to Azure Database for PostgreSQL flexible server. To improve efficiency when executing vector searches, you can use the pg_diskann
extension, which allows you to leverage DiskANN to create indexes on tables containing a vector column. Indexing can help improve the performance of vector queries against the database. Without indexes, the vector
extension performs an exact search, which provides perfect recall at the expense of performance. To perform approximate nearest neighbor searches, you can create DiskANN indexes on your data, which offers high recall, high queries per second, and low query latency, even for billion-point datasets.
Always load your data before indexing it.
Loading data before indexing creates the index faster and results in a more optimal layout.
Index Types¶
There are three supported vector index types, each with its own tradeoffs. You can learn more about the index types using these links:
- Disk Approximate Nearest Neighbor (DiskANN)
- Inverted File with Flat Compression (IVVFlat)
- Hierarchical Navigable Small Worlds (HNSW)
The IVFFlat
index has faster build times and uses less memory than HNSW
, but it has lower query performance (in terms of the speed-recall tradeoff). DiskANN
minimizes these tradeoffs and provides a superior balance between highly accurate query performance and fast build times.
Index Type Limitations
- To index a column, it must have dimensions defined. Attempting to index a column defined as
col vector
will result in the error:ERROR: column does not have dimensions
. - When using
ivfflat
andhnsw
, you are limited to indexing columns with 2000 dimensions or less. Attempting to index a column with more dimensions results in an error:ERROR: column cannot have more than 2000 dimensions for INDEX_TYPE index where INDEX_TYPE is either ivfflat or hnsw
.
Index Access and Vector Distance Functions¶
The vector
type added to your database by the vector
extension allows you to perform three types of searches on the stored vectors. You need to select the correct access function for your index to have the database consider your index when executing your queries.
Distance Function | Index Access Function | Vector Operator | Description |
---|---|---|---|
Cosine distance | vector_cosine_ops |
<=> |
Measures similarity between vectors by calculating the cosine of the angle between them. |
L2 (or Euclidean) distance | vector_l2_ops |
<-> |
Measures the straight-line distance between two vectors in multi-dimensional space. |
Inner (dot) product | vector_ip_ops |
<#> |
Measures the similarity between two vectors by summing the products of their corresponding components. |
Vector Operators¶
The vector operators in the table above indicate the "distance operator" used to calculate the distances between two vectors in a multi-dimensional space. They are used when writing queries to calculate similarity. For example, in the following query, records are selected based on the similarity to the provided vector, where the cosine distance similarity is less than 0.5:
1 2 |
|
Create Vector Indexes¶
In the previous tasks, you added vector
columns to several tables in your database and populated them with embeddings using the azure_ai
extension and Azure OpenAI. To ensure queries over those embeddings are efficient, you will add diskann
indexes to each table to which you added a vector
column.
Select the tab for each table below and execute the CREATE INDEX
statement to create an diskann
index.
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.
SQL | |
---|---|
1 2 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.
SQL | |
---|---|
1 2 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.
SQL | |
---|---|
1 2 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.
SQL | |
---|---|
1 2 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the index.
SQL | |
---|---|
1 2 |
|
Be sure to run the CREATE INDEX
statements for each of the tables before moving on to the next step.