3.3 Enable vector storage¶
The pgvector extension is a cutting-edge addition to PostgreSQL that empowers the database with the ability to handle vector data natively. This feature allows you to store and query vector information, making it ideal for applications involving copilots, recommendation systems, and similarity searches. To enable vector embeddings to be stored alongside the rest of your data in a PostgreSQL database, you must alter the tables in which you want to store embeddings to add columns with the vector
data type provided by the extension.
What are vectors?¶
Vectors, also referred to as embeddings or vector embeddings, are mathematical structures that represent data in a high-dimensional space. Each dimension in this space corresponds to a particular feature of the data, and complex data might be represented using tens of thousands of these dimensions. The location of a vector in this space captures its unique attributes. Various types of data, including words, phrases, entire documents, images, audio, and more, can be transformed into vectors. By using vector search, it's possible to find similar data across different types thanks to this uniform representation.
An embedding is a dense and informative data format that machine learning models and algorithms can effectively leverage. Embeddings encapsulate the semantic meaning of a piece of text in the form of a vector of floating-point numbers. Consequently, the proximity of two embeddings in the vector space reflects the semantic similarity between the original inputs.
Add vector columns to tables¶
The vector
data type installed by the pgvector extension allows for the efficient storage and manipulation of high-dimensional numerical vectors in PostgreSQL databases. These vectors are arrays of numbers representing a wide range of data, such as text embeddings, image features, or user preferences.
Below, you will vector
columns to the deliverables
, invoice_line_items
, invoice_validation_results
, sow_chunks
, and sow_validation_results
tables. Each of these tables contains a column storing a block of text. The vector embeddings will be stored alongside the text that they represent.
The size of the vector column should correspond to the number of dimensions generated by the embedding model being used. For this solution, you use OpenAI's text-embedding-ada-002
model, configured to return 1,536 dimensions. You will see this number represented in the size of the vector columns added to the tables below.
Using pgAdmin, execute the SQL statement for each table against the contracts database.
Select the tab for each table below and execute the ALTER TABLE
statement to create an embedding
column for storing vector data.
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.
SQL | |
---|---|
1 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.
SQL | |
---|---|
1 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.
SQL | |
---|---|
1 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.
SQL | |
---|---|
1 |
|
Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.
SQL | |
---|---|
1 |
|
Be sure to run the ALTER TABLE
statements for each of the tables to add all the embeddings columns before moving on to the next step.