Skip to content

3.3 Enable vector storage

The pgvector extension is a cutting-edge addition to PostgreSQL that empowers the database with the ability to handle vector data natively. This feature allows you to store and query vector information, making it ideal for applications involving copilots, recommendation systems, and similarity searches. To enable vector embeddings to be stored alongside the rest of your data in a PostgreSQL database, you must alter the tables in which you want to store embeddings to add columns with the vector data type provided by the extension.

What are vectors?

Vectors, also referred to as embeddings or vector embeddings, are mathematical structures that represent data in a high-dimensional space. Each dimension in this space corresponds to a particular feature of the data, and complex data might be represented using tens of thousands of these dimensions. The location of a vector in this space captures its unique attributes. Various types of data, including words, phrases, entire documents, images, audio, and more, can be transformed into vectors. By using vector search, it's possible to find similar data across different types thanks to this uniform representation.

An embedding is a dense and informative data format that machine learning models and algorithms can effectively leverage. Embeddings encapsulate the semantic meaning of a piece of text in the form of a vector of floating-point numbers. Consequently, the proximity of two embeddings in the vector space reflects the semantic similarity between the original inputs.

Add vector columns to tables

The vector data type installed by the pgvector extension allows for the efficient storage and manipulation of high-dimensional numerical vectors in PostgreSQL databases. These vectors are arrays of numbers representing a wide range of data, such as text embeddings, image features, or user preferences.

Below, you will vector columns to the deliverables, invoice_line_items, invoice_validation_results, sow_chunks, and sow_validation_results tables. Each of these tables contains a column storing a block of text. The vector embeddings will be stored alongside the text that they represent.

The size of the vector column should correspond to the number of dimensions generated by the embedding model being used. For this solution, you use OpenAI's text-embedding-ada-002 model, configured to return 1,536 dimensions. You will see this number represented in the size of the vector columns added to the tables below.

Using pgAdmin, execute the SQL statement for each table against the contracts database.

Select the tab for each table below and execute the ALTER TABLE statement to create an embedding column for storing vector data.

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.

SQL
1
ALTER TABLE deliverables ADD COLUMN IF NOT EXISTS embedding vector(1536);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.

SQL
1
ALTER TABLE invoice_line_items ADD COLUMN IF NOT EXISTS embedding vector(1536);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.

SQL
1
ALTER TABLE invoice_validation_results ADD COLUMN IF NOT EXISTS embedding vector(1536);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.

SQL
1
ALTER TABLE sow_chunks ADD COLUMN IF NOT EXISTS embedding vector(1536);

Copy and paste the following SQL statement into a new query window in pgAdmin, then execute the query to add the embedding vector column.

SQL
1
ALTER TABLE sow_validation_results ADD COLUMN IF NOT EXISTS embedding vector(1536);

Be sure to run the ALTER TABLE statements for each of the tables to add all the embeddings columns before moving on to the next step.