6.6 Export Graph Data¶
You must export data from your PostgreSQL database to populate your graph database. Using the Azure Storage (azure_storage
) extension for Azure Database for PostgreSQL provides a streamlined method for copying data out of your PostgreSQL database into CSV files in Azure Blob Storage. In the context of AGE
and loading data into a graph database, the azure_storage
extension facilitates the extraction of relational data from your PostgreSQL database, enabling efficient transfer to Blob Storage. This process ensures that the data needed for constructing and querying your graph database is readily available and can be seamlessly integrated into your data workflows.
Connect your database to Azure Storage¶
You will use pgAdmin to configure the azure_storage
extension's connection to your storage account by executing SQL commands against your database.
Ensure you replace the token in the commands below with the appropriate values from your Azure environment.
Each SQL statement you will execute below contains a [YOUR_STORAGE_ACCOUNT_NAME]
token. Before running any of the queries, you must replace this token with the name of your Storage account resource, which you can copy from the Azure portal.
-
On your local machine, return to the open instance of pgAdmin (or open it if you have closed it) and ensure it is connected to your PostgreSQL database.
-
In the pgAdmin Object Explorer, expand databases under your PostgreSQL server.
-
Right-click the contracts database and select Query Tool from the context menu.
-
Run the following command to use the
azure_storage.account_add()
function to define a connection between your storage account and your PostgreSQL database. Replace the account name and key tokens with values for your storage account.Execute the following SQL commands in pgAdmin!
SQL 1
SELECT azure_storage.account_add('[YOUR_STORAGE_ACCOUNT_NAME]', '[YOUR_STORAGE_ACCOUNT_KEY]');
Need help finding your storage account key?
To get your storage account's access key:
- Navigate to your storage account in the Azure portal.
- Select the Access keys menu under Security + networking in the navigation menu.
- Select Show next to the Key value under key1.
- Select the Copy to clipboard button that appears on the right-hand side of the Key box.
-
Paste the copied key as the
[YOUR_STORAGE_ACCOUNT_KEY]
value in the above SQL statement.
Export data to blob storage¶
As part of the data export process, you will use queries to reshape the source data into the format required to efficiently define nodes and edges in your graph database. Traditional relational databases organize data in tables, while graph databases use nodes and edges to represent entities and their relationships. Converting tabular data into nodes and edges aligns with the graph structure, making relationship analysis more efficient. This transformation enables natural modeling of real-world entities, optimizes query performance, and allows for complex relationship analysis, such as evaluating the connections between vendors, SOWs, and associated invoices. By reshaping your data, you can fully leverage the strengths of AGE
and Azure Database for PostgreSQL for deeper insights and sophisticated analyses.
You will define two nodes and one edge in your graph database. The nodes will contain data vendor, and SOW data. The edge will define the relationship between these.
You will use pgAdmin to execute data export queries leveraging the azure_storage
extension.
-
Return to the open Query Tool in pgAdmin.
-
Run the following query using the
azure_storage.blob_put()
function to write all data from thevendors
table into a CSV file namedvendors.csv
into your storage account'sgraph
container. This data will define thevendor
node in your graph database.Execute the following SQL commands in pgAdmin!
SQL 1 2 3 4 5 6 7 8 9 10 11 12 13
-- Extract data for the vendors node SELECT azure_storage.blob_put( '[YOUR_STORAGE_ACCOUNT_NAME]', 'graph', 'vendors.csv', vendors, 'csv', 'none', azure_storage.options_csv_put(header:=true) ) FROM ( SELECT * FROM vendors ) AS vendors;
-
Execute this query to extract
sow
node data from thesows
table and write it into a CSV file namedsows.csv
into your storage account'sgraph
container. The query excludes a few columns from thesows
table, including theembedding
column, as they are unnecessary in the graph database and can cause errors.Execute the following SQL commands in pgAdmin!
SQL 1 2 3 4 5 6 7 8 9 10 11 12 13
-- Extract data for the SOWs node SELECT azure_storage.blob_put( '[YOUR_STORAGE_ACCOUNT_NAME]', 'graph', 'sows.csv', sows, 'csv', 'none', azure_storage.options_csv_put(header:=true) ) FROM ( SELECT id, number, vendor_id, start_date, end_date, budget FROM sows ) AS sows;
-
Finally, run the following query to extract
has_invoices
edge data from theinvoices
table and write it into a CSV file namedhas_invoices.csv
into thegraph
container in your storage account:Execute the following SQL commands in pgAdmin!
SQL 1 2 3 4 5 6 7 8 9 10 11 12 13
-- Create the has_invoices edge SELECT azure_storage.blob_put( '[YOUR_STORAGE_ACCOUNT_NAME]', 'graph', 'has_invoices.csv', invoices, 'csv', 'none', azure_storage.options_csv_put(header:=true) ) FROM ( SELECT id, vendor_id as start_id, 'vendor' AS start_vertex_type, sow_id AS end_id, 'sow' AS end_vertex_type, number, amount, invoice_date, payment_status FROM invoices ) AS invoices;
Edge definition details
When using
AGE
, edges must contain details about the relationships between nodes. These are defined in the above query by specifying thestart_id
,start_vertex_type
,end_id
, andend_vertex_type
columns. The '_id' columns are mapped to thevendor_id
andsow_id
, respectively, and the start and end vertex types are strings specifying the node type associated with the ID.
Verify CSV files¶
You can use the azure_storage
extension to verify the CSV files were successfully written into the graph
container in your storage account.
-
Execute the following query in the Query Tool in pgAdmin. Ensure you replace the
[YOUR_STORAGE_ACCOUNT_NAME]
token with the name of your storage account.Execute the following SQL commands in pgAdmin!
SQL 1 2
-- Verify the CSV files were written into blob storage SELECT azure_storage.blob_list('[YOUR_STORAGE_ACCOUNT_NAME]', 'graph');
-
You should see a list of blobs in the Data output panel in pgAdmin that includes the three CSV files you exported above.
Congratulations! You have successfully exported data to create your graph database!