Skip to content

6.6 Export Graph Data

You must export data from your PostgreSQL database to populate your graph database. Using the Azure Storage (azure_storage) extension for Azure Database for PostgreSQL provides a streamlined method for copying data out of your PostgreSQL database into CSV files in Azure Blob Storage. In the context of AGE and loading data into a graph database, the azure_storage extension facilitates the extraction of relational data from your PostgreSQL database, enabling efficient transfer to Blob Storage. This process ensures that the data needed for constructing and querying your graph database is readily available and can be seamlessly integrated into your data workflows.

Connect your database to Azure Storage

You will use pgAdmin to configure the azure_storage extension's connection to your storage account by executing SQL commands against your database.

Ensure you replace the token in the commands below with the appropriate values from your Azure environment.

Each SQL statement you will execute below contains a [YOUR_STORAGE_ACCOUNT_NAME] token. Before running any of the queries, you must replace this token with the name of your Storage account resource, which you can copy from the Azure portal.

  1. On your local machine, return to the open instance of pgAdmin (or open it if you have closed it) and ensure it is connected to your PostgreSQL database.

  2. In the pgAdmin Object Explorer, expand databases under your PostgreSQL server.

  3. Right-click the contracts database and select Query Tool from the context menu.

  4. Run the following command to use the azure_storage.account_add() function to define a connection between your storage account and your PostgreSQL database. Replace the account name and key tokens with values for your storage account.

    Execute the following SQL commands in pgAdmin!

    SQL
    1
    SELECT azure_storage.account_add('[YOUR_STORAGE_ACCOUNT_NAME]', '[YOUR_STORAGE_ACCOUNT_KEY]');
    
    Need help finding your storage account key?

    To get your storage account's access key:

    1. Navigate to your storage account in the Azure portal.
    2. Select the Access keys menu under Security + networking in the navigation menu.
    3. Select Show next to the Key value under key1.
    4. Select the Copy to clipboard button that appears on the right-hand side of the Key box.
    5. Paste the copied key as the [YOUR_STORAGE_ACCOUNT_KEY] value in the above SQL statement.

      Screenshot of the Storage account access keys page, with the show button highlighted for the key1 Key.

Export data to blob storage

As part of the data export process, you will use queries to reshape the source data into the format required to efficiently define nodes and edges in your graph database. Traditional relational databases organize data in tables, while graph databases use nodes and edges to represent entities and their relationships. Converting tabular data into nodes and edges aligns with the graph structure, making relationship analysis more efficient. This transformation enables natural modeling of real-world entities, optimizes query performance, and allows for complex relationship analysis, such as evaluating the connections between vendors, SOWs, and associated invoices. By reshaping your data, you can fully leverage the strengths of AGE and Azure Database for PostgreSQL for deeper insights and sophisticated analyses.

You will define two nodes and one edge in your graph database. The nodes will contain data vendor, and SOW data. The edge will define the relationship between these.

You will use pgAdmin to execute data export queries leveraging the azure_storage extension.

  1. Return to the open Query Tool in pgAdmin.

  2. Run the following query using the azure_storage.blob_put() function to write all data from the vendors table into a CSV file named vendors.csv into your storage account's graph container. This data will define the vendor node in your graph database.

    Execute the following SQL commands in pgAdmin!

    SQL
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    -- Extract data for the vendors node
    SELECT azure_storage.blob_put(
        '[YOUR_STORAGE_ACCOUNT_NAME]',
        'graph',
        'vendors.csv',
        vendors,
        'csv',
        'none',
        azure_storage.options_csv_put(header:=true)
    )
    FROM (
        SELECT * FROM vendors
    ) AS vendors;
    
  3. Execute this query to extract sow node data from the sows table and write it into a CSV file named sows.csv into your storage account's graph container. The query excludes a few columns from the sows table, including the embedding column, as they are unnecessary in the graph database and can cause errors.

    Execute the following SQL commands in pgAdmin!

    SQL
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    -- Extract data for the SOWs node
    SELECT azure_storage.blob_put(
        '[YOUR_STORAGE_ACCOUNT_NAME]',
        'graph',
        'sows.csv',
        sows,
        'csv',
        'none',
        azure_storage.options_csv_put(header:=true)
    )
    FROM (
        SELECT id, number, vendor_id, start_date, end_date, budget FROM sows
    ) AS sows;
    
  4. Finally, run the following query to extract has_invoices edge data from the invoices table and write it into a CSV file named has_invoices.csv into the graph container in your storage account:

    Execute the following SQL commands in pgAdmin!

    SQL
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    -- Create the has_invoices edge
    SELECT azure_storage.blob_put(
        '[YOUR_STORAGE_ACCOUNT_NAME]',
        'graph',
        'has_invoices.csv',
        invoices,
        'csv',
        'none',
        azure_storage.options_csv_put(header:=true)
    )
    FROM (
        SELECT id, vendor_id as start_id, 'vendor' AS start_vertex_type, sow_id AS end_id, 'sow' AS end_vertex_type, number, amount, invoice_date, payment_status FROM invoices
    ) AS invoices;
    

    Edge definition details

    When using AGE, edges must contain details about the relationships between nodes. These are defined in the above query by specifying the start_id, start_vertex_type, end_id, and end_vertex_type columns. The '_id' columns are mapped to the vendor_id and sow_id, respectively, and the start and end vertex types are strings specifying the node type associated with the ID.

Verify CSV files

You can use the azure_storage extension to verify the CSV files were successfully written into the graph container in your storage account.

  1. Execute the following query in the Query Tool in pgAdmin. Ensure you replace the [YOUR_STORAGE_ACCOUNT_NAME] token with the name of your storage account.

    Execute the following SQL commands in pgAdmin!

    SQL
    1
    2
    -- Verify the CSV files were written into blob storage
    SELECT azure_storage.blob_list('[YOUR_STORAGE_ACCOUNT_NAME]', 'graph');
    
  2. You should see a list of blobs in the Data output panel in pgAdmin that includes the three CSV files you exported above.

Congratulations! You have successfully exported data to create your graph database!