
4.3 Summarize documents

Azure Cognitive Services, specifically the Azure Language Service's Summarization capability, provides advanced abstractive text summarization to condense complex documents like Statements of Work (SoWs) into concise summaries.

By integrating this capability with the PostgreSQL Azure_AI extension, you can dynamically generate summaries for documents stored in relational tables, streamlining workflows and improving document accessibility.

Summarizing Statements of Work (SoWs)

SoWs in the financial industry often contain extensive details about project scope, deliverables, and milestones. Summarizing these documents into a few sentences allows decision-makers to quickly grasp the key information without reading through lengthy documents.

Key Benefits of Summarization

  • Time Efficiency: Quickly identify critical information from long-form documents.
  • Enhanced Accessibility: Summaries provide concise overviews, improving decision-making processes.
  • Scalable Automation: Automatically generate summaries for large volumes of documents without manual intervention.

Azure's Summarization API within the Language Service enables abstractive summarization, creating human-like summaries that convey the document's essence rather than just extracting key phrases.


Using Azure_AI Extension with the azure_cognitive Schema

The Azure_AI extension integrates Azure Cognitive Services' Summarization capabilities directly into SQL workflows, allowing the generation of abstractive summaries of SoWs or other financial documents using simple SQL commands.

Abstractive Summarization

The extension's abstractive summarization produces a natural-language summary that captures the overall intent of the original text. To use it, call the azure_cognitive.summarize_abstractive function within the database. The following call generates a 2-3 sentence summary of the text passed in.

SQL
SELECT azure_cognitive.summarize_abstractive('This is a document text', 'en', 2);

Consider the following PostgreSQL table:

SQL
CREATE TABLE IF NOT EXISTS sows (
    id BIGSERIAL PRIMARY KEY,
    number text NOT NULL,
    vendor_id BIGINT NOT NULL,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    budget DECIMAL(18,2) NOT NULL,
    document text NOT NULL,
    metadata jsonb,
    summary text,
    FOREIGN KEY (vendor_id) REFERENCES vendors (id) ON DELETE CASCADE
);

The following statement fills the summary column for every row in the table:

SQL
-- Update the summary column with abstractive summaries
UPDATE sows
SET summary = azure_cognitive.summarize_abstractive(
    metadata::text,
    'en',
    2
);
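
When the table already holds some summaries, re-running the statement for every row repeats work. A minimal sketch of one way to limit the call to rows that still lack a summary is shown here. It assumes an asyncpg-style connection object with an async execute method. The helper name fill_missing_summaries and the WHERE filter are illustrative additions, not part of the lab code, and the function call uses the same (text, 'en', 2) form as the other examples in this section.

```python
import asyncio

# Illustrative statement (an assumption, not the lab's script): summarize only
# rows whose summary is still empty.
FILL_MISSING_SUMMARIES = """
    UPDATE sows
    SET summary = azure_cognitive.summarize_abstractive(metadata::text, 'en', 2)
    WHERE summary IS NULL
      AND metadata IS NOT NULL;
"""


async def fill_missing_summaries(conn) -> str:
    """Run the update and return the status string reported by the driver.

    `conn` is assumed to behave like an asyncpg connection, that is, it
    exposes `await conn.execute(query)`.
    """
    return await conn.execute(FILL_MISSING_SUMMARIES)
```

With asyncpg, the returned status string has the form "UPDATE n", which gives a quick check of how many rows were summarized.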

If you receive an error similar to the following, you chose a region that does not support abstractive summarization when creating your Azure environment:

Bash
ERROR: azure_cognitive.summarize_abstractive: InvalidRequest: Invalid Request.
InvalidParameterValue: Job task: 'AbstractiveSummarization-task' failed with validation errors: ['Invalid Request.']
InvalidRequest: Job task: 'AbstractiveSummarization-task' failed with validation error: Document abstractive summarization is not supported in the region Central US. The supported regions are North Europe, East US, West US, UK South, Southeast Asia.

To perform this step and complete the remaining tasks using abstractive summarization, you must create a new Azure AI Language service in one of the supported regions listed in the error message. You can provision this service in the same resource group you used for the other lab resources.
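
If you script your environment setup, it can help to pull the supported regions out of the error text rather than reading them by eye. The sketch below assumes the message keeps the "The supported regions are ..." wording shown above. The function name supported_regions is hypothetical.

```python
import re


def supported_regions(error_message: str) -> list[str]:
    """Extract region names from a 'The supported regions are ...' message.

    Returns an empty list when the message does not contain that phrase.
    """
    match = re.search(r"The supported regions are ([^.]+)\.", error_message)
    if match is None:
        return []
    return [region.strip() for region in match.group(1).split(",")]


# Message text copied from the error shown above.
message = (
    "InvalidRequest: Job task: 'AbstractiveSummarization-task' failed with "
    "validation error: Document abstractive summarization is not supported in "
    "the region Central US. The supported regions are North Europe, East US, "
    "West US, UK South, Southeast Asia."
)

print(supported_regions(message))
# ['North Europe', 'East US', 'West US', 'UK South', 'Southeast Asia']
```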

Alternatively, you can substitute extractive summarization, using the azure_cognitive.summarize_extractive function, for the remaining tasks. In that case you will not be able to compare the output of the two summarization techniques.

SQL
SELECT azure_cognitive.summarize_extractive('This is a document text', 'en', 2);
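
Application code can also make this substitution automatically. The following is a minimal sketch, assuming an asyncpg-style connection with an async fetchval method and assuming the region failure surfaces as an exception whose text contains "not supported in the region", as in the error above. The helper name summarize_with_fallback is hypothetical.

```python
import asyncio

# Both statements reuse the call form shown in this section.
ABSTRACTIVE = "SELECT azure_cognitive.summarize_abstractive($1, 'en', 2);"
EXTRACTIVE = "SELECT azure_cognitive.summarize_extractive($1, 'en', 2);"


async def summarize_with_fallback(conn, text: str):
    """Try abstractive summarization first, then fall back to extractive.

    Returns a (technique, result) tuple so the caller knows which
    function produced the summary.
    """
    try:
        return "abstractive", await conn.fetchval(ABSTRACTIVE, text)
    except Exception as exc:
        # Assumption: an unsupported region is reported with this wording.
        # Any other failure is re-raised unchanged.
        if "not supported in the region" not in str(exc):
            raise
        return "extractive", await conn.fetchval(EXTRACTIVE, text)
```

Returning the technique name alongside the result keeps the fallback visible to the caller instead of silently changing the kind of summary that gets stored.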

Insert Document Summary on Database Insert

Using the azure_cognitive.summarize_abstractive function of the azure_ai extension, the database scripts can generate a document summary as part of an INSERT or UPDATE.

Here's an example INSERT script used by the application when creating SOW records that includes the summarization:

SQL
-- Assumes the sows table also has an embeddings column (not shown in the CREATE TABLE above).
INSERT INTO sows (number, start_date, end_date, budget, document, metadata, embeddings, summary, vendor_id)
VALUES (
    $1, $2, $3, $4, $5, $6,
    azure_openai.create_embeddings('embeddings', $7, throw_on_error => FALSE, max_attempts => 1000, retry_delay_ms => 2000),
    azure_cognitive.summarize_abstractive($7, 'en', 2),
    $8)
RETURNING *;

API Implementation

The /sows/ HTTP POST method of the REST API contains code that inserts or updates SOWs based on the uploaded document. This code is in the src/api/app/routers/sows.py file. Open it now in Visual Studio Code and explore the async def analyze_sow function, which contains the code to ingest SOW documents, including the portion that performs the database INSERT or UPDATE on the sows table.

The following section of code performs the azure_ai call that generates the document summary within the database INSERT and UPDATE statements.

INSERT / UPDATE SOW with document summary generation
src/api/app/routers/sows.py
# Create SOW in the database
async with pool.acquire() as conn:
    if sow_id is None:
        # Create new SOW
        row = await conn.fetchrow('''
            INSERT INTO sows (number, start_date, end_date, budget, document, metadata, summary, vendor_id)
            VALUES (
            $1, $2, $3, $4, $5, $6, 
            azure_cognitive.summarize_abstractive($7, 'en', 2),
            $8)
            RETURNING *;
        ''', sow_number, start_date, end_date, budget, documentName, json.dumps(metadata), full_text, vendor_id)
    else:
        # Update existing SOW with new document
        row = await conn.fetchrow('''
            UPDATE sows
            SET start_date = $1,
                end_date = $2,
                budget = $3,
                document = $4,
                metadata = $5,
                summary = azure_cognitive.summarize_abstractive($6, 'en', 2)
            WHERE id = $7
            RETURNING *;
        ''', start_date, end_date, budget, documentName, json.dumps(metadata), full_text, sow_id)

Additional Learning References