1.4 Provided Artifacts¶
This section provides an overview of the table schema and data artifacts (e.g., SOW documents, invoices, and validation results) used in the solution accelerator. It highlights key fields that enable AI-powered processing and explains why specific fields are used by Azure AI services.
Using your own data?
If you are adapting the accelerator to use your own data, modifications may be required to align with your existing database schema and data structures.
-
Map Your Data to the Existing Schema
- Identify which of your tables correspond to
sows
,invoices
, andvendors
. - Ensure your data can be chunked and processed similarly to
sow_chunks
. - Leverage a JSONB column to store additional data dynamically while keeping the core table structures unchanged.
- Identify which of your tables correspond to
-
Add Vector Embeddings to Enable AI Search
- Create a vector column (
embedding
) in relevant tables to store AI-generated embeddings. - Use pg_diskann for efficient AI-powered search.
- Create a vector column (
-
Modify AI Processing Scripts
- Update data extraction scripts to reflect your field names.
- Ensure AI models extract the correct entities based on your schema.
-
Update Graph Data Export & Relationships
- Modify data export scripts to align with your table structure.
- Ensure Graph relationships correctly link entities in your database.
Understanding the Table Schema¶
The accelerator processes Statements of Work (SOWs), invoices, validation results, and extracted entities using a structured PostgreSQL schema. Below is an overview of the key tables and their roles:
Core Tables & Their Purpose¶
Table | Description |
---|---|
vendors |
Stores vendor details, linking them to associated SOWs and invoices. |
status |
Stores various status values that can be used to track the state of different entities within the system. |
sows |
Contains information on Statements of Work, including terms, parties involved, and deliverables. |
sow_chunks |
Breaks down SOW documents into smaller sections for AI processing. |
milestones |
Tracks contractual milestones extracted from SOWs. |
deliverables |
Details key deliverables as defined in SOWs. |
invoices |
Contains invoice details, such as amounts, due dates, and associated SOWs. |
invoice_line_items |
Stores individual line items extracted from invoices. |
invoice_validation_results |
Stores AI-driven validation for invoice accuracy and discrepancies. |
sow_validation_results |
Stores AI-driven validation results for SOW compliance and consistency. |
copilot_chat_sessions |
Stores information about individual chat sessions. |
copilot_chat_session_history |
Stores the history of messages exchanged during a chat session. |
How These Tables Work Together¶
- SOW documents are processed and broken into structured chunks.
- AI services extract key entities (vendors, deliverables, milestones).
- The extracted information is validated and stored in validation tables.
- Invoices are linked to SOWs for contract compliance checks.
Key Fields Used in AI Processing¶
The solution accelerator integrates with Azure AI services to extract, validate, and process information. Below are key fields that play a role in AI workflows and why they were selected.
Relational Fields Used in AI Services¶
Field Name | Table | AI Usage |
---|---|---|
content |
sow_chunks , invoices |
AI processes this field to extract structured data from raw text. |
embedding |
deliverables , sow_chunks , invoice_line_items , invoice_validation_results , sow_validation_results |
Used for vector search and similarity retrieval in semantic search models. |
due_date |
deliverables , invoice_line_items |
AI extracts and can validate deadlines. |
amount |
invoices |
AI extracts and detects discrepancies in invoice details. |
result |
sow_validation_results , invoice_validation_results |
AI-generated analysis of compliance for dpcuments processed |
validation_passed |
sow_validation_results , invoice_validation_results |
AI-driven validation result for compliance analysis. |
metadata |
invoices |
JSONB column for storing dynamic, AI-generated insights. |
Graph Relationships¶
Field Name | Graph | AI Usage |
---|---|---|
vendor_id ,vendor_name |
vendor_graph |
Used to analyze vendor performance and relationships |
sow_id ,sow_number |
vendor_graph |
Used to track and validate Statements of Work (SOWs) |
payment_status ,invoice_amount |
vendor_graph |
Used to validate and process invoices, detect discrepancies, and ensure compliance. |
Why These Fields Matter¶
- Vector embeddings (
embedding
) enable semantic search by allowing AI-powered retrieval of similar SOW clauses, contract terms, and invoices. - Validation fields (
result
,validation_passed
) ensure AI-generated extractions meet business rules. - Graph relationships within
vendor_graph
supports GraphRAG (Retrieval-Augmented Generation) where chosen fields can be utilized in AI models for analysis related to vendors, contracts, and invoices, as well as enhanced AI accuracy.
Extending the Schema with JSONB¶
The JSONB data type in PostgreSQL provides a powerful way to store and query semi-structured data, making it ideal for AI-driven applications that require flexibility in handling diverse and evolving datasets. AI services often work with rapidly changing schemas, various data formats, and unstructured metadata. JSONB enables these services to integrate additional data points without rigid schema modifications, ensuring adaptability and performance at scale.
Why JSONB is Ideal for AI Services¶
- Schema Evolution: AI applications frequently introduce new features, models, or metadata. JSONB allows for seamless schema modifications without disrupting the existing relational structure.
- Handling Unstructured Data: Many AI workloads ingest sensor data, user interactions, embeddings, and model predictions that don’t fit into traditional relational columns.
- Efficient Indexing and Querying: JSONB supports GIN (Generalized Inverted Index) and JSONPath queries, making it fast and efficient to search for nested data.
- Compact Storage and Faster Retrieval: Unlike plain JSON, JSONB is binary-optimized, reducing overhead in AI pipelines that process large volumes of data.
Using JSONB for AI-Powered Metadata and Model Outputs¶
When integrating AI services into a PostgreSQL-backed system, JSONB can be leveraged to store:
- Inference results (e.g., probabilities, classifications, embeddings).
- Model metadata (e.g., hyperparameters, training configurations).
- Feature extraction details (e.g., NLP tokenized words, vector embeddings).
- Real-time user interactions (e.g., chat history, event tracking).
Adding indexes for JSONB queries¶
GIN (Generalized Inverted Index) is a type of indexing in PostgreSQL that is optimized for handling complex data types, including JSONB. Since JSONB stores data in a binary format and supports rich querying, GIN indexes make it possible to efficiently search within nested JSONB structures.
Advantages of GIN:
- Fast Query Performance: GIN significantly speeds up key-value lookups inside JSONB columns.
- Optimized for Nested Structures: It can efficiently index deeply nested JSON objects and arrays.
- Supports JSONB Operators: PostgreSQL provides powerful operators (@>, ?, ?&, ?|) that are optimized when used with GIN.
Example Usage¶
SQL | |
---|---|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
The @>
operator checks if model_output contains the specified JSON key-value pair.
By leveraging JSONB, AI services can store, retrieve, and evolve their data structures efficiently. This approach ensures that new AI-driven insights, feature extraction, and model outputs can be incorporated into the system without rigid schema modifications, ultimately accelerating the deployment and scaling of AI solutions within PostgreSQL environments.