
Data Extraction and Enrichment Patterns for SAP Document AI


Extracting structured data from unstructured documents is only the first step in intelligent document processing. The enrichment layer turns raw extraction output into business-ready information by validating it against master data records and applying business rules. This two-stage approach, extraction followed by enrichment, enables high-accuracy automation while leaving room for organization-specific business logic.
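The Python below is a minimal sketch of what an enrichment step might do. It checks an extracted supplier tax ID against master data, adds the internal supplier ID, and applies one business rule (net + tax = gross). The field names, the in-memory `SUPPLIER_MASTER` table, and the `ExtractedField`/`enrich` names are hypothetical and are not part of any SAP Document AI API. A real service would query the ERP system instead.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class ExtractedField:
    """Hypothetical shape of one extracted field: raw text value plus confidence (0.0-1.0)."""
    value: str
    confidence: float


# Hypothetical stand-in for supplier master data, keyed by tax ID.
# In a real landscape this lookup would go to the ERP system.
SUPPLIER_MASTER = {
    "DE123456789": {"supplier_id": "1000042", "name": "ACME GmbH"},
}


@dataclass
class EnrichmentResult:
    fields: dict
    issues: list = field(default_factory=list)


def enrich(extraction: dict) -> EnrichmentResult:
    """Validate extracted fields against master data and apply a simple business rule."""
    fields = dict(extraction)  # shallow copy: the raw extraction stays unchanged
    issues = []

    # Master data validation: the supplier must exist in the master records.
    tax_id = fields.get("supplier_tax_id")
    supplier = SUPPLIER_MASTER.get(tax_id.value) if tax_id is not None else None
    if supplier is None:
        issues.append("supplier not found in master data")
    else:
        # Enrichment: add the internal supplier ID, which is not printed on the document.
        # Confidence 1.0 because the value comes from a deterministic lookup, not the model.
        fields["supplier_id"] = ExtractedField(supplier["supplier_id"], 1.0)

    # Business rule (hypothetical): gross amount must equal net amount plus tax.
    try:
        net = Decimal(fields["net_amount"].value)
        tax = Decimal(fields["tax_amount"].value)
        gross = Decimal(fields["gross_amount"].value)
        if abs(net + tax - gross) > Decimal("0.01"):
            issues.append("gross amount does not equal net + tax")
    except (KeyError, InvalidOperation):
        issues.append("amount fields missing or not numeric")

    return EnrichmentResult(fields, issues)
```

The function returns issues instead of raising exceptions. A document with problems can then continue to human review rather than failing outright.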

Architecture

Figure: Solution diagram of the reference architecture

Flow

The reference architecture demonstrates how extracted data progresses to ERP-ready payloads:

  1. Intelligent Processing: SAP Document AI workflows filter documents and route each one to extraction with the appropriate schema.
  • File processing: Attachments are extracted from emails, documents are split as needed, and conditions and custom scripts provide filtering and routing.
  • Document classification: SAP Document AI can do more than extract text. You can classify documents by type, or by any other attribute, by creating a custom classifier schema with prompt instructions.
  • LLM-powered extraction: The extraction engine applies the selected schema to retrieve structured data from the document. SAP provides more than 30 schemas for different document types, and custom schemas can cover other scenarios. LLMs extract the data and return a confidence score for each field. Processing instructions embedded in the schema help the model handle ambiguous cases and organization-specific terminology.
  2. Master Data Enrichment: SAP Document AI provides master data enrichment capabilities without custom code. For custom enrichment and validation, use outbound channels to trigger a CAP application on Cloud Foundry or an Integration Suite flow. These services can then update the extraction with the new information.

  3. Confidence-Based Routing: Documents move through different statuses. Each extracted field includes a confidence score (0-100%). Documents with all fields above the configured threshold (typically 90% for critical fields) proceed automatically. Documents with low-confidence fields wait for human validation.

  4. Human in the Loop: Low-confidence documents and documents that need additional approvals are managed in the Document AI workspace. Users review the document and the extracted data side by side. Their corrections improve future extractions.
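The confidence-based routing step can be sketched as a small decision function. This is an illustration under stated assumptions and does not reproduce SAP Document AI's internal logic. Confidences are expressed as 0.0-1.0, a field passes when it meets or exceeds its threshold, and `FieldResult`, `Route`, and `route_document` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_CONFIRM = "auto_confirm"   # proceed without human review
    HUMAN_REVIEW = "human_review"   # wait for validation in the workspace


@dataclass(frozen=True)
class FieldResult:
    name: str
    value: str
    confidence: float  # 0.0-1.0, i.e. 0-100%


def route_document(fields, default_threshold=0.90, thresholds=None):
    """Decide the route for one document.

    A field passes if its confidence meets or exceeds its threshold. The
    threshold is the per-field override in `thresholds` if present, else
    `default_threshold`. Returns the route and the names of failing fields.
    """
    thresholds = thresholds or {}
    low = [
        f.name
        for f in fields
        if f.confidence < thresholds.get(f.name, default_threshold)
    ]
    return (Route.HUMAN_REVIEW if low else Route.AUTO_CONFIRM), low
```

Per-field overrides keep critical fields such as amounts at 90% while letting optional fields use a lower bar. Returning the failing field names lets a reviewer go straight to the fields that need attention. The threshold values here are illustrative only.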

Examples in an SAP context

  • Invoice three-way match validation - Document AI extracts invoice header and line items. An enrichment service looks up the referenced purchase order and goods receipt from S/4HANA, performs three-way matching (quantity and price tolerances), and validates tax calculations. High-confidence matched invoices auto-confirm for posting; mismatches route to accounts payable for review.

  • Customer purchase order material validation - Document AI extracts customer PO fields, including material numbers. The enrichment service resolves customer-specific material codes (e.g., "CustPartA123" → internal "MATNR-456") against the S/4HANA product catalog, retrieves pricing agreements, and performs ATP (Available-to-Promise) checks. Materials that are not found or are out of stock are flagged for sales team review.

  • Healthcare intake form with compliance validation - Document AI extracts patient intake forms (patient ID, diagnosis codes, procedure codes, insurance details). The enrichment service validates the patient ID against master data, verifies insurance authorization status, validates ICD-10 diagnosis codes, and cross-checks procedure codes against the approved treatment plan. A missing insurance authorization triggers an exception workflow before processing.
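For the invoice three-way match example, the core comparison might look like the sketch below. It assumes the enrichment service has already fetched the PO lines and received quantities from S/4HANA; that lookup is not shown. The tolerance defaults (no quantity overrun, 2% price overrun), the item codes, and the `Line`/`three_way_match` names are hypothetical. An empty result would correspond to the auto-confirm path, and any mismatch would route the invoice to accounts payable.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    item: str            # internal material number (hypothetical values in usage)
    quantity: Decimal
    unit_price: Decimal


def three_way_match(
    invoice: list[Line],
    po: list[Line],
    received: dict[str, Decimal],
    qty_tolerance: Decimal = Decimal("0"),
    price_tolerance_pct: Decimal = Decimal("2"),
) -> list[str]:
    """Match invoice lines against PO prices and goods-receipt quantities.

    Returns human-readable mismatch descriptions. An empty list means the
    invoice matched within tolerance and could be auto-confirmed.
    """
    po_by_item = {line.item: line for line in po}
    issues = []
    for inv in invoice:
        po_line = po_by_item.get(inv.item)
        if po_line is None:
            issues.append(f"{inv.item}: not on purchase order")
            continue
        # Quantity check: cannot invoice more than was received (plus tolerance).
        got = received.get(inv.item, Decimal("0"))
        if inv.quantity - got > qty_tolerance:
            issues.append(f"{inv.item}: invoiced {inv.quantity}, received {got}")
        # Price check: invoice price may exceed the PO price by at most the tolerance.
        max_price = po_line.unit_price * (1 + price_tolerance_pct / 100)
        if inv.unit_price > max_price:
            issues.append(
                f"{inv.item}: price {inv.unit_price} above PO price {po_line.unit_price}"
            )
    return issues
```

`Decimal` is used instead of `float` so that currency amounts and percentage tolerances compare exactly.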
