
Integration with Databricks


Data from the Databricks Lakehouse can be harmonized with SAP and non-SAP data through SAP Datasphere's unified data models, supporting richer analytics and other use cases.

Architecture

image of solution diagram

Solution Diagram Resources
You can download the Solution Diagram as a .drawio file for offline use. Alternatively, you may view and edit the Solution Diagram directly on draw.io.
Please note that any changes made online will need to be saved locally if you wish to keep them.

1. Integration with Databricks Delta Lake

Mode(s) of Integration: Federating data live into SAP Datasphere.

Delta Lake is an optimized storage layer that provides the foundation for tables in a lakehouse architecture on Databricks. It brings reliability to data lakes by providing ACID (Atomicity, Consistency, Isolation, Durability) transactions, scalable metadata handling, and unified streaming and batch data processing.

Data from Databricks Delta Lake tables can be federated live into SAP Datasphere virtual remote models using SAP Datasphere's data federation architecture. This integration allows Databricks data to be augmented with SAP business data in real time, without replicating it. The federated data can then be incorporated into unified semantic models that feed real-time analytics in SAP Analytics Cloud dashboards.

The integration process involves:

  1. Connection Setup: Establishing a secure connection between SAP Datasphere and Databricks Delta Lake using supported connectors and authentication mechanisms.
  2. Data Federation: Configuring virtual tables in SAP Datasphere that reference the live data in Databricks Delta Lake without physically moving the data.
  3. Model Augmentation: Enhancing the federated data with SAP business data to create comprehensive and unified semantic models.
  4. Real-time Analytics: Utilizing SAP Analytics Cloud to build dashboards and reports that leverage the real-time, federated data for actionable insights.
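
The steps above can be illustrated with a minimal, self-contained sketch of the federation idea. It does not use any SAP Datasphere or Databricks API: an in-memory SQLite database stands in for the remote Delta Lake table, and the names `make_remote_source`, `VirtualTable`, `PLANTS`, `unified_view`, and the `sensor_readings` table are all hypothetical, chosen only for illustration. The point it shows is that a virtual table stores a reference and a query rather than a copy of the rows, so every read reflects the current state of the source.

```python
import sqlite3
from dataclasses import dataclass


def make_remote_source() -> sqlite3.Connection:
    """ASSUMPTION: an in-memory SQLite database plays the role of the
    remote Databricks Delta Lake table. No real connector is involved."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor_readings (plant_id TEXT, temperature REAL)")
    conn.executemany(
        "INSERT INTO sensor_readings VALUES (?, ?)",
        [("P100", 71.5), ("P200", 64.0)],
    )
    conn.commit()
    return conn


@dataclass
class VirtualTable:
    """Hypothetical virtual table: keeps a connection and a query,
    never a copy of the data. Rows are fetched on every read."""
    source: sqlite3.Connection
    query: str

    def rows(self) -> list:
        return self.source.execute(self.query).fetchall()


# Hypothetical local business data (stand-in for SAP master data).
PLANTS = {"P100": "Plant North", "P200": "Plant South"}


def unified_view(table: VirtualTable) -> list:
    """Join the federated rows with local business data at read time."""
    return [
        {"plant_id": pid, "plant_name": PLANTS.get(pid), "temperature": temp}
        for pid, temp in table.rows()
    ]


if __name__ == "__main__":
    remote = make_remote_source()
    readings = VirtualTable(
        remote,
        "SELECT plant_id, temperature FROM sensor_readings ORDER BY plant_id, temperature",
    )
    print(unified_view(readings))

    # A new row written at the source is visible on the next read,
    # because nothing was copied into the consuming side.
    remote.execute("INSERT INTO sensor_readings VALUES ('P100', 80.0)")
    remote.commit()
    print(unified_view(readings))
```

In this sketch the join with local data happens in the consumer on each read. A real federation engine may instead push filters and joins down to the source; that behavior is not modeled here.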

Because the data is queried in place rather than copied, analytics always reflect the current state of the Delta Lake tables, providing a consistent and up-to-date foundation for advanced analytics and decision-making.

Resources