SAP and Databricks Integration in SAP Business Data Cloud

SAP’s partnership with Databricks, a leader in unified data and AI platforms, helps streamline data access and enables businesses to harness SAP data for AI and machine learning use cases. This integration is delivered through SAP Databricks, a fully embedded OEM component of Databricks within the SAP Business Data Cloud.

1. SAP Databricks in SAP Business Data Cloud

SAP Databricks is a data intelligence platform designed to bring data and artificial intelligence (AI) together. Integrated within SAP Business Data Cloud, it runs machine learning algorithms on SAP data without requiring an external ML platform. Using the Delta Sharing protocol, data products from SAP Business Data Cloud can be shared with SAP Databricks for processing, and the results can be stored back in the SAP environment, preserving data security and governance.
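
To make the sharing model more concrete, the following is a minimal Python sketch of how the open-source Delta Sharing connector (the delta-sharing package) addresses a shared table. The profile file name and the share, schema, and table names are hypothetical, and the source does not say whether SAP Business Data Cloud issues profile files to open clients; inside SAP Databricks, shared data products are exposed through Unity Catalog instead.

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a Delta Sharing table URL of the form <profile>#<share>.<schema>.<table>."""
    for part in (profile_path, share, schema, table):
        if not part:
            raise ValueError("profile path, share, schema and table must be non-empty")
    return f"{profile_path}#{share}.{schema}.{table}"


def load_preview(url: str, limit: int = 100):
    """Read up to `limit` rows of a shared table into a pandas DataFrame.

    The import is deferred so the URL helper above works without the
    optional dependency (pip install delta-sharing).
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(url, limit=limit)


# Hypothetical names, for illustration only.
url = shared_table_url("config.share", "sap_share", "finance", "cash_flow")
print(url)
```

Calling load_preview(url) would only succeed against a real sharing server and a valid profile file; the data is read in place from the provider's storage rather than copied through a pipeline.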

Key features of SAP Databricks in this context include:

  • AI/ML Lifecycle Support: Enables experimentation, production, and deployment of machine learning models, including generative AI and large language models.
  • Data Engineering: Provides automated ETL processing, observability, and monitoring in a unified stack.
  • Data Governance: Manages structured and unstructured data, machine learning models, notebooks, dashboards, and files through Unity Catalog.
  • Integration with SAP Ecosystem: Results from SAP Databricks can be used in SAP Datasphere for further analytics and application development, such as creating SAP Analytics Cloud stories.

This tailored version of Databricks focuses on computing capabilities without including its complete architecture or standalone data storage, ensuring it aligns with the specific needs of SAP Business Data Cloud users.

Architecture

image of solution diagram

Characteristics

Zero-Copy Data Exchange

  • Data products from SAP applications, visible within the SAP Business Data Cloud catalog, can be shared with the embedded Unity Catalog of Databricks with a single click.
  • Uses Delta Sharing to connect and blend data without the need for complex ETL pipelines.
  • Enables collaboration among multiple personas (data scientists, data analysts, and data engineers) on readily available SAP data.
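
Once a data product has been shared into Unity Catalog, it can be read like any other table through the three-level catalog.schema.table namespace. The sketch below assumes a Databricks notebook, where a SparkSession named spark is already available; the catalog, schema, and table names are hypothetical placeholders, not names defined by SAP.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return the three-level Unity Catalog name: catalog.schema.table."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def read_data_product(spark, catalog: str, schema: str, table: str):
    """Return the shared data product as a Spark DataFrame.

    `spark` is the SparkSession that a Databricks notebook provides.
    """
    return spark.table(qualified_name(catalog, schema, table))


# Hypothetical names, for illustration only.
print(qualified_name("sap_bdc", "finance", "cash_flow"))
```

In a notebook, read_data_product(spark, "sap_bdc", "finance", "cash_flow") would return a DataFrame that every persona with the right Unity Catalog grants can query.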

Development with Pro-code Tooling

  • Write Apache Spark pipelines to blend SAP and non-SAP data in SAP Databricks notebooks.
  • Use Databricks SQL to analyze data at scale for faster, data-driven decision-making.
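
As an illustration of blending SAP and non-SAP data, here is a minimal PySpark sketch. It assumes two DataFrames already loaded in a notebook: sap_orders (a shared SAP data product) and weather (an external dataset). All column names are hypothetical. The function uses only DataFrame methods, so it needs no imports of its own.

```python
def blend_orders_with_weather(sap_orders, weather):
    """Join SAP orders to external weather data and aggregate per region.

    Both arguments are Spark DataFrames. The column names used here
    (region, order_date, net_amount, temperature_c) are assumptions.
    """
    # A left join keeps every SAP order even when no weather row matches.
    joined = sap_orders.join(weather, on=["region", "order_date"], how="left")
    # Dict-style aggregation yields columns named sum(net_amount) and
    # avg(temperature_c).
    return joined.groupBy("region").agg(
        {"net_amount": "sum", "temperature_c": "avg"}
    )
```

The resulting DataFrame can be registered as a temporary view with createOrReplaceTempView and then explored with SQL, which is one way to combine the two pro-code options listed above.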

SAP Databricks Services and Components

  • Delta Lake: Open data lakehouse foundation.
  • Unity Catalog: Unified security, governance, and cataloging.
  • Databricks Notebook: Data science, AI, and real-time analytics.
  • Apache Spark: Data processing and analytics with parallel processing capabilities.
  • MLflow: Machine learning lifecycle management.
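
To show what lifecycle management with MLflow can look like in practice, the following sketch computes an evaluation metric and records it together with the training parameters as an MLflow run. The run name, parameter names, and metric name are hypothetical; the mlflow import is deferred so that the metric helper runs without the package installed.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def log_training_run(params: dict, metrics: dict, run_name: str = "sap-forecast"):
    """Record parameters and metrics as one MLflow run and return its run ID."""
    import mlflow  # available by default in Databricks ML runtimes

    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        return run.info.run_id


# Hypothetical values, for illustration only.
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(score, 4))
```

A call such as log_training_run({"max_depth": 5}, {"rmse": score}) would then make the run comparable with others in the MLflow experiment UI.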

Use Cases for SAP Databricks

  • AI/ML: Build robust models with curated SAP data in Databricks notebooks, create derived data products, and share them back with the SAP ecosystem for AI-driven decision-making.
  • Data Engineering: Process semi-structured and unstructured data at scale, blending curated SAP data to simplify data pipelines and improve collaboration.
  • Analytics: Explore and analyze large volumes of data shared in the lakehouse (e.g., from SAP BW) for real-time analytics and visualization.

The following diagram illustrates how SAP Databricks can be used to enrich and enhance existing SAP data products for sharing within the broader Databricks ecosystem. The capabilities outlined above are used to create a new data product to share within the Databricks Unity Catalog or back into the SAP Datasphere component for further consumption.

2. Integrating an Existing Enterprise Databricks Platform with SAP BDC

  • Helps customers safeguard their existing enterprise Databricks investment by integrating their Databricks platform with SAP Business Data Cloud via the BDC Connect service.
  • Requires a one-time setup to provision the BDC Connect service for Databricks, which makes this integration possible.
  • Enables zero-copy, bidirectional sharing of data products with third-party Databricks environments using the Delta Sharing protocol.
  • Allows customers to keep their existing Databricks investment while consuming curated SAP data for ML and analytics in Databricks without ETL.
  • Data products must be explicitly shared to Databricks via the BDC Catalog. This is a minor difference between SAP Databricks and an enterprise Databricks platform.
  • Data products from the Foundation Services layer (SAP data products) as well as from the Datasphere Object Store layer (custom data products) can be shared to Databricks via Delta Sharing.
image of solution diagram