How Anyone Can Integrate SAP Data in Hours
Enterprise resource planning software is the brain of big business. Whether it’s revenue and expenses, suppliers and inventory, materials and manufacturing, or project management and payroll — ERP is the organizer behind the organization.
It’s only natural that any data team will want to integrate this data. Whether the target is a data lake, a data warehouse, a data mesh, or something else entirely — integrating ERP data is essential to realizing the value of the enterprise data platform.
But it’s easier said than done.
The data structures behind common ERPs are designed for the performance of the ERP itself. Connecting directly, whether through HTTP, ODBC or JDBC, reveals a dense forest of namespaces, tables and relationships.
At Palantir, we encounter data teams valiantly hacking through this forest, armed with just a data dictionary and a database CLI. Engineers release code on weekends, crossing their fingers the query they wrote on test will run on production without locking the system and disrupting the whole company.
And even after the data is integrated, it needs to be kept up-to-date and error-free. Building a high-performance incremental pipeline with error detection is no small task.
It’s no wonder companies often look to bring on specialist expertise.
The alternative is Palantir HyperAuto. With smart software to tackle the complexity of ERP systems, data teams can get on with the real work. In the following tutorial, we’ll show how a months-long data integration can be automated in just an hour.
HyperAuto connects and integrates data from ERP and CRM systems. It analyzes the ERP’s metadata to generate robust and extensible data pipelines. These automatically created transformation pipelines denormalize the data and reconstruct the underlying business objects, which means you don’t need to be an expert in a proprietary ERP schema.
HyperAuto can then push the integrated data to a variety of cloud platforms, such as Google BigQuery, Snowflake, Amazon Redshift, and many more. Or, it can push the data into the Foundry Ontology in two clicks. The Ontology organizes data into easy-to-understand semantic objects so anyone in the organization can self-serve and make operational decisions.
1. Integrate data securely and at scale
HyperAuto can ingest data at scale and adapt to its unique shape. It supports batch and incremental ingestion from on-premise and cloud sources. With robust access controls, you can ensure the right people can access the right data for the appropriate use cases. The included data governance tools make auditing access easy and can simulate permission changes.
2. Built-in connectors for the most common Enterprise systems
Spend less time configuring and more time executing. HyperAuto comes with built-in support for industry standard ERP/CRM systems like SAP, Salesforce, NetSuite, and HubSpot.
3. Flexible tooling for other data sources
Data integration can be messy: not all source systems fall neatly into cookie-cutter solutions. HyperAuto is extensible. If your system has entity-relationships, it can denormalize them into an analytics-ready format. Building on Palantir’s data connection technology, HyperAuto includes support for Cloud filesystems/blob stores, streaming sources, JDBC databases, REST APIs, NoSQL stores, and more. Explore the full list of connectors here.
4. Start quickly without sacrificing stability
Within minutes, HyperAuto’s Software-Defined Data Integration (SDDI) suite can generate Spark code to clean, transform, and standardize your data. The result is human-readable code in an integrated development environment, where it can be reviewed, modified, tested, and version controlled in one place. All code is checked into the git version control system, so it can be managed just like your human-written code (see the sketch after this list for a flavor of what such code does).
5. Centralize your data and push it into the destination system of your choice
With everything in an analytics-ready structure, you can load your data wherever it needs to go. Use HyperAuto’s data export tools to ship to a data warehouse, or Foundry’s Ontology to quickly build insights from scratch.
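For a flavor of what the generated cleaning code mentioned in point 4 does, here is a minimal hand-written PySpark sketch. The table (SAP’s LFA1 supplier master), the renaming map, and the file paths are illustrative placeholders; the code HyperAuto actually generates is structured differently and covers far more cases.

```python
from pyspark.sql import SparkSession, functions as F

# A minimal sketch of the kind of cleaning a generated transform performs:
# rename cryptic source columns, trim strings, and de-duplicate.
spark = SparkSession.builder.getOrCreate()

raw = spark.read.parquet("/data/raw/sap/lfa1")  # placeholder path (supplier master)

renames = {          # cryptic source name -> human-readable name
    "LIFNR": "supplier_number",
    "NAME1": "supplier_name",
    "LAND1": "country_key",
}

clean = raw.select([F.col(src).alias(dst) for src, dst in renames.items()])
clean = (
    clean
    .withColumn("supplier_number", F.trim("supplier_number"))
    .dropDuplicates(["supplier_number"])
)

clean.write.mode("overwrite").parquet("/data/clean/suppliers")  # placeholder path
```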
Let’s illustrate what makes HyperAuto unique with an example. In this scenario, pretend you’re a data engineer at a manufacturing firm looking to build a complete view of your supply chain across different source systems. To get started, we’ll use HyperAuto to ingest supply chain data from SAP, clean it, and then export it to an S3 bucket. It’s worth noting that all of the data below is notional.
While we will be focusing specifically on SAP integration here, you can read more about other applications for HyperAuto here.
#0: Set up your SAP connection
First things first: head over to the docs to get your SAP data connection online with the official SAP-certified connector. For more information on getting started, click here.
#1: Access your SAP instance
Now we’ll access the connected SAP instance through the Data Connection app. You may have a few integrations, so be sure to look for Sources with the type “magritte-sap-source” and the Source relating to the right SAP instance. For more information on the Data Connection layout and nomenclature, click here.
#2: Explore your data
Click “Explore and create syncs” to open the HyperAuto exploration UI. HyperAuto reads directly from your SAP instance on demand, so it will include stock SAP tables as well as any custom tables you have created. From here, we can pick and choose the tables we want to import through a visual, shopping-cart-like flow.
What’s more, because you can see and select exactly the data you need, you avoid unneeded, expensive queries, minimizing any potential disruption or outage to your ERP instances.
#3: Start importing the data using a pre-packaged workflow
You have a few options from this screen — such as importing individual SAP business objects — but we’re going to get started by importing one of HyperAuto’s pre-packaged workflows from the Workflows tab. We’ll import the Supply Chain Disruption workflow by clicking the “+” button next to it. This automatically adds the tables in this workflow to your exploration. Press the “Layout” button for a cleaner graph visualization, which makes it easy to see the relationships between all the tables.
HyperAuto is equipped with pre-defined industry use cases, or workflows, such as supply chain disruption and inventory management. When you use a specific workflow, HyperAuto automatically pulls together the business objects and tables it needs and, once they are ingested, applies the necessary logic to get the data ready for your use case.
#4: Customize your imports with additional tables
In addition to these workflows, you can further customize your imports with any tables you want. For example, let’s say you were asked to include metadata on units of measurement. Navigate to the “Modules” tab, search for the Units of Measure for Material table (MARM), and click the “+” button to include it in your exploration.
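MARM stores the alternative units of measure defined for each material, together with the factors for converting them to the base unit. As a rough illustration of why this metadata matters downstream, here is a hypothetical PySpark sketch that converts order quantities to the base unit. The orders dataset, its column names, and the paths are placeholders, and the conversion assumes the usual convention that UMREZ is the numerator and UMREN the denominator of the factor.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical inputs: cleaned order lines plus the MARM units-of-measure table.
orders = spark.read.parquet("/data/clean/order_lines")       # placeholder path
marm = spark.read.parquet("/data/clean/units_of_measure")    # placeholder path

marm_factors = marm.select("MATNR", "MEINH", "UMREZ", "UMREN")

# Assumption: order quantities are expressed in an alternative unit (e.g. boxes)
# and MARM's UMREZ/UMREN factors convert them to the base unit (e.g. pieces):
#   quantity_base = quantity * UMREZ / UMREN
converted = (
    orders.join(
        marm_factors,
        on=[
            orders["material_number"] == marm_factors["MATNR"],
            orders["order_unit"] == marm_factors["MEINH"],
        ],
        how="left",
    )
    .withColumn(
        "quantity_base_unit",
        F.col("quantity") * F.col("UMREZ") / F.col("UMREN"),
    )
)

converted.select("material_number", "order_unit", "quantity", "quantity_base_unit").show(5)
```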
#5: Explore table relations to find what you need
It can be hard to know all of the tables you might want ahead of time, so HyperAuto helps you explore your SAP instance. Without prior knowledge of SAP’s schema, you can easily navigate through linked tables to discover additional usable data. Behind the scenes, HyperAuto translates all the internal table and column names into their human-readable forms. Think of it as an automatic data dictionary.
While exploring these tables, we can see that “Material Descriptions” refers to a client, so let’s import that data as well. Click the link icon on “Material Descriptions” to show its related tables, then add the Clients table to your imports by right-clicking it and choosing “Add 1 table to syncs”.
#6: Configure your syncs
Click “Configure syncs” to review your proposed data ingestion. Here, you can fine-tune settings and preview data for individual tables. This gives you the flexibility to change your ingestion strategy; for example, you can choose between snapshot and append: the former fully re-ingests the data, while the latter pulls only the incremental records. You can check out more information on transaction types here.
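To make the snapshot-versus-append distinction concrete, here is a minimal PySpark sketch of the two strategies. This only illustrates the idea; HyperAuto manages the incremental state for you, and the source extract, change-timestamp column, and watermark handling below are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

source = spark.read.parquet("/data/raw/sap/extract")  # placeholder extract

# Snapshot: ingest everything, replacing the previous copy of the dataset.
source.write.mode("overwrite").parquet("/data/ingested/purchase_items")

# Append (incremental): only pull rows changed since the last successful run,
# tracked here with a hypothetical high-water-mark timestamp that would be
# persisted between runs.
last_watermark = "2024-01-01 00:00:00"
incremental = source.where(F.col("last_changed_at") > F.lit(last_watermark))
incremental.write.mode("append").parquet("/data/ingested/purchase_items")
```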
For this tutorial, we’ll proceed with the defaults by clicking the “Ingest & Integrate Data” button, then “Create syncs” to confirm.
#7: Generate pipeline code
HyperAuto will now configure all the ingestion syncs with the source database. It’ll also write all the PySpark code we need for our pipeline. Magic? Not quite: by looking at the entity relationships within the system, HyperAuto reproduces the business objects and then overlays the logic from the Supply Chain Disruption workflow.
PySpark is an incredibly scalable and powerful data transformation framework, and its in-memory computation provides high performance. And because all the code is version controlled in git, you can work with it just like human-written code.
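As a rough sketch of what reconstructing a business object means in practice, the generated code joins related raw tables back together. The hand-written PySpark below rebuilds a simplified Material object from the material master (MARA) and its descriptions (MAKT); the real generated transforms span far more tables and apply the workflow logic on top, and the language filter and paths here are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical raw extracts of two related SAP tables.
mara = spark.read.parquet("/data/raw/sap/mara")  # material master
makt = spark.read.parquet("/data/raw/sap/makt")  # material descriptions

# Reconstruct a simplified "Material" business object: one row per material,
# with its English description denormalized onto it.
material = (
    mara
    .join(
        makt.where(F.col("SPRAS") == "E")        # assumed language filter
            .select("MATNR", F.col("MAKTX").alias("material_description")),
        on="MATNR",
        how="left",
    )
    .withColumnRenamed("MATNR", "material_number")
)

material.write.mode("overwrite").parquet("/data/clean/material")  # placeholder path
```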
You’ll then see a pop-up informing you that a pull request has been filed with these changes. As all changes exist on a branch, you can conduct any checks and reviews your organization requires before merging into master. For now, we’ll let the pull request merge automatically in the background. Click “Next” to proceed, then close this window. Click “Navigate to SDDI” to go to the SDDI interface.
#8: Run the generated raw data ingests
With the logic in place, now it’s time to bring in the data. You can use the SDDI flow to guide you through the data integration process. As a first step, we need to ingest the raw data from SAP. Click “Action next steps” to run the syncs we just set up as well as the metadata syncing we’ll need later. HyperAuto can then begin to pull all the data behind the scenes.
#9: Open the pipeline visualization
Once finished, click on “Generate pipeline” to move to the next step. This page provides an overall view of any pending changes to your pipelines. If multiple people were collaborating and had decided to add additional checks in step 7, we could browse those here. Since our pull request has already merged, click “Open data lineage” to open the pipeline in Data Lineage.
#10: Build the cleaning and integration pipeline
Data Lineage shows how the raw and transformed tables are related to each other. Each rectangular node on the graph represents a dataset and the code that produces it. The connections show each dataset’s inputs and outputs, which makes it easy to dig into the dependencies and relationships between transformations. This page is packed with insights — you can quickly preview each table’s data and code, trace user permissions, and review build performance history all in the same place. For now, we’ll use it to quickly build our new pipeline.
To continue, select all of the tables with “Select → Select all”, then click the build icon and follow the steps.
#11: Review the finalized data
This will run the cleaning and integration steps that HyperAuto generated. When these builds are finished, you can preview data and see the code behind the finalized datasets.
#12: Create an S3 sync source
With your SAP data now cleaned and integrated, the next step is to push it to your operational layer. For this example, we will connect and push to an S3 bucket. To set up this connection, return to the Data Connection app and select “New source”, then “S3.” From here, you will need to configure the connection by selecting an agent to run this export and providing details on the S3 bucket. You’ll have to save the new data connection source to a project folder so that it can be properly permissioned.
#13: Create a data export task
Now we will configure a task on this S3 source to export the data we’ve been working on. Navigate to the folder where you saved the source, right click on it, and select “Create Data Connection Task.” In this window, we’ll configure the task’s inputs, outputs, and settings.
In the configuration window, we will set this task type to “export-s3-task” and declare the output directory path within the S3 bucket. While there are other options such as bucket policy and rewrite strategy that we could modify, we will use the defaults for now.
Next, we’ll configure the tables we want to export. Click “Add entry” under “Inputs”, then navigate to the finalized tables and add what you want to export. Lastly, we’ll create an output to capture the results of the export by clicking “Add entry” under “Outputs” and selecting an output folder.
#14: Run the export
To run the export, click on the output dataset, then “Build.” Behind the scenes, the tables we configured will be exported to the S3 bucket. In the future, this build could be configured to run automatically when the underlying data updates or on a set schedule.
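Under the hood, an export like this amounts to writing the finalized tables out to the bucket as files. The standalone PySpark sketch below is a conceptual equivalent only; the bucket name and prefix are placeholders, and the real export task also takes care of credentials, agents, and re-exports.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical finalized dataset produced by the pipeline above.
material = spark.read.parquet("/data/clean/material")

# Conceptual equivalent of the export task: write the table to S3 as Parquet.
# Assumes the cluster is configured with the hadoop-aws connector and credentials.
material.write.mode("overwrite").parquet(
    "s3a://example-supply-chain-bucket/exports/material/"
)
```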
With these steps done, you have now imported several SAP objects, cleaned them, denormalized them into analytics-ready datasets, and exported them to a data warehouse. From here, you can use these datasets to power visualization and analytics tools such as Tableau or Power BI. Alternatively, you can push the data into Foundry’s own analytics and workflow tools. The choice is yours.
You now have an extensible data asset. Using HyperAuto, you can combine objects from multiple SAP/ERP systems and enrich them with data from other sources, such as time series or geospatial data, to create a more holistic view of your business.
In this post, we explored how easy and quick it is to get data out of your ERP(s) and into the end destination of your choice using HyperAuto. Complex data integration projects that take months — or even years — to complete can now be done in hours, allowing engineers to focus on what really matters: delivering data to the people that need it in a format they can work with.
Curious to see how this translates on the ground? Learn how Fujitsu integrated various ERP systems in days to enable everyone — Operations Managers, Financial Analysts, Service Specialists, Engineers, and Executives — to tap into the power of the organization’s data, accelerating decision-making and driving better business outcomes.
Are you exploring data integration tools? If so, sign up for a personalized demo of HyperAuto here.
How Anyone Can Integrate SAP Data in Hours was originally published in Palantir Blog on Medium.