
Interoperability: The ins and outs of sharing data (Palantir RFx Blog Series, #4)

As data ecosystems evolve over time, organizations must establish strong interoperability expectations for their software to ensure its utility and impact in an uncertain future.

Editor’s note: This is the fourth post in the Palantir RFx Blog Series, which explores how organizations can better craft RFIs and RFPs to evaluate digital transformation software. Each post focuses on one key capability area within a data ecosystem, with the goal of helping companies ask the right questions to better assess technology. Previous installments have included posts on Ontology, Data Connection, and Version Control.

Introduction

For nearly two decades, Palantir has deployed software into some very complex operational environments. Our software has been installed in research labs, factories, oil rigs, server farms, and multi-tenant cloud environments. We’ve wrestled with data architectures ranging from COBOL-based green screens to the latest generation of edge computing environments. Deploying our software in these different scenarios has taught us a great deal about the importance of interoperability — specifically, the technologies necessary for data architectures to establish stable and secure connections with other systems.

“Is your system interoperable?” has become a standard question in most IT evaluations. Unfortunately, this question is largely unhelpful for purposes of evaluating software. Much like the question “are you a good driver?” invites conspicuously confident responses, many software providers declare their systems to be interoperable with a certainty that belies the actual amount of work required to connect one system to the next. It turns out there are many shades of interoperability, and rare is the data system that can connect to others in a truly flexible and efficient way.

In this post, we explore what interoperability means in practice and how organizations can differentiate between software solutions that are technically interoperable (with heavy customization and costly effort) and solutions that are actually interoperable (with standardized, pre-built connectors and minimal effort).

What is interoperability?

Interoperability refers to the capacity of a system to connect with other systems in a standardized and efficient manner. These connections enable many downstream functions, including data migration, cross-platform data sharing, federated search, and sunsetting of outdated software.

When asking the question, “What is interoperability?” it is first necessary to identify what a system does and what data artifacts it produces that can potentially be accessed by other systems. For example, a system used primarily for data collection will need to provide access to the data itself as well as relevant metadata. A system that manages tooling for data cleaning, transformation authoring, or business logic will need to provide methods for extracting the underlying code so it can be reused by other systems. Whatever a system does, the products it creates and the outputs it generates need to be easily accessible and usable by external systems in order to achieve interoperability.

In this way, interoperability is more than just one system’s ability to connect to another. Data ecosystems produce many kinds of useful information, so to achieve true interoperability organizations must be able to link systems together and specify exactly what, when, and how information is shared. This second part is critical: many software systems offer endpoint connection services for linking to other systems, but lack the flexibility to export all useful artifacts in the format, structure, and delivery cadence necessary to make those artifacts operationally useful. Common types of data products that may need to be shared or extracted from a system claiming interoperability include the raw data itself, metadata, transformation code and business logic, and model outputs.
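To make the notion of data products concrete, here is a minimal sketch, in Python, of how a system might catalog the artifacts it exposes to external consumers. Every class, field, and value is hypothetical and intended only to illustrate the range of artifacts (data, metadata, code) discussed above, not to describe any particular platform.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataProduct:
    """One artifact a system makes available to external consumers (illustrative)."""
    name: str             # e.g., "stock_levels"
    kind: str             # "dataset", "metadata", "transformation_code", "model_output"
    format: str           # an open, readable format: "parquet", "csv", "json", "sql"
    endpoint: str         # where an external system can retrieve it
    refresh_cadence: str  # how often the artifact is updated

@dataclass
class ExportManifest:
    """Catalog of everything a system can share with other systems."""
    system_name: str
    products: List[DataProduct] = field(default_factory=list)

# A hypothetical system that shares its data, the data's metadata, and its pipeline code.
manifest = ExportManifest(
    system_name="inventory-tracker",
    products=[
        DataProduct("stock_levels", "dataset", "parquet",
                    "/api/v1/exports/stock_levels", "hourly"),
        DataProduct("stock_levels_schema", "metadata", "json",
                    "/api/v1/metadata/stock_levels", "on_change"),
        DataProduct("cleaning_pipeline", "transformation_code", "sql",
                    "/api/v1/code/cleaning_pipeline", "on_change"),
    ],
)
```

The specifics matter less than the principle: if a system cannot enumerate and expose its artifacts in some comparable way, other systems have no reliable path to consume them.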

What are common issues with interoperability?

There are many ways that systems fail to interact effectively with external systems. Some common issues include:

  • The system uses custom formats that are not easily readable by external tools. This commonly occurs with homegrown systems made by internal teams or custom-built solutions made by IT consultants. It can also occur with highly specialized, proprietary software that may result in vendor lock-in (deliberately or inadvertently). Regardless of the intent, the result is a set of data products that cannot be accessed or read by other programs without custom adaptors or other costly or fragile methods.
  • The system does not permit data exports. Many systems lack a simple, repeatable mechanism to produce extracts, often because back-end access with appropriate permissioning schemes is missing. These systems require significant time and effort to configure exports, which typically leads to unnecessary costs and/or dependencies on a third party (e.g., paying a consultant to do one-off extractions).
  • The system places limits on the scale of exports. Many systems don’t make all resident data available to external systems. Technical limitations may impose size restrictions on exports, creating a situation where data and outputs cannot be synced in real time across different systems.
  • The system limits what kinds of data can be extracted. As systems evolve over time, the software may not be able to accommodate increased data volumes or new data types, or to extract valuable content such as metadata. Fixed or inflexible API endpoints can become obsolete as the scope of the original system expands to absorb new use cases and functions.
  • The system lacks robust security features. Creating gateways to share data increases the risk of data breaches or data loss. Any point of extraction that is not securely encrypted, authenticated, and audited creates data risks for an organization.

Why does interoperability matter?

Ensuring that data systems are equipped with interoperability features is key to maximizing their utility and efficacy into the future. These features help organizations fight “data entropy,” which is the tendency for data ecosystems to become more chaotic and disorganized over time (see our Palantir Explained — Trust in Data blog post for more color). Robust interoperability features also bring immediate benefits, including:

  • Increased collaboration between people and teams across the enterprise;
  • Sharing of outputs from one system to another to be used in a different context, workflow, or use case;
  • Data migrations as needed for strategic or regulatory purposes;
  • Improved performance and decision-making by pulling data into the systems that are best suited to do a given task (e.g., modeling, reporting, analysis, etc.);
  • The ability to shut down (or “sunset”) systems that are outdated or no longer effective.

It takes great foresight and effort to maintain an orderly data ecosystem. Data ecosystems develop organically, with point solutions added one by one over many years, each serving a specific, siloed function. As the system becomes more complex, IT organizations often find themselves focused more on maintaining systems than on supporting the broader goals of the organization. This “tail wags dog” dynamic is especially common at large organizations, where multiple lines of business leverage a broad and overlapping set of data sources. IT departments and data teams invest in new software to meet the needs of different user groups, creating new outputs and data products that also need to be managed. Meanwhile, incumbent software accumulates more and more data, making these systems difficult to swap out even when they become outdated or unfit to meet the needs of the teams that rely on them. As a result, many software systems reaching end-of-life simply persist at great expense with custom support and services.

Data ecosystems can become messier as they grow, as different systems and point solutions are added over time in piecemeal fashion. To maintain control over their data, organizations must ensure that all systems share interoperability features.

Interoperability provides a solution to this conundrum: the secure, reliable, and robust exchange of information (whether data, models, code, etc.) with other systems through standard protocols and methodologies. These features minimize redundancy and duplication while enabling smooth migrations when data needs to be moved or the time comes to sunset a given system.

By ensuring that every system has the capacity to share information comprehensively and on a timely basis, organizations can unlock the full range of capabilities those systems were designed for. Without these capabilities, systems grow in isolation, creating a fractured data landscape for organizations and an isolated environment for users.

Requirements

The solution must provide out-of-the-box capabilities for on-demand access to critical information. By now, most organizations understand that data systems need an open, interoperable posture in order to be effective in the long term, and that “vendor lock-in” based on proprietary data formats or closed/missing endpoints is no longer acceptable (if it ever was). Interoperability technologies are a fundamental component of any effective data system, and these features need to be baked into the architecture from day one. Exports must carry minimal latency and minimal restrictions, providing both the flexibility to keep pace as an organization grows and the comprehensive coverage needed to extract and share all relevant content and data products.

The solution must provide comprehensive access to not only data but also metadata, code, business rules, and data products. Effective connections between systems and collaboration between users require more than just the sharing of data. Enterprise data platforms store, provide, and produce many different types of information — from the data itself to attributes of the data (metadata) to model outputs — all of which need to be accessible to outside systems and different user groups.

The solution must provide secure connection points via industry standard APIs and interfaces. Connection points should expose standard APIs and drivers for extracting data, metadata, logic, and other information into third-party tools. Using industry standard connections (such as REST APIs and ODBC/JDBC drivers) ensures maximum accessibility, consistency, and security when connecting to other platforms and tools.
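As a rough illustration of what such connections look like in practice, the sketch below pulls a dataset page by page over an authenticated REST API using Python. The base URL, route, pagination parameters, and token handling are all assumptions made for the example, not any specific vendor's interface.

```python
import requests

# Hypothetical platform endpoint and token; both are illustrative.
BASE_URL = "https://data-platform.example.com/api/v1"
TOKEN = "replace-with-a-real-token"

def fetch_dataset(name: str, page_size: int = 1000):
    """Stream a dataset's rows over a standard, authenticated REST API."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    page = 0
    while True:
        resp = requests.get(
            f"{BASE_URL}/datasets/{name}/rows",
            headers=headers,
            params={"limit": page_size, "offset": page * page_size},
            timeout=30,
        )
        resp.raise_for_status()
        rows = resp.json().get("rows", [])
        if not rows:
            break
        yield from rows
        page += 1

# The same data could also be reached from SQL or BI tools via ODBC/JDBC drivers,
# e.g., pyodbc.connect("DSN=data_platform;UID=...;PWD=...") on the Python side.
```

The point is not the specific calls but the posture: any tool that speaks HTTP or ODBC/JDBC should be able to consume the data without bespoke integration work.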

The solution must export all data and data products to open and readable formats. Many systems produce artifacts in proprietary formats. Converting these “closed” data products into useful information requires custom parsers or code, resulting in wasted time or useless outputs. This significantly hampers the utility of the original system, as the outputs and data cannot be easily leveraged elsewhere. Requiring open formats upfront ensures that other systems can leverage outputs while making it easier to extract data when the software reaches end-of-life.
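As a brief sketch of what open formats buy you, the snippet below (assuming pandas, plus pyarrow for Parquet) writes the same small, made-up extract to three widely readable formats that downstream tools can consume without custom parsers.

```python
import pandas as pd

# A made-up extract standing in for data pulled from a source system.
extract = pd.DataFrame(
    {"order_id": [1001, 1002], "status": ["shipped", "pending"]}
)

# Open, widely supported formats: any downstream tool can read these
# without proprietary parsers or vendor-specific code.
extract.to_csv("orders.csv", index=False)          # plain text, human readable
extract.to_parquet("orders.parquet", index=False)  # columnar and typed (requires pyarrow)
extract.to_json("orders.json", orient="records")   # easy to consume from web services
```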

Endpoints for querying and extracting data must be secure, authenticated, and auditable. Organizational data can be left highly exposed without strong security built into endpoint connections. It is therefore critical to ensure that each endpoint is secure, authenticated, and auditable. More specifically, systems must have accountability mechanisms built in to track what data is pulled, by whom, and when. This mitigates potential misuse of the data being pulled (e.g., overly broad searches or extracts) and offers a level of transparency that maximizes organizational control of data. Without these built-in security features, organizations are much more susceptible to data breaches and misuse of data.
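The sketch below illustrates the accountability side of this requirement in miniature: a hypothetical export function that rejects unauthenticated callers and records who pulled which dataset, with what filter, and when. The token map, function, and log format are placeholders rather than a real access-control design.

```python
import logging
from datetime import datetime, timezone
from typing import Optional

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("export_audit")

# Placeholder identity map; a real system would delegate to an identity provider.
VALID_TOKENS = {"token-abc": "analyst@example.com"}

def export_dataset(dataset: str, token: str, row_filter: Optional[str] = None):
    """Return a dataset extract only for authenticated callers, and record
    who pulled what, when, and with which filter."""
    user = VALID_TOKENS.get(token)
    if user is None:
        audit_log.warning("DENIED export of %s: invalid token", dataset)
        raise PermissionError("Authentication failed")

    audit_log.info(
        "user=%s dataset=%s filter=%s time=%s",
        user, dataset, row_filter, datetime.now(timezone.utc).isoformat(),
    )
    # ... fetch and return the (possibly filtered) rows here ...
    return []
```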

The solution must include self-service capabilities to pull information out of systems. Anything short of self-service or a fixed SLA creates a high-risk dependency on third parties to provide timely and adequate support. Self-service means that users and platform administrators have the documentation, APIs, connection points, and permissions to extract the information they need directly in the platform. Self-service interoperability features reduce the time, stress, and organizational overhead commonly involved in exporting data.
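As one sketch of what self-service might look like, the script below lets a platform administrator export a dataset to CSV with their own credentials against a documented endpoint, with no vendor ticket involved. The endpoint, arguments, and response shape are hypothetical.

```python
import argparse
import csv
import requests

def main() -> None:
    parser = argparse.ArgumentParser(description="Self-service dataset export (illustrative)")
    parser.add_argument("dataset", help="name of the dataset to export")
    parser.add_argument("--token", required=True, help="API token issued to this user")
    parser.add_argument("--out", default="export.csv", help="output CSV path")
    args = parser.parse_args()

    # Hypothetical, documented REST endpoint exposed by the platform.
    resp = requests.get(
        f"https://data-platform.example.com/api/v1/datasets/{args.dataset}/rows",
        headers={"Authorization": f"Bearer {args.token}"},
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json().get("rows", [])
    if not rows:
        print(f"No rows returned for {args.dataset}")
        return

    with open(args.out, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    print(f"Exported {len(rows)} rows from {args.dataset} to {args.out}")

if __name__ == "__main__":
    main()
```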

Conclusion

For organizations to scale in a coherent and resilient way, they need to establish strong interoperability expectations for their software. Requiring interoperability as an out-of-the-box, self-service capability is critical to ensuring the continued relevance and impact of a given system. And by requiring those interoperability features up front — including standardized, pre-built connectors and the ability to specify what, when, and how data is shared between systems — organizations can ensure they have the necessary tooling to maintain control over their data even as their data ecosystems grow more complex.

