Deploying Across Security Domains

Palantir Binary Transfer Service & Apollo

In the same way we’ve come to expect rapid delivery of Amazon orders, every organization strives for up-to-date, secure software at the speed of need. For those with a modest set of products, capable DevOps teams, and networks that can connect to the public internet, this is a tractable challenge.

Consider, however, Amazon’s challenge of delivering goods to Mackinac Island, MI, which offers a quaint analogy to a disconnected or “air-gapped” network. It’s an island accessible only by ferry, and motor vehicles have been banned there since the late 19th century. Once they arrive ashore, Amazon orders are delivered by horseback and dray carriage around town. This is just one of several austere delivery environments where the company refuses to fold.

If you are a government agency, a highly regulated entity, or a product company servicing these types of organizations, you have customers facing analogous constraints in CI/CD. They may have classified enclaves, fleets of disconnected devices, or remote networks that cannot be served by traditional SaaS frameworks for compliance or security reasons. The problem is that these kinds of air gaps are part and parcel of national security, defense, public health, and other missions with stringent compliance regimes.

Over decades of working in these spaces, Palantir has honed its autonomous, constraint-based continuous delivery platform — Apollo — to make sure these missions operate with updated, secure software.

In this post, we unpack Palantir Apollo’s approach to deploying software into unclassified and classified air-gapped environments. We begin by framing the problem and then outline Apollo’s solution for air-gapped continuous delivery and day 2 operations. Finally, we unveil a new capability called Palantir Binary Transfer Service (BTS), which autonomously coordinates the delivery of products and product metadata across domains.

Crafting Network Paths

Consider the plight of a modern physically segmented network serving a national security mission. In these fortified environments, out-of-date software compromises potency and responsiveness, and exposes operations to vulnerabilities. Without a network path from unclassified environments to classified ones, one cannot simply deploy or update software binaries with standard file transfer protocols.

The mousetrap these organizations have historically had to maintain to keep their software ticking along is frustrating:

  1. Determine which software packages need to be sent high-side based on what’s needed, compatible, or allowable
  2. Scan those packages for malicious code and vulnerabilities
  3. Burn them onto one or more DVDs
  4. Manually scan for vulnerabilities
  5. Walk them over to the disconnected network (aka “sneakernet”)
  6. Physically transfer the DVD/media
  7. Scan the software again for viruses high-side
  8. Deploy the software to the target environment(s)

We know because we’ve been in this same foxhole alongside our customers. The transfer process was only modestly improved when Cross Domain Solutions (CDS) arrived on the scene around the turn of the century. Although they permit data to pass between security domains, they do nothing to make that transfer efficient. (This is to say nothing of the lack of native DevOps support in these solutions.)

Getting software deployed onto assets on classified networks was and largely remains time-intensive, sometimes requiring hours of babysitting connections and status bars. The delays are so pronounced that some software companies were forking their codebase into public and private sector versions, which meant commercial users were typically getting new features, patches, and updates faster than their government counterparts.

Danielle Metz, CIO in the Office of the U.S. Secretary of Defense, has placed rapid software delivery at the heart of the DoD’s modernization efforts, stating “the way the department develops and deploys software production systems is a source of new advantage.” At a time when talent cultivation and retention are key tenets of the DoD’s software strategy, it’s risky to repeatedly subject developers to this discouraging process.

To return to our starting metaphor, imagine the product creators were also required to jump on the Mackinac Island Ferry and trot around the island by horse and carriage to ensure safe delivery of their orders. How long before the deployment of the product became operationally and emotionally prohibitive?

Driving Human-Machine Collaboration

At Palantir, one of our foundational goals has always been to build platforms that amplify the relative strengths of humans and machines: let robots do what they do best and free humans to creatively build and apply software. We agree with Ms. Metz, and that’s why we equipped Apollo to handle challenging software deployment scenarios with a balance of human and machine coaction. Let’s look at how this works in practice.

Apollo has a hub-and-spoke architecture. The SaaS hub, which operates as the central control plane, can manage an arbitrary number and type of spoke environments. Those spoke environments (e.g., a Kubernetes cluster) run their own spoke control planes that report information and telemetry back to the central hub. Because it maintains a catalog of all software versions that are or should be present in the spoke environments, the hub’s orchestration engine can issue actions (installs, upgrades, downgrades, config changes, etc.) to agents running on each spoke. Each agent then reports the status of those actions back to the hub. If, however, a security boundary or air gap prevents an environment from communicating with the SaaS hub for instructions (e.g., “upgrade Postgres 14.8 to 15.3”), a different architecture is needed. Enter Apollo Remote Hubs.
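To make the hub-and-spoke loop concrete, here is a minimal sketch of how a hub might turn its catalog of desired versions and a spoke's reported versions into actions. This is an illustration only, not Apollo's implementation: the `Action` type, `plan_actions`, and the dotted numeric version format are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical instruction a hub sends to a spoke agent."""
    kind: str                      # "install", "upgrade", or "downgrade"
    product: str
    from_version: Optional[str]    # None when the product is not yet installed
    to_version: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '15.3' into (15, 3) so versions compare numerically (assumed format)."""
    return tuple(int(part) for part in version.split("."))


def plan_actions(desired: dict[str, str], reported: dict[str, str]) -> list[Action]:
    """Compare desired versions against what the spoke reported; emit the difference."""
    actions = []
    for product, target in sorted(desired.items()):
        current = reported.get(product)
        if current is None:
            actions.append(Action("install", product, None, target))
        elif parse_version(current) < parse_version(target):
            actions.append(Action("upgrade", product, current, target))
        elif parse_version(current) > parse_version(target):
            actions.append(Action("downgrade", product, current, target))
        # Equal versions need no action.
    return actions
```

Calling `plan_actions({"postgres": "15.3"}, {"postgres": "14.8"})` yields a single upgrade action from 14.8 to 15.3, which mirrors the instruction quoted above. In an air-gapped setting the same comparison would have to run on a hub inside the boundary, since the spoke cannot reach the SaaS hub.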

A remote Apollo hub is a control plane that resides in an air-gapped or otherwise constrained network and manages spoke environments using a copy of the product catalog from the SaaS hub. Your SaaS hub’s product catalog is populated using an Apollo CLI command in your CI pipeline; your remote hub’s catalog, by contrast, is populated periodically with bundles of software data, images, configurations, and settings generated on the SaaS hub. This is Apollo’s out-of-the-box method for transferring software across security boundaries, and unlike BTS (discussed below) it still involves a manual “sneakernet” step.

What’s the benefit over the frustrating old-school transfer method enumerated above? There are a few, and they have to do with how we’ve architected the bundle itself to work in complex security and compliance regimes.

When you create, download, and manually transfer a bundle from your SaaS hub to your remote hub, the former becomes “aware” of versions, configurations, and metadata deployed in the latter. As a result, subsequent bundles only contain net new, intelligently compressed items for transfer rather than the entire platform definition every time. The Apollo bundle also crucially contains metadata that informs software delivery paths and instructions for product recalls generated on the SaaS hub. Anti-virus (AV) metadata is also relayed in the bundle to ensure high-side AV solutions are always using up-to-date information. Payloads are scanned and their integrity is verified at multiple checkpoints. In short, Apollo’s default process for data transfer includes a manual transfer step, but it alleviates several of yesterday’s DevOps burdens.
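The "net new items only" idea can be sketched as a set difference between the catalog and what the SaaS hub believes the remote hub already holds. The code below is a simplified illustration, not the Apollo bundle format: the manifest layout, the `build_delta_bundle` and `record_transfer` helpers, and the choice of SHA-256 for integrity checksums are assumptions.

```python
from __future__ import annotations

import hashlib


def build_delta_bundle(catalog: dict[str, bytes], known_remote: set[str]) -> dict:
    """Build a manifest of artifacts the remote hub is not yet known to have.

    `catalog` maps a hypothetical artifact id (e.g. "postgres:15.3") to its bytes;
    `known_remote` holds the ids recorded as already transferred.
    """
    entries = []
    for artifact_id in sorted(catalog):
        if artifact_id in known_remote:
            continue  # Already on the high side; do not resend it.
        payload = catalog[artifact_id]
        entries.append({
            "id": artifact_id,
            "size": len(payload),
            # Checksum lets the receiving side verify integrity after transfer.
            "sha256": hashlib.sha256(payload).hexdigest(),
        })
    return {"artifacts": entries}


def record_transfer(bundle: dict, known_remote: set[str]) -> set[str]:
    """Return the updated set of ids the remote hub is known to hold."""
    return known_remote | {entry["id"] for entry in bundle["artifacts"]}
```

After the first bundle is recorded with `record_transfer`, a later call to `build_delta_bundle` lists only artifacts added to the catalog since then, which is why later transfers stay small.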

This represented a huge step forward for our customers in getting software securely and reliably deployed onto air-gapped networks. It also enabled us to achieve U.S. government accreditations and DoD Impact Level authorizations for Palantir software.

With Apollo’s BTS module, we’ve taken the default software deployment a step further. Now those SaaS-generated bundles can be transferred to and installed in their target environments efficiently and securely using automation. It’s like replacing the ferry and carriage delivery process on Mackinac Island with a high-speed drone that complies with all FAA requirements and local regulations.

When combined with the Palantir BTS module, Apollo solves three challenges associated with transferring software from low-side (e.g., the SaaS Apollo hub) to high-side (e.g., remote hub residing in a classified network) environments. The first challenge was increasing the throughput rate for file transfers. As noted above, transferring full platform binaries in every bundle — even only occasionally — can result in multi-day wait times. Apollo has an efficient data transfer mechanism that assesses, packages, and scans only software that needs to be sent.

The Palantir BTS module specifically addresses the next challenge: securely automating the file transfer process so that bundles can be sent to the remote hub with minimal DevOps overhead. By default, Apollo lets you download your bundle, burn it to a DVD, sneakernet it to your air-gapped network, and transfer it to your remote hub. Now, users can forgo this manual workflow in favor of an automated software transfer process that we’ve honed over many years and that has since been accepted by authorizing officials throughout the U.S. government. The Palantir BTS module leverages AWS Diode as a cross-domain solution to move files between security domains. What’s more, it contains an automated mechanism that tracks all software artifacts that have been transferred via Diode and polls the Apollo Catalog for new releases that need to be transferred. When it finds a new release, it verifies the security scan results as well as the integrity of the artifact by checking the cryptographic checksum published to the artifact repository against a hash generated on the fly. In short, even Apollo’s push-button bundle generation process can be further automated so that new software versions can be scanned, transferred, and released high-side at the speed of mission.
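The polling and verification behavior described above can be sketched roughly as follows. This is not the BTS implementation: the `Release` record, the `poll_once` function, and the use of SHA-256 are assumptions for illustration, and the `send` callback is a placeholder for the cross-domain transfer rather than a real AWS Diode API call.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Release:
    """Hypothetical catalog entry for one artifact."""
    artifact_id: str
    payload: bytes
    published_sha256: str   # checksum published to the artifact repository
    scan_passed: bool       # result of the security scan


def eligible_for_transfer(release: Release, already_transferred: set[str]) -> bool:
    """A release moves only if it is new, passed scanning, and matches its checksum."""
    if release.artifact_id in already_transferred:
        return False
    if not release.scan_passed:
        return False
    # Hash the payload on the fly and compare it with the published checksum.
    computed = hashlib.sha256(release.payload).hexdigest()
    return computed == release.published_sha256


def poll_once(
    catalog: Iterable[Release],
    already_transferred: set[str],
    send: Callable[[Release], None],
) -> list[str]:
    """Run one polling pass; return the ids sent in this pass."""
    sent = []
    for release in catalog:
        if eligible_for_transfer(release, already_transferred):
            send(release)  # placeholder for the cross-domain transfer
            already_transferred.add(release.artifact_id)
            sent.append(release.artifact_id)
    return sent
```

Running `poll_once` repeatedly against the same catalog sends each verified artifact exactly once, because the transferred set acts as the tracking record. A release with a failed scan or a mismatched checksum is simply skipped on every pass.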

Finally — but certainly not least — was the challenge of ensuring security and integrity of software transfers. Our U.S. government customers are subject to stringent authorization regimes centered largely around security requirements. Apollo has enabled us to achieve FedRAMP, DoD IL5 and IL6, and ICD 503 accreditations of our own software in these environments. Vulnerability and malicious code scanning occurs at checkpoints in the file creation, transfer, and unbundling process.

Organizations with air-gapped environments face monumental challenges every day, but deploying up-to-date software shouldn’t be one of them. Yes, the technical and process constraints associated with governmental accreditations are hard to meet and require a rigorous security architecture. Automating those processes in a compliant way is an extremely arduous task, typically done by large teams of cleared personnel. The operational overhead of that development paradigm cannot be overstated. The Palantir BTS module brings more value than the mere sum of its components; it also encodes Palantir’s experience into its approved architecture. Today, the Palantir BTS module is used across the government to keep mission systems safely in step with cutting-edge technology and has enabled other software providers to offer capabilities within Palantir’s secure, already-accredited environments.

If you’re interested in learning how you can leverage the Palantir BTS module to get your software into the hands of government users across security boundaries or if you’re a government agency looking for a way to help your software vendors achieve accreditation without the standard massive time and resource investment, visit us here and let’s start the conversation.


Deploying Across Security Domains was originally published in Palantir Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.