Scaling On-Prem Security at Palantir

How Insight, Foundry & Apollo Keep Thousands of Servers in Check

Introduction

When it comes to Palantir’s on-premises offerings, security is paramount. We run production software everywhere — from hyperscaler public clouds, to customer-owned data centers that will never interact with the public internet, to edge environments that move around the globe — and tailor our security to each environment.

Palantir’s Cloud Security Posture Management (CSPM) solution protects our cloud service configurations, while our on-premises deployments are secured through operating system (OS) and host-level configurations. Industry benchmarks such as DISA STIGs and CIS are valuable starting points, but Palantir’s standards go further, addressing kernel parameters, log forwarding, system management modules, backups, and the continuous validation we expect in the cloud.

On-premises solutions introduce a different risk profile. Palantir’s on-prem offerings are deployed in environments made up of physical nodes — ranging from a single server to large clusters — either at the edge (in mobile or semi-mobile stacks) or within customer data centers. Edge deployments typically involve just one to three nodes, while data center installations can scale up significantly.

Physical access is a major consideration for on-prem solutions. Securing the local network, hardening the underlying operating systems, and managing multiple OS types all come into play. Additionally, as in the cloud, using Kubernetes as the substrate to deploy solutions like Palantir’s Rubix introduces its own layer of security requirements in an on-prem context.

To meet these challenges, Palantir’s on-prem security standards encompass many individual security checks, and the number continues to grow as new threats emerge. Critical validations include verifying data encryption, enforcing opinionated firmware configurations, leveraging Trusted Platform Modules (TPMs), and ensuring accounts are properly configured. Even details such as the bootloader configuration can significantly impact security.

That’s why we developed Insight, a purpose-built solution to help Palantir environment builders and customers uphold Palantir’s rigorous on-prem deployment security standards. With Insight, organizations can trust that their on-prem environments are protected, allowing them to focus on what matters most: their mission.

Palantir Insight: Host-Level Benchmarking for On-Prem

As Palantir’s on-prem footprint has expanded, manual reviews of hardening checklists quickly became untenable. We needed a consistent and cloud-scale approach for servers that live nowhere near the cloud. Palantir Insight closes that gap.

Insight is a single Go binary that users can run on a Linux host:

$ ./insight scan

Insight provides a collection of “checks,” each of which is used to evaluate a particular security configuration. Checks are modular and can be added, removed, or executed as needed. They range from kernel parameters to the presence of required disaster recovery services. They intentionally go beyond what STIG and CIS provide, encoding the controls we expect in every environment, regardless of external compliance requirements.
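To make the idea of modular checks concrete, here is a minimal sketch in Python (Insight itself is a Go binary, and its internals are not shown in this post). Everything in it is an assumption for illustration: the registry, the `register` decorator, the `CheckResult` shape, and the two example checks are hypothetical and are not taken from Insight. Host state is passed in as a plain dictionary of sysctl values so the sketch runs anywhere; on a real Linux host those values would be read from /proc/sys.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical result shape, loosely mirroring a report entry."""
    check_name: str
    status: str      # "PASSED" or "FAILED"
    reasoning: str


# A check takes a view of host state and returns (passed, reasoning).
CheckFn = Callable[[Dict[str, str]], Tuple[bool, str]]

# Checks register themselves here, so adding or removing one is a local change.
REGISTRY: Dict[str, CheckFn] = {}


def register(name: str) -> Callable[[CheckFn], CheckFn]:
    """Decorator that adds a check function to the registry under `name`."""
    def wrap(fn: CheckFn) -> CheckFn:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("ASLR fully enabled")
def check_aslr(sysctl: Dict[str, str]) -> Tuple[bool, str]:
    # Illustrative check only: 2 means full address space randomization.
    value = sysctl.get("kernel.randomize_va_space")
    return value == "2", f"kernel.randomize_va_space is {value!r}; expected '2'"


@register("dmesg restricted")
def check_dmesg(sysctl: Dict[str, str]) -> Tuple[bool, str]:
    # Illustrative check only: 1 restricts kernel log access to privileged users.
    value = sysctl.get("kernel.dmesg_restrict")
    return value == "1", f"kernel.dmesg_restrict is {value!r}; expected '1'"


def run_checks(sysctl: Dict[str, str],
               only: Optional[List[str]] = None) -> List[CheckResult]:
    """Run every registered check, or just the named subset in `only`."""
    names = only if only is not None else list(REGISTRY)
    results = []
    for name in names:
        passed, reasoning = REGISTRY[name](sysctl)
        results.append(CheckResult(name, "PASSED" if passed else "FAILED", reasoning))
    return results
```

Keeping each check as a small function behind a registry is one way to let checks be added, removed, or run selectively without touching the runner.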

Together, checks evaluate a single host against Palantir’s on-prem baseline security standards, emitting a JSON file that contains:

Host metadata (operating system, architecture, compute infrastructure, versions)
Check results with reasons for passing or failing the given check
Summary statistics (number of checks passed/failed)

Note that all sensitive information is redacted to ensure that reports do not inadvertently expose secrets or passwords.
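The post does not describe how Insight performs redaction. As a rough illustration of the general technique, the sketch below walks a nested report structure and masks any value whose key looks sensitive. The key pattern, the `[REDACTED]` placeholder, and the `redact` function name are all assumptions made for this example.

```python
import re
from typing import Any

# Hypothetical list of key fragments treated as sensitive; a real tool
# would likely use a broader and more carefully tuned set of rules.
SENSITIVE_KEY = re.compile(r"(password|passwd|secret|token|api[_-]?key)", re.IGNORECASE)
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive dictionary values masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SENSITIVE_KEY.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars under non-sensitive keys pass through unchanged.
    return value
```

Because the function builds new containers rather than editing in place, the unredacted data never needs to be written to the report at all.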

Below is an excerpt from one such report. The “detections” section at the top contains metadata that offers useful context for reviewing the security posture of a host. The “results” field lists every check that was run, along with details explaining why each check passed or failed. Finally, the “statistics” section at the bottom of the excerpt provides a quick overview of the host’s level of Insight compliance.

{
    "detections": {
        "operating_system": "linux",
        "distribution": "xxx",
        "architecture": "xxx",
        "hostname": "xxx.xxx.xxx",
        "insight_version": "1.0.0",
        "kernel_version": "x.xx.x"
    },
    "results": [
        {
            "check_name": "No credentials in configuration files",
            "reasoning": "There were 0 files with credentials detected",
            "expected_configuration": "Configuration files should not contain credentials.",
            "status": "PASSED",
            "check_artifact": {
                "credentials_detected": {},
                "known_bad_credential_files_found": {},
                "number_of_files_with_credentials_detected": 0
            }
        },
        ...
    ],
    "statistics": {
        "ChecksTotal": 100,
        "ChecksRan": 95,
        "ChecksNotApplicable": 5,
        "ChecksSkipped": 0,
        "ChecksPassed": 93,
        "ChecksFailed": 2
    }
}
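A report in this shape is straightforward to consume programmatically. The sketch below, which assumes the field names shown in the excerpt, computes a pass rate and lists failed checks. The two consistency rules it applies (passed plus failed equals ran, and ran plus not-applicable plus skipped equals total) are inferred from the numbers in the excerpt and are not a documented guarantee; `summarize` is a hypothetical helper name.

```python
from typing import Any, Dict


def summarize(report: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one parsed Insight-style report (a dict loaded from JSON)."""
    stats = report["statistics"]
    ran = stats["ChecksRan"]

    # Assumed invariants, inferred from the excerpt: 93 + 2 == 95 and 95 + 5 + 0 == 100.
    consistent = (
        stats["ChecksPassed"] + stats["ChecksFailed"] == ran
        and ran + stats["ChecksNotApplicable"] + stats["ChecksSkipped"] == stats["ChecksTotal"]
    )

    failed = [r["check_name"] for r in report["results"] if r["status"] == "FAILED"]

    return {
        "hostname": report["detections"]["hostname"],
        "pass_rate": stats["ChecksPassed"] / ran if ran else 0.0,
        "failed_checks": failed,
        "statistics_consistent": consistent,
    }
```

With the statistics from the excerpt, the pass rate works out to 93 / 95, or roughly 97.9%.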

Forming the Ontology

Once Insight was able to produce JSON reports, our attention shifted to turning that data into actionable insights. We built this solution in Palantir Foundry by converting the JSON into first-class objects. The resulting “Insight Ontology” contains four core object types:

Host: Metadata information about the machine
Scan Report: JSON file and summary statistics
Check: Expected host configuration, including severity and remediation text
Result: Outcome of a check, with the reason for passing or failing

The image below shows that when builders upload an Insight scan report, a lightweight function parses the JSON and creates these Ontology objects. This immediately provides lineage, access control, and ad-hoc analytics — no extra code required.

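The parsing step can be pictured with plain Python dataclasses. This is a sketch of the data model only, not Foundry's function API or the actual Ontology definition: the class fields are assumptions drawn from the report excerpt, and `to_objects` is a hypothetical name. The severity and remediation text mentioned for Check are omitted because they do not appear in the excerpt.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Host:
    hostname: str
    operating_system: str
    distribution: str
    architecture: str
    kernel_version: str


@dataclass(frozen=True)
class ScanReport:
    hostname: str
    insight_version: str
    checks_passed: int
    checks_failed: int


@dataclass(frozen=True)
class Check:
    name: str
    expected_configuration: str


@dataclass(frozen=True)
class Result:
    hostname: str
    check_name: str
    status: str
    reasoning: str


def to_objects(report: Dict[str, Any]) -> Tuple[Host, ScanReport, List[Check], List[Result]]:
    """Split one parsed report into the four object types."""
    det = report["detections"]
    stats = report["statistics"]
    host = Host(det["hostname"], det["operating_system"], det["distribution"],
                det["architecture"], det["kernel_version"])
    scan = ScanReport(det["hostname"], det["insight_version"],
                      stats["ChecksPassed"], stats["ChecksFailed"])
    checks = [Check(r["check_name"], r["expected_configuration"]) for r in report["results"]]
    # Results link back to their host and check by name.
    results = [Result(det["hostname"], r["check_name"], r["status"], r["reasoning"])
               for r in report["results"]]
    return host, scan, checks, results
```

Separating Check (what is expected) from Result (what was observed on one host) lets the same check definition be shared across every scan that evaluates it.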

Collaborative Reviews in Foundry

Builders and reviewers can now see and examine the same data within a dedicated Foundry Workshop application, scoped to the stacks they own and support:

Host Details: Historical trends of Insight compliance across all supported stacks
Scan Report Table: Pass/fail status of every check on every host
Check Catalog: List of expected configurations, with explanations of their importance and remediation steps

Builders and reviewers can click through each check to see the reasons for passing or failing, and to understand any required remediation steps. Comments can also be added to individual check results to facilitate conversations between builders and reviewers.

Once a remediation is applied, builders either run Insight again or wait for an automated scan to run, then upload the new report into the same interface. Dashboards update within seconds, and the review loop closes without the need for spreadsheets or email attachments.
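The effect of a remediation can be seen by comparing the report taken before it with the report taken after. The sketch below is one way to do that outside of Foundry, assuming both reports use the "results" structure shown earlier; `diff_reports` is a hypothetical helper, not part of Insight.

```python
from typing import Any, Dict, List


def diff_reports(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, List[str]]:
    """Classify checks by how their status changed between two scans."""
    old = {r["check_name"]: r["status"] for r in before["results"]}
    new = {r["check_name"]: r["status"] for r in after["results"]}

    fixed, regressed, still_failing = [], [], []
    for name, status in new.items():
        previous = old.get(name)  # None if the check is new in this scan
        if previous == "FAILED" and status == "PASSED":
            fixed.append(name)
        elif previous == "PASSED" and status == "FAILED":
            regressed.append(name)
        elif previous == "FAILED" and status == "FAILED":
            still_failing.append(name)

    return {
        "fixed": sorted(fixed),
        "regressed": sorted(regressed),
        "still_failing": sorted(still_failing),
    }
```

Checks that appear in only one of the two reports are ignored here, which keeps the comparison meaningful when the set of checks changes between Insight versions.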

Capturing Context with the Build Questionnaire

A scan indicates which checks passed and failed, but does not provide information on how the host was built or the scope of the environment. For every new deployment, builders complete a Build Questionnaire that captures:

Basic facts: Who is building it and the size of the build
Infrastructure type: Kubernetes flavor and host operating system
Installation methods: Modules and templates used
Environment purpose: Business justification for building the stack

The Build Questionnaire provides helpful context before a new build begins, ensuring we are prepared to review the stack once it is fully stood up. Questionnaires have their own dedicated Foundry Workshop application, where the appropriate teams can easily upload and review them. This gives us a historical ledger of build details that would otherwise be lost in meeting notes.

Operating in Air-Gapped Environments

Some customer sites are permanently disconnected; data cannot leave without a lengthy downdraft process, if it can leave at all. This limitation posed additional challenges: while it’s possible to deliver Insight to these customer sites through low-to-high software bundling, on-site review of the results was not feasible. The solution was to leverage Foundry’s Marketplace offering to package and publish our entire Workshop application as a Foundry Marketplace bundle. Environment owners can now self-import the bundle into their stack and access identical dashboards and remediation content — no internet connection required.

In addition to our Marketplace-bundled Foundry Workshop application, we expose metrics related to Insight compliance via our standardized observability and monitoring tooling. This allows large-scale deployments, regardless of whether Foundry is included, to automatically gather Insight compliance results across many hosts into their existing observability tooling.
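The post does not name the observability tooling or the metric format. Purely as an illustration, the sketch below renders per-host compliance counts in the Prometheus text exposition format, which many monitoring stacks can scrape. The choice of Prometheus, the metric names, and the `host` label are all assumptions for this example.

```python
from typing import Any, Dict, List

# Hypothetical metric names mapped to (statistics key, help text).
METRICS = {
    "insight_checks_passed": ("ChecksPassed", "Insight checks passed on the host."),
    "insight_checks_failed": ("ChecksFailed", "Insight checks failed on the host."),
}


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline for a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(reports: List[Dict[str, Any]]) -> str:
    """Render one gauge sample per host for each metric."""
    lines = []
    for metric, (stat_key, help_text) in METRICS.items():
        # All samples for a metric are grouped under its HELP and TYPE lines.
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        for report in reports:
            host = escape_label(report["detections"]["hostname"])
            value = report["statistics"][stat_key]
            lines.append(f'{metric}{{host="{host}"}} {value}')
    return "\n".join(lines) + "\n"
```

Gauges fit here because each scan reports a current count rather than an ever-increasing total.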

Eliminating the Last Mile with Apollo

During Insight’s early testing, builders had to fetch the Insight binary from our artifact registry, copy it to every host, execute the scan, retrieve the output, and finally upload it to Foundry. Multiplied across hundreds of hosts, this process became a deterrent to running scans and was ultimately untenable.

We replaced the manual steps with a Helm chart that deploys Insight as a DaemonSet on every node via Palantir Apollo. The scan runs on a set cadence depending on the environment, writing results to a shared volume. The Insight binary is kept up to date and rolled out to stacks automatically using Apollo’s CI/CD and bundling solutions.

With the Helm chart, builders can now simply download the aggregated reports once per environment and upload them to Foundry. No SSH dance, no version drift, and no pod-to-host file transfers. For deployments that do not use Kubernetes, we built a Puppet module that enables the same process.
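How the per-node reports on the shared volume are combined is not described here. As a rough sketch of what aggregation could look like, the function below reads every JSON report in a directory and rolls up the counts. The one-file-per-node layout, the `*.json` naming, and the `aggregate` function are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def aggregate(report_dir: Union[str, Path]) -> Dict[str, Any]:
    """Roll up every *.json report found directly under `report_dir`."""
    hosts = 0
    passed = 0
    failed = 0
    failing_hosts: List[str] = []

    # Sorted so the output is stable regardless of filesystem ordering.
    for path in sorted(Path(report_dir).glob("*.json")):
        report = json.loads(path.read_text(encoding="utf-8"))
        stats = report["statistics"]
        hosts += 1
        passed += stats["ChecksPassed"]
        failed += stats["ChecksFailed"]
        if stats["ChecksFailed"] > 0:
            failing_hosts.append(report["detections"]["hostname"])

    return {
        "hosts": hosts,
        "checks_passed": passed,
        "checks_failed": failed,
        "failing_hosts": failing_hosts,
    }
```

A roll-up like this gives a builder a quick per-environment view before the individual reports are uploaded for detailed review.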

Impact and What Comes Next

Since rolling out Insight, InfoSec has been involved at every stage of the build process, greatly improving stack visibility and security. This approach has driven a substantial increase in host compliance and streamlined remediation, all while minimizing manual effort. Most importantly, we have developed strong relationships with key stakeholders across the company.

Future work will focus on making Insight even more ingrained in the build process to alleviate pressure on stack builders. We are focused on addressing issues in upstream build templating so that manual remediations become a thing of the past. Additionally, we plan to invest in better educating and providing resources for Palantirians to learn more about on-prem security.

For those wrestling with host compliance at scale — especially across mixed cloud, on-prem, and air-gapped environments — we hope Insight’s architecture sparks inspiration.

Scaling On-Prem Security at Palantir was originally published in Palantir Blog on Medium.