The Baseline Team and Forward-Deployed Infrastructure Engineering at Palantir

Inside Look: The Baseline Team and Forward-Deployed Infrastructure Engineering at Palantir

At Palantir, our customers rely on our applications running seamlessly across a variety of cloud providers, on-premises hardware, and both commercial and government networks, and they need our platforms to function reliably in every one of these environments. This is where we, the Forward-Deployed Infrastructure Engineering team (known as the Baseline team), step in. Amid this complexity, our mission is to keep operations running smoothly and to support the fleet in ways that enable our customers’ missions at every turn.

The Baseline team aims to reduce the complexity of our fleet as Palantir scales, so that our product developers and customers can focus on achieving outcomes without worrying about the underlying infrastructure and software. Operating at the intersection of innovation and reliability, we accomplish this through three main types of work: environment operations, product support, and infrastructure projects.

Environment Operations

Certain operations at Palantir are better managed by a team focused on the environment or fleet than by individual Product Development (PD) teams. For instance, building a new environment varies based on the platform the customer needs, the type of infrastructure, and whether the environment is on the commercial internet or a classified network, among other considerations. Centralizing this process within the Baseline team allows for consistent build automation and lets us adapt both to the evolving needs of PD teams and to customer and infrastructure requirements driven by Business Development (BD).

When rolling out a new product to the fleet, we manage installations on hundreds of stacks every week. We automate this process wherever possible and roll out in phases: a new product goes to less-risky environments first and moves through progressively riskier ones on its way to the rest of the fleet. This automation runs within an internal instance of Foundry and includes dashboards, so anyone interested can track a new product’s rollout. Developers can see any issues that occurred during the rollout, and the automation ensures that these installs become part of the appropriate default platforms for future environment builds.
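To make the phased approach concrete, here is a minimal Python sketch of risk-ordered rollout. It is not the Foundry-based automation described above; the `Environment` type, its numeric `risk_tier` field, and the `install` callback are hypothetical stand-ins. Environments are grouped into phases by tier, and a failure anywhere in a phase stops the rollout before it reaches riskier environments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Environment:
    name: str
    risk_tier: int  # hypothetical: lower tier = lower risk (e.g., internal test stacks)


def plan_phases(envs: List[Environment]) -> List[List[Environment]]:
    """Group environments into phases, ordered from least to most risky tier."""
    tiers: Dict[int, List[Environment]] = {}
    for env in envs:
        tiers.setdefault(env.risk_tier, []).append(env)
    return [sorted(tiers[tier], key=lambda e: e.name) for tier in sorted(tiers)]


def roll_out(
    envs: List[Environment], install: Callable[[Environment], bool]
) -> Dict[str, str]:
    """Install phase by phase; stop before any riskier phase if an install fails."""
    status: Dict[str, str] = {}
    for phase in plan_phases(envs):
        # Every environment in the current phase is attempted.
        results = {env.name: install(env) for env in phase}
        for name, ok in results.items():
            status[name] = "installed" if ok else "failed"
        if not all(results.values()):
            break  # riskier phases are never attempted
    for env in envs:
        status.setdefault(env.name, "pending")  # not reached before the halt
    return status


if __name__ == "__main__":
    fleet = [
        Environment("dev-1", 0),
        Environment("internal-prod", 1),
        Environment("customer-a", 2),
    ]
    # Simulate a failed install on the internal production stack.
    print(roll_out(fleet, install=lambda env: env.name != "internal-prod"))
```

Halting the whole rollout on any failure is the simplest possible policy; a production system might add retries, soak time between phases, or manual approval gates, none of which are modeled here.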

The Baseline team also engages in the rollout of new infrastructure types. For example, when Palantir’s deployment model shifted from software running on VMs to software running on containerized platforms like Kubernetes, we played a crucial role. We built some of the first environments running on containers, helped debug the issues that come with setting up new infrastructure for the first time, and identified gaps in how we monitored and operated the new environments. We planned the best way to migrate to the new infrastructure, identified which environments could take on more risk, and migrated those first. We also planned for the more remote parts of our fleet that might need special handling during these migrations, getting ahead of potential edge cases.

Product Support

Baseline partners with PD teams to expand Palantir’s software footprint into new and challenging spaces. PD teams are often directly responsible for the products they develop, but this is not always feasible. In remote and disconnected networks, some of which are classified, access is limited to individuals who may need to be trained and accredited, a process that can take many months. This is where the Baseline team comes in: we get accredited for access and provide first-line support for these environments, so the entire PD team does not need to go through the lengthy accreditation process.

Working with PD teams in this way has allowed us to establish best practices. One key area we’ve focused on is clear, straightforward monitoring that lets someone who did not write the code quickly diagnose issues in a remote environment, so that both the Baseline and PD teams can start debugging immediately. Before we had robust monitoring, we sometimes resorted to redrawing graphs and metrics from remote environments in Microsoft Paint and asking, “Is this a problem?” If the answer was yes, we would follow up with PD to create a monitor for it, which streamlined the process for future incidents and sped up issue resolution. This benefits not only us but also PD, whose best practices improve for the environments they own directly.

The Baseline team also engineers solutions that make the most of limited access in other areas. On some networks, for example, some PD teams have the access needed to be operationally responsible for their products while others do not. In these cases, we work with the teams that have access to empower them to support their own products. For teams without access, we focus on educating them about the network’s constraints; even without direct access, they remain essential for tasks like configuration changes. We have established a process of co-paging a Baseline team member and a PD team member into incidents, with the Baseline member working hands-on and the PD member providing product expertise.

Infrastructure Projects

The Baseline team is on the frontlines, often developing novel solutions for challenges we encounter in the field. For example, one challenge we face when upgrading our platforms is reducing the lag in shipping binaries to remote networks. We want our US government customers on classified networks to receive upgrades as quickly as our commercial customers. To address this challenge, we built Palantir’s Binary Transfer Service (BTS), which automatically ships product binaries and product metadata across domains.

We constantly build technical solutions for the day-to-day operational workflows we oversee. The fleet-wide installation automation described above is one example; our stack build automation is another. Initially, stack builds were managed through a multi-page checklist that a Baseline team member would work through to get Foundry installed on an AWS stack. This process quickly evolved into a series of helper scripts, then more complex Python automation. It eventually expanded to handle platforms beyond Foundry and to support different cloud providers, and it has continued to evolve. Born out of field needs, this automation rapidly became a core offering: it is now part of the Apollo platform and is capable of spinning up any existing or new platform on any infrastructure type.

We also plug in whenever infrastructure is deployed under a new compliance regime. We’ve been integral in achieving US government accreditations, including FedRAMP Moderate, IL5, and IL6. Beyond helping stand up the new environments needed for these accreditations, we figure out how to adapt existing practices to meet each regime’s requirements. For instance, if a new compliance regime requires a database to follow a Security Technical Implementation Guide (STIG), we determine whether it makes sense to apply that standard to all databases of that type across the fleet or to build automation so that new databases in that regime automatically get the correct configuration.

The Baseline team also tackles projects that enable customer outcomes on a shorter time scale than a PD team might be able to support. For example, when the first environments were deployed to the edge, Palantir faced infrastructure constraints we had not seen before. We had to figure out how to run services that typically required three nodes for quorum on a single host. We also had to handle environments disconnecting from Apollo, as happens when satellites, drones, or trucks go out of range. Our goal was not to find optimal solutions right away but to urgently build something that enabled end users (warfighters in remote places) to accomplish their missions. As we built the first of these environments, the Baseline team took stock of the many issues that came up and worked out short-term workarounds for each, so we could quickly get our customers in the field what they needed.
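Why a single host is hard for such services: assuming they use majority-quorum consensus (as Raft-based stores such as etcd do; the post does not name the services involved), an n-node cluster needs floor(n/2) + 1 nodes to agree and can lose only the rest. The short Python sketch below computes this. Three nodes tolerate one failure, while a single host forms a quorum on its own but tolerates none. The function names are illustrative only.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of nodes that is a strict majority of an n-node cluster."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can fail while the survivors still form a majority quorum."""
    return n - majority_quorum(n)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} node(s): quorum={majority_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

On a single host, the quorum is trivially satisfied, but any failure takes the service down, which is one reason such edge deployments call for workarounds rather than the usual three-node layout.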

Join Us, Engineer the Future

Although we have covered three distinct areas where the Baseline team plugs in to help scale Palantir’s operations, it’s important to note that these areas are deeply interconnected. The product support work we do informs the infrastructure projects we develop. The environment operations we undertake shape how we think about product support in the months and years to come. The landscape around all these efforts is constantly evolving as Palantir drives to deliver the latest in AI technology, product delivery, and much more to an ever-widening range of customers.

On the Baseline team, we thrive on overcoming challenges and pioneering solutions that extend Palantir’s capabilities to new frontiers. Our work is dynamic and impactful, ensuring that our technology makes a meaningful difference where it’s needed most. Join us and be part of a team that’s dedicated to engineering the future. Together, we can continue to innovate, adapt, and deliver exceptional results for our customers.

If you are interested in learning more about the Baseline team, please email baseline-recruiting@palantir.com.


The Baseline Team and Forward-Deployed Infrastructure Engineering at Palantir was originally published in Palantir Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.