FORML: Learning to Reweight Data for Fairness

Machine learning models are trained to minimize the mean loss for a single metric, and thus typically do not consider fairness and robustness. Neglecting such metrics in training can make these models prone to fairness violations when training data are imbalanced or test distributions differ. This work introduces Fairness Optimized Reweighting via Meta-Learning (FORML), a …

A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing

A key algorithm for understanding the world is material segmentation, which assigns a label (metal, glass, etc.) to each pixel. We find that a model trained on existing data underperforms in some settings and propose to address this with a large-scale dataset of 3.2 million dense segments on 44,560 indoor and outdoor images, which is …

Regularized Training of Nearest Neighbor Language Models

Including memory banks in a natural language processing architecture increases model capacity by equipping it with additional data at inference time. In this paper, we build upon kNN-LM, which uses a pre-trained language model together with an exhaustive kNN search through the training data (memory bank) to achieve state-of-the-art results. We investigate whether we can …

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent …

How The Chefz serves the perfect meal with Amazon Personalize

This is a guest post by Ramzi Alqrainy, Chief Technology Officer, The Chefz. The Chefz is a Saudi-based online food delivery startup, founded in 2016. At the core of The Chefz’s business model is enabling its customers to order food and sweets from elite restaurants, bakeries, and chocolate shops. In this post, we explain …

Distributed training with Amazon EKS and Torch Distributed Elastic

Distributed deep learning model training is becoming increasingly important as data sizes are growing in many industries. Many applications in computer vision and natural language processing now require training of deep learning models, which are growing exponentially in complexity and are often trained with hundreds of terabytes of data. It then becomes important to use …

The Benefits of Running Kubernetes on Ephemeral Compute

This is the third post in our blog series on Rubix (#1, #2), our effort to rebuild our cloud architecture around Kubernetes. Introduction: The advent of containers and their orchestration platforms, most popularly Kubernetes (K8s), has familiarized engineers with the concept of ephemeral compute: once a workload has completed, the resources used to run it …

From Pipeline to Prospect: Inside Palantir Recruitment

Top tips for interns and graduates applying to Palantir, featuring advice from two recent recruiting interns and Palantir’s Early Talent Recruiting Team. Hi! We’re Sonam and Jordan. This summer, we had the incredible opportunity to intern on the Recruiting Team at Palantir Technologies. We grew professionally and personally through exposure to multiple sides of Palantir. Sourcing …

A Day in the Life of a Palantir Incident Management Engineer

The Palantir Incident Response team addresses the highest-priority issues across our platforms — Foundry, Gotham, and Apollo — ensuring they continue to support mission-critical work around the world. Essentially, the team’s core mandate is to respond when things go wrong. More broadly, Incident Response focuses on business continuity while adapting to an ever-expanding feature set as development teams across …

Introducing Palantir’s Next-Gen Pipeline Builder

Since our inception, we’ve had the privilege of working on some of the most complex, challenging, data-driven missions in the world. This has required us to build software that enables a diverse range of problem-solvers across ever-changing geopolitical, technological, and economic landscapes. Over the past two decades, these experiences have taught us the foundational importance …