Building with Palantir AIP: Semantic Search

By Chad Wahlquist, Palantir Forward Deployed Architect

Welcome! This is the first “cooking show”-style video in the Building with Palantir AIP series, where Palantir engineers and architects will take you through how to build end-to-end workflows using Palantir’s AI Platform.

This first installment demonstrates how AIP enables developers to go from unstructured data to a full-featured semantic search application in a matter of minutes. Semantic search enables users to search based on the meaning of words and phrases, rather than being confined to keyword searches. This means results are more relevant, all-encompassing, and reliable — and can ultimately help drive workflows.

Like a true cooking show, the video kicks off by showcasing the final semantic search application (shown below), and then walks through how to assemble it step-by-step.

This is just one of many use cases showing how you can use AIP to build an interactive AI copilot that uncovers the useful, often elusive knowledge ingrained in an enterprise’s unstructured data.

Think of the hidden, actionable insights that LLMs can help us find across ticketing systems, emails, documents, and more. With AIP, you can swiftly unlock and utilize this knowledge without having to duplicate your data systems or spend months stitching together tools to make an application that’s production-grade.

Walkthrough

In building the semantic search application, this video touches on seven topics (in sequence): Virtual Tables, Embeddings, Pipeline Builder, the Ontology-powered Vector Store, Semantic Search, AIP Logic, and AIP Assist.

Let’s build!

1. Virtual Tables

Virtual Tables is a data connectivity feature within Palantir AIP that enables developers to seamlessly register data assets within their existing cloud data platforms without needing to duplicate data. That means you can begin building on your data in AIP within minutes.

In the video, we demonstrate setting up a connection to Google’s BigQuery, registering a table, and immediately starting to build.

For additional details and the cloud data platforms that Virtual Tables currently supports, refer to the documentation.

2. Embeddings

Embeddings are a critical component of modern natural language processing, and instrumental to workflows deploying the newest generation of LLMs. Embeddings capture the implicit semantic relationships between words, phrases, or documents, and can be used to power workflows that go far beyond traditional token- and keyword-based search.

In the video, we demonstrate how to use AIP to generate embeddings by converting free-form text from a trouble ticket system into vectors using an embedding model. As shown in the example below, numerically similar embeddings exhibit semantic similarity: for instance, the embedding vector of “face mask” will be closer to the embedding vector of “face covering” than to that of “respirator.”

The vectors we generate represent the semantic meaning of the ticket problem and resolution, enabling easier search and summarization of knowledge across the enterprise.
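To make “closer” concrete, here is a minimal sketch of one common way to compare two embeddings, cosine similarity. The three-dimensional vectors are invented for illustration (real embedding models return vectors with hundreds or thousands of dimensions), and the function name is our own rather than anything provided by AIP.

```typescript
// Cosine similarity of two equal-length vectors:
// 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
        throw new Error("Vectors must have the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical toy vectors, NOT real model output.
const faceMask = [0.9, 0.1, 0.2];
const faceCovering = [0.85, 0.15, 0.25];
const respirator = [0.4, 0.8, 0.3];

const maskVsCovering = cosineSimilarity(faceMask, faceCovering);
const maskVsRespirator = cosineSimilarity(faceMask, respirator);

// With these toy values, "face mask" scores higher against
// "face covering" than against "respirator".
console.log(maskVsCovering > maskVsRespirator); // true
```

Cosine similarity is only one possible choice; other distance measures, such as Euclidean distance or the dot product, are also commonly used with embeddings.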

3. Pipeline Builder

Palantir’s flagship data pipelining application, Pipeline Builder, brings the power of low/no-code to a robust, auto-scaling backend that leverages both Apache Spark and Apache Flink.

Pipeline Builder allows users with any level of technical ability to create production-grade, scalable pipelines — while also embracing git-style branching and merging for both the code and the data in the pipeline.

We demonstrate how Pipeline Builder can connect to Virtual Tables from BigQuery, employ AI-driven natural language prompts to construct the cleaning steps of the pipeline (along with documentation), and then call OpenAI’s ada embedding model using the embedding builder board.¹
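For a sense of what a cleaning step does before text reaches the embedding model, here is a small sketch in plain TypeScript. Pipeline Builder itself is low/no-code, so this is not what the tool generates; the function name and the specific rules (dropping HTML-like tags, collapsing whitespace) are assumptions chosen purely for illustration.

```typescript
// Hypothetical cleaning step. The rules are illustrative assumptions,
// not the steps produced in the video.
function cleanTicketText(raw: string): string {
    return raw
        .replace(/<[^>]*>/g, " ") // replace HTML-like tags with a space
        .replace(/\s+/g, " ")     // collapse runs of whitespace
        .trim();                  // strip leading and trailing spaces
}

const cleaned = cleanTicketText("  <p>Printer   offline</p>\n after update ");
console.log(cleaned); // "Printer offline after update"
```

Normalizing text this way before embedding helps ensure that formatting noise in the source system does not influence the resulting vectors.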

4. Ontology-powered Vector Store

The Palantir Ontology lies at the heart of the platform, enabling users to create a decision-centric twin of their business and processes.

The Ontology consists of data, logic, and action — functioning as a real-time decision graph of your operations. As a semantic system, the Ontology serves as the ideal foundation for LLMs to securely interact with your enterprise.

We demonstrate creating an ontology object that semantically represents the trouble tickets stored in BigQuery, including a look at native vector management that can be configured in a few clicks.
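Conceptually, a vector store answers one question: given a query vector, which stored vectors are nearest to it? The sketch below shows that idea as a brute-force, in-memory search. It is not how the Ontology implements vector management; the `EmbeddedTicket` type, the function names, and the two-dimensional sample vectors are all hypothetical, and production vector stores generally rely on indexing rather than scanning every record.

```typescript
// Hypothetical record shape: a ticket id plus its embedding.
interface EmbeddedTicket {
    id: string;
    embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force k-nearest-neighbor search: score every ticket,
// sort by descending similarity, and keep the top k.
function nearestTickets(query: number[], tickets: EmbeddedTicket[], k: number): EmbeddedTicket[] {
    return tickets
        .map(ticket => ({ ticket, score: cosine(query, ticket.embedding) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k)
        .map(scored => scored.ticket);
}

// Toy data invented for illustration.
const store: EmbeddedTicket[] = [
    { id: "T-1", embedding: [1, 0] },
    { id: "T-2", embedding: [0, 1] },
    { id: "T-3", embedding: [0.9, 0.1] },
];

const top2 = nearestTickets([1, 0], store, 2).map(t => t.id);
console.log(top2); // ["T-1", "T-3"]
```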

5. Semantic Search

Semantic search enables users to go beyond simple traditional search paradigms, using AI to leverage the inherent meaning in text to find the most pertinent information and drive workflows.

In previous steps, we created Ontology objects with semantic embeddings. Here, we create a custom function that will act as a tool for our LLM-based AI copilot, demonstrating how Ontology-driven tools open up the world of operational workflows.

import { Function, Integer } from "@foundry/functions-api"
import { Objects, TitanTroubleTickets } from "@foundry/ontology-api"
import { TextEmbeddingAda_002 } from "@foundry/models-api/language-models"

export class MyFunctions {
    @Function()
    public async searchTroubleTickets(userText: string, kValue: Integer): Promise<TitanTroubleTickets[]> {
        // Embed the user's free-form text with the ada embedding model.
        const embedding = await TextEmbeddingAda_002.createEmbeddings({inputs: [userText]}).then((r: { embeddings: any[]; }) => r.embeddings[0]);

        // Start from the full set of trouble ticket objects in the Ontology.
        const objectSet = Objects.search().titanTroubleTickets();

        // Return the kValue tickets whose description embeddings are nearest to the query embedding.
        return objectSet
            .nearestNeighbors(obj => obj.ticketDescriptionEmbeddings.near(embedding, {kValue: kValue}))
            .orderByRelevance()
            .take(kValue);
    }
}

6. AIP Logic²

AIP Logic is a no-code environment to build, test, and release functions powered by LLMs. It enables you to build automated business processes that leverage the Ontology without the complexity typically introduced by development environments and API calls.

Using Logic’s intuitive interface, application builders can engineer prompts, test, evaluate and monitor, set up automation, and more.

In this AIP Logic function (shown below), we will use the Ontology object and the semantic search function we created as tools for the LLM to find semantically similar problems, and then summarize their resolutions into suggested steps for resolving the current problem.
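AIP Logic is a no-code environment, so none of this is written by hand in the video. Still, it can help to see the retrieve-then-summarize pattern spelled out. The sketch below assembles a prompt from similar past tickets; the `ResolvedTicket` type, the function name, and the prompt wording are all hypothetical, and no LLM is actually called.

```typescript
// Hypothetical shape of a past ticket returned by semantic search.
interface ResolvedTicket {
    problem: string;
    resolution: string;
}

// Build a prompt that gives the LLM the retrieved tickets as context
// and asks it to summarize their resolutions for the current problem.
function buildResolutionPrompt(currentProblem: string, similar: ResolvedTicket[]): string {
    const context = similar
        .map((t, i) => `Ticket ${i + 1}\nProblem: ${t.problem}\nResolution: ${t.resolution}`)
        .join("\n\n");
    return [
        "You are assisting with a trouble ticket.",
        "Similar past tickets:",
        context,
        `Current problem: ${currentProblem}`,
        "Summarize the past resolutions into suggested steps for the current problem.",
    ].join("\n\n");
}

// Example data invented for illustration.
const prompt = buildResolutionPrompt("Router drops connection every hour", [
    { problem: "Router reboots randomly", resolution: "Updated firmware" },
    { problem: "Intermittent Wi-Fi loss", resolution: "Replaced faulty power supply" },
]);
console.log(prompt);
```

Grounding the prompt in retrieved tickets, rather than asking the model to answer from memory, is the core idea behind retrieval-augmented workflows.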

7. AIP Assist

Workshop is Palantir’s no-code application building framework, which enables teams to quickly build sophisticated interactive applications, far beyond the limits of traditional read-only BI. All user actions are mediated through the Ontology, and are designed to write back to source systems and capture data for learning loops that incorporate Reinforcement Learning from Human Feedback (RLHF).

In the video, we use Workshop to construct our application, leveraging a widget that has access to an LLM-backed AIP Logic function which deploys the semantic search function we created. When a user enters information about an issue, we can then search across the Ontology for similar trouble tickets to surface the actions that, based on past resolutions, are most likely to resolve the issue quickly. With this, we are wielding the power of existing unstructured data to drive real actions — and ultimately reduce downtime.

AIP ensures security and transparency throughout, showing the “chain of thought” reasoning of the copilot at every step in the process. This enables enterprises to rapidly test and trust a whole range of Retrieval-Augmented Generation (RAG) workflows.

Everything you see in this video — from connecting and registering existing cloud data assets with Virtual Tables, to creating the embeddings and storing them in the Ontology, to creating the copilot logic in AIP Logic, to building the application using Workshop — takes place within a secure, collaborative platform.

With AIP, you can tap into the knowledge that has historically been trapped in ticketing systems, emails, and documents — and harness this liberated knowledge to drive the business forward.

If you’re ready to unlock the power of AI-driven workflows, sign up for an AIP Bootcamp today. Your team will learn from Palantir experts and, more importantly, get hands-on experience with AIP and walk away having assembled real workflows in a production environment.

Let’s build!
Chad

[1] The embedding board in Pipeline Builder is still in early access and will soon be GA across all AIP enrollments.
[2] AIP Logic is still in early access and will soon be GA across all AIP enrollments.


Building with Palantir AIP: Semantic Search was originally published in Palantir Blog on Medium.