1 xdU0TAU.max 1000x1000 1

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

With Vertex AI Model Garden, Google Cloud strives to deliver highly efficient and cost-optimized ML workflow recipes. Currently, it offers a selection of more than 150 first-party, open and third-party foundation models. Last year, we introduced the popular open source LLM serving stack vLLM on GPUs, in Vertex Model Garden. Since then, we have witnessed …