Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden
With Vertex AI Model Garden, Google Cloud strives to deliver highly efficient and cost-optimized ML workflow recipes. It currently offers a selection of more than 150 first-party, open, and third-party foundation models. Last year, we introduced vLLM, the popular open source LLM serving stack, on GPUs in Vertex Model Garden. Since then, we have witnessed …