Reduce inference time for BERT models using neural architecture search and SageMaker Automated Model Tuning

In this post, we demonstrate how to use neural architecture search (NAS)-based structural pruning to compress a fine-tuned BERT model, improving model performance and reducing inference times. Pre-trained language models (PLMs) are undergoing rapid commercial and enterprise adoption in the areas of productivity tools, customer service, search and recommendations, business process automation, and …
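The post's method searches over BERT sub-networks with NAS and SageMaker Automated Model Tuning; as a much simpler illustration of what structural pruning does, here is a hedged sketch that removes whole attention heads from a BERT checkpoint using Hugging Face Transformers. The checkpoint name and head indices are placeholders, not the output of an actual search.

```python
# Minimal sketch of structural pruning on a BERT model.
# The post's approach selects which sub-network to keep via NAS and
# SageMaker Automated Model Tuning; here we simply remove a few
# attention heads to show the general idea. Indices are illustrative.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder for your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Remove selected attention heads: {layer_index: [head_indices_to_prune]}
model.prune_heads({0: [0, 1], 2: [2, 3], 11: [4]})

# The pruned model is smaller and faster at inference; re-evaluate accuracy
# on a validation set to check the quality/latency trade-off.
inputs = tokenizer("Structural pruning removes whole heads or layers.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```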

Figuring out microservices running on your GKE cluster with help from Duet AI

If you’ve joined a new team recently like I have, you’ve probably had a lot of questions. Answers to those questions may not be easy to find, and might rely heavily on the generosity and spare time of your teammates. Let’s say you’re a DevRel engineer, working with Google Kubernetes …

Unlocking the power of chatbots: Key benefits for businesses and customers

Chatbots can help your customers and potential clients find or input information quickly by instantly responding to requests that use audio input, text input or a combination of both, eliminating the need for human intervention or manual research. Chatbots are everywhere, providing customer care support and assisting employees who use smart speakers at home, SMS, …

Introducing ASPIRE for selective prediction in LLMs

Posted by Jiefeng Chen, Student Researcher, and Jinsung Yoon, Research Scientist, Cloud AI Team

In the fast-evolving landscape of artificial intelligence, large language models (LLMs) have revolutionized the way we interact with machines, pushing the boundaries of natural language understanding and generation to unprecedented heights. Yet, the leap into high-stakes decision-making applications remains a chasm …
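ASPIRE itself teaches the LLM to self-evaluate, and the excerpt above doesn't spell out those mechanics; the sketch below only illustrates the general idea of selective prediction: answer when a confidence score clears a threshold, otherwise abstain. The class name, scoring function, and threshold are illustrative assumptions, not ASPIRE's actual design.

```python
# Generic selective-prediction sketch (not ASPIRE's algorithm): answer only
# when a confidence score clears a threshold, otherwise abstain.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SelectivePredictor:
    generate: Callable[[str], str]             # produces a candidate answer
    score_answer: Callable[[str, str], float]  # confidence in [0, 1]
    threshold: float = 0.8                     # illustrative abstention cutoff

    def __call__(self, question: str) -> Optional[str]:
        answer = self.generate(question)
        if self.score_answer(question, answer) >= self.threshold:
            return answer
        return None  # abstain / defer to a human or a stronger model

if __name__ == "__main__":
    # Toy usage with stand-in functions; a real system would call an LLM here.
    predictor = SelectivePredictor(
        generate=lambda q: "42",
        score_answer=lambda q, a: 0.95 if "answer" in q else 0.3,
    )
    print(predictor("What is the answer to everything?"))  # -> "42"
    print(predictor("Something off-distribution"))         # -> None (abstains)
```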

AlloyDB AI powers gen AI applications with seamless Vertex AI integration

At Next ‘23, we launched AlloyDB AI, an integrated set of capabilities built into AlloyDB for building generative AI applications. One of those capabilities allows you to call a Vertex AI model directly from the database using SQL. AlloyDB is a fully managed PostgreSQL-compatible database that offers superior performance, availability and scale. In our performance …
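As a rough sketch of what calling a Vertex AI model from the database can look like, the example below invokes AlloyDB AI's embedding() SQL function (provided by the google_ml_integration extension) through a standard PostgreSQL client. The connection details and model name are placeholders, and the exact function and model identifiers should be checked against the AlloyDB AI documentation.

```python
# Hedged sketch: calling a Vertex AI embedding model from AlloyDB via SQL.
# Assumes the google_ml_integration extension is enabled and the instance
# has access to Vertex AI; connection details and model name are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="10.0.0.5",          # placeholder AlloyDB instance IP
    dbname="appdb",
    user="postgres",
    password="your-password",
)

with conn, conn.cursor() as cur:
    # embedding(model_id, content) returns a vector from a Vertex AI model.
    cur.execute(
        "SELECT embedding('textembedding-gecko@003', %s);",
        ("AlloyDB AI lets you call Vertex AI models from SQL.",),
    )
    vector = cur.fetchone()[0]
    print(f"embedding dimensions: {len(vector)}")
```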

Cryptography use cases: From secure communication to data security 

When it comes to data security, the ancient art of cryptography has become a critical cornerstone of today’s digital age. From top-secret government intelligence to everyday personal messages, cryptography makes it possible to obscure our most sensitive information from unwanted onlookers. Whether shopping online or saving valuable trade secrets to disk, we can thank cryptography …
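As a small, generic illustration of the "saving secrets to disk" case (not tied to any product mentioned here), the sketch below encrypts data at rest with the Python cryptography library's Fernet recipe; key management is deliberately simplified.

```python
# Minimal illustration of encrypting data at rest with symmetric cryptography.
# Uses the `cryptography` package's Fernet recipe (AES-based authenticated
# encryption). Key handling is simplified; in practice the key would live in
# a secrets manager or KMS, never alongside the ciphertext.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # keep this secret and separate
cipher = Fernet(key)

plaintext = b"quarterly trade-secret projections"
ciphertext = cipher.encrypt(plaintext)

with open("secrets.bin", "wb") as f:
    f.write(ciphertext)

# Later: only a holder of the key can recover the plaintext.
with open("secrets.bin", "rb") as f:
    recovered = cipher.decrypt(f.read())
assert recovered == plaintext
```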

Fine-tune and deploy Llama 2 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

Today, we’re excited to announce the availability of Llama 2 inference and fine-tuning support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Using AWS Trainium- and AWS Inferentia-based instances through SageMaker can help users lower fine-tuning costs by up to 50% and reduce deployment costs by 4.7x, while lowering per-token latency. …
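As a rough sketch of what deployment through SageMaker JumpStart can look like, the example below uses the SageMaker Python SDK's JumpStartModel class. The model ID and instance type are assumptions that should be checked against the JumpStart catalog for your region, and Llama 2 is gated behind an end-user license agreement.

```python
# Hedged sketch: deploying a Llama 2 model from SageMaker JumpStart onto an
# AWS Inferentia (Inf2) instance. The model ID and instance type below are
# assumptions -- check the JumpStart catalog for the exact Neuron-enabled
# Llama 2 model ID available in your region.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="meta-textgenerationneuron-llama-2-7b",  # assumed Neuron model ID
    instance_type="ml.inf2.xlarge",                   # Inferentia2 instance
)

# Llama 2 requires accepting the end-user license agreement.
predictor = model.deploy(accept_eula=True)

response = predictor.predict({
    "inputs": "Summarize the benefits of AWS Inferentia for LLM inference.",
    "parameters": {"max_new_tokens": 128},
})
print(response)
```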