ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Recent advancements in large language models (LLMs) have sparked growing research interest in tool-assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on evaluating over stateless web services (RESTful API), based on either a single-turn user prompt or an off-policy dialog trajectory, ToolSandbox includes stateful tool …

Intelligent healthcare forms analysis with Amazon Bedrock

Generative artificial intelligence (AI) provides an opportunity for improvements in healthcare by combining and analyzing structured and unstructured data across previously disconnected silos. Generative AI can help raise the bar on efficiency and effectiveness across the full scope of healthcare delivery. The healthcare industry generates and collects a significant amount of unstructured textual data, including …

Applications Now Open for $60,000 NVIDIA Graduate Fellowship Awards

Bringing together the world’s brightest minds and the latest accelerated computing technology leads to powerful breakthroughs that help tackle some of the biggest research problems. To foster such innovation, the NVIDIA Graduate Fellowship Program provides grants, mentors and technical support to doctoral students doing outstanding research relevant to NVIDIA technologies. The program, in its 24th …

Delivery robots’ green credentials make them more attractive to consumers

The smaller carbon footprint, or wheel print, of automatic delivery robots can encourage consumers to use them when ordering food, according to a new study. The suitcase-sized, self-driving electric vehicles are much greener than many traditional food delivery methods because they have low, or even zero, carbon emissions. In this study, participants who had more …

5 Free Podcasts That Demystify Machine Learning Concepts

Machine learning (ML) has become a buzzword in recent years, with applications ranging from voice assistants to self-driving cars. Yet, for many, the inner workings of these technologies remain a mystery. Podcasts offer a great way to learn about this field without getting overwhelmed. They break down complex ideas into simpler terms and let you …

Building a Simple RAG Application Using LlamaIndex

In this tutorial, we will explore Retrieval-Augmented Generation (RAG) and the LlamaIndex AI framework. We will learn how to use LlamaIndex to build a RAG-based application for Q&A over private documents and enhance the application by incorporating a memory buffer. This will enable the LLM to generate the response using the context from both …

Thinking Outside the (Black) Box

Thinking Outside the (Black) Box: Building More Transparent and Explainable AI Systems in AIP (Engineering Responsible AI, #2) Advanced LLMs display incredible capabilities for processing and generating natural language. As discussed in the first blog post in this series, this can be a double-edged sword: LLMs are prone to “hallucinating” nonsensical or fictitious outputs that nonetheless seem …