ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Recent advancements in large language models (LLMs) have sparked growing research interest in tool-assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful APIs), based on a single-turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful tool …

Intelligent healthcare forms analysis with Amazon Bedrock

Generative artificial intelligence (AI) provides an opportunity for improvements in healthcare by combining and analyzing structured and unstructured data across previously disconnected silos. Generative AI can help raise the bar on efficiency and effectiveness across the full scope of healthcare delivery. The healthcare industry generates and collects a significant amount of unstructured textual data, including …

Applications Now Open for $60,000 NVIDIA Graduate Fellowship Awards

Bringing together the world’s brightest minds and the latest accelerated computing technology leads to powerful breakthroughs that help tackle some of the biggest research problems. To foster such innovation, the NVIDIA Graduate Fellowship Program provides grants, mentors and technical support to doctoral students doing outstanding research relevant to NVIDIA technologies. The program, in its 24th …

Thinking Outside the (Black) Box

Thinking Outside the (Black) Box: Building More Transparent and Explainable AI Systems in AIP (Engineering Responsible AI, #2). Advanced LLMs display incredible capabilities for processing and generating natural language. As discussed in the first blog post in this series, this can be a double-edged sword: LLMs are prone to “hallucinating” nonsensical or fictitious outputs that nonetheless seem …

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

As the scale and complexity of data handled by organizations increase, traditional rules-based approaches to analyzing the data alone are no longer viable. Instead, organizations are increasingly looking to take advantage of transformative technologies like machine learning (ML) and artificial intelligence (AI) to deliver innovative products, improve outcomes, and gain operational efficiencies at scale. Furthermore, …

Experimenting with Gemini 1.5 Pro and vulnerability detection

Unpatched software vulnerabilities can have serious consequences. At Google Cloud, we want developers to reduce the risks they face by focusing on developing code that is secure by design and secure by default. While secure development can be time-consuming, generative AI can be used responsibly to help make that development process faster. At Google, we’ve …

ACL Conference 2024

Apple is sponsoring the annual meeting of the Association for Computational Linguistics (ACL), which takes place in person from August 11 to 16, in Bangkok, Thailand. ACL is a conference in the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. Below is …

How Deltek uses Amazon Bedrock for question and answering on government solicitation documents

This post is co-written by Kevin Plexico and Shakun Vohra from Deltek. Question and answering (Q&A) using documents is a commonly used application in various use cases like customer support chatbots, legal research assistants, and healthcare advisors. Retrieval Augmented Generation (RAG) has emerged as a leading method for using the power of large language models …

Generating Gender Alternatives in Machine Translation

This paper was accepted at the 5th Workshop on Gender Bias in Natural Language Processing 2024. Machine translation (MT) systems often translate terms with ambiguous gender (e.g., English term “the nurse”) into the gendered form that is most prevalent in the systems’ training data (e.g., “enfermera”, the Spanish term for a female nurse). This often …

Cisco achieves 50% latency improvement using Amazon SageMaker Inference faster autoscaling feature

This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco. Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex’s focus on delivering inclusive collaboration experiences fuels our innovation, which …