Entropy-Preserving Reinforcement Learning

Policy gradient algorithms have driven many recent advancements in language model reasoning. An appealing property is their ability to learn from exploration on their own trajectories, a process crucial for fostering diverse and creative solutions. As we show in this paper, many policy gradient algorithms naturally reduce the entropy—and thus the diversity of explored trajectories—as …


How Ring scales global customer support with Amazon Bedrock Knowledge Bases

This post is cowritten with David Kim and Premjit Singh from Ring. Scaling self-service support globally presents challenges beyond translation. In this post, we show you how Ring, Amazon’s home security subsidiary, built a production-ready, multi-locale Retrieval-Augmented Generation (RAG)-based support chatbot using Amazon Bedrock Knowledge Bases. By eliminating per-Region infrastructure deployments, Ring reduced the cost …

Robots with different bodies can now share skills: What intention-based learning changes

Robots are increasingly being used in manufacturing, agriculture and health care. But programming a team of robots to carry out individual tasks raises a question: How can robots learn from other robots if they are built differently? A multi-institutional team including Chongjie Zhang, an associate professor of computer science and engineering at WashU McKelvey Engineering, …

AI benchmark helps robots plan and complete their chores in the real world

No matter how sophisticated they are, robots can often be indecisive and struggle with multi-step chores in the real world. For example, if you tell a robot to tidy a messy room, it might understand the goal but not know where to grab each object. It could even end up inventing steps. To address these …