Large language models (LLMs) now support a wide range of use cases, from content summarization to reasoning about complex tasks. One exciting frontier is bringing generative AI into the physical world by applying it to robotics and physical hardware.
Inspired by this, we developed a game for the AWS re:Invent 2024 Builders Fair using Amazon Bedrock, Strands Agents, AWS IoT Core, AWS Lambda, and Amazon DynamoDB. Our goal was to demonstrate how LLMs can reason about game strategy and complex tasks, and control physical robots in real time.
RoboTic-Tac-Toe is an interactive game where two physical robots move around a tic-tac-toe board, with both the gameplay and the robots’ movements orchestrated by LLMs. Players can control the robots using natural language commands, directing them to place their markers on the game board. In this post, we explore the architecture and prompt engineering techniques used to reason about a tic-tac-toe game and decide the best next move and movement plan for the current player.
RoboTic-Tac-Toe demonstrates an intuitive interaction between humans, robots, and AI. Participants can access the game portal by scanning a QR code and choose from multiple game modes, including playing against an LLM-powered opponent and watching AI-driven matches.
When a player chooses a target cell, the two robots, positioned beside a tic-tac-toe board, respond to commands by executing precise movements to place X or O markers. The following video shows this in action.
RoboTic-Tac-Toe features a seamless integration of AWS services, alleviating the need for pre-programmed sequences. Instead, AI dynamically generates descriptive instructions in real time. The following diagram describes the architecture built on AWS IoT Core, which enables communication between the Raspberry Pi-controlled robots and the cloud.
The solution uses the following key services: Amazon Bedrock, Strands Agents, AWS IoT Core, AWS Lambda, and Amazon DynamoDB.
Strands Agents automate tasks for your application users by orchestrating interactions between the foundation model (FM), data sources, software applications, and user conversations.
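As a minimal sketch of how an agent comes together (the model ID, system prompt, and get_board_state tool below are illustrative, not the project’s actual configuration), a Strands agent pairs a Bedrock-hosted model with Python functions exposed as tools:

```python
from strands import Agent, tool
from strands.models import BedrockModel

# Hypothetical tool: expose the current board to the model on demand.
@tool
def get_board_state() -> str:
    """Return the current board as a 9-character string, e.g. 'X.O...X..'."""
    return "X.O...X.."  # placeholder; the real project would read live state

# Illustrative model ID; any Bedrock-hosted foundation model can be used.
model = BedrockModel(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")

agent = Agent(
    model=model,
    system_prompt="You are a tic-tac-toe strategist. Pick the best next move.",
    tools=[get_board_state],
)

print(agent("It is O's turn. Which cell should O play next?"))
```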
The Supervisor Agent acts as an orchestrator that manages both the Move Agent and the Game Agent, routing each incoming request to the appropriate agent and coordinating decisions across the system.
This module provides a governance layer over the AI-powered environment, enabling scalable orchestration across agents. By intelligently directing requests and unifying responses, the Supervisor Agent facilitates reliable execution, simplified monitoring, and enhanced user experience.
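One way to express this supervisor pattern with Strands Agents is agents as tools, sketched below with simplified stand-ins for the project’s Move Agent and Game Agent:

```python
from strands import Agent, tool

# Simplified stand-ins; the real agents have their own models, prompts, and tools.
move_agent = Agent(system_prompt="Generate step-by-step robot movement plans.")
game_agent = Agent(system_prompt="Decide the best next tic-tac-toe move.")

@tool
def choose_cell(board: str) -> str:
    """Ask the Game Agent which cell to play next on the given board."""
    return str(game_agent(f"Given the board {board}, which cell should we play?"))

@tool
def plan_move(target_cell: str) -> str:
    """Ask the Move Agent for a movement plan that reaches the target cell."""
    return str(move_agent(f"Plan the robot's path to cell {target_cell}."))

# The supervisor routes each request to the appropriate specialist agent.
supervisor = Agent(
    system_prompt=(
        "You coordinate a robot tic-tac-toe game. Use choose_cell to pick a "
        "move, then plan_move to produce the robot's movement plan."
    ),
    tools=[choose_cell, plan_move],
)

print(supervisor("The board is 'X...O....' and it is X's turn. Make the next move."))
```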
The Move Agent generates step-by-step Python code that translates a chosen target cell into precise movement instructions for the robot.
This module provides efficient and traceable navigation for robots by using AI-powered instruction generation while maintaining a robust logging mechanism for monitoring and debugging.
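After the Move Agent produces a plan, the commands still have to reach the physical robot. A minimal sketch of that hand-off over AWS IoT Core, where the MQTT topic layout and command schema are our own assumptions:

```python
import json
import boto3

# AWS IoT Core data-plane client; credentials come from the environment.
iot = boto3.client("iot-data")

def send_movement_plan(robot_id: str, steps: list[dict]) -> None:
    """Publish a movement plan to the robot's MQTT command topic.

    The topic layout and payload schema here are illustrative assumptions.
    """
    iot.publish(
        topic=f"robots/{robot_id}/commands",
        qos=1,
        payload=json.dumps({"steps": steps}),
    )

# Example: instructions a Move Agent might emit for reaching the center cell.
send_movement_plan(
    "robot-x",
    steps=[
        {"action": "rotate", "degrees": 90},
        {"action": "forward", "cm": 30},
        {"action": "place_marker", "marker": "X"},
    ],
)
```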
The Game Agent functions as an opponent capable of playing against human users. To enhance accessibility, players interact with the game through a mobile-friendly web portal, which includes an admin panel for managing AI-driven matches. The LLM player is a serverless application that combines AWS Lambda, Amazon DynamoDB, and Strands Agents to manage and automate the moves. It tracks game progress by storing the move history in a DynamoDB table, allowing it to reconstruct the current board state whenever requested. When a move is requested, the agent identifies which marker the LLM player controls (‘X’ or ‘O’), making sure that the correct model is used for decision-making. This system allows for real-time, AI-driven gameplay, making it possible for players to compete against an intelligent opponent powered by LLMs.
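A minimal sketch of the move-history idea, with a table name and key schema that are illustrative assumptions rather than the project’s actual design:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key game_id, sort key move_number.
table = boto3.resource("dynamodb").Table("TicTacToeMoves")

def record_move(game_id: str, move_number: int, player: str, cell: int) -> None:
    """Append one move ('X' or 'O' into cell 0-8) to the game's history."""
    table.put_item(Item={
        "game_id": game_id,
        "move_number": move_number,
        "player": player,
        "cell": cell,
    })

def board_state(game_id: str) -> str:
    """Replay the stored moves to reconstruct the board as a 9-char string."""
    moves = table.query(
        KeyConditionExpression=Key("game_id").eq(game_id),
        ScanIndexForward=True,  # oldest move first
    )["Items"]
    board = ["."] * 9
    for move in moves:
        board[int(move["cell"])] = move["player"]
    return "".join(board)

record_move("game-42", 1, "X", 4)
record_move("game-42", 2, "O", 0)
print(board_state("game-42"))  # "O...X...."
```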
In our RoboTic-Tac-Toe project, computer vision plays a crucial role in enabling precise robot movements and accurate gameplay. Let’s walk through how we implemented the solution using AWS services and advanced computer vision techniques. Our setup includes a Raspberry Pi camera mounted above the game board, continuously monitoring the robots’ positions and movements. The camera captures images that are automatically uploaded to Amazon S3, forming the foundation of our vision processing pipeline.
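A minimal sketch of the capture-and-upload loop on the Raspberry Pi, assuming the picamera2 library and an illustrative bucket name:

```python
import time
import boto3
from picamera2 import Picamera2

s3 = boto3.client("s3")
BUCKET = "robotictactoe-board-images"  # illustrative bucket name

camera = Picamera2()
camera.configure(camera.create_still_configuration())
camera.start()

while True:
    # Capture a still of the board and upload it for the vision pipeline.
    filename = f"/tmp/board-{int(time.time())}.jpg"
    camera.capture_file(filename)
    s3.upload_file(filename, BUCKET, f"frames/{filename.rsplit('/', 1)[-1]}")
    time.sleep(2)  # capture cadence is an assumption
```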
We use Principal Component Analysis (PCA) to accurately detect and track robot orientation and position on the game board. This technique helps reduce dimensionality while maintaining essential features for robot tracking. The orientation angle is calculated based on the principal components of the robot’s visual features.
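A minimal sketch of PCA-based orientation detection with OpenCV; the Otsu thresholding used for segmentation here is a stand-in for whatever detection the real pipeline performs:

```python
import cv2
import numpy as np

def robot_pose(image: np.ndarray) -> tuple[tuple[float, float], float]:
    """Estimate a robot's center (x, y) and heading angle in degrees."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Coordinates of the segmented robot pixels as an N x 2 array of (x, y).
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(np.float64)

    # PCA: the mean gives the position; the first principal component gives
    # the dominant axis of the robot's pixels, i.e., its orientation.
    mean, eigvecs = cv2.PCACompute(pts, mean=np.empty((0)))
    cx, cy = mean[0]
    angle = float(np.degrees(np.arctan2(eigvecs[0, 1], eigvecs[0, 0])))
    return (float(cx), float(cy)), angle
```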
Our OpenCV module is containerized and deployed as an Amazon SageMaker endpoint. It processes the images stored in Amazon S3 to determine each robot’s position and orientation, along with the current state of the game board.
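Invoking such an endpoint from Python looks roughly like the following; the endpoint name and JSON contract are assumptions for illustration:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def analyze_board(bucket: str, key: str) -> dict:
    """Ask the vision endpoint to analyze one captured board image.

    'robotictactoe-vision' and the request/response shapes are illustrative;
    the real contract depends on the deployed container.
    """
    response = runtime.invoke_endpoint(
        EndpointName="robotictactoe-vision",
        ContentType="application/json",
        Body=json.dumps({"bucket": bucket, "key": key}),
    )
    return json.loads(response["Body"].read())

result = analyze_board("robotictactoe-board-images", "frames/board-1700000000.jpg")
print(result)  # e.g. robot positions, orientations, and board occupancy
```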
A dedicated AWS Lambda function orchestrates the vision processing workflow, reacting to newly captured images in Amazon S3, invoking the SageMaker endpoint for analysis, and distributing the results to the rest of the system.
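A sketch of that orchestration as an S3-triggered Lambda handler, with illustrative endpoint and topic names:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
iot = boto3.client("iot-data")

def handler(event, context):
    """Triggered by S3 object-created events for newly captured frames."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Run the OpenCV/PCA analysis on the new frame.
        response = runtime.invoke_endpoint(
            EndpointName="robotictactoe-vision",  # illustrative name
            ContentType="application/json",
            Body=json.dumps({"bucket": bucket, "key": key}),
        )
        analysis = json.loads(response["Body"].read())

        # Share the detected robot poses with the rest of the system.
        iot.publish(
            topic="vision/board-state",  # illustrative topic
            qos=1,
            payload=json.dumps(analysis),
        )
    return {"statusCode": 200}
```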
This computer vision system facilitates accurate robot navigation and game state tracking, contributing to the seamless gameplay experience in RoboTic-Tac-Toe. The combination of PCA for orientation detection, OpenCV for image processing, and AWS services for deployment helps create a robust and scalable computer vision solution.
RoboTic-Tac-Toe showcases how AI, robotics, and cloud computing can converge to create interactive experiences. This project highlights the potential of AWS IoT, machine learning (ML), and generative AI in gaming, education, and beyond. As AI-driven robotics continue to evolve, RoboTic-Tac-Toe serves as a glimpse into the future of intelligent, interactive gaming.
Stay tuned for future enhancements, expanded gameplay modes, and even more engaging AI-powered interactions.