Large language models (LLMs) now support a wide range of use cases, from content summarization to reasoning about complex tasks. One exciting frontier is bringing generative AI into the physical world by applying it to robotics and physical hardware.
Inspired by this, we developed a game for the AWS re:Invent 2024 Builders Fair using Amazon Bedrock, Strands Agents, AWS IoT Core, AWS Lambda, and Amazon DynamoDB. Our goal was to demonstrate how LLMs can reason about game strategy and complex tasks, and control physical robots in real time.
RoboTic-Tac-Toe is an interactive game where two physical robots move around a tic-tac-toe board, with both the gameplay and the robots’ movements orchestrated by LLMs. Players can control the robots using natural language commands, directing them to place their markers on the game board. In this post, we explore the architecture and prompt engineering techniques used to reason about a tic-tac-toe game and decide the best next move and movement plan for the current player.
RoboTic-Tac-Toe demonstrates an intuitive interaction between humans, robots, and AI. Participants can access the game portal by scanning a QR code and choose from multiple game modes, including playing against an LLM-powered opponent and watching AI-driven matches.
When a player chooses a target cell, the two robots, positioned beside a tic-tac-toe board, respond to commands by executing precise movements to place X or O markers. The following video shows this in action.
RoboTic-Tac-Toe features a seamless integration of AWS services, alleviating the need for pre-programmed sequences. Instead, AI dynamically generates descriptive instructions in real time. The following diagram describes the architecture built on AWS IoT Core, which enables communication between the Raspberry Pi-controlled robots and the cloud.
The solution uses the following key services: Amazon Bedrock, Strands Agents, AWS IoT Core, AWS Lambda, and Amazon DynamoDB.
Strands Agents automate tasks for your application users by orchestrating interactions between the foundation model (FM), data sources, software applications, and user conversations.
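As a minimal sketch of how an agent comes together (the model ID, system prompt, and get_board_state tool below are illustrative, not the project’s actual configuration), a Strands agent pairs a Bedrock-hosted model with Python functions exposed as tools:

```python
from strands import Agent, tool
from strands.models import BedrockModel

# Hypothetical tool: expose the current board to the model on demand.
@tool
def get_board_state() -> str:
    """Return the current board as a 9-character string, e.g. 'X.O...X..'."""
    return "X.O...X.."  # placeholder; the real project would read live state

# Illustrative model ID; any Bedrock-hosted foundation model can be used.
model = BedrockModel(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")

agent = Agent(
    model=model,
    system_prompt="You are a tic-tac-toe strategist. Pick the best next move.",
    tools=[get_board_state],
)

print(agent("It is O's turn. Which cell should O play next?"))
```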
The Supervisor Agent acts as an orchestrator that manages both the Move Agent and the Game Agent, routing each incoming request to the appropriate agent and coordinating decisions across the system.
This module provides a governance layer over the AI-powered environment, enabling scalable orchestration across agents. By intelligently directing requests and unifying responses, the Supervisor Agent facilitates reliable execution, simplified monitoring, and enhanced user experience.
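One way to express this supervisor pattern with Strands Agents is agents as tools, sketched below with simplified stand-ins for the project’s Move Agent and Game Agent:

```python
from strands import Agent, tool

# Simplified stand-ins; the real agents have their own models, prompts, and tools.
move_agent = Agent(system_prompt="Generate step-by-step robot movement plans.")
game_agent = Agent(system_prompt="Decide the best next tic-tac-toe move.")

@tool
def choose_cell(board: str) -> str:
    """Ask the Game Agent which cell to play next on the given board."""
    return str(game_agent(f"Given the board {board}, which cell should we play?"))

@tool
def plan_move(target_cell: str) -> str:
    """Ask the Move Agent for a movement plan that reaches the target cell."""
    return str(move_agent(f"Plan the robot's path to cell {target_cell}."))

# The supervisor routes each request to the appropriate specialist agent.
supervisor = Agent(
    system_prompt=(
        "You coordinate a robot tic-tac-toe game. Use choose_cell to pick a "
        "move, then plan_move to produce the robot's movement plan."
    ),
    tools=[choose_cell, plan_move],
)

print(supervisor("The board is 'X...O....' and it is X's turn. Make the next move."))
```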
The Move Agent generates step-by-step Python code that translates a chosen target cell into precise movement instructions for the robot.
This module provides efficient and traceable navigation for robots by using AI-powered instruction generation while maintaining a robust logging mechanism for monitoring and debugging.
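After the Move Agent produces a plan, the commands still have to reach the physical robot. A minimal sketch of that hand-off over AWS IoT Core, where the MQTT topic layout and command schema are our own assumptions:

```python
import json
import boto3

# AWS IoT Core data-plane client; credentials come from the environment.
iot = boto3.client("iot-data")

def send_movement_plan(robot_id: str, steps: list[dict]) -> None:
    """Publish a movement plan to the robot's MQTT command topic.

    The topic layout and payload schema here are illustrative assumptions.
    """
    iot.publish(
        topic=f"robots/{robot_id}/commands",
        qos=1,
        payload=json.dumps({"steps": steps}),
    )

# Example: instructions a Move Agent might emit for reaching the center cell.
send_movement_plan(
    "robot-x",
    steps=[
        {"action": "rotate", "degrees": 90},
        {"action": "forward", "cm": 30},
        {"action": "place_marker", "marker": "X"},
    ],
)
```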
The Game Agent functions as an opponent capable of playing against human users. To enhance accessibility, players interact with the game through a mobile-friendly web portal, which includes an admin panel for managing AI-driven matches. The LLM player is a serverless application that combines AWS Lambda, Amazon DynamoDB, and Strands Agents to manage and automate the moves. It tracks game progress by storing the move history in a DynamoDB table, allowing it to reconstruct the current board state whenever requested. When a move is requested, the agent identifies which marker the LLM player controls (‘X’ or ‘O’), making sure that the correct model is used for decision-making. This system allows for real-time, AI-driven gameplay, making it possible for players to compete against an intelligent opponent powered by LLMs.
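A minimal sketch of the move-history idea, with a table name and key schema that are illustrative assumptions rather than the project’s actual design:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key game_id, sort key move_number.
table = boto3.resource("dynamodb").Table("TicTacToeMoves")

def record_move(game_id: str, move_number: int, player: str, cell: int) -> None:
    """Append one move ('X' or 'O' into cell 0-8) to the game's history."""
    table.put_item(Item={
        "game_id": game_id,
        "move_number": move_number,
        "player": player,
        "cell": cell,
    })

def board_state(game_id: str) -> str:
    """Replay the stored moves to reconstruct the board as a 9-char string."""
    moves = table.query(
        KeyConditionExpression=Key("game_id").eq(game_id),
        ScanIndexForward=True,  # oldest move first
    )["Items"]
    board = ["."] * 9
    for move in moves:
        board[int(move["cell"])] = move["player"]
    return "".join(board)

record_move("game-42", 1, "X", 4)
record_move("game-42", 2, "O", 0)
print(board_state("game-42"))  # "O...X...."
```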
In our RoboTic-Tac-Toe project, computer vision plays a crucial role in enabling precise robot movements and accurate gameplay. Let’s walk through how we implemented the solution using AWS services and advanced computer vision techniques. Our setup includes a Raspberry Pi camera mounted above the game board, continuously monitoring the robots’ positions and movements. The camera captures images that are automatically uploaded to Amazon S3, forming the foundation of our vision processing pipeline.
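A minimal sketch of the capture-and-upload loop on the Raspberry Pi, assuming the picamera2 library and an illustrative bucket name:

```python
import time
import boto3
from picamera2 import Picamera2

s3 = boto3.client("s3")
BUCKET = "robotictactoe-board-images"  # illustrative bucket name

camera = Picamera2()
camera.configure(camera.create_still_configuration())
camera.start()

while True:
    # Capture a still of the board and upload it for the vision pipeline.
    filename = f"/tmp/board-{int(time.time())}.jpg"
    camera.capture_file(filename)
    s3.upload_file(filename, BUCKET, f"frames/{filename.rsplit('/', 1)[-1]}")
    time.sleep(2)  # capture cadence is an assumption
```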
We use Principal Component Analysis (PCA) to accurately detect and track robot orientation and position on the game board. This technique helps reduce dimensionality while maintaining essential features for robot tracking. The orientation angle is calculated based on the principal components of the robot’s visual features.
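A minimal sketch of PCA-based orientation detection with OpenCV; the Otsu thresholding used for segmentation here is a stand-in for whatever detection the real pipeline performs:

```python
import cv2
import numpy as np

def robot_pose(image: np.ndarray) -> tuple[tuple[float, float], float]:
    """Estimate a robot's center (x, y) and heading angle in degrees."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Coordinates of the segmented robot pixels as an N x 2 array of (x, y).
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(np.float64)

    # PCA: the mean gives the position; the first principal component gives
    # the dominant axis of the robot's pixels, i.e., its orientation.
    mean, eigvecs = cv2.PCACompute(pts, mean=np.empty((0)))
    cx, cy = mean[0]
    angle = float(np.degrees(np.arctan2(eigvecs[0, 1], eigvecs[0, 0])))
    return (float(cx), float(cy)), angle
```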
Our OpenCV module is containerized and deployed as an Amazon SageMaker endpoint. It processes the images stored in Amazon S3 to determine each robot’s position and orientation, along with the current state of the game board.
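Invoking such an endpoint from Python looks roughly like the following; the endpoint name and JSON contract are assumptions for illustration:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def analyze_board(bucket: str, key: str) -> dict:
    """Ask the vision endpoint to analyze one captured board image.

    'robotictactoe-vision' and the request/response shapes are illustrative;
    the real contract depends on the deployed container.
    """
    response = runtime.invoke_endpoint(
        EndpointName="robotictactoe-vision",
        ContentType="application/json",
        Body=json.dumps({"bucket": bucket, "key": key}),
    )
    return json.loads(response["Body"].read())

result = analyze_board("robotictactoe-board-images", "frames/board-1700000000.jpg")
print(result)  # e.g. robot positions, orientations, and board occupancy
```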
A dedicated AWS Lambda function orchestrates the vision processing workflow, reacting to newly captured images in Amazon S3, invoking the SageMaker endpoint for analysis, and distributing the results to the rest of the system.
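A sketch of that orchestration as an S3-triggered Lambda handler, with illustrative endpoint and topic names:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
iot = boto3.client("iot-data")

def handler(event, context):
    """Triggered by S3 object-created events for newly captured frames."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Run the OpenCV/PCA analysis on the new frame.
        response = runtime.invoke_endpoint(
            EndpointName="robotictactoe-vision",  # illustrative name
            ContentType="application/json",
            Body=json.dumps({"bucket": bucket, "key": key}),
        )
        analysis = json.loads(response["Body"].read())

        # Share the detected robot poses with the rest of the system.
        iot.publish(
            topic="vision/board-state",  # illustrative topic
            qos=1,
            payload=json.dumps(analysis),
        )
    return {"statusCode": 200}
```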
This computer vision system facilitates accurate robot navigation and game state tracking, contributing to the seamless gameplay experience in RoboTic-Tac-Toe. The combination of PCA for orientation detection, OpenCV for image processing, and AWS services for deployment helps create a robust and scalable computer vision solution.
RoboTic-Tac-Toe showcases how AI, robotics, and cloud computing can converge to create interactive experiences. This project highlights the potential of AWS IoT, machine learning (ML), and generative AI in gaming, education, and beyond. As AI-driven robotics continue to evolve, RoboTic-Tac-Toe serves as a glimpse into the future of intelligent, interactive gaming.
Stay tuned for future enhancements, expanded gameplay modes, and even more engaging AI-powered interactions.