Crafting new questions for exams and quizzes can be tedious and time-consuming for educators. The time required varies based on factors like subject matter, question types, experience level, and class level. Multiple-choice questions require substantial time to generate quality distractors and ensure a single unambiguous answer, and composing effective true-false questions demands careful effort to avoid vagueness and assess deeper understanding. Creating high-quality assessment questions of any format necessitates meticulous attention to detail from educators in order to produce fair and valid student evaluations. To streamline this cumbersome process, we propose an automated exam generation solution based on Amazon Bedrock.
In this post, we explore how to build an application that generates tests tailored to your own lecture content. We cover the technical implementation using the Anthropic Claude large language model (LLM) on Amazon Bedrock and AWS Lambda deployed with the AWS Serverless Application Model (AWS SAM). This solution enables educators to instantly create curriculum-aligned assessments with minimal effort. Students can take personalized quizzes and get immediate feedback on their performance. This solution simplifies the exam creation process while benefiting both teachers and learners.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon using a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. In this post, we focus on a text generation use case, and can choose from Amazon Titan Text G1 and other models on Amazon Bedrock, including Anthropic Claude, AI21 Labs Jurassic, Meta Llama 2, and Cohere Command.
With the ability to scale up to 200,000-token context windows, Anthropic Claude v2.1 on Amazon Bedrock is our preferred choice for this post. It is typically helpful when working with lengthy documents such as entire books. When we talk about tokens, we refer to the smallest individual “atoms” of a language model, and can varyingly correspond to words, subwords, characters, or even bytes (in the case of Unicode). For Anthropic Claude on Amazon Bedrock, the average token is about 3.5 English characters. The 200,000 tokens supported by Anthropic Claude v2.1 on Amazon Bedrock would be equivalent to roughly 150,000 words or over 500 pages of documents.
This post demonstrates how to use advanced prompt engineering to control an LLM’s behavior and responses. It shows how to randomly generate questions and answers from lecture files, implemented as a simple serverless application.
The following diagram illustrates the application architecture. We distinguish two paths: the educator path (1) and the learner path (2).
As first-time users, both educator and learner need to complete the sign-up process, which is done by two separate Amazon Cognito user pools. For the educator, when the sign-up is complete, Amazon Cognito invokes the Lambda function called CognitoPostSignupFn
to subscribe the educator to an Amazon Simple Notification Service (Amazon SNS) topic. The educator must approve the subscription to this topic in order to be notified by email with the scorecard of each learner who will be taking the generated exam.
Figure 1: Architectural diagram of the exam generator application
The workflow includes the following steps:
gen-exam.<your-domain-name>
through Amazon Route 53, which redirects the request to the Application Load Balancer (ALB).1.1 The ALB communicates with Amazon Cognito to authenticate the educator on the educator user pool.
1.2 The educator uploads a lecture as a PDF file into the exam generation front-end.
1.3 The Amazon Elastic Container Service (Amazon ECS) container running on AWS Fargate uploads the file to Amazon Simple Storage Service (Amazon S3) in the Examgen
bucket under the prefix exams
.
1.4 The S3 bucket is configured using event notification. Whenever a new file is uploaded, a PutObject
is activated to send the file to the ExamGenFn
Lambda function.
1.5 The Lambda function ExamGenFn
invokes the Anthropic Claude v2.1 model on Amazon Bedrock to generate exam questions and answers as a JSON file.
1.6 The Amazon Bedrock API returns the output Q&A JSON file to the Lambda function.
1.7 The ExamGenFn
Lambda function saves the output file to the same S3 bucket under the prefix Questions-bank
. (You can choose to save it to a different S3 bucket.)
1.8 The ExamGenFn
Lambda function sends an email notification to the educator through the SNS topic to notify that the exam has been generated.
take-exam.<your-domain-name>
through Route 53, which redirects the request to the ALB.2.1 The ALB communicates with Amazon Cognito to authenticate the learner on the learner user pool.
2.2 The learner accesses the frontend and selects a test to take.
2.3 The container image sends the REST API request to Amazon API Gateway (using the GET method).
2.4 API Gateway communicates with the TakeExamFn
Lambda function as a proxy.
2.5 The Lambda TakeExamFn
function retrieves from S3 bucket under the prefix Questions-bank
the available exam in JSON format.
2.6 The JSON file is returned to API Gateway.
2.7 API Gateway transmits the JSON file to the ECS container in the front-end.
2.8 The container presents the exam as a UI using the Streamlit framework. The learner then takes the exams. When the learner is finished and submits their answers, the ECS container performs a comparison between the answers provided and the correct answers, and then shows the score results to the learner.
2.9 The ECS container stores the scorecard in an Amazon DynamoDB table.
2.10 The Lambda DynamoDBTriggerFn
function detects the new scorecard record on the DynamoDB table and sends an email notification to the educator with the learner’s scorecard.
This is an event-driven architecture made up of individual AWS services that are loosely integrated with each other, with each service handling a specific function. It uses AWS serverless technologies, allowing you build and run your application without having to manage your own servers. All server management is done by AWS, providing many benefits such as automatic scaling and built-in high availability, letting you take your idea to production quickly.
In this section, we go through the prerequisite steps to complete before you can set up this solution.
You can add access to a model from the Amazon Bedrock console. For this walkthrough, you need to request access to the Anthropic Claude model on Amazon Bedrock. For more information, see Model access.
You need to install the following:
If you don’t already have a DNS domain registered, you need to create one in order to not expose the DNS of your ALB. For instructions, refer to Registering a new domain.
You also need to request two public certificates, one for each front-end: gen-exam.<your-domain-name>
and take-exam.<your-domain-name>
. Refer to Requesting a public certificate to request a public certificate on AWS Certificate Manager.
Save the values for genCertificateArn
and takeCertificateArn
.
If you want to build the app in a development environment without using your own domain, you can uncomment the following section in the sam
template:
Before we embark on constructing the app, let’s delve into prompt engineering. We use Chain-of-Thought (CoT) Prompting, which allows the model to break down complex reasoning into smaller, more manageable steps. By providing the AI with intermediate prompts that guide its reasoning process step by step, CoT prompting enables the model to tackle sophisticated reasoning tasks. Guiding the AI through an analytical chain of thought in this way allows it to develop complex reasoning capabilities that would otherwise be beyond its unaided abilities.
In the ExamGenFn
Lambda function, we use the following prompt to guide the model through reasoning steps. You can change the prompt and give it different personas and instructions, and see how it behaves.
The application presented in this post is available in the following GitHub repo with the building blocks code. Let’s start with a git pull
on the repo.
We recommend using temporary credentials with the AWS CLI to make programmatic requests for AWS resources using the AWS CLI.
You build two containers, one for generating exams and one for taking exams. Let’s start with building the generating exam Docker image:
GenExamImageUri
and TakeExamImageUri
.Now that you have both containers ready to run, let’s build the rest of the components using AWS SAM.
AWS SAM consists of two parts:
For further information, refer to Using the AWS Serverless Application Model (AWS SAM).
user@exam-gen ~ % cd exam-gen-ai-blog
and run the sam build
command.Before you run sam deploy
, be aware of the following:
sam
template. To list your VPC IDs and subnets within a selected VPC ID, run the following commands to extract your VpcId
and your two SubnetId
:GenExamCallbackURL
(for generating exam) and TakeExamCallbackURL
(for taking exam) are used by Amazon Cognito. They are URLs where the user is redirected to after a successful sign-in.sam
template:You can follow the creation on the AWS CloudFormation console.
This following video demonstrates running the sam build
and sam deploy
commands.
Figure 2: SAM build and SAM deploy execution
You can use your browser to test the solution.
gen-exam.<your-domain-name>
.You’ll receive an email with a confirmation code.
Once verified, you will land on a page to generate your quiz.
For this example, we use the whitepaper AWS Cloud Adoption Framework: Security Perspective as our input file. We generate four multiple-choice questions and one true/false question.
Then you’ll receive an email confirming the exam has been generated.
take-exam.<your-domain-name>
, and you’ll find the exam on the dropdown menu.The educator will receive an email with the scorecard of the learner.
You have just built a simple application that randomly generates questions and answers from uploaded documents. Learners can take the generated exams and educators can receive scorecards via email when tests are complete. The integration with the DynamoDB table allows you to store the responses on a long-term basis.
There are many possibilities to build on top of this and create a fully featured learning and testing application. One area of expansion is uploading multiple documents at once. As of this writing, users can only upload one document at a time, but support for bulk uploads would improve efficiency and make it easier to work with large sets of source materials. Educators could be empowered to gather and upload content from various documents and websites as source material for questions. This provides greater flexibility compared to using a single document. Moreover, with a data store, they could view and analyze learner answers via a scorecard interface to track progress over time.
It’s important to clean up your resources in the following order:
In this post, we showed how to build a generative AI application powered by Amazon Bedrock that creates exam questions using lecture documents as input to support educators with an automated tool to continuously modernize quiz material and improve learners’ skills. Learners will be able to take the freshly generated exam and get the score results. With the capabilities of Amazon Bedrock and the AWS SAM, you can increase educators’ productivity and foster student success.
For more information on working with generative AI on AWS for education use cases, refer to Generative AI in education: Building AI solutions using course lecture content.
Explore key takeaways from our recent webinar on how AI can transform ABM strategies with…
Generative AI applications are gaining widespread adoption across various industries, including regulated industries such as…
Foundation models such as Gemini have revolutionized how we work, but sometimes they need guidance…
According to Meta, memory layers may be the the answer to LLM hallucinations as they…
According to Mark Zuckerberg, Meta trust and safety workers will be relocated to Texas to…
EPFL researchers have developed 4M, a next-generation, open-sourced framework for training versatile and scalable multimodal…