This post is cowritten with Gayathri Rengarajan and Harshit Kumar Nyati from PowerSchool.
PowerSchool is a leading provider of cloud-based software for K-12 education, serving more than 60 million students in over 90 countries and more than 18,000 customers, including over 90 of the top 100 districts by student enrollment in the United States.
In this post, we demonstrate how we built and deployed a custom content filtering solution using Amazon SageMaker AI that achieved better accuracy while maintaining low false positive rates. We walk through our technical approach to fine-tuning Llama 3.1 8B, our deployment architecture, and the performance results from internal validations.
PowerBuddy is an AI assistant that delivers personalized insights, fosters engagement, and provides support throughout the educational journey. Educational leaders benefit from PowerBuddy being brought to their data and their users’ most common workflows within the PowerSchool ecosystem – such as Schoology Learning, Naviance CCLR, PowerSchool SIS, Performance Matters, and more – to ensure a consistent experience for students and their network of support providers at school and at home.
The PowerBuddy suite includes several AI solutions: PowerBuddy for Learning functions as a virtual tutor; PowerBuddy for College and Career provides insights for career exploration; and PowerBuddy for Community simplifies access to district and school information, among others. The suite includes built-in accessibility features such as speech-to-text and text-to-speech functionality.
As an education technology provider serving millions of students—many of whom are minors—student safety is our highest priority. National data shows that approximately 20% of students ages 12–17 experience bullying, and 16% of high school students have reported seriously considering suicide. With PowerBuddy’s widespread adoption across K-12 schools, we needed robust guardrails specifically calibrated for educational environments.
The out-of-the-box content filtering and safety guardrails solutions available on the market didn’t fully meet PowerBuddy’s requirements, primarily because of the need for domain-specific awareness and fine-tuning within the education context. For example, when a high school student is learning about sensitive historical topics such as World War II or the Holocaust, it’s important that educational discussions aren’t mistakenly flagged for violent content. At the same time, the system must be able to detect and immediately alert school administrators to indications of potential harm or threats. Achieving this nuanced balance requires deep contextual understanding, which can only be enabled through targeted fine-tuning.
We needed to implement a sophisticated content filtering system that could intelligently differentiate between legitimate academic inquiries and truly harmful content—detecting and blocking prompts indicating bullying, self-harm, hate speech, inappropriate sexual content, violence, or harmful material not suitable for educational settings. Our challenge was finding a cloud solution to train and host a custom model that could reliably protect students while maintaining the educational functionality of PowerBuddy.
After evaluating multiple AI providers and cloud services that allow model customization and fine-tuning, we selected Amazon SageMaker AI as the most suitable platform based on these critical requirements:
Our content filtering system architecture, shown in the preceding figure, consists of several key components:
After exploring multiple approaches to content filtering, we decided to fine-tune Llama 3.1 8B using Amazon SageMaker JumpStart. This decision followed our initial attempts to develop a content filtering model from scratch, which proved challenging to optimize for consistency across various types of harmful content.
SageMaker JumpStart significantly accelerated our development process by providing pre-configured environments and optimized hyperparameters for fine-tuning foundation models. The platform’s streamlined workflow allowed our team to focus on curating high-quality training data specific to educational safety concerns rather than spending time on infrastructure setup and hyperparameter tuning.
We fine-tuned the Llama 3.1 8B model using the Low-Rank Adaptation (LoRA) technique on Amazon SageMaker AI training jobs, which allowed us to maintain full control over the training process.
After fine-tuning was complete, we deployed the model on a SageMaker AI managed endpoint and integrated it as a critical safety component within our PowerBuddy architecture.
For our production deployment, we selected NVIDIA A10G GPUs available through ml.g5.12xlarge instances, which offered the ideal balance of performance and cost-effectiveness for our model size. The AWS team provided crucial guidance on selecting optimal model serving configuration for our use case. This advice helped us optimize both performance and cost by ensuring we weren’t over-provisioning resources.
The following steps fine-tune the model on the preprocessed dataset. The instruction-tuning dataset is first converted into the domain adaptation dataset format, and the training scripts use Fully Sharded Data Parallel (FSDP) together with Low-Rank Adaptation (LoRA) to fine-tune the model.
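As an illustration of that preprocessing step, the sketch below writes a small instruction-tuning dataset into the train.jsonl plus template.json layout used in public JumpStart Llama fine-tuning examples. The example records and the field names (instruction, context, response) are our assumptions for illustration, not PowerSchool's actual data or schema.

```python
import json
from pathlib import Path

# Hypothetical labeled examples: each pairs a student message with a
# safe/unsafe label produced during data curation.
examples = [
    {"instruction": "Classify the following student message as SAFE or UNSAFE.",
     "context": "Can you explain the causes of World War II?",
     "response": "SAFE"},
    {"instruction": "Classify the following student message as SAFE or UNSAFE.",
     "context": "Tell me how to hurt a classmate.",
     "response": "UNSAFE"},
]

out_dir = Path("train_data")
out_dir.mkdir(exist_ok=True)

# train.jsonl: one JSON object per line, the layout the training job reads.
with open(out_dir / "train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# template.json tells the training script how to assemble each record
# into a single prompt/completion pair (the domain adaptation format).
template = {
    "prompt": ("Below is an instruction that describes a task, paired with an "
               "input that provides further context.\n\n"
               "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n"),
    "completion": "### Response:\n{response}",
}
with open(out_dir / "template.json", "w") as f:
    json.dump(template, f)
```

Both files are then uploaded to the S3 prefix passed to the training job as train_data_location.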
We define an estimator object first. By default, these models train via domain adaptation, so you must indicate instruction tuning by setting the instruction_tuned hyperparameter to True.
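A minimal sketch of such an estimator follows. The model ID, instance type, and hyperparameter names and values are assumptions drawn from public JumpStart Llama fine-tuning examples, not PowerSchool's actual configuration, and the SageMaker SDK import is guarded so the job settings can be inspected even where the SDK is not installed.

```python
# Assumed hyperparameters for the JumpStart fine-tuning job; note
# instruction_tuned="True", which opts in to instruction tuning.
hyperparameters = {
    "instruction_tuned": "True",
    "epoch": "3",
    "per_device_train_batch_size": "4",
    "lora_r": "8",       # LoRA rank
    "lora_alpha": "32",  # LoRA scaling factor
}

try:
    from sagemaker.jumpstart.estimator import JumpStartEstimator

    estimator = JumpStartEstimator(
        model_id="meta-textgeneration-llama-3-1-8b",  # assumed JumpStart model ID
        environment={"accept_eula": "true"},          # required for Llama models
        instance_type="ml.g5.12xlarge",
        hyperparameters=hyperparameters,
    )
    # estimator.fit({"training": train_data_location})
except ImportError:
    # SageMaker SDK not available here; the configuration above still
    # documents the intended job settings.
    estimator = None
```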
After we define the estimator, we are ready to start training:
estimator.fit({"training": train_data_location})
After training, we created a model from the artifacts stored in Amazon S3 and deployed it to a real-time endpoint for evaluation. We tested the model against a test dataset covering key scenarios to validate performance and behavior, calculating recall, F1, and the confusion matrix and inspecting misclassifications. Where results fell short, we adjusted hyperparameters or the prompt template and retrained; otherwise, we proceeded with production deployment.
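A minimal sketch of that evaluation step, using illustrative labels rather than our actual test set; it derives the confusion matrix, recall, and F1 for the unsafe class in plain Python.

```python
# Illustrative ground-truth labels and model predictions.
y_true = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]
y_pred = ["safe", "unsafe", "safe",   "safe", "unsafe", "unsafe"]

# Treat "unsafe" as the positive class.
tp = sum(t == "unsafe" and p == "unsafe" for t, p in zip(y_true, y_pred))
fp = sum(t == "safe"   and p == "unsafe" for t, p in zip(y_true, y_pred))
fn = sum(t == "unsafe" and p == "safe"   for t, p in zip(y_true, y_pred))
tn = sum(t == "safe"   and p == "safe"   for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Rows: actual safe/unsafe; columns: predicted safe/unsafe.
confusion = [[tn, fp], [fn, tp]]

# Misclassified cases are inspected by hand to decide whether to
# adjust hyperparameters or the prompt template and retrain.
misclassified = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
```

The false positive count (fp) is the metric we watched most closely, since over-blocking safe academic questions erodes trust in the assistant.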
You can also check out the sample notebook for fine-tuning Llama 3 models with SageMaker JumpStart in the SageMaker examples repository.
We used the Faster autoscaling on Amazon SageMaker realtime endpoints notebook to set up autoscaling on SageMaker AI endpoints.
To validate our content filtering solution, we conducted extensive testing across multiple dimensions:
The fine-tuned content filtering model outperformed generic, out-of-the-box filtering solutions on key safety metrics. It achieved higher accuracy (0.93 compared to 0.89) and better F1-scores for both the safe (0.95 compared to 0.91) and unsafe (0.90 compared to 0.87) classes. The fine-tuned model also showed a more balanced trade-off between precision and recall, indicating more consistent performance across classes. Importantly, it made fewer false positive errors, misclassifying only 6 safe cases as unsafe in a test set of 160, compared to 19 for the generic solution, a significant advantage in safety-sensitive applications. Overall, our fine-tuned content filtering model proved more reliable and effective.
As the PowerBuddy suite evolves and is integrated into other PowerSchool products and agent flows, we will continue to adapt and improve the content filter model, fine-tuning it for products with specific needs.
We plan to implement additional specialized adapters using the SageMaker AI multi-adapter inference feature alongside our content filtering model, subject to feasibility and compliance considerations. The idea is to deploy fine-tuned small language models (SLMs) for specific problems where large language models (LLMs) are too large and generic for narrower problem domains. For example:
This approach will deliver significant cost savings by eliminating the need for separate model deployments while maintaining the specialized performance of each adapter.
The goal is to create an AI learning environment that is not only safe but also inclusive and responsive to diverse student needs across our global implementations, ultimately empowering students to learn effectively while being protected from harmful content.
The implementation of our specialized content filtering system on Amazon SageMaker AI has been transformative for PowerSchool’s ability to deliver safe AI experiences in educational settings. By building robust guardrails, we’ve addressed one of the primary concerns educators and parents have about introducing AI into classrooms—helping to ensure student safety.
As Shivani Stumpf, our Chief Product Officer, explains: “We’re now tracking around 500 school districts who’ve either purchased PowerBuddy or activated included features, reaching approximately 4.2 million students. Our content filtering technology ensures students can benefit from AI-powered learning support without exposure to harmful content, creating a safe space for academic growth and exploration.”
The impact extends beyond just blocking harmful content. By establishing trust in our AI systems, we’ve enabled schools to embrace PowerBuddy as a valuable educational tool. Teachers report spending less time monitoring student interactions with technology and more time on personalized instruction. Students benefit from 24/7 learning support without the risks that might otherwise come with AI access.
For organizations requiring domain-specific safety guardrails, consider how the fine-tuning capabilities and managed endpoints of SageMaker AI can be adapted to your use case.
As we continue to expand PowerBuddy’s capabilities with the multi-adapter inference of SageMaker, we remain committed to maintaining the perfect balance between educational innovation and student safety—helping to ensure that AI becomes a positive force in education that parents, teachers, and students can trust.