This post was written with Bryan Woolgar-O’Neil, Jamie Cockrill and Adrian Cunliffe from Harmonic Security
Organizations face increasing challenges protecting sensitive data while supporting third-party generative AI tools. Harmonic Security, a cybersecurity company, developed an AI governance and control layer that spots sensitive data in line as employees use AI, giving security teams the power to keep PII, source code, and payroll information safe while the business accelerates.
The following screenshot demonstrates Harmonic Security’s software tool, highlighting the different data leakage detection types, including Employee PII, Employee Financial Information, and Source Code.
Harmonic Security’s solution is also now available on AWS Marketplace, enabling organizations to deploy enterprise-grade data leakage protection with seamless AWS integration. The platform provides prompt-level visibility into GenAI usage, real-time coaching at the point of risk, and detection of high-risk AI applications—all powered by the optimized models described in this post.
The initial version of their system was effective, but with a detection latency of 1–2 seconds, there was an opportunity to further enhance its capabilities and improve the overall user experience. To achieve this, Harmonic Security partnered with the AWS Generative AI Innovation Center to optimize their system around four key objectives: reduce detection latency to under 500 milliseconds, maintain or improve detection accuracy, scale reliably under production traffic, and support EU compliance requirements.
This post walks through how Harmonic Security used Amazon SageMaker AI, Amazon Bedrock, and Amazon Nova Pro to fine-tune a ModernBERT model, achieving low-latency, accurate, and scalable data leakage detection.
Harmonic Security’s initial data leakage detection system relied on an 8-billion-parameter (8B) model that identified sensitive data effectively but incurred 1–2 seconds of latency, close to the threshold at which user experience suffers. To achieve sub-500 millisecond latency while maintaining accuracy, we developed two classification approaches based on fine-tuned ModernBERT models.
First, a binary classification model was prioritized to detect mergers and acquisitions (M&A) content, a critical category for preventing sensitive data leaks. We started with binary classification because it was the simplest approach that integrated seamlessly with Harmonic Security’s current system, which invokes multiple binary classification models in parallel. Second, as an extension, we explored a multi-label classification model that detects multiple sensitive data types (such as billing information, financial projections, and employment records) in a single pass, aiming to reduce the computational overhead of running multiple binary classifiers in parallel. Although the multi-label approach showed promise for future scalability, Harmonic Security decided to keep the binary classification model for the initial version.

The solution uses the following key services:
The following diagram illustrates the solution architecture for low-latency inference and scalability.
The architecture consists of the following components:
The solution supports the following features:
High-quality training data for sensitive information (such as M&A documents and financial data) is scarce. We used Meta Llama 3.3 70B Instruct and Amazon Nova Pro to generate synthetic data, expanding upon Harmonic’s existing dataset that included examples of data in the following categories: M&A, billing information, financial projection, employment records, sales pipeline, and investment portfolio. The following diagram provides a high-level overview of the synthetic data generation process.
The synthetic data generation framework consists of a series of steps, including:
For the binary M&A classification task, we generated three distinct types of examples:
The generation process maintained careful proportions between these example types, with particular emphasis on near-miss examples to address precision requirements.
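To make this concrete, the following minimal sketch shows one way such a near-miss example could be generated through Amazon Bedrock. The model ID, prompt wording, and labeling convention are illustrative assumptions, not Harmonic Security’s production prompts.

```python
import json
import boto3

# Bedrock Runtime client (region is an assumption for illustration)
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative prompt for a "near-miss" example: business text that uses
# deal-adjacent language but contains no actual M&A-sensitive information.
PROMPT = (
    "Write a short, realistic workplace message (3-5 sentences) that discusses "
    "general corporate strategy or partnerships but does NOT reveal any "
    "confidential mergers-and-acquisitions details. Return only the message text."
)

def generate_near_miss(model_id: str = "amazon.nova-pro-v1:0") -> str:
    """Generate one synthetic near-miss example with the Bedrock Converse API.

    Depending on the Region, an inference profile ID such as
    "us.amazon.nova-pro-v1:0" may be required instead of the base model ID.
    """
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": PROMPT}]}],
        inferenceConfig={"temperature": 0.9, "maxTokens": 300},
    )
    return response["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    # Near-miss examples are labeled as negatives for the binary M&A classifier
    example = {"text": generate_near_miss(), "label": 0}
    print(json.dumps(example, indent=2))
```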
For the more complex multi-label classification task across four sensitive information categories, we developed a sophisticated generation strategy:
Our multi-label generation prioritized realistic co-occurrence patterns between categories while maintaining sufficient representation of individual categories and their combinations. As a result, synthetic data expanded the training set by 10 times for the binary task and 15 times for the multi-label task. It also improved class balance, because we deliberately generated data with a more even label distribution.
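As a simple illustration of the co-occurrence idea, the sketch below samples a target label combination that a generation prompt could then be conditioned on. The category names come from this post, but the combination weights are hypothetical.

```python
import random

# Hypothetical co-occurrence weights over four sensitive categories; the
# actual proportions used during generation are not published here.
COMBINATION_WEIGHTS = {
    ("billing_information",): 0.20,
    ("financial_projection",): 0.20,
    ("employment_records",): 0.15,
    ("sales_pipeline",): 0.15,
    ("billing_information", "financial_projection"): 0.15,
    ("financial_projection", "sales_pipeline"): 0.10,
    ("billing_information", "employment_records"): 0.05,
}

def sample_label_combination() -> list[str]:
    """Sample a label combination to steer multi-label synthetic generation."""
    combos = list(COMBINATION_WEIGHTS)
    weights = [COMBINATION_WEIGHTS[c] for c in combos]
    return list(random.choices(combos, weights=weights, k=1)[0])

# Each sampled combination becomes the target label set for one generated passage.
print(sample_label_combination())
```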
We fine-tuned ModernBERT models on SageMaker to achieve low latency and high accuracy. Compared with decoder-only models such as Meta Llama 3.2 3B and Google Gemma 2 2B, ModernBERT’s compact size (149M and 395M parameters) translated into lower latency while still delivering higher accuracy, so we selected ModernBERT over fine-tuning those alternatives. In addition, ModernBERT is one of the few BERT-based models that supports context lengths of up to 8,192 tokens, which was a key requirement for our project.
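As a quick sanity check, the extended context window and compact parameter count can be confirmed directly from the public checkpoint. This snippet assumes the answerdotai/ModernBERT-base checkpoint and transformers 4.48 or later.

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Assumes the public answerdotai checkpoint; ModernBERT support requires
# transformers 4.48 or later.
config = AutoConfig.from_pretrained("answerdotai/ModernBERT-base")
print(config.max_position_embeddings)  # 8192-token context window

model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=2
)
# Roughly 149M parameters for the base model, plus a small classification head
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```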
Our first fine-tuned model used ModernBERT-base, and we focused on binary classification of M&A content. We approached this task methodically:
The result was a fine-tuned model that could distinguish M&A content from non-sensitive information with a higher F1 score than the 8B parameter model.
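The following is a minimal sketch of a Hugging Face Trainer setup for this binary fine-tuning task. The file names, hyperparameter values, and training schedule are illustrative placeholders, not Harmonic Security’s exact configuration.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Placeholder files with {"text": ..., "label": 0 or 1} records
dataset = load_dataset("json", data_files={"train": "train.jsonl",
                                           "validation": "val.jsonl"})

def tokenize(batch):
    # Long prompts are supported up to ModernBERT's 8,192-token context
    return tokenizer(batch["text"], truncation=True, max_length=8192)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="modernbert-ma-binary",
        learning_rate=2e-5,            # illustrative; tuned later with Optuna
        per_device_train_batch_size=32,
        num_train_epochs=3,
        eval_strategy="epoch",
        fp16=True,
        report_to="none",
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    processing_class=tokenizer,        # enables dynamic padding by default
    compute_metrics=compute_metrics,
)
trainer.train()
```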
For our second model, we tackled the more complex challenge of multi-label classification: detecting multiple sensitive data types simultaneously within a single text passage. We fine-tuned a ModernBERT-large model to identify various sensitive data types, such as billing information, employment records, and financial projections, in a single pass. This required:
This approach enabled our system to identify multiple sensitive data types in a single inference pass.
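A minimal sketch of the multi-label setup is shown below. It assumes the public ModernBERT-large checkpoint, the four categories discussed in this post, and a simple 0.5 sigmoid threshold; in practice, the class threshold was itself tuned, as described in the next section, and the fine-tuned checkpoint would be loaded in place of the base weights.

```python
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Category names are taken from this post; the ordering is an assumption.
CATEGORIES = ["billing_information", "financial_projection",
              "employment_records", "sales_pipeline"]

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-large",   # replace with the fine-tuned checkpoint
    num_labels=len(CATEGORIES),
    problem_type="multi_label_classification",  # BCE-with-logits loss per label
    id2label=dict(enumerate(CATEGORIES)),
    label2id={c: i for i, c in enumerate(CATEGORIES)},
)

def predict_labels(text: str, threshold: float = 0.5) -> list[str]:
    """Return every sensitive category whose sigmoid score exceeds the threshold."""
    inputs = tokenizer(text, truncation=True, max_length=8192, return_tensors="pt")
    logits = model(**inputs).logits[0]
    probs = 1 / (1 + np.exp(-logits.detach().numpy()))  # independent sigmoid per label
    return [CATEGORIES[i] for i, p in enumerate(probs) if p > threshold]

print(predict_labels("Q3 revenue is projected to grow 18%, per the attached forecast."))
```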
To find the optimal configuration for our models, we used Optuna to optimize key parameters. Optuna is an open-source hyperparameter optimization (HPO) framework that helps find the best hyperparameters for a given machine learning (ML) model by running many experiments (called trials). It uses a Bayesian algorithm called Tree-structured Parzen Estimator (TPE) to choose promising hyperparameter combinations based on past results.
The search space explored numerous combinations of key hyperparameters, as listed in the following table.
| Hyperparameter | Range |
|---|---|
| Learning rate | 5e-6–5e-5 |
| Weight decay | 0.01–0.5 |
| Warmup ratio | 0.0–0.2 |
| Dropout rates | 0.1–0.5 |
| Batch size | 16, 24, 32 |
| Gradient accumulation steps | 1, 4 |
| Focal loss gamma (multi-label only) | 1.0–3.0 |
| Class threshold (multi-label only) | 0.1–0.8 |
To optimize computational resources, we implemented pruning logic that stopped underperforming trials early, so less promising configurations were discarded quickly. As seen in the following Optuna HPO history plot, trial 42 yielded the best parameters (highest F1 score) for the binary classification model, whereas trial 32 was the best for the multi-label model.
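The sketch below shows how such a study could be wired up with Optuna over the ranges in the preceding table. It reuses the dataset, tokenizer, and compute_metrics objects from the earlier binary fine-tuning sketch; the trial count, epoch count, and the use of ModernBERT’s classifier_dropout field are illustrative assumptions.

```python
import optuna
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

def objective(trial: optuna.Trial) -> float:
    model = AutoModelForSequenceClassification.from_pretrained(
        "answerdotai/ModernBERT-base",
        num_labels=2,
        # classifier_dropout is assumed to be the relevant ModernBERT dropout field
        classifier_dropout=trial.suggest_float("dropout", 0.1, 0.5),
    )
    args = TrainingArguments(
        output_dir=f"hpo-trial-{trial.number}",
        learning_rate=trial.suggest_float("learning_rate", 5e-6, 5e-5, log=True),
        weight_decay=trial.suggest_float("weight_decay", 0.01, 0.5),
        warmup_ratio=trial.suggest_float("warmup_ratio", 0.0, 0.2),
        per_device_train_batch_size=trial.suggest_categorical("batch_size", [16, 24, 32]),
        gradient_accumulation_steps=trial.suggest_categorical("grad_accum", [1, 4]),
        num_train_epochs=3,
        eval_strategy="epoch",
        fp16=True,
        report_to="none",
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=dataset["train"],
                      eval_dataset=dataset["validation"],
                      processing_class=tokenizer,
                      compute_metrics=compute_metrics)
    trainer.train()
    return trainer.evaluate()["eval_f1"]

# TPE picks promising combinations from past results; the median pruner takes
# effect once intermediate F1 values are reported with trial.report() and
# trial.should_prune() inside a training callback (omitted here for brevity).
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42),
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)
print(study.best_trial.number, study.best_value, study.best_params)
```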
Moreover, our analysis showed that dropout and learning rate were the most important hyperparameters, accounting for 48% and 21% of the variance in the F1 score for the binary classification model, respectively. This explains why the model overfit quickly during earlier runs and underscores the importance of regularization.
After the optimization experiments, we discovered the following:
Running hyperparameter tuning in this automated fashion allowed our models to reach a high F1 score efficiently, which is crucial for production deployment.
After fine-tuning and deploying the optimized model to a SageMaker real-time endpoint, we performed load testing to validate the performance and autoscaling under pressure to meet Harmonic Security’s latency, throughput, and elasticity needs. The objectives of the load testing were:
The methodology involved the following:
As shown in the following graph, the maximum throughput at a latency under 1 second was 1,185 requests per minute (RPM), so we set the auto scaling threshold at 70% of that value, or 830 RPM.
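For illustration, the following sketch shows one way to run such a stepped load test against the SageMaker real-time endpoint from a Python client. The endpoint name, payload, request counts, and concurrency levels are assumptions rather than the exact harness used.

```python
import concurrent.futures
import json
import time

import boto3
import numpy as np

# Illustrative endpoint name and payload
ENDPOINT_NAME = "modernbert-ma-binary"
PAYLOAD = json.dumps({"inputs": "Draft term sheet for the upcoming acquisition..."})
runtime = boto3.client("sagemaker-runtime")

def invoke_once() -> float:
    """Invoke the real-time endpoint once and return client-side latency in ms."""
    start = time.perf_counter()
    runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                            ContentType="application/json",
                            Body=PAYLOAD)
    return (time.perf_counter() - start) * 1000

def run_load_step(concurrency: int, requests: int = 300) -> None:
    """Fire a fixed number of requests at a given concurrency and report stats."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        start = time.perf_counter()
        latencies = list(pool.map(lambda _: invoke_once(), range(requests)))
        elapsed = time.perf_counter() - start
    rpm = requests / elapsed * 60
    print(f"concurrency={concurrency} rpm={rpm:.0f} "
          f"p50={np.percentile(latencies, 50):.0f}ms "
          f"p95={np.percentile(latencies, 95):.0f}ms "
          f"p99={np.percentile(latencies, 99):.0f}ms")

# Step up concurrency until tail latency approaches the 1-second ceiling
for c in (4, 8, 16, 32):
    run_load_step(c)
```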
Based on the performance observed during load testing, we configured a target-tracking auto scaling policy for the SageMaker endpoint using Application Auto Scaling. The following figure illustrates this policy workflow.
The key parameters defined were:
SageMakerVariantInvocationsPerInstance: 830 invocations per instance per minute

This target-tracking policy adjusts the number of instances based on traffic, maintaining performance and cost-efficiency (a configuration sketch follows the table). The following table summarizes our findings.
| Model | Requests per Minute |
|---|---|
| 8B model | 800 |
| ModernBERT with auto scaling (5 instances) | 1,185–5,925 |
| Additional capacity (ModernBERT vs. 8B model) | 48%–640% |
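The following sketch shows how a target-tracking policy like the one described above can be registered with Application Auto Scaling using boto3. The endpoint name, variant name, capacity bounds, and cooldown values are illustrative.

```python
import boto3

# Illustrative endpoint and variant names
ENDPOINT_NAME = "modernbert-ma-binary"
RESOURCE_ID = f"endpoint/{ENDPOINT_NAME}/variant/AllTraffic"

autoscaling = boto3.client("application-autoscaling")

# Register the endpoint variant's instance count as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=RESOURCE_ID,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=5,
)

# Target-tracking policy: keep each instance near 830 invocations per minute
autoscaling.put_scaling_policy(
    PolicyName="modernbert-invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=RESOURCE_ID,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 830.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,   # illustrative cooldowns
        "ScaleOutCooldown": 60,
    },
)
```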
This section showcases the significant impact of the fine-tuning and optimization efforts on Harmonic Security’s data leakage detection system, with a primary focus on achieving substantial latency reductions. Absolute latency improvements are detailed first, underscoring the success in meeting the sub-500 millisecond target, followed by an overview of performance enhancements. The following subsections provide detailed results for binary M&A classification and multi-label classification across multiple sensitive data types.
We evaluated the fine-tuned ModernBERT-base model for binary M&A classification against the baseline 8B model, introduced in the solution overview. The most striking achievement was a transformative reduction in latency, addressing the initial 1–2 second delay that risked disrupting user experience. This leap to sub-500 millisecond latency is detailed in the following table, marking a pivotal enhancement in system responsiveness.
| Model | Median (ms) | p95 (ms) | p99 (ms) | p100 (ms) |
|---|---|---|---|---|
| ModernBERT-base-v2 | 46.03 | 81.19 | 102.37 | 183.11 |
| 8B model | 189.15 | 259.99 | 286.63 | 346.36 |
| Difference | -75.66% | -68.77% | -64.28% | -47.13% |
Building on this latency breakthrough, the following performance metrics reflect percentage improvements in accuracy and F1 score.
| Model | Accuracy Improvement | F1 Improvement |
|---|---|---|
| ModernBERT-base-v2 | +1.56% | +2.26% |
| 8B model | – | – |
These results highlight that ModernBERT-base-v2 delivers a groundbreaking latency reduction, complemented by modest accuracy and F1 improvements of 1.56% and 2.26%, respectively, aligning with Harmonic Security’s objectives to enhance data leakage detection without impacting user experience.
We evaluated the fine-tuned ModernBERT-large model for multi-label classification against the baseline 8B model, with latency reduction as the cornerstone of this approach. The most significant advancement was a substantial decrease in latency across all evaluated categories, achieving sub-500 millisecond responsiveness and addressing the previous 1–2 second bottleneck. The latency results shown in the following table underscore this critical improvement.
| Dataset | Model | Median (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|---|
| Billing and payment | 8B model | 198 | 238 | 321 |
| | ModernBERT-large | 158 | 199 | 246 |
| | Difference | -20.13% | -16.62% | -23.60% |
| Sales pipeline | 8B model | 194 | 265 | 341 |
| | ModernBERT-large | 162 | 243 | 293 |
| | Difference | -16.63% | -8.31% | -13.97% |
| Financial projections | 8B model | 384 | 510 | 556 |
| | ModernBERT-large | 160 | 275 | 310 |
| | Difference | -58.24% | -46.04% | -44.19% |
| Investment portfolio | 8B model | 397 | 498 | 703 |
| | ModernBERT-large | 160 | 259 | 292 |
| | Difference | -59.69% | -47.86% | -58.46% |
This approach also delivered a second key benefit: a reduction in computational parallelism by consolidating multiple classifications into a single pass. However, the multi-label model encountered challenges in maintaining consistent accuracy across all classes. Although categories like Financial Projections and Investment Portfolio showed promising accuracy gains, others such as Billing and Payment and Sales Pipeline experienced significant accuracy declines. This indicates that, despite its latency and parallelism advantages, the approach requires further development to maintain reliable accuracy across data types.
In this post, we explored how Harmonic Security collaborated with the AWS Generative AI Innovation Center to optimize their data leakage detection system, achieving transformative results:
Key performance improvements:
By using SageMaker, Amazon Bedrock, and Amazon Nova Pro, Harmonic Security fine-tuned ModernBERT models that deliver sub-500 millisecond inference in production, meeting stringent performance goals while supporting EU compliance and establishing a scalable architecture.
This partnership showcases how tailored AI solutions can tackle critical cybersecurity challenges without hindering productivity. Harmonic Security’s solution is now available on AWS Marketplace, enabling organizations to adopt AI tools safely while protecting sensitive data in real time. Looking ahead, these high-speed models have the potential to add further controls for additional AI workflows.
To learn more, consider the following next steps:
By adopting these steps, organizations can harness AI-driven cybersecurity to maintain robust data protection and seamless user experiences across diverse workflows.