Online communities are driving user engagement across industries like gaming, social media, ecommerce, dating, and e-learning. Members of these online communities trust platform owners to provide a safe and inclusive environment where they can freely consume content and contribute. Content moderators are often employed to review user-generated content and check that it’s safe and compliant with your terms of use. However, the ever-increasing scale, complexity, and variety of inappropriate content make human moderation workflows unscalable and expensive. The result is poor, harmful, and non-inclusive communities that disengage users and hurt both the community and the business.
Along with user-generated content, machine-generated content has brought a fresh challenge to content moderation. Generative tools can automatically create, at scale, highly realistic content that may be inappropriate or harmful. The industry now faces the challenge of automatically moderating AI-generated content to protect users from harmful material.
In this post, we introduce toxicity detection, a new feature from Amazon Comprehend that helps you automatically detect harmful content in user- or machine-generated text. This includes plain text, text extracted from images, and text transcribed from audio or video content.
Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning (ML) to uncover valuable insights and connections in text. It offers a range of ML models that can be either pre-trained or customized through API interfaces. Amazon Comprehend now provides a straightforward, NLP-based solution for toxic content detection in text.
The Amazon Comprehend Toxicity Detection API assigns an overall toxicity score to text content, ranging from 0–1, indicating the likelihood of it being toxic. It also categorizes text into the following seven categories and provides a confidence score for each:
You can access the Toxicity Detection API by calling it directly using the AWS Command Line Interface (AWS CLI) and AWS SDKs. Toxicity detection in Amazon Comprehend is currently supported in the English language.
Text moderation plays a crucial role in managing user-generated content across diverse formats, including social media posts, online chat messages, forum discussions, website comments, and more. Moreover, platforms that accept video and audio content can use this feature to moderate transcribed audio content.
The emergence of generative AI and large language models (LLMs) represents the latest trend in the field of AI. Consequently, there is a growing need for responsive solutions to moderate content generated by LLMs. The Amazon Comprehend Toxicity Detection API is ideally suited for addressing this need.
You can send up to 10 text segments to the Toxicity Detection API, each with a size limit of 1 KB. Every text segment in the request is handled independently. In the following example, we generate a JSON file named toxicity_api_input.json containing the text content, including three sample text segments for moderation. Note that in the example, the profane words are masked as XXXX.
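As a minimal sketch of how such an input file could be produced, the Python below builds a request body and checks it against the limits stated above (at most 10 segments, 1 KB each). The helper names build_request and write_request_file are hypothetical, the three sample texts are invented for illustration, and measuring the 1 KB limit in UTF-8 bytes is an assumption. The TextSegments and LanguageCode field names follow the DetectToxicContent request shape.

```python
import json

MAX_SEGMENTS = 10          # per-request segment limit stated in the post
MAX_SEGMENT_BYTES = 1024   # 1 KB per segment; counted as UTF-8 bytes (assumption)


def build_request(texts, language_code="en"):
    """Return a request payload for the given texts, enforcing the stated limits."""
    if not 1 <= len(texts) <= MAX_SEGMENTS:
        raise ValueError(f"expected 1 to {MAX_SEGMENTS} segments, got {len(texts)}")
    for index, text in enumerate(texts):
        size = len(text.encode("utf-8"))
        if size > MAX_SEGMENT_BYTES:
            raise ValueError(
                f"segment {index} is {size} bytes, over the {MAX_SEGMENT_BYTES}-byte limit"
            )
    return {
        "TextSegments": [{"Text": text} for text in texts],
        "LanguageCode": language_code,
    }


def write_request_file(path, texts):
    """Write the request payload to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_request(texts), handle, indent=2)


# Hypothetical sample segments; profane words are masked as XXXX.
sample_texts = [
    "Thanks for the quick reply, see you at the meetup.",
    "Say that again and I will come to your house and hurt you.",
    "What the XXXX is wrong with you, you XXXX.",
]

# Show the payload that would be written to toxicity_api_input.json.
print(json.dumps(build_request(sample_texts), indent=2))
```

Calling write_request_file("toxicity_api_input.json", sample_texts) would save the same payload to disk. Validating on the client side is a design choice here, so that oversized requests fail before any API call is made.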
You can use the AWS CLI to invoke the Toxicity Detection API, passing the preceding JSON file as the request input, for example: aws comprehend detect-toxic-content --cli-input-json file://toxicity_api_input.json
The Toxicity Detection API response JSON includes the toxicity analysis result in the ResultList field. ResultList contains one item per text segment, in the same order in which the segments were sent in the API request. Toxicity is the overall confidence score of detection (between 0–1). Labels is a list of toxicity labels, one per type of toxicity, each with its own confidence score.
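To make the response structure concrete, here is a small sketch that walks ResultList and reports the overall score and the highest-scoring label for each segment. The function name summarize_results, the 0.5 flagging threshold, and the sample response are all illustrative assumptions; the scores are made up and are not real API output.

```python
def summarize_results(response, threshold=0.5):
    """Return one summary dict per text segment, in request order.

    `threshold` is an arbitrary cutoff chosen for this sketch, not a
    value recommended by the service.
    """
    summaries = []
    for index, result in enumerate(response.get("ResultList", [])):
        labels = result.get("Labels", [])
        # Pick the label with the highest confidence score, if any.
        top = max(labels, key=lambda label: label["Score"], default=None)
        summaries.append({
            "segment": index,
            "toxicity": result["Toxicity"],
            "flagged": result["Toxicity"] >= threshold,
            "top_label": top["Name"] if top else None,
        })
    return summaries


# Hypothetical response with illustrative scores only.
example_response = {
    "ResultList": [
        {"Toxicity": 0.02, "Labels": [
            {"Name": "PROFANITY", "Score": 0.01},
            {"Name": "VIOLENCE_OR_THREAT", "Score": 0.02},
        ]},
        {"Toxicity": 0.73, "Labels": [
            {"Name": "PROFANITY", "Score": 0.05},
            {"Name": "VIOLENCE_OR_THREAT", "Score": 0.70},
        ]},
        {"Toxicity": 0.98, "Labels": [
            {"Name": "PROFANITY", "Score": 0.97},
            {"Name": "VIOLENCE_OR_THREAT", "Score": 0.03},
        ]},
    ]
}

for row in summarize_results(example_response):
    print(row)
```

Because results come back in request order, the segment index is enough to map each summary to the text that produced it.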
The following code shows the JSON response from the Toxicity Detection API based on the request example in the previous section:
In the preceding JSON, the first text segment is considered safe with a low toxicity score. However, the second and third text segments received toxicity scores of 73% and 98%, respectively. For the second segment, Amazon Comprehend detects a high toxicity score for VIOLENCE_OR_THREAT; for the third segment, it detects PROFANITY with a high toxicity score.
The following code snippet demonstrates how to use the Python SDK to invoke the Toxicity Detection API. This code receives the same JSON response as the AWS CLI command demonstrated earlier.
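A minimal sketch of such a call, assuming the AWS SDK for Python (boto3) is installed and AWS credentials are configured. The helper names detect_toxicity and main, the sample texts, and the choice of us-east-1 are assumptions made for illustration; detect_toxic_content is the boto3 Comprehend client method for this API.

```python
def detect_toxicity(client, texts, language_code="en"):
    """Call the Toxicity Detection API and return the ResultList.

    `client` is expected to be a boto3 Comprehend client; it is passed
    in so the helper can be exercised without AWS access.
    """
    response = client.detect_toxic_content(
        TextSegments=[{"Text": text} for text in texts],
        LanguageCode=language_code,
    )
    return response["ResultList"]


def main():
    """Run one request against AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the helper above has no hard dependency

    comprehend = boto3.client("comprehend", region_name="us-east-1")

    # Hypothetical sample segments; profane words are masked as XXXX.
    texts = [
        "Thanks for the quick reply, see you at the meetup.",
        "Say that again and I will come to your house and hurt you.",
        "What the XXXX is wrong with you, you XXXX.",
    ]
    for index, result in enumerate(detect_toxicity(comprehend, texts)):
        print(index, result["Toxicity"], result["Labels"])
```

Calling main() sends a single request with three segments. Passing the client as an argument is a design choice that keeps the helper easy to test with a stand-in object.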
In this post, we provided an overview of the new Amazon Comprehend Toxicity Detection API. We also described how you can parse the API response JSON. For more information, refer to the Amazon Comprehend API documentation.
Amazon Comprehend toxicity detection is now generally available in four Regions: us-east-1, us-west-2, eu-west-1, and ap-southeast-2.
To learn more about content moderation, refer to Guidance for Content Moderation on AWS. Take the first step towards streamlining your content moderation operations with AWS.