Categories: FAANG

Improving Trust in AI and Online Communities with PaLM-based Moderation

To empower developers to identify sensitive content in a rapidly changing media environment, we are excited to announce Text Moderation powered by PaLM 2, available through the Cloud Natural Language API. Built in collaboration with Jigsaw and Google Research, Text Moderation helps organizations scan for sensitive or harmful content. Here are some examples of how the Text Moderation service can be used:

Brand safety: Flag user-generated and publisher content that an advertiser does not consider “brand safe”
User protection: Scan for potentially offensive or harmful content
Generative AI risk mitigation: Help safeguard against the generation of inappropriate content in outputs from generative models

Promote brand safety

Brand safety is a set of procedures that aim to protect a brand’s reputation and trustworthiness online. One of the biggest risks to brand safety is the content that ads appear alongside. If an ad appears on a website whose content does not conform with the sponsoring brand’s values, it can reflect poorly on the brand and the organization. It is therefore important for companies to identify and remove content that is not aligned with their brand guidelines.

Text Moderation can be used by our customers to identify content that they determine is offensive or harmful, sensitive in context, or otherwise inappropriate for their brand. Once an organization has identified this content, teams can take steps to remove it from advertising campaigns or prevent it from being associated with the brand in the future, helping ensure that advertising campaigns are effective and that the brand is associated with positive and trustworthy content.
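As a rough illustration of how a team might encode what it considers inappropriate for its brand, the sketch below compares moderation scores against per-category limits. It assumes the scores are already available as a mapping from category name to a confidence between 0 and 1. The `BRAND_POLICY` table, its category names and limits, and the helper names are hypothetical and not part of the service.

```python
from typing import Dict, List

# Hypothetical brand policy: the highest confidence the brand tolerates
# for each category. Names and limits here are illustrative assumptions.
BRAND_POLICY: Dict[str, float] = {
    "Toxic": 0.3,
    "Profanity": 0.2,
    "Politics": 0.6,
}


def violations(scores: Dict[str, float], policy: Dict[str, float]) -> List[str]:
    """Return the policy categories whose score exceeds the brand's limit."""
    return sorted(
        name for name, limit in policy.items()
        if scores.get(name, 0.0) > limit
    )


def is_brand_safe(scores: Dict[str, float], policy: Dict[str, float]) -> bool:
    """A page is treated as brand safe when no category exceeds its limit."""
    return not violations(scores, policy)
```

Keeping the limits in a table like this lets each advertiser tune its own tolerance without changing the checking logic.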

Protect users from harmful content

Digital media platforms, gaming publishers, and online marketplaces all have a vested interest in mitigating the risks of user-generated content. They want to provide a safe and welcoming environment for their users while also maintaining an open and free exchange of ideas. Text Moderation can help them achieve this goal, using artificial neural networks to detect and remove harmful content, such as harassment or abuse. These efforts can help reduce harm, improve customer experience, and increase customer retention.

Mitigate risks of generative models

Over the last year, progress in AI has enabled software to more reliably generate text, images, and video, leading to new products and services that use machine learning, including text generators, to create content. However, with any AI content generation, there is a risk of producing offensive material, even inadvertently.

To address this risk, we have trained and evaluated the Text Moderation service on real prompts and responses from large generative models. Text Moderation is versatile and covers a broad range of content types, making it a powerful tool for protecting users from harmful content.
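Since the service is evaluated on both prompts and responses, one plausible integration is to check each side of a generation call. The following is a minimal sketch under stated assumptions: `generate` and `score` are caller-supplied functions (hypothetical stand-ins for a generative model and a moderation call), and the single `threshold` and the `FALLBACK` message are illustrative choices, not recommendations from the service.

```python
from typing import Callable, Dict

# Hypothetical message returned when either side of the exchange is blocked.
FALLBACK = "Sorry, I can't help with that."


def max_score(scores: Dict[str, float]) -> float:
    """Highest confidence across all categories (0.0 if none were returned)."""
    return max(scores.values(), default=0.0)


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> str:
    """Moderate the prompt, then the model's response, before returning it."""
    # Block early so the model is never called on a flagged prompt.
    if max_score(score(prompt)) >= threshold:
        return FALLBACK
    response = generate(prompt)
    # The model can still produce flagged text from a harmless prompt.
    if max_score(score(response)) >= threshold:
        return FALLBACK
    return response
```

A production system would likely use per-category thresholds rather than a single maximum, since some categories describe sensitive topics rather than harmful text.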

Getting started with Text Moderation using the Natural Language API

Text Moderation uses Google’s latest PaLM 2 foundation model to identify a wide range of harmful content, including hate speech, bullying, and sexual harassment. It is easy to use and integrate with existing systems: the API can be accessed from almost any programming language and returns confidence scores across 16 different “safety attributes.”
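To make the request and response shapes concrete, here is a minimal sketch that calls the REST method using only the Python standard library. The endpoint URL, the API-key query parameter, and the `moderationCategories` field layout reflect our understanding of the public REST interface and should be confirmed against the current API reference. The function names are our own, and the category names used when testing the parser are illustrative.

```python
import json
import urllib.request
from typing import Dict

# Assumed REST endpoint for text moderation; verify in the API reference.
ENDPOINT = "https://language.googleapis.com/v1/documents:moderateText"


def build_request_body(text: str) -> bytes:
    """Wrap plain text in the document envelope the API expects."""
    payload = {"document": {"type": "PLAIN_TEXT", "content": text}}
    return json.dumps(payload).encode("utf-8")


def parse_scores(response: dict) -> Dict[str, float]:
    """Map each returned safety attribute name to its confidence score."""
    return {
        category["name"]: float(category.get("confidence", 0.0))
        for category in response.get("moderationCategories", [])
    }


def moderate_text(text: str, api_key: str) -> Dict[str, float]:
    """Send one moderation request (requires network access and a valid key)."""
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=build_request_body(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return parse_scores(json.load(resp))
```

Calling `moderate_text("some user comment", api_key)` would return a dictionary of scores that the application can compare against its own thresholds. Official client libraries are also available and are usually preferable to hand-built HTTP requests.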

Visit the Natural Language AI website to give it a try and refer to the “Text Moderation” page for details. You may also try out the Text Moderation codelab.

Published by

AI Generated Robotic Content

Tags: ai/ml, faang

2 years ago
