Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. This post shows how you can mask profane words and phrases with a grawlix string (“?$#@$”).

Amazon Translate typically chooses clean words for your translation output. In some situations, however, you want to keep words that are commonly considered profane out of the translated output entirely. For example, when you translate video captions or subtitle content, or enable in-game chat, you may want the translated content to be age appropriate and free of profanity. For these cases, Amazon Translate lets you mask profane words and phrases with the profanity masking setting, which you can apply to both real-time translation and asynchronous batch processing. With profanity masking enabled, Amazon Translate replaces each profane word or phrase with the five-character sequence ?$#@$, regardless of the number of characters in the original term. Amazon Translate detects each profane word or phrase literally, not contextually.
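Because the mask is always the same five-character string, downstream code (for example, a chat moderation step) can detect masked output with a plain substring search. The following is a minimal sketch, not part of any AWS SDK. The helper names count_masked_terms and is_clean are hypothetical, and the sketch assumes the content never legitimately contains the sequence ?$#@$ on its own.

```python
# The fixed five-character mask that replaces each profane word or phrase
# when profanity masking is enabled.
GRAWLIX = "?$#@$"


def count_masked_terms(translated_text: str) -> int:
    """Return how many profane words or phrases were masked.

    Every masked term becomes exactly one GRAWLIX regardless of its length,
    so counting occurrences of the mask counts masked terms. Assumes the
    text never contains "?$#@$" for any other reason (hypothetical helper).
    """
    return translated_text.count(GRAWLIX)


def is_clean(translated_text: str) -> bool:
    """Return True if no masked terms appear in the translated text."""
    return count_masked_terms(translated_text) == 0


print(count_masked_terms("?$#@$ and ?$#@$"))  # → 2
```

Since a masked phrase collapses into a single ?$#@$, the count reflects masked words or phrases, not masked characters.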

Solution overview

To mask profane words and phrases in your translation output, enable the profanity masking setting when you run translations, through either real-time or asynchronous batch processing requests. On the Amazon Translate console, this option is under Additional settings. The following sections demonstrate profanity masking for real-time translation requests using the Amazon Translate console, the AWS Command Line Interface (AWS CLI), and the Amazon Translate SDK (Python Boto3).
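The same setting applies to asynchronous batch jobs. The following sketch builds a request for the StartTextTranslationJob operation (start_text_translation_job in Boto3) with Settings={'Profanity': 'MASK'}. The helper names, job name, S3 URIs, and IAM role ARN are hypothetical placeholders, and the sketch assumes plain-text input documents; adjust them for your own account.

```python
from typing import Any, Dict, List


def build_masked_batch_request(
    job_name: str,
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    source_lang: str,
    target_langs: List[str],
) -> Dict[str, Any]:
    """Return keyword arguments for start_text_translation_job with masking on."""
    return {
        "JobName": job_name,
        # Assumption: the input folder holds plain-text documents.
        "InputDataConfig": {"S3Uri": input_s3_uri, "ContentType": "text/plain"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        # IAM role that grants Amazon Translate access to the S3 locations.
        "DataAccessRoleArn": role_arn,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCodes": target_langs,
        # Same setting as in real-time requests: mask profane words and phrases.
        "Settings": {"Profanity": "MASK"},
    }


def start_masked_batch_job(client: Any, **request_args: Any) -> str:
    """Start the batch job with a Boto3 'translate' client and return its JobId."""
    response = client.start_text_translation_job(
        **build_masked_batch_request(**request_args)
    )
    return response["JobId"]


# Example usage (every name below is a hypothetical placeholder):
# import boto3
# job_id = start_masked_batch_job(
#     boto3.client("translate"),
#     job_name="captions-fr-en",
#     input_s3_uri="s3://amzn-s3-demo-bucket/input/",
#     output_s3_uri="s3://amzn-s3-demo-bucket/output/",
#     role_arn="arn:aws:iam::111122223333:role/TranslateBatchRole",
#     source_lang="fr",
#     target_langs=["en"],
# )
```

Keeping the request construction in a separate function makes it easy to check the settings before any job is submitted.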

Amazon Translate console

To demonstrate handling profanity with real-time translation, we use the following sample text in French to be translated into English:

Ne sois pas une garce

Complete the following steps on the Amazon Translate console:

  1. Choose French (fr) as the Source language.
  2. Choose English (en) as the Target language.
  3. Enter the preceding example text in the Source language text area.

The translated text appears under Target language. It contains a word that is considered profane in English.

  4. Expand Additional settings and enable Profanity.

The word is now replaced with the grawlix string ?$#@$.
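To reproduce this console comparison programmatically, you can send the same text twice, once without settings and once with Profanity set to MASK. This is a minimal sketch: build_translate_request and compare_masking are hypothetical helper names, and client is assumed to be a Boto3 translate client.

```python
from typing import Any, Dict, Tuple


def build_translate_request(text: str, source_lang: str, target_lang: str,
                            mask_profanity: bool) -> Dict[str, Any]:
    """Return translate_text keyword arguments that mirror the console choices."""
    request: Dict[str, Any] = {
        "Text": text,
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
    }
    if mask_profanity:
        # Equivalent to enabling Profanity under Additional settings.
        request["Settings"] = {"Profanity": "MASK"}
    return request


def compare_masking(client: Any, text: str, source_lang: str = "fr",
                    target_lang: str = "en") -> Tuple[str, str]:
    """Translate text without and with masking; return (unmasked, masked)."""
    unmasked = client.translate_text(
        **build_translate_request(text, source_lang, target_lang, False))
    masked = client.translate_text(
        **build_translate_request(text, source_lang, target_lang, True))
    return unmasked["TranslatedText"], masked["TranslatedText"]


# Example usage (requires AWS credentials):
# import boto3
# print(compare_masking(boto3.client("translate"), "Ne sois pas une garce"))
```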

AWS CLI

Calling the translate-text AWS CLI command with --settings Profanity=MASK masks profane words and phrases in your translated text.

The following AWS CLI commands are formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).

aws translate translate-text \
--text "Ne sois pas une garce" \
--source-language-code fr \
--target-language-code en \
--settings Profanity=MASK

You get a response like the following snippet:

{
    "TranslatedText": "<output text with ?$#@$>",
    "SourceLanguageCode": "fr",
    "TargetLanguageCode": "en",
    "AppliedSettings": {
        "Profanity": "MASK"
    }
}
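If you save the CLI output to a file (for example, by redirecting it to a hypothetical response.json), a short Python check can confirm that masking was actually applied before the text is published. This sketch assumes the response shape shown above; the helper names are hypothetical, and a response without an AppliedSettings field is treated as unmasked.

```python
import json


def masking_applied(cli_output: str) -> bool:
    """Return True if a translate-text JSON response reports Profanity=MASK."""
    response = json.loads(cli_output)
    # .get() treats a response without AppliedSettings as unmasked.
    return response.get("AppliedSettings", {}).get("Profanity") == "MASK"


def translated_text(cli_output: str) -> str:
    """Extract the translated string from a translate-text JSON response."""
    return json.loads(cli_output)["TranslatedText"]


# Example usage with a hypothetical file produced by redirecting CLI output:
# with open("response.json") as f:
#     raw = f.read()
# if masking_applied(raw):
#     print(translated_text(raw))
```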

Amazon Translate SDK (Python Boto3)

The following Python 3 code uses the real-time translation call with the profanity setting:

import boto3

# Create an Amazon Translate client with your default AWS credentials and Region
translate = boto3.client('translate')

SOURCE_TEXT = "<Sample Input Text>"
OUTPUT_LANG_CODE = 'en'

# Detect the source language automatically and mask profane words and phrases
result = translate.translate_text(
    Text=SOURCE_TEXT,
    SourceLanguageCode='auto',
    TargetLanguageCode=OUTPUT_LANG_CODE,
    Settings={'Profanity': 'MASK'}
)
print("Translated Text: {}".format(result['TranslatedText']))
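For the subtitle use case mentioned earlier, you might translate captions line by line. The sketch below is one possible approach, assuming client is a Boto3 translate client; translate_captions_masked is a hypothetical helper. It passes blank lines through unchanged because the TranslateText API does not accept empty text.

```python
from typing import Any, Iterable, List


def translate_captions_masked(client: Any, captions: Iterable[str],
                              target_lang: str = "en") -> List[str]:
    """Translate caption lines one at a time with profanity masking enabled.

    Blank lines are kept as-is instead of being sent to the API.
    """
    translated: List[str] = []
    for line in captions:
        if not line.strip():
            translated.append(line)
            continue
        result = client.translate_text(
            Text=line,
            SourceLanguageCode="auto",  # let Amazon Translate detect the language
            TargetLanguageCode=target_lang,
            Settings={"Profanity": "MASK"},
        )
        translated.append(result["TranslatedText"])
    return translated


# Example usage (requires AWS credentials):
# import boto3
# lines = translate_captions_masked(boto3.client("translate"), ["Bonjour", ""])
```

Translating per line keeps each caption aligned with its original timing, at the cost of one API call per non-blank line.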

Conclusion

You can use the profanity masking setting to mask words and phrases that are considered profane to keep your translated text clean and meet your business requirements. To learn more about all the ways you can customize your translations, refer to Customizing Your Translations using Amazon Translate.


About the Authors

Siva Rajamani is a Boston-based Enterprise Solutions Architect at AWS. He enjoys working closely with customers and supporting their digital transformation and AWS adoption journey. His core areas of focus are serverless, application integration, and security. Outside of work, he enjoys outdoor activities and watching documentaries.

Sudhanshu Malhotra is a Boston-based Enterprise Solutions Architect for AWS. He’s a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His core areas of focus are DevOps, machine learning, and security. When he’s not working with customers on their journey to the cloud, he enjoys reading, hiking, and exploring new cuisines.

Watson G. Srivathsan is the Sr. Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends you will find him exploring the outdoors in the Pacific Northwest.
