Producing high-quality, multilanguage subtitles for video content is a difficult business problem that is often labor intensive to solve. Good subtitles have the power to extend the reach of video content and increase understanding for all viewers, but high-quality subtitles often require teams to spend hours transcribing, subtitling, translating, reviewing, and correcting. Video content with multiple language sources and targets, or video content in specialized domains, can also require teams with expertise in multiple dimensions. Often, time and resources cannot meet the demand for the work required.

Using Content Localization on AWS, a new solution that creates subtitles for video-on-demand content, teams can use human-in-the-loop (HITL) capabilities from Amazon Transcribe and Amazon Translate to generate high-quality, multilanguage subtitles. (Amazon Transcribe automatically converts speech to text, and Amazon Translate delivers fast, high-quality, affordable, and customizable language translation.) The solution provides a drag-and-drop interface to transcribe and translate videos. It also provides an editor (as shown in the following graphic) to manually correct the timing and text of machine-generated subtitles and their translated counterparts. This editor integrates with the customization options from Amazon Transcribe and Amazon Translate so users can selectively use their edits to influence how the speech-to-text and translation services treat domain-specific words and other unique content.

Figure 1 - Content Localization on AWS user interface.


For example, imagine you work for a public health organization that wants to provide educational videos to healthcare workers as quickly as possible to support the response to emerging diseases around the world. Though generic machine transcription and translation can speed up the process, you are likely to encounter issues with newly emerging words, domain-specific terms, or errors in the contextual interpretation of the audio and the translated text. Managing the subtitling workflow across domain experts and translators for different language pairs can also be a challenge because errors found in one output might need to apply to outputs that other team members are working on.

Or imagine your technology company wants to make conference recordings available online so that viewers around the world can hear about new products in their chosen language. You want to reach those viewers with high-quality subtitles within hours of the original presentation. You are especially aware that new product names may not yet be in the training datasets for artificial intelligence (AI) services and that deep technical terminology may be misinterpreted from context alone.

In a previous blog post, we showed how Amazon Transcribe custom vocabularies can improve the accuracy of speech-to-text transcriptions for domain-specific terminology. In addition, we demonstrated how the Amazon Translate feature for Custom Terminology—which lets you customize output from Amazon Translate to use company- and domain-specific vocabulary—can adapt to recognize and translate specific terms for your content. And we showed how Active Custom Translation (ACT) can use your provided guidance to reflect style, tone, and word choices or to tailor translations for terms or phrases that are unique to a specific domain, such as life sciences, law, or finance.
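Outside the solution's web interface, the same customization primitives are available through the AWS SDK. The sketch below builds the request payloads that Amazon Transcribe's CreateVocabulary and Amazon Translate's ImportTerminology operations expect; the vocabulary name, terminology name, and sample terms are invented for illustration, not taken from the solution.

```python
# Illustrative payload builders for the Transcribe and Translate
# customization APIs. The names and terms below are hypothetical.

def build_vocabulary_request(name, language_code, phrases):
    """Payload for transcribe.create_vocabulary(**request)."""
    return {
        "VocabularyName": name,
        "LanguageCode": language_code,
        "Phrases": phrases,
    }

def build_terminology_request(name, csv_rows):
    """Payload for translate.import_terminology(**request).

    csv_rows: list of (source, target) pairs; the first row is the
    language-code header that the CSV terminology format requires.
    """
    csv_text = "\n".join(",".join(row) for row in csv_rows)
    return {
        "Name": name,
        "MergeStrategy": "OVERWRITE",
        "TerminologyData": {"File": csv_text.encode("utf-8"), "Format": "CSV"},
    }

vocab = build_vocabulary_request(
    "health-terms", "en-US", ["long-COVID", "mRNA", "nasopharyngeal"]
)
terms = build_terminology_request(
    "health-terms-es", [("en", "es"), ("booster dose", "dosis de refuerzo")]
)
```

With these payloads in hand, `boto3.client("transcribe").create_vocabulary(**vocab)` and `boto3.client("translate").import_terminology(**terms)` would register the customizations, and later `translate_text` calls could pass `TerminologyNames=["health-terms-es"]` to apply them.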

Content Localization on AWS combines these services into a cloud-based workflow for creating, editing, and customizing subtitles generated on Amazon Web Services (AWS). You can use a guided experience to automatically generate subtitles for your content, then make manual corrections in multiple languages, as shown in the following graphics. Your corrections are tracked and can be used to update Amazon Transcribe custom vocabularies and Amazon Translate custom terminologies to improve future generated subtitle results.

Figure 2 - In-app user interface for creating and modifying custom vocabularies.


Figure 3 - In-app user interface for creating and modifying custom terminologies.


Content Localization on AWS relies on the AWS Media Insights Engine, a development framework for applying machine learning services to media workflows, and can be deployed on a new or existing AWS Media Insights Engine instance. When deployed, the solution creates resources to run a serverless web application and APIs, manage web application users and authentication, and manage automated workflows for video content analysis and subtitle generation.

Figure 4 - User interface for video uploads and workflow configuration.


Content Localization on AWS provides a web interface to upload videos and configure speech-to-text and translation options for workflows, as shown in the previous graphic. You can also specify an Amazon Transcribe custom vocabulary, an Amazon Translate custom terminology, or a parallel data selection, or choose to run additional computer vision (CV) analysis using Amazon Rekognition, which offers pretrained and customizable CV capabilities to extract information and insights from your images and videos.
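A workflow configured this way maps naturally onto an Amazon Transcribe job request. The sketch below shows roughly what such a StartTranscriptionJob payload could look like; the job name, S3 URI, and vocabulary name are hypothetical, while the `Subtitles` option for SRT/VTT output is a real parameter of the Transcribe API.

```python
# Illustrative StartTranscriptionJob payload. The bucket, job name, and
# vocabulary name are made up for this example.

def build_transcription_job(job_name, media_uri, language_code, vocabulary=None):
    """Payload for transcribe.start_transcription_job(**request)."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        # Transcribe can emit subtitle files alongside the transcript.
        "Subtitles": {"Formats": ["srt", "vtt"]},
    }
    if vocabulary:
        request["Settings"] = {"VocabularyName": vocabulary}
    return request

job = build_transcription_job(
    "training-video-001",
    "s3://example-bucket/uploads/training-video.mp4",
    "en-US",
    vocabulary="health-terms",
)
```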

Figure 5 - User interface for subtitle editing and video analysis.


When the workflow is complete, the web interface displays a searchable table of all the generated assets. You can then use the web interface to interactively edit the text and timing of machine-generated subtitles from your source video, as shown in the previous graphic. Those edits are stored and applied to any assets created from the source subtitles, such as translated subtitles. When you are satisfied with the quality of the source subtitles, you can review and manually correct the translated subtitles as well. Translation edits are also saved for future downstream assets that originate from that language asset.
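The timing corrections the editor supports can be pictured as simple transformations on SubRip (SRT) cues. Here is a minimal, self-contained sketch of one such transformation; it is not the solution's actual implementation, just an illustration of shifting every timestamp in a cue by a fixed offset.

```python
import re

# Match SRT timestamps of the form HH:MM:SS,mmm.
SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_timestamp(match, offset_ms):
    """Shift one matched timestamp by offset_ms, clamping at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def shift_srt(srt_text, offset_ms):
    """Shift every timestamp in an SRT document by offset_ms."""
    return SRT_TIME.sub(lambda m: shift_timestamp(m, offset_ms), srt_text)

cue = "1\n00:00:01,200 --> 00:00:03,500\nHello, world.\n"
print(shift_srt(cue, 300))  # cue becomes 00:00:01,500 --> 00:00:03,800
```

Shifting the sample cue by 300 ms moves it to 00:00:01,500 --> 00:00:03,800 while leaving the index and text untouched.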

This HITL process allows teams to use the strengths of AI and human review to produce fast and accurate multilanguage subtitles.

When should you use this solution?

  • You want to automatically create transcriptions and subtitles in the source language for video content.
  • You want to automatically create multilanguage transcriptions and subtitles for video content.
  • You want to automatically create multilanguage audio transcripts.
  • You want to incorporate edits and corrections in your subtitles for inclusion in automatically generated results downstream. For example, if you edit source language subtitles, then you can regenerate target language translations and audio transcripts to inherit those corrections.
  • You want to use the customization options of Amazon Transcribe and Amazon Translate to improve the quality of output for your specialized content.
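The edit-propagation idea in the list above can be sketched as a corrections map replayed over source subtitles before re-translation, so downstream outputs inherit the fixes. Everything in this sketch is illustrative: the mis-transcriptions, the terminology name, and the flow are invented, not the solution's internals.

```python
# Hypothetical corrections captured while editing source-language subtitles:
# mis-transcribed terms mapped to their intended forms.
corrections = {"Cove ID": "COVID-19", "immaren A": "mRNA"}

def apply_corrections(text, corrections):
    """Replay recorded corrections onto subtitle text before re-translation."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text

source_cue = "The Cove ID vaccine uses immaren A technology."
fixed = apply_corrections(source_cue, corrections)

# `fixed` would then be re-translated so target-language subtitles inherit
# the corrections, e.g. with a call along the lines of:
# translate.translate_text(Text=fixed, SourceLanguageCode="en",
#                          TargetLanguageCode="es",
#                          TerminologyNames=["health-terms-es"])
```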

Using the new Content Localization on AWS solution, users can generate fast and accurate subtitles for videos at an affordable price. Deploy the solution using AWS Solutions Implementations—which you can use to solve common problems and build faster using the AWS platform—or clone the source code for the solution from GitHub. View the implementation guide for Content Localization on AWS.

Categories: Media