Our customers with rights ownership to stream video-on-demand (VOD) content are monetizing their libraries in one of two ways: subscription-based VOD (SVOD) or advertising-based VOD (AVOD). With SVOD, they generate revenue from monthly subscriptions, letting their subscribers access and consume their content libraries for fixed fees. This model usually means that their subscribers can consume the content without watching any ads. The other method is AVOD, whereby the service and content can be consumed by anyone, but ads are placed amid the content. The clients must play the ads, and this is how these customers’ businesses generate revenue. Using server-side ad insertion (SSAI) prevents client-side ad blockers from inhibiting the content monetization strategy.

In this post, we explore best practices for creating AVOD workflows for quickly and efficiently monetizing your existing media library. We review the most important protocols, terms, and requirements to facilitate a smooth delivery to your end customers, and we focus on monetization and workflow simplicity on Amazon Web Services (AWS). Let’s look at some of the components and standards involved in the process of curating VOD assets for ad insertion using AWS Elemental MediaConvert, a file-based video transcoding service, and AWS Elemental MediaTailor, a channel assembly and personalized ad insertion service.

Considering the options for ad signaling

The main consideration for monetizing your content is how the ad signaling will be handled. Some content might have no ad markers of any kind; some might have Society of Cable Telecommunications Engineers (SCTE)–35 markers for insertion points; and others might include an underlying “slate” during ad breaks that must be overwritten by new ads. We use a combination of SCTE-35, VAST, VMAP, and ESAM standards to address these requirements.

SCTE-35

SCTE-35 is a signaling standard for advertising and programming control. An SCTE-35 signal can be used in both live and VOD streams to signal to MediaTailor that it’s time to reach out to the ad decision server (ADS), such as SpringServe or FreeWheel, for new ads. When generating over-the-top (OTT) outputs, MediaConvert places “discontinuities” in the child manifests between the segments where SCTE-35 markers are detected. This lets MediaTailor begin ad playback on a frame-accurate boundary for a seamless viewing experience. The conditioning also effectively puts an instantaneous decoding refresh (IDR) frame at the beginning of the segment that follows the ads, so the player knows it’s okay to refresh and play the new content after the discontinuity. SCTE-35 markers can be used to signal an ad insertion or replacement.

VAST

Video Ad Serving Template (VAST) is the standard used for communication between the ad insertion application and the ADS. The VAST XML–based schema contains detailed metadata and information about the ad(s) as well as measurement information in the form of beacon URLs, so MediaTailor can send tracking data back to the ADS, effectively counting the ad impression, quartile views, or ad completion. This beaconing is all handled by default in a server-side implementation, so our customers can get started more quickly and without needing to customize their players. Client-side beaconing is also supported for customers who have additional tracking requirements. The ADS request uses standard URL parameters, which often include PublisherIDs for classifying audiences into age, gender, demographic, and other segments to facilitate ad decisioning and personalization. The URL parameters are straightforward to generate, making the interaction between MediaTailor and the ADS very simple.
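To make the beaconing described above more concrete, here is a minimal Python sketch that pulls the impression and quartile tracking URLs out of a VAST response. The sample document is a hypothetical, heavily trimmed VAST 3.0 response, and `ads.example.com`, `SAMPLE_VAST`, and `collect_beacons` are illustrative names, not part of MediaTailor or any ADS. MediaTailor does this work for you in a server-side setup; the sketch only shows where the beacon URLs live in the XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed VAST 3.0 response. All URLs are placeholders.
SAMPLE_VAST = """<VAST version="3.0">
  <Ad id="ad-1">
    <InLine>
      <AdSystem>ExampleADS</AdSystem>
      <AdTitle>Example ad</AdTitle>
      <Impression><![CDATA[https://ads.example.com/impression?ad=1]]></Impression>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:15</Duration>
            <TrackingEvents>
              <Tracking event="start"><![CDATA[https://ads.example.com/start?ad=1]]></Tracking>
              <Tracking event="firstQuartile"><![CDATA[https://ads.example.com/q1?ad=1]]></Tracking>
              <Tracking event="midpoint"><![CDATA[https://ads.example.com/mid?ad=1]]></Tracking>
              <Tracking event="thirdQuartile"><![CDATA[https://ads.example.com/q3?ad=1]]></Tracking>
              <Tracking event="complete"><![CDATA[https://ads.example.com/complete?ad=1]]></Tracking>
            </TrackingEvents>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>"""


def collect_beacons(vast_xml):
    """Return a dict that maps each tracking event name to its beacon URLs."""
    root = ET.fromstring(vast_xml)
    beacons = {}
    # Impression beacons are fired when the ad starts rendering.
    for impression in root.iter("Impression"):
        beacons.setdefault("impression", []).append((impression.text or "").strip())
    # Tracking elements carry the quartile and completion beacons.
    for tracking in root.iter("Tracking"):
        event = tracking.get("event")
        beacons.setdefault(event, []).append((tracking.text or "").strip())
    return beacons


if __name__ == "__main__":
    for event, urls in collect_beacons(SAMPLE_VAST).items():
        print(event, urls)
```

A real VAST response usually carries more elements (media files, click tracking, wrappers), so treat this as a reading aid, not a complete parser.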

This diagram shows the origin manifest, which includes the CUE-OUT, which MediaTailor uses to request new ads from the ADS for each viewer.

VMAP

Video Multiple Ad Playlist (VMAP) is the standard used in conjunction with VAST to provide more complex instructions to the ad insertion system. Whereas VAST will return ad information for a single break, a response constructed with VMAP lets the ADS specify the layout of multiple ads in multiple ad breaks by returning multiple VAST URLs in the single VMAP response. Each VAST URL is responsible for returning one ad pod for one break. This is useful when planning out the entire ad break structure for a specific media asset.
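The relationship between one VMAP response and its per-break VAST URLs can be sketched in Python as follows. The sample VMAP document, the `ads.example.com` URLs, and the `list_ad_breaks` helper are hypothetical and exist only for illustration. The element and attribute names follow the IAB VMAP 1.0 schema as I understand it.

```python
import xml.etree.ElementTree as ET

VMAP_NS = {"vmap": "http://www.iab.net/videosuite/vmap"}

# Hypothetical VMAP response with preroll, midroll, and postroll breaks.
SAMPLE_VMAP = """<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap" version="1.0">
  <vmap:AdBreak timeOffset="start" breakType="linear" breakId="preroll">
    <vmap:AdSource id="preroll-pod" allowMultipleAds="true" followRedirects="true">
      <vmap:AdTagURI templateType="vast3"><![CDATA[https://ads.example.com/vast?pod=preroll]]></vmap:AdTagURI>
    </vmap:AdSource>
  </vmap:AdBreak>
  <vmap:AdBreak timeOffset="00:05:00.000" breakType="linear" breakId="midroll">
    <vmap:AdSource id="midroll-pod" allowMultipleAds="true" followRedirects="true">
      <vmap:AdTagURI templateType="vast3"><![CDATA[https://ads.example.com/vast?pod=midroll]]></vmap:AdTagURI>
    </vmap:AdSource>
  </vmap:AdBreak>
  <vmap:AdBreak timeOffset="end" breakType="linear" breakId="postroll">
    <vmap:AdSource id="postroll-pod" allowMultipleAds="true" followRedirects="true">
      <vmap:AdTagURI templateType="vast3"><![CDATA[https://ads.example.com/vast?pod=postroll]]></vmap:AdTagURI>
    </vmap:AdSource>
  </vmap:AdBreak>
</vmap:VMAP>"""


def list_ad_breaks(vmap_xml):
    """Return (break_id, time_offset, vast_url) for each ad break in the VMAP."""
    root = ET.fromstring(vmap_xml)
    breaks = []
    for ad_break in root.findall("vmap:AdBreak", VMAP_NS):
        tag = ad_break.find("vmap:AdSource/vmap:AdTagURI", VMAP_NS)
        vast_url = tag.text.strip() if tag is not None and tag.text else None
        breaks.append((ad_break.get("breakId"), ad_break.get("timeOffset"), vast_url))
    return breaks


if __name__ == "__main__":
    for break_id, offset, url in list_ad_breaks(SAMPLE_VMAP):
        print(break_id, offset, url)
```

Each tuple corresponds to one ad break and one VAST URL, which matches the "one ad pod for one break" structure described above.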

This diagram shows MediaTailor using VMAP to insert preroll, midroll, and postroll ads.

ESAM

To achieve the frame-accurate insertions, MediaConvert uses Event Signaling and Management (ESAM) XML, which can call out the exact timing of the requested breaks and create the needed manifest decoration for use downstream by MediaTailor, whether the outputs are Dynamic Adaptive Streaming over HTTP (DASH) or HTTP Live Streaming (HLS). Below are two ESAM XML examples that you can include in the “Signal processing notification XML” section of your MediaConvert job.

  • This XML inserts two ad breaks, one at 10 seconds and one at 5 minutes.
  • This XML inserts a signal at 10 seconds to overwrite 30 seconds of content.

When you use these XMLs to create the signals, have a look at the differences in the origin manifests.

This HLS manifest shows the CUE-OUT and CUE-IN that will signal to MediaTailor to insert new ads and expand the total duration of the asset.

This HLS manifest shows the CUE-OUT and CUE-IN that will signal to MediaTailor to insert ads and overwrite segments 00006–00020 with new ad content, keeping the asset duration unchanged.
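The difference between the two HLS cases can be illustrated with a small Python sketch that scans a media playlist for CUE-OUT and CUE-IN tags. The playlist below is a simplified, hypothetical example (segment names and durations are made up, and it is not actual MediaConvert output), and `find_ad_breaks` is an illustrative helper. A break with no segments between CUE-OUT and CUE-IN extends the asset; a break that wraps segments replaces them.

```python
# Hypothetical, simplified HLS media playlist with one insertion break
# (CUE-OUT:0) and one replacement break that wraps two segments.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.0,
seg_00001.ts
#EXTINF:4.0,
seg_00002.ts
#EXT-X-CUE-OUT:0
#EXT-X-CUE-IN
#EXTINF:6.0,
seg_00003.ts
#EXT-X-CUE-OUT:12.000
#EXTINF:6.0,
seg_00004.ts
#EXTINF:6.0,
seg_00005.ts
#EXT-X-CUE-IN
#EXTINF:6.0,
seg_00006.ts
#EXT-X-ENDLIST
"""


def find_ad_breaks(playlist):
    """Return one dict per CUE-OUT/CUE-IN pair found in the playlist."""
    elapsed = 0.0      # content time accumulated so far, in seconds
    current = None     # the break that is currently open, if any
    breaks = []
    for raw in playlist.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-CUE-OUT-CONT"):
            continue  # continuation tags do not open a new break
        if line.startswith("#EXT-X-CUE-OUT"):
            _, _, value = line.partition(":")
            # Accept "30", "30.000", or "DURATION=30" style values.
            value = value.split(",")[0].split("=")[-1]
            current = {
                "start": elapsed,
                "signaled_duration": float(value or 0),
                "segments_replaced": 0,
            }
        elif line.startswith("#EXT-X-CUE-IN"):
            if current is not None:
                breaks.append(current)
                current = None
        elif line.startswith("#EXTINF:"):
            duration = float(line[len("#EXTINF:"):].split(",")[0])
            if current is not None:
                current["segments_replaced"] += 1
            elapsed += duration
    return breaks


if __name__ == "__main__":
    for ad_break in find_ad_breaks(SAMPLE_PLAYLIST):
        mode = "replace" if ad_break["segments_replaced"] else "insert"
        print(mode, ad_break)
```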

A Moving Picture Experts Group (MPEG)–DASH manifest shows where the ad signaling is presented. Unlike with HLS, the DASH manifest format is XML-based, and ad signaling is referenced inside its own <EventStream> element. The presentation time specified in the <Event> element dictates where the ad break should be inserted in the media segment timeline. The duration specified in the <Event> element dictates whether MediaTailor will insert ads to increase the overall length of the asset (duration = 0) or to replace underlying or existing ads (duration > 0).

This MPEG-DASH manifest shows the Event elements that will signal MediaTailor to insert ads and overwrite segments between presentation times 107331360 and 110211360 with new ad content, keeping the asset duration unchanged.
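As a rough illustration of the duration rule described above, the following Python sketch reads the Event elements from a DASH manifest and labels each one as an insertion (duration = 0) or a replacement (duration > 0). The sample MPD is hypothetical and stripped down: it reuses the two presentation times quoted above, but the timescale of 90000 is an assumption, and real Event elements would also carry an SCTE-35 payload that is omitted here.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, trimmed MPD. The timescale (90000) is an assumption, and the
# SCTE-35 payload that a real Event carries is left out for brevity.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="1" start="PT0S">
    <EventStream schemeIdUri="urn:scte:scte35:2013:xml" timescale="90000">
      <Event presentationTime="107331360" duration="2880000"/>
      <Event presentationTime="162000000" duration="0"/>
    </EventStream>
    <AdaptationSet mimeType="video/mp4"/>
  </Period>
</MPD>"""


def classify_events(mpd_xml):
    """Return start, duration, and ad mode for each Event in the manifest."""
    root = ET.fromstring(mpd_xml)
    results = []
    for stream in root.iterfind(".//mpd:EventStream", MPD_NS):
        timescale = int(stream.get("timescale", "1"))
        for event in stream.findall("mpd:Event", MPD_NS):
            start = int(event.get("presentationTime", "0"))
            duration = int(event.get("duration", "0"))
            results.append({
                "start_seconds": start / timescale,
                "duration_seconds": duration / timescale,
                # duration > 0 replaces existing content; 0 extends the asset.
                "mode": "replace" if duration > 0 else "insert",
            })
    return results


if __name__ == "__main__":
    for event in classify_events(SAMPLE_MPD):
        print(event)
```

Under the assumed timescale, the span from 107331360 to 110211360 is 2,880,000 ticks, or 32 seconds of replaced content.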

If there are no SCTE-35 signals in your VOD content, MediaTailor relies on the ADS to provide a VMAP response that dictates the ad placement and duration in the content. For best results when using a VMAP response to insert ads, you need to first identify where ads can legitimately be placed in your VOD assets. You can do this by manually scrubbing through each piece of media and noting timestamps of scene changes, or you can do this by using artificial intelligence with a service like Amazon Rekognition, which offers pretrained and customizable computer vision capabilities to extract information and insights from your images and videos. This blog post is an excellent reference to begin using machine learning to locate the ideal ad breaks.

After the ad break positions have been determined, you need to transcode the media using an ESAM workflow similar to the two scenarios that are mentioned above, which will help you make sure that the ad insertion system can stitch in ads at the correct placement. If you do not prepare your content to be segmented at the right positions for ad insertion, your client experience will be degraded because MediaTailor does not split segments, retranscode the media, or add IDR frames. Instead, it inserts the ad break at the nearest segment boundary.
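The cost of skipping this conditioning step can be shown with a little arithmetic. The Python sketch below is an illustration only: `nearest_boundary` is a hypothetical helper, the segment durations are made up, and the "nearest boundary" behavior is taken from the description above, not from MediaTailor internals.

```python
from bisect import bisect_left


def nearest_boundary(break_time, segment_durations):
    """Return the segment boundary (in seconds) closest to break_time."""
    # Build the list of boundaries: 0, d1, d1 + d2, ...
    boundaries = [0.0]
    for duration in segment_durations:
        boundaries.append(boundaries[-1] + duration)
    # Only the boundaries on either side of break_time can be the nearest.
    index = bisect_left(boundaries, break_time)
    candidates = boundaries[max(index - 1, 0):index + 1]
    return min(candidates, key=lambda boundary: abs(boundary - break_time))


if __name__ == "__main__":
    requested = 10.0  # desired ad break position, in seconds

    # Unconditioned content: uniform 6-second segments.
    uniform = [6.0, 6.0, 6.0, 6.0]
    # Conditioned content: the encoder cut a segment short at 10 seconds.
    conditioned = [6.0, 4.0, 6.0, 6.0]

    for name, durations in (("uniform", uniform), ("conditioned", conditioned)):
        actual = nearest_boundary(requested, durations)
        print(name, "break lands at", actual, "drift", abs(actual - requested))
```

With uniform 6-second segments, a break requested at 10 seconds lands on the 12-second boundary, so the ad starts 2 seconds into the next scene. With a segment boundary conditioned at 10 seconds, the drift is zero.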

Preparing media for ad insertion

Now that we’ve reviewed the various standards and protocols needed for ad insertion workflows, let’s review the steps to prepare your media. First off, you’ll store your library in Amazon Simple Storage Service (Amazon S3), which offers industry-leading scalability, data availability, security, and performance. But the Amazon S3 storage class you choose is critical; Amazon S3 storage classes are purpose built to provide the lowest-cost storage for different access patterns. For your large master files, as soon as they have run through MediaConvert and passed quality control, move them immediately to Amazon S3 Glacier Flexible Retrieval, which delivers flexible retrieval options that balance cost with access times ranging from minutes to hours, or Amazon S3 Glacier Deep Archive, which supports long-term retention and digital preservation for data. For the OTT variations that your viewers watch, you can often use Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering), which automatically moves data to the most cost-effective access tier based on access frequency. This helps you save on storage costs by managing the tier transitions for you.

Inside MediaConvert, you need to decide your output types: HLS, DASH, or Common Media Application Format (CMAF). If you are injecting the SCTE-35 markers using the ESAM workflow, be sure to activate the passthrough/insert for each format, detailed in the following screenshot.

DASH container settings

This image shows the DASH ESAM SCTE-35 set to “Insert.”

Ad insertion systems like MediaTailor require formatting the DASH manifest with a segment timeline for each media rendition. By default, this setting is not activated in MediaConvert.

This image shows the “Write segment timeline in representation” set to “Enabled.”

Please review this GitHub repository, which provides instructions on how to condition the manifest appropriately based on the MediaConvert job’s ESAM Signal Confirmation and Conditioning (SCC) XML.

MediaConvert outputs MPEG-DASH with SCTE-35 signaling in the headers of the MP4 segments, but it does not write the ad signaling into the manifest. Therefore, for MediaTailor to recognize the break points and issue a VAST request per break, a post-transcode Lambda function is required so that you can add the EventStream and Event elements at the manifest’s corresponding period.
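The core of such a post-transcode step might look like the following Python sketch, which adds an EventStream with one Event per break to the Period of a DASH manifest. Everything here is an assumption made for illustration: the `add_event_stream` helper, the sample MPD, the timescale of 90000, and the break list are all hypothetical. The sketch also leaves out the SCTE-35 payload that each Event needs to carry, as well as the Amazon S3 read and write logic that a real Lambda function would need.

```python
import xml.etree.ElementTree as ET

MPD_URI = "urn:mpeg:dash:schema:mpd:2011"
# Serialize the MPD namespace as the default namespace (no prefix).
ET.register_namespace("", MPD_URI)

# Hypothetical, trimmed single-period manifest.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="1" start="PT0S">
    <AdaptationSet mimeType="video/mp4"/>
  </Period>
</MPD>"""


def add_event_stream(mpd_xml, breaks, timescale=90000):
    """Insert an EventStream into the first Period.

    breaks is a list of (start_seconds, duration_seconds) tuples.
    NOTE: the SCTE-35 payload of each Event is intentionally omitted.
    """
    root = ET.fromstring(mpd_xml)
    period = root.find(f"{{{MPD_URI}}}Period")
    if period is None:
        raise ValueError("manifest has no Period element")

    stream = ET.Element(
        f"{{{MPD_URI}}}EventStream",
        {"schemeIdUri": "urn:scte:scte35:2013:xml", "timescale": str(timescale)},
    )
    for start, duration in breaks:
        ET.SubElement(
            stream,
            f"{{{MPD_URI}}}Event",
            {
                "presentationTime": str(round(start * timescale)),
                "duration": str(round(duration * timescale)),
            },
        )

    # The DASH schema orders EventStream before AdaptationSet in a Period.
    children = list(period)
    adaptation_tag = f"{{{MPD_URI}}}AdaptationSet"
    index = next(
        (i for i, child in enumerate(children) if child.tag == adaptation_tag),
        len(children),
    )
    period.insert(index, stream)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One insertion break at 10 seconds, one 30-second replacement at 5 minutes.
    print(add_event_stream(SAMPLE_MPD, [(10.0, 0.0), (300.0, 30.0)]))
```

The break positions would normally come from the same ESAM XML that drove the transcode, so that the manifest events line up with the conditioned segment boundaries.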

HLS transport stream (TS) container settings

This image shows the SCTE-35 source HLS TS container settings set to “Passthrough.”

HLS group settings

This image shows the HLS group settings with ad markers selected.

CMAF container settings

This image shows the CMAF ESAM SCTE-35 set to “Insert.”

NOTE: If you are using CMAF, then a custom transcode profile must be created for MediaTailor to prepare your ads in the correct format.
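For readers who create jobs through the API instead of the console, the settings shown in the images above might map to a job settings fragment like the Python dictionary below. This is a sketch, not a complete job: the field names reflect my understanding of the MediaConvert job settings JSON and should be checked against the current API reference, the `ELEMENTAL` ad marker choice and the 6-second segment length are assumptions, and `ESAM_SCC_XML` is a placeholder for your own signal processing notification XML.

```python
import json

# Placeholder: paste your own ESAM signal processing notification XML here.
ESAM_SCC_XML = "<!-- your Signal processing notification XML -->"

# Partial job settings; inputs, codecs, and destinations are omitted.
# Field names are assumptions to verify against the MediaConvert API reference.
job_settings_fragment = {
    "Esam": {"SignalProcessingNotification": {"SccXml": ESAM_SCC_XML}},
    "OutputGroups": [
        {
            "Name": "DASH ISO",
            "OutputGroupSettings": {
                "Type": "DASH_ISO_GROUP_SETTINGS",
                "DashIsoGroupSettings": {
                    "SegmentLength": 6,
                    # "Write segment timeline in representation" = Enabled
                    "WriteSegmentTimelineInRepresentation": "ENABLED",
                },
            },
            "Outputs": [
                {"ContainerSettings": {
                    "Container": "MPD",
                    "MpdSettings": {"Scte35Esam": "INSERT"},
                }}
            ],
        },
        {
            "Name": "Apple HLS",
            "OutputGroupSettings": {
                "Type": "HLS_GROUP_SETTINGS",
                "HlsGroupSettings": {
                    "SegmentLength": 6,
                    # Ad marker style is an assumption; choose what you need.
                    "AdMarkers": ["ELEMENTAL"],
                },
            },
            "Outputs": [
                {"ContainerSettings": {
                    "Container": "M3U8",
                    "M3u8Settings": {"Scte35Source": "PASSTHROUGH"},
                }}
            ],
        },
        {
            "Name": "CMAF",
            "OutputGroupSettings": {
                "Type": "CMAF_GROUP_SETTINGS",
                "CmafGroupSettings": {"SegmentLength": 6},
            },
            "Outputs": [
                {"ContainerSettings": {
                    "Container": "CMFC",
                    "CmfcSettings": {"Scte35Esam": "INSERT"},
                }}
            ],
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(job_settings_fragment, indent=2))
```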

Timing the sidecar captions

For MediaTailor to seamlessly stitch ads into the content, the segment length of the various media types must be consistent: for example, 6 seconds per segment. This also applies to subtitle tracks such as Web Video Text Tracks (WebVTT) in HLS or Timed Text Markup Language (TTML) in MPEG-DASH.

HLS TS sidecar caption settings

MediaConvert currently outputs HLS WebVTT segments at 300-second intervals. Although this is supported in the HLS authoring spec (5.7), it’s not ideal for ad insertion because the video and audio segments are usually set to 6 seconds, which is the recommendation from Apple. Please review this GitHub repository, which explains this in more detail and provides a solution to remedy the HLS output.

This image shows the settings for WebVTT in HLS.

MPEG-DASH sidecar caption settings

In MPEG-DASH, you need to select “TTML” in a fragmented MPEG-4 (fMP4) encapsulated sidecar caption format. This will signal MediaConvert to create a segment timeline for the subtitle segments in the DASH manifest.

This image shows the TTML caption settings for DASH.

Fine-tuning your workflow

The parameters that you pass from your player to your ADS can determine which ads are returned for a user. MediaTailor refers to these as “player params.” These could include information about the viewer, such as their username or location, which can help you target ads. One common method to help show relevant ads is to use the #EXT-X-ASSET tag in your origin manifest so that the ADS knows what content is playing. This tag can be added after transcoding by using a function of AWS Lambda (a serverless, event-driven compute service) to reference the asset information inside your media asset management (MAM) system. Additionally, managed private UPID data can be injected into the SCTE-35 markers and extracted by MediaTailor to be sent to the ADS. This private UPID data might include the name of the show, season, or episode, which is helpful for sponsorship ad targeting, or the genre or age rating, which is useful for contextual ad targeting and for keeping adult advertising out of children’s content. Showing more relevant ads to your viewers can improve overall engagement ratings and increase monetization.
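As a small illustration of player params, the Python sketch below appends them to a playback URL as query parameters with an `ads.` prefix, which is the convention MediaTailor uses for passing player parameters on the playback request as I understand it. The hostname, path, parameter names, and the `with_player_params` helper are all hypothetical; check the MediaTailor documentation for the exact URL format and for how the parameters are referenced in your ADS URL template.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_player_params(playback_url, params):
    """Append player params to a playback URL, each with an 'ads.' prefix."""
    parts = urlsplit(playback_url)
    # Keep any query parameters that are already on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [(f"ads.{name}", str(value)) for name, value in params.items()]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical playback URL and parameter names.
    url = with_player_params(
        "https://mediatailor.example.com/v1/master/CONFIG/asset/index.m3u8",
        {"assetId": "show-s01e02", "deviceType": "ctv"},
    )
    print(url)
```

An identifier such as the hypothetical `assetId` above is the kind of value an ADS could use to look up the ad map for a specific VOD asset.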

When dealing with VAST requests, some customers rely on #EXT-X-CUE-OUT:0 markers, which don’t specify an ad break duration. In this case, MediaTailor will make a 5-minute ad pod request to the ADS. You can adjust the maximum ad pod duration that is returned on the ADS side. With VMAP, ad break duration is not a parameter that gets populated when the VAST request is made to the ADS. Instead, the ADS will rely on a parameter such as “AssetID” to be passed using a player parameter containing the identifier of the VOD asset. The ADS then looks up the ad map for the VOD asset and returns a VMAP response to MediaTailor with ad break positions and times.

Taking the next steps

In this post, we walked through different configurations and workflows to help you jump-start the monetization of your AVOD content library. The next building blocks to explore are encryption and just-in-time packaging with AWS Elemental MediaPackage, which reliably prepares and protects your video for delivery over the internet. Using MediaPackage, you can lower your transcoding and storage requirements by using your HLS assets and creating DASH or CMAF outputs in near real time. And by using MediaPackage and MediaConvert, you can also bring in a third-party digital rights management provider for more advanced encryption requirements. When these building blocks are linked together, you can build a highly resilient, scalable, and cost-effective AVOD offering for your customers.