This is the second in a series of blog posts about contextualized viewer engagement and monetization for live over-the-top (OTT) events. This post describes in technical terms how to use Amazon Web Services (AWS) to drive contextualization and deeper viewer engagement for live event streams. Click here to read the first post in the series.

This blog post provides a high-level solution architecture and subsequent deep dive into individual solution components. The solution is illustrated with sample source code to speed business adoption and reduce implementation time.

Technical architecture

The following architecture diagram outlines the solution components and how they work together to deliver business functionality. The core components are:

  • Live stream ingest mechanism built on AWS Elemental MediaLive (a broadcast-grade live video processing service), AWS Elemental MediaPackage (which prepares and protects your video for delivery over the Internet), and Amazon CloudFront (a content delivery network, or CDN).
  • A mechanism based on Amazon CloudWatch (a monitoring and observability service) and AWS Lambda (a serverless, event-driven compute service) to initiate the workflow according to the calendar timing of live events.
  • A celebrity and label detector mechanism within a dedicated Amazon Virtual Private Cloud (Amazon VPC), a logically isolated virtual network, to analyze live video streams and to detect celebrities or objects.
  • A viewer personalization engine within a dedicated Amazon VPC to deliver a personalized engagement or monetization instrument to viewers, based on their viewing context and other influencing factors.

Note that, for the sake of simplicity and focus, the previous architecture diagram does not include elements of architectural security, audit, or instrumentation. It is imperative that real-world implementations comply with AWS architectural best practices for a secure, resilient, and performant solution.

Solution components

The following sequence diagram summarizes the overall workflow, followed by details of each solution component.

1. Live stream publisher: The live stream is processed through MediaLive and MediaPackage before being pushed to a CloudFront CDN. In this example, the live stream is considered to be in HTTP live streaming (HLS) format. The manifest file for a typical HLS file (.M3U8) looks like the following:


The transport stream (.ts) files in the previous snippet represent the containers for video content, which are incrementally added to the manifest file as the live stream event progresses.
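
To illustrate the manifest structure described above, the following is a minimal Python sketch that parses an HLS media playlist into (duration, segment) pairs. The playlist text is hypothetical: the segment names, durations, and `?m=` query values are illustrative assumptions, not output from a real channel, and `parse_segments` is a helper name introduced here.

```python
# Hypothetical media playlist; segment names, durations, and the ?m= values
# are illustrative assumptions, not output from a real channel.
SAMPLE_MANIFEST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:4
#EXTINF:10.0,
index_1_4.ts?m=1566416212
#EXTINF:10.0,
index_1_5.ts?m=1566416212
#EXTINF:10.0,
index_1_6.ts?m=1566416212
"""


def parse_segments(manifest_text):
    """Return (duration_seconds, segment_uri) pairs in playlist order."""
    segments = []
    duration = None
    for raw_line in manifest_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#EXTINF:"):
            # '#EXTINF:<duration>,[<title>]' -> keep only the duration
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            # The first non-tag line after #EXTINF is the segment URI
            segments.append((duration, line))
            duration = None
    return segments


if __name__ == "__main__":
    for seconds, uri in parse_segments(SAMPLE_MANIFEST):
        print(seconds, uri)
```

As a live event progresses, new `#EXTINF` and `.ts` line pairs are appended to the playlist, so calling `parse_segments` on each refreshed copy returns the newest segments at the end of the list.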

2. Incremental segment file detector: New .ts file segments are detected as soon as they are added to the HLS manifest. These new segments contain the video that the media player will play next, depending on its current position and buffer. The latest collection of such .ts file segments is continuously captured and posted to an Amazon Simple Notification Service (Amazon SNS) topic for content analysis of its video frames. The following code snippet is one approach to continuously and incrementally detect such video segment URLs and post them to an Amazon SNS topic.

The Amazon SNS topic referred to in the following code snippet is configured to initiate an AWS Lambda function for further analysis of the video segments.

import time

import boto3
import requests

# declarations
base_url = '<base url of cloudfront without resource object>'
full_url = '<full url of cloudfront with resource file>'
sns_topic = ''  # ARN of the target Amazon SNS topic
play = True
sleep_time = 3  # seconds to wait between manifest polls

# collection of unique urls
unique_url_set_ts = set()

# handler for sns service
sns = boto3.client('sns')

# run in loop for duration of live event
while play:
    # fetch the latest copy of the manifest
    response = requests.get(full_url)
    if response.status_code == 200:
        lines = response.text.split('\n')
        for index, line in enumerate(lines):
            if line.startswith('#EXTINF') and index + 1 < len(lines):
                # segment duration, for example '#EXTINF:10.0,'
                sleep_time = float(line.split(':')[1].split(',')[0])
                # the next line will have the ts segment
                ts_url = lines[index + 1].strip()
                if '.ts?m=' in ts_url and ts_url not in unique_url_set_ts:
                    # add to set to avoid reprocessing the same segment
                    unique_url_set_ts.add(ts_url)
                    segment_url = base_url + ts_url
                    # post to sns topic for downstream processing
                    sns.publish(TopicArn=sns_topic, Message=segment_url)
    # wait about one segment duration before polling again
    time.sleep(sleep_time)

3. Video frame extractor: This is an AWS Lambda function initiated by an Amazon SNS topic. It extracts video frames, or images, from the latest .ts file at a preconfigured time interval. FFmpeg is used for frame extraction and is deployed as a layer within AWS Lambda. The extracted frames are uploaded to a bucket in Amazon Simple Storage Service (Amazon S3), an object storage service, for machine learning (ML)–based content analysis. A sample code snippet follows:

import os
import subprocess

import boto3

def lambda_handler(event, context):
    # the SNS message body carries the URL of the latest .ts segment
    ts_url = event['Records'][0]['Sns']['Message']
    s3 = boto3.client('s3')
    # fps=1/4 extracts one frame every 4 seconds; files are named by frame time stamp
    cmd = ['/opt/ffmpeg-lib/ffmpeg', '-i', ts_url, '-copyts', '-vf', 'fps=1/4',
           '-f', 'image2', '-frame_pts', 'true', '/tmp/%d.jpg']
    try:
        # pass arguments as a list so the URL is never interpreted by a shell
        subprocess.run(cmd, check=True)
        # loop through ffmpeg created files
        directory = '/tmp'
        for filename in os.listdir(directory):
            if filename.endswith('.jpg'):
                fully_qualified_file_name = os.path.join(directory, filename)
                # upload file to S3 for further processing
                s3.upload_file(fully_qualified_file_name, '<target-bucket-name>', filename)
    except Exception as e:
        print('an exception has occurred ' + str(e))
        return False
    return True

4. Celebrity detector using Amazon Rekognition: An AWS Lambda function is initiated by an Amazon S3 event for every video frame uploaded. Each frame is analyzed for known celebrity faces using Amazon Rekognition, which automates image and video analysis with ML. This is important because a viewer may be interested in more engagement through trivia, a poll, or a quiz centered around a celebrity they just saw on screen. Similarly, a viewer may click on an advertisement for a brand that a celebrity endorses. A sample code snippet follows:

import boto3

def lambda_handler(event, context):
    rek = boto3.client('rekognition')
    bucket = '<test_bucket>'
    frame = '<image_file>'
    try:
        # look for known celebrity faces in the uploaded frame
        response = rek.recognize_celebrities(
            Image={'S3Object': {'Bucket': bucket, 'Name': frame}}
        )
    except Exception as e:
        print('an exception has occurred ' + str(e))
        # re-raise so a failed call never reaches the return statement
        raise
    return response

5. Object detector using Amazon Rekognition: An AWS Lambda function is initiated by an Amazon S3 event for every video frame uploaded. Each frame is analyzed using Amazon Rekognition for known objects or labels—for example, a sofa in a scene. A viewer may be interested in more engagement through advertisements or digital commerce centered around the same object(s) they just saw on screen. A sample code snippet follows:

import boto3

def lambda_handler(event, context):
    rek = boto3.client('rekognition')
    bucket = '<bucket_name>'
    frame = '<image_file>'
    try:
        # return at most 5 labels detected with at least 90% confidence
        response = rek.detect_labels(
            Image={'S3Object': {'Bucket': bucket, 'Name': frame}},
            MaxLabels=5,
            MinConfidence=90
        )
    except Exception as e:
        print('an exception has occurred ' + str(e))
        raise e
    return response
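
The two detector responses can be reduced to the names that matter for engagement before they are stored. The following is a minimal sketch, assuming the documented response shapes of `recognize_celebrities` (a `CelebrityFaces` list with `Name` and `MatchConfidence`) and `detect_labels` (a `Labels` list with `Name` and `Confidence`). The helper name `summarize_frame`, the output keys, and the sample values are hypothetical and used only for illustration.

```python
def summarize_frame(celebrity_response, label_response, min_confidence=90.0):
    """Reduce two Rekognition responses to sorted, de-duplicated name lists."""
    celebrities = sorted({
        face["Name"]
        for face in celebrity_response.get("CelebrityFaces", [])
        if face.get("MatchConfidence", 0.0) >= min_confidence
    })
    objects = sorted({
        label["Name"]
        for label in label_response.get("Labels", [])
        if label.get("Confidence", 0.0) >= min_confidence
    })
    return {"celebrities": celebrities, "objects": objects}


if __name__ == "__main__":
    # Trimmed, made-up responses that only keep the fields used above
    sample_celebrities = {
        "CelebrityFaces": [
            {"Name": "Example Person", "MatchConfidence": 98.0},
            {"Name": "Another Person", "MatchConfidence": 55.0},
        ]
    }
    sample_labels = {
        "Labels": [
            {"Name": "Couch", "Confidence": 97.1},
            {"Name": "Lamp", "Confidence": 62.4},
        ]
    }
    print(summarize_frame(sample_celebrities, sample_labels))
```

Filtering on confidence here is a design choice rather than a requirement: a lower threshold surfaces more candidates for engagement at the cost of more false matches.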

6. Stream time calculator: It is imperative to correctly map the celebrity and object label to the time of the streaming event where it was detected. For example, if a sports event starts at 2:00 p.m. GMT, the time of the streaming event should be calculated as 120 seconds into the event at 2:02 p.m. GMT, assuming near-zero latency or buffering. FFmpeg is used to determine the beginning time position of a video stream in each .ts file. A sample code snippet follows:

import os
import subprocess as sp

# ts url to be analyzed
ts_url = 'https://<cdn-url>/segment_1.ts'

# ffmpeg command to get the time stamp
ffmpeg_cmd = "(ffmpeg -hide_banner -i " + ts_url + " -an -vf showinfo 2> ffmpeg_tmp.log; cat ffmpeg_tmp.log) | grep 'Duration' | awk '{print $4}'"

# command to delete temporary log file
file_del_cmd = "rm -f ffmpeg_tmp.log"

# execute command to get start time, for example '40.000000,'
start_time = sp.getoutput(ffmpeg_cmd)

# drop the trailing comma and truncate to whole seconds
start_time_int_parsed = int(float(start_time[:-1]))
print(start_time_int_parsed)

# remove the temporary file
os.system(file_del_cmd)

The variable start_time_int_parsed captures the time position of the first image frame within a given .ts file. Each .ts file holds a finite length of video, given by the #EXTINF tag of the HLS manifest file. In this example, each video chunk is 10 seconds long. Moreover, assuming the solution is designed to extract one frame per second, an offset must be added for each frame within a .ts segment, based on the sequence position of the frame. A sample calculation is provided as follows:

• For a given segment start_time_int_parsed = 40 [40th second into the event]
• Total number of seconds in the video = 10 seconds
• Number of frames extracted per second = 1
• Total number of frames from the segment = 10 × 1 = 10
• Therefore, the first frame represents the 40th second; the second, the 41st second; and the seventh, the 46th second of the video
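
The same arithmetic can be written as a short Python sketch. The function names and the event date are hypothetical; the only assumptions are that frames are extracted at a fixed rate and that the frame index is zero-based, so the first frame of a segment sits exactly at the segment start.

```python
from datetime import datetime, timedelta, timezone


def frame_stream_second(segment_start, frame_index, frames_per_second=1.0):
    """Seconds into the event for a frame; frame_index is zero-based."""
    return segment_start + frame_index / frames_per_second


def frame_wall_clock(event_start, stream_second):
    """Wall-clock time of a stream position, assuming near-zero latency."""
    return event_start + timedelta(seconds=stream_second)


if __name__ == "__main__":
    # Segment starting 40 seconds into the event, one frame per second
    for position in (1, 2, 7):
        second = frame_stream_second(40, position - 1)
        print(f"frame {position} -> second {second:g}")

    # Hypothetical event that starts at 2:00 p.m. GMT
    event_start = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)
    print(frame_wall_clock(event_start, 120).isoformat())
```

With one frame per second, the offset of a frame is its sequence position minus one. At other extraction rates the offset scales by the interval between frames. For example, one frame every 4 seconds corresponds to `frames_per_second=0.25`.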

7. Video frame information recorder using Amazon DynamoDB: When a video frame has been analyzed for celebrities and object labels, the result is persisted to Amazon DynamoDB, a fast, flexible NoSQL database service, with the media identifier as the partition key, video position as the sort key, and other attributes, such as time stamp, celebrity list, object list, and confidence level. A sample code snippet follows:

import boto3

# media_id, time_position, and celebrity_info_json are set by the calling code
try:
    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
    table = dynamodb.Table('celebrity_object_in_media_stream')
    response = table.put_item(
        Item={
            'media_id': media_id,  # partition key
            'media_time_position': time_position,  # sort key
            'celeb_json': celebrity_info_json
        }
    )
except Exception as e:
    print('an exception has occurred ' + str(e))

8. Viewer request handler: This is a service based on Amazon API Gateway (a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs) and AWS Lambda, and it is invoked from the viewer device. The input from the viewer device is the media identifier, the time position of the media stream in the player, and an optional viewer profile identifier. The viewer profile identifier is used to contextualize and personalize the engagement and monetization instruments. This contextualization can be based on demographics, behavior, or psychographics. The viewer device can optionally send a forward or backward time offset along with the API input. For example, if the current time position on the player is 50 seconds, the backward offset is 10 seconds, and the forward offset is 5 seconds, the API retrieves the distinct collection of all celebrities and objects observed between the 40th and 55th seconds of the event. Offsets are useful because a contextualized, personalized instrument can remain relevant to the viewer for some time before and after the moment a celebrity or object appears on screen.
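
The offset window described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the handler itself: the records are held in memory, and the attribute names `celebrities` and `objects`, the function names, and the sample data are all hypothetical. Against the DynamoDB table, the same range would typically be expressed as a query on the media identifier with a between condition on the time position sort key.

```python
def window_bounds(position, backward_offset=0, forward_offset=0):
    """Return the inclusive (start, end) window in seconds, floored at zero."""
    start = max(0, position - backward_offset)
    end = position + forward_offset
    return start, end


def distinct_in_window(records, position, backward_offset=0, forward_offset=0):
    """Collect distinct celebrities and objects seen inside the window."""
    start, end = window_bounds(position, backward_offset, forward_offset)
    celebrities, objects = set(), set()
    for record in records:
        if start <= record["media_time_position"] <= end:
            celebrities.update(record.get("celebrities", []))
            objects.update(record.get("objects", []))
    return sorted(celebrities), sorted(objects)


if __name__ == "__main__":
    # Made-up frame records for a single media identifier
    records = [
        {"media_time_position": 38, "celebrities": ["Person A"], "objects": ["Car"]},
        {"media_time_position": 44, "celebrities": ["Person B"], "objects": ["Sofa"]},
        {"media_time_position": 52, "celebrities": ["Person B"], "objects": ["Lamp"]},
        {"media_time_position": 60, "celebrities": ["Person C"], "objects": ["Sofa"]},
    ]
    # Player at 50 seconds, 10 seconds back, 5 seconds forward -> 40 to 55
    print(distinct_in_window(records, 50, backward_offset=10, forward_offset=5))
```

Returning sorted, de-duplicated lists keeps the response stable when the same celebrity or object appears in several frames of the window.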

9. Viewer experience personalizer: This service delivers the personalized and contextualized instrument to a viewer watching a live event stream. An instrument can be a hyper-targeted advertisement, marketing campaign, trivia question, poll, quiz, or an upsell/cross-sell proposal. The input to this service is the media identifier, the celebrities and objects recently viewed, and the viewer profile based on demographics, behavior, or psychographics. An ML-based classifier determines the right set of engagement instruments for the right viewer at the right point in time.


This blog post provides detailed cookbook-style instructions about how the business concept published in part 1 of this blog series is implemented on the AWS stack. This foundational implementation can be extended and customized for different content types, contexts, target viewer profiles, and creative engagement mechanisms. The third and final post in this blog series explores how to build an ML-based custom classifier used to determine the right set of engagement instruments for the right viewer at the right point in time.