During the past year, we’ve seen customers running self-managed Elasticsearch clusters on AWS run out of compute and storage capacity because their clusters couldn’t scale elastically. They adopted Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) to gain flexibility for their logs and longer retention periods.

In this post, we discuss how to build a cost-effective extension to your Elasticsearch cluster with Amazon OpenSearch Service to extend the retention time of your data.

In May 2021, we published the blog post Introducing Cold Storage for Amazon OpenSearch Service, which explained how to reduce your overall cost. Cold storage separates compute and storage costs when you detach indices from the domain. You benefit from a better overall storage-to-compute ratio. Cold storage can reduce your data retention cost by up to 90% per GB versus storing the same data in the Hot tier. You can automate your data lifecycle management and move your data between the three tiers (Hot, UltraWarm, and Cold) thanks to Index State Management.

AWS Professional Services teams worked with one customer to add an OpenSearch Service domain as a second target for their logs. This customer could only keep their indices for 8 days on their existing self-managed Elasticsearch cluster. Because of legal and security requirements, they needed to retain data for up to 6 months. The solution was to use an Elasticsearch cluster (running version 7.10) on Amazon OpenSearch Service as an extension of their existing Elasticsearch cluster. This gave their internal application teams an additional Kibana dashboard to visualize their indices beyond 8 days. The extension uses the UltraWarm tier to provide warm access to the data. When the data is no longer actively used, it moves to the Cold storage tier, which detaches it from compute resources and reduces cost.

Building this solution as an extension to their existing self-managed cluster gave them 172 extra days of access to their logs (21.5 times their original 8-day retention period) at an incremental cost of 15%.
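
As a quick check of these figures, the following sketch recomputes them from the 8-day and 6-month retention periods mentioned in this post. We assume that 6 months is counted as 180 days, which matches the longest ISM policy described later.

```python
# Retention figures from this post; 180 days is our reading of "6 months".
ORIGINAL_RETENTION_DAYS = 8
EXTENDED_RETENTION_DAYS = 180

# Days of log access gained by adding the OpenSearch Service domain.
extra_days = EXTENDED_RETENTION_DAYS - ORIGINAL_RETENTION_DAYS

# The gain expressed as a multiple of the original retention period.
extra_ratio = extra_days / ORIGINAL_RETENTION_DAYS

print(f"Extra days of access: {extra_days}")
print(f"Multiple of the original retention: {extra_ratio}")
```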

Demystifying Index State Management

Index State Management (ISM) enables you to create a policy to automate index management within different tiers in an OpenSearch Service domain.

As of February 2022, three tiers are available in Amazon OpenSearch Service: Hot, UltraWarm, and Cold.

The default Hot tier is for active writing and low-latency analytics. UltraWarm is for read-only data up to three petabytes at one-tenth of the Hot tier cost, and Cold is for unlimited long-term archival. Although Hot storage is used for indexing and provides the fastest access, UltraWarm complements the Hot storage tier by providing less expensive storage for older and less-frequently accessed data. This is done while maintaining the same interactive analytics experience. Rather than attached storage, UltraWarm nodes use Amazon Simple Storage Service (Amazon S3) and a sophisticated caching solution to improve performance.

ISM also helps you control cost. When you no longer need to access your data after a certain period but must still keep it, for example because of legal requirements, ISM automates the transition of your data between those tiers. These transitions are based on index age, size, and other conditions.

The order of transitions is fixed: indices move from Hot to UltraWarm to Cold, and back from Cold to UltraWarm to Hot. You can’t skip a tier or change this order.
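
To make the ordering rule concrete, here is a minimal sketch that models the three tiers as an ordered list. The function names and lowercase tier labels are ours, chosen for illustration; they are not part of any OpenSearch Service API.

```python
from typing import List

# Tiers in the only order that migrations can follow.
TIER_ORDER = ["hot", "ultrawarm", "cold"]


def is_valid_transition(source: str, target: str) -> bool:
    """Return True when target is the tier directly next to source."""
    distance = TIER_ORDER.index(target) - TIER_ORDER.index(source)
    return abs(distance) == 1


def migration_path(source: str, target: str) -> List[str]:
    """List every tier an index passes through going from source to target."""
    start, end = TIER_ORDER.index(source), TIER_ORDER.index(target)
    step = 1 if end >= start else -1
    return [TIER_ORDER[i] for i in range(start, end + step, step)]


print(is_valid_transition("hot", "cold"))
print(migration_path("cold", "hot"))
```

An index in Cold storage that you want to write to again therefore has to pass through UltraWarm first.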

Solution overview

Our solution enables you to extend the retention time for your data. We show you how to add a second Cold OpenSearch Service domain to your existing self-managed Hot deployment. You use Elasticsearch snapshots to move data from the Hot cluster to the Cold domain. You then apply ISM policies to these indices, with retention periods before deletion ranging from 14 to 180 days.

In addition to that, you add 9 recommended alarms for Amazon OpenSearch Service in Amazon CloudWatch via an AWS CloudFormation template to enhance your ability to monitor your stack. Those recommended alarms notify you, through an Amazon Simple Notification Service (Amazon SNS) topic, on key metrics you should monitor, like ClusterStatus, FreeStorageSpace, CPUUtilization, and JVMMemoryPressure.

The following diagram illustrates the solution architecture:

The diagram contains the following components in our solution for extending your self-managed Elasticsearch cluster with Amazon OpenSearch Service (available on GitHub):

  1. Snapshots repository
    1. You run an AWS Lambda function one time to register your S3 bucket (snapshots-bucket in the diagram) as a snapshots repository for your OpenSearch Service domain.
  2. ISM policies
    1. You run a Lambda function one time to create six ISM policies that automate the migration of your indices from the Hot tier to UltraWarm and from UltraWarm to Cold storage, as soon as they are restored within the domain, with different retention periods (14, 21, 35, 60, 90, and 180 days before deletion).
  3. Index migration
    1. You use an Amazon EventBridge rule to automatically trigger a Lambda function (RestoreIndices in the diagram) once a day.
    2. This function parses the latest snapshots that have been pushed by the Elasticsearch cluster.
    3. When the function finds a new index that doesn’t exist yet in the OpenSearch Service domain, it initiates a restore operation and attaches an ISM policy (created during step 2.1).
  4. Free UltraWarm cache
    1. You use an EventBridge rule to automatically trigger a Lambda function (MoveToCold in the diagram) once a day.
    2. This function checks for indices that have been moved to UltraWarm for warm access and moves them back to the Cold tier to free the UltraWarm nodes’ cache.
  5. Alerting
    1. You use CloudWatch to create 9 alarms based on Amazon OpenSearch Service CloudWatch metrics.
    2. CloudWatch redirects alarms to an SNS topic.
    3. You receive notifications from the SNS topic, which sends emails as soon as an alarm is raised.

Prerequisites

Complete the following prerequisite steps:

  1. Deploy a self-managed Elasticsearch cluster (running on premises or in AWS) that pushes snapshots periodically to an S3 bucket (ideally once a day).
  2. Deploy an OpenSearch Service domain (running OpenSearch 1.1 version) and enable UltraWarm and Cold options.
  3. Deploy a proxy server (NGINX on the architecture diagram) in a public subnet that allows access to dashboards for your OpenSearch Service domains, hosted within a VPC.
  4. To automate multiple mechanisms in this solution, create an AWS Identity and Access Management (IAM) role for our different Lambda functions. Use the following IAM policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": "logs:CreateLogGroup",
          "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:*",
          "Effect": "Allow"
        },
        {
          "Action": [
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/*:*",
          "Effect": "Allow"
        },
        {
          "Action": "iam:PassRole",
          "Resource": "arn:aws:iam::123456789012:role/snapshotsRole",
          "Effect": "Allow"
        },
        {
          "Action": [
            "es:ESHttpPut",
            "es:ESHttpGet",
            "es:ESHttpPost"
          ],
          "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-test-domain/*",
          "Effect": "Allow"
        }
      ]
    }

This policy allows our Lambda functions to send PUT, GET, and POST requests to our OpenSearch Service domain, write their logs to CloudWatch Logs, and pass the IAM role used to access the S3 bucket that stores snapshots.

  1. Additionally, edit the role’s trust relationship so that Lambda can assume it:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

You use this IAM role for the Lambda functions you create.

You also need to configure the OpenSearch security plugin to grant permissions to the requests that Lambda sends to the domain.

  1. Sign in to your Cold domain’s Kibana dashboard and in the Security section, choose Roles.

Here you can find existing and predefined Kibana roles.

  1. Select the all_access role and choose Mapped users.
  2. Choose Manage mapping to edit the mapped users.
  3. Enter the ARN of the IAM role you just created as a new backend role on this Kibana role.

In the following sections, we walk you through the steps to set up each component in the solution architecture.

Snapshots repository

To migrate your logs from the Hot cluster to the Cold domain, you register your S3 bucket that stores logs in the form of snapshots (from the Elasticsearch cluster) as a snapshots repository for your OpenSearch Service domain.

  1. Create an IAM role (for this post, we use SnapshotsRole for the role name) to give permissions to the Cold domain to access your S3 bucket that stores snapshots from your Elasticsearch cluster. Use the following IAM policy for this role:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": [
            "s3:ListBucket"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::s3-bucket-name"
          ]
        },
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::s3-bucket-name/*"
          ]
        }
      ]
    }

  2. Edit the role’s trust relationship so that Amazon OpenSearch Service can assume it:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "es.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

  3. Create the Lambda function that is responsible for registering this S3 bucket as the snapshots repository.

On the GitHub repository, you can find the files needed to build this part. See the lambda-functions/register-snapshots-repository.py Python file to create the Lambda function.

  1. Choose Test on the Lambda console to run the function.

You only need to run it once. It registers the S3 bucket as a new snapshots repository for your OpenSearch Service domain.

  1. Verify the snapshots repository by navigating to the Dev Tools tab of the Cold domain’s Kibana dashboard and running the following command (replace the repository name with your own):
    GET _snapshot/myelasticsearch-snapshots-repository

You can also perform this step from an Amazon Elastic Compute Cloud (Amazon EC2) instance with an instance profile IAM role attached, instead of a Lambda function, because it only has to run once.
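
To illustrate what the registration call contains, here is a minimal sketch that builds the request only. It is not the code from the GitHub repository. The function name is hypothetical and the Region is a placeholder; the bucket, repository, and role names reuse the examples from this post. The sketch leaves out sending the request: the caller would still need to sign it with SigV4 using the Lambda role credentials.

```python
import json
from typing import Tuple


def build_register_repository_request(
    repository_name: str, bucket: str, region: str, role_arn: str
) -> Tuple[str, str, str]:
    """Build the method, path, and JSON body that register an S3 snapshots repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "region": region,
            # Role that the domain assumes to read the snapshots bucket.
            "role_arn": role_arn,
        },
    }
    return "PUT", f"_snapshot/{repository_name}", json.dumps(body)


# Placeholder values; replace them with your own names and Region.
method, path, body = build_register_repository_request(
    repository_name="myelasticsearch-snapshots-repository",
    bucket="s3-bucket-name",
    region="us-east-1",
    role_arn="arn:aws:iam::123456789012:role/snapshotsRole",
)
print(method, path)
print(body)
```

The role passed in role_arn is the one that trusts es.amazonaws.com, which is why the role that sends the request needs the iam:PassRole permission shown in the prerequisites.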

Index State Management policies

You use Index State Management to automate the transition of your indices between storage tiers in Amazon OpenSearch Service. To use ISM, you create policies (small JSON documents that define a state automaton) and attach these policies to the indices in your domain. ISM policies specify states with actions and transitions that enable you to move and delete indices. You can use the functions/create-indexstatemanagement-policy.py Lambda code to create six ISM policies that automate transition within tiers and delete your Cold indices after 14, 21, 35, 60, 90, and 180 days. You use the IAM role you created earlier, and run that function once to create the policies in your domain.

Navigate to Kibana in your OpenSearch Service domain and choose Index Management. On the State management policies page, verify that you can see your ISM policies.
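
As an illustration of what such a policy can look like, the following sketch generates six policy documents, one per retention period. It is our own approximation, not the code from the GitHub repository. The policy names, the warm_after and cold_after delays, and the @timestamp field are assumptions you would adapt to your indices.

```python
from typing import Dict

# Retention periods (in days) listed in this post.
RETENTION_DAYS = [14, 21, 35, 60, 90, 180]


def build_ism_policy(retention_days: int,
                     warm_after: str = "1d",
                     cold_after: str = "7d",
                     timestamp_field: str = "@timestamp") -> Dict:
    """Build a policy that walks an index through hot, warm, cold, and delete."""
    return {
        "policy": {
            "description": f"Hot to UltraWarm to Cold, delete after {retention_days} days",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [{"state_name": "warm",
                                     "conditions": {"min_index_age": warm_after}}],
                },
                {
                    "name": "warm",
                    # Migrates the index from the Hot tier to UltraWarm.
                    "actions": [{"warm_migration": {}}],
                    "transitions": [{"state_name": "cold",
                                     "conditions": {"min_index_age": cold_after}}],
                },
                {
                    "name": "cold",
                    # Migrates the index from UltraWarm to Cold storage.
                    "actions": [{"cold_migration": {"timestamp_field": timestamp_field}}],
                    "transitions": [{"state_name": "delete",
                                     "conditions": {"min_index_age": f"{retention_days}d"}}],
                },
                {
                    "name": "delete",
                    # Deletes the index from Cold storage.
                    "actions": [{"cold_delete": {}}],
                    "transitions": [],
                },
            ],
        }
    }


# Hypothetical policy names of the form retention-<days>d.
policies = {f"retention-{days}d": build_ism_policy(days) for days in RETENTION_DAYS}
for name in policies:
    print(f"PUT _plugins/_ism/policies/{name}")
```

On a domain running OpenSearch, each document would be sent to the _plugins/_ism/policies path printed above. On a domain running Elasticsearch 7.10, the equivalent path starts with _opendistro/_ism/policies.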

Index migration

To migrate your data from the Hot cluster to the Cold domain, you use the functions/restore-indices.py code to create a Lambda function (RestoreIndices) and the cfn-templates/event-bridge-lambda-function.yaml CloudFormation template to create its trigger, which is an EventBridge rule (scheduled once a day at 12 AM). The Lambda function parses the indices in your snapshots repository and initiates a restore operation for each index that doesn’t exist yet in the Cold domain. As soon as an index is restored in the domain, the function attaches an ISM policy to it. The policy is chosen from the index name pattern and determines the index’s retention period.

The Python code expects the index name to contain an application name of exactly three letters (for example, aws) at a fixed position. If your logs use a different index pattern, you need to update the relevant line of code (trigramme = index[5:8]).
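
The selection logic described above can be sketched as follows. This is an approximation, not the repository code. The index names of the form logs-aws-2022.02.01, the mapping from application name to policy, and the policy names are all hypothetical; only the index[5:8] slice comes from this post.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from three-letter application name to ISM policy name.
POLICY_BY_APPLICATION: Dict[str, str] = {
    "aws": "retention-180d",
    "web": "retention-35d",
}
DEFAULT_POLICY = "retention-14d"


def indices_to_restore(snapshot_indices: Iterable[str],
                       domain_indices: Iterable[str]) -> List[str]:
    """Return the snapshot indices that are not yet present in the domain."""
    existing = set(domain_indices)
    return sorted(name for name in set(snapshot_indices) if name not in existing)


def policy_for_index(index: str) -> str:
    """Pick an ISM policy from the three letters at positions 5 to 7 of the name."""
    trigramme = index[5:8]
    return POLICY_BY_APPLICATION.get(trigramme, DEFAULT_POLICY)


in_snapshots = ["logs-aws-2022.01.31", "logs-aws-2022.02.01", "logs-web-2022.02.01"]
in_domain = ["logs-aws-2022.01.31"]

for index in indices_to_restore(in_snapshots, in_domain):
    print(index, "->", policy_for_index(index))
```

With a five-character prefix such as logs-, the slice index[5:8] returns the application name. A different prefix length means changing the slice bounds.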

Free UltraWarm cache

To free the cache of the UltraWarm nodes in the Cold domain, you use the functions/move-to-Cold.py code to create a Lambda function (MoveToCold) and the cfn-templates/event-bridge-lambda-function.yaml CloudFormation template to create its trigger, which is an EventBridge rule (change its schedule so that it doesn’t run in parallel with the previous rule). Indices that are in the UltraWarm tier for warm access are moved back to Cold storage, which frees the nodes’ cache for the next index migration and keeps costs down.
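
As a rough sketch of this step, the following code builds one migration request per warm index. It assumes you have already retrieved the list of indices currently in UltraWarm, and that the UltraWarm-to-Cold migration endpoint has the form shown here, which is our understanding of the service documentation. The function name, the index names, and the @timestamp field are illustrative.

```python
import json
from typing import Iterable, List, Tuple


def build_cold_migration_requests(
    warm_indices: Iterable[str], timestamp_field: str = "@timestamp"
) -> List[Tuple[str, str, str]]:
    """Build one (method, path, body) request per index to move it to Cold storage."""
    body = json.dumps({"timestamp_field": timestamp_field})
    return [
        ("POST", f"_ultrawarm/migration/{index}/_cold", body)
        for index in sorted(warm_indices)
    ]


# Hypothetical indices that were brought back to UltraWarm for warm access.
requests_to_send = build_cold_migration_requests(
    ["logs-web-2022.02.01", "logs-aws-2022.02.01"]
)
for method, path, body in requests_to_send:
    print(method, path, body)
```

As with the other functions, each request would be signed with the Lambda role before being sent to the domain.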

Alerting

To get alerted via email when the Cold domain requires your attention, you use the cfn-templates/alarms.yaml CloudFormation template to create an SNS topic that receives a notification when one of the 9 CloudWatch alarms is raised, based on the Amazon OpenSearch Service metrics. Those alarms come from the recommended CloudWatch alarms for Amazon OpenSearch Service.
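
The CloudFormation template is the way this solution creates the alarms. For readers who prefer to see the shape of an alarm in code, here is a sketch that builds alarm definitions for four of the metrics named earlier. The thresholds, period, alarm names, domain name, and topic ARN are illustrative assumptions, not the values from the template. The dictionary keys follow the parameter names of the CloudWatch PutMetricAlarm operation.

```python
from typing import Dict, List

# (metric, statistic, comparison operator, threshold); thresholds are illustrative.
ALARM_SPECS = [
    ("ClusterStatus.red", "Maximum", "GreaterThanOrEqualToThreshold", 1),
    ("FreeStorageSpace", "Minimum", "LessThanOrEqualToThreshold", 20480),
    ("CPUUtilization", "Maximum", "GreaterThanOrEqualToThreshold", 80),
    ("JVMMemoryPressure", "Maximum", "GreaterThanOrEqualToThreshold", 80),
]


def build_alarms(domain_name: str, account_id: str, topic_arn: str) -> List[Dict]:
    """Build one alarm definition per metric, all notifying the same SNS topic."""
    return [
        {
            "AlarmName": f"{domain_name}-{metric}",
            "Namespace": "AWS/ES",
            "MetricName": metric,
            "Dimensions": [
                {"Name": "DomainName", "Value": domain_name},
                {"Name": "ClientId", "Value": account_id},
            ],
            "Statistic": statistic,
            "Period": 60,
            "EvaluationPeriods": 1,
            "ComparisonOperator": comparison,
            "Threshold": threshold,
            "AlarmActions": [topic_arn],
        }
        for metric, statistic, comparison, threshold in ALARM_SPECS
    ]


# Placeholder domain, account, and topic values.
alarms = build_alarms(
    domain_name="my-test-domain",
    account_id="123456789012",
    topic_arn="arn:aws:sns:us-east-1:123456789012:opensearch-alarms",
)
for alarm in alarms:
    print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

Each dictionary could be passed as keyword arguments to the put_metric_alarm method of a boto3 CloudWatch client if you wanted to create the alarms outside of CloudFormation.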

Conclusion

In this post, we covered a solution that uses an OpenSearch Service domain as an extension to your existing self-managed Elasticsearch cluster, in order to extend the retention period of application logs in a serverless and cost-effective way.

If you’re interested in going deeper into Amazon OpenSearch Service and AWS Analytics capabilities in general, you can get help and join discussions on our forums.


About the Authors

Alexandre Levret is a Professional Services consultant within Amazon Web Services (AWS) dedicated to the public sector in Europe. He aims to build, innovate, and inspire his customers, who face challenges that cloud computing can help them resolve.