Organizations managing cloud infrastructure in AWS need effective mechanisms to audit operations in their AWS accounts for security and compliance. In November 2013, we announced AWS CloudTrail as the auditing platform for AWS. Since then, millions of customers have adopted this service. We believe CloudTrail is so important to AWS customers’ success that every new account created includes a 90-day free trial. We’ve also given our customers access to longer data retention, as well as provided them with integral copies of trails that multiple teams can consume.

Today, we are excited to announce the general availability of AWS CloudTrail Lake, a managed data lake that lets organizations aggregate, immutably store, and query events recorded by CloudTrail for auditing, security investigation, and operational troubleshooting. This new platform simplifies CloudTrail analysis workflows by integrating collection, storage, preparation, and optimization for analysis and query in the same product. This removes the need to maintain separate data processing pipelines that span across teams and products to analyze CloudTrail events.

CloudTrail Lake enables querying of CloudTrail data using the familiar SQL query language. The platform also includes sample queries that are designed to help users get started with writing queries for common scenarios, such as identifying records of all activities performed by a user to help accelerate security investigations. The immutable nature of storage, coupled with a default retention window of seven years, helps customers meet compliance requirements. CloudTrail Lake supports the collection of events from multiple AWS regions and AWS accounts.

In this blog post, I’ll walk you through an example of how you can get started with enabling CloudTrail Lake and performing a few example queries.

Enabling CloudTrail Lake

To use CloudTrail Lake, you must enable it in AWS CloudTrail. Use the following steps to enable CloudTrail Lake and create an event data store that we will query later on.

  1. Open the AWS Console and log in with an account with administrative permissions to manage AWS CloudTrail.
  2. Navigate to the CloudTrail console. In the left-hand navigation menu, choose Lake.
  3. Choose the Event data stores tab.
  4. Choose Create event data store.
  5. Enter a name for the event data store. For my example, I’ll use the name “MyNewDataStore”.
  6. Enter the retention period for your event data store. You can enter a value from 7 days to 2555 days.
  7. Select whether you want to include only the current region in this event data store. The current region’s name is shown for your reference.
  8. (Optional) Select whether you want to include all accounts in the organization (applies only to AWS Organizations environments).
General details screen of the event data store creation page, where you can specify the retention period and the multi-account and multi-region options.

Create Event Data Store General Details

  1. (Optional) Enter any tags for the event data store. Tags can help you organize and sort resources in your AWS account. To learn more about using tags, see Tagging AWS resources in the AWS General Reference.
  2. Choose Next.
  3. Select the event types that you want to track. CloudTrail Lake allows you to collect both management events and data events.
  4. If you chose to track management events, you can select whether to track read events, write events, or both. You can also choose to exclude AWS Key Management Service (AWS KMS) and Amazon RDS Data API events from tracking.
Choose events screen, where you specify the CloudTrail events that will be part of the event data store. You can also set options to exclude certain API activity.

  1. Choose Next.
  2. On the review page, make sure the options you configured are correct. When ready, choose Create event data store.

CloudTrail Lake then creates your event data store. You’ll see the status of your new event data store in the Status pane of the Event data stores list. After a few minutes, your data store will start and can then be queried.
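
The same configuration can also be expressed through the SDK. The following is a minimal sketch, not a definitive implementation, of how the console choices above might map onto the parameters of the CloudTrail CreateEventDataStore API. The helper name `event_data_store_params` and its arguments are hypothetical; the 7 to 2555 day range comes from the steps above, and the sketch covers management events only.

```python
def event_data_store_params(name, retention_days=2555, multi_region=True,
                            organization=False, exclude_kms=False,
                            exclude_rds_data_api=False):
    """Build keyword arguments for a CreateEventDataStore request (illustrative helper)."""
    # Retention limits as described in the console steps above.
    if not 7 <= retention_days <= 2555:
        raise ValueError("retention must be between 7 and 2555 days")

    # Track management events only; data events would need their own selector.
    field_selectors = [{"Field": "eventCategory", "Equals": ["Management"]}]

    # Optionally exclude high-volume event sources, as offered in the console.
    excluded_sources = []
    if exclude_kms:
        excluded_sources.append("kms.amazonaws.com")
    if exclude_rds_data_api:
        excluded_sources.append("rdsdata.amazonaws.com")
    if excluded_sources:
        field_selectors.append(
            {"Field": "eventSource", "NotEquals": excluded_sources}
        )

    return {
        "Name": name,
        "RetentionPeriod": retention_days,
        "MultiRegionEnabled": multi_region,      # False = current region only
        "OrganizationEnabled": organization,     # AWS Organizations environments only
        "TerminationProtectionEnabled": True,
        "AdvancedEventSelectors": [
            {"Name": "Management events", "FieldSelectors": field_selectors}
        ],
    }


params = event_data_store_params("MyNewDataStore", exclude_kms=True)
print(params["AdvancedEventSelectors"][0]["FieldSelectors"])
```

Assuming boto3 is installed and credentials are configured, the resulting dictionary could be passed as `boto3.client("cloudtrail").create_event_data_store(**params)`.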

Sample queries

You can explore the features of CloudTrail Lake by trying some of the sample queries included with this service. To use a sample query, use the following steps:

  1. Navigate to the Sample queries tab.
  2. For this example, choose the Multi-region console logins query. This query displays all users who have logged in to the console from a specified set of regions, within a specified date range.
  3. The following sample query is automatically populated into the Query editor (you must replace $EDS_ID with the ID of your event data store):
SELECT eventTime, useridentity.arn, awsRegion FROM $EDS_ID WHERE eventTime > '2021-07-20 00:00:00' AND eventTime < '2021-07-23 00:00:00' AND awsRegion in ('us-east-1') AND eventName = 'ConsoleLogin'

Sample query screen

  1. Next, replace the time range in the query with the time range you want to search. The date string specified after eventTime > is the start of the range, while the date string specified after eventTime < is the end of the range. (Note: both comparisons are exclusive; you can use >= or <= to include events at exactly the date and time provided. For a full list of supported operators, see the CloudTrail Lake documentation in the CloudTrail User Guide.)
  2. Finally, specify the regions for which you want to search login events. The sample includes us-east-1; for my example, I’ll search us-east-1 and us-west-2. To add or remove regions from the search, edit the list inside the parentheses after awsRegion in.

With my changes made, my new query now looks as follows:

SELECT eventTime, useridentity.arn, awsRegion FROM 2add3562-038a-4075-95af-e219ea33a2df WHERE eventTime > '2021-12-05 00:00:00' AND eventTime < '2021-12-16 00:00:00' AND awsRegion in ('us-east-1', 'us-west-2') AND eventName = 'ConsoleLogin'
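
If you expect to rerun this query with different time ranges or regions, the substitutions described above can be scripted. The following is a rough sketch; the function name `console_login_query` is hypothetical, and the region check is a simple assumption about what a region name looks like rather than an official validation rule.

```python
import re
from datetime import datetime


def console_login_query(eds_id, start, end, regions):
    """Return the multi-region console login query for one time window."""
    if not regions:
        raise ValueError("at least one region is required")
    for region in regions:
        # Assumed shape of a region name; rejects quotes and whitespace.
        if not re.fullmatch(r"[a-z0-9-]+", region):
            raise ValueError(f"unexpected region name: {region!r}")

    fmt = "%Y-%m-%d %H:%M:%S"
    region_list = ", ".join(f"'{region}'" for region in regions)
    return (
        "SELECT eventTime, useridentity.arn, awsRegion "
        f"FROM {eds_id} "
        f"WHERE eventTime > '{start.strftime(fmt)}' "
        f"AND eventTime < '{end.strftime(fmt)}' "
        f"AND awsRegion in ({region_list}) "
        "AND eventName = 'ConsoleLogin'"
    )


query = console_login_query(
    "2add3562-038a-4075-95af-e219ea33a2df",
    datetime(2021, 12, 5),
    datetime(2021, 12, 16),
    ["us-east-1", "us-west-2"],
)
print(query)
```

Because the statement is assembled by string formatting, this approach is only appropriate for values you control; the event data store ID in particular is inserted as-is.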

Now that the query is ready, I choose “Run”. After a few seconds, I can see the results under “Query Results”.
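
Queries can also be run outside the console. The sketch below shows one way this might look with the CloudTrail StartQuery, DescribeQuery, and GetQueryResults APIs as exposed by boto3. The function name `run_lake_query`, the polling interval, and the retry limit are assumptions for illustration. The client is passed in as an argument, so the function itself does not import boto3; depending on your SDK version, `describe_query` and `get_query_results` may also expect an `EventDataStore` argument.

```python
import time


def run_lake_query(client, statement, poll_seconds=2.0, max_polls=150):
    """Start a CloudTrail Lake query, wait for it, and return all result rows."""
    query_id = client.start_query(QueryStatement=statement)["QueryId"]

    # Poll until the query reaches a terminal status.
    for _ in range(max_polls):
        status = client.describe_query(QueryId=query_id)["QueryStatus"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"query {query_id} did not finish in time")

    # Collect every page of results.
    rows, token = [], None
    while True:
        kwargs = {"QueryId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_query_results(**kwargs)
        rows.extend(page.get("QueryResultRows", []))
        token = page.get("NextToken")
        if not token:
            return rows
```

With boto3 installed and credentials configured, a call might look like `run_lake_query(boto3.client("cloudtrail"), "SELECT ...")`, where the statement is the query shown above.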

CloudTrail Lake includes additional resources you can use while building a query. For example, the left-hand pane on the console gives a full list of the event properties you can query. This is helpful when you want to add fields to the SELECT list or add criteria to further refine your query.

You can also save your queries and reuse them later, right from the console. Just choose the Save button and enter a descriptive name for the query. You can access all your saved queries from the Saved queries tab.

Other sample queries

Here are a few other queries you can try to get a sense of the power of this platform. Make sure to replace the event data store ID in each query statement with the correct one for your account.

Show all recorded API activity for a specific IAM key

SELECT eventTime, eventName, userIdentity.principalId
FROM 11f564ae-cf2e-40a4-9683-05ffaa976706
WHERE userIdentity.accessKeyId like 'AKIAXZUQIC6XEVCJJFM7'

Show any security group changes

SELECT eventname, useridentity.username, sourceIPAddress, eventtime, element_at(requestParameters, 'groupId') as SecurityGroup, element_at(requestParameters, 'ipPermissions') as ipPermissions
FROM $EDS_ID
WHERE (element_at(requestParameters, 'groupId') like '%sg-%')
and eventtime > '2017-11-01T00:00:00Z'
order by eventtime asc;
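
When queries like these are run through an SDK rather than the console, the GetQueryResults API returns each row as a list of single-entry maps, one per selected column. The following sketch, with the hypothetical helper `rows_to_dicts`, flattens that shape into one dictionary per row so that results are easier to filter or export. The sample values are placeholders invented for illustration, not real query output.

```python
from collections import Counter


def rows_to_dicts(query_result_rows):
    """Flatten rows of single-column maps into one dict per row."""
    records = []
    for row in query_result_rows:
        record = {}
        for cell in row:  # each cell maps one column name to its value
            record.update(cell)
        records.append(record)
    return records


# Placeholder rows in the shape returned under "QueryResultRows".
sample_rows = [
    [{"eventname": "AuthorizeSecurityGroupIngress"}, {"SecurityGroup": "sg-EXAMPLE1"}],
    [{"eventname": "RevokeSecurityGroupIngress"}, {"SecurityGroup": "sg-EXAMPLE1"}],
    [{"eventname": "AuthorizeSecurityGroupIngress"}, {"SecurityGroup": "sg-EXAMPLE2"}],
]

records = rows_to_dicts(sample_rows)
changes_per_group = Counter(record["SecurityGroup"] for record in records)
print(changes_per_group)
```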

Generally available today

You can enable CloudTrail Lake in the CloudTrail console, by using the AWS Software Development Kits (SDKs), or by using the AWS Command Line Interface (CLI). CloudTrail Lake is currently available in the following regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Africa (Cape Town), Europe (Ireland), Europe (London), Europe (Paris), Europe (Milan), Europe (Frankfurt), Europe (Stockholm), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka), Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Mumbai), Middle East (Bahrain), and South America (Sao Paulo). To get started, see Working with CloudTrail Lake in the CloudTrail User Guide.

Cleanup

If you no longer want to use CloudTrail Lake, make sure to delete the event data store. To do this, follow these steps:

  1. Choose the Event data stores tab in the Lake console.
  2. Select the event data store from the list.
  3. From the Actions menu, select “Change termination protection”.
  4. In the Change termination protection pop-up, select Disabled and choose “Save”.
  5. From the Actions menu, select Delete, and confirm that you want to delete the event data store by entering its name. Then choose “Delete”.
  6. The event data store is now disabled and placed in the pending deletion state. After seven days, it is deleted permanently.
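
The cleanup steps above can also be scripted. This is a minimal sketch using the CloudTrail ListEventDataStores, UpdateEventDataStore, and DeleteEventDataStore APIs as exposed by boto3; the helper names are hypothetical, and the client is passed in so the functions do not depend on boto3 directly.

```python
def find_event_data_store_arn(client, name):
    """Return the ARN of the event data store with this name, or None."""
    token = None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = client.list_event_data_stores(**kwargs)
        for store in page.get("EventDataStores", []):
            if store.get("Name") == name:
                return store["EventDataStoreArn"]
        token = page.get("NextToken")
        if not token:
            return None


def remove_event_data_store(client, name):
    """Disable termination protection, then request deletion."""
    arn = find_event_data_store_arn(client, name)
    if arn is None:
        raise LookupError(f"no event data store named {name!r}")
    # Deletion is refused while termination protection is enabled.
    client.update_event_data_store(
        EventDataStore=arn, TerminationProtectionEnabled=False
    )
    # The store then enters the pending deletion state described above.
    client.delete_event_data_store(EventDataStore=arn)
    return arn
```

With boto3 installed and credentials configured, `remove_event_data_store(boto3.client("cloudtrail"), "MyNewDataStore")` would target the event data store created earlier in this post.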

Conclusion

In this blog post, we announced the new CloudTrail Lake service. We showed you how to enable it and how to start writing your own queries, and we provided some sample queries to help you get started. We’re excited to make this new service available to you and can’t wait to see what great things you build with it.

About the author

Andres Silva

Andres Silva is a Principal Specialist Solutions Architect with the Cloud Operations team at AWS. He has been working with AWS technology for more than 9 years. Andres works closely with the AWS Service teams to design solutions at scale that help customers implement and support complex cloud infrastructures. When he is not building cloud automation, he enjoys skateboarding with his 2 kids.