You can now use open source frameworks such as Apache Spark, Apache Hive, and Presto running on Amazon EMR clusters directly from Amazon SageMaker Studio notebooks to run petabyte-scale data analytics and machine learning. Amazon EMR automatically installs and configures open source frameworks and provides a performance-optimized runtime that is compatible with and faster than standard open source. For example, Spark 3.0 on Amazon EMR is 1.7x faster than its open source equivalent. Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Analyzing, transforming, and preparing large amounts of data is a foundational step of any data science and ML workflow. This release makes it simple to use popular frameworks such as Apache Spark, Hive, and Presto running on EMR clusters directly from SageMaker Studio, simplifying data science and ML workflows.

Categories: AWS