Amazon EMR

While trying to create an EMR cluster, I got the following error:

"Could not create cluster.The instance profile for the newly created default role is not yet visible. Please try after a few seconds." 

I retried several times with no luck. In the end, I deleted all the existing default EMR roles and recreated them as follows:


aws emr create-default-roles --profile admin --region us-east-1

After that I was able to create the cluster. 
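Since the error is about the instance profile not being visible yet, it can help to check for it before retrying the cluster creation. Below is a minimal sketch of such a check. The function name wait_for_instance_profile and the AWS_CMD and SLEEP_SECS variables are my own and not part of the AWS CLI. EMR_EC2_DefaultRole is the instance profile name that create-default-roles sets up.

```shell
#!/bin/sh
# Hypothetical helper: poll IAM until the EMR instance profile can be read.
# AWS_CMD is an assumption of this sketch; set it to e.g. "aws --profile admin".
wait_for_instance_profile() {
  profile_name="${1:-EMR_EC2_DefaultRole}"   # default profile created by create-default-roles
  attempts="${2:-10}"                        # how many times to check before giving up
  i=1
  while [ "$i" -le "$attempts" ]; do
    # get-instance-profile exits non-zero while the profile is not visible
    if ${AWS_CMD:-aws} iam get-instance-profile \
         --instance-profile-name "$profile_name" >/dev/null 2>&1; then
      echo "instance profile $profile_name is visible"
      return 0
    fi
    sleep "${SLEEP_SECS:-5}"
    i=$((i + 1))
  done
  echo "instance profile $profile_name not visible after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_instance_profile with no arguments checks the default profile up to 10 times, 5 seconds apart. If it still fails, recreating the roles as above is what worked for me.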


https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-costs-with-idle-checks-and-automatic-resource-termination-using-advanced-amazon-cloudwatch-metrics-and-aws-lambda/

The master node does not need high compute power. Core nodes need both compute and storage, while task nodes don't need storage.

You cannot change the instance type of the master and core nodes while the cluster is running, but you can do so for task nodes.
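One way to do this on a cluster that uses uniform instance groups is to add a new task instance group with a different instance type. Here is a rough sketch, assuming that approach. The function name add_task_group, the AWS_CMD variable, and the cluster ID and instance type in the usage line are placeholders of mine.

```shell
#!/bin/sh
# Hypothetical helper: add a task instance group with a chosen instance type
# to a running cluster. AWS_CMD is an assumption of this sketch; set it to
# "echo" to print the command instead of running it.
add_task_group() {
  cluster_id="$1"       # e.g. j-EXAMPLE (placeholder)
  instance_type="$2"    # e.g. m5.2xlarge (placeholder)
  instance_count="$3"
  ${AWS_CMD:-aws} emr add-instance-groups \
    --cluster-id "$cluster_id" \
    --instance-groups "InstanceGroupType=TASK,InstanceType=$instance_type,InstanceCount=$instance_count"
}
```

For example, add_task_group j-EXAMPLE m5.2xlarge 2 would request two task nodes of the new type. The old task group can then be scaled down with aws emr modify-instance-groups.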

By default, EMR schedules jobs in such a way that they can continue even if a task node gets terminated. It does this by scheduling the application master processes only on core nodes (nodes with the label 'CORE').

You should only run core nodes on Spot Instances when partial HDFS data loss is tolerable.

When you launch an EMR cluster, you have two options for picking instances: 1) uniform instance groups or 2) instance fleets. With uniform instance groups, you choose a single instance type for each of the master, core and task nodes, though the three types can differ from each other.

Instance fleets give you more options. You can mix instance types and mix purchase options (Spot vs. On-Demand for core and task nodes). You can specify a maximum Spot price and target capacities (Spot vs. On-Demand) in terms of units. For Spot Instances, you can also specify a provisioning timeout and what to do if the Spot capacity is not met within that time. These options appear in the console when cluster composition is set to instance fleets.


You can also specify the same settings in a JSON config file when creating the cluster with the AWS CLI.
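Here is a minimal sketch of what such a file could look like and how it could be passed to the CLI. The fleet names, instance types, capacities, Spot price percentage, timeout, cluster name and release label are all example values I picked, not settings from a real cluster. The function name create_fleet_cluster and the AWS_CMD variable are also my own. A real cluster will usually need more options as well, such as --ec2-attributes for subnets and key pairs.

```shell
#!/bin/sh
# Write an illustrative instance fleet definition (all values are examples).
cat > instance-fleets.json <<'EOF'
[
  {
    "Name": "master-fleet",
    "InstanceFleetType": "MASTER",
    "TargetOnDemandCapacity": 1,
    "InstanceTypeConfigs": [
      { "InstanceType": "m5.xlarge" }
    ]
  },
  {
    "Name": "core-fleet",
    "InstanceFleetType": "CORE",
    "TargetOnDemandCapacity": 2,
    "InstanceTypeConfigs": [
      { "InstanceType": "r5.xlarge", "WeightedCapacity": 1 },
      { "InstanceType": "r5.2xlarge", "WeightedCapacity": 2 }
    ]
  },
  {
    "Name": "task-fleet",
    "InstanceFleetType": "TASK",
    "TargetOnDemandCapacity": 0,
    "TargetSpotCapacity": 4,
    "InstanceTypeConfigs": [
      { "InstanceType": "m5.xlarge", "WeightedCapacity": 1, "BidPriceAsPercentageOfOnDemandPrice": 60 },
      { "InstanceType": "m5.2xlarge", "WeightedCapacity": 2, "BidPriceAsPercentageOfOnDemandPrice": 60 }
    ],
    "LaunchSpecifications": {
      "SpotSpecification": {
        "TimeoutDurationMinutes": 20,
        "TimeoutAction": "SWITCH_TO_ON_DEMAND"
      }
    }
  }
]
EOF

# Hypothetical wrapper: create a cluster from the fleet file written above.
# AWS_CMD is an assumption of this sketch; set it to "echo" for a dry run.
create_fleet_cluster() {
  ${AWS_CMD:-aws} emr create-cluster \
    --name "fleet-demo" \
    --release-label emr-5.30.0 \
    --applications Name=Spark \
    --use-default-roles \
    --instance-fleets file://instance-fleets.json
}
```

In this sketch the core fleet stays On-Demand, in line with the note above about HDFS data loss, and only the task fleet uses Spot. If Spot capacity is not met within 20 minutes, TimeoutAction falls back to On-Demand; the other allowed value is TERMINATE_CLUSTER.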
