Azure HDInsight

You can quickly launch an Azure HDInsight cluster using the following link.

https://azure.microsoft.com/en-in/resources/templates/101-hdinsight-spark-linux/

A few things I learned while using the above link. First, it lands you on a page where you need to enter a cluster name, a cluster admin name and password, and an SSH user name and password. When I entered the required info and clicked "Purchase", it failed. When I looked into the log, I found that the issue was related to the password. The error message that comes back is not always accurate: it says the password should be between 6 and xx characters, but in reality the password needs to be at least 10 characters long, with at least one upper-case letter, one lower-case letter, and one number. I had to spend 15-20 minutes to figure this out and get it working.

Once you launch the cluster, you can access the Ambari UI at <clustername>.azurehdinsight.net. You can also reach the Spark cluster through the Livy REST endpoints at <clustername>.azurehdinsight.net/livy/sessions and <clustername>.azurehdinsight.net/livy/batches. For more information, see the following link.

https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface
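
For example, assuming your cluster login name is "admin" (use whatever admin name and password you chose when creating the cluster), a minimal sketch of listing the current Livy sessions with curl looks like this:

curl -u admin:<password> https://<clustername>.azurehdinsight.net/livy/sessions

Livy on HDInsight sits behind the cluster's HTTPS gateway, so these REST calls authenticate with the cluster admin credentials rather than the SSH user.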


When you launch the cluster, it also provides SSH access at <sshuser>@<clustername>-ssh.azurehdinsight.net. You can log in there, upload files, Spark code, and so on, and run Hadoop commands. For example, you can upload "demo.jar" and then put it into an HDFS location as follows.

hadoop fs -put -f demo.jar /example/data/codebase/jar
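
If demo.jar is still on your local machine, one way to get it onto the cluster head node in the first place is a plain scp copy (a minimal sketch, assuming the jar sits in your current local directory):

scp demo.jar <sshuser>@<clustername>-ssh.azurehdinsight.net:

After that, the hadoop fs -put command above copies it from the head node's local filesystem into the cluster's default storage.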

Now you can access this file using the following URL in Livy commands.

"wasb://<clustername>@<storageaccountname>.blob.core.windows.net/example/data/codebase/jar/demo.jar"





