Apache Hadoop is an open-source solution for storing and analysing massive amounts of unstructured data. It can process messy data sets to provide end users with some fascinating insights into their unstructured data.

For this tutorial, I’m going to show you how we can create HDInsight clusters in Azure, how we can connect to them and see what’s inside, and how we can run MapReduce jobs on the data within our clusters.

What you’ll need to complete this tutorial:

- An Azure subscription: You can sign up for a trial or, if you have one, use an MSDN subscription.
- An SSH client: HDInsight Hadoop clusters can be provisioned as Linux virtual machines in Azure, and we can connect to Hadoop services using a remote SSH session.
- A tool for browsing Azure Blob Storage: We’ll be working with Azure Blob Storage during this tutorial. We can use the command line, but for simplicity a graphical tool is fine.

You can do this tutorial on Linux or a Mac, but I don’t use those operating systems, so you’ll have to search elsewhere for guidance on that.

Provisioning and Configuring an HDInsight Cluster

Alright, enough theory! Let’s have a go at provisioning our HDInsight Cluster in Azure. Let’s head over to Azure and start this process.

In your portal, click on New > Data + Analytics > HDInsight, then choose to create a new HDInsight Cluster. Create your new cluster with the following attributes:

- Subscription: Choose your Azure subscription.
- Cluster Username: Create your name (make sure you remember it!).
- Cluster Password: Create your password (make sure you remember it!).
- SSH Username: (Has to be different to your cluster username).
- SSH Password (same as the cluster password).
- Resource Group (create a new one if you don’t have one).
- Default Container: Enter the cluster name you specified previously.

Click Create and wait for it to provision. This can take a while (20 minutes), so treat yourself to some green tea :)

Once it’s created, we can view the configuration of our cluster in the portal. We can view a summary of our cluster through the HDInsight Cluster blade. We can also scale the number of worker nodes to meet increasing processing demand.

Awesome! We have a Cluster and we’re ready to connect to it!

In the Azure portal, navigate to the HDInsight Cluster blade for your HDInsight Cluster. Click on Secure Shell and then, in the blade, note the host name for your cluster (should take the format of ).

Open PuTTY and, on the Session page, enter the host name in the Host Name text box. Then under Connection type, select SSH and click Open. (If a security warning pops up stating that the host certificate can’t be verified, just click Yes to continue.) When prompted, enter the SSH username and password you specified earlier (NOT THE CLUSTER USERNAME!!).

Hadoop uses a file system called HDFS, which is implemented in Azure HDInsight clusters as Azure Blob storage. In the SSH console, enter your username and password. If you’re successful, you should see this output.

Let’s start by entering the following command to view the contents of the root folder in our HDFS file system:
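As a minimal sketch of listing the root folder of HDFS from the cluster’s SSH session: the standard Hadoop file system shell offers `hdfs dfs -ls`. Treating it as the command this tutorial intended is an assumption on my part, and the `HDFS_PATH` variable and the guard around the call are purely illustrative, there so the snippet fails gracefully if you run it somewhere other than a cluster node.

```shell
# Assumption: `hdfs dfs -ls` is the standard Hadoop file system shell command
# for listing a folder; it stands in for the command the tutorial refers to.
HDFS_PATH="/"   # hypothetical variable name: the root folder of HDFS

if command -v hdfs >/dev/null 2>&1; then
    # On a cluster node: list the contents of the root folder.
    hdfs dfs -ls "$HDFS_PATH"
else
    # Anywhere else (for example your own workstation) there is nothing to list.
    echo "hdfs not found: run this from the cluster's SSH session"
fi
```

On the cluster you can of course just type `hdfs dfs -ls /` directly; the wrapper only makes the sketch safe to paste anywhere.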