Setting up a Big Cluster in 3 Easy Steps

HDP is the industry’s truly secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust big data analytics that accelerate decision making and innovation. (Source: https://hortonworks.com/products/data-platforms/hdp/)

I am going to install HDP to create a big data cluster of five nodes deployed on CloudSigma. CloudSigma provides easy deployment, a vast library of operating systems and an easy-to-use interface to set up a Big Data platform within minutes.

Step 1: Set up and Configure your Desired Server Infrastructure

To begin, I have already created five machines at CloudSigma. Each machine has 16 GB RAM, 8 cores (2.5 GHz each) and a 256 GB SSD. This configuration costs around 20 cents per hour per machine to run on CloudSigma. I have installed Ubuntu 16.04 on each of the machines by cloning a drive from CloudSigma’s library, which includes the following software:

    Ubuntu 16.04 with VirtIO drivers
    Python 3 and 2.7.12
    Pip 9.0.1
    OpenSSL 1.0.2l
    Cloud-init 0.7.9
    Latest updates until 2017-12-26

Step 2: Set up the Master/Slave Configuration

Next, for our big data tools to work properly, our host (master) must be able to communicate with each of the nodes (slaves). So, we create another sudo user account, say m1, on each machine.
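A minimal sketch of creating such a user looks like this (run as root on every machine; the user name m1 is just our example choice):

```shell
# Create the user m1 (adduser will prompt for a password)
adduser --gecos "" m1

# Grant m1 sudo rights by adding it to the sudo group
usermod -aG sudo m1
```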

Now, for the machines to be able to communicate with each other, we first give each of them a name in the /etc/hosts file:

Add entries similar to these with the IPs of your machines and the names you want to give them, for example:

    IP_1 machine1.CloudSigma.dann machine1
    IP_2 machine2.CloudSigma.dann machine2
    IP_3 machine3.CloudSigma.dann machine3
    IP_4 machine4.CloudSigma.dann machine4
    IP_5 machine5.CloudSigma.dann machine5

Now we want the m1 user on machine1 to be able to access the m1 user on the other machines without being asked for a password. For that, we set up passwordless SSH.

On machine1:

    1. Log in as user m1.
    2. Create an SSH key.
    3. Copy the key to the other machines.
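The steps above can be sketched with the standard OpenSSH tools (using the hostnames we configured in /etc/hosts; accept the defaults at the ssh-keygen prompts):

```shell
# On machine1, logged in as m1: generate a key pair
ssh-keygen -t rsa

# Copy the public key to the m1 user on each of the other machines
# (you will be asked for m1's password once per machine)
for host in machine2 machine3 machine4 machine5; do
    ssh-copy-id m1@"$host"
done
```

You can verify the setup with `ssh m1@machine2` — it should log you in without a password prompt.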
Step 3: Get Ambari Up and Running

Go to Hortonworks’ HDP download page and choose your preferred option. We are going to install HDP 2.6.4 (Automated) with Ambari 2.6.1. Click Download and it will redirect you to the Apache Ambari installation page. Select your base OS; in our case, the machines run Ubuntu 16.

Following that, login to the host machine as root.

Next, download the Ambari repository file to a directory of choice. Execute the commands as mentioned on the page to download the repository file.
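At the time of writing, the commands for Ubuntu 16 and Ambari 2.6.1 looked roughly like this — always copy the exact commands and repository URL from the installation page, as they change between releases:

```shell
# Download the Ambari repository file into APT's sources directory
wget -O /etc/apt/sources.list.d/ambari.list \
    http://public-repo-1.hortonworks.com/ambari/ubuntu16/2.x/updates/2.6.1.0/ambari.list

# Import the Hortonworks signing key and refresh the package index
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
```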

Now that we have the repo file, we can install Ambari. The installation downloads around 750 MB of packages, which is why a cloud platform is preferable for such clusters: with an average download speed of around 40 MB/s, the download takes only seconds on CloudSigma.
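With the repository in place, the install itself is a single APT command (run as root on the host machine):

```shell
# Install the Ambari server package and its dependencies
apt-get install -y ambari-server
```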

It’s time to set up the Ambari Server next.
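The setup is driven by a single interactive command (run as root on the host machine):

```shell
# Launch the interactive Ambari server setup wizard
ambari-server setup
```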

It will ask several questions, but the default options are fine for our purposes. So, we can just hit Enter while going through them and the setup will be done.

Finally, you can start Ambari with the following command:
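```shell
# Start the Ambari server (run as root on the host machine)
ambari-server start
```

You can check it at any time with `ambari-server status`, and stop it with `ambari-server stop`.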

In order to access the Ambari UI, go to port 8080 of your host machine’s IP address in a browser on any computer or tablet.

For example, if my IP is 213.125.36.21, then I will go to the address http://213.125.36.21:8080.

Now that you are in the Ambari UI, you can log in using the default username admin and password admin. You should change these to something secure straight away.

And voilà – we are finally finished! This was our tutorial on how to set up a big cluster in 3 simple steps.

For more tutorials, go ahead and explore our Community Section on the website.

Happy Computing!

About Akshay Nagpal

Big Data Analytics and ML enthusiast.