Setting up DC/OS, the Distributed Operating System

According to dcos.io, DC/OS is a distributed operating system based on the Apache Mesos distributed systems kernel. It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

Put simply, DC/OS abstracts and manages machine resources such as CPU, memory, and storage, and presents many machines as a single system that pools all of those resources. This makes it easy to build and run fault-tolerant, elastic distributed systems.

Some of the features of DC/OS are:

  • Containerization: Workloads run in containers, which DC/OS creates and assigns to services and applications. It works with Docker and appc images.
  • Linear scalability: It can scale to tens of thousands of nodes.
  • High availability: Masters and agents are replicated and fault tolerant, with coordination handled by ZooKeeper.
  • APIs: HTTP APIs are available for developing new distributed applications, for operating the cluster, and for monitoring.
  • Web UI: Built-in Web UI for viewing cluster details and status.

Cluster Preparation

For the installation of DC/OS, I am creating three masters, three agents, and one bootstrap node.

For each of the master nodes, I am using the following configuration:
CPU: 10 GHz
RAM: 32 GB
SSD: 120 GB

For each of the agent nodes, I am using the following configuration:
CPU: 5 GHz
RAM: 16 GB
SSD: 60 GB

For the bootstrap node, I am using the following configuration:
CPU: 5 GHz
RAM: 16 GB
SSD: 60 GB

For each instance, I have cloned the image “CoreOS – Container Linux 1235.12.0” from CloudSigma’s library. There is a newer version available, but it is not officially supported. I have resized the drives as specified above.

Installation Prerequisites

I am logging into the bootstrap node with the username “core” and the SSH key specified while creating the instances. Ensure that the same SSH key is set on every instance; the DC/OS installer requires this.

On each of the machines, I am disabling automatic updates, since DC/OS is not currently compatible with every version of CoreOS. This stops update-engine, the service that automatically updates CoreOS to a recent version.
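On CoreOS this amounts to stopping and masking the update service; a sketch of the commands run on every node (masking as well, so the service does not come back after a reboot):

```shell
sudo systemctl stop update-engine    # stop the automatic updater
sudo systemctl mask update-engine    # prevent it from being started again
```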

On the bootstrap node, I am creating a folder ‘genconf’, where all the installation-related files will be placed over the course of the installation.

In the genconf folder, I am creating a file ‘ip-detect’, which should report the IP address of each node.

In this file, I am adding the following code. When run under bash, it prints the IP address of the particular node it is executed on.
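A minimal ip-detect sketch; it assumes the nodes’ primary interface is eth0, so adjust the interface name to match your machines:

```shell
# Create the genconf folder and write the ip-detect script into it
mkdir -p genconf
cat > genconf/ip-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit
# Print the first IPv4 address bound to eth0
ip addr show eth0 | grep -Eo '[0-9]{1,3}(\.[0-9]{1,3}){3}' | head -1
EOF
chmod +x genconf/ip-detect
```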

 
As a next step, I am creating another file ‘config.yaml’ in the genconf folder.

For your reference and information, YAML (YAML Ain’t Markup Language) is a human-readable data serialization language. It is commonly used for configuration files, but could be used in many applications where data is being stored (e.g. debugging output) or transmitted (e.g. document headers).

In this YAML file, we add configuration customized for our cluster environment. DC/OS uses it during installation to generate cluster-specific installation files.

For a detailed list of configuration parameters, refer to this link.

Note that the IP address of each node must be accessible from bootstrap node as well as from each other.
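A minimal config.yaml sketch for a cluster like this one; all IPs are placeholders, and static master discovery with the static Exhibitor backend is the usual choice for an on-premises install of this kind:

```yaml
---
bootstrap_url: http://<bootstrap-ip>:80
cluster_name: dcos-cluster
exhibitor_storage_backend: static
master_discovery: static
master_list:
- <master-ip-1>
- <master-ip-2>
- <master-ip-3>
agent_list:
- <agent-ip-1>
- <agent-ip-2>
- <agent-ip-3>
resolvers:
- 8.8.8.8
- 8.8.4.4
ssh_port: 22
ssh_user: core
```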

I will copy the common SSH private key to the genconf folder under the name ‘ssh_key’ and change its permissions to 600, meaning only the owner has read and write access to the file.
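A sketch of that step; the key path is an assumption, so use wherever your shared private key actually lives:

```shell
cp ~/.ssh/id_rsa genconf/ssh_key   # the cluster-wide private key (path assumed)
chmod 600 genconf/ssh_key          # owner read/write only
```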

Installation

Now that the configuration part is complete, I will move on to the installation part.

In this part, I will first download the DC/OS installer file. Before doing so, we need our data transfer limit increased, since a large transfer can lead to our IPs being blackholed to protect against DDoS attacks. One can ask CloudSigma to increase the limit over their 24/7 chat window; it is a very quick process.

The setup script extracts a Docker container that uses the generic DC/OS install files to create customized DC/OS build files for my cluster. The build files are output to ./genconf/serve/.
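The download-and-generate step looks roughly like this; the URL is the stable-channel installer location published on dcos.io, which you should verify against the version you want:

```shell
curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh
sudo bash dcos_generate_config.sh --genconf   # build files land in ./genconf/serve/
```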


Next, I will run the prerequisite installation script, which covers system updates, compression utilities (unzip, GNU tar, and XZ Utils), and cluster permissions.

Now that the prerequisites are installed, I will run the pre-flight script, which validates that the cluster is installable.

After the validation, I will run the deploy step, which installs DC/OS on the cluster.

After the installation, I am running the diagnostic (post-flight) script to verify that the services are up and running.
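The four stages above map onto flags of the same installer script; a sketch, run from the bootstrap node:

```shell
sudo bash dcos_generate_config.sh --install-prereqs   # prerequisites on all nodes
sudo bash dcos_generate_config.sh --preflight         # validate the cluster is installable
sudo bash dcos_generate_config.sh --deploy            # install DC/OS on every node
sudo bash dcos_generate_config.sh --postflight        # diagnostics: verify services are up
```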

I am opening the link http://<master-public-ip>:8181/exhibitor/v1/ui/index.html and checking the status of the master servers. When all the status icons are green, I will be able to access the DC/OS web interface.

Before moving any further, I will first backup the installation files in case I need to add another agent node in the future.

I will copy the tar to another location to keep a backup.
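Packaging the build files into a tar and copying it somewhere safe might look like this; the backup path is arbitrary:

```shell
sudo tar cf dcos-install.tar -C genconf/serve .   # bundle the generated build files
cp dcos-install.tar ~/dcos-install-backup.tar     # keep a copy for future agent nodes
```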

Setting up For First Use

Next, I will open the web interface at the master IP. It will ask me to create an account using a Google, GitHub, or Microsoft account.

 

Once logged in, click the logo at the top left to open a menu, then click ‘Install CLI’. This shows platform-specific ways to install the CLI. For CoreOS, we will adapt the commands as follows.
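A sketch of the adapted commands; the download URL follows dcos.io’s published CLI binary layout, and ‘latest’ is an assumption you should pin to your cluster’s DC/OS version:

```shell
mkdir -p ~/bin
curl https://downloads.dcos.io/binaries/cli/linux/x86-64/latest/dcos -o ~/bin/dcos
chmod +x ~/bin/dcos
```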

 

I am removing the .bashrc file from my home directory, since it is read-only and cannot be edited. Then I copy the file from its original location back to home, which gives me an editable copy. I append the directory where I am going to put the dcos binary to PATH, then source .bashrc to update the environment.
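On Container Linux, ~/.bashrc is a symlink into the read-only /usr partition, so the sequence is roughly the following; the skel path is CoreOS’s default location for the original file, an assumption worth checking on your image:

```shell
rm ~/.bashrc                                        # drop the read-only symlink
cp /usr/share/skel/.bashrc ~/                       # restore an editable copy
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc     # where the dcos binary lives
source ~/.bashrc
```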

This will be displayed on your screen:

Go to the URL, sign in with your account, and paste the token provided there to continue.
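With the CLI on the PATH, pointing it at the cluster and logging in looks like this; on newer CLI versions a single `dcos cluster setup http://<master-ip>` replaces both steps:

```shell
dcos config set core.dcos_url http://<master-ip>   # master address placeholder
dcos auth login                                    # prints the URL and asks for the token
```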

Now, DC/OS is ready to be used from both the CLI and the GUI.

User Interface

We will take a glance over the UI:

  • On the Dashboard, we can see the status of the complete cluster: resources, tasks, services, component health, etc.
  • On the Services page, we can see all the service instances selected for deployment, along with their status, such as Running or Deploying.
  • The Jobs page lets us create one-off or scheduled jobs that perform tasks at a predefined interval.
  • The Catalog lets us choose and install tools such as Confluent Kafka, HDFS, Jenkins, Marathon, and Spark.
  • By clicking on Nodes (under Resources), we can see each node’s resource usage and health.
  • The Cluster Overview gives all the details regarding the cluster, from the version down to each of the technical details.
  • Components shows all the DC/OS components and their health.
  • In Settings -> Package Repositories, we can add or delete repositories for installing packages.
  • On the Organization page, we can add or remove users.

Adding another Node

Prepare the new node with “CoreOS – Container Linux 1235.12.0” from CloudSigma’s library. Log in to the bootstrap node.

I am copying the dcos-install.tar, which I created earlier, to the new node.

I am creating a directory to extract the installer files into, and unpacking the tar file there.
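The copy-and-unpack steps might look like this; the node IP is a placeholder, and /opt/dcos_install_tmp is simply a conventional scratch location:

```shell
scp dcos-install.tar core@<new-node-ip>:~       # run from the bootstrap node
ssh core@<new-node-ip>
sudo mkdir -p /opt/dcos_install_tmp             # scratch directory for the installer
sudo tar xf ~/dcos-install.tar -C /opt/dcos_install_tmp
```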

 

Finally, I will install DC/OS on the agent node.

For a private node, the installer is run with the ‘slave’ role; for a public node, with the ‘slave_public’ role.
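The installer script takes the node’s role as its argument; a sketch, assuming the tar was unpacked to /opt/dcos_install_tmp:

```shell
# Private agent:
sudo bash /opt/dcos_install_tmp/dcos_install.sh slave
# Public agent:
sudo bash /opt/dcos_install_tmp/dcos_install.sh slave_public
```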
 

To verify the agent node installation, we can confirm the number of public and private agents registered with the cluster.
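One way to do this is through the CLI; `dcos node` lists every connected agent, and public agents can be told apart by their `slave_public` role (counting them by grepping the output is an assumption about its format):

```shell
dcos node    # lists the ID, IP, and type of every agent; count public vs. private entries
```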
 

And we are good to go!
