Setting up DC/OS, the Distributed Operating System

According to dcos.io, DC/OS is a distributed operating system based on the Apache Mesos distributed systems kernel. It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

Put simply, DC/OS abstracts and manages machine resources such as CPU, memory, and storage, and presents many machines as a single system that pools all of those resources. This makes it easy to build and run fault-tolerant, elastic distributed systems.

Some of the features of DC/OS are:

  • Containerization: Workloads run in containers, which DC/OS creates and assigns to services and applications. It works with Docker and appc images.
  • Linear scalability: It can scale to tens of thousands of nodes.
  • High availability: Masters and agents are replicated and fault tolerant, with coordination handled by ZooKeeper.
  • APIs: HTTP APIs are available for developing new distributed applications, for operating the cluster, and for monitoring.
  • Web UI: Built-in Web UI for viewing cluster details and status.

Cluster Preparation

For the installation of DC/OS, I am creating three masters, three agents, and one bootstrap node.

For each of the master nodes, I am using the following configuration:
CPU: 10 GHz
RAM: 32 GB
SSD: 120 GB

For each of the agent nodes, I am using the following configuration:
CPU: 5 GHz
RAM: 16 GB
SSD: 60 GB

For the bootstrap node, I am using the following configuration:
CPU: 5 GHz
RAM: 16 GB
SSD: 60 GB

For each instance, I have cloned the image “CoreOS – Container Linux 1235.12.0” from CloudSigma’s library. There is a newer version available, but it is not officially supported. I have resized the drives as specified above.

Installation Prerequisites

I am logging into the bootstrap node with the username “core” and the SSH key specified while creating the instances. Ensure that the same SSH key is set on every instance; the DC/OS installer requires this.

On each of the machines, I am disabling automatic updates, since DC/OS is not currently compatible with every version of CoreOS. This stops update-engine, the service that automatically updates CoreOS to a recent version.
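On CoreOS this amounts to stopping and masking the update service; a sketch of the commands run on every node (masking as well, so the service does not come back after a reboot):

```shell
sudo systemctl stop update-engine    # stop the automatic updater
sudo systemctl mask update-engine    # prevent it from being started again
```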

On the bootstrap node, I am creating a folder ‘genconf’, where all the installation-related files will be placed over the course of the installation.

In the genconf folder, I am creating a file ‘ip-detect’, which should report the IP address of each node.

In this file, I am adding the following code. When run under bash, it prints the IP address of the particular node it is executed on.
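A minimal ip-detect sketch; it assumes the nodes’ primary interface is eth0, so adjust the interface name to match your machines:

```shell
# Create the genconf folder and write the ip-detect script into it
mkdir -p genconf
cat > genconf/ip-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit
# Print the first IPv4 address bound to eth0
ip addr show eth0 | grep -Eo '[0-9]{1,3}(\.[0-9]{1,3}){3}' | head -1
EOF
chmod +x genconf/ip-detect
```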

 
As a next step, I am creating another file ‘config.yaml’ in the genconf folder.

For your reference and information, YAML (YAML Ain’t Markup Language) is a human-readable data serialization language. It is commonly used for configuration files, but could be used in many applications where data is being stored (e.g. debugging output) or transmitted (e.g. document headers).

In this YAML file, we add configuration customized for our cluster environment. DC/OS uses it during installation to generate cluster-specific installation files.

For a detailed list of configuration parameters, refer to this link.

Note that the IP address of each node must be accessible from bootstrap node as well as from each other.
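A minimal config.yaml sketch for a cluster like this one; all IPs are placeholders, and static master discovery with the static Exhibitor backend is the usual choice for an on-premises install of this kind:

```yaml
---
bootstrap_url: http://<bootstrap-ip>:80
cluster_name: dcos-cluster
exhibitor_storage_backend: static
master_discovery: static
master_list:
- <master-ip-1>
- <master-ip-2>
- <master-ip-3>
agent_list:
- <agent-ip-1>
- <agent-ip-2>
- <agent-ip-3>
resolvers:
- 8.8.8.8
- 8.8.4.4
ssh_port: 22
ssh_user: core
```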

I will copy the common SSH private key to the genconf folder under the name ‘ssh_key’ and change its permissions to 600, meaning only the owner has read and write access to the file.
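A sketch of that step; the key path is an assumption, so use wherever your shared private key actually lives:

```shell
cp ~/.ssh/id_rsa genconf/ssh_key   # the cluster-wide private key (path assumed)
chmod 600 genconf/ssh_key          # owner read/write only
```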

Installation

Now that the configuration part is complete, I will move on to the installation part.

In this part, I will first download the DC/OS installer file. Before doing so, we need our data transfer limit increased, since a large transfer can lead to our IPs being blackholed to protect against DDoS attacks. One can ask CloudSigma to increase the limit over their 24/7 chat window; it is a very quick process.

The setup script extracts a Docker container that uses the generic DC/OS install files to create customized DC/OS build files for my cluster. The build files are output to ./genconf/serve/.
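The download-and-generate step looks roughly like this; the URL is the stable-channel installer location published on dcos.io, which you should verify against the version you want:

```shell
curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh
sudo bash dcos_generate_config.sh --genconf   # build files land in ./genconf/serve/
```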


Next, I will run the prerequisite installation script, which covers system updates, compression utilities (unzip, GNU tar, and XZ Utils), and cluster permissions.

Now that the prerequisites are installed, I will run the pre-flight script, which validates that the cluster is installable.

After the validation, I will run the deploy step, which installs DC/OS on the cluster.

After the installation, I am running the diagnostic (post-flight) script to verify that the services are up and running.
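The four stages above map onto flags of the same installer script; a sketch, run from the bootstrap node:

```shell
sudo bash dcos_generate_config.sh --install-prereqs   # prerequisites on all nodes
sudo bash dcos_generate_config.sh --preflight         # validate the cluster is installable
sudo bash dcos_generate_config.sh --deploy            # install DC/OS on every node
sudo bash dcos_generate_config.sh --postflight        # diagnostics: verify services are up
```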

I am opening the link http://<master-public-ip>:8181/exhibitor/v1/ui/index.html and checking the status of the master servers. When all the status icons are green, I will be able to access the DC/OS web interface.

Before moving any further, I will first backup the installation files in case I need to add another agent node in the future.

I will copy the tar to another location to keep a backup.
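Packaging the build files into a tar and copying it somewhere safe might look like this; the backup path is arbitrary:

```shell
sudo tar cf dcos-install.tar -C genconf/serve .   # bundle the generated build files
cp dcos-install.tar ~/dcos-install-backup.tar     # keep a copy for future agent nodes
```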

Setting up For First Use

Next, I will open the web interface at the master IP. It will ask me to create an account using a Google, GitHub, or Microsoft account.

 

Once logged in, click the logo at the top left to open a menu, then click ‘Install CLI’. This shows platform-specific ways to install the CLI. For CoreOS, we will adapt the commands as follows.
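A sketch of the adapted commands; the download URL follows dcos.io’s published CLI binary layout, and ‘latest’ is an assumption you should pin to your cluster’s DC/OS version:

```shell
mkdir -p ~/bin
curl https://downloads.dcos.io/binaries/cli/linux/x86-64/latest/dcos -o ~/bin/dcos
chmod +x ~/bin/dcos
```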

 

I am removing the .bashrc file from my home directory, since it is read-only and cannot be edited. Then I copy the file from its original location back to home, which gives me an editable copy. I append the directory where I am going to put the dcos binary to PATH, then source .bashrc to update the environment.
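On Container Linux, ~/.bashrc is a symlink into the read-only /usr partition, so the sequence is roughly the following; the skel path is CoreOS’s default location for the original file, an assumption worth checking on your image:

```shell
rm ~/.bashrc                                        # drop the read-only symlink
cp /usr/share/skel/.bashrc ~/                       # restore an editable copy
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc     # where the dcos binary lives
source ~/.bashrc
```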

This will be displayed on your screen:

Go to the URL, sign in with your account, and paste the token provided there to continue.
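With the CLI on the PATH, pointing it at the cluster and logging in looks like this; on newer CLI versions a single `dcos cluster setup http://<master-ip>` replaces both steps:

```shell
dcos config set core.dcos_url http://<master-ip>   # master address placeholder
dcos auth login                                    # prints the URL and asks for the token
```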

Now, DC/OS is ready to be used from both the CLI and the GUI.

User Interface

We will take a glance over the UI:

  • On the Dashboard, we can see the status of the complete cluster: resources, tasks, services, component health, etc.
  • On the Services page, we can see all the service instances selected for deployment, along with their status, such as Running or Deploying.
  • The Jobs page lets us create one-off or scheduled jobs that perform tasks at a predefined interval.
  • The Catalog lets us choose and install tools such as Confluent Kafka, HDFS, Jenkins, Marathon, and Spark.
  • By clicking on Nodes (under Resources), we can see each node’s resource usage and health.
  • The Cluster Overview gives all the details regarding the cluster, from the version down to each of the technical details.
  • Components shows all the DC/OS components and their health.
  • In Settings -> Package Repositories, we can add or delete repositories for installing packages.
  • On the Organization page, we can add or remove users.

Adding another Node

Prepare the new node with “CoreOS – Container Linux 1235.12.0” from CloudSigma’s library. Log in to the bootstrap node.

I am copying the dcos-install.tar, which I created earlier, to the new node.

I am creating a directory to extract the installer files into, and unpacking the tar file there.
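The copy-and-unpack steps might look like this; the node IP is a placeholder, and /opt/dcos_install_tmp is simply a conventional scratch location:

```shell
scp dcos-install.tar core@<new-node-ip>:~       # run from the bootstrap node
ssh core@<new-node-ip>
sudo mkdir -p /opt/dcos_install_tmp             # scratch directory for the installer
sudo tar xf ~/dcos-install.tar -C /opt/dcos_install_tmp
```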

 

Finally, I will install DC/OS on the agent node.

For a private node, the installer is run with the ‘slave’ role; for a public node, with the ‘slave_public’ role.
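The installer script takes the node’s role as its argument; a sketch, assuming the tar was unpacked to /opt/dcos_install_tmp:

```shell
# Private agent:
sudo bash /opt/dcos_install_tmp/dcos_install.sh slave
# Public agent:
sudo bash /opt/dcos_install_tmp/dcos_install.sh slave_public
```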
 

To verify the agent node installation, we can confirm the number of public and private agents registered with the cluster.
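One way to do this is through the CLI; `dcos node` lists every connected agent, and public agents can be told apart by their `slave_public` role (counting them by grepping the output is an assumption about its format):

```shell
dcos node    # lists the ID, IP, and type of every agent; count public vs. private entries
```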
 

And we are good to go!
