Installing Hadoop on a Single Node in Five Simple Steps

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect …
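To give a flavour of the "simple programming models" mentioned above, here is a minimal word-count sketch written for Hadoop Streaming, which lets Hadoop run plain Python scripts as the map and reduce phases. The file name, the HDFS paths, and the streaming jar location in the comments are assumptions for illustration, not values taken from the post.

```python
#!/usr/bin/env python3
# wordcount_streaming.py - a minimal Hadoop Streaming sketch (illustrative only).
# The same file is run as mapper ("map" argument) and reducer ("reduce" argument), e.g.:
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#     -files wordcount_streaming.py \
#     -mapper "wordcount_streaming.py map" -reducer "wordcount_streaming.py reduce" \
#     -input /input -output /output
# (the jar path and HDFS paths above are assumptions; adjust them for your install)
import sys
from itertools import groupby


def map_phase():
    # Emit "word<TAB>1" for every word read from standard input.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")


def reduce_phase():
    # Hadoop sorts mapper output by key, so identical words arrive grouped together.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")


if __name__ == "__main__":
    map_phase() if sys.argv[1:] == ["map"] else reduce_phase()
```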

Setting up DC/OS, the Distributed Operating System

According to dcos.io, DC/OS is a distributed operating system based on the Apache Mesos distributed systems kernel. It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate …

Host your own Git Repositories with GitLab

In this post, I am going to demonstrate the installation of GitLab. With GitLab, we can host our own repositories in a central place with all the convenience of Git's features. GitLab is the first single application for all stages of the DevOps lifecycle. Only GitLab enables Concurrent DevOps, unlocking organizations from the constraints of the toolchain. GitLab provides unmatched visibility, …
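As a taste of what a self-hosted instance gives you, the sketch below uses the python-gitlab client library to create a project over the API. The instance URL, access token, and project name are placeholders, not values from the post.

```python
# A minimal sketch using the python-gitlab client library (pip install python-gitlab).
# The URL and token below are placeholders for your own self-hosted instance.
import gitlab

gl = gitlab.Gitlab("https://gitlab.example.com", private_token="YOUR_ACCESS_TOKEN")
project = gl.projects.create({"name": "my-first-repo", "visibility": "private"})

# The returned object carries the clone URLs for the new repository.
print(project.ssh_url_to_repo)
print(project.http_url_to_repo)
```

From there, an ordinary `git remote add origin <ssh_url>` pointed at the printed URL is all a local repository needs to push to the new instance.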

Setting up a Blog Using Ghost

Ghost is a fully open source, adaptable platform for building and running modern online publications. With Ghost, setting up a blog is child's play. It gets even easier with CloudSigma's ready-made base library OS images. To start with, I am creating a machine on CloudSigma with 5 GHz of CPU and 8 GB of RAM. I am naming it “Ghost-Blog” and optimizing …

Realtime Twitter Data Ingestion using Flume

With more than 330 million active users, Twitter is one of the top platforms where people share their thoughts. Twitter data can be used for a variety of purposes, such as research, consumer insights, and demographic analysis. These insights are especially useful for businesses, as they allow for the analysis of large amounts of data …

Setting up a Big Data Cluster on Cloudera

CDH is Cloudera’s 100% open source platform distribution, including Apache Hadoop and built specifically to meet enterprise demands. CDH delivers everything you need for enterprise use right out of the box. By integrating Hadoop with more than a dozen other critical open source projects, Cloudera has created a functionally advanced system that helps you perform end-to-end Big Data workflows. (Source) …

Setting up a Big Data Cluster within Minutes in 3 Easy Steps

HDP is a truly secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust big data analytics that accelerate decision making and innovation. (Source: https://hortonworks.com/products/data-platforms/hdp/). I am going to install HDP to create a five-node cluster deployed on CloudSigma. …

CloudSigma How To Series: Drive Snapshots

In this tutorial from CloudSigma’s How-to Series, we walk you through one very important CloudSigma feature: snapshots! With this feature, you are able to create point-in-time snapshots of your drives, which can later be cloned and upgraded to create stand-alone drives. A snapshot can be created on demand while the server is running, thus in no way affecting the …
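For readers who prefer the API over the web interface, the snippet below sketches how a snapshot request might look against CloudSigma's API 2.0 using Python's requests library. The region URL, credentials, drive UUID, and the exact payload shape are all assumptions and should be checked against the official API documentation.

```python
# Hypothetical sketch of creating a drive snapshot via CloudSigma's API 2.0.
# The endpoint, payload shape, and credentials below are assumptions for
# illustration; consult the official API docs before relying on them.
import requests

API_BASE = "https://zrh.cloudsigma.com/api/2.0"   # placeholder region endpoint
AUTH = ("user@example.com", "your-password")      # placeholder credentials

payload = {"objects": [{"drive": "YOUR-DRIVE-UUID", "name": "pre-upgrade-snapshot"}]}

resp = requests.post(f"{API_BASE}/snapshots/", json=payload, auth=AUTH)
resp.raise_for_status()

# On success the API is expected to return the newly created snapshot object(s),
# including a UUID that can later be used to clone the snapshot into a drive.
for snapshot in resp.json().get("objects", []):
    print(snapshot.get("uuid"), snapshot.get("name"))
```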