Improving Data-intensive Applications by Moving Data and Computation into Mixed Cloud/Fog Environments

The DITAS project is a Research and Innovation Action funded by the European Commission as part of the Horizon2020 Programme. The project started in January 2017 and will conclude in December 2019. The consortium includes 5 industry partners (including CloudSigma) and 3 research organisations from 6 European countries. The coordinator of the DITAS project is Dr. David García Pérez, from ATOS Spain.

The main objective of the DITAS project is to develop a framework that facilitates the deployment of data-intensive applications and services across complex, heterogeneous and distributed hybrid-cloud and multi-cloud environments according to predefined specifications, while simplifying data management by abstracting away the complexity of the underlying infrastructure.

The challenge

The Internet of Things is changing the way we generate and collect data, allowing us to create smart networks and implement predictive analytics and machine learning across an ever-growing array of smart devices and IT platforms. The amount of data being generated is huge, as is the potential for exploiting it. Industries such as health, manufacturing, smart cities, and smart grids can benefit greatly from the seemingly unlimited interconnections possible. However, the potential for exploiting such data is still largely untapped due to the distributed nature of these resources and their incompatibility with current processing and storage technologies. Fog Computing promises to fully exploit the potential of the edge of the network, combining traditional and smart devices. With this approach, data can be stored and processed closer to where they are produced and/or consumed, but can the same level of reliability and scalability be achieved in the fog as in the cloud?

The proposed solution

First of all, it should be understood that Fog Computing does not replace Cloud Computing, but rather complements it by allowing us to access and/or process time-sensitive data at the edge of the network, while leaving the heavy lifting to the cloud. Due to the sharp increase in smart devices connected to the Internet in recent years, actors at the edge of the network are no longer seen only as content consumers but also as content providers. Therefore, a mixed cloud/edge environment is envisaged, and this forms the basis for the DITAS project.

DITAS General Approach

The execution environment

CloudSigma leads Work Package 4, which covers the development of an Execution Environment designed to make decisions about data and computation movement. The execution environment consists of: (1) a data movement enactor that, based on the information collected by the monitoring system, selects the most suitable data movement techniques; (2) a distributed monitoring and analytics system that collects information about how the application behaves with respect to data management; (3) an execution engine that supports the execution and the adaptation, through computational movement, of data-intensive applications distributed among on-premises and cloud resources; and (4) an auditing and compliance framework that enforces data security and privacy policies across the DITAS architecture. The three main technical implementations in the project are as follows:
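To make the data movement enactor's role concrete, the following is a minimal sketch, under stated assumptions, of how such a component might combine monitoring input with a placement decision. All names, fields and the selection policy are invented for illustration and are not part of the actual DITAS codebase.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NodeMetrics:
    """Hypothetical monitoring snapshot for one cloud or edge node."""
    name: str
    latency_ms: float       # observed latency from the data consumer
    free_storage_gb: float  # spare capacity for a moved data set

def choose_movement_target(metrics: List[NodeMetrics],
                           dataset_size_gb: float) -> Optional[NodeMetrics]:
    """Pick the lowest-latency node that can hold the data set,
    mimicking how an enactor could act on monitoring information."""
    candidates = [m for m in metrics if m.free_storage_gb >= dataset_size_gb]
    if not candidates:
        return None  # no suitable target; leave the data where it is
    return min(candidates, key=lambda m: m.latency_ms)

nodes = [
    NodeMetrics("edge-gateway", latency_ms=8.0, free_storage_gb=2.0),
    NodeMetrics("cloud-eu", latency_ms=45.0, free_storage_gb=500.0),
]
# The edge node is closer but too small for a 10 GB data set,
# so the cloud node is chosen.
target = choose_movement_target(nodes, dataset_size_gb=10.0)
print(target.name)  # prints "cloud-eu"
```

A real enactor would of course weigh many more signals (network bandwidth, access patterns, privacy constraints), but the shape of the decision is the same: monitoring data in, placement choice out.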

DITAS SDK

The DITAS SDK provides extensions of popular tools such as Node-RED to define applications. The key feature of this tool is that it allows developers to design applications by specifying Virtual Data Containers (VDCs) and constraints/preferences for the Cloud and Edge resources to be exploited. Applications are then deployed satisfying all constraints, based on the developer's instructions and the degree of freedom given by the VDCs.
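As an illustration of the idea of constraints and preferences guiding deployment, here is a toy sketch in Python. The blueprint fields and resource attributes below are invented for this example and do not reflect the actual DITAS SDK schema.

```python
# Hypothetical application blueprint: a VDC plus placement constraints
# and an ordered preference over resource types.
blueprint = {
    "vdc": "patient-records",
    "constraints": {"max_latency_ms": 50, "location": "EU"},
    "preferences": ["edge", "cloud"],  # prefer edge, fall back to cloud
}

# Hypothetical catalogue of available Cloud and Edge resources.
resources = [
    {"name": "edge-node-a", "type": "edge",  "location": "EU", "latency_ms": 12},
    {"name": "cloud-us",    "type": "cloud", "location": "US", "latency_ms": 90},
    {"name": "cloud-eu",    "type": "cloud", "location": "EU", "latency_ms": 40},
]

def satisfies(resource, constraints):
    """A resource is eligible only if it meets every hard constraint."""
    return (resource["latency_ms"] <= constraints["max_latency_ms"]
            and resource["location"] == constraints["location"])

# Keep only resources satisfying the constraints, then order them by
# the developer's stated preference and deploy to the first one.
eligible = [r for r in resources if satisfies(r, blueprint["constraints"])]
eligible.sort(key=lambda r: blueprint["preferences"].index(r["type"]))
print(eligible[0]["name"])  # prints "edge-node-a"
```

The point is the separation of concerns the SDK aims for: the developer states *what* is acceptable, and the framework decides *where* to run.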

DITAS Virtual Data Containers

VDCs provide an abstraction layer for developers so they can focus solely on the data they want to use and why, without worrying about implementation details. With VDCs, applications can easily access the required data, in the desired format and with the proper quality level, rather than directly searching for and accessing them across various data infrastructure providers. At design time, VDCs allow developers to simply define data requirements, quality, and how important the data is. At run time, VDCs are responsible for providing the right data and satisfying those requirements while hiding the complex underlying infrastructure composed of different platforms, storage systems, and network capabilities.
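A toy sketch can make the abstraction tangible: the application asks for a data set by name and format, and the container hides which backend actually serves it. The class and method names here are invented for illustration, and the "backends" are in-memory records rather than real storage systems.

```python
import csv
import io
import json

class VirtualDataContainer:
    """Illustrative stand-in for a VDC: callers name the data and the
    format they want; everything behind get() is hidden from them."""

    def __init__(self):
        # In a real deployment these would be remote, heterogeneous
        # storage systems; here they are just dictionaries.
        self._sources = {
            "temperature-readings": [
                {"sensor": "s1", "celsius": 21.5},
                {"sensor": "s2", "celsius": 19.8},
            ],
        }

    def get(self, dataset: str, fmt: str = "json") -> str:
        """Return the requested data set in the caller's desired format."""
        rows = self._sources[dataset]
        if fmt == "json":
            return json.dumps(rows)
        if fmt == "csv":
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
            return buf.getvalue()
        raise ValueError(f"unsupported format: {fmt}")

vdc = VirtualDataContainer()
print(vdc.get("temperature-readings", fmt="csv"))
```

Swapping the storage backend, moving the data, or changing its location would leave this calling code untouched, which is exactly the decoupling the VDC concept is after.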

DITAS Execution Environment

The DITAS EE is based on our powerful execution engine, which is capable of managing a distributed architecture and taking care of data movement and computation while maintaining coordination with the other resources involved in the same application. The DITAS EE also has a monitoring system capable of checking the status of the execution, tracking data movements, and collecting all the data necessary for understanding the behaviour of the application.
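The monitoring side of this can be sketched as follows: collect per-request latencies for each node and flag a node whose recent average drifts past a threshold, which an execution engine could treat as a trigger for computational movement. The class, window size and threshold below are invented for illustration.

```python
from collections import defaultdict, deque

class Monitor:
    """Hypothetical sketch of a monitoring component that keeps a
    sliding window of latency samples per node."""

    def __init__(self, window: int = 5, threshold_ms: float = 100.0):
        self.window = window
        self.threshold_ms = threshold_ms
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, node: str, latency_ms: float) -> None:
        """Store one observed latency sample for a node."""
        self.samples[node].append(latency_ms)

    def needs_movement(self, node: str) -> bool:
        """True once a full window of samples averages above the threshold,
        i.e. a possible trigger for moving computation elsewhere."""
        s = self.samples[node]
        return len(s) == self.window and sum(s) / len(s) > self.threshold_ms

mon = Monitor(window=3, threshold_ms=50.0)
for latency in (40, 80, 90):
    mon.record("edge-gateway", latency)
print(mon.needs_movement("edge-gateway"))  # average 70 ms > 50 ms, prints True
```

A production monitoring system would track far richer metrics, but the loop is the same: observe, aggregate, and hand the execution engine a signal it can adapt on.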

So what does it mean for our customers?

The DITAS framework has the potential to impact many of our current customers, particularly those dealing with big data. One market segment in particular is satellite and ground-based Earth Observation (EO) data processing. CloudSigma hosts a number of large EO data sets, most notably up-to-date archives of Sentinel-1 and Sentinel-2. We use OpenStack Swift to provide object storage on top of our custom-built ZFS-based clustered storage solution. The DITAS platform could potentially be used by CloudSigma to distribute and replicate these big data sets across cloud locations in order to better serve customer usage patterns for specific subsets of this data. Should the DITAS VDC concept become a tangible part of OpenStack Swift's object distribution and replication framework, strengthened with per-location data subset relevance extracted from the monitoring and analytics components of the DITAS architecture, it would significantly benefit CloudSigma customers by offering low-latency, high-speed access to EO data sets, regardless of the computing location. It would also allow CloudSigma to better utilise spare archive storage infrastructure in each of our cloud locations by balancing capacity across clouds.

About Vanya Nikova

Vanya leads the Global Customer Development Team at CloudSigma. Besides this, she is responsible for a number of big data and big science partners and projects at CloudSigma. She holds a Master's in Business Administration from the University of Mannheim, Germany, and has 10 years of work experience in sales and consulting services.