DITAS

Improving Data-intensive Applications by Moving Data and Computation into Mixed Cloud/Fog Environments

The DITAS project is a Research and Innovation Action funded by the European Commission as part of the Horizon2020 Programme. The project started in January 2017 and will conclude in December 2019. The consortium includes 5 industry partners (including CloudSigma) and 3 research organisations from 6 European countries. The coordinator of the DITAS project is Dr. David García Pérez, from ATOS Spain.

The main objective of the DITAS project is to develop a framework that facilitates the deployment of data-intensive applications and services across complex, heterogeneous and distributed systems, hybrid-cloud and multi-cloud environments according to predefined specifications, while simplifying data management by abstracting away the complexity of the underlying infrastructure.

The challenge

The Internet of Things is changing the way we generate and collect data, allowing us to create smart networks and to implement predictive analytics and machine learning across an ever-growing array of smart devices and IT platforms. The amount of data it generates is huge, as is the potential for exploiting this data. Industries such as health, manufacturing, smart cities, and smart grids can benefit greatly from the seemingly unlimited interconnections possible. However, this potential is still hard to realize because the resources are distributed and are not well matched to current processing and storage technologies.

Fog Computing promises to fully exploit the potential of the edge of the network combining traditional devices and smart devices. With this approach we could store and process data closer to production or consumption, but can we achieve the same level of reliability and scalability in the fog as in the cloud?

The proposed solution

First of all, it should be understood that Fog Computing does not replace Cloud Computing, but rather complements it. It does this by allowing us to access and/or process time-sensitive data at the edge of the network, while leaving the heavy lifting to the cloud. Due to a sharp increase in the number of smart devices connected to the Internet in recent years, operators at the edge of the network are no longer considered only content consumers but also content providers. A mixed cloud/edge approach is therefore a viable option, and it forms the basis for the DITAS project.

DITAS General Approach

The execution environment

CloudSigma leads Work Package 4, which involves the development of an Execution Environment. This is designed to make decisions about moving data and computation. It consists of the following:

    1. A data movement enactor based on the information collected by the monitoring system. It can select the most suitable data movement techniques.
    2. A distributed monitoring and analytics system that can collect information about how the application behaves with respect to data management.
    3. An execution engine able to support the execution and the adaptation of data-intensive applications. This is done through computational movement, with the applications distributed among on-premises and cloud resources.
    4. An Auditing and Compliance framework which will enforce data security and privacy policies across the DITAS architecture.
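
To make the first two components more concrete, here is a minimal Python sketch of how an enactor might turn monitoring data into a data movement decision. It is not DITAS code: the `Metrics` fields, the `choose_movement` function, the thresholds, and the technique names are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical monitoring snapshot for one data set (not a DITAS type)."""
    latency_ms: float      # observed access latency seen by the consumer
    reads_per_min: float   # how often the data set is read
    size_gb: float         # size of the data set


def choose_movement(m: Metrics, max_latency_ms: float = 50.0) -> str:
    """Pick a data movement technique from monitoring data.

    The rules and thresholds are illustrative assumptions only.
    """
    if m.latency_ms <= max_latency_ms:
        return "none"                # latency requirement already met
    if m.size_gb <= 1.0:
        return "replicate_to_edge"   # small enough to copy near the consumer
    if m.reads_per_min >= 100:
        return "cache_hot_subset"    # large but hot: cache only what is read
    return "move_computation"        # large and cold: send the code to the data


# A large, rarely read data set with poor latency
print(choose_movement(Metrics(latency_ms=120, reads_per_min=5, size_gb=500)))
# prints "move_computation"
```

The last branch reflects the idea behind the execution engine: when moving the data is too costly, moving the computation can be the better option.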

The three main technical implementations in the project are as follows:

DITAS SDK

The DITAS SDK provides extensions of popular tools such as Node-RED to define applications. Its key feature is that it lets developers design applications by specifying Virtual Data Containers (VDCs) together with the constraints and preferences for the Cloud and Edge resources to be exploited. Applications are then deployed so that all constraints are satisfied, based on the developer's instructions and the degree of freedom given by the VDCs.

DITAS Virtual Data Containers

VDCs provide an abstraction layer for developers so they can focus only on data, what they want to use and why, without worrying about implementation details. With VDCs, applications can easily access the required data, in the desired format and with the proper quality level, rather than directly searching for and accessing it among various data infrastructure providers. At design-time, VDCs allow developers to simply define data requirements, quality, and how important the data is. At run-time, VDCs are responsible for providing the right data and satisfying requirements by hiding the complex underlying infrastructure composed of different platforms, storage systems, and network capabilities.
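
The split between design-time and run-time can be sketched in a few lines of Python. This is not the DITAS VDC interface: the `Source`, `Requirement`, and `VirtualDataContainer` names, the single `completeness` quality metric, and the first-match selection rule are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Source:
    """Hypothetical backing store hidden behind the container."""
    name: str
    completeness: float                # fraction of expected records available
    fetch: Callable[[], List[Dict]]    # returns raw records


@dataclass
class Requirement:
    """Design-time statement of what the application needs."""
    fields: Tuple[str, ...]
    min_completeness: float


class VirtualDataContainer:
    """Illustrative container that hides which source serves a request."""

    def __init__(self, sources: List[Source]):
        self.sources = sources

    def get(self, req: Requirement) -> List[Dict]:
        # Run-time: use the first source that meets the quality requirement
        # and return only the fields the application asked for.
        for src in self.sources:
            if src.completeness >= req.min_completeness:
                return [{f: row.get(f) for f in req.fields} for row in src.fetch()]
        raise LookupError("no source satisfies the requirement")


edge = Source("edge-cache", 0.6, lambda: [{"id": 1, "temp": 20.5, "raw": "x"}])
cloud = Source("cloud-archive", 0.99, lambda: [
    {"id": 1, "temp": 20.5, "raw": "x"},
    {"id": 2, "temp": 21.0, "raw": "y"},
])

vdc = VirtualDataContainer([edge, cloud])
rows = vdc.get(Requirement(fields=("id", "temp"), min_completeness=0.9))
print(rows)
```

The application states only which fields it needs and the quality it expects; which source answers the request stays an internal detail of the container.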

DITAS Execution Environment

The DITAS EE is based on our powerful execution engine, which is capable of managing a distributed architecture and taking care of data movement and computation while maintaining coordination with other resources involved in the same application. The DITAS EE also has a monitoring system capable of checking the status of the execution, tracking data movements, and collecting all the data necessary for understanding the behavior of the application.

So what does it mean for our customers?

The DITAS framework has the potential to impact many of our current customers, particularly those dealing with big data. One market segment in particular is processors of satellite and ground-based Earth Observation (EO) data. CloudSigma hosts a number of large EO data sets, most notably up-to-date archives of Sentinel-1 and Sentinel-2. We use OpenStack Swift to provide object storage atop our custom-built ZFS-based clustered storage solution. The DITAS platform could help CloudSigma distribute and replicate these big data sets across cloud locations, so that they better match customer usage patterns for specific subsets of this data.

Should the DITAS VDC concept become a tangible part of OpenStack Swift's object distribution and replication framework, strengthened with data subset relevance per cloud location extracted from the monitoring and analytics components of the DITAS architecture, CloudSigma customers would benefit significantly from low-latency, high-speed access to EO data sets, regardless of the computing location. It would also allow CloudSigma to better utilize spare archive storage in each cloud location by balancing capacity across clouds.
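
As a rough illustration of what "data subset relevance per cloud location" could mean, the Python sketch below counts requests per location and picks the most requested subsets as replication candidates. It does not use any OpenStack Swift or DITAS API: the `plan_replicas` function, the shape of the access log, and the location and subset names are hypothetical.

```python
from collections import Counter, defaultdict


def plan_replicas(access_log, top_n=2):
    """Return, per cloud location, the top_n most requested data subsets.

    access_log is an iterable of (location, subset_id) pairs; this shape
    is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for location, subset_id in access_log:
        counts[location][subset_id] += 1
    return {
        loc: [subset for subset, _ in c.most_common(top_n)]
        for loc, c in counts.items()
    }


# Hypothetical access records gathered by a monitoring component
log = [
    ("location-a", "tile-1"), ("location-a", "tile-1"), ("location-a", "tile-2"),
    ("location-b", "tile-2"), ("location-b", "tile-2"), ("location-b", "tile-2"),
    ("location-b", "tile-3"),
]

print(plan_replicas(log, top_n=1))
# prints {'location-a': ['tile-1'], 'location-b': ['tile-2']}
```

A real placement policy would also have to weigh storage capacity and transfer cost per location, which this sketch ignores.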

About Peter Gray

Peter is a Project Manager working predominantly on H2020 Framework Research and Innovation projects facilitated by the European Union and the Swiss State Secretariat for Education, Research and Innovation (SERI). His background is in the cultural heritage sector, where he has managed a number of large-scale digitisation projects for various museums, national archives, and libraries.