A Big Data Placement Strategy in Geographically Distributed Datacenters - Recherche en informatique (CRI)
Conference paper. Year: 2020

Abstract

With the pervasiveness of "Big Data" together with the expansion of geographically distributed datacenters in the Cloud computing context, processing large-scale data applications has become a crucial issue. Indeed, finding the most efficient way to store massive data across distributed locations is increasingly complex. Furthermore, the execution time of a task that requires several datasets might be dominated by the cost of data migrations and exchanges, which depends on the initial placement of the input datasets over the set of datacenters in the Cloud and on the dynamic data management strategy. In this paper, we propose a data placement strategy that improves workflow execution time by reducing the cost of data movements between geographically distributed datacenters, considering characteristics such as storage capacity and read/write speeds. We formalize the overall problem and then propose a data placement algorithm structured in two phases. First, we compute the estimated transfer time to move all involved datasets from their respective locations to the one where the corresponding tasks are executed. Second, we apply a greedy algorithm to assign each dataset to the optimal datacenter with respect to the overall cost of data migrations. The heterogeneity of the datacenters and their characteristics (storage and bandwidth) are both taken into account. Our experiments are conducted using the CloudSim simulator. The results show that the proposed strategy produces an efficient placement and reduces the overhead of data movement compared to both a random assignment and a placement algorithm selected from the literature.
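The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the cost model (dataset size divided by link bandwidth, zero for local access), the largest-dataset-first greedy order, and every name used here (`Datacenter`, `estimate_costs`, `greedy_placement`, the sample sizes and bandwidths) are assumptions made for illustration only. Read/write speeds, which the paper also considers, are left out.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    free_storage: float  # GB still available (assumed unit)


def estimate_costs(sizes, tasks, datacenters, bandwidth):
    """Phase 1: estimated time to ship each dataset from each candidate
    datacenter to every task site that reads it (assumed cost model)."""
    costs = {}
    for dataset, size in sizes.items():
        costs[dataset] = {}
        for dc in datacenters:
            total = 0.0
            for task_site, inputs in tasks:
                # Local reads are treated as free; remote reads cost size / bandwidth.
                if dataset in inputs and task_site != dc.name:
                    total += size / bandwidth[(dc.name, task_site)]
            costs[dataset][dc.name] = total
    return costs


def greedy_placement(sizes, costs, datacenters):
    """Phase 2: place datasets one by one (largest first, an assumed order)
    on the cheapest datacenter that still has room."""
    free = {dc.name: dc.free_storage for dc in datacenters}
    placement = {}
    for dataset in sorted(sizes, key=lambda name: sizes[name], reverse=True):
        candidates = [c for c in free if free[c] >= sizes[dataset]]
        if not candidates:
            raise ValueError(f"no datacenter can store {dataset}")
        best = min(candidates, key=lambda c: costs[dataset][c])
        placement[dataset] = best
        free[best] -= sizes[dataset]
    return placement


# Hypothetical input: two datacenters, two datasets, two tasks.
datacenters = [Datacenter("A", 100.0), Datacenter("B", 50.0)]
bandwidth = {("A", "B"): 1.0, ("B", "A"): 2.0}  # GB/s per directed link
sizes = {"d1": 60.0, "d2": 40.0}                # GB
tasks = [("A", {"d1", "d2"}), ("B", {"d2"})]    # (execution site, input datasets)

costs = estimate_costs(sizes, tasks, datacenters, bandwidth)
placement = greedy_placement(sizes, costs, datacenters)
print(placement)  # {'d1': 'A', 'd2': 'B'}
```

In this toy input, `d1` does not fit in datacenter B, so it goes to A, where its only consumer runs. `d2` fits in both, and B is cheaper because the B-to-A link is assumed to be twice as fast as the A-to-B link. A greedy pass of this kind gives no optimality guarantee in general; it only picks the cheapest feasible site at each step.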
Main file
A-735.pdf (369.64 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02958822, version 1 (06-10-2020)

Identifiers

  • HAL Id: hal-02958822, version 1

Cite

Laila Bouhouch, Mostapha Zbakh, Claude Tadonki. A Big Data Placement Strategy in Geographically Distributed Datacenters. The 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications, May 2020, Marrakesh, Morocco. ⟨hal-02958822⟩
