Abstract
Computing clusters are widely deployed for scientific and engineering applications to support intensive computation and massive data operations. Because applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of longer job response times or unnecessarily higher resource usage. In this paper, we explore concurrent replication with canceling, a fault-tolerance approach in which jobs and their replicas are processed concurrently, and the successful completion of a job or any of its replicas triggers the removal of the remaining copies. We propose a stochastic model to study how this approach affects the cluster's service level objectives (SLOs), particularly the offered response time percentiles. In addition to the expected gains in reliability, the proposed model allows us to determine the utilization regions in which introducing replication with canceling effectively reduces response times. Moreover, we show how this model can support resource provisioning decisions with reliability and response time guarantees.
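The idea of concurrent replication with canceling can be illustrated with a small Monte Carlo sketch. This is not the stochastic model proposed in the paper: it assumes exponentially distributed service times, independent failures per copy, and no queueing, so it cannot reproduce the utilization-dependent trade-off the abstract mentions. The function names and parameters (`response_with_canceling`, `simulate`, `fail_prob`, `mean_service`) are hypothetical and chosen only for illustration.

```python
import random


def response_with_canceling(service_times, failed):
    """Completion time of the first successful copy, or None if every copy fails.

    Once one copy succeeds, the remaining copies are treated as canceled,
    so only the fastest successful copy determines the response time.
    """
    ok = [t for t, f in zip(service_times, failed) if not f]
    return min(ok) if ok else None


def simulate(num_jobs, copies, fail_prob, mean_service, seed=0):
    """Return (fraction of jobs lost, empirical 95th-percentile response time).

    Assumptions (not from the paper): exponential service times,
    independent failures per copy, and no waiting in a queue.
    """
    rng = random.Random(seed)
    times = []
    lost = 0
    for _ in range(num_jobs):
        service = [rng.expovariate(1.0 / mean_service) for _ in range(copies)]
        failed = [rng.random() < fail_prob for _ in range(copies)]
        t = response_with_canceling(service, failed)
        if t is None:
            lost += 1
        else:
            times.append(t)
    times.sort()
    p95 = times[int(0.95 * (len(times) - 1))] if times else None
    return lost / num_jobs, p95


if __name__ == "__main__":
    for copies in (1, 2):
        loss, p95 = simulate(10_000, copies, fail_prob=0.1, mean_service=1.0)
        print(f"copies={copies} loss={loss:.4f} p95={p95:.3f}")
```

Because queueing is ignored here, adding copies can only help in this sketch. In a real cluster the extra copies also add load, which is why the utilization regions where replication pays off have to be determined by a model such as the one the paper proposes.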
Translated title of the contribution | Mejora de la fiabilidad y los tiempos de respuesta mediante la replicación en clústeres informáticos |
---|---|
Original language | English (US) |
Title of host publication | 2015 IEEE Conference on Computer Communications, IEEE INFOCOM 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1355-1363 |
Number of pages | 9 |
Volume | 26 |
ISBN (Electronic) | 9781479983810 |
State | Published - Aug 21 2015 |
Externally published | Yes |
Event | 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015 - Hong Kong, Hong Kong Duration: Apr 26 2015 → May 1 2015 |
Conference
Conference | 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 4/26/15 → 5/1/15 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- Electrical and Electronic Engineering