Evaluating the effectiveness of replication for tail-tolerance

Translated title of the contribution: Evaluación de la eficacia de la réplica para la tolerancia de la cola

Zhan Qiu, Juan F. Perez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with cancelling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and cancelling all other replicas. We propose a stochastic model that considers any number of replicas, general processing and inter-arrival times, and computes the response time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits highly depend on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas, and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.
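The mechanism described in the abstract (run a request and its replicas concurrently, keep the first result, cancel the rest) can be illustrated with a minimal sketch. This is not the authors' stochastic model or code: the names `replica` and `replicate_with_cancel` are hypothetical, and processing time is simulated with `asyncio.sleep` rather than real work on a cluster.

```python
import asyncio


async def replica(replica_id: int, delay: float) -> int:
    """Hypothetical replica: 'processes' for `delay` seconds, then returns its id."""
    await asyncio.sleep(delay)  # stand-in for the real processing time (assumption)
    return replica_id


async def replicate_with_cancel(delays: list[float]) -> tuple[int, int]:
    """Start one replica per delay and return (winner id, number cancelled)."""
    tasks = [asyncio.create_task(replica(i, d)) for i, d in enumerate(delays)]

    # Block until the first replica completes.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    # Cancel every replica that is still running.
    for task in pending:
        task.cancel()
    # Let the cancellations be processed before returning.
    await asyncio.gather(*pending, return_exceptions=True)

    winner = next(iter(done)).result()
    return winner, len(pending)


if __name__ == "__main__":
    winner, cancelled = asyncio.run(replicate_with_cancel([0.2, 0.01, 0.3]))
    print(f"replica {winner} finished first; {cancelled} replicas cancelled")
```

The sketch ignores queueing and the extra load that replicas place on the cluster, which are the effects the paper's model is said to account for.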
Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 443-452
Number of pages: 10
ISBN (Electronic): 9781479980062
DOI: https://doi.org/10.1109/CCGrid.2015.22
State: Published - Jan 1 2015
Externally published: Yes
Event: 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015 - Shenzhen
Duration: May 4 2015 to May 7 2015

Conference

Conference: 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Country: China
City: Shenzhen
Period: 5/4/15 to 5/7/15

Fingerprint

Cluster computing
Processing
Stochastic models
Costs

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Networks and Communications
  • Software

Cite this

Qiu, Z., & Perez, J. F. (2015). Evaluating the effectiveness of replication for tail-tolerance. In Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015 (pp. 443-452). [7152510] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CCGrid.2015.22
Qiu, Zhan ; Perez, Juan F. / Evaluating the effectiveness of replication for tail-tolerance. Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 443-452
@inproceedings{24d21b8a7dd8451ab3f3e6af3f1a6b65,
title = "Evaluating the effectiveness of replication for tail-tolerance",
abstract = "Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with cancelling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and cancelling all other replicas. We propose a stochastic model that considers any number of replicas, general processing and inter-arrival times, and computes the response time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits highly depend on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas, and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.",
author = "Zhan Qiu and Perez, {Juan F.}",
year = "2015",
month = "1",
day = "1",
doi = "10.1109/CCGrid.2015.22",
language = "English (US)",
pages = "443--452",
booktitle = "Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

Qiu, Z & Perez, JF 2015, Evaluating the effectiveness of replication for tail-tolerance. In Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015, 7152510, Institute of Electrical and Electronics Engineers Inc., pp. 443-452, Shenzhen, 5/4/15. https://doi.org/10.1109/CCGrid.2015.22

Evaluating the effectiveness of replication for tail-tolerance. / Qiu, Zhan; Perez, Juan F.

Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 443-452 7152510.

TY - GEN

T1 - Evaluating the effectiveness of replication for tail-tolerance

AU - Qiu, Zhan

AU - Perez, Juan F.

PY - 2015/1/1

Y1 - 2015/1/1

AB - Computing clusters (CC) are a cost-effective high-performance platform for computation-intensive scientific and engineering applications. A key challenge in managing CCs is to consistently achieve low response times. In particular, tail-tolerant methods aim to keep the tail of the response-time distribution short. In this paper we explore concurrent replication with cancelling, a tail-tolerant approach that involves processing requests and their replicas concurrently, retrieving the result from the first replica that completes, and cancelling all other replicas. We propose a stochastic model that considers any number of replicas, general processing and inter-arrival times, and computes the response time distribution. We show that replication can be very effective in keeping the response-time tail short, but these benefits highly depend on the processing-time distribution, as well as on the CC utilization and the statistical characteristics of the arrival process. We also exploit the model to support the selection of the optimal number of replicas, and a resource provisioning strategy that meets service-level objectives on the response-time percentiles.

UR - http://www.scopus.com/inward/record.url?scp=84941210218&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941210218&partnerID=8YFLogxK

U2 - 10.1109/CCGrid.2015.22

DO - 10.1109/CCGrid.2015.22

M3 - Conference contribution

SP - 443

EP - 452

BT - Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Qiu Z, Perez JF. Evaluating the effectiveness of replication for tail-tolerance. In Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 443-452. 7152510 https://doi.org/10.1109/CCGrid.2015.22