Cutting Latency Tail

Analyzing and Validating Replication without Canceling

Título traducido de la contribución: latencia de corte de la cola: Análisis y validación de la reproducción sin cancelar

Zhan Qiu, Juan F. Perez, Robert Birke, Lydia Chen, Peter G. Harrison

Resultado de la investigación: Contribución a RevistaArtículo

4 Citas (Scopus)

Resumen

La variabilidad del tiempo de respuesta en las aplicaciones de software puede degradar gravemente la calidad de la experiencia del usuario. Para reducir esta variabilidad, la replicación de solicitudes surge como una solución efectiva al generar múltiples copias de cada solicitud y utilizar el resultado de la primera que se complete. La mayoría de los estudios anteriores se han centrado principalmente en la latencia media de los sistemas que implementan la cancelación de réplicas, es decir, todas las réplicas de una solicitud se cancelan una vez que termina la primera. En cambio, desarrollamos modelos para obtener la distribución del tiempo de respuesta para sistemas en los que la cancelación de réplicas puede ser demasiado costosa o inviable de implementar, como en los sistemas "rápidos", como los servicios web, o en los sistemas heredados. Además, introducimos un nuevo modelo de servicio para considerar explícitamente la correlación en los tiempos de procesamiento de las réplicas de las solicitudes, y diseñamos un algoritmo eficiente para parametrizar el modelo a partir de datos reales. Las evaluaciones exhaustivas de un benchmark de MATLAB y una aplicación web de tres niveles (MediaWiki) muestran una precisión notable, por ejemplo, un error medio del 7 (4 por ciento) en el tiempo de respuesta del percentil 99 para el benchmark (respectivamente, MediaWiki), cuyas peticiones se ejecutan en el orden de los segundos (respectivamente, milisegundos). De este modo, se obtiene información sobre los niveles óptimos de replicación a partir de este análisis cuantitativo preciso, bajo una amplia variedad de escenarios de sistema.
Idioma originalEnglish (US)
Número de artículo7932099
Páginas (desde-hasta)3128-3141
Número de páginas14
PublicaciónIEEE Transactions on Parallel and Distributed Systems
Volumen28
N.º11
DOI
EstadoPublished - nov 1 2017
Publicado de forma externa

Huella dactilar

Response time (computer systems)
Legacy systems
Application programs
Web services
MATLAB
Processing
Chemical analysis

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Citar esto

Qiu, Zhan ; Perez, Juan F. ; Birke, Robert ; Chen, Lydia ; Harrison, Peter G. / Cutting Latency Tail : Analyzing and Validating Replication without Canceling. En: IEEE Transactions on Parallel and Distributed Systems. 2017 ; Vol. 28, N.º 11. pp. 3128-3141.
@article{d704b62d690b4ad9b85bff436aa24344,
title = "Cutting Latency Tail: Analyzing and Validating Replication without Canceling",
abstract = "Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.",
author = "Zhan Qiu and Perez, {Juan F.} and Robert Birke and Lydia Chen and Harrison, {Peter G.}",
year = "2017",
month = "11",
day = "1",
doi = "10.1109/TPDS.2017.2706268",
language = "English (US)",
volume = "28",
pages = "3128--3141",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "11",

}

Cutting Latency Tail : Analyzing and Validating Replication without Canceling. / Qiu, Zhan; Perez, Juan F.; Birke, Robert; Chen, Lydia; Harrison, Peter G.

En: IEEE Transactions on Parallel and Distributed Systems, Vol. 28, N.º 11, 7932099, 01.11.2017, p. 3128-3141.

Resultado de la investigación: Contribución a RevistaArtículo

TY - JOUR

T1 - Cutting Latency Tail

T2 - Analyzing and Validating Replication without Canceling

AU - Qiu, Zhan

AU - Perez, Juan F.

AU - Birke, Robert

AU - Chen, Lydia

AU - Harrison, Peter G.

PY - 2017/11/1

Y1 - 2017/11/1

N2 - Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.

AB - Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.

UR - http://www.scopus.com/inward/record.url?scp=85032457020&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032457020&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2017.2706268

DO - 10.1109/TPDS.2017.2706268

M3 - Article

VL - 28

SP - 3128

EP - 3141

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 11

M1 - 7932099

ER -