Cutting Latency Tail: Analyzing and Validating Replication without Canceling

Zhan Qiu, Juan F. Perez, Robert Birke, Lydia Chen, Peter G. Harrison

Research output: Contribution to journalArticle

5 Citations (Scopus)

Abstract

Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.

Original languageEnglish (US)
Article number7932099
Pages (from-to)3128-3141
Number of pages14
JournalIEEE Transactions on Parallel and Distributed Systems
Volume28
Issue number11
DOIs
StatePublished - Nov 1 2017
Externally publishedYes

Fingerprint

Response time (computer systems)
Legacy systems
Application programs
Web services
MATLAB
Processing
Chemical analysis

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Qiu, Zhan ; Perez, Juan F. ; Birke, Robert ; Chen, Lydia ; Harrison, Peter G. / Cutting Latency Tail : Analyzing and Validating Replication without Canceling. In: IEEE Transactions on Parallel and Distributed Systems. 2017 ; Vol. 28, No. 11. pp. 3128-3141.
@article{d704b62d690b4ad9b85bff436aa24344,
title = "Cutting Latency Tail: Analyzing and Validating Replication without Canceling",
abstract = "Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.",
author = "Zhan Qiu and Perez, {Juan F.} and Robert Birke and Lydia Chen and Harrison, {Peter G.}",
year = "2017",
month = "11",
day = "1",
doi = "10.1109/TPDS.2017.2706268",
language = "English (US)",
volume = "28",
pages = "3128--3141",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "11",

}

Cutting Latency Tail : Analyzing and Validating Replication without Canceling. / Qiu, Zhan; Perez, Juan F.; Birke, Robert; Chen, Lydia; Harrison, Peter G.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 11, 7932099, 01.11.2017, p. 3128-3141.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Cutting Latency Tail

T2 - Analyzing and Validating Replication without Canceling

AU - Qiu, Zhan

AU - Perez, Juan F.

AU - Birke, Robert

AU - Chen, Lydia

AU - Harrison, Peter G.

PY - 2017/11/1

Y1 - 2017/11/1

N2 - Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.

AB - Response time variability in software applications can severely degrade the quality of the user experience. To reduce this variability, request replication emerges as an effective solution by spawning multiple copies of each request and using the result of the first one to complete. Most previous studies have mainly focused on the mean latency for systems implementing replica cancellation, i.e., all replicas of a request are canceled once the first one finishes. Instead, we develop models to obtain the response-time distribution for systems where replica cancellation may be too expensive or infeasible to implement, as in 'fast' systems, such as web services, or in legacy systems. Furthermore, we introduce a novel service model to explicitly consider correlation in the processing times of the request replicas, and design an efficient algorithm to parameterize the model from real data. Extensive evaluations on a MATLAB benchmark and a three-tier web application (MediaWiki) show remarkable accuracy, e.g., 7 (4 percent) average error on the 99th percentile response time for the benchmark (respectively, MediaWiki), the requests of which execute in the order of seconds (respectively, milliseconds). Insights into optimal replication levels are thereby gained from this precise quantitative analysis, under a wide variety of system scenarios.

UR - http://www.scopus.com/inward/record.url?scp=85032457020&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032457020&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2017.2706268

DO - 10.1109/TPDS.2017.2706268

M3 - Article

AN - SCOPUS:85032457020

VL - 28

SP - 3128

EP - 3141

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 11

M1 - 7932099

ER -