Variability-aware request replication for latency curtailment

Zhan Qiu, Juan F. Pérez, Peter G. Harrison

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

Processing time variability is commonplace in distributed systems, where resources display disparate performance due to, e.g., different workload levels, background processes, and contention in virtualized environments. However, it is paramount for service providers to keep variability in response time under control in order to offer responsive services. We investigate how request replication can be used to exploit processing time variability to reduce response times, considering not only mean values but also the tail of the response time distribution. We focus on the distributed setup, where replication is achieved by running copies of requests on multiple servers that otherwise evolve independently, and waiting for the first replica to complete service. We construct models that capture the evolution of a system with replicated requests using approximate methods and observe that highly variable service times offer the best opportunities for replication, reducing the response time tail in particular. Further, the effect of replication is non-uniform over the response time distribution: gains in one metric, e.g., the mean, can come at the cost of another, e.g., the tail percentiles. This is demonstrated in a wide range of numerical virtual experiments. Capturing service time variability is thus key to both the evaluation and the design of latency tolerance strategies.
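
The "wait for the first replica" idea can be illustrated with a minimal sketch. It is not the model from the paper: it ignores queueing and the extra load that replicas place on the servers, which the abstract says can turn a gain in one metric into a loss in another. It only compares the 99th percentile of the fastest of d independent replicas for two hypothetical service-time distributions with the same mean. All names and parameter values below are assumptions chosen for illustration.

```python
import math

def survival(t, probs, rates):
    """P(service time > t) for a mixture of exponential phases."""
    return sum(p * math.exp(-r * t) for p, r in zip(probs, rates))

def quantile_first_of(d, q, probs, rates):
    """q-quantile of the fastest of d i.i.d. replicas, found by bisection.

    The minimum of d independent copies exceeds t with probability
    survival(t) ** d, so we solve survival(t) ** d = 1 - q for t.
    """
    lo, hi = 0.0, 1.0
    # Grow the upper bound until it brackets the quantile.
    while survival(hi, probs, rates) ** d > 1.0 - q:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if survival(mid, probs, rates) ** d > 1.0 - q:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical distributions (assumptions), both with mean service time 1.0.
exponential = ([1.0], [1.0])
# Mean = 0.9 / 1.8 + 0.1 / 0.2 = 1.0, but 10% of requests are very slow.
high_variability = ([0.9, 0.1], [1.8, 0.2])

if __name__ == "__main__":
    cases = [("exponential", exponential), ("high variability", high_variability)]
    for name, (probs, rates) in cases:
        for d in (1, 2):
            p99 = quantile_first_of(d, 0.99, probs, rates)
            print(f"{name:>16}, replicas={d}: p99 service time = {p99:.2f}")
```

Under these assumptions, adding a second replica halves the 99th percentile in the exponential case and shrinks it by a larger factor in the high-variability case, in line with the abstract's observation that highly variable service times offer the best opportunities for replication.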

Original language: English (US)
Title of host publication: IEEE INFOCOM 2016 - 35th Annual IEEE International Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: 2016-July
ISBN (Electronic): 9781467399531
DOIs: https://doi.org/10.1109/INFOCOM.2016.7524365
State: Published - Jul 27 2016
Externally published: Yes
Event: 35th Annual IEEE International Conference on Computer Communications, IEEE INFOCOM 2016 - San Francisco, United States
Duration: Apr 10 2016 to Apr 14 2016

Conference

Conference: 35th Annual IEEE International Conference on Computer Communications, IEEE INFOCOM 2016
Country: United States
City: San Francisco
Period: 4/10/16 to 4/14/16

Fingerprint

Processing
Servers
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this

Qiu, Z., Pérez, J. F., & Harrison, P. G. (2016). Variability-aware request replication for latency curtailment. In IEEE INFOCOM 2016 - 35th Annual IEEE International Conference on Computer Communications (Vol. 2016-July). [7524365] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM.2016.7524365
@inproceedings{41d357b44ccb42f3aa8ea38a1b1502df,
title = "Variability-aware request replication for latency curtailment",
author = "Zhan Qiu and P{\'e}rez, {Juan F.} and Harrison, {Peter G.}",
year = "2016",
month = "7",
day = "27",
doi = "10.1109/INFOCOM.2016.7524365",
language = "English (US)",
volume = "2016-July",
booktitle = "IEEE INFOCOM 2016 - 35th Annual IEEE International Conference on Computer Communications",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Variability-aware request replication for latency curtailment

AU - Qiu, Zhan

AU - Pérez, Juan F.

AU - Harrison, Peter G.

PY - 2016/7/27

Y1 - 2016/7/27

UR - http://www.scopus.com/inward/record.url?scp=84983362827&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84983362827&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM.2016.7524365

DO - 10.1109/INFOCOM.2016.7524365

M3 - Conference contribution

AN - SCOPUS:84983362827

VL - 2016-July

BT - IEEE INFOCOM 2016 - 35th Annual IEEE International Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
