Managing Response Time Tails by Sharding

P. G. Harrison, N. M. Patel, J. F. Perez, Z. Qiu

Resultado de la investigación: Contribución a RevistaArtículo

Resumen

Matrix analytic methods are developed to compute the probability distribution of response times (i.e., data access times) in distributed storage systems protected by erasure coding, which is implemented by sharding a data object into N fragments, only K<; N of which are required to reconstruct the object. This leads to a partial-fork-join model with a choice of canceling policies for the redundant N−K tasks. The accuracy of the analytical model is supported by tests against simulation in a broad range of setups. At increasing workload intensities, numerical results show the extent to which increasing the redundancy level reduces the mean response time of storage reads and significantly flattens the tail of their distribution; this is demonstrated at medium-high quantiles, up to the 99th. The quantitative reduction in response time achieved by two policies for canceling redundant tasks is also shown: for cancel-at-finish and cancel-at-start, which limits the additional load introduced whilst losing the benefit of selectivity amongst fragment service times.
Idioma originalEnglish
Número de artículo5
PublicaciónACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS
Volumen4
N.º1
DOI
EstadoPublished - mar 2019

Citar esto

Harrison, P. G., Patel, N. M., Perez, J. F., & Qiu, Z. (2019). Managing Response Time Tails by Sharding. ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, 4(1), [5]. https://doi.org/10.1145/3300143
Harrison, P. G. ; Patel, N. M. ; Perez, J. F. ; Qiu, Z. / Managing Response Time Tails by Sharding. En: ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS. 2019 ; Vol. 4, N.º 1.
@article{8676507522034678ab89a6c13e6cdd1c,
title = "Managing Response Time Tails by Sharding",
abstract = "Matrix analytic methods are developed to compute the probability distribution of response times (i.e., data access times) in distributed storage systems protected by erasure coding, which is implemented by sharding a data object into N fragments, only K<; N of which are required to reconstruct the object. This leads to a partial-fork-join model with a choice of canceling policies for the redundant N−K tasks. The accuracy of the analytical model is supported by tests against simulation in a broad range of setups. At increasing workload intensities, numerical results show the extent to which increasing the redundancy level reduces the mean response time of storage reads and significantly flattens the tail of their distribution; this is demonstrated at medium-high quantiles, up to the 99th. The quantitative reduction in response time achieved by two policies for canceling redundant tasks is also shown: for cancel-at-finish and cancel-at-start, which limits the additional load introduced whilst losing the benefit of selectivity amongst fragment service times.",
author = "Harrison, {P. G.} and Patel, {N. M.} and Perez, {J. F.} and Z. Qiu",
year = "2019",
month = "3",
doi = "10.1145/3300143",
language = "Ingl{\'e}s",
volume = "4",
number = "1",

}

Harrison, PG, Patel, NM, Perez, JF & Qiu, Z 2019, 'Managing Response Time Tails by Sharding', ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, vol. 4, n.º 1, 5. https://doi.org/10.1145/3300143

Managing Response Time Tails by Sharding. / Harrison, P. G.; Patel, N. M.; Perez, J. F.; Qiu, Z.

En: ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS, Vol. 4, N.º 1, 5, 03.2019.

Resultado de la investigación: Contribución a RevistaArtículo

TY - JOUR

T1 - Managing Response Time Tails by Sharding

AU - Harrison, P. G.

AU - Patel, N. M.

AU - Perez, J. F.

AU - Qiu, Z.

PY - 2019/3

Y1 - 2019/3

N2 - Matrix analytic methods are developed to compute the probability distribution of response times (i.e., data access times) in distributed storage systems protected by erasure coding, which is implemented by sharding a data object into N fragments, only K<; N of which are required to reconstruct the object. This leads to a partial-fork-join model with a choice of canceling policies for the redundant N−K tasks. The accuracy of the analytical model is supported by tests against simulation in a broad range of setups. At increasing workload intensities, numerical results show the extent to which increasing the redundancy level reduces the mean response time of storage reads and significantly flattens the tail of their distribution; this is demonstrated at medium-high quantiles, up to the 99th. The quantitative reduction in response time achieved by two policies for canceling redundant tasks is also shown: for cancel-at-finish and cancel-at-start, which limits the additional load introduced whilst losing the benefit of selectivity amongst fragment service times.

AB - Matrix analytic methods are developed to compute the probability distribution of response times (i.e., data access times) in distributed storage systems protected by erasure coding, which is implemented by sharding a data object into N fragments, only K<; N of which are required to reconstruct the object. This leads to a partial-fork-join model with a choice of canceling policies for the redundant N−K tasks. The accuracy of the analytical model is supported by tests against simulation in a broad range of setups. At increasing workload intensities, numerical results show the extent to which increasing the redundancy level reduces the mean response time of storage reads and significantly flattens the tail of their distribution; this is demonstrated at medium-high quantiles, up to the 99th. The quantitative reduction in response time achieved by two policies for canceling redundant tasks is also shown: for cancel-at-finish and cancel-at-start, which limits the additional load introduced whilst losing the benefit of selectivity amongst fragment service times.

U2 - 10.1145/3300143

DO - 10.1145/3300143

M3 - Artículo

VL - 4

IS - 1

M1 - 5

ER -

Harrison PG, Patel NM, Perez JF, Qiu Z. Managing Response Time Tails by Sharding. ACM TRANSACTIONS ON MODELING AND PERFORMANCE EVALUATION OF COMPUTING SYSTEMS. 2019 mar;4(1). 5. https://doi.org/10.1145/3300143