On the latency-accuracy tradeoff in approximate MapReduce jobs

Translated title of the contribution (Spanish): Sobre la compensación de latencia y precisión en los trabajos aproximados de MapReduce

Juan F. Perez, Robert Birke, Lydia Y. Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

To ensure the scalability of big data analytics, approximate MapReduce platforms emerge to explicitly trade off accuracy for latency. A key step to determine optimal approximation levels is to capture the latency of big data jobs, which has long been deemed challenging due to the complex dependency among data inputs and map/reduce tasks. In this paper, we use matrix analytic methods to derive stochastic models that can predict a wide spectrum of latency metrics, e.g., average, tails, and distributions, for approximate MapReduce jobs that are subject to strategies of input sampling and task dropping. In addition to capturing the dependency among waves of map/reduce tasks, our models incorporate two job scheduling policies, namely, exclusive and overlapping, and two task dropping strategies, namely, early and straggler, enabling us to realistically evaluate the potential performance gains of approximate computing. Our numerical analysis shows that the proposed models can guide big data platforms to determine the optimal approximation strategies and degrees of approximation.
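To make the latency side of the tradeoff concrete, the following is a minimal Monte Carlo sketch of a single map stage that runs in waves on a fixed number of slots. It is not the matrix-analytic model of the paper. Everything in it is an assumption made for illustration: the i.i.d. exponential task times, the greedy slot assignment, the function and parameter names, and the reading of "early" dropping as skipping tasks before launch and "straggler" dropping as ending the stage once enough tasks have completed.

```python
import heapq
import math
import random


def simulate_map_stage(n_tasks, n_slots, drop_ratio, strategy, rng,
                       mean_task_time=1.0):
    """Return the completion time of one simulated map stage.

    Illustrative assumptions (not from the paper): task times are i.i.d.
    exponential, and each task starts on whichever slot frees up first.
    strategy == "early":     only the kept tasks are ever launched.
    strategy == "straggler": all tasks are launched, and the stage ends
                             as soon as the kept number have completed.
    """
    # Number of task results the stage must collect; round() guards
    # against floating-point noise before taking the ceiling.
    n_keep = math.ceil(round((1.0 - drop_ratio) * n_tasks, 9))
    if n_keep <= 0:
        return 0.0
    n_launch = n_keep if strategy == "early" else n_tasks

    # Min-heap of the times at which each slot next becomes free.
    slots = [0.0] * min(n_slots, n_launch)
    heapq.heapify(slots)

    finish_times = []
    for _ in range(n_launch):
        start = heapq.heappop(slots)
        end = start + rng.expovariate(1.0 / mean_task_time)
        finish_times.append(end)
        heapq.heappush(slots, end)

    finish_times.sort()
    # Time of the n_keep-th completion (the makespan when nothing is
    # launched beyond the kept tasks).
    return finish_times[n_keep - 1]


def latency_summary(n_tasks, n_slots, drop_ratio, strategy,
                    runs=2000, seed=0):
    """Estimate the mean and 95th-percentile stage latency by simulation."""
    rng = random.Random(seed)
    samples = sorted(
        simulate_map_stage(n_tasks, n_slots, drop_ratio, strategy, rng)
        for _ in range(runs)
    )
    mean = sum(samples) / runs
    p95 = samples[math.ceil(0.95 * runs) - 1]
    return mean, p95


if __name__ == "__main__":
    for ratio in (0.0, 0.1, 0.3):
        for strategy in ("early", "straggler"):
            mean, p95 = latency_summary(40, 8, ratio, strategy)
            print(f"drop={ratio:.1f} {strategy:9s} "
                  f"mean={mean:.2f} p95={p95:.2f}")
```

Sweeping the drop ratio in a simulation like this gives a rough empirical view of the mean and tail latency that the paper's models predict analytically; a simulation of this kind says nothing about the accuracy lost by dropping tasks.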
Original language: English (US)
Title of host publication: INFOCOM 2017 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9781509053360
DOI: https://doi.org/10.1109/INFOCOM.2017.8057038
Status: Published - Oct 2, 2017
Event: 2017 IEEE Conference on Computer Communications, INFOCOM 2017 - Atlanta
Duration: May 1, 2017 to May 4, 2017

Conference

Conference: 2017 IEEE Conference on Computer Communications, INFOCOM 2017
Country: United States
City: Atlanta
Period: 5/1/17 to 5/4/17

Fingerprint

Stochastic models
Scalability
Numerical analysis
Scheduling
Sampling
Big data

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this

Perez, J. F., Birke, R., & Chen, L. Y. (2017). On the latency-accuracy tradeoff in approximate MapReduce jobs. In INFOCOM 2017 - IEEE Conference on Computer Communications [8057038] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM.2017.8057038
Perez, Juan F. ; Birke, Robert ; Chen, Lydia Y. / On the latency-accuracy tradeoff in approximate MapReduce jobs. INFOCOM 2017 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2017.
@inproceedings{f847d7d4eb1e42828fd09e4b9d41f906,
title = "On the latency-accuracy tradeoff in approximate MapReduce jobs",
abstract = "To ensure the scalability of big data analytics, approximate MapReduce platforms emerge to explicitly trade off accuracy for latency. A key step to determine optimal approximation levels is to capture the latency of big data jobs, which is long deemed challenging due to the complex dependency among data inputs and map/reduce tasks. In this paper, we use matrix analytic methods to derive stochastic models that can predict a wide spectrum of latency metrics, e.g., average, tails, and distributions, for approximate MapReduce jobs that are subject to strategies of input sampling and task dropping. In addition to capturing the dependency among waves of map/reduce tasks, our models incorporate two job scheduling policies, namely, exclusive and overlapping, and two task dropping strategies, namely, early and straggler, enabling us to realistically evaluate the potential performance gains of approximate computing. Our numerical analysis shows that the proposed models can guide big data platforms to determine the optimal approximation strategies and degrees of approximation.",
author = "Perez, {Juan F.} and Robert Birke and Chen, {Lydia Y.}",
year = "2017",
month = "10",
day = "2",
doi = "10.1109/INFOCOM.2017.8057038",
language = "English (US)",
booktitle = "INFOCOM 2017 - IEEE Conference on Computer Communications",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

Perez, JF, Birke, R & Chen, LY 2017, On the latency-accuracy tradeoff in approximate MapReduce jobs. In INFOCOM 2017 - IEEE Conference on Computer Communications, 8057038, Institute of Electrical and Electronics Engineers Inc., Atlanta, 5/1/17. https://doi.org/10.1109/INFOCOM.2017.8057038

On the latency-accuracy tradeoff in approximate MapReduce jobs. / Perez, Juan F.; Birke, Robert; Chen, Lydia Y.

INFOCOM 2017 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2017. 8057038.

TY - GEN

T1 - On the latency-accuracy tradeoff in approximate MapReduce jobs

AU - Perez, Juan F.

AU - Birke, Robert

AU - Chen, Lydia Y.

PY - 2017/10/2

Y1 - 2017/10/2

N2 - To ensure the scalability of big data analytics, approximate MapReduce platforms emerge to explicitly trade off accuracy for latency. A key step to determine optimal approximation levels is to capture the latency of big data jobs, which is long deemed challenging due to the complex dependency among data inputs and map/reduce tasks. In this paper, we use matrix analytic methods to derive stochastic models that can predict a wide spectrum of latency metrics, e.g., average, tails, and distributions, for approximate MapReduce jobs that are subject to strategies of input sampling and task dropping. In addition to capturing the dependency among waves of map/reduce tasks, our models incorporate two job scheduling policies, namely, exclusive and overlapping, and two task dropping strategies, namely, early and straggler, enabling us to realistically evaluate the potential performance gains of approximate computing. Our numerical analysis shows that the proposed models can guide big data platforms to determine the optimal approximation strategies and degrees of approximation.

AB - To ensure the scalability of big data analytics, approximate MapReduce platforms emerge to explicitly trade off accuracy for latency. A key step to determine optimal approximation levels is to capture the latency of big data jobs, which is long deemed challenging due to the complex dependency among data inputs and map/reduce tasks. In this paper, we use matrix analytic methods to derive stochastic models that can predict a wide spectrum of latency metrics, e.g., average, tails, and distributions, for approximate MapReduce jobs that are subject to strategies of input sampling and task dropping. In addition to capturing the dependency among waves of map/reduce tasks, our models incorporate two job scheduling policies, namely, exclusive and overlapping, and two task dropping strategies, namely, early and straggler, enabling us to realistically evaluate the potential performance gains of approximate computing. Our numerical analysis shows that the proposed models can guide big data platforms to determine the optimal approximation strategies and degrees of approximation.

UR - http://www.scopus.com/inward/record.url?scp=85034099178&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85034099178&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM.2017.8057038

DO - 10.1109/INFOCOM.2017.8057038

M3 - Conference contribution

BT - INFOCOM 2017 - IEEE Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Perez JF, Birke R, Chen LY. On the latency-accuracy tradeoff in approximate MapReduce jobs. In INFOCOM 2017 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc. 2017. 8057038 https://doi.org/10.1109/INFOCOM.2017.8057038