Holistic Workload Scaling: A New Approach to Compute Acceleration in the Cloud

Juan F. Perez, Lydia Y. Chen, Massimo Villari, Rajiv Ranjan

Research output: Contribution to specialist publication › Article

2 Citations (Scopus)

Abstract

Workload scaling is an approach to accelerating computation, and thus improving response times, by replicating the same request multiple times, processing the replicas in parallel on multiple nodes, and accepting the result from the first node to finish. This is not unlike a TV game show, where the same question is given to multiple contestants and the (correct) answer is accepted from the first to respond. This differs from traditional parallelization strategies, as used in, say, MapReduce workloads, where each node runs a subset of the overall workload. A variety of strategies trade off metrics such as cost, utilization, performance, and interprocessor communication requirements. Performance modeling can help determine optimal approaches for different environments and goals. This is important because poor performance can lead to application- and domain-specific losses, such as lost e-commerce conversions and sales. Performance modeling and analysis play an important role in designing and driving the selection of resource scaling mechanisms. Such modeling and analysis is complex due to time-varying workload arrival rates and request sizes, and even more complex in cloud environments because of the additional stochastic variation caused by performance interference among co-located tenants sharing resources. Moreover, little is known about how to multi-scale, i.e., dynamically and simultaneously scale resources vertically, horizontally, and through workload scaling. In this article, we first demonstrate the effectiveness of multi-scaling in reducing latency, and then discuss the performance modeling challenges, particularly for workload scaling.
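The first-to-finish replication the abstract describes can be sketched in a few lines of asyncio. This is a minimal illustration, not the authors' implementation: the node names are hypothetical, and the randomized sleep stands in for the stochastic service-time variation the abstract attributes to co-located tenants.

```python
import asyncio
import random

async def handle_replica(node: str, request: str) -> str:
    """Simulate one node processing a replica of the request.

    The random service time mimics stochastic variation from
    resource sharing with co-located tenants.
    """
    await asyncio.sleep(random.uniform(0.01, 0.1))
    return f"{node}: result for {request!r}"

async def scale_workload(request: str, nodes: list[str]) -> str:
    """Send the same request to every node; accept the first result."""
    tasks = [asyncio.create_task(handle_replica(n, request)) for n in nodes]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:  # abandon the slower replicas
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

result = asyncio.run(scale_workload("GET /item/42", ["node-a", "node-b", "node-c"]))
print(result)
```

The latency of the replicated request is the minimum of the replicas' service times, which is exactly what makes the strategy attractive under heavy-tailed interference; the cost is the extra (discarded) work on the slower nodes, one of the trade-offs the article's models capture.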

Original language: English (US)
Pages: 20-30
Number of pages: 11
Volume: 5
No: 1
Specialist publication: IEEE Cloud Computing
DOIs: 10.1109/MCC.2018.011791711
State: Published - Jan 1 2018
Externally published: Yes


All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Software
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Perez, Juan F.; Chen, Lydia Y.; Villari, Massimo; Ranjan, Rajiv. / Holistic Workload Scaling: A New Approach to Compute Acceleration in the Cloud. In: IEEE Cloud Computing. 2018; Vol. 5, No. 1. pp. 20-30.
@article{aeac46ea7b8a457886fd8b9b984946ae,
title = "Holistic Workload Scaling: A New Approach to Compute Acceleration in the Cloud",
abstract = "Workload scaling is an approach to accelerating computation and thus improving response times by replicating the exact same request multiple times and processing it in parallel on multiple nodes and accepting the result from the first node to finish. This is not unlike a TV game show, where the same question is given to multiple contestants and the (correct) answer is accepted from the first to respond. This is different than traditional strategies for parallelization as used in, say, MapReduce workloads, where each node runs a subset of the overall workload. There are a variety of strategies that trade off metrics such as cost, utilization, performance, and interprocessor communication requirements. Performance modeling can help determine optimal approaches for different environments and goals. This is important, because poor performance can lead to application and domain-specific losses, such as e-commerce conversions and sales. Performance modeling and analysis plays an important role in designing and driving the selection of resource scaling mechanisms. Such modeling and analysis is complex due to time-varying workload arrival rates and request sizes, and even more complex in cloud environments due to the additional stochastic variation caused by performance interference due to resource sharing across co-located tenants. Moreover, little is known on how to multi-scale, i.e., dynamically and simultaneously scale resources vertically, horizontally, and through workload scaling. In this article, we first demonstrate the effectiveness of multi-scaling in reducing latency, and then discuss the performance modeling challenges, particularly for workload scaling.",
author = "Perez, {Juan F.} and Chen, {Lydia Y.} and Massimo Villari and Rajiv Ranjan",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/MCC.2018.011791711",
language = "English (US)",
volume = "5",
number = "1",
pages = "20--30",
journal = "IEEE Cloud Computing",
issn = "2325-6095",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

Holistic Workload Scaling: A New Approach to Compute Acceleration in the Cloud. / Perez, Juan F.; Chen, Lydia Y.; Villari, Massimo; Ranjan, Rajiv.

In: IEEE Cloud Computing, Vol. 5, No. 1, 01.01.2018, p. 20-30.

Research output: Contribution to specialist publication › Article

TY - JOUR

T1 - Holistic Workload Scaling

T2 - A New Approach to Compute Acceleration in the Cloud

AU - Perez, Juan F.

AU - Chen, Lydia Y.

AU - Villari, Massimo

AU - Ranjan, Rajiv

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Workload scaling is an approach to accelerating computation and thus improving response times by replicating the exact same request multiple times and processing it in parallel on multiple nodes and accepting the result from the first node to finish. This is not unlike a TV game show, where the same question is given to multiple contestants and the (correct) answer is accepted from the first to respond. This is different than traditional strategies for parallelization as used in, say, MapReduce workloads, where each node runs a subset of the overall workload. There are a variety of strategies that trade off metrics such as cost, utilization, performance, and interprocessor communication requirements. Performance modeling can help determine optimal approaches for different environments and goals. This is important, because poor performance can lead to application and domain-specific losses, such as e-commerce conversions and sales. Performance modeling and analysis plays an important role in designing and driving the selection of resource scaling mechanisms. Such modeling and analysis is complex due to time-varying workload arrival rates and request sizes, and even more complex in cloud environments due to the additional stochastic variation caused by performance interference due to resource sharing across co-located tenants. Moreover, little is known on how to multi-scale, i.e., dynamically and simultaneously scale resources vertically, horizontally, and through workload scaling. In this article, we first demonstrate the effectiveness of multi-scaling in reducing latency, and then discuss the performance modeling challenges, particularly for workload scaling.

AB - Workload scaling is an approach to accelerating computation and thus improving response times by replicating the exact same request multiple times and processing it in parallel on multiple nodes and accepting the result from the first node to finish. This is not unlike a TV game show, where the same question is given to multiple contestants and the (correct) answer is accepted from the first to respond. This is different than traditional strategies for parallelization as used in, say, MapReduce workloads, where each node runs a subset of the overall workload. There are a variety of strategies that trade off metrics such as cost, utilization, performance, and interprocessor communication requirements. Performance modeling can help determine optimal approaches for different environments and goals. This is important, because poor performance can lead to application and domain-specific losses, such as e-commerce conversions and sales. Performance modeling and analysis plays an important role in designing and driving the selection of resource scaling mechanisms. Such modeling and analysis is complex due to time-varying workload arrival rates and request sizes, and even more complex in cloud environments due to the additional stochastic variation caused by performance interference due to resource sharing across co-located tenants. Moreover, little is known on how to multi-scale, i.e., dynamically and simultaneously scale resources vertically, horizontally, and through workload scaling. In this article, we first demonstrate the effectiveness of multi-scaling in reducing latency, and then discuss the performance modeling challenges, particularly for workload scaling.

UR - http://www.scopus.com/inward/record.url?scp=85045029058&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045029058&partnerID=8YFLogxK

U2 - 10.1109/MCC.2018.011791711

DO - 10.1109/MCC.2018.011791711

M3 - Article

AN - SCOPUS:85045029058

VL - 5

SP - 20

EP - 30

JO - IEEE Cloud Computing

JF - IEEE Cloud Computing

SN - 2325-6095

ER -