Dual Scaling VMs and Queries: Cost-Effective Latency Curtailment

Juan F. Perez, Robert Birke, Mathias Bjorkqvist, Lydia Y. Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Wimpy virtual instances equipped with small numbers of cores and RAM are popular public and private cloud offerings because of their low cost for hosting applications. The challenge is how to run latency-sensitive applications using such instances, which trade off performance for cost. In this study, we analytically and experimentally show that simultaneously scaling resources at coarse granularity and workloads, i.e., submitting multiple query clones to different servers, at fine granularity can overcome the performance disadvantages of wimpy VM instances and achieve stringent latency targets that are even lower than the average execution times of wimpy servers. To such an end, we first derive a closed-form analysis for the latency under any given VM provisioning and query replication level, considering cloning policies that can (not) terminate outstanding clones with (without) an overhead. Validated on trace-driven simulations, our analysis is able to accurately predict the latency and efficiently search for the optimal number of VMs and clones. Secondly, we develop a dual elastic scaler, DuoScale, that dynamically scales VMs and clones according to the workload dynamics so as to achieve the target latency in a cost-effective manner. The effectiveness of DuoScale lies on the observation that the application performance only scales sub-linearly with increasing vertical or horizontal resource provisioning, i.e., resources per VM or number of VMs. We evaluate DuoScale against VM-only scaling strategies via extensive trace-driven simulations as well as experimental results on a cloud test-bed. Our results show that DuoScale is able to achieve the stringent target latency by using clones on wimpy VMs with cost savings up to 50%, compared to scaling brawny VMs that have better performance at a higher unit cost.
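The core intuition behind query cloning in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's closed-form analysis: it assumes exponentially distributed service times, independent clones, no queueing, and no cancellation overhead, none of which are stated in the abstract. The function names are hypothetical. Under those assumptions, the fastest of k clones finishes in mean_service / k on average, which is how a latency below the average execution time of a single wimpy server becomes reachable.

```python
import random


def analytic_clone_latency(mean_service, clones):
    """Mean of the minimum of `clones` i.i.d. exponential service times.

    Assumption (not from the paper): exponential service, no queueing,
    no overhead for terminating the slower clones.
    """
    return mean_service / clones


def simulate_clone_latency(mean_service, clones, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, for cross-checking."""
    rng = random.Random(seed)
    rate = 1.0 / mean_service
    total = 0.0
    for _ in range(trials):
        # The query completes as soon as its fastest clone completes.
        total += min(rng.expovariate(rate) for _ in range(clones))
    return total / trials


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, analytic_clone_latency(10.0, k), simulate_clone_latency(10.0, k))
```

In a real system each clone also adds load to the servers, so the gain is smaller than this idealized model suggests; the abstract indicates that the paper's analysis accounts for the provisioning level and cloning policy when searching for the number of VMs and clones.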

Original language: English (US)
Title of host publication: Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 988-998
Number of pages: 11
ISBN (Electronic): 9781538617915
DOI: https://doi.org/10.1109/ICDCS.2017.231
State: Published - Jul 13 2017
Event: 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017 - Atlanta, United States
Duration: Jun 5 2017 → Jun 8 2017

Conference

Conference: 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017
Country: United States
City: Atlanta
Period: 6/5/17 → 6/8/17

Fingerprint

Costs
Servers
Cloning
Random access storage

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Perez, J. F., Birke, R., Bjorkqvist, M., & Chen, L. Y. (2017). Dual Scaling VMs and Queries: Cost-Effective Latency Curtailment. In Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017 (pp. 988-998). [7980040] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDCS.2017.231
Perez, Juan F. ; Birke, Robert ; Bjorkqvist, Mathias ; Chen, Lydia Y. / Dual Scaling VMs and Queries : Cost-Effective Latency Curtailment. Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 988-998
@inproceedings{2196aaf0d9df4baebc12316034d5ba0f,
title = "Dual Scaling VMs and Queries: Cost-Effective Latency Curtailment",
abstract = "Wimpy virtual instances equipped with small numbers of cores and RAM are popular public and private cloud offerings because of their low cost for hosting applications. The challenge is how to run latency-sensitive applications using such instances, which trade off performance for cost. In this study, we analytically and experimentally show that simultaneously scaling resources at coarse granularity and workloads, i.e., submitting multiple query clones to different servers, at fine granularity can overcome the performance disadvantages of wimpy VM instances and achieve stringent latency targets that are even lower than the average execution times of wimpy servers. To such an end, we first derive a closed-form analysis for the latency under any given VM provisioning and query replication level, considering cloning policies that can (not) terminate outstanding clones with (without) an overhead. Validated on trace-driven simulations, our analysis is able to accurately predict the latency and efficiently search for the optimal number of VMs and clones. Secondly, we develop a dual elastic scaler, DuoScale, that dynamically scales VMs and clones according to the workload dynamics so as to achieve the target latency in a cost-effective manner. The effectiveness of DuoScale lies on the observation that the application performance only scales sub-linearly with increasing vertical or horizontal resource provisioning, i.e., resources per VM or number of VMs. We evaluate DuoScale against VM-only scaling strategies via extensive trace-driven simulations as well as experimental results on a cloud test-bed. Our results show that DuoScale is able to achieve the stringent target latency by using clones on wimpy VMs with cost savings up to 50{\%}, compared to scaling brawny VMs that have better performance at a higher unit cost.",
author = "Perez, {Juan F.} and Robert Birke and Mathias Bjorkqvist and Chen, {Lydia Y.}",
year = "2017",
month = "7",
day = "13",
doi = "10.1109/ICDCS.2017.231",
language = "English (US)",
pages = "988--998",
booktitle = "Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

Perez, JF, Birke, R, Bjorkqvist, M & Chen, LY 2017, Dual Scaling VMs and Queries: Cost-Effective Latency Curtailment. in Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017, 7980040, Institute of Electrical and Electronics Engineers Inc., pp. 988-998, 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, United States, 6/5/17. https://doi.org/10.1109/ICDCS.2017.231

Dual Scaling VMs and Queries : Cost-Effective Latency Curtailment. / Perez, Juan F.; Birke, Robert; Bjorkqvist, Mathias; Chen, Lydia Y.

Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 988-998 7980040.

TY - GEN

T1 - Dual Scaling VMs and Queries

T2 - Cost-Effective Latency Curtailment

AU - Perez, Juan F.

AU - Birke, Robert

AU - Bjorkqvist, Mathias

AU - Chen, Lydia Y.

PY - 2017/7/13

Y1 - 2017/7/13

N2 - Wimpy virtual instances equipped with small numbers of cores and RAM are popular public and private cloud offerings because of their low cost for hosting applications. The challenge is how to run latency-sensitive applications using such instances, which trade off performance for cost. In this study, we analytically and experimentally show that simultaneously scaling resources at coarse granularity and workloads, i.e., submitting multiple query clones to different servers, at fine granularity can overcome the performance disadvantages of wimpy VM instances and achieve stringent latency targets that are even lower than the average execution times of wimpy servers. To such an end, we first derive a closed-form analysis for the latency under any given VM provisioning and query replication level, considering cloning policies that can (not) terminate outstanding clones with (without) an overhead. Validated on trace-driven simulations, our analysis is able to accurately predict the latency and efficiently search for the optimal number of VMs and clones. Secondly, we develop a dual elastic scaler, DuoScale, that dynamically scales VMs and clones according to the workload dynamics so as to achieve the target latency in a cost-effective manner. The effectiveness of DuoScale lies on the observation that the application performance only scales sub-linearly with increasing vertical or horizontal resource provisioning, i.e., resources per VM or number of VMs. We evaluate DuoScale against VM-only scaling strategies via extensive trace-driven simulations as well as experimental results on a cloud test-bed. Our results show that DuoScale is able to achieve the stringent target latency by using clones on wimpy VMs with cost savings up to 50%, compared to scaling brawny VMs that have better performance at a higher unit cost.

UR - http://www.scopus.com/inward/record.url?scp=85027252830&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027252830&partnerID=8YFLogxK

U2 - 10.1109/ICDCS.2017.231

DO - 10.1109/ICDCS.2017.231

M3 - Conference contribution

AN - SCOPUS:85027252830

SP - 988

EP - 998

BT - Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Perez JF, Birke R, Bjorkqvist M, Chen LY. Dual Scaling VMs and Queries: Cost-Effective Latency Curtailment. In Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 988-998. 7980040 https://doi.org/10.1109/ICDCS.2017.231