Optimizing interactive analytics engines for heterogeneous clusters
Raina, Ashwini
Permalink
https://hdl.handle.net/2142/101460
Description
Issue Date
2018-05-09
Director of Research (if dissertation) or Advisor (if thesis)
Gupta, Indranil
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Real-time analytics, data replication
Abstract
This thesis targets the growing area of interactive data analytics engines. It builds upon a system called Getafix, an intelligent data replication and placement algorithm, and optimizes Getafix for running mixed queries over a heterogeneous cluster. The new algorithm, Getafix-H, is a cluster-aware version of the Getafix replication algorithm with built-in optimizations for segment balancing and cluster auto-tiering. We integrated Getafix-H as an extension to Getafix inside Druid, a modern open-source interactive data analytics engine, and present experimental results using workloads from Yahoo!’s production Druid cluster. Compared to Getafix, Getafix-H improves tail latency by 18% and reduces memory usage by up to 27% (a 2-3X improvement over Scarlett). In the presence of stragglers, Getafix-H improves tail latency by 55% and reduces memory usage by up to 20% compared to Getafix. Getafix-H also enables sysadmins to auto-tier a heterogeneous cluster with a tiering accuracy of up to 80%.
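The abstract reports results as relative improvements in tail latency and memory usage. The sketch below shows one common way such a figure is computed. It is an illustration under stated assumptions, not the thesis's evaluation code. The abstract does not say which percentile "tail latency" refers to, so the 99th percentile is assumed here. The helper names `percentile` and `percent_reduction` are hypothetical, and the latency samples are made-up illustrative values, not data from the thesis.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def percent_reduction(baseline, candidate):
    """Relative reduction of `candidate` versus `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100


# Hypothetical per-query latencies in ms (illustrative only, not thesis data).
baseline_ms = [12, 15, 11, 40, 13, 14, 90, 12, 16, 15]
candidate_ms = [11, 14, 12, 30, 12, 13, 60, 11, 15, 14]

# Assumption: "tail latency" is taken to mean the 99th-percentile latency.
p99_base = percentile(baseline_ms, 99)
p99_cand = percentile(candidate_ms, 99)
print(p99_base, p99_cand, round(percent_reduction(p99_base, p99_cand), 1))
```

With these made-up samples the script prints `90 60 33.3`. The same `percent_reduction` formula applies to memory usage, where the baseline is the memory footprint under the comparison system.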