Getafix: Workload-aware distributed interactive analytics

Ghosh, Mainak; Xu, Le; Qian, Xiaoyao; Kao, Thomas; Gupta, Indranil; Gupta, Himanshu

Permalink

https://hdl.handle.net/2142/89289

Description

Title

Getafix: Workload-aware distributed interactive analytics

Author(s)

Ghosh, Mainak
Xu, Le
Qian, Xiaoyao
Kao, Thomas
Gupta, Indranil
Gupta, Himanshu

Issue Date

2016-03-08

Keyword(s)

Data management
Workload aware
Lookback processing

Abstract

Distributed interactive analytics engines (Druid, Redshift, Pinot) must achieve low query latency while using as little storage space as possible. This paper addresses the problem of replicating data blocks and routing queries. Our techniques decide the replication level of individual data blocks based on their popularity (access counts) and output optimal placement patterns for those blocks. For the static version of the problem (a given set of queries accessing some segments), our techniques are provably optimal in both storage and query latency. For the dynamic version of the problem, we build a system called Getafix that dynamically tracks data block popularity, adjusts replication levels, dynamically routes queries, and garbage collects less useful data blocks. We implemented Getafix in Druid, the most popular open-source interactive analytics engine. Our experiments use both synthetic traces and production traces from Yahoo! Inc.’s production Druid cluster. Compared to existing techniques, Getafix either reduces the storage space used by up to 3.5x while achieving comparable query latency, or improves query latency by up to 60% while using comparable storage.
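
To make the idea of popularity-driven replication concrete, here is a minimal sketch in Python. It is not the algorithm from the report, and it makes no optimality claim: the function names `replication_levels` and `place_replicas`, the proportional-share rule, and the greedy least-loaded placement are all assumptions chosen for illustration. It shows only how per-segment access counts could drive both the number of replicas and where they are placed.

```python
import heapq
import math
from collections import defaultdict


def replication_levels(access_counts, num_nodes):
    """Hypothetical rule: one replica per 'fair node share' of accesses.

    A segment gets ceil(count / (total / num_nodes)) replicas, clamped
    to the range [1, num_nodes]. This is an illustrative heuristic only.
    """
    total = sum(access_counts.values())
    if total == 0:
        return {seg: 1 for seg in access_counts}
    per_node_share = total / num_nodes
    return {
        seg: max(1, min(num_nodes, math.ceil(count / per_node_share)))
        for seg, count in access_counts.items()
    }


def place_replicas(access_counts, num_nodes):
    """Greedily place replicas on the currently least-loaded nodes.

    Segments are handled from most to least popular. Each replica is
    assumed to absorb an equal share of its segment's accesses.
    """
    levels = replication_levels(access_counts, num_nodes)
    heap = [(0.0, node) for node in range(num_nodes)]  # (load, node id)
    heapq.heapify(heap)
    placement = defaultdict(list)
    for seg in sorted(access_counts, key=access_counts.get, reverse=True):
        share = access_counts[seg] / levels[seg]
        # Pop first, push afterwards, so replicas land on distinct nodes.
        picked = [heapq.heappop(heap) for _ in range(levels[seg])]
        for load, node in picked:
            placement[seg].append(node)
            heapq.heappush(heap, (load + share, node))
    return dict(placement)


if __name__ == "__main__":
    counts = {"a": 6, "b": 3, "c": 3}  # made-up access counts
    print(replication_levels(counts, num_nodes=3))
    print(place_replicas(counts, num_nodes=3))
```

In this toy setup the most popular segment is replicated on two nodes while the others keep a single copy, so storage grows only where the access counts justify it. A dynamic system like the one the abstract describes would also need to recompute popularity over time and garbage collect replicas, which this sketch does not attempt.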

Type of Resource

text

Owning Collections

Research and Tech Reports - Computer Science (primary)