Mitigating Spark straggler tasks for iterative applications by data re-partitioning
Teng, Bo
Permalink
https://hdl.handle.net/2142/97707
Description
Title
Mitigating Spark straggler tasks for iterative applications by data re-partitioning
Author(s)
Teng, Bo
Issue Date
2017-04-18
Director of Research (if dissertation) or Advisor (if thesis)
Campbell, Roy H.
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Straggler
Machine learning
Apache Spark
Iterative application
Abstract
Many data science applications today feature large datasets and short tasks that run for many iterations. When these applications run on a parallel processing framework such as Apache Spark, one problem that affects running time is the straggler: a disproportionately long-running task that slows down the entire cluster. In this work we present a straggler mitigation technique tailored for applications that run small tasks for many iterations over a large dataset, and we implement the algorithm in Apache Spark. We monitor the resources available on each Spark node and dynamically re-partition the dataset in proportion to the estimated available resources. We show that our algorithm has negligible resource-monitoring overhead and can significantly improve the performance of a Spark cluster when stragglers are present.
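The proportional re-partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name proportional_split, the node names, and the capacity scores are all hypothetical, and the sketch assumes each node's estimated available resource has already been reduced to a single non-negative number. It only shows how a record count might be divided so that each node's share tracks its estimated capacity.

```python
from typing import Dict


def proportional_split(total_records: int, capacity: Dict[str, float]) -> Dict[str, int]:
    """Split total_records across nodes in proportion to their capacity scores.

    Hypothetical helper for illustration; capacity maps a node name to an
    assumed, non-negative estimate of that node's available resources.
    """
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if any(c < 0 for c in capacity.values()):
        raise ValueError("capacity scores must be non-negative")
    total_capacity = sum(capacity.values())
    if total_capacity <= 0:
        raise ValueError("at least one node must have positive capacity")

    # Ideal (fractional) share for each node.
    exact = {node: total_records * c / total_capacity for node, c in capacity.items()}
    # Round down first, then hand out the leftover records one at a time
    # to the nodes with the largest fractional remainders.
    shares = {node: int(x) for node, x in exact.items()}
    leftover = total_records - sum(shares.values())
    by_remainder = sorted(capacity, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares
```

For example, with assumed capacity scores of 1.0, 1.0, and 0.5 for three nodes, a node running at half capacity would receive half as many of 1000 records as each healthy node, so that all three would be expected to finish an iteration at roughly the same time.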