Zorro: Zero-Cost Reactive Failure Recovery in Distributed Graph Processing
Pundir, Mayank; Leslie, Luke M.; Gupta, Indranil; Campbell, Roy H.
Permalink
https://hdl.handle.net/2142/75959
Description
Title
Zorro: Zero-Cost Reactive Failure Recovery in Distributed Graph Processing
Author(s)
Pundir, Mayank
Leslie, Luke M.
Gupta, Indranil
Campbell, Roy H.
Issue Date
2015-05-07
Keyword(s)
Distributed graph processing
Failure recovery
Reactive approaches
Checkpointing
Abstract
Distributed graph processing systems largely rely on proactive techniques for failure recovery. Unfortunately, these approaches (such as checkpointing) entail a significant overhead. In this paper, we argue that distributed graph processing systems should instead use a reactive approach to failure recovery. The reactive approach gives up some completeness of the result (generating a slightly inaccurate result) in exchange for reducing the overhead during failure-free execution to zero. We build a system called Zorro that embodies this reactive approach, and integrate Zorro into two graph processing systems: PowerGraph and LFGraph. When a failure occurs, Zorro opportunistically exploits vertex replication (inherent in today’s graph processing systems) to quickly rebuild the state of failed servers. Experiments using real-world graphs demonstrate that Zorro is able to recover over 99% of the graph state when a few servers fail, and between 87% and 92% when half the cluster fails. Furthermore, using eight common graph processing algorithms, Zorro incurs little to no accuracy loss in all experimental failure scenarios.
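The replica-based recovery idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the PowerGraph/LFGraph API; the data layout, the function name `recover_from_replicas`, and the toy values are all assumptions made for illustration. It assumes each server stores the values of its own vertices plus copies of some remote vertices, and that recovery simply scans surviving servers for copies of the lost vertices.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: server id -> {vertex id: latest known value}.
# Each server holds its own vertices plus replicas of some remote vertices.
State = Dict[int, Dict[str, float]]


def recover_from_replicas(
    state: State, owner: Dict[str, int], failed: Set[int]
) -> Tuple[Dict[str, float], float]:
    """Rebuild values of vertices owned by failed servers from surviving copies."""
    # Vertices whose owning server has failed.
    lost = {v for v, s in owner.items() if s in failed}
    recovered: Dict[str, float] = {}
    for server, values in state.items():
        if server in failed:
            continue  # state held on failed servers is unavailable
        for v, value in values.items():
            if v in lost and v not in recovered:
                recovered[v] = value
    # Fraction of lost vertex state that could be rebuilt.
    fraction = len(recovered) / len(lost) if lost else 1.0
    return recovered, fraction


# Toy cluster: server 0 owns "a" and "b"; only "a" is replicated elsewhere.
owner = {"a": 0, "b": 0, "c": 1, "d": 2}
state = {
    0: {"a": 1.0, "b": 2.0, "c": 3.0},
    1: {"c": 3.0, "a": 1.0},
    2: {"d": 4.0, "c": 3.0},
}
recovered, fraction = recover_from_replicas(state, owner, failed={0})
```

In this toy case, vertex "a" is recovered from server 1, while "b" has no surviving copy and is lost. That unrecovered state is the source of the slight inaccuracy the reactive approach accepts in place of checkpointing overhead.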