Compiler and runtime techniques for optimizing dynamic scripting languages
Wang, Haichuan
Permalink
https://hdl.handle.net/2142/78638
Description
- Title
- Compiler and runtime techniques for optimizing dynamic scripting languages
- Author(s)
- Wang, Haichuan
- Issue Date
- 2015-04-20
- Director of Research (if dissertation) or Advisor (if thesis)
- Padua, David A.
- Doctoral Committee Chair(s)
- Padua, David A.
- Committee Member(s)
- Adve, Vikram S.
- Hwu, Wen-Mei W.
- Wu, Peng
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- R Programming Language
- Dynamic Scripting Language
- Compiler
- Performance
- Specialization
- Vectorization
- Abstract
- This thesis studies compilation and runtime techniques for improving the performance of dynamic scripting languages, using the R programming language as a test case. R is a convenient system for statistical computing, and in this era of big data it is becoming increasingly popular as a powerful data analytics tool. However, the performance of R limits its use in a broader context. The thesis classifies R programming styles into Looping over data (Type I), Vector programming (Type II), and Glue code (Type III), and identifies that the most serious overhead of R appears mostly in Type I code. It then proposes techniques to improve the performance of R. First, it uses interpreter-level specialization to remove object allocations and reduce path length, and evaluates the effectiveness of this approach on the GNU R VM. The approach uses profiling to translate R byte-code into specialized byte-code that runs faster, and uses data representation specialization to reduce memory allocation and usage. Second, it proposes a lightweight approach that reduces the interpretation overhead of R by vectorizing the widely used Apply class of operations. The approach combines data transformation and function vectorization to turn looping-over-data execution into code made up mostly of vector operations, which can significantly speed up Apply operations in R without any native code generation and while still using only a single thread of execution. Third, the Apply vectorization technique is integrated into SparkR, a widely used distributed R computing system, where it has improved performance. Furthermore, an R benchmark suite has been developed. It includes a collection of different types of R applications and a flexible benchmarking environment for conducting performance research on R. All of these techniques could be applied to other dynamic scripting languages.
The techniques proposed in the thesis use a pure interpretation approach (the system based on these techniques does not generate native code) to improve the performance of R. This strategy has the advantage of maintaining the portability and compatibility of the VM and of simplifying the implementation. Exploring how much performance an interpreter alone can deliver is also an interesting problem in its own right.
- Graduation Semester
- 2015-5
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/78638
- Copyright and License Information
- Copyright 2015 by Haichuan Wang
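The Apply vectorization idea described in the abstract (data transformation combined with function vectorization) can be illustrated with a small sketch. This is not the thesis's implementation, which targets R and the GNU R VM; it is a Python stand-in under stated assumptions. The `Vec` class, the `score` function, and both `apply_*` helpers are hypothetical names introduced only for illustration, with `Vec` playing the role of an R numeric vector that supports elementwise arithmetic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vec:
    """Hypothetical stand-in for an R numeric vector (elementwise arithmetic)."""
    items: Tuple[float, ...]

    def __add__(self, other: "Vec") -> "Vec":
        return Vec(tuple(a + b for a, b in zip(self.items, other.items)))

    def __mul__(self, other: "Vec") -> "Vec":
        return Vec(tuple(a * b for a, b in zip(self.items, other.items)))


def score(x, y):
    """Per-element function; it works on scalars and, unchanged, on Vec columns."""
    return x * x + y


def apply_looping(points, f):
    """Looping-over-data style: one interpreted call to f per element."""
    return [f(x, y) for x, y in points]


def apply_vectorized(points, f):
    """Vectorized style: transpose the data once, then call f a single time."""
    # Data transformation: a list of (x, y) rows becomes two columns.
    # Assumes a non-empty input; an empty list is not handled in this sketch.
    xs, ys = zip(*points)
    # Function vectorization: f now runs on whole columns in one call.
    result = f(Vec(tuple(xs)), Vec(tuple(ys)))
    return list(result.items)


points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
looped = apply_looping(points, score)
vectorized = apply_vectorized(points, score)
```

Both calls produce the same values, but the vectorized version dispatches `score` once instead of once per element. In R the elementwise work would be done by built-in vector operations rather than by a Python-level loop inside `Vec`, which is where the reduction in interpretation overhead would come from.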
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science