High performance histogramming on massively parallel processors
Ross, Gregory
Permalink
https://hdl.handle.net/2142/50625
Description
- Title
- High performance histogramming on massively parallel processors
- Author(s)
- Ross, Gregory
- Issue Date
- 2014-09-16
- Director of Research (if dissertation) or Advisor (if thesis)
- Hwu, Wen-Mei W.
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- General Purpose Computing on Graphics Processing Units (GPGPU)
- Compute Unified Device Architecture (CUDA)
- Histogram
- Image Processing
- Parallelism
- Abstract
- Histogramming is a technique by which input datasets are mined to extract features and patterns. Histograms have a wide range of uses in computer vision, machine learning, database processing, and quality control for manufacturing, and many applications benefit from advance knowledge about the distribution of data. Computing a histogram is, essentially, the antithesis of parallel processing. In an input-driven method, contributing data to a histogram bin without slow atomic operations or serial execution would likely produce inaccuracies in the resulting output. An output-driven method would eliminate the need for atomic operations but would amplify read bandwidth requirements, reduce overall throughput, and result in a zero or negative gain in performance. We introduce a method to pack multiple bins into a memory word with the goal of better utilizing GPU resources. This method improves GPU occupancy relative to earlier histogram kernel implementations and increases the number of working threads to better hide the latency of atomic operations and collisions while maintaining reasonable throughput. This technique will be demonstrated to improve performance of histogram functions of various sizes with a variety of inputs, including a study on a particular application. While the results depend heavily on input data patterns, the conclusions gathered in this thesis show that the packed-atomics histogramming kernel can and usually does outperform other implementations, with a small number of exceptions.
- Graduation Semester
- 2014-08
- Copyright and License Information
- Copyright 2014 Gregory Ross
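The abstract's idea of packing multiple bins into one memory word and updating them with atomic operations can be illustrated with a minimal CPU-side sketch. This is not the thesis's kernel. It is a C++ analogue in which `std::atomic::fetch_add` stands in for a GPU atomic add, and the layout (four 8-bit sub-counters per 32-bit word), the flush interval, and the names `packed_histogram`, `kBitsPerBin`, and `kFlushInterval` are all assumptions made for illustration. The packed array is private to each call, loosely analogous to a per-block private histogram, and it is drained into full-width counters before any sub-counter can overflow.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (not from the thesis): four 8-bit sub-counters per 32-bit word.
constexpr unsigned kBitsPerBin = 8;
constexpr unsigned kBinsPerWord = 32 / kBitsPerBin;
constexpr std::uint32_t kBinMask = (1u << kBitsPerBin) - 1u;
// Flush after at most 255 updates so no 8-bit sub-counter can overflow
// and carry into its neighbour.
constexpr std::size_t kFlushInterval = kBinMask;

// Hypothetical helper: builds a histogram of `input` over `num_bins` bins
// (num_bins must be > 0; values are mapped to bins with a simple modulo).
std::vector<std::uint64_t> packed_histogram(const std::vector<std::uint8_t>& input,
                                            std::size_t num_bins) {
    const std::size_t num_words = (num_bins + kBinsPerWord - 1) / kBinsPerWord;
    std::vector<std::atomic<std::uint32_t>> packed(num_words);
    for (auto& w : packed) w.store(0);
    std::vector<std::uint64_t> hist(num_bins, 0);

    // Drain every packed word into the full-width counters and reset it.
    auto flush = [&]() {
        for (std::size_t w = 0; w < num_words; ++w) {
            const std::uint32_t word = packed[w].exchange(0);
            for (unsigned s = 0; s < kBinsPerWord; ++s) {
                const std::size_t bin = w * kBinsPerWord + s;
                if (bin < num_bins) {
                    hist[bin] += (word >> (s * kBitsPerBin)) & kBinMask;
                }
            }
        }
    };

    std::size_t pending = 0;
    for (std::uint8_t v : input) {
        const std::size_t bin = v % num_bins;
        // One atomic add on the shared word increments only this bin's sub-counter.
        const std::uint32_t inc = 1u << ((bin % kBinsPerWord) * kBitsPerBin);
        packed[bin / kBinsPerWord].fetch_add(inc, std::memory_order_relaxed);
        if (++pending == kFlushInterval) {
            flush();
            pending = 0;
        }
    }
    flush();
    return hist;
}
```

Calling `packed_histogram(data, 8)` returns eight full-width counts while using only two 32-bit words of packed storage. The trade-off in this sketch is that narrower sub-counters need periodic flushing, and how a real GPU kernel would balance that against occupancy is the kind of question the thesis itself addresses.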
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Electrical and Computer Engineering
Dissertations and Theses in Electrical and Computer Engineering