On implementing sparse matrix-vector multiplication on intel platform

AlMasri, Mohammad

Permalink

https://hdl.handle.net/2142/101729

Description

Title

On implementing sparse matrix-vector multiplication on intel platform

Author(s)

AlMasri, Mohammad

Issue Date

2018-07-19

Director of Research (if dissertation) or Advisor (if thesis)

Hwu, Wen-Mei W.
Abu-Sufah, Walid

Department of Study

Electrical & Computer Eng

Discipline

Electrical & Computer Engr

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Date of Ingest

2018-09-27T16:34:23Z

Keyword(s)

SpMV, SIMD, CCF, CSR, I-e, MKL, OpenMP, Skylake, KNL

Abstract

Sparse matrix-vector multiplication (SpMV) can be a performance bottleneck in iterative solvers and algebraic eigenvalue problems. In this thesis, we present our sparse matrix compressed chunk storage format (CCF) and an SpMV CCF kernel that realizes high performance on Intel Xeon multicore and Xeon Phi processors for unstructured matrices. The CCF kernel exploits the properties of CCF to enhance load balancing and SIMD efficiency. Moreover, we present the CCF auto-tuner, which selects the most effective parameters and SpMV kernel to achieve the highest performance that CCF can attain on a target architecture. Using 151 unstructured matrices from 38 application areas, we compare the performance of the CCF kernel to that of MKL 2018u1 SpMV CSR, MKL 2018u2 Inspector-executor SpMV CSR, and Compressed Vectorization-oriented sparse Row (CVR) SpMV. We execute the kernels on a dual 24-core Skylake Xeon Platinum 8160 and a 68-core KNL Xeon Phi 7250. On the dual 24-core Skylake Xeon Platinum 8160, our kernel achieves higher execution throughput than MKL SpMV CSR for 135 matrices (89%), with an average speed improvement of 2.3x and a maximum speed improvement of 27.5x. Our kernel outperforms MKL Inspector-executor SpMV CSR for 109 matrices (73%), with an average speed improvement of 1.5x and a maximum speed improvement of 3.0x. Moreover, SpMV CCF outperforms SpMV CVR for 81% of the matrices, with an average speed improvement of 1.8x and a maximum speed improvement of 4.2x. On the 68-core KNL Xeon Phi 7250, CCF achieves high average and maximum speed improvements compared to the other three kernels, but for slightly smaller percentages of matrices. Lastly, we show that auto-tuning the CCF parameters improves performance for more than 50 matrices compared to the default CCF on Skylake and KNL, with an average speed improvement of 1.2x.
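
For readers unfamiliar with the CSR baseline named in the abstract, the following is a minimal sketch of a generic compressed sparse row (CSR) SpMV, y = A x. It is not the thesis's CCF kernel and not MKL's implementation; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions, and the loop is written serially for clarity rather than performance.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in generic CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Illustrative names; not taken from the thesis or from MKL.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Rows can differ widely in length for unstructured matrices,
        # which is one source of load imbalance in a row-parallel kernel.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The indirect access `x[col_idx[k]]` and the variable inner-loop length are the usual reasons a plain CSR loop is hard to vectorize and balance, which is the kind of problem the abstract says CCF targets.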

Graduation Semester

2018-08

Type of Resource

text

Copyright and License Information

2018 Mohammad Almasri

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois
