Efficient memory-level parallelism extraction with decoupled strands
Crago, Neal
Loading…
Permalink
https://hdl.handle.net/2142/24372
Description
Title
Efficient memory-level parallelism extraction with decoupled strands
Author(s)
Crago, Neal
Issue Date
2011-05-25T14:39:25Z
Director of Research (if dissertation) or Advisor (if thesis)
Patel, Sanjay J.
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Memory Latency Tolerance
Accelerators
Processors
Decoupled
Abstract
We present Outrider, an architecture for throughput-oriented processors that exploits intra-thread memory-level parallelism (MLP) to improve performance efficiency on highly threaded workloads. Outrider enables a single thread of execution to be presented to the
architecture as multiple decoupled instruction streams, consisting of either memory accessing or memory consuming instructions. The key insight is that by decoupling the instruction streams, the processor pipeline can expose MLP in a
way similar to out-of-order designs while relying on a low-complexity in-order micro-architecture. Instead of adding more threads as is done in modern GPUs, Outrider can expose the same MLP with fewer threads and reduced contention for resources shared among threads.
We demonstrate that Outrider can outperform single-threaded cores by 23-131% and a 4-way simultaneous multi-threaded core by up to 87% in data parallel applications in a 1024-core system. Outrider achieves these performance gains without incurring the overhead of additional hardware thread contexts, which results in improved efficiency compared to a
multi-threaded core.
Use this login method if you
don't
have an
@illinois.edu
email address.
(Oops, I do have one)
IDEALS migrated to a new platform on June 23, 2022. If you created
your account prior to this date, you will have to reset your password
using the forgot-password link below.