Modeling the winning seed distribution of the NCAA basketball tournament

Khatibi, Arash

Permalink

https://hdl.handle.net/2142/95338

Description

Title

Modeling the winning seed distribution of the NCAA basketball tournament

Author(s)

Khatibi, Arash

Issue Date

2016-11-23

Director of Research (if dissertation) or Advisor (if thesis)

Jacobson, Sheldon H.

Doctoral Committee Chair(s)

Jacobson, Sheldon H.

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

M.S.

Degree Level

Thesis

Keyword(s)

Bracket Challenge
NCAA Tournament

Abstract

The National Collegiate Athletic Association (NCAA) Division I men's basketball tournament is an annual competition that draws widespread attention in the United States. Predicting the outcome of each game is a popular activity among websites, fans, and, more recently, academic researchers, and there has been a surge of interest in mathematical methods that model the tournament's results and pick the winners of future games. This thesis analyzes the results of the NCAA basketball tournament since 1985 and proposes several models to capture the winning seed distribution in each round. The Exponential Model estimates the winning probability of each team by modeling the time between a team's successive wins in a round as an exponential random variable. Its limitation is that it assigns zero probability to events that have not occurred in the training data set. The Markov Model addresses this limitation by defining a Markov chain that incorporates each team's wins in prior rounds to estimate its winning probability. The results of these two models are validated using a chi-squared goodness-of-fit test. The Power Model, a tool for generating brackets of winners, quantifies the relative strength of each match-up in a round as a power function of the teams' seed numbers, with the exponent estimated from historical results. The main problem with the Power Model is that the small size of the training data set, especially in later rounds, complicates estimation. The Position and Upset Models address this problem by representing the tournament's games as a binary sequence and estimating the outcome of each game based on the teams' performance in similar games.
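
To make the Power Model description concrete, the following is a minimal sketch and not the thesis's implementation. It assumes one common power form, in which a team's strength is its seed number raised to the power `-alpha`, and it fits the exponent by a simple grid search over the log-likelihood. The function names, the grid, and the toy results are all hypothetical.

```python
import math

def win_probability(seed_a: int, seed_b: int, alpha: float) -> float:
    """Probability that seed_a beats seed_b under the ASSUMED power form:
    strength = seed ** (-alpha), normalized over the two teams."""
    strength_a = seed_a ** -alpha
    strength_b = seed_b ** -alpha
    return strength_a / (strength_a + strength_b)

def log_likelihood(games, alpha: float) -> float:
    """Log-likelihood of observed (winning seed, losing seed) pairs."""
    return sum(math.log(win_probability(w, l, alpha)) for w, l in games)

def fit_alpha(games, grid=None) -> float:
    """Pick the exponent on a grid that maximizes the log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(0, 301)]  # 0.00, 0.01, ..., 3.00
    return max(grid, key=lambda a: log_likelihood(games, a))

# Hypothetical toy results as (winning seed, losing seed); NOT real tournament data.
toy_games = [(1, 16), (2, 15), (3, 14), (12, 5), (4, 13), (6, 11), (10, 7), (8, 9)]
alpha_hat = fit_alpha(toy_games)
print(alpha_hat, win_probability(1, 16, alpha_hat))
```

With `alpha = 0` every match-up is a coin flip, and larger exponents favor the better (lower-numbered) seed more strongly. A small sample with few upsets pushes the fitted exponent upward, which is one way the small-data issue noted for later rounds can show up.
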
While generating a bracket in the forward direction, from the first round to the last, propagates incorrect picks through the tournament, correctly picking the winners in later rounds automatically fills in the bracket for several games in earlier rounds. This motivates bidirectional models that pick winners using a combination of forward and backward models. The Power, Position, Upset, and bidirectional models are assessed based on the aggregate performance of millions of brackets for the five most recent tournaments (2012-2016). The proposed models allow one to estimate the likelihoods of different seed combinations by applying the estimated winning seed distributions, which accurately summarize the seeds' aggregate performance and provide a deeper understanding of the uncertainty in the games' outcomes.
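
As an illustration of how a winning seed distribution can be turned into the likelihood of a seed combination, the sketch below computes the probability that the four regional winners form a given multiset of seeds. It assumes that each region's winner is drawn independently from the same distribution, which is a simplification and not necessarily the thesis's method. The function name and the probabilities are hypothetical.

```python
from collections import Counter
from math import factorial, prod

def combination_probability(seeds, distribution) -> float:
    """Probability that the regional winners form the given multiset of seeds,
    ASSUMING each region's winner is drawn independently from `distribution`."""
    counts = Counter(seeds)
    # Number of distinct ways to assign the seeds to regions (multinomial coefficient).
    arrangements = factorial(len(seeds))
    for c in counts.values():
        arrangements //= factorial(c)
    # Seeds missing from the distribution get probability zero.
    return arrangements * prod(distribution.get(s, 0.0) ** c for s, c in counts.items())

# Hypothetical distribution over the seed that wins a region; illustrative values only.
illustrative = {1: 0.40, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.15}
print(combination_probability([1, 1, 2, 3], illustrative))
```

A seed that never appears in the estimated distribution makes the whole combination have probability zero, which mirrors the limitation the abstract attributes to the Exponential Model.
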

Graduation Semester

2016-12

Type of Resource

text

Owning Collections

Graduate Dissertations and Theses at Illinois (primary)

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Dept. of Computer Science
