Learning from the experts: Measuring the policy content of legislation using machine learning
Dee, Ethan Adam
This item's files can only be accessed by the Administrator group.
Permalink
https://hdl.handle.net/2142/120545
Description
- Title
- Learning from the experts: Measuring the policy content of legislation using machine learning
- Author(s)
- Dee, Ethan Adam
- Issue Date
- 2023-04-25
- Director of Research (if dissertation) or Advisor (if thesis)
- Krasa, Stefan
- Doctoral Committee Chair(s)
- Krasa, Stefan
- Committee Member(s)
- Bernhardt, Mark D
- Winters, Matthew S
- Garlick, Alex
- Department of Study
- Economics
- Discipline
- Economics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- machine learning
- natural language processing
- text classification
- supervised learning
- text as data
- legislation
- issue attention
- policy agenda
- transformer
- word embedding
- congress
- legislature
- Abstract
- Legislation accounts for a significant portion of legislative activity, both in volume and in impact, but the high-dimensional information contained in textual data does not lend itself to quantitative analysis. Knowing that a legislator has, for example, sponsored 100 "Education" bills requires knowing how to find "Education" bills in the first place. Deciding what does and does not count as an "Education" bill has long challenged the many researchers who have sought a tractable way to measure the policy content of legislation. I present two approaches to classifying legislation into policy areas, offering both generalizable methodological insights for social scientists and two new datasets that classify the universe of state and U.S. Congressional legislation from 2009-2023 into broad, comprehensive policy areas. The first approach presumes a starting point where the researcher has not coded any bills by hand. I begin by refining the "dictionary method" presented in Garlick (2022), using a set of keywords chosen to represent topics, and then train a supervised machine learning model to learn the context that often surrounds these keywords and to predict the topics it should assign to each bill. This allows the model to generate predictions for bills that do not contain keywords and to form a richer depiction of topics than simplistic keyword-to-topic assignment rules allow. The second approach involves training a supervised machine learning model to emulate the decision function that generated extant hand-coded data. I use hand-coded data from the Congressional Bills Project (Adler and Wilkerson 2015) and the Pennsylvania Policy Database Project (McLaughlin et al. 2010) to train the model, offer methodological insights on how to handle "noise" in hand-coded data (such as two hand-coders coding the same bill differently), and present several ways to validate the model's output and generate out-of-sample predictions.
Further, I demonstrate how the downstream researcher can use a supervised machine learning model as a companion to their hand-coders, as it offers a unique and compelling perspective on their corpus that can improve the quality of the hand-coded data.
- Graduation Semester
- 2023-05
- Type of Resource
- Thesis
- Copyright and License Information
- Copyright 2023 Ethan Dee
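The first approach in the abstract (keyword labels first, then a supervised model that learns the surrounding context) can be illustrated with a toy sketch. This is not the dissertation's implementation: the keyword dictionary, the example bills, the choice of a multinomial naive Bayes classifier, and the decision to drop the keywords from the training features are all assumptions made here for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword dictionary; topics and keywords are illustrative only.
KEYWORDS = {
    "Education": {"school", "teacher"},
    "Health": {"hospital", "medicaid"},
}
ALL_KEYWORDS = set().union(*KEYWORDS.values())


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_label(tokens):
    """Return the topic with the most keyword hits, or None if nothing matches.

    Ties go to the topic listed first in KEYWORDS.
    """
    hits = {topic: sum(t in kws for t in tokens) for topic, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def train_naive_bayes(examples):
    """Fit a multinomial naive Bayes model with add-one smoothing.

    examples: list of (tokens, topic) pairs. Returns a predict(tokens) function.
    """
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for tokens, topic in examples:
        doc_counts[topic] += 1
        word_counts[topic].update(tokens)
        vocab.update(tokens)
    total_docs = sum(doc_counts.values())

    def predict(tokens):
        scores = {}
        for topic in doc_counts:
            total_words = sum(word_counts[topic].values())
            score = math.log(doc_counts[topic] / total_docs)
            for t in tokens:
                if t in vocab:  # ignore words never seen in training
                    score += math.log(
                        (word_counts[topic][t] + 1) / (total_words + len(vocab))
                    )
            scores[topic] = score
        return max(scores, key=scores.get)

    return predict


# Made-up bill titles; the last two contain no dictionary keyword.
bills = [
    "An act to raise teacher salaries and revise the classroom curriculum",
    "An act funding school construction and classroom curriculum materials",
    "An act regulating hospital billing for patients and nursing care",
    "An act expanding Medicaid coverage for patients needing nursing care",
    "An act to update the curriculum used in each classroom",
    "An act concerning patients in nursing facilities",
]

tokenized = [tokenize(b) for b in bills]
weak_labels = [dictionary_label(toks) for toks in tokenized]

# Train only on keyword-labeled bills, with the keywords themselves removed
# so that the model has to rely on the surrounding context words.
training = [
    ([w for w in toks if w not in ALL_KEYWORDS], label)
    for toks, label in zip(tokenized, weak_labels)
    if label is not None
]
predict = train_naive_bayes(training)

# Keep the dictionary label where one exists; otherwise use the model.
predictions = [
    label if label is not None else predict(toks)
    for toks, label in zip(tokenized, weak_labels)
]

for bill, topic in zip(bills, predictions):
    print(f"{topic:10s} {bill}")
```

In this toy corpus the last two bills receive no dictionary label, so their topics come from context words such as "curriculum" or "patients" that co-occur with keywords in the labeled bills. A real application would need a far larger corpus, a stronger model, and validation against hand-coded data.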
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois