Improving music mood classification using lyrics, audio and social tags
Hu, Xiao
Permalink
https://hdl.handle.net/2142/18435
Description
- Title
- Improving music mood classification using lyrics, audio and social tags
- Author(s)
- Hu, Xiao
- Issue Date
- 2011-01-14T22:50:50Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Downie, J. Stephen
- Doctoral Committee Chair(s)
- Smith, Linda C.
- Committee Member(s)
- Downie, J. Stephen
- Zhai, ChengXiang
- Heidorn, P. Bryan
- Department of Study
- Library & Information Science
- Discipline
- Library & Information Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Music mood classification
- Music
- Mood
- Metadata
- Social tags
- Lyrics
- Affect analysis
- Multimodal classification
- Emotion theories
- Music mood categories
- Music information retrieval
- Abstract
- The affective aspect of music (popularly known as music mood) is a newly emerging metadata type and access point to music information, but it has not been well studied in information science. There has yet to be developed a suitable set of mood categories that can reflect the reality of music listening and can be well adopted in the Music Information Retrieval (MIR) community. As music repositories have grown to an unprecedentedly large scale, people call for automatic tools for music classification and recommendation. However, there have been only a few music mood classification systems with suboptimal performance, and most of them are solely based on the audio content of the music. Lyric text and social tags are resources independent of and complementary to audio content but have yet to be fully exploited. This dissertation research takes up these problems and aims to 1) summarize fundamental insights in music psychology that can help information scientists interpret music mood; 2) identify mood categories that are frequently used by real-world music listeners, through an empirical investigation of real-life social tags applied to music; and 3) advance the technology in automatic music mood classification by a thorough investigation of lyric text analysis and the combination of lyrics and audio. Using linguistic resources and human expertise, 36 mood categories were identified from the most popular social tags collected from last.fm, a major Western music tagging site. A ground truth dataset of 5,296 songs in 18 mood categories was built with mood labels given by a number of real-life users. Both commonly used text features and advanced linguistic features were investigated, as well as different feature representation models and feature combinations. The best-performing lyric feature sets were then compared to a leading audio-based system.
In combining lyric and audio sources, both methods of feature concatenation and late fusion (linear interpolation) of classifiers were examined and compared. Finally, system performances on various numbers of training examples and different audio lengths were compared. The results indicate: 1) social tags can help identify mood categories suitable for a real world music listening environment; 2) the most useful lyric features are linguistic features combined with text stylistic features; 3) lyric features outperform audio features in terms of averaged accuracy across all considered mood categories; 4) systems combining lyrics and audio outperform audio-only and lyric-only systems; 5) combining lyrics and audio can reduce the requirement on training data size, both in number of examples and in audio length. Contributions of this research are threefold. On methodology, it improves the state of the art in music mood classification and text affect analysis in the music domain. The mood categories identified from empirical social tags can complement those in theoretical psychology models. In addition, many of the lyric text features examined in this study have never been formally studied in the context of music mood classification nor been compared to each other using a common dataset. On evaluation, the ground truth dataset built in this research is large and unique with ternary information available: audio, lyrics and social tags. Part of the dataset has been made available to the MIR community through the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and 2010, the community-based evaluation framework. The proposed method of deriving ground truth from social tags provides an effective alternative to the expensive human assessments on music and thus clears the way to large scale experiments. 
On application, findings of this research help build effective and efficient music mood classification and recommendation systems by optimizing the interaction of music audio and lyrics. A prototype of such systems can be accessed at http://moodydb.com.
- Graduation Semester
- 2010-12
- Permalink
- http://hdl.handle.net/2142/18435
- Copyright and License Information
- Copyright 2010 Xiao Hu
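The abstract mentions late fusion of lyric-based and audio-based classifiers by linear interpolation. As a rough illustration only, and not the dissertation's actual implementation, the idea can be sketched in Python as below. The function names, the weight `alpha`, the 0.5 decision threshold, and the assumption that each classifier outputs one probability per mood category are all hypothetical choices made for this sketch.

```python
def late_fusion(p_lyrics, p_audio, alpha=0.5):
    """Linearly interpolate two classifiers' per-category probabilities.

    p_lyrics, p_audio: lists of probabilities, one entry per mood category
                       (assumed outputs of two independently trained models).
    alpha:             hypothetical weight on the lyric-based classifier;
                       the audio-based classifier gets weight (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_lyrics) != len(p_audio):
        raise ValueError("both classifiers must score the same categories")
    return [alpha * pl + (1.0 - alpha) * pa
            for pl, pa in zip(p_lyrics, p_audio)]


def predict_moods(p_lyrics, p_audio, alpha=0.5, threshold=0.5):
    """Return one True/False decision per category from the fused scores.

    The threshold is an assumption of this sketch, not a value from the source.
    """
    fused = late_fusion(p_lyrics, p_audio, alpha)
    return [score >= threshold for score in fused]


if __name__ == "__main__":
    # Made-up scores for two mood categories, purely for demonstration.
    lyric_scores = [1.0, 0.0]
    audio_scores = [0.0, 1.0]
    print(late_fusion(lyric_scores, audio_scores, alpha=0.75))
    print(predict_moods(lyric_scores, audio_scores, alpha=0.75))
```

Setting `alpha` to 1.0 or 0.0 recovers the lyric-only or audio-only system, so in a sketch like this the weight would normally be tuned on held-out data. Feature concatenation, the other combination method named in the abstract, would instead join the lyric and audio feature vectors before training a single classifier.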
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Information Sciences
Dissertations and theses from the School of Information Sciences