Improving music mood classification using lyrics, audio and social tags

Hu, Xiao

Permalink

https://hdl.handle.net/2142/18435

Description

Title

Improving music mood classification using lyrics, audio and social tags

Author(s)

Hu, Xiao

Issue Date

2011-01-14T22:50:50Z

Director of Research (if dissertation) or Advisor (if thesis)

Downie, J. Stephen

Doctoral Committee Chair(s)

Smith, Linda C.

Committee Member(s)

Downie, J. Stephen
Zhai, ChengXiang
Heidorn, P. Bryan

Department of Study

Library & Information Science

Discipline

Library & Information Science

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2011-01-14T22:50:50Z

Keyword(s)

Music mood classification
Music
Mood
Metadata
Social tags
Lyrics
Affect analysis
Multimodal classification
Emotion theories
Music mood categories
Music information retrieval

Abstract

The affective aspect of music (popularly known as music mood) is a newly emerging metadata type and access point to music information, but it has not been well studied in information science. A suitable set of mood categories that reflects the reality of music listening and can be widely adopted in the Music Information Retrieval (MIR) community has yet to be developed. As music repositories have grown to an unprecedentedly large scale, people call for automatic tools for music classification and recommendation. However, there have been only a few music mood classification systems, with suboptimal performance, and most of them are based solely on the audio content of the music. Lyric text and social tags are resources independent of and complementary to audio content but have yet to be fully exploited. This dissertation research takes up these problems and aims to 1) summarize fundamental insights in music psychology that can help information scientists interpret music mood; 2) identify mood categories that are frequently used by real-world music listeners, through an empirical investigation of real-life social tags applied to music; and 3) advance the technology in automatic music mood classification through a thorough investigation of lyric text analysis and the combination of lyrics and audio. Using linguistic resources and human expertise, 36 mood categories were identified from the most popular social tags collected from last.fm, a major Western music tagging site. A ground truth dataset of 5,296 songs in 18 mood categories was built with mood labels given by a number of real-life users. Both commonly used text features and advanced linguistic features were investigated, as well as different feature representation models and feature combinations. The best performing lyric feature sets were then compared to a leading audio-based system.
In combining lyric and audio sources, two methods were examined and compared: feature concatenation and late fusion (linear interpolation) of classifiers. Finally, system performance was compared across various numbers of training examples and different audio lengths. The results indicate that 1) social tags can help identify mood categories suitable for a real-world music listening environment; 2) the most useful lyric features are linguistic features combined with text stylistic features; 3) lyric features outperform audio features in terms of averaged accuracy across all considered mood categories; 4) systems combining lyrics and audio outperform audio-only and lyric-only systems; and 5) combining lyrics and audio can reduce the requirement on training data size, both in number of examples and in audio length. Contributions of this research are threefold. On methodology, it improves the state of the art in music mood classification and text affect analysis in the music domain. The mood categories identified from empirical social tags can complement those in theoretical psychology models. In addition, many of the lyric text features examined in this study have never been formally studied in the context of music mood classification or compared to each other using a common dataset. On evaluation, the ground truth dataset built in this research is large and unique in having three types of information available: audio, lyrics and social tags. Part of the dataset has been made available to the MIR community through the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and 2010, the community-based evaluation framework. The proposed method of deriving ground truth from social tags provides an effective alternative to expensive human assessments of music and thus clears the way to large-scale experiments.
On application, the findings of this research help build effective and efficient music mood classification and recommendation systems by optimizing the interaction of music audio and lyrics. A prototype of such a system can be accessed at http://moodydb.com.
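
The abstract names two ways of combining lyric and audio sources: feature concatenation and late fusion by linear interpolation of classifiers. The sketch below illustrates the general idea of each in Python. It is not the dissertation's implementation: the function names, the interpolation weight `alpha`, and the example numbers are all assumptions made for illustration, and the abstract does not state which classifiers or weights were actually used.

```python
def concatenate_features(lyric_features, audio_features):
    """Feature concatenation: join both feature vectors into one,
    so that a single classifier can be trained on the combined vector."""
    return list(lyric_features) + list(audio_features)


def late_fusion(p_lyrics, p_audio, alpha=0.5):
    """Late fusion by linear interpolation.

    p_lyrics and p_audio are per-category probability estimates produced
    by two separately trained classifiers (hypothetical inputs here).
    alpha is an assumed weight on the lyric-based classifier.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(p_lyrics) != len(p_audio):
        raise ValueError("both classifiers must score the same categories")
    return [alpha * pl + (1.0 - alpha) * pa
            for pl, pa in zip(p_lyrics, p_audio)]


# Illustrative numbers only: two mood categories scored by each classifier.
fused = late_fusion([0.8, 0.2], [0.4, 0.6], alpha=0.5)
combined = concatenate_features([1.0, 0.0], [0.3, 0.7, 0.1])
```

With `alpha = 1.0` the fused output reduces to the lyric-only estimate and with `alpha = 0.0` to the audio-only estimate, so the weight would normally be tuned on held-out data.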

Graduation Semester

2010-12

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Information Sciences

Dissertations and theses from the School of Information Sciences
