Automatic classification of electronic music and speech/music audio content
Chen, Austin
Permalink
https://hdl.handle.net/2142/49569
Description
- Title
- Automatic classification of electronic music and speech/music audio content
- Author(s)
- Chen, Austin
- Issue Date
- 2014-05-30
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark A.
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- speech/music discrimination
- genre classification
- Music information retrieval
- Gaussian mixture model
- Audio content analysis
- Audio classification
- Abstract
- Automatic audio categorization has great potential for application in the maintenance and usage of large and constantly growing media databases; accordingly, much research has been done to demonstrate the feasibility of such methods. A popular topic is automatic genre classification, accomplished by training machine learning algorithms. However, “electronic” or “techno” music is often misrepresented in prior work, especially given the recent rapid evolution of the genre and its subsequent splintering into distinctive subgenres. As such, features are extracted from electronic music samples in an experiment to categorize song samples into three subgenres: deep house, dubstep, and progressive house. An overall classification accuracy of 80.67% is achieved, comparable to prior work. Similarly, many past studies have addressed speech/music discrimination because of its potential applications in broadcast and other media, but this work expands the experimental scope to include samples of speech with varying amounts of background music. Two measures of the ratio between speech energy and music energy are developed and evaluated: a reference measure called the speech-to-music ratio (SMR) and a feature, the estimated voice-to-music ratio (eVMR), which is an imprecise estimate of SMR. eVMR is an objective signal measure computed by exploiting broadcast mixing techniques in which vocals, unlike most instruments, are typically placed at stereo center. SMR, in contrast, is a hidden variable defined by the relationship between the powers of the portions of the audio attributed to speech and music. It is shown that eVMR is predictive of SMR and can be combined with state-of-the-art features to improve performance. For evaluation, this new metric is applied to speech/music (binary) classification, speech/music/mixed (trinary) classification, and a new speech-to-music ratio estimation problem. Promising results are achieved, including 93.06% accuracy for trinary classification and an RMSE of 3.86 dB for SMR estimation. (An illustrative sketch of the SMR and eVMR definitions follows the metadata listing below.)
- Graduation Semester
- 2014-05
- Permalink
- http://hdl.handle.net/2142/49569
- Copyright and License Information
- Copyright 2014 Austin Chen
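
The abstract describes SMR as a hidden reference quantity (the power ratio between the speech and music portions of a mix) and eVMR as an observable, imprecise estimate that exploits center-panned vocals. The thesis's exact formulations are not reproduced on this record page, so the Python sketch below is only an assumed illustration of those ideas; the function names `smr_db` and `evmr_db`, the mid/side decomposition, and the synthetic mixing in the usage snippet are hypothetical, not the author's implementation.

```python
# Illustrative sketch only; the exact eVMR/SMR formulas are not given on this page.
import numpy as np


def smr_db(speech: np.ndarray, music: np.ndarray, eps: float = 1e-12) -> float:
    """Reference speech-to-music ratio (SMR) in dB.

    Assumes the separate speech and music signals are available before mixing;
    in practice this is a hidden variable usable only as ground truth.
    """
    p_speech = np.mean(speech ** 2)
    p_music = np.mean(music ** 2)
    return 10.0 * np.log10((p_speech + eps) / (p_music + eps))


def evmr_db(left: np.ndarray, right: np.ndarray, eps: float = 1e-12) -> float:
    """Rough estimated voice-to-music ratio (eVMR) from a stereo mix.

    Assumes broadcast-style mixing: vocals panned to stereo center, so the
    mid channel (L + R) emphasizes speech while the side channel (L - R)
    retains mostly instrumental content.
    """
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return 10.0 * np.log10((np.mean(mid ** 2) + eps) / (np.mean(side ** 2) + eps))


# Hypothetical usage with synthetic stems: vocals centered, music hard-panned.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16_000)
music = 0.5 * rng.standard_normal(16_000)
left, right = speech + music, speech - music
print(f"SMR  = {smr_db(speech, music):.2f} dB")
print(f"eVMR = {evmr_db(left, right):.2f} dB")
```

The mid channel reinforces center-panned vocals while the side channel cancels them, which is why a mid-to-side energy ratio can only roughly track the true SMR: off-center instruments and stereo effects on the voice both bias the estimate, matching the abstract's characterization of eVMR as imprecise but predictive.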
Owning Collections
- Graduate Dissertations and Theses at Illinois (PRIMARY)
- Dissertations and Theses - Electrical and Computer Engineering