Statistical Machine Learning Techniques for Chord Estimation 


Chroma Representation, Background

Because they provide a powerful and compact representation of the tonal content of the signal, chroma features have been widely used as input features for music analysis models based on harmonic content, such as chord or key finding, cover song detection or structure estimation. Various approaches to chroma computation exist. Although they differ in implementation details, they generally follow the same guideline, which consists of two main steps:
1. First, a semitone pitch spectrum (SPS), that is, a log-frequency representation of the spectral content of the music audio signal, is constructed. It is expressed on a MIDI-note scale and is computed either from the Fourier transform or from the constant-Q transform. The center frequencies of the CQT can be chosen according to the frequencies of the equal-tempered scale; in that case, the constant-Q spectrum directly corresponds to a semitone pitch spectrum.
2. Second, the semitone pitch spectrum is mapped to the chroma vectors: semitones lying an octave apart are summed into pitch classes.
The chromagram computation may include other steps, such as a pre-processing step that separates harmonic and noise components, a filtering step that smooths the chromagram, or a post-processing normalization step that makes the chromagram invariant to dynamics. In the following, we review some chroma feature extraction methods.
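As an illustration, the octave-folding step can be sketched in a few lines. The function name `fold_to_chroma`, the MIDI range starting at note 24, and the max-normalization (mirroring the optional post-processing step mentioned above) are illustrative assumptions, not the thesis's exact implementation:

```python
import numpy as np

def fold_to_chroma(semitone_spectrum, midi_start=24):
    """Fold a semitone pitch spectrum (one energy value per MIDI note)
    into a 12-bin chroma vector by summing notes an octave apart."""
    chroma = np.zeros(12)
    for i, energy in enumerate(semitone_spectrum):
        midi_note = midi_start + i
        chroma[midi_note % 12] += energy
    # optional post-processing: normalize to be invariant to dynamics
    norm = chroma.max()
    return chroma / norm if norm > 0 else chroma

# Toy example: energy only at MIDI notes 60 (C4) and 72 (C5)
sps = np.zeros(60)          # covers MIDI notes 24..83
sps[60 - 24] = 1.0
sps[72 - 24] = 1.0
chroma = fold_to_chroma(sps)
```

Both notes fall into the same C bin, which illustrates why the representation is invariant to octave position.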

Chromagram Based on the Fourier Transform

In many approaches, the chromagram is generated using the Fourier transform. This approach was first proposed by Fujishima in [Fuj99], where the input signal is transformed from the time domain to the frequency domain using an FFT. Frequency bins corresponding to the same semitone are summed to form a semitone pitch spectrum, which is then folded into pitch classes, resulting in a PCP vector.
This approach was followed by a large number of researchers with some variants.
In some approaches, the resolution of the chromagram is increased in order to improve robustness against mistuning and other frequency oscillations, such as in the work of Goto [Got06], where the chromagram is computed with 100 bins (cents) per tempered semitone. Other approaches introduce a filtering process to reduce transients and noise, such as the work of Peeters [Pee06b].
The FFT is particularly blurred at low frequencies, where the spacing between adjacent semitones falls below the bin resolution. In order to identify strong tonal components in the spectrum and obtain a higher-resolution estimate of the underlying frequency, Ellis & Poliner [EP07] do not compute the chroma features directly from the FFT. Instead, they use the instantaneous frequency spectrum, which uses the phase derivative to interpolate the frequency distribution.
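The phase-derivative idea can be sketched with a generic phase-vocoder estimator. This is an illustration of the technique under simple assumptions (two Hann-windowed frames, a hypothetical function name), not Ellis & Poliner's actual code:

```python
import numpy as np

def instantaneous_frequency(x, sr, n_fft=2048, hop=256):
    """Per-bin instantaneous frequency from the phase difference between
    two successive STFT frames (a generic phase-vocoder sketch)."""
    win = np.hanning(n_fft)
    f1 = np.fft.rfft(win * x[:n_fft])
    f2 = np.fft.rfft(win * x[hop:hop + n_fft])
    bins = np.arange(n_fft // 2 + 1)
    expected = 2 * np.pi * bins * hop / n_fft          # nominal phase advance per hop
    dphi = np.angle(f2) - np.angle(f1) - expected
    dphi = np.mod(dphi + np.pi, 2 * np.pi) - np.pi     # wrap deviation into (-pi, pi]
    return (bins + dphi * n_fft / (2 * np.pi * hop)) * sr / n_fft

# A 435 Hz sine falls between FFT bins (~10.8 Hz apart at sr = 22050);
# the phase-based estimate recovers the frequency with sub-bin accuracy.
sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 435.0 * t)
f_if = instantaneous_frequency(x, sr)
k = int(np.argmax(np.abs(np.fft.rfft(np.hanning(2048) * x[:2048]))))
f_est = f_if[k]
```

The peak FFT bin alone is several hertz away from the true frequency, while the phase-based estimate lands very close to it.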

Considering the Harmonics in the Pitch Class Profiles

Some methods for chroma computation take the higher harmonics of the notes into account in the chroma feature computation. For instance, Gómez introduces in [Góm06a] an extension of the PCP, the Harmonic Pitch Class Profiles (HPCPs). A weighting procedure is proposed so that each harmonic contributes to the pitch class of its fundamental frequency: each peak frequency f_i contributes to the pitch classes of the frequencies having f_i as a harmonic (f_i, f_i/2, f_i/3, f_i/4, ...).
Lee [Lee06a] proposes a feature vector called the Enhanced Pitch Class Profile (EPCP) for chord recognition from audio. The chromagram is computed from the Harmonic Product Spectrum (HPS) instead of the DFT. The use of an HPS helps eliminate non-tonal signal components from the spectrum.
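A minimal HPS sketch, assuming the standard downsample-and-multiply formulation (Lee's exact implementation may differ): the spectrum is multiplied with compressed copies of itself, so only bins whose harmonics are all present keep significant energy.

```python
import numpy as np

def harmonic_product_spectrum(mag, n_harmonics=4):
    """Multiply the magnitude spectrum with downsampled copies of itself
    so that a bin survives only if its harmonics are also present."""
    n = len(mag) // n_harmonics
    hps = mag[:n].copy()
    for h in range(2, n_harmonics + 1):
        hps *= mag[::h][:n]     # compress the spectrum by factor h and multiply
    return hps

# Toy spectrum: a fundamental at bin 10 with harmonics at bins 20, 30, 40,
# plus an isolated non-tonal peak at bin 17.
mag = np.full(128, 0.01)
for b in (10, 20, 30, 40):
    mag[b] = 1.0
mag[17] = 1.0
hps = harmonic_product_spectrum(mag)
```

The fundamental bin dominates the result, while the isolated peak at bin 17 is strongly attenuated because it lacks supporting harmonics.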

Chromagram Based on multi-f0s

Some approaches compute chroma features from a multi-pitch representation instead of a spectral representation. For instance, Ryynänen & Klapuri [RK08b] compute a chromagram from a pitch salience estimator. In [ZR07], Zenz & Rauber compute a multi-pitch-based chromagram using the Enhanced Autocorrelation (EAC) algorithm described by Tolonen et al. [TK00]. Varewyck et al. [VPM08] also propose a chroma extraction method based on multiple pitch tracking techniques.

Why Using Chroma Features for Harmonic Content Analysis?

We have chosen to use the chroma representation because we consider it a very intuitive and natural representation of the signal in terms of harmony. We find it particularly convenient for chord analysis: the 12 bins of the chroma features correspond to the traditional pitch classes of the equal-tempered scale. The chromagram can be followed like a musical score when listening to the music.

Derivation of Chroma Features

In what follows, we focus on three chroma representation extraction methods. The first two are based on the two above-mentioned spectral representations of the signal (FFT and CQT); the third is based on a multi-pitch tracking technique. These approaches are analyzed and compared in Section 3.6.

Chroma Based on a Spectral Representation

We review here two chromagram computation methods based on a spectral representation: the first relies on the conventional fixed-resolution FFT, the second on the multi-resolution CQT. Both methods follow the same general scheme, represented in Figure 3.5. We start by estimating the tuning of the piece. After tuning estimation, the chromagram is computed in three steps. First, the values of the DFT/CQT are mapped to a semitone pitch spectrum. The corresponding channels are then smoothed over time. Finally, the resulting semitone pitch spectrum is mapped to the semitone pitch classes.
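The tuning estimation step can be sketched as follows. The name `estimate_tuning` and the circular-mean estimator over peak deviations are illustrative assumptions, not necessarily the method used here: each spectral peak is compared to the nearest equal-tempered semitone, and the deviations (in cents) are averaged.

```python
import numpy as np

def estimate_tuning(peak_freqs, a4=440.0):
    """Estimate a global tuning deviation in cents (within +/-50) from a
    set of spectral peak frequencies, relative to the A4 = 440 Hz grid."""
    midi = 69 + 12 * np.log2(np.asarray(peak_freqs) / a4)
    dev = (midi - np.round(midi)) * 100.0        # cents from nearest semitone
    # circular mean avoids the wrap-around at +/-50 cents
    ang = dev * 2 * np.pi / 100.0
    return np.angle(np.mean(np.exp(1j * ang))) * 100.0 / (2 * np.pi)

# Peaks from an instrument tuned ~20 cents sharp of A4 = 440 Hz
sharp = [f * 2 ** (20 / 1200) for f in (220.0, 440.0, 659.26, 880.0)]
cents = estimate_tuning(sharp)
```

Once the deviation is known, the semitone grid used for the DFT/CQT mapping can simply be shifted by that many cents before folding to pitch classes.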

Table of Contents

1 Introduction 
1.1 Motivations
1.2 Scope of the Thesis
1.3 Relevant Music Theoretic Concepts and Terminology
1.3.1 Notes
1.3.2 Key and Scales
1.3.3 Chords
1.3.4 Metrical Structure
1.4 Applications
1.5 Objectives
1.6 Overview of the Thesis
1.7 Main Thesis Contributions
2 Databases and Evaluation Measures Used in This Dissertation 
2.1 Introduction
2.2 About Evaluation
2.3 Music Collections for Evaluation
2.3.1 Signal Experiment Test-set
2.3.2 Popular Music: The Beatles Test-set
2.3.2.1 Chord Annotations
2.3.2.2 Key Annotations
2.3.2.3 Metric Structure Annotation
2.3.3 Classical Music: The Piano Mozart test-set
2.3.4 Detailed Analysis of the Databases
2.4 Evaluation measures
2.4.1 Beat and Downbeat Tracking Evaluation Measure
2.4.2 Chord Evaluation Measures
2.4.2.1 Label Accuracy
2.4.2.2 Segmentation Accuracy
2.4.2.3 Neighboring Chords Confusions
2.4.3 Keys
2.4.3.1 Main Key
2.4.3.2 Local Keys
2.4.4 About Statistical Significance Testing
2.4.5 About Evaluation of Algorithms Based on Training
3 Towards a Signal Representation for Harmonic Content Analysis 
3.1 Introduction
3.2 A Representation of Audio for Harmonic Content Analysis
3.2.1 Music Transcription-Based Approaches
3.2.2 Chroma Representation, an Alternative to Transcription
3.2.2.1 Definition
3.2.3 Representation of Music Signals, Notations
3.2.4 About Acoustic Signal Representation
3.2.4.1 Fourier Transform
3.2.4.2 Frequency Resolution Versus Time Resolution
3.2.4.3 Constant-Q Transform
3.3 Chroma Representation, Background
3.3.1 Chromagram Based on the Fourier Transform
3.3.2 Considering the Harmonics in the Pitch Class Profiles
3.3.3 Constant-Q Profiles
3.3.4 Chromagram Based on multi-f0s
3.3.5 Filter bank
3.3.6 Extension: the Tonal Centroid
3.3.7 Why Using Chroma Features for Harmonic Content Analysis?
3.4 Derivation of Chroma Features
3.4.1 Chroma Based on a Spectral Representation
3.4.1.1 Tuning
3.4.1.2 Frequency Region Selection for Chroma Computation
3.4.1.3 Computation of a Semitone Pitch Spectrum
3.4.1.4 Smoothing
3.4.1.5 Chroma Spectrum
3.4.1.6 Post-processing: Normalization
3.4.2 Chroma Based on multiple f0s
3.5 Two Problems Related to the Chroma Features
3.5.1 Chroma Features and Harmonics
3.5.2 Beat-Synchronous Analysis
3.5.2.1 Towards a Beat-Synchronous Analysis
3.5.2.2 Problem of Mixing Harmonies
3.5.2.3 Influence of the Position of an Adaptive Window
3.6 Selecting a Feature Vector for Harmonic Analysis
3.6.1 Defining a Measure to Compare Various Features
3.6.1.1 Previously proposed measures
3.6.1.2 Proposed Measure for Chroma Feature Comparison
3.6.2 Database for Feature Selection
3.6.3 On the use of a Beat-Synchronous Analysis
3.6.3.1 Beat-Synchronous Versus Frame-by-Frame Analysis
3.6.3.2 Influence of the Position of an Adaptive Window
3.6.3.3 Conclusion on Beat-Synchronous Analysis
3.6.4 Fixed versus Multi-resolution Analysis
3.6.5 Multi-f0s Versus Spectral Representation
3.7 Summary and Conclusion
4 Chord Progression Estimation From an Audio File 
4.1 Introduction
4.2 Previous Work on Chord Estimation
4.2.1 Features That Describe the Harmonic Content
4.2.2 Statistical Machine Learning Techniques for Chord Estimation
4.2.2.1 HMM-based Baseline Approaches
4.2.2.2 Chords and Musical Context
4.2.2.3 Introducing Language Modeling, N-grams
4.2.2.4 Other Statistical Modeling Approaches
4.2.3 Pattern Matching Approaches
4.2.4 Real-Time Implementation for Chord Estimation
4.2.5 Summary of Chords Estimation Techniques
4.2.5.1 Summary of the Above-Presented Methods
4.2.5.2 Summary of the MIREX Chord Recognition Systems
4.3 Proposed Approach for Chord Estimation
4.3.1 Hidden Markov Models
4.3.2 On the Use of HMM for Chord Estimation
4.3.3 The Problem of the Harmonics
4.4 Chord Estimation From the Chroma Vectors Using a HMM
4.4.1 Model
4.4.1.1 Chord Lexicon
4.4.1.2 Overview of the Proposed Model
4.4.2 Initial State Distribution
4.4.3 Observation Symbol Probability Distribution
4.4.3.1 Method 1
4.4.3.2 Method 2
4.4.3.3 Method 3
4.4.4 State Transition Probability Distribution
4.4.4.1 Method A
4.4.4.2 Method B
4.4.4.3 Method C
4.4.4.4 Method D
4.4.5 Chord Progression Detection Over Time
4.5 Evaluation and Results
4.5.1 Test Set and Protocol
4.5.2 Results
4.5.3 Analysis of Results
4.5.3.1 Chord Estimation Method
4.5.3.2 Transition Matrix
4.5.3.3 Number of Harmonics
4.5.4 Discussion
4.5.4.1 Chord Confusions Due to Ambiguous Mapping
4.5.4.2 Neighboring Triad Confusions
4.5.4.3 Passing or Missing Tones
4.5.4.4 Limitation for Inharmonic Sounds
4.6 Conclusion
5 Joint Estimation of Chords and Downbeats 
5.1 Introduction
5.2 Related Work
5.3 Proposed Approach
5.4 Model
5.4.1 Extraction of Beat-Synchronous Chroma Features
5.4.2 Overview of the Model
5.4.3 Initial State Distribution π
5.4.4 Observation Probabilities
5.4.4.1 Observation pim Probability Distribution
5.4.4.2 Observation Chord Symbol Probability Distribution
5.4.5 State Transition Probability Distribution
5.4.5.1 Distribution of Chord Changes
5.4.5.2 Transition Matrix for a Constant 4/4 Meter
5.4.5.3 Transition Matrix for a Variable Meter
5.4.6 Simultaneous Estimation of Chords and Downbeats
5.5 Evaluation Method
5.6 Analysis of the Results
5.6.1 Chords and Downbeats Interaction
5.6.2 Downbeat Position Estimation
5.6.2.1 Semi-automatic Downbeat Position Estimation
5.6.2.2 Estimated Beats Versus Theoretical Beats
5.6.2.3 Comparison With the State-of-the-art
5.6.2.4 Handling Variable Meter
5.6.2.5 Handling Insertion or Deletion of Beats
5.6.3 Chord Estimation
5.6.3.1 MIREX 2008 “Audio Chord Detection”
5.6.3.2 MIREX 2009 “Audio Chord Detection”
5.6.3.3 Chord Segmentation
5.6.3.4 Analysis of Chord Detection Errors
5.6.3.5 Tactus-synchronous Versus Tactum-synchronous Analysis
5.6.4 Case Study Examples
5.6.4.1 Boundary Errors
5.6.4.2 Chord Changes
5.7 Conclusion
6 Interaction Between Chords, Downbeats and Keys 
6.1 Introduction
6.1.1 Organization of the Chapter
6.2 Related work
6.2.1 Global Key
6.2.1.1 Template-Based Key-finding Models
6.2.1.2 Key-finding Models Based on HMMs
6.2.1.3 The Spiral Array Model
6.2.2 Local Key
6.2.3 Key Estimation Methods Based on Chord Progression
6.2.4 Summary of the Works on Key Estimation
6.3 Interaction between Chords, Meter and Global Key
6.3.1 Overview of the Model
6.3.2 Musical Key Information in the Transition Matrix
6.3.3 Simultaneous estimation of key, chords and downbeats
6.3.3.1 Key Selection
6.3.3.2 Post-processing Key Estimation Step
6.3.4 Test-Set and Evaluation Measure
6.3.5 Overall Results
6.3.6 Analysis of the Results
6.4 Interaction between Chords, Meter and Local Key
6.4.1 The Problem of the Analysis Window length
6.4.2 Model
6.4.3 Extraction of Key Observation Vectors
6.4.4 Key Estimation From Chords Using Hidden Markov Models
6.4.4.1 Initial State Distribution
6.4.4.2 Observation Probabilities of Keys
6.4.4.3 State Transition Probability Distribution
6.4.4.4 Local Key Estimation
6.4.5 Evaluation
6.4.5.1 Test-set and evaluation measures
6.4.5.2 Results and discussion
6.4.5.3 Relationship Between Chords and Local Key
6.4.5.4 Importance of the Metrical Structure
6.4.5.5 Effect of the Length of the Analysis Window
6.4.5.6 Effect of the Choice of the Key Templates
6.4.5.7 Smooth Modulations
6.5 Conclusion of the Chapter
7 Conclusion 
7.1 Thesis Contributions
7.1.1 Features
7.1.2 Chords
7.1.3 Downbeat
7.1.4 Key
7.2 Future Works
Annexe A – List of the Beatles songs 
Annexe B – List of publications 
Bibliography 
