Can Statistical Language Models be Used for the Analysis of Harmonic Progressions?
The availability of large, electronically encoded text corpora and the use of computers in recent decades have made Natural Language Processing (NLP) a flourishing research area. A wealth of standard techniques has been developed to serve use cases like document retrieval, identification of a finite vocabulary and synonyms, and the collocation of terms. Similarly, social networking among musicians in internet forums and the advent of automatic chord extraction have led to the establishment of chord databases, albeit on a smaller scale. Comparatively little research has been carried out on these growing corpora of chords. We suspect that one reason for this lack of research lies in the difficulty of deciding whether chords or other harmonic elements can be treated like lexemes in a text corpus. More simply, the question is: What is a word in terms of harmony? In this paper we propose a bottom-up approach. In order to find harmonic units whose distributions resemble those of words, we consider chord elements differing in (a) length of chord sequence (counted in chord symbols), and (b) chord alphabet. Using lengths from 1 to 4 and two different chord alphabets, we obtain a parameter space of size 8. For each of the parameter settings we compute statistical summaries of the resulting frequency distribution of the harmonic unit. As a result, we report the parameter settings for two different chord corpora (2500+ songs each) that generate a frequency model corresponding most closely to the Brown Corpus, a general text corpus of American English.
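The parameter sweep described in the abstract (sequence lengths 1 to 4 crossed with two chord alphabets) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `root:quality` chord label syntax, the `simplify` mapping used as a second alphabet, the toy `songs` data, and the choice of summary statistics (token count, type count, and number of units occurring once).

```python
from collections import Counter
from itertools import product


def simplify(chord):
    """Hypothetical reduced alphabet: keep the root and a major/minor quality."""
    root, _, quality = chord.partition(":")
    return f"{root}:{'min' if quality.startswith('min') else 'maj'}"


def ngram_counts(songs, n, mapper=lambda c: c):
    """Count chord sequences of length n, never crossing song boundaries."""
    counts = Counter()
    for song in songs:
        symbols = [mapper(c) for c in song]
        for i in range(len(symbols) - n + 1):
            counts[tuple(symbols[i:i + n])] += 1
    return counts


def summarize(counts):
    """Illustrative summaries of a frequency distribution (assumed, not the paper's)."""
    return {
        "tokens": sum(counts.values()),
        "types": len(counts),
        "hapax": sum(1 for v in counts.values() if v == 1),
    }


# Toy data standing in for a chord corpus (assumed label format).
songs = [
    ["C:maj", "F:maj", "G:7", "C:maj"],
    ["A:min", "F:maj", "G:7", "C:maj"],
]

# Two alphabets times four lengths gives the 8 parameter settings.
alphabets = {"full": lambda c: c, "reduced": simplify}
results = {
    (name, n): summarize(ngram_counts(songs, n, mapper))
    for (name, mapper), n in product(alphabets.items(), range(1, 5))
}

for setting, summary in sorted(results.items()):
    print(setting, summary)
```

Comparing such summaries against the same statistics computed on word counts from a text corpus would be one way to judge which setting behaves most like words; the comparison measure itself is left open here.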
@inproceedings{mauch:csl:2008,
Author = {Mauch, M. and M{\"u}llensiefen, D. and Dixon, S. and Wiggins, G.},
Booktitle = {Proceedings of the 10th International Conference on Music Perception and Cognition, Sapporo, Japan},
Title = {Can Statistical Language Models be Used for the Analysis of Harmonic Progressions?},
Year = {2008}}









