Computational Reconstruction of Cognitive Music Theory

Introduction
When listeners hear music, they sometimes apprehend a deep structure beneath the surface succession of notes. In those moments, we might say that an understanding of the piece has taken place. If such an underlying structure actually exists and we can extract meaning from it, could a computer system recover something analogous? This article examines musical structure from three connected perspectives: music theory as seen through artificial intelligence, music as a computational object, and music considered through linguistics. Although these viewpoints are deeply connected, each carries its own intellectual history and unresolved questions.
Music theory and artificial intelligence
A central goal of artificial intelligence is to model human cognition and to produce abstract representations of the real world. A representation method ought to partition its target domain by means of equivalence relations. In actual human perception, however, these equivalence relations do not surface directly; they appear indirectly, as judgments of similarity. According to the MIT Encyclopedia of the Cognitive Sciences, there are several approaches to modeling similarity: geometric, featural, alignment-based, and transformational. The key point is that every method of handling similarity is underpinned by equivalence relations. Whatever notion of similarity we entertain, its strength depends on how thoroughly equivalence relations hold recursively for the substructures of a piece. Put differently, a stable equivalence relation leads to a stable sense of similarity.
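To make the idea of partitioning a domain concrete, the following minimal Python sketch groups objects into equivalence classes by mapping each one to a canonical key; two objects are equivalent exactly when their keys are equal. The function names partition and pitch_class, the use of MIDI note numbers, and the choice of octave equivalence as the relation are assumptions made for illustration and are not taken from the text.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def partition(items: Iterable[T], key: Callable[[T], Hashable]) -> Dict[Hashable, List[T]]:
    """Group items into equivalence classes: a ~ b iff key(a) == key(b)."""
    classes: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


def pitch_class(midi_pitch: int) -> int:
    """Assumed relation for illustration: octave equivalence (MIDI pitch mod 12)."""
    return midi_pitch % 12


# Hypothetical data: C4, C5, D4, D5, C3 as MIDI note numbers.
notes = [60, 72, 62, 74, 48]
classes = partition(notes, pitch_class)
print(classes)  # {0: [60, 72, 48], 2: [62, 74]}
```

Because equivalence is defined here as equality of keys, the relation is automatically reflexive, symmetric, and transitive, so the resulting classes are disjoint and together cover the whole input. Swapping in a different key function yields a different partition of the same domain.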
Within music information research, musical similarity has attracted considerable attention. Some researchers are driven by practical engineering goals such as music retrieval, classification, and recommendation; others focus on modeling the cognitive processes that underlie our perception of similarity. A central question remains: how can such similarity—or an appropriate equivalence relation—be captured in a representation of musical objects? Marsden has set forth requirements for a representation system: musical objects must be well-defined and must all be grounded in real-world referents. These requirements are essential for mechanizing music theory and notably parallel the formalization of intelligence and the representation of knowledge.
Because a musical piece contains notes, passages, chords, rhythms, and many other elements, numerous kinds of equivalence relations can be considered. For example, compare two versions of a melody: the opening of Bach's Invention No. 1 and a variant transposed a perfect fifth higher with certain notes lowered. The two melodies are not equal in any literal note-by-note representation. But once we consider abstraction layers such as a pitch-interval representation, deeper equivalences may emerge.
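The effect of moving between abstraction layers can be sketched in a few lines of Python. The sketch assumes that pitches are encoded as MIDI note numbers and that the opening motive is C D E F D E C. The particular lowered note in the altered variant is hypothetical, and the contour layer is an additional abstraction introduced here for illustration; neither is specified in the text.

```python
from typing import List


def intervals(pitches: List[int]) -> List[int]:
    """Semitone differences between successive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def contour(pitches: List[int]) -> List[int]:
    """Direction of each step: +1 for up, -1 for down, 0 for a repeated pitch."""
    return [(d > 0) - (d < 0) for d in intervals(pitches)]


# Assumed encoding of the opening motive: C D E F D E C.
original = [60, 62, 64, 65, 62, 64, 60]

# Exact transposition up a perfect fifth (7 semitones).
transposed = [p + 7 for p in original]

# Hypothetical variant: the third note of the transposition lowered by a semitone.
altered = list(transposed)
altered[2] -= 1

print(original == transposed)                        # False: the note lists differ
print(intervals(original) == intervals(transposed))  # True: same interval sequence
print(intervals(original) == intervals(altered))     # False: one interval changed
print(contour(original) == contour(altered))         # True: same melodic shape
```

Under these assumptions, exact transposition is invisible at the interval layer, while the altered variant is equivalent to the original only at the coarser contour layer. Each layer thus induces its own equivalence relation over the same melodies.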
