Computational Systems for Music Improvisation

Human improvisation stands as one of the most demanding and rewarding creative acts. Musical improvisation demands a blend of skills: physical technique, deep genre knowledge, implicit nonverbal communication, and the capacity for real-time creative inspiration and judgment. Building computer systems that can collaborate meaningfully in this spontaneous musical dialogue requires tackling considerable complexity.

This investigation examines computational music improvisers—digital software systems that act as improvising partners with creative agency—along with their often custom hardware and software interfaces. The goal is to map existing systems in order to support the design of future ones and expand what is possible in improvised performance. To that end, a diverse array of existing systems was reviewed, drawing out the key considerations involved in designing a computational partner for improvisation. While some findings apply to improvisation generally, the primary focus here is musical improvisation, where a human musician or group works with computational improvisers toward musically satisfying outcomes.

The field draws from machine listening, artificial intelligence, musical interaction design, and algorithmic composition. Three criteria guide which systems are included: they must be able to improvise music with a human collaborator; they must provide an interface for interaction or control; and they must display some level of creative agency. This scope excludes systems that do not allow improvisation, autonomous algorithms that do not interact with humans in real time (such as musebots), and digital musical instruments that lack creative agency.

Many computational improvisers to date have been designed in an ad-hoc, individual manner. The aim is to deepen understanding of the key design and evaluation issues, enabling a more structured approach for future creative improvising machines. Another concern is understanding how computational systems provoke and stimulate human creativity in ways that differ from what human collaborators alone can provide.

Creative Agency

Attention is restricted to systems with a perceived degree of creative agency, where the human user senses the machine as an autonomous contributor to ongoing collaborative creativity. This autonomy might emerge from complex internal dynamics or from algorithms specifically designed to create it. The concept draws on two distinct types of autonomy described by Boden. The first kind, physical autonomy, relates to dynamical systems, multi-agent models, and artificial life, and is rooted in self‑organisation—a bottom‑up emergence of behaviour from interacting components. The second kind, mental or intentional autonomy, relies on concepts like intention, belief, and desire from Theory of Mind, and would require musical subjectivity and intention. This second kind has long been a goal for AI research but remains largely elusive; however, several computational improvisers can produce the illusion of musical intention, which can be seen as a top‑down approach.

A system’s creative agency arises from its autonomy directed toward creative outputs. The human performer perceives that the system makes novel and appropriate contributions to the musical outcome. The more substantial and effective the system’s contributions, the greater its creative agency. For present purposes, agency is evaluated in a weak sense: its degree is determined through perception rather than formal proof or empirical measurement. Although this may be seen as subjective, such judgment is no different from what any human musician or critic would use when assessing a potential improvisation partner.

Taxonomy of Improvisational Music Systems

Improvisational systems present a wide range of opportunities for computational and musical innovation, but many past developments have been project‑specific or tailored to an individual designer’s or performer’s quirks. To clarify this fragmented design space, a review of existing research was conducted, leading to a taxonomy. Its development was iterative: the authors—all experienced in building computational improvisers—compiled a collection of known systems (from performances attended as audience or performer, from personal development, reading, or general knowledge). Additional systems were encountered through contextual review. Over forty systems were considered, selected to represent a broad range of design approaches. Around half met the final criteria and are included in the taxonomy. The list aims to be representative rather than exhaustive.

Related Work

Employing computation in improvised music has spanned several research fields over recent decades, alongside many artistic projects. Early work tended to focus on compositional approaches because real‑time processing was difficult. As computational power increased, the complexity of real‑time improvisers grew. Earlier techniques—like algorithmic composition modules—still appear in later systems. Relevant areas include interactive composition, where Joel Chadabe’s systems like the Coordinated Electronic Music Studio (1969) gave a sense of conversing with an instrument that had its own personality. Those early systems are viewed here more as complex electronic instruments than as computational improvisers, a line that is admittedly somewhat arbitrary. Laurie Spiegel’s Music Mouse is listed as the first example of a computational improvisation system.

Algorithmic composition—also called generative music or musical metacreation—has a rich non‑interactive history. Many computational improvisers use these techniques for musical output. Some systems generate improvised material in compositional rather than performance contexts, such as Keller’s Impro‑Visor, which produces notated jazz solos from chord progressions.

Rowe defined interactive music systems for joint human‑computer performance and classified them along three dimensions: drive (score‑driven or performance‑driven), response method (transformative, generative, or sequenced), and paradigm (a continuum from instrument to player). The systems of interest tend toward the “player” end. Since the mid‑1980s, score‑following systems have provided computer auto‑accompaniment that adjusts playback speed to match a human’s timing; these are used by some improvisers, such as Shimon.

In the domain of digital musical instruments (DMIs), while some design and evaluation concepts are relevant, typical DMIs focus on control rather than creative agency. Examples like Laetitia Sonami’s Lady’s Glove and the reacTable were excluded because, despite some agency in their synthesis and mappings, they are considered primarily instruments.

Descriptive Axes

The descriptive axes of the taxonomy emerged from an iterative process of describing, analyzing, and categorizing systems from the collection. Key aspects were extracted to understand how systems function and to guide design and evaluation approaches.

The improvisational model refers to how the human conceptualizes the interaction. A “duet” model applies to systems that emulate human improvisational behavior. Systems using non‑linear dynamics or chaos generate rich responses and are described as “collaborating with a complex system.” Systems where creative agency lies in learning or evolving mappings between gesture and sound are termed “gestural instruments.” Environments for real‑time construction of music algorithms take their improvisational model as “live‑coding.”

Notable features vary widely and are listed per system but not used as taxonomic dimensions due to their idiosyncrasy. Perceived creative agency is rated on a 0‑5 scale (0 = none, 5 = typical human collaborator). Ratings are subjective, based on mutual understanding or experience with each system. For live‑coding systems like JITLib, which have no inherent agency, the rating estimates a maximal value based on known use‑cases where programming achieves some agency.

Binary indicators mark whether an interface exposes real‑time control beyond direct music or audio input, and whether the system learns over time from cumulative interaction. Musical analyses are abbreviated by a single capital letter: Melody/pitch‑class, Key, Harmony, Rhythm, Sound (timbre), Phrasing, note Density, Loudness/dynamics, Timing (including micro‑timing, beat tracking, onset detection, tempo), score‑Following, and Orchestration (musical parts or roles).

The aesthetic source describes how the system creates appropriate, meaningful musical output. Since no computational approach to autonomous human‑level aesthetic appreciation yet exists, the system must derive musicality from elsewhere. Four categories are identified. First, system‑designer aesthetics are baked in via rules of harmony, voice‑leading, spectral balance, and rhythm, programmed into generation or analysis modules. Second, performer‑as‑source systems inherit features from the human’s musical output while they perform, either by imitation and transformation of rhythms, expression, or pitch‑class sets, or by allowing the performer significant influence on aesthetics (e.g., live coding). Third, corpus‑based systems pre‑train on a musical database, and the user may change the corpus to influence style. Finally, a system may apply machine listening or machine learning to gradually infer what constitutes ‘good’ music. These four categories are not mutually exclusive.

The final column, Methods, examines the internal algorithmic structure of the system regarding specific techniques employed. These encompass familiar mathematical approaches such as Markov processes, neural nets, and others.
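The descriptive axes above can be read as the fields of a single record per system. The following is a minimal sketch of such a record, not the layout of Table 1 itself: the class name, field names, and the four short labels for aesthetic source are assumptions introduced here, and the example values in the usage note are purely illustrative rather than taken from the table.

```python
from dataclasses import dataclass, field

# Single-letter musical-analysis codes, following the capitalised letters in the text.
ANALYSIS_CODES = {
    "M": "melody/pitch-class",
    "K": "key",
    "H": "harmony",
    "R": "rhythm",
    "S": "sound (timbre)",
    "P": "phrasing",
    "D": "note density",
    "L": "loudness/dynamics",
    "T": "timing",
    "F": "score-following",
    "O": "orchestration",
}

# Hypothetical short labels for the four aesthetic-source categories.
AESTHETIC_SOURCES = {"designer", "performer", "corpus", "learned"}


@dataclass
class TaxonomyEntry:
    name: str
    year: int
    improvisational_model: str  # e.g. "duet", "complex system", "live-coding"
    agency: int                 # perceived creative agency, 0 (none) to 5
    extra_control: bool         # real-time control beyond music/audio input
    learns_over_time: bool      # learns from cumulative interaction
    analyses: str = ""          # string of letters drawn from ANALYSIS_CODES
    aesthetic_sources: set = field(default_factory=set)
    methods: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values that fall outside the axes described in the text.
        if not 0 <= self.agency <= 5:
            raise ValueError("agency must be between 0 and 5")
        unknown = set(self.analyses) - set(ANALYSIS_CODES)
        if unknown:
            raise ValueError(f"unknown analysis codes: {sorted(unknown)}")
        bad = set(self.aesthetic_sources) - AESTHETIC_SOURCES
        if bad:
            raise ValueError(f"unknown aesthetic sources: {sorted(bad)}")

    def analysis_names(self):
        """Expand the letter codes into readable names, in the order given."""
        return [ANALYSIS_CODES[code] for code in self.analyses]
```

For instance, a made-up entry with `analyses="RT"` would expand to rhythm and timing. Because the aesthetic sources are stored as a set, an entry can carry more than one, reflecting that the categories are not mutually exclusive.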

We have populated Table 1 — shown below — chronologically with systems that were analyzed and deemed eligible during the creation of our taxonomy. These systems cover numerous combinations of our descriptive axes. From this collection we have selected seven to describe in greater detail, systems we regard as particularly historically innovative, influential in inspiring other systems, or illustrative examples of novel computational improvisers.

Table 1. Taxonomy of Improvisational Interfaces

[System list with headers as presented in source]

3. Example Systems

3.1. Music Mouse (Laurie Spiegel, 1986)

Music Mouse (Spiegel 1987) stands as an early commercially available computational improvisation system. It is a screen-based algorithmic instrument containing embedded knowledge of chords, scales, and stylistic conventions. The user controls melodic note selection through mouse movement and specifies numerous musical parameters via keyboard commands. The program exemplified a rule-based music system with real-time user control, suitable for both improvisation and composition.

The system generates four-voice harmony from two-dimensional mouse movement while simultaneously reading computer keyboard commands that affect orchestration, harmony, voicing, and more. The software’s internal logic adapts features such as harmony type, transposition, scale degree, and melodic inversion in real time to match the mouse-selected pitches. It employs built-in constraints to produce music with conventional stylistic logic from potentially random input pitches.
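To make the idea of constraining arbitrary input to stylistically plausible output concrete, here is a minimal sketch of mapping a two-dimensional pointer position to four in-scale pitches. It is not Spiegel's algorithm: the C major scale, the base note, the number of steps per axis, and the choice of diatonic thirds for the two derived voices are all assumptions made for illustration.

```python
C_MAJOR = (0, 2, 4, 5, 7, 9, 11)  # pitch classes of the assumed scale


def scale_pitch(index, scale=C_MAJOR, base=48):
    """Return the MIDI note `index` scale steps above `base`."""
    octave, degree = divmod(index, len(scale))
    return base + 12 * octave + scale[degree]


def four_voices(x, y, steps=14):
    """Map normalised pointer coordinates (0..1) to four in-scale MIDI notes."""
    xi = min(int(x * steps), steps - 1)  # horizontal position -> scale step
    yi = min(int(y * steps), steps - 1)  # vertical position -> scale step
    # Two voices follow the pointer axes; the other two sit a diatonic
    # third (two scale steps) above them.
    return [scale_pitch(i) for i in (xi, xi + 2, yi, yi + 2)]
```

Whatever the pointer does, every returned note belongs to the chosen scale, which is the sense in which the constraint supplies the stylistic logic while the user supplies the gesture.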

Figure 1. Spiegel’s Music Mouse contains embedded chord, scale, and stylistic knowledge. The program tracks mouse movement and keyboard commands to generate music.

3.2. Cypher (Robert Rowe, 1992)

Cypher (Rowe 1992) was developed under the influence of connectionism, particularly the work of Marvin Minsky (1986). The system comprises two listeners — one analyzing incoming MIDI data from the human performer and the other describing how lower-level descriptors change over time — and a player that generates musical responses from the internal representation of acquired information. Each component is itself composed of several agents that can connect and interact with one another. In the first listener, for instance, data is classified according to six dimensions: vertical density, attack speed, loudness, register, duration, and harmony. In the second listener, previous reports are grouped into segments and phrases through beat-tracking, boundary detection, tonal pivots, and similar processes. These agents consult each other to determine the most probable class of the event in question, based on the system’s built-in musical knowledge.
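As an illustration of what a first-level listener of this kind might compute, the sketch below describes a single incoming event along the six dimensions named above. It is a simplification under stated assumptions, not Rowe's implementation: the `Event` structure, the coarse category labels, and every threshold are hypothetical, and the interacting agents are collapsed into one function.

```python
from dataclasses import dataclass


@dataclass
class Event:
    pitches: list     # MIDI note numbers sounding together
    velocity: int     # MIDI velocity, 0..127
    onset_gap: float  # seconds since the previous event began
    duration: float   # seconds


def describe(event):
    """Classify one event along six dimensions (thresholds are illustrative)."""
    mean_pitch = sum(event.pitches) / len(event.pitches)
    if mean_pitch >= 72:
        register = "high"
    elif mean_pitch < 48:
        register = "low"
    else:
        register = "mid"
    return {
        "vertical_density": len(event.pitches),
        "attack_speed": "fast" if event.onset_gap < 0.25 else "slow",
        "loudness": "loud" if event.velocity >= 80 else "soft",
        "register": register,
        "duration": "short" if event.duration < 0.5 else "long",
        # Harmony is reduced here to the sorted set of pitch classes.
        "harmony": sorted({p % 12 for p in event.pitches}),
    }
```

A second-level listener in the sense described above would then track how these reports change over successive events, for example to detect phrase boundaries.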

The player component produces musical output via three methods: transformational, algorithmic, and pooling from a sequence library. Cypher can operate under performer control — for example, the human performer can connect player methods to specific listener messages — but it can also compose without input by applying transformational processes to stored representations or by generating material from scratch. In the former case, the human interacts with Cypher as in a musical duet, playing with the system as one would with a human collaborator, aside from maintaining some control over the system’s connections and parameters.

3.3. Voyager (George Lewis, 1986–2003)

Lewis describes Voyager as a “virtual improvising orchestra,” a software system that both listens and responds in an interactive dialogue between musician and machine — what we term a musical duet between human and machine. Voyager’s design incorporated not only technological aspects but also socio-cultural dimensions of music composition, forming an intimate, bespoke framework for music-making that aimed to embody “African-American cultural practice” (Lewis 2006). Lewis began developing the system in 1986 and has since created numerous versions and improvements.

Voyager was among the first improvisational systems to employ multiple virtual players (agents) that together constitute the computer’s musical improviser. Unlike other multi-agent systems — such as Blackwell’s Swarm Music (see Table 1) — Voyager features a supervisory control system that selects agent combinations and their generation method from a set of carefully crafted algorithms. Melody generation, for example, was drawn from fifteen possible algorithms with access to 150 microtonal pitch sets.

Despite several performer-specific design choices, Voyager’s basic information flow — including pitch following, real-time statistical analysis of low-level musical information, exclusive focus on sonic interaction, and the balance between musical response and idea initiation — makes it an intriguing example of performer-driven design.

3.4. JITLib (Julian Rohrhuber, 2003)

In live coding, performers improvise by writing and editing code during the performance itself. Experiments in editing code at performance time date back to the late 1970s, when the League of Automatic Music Composers’ network music performances sometimes involved editing patches (Collins et al. 2003). However, an explicit practice emerged around 2000, when early proponents such as McLean, Collins, and Rohrhuber began editing their patches during performances and developing tools to support this.

The first dedicated live coding language was JITLib (Collins et al. 2003), a library for the audio-programming language SuperCollider. It enables writing and rewriting of algorithms with on-the-fly compilation, facilitating the live writing and editing of sound-synthesis functions and effectively blurring the boundaries between performer and instrument designer.

The system itself possesses no built-in analysis, but its mutable nature means that a skilled performer could theoretically build this in during performance. One could argue that JITLib exhibits a modest degree of perceived agency given the unpredictable nature of live algorithm design. Aesthetic evaluation takes place at performance time through a feedback loop between the algorithm’s output and the performer.
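The core mechanism, replacing an algorithm while it is already in use, can be sketched in a few lines. The example below is written in Python rather than SuperCollider and does not reproduce JITLib's API; the `Proxy` class, its `redefine` method, and the note patterns are hypothetical stand-ins for the general idea of a placeholder whose definition is rewritten on the fly.

```python
class Proxy:
    """A placeholder whose behaviour can be swapped while callers keep using it."""

    def __init__(self, fn):
        self._fn = fn

    def redefine(self, fn):
        # Replace the underlying function; existing references stay valid.
        self._fn = fn

    def __call__(self, *args):
        return self._fn(*args)


# The "performance" only ever refers to the proxy, never to a fixed function.
pattern = Proxy(lambda step: 60 + step % 4)


def play(steps):
    """Render a few steps of the current pattern as MIDI note numbers."""
    return [pattern(step) for step in steps]


first = play(range(4))                    # uses the original definition
pattern.redefine(lambda step: 72 - step)  # performer rewrites the algorithm
second = play(range(2))                   # same call, new behaviour
```

Because `play` never changes, the performer's edits take effect without stopping the music, which is what lets the roles of performer and instrument designer overlap.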

3.5. Shimon (Gil Weinberg & Guy Hoffman, 2006)

Figure 2. Shimon

Shimon is an animatronic machine improviser that plays marimbas in a jazz ensemble (Hoffman and Weinberg 2011). It is designed around the concept of machine embodiment and the extra-musical communication provided by physical gestures among an ensemble and with the audience. Shimon improvises jazz solos using a real-time engine trained via a genetic algorithm, and performs beat-tracking and score-following for auto-accompaniment. It typically plays jazz standards.

Particular attention was paid to Shimon’s physical movements. Its mechanical nature presents both challenges, such as correct timing when moving between distant notes, and opportunities for human players to anticipate its motions. The designers have conducted studies indicating enhanced temporal entrainment in performance due to the visual cues Shimon provides to human players. These cues also appear to help audiences understand the robot’s musical contributions, and they contribute strongly to the perceived sense of its creative agency.

3.6. Wekinator (Rebecca Fiebrink, 2011)

Interactive machine learning (IML) involves a human user interactively training a machine learning algorithm. Wekinator is an IML system designed for musical applications (Fiebrink and Cook 2010). It is used to improvise mappings and interpolations between sets of input-output pairs. The input can be a live audio or video feed, MIDI, or anything that can be converted into numerical vectors. The output is a sequence of numerical vectors, typically converted into OpenSoundControl or MIDI messages. The mapping between inputs and outputs — which allows control of the output stream from an input stream — is achieved through a set of built-in models tuned to perform well with small training sets. Only a few examples are needed to gain tangible control over the output.

Wekinator can interpolate between output targets as different input targets are presented, enabling continuous control. Since it generates OpenSoundControl and MIDI, it can integrate with any system supporting those protocols. Thus, users can interact with everything from an effects processor to a full algorithmic improviser.
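The behaviour described here, continuous output obtained from a handful of input-output examples, can be illustrated with a deliberately simple interpolator. The sketch below uses inverse-distance weighting as a stand-in; it is not one of Wekinator's built-in models, and the function name `make_mapping` and the `power` parameter are assumptions introduced for this example.

```python
import math


def make_mapping(examples, power=2.0):
    """Build a mapping from (input_vector, output_vector) training pairs."""
    n_out = len(examples[0][1])

    def mapping(x):
        weights = []
        for inp, out in examples:
            distance = math.dist(x, inp)
            if distance == 0.0:
                # Exactly on a training example: return its output unchanged.
                return list(out)
            weights.append(1.0 / distance ** power)
        total = sum(weights)
        # Weighted average of the example outputs, closer examples count more.
        return [
            sum(w * out[k] for w, (_, out) in zip(weights, examples)) / total
            for k in range(n_out)
        ]

    return mapping
```

With two one-dimensional examples mapping 0.0 to 100.0 and 1.0 to 200.0, an input of 0.5 yields 150.0, and inputs near either example yield outputs near that example's target. The resulting vectors could then be sent onward as OpenSoundControl or MIDI messages.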

3.7. Reflexive Looper (François Pachet et al., 2013)

Building on prior work with the “Continuator,” Pachet (2006) and colleagues developed the concept of “reflexive interactions” in a system called the Reflexive Looper (Pachet et al. 2013b). Based on the idea of an enhanced loop pedal — where a learning system allows one to play with past virtual copies of oneself (Pachet et al. 2013a) — the Reflexive Looper attempts to create “musical performance copies” of a player from their style, effectively forming a virtual band to accompany the performer (Fig. 3).

The machine imparts a significant sense of creative agency by inheriting musicality from the human performer with sufficient transformation to avoid seeming like direct imitation. Its musical analyses are based solely on what the musician is doing in the moment and the intensity of that playing — loudness, number of notes in the bass line or melody, number of notes in the chord.

The looper can assume different instantiations of an instrumentalist — such as a guitarist or pianist — playing a bass line, a chord line, or an improvised solo line, with each of these responding to the performer. The system shares the performer’s goal of creating “band music,” achieving this by aiming for the best ensemble sound possible. The musicians’ creative activity is challenged and stimulated by playing with responsive copies of themselves, leading to musical creations that would not have been possible playing alone.

4. Design Considerations

From the taxonomy above and its application to the range of systems in Table 1, we distill design considerations pertinent to developing new improvisational systems.

In conjunction with Table 1, Figure 4 provides a schema covering many of the system configurations commonly used by the improvisational systems examined in this research. Mandatory elements include a human and a computer with communication between them, and minimally some form of generative output module in the computational system. Choices then need to be made about which communication channels are used and what optional elements are included — for example, some form of machine embodiment (such as animatronics) housing the computational engine, single or multiple agent design, and various machine listening or computational creativity components such as memory, expert system analysis, generative style replication, and computational aesthetic evaluation. One simple observation from a design perspective is that the range of approaches argues against any particular feature set being essential.

Analysis of Table 1 reveals interesting insights into possible system designs. Somewhat surprisingly, sophisticated musical analysis methods are not essential to creating an improvisational system. In systems with little or no musical analysis, the human musician must work with the system, often requiring carefully constructed interactions to achieve acceptable results. Two earlier systems — Cypher and Oscar — attempt to analyze the most musical features. The observation that later systems tend toward more minimal analyses suggests that implementing complex musical understanding remains ambitious and probably not the most effective use of design resources. Another common approach — particularly in reflexive systems — is to inherit musicality from the performer rather than trying to extract it from complex internal analysis.

Figure 3. Schematic diagram of the Reflexive Looper. This system uses a multimodal representation of incoming music, with off-line training providing basic musical knowledge.

Another temporal trend is toward greater stylistic generality. Where early systems such as Music Mouse, Cypher, and Voyager made heavy use of hard-coded rules of musical structure or preprogrammed sequences, later systems have leveraged the recent availability of online machine learning techniques — such as Variable Markov Models (VMM), real-time adaptive Support Vector Machines (SVM), and other statistical analyses — to extend to different styles. If designing a system for others to use, this may be an important consideration. At the far end of this scale, systems such as JITLib and Wekinator impose few stylistic constraints on the performer, relying on human listening as the primary aesthetic evaluation method, which requires considerable experience working with the system.
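To show why such statistical models generalise across styles, the following is a minimal sketch of a variable-order Markov predictor that learns from whatever sequence it is given and backs off to shorter contexts when a longer one has not been seen. It is an illustration of the general technique, not the model used by any system in Table 1; the function names and the `max_order` parameter are assumptions.

```python
import random
from collections import defaultdict


def train(sequence, max_order=3):
    """Record which symbols follow every context of length 0..max_order."""
    table = defaultdict(list)
    for i in range(len(sequence)):
        for order in range(max_order + 1):
            if i - order < 0:
                break
            context = tuple(sequence[i - order:i])
            table[context].append(sequence[i])
    return table


def next_symbol(table, history, max_order=3, rng=random):
    """Sample a continuation, preferring the longest context seen in training."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - order:])
        if context in table:
            return rng.choice(table[context])
    raise ValueError("model has not been trained")
```

Nothing in the code refers to a particular style: training on a different sequence, whether taken from a corpus or from the performer's own playing, changes the output without changing the program.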

We have focused on systems that impart a sense of creative agency, making this another natural design consideration reflected in many of the canvassed systems. One might assume a trade-off exists between creative agency and controllability, yet analysis of Table 1 suggests a more nuanced relationship. Several systems to which we assigned the highest creative agency also provide non-musical parametric controls. To some extent, however, this may reflect contexts of use: when high-agency systems are used in performance, they may be retrofitted with controls to tame complexity. For example, Swarm Music and CIM expose some manual overrides. In general, systems that rely on complex bottom-up dynamics for their creative agency need to expose more non-musical control to counteract the self-generating aspects of their autonomy. Higher degrees of musical analysis can eliminate the need for extra-musical control, but this often limits musical flexibility or generality.

Finally, pre-training with musical knowledge — while effective in imbuing a system with a degree of musical competence — limits stylistic possibilities during performance. Finding sufficient new training data and performing effective training can be time-consuming and difficult, though a good training set can provide important genre-specific knowledge unavailable through other approaches.

We now turn to the topic of evaluation, an aspect of designing improvisational systems rarely addressed in any of the systems considered for our taxonomy.

Assessing improvisation systems requires leveraging established HCI evaluation frameworks tailored to musical interaction. Kiefer et al. (2008) presented a case study on musical controllers: their method paired quantitative telemetry data with qualitative grounded theory analysis of structured interviews, illustrating how both paradigms serve creative domains.

Stowell et al. (2009) argued that traditional task-based HCI methods, such as talk-aloud protocols, fit poorly with live music-making’s fluidity. They preferred comparative output analysis and discourse analysis centered on human judgment. This experiential bent aligns with Jordanous (2012), who proposed a three-step evaluation process for creative systems: (1) define target creativity criteria, (2) specify assessment methods, and (3) execute using qualitative or quantitative approaches. Central to her framework is a set of 14 empirically derived "components of creativity," including spontaneity and domain competence.

O’Modhrain (2011) tailored DMI evaluation to four perspectives (performer, audience, designer, manufacturer), emphasizing enjoyment, playability, robustness, and specification alignment. Longitudinal interviews gauged enjoyment, whereas conventional hardware and software testing measured playability and robustness. Jordà and Mealla (2014) offered a complementary teaching framework for DMIs, factoring mapping richness into system assessments and musicality into performance evaluation, using peer evaluation through Likert-scale questionnaires.
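As a small illustration of how Likert-scale questionnaire data of this kind might be summarised, the sketch below collects ratings per criterion and reports the median of each. It is a generic aggregation, not the procedure used in any of the cited studies; the function name, the 1 to 5 scale, and the criterion names in the usage note are assumptions.

```python
from statistics import median


def summarize(responses, scale=(1, 5)):
    """Return the median rating per criterion from a list of questionnaires.

    Each response is a dict mapping a criterion name to a Likert rating.
    """
    lowest, highest = scale
    by_criterion = {}
    for response in responses:
        for criterion, rating in response.items():
            if not lowest <= rating <= highest:
                raise ValueError(f"rating {rating} outside {lowest}..{highest}")
            by_criterion.setdefault(criterion, []).append(rating)
    # The median is used because Likert ratings are ordinal.
    return {criterion: median(ratings) for criterion, ratings in by_criterion.items()}
```

Given three hypothetical peer responses rating "musicality" as 4, 5, and 2 and "mapping_richness" as 3, 3, and 4, the summary would report medians of 4 and 3 respectively.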

Thus, evaluative strategies span levels of granularity, from low-level interface logging to holistic aesthetic judgments, and accommodate everything from early-stage participatory design to concert-ready validation. User, designer, and audience voices remain equally relevant. Because it acknowledges these varied perspectives, HCI’s shift from solely task-optimized to experience-focused techniques suits computational improvisation well.

Given the field’s maturation, a key aim of this paper is to frame the core design considerations for any computational improviser. Best-practice evaluative methods, including those above, should ground new system development.

The final section underscores our review taxonomy for classifying computational improvisers. The systems considered allow spontaneous exchange between human performers and algorithmic partners, each exhibiting some degree of creative agency. Their variety illustrates historical and technical breadth, though we make no claim to exhaustiveness.

Our descriptive taxonomy—examining creative agency level, musical analysis and aesthetic strategy integration, interaction design style, and underlying algorithms—foregrounds key design and evaluative challenges. Notably, achieving audible agency does not demand maximal system complexity; we aim for structured cross‑comparison to accelerate progress in innovative, impactful performer‑audience experiences.

Technological performers keep improvisational artistry evolving.

A serious burden of proof rests on system evaluation. Unless evaluation is built in from the design stage, with user studies and replicable documentation, formal comparison across the differing projects surveyed, whose authors’ commitment to creativity is beyond question, remains elusive.

The references include Assayag et al. (2006) on Omax, Biles (1994) on GenJam, Lewis (1999, 2006) on Voyager, Rowe (1992) on Cypher, Pachet (2002, 2006, 2013b) on the Continuator and loopers, Dahlstedt (2006) on a concept of parameter-space mutation, and Eigenfeldt (2016) reviewing musebots. None of them, however, yet codifies a recipe for measuring how individual design factors contribute to improvement in a form that others could readily reuse.