Creative Applications of Interactive Mobile Music

May 30, 2026
17 min read

The past decade has witnessed remarkable progress in making musical content portable and mobile, fueled by advances in wireless networks, miniaturized storage, and improved device engineering. This chapter examines foundational research conducted at Sony Computer Science Laboratory Paris (CSL) — a noncommercial lab devoted to fundamental investigation — and at Culture Lab Newcastle. The approach draws on methods from creative practice and reconsiders the forms that music can assume when carried on mobile devices and transmitted over wireless infrastructures. A central premise holds that music is an emergent, fluid, expressive, and context-dependent phenomenon rather than a fixed commercial commodity. The discussion spans domestic settings, social scenarios, locative media, and interactive performance. Together, the works presented offer insight into conceptual approaches to mobile music creation that developed outside the sphere of commercial applications.

Convergence and Integration: From Walkman to iPhone

It is now routine to imagine and use sophisticated portable devices that function as telephone, music player, and camera all at once. Apple’s iPhone and the many smartphones built on Android and Symbian operating systems connect to broadband cellular networks and provide location sensing through the Global Positioning System (GPS). Almost all these products are still called mobile phones, implying that voice communication remains primary while music and imaging are secondary. As a composer, I sought to invert this relationship and conceive advanced musical scenarios that leveraged wireless streaming, gesture detection, and location awareness.

Even when music takes center stage, these coexisting capabilities on a single device represent not only high technological integration but also a form of conceptual convergence. While the hardware unites these functions, less attention has been paid to how people actually integrate their usage. Playing music, calling, messaging, taking photos, and mapping locations remain separate applications that switch the device’s current mode. No app yet pipes an MP3 playlist into the background of a call, for example, nor does any system easily connect music with photographic images. Raskin’s concept of modelessness in screen-based interface design lets users manage multiple tasks productively without switching modes. Moving from modal interfaces to modeless interaction is less trivial on portable devices, with their limited screens and on-the-move usage contexts, but tackling these challenges could yield more imaginative integration of the different media functions.

Jenkins extends simple feature integration by proposing technological convergence. Beyond the union of sound, image, location, and communications lies a higher-order convergence of consumer electronics hardware, media content, network data, and other services. Convergence products have flourished — most notably Apple’s iTunes system, which couples entertainment content and application software catalogues with hardware offerings. But this convergence has occurred at a commercial level without fundamentally transforming content or its formats to exploit the new possibilities of personalized, context-aware network distribution. In online music distribution, a single remains a single, and an album remains an album. The work described here adapts existing music into fresh, malleable formats suited to the infrastructures they inhabit, and imagines entirely new music created specifically for these systems.

Combining a personal music player with a mobile telephone seems natural. Yet beyond technology integration and conceptual convergence lie underlying cultural differences between music listening and communications that make the combination nontrivial. Bull notes the isolating character of headphone listening, while Ito observes the constant contact that mobiles provide. MP3 players and cell phones share many qualities (they are portable, highly personal audio devices), yet they serve very different social functions. The work presented here seeks ways to bridge these differences and imagine what a true convergence device might be. We draw on social computing to see how music can serve the new social dynamics that mobile networks allow. From an audio processing standpoint, participative, flexible content forms become feasible. Finally we examine real-world issues of deploying such systems on off-the-shelf mobile phones and commercial cellular networks.

Location Sensing

Dynamic geographic location is one of the fundamental characteristics of a mobile user. Mobile usage implies access to the same information universe regardless of location — anytime, anywhere. Yet designing information systems for mobile use involves more than porting a desktop web page to display on a phone screen. Not only do screen dimensions and device form-factor change, but so does the entire usage dynamic. Instead of providing a single information stream while in motion, we focus on shifting needs triggered by location changes. Commercial location-based services range from simple geotagging on Flickr to broadcast location updates via Google Latitude, to soundwalk city tours narrated by movie stars — but the killer location-aware application has still not arrived. Just as creating information spaces for mobile environments differs fundamentally from serving stationary settings, imagining music for mobile environments must go beyond carrying one’s entire album collection in a shirt pocket. Here we examine ways location sensing can be employed musically to produce new, contextual experiences.

Artists working in locative media art have seized on the creative potential of geographic information. Their work includes visualizing movement across geographic space as drawings, tagging physical space with sound (as in Shepherd’s Tactical Sound Garden), and linking theatrical choreography to participants’ displacements (as in Blast Theory’s seminal projects). Sound-based projects like Sonic City turn the city into an interface for a generative electronic music system. Harris’s works, meanwhile, subvert assumptions of multi-user connectedness to focus instead on the data jitter of stasis.

GPS is the technology most commonly associated with location tracking, but it is not the only option. My projects have used motion capture, Bluetooth signal reception, GSM antenna strength, and GPS to sense user location. Each approach presents distinct trade-offs in accuracy and response time that affect musicality when used as input to sound processes. Geographic localization, then, is not one thing but a form of information that can be captured in various musical ways.
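To make the accuracy versus response-time trade-off concrete, here is a minimal sketch, assuming position arrives as a stream of noisy one-dimensional readings (say, distance along a path) that is smoothed before it drives a sound parameter. The one-pole smoother and its `alpha` parameter are illustrative choices of ours, not the filtering used in the projects described here.

```python
from typing import Iterable, List, Optional

def smooth_positions(readings: Iterable[float], alpha: float) -> List[float]:
    """One-pole (exponential) smoother over a stream of position readings.

    alpha near 1.0 follows new readings quickly (low latency, more jitter);
    alpha near 0.0 averages heavily (less jitter, more lag).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    state: Optional[float] = None
    for x in readings:
        # The first reading initialises the filter; later ones move it by a fraction alpha.
        state = x if state is None else state + alpha * (x - state)
        smoothed.append(state)
    return smoothed

# A listener steps from 0 m to 10 m along a path.
steps = [0.0, 0.0, 10.0, 10.0, 10.0]
print(smooth_positions(steps, alpha=1.0))  # [0.0, 0.0, 10.0, 10.0, 10.0]
print(smooth_positions(steps, alpha=0.5))  # [0.0, 0.0, 5.0, 7.5, 8.75]
```

Heavy smoothing removes jitter that would otherwise be heard as wobble in a volume or filter setting, at the cost of the music reacting late to a genuine change of position.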

Domestic Environments

Indoor location sensing remains a highly relevant technical challenge, since GPS works only outdoors. Starting in 2002 with the SoundLiving project, we used low-power Bluetooth base stations to equip a domestic environment, creating personalized spheres of sound that followed a user throughout the home. The working prototype augmented a home stereo system. Instead of a traditional remote control, the user carried a Bluetooth probe communicating with receivers in each room and selected music via the device’s touchscreen. When the listener moved to another room, from living room to kitchen for instance, the probe announced his presence to the room entered, and the system seamlessly re-routed the network data stream carrying the music. From the listener’s perspective, the music simply continued uninterrupted, emerging naturally from the kitchen speakers while ceasing in the living room. It felt as though the music formed a personal audio sphere drifting with him through the house.

The design separated the mobile device (the location probe) from sound production (the stereo speakers). Hidden behind what looked like an ordinary hi-fi were localization and network routing services integrating the speaker systems throughout the house.
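The routing logic behind SoundLiving is not spelled out here. The following is a hypothetical sketch, assuming each room receiver reports the probe’s received signal strength (RSSI, in dBm) and that the stream should move only when another room is clearly stronger; `StreamRouter`, `update`, and the 6 dB margin are our own illustrative names and values.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class StreamRouter:
    """Hypothetical room-following audio router.

    Each room receiver reports how strongly it hears the listener's probe.
    The stream moves to a new room only when that room beats the current
    one by `hysteresis_db`, so the music does not flap between rooms.
    """
    hysteresis_db: float = 6.0
    current_room: Optional[str] = None

    def update(self, rssi_by_room: Dict[str, float]) -> Optional[str]:
        if not rssi_by_room:
            return self.current_room  # no reports: keep playing where we are
        best_room = max(rssi_by_room, key=rssi_by_room.get)
        if self.current_room is None:
            self.current_room = best_room
        elif best_room != self.current_room:
            # A room that no longer hears the probe counts as infinitely weak.
            current_level = rssi_by_room.get(self.current_room, float("-inf"))
            if rssi_by_room[best_room] - current_level >= self.hysteresis_db:
                self.current_room = best_room  # re-route the stream here
        return self.current_room

router = StreamRouter()
print(router.update({"living": -50.0, "kitchen": -80.0}))  # living
print(router.update({"living": -60.0, "kitchen": -57.0}))  # living (within margin)
print(router.update({"living": -75.0, "kitchen": -55.0}))  # kitchen
```

The hysteresis margin is the design choice that matters musically: without it, a listener standing in a doorway could make the music bounce between rooms.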

Wireless audio broadcast products have since been introduced, including Apple’s AirPlay. These are usually no more than cable replacements, based on a broadcast model in which a single source sends audio to multiple wireless speakers. They do not perform location sensing and, more importantly, they do not address the personal nature of music as something embodied: located around the listener and relocated as the listener moves. SoundLiving was unique in providing continuity of music delivery, creating a location-aware personal audio bubble.

Malleable Content

Moving from an indoor domestic space to imagining music across multi-user geographic space, we developed a remix software engine that generated continuous variations on familiar popular music according to location. Each participant was represented by a musical part or instrument; their relative proximity mapped onto the amplitude of that part in the mix. The user’s gestures and actions on the mobile device (personal context) modulated effects on their part, giving the group an idea of that user’s behavior — running, dancing, or sitting still. The mix of parts reflected the social context encoded in location data. The resulting mix streamed back over wireless broadband networks to every mobile device. All group members heard the same stream, creating a shared experience. The fact that a song’s remix could reflect each participant’s behavior and the group’s global state yielded what we call a social remix.
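One way to express the proximity-to-amplitude mapping is sketched below. It rests on assumptions the text does not state: positions are planar map coordinates, each part’s gain depends on that participant’s distance from the group centroid, and gain falls off linearly; `part_gains` and `max_distance` are hypothetical names.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # planar map coordinates, e.g. metres east/north

def part_gains(positions: Dict[str, Point], max_distance: float) -> Dict[str, float]:
    """Map each participant's distance from the group centroid to a part gain in [0, 1].

    Closer to the group means louder; gain falls off linearly and reaches
    silence at `max_distance`.
    """
    if not positions or max_distance <= 0:
        raise ValueError("need at least one participant and a positive max_distance")
    n = len(positions)
    cx = sum(x for x, _ in positions.values()) / n
    cy = sum(y for _, y in positions.values()) / n
    return {
        name: max(0.0, 1.0 - math.hypot(x - cx, y - cy) / max_distance)
        for name, (x, y) in positions.items()
    }

print(part_gains({"voice": (0.0, 0.0), "percussion": (0.0, 0.0), "horns": (300.0, 0.0)}, 400.0))
# {'voice': 0.75, 'percussion': 0.75, 'horns': 0.5}
```

Here the participant carrying the horn part has wandered away from the other two, so the horns sit lower in the shared mix that everyone hears.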

We implemented the Malleable Mobile Music system using a familiar pop song from a commercially available album. After detecting the recording’s global tempo, we built a temporal map of the song that identified large-scale structure (verse, chorus) and the appearance of musical parts (voice, percussion, horn section) in each section. The Malleable Music engine used the song map as an index into the original recording, instantly seeking to any measure and looping a specified segment a certain number of times. The server instantiated multiple voices of this engine — one voice per participant — and synchronized them. In a three-user system, for instance, three voices independently played arbitrary sections of the original recording, all rhythmically synchronized and mixed into a stream sent to all devices. With the map and original recording, a live cut-and-paste remixing occurred in real time.
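The song map and its seek-and-loop behavior can be sketched as a small data structure. This assumes a constant tempo (the text mentions detecting a single global tempo) and measure-level seeking; `SongMap`, `Section`, and `loop_schedule` are hypothetical names, and the sketch only computes seek offsets into the recording rather than playing audio.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Section:
    label: str              # e.g. "verse", "chorus"
    start_measure: int      # first measure of the section (0-based)
    length: int             # number of measures in the section
    parts: Tuple[str, ...]  # parts audible in the section, e.g. ("voice", "horns")

@dataclass
class SongMap:
    bpm: float                   # detected global tempo
    beats_per_measure: int = 4
    sections: List[Section] = field(default_factory=list)

    def measure_to_seconds(self, measure: int) -> float:
        """Offset in the original recording where `measure` starts (constant tempo assumed)."""
        return measure * self.beats_per_measure * 60.0 / self.bpm

    def sections_with(self, part: str) -> List[Section]:
        """Sections in which a given part (e.g. the horn section) is present."""
        return [s for s in self.sections if part in s.parts]

def loop_schedule(song: SongMap, start_measure: int, length: int, repeats: int) -> List[float]:
    """Seek offsets, one per output measure, for a segment looped `repeats` times.

    Each voice emits exactly one source measure per output measure, so voices
    started on the same global measure boundary stay rhythmically aligned.
    """
    return [song.measure_to_seconds(start_measure + i % length) for i in range(length * repeats)]

song = SongMap(bpm=120.0, sections=[
    Section("verse", 0, 8, ("voice", "percussion")),
    Section("chorus", 8, 8, ("voice", "percussion", "horns")),
])
horns = song.sections_with("horns")[0]
print(loop_schedule(song, horns.start_measure, 2, 2))  # [16.0, 18.0, 16.0, 18.0]
```

Because every voice advances one measure per output measure, voices that jump to different sections still line up on the beat, which is what allows them to be mixed into a single synchronized stream.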

In this setup, music became a direct carrier of social information. The part representing each user functioned as that user’s musical avatar. The remix unfolded according to participants’ movements. One user could perceive another’s proximity by the volume of their part in the mix, and could guess at their activity through the filtering and delay effects applied. This points to using music as an ambient information display, where the user need not take any explicit action (like calling or texting) to learn useful information about friends’ relative proximity and activity. That social information is embedded within the musical content itself and perceived during ordinary music listening.

This work continues a long tradition of making existing music interactive. Early examples include Peter Gabriel’s CD-ROM Xplora 1 (1995). Following our initial malleable music research in 2004, the company MXP4 introduced in 2006 a file format that delivers the component tracks of a musical composition separately for synchronized interactive playback, making it easier for listeners to remix. Trent Reznor published a web-based remix system in 2007. While all these systems enable deconstructing and reconstituting music, the commercial examples focus on an individual user actively engaging in the remix process. With Malleable Mobile Music, we aimed not merely to deploy an interactive music system in geographic space but to re-contextualize music following Erickson’s concept of social translucence, making music a location-aware and responsive medium that reflects back to the listener the state of her immediate social group.

Sensing the Self within a Group

A perceptual challenge arises when decoding fluid changes in abstract music that represents concrete physical phenomena like proximity. Social translucence, a term from social computing, describes how social dynamics are made visible in information displays so that they can support collective action. A key element in decoding is situating oneself within the whole. In a location-based remix, the instrument representing a listener might remain at constant volume relative to other dynamic voices. Providing local context for that listener’s own part can give reflexive understanding of the situation, thereby aiding the listener in decoding the wider context of other users. We extended Erickson’s term to coin reflexive translucence, including the user’s own sense of agency and position within a group.

To heighten this sense of agency, we implemented two responses to local context. One subsystem detected grip pressure, rotation, and shaking through sensors on the handheld device; the second provided a localized audio display. This research took place in 2005, two years before the iPhone popularized accelerometer integration for user interfaces. The local sensors detected gestures — conscious or subconscious — that listeners performed while hearing music. The captured gesture (via pressure and accelerometer sensors) in turn affected the music playing, creating a feedback loop of perception, reaction, and enactment. In a multi-user musical context like a social remix, these local sensors gave each listener interactive feedback on their own part, producing a distinct immediacy compared with the more slowly modulated parts representing other users.
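As an illustration of turning raw sensor data into effect settings, the sketch below computes a simple shake measure from accelerometer samples and maps it, together with normalized grip pressure, onto delay feedback and a low-pass cutoff for the listener’s own part. The features, ranges, and the `effect_params` mapping are assumptions for illustration, not the mapping used in the 2005 system.

```python
import math
import statistics
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # accelerometer reading in g (x, y, z)

def shake_energy(samples: List[Sample]) -> float:
    """Population standard deviation of acceleration magnitude over a short window.

    A device held still reads about 1 g constantly, giving a value near 0;
    shaking makes the magnitude fluctuate and the value rises.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return statistics.pstdev(magnitudes)

def effect_params(samples: List[Sample], grip: float) -> Dict[str, float]:
    """Hypothetical gesture-to-effect mapping for the user's own part.

    grip: normalised pressure in [0, 1]. Shake energy (clamped to 1) drives
    delay feedback; a firmer grip closes a low-pass filter.
    """
    energy = min(shake_energy(samples), 1.0)
    return {
        "delay_feedback": 0.8 * energy,
        "lowpass_hz": 200.0 + (1.0 - grip) * 7800.0,
    }

still = [(0.0, 0.0, 1.0)] * 8
shaken = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)] * 4
print(effect_params(still, grip=0.0))   # {'delay_feedback': 0.0, 'lowpass_hz': 8000.0}
print(effect_params(shaken, grip=1.0))  # {'delay_feedback': 0.4, 'lowpass_hz': 200.0}
```

Because these parameters are recomputed from local sensors on every window, the listener’s own part responds immediately, in contrast to the more slowly modulated parts driven by other users’ locations.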

The mobile devices’ local sound output could be directed to the built-in speaker or the headphone jack, creating a separate audio stream from the collective mix playing in a room. This helped place the device and its listener within physical space and situate the listener in music projecting multiple users’ states. Acoustically, this placed a sound source in the environment, markedly different from virtual surround-sound panning for the group. Using local outputs or the network as audio destinations, and the possibility to render sound publicly or locally through headphones, created a multifaceted hybrid audio space that supports both personal and community social contexts.

Into the Wild — the Real World and Real Mobiles

In the social remix examples, we concentrated on how generative music content and delivery could respond to personal and community contexts. Community context came from geographic data produced by a location simulator module that placed three visual avatars on a city map. Dragging the avatars across the screen generated geographic input, feeding the Malleable Mobile Music engine as control information over the network. This helped us conceive how music might evolve as community members moved around. The next step was to move beyond the simulator and explore location tracking techniques usable outdoors. Originally conducted in 2003, this work anticipated the widespread deployment of 3G/UMTS mobile broadband across Europe. When 3G networks and handsets arrived, these early experiments translated into real-world implementations.
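The simulator’s role, turning avatar positions on a map into control data for the remix engine, can be sketched as follows. We assume a simple linear pixel-to-coordinate mapping (adequate for a city-scale map, though not a true map projection), the standard haversine formula for distances, and a JSON message format; `pixel_to_latlon`, `control_message`, and the message fields are hypothetical.

```python
import json
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def pixel_to_latlon(px: float, py: float, width: int, height: int,
                    north: float, south: float, west: float, east: float) -> Tuple[float, float]:
    """Linear mapping from map-image pixels (origin top-left) to (lat, lon) in degrees."""
    lat = north - (py / height) * (north - south)
    lon = west + (px / width) * (east - west)
    return lat, lon

def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def control_message(avatar_id: str, lat: float, lon: float) -> str:
    """Hypothetical control message sent over the network to the remix engine."""
    return json.dumps({"avatar": avatar_id, "lat": round(lat, 6), "lon": round(lon, 6)})

# Dragging an avatar to the centre of a 1000 x 500 pixel map:
lat, lon = pixel_to_latlon(500, 250, 1000, 500, north=55.0, south=54.9, west=-1.7, east=-1.5)
print(control_message("avatar-1", lat, lon))
```

On the server side, pairwise `haversine_m` distances between avatars could then feed a proximity-to-gain mapping of the kind the social remix uses.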

Once runs by multiple participants and by single runners were aggregated, the accumulated data formed statistically significant patterns over time and across individuals. By linking playlist contents to each data point, early insights into music’s role in training began to emerge. Unlike products such as Nike + iPod that connect listening to athletic activity, Dry Run embedded music within both the geographical and emotional arcs of a session. Aggregating across sessions and runners pointed toward an unrealized goal for a future version: extracting the “ideal playlist” from the dataset.
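Since extracting an “ideal playlist” remained unrealized, the following is only one hypothetical way it might be approached: log each data point as a (track, speed) pair, average speeds per track across all runners and sessions, and rank the tracks. The data format, the `min_samples` threshold, and `rank_tracks` are assumptions; ranking by mean speed alone also ignores the geographic and emotional arcs that Dry Run itself tracked.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One logged data point: (track title, runner's speed in m/s while it played).
DataPoint = Tuple[str, float]

def rank_tracks(points: Iterable[DataPoint], min_samples: int = 2) -> List[Tuple[str, float]]:
    """Rank tracks by mean speed across all runners and sessions, fastest first.

    Tracks with fewer than `min_samples` data points are skipped so that a
    single sample does not dominate the ranking.
    """
    by_track: Dict[str, List[float]] = defaultdict(list)
    for track, speed in points:
        by_track[track].append(speed)
    scored = [(t, mean(v)) for t, v in by_track.items() if len(v) >= min_samples]
    return sorted(scored, key=lambda tv: tv[1], reverse=True)

runs = [("Track A", 3.0), ("Track A", 3.5), ("Track B", 2.5), ("Track B", 2.75), ("Track C", 4.0)]
print(rank_tracks(runs))  # [('Track A', 3.25), ('Track B', 2.625)]
```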


The Mobile Musical Instrument

For our most recent mobile music project, we adopted the now-ubiquitous iPhone and turned attention toward creative practice and live concert performance. Over the period spanned by this research, laboratory technologies gradually migrated into consumer products. The iPhone represented the genuine arrival of advanced multimedia, embodied, mobile computing. This shift prompted the release of numerous music apps, some discussed elsewhere in this volume. Among them was RJDJ, short for Reality DJ, a free app for iPhone and Android phones that emerged from the open source community. RJDJ functions as a mobile interactive music playback engine built on Pure Data (PD), the open source member of a family of graphical music programming environments that also includes Max/MSP; Max was originally developed at IRCAM by Miller Puckette, who later created PD. The platform implements a simplified version of PD on mobile processors, enabling interactivity through microphone input, accelerometers, and the touch screens found on advanced smartphones. RJDJ’s developers publish a catalog of what they call “reactive music”: generative and interactive forms that extend traditional Walkman-style listening from fixed music assets to continuously context-aware sound. With this, some of the vision initially articulated in research projects like Sonic City reached the marketplace.

These technologies and concepts, arriving to a broader audience, rested on software platforms that had anchored interactive computer music composition and performance since the 1980s. Starting with IRCAM’s Patcher, the paradigm of programmatic functions represented onscreen as objects connected by virtual wires began as a method for controlling computer music synthesis on mainframes. Real-time computer music on personal computers and laptops became pervasive through Max/MSP and PD. Porting PD to iPhone and Android essentially put an IRCAM studio from the 1980s into one’s pocket. This transposition from mainframe to mobile carried a fundamental shift in the social contexts where computer music could now occur.

My own musical output paralleled this development, with the formation of ensembles reflecting musical and technological contexts in live performance. This included Sensorband, formed in 1993, and Sensors_Sonics_Sights, formed in 2003. With the move to RJDJ on iPhone, the duo of Adam Parkinson and myself, 4 Hands iPhone, continues the live computer music tradition through mobile music technology.

In the duo, we harness a widely available consumer device, the Apple iPhone, as an expressive, gestural musical instrument. The device is a well-known iconic object of desire in consumer society, and its primary function for most listeners is playing music as a commodity. We re-appropriate this object and the commodity of music, exploiting its advanced technical capabilities to transform a consumer item into an expressive concert instrument. With one device in each hand, we create chamber music: four hands for iPhone. The accelerometers, typically serving as tilt sensors for rotating photos, are reused for precise capture of free-space performer gestures. The multitouch screen, otherwise used for scrolling and pinch-zooming text, becomes a reconfigurable graphic interface akin to the JazzMutant Lemur, with programmable faders, buttons, and 2D controllers that adjust synthesis parameters in real time. We have ported Nobuyasu Sakonda’s advanced granular synthesis implementation from Max/MSP to RJDJ, using it as the sole process by which a battery of sounds is stretched, frozen, scattered, and restitched.
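Sakonda’s patch is not reproduced here. The following is a generic overlap-add granular sketch in Python with NumPy, illustrating how a single granular process can stretch (`rate` below 1), freeze (`rate` of 0), and scatter (`jitter`) a source sound; the function name, parameters, and defaults are our own assumptions.

```python
import numpy as np

def granulate(source: np.ndarray, out_len: int, rate: float,
              grain: int = 1024, hop: int = 256, jitter: int = 0,
              seed: int = 0) -> np.ndarray:
    """Overlap-add granular resynthesis of a mono signal.

    Grains of `grain` samples are Hann-windowed and written every `hop`
    samples. The read position advances by hop * rate per grain:
    rate = 1.0 roughly preserves duration, 0 < rate < 1 stretches, and
    rate = 0.0 freezes on the start. `jitter` (in samples) randomly
    scatters each grain's read position.
    """
    if len(source) < grain:
        raise ValueError("source must be at least one grain long")
    rng = np.random.default_rng(seed)
    window = np.hanning(grain)
    out = np.zeros(out_len + grain)
    norm = np.zeros(out_len + grain)  # running sum of window weights
    read = 0.0
    for write in range(0, out_len, hop):
        offset = int(read) + (int(rng.integers(-jitter, jitter + 1)) if jitter else 0)
        # Clamp so every grain lies inside the source (reading past the end holds the last grain).
        start = int(np.clip(offset, 0, len(source) - grain))
        out[write:write + grain] += source[start:start + grain] * window
        norm[write:write + grain] += window
        read += hop * rate
    norm[norm < 1e-8] = 1.0  # avoid division by zero where no grain contributes
    return out[:out_len] / norm[:out_len]

sr = 44_100
tone = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)      # one second of a 220 Hz tone
stretched = granulate(tone, out_len=2 * sr, rate=0.5)       # about twice as long
frozen = granulate(tone, out_len=sr, rate=0.0, jitter=400)  # held on the opening, scattered
```

Dividing by the summed window weights keeps the output level steady regardless of how many grains overlap at a given sample.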


Source sounds include excerpts and loops from popular music — the very commodified music typically fixed and consumed on iPods — along with natural sounds, artificially processing and recontextualizing the kinds of experiences associated with standard, ambulatory personal music player use. Because all system components — sensor input, signal processing, sound synthesis, and audio output — reside in a single device, the RJDJ-enabled iPhone differs greatly from the typical controller-plus-laptop model used in contemporary digital music performance. Encapsulating all instrumental qualities from gestural input to expressive sound output within a self-contained, manipulable object transforms the mobile phone beyond a consumer icon into a powerful musical instrument.

Conclusion

The systems described here all aimed to leverage contextual sensing, from location tracking to gestural capture, together with dynamic media delivery to forge new musical experiences shareable among groups of performers and listeners. Step by step over several technology iterations, we developed approaches to localization, musical expression, content delivery, association, and community dynamics. SoundLiving began as a single-user experience with a fixed piece of music seamlessly redirected on the fly. That evolved into a group experience with Malleable Mobile Music, where each participant maintained a sense of agency for their contribution, a phenomenon we termed reflexive translucence. Net_Dérive emphasized music’s potential to carry social information. Shifting from questions of music conveying human presence, we focused on closely linking sonification and visualization of community dynamics. This was extended to a larger group in a non-musical context in Dry Run. Finally, 4 Hands iPhone returns contextual sensing to a purely musical performance, exploring the mobile phone as an expressive, holistic musical instrument.

Throughout these projects runs a working method that considers music an emergent form to be sculpted rather than a fixed media commodity to be consumed. While this position seems natural for the live performance seen in 4 Hands iPhone, we applied this notion of malleability — reshaping the consumer format of popular song — in SoundLiving and Malleable Mobile Music to demonstrate how otherwise fixed recorded music can become interactive and context-sensitive, defining personal spatial spheres and signaling human presence. Small’s concept of musicking describes forms of engagement with music that overcome traditional boundaries between making and listening. Here we created musical systems built on mobile technologies that perhaps represent musicking machines.