Introduction
Digital Audio Workstation (DAW) design is replete with references to analog recording technology. How did the modern DAW evolve to its current state of dependence on skeuomorphism and will it continue to rely on analog audio metaphors for the foreseeable future? In Part I of this article, we examine recording technologies and practices of the analog era in order to establish a context for Part II, an examination of the evolution of design metaphors in the digital age of music production. Lastly, Part III analyzes the limits of skeuomorphic representations in current DAWs and musical interfaces, and explores the impact of controllerism, the game industry, and user experience design on the future of musical interface design. Throughout, we ask how user interface metaphors liberate or constrain their users’ musical creativity and learning. We posit that while past and present DAW design practices have relied heavily on skeuomorphism—software modeled on hardware—future DAW designs will invert this paradigm and hardware will be modeled on software, spurred by innovations in the gaming industry.
Any user interface, musical or otherwise, must operate within the limits of our ability to draw metaphorical connections between visual images and abstract concepts. A software interface is not a neutral intermediary between the user and the work; it guides and shapes the final product, especially when the user is inexperienced. Software interfaces do a great deal of implicit teaching, and indeed may be the only instructor that some musicians ever encounter. When metaphors that are valid in one technological domain are transposed unaltered into a new one, they may no longer accurately represent the underlying functionality. Rose and Meyer (2005) assert that new technologies are often slow to bring about changes in the way we learn and create because we use them to carry out old learning models. As an example, they posit that twenty years elapsed between the invention of film and the emergence of cinematography as an art. The practices of filming from multiple perspectives and zooming lay dormant while the camera was used to document stage productions from a stationary position: “The technology to do these things was in place early, but people needed time to discover the new capacities of movie cameras and shift their mindset away from the old, more limited methodologies of the stage” (2005: p.14).
The use of the term Digital Audio Workstation (DAW) in this article is expansive and all encompassing, pertaining to any software that enables the creation or manipulation of sound. Examining the design of DAWs marketed to the project studio reveals a parallel in which software-based recording is largely modeled on its hardware predecessors: the tape recorder, the analog mixing console, and outboard signal processors. In most cases, DAW interfaces integrate skeuomorphs of these analog recording technologies. Consider the following definition by Hayles:
Skeuomorph is a term anthropologists use for a device that once had a functional purpose but in a successor artifact loses its functionality and is retained as a design motif or decorative element…Skeuomorphs are everywhere in our environment, ranging from light bulbs fashioned in the shape of candles to Velcro tennis shoes with buckles to flesh-colored prostheses equipped with artificial fingers (2002: p.119).
Granted, skeuomorphs are decorative, but they are also educative. The candle-shaped light bulb provides a convenient (albeit obvious) example. Hypothetically, an individual who has never seen a light bulb, but has used a candle could deduce that the light bulb produces light because it is shaped like a candle. This principle also can work in reverse, as exemplified in the viral video “A Magazine Is an iPad That Does Not Work” (2011), which features a one-year-old attempting to operate a magazine as a tablet using hand gestures such as swiping, pinching, and pushing.
Skeuomorphs are often generational, and therefore time sensitive. The original reference of a skeuomorph, steeped in meaning for one generation, may be irrelevant and unhelpful for the following generation. For example, this article was written partially with Microsoft Word, which still uses an icon of a 3.5” floppy disk as its saving function. This is a helpful cue to anyone hailing from the 486 generation [1], but considering that this format has been out of mainstream circulation for over two decades, it is of dubious value for anyone who has never seen a 1.44 MB plastic disk [2].
From the standpoint of pure functionality, skeuomorphism in software design is unnecessary. There was a time when all computer interfaces relied solely on text entered into command lines; in theory, DAWs could still work this way. Therefore, it should be possible to design music production software that makes no reference to previous recording technologies. Some software designs do precisely that; for example, Max/MSP/Jitter uses a “boxes and arrows” interactive flowchart visualization scheme. It has always been possible to represent signal flow abstractly in this manner; Max simply makes the abstraction concrete. While its visualization scheme is elegant, Max also presents a steep learning curve for novice users. Complex operations like recording or processing audio can be significantly more challenging to learn from a truly novel interface like Max than from a DAW steeped in skeuomorphs—presuming that the user is familiar with the interfaces referenced by the skeuomorphs.
Successful designers can depart radically from established design motifs without alienating or confusing users. We have adjusted quite easily to the gestures of swiping and dragging on the screens of our mobile devices. On the other hand, an interface metaphor that deviates too widely from custom can place a heavy burden of learning on the user, which can turn into frustration and abandonment of the interface altogether in favor of a more familiar alternative. The challenge for designers of digital audio interfaces is to walk a fine line: evolving to meet more complex user needs without alienating users accustomed to the tape metaphor.
Part One: Analog Ancestors
DAW interface metaphors are dominated by the multitrack tape recorder. Tape boasted a number of advantages over its predecessor, the disk: increased recording time, expanded frequency response, more malleable editing capabilities, and comparative ease of overdubbing. When Bing Crosby made the unprecedented demand in the mid-1940s that his radio show be pre-recorded, ABC switched from disks to tape because it was easier to edit and had superior sound quality (Morton: 2004, p.123). Crosby also introduced tape to jazz guitarist and Gibson guitar namesake Les Paul. When he received one of the few early Ampex tape recorders in existence from Crosby in 1949, Paul intuited almost immediately how to retrofit it to enable overdubbing (Paul & Cochran: 2008, p.203): “This studio technique, which took him roughly two years to perfect, would ultimately force the industry to reexamine its approach to recording” (Shaughnessy: 1993, p.143). Paul’s tape-based overdubbing experiments upended the existing paradigm of the recording studio: “Prior to the invention of multi-track recording in the 1950s, the relationships between sounds were controlled at the time of recording” (Tankel: 1990, p.39). Paul ushered in a new era in which the recording was no longer assumed to be a real time performance: “Tape and the editing process made possible the creation of an entirely studio-based music whose sole mode of existence was as a recording” (Clarke: 2007, p.54). In the decades that followed, the real-time performance recording became an exception rather than the rule, as the practice of overdubbing became standard in the overwhelming majority of recording sessions. Savage aptly summarizes this phenomenon:
The implication behind “it could have happened” is, of course, that it didn’t happen. That is, the recording presents a musical performance that did not happen on the specific time-line that the finished product presents. (2009: p.33)
Given this late-twentieth-century reality of an established overdub-dependent culture, any emerging technology championed to supplant tape would have to at least match its multi-tracking and editing capabilities. Further, due to tape’s ubiquity in the recording complex, a successful rival technology would have to mesh with the existing infrastructure.
While Les Paul has been touted as the historical figurehead of forward-thinking tape techniques, his peer and friend Bill Putnam has taken on a similarly iconic role in the development of mixing consoles and signal processors. In her discussion of the development of the custom mixing console, Massy (2010) contextualizes Putnam’s innovations: “Early recording studios like Sun Studios in Memphis often used broadcast equipment or had an improvised collection of two or three rotary knob mixers tied together,” and as a result, “There was no standardized way for the audio engineer to use a multitrack recorder.” Swedien (as cited in Cogan: 2003b) attributes the familiar standardized layout of the console to Putnam: “the design of modern recording desks, the way components are laid out and the way they function, cue sends, echo returns, multitrack switching, they all originated in Bill’s imagination.” The mixing console as we now know it “remained almost unchanged from Bill’s first rather small recording consoles” (Swedien: 2003, p.123), with one significant exception. Atlantic Records engineer Tom Dowd was driven by his frustrations with the ergonomics of the console to incorporate the linear fader in order to make the physical actions of mixing less cumbersome:
The equipment most places were using in those days consisted of hand-me-down stuff from broadcast facilities, including consoles that had these big fat three-inch knobs…The problem was that you couldn’t get two or three under your hands. It just wasn’t accurate, it was plain stupid. Eventually I found a manufacturer who was making slide wires–faders that were linear instead of cylindrical and traveled 5 inches up and down. Because of the narrow width of these things, I could fit them into a board half as wide. Which enabled me to put a whole group of faders in two hands, which is what I wanted to do all along. Finally, I could play the faders like you could play a piano (as cited in Simons: 2004, p.53).
Here we are presented with an early example of metaphorical mapping onto music production technology; Dowd’s impetus for the design of the slider was to make mixing akin to playing the piano. With most manufacturers of mixing consoles adopting Dowd’s design, the fader came to be a standard feature of a mixing desk. Artists ranging from the Beatles (Emerick & Massey: 2007, p.112) to the Beastie Boys (Brown: 2009, p.45) have remarked on the phenomenon of “playing” the mixing console. At its most primitive level, mixing entails setting track levels to static states. In most cases, however, the faders are in flux over the duration of a piece of music, ebbing and flowing to match the dynamics, or alternatively, reinventing the dynamics entirely as an aesthetic choice. The technique of “riding the faders” is a defining feature of music production, and its provenance can be traced to Dowd’s desire to “play” the console.
Before the emergence of standardized consoles like those built by SSL in the late 1970s, mixing boards were largely individually customized. The comparatively minimal signal processing performed on such consoles was achieved by routing signals to external units. Engineer Eliot Scheiner remarked that, in the late 1960s, effects technology in the studio was minimal: “There was no EQ or compression in the console; the room that we worked in had two Pultec EQs and two Fairchilds. That was that” (as cited in Droney: 2003, p.196). Bill Putnam is credited with pioneering now-commonplace audio technologies such as equalization, and Cogan (2003a) recalls that while serving during the Second World War, Putnam authored a paper on “the workings of a 3-band EQ amplifier, capable of independent boost and cut controls for highs, mids and lows. This was the first time this concept…was put forth.” Putnam’s company Urei was the first to integrate equalization directly into each channel strip on the console (Robjohns: 2001). Urei is almost synonymous with Putnam’s famed 1176 limiting amplifier, which debuted in 1967 and came to characterize the compressed sounds of radio hits for decades to follow.
The mid-century designs and practices of Paul and Putnam have persisted through to the present. Indeed, the products of multitrack analog recording are so ubiquitous in our musical culture that it is easy to forget how brief their history is. The paradigm of the tape recorder and mixing desk now seems inevitable and unshakeable, especially since so much software has continued to emulate it. Nevertheless, the DAW is not a tape recorder or mixing desk. As the computer has grown in power and fallen in price, it has brought about a sweeping wave of change in the way that music is conceptualized and produced. In the following section we explore the tension between digital tools and the analog metaphors we use to engage with them.
Part Two: The Digital (R)Evolution
In this section, we present a brief overview of the history of design metaphors in music recording technologies. While a comprehensive chronology of the DAW’s development to its current state is beyond the scope of this article, an examination of the design metaphors employed in its digital predecessors and early incarnations is warranted because it aids in contextualizing its skeuomorphic lineage.
The visual modeling of sound events (MIDI or audio) is foundational to the graphical interface of the typical modern DAW. Thomas Stockham pioneered visual editing of digital waveforms using a computer in the mid-1970s (Fine: 2008, p.4), leading to the establishment of the first DAW, Soundstream. This system used an oscilloscope to model waveforms, affording visual editing tasks such as splicing and crossfading (Barber: 2012). Similarly, both the keyboard-anchored Synclavier and Fairlight systems had appended monitors that enabled the graphical representation and manipulation of sampled waveforms. While the Fairlight modeled 3D representations of audio, the Synclavier’s 2D representations of waveforms have come to be the standard depiction of audio in the digital domain. Digidesign’s Sound Designer software, the ancestor of Pro Tools, brought visual waveform editing to other hardware samplers beginning in 1985. By the early 1990s, the paradigmatic modern waveform editor was firmly established in the likes of Digidesign’s Sound Tools (1989) and Pro Tools (1991), Opcode’s Studio Vision (1990), MOTU’s Digital Performer (1990), and Steinberg’s Cubase Audio (1992).
The representation of MIDI data in the visual field can also be linked to hardware-software hybrid sequencers such as the Fairlight and Synclavier. When the Fairlight CMI Series II was introduced in 1980, it added a novel feature called Real Time Composer (commonly referred to as “Page R”): “a brilliant addition to the Series II that gave the world the first integrated graphical pattern-based sequencer” (Leete: 1999). The “Pattern Editor” within Page R resembles the typical left-to-right scrolling arrangement view of most modern DAWs.
In contrast, the Synclavier employed a more abstract concept with its “Recorder Display.” It too made possible the visual representation of music sequences in real time, with vertically scrolling displays of each note’s timing, pitch, and duration in lines of text. However, the user could only view three tracks at a time this way; perhaps that is why the horizontal paradigm won out over the vertical.
In retrospect, scrolling lines of textual code may seem cumbersome in comparison to the graphic representation of MIDI in current DAWs, but the software-based sequencers of the mid-to-late 1980s initially followed in the footsteps of the Synclavier. For example, the first version of Notator (the predecessor of Logic) for the Atari ST featured “arrange” and “pattern” windows, both of which displayed MIDI data in vertically-scrolling lists of text. This information could be edited by text entry or by manipulating the accompanying graphical music score (Moorhead: 1989). The other major sequencer on the market during this era, MOTU’s Performer, also represented MIDI data as an event list. The first sequencer to break with the event list paradigm was Passport’s Master Tracks Pro, which introduced the now ever-present piano roll model in its “Step Editor”:
The step editor window depicts one track only of note information. The y-axis is pitch information graphically represented by a sideways piano keyboard…The x-axis displays time as measures. Notes are graphically represented on the grid as bars. As notes go higher in pitch the bars get higher and as they get longer in duration the bars get longer. (Bachand: 1988)
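The mapping Bachand describes is easy to state in code. The following Python sketch is purely illustrative (the Note structure and the pixel dimensions are our own assumptions, not any sequencer’s internal format), but it captures the piano-roll convention: time on the x-axis, pitch on the y-axis, and duration as bar length.

```python
from dataclasses import dataclass

# Illustrative note event (not any sequencer's actual data format):
# pitch as a MIDI note number, start and duration in beats.
@dataclass
class Note:
    pitch: int
    start: float
    duration: float

def note_to_bar(note, beat_width=40, row_height=8):
    """Map a note to a piano-roll rectangle: (x, y, width, height) in pixels."""
    x = note.start * beat_width          # time runs left to right
    y = note.pitch * row_height          # higher pitch -> higher bar
    width = note.duration * beat_width   # longer note -> longer bar
    return (x, y, width, row_height)

# Middle C held for one beat, then E for half a beat.
print(note_to_bar(Note(60, 0.0, 1.0)))   # (0.0, 480, 40.0, 8)
print(note_to_bar(Note(64, 1.0, 0.5)))   # (40.0, 512, 20.0, 8)
```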
The developers of other sequencers took notice and followed suit, including Opcode, makers of Vision (1989):
Although we had loyal and passionate customers we were #3 behind both Performer from MOTU and Master Tracks from Passport in terms of sales. We believed we could change this by reinventing our sequencer and incorporating both a list view like Performer and a graphic piano roll view like Master Tracks (Halaby: 2011).
The piano roll is a particularly peculiar skeuomorph because its original real-world incarnation peaked in popularity in the 1920s with the player piano, a device that most MIDI users would not have experienced first-hand. However, the player piano is an apt analogy to MIDI:
The paper piano roll isn’t a recording you can play back in the absence of the piano itself; instead, the holes punched in it are read by a mechanism that then tells the piano’s hammers what strings to hit (Anderton: 2014, p.36).
The piano roll was its own unique medium distinct from audio recordings and sheet music. For example, George Gershwin cut more rolls than records, and understandably so; piano rolls were the first multi-track medium and they afforded compositional techniques such as overdubbing, vari-speed, and even quantizing that recording could not:
Overdubbing could add octaves or even a third and fourth hand, and a performance could be recorded in pitch at a slow tempo, to be played back at normal speed to heightened effect. Piano rolls were liable to jog along in a robotically regular tempo, since the length of tones was calibrated by measured perforations in the paper, a sort of quantizing before the word had even been invented (Hyman: 1998, p.52).
While the player piano metaphor constituted a breakthrough in software sequencer design, tape continued to be the dominant model. Software reviewers were cognizant that tape was the default design metaphor: “The transport window [of Master Tracks] conforms to the standard tape recorder analogy that most other sequencers use” (Bachand: 1988). “Does it [Notator] run like a tape recorder…? The answer is ‘yes’” (Moorhead: 1989), and “Cubase emulates a multitrack tape recorder as its basic model” (Snow: 1990).
Steinberg’s Cubase inaugurated the text-less “arrange” window, depicting musical events as rectangular blocks stacked vertically as tracks. As the following excerpt from Paul White’s interview of Karl Steinberg evinces, Cubase’s unveiling of the arrange view in 1989 was a novel concept that was soon imitated by its competitors:
White: Cubase was a totally new concept in graphical interfaces. Was this designed from scratch or were you influenced by the interfaces used by graphics software packages?
Steinberg: The interface was largely our own idea. We got a lot of user input from Pro 24 [Steinberg’s preceding sequencer], then we got together to discuss what the ideal sequencer interface should look like, taking into account the capabilities of the machine. Cubase makes the data contents available in a much more visual way than Pro 24.
White: It is obviously a very successful interface, because all your major competitors have adopted some variation on it for their own products (White: 1995).
As computer-based audio and MIDI applications evolved in the late 1980s, their paths of development eventually intersected with the establishment of integrated Audio-MIDI DAWs that boasted the best of both worlds, beginning with Opcode’s Studio Vision in 1990. Just as Master Tracks’ piano roll came to be the standard method of graphically representing MIDI, the Cubase arrange view was co-opted by most DAWs as the preferred method of depicting MIDI and audio events (one notable exception is Ableton Live’s “session view,” a spreadsheet metaphor resembling the “track mode” in Dr. T’s KCS, a sequencer first released in 1984 and later ported to the Atari ST). Most DAWs of the 1990s integrated a transport panel with such recognizable analog recorder functions as play, stop, fast-forward, rewind, and record, along with editing tools to “cut” and “paste.”
As DAW editing windows have mostly been modeled on tape, so too have their mix interfaces mirrored the venerable analog console [3]. Consider the following review of Cubase VST by Aikin, and how closely the graphical mixer resembles its analog ancestors:
The VST on-screen mixer is elegant. Each channel strip has a long-throw fader, an LED meter with a clipping light, and a readout (in dB) of how high the wave data has peaked since the readout was last cleared. Also on the strip are a tiny panpot strip, mute and solo buttons, “backlit” buttons that tell whether FX or EQ is active for that channel and open up the EQ/send window when clicked, and a monitor (audio thru) button (Aikin: 1997, p.87).
Signal processors were the final component of the analog studio to be absorbed into the digital domain. Because desktop computers had limited processing power, early versions of signal processing plugins for the Pro Tools platform relied on additional DSP hardware (Collins: 1996). Graham (1997, p.32) credits Digidesign’s Evan Brooks with creating the first-ever plugin for Sound Designer, a noise and hum reduction tool. Graham further cites Waves’ Q10, a 10-band paragraphic equalizer, as the first third-party plugin. The interface of the Q10 featured vertical sliders resembling those on a mixing console to control input and output levels on the left and right channels, with accompanying meters, but otherwise it did not incorporate obvious analog skeuomorphs. Most of the Q10’s interface consisted of preset buttons appropriately sized for mouse clicks and an easy-to-read equalization graph. Analog skeuomorphs were also scant in Digidesign’s proprietary TDM plugins such as D-Verb and Mod Delay, which followed a similar template of adjusting parameters using a series of horizontal sliders. The slider knobs themselves did not conform to the analog look, instead presenting the user with an inverted triangle shape resembling an arrow.
It is something of a conundrum that these early plugins did not incorporate many skeuomorphic cues for the sake of design or education at the transitional time when they were likely needed most. “Realistic” graphics are much more demanding of computing resources than geometrical abstractions, and were thus largely unavailable to designers in the 1990s. It is likely that the plugins were nevertheless widely adopted because their limited functionality could be learned easily without much metaphorical assistance: “Since plug-ins tend to be focused on specific applications, they many times (though not always) present a relatively simple learning curve” (Graham: 1997, p.34).
The advent of Steinberg’s Cubase VST (Virtual Studio Technology) in 1996–97 was a turning point in DAW design for two reasons: it used a method of signal processing that no longer relied on additional DSP chips, and it popularized skeuomorphic plugin design modeled on legacy analog gear. When popular publications such as The Economist (2013) and Scientific American (Pogue, 2013, p.29) discuss skeuomorphism, it is usually in connection with vintage-looking graphical designs like the ones used in Steinberg’s new plugin format:
Cubase VST has a gorgeous graphic interface which emulates the look and feel of analog recording equipment, mixers and effects units…Steinberg are obviously hoping to render the traditional outboard effects rack obsolete (Lau: 1998).
In contrast to Lau’s enthusiasm, Aikin offered a more critical review of the consequences of such design implementations:
The built-in effects are designed to look and operate as much as possible like their hardware counterparts. This is a friendly metaphor, but it may be unnecessarily limiting. The parameters are displayed in a little black “LCD window” and edited with a “knob” using the mouse. The knob has fine resolution, but it’s still tricky to nail specific values (Aikin: 1997, p.90).
Since the 1990s, music software has become more elaborate and intricate, and user interfaces have likewise grown in complexity. In 2000, skeuomorphism reached its apotheosis with Propellerheads’ Reason, complete with virtual rack mount screws, patch cables, power switches, LEDs, knobs, and sliders, all designed to powerfully evoke the analog gear that was very much in vogue among electronic music producers at the time. Audio plugins continue to rely on highly skeuomorphic design elements, which now extend beyond the visual realm into the actual sound processing. Consider, for example, Slate Digital’s VTM (Virtual Tape Machines). It boasts elaborate graphics, including complex shadows and reflections on the seemingly graspable switches, knobs, and tape reels, but its auditory gestures toward the analog world are even stronger than its visual ones. The VTM makes digital look vintage, but more importantly, it makes digital sound vintage, going so far as to offer optional tape hiss. This is “technostalgia” at its apogee, and it extends our definition of skeuomorphism beyond visual to aural. For musicians and audio aficionados alike, sound is paramount, and it is naturally more attractive for software to reference the sound of an antecedent analog technology, not just its visual appearance.
By the late 1990s, DAWs had not yet superseded analog equipment, due to their still-limited functionality and high price. Instead, DAWs were used in tandem with analog mixing consoles, supplanting the tape recorder as editing tool and storage medium, but not as the recording complex itself. The “W” in DAW (workstation) was initially underexploited, and its capacities were underdeveloped. For the most part, hard drive storage was relegated to the role of tape replacement. However, even in such hybrid systems, the digital paradigm had begun to diverge from the analog one. While the physical properties of tape limited the number of tracks it could accommodate, the track capacity of a DAW is limited only by computing power, which would grow exponentially in keeping with Moore’s Law [4].
It was not until the end of the twentieth century that a mass audience experienced purely digital music via Ricky Martin’s “Livin’ La Vida Loca” (1999), “the first Number One record to be done completely within a hard disk system” (Daley: 1999). Engineer Charles Dye’s “in-the-box” Pro Tools production bypassed the need for an analog mixing console and hardware-based signal processors, establishing a precedent that was to become the de facto standard. In less than a decade, the DAW went from augmenting the recording studio to reconstructing it: “by 2007, between 70 percent and 80 percent of all pop music (and probably nearly 100 percent of all hip-hop, R&B, and dance music) was mixed in the box” (Milner: 2009, p.338). From 2000 to the present time (2015), the DAW underwent a rapid evolution from tape surrogate to all-in-one studio. This evolution has included the more recent development of phone- and tablet-based “MAWs” (Mobile Audio Workstations).
As a result of computers’ greater affordability and portability, the digital studio is no longer a physical place, but rather a set of practices. Digital musicians can and do work in any environment they find congenial: homes, hotel rooms, parks, airports, buses, trains, and so on. The “writing” that occurs in this type of informal context often takes the form of exploratory and improvisatory recording and sequencing, a process bearing little resemblance to pencil-and-paper composition. The casualness enabled by such practices dovetails with the high value that hip-hop, electronic dance music and related pop styles place on spontaneity and immediacy (Söderman & Folkestad: 2004; Seabrook: 2012). In the following section, we explore the consequences for musical creativity and learning brought on by the ubiquity of the DAW, and by the evolution of its interface metaphors.
Part Three: User Interface Paradigms and the Future of the DAW
Throughout the analog era, innovators and experimentalists attempted to “play the studio” (Eno: 1979), but, for the most part, there was a clear distinction between musical instruments and recording equipment. In the digital era, by contrast, performance, recording and composition have largely collapsed into a single act. The experiments of musique concrète composers and tape splicers have become solidly entrenched within the pop mainstream.
The DAW is not simply a collection of tools to document a performance; it is a music creation tool in its own right (Thibeault: 2011, p.49). Most DAWs include a robust suite of instruments, the capabilities to “score” and record them, and the tools to mix and process the results. By the same token, the teams of specialists required to operate an analog studio are rapidly being supplanted by producers working alone, in pairs or in small groups. With the roles and processes of recording artists changing so dramatically, it follows that the design metaphors of their tools are changing in tandem. If the workflow of the digital studio no longer resembles that of the analog studio, then the interfaces need not resemble the analog tools either. As software takes on new functions and features, designers must find new visualization and interaction schemes to represent them to users.
The recording-as-performance paradigm (Zak: 2001) has been accelerated by the explosive growth in the use of the MIDI protocol and its influence on audio advancements. MIDI has become a kind of lingua franca of digital music, used to control nearly all hardware and software instruments, including synthesizers, samplers and drum machines. MIDI is more akin to music notation than audio recording, and it can be composed graphically without a real-time performance. A MIDI file is a dynamically interactive score, as easily edited as text in a word processor. The fluidity of MIDI has come to serve as a benchmark for music production generally, challenging software developers to devise new methods to make audio more malleable. As a result, audio and MIDI are becoming increasingly fungible: the newest DAWs can extract the pitch and rhythm content of audio and enable the user to edit and manipulate them as easily as notes in a MIDI file. For example, Pro Tools’ “elastic audio,” Logic’s “flex time” and Ableton Live’s “warping” afford their users the ability to dramatically alter the rhythmic content of audio. Comparable feats are possible in the frequency domain by isolating and tuning single pitches, as in Celemony’s Melodyne and Antares’ Auto-tune.
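The contrast with audio is easy to see in code. Because MIDI stores symbolic note events rather than sound, operations such as transposition and quantization reduce to simple list manipulations. The Python sketch below uses an illustrative note format of our own devising, not any DAW’s internal representation.

```python
# Each note: (start_in_beats, midi_pitch, velocity, duration_in_beats).
notes = [(0.07, 60, 96, 0.5), (0.98, 64, 80, 0.5), (2.03, 67, 90, 1.0)]

def transpose(notes, semitones):
    """Shift every pitch, as easily as find-and-replace in a word processor."""
    return [(start, pitch + semitones, vel, dur)
            for start, pitch, vel, dur in notes]

def quantize(notes, grid=0.25):
    """Snap each onset to the nearest grid division (here, a sixteenth note)."""
    return [(round(start / grid) * grid, pitch, vel, dur)
            for start, pitch, vel, dur in notes]

# Up a whole step and snapped to the grid, leaving the "recording" fully editable.
print(quantize(transpose(notes, 2)))
```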
MIDI hardware has been dominated by the piano keyboard metaphor, with drum pads as the only widely-used alternative. However, with software driving the development of new interface metaphors, software paradigms will increasingly drive the design of MIDI hardware. For example, a variety of specialized controllers have emerged designed specifically for Ableton Live’s Session View. There is no hardware predecessor for this interface; it most closely resembles a spreadsheet loaded with segments of audio and MIDI. Controllers like Ableton’s Push, Novation’s Launchpad and Akai’s APC are laid out in a grid corresponding to the Session View spreadsheet. Tapping a square in the grid triggers the sound in the corresponding cell in the software. One might well expect MIDI hardware to follow software, since MIDI has always existed in the digital domain. What about hardware with a longer history? Will the design of interfaces like mixing desks and instruments come to be driven by software paradigms as well? We address this question in the remainder of this section.
Because a mouse or touchpad can only click one object at a time, using a pointer to mix multiple signals as a unit on a virtual console requires the user to first select and group the virtual faders one at a time. It is not possible to simply grab multiple faders and move them with one gesture like a hand can on a physical console (though this functionality is available on multitouch-enabled tablets using applications such as V-Control). Audio engineers that are accustomed to mixing on a console frequently complain that mixing “in-the-box” is a frustrating experience. The criticisms voiced by recording engineer John Cornfield capture the essence of the argument against DAW-based mixing:
It takes longer to get the sound, whereas most of the stuff is right on the board. It’s just more hands on, isn’t it? Mixing with the mouse drives you up the wall after a while (as cited in Touzeau: 2006, p.188).
Mixing on a physical console requires gross motor movements such as rolling around in a desk chair to maneuver around the mixing board and reaching at arm’s length to adjust a fader or knob. It is a kinesthetic experience that occurs at the macro-level, requiring the entire body, whereas mixing with a mouse or trackpad only requires micro-movements of the hand. As mixing consoles have been pushed towards obsolescence, mixing in-the-box has become the rule rather than the exception, and Cornfield’s sentiment will soon have little relevance to the up-and-coming generations of music-makers who will never have experienced a mixing console. Nevertheless, Cornfield’s critique has merit, and it could lead to new approaches in digital music production and the technologies that support these tasks. While the channel strip remains the predominant mixing metaphor, the stage metaphor presents a viable alternative, approximating the experience of an audience member at a live concert. Gibson (1997, pp. 22–24) defines the three axes of the sound stage metaphor:
- X: left to right (the traditional concept of panning)
- Y: top to bottom (frequency: higher frequencies are perceived as having height—this convention is not consistently adopted)
- Z: front to back (volume: louder sound sources are perceived as being closer)
Audio production has long involved the creation of an implicit auditory “space.” The stage metaphor makes this space visually explicit. The user can see the position of sound sources as well as their relationships to one another without relying on memory or comparisons of pan pot and fader values between channels. There have been several implementations and adaptations of this metaphor in mixing interfaces using the mouse (Pachet & Delerue: 2000; Holladay: 2005), touch screen (Diamante: 2007; Carrascal & Jordà: 2011), augmented touch screen with additional microcontroller and sensors (Gelineck et al.: 2013), computer vision and gestural control (Lech & Kostek: 2013), and computer vision with a portable commercial sensor (Ratcliffe: 2014). While some early implementations used a standalone design, more recent implementations have been designed as control interfaces for DAWs. The combination of the stage metaphor and accompanying hardware restores the ability to interact with multiple sonic parameters at the same time, while adding the ability to directly visualize spatial relationships between sound sources. While such a paradigm has the potential to overcome the deficiencies of the combination of the channel strip metaphor and mixing with the mouse, it has not been adopted in any commercial DAW. For the time being, the skeuomorph of the analog mixing console remains largely uncontested.
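To suggest how such an interface might translate stage position back into conventional mix parameters, consider the following sketch. It is our own illustration of Gibson’s x- and z-axes; the pan law and the decibel range assigned to stage depth are assumptions, not values from any published implementation.

```python
import math

def stage_to_mix(x, z, depth_range_db=24.0):
    """
    Map a source's stage position to pan gains and a level offset.
    x: -1.0 (hard left) to +1.0 (hard right)  -- Gibson's left/right axis
    z:  0.0 (front of stage) to 1.0 (back)    -- distance rendered as volume
    """
    x = max(-1.0, min(1.0, x))
    gain_db = -depth_range_db * max(0.0, min(1.0, z))   # further back = quieter
    angle = (x + 1.0) * math.pi / 4.0                    # constant-power pan law
    left, right = math.cos(angle), math.sin(angle)
    return gain_db, left, right

# A source slightly left of centre and a quarter of the way upstage.
print(stage_to_mix(x=-0.5, z=0.25))
```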
Most DAWs include software instruments controlled via MIDI. As both pop and “art” music shift away from acoustic instrumentation toward synthesized and sampled sounds, these software instruments have become more important. Nearly all software instruments are controlled with a piano-like keyboard and/or drum pads. These skeuomorphs are good fits for keyboard and percussion instruments, respectively, but they are awkward for other synthesized sounds. As we have considered the source of metaphors for mixing and editing in the DAW, we must similarly examine the metaphors used for performing, since this usage is becoming increasingly central.
The piano keyboard offers a straightforward and intuitive mapping of one key to one pitch. Hitting the key harder to produce a louder sound mirrors the familiar analog world. The MIDI standard has entrenched the keyboard metaphor in the DAW, and from there it has also extended into environments where there is no keyboard involved, such as the scale entry interface in Antares’ Auto-tune. The problem with the piano keyboard is its strict pitch quantization. The finite pitch set is gentle on beginners, perhaps, but it restricts the expressiveness of non-keyboard instruments. The microtonal nuances we have come to expect from a century of vernacular and pop music are simply unavailable to the keyboard player. The pitch wheel on some keyboards overcomes this shortcoming to an extent, but it is a primitive affordance at best compared to guitarists’ and violinists’ ability to shift the pitches of different notes within a chord by different amounts. The touchscreen offers some promise in this regard. The iOS app Nodebeat has an exceptionally expressive touch keyboard. Notes played in the center of the keys produce standard pitches, but the player can also span the entire pitch continuum by dragging from one key region to the next. The Seaboard is a rare keyboard controller that attempts to give a similar degree of control over pitch nuance.
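A touchscreen can sidestep the keyboard’s pitch quantization by translating a finger’s continuous position into a note number plus a pitch-bend offset. The sketch below is our own illustration of that general idea rather than the scheme used by Nodebeat or the Seaboard, and it assumes the common (but configurable) bend range of two semitones.

```python
def touch_to_midi(position_in_semitones, bend_range=2.0):
    """
    Convert a continuous pitch position (semitones above MIDI note 0) into
    a note number plus a 14-bit pitch-bend value (8192 = no bend).
    """
    note = int(round(position_in_semitones))            # nearest "key"
    offset = position_in_semitones - note                # microtonal remainder
    bend = 8192 + int(round((offset / bend_range) * 8191))
    return note, max(0, min(16383, bend))

# A finger resting a quarter-tone above middle C (note 60).
print(touch_to_midi(60.25))   # (60, 9216)
```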
Interfaces modeled on non-keyboard acoustic instruments such as the guitar, saxophone, violin, drums, and accordion hew close to analog reality in their parameters and mappings. It does not require much imagination to understand how a MIDI violin maps fingerboard position to pitch. The Akai MPC sampler, mimicked in many subsequent devices, is essentially a set of small drums played with the fingertips. The relationship between hitting a sampler pad and hearing a sample played back is obvious, visceral and appealing to our intuition. However, the pads only enable control along two axes: time and velocity. Any additional expressive parameters must be performed with additional hardware or entered tediously on the screen.
As electronic music has come to dominate popular culture, artists have struggled to find ways to make their studio creations come alive onstage. Recreating an elaborate studio setup onstage is generally impractical, and laptop computers are prone to crashing. Given the high degree of automation at work in most digital music, there seems to be little point to bringing the studio onstage in the first place, since watching a performer simply hit play on the computer is quite unsatisfying. To make this new music work in the live context, new technologies are needed.
Controllerism is a performance method whereby musicians use specialized control surfaces to trigger sample playback and manipulate effects parameters with the full fluidity and expressiveness of a conventional instrument. Inspired in large part by the virtuosic turntable manipulations carried out onstage by hip-hop DJs, controllerism strives to bring spontaneity and immediacy to DAW-based performance. Such performance can take place on stage or in the studio:
Live Electronic Music is a product of the belief that the body is participating once again in the music making process, that the human is having a physical effect on the music, not just pressing buttons to facilitate the playback of recordings. (Vandemast-Bell: 2013, pp. 241–243)
As its name implies, Ableton’s Live is a DAW designed from the ground up with real-time performance in mind. Controllers such as the Push, Launchpad, and APC enable performers to reconfigure and resequence their tracks on the fly. Native Instruments’ Maschine takes a similar approach, also using customized hardware. It is important to note these tools lean closer to DJ practice than traditional instrumental performance. The designers’ expectation is that the user will be carrying out fine-grained playback recording and sound manipulation, rather than generating new sounds “from scratch.”
Designers who wish to give digital musicians the real-time expressive nuance of acoustic instruments without imitating those instruments’ physical forms are confronted with a serious difficulty: the mappings from gesture to sound must be determined arbitrarily from the ground up. The past century has seen a variety of fascinating experiments in non-traditional control schemes, from the theremin onward, but the hegemony of the piano keyboard metaphor (and, to a lesser extent, other acoustic instrument metaphors) remains unchallenged. This is certainly not for a lack of trying among experimental interface designers. Experimental instruments have incorporated motion sensors, touch-based controls, piezoelectric pickups, and contact mics attached to every conceivable object, and even direct readings of brainwaves. None has seen widespread adoption, despite the efforts of their inventors and predominantly academic supporters, such as the International Conference on New Interfaces for Musical Expression (NIME).
Musicians who wish to adopt any of the novel physical interfaces listed above face a daunting learning curve. Before expression is possible, a musician must understand the device’s idiosyncratic mappings between gesture and sound. Further, the audience must go through this learning process as well. We participate imaginatively in musical performances, and we lose our ability to connect with the performer emotionally if we cannot connect the performer’s actions to the sounds produced. Audiences for recorded music have grown accustomed to not being able to identify every sound source, but they still desire some sense of how the music was made in order to relate emotionally. Morton Subotnick (personal communication, October 2012) lamented that once the gestures in his stage pieces became excessively abstracted from the sounds they triggered, it would have made no difference if he had simply played a tape of the desired sounds with the performer miming along. It is no wonder interface designers keep returning to drums and keyboards.
Productive Failure
The electric guitar amplifier offers many lessons for the digital interface designer. Modern guitarists modulate their signal with devices ranging from expression and effects pedals to e-bows, talk boxes, loop pedals, and MIDI pickups. While feedback was originally considered a defect or error, it has become a central pillar of the electric guitar’s expressive use. The example of guitar amplifier feedback poses questions for other electronic music interface designers: how can a tool fail productively? How can designers build a functioning tool that still leaves the door open to similarly unpredictable use cases? How can space be left for the emergent, the serendipitous, the flaw-turned-virtue? For a digital environment, is skeuomorphism the right design strategy for open-endedness?
Dennett (2001) argues that virtual environments like composition software need artificial collision detection. While the “real” world is full of rough edges, entropy and chaos, these things need to be inserted into computer programs laboriously and by hand. Computer music lacks the “spontaneous intrusions” of human music — there is no amplifier feedback unless the programmer puts it there. An acoustic instrument is slowly, constantly going out of tune. Even hitting the same piano key at the same velocity produces subtly different sounds each time. Analog synthesizers are sensitive to temperature, humidity and the power coming out of the wall. By contrast, MIDI playback is always the same, unless the programmer goes to considerable effort to write randomizing algorithms.
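The point can be made concrete with a few lines of code: determinism has to be broken deliberately. The humanization sketch below is illustrative only, and the jitter amounts are arbitrary assumptions.

```python
import random

def humanize(notes, timing_jitter=0.02, velocity_jitter=8):
    """Add small random deviations to onset time (beats) and velocity."""
    result = []
    for start, pitch, velocity, duration in notes:
        start = max(0.0, start + random.uniform(-timing_jitter, timing_jitter))
        velocity = max(1, min(127, velocity + random.randint(-velocity_jitter,
                                                             velocity_jitter)))
        result.append((start, pitch, velocity, duration))
    return result

# A rigidly quantized hi-hat pattern, loosened slightly on every pass.
pattern = [(beat * 0.5, 42, 100, 0.25) for beat in range(8)]
print(humanize(pattern))
```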
A maximally expressive musical interaction system must be noisy and unpredictable. Many musicians might disagree with this statement, because they devote their lives to reducing their instruments’ unpredictability. Nevertheless, the element of chaos implicit in acoustic and analog instruments is valuable, because the noise contains potential new signals. A major part of our creativity lies not just in creating patterns, but in discerning patterns in noise that are not really there. Randomness is fertile soil for creativity. The electric guitar has evolved so rapidly because generations of unschooled rock guitarists have continually expanded its sonic palette in the course of naive and playful experimentation. Software can feel sterile because it lacks chaos. The best interfaces should reward open-ended tinkering, and offer up periodic surprises.
iOS GarageBand Guitar
Apple’s iPad version of GarageBand employs several new interface schemes that utilize the touchscreen to great effect. In particular, users of the guitar interface can tap out notes on the graphical fretboard and bend the strings to produce the same microtonal nuance available on a real guitar. In some respects, the virtual guitar offers advantages over its physical counterpart; wide string bends, for example, require considerably more physical strength on a real guitar. In addition to the real fretboard layout, GarageBand users can also play the guitar in Scales mode. When a scale is selected (major, blues, Mixolydian, etc.), its tones become the only notes available on the fretboard. With no “wrong” notes possible, novice players are free to effortlessly explore melodic ideas. In chord strumming mode the graphical strings are overlaid by rectangular blocks, with each block representing a different chord in the selected key. Brushing a fingertip across the strings within a given rectangle sounds the corresponding chord. It feels very much like strumming with a pick, with the exception that all of the notes are automatically “correct.” Once the user has strummed a series of chords, they can be edited at the individual note level. An advanced user might rough out a guitar part by strumming, and then refine the harmonies one note at a time in the MIDI piano roll view.
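The “no wrong notes” behaviour of Scales mode boils down to snapping every touched position to the nearest member of the chosen scale. The sketch below is our own approximation of that idea, not Apple’s implementation; the scale tables are illustrative.

```python
# Scales as semitone offsets from the root (illustrative, not Apple's data).
MAJOR = [0, 2, 4, 5, 7, 9, 11]
BLUES = [0, 3, 5, 6, 7, 10]

def snap_to_scale(midi_note, scale=MAJOR, root=60):
    """Return the nearest pitch that belongs to the selected scale."""
    allowed = [root + 12 * octave + degree
               for octave in range(-5, 6) for degree in scale]
    return min(allowed, key=lambda pitch: abs(pitch - midi_note))

# Every touched position yields an "allowed" note; stray pitches are pulled in.
print([snap_to_scale(n, BLUES) for n in range(60, 66)])   # [60, 60, 63, 63, 63, 65]
```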
The keyboards in GarageBand work very much like the guitar. Individual notes can be played on a regular keyboard, and there is also an equivalent to the guitar’s strumming mode, where moving a fingertip on a vertical strip produces arpeggios or melodic figures. While the prefabricated patterns are banal, they can be easily customized in the MIDI editor. The violin, cello and upright bass interface has a similar arpeggio/pattern interface, along with a fretboard mode allowing bowing, pizzicato, vibrato and microtonal slides. Scales mode is available on all of these instruments as well. Here we see a combination of familiar skeuomorphic analogies that depart enough from their sources to ease the musical discovery process for novices significantly. While GarageBand is generally considered to be a DAW, we have framed it as a set of instruments. In reality, it is both, and as such it provides a perfect example of the broader confluence between production and performance tools taking place throughout the digital music landscape.
Life Beyond the GUI: Convergence of User Experience Design, Game Controllers and Musical Interfaces
The iterative driving of hardware design by software paradigms is increasingly evident in user experience design, with the most notable impact in the gaming industry. While video arcades in the 1970s and 80s provided better graphics and sound than home game consoles, this was no longer the case by the 1990s. As LaViola (2008, p.11) explains:
The standard video arcade was able to move into the home. It became clear that arcade games couldn’t compete with game consoles and PCs, in terms of graphics, sound, and length of play per gaming session. So, to compete with game consoles, the only thing video arcades could do was innovate the user interface. This innovation came in the form of a variety of input devices and strategies that got players more actively involved.
New physical interactions in arcades included cars and motorcycles that users could ride, interactive dance floors, and the like. Once the gaming industry introduced new and interesting ways for consumers to interact with games in arcades, consumers came to expect continual innovation and novelty; otherwise there would be no reason to leave their PCs and home consoles for the arcade.
The graphical user interface (GUI), with its Windows, Icons, Menus, Pointer (WIMP) system, has been the predominant user interface paradigm for the personal computer since the early 1980s, using the QWERTY keyboard and mouse as input. Many have observed deficiencies of the standard GUI and have proposed and laid groundwork for alternatives or augmentations to it (see below for examples), but only recently, within approximately the past five years, have these alternatives infiltrated consumer personal computing markets. In contrast to the PC, which clung to the traditional GUI for thirty years, the arcade and console game industries have been quicker to adopt new interface hardware. While arcade game developers introduced new hardware into the user experience for fear of losing consumers to console games, developers of console games also introduced new ways for consumers to interact with their games (e.g. the Nintendo Wii and Microsoft Kinect). Game developers continuously challenge the norms of user experience, creating significant demand on hardware developers to release new controllers and sensors. La Grou (2014) describes the current state of digital interface design as the “first person shooter era of media production.” The gaming industry has been the source of much of this rapid development, and all forms of media have reaped the benefits, music being no exception. The use of gaming controllers as musical interfaces can be traced from early home game consoles to the present. As gaming paradigms warrant more advanced means of input, alternatives to the standard GUI are increasingly being built into consumer gaming devices, and once new hardware reaches the consumer, it finds its way into musical applications. The future of musical hardware is driven not only by designs used in DAWs and other music software, but also by user interface design in games and by user experience design more generally. This is an iterative process to which several parties are contributing.
Console games have evolved to include numerous expansions of the simple joystick and button, some of which have been used in musical interfaces. However, the controllers that have offered more expressive musical potential are those that enable three-dimensional spatial interaction and virtual or augmented reality. LaViola (2008, p.13) discusses the three main types of hardware that enabled more advanced interaction in user interfaces as of 2008: physical props modeled after real-world objects (e.g. the Guitar Hero controller), simple vision-based tracking sensors, and Nintendo’s approach in the Wii remote, which combines an accelerometer with light tracking via a sensor bar.
Props in general, and the Guitar Hero controller specifically, have been incorporated into performative musical interfaces (Luhtala, Kymäläinen & Plomp: 2011; Dahlstedt et al: 2014).
Many have built musical controllers around the Wii remote as well (Peng & Gerhard: 2009; Miller & Hammond: 2010). In addition, since 2008, gyroscopes have been incorporated into many mobile devices, particularly smartphones and tablets, and this type of hardware has seen considerable use in musical interface design, especially in combination with multi-touch control. Many have also stressed the importance of tactile feedback in user interfaces. A theoretical framework for “tangible user interfaces” (TUI) (Ullmer & Ishii: 2001) has been established as an alternative to the traditional GUI; with TUI, the digital domain takes on a tangible, physical representation. TUI has been implemented in musical interfaces like the Reactable, where a multi-touch surface serves as a tabletop augmented with interactive fiducial-marker blocks.
These tangible blocks have also been augmented with sensors and microcontrollers to supply more flexible input as “smart tangibles” (e.g. Gelineck et al: 2013).
While tangibles have seen significant development, there have also been substantial advancements in designing interactions that require no touch at all. Several computer vision-based sensors have been designed to give the user more direct body control in games. Recent consumer sensors used for this purpose include the Microsoft Kinect and the Leap Motion. These sensors, combined with machine learning, are designed to detect the motion of specific joints of the human skeleton. The Kinect has been integrated into several Xbox games, following the principles of Natural User Interface (NUI) design. NUI strives to strip away metaphors and provide a more direct interaction between the user and the software, where the input sensor acts as a natural extension of the user’s body and mind. For games, this generally means that the user does not hold a controller; the user is the controller. Instead of holding down an arrow to run, the user physically runs. Instead of pressing “A” to jump, the user jumps, and the software responds directly to the user’s actions. These vision-based sensors are seeing significant development in musical interfaces (Sentürk et al: 2012; Yoo, Beak & Lee: 2013; Ratcliffe: 2014), and afford the designer nearly limitless expressive potential. NUI principles also inform multi-touch interaction (natural gestures) and wearable technology.
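How skeleton data of this kind typically becomes musical control is, at its core, a normalization and scaling step. The sketch below assumes a hypothetical tracker that reports a hand’s height in metres; it is not the API of the Kinect or Leap Motion SDKs, and the range constants are arbitrary.

```python
def hand_height_to_cc(hand_y_metres, floor=0.8, ceiling=2.2):
    """Map a tracked hand height to a 0-127 MIDI continuous controller value."""
    normalized = (hand_y_metres - floor) / (ceiling - floor)
    normalized = max(0.0, min(1.0, normalized))      # clamp outside the range
    return int(round(normalized * 127))

# A hand raised to 1.5 m sits halfway through the assumed range.
print(hand_height_to_cc(1.5))   # 64
```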
To contemplate what the future of music and media production tools (software and hardware) might look like, it is helpful to consider projections of price-performance data for increasingly popular technologies employed in user interface and user experience design for games, as this is the driving market. These are not unlike older predictions made by Moore (1965) and Kurzweil (2001). As many have observed, price-performance improves at a reasonably predictable rate over time, predictable enough that Moore and Kurzweil were able to forecast the growth rates of different technologies. Examining this data, La Grou (2014) found that 3D vision-based sensing, haptics, wearable technology, augmented reality and virtual reality were becoming increasingly powerful and affordable (the Leap Motion, Oculus Rift, and Google Glass, for example, follow this trend). These technologies will likely be embedded in computers and workstations, and integrated into music production and performance. The gaming industry will continue to commercialize and popularize user interfaces that challenge the traditional GUI, and this will likely challenge skeuomorphic GUIs in DAWs as well. Based on price-performance projections, La Grou (2014) predicts that by 2040 there will be no mouse or touch screens; instead, users will interact with music through gestural controllers such as head-worn hardware with virtual 3D user interfaces. What started as a physical mixing console and tape machine evolved into the digital audio workstation, and could very likely become a “virtual audio workstation” or “virtual media workstation,” transforming how all media is produced.
While alternatives to traditional musical interface technologies are continually introduced and implemented, many of them remain in a gestational state and have yet to mature to the extent of forebears such as the mixing console and the piano keyboard. The prevalence of skeuomorphism in DAWs and software tools persists largely because the older tools are well tested and highly evolved. There are undoubtedly many potential applications for these emerging technologies in musical interfaces: live performance, recording, production, and all of the blurred regions between them. Today’s laptop performers are often tethered to their computers, which in many ways strips performance of its performative qualities. The integration of 3D sensing, wearable, tangible, augmented reality and virtual reality technologies into musical interfaces has the potential to break down the walls put up by the design metaphors and user interface paradigms used in DAWs today. This integration can allow today’s music makers to focus on the performative aspects of their music, whether they are performing in front of an audience or writing music in their living rooms. Even as systems integrate more advanced hardware and software, technological learning curves need not be insurmountable, so long as there is a focus on the user experience.
Conclusions
During the first decades of digital audio, software design was dependent on skeuomorphs of analog hardware, but now we are beginning to see the converse: hardware modeled on software. We expect that in the future, software design will increasingly drive hardware design. As the DAW continues to evolve, what innovations should we expect or hope for?
Presently, a musician who wishes to compose, record or perform with a DAW must devote considerable focus to the mechanics of the interface itself: remembering which command is located in which menu, finding the target area for the pointer and executing the mouse or touchscreen drag correctly, managing window layout and the files and settings represented therein. This attention is necessarily diverted from the music itself. How might future interfaces divert less attention to themselves and return more of it to the creative act? Will complex computer operations always require complex interfaces?
Acoustic instruments have had hundreds or thousands of years to develop. With sufficient training and practice, a pianist or flutist can cease to be conscious of the mechanics of the instrument entirely. Arguably, it is a requirement of virtuosity that instrument mechanics no longer occupy any of the performer’s volitional attention. That said, the possibilities of acoustic instruments are finitely bounded: a piano sounds like a piano; a flute sounds like a flute. A computer can sound like a piano, a flute or literally anything else. Will it ever be possible for a digital musician to master the computer’s possibilities the way an expert pianist or flutist masters an instrument?
Implicit throughout our discussion is the assumption that digital music necessarily requires the musician’s eyes on the screen. Computers provide a great deal of visual feedback and little to no haptic feedback. Has there ever been a more visual form of music-making than audio and MIDI in the DAW? Many studies have found that the visual modality can influence, and even dominate, the auditory modality in human multi-modal perception. Vines, Krumhansl, Wanderley and Levitin (2006) found that visual cues have a complex impact on the auditory perception of a musical performance, serving variously to enhance or reduce subjects’ perception of musical tension and to extend their perception of phrasing throughout a performance. Valente, Myrbeck and Braasch (2009) observed that both visual and auditory cues affect listeners’ sense of spatialization and perceived auditory width. Some sound engineers believe that dependency on visual feedback while working with sound in DAWs can produce negative aesthetic effects (e.g., Owsinski: 2006). This is a genuine challenge for designers of audio software interfaces: significant investigation is needed into the types and quantities of visual feedback necessary for users to complete audio-centric tasks efficiently.
Will we one day regard waveforms on a screen the way we regard wax cylinders, as necessary but obsolete evolutionary steps? Or will digital music grow ever more visually immersive and demanding? Will new interface metaphors inhibit creativity through the cognitive burdens they impose, or will they provoke new forms of creativity presently undreamt of?
Acknowledgments
Thanks to Huron, Milo and Finlay.
Notes
[1] 486 refers to the Intel 80486 family of microprocessors prevalent in personal computers of the early 1990s.
[2] We may be finally witnessing a shift away from the disk metaphor; the authors edited this article using Google Docs, which obviates the need for a “save” icon by simply saving each keystroke automatically in the background.
[3] It is worth noting that in a 1986 episode of ‘The Computer Chronicles’, Hybrid Arts’ Bob Moore demonstrated his ‘tapeless recording studio’, the ADAP DAW, which featured an on-screen mixer resembling an analog console: https://www.youtube.com/watch?v=D8lSMytqdEY
[4] Gordon Moore (1965) predicted that the number of transistors on an integrated circuit would double roughly every two years, leading to continually faster and cheaper computing.
Bibliography
Aikin, J. (1997) ‘Keyboard Reports: Steinberg Cubase VST’. In: Keyboard. January, pp. 85–90.
Anderton, C. (2014) ‘MIDI Reloaded’. In: Keyboard. March, pp. 36–37.
Bachand, R. (1988) ‘Master Tracks Pro: MIDI Power From Passport’. In: START. 2, 5. Available at: http://www.atarimagazines.com/startv2n5/mastertrackspro.html (Accessed: August 2014).
Barber, S. (2012) ‘Soundstream: The Introduction of Commercial Digital Recording in the United States’. In: Journal of the Art of Record Production. [Online] 7. Available at: https://arpjournal.com/2140/soundstream-the-introduction-of-commercial-digital-recording-in-the-united-states/
Brown, J. (2009) Rick Rubin: In the Studio. Toronto: ECW Press.
Buskin, R. (2007) ‘Classic tracks: Les Paul and Mary Ford “How High the Moon”’. In: Sound on Sound [Online] January. Available at: http://www.soundonsound.com/sos/jan07/articles/classictracks_0107.htm (Accessed: August 2014).
Carrascal, J.P., & Jordà, S. (2011) ‘Multitouch Interface for Audio Mixing’. In: Proceedings of the International Conference on New Interfaces for Musical Expression, Oslo, Norway. pp. 100–103.
Clarke, E. F. (2007) ‘The Impact of Recording On Listening’. In: Twentieth-Century Music. 4, 1, pp. 47–70.
Cogan, J. (2003a) ‘Bill Putnam’. In: Mix [Online] October. Available at: http://mixonline.com/recording/interviews/audio_bill_putnam/ (Accessed: August 2014).
Cogan, J. (2003b) ‘Bill Putnam’. In: Mix [Online] November. Available at: http://mixonline.com/recording/business/audio_bill_putnam_2/ (Accessed: August 2014).
Collins, M. (1996) ‘TDM Software Plugins: Digidesign Pro Tools, Part 1’. In: Sound on Sound. [Online] February. Available at: http://www.soundonsound.com/sos/1996_articles/feb96/tdmplugins.html (Accessed: August 2014).
Dahlstedt, P., Karlsson, P., Widell, K., & Blomdahl, T. (2014) ‘YouHero – Making an Expressive Concert Instrument from the GuitarHero Controller’. In: Proceedings of the International Conference on New Interfaces for Musical Expression. London.
Daley, D. (1999) ‘Recordin’ “La Vida Loca”: The Making of a Hard Disk Hit’. In: Mix [Online] November 1. Available at: http://www.mixonline.com/mag/audio_recordin_la_vida/ (Accessed: August 2014).
Daley, D. (2004) ‘The Engineers Who Changed Recording: Fathers of Invention’. In: Sound on Sound [Online] October. Available at: http://www.soundonsound.com/sos/Oct04/articles/rocketscience.htm (Accessed: August 2014).
Dennett, D. (2001) ‘Collision Detection, Muselot and Scribble: Some Reflections on Creativity’. In: Cope, D. (ed.) Virtual Music. Cambridge, MA: MIT Press.
Diamante, V. (2007) ‘Awol: Control Surfaces and Visualization for Surround Creation’. Technical Report, University of Southern California, Interactive Media Division.
Droney, M. (2003) Mix Masters: Platinum Engineers Reveal Their Secrets for Success. Boston: Berklee Press.
Emerick, G. & Massey, H. (2007) Here, There and Everywhere: My Life Recording the Music of the Beatles. New York: Penguin.
Eno, B. (1979) ‘The Studio As Compositional Tool’. In: Down Beat [Online]. Retrieved from http://music.hyperreal.org/artists/brian_eno/interviews/downbeat79.htm (Accessed: August 2014).
Fine, T. (2008) ‘The Dawn of Commercial Digital Recording’. In: ARSC Journal. 39, 1, pp. 1–13.
Gelineck, S., Overholt, D., Büchert, M. & Andersen, J. (2013) ‘Towards an Interface for Music Mixing Based on Smart Tangibles and Multitouch’. In: Proceedings of the International Conference on New Interfaces for Musical Expression. Daejeon, Korea.
Gibson, D. (1997) The Art of Mixing: A Visual Guide To Recording, Engineering and Production. Boston, MA: Thomson Course Technology.
Graham, M. (1997) ‘The Plug-In Zone.’ In: Keyboard. August, pp. 32–49.
Halaby, C. (2011) ‘“It Was 21 Years Ago Today…” How The First Software DAW Came About’. In: KVR Audio [Online]. Available at: http://www.kvraudio.com/focus/it_was_21_years_ago_today_how_the_first_software_daw_came_about_15898 (Accessed: August 2014)
Hayles, N. K. (2002) ‘The Complexities of Seriation’. In: PMLA. 117, 1, pp. 117–121.
Holladay, A. (2005) ‘Audio Dementia: A Next Generation Audio Mixing Software Application’. In: Proceedings of the 118th AES Convention. Barcelona, Spain.
Hyman, D. (1998) ‘Rhapsody for George: A Centennial Celebration of Gershwin’s Legacy’. In: JAZZIZ. December, pp. 52–54.
Katz, M. (2004) Capturing Sound: How Technology Has Changed Music. Berkeley and Los Angeles: University of California Press.
Kurzweil, R. (2001) ‘The Law of Accelerating Returns’. [Online]. Available at: http://www.kurzweilai.net/the-law-of-accelerating-returns (Accessed: August 2014).
La Grou, J. (2014, April 4) ‘Studio of the Future: 2020-2050’. Lecture presented at New York University. New York, NY.
Lau, P. (1998) ‘Steinberg Cubase VST’. In: Canadian Musician. January. 20, 1, p. 21.
LaViola, J. (2008) ‘Bringing VR and Spatial 3D Interaction to the Masses through Video Games’. In: IEEE Computer Graphics and Applications. 28, 5, pp. 10–15.
Lech, M., & Kostek, B. (2013) ‘Testing A Novel Gesture-Based Mixing Interface’. In: Journal of the Audio Engineering Society. 61, 5, pp. 301–313.
Leete, N. (1999) ‘Fairlight Computer’. In: Sound on Sound [Online] April. Available at: http://www.soundonsound.com/sos/apr99/articles/fairlight.htm (Accessed: August 2014).
Luhtala, M., Kymäläinen, T., & Plomp, J. (2011) ‘Designing a Music Performance Space for Persons with Intellectual Learning Disabilities’. In: Proceedings of the International Conference on New Interfaces for Musical Expression. Oslo, Norway.
Mann, S. (2002) Intelligent Image Processing. New York: John Wiley and Sons.
Marrington, M. (2011) ‘Experiencing Musical Composition In The DAW: The Software Interface As Mediator Of The Musical Idea’. In: The Journal on the Art of Record Production. 5. [Online]. Available at: https://arpjournal.com/845/experiencing-musical-composition-in-the-daw-the-software-interface-as-mediator-of-the-musical-idea-2/
Massy, S. (2010) ‘Gear Stories with Sylvia Massy: The Age of Customs’. In: Mix [Online] July. Available at: http://mixonline.com/recording/gear_stories/gear_stories_custom_consoles/ (Accessed: August 2014).
Miller, J. & Hammond, T. (2010) ‘Wiiolin: A Virtual Instrument Using the Wii Remote’. In: Proceedings of the International Conference on New Interfaces for Musical Expression. Sydney, Australia.
Milner, G. (2009) Perfecting Sound Forever: An Aural History of Recorded Music. New York: Faber and Faber.
Moore, G.E. (1965) ‘Cramming More Components onto Integrated Circuits’. In: Electronics. April 19, pp. 114–117.
Moorhead, J.P. (1989) ‘Creator and Notator: Super Sequencing, Super Scoring’. In: START. 3, 6. Available at: http://www.atarimagazines.com/startv3n6/creator_and_notator.html (Accessed: August 2014).
Morton, D. L. (2004) Sound Recording: The Life Story of a Technology. Baltimore: The Johns Hopkins University Press.
Owsinski, B. (2006) The Mixing Engineer’s Handbook: Second Edition. Boston: Thomson Course Technology PTR.
Pachet, F. & Delerue, O. (2000) ‘On-the-Fly Multi Track Mixing’. In: Proceedings of the 109th AES Convention. Los Angeles.
Paul, L. & Cochran, M. (2008) Les Paul – In His Own Words. York, PA: Gemstone.
Peng, L. & Gerhard, D. (2009) ‘A Wii-Based Gestural Interface for Computer-Based Conducting Systems’. In: Proceedings of the International Conference on New Interfaces For Musical Expression. Pittsburgh, PA.
Pogue, D. (2013) ‘Out with the Real’. In: Scientific American. February, 308, 2, p.29.
Ratcliffe, J. (2014) ‘The Hand Motion-Controlled Audio Mixer: A Natural User Musical Interface’. Master’s Thesis, New York University.
Robjohns, H. (2001) ‘Universal Appeal: Universal Audio 1176LN Limiting Amplifier’. In: Sound on Sound [Online] June. Available at: http://www.soundonsound.com/sos/jun01/articles/universal1176.htm (Accessed: August 2014).
Rose, D. H., & Meyer, A. (2005) ‘The Future is In the Margins: The Role of Technology and Disability in Educational Reform’. In: Rose, D., Meyer, A., & Hitchcock, C. (eds.) The Universally Designed Classroom: Accessible Curriculum and Digital Technologies. Cambridge, MA: Harvard Education Press, pp. 13–36.
Savage, S. (2009) ‘It Could Have Happened: The Evolution of Music Construction’. In: Cook, N., Clarke, E., Leech-Wilkinson, D., & Rink, J. (eds.) The Cambridge Companion to Recorded Music. Cambridge, UK: Cambridge University Press, pp. 32–35.
Seabrook, J. (2012) ‘The Song Machine’. In: The New Yorker. March 26. Available at: http://www.newyorker.com/magazine/2012/03/26/the-song-machine (Accessed: August 2014).
Senturk, S., Lee, S.W., Sastry, A., Daruwalla, A., & Weinberg, G. (2012) ‘Crossole: A Gestural Interface for Composition, Improvisation and Performance using Kinect’. In: Proceedings of the International Conference on New Interfaces For Musical Expression. Ann Arbor, MI.
Shaughnessy, M. (1993) Les Paul: An American Original. New York: William Morrow.
Simons, D. (2004) Studio Stories: How the Great New York Records Were Made. San Francisco, CA: Backbeat Books.
Snow, D. (1990) ‘Cubase: Pro-Level MIDI Sequencer’. In: START. 5, 1. [Online] Available at: http://www.atarimagazines.com/startv5n1/cubase.html (Accessed: August 2014).
Söderman, J., & Folkestad, G. (2004) ‘How Hip-Hop Musicians Learn: Strategies in Informal Creative Music Making’. In: Music Education Research. 6, 3, pp. 313–326.
Swedien, B. (2003) Make Mine Music. Norway: MIA Musikk.
Tankel, J. D. (1990) ‘The Practice of Recording Music: Remixing As Recoding’. In: Journal of Communication. 40, 3, pp. 34–46.
The Economist. (2013) ‘What is Skeuomorphism? The Economist Explains’. In: The Economist [Online] June 25. Available at: www.economist.com (Accessed: August 2014).
Thibeault, M. D. (2011) ‘Wisdom for Music Education From the Recording Studio’. In: General Music Today. 25, 2, pp. 49–52.
Touzeau, J. (2009) Home Studio Essentials. Boston, MA: Course Technology Cengage Learning.
Ullmer, B., & Ishii, H. (2001) ‘Emerging Frameworks for Tangible User Interfaces’. In: Carroll, J.M. (ed.) Human–Computer Interaction in the New Millennium. Boston: Addison-Wesley, pp. 579–601.
Valente, D.L., Myrbeck, S.A., & Braasch, J. (2009) ‘Matching Perceived Auditory Width to the Visual Image of a Performing Ensemble in Contrasting Multi-Modal Environments’. In: Proceedings of the 127th Convention of the Audio Engineering Society. New York.
Vandemast-Bell, P. (2013) ‘Rethinking Live Electronic Music: A DJ Perspective’. In: Contemporary Music Review. 32, 2–3, pp. 239–248.
Vines, B.W., Krumhansl, C.L., Wanderley, M.M., & Levitin, D.J. (2006) ‘Cross-Modal Interactions in the Perception of Musical Performance’. In: Cognition. 101, pp. 80–113.
White, P. (1995) ‘Karl Steinberg: Cubase and Computers’. In: Sound on Sound [Online] January. Available at: http://www.soundonsound.com/sos/1995_articles/jan95/karlsteinberg.html (Accessed: August 2014).
Yoo, M., Beak, J., & Lee, I. (2011) ‘Creating Musical Expression Using Kinect’. In: Proceedings of the International Conference on New Interfaces for Musical Expression. Oslo, Norway.
Zak, A. (2001) The Poetics of Rock: Cutting Tracks, Making Records. Berkeley, CA: University of California Press.