The effect of spatial treatment of music on listener’s emotional arousal

Introduction

Sound engineers and producers manipulate various properties of recorded audio in the creation of a finished recorded product. This paper discusses the spatial properties of a recording. These properties are the cues that allow us to discern the position in space from which a sound is originating, and also to make judgements about the space in which the sound is being produced. During the production of a finished recording, the engineer/producer creates a ‘virtual sound stage’ and places the recorded sources within that stage. This stage is either an entirely manufactured space, created with electronic effects, or is a reproduction of the actual performance environment, though often taken from an ‘ideal’ perspective not actually available to an audience member. Within this virtual space, sound sources can be arranged in three main ways. Pan pots alter the amplitude balance between two or more speakers, giving rise to an Inter-aural Amplitude Difference (IAD) between a listener’s ears. Delays imitate the Inter-aural Time Difference (ITD) cue, which exists due to the distance between the listener’s ears. Very occasionally, Head Related Transfer Functions (HRTFs) are used; these encode both the IAD and ITD and also include filtering resulting from physiological features such as the pinna, which allow vertical and front-to-back discriminations to be made. Alternatively, stereo, surround or three dimensional microphone techniques can encode positional information directly using one or all of these cues. Studio recordings often make use of both recorded positional information and virtual positioning, for instance by adding an instrument recorded using a stereo technique to a panned mix.
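The amplitude and time cues described above can be made concrete with a little arithmetic. The following Python sketch computes the level difference between two speaker feeds for a given pan position, and the time difference produced by a given difference in path length. The sine/cosine (constant-power) pan law, the function names and the speed of sound of 343 m/s are assumptions made for illustration; no particular pan law is specified in this paper.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    Assumes a sine/cosine pan law, one common choice among several.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie between -1 (left) and 1 (right)")
    angle = (position + 1.0) * math.pi / 4.0  # maps -1..1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)


def level_difference_db(left_gain, right_gain):
    """Level of the right feed relative to the left feed, in dB."""
    if left_gain <= 0.0:
        return math.inf
    if right_gain <= 0.0:
        return -math.inf
    return 20.0 * math.log10(right_gain / left_gain)


def delay_for_path_difference(extra_path_m):
    """Time difference in seconds caused by an extra path length in metres."""
    return extra_path_m / SPEED_OF_SOUND_M_S
```

Under these assumptions a centrally panned source gives equal gains and a level difference of 0 dB, whilst an extra path of 0.343 m corresponds to a delay of one millisecond.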

Simply listening to any commercial production of suitably high standard provides ample evidence of the care and attention that goes into the spatial treatment of recorded sound. The ongoing discourse, in the AES in particular, surrounding the technical aspects and merits of surround sound points toward the importance attached to spatial treatment by recording professionals. It has been claimed that up to one third of the perceived quality of a recording comes from its spatial qualities (Rumsey 2006). Very little, however, has been written as to the effect this spatial treatment has on the listener’s emotional response to the recorded music. This paper seeks to tie together conceptually some of the current research from seemingly disparate fields relevant to this question. It will then describe and discuss results from a pilot study designed to establish a methodology for testing the effect of spatial treatment on listeners using the measurement of Electro Dermal Activity (EDA), also known as Galvanic Skin Response (GSR). Finally, it will propose some potential explanations for the effect spatial treatment has on our experience of recorded music.

Context

The study of emotion in music psychology has experienced significant growth in recent years. The question of whether and how emotion may be induced by music continues to be debated (see Juslin, Västfjäll 2008, Scherer 2004) and a number of frameworks have been suggested which may be useful in investigating the influence of spatial properties on musical listening. One difficulty is in defining what is meant by emotion in this context. Juslin and Sloboda (2001) outline a number of approaches to classifying emotions, and point to some problems with adopting general approaches to emotion from psychology. Juslin (2009) provides six subcomponents for the definition of emotion, of which physiological response is one. It is assumed by Juslin and Västfjäll (2008) that musical emotions share the same mechanisms as normal emotions, and thus the same emotions should be elicited. They do, however, point out that emotions induced by music are different in that they lack a goal orientation. The assumption that basic emotions require cognitive appraisal may instead indicate that a different mechanism is used for musical emotion induction.

A more modest approach may be to study feeling as a subset of emotion. Scherer (2004) considers feeling using a valence versus arousal model, a two dimensional approach similar to examples of dimensional models of emotion (Sloboda, Juslin 2001). This simplification aids the development of experiment methodologies by providing a useful measure for quantitative analysis.

The most complete framework of psychological mechanisms for emotional induction presented thus far (Juslin, Västfjäll 2008) provides six such mechanisms. Of these, evaluative conditioning, visual imagery, episodic memory and musical expectancy would seem to have some relevance to spatial treatment of music, particularly where some form of architectural acoustic may be associated with a musical piece or genre.

For a space to be detected or experienced aurally, there must be a sound to stimulate the acoustic environment. A model of production rules for emotional induction by music, in which various features are part of a multiplicative function, is proposed by Scherer and Zentner (2001). This model provides a framework to allow for quantitative measurement of the various factors so that appropriate weighting can be applied. Of the four main categories of features put forward, spatial characteristics fit within the contextual features. Conceptually, this allows for spatial characteristics to mediate, in some way, the induction of emotion by recorded music.

The various theories advanced within music psychology provide a number of possible explanations for any effects resulting from changes in spatial treatment, depending on the mechanisms involved in the emotion elicitation.

Spatial treatment has been studied from the perspective of quality evaluation. Berg (2002) analysed the responses of listeners using their own terms to describe the spatial attributes of recordings, creating useful clusters of terms. For instance, the term envelopment clustered with ‘positiveness’, naturalness and presence. This analysis provides a useful look at how basic attributes of spatial characteristics (such as source localisation or apparent source width) contribute to higher order conceptions such as presence. Rumsey (2006) gives a useful summary of the research to date in this area, and points out that the ultimate goal of this research should be to develop models such that physical attributes of sound can be mapped accurately to perceptual attributes. The measures of quality suggested all appear to rely on self-reporting, and the questions raised revolve around what sort of scales and language should be standardised to make useful comparisons of spatial quality.

Useful studies for this topic have also come from the field of Virtual Environments (VE). The end goal of a VE is to create a sense of presence, of actually being in a virtual world. In a very relevant experiment, Västfjäll et al. (2002) asked listeners to rate their emotional reactions as well as the expressed emotions of recordings with various levels of added reverberation. The listeners were also given the visual stimulus of a (computer generated) concert hall. The rating scale was based on a circumplex model of arousal and pleasantness, very similar to the two dimensional feeling construct suggested by Scherer (2004). The level of reverberation was found to correlate with both pleasantness and arousal. Västfjäll claims that this provides evidence that a high level of arousal corresponds to a higher level of presence. A further study (Västfjäll 2003) looked at self-reported measures of presence and emotional reactions resulting from the use of mono, stereo and six channel reproduction systems. This study provided further evidence of the correlation between reported presence and levels of emotional reaction. Interestingly, a later study, which tested for increased levels of reported presence as a result of accurate spatial rendering for a given visual stimulus, found no such correlation, although the investigators cited possible methodological reasons for this (Larsson, Västfjäll et al. 2007).

Much of the existing research into the reproduction of spatial audio therefore focuses on the ability to create a sense of presence, a sense that the virtual experience is not being mediated. The self-reported data seem to indicate that presence has a correlation with arousal, potentially mediating the arousal caused by the auditory stimulus. It is not clear, however, that totally accurate rendering of acoustic environments is necessary, or in fact ideal, for a sense of presence to occur. Nor is it clear at this stage whether the correlation of presence with emotional arousal reflects causality in one direction or the other. Arousal has, however, been shown to correlate with other musical characteristics.

In physiological theories of emotion, arousal is understood to provide the strength of emotional response, though not its character (Rickard 2004). A good measure of physiological arousal is Electro Dermal Activity (EDA), a measure of changes to skin conductance. Another reaction, known as the chills or thrills response, is also shown to correlate well with arousal, although not always with changes in EDA (Rickard 2004, Guhn, Hamm et al. 2007, Grewe, Nagel et al. 2007, Panksepp 1995, Craig 2005). Measurements of EDA and the discrete nature of chills/thrills provide useful quantitative data for testing for stimuli that elicit changes in arousal and therefore indicate strong emotional responses. For instance, Sloboda (1991) used a survey to identify structural attributes in music which participants reported as eliciting the chills/thrills response. These features fit well with Meyer’s musical expectancy theory. Guhn et al. (2007) found a different set of features in a continuous response experiment which took physiological measurements along with self-reported data.

Whilst there are a number of promising indications, it has yet to be shown directly that manipulation of spatial characteristics of recorded music can have a measurable effect on the arousal component of an emotional response. The present study is designed to test whether a significant correlation can be measured between the spatial manipulation of a recorded piece and the arousal arising from listening to that piece.

Method

For this investigation, two stimuli that differed only in respect of their spatial characteristics were presented to a sample of listeners in a counterbalanced, repeated measures design. A studio recording was therefore produced so that all of the recorded parameters of the stimuli could be controlled.

The music recorded as the stimuli in this experiment was Schubert’s Ständchen, D920a for contralto and male chorus. This music was chosen because the ensemble (piano and voices) was available to me quickly, because it was small enough to fit inside the recording studios at the University and because anecdotal evidence suggested that it would elicit at least some arousal response. It also has a sufficient number of parts to make the spatial manipulation obvious. To make the process of learning the piece more straightforward, an English translation of the piece was chosen.

In order to control all spatial aspects of the piece, it was recorded in the main studio at the University of Hull’s Salmon Grove Studios. The main studio is a very ‘dead’ space acoustically, with highly absorptive panels on all of the walls and on the ceiling, such that reflections of the vocal sounds were minimised.

The piano part of the piece was recorded initially as MIDI data from a Kurzweil digital stage piano directly onto a MIDI track in Pro Tools. This was subsequently processed using the Native Instruments Akoustic Piano plugin which allowed for the synthesis of a Steinway grand piano sound without any acoustic reinforcement.

The contralto part was recorded simultaneously with the piano to allow for elastic interpretations of tempo. An AKG C414 XLS large capsule condenser microphone was used to capture the voice. The pre-amplification came from the Audient ACS 8024 console and the signal was bussed without further signal processing to the Pro Tools HD system and recorded at 24bit/96kHz using standard 96i/o converters. Various takes were combined (comped) together to produce the final ‘performance’. It is worth noting that the performers were given reverb in their headphones during the recording phase, though this reverb was not recorded or used in the final mix.

The male ensemble was overdubbed onto the piano and contralto parts. The piece had four parts for the male ensemble. Four singers were initially recorded, each of them singing the same part into individual microphones. This was overdubbed and then the next part was sung and overdubbed such that my four singers generated twenty two tracks of vocals to give more of a chorus effect (not all of the parts were within the vocal range of all of the singers). Finally a fifth singer recorded the bottom line to give additional weight to the ‘low A’ at some of the cadence points. The male singers used a combination of AKG C414 XLS and Rode NT1a microphones, using the same signal path as the main vocal line.

Basic editing and a generous amount of pitch correction were applied to the male chorus lines. Mix levels were set and automation was applied in sympathy with the dynamics of the piece. No dynamic processing was applied.

The final mix was produced with a quadraphonic monitoring setup of Mackie HR824 powered monitors in a square configuration. The mono mix was set such that the same, mono signal was produced by each of the four speakers. Rather than trying to predict at this early stage what reverb parameters were most likely to favour arousal responses, a spatial treatment consistent with standard production aesthetic, perhaps weighted toward a less subtle application of room reflections, was applied. Through subjective evaluation, Apple Logic’s Space Designer plugin was found to create the most appropriate effect of the surround capable reverbs available. Stems from the mono mix were imported into Logic, and panned across the front of the sound stage. The main vocal line and the piano were situated centrally (the piano consisted of a stereo pair which was narrowly distributed in a pseudo-realistic fashion) whilst the twenty three tracks of the male chorus were distributed approximately evenly, within their parts, across the sound stage from hard left to hard right. The difference between the mono/non-spatialised and spatialised mixes was judged to be obvious. The only further processing applied to the mixes was a gain stage that made sure that the two mixes had an equal RMS level when the four channels were taken as a whole. With an RMS level of -27dB full scale (when the louder mix was normalised to peak at full scale) the mixes were obviously very dynamic, but were judged appropriate within the quadraphonic setting. The two mixes peaked within 2dB of each other indicating a very similar dynamic range.
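The level-matching step described above can be sketched in a few lines of Python. The sketch pools every sample of every channel into a single RMS figure and derives the linear gain needed to bring one mix to the level of the other. The function names are hypothetical, and the convention that a linear value of 1.0 equals 0 dB full scale is an assumption; the actual gain stage used for the mixes is not described in more detail than the text gives.

```python
import math


def pooled_rms(channels):
    """RMS over every sample of every channel, treated as one pool."""
    total = sum(sample * sample for channel in channels for sample in channel)
    count = sum(len(channel) for channel in channels)
    if count == 0:
        raise ValueError("no samples supplied")
    return math.sqrt(total / count)


def to_dbfs(linear):
    """Convert a linear level (1.0 assumed to be full scale) to dBFS."""
    return 20.0 * math.log10(linear)


def matching_gain(reference_channels, other_channels):
    """Linear gain that brings other_channels to the reference pooled RMS.

    Silent input (RMS of zero) is not handled in this sketch.
    """
    return pooled_rms(reference_channels) / pooled_rms(other_channels)


def apply_gain(channels, gain):
    """Scale every sample in every channel by the same linear gain."""
    return [[sample * gain for sample in channel] for channel in channels]
```

Because one gain value is applied to all four channels, the balance between the channels, and therefore the spatial image, is left unchanged; only the overall level moves.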

For the data gathering phase, two separate Pro Tools sessions were created. Both sessions began with five minutes of birdsong, used to create a baseline level for the EDA measurements. Following this, one session presented the spatialised mix first whilst the other presented the non-spatialised mix first. The piece lasts for approximately five minutes and fifty seconds. After the first mix, five more minutes of birdsong were presented, followed by whichever mix had not yet been heard.

The room used for the measurements was the composition/film studio at the University of Hull. This room was judged a more comfortable environment than the main production studios for the experiments. The same Mackie HR824 powered monitors were installed in the same quadraphonic configuration. Using a sound pressure level meter, the monitoring system was calibrated to the K20 system (Katz 2002) such that each speaker produced 83dB when white noise was played at -20dBFS. A very comfortable chair was installed for the benefit of the subjects.

Thirty subjects completed the experiment. A wide range of subjects volunteered for the experiment and no selection criteria were applied except that the subjects were not involved in the recording of the stimuli, and had no knowledge in advance of the purpose of the experiment.

Characteristic                 Frequency   Percentage
Sex
  Male                         16          53%
  Female                       14          47%
Musical Training
  None                         8           27%
  Student                      10          33%
  Amateur                      3           10%
  Professional                 9           30%
Music Production Training
  Yes                          16          53%
  No                           14          47%
Normal chills frequency
  Never                        0           0%
  Occasional                   6           20%
  Sometimes                    13          43%
  Frequent                     8           27%
  Very frequent                3           10%
Familiarity with the piece
  None                         28          93%
  Somewhat familiar            2           7%

Each subject had their EDA continuously measured using a Biopac MP45 USB interface and recorded in Biopac Student Lab Pro software on a PC laptop computer. Hardware and software settings were left as recommended such that the measured signal was sampled at 1000Hz. Isotonic electrode gel was applied to a pair of electrodermal finger transducers (BSL-SS3LA) and these were attached to the index and middle fingers of the subjects’ right hand. As the electrodes were given time to bed in, the subjects were asked to read an explanation of the chills/thrills response and asked to indicate using a motorised fader on a Mackie Control Extender, when and to what extent they experienced such responses. The fader movements were recorded as MIDI volume data on a MIDI track within the Pro Tools session used to play the stimuli.

Playback of the Pro Tools session and recording of the EDA response were synchronised. At the end of the measurement, subjects were thanked for their time, asked to fill out a questionnaire and the purpose of the experiment was explained to them. As an afterthought, each subject was also asked if they were aware of the difference between the musical excerpts they had heard.

Analysis

The EDA measurements from the experiment were scored as per the instructions recommended by the equipment manufacturer (Application note 216: scoring methods for electrodermal response changes. 2008). Firstly the signal was smoothed using a Finite Impulse Response (FIR), low pass filter at 1Hz to eliminate noise from the signal. Next, the signal was passed through an Infinite Impulse Response (IIR), high pass filter at 0.05Hz to convert it into an AC signal from its original DC format. Then, a transformation to find spikes above a threshold of 0.5% of the baseline EDA for each subject was run across the AC signal to identify arousal change events (Andreassi 2000). This transformation outputs a value of 1 each time the threshold is exceeded. Next, the peaks of this (now binary) signal were counted for the period of each musical stimulus. The values for the spatial and non-spatial event counts were then recorded for each participant, with the order of stimulus presentation also noted.
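The scoring steps above can be sketched as follows. This is a minimal Python illustration and not the manufacturer’s implementation: a causal moving average stands in for the FIR low pass filter, a first-order recursive filter stands in for the IIR high pass filter, and an event is counted each time the filtered signal rises above the threshold. The function names, the choice of filters and the default parameter values are assumptions made for the example.

```python
import math


def moving_average(samples, window):
    """Causal moving-average FIR filter (a crude low pass stand-in)."""
    smoothed = []
    running_sum = 0.0
    for index, value in enumerate(samples):
        running_sum += value
        if index >= window:
            running_sum -= samples[index - window]
        smoothed.append(running_sum / min(index + 1, window))
    return smoothed


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order IIR high pass filter, removing the slow tonic level."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate_hz)
    output = [0.0]
    for index in range(1, len(samples)):
        output.append(alpha * (output[-1] + samples[index] - samples[index - 1]))
    return output


def count_arousal_events(samples, sample_rate_hz, baseline_level,
                         low_pass_hz=1.0, high_pass_hz=0.05,
                         threshold_fraction=0.005):
    """Count upward threshold crossings in the filtered EDA signal."""
    window = max(1, round(sample_rate_hz / low_pass_hz))
    smoothed = moving_average(samples, window)
    phasic = high_pass(smoothed, sample_rate_hz, high_pass_hz)
    threshold = threshold_fraction * baseline_level
    above = [1 if value > threshold else 0 for value in phasic]
    # An event is counted each time the binary signal steps from 0 to 1.
    return sum(1 for prev, cur in zip(above, above[1:]) if cur > prev)
```

In use, `samples` would hold the skin conductance values recorded during one musical stimulus and `baseline_level` the mean conductance measured for that subject during the preceding birdsong.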

The data were transcribed into SPSS software for analysis. Kolmogorov-Smirnov (K-S) and Shapiro-Wilk normality tests were performed on the spatial and non-spatial results and on the difference between the spatial and non-spatial results. These tests showed that the distribution for spatial chills and for the difference between the spatial and non-spatial chills was sufficiently normal. The analysis did show there to be two outlier measurements, one positive and one negative, and these were duly discarded. Such outliers reflect the natural variation in the skin conductance of individuals and in their physiological responses to arousal. It is noted that the non-spatial chills data passed the K-S test for normality, but failed the more powerful Shapiro-Wilk test by the barest margin (significance = 0.049).

In order to limit the impact of the inter-subject variability, a Paired Samples T-Test was performed on the data. Since the T-test analyses the difference between the conditions, and these differences were normally distributed, the test was judged appropriate. The legitimacy of the results of the test is in this instance based on the premise that the counterbalanced testing would sufficiently eliminate the systematic variation arising from the order of presentation of the stimuli. At a significance level of 0.05, this test failed to show a significant difference in EDA events between the spatial and non-spatial treatments (p = 0.089). Testing the effect of order of presentation using the same technique gives a statistically significant result (p = 0.0005).
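For readers unfamiliar with the paired test, the calculation can be sketched in Python as below. The analysis in this paper was carried out in SPSS; the code is only an illustration of the underlying arithmetic, with hypothetical function names, and it obtains the two-tailed p value by numerically integrating the t density with Simpson’s rule rather than by the method SPSS uses.

```python
import math
import statistics


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p value via Simpson's rule (steps must be even)."""
    upper = abs(t_value)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def paired_t_test(condition_a, condition_b):
    """Return (t, degrees_of_freedom, two_tailed_p) for paired samples.

    Identical differences for every pair (zero variance) are not handled.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have equal length")
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(differences)
    if n < 2:
        raise ValueError("at least two pairs are required")
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    t_value = statistics.mean(differences) / standard_error
    return t_value, n - 1, two_tailed_p(t_value, n - 1)
```

Here `condition_a` and `condition_b` would hold each subject’s event counts for the two treatments, in the same subject order. Because only the within-subject differences enter the calculation, large differences in overall responsiveness between subjects do not inflate the error term.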

A further analysis was conducted of differences in amplitude between EDA changes at specific ‘chill’ moments within the piece. An analysis of the timing of the chills identified nine potential clusters where chills were occurring. Three of these clusters (the first three) had useful data (chill responses) occurring in both spatial and non-spatial conditions in more than half of the subject listeners.

The first of these clusters was a response to the first entry of the contralto voice, and occurred 14 to 15 seconds into the piece (which is approximately 5 minutes and 50 seconds in length). The second cluster occurred between 1 minute and 4 seconds and 1 minute and 8 seconds, and seems to have resulted from a build-up of tension created harmonically and dynamically. Harmonically, tension is built using an enharmonic sequential progression (C-Fmin, C#-F#min, D-Gmin/D). In this progression, flattened thirds of the minor chords become fifths of the major, whilst the third of the major functions as a leading note to the minor chord. This tension is heightened through repetition, both texturally, and through the alternation from major to minor, supported by the overall upward movement of the vocal parts.

Figure 1. Section from Schubert’s Ständchen, D920a representing the second chill cluster in the piece

The third cluster appears to be the result of both a dynamic swell from piano to fortissimo and back to pianissimo, and also another use of key changes, using a different chord progression utilising dominant sevenths and common notes (F – F7, Bb – C7, F – F7) and a sort of interrupted cadence to Db (instead of the expected Bb). This sets up a different, more localised tension and release as well as building an expectation at the cadence point which is subsequently interrupted.

Figure 2. Section from Schubert’s Ständchen, D920a representing the third chill cluster in the piece

Normality tests were conducted on the log SCR (baseline-adjusted amplitudes) of the clustered responses. Normality was demonstrated when presentation order was included as a factor in the tests and two outlier measurements were discarded. Paired sample t-tests on this data failed to show a significant relationship between the amplitude of the SCR and the spatial treatment of the audio. It is interesting to note that despite failing to establish significance, the two clusters which are potentially dependent on harmony showed an opposite tendency from that hypothesised in that, on average, a larger SCR amplitude was recorded for the non-spatial treatment. For the cluster that had no harmonic basis, the mean SCR was higher for the spatially treated sound.

Responses from the brief questionnaire were also entered into SPSS for analysis. One-way ANOVA tests found no significant correlations for age, sex, level of musical training or self-reported chills frequency in everyday listening, whilst the responses to the question of familiarity with the piece were such that no conclusions could be drawn. One question that showed some potential was whether the subject had any sound production/engineering training. A significance value (p) of 0.087 was found for this correlation.

Discussion

This experiment investigated whether a correlation exists between the number of arousal events as measured by scored changes in EDA and the spatial treatment of recorded material. This question is important because it would mean that we could demonstrate that the spatial treatment of audio has a measurable effect not just on the preferences people report for spatial treatment, but also on the way that they respond to music emotionally. Of course, physiological arousal is but a small component of the various models proposed for emotional induction by music; however, it would provide a useful quantitative measure. Furthermore, the valence/arousal model provides a useful analogue to the way in which spatial treatment (and potentially other treatments in the studio such as dynamic processing) may have a role in mediating the emotional responses arising from music.

The results from this experiment do not conclusively support such a correlation. The experiment does introduce obvious systematic and non-systematic errors, in that both familiarity with the music and the length of time listeners spend under experimental conditions play a more statistically significant role in the mediation of arousal events than does the spatial treatment. Whilst these effects on the mean can be assumed to be neutralised by the repeated measures design, the relatively large standard deviations derived from this experimental methodology mean that statistical significance was not reached with the sample for this pilot. On the other hand, if the present results in relation to overall number of arousal events were to be achieved with a larger sample size (for instance n=90) then significance could be demonstrated.
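The reasoning behind the sample size remark can be illustrated with a short calculation. For a paired test, if the mean and standard deviation of the differences were to stay the same, the t statistic would grow with the square root of the sample size. The Python sketch below rests entirely on that assumption; the function names are hypothetical and no values from the present data are built in.

```python
import math


def projected_t(t_observed, n_observed, n_projected):
    """Scale a paired t statistic to a different sample size.

    Assumes the mean and standard deviation of the paired differences
    would be unchanged in the new sample, which is a strong assumption.
    """
    if n_observed <= 0 or n_projected <= 0:
        raise ValueError("sample sizes must be positive")
    return t_observed * math.sqrt(n_projected / n_observed)


def sample_size_for_t(t_observed, n_observed, t_target):
    """Smallest n at which the projected |t| reaches t_target."""
    if t_observed == 0:
        raise ValueError("an observed t of zero can never reach the target")
    return math.ceil(n_observed * (t_target / t_observed) ** 2)
```

Note that the critical value of t itself falls slightly as the sample grows, so a fixed `t_target` gives a slightly conservative estimate of the sample needed.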

Regarding the more troubling lack of significance in relation to the amplitude of arousal events according to spatial treatment, it is interesting to note that the direction of the mean effect differed according to the nature of the cue that created the arousal event. The first cluster of responses was centred on the entrance of a new voice (the lead alto) in the piece, whilst the other two clusters correlated with an increase in tension brought about by deliberate disruption of harmonic expectations as well as dynamic intensity. These musical structures have been shown to create chill responses in a number of other studies (Guhn, Hamm et al. 2007, Grewe, Nagel et al. 2007, Sloboda 1991). One hypothesis that could be drawn from this result is that different triggers are mediated by different spatial characteristics. For instance, in the cases where harmonic progression was arguably central to the arousal event, it may be that spreading the voices across the stereo field, thus reducing the sense of ensemble, reduced the strength of the arousal response because the voices became separate auditory streams (Bregman 1990). Such a theory is consonant with the results from Berg’s study (Berg, Rumsey 2000), which suggested that indistinct stereo positioning (or large apparent source width) deriving from strong lateral reflections was preferred by naïve listeners (though not by trained listeners). This separation of the auditory streams may be working against the increase in arousal deriving from an increased sense of presence as discussed in the virtual reality literature. An increase in the sense of presence means that there is a commensurate decrease in the sense that a given experience is mediated by some form of technology, so that the user feels ‘present’ in the virtual environment. It has been suggested, though not proven (Larsson et al. 2007), that an increase in presence derives from an increase in realism, and therefore that accurate spatial rendering should increase the sense of presence experienced by the listener. As mentioned by Guhn et al. (2007), it is ultimately a combination of potential chill inducing cues which results in the chill response. Unpicking which cues are potentially affected by which spatial treatments would require the identification or deliberate composition of musical passages that exhibit each type of cue in isolation.

Rather than simply repeating the experiment with a larger sample in order to establish potential statistical significance, it is proposed that the methodology itself be altered in response to the hypothesis suggested above and the issues with systematic and non-systematic errors inducing large standard deviations. To this end, a new experiment is proposed in which three stimuli are created instead of two. One stimulus will again be entirely lacking in reverberation and will be presented without source separation. The second will be dry, but will have stereo separation applied. The final stimulus will have room reflections, but the sources will appear from the same position on the sound stage. In this way, it can be assessed whether the sense of presence, and the effect of source separation, are independent and potentially contradictory in how they influence different arousal triggers. The stimuli will also be presented in a different way. In order to reduce the level of standard deviation derived from ordering effects, an alternate mix of the recorded piece (with some separation and some reflections) will be given to the participants to listen to several times before the experiment so that they are familiar with the piece. Each stimulus will then be presented to the listeners at 24-hour intervals, that is, on successive days, at the same time of day. It is hoped that this will reduce the amount of intra-subject difference resulting from the order of presentation. Grewe et al. (2007) found some evidence to suggest that chill response to a piece of music is relatively stable over at least three days of testing.

Should the results of this alternative methodology prove more significant, then a larger sample study using newly recorded stimuli and this altered methodology will be pursued.

About the Author

Michael Fletcher
University of Hull
m.p.fletcher@hull.ac.uk

Bibliography

Application note 216: scoring methods for electrodermal response changes. 2008. Goleta, California. Biopac Systems Inc.

Andreassi, J.L. 2000. Psychophysiology: Human behavior & physiological response. Fourth edition. London. Lawrence Erlbaum Associates, Publishers.

Berg, J. 2002. Systematic evaluation of perceived spatial quality in surround sound systems. PhD Thesis. Luleå University of Technology, School of Music at Piteå, Sweden.

Berg, J. and F. Rumsey. 2000. Correlation between emotive, descriptive and naturalness attributes in subjective data relating to spatial sound reproduction. Presented at 109th AES Convention, 22-25 September 2000. Audio Engineering Society.

Bregman, A. 1990. Auditory scene analysis: the perceptual organisation of sound. Cambridge, Mass. MIT Press.

Craig, D. 2005. An exploratory study of physiological changes during ‘chills’ induced by music. Musicae Scientiae 9(2), 273-288.

Grewe, O., Nagel, F., Kopiez, R. and E. Altenmüller. 2007. Listening to Music as a Re-creative Process: Physiological, Psychological, and Psychoacoustical Correlates of Chills and Strong Emotions. Music Perception 24(3), 297-314.

Guhn, M., Hamm, A. and M. Zentner. 2007. Physiological and Musico-Acoustic Correlates of the Chill Response. Music Perception 24(5), 473-485.

Juslin, P.N. 2009. Emotional responses to music. In: S. Hallam, I. Cross and M. Thaut, eds, The Oxford Handbook of Music Psychology. Oxford. Oxford University Press, 131-140.

Juslin, P.N. and D. Västfjäll. 2008. Emotional responses to music: the need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559-75; discussion 575-621.

Katz, B. 2002. Mastering audio: the art and the science. Oxford. Focal.

Larsson, P., Västfjäll, D., Olsson, P. and M. Kleiner. 2007. When What You Hear is What You See: Presence and Auditory-Visual Integration in Virtual Environments, Presence 2007: The 10th Annual International Workshop on Presence, October 25 – 27 2007. International Society for Presence Research 11-18.

Panksepp, J. 1995. The emotional sources of ‘chills’ induced by music. Music Perception 13, 171-207.

Rickard, N.S. 2004. Intense emotional responses to music: a test of the physiological arousal hypothesis. Psychology of Music 32(4), 371-388.

Rumsey, F. 2006. Spatial audio and sensory evaluation techniques – context, history and aims, Proceedings of the International Seminar on Spatial Audio and Sensory Evaluation Techniques, April 6-7 2006.

Scherer, K.R. and M.R. Zentner. 2001. Emotional effects of music: production rules. In: P.N. Juslin and J.A. Sloboda, eds, Music and emotion: theory and research. Oxford. Oxford University Press, 361-392.

Scherer, K. 2004. Which Emotions Can be Induced by Music? What Are the Underlying Mechanisms? And How Can We Measure Them? Journal of New Music Research 33(3), 239-252.

Sloboda, J.A. and P.N. Juslin. 2001. Psychological perspectives on music and emotion. In: P.N. Juslin and J.A. Sloboda, eds, Music and Emotion: Theory and Research. Oxford. Oxford University Press, 71-104.

Sloboda, J.A. 1991. Music Structure and Emotional Response: Some Empirical Findings. Psychology of Music 19(2), 110-120.

Västfjäll, D. 2003. The subjective sense of presence, emotion recognition, and experienced emotions in auditory virtual environments. Cyberpsychology & behavior : the impact of the Internet, multimedia and virtual reality on behavior and society 6(2), 181-188.

Västfjäll, D., Larsson, P. and M. Kleiner. 2002. Emotion and auditory virtual environments: affect-based judgments of music reproduced with virtual reverberation times. Cyberpsychology & behavior : the impact of the Internet, multimedia and virtual reality on behavior and society 5(1), 19-32.