Eyes wide shut, ears wide open - the promise and challenges of audio-based augmented reality

Last weekend, I sat on a park bench with my headphones on and eagerly watched a timer counting down in an app on my phone. As the counter hit zero, sound filled my ears and soon I was instructed to close my eyes. For the following 20 minutes, I was immersed in an audio play that toyed with my perception of the bench and its surroundings: where I was and who was around me.

While there is a body of research around audio-based augmented reality technologies, currently ‘AAR’ manifests largely through creative endeavours, such as the above. What does the future hold for AAR and what can we say about its user experience qualities?

Audience mindset and setting

Darkfield is a collective working in this space. During the pandemic, they pivoted from distinctive physical containers - ‘sonic theme parks’ - to at-home experiences delivered via their Darkfield Radio app. They also repurposed one of their earlier location-based projects into a virtual container in VR, as part of the first volume of the Immersive Arcade VR showcase.

I’ve experienced most of Darkfield’s work. They ask quite a lot from the audience - or do they? Over the weekend, I walked to a park bench at a set time; an hour later I was supposed to sit in the passenger seat of a car (which I don’t have); and two hours after the park bench part, I returned home for the final part, instructed to sit in the largest room I could find.

Earlier in the year, with Eternal, I lay down on the sofa in my living room, listening to Dracula occupying the same space. With Double, I sat at my kitchen table facing another person. Your eyes are shut throughout, and, at least for me, what characterises these experiences is the constant effort of keeping them closed, even when it is tempting to open them - to check that someone actually isn’t in the room - because that’s how authentic the spatial impression is.

When you compare these requirements with, e.g., putting on a VR headset, are they higher or lower? The answer is highly individual, I suppose. Regardless, onboarding is an important part of designing the experience as a whole: I’m tempted to liken it to the ‘set and setting’ that people talking about psychedelics emphasise as the ingredients of a positive experience.


The varieties of AAR 

One of the reasons why the particular setting is important is that while Darkfield’s work augments reality, technically it is not spatially aware of you. One of their trademark tropes is a character whispering in your ear, as if they were right next to you. However, if you turn your head to face the imagined character (albeit with eyes closed), their voice turns with your head, i.e. it does not reside in an absolute position in space but rather in a fixed location in relation to the headphone speaker in your ear.

This is because Darkfield’s work uses fixed-position binaural audio - making a spatial audio experience would require tracking your head movement, or a device you are holding, such as a smartphone.
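To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is my own illustration, not how Darkfield or any particular audio engine implements things: the function names are hypothetical, and I assume azimuth is measured in degrees with 0 straight ahead and positive values to the listener's right, and that head yaw is positive when turning right.

```python
def head_locked_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Fixed-position binaural mix: head rotation is ignored,
    so the voice stays at the same angle relative to your ears."""
    return source_azimuth_deg


def world_locked_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Head-tracked spatial audio: the source stays put in the room,
    so its angle relative to your ears shifts as you turn."""
    relative = source_azimuth_deg - head_yaw_deg
    # Wrap the result into the range [-180, 180) degrees.
    return (relative + 180.0) % 360.0 - 180.0


# A whisper at your right ear (90 degrees); you turn 90 degrees right to face it.
whisper = 90.0
yaw = 90.0
print(head_locked_azimuth(whisper, yaw))   # still at your right ear
print(world_locked_azimuth(whisper, yaw))  # now straight ahead
```

In the head-locked case the yaw argument is deliberately unused: that is the whole point, and it is why the whisper follows you around when you turn.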

Tracking head or limb movements, and a person’s position within a space, is something that headsets can do. Therefore, once spatial audio becomes more commonplace in consumer devices - as we are seeing, e.g., with Apple’s efforts with the AirPods - a new design space opens for AAR experiences.

Trade-offs: Embodiment vs accessibility

It is a tricky design space though, because the suspension of disbelief that Darkfield’s or similar work relies on is based on the audience members closing their eyes. While spatial audio has qualities that, by their nature, invite audiences to move and shift their attention in more embodied ways, such experiences presuppose vision rather than darkness.

Making experiences for ‘eyes wide shut, ears wide open’, i.e. audio-only, embodied spatial experiences that leverage place illusion and plausibility illusion (to use Mel Slater’s terminology), would mean developing for a VR headset and implementing a virtual space where the spatial dimension comes alive. Or, even if the headset were practically just an eye cover to block sight, making anything useful would necessitate using scene understanding to map the boundaries of the physical surroundings and alerting the audience with guard rails, much like the guardian grids currently seen in VR operating systems. From the point of view of immersion and suspension of disbelief, these make the seams of the experience literally visible.

So there is a trade-off with accessibility - smartphone distribution makes AAR more accessible and closing one’s eyes is less of a barrier to entry than wearing a device on one’s face. In some ways - hearing-impaired people notwithstanding - I would argue AAR, with the combination of a smartphone and headphones, is the most accessible and most immersive type of entertainment one can find.

Eyes wide shut, ears wide open at SIGGRAPH 2021

In practice, for now, perhaps anyone with creative ambitions who wants to move beyond fixed-position binaural audio to spatial audio has to be content with creating seated experiences, ideally experienced with a rotating chair, much like the case tends to be with viewing 360 video or linear, on-the-rails VR experiences.

Nevertheless, I believe there is a future for more fully spatial and embodied AAR experiences and that the current stage is an interim phase. We need to start exploring this space, and that is what my SIGGRAPH Labs session this year is about. I’ve prepared a Unity project with which you can rapidly prototype AAR experiences. The approach is meant to encourage creative exploration without the need for programming skills.

If that makes you curious, join the conference and my session on August 13th - click on the link below for details!


Stay safe,