The concept of interactive audio (IA) characterizes techniques designed to allow specifically created audio, placed in a given application, to react to user input and/or changes in the context.
The technology behind this concept can be applied to many software domains, such as games, guidance systems, dynamic music generators, audio soundscapes or Augmented Reality Audio (ARA) applications. What all these examples have in common is the need for an interchange format, along with an interoperable audio system able to play the interactive soundtracks.
We have developed an advanced interactive audio system based on two components:
- An XML interchange format for describing IA scenes: A2ML, and its guidance-oriented derivation MAUDL.
- A mobile sound engine API implementing the A2ML and MAUDL features, currently available for the iOS platform (Android may be supported in the future).
Augmented Reality Audio (ARA) tools
Augmented Reality Audio is used in many mobile applications, such as geolocalized games, non-linear audio walking tours, and navigation systems for visually impaired people. Different types of navigation require different types of applications. For example, a navigation application for mountain bikers will be very different from a guidance application for visually impaired people.
An ARA scene can be experienced through bone conduction headsets, headphones with integrated microphones, or earphones with acoustically transparent earpieces, with the audio being played by a mobile phone. ARA applications can be designed so that they do not interfere with the user's other activities: the application leaves the user's hands free and does not require visual attention.
The three characteristics of an AR system (combining real and virtual content, real-time interaction, and registration in 3D) each set different requirements for ARA software and hardware:
- an ARA scene has to be authored through the joint use of two XML languages, one representing the real world and the other representing the 3D virtual audio scene; the two are linked through a tag-based dispatching language.
- for interaction in mobile usage, outdoor tracking has to be done through GPS, and indoor tracking through embedded sensors such as accelerometers and magnetometers or through external ranging sensors. A user of an ARA application can also interact via the microphones in the headset, with speech or sound recognition being used to control the application. The audio language must be event-based to allow interactive audio through the instantiation of sound models.
- for 3D rendering of sound objects, the user's position and orientation need to be estimated in real time. With real-time head tracking, the virtual sound objects can be tied to the surrounding environment, so the virtual environment stays in place even when the user moves. Another benefit of real-time head tracking is that the dynamic cues from head rotation help the user localize sound objects in the environment, especially by resolving front-back confusion.
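The head-tracking requirement can be illustrated with a minimal sketch. It is not the A2ML/MAUDL engine API; the function names (`local_offset_m`, `relative_azimuth_deg`) and the flat-earth (equirectangular) approximation are assumptions made for illustration only. Given the user's GPS position and head heading, it computes the direction of a geolocated sound object relative to the user's nose, which is the angle a 3D audio renderer would need in order to keep the object fixed in the real world.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, adequate for short distances


def local_offset_m(user_lat, user_lon, src_lat, src_lon):
    """Return the (east, north) offset in metres from the user to the source.

    Uses an equirectangular approximation, which is an assumption that
    only holds for distances that are small compared to the Earth's radius.
    """
    lat0 = math.radians(user_lat)
    east = math.radians(src_lon - user_lon) * math.cos(lat0) * EARTH_RADIUS_M
    north = math.radians(src_lat - user_lat) * EARTH_RADIUS_M
    return east, north


def relative_azimuth_deg(user_lat, user_lon, heading_deg, src_lat, src_lon):
    """Return the source direction relative to the user's head, in degrees.

    heading_deg is the head orientation, clockwise from north.
    The result lies in [-180, 180): 0 is straight ahead, positive values
    are to the right, negative values to the left.
    """
    east, north = local_offset_m(user_lat, user_lon, src_lat, src_lon)
    bearing = math.degrees(math.atan2(east, north))  # clockwise from north
    return (bearing - heading_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Hypothetical sound object placed slightly north of the user.
    for heading in (0.0, 45.0, 90.0):
        angle = relative_azimuth_deg(0.0, 0.0, heading, 0.001, 0.0)
        print(f"heading {heading:5.1f} deg -> source at {angle:6.1f} deg")
```

As the head turns to the right, the relative angle moves to the left by the same amount, so the rendered source appears to stay in place. This changing angle is also the dynamic cue mentioned above: a source in front and a source behind shift in opposite directions when the head rotates.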