@inproceedings{2016_EA_54,
abstract = {Object-based audio broadcasting is an approach that combines audio with metadata describing how the audio should be rendered. This metadata can include spatial positioning, mixing parameters, and descriptors that define the type of audio represented by the object. In this user-led interactive demo we show an approach to streaming multichannel audio and synchronised metadata to the browser. Audio is rendered in the browser to multiple formats based on the information contained in the synchronised metadata channel, allowing adaptive mixing and rendering of content as well as user interaction. Based on the MPEG-DASH standard, this approach allows an arbitrary number of audio channels to be presented as discrete inputs to the Web Audio API (subject to any channel limit imposed by the browser). Binaural, 5.1, and stereo renders can be generated and selected for output by the user in real time without any change to the source media stream. Channels marked as interactive can have their properties exposed for the user to adjust according to their preferences. The audio and metadata originate from a single BWF file compliant with ITU-R BS.2076 (Audio Definition Model), with the audio encoded using AAC (as per the MPEG-DASH standard) and the metadata delivered to the browser in JSON format. This approach provides a flexible framework for prototyping and presenting new audio experiences to online audiences, and a platform for delivering object-based audio to online users.},
address = {Atlanta, GA, USA},
author = {Paradis, Matthew and Pike, Chris and Day, Richard and Melchior, Frank},
booktitle = {Proceedings of the International Web Audio Conference},
editor = {Freeman, Jason and Lerch, Alexander and Paradis, Matthew},
month = {April},
publisher = {Georgia Tech},
series = {WAC '16},
title = {A Novel Approach to Streaming and Client Side Rendering of Multichannel Audio with Synchronised Metadata},
year = {2016},
issn = {2663-5844}
}