This Web SDK is currently in BETA, and is subject to change. Some features may not function correctly, or may be removed in future versions of the software.
High Fidelity's Audio SDK defines a virtual audio space for 2D or 3D applications. Within the virtual audio space are two types of endpoints: (1) sound sources and (2) listeners. Each of these endpoints has a position within the virtual audio space. The audio from each sound source is fed into High Fidelity's Head-Related Transfer Function (HRTF) algorithm so that each listener hears the sound as if it were generated at the sound source's direction and distance relative to the listener. The web application can set or change the positions of sound sources and listeners, as well as the direction the listener is facing. Changes to sound source positions can be synchronized across the network to other instances of the application through the use of metadata.
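To make "relative direction and distance" concrete, here is a minimal 2D sketch of the geometry involved. The `Vec2` and `RelativePlacement` types, the `relativePlacement` function, and the angle convention are all assumptions made for illustration; they are not part of the SDK, whose own position and orientation types may differ.

```typescript
// Hypothetical types for illustration only; not the SDK's API.
interface Vec2 { x: number; y: number; }

interface RelativePlacement {
  distance: number;   // meters between listener and source
  azimuthRad: number; // source angle relative to the listener's facing direction, in (-PI, PI]
}

// Assumed convention: angles are measured counter-clockwise from the +x axis,
// so a positive azimuth means the source is to the listener's left.
function relativePlacement(listener: Vec2, facingRad: number, source: Vec2): RelativePlacement {
  const dx = source.x - listener.x;
  const dy = source.y - listener.y;
  const distance = Math.sqrt(dx * dx + dy * dy);
  // Direction to the source in world coordinates, then relative to the facing direction.
  let azimuthRad = Math.atan2(dy, dx) - facingRad;
  // Wrap the result into (-PI, PI].
  while (azimuthRad > Math.PI) azimuthRad -= 2 * Math.PI;
  while (azimuthRad <= -Math.PI) azimuthRad += 2 * Math.PI;
  return { distance, azimuthRad };
}
```

Moving the source or the listener changes both values, while turning the listener changes only the azimuth.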
The library supports tight coupling and a high update rate for a sound source's positional data. If metadata is enabled, the audio stream from a local sound source will be modified to include the position of the source. Other instances of the application can extract the position of the source along with audio data. This enables near real-time position updates and avoids any distracting desynchronization between the perceived auditory location of a sound source and the visual representation of the sound source. For example, if a sound is coming from an avatar that is speaking as it is walking, the sound will move in sync with the avatar in motion.
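The idea of carrying position alongside audio can be sketched as follows. This is not the SDK's wire format, which is not described here. The `Position` type, the 12-byte trailer layout, and the `attachPosition` and `extractPosition` functions are hypothetical and only illustrate why position and audio stay in sync when they travel in the same packet.

```typescript
// Hypothetical layout: an encoded audio frame followed by three little-endian float32 values.
interface Position { x: number; y: number; z: number; }

const POSITION_BYTES = 12; // 3 x 32-bit floats

// Sender side: append the source position to an encoded audio frame.
function attachPosition(frame: Uint8Array, pos: Position): Uint8Array {
  const out = new Uint8Array(frame.length + POSITION_BYTES);
  out.set(frame, 0);
  const view = new DataView(out.buffer, frame.length, POSITION_BYTES);
  view.setFloat32(0, pos.x, true);
  view.setFloat32(4, pos.y, true);
  view.setFloat32(8, pos.z, true);
  return out;
}

// Receiver side: split a packet back into the audio frame and the position.
function extractPosition(packet: Uint8Array): { frame: Uint8Array; pos: Position } {
  if (packet.length < POSITION_BYTES) throw new Error("packet too short");
  const split = packet.length - POSITION_BYTES;
  const view = new DataView(packet.buffer, packet.byteOffset + split, POSITION_BYTES);
  return {
    frame: packet.subarray(0, split),
    pos: {
      x: view.getFloat32(0, true),
      y: view.getFloat32(4, true),
      z: view.getFloat32(8, true),
    },
  };
}
```

Because each packet carries its own position, a receiver never has to reconcile a separate position channel with the audio timeline.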
When a source and listener are within 1 meter of each other, an alternate "near field" HRTF is used. This includes a head occlusion algorithm and can help deliver ASMR-like effects. Nearby audio sources sound convincingly close, like someone whispering in your ear.
All audio within this library is handled as 32-bit floats, which allows for an effectively unlimited dynamic range. This is important in applications where more than one source may be producing sound at the same time. The more sound sources there are, the more critical this feature is to simulating sound in the natural world. Unlike conventional handling of mixed audio, which can clip when the summed sources exceed the output range, the High Dynamic Range Peak Limiter eliminates any perceptible audio clipping, distortion, and other artifacts.
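A small sketch shows why floating-point mixing matters. The `mixFloat` and `toInt16Clipped` functions are illustrative helpers, not SDK functions. Summing in floats preserves samples that exceed full scale, whereas a naive conversion to 16-bit integers can only hard-clip them.

```typescript
// Sum several float sources sample by sample; values may exceed the [-1, 1] range.
function mixFloat(sources: Float32Array[]): Float32Array {
  let length = 0;
  for (let s = 0; s < sources.length; s++) length = Math.max(length, sources[s].length);
  const out = new Float32Array(length);
  for (let s = 0; s < sources.length; s++) {
    const src = sources[s];
    for (let i = 0; i < src.length; i++) out[i] += src[i];
  }
  return out;
}

// Naive conversion to a signed 16-bit sample: anything beyond full scale is hard-clipped.
function toInt16Clipped(sample: number): number {
  const clamped = Math.max(-1, Math.min(1, sample));
  return Math.round(clamped * 32767);
}
```

Two sources at 0.75 mix to 1.5 in the float domain. That overshoot is still available for a limiter to scale down gracefully, while `toInt16Clipped(1.5)` simply flattens it to 32767.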
Internally, WebRTC uses a 16-bit audio pipeline. Before audio is sent across the network or delivered to an output device, the Peak Limiter applies a specialized look-ahead function, which results in high-quality final compression of the audio into that range.
When it's time to deliver audio from this library to the application, the higher dynamic-range data must be brought within the expected range, which is what the Peak Limiter does. It works by looking ahead several frames and adjusting the overall volume of the output stream, which caps the loudness and lets it handle huge overloads.
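The look-ahead idea can be sketched with a deliberately simplified limiter. This is not High Fidelity's algorithm: the `lookAheadLimit` function, its parameters, and the instant-attack, linear-release gain rule are assumptions chosen to keep the example short. A production limiter would also delay the signal and ramp the gain smoothly.

```typescript
// Simplified look-ahead limiter (illustrative only).
// ceiling:   maximum absolute output level
// lookAhead: how many future samples are inspected
// release:   how much the gain may recover per sample after a peak has passed
function lookAheadLimit(
  input: Float32Array,
  ceiling: number = 1.0,
  lookAhead: number = 64,
  release: number = 0.001,
): Float32Array {
  const out = new Float32Array(input.length);
  let gain = 1.0;
  for (let i = 0; i < input.length; i++) {
    // Find the loudest sample in the window [i, i + lookAhead].
    let peak = 0;
    const end = Math.min(input.length, i + lookAhead + 1);
    for (let j = i; j < end; j++) peak = Math.max(peak, Math.abs(input[j]));
    // Gain that would bring that peak down to the ceiling.
    const target = peak > ceiling ? ceiling / peak : 1.0;
    // Reduce the gain as soon as a peak enters the window; recover gradually afterwards.
    gain = Math.min(target, gain + release);
    out[i] = input[i] * gain;
  }
  return out;
}
```

Because the gain is already reduced before the loud sample arrives, the peak itself is scaled rather than clipped, and the output never exceeds the ceiling in this sketch.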
Noise suppression on audio input can be achieved by passing the microphone stream through a Noise Suppression node. This node provides two methods of noise suppression: DNN and Noise Gate.
DNN (Deep Neural Network) noise suppression automatically suppresses background noise and removes noise from the speech signal. It is best used in very noisy environments, when the signal-to-noise ratio (SNR) is low. It may occasionally alter the sound quality of the speech.
Noise Gate noise suppression uses a manual threshold to remove only background noise while the speech signal is passed through unprocessed. It is best used in relatively noise-free environments, when the SNR is high. The threshold should be set to the lowest setting that effectively blocks the background noise. The gate can be completely disabled by setting the threshold to its minimum value.
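The threshold behavior of a gate can be illustrated with a minimal frame-based sketch. This is not the SDK's Noise Suppression node: the `rmsDb` and `noiseGate` functions, the use of dBFS as the threshold unit, and `-Infinity` as the "minimum value" are assumptions. Practical gates also add hold time and hysteresis to avoid chattering around the threshold.

```typescript
// Level of a frame in dB relative to full scale; -Infinity for digital silence.
function rmsDb(frame: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  const rms = Math.sqrt(sum / Math.max(1, frame.length));
  return 20 * Math.log10(rms);
}

// Frames at or above the threshold pass through unprocessed; quieter frames are silenced.
// With thresholdDb = -Infinity every frame passes, which disables the gate.
function noiseGate(frame: Float32Array, thresholdDb: number): Float32Array {
  return rmsDb(frame) >= thresholdDb ? frame : new Float32Array(frame.length);
}
```

With a threshold of -40 dB, a frame of low-level hiss around -60 dB is replaced by silence, while speech around -6 dB is returned untouched.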
Both methods reduce CPU load and network traffic when blocking background noise, but DNN noise suppression is more computationally expensive than Noise Gate.
The nodes in this package are compatible with standard WebAudio nodes. They can be arranged as needed to take advantage of WebAudio's flexible modular design. If all audio sources use 48 kHz, resampling is avoided and the best audio quality will be delivered.
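As a sketch of that modular arrangement, the helper below wires a list of nodes into a chain. The `Connectable` interface and the `connectInOrder` function are hypothetical names. The interface is a minimal structural type intended to match the `connect` method of standard WebAudio `AudioNode` objects, and the commented browser usage shows only standard WebAudio calls, with a `GainNode` standing in for whichever SDK nodes an application places in the chain.

```typescript
// Minimal structural type covering the part of AudioNode used here (an assumption).
interface Connectable {
  connect(destination: Connectable): unknown;
}

// Connect each node to the next one: nodes[0] -> nodes[1] -> ... -> nodes[n - 1].
function connectInOrder<T extends Connectable>(nodes: T[]): void {
  for (let i = 0; i + 1 < nodes.length; i++) {
    nodes[i].connect(nodes[i + 1]);
  }
}

// Browser usage sketch (standard WebAudio calls; `stream` is a MediaStream from getUserMedia):
//
//   const ctx = new AudioContext({ sampleRate: 48000 }); // request 48 kHz to avoid resampling
//   const mic = ctx.createMediaStreamSource(stream);
//   const gain = ctx.createGain();                       // placeholder for SDK nodes
//   connectInOrder([mic, gain, ctx.destination]);
```

Requesting a 48 kHz context up front keeps every node in the chain at the same rate.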