Audiomentations documentation
Audiomentations is a Python library for audio data augmentation, built to be fast and easy to use. Its API is inspired by albumentations. It's useful for making audio deep learning models work well in the real world, not just in the lab. Audiomentations runs on CPU, supports both mono and multichannel audio, and integrates well in training pipelines, such as those built with TensorFlow/Keras or PyTorch. It has helped users achieve world-class results in Kaggle competitions and is trusted by companies building next-generation audio products with AI.
Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!
Setup
pip install audiomentations
Optional requirements
Some features have extra dependencies. These extra Python package dependencies can be installed by running
pip install audiomentations[extras]
Feature | Extra dependencies |
---|---|
Limiter | cylimiter |
LoudnessNormalization | pyloudnorm |
Mp3Compression | ffmpeg and [pydub or lameenc] |
RoomSimulator | pyroomacoustics |
Note: ffmpeg can be installed via e.g. conda or from the official ffmpeg download page.
Usage example
Waveform
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
Waveform transforms
For a list and explanation of all waveform transforms, see Waveform transforms in the menu.
Waveform transforms can be visualized, for intuition, with the audio-transformation-visualization GUI (made by phrasenmaeher), where you can upload your own input wav file.
Composition classes
Compose
Compose applies the given sequence of transforms when called, optionally shuffling the sequence for every call.
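A minimal sketch of the shuffling option (the shuffle parameter name is an assumption inferred from the description above; verify it against the Compose API reference):

from audiomentations import Compose, TimeStretch, PitchShift

# Apply the two transforms in a freshly shuffled order on every call.
# Assumption: the flag is named shuffle; check the Compose docs.
augment = Compose(
    transforms=[
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    ],
    shuffle=True,
)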
OneOf
OneOf randomly picks one of the given transforms when called, and applies that transform.
An optional weights list of floats may be given to control the probability of each transform being chosen. If not specified, a transform is chosen uniformly at random.
Code example:
from audiomentations import OneOf, PitchShift

pitch_shift = OneOf(
    transforms=[
        PitchShift(method="librosa_phase_vocoder"),
        PitchShift(method="signalsmith_stretch"),
    ],
    weights=[0.1, 0.9],
)
SomeOf
SomeOf randomly picks several of the given transforms when called, and applies those transforms.
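A minimal sketch (the num_transforms parameter and its (1, 2) range, meaning "apply between 1 and 2 of the listed transforms per call", are assumptions based on similar imgaug-style APIs; check the SomeOf API reference for the exact signature):

from audiomentations import SomeOf, TimeStretch, PitchShift, Shift

# Assumption: num_transforms accepts a (min, max) tuple that bounds how
# many of the transforms get applied on each call.
apply_some = SomeOf(
    num_transforms=(1, 2),
    transforms=[
        TimeStretch(min_rate=0.9, max_rate=1.1),
        PitchShift(min_semitones=-2, max_semitones=2),
        Shift(),
    ],
)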
Known limitations
- A few transforms do not support multichannel audio yet. See Multichannel audio.
- Expects input waveforms to be float32 numpy arrays with values between -1 and 1. If your audio arrives in another format, convert it first (see the sketch after this list).
- The code runs on CPU, not GPU. For a GPU-compatible version, check out torch-audiomentations.
- Multiprocessing probably works but is not officially supported yet.
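A minimal conversion sketch for int16 PCM input, as commonly read from wav files (int16_to_float32 is a hypothetical helper name, not part of audiomentations):

import numpy as np

# Hypothetical helper, not part of audiomentations: scale int16 PCM
# samples into the float32 [-1, 1] range that the transforms expect.
def int16_to_float32(samples: np.ndarray) -> np.ndarray:
    return samples.astype(np.float32) / 32768.0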
Contributions are welcome!
Multichannel audio
As of v0.22.0, all transforms except AddBackgroundNoise and AddShortNoises support not only mono audio (1-dimensional numpy arrays), but also stereo audio, i.e. 2D arrays with a shape like (num_channels, num_samples). See also the guide on multichannel audio array shapes.
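A minimal sketch showing that a multichannel array is passed the same way as a mono one (the AddGaussianNoise parameters are simply borrowed from the usage example above):

import numpy as np
from audiomentations import AddGaussianNoise

augment = AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=1.0)

# Dummy stereo audio: 2 channels, 2 seconds at 16 kHz,
# shaped (num_channels, num_samples)
stereo_samples = np.random.uniform(low=-0.2, high=0.2, size=(2, 32000)).astype(np.float32)
augmented = augment(samples=stereo_samples, sample_rate=16000)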
Acknowledgements
Thanks to Nomono for backing audiomentations.
Thanks to all contributors who help improve audiomentations.