Audiomentations documentation
A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated into training pipelines in e.g. TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!
Setup
pip install audiomentations
Optional requirements
Some features have extra dependencies. The extra Python package dependencies can be installed by running:
pip install audiomentations[extras]
| Feature | Extra dependencies |
|---|---|
| Limiter | cylimiter |
| LoudnessNormalization | pyloudnorm |
| Mp3Compression | ffmpeg and [pydub or lameenc] |
| RoomSimulator | pyroomacoustics |

Note: ffmpeg can be installed via e.g. conda or from the official ffmpeg download page.
Usage example
Waveform
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])
# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)
# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
Spectrogram
from audiomentations import SpecCompose, SpecChannelShuffle, SpecFrequencyMask
import numpy as np
augment = SpecCompose(
    [
        SpecChannelShuffle(p=0.5),
        SpecFrequencyMask(p=0.5),
    ]
)
# Example spectrogram with 1025 frequency bins, 256 time steps and 2 audio channels
spectrogram = np.random.random((1025, 256, 2))
# Augment/transform/perturb the spectrogram
augmented_spectrogram = augment(spectrogram)
Waveform transforms
For a list and explanation of all waveform transforms, see Waveform transforms in the menu.
Waveform transforms can be visualized with the audio-transformation-visualization GUI (made by phrasenmaeher), where you can upload your own input WAV file.
Spectrogram transforms
For a list and brief explanation of all spectrogram transforms, see Spectrogram transforms.
Composition classes
Compose
Compose applies the given sequence of transforms when called, optionally shuffling the sequence for every call.
SpecCompose
Same as Compose, but for spectrogram transforms.
OneOf
OneOf randomly picks one of the given transforms when called, and applies that transform.
SomeOf
SomeOf randomly picks several of the given transforms when called, and applies those transforms.
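The selection behavior described above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helpers `compose`, `one_of` and `some_of` and the toy transforms are hypothetical, they work on plain lists instead of numpy arrays, and they ignore the per-transform probability `p`. Applying the chosen transforms in their listed order in `some_of` is an assumption made for the example.

```python
import random


def compose(transforms, shuffle=False):
    """Hypothetical stand-in: apply every transform in sequence."""
    def apply(samples):
        order = list(transforms)
        if shuffle:
            random.shuffle(order)  # a new order is drawn on every call
        for transform in order:
            samples = transform(samples)
        return samples
    return apply


def one_of(transforms):
    """Hypothetical stand-in: apply exactly one randomly chosen transform."""
    def apply(samples):
        return random.choice(transforms)(samples)
    return apply


def some_of(num_transforms, transforms):
    """Hypothetical stand-in: apply a random subset of the transforms."""
    def apply(samples):
        # Pick distinct indices, then keep the listed order (an assumption)
        indices = sorted(random.sample(range(len(transforms)), num_transforms))
        for index in indices:
            samples = transforms[index](samples)
        return samples
    return apply


# Toy "transforms" operating on a list of sample values
def halve_gain(samples):
    return [0.5 * s for s in samples]


def invert_polarity(samples):
    return [-s for s in samples]


def reverse(samples):
    return samples[::-1]


toy_transforms = [halve_gain, invert_polarity, reverse]
audio = [1.0, 2.0]

all_applied = compose([halve_gain, invert_polarity], shuffle=True)(audio)
one_applied = one_of(toy_transforms)(audio)
two_applied = some_of(2, toy_transforms)(audio)
```

In the sketch, `compose` always runs every transform, `one_of` runs exactly one, and `some_of` runs a subset of the requested size. The real classes operate on transform instances such as the ones in the usage example.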
Known limitations
- A few transforms do not support multichannel audio yet. See Multichannel audio.
- The input is expected to have dtype float32 and values between -1 and 1.
- The code runs on CPU, not GPU. For a GPU-compatible version, check out torch-audiomentations.
- Multiprocessing probably works, but is not officially supported yet.
Contributions are welcome!
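Audio loaded from disk is often stored as integer PCM, which does not match the float32 range of -1 to 1 listed above. The following is a minimal sketch of one way to convert it using only numpy. The helper name `to_float32_audio` is hypothetical and not part of audiomentations, and float input is assumed to be in range already.

```python
import numpy as np


def to_float32_audio(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return the audio as float32 in the range [-1, 1]."""
    if np.issubdtype(samples.dtype, np.signedinteger):
        # Divide by the magnitude of the most negative value, e.g. 32768 for int16
        scale = float(-np.iinfo(samples.dtype).min)
        return (samples / scale).astype(np.float32)
    # Assumption: floating point input is already within [-1, 1]; only cast it
    return samples.astype(np.float32)


# Example: four int16 PCM samples
pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
float_samples = to_float32_audio(pcm)
```

The result can then be passed as `samples` to a transform or to a `Compose` instance.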
Multichannel audio
As of v0.22.0, all transforms except AddBackgroundNoise and AddShortNoises support not only mono audio (1-dimensional numpy arrays), but also multichannel audio (e.g. stereo), i.e. 2D arrays with shape like (num_channels, num_samples). See also the guide on multichannel audio array shapes.
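Some audio loading tools return arrays in a samples-first layout, i.e. (num_samples, num_channels). As a minimal sketch, assuming such a layout, the array can be transposed into the channels-first shape described above with plain numpy. The variable names and the dummy data are illustrative only.

```python
import numpy as np

sample_rate = 16000

# Dummy stereo audio in (num_samples, num_channels) layout, assumed for illustration
samples_first = np.random.uniform(low=-0.2, high=0.2, size=(32000, 2)).astype(np.float32)

# Transpose to (num_channels, num_samples) and make the memory layout contiguous
channels_first = np.ascontiguousarray(samples_first.T)

# A single channel is a 1-dimensional array, i.e. mono audio
left_channel = channels_first[0]
```

Here `channels_first` has shape (2, 32000), which matches the 2D layout that multichannel-capable transforms expect.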
Acknowledgements
Thanks to Nomono for backing audiomentations.
Thanks to all contributors who help improve audiomentations.