
Audiomentations documentation

Code style: Black. Licence: MIT.

A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated into training pipelines in, e.g., TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!

Setup


pip install audiomentations

Optional requirements

Some features have extra dependencies. The extra Python package dependencies can be installed by running:

pip install audiomentations[extras]

| Feature | Extra dependencies |
|---|---|
| Limiter | cylimiter |
| LoudnessNormalization | pyloudnorm |
| Mp3Compression | ffmpeg and [pydub or lameenc] |
| RoomSimulator | pyroomacoustics |

Note: ffmpeg is a standalone program rather than a Python package. It can be installed via, e.g., conda or from the official ffmpeg download page.
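To see which of the optional features above can run in the current environment, a small stdlib-only sketch like the following may help. It assumes each extra dependency's import name matches its package name, and it mirrors the table's rule that Mp3Compression needs ffmpeg plus either pydub or lameenc. `OPTIONAL_MODULES` and `check_optional_dependencies` are hypothetical helpers, not part of audiomentations.

```python
import importlib.util
import shutil
from typing import Dict, List

# Hypothetical mapping from feature name to the Python modules it needs (from the table above)
OPTIONAL_MODULES: Dict[str, List[str]] = {
    "Limiter": ["cylimiter"],
    "LoudnessNormalization": ["pyloudnorm"],
    "Mp3Compression": ["pydub", "lameenc"],  # either backend is enough
    "RoomSimulator": ["pyroomacoustics"],
}


def check_optional_dependencies() -> Dict[str, bool]:
    """Return {feature_name: True/False} depending on whether its extra dependencies look available."""
    status: Dict[str, bool] = {}
    for feature, modules in OPTIONAL_MODULES.items():
        # find_spec returns None for a top-level module that is not installed
        status[feature] = any(importlib.util.find_spec(m) is not None for m in modules)
    # Mp3Compression additionally needs the ffmpeg executable on PATH
    status["Mp3Compression"] = status["Mp3Compression"] and shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for feature, ok in check_optional_dependencies().items():
        print(f"{feature}: {'ok' if ok else 'missing dependencies'}")
```

The check only looks for installed modules and the ffmpeg binary; it does not import them, so it stays fast and side-effect free.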

Usage example

Waveform

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
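To build intuition for what one of these transforms does, here is a minimal NumPy sketch of the idea behind AddGaussianNoise: draw one noise amplitude uniformly from [min_amplitude, max_amplitude] per call and add scaled standard-normal noise. This is an illustrative re-implementation under that assumption, not the library's code; the function name `add_gaussian_noise` is hypothetical.

```python
import numpy as np


def add_gaussian_noise(samples: np.ndarray, min_amplitude: float, max_amplitude: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sketch of the AddGaussianNoise idea (not audiomentations' implementation)."""
    # One noise amplitude per call, drawn uniformly from the configured range
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    # Standard-normal noise with the same shape as the input, so mono and multichannel both work
    noise = rng.standard_normal(samples.shape).astype(np.float32)
    return (samples + amplitude * noise).astype(np.float32)


rng = np.random.default_rng(seed=0)
# 2 seconds of dummy audio at 16 kHz, as in the example above
samples = rng.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)
noisy = add_gaussian_noise(samples, min_amplitude=0.001, max_amplitude=0.015, rng=rng)
print(noisy.shape, noisy.dtype)
```

The standard deviation of `noisy - samples` lands close to the drawn amplitude, which is one way to sanity-check a noise range before using it in training.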

Spectrogram

from audiomentations import SpecCompose, SpecChannelShuffle, SpecFrequencyMask
import numpy as np

augment = SpecCompose(
    [
        SpecChannelShuffle(p=0.5),
        SpecFrequencyMask(p=0.5),
    ]
)

# Example spectrogram with 1025 frequency bins, 256 time steps and 2 audio channels
spectrogram = np.random.random((1025, 256, 2))

# Augment/transform/perturb the spectrogram
augmented_spectrogram = augment(spectrogram)
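For intuition about the two spectrogram transforms in the example above, the sketch below shows the general techniques with plain NumPy on a (frequency_bins, time_steps, channels) array: frequency masking zeroes one random contiguous band of frequency bins, and channel shuffling permutes the last axis. The helper names and the `max_mask_fraction` parameter are hypothetical; SpecFrequencyMask's actual parameters and fill behaviour may differ.

```python
import numpy as np


def frequency_mask(spec: np.ndarray, max_mask_fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Zero out one random contiguous band of frequency bins (axis 0) in every channel."""
    num_bins = spec.shape[0]
    # Band width between 0 and max_mask_fraction * num_bins (upper bound inclusive)
    mask_width = int(rng.integers(0, int(num_bins * max_mask_fraction) + 1))
    start = int(rng.integers(0, num_bins - mask_width + 1))
    masked = spec.copy()  # leave the input untouched
    masked[start:start + mask_width, :, :] = 0.0
    return masked


def channel_shuffle(spec: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly permute the channel axis (last axis) of a (bins, time, channels) spectrogram."""
    order = rng.permutation(spec.shape[-1])
    return spec[:, :, order]


rng = np.random.default_rng(seed=42)
spectrogram = rng.random((1025, 256, 2))
augmented = channel_shuffle(frequency_mask(spectrogram, max_mask_fraction=0.15, rng=rng), rng)
print(augmented.shape)
```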

Waveform transforms

For a list and explanation of all waveform transforms, see Waveform transforms in the menu.

Waveform transforms can be visualized with the audio-transformation-visualization GUI (made by phrasenmaeher), where you can upload your own input WAV file.

Spectrogram transforms

For a list and brief explanation of all spectrogram transforms, see Spectrogram transforms.

Composition classes

Compose

Compose applies the given sequence of transforms when called, optionally shuffling the sequence for every call.
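The behaviour described above can be sketched in a few lines of pure Python. `MiniCompose` is a hypothetical stand-in written only to illustrate the idea (apply transforms in order, optionally reshuffling the order on every call); it is not how Compose is implemented, and it treats a transform as any callable taking `(samples, sample_rate)`.

```python
import random
from typing import Callable, List, Optional, Sequence

import numpy as np

# A "transform" here is any callable taking (samples, sample_rate) and returning new samples
Transform = Callable[[np.ndarray, int], np.ndarray]


class MiniCompose:
    """Hypothetical illustration: apply transforms in sequence, optionally shuffled per call."""

    def __init__(self, transforms: Sequence[Transform], shuffle: bool = False,
                 seed: Optional[int] = None):
        self.transforms: List[Transform] = list(transforms)
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __call__(self, samples: np.ndarray, sample_rate: int) -> np.ndarray:
        order = list(self.transforms)
        if self.shuffle:
            self.rng.shuffle(order)  # a fresh order on every call
        for transform in order:
            samples = transform(samples, sample_rate)
        return samples


def double(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    return samples * 2


def add_one(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    return samples + 1


x = np.array([1.0], dtype=np.float32)
print(MiniCompose([double, add_one])(x, 16000))  # double, then add_one -> [3.]
print(MiniCompose([add_one, double])(x, 16000))  # add_one, then double -> [4.]
```

The two toy transforms do not commute, which makes it easy to see why shuffling the order adds variety to the augmented output.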

SpecCompose

Same as Compose, but for spectrogram transforms.

OneOf

OneOf randomly picks one of the given transforms when called, and applies that transform.

SomeOf

SomeOf randomly picks several of the given transforms when called, and applies those transforms.
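The selection logic behind OneOf and SomeOf can be illustrated with the pure-Python sketch below. `one_of`, `some_of` and `make_offset` are hypothetical helpers, `num_transforms` is assumed to be a fixed integer, and applying the chosen transforms in their listed order is one possible design choice for this sketch, not a statement about the library.

```python
import random
from typing import Callable, Sequence

import numpy as np

# A "transform" here is any callable taking (samples, sample_rate) and returning new samples
Transform = Callable[[np.ndarray, int], np.ndarray]


def one_of(transforms: Sequence[Transform], samples: np.ndarray, sample_rate: int,
           rng: random.Random) -> np.ndarray:
    """Pick exactly one transform at random and apply it."""
    transform = rng.choice(transforms)
    return transform(samples, sample_rate)


def some_of(transforms: Sequence[Transform], num_transforms: int, samples: np.ndarray,
            sample_rate: int, rng: random.Random) -> np.ndarray:
    """Pick num_transforms distinct transforms at random and apply them in their listed order."""
    chosen = sorted(rng.sample(range(len(transforms)), num_transforms))
    for index in chosen:
        samples = transforms[index](samples, sample_rate)
    return samples


def make_offset(offset: float) -> Transform:
    """Build a toy transform that adds a constant, so it is easy to see which ones ran."""
    def transform(samples: np.ndarray, sample_rate: int) -> np.ndarray:
        return samples + offset
    return transform


toy_transforms = [make_offset(1.0), make_offset(10.0), make_offset(100.0)]
rng = random.Random(0)
x = np.zeros(4, dtype=np.float32)
print(one_of(toy_transforms, x, 16000, rng)[0])      # one of 1.0, 10.0, 100.0
print(some_of(toy_transforms, 2, x, 16000, rng)[0])  # one of 11.0, 101.0, 110.0
```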

Known limitations

  • A few transforms do not support multichannel audio yet. See Multichannel audio.
  • Expects the input dtype to be float32, with values between -1 and 1.
  • The code runs on CPU, not GPU. For a GPU-compatible version, check out torch-audiomentations.
  • Multiprocessing probably works but is not officially supported yet.
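Audio loaded as 16-bit integer PCM (for instance by `scipy.io.wavfile.read` on a 16-bit WAV file) does not meet the float32, [-1, 1] expectation above. A common conversion is to divide by 2**15, as in this minimal sketch; `pcm16_to_float32` is a hypothetical helper, not part of audiomentations.

```python
import numpy as np


def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Convert 16-bit integer PCM to float32 in [-1, 1) by dividing by 2**15."""
    if pcm.dtype != np.int16:
        raise TypeError(f"expected int16 input, got {pcm.dtype}")
    return pcm.astype(np.float32) / 32768.0


pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
audio = pcm16_to_float32(pcm)
print(audio.dtype, audio.min(), audio.max())
```

Dividing by 32768 rather than 32767 maps the most negative sample exactly to -1.0 and keeps every value inside the expected range.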

Contributions are welcome!

Multichannel audio

As of v0.22.0, all transforms except AddBackgroundNoise and AddShortNoises support not only mono audio (1-dimensional NumPy arrays) but also multichannel audio such as stereo, i.e. 2D arrays with shape (num_channels, num_samples). See also the guide on multichannel audio array shapes.
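Some audio loaders (for example, soundfile) return multichannel audio as (num_samples, num_channels), i.e. channels last. The sketch below transposes such an array into the (num_channels, num_samples) layout described above. `to_channels_first` is a hypothetical helper; it assumes 2D input is channels-last and leaves dtype conversion to a separate step.

```python
import numpy as np


def to_channels_first(audio: np.ndarray) -> np.ndarray:
    """Return audio shaped (num_channels, num_samples), assuming 2-D input is channels-last."""
    if audio.ndim == 1:
        return audio  # mono: already the expected 1-D shape
    if audio.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")
    # (num_samples, num_channels) -> (num_channels, num_samples);
    # ascontiguousarray copies the transposed view into a C-contiguous array
    return np.ascontiguousarray(audio.T)


stereo_channels_last = np.zeros((48000, 2), dtype=np.float32)  # 1 s of silence at 48 kHz
stereo = to_channels_first(stereo_channels_last)
print(stereo.shape)  # (2, 48000)
```

A 2D array alone does not reveal which axis holds the channels, so it is worth knowing what layout your loader returns before calling a helper like this.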

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.