Task View

This page breaks functionality up according to tasks in order to make it easier to find relevant functions. Throughout this page, we assume that you are familiar with how to estimate models (Models and root manipulations) and specify distances (Distances).


There are two main ways to perform classification using the functionality in this package

  • Nearest-neighbor based
  • Feature based

Nearest-neighbor based classification

This is very simple, select a distance, and calculate the nearest neighbor from a dataset to your query. The dataset can be either a vector of models estimated from signals, a vector of spectrograms, or a matrix of embedding vectors derived from models.

First, we demonstrate how one can perform "leave-one-out" corss validation within a labeled dataset, i.e., for each example, classify it using all the others. Since a distance based classifier does not have an explicit "training phase", this sort of cross-validation is comparatively cheap to perform.

Any distance can be used to calculate a distance matrix using the function distmat. Given a distance matrix D, you can predict the nearest-neighbor class with the following function

function predict_nn(labels::Vector{Int}, D)
    dists, inds = findmin(D + Inf*I, dims=2) # The diagonal contains trivial matches, hence add infinite Identity
    inds = vec(getindex.(inds, 2))
    map(i->labels[i], inds)

predicted_classes = predict_nn(labels, D)

When we want to classify a new sample q, we can simply broadcast[tmap] a distance d between q and all labeled samples in the training set

dists = d.(models_train, q)
predicted_class = labels[argmin(dists)]

note that if you're doing detection, i.e., looking for a short q in a much longer time series, see Detection below, and the function distance_profile.

Nearest neighbor using embeddings

By far the fastest neighbor querys can be made by extracting embeddings from estimated models and using a KD-tree to accelerate neigbor searches. Below, we'll go into detail on how to do this. This corresponds to using the EuclideanRootDistance with uniform weighting on the poles.

The following function finds you the $k$ most likely classes corresponding to query embedding q from within Xtrain. Xtrain and q are expected to be embeddings formed by the function embeddings from AudioClustering.jl. (See Calculate root embeddings from sound files for an intro.)

using MultivariateStats, NearestNeighbors, AudioClustering

Xtrain = embeddings(models_train) # Assumes that you have already estimated models

function knn_classify(labels, Xtrain, q, k)
    N = size(Xtrain,2)
    W = fit(Whitening, Xtrain)
    X = MultivariateStats.transform(W,Xtrain)
    q = MultivariateStats.transform(W,q)
    tree = NearestNeighbors.KDTree(Xtrain)
    inds, dists = knn(tree, q, min(5k+1, N-1), true)
    ul = unique(labels[inds[2:end]])
    ul[1:min(k, length(ul))]

Increased accuracy is often obtained by estimating models with a few different specifications and fitting methods and use them all to form predictions (this will form an ensemble). The following code fits models with different fit methods and of different orders

using ThreadTools, AudioClustering, ProgressMeter
modelspecs = collect(Iterators.product(10:2:14, (TLS,LS))) # Model order × fitmethod

manymodels = @showprogress "Estimating models" map(modelspecs) do (na, fm)
    fitmethod = fm(na=na, λ=1e-5)
    tmap(sounds) do sound
        sound = @view(sound[findfirst(!iszero, sound):findlast(!iszero, sound)])
        sound = Float32.(SpectralDistances.bp_filter(sound, (50 / fs, 0.49))) # Apply some bandpass filtering

manyX = embeddings.(manymodels) # This is not a matrix of matrices

To predict a single class, let many classifiers vote for the best class

using MLBase # For mode

function vote(preds)
    map(1:length(preds[1])) do i
        mode(getindex.(preds, i))

votes = [classpred1, classpred2, classpred3, ...] # Each classpred can be obtained by, e.g., knn_classify above.
majority_vote = vote(votes)
@show mean(labels .== majority_vote) # Accuracy

To predict "up to $k$ classes", try the following

using StatsBase # for countmap
function predict_k(labels, preds, k)
    map(eachindex(labels)) do i
        cm = countmap(getindex.(preds, i)) |> collect |> x->sort(x, by=last, rev=true)

votes = [classpred1, classpred2, classpred3, ...]
k_votes = predict_k(labels, votes, k)
@show mean(labels .∈ k_votes) # Accuracy

To figure out which classifier is best, rank them like so

function ranking(labels, preds)
    scores = [mean(labels .== yh) for yh in preds]
    sortperm(scores, rev=true)

votes = [classpred1, classpred2, classpred3, ...]
r = ranking(labels, votes)

Feature-based classification

The embeddings extracted above can be used as features for a standard classifier. Below we show an example using a random forest

using DecisionTree, MultivariateStats, Random, AudioClustering
N       = length(labels)
X       = embeddings(models)' |> copy # DecisionTree expects features along columns
perm    = randperm(N)
Nt      = N ÷ 2 # Use half dataset for training
train_x = X[perm[1:Nt], :]
train_y = labels[perm[1:Nt]]
test_x  = X[perm[Nt+1:end], :]
test_y  = labels[perm[Nt+1:end]]

model = RandomForestClassifier(n_trees=400, max_depth=15)
DecisionTree.fit!(model, train_x, train_y)

predictions = DecisionTree.predict(model, test_x)
k_predictions =
getindex.(sortperm.(eachrow(predict_proba(model, test_x)), rev = true), Ref(1:3)) # Predict top 3
@show accuracy = mean(predictions .== test_y)    # Top class prediction accuracy
@show accuracy = mean(test_y .∈ k_predictions)  # Top 3 classes predictions accuracy

The features derived here can of course be combined with any number of other features, such as from AcousticFeatures.jl.


Detection refers to finding a short query pattern q in a long recording y. This task can often be performance optimized for expensive-to-compute distances.

In its most basic form, a dection score can be calculated by simply broadcasting a distance over y, see Detection using examples.

For spectrogram distances, we have optimized methods for calculating distance profiles, see Computing a spectrogram distance profile. Also TimeDistance has an optimized method for distance_profile.

Detection can also be done using Dynamic Time Warping combined with optimal transport, see Dynamic Time Warping. For examples of the combination of DTW and OT, see the following notebooks

Unsupervised learning

For clustering applications, there are a number of approaches

  • Distance matrix
  • Feature-based
  • K-barycenters

Clustering using a distance matrix

Using distmat with keyword arg normalize=true, you can obtain a distance matrix that can be used with a large number of clustering algorithms from Clustering.jl or HDBSCAN.jl.

Clustering using features

Using embeddings from AudioClustering.jl, you can run regular K-means which is blazingly fast, but often produces worse clusterings than more sophisticated methods.

Clustering using K-barycenters

This approach is similar to K-means, but uses a transport-based method to calculate distances and form averages rather than the Euclidean distance. See the example K-Barycenters.

Finding motifs or outliers

To find motifs (recurring patterns) or outliers (discords), see MatrixProfile.jl which interacts well with SpectralDistances.jl.

Dimensionality reduction

Several sounds from the same class can be reduced to a smaller number of sounds by forming a barycenter. See examples Barycenters, Barycenters between spectrograms and the figure in the readme (reproduced below) which shows how four spectrograms can be used to calculate a "center spectrogram".


Dataset augmentation

Barycenters can also be used also to augment datasets with points "in-between" other points. The same figure in the readme (reproduced above) illustrates how four spectrogams are extended into 25 spectrograms.

Interpolation between spectra

An interpolation between spectra is obtained by calculating a barycenter using varying barycentric coordinates. See Interpolations and Barycenters.

  • tmapIf d is an expensive distance to compute, you may want to consider using tmap from ThreadTools.jl