Distances

Overview

The following is a reference for all the distances defined in this package. Once a distance object is defined, it can be evaluated in one of two ways, defined by the Distances.jl interface

dist = DistanceType(options)
d = evaluate(d, x1, x2; kwargs...) # keyword arguments are used to control the solvers for some transport-based distances
d = dist(x1, x2) # A shorter syntax for calling the distance
Note

All distances return the distance raised to the power p, thus RationalOptimalTransportDistance(p=2)(x1,x2) $= W_2(x_1,x_2)^2$ where $W_2$ denotes the Wasserstein distance of order 2.

Before we proceed, we list a number of classes of distances that are available

Some of these distances operate directly on signals, these are

EnergyDistance
ModelDistance
SpectralDistances.AbstractWelchDistance

Of these, ModelDistance is a bit special, works like this

SpectralDistances.ModelDistanceType
ModelDistance{D <: AbstractDistance} <: AbstractSignalDistance

A model distance operates on signals and works by fitting an LTI model to the signals before calculating the distance. The distance between the LTI models is defined by the field distance. This is essentially a wrapper around the inner distance that handles the fitting of a model to the signals. How the model is fit is determined by fitmethod.

Arguments:

  • fitmethod::FitMethod: LS, TLS or PLR
  • distance::D: The inner distance between the models

Example:

using SpectralDistances
innerdistance = OptimalTransportRootDistance(domain=Continuous(), β=0.005, p=2)
dist = ModelDistance(TLS(na=30), innerdistance)
source

The inner distance in ModelDistance can be any AbstractRationalDistance. The options are

DiscretizedRationalDistance
RationalCramerDistance
RationalOptimalTransportDistance
SpectralDistances.AbstractCoefficientDistance
SpectralDistances.AbstractRootDistance

These distances operate on LTI models. Some operate on the coefficients of the models

CoefficientDistance

and some operate on the roots of the models

EuclideanRootDistance
HungarianRootDistance
KernelWassersteinRootDistance
OptimalTransportRootDistance

A full example

To use the OptimalTransportRootDistance and let it operate on signals, we may construct our distance object as follows

julia> innerdistance = OptimalTransportRootDistance(domain=Continuous(), β=0.005, p=2)
OptimalTransportRootDistance{Continuous,typeof(identity),typeof(simplex_residueweight),Nothing}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.005, 2, nothing)

julia> dist = ModelDistance(TLS(na=10), innerdistance)
ModelDistance{OptimalTransportRootDistance{Continuous,typeof(identity),typeof(simplex_residueweight),Nothing},TLS}(TLS(10, 0.0), OptimalTransportRootDistance{Continuous,typeof(identity),typeof(simplex_residueweight),Nothing}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.005, 2, nothing))

julia> X1, X2 = randn(1000), randn(1000);

julia> dist(X1,X2)
0.4030359376468506

julia> dist = ModelDistance(LS(na=2), innerdistance);

julia> t = 0:0.01:10;

julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*1.1 .*t); # Two signals that are close in frequency

julia> dist(X1,X2)
0.0005678342162688199

julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*2 .*t);   # Two signals that are further apart in frequency

julia> dist(X1,X2)
0.004073228552285849

Using Welch periodograms

We can calculate the Wasserstein distance between spectra estimated using the Welch method like so

julia> dist = WelchOptimalTransportDistance(p=2)
WelchOptimalTransportDistance{Nothing,Tuple{},NamedTuple{(),Tuple{}}}(nothing, (), NamedTuple(), 2)

julia> X1, X2 = randn(1000), randn(1000);

julia> dist(X1,X2)
0.00237957140366748

julia> t = 0:0.01:10;

julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*1.1 .*t); # Two signals that are close in frequency

julia> dist(X1,X2)
0.00035863696706223353

julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*2 .*t);   # Two signals that are further apart in frequency

julia> dist(X1,X2)
0.0011840976205946979

Gradients

Some distances will allow you to propagate gradients through them. Below is an example using Zygote and the OptimalTransportRootDistance

using Zygote
Zygote.@nograd rand # Currently needed woraround
x1 = SpectralDistances.bp_filter(randn(100000), (0.1,0.3))  # Create two signals
x2 = SpectralDistances.bp_filter(randn(100000), (0.1,0.2))
fm = LS(na=10)      # LS is the best supported fitmethod for gradients

dist = ModelDistance(fm,OptimalTransportRootDistance(domain = Continuous()))      # Since we're measureing distance between signals, we wrap the distance in a ModelDistance
ModelDistance{OptimalTransportRootDistance{Continuous,typeof(identity),typeof(simplex_residueweight),Nothing},LS}(LS(10, 0.01), OptimalTransportRootDistance{Continuous,typeof(identity),typeof(simplex_residueweight),Nothing}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.01, 2, nothing))
julia> dist(x1,x2)
0.02731881482356442

julia> ∇x1 = Zygote.gradient(x->real(evaluate(dist,x,x2)), x1)[1] # The call to real is a workaround for a Zygote bug
100000-element Array{Float64,1}:
  4.1838710264084406e-7
 -8.125071451986144e-7
  3.5349251950013737e-6
 -4.693274890883421e-6
  8.876663288856337e-6
 -9.109374176908494e-6
  1.0714110339739344e-5
 -7.914244154582994e-6
  6.974972676978242e-6
 -1.9604674441410045e-6
  ⋮
  2.803220403840163e-6
 -2.2086674980118802e-6
 -2.6724187496047165e-7
 -2.9027591437761906e-6
  1.1361431525961929e-6
 -1.5598299810563685e-6
 -2.161877463447161e-7
  1.2368388797136808e-6
  1.7215367550240197e-6

The differentiation takes some time, but it should be fast enough to be generally useful for gradient-based learning of autoencoders etc. The following is a benchmark performed on an old laptop without GPU (the distances are not yet tested on GPUs)

@btime Zygote.gradient(x->real(evaluate($dist,x,$x2)), $x1);
#  134.965 ms (107566 allocations: 134.77 MiB)

with a precomputed reference model, it goes even faster

m2 = fm(x2)
m2 = change_precision(Float64, m2) # Tihs step is important for performance
@btime Zygote.gradient(x->real(evaluate($dist,x,$m2)), $x1);
#  80.200 ms (103437 allocations: 69.62 MiB)

The same benchmarks performed on a 2019 desktop computer yields the following timings

julia> @btime Zygote.gradient(x->real(evaluate($dist,x,$x2)), $x1);
  45.926 ms (107748 allocations: 136.18 MiB)

julia> @btime Zygote.gradient(x->real(evaluate($dist,x,$m2)), $x1);
  25.120 ms (103619 allocations: 71.03 MiB)

Unbalanced transport

There are situations in which one would like to avoid fully transporting all mass between two measures. A few such cases are

  • The two measures do not have the same mass. In this case, the standard, balanced, optimal-transport problem is unfeasible.
  • Energy is somehow lost or added to one spectra in a way that should not be accounted for by transport. This would be the case if
    • Spectral energy is absorbed by a channel through which a signal is propagated. In this case it would not make sense to try to transport mass from the other spectrum away from the absorbed (dampended) frequency.
    • Spectral energy is added by a noise source. This energy should ideally not be considered for transport and should rather be destroyed.

For situations like this, an AbstractDivergence can be supplied to the OptimalTransportRootDistance. This divergence specifies how expensive it is to create or destroy mass in the spectra. The available divergences are listed in the docs of UnbalancedOptimalTransport.jl, to which we outsource the solving of the unbalanced problem. For convenience, the wrapper sinkhorn_unbalanced is available to interface the unbalanced solver in the same way as the solvers from this package are interfaced.

julia> using DSP

julia> fm = LS(na = 10);

julia> m1 = fm(filtfilt(ones(10), [10], randn(1000)));
┌ Warning: Roots on negative real axis, no corresponding continuous time representation exists. Consider prefiltering the signal or decreasing the regularization factor.
└ @ SpectralDistances ~/work/SpectralDistances.jl/SpectralDistances.jl/src/ltimodels.jl:83

julia> m2 = fm(filtfilt(ones(5), [5], randn(1000)));
┌ Warning: Roots on negative real axis, no corresponding continuous time representation exists. Consider prefiltering the signal or decreasing the regularization factor.
└ @ SpectralDistances ~/work/SpectralDistances.jl/SpectralDistances.jl/src/ltimodels.jl:83

julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=Balanced())
OptimalTransportRootDistance{Continuous,typeof(identity),typeof(simplex_residueweight),Balanced}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.01, 1, Balanced())

julia> d1 = evaluate(dist,m1,m2)
0.143795527143696

julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=KL(1.0));

julia> d2 = evaluate(dist,m1,m2)
0.13624756673989538

julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=KL(10.0));

julia> d3 = evaluate(dist,m1,m2)
0.1430032060986858

julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=KL(0.01));

julia> d4 = evaluate(dist,m1,m2)
0.029111809605294095

julia> d1 > d3 > d2 > d4
true

When the distance is evaluated the second time, unbalanced transport is used. The d2 should be equal to or smaller than d1. If we make the KL term larger, the distance approaches the balanced cost, and if we make it smaller, it becomes very cheap to create/destroy mass and less mass is transported.

The first case, where divergence=Balanced() was supplied, should be equivalent to not providing any divergence at all. In pratice results might differ slightly since a different solver implementation is used.

Below is an example in which the unbalanced transport between two systems is computed. The two systems do not have the same number of poles, and if destruction of mass is made cheap, not all mass is transported. The thickness of the lines indicate mass flow.

using Plots
m1 = AR(Continuous(), [1, 0.1, 1.3])                        |> change_precision(Float64)
m2 = AR(Continuous(), polyconv([1, 0.1, 1], [1, 0.1, 1.2])) |> change_precision(Float64)
D  = SpectralDistances.distmat_euclidean(m1.pc, m2.pc)
w1, w2 = unitweight.((m1, m2))
figs = map([0.001, 0.01, 0.1]) do tv
    divergence = TV(tv)
    Γ, a, b = sinkhorn_unbalanced(D, w1, w2, divergence, β = 0.01)
    lineS   = 20Γ
    lineS[lineS.<0.1] .= 0
    alphaS  = lineS ./ maximum(lineS)
    f       = scatter(m1.pc, legend = false, ms = 10, title = "TV=$tv")
    scatter!(m2.pc, ms = 10)
    for (i, p1) in enumerate(m1.pc), (j, p2) in enumerate(m2.pc)
        coords = [p1, p2]
        plot!(
            real(coords),
            imag(coords),
            linewidth = lineS[i, j],
            alpha     = alphaS[i, j],
            color     = :black,
        )
    end
    f
end
plot(figs..., layout = (1, 3), ylims = (0.9, 1.2))
┌ Warning: No strict ticks found
└ @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/ticks.jl:283
┌ Warning: No strict ticks found
└ @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/ticks.jl:283
┌ Warning: No strict ticks found
└ @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/ticks.jl:283
┌ Warning: No strict ticks found
└ @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/ticks.jl:283
┌ Warning: No strict ticks found
└ @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/ticks.jl:283
┌ Warning: No strict ticks found
└ @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/ticks.jl:283

Function reference

Docstrings

SpectralDistances.DiscretizedRationalDistanceType
DiscretizedRationalDistance{WT, DT} <: AbstractRationalDistance

This distance discretizes the spectrum before performing the calculations.

Arguments:

  • w::WT = LinRange(0.01, 0.5, 300): Frequency set
  • distmat::DT = distmat_euclidean(w, w): DESCRIPTION
source
SpectralDistances.EnergyDistanceType
EnergyDistance <: AbstractSignalDistance

std(x1) - std(x2) This distance can be added to a loss function to ensure that the energy in the two signals is the same. Some of the optimal transport-based distances are invariant to the energy in the signal, requiring this extra cost if that invariance is not desired. Combining distances is done by putting two or more in a tuple. Usage: combined_loss = (primary_distance, EnergyDistance())

source
SpectralDistances.EuclideanRootDistanceType
EuclideanRootDistance{D, A, F1, F2} <: AbstractRootDistance

Simple euclidean distance between roots of transfer functions

Arguments:

  • domain::D: Discrete or Continuous
  • assignment::A = SortAssignement(imag): Determines how roots are assigned. An alternative is HungarianAssignement
  • transform::F1 = identity: DESCRIPTION
  • weight : A function used to calculate weights for the induvidual root distances. A good option is residueweight
  • p::Int = 2 : Order of the distance
source
SpectralDistances.HungarianRootDistanceType
HungarianRootDistance{D, ID <: Distances.PreMetric, F} <: AbstractRootDistance

Similar to EuclideanRootDistance but does the pole assignment using the Hungarian method.

Arguments:

  • domain::D = Continuous(): Discrete or Continuous
  • distance::ID = SqEuclidean(): Inner distance
  • transform::F = identity: If provided, this Function transforms all roots before the distance is calculated
source
SpectralDistances.KernelWassersteinRootDistanceType
KernelWassersteinRootDistance{D, F, DI} <: AbstractRootDistance

A kernel version of the root distance

Arguments:

  • domain::D = Continuous(): Discrete or Continuous
  • λ::Float64 = 1.0: Kernel precision, lower value means wider kernel.
  • transform::F = identity: If provided, this Function transforms all roots before the distance is calculated
source
SpectralDistances.OptimalTransportHistogramDistanceType
OptimalTransportHistogramDistance{DT} <: AbstractDistance

Optimal transport between two histograms. If you pass two vectors to this distance two histograms will be computed automatically (fit(Histogram,x)). Pass two histograms to get better manual control.

Arguments:

  • p::Int = 1: order
source
SpectralDistances.OptimalTransportRootDistanceType
OptimalTransportRootDistance{D, F1, F2, S} <: AbstractRootDistance

The Sinkhorn distance between roots. The weights are provided by weight, which defaults to residueweight.

Arguments:

  • domain::D = Continuous(): Discrete or Continuous
  • transform::F1 = identity: Probably not needed.
  • weight::F2 =simplex_residueweight: A function used to calculate weights for the induvidual root distances.
  • β::Float64 = 0.01: Amount of entropy regularization
  • p::Int = 2 : Order of the distance
  • divergence::S = nothing: A divergence that penalizes creation and destruction of mass.
source
SpectralDistances.RationalCramerDistanceType
RationalCramerDistance{DT} <: AbstractRationalDistance

Similar to RationalOptimalTransportDistance but does not use inverse functions.

Arguments:

  • domain::DT = Continuous(): Discrete or Continuous
  • p::Int = 2: order
  • interval = (-(float(π)), float(π)): Integration interval
source
SpectralDistances.RationalOptimalTransportDistanceType
CationalOptimalTransportDistance{DT, MT} <: AbstractRationalDistanCe

calculates the Wasserstein distance using the closed-form sold*c for (d,c) in zip(D,C)s and inverse cumulative functions.

C ArgumentC:

  • domain::DT = Continuous(): Discrete or [Continuous](@red*c for (d,c) in zip(D,C)r
  • magnitude::MT = Identity():
  • interval = (-(float(π)), float(π)): Integration interval
source
SpectralDistances.WelchLPDistanceType
WelchLPDistance{AT <: Tuple, KWT <: NamedTuple, F} <: AbstractWelchDistance

Lᵖ distance between welch spectra, mean(abs(x1-x2)^p).

#Arguments:

  • args::AT = (): These are sent to welch_pgram
  • kwargs::KWT = NamedTuple(): These are sent to welch_pgram
  • p::Int = 2: Order of the distance
  • normalized::Bool = true: Normlize spectrum to sum to 1 (recommended)
  • transform::F = identity: Optional function to apply to the spectrum, example log1p or sqrt. Must not produce negative values, so log is not a good idea. The function is applied like this: transform.(x1).
source
SpectralDistances.WelchOptimalTransportDistanceType
WelchOptimalTransportDistance{DT, AT <: Tuple, KWT <: NamedTuple} <: AbstractWelchDistance

Calculates the Wasserstein distance between two signals by estimating a Welch periodogram of each.

Arguments:

  • distmat::DT: you may provide a matrix array for this
  • args::AT = (): Options to the Welch function
  • kwargs::KWT = NamedTuple(): Options to the Welch function
  • p::Int = 2 : Order of the distance
source
SpectralDistances.discrete_grid_transportcostMethod
discrete_grid_transportcost(x::AbstractVector{T}, y::AbstractVector{T}, p = 2, tol=sqrt(eps(T)); inplace=false) where T

Calculate the optimal-transport cost between two vectors that are assumed to have the same support, with sorted and equdistant support points.

The calculated cost corresponds to the following

D = [abs2((i-j)/(n-1)) for i in 1:n, j in 1:n] # note the 1/(n-1)
Γ = discrete_grid_transportplan(x, y)
dot(Γ, D)
  • ´p´ is the power of the distance
  • ´inplaceif true,x` is overwritten.
source
SpectralDistances.discrete_grid_transportplanMethod
discrete_grid_transportplan(x::AbstractVector{T}, y::AbstractVector{T}, tol=sqrt(eps(T)); inplace=false) where T

Calculate the optimal-transport plan between two vectors that are assumed to have the same support, with sorted support points.

  • ´inplaceif true,x` is overwritten.
  • It's possible to supply keyword argument g to provide preallocated memory for the transport plan. Make sure it's all zeros if you do.
source
SpectralDistances.distmatMethod
distmat(dist::Any, e::AbstractArray{T,1} where T; normalize, kwargs...) -> LinearAlgebra.Symmetric

Compute the symmetric, pairwise distance matrix using the specified distance.

  • normalize: set to true to normalize distances such that the diagonal is zero. This is useful for distances that are not true distances due to d(x,y) ≠ 0 such as the OptimalTransportRootDistance

This function uses multiple threads if available and copies the distance object to each thread in case it stores an internal cache.

source
SpectralDistances.precomputeFunction
precompute(d::AbstractDistance, As, threads=true)

Perform computations that only need to be donce once when several pairwise distances are to be computed

Arguments:

  • As: A vector of models
  • threads: Us multithreading? (true)
source
SpectralDistances.ConvOptimalTransportDistanceType
ConvOptimalTransportDistance <: AbstractDistance

Distance between matrices caluclated using sinkhorn_convolutional. This type automatically creates a workspace object that is reused between invocations. Functions that internally performs threaded computations copy this object to each thread, but if manual threading is performed, this must be handled manually, e.g.:

using ThreadTools
dists = [deepcopy(dist) for _ in 1:Threads.nthreads()]
D = tmap(eachindex(X)) do i
    evaluate(dists[Threads.threadid()], Q, X[i]; kwargs...)
end

It's important to tune the two parameters below, see the docstring for sinkhorn_convolutional for more help.

  • To get sharp barycenters, a smaller β around 0.001 is recommended.

  • To get smooth distance profiles (distance_profile), a slightly higher β than for barycenters is recommended. β around 0.01-0.05 should do fine.

  • β = 0.001

  • dynamic_floor = -10.0

  • invariant_axis::Int = 0 If this is set to 1 or 2, the distance will be approximately invariant to translations along the invariant axis. As an example, to be invariant to a spectrogram being shifted slightly in time, set invariant_axis = 2.

See also SlidingConvOptimalTransportDistance

source
SpectralDistances.SCWorkspaceMethod
SCWorkspace(A, B, β)

Workspace object for sinkhorn_convolutional. Manually construct this in order to save allocations between consequtive calls to the solver.

Arguments:

  • A: The first matrix
  • B: The second matrix
  • β: the regularization parameter
source
SpectralDistances.IPOTFunction
Γ, u, v = IPOT(C, a, b; β=1, iters=1000)

The Inexact Proximal point method for exact Optimal Transport problem (IPOT) (Sinkhorn-like) algorithm. C is the cost matrix and a,b are vectors that sum to one. Returns the optimal plan and the dual potentials. See also sinkhorn. β does not have to go to 0 for this alg to return the optimal distance, in fact, if β is set too low, this alg will encounter numerical problems.

A Fast Proximal Point Method for Computing Exact Wasserstein Distance Yujia Xie, Xiangfeng Wang, Ruijia Wang, Hongyuan Zha https://arxiv.org/abs/1802.04307

source
SpectralDistances.mask_filterFunction
mask_filter(X::Matrix, th=0.25)
mask_filter!(Y, X=Y, th=0.25)

Mask out parts of a spectrogram that only contains impulsive noise. ´th ∈ (0,1) determines the amount of masking, a higher value masks out more. This function is to be applied after something like normalize_spectrogram.

source
SpectralDistances.sinkhornMethod
Γ, u, v = sinkhorn(C, a, b; β=1e-1, iters=1000)

The Sinkhorn algorithm. C is the cost matrix and a,b are vectors that sum to one. Returns the optimal plan and the dual potentials. This function is relatively slow, see also sinkhorn_log! IPOT and sinkhorn_log for faster algorithms.

source
SpectralDistances.sinkhorn_convolutionalMethod
sinkhorn_convolutional(w::SCWorkspace{T}, A::AbstractMatrix, B::AbstractMatrix; β = 0.01, τ = 1 / eps(T), iters = 1000, tol = 1.0e-6, ϵ = eps(T) ^ 2, verbose = false, initUV = true) where T

Calculate the entropically regularizaed Sinkhorn distance between two matrices where the ground cost is squared euclidean. This function uses an efficient convolutional algorithm and is much more efficient than the corresponding sinkhorn_log! in this special case.

This function spends most of its time doing matrix multiplications. You may want to tune BLAS.set_num_threads in order to maximize performance. 1-2 threads is often best, especially if you thread outside of calls to this function.

Arguments:

  • w: workspace object
  • A: The first matrix
  • B: The second matrix
  • β: regularization parameter. To get smooth distance profiles (distance_profile), a slightly higher β than for barycenters is recommended. β around 0.01 should do fine.
  • τ: stabilization parameter
  • iters: maximum number of iterations
  • tol: tolerance (change in oen of the dual variables)
  • ϵ: other stabilization parameter
  • verbose: print stuff?
  • initUV: if true, initializes dual variables in w to one. It can be useful to set this to false if you have a good initial guess (warm start). Set the corresponding w.U, w.V to provide the initial guess.
source
SpectralDistances.sinkhorn_log!Method

Same as sinkhorn_log but operates in-place to save memory allocations. This function has higher performance than sinkhorn_log, but might not work as well with AD libraries.

This function can be made completely allocation free with the interface sinkhorn_log(w::SinkhornLogWorkspace{T}, C, a, b; kwargs...)

The sinkhorn_log! solver also accepts a keyword argument check_interval = 20 that determines how often the convergence criteria is checked. If β is large, the algorithm might converge very fast and you can save some iterations by reducing the check interval. If β is small and the algorithm requires many iterations, a larger number saves you from computing the check too often.

The workspace w is created linke this: w = SinkhornLogWorkspace(FloatType, length(a), length(b))

source
SpectralDistances.sinkhorn_logMethod
Γ, u, v = sinkhorn_log(C, a, b; β=1e-1, iters=1000, tol=1e-8)

The Sinkhorn algorithm (log-stabilized). C is the cost matrix and a,b are vectors that sum to one. Returns the optimal plan and the dual potentials. See also sinkhorn_log! for a faster implementation operating in-place, and IPOT for a potentially more exact solution.

When this function is being differentiated, warnings about inaccurate solutions are turned off. You may choose to manually asses the error in the constrains by ea, eb = SpectralDistances.ot_error(Γ, a, b).

The IPOT algorithm: https://arxiv.org/pdf/1610.06519.pdf

source
SpectralDistances.sinkhorn_unbalancedMethod
Γ, u, v = sinkhorn_unbalanced(C, a, b, divergence; β=1e-1, iters=1000, tol=1e-8)

The Unbalanced Sinkhorn algorithm (log-stabilized). C is the cost matrix and a,b are vectors that are not required to sum to one.

Ref: "Sinkhorn Divergences for Unbalanced Optimal Transport" https://arxiv.org/abs/1910.12958 Makes use of UnbalancedOptimalTransport.jl

source
SpectralDistances.ot_jumpMethod
ot_jump(D, P1, P2)

Solve the optimal transport problem using JuMP. This function is only available if using JuMP, GLPK.

Arguments:

  • D: Distance matrix
  • P1: Weight vector 1
  • P2: Weight vector 2
source
SpectralDistances.ot_convexMethod
ot_convex(D, P1, P2)

Solve the optimal transport problem using Convex.jl. This function is only available if using Convex.jl, GLPK.

Arguments:

  • D: Distance matrix
  • P1: Weight vector 1
  • P2: Weight vector 2
source
SpectralDistances.complete_distmatFunction
D̃, S = complete_distmat(D, W, λ = 2)

Takes an incomplete squared Euclidean distance matrix D and fills in the missing entries indicated by the mask W. W is a BitArray or array of {0,1} with 0 denoting a missing value. Returns the completed matrix and an SVD object that allows reconstruction of the generating point set X.

NOTE This function is only available after using Convex, SCS.

Arguments:

  • D: The incomplete matrix
  • W: The mask
  • λ: Regularization parameter. A higher value enforces better data fitting, which might be required if the number of entries in D is very small.

Example:

using Distances
P = randn(2,40)
D = pairwise(SqEuclidean(), P)
W = rand(size(D)...) .> 0.3 # Create a random mask
W = (W + W') .> 0           # It makes sense for the mask to be symmetric
W[diagind(W)] .= true
D0 = W .* D                 # Remove missing entries

D2, S = complete_distmat(D0, W)

@show (norm(D-D2)/norm(D))
@show (norm(W .* (D-D2))/norm(D))

The set of points that created D can be reconstructed up to an arbitrary rotation and translation, X contains the reconstruction in the d first rows, where d is the dimension of the point coordinates. To reconstruct X using S, do

X  = Diagonal(sqrt.(S.S))*S.Vt

# Verify that reconstructed `X` is correct up to rotation and translation
A = [X[1:2,:]' ones(size(D,1))]
P2 = (A*(A \ P'))'
norm(P-P2)/norm(P) # Should be small

Ref: Algorithm 5 from "Euclidean Distance Matrices: Essential Theory, Algorithms and Applications" Ivan Dokmanic, Reza Parhizkar, Juri Ranieri and Martin Vetterli https://arxiv.org/pdf/1502.07541.pdf

source

Details

Transport-based distances may require some tuning parameters to be set for the solvers. The available solvers are

  • sinkhorn: not recommended due to numerical issues, but this is the most commonly cited algorithm.
  • sinkhorn_log: better numerical stability than the standard.
  • sinkhorn_log!: in-place version that is faster, but some AD libraries might not like it (often the default if no solver is provided).
  • sinkhorn_unbalanced: this solver accepts a divergence that penalizes creation/destruction of mass. It thus handles measure of different masses and can choose to create/destroy mass instead of transporting it.
  • IPOT Finds exact solution (without entropy regularization), requires β around 0.1-1.
  • ot_jump: exact solution using JuMP, requires using JuMP, GLPK before it becomes available.
  • ot_convex: exact solution using Convex.jl, requires using Convex, GLPK before it becomes available.
  • sinkhorn_convolutional: applicable only to 2D grids, but is very efficient in this case. ConvOptimalTransportDistance is the corresponding distance type.

Providing solver and options

options = (solver=sinkhorn_log!, tol=1e-6, iters=100_000)
distance = OptimalTransportRootDistance(domain=Continuous(), p=1, β=0.001)
SpectralDistances.evaluate(distance, model1, model2; options...)

Maximum performance

This solver sinkhorn_log! can be made completely allocation free with the interface

sinkhorn_log(w::SinkhornLogWorkspace{T}, C, a, b; kwargs...)

The workspace w is created linke this:

w = SinkhornLogWorkspace(eltype(a), length(a), length(b))

This will save you both allocations and time if called multiple times, especially important if you intend to make use of multiple threads. For multiple threads, make sure to create one workspace for each thread.

The sinkhorn_log! solver also accepts a keyword argument check_interval = 20 that determines how often the convergence criteria is checked. If β is large, the algorithm might converge very fast and you can save some iterations by reducing the check interval. If β is small and the algorithm requires many iterations, a larger number saves you from computing the check too often.

See also inplace functions