Distances
Overview
The following is a reference for all the distances defined in this package. Once a distance object is defined, it can be evaluated in one of two ways, following the Distances.jl interface:
dist = DistanceType(options)
d = evaluate(dist, x1, x2; kwargs...) # keyword arguments control the solvers for some transport-based distances
d = dist(x1, x2) # A shorter syntax for calling the distance
All distances return the distance raised to the power p; thus RationalOptimalTransportDistance(p=2)(x1,x2) $= W_2(x_1,x_2)^2$, where $W_2$ denotes the Wasserstein distance of order 2.
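As a quick illustration of the two call styles, here is a minimal sketch using the same kind of distance constructed in the examples further down (the signals and the model order are arbitrary):
using SpectralDistances
innerdistance = OptimalTransportRootDistance(domain=Continuous(), β=0.005, p=2)
dist = ModelDistance(TLS(na=10), innerdistance)   # operates directly on signals
x1, x2 = randn(1000), randn(1000)
d1 = evaluate(dist, x1, x2)   # explicit evaluate interface
d2 = dist(x1, x2)             # shorter call syntax, same value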
Before we proceed, we list the classes of distances that are available:
SpectralDistances.AbstractDistance
SpectralDistances.AbstractRationalDistance
SpectralDistances.AbstractSignalDistance
SpectralDistances.BuresDistance
SpectralDistances.CoefficientDistance
SpectralDistances.ConvOptimalTransportDistance
SpectralDistances.DiscreteGridTransportDistance
SpectralDistances.DiscreteGridTransportDistance
SpectralDistances.DiscretizedRationalDistance
SpectralDistances.EnergyDistance
SpectralDistances.EuclideanRootDistance
SpectralDistances.HungarianRootDistance
SpectralDistances.KernelWassersteinRootDistance
SpectralDistances.ModelDistance
SpectralDistances.OptimalTransportHistogramDistance
SpectralDistances.OptimalTransportRootDistance
SpectralDistances.RationalCramerDistance
SpectralDistances.RationalOptimalTransportDistance
SpectralDistances.SCWorkspace
SpectralDistances.SlidingConvOptimalTransportDistance
SpectralDistances.SymmetricDistance
SpectralDistances.WelchLPDistance
SpectralDistances.WelchOptimalTransportDistance
Some of these distances operate directly on signals; these are
EnergyDistance
ModelDistance
SpectralDistances.AbstractWelchDistance
Of these, ModelDistance is a bit special; it works like this:
SpectralDistances.ModelDistance
— Type
ModelDistance{D <: AbstractDistance} <: AbstractSignalDistance
A model distance operates on signals and works by fitting an LTI model to the signals before calculating the distance. The distance between the LTI models is defined by the field distance. This is essentially a wrapper around the inner distance that handles the fitting of a model to the signals. How the model is fit is determined by fitmethod.
Arguments:
- fitmethod: determines how the LTI model is fit to the signals (e.g. TLS or LS)
- distance: the inner distance evaluated between the fitted models
Example:
using SpectralDistances
innerdistance = OptimalTransportRootDistance(domain=Continuous(), β=0.005, p=2)
dist = ModelDistance(TLS(na=30), innerdistance)
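The resulting object can then be applied directly to two raw signals; a small usage sketch (the signals below are arbitrary placeholders):
y1, y2 = randn(2000), randn(2000)
d = dist(y1, y2)   # fits a model to each signal, then evaluates innerdistance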
The inner distance in ModelDistance can be any AbstractRationalDistance. The options are (a small construction sketch follows the list):
DiscretizedRationalDistance
RationalCramerDistance
RationalOptimalTransportDistance
SpectralDistances.AbstractCoefficientDistance
SpectralDistances.AbstractRootDistance
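A hedged sketch of using a rational inner distance (the model order and solver settings are arbitrary illustration values):
inner = RationalOptimalTransportDistance(domain=Continuous(), p=2)
dist = ModelDistance(TLS(na=30), inner)
d = dist(randn(1000), randn(1000))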
These distances operate on LTI models. Some operate on the coefficients of the models:
CoefficientDistance
and some operate on the roots of the models (see the sketch after this list):
EuclideanRootDistance
HungarianRootDistance
KernelWassersteinRootDistance
OptimalTransportRootDistance
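For instance, a root distance can be evaluated directly between two fitted models. A minimal sketch, assuming white-noise test signals, an arbitrary model order, and that the keyword constructor with default weights applies:
fm = TLS(na=6)
m1, m2 = fm(randn(1000)), fm(randn(1000))
rdist = EuclideanRootDistance(domain=Continuous(), p=2)
d = evaluate(rdist, m1, m2)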
A full example
To use the OptimalTransportRootDistance and let it operate on signals, we may construct our distance object as follows:
julia> innerdistance = OptimalTransportRootDistance(domain=Continuous(), β=0.005, p=2)
OptimalTransportRootDistance{Continuous, typeof(identity), typeof(simplex_residueweight), Nothing}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.005, 2, nothing)
julia> dist = ModelDistance(TLS(na=10), innerdistance)
ModelDistance{OptimalTransportRootDistance{Continuous, typeof(identity), typeof(simplex_residueweight), Nothing}, TLS}(TLS(10, 0.0), OptimalTransportRootDistance{Continuous, typeof(identity), typeof(simplex_residueweight), Nothing}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.005, 2, nothing))
julia> X1, X2 = randn(1000), randn(1000);
julia> dist(X1,X2)
0.2520217583902977
julia> dist = ModelDistance(LS(na=2), innerdistance);
julia> t = 0:0.01:10;
julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*1.1 .*t); # Two signals that are close in frequency
julia> dist(X1,X2)
0.0005678342162688657
julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*2 .*t); # Two signals that are further apart in frequency
julia> dist(X1,X2)
0.004073228552286193
Using Welch periodograms
We can calculate the Wasserstein distance between spectra estimated using the Welch method like so
julia> dist = WelchOptimalTransportDistance(p=2)
WelchOptimalTransportDistance{Nothing, Tuple{}, NamedTuple{(), Tuple{}}}(nothing, (), NamedTuple(), 2)
julia> X1, X2 = randn(1000), randn(1000);
julia> dist(X1,X2)
0.000665687115159275
julia> t = 0:0.01:10;
julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*1.1 .*t); # Two signals that are close in frequency
julia> dist(X1,X2)
0.00035863696706223353
julia> X1, X2 = sin.(2π*1 .*t), sin.(2π*2 .*t); # Two signals that are further apart in frequency
julia> dist(X1,X2)
0.0011840976205946979
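The Welch estimate itself can be tuned through the args and kwargs fields, which are forwarded to DSP.welch_pgram. A hedged sketch (the segment length, overlap and sample rate below are arbitrary illustration values):
using SpectralDistances, DSP
dist = WelchOptimalTransportDistance(p=2, args=(256, 128), kwargs=(fs=100,))
t = 0:0.01:10
d = dist(sin.(2π*1 .* t), sin.(2π*1.2 .* t))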
Gradients
Some distances will allow you to propagate gradients through them. Below is an example using Zygote and the OptimalTransportRootDistance
using Zygote
Zygote.@nograd rand # Currently needed workaround
x1 = SpectralDistances.bp_filter(randn(100000), (0.1,0.3)) # Create two signals
x2 = SpectralDistances.bp_filter(randn(100000), (0.1,0.2))
fm = LS(na=10) # LS is the best supported fitmethod for gradients
dist = ModelDistance(fm,OptimalTransportRootDistance(domain = Continuous())) # Since we're measuring distance between signals, we wrap the distance in a ModelDistance
ModelDistance{OptimalTransportRootDistance{Continuous, typeof(identity), typeof(simplex_residueweight), Nothing}, LS}(LS(10, 0.01), OptimalTransportRootDistance{Continuous, typeof(identity), typeof(simplex_residueweight), Nothing}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.01, 2, nothing))
julia> dist(x1,x2)
0.028026726850105148
julia> ∇x1 = Zygote.gradient(x->real(evaluate(dist,x,x2)), x1)[1] # The call to real is a workaround for a Zygote bug
ERROR: Compiling Tuple{SpectralDistances.var"##sinkhorn_log!#168", Float64, Float64, Int64, Float64, Int64, Int64, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, typeof(sinkhorn_log!), SpectralDistances.SinkhornLogWorkspace{ComplexF64, Matrix{ComplexF64}, Vector{ComplexF64}}, Matrix{Float64}, Vector{ComplexF64}, Vector{ComplexF64}}: try/catch is not supported. Refer to the Zygote documentation for fixes. https://fluxml.ai/Zygote.jl/latest/limitations
The differentiation takes some time, but it should be fast enough to be generally useful for gradient-based learning of autoencoders etc. The following is a benchmark performed on an old laptop without GPU (the distances are not yet tested on GPUs)
@btime Zygote.gradient(x->real(evaluate($dist,x,$x2)), $x1);
# 134.965 ms (107566 allocations: 134.77 MiB)
With a precomputed reference model, it goes even faster:
m2 = fm(x2)
m2 = change_precision(Float64, m2) # This step is important for performance
@btime Zygote.gradient(x->real(evaluate($dist,x,$m2)), $x1);
# 80.200 ms (103437 allocations: 69.62 MiB)
The same benchmarks performed on a 2019 desktop computer yields the following timings
julia> @btime Zygote.gradient(x->real(evaluate($dist,x,$x2)), $x1);
45.926 ms (107748 allocations: 136.18 MiB)
julia> @btime Zygote.gradient(x->real(evaluate($dist,x,$m2)), $x1);
25.120 ms (103619 allocations: 71.03 MiB)
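As a sketch of how such gradients might be used for gradient-based learning, the snippet below performs a few steps of plain gradient descent that nudge one signal toward the spectrum of a precomputed reference model (the step size and iteration count are arbitrary, and this is not a recipe from the package itself):
using Zygote, SpectralDistances
Zygote.@nograd rand                         # same workaround as above
x  = SpectralDistances.bp_filter(randn(10_000), (0.1, 0.3))
x2 = SpectralDistances.bp_filter(randn(10_000), (0.1, 0.2))
fm = LS(na=10)
dist = ModelDistance(fm, OptimalTransportRootDistance(domain=Continuous()))
m2 = change_precision(Float64, fm(x2))      # precomputed reference model
for i in 1:5
    g = Zygote.gradient(x -> real(evaluate(dist, x, m2)), x)[1]
    x .-= 1e-3 .* g                         # in-place update of the signal
end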
Unbalanced transport
There are situations in which one would like to avoid fully transporting all mass between two measures. A few such cases are
- The two measures do not have the same mass. In this case, the standard, balanced, optimal-transport problem is infeasible.
- Energy is somehow lost or added to one spectrum in a way that should not be accounted for by transport. This would be the case if
- Spectral energy is absorbed by a channel through which a signal is propagated. In this case it would not make sense to transport mass in the other spectrum away from the absorbed (dampened) frequency.
- Spectral energy is added by a noise source. This energy should ideally not be considered for transport and should rather be destroyed.
For situations like this, an AbstractDivergence can be supplied to the OptimalTransportRootDistance. This divergence specifies how expensive it is to create or destroy mass in the spectra. The available divergences are listed in the docs of UnbalancedOptimalTransport.jl, to which we outsource the solving of the unbalanced problem. For convenience, the wrapper sinkhorn_unbalanced is available to interface the unbalanced solver in the same way as the solvers from this package are interfaced.
julia> using DSP
julia> fm = LS(na = 10);
julia> m1 = fm(filtfilt(ones(10), [10], randn(1000)));
┌ Warning: Roots on negative real axis, no corresponding continuous time representation exists. Consider prefiltering the signal or decreasing the regularization factor. └ @ SpectralDistances ~/work/SpectralDistances.jl/SpectralDistances.jl/src/ltimodels.jl:83
julia> m2 = fm(filtfilt(ones(5), [5], randn(1000)));
julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=Balanced())
OptimalTransportRootDistance{Continuous, typeof(identity), typeof(simplex_residueweight), Balanced}(Continuous(), identity, SpectralDistances.simplex_residueweight, 0.01, 1, Balanced())
julia> d1 = evaluate(dist,m1,m2)
0.11280317723019066
julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=KL(1.0));
julia> d2 = evaluate(dist,m1,m2)
0.10693322616421522
julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=KL(10.0));
julia> d3 = evaluate(dist,m1,m2)
0.11218407751283632
julia> dist = OptimalTransportRootDistance(domain = Continuous(), p=1, divergence=KL(0.01));
julia> d4 = evaluate(dist,m1,m2)
0.027840191480047752
julia> d1 > d3 > d2 > d4
true
When the distance is evaluated the second time, unbalanced transport is used, and d2 should be less than or equal to d1. If we make the KL term larger, the distance approaches the balanced cost; if we make it smaller, it becomes very cheap to create/destroy mass and less mass is transported.
The first case, where divergence=Balanced() was supplied, should be equivalent to not providing any divergence at all. In practice, results might differ slightly since a different solver implementation is used.
Below is an example in which the unbalanced transport between two systems is computed. The two systems do not have the same number of poles, and if destruction of mass is made cheap, not all mass is transported. The thickness of the lines indicates mass flow.
using Plots
m1 = AR(Continuous(), [1, 0.1, 1.3]) |> change_precision(Float64)
m2 = AR(Continuous(), polyconv([1, 0.1, 1], [1, 0.1, 1.2])) |> change_precision(Float64)
D = SpectralDistances.distmat_euclidean(m1.pc, m2.pc)
w1, w2 = unitweight.((m1, m2))
figs = map([0.001, 0.01, 0.1]) do tv
divergence = TV(tv)
Γ, a, b = sinkhorn_unbalanced(D, w1, w2, divergence, β = 0.01)
lineS = 20Γ
lineS[lineS.<0.1] .= 0
alphaS = lineS ./ maximum(lineS)
f = scatter(m1.pc, legend = false, ms = 10, title = "TV=$tv")
scatter!(m2.pc, ms = 10)
for (i, p1) in enumerate(m1.pc), (j, p2) in enumerate(m2.pc)
coords = [p1, p2]
plot!(
real(coords),
imag(coords),
linewidth = lineS[i, j],
alpha = alphaS[i, j],
color = :black,
)
end
f
end
plot(figs..., layout = (1, 3), ylims = (0.9, 1.2))
Function reference
SpectralDistances.IPOT
SpectralDistances.complete_distmat
SpectralDistances.discrete_grid_transportcost
SpectralDistances.discrete_grid_transportplan
SpectralDistances.distmat
SpectralDistances.distmat_euclidean
SpectralDistances.distmat_euclidean!
SpectralDistances.domain
SpectralDistances.domain_transform
SpectralDistances.mask_filter
SpectralDistances.normalize_spectrogram
SpectralDistances.ot_convex
SpectralDistances.ot_jump
SpectralDistances.precompute
SpectralDistances.sinkhorn
SpectralDistances.sinkhorn_convolutional
SpectralDistances.sinkhorn_convolutional
SpectralDistances.sinkhorn_log
SpectralDistances.sinkhorn_log!
SpectralDistances.sinkhorn_unbalanced
Docstrings
SpectralDistances.AbstractDistance
— Type
The top level distance type
SpectralDistances.AbstractRationalDistance
— Type
All subtypes of this type operate on rational transfer functions
SpectralDistances.AbstractSignalDistance
— Type
All subtypes of this type operate on signals
SpectralDistances.BuresDistance
— Type
BuresDistance <: AbstractDistance
Distance between pos.def. matrices
SpectralDistances.CoefficientDistance
— Type
CoefficientDistance{D, ID} <: AbstractCoefficientDistance
Distance metric based on model coefficients
Arguments:
- domain::D: Discrete or Continuous
- distance::ID = SqEuclidean(): Inner distance between coefficients
SpectralDistances.DiscreteGridTransportDistance
— Type
DiscreteGridTransportDistance{DT} <: AbstractDistance
Optimal transport between two measures on a common discrete grid.
SpectralDistances.DiscreteGridTransportDistance
— Method
DiscreteGridTransportDistance(Cityblock(), Float32, n, n)
The constructor takes an inner distance, the numeric type and the length of the two measures.
SpectralDistances.DiscretizedRationalDistance
— Type
DiscretizedRationalDistance{WT, DT} <: AbstractRationalDistance
This distance discretizes the spectrum before performing the calculations.
Arguments:
- w::WT = LinRange(0.01, 0.5, 300): Frequency set
- distmat::DT = distmat_euclidean(w, w): Distance matrix over the frequency grid
SpectralDistances.EnergyDistance
— Type
EnergyDistance <: AbstractSignalDistance
std(x1) - std(x2)
This distance can be added to a loss function to ensure that the energy in the two signals is the same. Some of the optimal transport-based distances are invariant to the energy in the signal, requiring this extra cost if that invariance is not desired. Combining distances is done by putting two or more in a tuple. Usage: combined_loss = (primary_distance, EnergyDistance())
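A hedged sketch of such a combination (the tuple elements are evaluated and summed manually here, and the choice of primary distance is arbitrary):
primary  = ModelDistance(TLS(na=10), OptimalTransportRootDistance(domain=Continuous()))
combined = (primary, EnergyDistance())
x1, x2 = 2 .* randn(1000), randn(1000)     # signals with different energy
loss = sum(d(x1, x2) for d in combined)    # manual sum over the tuple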
SpectralDistances.EuclideanRootDistance
— Type
EuclideanRootDistance{D, A, F1, F2} <: AbstractRootDistance
Simple Euclidean distance between roots of transfer functions
Arguments:
- domain::D: Discrete or Continuous
- assignment::A = SortAssignement(imag): Determines how roots are assigned. An alternative is HungarianAssignement
- transform::F1 = identity: If provided, this function transforms all roots before the distance is calculated
- weight: A function used to calculate weights for the individual root distances. A good option is residueweight
- p::Int = 2: Order of the distance
SpectralDistances.HungarianRootDistance
— Type
HungarianRootDistance{D, ID <: Distances.PreMetric, F} <: AbstractRootDistance
Similar to EuclideanRootDistance but does the pole assignment using the Hungarian method.
Arguments:
- domain::D = Continuous(): Discrete or Continuous
- distance::ID = SqEuclidean(): Inner distance
- transform::F = identity: If provided, this function transforms all roots before the distance is calculated
SpectralDistances.KernelWassersteinRootDistance
— Type
KernelWassersteinRootDistance{D, F, DI} <: AbstractRootDistance
A kernel version of the root distance
Arguments:
- domain::D = Continuous(): Discrete or Continuous
- λ::Float64 = 1.0: Kernel precision, lower value means wider kernel.
- transform::F = identity: If provided, this function transforms all roots before the distance is calculated
SpectralDistances.OptimalTransportHistogramDistance
— Type
OptimalTransportHistogramDistance{DT} <: AbstractDistance
Optimal transport between two histograms. If you pass two vectors to this distance, two histograms will be computed automatically (fit(Histogram, x)). Pass two histograms to get better manual control.
Arguments:
- p::Int = 1: order
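A minimal usage sketch (the two vectors are arbitrary samples; the histograms are built automatically):
hdist = OptimalTransportHistogramDistance(p=1)
d = hdist(randn(1000), randn(1000) .+ 0.5)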
SpectralDistances.OptimalTransportRootDistance
— Type
OptimalTransportRootDistance{D, F1, F2, S} <: AbstractRootDistance
The Sinkhorn distance between roots. The weights are provided by weight, which defaults to residueweight.
Arguments:
- domain::D = Continuous(): Discrete or Continuous
- transform::F1 = identity: Probably not needed.
- weight::F2 = simplex_residueweight: A function used to calculate weights for the individual root distances.
- β::Float64 = 0.01: Amount of entropy regularization
- p::Int = 2: Order of the distance
- divergence::S = nothing: A divergence that penalizes creation and destruction of mass.
SpectralDistances.RationalCramerDistance
— Type
RationalCramerDistance{DT} <: AbstractRationalDistance
Similar to RationalOptimalTransportDistance but does not use inverse functions.
Arguments:
- domain::DT = Continuous(): Discrete or Continuous
- p::Int = 2: order
- interval = (-(float(π)), float(π)): Integration interval
SpectralDistances.RationalOptimalTransportDistance
— Type
RationalOptimalTransportDistance{DT, MT} <: AbstractRationalDistance
Calculates the Wasserstein distance using closed-form solutions and inverse cumulative functions.
Arguments:
- domain::DT = Continuous(): Discrete or Continuous
- magnitude::MT = Identity()
- interval = (-(float(π)), float(π)): Integration interval
SpectralDistances.SymmetricDistance
— Type
SymmetricDistance{D} <: AbstractDistance
Evaluates d(x,y) - 0.5(d(x,x) + d(y,y))
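A hedged sketch, assuming the wrapper is constructed by passing the inner distance as the single positional argument:
inner = WelchOptimalTransportDistance(p=2)
sdist = SymmetricDistance(inner)
d = sdist(randn(1000), randn(1000))   # d(x,y) - 0.5(d(x,x) + d(y,y))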
SpectralDistances.WelchLPDistance
— Type
WelchLPDistance{AT <: Tuple, KWT <: NamedTuple, F} <: AbstractWelchDistance
Lᵖ distance between Welch spectra, mean(abs(x1-x2)^p).
Arguments:
- args::AT = (): These are sent to welch_pgram
- kwargs::KWT = NamedTuple(): These are sent to welch_pgram
- p::Int = 2: Order of the distance
- normalized::Bool = true: Normalize spectrum to sum to 1 (recommended)
- transform::F = identity: Optional function to apply to the spectrum, for example log1p or sqrt. Must not produce negative values, so log is not a good idea. The function is applied like this: transform.(x1).
SpectralDistances.WelchOptimalTransportDistance
— Type
WelchOptimalTransportDistance{DT, AT <: Tuple, KWT <: NamedTuple} <: AbstractWelchDistance
Calculates the Wasserstein distance between two signals by estimating a Welch periodogram of each.
Arguments:
- distmat::DT: you may provide a precomputed distance matrix for this
- args::AT = (): Options to the Welch function
- kwargs::KWT = NamedTuple(): Options to the Welch function
- p::Int = 2: Order of the distance
SpectralDistances.discrete_grid_transportcost
— Method
discrete_grid_transportcost(x::AbstractVector{T}, y::AbstractVector{T}, p = 2, tol=sqrt(eps(T)); inplace=false) where T
Calculate the optimal-transport cost between two vectors that are assumed to have the same support, with sorted and equidistant support points.
The calculated cost corresponds to the following
D = [abs2((i-j)/(n-1)) for i in 1:n, j in 1:n] # note the 1/(n-1)
Γ = discrete_grid_transportplan(x, y)
dot(Γ, D)
- p is the power of the distance
- inplace: if true, x is overwritten.
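A small usage sketch (two arbitrary densities on the same equidistant grid, normalized to equal mass):
using LinearAlgebra
x = normalize(rand(50), 1)   # mass normalized to one
y = normalize(rand(50), 1)
c = discrete_grid_transportcost(x, y, 2)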
SpectralDistances.discrete_grid_transportplan
— Method
discrete_grid_transportplan(x::AbstractVector{T}, y::AbstractVector{T}, tol=sqrt(eps(T)); inplace=false) where T
Calculate the optimal-transport plan between two vectors that are assumed to have the same support, with sorted support points.
- inplace: if true, x is overwritten.
- It's possible to supply the keyword argument g to provide preallocated memory for the transport plan. Make sure it's all zeros if you do.
SpectralDistances.distmat
— Method
distmat(dist, e::AbstractVector; normalize, kwargs...) -> Any
Compute the symmetric, pairwise distance matrix using the specified distance.
- normalize: set to true to normalize distances such that the diagonal is zero. This is useful for distances that are not true distances due to d(x,x) ≠ 0, such as the OptimalTransportRootDistance
This function uses multiple threads if available and copies the distance object to each thread in case it stores an internal cache.
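A minimal sketch of computing a pairwise matrix over a collection of fitted models (the model order, number of models and solver settings are arbitrary):
fm = TLS(na=4)
models = [fm(randn(1000)) for _ in 1:5]
rdist = OptimalTransportRootDistance(domain=Continuous(), β=0.01, p=2)
D = distmat(rdist, models, normalize=true)   # 5×5 symmetric matrix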
SpectralDistances.distmat_euclidean
— Function
distmat_euclidean(e1::AbstractVector, e2::AbstractVector, p = 2)
The Euclidean distance matrix between two vectors of complex numbers
SpectralDistances.distmat_euclidean!
— Function
distmat_euclidean!(D, e1::AbstractVector, e2::AbstractVector, p = 2)
In-place version
SpectralDistances.domain
— Method
domain(d::AbstractDistance)
domain(d) -> Discrete{Int64}
Return the domain of the distance
SpectralDistances.domain_transform
— Method
domain_transform(d::AbstractDistance, e)
Change domain of roots
SpectralDistances.precompute
— Function
precompute(d::AbstractDistance, As, threads=true)
Perform computations that only need to be done once when several pairwise distances are to be computed
Arguments:
- As: A vector of models
- threads: Use multithreading? (true)
SpectralDistances.ConvOptimalTransportDistance
— Type
ConvOptimalTransportDistance <: AbstractDistance
Distance between matrices calculated using sinkhorn_convolutional. This type automatically creates a workspace object that is reused between invocations. Functions that internally perform threaded computations copy this object to each thread, but if manual threading is performed, this must be handled manually, e.g.:
using ThreadTools
dists = [deepcopy(dist) for _ in 1:Threads.nthreads()]
D = tmap(eachindex(X)) do i
    evaluate(dists[Threads.threadid()], Q, X[i]; kwargs...)
end
It's important to tune the two parameters below; see the docstring for sinkhorn_convolutional for more help.
- To get sharp barycenters, a smaller β around 0.001 is recommended.
- To get smooth distance profiles (distance_profile), a slightly higher β than for barycenters is recommended; β around 0.01-0.05 should do fine.
Arguments:
- β = 0.001
- dynamic_floor = -10.0
- invariant_axis::Int = 0: If this is set to 1 or 2, the distance will be approximately invariant to translations along the invariant axis. As an example, to be invariant to a spectrogram being shifted slightly in time, set invariant_axis = 2.
See also SlidingConvOptimalTransportDistance
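A hedged sketch of evaluating this distance between two spectrograms (the STFT window length, β and the manual mass normalization are illustration choices; the normalization may be redundant if the distance normalizes internally):
using DSP
x1, x2 = randn(10_000), randn(10_000)
S1 = spectrogram(x1, 256).power; S1 = S1 ./ sum(S1)
S2 = spectrogram(x2, 256).power; S2 = S2 ./ sum(S2)
cdist = ConvOptimalTransportDistance(β=0.01)
d = cdist(S1, S2)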
SpectralDistances.SCWorkspace
— Method
SCWorkspace(A, B, β)
Workspace object for sinkhorn_convolutional. Manually construct this in order to save allocations between consecutive calls to the solver.
Arguments:
- A: The first matrix
- B: The second matrix
- β: the regularization parameter
SpectralDistances.SlidingConvOptimalTransportDistance
— Type
SlidingConvOptimalTransportDistance
Similar to ConvOptimalTransportDistance but lets the shorter signal slide across the longer signal and returns the minimum distance. Equivalent to calling
minimum(distance_profile(d::ConvOptimalTransportDistance, a, b; kwargs...))
SpectralDistances.IPOT
— Function
Γ, u, v = IPOT(C, a, b; β=1, iters=1000)
The Inexact Proximal point method for exact Optimal Transport problem (IPOT) (Sinkhorn-like) algorithm. C is the cost matrix and a, b are vectors that sum to one. Returns the optimal plan and the dual potentials. See also sinkhorn. β does not have to go to 0 for this algorithm to return the optimal distance; in fact, if β is set too low, this algorithm will encounter numerical problems.
A Fast Proximal Point Method for Computing Exact Wasserstein Distance Yujia Xie, Xiangfeng Wang, Ruijia Wang, Hongyuan Zha https://arxiv.org/abs/1802.04307
SpectralDistances.mask_filter
— Function
mask_filter(X::Matrix, th=0.25)
mask_filter!(Y, X=Y, th=0.25)
Mask out parts of a spectrogram that only contain impulsive noise. th ∈ (0,1) determines the amount of masking; a higher value masks out more. This function is to be applied after something like normalize_spectrogram.
SpectralDistances.normalize_spectrogram
— Method
normalize_spectrogram(S, dynamic_floor = default_dynamic_floor(S))
SpectralDistances.sinkhorn
— Method
Γ, u, v = sinkhorn(C, a, b; β=1e-1, iters=1000)
The Sinkhorn algorithm. C is the cost matrix and a, b are vectors that sum to one. Returns the optimal plan and the dual potentials. This function is relatively slow; see also sinkhorn_log!, IPOT and sinkhorn_log for faster algorithms.
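A small usage sketch of calling a solver directly (the grid size, weights and β are arbitrary):
a = fill(1/4, 4)                             # source weights, sum to one
b = fill(1/4, 4)                             # target weights, sum to one
C = [abs2((i - j)/3) for i in 1:4, j in 1:4] # squared-distance cost on a grid
Γ, u, v = sinkhorn(C, a, b; β=0.1, iters=10_000)
cost = sum(Γ .* C)                           # approximate transport cost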
SpectralDistances.sinkhorn_convolutional
— Method
sinkhorn_convolutional(A::AbstractMatrix, B::AbstractMatrix; β = 0.001, kwargs...)
SpectralDistances.sinkhorn_convolutional
— Method
sinkhorn_convolutional(w::SCWorkspace{T}, A::AbstractMatrix, B::AbstractMatrix; β = 0.01, τ = 1 / eps(T), iters = 1000, tol = 1.0e-6, ϵ = eps(T) ^ 2, verbose = false, initUV = true) where T
Calculate the entropically regularized Sinkhorn distance between two matrices where the ground cost is squared Euclidean. This function uses an efficient convolutional algorithm and is much more efficient than the corresponding sinkhorn_log! in this special case.
This function spends most of its time doing matrix multiplications. You may want to tune BLAS.set_num_threads in order to maximize performance. 1-2 threads is often best, especially if you thread outside of calls to this function.
Arguments:
- w: workspace object
- A: The first matrix
- B: The second matrix
- β: regularization parameter. To get smooth distance profiles (distance_profile), a slightly higher β than for barycenters is recommended; β around 0.01 should do fine.
- τ: stabilization parameter
- iters: maximum number of iterations
- tol: tolerance (change in one of the dual variables)
- ϵ: other stabilization parameter
- verbose: print stuff?
- initUV: if true, initializes the dual variables in w to one. It can be useful to set this to false if you have a good initial guess (warm start). Set the corresponding w.U, w.V to provide the initial guess.
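A hedged sketch of reusing a workspace across repeated calls (the matrices are arbitrary mass-normalized grids of equal size, and β matches the workspace):
A = rand(64, 64); A = A ./ sum(A)
B = rand(64, 64); B = B ./ sum(B)
β = 0.01
w = SCWorkspace(A, B, β)
d = sinkhorn_convolutional(w, A, B; β=β)   # regularized distance between A and B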
SpectralDistances.sinkhorn_log!
— Method
Same as sinkhorn_log but operates in-place to save memory allocations. This function has higher performance than sinkhorn_log, but might not work as well with AD libraries.
This function can be made completely allocation free with the interface sinkhorn_log(w::SinkhornLogWorkspace{T}, C, a, b; kwargs...)
The sinkhorn_log! solver also accepts a keyword argument check_interval = 20 that determines how often the convergence criterion is checked. If β is large, the algorithm might converge very fast and you can save some iterations by reducing the check interval. If β is small and the algorithm requires many iterations, a larger number saves you from computing the check too often.
The workspace w is created like this: w = SinkhornLogWorkspace(FloatType, length(a), length(b))
SpectralDistances.sinkhorn_log
— Method
Γ, u, v = sinkhorn_log(C, a, b; β=1e-1, iters=1000, tol=1e-8)
The Sinkhorn algorithm (log-stabilized). C is the cost matrix and a, b are vectors that sum to one. Returns the optimal plan and the dual potentials. See also sinkhorn_log! for a faster implementation operating in-place, and IPOT for a potentially more exact solution.
When this function is being differentiated, warnings about inaccurate solutions are turned off. You may choose to manually assess the error in the constraints by ea, eb = SpectralDistances.ot_error(Γ, a, b).
The IPOT algorithm: https://arxiv.org/pdf/1610.06519.pdf
SpectralDistances.sinkhorn_unbalanced
— MethodΓ, u, v = sinkhorn_unbalanced(C, a, b, divergence; β=1e-1, iters=1000, tol=1e-8)
The Unbalanced Sinkhorn algorithm (log-stabilized). C
is the cost matrix and a,b
are vectors that are not required to sum to one.
Ref: "Sinkhorn Divergences for Unbalanced Optimal Transport" https://arxiv.org/abs/1910.12958 Makes use of UnbalancedOptimalTransport.jl
SpectralDistances.ot_jump
— Method
ot_jump(D, P1, P2)
Solve the optimal transport problem using JuMP. This function is only available after using JuMP, GLPK.
Arguments:
- D: Distance matrix
- P1: Weight vector 1
- P2: Weight vector 2
SpectralDistances.ot_convex
— Method
ot_convex(D, P1, P2)
Solve the optimal transport problem using Convex.jl. This function is only available after using Convex, GLPK.
Arguments:
- D: Distance matrix
- P1: Weight vector 1
- P2: Weight vector 2
SpectralDistances.complete_distmat
— Function
D̃, S = complete_distmat(D, W, λ = 2)
Takes an incomplete squared Euclidean distance matrix D and fills in the missing entries indicated by the mask W. W is a BitArray or array of {0,1} with 0 denoting a missing value. Returns the completed matrix and an SVD object that allows reconstruction of the generating point set X.
NOTE: This function is only available after using Convex, SCS.
Arguments:
- D: The incomplete matrix
- W: The mask
- λ: Regularization parameter. A higher value enforces better data fitting, which might be required if the number of entries in D is very small.
Example:
using Distances
P = randn(2,40)
D = pairwise(SqEuclidean(), P)
W = rand(size(D)...) .> 0.3 # Create a random mask
W = (W + W') .> 0 # It makes sense for the mask to be symmetric
W[diagind(W)] .= true
D0 = W .* D # Remove missing entries
D2, S = complete_distmat(D0, W)
@show (norm(D-D2)/norm(D))
@show (norm(W .* (D-D2))/norm(D))
The set of points that created D can be reconstructed up to an arbitrary rotation and translation; X contains the reconstruction in the first d rows, where d is the dimension of the point coordinates. To reconstruct X using S, do
X = Diagonal(sqrt.(S.S))*S.Vt
# Verify that reconstructed `X` is correct up to rotation and translation
A = [X[1:2,:]' ones(size(D,1))]
P2 = (A*(A \ P'))'
norm(P-P2)/norm(P) # Should be small
Ref: Algorithm 5 from "Euclidean Distance Matrices: Essential Theory, Algorithms and Applications" Ivan Dokmanic, Reza Parhizkar, Juri Ranieri and Martin Vetterli https://arxiv.org/pdf/1502.07541.pdf
Details
Transport-based distances may require some tuning parameters to be set for the solvers. The available solvers are
- sinkhorn: not recommended due to numerical issues, but this is the most commonly cited algorithm.
- sinkhorn_log: better numerical stability than the standard algorithm.
- sinkhorn_log!: in-place version that is faster, but some AD libraries might not like it (often the default if no solver is provided).
- sinkhorn_unbalanced: this solver accepts a divergence that penalizes creation/destruction of mass. It thus handles measures of different masses and can choose to create/destroy mass instead of transporting it.
- IPOT: finds the exact solution (without entropy regularization); requires β around 0.1-1.
- ot_jump: exact solution using JuMP; requires using JuMP, GLPK before it becomes available.
- ot_convex: exact solution using Convex.jl; requires using Convex, GLPK before it becomes available.
- sinkhorn_convolutional: applicable only to 2D grids, but very efficient in this case. ConvOptimalTransportDistance is the corresponding distance type.
Providing solver and options
options = (solver=sinkhorn_log!, tol=1e-6, iters=100_000)
distance = OptimalTransportRootDistance(domain=Continuous(), p=1, β=0.001)
SpectralDistances.evaluate(distance, model1, model2; options...)
Maximum performance
The sinkhorn_log! solver can be made completely allocation free with the interface
sinkhorn_log(w::SinkhornLogWorkspace{T}, C, a, b; kwargs...)
The workspace w is created like this:
w = SinkhornLogWorkspace(eltype(a), length(a), length(b))
This will save you both allocations and time if called multiple times, especially important if you intend to make use of multiple threads. For multiple threads, make sure to create one workspace for each thread.
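A hedged sketch of the workspace interface described above (the grid size, weights and β are arbitrary; the call follows the sinkhorn_log(w, C, a, b; kwargs...) form stated above):
a = fill(1/8, 8); b = fill(1/8, 8)
C = [abs2((i - j)/7) for i in 1:8, j in 1:8]
w = SinkhornLogWorkspace(eltype(a), length(a), length(b))
Γ, u, v = sinkhorn_log(w, C, a, b; β=0.01, iters=10_000)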
The sinkhorn_log! solver also accepts a keyword argument check_interval = 20 that determines how often the convergence criterion is checked. If β is large, the algorithm might converge very fast and you can save some iterations by reducing the check interval. If β is small and the algorithm requires many iterations, a larger number saves you from computing the check too often.
See also the in-place functions.