Maximum-likelihood and MAP estimation

Filters calculate the likelihood and prediction errors while performing filtering, this can be used to perform maximum likelihood estimation or prediction-error minimization. One can estimate all kinds of parameters using this method, in the example below, we will estimate the noise covariance. The tutorial on this page focuses on problems where the number of parameters is so small that we can visualize the likelihood function, for higher-dimensional problems, see Using an optimizer.

We start by generating some data by simulating a two-dimensional linear system with known dynamics parameters, where we will attempt to optimize the dynamics noise covariance using maximum likelihood and MAP estimation.

Generate data by simulation

This simulates the same linear system as on the index page of the documentation

using LowLevelParticleFilters, LinearAlgebra, StaticArrays, Distributions, Plots
nx = 2   # Dimension of state
nu = 2   # Dimension of input
ny = 2   # Dimension of measurements
N = 2000 # Number of particles

const dg = MvNormal(ny,1.0)          # Measurement noise Distribution
const df = MvNormal(nx,1.0)          # Dynamics noise Distribution
const d0 = MvNormal(@SVector(randn(nx)),2.0)   # Initial state Distribution

const A = SA[1 0.1; 0 1]
const B = @SMatrix [0.0 0.1; 1 0.1]
const C = @SMatrix [1.0 0; 0 1]

dynamics(x,u,p,t) = A*x .+ B*u
measurement(x,u,p,t) = C*x
vecvec_to_mat(x) = copy(reduce(hcat, x)') # Helper function
pf = ParticleFilter(N, dynamics, measurement, df, dg, d0)
xs,u,y = simulate(pf,300,df)

(StaticArraysCore.SVector{2, Float64}[[-0.4155662090726869, -1.9961772371845634], [-1.760456429629702, -1.2274958409066061], [-0.7368621994515796, -1.4423415749096746], [-0.7549297949955807, -1.2235819594865172], [-1.1193749845108643, -1.0065994255418858], [0.05650805495554523, -1.5735118480687025], [0.1458197233443846, -1.6986439006016916], [-1.1871160834062133, -1.7180617153827407], [-2.178661236223445, -4.550464237992873], [-3.5685716627511788, -2.462118713506638]  …  [97.89998301936352, 5.684783374986914], [98.6169336314781, 4.244931649201986], [98.64825221288768, 1.7585930671360728], [99.52258756212109, 1.177412410670638], [100.09387116157735, 0.2158631985307219], [99.40609538081503, 0.24292819766967644], [99.71832822455733, -0.42545739772752245], [100.34966722486863, -0.9415166501253162], [100.52301864641174, -0.1039523623678682], [100.64404746789562, -0.12393529449671936]], [[-0.7547821809913309, 0.4192948070518522], [-0.31841497824678977, 0.08501307867515671], [-1.1436069052414222, 1.524870030857272], [-0.6228384782623171, -0.40698702200556314], [-0.20482942261345383, 0.25263834268952984], [0.9465521424656925, -0.7353659935256396], [0.015754839126118074, 1.0500819252314235], [-1.4779108499597216, -1.175569319807741], [1.2233933585901988, 0.0745563696384836], [-0.1782599011202913, 0.19101936013297574]  …  [0.3055439599023937, 1.6139436233344753], [-1.7119741327097642, -0.8370902412316079], [-0.49592490960004554, -0.6167797066956248], [-0.2602317941154602, -0.28935792209508593], [0.34867654410768945, -0.3575626470040553], [-0.6525114444992685, 0.8491156335696141], [-1.3666954987842277, 0.22307566743147916], [-0.436036034124986, 2.04696767726872], [-0.7110343210545785, 0.5184599778801239], [1.0168747568052736, 0.0990163334147549]], StaticArraysCore.SizedVector{2, Float64, Vector{Float64}}[[-1.161815349128111, -1.0403162191647288], [0.6539212867355615, -0.37319513091916345], [-2.0582377701712837, 0.22483085323999963], [-0.960008159337121, -0.4197221931530638], [-1.8625952937786603, -0.4238659524294809], [-0.3410515207050042, -1.4163933182968167], [0.07090550547268129, -1.5560425612540123], [-1.622714191525593, -3.374503694771968], [-0.08228374644013137, -4.094635831051067], [-2.8772220875830237, -3.817761195382678]  …  [97.4501747011798, 5.356990421711283], [99.04258211887183, 3.211062240830713], [97.98358690483506, 2.5511544397921293], [99.7143703634097, -0.5435083300355539], [99.68409329874636, 0.6705858954738246], [99.72701647540364, -0.018084958035376797], [101.10919474081904, -1.177065277680959], [99.88569793364185, -1.4308859732068953], [100.14000653276764, 0.34569299982345403], [100.90883591440836, -0.975244456120559]])

Compute likelihood for various values of the parameters

Since this example looks for a single parameter only, we can plot the likelihood as a function of this parameter. If we had been looking for more than 2 parameters, we typically use an optimizer instead (see Using an optimizer).

p = nothing
svec = exp10.(LinRange(-0.8, 1.2, 60))
llspf = map(svec) do s
    df = MvNormal(nx,s)
    pfs = ParticleFilter(N, dynamics, measurement, df, dg, d0)
    loglik(pfs, u, y, p)
end
llspfaux = map(svec) do s
    df = MvNormal(nx,s)
    pfs = AuxiliaryParticleFilter(N, dynamics, measurement, df, dg, d0)
    loglik(pfs, u, y, p)
end
plot( svec, llspf,
    xscale = :log10,
    title = "Log-likelihood",
    xlabel = "Dynamics noise standard deviation",
    lab = "PF",
)
plot!(svec, llspfaux, yscale=:identity, xscale=:log10, lab="AUX PF", c=:green)
vline!([svec[findmax(llspf)[2]]], l=(:dash,:blue), primary=false)

the correct value for the simulated data is 1 (the simulated system is the same as on the front page of the docs).

We can do the same with a Kalman filter, shown below. When using Kalman-type filters, one may also provide a known state sequence if one is available, such as when the data is obtained from a simulation or in an instrumented lab setting. If the state sequence is provided, state-prediction errors are used for log-likelihood estimation instead of output-prediction errors.

eye(n) = SMatrix{n,n}(1.0I(n))
llskf = map(svec) do s
    kfs = KalmanFilter(A, B, C, 0, s^2*eye(nx), eye(ny), d0)
    loglik(kfs, u, y, p)
end
llskfx = map(svec) do s # Kalman filter with known state sequence, possible when data is simulated
    kfs = KalmanFilter(A, B, C, 0, s^2*eye(nx), eye(ny), d0)
    loglik_x(kfs, u, y, xs, p)
end
plot!(svec, llskf, yscale=:identity, xscale=:log10, lab="Kalman", c=:red)
vline!([svec[findmax(llskf)[2]]], l=(:dash,:red), primary=false)
plot!(svec, llskfx, yscale=:identity, xscale=:log10, lab="Kalman with known state sequence", c=:purple)
vline!([svec[findmax(llskfx)[2]]], l=(:dash,:purple), primary=false)

the result can be quite noisy due to the stochastic nature of particle filtering. The particle filter likelihood agrees with the Kalman-filter estimate, which is optimal for the linear example system we are simulating here, apart for when the noise variance is small. Due to particle depletion, particle filters often struggle when dynamics-noise is too small. This problem is mitigated by using a greater number of particles, or simply by not using a too small covariance.

MAP estimation

Maximum a posteriori estimation (MAP) is similar to maximum likelihood (ML), but includes also prior knowledge of the distribution of the parameters in a way that is similar to parameter regularization. In this example, we will estimate the variance of the noises in the dynamics and the measurement functions.

To solve a MAP estimation problem, we need to define a function that takes a parameter vector and returns a filter, the parameters are used to construct the covariance matrices:

filter_from_parameters(θ, pf = nothing) = KalmanFilter(A, B, C, 0, exp(θ[1])^2*eye(nx), exp(θ[2])^2*eye(ny), d0) # Works with particle filters as well

The call to exp on the parameters is so that we can define log-normal priors

priors = [Normal(0,2),Normal(0,2)]

2-element Vector{Distributions.Normal{Float64}}:
 Distributions.Normal{Float64}(μ=0.0, σ=2.0)
 Distributions.Normal{Float64}(μ=0.0, σ=2.0)

Now we call the function log_likelihood_fun that returns a function to be minimized

ll = log_likelihood_fun(filter_from_parameters, priors, u, y, p)

Since this is once again a low-dimensional problem, we can plot the LL on a 2d-grid

function meshgrid(a,b)
    grid_a = [i for i in a, j in b]
    grid_b = [j for i in a, j in b]
    grid_a, grid_b
end
Nv       = 20
v        = LinRange(-0.7,1,Nv)
llxy     = (x,y) -> ll([x;y])
VGx, VGy = meshgrid(v,v)
VGz      = llxy.(VGx, VGy)
heatmap(
    VGz,
    xticks = (1:Nv, round.(v, digits = 2)),
    yticks = (1:Nv, round.(v, digits = 2)),
    xlabel = "sigma v",
    ylabel = "sigma w",
) # Yes, labels are reversed
# Mark the maximum with a red dot
max_idx = argmax(VGz)
scatter!([max_idx[1]], [max_idx[2]], c=:red, marker=:x, markersize=10, lab="Maximum")

For higher-dimensional problems, we may estimate the parameters using an optimizer, e.g., Optim.jl. See Using an optimizer for examples, including how to maximize the log-likelihood using Gauss-Newton optimization.