Parameter Estimation

State estimation is an integral part of many parameter-estimation methods. Below, we will illustrate several different methods of performing parameter estimation. We can roughly divide the methods into two camps

Methods that optimize prediction error or likelihood by tweaking model parameters.
Methods that add the parameters to be estimated as state variables in the model and estimate them using standard state estimation.

From the first camp, we provide som basic functionality for maximum-likelihood estimation and MAP estimation, described below. An example of (2), joint state and parameter estimation, is provided in Joint state and parameter estimation.

Maximum-likelihood estimation

Filters calculate the likelihood and prediction errors while performing filtering, this can be used to perform maximum likelihood estimation or prediction-error minimization. One can estimate all kinds of parameters using this method, in the example below, we will estimate the noise covariance. We may for example plot likelihood as function of the variance of the dynamics noise like this:

Generate data by simulation

This simulates the same linear system as on the index page of the documentation

using LowLevelParticleFilters, LinearAlgebra, StaticArrays, Distributions, Plots
nx = 2   # Dimension of state
nu = 2   # Dimension of input
ny = 2   # Dimension of measurements
N = 2000 # Number of particles

const dg = MvNormal(ny,1.0)          # Measurement noise Distribution
const df = MvNormal(nx,1.0)          # Dynamics noise Distribution
const d0 = MvNormal(@SVector(randn(nx)),2.0)   # Initial state Distribution

const A = SA[1 0.1; 0 1]
const B = @SMatrix [0.0 0.1; 1 0.1]
const C = @SMatrix [1.0 0; 0 1]

dynamics(x,u,p,t) = A*x .+ B*u
measurement(x,u,p,t) = C*x
vecvec_to_mat(x) = copy(reduce(hcat, x)') # Helper function
pf = ParticleFilter(N, dynamics, measurement, df, dg, d0)
xs,u,y = simulate(pf,300,df)

(StaticArraysCore.SVector{2, Float64}[[1.0812304200971112, -0.14562935751734435], [1.501594694993898, -0.07108329662378277], [2.3075079134292444, -1.1560289007332327], [2.6635715614772093, -2.23476507953973], [2.4018909304017018, -0.8850752951793424], [3.040139591160852, -1.8730969890976585], [2.106369703630613, -0.9910164754326841], [2.2014153716404445, 0.30731080489488494], [1.2341316245824845, 0.8640490593138092], [1.9821689109665344, 0.5219833031589821]  …  [-346.35998230016065, -6.760162909970502], [-345.5568956166975, -8.718211019154673], [-345.5552446839173, -8.922045089574796], [-347.89988206866525, -10.348118982301767], [-350.88017306871757, -10.105702256393261], [-353.6484140168432, -11.631986572799734], [-354.5689899868184, -14.276249072799914], [-356.9790118148404, -14.44453336355976], [-360.4405249673336, -11.883152686023458], [-362.38540964385845, -8.740532912271114]], [[0.5839054062927054, -0.27929477378843304], [0.41007457728427377, 0.9184433161348705], [0.11204639752074251, 0.5631568392262732], [1.1255022555445502, -0.432131932236531], [-1.4609285400734122, -1.0047900574943391], [1.658369563599552, 0.41173127501428186], [0.080280121082907, -1.6750679461296805], [0.47545184209756375, 0.9200875097747347], [-0.8745692970385783, -0.6063211557138892], [-0.6371267005592348, 0.3182510062201885]  …  [-1.2489711048850263, 0.5737070285485795], [-0.5339829136703733, 0.2608634380802061], [-0.24862328178704068, 0.9549561375433364], [-0.16281638770760531, 1.0145987581223301], [-1.9226803890658999, 0.8827977883629121], [-0.858732315998032, -0.8824817581291353], [-0.6331406379760188, 0.42937840647202086], [-0.4115253891654016, -0.3459220491586696], [1.0877107905147378, -1.2027615350296437], [0.5789458507677563, 1.2073250722043054]], StaticArraysCore.SizedVector{2, Float64, Vector{Float64}}[[0.8815883951384904, -1.7156187413294892], [1.8337731282623388, 1.5701256174756675], [2.5125292832480235, -1.5643741850740913], [3.003336524644781, -2.7952792898629335], [2.7631549630627887, -2.336600472901745], [4.036687073230463, -0.82350325641754], [3.1028604858958575, 1.3105875427192473], [3.538475048629526, 0.28655028607694827], [2.082765268422021, 2.5642951457392664], [1.9061451519060317, 0.2484667625643573]  …  [-347.25324418218844, -6.458867363573821], [-344.0748922983356, -8.367269747588448], [-345.7149543592732, -7.0994420709780295], [-347.5844958094148, -11.72014716574825], [-350.5707368254294, -9.503952658726623], [-354.6778901639587, -10.764120106357932], [-355.18469850988043, -14.751268948173388], [-357.08870317843594, -14.532155665139612], [-360.85675501419405, -13.219967102181629], [-362.9353648380914, -7.437857896647259]])

Compute likelihood for various values of the parameters

Since this example looks for a single parameter only, we can plot the likelihood as a function of this parameter. If we had been looking for more than 2 parameters, we typically use an optimizer instead (see further below).

p = nothing
svec = exp10.(LinRange(-0.8, 1.2, 60))
llspf = map(svec) do s
    df = MvNormal(nx,s)
    pfs = ParticleFilter(N, dynamics, measurement, df, dg, d0)
    loglik(pfs, u, y, p)
end
llspfaux = map(svec) do s
    df = MvNormal(nx,s)
    pfs = AuxiliaryParticleFilter(N, dynamics, measurement, df, dg, d0)
    loglik(pfs, u, y, p)
end
plot( svec, llspf,
    xscale = :log10,
    title = "Log-likelihood",
    xlabel = "Dynamics noise standard deviation",
    lab = "PF",
)
plot!(svec, llspfaux, yscale=:identity, xscale=:log10, lab="AUX PF", c=:green)
vline!([svec[findmax(llspf)[2]]], l=(:dash,:blue), primary=false)

the correct value for the simulated data is 1 (the simulated system is the same as on the front page of the docs).

We can do the same with a Kalman filter, shown below. When using Kalman-type filters, one may also provide a known state sequence if one is available, such as when the data is obtained from a simulation or in an instrumented lab setting. If the state sequence is provided, state-prediction errors are used for log-likelihood estimation instead of output-prediction errors.

eye(n) = SMatrix{n,n}(1.0I(n))
llskf = map(svec) do s
    kfs = KalmanFilter(A, B, C, 0, s^2*eye(nx), eye(ny), d0)
    loglik(kfs, u, y, p)
end
llskfx = map(svec) do s # Kalman filter with known state sequence, possible when data is simulated
    kfs = KalmanFilter(A, B, C, 0, s^2*eye(nx), eye(ny), d0)
    loglik_x(kfs, u, y, xs, p)
end
plot!(svec, llskf, yscale=:identity, xscale=:log10, lab="Kalman", c=:red)
vline!([svec[findmax(llskf)[2]]], l=(:dash,:red), primary=false)
plot!(svec, llskfx, yscale=:identity, xscale=:log10, lab="Kalman with known state sequence", c=:purple)
vline!([svec[findmax(llskfx)[2]]], l=(:dash,:purple), primary=false)

the result can be quite noisy due to the stochastic nature of particle filtering. The particle filter likelihood agrees with the Kalman-filter estimate, which is optimal for the linear example system we are simulating here, apart for when the noise variance is small. Due to particle depletion, particle filters often struggle when dynamics-noise is too small. This problem is mitigated by using a greater number of particles, or simply by not using a too small covariance.

MAP estimation

Maximum a posteriori estimation (MAP) is similar to maximum likelihood (ML), but includes also prior knowledge of the distribution of the parameters in a way that is similar to parameter regularization. In this example, we will estimate the variance of the noises in the dynamics and the measurement functions.

To solve a MAP estimation problem, we need to define a function that takes a parameter vector and returns a filter, the parameters are used to construct the covariance matrices:

filter_from_parameters(θ, pf = nothing) = KalmanFilter(A, B, C, 0, exp(θ[1])^2*eye(nx), exp(θ[2])^2*eye(ny), d0) # Works with particle filters as well

The call to exp on the parameters is so that we can define log-normal priors

priors = [Normal(0,2),Normal(0,2)]

2-element Vector{Distributions.Normal{Float64}}:
 Distributions.Normal{Float64}(μ=0.0, σ=2.0)
 Distributions.Normal{Float64}(μ=0.0, σ=2.0)

Now we call the function log_likelihood_fun that returns a function to be minimized

ll = log_likelihood_fun(filter_from_parameters, priors, u, y, p)

Since this is once again a low-dimensional problem, we can plot the LL on a 2d-grid

function meshgrid(a,b)
    grid_a = [i for i in a, j in b]
    grid_b = [j for i in a, j in b]
    grid_a, grid_b
end
Nv       = 20
v        = LinRange(-0.7,1,Nv)
llxy     = (x,y) -> ll([x;y])
VGx, VGy = meshgrid(v,v)
VGz      = llxy.(VGx, VGy)
heatmap(
    VGz,
    xticks = (1:Nv, round.(v, digits = 2)),
    yticks = (1:Nv, round.(v, digits = 2)),
    xlabel = "sigma v",
    ylabel = "sigma w",
) # Yes, labels are reversed

For higher-dimensional problems, we may estimate the parameters using an optimizer, e.g., Optim.jl.

Bayesian inference using PMMH

We proceed like we did for MAP above, but when calling the function metropolis, we will get the entire posterior distribution of the parameter vector, for the small cost of a massive increase in the amount of computations. metropolis runs the Metropolis Hastings algorithm, or more precisely if a particle filter is used, the "Particle Marginal Metropolis Hastings" (PMMH) algorithm. Here we use the Kalman filter simply to have the documentation build a bit faster, it can be quite heavy to run.

filter_from_parameters(θ, pf = nothing) = KalmanFilter(A, B, C, 0, exp(θ[1])^2*I(nx), exp(θ[2])^2*I(ny), d0) # Works with particle filters as well

The call to exp on the parameters is so that we can define log-normal priors

priors = [Normal(0,2),Normal(0,2)]
ll     = log_likelihood_fun(filter_from_parameters, priors, u, y, p)
θ₀     = log.([1.0, 1.0]) # Starting point

We also need to define a function that suggests a new point from the "proposal distribution". This can be pretty much anything, but it has to be symmetric since I was lazy and simplified an equation.

draw   = θ -> θ .+ 0.05 .* randn.() # This function dictates how new proposal parameters are being generated.
burnin = 200 # remove this many initial samples ("burn-in period")
@info "Starting Metropolis algorithm"
@time theta, lls = metropolis(ll, 2200, θ₀, draw) # Run PMMH for 2200  iterations
thetam = reduce(hcat, theta)'[burnin+1:end,:] # Build a matrix of the output
histogram(exp.(thetam), layout=(3,1), lab=["R1" "R2"]); plot!(lls[burnin+1:end], subplot=3, lab="log likelihood") # Visualize

In this example, we initialize the MH algorithm on the correct value θ₀, in general, you'd see a period in the beginning where the likelihood (bottom plot) is much lower than during the rest of the sampling, this is the reason we remove a number of samples in the beginning, typically referred to as "burn in".

If you are lucky, you can run the above threaded as well. I tried my best to make particle filters thread safe with their own rngs etc., but your milage may vary. For threading to help, the dynamics must be non-allocating, e.g., by using StaticArrays etc.

@time thetalls = LowLevelParticleFilters.metropolis_threaded(burnin, ll, 2200, θ₀, draw, nthreads=2)
histogram(exp.(thetalls[:,1:2]), layout=3)
plot!(thetalls[:,3], subplot=3)

Bayesian inference using DynamicHMC.jl

The following snippet of code performs the same estimation as above, but uses the much more sophisticated HMC sampler in DynamicHMC.jl rather than the PMMH sampler above. This package requires the log-likelihood function to be wrapped in a custom struct that implements the LogDensityProblems interface, which is done below. We also indicate that we want to use ForwardDiff.jl to compute the gradients for fast sampling.

using DynamicHMC, LogDensityProblemsAD, ForwardDiff, LogDensityProblems, LinearAlgebra, Random

struct LogTargetDensity{F}
    ll::F
    dim::Int
end
LogDensityProblems.logdensity(p::LogTargetDensity, θ) = p.ll(θ)
LogDensityProblems.dimension(p::LogTargetDensity) = p.dim
LogDensityProblems.capabilities(::Type{LogTargetDensity}) = LogDensityProblems.LogDensityOrder{0}()

function filter_from_parameters(θ, pf = nothing)
    # It's important that the distribution of the initial state has the same
    # element type as the parameters. DynamicHMC will use Dual numbers for differentiation,
    # hence, we make sure that d0 has `eltype(d0) = eltype(θ)`
    T = eltype(θ)
    d0 = MvNormal(T.(d0.μ), T.(d0.Σ))
    KalmanFilter(A, B, C, 0, exp(θ[1])^2*eye(nx), exp(θ[2])^2*eye(ny), d0) 
end
ll = log_likelihood_fun(filter_from_parameters, priors, u, y, p)

D = length(θ₀)
ℓπ = LogTargetDensity(ll, D)
∇P = ADgradient(:ForwardDiff, ℓπ)

results = mcmc_with_warmup(Random.default_rng(), ∇P, 3000)
DynamicHMC.Diagnostics.summarize_tree_statistics(results.tree_statistics)
lls = [ts.π for ts in results.tree_statistics]

histogram(exp.(results.posterior_matrix)', layout=(3,1), lab=["R1" "R2"])
plot!(lls, subplot=3, lab="log likelihood") # Visualize

Joint state and parameter estimation

In this example, we'll show how to perform parameter estimation by treating a parameter as a state variable. This method can not only estimate constant parameters, but also time-varying parameters. The system we will consider is a quadruple tank, where two upper tanks feed into two lower tanks. The outlet for tank 1 can vary in size, simulating, e.g., that something partially blocks the outlet. We start by defining the dynamics on a form that changes the outlet area $a_1$ at time $t=500$:

using LowLevelParticleFilters
using SeeToDee
using Distributions
using StaticArrays
using Plots, LinearAlgebra

function quadtank(h,u,p,t)
    k1, k2, g = 1.6, 1.6, 9.81
    A1 = A3 = A2 = A4 = 4.9
    a1, a3, a2, a4 = 0.03, 0.03, 0.03, 0.03
    γ1, γ2 = 0.2, 0.2

    if t > 500
        a1 *= 2 # Change the parameter at t = 500
    end

    ssqrt(x) = √(max(x, zero(x)) + 1e-3) # For numerical robustness at x = 0

    SA[
        -a1/A1 * ssqrt(2g*h[1]) + a3/A1*ssqrt(2g*h[3]) +     γ1*k1/A1 * u[1]
        -a2/A2 * ssqrt(2g*h[2]) + a4/A2*ssqrt(2g*h[4]) +     γ2*k2/A2 * u[2]
        -a3/A3*ssqrt(2g*h[3])                          + (1-γ2)*k2/A3 * u[2]
        -a4/A4*ssqrt(2g*h[4])                          + (1-γ1)*k1/A4 * u[1]
    ]
end

nu = 2 # number of control inputs
nx = 4 # number of state variables
ny = 2 # number of measured outputs
Ts = 1 # sample time

We then define a measurement function, we measure the levels of tanks 1 and 2, and discretize the continuous-time dynamics using a Runge-Kutta 4 integrator SeeToDee.Rk4:

measurement(x,u,p,t) = SA[x[1], x[2]]
discrete_dynamics = SeeToDee.Rk4(quadtank, Ts)

We simulate the system using the rollout function and add some noise to the measurements. The inputs in this case are just square waves.

Tperiod = 200
t = 0:Ts:1000
u = vcat.(0.25 .* sign.(sin.(2pi/Tperiod .* t)) .+ 0.25)
u = SVector{nu}.(vcat.(u,u))
x0 = Float64[2,2,3,3]
x = LowLevelParticleFilters.rollout(discrete_dynamics, x0, u)[1:end-1]
y = measurement.(x, u, 0, 0)
y = [y .+ 0.01.*randn.() for y in y]

plot(
    plot(reduce(hcat, x)', title="State"),
    plot(reduce(hcat, u)', title="Inputs")
)

To perform the joint state and parameter estimation, we define a version of the dynamics that contains an extra state, corresponding to the unknown or time varying parameter, in this case $a_1$. We do not have any apriori information about how this parameter changes, so we say that its derivative is 0 and it's thus only driven by noise:

function quadtank_paramest(h, u, p, t)
    k1, k2, g = 1.6, 1.6, 9.81
    A1 = A3 = A2 = A4 = 4.9
    a3, a2, a4 = 0.03, 0.03, 0.03
    γ1, γ2 = 0.2, 0.2

    a1 = h[5] # the a1 parameter is a state

    ssqrt(x) = √(max(x, zero(x)) + 1e-3) # For numerical robustness at x = 0

    SA[
        -a1/A1 * ssqrt(2g*h[1]) + a3/A1*ssqrt(2g*h[3]) +     γ1*k1/A1 * u[1]
        -a2/A2 * ssqrt(2g*h[2]) + a4/A2*ssqrt(2g*h[4]) +     γ2*k2/A2 * u[2]
        -a3/A3*ssqrt(2g*h[3])                          + (1-γ2)*k2/A3 * u[2]
        -a4/A4*ssqrt(2g*h[4])                          + (1-γ1)*k1/A4 * u[1]
        0 # the state is only driven by noise
    ]
end

discrete_dynamics_params = SeeToDee.Rk4(quadtank_paramest, Ts)

We then define a nonlinear state estimator, we will use the UnscentedKalmanFilter, and solve the filtering problem. We start by an initial state estimate $x_0$ that is slightly off for the parameter $a_1$

nx = 5
names = SignalNames(x = ["h1", "h2", "h3", "h4", "a1"], y = ["h$i" for i in 1:2], u = ["p1", "p2"], name="UKF") # For nicer plot labels

R1 = SMatrix{nx,nx}(Diagonal([0.1, 0.1, 0.1, 0.1, 0.0001])) # Use of StaticArrays is generally good for performance
R2 = SMatrix{ny,ny}(Diagonal((1e-2)^2 * ones(ny)))
x0 = SA[2, 2, 3, 3, 0.02] # The SA prefix makes the array static, which is good for performance

kf = UnscentedKalmanFilter(discrete_dynamics_params, measurement, R1, R2, MvNormal(x0, R1); ny, nu, Ts, names)

sol = forward_trajectory(kf, u, y)
plot(sol, plotx=false, plotxt=true, plotu=false, ploty=true, legend=:bottomright)
plot!([0,500,500,1000], [0.03, 0.03, 0.06, 0.06], l=(:dash, :black), sp=5, lab="True param")

as we can see, the correct value of the parameter is quickly found ($a_1$), and it also adapts at $t=500$ when the parameter value changes. The speed with which the parameter adapts to changes is determined by the covariance matrix $R_1$, a higher value results in faster adaptation, but also higher sensitivity to noise.

If adaptive parameter estimation is coupled with a model-based controller, we get an adaptive controller! Note: the state that corresponds to the estimated parameter is typically not controllable, a fact that may require some special care for some control methods.

We may ask ourselves, what's the difference between a parameter and a state variable if we can add parameters as state variables? Typically, parameters do not vary with time, and if they do, they vary significantly slower than the state variables. State variables also have dynamics associate with them, whereas we often have no idea about how the parameters vary other than that they vary slowly.

Abrupt changes to the dynamics like in the example above can happen in practice, for instance, due to equipment failure or change of operating mode. This can be treated as a scenario with time-varying parameters that are continuously estimated.

Using an optimizer

The state estimators in this package are all statistically motivated and thus compute things like the likelihood of the data as a by-product of the estimation. Maximum-likelihood or prediction-error estimation is thus very straight-forward by simply calling a gradient-based optimizer with gradients provided by differentiating through the state estimator using automatic differentiation. In this example, we will continue the example from above, but now estimate all the parameters of the quad-tank process. This time, they will not vary with time. We will first use a standard optimization algorithm from Optim.jl to minimize the cost function based on the prediction error, and then use a Gauss-Newton optimizer.

We now define the dynamics function such that it takes its parameters from the p input argument. We also define a variable p_true that contains the true values that we will use to simulate some estimation data

function quadtank(h, u, p, t)
    k1, k2, g = p[1], p[2], 9.81
    A1 = A3 = A2 = A4 = p[3]
    a1 = a3 = a2 = a4 = p[4]
    γ1 = γ2 = p[5]

    ssqrt(x) = √(max(x, zero(x)) + 1e-3) # For numerical robustness at x = 0

    SA[
        -a1/A1 * ssqrt(2g*h[1]) + a3/A1*ssqrt(2g*h[3]) +     γ1*k1/A1 * u[1]
        -a2/A2 * ssqrt(2g*h[2]) + a4/A2*ssqrt(2g*h[4]) +     γ2*k2/A2 * u[2]
        -a3/A3*ssqrt(2g*h[3])                          + (1-γ2)*k2/A3 * u[2]
        -a4/A4*ssqrt(2g*h[4])                          + (1-γ1)*k1/A4 * u[1]
    ]
end

discrete_dynamics = SeeToDee.Rk4(quadtank, Ts) # Discretize the dynamics using a 4:th order Runge-Kutta integrator
p_true = [1.6, 1.6, 4.9, 0.03, 0.2]

Similar to previous example, we simulate the system, this time using a more exciting input in order to be able to identify several parameters

Tperiod = 200
t = 0:Ts:1000
u1 = vcat.(0.25 .* sign.(sin.(2pi/Tperiod .* (t ./ 40).^2)) .+ 0.25)
u2 = vcat.(0.25 .* sign.(sin.(2pi/Tperiod .* (t ./ 40).^2 .+ pi/2)) .+ 0.25)
u  = SVector{nu}.(vcat.(u1,u2))
x0 = SA[2.0,2,3,3] # Initial condition, static array for performance
x = LowLevelParticleFilters.rollout(discrete_dynamics, x0, u, p_true)[1:end-1]
y = measurement.(x, u, 0, 0)
y = [y .+ 0.01 .* randn.() for y in y]

plot(
    plot(reduce(hcat, x)', title="State"),
    plot(reduce(hcat, u)', title="Inputs")
)

This time, we define a cost function for the optimizer to optimize, we'll use the sum of squared errors (sse). It's important to define the UKF with an initial state distribution with the same element type as the parameter vector so that automatic differentiation through the state estimator works, hence the explicit casting T.(x0) and T.(R1). We also make sure to use StaticArrays for the covariance matrices and the initial condition for performance reasons (optional).

nx = 4
R1 = SMatrix{nx,nx}(Diagonal([0.1, 0.1, 0.1, 0.1])) # Use of StaticArrays is generally good for performance
R2 = SMatrix{ny,ny}(Diagonal((1e-2)^2 * ones(ny)))
x0 = SA[2.0, 2, 3, 3]

function cost(p::Vector{T}) where T
    kf = UnscentedKalmanFilter(discrete_dynamics, measurement, R1, R2, MvNormal(T.(x0), T.(R1)); ny, nu, Ts)
    LowLevelParticleFilters.sse(kf, u, y, p) # Sum of squared prediction errors
end

We generate a random initial guess for the estimation problem

p_guess = p_true .+  0.1*p_true .* randn(length(p_true))

5-element Vector{Float64}:
 1.5524906519904813
 1.8775750226115546
 4.428448666554554
 0.02590304999150804
 0.1887687563066955

Solving using Optim

We first minimize the cost using the BFGS optimization algorithm from Optim.jl

using Optim
res = Optim.optimize(
    cost,
    p_guess,
    BFGS(),
    Optim.Options(
        show_trace = true,
        show_every = 5,
        iterations = 100,
        time_limit = 30,
    ),
    autodiff = :forward, # Indicate that we want to use forward-mode AD to derive gradients
)

 * Status: success

 * Candidate solution
    Final objective value:     4.112172e-01

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.19e-09 ≰ 0.0e+00
    |x - x'|/|x'|          = 6.93e-10 ≰ 0.0e+00
    |f(x) - f(x')|         = 7.44e-15 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.81e-14 ≰ 0.0e+00
    |g(x)|                 = 1.31e-10 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit 30)
    Iterations:    14
    f(x) calls:    37
    ∇f(x) calls:   37

We started out with a normalized parameter error of

using LinearAlgebra
norm(p_true - p_guess) / norm(p_true)

0.10171638186250584

and ended with

p_opt = res.minimizer
norm(p_true - p_opt) / norm(p_true)

0.06043388874189955

Solving using Gauss-Newton optimization

Below, we optimize the sum of squared residuals again, but this time we do it using a Gauss-Newton style algorithm (Levenberg Marquardt). These algorithms want the entire residual vector rather than the sum of squares of the residuals, so we define an alternative "cost function" called residuals that calls the lower-level function LowLevelParticleFilters.prediction_errors!

using LeastSquaresOptim

function residuals!(res, p::Vector{T}) where T
    kf = UnscentedKalmanFilter(discrete_dynamics, measurement, R1, R2, MvNormal(T.(x0), T.(R1)); ny, nu, Ts)
    LowLevelParticleFilters.prediction_errors!(res, kf, u, y, p)
end

res_gn = optimize!(LeastSquaresProblem(x = copy(p_guess), f! = residuals!, output_length = length(y)*ny, autodiff = :forward), LevenbergMarquardt())

p_opt_gn = res_gn.minimizer
norm(p_true - p_opt_gn) / norm(p_true)

0.05181296446739656

When performing sum-of-squares minimization like here, we can, assuming that we converge to the global optimum, estimate the covariance of the estimated parameters. The precision matrix $Λ$, which is the inverse of the covariance matrix of the parameters, is given by a scaled Hessian of the cost function. The Gauss-Newton appoximation of the Hessian is given by $J'J$, where $J$ is the Jacobian of the residuals.

using ForwardDiff
T = length(y)
J = ForwardDiff.jacobian(residuals!, zeros(T * ny), res_gn.minimizer)
Λ = (T - length(p_guess)) * Symmetric(J' * J) # Precision matrix of the estimated parameters
# Σ = inv(Λ) # Covariance matrix of the estimated parameters (only compute this if precision matrix is well conditioned)
svdvals(Λ)

5-element Vector{Float64}:
     1.0581525693369225e7
 14925.359179054769
   862.8219878962076
   386.58162555667235
     3.4106051316475588e-12

In this case, the precision matrix is singular, indicating that there is at least one diretion in parameter space that yields no increase in cost, and we can thus not determine where along a line in this direction the true parameter lies.

Gauss-Newton algorithms are often more efficient at sum-of-squares minimization than the more generic BFGS optimizer. This form of Gauss-Newton optimization of prediction errors is also available through ControlSystemIdentification.jl, which uses this package undernath the hood.

Which method should I use?

The methods demonstrated above have slightly different applicability, here, we try to outline which methods to consider for different problems

Method	Parameter Estimation	Covariance Estimation	Time Varying Parameters	Online Estimation
Maximum likelihood	🟢	🟢	🟥	🟥
Joint state/par estim	🔶	🟥	🟢	🟢
Prediction-error opt.	🟢	🟥	🟥	🟥

When trying to optimize parameters of the noise distributions, most commonly the covariance matrices, maximum-likelihood (or MAP) is the only recommened method. Similarly, when parameters are time varying or you want an online estimate, the method that jointly estimates state and parameter is the only applicable method. When fitting standard parameters, all methods are applicable. In this case the joint state and parameter estimation tends to be inefficient and unneccesarily complex, and it is recommended to opt for maximum likelihood or prediction-error minimization. The prediction-error minimization (PEM) with a Gauss-Newtown optimizer is often the most efficient method for this type of problem.

Maximum likelihood estimation tends to yield an estimator with better estimates of posterior covariance since this is explicitly optimized for, while PEM tends to produce the smallest possible prediction errors.

Identifiability

Polynomial methods

There is no guarantee that we will recover the true parameters by perfoming parameter estimation, especially not if the input excitation is poor. For the system in this tutorial, we will generally find parameters that results in a good predictor for the system (this is after all what we're optimizing for), but these may not be the "correct" parameters. A tool like StructuralIdentifiability.jl may be used to determine the identifiability of parameters and state variables (for rational systems), something that for this system could look like

using StructuralIdentifiability

ode = @ODEmodel(
    h1'(t) = -a1/A1 * h1(t) + a3/A1*h3(t) +     gam*k1/A1 * u1(t),
    h2'(t) = -a2/A2 * h2(t) + a4/A2*h4(t) +     gam*k2/A2 * u2(t),
    h3'(t) = -a3/A3*h3(t)                 + (1-gam)*k2/A3 * u2(t),
    h4'(t) = -a4/A4*h4(t)                 + (1-gam)*k1/A4 * u1(t),
	y1(t) = h1(t),
    y2(t) = h2(t),
)

local_id = assess_local_identifiability(ode)

where we have made the substitution $\sqrt h \rightarrow h$ due to a limitation of the tool (it currently only handles rational ODEs). The output of the above analysis is

julia> local_id = assess_local_identifiability(ode)
Dict{Nemo.fmpq_mpoly, Bool} with 15 entries:
  a3  => 0
  gam => 1
  k2  => 0
  A4  => 0
  h4  => 0
  h2  => 1
  A3  => 0
  a1  => 0
  A2  => 0
  k1  => 0
  a4  => 0
  h3  => 0
  h1  => 1
  A1  => 0
  a2  => 0

indicating that we can not hope to resolve all of the parameters. However, using appropriate regularization from prior information, we might still recover a lot of information about the system. Regularization could easily be added to the function cost above, e.g., using a penalty like (p-p_guess)'Γ*(p-p_guess) for some matrix $\Gamma$, to indicate our confidence in the initial guess.

Linear methods

This package also contains an interface to ControlSystemsBase, which allows you to call ControlSystemsBase.observability(f, x, u, p, t) on a filter f to linearize (if needed) it in the point x,u,p,t and assess observability using linear methods (the PHB test). Also ControlSystemsBase.obsv(f, x, u, p, t) for computing the observability matrix is available.

Videos

Examples of parameter estimation are available here

By using an optimizer to optimize the likelihood of an UnscentedKalmanFilter:

Estimation of time-varying parameters:

Adaptive control by means of estimation of time-varying parameters: