Fault detection

This is also a video tutorial, available below:

Fault detection using state estimation

This tutorial explores the use of a Kalman filter for fault detection in a thermal system

  • Modeling
  • Filtering
  • Maximum-likelihood estimation of covariance and model parameters
  • Monitor prediction-error Z-score to detect faults
    • A fault may be faulty sensor or unexpected temperature fluctuations
using DelimitedFiles, Plots, Dates
using LowLevelParticleFilters, LinearAlgebra, StaticArrays
using LowLevelParticleFilters: AbstractKalmanFilter, particletype, covtype,state,  covariance, parameters, KalmanFilteringSolution
using Optim

Load data

From kaggle.com/datasets/arashnic/sensor-fault-detection-data

A time series of temperature measurements

using Downloads
url = "https://drive.google.com/uc?export=download&id=1zuIBaOhhrCxnifbvY7qJQTOyKWBDeBRh"
filename = "sensor-fault-detection.csv"
Downloads.download(url, filename)
raw_data = readdlm(filename, ';')
header = raw_data[1,:]
df = dateformat"yyyy-mm-ddTHH:MM:SS"

The data is not stored in order

time_unsorted = DateTime.(getindex.(raw_data[2:end, 1], Ref(1:19)), df)
62629-element Vector{Dates.DateTime}:
 2017-03-01T23:20:00
 2017-03-02T04:00:00
 2017-03-23T06:25:00
 2017-03-23T19:35:00
 2017-04-04T15:10:00
 2017-04-04T23:50:00
 2017-04-05T04:55:00
 2017-04-05T16:25:00
 2017-04-07T16:20:00
 2017-04-07T23:50:00
 ⋮
 2017-07-21T19:40:00
 2017-07-21T21:20:00
 2017-06-27T16:10:00
 2017-06-29T03:45:00
 2017-05-05T17:00:00
 2017-07-07T16:40:00
 2017-05-07T09:20:00
 2017-05-07T14:55:00
 2017-05-07T20:40:00

so we compute a sorting permutation that brings it into chronological order

perm = sortperm(time_unsorted)
time = time_unsorted[perm]
y = raw_data[2:end, 3][perm] .|> float

y is the recorded temperature data.

Look at the data

plot(time, y, ylabel="Temperature", legend=false)
Example block output
timev = Dates.value.(time)  ./ 1000 # A numerical time vector, time was in milliseconds
plot(diff(timev), yscale=:log10, title="Time interval between measurement points", legend=false)
Example block output

Samples are not evenly spaced (lots of missing data), but the interval is always a multiple of Ts

intervals = sort(unique(diff(timev)))
intervals ./ intervals[1]
Ts = intervals[1]
Tf = intervals[end] - intervals[1]

We expand the data arrays such that we can treat them as having a constant sample interval, time points where there is no data available are indicated as missing

time_full = range(timev[1], timev[end], step=Ts)

available_inds = [findfirst(==(t), time_full) for t in timev]

y_full = fill(NaN, length(time_full))
y_full[available_inds] .= y
y_full = replace(y_full, NaN=>missing)
y_full = SVector{1}.(y_full)

Design Kalman filter

Modeling

A simple model of temperature change is

\[\dot T(t) = \alpha \big(T(t) - T_{env}(t)\big) + w(t)\]

Where $T$ is the temperature of the system, $T_{env}$ the temperature of the environment and $w$ represents thermal energy added or removed by unmodeled sources.

Since we have no knowledge of $T_{env}$ and $w$, but we observe that they vary slowly, we add yet another state variable to the model corresponding to an integrating disturbance model:

\[\begin{aligned} \dot T(t) &= z(t) + b_T w_T(t) \\ \dot z(t) &= b_z w_z(t) \end{aligned}\]

This model is linear, and can be written on the form

\[\begin{aligned} \dot x &= Ax + Bw \\ y &= Cx + e \end{aligned}\]

with $A$ matrix

\[A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\]

which, when discretized (assuming unit sample interval), becomes

\[A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\]

A,B,C,D = SA[1.0 1; 0 1], @SMatrix(zeros(2,0)), SA[1.0 0], 0;

Picking covariance matrices

R1 = 1e-4LowLevelParticleFilters.double_integrator_covariance(1) |> SMatrix{2,2}
R2 = SA[0.1^2;;]
d0 = LowLevelParticleFilters.SimpleMvNormal(SA[y[1], 0], SA[100.0 0; 0 0.1])
kf = KalmanFilter(A,B,C,D,R1,R2,d0; Ts)
KalmanFilter{StaticArraysCore.SMatrix{2, 2, Float64, 4}, StaticArraysCore.SMatrix{2, 0, Float64, 0}, StaticArraysCore.SMatrix{1, 2, Float64, 2}, Matrix{Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}, StaticArraysCore.SMatrix{1, 1, Float64, 1}, LowLevelParticleFilters.SimpleMvNormal{StaticArraysCore.SVector{2, Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}}, StaticArraysCore.SVector{2, Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}, Float64, LowLevelParticleFilters.NullParameters, Float64}([1.0 1.0; 0.0 1.0], 2×0 StaticArraysCore.SMatrix{2, 0, Float64, 0} with indices SOneTo(2)×SOneTo(0), [1.0 0.0], Matrix{Float64}(undef, 1, 0), [2.5e-5 5.0e-5; 5.0e-5 0.0001], [0.010000000000000002;;], LowLevelParticleFilters.SimpleMvNormal{StaticArraysCore.SVector{2, Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}}([27.63677025, 0.0], [100.0 0.0; 0.0 0.1]), [27.63677025, 0.0], [100.0 0.0; 0.0 0.1], 0, 300.0, LowLevelParticleFilters.NullParameters(), 1.0, 2, 0, 1, SignalNames(["x1", "x2"], String[], ["y1"], "KF"))

Perform filtering

When data is missing, we omit the call to correct!. We still perform the prediction step though.

function special_forward_trajectory(kf::AbstractKalmanFilter, u::AbstractVector, y::AbstractVector, p=parameters(kf))
    reset!(kf)
    T    = length(y)
    x    = Array{particletype(kf)}(undef,T)
    xt   = Array{particletype(kf)}(undef,T)
    R    = Array{covtype(kf)}(undef,T)
    Rt   = Array{covtype(kf)}(undef,T)
    e    = zeros(eltype(particletype(kf)), length(y))
	σs   = zeros(eltype(particletype(kf)), length(y))
    ll   = zero(eltype(particletype(kf)))
    for t = 1:T
        ti = (t-1)*kf.Ts
        x[t]  = state(kf)      |> copy
        R[t]  = covariance(kf) |> copy
		if !any(ismissing, y[t])
        	lli, ei, S, Sᵪ = correct!(kf, u[t], y[t], p, ti)
			σs[t] = √(ei'*(Sᵪ\ei)) # Compute the Z-score
			e[t] = ei[]
			ll += lli
		end
        xt[t] = state(kf)      |> copy
        Rt[t] = covariance(kf) |> copy
        predict!(kf, u[t], p, ti)
    end
    KalmanFilteringSolution(kf,u,y,x,xt,R,Rt,ll,e), σs
end

u_full = [@SVector(zeros(0)) for y in y_full];

start = 1 # Change this value to display different parts of the data set
N = 1000  # Number of data points to include (to limit plot size in the docs, plot with Plots.plotly() and N = length(y_full) to see the full data set with the ability to zoom interactively in the plot)

sol, σs = special_forward_trajectory(kf, u_full[(1:N) .+ (start-1)], y_full[(1:N) .+ (start-1)])

sol.ll
-33.70147554944659

Smoothing

For good measure, we also perform smoothing, computing

\[x(k \,|\, T_f)\]

as opposed to filtering which is computing

\[x(k \,|\, k)\]

or prediction

\[x(k \,|\, k-1)\]

xT,RT = smooth(sol, kf, sol.u, sol.y)

Visualize the filtered and smoothed trajectories

timevec = range(0, step=Ts, length=length(sol.y))

plot(sol,
    plotx   = false,
    plotxt  = true,
    plotRt  = true,
    plotyh  = false,
    plotyht = true,
    size = (650,600), seriestype = [:line :line :scatter], link = :x,
)
plot!(timevec, reduce(hcat, xT)[1,:], sp=3, label="Smoothed")
Example block output

Estimate the dynamics covariance using maximum-likelihood estimation (MLE)

Since we have a single parameter only, we may plot the loss landscape.

svec = exp10.(range(-5, -2, length=30)) # Covariance values to try

# Compute the log-likelihood for all covariance values
lls = map(svec) do s #
	R1 = s*LowLevelParticleFilters.double_integrator_covariance(1) |> SMatrix{2,2}
	kf = KalmanFilter(A,B,C,D,R1,R2,d0; Ts)
	sol, σs = special_forward_trajectory(kf, u_full, y_full)
	sol.ll
end

plot(svec, lls, xscale=:log10, title="Log-likelihood estimation")
Example block output

Get the covariance parameter associated with the maximum likelihood:

svec[argmax(lls)]
0.0002807216203941176

Optimize "friction" and covariance jointly

We can add some damping to the velocity state in the double-integrator model. When doing so, we should also estimate the full covariance matrix of the dynamics noise. This gives us an estimation problem with 1 + 3 parameters, 3 for the triangular part of the covariance matrix Cholesky factor. Estimating the Cholesky factor instead of the full covariance matrix yields fewer optimizaiton variables and ensures that the result is a valid, positive definite and symmetric covariance matrix. To ensure that the "friction parameter" is positive, we optimize the $\log$ of the parameter.

A double integrator has the dynamics matrix

\[\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\]

By modifying this to

\[\begin{bmatrix} 1 & 1 \\ 0 & \alpha \end{bmatrix}\]

where $0 \leq \alpha \leq 1$, we can add some damping to the velocity, i.e., if no force is acting on it it will eventually slow down to velocity zero. It's not quite correct to call the parameter $\alpha$ a "damping term", the formulation $\beta = 1 - \alpha$ would be closer to an actual discrete-time damping factor.

function triangular(x)
    m = length(x)
    n = round(Int, sqrt(2m-1))
    T = zeros(eltype(x), n, n)
    k = 1
    for i = 1:n, j = i:n
        T[i,j] = x[k]
        k += 1
    end
    T
end

invtriangular(T) = [T[i,j] for i = 1:size(T,1) for j = i:size(T,1)]

params = log.([invtriangular(cholesky(R1).U); 1])

function get_opt_kf(logp)
	T = eltype(logp)
	p = exp.(logp)
	R1c = triangular(p[1:3]) |> SMatrix{2,2}
	R1 = R1c'R1c + 1e-8I
	vel = p[4]
	vel > 1 && (return T(Inf))
	A = SA[1 1; 0 vel]
	d0T = LowLevelParticleFilters.SimpleMvNormal(T.(d0.μ), T.(d0.Σ + I))
	kf = KalmanFilter(A,B,C,D,R1,R2,d0T; Ts, check=false)
end

function cost(logp)
	try
		kf = get_opt_kf(logp)
		soli, σs = special_forward_trajectory(kf, u_full, y_full)
		return -soli.ll
	catch e
		return eltype(logp)(Inf)
	end
end

cost(params)
-43252.219664809716

Optimize

res = Optim.optimize(
    cost,
    params,
    LBFGS(),
    Optim.Options(
        show_trace        = true,
        show_every        = 5,
        iterations        = 1000,
		x_tol 			  = 1e-7,
    ),
	autodiff = :forward,
)
get_opt_kf(res.minimizer).R1
2×2 StaticArraysCore.SMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
 0.000104169  0.000198333
 0.000198333  0.000377665

The initial guess was

R1
2×2 StaticArraysCore.SMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
 2.5e-5  5.0e-5
 5.0e-5  0.0001

Compare optimized parameter vector with initial guess:

exp.([params res.minimizer])
4×2 Matrix{Float64}:
 0.005        0.0102058
 0.01         0.0194333
 2.50766e-11  2.50766e-11
 1.0          0.904142

Visualize optimized filtering trajectory

kf2 = get_opt_kf(res.minimizer)
sol2, σs2 = special_forward_trajectory(kf2, u_full[(1:N) .+ (start-1)], y_full[(1:N) .+ (start-1)])

xT2,RT2 = smooth(sol2, kf2, sol2.u, sol2.y)

plot(sol2, plotx=false, plotxt=true, plotRt=true, plotyh=false, plotyht=true, size=(650,600), seriestype=[:line :line :scatter], link=:x)
plot!(timevec, reduce(hcat, xT2)[1,:], sp=3, label="Smoothed")

outliers = findall(σs2 .> 5)
vline!([timevec[outliers]], sp=3)
Example block output

Fault detection

We implement a simple fault detector using Z-scores. When the Z-score is higher than 4, we consider it a fault.

plot(timevec, σs2); hline!([1 2 3 4], label=false)
Example block output

Z-scores may not capture large outliners if they occur when the estimator is very uncertain Does Z-score correlate with "velocity", i.e., are faults correlated with large continuous slopes in the data?

sol_full, σs_full = special_forward_trajectory(kf2, u_full, y_full)
scatter(abs.(getindex.(sol_full.xt, 2)), σs_full, ylabel="Z-score", xlabel="velocity")
Example block output

not really, it looks like large Z-scores can appear even when the estimated velocity is small.

Summary

  • A state estimator can indicate faults when the error is larger than expected
  • What is expected is determined by the model

The notebook used in the tutorial is available here: