Fault detection
This is also a video tutorial, available below:
Fault detection using state estimation
This tutorial explores the use of a Kalman filter for fault detection in a thermal system
- Modeling
- Filtering
- Maximum-likelihood estimation of covariance and model parameters
- Monitor prediction-error Z-score to detect faults
- A fault may be faulty sensor or unexpected temperature fluctuations
using DelimitedFiles, Plots, Dates
using LowLevelParticleFilters, LinearAlgebra, StaticArrays
using LowLevelParticleFilters: AbstractKalmanFilter, particletype, covtype,state, covariance, parameters, KalmanFilteringSolution
using Optim
Load data
From kaggle.com/datasets/arashnic/sensor-fault-detection-data
A time series of temperature measurements
using Downloads
url = "https://drive.google.com/uc?export=download&id=1zuIBaOhhrCxnifbvY7qJQTOyKWBDeBRh"
filename = "sensor-fault-detection.csv"
Downloads.download(url, filename)
raw_data = readdlm(filename, ';')
header = raw_data[1,:]
df = dateformat"yyyy-mm-ddTHH:MM:SS"
The data is not stored in order
time_unsorted = DateTime.(getindex.(raw_data[2:end, 1], Ref(1:19)), df)
62629-element Vector{Dates.DateTime}:
2017-03-01T23:20:00
2017-03-02T04:00:00
2017-03-23T06:25:00
2017-03-23T19:35:00
2017-04-04T15:10:00
2017-04-04T23:50:00
2017-04-05T04:55:00
2017-04-05T16:25:00
2017-04-07T16:20:00
2017-04-07T23:50:00
⋮
2017-07-21T19:40:00
2017-07-21T21:20:00
2017-06-27T16:10:00
2017-06-29T03:45:00
2017-05-05T17:00:00
2017-07-07T16:40:00
2017-05-07T09:20:00
2017-05-07T14:55:00
2017-05-07T20:40:00
so we compute a sorting permutation that brings it into chronological order
perm = sortperm(time_unsorted)
time = time_unsorted[perm]
y = raw_data[2:end, 3][perm] .|> float
y
is the recorded temperature data.
Look at the data
plot(time, y, ylabel="Temperature", legend=false)

timev = Dates.value.(time) ./ 1000 # A numerical time vector, time was in milliseconds
plot(diff(timev), yscale=:log10, title="Time interval between measurement points", legend=false)

Samples are not evenly spaced (lots of missing data), but the interval is always a multiple of Ts
intervals = sort(unique(diff(timev)))
intervals ./ intervals[1]
Ts = intervals[1]
Tf = intervals[end] - intervals[1]
We expand the data arrays such that we can treat them as having a constant sample interval, time points where there is no data available are indicated as missing
time_full = range(timev[1], timev[end], step=Ts)
available_inds = [findfirst(==(t), time_full) for t in timev]
y_full = fill(NaN, length(time_full))
y_full[available_inds] .= y
y_full = replace(y_full, NaN=>missing)
y_full = SVector{1}.(y_full)
Design Kalman filter
Modeling
A simple model of temperature change is
\[\dot T(t) = \alpha \big(T(t) - T_{env}(t)\big) + w(t)\]
Where $T$ is the temperature of the system, $T_{env}$ the temperature of the environment and $w$ represents thermal energy added or removed by unmodeled sources.
Since we have no knowledge of $T_{env}$ and $w$, but we observe that they vary slowly, we add yet another state variable to the model corresponding to an integrating disturbance model:
\[\begin{aligned} \dot T(t) &= z(t) + b_T w_T(t) \\ \dot z(t) &= b_z w_z(t) \end{aligned}\]
This model is linear, and can be written on the form
\[\begin{aligned} \dot x &= Ax + Bw \\ y &= Cx + e \end{aligned}\]
with $A$ matrix
\[A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\]
which, when discretized (assuming unit sample interval), becomes
\[A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\]
A,B,C,D = SA[1.0 1; 0 1], @SMatrix(zeros(2,0)), SA[1.0 0], 0;
Picking covariance matrices
R1 = 1e-4LowLevelParticleFilters.double_integrator_covariance(1) |> SMatrix{2,2}
R2 = SA[0.1^2;;]
d0 = LowLevelParticleFilters.SimpleMvNormal(SA[y[1], 0], SA[100.0 0; 0 0.1])
kf = KalmanFilter(A,B,C,D,R1,R2,d0; Ts)
KalmanFilter{StaticArraysCore.SMatrix{2, 2, Float64, 4}, StaticArraysCore.SMatrix{2, 0, Float64, 0}, StaticArraysCore.SMatrix{1, 2, Float64, 2}, Matrix{Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}, StaticArraysCore.SMatrix{1, 1, Float64, 1}, LowLevelParticleFilters.SimpleMvNormal{StaticArraysCore.SVector{2, Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}}, StaticArraysCore.SVector{2, Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}, Float64, LowLevelParticleFilters.NullParameters, Float64}([1.0 1.0; 0.0 1.0], 2×0 StaticArraysCore.SMatrix{2, 0, Float64, 0} with indices SOneTo(2)×SOneTo(0), [1.0 0.0], Matrix{Float64}(undef, 1, 0), [2.5e-5 5.0e-5; 5.0e-5 0.0001], [0.010000000000000002;;], LowLevelParticleFilters.SimpleMvNormal{StaticArraysCore.SVector{2, Float64}, StaticArraysCore.SMatrix{2, 2, Float64, 4}}([27.63677025, 0.0], [100.0 0.0; 0.0 0.1]), [27.63677025, 0.0], [100.0 0.0; 0.0 0.1], 0, 300.0, LowLevelParticleFilters.NullParameters(), 1.0, 2, 0, 1, SignalNames(["x1", "x2"], String[], ["y1"], "KF"))
Perform filtering
When data is missing, we omit the call to correct!
. We still perform the prediction step though.
function special_forward_trajectory(kf::AbstractKalmanFilter, u::AbstractVector, y::AbstractVector, p=parameters(kf))
reset!(kf)
T = length(y)
x = Array{particletype(kf)}(undef,T)
xt = Array{particletype(kf)}(undef,T)
R = Array{covtype(kf)}(undef,T)
Rt = Array{covtype(kf)}(undef,T)
e = zeros(eltype(particletype(kf)), length(y))
σs = zeros(eltype(particletype(kf)), length(y))
ll = zero(eltype(particletype(kf)))
for t = 1:T
ti = (t-1)*kf.Ts
x[t] = state(kf) |> copy
R[t] = covariance(kf) |> copy
if !any(ismissing, y[t])
lli, ei, S, Sᵪ = correct!(kf, u[t], y[t], p, ti)
σs[t] = √(ei'*(Sᵪ\ei)) # Compute the Z-score
e[t] = ei[]
ll += lli
end
xt[t] = state(kf) |> copy
Rt[t] = covariance(kf) |> copy
predict!(kf, u[t], p, ti)
end
KalmanFilteringSolution(kf,u,y,x,xt,R,Rt,ll,e), σs
end
u_full = [@SVector(zeros(0)) for y in y_full];
start = 1 # Change this value to display different parts of the data set
N = 1000 # Number of data points to include (to limit plot size in the docs, plot with Plots.plotly() and N = length(y_full) to see the full data set with the ability to zoom interactively in the plot)
sol, σs = special_forward_trajectory(kf, u_full[(1:N) .+ (start-1)], y_full[(1:N) .+ (start-1)])
sol.ll
-33.70147554944659
Smoothing
For good measure, we also perform smoothing, computing
\[x(k \,|\, T_f)\]
as opposed to filtering which is computing
\[x(k \,|\, k)\]
or prediction
\[x(k \,|\, k-1)\]
xT,RT = smooth(sol, kf, sol.u, sol.y)
Visualize the filtered and smoothed trajectories
timevec = range(0, step=Ts, length=length(sol.y))
plot(sol,
plotx = false,
plotxt = true,
plotRt = true,
plotyh = false,
plotyht = true,
size = (650,600), seriestype = [:line :line :scatter], link = :x,
)
plot!(timevec, reduce(hcat, xT)[1,:], sp=3, label="Smoothed")

Estimate the dynamics covariance using maximum-likelihood estimation (MLE)
Since we have a single parameter only, we may plot the loss landscape.
svec = exp10.(range(-5, -2, length=30)) # Covariance values to try
# Compute the log-likelihood for all covariance values
lls = map(svec) do s #
R1 = s*LowLevelParticleFilters.double_integrator_covariance(1) |> SMatrix{2,2}
kf = KalmanFilter(A,B,C,D,R1,R2,d0; Ts)
sol, σs = special_forward_trajectory(kf, u_full, y_full)
sol.ll
end
plot(svec, lls, xscale=:log10, title="Log-likelihood estimation")
Get the covariance parameter associated with the maximum likelihood:
svec[argmax(lls)]
0.0002807216203941176
Optimize "friction" and covariance jointly
We can add some damping to the velocity state in the double-integrator model. When doing so, we should also estimate the full covariance matrix of the dynamics noise. This gives us an estimation problem with 1 + 3 parameters, 3 for the triangular part of the covariance matrix Cholesky factor. Estimating the Cholesky factor instead of the full covariance matrix yields fewer optimizaiton variables and ensures that the result is a valid, positive definite and symmetric covariance matrix. To ensure that the "friction parameter" is positive, we optimize the $\log$ of the parameter.
A double integrator has the dynamics matrix
\[\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\]
By modifying this to
\[\begin{bmatrix} 1 & 1 \\ 0 & \alpha \end{bmatrix}\]
where $0 \leq \alpha \leq 1$, we can add some damping to the velocity, i.e., if no force is acting on it it will eventually slow down to velocity zero. It's not quite correct to call the parameter $\alpha$ a "damping term", the formulation $\beta = 1 - \alpha$ would be closer to an actual discrete-time damping factor.
function triangular(x)
m = length(x)
n = round(Int, sqrt(2m-1))
T = zeros(eltype(x), n, n)
k = 1
for i = 1:n, j = i:n
T[i,j] = x[k]
k += 1
end
T
end
invtriangular(T) = [T[i,j] for i = 1:size(T,1) for j = i:size(T,1)]
params = log.([invtriangular(cholesky(R1).U); 1])
function get_opt_kf(logp)
T = eltype(logp)
p = exp.(logp)
R1c = triangular(p[1:3]) |> SMatrix{2,2}
R1 = R1c'R1c + 1e-8I
vel = p[4]
vel > 1 && (return T(Inf))
A = SA[1 1; 0 vel]
d0T = LowLevelParticleFilters.SimpleMvNormal(T.(d0.μ), T.(d0.Σ + I))
kf = KalmanFilter(A,B,C,D,R1,R2,d0T; Ts, check=false)
end
function cost(logp)
try
kf = get_opt_kf(logp)
soli, σs = special_forward_trajectory(kf, u_full, y_full)
return -soli.ll
catch e
return eltype(logp)(Inf)
end
end
cost(params)
-43252.219664809716
Optimize
res = Optim.optimize(
cost,
params,
LBFGS(),
Optim.Options(
show_trace = true,
show_every = 5,
iterations = 1000,
x_tol = 1e-7,
),
autodiff = :forward,
)
get_opt_kf(res.minimizer).R1
2×2 StaticArraysCore.SMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
0.000104169 0.000198333
0.000198333 0.000377665
The initial guess was
R1
2×2 StaticArraysCore.SMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
2.5e-5 5.0e-5
5.0e-5 0.0001
Compare optimized parameter vector with initial guess:
exp.([params res.minimizer])
4×2 Matrix{Float64}:
0.005 0.0102058
0.01 0.0194333
2.50766e-11 2.50766e-11
1.0 0.904142
Visualize optimized filtering trajectory
kf2 = get_opt_kf(res.minimizer)
sol2, σs2 = special_forward_trajectory(kf2, u_full[(1:N) .+ (start-1)], y_full[(1:N) .+ (start-1)])
xT2,RT2 = smooth(sol2, kf2, sol2.u, sol2.y)
plot(sol2, plotx=false, plotxt=true, plotRt=true, plotyh=false, plotyht=true, size=(650,600), seriestype=[:line :line :scatter], link=:x)
plot!(timevec, reduce(hcat, xT2)[1,:], sp=3, label="Smoothed")
outliers = findall(σs2 .> 5)
vline!([timevec[outliers]], sp=3)

Fault detection
We implement a simple fault detector using Z-scores. When the Z-score is higher than 4, we consider it a fault.
plot(timevec, σs2); hline!([1 2 3 4], label=false)

Z-scores may not capture large outliners if they occur when the estimator is very uncertain Does Z-score correlate with "velocity", i.e., are faults correlated with large continuous slopes in the data?
sol_full, σs_full = special_forward_trajectory(kf2, u_full, y_full)
scatter(abs.(getindex.(sol_full.xt, 2)), σs_full, ylabel="Z-score", xlabel="velocity")

not really, it looks like large Z-scores can appear even when the estimated velocity is small.
Summary
- A state estimator can indicate faults when the error is larger than expected
- What is expected is determined by the model
The notebook used in the tutorial is available here: