In a previous blog post, we spoke about how to tune a Kalman filter, a widely used state estimator for linear dynamical systems. Unfortunately, not all practically occurring systems are represented sufficiently well with linear dynamics affected by Gaussian noise. In fact, such assumptions are almost never satisfied completely, even if they may be good enough in many situations. In this post, we will take a step up in sophistication and consider nonlinear systems expressed as differential-algebraic equations, and explore the challenges and opportunities that arise when performing state estimation and virtual sensing for this richer set of systems.
Modern equation-based modeling tools, such as ModelingToolkit, allow the user to quickly and in a component-based fashion, model complex real-world systems. Such tools often produce differential-algebraic equation models (DAEs), i.e., the dynamics may include both differential equations and algebraic equations. The standard Kalman filter, and most of its nonlinear extensions, do not deal with algebraic equations, so how do we estimate the state for such a DAE system?
Before we proceed, we define some notion. Let the dynamic equations of the system we want to estimate the state of be written as
$$M\dot x(t) = f(x(t),u(t))$$
where $$M$$ is called a "mass matrix", $$x$$ is the state (technically, since we are dealing with DAEs, it's a descriptor) and $$u$$ is the input. If we take $$M=I$$, we have a standard ODE system, if $$M$$ is invertible, we can easily convert this system to an ODE system like $$ẋ=M^-1f(x,u)$$, so the interesting case is when $$M$$ is singular. A commonly occurring special case is when $$M$$ has the form
$$\left\lbrack \matrix{1 & 0\cr 0 & 0} \right\rbrack$$
i.e., the system can be written as
$$\left\lbrack \matrix{\dot x_\delta \cr 0} \right\rbrack = f(x,u)$$
where $$\dot x_\delta$$ includes the derivatives of the differential state variables only. In this form, we have $$f$$ containing a number of ordinary differential equations, followed by one or several algebraic equations $$0=g(x,u)$$
Differential-equation solvers that support DAEs often make use of a nonlinear root finder that continuously solves the equation $$0=g(x,u)$$ in order to simulate the system.
Now, how do we estimate the state descriptor $$x$$ in a DAE system? This question is much more nuanced than when we're dealing with systems of ordinary differential equations only. How come?
When we design a state estimator, we must come up with some model of the disturbances that affect the dynamics and the measurements. This disturbance model is usually designed to include both unmeasured inputs that affect the system, but also to mitigate the effects of model error. Modeling any physical system perfectly well is impossible, and we must somehow account for the fact that our dynamic equations are an imperfect representation of the true dynamics that govern the system. As detailed in the previous post, the Kalman filter models these disturbances as Gaussian noise, but for nonlinear DAE systems, things can quickly become more challenging.
To ground the continued discussion, we introduce an example system, the Cartesian pendulum. The most common representation of the dynamics of a simple pendulum is $$\ddot \phi - \frac{g}{l}\sin\phi + \tau$$ In this representation, we use the angle $$\phi$$ of the pendulum as our coordinate, and $$\tau$$ is some torque applied to the joint. However, we can model the pendulum also in Cartesian $$(x,y)$$ coordinates, in which case the dynamic equations may be represented as
$$\dot x = x_v$$
$$\dot y = y_v$$
$$x_v = -\lambda_x + f_x$$
$$y_v = -\lambda_y + f_y - g$$
$$0 = x^2 + y^2 - l^2$$
where $$x,y$$ are the coordinates of the tip of the pendulum, $$f_x, f_y$$ are forces acting on the tip, $$\lambda$$ is the tension force in the pendulum rod, $$g$$ is the gravitiational acceleration and $$l$$ is the length of the pendulum. Notice how there's no differential equation for $$\lambda$$, instead, there is an algebraic equation that says that the tip of the pendulum is at a distance $$l$$ from the pivot
$$0 = x^2 + y^2 - l^2 \iff \sqrt{x^2 + y^2} = l$$
from this algebraic equation, it's possible to compute the tension force (if we know the mass of the pendulum).
An animation of this pendulum is shown below, this animation is produced by the very last code in this post, but the figure is reproduced here to establish a picture of how the system looks and feels.
Some numerical integrators do not like this representation of the pendulum, the problem is that it has a high DAE index. The index of a DAE is roughly the number of times you need to differentiate the algebraic equation with respect to time before you get a differential equation. This representation of the pendulum has index 3 since our algebraic equation is a position constraint. If we differentiate this equation, we get another algebraic equation that says that the velocity is tangential to the circle the pendulum is tracing. If we differentiate one more time, we get an algebraic equation that says that the acceleration of the pendulum is centripetal (points towards the center point). Finally, if we differentiate one more time, we get a differential equation. Now, why don't we always differentiate sufficiently many times to get a differential equation so that we can forget about DAEs altogether? The problem is that while the differentiated system is mathematically equivalent, we now have to integrate one extra equation, and integration is associated with some small numerical error. Over time, this integrated error grows, causing our original index-3 constraint $$\sqrt{x^2 + y^2} = l$$ to be violated. The more times we differentiate (and thus have to integrate), the bigger this problem becomes. This can be compared to trying to estimate a position by integrating a noisy accelerometer measurement twice, this typically leads to a very noisy estimate of the position. The figure below illustrates the problem of integrating small random errors.
If all we are doing is simulating the system, this problem can be mitigated by making the integrator tolerance very small so that the numerical drift is small, but what about when we are performing state estimation?
Imagine that we obtain measurements of the $$x,y$$ coordinates of the pendulum, perhaps from a camera. These are going to be noisy and associated with some probability-density function. Let's say that the noise is Gaussian and distributed according to $$N(0,R_y)$$ where $$R_y$$ is a 2×2 covariance matrix. We immediately spot a problem, the measurement noise has two degrees of freedom, but the real system only has one. Our noisy measurements are likely going to lie outside of the circle $$\sqrt{x^2 + y^2} = l$$ and thus violate the constraint! If we differentiate the algebraic equations twice, to obtain an index-1 DAE, we hide this problem since we can solve the algebraic equation that relates $$x,y$$ and $$\lambda$$ through centripetal acceleration for a feasible $$\lambda$$ given any pair of $$x,y$$ measurements, but our original index-3 position constraint will be silently violated.
The problems don't end here though, how do we represent the posterior distribution of the state? The Kalman filter represents this as a multivariate normal distribution, but no normal distribution has support only on a circle, and we can thus never represent the posterior uncertainty as a multivariate normal in the $$x,y$$ space. We cannot even compute the standard mean of several distinct points on the circle without getting a point that lies inside the circle and thus violates the constraint (This is what the Unscented Kalman filter (UKF) would have done for this system).
How do we solve this problem? Unfortunately, there is not a single easy answer, and the best approach is context-dependent. For the pendulum, the answer is obvious, model the system in polar coordinates instead of Cartesian coordinates. For other systems, the answer is not so straightforward. If we know that we are modeling an index-3 system like the Cartesian pendulum, we could imagine the following list of partial solutions
Use an integrator that handles index 3 algebraic equations and keep the original formulation of the dynamics without differentiating the algebraic equation. Then project each measurement onto the feasible space of $$\sqrt{x^2 + y^2} = l$$.
Represent posterior distributions over the state as samples rather than as normal distributions.
Represent the input disturbances as dynamically feasible inputs to the system (in spirit with the previous blog post). This prevents the system from moving away from the index-3 constraint manifold due to disturbances.
In the rest of this post, we will make use of all three of the above-suggested points. However, it would be possible to make use of an index-1 formulation provided that the dynamically feasible disturbances are modeled already during the index-reduction stage.
Let's stop talking and implement some state estimation! We will use a particle filter as our state estimator, a sequential Monte Carlo method that represents distributions as collections of samples ("particles"). The particle filter is implemented in LowLevelParticleFilters.jl The integration of the continuous-time dynamics will be done using a direct-collocation method from the package SeeToDee.jl.
We start by loading the required packages:
using Pkg; Pkg.add(url="https://github.com/baggepinnen/SeeToDee.jl"); Pkg.add(url="https://github.com/baggepinnen/LowLevelParticleFilters.jl"); Pkg.add([
"Test"
"Random"
"LinearAlgebra"
"Statistics"
"StaticArrays"
"Distributions"
"Plots"
"NonlinearSolve"
])
using LowLevelParticleFilters, SeeToDee, Test, Random, LinearAlgebra, Statistics, StaticArrays, Distributions, Plots, NonlinearSolve
begin
Random.seed!(0)
gr(fmt=:png)
end
We then define the dynamic equations, our $$M\dot x(t) = f(x(t),u(t))$$
pend
A Cartesian pendulum of length 1 in DAE form
"A Cartesian pendulum of length 1 in DAE form"
function pend(state, f, p, t=0)
x,y,u,v,λ = state
g = 9.82
SA[
u # = ẋ
v # = ẏ
-λ*x + f[1] # = u̇
-λ*y - g + f[2] # = v̇
x^2 + y^2 - 1 # = 0
]
end
Next, we define some of the properties of the state estimator. We integrate with a fixed interval of $$T_s=0.01$$ seconds and use $$N=100$$ particles to represent our state distribution. We also choose some covariance properties of the measurement noise $$\sigma_y$$ and the input disturbance $$\sigma_w$$. We will model the dynamic disturbance as entering like Cartesian forces, similar to how the control input enters. This is a dynamically feasible disturbance that the DAE integrator can integrate the same way as it integrates the control input forces. Since the integrator we use treats inputs as constant during the integration step (Zero-order-Hold), this implicitly makes the assumption that disturbance forces are also piecewise constant. Since the integration time step $$T_s$$ is small, this is an acceptable model in this case and we will get a good approximation to a continuous-time noise process.
begin
nx = 4 # Dinemsion of differential state
na = 1 # Number of algebraic variables
nu = 2 # Dinemsion of input
ny = 2 # Dinemsion of measurements
Ts = 0.01
N = 100 # Number of particles
p = nothing # We have no parameters to keep track of
σy = 0.1 # Measurement noise std dev
σy_model = 0.2 # We model slightly larger measurement noise than we actually have
σw = 5.0 # Dynamics noise std dev
dg = MvNormal(zeros(2), σy_model)
end;
We then implement a function that models the measurement process. We assume that we can measure the coordinates $$x,y$$. Here we sample a random value from the measurement-noise distribution and then project the result measurement onto the circle traced by the pendulum. The resulting distribution is thus no longer Gaussian, but that is not a problem for a particle filter!
measurement (generic function with 2 methods)
@inbounds function measurement(x,u,p,t, noise=false)
if noise
y = SA[x[1] + σy*randn(), x[2] + σy*randn()]
y ./ norm(y) # Measurements projected to be consistent with algebraic constraint
else
SA[x[1], x[2]]
end
end
Here, we chose to apply measurement noise with standard deviation $$\sigma_y=0.1$$ to our measurements, but let the model in the filter use $$\sigma_{y_{model}}=0.2$$, this is a common trick in particle filtering to improve the performance of the estimator without increasing the number of particles, and thus the computational requirements, too much.
measurement_likelihood (generic function with 1 method)
function measurement_likelihood(x, u, y, p, t)
logpdf(dg, measurement(x, u, p, t)-y) # A simple measurement model with normal additive noise
end
Next, we discretize the continuous-time dynamics using the direct-collocation method. This method will happily accept algebraic equations, even if they are index 3.
We then compute an initial condition and create an initial state distribution from which the particle filter will sample particles when initializing the filter.
begin
x0 = SeeToDee.initialize(discrete_dynamics, [1.0,0,0,0,0], p) # This makes susre that the initial condition satisfies the algebraic equation.
x0 = SVector(x0...)
d0 = MvNormal(x0, 0) # Initial state Distribution, zero variance
end;
begin
x0 = SeeToDee.initialize(discrete_dynamics, [1.0,0,0,0,0], p) # This makes susre that the initial condition satisfies the algebraic equation.
x0 = SVector(x0...)
d0 = MvNormal(x0, 0) # Initial state Distribution, zero variance
end;
We also need to define the dynamics of each particle. The function dynamics below makes use of the discretized dynamic equations but adds noise handling if the filter calls for dynamics noise to be added. The noise is sampled from a normal distribution and added to the control input.
dynamics (generic function with 2 methods)
function dynamics(x, u, p, t, noise=false)
if noise
# Add input noise (acts like Cartesian forces)
un = @SVector randn(length(u))
xp = discrete_dynamics(x, u + σw*un, p, t)
else
xp = discrete_dynamics(x, u, p, t)
end
xp
end
We now package all the settings and functions into an AdvancedParticleFilter struct
apf = AdvancedParticleFilter(N, dynamics, measurement, measurement_likelihood, p, d0, threads=false);
With the filter constructed, we can simulate a draw from the posterior distribution implied by the dynamics and noise distributions. We generate some interesting-looking control inputs u
, and pass this to the simulate
function. This function will add noise to the simulation in accordance with the functions dynamics
and measurement
defined above. To perform the actual filtering, we call forward_trajectory
which performs filtering along the entire trajectory all at once. The result of the filtering is a solution object that contains all the particles. In order to easily visualize the result, we compute the mean over all the particles at each time step using the function mean_trajectory
.[1] The result of the simulation and the filtering is shown below.
1
Computing the mean of points on a circle will in general lead to a point that is not on the circle, we thus project the computed mean back onto the circle before plotting. The theoretically correct way of computing this mean is to compute a generalized mean under the metric implied by measuring distances along the circle.
begin
t = range(0, length=250, step=Ts)
# Generate input
ui = [10sign.(sin.(5 .* t.^2)) randn(length(t))] |> eachrow .|> collect
x,u,y = simulate(apf, ui; dynamics_noise=true, measurement_noise=true)
u = SVector{2, Float64}.(u) # For slightly higher-performance filtering
y = SVector{2, Float64}.(y)
X = reduce(hcat, x)'
Y = reduce(hcat, y)'
U = reduce(hcat, u)'
@time "Particle Filtering" sol = forward_trajectory(apf, u, y, p);
Xmean = mean_trajectory(sol)
for i in axes(Xmean, 1)
Xmean[i, 1:2] ./= norm(Xmean[i, 1:2]) # Project onto circle
end
state_names = ["x" "y" "xᵥ" "yᵥ" "λ"]
figX = plot(X, label=state_names, layout=nx+na)
plot!(Xmean, label="Xmean")
# plot!(Xmode, label="Xmode")
figY = plot(Y, label="Y")
figU = plot(U, label="U")
plot(figX, figY, figU, layout=@layout([a{0.6h}; b{0.2h}; c{0.2h}]), size=(800,1100))
end
Particle Filtering: 1.167511 seconds (1.92 M allocations: 1.070 GiB, 11.78% gc time)
Instead of plotting only the mean of the particles, we could plot all the individual particles as well. Below, we do this and plot the true evolution of the state on top.
begin
Xpart = stack(sol.x)
fig_part = plot(layout=nx+na, plot_title="Particle trajectories")
for i = 1:nx+na
scatter!(Xpart[i, :, :]', label="X$i", sp=i, m=(1,:black, 0.5), lab=false)
end
plot!(X, label=state_names, c=1)
fig_part
end
Due to the random force input, we see that the tension force in the pendulum rod $$\lambda$$ is rather noisy. We also see that some of the particle trajectories appear to suddenly die out, in particular in the plot for $$y$$. This is a characteristic of the particle filter, it internally makes use of a resampling procedure in order to discard particles that appear to be unlikely representations of the true state given the measurements that are received.
To finish off, we render an animation that compares the evolution of the pendulum to that obtained using the filtering.
anim = @animate for i in 1:length(t)
x, y = X[i, 1:2] # True coordinates
xm, ym = sol.y[i] # Noisy measurement
xf, yf = Xmean[i, 1:2] # Mean of filtered estimate
plot([0, x], [0, y], m=:circle, markersize=[3, 15], ratio=:equal, label="True", l=7)
plot!([0, x], [0, y], m=:circle, markersize=[3, 15], ratio=:equal, label="Estimate", c=:purple, l=(2))
scatter!([xm], [ym], m=(:circle, :red, 2), label="Noisy measurements",
legend=:topright, xlims=(-1.1, 1.1), ylims=(-1.1, 0.2), grid=false, framestyle=:zerolines)
end; giffig = gif(anim, "anim_fps30.gif", fps = 30)
Saved animation to /home/jrun/Notebooks/JuliaHub/ab6957c0-b370-4fb8-b075-1e6065cfaa62/anim_fps30.gif
In the animation, it is hard to tell the difference between the true pendulum state and the estimate of the filter.
We have seen how it is possible to estimate the state not only for linear systems with Gaussian noise, but also for the much richer set of systems described by differential-algebraic equations. When we are dealing with a DAE system, we must be extra careful when thinking about what disturbances are affecting the dynamics, and ensure that we employ a technique that will not lead to violations of the algebraic equations. In this post, we made use of a particle filter for the estimation, a very generally applicable estimator that can handle challenging scenarios such as nonlinearities and bimodal state distributions. We modeled the disturbances as entering through the control inputs, a common approach in practice.
Internally, the filter represents the state distribution as a collection of particles. The filtering solution contains all of those particles in a matrix of size $$N \times T$$
Matrix{SVector{5, Float64}}
(100, 250)
typeof(sol.x), size(sol.x)
We can verify that each particle satisfies the algebraic equation:
violation = 6.651346140529313e-12
violation = maximum(abs.(getindex.(pend.(sol.x, u', p, 0), 5)))
The largest violation of the algebraic equation over all the particles was 6.65e-12, which is hopefully less than the solver tolerance 1.0e-8