Urysohn Adaptive Filter
This is interactive article designed by Andrew Polar and Mike Poluektov for a quick demonstration of their novel nonlinear adaptive filtering algorithm. It can be used in the field of data modelling, including identification of nonlinear dynamic systems. The explanation is backed up by numerical simulation, which is implemented in JavaScript of this single HTML file. The report attached at the bottom of this article is result of execution of client-side JavaScript code, which can be obtained by view source option of the browser. JavaScript must be enabled. Those who disabled JavaScript on security reasons, can save this file locally and open with the browser.
Patent pending "Method for identifying discrete Urysohn models", 15/998381, filed 08/11/2018.
Quick introduction
$$y_i = \sum_{j=1}^m h_j \cdot x_{ij} = \mathbf{h^T x_i}.$$
$\mathbf{x_i}$ is i-th input vector or data sample with elements $x_{ij}$, scalars $y_i$ are outputs, and $\mathbf{h}$ is a model vector with elements $h_j$. If real-time modeling is applied, which means identification along with data reading, the first choice is the Least Mean Squares (LMS). It is assumed that reader is familiar with its concept and it is not repeated in this article. When accuracy of the model doesn't meet expectations, it may be replaced by nonlinear model with according change in the algorithm. The usual choices are Volterra LMS or Kernel LMS. These algorithms replace original samples with elements $x_{ij}$ by a new data. The elements in these data samples may be sitting in different powers and in products (such as $x_{kn}*x_{kp}$), but the model itself non-the-less stays linear relatively to estimated parameters.The authors of this article approached to improving of the model from completely different perspective. The idea is to replace each linear term in above model by unknown function and identify them all from data
$$y_i = \sum_{j=1}^m f_j (x_{ij}). $$
The only limitation imposed on the functions is piecewise linearity. The functions are continuous, defined on entire interval of $x$ from the range $[x_{min}, x_{max}]$ and represent several blocks of connected straight lines. The first linear model is a particular case of the latter. When every function has only one linear block and the lines in each of them pass through origin, the latter turns into linear model.In case of two inputs $\mathbf{x_i}$ and $\mathbf{y_i}$ the model becomes blockwice bilinear, and it also can be identified by samples of inputs and outputs with only limitation of blockwice bilinearity on estimated functions $g_j(\cdot,\cdot)$
$$z_i = \sum_{j=1}^m g_j (x_{ij}, y_{ij}). $$
Quick explanation
For explanation we consider function defined by a single straight line between argument points $x^{left}$ and $x^{right}$ and known by two function values $f^{left}$ and $f^{right}$. For every argument $x$ from a defined interval the function value is then computed as
$$f = f^{left} * (1 - x) + f^{right} * x,$$
where $x$ is relative distance from the left abscissa of the interval. We call values $1-x$ and $x$ the weight coefficients for linear block and denote them as $w^{left}$, $w^{right}$. The upper case symbols are introduced to distinguish variables from their values, so when we write $x_{ik} = w^{left}$, we mean that variable $x_{ik}$ equals to a number $w^{left}$.The rest part of identification algorithm is even more obvious than that. The model represents multiple functions and each function may have multiple linear blocks. If we stretch all arguments and function values into vectors and place them into a table along with weights for particular data sample, we can see the following:
| $f_{1}$ | other functions | $f_{m}$ | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Arguments | $x^{1,1}$ | $x^{1,2}$ | $x^{1,3}$ | $...$ | $...$ | $x^{m,1}$ | $x^{m,2}$ | $x^{m,3}$ | $...$ |
| Functions | $f^{1,1}$ | $f^{1,2}$ | $f^{1,3}$ | $...$ | $...$ | $f^{m,1}$ | $f^{m,2}$ | $f^{m,3}$ | $...$ |
| Weights | $0$ | $w^{1,left}$ | $w^{1,right}$ | $...$ | $...$ | $w^{m,left}$ | $w^{m,right}$ | $0$ | $...$ |
The rows of arguments and functions represent current model, they all numbers. The row of weights is built according to a particular input sample. The inner product of weights and functions must be equal to output and if not, the function values are corrected according to a classic LMS concept. Each argument falls into only a single linear interval, so only two weight coefficients are different from zero within each function block. That makes rows for different data samples less linear dependent compared to linear adaptive filters, and provides fast convergence. Obviously, the order of columns in a table is free, only weight coefficients must be placed on the right positions and in suggested software DEMO this table is not even built. The model is assigned as two-dimensional array and values of the model are corrected as array elements. The table is brought up for explanation only.
The model with two inputs is not much different in principal. According to suggested concept each function $g_j(\cdot,\cdot)$ is bilinear within multiple rectangular fragments from the field of definition $[x_{min}, x_{max}], [y_{min}, y_{max}]$. These rectangular fragments cover entire field of definition without gaps and overlapping. They may not necessarily be all equal in size. The arguments for each function $g_j$ fall into a single rectangular fragment. If we name the corner points of this fragment as North-West, North-East, South-West, South-East, than function value is computed by bilinear formula:
$$g = NW + (NE-NW) * x + (SW - NW) * y + ((SE + NW) - (NE + SW) * x * y.$$
Here $x$ is relative distance from North and $y$ is relative distance from West. The weight coefficients are defined in the following way:$$w^{NW} = (1 - x) (1 - y)$$ $$w^{NE} = x (1 - y)$$ $$w^{SW} = (1 - x) y$$ $$w^{SE} = x y$$
The next step is obvious, similar table can be arranged for two input model, the elements of the model can be stretched into a vector, weight coefficients are built by inputs and also can be placed into a table, inner product of model and weights must be equal to output, if not, the model is corrected.Bilinear model does not assume that that all four points belong to a same plane, but they may in particular case.
Redundancy of the models
Relation to integral equations
$$y(t) = \int_{0}^T U[x(t,s),s]ds, $$
which means that suggested technique can be used for identification of the kernel of this operator in real-time by processing input/output sequences. The model with two inputs is also discrete form of Urysohn with a different kernel$$z(t) = \int_{0}^T U[x(t,s),y(t,s),s]ds. $$
That makes the suggested technique a convenient choice for identification of nonlinear dynamic systems. For dynamic systems arguments $x$ and $y$ depend on difference in the following way $x(t-s)$ and $y(t-s)$, which mathematically describes specific properties of such systems, but the suggested algorithm works for generic case.Conclusion
Since models are actually kernels of Urysohn equations, it is suggested to call this technique Urysohn Adaptive Filtering. This is a brief explanation, many important details are omitted. They are published in archived articles, available online:
Modelling of Non-linear Control Systems using the Discrete Urysohn Operator
Canonical block-oriented model
Many coding sample can be found on the personal site of the author of this article
There are also examples of identification of physical objects by recorded input/output values.
Numerical simulation
Image of identified kernel