
Online Delta Rule

The online delta rule provides a streaming alternative to batch ridge regression. Instead of collecting all reservoir states and solving a global linear system, the delta rule updates the readout weights incrementally after each time step. This makes it suitable for real-time applications, non-stationary data, and memory-constrained environments.

The Delta Rule

The delta rule is a gradient descent algorithm applied to the instantaneous squared error at each time step. Given:

  • Reservoir state vector $\mathbf{x}_t \in \mathbb{R}^N$ at time $t$
  • Predicted output $\hat{\mathbf{y}}_t = W_{\text{out}}\, \mathbf{x}_t$
  • Target output $\mathbf{y}_t$

The error is:

$$e_t = \mathbf{y}_t - \hat{\mathbf{y}}_t$$

The weight update is:

$$\Delta W_{\text{out}} = \eta\, e_t\, \mathbf{x}_t^\top$$

$$W_{\text{out}} \leftarrow W_{\text{out}} + \Delta W_{\text{out}}$$

where $\eta > 0$ is the learning rate.

For multi-output systems where $W_{\text{out}} \in \mathbb{R}^{N_{\text{out}} \times N}$, the update is:

$$W_{\text{out}} \leftarrow W_{\text{out}} + \eta\, (\mathbf{y}_t - W_{\text{out}}\, \mathbf{x}_t)\, \mathbf{x}_t^\top$$

This is a rank-1 update to the weight matrix at each time step, costing $O(N_{\text{out}} \times N)$ operations.

Derivation from Gradient Descent

The delta rule can be derived as stochastic gradient descent on the instantaneous loss:

$$\mathcal{L}_t = \frac{1}{2} \|\mathbf{y}_t - W_{\text{out}}\, \mathbf{x}_t\|^2$$

The gradient with respect to WoutW_{\text{out}} is:

$$\frac{\partial \mathcal{L}_t}{\partial W_{\text{out}}} = -(\mathbf{y}_t - W_{\text{out}}\, \mathbf{x}_t)\, \mathbf{x}_t^\top = -e_t\, \mathbf{x}_t^\top$$

Moving in the negative gradient direction with step size $\eta$ gives the delta rule update. Because the readout is linear, there are no local minima — the loss landscape is a convex quadratic, and gradient descent converges to the global optimum given a sufficiently small learning rate.

Convergence Properties

Learning Rate Selection

The learning rate $\eta$ controls the speed and stability of convergence:

  • Too large: The weights oscillate and may diverge. The critical stability threshold depends on the spectral norm of the state correlation matrix.
  • Too small: Convergence is slow. The weights take many iterations to approach the optimal solution.
  • Optimal range: For reservoir computing, $\eta \in [10^{-4}, 10^{-2}]$ is a common starting range, but the exact value depends on the signal amplitude, reservoir size, and desired convergence speed.

A useful rule of thumb: the learning rate should satisfy $\eta < 2 / \|\mathbf{x}\|^2_{\max}$, where $\|\mathbf{x}\|^2_{\max}$ is the maximum squared norm of the state vector encountered during training.

Relationship to Ridge Regression

In the limit of infinitely many passes over a stationary training set, the delta rule converges to the ordinary least-squares solution (unregularized). To obtain the effect of ridge regularization, you can add a weight decay term:

$$W_{\text{out}} \leftarrow (1 - \eta\lambda)\, W_{\text{out}} + \eta\, e_t\, \mathbf{x}_t^\top$$

where $\lambda$ plays the same role as the ridge parameter. This is equivalent to L2-regularized stochastic gradient descent.

SPIRES API

Online training is performed with spires_train_online():

```c
spires_status spires_train_online(
    spires_reservoir *r,
    const double     *input_series,
    const double     *target_series,
    size_t            series_length,
    double            learning_rate
);
```

Parameters:

| Parameter | Description |
| --- | --- |
| `r` | Pointer to an initialized reservoir |
| `input_series` | Flat array of input values, length `series_length * num_inputs` |
| `target_series` | Flat array of target values, length `series_length * num_outputs` |
| `series_length` | Number of time steps |
| `learning_rate` | Step size $\eta$ for the delta rule |

Returns: SPIRES_OK on success, or an error status code.

Example Usage

```c
#include <spires.h>
#include <stdio.h>

/* Assume reservoir r is already created */
/* Assume input_train[] and target_train[] are populated */

double eta = 1e-3;
spires_status s = spires_train_online(r, input_train, target_train,
                                      N_TRAIN, eta);
if (s != SPIRES_OK) {
    fprintf(stderr, "Online training failed with status %d\n", s);
    spires_reservoir_destroy(r);
    return 1;
}

/* Reservoir is now trained -- run inference */
double *predictions = spires_run(r, input_test, N_TEST);
```

Incremental Training

Unlike ridge regression, online training can be called multiple times to continue learning from new data:

```c
/* Train on first batch */
spires_train_online(r, input_batch1, target_batch1, batch1_len, eta);

/* Train on second batch -- weights continue from where they left off */
spires_train_online(r, input_batch2, target_batch2, batch2_len, eta);

/* Train on third batch -- adapts to new data */
spires_train_online(r, input_batch3, target_batch3, batch3_len, eta);
```

Each call processes the input series one step at a time, updating the weights after each step. The reservoir state is preserved between calls, so the second batch continues from the state reached at the end of the first batch.

Comparison with Ridge Regression

| Property | Ridge Regression | Online Delta Rule |
| --- | --- | --- |
| Algorithm | Closed-form batch solve | Iterative stochastic gradient |
| Optimality | Global optimum in one pass | Converges to optimum over time |
| Memory | $O(T \times N)$ for state matrix | $O(N)$ for weight update |
| Compute | $O(N^2 T + N^3)$ | $O(N \times T)$ per pass |
| Streaming data | Not supported | Native support |
| Non-stationary data | Must retrain from scratch | Adapts continuously |
| Regularization | Explicit $\lambda$ parameter | Via weight decay or early stopping |
| Typical use | Offline benchmarks | Real-time, adaptive systems |

When to Use Online Training

Choose the online delta rule when:

  • Memory is limited: You cannot afford to store the full $T \times N$ state matrix. The delta rule uses $O(N_{\text{out}} \times N)$ memory for the weights and $O(N)$ for the current state vector.
  • Data arrives in a stream: Sensor data, financial time series, or robotic control signals that arrive continuously and must be processed in real time.
  • The environment is non-stationary: The relationship between inputs and targets changes over time, and the readout must adapt. The delta rule naturally tracks changes because it continually adjusts the weights.
  • Multiple passes are acceptable: For offline data, you can run the delta rule over the same data multiple times (epochs) to improve convergence.

Practical Considerations

Learning Rate Scheduling

For better convergence, you can decrease the learning rate over time. A common schedule is:

$$\eta_t = \frac{\eta_0}{1 + t / \tau_\eta}$$

where $\eta_0$ is the initial learning rate and $\tau_\eta$ is a decay time constant. SPIRES uses a fixed learning rate per call; to implement scheduling, reduce $\eta$ between successive calls to spires_train_online().

Initialization

The readout weights $W_{\text{out}}$ are initialized to zero when the reservoir is created. This is appropriate for the delta rule, which will learn the correct weights from the training data. If you call spires_train_online() after spires_train_ridge(), the delta rule will refine the ridge solution rather than starting from zero.

Washout

As with ridge regression, the first few time steps of the reservoir response are transient. With the delta rule, the weights update during these transient steps, which can introduce small errors. In practice, these are overwritten by subsequent updates. For critical applications, you can drive the reservoir with a washout input before beginning online training.

