
Online Delta Rule

The online delta rule provides a streaming alternative to batch ridge regression. Instead of collecting all reservoir states and solving a global linear system, the delta rule updates the readout weights incrementally after each time step. This makes it suitable for real-time applications, non-stationary data, and memory-constrained environments.

The Delta Rule

The delta rule is a gradient descent algorithm applied to the instantaneous squared error at each time step. Given:

  • Reservoir state vector $\mathbf{x}_t \in \mathbb{R}^N$ at time $t$
  • Predicted output $\hat{\mathbf{y}}_t = W_{\text{out}}\, \mathbf{x}_t$
  • Target output $\mathbf{y}_t$

The error is:

$$e_t = \mathbf{y}_t - \hat{\mathbf{y}}_t$$

The weight update is:

$$\Delta W_{\text{out}} = \eta\, e_t\, \mathbf{x}_t^\top$$

$$W_{\text{out}} \leftarrow W_{\text{out}} + \Delta W_{\text{out}}$$

where $\eta > 0$ is the learning rate.

For multi-output systems where $W_{\text{out}} \in \mathbb{R}^{N_{\text{out}} \times N}$, the update is:

$$W_{\text{out}} \leftarrow W_{\text{out}} + \eta\, (\mathbf{y}_t - W_{\text{out}}\, \mathbf{x}_t)\, \mathbf{x}_t^\top$$

This is a rank-1 update to the weight matrix at each time step, costing $O(N_{\text{out}} \times N)$ operations.

Derivation from Gradient Descent

The delta rule can be derived as stochastic gradient descent on the instantaneous loss:

$$\mathcal{L}_t = \frac{1}{2} \|\mathbf{y}_t - W_{\text{out}}\, \mathbf{x}_t\|^2$$

The gradient with respect to WoutW_{\text{out}} is:

$$\frac{\partial \mathcal{L}_t}{\partial W_{\text{out}}} = -(\mathbf{y}_t - W_{\text{out}}\, \mathbf{x}_t)\, \mathbf{x}_t^\top = -e_t\, \mathbf{x}_t^\top$$

Moving in the negative gradient direction with step size $\eta$ gives the delta rule update. Because the readout is linear, there are no local minima — the loss landscape is a convex quadratic, and gradient descent converges to the global optimum given a sufficiently small learning rate.

Convergence Properties

Learning Rate Selection

The learning rate $\eta$ controls the speed and stability of convergence:

  • Too large: The weights oscillate and may diverge. The critical stability threshold depends on the spectral norm of the state correlation matrix.
  • Too small: Convergence is slow. The weights take many iterations to approach the optimal solution.
  • Optimal range: For reservoir computing, $\eta \in [10^{-4}, 10^{-2}]$ is a common starting range, but the exact value depends on the signal amplitude, reservoir size, and desired convergence speed.

A useful rule of thumb: the learning rate should satisfy $\eta < 2 / \|\mathbf{x}\|^2_{\max}$, where $\|\mathbf{x}\|^2_{\max}$ is the maximum squared norm of the state vector encountered during training.

Relationship to Ridge Regression

In the limit of infinitely many passes over a stationary training set, the delta rule converges to the ordinary least-squares solution (unregularized). To obtain the effect of ridge regularization, you can add a weight decay term:

$$W_{\text{out}} \leftarrow (1 - \eta\lambda)\, W_{\text{out}} + \eta\, e_t\, \mathbf{x}_t^\top$$

where $\lambda$ plays the same role as the ridge parameter. This is equivalent to L2-regularized stochastic gradient descent.

SPIRES API

Online training is performed with spires_train_online():

```c
spires_status spires_train_online(
    spires_reservoir *r,
    const double     *input_series,
    const double     *target_series,
    size_t            series_length,
    double            learning_rate
);
```

Parameters:

| Parameter | Description |
| --- | --- |
| `r` | Pointer to an initialized reservoir |
| `input_series` | Flat array of input values, length `series_length * num_inputs` |
| `target_series` | Flat array of target values, length `series_length * num_outputs` |
| `series_length` | Number of time steps |
| `learning_rate` | Step size $\eta$ for the delta rule |

Returns: SPIRES_OK on success, or an error status code.

Example Usage

```c
#include <spires.h>
#include <stdio.h>

/* Assume reservoir r is already created */
/* Assume input_train[] and target_train[] are populated */

double eta = 1e-3;
spires_status s = spires_train_online(r, input_train, target_train,
                                      N_TRAIN, eta);
if (s != SPIRES_OK) {
    fprintf(stderr, "Online training failed with status %d\n", s);
    spires_reservoir_destroy(r);
    return 1;
}

/* Reservoir is now trained -- run inference */
double *predictions = spires_run(r, input_test, N_TEST);
```

Incremental Training

Unlike ridge regression, online training can be called multiple times to continue learning from new data:

```c
/* Train on first batch */
spires_train_online(r, input_batch1, target_batch1, batch1_len, eta);

/* Train on second batch -- weights continue from where they left off */
spires_train_online(r, input_batch2, target_batch2, batch2_len, eta);

/* Train on third batch -- adapts to new data */
spires_train_online(r, input_batch3, target_batch3, batch3_len, eta);
```

Each call processes the input series one step at a time, updating the weights after each step. The reservoir state is preserved between calls, so the second batch continues from the state reached at the end of the first batch.

Comparison with Ridge Regression

| Property | Ridge Regression | Online Delta Rule |
| --- | --- | --- |
| Algorithm | Closed-form batch solve | Iterative stochastic gradient |
| Optimality | Global optimum in one pass | Converges to optimum over time |
| Memory | $O(T \times N)$ for state matrix | $O(N)$ for weight update |
| Compute | $O(N^2 T + N^3)$ | $O(N \times T)$ per pass |
| Streaming data | Not supported | Native support |
| Non-stationary data | Must retrain from scratch | Adapts continuously |
| Regularization | Explicit $\lambda$ parameter | Via weight decay or early stopping |
| Typical use | Offline benchmarks | Real-time, adaptive systems |

When to Use Online Training

Choose the online delta rule when:

  • Memory is limited: You cannot afford to store the full $T \times N$ state matrix. The delta rule uses $O(N_{\text{out}} \times N)$ memory for the weights and $O(N)$ for the current state vector.
  • Data arrives in a stream: Sensor data, financial time series, or robotic control signals that arrive continuously and must be processed in real time.
  • The environment is non-stationary: The relationship between inputs and targets changes over time, and the readout must adapt. The delta rule naturally tracks changes because it continually adjusts the weights.
  • Multiple passes are acceptable: For offline data, you can run the delta rule over the same data multiple times (epochs) to improve convergence.

Practical Considerations

Learning Rate Scheduling

For better convergence, you can decrease the learning rate over time. A common schedule is:

$$\eta_t = \frac{\eta_0}{1 + t / \tau_\eta}$$

where $\eta_0$ is the initial learning rate and $\tau_\eta$ is a decay time constant. SPIRES uses a fixed learning rate per call; to implement scheduling, reduce $\eta$ between successive calls to spires_train_online().

Initialization

The readout weights $W_{\text{out}}$ are initialized to zero when the reservoir is created. This is appropriate for the delta rule, which will learn the correct weights from the training data. If you call spires_train_online() after spires_train_ridge(), the delta rule will refine the ridge solution rather than starting from zero.

Washout

As with ridge regression, the first few time steps of the reservoir response are transient. With the delta rule, the weights update during these transient steps, which can introduce small errors. In practice, these are overwritten by subsequent updates. For critical applications, you can drive the reservoir with a washout input before beginning online training.

