Online Delta Rule
The online delta rule provides a streaming alternative to batch ridge regression. Instead of collecting all reservoir states and solving a global linear system, the delta rule updates the readout weights incrementally after each time step. This makes it suitable for real-time applications, non-stationary data, and memory-constrained environments.
The Delta Rule
The delta rule is a gradient descent algorithm applied to the instantaneous squared error at each time step. Given:
- Reservoir state vector at time $t$: $\mathbf{x}(t) \in \mathbb{R}^N$
- Predicted output: $\hat{y}(t) = \mathbf{w}^\top \mathbf{x}(t)$
- Target output: $y(t)$
The error is:
$$e(t) = y(t) - \hat{y}(t)$$
The weight update is:
$$\mathbf{w} \leftarrow \mathbf{w} + \eta \, e(t) \, \mathbf{x}(t)$$
where $\eta > 0$ is the learning rate.
For multi-output systems where $W_{\text{out}} \in \mathbb{R}^{M \times N}$, the update is:
$$W_{\text{out}} \leftarrow W_{\text{out}} + \eta \, \mathbf{e}(t) \, \mathbf{x}(t)^\top$$
This is a rank-1 update to the weight matrix at each time step, costing $O(MN)$ operations.
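The rank-1 update can be sketched in plain C. This is a minimal illustration independent of the SPIRES API; the row-major layout and the function name `delta_update` are assumptions made for the example:

```c
#include <stddef.h>

/* Delta-rule update for an M-output linear readout.
 * w is a row-major M x N weight matrix; x is the reservoir state
 * (length N); y and yhat are the target and predicted outputs
 * (both length M). Illustrative sketch, not the SPIRES internals. */
static void delta_update(double *w, const double *x, const double *y,
                         const double *yhat, size_t M, size_t N, double eta)
{
    for (size_t i = 0; i < M; i++) {
        double e = y[i] - yhat[i];          /* per-output error e_i(t) */
        for (size_t j = 0; j < N; j++)
            w[i * N + j] += eta * e * x[j]; /* rank-1 correction */
    }
}
```

Each call touches every weight exactly once, which is where the $O(MN)$ per-step cost comes from.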
Derivation from Gradient Descent
The delta rule can be derived as stochastic gradient descent on the instantaneous loss:
$$L(t) = \tfrac{1}{2} e(t)^2 = \tfrac{1}{2}\left(y(t) - \mathbf{w}^\top \mathbf{x}(t)\right)^2$$
The gradient with respect to $\mathbf{w}$ is:
$$\nabla_{\mathbf{w}} L(t) = -e(t)\, \mathbf{x}(t)$$
Moving in the negative gradient direction with step size $\eta$ gives the delta rule update. Because the readout is linear, there are no local minima: the loss landscape is a convex quadratic, and gradient descent converges to the global optimum given a sufficiently small learning rate.
Convergence Properties
Learning Rate Selection
The learning rate controls the speed and stability of convergence:
- Too large: The weights oscillate and may diverge. The critical stability threshold is $\eta < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the state correlation matrix $\mathbb{E}[\mathbf{x}\mathbf{x}^\top]$.
- Too small: Convergence is slow. The weights take many iterations to approach the optimal solution.
- Optimal range: For reservoir computing, $\eta \in [10^{-4}, 10^{-2}]$ is a common starting range, but the exact value depends on the signal amplitude, reservoir size, and desired convergence speed.
A useful rule of thumb: the learning rate should satisfy $\eta < 2 / \max_t \|\mathbf{x}(t)\|^2$, where $\max_t \|\mathbf{x}(t)\|^2$ is the maximum squared norm of the state vector encountered during training.
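One way to apply this rule of thumb in code is to scan a sample of reservoir states and back off from the stability bound by a safety factor. This helper is a sketch, not part of the SPIRES API; the name `safe_learning_rate` and the `safety` parameter are assumptions:

```c
#include <stddef.h>

/* Pick a conservative learning rate from observed reservoir states.
 * states is row-major T x N. Returns safety * (2 / max_t ||x(t)||^2),
 * i.e. a fraction of the stability bound; safety ~ 0.1 is typical. */
static double safe_learning_rate(const double *states, size_t T, size_t N,
                                 double safety)
{
    double max_sq = 0.0;
    for (size_t t = 0; t < T; t++) {
        double sq = 0.0;
        for (size_t j = 0; j < N; j++) {
            double v = states[t * N + j];
            sq += v * v;                 /* ||x(t)||^2 */
        }
        if (sq > max_sq)
            max_sq = sq;
    }
    return max_sq > 0.0 ? safety * 2.0 / max_sq : 0.0;
}
```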
Relationship to Ridge Regression
In the limit of infinitely many passes over a stationary training set, the delta rule converges to the ordinary least-squares solution (unregularized). To obtain the effect of ridge regularization, you can add a weight decay term:
$$\mathbf{w} \leftarrow \mathbf{w} + \eta \left( e(t)\, \mathbf{x}(t) - \lambda \mathbf{w} \right)$$
where $\lambda$ plays the same role as the ridge parameter. This is equivalent to L2-regularized stochastic gradient descent.
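The regularized update can be sketched as follows (plain C, single-output case; the function name and the `lambda` parameter are illustrative assumptions, not SPIRES API):

```c
#include <stddef.h>

/* Delta rule with L2 weight decay: w <- w + eta * (e * x - lambda * w).
 * Single-output readout; w and x both have length N. */
static void delta_update_l2(double *w, const double *x, double y,
                            double yhat, size_t N,
                            double eta, double lambda)
{
    double e = y - yhat;
    for (size_t j = 0; j < N; j++)
        w[j] += eta * (e * x[j] - lambda * w[j]);  /* decay shrinks w */
}
```

With `lambda = 0` this reduces to the plain delta rule; larger values pull the weights toward zero, mimicking a larger ridge parameter.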
SPIRES API
Online training is performed with spires_train_online():
```c
spires_status spires_train_online(
    spires_reservoir *r,
    const double *input_series,
    const double *target_series,
    size_t series_length,
    double learning_rate
);
```

Parameters:
| Parameter | Description |
|---|---|
| r | Pointer to an initialized reservoir |
| input_series | Flat array of input values, length series_length * num_inputs |
| target_series | Flat array of target values, length series_length * num_outputs |
| series_length | Number of time steps |
| learning_rate | Step size for the delta rule |
Returns: SPIRES_OK on success, or an error status code.
Example Usage
```c
#include <spires.h>
#include <stdio.h>

/* Assume reservoir r is already created */
/* Assume input_train[] and target_train[] are populated */

double eta = 1e-3;
spires_status s = spires_train_online(r, input_train, target_train, N_TRAIN, eta);
if (s != SPIRES_OK) {
    fprintf(stderr, "Online training failed with status %d\n", s);
    spires_reservoir_destroy(r);
    return 1;
}

/* Reservoir is now trained -- run inference */
double *predictions = spires_run(r, input_test, N_TEST);
```

Incremental Training
Unlike ridge regression, online training can be called multiple times to continue learning from new data:
```c
/* Train on first batch */
spires_train_online(r, input_batch1, target_batch1, batch1_len, eta);

/* Train on second batch -- weights continue from where they left off */
spires_train_online(r, input_batch2, target_batch2, batch2_len, eta);

/* Train on third batch -- adapts to new data */
spires_train_online(r, input_batch3, target_batch3, batch3_len, eta);
```

Each call processes the input series one step at a time, updating the weights after each step. The reservoir state is preserved between calls, so the second batch continues from the state reached at the end of the first batch.
Comparison with Ridge Regression
| Property | Ridge Regression | Online Delta Rule |
|---|---|---|
| Algorithm | Closed-form batch solve | Iterative stochastic gradient |
| Optimality | Global optimum in one pass | Converges to optimum over time |
| Memory | $O(TN)$ for state matrix | $O(MN)$ for weight update |
| Compute | $O(TN^2 + N^3)$ per pass | $O(TMN)$ per pass |
| Streaming data | Not supported | Native support |
| Non-stationary data | Must retrain from scratch | Adapts continuously |
| Regularization | Explicit parameter | Via weight decay or early stopping |
| Typical use | Offline benchmarks | Real-time, adaptive systems |
When to Use Online Training
Choose the online delta rule when:
- Memory is limited: You cannot afford to store the full $T \times N$ state matrix. The delta rule uses $O(MN)$ memory for the weights and $O(N)$ for the current state vector.
- Data arrives in a stream: Sensor data, financial time series, or robotic control signals that arrive continuously and must be processed in real time.
- The environment is non-stationary: The relationship between inputs and targets changes over time, and the readout must adapt. The delta rule naturally tracks changes because it continually adjusts the weights.
- Multiple passes are acceptable: For offline data, you can run the delta rule over the same data multiple times (epochs) to improve convergence.
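The multi-pass idea can be demonstrated with the plain delta rule on a toy problem. This sketch does not use the SPIRES API; the dataset and the function name `train_epochs` are invented for the example:

```c
/* Fit y = 2*x0 - x1 by repeated delta-rule passes over a fixed
 * 4-sample dataset. More epochs move w closer to the exact
 * least-squares solution (2, -1). */
static void train_epochs(double w[2], int epochs, double eta)
{
    const double X[4][2] = {{1, 0}, {0, 1}, {1, 1}, {2, 1}};
    const double y[4]    = {2, -1, 1, 3};   /* targets from y = 2*x0 - x1 */

    for (int ep = 0; ep < epochs; ep++) {
        for (int t = 0; t < 4; t++) {
            double e = y[t] - (w[0] * X[t][0] + w[1] * X[t][1]);
            w[0] += eta * e * X[t][0];      /* rank-1 update per sample */
            w[1] += eta * e * X[t][1];
        }
    }
}
```

Here $\max_t \|\mathbf{x}(t)\|^2 = 5$, so any $\eta < 0.4$ is stable; with $\eta = 0.05$ a few thousand epochs suffice for convergence on this data.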
Practical Considerations
Learning Rate Scheduling
For better convergence, you can decrease the learning rate over time. A common schedule is:
$$\eta(t) = \frac{\eta_0}{1 + t/\tau}$$
where $\eta_0$ is the initial learning rate and $\tau$ is a decay time constant. SPIRES uses a fixed learning rate per call; to implement scheduling, reduce $\eta$ between successive calls to spires_train_online().
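A sketch of this decay, computed on the caller's side (the helper name `scheduled_eta` is an assumption; only the formula comes from the text):

```c
/* Hyperbolic decay: eta(t) = eta0 / (1 + t / tau).
 * t is the cumulative number of training steps seen so far. */
static double scheduled_eta(double eta0, double t, double tau)
{
    return eta0 / (1.0 + t / tau);
}
```

A typical usage pattern is to track the cumulative step count across batches and pass `scheduled_eta(eta0, steps_so_far, tau)` as the learning rate of each successive spires_train_online() call.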
Initialization
The readout weights are initialized to zero when the reservoir is created. This is appropriate for the delta rule, which will learn the correct weights from the training data. If you call spires_train_online() after spires_train_ridge(), the delta rule will refine the ridge solution rather than starting from zero.
Washout
As with ridge regression, the first few time steps of the reservoir response are transient. With the delta rule, the weights update during these transient steps, which can introduce small errors. In practice, these are overwritten by subsequent updates. For critical applications, you can drive the reservoir with a washout input before beginning online training.