
Ridge Regression

Ridge regression is the primary training method in SPIRES. It is a batch algorithm: it collects all reservoir states from a training run, assembles them into a matrix, and solves a regularized least-squares problem for the readout weights. Because the objective is convex, this yields the global optimum of that objective in a single pass.

The Training Problem

After driving the reservoir with an input sequence $\{u_1, u_2, \ldots, u_T\}$, the reservoir produces a sequence of state vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T\}$, where $\mathbf{x}_t \in \mathbb{R}^N$ and $N$ is the number of neurons.

The readout layer computes:

$$\hat{\mathbf{y}}_t = W_{\text{out}}\, \mathbf{x}_t$$

where $W_{\text{out}} \in \mathbb{R}^{N_{\text{out}} \times N}$ is the readout weight matrix. The goal of training is to find $W_{\text{out}}$ such that $\hat{\mathbf{y}}_t$ approximates the target $\mathbf{y}_t$ as closely as possible.
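Once trained, the readout is just a dense matrix-vector product. As a plain-C illustration (independent of the SPIRES API, which performs this step internally):

```c
#include <assert.h>
#include <stddef.h>

/* Readout: y = W_out * x, with W_out stored row-major (N_out x N). */
static void readout(const double *w_out, const double *x,
                    double *y, size_t n_out, size_t n)
{
    for (size_t i = 0; i < n_out; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += w_out[i * n + j] * x[j];
        y[i] = acc;
    }
}
```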

State Matrix Assembly

Define the state matrix $\Phi$ by stacking the reservoir state vectors as rows:

$$\Phi = \begin{pmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_T^\top \end{pmatrix} \in \mathbb{R}^{T \times N}$$

and the target matrix $Y$ similarly:

$$Y = \begin{pmatrix} \mathbf{y}_1^\top \\ \mathbf{y}_2^\top \\ \vdots \\ \mathbf{y}_T^\top \end{pmatrix} \in \mathbb{R}^{T \times N_{\text{out}}}$$

The training problem is to find $W_{\text{out}}$ that minimizes:

$$\|Y - \Phi\, W_{\text{out}}^\top\|_F^2$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

Tikhonov Regularization

The ordinary least-squares solution $W_{\text{out}}^\top = (\Phi^\top \Phi)^{-1} \Phi^\top Y$ is numerically unstable when the state matrix $\Phi$ is rank-deficient or ill-conditioned, which is common in reservoir computing because:

  • The number of neurons $N$ may exceed the number of training samples $T$.
  • Correlated neuron activity produces near-singular covariance matrices.
  • Spiking dynamics can produce degenerate states (e.g., all-zero or all-one columns).

Tikhonov regularization (ridge regression) adds a penalty term $\lambda \|W_{\text{out}}\|_F^2$ to the objective:

$$\min_{W_{\text{out}}} \|Y - \Phi\, W_{\text{out}}^\top\|_F^2 + \lambda \|W_{\text{out}}\|_F^2$$

The closed-form solution is:

$$W_{\text{out}}^\top = \bigl(\Phi^\top \Phi + \lambda I\bigr)^{-1} \Phi^\top Y$$
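To make the closed form concrete, here is a minimal plain-C sketch of the scalar case $N = N_{\text{out}} = 1$, where the matrices collapse to sums (an illustration only, not the SPIRES internals):

```c
#include <assert.h>
#include <stddef.h>

/* Scalar ridge regression: with one neuron, Phi^T Phi and Phi^T Y
 * are plain sums, and the closed form reduces to
 *   w = (sum_t phi_t * y_t) / (sum_t phi_t^2 + lambda)          */
static double ridge_1d(const double *phi, const double *y,
                       size_t t_len, double lambda)
{
    double phiphi = 0.0, phiy = 0.0;
    for (size_t t = 0; t < t_len; t++) {
        phiphi += phi[t] * phi[t];
        phiy   += phi[t] * y[t];
    }
    return phiy / (phiphi + lambda);
}
```

Increasing `lambda` shrinks the weight toward zero, which is exactly the bias the penalty term introduces.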

The regularization parameter $\lambda > 0$ has two effects:

  1. Numerical stability: Adding $\lambda I$ to $\Phi^\top \Phi$ ensures the matrix is positive definite and invertible.
  2. Generalization: Penalizing large weights prevents overfitting to noise in the training data.
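The stability effect can be made precise: $\Phi^\top \Phi$ is symmetric positive semidefinite, so its eigenvalues satisfy $\mu_i \geq 0$, and adding $\lambda I$ shifts every eigenvalue by $\lambda$:

$$\operatorname{eig}\bigl(\Phi^\top \Phi + \lambda I\bigr) = \{\mu_i + \lambda\}, \qquad \mu_i + \lambda \geq \lambda > 0$$

so the smallest eigenvalue is bounded away from zero, the matrix is always invertible, and the condition number is at most $(\mu_{\max} + \lambda)/\lambda$.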

Choosing the Regularization Parameter

The regularization parameter $\lambda$ controls the bias-variance trade-off:

| $\lambda$ | Behavior |
| --- | --- |
| Very small ($10^{-10}$) | Near-zero regularization; fits training data closely but may overfit |
| Small ($10^{-8}$ to $10^{-6}$) | Mild regularization; good for clean signals |
| Moderate ($10^{-4}$ to $10^{-2}$) | Strong regularization; better for noisy data |
| Large ($10^{0}$ to $10^{2}$) | Over-regularized; underfits the data |

A common approach is to sweep $\lambda$ over a logarithmic grid (e.g., $10^{-10}, 10^{-9}, \ldots, 10^{0}$) and select the value that minimizes error on a validation set. The SPIRES optimizer can automate this search.

The optimizer stores the optimal $\lambda$ in log-space as `best_log10_ridge` in the `spires_opt_result` struct.

SPIRES API

Ridge regression training is performed with `spires_train_ridge()`:

```c
spires_status spires_train_ridge(spires_reservoir *r,
                                 const double     *input_series,
                                 const double     *target_series,
                                 size_t            series_length,
                                 double            lambda);
```

Parameters:

| Parameter | Description |
| --- | --- |
| `r` | Pointer to an initialized reservoir |
| `input_series` | Flat array of input values, length `series_length * num_inputs` |
| `target_series` | Flat array of target values, length `series_length * num_outputs` |
| `series_length` | Number of time steps in the training data |
| `lambda` | Regularization parameter ($\lambda \geq 0$) |

Returns: `SPIRES_OK` on success, or an error status code.

Example Usage

```c
#include <spires.h>
#include <stdio.h>

/* Assume reservoir r is already created */
/* Assume input_train[] and target_train[] are populated */

double lambda = 1e-6;
spires_status s = spires_train_ridge(r, input_train, target_train, N_TRAIN, lambda);
if (s != SPIRES_OK) {
    fprintf(stderr, "Ridge training failed with status %d\n", s);
    spires_reservoir_destroy(r);
    return 1;
}

/* Reservoir is now trained -- run inference */
double *predictions = spires_run(r, input_test, N_TEST);
```

Internal Implementation

When you call `spires_train_ridge()`, SPIRES performs the following steps:

  1. Reset the reservoir to its initial state.
  2. Drive the reservoir with the input series, recording the state vector $\mathbf{x}_t$ at each time step $t$.
  3. Assemble the state matrix $\Phi \in \mathbb{R}^{T \times N}$ and target matrix $Y \in \mathbb{R}^{T \times N_{\text{out}}}$.
  4. Compute $\Phi^\top \Phi \in \mathbb{R}^{N \times N}$ and $\Phi^\top Y \in \mathbb{R}^{N \times N_{\text{out}}}$ using BLAS `dgemm`.
  5. Regularize by adding $\lambda$ to the diagonal of $\Phi^\top \Phi$.
  6. Solve the linear system $(\Phi^\top \Phi + \lambda I)\, W_{\text{out}}^\top = \Phi^\top Y$ using LAPACKE (`dposv` for symmetric positive definite systems).
  7. Store the resulting WoutW_{\text{out}} inside the reservoir struct.

The heavy computation is in steps 4 and 6, which are performed by optimized BLAS and LAPACK routines. For a reservoir with $N = 500$ neurons and $T = 5000$ time steps, the matrix $\Phi^\top \Phi$ is $500 \times 500$ and the solve completes in milliseconds on modern hardware.

Memory Considerations

Ridge regression requires storing the entire state matrix $\Phi$ in memory:

  • Memory: $T \times N \times 8$ bytes (for double precision)
  • For $T = 10000$ and $N = 500$: approximately 40 MB
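The footprint is simple arithmetic, which a small helper makes explicit (a hypothetical utility, not part of the SPIRES API):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold the T x N state matrix in double precision. */
static size_t state_matrix_bytes(size_t t_len, size_t n)
{
    return t_len * n * sizeof(double);
}
```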

If memory is a constraint (e.g., very long training sequences or very large reservoirs), consider:

  • Reducing the training length $T$ (ensure it is still long enough to capture the dynamics).
  • Using the online delta rule instead, which processes one time step at a time and does not store the state matrix.

Washout Period

For reservoir computing, the first few time steps of the reservoir response depend on the initial conditions rather than the input signal. These transient states should be discarded from the training data to avoid corrupting the readout weights. This is called the washout period.

A common practice is to discard the first 100 to 500 time steps of the state matrix before assembling $\Phi$. SPIRES handles this automatically when a washout length is specified, or you can account for it by starting your target series after the washout period.
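If you handle the washout yourself, offsetting the flat arrays before training is enough. The helper below is hypothetical (only `spires_train_ridge()` and the flat array layout come from the API above); it returns the trimmed length and a pointer past the washout:

```c
#include <assert.h>
#include <stddef.h>

/* Skip the first `washout` time steps of a flat series that stores
 * `stride` values per time step. Returns the trimmed length and sets
 * *trimmed to the first post-washout sample. */
static size_t skip_washout(const double *series, size_t series_length,
                           size_t stride, size_t washout,
                           const double **trimmed)
{
    if (washout >= series_length) { *trimmed = NULL; return 0; }
    *trimmed = series + washout * stride;
    return series_length - washout;
}
```

The trimmed input and target pointers, together with the trimmed length, can then be passed to `spires_train_ridge()`.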

Multi-Output Training

Ridge regression naturally supports multiple outputs. If `num_outputs > 1`, the target matrix $Y$ has multiple columns, and the solution $W_{\text{out}}$ has multiple rows. Each output is trained jointly, sharing the same regularization parameter $\lambda$. If different outputs require different regularization, you can train them separately by setting `num_outputs = 1` and calling `spires_train_ridge()` multiple times with different target series.
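To train outputs separately, each per-output target series must be pulled out of the flat target array first. The sketch below assumes the time-major layout implied by the `series_length * num_outputs` array length (check the SPIRES data-layout documentation if in doubt):

```c
#include <assert.h>
#include <stddef.h>

/* Copy output column j out of a time-major T x num_outputs target
 * array into a contiguous length-T series for single-output training. */
static void extract_target_column(const double *targets, size_t t_len,
                                  size_t num_outputs, size_t j,
                                  double *out)
{
    for (size_t t = 0; t < t_len; t++)
        out[t] = targets[t * num_outputs + j];
}
```

Each extracted series can then be passed to `spires_train_ridge()` with its own `lambda` on a reservoir configured with `num_outputs = 1`.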

