
Ridge Regression

Ridge regression is the primary training method in SPIRES. It is a batch algorithm: it collects all reservoir states from a training run, assembles them into a matrix, and solves a regularized least-squares problem for the readout weights. Because the objective is convex, this yields the global optimum of that objective in a single pass.

The Training Problem

After driving the reservoir with an input sequence $\{u_1, u_2, \ldots, u_T\}$, the reservoir produces a sequence of state vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T\}$, where $\mathbf{x}_t \in \mathbb{R}^N$ and $N$ is the number of neurons.

The readout layer computes:

$$\hat{\mathbf{y}}_t = W_{\text{out}}\, \mathbf{x}_t$$

where $W_{\text{out}} \in \mathbb{R}^{N_{\text{out}} \times N}$ is the readout weight matrix. The goal of training is to find $W_{\text{out}}$ such that $\hat{\mathbf{y}}_t$ approximates the target $\mathbf{y}_t$ as closely as possible.
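Once trained, the readout is just a dense matrix-vector product. As a plain-C illustration (independent of the SPIRES API, which performs this step internally):

```c
#include <assert.h>
#include <stddef.h>

/* Readout: y = W_out * x, with W_out stored row-major (N_out x N). */
static void readout(const double *w_out, const double *x,
                    double *y, size_t n_out, size_t n)
{
    for (size_t i = 0; i < n_out; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++)
            acc += w_out[i * n + j] * x[j];
        y[i] = acc;
    }
}
```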

State Matrix Assembly

Define the state matrix $\Phi$ by stacking the reservoir state vectors as rows:

$$\Phi = \begin{pmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_T^\top \end{pmatrix} \in \mathbb{R}^{T \times N}$$

and the target matrix $Y$ similarly:

$$Y = \begin{pmatrix} \mathbf{y}_1^\top \\ \mathbf{y}_2^\top \\ \vdots \\ \mathbf{y}_T^\top \end{pmatrix} \in \mathbb{R}^{T \times N_{\text{out}}}$$

The training problem is to find $W_{\text{out}}$ that minimizes:

$$\|Y - \Phi\, W_{\text{out}}^\top\|_F^2$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

Tikhonov Regularization

The ordinary least-squares solution $W_{\text{out}}^\top = (\Phi^\top \Phi)^{-1} \Phi^\top Y$ is numerically unstable when the state matrix $\Phi$ is rank-deficient or ill-conditioned, which is common in reservoir computing because:

  • The number of neurons $N$ may exceed the number of training samples $T$.
  • Correlated neuron activity produces near-singular covariance matrices.
  • Spiking dynamics can produce degenerate states (e.g., all-zero or all-one columns).

Tikhonov regularization (ridge regression) adds a penalty term $\lambda \|W_{\text{out}}\|_F^2$ to the objective:

$$\min_{W_{\text{out}}} \|Y - \Phi\, W_{\text{out}}^\top\|_F^2 + \lambda \|W_{\text{out}}\|_F^2$$

The closed-form solution is:

$$W_{\text{out}}^\top = \bigl(\Phi^\top \Phi + \lambda I\bigr)^{-1} \Phi^\top Y$$
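To make the closed form concrete, here is a minimal plain-C sketch of the scalar case $N = N_{\text{out}} = 1$, where the matrices collapse to sums (an illustration only, not the SPIRES internals):

```c
#include <assert.h>
#include <stddef.h>

/* Scalar ridge regression: with one neuron, Phi^T Phi and Phi^T Y
 * are plain sums, and the closed form reduces to
 *   w = (sum_t phi_t * y_t) / (sum_t phi_t^2 + lambda)          */
static double ridge_1d(const double *phi, const double *y,
                       size_t t_len, double lambda)
{
    double phiphi = 0.0, phiy = 0.0;
    for (size_t t = 0; t < t_len; t++) {
        phiphi += phi[t] * phi[t];
        phiy   += phi[t] * y[t];
    }
    return phiy / (phiphi + lambda);
}
```

Increasing `lambda` shrinks the weight toward zero, which is exactly the bias the penalty term introduces.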

The regularization parameter $\lambda > 0$ has two effects:

  1. Numerical stability: Adding $\lambda I$ to $\Phi^\top \Phi$ ensures the matrix is positive definite and invertible.
  2. Generalization: Penalizing large weights prevents overfitting to noise in the training data.
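The stability effect can be made precise: $\Phi^\top \Phi$ is symmetric positive semidefinite, so its eigenvalues satisfy $\mu_i \geq 0$, and adding $\lambda I$ shifts every eigenvalue by $\lambda$:

$$\operatorname{eig}\bigl(\Phi^\top \Phi + \lambda I\bigr) = \{\mu_i + \lambda\}, \qquad \mu_i + \lambda \geq \lambda > 0$$

so the smallest eigenvalue is bounded away from zero, the matrix is always invertible, and the condition number is at most $(\mu_{\max} + \lambda)/\lambda$.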

Choosing the Regularization Parameter

The regularization parameter $\lambda$ controls the bias-variance trade-off:

| $\lambda$ | Behavior |
| --- | --- |
| Very small ($10^{-10}$) | Near-zero regularization; fits training data closely but may overfit |
| Small ($10^{-8}$ to $10^{-6}$) | Mild regularization; good for clean signals |
| Moderate ($10^{-4}$ to $10^{-2}$) | Strong regularization; better for noisy data |
| Large ($10^{0}$ to $10^{2}$) | Over-regularized; underfits the data |

A common approach is to sweep $\lambda$ over a logarithmic grid (e.g., $10^{-10}, 10^{-9}, \ldots, 10^{0}$) and select the value that minimizes error on a validation set. The SPIRES optimizer can automate this search.

The optimizer stores the optimal $\lambda$ in log-space as `best_log10_ridge` in the `spires_opt_result` struct.

SPIRES API

Ridge regression training is performed with `spires_train_ridge()`:

```c
spires_status spires_train_ridge(spires_reservoir *r,
                                 const double     *input_series,
                                 const double     *target_series,
                                 size_t            series_length,
                                 double            lambda);
```

Parameters:

| Parameter | Description |
| --- | --- |
| `r` | Pointer to an initialized reservoir |
| `input_series` | Flat array of input values, length `series_length * num_inputs` |
| `target_series` | Flat array of target values, length `series_length * num_outputs` |
| `series_length` | Number of time steps in the training data |
| `lambda` | Regularization parameter ($\lambda \geq 0$) |

Returns: `SPIRES_OK` on success, or an error status code.

Example Usage

```c
#include <spires.h>
#include <stdio.h>

/* Assume reservoir r is already created */
/* Assume input_train[] and target_train[] are populated */

double lambda = 1e-6;
spires_status s = spires_train_ridge(r, input_train, target_train, N_TRAIN, lambda);
if (s != SPIRES_OK) {
    fprintf(stderr, "Ridge training failed with status %d\n", s);
    spires_reservoir_destroy(r);
    return 1;
}

/* Reservoir is now trained -- run inference */
double *predictions = spires_run(r, input_test, N_TEST);
```

Internal Implementation

When you call `spires_train_ridge()`, SPIRES performs the following steps:

  1. Reset the reservoir to its initial state.
  2. Drive the reservoir with the input series, recording the state vector $\mathbf{x}_t$ at each time step $t$.
  3. Assemble the state matrix $\Phi \in \mathbb{R}^{T \times N}$ and target matrix $Y \in \mathbb{R}^{T \times N_{\text{out}}}$.
  4. Compute $\Phi^\top \Phi \in \mathbb{R}^{N \times N}$ and $\Phi^\top Y \in \mathbb{R}^{N \times N_{\text{out}}}$ using BLAS `dgemm`.
  5. Regularize by adding $\lambda$ to the diagonal of $\Phi^\top \Phi$.
  6. Solve the linear system $(\Phi^\top \Phi + \lambda I)\, W_{\text{out}}^\top = \Phi^\top Y$ using LAPACKE (`dposv` for symmetric positive definite systems).
  7. Store the resulting WoutW_{\text{out}} inside the reservoir struct.

The heavy computation is in steps 4 and 6, which are performed by optimized BLAS and LAPACK routines. For a reservoir with $N = 500$ neurons and $T = 5000$ time steps, the matrix $\Phi^\top \Phi$ is $500 \times 500$ and the solve completes in milliseconds on modern hardware.

Memory Considerations

Ridge regression requires storing the entire state matrix $\Phi$ in memory:

  • Memory: $T \times N \times 8$ bytes (for double precision)
  • For $T = 10000$ and $N = 500$: approximately 40 MB
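The footprint is simple arithmetic, which a small helper makes explicit (a hypothetical utility, not part of the SPIRES API):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold the T x N state matrix in double precision. */
static size_t state_matrix_bytes(size_t t_len, size_t n)
{
    return t_len * n * sizeof(double);
}
```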

If memory is a constraint (e.g., very long training sequences or very large reservoirs), consider:

  • Reducing the training length $T$ (ensure it is still long enough to capture the dynamics).
  • Using the online delta rule instead, which processes one time step at a time and does not store the state matrix.

Washout Period

For reservoir computing, the first few time steps of the reservoir response depend on the initial conditions rather than the input signal. These transient states should be discarded from the training data to avoid corrupting the readout weights. This is called the washout period.

A common practice is to discard the first 100 to 500 time steps of the state matrix before assembling $\Phi$. SPIRES handles this automatically when a washout length is specified, or you can account for it by starting your target series after the washout period.
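If you handle the washout yourself, offsetting the flat arrays before training is enough. The helper below is hypothetical (only `spires_train_ridge()` and the flat array layout come from the API above); it returns the trimmed length and a pointer past the washout:

```c
#include <assert.h>
#include <stddef.h>

/* Skip the first `washout` time steps of a flat series that stores
 * `stride` values per time step. Returns the trimmed length and sets
 * *trimmed to the first post-washout sample. */
static size_t skip_washout(const double *series, size_t series_length,
                           size_t stride, size_t washout,
                           const double **trimmed)
{
    if (washout >= series_length) { *trimmed = NULL; return 0; }
    *trimmed = series + washout * stride;
    return series_length - washout;
}
```

The trimmed input and target pointers, together with the trimmed length, can then be passed to `spires_train_ridge()`.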

Multi-Output Training

Ridge regression naturally supports multiple outputs. If `num_outputs > 1`, the target matrix $Y$ has multiple columns, and the solution $W_{\text{out}}$ has multiple rows. Each output is trained jointly, sharing the same regularization parameter $\lambda$. If different outputs require different regularization, you can train them separately by setting `num_outputs = 1` and calling `spires_train_ridge()` multiple times with different target series.
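To train outputs separately, each per-output target series must be pulled out of the flat target array first. The sketch below assumes the time-major layout implied by the `series_length * num_outputs` array length (check the SPIRES data-layout documentation if in doubt):

```c
#include <assert.h>
#include <stddef.h>

/* Copy output column j out of a time-major T x num_outputs target
 * array into a contiguous length-T series for single-output training. */
static void extract_target_column(const double *targets, size_t t_len,
                                  size_t num_outputs, size_t j,
                                  double *out)
{
    for (size_t t = 0; t < t_len; t++)
        out[t] = targets[t * num_outputs + j];
}
```

Each extracted series can then be passed to `spires_train_ridge()` with its own `lambda` on a reservoir configured with `num_outputs = 1`.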

