
Memory & Information

A reservoir computer is only useful if it can remember past inputs and transform them in ways that a linear readout can exploit. This page develops the formal framework for quantifying reservoir memory and information processing, introduces the key measures — memory capacity, active information storage, and information transfer — and characterizes the fundamental trade-off between them as a function of the fractional order $\alpha$.

The Reservoir Computing Framework

State Update Equation

The general reservoir computing framework defines a recurrent dynamical system driven by external input. For a reservoir of $N$ neurons with state vector $\mathbf{x}(t) \in \mathbb{R}^N$, input $\mathbf{u}(t) \in \mathbb{R}^{N_{\text{in}}}$, and output $\mathbf{y}(t) \in \mathbb{R}^{N_{\text{out}}}$:

$$x_i(t+1) = f_i\!\left(\sum_j W_{ij}^{\text{res}} \, x_j(t) + \sum_k W_{ik}^{\text{in}} \, u_k(t)\right) \tag{1}$$

where $f_i(\cdot)$ is the neuron activation function, $W^{\text{res}} \in \mathbb{R}^{N \times N}$ is the fixed recurrent weight matrix, and $W^{\text{in}} \in \mathbb{R}^{N \times N_{\text{in}}}$ is the fixed input weight matrix.
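Eq. (1) can be sketched in a few lines of NumPy. The weight scales, the tanh activation, and the dimensions below are illustrative assumptions, not SPIRES defaults:

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_in = 100, 1                                 # reservoir and input sizes (illustrative)
W_res = rng.normal(0, 1 / np.sqrt(N), (N, N))    # fixed recurrent weights W^res
W_in = rng.uniform(-0.5, 0.5, (N, N_in))         # fixed input weights W^in

def step(x, u):
    """One application of Eq. (1), with tanh as the activation (an assumption)."""
    return np.tanh(W_res @ x + W_in @ u)

x = np.zeros(N)
for u_t in rng.uniform(-1, 1, (200, N_in)):      # drive the reservoir with random input
    x = step(x, u_t)
```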

Linear Readout

The output is a linear combination of the reservoir state:

$$\mathbf{y}(t) = W^{\text{out}} \, \mathbf{x}(t) \tag{2}$$

where $W^{\text{out}} \in \mathbb{R}^{N_{\text{out}} \times N}$ is the only trained component. This linearity constraint is essential: it makes training a convex optimization problem and ensures that the reservoir’s computational power is entirely determined by its dynamics, not by the readout’s capacity.
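Because the readout is linear, training $W^{\text{out}}$ reduces to ridge regression on collected reservoir states. A minimal sketch, where the data, dimensions, and regularization strength are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, N_out = 500, 50, 1
X = rng.normal(size=(T, N))      # collected reservoir states, one row per time step
Y = rng.normal(size=(T, N_out))  # target outputs

lam = 1e-6                       # ridge regularization (an assumed value)
# Closed-form ridge solution: W_out = ((X^T X + lam I)^{-1} X^T Y)^T
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ Y).T

y_hat = X @ W_out.T              # Eq. (2) applied to every collected state
```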

Memory Capacity

Memory capacity (MC) measures how well a reservoir can reconstruct past inputs from its current state. It was introduced by Jaeger (2002) and provides a single scalar summary of the reservoir’s memory profile.

Linear Memory Capacity

To measure linear memory at lag $k$, train the readout to reconstruct $u(t-k)$ from $\mathbf{x}(t)$. The reconstruction is $\hat{u}(t-k) = \mathbf{w}_k^\top \mathbf{x}(t)$. The linear memory capacity at lag $k$ is the squared correlation between the target and the reconstruction:

$$\text{MC}_k^{\text{lin}} = \frac{\text{Cov}^2\!\bigl(u_{t-k},\, \hat{u}_{t-k}\bigr)}{\text{Var}(u_{t-k}) \, \text{Var}(\hat{u}_{t-k})} \tag{3}$$

Each $\text{MC}_k^{\text{lin}} \in [0, 1]$, where 1 indicates perfect linear reconstruction and 0 indicates no recoverable linear information.
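The whole procedure — drive a toy reservoir, regress the delayed input on the states, compute the squared correlation of Eq. (3) — can be sketched as follows. All parameters (reservoir size, spectral radius, washout, lag) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, washout, k = 80, 2000, 200, 5
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius to 0.9
w_in = rng.uniform(-0.5, 0.5, N)

u = rng.uniform(-1, 1, T)                        # i.i.d. input
x = np.zeros(N)
states = np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    states[t] = x

# Regress u(t-k) on x(t), discarding the washout transient
Xs = states[washout:]
target = u[washout - k:T - k]                    # the input delayed by k steps
w_k, *_ = np.linalg.lstsq(Xs, target, rcond=None)
u_hat = Xs @ w_k
mc_k = np.corrcoef(target, u_hat)[0, 1] ** 2     # squared correlation, Eq. (3)
```

Repeating this for a range of lags $k$ traces out the reservoir’s memory profile.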

Nonlinear Memory Capacity

Reservoirs also store nonlinear functions of past inputs. To quantify this, the target is set to Legendre polynomials of past inputs: $P_m(u_{t-k})$, where $P_m$ is the $m$-th Legendre polynomial. Legendre polynomials form an orthogonal basis on $[-1, 1]$, so for input drawn uniformly from $[-1, 1]$ the nonlinear memory contributions at different orders are uncorrelated.

$$\text{MC}_{k,m}^{\text{nonlin}} = \frac{\text{Cov}^2\!\bigl(P_m(u_{t-k}),\, \hat{P}_m(u_{t-k})\bigr)}{\text{Var}(P_m(u_{t-k})) \, \text{Var}(\hat{P}_m(u_{t-k}))} \tag{4}$$
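The nonlinear targets in Eq. (4) can be generated with NumPy’s Legendre module; a sketch (the input signal and the lag/order are illustrative):

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
u = rng.uniform(-1, 1, 1000)         # i.i.d. input on [-1, 1]

def legendre_target(u, k, m):
    """P_m applied to the input delayed by k steps."""
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0                  # select the m-th Legendre polynomial
    # np.roll wraps at the boundary; in practice, discard the first k samples
    return legendre.legval(np.roll(u, k), coeffs)

target = legendre_target(u, k=3, m=2)   # P_2(u_{t-3}) = (3u^2 - 1)/2
```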

Total Memory Capacity

The total memory capacity sums over all lags and all polynomial orders:

$$\text{MC}_{\text{total}} = \sum_{k=1}^{\infty} \text{MC}_k^{\text{lin}} + \sum_{k=1}^{\infty} \sum_{m=2}^{\infty} \text{MC}_{k,m}^{\text{nonlin}} \tag{5}$$

A fundamental result due to Jaeger (2002) states that for a reservoir with $N$ neurons and i.i.d. input, the total memory capacity is bounded:

$$\text{MC}_{\text{total}} \leq N$$

This bound means that memory is a finite resource. The reservoir can distribute its $N$ degrees of freedom across lags and nonlinear orders, but it cannot exceed its dimensionality. The fractional order $\alpha$ controls how this budget is allocated.

Information-Theoretic Measures

Memory capacity measures reconstruction accuracy, but it does not capture the full picture of information processing. Information theory provides complementary measures that decompose the reservoir’s computation into storage and transfer components.

Active Information Storage (AIS)

Active information storage quantifies how much information the reservoir’s current state carries about its own past:

$$\text{AIS} = I\bigl(X_t \,;\, X_{t+1}\bigr) \tag{6}$$

where $I(\cdot\,;\,\cdot)$ denotes mutual information. High AIS indicates that the reservoir maintains a stable internal model — its state at $t+1$ is highly predictable from its state at $t$. This corresponds to strong internal dynamics and persistent memory.

Information Transfer

Information transfer quantifies how much information flows from the input to the reservoir state:

$$T = I\bigl(U_{\text{past}} \,;\, X_t\bigr) \tag{7}$$

where $U_{\text{past}} = \{u(t-1), u(t-2), \ldots\}$ is the input history. High transfer indicates that the reservoir is strongly driven by external input — it acts as a sensitive transducer of the input signal.
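Both AIS and information transfer require estimating mutual information between continuous signals. A crude histogram-based sketch for two scalar series (the bin count and the synthetic signals are assumptions; k-nearest-neighbor estimators are generally more accurate in practice):

```python
import numpy as np

def binned_mi(a, b, bins=16):
    """Histogram estimate of the mutual information I(a; b), in bits."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                   # joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)        # marginals
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

rng = np.random.default_rng(4)
u = rng.normal(size=5000)
x = u + 0.5 * rng.normal(size=5000)   # a "state" partly driven by the input
mi = binned_mi(u, x)                  # positive for dependent signals
```

The same estimator applied to $(x_t, x_{t+1})$ pairs gives a rough AIS estimate, and applied to (delayed input, state) pairs a rough transfer estimate.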

The Fundamental Trade-Off

AIS and information transfer are in tension. A reservoir cannot simultaneously maximize both:

  • High AIS, low transfer: The reservoir’s dynamics are dominated by its own internal state. It is an autonomous system with strong memory but weak coupling to the input. It acts as a stable internal model.
  • Low AIS, high transfer: The reservoir’s state is dominated by the input drive. It has little autonomous dynamics but faithfully represents the input history. It acts as a sensitive sensor.
  • Intermediate regime: A balance where both AIS and transfer are significant. The reservoir both remembers its past and responds to new input. This is the regime of maximal computational capacity.

This trade-off is governed by the fractional order $\alpha$:

$$\alpha \downarrow \;\implies\; \text{AIS} \uparrow,\ T \downarrow \qquad \text{(internal model)}$$

$$\alpha \uparrow \;\implies\; \text{AIS} \downarrow,\ T \uparrow \qquad \text{(sensitive sensor)}$$

Memory Capacity as a Function of $\alpha$

Experimental measurements of MC as a function of $\alpha$ reveal a characteristic sigmoidal curve:

  • For $\alpha \lesssim 0.3$: MC is near its maximum. The long power-law memory of the fractional derivative allows the reservoir to retain input information over many time steps.
  • For $\alpha \approx 0.5$–$0.7$: MC decreases through its steepest region. This is the transition zone where the memory profile shifts from power-law to exponential.
  • For $\alpha \gtrsim 0.9$: MC approaches the classical (integer-order) value. The reservoir has Markovian dynamics and retains only recent inputs.

The sigmoidal transition is not a sharp phase transition but a smooth crossover. The midpoint and steepness depend on the reservoir size $N$, spectral radius $\rho$, and input strength.

Memory Profile Shape

The distribution of MC across lags also changes with $\alpha$:

  • Low $\alpha$: MC is distributed broadly across many lags. The reservoir remembers distant past inputs at the expense of precise recall of recent ones.
  • High $\alpha$: MC is concentrated at short lags. Recent inputs are recalled with high fidelity, but information about the distant past is lost.
  • Intermediate $\alpha$: A balanced profile that provides reasonable recall across a range of lags.

The Critical-Like Operating Point

The intermediate-$\alpha$ regime has special significance. At the boundary between the sensor and internal-model regimes, the reservoir operates near a critical-like point where:

  1. Computational capacity is maximized. The reservoir can perform both memory-intensive and input-sensitive computations.
  2. Dynamical range is maximized. The reservoir responds to a wide range of input amplitudes.
  3. Separation property is strongest. Different input sequences produce maximally distinct reservoir trajectories.

This is analogous to the “edge of chaos” phenomenon in classical reservoir computing, where computational capacity is maximized at the boundary between stable and chaotic dynamics (spectral radius $\rho \approx 1$). In fractional-order reservoirs, the fractional order $\alpha$ provides a second axis along which this critical-like behavior can be tuned, independently of the spectral radius.

The practical consequence is that the optimal $\alpha$ for a given task is typically not at the extremes ($\alpha = 0$ or $\alpha = 1$) but in the intermediate range. The precise optimum depends on the task’s temporal structure:

  • Tasks requiring long-range memory (e.g., slow physiological processes) favor lower $\alpha$.
  • Tasks requiring rapid response (e.g., real-time control) favor higher $\alpha$.
  • Tasks requiring both (e.g., speech recognition) favor intermediate $\alpha$.

Relationship to Spectral Radius

The spectral radius $\rho$ and the fractional order $\alpha$ both control the reservoir’s memory properties, but they act through different mechanisms:

| Property | Spectral radius $\rho$ | Fractional order $\alpha$ |
| --- | --- | --- |
| Controls | Rate of state decay per step | Shape of memory kernel |
| Mechanism | Eigenvalue scaling | Power-law vs. exponential decay |
| Memory increase | $\rho \to 1$ | $\alpha \to 0$ |
| Stability boundary | $\rho = 1$ (echo state limit) | None (stable for all $\alpha \in (0, 1]$) |
| Effect on nonlinearity | Indirect (via signal amplitude) | Direct (kernel shape) |

The two parameters are complementary. The spectral radius sets the overall gain of the recurrent dynamics, while the fractional order shapes the temporal kernel through which past information is integrated. A well-designed fractional reservoir optimizes both.
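Setting the spectral radius in practice usually means rescaling a random recurrent matrix; a minimal sketch (the matrix size and target $\rho$ are illustrative):

```python
import numpy as np

def scale_spectral_radius(W, rho):
    """Rescale W so that its largest eigenvalue magnitude equals rho."""
    return W * (rho / np.max(np.abs(np.linalg.eigvals(W))))

rng = np.random.default_rng(5)
W = scale_spectral_radius(rng.normal(size=(100, 100)), rho=0.95)
```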

Practical Implications for SPIRES

The theoretical framework described here has direct practical consequences for configuring SPIRES reservoirs:

  1. Choosing $\alpha$: Start with $\alpha \in [0.3, 0.7]$ for tasks with mixed temporal requirements. Use the AGILE optimizer to search the $\alpha$ space systematically.

  2. Memory capacity measurement: SPIRES can be used to measure MC experimentally by training the readout on delayed versions of the input. Plotting $\text{MC}_k$ vs. $k$ reveals the memory profile and helps diagnose whether $\alpha$ is appropriate for the task.

  3. History length $L$: The history length must be long enough to capture the memory profile. A rule of thumb: $L$ should be at least $2\times$ the longest lag at which $\text{MC}_k$ is appreciably nonzero.

  4. Interaction with spectral radius: The optimal $(\rho, \alpha)$ pair is task-dependent. Generally, combining $\rho \approx 0.9$–$1.0$ with intermediate $\alpha$ provides a good starting point.
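The rule of thumb for the history length (point 3 above) can be turned into a small helper. The threshold for “appreciably nonzero” and the example profile are illustrative assumptions:

```python
import numpy as np

def suggest_history_length(mc_profile, threshold=0.05, factor=2):
    """Rule of thumb: L >= factor x (longest lag with MC_k above threshold).

    mc_profile[k-1] holds MC_k; the threshold value is an illustrative choice.
    """
    above = np.nonzero(np.asarray(mc_profile) > threshold)[0]
    longest_lag = int(above[-1]) + 1 if above.size else 1
    return factor * longest_lag

# Example: a decaying memory profile measured at lags k = 1..7
L = suggest_history_length([0.9, 0.7, 0.4, 0.2, 0.08, 0.03, 0.01])  # → 10
```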

References

  1. Jaeger, H. (2002). Short-term memory in echo state networks. GMD Report 152, German National Research Center for Information Technology.
  2. Lizier, J. T. (2012). The Local Information Dynamics of Distributed Computation in Complex Systems. Springer.
  3. Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2012). Local measures of information storage in complex distributed computation. Information Sciences, 208, 39–54.
  4. Verstraeten, D., Schrauwen, B., D’Haene, M., & Stroobandt, D. (2007). An experimental unification of reservoir computing methods. Neural Networks, 20(3), 391–403.
  5. Lundstrom, B. N., Higgs, M. H., Spain, W. J., & Fairhall, A. L. (2008). Fractional differentiation by neocortical pyramidal neurons. Nature Neuroscience, 11(11), 1335–1342.
