
Memory & Information

A reservoir computer is only useful if it can remember past inputs and transform them in ways that a linear readout can exploit. This page develops the formal framework for quantifying reservoir memory and information processing, introduces the key measures — memory capacity, active information storage, and information transfer — and characterizes the fundamental trade-off between them as a function of the fractional order $\alpha$.

The Reservoir Computing Framework

State Update Equation

The general reservoir computing framework defines a recurrent dynamical system driven by external input. For a reservoir of $N$ neurons with state vector $\mathbf{x}(t) \in \mathbb{R}^N$, input $\mathbf{u}(t) \in \mathbb{R}^{N_{\text{in}}}$, and output $\mathbf{y}(t) \in \mathbb{R}^{N_{\text{out}}}$:

$$x_i(t+1) = f_i\!\left(\sum_j W_{ij}^{\text{res}} \, x_j(t) + \sum_k W_{ik}^{\text{in}} \, u_k(t)\right) \tag{1}$$

where $f_i(\cdot)$ is the neuron activation function, $W^{\text{res}} \in \mathbb{R}^{N \times N}$ is the fixed recurrent weight matrix, and $W^{\text{in}} \in \mathbb{R}^{N \times N_{\text{in}}}$ is the fixed input weight matrix.
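Eq. (1) can be sketched in a few lines of NumPy. The weight scales, the tanh activation, and the dimensions below are illustrative assumptions, not SPIRES defaults:

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_in = 100, 1                                 # reservoir and input sizes (illustrative)
W_res = rng.normal(0, 1 / np.sqrt(N), (N, N))    # fixed recurrent weights W^res
W_in = rng.uniform(-0.5, 0.5, (N, N_in))         # fixed input weights W^in

def step(x, u):
    """One application of Eq. (1), with tanh as the activation (an assumption)."""
    return np.tanh(W_res @ x + W_in @ u)

x = np.zeros(N)
for u_t in rng.uniform(-1, 1, (200, N_in)):      # drive the reservoir with random input
    x = step(x, u_t)
```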

Linear Readout

The output is a linear combination of the reservoir state:

$$\mathbf{y}(t) = W^{\text{out}} \, \mathbf{x}(t) \tag{2}$$

where $W^{\text{out}} \in \mathbb{R}^{N_{\text{out}} \times N}$ is the only trained component. This linearity constraint is essential: it makes training a convex optimization problem and ensures that the reservoir’s computational power is entirely determined by its dynamics, not by the readout’s capacity.
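Because the readout is linear, training $W^{\text{out}}$ reduces to ridge regression on collected reservoir states. A minimal sketch, where the data, dimensions, and regularization strength are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, N_out = 500, 50, 1
X = rng.normal(size=(T, N))      # collected reservoir states, one row per time step
Y = rng.normal(size=(T, N_out))  # target outputs

lam = 1e-6                       # ridge regularization (an assumed value)
# Closed-form ridge solution: W_out = ((X^T X + lam I)^{-1} X^T Y)^T
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ Y).T

y_hat = X @ W_out.T              # Eq. (2) applied to every collected state
```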

Memory Capacity

Memory capacity (MC) measures how well a reservoir can reconstruct past inputs from its current state. It was introduced by Jaeger (2002) and provides a single scalar summary of the reservoir’s memory profile.

Linear Memory Capacity

To measure linear memory at lag $k$, train the readout to reconstruct $u(t-k)$ from $\mathbf{x}(t)$. The reconstruction is $\hat{u}(t-k) = \mathbf{w}_k^\top \mathbf{x}(t)$. The linear memory capacity at lag $k$ is the squared correlation between the target and the reconstruction:

$$\text{MC}_k^{\text{lin}} = \frac{\text{Cov}^2\!\bigl(u_{t-k},\, \hat{u}_{t-k}\bigr)}{\text{Var}(u_{t-k}) \, \text{Var}(\hat{u}_{t-k})} \tag{3}$$

Each $\text{MC}_k^{\text{lin}} \in [0, 1]$, where 1 indicates perfect linear reconstruction and 0 indicates no recoverable linear information.
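The whole procedure — drive a toy reservoir, regress the delayed input on the states, compute the squared correlation of Eq. (3) — can be sketched as follows. All parameters (reservoir size, spectral radius, washout, lag) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, washout, k = 80, 2000, 200, 5
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius to 0.9
w_in = rng.uniform(-0.5, 0.5, N)

u = rng.uniform(-1, 1, T)                        # i.i.d. input
x = np.zeros(N)
states = np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])
    states[t] = x

# Regress u(t-k) on x(t), discarding the washout transient
Xs = states[washout:]
target = u[washout - k:T - k]                    # the input delayed by k steps
w_k, *_ = np.linalg.lstsq(Xs, target, rcond=None)
u_hat = Xs @ w_k
mc_k = np.corrcoef(target, u_hat)[0, 1] ** 2     # squared correlation, Eq. (3)
```

Repeating this for a range of lags $k$ traces out the reservoir’s memory profile.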

Nonlinear Memory Capacity

Reservoirs also store nonlinear functions of past inputs. To quantify this, the target is set to Legendre polynomials of past inputs: $P_m(u_{t-k})$, where $P_m$ is the $m$-th Legendre polynomial. Legendre polynomials form an orthogonal basis on $[-1, 1]$, so for input drawn uniformly from $[-1, 1]$ the nonlinear memory contributions at different orders are uncorrelated.

$$\text{MC}_{k,m}^{\text{nonlin}} = \frac{\text{Cov}^2\!\bigl(P_m(u_{t-k}),\, \hat{P}_m(u_{t-k})\bigr)}{\text{Var}(P_m(u_{t-k})) \, \text{Var}(\hat{P}_m(u_{t-k}))} \tag{4}$$
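The nonlinear targets in Eq. (4) can be generated with NumPy’s Legendre module; a sketch (the input signal and the lag/order are illustrative):

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
u = rng.uniform(-1, 1, 1000)         # i.i.d. input on [-1, 1]

def legendre_target(u, k, m):
    """P_m applied to the input delayed by k steps."""
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0                  # select the m-th Legendre polynomial
    # np.roll wraps at the boundary; in practice, discard the first k samples
    return legendre.legval(np.roll(u, k), coeffs)

target = legendre_target(u, k=3, m=2)   # P_2(u_{t-3}) = (3u^2 - 1)/2
```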

Total Memory Capacity

The total memory capacity sums over all lags and all polynomial orders:

$$\text{MC}_{\text{total}} = \sum_{k=1}^{\infty} \text{MC}_k^{\text{lin}} + \sum_{k=1}^{\infty} \sum_{m=2}^{\infty} \text{MC}_{k,m}^{\text{nonlin}} \tag{5}$$

A fundamental result due to Jaeger (2002) states that for a reservoir with $N$ neurons and i.i.d. input, the total memory capacity is bounded:

$$\text{MC}_{\text{total}} \leq N$$

This bound means that memory is a finite resource. The reservoir can distribute its $N$ degrees of freedom across lags and nonlinear orders, but it cannot exceed its dimensionality. The fractional order $\alpha$ controls how this budget is allocated.

Information-Theoretic Measures

Memory capacity measures reconstruction accuracy, but it does not capture the full picture of information processing. Information theory provides complementary measures that decompose the reservoir’s computation into storage and transfer components.

Active Information Storage (AIS)

Active information storage quantifies how much information the reservoir’s current state carries about its own past:

$$\text{AIS} = I\bigl(X_t \,;\, X_{t+1}\bigr) \tag{6}$$

where $I(\cdot\,;\,\cdot)$ denotes mutual information. High AIS indicates that the reservoir maintains a stable internal model — its state at $t+1$ is highly predictable from its state at $t$. This corresponds to strong internal dynamics and persistent memory.

Information Transfer

Information transfer quantifies how much information flows from the input to the reservoir state:

$$T = I\bigl(U_{\text{past}} \,;\, X_t\bigr) \tag{7}$$

where $U_{\text{past}} = \{u(t-1), u(t-2), \ldots\}$ is the input history. High transfer indicates that the reservoir is strongly driven by external input — it acts as a sensitive transducer of the input signal.
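Both AIS and information transfer require estimating mutual information between continuous signals. A crude histogram-based sketch for two scalar series (the bin count and the synthetic signals are assumptions; k-nearest-neighbor estimators are generally more accurate in practice):

```python
import numpy as np

def binned_mi(a, b, bins=16):
    """Histogram estimate of the mutual information I(a; b), in bits."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                   # joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)        # marginals
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())

rng = np.random.default_rng(4)
u = rng.normal(size=5000)
x = u + 0.5 * rng.normal(size=5000)   # a "state" partly driven by the input
mi = binned_mi(u, x)                  # positive for dependent signals
```

The same estimator applied to $(x_t, x_{t+1})$ pairs gives a rough AIS estimate, and applied to (delayed input, state) pairs a rough transfer estimate.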

The Fundamental Trade-Off

AIS and information transfer are in tension. A reservoir cannot simultaneously maximize both:

  • High AIS, low transfer: The reservoir’s dynamics are dominated by its own internal state. It is an autonomous system with strong memory but weak coupling to the input. It acts as a stable internal model.
  • Low AIS, high transfer: The reservoir’s state is dominated by the input drive. It has little autonomous dynamics but faithfully represents the input history. It acts as a sensitive sensor.
  • Intermediate regime: A balance where both AIS and transfer are significant. The reservoir both remembers its past and responds to new input. This is the regime of maximal computational capacity.

This trade-off is governed by the fractional order $\alpha$:

$$\alpha \downarrow \;\implies\; \text{AIS} \uparrow,\ T \downarrow \qquad \text{(internal model)}$$

$$\alpha \uparrow \;\implies\; \text{AIS} \downarrow,\ T \uparrow \qquad \text{(sensitive sensor)}$$

Memory Capacity as a Function of $\alpha$

Experimental measurements of MC as a function of $\alpha$ reveal a characteristic sigmoidal curve:

  • For $\alpha \lesssim 0.3$: MC is near its maximum. The long power-law memory of the fractional derivative allows the reservoir to retain input information over many time steps.
  • For $\alpha \approx 0.5$–$0.7$: MC decreases through its steepest region. This is the transition zone where the memory profile shifts from power-law to exponential.
  • For $\alpha \gtrsim 0.9$: MC approaches the classical (integer-order) value. The reservoir has Markovian dynamics and retains only recent inputs.

The sigmoidal transition is not a sharp phase transition but a smooth crossover. The midpoint and steepness depend on the reservoir size $N$, spectral radius $\rho$, and input strength.

Memory Profile Shape

The distribution of MC across lags also changes with $\alpha$:

  • Low $\alpha$: MC is distributed broadly across many lags. The reservoir remembers distant past inputs at the expense of precise recall of recent ones.
  • High $\alpha$: MC is concentrated at short lags. Recent inputs are recalled with high fidelity, but information about the distant past is lost.
  • Intermediate $\alpha$: A balanced profile that provides reasonable recall across a range of lags.

The Critical-Like Operating Point

The intermediate-$\alpha$ regime has special significance. At the boundary between the sensor and internal-model regimes, the reservoir operates near a critical-like point where:

  1. Computational capacity is maximized. The reservoir can perform both memory-intensive and input-sensitive computations.
  2. Dynamical range is maximized. The reservoir responds to a wide range of input amplitudes.
  3. Separation property is strongest. Different input sequences produce maximally distinct reservoir trajectories.

This is analogous to the “edge of chaos” phenomenon in classical reservoir computing, where computational capacity is maximized at the boundary between stable and chaotic dynamics (spectral radius $\rho \approx 1$). In fractional-order reservoirs, the fractional order $\alpha$ provides a second axis along which this critical-like behavior can be tuned, independently of the spectral radius.

The practical consequence is that the optimal $\alpha$ for a given task is typically not at the extremes ($\alpha = 0$ or $\alpha = 1$) but in the intermediate range. The precise optimum depends on the task’s temporal structure:

  • Tasks requiring long-range memory (e.g., slow physiological processes) favor lower $\alpha$.
  • Tasks requiring rapid response (e.g., real-time control) favor higher $\alpha$.
  • Tasks requiring both (e.g., speech recognition) favor intermediate $\alpha$.

Relationship to Spectral Radius

The spectral radius $\rho$ and the fractional order $\alpha$ both control the reservoir’s memory properties, but they act through different mechanisms:

| Property | Spectral radius $\rho$ | Fractional order $\alpha$ |
| --- | --- | --- |
| Controls | Rate of state decay per step | Shape of memory kernel |
| Mechanism | Eigenvalue scaling | Power-law vs. exponential decay |
| Memory increase | $\rho \to 1$ | $\alpha \to 0$ |
| Stability boundary | $\rho = 1$ (echo state limit) | None (stable for all $\alpha \in (0, 1]$) |
| Effect on nonlinearity | Indirect (via signal amplitude) | Direct (kernel shape) |

The two parameters are complementary. The spectral radius sets the overall gain of the recurrent dynamics, while the fractional order shapes the temporal kernel through which past information is integrated. A well-designed fractional reservoir optimizes both.
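Setting the spectral radius in practice usually means rescaling a random recurrent matrix; a minimal sketch (the matrix size and target $\rho$ are illustrative):

```python
import numpy as np

def scale_spectral_radius(W, rho):
    """Rescale W so that its largest eigenvalue magnitude equals rho."""
    return W * (rho / np.max(np.abs(np.linalg.eigvals(W))))

rng = np.random.default_rng(5)
W = scale_spectral_radius(rng.normal(size=(100, 100)), rho=0.95)
```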

Practical Implications for SPIRES

The theoretical framework described here has direct practical consequences for configuring SPIRES reservoirs:

  1. Choosing $\alpha$: Start with $\alpha \in [0.3, 0.7]$ for tasks with mixed temporal requirements. Use the AGILE optimizer to search the $\alpha$ space systematically.

  2. Memory capacity measurement: SPIRES can be used to measure MC experimentally by training the readout on delayed versions of the input. Plotting $\text{MC}_k$ vs. $k$ reveals the memory profile and helps diagnose whether $\alpha$ is appropriate for the task.

  3. History length $L$: The history length must be long enough to capture the memory profile. A rule of thumb: $L$ should be at least $2\times$ the longest lag at which $\text{MC}_k$ is appreciably nonzero.

  4. Interaction with spectral radius: The optimal $(\rho, \alpha)$ pair is task-dependent. Generally, combining $\rho \approx 0.9$–$1.0$ with intermediate $\alpha$ provides a good starting point.
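The rule of thumb for the history length (point 3 above) can be turned into a small helper. The threshold for “appreciably nonzero” and the example profile are illustrative assumptions:

```python
import numpy as np

def suggest_history_length(mc_profile, threshold=0.05, factor=2):
    """Rule of thumb: L >= factor x (longest lag with MC_k above threshold).

    mc_profile[k-1] holds MC_k; the threshold value is an illustrative choice.
    """
    above = np.nonzero(np.asarray(mc_profile) > threshold)[0]
    longest_lag = int(above[-1]) + 1 if above.size else 1
    return factor * longest_lag

# Example: a decaying memory profile measured at lags k = 1..7
L = suggest_history_length([0.9, 0.7, 0.4, 0.2, 0.08, 0.03, 0.01])  # → 10
```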

References

  1. Jaeger, H. (2002). Short-term memory in echo state networks. GMD Report 152, German National Research Center for Information Technology.
  2. Lizier, J. T. (2012). The Local Information Dynamics of Distributed Computation in Complex Systems. Springer.
  3. Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2012). Local measures of information storage in complex distributed computation. Information Sciences, 208, 39–54.
  4. Verstraeten, D., Schrauwen, B., D’Haene, M., & Stroobandt, D. (2007). An experimental unification of reservoir computing methods. Neural Networks, 20(3), 391–403.
  5. Lundstrom, B. N., Higgs, M. H., Spain, W. J., & Fairhall, A. L. (2008). Fractional differentiation by neocortical pyramidal neurons. Nature Neuroscience, 11(11), 1335–1342.
