Memory & Information
A reservoir computer is only useful if it can remember past inputs and transform them in ways that a linear readout can exploit. This page develops the formal framework for quantifying reservoir memory and information processing, introduces the key measures (memory capacity, active information storage, and information transfer), and characterizes the fundamental trade-off between them as a function of the fractional order $\alpha$.
The Reservoir Computing Framework
State Update Equation
The general reservoir computing framework defines a recurrent dynamical system driven by external input. For a reservoir of $N$ neurons with state vector $x(t) \in \mathbb{R}^N$, input $u(t)$, and output $y(t)$:

$$x(t+1) = f\big(W\,x(t) + W_{\mathrm{in}}\,u(t)\big)$$

where $f$ is the neuron activation function, $W$ is the fixed recurrent weight matrix, and $W_{\mathrm{in}}$ is the fixed input weight matrix.
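A minimal NumPy sketch of this update; the reservoir size, the spectral radius of 0.9, and the $\tanh$ activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                      # reservoir size (assumed)
W = rng.standard_normal((N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # rescale to spectral radius 0.9
W_in = rng.uniform(-1, 1, size=N)            # input weights for a scalar input

def step(x, u):
    """One update: x(t+1) = f(W x(t) + W_in u(t)), with f = tanh."""
    return np.tanh(W @ x + W_in * u)

x = np.zeros(N)
for u in rng.uniform(-1, 1, size=200):       # drive with i.i.d. input
    x = step(x, u)
```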
Linear Readout
The output is a linear combination of the reservoir state:

$$y(t) = W_{\mathrm{out}}\,x(t)$$

where $W_{\mathrm{out}}$ is the only trained component. This linearity constraint is essential: it makes training a convex optimization problem and ensures that the reservoir's computational power is entirely determined by its dynamics, not by the readout's capacity.
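The readout weights are typically fit by ridge regression on a matrix of collected states. A sketch with synthetic states standing in for a real reservoir run; the regularization strength is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 500, 50
X = rng.standard_normal((T, N))     # collected reservoir states, one row per step
y = X @ rng.standard_normal(N) + 0.01 * rng.standard_normal(T)   # target signal

# Ridge regression: W_out = (X^T X + beta I)^{-1} X^T y
beta = 1e-6                          # regularization strength (assumed)
W_out = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ y)
y_hat = X @ W_out                    # the trained linear readout
```

Because the objective is convex, this closed-form solve is the entire training procedure.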
Memory Capacity
Memory capacity (MC) measures how well a reservoir can reconstruct past inputs from its current state. It was introduced by Jaeger (2002) and provides a single scalar summary of the reservoir’s memory profile.
Linear Memory Capacity
To measure linear memory at lag $k$, train the readout to reconstruct $u(t-k)$ from $x(t)$. The reconstruction is $\hat{u}_k(t) = W_{\mathrm{out}}^{(k)}\,x(t)$. The linear memory capacity at lag $k$ is the squared correlation between the target and reconstruction:

$$MC_k = \frac{\mathrm{cov}^2\big(u(t-k),\,\hat{u}_k(t)\big)}{\mathrm{var}\big(u(t-k)\big)\,\mathrm{var}\big(\hat{u}_k(t)\big)}$$

Each $MC_k \in [0, 1]$, where 1 indicates perfect linear reconstruction and 0 indicates no recoverable linear information.
Nonlinear Memory Capacity
Reservoirs also store nonlinear functions of past inputs. To quantify this, the target is set to Legendre polynomials of past inputs: $y_{k,d}(t) = P_d\big(u(t-k)\big)$, where $P_d$ is the $d$-th Legendre polynomial. Legendre polynomials form an orthogonal basis on $[-1, 1]$, ensuring that nonlinear memory contributions at different orders are independent.
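A sketch of building such nonlinear targets with NumPy's Legendre utilities, including a numerical check of the orthogonality that makes the contributions independent (the order $d = 2$ and lag $k = 3$ are illustrative assumptions):

```python
import numpy as np
from numpy.polynomial import legendre

def P(d, x):
    """Evaluate the d-th Legendre polynomial at x."""
    c = np.zeros(d + 1)
    c[d] = 1.0
    return legendre.legval(x, c)

# Orthogonality on [-1, 1]: inner products of distinct orders vanish,
# so nonlinear memory contributions at different orders are independent.
u = np.linspace(-1.0, 1.0, 10001)
inner_12 = 2.0 * np.mean(P(1, u) * P(2, u))   # approximates integral of P1*P2, ~0
inner_22 = 2.0 * np.mean(P(2, u) * P(2, u))   # approximates integral of P2^2 = 2/5

# A nonlinear memory target: the d = 2 polynomial of the input k steps back.
rng = np.random.default_rng(2)
inputs = rng.uniform(-1, 1, size=1000)
k = 3
target = P(2, inputs[:-k])                    # y(t) = P_2(u(t - k))
```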
Total Memory Capacity
The total memory capacity sums over all lags and all polynomial orders:

$$MC_{\mathrm{total}} = \sum_{k=1}^{\infty} \sum_{d=1}^{\infty} MC_{k,d}$$

A fundamental result due to Jaeger (2002) states that for a reservoir with $N$ neurons and i.i.d. input, the total memory capacity is bounded:

$$MC_{\mathrm{total}} \leq N$$
This bound means that memory is a finite resource. The reservoir can distribute its degrees of freedom across lags and nonlinear orders, but it cannot exceed its dimensionality. The fractional order $\alpha$ controls how this budget is allocated.
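The definitions above can be checked numerically. The sketch below measures the linear memory capacity of a small $\tanh$ reservoir across a range of lags and verifies that the total stays below $N$; the reservoir size, spectral radius, and ridge regularization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, washout = 20, 5000, 100
W = rng.standard_normal((N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # spectral radius 0.9 (assumed)
W_in = rng.uniform(-1, 1, size=N)

# Drive the reservoir with i.i.d. uniform input and collect states.
u = rng.uniform(-1, 1, size=T)
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])
    X[t] = x

def mc_at_lag(k, beta=1e-8):
    """Squared correlation between u(t-k) and its ridge reconstruction."""
    Xs, y = X[washout:], u[washout - k : T - k]
    w = np.linalg.solve(Xs.T @ Xs + beta * np.eye(N), Xs.T @ y)
    r = np.corrcoef(y, Xs @ w)[0, 1]
    return r ** 2

mcs = [mc_at_lag(k) for k in range(1, 3 * N)]
total = sum(mcs)    # total linear MC; bounded above by N
```

Recent lags are reconstructed almost perfectly, distant lags contribute almost nothing, and the sum respects the bound.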
Information-Theoretic Measures
Memory capacity measures reconstruction accuracy, but it does not capture the full picture of information processing. Information theory provides complementary measures that decompose the reservoir’s computation into storage and transfer components.
Active Information Storage (AIS)
Active information storage quantifies how much information the reservoir's current state carries about its own past:

$$AIS = I\big(x(t-1);\, x(t)\big)$$

where $I(\cdot\,;\,\cdot)$ denotes mutual information. High AIS indicates that the reservoir maintains a stable internal model: its state at time $t$ is highly predictable from its state at $t-1$. This corresponds to strong internal dynamics and persistent memory.
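For jointly Gaussian variables, mutual information reduces to $-\tfrac{1}{2}\ln(1 - r^2)$, which gives a quick AIS estimate. A sketch on an AR(1) process standing in for a single reservoir unit; the coefficient 0.8 is an assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
T, a = 100_000, 0.8          # AR(1) coefficient a stands in for recurrent gain
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.standard_normal()

# Gaussian estimator: I(X; Y) = -1/2 * ln(1 - r^2), in nats.
r = np.corrcoef(x[:-1], x[1:])[0, 1]
ais = -0.5 * np.log(1.0 - r ** 2)
# Stronger recurrence (larger a) raises r and hence the stored information.
```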
Information Transfer
Information transfer quantifies how much information flows from the input to the reservoir state:

$$T_{u \to x} = I\big(u^{(t)};\, x(t)\big)$$

where $u^{(t)} = \big(u(t), u(t-1), \ldots\big)$ is the input history. High transfer indicates that the reservoir is strongly driven by external input: it acts as a sensitive transducer of the input signal.
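A simplified sketch of the same Gaussian estimator applied in the input-to-state direction, using a one-unit linear reservoir and only the current input rather than the full history; the gains are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
T, a, b = 100_000, 0.5, 1.0  # recurrent and input gains (assumed)
u = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + b * u[t]

# Simplified Gaussian estimate of input-to-state information flow:
# mutual information between the current state and the current input.
r = np.corrcoef(x[1:], u[1:])[0, 1]
transfer = -0.5 * np.log(1.0 - r ** 2)
# Raising the recurrent gain a would raise AIS but lower this correlation,
# which is the storage/transfer trade-off described below.
```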
The Fundamental Trade-Off
AIS and information transfer are in tension. A reservoir cannot simultaneously maximize both:
- High AIS, low transfer: The reservoir’s dynamics are dominated by its own internal state. It is an autonomous system with strong memory but weak coupling to the input. It acts as a stable internal model.
- Low AIS, high transfer: The reservoir’s state is dominated by the input drive. It has little autonomous dynamics but faithfully represents the input history. It acts as a sensitive sensor.
- Intermediate regime: A balance where both AIS and transfer are significant. The reservoir both remembers its past and responds to new input. This is the regime of maximal computational capacity.
This trade-off is governed by the fractional order $\alpha$.
Memory Capacity as a Function of
Experimental measurements of MC as a function of $\alpha$ reveal a characteristic sigmoidal curve:
- For small $\alpha$: MC is near its maximum. The long power-law memory of the fractional derivative allows the reservoir to retain input information over many time steps.
- For intermediate $\alpha$: MC decreases through its steepest region. This is the transition zone where the memory profile shifts from power-law to exponential.
- For $\alpha$ near 1: MC approaches the classical (integer-order) value. The reservoir has Markovian dynamics and retains only recent inputs.
The sigmoidal transition is not a sharp phase transition but a smooth crossover. The midpoint and steepness depend on the reservoir size $N$, spectral radius $\rho$, and input strength.
Memory Profile Shape
The distribution of MC across lags also changes with $\alpha$:
- Low $\alpha$: MC is distributed broadly across many lags. The reservoir remembers distant past inputs at the expense of precise recall of recent ones.
- High $\alpha$: MC is concentrated at short lags. Recent inputs are recalled with high fidelity, but information about the distant past is lost.
- Intermediate $\alpha$: A balanced profile that provides reasonable recall across a range of lags.
The Critical-Like Operating Point
The intermediate-$\alpha$ regime has special significance. At the boundary between the sensor and internal-model regimes, the reservoir operates near a critical-like point where:
- Computational capacity is maximized. The reservoir can perform both memory-intensive and input-sensitive computations.
- Dynamical range is maximized. The reservoir responds to a wide range of input amplitudes.
- Separation property is strongest. Different input sequences produce maximally distinct reservoir trajectories.
This is analogous to the “edge of chaos” phenomenon in classical reservoir computing, where computational capacity is maximized at the boundary between stable and chaotic dynamics (spectral radius $\rho \approx 1$). In fractional-order reservoirs, the fractional order $\alpha$ provides a second axis along which this critical-like behavior can be tuned, independently of the spectral radius.
The practical consequence is that the optimal $\alpha$ for a given task is typically not at the extremes ($\alpha \to 0$ or $\alpha = 1$) but in the intermediate range. The precise optimum depends on the task’s temporal structure:
- Tasks requiring long-range memory (e.g., slow physiological processes) favor lower $\alpha$.
- Tasks requiring rapid response (e.g., real-time control) favor higher $\alpha$.
- Tasks requiring both (e.g., speech recognition) favor intermediate $\alpha$.
Relationship to Spectral Radius
The spectral radius $\rho$ and the fractional order $\alpha$ both control the reservoir’s memory properties, but they act through different mechanisms:
| Property | Spectral radius $\rho$ | Fractional order $\alpha$ |
|---|---|---|
| Controls | Rate of state decay per step | Shape of memory kernel |
| Mechanism | Eigenvalue scaling | Power-law vs. exponential decay |
| Memory increase | As $\rho \to 1$ | As $\alpha \to 0$ |
| Stability boundary | $\rho = 1$ (echo state limit) | None (stable for all $\alpha \in (0, 1]$) |
| Effect on nonlinearity | Indirect (via signal amplitude) | Direct (kernel shape) |
The two parameters are complementary. The spectral radius sets the overall gain of the recurrent dynamics, while the fractional order shapes the temporal kernel through which past information is integrated. A well-designed fractional reservoir optimizes both.
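The kernel-shape distinction in the table can be made concrete by comparing a geometric kernel $\rho^j$ with the Grünwald–Letnikov binomial weights that define a fractional difference of order $\alpha$; the values $\rho = 0.8$ and $\alpha = 0.5$ are illustrative assumptions:

```python
import numpy as np

alpha, rho, J = 0.5, 0.8, 51
w = np.empty(J)
w[0] = 1.0
for j in range(1, J):
    w[j] = w[j - 1] * (j - 1 - alpha) / j   # (-1)^j * binom(alpha, j)
frac_kernel = np.abs(w)                     # power-law tail, ~ j^(-1 - alpha)
exp_kernel = rho ** np.arange(J)            # exponential (geometric) tail

# At short lags the geometric kernel is larger; at long lags the
# power-law kernel dominates, which is the fractional memory advantage.
ratio = frac_kernel[50] / exp_kernel[50]
```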
Practical Implications for SPIRES
The theoretical framework described here has direct practical consequences for configuring SPIRES reservoirs:
- Choosing $\alpha$: Start with an intermediate value for tasks with mixed temporal requirements. Use the AGILE optimizer to search the space systematically.
- Memory capacity measurement: SPIRES can be used to measure MC experimentally by training the readout on delayed versions of the input. Plotting $MC_k$ vs. $k$ reveals the memory profile and helps diagnose whether $\alpha$ is appropriate for the task.
- History length: The history length must be long enough to capture the memory profile. A rule of thumb: it should be at least the longest lag $k$ at which $MC_k$ is appreciably nonzero.
- Interaction with spectral radius: The optimal $(\rho, \alpha)$ pair is task-dependent. Generally, combining a spectral radius just below the echo state limit with an intermediate $\alpha$ provides a good starting point.
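The history-length rule of thumb above can be sketched as a small helper; the MC profile and the cutoff threshold are illustrative assumptions:

```python
import numpy as np

# Stand-in measured MC_k profile (exponential decay chosen for illustration).
mc = 0.9 * np.exp(-np.arange(100) / 10.0)
threshold = 0.01                   # "appreciably nonzero" cutoff (assumed)
above = np.nonzero(mc > threshold)[0]
history_length = int(above[-1]) + 1 if above.size else 1
```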
References
- Jaeger, H. (2002). Short-term memory in echo state networks. GMD Report 152, German National Research Center for Information Technology.
- Lizier, J. T. (2012). The Local Information Dynamics of Distributed Computation in Complex Systems. Springer.
- Lizier, J. T., Prokopenko, M., & Zomaya, A. Y. (2012). Local measures of information storage in complex distributed computation. Information Sciences, 208, 39–54.
- Verstraeten, D., Schrauwen, B., D’Haene, M., & Stroobandt, D. (2007). An experimental unification of reservoir computing methods. Neural Networks, 20(3), 391–403.
- Lundstrom, B. N., Higgs, M. H., Spain, W. J., & Fairhall, A. L. (2008). Fractional differentiation by neocortical pyramidal neurons. Nature Neuroscience, 11(11), 1335–1342.