# Scoring Metrics
The scoring configuration tells the AGILE optimizer how to evaluate and rank candidate reservoir configurations. It defines which performance metric to use, how to penalize variance across random seeds, and how to penalize computational cost. These choices shape the optimizer’s search toward configurations that are not only accurate but also robust and efficient.
## The `spires_opt_score` Struct
```c
struct spires_opt_score {
    double lambda_var;   /* variance penalty weight */
    double lambda_cost;  /* computational cost penalty weight */
    int    metric;       /* performance metric (enum) */
};
```

### Fields
| Field | Type | Description |
|---|---|---|
| `lambda_var` | double | Weight for the variance penalty. Higher values favor configurations with consistent performance across random seeds. Range: ≥ 0 (0.0 disables the penalty). |
| `lambda_cost` | double | Weight for the computational cost penalty. Higher values favor cheaper (faster) configurations. Range: ≥ 0 (0.0 disables the penalty). |
| `metric` | int | The performance metric to optimize. One of `SPIRES_METRIC_AUROC` or `SPIRES_METRIC_AUPRC`. |
## Composite Score
The optimizer computes a composite score for each candidate configuration:

```
score = metric_mean - lambda_var * metric_std - lambda_cost * cost
```

where:
- `metric_mean` is the mean of the performance metric across random seeds
- `metric_std` is the standard deviation of the metric across seeds
- `cost` is a normalized computational cost measure
- `lambda_var` and `lambda_cost` are the penalty weights
The optimizer maximizes this composite score. A configuration that scores highly must have a high mean metric, low variance across seeds, and low computational cost (if cost is penalized).
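As a sketch, the composite score can be computed directly from the struct fields. The struct definition comes from this page; `composite_score` is an illustrative helper, not part of the library API:

```c
#include <assert.h>
#include <math.h>

/* Local copy of the scoring struct from this page (for a self-contained
 * example; in real code, include the library header instead). */
struct spires_opt_score {
    double lambda_var;   /* variance penalty weight */
    double lambda_cost;  /* computational cost penalty weight */
    int    metric;       /* performance metric (enum) */
};

/* score = metric_mean - lambda_var * metric_std - lambda_cost * cost */
double composite_score(const struct spires_opt_score *s,
                       double metric_mean, double metric_std, double cost)
{
    return metric_mean - s->lambda_var * metric_std - s->lambda_cost * cost;
}
```

For example, with `lambda_var = 1.0` and `lambda_cost = 0.1`, a candidate with mean 0.9, std 0.05, and cost 0.5 scores `0.9 - 0.05 - 0.05 = 0.8`.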
## Performance Metrics
### AUROC (Area Under the Receiver Operating Characteristic)
```c
score.metric = SPIRES_METRIC_AUROC;  /* value: 0 */
```

AUROC measures the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example by the classifier. It ranges from 0 to 1, where:
- 1.0: Perfect discrimination
- 0.5: Random guessing (no discriminative ability)
- < 0.5: Worse than random (predictions are inverted)
AUROC is threshold-independent: it evaluates the ranking quality of the output across all possible classification thresholds. This makes it robust to miscalibrated output scales.
When to use AUROC:
- Balanced or moderately imbalanced classification tasks
- When you care about overall ranking quality
- When the cost of false positives and false negatives is roughly equal
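The probabilistic definition above translates directly into code: count, over all (positive, negative) pairs, how often the positive example scores higher, with ties counting half. This O(n_pos × n_neg) sketch is for illustration only (the library's internal computation is not shown here):

```c
#include <assert.h>

/* AUROC by direct pairwise comparison: the fraction of
 * (positive, negative) score pairs where the positive ranks higher;
 * ties contribute 0.5. */
double auroc_pairwise(const double *pos, int n_pos,
                      const double *neg, int n_neg)
{
    double wins = 0.0;
    for (int i = 0; i < n_pos; i++)
        for (int j = 0; j < n_neg; j++) {
            if (pos[i] > neg[j])       wins += 1.0;
            else if (pos[i] == neg[j]) wins += 0.5;
        }
    return wins / ((double)n_pos * (double)n_neg);
}
```

A perfectly separated set of scores yields 1.0; identical scores for both classes yield 0.5, matching the "random guessing" baseline.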
### AUPRC (Area Under the Precision-Recall Curve)
```c
score.metric = SPIRES_METRIC_AUPRC;  /* value: 1 */
```

AUPRC measures the area under the precision-recall curve, where precision = TP / (TP + FP) and recall = TP / (TP + FN). It ranges from 0 to 1, where:
- 1.0: Perfect precision and recall at all thresholds
- Baseline: Equal to the positive class prevalence (e.g., 0.01 for 1% positive rate)
AUPRC is more informative than AUROC when classes are highly imbalanced, because it focuses on the performance of the positive class. A high AUROC can be achieved trivially on imbalanced data by predicting the majority class, but a high AUPRC requires genuinely identifying the rare positive cases.
When to use AUPRC:
- Highly imbalanced classification tasks (e.g., anomaly detection, rare event prediction)
- When false negatives are costly (missing a positive case)
- When the positive class prevalence is below 10%
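The definitions above can be made concrete with a few one-liners; these helpers are illustrative, not library functions:

```c
#include <assert.h>
#include <math.h>

/* Precision and recall from confusion-matrix counts, as defined above. */
double precision(int tp, int fp) { return (double)tp / (double)(tp + fp); }
double recall(int tp, int fn)    { return (double)tp / (double)(tp + fn); }

/* The AUPRC of a random classifier equals the positive-class
 * prevalence, which is why 0.01 is the baseline at a 1% positive rate. */
double auprc_baseline(int n_pos, int n_neg)
{
    return (double)n_pos / (double)(n_pos + n_neg);
}
```

Note how the baseline shifts with imbalance: an AUPRC of 0.30 is weak on balanced data but far above chance when only 1% of examples are positive.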
### Choosing Between AUROC and AUPRC
| Scenario | Recommended Metric |
|---|---|
| Balanced classes (40-60% positive) | AUROC |
| Moderate imbalance (10-40% positive) | AUROC or AUPRC |
| High imbalance (1-10% positive) | AUPRC |
| Extreme imbalance (< 1% positive) | AUPRC |
| Cost-sensitive with equal costs | AUROC |
| Detecting rare events | AUPRC |
## Variance Penalty
The variance penalty controls how much the optimizer values consistency across random seeds.
When a configuration is evaluated with `n` random seeds, it produces `n` metric values, one per seed. The mean `metric_mean` and standard deviation `metric_std` are computed, and the score is reduced by `lambda_var * metric_std`.
### Effect of `lambda_var`
| `lambda_var` | Behavior |
|---|---|
| 0.0 | No variance penalty; optimizer seeks highest mean performance regardless of consistency |
| 0.5 | Moderate penalty; a 1-standard-deviation decrease in consistency costs half a metric point |
| 1.0 | Strong penalty; equivalent to optimizing the `metric_mean - metric_std` lower bound |
| 2.0 | Very strong; approximately a 95% confidence lower bound |
Practical guidance:
- For research and benchmarking, use `lambda_var = 0.0` or `0.5` to find the highest-performing configuration.
- For production deployments where reliability matters, use `lambda_var = 1.0` or higher to ensure the chosen configuration performs consistently.
- If using very few seeds (1-2), the variance estimate is unreliable. Either increase the number of seeds or reduce the penalty.
## Cost Penalty
The cost penalty discourages the optimizer from selecting computationally expensive configurations when cheaper alternatives perform nearly as well.
The cost is a normalized measure that accounts for:
- Reservoir size: Larger reservoirs (more neurons) are more expensive.
- Neuron complexity: Fractional neurons with long histories are more expensive per step than simple LIF neurons.
- Connectivity density: Denser networks have more synaptic computations.
### Effect of `lambda_cost`
| `lambda_cost` | Behavior |
|---|---|
| 0.0 | No cost penalty; optimizer chooses the best configuration regardless of computational expense |
| 0.1 | Mild penalty; prefers cheaper configurations when performance is similar |
| 0.5 | Moderate penalty; willing to sacrifice some performance for significant speedup |
| 1.0 | Strong penalty; aggressively favors cheap configurations |
Practical guidance:
- For offline analysis where compute time is not critical, use `lambda_cost = 0.0`.
- For real-time or embedded applications where inference speed matters, increase `lambda_cost` to bias toward smaller, faster reservoirs.
- For balancing accuracy and efficiency, `lambda_cost = 0.1` is a good starting point.
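To see how the cost term changes the ranking, consider the cost portion of the composite score in isolation (the numbers below are made up for illustration):

```c
#include <assert.h>

/* The cost-penalized part of the composite score:
 * score = metric_mean - lambda_cost * cost  (variance term omitted). */
double cost_penalized(double metric_mean, double cost, double lambda_cost)
{
    return metric_mean - lambda_cost * cost;
}
```

With `lambda_cost = 0.1`, an expensive configuration (mean 0.92, cost 1.0) scores 0.82, while a cheaper one (mean 0.90, cost 0.2) scores 0.88, so the optimizer picks the cheaper configuration despite its slightly lower raw metric.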
## Example Configurations
### Maximum Performance
Seek the highest AUROC, regardless of variance or cost:
```c
struct spires_opt_score score = {
    .lambda_var  = 0.0,
    .lambda_cost = 0.0,
    .metric      = SPIRES_METRIC_AUROC,
};
```

### Robust Performance
Optimize for consistent AUROC across seeds:
```c
struct spires_opt_score score = {
    .lambda_var  = 1.0,
    .lambda_cost = 0.0,
    .metric      = SPIRES_METRIC_AUROC,
};
```

### Balanced Efficiency
Good AUPRC with a preference for cheaper configurations:
```c
struct spires_opt_score score = {
    .lambda_var  = 0.5,
    .lambda_cost = 0.2,
    .metric      = SPIRES_METRIC_AUPRC,
};
```

### Real-Time Deployment
Strongly favor fast configurations for anomaly detection:
```c
struct spires_opt_score score = {
    .lambda_var  = 1.0,
    .lambda_cost = 0.5,
    .metric      = SPIRES_METRIC_AUPRC,
};
```

## Interpreting the Result
After optimization, the `spires_opt_result` struct contains:

```c
struct spires_opt_result {
    spires_reservoir_config best_config; /* optimal configuration */
    double best_log10_ridge;             /* log10 of best ridge lambda */
    double best_score;                   /* composite score */
    double metric_mean;                  /* mean metric across seeds */
    double metric_std;                   /* std of metric across seeds */
};
```

| Field | Interpretation |
|---|---|
| `best_score` | The composite score (metric mean minus penalties). This is what the optimizer maximized. |
| `metric_mean` | The raw mean performance metric. Compare this across different scoring configurations to understand the accuracy-cost trade-off. |
| `metric_std` | The variability across seeds. Lower is better for deployment reliability. |
| `best_log10_ridge` | The optimal ridge regularization parameter in log-space. Use `pow(10.0, best_log10_ridge)` to recover the actual ridge value. |
## Interaction with Budget Levels
The scoring configuration is applied at every budget level. At low-fidelity levels (fewer seeds, less data), the metric estimates are noisier. The variance penalty effectively accounts for this: configurations with high variance at low fidelity are penalized, which is appropriate because truly good configurations tend to show consistent performance even with limited evaluation.
At higher fidelity levels, the metric estimates become more reliable, and the variance penalty becomes a true measure of the configuration’s inherent robustness rather than an artifact of limited evaluation.