
Scoring Metrics

The scoring configuration tells the AGILE optimizer how to evaluate and rank candidate reservoir configurations. It defines which performance metric to use, how to penalize variance across random seeds, and how to penalize computational cost. These choices shape the optimizer’s search toward configurations that are not only accurate but also robust and efficient.

The spires_opt_score Struct

struct spires_opt_score {
    double lambda_var;   /* variance penalty weight */
    double lambda_cost;  /* computational cost penalty weight */
    int metric;          /* performance metric (enum) */
};

Fields

| Field | Type | Description |
| --- | --- | --- |
| lambda_var | double | Weight for the variance penalty. Higher values favor configurations with consistent performance across random seeds. Range: ≥ 0. |
| lambda_cost | double | Weight for the computational cost penalty. Higher values favor cheaper (faster) configurations. Range: ≥ 0. |
| metric | int | The performance metric to optimize. One of SPIRES_METRIC_AUROC or SPIRES_METRIC_AUPRC. |

Composite Score

The optimizer computes a composite score for each candidate configuration:

$$\text{Score} = \bar{m} - \lambda_{\text{var}} \cdot \sigma_m - \lambda_{\text{cost}} \cdot c$$

where:

  • $\bar{m}$ is the mean of the performance metric across random seeds
  • $\sigma_m$ is the standard deviation of the metric across seeds
  • $c$ is a normalized computational cost measure
  • $\lambda_{\text{var}}$ and $\lambda_{\text{cost}}$ are the penalty weights

The optimizer maximizes this composite score. A configuration that scores highly must have a high mean metric, low variance across seeds, and low computational cost (if cost is penalized).

Performance Metrics

AUROC (Area Under the Receiver Operating Characteristic)

score.metric = SPIRES_METRIC_AUROC; /* value: 0 */

AUROC measures the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example by the classifier. It ranges from 0 to 1, where:

  • 1.0: Perfect discrimination
  • 0.5: Random guessing (no discriminative ability)
  • < 0.5: Worse than random (predictions are inverted)

AUROC is threshold-independent: it evaluates the ranking quality of the output across all possible classification thresholds. This makes it robust to miscalibrated output scales.

When to use AUROC:

  • Balanced or moderately imbalanced classification tasks
  • When you care about overall ranking quality
  • When the cost of false positives and false negatives is roughly equal
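
The pairwise definition of AUROC translates directly into code. Below is a minimal sketch (not the library's implementation) that counts positive–negative pairs, crediting ties half a point:

```c
/* Pairwise AUROC sketch: fraction of (positive, negative) pairs in
   which the positive example is scored higher; ties count as half. */
static double auroc(const double *scores, const int *labels, int n)
{
    long pairs = 0;
    double wins = 0.0;
    for (int i = 0; i < n; i++) {
        if (!labels[i]) continue;            /* i ranges over positives */
        for (int j = 0; j < n; j++) {
            if (labels[j]) continue;         /* j ranges over negatives */
            pairs++;
            if (scores[i] > scores[j])       wins += 1.0;
            else if (scores[i] == scores[j]) wins += 0.5;
        }
    }
    return pairs ? wins / pairs : 0.0;
}
```

The O(n²) pair loop is fine for illustration; production code would sort by score and use ranks.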

AUPRC (Area Under the Precision-Recall Curve)

score.metric = SPIRES_METRIC_AUPRC; /* value: 1 */

AUPRC measures the area under the precision-recall curve, where precision = TP / (TP + FP) and recall = TP / (TP + FN). It ranges from 0 to 1, where:

  • 1.0: Perfect precision and recall at all thresholds
  • Baseline: Equal to the positive class prevalence (e.g., 0.01 for 1% positive rate)

AUPRC is more informative than AUROC when classes are highly imbalanced, because it focuses on the performance of the positive class. A high AUROC can be achieved trivially on imbalanced data by predicting the majority class, but a high AUPRC requires genuinely identifying the rare positive cases.
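
To make the precision and recall definitions concrete, here is a small illustrative helper (not part of the library) that evaluates both at a single threshold; AUPRC aggregates such points over all thresholds:

```c
/* Precision and recall of thresholded predictions.
   precision = TP / (TP + FP), recall = TP / (TP + FN). */
struct pr { double precision, recall; };

static struct pr pr_at_threshold(const double *scores, const int *labels,
                                 int n, double thresh)
{
    int tp = 0, fp = 0, fn = 0;
    for (int i = 0; i < n; i++) {
        int pred = scores[i] >= thresh;
        if (pred && labels[i])       tp++;
        else if (pred && !labels[i]) fp++;
        else if (!pred && labels[i]) fn++;
    }
    struct pr r;
    r.precision = (tp + fp) ? (double)tp / (tp + fp) : 1.0;
    r.recall    = (tp + fn) ? (double)tp / (tp + fn) : 0.0;
    return r;
}
```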

When to use AUPRC:

  • Highly imbalanced classification tasks (e.g., anomaly detection, rare event prediction)
  • When false negatives are costly (missing a positive case)
  • When the positive class prevalence is below 10%

Choosing Between AUROC and AUPRC

| Scenario | Recommended Metric |
| --- | --- |
| Balanced classes (40–60% positive) | AUROC |
| Moderate imbalance (10–40% positive) | AUROC or AUPRC |
| High imbalance (1–10% positive) | AUPRC |
| Extreme imbalance (< 1% positive) | AUPRC |
| Cost-sensitive with equal costs | AUROC |
| Detecting rare events | AUPRC |
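
The table above can be condensed into a simple prevalence-based heuristic. The helper below is hypothetical (the macro values match the enum values documented on this page):

```c
/* Metric values as documented on this page. */
#define SPIRES_METRIC_AUROC 0
#define SPIRES_METRIC_AUPRC 1

/* Hypothetical heuristic: prefer AUPRC below ~10% positive prevalence,
   matching the guidance in the table above. */
static int choose_metric(double positive_prevalence)
{
    return positive_prevalence < 0.10 ? SPIRES_METRIC_AUPRC
                                      : SPIRES_METRIC_AUROC;
}
```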

Variance Penalty

The variance penalty $\lambda_{\text{var}}$ controls how much the optimizer values consistency across random seeds.

When a configuration is evaluated with $S$ random seeds, it produces $S$ metric values $\{m_1, m_2, \ldots, m_S\}$. The mean $\bar{m}$ and standard deviation $\sigma_m$ are computed, and the score is reduced by $\lambda_{\text{var}} \cdot \sigma_m$.

Effect of lambda_var

| lambda_var | Behavior |
| --- | --- |
| 0.0 | No variance penalty; the optimizer seeks the highest mean performance regardless of consistency |
| 0.5 | Moderate penalty; one standard deviation of spread costs half a metric point |
| 1.0 | Strong penalty; equivalent to optimizing the lower bound $\bar{m} - \sigma_m$ |
| 2.0 | Very strong penalty; roughly a two-sigma (≈95% confidence) lower bound |

Practical guidance:

  • For research and benchmarking, use lambda_var = 0.0 or 0.5 to find the highest-performing configuration.
  • For production deployments where reliability matters, use lambda_var = 1.0 or higher to ensure the chosen configuration performs consistently.
  • If using very few seeds (1–2), the variance estimate is unreliable. Either increase the number of seeds or reduce the penalty.
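
A small example of how the penalty can flip a ranking: a candidate with mean 0.90 and std 0.08 beats one with mean 0.87 and std 0.01 at lambda_var = 0.0, but loses at lambda_var = 1.0 (0.82 vs. 0.86). A hypothetical helper:

```c
/* Hypothetical helper: index (0 or 1) of the candidate with the
   higher variance-penalized score. */
static int pick(const double mean[2], const double std[2],
                double lambda_var)
{
    double s0 = mean[0] - lambda_var * std[0];
    double s1 = mean[1] - lambda_var * std[1];
    return s1 > s0 ? 1 : 0;
}
```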

Cost Penalty

The cost penalty $\lambda_{\text{cost}}$ discourages the optimizer from selecting computationally expensive configurations when cheaper alternatives perform nearly as well.

The cost $c$ is a normalized measure that accounts for:

  • Reservoir size: Larger reservoirs (more neurons) are more expensive.
  • Neuron complexity: Fractional neurons with long histories are more expensive per step than simple LIF neurons.
  • Connectivity density: Denser networks have more synaptic computations.
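
The exact normalization is internal to the library. Purely as an illustration of how the three factors above might combine, consider an equal-weight average against reference values (every name here is hypothetical):

```c
/* Illustrative only -- NOT the library's actual cost model.
   Combines neuron count, per-neuron step cost, and connectivity
   density into a single normalized cost. */
static double example_cost(int neurons, double step_cost, double density,
                           int ref_neurons, double ref_step_cost)
{
    double size_term    = (double)neurons / ref_neurons;
    double neuron_term  = step_cost / ref_step_cost;
    double synapse_term = density;  /* fraction of possible connections */
    return (size_term + neuron_term + synapse_term) / 3.0;
}
```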

Effect of lambda_cost

| lambda_cost | Behavior |
| --- | --- |
| 0.0 | No cost penalty; the optimizer chooses the best configuration regardless of computational expense |
| 0.1 | Mild penalty; prefers cheaper configurations when performance is similar |
| 0.5 | Moderate penalty; willing to sacrifice some performance for a significant speedup |
| 1.0 | Strong penalty; aggressively favors cheap configurations |

Practical guidance:

  • For offline analysis where compute time is not critical, use lambda_cost = 0.0.
  • For real-time or embedded applications where inference speed matters, increase lambda_cost to bias toward smaller, faster reservoirs.
  • For balancing accuracy and efficiency, lambda_cost = 0.1 is a good starting point.

Example Configurations

Maximum Performance

Seek the highest AUROC, regardless of variance or cost:

struct spires_opt_score score = {
    .lambda_var  = 0.0,
    .lambda_cost = 0.0,
    .metric      = SPIRES_METRIC_AUROC,
};

Robust Performance

Optimize for consistent AUROC across seeds:

struct spires_opt_score score = {
    .lambda_var  = 1.0,
    .lambda_cost = 0.0,
    .metric      = SPIRES_METRIC_AUROC,
};

Balanced Efficiency

Good AUPRC with a preference for cheaper configurations:

struct spires_opt_score score = {
    .lambda_var  = 0.5,
    .lambda_cost = 0.2,
    .metric      = SPIRES_METRIC_AUPRC,
};

Real-Time Deployment

Strongly favor fast configurations for anomaly detection:

struct spires_opt_score score = {
    .lambda_var  = 1.0,
    .lambda_cost = 0.5,
    .metric      = SPIRES_METRIC_AUPRC,
};

Interpreting the Result

After optimization, the spires_opt_result struct contains:

struct spires_opt_result {
    spires_reservoir_config best_config;  /* optimal configuration */
    double best_log10_ridge;              /* log10 of best ridge lambda */
    double best_score;                    /* composite score */
    double metric_mean;                   /* mean metric across seeds */
    double metric_std;                    /* std of metric across seeds */
};

| Field | Interpretation |
| --- | --- |
| best_score | The composite score (metric mean minus penalties). This is what the optimizer maximized. |
| metric_mean | The raw mean performance metric. Compare this across different scoring configurations to understand the accuracy-cost trade-off. |
| metric_std | The variability across seeds. Lower is better for deployment reliability. |
| best_log10_ridge | The optimal ridge regularization parameter in log space. Use pow(10.0, best_log10_ridge) to obtain $\lambda$. |

Interaction with Budget Levels

The scoring configuration is applied at every budget level. At low-fidelity levels (fewer seeds, less data), the metric estimates are noisier. The variance penalty effectively accounts for this: configurations with high variance at low fidelity are penalized, which is appropriate because truly good configurations tend to show consistent performance even with limited evaluation.

At higher fidelity levels, the metric estimates become more reliable, and the variance penalty becomes a true measure of the configuration’s inherent robustness rather than an artifact of limited evaluation.


