The primary metrics are AUROC and AP. F1 is near zero for all methods because the anomaly rate is only ~0.02% (4 events in 22,683 rows). This is expected and correct: the evaluation uses point labels from ...
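
As a rough illustration of the class-imbalance effect described above, the sketch below builds a purely synthetic score vector with the same shape (4 positives in 22,683 rows) and compares AUROC, AP, and F1. Everything except those two counts is an assumption made for illustration: the scores, the positions of the anomalies in the ranking, the 227-row (about 1%) flagging budget, and the helper names `auroc`, `average_precision`, and `f1_at_threshold`. None of it reproduces the actual evaluation.

```python
# Synthetic illustration only: scores and anomaly positions are made up.
N_ROWS = 22_683   # row count quoted in the text
N_EVENTS = 4      # anomaly count quoted in the text


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (no ties assumed)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    negatives_seen = 0
    wins = 0
    for i in order:
        if labels[i]:
            wins += negatives_seen      # negatives ranked below this positive
        else:
            negatives_seen += 1
    n_pos = sum(labels)
    return wins / (n_pos * (len(labels) - n_pos))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits


def f1_at_threshold(labels, scores, threshold):
    """F1 when every row with score >= threshold is flagged as anomalous."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical detector: distinct scores 0..N_ROWS-1, with the four anomalies
# placed at ranks 50, 100, 150 and 200 from the top (well ranked, not perfect).
scores = [float(i) for i in range(N_ROWS)]
labels = [0] * N_ROWS
for offset in (50, 100, 150, 200):
    labels[N_ROWS - offset] = 1

anomaly_rate = N_EVENTS / N_ROWS        # about 0.000176, i.e. ~0.02%
threshold = scores[N_ROWS - 227]        # assumed budget: flag the top 227 rows (~1%)

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
f1 = f1_at_threshold(labels, scores, threshold)

print(f"anomaly rate = {anomaly_rate:.6f}")
print(f"AUROC = {roc:.4f}, AP = {ap:.4f}, F1 = {f1:.4f}")
```

In this toy setup all four anomalies fall inside the flagged set, yet the 223 false positives hold F1 to about 0.035. AUROC stays near 0.995, and AP (0.02) is roughly a hundred times the 0.000176 prevalence baseline, which is why ranking metrics remain readable at this anomaly rate.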