Bench Testing - Search News

Terminal-Bench 2.0 launches alongside Harbor, a new framework for testing agents in containers

The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new ...

Machine Design

R&D Spotlight: Designing a Test Bench for Armored Vehicle Suspensions

Test engineers undoubtedly agree on the need for a test rig that can evaluate the reliability of a vehicle’s suspension system. However, developing and building a high-performance fatigue bench that ...

ExecutiveGov

NIST Seeks Public Input on Draft Best Practices for Automated AI Benchmark Testing

NIST said Friday that its Center for AI Standards and Innovation, or CAISI, released an initial public draft of NIST AI 800-2, “Practices for Automated Benchmark Evaluations of Language Models,” and ...
