No existing benchmark tells us which model fits which cognitive role. So we're building o2Bench, an open-source suite that evaluates models against specific functions in an agent architecture.