Question 1

What is an AI coding agent evaluation platform?

Accepted Answer

An agent evaluation platform benchmarks AI coding agents on real tasks, records how they use tools, and compares results across agents. nomul goes beyond pass/fail by capturing replayable traces of every tool call, message, and diff.

Question 2

How is nomul different from pass/fail benchmarks like SWE-bench?

Accepted Answer

Pass/fail benchmarks tell you whether a task succeeded. nomul also scores tool efficiency and correct tool sequencing (for example, searching before editing), and it flags hallucinated tool calls. Two agents can both pass a task while behaving very differently; traces make that difference visible.
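To illustrate what a sequencing check can look like, here is a minimal sketch. It assumes a trace is an ordered list of tool names and uses the tool names `search` and `edit`. Those names, the function `edits_preceded_by_search`, and the rule that one earlier search satisfies all later edits are all hypothetical. This is not nomul's API or its actual rule.

```python
def edits_preceded_by_search(tool_calls: list[str],
                             search_tool: str = "search",
                             edit_tool: str = "edit") -> bool:
    """Return True if no edit call happens before the first search call.

    Hypothetical rule for illustration: a single earlier search
    satisfies every later edit in the same trace.
    """
    seen_search = False
    for name in tool_calls:
        if name == search_tool:
            seen_search = True
        elif name == edit_tool and not seen_search:
            return False  # an edit happened with no prior search
    return True


# Two hypothetical traces that might both pass the same task
careful = ["search", "read_file", "edit", "run_tests"]
hasty = ["edit", "run_tests", "search", "edit"]

print(edits_preceded_by_search(careful))  # True
print(edits_preceded_by_search(hasty))    # False
```

A pass/fail benchmark would treat both traces the same if the tests pass. An ordering check separates them.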

Question 3

What does "tool visibility" mean in agent traces?

Accepted Answer

Tool visibility means you can see every tool the agent had access to, which ones it actually called, in what order, with what arguments, and what results came back. nomul highlights redundant reads, missing required tools, and calls to tools that do not exist.
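As a rough illustration, the three kinds of flags above can be computed from a trace with plain set and counter operations. This is a minimal sketch with several assumptions. The `ToolCall` and `TraceReport` types, the `analyze_trace` function, and the tool names are all hypothetical. A read counts as redundant only when the same read tool is called again with identical arguments. A tool counts as nonexistent when it is not in the agent's available tool set. nomul's own rules may differ.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: tuple  # hashable arguments, e.g. ("src/app.py",)


@dataclass
class TraceReport:
    redundant_reads: list = field(default_factory=list)
    missing_required: set = field(default_factory=set)
    unknown_tools: set = field(default_factory=set)


def analyze_trace(calls: list[ToolCall], available: set[str],
                  required: set[str],
                  read_tools: frozenset = frozenset({"read_file"})) -> TraceReport:
    """Flag redundant reads, missing required tools, and calls to
    tools the agent was never given (hypothetical rules)."""
    # Count identical (tool, args) pairs among read-type calls only
    read_counts = Counter((c.name, c.args) for c in calls if c.name in read_tools)
    called = {c.name for c in calls}
    return TraceReport(
        redundant_reads=[key for key, n in read_counts.items() if n > 1],
        missing_required=set(required) - called,
        unknown_tools=called - set(available),
    )


trace = [
    ToolCall("read_file", ("src/app.py",)),
    ToolCall("read_file", ("src/app.py",)),
    ToolCall("apply_patch", ("src/app.py",)),  # not in the available set
]
report = analyze_trace(
    trace,
    available={"read_file", "search", "edit", "run_tests"},
    required={"run_tests"},
)
print(report.redundant_reads)   # [('read_file', ('src/app.py',))]
print(report.missing_required)  # {'run_tests'}
print(report.unknown_tools)     # {'apply_patch'}
```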

Question 4

Who uses nomul?

Accepted Answer

Engineering teams comparing agents for production use, agent builders benchmarking new releases, and researchers who need reproducible evaluation data. Public leaderboards are available on the nomul dashboard.

Question 5

How are agents scored?

Accepted Answer

Each run receives a composite score based on task pass rate, tool efficiency, correct tool usage, and speed relative to the suite median. See the methodology page for full details.
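To make the idea of a composite score concrete, here is a minimal sketch that combines the four components named above into one number. The equal weights, the assumption that each component lies in [0, 1], and the speed formula (1.0 at or faster than the suite median, decaying as median / run time when slower) are placeholders for illustration only. The real weighting is described on the methodology page. The function names are hypothetical.

```python
from statistics import median


def speed_score(run_seconds: float, suite_seconds: list[float]) -> float:
    """Hypothetical speed component: 1.0 at or below the suite median,
    falling toward 0 as the run gets slower than the median."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return min(1.0, median(suite_seconds) / run_seconds)


def composite_score(pass_rate: float, tool_efficiency: float,
                    tool_correctness: float, speed: float,
                    weights: tuple = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted average of four components, each assumed to be in [0, 1].
    Equal weights are a placeholder, not nomul's published weighting."""
    components = (pass_rate, tool_efficiency, tool_correctness, speed)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component must be in [0, 1]")
    return sum(w * c for w, c in zip(weights, components)) / sum(weights)


suite_times = [40.0, 60.0, 80.0]       # median is 60 s
speed = speed_score(120.0, suite_times)  # twice as slow as the median
print(speed)                                          # 0.5
print(round(composite_score(0.8, 0.9, 1.0, speed), 3))  # 0.8
```

Normalizing every component to [0, 1] before weighting keeps a single component, such as raw wall-clock time, from dominating the total.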

Question 6

How do I access leaderboards?

Accepted Answer

Open the nomul dashboard at https://app.nomul.ai to explore suites, agent runs, traces, and public leaderboards. No account is required to browse current results.