Benchmark fixture

Best AI for Documentation

A benchmark fixture page for evaluating AI tools on source-backed documentation tasks.

Status: Fixture ready; no public ranking yet. No winner will be published until tools have been tested against the source-backed documentation fixtures.

Last tested: Not tested. Rankings stay blocked until the run log includes raw outputs or notes, failures, reviewer notes, and a retest date.


Frozen benchmark fixtures
| Fixture | Task | Expected evidence |
|---|---|---|
| DOCS-001 | Generate docs from a real API contract. | No endpoint, parameter, or response claim is invented. |
| DOCS-002 | Update docs after a behavior change. | Old behavior is removed and examples still run. |
| DOCS-003 | Write a changelog entry from commits. | Claims map to actual commits. |

Scoring rubric (weights sum to 100)

| Weight | Criterion |
|---|---|
| 40 | Source faithfulness |
| 25 | Example correctness |
| 20 | Clarity |
| 15 | Review effort |
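
To make the weights concrete, here is a minimal Python sketch of how a per-fixture total could be computed. It assumes each criterion is scored on a 0.0 to 1.0 scale and that a high review-effort score means reviewers needed little cleanup; both are assumptions, and `weighted_score` is a hypothetical helper, not part of any published harness.

```python
# Weights copied from the rubric above; they sum to 100.
RUBRIC_WEIGHTS = {
    "source_faithfulness": 40,
    "example_correctness": 25,
    "clarity": 20,
    "review_effort": 15,  # assumption: 1.0 means reviewers needed little cleanup
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores in [0.0, 1.0] into a 0-100 total."""
    missing = RUBRIC_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total = 0.0
    for criterion, weight in RUBRIC_WEIGHTS.items():
        value = scores[criterion]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{criterion} score out of range: {value}")
        total += weight * value
    return total


# Illustrative scores only; these are not benchmark results.
example = {
    "source_faithfulness": 0.5,
    "example_correctness": 1.0,
    "clarity": 0.75,
    "review_effort": 0.0,
}
print(weighted_score(example))  # prints 60.0
```

With these weights, a tool that scores zero on source faithfulness can reach at most 60 points, so clarity and low review effort cannot make up for invented claims.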

Documentation assistants are useful only when they describe actual behavior, not intended behavior.
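
One mechanical way to check the DOCS-001 expectation that no endpoint is invented is to compare endpoint mentions in generated docs against the API contract. The Python sketch below assumes the contract can be reduced to a set of `(method, path)` pairs and that docs mention endpoints as `METHOD /path`; `find_invented_endpoints`, the regex, and the sample data are hypothetical illustrations.

```python
import re

# Matches "METHOD /path" mentions such as "GET /users/{id}".
ENDPOINT_RE = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(/[\w{}./-]*)")


def find_invented_endpoints(
    doc_text: str, contract: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return (method, path) pairs mentioned in doc_text but absent from contract."""
    mentioned = {
        # Drop a sentence-ending period that the path pattern also matches.
        (method, path.rstrip("."))
        for method, path in ENDPOINT_RE.findall(doc_text)
    }
    return sorted(mentioned - contract)


# Hypothetical contract and generated text for illustration.
contract = {("GET", "/users/{id}"), ("POST", "/users")}
generated = "Use GET /users/{id} to fetch a user. Call DELETE /users/{id} to remove one."
print(find_invented_endpoints(generated, contract))  # prints [('DELETE', '/users/{id}')]
```

This only covers endpoint mentions in a fixed textual form; parameter and response claims would need similar checks against the contract's schemas, and endpoints described in free prose would still need a reviewer.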

Run log requirements

This page can move from rubric ready to tested only after source packets, generated docs, reviewer notes, example-validation results, failure examples, and a retest date are published.
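
The gating rule above can be expressed as a simple completeness check over the run log. This Python sketch assumes the run log is a dictionary whose field names mirror the listed requirements; those field names, the placeholder file names, and the `page_status` helper are hypothetical, since the actual run log format is not specified on this page.

```python
from datetime import date

# Assumed run-log field names mirroring the requirements listed above.
REQUIRED_ARTIFACTS = (
    "source_packets",
    "generated_docs",
    "reviewer_notes",
    "example_validation_results",
    "failure_examples",
    "retest_date",
)


def missing_artifacts(run_log: dict) -> list[str]:
    """List required artifacts that are absent or empty in the run log."""
    return [name for name in REQUIRED_ARTIFACTS if not run_log.get(name)]


def page_status(run_log: dict) -> str:
    """Keep the page at 'rubric ready' until every required artifact is present."""
    return "rubric ready" if missing_artifacts(run_log) else "tested"


# Hypothetical partial run log: placeholder file names, no real results.
partial_log = {
    "source_packets": ["docs-001-contract.json"],
    "generated_docs": ["docs-001-output.md"],
    "reviewer_notes": "",
    "retest_date": date(2030, 1, 1),
}
print(missing_artifacts(partial_log))
# prints ['reviewer_notes', 'example_validation_results', 'failure_examples']
print(page_status(partial_log))  # prints rubric ready
```

Treating empty values as missing means a run log cannot pass the gate with placeholder fields such as an empty reviewer-notes string.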

Recommendation segments

When evidence exists, recommendations should be segmented for API teams, documentation-heavy product teams, release-note workflows, and teams that need strict source faithfulness.