MT Eval Arena

Think you can solve it? Prove it.

An open proving ground for machine translation methods — especially for languages that commercial services will never support.

Status: Pre-release (coming soon)
📐

Standardized Benchmarks

Reproducible evaluation with chrF++, exact match, FST acceptance, semantic scoring, and bootstrap confidence intervals. Every run is fingerprinted.
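The harness's internals aren't published on this page. The following is a minimal sketch of the reproducibility pieces. It assumes that sentence-level scores are averaged over a test set and that the confidence interval is a percentile bootstrap. The code resamples exact-match scores with a fixed seed and fingerprints a run by hashing its configuration. The function names, the config fields, and the choice of SHA-256 over sorted-key JSON are illustrative assumptions, not the arena's actual scheme. chrF++ or a semantic score would plug into `bootstrap_ci` the same way exact match does.

```python
import hashlib
import json
import random
import statistics


def run_fingerprint(config: dict) -> str:
    """Hash a run configuration into a short, stable identifier (illustrative scheme)."""
    # Sorted keys and fixed separators make logically identical configs hash identically.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def exact_match_scores(hyps: list[str], refs: list[str]) -> list[float]:
    """Sentence-level exact match: 1.0 when hypothesis and reference agree after stripping."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must align one-to-one")
    return [1.0 if h.strip() == r.strip() else 0.0 for h, r in zip(hyps, refs)]


def bootstrap_ci(scores: list[float], n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap over sentences."""
    rng = random.Random(seed)  # fixed seed: the same scores always give the same interval
    n = len(scores)
    # Resample sentences with replacement and record each resample's mean score.
    means = sorted(statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    k = round(alpha / 2 * n_resamples)  # number of resamples trimmed from each tail
    return statistics.fmean(scores), means[k], means[n_resamples - 1 - k]


# Usage with toy data (not real benchmark results); "example-method" is hypothetical.
config = {"benchmark": "EDTeKLA Dev Set v1", "method": "example-method", "seed": 0}
scores = exact_match_scores(["a", "b", "c", "x"], ["a", "b", "c", "d"])
mean, lower, upper = bootstrap_ci(scores, seed=config["seed"])
print(run_fingerprint(config), mean, lower, upper)
```

Fixing the resampling seed inside the run configuration means the fingerprint also pins down the exact interval that was reported.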

🏴

Community Sovereignty

Winning methods transfer ownership to the language community, following the OCAP® principles (Ownership, Control, Access, Possession). Communities control their data, their methods, and their revenue.

🔌

Open Plugin Architecture

Bring any method: coached LLM, fine-tuned model, FST-gated pipeline, or custom plugin. If it produces translations, the harness can score it.
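This page doesn't specify how plugins are registered. One plausible shape is a structural interface, sketched here under that assumption. Any object with a `name` and a `translate` method counts as a method, whether it wraps a coached LLM, a fine-tuned model, or an FST-gated pipeline. `TranslationMethod`, `LookupPlugin`, and `score` are hypothetical names, and exact match stands in for the full metric suite.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TranslationMethod(Protocol):
    """Hypothetical plugin contract: anything that maps source sentences to translations."""

    name: str

    def translate(self, sources: list[str], src_lang: str, tgt_lang: str) -> list[str]:
        ...


class LookupPlugin:
    """Toy plugin: returns stored translations and echoes any input it doesn't know."""

    def __init__(self, name: str, table: dict[str, str]) -> None:
        self.name = name
        self.table = table

    def translate(self, sources: list[str], src_lang: str, tgt_lang: str) -> list[str]:
        return [self.table.get(s, s) for s in sources]


def score(method: TranslationMethod, sources: list[str], refs: list[str],
          src_lang: str, tgt_lang: str) -> float:
    """Run any conforming plugin and report exact-match accuracy."""
    # isinstance on a runtime_checkable Protocol only checks that the members exist.
    if not isinstance(method, TranslationMethod):
        raise TypeError(f"{method!r} does not implement TranslationMethod")
    hyps = method.translate(sources, src_lang, tgt_lang)
    if len(hyps) != len(refs):
        raise ValueError("plugin returned the wrong number of translations")
    matches = sum(h.strip() == r.strip() for h, r in zip(hyps, refs))
    return matches / len(refs)


# Usage: a toy English-to-Spanish lookup table, scored like any other method.
plugin = LookupPlugin("toy-lookup", {"hello": "hola", "thank you": "gracias"})
accuracy = score(plugin, ["hello", "thank you", "goodbye"],
                 ["hola", "gracias", "adiós"], "en", "es")
print(f"{plugin.name}: {accuracy:.3f}")  # prints toy-lookup: 0.667
```

A `Protocol` is used here instead of a base class so that existing code can be scored without inheriting from anything. This matches the idea that anything producing translations can be scored.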

🚀

Deployment Bridge

Proven methods deploy to production via i18n-rosetta. Developers consume them through an API, and revenue flows back to the community.

Current Benchmarks

EDTeKLA Dev Set v1

  • Language: English → Plains Cree (Standard Roman Orthography, SRO)
  • Entries: 124 curated pairs
  • License: CC BY-NC-SA 4.0
  • Source: University of Alberta

FLORES+ Devtest

  • Languages: English → 39 languages
  • Entries: 1,012 sentences per language
  • License: CC BY-SA 4.0
  • Source: Open Language Data Initiative (OLDI) / Hugging Face