Side-by-side eval runs
Compare GPT-4o vs. GPT-4.1 or prompt v1 vs. v2 in one place, with outputs lined up instead of buried in logs.
OpenEvals is the first release: an open-source studio for prompt evals, model comparisons, and regression tracking. The goal is simple: make eval workflows feel as approachable and polished as the products they protect.
Pin a baseline run, rerun the suite later, and see whether the prompt actually improved or drifted backward.
Suites live as YAML, can be forked cleanly, and plug into CI so the repo works for both solo builders and teams.
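As a rough sketch of what a suite file could look like (the field names below — `name`, `models`, `cases`, `baseline` — are illustrative assumptions, not the actual OpenEvals schema; check the repo for the real format):

```yaml
# Hypothetical suite file: key names are assumptions for illustration
name: summarization-regression
models:
  - gpt-4o
  - gpt-4.1
cases:
  - id: short-article
    prompt: "Summarize the following article in two sentences: {{input}}"
    input_file: fixtures/short_article.txt
    checks:
      - type: max_length
        words: 60
baseline: runs/2024-06-01   # pinned run to diff new results against
```

Keeping the suite in the repo like this is what makes forking and CI integration natural: a fork edits the YAML, and CI reruns it against the pinned baseline.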
Project Tabs
One project is live now. As new repositories launch, they can drop into this strip without another redesign.
Open-source eval studio for prompt comparisons, regression tracking, and GitHub-native LLM evals. Built to make prompt testing visual, shareable, and approachable.
Why It Belongs Here
A full open-source product loop: UX, backend orchestration, CI integration, and GitHub community setup.
More project tabs plug in here automatically.

Preview linked to GitHub
Release
v0.1.0 public
Workflow
UI + CLI + CI
Community
Discussions live
Public repo
OpenEvals is already live on GitHub with release notes, discussions, and a contribution surface.
Self-hostable
The stack is React + FastAPI + PostgreSQL + Redis so people can run it locally or deploy it without guesswork.
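A minimal self-hosting sketch for that stack might look like the Compose file below. This is an assumption based only on the stated components (React + FastAPI + PostgreSQL + Redis); the service names, build paths, and environment variables are hypothetical, not taken from the OpenEvals repo:

```yaml
# Hypothetical docker-compose sketch; paths and credentials are placeholders
services:
  web:                     # React frontend
    build: ./frontend
    ports: ["3000:3000"]
  api:                     # FastAPI backend
    build: ./backend
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql://openevals:openevals@db:5432/openevals
      REDIS_URL: redis://redis:6379/0
    depends_on: [db, redis]
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: openevals
      POSTGRES_PASSWORD: openevals
      POSTGRES_DB: openevals
  redis:
    image: redis:7
```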
Built to be shared
Example suites, GitHub Actions support, and clean screenshots make it easier for people to post, fork, and benchmark publicly.
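A CI hookup could be as small as the workflow below. The GitHub Actions syntax is standard, but the `openevals` package name and CLI flags are assumptions for illustration, not the documented interface:

```yaml
# Hypothetical workflow; the openevals CLI invocation is assumed, not documented
name: evals
on: [pull_request]
jobs:
  run-suite:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install openevals                # package name assumed
      - run: openevals run suites/example.yaml --baseline main   # flags assumed
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Running the suite on every pull request is what turns the pinned-baseline comparison into automatic regression tracking.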
OpenEvals is the first project here because it already has a real release surface, a clear use case, and a reason for strangers to share it. More open-source systems and ML projects will land here over time.
A repo should explain itself quickly. People should understand what it does, why it matters, and how to try it within minutes.
Good OSS should feel intentional, not like a dumped side project. Presentation is part of adoption.
Projects here should invite comparison screenshots, forks, issues, and benchmarks instead of quietly sitting in a portfolio grid.