The only dev-evaluation engine built on a benchmark corpus

Code is what you ship.
Judgment is how you got there.

Two audits per repo: code quality and work patterns. Evidence-backed, methodology in the open, defensible for both the developer and the people hiring.

Sample profile · verified 2026-07-05

Alex Chen

Senior Backend Engineer · 8 years

Python · Rust · Distributed Systems

Audit 1 · Code Quality

82/100

TIER 3 · DEEP

What was built: features, architecture, intent.

Reach

47/100

Stars + forks + dependents.
Never collapsed into the V4 score.

Audit 2 · Work Patterns

6 of 11 signals shown

decision_quality

100

iteration_discipline

rationale_habit

100

self_correction

commit_cadence_health

feature_completion_rate

Sample decision · ✓ well reasoned · thread × 6 · 12d

“Replaced internal frozenlist subclass with the external aiosignal library to reduce maintenance surface.”

Source: commit_message_body

▸ src/auth/cookie.py, line 142
▸ tests/test_cookie.py, line 88

7 repos audited · 23 open-source PRs merged · ✓ Verified by Oddit


How we audit

Two audits per repo. No black box, no magic.

Code quality (what you built) runs alongside work patterns (how you ship).

Code Quality (V4) + Work Patterns (V5) = Cross-Signals (V4 × V5)

Stage 01 · Parse

Build an abstract syntax tree for every source file in the repo.

We use cAST chunking to walk every Python, TypeScript, Go, and Rust file, so we know what's a function, what's a route, and what's a config file before any AI runs. Static analysis identifies entry points, package boundaries, and ownership.
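Stage 01 is easiest to picture in a single language. Below is a minimal sketch using Python's standard-library `ast` module, not Oddit's cAST pipeline: it parses one file and reports its functions, its route handlers, and whether it is an entry point. The route rule (an `@app.get(...)`-style decorator), the `ROUTE_DECORATORS` set, and the `classify_python_source` name are illustrative assumptions; a multi-language tool would more likely rely on a parser such as tree-sitter.

```python
import ast

# Assumption: decorator attribute names treated as route registrations
# (Flask/FastAPI style, e.g. @app.get("/users")). Illustrative only.
ROUTE_DECORATORS = {"route", "get", "post", "put", "delete", "patch"}


def _is_main_guard(node: ast.stmt) -> bool:
    """True for a top-level `if __name__ == "__main__":` block."""
    if not isinstance(node, ast.If):
        return False
    test = node.test
    return (
        isinstance(test, ast.Compare)
        and isinstance(test.left, ast.Name)
        and test.left.id == "__name__"
        and len(test.comparators) == 1
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value == "__main__"
    )


def classify_python_source(source: str) -> dict:
    """Parse one Python file and list its functions, routes, and entry point."""
    tree = ast.parse(source)
    functions, routes = [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
            for dec in node.decorator_list:
                # A route decorator here is a call such as app.get("/path").
                if (
                    isinstance(dec, ast.Call)
                    and isinstance(dec.func, ast.Attribute)
                    and dec.func.attr in ROUTE_DECORATORS
                ):
                    routes.append(node.name)
    # Only a module-level __main__ guard counts as an entry point.
    has_entry_point = any(_is_main_guard(stmt) for stmt in tree.body)
    return {"functions": functions, "routes": routes, "entry_point": has_entry_point}


sample = '''
@app.get("/health")
async def health():
    return {"ok": True}

def helper():
    pass

if __name__ == "__main__":
    helper()
'''
print(classify_python_source(sample))
# {'functions': ['health', 'helper'], 'routes': ['health'], 'entry_point': True}
```

The point of doing this first is that every later stage can ask structural questions ("is this a route?") instead of guessing from raw text.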

MOAT · WHY_ODDIT

SEEN

We read every line, not just star counts.

Your work gets the attention a senior reviewer would give it.

HONEST

Caps prevent UI work from being sold as systems work.

A polished frontend can't masquerade as deep tech. The score reflects what you actually built.
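A cap can be as simple as clamping a claimed tier to a ceiling set by the kind of repo it appears in. This sketch is hypothetical throughout: the `Tier` ordering reuses the three feature tiers from the 4-Bucket Engine, while the `"ui"` and `"systems"` categories and their ceilings are invented for illustration, since the real cap rules aren't published on this page.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Feature tiers from the 4-Bucket Engine, ordered shallow to deep."""
    INTEGRATION = 1
    ENGINEERING = 2
    INVENTION = 3


# Hypothetical ceilings per repo category; not Oddit's published rules.
TIER_CAPS = {
    "ui": Tier.ENGINEERING,     # a polished frontend tops out at Engineering
    "systems": Tier.INVENTION,  # systems work can reach the top tier
}


def capped_tier(claimed: Tier, category: str) -> Tier:
    """Clamp a claimed tier to the ceiling for the repo's category."""
    ceiling = TIER_CAPS.get(category, Tier.INVENTION)  # unknown category: no cap
    return min(claimed, ceiling)


print(capped_tier(Tier.INVENTION, "ui").name)  # ENGINEERING
```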

WHOLE

You're more than one number.

Code quality, work patterns, and reach are scored as separate signals, never collapsed into a single misleading number.

CALIBRATED

Built on a benchmark, not on vibes.

Scoring methodology grounded in a 58-repo benchmark corpus, designed to support the validity evidence that AI hiring regulation increasingly requires.

Audit · 1 of 2 · Code Quality (V4)

The 4-Bucket Engine

Audit one: what you built, how you structured it, and whether the commit history is real. Audit two, work patterns (V5), runs in parallel and surfaces in your profile.

Bucket A

Features

What you built: custom implementations, algorithms, and API integrations. Each feature is verified against evidence in your source code and classified into one of three complexity tiers.

Invention (20 pts): novel architecture, primitives
Engineering (6 pts): custom decision logic
Integration (1 pt): wiring standard tools
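Bucket A's arithmetic can be sketched as a sum of tier points over evidence-verified features. The 20/6/1 points come from the tier list above; the 25-point ceiling is an assumption borrowed from Bucket C's normalization to 25, and `Feature`, `bucket_a_score`, and the feature names in the usage are hypothetical.

```python
from dataclasses import dataclass

# Points per complexity tier, as listed for Bucket A.
TIER_POINTS = {"invention": 20, "engineering": 6, "integration": 1}

# Assumption: Bucket A tops out at 25 like Bucket C; the real ceiling isn't stated.
BUCKET_A_CAP = 25


@dataclass
class Feature:
    name: str
    tier: str       # "invention" | "engineering" | "integration"
    verified: bool  # True if the feature passed the evidence check


def bucket_a_score(features: list[Feature]) -> int:
    """Sum tier points for verified features only, then apply the cap."""
    raw = sum(TIER_POINTS[f.tier] for f in features if f.verified)
    return min(raw, BUCKET_A_CAP)


features = [
    Feature("custom consensus protocol", "invention", verified=True),
    Feature("retry policy", "engineering", verified=True),
    Feature("ORM wiring", "integration", verified=False),  # dropped: no evidence
]
print(bucket_a_score(features))  # 25 (raw 26, capped)
```

The steep 20/6/1 spread means one genuine invention outweighs a long list of integrations, which is the intent of tiering.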

Bucket B

Architecture

Design patterns, separation of concerns, reusable abstractions. Diminishing returns prevent gaming.
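One way to get diminishing returns is a geometric series: each additional pattern earns `decay` times the credit of the one before, so the bucket approaches but never passes `base / (1 - decay)`. The page doesn't say which curve Bucket B uses; the function name and the defaults `base=5.0` and `decay=0.5` are assumptions for illustration.

```python
def bucket_b_score(pattern_count: int, base: float = 5.0, decay: float = 0.5) -> float:
    """Credit patterns as base, base*decay, base*decay**2, ... (geometric decay)."""
    if pattern_count < 0:
        raise ValueError("pattern_count must be non-negative")
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1) for returns to diminish")
    return sum(base * decay**k for k in range(pattern_count))


print(bucket_b_score(1))  # 5.0
print(bucket_b_score(3))  # 8.75
```

With the defaults no pattern count can push the score past 10, so stacking extra abstractions stops paying off quickly.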

Bucket C

Intent

Error handling, config management, test coverage, edge cases. Six quality signals normalized to 25.
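Normalizing six signals to 25 can be sketched as averaging per-signal scores in [0, 1] and scaling by 25. Equal weights, the 0-to-1 range, and the `bucket_c_score` name are assumptions; only four of the six signals are named above, so the other two appear as placeholders in the usage.

```python
def bucket_c_score(signals: dict[str, float]) -> float:
    """Average six signal scores in [0, 1] and scale to a 0-25 bucket score.

    Assumes equal weights; the real weighting is not documented on this page.
    """
    if len(signals) != 6:
        raise ValueError(f"expected 6 signals, got {len(signals)}")
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 25.0 * sum(signals.values()) / 6


example = {
    "error_handling": 0.8,
    "config_management": 1.0,
    "test_coverage": 0.5,
    "edge_cases": 0.7,
    "signal_5": 0.6,  # placeholder: not named on this page
    "signal_6": 0.4,  # placeholder: not named on this page
}
print(round(bucket_c_score(example), 2))
```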

Bucket D

Forensics

Commit sessions, fix ratio, message quality, evolution patterns. Detects bulk imports and fake history.

Session analysis · Time-spread check · Commit authenticity · Evolution mix
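Two of the forensic checks above, session analysis and time spread, reduce to timestamp arithmetic, and a fix ratio can start as a keyword count. The two-hour session gap, the fix-keyword heuristic, and all function names are assumptions for illustration; commit authenticity and evolution mix need more context than a sketch this size can show.

```python
from datetime import datetime, timedelta

# Assumption: commits more than 2 hours apart start a new work session.
SESSION_GAP = timedelta(hours=2)


def split_sessions(timestamps: list[datetime]) -> list[list[datetime]]:
    """Group commit timestamps into sessions separated by long gaps."""
    sessions: list[list[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP:
            sessions[-1].append(ts)  # close to the previous commit: same session
        else:
            sessions.append([ts])    # long gap (or first commit): new session
    return sessions


def fix_ratio(messages: list[str]) -> float:
    """Share of commit messages that look like fixes (naive prefix heuristic)."""
    if not messages:
        return 0.0
    fixes = sum(1 for m in messages if m.lower().startswith(("fix", "bugfix", "hotfix")))
    return fixes / len(messages)


def time_spread_days(timestamps: list[datetime]) -> float:
    """Days between first and last commit; near zero hints at a bulk import."""
    if len(timestamps) < 2:
        return 0.0
    return (max(timestamps) - min(timestamps)).total_seconds() / 86400
```

A repo whose entire history lands in one session with a time spread near zero is the kind of pattern a bulk-import check would flag for a closer look.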

Protected by 10 anti-gaming layers including evidence gates, authorship verification, and time-spread analysis.

Methodology

Built for the validity bar AI hiring needs.

Most engineering-evaluation tools generate a score and ask you to trust it. We document the methodology, calibrate it against a benchmark, and refuse to score when the evidence isn't there.

BENCHMARK

A 58-repo corpus, methodology in the open.

Every scoring decision is calibrated against a benchmark corpus spanning languages, repo sizes, and engineering disciplines, not a black-box LLM call on every audit.
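The page doesn't spell out how calibration works. One common approach is to report where a raw score falls among scores from a reference corpus; the sketch below computes a percentile rank with ties counted as half. The function name and the benchmark values in the usage are hypothetical, not data from the 58-repo corpus.

```python
from bisect import bisect_left, bisect_right


def benchmark_percentile(raw_score: float, benchmark_scores: list[float]) -> float:
    """Percentile rank of raw_score within a benchmark; ties count as half."""
    if not benchmark_scores:
        raise ValueError("benchmark corpus is empty")
    ordered = sorted(benchmark_scores)
    below = bisect_left(ordered, raw_score)               # strictly lower scores
    equal = bisect_right(ordered, raw_score) - below      # exact ties
    return 100.0 * (below + 0.5 * equal) / len(ordered)


print(benchmark_percentile(30, [10, 20, 30, 40]))  # 62.5
```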

EVIDENCE

Every claim cites file and line.

Scores trace back to specific code locations. No hallucinated features, no inflated tiers: claims with missing evidence are dropped before they hit the score.
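An evidence gate can start with a mechanical check: drop any claim that cites no location, cites a file that isn't in the repo, or cites a line past the end of the file. This is a hypothetical sketch (`Claim`, `evidence_gate`, and the feature names are illustrative); a real gate would also need to confirm that the cited code actually supports the claim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    feature: str
    file: Optional[str]  # cited source file, if any
    line: Optional[int]  # cited 1-based line number, if any


def evidence_gate(claims: list[Claim], repo: dict[str, str]) -> list[Claim]:
    """Keep only claims whose cited file exists and whose line is in range."""
    kept = []
    for c in claims:
        if c.file is None or c.line is None:
            continue  # no citation at all
        source = repo.get(c.file)
        if source is None:
            continue  # cites a file that isn't in the repo
        if not 1 <= c.line <= len(source.splitlines()):
            continue  # cites a line past the end of the file
        kept.append(c)
    return kept


repo = {"src/auth/cookie.py": "pass\n" * 150}  # stand-in file with 150 lines
claims = [
    Claim("signed session cookies", "src/auth/cookie.py", 142),
    Claim("rate limiter", "src/limits.py", 10),  # file missing: dropped
    Claim("retry logic", None, None),            # uncited: dropped
]
print([c.feature for c in evidence_gate(claims, repo)])  # ['signed session cookies']
```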

MULTI-SIGNAL

Code quality AND work patterns, never collapsed.

Two independent pipelines: V4 measures what you built; V5 measures how you ship. Cross-signals join them. One number lies; signal layers tell the truth.

HONEST

When the data is thin, we say so.

Insufficient-data verdicts are first-class. Small repos, terse commits, and missing rationale are surfaced as gaps instead of being papered over with confident-sounding scores.
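Making "not enough data" a first-class result means a signal returns either a score or an explicit verdict saying what's missing. Below is a sketch for the `rationale_habit` signal named in the sample profile; the thresholds, the "fraction of commits with a message body" scoring rule, and the `Score`/`InsufficientData` type names are all assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Score:
    value: int  # 0-100


@dataclass(frozen=True)
class InsufficientData:
    reason: str


Verdict = Union[Score, InsufficientData]

# Hypothetical minimums; the real thresholds are not published here.
MIN_COMMITS = 20
MIN_BODIES = 5


def rationale_habit(commit_bodies: list[str]) -> Verdict:
    """Score how often commits explain themselves, or refuse on thin evidence."""
    if len(commit_bodies) < MIN_COMMITS:
        return InsufficientData(f"only {len(commit_bodies)} commits; need {MIN_COMMITS}")
    with_body = [b for b in commit_bodies if b.strip()]
    if len(with_body) < MIN_BODIES:
        return InsufficientData("commit messages are too terse to assess rationale")
    return Score(round(100 * len(with_body) / len(commit_bodies)))


print(rationale_habit(["fix"] * 3))
# InsufficientData(reason='only 3 commits; need 20')
```

Because callers must handle both types, a thin repo can't silently fall through to a default number.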

Methodology in active development. The validity benchmark grows weekly. Designed to support the evidentiary requirements that modern AI-hiring regulation increasingly expects, independent of any single vendor or model.