The only dev-evaluation engine built on a benchmark corpus

Code is what you ship.
Judgment is how you got there.

Two audits per repo - code quality and work patterns. Evidence-backed, methodology in the open, defensible for both the developer and the people hiring.

Sample profile · verified 2026-07-05 · 1 of 3

Alex Chen

Senior Backend Engineer · 8 years
Python · Rust · Distributed Systems
Audit 1 · Code Quality
82 / 100
TIER 3 · DEEP
What was built: features, architecture, intent.
Reach
47 / 100
Stars + forks + dependents.
Never collapsed with V4.
Audit 2 · Work Patterns
6 of 11 signals shown
decision_quality · 100
iteration_discipline · 88
rationale_habit · 100
self_correction · 75
commit_cadence_health · 92
feature_completion_rate · 85
Sample decision · well reasoned · thread × 6 · 12d

Replaced internal frozenlist subclass with the external aiosignal library to reduce maintenance surface.

Source: commit_message_body
  • src/auth/cookie.py · line 142
  • tests/test_cookie.py · line 88
7 repos audited · 23 open-source PRs merged · ✓ Verified by Oddit
How we audit

Two audits per repo. No black box, no magic.

Code quality (what you built) runs alongside work patterns (how you ship).

Code Quality · V4 + Work Patterns · V5 = Cross-Signals · V4×V5
Stage 01 · Parse
Build an abstract syntax tree of every file in the repo.
We use cAST chunking to walk every Python, TypeScript, Go, and Rust file. We know what's a function, what's a route, what's a config file - before any AI runs. Static analysis identifies entry points, package boundaries, and ownership.
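As a rough illustration of what the parse stage recovers, the sketch below uses Python's standard `ast` module to list the functions and classes defined in a source string. It is not Oddit's cAST chunker: `list_definitions` and the sample source are hypothetical, and the sketch covers Python only.

```python
import ast

def list_definitions(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line) for each function or class in the source."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            found.append(("class", node.name, node.lineno))
    # ast.walk is breadth-first, so sort by line number for a stable listing.
    return sorted(found, key=lambda item: item[2])

# Hypothetical sample file contents.
SAMPLE = '''
class CookieJar:
    def get(self, name):
        return None

async def refresh_session():
    pass
'''

definitions = list_definitions(SAMPLE)
for kind, name, line in definitions:
    print(f"{kind:8} {name} (line {line})")
```

The point is that structure such as "this is a class, this is a method" is available from the syntax tree alone, before any model is involved.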
MOAT·WHY_ODDIT
SEEN
We read every line, not just star counts.
Your work gets the attention a senior reviewer would give it.
HONEST
Caps prevent UI work being sold as systems work.
A polished frontend can't masquerade as deep tech. The score reflects what you actually built.
WHOLE
You're more than one number.
Code quality, work patterns, and reach scored as separate signals - never collapsed into a single misleading number.
CALIBRATED
Built on a benchmark, not on vibes.
Scoring methodology grounded in a 58-repo benchmark corpus - designed to support the validity evidence AI hiring regulation increasingly requires.

Audit · 1 of 2 · Code Quality (V4)

The 4-Bucket Engine

Audit one: what you built, how you structured it, and whether the commit history is real. Audit two — work patterns (V5) — runs in parallel and surfaces in your profile.

Bucket A

Features

What you built — custom implementations, algorithms, API integrations. Each feature is evidence-verified against your source code and classified into three complexity tiers.

20 pts · Invention: novel architecture, primitives
6 pts · Engineering: custom decision logic
1 pt · Integration: wiring standard tools
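To make the tier weights concrete, here is a minimal sketch that tallies raw points for a list of classified features. Only the 20 / 6 / 1 weights come from the text above. The `raw_feature_points` helper, the tier keys, and the sample list are hypothetical, and any cap or scaling applied to the bucket afterwards is not shown.

```python
# Tier weights as stated on this page.
TIER_POINTS = {"invention": 20, "engineering": 6, "integration": 1}

def raw_feature_points(tiers: list[str]) -> int:
    """Sum the raw points for features already classified into tiers."""
    return sum(TIER_POINTS[tier] for tier in tiers)

# Hypothetical repo: one invention, two engineering features, one integration.
sample = ["invention", "engineering", "engineering", "integration"]
total = raw_feature_points(sample)
print(total)
```

Under these weights a single invention-tier feature outweighs three engineering-tier features, which is the intended effect of the tiering.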

Bucket B

Architecture

Design patterns, separation of concerns, reusable abstractions. Diminishing returns prevent gaming.

Bucket C

Intent

Error handling, config management, test coverage, edge cases. Six quality signals normalized to 25.
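One way to read "six quality signals normalized to 25" is an equal-weight average scaled to a 25-point bucket. The sketch below assumes exactly that: each signal is a value between 0 and 1, and all six count equally. The real weighting is not documented here, and `normalize_to_25` is a hypothetical name.

```python
def normalize_to_25(signals: list[float]) -> float:
    """Scale six signals, each assumed to lie in [0, 1], to a 0-25 score."""
    if len(signals) != 6:
        raise ValueError("expected exactly six signals")
    # Clamp out-of-range inputs so one signal cannot exceed its share.
    clamped = [min(max(s, 0.0), 1.0) for s in signals]
    return round(25 * sum(clamped) / len(clamped), 1)

# Hypothetical signal values for one repo.
score = normalize_to_25([1.0, 0.5, 0.8, 0.6, 0.9, 0.4])
print(score)
```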

Bucket D

Forensics

Commit sessions, fix ratio, message quality, evolution patterns. Detects bulk imports and fake history.

Session analysis · Time-spread check · Commit authenticity · Evolution mix
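Session analysis and fix ratio can be sketched in a few lines. The example below is an assumption-laden illustration, not the production forensics: it treats commits more than two hours apart as separate sessions and counts a commit as a fix when its message starts with "fix". Both rules, the helper names, and the sample commits are hypothetical.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, gap=timedelta(hours=2)):
    """Group commit times into sessions separated by more than `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # close enough: same working session
        else:
            sessions.append([ts])     # long pause: start a new session
    return sessions

def fix_ratio(messages):
    """Share of commits whose message starts with 'fix' (case-insensitive)."""
    if not messages:
        return 0.0
    fixes = sum(1 for m in messages if m.lower().startswith("fix"))
    return fixes / len(messages)

# Hypothetical commit log: (timestamp, message).
commits = [
    (datetime(2024, 1, 1, 9, 0), "Add cookie parser"),
    (datetime(2024, 1, 1, 9, 40), "fix: handle empty cookie header"),
    (datetime(2024, 1, 1, 15, 0), "Add session refresh"),
    (datetime(2024, 1, 3, 10, 0), "Fix typo in README"),
]

sessions = split_sessions([ts for ts, _ in commits])
ratio = fix_ratio([msg for _, msg in commits])
print(len(sessions), ratio)
```

A history imported in bulk tends to collapse into a single session with almost no fix commits, which is the kind of pattern a time-spread check is meant to surface.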

Protected by 10 anti-gaming layers including evidence gates, authorship verification, and time-spread analysis.

Methodology

Built for the validity bar AI hiring needs.

Most engineering-evaluation tools generate a score and ask you to trust it. We document the methodology, calibrate it against a benchmark, and refuse to score when the evidence isn't there.

BENCHMARK
A 58-repo corpus, methodology in the open.
Every scoring decision is calibrated against a benchmark corpus spanning languages, repo sizes, and engineering disciplines — not a black-box LLM call on every audit.
EVIDENCE
Every claim cites file and line.
Scores trace back to specific code locations. No hallucinated features, no inflated tiers — claims with missing evidence are dropped before they hit the score.
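The evidence rule can be pictured as a filter that runs before scoring. This is a minimal sketch, assuming a claim carries an optional file path and line number. The `Claim` type, the `drop_unevidenced` helper, and the claim descriptions are hypothetical; the two file locations are reused from the sample profile above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    description: str
    file: Optional[str] = None
    line: Optional[int] = None

def drop_unevidenced(claims: list[Claim]) -> list[Claim]:
    """Keep only claims that cite both a file and a positive line number."""
    return [
        c for c in claims
        if c.file and c.line is not None and c.line > 0
    ]

claims = [
    Claim("Custom cookie parser", "src/auth/cookie.py", 142),
    Claim("Distributed lock manager"),  # no file or line cited, so dropped
    Claim("Cookie tests", "tests/test_cookie.py", 88),
]

kept = drop_unevidenced(claims)
for claim in kept:
    print(f"{claim.description}: {claim.file}:{claim.line}")
```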
MULTI-SIGNAL
Code quality AND work patterns, never collapsed.
Two independent pipelines: V4 measures what you built; V5 measures how you ship. Cross-signals join them. One number lies; signal layers tell the truth.
HONEST
When the data is thin, we say so.
Insufficient-data verdicts are first-class. Small repos, terse commits, missing rationale — surfaced as gaps instead of papered over with confident-sounding scores.

Methodology in active development. Validity benchmark grows weekly. Designed to support the evidentiary requirements modern AI-hiring regulation increasingly expects — independent of any single vendor or model.

For your next job

A Portfolio Recruiters Actually Trust

Resumes list skills. Oddit proves them. Build a verified portfolio and share it anywhere.

Step 1

Score your repos

Get an AI-verified score that reflects what you actually built — not just stars or commit counts.

Step 2

Share one link

Your public profile at weoddit.com/p/you — a single URL with every scored project and your overall tier.

Step 3

Stand out to recruiters

Hiring managers see evidence-backed proof of your skills instead of self-reported bullet points.

weoddit.com/p/yourname
yourname · ADVANCED
Fullstack Developer

my-saas-app · 74
ml-pipeline · 82
cli-tool · 61

Shareable. Verified. Always up to date.

Find Your Next Contribution

Semantic search across 400,000+ open-source issues

Build your verified portfolio.

Score your first repo — free, no signup, 60 seconds.