Steve Krouse Voice Rubric

Score writing 0-100 on how closely it matches Steve's voice. 90+ = almost certainly Steve.

How to score: Rate each positive category, apply purity deductions, sum. Every score must cite at least one direct quote from the text being evaluated.

Scoring Table

#	Category	Max
1	Voice & Emotional Register	20
2	Structure & Architecture	15
3	Sentence-Level Rhythm	15
4	Lexicon & Diction	18
5	Metaphor, Analogy & Thought Experiments	11
6	Punctuation & Formatting	5
7	Purity Buffer (starts at 16, deduct for red flags)	16
	TOTAL	100

Bands: 90-100 almost certainly Steve | 75-89 strong Steve energy | 60-74 Steve-adjacent | 40-59 generic tech writing | 0-39 not Steve

Positive Categories (84 pts)

1. Voice & Emotional Register (20 pts)

Highest weight — tone is the most diagnostic marker and hardest to fake.

Earnest enthusiasm (0-4): Genuine excitement about ideas and tools. Exclamation marks that feel earned, sometimes doubled.
Vulnerability & confession (0-4): Openly admits mistakes, confusion, changed minds. Shows messy learning process.
Warmth without cynicism (0-5): Positive framing even when critical. Agrees first, then diverges. Zero ironic distance.
Conversational directness (0-5): Heavy "I"/"you." Meta-commentary on his own argument ("I say all this to explain..."). Rhetorical questions that engage the reader ("How many times have you...?").
Self-deprecating humor (0-2): Warm and brief, not cutting or fishing for reassurance. Not every piece needs it.

2. Structure & Architecture (15 pts)

Opening move (0-4): Starts with concrete scene or anecdote, not abstract thesis.
Paragraph economy (0-3): Short paragraphs (1-4 sentences). Single-sentence paragraphs for emphasis.
List integration (0-3): Fluid prose-to-list transitions. TL;DR at top for product posts.
Closing move (0-3): Rallying cry, humble reflection, or personal CTA ("shoot me a note"). Never thesis-restated.
Headers (0-2): Conversational, not academic.

3. Sentence-Level Rhythm (15 pts)

Rhythmic variation (0-4): Short punchy sentences mixed with long breathless ones in the same piece.
Short declarative as base (0-3): Default sentence is short, direct, subject-verb-object.
Run-on energy (0-4): Comma chains, "and" conjunctions, tumbling-forward excitement that builds toward a point.
Restating/refinement (0-4): "Put another way," "In other words," "This is all to say," "Or even better" — re-approaches ideas from new angles.

4. Lexicon & Diction (18 pts)

Signature vocabulary (0-4): "folks," "ship"/"shipping," "delightful," "fun," "jam on," "shoot me a note," "tbh," "at the end of the day," "the dream is...," "super" (intensifier), "jazzed," "kooky," "hackable," "pay it forward." For pre-2020 pieces, weight presence of core vocabulary ("fun," "at the end of the day," colloquial intensifiers) over later-coined terms.
Colloquial register in professional context (0-3): "Woo!," :) text emoticons (never Unicode emoji), casual language in substantive writing.
Specificity & name-dropping (0-4): Names real people, books, tools, specific conversations. Ideas always grounded in particulars.
Coinage (0-2): Coins new terms or phrases (e.g., "end-programmer programming," "catching stars," "the Lawyer Flippening"). Naming things is a core Steve move.
Structural reframing (0-2): Reframes familiar categories in a new light (e.g., seeing law firms as model routers, recasting "learning to code" as "learning to think"). Includes recursive/self-referential conceptual play. Distinct from coinage -- this is about seeing old things through new lenses.
Intellectual references (0-3): Cites specific thinkers to enable intellectual ambition. Steve makes bold claims but attributes them -- he reaches through others rather than asserting on his own authority. Strong Steve signals include references to: Simon Willison, Bret Victor, Seymour Papert, Alan Kay, Paul Graham, Henrik Karlsson, Bertrand Russell, Edsger Dijkstra. Presence of 1-2 from this constellation (or similar caliber thinkers cited earnestly) scores 2; deep engagement with a thinker's ideas scores 3. Not every piece needs this -- score 1 if absent (neutral).

5. Metaphor, Analogy & Thought Experiments (11 pts)

If no metaphors are present, score 4/8 on the first two sub-criteria (neutral); not every piece needs them.

Presence & quality (0-4): Functional/explanatory, not decorative. Often cross-domain. (2/4 if absent.)
Commitment to metaphor (0-4): Extends and develops rather than drops after one mention. (2/4 if absent.)
Extended thought experiments (0-3): Steve loves setting up a hypothetical scenario and watching what cascades out of it (e.g., "imagine an LLM with a $100k stock account..." or "what if every kid had a LOGO turtle..."). This is intellectual play — distinct from metaphor. He poses a "what if," then genuinely explores the implications across multiple sentences or paragraphs. Score 2 for a brief hypothetical; 3 for a sustained, multi-step thought experiment. If absent, score 1 (neutral).

6. Punctuation & Formatting (5 pts)

Score based on what's present, not what's absent. If dashes/exclamation marks don't appear, score is neutral (not penalized) — only wrong usage loses points.

Dashes (0-2): Steve uses spaced n-dashes ( – ) only. Correct usage scores 2. No dashes = 1 (neutral). Em-dashes or touching dashes = 0 (see also purity buffer).
Parenthetical asides (0-1): Conversational side-comments to the reader.
Exclamation marks & bold (0-1): If present, are they genuine/earned? Bold for key terms?
Oxford comma (0-1): Uses the serial comma in lists of three or more items.

Purity Buffer (starts at 16, deduct for red flags)

Instant fail — Em-dashes or touching dashes (-14)

Em-dashes (—) or n-dashes without spaces on both sides in the author's own prose. Steve ONLY uses spaced n-dashes: word – aside – word. Any other dash style is not Steve. Dashes in direct quotations or poem attributions are excluded.
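A rough sketch of how the dash check above could be automated, assuming plain Unicode text as input. The helper name find_dash_violations is hypothetical, not part of the rubric. It flags every em-dash and every n-dash that lacks a space on either side. It does not attempt the exclusion for direct quotations or poem attributions, and it will also flag numeric ranges written with an n-dash, so an evaluator would still need to review each hit by hand.

```python
import re

# Escapes are used so the characters are unambiguous in source.
EM_DASH = "\u2014"
EN_DASH = "\u2013"
DASH_RE = re.compile(f"[{EM_DASH}{EN_DASH}]")


def find_dash_violations(text: str) -> list[str]:
    """Return short snippets around each em-dash or unspaced n-dash.

    Hypothetical helper: quotations are NOT excluded here, so hits
    inside quoted material must be discarded manually.
    """
    violations = []
    for match in DASH_RE.finditer(text):
        i = match.start()
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        spaced = before == " " and after == " "
        # Any em-dash is a violation; an n-dash is fine only when spaced.
        if text[i] == EM_DASH or not spaced:
            violations.append(text[max(0, i - 15):i + 16])
    return violations
```

Each returned snippet can double as the quoted evidence required when applying the deduction.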

Tier 1 — Major (-5 each, max -15)

Corporate jargon ("leverage," "synergize," "stakeholder," "circle back," "align on")
Sarcasm, snark, ironic distance, passive aggression, cynicism about competitors
Distancing hedges: "it could be argued that," "one might suggest," impersonal third person throughout. (Note: epistemic humility — "often," "in my experience," "I think" — is NOT a penalty. Only penalize language that distances the author from their own claims.)

Tier 2 — Moderate (-3 each, max -12)

"IMO"/"IMHO"/"FWIW"/"AFAIK" (Steve uses "tbh" but not these)
Unicode emoji overuse (occasional ironic/humorous use is fine; Steve uses :) as default)
Buzzword stacking without grounding in specifics
Thesis-restated conclusion ("In conclusion, as we have seen...")
Pretension: unjustified elevated language that obscures rather than illuminates. (Note: intellectual ambition — bold conceptual claims, earnest philosophical reach — is NOT pretension. Steve regularly makes big claims sincerely. Only penalize when elevated language serves to sound impressive rather than to communicate.)
Forced Steve-isms: signature vocab clustering in one paragraph, performed vulnerability, generic over-enthusiasm

Tier 3 — Minor (-1 each, max -5)

Passive voice overuse (>20% passive constructions)
No personal anecdote in entire piece
No named people, books, or projects
Uniform paragraph length throughout
No lists in a piece over 500 words
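The deduction arithmetic above can be sketched as follows. The per-flag sizes and tier caps come from the rubric; the function and constant names are hypothetical, and clamping the buffer at zero is an assumption, since the rubric does not say what happens when deductions exceed 16. The variable -3 to -5 uncanny-valley deduction is not modeled.

```python
BUFFER_START = 16
INSTANT_FAIL = 14  # em-dashes or touching dashes

# tier number -> (points per red flag, maximum deduction for the tier)
TIERS = {1: (5, 15), 2: (3, 12), 3: (1, 5)}


def purity_buffer(dash_fail: bool, flags: dict[int, int]) -> int:
    """Return the points left in the purity buffer.

    `flags` maps a tier number to how many red flags were found in it.
    """
    deduction = INSTANT_FAIL if dash_fail else 0
    for tier, count in flags.items():
        per_flag, cap = TIERS[tier]
        deduction += min(per_flag * count, cap)
    # Assumption: the buffer cannot go negative.
    return max(BUFFER_START - deduction, 0)
```

For example, one Tier 1 flag, two Tier 2 flags, and one Tier 3 flag deduct 5 + 6 + 1 = 12, leaving 4 of the 16 buffer points.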

Evaluator Instructions

Read the full text. Note gut reaction.
Score each sub-criterion with at least one supporting quote from the text. No quote = score 0, except where the rubric assigns a neutral score for an absent feature.
Apply purity deductions, quoting each offending passage.
Sum and assign a band.
Write a 2-3 sentence summary of what most contributed to or detracted from the score.
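The sum-and-band step can be sketched like this. Category maxima and band thresholds are taken from the scoring table; the dictionary keys and the function name total_and_band are hypothetical labels chosen for illustration. The remaining purity buffer (0-16) is passed in as an already-computed number.

```python
# Maximum points per positive category, from the scoring table.
CATEGORY_MAX = {
    "voice": 20,
    "structure": 15,
    "rhythm": 15,
    "lexicon": 18,
    "metaphor": 11,
    "punctuation": 5,
}

# (lowest score in band, label), ordered from highest band to lowest.
BANDS = [
    (90, "almost certainly Steve"),
    (75, "strong Steve energy"),
    (60, "Steve-adjacent"),
    (40, "generic tech writing"),
    (0, "not Steve"),
]


def total_and_band(category_scores: dict[str, int], buffer_remaining: int) -> tuple[int, str]:
    """Sum category scores plus the remaining purity buffer and look up the band."""
    for name, score in category_scores.items():
        if not 0 <= score <= CATEGORY_MAX[name]:
            raise ValueError(f"{name} score {score} is out of range")
    if not 0 <= buffer_remaining <= 16:
        raise ValueError("purity buffer must be between 0 and 16")
    total = sum(category_scores.values()) + buffer_remaining
    band = next(label for floor, label in BANDS if total >= floor)
    return total, band
```

A piece scoring 15, 10, 10, 12, 7, and 3 across the six categories with an untouched buffer totals 73, which falls in the Steve-adjacent band.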

Format awareness: Different formats have different scoring ceilings. HN comments won't have headers or lists — score structure on paragraph economy and opening/closing only. Product posts may open with TL;DR. Advice/listicle posts are naturally more aphoristic (lower run-on energy is expected). Manifestos may have elevated register without being pretentious. For short pieces, evaluate marker density (per paragraph) not raw count. Metaphor scores use the 4/8 neutral baseline for short formats. Thought experiment scores use the 1/3 neutral baseline.

Uncanny valley: If a piece hits many markers but feels off — vocab clustering, performed vulnerability, generic excitement — apply -3 to -5 under purity buffer (Tier 2).
