stevekrouse

steve-eval

How to write like Steve Krouse
Viewing readonly version of main branch: v13
EVAL.md

Steve Krouse Voice Rubric

Score writing 0-100 on how closely it matches Steve's voice. 90+ = almost certainly Steve.

How to score: Rate each positive category, apply purity deductions, sum. Every score must cite at least one direct quote from the text being evaluated.


Scoring Table

| # | Category | Max |
|---|----------|-----|
| 1 | Voice & Emotional Register | 20 |
| 2 | Structure & Architecture | 15 |
| 3 | Sentence-Level Rhythm | 15 |
| 4 | Lexicon & Diction | 18 |
| 5 | Metaphor, Analogy & Thought Experiments | 11 |
| 6 | Punctuation & Formatting | 5 |
| 7 | Purity Buffer (starts at 16, deduct for red flags) | 16 |
|   | TOTAL | 100 |

Bands: 90-100 almost certainly Steve | 75-89 strong Steve energy | 60-74 Steve-adjacent | 40-59 generic tech writing | 0-39 not Steve
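
The table and bands above can be sketched as a small lookup. This is a minimal illustration, not part of the rubric: the names `CATEGORY_MAX`, `BANDS`, and `band_for` are hypothetical, and only the numbers and labels come from the text above.

```python
# Category maxima copied from the Scoring Table above.
CATEGORY_MAX = {
    "Voice & Emotional Register": 20,
    "Structure & Architecture": 15,
    "Sentence-Level Rhythm": 15,
    "Lexicon & Diction": 18,
    "Metaphor, Analogy & Thought Experiments": 11,
    "Punctuation & Formatting": 5,
    "Purity Buffer": 16,
}

# (lower bound, label) pairs from the Bands line, highest band first.
BANDS = [
    (90, "almost certainly Steve"),
    (75, "strong Steve energy"),
    (60, "Steve-adjacent"),
    (40, "generic tech writing"),
    (0, "not Steve"),
]


def band_for(total: int) -> str:
    """Return the band label for a total score in 0-100."""
    if not 0 <= total <= 100:
        raise ValueError(f"total must be in 0-100, got {total}")
    for lower, label in BANDS:
        if total >= lower:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


# Sanity check: the seven maxima add up to the stated total of 100.
assert sum(CATEGORY_MAX.values()) == 100

print(band_for(82))  # strong Steve energy
```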


Positive Categories (84 pts)

1. Voice & Emotional Register (20 pts)

Highest weight — tone is the most diagnostic marker and hardest to fake.

  • Earnest enthusiasm (0-4): Genuine excitement about ideas and tools. Exclamation marks that feel earned, sometimes doubled.
  • Vulnerability & confession (0-4): Openly admits mistakes, confusion, changed minds. Shows messy learning process.
  • Warmth without cynicism (0-5): Positive framing even when critical. Agrees first, then diverges. Zero ironic distance.
  • Conversational directness (0-5): Heavy "I"/"you." Meta-commentary on his own argument ("I say all this to explain..."). Rhetorical questions that engage the reader ("How many times have you...?").
  • Self-deprecating humor (0-2): Warm and brief, not cutting or fishing for reassurance. Not every piece needs it.

2. Structure & Architecture (15 pts)

  • Opening move (0-4): Starts with concrete scene or anecdote, not abstract thesis.
  • Paragraph economy (0-3): Short paragraphs (1-4 sentences). Single-sentence paragraphs for emphasis.
  • List integration (0-3): Fluid prose-to-list transitions. TL;DR at top for product posts.
  • Closing move (0-3): Rallying cry, humble reflection, or personal CTA ("shoot me a note"). Never thesis-restated.
  • Headers (0-2): Conversational, not academic.

3. Sentence-Level Rhythm (15 pts)

  • Rhythmic variation (0-4): Short punchy sentences mixed with long breathless ones in the same piece.
  • Short declarative as base (0-3): Default sentence is short, direct, subject-verb-object.
  • Run-on energy (0-4): Comma chains, "and" conjunctions, tumbling-forward excitement that builds toward a point.
  • Restating/refinement (0-4): "Put another way," "In other words," "This is all to say," "Or even better" — re-approaches ideas from new angles.

4. Lexicon & Diction (18 pts)

  • Signature vocabulary (0-4): "folks," "ship"/"shipping," "delightful," "fun," "jam on," "shoot me a note," "tbh," "at the end of the day," "the dream is...," "super" (intensifier), "jazzed," "kooky," "hackable," "pay it forward." For pre-2020 pieces, weight presence of core vocabulary ("fun," "at the end of the day," colloquial intensifiers) over later-coined terms.
  • Colloquial register in professional context (0-3): "Woo!," :) text emoticons (never Unicode emoji), casual language in substantive writing.
  • Specificity & name-dropping (0-4): Names real people, books, tools, specific conversations. Ideas always grounded in particulars.
  • Coinage (0-2): Coins new terms or phrases (e.g., "end-programmer programming," "catching stars," "the Lawyer Flippening"). Naming things is a core Steve move.
  • Structural reframing (0-2): Reframes familiar categories in a new light (e.g., seeing law firms as model routers, recasting "learning to code" as "learning to think"). Includes recursive/self-referential conceptual play. Distinct from coinage -- this is about seeing old things through new lenses.
  • Intellectual references (0-3): Cites specific thinkers to enable intellectual ambition. Steve makes bold claims but attributes them -- he reaches through others rather than asserting on his own authority. Strong Steve signals include references to: Simon Willison, Bret Victor, Seymour Papert, Alan Kay, Paul Graham, Henrik Karlsson, Bertrand Russell, Edsger Dijkstra. Presence of 1-2 from this constellation (or similar-caliber thinkers cited earnestly) scores 2; deep engagement with a thinker's ideas scores 3. Not every piece needs this -- score 1 if absent (neutral).

5. Metaphor, Analogy & Thought Experiments (11 pts)

If no metaphors are present, score 4/8 on the first two sub-criteria (neutral); not every piece needs them.

  • Presence & quality (0-4): Functional/explanatory, not decorative. Often cross-domain. (2/4 if absent.)
  • Commitment to metaphor (0-4): Extends and develops rather than drops after one mention. (2/4 if absent.)
  • Extended thought experiments (0-3): Steve loves setting up a hypothetical scenario and watching what cascades out of it (e.g., "imagine an LLM with a $100k stock account..." or "what if every kid had a LOGO turtle..."). This is intellectual play — distinct from metaphor. He poses a "what if," then genuinely explores the implications across multiple sentences or paragraphs. Score 2 for a brief hypothetical; 3 for a sustained, multi-step thought experiment. If absent, score 1 (neutral).

6. Punctuation & Formatting (5 pts)

Score based on what's present, not what's absent. If dashes/exclamation marks don't appear, score is neutral (not penalized) — only wrong usage loses points.

  • Dashes (0-2): If dashes are present: Steve uses spaced n-dashes ( – ) only. Correct usage scores 2. No dashes = 1 (neutral). Em-dashes or touching dashes = 0 (see also purity buffer).
  • Parenthetical asides (0-1): Conversational side-comments to the reader.
  • Exclamation marks & bold (0-1): If present, are they genuine/earned? Bold for key terms?
  • Oxford comma (0-1)

Purity Buffer (starts at 16, deduct for red flags)

Instant fail — Em-dashes or touching dashes (-14)

  • Em-dashes (—) or n-dashes without spaces on both sides in the author's own prose. Steve ONLY uses spaced n-dashes: word – aside – word. Any other dash style is not Steve. Dashes in direct quotations or poem attributions are excluded.
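
A rough mechanical check for this rule might look like the sketch below. It is an assumption-laden illustration: `BAD_DASH` and `find_bad_dashes` are hypothetical names, any whitespace character is treated as a "space," and the sketch does not try to exclude direct quotations or poem attributions, so an evaluator still has to review each hit by hand.

```python
import re

# U+2014 is the em-dash, U+2013 is the n-dash.
# Flag any em-dash, or an n-dash that lacks whitespace on either side.
BAD_DASH = re.compile(r"\u2014|(?<!\s)\u2013|\u2013(?!\s)")


def find_bad_dashes(text: str) -> list[str]:
    """Return a short context snippet around each offending dash."""
    snippets = []
    for match in BAD_DASH.finditer(text):
        start = max(match.start() - 15, 0)
        end = min(match.end() + 15, len(text))
        snippets.append(text[start:end])
    return snippets


# Spaced n-dashes pass; a touching n-dash is reported.
print(find_bad_dashes("word \u2013 aside \u2013 word"))  # []
print(len(find_bad_dashes("pages 1\u20134")))  # 1
```

Ordinary hyphens in compound words are deliberately ignored, since the rule above is about dashes used as punctuation.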

Tier 1 — Major (-5 each, max -15)

  • Corporate jargon ("leverage," "synergize," "stakeholder," "circle back," "align on")
  • Sarcasm, snark, ironic distance, passive aggression, cynicism about competitors
  • Distancing hedges: "it could be argued that," "one might suggest," impersonal third person throughout. (Note: epistemic humility — "often," "in my experience," "I think" — is NOT a penalty. Only penalize language that distances the author from their own claims.)

Tier 2 — Moderate (-3 each, max -12)

  • "IMO"/"IMHO"/"FWIW"/"AFAIK" (Steve uses "tbh" but not these)
  • Unicode emoji overuse (occasional ironic/humorous use is fine; Steve uses :) as default)
  • Buzzword stacking without grounding in specifics
  • Thesis-restated conclusion ("In conclusion, as we have seen...")
  • Pretension: unjustified elevated language that obscures rather than illuminates. (Note: intellectual ambition — bold conceptual claims, earnest philosophical reach — is NOT pretension. Steve regularly makes big claims sincerely. Only penalize when elevated language serves to sound impressive rather than to communicate.)
  • Forced Steve-isms: signature vocab clustering in one paragraph, performed vulnerability, generic over-enthusiasm

Tier 3 — Minor (-1 each, max -5)

  • Passive voice overuse (>20% passive constructions)
  • No personal anecdote in entire piece
  • No named people, books, or projects
  • Uniform paragraph length throughout
  • No lists in a piece over 500 words
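
The deduction arithmetic above can be sketched as follows. The per-flag values and tier caps come from the rubric; the names `purity_buffer`, `TIERS`, and `flag_counts` are hypothetical, and flooring the buffer at 0 is an assumption, since the rubric does not say whether the buffer can go negative. The variable -3 to -5 uncanny-valley deduction is left out of this sketch.

```python
START = 16         # the buffer starts at 16
INSTANT_FAIL = 14  # em-dashes or touching dashes

# tier -> (points deducted per flag, maximum total deduction for that tier)
TIERS = {1: (5, 15), 2: (3, 12), 3: (1, 5)}


def purity_buffer(flag_counts: dict[int, int], bad_dashes: bool = False) -> int:
    """Return the remaining purity buffer.

    flag_counts maps a tier number (1, 2, or 3) to how many red flags
    of that tier were found in the text.
    """
    deduction = INSTANT_FAIL if bad_dashes else 0
    for tier, count in flag_counts.items():
        per_flag, cap = TIERS[tier]
        deduction += min(count * per_flag, cap)
    # Assumption: the buffer never drops below 0.
    return max(START - deduction, 0)


# Two Tier 2 flags and one Tier 3 flag: 16 - 6 - 1 = 9.
print(purity_buffer({2: 2, 3: 1}))  # 9
```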

Evaluator Instructions

  1. Read the full text. Note gut reaction.
  2. Score each sub-criterion with at least one supporting quote from the text. No quote = score 0, unless the rubric assigns a neutral score for an absent feature.
  3. Apply purity deductions, quoting each offending passage.
  4. Sum and assign a band.
  5. Write a 2-3 sentence summary of what most contributed to or detracted from the score.
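
Steps 2-4 can be sketched as a small aggregation, assuming each sub-criterion is recorded with its points and its supporting quote. `SubScore`, `positive_total`, and `final_score` are hypothetical names, the quote strings are placeholders, and the neutral baselines for absent features are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SubScore:
    name: str
    points: int
    max_points: int
    quote: str = ""  # direct quote from the text being evaluated


def positive_total(sub_scores: list[SubScore]) -> int:
    """Sum sub-criterion points, counting 0 for any score without a quote."""
    total = 0
    for sub in sub_scores:
        if not 0 <= sub.points <= sub.max_points:
            raise ValueError(f"{sub.name}: {sub.points} is outside 0-{sub.max_points}")
        if sub.quote.strip():
            total += sub.points
    return total


def final_score(sub_scores: list[SubScore], purity: int) -> int:
    """Add the remaining purity buffer to the positive-category total."""
    return positive_total(sub_scores) + purity


scores = [
    SubScore("Earnest enthusiasm", 3, 4, quote="<quote from the text>"),
    SubScore("Opening move", 4, 4),  # no quote, so it counts as 0
]
print(final_score(scores, purity=16))  # 19
```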

Format awareness: Different formats have different scoring ceilings. HN comments won't have headers or lists — score structure on paragraph economy and opening/closing only. Product posts may open with TL;DR. Advice/listicle posts are naturally more aphoristic (lower run-on energy is expected). Manifestos may have elevated register without being pretentious. For short pieces, evaluate marker density (per paragraph) not raw count. Metaphor scores use the 4/8 neutral baseline for short formats. Thought experiment scores use the 1/3 neutral baseline.
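
One way to read "marker density (per paragraph)" is hits per paragraph rather than hits per piece. The sketch below is only an illustration of that reading: `MARKERS` is a hypothetical subset of the signature vocabulary listed in category 4, paragraphs are assumed to be separated by blank lines, and the result is a signal for the evaluator, not a score.

```python
import re

# Hypothetical subset of the signature vocabulary from category 4.
MARKERS = ["folks", "shipping", "ship", "delightful", "jam on",
           "shoot me a note", "tbh", "jazzed", "kooky", "hackable"]


def marker_density(text: str, markers: list[str] = MARKERS) -> float:
    """Return signature-vocabulary hits per paragraph (0.0 for empty text)."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in markers) + r")\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text)) / len(paragraphs)


sample = "Folks, we got it out the door.\n\nThis is delightful, tbh."
print(marker_density(sample))  # 1.5
```

A high density is not automatically good: vocabulary clustering in one paragraph is itself a Tier 2 red flag.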

Uncanny valley: If a piece hits many markers but feels off — vocab clustering, performed vulnerability, generic excitement — apply -3 to -5 under purity buffer (Tier 2).
