[{"data":1,"prerenderedAt":573},["ShallowReactive",2],{"blog-\u002Fblog\u002Fuse-claude-as-a-scorer":3},{"id":4,"title":5,"body":6,"date":564,"description":565,"draft":566,"extension":567,"meta":568,"navigation":86,"path":569,"seo":570,"stem":571,"__hash__":572},"blog\u002Fblog\u002Fuse-claude-as-a-scorer.md","Use Claude as a scorer, not just a generator",{"type":7,"value":8,"toc":556},"minimark",[9,18,30,35,38,52,56,59,408,411,415,422,454,458,461,498,502,513,536,539,543,549,552],[10,11,12,13,17],"p",{},"Everyone's first instinct with an LLM is to make it ",[14,15,16],"em",{},"write"," something. Generate the\npost, generate the summary, generate the reply. That's the flashy half. The half that\nactually keeps an automated pipeline from shipping garbage is the boring one:",[10,19,20,24,25,29],{},[21,22,23],"strong",{},"Use the model as a scorer."," A function from an item to a number. ",[26,27,28],"code",{},"f(item) → score",".\nThen sort, filter, and only spend your expensive generation step on the things that\nearned it.",[31,32,34],"h2",{"id":33},"why-scoring-beats-rules","Why scoring beats rules",[10,36,37],{},"You already know how to filter with code. Keyword lists, regexes, recency windows,\nsource allowlists. They're fast and free and you should still use them as a first pass.",[10,39,40,41,44,45,47,48,51],{},"But the judgments that matter in a content pipeline aren't keyword-shaped. ",[14,42,43],{},"Is this\nstory actually newsworthy to a working engineer, or is it recycled hype with a new\nheadline?"," No regex answers that. A model can — that's exactly the fuzzy, context-heavy\ncall LLMs are good at. The trick is to stop asking it to ",[14,46,16],{}," about the story and\nstart asking it to ",[14,49,50],{},"judge"," the story.",[31,53,55],{"id":54},"the-pattern","The pattern",[10,57,58],{},"Score every candidate, keep the top N. 
Here's the whole thing in Python with the\nAnthropic SDK:",[60,61,66],"pre",{"className":62,"code":63,"language":64,"meta":65,"style":65},"language-python shiki shiki-themes github-dark github-dark","import anthropic, json\n\nclient = anthropic.Anthropic()\n\nRUBRIC = \"\"\"You score stories for an audience of working AI engineers, 1-5:\n5 = they'd stop scrolling and read it today\n4 = solid, relevant, shippable\n3 = mildly interesting, not urgent\n2 = recycled or thin\n1 = noise \u002F pure hype\n\nReturn ONLY JSON: {\"score\": \u003Cint 1-5>, \"reason\": \"\u003Cone line>\"}\"\"\"\n\ndef score(item: dict) -> dict:\n    msg = client.messages.create(\n        model=\"claude-haiku-4-5\",   # small, fast, cheap — use the current Haiku id\n        max_tokens=100,\n        temperature=0,              # scoring is not a place for creativity\n        system=RUBRIC,\n        messages=[{\"role\": \"user\", \"content\": f\"{item['title']}\\n\\n{item['summary']}\"}],\n    )\n    return json.loads(msg.content[0].text)\n\nranked = sorted(candidates, key=lambda i: score(i)[\"score\"], reverse=True)\ntop = ranked[:5]\n","python","",[26,67,68,81,88,100,105,119,125,131,137,143,149,154,160,165,189,200,219,233,250,262,326,332,346,351,391],{"__ignoreMap":65},[69,70,73,77],"span",{"class":71,"line":72},"line",1,[69,74,76],{"class":75},"sOPea","import",[69,78,80],{"class":79},"suv1-"," anthropic, json\n",[69,82,84],{"class":71,"line":83},2,[69,85,87],{"emptyLinePlaceholder":86},true,"\n",[69,89,91,94,97],{"class":71,"line":90},3,[69,92,93],{"class":79},"client ",[69,95,96],{"class":75},"=",[69,98,99],{"class":79}," anthropic.Anthropic()\n",[69,101,103],{"class":71,"line":102},4,[69,104,87],{"emptyLinePlaceholder":86},[69,106,108,112,115],{"class":71,"line":107},5,[69,109,111],{"class":110},"s8ozJ","RUBRIC",[69,113,114],{"class":75}," =",[69,116,118],{"class":117},"s4wv1"," \"\"\"You score stories for an audience of working AI engineers, 
1-5:\n",[69,120,122],{"class":71,"line":121},6,[69,123,124],{"class":117},"5 = they'd stop scrolling and read it today\n",[69,126,128],{"class":71,"line":127},7,[69,129,130],{"class":117},"4 = solid, relevant, shippable\n",[69,132,134],{"class":71,"line":133},8,[69,135,136],{"class":117},"3 = mildly interesting, not urgent\n",[69,138,140],{"class":71,"line":139},9,[69,141,142],{"class":117},"2 = recycled or thin\n",[69,144,146],{"class":71,"line":145},10,[69,147,148],{"class":117},"1 = noise \u002F pure hype\n",[69,150,152],{"class":71,"line":151},11,[69,153,87],{"emptyLinePlaceholder":86},[69,155,157],{"class":71,"line":156},12,[69,158,159],{"class":117},"Return ONLY JSON: {\"score\": \u003Cint 1-5>, \"reason\": \"\u003Cone line>\"}\"\"\"\n",[69,161,163],{"class":71,"line":162},13,[69,164,87],{"emptyLinePlaceholder":86},[69,166,168,171,175,178,181,184,186],{"class":71,"line":167},14,[69,169,170],{"class":75},"def",[69,172,174],{"class":173},"sFR8T"," score",[69,176,177],{"class":79},"(item: ",[69,179,180],{"class":110},"dict",[69,182,183],{"class":79},") -> ",[69,185,180],{"class":110},[69,187,188],{"class":79},":\n",[69,190,192,195,197],{"class":71,"line":191},15,[69,193,194],{"class":79},"    msg ",[69,196,96],{"class":75},[69,198,199],{"class":79}," client.messages.create(\n",[69,201,203,207,209,212,215],{"class":71,"line":202},16,[69,204,206],{"class":205},"s-3mD","        model",[69,208,96],{"class":75},[69,210,211],{"class":117},"\"claude-haiku-4-5\"",[69,213,214],{"class":79},",   ",[69,216,218],{"class":217},"sJ8bj","# small, fast, cheap — use the current Haiku id\n",[69,220,222,225,227,230],{"class":71,"line":221},17,[69,223,224],{"class":205},"        max_tokens",[69,226,96],{"class":75},[69,228,229],{"class":110},"100",[69,231,232],{"class":79},",\n",[69,234,236,239,241,244,247],{"class":71,"line":235},18,[69,237,238],{"class":205},"        temperature",[69,240,96],{"class":75},[69,242,243],{"class":110},"0",[69,245,246],{"class":79},",              
",[69,248,249],{"class":217},"# scoring is not a place for creativity\n",[69,251,253,256,258,260],{"class":71,"line":252},19,[69,254,255],{"class":205},"        system",[69,257,96],{"class":75},[69,259,111],{"class":110},[69,261,232],{"class":79},[69,263,265,268,270,273,276,279,282,285,288,290,293,296,299,302,305,308,311,313,316,318,321,323],{"class":71,"line":264},20,[69,266,267],{"class":205},"        messages",[69,269,96],{"class":75},[69,271,272],{"class":79},"[{",[69,274,275],{"class":117},"\"role\"",[69,277,278],{"class":79},": ",[69,280,281],{"class":117},"\"user\"",[69,283,284],{"class":79},", ",[69,286,287],{"class":117},"\"content\"",[69,289,278],{"class":79},[69,291,292],{"class":75},"f",[69,294,295],{"class":117},"\"",[69,297,298],{"class":110},"{",[69,300,301],{"class":79},"item[",[69,303,304],{"class":117},"'title'",[69,306,307],{"class":79},"]",[69,309,310],{"class":110},"}\\n\\n{",[69,312,301],{"class":79},[69,314,315],{"class":117},"'summary'",[69,317,307],{"class":79},[69,319,320],{"class":110},"}",[69,322,295],{"class":117},[69,324,325],{"class":79},"}],\n",[69,327,329],{"class":71,"line":328},21,[69,330,331],{"class":79},"    )\n",[69,333,335,338,341,343],{"class":71,"line":334},22,[69,336,337],{"class":75},"    return",[69,339,340],{"class":79}," json.loads(msg.content[",[69,342,243],{"class":110},[69,344,345],{"class":79},"].text)\n",[69,347,349],{"class":71,"line":348},23,[69,350,87],{"emptyLinePlaceholder":86},[69,352,354,357,359,362,365,368,371,374,377,380,383,385,388],{"class":71,"line":353},24,[69,355,356],{"class":79},"ranked ",[69,358,96],{"class":75},[69,360,361],{"class":110}," sorted",[69,363,364],{"class":79},"(candidates, ",[69,366,367],{"class":205},"key",[69,369,370],{"class":75},"=lambda",[69,372,373],{"class":79}," i: score(i)[",[69,375,376],{"class":117},"\"score\"",[69,378,379],{"class":79},"], 
",[69,381,382],{"class":205},"reverse",[69,384,96],{"class":75},[69,386,387],{"class":110},"True",[69,389,390],{"class":79},")\n",[69,392,394,397,399,402,405],{"class":71,"line":393},25,[69,395,396],{"class":79},"top ",[69,398,96],{"class":75},[69,400,401],{"class":79}," ranked[:",[69,403,404],{"class":110},"5",[69,406,407],{"class":79},"]\n",[10,409,410],{},"That's it. The model is now a ranking function, and the rest of your pipeline only ever\nsees the best five things instead of the firehose.",[31,412,414],{"id":413},"making-it-cheap","Making it cheap",[10,416,417,418,421],{},"Scoring runs on ",[14,419,420],{},"every"," candidate, so cost adds up fast if you're careless.",[423,424,425,432,442,448],"ul",{},[426,427,428,431],"li",{},[21,429,430],{},"Use a small model."," Scoring is a Haiku job, not an Opus job. You're asking for a\nnumber, not an essay.",[426,433,434,441],{},[21,435,436,437,440],{},"Cap ",[26,438,439],{},"max_tokens"," hard."," A score and a one-line reason fit in ~100 tokens. Don't pay\nfor a paragraph you'll throw away.",[426,443,444,447],{},[21,445,446],{},"Score in parallel."," These calls are independent — fan them out, don't loop.",[426,449,450,453],{},[21,451,452],{},"Pre-filter with code first."," Don't spend a model call ranking something a recency\nwindow already killed.",[31,455,457],{"id":456},"making-it-stable","Making it stable",[10,459,460],{},"A scorer that returns 3 today and 5 tomorrow for the same input is worse than useless.",[423,462,463,472,482,488],{},[426,464,465,471],{},[21,466,467,470],{},[26,468,469],{},"temperature=0","."," You want the same input to land in the same bucket every time.",[426,473,474,477,478,481],{},[21,475,476],{},"Anchor the scale in the prompt."," \"Score 1-5\" alone invites drift. Spell out what\neach number ",[14,479,480],{},"means",", like the rubric above. 
The anchors are what make the scores\ncomparable across runs.",[426,483,484,487],{},[21,485,486],{},"Don't ask for 1-100."," That's false precision: the model can't reliably tell a 73\nfrom a 76, and now neither can you. A tight 1-5 with explicit anchors is honest about\nthe resolution you actually have.",[426,489,490,493,494,497],{},[21,491,492],{},"Force structured output."," Parsing free text as JSON works until the day it doesn't.\nIf you want it sturdier, define a tool with a typed schema, force the model to call it,\nand read the structured tool input, instead of hoping ",[26,495,496],{},"json.loads"," succeeds.",[31,499,501],{"id":500},"making-it-honest-the-anti-slop-part","Making it honest (the anti-slop part)",[10,503,504,505,508,509,512],{},"Here's the failure mode that bites people: the model is ",[14,506,507],{},"confident",", the JSON is\n",[14,510,511],{},"clean",", and you start trusting the scores without ever checking them. Confidence is not\ncorrectness.",[423,514,515,524,530],{},[426,516,517,523],{},[21,518,519,520,470],{},"Always require a ",[26,521,522],{},"reason"," One line, per item. It costs almost nothing and it's the\nonly way you'll catch the rubric being misread.",[426,525,526,529],{},[21,527,528],{},"Log every score with its reason."," When the pipeline picks something dumb, you want\nthe receipt, not a mystery.",[426,531,532,535],{},[21,533,534],{},"Spot-check the boundary."," Read the things that scored a 3. That's where the model's\njudgment is fuzziest and where your rubric needs sharpening.",[10,537,538],{},"If you can't explain why item A beat item B, your rubric is the problem, not the model.",[31,540,542],{"id":541},"this-is-the-agent-loop-skeleton","This is the agent-loop skeleton",[10,544,545,546,548],{},"Score-then-act is most of what \"agentic\" actually means in production. Generation is the\npart that demos well; scoring and filtering is the part that makes the output not embarrass\nyou. 
And it's completely transferable — the same ",[26,547,28],{}," shows up as re-ranking\nretrieved chunks in RAG, triaging a flood of PRs, prioritizing leads, picking which alert\nactually pages a human.",[10,550,551],{},"Reach for the model as a judge before you reach for it as a writer. Your pipeline gets\ncheaper, more predictable, and a lot less sloppy.",[553,554,555],"style",{},"html pre.shiki code .sOPea, html code.shiki .sOPea{--shiki-default:#F97583;--shiki-dark:#F97583}html pre.shiki code .suv1-, html code.shiki .suv1-{--shiki-default:#E1E4E8;--shiki-dark:#E1E4E8}html pre.shiki code .s8ozJ, html code.shiki .s8ozJ{--shiki-default:#79B8FF;--shiki-dark:#79B8FF}html pre.shiki code .s4wv1, html code.shiki .s4wv1{--shiki-default:#9ECBFF;--shiki-dark:#9ECBFF}html pre.shiki code .sFR8T, html code.shiki .sFR8T{--shiki-default:#B392F0;--shiki-dark:#B392F0}html pre.shiki code .s-3mD, html code.shiki .s-3mD{--shiki-default:#FFAB70;--shiki-dark:#FFAB70}html pre.shiki code .sJ8bj, html code.shiki .sJ8bj{--shiki-default:#6A737D;--shiki-dark:#6A737D}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: 
var(--shiki-dark-text-decoration);}",{"title":65,"searchDepth":83,"depth":83,"links":557},[558,559,560,561,562,563],{"id":33,"depth":83,"text":34},{"id":54,"depth":83,"text":55},{"id":413,"depth":83,"text":414},{"id":456,"depth":83,"text":457},{"id":500,"depth":83,"text":501},{"id":541,"depth":83,"text":542},"2026-06-14","The most underrated way to put an LLM in a pipeline is as a ranking function, not a writer. How to do it cheaply, stably, and without slop.",false,"md",{},"\u002Fblog\u002Fuse-claude-as-a-scorer",{"title":5,"description":565},"blog\u002Fuse-claude-as-a-scorer","qWGEhVVLKQf_iwhmwRIniiGyzxQuUCX3Xfk0jHLIF_o",1781756060062]