[{"data":1,"prerenderedAt":408},["ShallowReactive",2],{"blog-\u002Fblog\u002Fwhat-production-means-for-an-llm-pipeline":3},{"id":4,"title":5,"body":6,"date":399,"description":400,"draft":401,"extension":402,"meta":403,"navigation":67,"path":404,"seo":405,"stem":406,"__hash__":407},"blog\u002Fblog\u002Fwhat-production-means-for-an-llm-pipeline.md","What \"production\" actually means for an LLM pipeline",{"type":7,"value":8,"toc":391},"minimark",[9,13,16,21,29,32,36,39,280,292,296,305,309,312,357,360,364,367,371,374,377,384,387],[10,11,12],"p",{},"The gap between a working demo and a system you can leave running is not the model. The\nmodel is the easy part — it works in the demo. The gap is all the plumbing the demo got to\nskip because you were standing right there when it ran.",[10,14,15],{},"Here's the field guide to the boring parts, ordered by reliability bought per line of code.",[17,18,20],"h2",{"id":19},"_1-validate-the-models-output-never-trust-free-text","1. Validate the model's output — never trust free text",[10,22,23,24,28],{},"The model ",[25,26,27],"em",{},"will"," return garbage eventually: a missing field, prose wrapped around your JSON,\nan empty string. If your pipeline assumes well-formed output, it dies on the first bad one —\nat 3am, unattended.",[10,30,31],{},"Parse, validate against a schema, and retry on failure. Better, force structured output with\na typed tool so malformed responses can't happen in the first place. The rule: the boundary\nbetween the model and the rest of your code is untrusted input. Treat it like any other.",[17,33,35],{"id":34},"_2-retries-with-backoff","2. Retries with backoff",[10,37,38],{},"Model APIs rate-limit and blip. A single 429 should not kill a run.",[40,41,46],"pre",{"className":42,"code":43,"language":44,"meta":45,"style":45},"language-python shiki shiki-themes github-dark github-dark","import time, random\n\nRETRYABLE = {408, 409, 429, 500, 502, 503, 529}\n\ndef with_retry(fn, *, attempts=5, base=1.0):\n    for n in range(attempts):\n        try:\n            return fn()\n        except APIStatusError as e:\n            if e.status_code not in RETRYABLE or n == attempts - 1:\n                raise\n            time.sleep(base * 2 ** n + random.random())  # exponential backoff + jitter\n","python","",[47,48,49,62,69,119,124,160,178,187,196,211,248,254],"code",{"__ignoreMap":45},[50,51,54,58],"span",{"class":52,"line":53},"line",1,[50,55,57],{"class":56},"sOPea","import",[50,59,61],{"class":60},"suv1-"," time, random\n",[50,63,65],{"class":52,"line":64},2,[50,66,68],{"emptyLinePlaceholder":67},true,"\n",[50,70,72,76,79,82,85,88,91,93,96,98,101,103,106,108,111,113,116],{"class":52,"line":71},3,[50,73,75],{"class":74},"s8ozJ","RETRYABLE",[50,77,78],{"class":56}," =",[50,80,81],{"class":60}," {",[50,83,84],{"class":74},"408",[50,86,87],{"class":60},", ",[50,89,90],{"class":74},"409",[50,92,87],{"class":60},[50,94,95],{"class":74},"429",[50,97,87],{"class":60},[50,99,100],{"class":74},"500",[50,102,87],{"class":60},[50,104,105],{"class":74},"502",[50,107,87],{"class":60},[50,109,110],{"class":74},"503",[50,112,87],{"class":60},[50,114,115],{"class":74},"529",[50,117,118],{"class":60},"}\n",[50,120,122],{"class":52,"line":121},4,[50,123,68],{"emptyLinePlaceholder":67},[50,125,127,130,134,137,140,143,146,149,152,154,157],{"class":52,"line":126},5,[50,128,129],{"class":56},"def",[50,131,133],{"class":132},"sFR8T"," with_retry",[50,135,136],{"class":60},"(fn, ",[50,138,139],{"class":56},"*",[50,141,142],{"class":60},", attempts",[50,144,145],{"class":56},"=",[50,147,148],{"class":74},"5",[50,150,151],{"class":60},", base",[50,153,145],{"class":56},[50,155,156],{"class":74},"1.0",[50,158,159],{"class":60},"):\n",[50,161,163,166,169,172,175],{"class":52,"line":162},6,[50,164,165],{"class":56},"    for",[50,167,168],{"class":60}," n ",[50,170,171],{"class":56},"in",[50,173,174],{"class":74}," range",[50,176,177],{"class":60},"(attempts):\n",[50,179,181,184],{"class":52,"line":180},7,[50,182,183],{"class":56},"        try",[50,185,186],{"class":60},":\n",[50,188,190,193],{"class":52,"line":189},8,[50,191,192],{"class":56},"            return",[50,194,195],{"class":60}," fn()\n",[50,197,199,202,205,208],{"class":52,"line":198},9,[50,200,201],{"class":56},"        except",[50,203,204],{"class":60}," APIStatusError ",[50,206,207],{"class":56},"as",[50,209,210],{"class":60}," e:\n",[50,212,214,217,220,223,226,229,232,234,237,240,243,246],{"class":52,"line":213},10,[50,215,216],{"class":56},"            if",[50,218,219],{"class":60}," e.status_code ",[50,221,222],{"class":56},"not",[50,224,225],{"class":56}," in",[50,227,228],{"class":74}," RETRYABLE",[50,230,231],{"class":56}," or",[50,233,168],{"class":60},[50,235,236],{"class":56},"==",[50,238,239],{"class":60}," attempts ",[50,241,242],{"class":56},"-",[50,244,245],{"class":74}," 1",[50,247,186],{"class":60},[50,249,251],{"class":52,"line":250},11,[50,252,253],{"class":56},"                raise\n",[50,255,257,260,262,265,268,270,273,276],{"class":52,"line":256},12,[50,258,259],{"class":60},"            time.sleep(base ",[50,261,139],{"class":56},[50,263,264],{"class":74}," 2",[50,266,267],{"class":56}," **",[50,269,168],{"class":60},[50,271,272],{"class":56},"+",[50,274,275],{"class":60}," random.random())  ",[50,277,279],{"class":278},"sJ8bj","# exponential backoff + jitter\n",[10,281,282,283,287,288,291],{},"Distinguish ",[284,285,286],"strong",{},"retryable"," (rate limits, overloads, timeouts) from ",[284,289,290],{},"fatal"," (a 400 — your\nrequest is malformed and will be malformed every time). Retrying a fatal error just burns\ntime and money.",[17,293,295],{"id":294},"_3-idempotency-assume-every-run-can-die-halfway","3. Idempotency — assume every run can die halfway",[10,297,298,299,304],{},"A scheduled job that crashes gets retried, which means every step runs twice sometimes.\nDesign so repeating a step is safe: guard irreversible actions with a\n",[300,301,303],"a",{"href":302},"\u002Fblog\u002Fthe-seen-store","seen-store",", use idempotency keys, write to temp files and rename.\n\"Exactly once\" is a distributed-systems fantasy; \"at least once, safely\" is achievable and\nenough.",[17,306,308],{"id":307},"_4-cost-tracking","4. Cost tracking",[10,310,311],{},"If you're not logging tokens and dollars per run, you find out what your pipeline costs from\nthe bill. It's a counter:",[40,313,315],{"className":42,"code":314,"language":44,"meta":45,"style":45},"usage = resp.usage  # input_tokens, output_tokens\nrun_cost += usage.input_tokens * IN_RATE + usage.output_tokens * OUT_RATE\n",[47,316,317,330],{"__ignoreMap":45},[50,318,319,322,324,327],{"class":52,"line":53},[50,320,321],{"class":60},"usage ",[50,323,145],{"class":56},[50,325,326],{"class":60}," resp.usage  ",[50,328,329],{"class":278},"# input_tokens, output_tokens\n",[50,331,332,335,338,341,343,346,349,352,354],{"class":52,"line":64},[50,333,334],{"class":60},"run_cost ",[50,336,337],{"class":56},"+=",[50,339,340],{"class":60}," usage.input_tokens ",[50,342,139],{"class":56},[50,344,345],{"class":74}," IN_RATE",[50,347,348],{"class":56}," +",[50,350,351],{"class":60}," usage.output_tokens ",[50,353,139],{"class":56},[50,355,356],{"class":74}," OUT_RATE\n",[10,358,359],{},"Log it per run with the run id. The day a prompt change quietly triples your token use, this\nis how you catch it in hours instead of at the end of the month.",[17,361,363],{"id":362},"_5-a-kill-switch","5. A kill switch",[10,365,366],{},"A runaway loop — a retry storm, a pagination bug, a model that keeps asking for \"one more\nstep\" — can spend real money fast. Put a ceiling on it: a per-run budget cap, a max-iteration\ncount, a circuit breaker that halts when the error rate spikes. Cheap insurance against the\n3am surprise.",[17,368,370],{"id":369},"_6-observability-you-can-grep","6. Observability you can grep",[10,372,373],{},"When it breaks, you get logs and nothing else. Make them worth having: structured lines with\na run id, the inputs, the decision, the output. \"Something failed\" is useless; \"run a3f scored\nitem X at 2, below threshold, skipped\" tells you whether the bug is the model or your code.",[375,376],"hr",{},[10,378,379,380,383],{},"The mindset under all six: ",[284,381,382],{},"assume every external call fails, every model output is suspect,\nand every run can be interrupted and resumed."," Build for that and you can actually walk away\nfrom the thing.",[10,385,386],{},"\"Production\" isn't a deploy target. It's just the set of failures you've already handled — and\nthe demo has handled none of them. Start with output validation and retries; they buy the most\nreliability per line. The rest you add the first time each one bites you.",[388,389,390],"style",{},"html pre.shiki code .sOPea, html code.shiki .sOPea{--shiki-default:#F97583;--shiki-dark:#F97583}html pre.shiki code .suv1-, html code.shiki .suv1-{--shiki-default:#E1E4E8;--shiki-dark:#E1E4E8}html pre.shiki code .s8ozJ, html code.shiki .s8ozJ{--shiki-default:#79B8FF;--shiki-dark:#79B8FF}html pre.shiki code .sFR8T, html code.shiki .sFR8T{--shiki-default:#B392F0;--shiki-dark:#B392F0}html pre.shiki code .sJ8bj, html code.shiki .sJ8bj{--shiki-default:#6A737D;--shiki-dark:#6A737D}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":45,"searchDepth":64,"depth":64,"links":392},[393,394,395,396,397,398],{"id":19,"depth":64,"text":20},{"id":34,"depth":64,"text":35},{"id":294,"depth":64,"text":295},{"id":307,"depth":64,"text":308},{"id":362,"depth":64,"text":363},{"id":369,"depth":64,"text":370},"2026-06-16","The demo works on your laptop. Production is everything the demo skipped — retries, idempotency, output validation, cost tracking, and what to do when the model returns garbage at 3am.",false,"md",{},"\u002Fblog\u002Fwhat-production-means-for-an-llm-pipeline",{"title":5,"description":400},"blog\u002Fwhat-production-means-for-an-llm-pipeline","bfTzu0x32Bg73Mal_viiEg0sAZzv1ssi3EkMO9dPfdo",1781756059953]