[{"data":1,"prerenderedAt":517},["ShallowReactive",2],{"blog-\u002Fblog\u002Fthe-seen-store":3},{"id":4,"title":5,"body":6,"date":508,"description":509,"draft":510,"extension":511,"meta":512,"navigation":61,"path":513,"seo":514,"stem":515,"__hash__":516},"blog\u002Fblog\u002Fthe-seen-store.md","The seen-store — give your agent a memory so it stops repeating itself",{"type":7,"value":8,"toc":502},"minimark",[9,13,21,26,33,382,385,389,403,437,440,444,447,468,479,483,486,489,498],[10,11,12],"p",{},"The moment you put an agent on a schedule, the question stops being \"can it do the task\"\nand becomes \"will it do the task it already did.\" A pipeline that runs every morning and\ncan't remember yesterday will re-post the same story, re-render the same video, re-email\nthe same person — confidently, on schedule, forever.",[10,14,15,16,20],{},"The fix is the cheapest reliability primitive in an autonomous system, and almost everyone\nadds it ",[17,18,19],"em",{},"after"," the agent embarrasses them. Add it first.",[22,23,25],"h2",{"id":24},"the-seen-store","The seen-store",[10,27,28,29],{},"A seen-store is a persistent set of keys you've already acted on. The rule is two lines of\nEnglish: ",[30,31,32],"strong",{},"check before you act, record after.",[34,35,40],"pre",{"className":36,"code":37,"language":38,"meta":39,"style":39},"language-python shiki shiki-themes github-dark github-dark","import json, hashlib, pathlib\n\nSEEN = pathlib.Path(\"seen.json\")\n\ndef _load() -> set[str]:\n    return set(json.loads(SEEN.read_text())) if SEEN.exists() else set()\n\ndef _save(keys: set[str]) -> None:\n    SEEN.write_text(json.dumps(sorted(keys)))\n\ndef key_for(item: dict) -> str:\n    # Stable across runs: canonical URL beats title — titles get edited.\n    basis = item.get(\"url\") or item[\"title\"]\n    return hashlib.sha256(basis.strip().lower().encode()).hexdigest()[:16]\n\ndef unseen(items: list[dict]) -> list[dict]:\n    seen = _load()\n    return [i for i in items if key_for(i) not in seen]\n\ndef mark(items: list[dict]) -> None:\n    seen = _load()\n    seen.update(key_for(i) for i in items)\n    _save(seen)\n","python","",[41,42,43,56,63,83,88,107,141,146,168,183,188,209,216,246,259,264,284,295,329,334,352,361,376],"code",{"__ignoreMap":39},[44,45,48,52],"span",{"class":46,"line":47},"line",1,[44,49,51],{"class":50},"sOPea","import",[44,53,55],{"class":54},"suv1-"," json, hashlib, pathlib\n",[44,57,59],{"class":46,"line":58},2,[44,60,62],{"emptyLinePlaceholder":61},true,"\n",[44,64,66,70,73,76,80],{"class":46,"line":65},3,[44,67,69],{"class":68},"s8ozJ","SEEN",[44,71,72],{"class":50}," =",[44,74,75],{"class":54}," pathlib.Path(",[44,77,79],{"class":78},"s4wv1","\"seen.json\"",[44,81,82],{"class":54},")\n",[44,84,86],{"class":46,"line":85},4,[44,87,62],{"emptyLinePlaceholder":61},[44,89,91,94,98,101,104],{"class":46,"line":90},5,[44,92,93],{"class":50},"def",[44,95,97],{"class":96},"sFR8T"," _load",[44,99,100],{"class":54},"() -> set[",[44,102,103],{"class":68},"str",[44,105,106],{"class":54},"]:\n",[44,108,110,113,116,119,121,124,127,130,133,136,138],{"class":46,"line":109},6,[44,111,112],{"class":50},"    return",[44,114,115],{"class":68}," set",[44,117,118],{"class":54},"(json.loads(",[44,120,69],{"class":68},[44,122,123],{"class":54},".read_text())) ",[44,125,126],{"class":50},"if",[44,128,129],{"class":68}," SEEN",[44,131,132],{"class":54},".exists() ",[44,134,135],{"class":50},"else",[44,137,115],{"class":68},[44,139,140],{"class":54},"()\n",[44,142,144],{"class":46,"line":143},7,[44,145,62],{"emptyLinePlaceholder":61},[44,147,149,151,154,157,159,162,165],{"class":46,"line":148},8,[44,150,93],{"class":50},[44,152,153],{"class":96}," _save",[44,155,156],{"class":54},"(keys: set[",[44,158,103],{"class":68},[44,160,161],{"class":54},"]) -> ",[44,163,164],{"class":68},"None",[44,166,167],{"class":54},":\n",[44,169,171,174,177,180],{"class":46,"line":170},9,[44,172,173],{"class":68},"    SEEN",[44,175,176],{"class":54},".write_text(json.dumps(",[44,178,179],{"class":68},"sorted",[44,181,182],{"class":54},"(keys)))\n",[44,184,186],{"class":46,"line":185},10,[44,187,62],{"emptyLinePlaceholder":61},[44,189,191,193,196,199,202,205,207],{"class":46,"line":190},11,[44,192,93],{"class":50},[44,194,195],{"class":96}," key_for",[44,197,198],{"class":54},"(item: ",[44,200,201],{"class":68},"dict",[44,203,204],{"class":54},") -> ",[44,206,103],{"class":68},[44,208,167],{"class":54},[44,210,212],{"class":46,"line":211},12,[44,213,215],{"class":214},"sJ8bj","    # Stable across runs: canonical URL beats title — titles get edited.\n",[44,217,219,222,225,228,231,234,237,240,243],{"class":46,"line":218},13,[44,220,221],{"class":54},"    basis ",[44,223,224],{"class":50},"=",[44,226,227],{"class":54}," item.get(",[44,229,230],{"class":78},"\"url\"",[44,232,233],{"class":54},") ",[44,235,236],{"class":50},"or",[44,238,239],{"class":54}," item[",[44,241,242],{"class":78},"\"title\"",[44,244,245],{"class":54},"]\n",[44,247,249,251,254,257],{"class":46,"line":248},14,[44,250,112],{"class":50},[44,252,253],{"class":54}," hashlib.sha256(basis.strip().lower().encode()).hexdigest()[:",[44,255,256],{"class":68},"16",[44,258,245],{"class":54},[44,260,262],{"class":46,"line":261},15,[44,263,62],{"emptyLinePlaceholder":61},[44,265,267,269,272,275,277,280,282],{"class":46,"line":266},16,[44,268,93],{"class":50},[44,270,271],{"class":96}," unseen",[44,273,274],{"class":54},"(items: list[",[44,276,201],{"class":68},[44,278,279],{"class":54},"]) -> list[",[44,281,201],{"class":68},[44,283,106],{"class":54},[44,285,287,290,292],{"class":46,"line":286},17,[44,288,289],{"class":54},"    seen ",[44,291,224],{"class":50},[44,293,294],{"class":54}," _load()\n",[44,296,298,300,303,306,309,312,315,317,320,323,326],{"class":46,"line":297},18,[44,299,112],{"class":50},[44,301,302],{"class":54}," [i ",[44,304,305],{"class":50},"for",[44,307,308],{"class":54}," i ",[44,310,311],{"class":50},"in",[44,313,314],{"class":54}," items ",[44,316,126],{"class":50},[44,318,319],{"class":54}," key_for(i) ",[44,321,322],{"class":50},"not",[44,324,325],{"class":50}," in",[44,327,328],{"class":54}," seen]\n",[44,330,332],{"class":46,"line":331},19,[44,333,62],{"emptyLinePlaceholder":61},[44,335,337,339,342,344,346,348,350],{"class":46,"line":336},20,[44,338,93],{"class":50},[44,340,341],{"class":96}," mark",[44,343,274],{"class":54},[44,345,201],{"class":68},[44,347,161],{"class":54},[44,349,164],{"class":68},[44,351,167],{"class":54},[44,353,355,357,359],{"class":46,"line":354},21,[44,356,289],{"class":54},[44,358,224],{"class":50},[44,360,294],{"class":54},[44,362,364,367,369,371,373],{"class":46,"line":363},22,[44,365,366],{"class":54},"    seen.update(key_for(i) ",[44,368,305],{"class":50},[44,370,308],{"class":54},[44,372,311],{"class":50},[44,374,375],{"class":54}," items)\n",[44,377,379],{"class":46,"line":378},23,[44,380,381],{"class":54},"    _save(seen)\n",[10,383,384],{},"That's the whole idea. A JSON file is fine until it isn't; swap it for SQLite or Redis when\nyou outgrow it and the interface stays the same.",[22,386,388],{"id":387},"the-key-is-the-actual-hard-part","The key is the actual hard part",[10,390,391,394,395,398,399,402],{},[41,392,393],{},"unseen()"," is trivial. ",[41,396,397],{},"key_for()"," is where the bodies are buried. The key has to be\n",[30,400,401],{},"stable"," — the same logical item must hash to the same key on every run.",[404,405,406,413,431],"ul",{},[407,408,409,412],"li",{},[30,410,411],{},"Don't key on the title."," Titles get re-edited; a one-character change and the item\nlooks brand new.",[407,414,415,418,419,422,423,426,427,430],{},[30,416,417],{},"Canonicalize URLs"," before hashing — strip ",[41,420,421],{},"utm_"," params, trailing slashes, fragments.\n",[41,424,425],{},"example.com\u002Fpost"," and ",[41,428,429],{},"example.com\u002Fpost?utm_source=rss"," are the same story.",[407,432,433,436],{},[30,434,435],{},"Content-hash when there's no stable id"," — for near-duplicates (the same story from two\noutlets), hash a normalized chunk of the body, not the headline.",[10,438,439],{},"A bad key gives you one of two failures: too loose and you re-do work, too tight and you\nsilently drop new items. Spend your time here.",[22,441,443],{"id":442},"record-before-or-after","Record before or after?",[10,445,446],{},"This is the question that decides whether your seen-store is an optimization or a\ncorrectness guarantee.",[404,448,449,459],{},[407,450,451,454,455,458],{},[30,452,453],{},"Record after success"," and a crash ",[17,456,457],{},"between acting and recording"," gives you a duplicate\nnext run.",[407,460,461,454,464,467],{},[30,462,463],{},"Record before acting",[17,465,466],{},"between recording and acting"," drops the item\nforever.",[10,469,470,471,474,475,478],{},"For reversible work (rendering a file you'll overwrite anyway), record-after and accept the\noccasional repeat. For ",[30,472,473],{},"irreversible side effects — sending an email, posting a video,\ncharging a card"," — the seen-store ",[17,476,477],{},"is"," your correctness boundary: guard the irreversible\ncall, and lean toward at-least-once with a downstream dedupe rather than at-most-once that\ncan silently drop. A duplicate is embarrassing; a dropped payment is a bug report.",[22,480,482],{"id":481},"keep-it-bounded","Keep it bounded",[10,484,485],{},"A seen-store that only grows is a slow leak. Cap it: a TTL (drop keys older than N days), or\na max size with FIFO eviction. For most content pipelines, \"anything older than 30 days\nwon't resurface anyway\" is a fine pruning rule.",[487,488],"hr",{},[10,490,491,492,497],{},"Build the seen-store before the agent does something twice in public. It's twenty lines, it's\nthe difference between \"runs unattended\" and \"runs unattended until it doesn't,\" and it's the\nfoundation everything else in ",[493,494,496],"a",{"href":495},"\u002Fblog\u002Fwhat-production-means-for-an-llm-pipeline","a production pipeline","\nstands on.",[499,500,501],"style",{},"html pre.shiki code .sOPea, html code.shiki .sOPea{--shiki-default:#F97583;--shiki-dark:#F97583}html pre.shiki code .suv1-, html code.shiki .suv1-{--shiki-default:#E1E4E8;--shiki-dark:#E1E4E8}html pre.shiki code .s8ozJ, html code.shiki .s8ozJ{--shiki-default:#79B8FF;--shiki-dark:#79B8FF}html pre.shiki code .s4wv1, html code.shiki .s4wv1{--shiki-default:#9ECBFF;--shiki-dark:#9ECBFF}html pre.shiki code .sFR8T, html code.shiki .sFR8T{--shiki-default:#B392F0;--shiki-dark:#B392F0}html pre.shiki code .sJ8bj, html code.shiki .sJ8bj{--shiki-default:#6A737D;--shiki-dark:#6A737D}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":39,"searchDepth":58,"depth":58,"links":503},[504,505,506,507],{"id":24,"depth":58,"text":25},{"id":387,"depth":58,"text":388},{"id":442,"depth":58,"text":443},{"id":481,"depth":58,"text":482},"2026-06-15","An autonomous pipeline that can't remember what it already did will, eventually, do it again. The fix is twenty lines, not a database migration.",false,"md",{},"\u002Fblog\u002Fthe-seen-store",{"title":5,"description":509},"blog\u002Fthe-seen-store","1YZwEh3PUOQrn7YUBdIf9TZRJHzheoUQALEPb-AcxeU",1781756060062]