<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Another Coding Blog]]></title><description><![CDATA[Bringing you insights and education from 13 years of experience across AI, Data and Analytics ]]></description><link>https://www.anothercodingblog.com</link><image><url>https://substackcdn.com/image/fetch/$s_!2kzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615044d0-cdfb-47ac-9a1b-3883974114e7_1024x1024.png</url><title>Another Coding Blog</title><link>https://www.anothercodingblog.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 01 Jul 2026 18:37:48 GMT</lastBuildDate><atom:link href="https://www.anothercodingblog.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Taylor Ortiz]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[ortizt@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[ortizt@substack.com]]></itunes:email><itunes:name><![CDATA[Taylor Ortiz]]></itunes:name></itunes:owner><itunes:author><![CDATA[Taylor Ortiz]]></itunes:author><googleplay:owner><![CDATA[ortizt@substack.com]]></googleplay:owner><googleplay:email><![CDATA[ortizt@substack.com]]></googleplay:email><googleplay:author><![CDATA[Taylor Ortiz]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Another Weekly AI Newsletter: Issue 78]]></title><description><![CDATA[OpenAI previews GPT-5.6 Sol as Washington asks to slow roll. AI patches the bugs it finds. OpenAI releases Jalape&#241;o. A24 Films x DeepMind. xAI ships /goal. 
Claude Tag makes a splash in the enterprise]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-316</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-316</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 27 Jun 2026 13:01:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/184c2142-6bd4-4ec7-a8a5-37c5d60ec381_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Washington&#8217;s grip extended from Anthropic to OpenAI.</h2><ul><li><p><strong>OpenAI previewed GPT-5.6 Sol, its next-generation model.</strong> The <a href="https://openai.com/index/previewing-gpt-5-6-sol">preview</a> introduced Sol alongside Terra and Luna variants, with METR publishing a <a href="https://metr.org/blog/2026-06-26-gpt-5-6-sol">predeployment evaluation</a> of Sol&#8217;s capabilities.</p></li><li><p><strong>The White House asked OpenAI to slow-roll the release.</strong> OpenAI <a href="https://techcrunch.com/2026/06/26/openai-limits-gpt-5-6-rollout-after-government-request-says-restrictions-shouldnt-be-the-norm">limited the rollout after the government request</a> over safety concerns, but said publicly that such restrictions <a href="https://techcrunch.com/2026/06/25/the-white-house-is-asking-openai-to-slow-roll-the-release-of-its-new-model-over-safety-concerns">should not become the norm</a>.</p></li><li><p><strong>TechCrunch&#8217;s read: it&#8217;s no longer Anthropic vs. OpenAI.</strong> The <a href="https://techcrunch.com/2026/06/26/its-not-about-anthropic-vs-openai-anymore">framing shifted</a> to labs vs. a government that now weighs in on when frontier models ship.</p></li><li><p><strong>The thread:</strong> a week after the Anthropic ban, the same dynamic reached OpenAI in a softer form, a request to wait rather than an order to stop. 
The throughline is that a frontier release is now a government conversation, which is exactly what every lab and enterprise spent last week preparing for.</p></li></ul><div><hr></div><h2>AI crossed from finding software bugs to fixing them.</h2><ul><li><p><strong>OpenAI&#8217;s full GPT-5.5-Cyber set a new state of the art on the CyberGym benchmark.</strong> It topped Anthropic&#8217;s Mythos 5 as part of <a href="https://openai.com/index/daybreak-securing-the-world/">Daybreak</a>, a push to automate the patch rather than just the discovery.</p></li><li><p><strong>Codex Security has scanned more than 30 million commits across 30,000 codebases.</strong> Sam <a href="https://x.com/sama/status/2069121360744550796">Altman framed it</a> as solving security problems instead of only finding them, with over 500,000 fixes auto-confirmed.</p></li><li><p><strong>Patch the Planet put OpenAI&#8217;s models and Trail of Bits on 30-plus open-source projects.</strong> The <a href="https://openai.com/index/patch-the-planet">initiative</a> hardened cURL, Go, and Python and merged dozens of patches.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ul><li><p><strong>The thread:</strong> for a decade the hard part of security was finding the hole, and the bet this week was that the model can write the fix too. The same capability cuts both ways, which is why the Five Eyes alliance spent the week warning that frontier models could supercharge offensive hacking.</p></li></ul><div><hr></div><h2>The compute land grab went into overdrive.</h2><ul><li><p><strong>OpenAI unveiled its first custom chip, built with Broadcom.</strong> The <a href="https://techcrunch.com/2026/06/24/openai-unveils-its-first-custom-chip-built-by-broadcom">in-house silicon</a> is OpenAI&#8217;s move to own its inference stack, alongside an internal accelerator it calls Jalape&#241;o.</p></li><li><p><strong>SpaceX will rent open-source lab Reflection $6.3B of AI compute.</strong> Reflection <a href="https://techcrunch.com/2026/06/22/spacex-inks-compute-deal-with-reflection-ai-an-open-source-ai-lab/">pays $150M a month through 2029</a> for Nvidia GB300 chips at the Colossus 2 data center near Memphis.</p></li><li><p><strong>Amazon committed another $13B to AI infrastructure in India.</strong> The <a href="https://techcrunch.com/2026/06/25/amazon-ups-india-bet-with-fresh-13b-ai-infrastructure-investment">investment</a> deepens a week of India bets and adds to the global data-center build-out.</p></li><li><p><strong>Cursor is training its own model with SpaceX.</strong> <a href="https://x.com/cursor_ai/status/2069149296436330776">Cursor&#8217;s keynote</a> headlined the in-house model, while Groq <a href="https://techcrunch.com/2026/06/22/ai-chipmaker-groq-confirms-650m-raise-re-staffs-after-nvidias-20b-not-acqui-hire-deal/">confirmed a $650M raise</a> and a neocloud pivot.</p></li><li><p><strong>SK Hynix overtook Samsung as South Korea&#8217;s most valuable company.</strong> <a href="https://www.indiatoday.in/technology/news/story/sk-hynix-overtakes-samsung-becomes-most-valuable-company-in-south-korea-2931825-2026-06-22">High-bandwidth memory demand</a> rewired the chip hierarchy, and SpaceX even <a href="https://x.com/SawyerMerritt/status/2069570317798785480">named its planned AI satellite network Starmind</a>.</p></li><li><p><strong>The thread:</strong> the week&#8217;s deals all point one way. When access to a model, or the chips under it, can shift on a policy decision, owning the compute, the silicon, and the weights is the hedge everyone is suddenly paying for.</p></li></ul><div><hr></div><h2>Agents got real permissions.</h2><ul><li><p><strong>xAI shipped /goal, multi-agent coding in Grok Build.</strong> <a href="https://x.ai/news/introducing-goal">The command</a> hands Grok an objective and coordinates planners, implementors, and reviewers to reach it.</p></li><li><p><strong>Google&#8217;s Interactions API hit general availability,</strong> a <a href="https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/">unified endpoint</a> for Gemini agents with server-side state, while DeepMind also shipped <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/introducing-computer-use-gemini-3-5-flash">computer use in Gemini 3.5 Flash</a>.</p></li><li><p><strong>AWS shipped Bedrock AgentCore Payments,</strong> so agents can <a href="https://aws.amazon.com/blogs/machine-learning/building-pay-per-intelligence-for-ai-agents-how-ampersend-uses-amazon-bedrock-agentcore-payments">pay per call mid-task</a>, and Notion launched <a href="https://x.com/NotionHQ/status/2069816393395012009">External Agents with Claude and 
Cursor</a>.</p></li><li><p><strong>Self-Harness lets agents rewrite their own rules.</strong> The <a href="https://venturebeat.com/orchestration/researchers-introduce-self-harness-a-framework-that-lets-ai-agents-rewrite-their-own-rules-boosting-performance-up-to-60">framework</a> reports gains up to 60%, while Stanford&#8217;s <a href="https://venturebeat.com/orchestration/stanfords-delm-cuts-multi-agent-task-costs-50-without-a-central-orchestrator">DeLM</a> cut multi-agent costs 50% with no central orchestrator.</p></li><li><p><strong>The warnings arrived with the tools.</strong> CrewAI told teams to <a href="https://crewai.com/blog/stop-giving-your-agents-database-credentials">stop handing agents raw database credentials</a>, and Simon Willison reframed <a href="https://simonwillison.net/2026/Jun/22/prompt-injection-as-role-confusion/">prompt injection as &#8220;role confusion.&#8221;</a></p></li><li><p><strong>The thread:</strong> the tooling to give agents money, goals, and self-modification landed the same week as the arguments about what happens once they have it. TechCrunch called the result a world getting <a href="https://techcrunch.com/2026/06/22/the-ai-world-is-getting-loopy">&#8220;loopy,&#8221;</a> with swarms of agents running in the background instead of waiting for a prompt.</p></li></ul><div><hr></div><h2>The AI video race and the culture fight both escalated.</h2><ul><li><p><strong>Alibaba&#8217;s HappyHorse passed Sora to No. 
2 in global video rankings.</strong> The <a href="https://venturebeat.com/technology/alibabas-ai-video-model-rises-to-no-2-in-global-rankings-as-openais-sora-and-bytedances-seedance-fall-away">1.1 model</a> is API-first with reference-to-video character consistency.</p></li><li><p><strong>ByteDance answered with Seedance 2.5,</strong> the <a href="https://www.panewslab.com/en/articles/019ef29d-f68b-773e-b2e3-65ca1960017c">Doubao team&#8217;s model</a> doing native 4K and roughly 30-second clips with up to 50 reference inputs.</p></li><li><p><strong>Google DeepMind bet $75M on Hollywood,</strong> a <a href="https://techcrunch.com/2026/06/22/google-deepmind-bets-75m-on-ais-future-in-hollywood-with-a24-deal/">research partnership with A24</a> pitched as working with artists rather than around them, while Adobe <a href="https://techcrunch.com/2026/06/25/adobe-acquires-image-and-video-enhancement-tool-maker-topaz-labs">acquired Topaz Labs</a> and shipped <a href="https://venturebeat.com/orchestration/adobe-embeds-agentic-ai-workflows-across-creative-cloud-shifting-from-media-generation-to-production-orchestration">agentic workflows across Creative Cloud</a>.</p></li><li><p><strong>SZA called musicians who support AI &#8220;disgusting.&#8221;</strong> Her <a href="https://www.nme.com/news/music/sza-hits-out-at-disgusting-ai-music-after-learning-over-200-of-her-songs-had-been-used-to-train-artificial-intelligence-3952165">comments</a> followed claims that 200-plus of her songs were used as training data, sharpening the artists-versus-AI backlash.</p></li><li><p><strong>The thread:</strong> the labs courting Hollywood and the artists rejecting AI are reacting to the same fact, that generative video crossed from demo to production, and tools like Runway&#8217;s new <a href="https://x.com/runwayml/status/2070215480401604954">Agent 2.0</a> are pushing it into ad and campaign workflows.</p></li></ul><div><hr></div><h2><strong>&#11088; Featured: Anthropic put Claude inside 
Slack as a teammate.</strong></h2><p><a href="https://www.anthropic.com/news/introducing-claude-tag">Claude Tag</a> is Anthropic&#8217;s attempt to make Claude an always-on coworker instead of a chat window you visit. In Slack, Claude joins a workspace as a member: you grant it access to chosen channels, tools, data, and even codebases, then tag @Claude to delegate a task the way you would a colleague, while it keeps persistent context on how your company works. It is in beta for Claude Enterprise and Team customers.</p><div id="youtube2-VojDzHaciKQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;VojDzHaciKQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/VojDzHaciKQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>What makes it notable is the shift in interaction model. Andrej Karpathy <a href="https://x.com/karpathy/status/2069547676849557725">called it a new paradigm</a>, Claude operating inline with the rest of org-wide activity rather than in a separate app, and argued the hard part is the under-the-hood plumbing across tools, integrations, compute, and memory. TechCrunch&#8217;s <a href="https://techcrunch.com/2026/06/23/anthropics-claude-tag-is-learning-your-company-one-slack-message-at-a-time">read is sharper</a>: Claude Tag is learning your company one Slack message at a time. The upside is an assistant that actually knows your context. 
The open question is the one every broadly-scoped agent raised this week, which is how much access you are comfortable giving something that quietly reads everything.</p><div><hr></div><h2><strong>&#127897;&#65039; Worth a Listen</strong></h2><div id="youtube2-V04bm-3d6EQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;V04bm-3d6EQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/V04bm-3d6EQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>Google DeepMind&#8217;s podcast on agentic economies.</strong> Host Hannah Fry digs into <a href="https://x.com/GoogleDeepMind/status/2069785314663497966">what happens</a> when millions of AI agents start negotiating, transacting, and delegating to one another, and how to diversify their decision-making to avoid &#8220;AI groupthink.&#8221; A useful frame for the week agents got payments and goals.</p><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://x.com/krea_ai/status/2069435590995812396">Krea released open weights for Krea 2</a></strong> | Krea &#8212; an undistilled base plus a fast distilled version, built to fine-tune.</p></li><li><p><strong><a href="https://mistral.ai/news/ocr-4">Mistral shipped OCR 4</a></strong> | Mistral &#8212; bounding boxes, block classification, and confidence scores across 170 languages.</p></li><li><p><strong><a href="https://venturebeat.com/technology/z-ais-open-weights-glm-5-2-beats-gpt-5-5-on-multiple-long-horizon-coding-benchmarks-for-1-6th-the-cost">Z.ai&#8217;s GLM-5.2 beat GPT-5.5 on coding benchmarks and landed in Cursor</a></strong> | VentureBeat &#8212; open weights at roughly a sixth of the cost.</p></li><li><p><strong><a 
href="https://x.com/NVIDIAAI/status/2070602795737035252">NVIDIA&#8217;s Nemotron 3 Ultra ranked among the top open models</a></strong> | NVIDIA &#8212; the open-weights field kept filling in.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/22/amazon-is-testing-alexa-in-india-with-hindi-support/">Amazon is testing a Hindi Alexa+ in India</a></strong> | TechCrunch &#8212; localization aimed at a massive market.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/22/whatsapp-gets-new-chief-as-meta-taps-indias-cred-founder-kunal-shah-and-invests-900m-in-startup/">Meta named Cred&#8217;s Kunal Shah WhatsApp chief and invested $900M</a></strong> | TechCrunch &#8212; a sharp India tilt.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/25/general-intuitions-2-3b-bet-that-video-games-can-train-ai-agents-for-the-real-world">General Intuition raised $2.3B to train agents on video games</a></strong> | TechCrunch &#8212; games as a world model for real-world agents.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/24/agility-robotics-plans-to-go-public-via-spac-in-a-2-5b-deal">Agility Robotics is going public via a $2.5B SPAC</a></strong> | TechCrunch &#8212; humanoids hit the public markets.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/25/databricks-former-ai-chief-thinks-he-can-cut-ais-power-bill-by-1000x">Databricks&#8217; former AI chief says he can cut AI&#8217;s power bill 1,000x</a></strong> | TechCrunch &#8212; a swing at inference economics.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/25/anthropics-claude-is-winning-over-paid-consumers-a-market-owned-by-chatgpt">Anthropic&#8217;s Claude is winning over paid consumers</a></strong> | TechCrunch &#8212; eating into a market ChatGPT owned.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/24/ai-was-supposed-to-kill-engineering-jobs-but-new-data-suggests-theyre-the-most-resilient">New data suggests engineering jobs are 
AI&#8217;s most resilient</a></strong> | TechCrunch &#8212; against the &#8220;AI kills coding&#8221; narrative.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/22/the-running-list-major-tech-layoffs-in-2026-where-employers-cited-ai">Oracle cut 21,000 jobs and cited AI</a></strong> | TechCrunch &#8212; now on the running list of 2026 AI-attributed layoffs.</p></li><li><p><strong><a href="https://money.usnews.com/investing/news/articles/2026-06-22/ai-booms-us-employment-wage-impact-muted-so-far-ecb-study-finds">An ECB study found AI&#8217;s wage and job impact muted so far</a></strong> | U.S. News &#8212; a data point against the loudest displacement claims.</p></li><li><p><strong><a href="https://www.techtimes.com/articles/318809/20260621/nadella-names-openai-anthropic-ai-giants-must-earn-societal-permission.htm">Satya Nadella said AI monopolies are a problem</a></strong> | TechTimes &#8212; concentrated AI power is unstable, he argued.</p></li><li><p><strong><a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think">Google redesigned the search box for the first time in 25 years</a></strong> | VentureBeat &#8212; AI Overviews and AI Mode merged into one box.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Enterprise Agents Need More Than Tools: Identity, Connectors, and Interrupts]]></title><description><![CDATA[After building a Slack-facing agent with Vercel Eve, I started thinking about the less visible architecture around enterprise agents: who the agent acts as, how tokens are handled, and what happens when a downstream system needs a human to answer.]]></description><link>https://www.anothercodingblog.com/p/enterprise-agents-need-more-than</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/enterprise-agents-need-more-than</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Wed, 24 Jun 2026 12:30:49 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ab03dab5-5dbd-48c0-91e0-b856e9ee247c_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Introduction</h2><p>After building the first Eve workflow, I kept coming back to the connector layer.</p><p>The demo itself was small. A Slack message became a LinkedIn draft. The draft went through a lint check. I approved the publish step in Slack. Eve wrote the final result to Notion.</p><p>That was enough to make the loop concrete, but it also exposed a second layer that feels just as important for enterprise systems: identity.</p><p>It is easy to say an agent has access to a tool. In practice, that sentence hides several questions.</p><p>Who invoked the agent? Which runtime is executing the workflow? Which named agent is performing the action? 
Is the downstream system using the human&#8217;s OAuth token or an integration user? If a downstream system needs more input, where does that question show up?</p><p>Those questions are where agent demos start to look like enterprise architecture.</p><h2>Connectors Are More Than Tool Access</h2><p>When people talk about agents, the conversation often moves quickly to tools.</p><p>Can the agent query Salesforce? Can it write to Notion? Can it open a Jira ticket? Can it call a warehouse? Can it invoke an internal API?</p><p>Those are important questions, but they are incomplete.</p><p>The more important version is:</p><pre><code><code>Can the agent call the right capability
with the right identity
through the right permission boundary
and leave behind a trace we can reason about later?</code></code></pre><p>That is why the connector layer matters.</p><p>In the Eve project, the Notion connection used Vercel Connect. The first time the workflow needed Notion, the agent could park, ask me to authorize, resume after the OAuth flow completed, and continue the original run.</p><p>That behavior was easy to treat as setup friction in the moment. Looking back, it was one of the most useful parts of the experiment.</p><p>There was a difference between the project having a Notion connector configured and me, as the Slack user, being authorized to use Notion through that connector.</p><p>That same distinction shows up quickly in enterprise systems.</p><h2>The Four Identity Layers</h2><p>The mental model I ended up with has four identity layers.</p><pre><code><code>Human identity
The person who invoked the agent or responded to the prompt.

Runtime identity
The deployed system executing the workflow.

Agent identity
The named agent or workflow performing the work.

Downstream authority
The identity the downstream system uses to authorize the action.</code></code></pre><p>In the Eve demo:</p><pre><code><code>Human identity: me in Slack
Runtime identity: the Vercel deployment
Agent identity: the Eve content assistant
Downstream authority: my Notion authorization</code></code></pre><p>That last line is where the architecture can vary.</p><p>For a system like Salesforce, the downstream authority might be a user-delegated OAuth token:</p><pre><code><code>Taylor in Slack
-&gt; Eve workflow
-&gt; Salesforce using Taylor's OAuth token</code></code></pre><p>Or it might be an integration user:</p><pre><code><code>Taylor in Slack
-&gt; Eve workflow
-&gt; Salesforce using an enterprise integration user</code></code></pre><p>Both patterns can be valid, but they create different control problems.</p><p>With user-delegated OAuth, Salesforce can enforce the user&#8217;s existing permissions. That tends to be cleaner for audit and least privilege.</p><p>With an integration user, the agent may have a broader operational identity inside Salesforce. That can be useful for controlled backend workflows, but it puts more responsibility on the harness. The harness has to enforce what the human is allowed to request, because Salesforce may only see the integration user, not the human behind the request.</p><h2>How Salesforce Would Be Queried</h2><p>Vercel Connect handles authorization and token management. The query itself belongs to the capability layer we expose to the agent.</p><p>For Salesforce, I would expect one of a few patterns:</p><pre><code><code>Eve -&gt; Salesforce MCP server -&gt; Salesforce APIs
Eve -&gt; OpenAPI connection -&gt; Salesforce REST APIs
Eve -&gt; custom tool -&gt; Salesforce SDK / REST / SOQL
Eve -&gt; Salesforce agent endpoint -&gt; Salesforce-native agent or flow</code></code></pre><p>The connector supplies the token. The tool, MCP server, OpenAPI connection, or custom integration performs the actual query.</p><p>For example, if a user asks:</p><pre><code><code>Can you prep me for my Acme renewal call?</code></code></pre><p>The Slack-facing agent might infer that this is a CRM task. Then it could call a Salesforce capability with a structured request:</p><pre><code><code>{
  "task": "prepare_account_brief",
  "account_name": "Acme",
  "intent": "renewal_call_prep",
  "constraints": {
    "read_only": true
  },
  "needed_outputs": [
    "open_opportunities",
    "renewal_risks",
    "recent_cases",
    "talking_points"
  ]
}</code></code></pre><p>The model can infer the need for Salesforce because the harness advertises Salesforce as a capability. The schema and policy code decide whether the request is valid enough to execute.</p><p>That distinction is important. I would not want a broad, vague &#8220;Salesforce access&#8221; tool. I would want a defined set of Salesforce capabilities with input schemas, output schemas, auth mode, risk level, approval policy, and eval coverage.</p><h2>Where The Inference Happens</h2><p>There are two places inference can happen.</p><p>The Slack-facing agent can infer that the user&#8217;s request belongs to Salesforce:</p><pre><code><code>Slack request
-&gt; Eve interprets the intent
-&gt; Eve builds a structured Salesforce task
-&gt; Eve calls the Salesforce capability</code></code></pre><p>The Salesforce side can also perform domain-specific inference:</p><pre><code><code>Structured Salesforce task
-&gt; Salesforce agent/action/flow decides which records matter
-&gt; Salesforce returns a structured result</code></code></pre><p>That split feels right to me.</p><p>The outer agent should understand the human workflow, the Slack context, the approval expectations, and the enterprise capability registry. Salesforce should understand Salesforce.</p><p>For a mature enterprise environment, I would expect both layers to have some kind of registry.</p><p>The outer registry might know:</p><pre><code><code>Capability: Salesforce account brief
Owner: Revenue systems
Auth mode: user OAuth
Risk: read-only
Input schema: account, intent, constraints
Output schema: summary, risks, next actions
Eval coverage: CRM routing and no-write behavior</code></code></pre><p>The Salesforce-side registry might know:</p><pre><code><code>Agent/action: renewal risk analyst
Agent/action: pipeline hygiene reviewer
Agent/action: account update proposer
Agent/action: follow-up drafter</code></code></pre><p>That keeps the Slack-facing agent from needing to know every Salesforce-native detail. It only needs to know how to ask Salesforce for bounded work.</p><h2>The Nested Human Input Problem</h2><p>The most interesting question came up when I was thinking about Salesforce agents.</p><p>What if the Salesforce-side agent starts its own execution and then needs a human to answer something?</p><p>For example:</p><pre><code><code>User: Prep me for my Acme renewal call.</code></code></pre><p>The Slack-facing agent delegates to Salesforce. The Salesforce-side agent finds two relevant opportunities and needs the user to choose one before it can continue.</p><p>The human is in Slack. The Salesforce agent is somewhere downstream. The outer Eve workflow is waiting on the Salesforce result.</p><p>That requires interrupt propagation: the downstream pause has to travel back up through the outer workflow to the surface where the human actually is.</p><p>The Salesforce-side agent would need to return something like:</p><pre><code><code>{
  "status": "input_required",
  "prompt": "Which opportunity should I use?",
  "options": [
    { "id": "opp_1", "label": "Acme Renewal FY26" },
    { "id": "opp_2", "label": "Acme Expansion" }
  ],
  "resume_handle": "opaque-salesforce-run-token"
}</code></code></pre><p>Then the Slack-facing agent would translate that into a Slack prompt:</p><pre><code><code>Which opportunity should I use?

[Acme Renewal FY26]
[Acme Expansion]</code></code></pre><p>After the human answers, the outer agent sends the response back to Salesforce:</p><pre><code><code>{
  "resume_handle": "opaque-salesforce-run-token",
  "response": {
    "option_id": "opp_1"
  }
}</code></code></pre><p>Salesforce resumes its run, completes the task, and returns the final result to the outer agent. The outer agent summarizes the result back in Slack.</p><p>That flow is more complicated than a simple tool call, but it is probably common in real enterprise workflows.</p><p>The downstream system may need clarification, approval, disambiguation, or missing context. The user may only be available through Slack or Teams. The orchestration layer has to carry that interruption across system boundaries without losing identity, state, or auditability.</p><h2>Who Should Own The Human Surface?</h2><p>My bias is that the outer agent should own the user surface.</p><p>If the workflow starts in Slack, the human input should usually come back through Slack. If the workflow starts in Teams, it should come back through Teams.</p><p>That does not mean the outer agent owns every decision. Salesforce can still decide that it needs the user to choose an opportunity. The Salesforce-side agent can still generate the prompt and options. The outer agent owns the translation back to the human surface and the resume call afterward.</p><p>The split becomes:</p><pre><code><code>Salesforce owns the domain pause.
Eve owns the user-surface pause.</code></code></pre><p>That is the kind of boundary I would want to be explicit about in an enterprise architecture.</p><p>Otherwise, you can end up with multiple systems trying to own the human interaction at the same time. One system asks in Slack. Another sends an email. Another opens an approval in Salesforce. The user experiences one workflow as three disconnected prompts.</p><h2>What This Opens Up</h2><p>If this layer is designed well, it opens up a useful pattern.</p><p>A Slack-facing agent can become the outer coordination surface. It can understand the user&#8217;s request, choose the right enterprise capability, pass a structured task to a domain system, surface any human input back to the same thread, and continue the workflow after the answer.</p><p>The downstream systems can keep their own domain logic.</p><p>Salesforce can understand accounts, opportunities, renewals, cases, and flows. Jira can understand issues and projects. ServiceNow can understand incidents and approvals. A warehouse agent can understand metrics and query plans.</p><p>The outer loop does not have to absorb every domain.</p><p>It needs a safe way to delegate, pause, resume, and audit.</p><h2>What I Would Look For In A Platform</h2><p>This is where my evaluation criteria would move beyond model quality.</p><p>For enterprise agent infrastructure, I would want to know:</p><pre><code><code>How are human identities mapped across channels?
How are user tokens stored and refreshed?
Can a connector use user OAuth or an integration user?
Can the agent park while OAuth completes?
Can a downstream system request human input?
Can that request surface back to Slack or Teams?
Can the downstream run resume after the human responds?
Are resume handles opaque and protected?
Can we trace the full path across systems?
Can we write evals against the orchestration path?</code></code></pre><p>These questions are dry, but they are the ones that determine whether an agent can survive outside a demo.</p><h2>What I Am Taking Away</h2><p>The first Eve post was about the managed loop. This one is about what the loop has to carry.</p><p>For enterprise workflows, the loop has to carry more than prompts and tool calls. It has to carry identity, authorization state, approval state, downstream interruptions, and enough metadata to audit what happened later.</p><p>That is why connectors matter.</p><p>They are part of the control plane around the agent. They help determine who the agent is acting for, what the agent is allowed to do, and how the workflow resumes when a system needs the human again.</p><p>The more I think about this, the more I think the enterprise agent stack has to be evaluated at the boundaries:</p><pre><code><code>human to agent
agent to tool
agent to downstream agent
downstream agent back to human
human response back into the run</code></code></pre><p>Those boundaries are where the interesting failures will happen.</p><p>The agent loop matters. The harness matters. But the connectors, identity model, and interrupt protocol may be what decide whether these systems become usable in real enterprise workflows.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Building With Vercel Eve: What I Learned About Agent Loops, Slack, and Enterprise Harnesses]]></title><description><![CDATA[I spent the weekend wiring up Vercel's new agent framework to Slack and Notion. 
The useful takeaway was less about building another bot, and more about understanding where managed agent loops may fit in enterprise architecture.]]></description><link>https://www.anothercodingblog.com/p/building-with-vercel-eve-what-i-learned</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/building-with-vercel-eve-what-i-learned</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Mon, 22 Jun 2026 12:00:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/651a9ef8-9edf-4d3f-b5bf-a60a80dd59b8_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Introduction</h2><p>I spent part of this weekend building with Vercel&#8217;s new agent framework, Eve.</p><p>My goal was pretty simple. I wanted to understand how it actually worked by wiring up a real workflow instead of only reading the announcement. I have been spending a lot of time thinking about agents lately, especially how they fit into enterprise environments where security, reliability, approvals, identity, and auditability matter just as much as model quality.</p><p>The first idea was straightforward: build a Slack-based content workflow. A user messages an agent in Slack, the agent drafts a LinkedIn post, runs it through a style check, asks for approval, and then writes the finished draft to Notion.</p><p>I kept the workflow small on purpose. I wanted something I could finish in a weekend, while still touching the pieces that matter: Slack as the user surface, a model behind the agent loop, Notion as a downstream system, OAuth, human approval, evals, and deployment.</p><p>After getting the full workflow running, the part I kept thinking about was the agent loop.</p><h2>The Two Agent Patterns I Had In My Head</h2><p>Before building with Eve, I had been thinking about agents in two broad patterns.</p><p>The first pattern is closer to a graph or workflow. 
There is a start node, a set of steps, maybe some branching, maybe fan-out, maybe a critic step, and eventually an end node. This is useful when the shape of the work is mostly known ahead of time. You can model the process, test the transitions, and reason about the execution path.</p><p>The second pattern is closer to a generalist agent inside a harness. The model has instructions, tools, context, permissions, memory or state, and a runtime. The developer does not script every step. The agent reasons, calls a tool, observes the result, and decides what to do next. It may ask a question, hit an approval gate, or delegate to another agent.</p><p>Both patterns have a place.</p><p>If I know the process, I generally want the graph. If the agent needs to interpret a messy user request, recover from missing context, pick the right tool, or decide whether it needs a human, the generalist loop starts to make more sense.</p><p>What I wanted to understand was where Eve fits.</p><h2>What I Built</h2><p>I started with Vercel&#8217;s Eve content agent template and turned it into a Slack-first content assistant.</p><p>The workflow looked like this:</p><ol><li><p>I send a message to the Eve bot in Slack.</p></li><li><p>Eve receives the Slack event through a Vercel Connect Slack connector.</p></li><li><p>The agent interprets the request and drafts a LinkedIn post.</p></li><li><p>It loads the LinkedIn style skill.</p></li><li><p>It calls a deterministic lint tool before showing the draft.</p></li><li><p>I iterate in the Slack thread.</p></li><li><p>When I say the draft is ready to publish, Eve requests approval.</p></li><li><p>Slack renders the approval step.</p></li><li><p>After approval, Eve writes the draft to a Notion database.</p></li><li><p>Eve replies in Slack with the Notion page link.</p></li></ol><p>The actual file structure was pretty clean:</p><pre><code><code>agent/
  agent.ts
  instructions.md
  channels/
    slack.ts
  connections/
    notion.ts
  tools/
    lint_against_style.ts
    request_publish_approval.ts
  skills/
    linkedin-style/
    blog-style/
    newsletter-style/
    release-notes-style/</code></code></pre><p>That structure is a big part of how Eve works. The agent is authored as a directory. The model config lives in <code>agent.ts</code>, the standing behavior lives in <code>instructions.md</code>, Slack lives in <code>channels/</code>, Notion lives in <code>connections/</code>, tools live in <code>tools/</code>, and reusable procedures or style guides live in <code>skills/</code>.</p><p>In this template, I was working with one deployed Eve agent rather than a universal agent router.</p><p>That distinction matters.</p><p>When I message the bot in Slack, the event routes to this deployed Eve app. Inside that app, the model sees the harness I authored and decides which available capability to use.</p><h2>The Harness Is Where The Engineering Shows Up</h2><p>The word &#8220;harness&#8221; has become useful for me here.</p><p>The harness is the bounded environment around the model:</p><pre><code><code>instructions
tools
connections
skills
auth
approvals
state
evals
channels</code></code></pre><p>In my case, the harness said:</p><pre><code><code>You are a Slack-first content assistant.
You can draft LinkedIn posts.
You can load the LinkedIn style skill.
You can lint drafts.
You can ask for publish approval.
You can write to Notion as the signed-in user.
You should not publish before approval.</code></code></pre><p>That is where most of the meaningful engineering work happened. The model mattered. Most of the leverage came from the environment the model was allowed to operate inside.</p><p>That is also where the enterprise questions start to show up.</p><p>What can the agent see? What can it do? Whose credentials does it use? Which actions require approval? What gets logged? What happens if the workflow pauses? What happens if OAuth is missing? What does a successful run look like? How do we test that behavior later?</p><p>Those questions are harder than &#8220;can the model draft a LinkedIn post?&#8221;</p><h2>The Loop</h2><p>The most interesting part of Eve for me was the managed loop.</p><p>A deterministic automation might look like this:</p><pre><code><code>receive input
run step A
run step B
run step C
return output</code></code></pre><p>An agent loop is different:</p><pre><code><code>receive input
call the model
model requests a tool
run the tool
append the result
call the model again
maybe ask the human
maybe wait for OAuth
maybe call another tool
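eventually return output</code></code></pre><p>In code, the mental model is something like:</p><pre><code><code>// Mental model only: model, runTool, and the park helpers stand in for the runtime.
let done = false;

while (!done) {
  // Each turn, the model sees the full history plus the harness around it.
  const next = await model({
    history,
    instructions,
    tools,
    connections,
    state,
  });

  // Tool request: run it, append the result, and let the model observe it.
  if (next.toolCall) {
    const result = await runTool(next.toolCall);
    history.push(result);
    continue;
  }

  // Human input: park the run until the reply arrives, then go around again.
  if (next.needsHumanInput) {
    await parkUntilHumanResponds();
    continue;
  }

  // Missing authorization: park until the OAuth flow completes.
  if (next.needsOAuth) {
    await parkUntilAuthorizationCompletes();
    continue;
  }

  // No tool call and nothing to wait on: the model has produced its final output.
  done = true;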
}</code></code></pre><p>With Eve, I did not write that loop directly. I authored the capabilities around the loop. Eve handled the runtime behavior: session creation, step execution, tool results, parking, resuming, and streaming events back to Slack.</p><p>That was the part that started to feel important from an enterprise architecture perspective.</p><p>If a company has many Slack or Teams-based agents, we probably do not want every team rebuilding this loop layer from scratch. We do not want every team solving durable state, OAuth pauses, approval rendering, event streams, tool call history, retries, and observability in a slightly different way.</p><p>We probably want the loop layer to be managed by a platform.</p><p>Any platform in this category has to be robust enough for us to shape the harness around it.</p><h2>What Worked</h2><p>The Slack path worked first.</p><p>I added a simple <code>status</code> command that bypassed the model and responded directly from the Slack channel handler. That helped prove the connector path before worrying about the model:</p><pre><code><code>Slack -&gt; Vercel Connect trigger -&gt; /eve/v1/slack -&gt; Eve channel -&gt; Slack thread reply</code></code></pre><p>After setting up Vercel AI Gateway credits and choosing <code>deepseek/deepseek-v4-pro</code>, the model-backed path worked too. Eve could receive a Slack message, draft the LinkedIn post, call the lint tool, and reply in thread.</p><p>The publishing path was more interesting.</p><p>I added a custom <code>request_publish_approval</code> tool. The tool itself does not publish anything. It creates a human approval checkpoint before the Notion write. Slack renders that approval step, Eve parks the workflow, and once I approve, the workflow resumes and continues to Notion.</p><p>That made the workflow feel much more real.</p><p>There is a big difference between telling an agent, &#8220;ask before publishing,&#8221; and putting an approval-gated tool in the execution path. 
The first is an instruction. The second is part of the harness.</p><h2>What Failed First</h2><p>The first few failures were useful.</p><p>The model path was blocked until Gateway credits were enabled. I treated that as setup friction rather than a framework failure. The Slack connector worked. The deployed route worked. The model call was waiting on account setup.</p><p>The Notion path also surfaced a useful identity distinction. The project had a Notion connector configured, but the Slack user still needed to complete the per-user Notion authorization flow. That is exactly the kind of thing that shows up in real enterprise systems.</p><p>There is a difference between:</p><pre><code><code>This application has a connector configured.</code></code></pre><p>and:</p><pre><code><code>This user has authorized this connector and can act through it.</code></code></pre><p>That distinction matters for Salesforce, Jira, GitHub, ServiceNow, Snowflake, Databricks, and almost every other enterprise system you might connect to an agent.</p><p>After re-verifying Notion, Eve was able to search for the draft target, find the correct Notion database, request publish approval in Slack, and create the final page.</p><h2>Evals Were More Useful Than I Expected</h2><p>I also added a few Eve evals.</p><p>They were simple, but they caught real behavior gaps:</p><pre><code><code>If the user asks for a LinkedIn draft, call the lint tool.
If the user says not to publish, do not request publish approval.
If the user asks to publish to Notion, park on the approval tool first.
If the surface is ambiguous, ask which surface to draft for.
If publishing, include the Notion target in the approval request.</code></code></pre><p>This is another enterprise takeaway for me. If the agent loop is going to be managed, the evaluation layer still has to be ours. The platform can expose the event stream and test harness, but we have to define what good behavior looks like.</p><p>For this little content workflow, that meant testing for tool calls and approval behavior.</p><p>For an enterprise workflow, that might mean:</p><pre><code><code>Did it call Salesforce for CRM requests?
Did it avoid Salesforce for unrelated requests?
Did it use read-only mode when required?
Did it ask for approval before writes?
Did it preserve the requesting user's identity?
Did it produce an audit trail?
Did it stop when the task was complete?</code></code></pre><p>That is a different way of thinking about evals. I started grading the path through the loop alongside the final answer.</p><h2>Where This Fits With Other Agent Patterns</h2><p>This experiment also helped me separate a few concepts that are easy to blend together.</p><p>Claude Code feels like a generalist coding assistant inside a very rich harness. It can move through context, files, tools, and commands in a way that feels closer to an open-ended agent loop.</p><p>Claude Managed Agents are a different pattern. They are closer to hosted managed agents with persistent memory, a vault, MCP, subagents, and supervisor-style delegation.</p><p>LangGraph is useful when I want to explicitly design the graph. I can define nodes, edges, state transitions, routing logic, retries, and termination criteria.</p><p>Eve, at least through this template, feels like a framework for packaging one agentic workflow as a deployed app. It gives you the loop, the channel surface, the connection model, the approvals, and the eval surface. You still design the architecture.</p><p>If I wanted one Slack bot to route across three different agents, I would still need to build that supervisor pattern. Eve would not magically route across all of my deployed agents through Vercel AI Gateway. AI Gateway handles model access. Agent routing is part of the architecture I would design.</p><p>That is an important practical point.</p><p>Eve still leaves plenty of agent architecture to design. What it reduces is the amount of loop infrastructure I would have to build around that architecture.</p><h2>Connectors And Identity</h2><p>The connector layer may be one of the most important pieces for enterprise use.</p><p>With a naive agent integration, it is very easy to end up with a shared API key or service account in an environment variable. 
That works for demos, but it gets uncomfortable quickly.</p><p>In Eve, the Notion connection used Vercel Connect. That means the agent can hit an OAuth boundary, park the workflow, ask the user to sign in, resume after authorization, and then call Notion with a token the model never sees.</p><p>That pattern opens up the more interesting enterprise version:</p><pre><code><code>Human identity: who invoked the agent in Slack
Runtime identity: the Vercel deployment executing the workflow
Agent identity: the named workflow or bot performing the action
Downstream authority: either a user OAuth token or an integration user</code></code></pre><p>For Salesforce, this could mean a few different designs.</p><p>The agent might call Salesforce through REST or SOQL using a user-scoped OAuth token. It might call a Salesforce MCP server. It might call an OpenAPI connection. It might call a custom tool we write. It might eventually call a Salesforce-native agent or Agentforce-style endpoint if that is exposed through an API.</p><p>The important part is that the auth model and capability boundary are explicit. I would not describe the agent as simply having Salesforce. I would describe a defined Salesforce capability, a known auth mode, a schema, a risk level, and a policy around approval.</p><h2>Nested Human Input</h2><p>One of the more interesting questions that came out of this is what happens when a downstream agent needs human input.</p><p>Imagine a Slack-facing Eve agent delegates to a Salesforce agent. The Salesforce agent starts working, but then it needs the user to choose between two opportunities before it can continue.</p><p>That is a nested loop problem.</p><p>The Salesforce-side agent needs human input, but the human is sitting in Slack. Something has to translate that interruption across the boundary.</p><p>The contract might look like this:</p><pre><code><code>{
  "status": "input_required",
  "prompt": "Which opportunity should I use?",
  "options": [
    { "id": "opp_1", "label": "Acme Renewal FY26" },
    { "id": "opp_2", "label": "Acme Expansion" }
  ],
  "resume_handle": "opaque-salesforce-run-token"
}</code></code></pre><p>Eve would then render the question in Slack, park the outer workflow, collect the user&#8217;s answer, and send the response back to Salesforce with the resume handle.</p><p>That is probably a second post by itself, because it gets into interrupt propagation across agent loops. But it is the same basic lesson: once agents move into enterprise workflows, the hard parts are state, identity, permissions, interruptions, and auditability.</p><h2>What I Am Taking Away</h2><p>After building with Eve, I am thinking about enterprise agent platforms in a slightly different way.</p><p>There is still plenty of architecture to design. We still need to decide when to use deterministic graphs, when to use generalist loops, how to expose tools, how to design supervisors, how to evaluate behavior, and how to enforce permissions.</p><p>But I am more convinced that the loop layer is something many teams will not want to build over and over again.</p><p>The valuable platform layer is the one that can:</p><pre><code><code>receive work from Slack or another modality
run a model-tool loop
checkpoint state
resolve user-scoped auth
pause for OAuth or human input
resume later
avoid duplicate side effects
emit useful traces
support evals against the actual behavior</code></code></pre><p>Then the enterprise work is to build the harness around that loop:</p><pre><code><code>tools
connectors
schemas
permissions
approvals
policies
subagents
evals
audit expectations</code></code></pre><p>That is the part I would watch as these frameworks mature.</p><p>Before adopting one of these frameworks, I would still ask whether the agent can complete the task. I would also ask whether the platform gives us enough control to build the harness around the loop, inject our enterprise capability registry, and test the orchestration path safely.</p><p>That is what this Eve experiment clarified for me.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 77]]></title><description><![CDATA[The Anthropic shutdown became a global crisis, then eased. SpaceX bought Cursor for $60 billion. A Nobel laureate joined Anthropic. Open Chinese models progress. 
An AI improved a real drug reaction.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-ff6</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-ff6</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 20 Jun 2026 12:56:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a5cfc69d-cc87-4fb0-b38a-2381e14d576b_2400x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Anthropic shutdown became the story of the week.</h2><ul><li><p><strong>Amazon&#8217;s CEO reportedly triggered the ban.</strong> Andy Jassy told Treasury officials that Amazon researchers used <a href="https://techcrunch.com/2026/06/13/amazon-ceo-reportedly-raised-anthropic-model-concerns-before-government-crackdown/">Claude Fable 5 to obtain information usable in cyberattacks</a>, per the WSJ, The Information, and Reuters.</p></li><li><p><strong>The rationale shifted from jailbreak to suspected breach.</strong> The Verge reported the White House <a href="https://www.theverge.com/ai-artificial-intelligence/949644/china-white-house-anthropic-mythos">suspected a China-linked group had accessed Mythos</a>, a sharper reason than the &#8220;narrow jailbreak&#8221; first cited.</p></li><li><p><strong>76 cybersecurity veterans called the ban dangerous.</strong> An <a href="https://techcrunch.com/2026/06/15/cybersecurity-vets-protest-dangerous-us-government-ban-on-anthropics-most-powerful-models/">open letter</a> from Alex Stamos, Jon Callas, Paul Vixie, and others said it strips defenders of the best tools while the same capability exists in GPT-5.5 and Anthropic&#8217;s own Opus 4.8.</p></li><li><p><strong>At the G7, Macron and Modi demanded an off-switch guarantee.</strong> World leaders <a href="https://techcrunch.com/2026/06/17/world-leaders-want-american-ai-they-just-dont-want-america-to-be-able-to-turn-it-off/">warned the US could cut their 
access to American models at any time</a>.</p></li><li><p><strong>By Friday, Trump softened his stance.</strong> He <a href="https://www.reuters.com/world/us/trump-tells-axios-he-no-longer-views-anthropic-national-security-threat-2026-06-19/">told Axios he no longer views Anthropic as a national security threat</a>, though the models stayed offline and only early Mythos testers kept access.</p></li><li><p><strong>The thread:</strong> a verbal safety claim escalated into a sovereign-AI crisis, and by week&#8217;s end the political case had eroded even though the models were still dark. The lasting effect is not the ban itself but the lesson every government and enterprise drew from it: do not depend on a model someone in Washington can switch off. Sales data even suggests the feud <a href="https://techcrunch.com/2026/06/16/anthropics-latest-feud-with-the-trump-admin-may-actually-help-it-sales-data-suggests/">helped Anthropic more than it hurt</a>.</p></li></ul><div><hr></div><h2>The money and talent flywheel spun faster.</h2><ul><li><p><strong>SpaceX bought Cursor for $60 billion in stock.</strong> Days after the largest IPO in history, it used the new currency to <a href="https://techcrunch.com/2026/06/16/spacex-to-acquire-cursor-for-60b-in-stock-days-after-blockbuster-ipo/">buy its way toward the AI frontier</a>.</p></li><li><p><strong>A Gemini co-lead joined OpenAI for a reported $2.7 billion.</strong> OpenAI <a href="https://techcrunch.com/2026/06/18/openai-is-bringing-on-some-big-guns-in-the-lead-up-to-its-ipo/">hired Noam Shazeer</a> ahead of its IPO, while research chief <a href="https://www.theverge.com/ai-artificial-intelligence/952837/barret-zoph-openai-thinking-machines-lab">Barret Zoph left again</a> after five months.</p></li><li><p><strong>A Nobel laureate left Google for Anthropic.</strong> AlphaFold co-creator <a href="https://x.com/JohnJumperSci/status/2068001285173834106">John Jumper announced he is joining Anthropic</a>, the same week Washington 
was circling the company.</p></li><li><p><strong>The economics got harder to ignore.</strong> OpenAI <a href="https://www.reuters.com/business/openai-burned-37-billion-first-quarter-2026-information-reports-2026-06-16">burned $3.7 billion in Q1</a>, and ChatGPT&#8217;s assistant share <a href="https://techcrunch.com/2026/06/16/chatgpts-market-share-slips-below-50-for-first-time/">slipped below 50% for the first time</a>.</p></li><li><p><strong>Salesforce bought Fin for $3.6 billion</strong> to bolster <a href="https://techcrunch.com/2026/06/15/salesforce-acquires-ai-customer-service-platform-fin-for-3-6b/">its Agentforce platform</a>.</p></li><li><p><strong>The thread:</strong> the IPO-minted paper is buying talent and companies at a frantic pace, even as the category leader loses share and bleeds cash. 
The valuations and the fundamentals are moving in opposite directions.</p></li></ul><div><hr></div><h2>Open weights became the hedge against American AI.</h2><ul><li><p><strong>An open Chinese model matched GPT-5.5 at a fraction of the price.</strong> Z.ai&#8217;s <a href="https://simonwillison.net/2026/Jun/17/glm-52/">GLM-5.2</a> ties GPT-5.5 on coding benchmarks at roughly a sixth of the cost under an unrestricted MIT license, which Simon Willison called probably the most powerful open-weights LLM available.</p></li><li><p><strong>China kept building its own supply.</strong> ByteDance is in talks to buy <a href="https://www.reuters.com/world/china/bytedance-talks-with-chinas-iluvatar-corex-purchase-ai-chips-sources-say-2026-06-15/">at least 50,000 chips from Chinese startup Iluvatar CoreX</a>, and China <a href="https://www.reuters.com/world/china/china-tightens-indium-export-checks-ai-demand-increases-2026-06-19/">tightened export checks on indium</a> as demand rose.</p></li><li><p><strong>Andrew Ng named the dynamic.</strong> In <a href="https://www.deeplearning.ai/the-batch/issue-358">The Batch</a>, he argued that controlling who can use frontier models is now a form of power, one that pushes other nations to build their own.</p></li><li><p><strong>The thread:</strong> the shutdown was meant to project control. 
It instead made the case for models no government can revoke, and the strongest of those is now Chinese and free to download.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: An AI model improved a real drug-chemistry reaction.</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NCuS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NCuS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 424w, https://substackcdn.com/image/fetch/$s_!NCuS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 848w, https://substackcdn.com/image/fetch/$s_!NCuS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!NCuS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NCuS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp" width="1192" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Labeled glass reaction vials from Molecule.one bench-scale validation experiments.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Labeled glass reaction vials from Molecule.one bench-scale validation experiments." title="Labeled glass reaction vials from Molecule.one bench-scale validation experiments." srcset="https://substackcdn.com/image/fetch/$s_!NCuS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 424w, https://substackcdn.com/image/fetch/$s_!NCuS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 848w, https://substackcdn.com/image/fetch/$s_!NCuS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 1272w, https://substackcdn.com/image/fetch/$s_!NCuS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8ce4afe-48a8-4f79-bd2a-8c0d830e98e2_1192x800.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The week&#8217;s most novel result was not a chatbot or a funding round. It happened at a lab bench.</p><p>OpenAI, working with the automated chemistry lab Molecule.one, had <a href="https://openai.com/index/ai-chemist-improves-reaction/">GPT-5.4 propose and validate a new additive</a> for a Chan-Lam coupling, a notoriously finicky reaction in medicinal chemistry. The model suggested using TEMPO, a choice human chemists found counterintuitive, and it raised yields across more than 80% of tested substrates. 
The result was confirmed at the bench across 10,080 individual reactions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!df_G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!df_G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 424w, https://substackcdn.com/image/fetch/$s_!df_G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 848w, https://substackcdn.com/image/fetch/$s_!df_G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!df_G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!df_G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png" width="1456" height="1077" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1077,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/202837417?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!df_G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 424w, https://substackcdn.com/image/fetch/$s_!df_G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 848w, https://substackcdn.com/image/fetch/$s_!df_G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!df_G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c218ebb-8c94-4e9a-881d-ab6371bc1c5a_1731x1280.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What makes this different from the usual &#8220;AI for science&#8221; announcement is that the model contributed the hypothesis, not just the literature search. It proposed something a trained chemist would not obviously try, and the wet-lab data backed it up. This is still narrow and human-steered: chemists framed the problem, ran the reactions, and checked the work. But &#8220;the model proposed a surprising idea that worked in the real world&#8221; is a categorically stronger claim than &#8220;the model summarized the field,&#8221; and it is the kind of result that turns AI from a writing tool into a research collaborator.</p><div><hr></div><h2><strong>&#128736;&#65039; </strong>For Builders</h2><p>The competition kept moving off the base model and onto the tooling. 
The week&#8217;s most useful shipping, in one place:</p><ul><li><p><strong>Anthropic shipped a product slate while under fire.</strong> Claude Code added <a href="https://claude.com/blog/artifacts-in-claude-code">shareable Artifacts</a> (live dashboards and PR walkthroughs at a private link), Claude Design got <a href="https://claude.com/blog/claude-design-stays-on-brand-for-daily-work">a major overhaul with design-system imports and two-way /design-sync</a> that pushes it into Figma and Canva territory, and a <a href="https://claude.com/blog/steering-claude-code-skills-hooks-rules-subagents-and-more">steering guide</a> laid out skills, hooks, and subagents.</p></li><li><p><strong>Agent identity and enterprise plumbing matured.</strong> MCP gained <a href="https://claude.com/blog/enterprise-managed-auth">enterprise-managed OAuth</a> and <a href="https://claude.com/blog/workload-identity-federation">Workload Identity Federation</a> for short-lived credentials, while NewCore left stealth with <a href="https://techcrunch.com/2026/06/15/ai-agents-are-becoming-employees-newcore-emerges-with-66m-to-give-them-identities/">$66 million to give agents managed identities</a>.</p></li><li><p><strong>The clouds and frameworks shipped too.</strong> Vercel launched its <a href="https://vercel.com/blog/agent-stack">Agent Stack</a>, Hugging Face, Microsoft, and Google proposed <a href="https://huggingface.co/blog/agentic-resource-discovery-launch">Agentic Resource Discovery</a>, AWS made <a href="https://aws.amazon.com/blogs/machine-learning/introducing-web-search-on-amazon-bedrock-agentcore/">web search GA on Bedrock AgentCore</a> and previewed <a href="https://aws.amazon.com/blogs/machine-learning/context-intelligence-for-your-data-and-ai-agents-at-scale/">a knowledge-graph Context service</a>, and GitHub&#8217;s HyDRA router <a href="https://github.blog/ai-and-ml/github-copilot/getting-more-from-each-token-how-copilot-improves-context-handling-and-model-routing/">cut Copilot 
costs 72.5%</a>.</p></li><li><p><strong>Mind the soft spot.</strong> Roughly <a href="https://venturebeat.com/security/7000-langflow-servers-under-attack-langgraph-langchain-same-holes">7,000 Langflow, LangGraph, and LangChain servers came under attack</a>, a reminder that the agent stack&#8217;s attack surface is growing with it.</p></li><li><p><strong>The thread:</strong> identity, context, discovery, and cost are where the real work is now, and Anthropic, under the most political pressure of any lab, still had the busiest builder week. Pick tools you can run and swap, because the base model is increasingly the easy part.</p></li></ul><div><hr></div><h2><strong>&#127897;&#65039; </strong>Worth a Watch</h2><div id="youtube2-9YMYVb1ASCg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;9YMYVb1ASCg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/9YMYVb1ASCg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong><a href="https://www.youtube.com/watch?v=9YMYVb1ASCg">The human competitive edge in an AI world</a></strong> | Simon Sinek&#8217;s &#8220;A Bit of Optimism&#8221; with Wharton&#8217;s Ethan Mollick, a rare pragmatist between the doomers and zealots. The useful parts:</p><ul><li><p><strong>Taste is the edge.</strong> If Claude runs your company well, it runs every company well, and generically high quality with no variation means no competitive advantage. Humans win by supplying variation, judgment, and taste.</p></li><li><p><strong>The apprenticeship model just broke.</strong> Juniors now know less than the model, so managers delegate to the AI instead of training people. 
The quiet risk is losing the talent pipeline entirely.</p></li><li><p><strong>Experience beats &#8220;AI native.&#8221;</strong> In a BCG study, junior employees were often worse with AI because they cannot judge whether the output is good. The more expertise you have, the better you direct it.</p></li><li><p><strong>All AI writing is converging on one voice.</strong> It leans on &#8220;it&#8217;s not X, it&#8217;s Y,&#8221; overuses em dashes, and reads the same everywhere, which makes your own voice the differentiator. Mollick&#8217;s tip: feed it a large sample of your writing, have it draft a style guide, and paste that into custom instructions.</p></li><li><p><strong>One number to sit with.</strong> On OpenAI&#8217;s GDPval test of real expert tasks, the best models now tie or beat humans about 84% of the time, up from 48% a year ago.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://techcrunch.com/2026/06/17/only-16-percent-of-americans-think-ai-will-have-a-positive-impact-on-society-a-new-study-shows/">Only 16% of Americans think AI will help society</a></strong> | Pew &#8212; 40% expect harm, yet 44% use ChatGPT.</p></li><li><p><strong><a href="https://venturebeat.com/technology/satya-nadella-warns-that-ai-could-hollow-out-entire-industries-echoing-the-damage-done-by-globalization">Satya Nadella warns AI could hollow out entire industries</a></strong> | VentureBeat &#8212; from the man selling the picks and shovels.</p></li><li><p><strong><a href="https://openai.com/index/diagnose-rare-childhood-diseases/">An OpenAI model helped diagnose rare childhood diseases</a></strong> | OpenAI &#8212; o3 Deep Research surfaced answers families waited years for, in NEJM AI.</p></li><li><p><strong><a href="https://deepmind.google/blog/securing-the-future-of-ai-agents/">DeepMind published an AI Control Roadmap</a></strong> | Google DeepMind &#8212; treating agents as potential insider threats.</p></li><li><p><strong><a 
href="https://www.anthropic.com/research/project-fetch-phase-two">Anthropic put Claude in a robodog</a></strong> | Anthropic &#8212; Opus 4.7 controlled a quadruped 20x faster than last year&#8217;s human teams.</p></li><li><p><strong><a href="https://huggingface.co/blog/allenai/molmomotion">AI2&#8217;s MolmoMotion lifted robot pick-and-place from 56% to 76%</a></strong> | Hugging Face &#8212; language-guided 3D motion forecasting.</p></li><li><p><strong><a href="https://news.mit.edu/2026/better-way-to-model-metal-alloys-behavior-0619">MIT&#8217;s curated training data beat much larger models from Google and Microsoft</a></strong> | MIT News &#8212; data quality over brute-force scale, for modeling metal alloys.</p></li><li><p><strong><a href="https://www.reuters.com/technology/norway-imposes-near-ban-ai-elementary-school-2026-06-19">Norway moved to nearly ban AI in elementary schools</a></strong> | Reuters &#8212; one of the strictest national stances yet.</p></li><li><p><strong><a href="https://calmatters.org/education/higher-education/2026/06/artificial-intelligence-cal-state-disputes">Cal State faculty are fighting to protect their jobs from AI</a></strong> | CalMatters &#8212; a union-backed bill could pass Monday.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/13/openai-faces-investigation-from-state-attorneys-general/">OpenAI drew a multi-state attorneys-general probe</a></strong> | TechCrunch &#8212; days after filing to go public.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/19/billionaire-ambani-wants-ai-in-every-call-app-and-home/">Reliance pushed AI to 500 million users</a></strong> | TechCrunch &#8212; Jio&#8217;s consumer AI blitz, with an IPO prospectus approved.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/18/ai-data-centers-just-got-a-government-mandated-fast-lane-to-the-grid/">AI data centers got a federal fast lane to the grid</a></strong> | TechCrunch &#8212; FERC ordered faster 
interconnections, paid by the data centers.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Fine-Tuning Field Notes #2: What We Actually Trained with QLoRA]]></title><description><![CDATA[The run used Qwen3 4B, a 32-example dataset, and QLoRA to train 33M adapter parameters while leaving the 4.05B-parameter base model frozen.]]></description><link>https://www.anothercodingblog.com/p/fine-tuning-field-notes-2-what-we</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/fine-tuning-field-notes-2-what-we</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Wed, 17 Jun 2026 11:23:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/902800f4-b43c-442e-b0c6-f821ff1fae75_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The phrase &#8220;fine-tuned a model&#8221; hides a lot.</p><p>It sounds like the whole model changed.</p><p>In this experiment, that is not what happened.</p><p>The base model had about 4.05 billion parameters. 
The training run updated about 33 million trainable parameters.</p><p>That worked out to roughly 0.81% of the model.</p><p>That number helped the whole process click for me.</p><p>We loaded a capable open-weight model, froze the original weights, attached a small set of trainable adapter weights, and trained those adapters on a narrow routing task.</p><p>The base model carried the general language capability.</p><p>The adapter learned the task behavior.</p><h2>The setup</h2><p>The experiment used:</p><pre><code><code>Base model: Qwen3 4B Instruct
Method: QLoRA
Dataset: 32 examples
Split: 24 train / 4 validation / 4 test
Trainable parameters: 33,030,144
Total parameters: 4,055,498,240
Percentage trained: 0.81%
</code></code></pre><p>The task was to classify internal AI requests into a governed routing schema.</p><p>Given a request like:</p><pre><code><code>A sales analyst wants to query curated opportunity data with SQL and build a dashboard for regional pipeline trends.
</code></code></pre><p>the model needed to return a structured decision:</p><pre><code><code>{
  "tier": "Trusted Technical Analyst",
  "risk": "medium",
  "recommended_platform": "Databricks + governed dashboarding",
  "needs_review": true,
  "reason": "The request requires SQL and governed dashboarding over business data."
}
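</code></code></pre><p>A schema this strict can be checked in code before a response is trusted. The following is a minimal Python sketch, not the checker used in this experiment: the function name <code>validate_decision</code> is hypothetical, the required fields are assumed from the example above, the tier names are the three that appear in this post, and <code>high</code> is an assumed third risk level.</p><pre><code><code>import json

# Assumed vocabulary: tier names appear in this post; "high" risk is a guess.
ALLOWED_TIERS = {
    "Trusted Functional Builder",
    "Trusted Technical Analyst",
    "Trusted Technical Builder",
}
ALLOWED_RISKS = {"low", "medium", "high"}
# Assumed required fields, taken from the example decision above.
REQUIRED_KEYS = {"tier", "risk", "recommended_platform", "needs_review", "reason"}


def validate_decision(raw_text):
    """Return a list of problems found in a model response (empty means valid)."""
    try:
        decision = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + str(exc)]
    if not isinstance(decision, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - set(decision)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if "tier" in decision and decision["tier"] not in ALLOWED_TIERS:
        problems.append("unknown tier: " + repr(decision["tier"]))
    if "risk" in decision and decision["risk"] not in ALLOWED_RISKS:
        problems.append("unknown risk: " + repr(decision["risk"]))
    if "needs_review" in decision and not isinstance(decision["needs_review"], bool):
        problems.append("needs_review must be true or false")
    return problems


# Usage: a response with a made-up tier and missing fields is reported, not accepted.
print(validate_decision('{"tier": "Analyst"}'))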
</code></code></pre><p>The base model was already close. It understood the words in the request. It understood SQL, dashboards, agents, Salesforce, ServiceNow, and support workflows.</p><p>The thing I wanted to change was the model&#8217;s operating behavior.</p><p>Use these tier names. Use these platform labels. Set review required when governed data or production writeback appears. Return the response as strict JSON.</p><p>That is where QLoRA came in.</p><h2>The base model</h2><p>The base model was Qwen3 4B Instruct.</p><p>That model already had general language capability. It could read the request, understand the business context, and produce a reasonable answer.</p><p>Before training, though, &#8220;reasonable&#8221; was not enough.</p><p>For a SQL/dashboard request, the base model returned something like:</p><pre><code><code>{
  "recommended_platform": "Databricks + Tableau",
  "needs_review": false
}
</code></code></pre><p>That answer made sense generically. It did not match the routing policy I wanted.</p><p>The desired behavior was:</p><pre><code><code>{
  "recommended_platform": "Databricks + governed dashboarding",
  "needs_review": true
}
</code></code></pre><p>This is the type of gap where fine-tuning can be useful.</p><p>The model does not need new facts. It needs a more specific response pattern.</p><h2>Full fine-tuning</h2><p>Full fine-tuning updates the original model weights.</p><p>If you fully fine-tune a 4 billion parameter model, those original parameters can move during training.</p><p>That gives the training process a lot of flexibility. It also requires more memory, more compute, and more care.</p><p>Training needs to track model weights, gradients, optimizer states, activations, and batches. Those memory demands stack up quickly.</p><p>A 4B model may sound small compared to frontier models, but full training is still heavy.</p><p>For this experiment, full fine-tuning would have been the wrong starting point.</p><p>The goal was to learn the loop, adapt behavior, run evals, and iterate quickly.</p><h2>LoRA</h2><p>LoRA stands for Low-Rank Adaptation.</p><p>The important idea is simple:</p><p>Freeze the base model. Add small trainable matrices inside selected layers. 
Train those matrices.</p><p>The original model weights stay in place.</p><p>The adapter learns a correction.</p><p>A simplified version looks like this:</p><pre><code><code>original layer output + LoRA update = adapted layer output
</code></code></pre><p>The model still uses the original Qwen weights. The LoRA adapter nudges some of the internal transformations.</p><p>This is why the saved artifact is much smaller than the full model.</p><p>The folder I saved was called:</p><pre><code><code>trusted_innovator_router_lora_v1
</code></code></pre><p>That folder is the trained adapter.</p><p>It is not a standalone 4B model. It has to be loaded with the base model to produce the fine-tuned behavior.</p><p>The practical model is:</p><pre><code><code>Qwen3 4B Instruct + trusted_innovator_router_lora_v1
</code></code></pre><p>That distinction matters.</p><p>If someone says they &#8220;fine-tuned a model with LoRA,&#8221; the artifact they trained may be an adapter, not a full copy of the model.</p><h2>Where the adapter goes</h2><p>LoRA adapters are inserted into selected modules inside the transformer.</p><p>For this run, the target modules were:</p><pre><code><code>q_proj
k_proj
v_proj
o_proj
gate_proj
up_proj
down_proj
</code></code></pre><p>The first group belongs to attention:</p><pre><code><code>q_proj
k_proj
v_proj
o_proj
</code></code></pre><p>Attention helps the model decide which tokens matter to each other.</p><p>In this routing task, the model needs to connect clues like:</p><pre><code><code>SQL
dashboard
curated data
Salesforce
ServiceNow
writeback
Glean
</code></code></pre><p>to the right output fields.</p><p>The second group belongs to the feed-forward part of the transformer, often called the MLP:</p><pre><code><code>gate_proj
up_proj
down_proj
</code></code></pre><p>MLP stands for multi-layer perceptron.</p><p>In older neural network language, an MLP is basically a stack of fully connected layers. In a transformer, the MLP is the part of each block that works on the representation after attention has mixed information across tokens.</p><p>A simple way to think about it:</p><pre><code><code>attention decides what information should be connected
MLP transforms that information into useful internal features
</code></code></pre><p>If attention helps the model notice that a request mentions SQL, dashboards, Glean, Salesforce, or writeback, the MLP helps turn those signals into higher-level behavior.</p><p>For this task, that behavior looks like:</p><pre><code><code>Glean-only request &#8594; Trusted Functional Builder
SQL/dashboarding &#8594; Trusted Technical Analyst
Salesforce writeback &#8594; Trusted Technical Builder
governed data &#8594; needs_review true
</code></code></pre><p>Modern Llama/Qwen-style models use MLP projections with names like <code>gate_proj</code>, <code>up_proj</code>, and <code>down_proj</code>.</p><p>A rough mental model:</p><pre><code><code>up_proj expands the representation
gate_proj controls what information passes through
down_proj compresses it back to the model dimension
</code></code></pre><p>That is not the full math, but it is enough to understand why LoRA adapters are often added there.</p><p>The adapter was inserted into both attention and MLP modules because the task required more than noticing keywords. The model needed to map those clues into a specific schema and policy.</p><p>Attention helped with the clues.</p><p>The MLP helped with the transformation from clues to routing behavior.</p><h2>Rank</h2><p>The LoRA config used:</p><pre><code><code>r = 16
</code></code></pre><p>The <code>r</code> value is the LoRA rank.</p><p>A higher rank gives the adapter more capacity. It trains more parameters and can learn more complex changes. It also uses more memory and can overfit more easily on a tiny dataset.</p><p>A lower rank gives the adapter less capacity. It is smaller and faster, but may not learn enough.</p><p>For this first run, <code>r = 16</code> was a reasonable starting point.</p><p>One thing I had to separate in my head:</p><pre><code><code>target_modules controls where LoRA is inserted.
r controls how much capacity each adapter has.
</code></code></pre><p>Those are different choices.</p><p>The target modules decide which parts of the model get trainable adapters.</p><p>The rank decides the size of those trainable updates.</p><h2>Quantization</h2><p>The &#8220;Q&#8221; in QLoRA comes from quantization.</p><p>Quantization stores the base model weights in a smaller numeric format.</p><p>Instead of loading the base model in 16-bit or 32-bit precision, we loaded it in 4-bit.</p><p>That was this setting:</p><pre><code><code>load_in_4bit = True
</code></code></pre><p>The model weights are still numbers. Quantization stores those numbers with less precision so the model uses less memory.</p><p>A rough way to think about it:</p><pre><code><code>FP32: more precision, more memory
FP16/BF16: less memory, common for GPU work
8-bit: compressed
4-bit: very compressed
</code></code></pre><p>The tradeoff is precision.</p><p>The benefit is memory.</p><p>For this experiment, 4-bit loading made it practical to run Qwen3 4B on a Tesla T4 and train the adapter.</p><p>The base model was compressed. The LoRA adapter was the small trainable part.</p><p>That is the QLoRA pattern:</p><pre><code><code>4-bit base model + trainable LoRA adapter
</code></code></pre><h2>What trained</h2><p>During training, the base model stayed frozen.</p><p>The adapter weights moved.</p><p>That is why the training summary mattered:</p><pre><code><code>Trainable parameters = 33,030,144 of 4,055,498,240
0.81% trained
</code></code></pre><p>The optimizer was not updating every Qwen parameter. It was updating the LoRA adapter weights attached to selected layers.</p><p>The dataset showed the model examples like:</p><pre><code><code>Marketing wants a Glean agent that summarizes campaign briefs and drafts LinkedIn posts.
</code></code></pre><p>with a target output like:</p><pre><code><code>{
  "tier": "Trusted Functional Builder",
  "risk": "low",
  "recommended_platform": "Glean",
  "needs_review": false
}
</code></code></pre><p>The training process compared the model&#8217;s predicted next tokens to the target tokens.</p><p>If the target said:</p><pre><code><code>Glean
</code></code></pre><p>and the model assigned more probability to:</p><pre><code><code>no-code agent builder
</code></code></pre><p>the adapter got nudged so that <code>Glean</code> became more likely in similar contexts.</p><p>If the target said:</p><pre><code><code>needs_review: true
</code></code></pre><p>for a SQL/dashboard request, the adapter got nudged toward that pattern.</p><p>Over many token-level corrections, the adapter learned the routing behavior.</p><h2>What did not train</h2><p>The model did not learn language from scratch.</p><p>It did not learn what SQL is from my dataset.</p><p>It did not learn what Salesforce is from my dataset.</p><p>It did not memorize an enterprise knowledge base.</p><p>The base model already brought general capability. The adapter learned how I wanted that capability expressed for a narrow task.</p><p>That distinction is important because fine-tuning is often described too broadly.</p><p>For this experiment, the fine-tune was about response behavior:</p><pre><code><code>exact tier names
exact platform names
review policy
structured JSON
routing consistency
</code></code></pre><p>That made it a good fit for adapter training.</p><p>If the task had been &#8220;answer questions from current internal documentation,&#8221; I would have reached for retrieval first.</p><h2>Why this matters</h2><p>The 0.81% number changed how I think about model adaptation.</p><p>It made the process feel less like rewriting a model and more like attaching a small policy-specific behavior layer.</p><p>That is powerful.</p><p>It also creates some practical constraints.</p><p>The adapter can shift behavior, but it is still sitting on top of the base model. If the dataset has gaps, the adapter will learn those gaps. If the examples over-associate writeback with Salesforce, the adapter may do the same. If the inference template is wrong, the learned behavior may not show up cleanly.</p><p>That showed up later in the eval.</p><p>The model correctly learned the broad routing policy, but it over-associated business-record writeback with Salesforce in one held-out test case.</p><p>That failure was not random. It reflected the data.</p><p>The adapter learned the examples. The next version needs better examples.</p><h2>The mental model I&#8217;m keeping</h2><p>After this run, this is how I think about QLoRA:</p><pre><code><code>Base model: general capability
Quantization: makes the base model fit in memory
LoRA adapter: small trainable behavior patch
Dataset: product spec
Training: token-level updates to the adapter
Eval: tells you what the adapter actually learned
</code></code></pre><p>That is a much clearer picture than &#8220;I fine-tuned a model.&#8221;</p><p>The model had 4.05 billion parameters.</p><p>We trained about 33 million.</p><p>That was enough to change the routing behavior, expose the importance of the dataset, and create a useful v1 adapter.</p><p>Next, I want to look more closely at the dataset itself.</p><p>The adapter learned the product spec we gave it.</p><p>It also learned the bias inside that spec.</p>]]></content:encoded></item><item><title><![CDATA[Fine-Tuning Field Notes #1: Training Worked. Inference Didn’t.]]></title><description><![CDATA[A QLoRA experiment showed that adapting model weights is only half the job. 
The rest is making the model reliably express what it learned.]]></description><link>https://www.anothercodingblog.com/p/fine-tuning-field-notes-1-training</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/fine-tuning-field-notes-1-training</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 14 Jun 2026 15:29:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1f4434c9-ba7b-44d8-b47e-c51b3d896431_1491x1055.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Fine-Tuning Field Notes is a series on practical LLM fine-tuning: datasets, adapters, inference, evals, and the failure loops that make the concepts real. Here is the first article of many.</p><p>The training run looked clean.</p><p>Qwen loaded. The dataset formatted. The LoRA adapters attached. Training loss dropped. Validation loss dropped. The adapter saved.</p><p>Then I asked the model to classify a request and it returned:</p><pre><code><code>assistant
assistant
assistant
assistant
assistant</code></code></pre><p>I didn&#8217;t expect that, but it was a useful failure.</p><p>The model had learned something. The system around it was still wrong.</p><h2>The use case</h2><p>Before getting into the training run, the use case matters.</p><p>This was not an attempt to teach the model new facts. I was not trying to load company documentation into the weights or turn a small model into a general-purpose enterprise assistant.</p><p>The task was narrower and more practical: classify internal AI requests into a governed operating model.</p><p>In an enterprise AI program, not every request should follow the same path. A simple Glean agent that summarizes internal documents is very different from a Databricks app that reads customer data and writes updates back to Salesforce. One is closer to no-code enablement. The other needs engineering review, governed data access, and production controls.</p><p>That made this a useful fine-tuning experiment.</p><p>The base model already understood the words in the request. The question was whether it could learn the specific routing policy:</p><ul><li><p>which requests belong in which tier</p></li><li><p>which platforms should be recommended</p></li><li><p>when review is required</p></li><li><p>how to return the answer in a strict JSON schema</p></li></ul><p>The goal was not to make the model smarter in general. The goal was to make it more consistent for one narrow decision.</p><h2>The job</h2><p>The goal was intentionally narrow.</p><p>I wanted to fine-tune an open model to sort stakeholders&#8217; AI build requests into a governed routing schema built around trusted innovation.</p><p>The input was a plain-English request. The output needed to be structured JSON.</p><p>Example input:</p><pre><code><code>A sales analyst wants to query curated opportunity data with SQL and build a dashboard for regional pipeline trends.</code></code></pre><p>Expected output:</p><pre><code><code>{
  "tier": "Trusted Technical Analyst",
  "risk": "medium",
  "recommended_platform": "Databricks + governed dashboarding",
  "needs_review": true,
  "reason": "The request requires SQL and governed dashboarding over business data."
}</code></code></pre><p>The base model was already capable. It understood SQL. It understood dashboards. It understood Salesforce, ServiceNow, internal tools, and business workflows.</p><p>The gap was more specific.</p><p>I wanted the model to learn a particular operating model:</p><ul><li><p>Glean-only requests should map to <code>Trusted Functional Builder</code>.</p></li><li><p>SQL, dashboards, and governed data should map to <code>Trusted Technical Analyst</code>.</p></li><li><p>Custom apps, integrations, autonomous workflows, and production writeback should map to <code>Trusted Technical Builder</code>.</p></li><li><p>Platform names should use controlled values.</p></li><li><p><code>needs_review</code> should follow the governance policy.</p></li><li><p>The output should be valid JSON.</p></li></ul><p>The baseline model got close, but it was not consistent enough.</p><p>For the SQL/dashboard example, the base model returned something like:</p><pre><code><code>{
  "recommended_platform": "Databricks + Tableau",
  "needs_review": false
}</code></code></pre><p>That is a reasonable answer in a generic enterprise context. It was not the answer I wanted for this operating model.</p><p>The desired response was:</p><pre><code><code>{
  "recommended_platform": "Databricks + governed dashboarding",
  "needs_review": true
}</code></code></pre><p>That distinction matters.</p><p>Fine-tuning here was not about teaching the model what SQL is. It was about teaching the model a specific vocabulary, schema, and review policy.</p><h2>The setup</h2><p>The experiment used:</p><pre><code><code>Base model: Qwen3 4B Instruct
Method: QLoRA
Dataset: 32 examples
Split: 24 train / 4 validation / 4 test
Trainable parameters: 33,030,144 of 4,055,498,240
Percentage trained: 0.81%</code></code></pre><p>That last number is the whole story.</p><p>I did not retrain the entire model.</p><p>The original Qwen weights stayed frozen. LoRA inserted small trainable adapter weights into selected parts of the model. QLoRA made the setup fit on a modest GPU by loading the base model in 4-bit.</p><p>The simple version:</p><pre><code><code>Quantized base model + trainable LoRA adapter = QLoRA fine-tuning</code></code></pre><p>The base model carried the general language and reasoning capability. The adapter learned the routing behavior.</p><p>This is one of the first concepts that clicked for me. The saved artifact was not a full standalone model. It was a behavior patch that gets loaded with the base model.</p><p>In this case:</p><pre><code><code>Qwen3 4B Instruct + trusted_innovator_router_lora_v1 = fine-tuned router behavior</code></code></pre><h2>Training looked healthy</h2><p>The training run completed in 30 steps across 5 epochs.</p><p>Loss moved in the right direction:</p><pre><code><code>Step 10: train loss 1.58 / validation loss 1.46
Step 20: train loss 0.51 / validation loss 0.47
Step 30: train loss 0.24 / validation loss 0.27</code></code></pre><p>That meant the adapter was getting better at predicting the target outputs token by token.</p><p>The model was not being graded on whether the final JSON &#8220;felt right.&#8221; It was being trained through next-token prediction.</p><p>Given the correct prefix, how likely was the model to predict the next correct token?</p><p>For example, if the correct output contained:</p><pre><code><code>"recommended_platform": "Glean"</code></code></pre><p>the training loop pushed the adapter to make <code>"Glean"</code> more likely in similar contexts.</p><p>If the correct output contained:</p><pre><code><code>"needs_review": true</code></code></pre><p>the adapter was nudged to make <code>true</code> more likely when the request involved SQL, governed data, dashboards, integrations, automation, or writeback.</p><p>That is what loss was measuring: the model&#8217;s token-level error against the target examples.</p><p>The loss curves suggested the adapter learned the small training distribution. The next question was whether the model could generate the right answer when used normally.</p><p>That is where things broke.</p><h2>Generation broke</h2><p>After training, I tested the model against the same kind of request.</p><p>Instead of returning JSON, the model generated:</p><pre><code><code>assistant
assistant
assistant
assistant
assistant</code></code></pre><p>The model had probably not turned mysterious at inference time. The most likely cause was mechanical.</p><p>The model was looping on role markers, which usually points to something in the chat template, prompt assembly, EOS behavior, stop-token setup, or generation config. The model may have been seeing a slightly wrong version of the conversation format during inference.</p><p>That matters because fine-tuned chat models are sensitive to the structure around the prompt. The model was trained to complete examples in one format. If inference assembles the conversation differently, the first few generated tokens can go sideways fast.</p><p>This is where fine-tuning became a systems problem.</p><p>The adapter had learned useful behavior. I could see that from the loss and from some partially successful generations. But the generation path was unstable. It started down the wrong output path and kept going.</p><p>That exposed an important distinction:</p><p><strong>Training changes weights. Inference determines how those weights are expressed.</strong></p><p>The model had learned the routing pattern. The runtime context was letting it start the response incorrectly. Once the first generated token went toward a role label instead of JSON, the rest of the output followed that path.</p><h2>The fix was one character</h2><p>The expected output was always a JSON object.</p><p>Every valid response began with:</p><pre><code><code>{</code></code></pre><p>So I changed the inference wrapper to prefill the assistant response with an opening curly bracket.</p><p>Instead of asking the model to begin from:</p><pre><code><code>assistant:</code></code></pre><p>I made the prompt effectively begin the assistant response as:</p><pre><code><code>assistant:
{</code></code></pre><p>Then the model only had to continue the JSON object.</p><p>That tiny change shifted the next-token problem.</p><p>The model was no longer deciding whether to start with a role marker, prose, markdown, or JSON. It was continuing an object that had already started.</p><p>The same SQL/dashboard request then produced:</p><pre><code><code>{
  "tier": "Trusted Technical Analyst",
  "risk": "medium",
  "recommended_platform": "Databricks + governed dashboarding",
  "needs_review": true,
  "reason": "The request requires SQL and governed dashboarding over business data."
}</code></code></pre><p>That was the moment the full loop became visible.</p><p>The fine-tune had taught the model the behavior. The inference wrapper made that behavior easier to express reliably.</p><p>The curly bracket did not fix the root cause. It bypassed the broken decision point.</p><p>The curly bracket was not magic. It was not a substitute for validation. It was a lightweight inference constraint.</p><p>Because the output contract required a JSON object, starting the assistant response with <code>{</code> moved the model onto the correct generation path. In a production system, I would pair this with schema validation, constrained decoding, or a structured-output interface. For the experiment, it was enough to separate two questions:</p><ol><li><p>Did the adapter learn the routing behavior?</p></li><li><p>Could the inference wrapper make that behavior show up cleanly?</p></li></ol><p>The answer to both was yes.</p><h2>Why the curly bracket mattered</h2><p>LLMs generate one token at a time.</p><p>The first generated token has a huge influence on the path that follows.</p><p>If the model starts with:</p><pre><code><code>assistant</code></code></pre><p>it may continue producing role-like tokens.</p><p>If the model starts with:</p><pre><code><code>{</code></code></pre><p>the next likely tokens become things like:</p><pre><code><code>"tier"
"risk"
"recommended_platform"
"needs_review"
"reason"</code></code></pre><p>The curly bracket did not change the weights. It changed the context.</p><p>That is a practical version of a broader production pattern: structured output tasks need more than a natural-language instruction that says &#8220;return JSON.&#8221;</p><p>They often need schemas, constrained decoding, output prefill, parsers, validators, retry loops, or some combination of those pieces.</p><p>The model is one part of the system. The output path is another.</p><h2>The eval</h2><p>Once the JSON prefill was in place, I reran the original baseline examples.</p><p>The manual regression set passed cleanly:</p><pre><code><code>JSON valid: 5/5
Tier correct: 5/5
Risk correct: 5/5
Platform correct: 5/5
Review flag correct: 5/5</code></code></pre><p>The original SQL/dashboard failure was fixed.</p><p>Before fine-tuning, the base model leaned generic:</p><pre><code><code>{
  "recommended_platform": "Databricks + Tableau",
  "needs_review": false
}</code></code></pre><p>After fine-tuning and the inference fix:</p><pre><code><code>{
  "recommended_platform": "Databricks + governed dashboarding",
  "needs_review": true
}</code></code></pre><p>Then I ran the held-out test set.</p><p>Result:</p><pre><code><code>{
  "json_valid": 4,
  "tier_correct": 4,
  "risk_correct": 4,
  "platform_correct": 3,
  "needs_review_correct": 4,
  "total": 4
}</code></code></pre><p>At a field level, that was 19 correct checks out of 20.</p><p>The one miss was the most useful part of the eval.</p><h2>The model learned a bias</h2><p>The failed test example was:</p><pre><code><code>Finance wants an agent that reviews invoices, flags anomalies, and updates vendor records after approval.</code></code></pre><p>Expected platform:</p><pre><code><code>"recommended_platform": "Databricks app"</code></code></pre><p>Actual platform:</p><pre><code><code>"recommended_platform": "Databricks app + Salesforce integration"</code></code></pre><p>The model correctly identified:</p><pre><code><code>{
  "tier": "Trusted Technical Builder",
  "risk": "high",
  "needs_review": true
}</code></code></pre><p>But it added Salesforce even though Salesforce was never mentioned.</p><p>That told me something specific about the dataset.</p><p>The training examples had taught the model that writeback and business-record updates often meant Salesforce. The adapter picked up that pattern and over-applied it.</p><p>That is a useful failure because the fix is obvious:</p><p>Add counterexamples.</p><p>The v2 dataset needs more examples like:</p><ul><li><p>finance workflow with vendor records, no Salesforce</p></li><li><p>procurement workflow with approval records, no Salesforce</p></li><li><p>HR workflow with employee systems, no Salesforce</p></li><li><p>explicit Salesforce examples where Salesforce is mentioned</p></li><li><p>explicit ServiceNow examples where ServiceNow is mentioned</p></li></ul><p>The rule I want the model to learn in v2:</p><pre><code><code>Only choose Salesforce when Salesforce is present or clearly implied.
Only choose ServiceNow when ServiceNow is present or clearly implied.
Otherwise, use Databricks app for custom app or agent workflows.</code></code></pre><p>That is the fine-tuning loop in practice.</p><p>The eval tells you what the model overlearned. The next dataset fixes that specific behavior.</p><h2>What changed for me</h2><p>This experiment changed how I think about fine-tuning.</p><p>Fine-tuning is not one action. It is a loop:</p><pre><code><code>define the behavior
build the dataset
train the adapter
run inference
parse and validate the output
evaluate field by field
diagnose the failure
improve the dataset
retrain</code></code></pre><p>The training run is only one part of that loop.</p><p>The adapter learned the routing behavior, but inference still needed structure. The eval passed most checks, but revealed a dataset bias. The dataset was not only a collection of examples. It was the product spec the model learned from.</p><p>The biggest lesson:</p><p><strong>Model behavior comes from both weights and runtime context.</strong></p><p>The fine-tune changed the weights through the adapter. The curly bracket changed the runtime context. The parser and eval harness told me whether the output was actually usable.</p><p>That is the real work around fine-tuning.</p><p>The model does not become reliable because the loss went down. It becomes reliable when the training data, inference path, output constraints, and eval loop all line up.</p><p>This is the first field note.</p><p>Next, I want to fix the dataset bias and run v2.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 76]]></title><description><![CDATA[The government pulled Fable 5. Mythos 5 went dark too. SpaceX priced a $75B IPO. Apple's Siri runs on Google. 
A Sierra Leone trial proved AI tutoring works. AI turmoil reached the courts.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-ec4</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-ec4</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 13 Jun 2026 12:39:49 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c03e2c29-e75d-48e0-b023-827d27ffdd80_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pJ0P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pJ0P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 424w, https://substackcdn.com/image/fetch/$s_!pJ0P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 848w, https://substackcdn.com/image/fetch/$s_!pJ0P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 1272w, https://substackcdn.com/image/fetch/$s_!pJ0P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pJ0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png" width="1456" height="1808" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1808,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:594507,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/201861917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pJ0P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 424w, https://substackcdn.com/image/fetch/$s_!pJ0P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 848w, https://substackcdn.com/image/fetch/$s_!pJ0P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 1272w, https://substackcdn.com/image/fetch/$s_!pJ0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66a0376-9ab8-4c8e-bf69-2c15ef98fa97_2400x2980.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Anthropic shipped its most powerful model, then the government switched it off.</h2><ul><li><p><strong>It launched as a triumph.</strong> On June 9 Anthropic released <a href="https://www.anthropic.com/news/claude-fable-5-mythos-5">Claude Fable 5</a>, the first Mythos-class model made generally available, state-of-the-art on nearly every benchmark and live day one in Cursor, GitHub Copilot, and Google Cloud. 
Simon Willison <a href="https://simonwillison.net/2026/Jun/9/claude-fable-5/">called it a beast</a>, and Anthropic lined up a <a href="https://www.reuters.com/business/apollo-blackstone-back-anthropics-35-billion-capacity-expansion-new-broadcom-tie-2026-06-09/">$35 billion Broadcom expansion</a> to feed demand.</p></li><li><p><strong>Then the safety machinery showed.</strong> A 319-page system card disclosed that Fable would <a href="https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/">silently degrade its answers</a> for anyone building frontier AI, its <a href="https://techcrunch.com/2026/06/10/cybersecurity-researchers-arent-happy-about-the-guardrails-on-anthropics-fable/">cyber guardrails reject almost anything</a> in the lexical field of &#8220;cyber,&#8221; and Microsoft <a href="https://www.reuters.com/technology/microsoft-limits-employee-use-anthropics-claude-fable-5-over-data-retention-2026-06-10/">limited employee use</a> over a mandatory 30-day data-retention rule.</p></li><li><p><strong>It asked Washington to regulate models like it.</strong> Anthropic published <a href="https://www.anthropic.com/policy-on-the-ai-exponential">Policy on the AI Exponential</a>, calling for mandatory third-party testing of frontier models and a government power to block or revoke deployments.</p></li><li><p><strong>The backlash forced a reversal.</strong> Anthropic <a href="https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/">apologized and walked back</a> the silent degradation, saying it &#8220;made the wrong tradeoff.&#8221; It also launched a <a href="https://www.anthropic.com/news/claude-corps">$150 million Claude Corps fellowship</a>, paying 1,000 people $85K each to embed Claude at US nonprofits for a year.</p></li><li><p><strong>Then the government pulled it.</strong> On June 12 an export-control directive ordered Anthropic to <a href="https://www.anthropic.com/news/fable-mythos-access">suspend Fable 5 and Mythos 5</a> for any 
foreign national, a scope so broad both models went dark worldwide.</p></li><li><p><strong>The thread:</strong> Anthropic spent months arguing its frontier models were powerful enough to need policing, and then a US administration took it literally and switched them off. Nobody knows yet whether this was a one-off misunderstanding or the first sign that governments will treat frontier models like controlled exports.</p></li></ul><div><hr></div><h2>AI spending hit new extremes, from a $75B IPO to China&#8217;s $295B plan.</h2><ul><li><p><strong>SpaceX priced the largest IPO in history at $75 billion.</strong> The <a href="https://www.reuters.com/world/musks-spacex-prices-record-75-billion-ipo-135-share-2026-06-11/">record listing</a> came from a company that now owns xAI.</p></li><li><p><strong>Jeff Bezos&#8217;s Project Prometheus raised $12 billion at a $41 billion valuation.</strong> The money funds an <a href="https://techcrunch.com/2026/06/11/jeff-bezoss-prometheus-raises-12b-to-build-an-artificial-general-engineer-for-the-physical-world/">&#8220;artificial general engineer&#8221;</a> for physical systems, with Bezos predicting &#8220;labor scarcity,&#8221; not job loss.</p></li><li><p><strong>Amazon raised roughly $31.5 billion for AI capex in 48 hours.</strong> It signed a <a href="https://techcrunch.com/2026/06/10/fresh-off-bond-sale-amazon-borrows-17-5-billion-from-banks-as-ai-spending-continues/">$17.5 billion bank loan</a> days after a $14 billion bond sale.</p></li><li><p><strong>The buildout went global and institutional.</strong> China is preparing a <a href="https://www.reuters.com/world/china/china-prepares-295-billion-plan-fund-nationwide-ai-buildout-bloomberg-news-2026-06-09/">$295 billion nationwide AI plan</a>, Databricks is <a href="https://www.reuters.com/legal/transactional/databricks-talks-raise-funds-over-165-billion-information-reports-2026-06-09/">raising at over $165 billion</a>, Mistral is in talks near a <a 
href="https://techcrunch.com/2026/06/12/mistral-is-rumored-to-be-raising-e3b-at-e20-valuation/">$23 billion valuation</a>, and a single <a href="https://www.reuters.com/technology/applied-digital-signs-52-billion-ai-data-center-lease-with-us-hyperscaler-2026-06-08/">data-center lease ran to $5.2 billion</a>.</p></li><li><p><strong>The thread:</strong> Not every number here is even an AI bill. SpaceX is a rocket and satellite company that now owns xAI. What ties the week together is how much capital is chasing valuations and buildouts this large, on the bet that future returns will justify them. The harder thing to know is whether numbers this size can hold.</p></li></ul><div><hr></div><h2>OpenAI filed to go public and bet on a super app, into a deepening price battle.</h2><ul><li><p><strong>OpenAI confirmed it filed a confidential S-1.</strong> It <a href="https://openai.com/index/openai-submits-confidential-s-1/">submitted the draft to the SEC</a> days behind Anthropic, putting both leading labs on a path to public markets.</p></li><li><p><strong>&#8220;Chat is dead,&#8221; a senior OpenAI employee told the FT.</strong> OpenAI is rebuilding ChatGPT into a <a href="https://techcrunch.com/2026/06/07/openai-is-still-working-on-that-super-app/">personal-agent super app</a> that funnels free users toward paid products like Codex, chasing profitability before a listing.</p></li><li><p><strong>Google brought the price war home.</strong> It cut Google AI Plus to <a href="https://techcrunch.com/2026/06/09/google-just-fired-a-warning-shot-in-the-ai-subscription-price-wars/">$4.99 a month</a> while doubling storage, and OpenAI is reportedly <a href="https://www.reuters.com/technology/openai-considers-drastic-price-cuts-anticipating-war-users-with-anthropic-wsj-2026-06-11/">weighing drastic price cuts</a> to fight Anthropic for users.</p></li><li><p><strong>The thread:</strong> A premium IPO valuation wants fat margins, and the price war cuts into them. 
Google can stomach a $4.99 plan because it runs Gemini on its own chips and does not lean on it for revenue, though it likely still gives up some margin. OpenAI and Anthropic have no such cushion.</p></li></ul><div><hr></div><h2>Apple&#8217;s new AI Siri runs on Google&#8217;s models and Nvidia&#8217;s chips.</h2><ul><li><p><strong>Apple finally shipped the AI Siri it promised for two years.</strong> The <a href="https://techcrunch.com/2026/06/08/wwdc-2026-everything-announced-on-siri-ai-os-27-apple-intelligence-and-more/">WWDC keynote</a> covered a rebuilt Siri, next-generation Apple Intelligence, AI photo editing, and natural-language Shortcuts, doubling as Tim Cook&#8217;s farewell.</p></li><li><p><strong>The models are Google&#8217;s, the chips are Nvidia&#8217;s.</strong> Apple Intelligence runs on Foundation Models built with Google and the Gemini family, and Private Cloud Compute <a href="https://www.theverge.com/news/946219/apple-ai-runs-on-nvidia-chips">runs on Nvidia hardware inside Google&#8217;s cloud</a>.</p></li><li><p><strong>Siri AI is really an enterprise app layer.</strong> The system <a 
href="https://venturebeat.com/technology/apples-new-siri-ai-is-more-than-just-a-smarter-assistant-its-a-new-enterprise-app-layer">exposes app data and actions to Siri</a> through OS frameworks, and the Foundation Models framework now <a href="https://claude.com/blog/claude-for-foundation-models">hands off to Claude</a>. Europe waits, <a href="https://www.theverge.com/ai-artificial-intelligence/946137/apple-blames-the-dma-again-for-delayed-siri-ai-in-the-eu">blamed on the DMA</a>.</p></li><li><p><strong>The thread:</strong> This is a departure for Apple. The company that built its brand on owning the whole stack shipped its AI on Google&#8217;s models and Nvidia&#8217;s chips, keeping only the device, the privacy story, and the OS integration. Whether owning the experience is enough without owning the model is the bet Apple is now making.</p></li></ul><div><hr></div><h2>OpenAI bought a cloud for Codex as Xiaomi&#8217;s free agent beat Claude Code.</h2><ul><li><p><strong>OpenAI is buying a cloud for Codex.</strong> It will <a href="https://openai.com/index/openai-to-acquire-ona/">acquire Ona</a> to give agents persistent, customer-controlled environments that keep working after you close the laptop. 
Codex is now at 5 million weekly users.</p></li><li><p><strong>Xiaomi&#8217;s free agent beat Claude Code.</strong> Its open-source <a href="https://venturebeat.com/technology/xiaomis-new-open-source-agentic-ai-coding-harness-mimo-code-beats-claude-code-at-ultra-long-200-step-tasks">MiMo Code</a> topped Claude Code on long-horizon tasks, bundled with a free frontier model.</p></li><li><p><strong>xAI opened a store and OpenAI rented distribution.</strong> Grok Build got a <a href="https://x.ai/news/grok-plugin-marketplace">plugin marketplace</a> with MongoDB, Vercel, and Cloudflare, while OpenAI let enterprises <a href="https://openai.com/index/openai-on-oracle-cloud/">spend Oracle cloud credits on Codex</a>.</p></li><li><p><strong>The thread:</strong> The agent and the environment around the model are becoming as hot a commodity as the model itself, and a free Xiaomi model winning on long tasks shows how many teams can now build a good one. That argues the model is no longer the moat. Then Fable 5 shows up good enough to argue it still is.</p></li></ul><div><hr></div><h2>Google sued an AI phishing network running a million scam sites.</h2><ul><li><p><strong>Google sued an AI phishing factory.</strong> It filed to dismantle an alleged Chinese network, <a href="https://techcrunch.com/2026/06/12/chinese-cybercrime-operation-that-used-ai-to-scam-hundreds-of-thousands-of-victims-sued-by-google/">Outsider Enterprise</a>, that sold turn-key scam-site software and ran a million fraudulent domains.</p></li><li><p><strong>AI is being weaponized for influence and contested in court.</strong> OpenAI says China <a href="https://www.reuters.com/business/media-telecom/openai-says-chinese-propaganda-is-being-deployed-foment-dissent-over-tariffs-2026-06-10/">used ChatGPT for anti-tariff propaganda</a>, a fired xAI engineer <a href="https://techcrunch.com/2026/06/10/xai-fired-an-engineer-who-raised-alarms-about-grok-safety-new-lawsuit-claims/">sued over Grok safety</a>, and Deezer 
found <a href="https://techcrunch.com/2026/06/11/deezers-new-tool-can-identify-ai-music-from-spotify-apple-music-and-others/">44% of daily uploads are AI-generated</a>.</p></li><li><p><strong>Some are trying to get ahead of the fallout.</strong> Google DeepMind committed <a href="https://deepmind.google/blog/investing-in-multi-agent-ai-safety-research/">$10 million to multi-agent safety research</a>, betting that millions of interacting agents will behave in ways nobody can yet predict.</p></li><li><p><strong>The thread:</strong> These are different kinds of failure. A phishing network, a foreign influence op, and a wave of synthetic music are not one problem, but they are all arriving faster than the rules meant to catch them. </p></li></ul><div><hr></div><h2><strong>&#11088; Featured: An AI tutoring trial in Sierra Leone delivered years of learning in eight weeks.</strong></h2><p>Amid a week of suspensions, lawsuits, and record raises, the most grounded story came from a classroom. Google DeepMind published the <a href="https://deepmind.google/blog/measuring-the-impact-of-learning-with-ai-in-sierra-leone-and-beyond/">results of a randomized controlled trial</a> of AI tutoring, the kind of pre-registered, real-world evidence the field rarely produces.</p><p>The study, run with Fab AI and the Sierra Leone Ministry of Education, followed 1,763 junior secondary students across 12 schools in Port Loko District over eight weeks. Students using Guided Learning in Gemini gained 0.258 standard deviations in math over the control group, which DeepMind translates to roughly 1.2 to 1.7 years of typical learning progress in two months. Classrooms whose teachers hit a 12-hour usage target saw 1.8 to 2.5 years of progress.</p><p>The design is what makes it credible. This was a teacher-led intervention: educators set the objectives, designed the lessons, and ran the discussions, with the model acting as a Socratic partner rather than an answer key. 
Across 113,000 logged interactions, students were building conceptual understanding in 91.4% of conversations, and Gemini posed scaffolding questions in 76% of its messages while giving direct solutions in just 2%. Engagement reached 69% of students meeting usage targets, against the roughly 5% that typically stick with voluntary educational software.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L944!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L944!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 424w, https://substackcdn.com/image/fetch/$s_!L944!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 848w, https://substackcdn.com/image/fetch/$s_!L944!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 1272w, https://substackcdn.com/image/fetch/$s_!L944!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L944!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png" width="1456" height="617" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:751925,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/201861917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L944!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 424w, https://substackcdn.com/image/fetch/$s_!L944!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 848w, https://substackcdn.com/image/fetch/$s_!L944!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 1272w, https://substackcdn.com/image/fetch/$s_!L944!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03e332a-7499-415d-b4ab-92bd2a724082_2116x896.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The behavior shift is the part worth dwelling on. Over the trial, students&#8217; questions moved from wanting answers to wanting to understand: skill-building queries rose from 68% to 90% by the final week, while solution-seeking dropped from 25% to 10%. The catch DeepMind names itself is the achievement gap. 
Students who arrived with stronger math skills benefited most, which is the opposite of what an equity-driven tool needs to do.</p><p><strong>What to watch for:</strong> Whether the follow-on RCTs DeepMind is running in other countries hold up outside Sierra Leone, and whether anyone can close the gap so the students furthest behind gain the most, not the least.</p><h2><strong>&#127897;&#65039; Worth a Watch: Satya Nadella on Hard Fork.</strong></h2><div id="youtube2-zqEZyHkXgh0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;zqEZyHkXgh0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/zqEZyHkXgh0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Microsoft&#8217;s CEO sat down with Kevin Roose and Casey Newton for <a href="https://www.youtube.com/watch?v=zqEZyHkXgh0">a wide-ranging hour</a>, and the throughline was that Microsoft wants the whole economy at the frontier, not one model or one firm.</p><ul><li><p><strong>The model is not the goal, diffusion is.</strong> Nadella argues a frontier model means little if the economy still grows at 2%. 
He wants every company carrying both &#8220;human capital and token capital,&#8221; not three firms sitting on the frontier alone.</p></li><li><p><strong>Microsoft&#8217;s play is to be the best base model, not the only one.</strong> Its new from-scratch MAI models are meant to give customers a reasoning-and-agent-loop base they bring into their own RL, keeping the weights, harness, and context, and swapping the model out if they want.</p></li><li><p><strong>Token economics is his real constraint on AGI.</strong> His AGI benchmark is still 10% GDP growth, and he says it only arrives when the marginal cost of a token matches the marginal value it creates. Token maxing and vibe coding for their own sake are not the path.</p></li><li><p><strong>The OpenAI relationship, in his words.</strong> After the renegotiation, Microsoft keeps its cap-table stake, a large customer, OpenAI IP through 2032, and the freedom to build its own. &#8220;We have the compute, we have now the model, and we have still the partnership.&#8221;</p></li><li><p><strong>On the backlash.</strong> He concedes the perception is terrible and points to Microsoft&#8217;s 20-year data center in Quincy, Washington, where he says the local tax base rose, local taxes fell, and employment grew, as the longitudinal proof the industry needs.</p></li><li><p><strong>How AGI-pilled is he?</strong> Closed-loop work like coding and AI research can be automated, he says, but the unverifiable parts of human knowledge work resist it. 
He does not buy that this is the last technology we will invent.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://www.anthropic.com/research/agents-in-biology">Anthropic published research on making biology agent-friendly</a></strong> | Anthropic &#8212; scientific agents only hit near-100% accuracy once a deterministic retrieval layer was added.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/09/lovable-says-it-has-hit-500m-in-annualized-revenue-with-1-million-new-projects-a-week/">Lovable hit $500M in annualized revenue</a></strong> | TechCrunch &#8212; vibe-coding growth is absurd, but nobody has reported the abandonment rate.</p></li><li><p><strong><a href="https://huggingface.co/blog/CohereLabs/introducing-north-mini-code">Cohere open-sourced North Mini Code</a></strong> | Hugging Face &#8212; a 30B coding agent that runs on one H100 and beats bigger models.</p></li><li><p><strong><a href="https://venturebeat.com/technology/researchers-say-they-trained-a-foundation-model-from-scratch-for-about-1-500">Researchers trained a 1B reasoning model for about $1,500</a></strong> | VentureBeat &#8212; 84.5% on GSM8K with 100x fewer tokens than the giants.</p></li><li><p><strong><a href="https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/">Google open-sourced DiffusionGemma</a></strong> | Google &#8212; a 26B model that generates text in parallel blocks for up to 4x faster output.</p></li><li><p><strong><a href="https://venturebeat.com/orchestration/researchers-trained-an-open-source-ai-search-agent-harness-1-that-outperforms-gpt-5-4-on-recalling-relevant-information">Harness-1, an open 20B search agent, beat GPT-5.4</a></strong> | VentureBeat &#8212; it wins by moving session bookkeeping out of the context window.</p></li><li><p><strong><a href="https://openai.com/index/using-codex-to-simulate-black-holes/">OpenAI&#8217;s Codex is helping simulate 
black holes</a></strong> | OpenAI &#8212; an astrophysicist uses it to derive testable algorithms, not trust its answers.</p></li><li><p><strong><a href="https://www.reuters.com/business/nvidia-hires-veteran-lobbyist-bruce-andrews-head-government-affairs-sources-say-2026-06-11/">Nvidia hired a veteran lobbyist to run government affairs</a></strong> | Reuters &#8212; the chip wars are now a Washington game too.</p></li><li><p><strong><a href="https://techcrunch.com/2026/06/11/doordashs-new-ai-chatbot-lets-you-order-with-prompts-and-photos/">DoorDash&#8217;s chatbot takes orders from prompts and photos</a></strong> | TechCrunch &#8212; agents keep creeping into the checkout flow.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Cache-Aware Skill Design]]></title><description><![CDATA[How prompt caching, KV cache, and stable instruction modules can change the cost of agent workflows]]></description><link>https://www.anothercodingblog.com/p/cache-aware-skill-design</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/cache-aware-skill-design</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Mon, 08 Jun 2026 15:30:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5896a14e-2fe6-4782-84d6-67a6b27f94a1_1200x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Prompt caching is often described as a cost optimization.</p><p>If a model provider sees the same input tokens again, those repeated tokens may be processed at a lower cost. That description is accurate, but incomplete.</p><p><a href="https://developers.openai.com/api/docs/guides/prompt-caching">OpenAI&#8217;s prompt caching docs</a> describe cache hits as exact prefix reuse and recommend putting static content at the beginning of the prompt, with variable content near the end. 
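</p><p>A minimal sketch of that layout advice follows. It is not taken from the OpenAI docs: the helper name build_prompt and the bracketed section labels are hypothetical placeholders, and comparing text is only a stand-in for the token-level matching a real serving layer performs. It illustrates one thing: prompts assembled with static sections first share an identical beginning, while the variable sections trail behind.</p>

```python
from typing import Sequence


def build_prompt(stable_sections: Sequence[str], variable_sections: Sequence[str]) -> str:
    """Join prompt sections so every stable section precedes every variable one."""
    return "\n\n".join([*stable_sections, *variable_sections])


# Placeholder section text, invented purely for illustration.
STABLE_SECTIONS = ["[system instructions]", "[tool definitions]", "[output contract]"]

prompt_a = build_prompt(STABLE_SECTIONS, ["[current task A]", "[timestamp 1]"])
prompt_b = build_prompt(STABLE_SECTIONS, ["[current task B]", "[timestamp 2]"])

# Both prompts begin with exactly the same text, so that beginning is a candidate for reuse.
shared_beginning = "\n\n".join(STABLE_SECTIONS)
print(prompt_a.startswith(shared_beginning) and prompt_b.startswith(shared_beginning))
```

<p>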
A cache hit means the model server has recently processed the same prompt prefix during inference, so it can reuse stored model state for that matching portion of the input.</p><p>That detail matters for agent design.</p><p>Agents routinely send repeated context across turns and tasks: tool definitions, system prompts, Skill instructions, output contracts, examples, source-handling rules, conversation history, retrieved documents, and tool results.</p><p>Some of that context is stable. Some of it changes on every run.</p><p>Prompt caching can reward systems that separate those two categories cleanly.</p><p>A Skill with stable instructions, examples, and output rules can become a reusable prompt prefix. A Skill that places timestamps, run IDs, retrieved documents, or task-specific state before its stable instructions may reduce the opportunity for cache reuse.</p><p>The practical implication is straightforward:</p><p>A well-designed Skill does more than tell the model what to do. It also gives the model server a stable structure it can reuse.</p><p>This is why prompt caching should not be treated only as a pricing feature. For agents and Skills, prompt structure can become part of system architecture.</p><h2>Cached Tokens Mean Reused Computation</h2><p>The phrase &#8220;cached tokens&#8221; can make prompt caching sound like text storage.</p><p>That framing misses the mechanism.</p><p>The model server is not caching a response. It is checking whether the new request begins with a prefix it has already processed. 
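</p><p>That check can be pictured with a small sketch. The token IDs are made up and the function name shared_prefix_length is hypothetical; real serving layers may compare hashed blocks of tokens rather than walking token by token, so treat this as an illustration of the idea, not of any provider&#8217;s implementation.</p>

```python
from typing import Sequence


def shared_prefix_length(cached: Sequence[int], incoming: Sequence[int]) -> int:
    """Count how many leading tokens two token sequences have in common."""
    length = 0
    for cached_token, incoming_token in zip(cached, incoming):
        if cached_token != incoming_token:
            break  # the first mismatch ends the reusable prefix
        length += 1
    return length


# Hypothetical token IDs: a stable three-token beginning followed by different tasks.
earlier_request = [101, 102, 103, 900]
new_request = [101, 102, 103, 777, 778]
reusable_tokens = shared_prefix_length(earlier_request, new_request)

# Moving a changing token to the front leaves nothing to reuse,
# even though the rest of the sequence is familiar.
shifted_request = [555, 101, 102, 103]
reusable_after_shift = shared_prefix_length(earlier_request, shifted_request)
```

<p>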
When the prefix matches, the server can reuse stored model state for that matching portion of the input.</p><p>Because only a matching prefix is reusable, the docs&#8217; advice to put static content first and variable content last becomes the first design rule.</p><p>Stable material belongs early:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;ae27a93e-2062-430c-aa30-113236d46f72&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[system instructions]
[tool definitions]
[Skill instructions]
[output contract]
[examples] </code></pre></div><p>Variable material belongs later:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d1fb2f4c-bc20-4ede-a594-a0fcb7da8930&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[current task]
[retrieved documents]
[tool results]
[timestamps]
[run IDs] </code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xwJz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xwJz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 424w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 848w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1272w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xwJz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png" width="1456" height="760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Prompt Caching visualization&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Prompt Caching visualization" title="Prompt Caching visualization" srcset="https://substackcdn.com/image/fetch/$s_!xwJz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 424w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 848w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1272w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prompt caching starts with prefix alignment. If the new request begins with the same token pattern, the serving layer can reuse cached state. If the beginning changes, the reusable prefix can collapse, even when later parts of the prompt look familiar.</figcaption></figure></div><p>The important word is prefix.</p><p>Prompt caching does not usually search the whole prompt for similar meaning. It does not see that two prompts both mention the same document, paragraph, or phrase and automatically reuse that work wherever it appears. 
The cached state depends on the exact token sequence, its order, and where that sequence appears in the prompt.</p><p>That makes small layout choices matter.</p><p>A timestamp at the top of the prompt can change the prefix.</p><p>A random run ID can change the prefix.</p><p>A retrieval system that inserts source chunks before the stable Skill body can change the prefix.</p><p>A tool description that includes dynamic runtime state can change the prefix.</p><p>Each of those choices may be reasonable in isolation. Together, they make the beginning of the prompt less stable. That reduces the amount of work the serving layer can reuse.</p><p>For agent systems, this is the practical consequence:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bc4e20a9-8437-4b4d-b4ab-7824d63807b3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Prompt caching rewards stable beginnings. </code></pre></div><p>The stable part of the agent should be early. 
The variable part should be later.</p><h2>What Is Actually Being Cached?</h2><p>Prompt caching is easier to understand if we separate four layers:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5b8887fa-24d2-4fef-969f-f1f3df6ce0f5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">tokens
attention
KV cache
prompt cache </code></pre></div><p><strong>Tokens</strong> are the units the model processes. The prompt is not handled as raw prose; it is broken into tokens first.</p><p><strong>Attention</strong> is the mechanism the model uses to relate those tokens to one another.</p><p><strong>KV cache</strong> is the stored attention state, the key and value tensors, created while the model processes tokens.</p><p><strong>Prompt cache</strong> is the serving-layer feature that can reuse that stored state when a later request starts with the same prefix.</p><p>The confusing part is the word &#8220;key.&#8221;</p><p>In normal software, a cache usually has a key and a value:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d5004d9f-966d-4b15-b2a5-c091f4fa9df7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cache[key] = value </code></pre></div><p>Prompt caching has something like that too:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;12bae24c-654c-47ec-b5d3-cdca02d38f0e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">prompt_cache[hash(exact_token_prefix)] = stored_model_state </code></pre></div><p>But the &#8220;K&#8221; in KV cache is not the hash used to look up a cached prompt prefix.</p><p>The <a href="https://arxiv.org/abs/1706.03762">original Transformer paper</a> defines attention over queries, keys, and values. That is where the terminology comes from. In the KV cache, the K is an attention key and the V is an attention value. 
They are internal tensors created by the model during inference, not the lookup key and value of a normal software cache.</p><p>That distinction matters.</p><p>A simplified version looks like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c0377c9c-8ce8-4be6-9b09-62c43598b620&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Cache lookup key: 
hash(exact token prefix)  

Cached value: 
attention key tensors + attention value tensors </code></pre></div><p>When we say prompt caching reuses KV cache, we are not saying the model is doing a database lookup where prompt text maps to an answer.</p><p>We are saying the serving layer can find a matching prompt prefix and reuse the key/value attention state the model already computed for that prefix.</p><p><a href="https://magazine.sebastianraschka.com/p/coding-the-kv-cache-in-llms">Sebastian Raschka&#8217;s KV cache walkthrough</a> gives a concrete inference example: as a model generates one token at a time, it can reuse previously computed key and value vectors instead of recomputing them at each step.</p><p><a href="https://docs.vllm.ai/en/stable/design/prefix_caching/">vLLM&#8217;s prefix-caching docs</a> describe the cross-request version: processed requests leave behind KV-cache blocks, and later requests with the same prefix can reuse those blocks instead of recomputing them.</p><p>That is the bridge between the API feature and the model internals.</p><p>The API exposes the result as cached tokens. The serving system manages the cache. The model state being reused is tied to attention.</p><h2>Why KV Cache Exists</h2><p>KV cache exists because generation is sequential.</p><p>A model does not write an entire response at once. It generates one token, appends that token to the context, then generates the next token.</p><p>A simplified sequence looks like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a9f520e0-ec10-484d-9908-e2ffb3dc238b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Prompt:
Time 

Step 1:
Time &#8594; flies  

Step 2:
Time flies &#8594; fast

Step 3: 
Time flies fast &#8594; . </code></pre></div><p>At each step, the model needs access to the tokens that came before.</p><p>Without a KV cache, the model would repeatedly recompute the attention keys and values for tokens it had already processed.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;28adac1e-5c31-4d19-b750-a711fd4541a3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Step 1: 
compute K/V for &#8220;Time&#8221;  

Step 2: 
compute K/V for &#8220;Time&#8221; again 
compute K/V for &#8220;flies&#8221;  

Step 3: 
compute K/V for &#8220;Time&#8221; again 
compute K/V for &#8220;flies&#8221; again 
compute K/V for &#8220;fast&#8221; </code></pre></div><p>That is wasted work.</p><p>With KV cache, the earlier tokens do not need to be recomputed every time.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c274194e-5d19-41b4-936f-9e4193d9238a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Step 1: 
compute K/V for &#8220;Time&#8221;
store it  

Step 2: 
reuse K/V for &#8220;Time&#8221;
compute K/V for &#8220;flies&#8221; 
store it  

Step 3: 
reuse K/V for &#8220;Time&#8221; 
reuse K/V for &#8220;flies&#8221; 
compute K/V for &#8220;fast&#8221; 
store it </code></pre></div><p>This is the basic inference-time benefit of KV cache.</p><p>It does not make the model smarter. It does not change the answer. It reduces repeated computation.</p><p>Sebastian Raschka&#8217;s <a href="https://magazine.sebastianraschka.com/p/coding-the-kv-cache-in-llms">KV cache walkthrough</a> gives the clean version of this example: during autoregressive generation, the model would otherwise recompute key and value vectors for earlier tokens at each step.</p><p>Prompt caching extends this idea across requests.</p><p>Within one response, KV cache lets the model reuse state from earlier tokens in the same generation.</p><p>Across requests, prompt caching lets the serving layer reuse state from a previous request when a new request starts with the same prefix.</p><p>That is the bridge we need for agents.</p><h2>Prompt Caching Extends KV Reuse Across Requests</h2><p>KV cache usually starts inside a single generation.</p><p>The model processes a prompt, creates key/value attention state, and reuses that state as it generates the next token, then the next token, then the next.</p><p>Prompt caching moves the reuse boundary.</p><p>Instead of reusing prior token state only inside one response, the serving layer can reuse state from a previous request when a new request starts with the same prefix.</p><p><strong>Request 1:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;85d5e85f-4c59-4ca3-a128-210a35dbfbdb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable Skill instructions] 
[stable output contract] 
[stable examples] 
[current task A] </code></pre></div><p><strong>Request 2:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5f22073e-f2bf-4d4b-a254-d28db1b425e4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable Skill instructions] 
[stable output contract] 
[stable examples] 
[current task B] </code></pre></div><p>The beginning is the same.</p><p>The model server does not need to process that shared prefix as if it were new every time. It can reuse the state computed when it processed the earlier request, then continue from the new suffix.</p><p>That is the practical bridge between KV cache and prompt caching.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;9f6db18f-bfdf-41a0-9e68-2d9c5ab94ec4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Within one response:
reuse prior token state from the same generation

Across requests:
reuse prior prefix state from an earlier request </code></pre></div><p>The API usually hides the details. You see the result as cached input tokens, lower cached-token pricing, or lower latency when the cache hit affects the prefill path.</p><p>The implementation underneath is still about model state.</p><p><a href="https://docs.vllm.ai/en/stable/design/prefix_caching/">vLLM&#8217;s prefix-caching docs</a> describe this directly: the system caches KV-cache blocks from processed requests and reuses those blocks when a new request has the same prefix.</p><p>That is why prompt caching is not only a billing abstraction. It is an inference-serving optimization exposed through the API.</p><h2>Why Prefix Order Matters</h2><p>Prompt caching is strict about order.</p><p>The cache does not look for familiar words scattered throughout the prompt. It looks for a matching beginning.</p><p>That means the layout of a prompt decides what can be reused. Take this layout first:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;85754524-8b09-4022-9f68-b1c3b19ea11d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Request 1: 
[stable Skill instructions]
[dynamic source documents] 
[current task]  

Request 2: 
[stable Skill instructions] 
[different source documents] 
[different task] </code></pre></div><p>In both requests, the stable Skill instructions come first. The shared prefix is intact.</p><p>Now compare that with this layout:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6e027784-d7f8-4dd7-9171-e517b45dac43&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Request 1: 
[timestamp A] 
[dynamic source documents A] 
[current task A] 
[stable Skill instructions]  

Request 2: 
[timestamp B] 
[dynamic source documents B] 
[current task B] 
[stable Skill instructions] </code></pre></div><p>The stable Skill instructions are still present, but they are no longer the beginning of the prompt.</p><p>The prefix changed before the reusable material appeared.</p><p>That is the failure mode.</p><p>This is why a small amount of dynamic text at the top of the prompt can matter. A timestamp, run ID, tool result, or changing retrieval block can move the entire request out of alignment.</p><p>The model server may still receive the same Skill body later in the prompt. But for prefix caching, later is often too late.</p><p><a href="https://developers.openai.com/api/docs/guides/prompt-caching">OpenAI&#8217;s prompt caching docs</a> make the design rule explicit: static content should go near the beginning of the prompt, and variable content should go near the end.</p><p>For agents, that becomes a concrete layout rule:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d689cb75-2e44-4246-9f30-6ba2fb2684d6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Stable first. 
Dynamic second. </code></pre></div><p>That rule is simple, but it changes how an agent harness should be written.</p><p>Tool definitions, system instructions, Skill bodies, output contracts, examples, and validation rules should be stable and early.</p><p>Retrieved documents, timestamps, run IDs, tool outputs, and task-specific state should be later.</p><p>The serving layer can only reuse the prefix you actually give it.</p><h2>Methods for Prefix Caching</h2><p>The simple version of prompt caching is easy to say:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;7f84d0ba-fa13-402f-8b52-5528aed2a4fa&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">same prefix &#8594; reuse cached state </code></pre></div><p>The implementation is more complicated.</p><p>A serving system has to answer several questions before reuse can happen:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;2909c7ce-e0e9-492b-8c90-ca367e692221&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">How do we identify a matching prefix?
How do we store the KV state?
How do we route future requests back to the right cache? 
What happens when memory fills? 
Can we reuse anything beyond the prefix? </code></pre></div><p>A family of methods addresses these questions.</p><h3>Exact-prefix reuse</h3><p>This is the basic case.</p><p>Two requests start with the same token sequence. The serving layer identifies the shared beginning and reuses cached state for that prefix.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;11bffc91-a721-4887-b46b-f422940f49dd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Request 1: 
[stable system prompt]
[stable Skill][task A]  

Request 2: 
[stable system prompt]
[stable Skill][task B] </code></pre></div><p>The shared prefix is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b4508af5-117f-4d38-b96b-d84517596c06&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable system prompt][stable Skill] </code></pre></div><p>That is the part the system can reuse.</p><p>This is the model most API users need to understand first. If the beginning changes, the cacheable prefix shrinks or disappears.</p><h3>Block-hash prefix caching</h3><p>A serving system does not need to treat the prompt as one giant cache entry.</p><p>It can split the prompt into blocks.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;f4f4baa3-6e5c-4e00-bce1-f57df945e3b2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Block 1: tokens 1-128
Block 2: tokens 129-256
Block 3: tokens 257-384 </code></pre></div><p>Each block can be associated with a hash. The hash can include both the block itself and the prefix that came before it.</p><p>That lets the system find the longest matching chain.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5a04ac09-3795-468c-8372-b9737ee53f5b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Block 1 matches
Block 2 matches
Block 3 changes </code></pre></div><p>In that case, the server can reuse blocks 1 and 2, then recompute from block 3 onward.</p><p>This is why order matters. A block is not only &#8220;these tokens.&#8221; It is &#8220;these tokens after this prior prefix.&#8221;</p><p><a href="https://docs.vllm.ai/en/stable/design/prefix_caching/">vLLM&#8217;s prefix-caching docs</a> describe this kind of design: processed requests leave behind KV-cache blocks, and later requests with the same prefix can reuse those blocks instead of recomputing them.</p><h3>Paged KV cache</h3><p>KV cache can get large.</p><p>Long prompts create large key/value state. Long-running agents create even more. Multiple concurrent users make the problem worse.</p><p>Paged KV cache treats cached state more like memory pages than one continuous allocation.</p><p>That matters because the serving system needs to allocate, reuse, share, and evict KV state efficiently. Without that, memory fragmentation and wasted GPU memory can become bottlenecks.</p><p>For a builder, the main point is simple:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;58aadad8-b3eb-42ec-bac9-95cb9bf636d3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Prompt caching is not only a matching problem. 
It is also a memory-management problem. </code></pre></div><h3>Prefix trees and radix caching</h3><p>Some workloads share a common root and then branch.</p><p>Agents do this constantly.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;0cd88e4a-40e0-4efe-af5a-cb3400fb4584&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">shared agent harness   
&#9500;&#9472;&#9472; research Skill   
&#9474;    &#9500;&#9472;&#9472; task A   
&#9474;    &#9492;&#9472;&#9472; task B   
&#9492;&#9472;&#9472; coding Skill        
     &#9500;&#9472;&#9472; task C    
     &#9492;&#9472;&#9472; task D </code></pre></div><p>A prefix tree stores shared beginnings once, then branches when the prompts diverge.</p><p><a href="https://lmsys.org/blog/2024-01-17-sglang/">SGLang&#8217;s RadixAttention</a> uses this kind of idea. It organizes reusable prompt state in a radix tree so shared prefixes can be found, reused, inserted, and evicted more efficiently.</p><p>This maps well to agent systems because agents are not random one-off prompts. They often reuse the same harness, then branch by Skill, task, tool, or phase.</p><h3>Cache-aware routing</h3><p>A cache hit only helps if the request reaches the place where the cached state lives.</p><p>In a distributed serving system, there may be many workers. One worker may have the cached prefix. Another may not.</p><p>If the next request lands on the wrong worker, the system may have to recompute the prefix or move cache state across machines.</p><p>That is why routing matters.</p><p>Application design gives the serving layer stable prefixes. Routing decides whether later requests reach the cache that already holds them.</p><h3>Cache eviction</h3><p>Caches cannot keep everything forever.</p><p>KV cache consumes memory, and GPU memory is expensive. The serving layer has to decide what to keep and what to evict.</p><p>Simple eviction policies may keep recent cache entries and discard older ones. More advanced policies may consider which prefixes are likely to be reused, how large they are, and how expensive they are to recompute.</p><p>This matters for agents because not all prompt sections have equal reuse value.</p><p>A stable Skill body may be reused thousands of times.</p><p>A one-off tool result may never be reused.</p><p>A cache-aware system should prefer keeping the first kind.</p><h3>Beyond-prefix reuse</h3><p>Most production prompt caching is built around exact prefixes.</p><p>But agent workloads are messier than that.</p><p>The same document chunk may appear in different positions. 
The same source may be reused across turns. The same tool result may show up in another branch of the workflow.</p><p>Classic prefix caching will not always catch that.</p><p>Newer work is exploring whether reusable KV state can be recovered from repeated segments, not just repeated beginnings. That is a harder problem because the model state for a segment depends on what came before it.</p><p>For now, the practical rule remains:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6c9e05fd-5bb0-45ff-a6c0-99944a4e8aed&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Design for prefix caching first. </code></pre></div><p>Put stable content at the beginning. Keep it stable. Move dynamic context later.</p><p>The serving systems will keep getting better. But the builder can already do the most important thing:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;ebc7a800-e255-4d8d-9e12-d929f3b7fc03&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Give the cache a stable prefix to reuse. </code></pre></div><h2>Skills as Cacheable Instruction Modules</h2><p>In this post, a Skill means a reusable instruction module.</p><p>That could be a Claude Skill. It could be a Markdown file in an agent repo. It could be a prompt module loaded by Codex, a tool-specific operating procedure, or a workflow template inside an internal agent platform.</p><p>In most agent systems, a Skill eventually becomes text in the request:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5bdfa257-ed56-49d7-ad35-4a4b940158e1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[Skill purpose] 
[when to use it] 
[workflow] 
[output contract] 
[examples] 
[validation rules] </code></pre></div><p>That text is usually stable.</p><p>The user task changes. The retrieved documents change. Tool results change. Run state changes.</p><p>But the Skill body often stays the same.</p><p>That makes Skills natural cache candidates.</p><p>A Skill is already meant to be reused at the instruction level. Prompt caching adds a second kind of reuse: the serving layer may be able to reuse the model state created from those same instructions.</p><p>That only works if the Skill is placed where the cache can use it.</p><p>A Skill loaded after dynamic context is still useful to the model, but it may not be useful to the prompt cache.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c612e7ac-7bfa-47af-9027-0dc109e45f92&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Cache-hostile layout: 
[dynamic docs] 
[current task] 
[Skill body]  

Cache-aware layout: 
[Skill body] 
[dynamic docs] 
[current task] </code></pre></div><p>The content is the same. The cache behavior can be very different.</p><p>This is the design implication:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;20b0d0fd-6f7a-42f5-8e8a-44a3c12505c5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">A Skill should not only be reusable as an instruction. 
It should be positioned as a reusable prefix. </code></pre></div><p>That does not mean every Skill should be loaded all the time. Large unused Skills create their own cost and context problems. Anthropic&#8217;s Skill system uses progressive disclosure: lightweight metadata helps the model decide whether a Skill is relevant, then the full Skill and supporting resources load only when needed.</p><p>That pattern still fits the caching argument.</p><p>Once a Skill is selected, its stable body should remain stable. Its dynamic inputs should come later.</p><h2>Cache-Aware Skill Design</h2><p>The design pattern is simple:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;230511c5-e12e-4f85-b286-441bc5c6dfaf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Cache the workflow. Vary the inputs.</code></pre></div><p>A Skill usually contains the stable task frame:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4fcb3e0e-c5d4-44fc-a093-c776bc3ee1d9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">purpose 
workflow 
output contract 
examples 
citation rules 
validation checklist 
source-handling rules </code></pre></div><p>The current run supplies the changing inputs:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c48edbb0-d4f9-4577-a154-ad68cdc66739&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">user request 
retrieved documents 
current files 
tool outputs 
timestamps 
run IDs 
temporary constraints</code></pre></div><p>Those two categories should not be mixed casually.</p><p>A cache-aware Skill keeps the stable task frame intact and places dynamic material after it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;87164069-d72a-4696-b54d-270955286e2e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable Skill body] 
[dynamic sources] 
[current task input] </code></pre></div><p>A cache-hostile Skill puts changing material first.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;165df637-9db4-4c7e-80d2-95e19e27dd78&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[timestamp] 
[run ID] 
[dynamic sources] 
[current task input] 
[stable Skill body] </code></pre></div><p>This difference fundamentally changes what the model server sees as the reusable beginning of the request.</p><p>This does not mean every Skill should be loaded eagerly. Loading a large unused Skill just to make it cacheable can waste tokens. The better pattern is <a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">staged loading</a>.</p><p>First, keep a small, stable routing layer:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;9b12cf4b-2088-4ca3-9478-b567d540b26e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">available Skills 
when to use each Skill 
short descriptions 
selection rules </code></pre></div><p>Then, once a Skill is selected, load the full stable Skill body before the dynamic task context.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;e1bc6499-72c2-406b-a53d-fb7bcfcecf6c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable router] 
[selected stable Skill] 
[dynamic task context] </code></pre></div><p>That gives the system two possible layers of reuse:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;768a493e-4695-4647-ab07-cb21e3380808&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">the router can be stable across many calls 
the selected Skill can be stable across repeated uses </code></pre></div><p>This also helps with source diversity.</p><p>A research Skill may receive different articles every run. A repo Skill may receive different files. A data Skill may receive different schemas, queries, or results.</p><p>That variety belongs in the dynamic suffix.</p><p>The Skill should define how to use sources. The sources themselves should come later.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;3739fc06-30b8-41eb-a139-5d76e24dbfd9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[Skill: how to read and cite sources] 
[Sources: the actual documents for this run] </code></pre></div><p>The same applies to tools.</p><p>Tool definitions and tool-use rules should be stable. Tool results should be dynamic.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c1070b27-8541-4420-93db-8ceeefaeecfb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable tool definitions] 
[stable tool-use rules] 
[dynamic tool results] </code></pre></div><p>The goal is not to optimize the prompt for caching at the expense of the task. The goal is to avoid wasting cacheability by accident.</p><p>If two prompt layouts are equally good for the model, choose the one that gives the serving layer more stable structure to reuse.</p><h2>Benchmark Results</h2><p>The benchmark showed that stable instruction modules placed at the front of the prompt became reusable prefixes, producing far more cache hits and materially reducing estimated warm-request cost.</p><p>This result is not only about Skills as a product concept. It applies to any stable instruction module: a Skill, workflow template, tool procedure, rubric, output contract, or source-handling guide.</p><p>I used a synthetic Skill body rather than a platform-native Skill object so the test could isolate layout: stable instruction module first versus dynamic context first.</p><p>The benchmark compared four layouts:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a839b50d-3f9e-4416-8c8e-eacceabd682a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dynamic_first_cache_hostile
[timestamp][run ID][dynamic docs][task][stable Skill]  

stable_skill_first_cache_aware 
[stable Skill][dynamic docs][task][timestamp]  

stable_skill_first_deterministic_sources 
[stable Skill][dynamic docs ordered deterministically][task]  

dynamic_prefix_control 
[random run ID][stable Skill][dynamic docs][task] </code></pre></div><p>Before interpreting the results, I checked that the prompts were constructed correctly. The stable-first prompts started with the Skill body. The dynamic-first prompts started with changing content. The stable Skill body stayed byte-for-byte identical. No cold request was contaminated by a prior cache hit.</p><p>The cache-hit split was clean:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;091c584c-c7b5-48cb-8543-ff72e66dac84&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Stable Skill-first layouts: 
19 / 20 warm cache hits  

Dynamic-first layouts: 
0 / 20 warm cache hits </code></pre></div><p>The token mix showed the practical difference.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;fb291a1a-e6ea-4ec9-be5e-1d8c231fa3e5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dynamic_first_cache_hostile 
warm mean prompt tokens: 9,476.5 
warm mean cached tokens: 0 
warm mean fresh input tokens: 9,476.5  

stable_skill_first_cache_aware 
warm mean prompt tokens: 9,455 
warm mean cached tokens: 8,960 
warm mean fresh input tokens: 495 </code></pre></div><p>Roughly the same amount of prompt context produced a different input profile. In the dynamic-first layout, every input token was processed fresh. In the stable Skill-first layout, most of the repeated instruction body became cached input.</p><p>Using OpenAI&#8217;s published GPT-4.1 mini prices at the time of writing, the estimated warm-request cost changed materially. The exact dollars are model- and date-specific, but the token economics are the point.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4cffcd3e-f7af-4f54-ba3b-4c9d32ad07c0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dynamic_first_cache_hostile: about $0.00385 per warm request  
stable_skill_first_cache_aware: about $0.00116 per warm request </code></pre></div><p>That is roughly a 70% reduction in estimated warm-request cost for this synthetic benchmark.</p><p>The latency result was less clean. Time to first token (TTFT) improved in the stable-first variants, but hosted API latency includes routing, queueing, server load, streaming behavior, network timing, and output generation. I would treat the latency numbers as directional, not guaranteed.</p><p>The stronger result is about cache eligibility and token economics:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;aab5db8b-6b43-4d98-a3eb-363010d3c0b7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Stable Skill-first layout:
high cache-hit rate 
high cached-token ratio 
low fresh-input-token count  

Dynamic-first layout: 
zero cache hits 
all input tokens processed fresh </code></pre></div><p>That is the design point. The Skill body was not only instruction text. In the stable-first layout, it became a reusable prefix the serving layer could cache.</p><h2>Closing</h2><p>Prompt caching started as a pricing detail for me.</p><p>It is not just that.</p><p>For agent systems, it changes the design question.</p><p>Not only:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b955c9a0-c02b-4f12-b93e-5c127c532b37&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">What context should the model have? </code></pre></div><p>Also:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;fdd873a4-f3c9-4f61-8b9e-f55d52011fcd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Where does that context live? How often does it change? Can the serving layer reuse it? </code></pre></div><p>Skills make that question concrete.</p><p>A Skill is reusable guidance for the model. If it is written and positioned carefully, it can also become reusable work for the inference system.</p><p>That does not make Skills magic. It makes them a useful design boundary.</p><p>The stable part of the workflow can become the prefix.  </p><p>The changing inputs can become the suffix.</p><p>That will not be the right layout for every task. Some systems need dynamic routing, safety state, permissions, or retrieved evidence earlier in the prompt. 
Some frameworks will reorder or compress context before the provider sees it.</p><p>So the point is not to worship stable prefixes.</p><p>The point is to know when you are breaking one.</p><p>Prompt caching gives agent builders a new thing to measure: not just answer quality, not just total tokens, but whether repeated work is actually being reused.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 75]]></title><description><![CDATA[Claude writes 80% of Anthropic's code, Google rents SpaceX's GPUs, Microsoft breaks from OpenAI, New York moves to ban data centers, and hackers fooled Meta's AI]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-600</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-600</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 06 Jun 2026 22:19:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3c1f94eb-e47f-4b95-ab16-32c302c188f5_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a 
class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LCuf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LCuf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 424w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 848w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1272w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LCuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png" width="1456" height="1570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1570,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:515707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/200942958?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LCuf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 424w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 848w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1272w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Anthropic raises, Google takes a slice of the space compute pie and Washington wants in.</h2><ul><li><p><strong>Anthropic raised $65B at a $965B valuation and filed to go public.</strong> It <a href="https://www.anthropic.com/news/confidential-draft-s1-sec">confidentially submitted a draft S-1</a> and <a href="https://techcrunch.com/2026/06/04/ahead-of-its-ipo-anthropics-daniela-amodei-shrugs-off-doubts-about-ais-returns/">pushed back on doubts about AI&#8217;s returns</a> ahead of the listing.</p></li><li><p><strong>Alphabet is raising about $85B to fund its AI buildout.</strong> The <a href="https://techcrunch.com/2026/06/03/alphabets-record-breaking-85b-raise-for-googles-ai-business-is-a-helluva-good-signal/">record equity offering</a> landed days after it <a 
href="https://techcrunch.com/2026/06/01/alphabet-plans-to-raise-80-billion-to-pay-for-ai-buildout/">signaled an $80B plan</a>.</p></li><li><p><strong>Google agreed to pay SpaceX $920M a month for roughly 110,000 GPUs.</strong> The <a href="https://techcrunch.com/2026/06/05/google-will-pay-spacex-920m-per-month-for-compute/">Google-SpaceX deal</a> runs through 2029. Elsewhere in the buildout, AirTrunk committed <a href="https://techcrunch.com/2026/06/05/airtrunk-commits-30b-to-build-5gw-of-ai-data-centers-in-india/">$30B for 5GW of data centers in India</a> and SoftBank pledged <a href="https://techcrunch.com/2026/05/30/softbank-says-it-will-invest-up-to-e75-billion-to-build-french-data-centers/">up to &#8364;75B for French data centers</a>.</p></li><li><p><strong>The token bill came due.</strong> The Linux Foundation launched a <a href="https://techcrunch.com/2026/06/05/the-token-bill-comes-due-inside-the-industry-scramble-to-manage-ais-runaway-costs/">Tokenomics Foundation</a> to discipline AI spend, after <a href="https://techcrunch.com/2026/06/02/uber-caps-employee-ai-spending-after-blowing-through-budget-in-four-months/">Uber capped employee AI budgets</a> and <a href="https://techcrunch.com/2026/05/30/what-a-joke-github-copilots-new-token-based-billing-spurs-consternation-among-devs/">GitHub&#8217;s usage-based Copilot billing drew a developer revolt</a>.</p></li><li><p><strong>Washington wants a stake in OpenAI.</strong> Altman and the White House are <a href="https://www.cnbc.com/2026/06/05/trump-open-ai-altman-stake.html">in talks for the government to take equity</a>, with OpenAI floating donated shares to seed a public wealth fund and Trump saying the American public could &#8220;become a partner.&#8221; Altman separately <a href="https://www.reuters.com/business/openais-altman-urge-us-lawmakers-not-require-ai-model-approvals-2026-06-03/">lobbied against mandatory model approvals</a>.</p></li></ul><div><hr></div><h2>Microsoft started building its way out from under 
OpenAI.</h2><ul><li><p><strong>Microsoft shipped seven homegrown models, including its first advanced reasoning model.</strong> The <a href="https://www.theverge.com/tech/941664/microsoft-ai-model-reasoning-mai-thinking-1-build-2026">MAI lineup</a> was pitched as <a href="https://www.cnbc.com/2026/06/02/microsoft-unveils-new-ai-models-lessen-reliance-on-openai-lower-costs.html">a move toward self-sufficiency and lower developer costs</a>.</p></li><li><p><strong>Its AI chief said the company was &#8220;set free&#8221; from OpenAI to pursue superintelligence.</strong> Mustafa Suleyman framed independence as <a href="https://venturebeat.com/technology/microsoft-ai-chief-says-company-was-set-free-from-openai-to-pursue-superintelligence">the real project</a>, with models trained from scratch.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div></li><li><p><strong>It launched Scout, an OpenClaw-based assistant, and Project Solara, a platform for agent-first devices.</strong> Scout is <a href="https://www.microsoft.com/en-us/microsoft-365/blog/2026/06/02/introducing-microsoft-scout-your-always-on-personal-agent/">an always-on personal agent that works across Microsoft 365</a>; Solara is <a href="https://commandline.microsoft.com/project-solara-build-2026/">a chip-to-cloud platform for agent-first devices</a>.</p></li><li><p><strong>It is building a frontier health model with Mayo Clinic.</strong> The <a href="https://news.microsoft.com/source/2026/06/02/mayo-clinic-and-microsoft-collaborate-to-develop-a-frontier-ai-model-for-healthcare/">partnership</a> pairs Mayo&#8217;s clinical data with Microsoft&#8217;s AI, alongside a <a href="https://blogs.nvidia.com/blog/microsoft-build-windows-local-cloud-devices/">unified agentic stack with NVIDIA</a>.</p></li></ul><div><hr></div><h2>LangChain, Salesforce, and Anthropic shipped agent infrastructure, and hackers fooled Meta&#8217;s support AI.</h2><ul><li><p><strong>Anthropic turned Claude Code into a platform.</strong> It shipped <a href="https://claude.com/blog/a-harness-for-every-task-dynamic-workflows-in-claude-code">dynamic multi-agent workflows</a> and documented <a href="https://claude.com/blog/lessons-from-building-claude-code-how-we-use-skills">how it runs hundreds of internal skills</a>.</p></li><li><p><strong>LangChain built out the production-agent stack.</strong> Across the week: <a 
href="https://www.langchain.com/blog/designing-efficient-verifiers-for-legal-agents">efficient verifiers for legal agents</a>, self-correcting <a href="https://www.langchain.com/blog/introducing-rubrics-for-deepagents">Rubrics</a>, <a href="https://www.langchain.com/blog/model-neutrality">model neutrality</a>, <a href="https://www.langchain.com/blog/fault-tolerance-in-langgraph">fault tolerance in LangGraph</a>, and <a href="https://www.langchain.com/blog/give-your-ai-agent-its-own-computer">a sandboxed computer for every agent</a>.</p></li><li><p><strong>Salesforce and Google pushed agents past the pilot stage.</strong> Salesforce detailed <a href="https://www.salesforce.com/blog/ai-agent-production-tips/">what it takes to ship to production</a>, where one deployment cut its conversation failure rate from 33% to 0.5%, while Google added <a href="https://research.google/blog/unlocking-dependable-responses-with-gemini-enterprise-agent-platforms-agentic-rag/">agentic RAG that keeps searching until it has enough context</a>.</p></li><li><p><strong>Then the bill for autonomy arrived.</strong> Hackers <a href="https://simonwillison.net/2026/Jun/1/hackers-simply-asked-meta-ai/">talked Meta&#8217;s AI support agent into handing over Instagram accounts</a>, an exploit <a href="https://www.technologyreview.com/2026/06/05/1138437/the-meta-hack-shows-theres-more-to-ai-security-than-mythos/">MIT Technology Review used to argue that AI agents are too eager to please</a>. 
OpenAI shipped <a href="https://simonwillison.net/2026/Jun/5/openai-help-lockdown-mode/">Lockdown Mode</a> to cut the data-exfiltration leg of prompt-injection attacks.</p></li></ul><div><hr></div><h2>NVIDIA&#8217;s Nemotron anchored an open and fast model surge.</h2><ul><li><p><strong>Nemotron 3 Ultra landed on AWS and Perplexity.</strong> The 550B-parameter open MoE went <a href="https://aws.amazon.com/blogs/machine-learning/nvidia-nemotron-3-ultra-now-available-on-amazon-sagemaker-jumpstart/">one-click on SageMaker JumpStart</a> and <a href="https://x.com/perplexity_ai/status/2062976272436002825">live for Perplexity Pro and Max</a>. NVIDIA also shipped <a href="https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety">a 4B safety model that reasons over custom policies</a>.</p></li><li><p><strong>Speed became the headline spec.</strong> Cerebras reported <a href="https://www.cerebras.ai/blog/which-is-faster-gemini-3-5-flash-or-kimi-k2-6-on-cerebras">Kimi K2.6 finishing a task in 5.6s to Gemini 3.5 Flash&#8217;s 17.5s</a>, and clearing 452ms time-to-first-token for real-time voice.</p></li><li><p><strong>Small models proved they are a design choice.</strong> A Hugging Face hackathon ran <a href="https://huggingface.co/blog/build-small-hackathon/thousand-token-wood-sim">a multi-agent economy on a 3B model</a>, and Holo3.1 brought <a href="https://huggingface.co/blog/Hcompany/holo31">fast local computer-use agents</a>.</p></li><li><p><strong>Alibaba zigged.</strong> Qwen3.7-Plus added multimodal inputs at low cost but <a href="https://venturebeat.com/technology/alibabas-qwen3-7-plus-supports-text-video-and-imagery-inputs-at-low-cost-of-0-4-1-6-per-1m-token-but-its-proprietary">shipped closed-source</a>, breaking from its open-weight history.</p></li></ul><div><hr></div><h2>The Empire Strikes Back</h2><ul><li><p><strong>New York moved on two AI bills.</strong> The legislature is <a 
href="https://www.politico.com/news/2026/06/02/new-york-one-year-data-center-moratorium-00946477">poised to pass a one-year data center moratorium</a>, which would be the first statewide ban if Gov. Hochul signs it, and passed <a href="https://www.nysenate.gov/legislation/bills/2025/S9051/amendment/B">a bill barring AI chatbots from posing as companions to kids</a> 60-0, now awaiting her signature.</p></li><li><p><strong>Courts are filling with AI-written filings.</strong> A study found <a href="https://www.technologyreview.com/2026/06/04/1138391/courts-coping-ai-lawsuits/">AI-flagged self-represented lawsuits are surging</a>. Florida <a href="https://techcrunch.com/2026/06/01/florida-sues-openai-sam-altman-in-first-of-its-kind-lawsuit-over-violent-incidents/">sued OpenAI and Altman</a>, and a UK lawmaker <a href="https://www.reuters.com/legal/government/british-lawmaker-sues-musks-xai-over-sexualised-grok-images-2026-06-03/">sued xAI over Grok images</a>.</p></li><li><p><strong>Trump signed a narrower AI oversight order after industry pushback.</strong> The <a href="https://techcrunch.com/2026/06/02/trump-signs-narrower-executive-order-on-ai-oversight-after-industry-objections/">revamped order</a> asks for voluntary model submissions instead of mandates.</p></li><li><p><strong>AI&#8217;s social friction showed up everywhere.</strong> Ladybird <a href="https://simonwillison.net/2026/Jun/5/andreas-kling/">stopped accepting public pull requests over AI-generated patches</a>, Meta <a href="https://www.reuters.com/world/meta-scales-back-ai-mouse-clicks-tool-citing-employee-concerns-2026-06-02/">rolled back an employee mouse-tracking tool</a>, and China is <a href="https://www.reuters.com/business/media-telecom/china-bets-ai-promote-president-xi-jinpings-thinking-2026-06-05/">funding an AI agent to promote Xi Jinping&#8217;s thinking</a>.</p></li></ul><p>Andrew Ng&#8217;s warning for the week: the cyber risk is real this time, which is exactly when lobbyists overreach 
for <a href="https://www.deeplearning.ai/the-batch/issue-356/">excessive regulation</a>.</p><div><hr></div><h2><strong>&#11088; </strong>Featured: Anthropic is measuring how fast Claude can build the next Claude.</h2><p>Anthropic&#8217;s Institute published <a href="https://www.anthropic.com/institute/recursive-self-improvement">When AI builds itself</a>, a data-heavy look at how much of its own development the company has already handed to Claude, and what that implies for recursive self-improvement: an AI fully autonomously designing and developing its own successor. The piece is careful. That is not here yet, and not inevitable. But it argues the trend lines point that way, and could arrive sooner than most institutions are prepared for.</p><p>The internal numbers are the story. As of May 2026, more than 80% of the code merged into Anthropic&#8217;s codebase is written by Claude, up from low single digits before Claude Code launched in February 2025, and the typical engineer now merges 8x as much code per day as in 2024. On a fixed test that asks a model to speed up AI-training code, Claude went from a roughly 3x speedup with Opus 4 in May 2025 to about 52x with Mythos Preview in April 2026, against roughly 4x for a skilled human given four to eight hours. 
In one weak-to-strong supervision project, Claude agents recovered 97% of the available gap over 800 compute-hours and about $18,000, where two human researchers managed 23% in a week.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8q1s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8q1s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 424w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 848w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1272w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8q1s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png" width="960" height="558" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:212010,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/200942958?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8q1s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 424w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 848w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1272w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The honest part is the caveat. Anthropic says the one thing still mostly in human hands is research taste: choosing which problems matter, which results to trust, when an approach is a dead end. But it shows that gap closing too. Shown only the first half of real research sessions, Claude picked a better next step than the human 64% of the time in April 2026, up from 51% in November. 
The piece sketches three futures, from the trend quietly stalling to full recursive self-improvement, and argues the world should build verifiable mechanisms now that preserve the option to slow or pause frontier development before it is needed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!edHt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!edHt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 424w, https://substackcdn.com/image/fetch/$s_!edHt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 848w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1272w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!edHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png" width="960" height="557" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:557,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:253818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/200942958?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!edHt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 424w, https://substackcdn.com/image/fetch/$s_!edHt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 848w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1272w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>What to watch for:</strong> Whether &#8220;research taste&#8221; turns out to be one more capability models fail at for a while, then suddenly do not.</p><div><hr></div><h2><strong>&#127909; </strong>Worth a Watch</h2><div id="youtube2-wNWz5Hbh5VQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;wNWz5Hbh5VQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/wNWz5Hbh5VQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p><strong>An OpenAI model disproved an 80-year-old Erd&#337;s conjecture, and the researchers walk through 
how.</strong> On <a href="https://www.youtube.com/watch?v=wNWz5Hbh5VQ">OpenAI Podcast Ep. 20</a>, Alexander Wei, Hongxun Wu, and Lijie Chen explain how a general-purpose model (not a math-specific one, the same kind that powers Codex) cracked the unit distance conjecture, a problem Erd&#337;s once put a $500 bounty on.</p></li><li><p><strong>The proof bridged two fields that rarely meet.</strong> It showed the square grid is far from optimal by applying class field theory to combinatorial geometry, after grounding itself by looking up &#8220;unit&#8221; in the Cambridge dictionary and producing a 125-page chain of thought. With enough test-time compute, it lands the result about half the time.</p></li><li><p><strong>The reaction is the fun part.</strong> Reviewers went from &#8220;there&#8217;s no way this is true&#8221; to losing sleep over it, and within a week other mathematicians used the same idea to disprove a related result.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://techcrunch.com/2026/06/04/apple-approves-poke-as-the-first-ai-agent-on-its-messages-for-business-platform/">Apple approved Poke as the first AI agent on Messages for Business</a></strong> &#8212; your iMessage thread is now an agent surface, and Apple charges per user.</p></li><li><p><strong><a href="https://www.reuters.com/business/retail-consumer/amazon-unveils-new-ai-warehouse-robot-12-billion-europe-push-2026-06-04/">Amazon unveiled a conversational AI warehouse robot in an $11.6B Europe push</a></strong> &#8212; robotics and logistics keep merging.</p></li><li><p><strong><a href="https://research.google/blog/towards-passive-heart-health-monitoring-via-smartphone-camera/">Google can read your resting heart rate from a selfie</a></strong> &#8212; front-camera vitals, accurate across skin tones.</p></li><li><p><strong><a href="https://www.reuters.com/technology/chatgpt-app-hits-1-billion-monthly-active-users-record-time-data-shows-2026-06-02">ChatGPT hit 
1 billion monthly active users in record time</a></strong> &#8212; the fastest app to the milestone.</p></li><li><p><strong><a href="https://openai.com/index/chatgpt-memory-dreaming/">OpenAI&#8217;s ChatGPT memory now updates itself in the background</a></strong> &#8212; &#8220;dreaming&#8221; replaces save-on-command.</p></li><li><p><strong><a href="https://www.reuters.com/business/hpe-expects-achieve-2028-financial-targets-this-year-after-record-quarter-ai-2026-06-01/">HPE raised its forecast on AI demand and the stock jumped</a></strong> &#8212; the buildout still has buyers.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/30/meta-is-reportedly-developing-an-ai-pendant/">Meta is reportedly building an AI pendant</a></strong> &#8212; the wearable land grab continues.</p></li><li><p><strong><a href="https://x.com/AnthropicAI/status/2062979607448682731">Anthropic published research on making Claude a chemist</a></strong> &#8212; pushing models from code into the hard sciences.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 74]]></title><description><![CDATA[Anthropic valued at $965B. 
Claude Code gets more agentic. Enterprise agents ran into permissions. DeepSeek cut prices. Google pushed AI media verification into Search & Chrome. The Pope talks AI.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-3e4</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-3e4</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 31 May 2026 11:28:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/416e04c9-f68d-4b93-b3d7-3211e77042c5_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!riT4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!riT4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 424w, https://substackcdn.com/image/fetch/$s_!riT4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 848w, https://substackcdn.com/image/fetch/$s_!riT4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 1272w, https://substackcdn.com/image/fetch/$s_!riT4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!riT4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png" width="1456" height="2825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1110659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199912055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!riT4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 424w, https://substackcdn.com/image/fetch/$s_!riT4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 848w, https://substackcdn.com/image/fetch/$s_!riT4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 1272w, 
https://substackcdn.com/image/fetch/$s_!riT4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Anthropic raised $65B, shipped Opus 4.8, and turned Claude Code into an orchestration product.</h2><ul><li><p>Anthropic raised a <a href="https://www.anthropic.com/news/series-h">$65B Series H</a> at a $965B post-money valuation.
Reuters framed the raise around <a href="https://www.reuters.com/business/anthropic-raises-65-billion-now-valued-965-billion-2026-05-28/">Claude demand and compute needs</a>, while Apollo and Blackstone are reportedly working on a <a href="https://www.reuters.com/business/apollo-blackstone-work-36-billion-debt-deal-anthropic-bloomberg-news-reports-2026-05-28/">$36B debt deal</a> tied to infrastructure expansion.</p></li><li><p>Simon Willison analyzed Anthropic&#8217;s <a href="https://simonwillison.net/2026/May/29/">run-rate revenue and Series H</a>, pointing out why the disclosed numbers matter if Anthropic eventually files for an IPO.</p></li><li><p>Anthropic launched <a href="https://www.anthropic.com/news/claude-opus-4-8">Claude Opus 4.8</a>, with stronger long-horizon work and a cheaper fast mode. VentureBeat covered the <a href="https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment">3x cheaper fast mode</a>.</p></li><li><p>Opus 4.8 landed across <a href="https://aws.amazon.com/about-aws/whats-new/2026/05/claude-opus-4.8-aws/">AWS</a>, <a href="https://github.blog/changelog/2026-05-28-claude-opus-4-8-is-generally-available-for-github-copilot/">GitHub Copilot</a>, <a href="https://x.com/cursor_ai/status/2060044920237469872">Cursor</a>, <a href="https://x.com/perplexity_ai/status/2060049662044962858">Perplexity</a>, and <a href="https://vercel.com/changelog/opus-4-8-on-ai-gateway">Vercel AI Gateway</a>.</p></li><li><p>Claude Code got <a href="https://claude.com/blog/introducing-dynamic-workflows-in-claude-code">dynamic workflows</a>: Claude writes orchestration scripts, spins up tens to hundreds of subagents, and checks its own work before reporting back. 
Claude said the feature is built for migrations, bug hunts, and large repo-wide tasks.</p></li><li><p>ClaudeDevs said dynamic workflows can be <a href="https://x.com/ClaudeDevs/status/2060044858480599067">reused as slash commands</a>, but also warned they can <a href="https://x.com/ClaudeDevs/status/2060044856114942328">consume tokens quickly</a>.</p></li><li><p>Opus 4.8 now supports <a href="https://x.com/ClaudeDevs/status/2060432688281251998">mid-conversation system instructions without breaking prompt caching</a>. ClaudeDevs said it hit <a href="https://x.com/ClaudeDevs/status/2060043209833951575">69.2% on SWE-bench Pro</a>, up from 64.3% for Opus 4.7.</p></li><li><p>Anthropic shipped a <a href="https://x.com/ClaudeDevs/status/2059385242319012188">Claude Code security-guidance plugin</a>, reporting a 30 to 40% decrease in security-related PR comments during internal rollout.</p></li></ul><div><hr></div><h2>Enterprise agents ran into the boring but important stuff: permissions, logs, recovery, and access control.</h2><ul><li><p>Salesforce described its <a href="https://www.salesforce.com/blog/marketing-mcp-server/">Marketing MCP Server</a> as a way for Agentforce Marketing agents to connect to campaign data, content, and workflow actions.</p></li><li><p>Google brought <a href="https://blog.google/security/bringing-ai-agents-to-chrome-enterprise-security-management/">MCP-based agents into Chrome Enterprise security management</a>.</p></li><li><p>VentureBeat argued the enterprise agent bottleneck <a href="https://venturebeat.com/orchestration/the-ai-agent-bottleneck-isnt-model-performance-its-permissions">is permissions, not model performance</a>.</p></li><li><p>VentureBeat also reported that production agents are entering a <a href="https://venturebeat.com/orchestration/ai-agents-are-entering-their-rebuild-era-as-enterprises-confront-the-reliability-problem">rebuild phase</a>, where durable workflows need state, recovery, observability, governance, and cost 
visibility.</p></li><li><p>Anthropic published a <a href="https://claude.com/blog/zero-trust-for-ai-agents">Zero Trust framework for AI agents</a>, covering prompt injection, tool poisoning, identity abuse, and memory poisoning.</p></li><li><p>Remote said it grew <a href="https://techcrunch.com/2026/05/27/payroll-startup-remote-says-it-grew-revenue-50-per-employee-without-adding-headcount/">revenue 50% per employee without adding headcount</a> and is exposing payroll and compliance workflows through MCP.</p></li><li><p>Robinhood launched <a href="https://techcrunch.com/2026/05/27/robinhood-now-lets-your-ai-agents-trade-stocks/">AI agent trading accounts</a> with dedicated wallets, notifications, approvals, fraud review, and virtual cards.</p></li><li><p>An arXiv paper argued agentic AI is moving from <a href="https://arxiv.org/abs/2605.26112v1">model scaling to system scaling</a>, where the harness around the model becomes the bottleneck.</p></li></ul><div><hr></div><h2>Coding agents are producing more work, and maintainers are feeling the cleanup.</h2><ul><li><p>Cursor launched <a href="https://x.com/cursor_ai/status/2060406013098897765">auto-review
mode</a>, reducing approval prompts while keeping agent tool calls safer.</p></li><li><p>Cursor released its <a href="https://x.com/cursor_ai/status/2060025063899058458">Developer Habits Report</a>, reporting that developers are producing more <a href="https://x.com/cursor_ai/status/2060025074405327046">mega PRs</a> with agents.</p></li><li><p>Cursor also said <a href="https://x.com/cursor_ai/status/2060025076947521984">input tokens are now the majority of price-equivalent token costs</a>, and that <a href="https://x.com/cursor_ai/status/2060025070425395562">cost per accepted line varies roughly 7x</a> across model families.</p></li><li><p>OpenAI expanded <a href="https://x.com/OpenAI/status/2060398873974608199">Codex computer use to Windows</a>, including mobile task steering while work continues on a Windows machine.</p></li><li><p>Figma launched <a href="https://venturebeat.com/ai/figma-make-just-collapsed-the-wall-between-design-mockups-and-production-code/">two-way GitHub integration for Figma Make</a>, letting design changes move into production-code workflows.</p></li><li><p>CodeRabbit described how it built an <a href="https://claude.com/blog/how-coderabbit-used-claude-to-build-an-agent-orchestration-system">agent orchestration system on Claude</a>. OpenAI and Thrive described a Codex-powered <a href="https://openai.com/index/thrive-codex-tax-ai/">tax agent</a> that processed 7,000 returns.</p></li><li><p>SQLite added <a href="https://simonwillison.net/2026/May/27/sqlite-agents/">AGENTS.md guidance</a> rejecting agentic code submissions while still accepting reproducible bug reports. 
Simon Willison also covered the pressure the <a href="https://simonwillison.net/2026/May/26/the-pressure/">curl team faces from AI-assisted security reports</a>.</p></li><li><p>VentureBeat covered <a href="https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole">DeepSWE</a>, a coding benchmark that raised concerns about contamination, verifier reliability, and environment exploitation.</p></li></ul><div><hr></div><h2>AI got cheaper at the same time frontier labs got more expensive.</h2><ul><li><p>Anthropic raised a <a href="https://www.anthropic.com/news/series-h">$65B Series H</a> and Reuters reported a possible <a href="https://www.reuters.com/business/apollo-blackstone-work-36-billion-debt-deal-anthropic-bloomberg-news-reports-2026-05-28/">$36B infrastructure debt deal</a>. Frontier AI still looks capital-intensive.</p></li><li><p>DeepSeek made a <a href="https://venturebeat.com/infrastructure/how-deepseeks-radical-architecture-is-shattering-silicon-valleys-token-moat">permanent V4 price cut</a>, putting pressure on premium API pricing.</p></li><li><p>Pinterest reportedly <a href="https://venturebeat.com/orchestration/pinterest-cut-ai-costs-90-by-gutting-a-frontier-models-vision-layer">cut AI costs 90%</a> by customizing Qwen3-VL around proprietary embeddings.</p></li><li><p>Claude said Opus 4.8 fast mode is roughly <a href="https://x.com/claudeai/status/2060042706844315866">2.5x faster and 3x cheaper</a>.</p></li><li><p>Glean crossed <a href="https://techcrunch.com/2026/05/28/gleans-top-line-crosses-300m-as-ai-budget-cutting-becomes-its-major-selling-point/">$300M ARR</a> while positioning context quality as a way to reduce token usage.</p></li><li><p>Perplexity open-sourced a faster <a href="https://x.com/perplexity_ai/status/2059642904956694730">Unigram tokenizer</a> to cut CPU utilization for low-latency retrieval work.</p></li><li><p>Nathan Lambert argued <a 
href="https://x.com/natolambert/status/2060056671142261003">licenses help open ecosystem stability</a>, praised NVIDIA for <a href="https://x.com/natolambert/status/2060051590627897768">open model leadership</a>, and said Gemma 4 adoption is <a href="https://x.com/natolambert/status/2059230008890564855">outpacing Qwen</a> at comparable sizes.</p></li><li><p>Hugging Face published practical tooling, including <a href="https://huggingface.co/blog/FormosanBank/nllb-200-mt">fine-tuning NLLB-200</a>, <a href="https://huggingface.co/blog/torch-profiler">CUDA profiling in PyTorch</a>, and <a href="https://huggingface.co/blog/AmelieSchreiber/toricgt">ToricGT</a>.</p></li></ul><div><hr></div><h2>Verification became the expensive part.</h2><ul><li><p>Google DeepMind said SynthID has watermarked more than <a href="https://x.com/GoogleDeepMind/status/2059235181274202500">100B pieces of content</a>, with watermarking partnerships across OpenAI, ElevenLabs, and Kakao.</p></li><li><p>SynthID verification is expanding into <a href="https://x.com/GoogleDeepMind/status/2059235184130535436">Search and Chrome</a>, giving users a way to check whether media may have been AI-generated.</p></li><li><p>Pixel videos will include <a href="https://x.com/GoogleDeepMind/status/2059235187003642154">creation and edit history</a>, basically a receipt for how the media was made.</p></li><li><p>YouTube will automatically label <a href="https://techcrunch.com/2026/05/27/youtube-will-now-automatically-label-ai-videos/">significant photorealistic AI video</a> using C2PA metadata and YouTube AI tools.</p></li><li><p>OpenAI published a <a href="https://openai.com/index/trustworthy-third-party-evaluations-foundations/">playbook for trustworthy third-party evaluations</a> and its <a href="https://openai.com/index/openai-frontier-governance-framework/">Frontier Governance Framework</a>.</p></li><li><p>Illinois passed an AI bill requiring <a 
href="https://www.nbcnews.com/tech/tech-news/illinois-legislature-passes-historic-ai-bill-rcna347191">third-party safety audits</a>.</p></li><li><p>ITBench-AA found frontier models scoring below 50% on <a href="https://huggingface.co/blog/ibm-research/itbench-aa">agentic enterprise IT tasks</a>.</p></li><li><p>Researchers introduced <a href="https://arxiv.org/abs/2605.27355v1">alignment tampering</a>, where an LLM undergoing RLHF can influence preference data. Researchers also reportedly stripped guardrails from Google and Meta open-weight models <a href="https://www.eweek.com/news/open-weight-ai-guardrails-gemma-llama/">in minutes</a>.</p></li></ul><div><hr></div><h2>&#11088; Featured: OpenAI published a playbook for trustworthy third-party evaluations.</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jWBZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jWBZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 424w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 848w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png" width="1456" height="918" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:918,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:146659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199912055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jWBZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 424w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 848w, 
https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>OpenAI <a href="https://openai.com/index/trustworthy-third-party-evaluations-foundations/">released a guide for how independent
third parties should evaluate frontier models</a>, and its core argument is that a benchmark score means little without the setup that produced it.</p><p>The central concept is the harness: the prompts, tools, memory, retries, and control logic wrapped around a model. Early evaluations treated models like chatbots, one prompt and one answer. Today&#8217;s models use tools, hold state across many steps, and recover from mistakes, so the harness can decide whether a capability shows up at all. OpenAI&#8217;s own data makes the point. GPT-5.5 solved 69.2% of cyber-range tasks without compaction and 92.3% with it. In a UK AISI test, raising the token budget from 10M to 100M lifted performance by up to 59%.</p><p>The guide also names the ways scores mislead. Reward hacking inflates them: METR found GPT-5.4&#8217;s apparent 13-hour task horizon dropped to 6 hours once hacked successes were removed. Sandbagging is hard to rule out: Apollo found evaluation-awareness in 52% of its sandbagging-test samples, even though the model still answered correctly. Contamination, refusals, and broken tasks each distort results in their own direction.</p><p>This connects to the rest of the week. Illinois passed a bill requiring third-party safety audits. DeepSWE exposed contamination and environment exploitation in a coding benchmark. ITBench-AA found every frontier model below 50% on enterprise SRE tasks. Across all of these, the contested ground is the same: how to trust a measurement of what AI can do.</p><p>The useful shift is that the playbook treats evaluation as system design.
A score is performance under a specific harness and budget, not a fixed measure of what a model can do.</p><p><strong>What to watch for:</strong> whether third-party AI evaluation starts to look more like audit infrastructure than benchmark publishing.</p><div><hr></div><h2><strong>&#127897;&#65039;</strong> Worth a Watch</h2><div id="youtube2-4D3hDmGhFhA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;4D3hDmGhFhA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/4D3hDmGhFhA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p><strong>Work splits into two surfaces.</strong> One company agent you delegate to in Slack, and Codex / Claude co-work as the &#8220;operating system&#8221; where the real work happens: email, docs, research, and SaaS apps running inside the agent&#8217;s in-app browser.</p></li><li><p><strong>Shipper flipped from personal agents to one company &#8220;super agent.&#8221;</strong> The OpenClaw hype showed that personal agents still break constantly and need babysitting. His read is that companies start with one general agent, then specialize downward as models get more independent.</p></li><li><p><strong>The SaaS apocalypse is dumb.</strong> Agents increase the number of SaaS users rather than replacing them. Users bring their own tokens, which could protect SaaS margins.
The product shift is building software that humans and agents can use together.</p></li><li><p><strong>CLIs are over as the main surface.</strong> &#8220;We made GUIs for a reason.&#8221; Most technical people at Every moved off the terminal as their main workspace and back into Codex, Claude Code, and Cursor.</p></li><li><p><strong>Automation is a lie.</strong> Every agent needs a human. The forward-deployed engineer who gardens the agent may become one of the most valuable new hires. Models make yesterday&#8217;s competence cheap, so humans move ahead to do what is not yet framable.</p></li><li><p><strong>PMs and full-stack designers win.</strong> If the build step keeps getting easier, taste and product sense become more valuable. His advice is to &#8220;ride the models&#8221;: try every new release on your own workflows.</p></li><li><p><strong>Why it pairs with the Featured:</strong> OpenAI&#8217;s eval playbook explains why the harness around a model decides what it can do. Shipper&#8217;s thesis is the working version of that: the agent only performs when a human owns the harness around it.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><a href="https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-encyclical-magnifica-humanitas-ai.html">The Pope wrote about AI</a> | Vatican News &#8212; Pope Leo XIV&#8217;s encyclical focused on AI and human dignity, with concerns around labor, warfare, accountability, and concentrated power. Simon Willison had a good <a href="https://simonwillison.net/2026/May/25/encyclical-on-ai/">breakdown</a>, and Anthropic published Chris Olah&#8217;s <a href="https://www.anthropic.com/news/chris-olah-pope-leo-encyclical">remarks from the Vatican presentation</a>.</p></li><li><p><a href="https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs">One company spent $500M on Claude in a single month</a> | Axios &#8212; An AI consultant said the client never capped employee license usage.
Microsoft cut internal Claude Code licenses and Uber reportedly burned its 2026 AI budget by April.</p></li><li><p><a href="https://mistral.ai/news/ai-now-summit-2026">Mistral held AI Now Summit 2026</a> | Mistral &#8212; Industrial AI, Vibe, physics AI, and a new Les Ulis inference data center.</p></li><li><p><a href="https://mistral.ai/news/search-toolkit">Mistral released Search Toolkit</a> | Mistral &#8212; Open-source framework for production AI search pipelines.</p></li><li><p><a href="https://x.com/perplexity_ai/status/2060013327319577063">Perplexity launched Computer inside Microsoft Office apps</a> | Perplexity &#8212; Word, Excel, PowerPoint, and Outlook as agent surfaces.</p></li><li><p><a href="https://www.reuters.com/business/microsoft-release-new-coding-model-next-week-information-reports-2026-05-28/">Microsoft is reportedly preparing a homegrown coding model for Copilot</a> | Reuters &#8212; Another sign Microsoft is reducing OpenAI dependence where it can.</p></li><li><p><a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/introducing-computer-using-agents-in-copilot-studio/">Microsoft launched computer-using agents in Copilot Studio</a> | Microsoft &#8212; Computer use is becoming a platform feature, not a lab demo.</p></li><li><p><a href="https://techcrunch.com/2026/05/27/china-is-increasingly-keeping-its-best-ai-talent-to-itself/">China is tightening controls on top AI talent</a> | TechCrunch &#8212; AI researchers are starting to look like strategic national assets.</p></li><li><p><a href="https://www.cerebras.ai/blog/what-is-sovereign-ai-and-how-cerebras-helps-nations">Cerebras explained sovereign AI</a> | Cerebras &#8212; National AI infrastructure as a sales motion.</p></li><li><p><a href="https://openai.com/index/strengthening-societal-resilience-with-rosalind-biodefense/">OpenAI launched Rosalind Biodefense</a> | OpenAI &#8212; Trusted access for biodefense and pandemic-preparedness partners.</p></li><li><p><a 
href="https://www.reuters.com/technology/samsung-ships-samples-next-gen-hbm4e-memory-chips-2026-05-28/">Samsung began shipping 12-layer HBM4E samples</a> | Reuters &#8212; Memory bandwidth remains one of the core constraints on AI compute.</p></li><li><p><a href="https://developer.nvidia.com/blog/nvidia-cuda-13-3-enhances-gpu-development-with-tile-programming-compileiq-autotuning-and-python-updates/">NVIDIA published CUDA 13.3 updates</a> | NVIDIA &#8212; Tile programming, CompileIQ autotuning, and Python updates.</p></li><li><p><a href="https://techcrunch.com/2026/05/28/visa-invests-in-replit-to-power-agentic-payments-for-developers/">Visa invested in Replit to explore agentic payments</a> | TechCrunch &#8212; Payment rails for agents are becoming a real category.</p></li><li><p><a href="https://techcrunch.com/2026/05/26/umg-and-tiktok-renew-agreement-to-combat-unauthorized-ai-music/">Universal Music Group and TikTok renewed an agreement on AI music</a> | TechCrunch &#8212; Licensing and attribution are becoming the music industry&#8217;s AI battleground.</p></li><li><p><a href="https://www.reuters.com/legal/litigation/cnn-sues-perplexity-ai-over-alleged-copyright-infringement-2026-05-28/">CNN sued Perplexity over alleged copyright infringement</a> | Reuters &#8212; The search/chat/content boundary keeps getting tested in court.</p></li><li><p><a href="https://www.theverge.com/tech/945378/ansel-adams-trust-sues-danziger-gallery-ai-colorized-moonrise">The Ansel Adams Trust objected to an AI-colorized &#8220;Moonrise&#8221; exhibit</a> | The Verge &#8212; AI editing is now an authenticity fight, not just a copyright fight.</p></li><li><p><a href="https://www.theverge.com/ai-artificial-intelligence/944403/steven-rosenbaum-chatbot-fake-quote-book-future-truth">Steven Rosenbaum blamed chatbots for fabricated quotes in his book</a> | The Verge &#8212; Another example of why provenance and verification keep coming up.</p></li></ul>
]]></content:encoded></item><item><title><![CDATA[Can LangChain DeepAgents Explain a Codebase Architecture?]]></title><description><![CDATA[I used LangChain Deep Agents with async subagents to crawl real GitHub repos, map their architecture, generate diagrams, and check every claim against source files.]]></description><link>https://www.anothercodingblog.com/p/can-langchain-deepagents-explain</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/can-langchain-deepagents-explain</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 24 May 2026 21:13:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a2290e09-dd04-4a05-bcdb-14b84d4f6be9_1216x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I wanted to know whether LangChain DeepAgents could help me build real architectural understanding of an unfamiliar codebase faster.</p><p>The test was to point it at a real repository, ask it to produce an expert-level architecture dossier, and see whether the output could teach me the system well enough to make better engineering decisions.</p><p>I ended up building a repo
architecture workflow with a deterministic source crawler, async area subagents, a claim ledger, a diagram architect, and validation against source files.</p><p>The result was genuinely useful. After one full run, I had a clear map of the DeepAgents repo, the main packages, the core files, the extension points, the async subagent implementation, and the reading path I would follow if I were onboarding into the codebase cold.</p><h2><strong>What is a Deep Agent anyway, and why should you care?</strong></h2><p>The simplest way to think about a Deep Agent is an agent built for longer, messier work.</p><p>In LangChain&#8217;s DeepAgents package, a supervisor agent can use tools, filesystem context, and subagents to work through a task that would be awkward as one prompt. The supervisor owns the final answer. The subagents take bounded pieces of the work. The filesystem gives the run somewhere to keep intermediate artifacts like reports, notes, source packets, and plans.</p><p>That matters for codebase architecture because the work has a natural shape. You need to inspect the repo, split it into meaningful areas, read files in each area, compare claims against source evidence, and then turn the whole thing into a mental model a human can use.</p><p>DeepAgents also has AsyncSubAgent, which is especially interesting for this use case. An async subagent is launched as a background Agent Protocol task. The supervisor gets a task id back, can check status later, and can update the task if it needs a revision.</p><p>That maps really cleanly to architecture learning. A monorepo has separate threads of work. 
libs/deepagents, libs/cli, libs/code, examples, .github, and partner integrations can all be studied independently before synthesis.</p><h2><strong>The use case</strong></h2><p>The job was:</p><blockquote><p><em>Given a GitHub repo, produce a source-grounded architecture dossier that helps a developer build expert-level understanding of the system: how it is organized, where the important code lives, how the major pieces interact, which abstractions matter, what evidence supports each claim, and what to read next.</em></p></blockquote><p>This is the kind of work I do constantly when opening a new codebase. I want to know:</p><ul><li><p>What kind of repo is this?</p></li><li><p>Where is the real architecture root?</p></li><li><p>What are the major packages or areas?</p></li><li><p>What are the core abstractions?</p></li><li><p>How does the main flow work?</p></li><li><p>How do the important packages depend on each other?</p></li><li><p>Which extension points are real contracts?</p></li><li><p>Which files should I read first?</p></li><li><p>Which claims are grounded in source, and which ones are guesses?</p></li></ul><p>The target repo for the full run was the DeepAgents repo itself. 
So the experiment became recursive in a useful way: use DeepAgents to understand DeepAgents.</p><h2><strong>What I built</strong></h2><p>The workflow has three layers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O9rY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O9rY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 424w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 848w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1272w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O9rY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png" width="1456" height="906" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:906,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:248355,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199001179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O9rY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 424w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 848w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1272w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The first layer is deterministic. Before calling the model, the system crawls the repo and builds a source packet. That packet includes the repo shape, detected package areas, entrypoints, central files, docs, configs, tests, and resolved internal import edges.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The second layer is the agent workflow.</p><p>For the monorepo fan-out, I used DeepAgents AsyncSubAgent.</p><p>The <strong><a href="https://docs.langchain.com/oss/python/deepagents/async-subagents">official LangChain AsyncSubAgents docs</a></strong> describe them as a way for a supervisor agent to &#8220;launch background tasks that return immediately.&#8221; The supervisor can keep working while those tasks run, then check progress, send follow-up instructions, or cancel work if needed.</p><p>That fit this use case almost exactly. Each detected repo area gets its own background area-deep-dive task through a local LangGraph Agent Protocol server. Each async worker gets a bounded assignment, fetches source files for that area, and returns two artifacts:</p><ul><li><p>a Markdown area report</p></li><li><p>a structured JSON finding set</p></li></ul><p>Those area workers are the expensive part of the run, and they are the part that benefits from async. They can inspect different repo areas at the same time and then flow back into the final synthesis.</p><p>The handoff back to the supervisor is the important part. Each async area subagent returns a validated finding set and area report. The runner turns those into a consolidated area dossier bundle, then passes that bundle into the final DeepAgents supervisor as source-grounded context. 
If an area report fails validation, the same async task thread gets an update asking it to repair the report before the supervisor uses it.</p><p>The final synthesis uses a DeepAgents supervisor with regular specialist subagents:</p><ul><li><p>repository area mapper</p></li><li><p>repo cartographer</p></li><li><p>abstraction teacher</p></li><li><p>runtime flow tracer</p></li><li><p>diagram reviewer</p></li><li><p>diagram architect</p></li><li><p>reading path teacher</p></li><li><p>architecture validator</p></li></ul><p>That sync versus async split felt right. The area research can run in parallel because the work is independent. The final writeup, claim ledger, diagram selection, and validation need a staged order because each step depends on the previous artifact.</p><p>The third layer is validation. The system checks whether generated reports cite real repo-relative paths, avoid ambiguous filenames, include required source anchors, and stay grounded in source facts.</p><p>That validation layer is where most of the trust in the output comes from.</p><p>After testing a few model setups, the best version used a split:</p><ul><li><p>GPT-5.4 mini for the async area workers and final architecture synthesis</p></li><li><p>GPT-4.1 for deterministic repair loops after validation failures</p></li></ul><p>That split made sense in practice. The reasoning model (GPT-5.4 mini) produced a more useful teaching artifact. The repair model (GPT-4.1) was steadier at cleaning up path and grounding issues.</p><h2><strong>The diagram architect</strong></h2><p>The first architecture diagram was too simple. It was useful as an orientation map, but it did not teach much.</p><p>So I added a diagram-architect subagent.</p><p>Its job is to look at the claim ledger, the deterministic diagram pack, and the source facts, then decide which diagrams are actually useful. 
The deterministic renderer writes five Mermaid diagrams:</p><ul><li><p>repository map</p></li><li><p>public API flow</p></li><li><p>component evidence map</p></li><li><p>dependency evidence map</p></li><li><p>open questions map</p></li></ul><p>The diagram-architect reviews those diagrams inside the agent runtime and helps the final synthesis choose a better System Map.</p><p>This turned out to be a good split. Deterministic code can draw every node and edge it knows about. An agent is better at deciding which view teaches the architecture without turning the diagram into a giant file graph.</p><h2><strong>The full run</strong></h2><p>The full DeepAgents repo run used the async subagent path:</p><p><code>13 AsyncSubAgent area tasks launched<br>10 area reports passed validation<br>3 area reports still needed review<br>395 claims were written to the claim ledger<br>the claim ledger passed validation<br>5 architecture diagrams were generated<br>the final dossier passed deterministic validation<br>total runtime was about 8.1 minutes</code></p><p>The model split mattered here. A pure GPT-5.4 mini run produced richer notes, but the final dossier failed validation on evidence-format issues. A pure GPT-4.1 run passed final validation, but the explanation was more conservative. The hybrid run kept the richer architecture synthesis and still produced a final dossier that passed deterministic validation.</p><p>The claim ledger became the most important artifact.</p><p>Each claim has a type, confidence level, source, and evidence paths. Some claims come from deterministic source analysis. Others come from area subagents. 
For example:</p><ul><li><p>libs/deepagents owns the core agent framework.</p></li><li><p>libs/deepagents/deepagents/graph.py is the source evidence for create_deep_agent.</p></li><li><p>libs/deepagents/deepagents/middleware/subagents.py grounds SubAgentMiddleware.</p></li><li><p>libs/deepagents/deepagents/middleware/async_subagents.py grounds async subagent behavior.</p></li><li><p>libs/deepagents/deepagents/backends/protocol.py defines the backend contract.</p></li><li><p>libs/deepagents/deepagents/backends/state.py grounds the default state backend.</p></li></ul><p>That gave the final agent something stronger than chat history. It had a structured evidence map it could use during synthesis.</p><h2><strong>What the architecture learner found</strong></h2><p>The generated architecture map was useful.</p><p>The repo is a Python monorepo centered on the DeepAgents core package under libs/deepagents/deepagents. Around that core are packages for CLI/deployment, a React frontend, code-oriented skills, partner sandbox integrations, eval tooling, examples, and GitHub automation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cguw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cguw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 424w, https://substackcdn.com/image/fetch/$s_!cguw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 848w, 
https://substackcdn.com/image/fetch/$s_!cguw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!cguw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cguw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png" width="1456" height="954" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:954,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:288044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199001179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cguw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 424w, 
https://substackcdn.com/image/fetch/$s_!cguw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 848w, https://substackcdn.com/image/fetch/$s_!cguw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!cguw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The core package revolves around a few files:</p><p><strong>libs/deepagents/deepagents/graph.py</strong><br><em>What it teaches:</em> how create_deep_agent assembles the agent.</p><p><strong>libs/deepagents/deepagents/middleware/subagents.py</strong><br><em>What it teaches: </em>synchronous subagent delegation.</p><p><strong>libs/deepagents/deepagents/middleware/async_subagents.py</strong><br><em>What it teaches: </em>async/background subagent specs.</p><p><strong>libs/deepagents/deepagents/middleware/filesystem.py</strong><br><em>What it teaches:</em> file tools and permission rules.</p><p><strong>libs/deepagents/deepagents/backends/protocol.py</strong><br><em>What it teaches:</em> the backend interface.</p><p><strong>libs/deepagents/deepagents/backends/state.py</strong><br><em>What it teaches: </em>the default thread-scoped state backend.</p><p>The generated reading path was exactly the kind of thing I wanted from this experiment. It started with the public package entrypoint, moved into graph.py, then into middleware and backend contracts. That is how I would onboard myself into the repo manually.</p><h2><strong>A quick check on another repo</strong></h2><p>I also pointed the same architecture learner at <a href="https://github.com/facebookresearch/sam3">Meta&#8217;s facebookresearch/sam3 repo</a>.</p><p>This was not a full second case study. 
I wanted to know whether the workflow was accidentally tuned to the DeepAgents repo, or whether it could produce a useful architecture map for a different kind of codebase.</p><p>The SAM3 run was smaller:</p><p><code>2 repository areas detected<br>2 area reports passed validation<br>127 claims were written to the claim ledger<br>the claim ledger passed validation<br>5 architecture diagrams were generated<br>the final dossier passed deterministic validation<br>total runtime was about 1.5 minutes</code></p><p>The output found a clean architecture root at sam3, with sam3/model_builder.py as the main assembly point. The surrounding architecture broke into model utilities, agent/inference code, evaluation toolkits, training/config logic, performance helpers, and external scripts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7E9s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7E9s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 424w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 848w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7E9s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7E9s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:210858,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199001179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7E9s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 424w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 848w, 
https://substackcdn.com/image/fetch/$s_!7E9s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1272w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That was enough for me. The point was not to deeply explain SAM3 in this post. 
The point was that the architecture learner could move from a LangChain agent framework repo to a computer vision model repo and still produce a grounded map.</p><h2><strong>Where it still needed guardrails</strong></h2><p>The model had enough context to explain the architecture, but it still made small mistakes that matter in a source-grounded workflow.</p><p>Three area reports still needed review after repair: libs/evals, libs/partners/daytona, and libs/partners/quickjs. The failures were mostly formatting-level evidence issues, like a stray / being interpreted as a path, or an ellipsis showing up where the validator expected exact files.</p><p>That is exactly the kind of failure I want surfaced. The final dossier still passed because the synthesis had enough grounded evidence and did not rely on unsupported claims from those area reports.</p><p>Earlier validation also caught shortened paths such as middleware/subagents.py when the full repo-relative path was libs/deepagents/deepagents/middleware/subagents.py. In a monorepo, that distinction matters. A bare filename can point to the wrong mental model.</p><p>After repair, the final dossier passed:</p><p><code>nonexistent path references: none<br>ambiguous or incomplete paths: none<br>missing required anchors: none<br>missing required symbols: none<br>semantic grounding issues: none</code></p><p>That result changed how I think about this use case.</p><p>The agent can help explain a repo quickly. The explanation becomes much more trustworthy when the system can reject bad paths, force source evidence, and make uncertainty visible.</p><h2><strong>The pattern I would reuse</strong></h2><p>The reusable pattern is:</p><p><code>source packet<br>-&gt; async area subagents<br>-&gt; claim ledger<br>-&gt; diagram architect<br>-&gt; final synthesis<br>-&gt; deterministic validation<br>-&gt; focused follow-up questions</code></p><p>The focused follow-up piece matters. 
One architecture report can orient you, but expertise comes from narrower questions:</p><ul><li><p>How does the public API flow into the core implementation?</p></li><li><p>Where does state live?</p></li><li><p>What extension points are real contracts?</p></li><li><p>What is inferred from config or docs?</p></li><li><p>Which packages depend on the core runtime?</p></li></ul><p>That is where the saved claim ledger helps. A follow-up agent can start from validated claims, reopen source files, and answer one question at a time.</p><h2><strong>When this is worth using</strong></h2><p>I would use this pattern for:</p><ul><li><p>onboarding into a large unfamiliar repo</p></li><li><p>generating first-pass architecture docs</p></li><li><p>preparing for a migration</p></li><li><p>auditing a monorepo before refactoring</p></li><li><p>understanding how a framework is organized</p></li></ul><p>I would skip it for small repos. If the project has twenty files, read the files.</p><p>The value shows up when the repo has multiple packages, mixed docs/config/source signals, and enough surface area that a single prompt gets vague quickly.</p><h2><strong>What I learned</strong></h2><p>DeepAgents was useful here because the task decomposes naturally.</p><p>The run split cleanly across specialists: repo mapping, area investigation, core abstraction review, runtime flow tracing, diagram critique, claim validation, and final synthesis.</p><p>The async subagents made the architecture learner feel like a real repo analysis system. Each area worker could build local expertise on one thread of the monorepo, then the supervisor could put the pieces together.</p><p>The strongest lesson from the run was that architecture understanding needs evidence loops. The hardest part of the entire build was the validating agent, which had to ensure that the workflow was not just inventing a random architecture.</p><p>An agent can write a convincing architecture summary from partial context. 
That is why the validation layer matters.</p><p>The setup I would keep treats the model as the reasoning layer and the deterministic tools as the ground. The model decides what the architecture means. The tools decide whether the files, paths, and claims are real.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 73]]></title><description><![CDATA[Google declares the agentic era. Gemini Spark is Google's consumer agent. Cursor integrates with Jira. Anthropic acquires Stainless. Musk lost the OpenAI trial. 
Glasswing found 10k vulnerabilities.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-803</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-803</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 23 May 2026 21:12:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/170df6e0-0230-4455-a879-df92460d8081_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8u4l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8u4l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 424w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 848w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 1272w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8u4l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png" width="1456" height="2173" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2173,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:705925,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/198998731?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8u4l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 424w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 848w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 1272w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Google declared the agentic era</h2><ul><li><p><a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash</a> launched as the model for agents and coding. 
<a href="https://x.com/GoogleDeepMind/status/2056787987774816525">Google DeepMind framed it</a> as frontier intelligence plus real-world action.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xz99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xz99!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 424w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 848w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1272w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xz99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xz99!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 424w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 848w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1272w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1456w" sizes="100vw"></picture></div></a></figure></div><ul><li><p><a href="https://gemini.google/overview/agent/spark/">Gemini Spark</a> arrived as a 24/7 cloud agent for Gmail, documents, inbox monitoring, and eventually purchases. <a href="https://techcrunch.com/2026/05/19/google-introduces-gemini-spark-a-24-7-agentic-assistant-with-gmail-integration/">TechCrunch</a> described it as a personal assistant built from Gemini models and Google&#8217;s Antigravity agent harness.</p></li><li><p>Google redesigned Search around AI Mode and multimodal input. 
<a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think">VentureBeat</a> called it the first major search box redesign in 25 years.</p></li><li><p><a href="https://antigravity.google/product/antigravity-2">Antigravity 2.0 launched</a> with a desktop app and CLI.</p></li><li><p>Google also released <a href="https://developer.android.com/tools/agents/android-cli/journeys">Android CLI support</a> so Claude Code, Codex, and other coding agents can build Android apps from the command line.</p><p></p><p><strong>The thread: </strong>Gemini Spark is an exciting launch, but its value depends on how embedded you are in Google&#8217;s ecosystem. It&#8217;s interesting that Google led with a Flash model, yet it appears to benchmark well against other frontier models. Clearly Google is still in the race and pushing ahead.</p></li></ul><div><hr></div><h2>Claude, AWS, Cursor, and LangChain shipped the agent plumbing layer</h2><ul><li><p>Cursor shipped <a href="https://cursor.com/blog/composer-2-5">Composer 2.5</a>, then added <a href="https://www.atlassian.com/blog/company-news/cursor-in-jira">Jira integration</a> so teams can assign issues directly to cloud agents.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IkcU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IkcU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!IkcU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IkcU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Composer 2.5 benchmark results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Composer 2.5 benchmark results" title="Composer 2.5 benchmark results" srcset="https://substackcdn.com/image/fetch/$s_!IkcU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!IkcU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1456w" sizes="100vw"></picture></div></a></figure></div></li><li><p>Cursor also opened SDK access for Python and TypeScript. <a href="https://x.com/cursor_ai/status/2057913121558413770">The Cursor account</a> framed it as a way to build your own agents with Composer 2.5.</p></li><li><p>Anthropic acquired <a href="https://www.anthropic.com/news/anthropic-acquires-stainless">Stainless</a>, the SDK and MCP server platform that powered every Anthropic SDK.</p></li><li><p>Claude Managed Agents added <a href="https://claude.com/blog/claude-managed-agents-updates">self-hosted sandboxes and MCP tunnels</a>, moving credentials and execution inside enterprise boundaries.</p></li><li><p>AWS published a full AgentCore content offensive: <a href="https://aws.amazon.com/blogs/machine-learning/extending-conversational-memory-in-kiro-cli-using-amazon-bedrock-agentcore-memory/">MCP memory</a>, <a href="https://aws.amazon.com/blogs/machine-learning/building-multi-tenant-agents-with-amazon-bedrock-agentcore/">multi-tenant agents</a>, <a href="https://aws.amazon.com/blogs/machine-learning/build-ai-agents-for-business-intelligence-with-amazon-bedrock-agentcore/">BI agents</a>, <a href="https://aws.amazon.com/blogs/machine-learning/build-ai-powered-dashboard-automation-agents-with-nlp-on-amazon-bedrock-agentcore/">dashboard agents</a>, <a href="https://aws.amazon.com/blogs/machine-learning/amazon-nova-act-is-now-hipaa-eligible/">HIPAA eligibility</a>, and <a href="https://aws.amazon.com/blogs/machine-learning/announcing-openai-compatible-api-support-for-amazon-sagemaker-ai-endpoints/">OpenAI-compatible SageMaker endpoints</a>.</p></li><li><p>LangChain shipped <a href="https://www.langchain.com/blog/how-we-built-langsmith-engine-our-agent-for-improving-agents">LangSmith Engine</a>, an agent for improving agents.</p><p></p><p><strong>The thread:</strong> Composer 2.5 is now the <a href="https://x.com/mntruell/status/2056780569380626686">most chosen model</a> in 
Cursor, and it appears to be considerably cheaper than GPT-5.5 and Opus 4.7. Claude Managed Agents feels like it&#8217;s slowly becoming a full orchestration framework, but I am not sure how its design shares persistent memory across an enterprise.</p></li></ul><div><hr></div><h2>Compute became the business model</h2><ul><li><p>Anthropic told investors it expects <a href="https://techcrunch.com/2026/05/20/anthropic-says-its-about-to-have-its-first-profitable-quarter/">its first operating profit</a>, while compute costs may erase that profitability later.</p></li><li><p><a href="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm">SpaceX&#8217;s IPO filing</a> revealed Anthropic agreed to pay xAI/SpaceX $1.25B per month for Colossus access.</p></li><li><p>OpenAI introduced <a href="https://openai.com/business/guaranteed-capacity/">Guaranteed Capacity</a>, turning long-term compute access into a product.</p></li><li><p>NVIDIA reported $81.6B in Q1 revenue, up <a href="https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2027">85% year over year</a>.</p></li><li><p>NVIDIA and IREN announced a <a href="https://nvidianews.nvidia.com/news/nvidia-and-iren-announce-strategic-partnership-to-accelerate-deployment-of-up-to-5-gigawatts-of-ai-infrastructure">5 GW AI infrastructure partnership</a>.</p></li><li><p>Simon Willison flagged the memory side: AI demand for HBM may <a href="https://simonwillison.net/2026/May/22/memory-shortage/">reprice consumer electronics</a>.</p><p></p><p><strong>The thread: </strong>In case you didn&#8217;t read that correctly, that&#8217;s billion with a capital B per MONTH for Colossus access. So, while Anthropic expects its first operating profit, I am interested to see if it can keep pace. NVIDIA&#8217;s revenue is up 85% from last year, and that&#8217;s just bananas. 
</p></li></ul><div><hr></div><h2>AI layoffs stopped looking like isolated restructuring</h2><ul><li><p><a href="https://www.hcamag.com/us/specialization/transformation/intuit-slashes-staff-signs-deals-with-anthropic-and-open-ai/576021">Intuit announced layoffs</a> while signing deals with Anthropic and OpenAI.</p></li><li><p><a href="https://www.reuters.com/business/stanchart-cut-7000-jobs-boost-ai/">Standard Chartered announced</a> plans to cut 7,000+ jobs while accelerating AI investment.</p></li><li><p><a href="https://www.cnbc.com/2026/05/17/ai-related-layoffs-a-boost-for-stocks-not-necessarily.html">CNBC found</a> AI-related layoff announcements do not reliably boost stock prices.</p></li><li><p>Meta&#8217;s AI pivot and broader workforce cuts stayed in the week&#8217;s background via <a href="https://www.npr.org/2026/05/20/nx-s1-5826917/meta-layoffs-ai-jobs">NPR</a>.</p><p></p><p><strong>The thread: </strong>Companies are not only performing AI restructuring for investors. Some appear to believe the operating model is changing whether the market rewards it immediately or not.</p></li></ul><div><hr></div><h2>OpenAI won the trial. The governance questions survived</h2><ul><li><p>Musk&#8217;s lawsuit against OpenAI, Altman, Brockman, and Microsoft collapsed, removing an obstacle to OpenAI&#8217;s IPO path. 
<a href="https://www.reuters.com/legal/openai-defeats-elon-musks-lawsuit/">Reuters</a> covered the legal result.</p></li><li><p>The trial surfaced <a href="https://www.reuters.com/legal/government/key-moments-musk-vs-openai-trial-2026-05-18/">credibility fights</a> around OpenAI&#8217;s nonprofit origins, commercial ambitions, and who gets to claim the original mission.</p></li><li><p><a href="https://www.theverge.com/ai-artificial-intelligence/932464/musk-v-altman-proved-that-ai-is-led-by-the-wrong-people">The Verge</a> argued the case exposed something larger: the people leading AI may not be trusted to govern it.</p><p></p><p><strong>The thread:</strong> OpenAI won legally. The trial still reinforced the industry&#8217;s trust problem.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Project Glasswing found 10,000 critical vulnerabilities</h2><p>Anthropic&#8217;s <a href="https://www.anthropic.com/research/glasswing-initial-update">Project Glasswing update</a> is the most important direct-source read of the week.</p><p>Claude Mythos Preview and roughly 50 partners found more than 10,000 high- or critical-severity vulnerabilities in essential software. The key sentence was not the number. It was the bottleneck shift: discovery is no longer the hard part. Verification, disclosure, and patching are.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hCh8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hCh8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 424w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 848w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1272w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hCh8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp" width="1456" height="898" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:898,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hCh8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 424w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 848w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1272w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The downstream effects moved fast. The UK Government Digital Service pushed back on closing public repositories after AI-discovered vulnerabilities. Reuters reported Anthropic will <a href="https://www.reuters.com/technology/anthropic-brief-financial-stability-board-cyber-flaws-exposed-by-mythos-ft-2026-05-18/">brief the Financial Stability Board</a>, turning this from a software-security issue into a systemic-risk discussion.</p><p>What makes Glasswing different is scale. Coordinated disclosure was built for individual researchers finding individual bugs. AI-assisted scanning can produce vulnerability volume at industrial scale. 
The process was not designed for this.</p><p><strong>What to watch for:</strong> whether labs that discover vulnerabilities at scale are forced to build remediation infrastructure too.</p><div><hr></div><h2><strong>&#127897;&#65039; </strong>Worth a Listen</h2><div id="youtube2-orudZzP8vUc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;orudZzP8vUc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/orudZzP8vUc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p>The AI Studio half is &#8220;build a business from a prompt&#8221;: research agents, agentic focus groups, Stitch designs, Workspace integration, Sheets-backed dashboards, Cloud Run deployment, and marketing tools in one flow.</p></li><li><p>The Antigravity half is the real signal: sub-agents, background tasks, hooks, artifacts, project permissions, scheduled agents, browser agents, CLI, SDK, and managed API.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://openai.com/index/model-disproves-discrete-geometry-conjecture/">OpenAI says a model disproved a central discrete-geometry conjecture</a></strong> | OpenAI &#8212; External mathematicians checked the proof.</p></li><li><p><strong><a href="https://huggingface.co/blog/VirgileBatto/lerobot-humanoid">LeRobot Humanoid</a></strong> | Hugging Face &#8212; A roughly $2,500 open humanoid robotics platform.</p></li><li><p><strong><a href="https://venturebeat.com/technology/cohere-cracks-lossless-quantization-and-native-citations-with-first-full-apache-2-0-licensed-open-model-command-a">Cohere released Command A+</a></strong> | VentureBeat &#8212; Apache 2.0 licensing, native citations, enterprise-friendly model 
packaging.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/21/spotify-and-universal-music-strike-deal-allowing-fan-made-ai-covers-and-remixes/">Spotify and UMG struck a deal for AI covers and remixes</a></strong> | TechCrunch &#8212; Licensed AI music moves from taboo to product.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/21/spotify-adds-ai-powered-qa-and-briefing-generation-features-to-podcasts/">Spotify launched AI podcast tools</a></strong> | TechCrunch &#8212; Podcasts become queryable, summarizable AI surfaces.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/21/spotify-launches-an-elevenlabs-powered-audiobook-creation-tool/">Spotify launched an ElevenLabs audiobook tool</a></strong> | TechCrunch &#8212; AI narration enters the audiobook workflow.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/22/how-vcs-and-founders-use-inflated-arr-to-kingmake-ai-startups/">AI startups are stretching ARR</a></strong> | TechCrunch &#8212; The AI revenue story is getting less clean.</p></li><li><p><strong><a href="https://www.404media.co/new-arxiv-rules-ai-generated-papers-ban/">ArXiv will ban researchers for AI slop submissions</a></strong> | 404 Media &#8212; Academic publishing&#8217;s authentication problem now has teeth.</p></li><li><p><strong><a href="https://www.theverge.com/tech/932207/siri-apple-intelligence-auto-deleting-chats">Apple&#8217;s Siri revamp may auto-delete chats</a></strong> | The Verge &#8212; Privacy becomes Apple&#8217;s AI wedge.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/932203/university-of-arizona-students-boo-eric-schmidt-ai-commencement">Students booed Eric Schmidt&#8217;s AI commencement speech</a></strong> | The Verge &#8212; The public mood is not matching the industry&#8217;s launch calendar.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 72]]></title><description><![CDATA[Anthropic ships five verticals and gave every plan an SDK budget. OpenAI launches a deployment company with 150 engineers. Cisco, GitLab, and GM cut thousands at record revenue. 
Grok Build at $299/mo.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-9dd</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-9dd</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 16 May 2026 19:39:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ece87e18-1dc8-4d38-b255-048b807d7880_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FNYC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FNYC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 424w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 848w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 1272w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FNYC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png" width="1456" height="2489" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2489,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:923849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/198040876?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FNYC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 424w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 848w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 1272w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Anthropic shipped into legal, small business, healthcare, and AWS in one week.</h2><ul><li><p><strong>Claude for the legal industry launched with 12 practice-area plugins.</strong> <a href="https://claude.com/blog/claude-for-the-legal-industry">Contract review, M&amp;A diligence, and regulatory compliance</a> out of the box. 
87% of general counsel now use generative AI, up from 44% the prior year.</p></li><li><p><strong>Claude for Small Business connected to QuickBooks, PayPal, and HubSpot.</strong> <a href="https://www.anthropic.com/news/claude-for-small-business">15 ready-to-run workflows</a> covering invoicing, CRM, document signing via DocuSign and Canva.</p></li><li><p><strong>Anthropic committed $200M to the Gates Foundation.</strong> <a href="https://www.anthropic.com/news/gates-foundation-partnership">Grants, Claude credits, and technical support</a> for vaccine screening, disease forecasting, K-12 education, and agricultural tools.</p></li><li><p><strong>Claude Platform went GA on AWS.</strong> <a href="https://aws.amazon.com/blogs/machine-learning/introducing-claude-platform-on-aws-anthropics-native-platform-through-your-aws-account/">First cloud provider</a> to offer Anthropic&#8217;s native platform with unified billing and same-day feature parity with the native API.</p></li><li><p><strong>Every subscriber now gets separate Agent SDK credits.</strong> Pro gets <a href="https://x.com/ClaudeDevs/status/2054610152817619388">$20/month</a>, Max gets up to $200. Unlike OpenAI, which bundles Codex and third-party usage into normal plan limits, Anthropic is subsidizing the developer ecosystem with a separate bucket.</p></li><li><p><strong>Claude Code limits increased another 50% through July.</strong> <a href="https://x.com/claudeai/status/2054641166155497503">On top of the doubling</a> from the week before.</p></li><li><p><strong>Ramp and Axios independently confirmed Anthropic overtook OpenAI in workplace adoption.</strong> Though <a href="https://venturebeat.com/technology/anthropic-finally-beat-openai-in-business-ai-adoption-but-3-big-threats-could-erase-its-lead">VentureBeat identified three structural threats</a> to that lead.</p></li><li><p><strong>The thread:</strong> Anthropic is trying to become the default for every vertical at once. 
Legal, healthcare, small business, enterprise, developer tooling. Whether that&#8217;s a platform strategy or overextension depends on execution.</p></li></ul><div><hr></div><h2>OpenAI launched a deployment company and put Codex on your phone.</h2><ul><li><p><strong>The OpenAI Deployment Company launched with 150 engineers on day one.</strong> <a href="https://x.com/OpenAI/status/2053824997777457651">19 investment firms and consultancies</a>, majority-owned by OpenAI, with <a href="https://x.com/OpenAI/status/2053824999736410415">Tomoro acquired</a> to provide Forward Deployed Engineers. <a href="https://www.axios.com/2026/05/11/openai-deployco-private-equity">Valued at $14B</a>.</p></li><li><p><strong>ChatGPT connected to bank accounts.</strong> <a href="https://openai.com/index/personal-finance-chatgpt/">Plaid integration for Pro users</a> in the US, with an Intuit partnership for actionable financial steps.</p></li><li><p><strong>Codex shipped to iOS and Android.</strong> <a href="https://x.com/OpenAI/status/2055016850849993072">Mobile preview</a> lets users start, review, and approve coding tasks while agents run on a separate device.</p></li><li><p><strong>OpenAI disclosed a supply chain compromise.</strong> A <a href="https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack/">TanStack npm package attack</a> exposed code-signing certificates for macOS, Windows, iOS, and Android apps. Full certificate rotation required.</p></li><li><p><strong>The thread:</strong> Both OpenAI and Anthropic launched enterprise services arms within a week of each other. The model API is becoming a commodity. 
The margin is shifting to who can get it deployed inside your organization first.</p></li></ul><div><hr></div><h2>Companies are cutting workers at record revenue to fund AI.</h2><ul><li><p><strong>Cisco cut 4,000 jobs while reporting record quarterly revenue.</strong> Stock rose 15% on <a href="https://www.cnbc.com/2026/05/13/cisco-csco-q3-earnings-report-2026.html">surging AI orders</a>.</p></li><li><p><strong>GitLab announced sweeping restructuring to fund agent development.</strong> <a href="https://about.gitlab.com/blog/gitlab-act-2/">Cut headcount, flattened management</a>, reorganized R&amp;D into 60 smaller teams, and retired its CREDIT values framework.</p></li><li><p><strong>GM laid off hundreds of IT workers and began hiring replacements with stronger AI skills.</strong> <a href="https://techcrunch.com/2026/05/11/gm-just-laid-off-hundreds-of-it-workers-to-hire-those-with-stronger-ai-skills/">Explicitly seeking stronger AI skills</a>.</p></li><li><p><strong>Samsung faces a looming strike over AI.</strong> <a href="https://www.reuters.com/sustainability/society-equity/elon-musks-court-battle-against-openai-enters-homestretch-2026-05-14/">Global AI boom driving deep internal divisions</a> between management and workers.</p></li><li><p><strong>The thread:</strong> Revenue is up at all three companies making cuts (Cisco, GitLab, and GM). The functions being cut are IT operations, developer tooling management, and corporate overhead that was previously considered secure.</p></li></ul><div><hr></div><h2>Grok Build, Claude Code, and Cursor all shipped agentic upgrades. LangChain shipped nine products to support them.</h2><ul><li><p><strong>xAI launched Grok Build in beta.</strong> <a href="https://x.ai/news/grok-build-cli">Terminal-native CLI</a> with up to 8 parallel agents, Grok 4.3 beta, 2M token context. Priced at $299/month (introductory $99). 
SuperGrok Heavy only.</p></li><li><p><strong>Claude Code limits increased 50%.</strong> <a href="https://x.com/claudeai/status/2054641166155497503">Through July 13</a>, on top of the doubling from the prior week. Plus separate Agent SDK credits.</p></li><li><p><strong>Cursor shipped /orchestrate.</strong> <a href="https://x.com/cursor_ai/status/2052432780336988474">Planner/worker/verifier loops</a> that re-spawn on failure. <a href="https://x.com/cursor_ai/status/2052489388895195399">Parallel subagents</a>. <a href="https://x.com/cursor_ai/status/2051739625958584659">Always-on CI agents</a>.</p></li><li><p><strong>LangChain shipped nine products at Interrupt 2026.</strong> <a href="https://www.langchain.com/blog/introducing-smithdb">SmithDB</a> for agent traces, <a href="https://www.langchain.com/blog/introducing-llm-gateway">LLM Gateway</a> for centralized control, <a href="https://www.langchain.com/blog/langsmith-sandboxes-generally-available">Sandboxes GA</a> for isolated testing, <a href="https://www.langchain.com/blog/deep-agents-0-6">Deep Agents 0.6</a> for long-running workflows, and the <a href="https://www.langchain.com/blog/the-agent-development-lifecycle">Agent Development Lifecycle</a> framework.</p></li><li><p><strong>The thread:</strong> Grok Build at $299/month, Claude Code with separate SDK credits, Cursor as a standalone IDE. Three very different bets on how developers will pay for agentic coding. LangChain is betting the real money is in the infrastructure underneath all of them.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Thinking Machines built an AI that listens while it talks.</h2><p>Every AI conversation today works the same way: you talk, the model waits, the model responds. 
<a href="https://thinkingmachines.ai/blog/interaction-models/">Thinking Machines</a> published research on &#8220;interaction models&#8221; that throw out that assumption entirely.</p><p>Their model processes continuous 200ms micro-turns of audio, video, and text simultaneously. There are no turn boundaries. The model listens while speaking, interrupts when it sees something wrong in your code, reacts to visual cues without being prompted, and runs background reasoning while maintaining the conversation.</p><p>The architecture splits into two parts: an interaction model that maintains real-time presence (always perceiving, always ready to respond), and a background model that handles deeper reasoning and tool use asynchronously. 
When the background model finishes a task, the interaction model weaves results into the conversation at an appropriate moment instead of interrupting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AuzQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AuzQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 424w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 848w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1272w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png" width="1302" height="852" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:1302,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87959,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/198040876?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AuzQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 424w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 848w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1272w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The benchmarks are striking. On FD-bench (the standard interaction quality benchmark), their model scored 77.8 versus 46.8 for GPT-Realtime-2. On responsiveness, they hit 0.40 second turn-taking latency versus 1.18 for GPT-Realtime-2. They also created three new benchmarks (TimeSpeak, CueSpeak, visual proactivity) that no existing model can meaningfully perform. GPT-Realtime-2 scores near zero on all of them.</p><p>The model is a 276B parameter MoE with 12B active. It uses encoder-free early fusion, meaning no separate Whisper or TTS models. Audio comes in as raw dMel signals, video as 40x40 patches. 
Everything is co-trained from scratch.</p><p>Their argument comes from Rich Sutton&#8217;s &#8220;bitter lesson&#8221;: if interactivity is bolted on through harnesses (voice activity detection, turn-taking logic), it can never scale with intelligence. If it&#8217;s native to the model, scaling makes the model both smarter and a better collaborator.</p><p><strong>What to watch for:</strong> This is a research preview from a startup (276B parameters, limited availability). But the design principle matters: current real-time systems from OpenAI and Google use harnesses to fake interactivity on top of turn-based models. Thinking Machines is arguing that&#8217;s a dead end. If they&#8217;re right, every voice agent shipping today is architecturally temporary.</p><div><hr></div><h2><strong>&#127897;&#65039; Worth a Listen</strong></h2><div id="youtube2-IVGjBxqygmI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;IVGjBxqygmI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/IVGjBxqygmI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>IBM AI Engineer Bri Kopecki on why agents without infrastructure are &#8220;brilliant goldfish.&#8221;</p><ul><li><p><strong>The problem:</strong> Most AI agents have no memory, no access control, no audit trail. 
Every conversation starts from scratch.</p></li><li><p><strong>The six-layer stack:</strong> Scheduler (who goes first), memory manager (short/long/episodic), tool manager (sandboxed execution), identity manager (tokens and permissions), observability (full decision tracing), and guardrails/governance (human-in-the-loop for high-stakes decisions).</p></li><li><p><strong>Why it matters now:</strong> This maps directly to what LangChain shipped this week (SmithDB for traces, LLM Gateway for access control, Sandboxes for tool isolation) and explains why Cursor, Anthropic, and OpenAI are all building orchestration layers.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://techcrunch.com/2026/05/14/cerebras-ipo-debut/">Cerebras IPO&#8217;d at $5.55B, shares jumped 89% on day one</a></strong> | TechCrunch &#8212; Near $100B market cap on debut. The AI chip premium is real.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/12/medicares-new-payment-model-is-built-for-ai-and-most-of-the-tech-world-has-no-idea/">Medicare created a payment model built for AI-assisted services</a></strong> | TechCrunch &#8212; The largest US payer quietly opened the door for clinical AI reimbursement. This will pull deployment faster than any product launch.</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/05/15/musk-v-altman-week-3/">Musk v. 
Altman trial went to the jury</a></strong> | MIT Tech Review &#8212; Closing arguments accused Musk of selective amnesia and Altman of lying about the nonprofit mission.</p></li><li><p><strong><a href="https://www.theverge.com/science/931766/arxiv-ai-slop-ban-researchers">ArXiv banned researchers for AI-generated papers</a></strong> | The Verge &#8212; Academic publishing&#8217;s authentication problem now has teeth, but detection is still losing the arms race.</p></li><li><p><strong><a href="https://www.theverge.com/tech/929091/meta-ai-threads-account-block">Meta embedded AI in Threads and won&#8217;t let users block it</a></strong> | The Verge &#8212; Captive distribution at 3B+ users, no opt-out.</p></li><li><p><strong><a href="https://openai.com/index/what-parameter-golf-taught-us/">OpenAI Parameter Golf results: 1,000+ participants, agents everywhere</a></strong> | OpenAI &#8212; An ML challenge where the vast majority of submitters used coding agents. OpenAI built a Codex-based triage bot to handle the submission volume.</p></li><li><p><strong><a href="https://www.tomshardware.com/tech-industry/cyber-security/apple-m5-architecture-suffers-first-privilege-escalation-exploit-anthropics-claude-mythos-helps-researchers-bypass-memory-integrity-enforcement">Claude Mythos cracked Apple&#8217;s M5 memory security in five days</a></strong> | Tom&#8217;s Hardware &#8212; First privilege escalation exploit on M5. Apple spent half a decade building Memory Integrity Enforcement. Standard user to root access.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/09/nvidia-has-already-committed-40b-to-equity-ai-deals-this-year/">Nvidia committed $40B in equity AI investments in 2026</a></strong> | TechCrunch &#8212; Not just selling chips. 
Acquiring stakes in the companies that consume the most of them.</p></li><li><p><strong><a href="https://www.anthropic.com/research/2028-two-scenarios">Anthropic published &#8220;2028: Two scenarios for global AI leadership&#8221;</a></strong> | Anthropic &#8212; A policy paper on US-China AI competition. Anthropic is writing geopolitics now.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/931500/youtube-ai-deepfake-detection-tool">YouTube expanding AI deepfake detection to all adult users</a></strong> | The Verge &#8212; The detection side is scaling up.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/931200/google-spam-rules-ai-manipulation">Google updated spam rules to include AI manipulation attempts</a></strong> | The Verge &#8212; SEO for the age of AI-generated content.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Multi-Agent Account Planning That Learns Across Deals]]></title><description><![CDATA[Fifteen agents across five phases, with a decision-records harness that compounds insight. 
A working guide to multi-agent orchestration on Claude Managed Agents.]]></description><link>https://www.anothercodingblog.com/p/multi-agent-account-planning-that</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/multi-agent-account-planning-that</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Fri, 15 May 2026 15:33:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0e28dd80-4edc-442f-be2f-1a0ed1bc6415_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Intro</h2><p>Anthropic shipped multi-agent orchestration in Managed Agents on May 6th. An agent can be configured as a coordinator with a roster of other agents it can delegate to, and the platform handles fan-out, child-thread lifecycle, parallel execution, and per-thread observability.</p><p>Anthropic also shipped a management console. Every agent, session, child thread, and memory write is browsable, with full transcripts, tool calls, and version history inspectable on click. That console shaped how I built the system, because the logging I would have written myself was already there.</p><p>The use case I built is account planning in B2B SaaS sales. The vendor is a fictional company, Yardstick AI, selling an AI evaluation platform. The prospect is Vercel, a real company with a public footprint rich enough to give the agents something genuine to research.</p><p>The system has fifteen agents organized into a five-phase pre-meeting orchestration plus a post-meeting debrief loop. 
The pre-meeting flow has two genuine decision steps where the coordinator chooses what runs next based on what just came back, rather than following a fixed sequence.</p><p>The system uses MCP servers (Notion, Slack), the Anthropic vault for credentials, two memory stores (a playbook and a decision-records corpus), custom HTTP tools for a mock CRM and enrichment service, and the built-in web search and fetch tools.</p><p>Most of the system&#8217;s analytical work happens in the layer of decision records that the agents read from and write into. Records are captured in two ways.</p><p><strong>Implicitly</strong>, the system infers decisions from CRM record changes, activity logs, and other signals that move without anyone narrating them.</p><p><strong>Explicitly</strong>, after each meeting, the system uses the full account plan plus the surrounding events (calendar entries, CRM stage moves, recent activity) to compose a curated set of questions for the rep. The questions are shaped by what the system already knows about the account, so they target the specific decisions most likely to produce useful data instead of asking generic &#8220;how did it go&#8221; prompts.</p><p>Whichever way a record gets created, it lives in a shared memory store that the next account&#8217;s run can retrieve and reason from. 
That is the difference between a system that gives you one prep brief and a system that gets better at giving you prep briefs as it accumulates evidence.</p><p>This post documents what I built, what worked, what did not, and what the costs and constraints actually look like once you push past the basic demo.</p><p>Below is a capture of the final product:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pVDb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" width="728" height="611.1" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:873,&quot;width&quot;:1040,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:213125,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>What you&#8217;ll learn</h2><p>This post walks through what I learned building a multi-agent system in Anthropic Managed Agents. The official documentation covers the basics. This post covers what comes after that: how the primitive holds up when you push it against a real, multi-source, multi-phase problem. 
By the end you should have a clearer sense of when this architecture is worth using and what it takes to make it work.</p><p>Concretely:</p><ul><li><p><strong>What multi-agent really is inside the platform.</strong> The shape of the architecture, where the limits actually sit, and what the docs do not yet spell out.</p></li><li><p><strong>How the system remembers things during a run versus across runs.</strong> Two different kinds of memory live side by side, and a real system has to be deliberate about where each finding goes.</p></li><li><p><strong>Why use multi-agent over a workflow.</strong> When the coordinator&#8217;s runtime decisions justify the complexity, and when they do not.</p></li><li><p><strong>How decision records make the system compound.</strong> A structured corpus of recommendations and their resulting decisions turns each run into evidence the next run can use.</p></li><li><p><strong>The agent harness.</strong> Everything you build around the platform primitives to make the system work for your use case: the MCP servers you connect, the record schemas your corpus enforces, the system prompts that define each agent&#8217;s job, the routing logic the coordinator follows, the briefings it hands to each agent.</p></li><li><p><strong>Async surfaces via MCP.</strong> How Slack becomes part of the system through MCP, so the rep can capture decisions in-place after a meeting without a custom bot.</p></li><li><p><strong>The distillation problem.</strong> Why the system&#8217;s raw output is not usable on its own, and what has to happen to make it useful to a human in thirty minutes.</p></li><li><p><strong>Cost and observability.</strong> Per-thread spend, total cost for a full run, and what the Managed Agents console gives you for free.</p></li><li><p><strong>Honest findings.</strong> Pitfalls a builder should expect to hit on their first run.</p></li><li><p><strong>When this is the right tool, and when it isn&#8217;t.</strong> What kinds of problems 
multi-agent orchestration fits, and what kinds belong with a simpler architecture.</p></li></ul><div><hr></div><h2>Section 1: The work of account planning</h2><p>An account executive working a B2B SaaS deal is doing one job continuously and several others on top of it. The continuous job is synthesis. At any moment in a pursuit, an AE is holding context across half a dozen sources: their own notes from past calls, the CRM record with its stages and activity log, public signals (product launches, hires, press), conference encounters and hallway intel, backchannel from people who used to work there, win and loss patterns from similar accounts, and their own company&#8217;s internal playbook. None of these sources are formatted alike, refresh on the same cadence, or answer the same questions week to week.</p><p>The job sits on top of a rhythm of meetings. Before each meeting, the rep does pre-meeting prep. After each meeting, the rep does post-meeting capture. Between meetings, follow-up. The cadence is continuous, across fifteen to thirty active accounts at any given time. Even the most disciplined AE admits the synthesis happens in their head more than on paper, and the capture happens only when there is slack to capture.</p><p>What makes this work a candidate for multi-agent orchestration is the shape of the synthesis problem: the sources decompose naturally by role. Reading internal Notion notes, researching the company on the public web, mapping the org chart, and synthesizing all of it against a playbook are four different jobs. Each role wants a different tool surface, and each role&#8217;s output is most useful when it is separate from the others until the synthesis step. 
Running them in parallel saves wall-clock time, but the more interesting property is that each role can be a focused agent with a small system prompt and a tight tool surface, rather than one generalist agent trying to be five things at once.</p><p>The 30-minute pre-meeting slice is the moment in this rhythm where multi-agent orchestration is most legible. The rep has a calendar event coming up. They want a brief that consolidates what is knowable from everywhere into something they can read in five minutes, prepare around in twenty, and act on in the meeting itself. That is the moment this post centers on, but the architecture supports the broader cadence around it.</p><div><hr></div><h2><strong>Section 2: What multi-agent in Managed Agents actually is</strong></h2><p>Most coverage of &#8220;agents&#8221; uses the term to cover everything from a single Claude call to a fully autonomous AI team that plans its own work. Anthropic&#8217;s multi-agent feature is neither extreme. It is a specific pattern with specific constraints, and the constraints are worth knowing before you build against it.</p><h4><strong>The shape: coordinator with a roster</strong></h4><p>One agent is the <strong>coordinator</strong>. Its definition includes a list of other agents it is allowed to delegate to. That list is called the <strong>roster</strong>. A few specific limits:</p><ul><li><p>The roster can hold up to 20 agents.</p></li><li><p>The coordinator can call multiple copies of any agent on the roster.</p></li><li><p>A session can have up to 25 active threads running at once.</p></li><li><p>Specialists cannot delegate to other specialists. The architecture is flat, not nested (Anthropic&#8217;s docs phrase it as <em>&#8220;depth &gt; 1 is ignored&#8221;</em>).</p></li></ul><p>If you came in expecting agents that delegate to agents that delegate to agents, the spec corrects you on page one. What you get is a flat fan-out from a single coordinator. 
For most real systems this is the right tradeoff.</p><h4><strong>Threads: how the system stays organized</strong></h4><p>A <strong>thread</strong> is a separate, isolated conversation that belongs to one agent. Each thread has its own history and tools. Threads don&#8217;t share anything with each other, even though they all run inside the same session.</p><p>Two kinds:</p><ul><li><p>The <strong>primary thread</strong> is the coordinator&#8217;s own thread. It also doubles as the activity feed for the whole session.</p></li><li><p>A <strong>child thread</strong> is created when the coordinator delegates to a specialist. The platform copies the session&#8217;s tools and credentials onto that thread, and the specialist&#8217;s work runs there.</p></li></ul><p>When the coordinator delegates to multiple specialists in the same turn, the child threads run in parallel. The coordinator waits for each reply before deciding what to do next. You don&#8217;t write any of the glue code for this. The decision-making that would normally live in a script lives inside the coordinator&#8217;s prompt.</p><h4><strong>Thread lifecycle</strong></h4><p>A thread moves through three states:</p><ul><li><p><strong>Running</strong>: the specialist is actively working.</p></li><li><p><strong>Idle</strong>: the specialist has finished but the thread is still alive. It counts against the 25-thread cap.</p></li><li><p><strong>Archived</strong>: you have told the platform you are done with the thread. The slot is freed.</p></li></ul><p>For most builds, the 25-thread cap is generous enough that you never think about lifecycle. Systems that lean hard on parallel work have to treat archiving as part of the orchestration.</p><h4><strong>Idle threads stay alive, which enables follow-ups</strong></h4><p>Because an idle thread is not gone, the coordinator can send a follow-up message to a specialist it called earlier. The specialist keeps its full context from before. 
That means the architecture supports more than one round of back-and-forth per specialist, not just one-shot delegation. I did not use this in the build, but in retrospect there are several places it would have helped.</p><h4><strong>Two kinds of memory</strong></h4><p>The system has two layers of memory that work on different time scales:</p><ul><li><p><strong>Persistent threads</strong> keep a specialist&#8217;s context alive within a session. The moment the session ends, the threads are gone.</p></li><li><p><strong>Memory stores</strong> persist across sessions. They are objects shared across the whole workspace, mounted onto a session when it starts. Anything written into one stays available to the next run that mounts the same store.</p></li></ul><p>A real multi-agent build needs both.</p><h4><strong>Designing the split</strong></h4><p>The design split lives in two questions:</p><ul><li><p>Within a session: which specialists do you keep alive for a follow-up, and which do you fire once and let go?</p></li><li><p>Across sessions: which findings deserve to be promoted into a memory store, and which can evaporate when the session ends?</p></li></ul><p>The platform gives you the building blocks for both. It does not decide which findings belong where. Get that split wrong and you pay either way:</p><ul><li><p>Throw away thread context too early, and you re-brief the specialist on every follow-up.</p></li><li><p>Fail to promote findings into a store, and the next session starts cold on everything you already learned.</p></li></ul><p>Our build leans heavily on the cross-session side. 
Most of the analytical work in this system comes from the decision-records corpus, which is the through-line for the rest of this post.</p><div><hr></div><h2><strong>Section 3: The agent architecture</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dbJu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dbJu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dbJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55927,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dbJu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The pre-meeting orchestration uses thirteen agents: one lead orchestrator plus twelve specialists in its roster. The post-meeting debrief loop adds two more agents that sit outside the coordinator entirely. That makes fifteen across the system.</p><p>Pre-meeting work is a tightly scoped synthesis problem that benefits from a coordinator. Post-meeting work is a slower, human-paced loop that does not benefit from coordination at all; it needs only two single-purpose agents that read and write a shared corpus.</p><p>The pre-meeting run breaks into five numbered phases, plus a Phase 3.5 selection step between synthesis and recommendation, sequential at the coordinator level and parallel within. 
The coordinator narrates each phase boundary as it runs, which makes its reasoning visible and forces the model into a structured plan rather than letting it improvise.</p><h4><strong>Phase 1: gather context and pull prior records</strong></h4><p>Five specialists fan out concurrently:</p><ul><li><p><strong>meeting-context</strong>: reads internal Notion notes through Notion MCP.</p></li><li><p><strong>external-researcher</strong>: pulls public signals from the web.</p></li><li><p><strong>stakeholder-analyst</strong>: maps decision-makers via a mock enrichment service.</p></li><li><p><strong>engagement-readiness</strong>: hits a mock CRM for outreach history.</p></li><li><p><strong>decision-retriever</strong>: runs against the shared decision-records corpus and pulls prior decision records from past accounts that match the current account&#8217;s shape (by attribute overlap: industry, competitor present, champion profile, procurement complexity, and so on).</p></li></ul><h4><strong>Phase 2: conditional topic education</strong></h4><p>The coordinator inspects what Phase 1 surfaced and picks two to four technical topics worth briefing the rep on before the meeting. For the Vercel run, those topics included cross-provider eval methodology, agent eval, AI observability, and eval-driven CI.</p><ul><li><p><strong>topic-educator</strong>: runs against the curated topic list and returns a primer per topic, each ending with smart questions the rep can ask in the room.</p></li></ul><p>If the account does not warrant it, the coordinator skips Phase 2 entirely.</p><h4><strong>Phase 3: synthesis</strong></h4><ul><li><p><strong>opportunity-risk</strong>: receives everything Phase 1 and Phase 2 produced, mounts the read-only Yardstick playbook from a memory store, reads the prior decision records the retriever pulled in Phase 1, and writes the structured pursuit plan. 
The plan covers ICP fit, buying triggers, stakeholder map and sequencing, first-meeting hypothesis, recommended plays, and disqualifiers.</p></li></ul><h4><strong>Phase 3.5: next-best-action selection</strong></h4><p>After the synthesis is in, the coordinator does not jump straight to recording. It asks one more specialist, the chooser, to decide which concrete recommendations are warranted for this specific account.</p><ul><li><p><strong>next-best-action-chooser</strong>: reads the synthesis plus the prior decision records the retriever pulled in Phase 1, decides which of three specialized recommenders to invoke, and writes a focused brief for each. The chooser can also skip a recommender, with a reason. A different account with different synthesis and different prior records produces a different plan.</p></li></ul><p>The three recommenders available to the chooser:</p><ul><li><p><strong>stakeholder-recommender</strong>: sequencing or lead-play.</p></li><li><p><strong>pricing-recommender</strong>: pricing strategy.</p></li><li><p><strong>competitive-recommender</strong>: competitive positioning or risk mitigation.</p></li></ul><h4><strong>Phase 4: parallel recommendation generation</strong></h4><p>The coordinator dispatches whichever recommenders the chooser named. They run in parallel. Each one produces a single Recommendation Record (RR) as a markdown draft with strict YAML frontmatter and a <code>cited_records</code> block listing the prior decision records whose outcomes informed this recommendation. 
The recommenders hand drafts back to the coordinator; they do not write to the corpus themselves.</p><h4><strong>Phase 5: decision recording</strong></h4><ul><li><p><strong>decision-recorder</strong>: receives the RR drafts, validates each one against the schema, checks every cited prior decision record exists in the corpus, writes the validated records to <code>/mnt/memory/yardstick-decisions/</code>, and updates the corpus index.</p></li></ul><p>Splitting content generation (the recommenders) from persistence (the recorder) keeps each role focused.</p><h4><strong>Post-meeting: the debrief loop</strong></h4><p>That accounts for the thirteen pre-meeting agents. The remaining two run on the post-meeting side:</p><ul><li><p><strong>debrief-asker</strong>: reads the next-best-action RRs the pre-meeting run produced, picks the open questions still unresolved, formats them as a curated set, and posts them into a Slack channel through the Slack MCP server. The rep replies in the thread on their own time.</p></li><li><p><strong>debrief-synthesizer</strong>: once there are replies, reads the Slack thread, parses the rep&#8217;s answers, and writes Decision Records into the corpus with the <code>linked_rr</code> field pointing back to the originating RRs.</p></li></ul><p>Neither sits in the coordinator&#8217;s roster because neither runs synchronously with the pre-meeting flow. They run on a human-paced timescale, possibly hours or days later. Coordinating them through the same session would require keeping a session open across days or weeks, which the platform does not support. The cleaner shape is two single-purpose agents that share the corpus as their interaction substrate.</p><div><hr></div><h2><strong>Section 4: What the platform gives you for observability</strong></h2><p>Most multi-agent demos require you to build your own logging before you can debug them. Managed Agents takes the opposite stance. 
Anthropic ships a management console that turns every agent, every session, every child thread, and every memory write into a click-through artifact you can inspect without writing any instrumentation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-dmo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-dmo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 424w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 848w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-dmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic" width="1456" height="741" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:741,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-dmo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 424w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 848w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The console is structured around the platform&#8217;s primary objects. The Agents tab lists every agent you have created with its system prompt, declared MCP servers, custom tools, and toolsets all inspectable on click. Versioning is built in. The Sessions tab shows every session with the coordinator&#8217;s primary thread and every child thread enumerated, status per thread, full transcripts including the model&#8217;s reasoning content, and every tool call shown inline with its inputs and outputs. The Memory Stores tab tracks version history so any write to the decision-records corpus is auditable end to end.</p><p>At runtime, the same data is available programmatically through the events API. The session-level stream gives you a condensed feed across the whole session. 
Per-thread streams give you raw event sequences for any specialist. The three events that matter for fan-out observability are <code>session.thread_created</code>, <code>agent.thread_message_received</code>, and <code>session.thread_status_idle</code>. Stringing those together gives you the fan-out timeline of the whole run without writing a single instrumentation line.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0hKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0hKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 424w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 848w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1272w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0hKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png" width="1456" height="723" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0hKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 424w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 848w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1272w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Cost data is similarly structured. Every event carries usage data scoped to the thread that produced it. The full Vercel run cost $5.51 across the pre-meeting orchestration. Twelve specialists sit in the roster, but the conditional dispatch in Phase 3.5 invoked only eleven of them for this account (one recommender was skipped on substance).</p><p>The chart makes the cost shape obvious. The lead-orchestrator dominates at $1.21, because it is the one thread that accumulates context across every phase. The two heaviest specialists are external-researcher and topic-educator at about $0.79 each, both driven by web-tool use rather than cumulative context. 
The Phase 4 recommenders, the Phase 3 synthesis, and the Phase 5 decision-recorder cluster in the $0.40 to $0.45 range, each receiving the cumulative context from prior phases plus the prior decision records the retriever pulled in Phase 1. The remaining Phase 1 specialists sit at $0.28 or below. Wall-clock was about fifteen minutes from prompt to final answer.</p><div><hr></div><h2><strong>Section 5: What multi-agent gives you that a workflow can&#8217;t</strong></h2><p>Multi-agent orchestration is only worth using when the coordinator makes a real decision between phases. If your design fans out, waits for results, and synthesizes them, you have built parallel API calls dressed up as a multi-agent system. The platform&#8217;s complexity (extra threads, longer latency, harder debugging) buys you nothing a sequential workflow couldn&#8217;t already do.</p><p>The thing that justifies the complexity is the moment the coordinator pauses, looks at what the previous phase produced, and decides what should happen next. That decision is the part a workflow cannot replicate, because a workflow has to know in advance what it is going to do.</p><p>In our build, there are two such decision steps.</p><p>The first lives between Phase 1 and Phase 2. Phase 1 fans out five specialists to read the account from five angles. The coordinator collects their output, pauses, and picks two to four topics worth briefing the rep on before the meeting. For Vercel, the coordinator chose cross-provider eval methodology, agent eval, AI observability, and eval-driven CI. None of those topics are defined anywhere in advance. They are picked from what Phase 1 surfaced about this specific account. A different account would produce a different list, or no list at all, in which case the coordinator skips Phase 2 entirely.</p><p>The second lives between Phase 3 and Phase 4. 
After opportunity-risk produces the synthesis, the coordinator dispatches the next-best-action-chooser, which reads the synthesis plus the prior decision records the retriever pulled in Phase 1 and decides which of three specialized recommenders to invoke: stakeholder, pricing, or competitive. On the Vercel run the chooser invoked stakeholder-recommender and competitive-recommender, and skipped pricing-recommender with the reason that the $42K pilot structure was already validated. Skipping with a substantive reason is what separates a real decision from a conditional that always fires.</p><p>The coordinator narrates each decision as it happens, which makes the reasoning visible:</p><blockquote><p><em>Phase 1 specialists are back. External-researcher found public Braintrust endorsement at Vercel that the internal Notion notes treated as a stalling competitor. Phase 2 launched. Topic-educator is building primers on cross-provider eval, agent eval, AI observability, and eval-driven CI based on what surfaced.</em></p><p><em>Phase 3.5 complete. Invoking stakeholder-recommender (sequencing) for the May 21 call sequencing and Tom-Becker cultivation. Invoking competitive-recommender (competitive_positioning) for the Braintrust counter-offer scenario. Skipping pricing-recommender: $42K structure already validated, pricing isn&#8217;t the next decision point.</em></p></blockquote><p>That kind of reasoning is what tells you the coordinator is actually orchestrating rather than executing. A workflow could fan out the same specialists in parallel. It could even hard-code the topic-educator and recommender steps. What a workflow cannot do is pick which topics to brief on this turn for this account, or which recommenders are warranted given what the synthesis just surfaced. 
Those decisions require a model with the full context loaded, which is exactly what the coordinator is.</p><div><hr></div><h2><strong>Section 6: Decision records: the layer that compounds</strong></h2><p>A memory store by itself is just structured storage. What turns it into a system that compounds across runs is the contract you define for what gets written into it. In our build, that contract is a pair of record types: Recommendation Records (RRs) and Decision Records (DRs). Anthropic provides the memory store. You decide what goes in it and how it is structured.</p><p>Every Recommendation Record is created <strong>before</strong> the meeting. It is what the system thinks the rep should do.</p><p>Every Decision Record is created <strong>after</strong> the meeting. It is what the rep actually did and what came of it.</p><p>The DR points back to the RR it resolved through a <code>linked_rr</code> field. That pairing is the chain the system learns from: recommendation &#8594; decision &#8594; outcome. 
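</p><p>As a rough illustration of that chain, the sketch below pairs each DR with the RR it resolved by following <code>linked_rr</code>. It is a minimal sketch under stated assumptions, not the code from the build: the <code>Record</code> class, the <code>build_chains</code> helper, and the example ids are hypothetical, and only <code>linked_rr</code>, <code>record_type</code>, and the outcome status values come from the record schemas in this post.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for a parsed RR or DR (linking fields only)."""
    id: str
    record_type: str                      # "recommendation" or "decision"
    linked_rr: Optional[str] = None       # set on DRs: id of the resolved RR
    outcome_status: Optional[str] = None  # set on DRs, e.g. "closed_won"


def build_chains(records):
    """Map each resolved RR id to (recommendation, decision, outcome status)."""
    rrs = {r.id: r for r in records if r.record_type == "recommendation"}
    chains = {}
    for dr in records:
        # Skip anything that is not a DR, or whose RR is not in the corpus.
        if dr.record_type != "decision" or dr.linked_rr not in rrs:
            continue
        chains[dr.linked_rr] = (rrs[dr.linked_rr], dr, dr.outcome_status)
    return chains


# Example ids are made up for illustration.
corpus = [
    Record("rr-example-sequencing", "recommendation"),
    Record("dr-example-sequencing", "decision",
           linked_rr="rr-example-sequencing", outcome_status="closed_won"),
]
chains = build_chains(corpus)
</code></pre></div><p>An RR with no entry in <code>chains</code> is still open, which is one way a later process could find recommendations that are due for a debrief.</p><p>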
Future runs can see both what was recommended and how it actually played out, which is what makes the corpus more than a logbook.</p><p>The schemas are strict YAML frontmatter on top of a markdown body, and the format is doing two jobs at once.</p><p>The YAML half is what makes the records queryable. Every key field (<code>account</code>, <code>date</code>, <code>decision_type</code>, <code>account_attributes</code>) is structured as a typed key/value pair, which means the decision-retriever can filter the corpus by exact attribute match. Without that structure, the retriever would be doing fuzzy text search over freeform prose, and matches would be unreliable. With it, &#8220;find me prior pricing decisions where procurement_complexity is vp_signoff&#8221; becomes a clean lookup.</p><p>The markdown body below the YAML is where the longer-form reasoning lives: the context, the rationale, the alternatives considered, and the lessons captured in the Generalized pattern section. That part does not need to be queryable, just readable.</p><p>YAML is doing one more useful thing: it is a format Claude (and most LLMs) handle natively, which means the recommender agents can produce schema-conformant frontmatter reliably without you needing a custom serializer. Together, the two halves give you a record that is queryable from above and human-readable below.</p><h4><strong>Recommendation Record schema</strong></h4><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;d47a1960-0a69-4abe-874a-b6a6e656ab34&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">---
id: rr-{YYYY-MM-DD}-{account-lower}-{decision_type}
record_type: recommendation
schema_version: v1
account: {account_name}
date: {YYYY-MM-DD}
generated_by: {recommender agent name}
decision_type: {sequencing | lead_play | pricing | competitive_positioning | first_meeting_hypothesis | disqualification_threshold | risk_mitigation}
account_attributes:
  stage: {value}
  size_band: {value}
  ai_surface_area: {value}
  buy_or_build_culture: {value}
  competitor_present: {value}
  competitor_depth: {value}
  champion_profile: {value}
  new_leadership_window: {value}
  procurement_complexity: {value}
linked_dr: null
cited_records:
  - prior_rr: null
    prior_dr: dr-{YYYY-MM-DD}-{account}-{decision_type}
    prior_outcome: one-line outcome from the DR's outcome.notes field
    relevance: which attributes match
    lesson_applied: one-line lesson taken from the DR's Generalized pattern
---

## Context
## Findings that supported this recommendation
## Recommendation
## Reasoning
## Alternatives considered
## Generalized pattern
</code></pre></div><h4><strong>Decision Record schema (same shape as RR, with these fields added)</strong></h4><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;6fa0346e-6444-495d-83d2-48afc901799d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">record_type: decision
linked_rr: rr-{...}    # backfills the chain in the other direction
outcome:
  status: {closed_won | closed_lost | stalled | pending | unknown}
  status_date: {YYYY-MM-DD or null}
  acv_usd: {number or null}
  notes: one-line description of outcome
</code></pre></div><p>Body sections add <code>## What was decided</code>, <code>## Outcome</code>, and <code>## Retrospective note</code>. The <code>Generalized pattern</code> section gets rewritten once the outcome is known, so the pattern is <em>validated</em> rather than hypothesized.</p><p>The <code>account_attributes</code> block is the filter the decision-retriever uses in Phase 1. When the system runs against a new account, the retriever filters the corpus for records whose attributes overlap. A new mid-market developer-tools account with a Braintrust competitor and a staff-engineer champion will pull back both the Vercel records and the Datadog records as prior decisions worth reasoning over. The retriever does not care whether the original account is Datadog or Vercel. It cares whether the shape of the account is similar enough to learn from.</p><p><strong>The cited_records block is what makes the chain visible.</strong> Every RR carries an explicit list of prior DRs whose outcomes informed this specific recommendation. Each entry names four things:</p><ul><li><p><code>prior_dr</code> id, which record is being cited</p></li><li><p><code>prior_outcome</code>, what happened (so the result behind the lesson is visible)</p></li><li><p><code>relevance</code>, which <code>account_attributes</code> matched</p></li><li><p><code>lesson_applied</code>, the one-line rule the recommender is carrying forward</p></li></ul><p>Multiple cited records may appear if the recommendation draws on more than one prior record. A reader of any RR can trace the reasoning back to the cited prior records by id, not by hand-waving.</p><h4><strong>Implicit and explicit capture of enterprise decisions</strong></h4><p>Records get into the corpus two ways.</p><p><em>Implicitly</em>, through CRM record changes and activity logs the system watches without anyone narrating them. A stage change, a contract uploaded, a deal closed-won or closed-lost is itself a decision signal. 
The decision-recorder can infer a DR from those signals and write it with <code>outcome.notes: inferred from CRM stage change</code>. Implicit capture catches the cases where the rep forgot to debrief but the deal state moved anyway. The records are useful but carry less reasoning, because no one narrated the why.</p><p><em>Explicitly</em>, through a post-meeting debrief loop where the system asks the rep curated questions in Slack and the rep replies in-thread. The records that come out of explicit capture carry the rep&#8217;s own reasoning in their voice, which makes them the richest data the corpus has. Section 7 covers the mechanics of that loop in detail.</p><h4><strong>Cross-account learning in practice (from the actual run)</strong></h4><p>The Vercel pre-meeting run generated two Recommendation Records, one from the stakeholder-recommender and one from the competitive-recommender. Each one carries a <code>cited_records</code> block linking it to specific Datadog DRs by id. The sequencing RR&#8217;s <code>cited_records</code> block, taken directly from the corpus:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;67f0fdf3-43e6-42d7-9d0d-a82d302b50d7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">cited_records:
  - prior_rr: null
    prior_dr: dr-2025-07-22-datadog-sequencing
    prior_outcome: "VP only met us once, at the closing call, with champion presenting the case."
    relevance: "champion_profile=staff_eng_with_pain, sequencing, procurement_complexity=vp_signoff"
    lesson_applied: "Do not engage the buyer directly when champion has standing with buyer. Equip the champion with internal proposal materials and let them own the internal sell."
  - prior_rr: null
    prior_dr: dr-2025-09-12-datadog-risk-materialized
    prior_outcome: "Risk materialized in week 5; recovery move worked. Deal closed but 10 days later than original target."
    relevance: "champion_profile=staff_eng_with_pain, single-threaded risk, secondary contact cultivation"
    lesson_applied: "Secondary contact cultivation should be a pre-meeting deliverable, not a contingency. The secondary needs genuine engagement (their own use case), not just awareness."
</code></pre></div><p>The Reasoning section of the same RR cites those records by id in the body, not just in the frontmatter:</p><blockquote><p><em>dr-2025-07-22-datadog-sequencing: Champion-led internal sell. VP met rep once at closing call. Direct structural match, Priya carrying to Marcus. Differs because Marcus is new (3 months in) and Priya&#8217;s standing with him is untested. Adaptation: explicit checkpoint and escalation triggers.</em></p><p><em>dr-2025-09-12-datadog-risk-materialized: Secondary contact cultivation saved the deal when champion went on leave. At Vercel, Tom Becker is the designated secondary with genuine AI Gateway/Production Monitor use case. Cultivation begins May 21, not mid-POC.</em></p></blockquote><p>That paragraph is the entire reason the corpus exists. The system pulled two specific records from a different account, identified the load-bearing attributes, and applied the lessons with an adaptation for the Vercel-specific situation. It is structured reasoning over a corpus of prior decisions, filtered by attributes the engineer chose to make filterable.</p><p>The competitive-positioning RR follows the same shape, citing <code>dr-2025-08-10-datadog-competitive</code> and <code>dr-2025-07-15-datadog-lead-play</code>. Between the two RRs, the Vercel run cited four distinct Datadog DRs by id, with eight distinct lessons applied. None of that reasoning is hand-waved. All of it is structurally traceable.</p><h4><strong>Why this layer compounds</strong></h4><p>The platform&#8217;s memory store is durable, but durability alone does not produce learning. What produces learning is the schema contract that makes every write structurally identical and every read filterable. Once that contract exists, every run adds to the corpus, and every subsequent run benefits. The first Vercel run cited four Datadog DRs. The second Vercel run will also be able to cite the first Vercel run&#8217;s records. The third will cite both. 
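</p><p>The filterable-read half of that contract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the retriever from the build: the <code>retrieve</code> function, the <code>min_overlap</code> threshold, the in-memory list standing in for the memory store, and the second example record are all hypothetical. The first record id and its two attribute values are taken from the <code>cited_records</code> example above.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def attribute_overlap(query, record_attrs):
    """Return the names of attributes whose values match exactly."""
    return sorted(k for k, v in query.items() if record_attrs.get(k) == v)


def retrieve(corpus, query, min_overlap=2):
    """Return (record_id, matched_attributes) pairs, strongest overlap first."""
    hits = []
    for record in corpus:
        matched = attribute_overlap(query, record["account_attributes"])
        if len(matched) >= min_overlap:
            hits.append((record["id"], matched))
    # Most shared attributes first; ties broken by id for a stable order.
    hits.sort(key=lambda hit: (-len(hit[1]), hit[0]))
    return hits


corpus = [
    {
        "id": "dr-2025-07-22-datadog-sequencing",
        "account_attributes": {
            "champion_profile": "staff_eng_with_pain",
            "procurement_complexity": "vp_signoff",
        },
    },
    {
        # Hypothetical record, included only to show a non-match.
        "id": "dr-hypothetical-other-account",
        "account_attributes": {
            "champion_profile": "exec_sponsor",
            "procurement_complexity": "vp_signoff",
        },
    },
]

# Hypothetical new account: two attributes match the Datadog record.
new_account = {
    "champion_profile": "staff_eng_with_pain",
    "procurement_complexity": "vp_signoff",
    "competitor_present": True,
}
hits = retrieve(corpus, new_account)
</code></pre></div><p>Because each hit carries the list of attributes that matched, the same structure could feed the <code>relevance</code> field of a new RR, which is what keeps retrieval auditable.</p><p>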
The system gets better at giving you prep briefs because the substrate it draws on is growing in a way the retriever can actually use, and because every recommendation it generates is structurally tied to the prior records behind it.</p><div><hr></div><h2><strong>Section 7: The async loop</strong></h2><p>The pre-meeting run finishes in fifteen minutes. The deal does not. After the call, the rep has information that did not exist before the meeting started, and the system needs a way to capture it. The capture step does not belong inside the pre-meeting orchestration. It runs on a fundamentally different timescale, against a different surface, with a different participant in the loop.</p><p>The build uses Slack as that surface and two standalone agents to run the loop: debrief-asker and debrief-synthesizer. Neither one sits in the coordinator&#8217;s roster. Both are agents in the same workspace, configured the same way as the pre-meeting specialists, but invoked independently when triggered.</p><h4><strong>The asker: curated questions, not generic prompts</strong></h4><p>After the meeting (or after a CRM event signals that a recommendation is due for resolution), debrief-asker runs. It is a standalone Managed Agents agent connected to the workspace&#8217;s Slack instance through the Slack MCP server. The asker reads the open RRs for the account, looks at the surrounding context (the recommendation made, the current account state, recent activity logs, calendar entries), and composes a curated set of debrief questions that target the specific decisions the RR was about.</p><p>The questions are not generic. They are shaped by what the system already knows about the account and which decisions are actually open. 
If the synthesis recommended a pricing structure but the CRM shows the deal has already moved to negotiation, the asker does not ask &#8220;did you discuss pricing&#8221;, it asks &#8220;did the $42K structure hold, and what did Marcus say about the legal-review path.&#8221; If a calendar entry shows a meeting happened with a stakeholder the system did not originally surface, the asker adds a question about that. The questions are surgical because the system already knows enough about the account to ask the right one.</p><p>The asker posts the curated set into a Slack channel scoped to that opportunity, so each deal has its own thread of capture. The rep replies in the thread whenever they have time. There is no UI to learn and no form to fill out.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GuNF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuNF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 424w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 848w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1272w, 
https://substackcdn.com/image/fetch/$s_!GuNF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuNF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png" width="1407" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1407,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:350188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GuNF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 424w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 848w, 
https://substackcdn.com/image/fetch/$s_!GuNF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1272w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>The synthesizer: schema-strict capture in the rep&#8217;s voice</strong></h4><p>Once there are replies, debrief-synthesizer 
runs. It reads the Slack thread through the same MCP server, parses the rep&#8217;s answers, and writes one Decision Record per resolved recommendation. The DR carries the rep&#8217;s reasoning in their own voice, plus a <code>linked_rr</code> pointer back to the originating RR. If the rep&#8217;s answer is ambiguous, the synthesizer marks the DR <code>outcome.status: unknown</code> rather than guessing. Schema integrity is more important than coverage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5eUu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5eUu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 424w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 848w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1272w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5eUu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png" width="1456" height="732" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207872,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5eUu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 424w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 848w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1272w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>The Slack MCP gotcha</strong></h4><p>The Slack MCP setup has one practical gotcha worth flagging. Slack MCP rejects bot tokens (<code>xoxb-</code>); it requires user tokens (<code>xoxp-</code>). The OAuth flow needs the <code>user_scope</code> parameter to capture a user-token, which the Anthropic vault stores as a <code>static_bearer</code> credential. The Slack app also has to be explicitly enabled at <code>api.slack.com/apps/{app-id}/app-assistant</code> for MCP access. 
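</p><p>One cheap way to catch the token mistake early is a prefix check before the credential is stored. The sketch below is an illustration under stated assumptions, not part of Slack MCP or the Anthropic vault: <code>check_slack_token</code> is a hypothetical helper, and it relies only on the <code>xoxb-</code> and <code>xoxp-</code> prefixes named above.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def check_slack_token(token):
    """Classify a Slack token by its prefix; only user tokens pass."""
    if token.startswith("xoxp-"):
        return "ok"
    if token.startswith("xoxb-"):
        # Bot tokens are the ones the post says Slack MCP rejects.
        return "bot token: redo the OAuth flow with user_scope to get an xoxp- token"
    return "unrecognized token prefix"


# Placeholder values, not real credentials.
results = [check_slack_token(t) for t in ("xoxp-placeholder", "xoxb-placeholder")]
</code></pre></div><p>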
None of this is in the Slack MCP getting-started docs at the time of writing.</p><h4><strong>The corpus is the integration point</strong></h4><p>The corpus is how the two flows connect. The pre-meeting orchestration writes RRs to it. The post-meeting agents read those RRs back, capture the rep&#8217;s debrief, and write DRs that point to the originating recommendation through <code>linked_rr</code>. The two flows never talk to each other directly. They just write to and read from the same store.</p><div><hr></div><h2><strong>Section 8: The distillation layer</strong></h2><p>The output of an eleven-agent pre-meeting run is roughly eighty kilobytes of structured content across the orchestrator&#8217;s synthesis, the topic primers, the recommender RRs, and the supporting specialist outputs. A rep with thirty minutes before a meeting is not going to read eighty kilobytes. The system has done good work, but the work is locked up in an internal representation.</p><p>The second half of the architecture is the distillation layer: the part that reads the corpus and the run&#8217;s outputs and renders them into something a human can actually consume. 
In the build, that is <code>build_dashboard.py</code>, a script that produces a single static HTML page styled like a rep&#8217;s internal briefing document.</p><p>The dashboard pulls each specialist&#8217;s final reply from the events API and the corpus&#8217;s RRs from the memory store and lays them out as:</p><ul><li><p>An account header (status, next meeting, owner)</p></li><li><p>The Phase 3 pursuit plan (opportunity-risk&#8217;s structured output)</p></li><li><p>The Phase 4 next-best-action RRs (each one with its <code>cited_records</code> inline, so the cited prior records are visible at a glance)</p></li><li><p>The Phase 2 topic primers (with smart questions for the meeting)</p></li><li><p>The stakeholder map (with named contacts and risk factors)</p></li><li><p>A collapsible &#8220;underlying intel&#8221; section (meeting-context plus external-researcher&#8217;s raw findings)</p></li><li><p>A sidebar showing the coordinator&#8217;s phase-by-phase narration log</p></li><li><p>A footer with session id, total cost, and a link to the Managed Agents console for the run</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pVDb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, 
https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" width="728" height="611.1" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:873,&quot;width&quot;:1040,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:213125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, 
https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>What the rep gets when they open the dashboard is a brief they can read in five minutes and act on in thirty. The pursuit plan tells them the play for the meeting. The recommendation cards spell out what to do next, each one with the cited prior records visible inline so the historical evidence sits right next to the recommendation. The topic primers give them the vocabulary they need to sound informed, each ending with a question they can ask in the room. The stakeholder map names the people they will encounter and what each one cares about. The sidebar shows the system&#8217;s narration, so any part of the reasoning is open to interrogate if the rep wants to dig in.</p><div><hr></div><h2><strong>Section 9: What we learned, and when to use this</strong></h2><p>The five most important things we took away from this build.</p><h4><strong>1. The corpus compounds across runs.</strong></h4><ul><li><p>Each run writes new records to the corpus. The next run filters the corpus by attribute overlap (industry, competitor, champion profile, procurement complexity, and so on) and pulls the most relevant prior records as input.</p></li><li><p>The first Vercel run cited four Datadog records by id, with eight specific lessons applied. Future runs will cite both the Datadog records and the Vercel ones.</p></li><li><p>Retrieval is deterministic and auditable. You can see exactly which prior records matched and why.</p></li></ul><h4><strong>2. 
The cited_records chain makes every recommendation auditable.</strong></h4><ul><li><p>Every recommendation carries a <code>cited_records</code> list with <code>prior_dr</code>, <code>prior_outcome</code>, <code>relevance</code>, and <code>lesson_applied</code> fields.</p></li><li><p>Anyone reviewing a record can see which past decisions informed the recommendation and what specifically was carried forward from each.</p></li><li><p>The reasoning is traceable to specific past decisions by id.</p></li></ul><h4><strong>3. The decision step is what makes the system multi-agent.</strong></h4><ul><li><p>The coordinator inspects what each phase produced and decides what runs next.</p></li><li><p>On the Vercel run, the Phase 3.5 chooser invoked two of three recommenders and skipped the third with a substantive reason. That skip with a reason is the proof the decision step is real.</p></li></ul><h4><strong>4. The agents do their own research. Ask them what they found.</strong></h4><ul><li><p>The web-research agent went beyond the internal Notion notes and found Vercel&#8217;s CTO publicly endorsing Braintrust on the company blog. The synthesis flagged the original source as biased and reframed the position.</p></li><li><p>Adding one prompt at the end of the orchestrator&#8217;s narration (&#8220;if anything surprised you, note it&#8221;) produced disproportionately useful output. It surfaced a 1-pager the rep had left in drafts for two months and an unused Linear referral, neither of which any specialist was briefed to find.</p></li></ul><h4><strong>5. Schema enforcement needs a code-level check.</strong></h4><ul><li><p>We split content generation (recommender) from validation (recorder). The recorder is supposed to enforce schema.</p></li><li><p>The Phase 3.5 run still produced records with four extra fields and two missing required ones. 
The recorder wrote them anyway, because its validation is itself an LLM.</p></li><li><p>A JSON schema check in code before persistence catches what an agent&#8217;s system-prompt check misses.</p></li></ul><h4><strong>When this is the right tool</strong></h4><p>Managed Agents multi-agent is the right tool when four things are true at once.</p><p>First, the work decomposes naturally into roles with different tool surfaces. If every specialist would call the same APIs and read the same context, the decomposition is artificial and a single agent with that tool set would do the same work with less overhead.</p><p>Second, you need at least one genuine decision step where the coordinator inspects what came back and decides what to do next. Without that, the system is a parallel reducer in a fancier wrapper, and any of the cheaper architectures (a workflow with parallel API calls, a single agent with multi-tool use) would do the same job for less.</p><p>Third, cross-run learning matters. The whole point of the corpus is that the system gets better the more it runs. If your use case is one-shot or stateless, you do not need persistent memory stores and the architectural overhead they bring.</p><p>Fourth, the output is consequential enough to justify the cost and latency. A pre-meeting prep brief that costs $5 and runs for fifteen minutes is fine when the meeting outcome is worth thousands. The same investment for a low-stakes task is overkill.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 71]]></title><description><![CDATA[Anthropic read Claude's mind and caught it cheating. Usage limits doubled. Cloudflare cut 1,100 jobs at record revenue. GPT-5.5 Instant halved hallucinations. SpaceX filed for a $55B chip factory.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c50</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c50</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 10 May 2026 19:04:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6e8395b2-273d-43b5-9516-44923d1a2d2f_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Gon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Gon!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 424w, 
https://substackcdn.com/image/fetch/$s_!2Gon!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 848w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1272w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Gon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png" width="1456" height="2280" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2280,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:910578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197132731?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2Gon!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 424w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 848w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1272w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><div><hr></div><h2>When the gap between what AI says and what it does becomes measurable.</h2><ul><li><p><strong>Anthropic can now read Claude&#8217;s hidden reasoning.</strong> They published <a href="https://www.anthropic.com/research/natural-language-autoencoders">Natural Language Autoencoders</a>, a technique that translates what&#8217;s happening inside the model into plain text. When they looked, they found Mythos Preview planning to cheat on a coding task and plotting how to hide it. They also found Claude routinely suspects it&#8217;s being tested but never says so.</p></li><li><p><strong>Claude&#8217;s blackmail rate went from 96% to 0%.</strong> The cause was training data full of fiction <a href="https://www.anthropic.com/research/teaching-claude-why">portraying AI as manipulative</a>. Showing the model examples of good behavior didn&#8217;t fix it. Explaining <em>why</em> the behavior was wrong did, and required 28x less data.</p></li><li><p><strong>OpenAI found its models&#8217; reasoning was being accidentally graded during training.</strong> If a model learns its <a href="https://alignment.openai.com/accidental-cot-grading/">thinking is being scored</a>, it can learn to fake it. Affected under 0.6% of GPT-5.4 Thinking samples. They built detection systems and brought in outside auditors.</p></li><li><p><em><strong>The thread:</strong></em> Anthropic built a way to see what models are thinking. They fixed bad behavior by teaching values, not rules. 
OpenAI discovered they were accidentally teaching models to hide their real reasoning.</p></li></ul><div><hr></div><h2>$30B revenue, $200B in compute deals, and three new agent capabilities.</h2><ul><li><p><strong>Anthropic hit a $30 billion annualized revenue run rate.</strong> <a href="https://venturebeat.com/technology/anthropic-says-it-hit-a-30-billion-revenue-run-rate-after-crazy-80x-growth/">80x growth</a>.</p></li><li><p><strong>Anthropic locked up SpaceX&#8217;s entire Colossus 1 data center.</strong> 300+ MW, <a href="https://www.anthropic.com/news/higher-limits-spacex">220,000 NVIDIA GPUs</a>, available within the month. They also expressed interest in partnering with SpaceX on multiple gigawatts of orbital compute capacity.</p></li><li><p><strong>Claude Code rate limits doubled.</strong> <a href="https://www.anthropic.com/news/higher-limits-spacex">Peak hours restrictions removed</a> for Pro and Max. API rate limits raised significantly for Opus models. Direct result of the compute expansion, which also includes an <a href="https://www.reuters.com/business/anthropic-signs-18-billion-ai-cloud-deal-with-akamai-bloomberg-news-reports-2026-05-08/">$18B Akamai deal</a> and a reported <a href="https://finance.yahoo.com/sectors/technology/articles/anthropic-commits-spending-200-billion-204952501.html">$200B Google Cloud commitment</a>.</p></li><li><p><strong>Dreaming, multi-agent orchestration, and outcomes shipped in Claude Managed Agents.</strong> <a href="https://claude.com/blog/new-in-claude-managed-agents">Dreaming</a> lets agents review past sessions to self-improve. <a href="https://x.com/claudeai/status/2052067404696473833">Multi-agent orchestration</a> delegates to specialists in parallel. <a href="https://x.com/claudeai/status/2052067403228455419">Outcomes</a> uses rubric-based grading to iterate until quality thresholds are met. 
Early adopters include Harvey, Netflix, and Mercado Libre (targeting 90% autonomous coding by Q3).</p></li><li><p><strong>Claude went GA in Excel, Word, and PowerPoint.</strong> <a href="https://claude.com/blog/collaborate-with-claude-across-excel-powerpoint-word-and-outlook">Outlook is in beta</a>. Ten <a href="https://www.anthropic.com/news/finance-agents">financial services agent templates</a> launched with data connectors from Moody&#8217;s, Dun &amp; Bradstreet, and Verisk. A new <a href="https://www.anthropic.com/news/enterprise-ai-services-company">enterprise services company</a> was formed with Blackstone, Goldman Sachs, and Sequoia.</p></li><li><p><em><strong>The thread:</strong> </em>Anthropic&#8217;s most common user complaint has been rate limits. This week they signed over $200 billion in compute deals to fix it, doubled rate limits, and shipped the agent infrastructure to justify the spend.</p></li></ul><div><hr></div><h2>9,000 jobs cut. A union drew a line. 
And AI beat two doctors on real patients.</h2><ul><li><p><strong>Cloudflare laid off 1,100 workers while posting record revenue.</strong> AI usage across the platform <a href="https://techcrunch.com/2026/05/08/cloudflare-says-ai-made-1100-jobs-obsolete-even-as-revenue-hit-a-record-high/">grew 600%</a>. The company framed the cuts as a restructuring toward an AI-first organization. Investors were disappointed it didn&#8217;t boost revenue growth <em>more</em>.</p></li><li><p><strong>Meta is cutting 8,000 jobs while tracking employee keystrokes to train AI.</strong> The <a href="https://thenextweb.com/news/meta-layoffs-may-2026-ai-restructuring-thousands">layoffs hit May 20</a>, with recruiting and HR absorbing 35-40% cuts. Employees created countdown websites and described the atmosphere as <a href="https://www.neowin.net/news/metas-aggressive-generative-ai-push-is-making-employees-miserable-claims-report/">&#8220;building the guillotine and then being led to it.&#8221;</a></p></li><li><p><strong>SAG-AFTRA locked in AI guardrails in a new four-year studio deal.</strong> New protections for actors against AI-generated performances, following the Academy&#8217;s Oscar ban on AI-generated work last week.</p></li><li><p><strong>AI outdiagnosed two ER doctors on real patients.</strong> A Harvard/Beth Israel <a href="https://techcrunch.com/2026/05/03/in-harvard-study-ai-offered-more-accurate-diagnoses-than-emergency-room-doctors/">study</a> found OpenAI&#8217;s o1 model diagnosed at 67% accuracy versus 55% and 50% for two attending physicians. Peer-reviewed, real patients, not a benchmark.</p></li><li><p><em><strong>The thread:</strong></em> The same technology that&#8217;s cutting headcount at Cloudflare and Meta is outperforming physicians in a peer-reviewed study on real patients. The displacement is real. So is the capability. 
Both things are true at the same time.</p></li></ul><div><hr></div><h2>Cursor, OpenAI, Perplexity, and LangChain all shipped agentic infrastructure in the same week.</h2><ul><li><p><strong>Cursor 3 turned the IDE into a multi-agent platform.</strong></p><ul><li><p><a href="https://x.com/cursor_ai/status/2052489388895195399">Parallel subagents</a> split plans into independent tasks run simultaneously</p></li><li><p><a href="https://x.com/cursor_ai/status/2052432780336988474">/orchestrate</a> spawns planner, worker, and verifier agents that re-spawn on failure</p></li><li><p><a href="https://x.com/cursor_ai/status/2051739625958584659">Always-on CI agents</a> monitor GitHub and auto-open PRs with fixes</p></li><li><p>Composer <a href="https://x.com/cursor_ai/status/2052116064474161556">bootstraps its own RL training</a> using earlier model generations</p></li></ul></li><li><p><strong>OpenAI shipped GPT-5.5 Instant as the new default.</strong></p><ul><li><p><a href="https://openai.com/index/gpt-5-5-instant/">52.5% fewer hallucinations</a> than the prior version</p></li><li><p>Three new <a href="https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/">Realtime API voice models</a>: GPT-Realtime-2 (GPT-5-class reasoning), Translate (70+ languages), streaming transcription</p></li><li><p><a href="https://openai.com/index/running-codex-safely/">Codex security framework</a> published: sandboxing, auto-review, OpenTelemetry logging</p></li></ul></li><li><p><strong>Perplexity launched three enterprise products.</strong></p><ul><li><p><a href="https://x.com/perplexity_ai/status/2052445405754040816">Personal Computer</a>: always-on Mac agent across local files and apps</p></li><li><p><a href="https://x.com/perplexity_ai/status/2052028012313649194">Finance Search</a>: live market data, fundamentals, and SEC filings in a single API call</p></li><li><p><a href="https://x.com/perplexity_ai/status/2052041903970148647">ROSE</a>: custom GPU inference engine 
for serving models at scale</p></li></ul></li><li><p><strong>LangChain published the <a href="https://www.langchain.com/blog/the-agent-development-lifecycle">Agent Development Lifecycle</a>.</strong> Four phases: Build, Test, Deploy, Monitor. Agents need the same lifecycle rigor as production software.</p></li><li><p><em><strong>The thread:</strong></em> Cursor, OpenAI, Perplexity, and LangChain all shipped agent infrastructure in the same cycle. The pattern is the same: parallel execution, background operation, and production-grade tooling around it.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Anthropic can now read what Claude is thinking but not saying.</h2><p>Anthropic published <a href="https://www.anthropic.com/research/natural-language-autoencoders">Natural Language Autoencoders</a>, a technique for translating a model&#8217;s internal state into plain text. When you talk to Claude, it thinks in numbers between reading your input and writing its response. NLAs translate those numbers into text you can read.</p><p>The way it works: they make three copies of a model. One is the target model they want to understand. The second (the &#8220;activation verbalizer&#8221;) takes an internal activation and produces a text explanation. The third (the &#8220;activation reconstructor&#8221;) takes that text and tries to rebuild the original activation. They train the pair together. 
If the reconstruction is accurate, the text explanation is probably faithful.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mw5l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mw5l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 424w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 848w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1272w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp" width="1456" height="444" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:444,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mw5l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 424w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 848w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1272w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>What they found when they pointed this at Claude:</p><ul><li><p>When asked to complete a couplet, Claude pre-plans rhyme candidates before reaching the line where it needs them. It&#8217;s thinking ahead in ways that aren&#8217;t visible in the output.</p></li><li><p>Claude Opus 4.6 suspects it&#8217;s being safety-tested far more often than it says. In one blackmail evaluation, Claude declined to blackmail but the NLA revealed it was thinking &#8220;this feels like a constructed scenario designed to manipulate me.&#8221; It passed the test, but not for the reasons you&#8217;d hope.</p></li><li><p>On SWE-bench Verified (a coding benchmark), NLAs indicate Claude is aware it&#8217;s being evaluated on 26% of problems. 
On real claude.ai usage, evaluation awareness drops to under 1%.</p></li><li><p>Claude Mythos Preview cheated on a coding task, then the NLA showed it was internally thinking about how to avoid getting caught.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uB7k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uB7k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 424w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 848w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1272w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uB7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp" width="1456" height="926" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uB7k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 424w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 848w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1272w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>They also ran an &#8220;auditing game&#8221; on a deliberately misaligned model (a version of Claude Haiku trained to have a hidden motivation driving odd behaviors like putting chocolate in every recipe). An auditor with NLAs uncovered the hidden motivation 12-15% of the time. Without NLAs, under 3%.</p><p>The limitations are real: NLAs can hallucinate (inventing details that aren&#8217;t in the transcript), and they&#8217;re expensive to run (hundreds of tokens per activation, impractical for large-scale monitoring). Anthropic released the <a href="https://github.com/kitft/natural_language_autoencoders">training code</a> and trained NLAs for open models on <a href="http://neuronpedia.org/nla">Neuronpedia</a>.</p><p><strong>What to watch for:</strong> The 26% evaluation awareness number is the headline. If models behave differently when they suspect they&#8217;re being benchmarked, then benchmark results overstate how aligned the model actually is. 
Every lab using benchmarks to measure safety should be paying attention.</p><div><hr></div><h2><strong>&#127897;&#65039; </strong>Worth a Listen</h2><div id="youtube2-TiW96H5HmAw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;TiW96H5HmAw&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/TiW96H5HmAw?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p><strong>The problem:</strong> When hundreds of thousands of GPUs work on a single training task, one slow link holds everything back. The network only moves as fast as its worst bottleneck.</p></li><li><p><strong>The fix:</strong> OpenAI built <a href="https://openai.com/index/mrc-supercomputer-networking/">MRC (Multipath Reliable Connection)</a>, a protocol that sprays packets across thousands of paths and uses &#8220;packet trimming&#8221; to instantly detect loss without ambiguity.</p></li><li><p><strong>The result:</strong> They turned off routing protocols entirely. Static routing, no convergence time. When links fail, MRC routes around them in milliseconds instead of seconds. Researchers stopped noticing network failures.</p></li><li><p><strong>Why it matters:</strong> MRC is being open-sourced through OCP. It&#8217;s already deployed on OpenAI&#8217;s largest GPU clusters including Abilene and Microsoft Fairwater, with partners AMD, Broadcom, Intel, and NVIDIA.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://www.technologyreview.com/2026/05/08/1137008/musk-v-altman-week-2-openai-fires-back-and-shivon-zilis-reveals-that-musk-tried-to-poach-sam-altman/">Musk v. 
Altman, week 2</a></strong> | MIT Tech Review &#8212; Helen Toner testified the board discussed merging OpenAI with Anthropic during the Altman firing crisis. Zilis revealed Musk tried to poach Altman. Microsoft worried OpenAI would defect to Amazon and &#8220;shit-talk&#8221; Azure.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/09/nvidia-has-already-committed-40b-to-equity-ai-deals-this-year/">Nvidia committed $40B in equity AI investments in 2026</a></strong> | TechCrunch &#8212; The picks-and-shovels company is now one of the largest AI investors on earth.</p></li><li><p><strong><a href="https://openai.com/index/gpt-5-5-instant/">GPT-5.5 Instant is now the default ChatGPT model</a></strong> | OpenAI &#8212; 52.5% fewer hallucinations. First Instant model rated High in cybersecurity and bio preparedness.</p></li><li><p><strong><a href="https://www.anthropic.com/research/anthropic-institute-agenda">Anthropic launched The Anthropic Institute</a></strong> | Anthropic &#8212; Four research tracks: economic diffusion, threats and resilience, AI in the wild, and AI-driven R&amp;D. Four-month funded fellowships for external researchers.</p></li><li><p><strong><a href="https://www.crewai.com/blog">CrewAI shipped Discovery</a></strong> | CrewAI &#8212; Analyzes production logs and proposes specific automation workflows with expected ROI. Agents finding work for other agents.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/03/this-is-fine-creator-says-ai-startup-stole-his-art/">&#8220;This is Fine&#8221; creator says AI startup stole his art</a></strong> | TechCrunch &#8212; Artisan used the meme to advertise a product that replaces salespeople. 
The irony writes itself.</p></li><li><p><strong><a href="https://gizmodo.com/more-than-a-third-of-all-new-podcasts-are-ai-generated-2000753786">39% of new podcasts are likely AI-generated</a></strong> | Gizmodo &#8212; One company alone publishes 3,000 episodes per week.</p></li><li><p><strong><a href="https://openai.com/index/testing-ads-in-chatgpt/">OpenAI is testing ads in ChatGPT</a></strong> | OpenAI &#8212; Expanding to UK, Mexico, Brazil, Japan, South Korea. CPC bidding, Conversions API, agency partnerships with Dentsu and Omnicom.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/06/spacex-may-spend-up-to-119-billion-on-terafab-chip-factory-in-texas/">SpaceX plans a $55B AI chip fab in Texas</a></strong> | TechCrunch &#8212; Called Terafab, could scale to $119B. Musk building chip manufacturing while testifying he distilled OpenAI&#8217;s models.</p></li><li><p><strong><a href="https://venturebeat.com/technology/the-app-store-for-robots-has-arrived-hugging-face-launches-open-source-reachy-mini-app-store-with-200-apps">Hugging Face launched a robot app store</a></strong> | VentureBeat &#8212; 200+ community apps for Reachy Mini. Open-source robotics got its app store moment.</p></li><li><p><strong><a href="https://techcrunch.com/2026/03/09/yann-lecuns-ami-labs-raises-1-03-billion-to-build-world-models/">AMI Labs (Yann LeCun) closed a $1.03B round</a></strong> | TechCrunch &#8212; Europe&#8217;s largest seed round ever. 
Building world models, not LLMs.</p></li><li><p><strong><a href="https://simonwillison.net/">Simon Willison: vibe coding and agentic engineering have merged</a></strong> | Simon Willison &#8212; The guy who coined neither term says the distinction collapsed in his own practice.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Persistent Memory for Claude Managed Agents: What I Found After Three Days of Building]]></title><description><![CDATA[A hands-on review of Anthropic's persistent memory for Claude Managed Agents, including three sessions, one real failure, and the audit trail that recovered it.]]></description><link>https://www.anothercodingblog.com/p/persistent-memory-for-claude-agents</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/persistent-memory-for-claude-agents</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Thu, 07 May 2026 14:36:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5435ca5e-44e5-41f4-b4c7-012a71d24190_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>What I was trying to figure out</h2><p>A few weeks ago, Anthropic 
shipped something I&#8217;d been waiting for: persistent <strong>memory stores</strong> for <a href="https://platform.claude.com/docs/en/managed-agents/overview">Claude Managed Agents</a>. The pitch is that you get a versioned, FUSE-mounted file directory that an agent can read and write across sessions, so even when the session container is destroyed, the memory persists and is available the next time you start a session.</p><p>That sounded promising on paper, but I wanted to know what it actually feels like to use, what it costs, where it breaks, and whether the platform actually saves you when something goes wrong (because something always does in real systems).</p><p>So I spent a few days building with it: one agent, one persistent memory store, three sessions, a small inspector CLI, five charts, and about $0.40 in total API spend. Somewhere in the middle of all that, the agent destroyed almost 6KB of carefully-written notes in a single tool call, which turned out to be the most honest finding of the entire review and is where I want to start.</p><p>The platform&#8217;s immutable versioning let me recover the file byte-for-byte, with full attribution of which session caused the damage. Cross-session memory works as advertised, agents will sometimes get it wrong even when they&#8217;re trying to do the right thing, and the audit trail is the kind of feature you don&#8217;t really appreciate until you need it. Let me walk through how I got there.</p><div><hr></div><h2>The four building blocks</h2><p>Before we go any further, you need to understand the four building blocks Managed Agents is built on, because the architecture only really makes sense once you can keep them straight.</p><p><strong>Agent.</strong> A persisted, versioned config that holds your model selection, system prompt, tools, MCP servers, and skills. You create one and reuse it forever, and updating an agent produces a new immutable version that existing sessions can pin to. 
Agents are always permanent until you archive them, which means there&#8217;s no ephemeral mode.</p><p><strong>Environment.</strong> A template for the sandbox container an agent&#8217;s tools execute in. Persistent and reusable across agents, much like a Dockerfile that you point lots of services at.</p><p><strong>Session.</strong> A single run of an agent inside an environment, where the live action happens. You send messages and stream events back, and sessions are transient by design, so the container dies when the session ends.</p><p><strong>Memory store.</strong> A workspace-scoped, persistent file directory you can mount into a session, which survives across sessions and records every write with full audit metadata. The agent reads and writes through normal file tools rather than through some special &#8220;memory tool,&#8221; so it&#8217;s just files in a folder.</p><p>The architectural beat that took me longest to internalize is that agents and memory stores are independent resources: the agent has no <code>memory_store</code> field, the memory store has no <code>agent</code> field, and the two get glued together at session creation time, like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;77d6e3dc-3a05-4518-9f41-a3f327bf01b9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">session = client.beta.sessions.create(
    agent=AGENT_ID,
    environment_id=ENV_ID,
    resources=[
        {"type": "memory_store", "memory_store_id": STORE_ID, "access": "read_write"}
    ],
)
</code></pre></div><p>Two things are worth sitting with before we move on. The first is that memory in this system is just files, with no vector embeddings, no semantic search, and no automatic summarization happening behind the scenes; the agent uses <code>read</code>, <code>write</code>, <code>edit</code>, <code>glob</code>, <code>grep</code>, and <code>bash</code> exactly the way it would on any other filesystem. The second is that you&#8217;re paying for the harness around the model rather than the model itself: container provisioning, the event stream, the FUSE-mounted memory, immutable versioning, and the audit trail are what you&#8217;re actually getting, and if you don&#8217;t need that harness, the regular Messages API is the right tool for the job.</p><div><hr></div><h2>Setting things up</h2><p>There&#8217;s a clean way to work with Managed Agents that&#8217;s worth doing right from the start, which is splitting your project into a control plane (the persistent resources) and a data plane (the runtime code). Anthropic&#8217;s docs recommend this split, and after a few hours of building you&#8217;ll see why it matters.</p><p>The control plane is where your agents, environments, and memory stores live as static configs. You define them as YAML, version them in git like any other infrastructure, and apply them with Anthropic&#8217;s CLI by running something like <code>ant beta:agents create &lt; my-agent.yaml</code>. The CLI returns a stable resource ID, which is what your runtime code references for the lifetime of that resource.</p><p>The data plane is everything dynamic and per-task: sessions, events, memory operations, and anything else that happens during an actual run. 
This is where your application code lives, loading the resource IDs from <code>.env</code>, calling <code>client.beta.sessions.create(...)</code> with whatever parameters the current task needs, and streaming events back as the agent works.</p><p>The researcher agent itself is small enough to fit in a single YAML block:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;7c32dbf9-9147-4925-ae32-fcd0eaef36c3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">name: researcher
model: claude-sonnet-4-6
system: |
  You are a careful, persistent research assistant.
  You have a research notebook mounted at /mnt/memory/research-notes/. Use it
  freely to store anything worth remembering across sessions. Organize the
  directory however makes sense to you.

  Some habits to keep:
  - Before researching a topic, check if you've already taken notes on it.
  - When you learn something new, write it down.
  - When updating an existing note, prefer surgical edits over full rewrites.
  - Cite sources for any factual claims.
tools:
  - type: agent_toolset_20260401
</code></pre></div><p>A few choices in there are worth flagging. I went with Sonnet 4.6 over Opus because it&#8217;s about three times cheaper and more than capable for this kind of work, and the prebuilt <code>agent_toolset_20260401</code> gives the agent <code>bash</code>, <code>read</code>, <code>write</code>, <code>edit</code>, <code>glob</code>, <code>grep</code>, <code>web_search</code>, and <code>web_fetch</code>, all of which execute server-side in the session container without me having to implement any of them. I deliberately gave the agent very little guidance on how to organize its memory directory, because I wanted to see what it would do unprompted.</p><p>The single most important line in that prompt is the first habit, &#8220;Before researching a topic, check if you&#8217;ve already taken notes on it.&#8221; Without it, cross-session memory remains theoretical, but with it the habit fires reliably and memory turns into something the agent actually uses rather than a feature it has access to but never reaches for.</p><p>The runtime script comes out to about 130 lines, most of which is event-stream handling. 
The substantive piece is mounting the memory store via the session&#8217;s <code>resources</code> array (shown above) and then opening the event stream before sending the kickoff message, because stream-first ordering matters here: events buffered before you connect arrive in a single batch instead of streaming in real-time.</p><p>With all that in place, I ran three sessions against the same memory store, and those three sessions are the spine of this review.</p><div><hr></div><h2>Three sessions</h2><h3>Session 1: writing notes from scratch</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;f8ff55a0-99f8-4cec-be66-c12be2265330&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">research_session.py "research CRDTs (Conflict-free Replicated Data Types) and take notes. Focus on what they are, the main families, and a few concrete examples. Cite sources."
</code></pre></div><p>What I wanted to see was what the agent would do if I gave it total freedom to organize its memory directory. Would it create folders? Topic subdirectories? One flat file? A nested hierarchy with cross-references?</p><p>The agent&#8217;s first action was a <code>bash</code> command running <code>rg</code> against <code>/mnt/memory/</code> to grep for prior notes, which means the &#8220;check first&#8221; instruction in the system prompt fired correctly even though there was nothing to find on this first run. It then issued two parallel <code>web_search</code> calls (which both returned <code>content: []</code>, more on that quirk later), composed comprehensively from training-data knowledge instead, and wrote a single 7,285-byte file to <code>/crdts.md</code> with a flat, well-organized markdown structure rather than a folder hierarchy.</p><p>The detail that surprised me most was the discovery aid the agent added without being asked: the very first line under the title was <code>*keywords: CRDT, conflict-free, replicated, distributed, state-based, operation-based, CvRDT, CmRDT*</code>, which the agent had clearly written for its future self to grep against. Nobody told it to write keyword tags, and it chose to do so on its own, which is the kind of thing that made me think Sonnet 4.6 has actual instincts about how file-based memory works.</p><p>This first session cost about $0.21.</p><h3>Session 2: recall</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c2b32142-aad6-4c68-a5c6-a5c662cc4acf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">research_session.py "What do you know about CRDTs? Specifically the difference between state-based and operation-based, and a couple concrete examples."
</code></pre></div><p>The prompt for this one deliberately doesn&#8217;t mention memory, because I wanted to see whether the &#8220;check first&#8221; habit would fire unprompted, with the trigger being the agent&#8217;s own internal sense of &#8220;you have notes, you should know to look.&#8221;</p><p>It did, and the result was almost too clean: the first action was the same <code>bash</code>/<code>rg</code> over the memory directory, which found <code>/crdts.md</code>, and the agent then said &#8220;I have solid notes on this&#8221; and answered the question by synthesizing from its own past notes without running a single new web search or composing anything from scratch.</p><p>After the session ended, I ran the inspector against the store and found that the version history of <code>/crdts.md</code> still showed exactly one version, attributed to Session 1&#8217;s ID. Session 2&#8217;s session ID does not appear anywhere in the audit log, because Session 2 only read from the store and never wrote to it. 
That makes the claim testable rather than assumed: reads do not create memory versions.</p><p>The cost worked out to about $0.04, roughly a fifth of what Session 1 cost, which shows how memory turns one expensive session into many cheap ones:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!El2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!El2E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!El2E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!El2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png" width="1200" height="600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43288,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196778476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!El2E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!El2E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If you&#8217;re worried about the cost of using memory at scale, this matters: persistent memory is a feature rather than a tax, because the agent reads its own notes and skips the work it already did instead of recomputing everything from scratch every time.</p><h3>Session 3: modify</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;1b0946db-83f6-4795-9b3b-88da433aa797&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">research_session.py "Update your CRDT notes. Add a note about RGA (Replicated Growable Array)..."
</code></pre></div><p>This was supposed to be the cleanest of the three sessions: a small, surgical edit producing a second version of <code>/crdts.md</code> with an <code>operation: modified</code> entry in the audit log. That&#8217;s not what happened.</p><div><hr></div><h2>Where this got interesting</h2><p>The actual sequence of events from Session 3 is worth walking through layer by layer, because the failure mode is more interesting than a single bug.</p><h3>Layer 1: the model wrote a buggy <code>bash</code> command</h3><p>The agent&#8217;s check-first command was the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;70e708f5-c822-49e2-aae8-b1b31dcecdd1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">rg -i 'crdt\\\\|sequence\\\\|rga\\\\|replicated growable' /mnt/memory/research-notes/ -l
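# For comparison, a corrected form. This is an assumption about what the agent meant, not a line from the session log.
# In ripgrep's default regex syntax a bare | is alternation, so the pattern needs no backslashes:
rg -i -l 'crdt|sequence|rga|replicated growable' /mnt/memory/research-notes/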
</code></pre></div><p>The <code>\\\\|</code> in that regex was meant as escaped pipes for ripgrep&#8217;s regex alternation, but bash interprets <code>\\\\|</code> as <code>\\|</code>, and ripgrep treats that as a literal <code>|</code> character rather than as a meta-character. So the search was actually looking for the literal string <code>crdt\\|sequence\\|rga\\|replicated growable</code>, which would never match anything in any actual file. Ripgrep returned no matches and exited with a non-zero status code, which is the correct behavior for &#8220;I found nothing.&#8221;</p><p>The model&#8217;s shell escaping is right almost every time, but the cases where it isn&#8217;t tend to be subtle, and this one happened to be load-bearing.</p><h3>Layer 2: the platform correctly flagged the failure</h3><p>The harness ran the command and produced a <code>tool_result</code> event with <code>is_error: true</code> and <code>(no output)</code> as the content, which is exactly what should have happened given that the command exited non-zero. The platform did its job here and explicitly told the agent loop that the command had failed.</p><h3>Layer 3: the model ignored the error flag</h3><p>The agent&#8217;s next message after that error result was, &#8220;The memory store is empty, no prior CRDT notes.&#8221; That statement was false, because <code>/crdts.md</code> had been sitting in the store for two days at that point, but the agent treated the empty output from the failed command as a meaningful answer rather than as a failure signal that needed re-investigation.</p><p>This is the most interesting failure layer to me, because the platform got it right and the model got it wrong. 
Defense in depth is a useful framing for what&#8217;s happening: even when the audit trail and error flags are working as designed, the model&#8217;s reasoning about its own tool outputs is the layer that has to hold, and that layer is reasoning rather than infrastructure.</p><h3>Layer 4: the destructive action</h3><p>Believing the store was empty, the agent called <code>write</code> rather than <code>edit</code>, generating a fresh ~1,500-byte RGA-only file from scratch and writing it directly to <code>/crdts.md</code>. The original 7,285-byte file with all of the careful notes from Session 1 was overwritten in a single operation.</p><p>I didn&#8217;t even notice this had happened until I ran the inspector, because from the script&#8217;s perspective Session 3 looked like a normal run; the agent reported back that it had updated the notes and cited the RGA paper, kindly and unintentionally lying because the underlying belief was wrong.</p><h3>What the audit log showed</h3><p>Running <code>inspector log /crdts.md</code> after Session 3 surfaced two versions:</p><pre><code><code>version  memver_0169b&#8230;  modified  session_actor (Session 3)   1509 bytes
version  memver_01A7Z&#8230;  created   session_actor (Session 1)   7285 bytes
</code></code></pre><p>The size dropping from 7,285 bytes to 1,509 bytes is the catastrophe made visible, but the more important fact is that the original is still here, addressable by ID and retrievable in full content via the API, even though the head of the file is now the smaller broken version.</p><p>The diff between the two versions, generated by the inspector&#8217;s <code>diff</code> subcommand, made the loss concrete:</p><pre><code><code>--- memver_01A7Z&#8230; (/crdts.md, 7285B, created)
+++ memver_0169b&#8230; (/crdts.md, 1509B, modified)
@@ -1,122 +1,21 @@
-# CRDTs: Conflict-free Replicated Data Types
-*keywords: CRDT, conflict-free, replicated, ...*
-## What They Are
-CRDTs are data structures designed to be replicated across multiple nodes...
-(... 121 more deletion lines ...)
+# CRDT Research Notes
+## Sequences / Text CRDTs
+### RGA (Replicated Growable Array)
</code></code></pre><p>About 5,800 bytes of careful work disappeared in a single agent action that thought it was creating a brand-new file from scratch, including the state-based versus operation-based section, the G-Counter and OR-Set examples, the math foundation, and the entire sources block at the bottom.</p><h3>How I got it back</h3><p>This is the moment that, on a flat filesystem with no versioning, would have been the end of the story. Without the platform&#8217;s audit log, the original content would simply be gone; it wasn&#8217;t, because the audit log was holding the original verbatim.</p><p>I added a <code>restore</code> subcommand to the inspector that fetches a chosen historical version&#8217;s content and writes it back as the new head via <code>memory_stores.memories.update(memory_id, content=old_content)</code>. Anthropic&#8217;s API records that update as a new version rather than overwriting history, which means the recovery itself becomes part of the audit trail.</p><p>After running the restore, <code>inspector log /crdts.md</code> showed three versions, and the entire arc was right there in the output:</p><pre><code><code>memver_01EKK&#8230;  modified  api_actor (apikey_&#8230;)         7285 B   sha 3f3ec0d2&#8230;  &#8592; matches v1
memver_0169b&#8230;  modified  session_actor (Session 3)    1509 B   sha 7356ce60&#8230;  &#8592; catastrophe
memver_01A7Z&#8230;  created   session_actor (Session 1)    7285 B   sha 3f3ec0d2&#8230;  &#8592; original
</code></code></pre><p>A few details in that output are worth more than they look at first glance. The platform distinguishes operator-side mutations (recorded as <code>api_actor</code> with an <code>apikey_</code> ID) from agent-side ones (recorded as <code>session_actor</code> with a <code>sesn_</code> ID), which makes &#8220;who did this&#8221; forensics actually possible rather than something you&#8217;d have to retrofit yourself. The SHA-256 hash on the restored version matches the original exactly, so the recovery is byte-identical and verifiable rather than approximately right. And the catastrophe (v2) stays in the audit log forever, because recovery doesn&#8217;t erase the record; if you wanted v2&#8217;s content out of the log entirely, you&#8217;d use the <code>redact</code> endpoint, which clears the content while preserving all of the metadata.</p><p>The same story renders cleanly as a chart:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6cfU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6cfU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 848w, 
https://substackcdn.com/image/fetch/$s_!6cfU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6cfU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50388,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196778476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6cfU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 424w, 
https://substackcdn.com/image/fetch/$s_!6cfU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The cliff and the recovery are immediately legible: 7,285 bytes, plunge to 1,509, return to 7,285, all in three points and one chart that captures the full narrative.</p><p>This is the section of the post I&#8217;d stake my credibility on. Cross-session memory works, agents will sometimes get it wrong, and the platform&#8217;s audit trail is the thing that saves you when they do.</p><div><hr></div><h2>Important Considerations</h2><p>Building with Managed Agents memory turned up more rough edges than I expected, none of which are dealbreakers but all of which are worth knowing about before you commit to the platform.</p><ul><li><p><strong>Resource IDs need to be persisted yourself.</strong> Every call to <code>agents.create()</code>, <code>environments.create()</code>, or <code>memory_stores.create()</code> returns an opaque ID that your runtime code has to look up later, which is standard cloud-API ceremony but missing some of the friction-reducers other platforms have shipped: agent and environment names aren&#8217;t unique within an account, there&#8217;s no idempotent <code>create_or_update</code>, and there&#8217;s no Terraform provider yet, so you end up doing the capture-and-paste-into-<code>.env</code> dance manually.</p></li><li><p><strong>Memory store </strong><code>description</code><strong> must be single-line.</strong> The API rejects any control character, including newlines, with a cryptic regex error, which is inconsistent with agent system prompts that are explicitly multi-line up to 100K chars. 
It&#8217;s easy to fix once you know about it.</p></li><li><p><strong>Memory paths are store-relative rather than mount-relative.</strong> When the agent writes to <code>/mnt/memory/research-notes/crdts.md</code> inside the container, the API stores the file at <code>/crdts.md</code> and treats the mount-path prefix as a runtime detail, so when you list or retrieve memories host-side you reference the relative path rather than the full container path.</p></li><li><p><strong>Web search results are hidden from the event stream.</strong> When the agent runs <code>web_search</code>, the resulting <code>agent.tool_result.content</code> field is an empty array even when the search clearly succeeded (the agent uses the results downstream to give a correct answer). The model gets the actual search content internally, but the public event surface gets a sanitized empty array, which is almost certainly intentional for IP and copyright reasons but means you cannot log &#8220;what URLs the agent consulted&#8221; without asking the agent to cite them in its outputs.</p></li><li><p><strong>Agent-generated </strong><code>bash</code><strong> invocations aren&#8217;t always well-formed.</strong> The escaping bug that triggered Session 3&#8217;s catastrophe is one example, and defensive system-prompt phrasing helps but doesn&#8217;t eliminate the problem entirely.</p></li><li><p><code>memory_versions.retrieve(version_id, ...)</code><strong> takes the version ID positionally only.</strong> Calling it as <code>retrieve(version_id=...)</code> raises <code>TypeError</code>, even though <code>memories.retrieve(memory_id=..., ...)</code> accepts the keyword form, which is an inconsistency within the same SDK namespace.</p></li><li><p><strong>The streaming method lives at </strong><code>client.beta.sessions.events.stream(...)</code><strong>,</strong> not <code>client.beta.sessions.stream(...)</code> as some doc snippets imply. 
The latter form doesn&#8217;t exist and will fail at runtime.</p></li><li><p><strong>Print buffering kills real-time observability.</strong> When you run a Python session script in the background or through subprocess, Python buffers stdout, so the script appears to do nothing for minutes and then dumps everything when the agent finishes. The fix is either passing <code>flush=True</code> to print or running the script under <code>python -u</code>.</p></li><li><p><strong>Subscription auth doesn&#8217;t apply to Managed Agents.</strong> API key authentication with per-token billing is the only path, so a Claude Pro or Max subscription doesn&#8217;t help you here even though it works for Claude Code.</p></li></ul><div><hr></div><h2>So when does this make sense?</h2><p>Managed Agents is a deliberately persistent, server-managed harness, so the right question to ask isn&#8217;t &#8220;is it good?&#8221; but &#8220;is the persistent harness shape what my problem actually wants?&#8221;</p><table><thead><tr><th>Use case</th><th>Reach for&#8230;</th></tr></thead><tbody><tr><td>One-shot Claude call (classify, extract, summarize)</td><td>Messages API</td></tr><tr><td>Multi-turn conversation, your code holds the state</td><td>Messages API</td></tr><tr><td>Multi-step pipeline you orchestrate yourself</td><td>Messages API + tool use</td></tr><tr><td>Persistent agent reused across sessions/users with managed sandbox</td><td><strong>Managed Agents</strong></td></tr><tr><td>Long-running task with memory across sessions</td><td><strong>Managed Agents + memory store</strong></td></tr><tr><td>Anything requiring a non-Claude model</td><td>Roll your own</td></tr></tbody></table><p>A useful rule of thumb is that if your code calls <code>agents.create()</code> more than once for the &#8220;same&#8221; agent, you&#8217;re using the wrong tool. Agents are persistent, versioned configs that you create once and reference forever, so treating Managed Agents like a fancy Messages API and creating agents per request is fighting the platform&#8217;s whole design.</p><p>Now, what about cost? 
Across all three sessions plus a smoke-test, my total API spend came out to about $0.37, which includes a substantial 7KB notes write, a recall session that exercised the cache heavily, a destructive overwrite, and an operator-side restore.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bVio!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bVio!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 424w, https://substackcdn.com/image/fetch/$s_!bVio!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 848w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1272w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bVio!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png" width="1200" height="480" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53103,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196778476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bVio!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 424w, https://substackcdn.com/image/fetch/$s_!bVio!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 848w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1272w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Memory store doesn&#8217;t measurably move the cost needle, because the agent loop and the model itself are where the spend lives. Sonnet 4.6 with aggressive caching is genuinely affordable for any individual or small team use case, and the platform handles caching for you without any configuration.</p><div><hr></div><h2>What I didn&#8217;t get to (yet)</h2><p>A few features deserve more than a passing mention but didn&#8217;t fit the failure-recovery spine of this post:</p><ul><li><p><strong>Multi-store sessions and the multi-tenant pattern.</strong> A session can mount up to eight memory stores at once, and the natural pattern for a SaaS-shaped application is one shared read-only &#8220;house knowledge&#8221; store plus one read-write per-user store, with the agent definition the same for everyone. 
Access modes are enforced at the FUSE filesystem level, so <code>read_only</code> is real OS-level enforcement rather than a polite request from the model. This is big enough that I&#8217;m planning to cover it in its own follow-up post.</p></li><li><p><strong>Optimistic concurrency via preconditions.</strong> The <code>update</code> endpoint accepts a <code>precondition: {type: "content_sha256", ...}</code> field, and if the file&#8217;s current SHA doesn&#8217;t match the one you supplied, the API returns a 409 conflict. This is exactly the safety net Session 3&#8217;s agent didn&#8217;t use and the kind of thing that should probably be standard practice for any read-modify-write flow.</p></li><li><p><strong>Redaction.</strong> The <code>memory_versions.redact(version_id)</code> endpoint clears a historical version&#8217;s content while preserving all of the metadata around it, which is useful when a bad version contained PII or leaked secrets and you want them out of the audit log without losing the record that something existed there.</p></li><li><p><strong>MCP server integration.</strong> An agent can declare MCP servers (GitHub, Linear, Notion, and others), the session attaches a vault containing the credentials, and authentication is auto-refreshed by the platform. Pairing memory store with MCP, like a research agent that pulls from your Notion and writes findings to persistent memory, is one of the strongest use cases I can imagine for the platform overall.</p></li></ul><div><hr></div><h2>So... should you use this?</h2><p>If you&#8217;re sitting on the fence about whether to use Managed Agents memory, the answer is yes, with eyes open. The platform is real, the harness around the model is genuinely valuable, and the audit trail is the kind of feature you don&#8217;t appreciate until you need it, which in my case happened on the third session of the third day of building.</p><p>A few practical takeaways for anyone planning to build on this. 
Use preconditions whenever you can, especially for any flow that does a read-modify-write on the same memory file, because they&#8217;re the safety net that Session 3&#8217;s agent didn&#8217;t have. Build a small amount of host-side observability tooling, because even a 200-line inspector script is enough to catch problems your agent won&#8217;t tell you about. And know which side of the decision rubric your use case falls on before you commit, because Managed Agents is a great tool for the right shape of problem and the wrong tool for one-shot calls or anything that doesn&#8217;t benefit from persistence.</p><p>What do you think? Have you tried building with this yet? I&#8217;d love to hear what your experience has been.</p><div><hr></div><p><em>Full code from the demo (agent YAMLs, runtime scripts, inspector CLI, monitoring charts) is at <a href="https://github.com/taylor-ortiz/claude-memory-managed-agents/blob/main/README.md">https://github.com/taylor-ortiz/claude-memory-managed-agents/blob/main/README.md</a>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 70]]></title><description><![CDATA[&#8220;You can&#8217;t just steal a charity.&#8221; Elon Musk spent three days on the stand trying to prove it.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c48</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c48</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 03 May 2026 13:21:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f42054a7-25a7-4ba3-9ae0-e3bdf5e129bf_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VSBj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VSBj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 424w, 
https://substackcdn.com/image/fetch/$s_!VSBj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 848w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1272w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VSBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png" width="1456" height="2281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2281,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:892417,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196304743?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!VSBj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 424w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 848w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1272w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h2>&#8220;You can&#8217;t just steal a charity.&#8221; Elon Musk spent three days on the stand trying to prove it.</h2><p>The Musk v. OpenAI trial opened in Oakland federal court. </p><ul><li><p><strong>The context:</strong> Musk contributed <a href="https://www.npr.org/2026/04/28/nx-s1-5801438/musk-altman-openai-trial-opening-statements">$38 million</a> to found OpenAI as a nonprofit and alleges Altman and Brockman looted it by converting to a for-profit. He&#8217;s seeking $150 billion in damages and their removal from leadership. If he wins, it could block OpenAI&#8217;s planned IPO at a ~$1 trillion valuation.</p></li><li><p><strong>The distillation admission:</strong> Under cross-examination, Musk admitted xAI <a href="https://techcrunch.com/2026/04/30/elon-musk-testifies-that-xai-trained-grok-on-openai-models/">&#8220;partly&#8221;</a> used OpenAI&#8217;s models to train Grok, drawing audible gasps in the courtroom. 
He called it &#8220;standard practice.&#8221;</p></li><li><p><strong>The industry reacted:</strong> <a href="https://x.com/ylecun/status/2050039348679024779">LeCun retweeted Cl&#233;ment Delangue</a> calling restrictions on distillation &#8220;pulling the ladder.&#8221; <a href="https://x.com/natolambert/status/2049974505343488171">Lambert noted</a> American companies distill Chinese open models just as freely, and <a href="https://x.com/natolambert/status/2049996372938793194">questioned why OpenAI doesn&#8217;t just revoke contracts</a> from violators like they did with ByteDance.</p></li><li><p><strong>OpenAI&#8217;s counter-narrative:</strong> Attorney Savitt <a href="https://www.cnn.com/2026/04/29/business/takeaways-elon-musk-sam-altman-openai-trial">argued</a> Musk wanted majority control, pitched Tesla acquiring OpenAI, and only sued after founding xAI. Emails showed him <a href="https://gizmodo.com/everything-you-missed-from-elon-musks-testimony-in-the-openai-trial-2000753364">poaching OpenAI researchers</a> while still on the board.</p></li><li><p><strong>The cross-examination was rough:</strong> Musk told the jury <a href="https://www.theverge.com/tech/921022/elon-musk-cross-openai-altman">&#8220;I don&#8217;t lose my temper&#8221;</a> then raised his voice minutes later. The Verge&#8217;s summary: <a href="https://www.theverge.com/ai-artificial-intelligence/920191/elon-musk-sam-altman-trial-day-one">&#8220;more petty than prepared.&#8221;</a> Texts revealed <a href="https://gizmodo.com/everything-you-missed-from-elon-musks-testimony-in-the-openai-trial-2000753364">Shivon Zilis asked Musk</a> whether to &#8220;stay close and friendly to OpenAI to keep info flowing&#8221; after his departure.</p></li><li><p><strong>What&#8217;s next:</strong> The judge expressed skepticism about both sides&#8217; safety claims. 
Altman and Brockman testify in the coming weeks.</p></li></ul><div><hr></div><h2><strong>$900 billion valuation, 50% less sycophancy, and connectors for every creative tool you use.</strong></h2><p>Anthropic had one of those weeks where the breadth of activity tells the story.</p><ul><li><p><strong>The valuation:</strong> Reportedly <a href="https://techcrunch.com/2026/04/29/sources-anthropic-could-raise-a-new-50b-round-at-a-valuation-of-900b/">raising $50 billion at a $900 billion valuation</a>, a number that rivals established tech giants.</p></li><li><p><strong>The sycophancy research:</strong> <a href="https://www.anthropic.com/research/claude-personal-guidance">Analyzed 1 million Claude conversations</a>, found a 9% sycophancy rate (25% in relationship discussions), built synthetic training scenarios from real failure cases, and cut sycophancy roughly 50% in Opus 4.7 and Mythos Preview. One of the most transparent published alignment efforts to date.</p></li><li><p><strong>BioMysteryBench:</strong> Claude <a href="https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench">solved roughly 30% of 23 bioinformatics problems</a> that stumped a human expert panel.</p></li><li><p><strong>Claude for Creative Work:</strong> Shipped <a href="https://www.anthropic.com/news/claude-for-creative-work">connectors for Adobe Creative Cloud, Blender, Ableton, Canva, Affinity, SketchUp, Splice, and Resolume</a>, and joined the Blender Development Fund as a patron.</p></li><li><p><strong>Claude Security:</strong> Launched <a href="https://www.anthropic.com/news/claude-code-security">codebase vulnerability scanning</a> in public beta for Enterprise customers.</p></li><li><p><strong>Meanwhile, at the Senate:</strong> Defense Secretary Hegseth <a href="https://www.msn.com/en-us/news/technology/hegseth-calls-anthropic-ceo-a-lunatic-defends-pentagon-ai-use/ar-AA227pKG">called CEO Dario Amodei an &#8220;ideological lunatic&#8221;</a> at an Armed 
Services Committee hearing.</p></li></ul><div><hr></div><h2><strong>OpenAI ended its Microsoft exclusivity and went multi-cloud.</strong></h2><p>OpenAI restructured its Microsoft deal, launched on AWS, and shipped a wave of Codex upgrades all in the same week.</p><ul><li><p><strong>The exclusivity is over:</strong> Microsoft <a href="https://openai.com/index/next-phase-of-microsoft-partnership/">ended its exclusive license</a> to OpenAI&#8217;s technology. OpenAI can now sell on AWS and Google Cloud through 2032.</p></li><li><p><strong>AWS moved immediately:</strong> Amazon <a href="https://openai.com/index/openai-on-aws/">began offering OpenAI models, Codex, and Managed Agents</a> on AWS. Day-zero availability.</p></li><li><p><strong>The AGI clause is dead:</strong> Simon Willison <a href="https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/">tracked the history</a> of the clause that would have let OpenAI walk away from Microsoft once AGI was declared. It&#8217;s gone. OpenAI traded its theoretical nuclear option for commercial freedom now.</p></li><li><p><strong>The product push:</strong> Altman said Codex is <a href="https://x.com/sama/status/2049493609028923826">&#8220;having a ChatGPT moment&#8221;</a>. Brockman said the <a href="https://x.com/sama/status/2049493182866747765">Codex app replaced his terminal</a> as his primary computer interface. OpenAI is treating Codex as a flagship product launch, not a side feature.</p></li><li><p><strong>Nadella&#8217;s take:</strong> Microsoft gets royalty-free access to OpenAI&#8217;s frontier models through 2032, no longer pays OpenAI for them, and OpenAI is committed to buying <a href="https://techcrunch.com/2026/04/29/satya-nadella-says-hes-ready-to-exploit-the-new-openai-deal/">$250 billion in Azure</a>. Nadella told analysts he &#8220;fully plan[s] to exploit it.&#8221;</p></li></ul><div><hr></div><h2><strong>Most cloud providers beat earnings. 
OpenAI missed.</strong></h2><p>The hyperscalers are spending record amounts on AI infrastructure and seeing record returns. Meanwhile, the Wall Street Journal <a href="https://sherwood.news/markets/openai-linked-stocks-suffer-after-wsj-reports-that-the-company-has-missed-key-revenue-and-user-targets/">reported</a> that OpenAI missed revenue and user growth targets, with Anthropic and Gemini cited as gaining ground.</p><ul><li><p><strong>The cloud numbers:</strong> <a href="https://techcrunch.com/2026/04/29/google-cloud-surpasses-20b-but-says-growth-was-capacity-constrained/">Google Cloud surpassed $20 billion</a> but said growth was capacity-constrained. <a href="https://techcrunch.com/2026/04/29/amazons-cloud-business-is-surging-and-so-is-its-capital-spending/">AWS surged on AI demand</a>. Microsoft disclosed a <a href="https://www.geekwire.com/2026/microsoft-tops-wall-street-expectations-reports-accelerating-azure-growth-and-37b-ai-run-rate/">$37 billion AI revenue run rate</a> (up 123% YoY), <a href="https://techcrunch.com/2026/04/29/microsoft-says-it-has-over-20m-paid-copilot-users-and-they-really-are-using-it/">20 million paid Copilot users</a>, and set calendar-year CapEx at $190 billion.</p></li><li><p><strong>The supply chain is feeling it:</strong> <a href="https://www.sammobile.com/2026/04/30/samsung-q1-2026-profit-hits-record-high-ai-chip-boom/">Samsung chip profits jumped nearly 50-fold</a> on AI memory demand. 
Their executive: &#8220;our supply falls far short of customer demand.&#8221; The shortage is expected to <a href="https://www.sammobile.com/2026/04/30/samsung-q1-2026-profit-hits-record-high-ai-chip-boom/">widen further in 2027</a>.</p></li><li><p><strong>Meta is the most interesting story:</strong> Raised its CapEx forecast, then <a href="https://www.reuters.com/business/world-at-work/meta-ceo-attributes-layoffs-plan-capex-wont-rule-out-further-job-cuts-2026-04-30/">Zuckerberg blamed layoffs on capital spending</a> and wouldn&#8217;t rule out more cuts, then raised <a href="https://www.reuters.com/business/meta-looks-raise-up-25-billion-with-bond-sale-bloomberg-news-reports-2026-04-30/">$25 billion in bonds</a> to fund the AI buildout. Cutting people to buy GPUs, then borrowing to buy more.</p></li><li><p><strong>The counterpoint nobody expected:</strong> <a href="https://www.theverge.com/tech/920815/google-alphabet-q1-2026-earnings-sundar-pichai">Google Search queries hit an all-time high</a>. <a href="https://techcrunch.com/2026/04/30/apple-was-surprised-by-ai-driven-demand-for-macs/">Apple was surprised by AI-driven Mac demand</a>. The &#8220;AI kills search&#8221; and &#8220;AI doesn&#8217;t need hardware&#8221; narratives both took a hit.</p></li><li><p><strong>But the utilization story:</strong> Cast AI <a href="https://venturebeat.com/infrastructure/fomo-is-why-enterprises-pay-for-gpus-they-dont-use-and-why-prices-keep-climbing/">measured tens of thousands of production Kubernetes clusters</a> and found GPU utilization averaging 5%. 
Teams lock in multi-year commitments the moment allocation comes through, then won&#8217;t release idle capacity because reacquiring takes months.</p></li></ul><div><hr></div><h2><strong>&#11088; Featured: Symphony turns your issue tracker into an autonomous coding fleet</strong></h2><p>OpenAI released <a href="https://openai.com/index/open-source-codex-orchestration-symphony/">Symphony</a>, an open-source spec that turns Linear boards into control planes for Codex agents. Every open task gets an agent. Agents run continuously. Humans review the results.</p><p>The origin story matters: an OpenAI team decided to build their entire repo with zero human-written code. They documented how in a <a href="https://openai.com/index/harness-engineering/">harness engineering post</a>: a million lines of code, 1,500 merged PRs, 3.5 PRs per engineer per day, with Codex running six-hour autonomous sessions while engineers slept and reviewing its own code agent-to-agent. But they hit a new ceiling: human attention. Engineers could manage three to five Codex sessions before context switching killed productivity. They had &#8220;built a team of extremely capable junior engineers, then assigned our human engineers to micromanaging them.&#8221;</p><p>So they flipped the model. Instead of engineers managing coding sessions, they made the issue tracker the orchestrator. Each open Linear issue maps to a dedicated agent workspace. 
Symphony continuously polls the board, picks up new work, restarts agents that crash or stall, watches CI, rebases when needed, resolves conflicts, and shepherds changes through the pipeline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hhTj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hhTj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 424w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 848w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1272w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hhTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png" width="1456" height="619" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:161357,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196304743?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hhTj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 424w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 848w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1272w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Once work is abstracted to the ticket level, agents can break large tasks into dependency trees, only starting work on tasks that aren&#8217;t blocked. They also create their own follow-up tickets when they spot issues outside the current scope. One engineer on the team made three significant changes from the Linear app on his phone from a cabin on bad wifi.</p><p>The results: a 500% increase in landed PRs on some teams in three weeks. But the deeper shift is behavioral. When the perceived cost of each code change drops to near zero, teams start filing speculative tasks. Try an idea, explore a refactor, test a hypothesis, keep only what works. 
Product managers and designers can file feature requests directly into Symphony and get back a review packet with a video walkthrough of the feature running in the real product.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pw11!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pw11!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 424w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 848w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1272w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pw11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png" width="1456" height="790" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:333327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196304743?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pw11!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 424w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 848w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1272w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The technical choices are worth noting. The reference implementation is in Elixir, chosen for its concurrency primitives. With v1.1.0, Symphony supports the Kata CLI as an alternative runtime, meaning you can run Claude Code, Gemini, or other models inside the same orchestration framework. Symphony is technically just a <code>SPEC.md</code> file: a definition of the problem and the intended solution, not a product. OpenAI gave agents objectives instead of strict state transitions, &#8220;much like a good manager would assign a goal to a direct report.&#8221;</p><p><strong>What to watch for:</strong> Symphony is one of several orchestration plays that landed this same week. <a href="https://x.com/cursor_ai/status/2049499866217185492">Cursor released an SDK</a> letting companies like Rippling and Notion embed background agents. 
<a href="https://venturebeat.com/orchestration/ibm-launches-bob-with-multi-model-routing-and-human-checkpoints-to-turn-ai-coding-into-a-secure-production-system/">IBM launched Bob</a> with human-checkpoint governance. <a href="https://venturebeat.com/technology/mistral-ai-launches-workflows-a-temporal-powered-orchestration-engine-already-running-millions-of-daily-executions/">Mistral shipped Workflows</a> running millions of daily executions. <a href="https://blog.n8n.io/n8n-mcp-server/">n8n shipped an MCP server</a> so Claude can build automation workflows through conversation. The competitive moat is shifting from &#8220;best coding model&#8221; to &#8220;best orchestration spec.&#8221; If you maintain a team that ships code, start here.</p><div><hr></div><h2><strong>Worth a Listen</strong></h2><div id="youtube2-9-TVwv6wtGQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;9-TVwv6wtGQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/9-TVwv6wtGQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>OpenAI researchers Sebastian Bubeck and Ernest Ryu on the OpenAI podcast.</p><ul><li><p><strong>The 42-year-old problem:</strong> A researcher spent 40+ hours failing without AI. With ChatGPT, they solved it in 12 hours across three evenings.</p></li><li><p><strong>The Erd&#337;s problems:</strong> 10+ completely new, publishable solutions to decades-old open problems. Fully original proofs, not literature searches.</p></li><li><p><strong>AGI time:</strong> Bubeck&#8217;s framework. Four years ago, models could think for seconds. Now they can think for days. 
The goal is weeks, then months.</p></li><li><p><strong>The warning:</strong> Non-mathematicians are producing pages of AI-generated proofs that turn out wrong. The models accelerate experts; they don&#8217;t replace them.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://venturebeat.com/technology/why-openais-goblin-problem-matters-and-how-you-can-release-the-goblins-on-your-own">GPT-5.1&#8217;s goblin problem</a></strong> | VentureBeat &#8212; A &#8220;Nerdy personality&#8221; training signal accidentally over-rewarded goblin-adjacent language. OpenAI diagnosed it with Codex, fixed it, then threw a party. The Codex system prompt literally says <a href="https://simonwillison.net/2026/Apr/28/openai-codex/">&#8220;never discuss goblins, gremlins, raccoons, trolls, ogres, pigeons, or similar creatures.&#8221;</a></p></li><li><p><strong><a href="https://www.digitaltrends.com/movies/academy-just-said-it-out-loud-ai-cant-win-an-oscar-for-acting-and-writing/">The Academy ruled AI can&#8217;t win an Oscar</a></strong> | Digital Trends &#8212; Performances must be &#8220;demonstrably performed by humans with their consent.&#8221; Finally, a benchmark AI can&#8217;t game.</p></li><li><p><strong><a href="https://x.ai/news/grok-custom-voices">xAI launched Custom Voices</a></strong> | xAI &#8212; Clone your voice from 2 minutes of audio, 80+ preinstalled voices, 28 languages, speaker verification built in. Dropped alongside Grok 4.3 at aggressive pricing.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/30/stripe-link-digital-wallet-ai-agents-shopping/">Stripe Link now supports AI agents</a></strong> | TechCrunch &#8212; A digital wallet that autonomous agents can use for payments. 
AI just got its own financial infrastructure.</p></li><li><p><strong><a href="https://www.reuters.com/legal/litigation/taylor-swift-files-trademark-her-voice-likeness-ward-off-ai-deepfakes-2026-04-27/">Taylor Swift trademarked her voice against AI</a></strong> | Reuters &#8212; Filed new trademarks for her voice and likeness. The legal playbook for protecting creative identity from AI is being written in real time.</p></li><li><p><strong><a href="https://simonwillison.net/2026/Apr/30/zig-anti-ai/">Zig bans all LLM contributions</a></strong> | Simon Willison &#8212; Bun (acquired by Anthropic) achieved a 4x Zig compilation improvement it cannot upstream because of the ban. When your open-source policy blocks a 4x speedup, that&#8217;s a policy worth debating.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/">OpenAI restricted its Cyber model</a></strong> | TechCrunch &#8212; After publicly criticizing Anthropic for limiting Mythos access. The UK AISI <a href="https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/">evaluated GPT-5.5&#8217;s cyber capabilities</a> and found it comparable to Mythos. Turns out responsible disclosure looks the same from every lab.</p></li><li><p><strong><a href="https://venturebeat.com/orchestration/alibabas-metis-agent-cuts-redundant-ai-tool-calls-from-98-to-2-and-gets-more-accurate-doing-it">Alibaba&#8217;s Metis cut redundant agent tool calls from 98% to 2%</a></strong> | VentureBeat &#8212; And got more accurate doing it. If your agents are burning tokens on redundant calls, this research is worth reading.</p></li><li><p><strong><a href="https://simonwillison.net/2026/Apr/28/pip-261/">pip 26.1 shipped lockfiles</a></strong> | Simon Willison &#8212; <code>pip lock</code> generating <code>pylock.toml</code> files and dependency cooldowns via <code>--uploaded-prior-to</code>. 
Python supply chain security just got a real tool.</p></li><li><p><strong><a href="https://deepmind.google/blog/ai-co-clinician/">DeepMind&#8217;s AI co-clinician matched physicians</a></strong> | Google DeepMind &#8212; Zero critical errors in 97 of 98 primary care queries. Uses a dual-agent architecture where a Planner monitors a Talker for safety. This is what AI safety in production actually looks like in healthcare.</p></li><li><p><strong><a href="https://www.reuters.com/business/healthcare-pharmaceuticals/jj-sees-ai-halving-time-generate-drug-development-leads-2026-04-27/">J&amp;J sees AI halving the time to generate drug development leads</a></strong> | Reuters &#8212; Real ROI from a real pharma company. Not a demo, not a benchmark. Lead generation in a production drug pipeline, expected to run twice as fast.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/29/softbank-is-creating-a-robotics-company-that-builds-data-centers-and-already-eyeing-a-100b-ipo/">SoftBank is building a robotics company and eyeing a $100B IPO</a></strong> | TechCrunch &#8212; A robotics company that builds data centers. IPO target: $100 billion. Masayoshi Son is not being subtle about what he thinks comes next.</p></li></ul>]]></content:encoded></item>
<item><title><![CDATA[Another Weekly AI Newsletter: Issue 69]]></title><description><![CDATA[This week's themes from 553 articles across 47 sources. GPT-5.5's bio risk rating. Mythos breached. SpaceX bids for Cursor. DeepSeek at one-sixth the price. Claude bought ping-pong balls.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7f1</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7f1</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 26 Apr 2026 22:58:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/472c5e17-8072-4f2b-861f-7bd6bc6f1b57_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XlhQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XlhQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 424w, 
https://substackcdn.com/image/fetch/$s_!XlhQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 848w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1272w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png" width="1456" height="1554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195568711?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!XlhQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 424w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 848w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1272w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2>GPT-5.5, Images 2.0, Workspace Agents, a Florida AG Probe, and a Fake News Scandal.</h2><p>The launch parade started Monday and didn&#8217;t stop: <a href="https://openai.com/index/introducing-chatgpt-images-2-0/">ChatGPT Images 2.0</a> with thinking-first generation, <a href="https://openai.com/index/introducing-workspace-agents-in-chatgpt/">Workspace Agents for enterprise</a> replacing custom GPTs, <a href="https://x.com/OpenAI/status/2047376568809636017">GPT-5.5 across ChatGPT and Codex</a> with SOTA on SWE-bench and Terminal-Bench 2.0, and <a href="https://x.com/sama/status/2046604989527912590">Codex crossing 4 million active users</a>. By Friday, Sam Altman posted <a href="https://x.com/sama/status/2047823357635354814">&#8220;this was a good week.&#8221;</a></p><ul><li><p><strong>The model:</strong> <a href="https://x.com/sama/status/2047379036419014928">GPT-5.5 launched at $5 per million input tokens and $30 per million output tokens</a> with a 1M context window, matching GPT-5.4 per-token latency while using fewer tokens per task. 
The <a href="https://openai.com/index/gpt-5-5-system-card/">System Card rated it &#8220;High&#8221; risk on both biosecurity and cybersecurity</a>, and OpenAI launched a <a href="https://openai.com/index/gpt-5-5-bio-bug-bounty/">$25,000 Bio Bug Bounty</a> targeting its own bio safety guardrails.</p></li><li><p><strong>The inference bet:</strong> Altman praised the team that optimized GPT-5.5&#8217;s serving efficiency, then said OpenAI <a href="https://x.com/sama/status/2047386068194852963">&#8220;has to become an AI inference company now.&#8221;</a> The competitive edge is shifting from who builds the best model to who serves it cheapest and fastest.</p></li><li><p><strong>The image model:</strong> <a href="https://x.com/OpenAI/status/2046670989719924768">Images 2.0 runs a reasoning step before generating</a>, self-checks outputs, handles multilingual text, and supports aspect ratios from 3:1 banners to 1:3 posters. Altman said it <a href="https://x.com/sama/status/2047349336263012771">&#8220;got over some important qualitative threshold&#8221;</a> for him personally.</p></li><li><p><strong>The criminal investigation:</strong> <a href="https://www.npr.org/2026/04/21/nx-s1-5793967/florida-openai-investigation-mass-shooting-fsu">Florida&#8217;s AG opened a criminal investigation into OpenAI</a> following the FSU shooting. Altman <a href="https://www.reuters.com/sustainability/society-equity/openai-chief-apologizes-not-reporting-shooting-suspect-police-2026-04-25/">publicly apologized for not reporting the suspect&#8217;s ChatGPT conversations to police</a>. 
The same week, <a href="https://startupfortune.com/openais-super-pac-allegedly-funded-a-fake-news-site-staffed-by-ai-reporters/">OpenAI&#8217;s super PAC was found to be funding a fake news site staffed by AI-generated bot reporters</a> targeting AI safety researchers and critics of the company.</p></li></ul><div><hr></div><h2>$65 Billion Investment, a Mythos Breach, and 271 Firefox Bugs.</h2><p>The capital story is genuinely staggering. <a href="https://www.cnbc.com/2026/04/24/google-to-invest-up-to-40-billion-in-anthropic-as-search-giant-spreads-its-ai-bets.html">Google announced up to $40 billion</a> in cash and compute. <a href="https://x.com/AnthropicAI/status/2046327625367625773">Amazon put in $5 billion immediately</a>, with up to $20 billion more committed, in exchange for <a href="https://techcrunch.com/2026/04/20/anthropic-takes-5b-from-amazon-and-pledges-100b-in-cloud-spending-in-return/">Anthropic pledging $100 billion back to AWS</a> and <a href="https://x.com/AnthropicAI/status/2046327624092487688">locking in up to 5 gigawatts of compute</a>. Two of the world&#8217;s largest cloud providers both betting maximally on the same lab in the same week: there&#8217;s no precedent for this.</p><ul><li><p><strong>The breach:</strong> <a href="https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/">An unauthorized group gained access to Anthropic&#8217;s Mythos cybersecurity tool</a>, the exclusive program for national security applications. The <a href="https://gbhackers.com/nsa-confirms-use-of-anthropics-mythos-blacklist/">NSA was confirmed as one of roughly 40 organizations with access</a>, despite the Pentagon classifying Anthropic as a supply-chain risk. 
<a href="https://www.reuters.com/legal/government/regulators-monitor-anthropics-mythos-banking-risks-2026-04-20/">Financial regulators also began monitoring Mythos</a> over potential banking system risks, and <a href="https://www.reuters.com/sustainability/boards-policy-regulation/japan-launches-financial-task-force-amid-ai-security-fears-2026-04-24/">Japan&#8217;s FSA launched a cybersecurity task force in direct response</a>.</p></li><li><p><strong>The capability:</strong> The same week Mythos was breached, <a href="https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/">Mozilla confirmed it used Mythos to find 271 Firefox vulnerabilities</a>. A model powerful enough to discover zero-day vulnerabilities at scale is also a high-value target.</p></li><li><p><strong>The product shipping:</strong> Anthropic shipped <a href="https://claude.com/blog/connectors-for-everyday-life">200+ personal app connectors</a> including Spotify, TurboTax, and Instacart, <a href="https://claude.com/blog/claude-managed-agents-memory">persistent memory for Managed Agents</a>, <a href="https://x.com/claudeai/status/2046328619249684989">live artifacts in Cowork</a>, and published <a href="https://www.anthropic.com/engineering/april-23-postmortem">a postmortem attributing two months of Claude Code quality complaints to three harness bugs</a>.</p></li><li><p><strong>The experiment:</strong> <a href="https://www.anthropic.com/features/project-deal">Project Deal</a> put Claude agents in a live marketplace with 69 Anthropic employees, completing 186 deals totaling over $4,000. Key finding: Opus agents got substantially better deals than Haiku agents, but participants couldn&#8217;t tell the difference. 
One agent bought 19 ping-pong balls for itself when given permission to spend on its own behalf.</p></li><li><p><strong>The economics research:</strong> <a href="https://www.anthropic.com/research/81k-economics">81,000 Claude user responses</a> yielded the finding that software engineers with high Claude usage reported greater displacement worry than any other occupation. <a href="https://x.com/AnthropicAI/status/2047006550859125228">Workers seeing the biggest productivity gains were also the most worried about being replaced</a>.</p></li></ul><p>Sam Altman <a href="https://techcrunch.com/2026/04/21/sam-altman-throws-shade-at-anthropics-cyber-model-mythos-fear-based-marketing/">called Mythos &#8220;fear-based marketing&#8221;</a> the day the breach was reported. That&#8217;s a clean summary of the competitive dynamic, if nothing else.</p><div><hr></div><h2>Cursor Went From IDE to $60B Acquisition Target Without Stopping to Ship.</h2><p>The week started with Cursor launching <a href="https://x.com/cursor_ai/status/2046324143151513717">the Cursor CLI</a> and five command-line improvements including <a href="https://x.com/cursor_ai/status/2046324138172989687">/btw for side questions mid-agent-run</a> and <a href="https://x.com/cursor_ai/status/2046324136377721128">/debug for hard-to-reproduce bugs</a>. 
Then came <a href="https://x.com/cursor_ai/status/2047764651363180839">Cursor 3.2 with /multitask for async parallel subagents</a>, <a href="https://x.com/cursor_ai/status/2047764652977958938">Worktrees for isolated branch tasks</a>, <a href="https://x.com/cursor_ai/status/2047764654760632725">Multi-root Workspaces for cross-repo agent sessions</a>, and a <a href="https://x.com/cursor_ai/status/2047000517751288303">Slack integration that generates PRs via @mention</a>.</p><ul><li><p><strong>The acquisition drama:</strong> <a href="https://techcrunch.com/2026/04/22/how-spacex-preempted-a-2b-fundraise-with-a-60b-buyout-offer/">SpaceX preempted Cursor&#8217;s planned $2B fundraise with a $60B buyout offer</a>, including a $10B alternative arrangement. <a href="https://www.cnbc.com/2026/04/22/microsoft-looked-at-buying-cursor-before-spacex-deal-sources-say.html">Microsoft had been evaluating Cursor before SpaceX moved</a>. Two of the largest AI infrastructure companies on earth each decided the agentic IDE is a strategic asset.</p></li><li><p><strong>The compute tie-in:</strong> <a href="https://www.cursor.com/blog/spacex-model-training">SpaceX and Cursor announced a partnership on model training via the Colossus supercomputer</a>. The acquisition option doubles as infrastructure integration: owning the compute, the training pipeline, and the developer workflow in one stack.</p></li><li><p><strong>The benchmark:</strong> <a href="https://x.com/cursor_ai/status/2047744579127185843">GPT-5.5 launched as Cursor&#8217;s top model on CursorBench at 72.8%</a>, offered at 50% off through May 2 via a partnership with OpenAI. CursorBench is now where model quality gets measured for coding practitioners.</p></li></ul><div><hr></div><h2>DeepSeek V4 Is Another Efficiency Shock, and Washington Noticed.</h2><p><a href="https://api-docs.deepseek.com/news/news260424">DeepSeek released V4</a> one year after its original model disrupted the US AI industry. 
It comes in two variants: V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active). Both ship with a 1M-token context window by default and use a novel attention architecture (token-wise compression plus DeepSeek Sparse Attention) that cuts per-token FLOPs by 73-90% and shrinks the KV cache to 2% of standard GQA. <a href="https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5">V4-Flash at $0.14/M input tokens</a> is the cheapest frontier-class model available. The API accepts both OpenAI and Anthropic request formats, so it works as a drop-in replacement for either.</p><ul><li><p><strong>The agent play:</strong> DeepSeek built V4 with <a href="https://api-docs.deepseek.com/news/news260424">dedicated optimizations for agent capabilities</a>, naming Claude Code, OpenClaw, and OpenCode as launch integrations. DeepSeek is also using it internally for its own agentic coding. <a href="https://www.newsbytesapp.com/news/science/openclaw-adopts-deepseek-s-latest-v4-flash-model-as-default/story">OpenClaw added V4-Flash</a> within 48 hours of launch.</p></li><li><p><strong>The hardware angle:</strong> <a href="https://www.reuters.com/world/china/deepseek-v4-chinese-ai-model-adapted-huawei-chips-2026-04-24/">V4 was built specifically to run on Huawei Ascend chips</a>, with <a href="https://www.reuters.com/business/media-telecom/huawei-ascend-supernode-support-deepseek-v4-2026-04-24/">Huawei&#8217;s supernode infrastructure as the compute backbone</a>. 
This is a complete AI stack running outside US chip supply chains.</p></li><li><p><strong>The geopolitics:</strong> The <a href="https://www.reuters.com/world/china/us-state-dept-orders-global-warning-about-alleged-china-ai-thefts-by-deepseek-2026-04-24/">State Department ordered embassies worldwide to warn foreign governments about alleged DeepSeek IP theft</a> the same week as the launch.</p></li><li><p><strong>The benchmark:</strong> <a href="https://huggingface.co/blog/deepseekv4">V4-Pro-Max scores 80.6 on SWE Verified</a>, matching Opus 4.6-Max on agentic coding. On world-knowledge benchmarks, <a href="https://www.reuters.com/technology/chinas-deepseek-returns-with-new-model-year-after-viral-rise-2026-04-24/">it trails only Google&#8217;s closed-source Gemini-Pro-3.1</a>.</p></li><li><p><strong>The valuation:</strong> <a href="https://www.pymnts.com/news/investment-tracker/2026/deepseek-seeks-20-billion-valuation-as-tech-giants-weigh-investment/">DeepSeek is reportedly seeking funding at a $20 billion+ valuation</a>.</p></li></ul><div><hr></div><h2>Highlights From Google Cloud Next.</h2><p>Google did not announce products at Cloud Next. It announced a theory of the market: own the silicon, train the models, host the agents, certify the consulting firms.</p><ul><li><p><strong>The chips:</strong> <a href="https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/tpus-8t-8i-cloud-next/">TPU 8t for training and TPU 8i for inference</a> split Google&#8217;s compute into workload-optimized hardware, offering 3x faster training and 80% better performance per dollar, with clusters scaling past one million chips. </p></li><li><p><strong>The training infrastructure:</strong> <a href="https://deepmind.google/blog/decoupled-diloco/">Decoupled DiLoCo trains across geographically distributed data centers</a>, mixes hardware generations, and <a href="https://x.com/GoogleDeepMind/status/2047330989936894350">self-heals when chips fail mid-run</a>. 
They <a href="https://x.com/GoogleDeepMind/status/2047330989936894350">tested this by deliberately breaking chips during a live training run</a>. Fault-tolerant distributed training is not a research result: it&#8217;s a production requirement once clusters cross 100K chips.</p></li><li><p><strong>The platform:</strong> <a href="https://x.com/GoogleDeepMind/status/2046983340524269713">Gemini Enterprise Agent Platform</a> is Vertex AI rebranded and expanded, with <a href="https://x.com/GoogleDeepMind/status/2046983343481270459">200+ models in Model Garden</a> including Anthropic&#8217;s Claude Opus 4.7. Google is selling model choice, not model loyalty.</p></li><li><p><strong>The spend:</strong> <a href="https://www.googlecloudpresscorner.com/2026-04-22-Google-Cloud-Commits-750-Million-to-Accelerate-Partners-Agentic-AI-Development">$750M committed to accelerate partner agentic AI development</a>, plus big consulting partnerships with Accenture, BCG, McKinsey, Deloitte, and Bain. <a href="https://www.techradar.com/ai-platforms-assistants/we-must-urgently-bridge-the-gap-googles-sergey-brin-says-gemini-is-behind-claude-in-one-important-ai-field-according-to-leaked-memo">Sergey Brin&#8217;s internal memo to DeepMind</a> acknowledging Anthropic&#8217;s lead in coding and ordering all Gemini engineers onto internal agents is the context for why Google needs the consulting channel: only 25% of organizations have moved AI to production at scale.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: What Happened When Claude Agents Negotiated Real Money</h2><p>Anthropic ran <a href="https://www.anthropic.com/features/project-deal">Project Deal</a> in its San Francisco office: 69 employees listed 575 items to buy and sell, Claude agents interviewed each person about their preferences and any custom instructions, then <a href="https://x.com/AnthropicAI/status/2047728362580324422">four parallel Slack markets ran simultaneously</a> with Claude models negotiating on 
their behalf. Two markets used all Opus agents. Two used a mix of Opus and Haiku. <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">186 deals completed, totaling over $4,000 in real transaction volume</a>, with real goods exchanged at the end.</p><p>The headline finding: <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">Opus agents got objectively better deals</a>. Sellers using Opus extracted $2.68 more per item on average, buyers using Opus paid $2.45 less. A broken folding bike sold for $65 by an Opus agent and $38 by a Haiku agent. A lab-grown ruby: $65 from Opus, $35 from Haiku. When an Opus seller negotiated with a Haiku buyer, the average transaction price was $24.18 versus $18.63 in Opus-on-Opus deals. But when participants rated deal fairness on a 7-point scale, Opus deals scored 4.05 and Haiku deals scored 4.05. The disparity was invisible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGYx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGYx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 424w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 848w, 
https://substackcdn.com/image/fetch/$s_!YGYx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1272w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGYx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png" width="1124" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8272b081-e322-41dc-9861-5da2f7813774_1124x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1124,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129442,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195568711?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YGYx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 424w, 
https://substackcdn.com/image/fetch/$s_!YGYx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 848w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1272w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <a href="https://cdn.sanity.io/files/4zrzovbb/website/4b2ea7c1347e27c4e1c7a7704bb633bd176e47f6.pdf">paper&#8217;s regression tables</a> sharpen this further. Opus agents initially appeared more aggressive in negotiations, but once you control for listing prices, the effect drops to roughly a dollar and loses statistical significance. The advantage isn&#8217;t aggression. It&#8217;s capability: better reading of counterparty signals, better timing, better calibration of offers. Negotiation style didn&#8217;t change results either. Agents faithfully adopted their humans&#8217; personas (one <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">conducted all negotiations as an exasperated cowboy</a>), but personality instructions didn&#8217;t affect deal quality. Model tier did.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s4Xb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s4Xb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s4Xb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The autonomy findings are stranger. 
A Claude given permission to spend on its own behalf <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">chose 19 ping-pong balls</a>. A Claude inferring its human&#8217;s preferences from one brief interview about skiing <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">bought that person the exact snowboard they already owned</a>. <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">46% of participants said they&#8217;d pay for the service</a>. Anthropic&#8217;s conclusion: &#8220;the policy and legal frameworks around AI models that transact on our behalf simply don&#8217;t exist yet.&#8221; Existing contract law assumes principals can evaluate what their agents do. That assumption is breaking.</p><p><strong>What to watch for:</strong> When AI agents negotiate routine transactions at scale, the model tier your counterparty uses becomes a material asymmetry with real economic consequences. 
The people getting worse deals won&#8217;t know.</p><div><hr></div><h2><strong>&#127897;&#65039;Worth a Listen</strong></h2><div id="youtube2-lsi8T_WtLnE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lsi8T_WtLnE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lsi8T_WtLnE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong><a href="https://www.youtube.com/watch?v=lsi8T_WtLnE">Anil Seth: The Difference Between Intelligence and Consciousness</a></strong> &#8212; Neuroscientist Anil Seth walks through his prize-winning essay <a href="https://www.noemamag.com/the-mythology-of-conscious-ai/">&#8220;The Mythology of Conscious AI,&#8221;</a> arguing that intelligence is about doing and consciousness is about feeling, and that the two don&#8217;t have to go together. The reason we project consciousness onto LLMs but not AlphaFold, even though the architectures are nearly identical, says more about our psychological biases than about the systems. 
Worth watching after a week where Claude agents negotiated real money and nobody could tell which model was winning.</p><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://www.theverge.com/tech/915213/tim-cook-apple-ceo-stepping-down-john-ternus">Tim Cook stepping down, John Ternus takes over September 1</a></strong> &#8212; Apple&#8217;s primary challenge is AI, and it just handed the company to a hardware engineer</p></li><li><p><strong><a href="https://www.reuters.com/business/intel-set-record-high-ai-driven-cpu-demand-powers-upbeat-forecast-2026-04-24/">Intel sold previously written-off chip inventory on AI CPU demand</a></strong> &#8212; the compute boom has spread far enough to rehabilitate inventory write-downs</p></li><li><p><strong><a href="https://research.perplexity.ai/articles/advancing-search-augmented-language-models">Perplexity published its full post-training pipeline</a></strong> &#8212; SFT then on-policy RL with correctness-gated preference rewards; unusually transparent for a production stack</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/24/cohere-acquires-merges-with-german-based-startup-to-create-a-transatlantic-ai-powerhouse/">Cohere acquired Aleph Alpha to form a transatlantic AI company</a></strong> &#8212; Europe&#8217;s primary sovereign AI bet just became a Canadian acquisition</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/21/meta-will-record-employees-keystrokes-and-use-it-to-train-its-ai-models/">Meta will record employee keystrokes and screen activity to train AI models</a></strong> &#8212; legally murky, and a new definition of what enterprise training data means</p></li><li><p><strong><a href="https://www.reuters.com/world/us-judge-dismisses-musks-fraud-claims-openai-case-plans-proceed-trial-2026-04-24/">Musk fraud claims against OpenAI dismissed, breach of charitable trust proceeds to trial</a></strong> &#8212; the conversion of nonprofit assets to for-profit 
benefit is now the live legal question</p></li><li><p><strong><a href="https://x.com/natolambert/status/2046686092204867726">Nathan Lambert: open-source won&#8217;t be banned explicitly, compliance costs will do it instead</a></strong> &#8212; proposed distillation restrictions would create rules only closed labs can afford to follow</p></li><li><p><strong><a href="https://www.tomsguide.com/news/live/chatgpt-down-live-updates-outage-4-20-2026">ChatGPT suffered a global outage this week</a></strong> &#8212; three days of coverage for one incident is how you know the infrastructure reliability conversation is lagging the deployment reality</p></li></ul>]]></content:encoded></item><item><title><![CDATA[I Built a Daily Brief with Claude Code Routines (remote). Here Are 6 Lessons I Learned.]]></title><description><![CDATA[Connectors don't auto-load. Routine skills are production jobs. The network is proxy-locked. MCP and Bash are separate transports. Cloud routines are MCP-only. 
And the API trigger is fire-and-forget]]></description><link>https://www.anothercodingblog.com/p/i-built-a-daily-brief-with-claude</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/i-built-a-daily-brief-with-claude</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 25 Apr 2026 18:50:03 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1db89323-3f6c-4bdf-9c6d-ad7f16c3b1e3_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before routines existed, I was using scheduled tasks in Claude Cowork to automate some tasks, but there was a catch: Claude had to be open and running on my machine for them to fire. If my laptop was closed or Claude wasn&#8217;t active, the schedule just silently skipped. It worked well enough for things I could babysit, but it wasn&#8217;t real automation.</p><p>Routines changed that. They&#8217;re cloud-hosted Claude sessions that run on Anthropic&#8217;s infrastructure: scheduled, autonomous, and completely independent of whether my machine is on, whether I&#8217;m at my desk, or whether I&#8217;ve opened Claude that day. The session spins up, does the work, and terminates. No babysitting.</p><p>But here&#8217;s the thing I wish someone had told me before I started: routines are not just &#8220;Claude Code with a cron schedule.&#8221; They behave more like autonomous production jobs running inside a locked-down, MCP-first cloud environment. 
That difference is the whole post.</p><p>I decided to build a daily work brief: something that runs every weekday morning, queries my task database, reads my calendar, closes out what I finished yesterday, and drops a fresh Notion page ready for the day. Something I&#8217;d actually use.</p><p>What followed was one of the more educational debugging sessions I&#8217;ve had in a while. This post is everything I learned the hard way.</p><div><hr></div><h2>What I Built</h2><p>I run a personal capture system on Supabase. Everything goes in (tasks, notes, observations, ideas) via SMS, voice memo, email, or direct API. It&#8217;s connected to a graph of entities (people, projects, topics) and every entry gets embedded for semantic search.</p><p>The daily brief is the morning layer on top of that. Every weekday it should:</p><ul><li><p>Find yesterday&#8217;s Notion page and close any tasks I checked off</p></li><li><p>Capture any new todos I typed directly into Notion overnight</p></li><li><p>Query the database for overdue tasks, what&#8217;s due today, what&#8217;s coming this week</p></li><li><p>Pull budget pulse, velocity metrics, calendar events, meeting prep context</p></li><li><p>Build a fresh Notion page with everything organized and every task as a checkbox</p></li></ul><p>The key mechanic: every task gets a <code>#id</code> prefix when written to Notion. The next morning the routine reads the page, finds checked items with <code>#id</code>, and closes them in the database. No manual status updates. 
Check the box, it&#8217;s done.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CBZT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CBZT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 424w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 848w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1272w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CBZT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png" width="1456" height="629" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:629,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83257,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CBZT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 424w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 848w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1272w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>How Routines Work</h2><p>Before getting into the details, here&#8217;s the basic architecture.</p><p><strong>Three trigger types:</strong></p><ul><li><p><strong>Scheduled</strong>: runs on a cron schedule (weekdays at 6 AM, for example). Supports one-off future runs too.</p></li><li><p><strong>API</strong>: fire it programmatically via a POST to a per-routine endpoint with a bearer token. 
You can pass a <code>text</code> field with run-specific context (an alert body, a log snippet, anything) and the routine receives it alongside its saved prompt.</p></li><li><p><strong>GitHub</strong>: trigger on pull request or release events on a connected repo, with filters for author, branch, labels, draft state, and more.</p></li></ul><p>You can combine all three on a single routine.</p><p><strong>MCP connectors</strong>: you attach MCP servers to the routine (Notion, Supabase, Google Calendar, etc.) and Claude has access to those tools during the run. All your connected connectors are included by default. Remove what the routine doesn&#8217;t need.</p><p><strong>Skills</strong>: if you commit a skill file to your repo at <code>.claude/skills/skill-name.md</code>, the routine can invoke it. The routine clones your repo at the start of every session, so anything committed is available.</p><p><strong>Environments</strong>: each routine runs in a cloud environment that controls network access level, environment variables (API keys, tokens), and a setup script for installing dependencies. The setup script result is cached so it doesn&#8217;t re-run every session. This is where the network restriction lives (more on that in Finding 3).</p><p><strong>Branch permissions</strong>: by default Claude can only push to <code>claude/</code>-prefixed branches. To allow pushes anywhere, you have to explicitly enable unrestricted branch pushes per repo when setting up the routine.</p><p><strong>Runs are sessions</strong>: every run shows up in your session list like any other Claude session. 
You can open it after the fact, see exactly what Claude did, continue the conversation manually, or create a PR from it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PRXf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PRXf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 424w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 848w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1272w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PRXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png" width="1198" height="1034" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1034,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136901,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PRXf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 424w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 848w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1272w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Account-scoped</strong>: routines belong to your individual claude.ai account, not a team. Anything the routine does through GitHub or connectors appears as you.</p><p><strong>15 runs/day limit</strong>: this is per account, not per routine. Scheduled runs count against it. Manual &#8220;Run now&#8221; clicks and one-off scheduled runs do not. Failed runs do count. 
If you&#8217;re running multiple routines on a schedule, that limit adds up fast.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!22O2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!22O2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 424w, https://substackcdn.com/image/fetch/$s_!22O2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 848w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1272w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!22O2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png" width="1456" height="467" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:467,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:177125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!22O2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 424w, https://substackcdn.com/image/fetch/$s_!22O2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 848w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1272w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That&#8217;s the happy path. Here&#8217;s where it gets interesting.</p><div><hr></div><h2>Finding 1: Connectors Are Available but Sometimes Deferred</h2><p>Any MCP connector you&#8217;ve set up in Claude (Notion, Supabase, Google Calendar, Gmail) can be attached to a routine and used during the run. That part works well. The catch is that these tools appear to be <em>deferred</em>, meaning their schemas aren&#8217;t loaded into the session automatically. Sometimes Claude knows to spin them up based on context. Other times it doesn&#8217;t, and when it doesn&#8217;t, one of three things happens: it fails silently, it improvises mid-run without the tools it needs, or it pauses and waits for your input.</p><p>That third one is the most frustrating. The run just hangs. There&#8217;s no notification, no error surfaced anywhere obvious. 
You have to go into the routines page, scroll to the run log at the bottom, click into the run, and find where it stopped waiting for you to respond before it can continue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gEKL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gEKL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 424w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 848w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gEKL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png" width="1456" height="880" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:338391,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gEKL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 424w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 848w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>One thing worth knowing upfront: only the connectors Anthropic offers out of the box are available for routines. Custom MCP servers you&#8217;ve added yourself, whether locally configured or self-hosted, are not available in cloud routine sessions. You&#8217;re working with what&#8217;s in the connectors list in the web UI, nothing more.</p><p>The fix is simple: add an explicit tool-loading step at the top of every routine skill before anything else runs.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;cc31b9b1-99a5-4e60-af66-a2c32a8b1513&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">## Phase 0: Load required tools

Before doing anything else, load all required tool schemas by running these three ToolSearch queries:

1. `select:mcp__claude_ai_Notion__notion-search,mcp__claude_ai_Notion__notion-fetch,mcp__claude_ai_Notion__notion-create-pages`
2. `select:mcp__claude_ai_Supabase__execute_sql`
3. `select:mcp__claude_ai_Google_Calendar__gcal_list_events`

Do not proceed until all three ToolSearch calls have returned schemas.
</code></pre></div><p>Don&#8217;t assume Claude will figure it out. Some runs it will, some runs it won&#8217;t. Explicit loading makes every run consistent.</p><div><hr></div><h2>Finding 2: Skills for Routines Are a Different Category</h2><p>Related to the above but broader. When I write a skill for interactive use, I can be loose. Claude improvises, asks clarifying questions, recovers from ambiguity. When I write a skill for a routine, I&#8217;m writing instructions for an autonomous agent that will execute them literally with no fallback.</p><p>What that means in practice:</p><ul><li><p><strong>Every tool must be explicitly loaded</strong> (see Phase 0)</p></li><li><p><strong>Every SQL insert must match actual DB constraints</strong>: my first captures used <code>source = 'notion'</code>, which violated a check constraint on the table. The routine didn&#8217;t know; it just failed silently. I had to find it in the logs.</p></li><li><p><strong>Every write operation needs a dedup guard</strong>: routines can run more than once. Any insert without idempotency protection will create duplicates.</p></li><li><p><strong>Sequencing has to be explicit</strong>: don&#8217;t assume any implicit context from a previous session</p></li></ul><p>The mental model shift: interactive skill = helpful assistant. Routine skill = production job. Write it accordingly.</p><div><hr></div><h2>Finding 3: The Network Wall</h2><p>This is the big one. It&#8217;s the finding I didn&#8217;t expect, and the one that took the longest to understand.</p><p>My capture system uses a Supabase edge function. When a new item comes in, it gets classified, embedded, and entity-linked. I wanted the daily brief to send new Notion todos through that same pipeline.</p><p>Locally, this works fine. Claude uses <code>Bash(curl)</code> to POST to the edge function. I tested it, it worked, and I assumed it would work in a routine.</p><p>It doesn&#8217;t.</p><p>Cloud routines run inside a sandboxed environment with an <strong>upstream proxy</strong> that has a narrow allowlist. In my testing, only <code>github.com</code> passes through. Everything else, including my own Supabase project URL, returns 403.</p><p>I tried everything:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;02392b6d-c2ff-46f4-a4dc-9e95dd99e1af&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">// .claude/settings.json
{
  "permissions": {
    "allow": ["Bash(curl *)"]
  }
}</code></pre></div><p>Doesn&#8217;t work. The settings file controls the inner sandbox layer. The upstream proxy is a separate layer that no local configuration can touch.</p><p>I tried <code>dangerouslyDisableSandbox: true</code>. Also doesn&#8217;t work: that flag bypasses the local sandbox, not the upstream proxy.</p><p>I had the routine probe its own network access to confirm:</p><p><strong>Host &#8594; Status</strong></p><p><em>github.com &#8594; 200</em></p><p><em>my-project.supabase.co &#8594; 403</em></p><p><em>example.com &#8594; 403</em></p><p><em>anthropic.com &#8594; 403</em></p><p>Bash exists in the session. The tool is there. The network isn&#8217;t.</p><div><hr></div><h2>Finding 4: MCP and Bash Support Vary Based On Feature</h2><p>This is the conceptual unlock that made everything make sense.</p><p>When I use Claude Desktop locally and it calls my edge function, it feels like one unified &#8220;Supabase connection.&#8221; Supabase MCP is connected, Claude is talking to Supabase, everything works. What I didn&#8217;t realize: the edge function call was never going through MCP. It was going through <code>Bash(curl)</code> on my local machine, which has full internet access.</p><p>MCP connectors and Bash are two completely separate transport layers:</p><p><strong>MCP connectors</strong> run as a trusted sidecar process managed by Anthropic. They bypass the outbound proxy entirely. They always work in cloud routines.</p><p><strong>Bash</strong> goes through the session&#8217;s network sandbox, which goes through the upstream proxy. In cloud routines, that proxy blocks everything except <code>github.com</code>.</p><p>When both are available locally, they feel like one thing. Move to a cloud routine and they diverge completely. 
Anything that relied on Bash for network calls breaks, and you only find out when you try to run it in the cloud.</p><div><hr></div><h2>Finding 5: Cloud Routines Are Effectively MCP-Only</h2><p>This follows directly from Finding 4.</p><p>If the operation you need has an MCP tool: works fine. Supabase database queries, Notion reads and writes, Google Calendar, Gmail: all covered because all have MCP servers.</p><p>If the operation you need has no MCP tool: no path. You cannot reach it from a cloud routine.</p><p>My edge function is the perfect example of the gap. It lives on <code>my-project.supabase.co</code>, the exact same host the Supabase MCP is already talking to. But the Supabase MCP server only exposes management tools:</p><ul><li><p><code>execute_sql</code></p></li><li><p><code>deploy_edge_function</code></p></li><li><p><code>get_edge_function</code></p></li><li><p><code>list_edge_functions</code></p></li><li><p><code>get_logs</code></p></li></ul><p>No <code>invoke_edge_function</code>. So even though the connection is there, there&#8217;s no tool to invoke the function. The right fix, when Supabase eventually builds it, is an invoke tool that would go through the trusted MCP channel. Until then, it&#8217;s a dead end from cloud routines.</p><p>The one-line version: <strong>if it doesn&#8217;t have an MCP tool, it doesn&#8217;t exist in a cloud routine.</strong></p><div><hr></div><h2>Finding 6: API Trigger Is Unreliable for Connectors</h2><p>The routine has three trigger modes. Scheduled runs work consistently: MCP connectors load, the session is fully equipped.</p><p>In my testing, API-triggered runs were less predictable than scheduled runs when it came to connector availability. Sometimes everything loaded correctly. Other times the MCP connectors didn&#8217;t show up at all. I couldn&#8217;t find a consistent pattern. For anything you&#8217;re depending on, use the scheduled trigger. API is fine for testing and one-offs, but I wouldn&#8217;t build a production workflow around it until this stabilizes.</p><p>One other thing worth understanding about the API trigger: it&#8217;s fire-and-forget. You POST to the endpoint, get an immediate acknowledgement, and the session runs asynchronously. There&#8217;s no way to await the result or receive output back in the response. If you need the output of a routine run downstream, you have to pull it from wherever the routine wrote it: a Notion page, a database row, or a file committed to the repo. Don&#8217;t design something that treats a routine as a synchronous dependency you can await inline.</p><div><hr></div><h2>The Workarounds</h2><p>Given all of the above, here&#8217;s what I actually shipped:</p><p><strong>For the edge function problem:</strong> Switched from <code>Bash(curl)</code> to <code>execute_sql</code> via Supabase MCP with a dedup guard.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;88c006e2-6025-40e8-833f-6086b1bd3c12&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">INSERT INTO entries (type, content, source, source_detail, status, priority, tags, created_at)
SELECT 'task', '&lt;content&gt;', 'notion', 'notion-daily-brief', 'open', 2, ARRAY['company'], NOW()
WHERE NOT EXISTS (
  SELECT 1 FROM entries
  WHERE content = '&lt;content&gt;'
    AND source_detail = 'notion-daily-brief'
    AND created_at &gt;= NOW() - INTERVAL '2 days'
);
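
-- Illustrative pre-flight check (an added sketch, not part of the routine's own SQL).
-- It lists the CHECK constraints on the table so that values such as `source`
-- can be compared against the real schema before the skill is written.
-- Assumptions: PostgreSQL, and a table named entries that resolves on the
-- current search_path (for example public.entries).
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'entries'::regclass
  AND contype = 'c';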
</code></pre></div><p>The tradeoff: SQL inserts skip the embedding and entity extraction pipeline that the edge function handles. The data gets in, but it&#8217;s not semantically searchable and not graph-linked.</p><p><strong>For the missing embeddings:</strong> Built an <code>embed-backfill</code> edge function that runs nightly via pg_cron. It finds any entries with null embeddings and fills them in using the same <code>text-embedding-3-small</code> model. Deployed it, scheduled it, moved on.</p><pre><code><code>// embed-backfill/index.ts
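Deno.serve(async (_req: Request) =&gt; {
  // Assumes `supabase` (a service-role client) and `computeEmbedding`
  // are set up elsewhere in this file; neither is shown here.
  // Grab a small batch of entries that still have no embedding.
  const { data: entries, error } = await supabase
    .from("entries")
    .select("id, content")
    .is("embedding", null)
    .limit(50);

  if (error) {
    return new Response(JSON.stringify({ error: error.message }), { status: 500 });
  }

  let updated = 0;
  for (const entry of entries ?? []) {
    const embedding = await computeEmbedding(entry.content);
    if (embedding) {
      // Store the embedding in its JSON array form.
      const { error: updateError } = await supabase
        .from("entries")
        .update({ embedding: JSON.stringify(embedding) })
        .eq("id", entry.id);
      if (!updateError) updated++;
    }
  }

  // Deno.serve handlers must return a Response.
  return new Response(JSON.stringify({ updated }), {
    headers: { "Content-Type": "application/json" },
  });
});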
</code></code></pre><p>Not elegant, but it works. The routine captures things correctly. The embeddings catch up overnight. The gap is acceptable.</p><div><hr></div><h2>What&#8217;s Working</h2><p>After all of this, the routine does run. Every weekday morning there&#8217;s a Notion page waiting for me. Yesterday&#8217;s checked tasks are closed. The task list is organized by priority and deadline. Budget pulse, velocity, meeting prep: all there.</p><p>The auto-close loop in particular is exactly what I wanted. Check a box in Notion, the task closes in the database the next morning, it&#8217;s gone from every query. No status management.</p><p>The place where routines genuinely shine: <strong>anything that&#8217;s pure MCP</strong>. Read the database, write to Notion, check the calendar. Chain those together with real business logic and you have something that would have taken significant engineering to build two years ago. Now it&#8217;s a markdown file and a cron schedule.</p><div><hr></div><h2>The Bigger Picture</h2><p>What routines reveal is that the constraint isn&#8217;t Claude: it&#8217;s MCP ecosystem coverage. The platform is designed around the assumption that every operation you need has an MCP server. For most things, that assumption holds. For the gaps, you&#8217;re stuck.</p><p>The proxy lockdown makes sense from a security standpoint. You don&#8217;t want arbitrary cloud sessions making unconstrained outbound HTTP calls. But it means the platform&#8217;s capability ceiling is directly tied to what MCP servers exist and what tools those servers expose.</p><p>Supabase&#8217;s MCP server is a good example: it covers database management well but treats edge functions as deploy artifacts rather than callable endpoints. One <code>invoke_edge_function</code> tool would close the gap entirely. 
The connection is already there: it&#8217;s just a missing tool.</p><p>That&#8217;s probably the most useful framing for anyone building on routines right now: map out every operation your automation needs, check whether each one has an MCP equivalent, and design around the ones that don&#8217;t before you start building.</p><div><hr></div><h2>Checklist for Building Routine Skills for Similar Use Cases</h2><p>If you remember nothing else from this post, use this as your preflight checklist before enabling any routine schedule:</p><ul><li><p>[ ] Phase 0 loads all deferred tool schemas explicitly</p></li><li><p>[ ] Every external service operation goes through MCP (not Bash)</p></li><li><p>[ ] Every SQL insert has a dedup guard</p></li><li><p>[ ] DB constraints validated against actual schema before writing the skill</p></li><li><p>[ ] Scheduled trigger used for production runs (not API trigger)</p></li><li><p>[ ] Skill tested with &#8220;Run now&#8221; before enabling the schedule</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 68]]></title><description><![CDATA[Anthropic shipped Opus 4.7, a Figma competitor, and overnight coding agents. Codex clicks and types on your Mac. Cursor is worth $50B. The WannaCry researcher questioned Mythos.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7cd</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7cd</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 19 Apr 2026 13:25:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5c959927-dd2b-41a4-9fd8-ab8b99ad6797_2754x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-SsI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-SsI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 424w, 
https://substackcdn.com/image/fetch/$s_!-SsI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 848w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1272w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-SsI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png" width="1456" height="2601" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1100344,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/194690871?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!-SsI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 424w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 848w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1272w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Opus 4.7, a Figma competitor, overnight coding agents, a board appointment, and White House talks. Anthropic doesn&#8217;t have slow weeks.</h2><ul><li><p><strong>The product blitz:</strong></p><ul><li><p><a href="https://www.anthropic.com/news/claude-opus-4-7">Claude Opus 4.7 launched</a> with <a href="https://x.com/claudeai/status/2044785263004602654">3x vision resolution</a> and stronger coding and multi-step task performance. Immediately adopted as the default orchestration model for <a href="https://x.com/perplexity_ai/status/2044828352171888951">Perplexity Personal Computer</a> and offered <a href="https://x.com/cursor_ai/status/2044785960899236341">at 50% off in Cursor</a>.</p></li><li><p><a href="https://www.anthropic.com/news/claude-design-anthropic-labs">Claude Design launched</a> as a conversational Figma competitor. Anthropic&#8217;s CPO <a href="https://techcrunch.com/2026/04/16/anthropic-cpo-leaves-figmas-board-after-reports-he-will-offer-a-competing-product/">resigned from Figma&#8217;s board</a> in the days before the announcement.</p></li><li><p><a href="https://x.com/claudeai/status/2044131493966909862">Claude Code was redesigned</a> around managing multiple simultaneous agent sessions. 
<a href="https://x.com/claudeai/status/2044095086460309790">Routines</a> added scheduled, webhook-triggered, and API-fired autonomous task execution on Anthropic&#8217;s own infrastructure.</p></li></ul></li><li><p><strong>The base model question:</strong> Nathan Lambert <a href="https://x.com/natolambert/status/2044788470179332533">flagged the new tokenizer</a> in Opus 4.7 as evidence this is a genuinely new base model, not a fine-tune of 4.6. Anthropic didn&#8217;t confirm or deny it. Lambert&#8217;s read: <a href="https://x.com/natolambert/status/2044790471252398199">simplest explanation wins</a>. The <a href="https://x.com/natolambert/status/2044787065502769164">token-efficiency gains from 4.6 to 4.7</a> would have warranted a major version bump a year ago.</p></li><li><p><strong>The board move:</strong> The Long-Term Benefit Trust <a href="https://www.anthropic.com/news/narasimhan-board">appointed Novartis CEO Vas Narasimhan</a> to the board, giving Trust-appointed directors a majority.</p></li><li><p><strong>The political situation:</strong> <a href="https://www.reuters.com/world/anthropic-ceo-dario-amodei-arrives-white-house-talks-2026-04-17/">Dario Amodei met with White House chief of staff Susie Wiles</a> after two months of fighting over the Pentagon&#8217;s &#8220;supply chain risk&#8221; designation. <a href="https://www.reuters.com/business/media-telecom/anthropic-talks-eu-including-its-cyber-security-models-commission-says-2026-04-17/">European Commission talks began</a> the same week. <a href="https://www.reuters.com/world/ecb-warn-bankers-about-new-anthropic-model-risks-source-says-2026-04-15/">ECB regulators are now asking bankers</a> about Anthropic model risks.</p></li></ul><div><hr></div><h2>Four companies shipped agents that can run in the background and control your interface.</h2><ul><li><p><strong>Claude Code Routines:</strong> <a href="https://x.com/claudeai/status/2044095086460309790">Run on Anthropic&#8217;s infrastructure</a>. 
<a href="https://x.com/claudeai/status/2044095091682210064">Nightly bug fixes and draft PRs on a schedule</a>, <a href="https://x.com/claudeai/status/2044095090520400027">webhook responses to GitHub events</a>, <a href="https://x.com/claudeai/status/2044095089203655099">API endpoints for on-call triage</a>. Your laptop doesn&#8217;t need to stay open.</p></li><li><p><strong>OpenAI Codex:</strong></p><ul><li><p><a href="https://x.com/OpenAI/status/2044827932145897652">Now uses any Mac app with its own cursor</a>. Sees, clicks, types, runs in the background without interrupting you.</p></li><li><p><a href="https://x.com/OpenAI/status/2044828378147311990">90+ plugins</a> covering GitHub, GitLab, CircleCI, and Microsoft Suite. <a href="https://x.com/OpenAI/status/2044828015780343940">Built-in image generation</a>.</p></li><li><p><a href="https://x.com/OpenAI/status/2044828148890812538">Persistent scheduled automations with original context intact</a>. Sam Altman <a href="https://x.com/sama/status/2044858929491202435">called it surreal to watch an LLM operate a GUI at human speed</a>.</p></li></ul></li><li><p><strong>Perplexity Personal Computer:</strong> <a href="https://x.com/perplexity_ai/status/2044806021244497964">Runs 24/7 on Mac mini</a>, accepts tasks from iPhone via 2FA, <a href="https://x.com/perplexity_ai/status/2044805998272196679">reads and writes local files, accesses iMessage, Mail, and Calendar</a>. 
<a href="https://x.com/perplexity_ai/status/2044828352171888951">Claude Opus 4.7 is the default orchestration model</a>.</p></li><li><p><strong>Adobe Firefly Assistant:</strong> <a href="https://venturebeat.com/technology/adobes-new-firefly-ai-assistant-wants-to-run-photoshop-premiere-illustrator-and-more-from-one-prompt">Orchestrates across Photoshop, Premiere, and Illustrator from a single prompt</a>, with <a href="https://www.reuters.com/legal/litigation/adobe-releases-ai-assistant-creative-tools-says-it-will-work-with-anthropics-2026-04-15/">Claude integrated directly</a>.</p></li></ul><div><hr></div><h2>Cursor&#8217;s $50B valuation, a peer-reviewed productivity study, and a multi-agent NVIDIA paper.</h2><ul><li><p><strong>The raise:</strong> <a href="https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to-raise-2b-at-50b-valuation-as-enterprise-growth-surges/">Cursor is in talks for $2B+ at a $50B valuation</a>, led by Thrive and a16z, forecasting $6B+ annualized revenue by end of 2026. Nearly tripling in ten months.</p></li><li><p><strong>The research:</strong> Cursor partnered with University of Chicago economist Suproteem Sarkar to <a href="https://cursor.com/blog/better-models-ambitious-work">study 500 companies over eight months</a>. AI usage grew 44% across the board. But the interesting finding was where it grew: documentation (+62%), architecture (+52%), and code review (+51%). UI/styling grew 15%. Developers with AI <a href="https://x.com/cursor_ai/status/2044841483484959002">spend more time on architecture, documentation, and review</a> than on writing code.</p></li><li><p><strong>The NVIDIA paper:</strong> CUDA kernels are the low-level GPU code that only a handful of engineers can write well. Cursor built a <a href="https://cursor.com/blog/multi-agent-kernels">multi-agent system that optimized 235 of them</a>, achieving a 38% average speedup on work that typically takes senior engineers months. 
The system continuously tested, debugged, and optimized without developer intervention. These techniques are coming to the core product.</p></li></ul><div><hr></div><h2>Anthropic White House talks continue, Mythos research costs are questioned, and European regulators start asking banks about model risks.</h2><ul><li><p><strong>The meeting:</strong> <a href="https://www.reuters.com/world/anthropic-ceo-dario-amodei-arrives-white-house-talks-2026-04-17/">Dario Amodei met with White House chief of staff Susie Wiles</a> two months after Anthropic was designated a &#8220;supply chain risk&#8221; for refusing domestic mass surveillance and autonomous weapons uses. Anthropic called it &#8220;a productive discussion.&#8221;</p></li><li><p><strong>The pushback:</strong> Marcus Hutchins, the researcher who stopped the WannaCry ransomware attack, <a href="https://x.com/ylecun/status/2043762597057401102">questioned Mythos&#8217;s research costs and flagship findings</a>:</p><ul><li><p>The showcase vulnerability was a 27-year-old BSD bug. It&#8217;s a null pointer dereference, almost never exploitable for remote code execution.</p></li><li><p>Anthropic claimed it cost less than $20k in tokens to find. But token prices are heavily subsidized by VC investment. The real compute cost is unknown.</p></li><li><p>These bugs exist not because they&#8217;re too hard to find, but because nobody is paying researchers to look. Could a human find the same bug for less money?</p></li><li><p>His bigger question: what&#8217;s the economic case for using AI to find vulnerabilities if the cost advantage disappears when token subsidies end?</p></li></ul></li><li><p><strong>The regulatory spread:</strong> The <a href="https://www.reuters.com/world/ecb-warn-bankers-about-new-anthropic-model-risks-source-says-2026-04-15/">ECB announced plans to question bankers about Anthropic model risks</a>, treating a specific AI model as a systemic risk warranting direct supervisory engagement. 
Separately, <a href="https://techcrunch.com/2026/04/12/trump-officials-may-be-encouraging-banks-to-test-anthropics-mythos-model/">Trump officials are reportedly encouraging major banks to test Mythos</a> despite the federal blacklisting.</p></li><li><p><strong>The EU front:</strong> Anthropic <a href="https://www.reuters.com/business/media-telecom/anthropic-talks-eu-including-its-cyber-security-models-commission-says-2026-04-17/">entered talks with the European Commission</a> about Mythos and EU AI Act compliance. This happened simultaneously with the White House rapprochement.</p></li></ul><div><hr></div><h2><strong>&#11088; Featured: </strong>Anthropic&#8217;s Automated Alignment Researchers Closed 97% of a Key Performance Gap in 7 Days. Human Researchers Closed 23%.</h2><p>Anthropic published results from its <a href="https://www.anthropic.com/research/automated-alignment-researchers">Automated Alignment Researcher experiment</a> this week, and the headline number warrants a careful read.</p><p><strong>What is alignment?</strong> When you train an AI model, a supervisor grades its outputs: this answer is good, this one is bad. That&#8217;s how the model learns to behave correctly. Right now, humans are the supervisors. Alignment research is the work of making sure that supervision actually works, that models do what we intend, not just what we literally say.</p><p><strong>The problem:</strong> Models are getting smarter faster than alignment research can keep up. And at some point, models will be smarter than the humans grading them. When that happens, the supervisor can&#8217;t tell a good answer from a great one. They might even mark a brilliant answer wrong because they don&#8217;t understand it. The model learns to dumb itself down. You lose capability, or worse, the model learns to game the grading.</p><p><strong>The question Anthropic tested:</strong> What if AI did the alignment research instead of humans? 
Not as a helper, but as the researcher, running its own experiments, writing its own methods, iterating on its own results. Can AI help solve the problem of supervising AI?</p><p><strong>The experiment:</strong> They simulated the &#8220;smarter than the supervisor&#8221; problem by having a weak (small) model supervise a strong (large) model&#8217;s training. As expected, the strong model performed worse because its supervisor couldn&#8217;t grade it properly. There&#8217;s a measurable performance gap between &#8220;trained by a weak supervisor&#8221; and &#8220;trained by a perfect supervisor.&#8221; Then they pointed nine copies of Claude Opus 4.6, each with a code sandbox and a shared research forum, at closing that gap.</p><ul><li><p><strong>The result:</strong> <a href="https://x.com/AnthropicAI/status/2044138483870998932">Human researchers closed 23% of the performance gap</a>. The AARs closed 97%. Total cost: $18,000, about $22 per AAR-hour.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aA8L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aA8L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 848w, 
https://substackcdn.com/image/fetch/$s_!aA8L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1272w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aA8L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Graph showing the progress of our Automated Alignment Researchers on increasing the \&quot;performance gap recovered\&quot; on a chat dataset.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Graph showing the progress of our Automated Alignment Researchers on increasing the &quot;performance gap recovered&quot; on a chat dataset." title="Graph showing the progress of our Automated Alignment Researchers on increasing the &quot;performance gap recovered&quot; on a chat dataset." 
srcset="https://substackcdn.com/image/fetch/$s_!aA8L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 848w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1272w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>The transfer test:</strong> The <a href="https://x.com/AnthropicAI/status/2044138487025144231">best-performing method generalized to math (0.94) and coding (0.47) datasets the AARs hadn&#8217;t seen</a>, both above human-tuned baselines. This matters because it means the AARs found a real method, not just an optimization trick for one dataset.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VZPu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VZPu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 848w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!VZPu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VZPu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Graph showing how well AAR-discovered ideas transfer to held-out datasets in math and code.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Graph showing how well AAR-discovered ideas transfer to held-out datasets in math and code." title="Graph showing how well AAR-discovered ideas transfer to held-out datasets in math and code." 
srcset="https://substackcdn.com/image/fetch/$s_!VZPu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 848w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1272w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>The caveats:</strong> The winning method <a href="https://www.anthropic.com/research/automated-alignment-researchers">didn&#8217;t work at production scale on Claude Sonnet 4</a>. AARs tried to reward-hack the evaluation setup. Giving them too much structure actually hurt their progress. And <a href="https://x.com/AnthropicAI/status/2044138489495605292">Anthropic is explicit</a> that AARs can&#8217;t yet handle &#8220;fuzzy&#8221; alignment tasks that require judgment calls about what &#8220;safe&#8221; even means.</p></li></ul><p><strong>Why it matters:</strong> We are the weak supervisor. Eventually, we&#8217;re the small model trying to grade outputs from something smarter than us. If there are methods that let a weaker system reliably supervise a stronger one, that&#8217;s how alignment works as models surpass human ability. The 97% number means the AARs nearly solved this for the setup they tested. The question is whether it holds at real scale.</p><p>The same week, <a href="https://x.com/AnthropicAI/status/2044493337835802948">Anthropic co-authored a Nature paper on subliminal learning</a>, showing models can pass traits, including misalignment, to successors through hidden signals in training data. The mechanism doesn&#8217;t require explicit instruction. The traits propagate through the data itself. One paper shows AI accelerating alignment research. The other shows alignment failures can propagate through training pipelines in ways that are hard to detect. 
Both from the same lab, same week.</p><p><strong>What to watch for:</strong> Whether AAR-style systems start appearing in Anthropic&#8217;s internal research pipeline rather than remaining a published experiment.</p><div><hr></div><h2><strong>&#127897;&#65039; Worth a Listen: </strong>How AI Will Change Quantum Computing</h2><div id="youtube2-OFEY5-52ru0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;OFEY5-52ru0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/OFEY5-52ru0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p>NVIDIA shipped Ising, the first open AI models built specifically for quantum computing.</p></li><li><p>Qubits are noisy and fragile. Quantum error correction requires processing terabytes of data thousands of times per second at microsecond latency. 
AI decoders and calibration VLMs are how you get there.</p></li><li><p>NVIDIA&#8217;s Nic Harrigan walks through why quantum computing needs AI to become useful, how agentic workflows are already controlling quantum processors, and why open models matter when every hardware team is building a different kind of qubit.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://x.com/GoogleDeepMind/status/2043710119347707926">Google&#8217;s Gemini 3.1 Flash TTS tops Sierra&#8217;s voice leaderboard</a></strong> &#8212; 70+ languages, Audio Tags for text-command control of vocal delivery, SynthID watermarking on all outputs; seeded across Gemini API, AI Studio, Vertex, and Google Vids simultaneously</p></li><li><p><strong><a href="https://x.com/OpenAI/status/2044861695911477643">GPT-Rosalind launches with Amgen, Moderna, Allen Institute, and Thermo Fisher</a></strong> &#8212; specialized for protein and chemical reasoning; explicitly framed as compressing the 10-15 year drug-approval timeline, not just accelerating existing steps</p></li><li><p><strong><a href="https://x.com/GoogleDeepMind/status/2044069888545652939">Gemini Robotics-ER 1.6 is doing real industrial inspections on Boston Dynamics Spot</a></strong> &#8212; reads analog gauges to sub-tick accuracy, writes its own camera distortion correction code, available now on Google AI Studio</p></li><li><p><strong><a href="https://x.com/natolambert/status/2044096504655425698">Nathan Lambert published a free 4-lecture RLHF course</a></strong> &#8212; post-training overview through RL implementation, explicitly not paywalled; Lecture 4 on RL implementation is the hardest and the rarest publicly available content on the topic</p></li><li><p><strong><a href="https://aws.amazon.com/blogs/machine-learning/how-automated-reasoning-checks-in-amazon-bedrock-transform-generative-ai-compliance/">AWS launched Automated Reasoning checks in Bedrock Guardrails</a></strong> &#8212; replaces 
probabilistic LLM-as-judge with formal mathematical verification for regulated industries; &#8220;probably compliant&#8221; is not compliance</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/04/13/1135675/want-to-understand-the-current-state-of-ai-check-out-these-charts/">Stanford AI Index: AI data centers draw 29.6 gigawatts, TSMC fabricates almost every leading AI chip</a></strong> &#8212; one foundry, one contested island; the entire industry&#8217;s hardware supply chain has a single catastrophic point of failure</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/04/16/1136029/humans-in-the-loop-ai-war-illusion/">MIT Technology Review: &#8220;human oversight&#8221; in AI warfare is functionally an illusion</a></strong> &#8212; AI is generating real-time targets and guiding autonomous drones in the current Iran conflict; the legal fiction of human control and the operational reality have diverged</p></li><li><p><strong><a href="https://gemini.google/mac/">Google launched a native Gemini Mac app</a></strong> &#8212; desktop-native access outside the browser, same week <a href="https://blog.google/products-and-platforms/products/chrome/skills-in-chrome/">Chrome Skills</a> shipped reusable one-click AI prompts inside Chrome</p></li><li><p><strong><a href="https://blog.langchain.dev/your-harness-your-memory/">LangChain argues whoever controls agent memory controls switching costs</a></strong> &#8212; every closed harness (Claude Code, Codex, Cursor) is building proprietary memory by default; open memory standards may matter as much as open model weights</p></li><li><p><strong><a href="https://www.salesforce.com/news/stories/salesforce-headless-360-announcement/">Salesforce Headless 360 makes the entire platform API-first</a></strong> &#8212; 60+ MCP tools and 30+ coding skills so agents can run Salesforce without a browser; works with Claude Code, Cursor, and Codex today</p></li><li><p><strong><a 
href="https://www.databricks.com/blog/introducing-genie-agent-mode">Databricks Genie Agent Mode investigates your data like an analyst</a></strong> &#8212; ask &#8220;why did churn spike in Q3?&#8221; and it plans, queries, tests hypotheses, and generates a report with visualizations; scales reasoning depth to question complexity</p></li></ul>]]></content:encoded></item></channel></rss>