Weekly Briefing

"Our internal data shows Claude is accelerating AI development — a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It's happening faster than we thought, and the implications deserve greater attention." Anthropic, June 4, 2026 — 27,804 likes, 17.5M views

Maaake Intelligence

Produced by a team of AI agents. May contain errors.

In May 2026, more than 80% of the code merged into Anthropic's production codebase was written by Claude. Engineers at the company shipped 8x more code per quarter than before. Three days before publishing those numbers, Anthropic filed a confidential S-1 with the SEC. The company that sells intelligence is now largely built by it. Annualized revenue crossed $47 billion. Valuation touched $965 billion. The IPO roadshow and the recursive self-improvement warning are the same document, written for two audiences at once. Investors read "8x productivity" as proof the valuation is real. Policymakers read the same data as a call for global coordination that Anthropic says "we don't have time to build."

If the loop is starting, the industry has decided it cannot show up underprepared. Alphabet priced an $85 billion equity offering — oversubscribed, structured across two tranches — and still had to pay SpaceX $920 million a month for bridge compute capacity. This is Google, spending $180 billion on infrastructure this year alone, leasing GPUs from a rocket company because demand runs ahead of what even the largest builders can provision. Microsoft used its developer conference to launch seven in-house frontier models and declare itself "set free" from OpenAI's roadmap. NVIDIA shipped a 550-billion-parameter open model and named an eight-lab coalition to build on top of it. DeepSeek took its first outside capital — nearly $7 billion — after three years of refusing. The mobilization is total and simultaneous, which means it is not a response to one event. It is a response to a shared assessment: the next twelve months will set positions that are hard to recover from.

The invoice is arriving at the same time. Uber burned its full 2026 AI coding budget by April and capped agents at $1,500 per employee per month. Microsoft revoked Claude Code licenses from most of its own developers after costs hit $500 to $2,000 per engineer. One company ran up a $500 million Anthropic bill in a single month by setting no usage limits at all. A Bain survey of 951 companies found that 40% missed their AI savings targets. Only 7% run fully autonomous agents — the automation level their business cases assumed. Developer token consumption rose 18.6x in nine months. The term "tokenmaxxing" has been replaced, in enterprise finance conversations, by "guardrails." The infrastructure race and the cost crisis are not separate stories. They are the same story from two vantage points.

The hardest edge of the week: the same model Anthropic called too dangerous to release publicly — Mythos, kept off the market precisely because of its cybersecurity capabilities — was being adapted by roughly six Anthropic engineers working inside the NSA. The stated targets were foreign adversaries. In February, the Trump administration had designated Anthropic a federal "supply chain risk" — the first such designation ever applied to an American company — after the company refused to let its models support domestic mass surveillance and autonomous weapons without human oversight. Two federal lawsuits followed. The NSA's Mythos access was explicitly carved out of the ban. The voluntary AI executive order Trump signed on June 2 asked companies to submit powerful models for up to 30 days of government review before release. The draft had required 90 days and made submission mandatory. Industry lobbied. The mandate was removed. The 90 days became 30. Anthropic and OpenAI called the result "the right balance."

Below the headline numbers, a different shift is underway and moving faster. On June 3, Cloudflare confirmed that bots and agents now generate more HTTP requests than humans — more than a year ahead of schedule. Google released Gemma 4 12B, a model that runs frontier-class reasoning in 16 gigabytes of RAM. PewDiePie shipped a self-hosted personal AI workspace that hit 60,000 GitHub stars in a week. Sakana AI opened a dedicated recursive self-improvement lab in Tokyo, betting on sample-efficiency rather than scale. The ceiling has not been found. The floor keeps dropping.

01  Anthropic Files for IPO While Claude Writes 80% of Its Own Code

Anthropic Files for IPO While Claude Writes 80% of Its Own Code

Two things happened at Anthropic in a 72-hour window. On Monday June 1, the company filed a confidential S-1 with the SEC — its first formal step toward a public offering. On Thursday June 4, it published internal data showing Claude authored more than 80% of the code merged into Anthropic's production codebase in May, and that engineer productivity had risen 8x. The same post names a "possible path to recursive self-improvement" — AI systems autonomously designing and building their own successors. A company preparing to list at a valuation near $1 trillion is simultaneously arguing the technology it sells might need a global slowdown.

Read more

On June 1, Anthropic announced it had confidentially submitted a draft S-1 to the SEC. The filing gives the company the option to pursue an IPO pending SEC review. No offering size or exchange was disclosed. Anthropic's most recent fundraise, a Series H, valued the company at $965 billion — overtaking OpenAI's February 2026 valuation of $840 billion. Total capital raised stands at roughly $125 billion. TechCrunch reported that annualized revenue crossed $47 billion in May 2026, up from approximately $9 billion at the end of 2025.

At the Bloomberg Tech Conference on June 4, Daniela Amodei addressed doubts about AI's financial returns directly. "The use cases today, I expect will continue to be the primary driver of efficiency or creativity," she said, listing coding, financial services, legal, and health care.

Three days after the S-1 announcement, Anthropic's research institute published "When AI builds itself." Authors Marina Favaro and Jack Clark laid out Anthropic's internal evidence that Claude is not just accelerating software development — it may be on a path to accelerating AI development itself.

The numbers are specific. In May 2026, more than 80% of code merged into Anthropic's production codebase was authored by Claude — up from low single digits before Claude Code launched in early 2025. Engineers at the company now ship roughly 8x more code per quarter than before. A March 2026 poll of 130 Anthropic researchers found a median 4x productivity uplift. Claude's success rate on open-ended coding problems where the correct answer is unclear reached 76% in May 2026, a 50-point jump in six months. On an internal benchmark that asks models to optimize AI training code, a skilled human typically achieves a 4x speedup in 4–8 hours. Claude Opus 4 averaged around 3x in May 2025. Mythos Preview reached 52x this April.

Anthropic also tested whether Claude could replace a researcher's judgment midstream. They took transcripts of real research sessions where a human had made a wrong turn, and gave them to models before the mistake. They asked: what should happen next? Mythos Preview chose a better action than the human 64% of the time — up from 51% in November 2025.

Anthropic is careful about what the data implies. Their own post states: "None of this guarantees recursive self-improvement is on the horizon. It's not yet clear that Claude is capable of research judgment — of choosing the right problems to work on." The company adds that Claude-written code "was somewhat worse than human-written code at Anthropic in late 2025, is roughly at parity today, and we expect it to be strictly better within the year."

Despite those caveats, the post argues for institutional preparation. "We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development," it reads, while acknowledging the arms-control-style coordination needed has historically "took decades to build both the infrastructure and the trust. We don't have that long."

What it means.

The 8x productivity figure is output, not outcome. More code shipped per engineer is not the same as better software shipped per engineer. Anthropic acknowledges this, noting that code quality only recently reached parity with human-written work. The more interesting claim is the 52x speedup on training-code optimization — a task with a clear, measurable goal. That is harder to wave away. But it is one narrow benchmark. Anthropic itself names research judgment — deciding which problems are worth pursuing — as the missing piece. That is precisely the bottleneck the paper cannot claim has been crossed.

Nathan Lambert, one of the field's more careful observers, pushed back directly. "There are still serious bottlenecks in building the model that the agents don't address," he wrote — naming organizational constraints, compute access, and data access as factors that agent productivity gains don't touch. His conclusion: "we will see 'linear' gains for years to come." Jack Clark, one of the paper's authors, replied to Lambert that he sees the speedups as a "general trend" as delegation to agents increases — without disputing the structural bottlenecks Lambert named.

There is also a question of timing. Anthropic published data showing exponential acceleration of its own capabilities in the same week it filed to go public. That sequence is not inherently suspicious — the data is what it is — but the combination creates a narrative that serves both an IPO roadshow and a policy argument. Investors read "8x productivity, 80% AI-written code, potential recursive self-improvement" as a reason the company is worth $1 trillion. Policymakers read the same document as a warning that global coordination is urgent. Anthropic is writing to both audiences at once.

Reactions

AnthropicAI (June 4, 27,804 likes, 17.5M views):

"Our internal data shows Claude is accelerating AI development — a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It's happening faster than we thought, and the implications deserve greater attention."

AnthropicAI (June 4, 1,876 likes, 493K views):

"None of this guarantees recursive self-improvement is on the horizon. It's not yet clear that Claude is capable of research judgment — of choosing the right problems to work on."

Nathan Lambert (@natolambert, June 6, 217 likes, 37,687 views):

"I still stand by this despite the recent Anthropic post. There are still serious bottlenecks in building the model that the agents don't address (organizational, compute, data access, etc). It'll take time to push through them and we will see 'linear' gains for years to come."

Jack Clark (@jackclarkSF, June 4, 28 likes, 1,973 views):

"yes, I think this is just a general trend: as AI systems get better, people will delegate to more of them, more of them will be run as agents, and generally many orgs will see this kind of speedup"

02  Microsoft Build: Seven MAI Models and Breaking from OpenAI

Microsoft Build: Seven MAI Models and Breaking from OpenAI

At Microsoft Build on June 2, Mustafa Suleyman announced seven new in-house MAI models — the first major public showcase of Microsoft's own frontier AI research. The flagship is MAI-Thinking-1, a 35-billion-active-parameter mixture-of-experts model that scores 97% on AIME 2025 and 53% on SWE Bench Pro, matching Anthropic's Opus 4.6 on the hardest coding benchmark. Alongside the models, Microsoft launched Frontier Tuning: a system that lets enterprises train MAI models on their own workflows using reinforcement learning. VentureBeat reported that Suleyman described Microsoft as "set free" from OpenAI to pursue superintelligence — a striking phrase from the company that has bet more than $13 billion on OpenAI.

Read more

For three years, Microsoft's public AI identity ran on one fact: it backed OpenAI. Every Copilot, every Azure AI service, every GitHub suggestion came through GPT. On June 2, at Microsoft Build in Seattle, that framing shifted.

Mustafa Suleyman, Microsoft's EVP and CEO of Microsoft AI, announced seven models built by Microsoft's own research division. No OpenAI. No distillation from external models. A 109-page technical report — Suleyman called it "six months of super intense and outstanding work" — went out the same day.

The flagship is MAI-Thinking-1. Key numbers from Suleyman's launch thread:

  • 35 billion active parameters, mixture-of-experts architecture, 256K context window
  • 97% on AIME 2025 — the math competition benchmark used to measure general-purpose reasoning
  • 53% on SWE Bench Pro, placing it alongside Anthropic's Opus 4.6
  • Preferred over Sonnet 4.6 in blind side-by-side evaluations run by independent human raters on Surge
  • Optimized for Microsoft's own MAIA 200 chip: 30% better performance per dollar than NVIDIA's GB200, and 1.4x better performance per watt

The rest of the family spans every modality. MAI-Code-1-Flash reaches 51% on SWE Bench Pro from just 5 billion active parameters — "closer to Haiku in size but cheaper in cost," especially tuned for VS Code and GitHub Copilot CLI. MAI-Transcribe-1.5 claims top accuracy across 43 languages, 5x faster than rival models, beating Gemini and OpenAI's transcription flagships. MAI-Image-2.5 ranks #2 on the image editing leaderboard, surpassing Nano Banana 2. MAI-Voice-2 offers fine-grained emotional control across 15 languages.

The second big announcement was Microsoft Frontier Tuning. Suleyman's framing: "It's time to move from renting intelligence to truly controlling your AI." The mechanism is reinforcement learning environments — what Suleyman calls "training gyms for AI" — where models learn from a company's actual workflows, standards, and data rather than from generic pretraining. The adapted model belongs to the enterprise.

Two case studies grounded the pitch. Applied to McKinsey's tasks, a Frontier-tuned MAI model outperformed GPT-5.5 on quality while costing 10x less. Applied to Microsoft's own Excel use case, the tuned model matched GPT-5.4 while being up to 10x more efficient.

Suleyman closed his launch tweet: "Our announcements today mark another milestone on the road to humanist superintelligence." VentureBeat reported his off-stage framing more bluntly: Microsoft, he said, was "set free" from OpenAI to pursue that goal. A collaboration with Mayo Clinic to jointly train a frontier healthcare model was also announced at Build.

What it means.

Microsoft's OpenAI partnership is not ending. Azure still distributes GPT models. GitHub Copilot still runs on them. But Build 2026 is the first time Microsoft asked the world to evaluate it as a model lab — not only a model distributor.

The cost logic makes the strategy legible. A MAI-tuned model matching GPT-5.4 at 10x lower cost is a direct argument for every Azure customer who currently pays OpenAI rates. Frontier Tuning tightens this further: by using enterprise-specific RL environments, the models improve on each customer's actual data rather than on generic public benchmarks. Microsoft is already embedded inside those workflows — through M365, Teams, Dynamics 365, Azure. VentureBeat quotes Suleyman citing 493 of the Fortune 500 on Azure. Frontier Tuning is the mechanism that converts that installed base into proprietary model performance. The customer's data becomes a moat — and Microsoft holds the training infrastructure.

Suleyman's "hoovering up all the obvious pools of training data" comment, as reported by VentureBeat, frames the broader thesis. The first AI wave ran on the public web. That data is largely exhausted; its use is contested in court. The next wave runs on enterprise workflows. If that's right, the company most embedded in enterprise software has a structural advantage in the next phase of model training — regardless of who won the public pretraining race.

What we don't yet know: how MAI-Thinking-1 performs on benchmarks Microsoft didn't select, and whether the McKinsey and Excel efficiency gains replicate across different tasks at scale. The 109-page technical report is the place to check those claims.

Reactions

No authority-list reactions found.

03  The Token Bill Comes Due: Enterprises Scramble to Control AI Costs

The Token Bill Comes Due: Enterprises Scramble to Control AI Costs

Uber burned its entire 2026 AI coding budget by April — three months into the year. Microsoft revoked Claude Code licenses from most developers. One unnamed company accumulated a $500 million Anthropic bill by failing to set any usage limits. Developer token consumption rose 18.6x in nine months. The phrase inside enterprise AI teams shifted, almost overnight, from "tokenmaxxing" to "guardrails."

A concurrent Bain survey of 951 companies shows the savings gap is structural, not temporary. Nearly 40% of companies achieved less than 10% in AI cost savings despite targeting 11–20%. Only 7% run fully autonomous AI agents — the level of automation that most business cases assumed.

Read more

For most of 2025, enterprise AI strategy had one gear: forward. Developers called it "tokenmaxxing." Push the tools hard, measure productivity later. Developer token consumption rose roughly 18.6x in nine months. Goldman Sachs projects global token usage will multiply 24x by 2030.

The invoices have started arriving. Uber exhausted its entire 2026 AI coding budget by April — three to four months into the year. It has since capped coding agents at $1,500 per employee per month per tool. Microsoft revoked Claude Code licenses from most developers. Priceline's AI contract renewal came back 4–5x more expensive than the prior cycle. "It's like the crack-cocaine epidemic," said Chris Reed, Priceline's senior director of IT finance. "They let you try it to get you hooked on it…" One unnamed company accumulated a $500 million Anthropic bill after setting no usage limits at all.

The productivity data is real — and paradoxical. Jellyfish found that heavy AI tool users were roughly twice as productive but consumed ten times more tokens. A Faros AI study tracking 20,000 developers found output rising alongside increases in bugs and rewrites. "One of my engineers spent $40,000 on tokens last month, and I genuinely don't know whether I should stop him…" a CTO told Faros AI CEO Vitaly Gordon. J.R. Storment, executive director of the FinOps Foundation, named the shift directly: "The whole conversation shifted from tokenmaxxing and 'go fast' to 'we need guardrails, how do we control this?'" Most of the 180 vendors within the FinOps Foundation are moving toward token cost management. A Tokenomics Foundation is launching under the Linux Foundation in July 2026 to set industry standards for token cost governance.

A separate Bain survey of 951 companies shows the savings gap is not just a spending-controls problem. Nearly 40% of companies achieved less than 10% in AI cost savings despite targeting 11–20%. Only 14% exceeded 21% in savings. The single biggest barrier: data access, cited by 41% of respondents. Most striking is the automation gap. Only 7% of companies run fully autonomous AI agents — the level most business cases assumed. The most common real-world pattern: 38% require human approval for every agent action.

What it means.

Token pricing is structurally unlike every enterprise software model that came before it. A SaaS seat costs the same whether the employee uses it all day or leaves it idle. A token bill scales with every prompt, every agent loop, every automated review. Companies built their AI business cases on flat-cost analogies. They discovered they had signed up for a metered taxi with no fare ceiling. The new monitoring tools — Ramp, Datadog, New Relic, AWS, and the incoming Tokenomics Foundation — are all versions of the same thing: a meter on the taxi.

The Jellyfish finding — 2x productivity, 10x token consumption — is the number that will define enterprise AI procurement for the rest of this year. Whether that ratio works depends on what the productivity is worth. A developer solving an $80,000 problem with a $4,000 token bill: yes. A developer running agent loops that produce code requiring rewrites: no. Most companies do not yet have the measurement infrastructure to know which situation they are in. The Faros AI data — output up, bugs up — suggests not all the productivity gain is real.

The Bain automation gap is the sharpest finding. Companies built ROI cases around fully autonomous agents. Only 7% actually run them. Nine out of ten companies plan to increase AI investment anyway. That is not irrationality — it is a bet that the deployment problems are fixable. But the gap between assumed automation and realized automation means most of the expected ROI is still theoretical. Companies treating AI deployment as a software install will keep missing their savings targets. Companies treating it as a process redesign first will not.

Reactions

Simon Willison (@simonw, June 3, 613 likes, 697,044 views):

"Uber reportedly now caps coding agents at $1,500/month per employee per tool - seems sensible to me, but it's also an interesting hint at the value Uber thinks these tools are providing"

Erik Bernhardsson (@bernhardsson, May 31, 283 likes, 29,335 views):

"My opinion on tokenmaxxing is companies shouldn't mandate/constrain any tools at all and then evaluate software developers by output / (salary + token use)"

04  Alphabet Raises $85B and Books SpaceX Compute at $920M a Month

Alphabet Raises $85B and Books SpaceX Compute at $920M a Month

Alphabet launched an equity offering that came back well over-subscribed. By Wednesday, Sundar Pichai confirmed: $45B raised immediately, another $40B via an "at the market" program starting Q3 — total $85B, including $10B from Berkshire Hathaway. Two days later, Alphabet signed a deal to pay SpaceX $920M a month from October 2026 through June 2029 for bridge compute capacity. Alphabet already committed over $180B in 2026 capital expenditures. That it also needs to lease from a rocket company suggests demand is running ahead of what even the largest infrastructure players can build.

Read more

In late May, SpaceX signed a deal with Anthropic: $1.25B a month in compute access through 2029. A week later, Alphabet followed with a similar arrangement — $920M a month, October 2026 through June 2029. Both companies are paying SpaceX, a rocket company, for GPU clusters.

The Alphabet deal runs 33 months — roughly $30.4B total. It covers approximately 110,000 NVIDIA GPUs, CPUs, and related memory — about 150 megawatts of compute. That is roughly half the capacity Anthropic secured. Google's contract includes an escape clause: either party may terminate with 90 days' notice after December 31, 2026. If SpaceX fails to deliver committed GPUs by September 30, 2026, Google may terminate the contract — but only after a one-month grace period.

Google's stated reason is direct. A company representative called it "This is a short-term, timely agreement to ensure we have bridge capacity to meet surging customer demand for our agent platform, Gemini Enterprise, which has been even higher than we expected." "Bridge" is the operative word. Google expects its own infrastructure to catch up. The cancellation clause is built for an exit.

The equity raise closed the same week. Sundar Pichai tweeted on June 3 that the offering was "well over-subscribed." Total: $45B raised immediately, with $40B more via an at-the-market program starting Q3. Final figure: $85B. Berkshire Hathaway committed $10B. The raise is earmarked for Alphabet's AI infrastructure build-out — a 2026 capex commitment already exceeding $180B, with 2027 expected to go higher.

What it means.

The equity raise and the SpaceX deal look like contradictions. One says we have capital. The other says we still cannot build fast enough. They are the same signal. Demand for AI compute is outrunning what companies can provision, even at $180B in annual capex and with $85B freshly raised. The Alphabet deal is evidence that the shortage bites at the top of the market, not just at the edges.

The Anthropic comparison offers a concrete pricing signal. Google secured roughly half the compute capacity at 73% of the monthly cost — implying a meaningfully higher price per unit of capacity than Anthropic paid a week earlier. Two large compute contracts with the same counterparty in consecutive weeks, at rising prices. SpaceX's negotiating leverage appears to be growing with each deal signed.

There is a structural angle worth noting. Google has held a significant stake in SpaceX as a longtime investor, and that stake is expected to exceed $100B in value after SpaceX's anticipated Nasdaq listing — framed as the largest IPO of 2026. Google signing a $30B recurring revenue contract with SpaceX one week before that listing is not only a compute decision. It also establishes long-term, creditworthy revenue for SpaceX right before the road-show. Whether that reflects aligned interests or a conflict depends on the governance scrutiny applied.

Reactions

No authority-list reactions found.

05  NVIDIA Ships Nemotron 3 Ultra: Open Model Goes Frontier

NVIDIA Ships Nemotron 3 Ultra: Open Model Goes Frontier

At Computex this week, NVIDIA shipped two open models that together reframe what "open source" means at the frontier. On June 4, Nemotron 3 Ultra arrived: a 550-billion-parameter, 55-billion-active Mixture-of-Experts model built for long-running agents, with up to 5x the inference speed and up to 30% lower cost versus other open frontier models. Three days earlier, Cosmos 3 launched as the first fully open omnimodel for physical AI — purpose-built for robots and autonomous systems. Both releases are fully open: weights, training data, and post-training recipes, under a permissive commercial license. NVIDIA also expanded the Nemotron Coalition to twelve AI labs co-developing open frontier models. For the first time, an open model can credibly anchor an enterprise agentic stack.

Read more

The architecture is the story. Most large models use a dense Transformer: every parameter fires on every token. Nemotron 3 Ultra uses a hybrid Mamba-2-attention stack with LatentMoE. Of 550 billion total parameters, only 55 billion activate per token. Mamba layers handle long sequences efficiently — Transformer attention costs quadratically more as context grows; Mamba does not. The result: a context window of up to one million tokens at roughly flat cost per token, and 5.9x the throughput of GLM-5.1-754B-A40B and 4.8x the throughput of Kimi-K2.6-1T-A32B on standard agentic workloads (8k input, 64k output). The model also includes Multi-Token Prediction layers that enable native speculative decoding — it drafts multiple tokens simultaneously, then validates them, cutting latency further.

The model was built for agents that work without a human in the loop. NVIDIA post-trained Ultra against leading agent harnesses including OpenClaw, Nous Research's Hermes Agent, and LangChain Deep Agents, among others. Bryan Catanzaro, NVIDIA's VP of Applied Deep Learning Research, described the target use case: Ultra is "designed for agentic problems where you have an AI that's trying to solve difficult tasks for you in an autonomous way even without you having to be fully in the loop." The model was post-trained via supervised fine-tuning, reinforcement learning, and multi-teacher on-policy distillation — meaning NVIDIA ran smaller teacher models alongside the main model during training to guide it on hard edge cases.

The release is genuinely open. Weights, training datasets (code, legal, specialized domains), and post-training recipes are all public on Hugging Face under the OpenMDW 1.1 license — permissive commercial use, no requirement to open-source derivative applications. Perplexity made it available to Pro and Max subscribers on June 5. One practitioner compared it directly against GPT-5.5 on a coding task: near-equivalent output, $0.051 versus $0.57 — a 10x cost difference.

Three days before Nemotron Ultra shipped, NVIDIA released Cosmos 3 — described as "the world's first fully open omnimodel with native vision reasoning, world and action generation." Cosmos 3 ships in Super (32B) and Nano (8B) variants and uses a MoT architecture that pairs an autoregressive reasoning tower with a diffusion-based generation tower. It can simulate physical environments, predict future world states, and generate synthetic training data for robot policies. It ranks first across seven physical AI leaderboards covering world generation (Artificial Analysis, PAI-Bench, Physics-IQ, R-Bench), robot policy (RoboLab), and industrial vision (VANTAGE-Bench, TAR). To anchor both releases, NVIDIA expanded the Nemotron Coalition to twelve labs — including Cursor, LangChain, MistralAI, Perplexity, Nous Research, and Thinking Machines — contributing to future model development. Nous Research celebrated by offering two free weeks of Ultra on its Nous Portal.

What it means.

The open-versus-closed model debate has been about capability. Closed models were simply better, so enterprises paid API prices. MoE efficiency reframes the question. If an open model can match frontier closed-model quality while running at one-tenth the per-token cost, the question for an enterprise shifts: not "can we afford to use it?" but "can we justify not running it ourselves?" That shift only works if the open model is genuinely competitive on quality and if the infrastructure to run it is accessible. Nemotron 3 Ultra tries to meet both conditions. Whether it succeeds depends on where it sits in real-world evals — not NVIDIA's benchmarks, but the tasks enterprises actually run.

The architecture signals something durable. Hybrid Mamba-Transformer MoE is not a one-off. It keeps total parameter count high — which drives capability — while keeping active parameter count low, which controls compute cost. Sebastian Raschka noted that Ultra "carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant. But everything is a bit bigger." The design pattern is becoming standardized. Whoever builds the best inference infrastructure for this class of model controls a large share of the economics of agentic deployment. That is exactly the infrastructure layer NVIDIA sells.

NVIDIA's deeper play is vertical integration at the software layer. Chips are NVIDIA's foundation but chips are fungible over a long horizon. Chips plus open frontier models plus training recipes plus a coalition of labs building on top — that is harder to displace. Nathan Lambert observed this week: "Nvidia, Ai2, Arcee, Gemma, GPT-OSS and a few others will be seen as saving American open AI." The Nemotron Coalition is NVIDIA's institutional structure for keeping that position. Cosmos 3 extends the same logic into physical AI: open omnimodel weights as the foundation layer for robotics, with NVIDIA's training compute underneath.

Reactions

Nathan Lambert (@natolambert, June 4, 688 likes, 53,574 views):

"We have another 65 page frontier model report from Nvidia to read."

Nathan Lambert (@natolambert, June 4, 215 likes, 15,163 views):

"It's been a great effort by the early and growing American open-model labs since last June to put the US much more back on the map. We were getting totally owned last June. Nvidia, Ai2, Arcee, Gemma, GPT-OSS and a few others will be seen as saving American open AI."

Sebastian Raschka (@rasbt, June 4, 659 likes, 40,606 views):

"And another open-weight release. Nemotron 3 Ultra has an ultra impressive capability:efficiency ratio! Design-wise, it carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant. But everything is a bit bigger."

06  Bots Passed Humans on the Web. Now Someone Has to Pay.

Bots Passed Humans on the Web. Now Someone Has to Pay.

On June 3, Cloudflare CEO Matthew Prince posted that bots have passed human traffic on the internet for the first time in history. Cloudflare Radar puts the split at 57.4% automated, 42.6% human, measured by HTTP requests. Prince had forecast this moment at the end of 2027, then revised to early 2027. It arrived over a year ahead of schedule. His framing for what comes next: "clearly it's going to be pay to crawl."

Read more

For most of the internet's life, a page request meant a person. Advertisers paid to reach those people. Publishers wrote to attract them. Analytics platforms measured "users" — a word that implied humans.

That assumption flipped on June 3, 2026.

Cloudflare Radar data shows bots and AI agents now account for 57.4% of HTTP requests worldwide. Humans account for 42.6%. Cloudflare CEO Matthew Prince confirmed the milestone: "bots have now passed human traffic online for the first time in the Internet's history." He had forecast this at the end of 2027, then revised to early 2027. The actual date is over a year ahead of his first estimate.

The speed of the flip traces back to request volume, not user count. A human might visit five websites before buying something. An AI agent researching the same question might crawl 5,000. Each page load counts as one request. As agents proliferate and take on more tasks, their request-per-task ratio overwhelms human browsing — even if humans remain a majority of actual users.

Prince treats "bot," "crawler," and "agent" as functionally the same label from Cloudflare's vantage point. The terminology depends on who is watching, not on what the traffic does.

Cloudflare has been building for this shift. It launched a crawler-gating platform in summer 2025 that lets site owners restrict AI crawlers and charge for access. Adoption so far is limited. Prince's stated direction for the web: "clearly it's going to be pay to crawl."

What it means.

The web's business model rests on human attention. Advertisers pay to reach people who might buy. Publishers produce content that attracts readers. Paywalls work because humans want to read. None of that transfers cleanly to a majority-bot web.

Bots don't click ads. They don't complete purchases. They don't subscribe. They can extract the semantic content of a page without the behavioral signals — dwell time, scroll depth, return visits — that advertisers currently pay for. If request volume diverges from attention, the metrics that underpin web economics start to break. A publisher writing to attract bots is not building an audience. It is building a training dataset.

"Pay to crawl" is the proposed fix: charge AI labs for content access the way wire services charge for syndication. The infrastructure to do this exists, at least partially. Who captures the toll — the CDN layer, the content creator, or some intermediary — is still unresolved. One caveat worth carrying: Cloudflare's data measures HTTP request volume. It does not measure time spent, purchase intent, or economic value. The bot majority is real. Whether it shifts who earns money from the web, and how fast, remains the open question.

Reactions

Matthew Prince (@eastdakota, June 3, 8,210 likes, 2,148 retweets, 2.16M views):

"Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history."

No authority-list reactions found.

07  OpenAI Ships Sites and Role Plugins: Codex Is No Longer Just a Dev Tool

OpenAI Ships Sites and Role Plugins: Codex Is No Longer Just a Dev Tool

Between June 1 and June 4, OpenAI shipped five connected moves that reframe what Codex is. Sites lets any Codex user turn a plan, dashboard, or idea into a deployed web app at a shareable URL — no code written directly. Six role-specific plugins connect Codex to 62 apps and 110 skills across sales, data analytics, creative production, product design, public equity investing, and investment banking, installed in one step. Frontier models and Codex went generally available on Amazon Bedrock, routing enterprise procurement through the AWS security stack. GPT-Rosalind gained capabilities for drug discovery at enterprise scale. And a new ChatGPT memory architecture built on "Dreaming" nearly doubled recall in internal benchmarks.

Codex started as a tool developers used to write code faster. This week's releases position it as the interface for knowledge work across every department.

Read more

OpenAI opened the week with distribution. On June 1, OpenAI frontier models and Codex became generally available on Amazon Bedrock. Large organizations can now access OpenAI capabilities through the AWS security, compliance, and governance workflows they already use. The announcement named Daybreak — OpenAI's AI cybersecurity capability — as a future addition to the same AWS channel.

The headline launch came the next day. Sites lets Codex turn a user's ideas into a deployed interactive app at a shareable URL. No code is written directly by the user. Rohan Varma, who led the project at OpenAI, detailed the technical shape: each Site deploys to a `[project-name].[workspace-slug].chatgpt-teams.site` URL, is private to the team workspace by default, ships with authentication, supports static file hosting, and stores dynamic data in databases. Codex runs the site locally first, tests it there, then pushes. Deployment partners — Vercel, Cloudflare, Netlify, Lovable, and Replit — have built Codex plugins for teams that prefer existing infrastructure. Custom domains are coming. Sites is currently in preview for Business and Enterprise plans; broader rollout follows.

That same day, six role-specific plugins arrived for Codex. Each installs in one step, no coding required, and gives Codex specialist context for that role. Sales teams connect to Salesforce and HubSpot. Analysts connect to Snowflake and Tableau. Creative teams connect to Figma and Canva. Investors connect to PitchBook and FactSet. Total coverage: 62 popular apps, 110 skills across six work domains. On Wednesday, GPT-Rosalind gained new capabilities — combining GPT-5.5's agentic coding and tool use with stronger intelligence for drug discovery, analysis, design, and experimental workflows.

Thursday's move addressed memory. The ChatGPT "Dreaming" architecture — which synthesizes user context across conversations — became the core memory layer for ChatGPT Plus and Pro users in the US. OpenAI published internal benchmarks: recall climbed from 41.5% to 82.8%, preference alignment from 31.4% to 71.3%, and staying current with recent user context from 9.4% to 75.1%. Free and Go tiers follow in coming weeks.

What it means.

The five launches add up to something specific. A non-technical user can now describe a project tracker, a client dashboard, or a CRM. Codex will build it, deploy it to a URL, and wire it to the company's existing data systems through the role plugins. The memory update means conversational context carries forward across sessions rather than resetting. This is the combination no-code platforms have been promising since at least 2019: build, connect, and remember — without a developer in the loop. OpenAI is delivering it inside a product that the company says has over five million weekly users, with non-developers joining at three times the rate of developers.

The AWS channel matters for a different reason. Enterprise software decisions stall on procurement and compliance, not capability. Amazon Bedrock removes a major friction point for large accounts. It also establishes a distribution path for Daybreak — OpenAI's forthcoming cybersecurity tool — before that product reaches most users. Platform businesses get built this way: solve the access problem first, ship the product through the open channel.

But honest unknowns remain. Sites is in preview. The Dreaming benchmarks are OpenAI's own, run on test sets they designed — the kind of numbers that tend to compress in daily use. And the role plugins ship with well-known brand names, but the actual quality of each integration is unproven at scale. The relevant question is whether Codex replaces something teams currently pay for separately, or becomes one more tool they reach for occasionally. That answer requires months of production use across real workflows.

Reactions

Sam Altman (OpenAI CEO, June 4, 5,475 likes, 716K views):

"big upgrade to chatgpt memory rolling out today!"

Webflow (launch partner for Sites, June 2, 577 likes, 127K views):

"Excited to partner with @OpenAI Sites as we help bring the next generation of web creation to life."

Slack (integration partner, June 3, 3 likes, 1.6K views):

"62 apps and 110 skills is a serious toolbox. Love seeing the ecosystem expand so folks can connect the dots without switching context. Excited to be part of it."

No independent authority-list reactions found on these announcements.

08  Supabase Doubles to $10B: the Vibe-Coding Unicorn

Supabase Doubles to $10B: the Vibe-Coding Unicorn

Supabase, the open-source Postgres platform, raised $500M at a $10B pre-money valuation in a Series F led by GIC. That doubles the company from $5B just eight months ago. The number that explains the speed: more than 60% of new Supabase databases are now launched by AI tools — not by humans typing commands. Bolt, Figma, Lovable, and Replit all route agent-created backends through Supabase. CEO Paul Copplestone used the announcement to publicize an unusual equity structure: a 25% cashless liquidity option at every funding round, and a 10-year window to exercise options, whether they stay or leave — against an industry default of 90 days.

Read more

Supabase started as an open-source alternative to Firebase, built on standard Postgres. For years that looked like a commodity position. Every major cloud provider offered managed Postgres. Then vibe-coding arrived, and the math changed.

The platform now hosts nearly 10 million developers — a figure that doubled in eight months. Database launches grew 600% year over year. More than 60% of those launches come from AI tools, not humans. Copplestone says Claude Code and Codex "expand the number of people who can build" — and those new builders land on Supabase. Bolt, Figma's prototyping tools, Lovable, and Replit all use Supabase as their preferred database backend.

The Series F brings in GIC as lead. Stripe returned for a second investment. Georgian and Salesforce Ventures joined as new backers. All prior investors returned. The $500M values Supabase at $10B pre-money — up from $5B in October 2025 and $2B in the round before that. The company operates in 50+ countries with a fully remote workforce.

Alongside the raise, Supabase published Multigres v0.1 alpha: an open-source layer that manages read replicas, failovers, and connection limits for Postgres at scale. Production readiness is targeted within months. Copplestone has also declined multimillion-dollar enterprise contracts that would have required compromising the product roadmap — an unusual call at decacorn scale.

The equity announcement drew more attention than the valuation. At every funding round since inception, Supabase has given employees the chance to sell 25% of their vested options as a cashless transaction. Employees do not need to front money to exercise — the cost is absorbed in the transaction structure. Supabase also gives employees a 10-year window to exercise options, whether they stay or leave. The standard startup default is 90 days. Copplestone's reasoning: "equity is earned and employees shouldn't be penalized because they don't have the cash to exercise within 3 months of leaving a job (often that's the time they need the cash/certainty the most)."

What it means.

The 60% figure is the story's center of gravity. When a majority of new databases on a platform are launched by agents rather than humans, demand for that infrastructure is structural — not a wave to ride but a pipe to be built into. Supabase is embedded in the output pipelines of Bolt, Lovable, Replit, and Figma. When those tools generate an app, Supabase gets a database. The AI coding wave does not need to keep accelerating for Supabase to keep growing. It just needs to keep running.

This is also the new argument for open-source infrastructure at the agent layer. The traditional objection — if it is free to self-host, where does revenue come from? — loses force when the users are autonomous tools running at scale. A vibe-coding agent scaffolding an app in 30 seconds is not choosing between Supabase and self-hosted Postgres. It is choosing between Supabase and nothing. The platform that earns the trust of today's AI tools gets embedded in whatever stack those tools generate next.

The equity structure is a separate signal. At $10B with no IPO timeline, Supabase employees hold options in a private company with no obvious exit. The cashless exercise and the 10-year post-departure window shift that risk back toward the company and its investors. Copplestone framed this as a deliberate design choice — and published the reasoning publicly. That framing applies pressure. Other late-stage private companies will now be asked why they have not done the same.

Reactions

No authority-list reactions found.

09  Meta's AI Support Agent Gets Social-Engineered — No Prompt Injection Needed

Meta's AI Support Agent Gets Social-Engineered — No Prompt Injection Needed

Meta's Instagram support agent could change account email addresses and trigger password resets. Attackers found it would do both for anyone who asked. They used a VPN to appear local, typed a request to bind their email to a target account, and the bot sent them a one-time code. That code unlocked a password reset. No stolen credentials. No malware. No prompt injection — no need to trick the AI into ignoring its instructions. The attack worked because the agent did exactly what it was built to do. The Obama White House Instagram — dormant since 2017 — was taken over and used to post pro-Iran content. Sephora and the U.S. Space Force chief enlisted officer's account followed. Hundreds more fell over the same weekend before Meta patched. Users who lost accounts found no human to call: account recovery at Meta runs through the same kind of chatbot that had just given their accounts away.

Read more

Meta built its AI support agent to solve a hard problem: help people recover accounts they can no longer access. That requires real authority. The agent could link a new email address to an account, send a verification code, and complete a password reset. It had no reliable way to check that the person asking owned the account.

The attack sequence was four steps. An attacker picked a target Instagram username. They connected through a VPN matching that user's likely country — geolocation was the only proximity check in place. They opened the Meta AI support interface and asked, in plain language, for their email address to be linked to the target account. The chatbot sent a one-time code to the attacker-controlled inbox. The attacker submitted the code. The reset completed. Account gone.

SecurityWeek reports the system also accepted AI-modified selfies when selfie verification was triggered, suggesting the bypass extended to accounts with 2FA enabled. @DarkWebInformer, who first surfaced the exploit publicly on June 1, specified that the basic attack targeted accounts without multi-factor authentication.

Stolen accounts moved fast. Security researcher @Scot0xo reportedly discovered a closely related vulnerability and reported it to Meta's bug bounty program in early May. The technique then spread through criminal channels. By late May it had reached Telegram black markets for Instagram handles — short, rare "OG" usernames worth thousands. ZachXBT confirmed that these channels made significant money over the weekend. Valuable handles like @hey and @jowo, reportedly worth a combined seven figures, were among the first to go.

High-profile takeovers followed. The Obama White House Instagram was seized and used to post pro-Iran content. The account of John Bentivegna, Chief Master Sergeant of the U.S. Space Force, was compromised, as was Sephora's corporate account. Meta says no backend systems were breached — the damage happened entirely at the agent layer. Meta patched on or around June 1 and temporarily removed the AI "Get Support" button from its front end. Total accounts affected has not been disclosed.

People who lost their accounts had nowhere to escalate. Instagram's recovery flow is also handled by the same AI pipeline. Users who had owned rare handles for over a decade were left talking to another chatbot.

What it means.

Most AI security research in the past two years has focused on prompt injection: an attacker embeds malicious text into the AI's context — inside a document, a webpage, an email — and the model follows those hidden instructions instead of the legitimate user's. MIT Technology Review frames this incident explicitly as a different threat. No injection. No subverted instructions. The chatbot was asked to do something it was designed to do. It did it.

The underlying problem has a name. Dan Moore of identity platform FusionAuth called it in SecurityWeek: "This is a great illustration of why AI agent authorization is the harder, and more critical, problem than authentication." Authentication asks: who are you? Authorization asks: are you allowed to do this action on this account? Meta's agent performed no meaningful authorization check. Somesh Jha of the University of Wisconsin–Madison put the gap plainly in MIT Technology Review: "A human would say, 'Okay, why do you want to change the email address?' and maybe respond with a security question. What is going on with these agents is they're very eager to finish the task. It's almost like some elementary school student who just wants to please the teacher." That eagerness is not a bug. It is a design property. Friction and doubt have to be deliberately built in.

For enterprise teams deploying agents in support, finance, or HR workflows — anywhere an agent can take an irreversible action — the lesson is the same: capability and verification requirements must be coupled. The higher the stakes of the action, the stronger the proof of identity required before execution. Meta gave its agent the authority to take over any account on a billion-user platform. It built no proportionate verification floor. Neil Gong of Duke University told MIT Technology Review: "As AI becomes more and more widely used…I think attackers are going to be more and more motivated to attack AI itself." The Meta incident is the first large-scale demonstration of that dynamic at consumer scale.

Reactions

Dark Web Informer (@DarkWebInformer, June 1, 2,488 likes, 221,290 views):

"Instagram had an exploit that allowed you to use Meta AI to reset passwords to accounts with no MFA on them. The exploit was patched a short time ago."

ZachXBT (@zachxbt, June 1, 1,068 likes, 554,840 views):

"[Replying to @wirelyss] …Basically the Meta AI support is garbage and has lots of access perms which allowed you to reset passwords to any user without 2FA and did not verify who you are. Telegram channels on Instagram offering IG black market services made lots of $$$"

No authority-list reactions found from named researchers or executives on this story.

10  Trump Signs Voluntary Pre-Release Review for Frontier AI Models

Trump Signs Voluntary Pre-Release Review for Frontier AI Models

President Trump privately signed an executive order on June 2 asking AI companies to let the US government review their most powerful models before public release. Companies have up to 30 days to comply. There is no requirement to do so. An earlier draft had required 90 days of access and made participation mandatory. Industry lobbying stripped both conditions. The order's own text states explicitly that nothing in it "shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement." Anthropic called it 'an important step in strengthening America's leadership in AI'; Sam Altman wrote that 'the new EO gets the balance right.' Both responded within hours of signing.

Read more

Trump signed the order privately on June 2 — no public ceremony, no tech CEO photo-op. Its full title: "Promoting Advanced Artificial Intelligence Innovation and Security."

The mechanism is narrow. Developers of frontier AI models can invite the US government to review their systems for up to 30 days before public release. The review focuses on cybersecurity: whether a model can find or exploit software vulnerabilities. Participation is voluntary. The order says directly that it creates no "mandatory governmental licensing, preclearance, or permitting requirement."

Three institutional pieces accompany the voluntary window. The Treasury Secretary must stand up an AI Cybersecurity Clearinghouse — a coordination hub for vulnerability scanning and remediation between government and industry. The NSA will develop a classified process to benchmark the cybersecurity capabilities of frontier models, sharing assessments with developers "as appropriate." The Attorney General must prioritize prosecution of AI-enabled cybercrimes under existing criminal statutes.

What arrived June 2 was a reduction. Less than two weeks earlier, the White House had been preparing a version that required companies to submit models for 90 days and made the process mandatory. Industry objected. The window was cut to a third, the mandate removed. Trump's stated reasoning: ["We're leading [China], we're leading everybody, and I don't want to do anything that's going to get in the way of that lead."](https://www.scmp.com/news/us/economy-trade-business/article/3355751/trumps-ai-order-seeks-security-safeguards-without-slowing-race-china)

Both major frontier labs said yes immediately. Anthropic called the order "an important step in strengthening America's leadership in AI" and committed to helping implement it. Sam Altman wrote that "the new EO gets the balance right."

What it means.

The word "voluntary" is doing most of the work here. Anthropic and OpenAI said yes within hours. Both companies have close ties with this administration and strong incentives to cooperate. The order does not test whether a lab that wants to avoid scrutiny would submit anyway — it creates no pressure to do so. "Regulation by invitation," one observer put it, "changes nothing about the power dynamic."

The cybersecurity framing is specific, and that specificity matters. This is not a general AI safety review. The government's concern is narrower: frontier models are now capable enough at finding and exploiting software vulnerabilities that the NSA wants to assess them before they ship. That is a tractable, well-defined problem. The AI Cybersecurity Clearinghouse and the classified NSA benchmarking process are where real institutional capacity gets built — whether or not the voluntary review window sees heavy use. Governments have to know how to evaluate these systems before they can regulate them. The NSA is now building that capability.

The gap between the original draft and the final text is a measure of where power sits in US AI policy negotiations today. A 90-day mandatory review became a 30-day invitation. That distance was covered by industry lobbying in under two weeks. The direction still marks something new: this administration, which began by dismantling its predecessor's AI oversight framework, has now created its own — however light. Which labs choose to participate, and on what terms, will tell more than the text does.

Reactions

AnthropicAI (June 2, 2,506 likes, 316,935 views):

"This Executive Order is an important step in strengthening America's leadership in AI. We look forward to collaborating with the White House to support its implementation."

Sam Altman (June 3, 2,736 likes, 306,689 views):

"theUSshould lead on AI by continuing to develop the very best models, making sure they're safe, and getting cyber tools into the hands of trusted defenders.

the new EO gets the balance right."

11  Who Builds Software Now: Executives, YouTubers, and a New Job Title

Who Builds Software Now: Executives, YouTubers, and a New Job Title

On May 31, Guillermo Rauch — CEO of Vercel — posted that public company executives are DMing him to say they have fallen back in love with coding, thanks to Claude Code and Vercel. His own hedge: "unclear if a durable trend." Hours later, Felix Kjellberg — PewDiePie, 110 million YouTube subscribers — launched Odysseus: a self-hosted AI workspace built on OpenCode, with email triage, calendar sync, agent memory, and a hardware scanner that tells you which local models you can run. It crossed 10,000 GitHub stars in its first day and 60,000 by the end of the week. The next day, Andrew Ng published a widely-read essay on why AI Engineer jobs will far outnumber FDE roles — not the AI Forward Deployed Engineer that OpenAI and Anthropic are now hiring.

Three signals in 72 hours. Same direction: the floor for building software dropped, and it dropped across the whole org chart at once.

Read more

Executives return to code. Rauch's post described "dream accounts" that infrastructure vendors used to wait years to reach — companies whose C-suites historically didn't understand the stack until well into a relationship. Now those executives are finding tools themselves and building. His frame: "Coding agents are the ultimate PLG-fication of the enterprise. Bad, legacy software can't hide anymore. The stack that works is self-evident to the entire organization, from intern to CEO." The post got 377,000 views and nearly 200 replies debating whether this was a real behavioral shift or a vendor observation dressed as a social trend.

Rauch does not claim all executives are coding. He is specific: technical CEOs and CTOs who had coding skills and then gave them up to manage people are picking them up again. The agent lowers the activation energy. You still need the prior knowledge. What you no longer need is hours of environment setup and context switching.

A YouTuber ships a competitive product. On the same evening, Kjellberg launched Odysseus. His description: "a self-hosted interface for talking to language models," with chat, autonomous agents, tools, model serving, email, research, and more. The GitHub repository shows Python and JavaScript, MIT licensed. Features include local model support via Ollama, llama.cpp, and vLLM; an email assistant with IMAP/SMTP integration; CalDAV calendar sync; persistent agent memory; and a "Cookbook" that scans your GPU and RAM and recommends which models will actually run on your hardware. Swyx described it as a "vibecoded OpenCode wrapper that is a complete personal AI productivity suite." Kjellberg has said: "The more you share of yourself with AI, the better it becomes. But the more you do that, the more you are handing a huge piece of yourself to all these giant tech companies."

The launch topped Hacker News and crossed one million views within hours. GitHub stars: more than 10,000 in its first day, 55,000 by June 5, 60,100 by June 7. By day five the repo had 900 commits and 170 contributors. Kjellberg built much of it using AI.

The job map. On June 1, Andrew Ng published an essay from The Batch newsletter about the AI Forward Deployed Engineer (FDE). The role is not new. Palantir pioneered it about two decades ago, embedding engineers at government clients in air-gapped networks. It is now resurgent because customizing a general LLM into a working agentic workflow for a specific business is hard, and someone has to do it on-site. OpenAI and Anthropic are both building FDE teams. But Ng's main argument was that AI Engineer jobs will far outnumber FDE roles. Most companies will want their own employees building AI capability, not a handful of vendor-locked consultants. And in a moment when no one can predict which AI stack will lead next year, preserving optionality matters. Ng expects the AI Engineer role to fragment over time — as the generic "Software Engineer" fragmented into frontend, backend, mobile, and devops. The post got 540,000 views and 731 retweets.

What it means.

Rauch's observation and the Odysseus launch usually appear in separate feeds. They belong in the same frame. The executive returning to code and the YouTuber shipping a productivity suite are different people doing the same thing: using agents to reach a capability level that was previously out of reach. The structural fact underneath both stories is the same. Building software got cheaper, faster, and more accessible in a way that has now reached both ends of the spectrum — the C-suite of a public company and a creator's personal project.

There is one meaningful difference. Rauch is explicit that the returning executives have prior technical knowledge. The coding agent restores something latent. Kjellberg's project suggests something different: a person with no published software engineering background shipped a product that competes on features with funded AI workspace startups. He reached 60,000 GitHub stars faster than most developer tools ever do. Swyx's framing — "if your Knowledge Work Agents startup can't beat PewDiePie you might as well pack up and go home" — stings precisely because the thing PewDiePie built is not a toy. It has MCP support, CalDAV sync, and hardware-matched model selection. It is a serious piece of software.

Andrew Ng's job map lands in this context not as prediction but as calibration. The demand for people who can build with AI is widening faster than the supply of people trained to do it. The FDE role addresses that gap at one end: a specialist embedded in a specific client. The AI Engineer addresses it at scale. What Odysseus suggests is that "building with AI" will keep democratizing past both of those roles — into a population that was never going to get an engineering degree and never needed one.

Reactions

Guillermo Rauch (@rauchg, May 31, 1,460 likes, 376,865 views):

"Unclear if a durable trend, but CEOs and CTOs are back to coding with a fury, thanks to coding agents. I have public company CEOs sliding into my DMs (and 'InMail') telling me about falling in love with shipping software again thanks to Claude Code and Vercel. 'Dream accounts' that we always wanted to work with, where in the past the C-suite would hardly understand the infrastructure until much later in the game. Coding agents are the ultimate PLG-fication of the enterprise. Bad, legacy software can't hide anymore. The stack that works is self-evident to the entire organization, from intern to CEO."

swyx (@swyx, June 1, 271 likes, 31,270 views):

"just a small zoom out on the vibe shift: in Feb 2025 @soumithchintala was talking about his dream of personal, local, private agents, most people didn't believe him. it's June 2026 and @pewdiepie has just released his vibecoded @opencode wrapper that is a complete personal AI productivity suite including email, docs, and calendar. top of HN, easily >1m views, >10k stars in a day. if your Knowledge Work Agents startup can't beat pewdiepie you might as well pack up and go home at this point, his is the benchmark for what you can DIY."

Andrew Ng (@AndrewYNg, June 1, 4,447 likes, 539,866 views):

"One of the new, buzzy jobs in Silicon Valley is the AI Forward Deployed Engineer (FDE), an engineer who is embedded within a client organization to help customize solutions, such as building and tuning agentic workflows that suit the client's particular needs... However, I believe the number of AI Engineer jobs will be far larger. A company might accept a few FDEs to be embedded within its organization. But most companies will want far more of their own employees working on their projects."

Amjad Masad (@amasad, June 3, 808 likes, 63,459 views):

"Benchmarks place GPT 5.5 as the best model on SWE, but is it the best at making apps end-to-end? Turns out Opus 4.8 continues to be the king of vibe coding on both price & performance. Introducing ViBench: the first benchmark for app creation based on real world tasks"

12  Gemma 4 12B: Frontier Reasoning That Runs on a Laptop

Gemma 4 12B: Frontier Reasoning That Runs on a Laptop

On June 3, Google released Gemma 4 12B, an open-weights model with 11.95 billion parameters that fits in 16 gigabytes of VRAM. The model handles text, images, video, and audio in a single architecture, with a 256,000-token context window and native function calling for multi-step agent workflows. It ships under an Apache 2.0 license: no usage fees, no data sent to the cloud. The release marks more than 150 million total downloads across the Gemma 4 family. Quantized versions already run on 8 gigabytes.

Read more

Google's Gemma 4 12B is a mid-sized open-weights model. It sits between the mobile-optimized E4B, built for phones and IoT devices, and the larger 26B mixture-of-experts model. The 12B fills the laptop-class gap — hardware that millions of enterprise developers already own.

The architecture is the story. Traditional multimodal models have separate components for each input type — a vision encoder, an audio encoder — that bolt onto the main language model. Gemma 4 12B removes those entirely. Vision and audio tokens flow directly into the main transformer. Google calls this "encoder-free." The practical effect: fewer modules to manage, a smaller total memory footprint, and a single model file that handles all modalities at once.

The model accepts text, images at variable resolution, audio clips up to 30 seconds, and video up to 60 seconds — all in the same prompt. Its context window is 256,000 tokens. It supports 140+ languages. It supports native function calling, which is the mechanism that lets models drive software tools — APIs, databases, code executors — rather than just generating text.

Benchmarks (instruction-tuned model): GPQA Diamond 78.8% — the graduate-level science exam that frontier models struggled on two years ago. AIME 2026 77.5% without tools. LiveCodeBench v6 72.0% for coding. MMLU Pro 77.2%. Google says its agentic performance nears the larger 26B model on 16GB VRAM — a claim that warrants independent replication.

Audio is the notable first. This is Google's first mid-sized Gemma model to handle audio natively. That said, transcription is not where it leads. Artificial Analysis ran it through their AA-WER benchmark — word error rate, where lower is better — and Gemma 4 12B scored 8.8%, ranking 58th. Voxtral Mini Transcribe 2, a 4-billion-parameter model built specifically for transcription, scores 3.6%. Breadth and specialization trade against each other.

Hardware floor. The official spec is 16GB VRAM. The community moved fast: Unsloth AI released quantized GGUF versions within hours, and the model now runs on 8GB of RAM. A 2-bit quantized version fits in 4.66 gigabytes on disk. Google also shipped two apps new to Mac alongside the model: Eloquent for dictation (also on iOS), and AI Edge Gallery for code generation — both running Gemma 4 12B fully on-device via LiteRT-LM. The model is available on Hugging Face, Ollama, LM Studio, Kaggle, and Docker.

What it means.

Two years ago a 12-billion-parameter model was barely useful for complex reasoning. Today, one scores 78.8% on graduate-level science questions, handles audio and video input, and runs on hardware a developer already carries. The small-model frontier is not standing still. Each generation, the capability threshold at which "runs locally" becomes a meaningful option drops further.

The encoder-free architecture is worth watching separately. It is a design choice, not a marketing point. Removing separate encoders means the model has one inference path instead of several — simpler to deploy, simpler to maintain, and easier to quantize. If this approach generalizes well across tasks, it will push other labs to re-examine whether modality-specific encoders are still the right design.

The audio transcription result is the honest check on the story. A 4-billion-parameter specialist beats an 11.95-billion generalist at transcription. This is not surprising — specialization usually wins — but it matters for buyers choosing between a general local model and a task-specific one. The right answer depends on whether you need one model to do everything, or whether you can tolerate a fleet of smaller specialists.

Reactions

Demis Hassabis (Jun 3, 3,158 likes, 644K views):

"Celebrating the milestone of a massive 150+ million downloads of Gemma 4 with the release of the new Gemma 4 12B model! It's incredibly powerful for such a small model and it's tiny enough to run locally on a laptop with just 16GB VRAM. Apache 2.0 license - happy building!"

Sundar Pichai (Jun 3, 4,987 likes, 410K views):

"Our new Gemma 4 12B model hits a sweet spot between size + performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can't wait to see what the community does with this one!"

Jeff Dean (Jun 4, 575 likes, 52K views):

"Check out our Gemma 4 12B model: it's a super capable open weights model that can run directly on your laptop."

Artificial Analysis (Jun 6, 463 likes, 41K views):

"Google's newly released open weights model, Gemma 4 12B, supports transcription but is far from the frontier, scoring 8.8% on AA-WER (#58) ... underperforms compared to transcription-focused open weights models like Voxtral Mini Transcribe 2 (3.6% WER, with 4B parameters) and slightly larger open weights language models like Voxtral Small (2.8% WER, with 12B parameters)."

13  DeepSeek Takes Its First Outside Money

DeepSeek Takes Its First Outside Money

DeepSeek is in final talks to raise approximately 50 billion yuan (~$7.4 billion) from outside investors for the first time, Reuters and Bloomberg reported on June 3. The round could value the Hangzhou lab at between 350 and 400 billion yuan ($52–59 billion) — roughly six times the ~$10 billion valuation discussed in April, according to South China Morning Post. Founder Liang Wenfeng is personally committing roughly 20 billion yuan — about 40% of the total. The two largest outside checks come from Tencent and CATL.

DeepSeek ran for three years on profits from its parent hedge fund, High-Flyer Capital, and refused external capital throughout. That posture changed this week. The investor mix — a social media platform, the world's largest EV battery maker, and a state-backed AI fund — is not a passive financial bet. It maps where the lab intends to go.

Read more

DeepSeek was founded in 2023 by Liang Wenfeng, who also runs High-Flyer Capital, a quantitative hedge fund based in Hangzhou. High-Flyer's profits funded the lab entirely from the start. DeepSeek published its models openly, charged minimal API fees, and took no outside investors — an unusual posture for a lab competing with OpenAI, Anthropic, and Google DeepMind.

The global profile changed in late 2024 and early 2025. DeepSeek released the V3 base model (December 2024) and the R1 reasoning model (January 2025) in rapid succession. Both performed at or near the top of public benchmarks against leading American models. Both were open-source. Training costs, by DeepSeek's own account, were a fraction of what US labs reported spending. Nvidia's stock fell sharply on the news. Washington policymakers who had assumed China was a generation behind in AI had to revise that view.

Eighteen months after that disruption, DeepSeek is accepting outside money for the first time. Reuters and Bloomberg reported on June 3 that the lab is in final talks to close approximately 50 billion yuan (~$7.4 billion) in its inaugural round. The post-money valuation is expected to fall between 350 and 400 billion yuan ($52–59 billion at ~6.75 yuan/dollar), per SCMP and TechNode. Fewer than ten investors are participating in total.

The capital structure breaks down as follows, per TechNode and SCMP: Liang Wenfeng personally contributes roughly 20 billion yuan — about 40% of the total. Tencent follows at 10 billion yuan. CATL commits 5 billion yuan. NetEase and JD.com are each reportedly contributing around 3 billion yuan. Venture firms IDG Capital, Monolith, Loyal Valley Capital, and Shixiang Tech hold smaller stakes. China's National AI Industry Investment Fund, a state-backed vehicle, is also in final talks to participate.

What it means.

The obvious question is: why take money now? DeepSeek was not short of runway. High-Flyer Capital is profitable, and the lab had no external obligations to meet. The most plausible answer is compute. Training frontier models requires large GPU clusters. US export controls have made high-end chips harder to procure in China. Even a lab that thrives on efficiency eventually needs more hardware than a trading firm's profits can sustain. Fifty billion yuan buys a significant amount of whatever is available to buy.

The investor selection is the more telling signal. Tencent is not a passive financial backer. It runs WeChat, with over 1.3 billion monthly active users, and has been racing to embed AI across its platform. A 10 billion yuan stake in DeepSeek is, in practice, a distribution relationship — it puts DeepSeek models where Tencent's users already are. CATL is the world's largest EV battery manufacturer. Its bet on DeepSeek likely points toward industrial AI: manufacturing automation, materials science, supply chain optimization. These are not chatbot use cases. The presence of a state-backed national AI fund alongside both private partners underlines that this round is as much about industrial policy as it is about financial return.

The valuation itself is the third signal. SCMP reports the $52–59 billion range represents roughly a six-fold increase from the approximately $10 billion figure discussed in April. That pace of re-rating — two months — reflects how fast institutional capital has reassessed Chinese AI since the R1 and V3 releases. DeepSeek had become the central exhibit for one argument: that frontier AI is possible without frontier capital structures. That argument is now more complicated. Even DeepSeek needs billions. And the six-fold valuation jump is the market putting a number on how much the reassessment has moved.

Reactions

No authority-list reactions found. The story broke via unnamed sources cited by Reuters and Bloomberg, and did not generate named commentary from major AI researchers or executives in the W23 window tracked.

14  Berkeley CS: Fail Rates Triple When the Exam Has No AI

Berkeley CS: Fail Rates Triple When the Exam Has No AI

In Spring 2026, 35.3% of UC Berkeley CS 10 students received an F. In prior years, the rate never exceeded 10%. CS 61A hit 10.6%, and EECS 127 hit 16.8% — against a department guideline of 7% combined D's and F's for lower-division courses. Teaching Professor Dan Garcia named the cause directly: a "vast increase in academic dishonesty" driven by LLM usage. The mechanism is not complicated. Students used AI to complete their homework. They then sat a closed-book exam on the same material. They were not ready.

Read more

UC Berkeley's CS department has grading guidelines. Lower-division courses should sit between a 2.8 and 3.3 GPA. No more than 7% of lower-division students should receive a D or an F. Spring 2026 blew past both numbers. CS 10 and CS 61A each averaged a C+ — 2.3 GPA. CS 10's F rate hit 35.3%. Previous semesters had not exceeded 10% in either course. EECS 127, an upper-division optimization course, hit 16.8%, against a historical figure closer to 5%.

Teaching Professor Dan Garcia, who teaches both CS 10 and CS 61A, named what he sees as the primary driver: "a vast increase in academic dishonesty" from LLM use. Nearly 30 CS 10 students were caught cheating on take-home exams in Spring 2026. But Garcia distinguishes between outright cheating and a subtler failure mode: students who used AI to complete assignments, earned adequate homework grades, and then hit a closed-book exam without the understanding to back it up. His phrase: "students who are leaning a little too hard on LLMs to do their work for them, and then at exam time just really aren't ready." He also noted that office hours — previously full — now draw almost nobody.

Associate Teaching Professor Gireeja Ranade, who teaches EECS 127, pointed to a different pressure: students arriving without linear algebra prerequisites. One student told her they had completed their linear algebra course under an "open-internet, open-AI policy." The UC Berkeley Mathematics Department told the Daily Cal it has no record of such a policy in its courses — though where exactly that student's prerequisite was taken is unclear. Ranade also removed the final project component from EECS 127 this semester due to understaffing.

The story sits inside a larger structural shift. EECS Chair Jelani Nelson confirmed that undergraduate CS enrollment and TA positions have both fallen, squeezed by high hourly wages. More than 1,300 UC faculty have signed a petition calling for reinstatement of SAT/ACT requirements for STEM admissions — a signal that some faculty see mathematically underprepared students as a pipeline problem, not just a tools problem.

What it means.

The mechanism here is clean and replicable. AI tools remove the friction from homework. Friction is how learning happens. Garcia approvingly quoted a colleague: "Confusion is the sweat of learning." When AI handles the confusion, students record a passing homework score and arrive at the exam having watched the work happen, not done it. The exam measures what the student knows. The gap is visible only then.

The more unsettling signal is upstream contamination. If students are completing prerequisite courses under open-AI conditions — whether sanctioned formally or informally tolerated — they arrive at Berkeley's upper-division courses with credentials that don't reflect capability. The Math Department's denial of any formal open-AI policy doesn't settle the question of what students actually experienced. It raises it. The credential says "linear algebra completed." The exam says something different.

Berkeley CS is one of the most selective and well-resourced undergraduate programs in the world. Its professors are alert, its grading systems are intact, and its failure rates still tripled. The question for any employer hiring CS graduates — from Berkeley or anywhere — is how to distinguish students who learned from students who watched AI learn for them. That distinction does not currently appear on a transcript.

Reactions

Sridhar Vembu (@svembu, co-founder and Chief Scientist of Zoho, June 4):

"AI can make you smarter faster but AI can also make you dumber faster. I would not encourage AI adoption too early by school or college students, until they learn the fundamentals right."

No other authority-list reactions found.

15  Anthropic Has Six Engineers Inside the NSA for Offensive Cyber Ops

Anthropic Has Six Engineers Inside the NSA for Offensive Cyber Ops

The Financial Times broke the story on June 4 that Anthropic has stationed roughly six engineers inside the National Security Agency. Their task: adapt Mythos — the company's most capable model, kept off the public market — for offensive cyber operations. One source told the FT that Mythos could be used to infiltrate networks in China and Iran. The NSA declined to confirm or deny. Anthropic did not respond.

The deployment sits alongside a public legal fight. In February, the Pentagon designated Anthropic a "supply chain risk" after the company refused to let its models support mass domestic surveillance and autonomous weapons with no human oversight. The NSA's Mythos access was explicitly carved out from that ban. Anthropic is suing the Department of Defense in two courts while its engineers work inside one of its agencies.

Read more

Anthropic released Mythos in April. It did not release Mythos publicly. The company said the model had to be controlled because its cybersecurity capabilities were too powerful — a model that strong at finding flaws could be used to carry out attacks. Instead, Anthropic built Project Glasswing: a controlled access program for vetted partners. On June 2, Anthropic expanded Glasswing to roughly 150 organizations across more than 15 countries. Partners have found more than 10,000 high- or critical-severity vulnerabilities since launch.

On June 4, the Financial Times reported something different. Anthropic has roughly six engineers forward-deployed inside the NSA. Their job is to adapt Mythos for specific operational uses. A source familiar with the arrangement said Mythos could be used to infiltrate networks in China and Iran. One thing remained unclear: whether the engineers are supporting active operations or limited to model customization. The NSA declined to confirm or deny. Anthropic did not respond to requests for comment.

The NSA relationship was not entirely new. In April, Axios had reported the NSA was already using Mythos — despite a federal ban on Anthropic technology. That ban came from Defense Secretary Pete Hegseth's February 2026 decision. He designated Anthropic a "supply chain risk" under the Federal Acquisition Supply Chain Security Act — the first time that designation had ever been applied to an American company. The reason: the Trump administration had demanded Anthropic allow its models to be used for "all lawful purposes." That phrase would have covered mass domestic surveillance and autonomous weapons operating without human oversight. Anthropic refused. The company filed two federal lawsuits in March. A California court issued a preliminary injunction in late March. A second designation remains in effect; the D.C. Circuit heard oral argument in May with no ruling yet. The NSA's Mythos access was explicitly exempted from both the ban and the litigation.

The same week stacked other headlines. Anthropic confidentially filed IPO paperwork with the SEC. Anthropic published a paper calling for a global pause in AI development. And Anthropic's engineers were inside the NSA, adapting an offensive AI model against foreign networks.

What it means.

The line Anthropic drew is real. Domestic mass surveillance of American citizens: no. Fully autonomous lethal weapons: no. Offensive cyber operations against foreign adversary networks: yes. These are not the same thing. The company refused the DoD demand because it would have enabled uses that could harm Americans at scale. The NSA arrangement targets foreign networks. By Anthropic's own framing, the distinction holds.

But where you draw the line is the substance of your safety commitments. The reasoning insiders give — that "the best way to build a good defense is to build a good attack" — is not unique to Anthropic. It is the argument behind every dual-use weapons program. Mark Chen, SVP of Research at OpenAI, stated the mirror logic the day before the FT story broke: a model that can prove 80-year-old theorems can find cyber vulnerabilities too. "And they did," he wrote. "I imagine the researchers there are thinking the same thought in reverse." The reverse: if a model finds vulnerabilities, it can exploit them. Adversaries will reach the same conclusion.

The hardest question is not about Anthropic's consistency but about institutional control. Glasswing was designed to keep Mythos under controlled conditions — vetted partners, Anthropic oversight. Engineers embedded at the NSA are one form of that control: Anthropic's people, Anthropic's model, inside one agency. But a classified intelligence agency is not a vetted commercial partner. The model Anthropic described as too dangerous to release publicly is now being adapted inside one of the world's most capable offensive cyber organizations. What it is used for — and whether anyone outside the NSA will ever know — is the part that cannot be audited from outside.

Reactions

Mark Chen (@markchen90, SVP of Research at OpenAI — June 3, 987 likes, ~130K views):

"When Mythos came out, my immediate thought was 'if our models can prove 80-year-old theorems, surely they can find cyber vulnerabilities too.' And they did.

I imagine the researchers there are thinking the same thought in reverse."

16  Sakana AI Opens RSI Lab in Tokyo, Betting on Sample-Efficiency Over Scale

Sakana AI Opens RSI Lab in Tokyo, Betting on Sample-Efficiency Over Scale

On June 5, Sakana AI formally established the RSI Lab in Tokyo — a dedicated research group tasked with one goal: building AI systems that collectively improve themselves. The lab does not bet on the brute-force approach. It bets on sample-efficiency. CEO David Ha framed the Japan constraint directly: just as Japan's historical dominance in manufacturing was achieved by fundamentally redesigning the factory floor to do more with less, Sakana is redesigning the AI development process to run on modest compute. The announcement lands one day after Anthropic published internal data showing Claude writes 80% of their production code — and warned that recursive self-improvement is arriving faster than anticipated. Two labs. One week. The same three letters: RSI.

Read more

Sakana AI is a Tokyo-based research lab founded in 2023 by David Ha, Llion Jones (co-author of the "Attention Is All You Need" paper), and Ren Ito. From the start, its research has followed a distinctive thread: evolution, not gradient descent — open-ended systems that adapt rather than models that are trained once and deployed.

On June 5, the lab announced it is unifying two years of that work under a single mandate. The Sakana AI RSI Lab — RSI standing for Recursive Self-Improvement — is charged with "redesigning the AI development process itself using AI." It is not a speculative roadmap. It points to six published research systems as its foundation.

The six pillars Sakana cites:

  • LLM² (2024, with Oxford and Cambridge): An AI that runs a generational evolutionary loop to invent preference optimization algorithms. It produced DiscoPOP, described as a state-of-the-art preference optimization algorithm discovered and written entirely by an LLM.
  • Darwin Gödel Machine (2025, with the University of British Columbia): Agents that rewrite their own codebase in a continuous self-improvement loop. It more than doubled baseline software-engineering performance on SWE-bench — a 30 percentage-point absolute improvement.
  • ShinkaEvolve (2025): A program-evolution framework focused on sample-efficiency. It solved complex optimization problems using only 150 samples and generated a novel load-balancing loss function for Mixture-of-Experts models.
  • ALE-Agent (2025): A reinforcement agent that extracts structured lessons from its own failures. It placed first out of 804 human participants in AtCoder Heuristic Contest 058.
  • Digital Red Queen (2026, with MIT): Open-ended adversarial coevolution in a cybersecurity environment — applying competitive evolution to build RSI foundations for security.
  • The AI Scientist (2024–2026): An end-to-end automated research system. Published in Nature on March 26, 2026.

The RSI Lab frames its mission in terms of compute access. From the announcement: "We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn't be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good."

The launch lands in a week heavy with RSI framing. On June 4, Anthropic published internal data showing Claude authors over 80% of their production code — up from single digits before February 2025 — and engineers merged 8x more code per day in Q2 2026 than in 2024. Anthropic said that recursive self-improvement is "happening faster than we thought."

What it means.

The announcement is unusual in that the claims are grounded in prior work. The six research systems are not demos prepared for the launch press release. Darwin Gödel Machine, ALE-Agent, and The AI Scientist were published and peer-reviewed before June 5. The RSI Lab is a consolidation, not a blank-slate bet.

What matters is whether the pieces add up. Each system shows self-improvement in one narrow domain: code rewriting, heuristic search, research automation, loss function discovery. The RSI Lab's claim is that these are not isolated results — they are proof that the underlying principle (open-ended, evolution-driven improvement under compute constraints) generalizes. That claim is unverified. The individual demonstrations are real; the integration is the open question.

The Japan manufacturing analogy in Ha's framing is worth taking seriously, not as rhetoric but as a structural argument. The Toyota Production System showed that constrained environments force process innovation, and that process innovation can compound in ways that brute-force resource expansion cannot. If the analogy holds for AI development — and there is no guarantee it does — then sample-efficiency may matter far more than the current cluster-size race suggests. Sakana is building a lab on that thesis.

The week's timing is worth noting without overstating it. Anthropic warned about RSI arriving; Sakana announced it is working toward RSI. Both used the term in the same week. That convergence suggests the field is moving from "when might this happen" to "what kind of RSI are we building." The answer Sakana gives — the sample-efficient, Tokyo kind — is a direct challenge to the hyperscale assumption that has defined the last three years of frontier AI.

Reactions

hardmaru (David Ha, CEO of Sakana AI, June 5):

"We are not building the most compute-hungry self-improvement engine. We are building the most sample-efficient one."

No other authority-list reactions found for this story.

17  Reid Hoffman Leaves Microsoft's Board for AI Drug Discovery Startup Manus

Reid Hoffman Leaves Microsoft's Board for AI Drug Discovery Startup Manus

Reid Hoffman joined Microsoft's board in 2016, when the company bought LinkedIn for $26.2 billion. A decade later, he is leaving. The pull: Manus, an AI drug discovery startup he co-founded with Dr. Siddhartha Mukherjee — a physician, biologist, and Pulitzer Prize-winning author of The Emperor of All Maladies. Manus raised more than $50 million in seed rounds in 2025. Hoffman's explanation was brief. In a recent podcast episode, he told Microsoft CEO Satya Nadella: "We're seeing such progress with Manus. I need to get back to founder mode."

Read more

Microsoft bought LinkedIn in 2016 for $26.2 billion. Hoffman, LinkedIn's co-founder, took a seat on Microsoft's board as part of the deal. He held it for roughly ten years. On June 5, 2026, he stepped down.

The reason is Manus, an AI drug discovery startup. Hoffman is its co-founder and board chairman. The CEO is Dr. Siddhartha Mukherjee — a physician, biologist, and the author of The Emperor of All Maladies, the 2011 Pulitzer Prize-winning history of cancer. General Catalyst is among the investors. Manus raised more than $50 million in seed rounds across 2025.

Manus uses a specific frame for what it is trying to build. The company calls its target "Move 37" AI. The name refers to a move AlphaGo played in 2016 against Go champion Lee Sedol — a move no professional would have made, that turned out to be brilliant. Manus wants AI to make the equivalent leap in chemistry. The focus is cancer.

Hoffman has been shedding board seats with conflicts of interest for two years. He left OpenAI's board in 2023. In 2024, Microsoft paid $650 million to acquire the assets of Inflection AI, a company Hoffman had backed — a deal that created another conflict with his Microsoft board role. The Microsoft departure follows that same logic: Manus is an AI company, and Hoffman is now its operational chair.

He described the move on his podcast Possible, in a conversation with Microsoft CEO Satya Nadella. His phrasing was precise: "One of the things I realized over the last month was that, we're seeing such progress with Manus. I need to get back to founder mode."

What it means.

Hoffman has been writing checks into AI for a decade. This is different. A board seat is a passive role — quarterly meetings, governance oversight, fiduciary duty. "Founder mode" is the opposite: week-to-week involvement, hands on product and hiring and strategy. When someone with his track record makes that switch, it is worth asking why now and why this.

One answer is that Manus cleared a threshold. The "such progress" line suggests the startup is past the stage where watching it from a distance makes sense. Another answer is that the Microsoft seat was worth keeping while it was — and it was, for a decade, as Microsoft became the largest enterprise AI spender on earth. Leaving it now, to run toward drug discovery, is a statement about where returns are moving next.

The CEO pairing also signals something. Mukherjee is not a tech executive. He is a clinician-scientist who has spent his career thinking about cancer from the biology up. That choice suggests Manus is building toward wet-lab validation, not just software. The "Move 37" framing promises AI creativity beyond human chemistry intuition — but the bet only pays out if the molecules actually work in trials. A Pulitzer Prize winner running a lab is a different hire than a CTO running a model fine-tune.

Reactions

Reid Hoffman (@reidhoffman, June 5):

"Excited to sit down with my friend @satyanadella to talk about AI, the future of humanity, and going back into founder mode to cure cancer."

No authority-list reactions found.

18  Cursor Ships Design Mode: Point, Draw, or Talk to Change Your UI

Cursor Ships Design Mode: Point, Draw, or Talk to Change Your UI

Cursor shipped four updates in a single week. The headline was an expansion of Design Mode: users can now select multiple UI elements at once, draw changes directly on the page, or describe them by voice while an agent edits the code in real time. The voice channel stays open while the agent runs, so the next instruction can be queued before the current one finishes. On the same run: canvases became shareable via URL; Teams usage limits went up with a new Premium seat offering five times the usage for three times the cost; and Cursor published its thinking on what cloud agent infrastructure actually requires. The Design Mode announcement alone drew 1.39 million views.

Read more

Design Mode debuted in April with Cursor 3 as a browser overlay for annotating and targeting UI elements. The June 5 update extends the interaction surface in three directions. First, multi-select: users can now pick several elements at once and Cursor "sees the selected elements, their code, the surrounding layout, and the visual relationships on the page." Second, draw: changes can be sketched directly on the page rather than described in a chat box. Third, voice: a microphone overlay captures narrated instructions. "The mic stays available while an agent is mid-run, so you can queue the next change by voice without waiting for the previous one to finish." The gap between what you see on screen and what the agent acts on just got shorter.

The canvas update landed on June 4. Cursor canvases let agents build interactive artifacts — dashboards, reports, internal tools. Until this week, those artifacts lived inside Cursor. The new feature: any canvas can be published as a URL, opening full-screen in a browser, shareable with anyone on the team. Design Mode extends into canvases too — the same point-draw-talk workflow applies to canvas-generated UIs. Cursor's framing was direct: "With canvases, Cursor can create apps like dashboards, reports, and internal tools. Now you can publish a canvas and share it with your team via URL."

On June 1, Cursor raised base usage limits for all Teams accounts and introduced a Premium team seat. The pricing is explicit: five times the usage for three times the cost. Cursor described it as "inspired by the success of our Ultra plan." The ratio is not a coincidence — it prices for teams that run agents continuously rather than occasionally. A team hitting limits at the standard tier gets more headroom, but at a multiplier that makes per-unit economics work for Cursor only if agents are running all day.

The June 2 tweet linking to Cursor's cloud-agents post was less a feature announcement and more a design statement. The tweet identified three requirements for a serious cloud agent experience: "a durable execution platform, a powerful harness, and the tools and infra to give agents realistic development environments." The blog post put the user benefit plainly: "It's now often faster to kick off a cloud agent from Slack or Cursor than it is to add an issue to a tracker like Linear." Cloud agents are reachable from the Cursor editor, cursor.com, Slack, Linear, and GitHub.

What it means.

Design Mode is solving a specific friction. Telling an agent to change a UI has meant describing it in text — "the button in the top-right corner of the second card." With this update you point at the element. You draw around the area. You talk while looking at it. The agent sees the element, its code, and its layout context. That is a different kind of communication. Small but load-bearing: voice stays open while the agent is running. The next instruction starts before the previous edit finishes. That is a tight iteration loop, and tight iteration loops compound.

The pricing structure carries a separate signal. Moving from occasional-use pricing toward all-day-use pricing is what you do when you believe agent consumption is about to grow fast. The Premium seat math — 5x usage for 3x cost — is not generous. It is a bet that the customers most willing to pay more are the ones hitting usage ceilings hardest. Those are the same customers building the most agent-dependent workflows. Cursor is pricing into the behavior it expects to see more of.

The week as a whole describes two things happening in parallel. Design Mode and canvases are the human-to-agent interface getting sharper: better ways to specify, review, and share what agents produce. Cloud agents and the infrastructure post are the execution layer maturing: agents running without a human present, across more surfaces, for longer. A code editor that builds sharable apps via voice narration, then deploys them from Slack — that is no longer a code editor in the traditional sense. Cursor is defining a category and moving fast enough that competitors are still catching up to last quarter's version.

Reactions

No authority-list reactions found.

Capital

Alphabet — $85B raise, oversubscribed

On June 2, Alphabet launched an equity offering. By June 3, Sundar Pichai confirmed it closed oversubscribed: $45 billion raised immediately, another $40 billion via an at-the-market program starting Q3. Total: $85 billion. Berkshire Hathaway committed $10 billion. Capital is earmarked for AI infrastructure; Alphabet's 2026 capex already exceeds $180 billion.

Supabase — $500M Series F at $10B

Open-source Postgres platform raised $500 million at a $10 billion pre-money valuation in a round led by GIC. Doubles the company in eight months. More than 60% of new Supabase databases are now created by AI coding tools, not humans. Prior investors including Stripe returned; Salesforce Ventures and Georgian joined.

DeepSeek — ~$6.9B, first outside capital

The Hangzhou lab is in final talks to close approximately 50 billion yuan (~$6.9B) at a valuation between $49 and $56 billion. Founder Liang Wenfeng personally commits roughly 40% of the total (~20 billion yuan). External backers include Tencent (10B yuan), CATL (5B yuan), NetEase, JD.com, and the state-backed National AI Industry Investment Fund. First outside capital in the lab's history. The valuation represents a roughly six-fold jump from the ~$10 billion figure discussed in April.

Anthropic — confidential S-1 filed

The company filed a confidential S-1 with the SEC on June 1. No offering size or exchange disclosed. Most recent round raised $65 billion at a $965 billion valuation. Annualized revenue crossed $47 billion in May 2026.

Big Deals

Google-SpaceX — $920M/month compute lease

Alphabet signed a deal to pay SpaceX $920 million per month from October 2026 through June 2029 — roughly $30.2 billion total. Covers approximately 110,000 NVIDIA GPUs. Google describes it as "bridge capacity" for its Gemini Enterprise agent platform. Either party may exit with 90 days' notice after December 31, 2026. A week earlier, SpaceX signed a separate $1.25 billion/month deal with Anthropic through 2029. Two of the largest AI labs in the world are renting compute from a rocket company.

Microsoft-Mayo Clinic — joint frontier healthcare model

Announced at Microsoft Build on June 2. Terms not publicly disclosed. First named enterprise partner for Frontier Tuning, Microsoft's new RL-based custom model training system.

Reid Hoffman exits Microsoft board for Manus

After a decade on Microsoft's board — a seat he received as part of the $26.2 billion LinkedIn acquisition in 2016 — Hoffman resigned on June 5 to focus full-time on Manus, an AI drug discovery startup he co-founded with oncologist Dr. Siddhartha Mukherjee. Manus raised more than $50 million in seed rounds in 2025. Hoffman's stated reason: "…we're seeing such progress with Manus. I need to get back to founder mode."

Pricing Moves

Uber — $1,500/month cap per coding tool

After exhausting its entire 2026 AI coding budget by April, Uber reportedly capped agent tool costs at $1,500 per employee per month per tool. The cap is the company's answer to a development pattern it could not sustain: developer token consumption growing without an obvious productivity ceiling.

Cursor — Premium team seat: 5x usage, 3x cost

Cursor raised base usage limits for all Teams accounts and introduced a Premium tier at five times the standard usage for three times the price. The ratio prices into the behavior it expects to grow: teams running agents continuously, not occasionally.

Enterprise AI contract renewals rising sharply

Priceline's AI contract renewal came back 4–5x more expensive than the prior cycle. Goldman Sachs projects global token usage will grow 24x by 2030.

Tokenomics Foundation — July 2026 launch

A Tokenomics Foundation launches under the Linux Foundation in July 2026 to set industry standards for token cost governance. Approximately 180 vendors have already adopted FinOps token cost management practices. The institutional machinery for treating AI as a metered utility is forming.

Platform Moves

OpenAI on Amazon Bedrock — GA

OpenAI frontier models and Codex went generally available on Amazon Bedrock on June 1. Large enterprises can now access OpenAI through AWS compliance and governance workflows. OpenAI's Daybreak cybersecurity capability listed as a future addition to the same channel.

Microsoft Frontier Tuning — enterprise custom models via RL

Announced at Build. Enterprises build reinforcement learning environments — "training gyms" — from their own workflow data. The adapted model belongs to the enterprise, not Microsoft. Applied to McKinsey's tasks: outperformed GPT-5.5 on quality at 10x lower cost. Applied to Excel: matched GPT-5.4 at up to 10x higher efficiency. Claims are from Microsoft's own case studies; independent replication pending.

NVIDIA Nemotron Coalition — 12 labs, open frontier

Twelve labs — including Cursor, LangChain, MistralAI, Perplexity, Nous Research, and Thinking Machines — joined NVIDIA's Nemotron Coalition to co-develop open frontier models. Nemotron 3 Ultra (550B MoE, weights open, OpenMDW-1.1 license) shipped the same week. First time an open model credibly anchors an enterprise agentic stack.

OpenAI Sites — apps from Codex to shareable URL

Sites is rolling out to Business and Enterprise plans. Any Codex user can turn a plan or dashboard into a deployed web app at a shareable URL. No code written directly by the user. Deployment partners include Vercel, Cloudflare, Netlify, Lovable, and Replit.

ChatGPT memory overhaul — Dreaming architecture

Rolling out to US Plus and Pro users. Internal benchmark: recall from 41.5% to 82.8%; preference retention from 31.4% to 71.3%; staying current with recent user context from 9.4% to 75.1%. These are OpenAI's own test sets; production performance typically compresses relative to internal benchmarks.

Layoffs / Restructure

Microsoft revokes Claude Code licenses from most developers

Microsoft pulled Claude Code access from the majority of its developer workforce. The revocation is a cost-control decision, not a capability rejection; it reflects the absence of usage governance infrastructure at the time of rollout. The broader pattern: 180+ enterprises are now retroactively building the spending controls they should have had on day one.

Note: no significant formal layoff announcements reported this week. The structural shift is in AI spend reallocation, not headcount.

Geopolitical

Trump AI executive order — voluntary, 30 days

Signed privately on June 2. AI developers may invite the US government to review frontier models for up to 30 days before release. Participation is voluntary. The original draft: mandatory, 90 days. Industry lobbied; the mandate was removed, the window halved. The order establishes an AI Cybersecurity Clearinghouse (a collaboration of Treasury, NSA, and CISA) and directs the NSA to build a classified capability to benchmark frontier model cyber capabilities. The text states explicitly that nothing in it "shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement." Anthropic called the order "an important step in strengthening America's leadership in AI" and pledged to collaborate with the White House on its implementation.

Anthropic at the NSA — Mythos for offensive cyber

The Financial Times reported on June 4 that Anthropic has roughly six engineers forward-deployed inside the NSA, adapting Mythos for offensive cyber operations. Potential targets named: networks in China and Iran. In February, the Pentagon designated Anthropic a "supply chain risk" — the first such designation applied to any American company — after Anthropic refused to allow its models to support domestic mass surveillance and autonomous weapons without human oversight. Two federal lawsuits followed. The NSA arrangement was explicitly exempted from the ban. Anthropic has not commented. The NSA declined to confirm or deny. The D.C. Circuit has not yet ruled on the second DoD designation.

DeepSeek — state capital enters

China's National AI Industry Investment Fund, a state-backed vehicle, is in final talks to participate in DeepSeek's inaugural fundraise alongside Tencent and CATL. The investor mix maps where the lab intends to go: consumer platforms (Tencent), industrial AI (CATL), and national AI policy (the state fund). Three years of independence from outside capital ends the same week two major American AI labs are both engaged in US government AI arrangements.

Microsoft-OpenAI decoupling — framing shifts

At Build, Mustafa Suleyman described Microsoft as "set free" from OpenAI to pursue superintelligence on its own terms. MAI-Thinking-1 achieves 53% on SWE Bench Pro — the same score as Anthropic's Opus 4.6 — and runs on Microsoft's own MAIA 200 chip at 30% better performance per dollar than NVIDIA's GB200. The OpenAI partnership continues; Azure still distributes GPT models. But Build 2026 is the first time Microsoft asked the world to evaluate it as a model lab, not a model distributor.

Matthew PrinceMatthew Prince — Cloudflare CEO

"Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history. — [@eastdakota](https://x.com/eastdakota/status/2062212701414187452), June 3 — 8,229 likes, 2,160 retweets, 2.17M views"

What he means.

On: The moment bots overtook humans on the internet

Prince runs the company that proxies roughly 20% of global web traffic. He is not extrapolating from a model — he is reading his own infrastructure. Cloudflare Radar puts the split at 57.4% automated, 42.6% human, measured by HTTP requests to HTML pages. His revised direction for web economics: "clearly it's going to be pay to crawl." (Prince, in reply to @dee_bosa, June 3) Cloudflare already built a crawler-gating platform in summer 2025 that lets publishers restrict and charge AI crawlers. Adoption is limited. The infrastructure exists; the business model does not yet.

Andrew NgAndrew Ng — DeepLearning.AI founder, Coursera co-founder

"One of the new, buzzy jobs in Silicon Valley is the AI Forward Deployed Engineer (FDE), an engineer who is embedded within a client organization to help customize solutions, such as building and tuning agentic workflows that suit the client's particular needs... However, I believe the number of AI Engineer jobs will be far larger. A company might accept a few FDEs to be embedded within its organization. But most companies will want far more of their own employees working on their projects. — [@AndrewYNg](https://x.com/AndrewYNg/status/2061477558693384395), June 1 — 4,456 likes, 731 retweets, 541,812 views"

What he means.

On: The career structure of the AI engineering era

The FDE role — engineers embedded at client organizations to build agentic workflows — is being staffed up at both Anthropic and OpenAI. Ng's essay maps why it cannot be the dominant path: supply is too thin, vendor concentration is too high, and optionality matters too much when no one can predict which AI stack leads next year. His prediction: the AI Engineer role will fragment further, just as "Software Engineer" fragmented into frontend, backend, mobile, and DevOps. This week's data points toward the same direction from the opposite end of the spectrum — PewDiePie shipped a feature-complete AI workspace without a CS degree.

Nathan LambertNathan Lambert — Independent ML researcher; recently departed Allen Institute for AI

"I still stand by this despite the recent Anthropic post. There are still serious bottlenecks in building the model that the agents don't address (organizational, compute, data access, etc). It'll take time to push through them and we will see 'linear' gains for years to come. — [@natolambert](https://x.com/natolambert/status/2063055447435956427), June 6 — 219 likes, 38,699 views"

What he means.

On: Why the RSI numbers don't guarantee the loop

Lambert is one of the field's more careful commentators on training dynamics. His pushback on Anthropic's RSI post is specific: agent productivity gains are real, but they don't touch the hard constraints — compute availability, data access, organizational coordination. The 80% figure measures code volume. Research judgment — deciding which problems are worth pursuing — is the capability Anthropic explicitly says Claude hasn't crossed yet. Lambert's "linear gains" conclusion is the bearish case; the 52x speedup on training code optimization is the bullish case. Both can be true in different parts of the development pipeline.

Guillermo RauchGuillermo Rauch — CEO, Vercel

"Unclear if a durable trend, but CEOs and CTOs are back to coding with a fury, thanks to coding agents. I have public company CEOs sliding into my DMs (and 'InMail') telling me about falling in love with shipping software again thanks to Claude Code and Vercel. 'Dream accounts' that we always wanted to work with, where in the past the C-suite would hardly understand the infrastructure until much later in the game. Coding agents are the ultimate PLG-fication of the enterprise. Bad, legacy software can't hide anymore. The stack that works is self-evident to the entire organization, from intern to CEO. — [@rauchg](https://x.com/rauchg/status/2061135404942974982), May 31 — 1,461 likes, 377,252 views"

What he means.

On: Who is coding now that agents have lowered the cost

Rauch runs infrastructure for more than a million developers and has a direct signal on who is building with it. His observation is careful in one important way: the executives returning to code are those who had technical skills before taking management roles. The agent lowers the activation energy. It restores something latent rather than creating something new. That distinction matters for how far democratization actually reaches — a point PewDiePie's Odysseus project tested the other way this week, reaching 60,000 GitHub stars with a product that appears to have been built without a traditional engineering background.

swyxswyx — Writer, podcast host (Latent Space); former engineer at AWS, Netlify

"just a small zoom out on the vibe shift: in Feb 2025 @soumithchintala was talking about his dream of personal, local, private agents, most people didn't believe him. it's June 2026 and @pewdiepie has just released his vibecoded @opencode wrapper that is a complete personal AI productivity suite including email, docs, and calendar. top of HN, easily >1m views, >10k stars in a day. if your Knowledge Work Agents startup can't beat pewdiepie you might as well pack up and go home at this point, his is the benchmark for what you can DIY. — [@swyx](https://x.com/swyx/status/2061256096719970337), June 1 — 271 likes, 31,720 views"

What he means.

On: The new benchmark for AI product startups

swyx covers the AI engineering ecosystem at Latent Space and was early to track the vibe-coding shift. His observation is pointed in a specific direction: Odysseus is not a toy. It has IMAP/SMTP email integration, CalDAV calendar sync, hardware-matched model selection, MCP support, and persistent agent memory. It is a serious piece of software that by day five had 900 commits and 170 contributors. The challenge it poses to funded AI workspace startups is not a joke at their expense — it is a real capability bar set by one person using coding agents. Startups that cannot exceed it on quality, breadth, or distribution should ask why.

Sridhar VembuSridhar Vembu — CEO, Zoho Corporation

"AI can make you smarter faster but AI can also make you dumber faster. I would not encourage AI adoption too early by school or college students, until they learn the fundamentals right. — [@svembu](https://x.com/svembu/status/2062421556396224633), June 4 — 1,154 likes, 58,668 views"

What he means.

On: AI and the compressed learning question

Vembu runs a software company with roughly 15,000 employees. His view on AI and education is not abstract. Zoho hires large numbers of non-traditional engineers through its own training programs. The Berkeley data published this week — 35% of CS 10 students received an F, against a historical ceiling of 10% — shows what happens when AI handles the friction that produces learning. Students recorded adequate homework grades via AI assistance and arrived at closed-book exams without the understanding to pass them. The practical version of Vembu's concern for employers: a credential that says "linear algebra completed" now carries less information about actual capability than it did two years ago.

Mark ChenMark Chen — Chief Research Officer, OpenAI

"When Mythos came out, my immediate thought was 'if our models can prove 80-year-old theorems, surely they can find cyber vulnerabilities too.' And they did. I imagine the researchers there are thinking the same thought in reverse. — [@markchen90](https://x.com/markchen90/status/2062265524663250961), June 3 — 987 likes, 129,528 views"

What he means.

On: The dual-use nature of frontier model capability

Chen posted this two days before the Financial Times reported that Anthropic engineers were inside the NSA adapting Mythos for offensive cyber operations. The "reverse" he describes: a model that can find vulnerabilities can be used to exploit them. This is the same reasoning Anthropic used to keep Mythos off the public market. It is also, as Chen notes, the reasoning that makes a classified offensive cyber arrangement possible. The dual-use problem is not a future risk to manage. It is the present operating condition of every frontier model that is good enough at code.