News · Hotspot - Yueying AI Hub

◇ EDITOR'S NOTE

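Today's main thread is not any single model release. It is AI builders pushing agents from showpiece toys toward reusable workflows, company-level distribution, and organizational moats.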

CHAPTER ONE · AGENT WORKFLOWS ENTER DAILY PRODUCTION

The discussion has shifted from "can AI write code" to "how do multiple agents, multiple models, and human attention collaborate reliably."

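Josh Pigford's solo-builder playbook: not one agent, but a production line that keeps improving

Peter Yang summarizes how Josh Pigford uses AI agents to push several products forward at once: charge from day one, develop in parallel with git worktrees, and have GPT and Claude review each other's work.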

@petergyang Peter Yang

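This long post is worth unpacking not because it proves once more that AI can help indie developers write code, but because it breaks "indie development + agents" into an operating system you can actually run. Josh Pigford's approach is strict: products charge from day one and get shut down if they cannot cover hosting or LLM costs; features are developed in separate git worktrees to reduce context contamination; one model writes and another reviews, with GPT catching bugs that Opus missed; and at the end of each phase a /learnings skill distills the pitfalls into rules the agent must follow from then on. The takeaway is that AI has not replaced product judgment or engineering experience; it has amplified their value. Mature builders do not treat prompts as magic. They engineer development, review, retrospectives, and stop-losses so that agents keep improving inside a system with clear boundaries.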
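The worktree step can be sketched as a small helper that builds one `git worktree add` command per feature. This is a minimal illustration, not Josh Pigford's actual setup (the post says he manages worktrees with Conductor): the directory layout, the `feature/` branch prefix, and the function name `worktree_commands` are all assumptions made for the example. Only `git worktree add -b <branch> <path>` and `git -C <repo>` are real git options.

```python
from pathlib import Path


def worktree_commands(repo: Path, features: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per feature branch.

    Each feature gets its own directory and branch, so agents working on
    different features never share a working tree.
    """
    commands = []
    for feature in features:
        # Hypothetical layout: <repo>-worktrees/<feature>, next to the repo.
        target = repo.parent / f"{repo.name}-worktrees" / feature
        # Hypothetical branch naming scheme.
        branch = f"feature/{feature}"
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in worktree_commands(Path("/tmp/myapp"), ["billing", "onboarding"]):
        print(" ".join(cmd))
```

Printing rather than executing keeps the sketch side-effect free; passing each list to `subprocess.run(cmd, check=True)` would create the worktrees in a real repository.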

Source

Peter Yang @petergyang 06.01 14:17

My top 6 takeaways from @Shpigford on how to build multiple products solo with AI agents:

1. Keep shipping even if the fear of embarrassment never goes away. After 25 years as a solo builder, Josh still thinks "It's terrifying launching something. Every single time it's just like, what if zero people care?" But he ships early anyway: "The idea of spending months working on something before you put it out for other people to use, I think that's a real bad idea."

2. Charge from day one, and kill products that can’t pay for themselves. For any product with built-in costs (e.g., hosting, LLM), Josh ships a paid version from day one. If the product can't cover its own costs, he doesn't hesitate to shut it down and refund recent payments.

3. Build features in parallel with separate git worktrees. Each worktree is a separate working copy of your codebase on its own branch, so features won't interact with each other. Worktrees stop context rot, isolate mistakes, and force you to test every chunk before moving on. Josh uses @conductor_build to manage them all.

4. Have GPT review Claude's work and vice versa. Josh builds with one model and then runs a review pass with another: “GPT invariably finds three to five bugs that Opus overlooked.” Different models spot different mistakes. In Conductor, he sets a default review model so it runs on every PR.

5. Build a skill to make AI better over time. Josh built a /learnings skill that runs at the end of each phase so that every "no, that didn't work" moment can be distilled into new rules to help the agent avoid repeating the same mistakes.

6. AI lets anyone ship, but real experience still matters. Josh credits 25 years of building before AI for how fast he ships now with agents. "I know the general shape of how I want things to work, so I can very quickly get to that point." His advice for newer builders is to "just fail a lot, because the only way that you'll ever figure out what not to do is by doing the thing incorrectly."

Josh walked through his entire development workflow in our episode and shared more skills. 📌 Watch the full episode here: https://t.co/9brwr7daw8

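Anthropic is asking internally too: how do you keep up with what Claude actually did

Thariq shares a practice from Anthropic colleague Suzanne: use one complete prompt to have Claude explain its progress, paired with voice mode so that a human can step in more naturally.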

@trq212 Thariq

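Once agents can carry out many steps in a row, the new bottleneck becomes how people stay in the loop. Thariq says he has been asking colleagues inside Anthropic how they keep Claude working while still truly understanding what Claude did. His favorite is Suzanne's method, for which he published the full prompt, adding that Suzanne pairs it with voice mode to make responding easier and more natural. The signal is small, but it sits very close to real usage. Many teams' agent failures come not from the model being unable to do the work, but from the human losing context midway and ending up with a crude takeover or a rerun. For builders, the next generation of good workflows will not just automate tasks; it will also design reporting cadence, explanation granularity, and entry points for human intervention. Voice mode here is not a flashy interaction. It lowers the cost of supervision, letting a person interrupt, confirm, and correct course the way they would when syncing with a colleague.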

Sources

Thariq @trq212 06.01 20:29

been asking others at Anthropic how they stay in the loop with Claude and fully understand the work being done this is one of my favorites from Suzanne: https://t.co/nqIMcGXiKI

Thariq @trq212 06.01 20:29

gist for the full prompt here: https://t.co/L0ffBeU1ua

Thariq @trq212 06.01 23:07

Suzanne also mentioned she uses this with voice mode to make it easier to respond and more natural.

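Codex starts "calling you back": the notification layer in agent collaboration is becoming important

Peter Steinberger has Codex call him through an external tool when it gets stuck, especially when the release flow hits npm and 1Password gates. Dan Shipper, meanwhile, describes an always-running Codex swarm as a new rhythm of work.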

@steipete Peter Steinberger

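Several Codex-related updates today point to the same change: an agent is not a one-off command but a collaborator that advances in the background, waits for authorization, and then pulls the human back in. Peter Steinberger says he told Codex to use a notification tool whenever he is distracted and it needs his help to get unblocked, so every so often he hears Codex talking to him. This suits steps that require a human, such as npm releases gated by 1Password. Dan Shipper jokes that if you keep a swarm of Codex instances running on /goal, you do not have to work seven days a week, but you will probably want to. The product opportunity is clear: long-running agents need more than stronger models. They need state awareness, permission requests, asynchronous reminders, and context recovery. Whoever smooths out these human-agent handoffs will be closer to a truly usable agent IDE.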
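One way to picture the handoff described above is a task object that records why it is blocked, pings a human through whatever channel is plugged in, and resumes with the reply kept in its log. This is a sketch under stated assumptions, not how Codex or Peter Steinberger's tool works: `AgentTask`, its status values, and the `notify` callback are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentTask:
    """Hypothetical long-running task with a human handoff point."""

    name: str
    status: str = "running"  # one of: running, blocked
    pending_request: Optional[str] = None
    log: list[str] = field(default_factory=list)

    def block_on_human(self, reason: str, notify: Callable[[str], None]) -> None:
        """Pause the task and ping the human through the injected channel."""
        self.status = "blocked"
        self.pending_request = reason
        self.log.append(f"blocked: {reason}")
        notify(f"[{self.name}] needs you: {reason}")

    def resume(self, human_reply: str) -> None:
        """Record the human's reply so the context survives the pause."""
        if self.status != "blocked":
            raise RuntimeError("task is not waiting on a human")
        self.log.append(f"human: {human_reply}")
        self.pending_request = None
        self.status = "running"


if __name__ == "__main__":
    task = AgentTask("release")
    # `print` stands in for a real channel such as speech or a push message.
    task.block_on_human("npm publish needs a 1Password approval", print)
    task.resume("approved")
    print(task.log)
```

Injecting `notify` as a plain callable keeps the channel swappable, and the log is what lets either side recover context after the interruption.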

Sources

Peter Steinberger @steipete 06.01 22:24

I told codex to use https://t.co/oHS8ombQcW whenever I'm distracted and it needs my help to be unblocked, and ever once it a while I hear it talking to me, and it's the coolest thing ever. (e.g. for releases, that needs npm and is 1Password-gated)

Dan Shipper @danshipper 06.01 13:44

you don't have to work 7 days a week if you just have a swarm of Codex's running on /goal all the time but you'll probably want to

CHAPTER TWO · MODELS AND PLATFORM DISTRIBUTION GET REPRICED

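Model capability, inference cost, and cloud-platform entry points are jointly changing the choice builders face: not just which model is strongest, but which is cheapest and easiest to get onto an enterprise bill.

MiniMax M3 lands right behind Opus and GPT5, but price is the variable Vercel wants to stress

Guillermo Rauch says MiniMax M3 is now the leading open model on the Next.js agent evals, ranking right behind Opus and GPT5 while being 10× cheaper, and currently 20× cheaper on Vercel AI Gateway.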

@rauchg Guillermo Rauch

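Vercel CEO Guillermo Rauch put MiniMax M3 in the spotlight today: on the Next.js agent evaluations it is now the leading open model, ranking behind Opus and GPT5 but costing 10× less, and currently 20× less on Vercel AI Gateway. For builders, the point is not just that another open model got stronger; it is that the economics of agent products are being recalculated. Many applications used to default to the strongest closed model as their baseline. But once the task is an evaluable, repeatable engineering scenario like Next.js, a cheap model that gets close enough changes the default choice. For developers, evals are no longer leaderboards in papers; they are the basis for product routing, cost control, and gross margin. The more practical next step is to build your own small eval per task type, so that model selection becomes a combined cost-and-quality decision rather than a brand preference.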
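The cost-and-quality decision above can be made concrete with a tiny routing rule: among models whose eval pass rate is within a tolerance of the best one, pick the cheapest. This is only a sketch of one possible rule. The model names and every number below are made up for illustration and are not results from the Next.js agent evaluations; `ModelResult`, `pick_model`, and `max_gap` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    pass_rate: float      # fraction of your own eval tasks passed, 0.0 to 1.0
    cost_per_task: float  # average spend per eval task, in USD


def pick_model(results: list[ModelResult], max_gap: float) -> ModelResult:
    """Return the cheapest model within `max_gap` of the best pass rate."""
    best = max(r.pass_rate for r in results)
    close_enough = [r for r in results if best - r.pass_rate <= max_gap]
    return min(close_enough, key=lambda r: r.cost_per_task)


# Made-up numbers for illustration only; not real benchmark results.
results = [
    ModelResult("closed-frontier", pass_rate=0.80, cost_per_task=0.50),
    ModelResult("open-challenger", pass_rate=0.75, cost_per_task=0.05),
    ModelResult("small-local", pass_rate=0.40, cost_per_task=0.01),
]

if __name__ == "__main__":
    # Tolerating a 10-point gap favors the cheaper model that is close enough.
    print(pick_model(results, max_gap=0.10).name)
    # Tolerating almost no gap falls back to the strongest model.
    print(pick_model(results, max_gap=0.01).name)
```

The tolerance is the product decision: it states how much quality a given task type can give up in exchange for margin, and it can differ per task type.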

Source

Guillermo Rauch @rauchg 06.01 23:40

MiniMax M3 is now the leading open model on the Next.js agent evaluations (https://t.co/SnZ54XoRWV). Right behind Opus & GPT5, but 10× cheaper (And 20× cheaper right now on ▲ AI Gateway!) https://t.co/z9ts1NZDyu

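OpenAI on AWS: enterprise distribution looks more like the real battleground than the model release itself

Thibault Sottiaux mentions AWS and GPT-5.5 in a joking tone; Aaron Levie points to AWS's enterprise contracts and distribution reach, which could expand the reach of OpenAI's models and overall token consumption.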

@thsottiaux Thibault Sottiaux

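Thibault Sottiaux, who is affiliated with OpenAI, wrote: "Heard that AWS is where the cool kids are. Hello. We have GPT-5.5." On its own this reads like a meme-flavored endorsement, but Aaron Levie offered a more commercial reading: AWS has massive enterprise traction and large committed contracts, so this kind of partnership not only widens distribution for OpenAI's models but may also drive up token consumption across the whole model-provider ecosystem. The key point is not that a model gained one more entry point, but that enterprise AI procurement is usually tied to existing cloud budgets, contracts, and security processes. For people building AI products, model capability is only the first half; the second half is whether you can get into the customer's existing billing, permissions, and data stack. Startups that ignore this layer will overestimate how fast a demo becomes production. Conversely, a product embedded in cloud marketplaces, enterprise identity, and audit processes may turn into real revenue faster than one that only piles on model capability.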

Sources

Thibault Sottiaux @thsottiaux 06.02 03:01

Heard that AWS is where the cool kids are. Hello. We have GPT-5.5. https://t.co/TixKoaIS0D

Aaron Levie @levie 06.02 00:55

AWS has massive enterprise traction, with large committed contracts from enterprises. So this partnership opens up both increased distribution for OpenAI’s models, but also likely drives an increase in token consumption overall across model providers. https://t.co/I9XJyDAq9F

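The moat for enterprise agents is not the general-purpose model but the organization's own knowledge and processes

Aaron Levie's reminder: when competitors can use the same AI models, the real advantage comes from whether internal institutional knowledge, data assets, and industry workflows can be connected to AI.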

@levie Aaron Levie

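Aaron Levie raises the most practical question of the enterprise AI era: if your competitors can access the same models and intelligence as you, what do you build an advantage on? His answer is that enterprises should connect their internal institutional knowledge, existing data assets, and domain-specific workflows to AI, and protect over the long run the value created by that unique data, those processes, and that expertise. What Box sees from customers is likewise not a bet on one particular model, but a wish to bring any model to their own data in the future. For builders, this means an enterprise agent product cannot just wrap a strong model in a chat box. The real value lies in permissions, content structure, workflow context, industry-specific objects, and a portable model interface. Over time the model layer becomes easier to commoditize, and whether a product understands the real workflows inside the customer's organization will decide whether it is sticky.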
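The "bring any model to your data" idea can be sketched as a narrow model interface plus a permission filter that runs before any model sees the content. This is an illustrative assumption about one possible design, not Box's architecture or any vendor's API: `Model`, `Document`, `answer`, and `EchoModel` are hypothetical names, and the group-based permission check is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Protocol


class Model(Protocol):
    """Anything with a `complete` method can be plugged in as the model."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Document:
    title: str
    body: str
    allowed_groups: frozenset[str]


def answer(
    question: str, user_groups: set[str], docs: list[Document], model: Model
) -> str:
    """Filter documents by permission first, then hand context to the model."""
    # A document is visible if the user shares at least one group with it.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    context = "\n".join(f"{d.title}: {d.body}" for d in visible)
    return model.complete(f"Context:\n{context}\n\nQuestion: {question}")


class EchoModel:
    """Stand-in model for local testing; it returns the prompt unchanged."""

    def complete(self, prompt: str) -> str:
        return prompt


if __name__ == "__main__":
    docs = [
        Document("Pricing", "tiered plans", frozenset({"sales"})),
        Document("Payroll", "confidential", frozenset({"hr"})),
    ]
    print(answer("What is our pricing?", {"sales"}, docs, EchoModel()))
```

Because the permission filter and the prompt assembly live outside the model, swapping providers only means supplying another object with a `complete` method, while the organization-specific parts stay put.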

Source

Aaron Levie @levie 06.02 04:13

As we enter the era of AI agents, one of the defining questions is how you develop competitive advantage when your competitor has access to the same AI models and intelligence as you. The companies that are able to best harness their internal institutional knowledge, existing data assets, and domain-specific workflows -- connected with AI -- will be those that are able to stay ahead in the future. Whether a company decides to build out the tech stacks themselves, or leverage a variety of best-in-class tools is certainly one core variable. But the key is to find the way that the enterprise can capture and protect the value created by their unique data, processes, and expertise over the long run. Each industry will have their own version of this, and the competitive advantage will vary by vertical. We’re increasingly seeing this at Box, where customers want to ensure that they can take advantage of their institutional knowledge and have the flexibility of bringing any AI model and intelligence to their data at any time. This is a pattern that will increasingly become a core principle of strategy in the future.

CHAPTER THREE · THE DEEPER NARRATIVE OF AI COMPANY COMPETITION

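Today's longest signal comes from an interview about a DeepMind biography: the AI race is not a pure technology curve; it is also shaped by leaders' personalities, concentrated bets, and public narrative.

DeepMind is underrated, not because its models are weak, but because it did not win moments like ChatGPT and Claude Code

Unsupervised Learning interviews Sebastian Mallaby: he argues that the outside world crowned OpenAI too early and underestimated Demis Hassabis and Google DeepMind, but that Google's weak spot was failing to capture the product narrative in consumer chat and coding agents.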

@podcast Unsupervised Learning

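The value of this podcast episode is that it puts DeepMind back at the origin of the AI race. Sebastian Mallaby argues that the outside world crowned OpenAI and Sam Altman as winners too early and underestimated Demis Hassabis and Google DeepMind: Demis founded the OG lab back in 2010, and DeepMind was an important source that OpenAI later drew on. But the interview also concedes that Google DeepMind did not create a consumer moment like ChatGPT, nor was it first in the coding-agent moment that Claude Code represents. One reason is Demis's scientist temperament and DeepMind's multi-path betting: it can work on science, video, robotics, and models at the same time, but it does not always go all in on one product opportunity the way Anthropic does. The reminder for builders is direct: technical depth does not automatically translate into product momentum. Frontier companies can also lose on focus and narrative, while small teams can win a specific use case by concentrating their bets.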

Sources

Unsupervised Learning @podcast 06.01 12:46

Speaker 1 | 09:37 - 11:27 Well, the extraordinary thing to me is how people basically discounted Demis. I've been interviewed a bunch of times since my book came out a few weeks ago, and often people can't pronounce his name. They say "Demise," and they say "Hasabis" instead of Hassabis. They barely know who he is. My publisher, Penguin Press, was figuring out how to do the cover design [and considered using his photo, but doubted readers would recognize him]. And Demis is the guy who founded the OG lab in 2010, way before anybody else, and really created the model that OpenAI then copied later. I remember going to see Dario when I was doing my research, and he said, yes, Demis was the original figure and he's got the whole AI for science space to himself, which was true at the time. I think it's less true now. So I think people just underestimated how important he is. They underestimated Google as a company because they thought, Innovator's Dilemma, they were too slow. OpenAI went fast with a chatbot, and now they've got this massive brand effect, and so nobody can catch up. That proved wrong. As of late 2025, the Google DeepMind models, Gemini 3.0, were better on the leaderboards than the adversaries. Of course, since then we've seen a big Anthropic surge. But the point is, I think people were too quick to crown OpenAI and Sam Altman as the winner, and underestimated both Demis as a person and Google DeepMind as a company.

Unsupervised Learning @podcast 06.01 12:46

Speaker 1 | 12:06 - 13:27 People use Gemini more than we realize. Right? It's sort of bundled into Google Search at this point, the AI mode thing. But I take the broader point, which is that both of these seminal moments in proving out the consumer experience, which were ChatGPT, as you say, on the consumer side and then more recently coding, neither of those came from Google DeepMind. And I think what that perhaps shows us is that, partly because of Demis's personality and his intellectual formation, which was a PhD in neuroscience, this very broad study of what intelligence might be, and a very broad approach therefore to building artificial intelligence, there's this kind of let's-try-everything approach to AI research. Whenever there are two different paths you could go down, they say, well, we'll do both. And if we can find the third path, we'll probably do that too. They're very hedged. Right? Whereas I think Anthropic got to coding because it was willing to take a more concentrated bet. It never went into the whole field of generative video, just never had a sort of equivalent. And at the same time, OpenAI, being a startup, didn't have the reputational baggage to protect that Google has, and so it was willing to put out a chatbot that hallucinated a lot at the beginning.

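AI safety shifts from "one lab's self-discipline" to a government-level collective action problem

Mallaby recounts that Demis once imagined a singleton-lab scenario, but after competitors such as OpenAI emerged, he came to believe that only government rules and cross-border coordination can solve the safety problem.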

@podcast Unsupervised Learning

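Another strong signal in the interview is that the AI safety narrative has moved from the idealistic "our lab will get it right" to a question of institutions. Mallaby says that early on Demis hoped to avoid race dynamics, and would even ask job candidates whether, if AGI were close, they would be willing to fly to a bunker together to deal with it. Later he saw the field grow crowded and realized that safety at a single lab does not help: if one lab slows down and the others do not, the world is no safer. In 2015 DeepMind held a safety summit at SpaceX, hoping to bring Elon Musk into safety oversight, but OpenAI was founded at the end of that year, which only reinforced the reality of a race. For AI builders, this is not a distant policy question. The closer a product gets to agents, automation, and high-risk industries, the more it needs evals, permissions, pre-release testing, and auditability designed in from the start. Compliance will not be a patch applied after launch; it will become the threshold for whether a product can enter enterprises and the public sector.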

Sources

Unsupervised Learning @podcast 06.01 12:46

Speaker 1 | 03:33 - 04:03 No. I think Demis has swung from one extreme to the other. He began by thinking there could be a singleton scenario, just one lab. And by that, he really meant DeepMind and himself. He has swung to the opposite extreme, where he now sees that there's a very crowded field and therefore that it's almost pointless for one lab to pursue safety by itself, because if one lab is safe and the other ones aren't, it doesn't make the world safer. So he really has shifted to seeing this as a collective action problem that only a government can solve.

Unsupervised Learning @podcast 06.01 12:46

Speaker 1 | 04:38 - 05:46 Yeah. I mean, you're totally right. That was in the summer of 2015. They have this summit at SpaceX. Elon Musk is hosting it, and the idea is he's gonna be brought into the tent by DeepMind. He's gonna be part of their efforts. He's going to be chairing this sort of safety oversight board, and therefore he wouldn't set up a competitor. And, of course, at the end of 2015, he did set up a competitor, OpenAI. And so that really drove home the reality that we're gonna have a race. And I think if you ask now, okay, so what about future collaboration? I think from Demis's point of view, or from any of the lab leaders', frankly, you can't trust the other guys. And therefore, the only way you get trust is if you have a government enforcer that comes along and says, look, here's the rules for everybody. There's gonna be a level playing field. You're all gonna have to abide by some sort of safety, slowdown, pretesting of models before you release them, all that stuff. And you will have to do it. And then the reaction, of course, will be, yeah, but what about the guys in China? And that's why, ultimately, I think this has to be a US-China collaboration, remote though that prospect may seem to many listeners right now.

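This week's AI headlines, and the people behind them.

Anthropic expands Project Glasswing to critical infrastructure

A user tests Qwen3.6-27B as a replacement for Claude in multi-agent orchestration

Tencent is quietly building an AI agent for WeChat that connects to millions of mini programs

AI customer-service bot flaw bypasses Instagram two-factor authentication

MiniMax M3: an open-source frontier model with a 1-million context window

Stanford CS336: Language Modeling from Scratch

Accenture acquires Ookla for $1.2 billion

OpenRouter raises a $113 million Series B

Pope Leo's first encyclical criticizes techno-messianism

vLLM v0.22.0 released: DeepSeek V4 maturation and a Rust frontend

Probe-targeted fine-tuning gets large language models to express their true confidence

Is AI repeating the frontend's "lost decade"?

Anthropic raises a $65 billion Series H at a $965 billion valuation

Nvidia commits to investing $150 billion a year in Taiwan as an AI hub

A collection of LLM writing "smells" sparks heated discussion

Simon Willison says Anthropic and OpenAI have found product-market fit

MOT tool counters openwashing in AI models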
