What Survives Scale
There is a question underneath most decisions about what to build in AI right now, and it is rarely asked in plain form: as model capability grows, what becomes worthless, and what does not?
The cleanest articulation of the threat comes from Rich Sutton's essay "The Bitter Lesson." The history of AI keeps teaching the same thing: general methods that ride on computation, namely search and learning, eventually overtake methods that encode human understanding of a problem. Researchers cannot resist pouring their domain insight into a system. It works in the short term. It becomes the ceiling in the long term, and then a method that simply scales compute walks past it.
If that is true, then the only rational thing to do is to build the part that scale does not eat. Which requires knowing how to tell the difference. And the obvious way to tell the difference is wrong in an instructive way.
The first cut, and why it fails
The tempting discriminator is: encoding a method gets eaten; encoding the structure of the world does not. A workflow that says "first do literature review, then generate hypotheses, then critique" encodes a method, a human prior about how the work should proceed, and a stronger model will dissolve it by decomposing the task itself. By contrast, the facts that conclusions rest on evidence, that evidence has a source, and that state changes over time are world-structure, and they stay.
This is a good heuristic. It is not a law, and it fails at exactly the place you need it most.
It fails because world-structure itself gets eaten. The history of science is a graveyard of confidently held ontologies. Phlogiston was world-structure. The ether was world-structure. Vital force was the way the world was assumed to be carved at the joints. Each was an account of what kinds of things exist, and each was overturned. So "ontology survives, method dies" claims too much. Ontologies die all the time.
The better cut
What actually survives is narrower and stranger than "world-structure." It is the class of epistemic constraints, and the reason they survive is not that they describe the world correctly, but that a stronger model does not make them unnecessary.
Take the rule that you must not let something unverified look verified. A model a hundred times more capable does not retire this rule. It still cannot collapse the gap between a map and the territory; it lives on the map side of that gap like everything else. Stronger models make better maps. They do not make the distinction between "agreed" and "true" obsolete, because that distinction is not about how good the map is: it is about the fact that the map is a map.
So the discriminator is not method versus world-structure. It is domain shortcut versus epistemic constraint. A domain shortcut encodes how some problem is currently solved; it gets absorbed. An epistemic constraint encodes the irreducible structure of not-yet-knowing (provenance, the difference between consensus and truth, honest accounting of what rests on what), and it does not get absorbed, because no amount of capability removes the condition it responds to.
This is sharper because it explains the failure of the first cut. Ontologies get eaten when they are really domain shortcuts wearing the costume of world-structure: phlogiston was a particular era's best guess about combustion, dressed up as the nature of fire. The constraints that survive are the ones that encode the gap between any model and the world, not any particular model of it.
The unstable corner
There is one place where even this cleaner cut rests on a bet, and honesty requires naming it.
The reason epistemic constraints survive is that the map-territory gap is permanent. And the reason the gap is permanent is that some verifiers are irreducible: there exist questions whose ground truth requires the real world, in real time, at irreducible complexity, to answer. If that were false, if the territory could be fully simulated, if a good enough model could always stand in for reality's reply, then the gap would slowly close, "agreed" would creep toward "true," and the surviving constraints would become as quaint as some discarded ontology.
So the whole edifice rests on a single empirical wager: that the territory contains a residue no map can fully replace. There is reason to believe it. Any simulation is itself a map, so using simulation to stand in for the territory is using a map to verify a map: circular. But this is a structural argument from a premise about the world, not a theorem, and it could be overturned in some specific domain by a single empirical fact.
The honest position is therefore: the discriminator works, and the thing it protects is durable, as long as the territory keeps a residue. Bet that it does. But hold the bet as a bet.
Why this matters for what you build
The practical consequence is that the most defensible thing to build is rarely the most impressive-looking thing. The impressive thing, the system that autonomously does the whole job, is almost always a stack of domain shortcuts that look necessary right now precisely because they fill a gap in general capability. The gap is closing. The shortcuts have an expiration date written on them in invisible ink.
The durable thing is the scaffolding that a more capable model would still need: the honest record of where things stand, the refusal to let the unconfirmed pass as confirmed, the structure that lets work accumulate without quietly corrupting itself. It is less exciting. It is also the part that grows more valuable as the model gets stronger, rather than less, because a stronger model produces more work that needs to be kept honest.
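To make that scaffolding concrete, here is a minimal sketch of a record that refuses to let the unconfirmed pass as confirmed. Everything in it is hypothetical and chosen for illustration: the names `Claim`, `Ledger`, `endorse`, `mark_verified`, and `effective_status` are assumptions, not an existing library or a design this essay prescribes. It encodes three of the constraints named above: every claim carries its provenance, agreement is counted but never promotes a claim, and a claim is only as verified as everything it rests on.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"


@dataclass
class Claim:
    text: str
    source: str                                         # provenance: where the claim came from
    rests_on: list[str] = field(default_factory=list)   # ids of the claims it depends on
    endorsements: int = 0                               # agreement count; never changes status
    checks: list[str] = field(default_factory=list)     # descriptions of external checks
    status: Status = Status.UNVERIFIED


class Ledger:
    """Hypothetical append-only record of claims and what they rest on."""

    def __init__(self) -> None:
        self.claims: dict[str, Claim] = {}

    def add(self, claim_id: str, claim: Claim) -> None:
        if claim_id in self.claims:
            raise ValueError(f"claim {claim_id!r} already recorded")
        missing = [d for d in claim.rests_on if d not in self.claims]
        if missing:
            raise KeyError(f"unknown supporting claims: {missing}")
        # Dependencies must already exist, so the graph cannot contain cycles.
        self.claims[claim_id] = claim

    def endorse(self, claim_id: str) -> None:
        # "Agreed" is tracked separately from "true" on purpose.
        self.claims[claim_id].endorsements += 1

    def mark_verified(self, claim_id: str, check: str) -> None:
        if not check.strip():
            raise ValueError("verification needs a description of the check performed")
        claim = self.claims[claim_id]
        claim.checks.append(check)
        claim.status = Status.VERIFIED

    def effective_status(self, claim_id: str) -> Status:
        # A claim counts as verified only if it was checked itself
        # and every claim it rests on is effectively verified too.
        claim = self.claims[claim_id]
        if claim.status is not Status.VERIFIED:
            return Status.UNVERIFIED
        for dep in claim.rests_on:
            if self.effective_status(dep) is not Status.VERIFIED:
                return Status.UNVERIFIED
        return Status.VERIFIED
```

In this sketch, a conclusion endorsed a hundred times still reports as unverified until an external check is recorded for it and for each claim beneath it. Nothing here depends on how capable the system producing the claims is, which is the point: a stronger model fills the ledger faster, but it does not remove the need for the ledger.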
The deepest version of the bitter lesson may be this: scale does not just beat hand-coded knowledge. It reveals, one capability jump at a time, which of the things we called structure were only scaffolding we built because we did not have enough compute to do without it.