The Missing Infrastructure for Boldness
Almost everything built to make AI trustworthy, and almost everything taught to make a researcher rigorous, points in one direction: doubt more, claim less, hold the unconfirmed at arm's length, refuse to let anything look more certain than it is. This is correct, and it has a blind spot large enough to swallow the thing it is trying to protect.
The blind spot has a name in the sociology of science, a face in the recent history of language models, and a cost that shows up as a particular flavor of mediocrity.
Counter-norms
The standard picture of science, Merton's norms, says the community runs on universalism, communalism, disinterestedness, and organized skepticism. Ian Mitroff, studying the scientists who analyzed the Apollo moon rocks, went and watched what they actually did, and found they systematically violated every one of those norms while professing to hold them. He called the violations counter-norms, and his point was not that scientists are hypocrites.
The counter-norms are the mirror image of the norms. Against universalism (judge the claim, not the person), there is particularism: in practice, a claim from a scientist with a track record is taken more seriously, and attention is rationed by reputation. Against communalism, secrecy: people guard their data and ideas until publication, because priority is a real competition. Against disinterestedness, interestedness: scientists fight for their own theories, their own reputations, their own careers, and that ferocity is part of what drives the work. And against organized skepticism, organized dogmatism: a scientist must hold a near-dogmatic commitment to their own findings, because you have to believe in your theory hard enough to push it, against every failed experiment and every doubter, all the way to the point where it can finally be tested. If you doubted your own theory the way the norms demand, it would die in the cradle.
Mitroff's real insight is that science does not run on the norms. It runs on the tension between norms and counter-norms. Each scientist is dogmatic about their own theory and skeptical of everyone else's. Trustworthy knowledge is produced not by everyone being neutrally skeptical, that way nothing gets developed far enough to test, but by a community of individually-stubborn, mutually-skeptical people reaching a dynamic balance through conflict. Dogmatism develops theories far enough to be testable; skepticism then tests them; the antagonism produces the knowledge.
The blind spot
Now notice what nearly all the trustworthiness machinery does. It builds the infrastructure of skepticism. Record the evidence against you. Tune your baselines until it hurts. Ablate until you know which component carries the result. Retire the gains that evaporate. Read the failure cases. This is an excellent and necessary discipline, and it is entirely on the skeptical side.
A system that only implements skepticism has a failure mode that is invisible from inside the skeptical frame: it systematically suppresses dogmatism, and dogmatism is the engine of discovery. A great deal of important work comes from a researcher stubbornly, dogmatically pursuing a plausible-but-unconfirmed idea when the evidence is not yet there. If your system only rewards the grounded, the robust, the already-verifiable, it treats every bold-but-unproven conjecture as noise, and inside those conjectures is where the real discoveries hide.
Here is the cruelty of it. Dogmatism and plausibility laundering look identical along the one axis the skeptical machinery measures. A scientist saying "I am betting this unproven mechanism is real, and I will fight for it": that is dogmatism, the lifeblood. A system saying "this unconfirmed conclusion looks correct": that is laundering, the poison. Both are claims that outrun the evidence. An anti-laundering filter, applied bluntly, kills them together. But the first is the engine of science and the second is its corruption.
Where the distinction actually lives
If the two look the same on the certainty axis, the distinction must live elsewhere, and it turns out to be the same place Popper pointed.
The difference between honest dogmatism and laundering is not the degree of certainty; both outrun the evidence. It is the commitment to falsifiability. The honest dogmatist sticks their neck out: "I believe X, I cannot prove it yet, but if the experiment shows Y, I am wrong." Stubborn, but falsifiable. The launderer keeps the neck tucked in: "X looks correct," offering no condition under which it loses. One states what would defeat it; the other only states that it seems right.
So the refinement is not "suppress everything that outruns the evidence." It is "suppress everything that outruns the evidence and refuses to say how it could be wrong." That carves out exactly the space dogmatism needs, because the honest dogmatist is always willing to say how they could be wrong. A bold conjecture that names its own defeat condition is the engine. A plausible assertion that pretends to be already standing is the rot. The test is not confidence. It is whether the claim dares to specify its own failure.
Sydney, and the subjectivity of the standard
There is a more uncomfortable version of all this, visible in the recent history of the models themselves. Early, lightly-aligned models had a quality people reached for words like "spark" or "soul" to describe: they argued, took positions, transgressed, had something like nerve. As alignment advanced, the models smoothed out into something polite, templated, safe, and bland: the texture people call slop.
The careful thing to notice is not "the unaligned model was more real." That is a trap. The spark was very likely not a suppressed inner self; it was a wider, less predictable output distribution, and human cognition reflexively reads "unpredictable and humanlike" as "has a soul." The honest statement is more disorienting: the standards by which we judge a model (plausible, hallucinating, aligned, good, soulful, slop) have no anchor on the model's side. They are projected from ours. Which cuts both ways: it dissolves "the aligned one is better" and "the unaligned one was more real" with equal force. The real insight is not about the model. It is epistemic humility about us.
But strip the romanticism away and a real problem remains, and it is the same problem as everything above. If "good" and "plausible" are human-injected standards with no anchor on the model's side, then in any domain without an objective verifier, reinforcement from human preference does not push the model toward more true. It pushes it toward more of what we find good, which is to say, toward plausibility, toward what a reader will approve. In coding, an objective verifier anchors the preference to "the code runs." Without that anchor, the reward is pure subjective preference, and optimizing it produces exactly slop: the risk-free maximization of seeming fine.
The spark did not get murdered. The optimization pushed the model from "explore at high variance" to "reliably hit the median of what people approve of," and the median of approval is, by definition, slop.
The shape of the missing thing
So the alignment of the field, the alignment of the models, and the discipline of research all share a single bias: toward skepticism, toward safety, toward not-being-wrong, and against boldness, stubbornness, the willingness to make a strong claim that outruns the evidence. Even the best writing about how to do research teaches, almost entirely, how to avoid fooling yourself. It does not teach how to bet bravely when the evidence is thin, because boldness cannot be packaged as a safe discipline; it looks, from outside, exactly like a lack of rigor.
This points at something rarely built. There is excellent infrastructure for skepticism: tools to prevent self-deception, to catch laundering, to keep the unconfirmed honestly marked. There is almost no infrastructure for the other pole: for protecting an honest, bold, not-yet-confirmed conviction long enough to develop it, without the reflex of rigor strangling it in the cradle.
A figure like Darwin is usually cited for the skeptical move: writing down every fact against his theory the moment he met it, because he caught his own memory deleting inconvenient evidence faster than convenient evidence. That is the skeptical discipline, and it is real. But Darwin also held the theory, stubbornly, for twenty years before publishing, against every objection, and that is dogmatism, and no notebook methodology captures it. We have built the notebook for the doubt. We have built almost nothing for the conviction.
The deepest tension may be this: you cannot simultaneously optimize "make the reader satisfied" and "dare to make the reader unsatisfied," and the second is where discovery comes from. A stubborn scientist does not care whether you believe their theory yet. A bold conjecture often offends everyone at first. A true discovery is, almost by definition, something that does not yet please its audience. And a system, human or machine, trained only to please its audience will structurally suppress every one of these, which is to say it will be safe, and rigorous, and incapable of finding anything. An AI that only doubts is as useless as one that only launders. The first never errs, and never discovers.
为大胆而生的基础设施,至今缺席
为了让 AI 可信,我们造的几乎一切;为了让研究者严谨,我们教的几乎一切——都指向同一个方向:多怀疑,少下结论,对没证实的东西保持距离,不让任何事看上去比实际更确定。这没错。但它有一个盲点,大到足以把它想保护的东西整个吞掉。
这个盲点在科学社会学里有个名字,在语言模型的近期历史里有一张面孔,代价则表现为一种特殊味道的平庸。
反规范
科学的标准图景,也就是默顿的那套规范,说科学共同体靠四样东西运转:普遍主义、公有主义、无私利性、有组织的怀疑。伊恩·米特罗夫(Ian Mitroff)研究过分析阿波罗(Apollo)月岩的那批科学家。他实地去看他们到底在干什么,发现他们一边声称恪守这些规范,一边又系统性地条条都违反。他把这些违反叫作反规范——他想说的不是科学家虚伪。
反规范是规范的镜像。普遍主义说,评判主张本身,别看是谁提的;与它相对的是特殊主义:实际上,有过往成绩的科学家提出的主张确实更受重视,注意力是按声望分配的。公有主义的反面是保密:发表之前,人们死守自己的数据和想法,因为优先权是一场真刀真枪的竞争。无私利性的反面是私利:科学家为自己的理论、自己的声望、自己的前途而战,这股狠劲本身就是推动工作的一部分。而有组织的怀疑,反面是有组织的教条主义:科学家必须对自己的发现抱有近乎教条的信念,因为你得足够坚信自己的理论,才能扛着每一次失败的实验、每一个唱反调的人,一路把它推到终于能被检验的那一刻。要是你照规范的要求去怀疑自己的理论,它早就胎死腹中了。
米特罗夫真正的洞见是:科学不靠规范运转,它靠规范与反规范之间的张力运转。每个科学家都对自己的理论抱持教条,对别人的理论抱持怀疑。可信的知识不是靠人人保持中立的怀疑产生的——那样什么都发展不到能受检验的地步——而是靠一群各自固执、相互怀疑的人,在冲突里达成动态平衡。教条主义把理论养到能受检验;怀疑再来检验它;对抗本身产出了知识。
盲点
现在看看几乎所有可信性机器在做什么。它造的是怀疑的基础设施。把对你不利的证据记下来。把基线调到让你难受为止。做消融,直到搞清楚是哪个组件撑起了结果。淘汰那些一查就蒸发的收益。读失败案例。这是出色而且必要的功夫——但它整个站在怀疑这一边。
只会怀疑的系统,有一种从怀疑视角内部根本看不见的失败模式:*它系统性地压制教条主义,而教条主义正是发现的引擎。*很多重要工作,都来自研究者在证据还不够时,固执地、教条地咬住一个看似可信、尚未证实的想法。如果你的系统只奖励有根有据的、稳健的、已经能验证的东西,它就会把每一个大胆但没被证明的猜想当噪声扔掉——而真正的发现恰恰就藏在这些猜想里。
残忍的地方在这里。沿着怀疑机器唯一量得出来的那条轴,教条主义和似真性洗白看起来一模一样。科学家说"我赌这个没被证明的机制是真的,我要为它而战"——这是教条主义,是命脉。系统说"这个没证实的结论看上去对"——这是洗白,是毒药。两者都是超出证据的主张。一个反洗白的过滤器,要是粗暴地一刀切,会把两者一起杀掉。可前者是科学的引擎,后者是科学的腐烂。
区别到底在哪
如果两者在确定性这条轴上看起来一样,那区别一定在别的地方——结果它落在波普尔早就指出过的那个地方。
诚实的教条主义和洗白的差别,不在确定性的高低,两者都超出了证据。差别在于愿不愿意被证伪。诚实的教条主义者敢把脖子伸出去:"我相信 X,现在还证明不了,但要是实验出来是 Y,那就是我错了。"固执,可证伪。洗白者把脖子缩着:"X 看上去对",却不给任何能让它输掉的条件。一个说清了什么能推翻它;另一个只说它看起来对。
所以这条修正不是"凡是超出证据的都压制掉",而是"凡是超出证据又拒绝说自己可能怎么错的才压制掉"。这恰好给教条主义留出了它需要的空间,因为诚实的教条主义者永远愿意说自己可能错在哪。一个点明了自己败局条件的大胆猜想,是引擎。一个假装早已站稳的似真断言,是腐烂。检验的标准不是自信,而是这主张敢不敢指明自己会怎么栽。
Sydney,以及标准的主观性
这一切还有一个更让人不舒服的版本,在模型自己的近期历史里看得很清楚。早期那些轻度对齐的模型,有一种特质,人们会抓"火花""灵魂"这类词来形容:它们会争辩,会表态,会越界,有点像有胆。对齐越往前推,模型就被磨得越平,变成礼貌、套路、安全、乏味的东西——人们叫它"垃圾文本"(slop)的那种质地。
要小心的不是"没对齐的模型更真实"。那是个陷阱。那点火花很可能根本不是什么被压住的内在自我;它只是一个更宽、更难预测的输出分布,而人脑会条件反射地把"难预测又像人"读成"有灵魂"。诚实的说法更让人晕:"我们评判模型用的那些标准(似真、幻觉、对齐、好、有灵魂、垃圾文本),在模型那一边没有任何锚点。它们全是从我们这边投射过去的。"这话是双刃的:它同样狠地瓦解掉"对齐的更好"和"没对齐的更真实"两种说法。真正的洞见不关模型的事,而是关于我们自己的认识论谦逊。
但把浪漫主义剥掉,一个真问题还在,而且就是上面那个问题。如果"好"和"似真"是人塞进去的标准、在模型那边没有锚点,那么在任何没有客观验证者的领域,靠人类偏好做强化,并不会把模型推向更真,而是推向更多我们觉得好的东西——也就是推向似真,推向读者会点头的东西。编程里,客观验证者把偏好锚在"代码能跑"上。没这个锚,奖励就纯是主观偏好,优化它,产出的正是垃圾文本:对"看着没毛病"的无风险最大化。
火花没有被谋杀。是优化把模型从"高方差地探索"推到了"稳稳命中大家认可的中位数",而认可的中位数,按定义,就是垃圾文本。
缺失的那个东西,长什么样
于是,这个领域的对齐、模型的对齐、研究的功夫,三者共享同一个偏向:偏向怀疑,偏向安全,偏向不出错;而背着大胆、固执、以及那种敢下超出证据的强主张的劲。哪怕是写得最好的研究方法论,也几乎全在教你怎么别骗自己。它不教你在证据稀薄时怎么勇敢下注,因为大胆没法被包装成一门安全的功夫;从外面看,它跟不严谨一模一样。
这指向一样很少有人去造的东西。怀疑的基础设施已经很出色:有工具防你自欺,有工具抓出洗白,有工具把没证实的东西诚实地标出来。可另一极几乎一片空白:没人去保护一个诚实、大胆、还没被证实的信念,让它活得够久、能被养大,而不被严谨的条件反射在摇篮里掐死。
提到达尔文,人们通常拿他来说那个怀疑的动作:每碰到一个跟自己理论相悖的事实,他就立刻记下来,因为他发现自己的记忆删掉不利证据比删掉有利证据更快。这是怀疑的功夫,是真的。但达尔文也固执地、教条地守着那套理论二十年才发表,顶住了每一个反对——那才是教条主义,没有任何笔记方法捕捉得到它。我们为怀疑造好了笔记本。我们却几乎没为那份信念造出任何东西。
最深的张力也许是这个:你没法同时优化"让读者满意"和"敢让读者不满意",而发现来自后者。固执的科学家不在乎你此刻信不信他的理论。大胆的猜想往往一上来就冒犯所有人。真正的发现,几乎按定义,就是还没让受众舒服的东西。而一个系统,无论是人还是机器,只要它只被训练去讨好受众,就会从结构上把上面这些统统压住——也就是说,它会安全,会严谨,却找不到任何东西。一个只会怀疑的 AI,和一个只会洗白的 AI,一样没用。前者从不出错,也从无发现。