cankun.me

Querying a Black Box

Jun 14, 2026

Some kinds of work are debugging a system whose source code exists. Other kinds are interrogating a system that has no source code at all, that you can only poke from the outside, and that answers you through an instrument that may itself be lying. These two situations feel similar from the inside: both involve uncertainty, and both involve careful reasoning toward a conclusion. But they are epistemically different in a way that explains almost everything about why automation works beautifully for one and bogs down in the other as if in quicksand.

It is worth building the distinction slowly, because the philosophy that illuminates it is old, and the old version is sharper than the casual version that gets passed around.

Hume's problem, in its strong form

The induction problem is usually told softly: you have seen a thousand white swans, you cannot conclude the next is white. Told that way it sounds like a warning against overgeneralizing. Hume's actual version is much harder.

Why do you believe the future will resemble the past at all? Say "because, so far, the future has always resembled the past," and you have made an inductive inference: induction used to justify induction, which is circular. Say "because nature is uniform," and the question returns: why believe nature is uniform? Only because it has been so far, which is induction again, circular again. Hume's conclusion is not that induction is somewhat unreliable. It is that induction has no rational justification, none. Every "the sun will rise tomorrow," every "this drug will still work next week," is, as a matter of logic, exactly as groundless as its denial. We make these inferences anyway, Hume says, not because we have a reason, but out of habit: a psychological fact, not a logical entitlement.

This matters because the entire enterprise of reasoning from data to conclusion rests on the step Hume dissolved. When you go from "in my sample, X and Y are associated" to "X and Y are really associated," there is no logical bridge. The work is not useless, but logical validity was never the thing holding it up. Something else is. What that something else might be is what the next two centuries of philosophy argue about.

Popper's escape, and what it costs

Popper accepts that Hume won: induction has no logical justification. The mistake, he says, was thinking science runs on induction at all. It does not.

You can never verify a universal claim by observation, because the next observation might break it. But you can falsify it with a single counterexample. Verification and falsification are logically asymmetric: no number of white swans confirms "all swans are white," one black swan refutes it. So science is not the accumulation of supporting evidence until you are sure. It is the proposal of bold, falsifiable conjectures, and the attempt to kill them. What survives the attempt is provisionally kept. A theory's scientific status is not how much evidence supports it but whether it sticks its neck out: whether it says, in advance, what observation would prove it wrong.
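The asymmetry can be made concrete with a small sketch. The names here (`find_counterexample`, `all_swans_are_white`) are invented for illustration and do not come from any library. The point is only that a search over observations can return a refutation, while the most it can ever return in the claim's favor is "nothing found yet."

```python
from typing import Callable, Iterable, Optional


def find_counterexample(
    claim: Callable[[str], bool], observations: Iterable[str]
) -> Optional[str]:
    """Return the first observation that violates the claim, or None.

    None does not mean the claim is verified. It only means the claim
    has survived the observations made so far.
    """
    for obs in observations:
        if not claim(obs):
            return obs
    return None


def all_swans_are_white(swan_color: str) -> bool:
    # Hypothetical universal claim, reduced to a check on one observation.
    return swan_color == "white"


thousand_white = ["white"] * 1000

# A thousand confirming cases: the claim survives, nothing more.
survives = find_counterexample(all_swans_are_white, thousand_white)

# One black swan appended to the same record: the claim is refuted.
refuted_by = find_counterexample(all_swans_are_white, thousand_white + ["black"])
```

The return type carries the argument: there is a value for "refuted by this observation" and a value for "not refuted yet," and no value for "confirmed."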

There is a quiet, important consequence here. For a conjecture to have any falsifying power, the prediction must come before the evidence. State what you bet you will see, then look. If you look first, see the result, and then construct an explanation, you have not tested anything, because any result can be fit with some story after the fact, and a story that explains everything has been refuted by nothing. The temporal order is not bookkeeping. It is the whole difference between a test and a rationalization.

Why falsification is never clean

Popper is beautiful and incomplete, and the incompleteness has a name: the Duhem-Quine problem. When an experiment contradicts a theory, what exactly has been falsified?

A prediction is never derived from a theory alone. It comes from the theory plus a large cloud of auxiliary assumptions: the instrument works, the reagents are pure, the sample is clean, the statistical model is appropriate, that one confounder you never thought of is absent. When the result conflicts, logic tells you only that something in the whole bundle is wrong. It does not tell you which. So you can always rescue the theory you love by blaming an auxiliary: "must be the equipment, run it again." Falsification is never the clean single stroke Popper imagined. A black swan can always be explained away as "not really a swan" or "I misread."
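A toy enumeration shows how little a failed prediction pins down. This is a sketch under a strong simplifying assumption: the theory and three hypothetical auxiliaries are treated as independent true/false statements, and the prediction is taken to follow only when all of them hold. The names in `PARTS` are illustrative, not a real taxonomy.

```python
from itertools import product

# Assumed bundle: one theory plus three auxiliary assumptions.
PARTS = ["theory", "instrument_works", "sample_clean", "model_appropriate"]


def bundle_intact(assignment: dict) -> bool:
    # The prediction is only guaranteed when every part of the bundle holds.
    return all(assignment.values())


def worlds_consistent_with_failure() -> list:
    """List every true/false assignment in which the bundle is broken."""
    worlds = []
    for values in product([True, False], repeat=len(PARTS)):
        assignment = dict(zip(PARTS, values))
        if not bundle_intact(assignment):
            worlds.append(assignment)
    return worlds


worlds = worlds_consistent_with_failure()

# Worlds in which the theory is still true and an auxiliary takes the blame.
theory_survives = [w for w in worlds if w["theory"]]
```

Of the 16 possible assignments, only the one where everything holds is ruled out by the failed prediction. Of the 15 that remain, 7 leave the theory standing. Logic alone does not choose among them, which is the room the rescue manoeuvre lives in.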

In domains where the instruments are transparent, this rarely bites: when you read a variable in a debugger, the debugger does not lie, so a failed test usually means the code is wrong, and blame is locatable. But where the instrument is itself opaque, where the measuring apparatus is its own black box you don't fully understand, the blame for a contradiction cannot be cleanly assigned. You wanted to interrogate one black box, and your interrogation tool turned out to be another.

Map and territory

All of this rises to a single picture. Every model, every claim, every hypothesis is a map. The territory is the real world, existing independently of any map. Hume's problem, Popper's asymmetry, Duhem-Quine's undecidability, all of them flow from one fact: we only ever get to compare maps against other maps. We never get to step outside all maps and grab the territory directly to check.

You want to verify a claim. With what? Another observation, which is itself a map, mediated by your instrument, your method, your conceptual frame. You are forever on the map side, and the territory intervenes only in a few places where it is forced to answer directly: the genuine experiment, and above all the experiment that puts a question to the irreducible complexity of the real world over real time. Those are the rare moments the territory talks back. Everywhere else, what you take for a fact is a map agreeing with a map.

The unifying frame: querying a black box

Put it together and the distinction at the start becomes precise. Some work is debugging a white box: a system with source code, whose state you can read directly, whose measurements are transparent. Other work is querying a black box: a system with no source code, that you cannot read but only perturb, whose every answer comes back through a noisy, possibly-deceptive instrument, and which never even promised that its underlying rules are simple or stable.

This one frame explains the whole asymmetry. Why one kind of work has a cheap verifier and the other does not: the white box's state can be read directly and compared, while the black box's state can only be perturbed and inferred through noise. Why automation soars on one and stalls on the other: querying the white box is instant, certain, infinitely repeatable; querying the black box is slow, noisy, and never exactly reproducible. Why plausibility is so dangerous in one and not the other: in the white box a plausible error is quickly exposed by transparent measurement, while in the black box a plausible error can hide forever behind "maybe it was the instrument." And why the deepest verifier is irreducible: to query the ultimate black box, the real world, in its full diversity, over real time, there is no cheaper query that substitutes, because every cheaper query is a different black box, separated from the real one by another map-territory gap that cannot be cleanly crossed.
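The contrast between reading and inferring can be sketched as a toy simulation. Everything in it is an assumption made for illustration: the `WhiteBox` and `BlackBox` classes are hypothetical, and the noise is Gaussian with a fixed spread, which is far kinder than a real black box that never promised stable rules. The sketch is itself a map, not the territory.

```python
import random
import statistics


class WhiteBox:
    """A system whose state can be read directly."""

    def __init__(self, state: float) -> None:
        self.state = state

    def read(self) -> float:
        return self.state


class BlackBox:
    """A system whose hidden state is only visible through a noisy instrument."""

    def __init__(self, hidden: float, noise_sd: float, rng: random.Random) -> None:
        self._hidden = hidden
        self._noise_sd = noise_sd
        self._rng = rng

    def query(self) -> float:
        # Each answer is the hidden value plus instrument noise (assumed Gaussian).
        return self._hidden + self._rng.gauss(0.0, self._noise_sd)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
white = WhiteBox(3.0)
black = BlackBox(hidden=3.0, noise_sd=1.0, rng=rng)

# Reading the white box: every read gives the same answer.
white_reads = [white.read() for _ in range(5)]

# Querying the black box: every answer differs.
black_reads = [black.query() for _ in range(5)]

# The hidden value can only be estimated, and only by paying for many queries.
estimate = statistics.mean([black.query() for _ in range(10_000)])
```

Even in this friendly case, one read settles the white box while the black box yields only an estimate after thousands of queries. Remove the assumptions of stable noise and a fixed hidden value, and averaging stops being a reliable way out.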

The frame does not solve the problem. Nothing solves it; it is the structure of being on the map side. But it tells you what kind of problem you are in, which is the prerequisite for not lying to yourself about what your conclusions are worth. The discipline that follows is simple to state and hard to keep: never let "I have a map of it" pass for "I have touched the territory." The map can be excellent. The territory still has not spoken.
