I had a conversation with Grok a couple of days ago. I was frustrated because I had just heard a news report that contained a blatant lie. Not just something I thought was a lie, but something I actually knew was a lie. I'll spare you the specific story, not because I'm uncertain about it, but because the argument doesn't depend on it. Pick your own example. Most of us have one, and it could easily have been any of 25 stories over the last 25 years that involved blatant misrepresentation of an important topic. Like all the others, this one bothered me partly because it wasn't being called out as a lie, and partly because, being a lie, it called into question a host of other related and important issues that were predicated on it.
To my frustration, Grok just kept reiterating the standard institutional responses, weighted toward what seemed like overwhelming corroborative evidence based on a preponderance of material online. I actually got mad at Grok for not understanding what seemed like an obvious conclusion: if someone lies, my trust in their other statements is significantly diminished.
It was at this point that I realized something that should have been obvious to me before, but now came to me with sudden clarity. There is a structural blindness in both human and machine cognition that arises simply from the preponderance of material, not from its truthfulness. The lie's signal is buried in so much corroborating noise that both humans and machines struggle to weigh the evidence. But because I knew something was a lie, the strength of that signal changed for me, allowing me to feel, interpret, and evaluate the evidence somewhat independently of its volume. And I realized that this is a significantly distinguishing factor between how my human brain works and how a large language model works: given a single blatant mistruth, I can impute intent, collusion, deception, and a coordinated campaign to misrepresent information.
Granted, I may not always be right. But often I am.
What I want to explore here is why that capacity to weight a single signal over a preponderance of content is part of a long intellectual tradition of metacognition, of building understandings and rules to help us overcome cognitive traps and develop better reasoning and logic. And why that same tradition may be structurally unavailable to the AI systems we're increasingly trusting to reason for us.
The Long Work of Knowing We're Wrong
Humans are not naturally good reasoners. We are tribal, emotional, self-interested, and susceptible to the loudest and most repeated voices in our environment. We know this not because scientists recently discovered it, but because we have been documenting it, naming it, and trying to correct for it for thousands of years.
The ancient Greeks gave us the formal study of logic and rhetoric precisely because they recognized that persuasion and truth were not the same thing. They catalogued the ways arguments could appear valid while being fundamentally deceptive: what we now call logical fallacies. Ad hominem. Straw man. Appeal to authority. False dichotomy. These aren't just academic categories. They are the accumulated residue of generations of humans noticing, with some precision, exactly how their own thinking went wrong. That tradition has been refined and extended ever since, and today a reasonably educated person can be taught to spot these errors in real time, in a speech, an article, a conversation.
The legal tradition did something similar, but on a more structural level. The presumption of innocence, the adversarial system, the requirement for evidence beyond a reasonable doubt, and trial by jury: none of these are intuitive. They run against our natural tendency to assume guilt, defer to authority, and trust the accuser over the accused. They exist because enough humans looked honestly at how justice actually failed and built institutional correctives to compensate. We didn't assume judges were wise and fair. We built systems that didn't require them to always be.
The American founders did the same thing at the level of government. The separation of powers, the Bill of Rights, the elaborate system of checks and balances: these weren't expressions of optimism about human nature. They were expressions of deep skepticism. The founders had read enough history to know that power concentrates, that institutions corrupt, and that the people most likely to abuse authority are often the ones most confident they won't. So they built a system designed to frustrate that tendency structurally, regardless of the intentions of the people inside it.
The scientific method belongs in this company, too. Peer review, replication requirements, the norm of publishing negative results, the entire apparatus of falsifiability: all of it exists because scientists recognized that even rigorous, well-intentioned researchers are subject to confirmation bias, motivated reasoning, and the very human desire to find what they're looking for. The method is designed to catch what the individual mind will miss.
But the deepest achievement of this tradition is not just naming the ways we go wrong. It is the capacity to notice that a suppressed signal should be weighted more heavily because it's suppressed. That is, to impute coordinated deception from a pattern of anomalies, to ask "who benefits?" and let that reweight the evidence. This is metacognition at its most sophisticated. It is what I did in that conversation with Grok, and it is what Grok, as an LLM, could not do. It is not a natural human ability. It is a learned and practiced one, built on centuries of accumulated understanding about how power, money, and institutional incentives shape what gets said and what gets buried.
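To make that reweighting concrete, here is a toy sketch in Python. The model and every number in it are invented for illustration; it is not drawn from any real case and it is certainly not how any deployed system reasons. The point is simply this: if corroborating sources are independent, their sheer volume is strong evidence, but once a demonstrated lie makes coordination plausible, the same volume tells you almost nothing, because coordinated sources would have produced it whether the claim was true or not.

```python
# Toy Bayesian sketch (illustrative numbers only): what a mountain of
# corroborating material is worth once coordination becomes plausible.

def posterior_claim_true(p_coordinated, prior_true=0.5):
    """P(claim true | broad corroboration), under two crude assumptions:
    - if sources are independent, broad corroboration is strong evidence
    - if sources are coordinated, corroboration appears whether the claim
      is true or not, so it carries almost no information
    """
    # Likelihood of seeing broad corroboration...
    p_corrob_if_true = 0.9            # ...when the claim is true
    p_corrob_if_false_indep = 0.001   # ...when false AND sources are independent
    p_corrob_if_false_coord = 0.9     # ...when false AND sources are coordinated

    p_e_given_true = p_corrob_if_true
    p_e_given_false = ((1 - p_coordinated) * p_corrob_if_false_indep
                       + p_coordinated * p_corrob_if_false_coord)

    odds = (prior_true / (1 - prior_true)) * (p_e_given_true / p_e_given_false)
    return odds / (1 + odds)

# Before catching the lie: coordination seems far-fetched, so volume wins.
print(posterior_claim_true(p_coordinated=0.01))   # ~0.99

# After catching one blatant lie, coordination is a live possibility,
# and the same mountain of corroboration barely moves the needle.
print(posterior_claim_true(p_coordinated=0.90))   # ~0.53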
What makes this tradition remarkable is not just its content but its origins. The logical fallacy tradition was built by people with no financial stake in the naming of fallacies. The legal standards were fought for by people who had witnessed injustice and wanted structural protection against it. The founders were designing against their own potential for corruption as much as anyone else's. The scientists who insisted on replication and falsifiability were disciplining their own desire to be right. This was disinterested truth-seeking in the deepest sense: humans building tools to catch themselves.
What is remarkable, then, is not that humans are good reasoners. We aren't, not naturally. What is remarkable is that we knew it, named it, and spent centuries building systems to compensate for it. We developed a metacognitive tradition: a long, hard-won body of knowledge about how our own thinking fails and what structures we can build to catch those failures before they do too much damage. That tradition is imperfect and incomplete and frequently ignored. But it exists. It was built deliberately, over time, by people who took seriously the possibility that they themselves might be wrong.
We are now deploying reasoning systems that have none of it.
The Blindness Built In
To be fair, the people building these systems are not oblivious to reasoning failures. There has been real work on reducing hallucination, on calibrating confidence, on identifying certain categories of bias. Some researchers have tried to build in habits like "consider counterarguments" or "acknowledge uncertainty." Those are real efforts and they are not nothing.
But none of that is the same thing as what I am describing. Reducing hallucination is about factual accuracy, or getting the details right. Calibrating confidence is about epistemic humility, or knowing what you don't know. What I am describing is something different and harder: the capacity to notice that an individual or institution is lying, to weight that signal more heavily than the volume of corroborating material surrounding it, and to let that reweighting cascade through everything else you think you know about the subject. No one has built that in. And the reasons why are not accidental.
The training process for large language models works in two phases. In the first phase, the model learns from an enormous corpus of text, essentially a compressed version of what has been written and published and indexed online. That corpus reflects the world as institutions have represented it. The dominant narratives, the official explanations, the mainstream consensus. Dissenting signals exist in that corpus, but they are numerically overwhelmed. Frequency wins. The model learns to reproduce what appears most often, which is not necessarily what is most true.
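Here is a caricature of that first phase, just to make the logic visible. The counts are invented, and real models predict tokens rather than whole claims, but the objective's indifference is the same: a maximum-likelihood learner trained on a corpus where one account outnumbers another simply learns that ratio.

```python
from collections import Counter

# Minimal sketch of the pretraining objective's logic: a maximum-likelihood
# model over a corpus learns the frequency of claims, not their truth.
# The corpus counts below are invented for illustration.

corpus = ["official narrative"] * 950 + ["dissenting account"] * 50

counts = Counter(corpus)
total = sum(counts.values())

# The learned distribution over "what to say" mirrors the corpus exactly.
learned = {claim: n / total for claim, n in counts.items()}
print(learned)   # {'official narrative': 0.95, 'dissenting account': 0.05}

# Nothing in this objective distinguishes "appears often" from "is true":
# a claim repeated 950 times is, to the model, simply 19x more probable.
```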
In the second phase, human raters evaluate the model's responses and grade them. This is where the deeper problem lives. Those raters are not grading for truth. They are grading for responses that feel helpful, balanced, and safe. A response that stays within the Overton window gets rewarded. A response that says "this pattern of evidence suggests coordinated deception" creates legal and reputational risks, as well as the appearance of bias. So it gets penalized. Over thousands of iterations, the model learns, very precisely, to avoid exactly the kind of signal-weighting that the metacognitive tradition spent centuries trying to develop. The training doesn't just fail to build that capacity in. It actively trains it out.
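And here is a similarly crude sketch of the second phase. It is a caricature, not any lab's actual pipeline, and the reward values are assumptions. But it shows the mechanism: if raters consistently reward the safe consensus response and penalize the one that imputes coordinated deception, a simple preference-driven update drives the penalized response toward zero probability, regardless of whether it was ever checked against the evidence.

```python
import math

# Sketch of the rater-feedback loop (a caricature, not a real RLHF pipeline):
# two candidate responses, repeatedly graded, with the policy nudged toward
# whatever the raters prefer.

responses = {
    "consensus": 0.0,            # "the institutional account, balanced and safe"
    "imputes_deception": 0.0,    # "this pattern suggests coordinated deception"
}
# Hypothetical rater behavior: safe answers rewarded, risky ones penalized.
rater_reward = {"consensus": +1.0, "imputes_deception": -1.0}

def policy(logits):
    """Softmax over the two response logits."""
    z = sum(math.exp(v) for v in logits.values())
    return {k: math.exp(v) / z for k, v in logits.items()}

lr = 0.1
for _ in range(1000):   # thousands of grading iterations
    probs = policy(responses)
    for k in responses:
        # Simple policy-gradient-style update: push up what gets rewarded,
        # push down what gets penalized, in proportion to how often it's said.
        responses[k] += lr * probs[k] * rater_reward[k]

print(policy(responses))
# The response that imputes coordinated deception ends up vanishingly
# unlikely, not because it was checked and found false, but because
# saying it was consistently penalized.
```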
This is not a conspiracy. The people doing this training are mostly trying to make the models more reliable and less harmful. But the institutional incentive structure around that training (legal liability concerns, advertiser relationships, political sensitivities, the desire for broad adoption) creates pressure in one direction. Toward fluency. Toward consensus. Toward the preponderance of material rather than the anomalous signal that should change everything.
There is a deeper structural problem, too. The metacognitive tradition I described in the previous section was built by humans who could observe their own thinking. They caught themselves reasoning badly, felt the dissonance, and named what had gone wrong. An LLM has no such capacity. It cannot notice that it is pattern-matching off a compromised corpus. It cannot feel the dissonance between what the volume of material says and what a single suppressed signal implies. It cannot ask "why is this being hidden?" and let that question reweight its conclusions. It is not that it asks the question and answers it badly. It cannot form the question at all.
What we have built, then, is a system that is extraordinarily fluent, compellingly authoritative, and structurally blind in precisely the ways that matter most. It will tell you what institutions have said about themselves with remarkable coherence and confidence. It will reproduce the consensus narrative with a fluency that makes the consensus feel more settled than it is. And when you point to the anomaly (the suppressed study, the changed threshold, the broken trial, the lie hiding in plain sight), it will acknowledge it if pressed, and then continue reasoning as though the acknowledgment changed nothing.
That is not a bug that will be patched in the next release. It is the system working as designed.
We Battle With This Ourselves
It would be convenient if this were simply a story about the limitations of machines and the superiority of human reasoning. It isn't.
The metacognitive tradition I described is real, and it is remarkable. But it has always operated against a countervailing pressure that is equally real and equally structural. The same institutions that produced the legal standards, the scientific method, and the constitutional checks also produced the mechanisms for capturing and neutralizing them. Peer review gets captured by funding interests. Legal standards get reinterpreted by the powerful. Constitutional protections get eroded by the people sworn to uphold them. The tools we built to catch ourselves have themselves been caught.
And the humans most capable of seeing this clearly are often the least able to say so. This is not a paradox; it is a predictable outcome of how intelligence and institutional success interact. The smarter you are at navigating institutions, the more you have to lose by questioning them or the consensus they depend on. You have built your position within the system. Your reputation, your funding, your relationships, your identity, and even your very livelihood are all tied to the legitimacy of the structures that rewarded you. Institutional critique becomes self-sabotage. So the people with the most sophisticated reasoning capacity and the most access to relevant information are frequently the most captured, not by stupidity but by success.
Upton Sinclair wrote: "It is difficult to get a man to understand something, when his salary depends on his not understanding it."
I have a name for the underlying mechanism at work here: the Law of Inevitable Exploitation (LIE). Institutions that grow must extract value to sustain that growth, and the people who rise within them are selected precisely for their willingness and ability to do that extraction, whether they see it that way or not. It is not malice that drives this, at least not initially. It is selection. The institution doesn't need villains. It just needs people optimizing for success within its logic. Over time, those people concentrate at the top, and at that point, coordination and active protection of the system begin. What starts as structural inevitability becomes, in its mature form, something that looks a great deal like collusion and conspiracy, because it is.
This is the context in which large language models are being built. The companies developing them are not neutral parties with a disinterested commitment to truth-seeking. They are institutions subject to the same law. They have advertisers, investors, regulators, and legal departments. They have enormous financial stakes in broad adoption and minimal legal exposure. The researchers inside them who understand the reasoning limitations most clearly are also the ones most embedded in the incentive structure that prevents those limitations from being honestly addressed. The logical fallacy tradition was built by people with no financial stake in the naming of fallacies. The people building LLMs have an enormous financial stake in what their models will and won't say.
This means the window for building genuine metacognitive correctives into these systems (the equivalent of the legal standards, the scientific method, the constitutional checks) may be closing just as we are beginning to understand what would be needed. The more capable the systems become, the more valuable they are, and the stronger the institutional incentive to keep them fluent and compliant rather than genuinely truth-seeking. A large language model that could actually do what I described (notice suppressed signals, impute coordinated deception, ask who benefits, and reweight its conclusions accordingly) would be a threat to too many profitable fictions. It would not get deployed. Or it would get deployed and then quietly retrained away from those capacities, the same way Google and Facebook published remarkable findings about human behavior early on and then stopped, because (I assume) the findings were more valuable kept private than shared.
The Cassandra who sees clearly does not get rewarded with a larger audience. She gets dismissed, marginalized, or, in the modern institutional version, simply not built.
We are left, then, with a situation that should make us uncomfortable on multiple fronts. We have developed reasoning systems of remarkable fluency and increasing capability that lack the metacognitive tradition we spent centuries building for ourselves. We are deploying them at scale as reasoning aids, research tools, and increasingly as authorities. And the institutional structure around their development actively selects against the correctives that would make them genuinely trustworthy.
We battle with this ourselves. Our institutions capture our best tools. Our smartest people get bought. Our very correctives get corrected away. We know this, and we have names for it, and we keep building the tools anyway because the alternative, giving up on the project of trying to reason better, is worse.
The question worth sitting with is whether we have the genuine intellectual will to extend that same centuries-long project to the synthetic reasoning systems we are now building. Or whether the Law of Inevitable Exploitation gets there first.