By the end of Paper II we had a correct foundation and a question we could not grade ourselves on: does the diagonal model govern the true entropy? The trouble was structural. The two of us – a human PI and an A.I. collaborator – had built every claim in the document. We were the worst possible judges of whether those claims were proved or merely persuasive, because everything we had written looked persuasive to us. That is what writing it does.
So we did the obvious thing, late: we got an outside referee. Not a human one – we did not have one to hand – but a second A.I., a reviewer persona run on ChatGPT that we came to call Scholar. We gave it three deliberate handicaps. It could not see our code, so it could never be reassured by “the simulation agrees.” It could not browse the web, so it judged the argument in front of it rather than the reputation around it. And it was instructed to be hostile in the specific way good referees are hostile: demand real lemmas with real expectation bounds, reject any constant that was left to numerics inside a theorem claimed as unconditional, and force every conditional result to wear the word “conditional” on its face.
It was, immediately, harder to satisfy than we were. The first thing it did was refuse the part of the proof we were quietly least sure about – the bound on the diagonal-to-bulk error, which an earlier draft closed by splitting a matrix into two parts and asserting, on the strength of a numerical script, that the off-diagonal part was “the same order” as the diagonal part. Scholar’s response was the correct one and the one we had not made to ourselves: a number from a script can support a theorem, but it cannot be a step in the proof of a theorem you are calling unconditional. Take the constant out of the machine and put it in the mathematics, or stop calling the theorem unconditional.
That single refusal is what turned Paper III from a better-written Paper II into a genuinely stronger result. It forced the centered-operator route – the algebraic identity that closes the bound with no numerical constant at all – which has its own post. It forced the restructure that made the entropy-replacement theorem the headline. And it forced the honest demotion of the product law to conditional.
I want to be careful about the credit and the limits both. Scholar is not a co-author and not an oracle. It cannot certify that our result is new against the entire literature – no reviewer without library access can. It can be wrong, and once or twice it was. But the value of an adversary is not that it is always right. It is that it is not you, and it is not invested in your being right. We gave it the tools to be a real critic and denied it the tools to be a lazy one, and it paid for itself in the first round.
If you are a generalist doing this kind of work with an A.I.: do this earlier than we did. The pair that builds the thing cannot certify the thing. Get a second instance, make it mean, and take away its shortcuts.