The Calibration Log · Donkey on the Edge

Vol. II · The Method · Calibration Log

Total Predictions

Across 3 categories

Resolved

All during Paper 1

Open

Long-horizon, pending

Calibration Score

Evaluating

Insufficient resolved N

Calibration is the practice of giving probability estimates that match reality over the long run. If predictions estimated at 80% likely come true 80% of the time, across many such predictions, the forecasts are well-calibrated. If they come true 50% of the time, the forecaster is overconfident. If they come true 95% of the time, the forecaster is underconfident.
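The definition above can be sketched as a small check: group forecasts by their stated probability and compare each group's observed frequency with that probability. This is a minimal illustration, not the scoring method used by this log; the function name `calibration_table` and the sample data are hypothetical.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, came_true) pairs by stated probability.

    Returns {stated_probability: (count, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for prob, came_true in forecasts:
        buckets[prob].append(1 if came_true else 0)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(buckets.items())}

# Hypothetical data, not taken from this log:
# ten forecasts stated at 80%, of which eight came true.
example = [(0.8, True)] * 8 + [(0.8, False)] * 2
print(calibration_table(example))  # {0.8: (10, 0.8)}
```

An observed frequency well below the stated probability in a bucket points to overconfidence, and one well above it points to underconfidence, but only once the bucket holds enough forecasts for the frequency to mean something.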

This log is the public record of predictions made during the project, with the probabilities estimated at the time of prediction and the outcomes recorded when they land. It is not edited retroactively. If a prediction was wrong, it stands in the log exactly as it was originally stated, alongside the outcome. The log is a measure of forecast quality, not of project quality.

I. Paper 1 · In-Project Predictions

Short-horizon predictions made on May 10 and 11, 2026, during active research. Biased toward success because the work was already well advanced at the point of prediction. All eight resolved Yes.

Date	Prediction	Probability	Outcome	Resolved	Notes
May 10	Phase 1 reproduction of HUZ/Colorado will succeed without significant rewrites.	80%	Resolved Yes	May 10	Three bugs caught and fixed during reproduction; not rewrites in the sense of changing approach.
May 10	The chosen problem (observer complementarity in non-isometric codes) will produce a publishable result.	60%	Resolved Yes	May 11	Paper assembled, endorsed, submitted.
May 11	Phase 3 reproduction of EGH will succeed.	75%	Resolved Yes	May 11	Match within statistical noise across all tested configurations.
May 11	Phase 7 analytic prefactor derivation will match Phase 5 numerical fit to within 1 sigma.	70%	Resolved Yes	May 11	After one iteration to fix a missed Jacobian. Final agreement is under 0.5 sigma.
May 11	Phase 4 scaling exponent will be a simple rational.	55%	Resolved Yes	May 11 PM	Took the Phase 4-to-Phase 5 cycle to find. First fit gave -1.29; corrected fit gave -3/2.
May 11	Manuscript will be ready for arXiv submission by end of day.	50%	Resolved Yes	May 11	Submitted.
May 11	First endorsement request for arXiv will succeed.	40%	Resolved Yes	May 11	Endorser had detailed comments but ultimately said yes.
May 11	The gap between product-class and Haar-class scaling exponents will be an integer.	35%	Resolved Yes	May 11	Estimated low because we did not have a prior reason to expect an integer. The result was surprising.

↑ An oxblood bar marks a prediction that surprised us when it resolved: here, the final row. The integer gap was given a 35% probability. It is now Result IV.

II. Paper 1 & 2 · Tier Predictions

Predicted at the moment of submission. Mutually exclusive and exhaustive within each paper. Summing to 100% is a constraint, not a rhetorical device. Resolution for Paper 1: approximately one year post-arXiv, based on citation count and named-specialist commentary.

Paper	Outcome Predicted	Probability	Status	Notes
Paper 1	Tier-one result: substantial impact within the subfield, several citations in year one, considered meaningful by named specialists.	30%	Open	Awaiting community response. Predicted May 11, 2026.
Paper 1	Tier-two result: publishable, useful within the subfield, not landmark.	50%	Open	Modal outcome at time of prediction.
Paper 1	Below tier-two: corrected and republished, withdrawn, or quietly ignored.	20%	Open	Honest tail. Sums to 100% with rows above.
Paper 2	Tier-one result.	25%	Open	Extension to evaporating black holes plus rank-r interpolation.
Paper 2	Tier-two result: publishable, useful, not landmark.	40%	Open	Modal outcome at time of prediction.
Paper 2	Publishable null result: framework does not extend cleanly in the way we hope.	20%	Open	Null results in this subfield are still publishable.
Paper 2	Not publishable: framework breaks, question ill-posed, or work gets scooped.	15%	Open	Sums to 100% with rows above.
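The sum-to-100% constraint can be checked mechanically. The sketch below copies the percentages from the table above; the dictionary layout and the helper name `sums_to_100` are illustrative choices, not part of the log itself.

```python
# Tier probabilities in percent, copied from the table above.
tiers = {
    "Paper 1": [30, 50, 20],
    "Paper 2": [25, 40, 20, 15],
}

def sums_to_100(percentages):
    """True if a set of mutually exclusive, exhaustive outcomes sums to 100%."""
    return sum(percentages) == 100

for paper, percentages in tiers.items():
    print(paper, sum(percentages), sums_to_100(percentages))
```

Working in integer percentages rather than fractions keeps the comparison exact and avoids floating-point rounding in the sum.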

III. Methodology · Long-Horizon Predictions

Predictions about the broader fate of the methodology itself. Horizons of one, two, and five years. Higher uncertainty. These are the predictions we care most about, and the ones we can say least about right now.

Predicted	Prediction	Probability	Outcome	Notes
May 11	Within one year, at least one other generalist will attempt a similar project in a different field, citing this methodology.	70%	Open	Prerequisite: the site, the paper, and the methodology essays must be public. They are.
May 11	Within five years, AI-augmented research of this type will be a recognized category of scientific work, with conferences, methodology papers, and a community of practitioners.	60%	Open	Higher uncertainty. The field is moving fast enough that the prediction may resolve in either direction within two years.
May 11	The Phase 4 wrong-exponent episode will be referenced in at least one external methodology paper within two years.	35%	Open	Conditional on the project getting any meaningful press at all. If the press strip stays empty, this resolves No.

On Calibration Quality, Current Assessment

With eight resolved predictions, all resolving Yes, we cannot yet assess calibration quality. A forecaster who says "60%" and is right eight times in a row may be underconfident, or may be well-calibrated and very lucky. Distinguishing these cases requires many more predictions across the full probability range.

The resolved predictions are also short-horizon and biased toward success: they were made when the work was already well-advanced. They are included for completeness, not as evidence of calibration skill.
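To put a rough number on the eight resolved predictions, the sketch below multiplies their stated probabilities from section I. It assumes the eight outcomes are independent, which they almost certainly are not, since they all concern the same project over two days; treat the result as an illustration rather than a test.

```python
import math

# Stated probabilities of the eight Paper 1 predictions, from section I.
stated = [0.80, 0.60, 0.75, 0.70, 0.55, 0.50, 0.40, 0.35]

# Assumption: outcomes are independent and each stated probability is exact.
# Under that assumption, the chance that all eight resolve Yes is the product.
p_all_yes = math.prod(stated)

mean_stated = sum(stated) / len(stated)
hit_rate = 1.0  # eight of eight resolved Yes

print(f"P(all eight Yes | calibrated, independent) = {p_all_yes:.4f}")
print(f"mean stated probability = {mean_stated:.3f}, observed hit rate = {hit_rate:.3f}")
```

Under these assumptions the product comes to roughly 1%, against a mean stated probability of about 58%. That gap is what the success bias described above would produce; correlated outcomes or plain luck would produce it too, and eight data points cannot separate the three.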

The methodology predictions are the real test. If the 70% prediction about other generalists and the 60% prediction about community formation both resolve No, that is evidence we are overconfident, and our future forecasts should be adjusted downward. The log will show this clearly. That is the point of the log.

Paper II → III, On the Record

Five predictions resolved between May 14 and May 30, MMXXVI. Appended in Vol. II, No. I – the Paper II/III ledger.

PARTIALLY FALSIFIED

Predicted (Paper I→II): the structural identity, if re-derived carefully, would hold exactly.

Outcome: it holds against the diagonal of the marginal, not the full marginal. Corrected in Paper II. A prediction we are glad we checked.

CONFIRMED (Haar) · CONDITIONAL (Product)

Predicted (Paper II→III): the diagonal model governs the true observer entropy.

Outcome: CONFIRMED for the Haar class – proved unconditionally as the entropy-replacement theorem. CONDITIONAL for the product class – replacement holds only in a regime that is not fully controlled; stated as conditional in plain sight.

CONFIRMED (re-proved)

Predicted: the off-diagonal contribution to the replacement error is the same order as the diagonal contribution.

Outcome: CONFIRMED, but the first proof was rejected because it leaned on numerics. Re-proved by the centered-operator identity with no constant left to numerics. See: The Centered-Operator Trick.

PREFACTOR RIGHT, STRUCTURE WRONG

Predicted: the grouped-Dirichlet covariance shortcut gives the right Haar prefactor.

Outcome: the prefactor was right; the covariance structure was wrong. A right answer for a wrong reason, caught and corrected – twice. Logged as a cautionary entry. See: The Covariance That Lied.

CONFIRMED

Predicted: the integer exponent gap survives the rewrite.

Outcome: CONFIRMED. Unchanged through a corrected foundation and two appendix rewrites. Still unexplained. See: One Power of d, Still Unexplained.

The log is updated when predictions resolve. New predictions are added when the project makes calibrated statements about future events. Both types of updates carry the date they were made. The log is not edited retroactively. If a prediction was wrong, the prediction stands in the log as it was originally stated, alongside the outcome.