The Donkey on the Edge
Vol. II · The Method · Calibration Log
Total predictions: 13 (across 3 categories)
Resolved: 8 (all during Paper 1)
Open: 5 (long-horizon, pending)
Calibration score: evaluating (insufficient resolved N)

Calibration is the practice of giving probability estimates that match reality over the long run. If predictions estimated at 80% likely come true 80% of the time, across many such predictions, the forecasts are well-calibrated. If they come true 50% of the time, the forecaster is overconfident. If they come true 95% of the time, the forecaster is underconfident.
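
A minimal sketch of the check described above: group predictions by stated probability and compare each group's stated probability with how often its members came true. It assumes predictions are stored as hypothetical (probability, came_true) pairs; the function name `calibration_table` and the sample data are illustrative and are not drawn from this log.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (probability, came_true) pairs by stated probability and
    return {probability: (count, observed_frequency)}."""
    buckets = defaultdict(list)
    for prob, came_true in predictions:
        buckets[prob].append(came_true)
    return {
        prob: (len(outcomes), sum(outcomes) / len(outcomes))
        for prob, outcomes in sorted(buckets.items())
    }


# Hypothetical data: ten forecasts stated at 80%, of which eight came true.
example = [(0.8, True)] * 8 + [(0.8, False)] * 2
for prob, (n, freq) in calibration_table(example).items():
    # A well-calibrated bucket has freq close to prob once n is large.
    print(f"stated {prob:.0%}: {freq:.0%} observed over {n} predictions")
```

Grouping by exact stated probability keeps the sketch simple; with many forecasts at scattered values, binning into ranges (for example 0.5 to 0.6) is the usual alternative.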

This log is the public record of predictions made during the project, with the probabilities estimated at the time of prediction and the outcomes recorded when they land. It is not edited retroactively. If a prediction was wrong, it stands in the log exactly as it was originally stated, alongside the outcome. The log is a measure of forecast quality, not of project quality.

I. Paper 1 · In-Project Predictions

Short-horizon predictions made on May 10 and 11, 2026, during active research. They are biased toward success because the work was already well advanced at the point of prediction. All eight have resolved.

Date · Prediction · Probability · Outcome · Resolved · Notes
May 10 · Phase 1 reproduction of HUZ/Colorado will succeed without significant rewrites. · 80% · Yes · May 10 · Three bugs caught and fixed during reproduction; not rewrites in the sense of changing approach.
May 10 · The chosen problem (observer complementarity in non-isometric codes) will produce a publishable result. · 60% · Yes · May 11 · Paper assembled, endorsed, submitted.
May 11 · Phase 3 reproduction of EGH will succeed. · 75% · Yes · May 11 · Match within statistical noise across all tested configurations.
May 11 · Phase 7 analytic prefactor derivation will match Phase 5 numerical fit to within 1 sigma. · 70% · Yes · May 11 · After one iteration to fix a missed Jacobian. Final agreement is under 0.5 sigma.
May 11 · Phase 4 scaling exponent will be a simple rational. · 55% · Yes · May 11 PM · Took the Phase 4-to-Phase 5 cycle to find. First fit gave -1.29; corrected fit gave -3/2.
May 11 · Manuscript will be ready for arXiv submission by end of day. · 50% · Yes · May 11 · Submitted.
May 11 · First endorsement request for arXiv will succeed. · 40% · Yes · May 11 · Endorser had detailed comments but ultimately said yes.
May 11 · The gap between product-class and Haar-class scaling exponents will be an integer. · 35% · Yes · May 11 · Estimated low because we did not have a prior reason to expect an integer. The result was surprising.

↑ The last row is flagged as a prediction that surprised us when it resolved: the integer gap was estimated at 35% likely. It is now Result IV.

II. Paper 1 & 2 · Tier Predictions

Predicted at the moment of submission. Mutually exclusive and exhaustive within each paper. Summing to 100% is a constraint, not a rhetorical device. Resolution for Paper 1: approximately one year post-arXiv, based on citation count and named-specialist commentary.

Paper · Outcome · Predicted probability · Status · Resolution criteria
Paper 1 · Tier-one result: substantial impact within the subfield, several citations in year one, considered meaningful by named specialists. · 30% · Open · Awaiting community response. Predicted May 11, 2026.
Paper 1 · Tier-two result: publishable, useful within the subfield, not landmark. · 50% · Open · Modal outcome at time of prediction.
Paper 1 · Below tier-two: corrected and republished, withdrawn, or quietly ignored. · 20% · Open · Honest tail. Sums to 100% with rows above.
Paper 2 · Tier-one result. · 25% · Open · Extension to evaporating black holes plus rank-r interpolation.
Paper 2 · Tier-two result: publishable, useful, not landmark. · 40% · Open · Modal outcome at time of prediction.
Paper 2 · Publishable null result: framework does not extend cleanly in the way we hope. · 20% · Open · Null results in this subfield are still publishable.
Paper 2 · Not publishable: framework breaks, question ill-posed, or work gets scooped. · 15% · Open · Sums to 100% with rows above.

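
Because each paper's outcomes are mutually exclusive and exhaustive, the sum-to-100% constraint can be checked mechanically, and once a paper resolves, its whole distribution can be scored at once. The sketch below uses the multiclass Brier score, which is one standard scoring rule among several; the log does not say which rule, if any, it will apply, so treat that choice, the dictionary `TIERS`, and the short outcome labels as assumptions. The probabilities themselves are copied from the table above.

```python
import math

# Tier probabilities from the table above; labels are shorthand.
TIERS = {
    "Paper 1": {"tier-one": 0.30, "tier-two": 0.50, "below tier-two": 0.20},
    "Paper 2": {"tier-one": 0.25, "tier-two": 0.40,
                "publishable null": 0.20, "not publishable": 0.15},
}


def check_exhaustive(dist):
    """Mutually exclusive, exhaustive outcomes must sum to 1."""
    total = sum(dist.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, not 1")


def multiclass_brier(dist, resolved):
    """Sum of squared gaps between each stated probability and the 0/1
    indicator of the outcome that actually occurred (lower is better)."""
    return sum((p - (1.0 if name == resolved else 0.0)) ** 2
               for name, p in dist.items())


for paper, dist in TIERS.items():
    check_exhaustive(dist)
    for outcome in dist:
        score = multiclass_brier(dist, outcome)
        print(f"{paper}, if {outcome}: Brier {score:.3f}")
```

Under this rule the Paper 1 forecast scores best (0.38) if the modal tier-two outcome occurs and worst (0.98) if the 20% tail occurs.
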
III. Methodology · Long-Horizon Predictions

Predictions about the broader fate of the methodology itself. One- and five-year horizons. Higher uncertainty. These are the predictions we care most about, and the ones we can say least about right now.

Predicted · Prediction · Probability · Outcome · Notes
May 11 · Within one year, at least one other generalist will attempt a similar project in a different field, citing this methodology. · 70% · Open · Prerequisite: the site, the paper, and the methodology essays must be public. They are.
May 11 · Within five years, AI-augmented research of this type will be a recognized category of scientific work, with conferences, methodology papers, and a community of practitioners. · 60% · Open · Higher uncertainty. The field is moving fast enough that the prediction may resolve in either direction within two years.
May 11 · The Phase 4 wrong-exponent episode will be referenced in at least one external methodology paper within two years. · 35% · Open · Conditional on the project getting any meaningful press at all. If the press strip stays empty, this resolves No.

On Calibration Quality, Current Assessment

With eight resolved predictions, all resolving Yes, we cannot yet assess calibration quality. A forecaster who says "60%" and is right eight times in a row is consistent with being underconfident, or with being well-calibrated and very lucky. Distinguishing these cases requires many more predictions across the full probability range.

The resolved predictions are also short-horizon and biased toward success: they were made when the work was already well-advanced. They are included for completeness, not as evidence of calibration skill.
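
The caution above can be made concrete with the eight Section I probabilities. A minimal sketch, assuming each prediction is scored as an independent binary forecast; that assumption is almost certainly false here, since all eight concern the same project and were made once the work was well advanced. Variable names are illustrative.

```python
import math

# Stated probabilities of the eight Section I predictions, all resolved Yes.
stated = [0.80, 0.60, 0.75, 0.70, 0.55, 0.50, 0.40, 0.35]
outcomes = [1] * len(stated)

# Number of Yes outcomes a perfectly calibrated forecaster would expect.
expected_hits = sum(stated)
observed_hits = sum(outcomes)

# Mean squared gap between stated probability and outcome (lower is better).
brier = sum((p - o) ** 2 for p, o in zip(stated, outcomes)) / len(stated)

# Chance that all eight resolve Yes IF every stated probability were exact
# AND the predictions were independent (a strong assumption, see above).
p_all_yes = math.prod(stated)

print(f"expected {expected_hits:.2f} hits, observed {observed_hits}")
print(f"Brier score {brier:.4f}")
print(f"P(all eight Yes | calibrated, independent) = {p_all_yes:.4f}")
```

Under these assumptions, eight hits against about 4.65 expected points toward underconfidence or toward the selection effect described above, not toward overconfidence; with N = 8 and correlated outcomes, it does not settle the question.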

The methodology predictions are the real test. If the 70% prediction about other generalists resolves No, and the 60% prediction about community formation resolves No, that is evidence we are overconfident, and our future forecasts should be adjusted downward. The log will show this clearly. That is the point of the log.

Paper II → III, On the Record

Five predictions resolved between May 14 and May 30, MMXXVI. Appended in Vol. II, No. I – the Paper II/III ledger.

  1. PARTIALLY FALSIFIED

    Predicted (Paper I→II): the structural identity, if re-derived carefully, would hold exactly.

    Outcome: it holds against the diagonal of the marginal, not the full marginal. Corrected in Paper II. A prediction we are glad we checked.

  2. CONFIRMED (Haar) · CONDITIONAL (Product)

    Predicted (Paper II→III): the diagonal model governs the true observer entropy.

    Outcome: CONFIRMED for the Haar class – proved as the entropy-replacement theorem, unconditional. CONDITIONAL for the product class – replacement holds in a regime not fully controlled; stated as conditional, in plain sight.

  3. CONFIRMED (re-proved)

    Predicted: the off-diagonal contribution to the replacement error is the same order as the diagonal contribution.

    Outcome: CONFIRMED, but the first proof was rejected because it was numerics-backed. Re-proved by the centered-operator identity, with no constant left to numerics. See: The Centered-Operator Trick.

  4. PREFACTOR RIGHT, STRUCTURE WRONG

    Predicted: the grouped-Dirichlet covariance shortcut gives the right Haar prefactor.

    Outcome: the prefactor was right; the covariance structure was wrong. A right answer for a wrong reason, caught and corrected – twice. Logged as a cautionary entry. See: The Covariance That Lied.

  5. CONFIRMED

    Predicted: the integer exponent gap survives the rewrite.

    Outcome: CONFIRMED. Unchanged through a corrected foundation and two appendix rewrites. Still unexplained. See: One Power of d, Still Unexplained.

The log is updated when predictions resolve. New predictions are added when the project makes calibrated statements about future events. Both types of updates carry the date they were made. The log is not edited retroactively. If a prediction was wrong, the prediction stands in the log as it was originally stated, alongside the outcome.