The Dated Innovation Log as an Audit Artifact: Writing the Verdict Before You Know the Ending

The most defensible document from our first year of subsurface machine-learning work for a mid-sized Middle East carbonate operator is not the model we eventually shipped. It is a plain dated log, a two-page monthly-progress infographic drawn in Canva, that recorded what we tried each month and what verdict it earned. Most of those verdicts were some version of "no." That log, not the model, is what let us stand in front of a ministry board and, later, a set of reviewers and say precisely why we had ruled out the approaches we ruled out. This piece is about the log itself as an artifact of governance and integrity, and about the small disciplines that make such a record hold up when someone with reason to doubt you reads it line by line.

The mechanics of what failed and what replaced it are told elsewhere. The clustering post-mortem, the reason unsupervised methods could not separate the features, and the pivot that followed live in a companion piece; the story of how a first overfit model was fixed by working the data rather than the network lives in another. Neither is re-litigated here. What we want to argue instead is narrower and, we think, more portable: that a dated log kept as you work is a different class of evidence from a narrative written once you know how the story ends, and that the difference is the whole reason to keep one.

What an audit actually asks of a research record

A ministry board and a peer reviewer want the same thing, phrased differently. They are not asking whether your final method is good. They are asking whether you were honest about the road to it, because a method presented without its discarded alternatives is a method you cannot cross-examine. The single question underneath every audit is this: did you rule the alternatives out, or did you simply stop looking when one thing worked?

A dated log answers that question in a way a results section never can. To survive the reading, a log carries three things for each approach: the approach named plainly, the date it was live, and a falsifiable exit written at the time, a specific condition under which it was declared dead. "Density-based clustering, first quarter, abandoned because no single parameter setting held across sections after more than ten thousand were tried" is auditable; each clause can be checked. "We explored clustering and moved on" is not a record but a summary of one, and a summary is exactly what an auditor cannot trust, because summaries are written by the winner.

The falsifiable exit is the load-bearing part. An exit condition recorded in the moment is a commitment you cannot quietly revise: here is what would have made us keep going, and it did not happen. Reconstructed afterward, an exit always fits the decision you already made, because you write it to justify a conclusion you already hold. The log removes that degree of freedom. It forces the verdict to be a thing that happened on a date, not a thing you believe now.

Why the verdict written in the moment beats the tidy story

There is a strong pull, at the end of any project that worked, to present a clean line from problem to solution and let the months that went nowhere fall away. The clean line is easier to read and it makes everyone look more certain than they were. It is also the single most effective way to launder the negatives out of a record.

Reconstruction does not lie so much as it smooths. Writing after the fact, you remember the abandoned branch as obviously doomed, because you know now that it was, and you compress six weeks of genuine uncertainty into a sentence that makes the outcome sound foreordained. The dead-ends stop being experiments with real stakes and become throat-clearing before the real result. That smoothing is what destroys the audit value, because the thing a reviewer needs to see is that the alternatives were live and seriously pursued before they were killed. A negative result only counts as evidence if it was a real attempt, and only a contemporaneous record can show that it was.

Writing the verdict in the moment costs the discomfort of committing a "Fail" to paper while you still half-believe the approach might be rescued. It buys a record no later story can overwrite, because the entry is dated and the date is earlier than the ending. When we later had to explain why label-free methods had been set aside, we were not building a case. We were reading one off a document we had kept the whole time.

Numbers make a verdict checkable

A verdict without a number is an opinion; a verdict with one is a claim someone else can test. "We tried a lot of settings" is an anecdote. "More than ten thousand parameter combinations, and the tuning was logged as very unstable" is a finding an auditor can ask to see. "Clustering was slow" is a shrug. "Out of memory on a fifty-gigabyte machine" is a constraint anyone can reproduce on the same hardware.

Record the number next to the verdict at the moment you reach it, because the number is what makes the exit condition falsifiable. Written later, quantities drift toward whatever makes the story tidy. A log full of round, convenient figures is a log to distrust; a log carrying odd, specific quantities, an awkward memory ceiling, a strange cluster count that alone held together, reads like a measurement rather than a memory.

The dated innovation log rendered as a funnel, read across three quarters. Each approach the log recorded a verdict for terminates in the quarter it was retired: DBSCAN and hierarchical clustering (Jan-Mar, over 10,000 parameter combinations, logged as very unstable, hierarchical ran out of memory and only held together at K=4), GAN NaN-fill (undesired output), and unsupervised segmentation on static sections (discarded); SCOPS self-supervised segmentation dropped in Apr-Jun after 5/3/2-class trials. Conventional fitting is kept for depth only, where its confidence was good, and the supervised pipeline setup carries forward. The funnel mouth narrows column by column as approaches drop out, and the one branch drawn in orange, the supervised transformer, is the only path that survives every quarter. The recorded verdict for every branch is listed in a right-hand legend, tied back to its node so the drawing never crowds itself. The exhibit is not a technical post-mortem, which lives in the companion pieces; it is a picture of what a contemporaneous log preserves. Toggle the lever to strip the retired rows from the drawing: with nothing dated and retired first, the surviving pivot has no evidence behind it, which is the whole reason the retired branches, not the survivor, are the artifact worth keeping. Every node label and verdict is sourced from the monthly-progress log; nothing here is illustrative.

The struck-out list is the artifact, not the debris

The instrument above is a plain rendering of the log itself: each approach that earned a recorded verdict, placed in the quarter it was retired, with the reason preserved beside it. Read it as an audit trail, not a technical diagram. Strip the retired branches away, as the toggle does, and you are left with the surviving path and no case for it. The pivot reads as sound only because the alternatives were struck out first, on dated rows, with numbers attached.

That is why we treat the crossed-out entries as the deliverable rather than as failures to forget. Individually each is a dead-end. Collectively, dated and preserved, they are the proof that the decision at the end of the funnel was earned by elimination, not chosen by preference. A record showing only the winner is asking to be trusted; a record showing the whole field, marked with where and when each option died, has already earned it. The value is in the columns of retired approaches, not the single arrow that survives them.

The placeholder is a feature, not an embarrassment

The working infographic carried, into every saved version, an unedited Canva placeholder paragraph that nobody removed. It is the kind of thing you catch yourself apologizing for. We have come to think it is the most persuasive single element in the whole record.

An unedited placeholder is an integrity signal. It is evidence that the document was a working artifact, updated in the middle of the work and never dressed up for an audience. A record polished for presentation would have had the placeholder cleaned out along with, one worries, the months that did not work; the same instinct removes both. The placeholder that survives is a small, involuntary proof that nothing was back-filled, that the log is contemporaneous rather than reconstructed. The mess is not a lapse in the record. It is the watermark that tells you the record is real.

What to keep, in practice

The transferable practice is short. For every approach, write it down while it is live, with the date. When you retire it, record the verdict and the specific condition that killed it, in the moment, before you know whether the next thing will work. Keep the numbers next to the words. And resist the urge to tidy the artifact for an audience, because the roughness is what makes it trustworthy. The goal is not a beautiful document. It is a document a skeptical reader, a board, a reviewer, your own future self, can use to reconstruct what you decided and, harder, that you were entitled to decide it.

Limitations

This is an account of documentation practice, not a benchmark, and it argues from a single engagement. The verdicts we quote were engineering judgements recorded under time pressure, and some approaches were retired partly because pursuing them further would have cost time committed elsewhere, not because they were proven impossible in principle. A contemporaneous log defends the honesty of a decision; it does not by itself prove the decision was optimal, and a well-kept record of a wrong turn is still a record of a wrong turn. What we defend is the narrower claim: that dating negatives as they happen, with their exit conditions and their numbers, produces a class of evidence that a reconstructed narrative cannot, and that this is worth the discomfort of writing "Fail" before you are sure.

References

[1] Sandve, G. K., Nekrutenko, A., Taylor, J., Hovig, E. Ten Simple Rules for Reproducible Computational Research. PLoS Computational Biology, 2013. On dated, contemporaneous record-keeping as the basis of a defensible computational result. https://doi.org/10.1371/journal.pcbi.1003285

[2] Munafò, M. R. et al. A Manifesto for Reproducible Science. Nature Human Behaviour, 2017. On pre-committed criteria and transparent records as safeguards against post-hoc rationalisation. https://doi.org/10.1038/s41562-016-0021

The Dated Innovation Log as an Audit Artifact: Writing the Verdict Before You Know the Ending

What an audit actually asks of a research record

Why the verdict written in the moment beats the tidy story

Numbers make a verdict checkable

The struck-out list is the artifact, not the debris

The placeholder is a feature, not an embarrassment

What to keep, in practice

Limitations

References

Continuous AI for explorers

About Earthscan

Products

Legal

Follow us on