The most useful thing we shipped in the middle of this engagement did not detect a fracture or fill a gap in a borehole-image log. It was a spreadsheet that became an application. We called it the data-management dashboard, and across the messy stretch of Phase 2 it went from V1 to V11. Nobody sat down to design it. Each version was a patch on the last week's confusion, and the whole sequence is a fairly honest record of what internal tooling has to become when the data it tracks arrives late, wrong, and out of order.
This is a build story about that dashboard. Not the models it fed, but the ledger itself: why it kept getting a new version number, what each one added, and the single moment, at V9, when it stopped being a private QC crutch and became the thing that let us name a program risk on a board deck.
The first version was a table nobody trusted
V1 was a spreadsheet with one tab per delivery. When a batch of wells arrived, someone opened the tab and counted, by hand, which channels were present: the borehole-image log, the apparent dip and azimuth picks, the well radius, the PDF. That was the entire tool. It answered exactly one question, "did this delivery contain what we expected," and it answered it badly, because two people counting the same folder on two days would disagree.
The disagreement was the point. Data was not landing as clean triplets of image log, picks, and PDF. It landed as a folder with a depth range in the file name that did not match the depth range inside the file, with raw pick sheets where the depth jumped from one interval to another with no rows in between, and with static-value ranges that were an order of magnitude off what a borehole-image tool normally writes. A hand-counted tab could not hold any of that. So we versioned it.
Every version number was a defect we had learned to expect
The jump from V1 through the early versions was not planned refactoring. It was us encoding, one at a time, the specific ways a delivery could be broken that we had already been burned by.
An early version added received-and-missing columns per well, so the count stopped being a manual read. The next reconciled the depth range claimed by the delivery against the depth range actually present in the file, because we had shipped analysis against a well before noticing the log covered less depth than the manifest promised. Another version interrogated the sampling interval inside the raw pick sheets, flagging the jumps where picks skipped hundreds of meters, so we could send the operator a precise question instead of a vague worry. A later one added static-range checks per logical file, which is how two wells eventually got flagged for abnormal static ranges and set aside.
None of these were features in the product-roadmap sense. Each one was a scar. The dashboard matured because the data kept finding new ways to be wrong, and the only durable fix was to teach the ledger to catch that class of wrongness automatically rather than trusting a human to notice it by eye at eleven at night.
The version numbers were real, and the decks cite them
It would be easy to round this off as a figure of speech. It is not. The version span shows up in the engagement record. The board deck for the Phase 2 review cites the innovation funnel running from the NaN remover and the sinusoid picker through to the data-management dashboard at V1 to V7, and later V9. The weekly running log enumerates the versions past that, through V11, including the branch where two parallel variants carried the same number for a stretch. The dashboard has a genuine version history, tracked alongside the models, because it was maintained with the same discipline as the models.
The ladder above is the argument in one frame. Drag the version lever and the marker climbs from V1, an ad-hoc QC table, toward V11, a structured well-status ledger. Every rung is one capability the tool gained, and the height is the point: the dashboard did not add features sideways, it climbed. The orange rung is V9, and V9 is where this story turns.
V9 is where a private tool became a board argument
By the time we reached V9, the ledger held a whole well's status rather than a single delivery's contents. It knew which wells had arrived, which had usable static ranges, which were excluded, and how the running total compared to what the next phase needed. That last comparison is what changed the tool's job.
Phase 3 was expected to start in November 2022, and its minimum data requirement was on the order of 10 to 15 wells. The ledger, by V9, could put those two facts next to the actual count in one view: all 10 wells delivered with their image logs, dip and azimuth, radius and PDF, but two excluded for abnormal static ranges, leaving 8 usable. Eight against a minimum of ten to fifteen, with a phase boundary two months out and the pipeline not filling. Stated that plainly, it was no longer a QC nuisance. It was a program risk.
The phrase that made it onto the deck was that data collection was moving at a really slow pace and timelines would shift. That sentence reads like a management observation. It was actually a dashboard reading. The ledger is what let us say it with a number behind it rather than as a feeling, and a number behind it is what a steering group can act on.
The intervention the ledger drove
Naming the risk was not the end of it. The reading at V9 is what triggered an interactive data-management workshop, run when data collection had stalled, built around an 8-dimension before/after MLOps maturity framework adapted from McKinsey's account of what it takes to scale AI like a tech native [1]. The eight dimensions covered data management, AI development, deployment, live model operations, the technology stack, governance and security, model-asset management, and people and skills, each with a concrete before-and-after contrast rather than a score.
The workshop existed because the dashboard had made the shortfall undeniable. Without a structured ledger, the slow-pace problem stays anecdotal, everyone has a different sense of how bad it is, and the fix is another round of asking nicely for data. With the ledger, the shortfall had a shape: this many wells short, this many weeks from the phase boundary, these specific delivery defects recurring. That shape is what a maturity review can be built on, and it reframed the conversation from "please send data faster" to "here is where the data operation itself needs to change." The later dashboard versions, V10 and V11, carried that reframing back into the tool, gating readiness against the phase minimum and holding the maturity review's structure.
What we would keep, and what we would not repeat
The keepable posture is that internal tooling deserves version numbers and the same maintenance the models get. A QC ledger is not overhead you tolerate until the real work starts. On a program where the data arrives in bad shape, it is the instrument that tells you the real work is at risk before the risk is a crisis. The eleven versions were worth every hour, and the one that paid for all of them was V9, the version that could turn a folder count into a board-level sentence.
The part we would not repeat is letting it grow only reactively. Some of those defect classes were predictable from the first bad delivery, and a ledger designed up front to reconcile manifests, sampling intervals, and value ranges would have caught them earlier. The dashboard climbed because we made it climb after each fall. Next time we would rather it started a few rungs higher.
Limitations
This is one program's internal-tooling history, anonymised, and it is a build narrative rather than a controlled study. The version count, the versions cited on the decks, the trigger reading, the Phase 3 timing and well-count minimum, the data state of ten delivered and eight usable, and the eight maturity dimensions are all sourced from the engagement archive. The maturity height assigned to each version in the instrument is an illustrative ordering of the capabilities those versions carried, not a measured metric, and it should be read as a narrative ladder rather than a benchmark. The precise contents of intermediate versions are reconstructed from the running log and progress records and may compress or reorder minor changes. Nothing here is a claim that eleven versions is the right number for any other program; it is the number this data chaos produced.
References
- Fountaine, T., McCarthy, B., Saleh, T. Scaling AI like a tech native: The CEO's role. McKinsey Quarterly, 2020. https://www.mckinsey.com/capabilities/quantumblack/our-insights/scaling-ai-like-a-tech-native-the-ceos-role