There is a version of the subsurface-AI business case that asks the wrong question. It asks how accurate the model is, benchmarks it against an expert, and declares victory or defeat on a percentage point of dip error. That framing misses the reason an operator reaches for automation in the first place. The binding constraint in formation evaluation is rarely accuracy. It is throughput — the rate at which a fixed number of trained interpreters can turn raw image logs into a structurally coherent picture of a field, set against the rate at which the drill bit produces new wells to interpret. When we scoped a multi-phase formation-evaluation programme with a mid-sized Middle East carbonate operator we partnered with, the scoping document made the gap unmistakable: more than 80 processed and interpreted high-resolution borehole image logs already in hand to normalise, and a correlation target that ran from a single pair of wells up to all 80-plus across the field. No realistic headcount clears that. The question was never can a person do this well — it was can any team of people do this fast enough, and the honest answer was no.
The arithmetic of a fixed expert headcount
Start with the unit of work. A geoscientist interpreting a high-resolution borehole image log is picking sinusoids — every fracture, every bedding plane that crosses the borehole — fitting a curve to each, and reading off depth, dip, and azimuth. On a fractured carbonate play that is dense, careful work. The image detail in this dataset resolved down to roughly 50 micrometres, which is precisely what makes manual picking both possible and slow: there is genuine geological signal at fine scale, and a human has to attend to all of it. The contract's own benchmark for an automated pass set the bar at the order of 30 seconds to interpret a metre of log, framed as roughly five times faster than the manual baseline it would replace. Flip that ratio around and the manual cost is on the order of minutes per metre — and a single well carries hundreds of metres of interpretable interval.
Now multiply. Eighty-plus wells, each hundreds of metres deep in the zones that matter, is not a quarter's work for a small interpretation team — it is years of it, and that is before a single new well is drilled. Worse, the deliverable the operator actually wanted was not 80 isolated interpretations. It was correlation: the Phase-2 scope was written explicitly as correlation across 2 to 80 wells, the regional picture in which a bedding-density signature in one well is tied to its continuation in the next, fault and fracture trends integrated field-wide. Correlation is combinatorial. The interpretive load does not grow with the number of wells; it grows with the relationships between them. Adding the eighty-first well does not add one well's worth of work — it adds the work of reconciling that well against everything already interpreted.
This is the shape of every capacity crisis: a load that climbs steeply against a ceiling that does not move. The expert headcount is the ceiling. It is set by a global scarcity of senior borehole-image interpreters, by hiring cycles measured in quarters, and by the years it takes to train a geoscientist to validator-grade judgement. You cannot buy your way through it on the timeline a drilling programme runs to. The instrument below makes the dynamic concrete: a per-survey interpretation load climbing two orders of magnitude while the analyst ceiling stays flat, and the orange wedge between them — the work no fixed team can ever clear.
The lesson generalises past this one field. Drag the ceiling in that exhibit as high as you like; it only flips to "within human reach" at a headcount no operator funds. That is the structural case for automation, and it is why the economics of the engagement were framed from the outset not as hire more interpreters but as scale a model — a decision with a fundamentally different cost curve.
Two cost curves, and why they diverge
Hiring is linear in the work. Each additional well of backlog, each new well off the rig, demands a roughly fixed increment of expert-hours, and expert-hours are the scarcest, least elastic input in the business. Double the drilling cadence and you double the interpretation bill — if you can even find the people, which on the timeline that matters you cannot. The cost per interpreted well is flat at best and rising at worst, because the marginal interpreter is harder to hire than the last.
A model inverts that curve. The expensive part — assembling a labelled corpus, engineering the computer-vision pipeline, and training the detector — is paid once, up front, and it is genuinely expensive: data engineering, augmentation, architecture search, and validation against geologist-picked ground truth. But once that fixed cost is sunk, the marginal cost of interpreting the next well collapses toward the cost of GPU time. The model that has learned to pick sinusoids on this carbonate does not get tired on well sixty, does not need re-hiring for well eighty-one, and runs at the ~30-seconds-per-metre order the contract anchored to. The break-even is not subtle. Against a backlog of 80-plus wells and a live drilling programme feeding more, the amortised cost per well crosses below the manual line early and keeps falling.
Before
Linear in expert-hours
Hire/retain interpreters: cost per well roughly fixed at best, rising as senior interpreters get scarcer; correlation load grows combinatorially with well count
After
Fixed build, near-zero margin
Train the CV pipeline once; marginal cost per new well collapses toward GPU time at ~30 s/m, ~5x the manual rate
Break-even early against an 80+ well backlog and a live drilling cadence
None of this argues that the model replaces the geoscientist. It re-tasks them. The throughput gain we targeted was explicitly a productivity multiplier on the expert, not a substitute for them: in the well-to-well correlation tooling the programme projected on the order of a 60% productivity lift and a 75% interpretation-accuracy gain, with the human moving up the stack — from picking every sinusoid by hand to validating, correlating, and reasoning about the field-scale story the model surfaces. The scarce expert stops being the bottleneck on volume and becomes the arbiter of quality, which is the only role that scarcity should be spent on.
What had to be true for the curve to bend
The throughput argument is only honest if the model is good enough that the expert can trust its output without re-doing the work — otherwise you have not removed the bottleneck, you have added a review step in front of it. That is the entire reason we structured the engagement as a phased ramp rather than a single delivery. The phase ladder ran from Phase 0 setup and governance, through Phase 1 dataset development and the core algorithm and model engineering, into Phase 2 data pre-processing and the 2-80-well correlation and AI-model integration, then Phase 3 production serving, CI/CD, and rollout to new business-unit areas — with Phases 4 and 5 reserved for scale-out and spin-out. Capacity was not switched on; it was earned, one validation gate at a time.
The clearest evidence that this was a capacity ramp and not a demo is the well-maturity tracking the programme reported on its own progress decks. Wells moved through maturity states, and the count of fully matured wells climbed deliberately — on the order of 3 of 80, then 8 of 80, against a near-term target of roughly 20-25 of 80. That is what closing a capacity gap looks like in practice: not a binary "the model works", but a monotonically rising fraction of the field brought to validated, correlation-ready state, each cohort of wells widening the geological diversity the model had seen and therefore the share of the next cohort it could handle without hand-holding. Phase 3 was scoped to need a minimum on the order of 10-15 wells of matured data before production rollout, with a target of 25-plus — the threshold at which the model's coverage of the field's structural variety was broad enough to carry new wells.
The engineering underneath the capacity argument is the same computer-vision pipeline that carries the rest of this programme: a detection-transformer model that casts every fracture and bedding sinusoid as end-to-end set prediction, trained on a geometry-preserving augmentation regime to survive a small-data regime, served behind an MLOps layer that lets the operator retrain as new wells arrive. The capacity gap is the why; that pipeline is the how. A throughput claim with no production-grade serving and retraining story behind it is a benchmark, not a capability — and a benchmark does not clear a backlog.
The handover the capacity case implies
There is a strategic tail to all of this that the phase ladder makes explicit. If automation converts interpretation from a headcount problem into a model problem, then the asset that needs to scale is no longer a hiring pipeline — it is a platform. Phases 4 and 5 were written as scale-out and spin-out precisely because a model that clears one operator's 80-well backlog is, with retraining, a model that clears the next operator's. The capacity gap that justifies the build for a single field is the same gap that exists across the operators we have worked with in the Middle East and the United States; the marginal economics that make automation correct for one carbonate field make a productised formation-evaluation capability correct for many. That is the Phase-4/5 thesis in one line: the throughput dividend, once banked, compounds.
For the operator in question, the near-term win was concrete and unglamorous — a backlog that a fixed team could never have cleared was put on a curve that closed it, and the scarce experts were moved from picking sinusoids to governing a field-scale interpretation they could trust. The longer-term win is the one the capacity arithmetic always points to: when the cost of the next interpretation collapses, the interpretation stops being the product and the platform becomes it.
Why the interpretation backlog was a throughput problem, not a hiring problem
- A backlog of 80+ processed high-resolution borehole image logs and a 2-80-well correlation scope grows combinatorially with well count; a fixed expert headcount — gated by a global scarcity of senior interpreters and multi-year training — is a flat ceiling that no hiring cycle raises on a drilling-programme timeline.
- Hiring is linear in scarce expert-hours; a model pays an expensive fixed build cost once, then drops the marginal cost per well toward GPU time at the ~30-s/m, ~5x-faster order the contract anchored to — so amortised cost per well breaks even early against an 80+ well backlog and keeps falling.
- The curve only bends if the model is trustworthy enough to remove the bottleneck rather than add a review step: the phased ramp (well-maturity climbing 3/80 -> 8/80 toward 20-25/80, Phase 3 gated at ~10-15+ matured wells) earned that trust gate by gate, with the expert re-tasked from picking to validating and correlating — the ~60% productivity, ~75% accuracy dividend, and the Phase-4/5 scale-out case it implies.
References
-
Scope of work, well count (80+ processed/interpreted high-resolution borehole image logs to normalise; integration across all 80+ wells), 2-80-well correlation target, 3-phase / ~20-month structure, and ~30-s/m, ~5x-faster interpretation benchmark — engagement scope-of-work and proposal documents; client and field withheld under operator confidentiality.
-
Phase ladder (Phase 0 setup/governance through Phases 4 & 5 scale-out/spin-out) and well-maturity ramp (3/80, 8/80, target 20-25/80; Phase 3 minimum ~10-15 wells, target 25+) — internal phase and progress reports for the engagement.
-
Well-to-well correlation productivity (~60%) and interpretation-accuracy (~75%) figures, and 50-micrometre borehole-image detail — internal programme decks; data and code withheld under operator confidentiality.
-
Capacity-gap exhibit anchors (40 / 4,000-fold acquisition-density jump) are a published industry reference; the analyst-ceiling curve is an editorial throughput proxy, flagged on-canvas. The wells, field, and operator described here are anonymised.