Skip to main content

Case Study

From weeks to hours per well: the throughput dividend of an automated interpretation pipeline

For a mid-sized Middle East carbonate operator we worked with, manual high-resolution borehole image-log sinusoid picking took days per well; the productised AutoFrac/AutoVug pipeline interpreted at roughly 30 seconds per metre — about 5x faster — turning a multi-week interpretation backlog into same-day turnaround across a 16-well programme.

Case study

The bottleneck in borehole-image interpretation is rarely the geology. It is the clock. A single high-resolution borehole image log runs over 200 metres, every planar feature that crosses the wellbore projects as a sinusoid, and an interpreter must pick each trace by hand, fit a curve, and recover three numbers — depth, dip, azimuth — at intervals down the whole well. In a fractured carbonate, that is days of expert time per well, and the queue only grows. For a mid-sized Middle East carbonate operator we worked with, the question that mattered to the subsurface manager was never the F1 score of any one model. It was simpler and harder: how many wells can my team turn around this quarter, and what is the expert time costing me to get there?

This case study is about the commercial arc — the before-and-after of converting a manual, multi-week-per-well interpretation process into a productised pipeline (AutoFrac for fracture and bedding sinusoids, AutoVug for vug morphology) that interprets at machine speed while the geoscientist stays in the loop as the validator, not the labourer.

At a glance

  • Before: Manual sinusoid picking ran to days of expert time per well across a 16-well programme; vug identification, done by eye, took hours per well. Interpretation was the throughput ceiling for the subsurface team.
  • After: The productised AutoFrac/AutoVug tools interpreted at roughly 30 seconds per metre — about 5x faster than the manual baseline — picking sinusoids across multiple wells in minutes and turning a multi-week backlog into same-day turnaround.
  • Quality held: The companion well-to-well correlation tool reached 95% target-location precision and 90% stratigraphic success, with the operator's subsurface team reporting +60% interpretation productivity and +75% interpretation accuracy on the productised stack.
  • The engineering, not just the science: A from-scratch Detection-Transformer (DETR-derived) model, trained under extreme well-data scarcity, served behind a containerised web application on an on-prem GPU stack with full MLOps experiment tracking — an applied-AI production system, not a notebook.

The manual baseline, costed honestly

Start with what the interpreter actually does. A high-resolution borehole image log is unwrapped into a 2D image; the interpreter scans it, picks every fracture and bedding sinusoid, and for each one records depth, dip, and azimuth. Vug analysis is a separate manual pass — identifying pore features by eye, zone by zone. On this operator's data, sinusoid picking ran to days per well, and vug identification took hours per well, both done by a scarce specialist.

That cost compounds. The engagement spanned roughly 16 wells of usable image-log data (with the broader operator inventory exceeding 80 image logs), so the manual process did not just delay one well — it set the throughput ceiling for the entire subsurface programme. Worse, manual picking is inconsistent between interpreters and across sessions, so the multi-week spend bought a result that still carried inter-interpreter variance. The operator was paying premium expert time for a process that was simultaneously slow and not perfectly repeatable.

LOAD PER SURVEY vs ANALYST CEILING33×load uncleared at today’s 4,000-foldAUTOMATION IMPERATIVE401004001,2004,000early 2000slate 2000s2010stodayACQUISITION ERA · OFFSET <4.5 km → 18–24 km4,000-fold40-foldTHE AUTOMATION IMPERATIVE ↑120analyst ceiling (editorial est.)drag the dashed ceiling ⇕ (or focus it and use ↑ / ↓)interpretation load / surveyanalyst ceiling40-fold feasible-by-handAnchors 40 / 4,000-fold: Viridien interview, 2024 · intermediate = log-linear interpolation · ceiling = editorial throughput proxy
Drag the dashed analyst ceiling: the orange breach point and the “automation imperative” wedge recompute. It only flips to “within human reach” if you raise the ceiling above today’s 4,000-fold — a budget no team has. Anchors are sourced; the ceiling is an editorial throughput proxy.

The strategic reading is the one that matters to a subsurface manager: this is a capacity problem disguised as a quality problem. The team did not lack skill; it lacked hours. Any intervention that did not move the throughput needle — a marginally better accuracy figure, a prettier interface — would have left the actual constraint untouched.

What we built: a productised interpretation pipeline, not a model

The instinct in subsurface AI is to chase the model. We did build the model — a customised Detection Transformer that we covered in depth elsewhere — but the throughput dividend came from treating interpretation as a production system end to end, with the model as one component.

The engineering had four layers, each load-bearing for the commercial result:

  • The model. A DETR-derived detector with a ResNet-10 backbone, transformer encoder/decoder, and a multi-task head that classifies fracture-vs-bedding and regresses depth, dip, and azimuth in a single forward pass — no per-pixel masks, no anchor boxes, no non-maximum suppression to tune. Set prediction with Hungarian matching is what lets overlapping sinusoids be recovered natively rather than fought with thresholds.
  • The data engineering. The reservoir intervals were brutally sparse. We grew the training corpus from roughly 900 image–ground-truth pairs to more than 55,000 — a 65x increase — through overlapping patch extraction and geometry-preserving augmentation, the difference between a model that learned nothing and one that converged. The vug pipeline was a separate classical computer-vision stack (top-k mode subtraction, adaptive thresholding, contour filtering) tuned to recover per-vug morphology at 0.1 m resolution.
  • The serving layer. Both tools were productised and exposed through a containerised web application so a geoscientist could run a well through interpretation from a browser — no model code, no GPU babysitting.
  • The MLOps and infrastructure. The whole thing ran on an on-prem GPU stack with experiment-tracking, a versioned data store, and a custom MLOps layer — the production scaffolding that lets a model be retrained, versioned, and trusted rather than re-derived by hand each time.
THE 85% UNDER THE MODEL · 6-LAYER STACK~50%of pilots never ship3 / 6 layers load-bearingBuild the stack up — the model is only the capA model is only as production-ready as the weakest layer below it.production ceilingModel — ~15% of the journey⤓ detached — POC purgatoryHPCbuilt · load-bearingData engineeringbuilt · load-bearingData unificationbuilt · load-bearingAI / MLdrift watch is decorativeAgentsunauditablePlatform & deploymentoutside the perimeterbuild linedrag the build line ↑ · column sizing schematicWHY THE PILOT STALLS3 layers missing below the model.Lowest gap — AI / ML:drift watch is decorative.The model can't reach production overan incomplete stack. It joins the ~50%that never ship — a failure of plumbing.The working model is ~15% of the journey.The other ~85% is the six-layer stack —and pilots die where the stack has seams.Own the stack: data + weights stay in your perimeter.~15% model / ~85% stack, the six named layers & the ~50%-never-ship figure are the whitepaper's own · column sizing schematic
Pilots don't stall because the model is weak. The working model is only ~15% of the journey; the other ~85% is a six-layer engineering stack (HPC → Data engineering → Data unification → AI/ML → Agents → Platform/deployment), and a project ships only when every layer below the model is built to production grade. Drag the build line up the load-bearing column: with all six built the model reaches the production ceiling; with any gap below it the model detaches into POC purgatory — the ~50% that never ship. The ~15%/~85% split, the six layers and the ~50% figure are the whitepaper's own; the equal-sixths column sizing is schematic.

That fourth layer is where most subsurface-AI pilots die, and it is the reason the speed-up was real and repeatable rather than a benchmark artefact. A model that runs once in a notebook does not change a team's throughput; a model behind a containerised app on managed infrastructure does.

The throughput dividend, in numbers

With the pipeline in place, interpretation ran at roughly 30 seconds per metre of image log — about 5x faster than the manual baseline. In practice that meant the model picked sinusoids across multiple wells in minutes where the manual process took days per well, and the vug tool did in minutes what previously took hours per well. The multi-week interpretation backlog collapsed into same-day turnaround for the subsurface team.

Crucially, speed did not come at the expense of trust. On the productised stack the operator's team reported +60% interpretation productivity and +75% interpretation accuracy, and the companion well-to-well correlation tool — which builds fracture and bedding density logs and correlates them across wells via kriging — hit 95% target-location precision and 90% stratigraphic success. The pipeline was faster and more consistent, because a model applies the same picking criteria to every metre of every well, removing the inter-interpreter variance that manual picking carries.

OPERATOR B · 47-WELL PORTFOLIO~306 hexpert hours redeployed to higher-leverage work24 / 47 wells · 14 h → <90 minSweep the portfolio — the model frees time, not jobsPer-well evaluation falls ~89% — geoscientist review stays for anomalies and sign-off.Redeployed to higher workRaw hours saved14 h<90 m← drag to convert the portfolio · grey = manual, teal = agenticgeoscientist review retained (anomalies + sign-off) — never zero · schematic height14 h → <90 min, ~89%, 47 wells, ~600 hours in 8 weeks & review retained are the whitepaper's own · review-band height schematic
The engineering layer didn't replace the geoscientist — it moved expert time off rote evaluation onto judgment. On Operator B's 47-well portfolio, per-well Well-Log-Detection evaluation fell from 14 hours to under 90 minutes (~89%), reclaiming ~600 expert hours in 8 weeks — while geoscientist review was explicitly retained for anomaly cases and final sign-off. Sweep across the portfolio: each 14-hour bar collapses to a <90-minute sliver, the reclaimed-hours total fills toward ~600, and an orange 'review retained' band rides every converted well and never shrinks to zero. All headline numbers are the whitepaper's own; the review-band height is schematic (review is retained but its hours aren't quantified).

The point the exhibit above makes — drawn from a separate, larger portfolio engagement — generalises to this operator exactly: automation did not replace the geoscientist. It moved expert hours off rote tracing and onto the judgment calls that actually need a specialist. Anomalous intervals, ambiguous overlapping features, and final sign-off stayed with the interpreter. What changed was the ratio of judgment to labour. A geoscientist who had been spending days per well dragging a cursor over sinusoids was now reviewing model output and adjudicating edge cases — higher-leverage work, and far less of the rote kind.

Why this is an ROI story, not a research result

A 5x interpretation speed-up is not a vanity metric; it is a re-pricing of the constraint. Three commercial consequences followed directly for the operator, and they are the reasons a subsurface manager funds the next phase.

First, same-day turnaround changes the operating model. When interpretation takes days per well, image-log analysis is a back-office, after-the-fact activity. When it takes hours, it can sit closer to operations — informing completion and stimulation decisions inside a useful window rather than arriving after the fact. The value of an interpretation is partly a function of when it lands.

Second, the expert-hours freed are the real dividend, and they compound across the portfolio. Across a 16-well programme — against an inventory exceeding 80 image logs — the reclaimed specialist time is not a one-off saving; it is recurring capacity that grows with every new well the operator drills. That capacity can absorb the backlog, take on wells that were previously deprioritised for lack of interpreter time, or be redirected to the interpretation work that genuinely needs a human.

THE 10× DIVIDEND · 6–18 WEEKS → HOURS10× acreageinterpreted at 100% of the teamARTICLE’S THESISSpend the 10× dividend — on a smaller team, or on more acreage?Cutting to hold output flat throws the dividend away; holding the team converts every point of it into coverage.← spend on headcount cutsspend on more acreage →team → 10% · acreage → 1×team → 100% · acreage → 10×← drag to allocate · ←/→ keysTeam retainedAcreage covered100%10×of 10× ceilingsame headcount → ten times the interpreted acreage~10× throughput (6–18 weeks → hours) is the article's own · acreage = team × throughput.Headcount-retained axis (10–100%) is the arithmetic inverse of 10× — illustrative; no headcount number is sourced.
The naive read of a 10× productivity jump is 'cut 90% of the team.' The article argues the opposite: at the same headcount, the dividend buys ten times the interpreted acreage. Drag the allocator — spend the dividend on headcount cuts and acreage stays at 1×; hold the team and acreage climbs toward 10×. The single orange marker is the article's own position. The ~10× throughput multiple (6–18 weeks → hours) is sourced; the headcount-retained axis is the arithmetic inverse of 10× and is labelled illustrative, since the article states no specific headcount figure.

Third, consistency de-risks downstream decisions. A reservoir model built on machine-picked sinusoids inherits a single, repeatable picking standard rather than the variance of several interpreters working under deadline. The +75% accuracy figure the team reported is not just a quality number — it is a reduction in the silent uncertainty that propagates from picks into the static model and out into well placement.

The honest caveats

The numbers are real, but they are bounded, and a credible vendor says so. The headline speed and quality figures are the operator's own, measured on the productised tools over a confidential 16-well carbonate dataset; they are not a published, peer-reviewed benchmark, and the more conservative, F1-based model metrics we report elsewhere are the right reference for scientific claims. The corpus was vertical wells of a single reservoir — horizontal wells, where fracture sinusoid amplitude drops sharply, are a separate distribution that needed its own treatment. And the speed-up assumes the production scaffolding stays in place: the model behind a maintained, monitored, retrainable stack is what makes the throughput durable. A model handed over as a pickle file would have decayed back into a research artefact within a quarter.

None of that undercuts the commercial thesis. Across our subsurface engagements — with operators in the Middle East and the United States — the pattern that holds is this: the deep-learning model is necessary but not sufficient. The throughput dividend, the part a subsurface manager can take to the board, comes from engineering the model into a production system that an interpreter trusts and a team can run every day.

The throughput dividend, in one view

  1. Interpretation, not geology, was the constraint: manual sinusoid picking ran to days per well and vug ID to hours per well, setting the throughput ceiling for the whole subsurface programme across a 16-well engagement.
  2. The productised AutoFrac/AutoVug pipeline interpreted at roughly 30 seconds per metre — about 5x faster — collapsing a multi-week backlog into same-day turnaround, with the operator's team reporting +60% productivity and +75% interpretation accuracy and the correlation tool hitting 95% target-location precision / 90% stratigraphic success.
  3. The speed-up was real because it was a production system, not a model: a from-scratch DETR served behind a containerised web app on an on-prem GPU stack with experiment tracking and a custom MLOps layer — and the geoscientist stayed the validator, with expert hours moved off rote tracing onto judgment.

References

  1. Productised tool performance (AutoFrac/AutoVug ~5x faster interpretation; Well-to-Well +60% productivity, +75% interpretation accuracy, 95% target-location precision, 90% stratigraphic success) and the on-prem MLOps/serving stack are drawn from the Phase-3 transition and project review materials for a confidential mid-sized Middle East carbonate operator; figures are the operator's own and are withheld in raw form under confidentiality.

  2. Interpretation speed (~30 s/m, measured on the AutoFrac sinusoid-interpretation pipeline), manual baselines (sinusoid picking takes days per well; vug identification takes hours per well), and dataset growth (~900 to >55,000 image–ground-truth pairs, ~65x via overlap and augmentation) derived from internal Phase-3 reporting on the same engagement.

  3. Model architecture (customised Detection Transformer, ResNet-10 backbone, multi-task depth/dip/azimuth head, Hungarian matching) detailed in the companion GeoBFDT case study; data and code withheld under operator confidentiality.

Go to Top

© 2026 Copyright. Earthscan