A subsurface AI programme can produce three working models, a stack of validated metrics, and a steering committee that nods, and still leave the operator with nothing it can run. The asset that actually transfers is not the checkpoint file. It is the compute, the data pipeline, the runbooks, and the people who know how to retrain the thing when next year’s wells arrive. When we closed a multi-phase formation-evaluation engagement with a mid-sized Middle East carbonate operator, the final deliverable was deliberately not a model. It was a handover: three costed hosting options, an on-prem GPU stack specified down to the storage tier, three tools under service-level agreements, and 55 trained MENA professionals, 15 of them young in-country nationals from two national universities, ready to operate the system after we left.
This is the part of an applied-AI programme that almost no case study describes, because it is the part most programmes skip. We treated it as the acceptance test for everything built before it.
Why the handover is the real deliverable
The engagement ran across three phases between a December 2021 kickoff and a July 2023 close, building three production tools on high-resolution borehole image logs (microresistivity images of the borehole wall used to pick bedding planes, fractures and vugs) from two different microresistivity imaging tools: AutoFrac for bedding-plane and fracture detection with dip and azimuth regression, AutoVug for vug detection and sizing, and Well-to-Well for cross-well correlation. By the final phase the modelling questions were answered. The open question was governance: who owns the running system, on whose hardware, retrained by whom?
That question is where the failure mode lives. A subsurface model is not the file you train — it is the entire engineering apparatus required to retrain it on demand and run it after the original team has gone. Most operators acquire the artefact and assume they have acquired the capability. They are different things, separated by a six-layer engineering stack that sits underneath the model: HPC, data engineering, data unification, the AI/ML layer itself, the agent/automation layer, and the deployment platform. The trained model is only the visible tip; if any layer below it is missing at handover, the model detaches into proof-of-concept purgatory and the programme silently joins the roughly half of AI pilots that never reach production.
So the transition plan was not a closing courtesy bolted onto a research project. It was the architecture decision the whole final phase turned on. Get the stack-ownership question right and the operator inherits a function; get it wrong and they inherit a slide deck.
Three hosting options, one honest trade-off
We refused to hand over a single “recommended” architecture, because the right answer depends on the operator’s tolerance for capital expenditure, headcount, and operational control — and those are the operator’s calls, not ours. Instead we costed three concrete options and made the trade-off explicit:
- Operator-only (go-it-alone). The operator owns the on-prem stack and runs everything in-house. Lowest recurring cost, highest control, highest demand on internal skills, and, counterintuitively, slower iteration in the near term than the fully managed option: standing up a retraining run on a freshly transferred team takes days while the muscle memory builds.
- Operator-plus-vendor (managed, off-prem). A hybrid where we retain the heavy MLOps lifting on managed infrastructure while the operator’s team drives the interpretation workflow. Retraining a tool against new wells lands in the two-to-three-week range, dominated by data movement and managed-pipeline scheduling rather than raw compute.
- Fully managed (as-is). We keep the running platform warm on the existing managed footprint; the operator consumes outputs. Retraining is fastest — minutes to hours — because the pipeline never goes cold, at the cost of the operator never fully internalising it.
The point of laying all three side by side was to make the latency-versus-ownership trade-off impossible to misread. The fully managed option retrains in minutes-to-hours precisely because the operator has not taken the stack in-house; the operator-only option is slower at first precisely because the capability now lives with the operator’s own people. There is no free lunch, and pretending otherwise is how handovers fail six months after the invoice clears.
We attached indicative investment tiers to the escalating capability — on the order of USD 250–350K, USD 650–800K, and USD 1.5–4M — so the steering committee could see the slope from “keep the lights on” to “build a standing subsurface-AI capability,” not just a single number to approve or reject.
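The latency-versus-ownership trade-off across the three options can be written down as a small comparison table in code. The following is a minimal sketch, not part of the delivered tooling: the class names, the coarse latency bands, and the three-level ownership scale are illustrative assumptions that simply encode the descriptions above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class RetrainBand(IntEnum):
    """Coarse retraining-latency bands, ordered fastest to slowest (assumed scale)."""
    MINUTES_TO_HOURS = 1
    DAYS = 2
    WEEKS = 3


class StackOwner(IntEnum):
    """How much of the running stack sits with the operator (assumed scale)."""
    VENDOR = 1
    SHARED = 2
    OPERATOR = 3


@dataclass(frozen=True)
class HostingOption:
    name: str
    owner: StackOwner
    retrain: RetrainBand


# Bands taken from the three option descriptions in the text.
OPTIONS: List[HostingOption] = [
    HostingOption("operator-only", StackOwner.OPERATOR, RetrainBand.DAYS),
    HostingOption("operator-plus-vendor", StackOwner.SHARED, RetrainBand.WEEKS),
    HostingOption("fully-managed", StackOwner.VENDOR, RetrainBand.MINUTES_TO_HOURS),
]


def fastest(options: List[HostingOption]) -> HostingOption:
    """Option with the shortest retraining band."""
    return min(options, key=lambda o: o.retrain)


def most_owned(options: List[HostingOption]) -> HostingOption:
    """Option that places the most of the stack with the operator."""
    return max(options, key=lambda o: o.owner)


def has_free_lunch(options: List[HostingOption]) -> bool:
    """True only if a single option is both fastest and most operator-owned."""
    return fastest(options) == most_owned(options)
```

With the bands above, `has_free_lunch(OPTIONS)` is false: the fastest option and the most operator-owned option are different entries, which is the trade-off the steering committee was asked to weigh.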
The stack that actually transfers
Whichever option an operator chooses, the engineering substrate has to be real, named, and reproducible — otherwise “handover” is a euphemism for “good luck.” The on-prem reference build centred on an Nvidia DGX A100 node — 4 to 8 A100 GPUs, up to 640 GB of GPU memory, delivering 2.5–5 petaFLOPS of AI compute — paired with a DGX server tier carrying 7.7 TB of SSD, 512 GB of system RAM and 320 GB of GPU RAM, and a dedicated data-management server (4 TB SSD, 128 GB RAM) so that DataOps did not compete with training for the same machine.
That hardware was specified to a known workload, not to a brochure. Over the programme, an original corpus of roughly 900 image-and-ground-truth pairs was engineered up to more than 55,000 — a 65× expansion through overlapped patch extraction and geometry-preserving augmentation. Sizing the GPU memory and storage to that trained-on dataset, plus the retraining cadence implied by new well deliveries, is the difference between a stack the operator can actually feed and one that idles. The MLOps layer on top — experiment tracking, data versioning, a custom orchestration layer, and a lightweight serving app — was handed over as code and runbooks, not as tribal knowledge, because reproducibility is what lets a new team rebuild a result rather than admire it.
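The dataset growth described above comes from two mechanisms: cutting overlapping patches out of each image, then augmenting each patch without distorting the geometry it encodes. The sketch below illustrates both in plain Python. It assumes images are row-major lists with depth along rows and azimuth along columns; the function names, patch size, stride, and the choice of an azimuthal roll as the geometry-preserving augmentation are illustrative assumptions, not the programme's actual pipeline.

```python
from typing import List

Image = List[List[float]]  # rows = depth samples, columns = azimuth bins


def patch_starts(n_rows: int, patch_rows: int, stride: int) -> List[int]:
    """Start rows of depth windows that fit fully inside the image."""
    if patch_rows <= 0 or stride <= 0:
        raise ValueError("patch_rows and stride must be positive")
    if n_rows < patch_rows:
        return []
    return list(range(0, n_rows - patch_rows + 1, stride))


def extract_patches(image: Image, patch_rows: int, stride: int) -> List[Image]:
    """Cut depth windows; consecutive windows overlap when stride < patch_rows."""
    starts = patch_starts(len(image), patch_rows, stride)
    return [image[s:s + patch_rows] for s in starts]


def roll_azimuth(patch: Image, shift: int) -> Image:
    """Circularly shift columns, i.e. rotate the image about the borehole axis.

    The shape of each feature is unchanged; only its azimuth moves, so the
    same shift must be applied to the paired ground truth.
    """
    if not patch or not patch[0]:
        return [list(row) for row in patch]
    k = shift % len(patch[0])
    if k == 0:
        return [list(row) for row in patch]
    return [row[-k:] + row[:-k] for row in patch]


def expand(image: Image, patch_rows: int, stride: int, shifts: List[int]) -> List[Image]:
    """All patches of one image, each repeated once per azimuthal shift."""
    patches = extract_patches(image, patch_rows, stride)
    return [roll_azimuth(p, s) for p in patches for s in shifts]
```

For a toy image of 10 rows, a 4-row patch and a stride of 2 give 4 overlapping patches; adding 4 azimuthal shifts turns one image into 16 training samples. The per-image multiplier is the product of the patch count and the number of augmentations, which is how a corpus of hundreds of pairs can reach tens of thousands.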
Each of the three tools transferred under a service-level agreement bundling the full chain: the dataset, the trained model, the architecture, the output specification, and the documentation. The benefits those tools carried were quantified for the operator, not asserted. AutoFrac and AutoVug delivered interpretation roughly 5× faster than the manual pick-and-fit workflow they replaced. Well-to-Well lifted interpretation productivity by about 60% and interpretation accuracy by about 75%, hitting a 95% target-location precision and 90% stratigraphic-correlation success on the operator’s wells. Those are the numbers that justify a standing in-house function rather than a recurring service-company invoice.
Why an SLA, not just a repo
Handing over a model as a Git repository transfers code. Handing it over as a service-level agreement — dataset, model, architecture, output contract, and docs, per tool — transfers an obligation the operator can hold the system to. The SLA is what converts “here are the weights” into “here is a capability with a defined output you can validate against.”
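One way to make the per-tool bundle checkable is to treat it as a manifest with five required components and refuse to call a tool "handed over" while any is missing. The sketch below is a hypothetical illustration: the `ToolHandover` class, its field names, and the example paths are assumptions introduced here, not the structure of the actual agreements.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ToolHandover:
    """The five components bundled per tool, stored as locations or references."""
    tool: str
    dataset: Optional[str] = None
    trained_model: Optional[str] = None
    architecture: Optional[str] = None
    output_spec: Optional[str] = None
    documentation: Optional[str] = None


def missing_components(bundle: ToolHandover) -> List[str]:
    """Names of required components that are absent or empty."""
    return [
        f.name
        for f in fields(bundle)
        if f.name != "tool" and not getattr(bundle, f.name)
    ]


def is_complete(bundle: ToolHandover) -> bool:
    """A bundle counts as handed over only when nothing is missing."""
    return not missing_components(bundle)


# Hypothetical example: weights and code exist, but no output contract or docs.
partial = ToolHandover(
    tool="AutoVug",
    dataset="data/autovug/v3",           # hypothetical path
    trained_model="models/autovug.pt",   # hypothetical path
    architecture="src/autovug/model.py", # hypothetical path
)
```

Here `missing_components(partial)` lists `output_spec` and `documentation`, which is the gap between a repository and a capability the operator can validate against.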
National workforce localisation: the layer no GPU provides
The infrastructure is the easy half. A DGX node can be procured in weeks; an interpretation team that trusts, runs, and improves an AI pipeline cannot. The capability transfer that mattered most was human: across the programme, 55 young MENA professionals were trained on the system, 15 of them in-country nationals drawn from two national universities. That is not a corporate-social-responsibility footnote bolted onto the close; it is the load-bearing layer of the handover, and it maps directly onto the country’s national workforce localisation priorities for building sovereign technical capacity.
The reason this is non-negotiable is structural. Every layer of the six-layer stack below the model needs someone who can operate it: a data engineer who understands why two of the ten wells in an early intake were excluded for abnormal static-image ranges before training (leaving eight usable), an MLOps engineer who can read a retraining run and tell drift from noise, an interpreter who can sanity-check a predicted dip against the geology. Train the people on the operator’s own data, against the operator’s own tools, and the handover sticks. Hand over weights to a team that has never retrained them and the system rots at the first distribution shift.
This is also where a programme’s field experience compounds rather than evaporates. Across our engagements with operators in the Middle East and the United States, the consistent lesson is that the operators who keep their AI running are the ones whose own people were in the loop before the vendor left — not the ones who bought the best model. The transfer of know-how is the product; the model is the medium it travels on.
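The interpreter's sanity check on a predicted dip, mentioned above, has a simple geometric basis: a planar feature cutting a cylindrical borehole appears as a sinusoid on the unrolled image, and its dip relative to the borehole axis follows from the sinusoid's peak-to-trough height and the hole diameter. The sketch below shows that relation. The function names and the 5-degree tolerance are illustrative assumptions, and the result is an apparent dip that would still need correcting for borehole deviation before comparison with regional geology.

```python
import math


def apparent_dip_deg(peak_to_trough_m: float, borehole_diameter_m: float) -> float:
    """Apparent dip of a planar feature from its sinusoid on an unrolled image.

    dip = arctan(peak-to-trough height / borehole diameter), measured
    relative to the plane perpendicular to the borehole axis.
    """
    if borehole_diameter_m <= 0:
        raise ValueError("borehole diameter must be positive")
    if peak_to_trough_m < 0:
        raise ValueError("peak-to-trough height cannot be negative")
    return math.degrees(math.atan2(peak_to_trough_m, borehole_diameter_m))


def dip_is_plausible(
    predicted_dip_deg: float,
    peak_to_trough_m: float,
    borehole_diameter_m: float,
    tol_deg: float = 5.0,  # assumed tolerance, to be set by the interpreter
) -> bool:
    """Flag model predictions that disagree with the picked sinusoid geometry."""
    expected = apparent_dip_deg(peak_to_trough_m, borehole_diameter_m)
    return abs(predicted_dip_deg - expected) <= tol_deg
```

As a worked case, a sinusoid whose peak-to-trough height equals the hole diameter corresponds to an apparent dip of 45 degrees, so a model output of 70 degrees for that feature would be flagged for review.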
What “done” looked like
By the close, the operator had a choice of three costed operating models, a fully specified on-prem stack sized to its real workload, three tools under SLA with quantified speed and accuracy gains, an MLOps layer handed over as reproducible code, and a trained cohort spanning data engineering, ML, and interpretation. A multi-phase, roughly two-year R&D effort had been converted into something an in-house ICT and subsurface team could own, run, and extend — which is the only definition of “production” that survives contact with the year after launch.
The honest caveat is the same one that governs the whole stack-funnel argument: a handover is only as strong as its weakest transferred layer. An operator that takes the hardware but under-resources the people, or takes the people but never stands up reproducible pipelines, will still slide down the funnel — not because the model failed, but because the stack beneath it was never fully built. The transition plan exists precisely to make that failure mode visible before the keys change hands.
What it takes to leave an operator running its own subsurface AI
- The deliverable that determines whether a subsurface-AI programme survives is the handover, not the model — the model is only the visible 15% above a six-layer engineering stack that the operator must be able to own and run.
- Offer the operator a real choice, not one architecture: operator-only / operator-plus-vendor / fully managed, with the honest latency-versus-ownership trade-off (fastest retraining is fastest precisely because the capability has not moved in-house).
- Specify the stack to the real workload (an on-prem DGX A100 node sized to a 65×-grown dataset, MLOps handed over as reproducible code under per-tool SLAs) and transfer the people (55 trained MENA professionals, including 15 young in-country nationals), because the human layer is the one no GPU provides.
References
- Three-phase formation-evaluation engagement (Dec 2021 kickoff – July 2023 close) with a mid-sized Middle East carbonate operator we partnered with; ICT-transition, hosting-option, hardware, SLA, and national workforce localisation figures derived from internal Phase-3 transition and progress materials. Operator identity, well, field and partner names withheld under confidentiality.
- The pilot-to-production stack funnel reflects the programme’s own framing that the working model is roughly 15% of the journey and that roughly half of AI pilots never reach production when layers below the model are left unbuilt.