A national oil company in Southeast Asia cut exploration interpretation cycle time from six weeks per prospect to four days — and reduced geoscientist hours by 72% — by deploying an agentic interpretation loop that proposes horizon picks, flags low-confidence zones, and learns from every correction. The bottleneck wasn't compute; it was turning scarce expert judgment into a scalable, consistent asset.
At a glance
Three metrics tell the story of what changed when a human-in-the-loop agent joined the interpretation workflow.
Interpretation cycle time
Geoscientist hours per prospect
Cross-team pick consistency
The challenge
The national oil company operates a diverse exploration portfolio across six basins. Every new prospect begins with a seismic-interpretation exercise: a geoscientist loads a 3D volume, picks key horizons, maps fault networks, and estimates structure. The work is deliberate, detail-heavy, and resistant to shortcuts. A typical prospect took six to eight weeks from acquisition handover to a completed interpretation ready for portfolio ranking.
Time was the obvious problem. With a pipeline of twenty-plus prospects in queue and a geoscience team of eleven, the backlog stretched into quarters. But the deeper issue was consistency. Three geoscientists interpreting the same volume would produce three subtly different fault maps and horizon surfaces. When leadership tried to rank prospects basin-wide, the variance in structural closure estimates made apples-to-apples comparison impossible.
The exploration manager framed the brief plainly: we need to interpret faster, and we need the interpretations to agree with each other when the geology is the same. Manual quality control wasn't scaling, and hiring more geoscientists wouldn't solve the consistency gap.
What we did
We built an agentic interpretation loop — a foundation model fine-tuned on the basin's own labelled seismic volumes, wrapped in a review agent that proposes picks, assesses its own confidence, and routes uncertain sections back to a human geoscientist.
The system works in three passes. First, the model ingests a 3D seismic volume and generates a complete interpretation: horizon surfaces for five key stratigraphic markers and a fault-network skeleton. Second, the review agent scores every inline and crossline pick for geologic plausibility — does the horizon stay continuous across faults, does the dip magnitude match adjacent sections, are there polarity flips that suggest mis-ties. Any segment scoring below a confidence threshold gets flagged. Third, a geoscientist reviews only the flagged zones, accepts or corrects the picks, and commits the edits. Those corrections flow back into the training loop, so the model learns the interpreter's judgment over time.
The fine-tuning corpus came from twelve legacy prospects the team had already interpreted by hand — roughly 140 labelled inlines and 95 crosslines per volume. We augmented with synthetic fault geometries to teach the model how to handle throw variability and relay ramps, features under-represented in the labeled set.
The three-pass interpretation loop
Model proposes full interpretation
Horizons + fault network for the entire 3D volume
Review agent scores confidence
Flags low-plausibility picks and routing uncertain zones to human review
Geoscientist corrects flagged zones
Edits feed back into fine-tuning for the next prospect
The outcome
The first production prospect took nine days end-to-end — the geoscientist spent four hours reviewing flags, made corrections on 11% of the proposed picks, and signed off. By the third prospect, cycle time dropped to four days and review hours to ninety minutes. The model had learned the interpreter's style: where she expected horizons to pinch out near salt diapirs, how she handled ambiguous fault terminations.
Cross-team consistency improved by 38 percentage points, measured as the mean overlap coefficient between independently reviewed interpretations of the same test volume. The variance that used to make portfolio ranking subjective now sat within the noise floor of seismic resolution itself.
Geoscientist hours per prospect fell 72%. The senior interpreter described the change as freeing her to do geology again instead of digitizing reflectors. She now spends her time on the 20% of picks where geologic judgment matters — near-salt deformation, subtle stratigraphic onlap — and lets the agent handle the mechanical 80%.
Where the time went
What this unlocked
Faster interpretation was the visible win, but the downstream changes mattered more. With cycle time compressed, the exploration team could run sensitivity cases — re-interpret a prospect under different velocity models, test alternative fault correlations — without burning weeks of calendar time. Portfolio ranking moved from annual to quarterly, and the geoscience team could feed updated risk assessments into capital allocation before drilling schedules locked.
The consistency gains made cross-basin comparison credible for the first time. Leadership could finally ask which basin offers the best risk-adjusted return and trust that the structural-closure estimates feeding that answer were built on the same interpretation rules.
The learning loop meant the model improved with use. Six months in, the flag rate dropped from 11% to 4%. The agent had internalized not just seismic character but the interpreter's decision heuristics — when to honor a weak reflector, when to call a fault sealed. The geoscientist's corrections became the training data that made her own reviews faster next time.
Lessons and next steps
Three lessons stand out. First, confidence scoring is not optional. An agent that proposes picks without flagging uncertainty forces the human to review everything, collapsing back to manual QC at scale. The review agent's ability to say I'm unsure here is what made the time savings real.
Second, the fine-tuning corpus needs to match the production task. Seismic foundation models pre-trained on global datasets are fluent in seismic texture, but they don't know how this basin's carbonates behave near basement highs or how this team handles fault-shadow artifacts. Twelve hand-labeled prospects were enough to teach basin-specific and interpreter-specific priors.
Third, the win wasn't replacing the geoscientist. It was giving them a tireless first-pass partner so their judgment went to the 20% that needed it. The senior interpreter still signs every interpretation. She just doesn't spend forty hours getting to the signature line anymore.
The exploration team is now extending the agent to well ties — correlating seismic horizons to formation tops in wellbore data — and to amplitude-variation-with-offset attribute analysis. The same human-in-the-loop pattern applies: the agent proposes, flags uncertainty, and learns from corrections. The bottleneck is shifting from how fast we can interpret to how fast we can decide what to drill.
Before and after
The metric that matters most to the exploration manager is simple: how long from seismic handover to a portfolio-ready interpretation. The agentic loop collapsed that timeline by an order of magnitude, and the time saved went straight into better decisions.
Before
6 weeks
After
4 days
References
1 Outcome metrics (interpretation cycle time, geoscientist hours, consistency improvement) provided by client exploration team during post-deployment review, Q3 2024.
2 Cross-team consistency measured as mean overlap coefficient (Jaccard index) of horizon surfaces interpreted independently by three geoscientists on a 20 km² test volume, before and after agent deployment.