Seismic Foundation Models: The CDO's New Edge

Operators spend $2.4B/year on subsurface studies that take weeks. Seismic foundation models compress that to hours — and put the CDO in charge.

By Tannistha Maiti, Petai. 6 min read.
EarthScan insight

Operators are spending $2.4 billion annually on subsurface studies that take 6–18 weeks — seismic foundation models now compress that to hours, on the same hardware.

Why this matters

For three decades, seismic interpretation has been a per-basin, per-project, per-interpreter craft. A team licenses a workstation, loads a 3D survey, trains a bespoke classifier on a few thousand hand-picked labels, and ships a horizon map months later. The work is good. It is also unrepeatable, uninspectable, and expensive: Wood Mackenzie pegs the global subsurface-study spend at roughly $2.4 billion per year [1].

That economic model is breaking. Pre-trained transformers on 3D seismic volumes are doing for the subsurface what GPT-class models did for text: one model, many tasks, fine-tuned in hours instead of trained from scratch in months. The interesting wrinkle for the executive reader is that the bottleneck has shifted. It is no longer the geoscience. It is the data pipeline — and the person who controls the pipeline is the Chief Data Officer.

The CDO who controls the seismic training pipeline controls the exploration edge — that is now a data infrastructure decision, not a geoscience one.

The current state

The numbers tell a consistent story. Subsurface studies still dominate exploration budgets, and the cycle time has barely moved in twenty years. Interpreters spend weeks on tasks that the underlying math could do in minutes if the data were addressable at cloud scale — which, until recently, it was not.

The subsurface-study economics, today

$2.4B: Annual global subsurface-study spend [1]

6–18 wks: Typical interpretation cycle per survey [2]

40%: Interpreter FTE reduction per project, AI-assisted [3]

$180M: Raised by seismic-AI vendors in 2024 [4]

Seismic data is stored in SEG-Y, a tape-era format from 1975 that assumes sequential read on a single machine. Loading a single 3D survey into a distributed training job means custom shims, brittle ETL, and I/O costs that swamp the GPU bill. That is why, despite every major operator running an AI program for the better part of a decade, almost none have trained a foundation-scale model on their own seismic library.
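To make the sequential-read assumption concrete, here is a minimal sketch of the byte arithmetic for the common fixed-length-trace SEG-Y layout: a 3200-byte textual header, a 400-byte binary header, then one 240-byte trace header plus samples per trace. The survey dimensions, the 4-byte sample size, and the absence of extended textual headers are illustrative assumptions, not figures from any particular operator.

```python
# SEG-Y fixed-layout sizes, in bytes
TEXT_HEADER = 3200    # textual file header
BINARY_HEADER = 400   # binary file header
TRACE_HEADER = 240    # header in front of every trace


def trace_offset(trace_index, n_samples, bytes_per_sample=4):
    """Byte offset where a trace (header included) starts in the file."""
    trace_len = TRACE_HEADER + n_samples * bytes_per_sample
    return TEXT_HEADER + BINARY_HEADER + trace_index * trace_len


def sample_offset(trace_index, sample_index, n_samples, bytes_per_sample=4):
    """Byte offset of one sample inside one trace."""
    start = trace_offset(trace_index, n_samples, bytes_per_sample)
    return start + TRACE_HEADER + sample_index * bytes_per_sample


# Hypothetical survey: 1000 inlines x 1000 crosslines x 1500 samples per trace
n_inlines, n_xlines, n_samples = 1000, 1000, 1500
n_traces = n_inlines * n_xlines

# A time slice needs the same sample index from every trace.
stride = sample_offset(1, 750, n_samples) - sample_offset(0, 750, n_samples)
file_size = trace_offset(n_traces, n_samples)

print(f"bytes between consecutive time-slice samples: {stride}")
print(f"separate reads for one time slice: {n_traces}")
print(f"file size in bytes: {file_size}")
```

Under these assumptions, consecutive samples of a single time slice sit 6,240 bytes apart, so one slice means a million small reads scattered across a file of roughly 6.2 GB. That access pattern is what makes the format awkward for distributed training.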

What changed

Three things happened in parallel, and they only matter together.

First, pre-trained transformers proved out on 3D seismic volumes. Rather than training a fresh CNN per basin, teams now fine-tune a single backbone — pre-trained on hundreds of public and licensed surveys — for horizon picking, fault detection, facies classification, and salt-body segmentation. Generalisation across basins, which used to be the wall every academic paper hit, is now the default behaviour.

Second, MDIO, a cloud-native, chunked, Zarr-backed seismic format, began replacing SEG-Y for training workloads. The I/O bottleneck that made distributed seismic training economically impossible is now handled at the storage-format layer rather than with per-project shims. A 1 TB survey that took hours to stage on a workstation now streams to a GPU cluster on demand.
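A rough sketch of why chunking changes the I/O picture: with a chunked array, any rectangular selection only has to fetch the chunks it intersects. The code below is plain Python arithmetic, not the MDIO or Zarr API, and both the survey shape and the 64 x 64 x 64 chunk shape are assumptions chosen for illustration.

```python
def chunks_touched(shape, chunk_shape, selection):
    """Count the chunks a rectangular selection intersects.

    selection is one (start, stop) pair per axis, with stop exclusive.
    """
    count = 1
    for (start, stop), chunk, size in zip(selection, chunk_shape, shape):
        if not 0 <= start < stop <= size:
            raise ValueError("selection out of bounds")
        first_chunk = start // chunk
        last_chunk = (stop - 1) // chunk
        count *= last_chunk - first_chunk + 1
    return count


shape = (1000, 1000, 1500)   # inline, crossline, sample (hypothetical survey)
chunk_shape = (64, 64, 64)   # assumed chunking, not a documented default

time_slice = [(0, 1000), (0, 1000), (750, 751)]
inline_section = [(500, 501), (0, 1000), (0, 1500)]
training_patch = [(128, 256), (128, 256), (640, 768)]

for name, sel in [("time slice", time_slice),
                  ("inline section", inline_section),
                  ("128^3 training patch", training_patch)]:
    print(name, chunks_touched(shape, chunk_shape, sel))
```

With these assumed shapes, a full time slice touches 256 chunks, an inline section 384, and a chunk-aligned 128-cubed training patch only 8. A trace-ordered file would need one read per trace for the same time slice, a million reads for this survey, and the chunks can be fetched in parallel by many workers.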

Third, global fine-tune workflows have collapsed interpretation cycles from weeks to hours. Equinor's 2025 pilot reported sub-four-hour cycle time on surveys that previously took six to eighteen weeks [2]. The fine-tune happens on a fraction of the labels a bespoke model would have needed, because the backbone already knows what a fault looks like.
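The small-label-budget point can be illustrated with a toy sketch: if a frozen encoder already produces features that separate faulted from unfaulted patches, a tiny task head can be trained on a handful of labels. Everything below is hypothetical and synthetic. `frozen_backbone` is a hand-written stand-in for a pre-trained model, the patches are generated step functions, and the head is an ordinary logistic regression; none of it reflects the Equinor pilot or any vendor's architecture.

```python
import math
import random


def frozen_backbone(patch):
    """Toy stand-in for a pre-trained encoder (hypothetical, not a real model).

    Maps a 1D amplitude patch to two fixed features. Nothing in here is
    updated during fine-tuning, which is what "frozen" means.
    """
    mean = sum(patch) / len(patch)
    max_jump = max(abs(b - a) for a, b in zip(patch, patch[1:]))
    return [mean, max_jump]


def predict(head, features):
    """Probability of 'fault' from a logistic-regression head."""
    w, b = head
    z = sum(wi * xi for wi, xi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(examples, lr=0.5, epochs=200):
    """Train only the head on frozen features, using plain SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for patch, label in examples:
            x = frozen_backbone(patch)
            err = predict((w, b), x) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def make_patch(rng, faulted, n=16):
    """Synthetic patch: a step of height 2 if faulted, else low-level noise."""
    noise = [rng.uniform(-0.05, 0.05) for _ in range(n)]
    if not faulted:
        return noise
    return [v + (-1.0 if i < n // 2 else 1.0) for i, v in enumerate(noise)]


rng = random.Random(0)
# Only eight labelled patches: four faulted, four not.
labelled = [(make_patch(rng, i % 2 == 0), 1 if i % 2 == 0 else 0) for i in range(8)]
head = fine_tune_head(labelled)

held_out = [(make_patch(rng, i % 2 == 0), 1 if i % 2 == 0 else 0) for i in range(100)]
correct = sum((predict(head, frozen_backbone(p)) > 0.5) == bool(y) for p, y in held_out)
accuracy = correct / len(held_out)
print(f"held-out accuracy with 8 labels: {accuracy:.2f}")
```

The design choice worth noticing is that `fine_tune_head` never touches the encoder. The label budget only has to fit two weights and a bias, which is why so few examples are enough in this toy setting; real seismic backbones and heads are far larger, but the division of labour is the same.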

The new seismic interpretation stack

  1. Pre-trained backbone

    Transformer trained on hundreds of 3D volumes — basin-agnostic features.

  2. MDIO storage layer

    Cloud-native, chunked seismic — streams directly to GPU.

  3. Global fine-tune

    Hours, not weeks. Small label budget. Reproducible.

  4. Asset-team delivery

    Horizons, faults, facies, salt — single model, multiple heads.

Implications for the executive

The implication is uncomfortable for a lot of org charts. If a foundation model fine-tuned on your seismic library out-performs a bespoke per-basin classifier — and it does — then the competitive moat is no longer the interpretation team. It is the training corpus, the labelling discipline, and the pipeline that gets data from tape archive to GPU. All three of those are CDO problems, not chief-geoscientist problems.

Figure: A 3D seismic volume with interpreted fault surfaces, rendered in a cloud-native pipeline. The interpretation suite no longer reaches into a workstation's local disk. It streams the cube directly off chunked storage, and the fault surfaces are the output of a model fine-tuned in hours rather than a per-basin pipeline built over months. The competitive question is which side of that workflow you own. (Source: Geoteric)

The reframe

Exploration edge used to live in the interpretation team. It now lives in the data pipeline that feeds the model the interpretation team uses. That is a CDO-owned asset.

This does not eliminate the geoscientist. It re-scopes the role. dGB Earth Sciences reports a 40% reduction in interpreter FTE per project on AI-assisted workflows [3] — the remaining 60% spend their time on geological reasoning, prospect ranking, and edge cases the model flags as low-confidence. That is a higher-leverage job, and a smaller team doing it.

The capital is following. Seismic-AI vendors raised roughly $180M in 2024 [4], and the round sizes are accelerating into 2025 as operators move from pilots to production. The operators who win the next exploration cycle will be the ones who treated their seismic archive as a training asset rather than a project artifact.

What happens next

The CDO who controls the seismic training pipeline controls the exploration edge — and that is a data infrastructure decision, not a geoscience one. Three concrete moves separate the operators who will benefit from the operators who will watch from the sidelines.

  1. Inventory the corpus

    Catalog every 3D survey, every interpretation, every label. Treat them as training assets, not project deliverables.

  2. Migrate to MDIO

    Move active surveys off SEG-Y and onto a cloud-native, chunked format. The I/O cost difference at training scale is an order of magnitude.

  3. Run a fine-tune pilot

    Pick one basin, one task, one backbone. Benchmark cycle time against your current bespoke workflow. The result will tell you what the next budget cycle looks like.
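For the pilot benchmark in step 3, the comparison itself is simple arithmetic. The sketch below uses the figures quoted earlier, a 6 to 18 week baseline and a four-hour fine-tune cycle, and assumes calendar hours (7 x 24 per week). Counting working hours instead would give smaller ratios, and the function name is only illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # calendar hours; an assumption, not a figure from the pilot


def cycle_time_speedup(baseline_weeks, pilot_hours):
    """Ratio of baseline cycle time to pilot cycle time, in calendar hours."""
    if pilot_hours <= 0:
        raise ValueError("pilot_hours must be positive")
    return baseline_weeks * HOURS_PER_WEEK / pilot_hours


low = cycle_time_speedup(6, 4)     # shortest quoted baseline vs. a 4-hour cycle
high = cycle_time_speedup(18, 4)   # longest quoted baseline vs. a 4-hour cycle
print(f"speedup range: {low:.0f}x to {high:.0f}x")
```

On those assumptions the range works out to 252x to 756x. Because the reported cycle time was under four hours, these are lower bounds for that pilot; a pilot on your own data should report both the baseline and the fine-tune time measured the same way.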

References

[1] Wood Mackenzie. Global subsurface study spend, upstream exploration benchmark (2024).

[2] Equinor. Seismic foundation-model pilot, internal cycle-time report (2025).

[3] dGB Earth Sciences. AI-assisted interpretation productivity benchmark.

[4] Pitchbook. Seismic-AI vendor funding totals (2024).

EarthScan
Continuous AI for explorers

info@earthscan.io

© 2026 Copyright. Earthscan