RESEARCH · 8 min read · 31 Aug 2020

EarthAdaptNet: a domain-adaptive encoder-decoder for seismic imaging

The architectural detail behind EarthAdaptNet (EAN) - the encoder-decoder network EarthScan applies to semantic segmentation of seismic images. Companion to the geothermal case-study post; this one focuses on the building blocks (Residual Blocks, Transposed Residual Blocks, U-Net-inspired skip connections) and the per-class accuracies on the Roosevelt benchmark.

by The EarthScan Team · Research, engineering, and field

earthadaptnet · seismic-imaging · encoder-decoder · u-net · residual-blocks · domain-adaptation · semantic-segmentation · deep-learning · legacy-import

“The architecture is the novel part - not the training recipe. We picked encoder-decoder over a one-shot architecture deliberately: replace the decoder with fully-connected layers and you get a classifier on the same backbone, for free.”

Abstract

The energy industry spends billions of dollars annually acquiring seismic data for resource delineation - and post-acquisition processing is comparable in cost to the acquisition itself. Specific processing steps can take several days, with results that vary depending on the experience of the interpreter doing the work.

Deep neural networks (DNNs) have advanced to the point where many state-of-the-art models meet or exceed human-level performance on related tasks. Applied to the abundant available data, they could make these processing steps dramatically cheaper, both computationally and economically.

We applied this idea to the Roosevelt Hot Springs Geothermal seismic dataset in Utah, identifying major seismic horizons via semantic segmentation. The companion blog AI in Geothermal Dataset covers the data and the energy-transition framing; this post drills into the architectural detail of the network we built - EarthAdaptNet (EAN).

EarthAdaptNet on Roosevelt - headline numbers

Pixel accuracy on Roosevelt held-out test sections: ~80%
Mean per-class accuracy across 8 facies: ~75%
Effective training-set multiplier from the augmentation pipeline: 5×
Best-class accuracy (class 7, well-represented horizons): 97%

Network architecture

EarthAdaptNet is a deep neural network designed for semantic segmentation and classification of seismic images from a minimal amount of training data. The architecture is easiest to read as a diagram of its encoder, bottleneck, and decoder.

EarthAdaptNet — encoder, bottleneck, decoder

U-Net-shaped. The encoder squeezes 40×40 patches down to a 5×5×512 feature volume; the decoder expands them back to 40×40×8 facies probabilities. Skip connections at every depth let the decoder recover spatial detail the bottleneck would otherwise destroy.

The building blocks fall into two types:

Residual Block (RB)

The contracting-path building block, taking inspiration from U-Net (Ronneberger et al., 2015):

Two convolutional layers, each followed by batch normalisation
A downsampling residual connection implemented as a 1×1 convolutional layer
The 1×1 conv lets the residual connection match channel dimensions without losing the gradient signal at the additive merge
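
To make the role of the 1×1 shortcut concrete, here is a minimal NumPy sketch of the additive merge. The channel counts (64 in, 128 out), the stride-2 downsampling, the ReLU after the merge, and the helper name `conv1x1` are illustrative assumptions rather than details confirmed above; the two-convolution main path is replaced by a random stand-in of the right shape.

```python
import numpy as np


def conv1x1(x, weight, stride=1):
    """1x1 convolution on a (C_in, H, W) array; weight has shape (C_out, C_in).

    A 1x1 kernel mixes channels independently at each pixel, so a strided
    1x1 convolution is "keep every stride-th pixel, then apply a matrix".
    """
    x = x[:, ::stride, ::stride]
    return np.einsum("oc,chw->ohw", weight, x)


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 40, 40))            # block input (assumed 64 channels)
main_path = rng.standard_normal((128, 20, 20))   # stand-in for conv + batch-norm output
w_shortcut = rng.standard_normal((128, 64)) * 0.1

# Without the projection, x (64, 40, 40) could not be added to main_path (128, 20, 20).
shortcut = conv1x1(x, w_shortcut, stride=2)      # -> (128, 20, 20)
out = np.maximum(main_path + shortcut, 0.0)      # additive merge, then ReLU (assumed)
print(shortcut.shape, out.shape)
```

The point of the sketch is only the shape bookkeeping: the shortcut has to change both channel count and resolution before the sum is defined.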

Transposed Residual Block (TRB)

The expanding-path mirror of RB:

Two transposed convolutional layers (instead of convolutional)
An upsampling transposed residual connection with a 1×1 convolutional layer
Same role: keep gradients flowing through the deeper stack while progressively recovering spatial resolution

Encoder - Decoder assembly

EarthAdaptNet stacks these blocks into an encoder-decoder architecture:

Encoder: initial convolutional layer → N RBs (where N depends on input size). Progressively reduces spatial dimensions while increasing channel depth - the standard "feature extraction" path.
Decoder: mirror count of TRBs → final transposed convolutional layer. Produces the segmented output at full input resolution.
Bridge / bottleneck: a 1×1 convolutional layer connects the encoder's deepest output to the decoder's first TRB.
Skip connections: every RB has a skip into its corresponding TRB at the same depth - the U-Net hallmark that lets the decoder access fine-grained spatial features the bottleneck would otherwise destroy.

Figure: EarthAdaptNet's full architecture - RBs in the contracting path, TRBs in the expanding path, skip connections at every depth.
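
The assembly can be sanity-checked with plain shape arithmetic. The sketch below traces tensor sizes through a hypothetical configuration: kernel 3, stride 2, padding 1 for each RB, the matching transposed settings (with output padding 1) for each TRB, 64 base channels doubling per RB, and a depth of 3. None of these hyperparameters are stated in the post; they are chosen only because they reproduce the quoted 40×40 input, 5×5×512 bottleneck, and 40×40×8 output.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def tconv_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_shapes(input_size=40, base_channels=64, depth=3, num_classes=8):
    """Return (layer name, spatial size, channels) for each stage."""
    size, channels = input_size, base_channels
    shapes = [("initial conv", size, channels)]
    for i in range(depth):                       # encoder: halve size, double channels
        size, channels = conv_out(size), channels * 2
        shapes.append((f"RB{i + 1}", size, channels))
    shapes.append(("bottleneck 1x1", size, channels))
    for i in range(depth):                       # decoder: double size, halve channels
        size, channels = tconv_out(size), channels // 2
        shapes.append((f"TRB{i + 1}", size, channels))
    shapes.append(("final transposed conv", size, num_classes))
    return shapes


for name, size, channels in trace_shapes():
    print(f"{name:>22}: {size}x{size}x{channels}")
```

Under these assumptions the decoder sizes mirror the encoder sizes (20, 10, 5 going down; 10, 20, 40 coming back up), which is the property same-depth skip connections rely on.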

Data preparation

Using our patch-based training scheme, we generated 40×40 patches with a stride of 10 from the Roosevelt seismic sections. Choosing stride 10 instead of the conventional half-patch stride of 20 doubles the sampling density along each axis, which gives roughly a 4× multiplier on labelled examples without collecting new data.

Figure: Seismic section processed into 40×40 patches with stride 10.
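
A minimal sketch of the patch extraction, assuming a 2-D section stored as a NumPy array. The function name `extract_patches` and the 200×400 section size are hypothetical; only the patch size and the two strides come from the text.

```python
import numpy as np


def extract_patches(section, patch=40, stride=10):
    """Slide a patch x patch window over a 2-D section and stack the crops."""
    height, width = section.shape
    crops = [
        section[r:r + patch, c:c + patch]
        for r in range(0, height - patch + 1, stride)
        for c in range(0, width - patch + 1, stride)
    ]
    return np.stack(crops)


section = np.random.default_rng(0).standard_normal((200, 400))  # hypothetical size
dense = extract_patches(section, stride=10)
sparse = extract_patches(section, stride=20)
print(dense.shape, sparse.shape)          # (629, 40, 40) (171, 40, 40)
print(round(len(dense) / len(sparse), 2))  # 3.68
```

On a section this small the ratio is about 3.7× rather than exactly 4×, because of edge effects; it approaches 4× as the section grows relative to the patch.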

The augmentation pipeline:

Random noise addition
Horizontal flipping
Random Gaussian blur
Random rotation up to ±10°

These four transformations give roughly a 5× larger effective training set while preserving the geological semantics. Augmentations that would distort the seismic interpretation (heavy rotations, vertical flips) are deliberately excluded.

Figure: Augmented training patches - random noise, horizontal flips, blur, ±10° rotation.
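
The four augmentations can be sketched in NumPy alone. Everything beyond the list itself is an assumption made for illustration: the 0.5 application probabilities, the noise level, the blur sigma range, nearest-neighbour rotation with edge clamping, and the function names. The one design point worth noting is that geometric transforms (flip, rotation) must be applied to the label mask too, while noise and blur touch only the image.

```python
import numpy as np


def gaussian_blur(img, sigma, radius=2):
    """Separable Gaussian blur with edge padding (output keeps the input shape)."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def rotate_nearest(arr, angle_deg):
    """Rotate a 2-D array about its centre using nearest-neighbour lookup."""
    height, width = arr.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    yy, xx = np.mgrid[0:height, 0:width]
    # Inverse mapping: for every output pixel, find the source pixel to copy.
    src_y = cy + (yy - cy) * np.cos(theta) - (xx - cx) * np.sin(theta)
    src_x = cx + (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    src_y = np.clip(np.rint(src_y), 0, height - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, width - 1).astype(int)
    return arr[src_y, src_x]


def augment(patch, labels, rng):
    """Apply the four augmentations to one (patch, label mask) pair."""
    if rng.random() < 0.5:                               # horizontal flip
        patch, labels = patch[:, ::-1], labels[:, ::-1]
    angle = rng.uniform(-10.0, 10.0)                     # rotation within +/-10 degrees
    patch, labels = rotate_nearest(patch, angle), rotate_nearest(labels, angle)
    if rng.random() < 0.5:                               # Gaussian blur, image only
        patch = gaussian_blur(patch, sigma=rng.uniform(0.5, 1.0))
    patch = patch + rng.normal(0.0, 0.05, size=patch.shape)  # additive noise, image only
    return patch, labels


rng = np.random.default_rng(0)
patch = rng.standard_normal((40, 40))
labels = rng.integers(0, 8, size=(40, 40))
aug_patch, aug_labels = augment(patch, labels, rng)
print(aug_patch.shape, aug_labels.shape)
```

Nearest-neighbour lookup keeps the label mask integer-valued; a production pipeline would more likely interpolate the image and use nearest-neighbour only for the labels.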

Training pipeline

After preprocessing, the data is in a NumPy array ready for ingestion. The training setup:

Loss: cross-entropy
Optimiser: Adam, learning rate 1×10⁻³
Batch size: 32
Architecture: EarthAdaptNet (described above)
Epochs: 50

These are deliberately conventional choices - the architecture is the novel part, not the training recipe. We wanted the experiment to show what the architecture buys you, not what an exotic optimiser hides.
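
For reference, the loss in this setup is the usual per-pixel cross-entropy averaged over every pixel in the batch. The sketch below shows that computation in NumPy, with the hyperparameters above collected in a dictionary. The tensor layout (batch, classes, height, width), the `TRAIN_CONFIG` name, and the function name are assumptions for illustration, not the training code itself.

```python
import numpy as np

# Hyperparameters as listed above; the dictionary layout is hypothetical.
TRAIN_CONFIG = {
    "loss": "cross-entropy",
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 32,
    "epochs": 50,
}


def pixel_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels.

    logits: (N, C, H, W) raw class scores; labels: (N, H, W) integer class ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[:, None, :, :], axis=1)
    return -picked.mean()


rng = np.random.default_rng(0)
labels = rng.integers(0, 8, size=(TRAIN_CONFIG["batch_size"], 40, 40))
uniform_logits = np.zeros((TRAIN_CONFIG["batch_size"], 8, 40, 40))
print(pixel_cross_entropy(uniform_logits, labels))   # ln(8), about 2.079
```

An untrained 8-class model that predicts uniformly starts at ln 8 ≈ 2.08, which is a handy baseline when reading a loss curve.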

Results

On the Roosevelt held-out test sections, EarthAdaptNet achieved:

Pixel accuracy: ~80%
Mean Class Accuracy: ~75%
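
The two headline metrics differ in how they weight classes, which is why they can diverge. A minimal sketch, assuming mean class accuracy is the unweighted average of per-class recall (the common definition; the post does not spell it out). The function names and the toy labels are hypothetical; the eight per-class values are the ones quoted later in this post.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    index = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of all pixels labelled correctly (dominated by large classes)."""
    return np.trace(cm) / cm.sum()


def per_class_accuracy(cm):
    """Recall per true class; assumes every class appears at least once."""
    return np.diag(cm) / cm.sum(axis=1)


def mean_class_accuracy(cm):
    """Unweighted mean of per-class accuracy (every class counts equally)."""
    return per_class_accuracy(cm).mean()


# Toy example: class 0 is common and easy, class 1 is rare and hard.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(pixel_accuracy(cm), mean_class_accuracy(cm))   # 0.666... 0.625

# Consistency check on the per-class accuracies quoted in this post.
reported = np.array([76, 54, 79, 91, 45, 70, 88, 97])
print(reported.mean())                               # 75.0
```

Pixel accuracy sits above mean class accuracy whenever the frequent classes are also the well-predicted ones, which matches the ~80% versus ~75% gap reported here.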

Per-class accuracy across the 8 facies classes:

EarthAdaptNet per-class accuracy on Roosevelt (classes 0 to 7, in order): 76%, 54%, 79%, 91%, 45%, 70%, 88%, 97%

The model excels on classes 3, 6 and 7 (large, well-represented horizons) but underperforms on classes 1 and 4, the bottom quartile, which are under-represented in the training set. Class imbalance is therefore the obvious next lever: focal loss or class-balanced sampling should lift these numbers at no architectural cost.
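
As an illustration of that lever, here is a NumPy sketch of focal loss in its standard form, FL = -(1 - p_t)^γ · log(p_t), with optional per-class weights. It was not part of the experiment reported here; γ = 2 is the commonly used default, and the tensor layout and the example weights are assumptions.

```python
import numpy as np


def focal_loss(logits, labels, gamma=2.0, class_weights=None):
    """Mean focal loss over all pixels.

    logits: (N, C, H, W) raw scores; labels: (N, H, W) integer class ids.
    gamma=0 with no class weights reduces to ordinary cross-entropy.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = np.take_along_axis(log_probs, labels[:, None, :, :], axis=1)[:, 0]
    pt = np.exp(log_pt)                          # probability of the true class
    loss = -((1.0 - pt) ** gamma) * log_pt       # down-weight confident, correct pixels
    if class_weights is not None:
        loss = loss * np.asarray(class_weights)[labels]
    return loss.mean()


rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 8, 40, 40))
labels = rng.integers(0, 8, size=(4, 40, 40))

ce = focal_loss(logits, labels, gamma=0.0)       # plain cross-entropy
fl = focal_loss(logits, labels, gamma=2.0)
# Hypothetical weights that up-weight the two weak classes (1 and 4).
weights = np.array([1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0])
fl_weighted = focal_loss(logits, labels, gamma=2.0, class_weights=weights)
print(ce, fl, fl_weighted)
```

Because the modulating factor (1 - p_t)^γ is at most 1, the focal value is never larger than the cross-entropy on the same predictions; what changes is how the gradient is distributed between easy and hard pixels.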

What EAN unlocks beyond seismic segmentation

The decoder is the part of EAN that's specific to dense-prediction tasks. Replace the decoder with fully-connected layers and you have an EAN-encoder classifier - useful for facies classification (one label per image) rather than facies segmentation (one label per pixel).

This mix-and-match property is why we picked encoder-decoder over a one-shot architecture. EAN is a backbone, not a single-task model.
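
The head swap can be pictured as a change in output shape: the decoder turns bottleneck features into one label per pixel, while a fully-connected head turns the same features into one label per image. The sketch below shows only the classifier side, using random stand-ins for the encoder output; the hidden width of 256, the single hidden layer, and the function name are hypothetical.

```python
import numpy as np


def classifier_head(features, w1, b1, w2, b2):
    """Fully-connected head: flatten bottleneck features, one hidden layer, class logits."""
    flat = features.reshape(features.shape[0], -1)    # (N, 512 * 5 * 5)
    hidden = np.maximum(flat @ w1 + b1, 0.0)          # ReLU hidden layer (assumed)
    return hidden @ w2 + b2                           # (N, num_classes)


rng = np.random.default_rng(0)
bottleneck = rng.standard_normal((4, 512, 5, 5))      # stand-in for encoder output
w1 = rng.standard_normal((512 * 5 * 5, 256)) * 0.01   # untrained weights
b1 = np.zeros(256)
w2 = rng.standard_normal((256, 8)) * 0.01
b2 = np.zeros(8)

logits = classifier_head(bottleneck, w1, b1, w2, b2)
image_labels = logits.argmax(axis=1)                  # one facies label per image
print(logits.shape, image_labels.shape)               # (4, 8) (4,)
```

The encoder weights can be reused as they are, but the new head's own weights (`w1`, `w2` here) start untrained and still need fitting on labelled images.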

EarthAdaptNet's load-bearing claim is structural, not a metric: the architecture is the novel part, not the training recipe. The Residual-Block encoder plus the 1×1 bottleneck is a reusable backbone; the decoder is the only task-specific part. Keep the Transposed-Residual decoder and you get per-pixel facies segmentation (the 8-facies output, ~80% pixel accuracy on Roosevelt held-out sections, per-class accuracies of 76/54/79/91/45/70/88/97%). Replace it with fully-connected layers and you get a per-image facies classifier on the same encoder weights and bottleneck, without retraining the encoder. In the architecture diagram, the block count drawn at each depth is schematic: it shows topology, not the exact layer tally.

Coming next: domain adaptation

The natural extension - and the topic of the upcoming paper - is domain adaptation: training EAN on one seismic dataset (where labels are abundant) and adapting it to a new dataset (where labels are scarce or absent) without losing accuracy. The architecture is friendly to this - feature-space alignment losses fit naturally at the bottleneck.
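
One example of a feature-space alignment loss is CORAL, which the takeaways below name alongside MMD. In its standard form it penalises the squared Frobenius distance between the source and target feature covariances, scaled by 1/(4d²). The sketch assumes bottleneck features have already been reduced to one d-dimensional vector per example (for instance by pooling); that reduction, the small d = 16, and the function name are assumptions, and this is the generic formulation rather than EAN's confirmed implementation.

```python
import numpy as np


def coral_loss(source, target):
    """CORAL loss between two batches of feature vectors.

    source: (n_s, d) features from the labelled dataset.
    target: (n_t, d) features from the unlabelled dataset.
    """
    d = source.shape[1]
    cov_source = np.cov(source, rowvar=False)     # (d, d), unbiased estimate
    cov_target = np.cov(target, rowvar=False)
    return np.sum((cov_source - cov_target) ** 2) / (4.0 * d * d)


rng = np.random.default_rng(0)
source = rng.standard_normal((32, 16))
same_stats = source + 5.0                         # mean shift only: covariance unchanged
rescaled = 3.0 * rng.standard_normal((32, 16))    # different second-order statistics

print(coral_loss(source, same_stats))             # effectively 0
print(coral_loss(source, rescaled))               # clearly positive
```

CORAL aligns second-order statistics only, so a pure mean shift between datasets is invisible to it; in practice it would be added to the segmentation loss with a weighting factor.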

We're presenting this work at:

GeoConvention, 2020 - In-depth analysis using state-of-the-art deep learning techniques for semantic classification of seismic facies. (Conference abstract)
The Artificially Intelligent Earth Exploration - keynote, Middle East, 23-26 November 2020.

Stay tuned for the full paper, the open-source code drop, and the follow-up post on EAN's domain-adaptation results.

Key takeaways

The architecture is the novel part - Residual Blocks contracting, Transposed Residual Blocks expanding, U-Net skip connections at every depth. Training-recipe choices were deliberately conventional.
Encoder-decoder over one-shot for a free reuse path: replace the decoder with FC layers and you have an EAN classifier without retraining the encoder.
Class imbalance is the next obvious lever - focal loss or class-balanced sampling would lift the under-represented classes (1 and 4) at no architectural cost.
The bottleneck is also where domain-adaptation losses (CORAL, MMD) attach in EAN-DDA - making this architecture friendly to cross-basin transfer without retraining from scratch.

Glossary

Bottleneck: The deepest, lowest-resolution part of an encoder-decoder network - where spatial dimensions are smallest and channel depth is largest. In EAN this is a 1×1 convolutional layer; it's also where domain-adaptation losses attach in EAN-DDA.
Cross-entropy: The standard loss function for multi-class classification. Penalises incorrect class probabilities: heavily when the model is confidently wrong, lightly when it is appropriately uncertain. The default when class balance is reasonable; consider focal loss when it isn't.
Focal loss: A modified cross-entropy that downweights well-classified examples and focuses learning on hard, misclassified ones. Originally proposed for dense object detection (Lin et al., 2017); a standard remedy when classes are imbalanced - exactly the scenario in seismic facies segmentation.
Residual Block: Two convolutional layers + batch-norm with a 1×1 convolutional skip connection at the additive merge. Lets gradients flow through the deeper stack without vanishing - the building block that made very-deep networks trainable since ResNet (He et al., 2015).
Semantic segmentation: Per-pixel classification - assigning each pixel of an image to one of N classes. Distinct from instance segmentation (which separates instances of the same class) and from classification (which produces one label per image).
Skip connection: A direct connection from an early layer's output to a later layer's input, bypassing intervening layers. In U-Net-style architectures, skip connections preserve fine-grained spatial detail that the bottleneck would otherwise destroy.
Transposed convolution: A convolutional layer that increases spatial resolution rather than decreasing it. Sometimes called 'deconvolution' (a misnomer - it's not the mathematical inverse). The decoder counterpart to a regular convolution.
U-Net: An encoder-decoder architecture introduced by Ronneberger et al. (2015) for biomedical image segmentation. Its hallmark is symmetric skip connections at every depth, producing the characteristic 'U' shape when drawn. EarthAdaptNet inherits this topology.