Skip to main content

Blog

Well-to-Well at Field Scale: Bootstrapping Correlation with FORCE-2020 Open Data

How a subsurface-AI team de-risked a field-scale well-to-well correlation feature by bootstrapping the method on 118 public Norwegian-Sea wells before any operator data was clean — then shipped fracture and bedding-density logs plus 2D/3D kriging across a target carbonate formation for a Middle East operator we partnered with.

Quamer Nasimby Quamer Nasim10 min read
EarthScan insight

There is a chicken-and-egg problem at the heart of every field-scale machine-learning feature in the subsurface: the method you most want to build — correlating geological structure from one well to the next across a field — is exactly the method you cannot prototype until the operator's own wells are interpreted, normalised, and trustworthy. And interpreting the wells is the work the feature is supposed to accelerate. In a roughly twenty-month engagement with a mid-sized Middle East carbonate operator, our team broke that loop the way good applied-ML teams usually do: we stopped waiting for proprietary data and bootstrapped the entire well-to-well (W2W) pipeline on a public dataset first. This piece is about why that sequencing is the right engineering decision — and how a 118-well open dataset from a completely different basin de-risked a production correlation feature for a carbonate field on the other side of the planet.

The feature, and why it could not start with the operator's wells

Well-to-well correlation is the geological act of recognising that a stratigraphic feature seen in one borehole — a bedding package, a fracture corridor, a porosity break — is the same feature, displaced and deformed, in a neighbouring well. Done by hand it is slow, expert-bound, and inconsistent across interpreters. Done well by a model it becomes a field-scale map: a new subsurface layer the operator can drop on top of its existing reservoir model to plan the next horizontal or infill well.

The engineering target was concrete. Rather than ask a model to "correlate wells" in the abstract, the team reduced correlation to two computable, log-derived signals — fracture density and bedding density — sampled along each borehole, and then to a spatial interpolation problem between wells. Both halves needed to be designed, coded, and validated, and neither could wait for clean operator data — because the operator data was the bottleneck: image logs arrived with depth mismatches against their dip picks, abnormal static-value ranges that broke normalisation, and gaps where image coverage simply did not exist. Building a brand-new correlation method on top of that mess would have conflated two failure modes — a broken method and broken data — with no way to tell them apart.

So the first decision was a data decision, not a model decision: find a clean, large, public well dataset, get the method working end-to-end on it, and only then port the validated pipeline onto the operator's wells as they came online.

FORCE-2020: a clean basin to learn the method on

The dataset the team reached for was FORCE-2020 — the open well-log machine-learning benchmark of 118 wells from the Norwegian Sea, released on Zenodo (record 4351156) for a public lithology-prediction competition. It is everything the operator's early data was not: large, consistently formatted, openly licensed, and rich in the conventional logs (gamma ray, neutron porosity, density, caliper) that drive correlation. Open data is also, increasingly, how serious applied-AI teams cold-start subsurface work — a pattern we have applied across engagements in the Middle East and the United States.

The geology is wrong, of course. The Norwegian Continental Shelf is a clastic, North-Sea-style stratigraphy; the engagement's reservoir was a Middle East carbonate. That mismatch is the entire point — and it is a feature, not a bug, for bootstrapping. What transfers between the two basins is not the geology but the method: how you partition a borehole into analysis windows, the signal-processing that detects a structural break, the matching logic that ties a feature in one well to the same feature in another, and the spatial interpolation that fills the volume between wells. FORCE-2020 let the team build and debug every one of those components against ground truth, with zero dependence on a data-delivery schedule it did not control.

Concretely, the W2W kink-detection prototype partitioned each well into patches of 700 data points along depth, learned to flag patches where the conventional-log signature changes character — a lithofacies "kink" — and validated on held-out wells by querying an unknown well against the learned groups by similarity. That is a transferable engineering primitive: the patch size, the kink detector, and the similarity query are basin-agnostic; only the training labels are Norwegian.

The transfer problem, stated honestly

Bootstrapping on public data buys you a working pipeline, not a model that is correct on the target. A correlation model trained on Norwegian clastics has learned the feature distribution of the wrong basin. Deploy it unchanged on Middle East carbonates and its decision boundary slices through the target distribution in the wrong place — the classic source-versus-target domain gap that sits underneath every transfer-learning effort.

The right mental model is not "retrain from scratch" but feature-distribution alignment: keep the representation the source dataset taught you, and move the target data onto that learned manifold rather than relabelling everything by hand. The interactive below makes the mechanism concrete — a labelled source cloud, an unlabelled target cloud, a decision boundary learned on the source, and an alignment strength you can drag from zero to one to watch the target cross to the correct side. Substitute "118 Norwegian wells" for source and "the operator's carbonate wells" for target and the geometry is exactly our W2W bootstrap.

EAN-DDA · BOTTLENECK FEATURE SPACEλ 0.32domain-adaptation alignment strength12 target points mis-placedPull Penobscot onto the F3 manifold — close the domain gapTrain on labelled F3, deploy on unlabelled Penobscot — alignment moves the target cloud, not the labels.F3 source (labelled · 6 facies)Penobscot target (unlabelled)encoder feature space (bottleneck)F3 decision boundarydomain gapsource ↔ targetdistributions split12 points on thewrong half-spaceλ = 0 · no adaptation← drag alignment strength · grey ring = point on wrong sideλ = 1 · alignedSourced: 6 facies labelled on F3 · clouds, boundary & gap are a schematic of feature alignment
The mechanism behind EAN-DDA, in one move. At the encoder bottleneck, labelled F3 (source) and unlabelled Penobscot (target) patches map into a feature space. Train on F3 alone and the two clouds sit apart — the domain gap — so the F3 decision boundary cuts through the target cloud and facies get mis-read. The deep-domain-adaptation alignment loss pulls the target cloud onto the source manifold: drag the alignment strength λ from 0 to 1 and the orange gap collapses while grey target rings turn teal as they cross to the correct side of the boundary. Only '6 facies classes labelled on F3' is the article's own number; the point clouds, decision boundary, gap, and the live mis-placed-point count are a schematic of feature-distribution alignment — no benchmark metrics are shown.

That is why public-data bootstrapping does not compromise the result. The source dataset does the expensive job — teaching a stable feature space and a working pipeline — and the small, hard-won target data does the cheap job — pulling the model onto the right manifold. You spend the operator's scarce, slowly-arriving wells on alignment, not on cold-starting the entire method.

From open-data prototype to a shipped field feature

Once the pipeline was proven on FORCE-2020, the team ported it onto the operator's wells. Two new well-log curves were engineered from the image-log interpretation: a fracture-density log and a bedding-density log, computed as feature counts aggregated along depth and sampled at a 0.1 m (10 cm) depth interval — fine enough to preserve real stratigraphic structure, coarse enough to suppress pick-level noise.

The aggregation grid was not guessed; it was swept. The team tested 5 cm, 10 cm, and 50 cm grids and settled on 10 cm in production, with a rolling-average kernel of 5 smoothing the fracture-density curve (and a kernel of 10 for the noisier bedding signal). This is disciplined feature engineering — the kind that decides whether a correlation curve is legible or hash — and it is exactly the work the FORCE-2020 prototype let the team rehearse before any operator curve existed.

With per-well density logs in hand, the second half — spatial interpolation between wells — became a geostatistics problem. The team applied 2D and 3D kriging (best linear unbiased prediction) to extend the density signals beyond the borehole wall into the inter-well volume. Kriging needs a variogram model of spatial continuity, so three were fit and compared — Gaussian, linear, and exponential — with linear kriging at a representative well spacing of 40 m giving the cleanest inter-well correlation, and contour analysis swept at 30 m intervals across the well grid. The output is the deliverable the operator actually wanted: a continuous map of fracture and bedding density between wells, not just at them — directional and infill-drilling guidance expressed as a new subsurface layer.

On a target carbonate formation, the result was geologically credible: clear bedding-density correlation across three vertical wells we worked with — strong continuity between two of them and a distinct high-density package mid-section in the third — recovered by the kriged surface without a human drawing a single tie line. Where the open-data bootstrap had been a dress rehearsal, the kriged carbonate map was the live performance.

What the bootstrap actually bought, in numbers

Two figures from the engagement frame the payoff. First, the data-engineering leverage: the supervised image-log dataset feeding the upstream interpretation was grown roughly 65x through augmentation and overlapping-window expansion — a reminder that on small-well-count subsurface problems, the data pipeline is as much of the engineering as the model. Second, the field outcome the operator measured: the W2W workflow was assessed at +60% interpretation productivity and +75% interpretation accuracy versus the manual baseline. Those gains are attributable to the engagement's own carbonate field, not to the Norwegian bootstrap dataset — FORCE-2020's role was to make the method exist and be correct before the operator's data could support it, not to produce the field metric.

That is the division of labour worth internalising. Public data de-risks the method; proprietary data delivers the value. Confuse the two and you either stall waiting for clean operator wells, or you ship a method you only ever validated on the wrong basin. The discipline is to use open data for everything it can legitimately prove — pipeline correctness, patch sizing, kink detection, the transfer mechanism — and reserve the operator's hard-won wells for the one thing only they can prove: that the kriged map is right here, in this reservoir.

Takeaways for the practitioner

If you are building a field-scale subsurface feature and your operator's data is not ready, do not treat that as a blocked sprint — treat the public-data prototype as the first deliverable. A clean open dataset from the wrong basin, FORCE-2020's 118 Norwegian wells in our case, lets you build, debug, and ablate the entire pipeline against ground truth on your own schedule. When the operator's wells arrive you are no longer cold-starting a method; you are aligning a working one onto a new feature distribution — a far smaller, far cheaper problem. Sequence it that way and the chicken-and-egg loop dissolves.

Key takeaways

  1. Field-scale well-to-well correlation has a cold-start trap: the feature meant to accelerate interpretation needs interpreted wells to prototype. Break the loop by bootstrapping the method on public data first.
  2. FORCE-2020 (118 Norwegian-Sea wells, Zenodo 4351156) is the wrong basin geologically but the right basin methodologically — it transfers the pipeline (700-point kink-detection patches, similarity query, kriging) while costing zero proprietary data.
  3. The source-to-target jump is a domain-gap / feature-alignment problem, not a retrain-from-scratch problem: keep the feature space the public data taught you and pull the operator's wells onto that manifold.
  4. The shipped feature engineered fracture- and bedding-density logs sampled every 0.1 m, on a swept 10 cm grid (vs 5/50 cm tested) with rolling-average kernels of 5 and 10, then 2D/3D kriged (Gaussian/linear/exponential variograms; linear at 40 m spacing, 30 m contours) to map density between wells.
  5. Division of labour: public data de-risks the method; proprietary data delivers the value. The carbonate engagement measured +60% productivity and +75% interpretation accuracy — attributable to the operator's own field, with the upstream dataset grown ~65x via augmentation.

References

[1] Bormann, P., Aursand, P., Dilib, F., Manral, S., and Dischington, P. FORCE 2020 Well Well Log and Lithofacies Dataset for Machine Learning Competition. Zenodo, record 4351156 (2020). The 118-well Norwegian-Sea open benchmark used to bootstrap the correlation pipeline. https://doi.org/10.5281/zenodo.4351156

Go to Top

© 2026 Copyright. Earthscan