Case Study

The Preprocessing Lever: How One Normalisation Choice Moved Class Error From 63.45 to 2.536

An ablation isolated image normalisation as a first-order lever: feeding the model dynamically normalised high-resolution borehole image logs instead of statically normalised ones cut class error from 63.45 to 2.536. It also surfaced a compact-microresistivity domain trap that no architecture change could have rescued.

By Tannistha Maiti
The most expensive bug in an applied-AI subsurface pipeline is rarely in the model. It is upstream, in the four lines of preprocessing that decide what pixels the model ever sees. In our work with a mid-sized Middle East carbonate operator, we ran a controlled ablation on a DETR-derived borehole interpreter — same architecture, same loss, same training budget — that toggled exactly one variable: whether the high-resolution borehole image log was fed to the network in its static or its dynamic normalisation. Switching from static to dynamic dropped the classification error from 63.45 to 2.536 and the combined Hungarian matching loss from 0.119 to 0.015. No layer was added. No hyperparameter was swept. The entire gain came from a normalisation choice that lives in the data layer, not the model layer.

That single result reframes how a serious team should allocate its engineering effort on image-log AI — and it sets up a second, sharper finding about a compact-microresistivity domain trap that would have silently poisoned the whole pipeline if we had not instrumented for it.

Static and dynamic are two different images of the same rock

A high-resolution borehole image log measures micro-resistivity around the borehole wall and unwraps it into a 2D image, with depth on the vertical axis and azimuth on the horizontal. Before that array becomes a picture a model can read, the raw button-current values have to be mapped to a display range. Two normalisations dominate the workflow, and they are not cosmetic variants — they are different functions of the data.

Static normalisation applies one global mapping over a long interval. It preserves the absolute, formation-scale contrast — you can compare resistivity between two zones hundreds of metres apart — but it crushes local detail. A hairline conductive fracture sitting in a uniformly resistive matrix occupies a sliver of the global range, and after the global map it is a few barely-distinguishable grey levels.

Dynamic normalisation re-scales the map over a short sliding window — roughly a metre of borehole at a time. It throws away absolute comparability and spends the entire dynamic range on local structure. That same hairline fracture now spans the full contrast of its window. To a human interpreter, the dynamic image is where you pick the sinusoids; the static image is where you correlate zones. To a convolutional backbone feeding a transformer encoder, the difference is even starker: the dynamic image hands the attention layers the edge gradients they need to localise a sinusoid, while the static image hands them mush.
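
The contrast between the two mappings can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the production transform: both functions use plain min-max scaling (real workflows often use histogram equalisation instead), the function names and the `window_rows` parameter are hypothetical, and the synthetic array stands in for real button-current data.

```python
import numpy as np


def static_normalise(image: np.ndarray) -> np.ndarray:
    """Map the whole interval to 0-1 with one global min/max (assumed min-max scaling)."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return np.zeros(image.shape, dtype=float)
    return (image - lo) / (hi - lo)


def dynamic_normalise(image: np.ndarray, window_rows: int) -> np.ndarray:
    """Re-scale each depth row against a sliding window of rows centred on it."""
    n_rows = image.shape[0]
    half = window_rows // 2
    out = np.zeros(image.shape, dtype=float)
    for row in range(n_rows):
        window = image[max(0, row - half):min(n_rows, row + half + 1)]
        lo, hi = float(window.min()), float(window.max())
        if hi > lo:  # a perfectly flat window stays at zero
            out[row] = (image[row] - lo) / (hi - lo)
    return out


# Synthetic log (depth rows x azimuth columns): two zones with strong
# formation-scale contrast, plus one faint conductive streak at row 100.
image = np.full((400, 64), 5000.0)
image[200:] = -5000.0
image[100] = 4900.0

static_img = static_normalise(image)
dynamic_img = dynamic_normalise(image, window_rows=21)

# Contrast between the streak and the row directly above it.
static_contrast = static_img[99, 0] - static_img[100, 0]
dynamic_contrast = dynamic_img[99, 0] - dynamic_img[100, 0]
print(f"static contrast across the streak:  {static_contrast:.3f}")
print(f"dynamic contrast across the streak: {dynamic_contrast:.3f}")
```

On this synthetic array the streak occupies about 1% of the static range but the full range of its dynamic window, which is the kind of local-contrast gap described above.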

We held everything else fixed and measured the effect. The model is a Geological Beddings and Fractures Detection Transformer (GeoBFDT): a ResNet-10 backbone with a 4-layer transformer encoder and a 4-layer decoder, trained from scratch for 100 epochs at batch size 256 with AdamW at a learning rate of 0.0001. Depth, dip and azimuth are regressed against ground truth after dividing the targets by 100, 90 and 360 respectively, which maps them into a 0–1 range. Against that frozen configuration, the only change was the input normalisation, drawn from matched static and dynamic patches of the same vertical well so that the comparison was like-for-like on the rock.
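
The target scaling quoted above can be sketched in a few lines. This is a hedged illustration, not the project's code: the `Sinusoid` class and both function names are hypothetical, and the sketch assumes depth is expressed in units that run from 0 to 100 within a patch, which the text implies but does not state.

```python
from dataclasses import dataclass

# Divisors quoted in the text: depth by 100, dip by 90, azimuth by 360.
DEPTH_SCALE = 100.0
DIP_SCALE = 90.0
AZIMUTH_SCALE = 360.0


@dataclass(frozen=True)
class Sinusoid:
    depth: float    # position within the patch, assumed to lie in 0-100
    dip: float      # degrees, 0-90
    azimuth: float  # degrees, 0-360


def normalise_target(s: Sinusoid) -> tuple[float, float, float]:
    """Scale a labelled sinusoid into the 0-1 range used as the regression target."""
    return (s.depth / DEPTH_SCALE, s.dip / DIP_SCALE, s.azimuth / AZIMUTH_SCALE)


def denormalise_prediction(t: tuple[float, float, float]) -> Sinusoid:
    """Invert the scaling so a model output reads in physical units again."""
    depth, dip, azimuth = t
    return Sinusoid(depth * DEPTH_SCALE, dip * DIP_SCALE, azimuth * AZIMUTH_SCALE)


label = Sinusoid(depth=25.0, dip=45.0, azimuth=270.0)
target = normalise_target(label)
print(target)
print(denormalise_prediction(target))
```

Putting all three parameters on the same 0-1 scale keeps any one of them from dominating a summed regression loss purely because of its units.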

The number that should reorder your roadmap

[Interactive figure: scan quality, not the model, bounds recoverable F1. Panels: raster log (schematic) with ground-truth and recovered Gamma Ray and Caliper curves; scan-quality slider from studio scan to 4th-generation photocopy; recoverable F1 of 35% on the 10K mixed archive and 47% on a curated single source; Pearson r = 0.62 on Gamma Ray.]
Segmentation accuracy on raster logs is bounded by source-scan quality and by curve type, not by the architecture. Drag the scan from a clean studio scan to a 4th-generation photocopy of microfiche: noise and fade build on the raster, and the recovered curve degrades — the smooth Gamma Ray trace stays largely locked while the sharp Caliper trace fragments first and worst. The published F1 = 35% (10K mixed archive) is a floor that cleaner inputs lift by +12 F1, not a ceiling. F1 35%/IoU 30%, Pearson r = 0.62 on GR, the +12-curated point and +5K-CALI are the whitepaper's own; the schematic raster, recovered curve and the F1 interpolation are illustrative.

The instrument above is an illustrative companion to the image-log ablation, not a measurement from it: it renders the same structural law on a different input modality — the raster image log, where digitisation accuracy tracks the fidelity of the source scan rather than the depth of the network. A clean, high-contrast input and a degraded one fed to the identical model end up in different places, and no architecture change closes that gap. That is the recurring lesson across our borehole-AI work, on image logs and raster logs alike: input fidelity is a first-order lever, and it is one you control with preprocessing engineering, not model search. On the borehole-image interpreter, the static-to-dynamic swap is the on-image-log instance of exactly that law — and here we measured it directly.

Here is what the swap bought, on the same model, same data, same budget:

  • Classification error fell from 63.45 to 2.536. On static input the model was effectively guessing which sinusoids were fractures versus beddings — a 63% class error is not a model that needs tuning, it is a model that cannot see the features. On dynamic input that same architecture resolved them cleanly.
  • The combined Hungarian matching loss fell from 0.119 to 0.015, and the geometry (parameter) loss from 0.462 to 0.059. The set-prediction head could only match predicted sinusoids to ground truth one-to-one once the backbone had real contrast to extract.
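
The one-to-one matching behind that loss can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration: the cost (one minus the predicted probability of the true class, plus an L1 distance over the normalised depth, dip and azimuth), the `geometry_weight` parameter, and the toy predictions. The search is a brute-force scan over permutations standing in for the Hungarian algorithm, so it is only viable for a handful of sinusoids.

```python
from itertools import permutations


def match_cost(pred: dict, truth: dict, geometry_weight: float = 1.0) -> float:
    """Cost of pairing one prediction with one ground-truth sinusoid (assumed form)."""
    class_cost = 1.0 - pred["probs"][truth["label"]]
    geometry_cost = sum(abs(p - t) for p, t in zip(pred["params"], truth["params"]))
    return class_cost + geometry_weight * geometry_cost


def best_assignment(preds: list, truths: list) -> tuple:
    """Return (pred index for each truth, total cost), minimising the total cost.

    Exhaustive search over permutations; a stand-in for the Hungarian algorithm.
    """
    if len(preds) < len(truths):
        raise ValueError("need at least as many predictions as ground-truth items")
    best_perm, best_total = (), float("inf")
    for perm in permutations(range(len(preds)), len(truths)):
        total = sum(match_cost(preds[p], truths[t]) for t, p in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Labels: 0 = bedding, 1 = fracture. Params are normalised (depth, dip, azimuth).
preds = [
    {"probs": [0.9, 0.1], "params": (0.20, 0.30, 0.50)},
    {"probs": [0.2, 0.8], "params": (0.70, 0.60, 0.10)},
    {"probs": [0.5, 0.5], "params": (0.50, 0.50, 0.50)},
]
truths = [
    {"label": 1, "params": (0.72, 0.58, 0.12)},
    {"label": 0, "params": (0.21, 0.31, 0.52)},
]

assignment, total = best_assignment(preds, truths)
print(assignment, round(total, 3))
```

At realistic query counts the same assignment would be computed with a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`, which is what the DETR reference implementation uses.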

[Interactive figure: loss-function ablation over five candidates, ranked Lovász-Softmax (winner, used for fine-tuning), SCE (strong, used for warmup), Dice (solid, third), Tversky (promising) and Focal (dropped). Rank order is sourced; bar heights are illustrative.]
Loss-function choice decides whether the network learns curve continuity. VeerNet tested five losses under identical conditions; only the one whose gradient aligns with the IoU/F1 metric (Lovász-Softmax) wins, and the shipped answer is a two-loss SCE-warmup → Lovász-finetune schedule. Pick a loss to see its ablation verdict, the reason, and a schematic ground-truth-vs-prediction trace; toggle the two-loss schedule (same accuracy, half the wall-clock). The five candidates, verdicts, F1 35%/IoU 30% and the two-loss schedule are the whitepaper's own; the podium bar heights are ordinal (rank sourced) and the prediction thumbnails are schematic.

For context on the magnitude, this preprocessing lever is comparable to the largest levers we found in the same study. Removing geometry-preserving augmentation pushed class error from 2.618% back up to 100%; choosing a heavy ResNet-34 backbone over a lean ResNet-10 raised it from 0.499 to 26.759; widening the training set from 3 to 14 wells dropped the Hungarian loss from 0.801 to 0.015. Image normalisation sits in that same top tier of effect sizes, which is the whole point. Teams instinctively spend their budget on backbones and learning-rate schedules. The data tells you to spend a meaningful slice of it on the four lines that decide what the model is allowed to see.

The compact-microresistivity domain trap: when your preprocessing assumption is silently false

The static-versus-dynamic result is the headline. The compact-microresistivity domain trap is the finding that earned the engagement its keep, because it is the kind of failure that does not announce itself with a bad metric — it announces itself with a model that trained fine and is quietly wrong.

The 14-well corpus we worked from mixed two different microresistivity imaging tools — a standard high-resolution borehole image log and a compact microresistivity tool run in slimmer holes. Our pipeline normalised every image into the display range a CNN backbone expects, on the documented assumption that a static high-resolution image arrives in the usual wide-amplitude range — on the order of negative ten-thousand to positive ten-thousand for static, and the conventional 0–255 for dynamic. That assumption is load-bearing: the normalisation arithmetic that converts raw values into model-ready pixels is calibrated to it.

When we instrumented the actual value distributions per tool, the compact-microresistivity static images broke the assumption. Their static range sat at roughly 0 to 15: about three orders of magnitude narrower than the expected static envelope of ±10,000, and narrower even than the conventional dynamic range. A normalisation routine written for amplitudes of ten thousand, applied to a tool whose values span fifteen, does not throw an error. It produces a near-flat image, a domain that the model has never been trained on and cannot interpret, and it does so invisibly, because every shape downstream is still valid and every batch still runs.
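
A short worked calculation shows why the image comes out flat. The sketch assumes the routine is a plain linear map from the ±10,000 static envelope onto 0–255 grey levels; the function name is hypothetical and the real transform may differ in detail.

```python
# Static envelope the pipeline assumed, as quoted in the text.
ASSUMED_MIN, ASSUMED_MAX = -10_000.0, 10_000.0


def to_grey_level(value: float) -> float:
    """Linearly map the assumed static envelope onto 0-255 (assumed transform)."""
    return (value - ASSUMED_MIN) / (ASSUMED_MAX - ASSUMED_MIN) * 255.0


# The compact tool's static values span roughly 0 to 15.
low = to_grey_level(0.0)
high = to_grey_level(15.0)
span = high - low

print(f"darkest pixel:  {low:.3f}")
print(f"brightest pixel: {high:.3f}")
print(f"grey levels used: {span:.3f} of 255")
```

Under that assumption the whole image lands within about a fifth of one grey level near mid-grey, so after 8-bit quantisation almost every pixel takes the same value while the array shape and dtype remain perfectly valid.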

Why this is the dangerous class of bug

A crash is a gift — it stops the pipeline and points at the line. A silent domain shift is the opposite: training converges, the loss curve looks healthy, and the model ships predictions on inputs that fell outside the distribution it learned. The only defence is to measure the per-source value distribution before normalisation and gate on it — treat the input domain as something you assert and test, not something you assume.

This is also why two wells in the early dataset had to be set aside: their static images arrived in an abnormal range, not pre-normalised to the 0–255 the pipeline assumed, and feeding them through the standard path would have meant training on garbage. The discipline that caught it was not geological intuition — it was treating the data pipeline as production software, with explicit, asserted contracts on every input domain. The fix was equally unglamorous: a per-tool normalisation branch that detects the value envelope and maps each instrument into the model's expected range, rather than a single hard-coded transform that quietly assumes one acquisition vintage.
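
A per-tool branch of this kind can be sketched as follows. It is a minimal illustration, not the production gate: the envelope names, the `MIN_FILL` threshold and the 0–1 output range are hypothetical, the envelope bounds are simply the ranges quoted in this article, and a real gate would likely use robust percentiles rather than raw min and max.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    name: str
    low: float
    high: float


# Ordered tightest first, so a narrow input is not swallowed by a wide envelope.
KNOWN_ENVELOPES = (
    Envelope("compact_static", 0.0, 15.0),
    Envelope("standard_dynamic", 0.0, 255.0),
    Envelope("standard_static", -10_000.0, 10_000.0),
)

# Hypothetical threshold: the observed span must cover at least this share
# of an envelope before the input is accepted as belonging to it.
MIN_FILL = 0.25


def detect_envelope(values: list) -> Envelope:
    """Pick the envelope the observed values belong to, or fail loudly."""
    lo, hi = min(values), max(values)
    for env in KNOWN_ENVELOPES:
        fits = env.low <= lo and hi <= env.high
        fill = (hi - lo) / (env.high - env.low)
        if fits and fill >= MIN_FILL:
            return env
    raise ValueError(f"input range [{lo}, {hi}] matches no known envelope")


def normalise(values: list) -> list:
    """Map values into 0-1 using the envelope detected for this input."""
    env = detect_envelope(values)
    return [(v - env.low) / (env.high - env.low) for v in values]


print(detect_envelope([0.5, 7.0, 14.8]).name)
print(detect_envelope([-8000.0, 0.0, 9000.0]).name)
print(normalise([0.0, 7.5, 15.0]))
```

An input such as `[0.0, 500.0, 1000.0]` fits inside the static envelope but fills only 5% of it, so the gate raises `ValueError` instead of emitting a near-flat image.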

The dataset progression tells the same story in reverse

The corpus itself evolved in a way that underlines the preprocessing thesis. Early model generations were built on three dynamically normalised high-resolution image-log wells, then eight dynamic wells, and only later widened to ten wells that arrived statically normalised. Each transition was not just "more data" — it was a change in the input domain, and each one had to be re-validated as such. A team that treats "we added wells" as a monotone good, without checking that the new wells share the normalisation and value distribution of the old ones, is one acquisition vintage away from the compact-microresistivity trap. Geological diversity is what the model is hungry for; domain consistency is the contract that lets it digest the diversity at all.

What an applied-AI team should take from this

The engineering moral is not "dynamic is better than static" — that is a useful fact, but it is the small lesson. The large lesson is about where the leverage and the risk live in a subsurface vision pipeline.

The leverage lives in the data layer. A normalisation choice produced an effect size on par with the biggest architectural decisions in the study, at a fraction of the engineering cost. Before a team reaches for a heavier backbone or a longer schedule, it should ablate its preprocessing — because the cheapest large win is almost always upstream of the model.

The risk lives in the data layer too. The compact-microresistivity domain trap is a reminder that every preprocessing step encodes an assumption about the input distribution, and assumptions that are never asserted are assumptions that eventually break in silence. The MLOps practice that matters here is mundane and non-negotiable: instrument the per-source value distribution, gate on it, branch the normalisation per acquisition tool, and fail loudly when an input falls outside the contract. That instrumentation is what turns a research pipeline into a production system you can trust on the next well, the next field, and the next operator.

Preprocessing as a first-order lever: the normalisation ablation

Before

Static normalisation, single hard-coded transform

Global colour map crushes local sinusoid contrast; one normalisation assumed for all tools — compact-microresistivity static range (0–15) silently out of domain

After

Dynamic normalisation, per-tool domain gate

Local contrast resolves hairline fractures; per-instrument value-range detection branches the transform and fails loudly on out-of-domain input

Class error 63.45 → 2.536; Hungarian loss 0.119 → 0.015

The preprocessing lever, in three points

  1. Toggling high-resolution borehole image-log input from static to dynamic normalisation — with the model, loss and budget frozen — dropped class error from 63.45 to 2.536 and Hungarian matching loss from 0.119 to 0.015: an effect size on par with the study's biggest architectural levers, won entirely in the data layer.
  2. Dynamic normalisation re-scales contrast over a ~1 m window, handing the convolutional backbone and attention layers the local edge gradients needed to localise sinusoids; static normalisation preserves well-scale comparability but crushes the fine features the model must detect.
  3. The compact-microresistivity domain trap: a tool whose static values spanned roughly 0–15 instead of the assumed ±10,000 would have put near-flat, out-of-distribution images into training without raising an error; the fix is to instrument per-source value distributions, gate on them, and branch normalisation per acquisition tool rather than hard-code one transform.

References

  1. GeoBFDT normalisation, augmentation, backbone and well-count ablation figures derived from internal validation on a 14-well Middle East carbonate dataset acquired with two different microresistivity imaging tools; data and code withheld under operator confidentiality.

  2. Carion et al. (2020). End-to-End Object Detection with Transformers (DETR). ECCV 2020. https://arxiv.org/abs/2005.12872

  3. Per-tool static and dynamic value-range characterisation across the two microresistivity imaging tools, and the per-source normalisation gate, are internal EarthScan engineering practice developed during the engagement.

© 2026 Copyright. Earthscan