The Log File Said One Thing, Excel Said Another: The Depth Mismatch That Stalled Labeling

Every supervised borehole-AI project has a moment, somewhere in the first weeks, when the model is the least interesting thing in the room. We hit ours on a roughly twenty-month engagement with a mid-sized Middle East carbonate operator, the day we tried to pair a geologist's dip picks with the image they were picked from and discovered the two files did not agree on where the well was. The picks lived in an Excel sheet and ran from 2050.42 m to 3441.27 m. The binary wireline log file, the digital log format the picks were supposedly drawn against, carried pixels from 2050 m to 2899.9993 m and then stopped. Nearly fourteen hundred metres of labels, roughly 540 m of them pointing at depth where no image existed. No transformer, no augmentation policy, no loss function fixes that. Reconciling pick-to-image depth was a prerequisite no model could skip, and it is the part of the work that never makes it into the paper.

This is a field note about that reconciliation — what breaks, why it breaks, and why depth registration is the first real engineering task in a borehole computer-vision pipeline, ahead of the model by a wide margin.

Two files, two coordinate systems, one well

A supervised image-log model needs exactly one thing to begin: a patch of image and a label that says here, at this depth, with this dip and this azimuth, is a sinusoid. That sounds like a join on a depth key. It is not, because the two sides of the join were produced by different tools, in different formats.

The binary wireline log file is the canonical record — a binary well-log container, the same format our ingestion boilerplate reads for any such log, holding the static and dynamic image arrays with each row stamped with a measured depth. Its depth axis is the ground truth for where a pixel sits in the earth. The Excel sheet is the human artefact: a geologist's raw dip picks, one row per interpreted feature, each carrying a depth, an apparent dip, an azimuth, and a type code — precise where the interpreter looked and silent where they did not. When those two depth axes disagree, you do not have a labeling problem. You have a registration problem, and it has to be solved before labeling can even be defined.

The mismatch, in numbers

The cleanest way to see the failure is to lay the two ranges side by side. On one well, the Excel picks spanned 2050.42 to 3441.27 m. The binary wireline log image covered 2050 to 2899.9993 m. The image simply was not present from 2900 to 3441 m — over five hundred metres of picks with no pixels behind them. Naively zipping picks to image rows by index, or by nearest depth, would have silently mapped a third of the labels onto the wrong rock or onto padding.
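The arithmetic behind that comparison is simple enough to automate per well. The sketch below is a minimal illustration, not the project's actual tooling: the function name `coverage_report` and its dictionary keys are hypothetical, and the only inputs are the four depths quoted above.

```python
def coverage_report(picks_top, picks_bottom, image_top, image_bottom):
    """Split a pick interval into covered and uncovered metres
    relative to the interval where image rows actually exist."""
    picks_len = picks_bottom - picks_top
    # Overlap of the two closed depth intervals (zero if they are disjoint).
    overlap_top = max(picks_top, image_top)
    overlap_bottom = min(picks_bottom, image_bottom)
    covered = max(0.0, overlap_bottom - overlap_top)
    uncovered = picks_len - covered
    return {
        "picks_m": picks_len,
        "covered_m": covered,
        "uncovered_m": uncovered,
        "uncovered_fraction": uncovered / picks_len,
    }


# Depths quoted in the text for the first well.
report = coverage_report(2050.42, 3441.27, 2050.0, 2899.9993)
print(report)
```

For these inputs the uncovered span comes out at about 541.27 m, roughly 39% of the picked depth range. That is a fraction of depth, not of pick count, since the picks are not evenly spaced.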

It was not an isolated quirk. On another well the Excel sheet ran from 1934.08 to 2048 m while the binary wireline log stopped at 2038 m, leaving the interval 2038 to 2048 m with picks but no image — and the very last pick, a bedding plane logged at 2048 m as a flat 90/90 feature, jumped straight there from 2036.12 m with nothing between.

The picks themselves were not continuous either. Within a single well the depth sequence lurched: 2898 to 3100 m, 1731 to 2054 m, 2096 to 2318 m, 917 to 1109 m. Hundreds of metres with no interpretation at all, bracketed by dense picking elsewhere. An interpreter works the zones that matter and skips the rest; that is good geology and terrible training data if you assume the label stream is dense. Treat those gaps as "no fracture here" and you teach the model that long, richly fractured intervals are barren — a false-negative factory baked in at the data layer.
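One way to make "interpreted" explicit is to derive intervals from the pick depths themselves and only allow negative samples inside them. The following is a sketch under stated assumptions: the 5 m `max_gap_m` threshold is an arbitrary illustrative value, both function names are hypothetical, and the example depths are invented to echo the 2898 to 3100 m jump rather than taken from a real well.

```python
def interpreted_intervals(pick_depths, max_gap_m=5.0):
    """Group pick depths into (top, bottom) intervals.

    A jump larger than max_gap_m between consecutive picks is treated
    as 'interpreter skipped' and starts a new interval.
    """
    depths = sorted(pick_depths)
    if not depths:
        return []
    intervals = []
    start = prev = depths[0]
    for d in depths[1:]:
        if d - prev > max_gap_m:
            intervals.append((start, prev))
            start = d
        prev = d
    intervals.append((start, prev))
    return intervals


def is_valid_negative(depth, intervals):
    """A depth may serve as a negative sample only inside an interpreted interval."""
    return any(top <= depth <= bottom for top, bottom in intervals)


# Hypothetical picks with a ~200 m uninterpreted gap in the middle.
picks = [2890.0, 2894.5, 2898.0, 3100.0, 3102.2, 3105.0]
intervals = interpreted_intervals(picks)
print(intervals)
print(is_valid_negative(3000.0, intervals))  # inside the gap
print(is_valid_negative(2892.0, intervals))  # inside an interpreted zone
```

Deriving intervals from pick spacing is only a fallback heuristic. Where the interpreter can state which zones were actually reviewed, that explicit record is the better source.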

Figure (interactive in the original): segmentation accuracy on raster logs is bounded by source-scan quality and by curve type, not by the architecture. As the scan degrades from a clean studio scan to a 4th-generation photocopy of microfiche, noise and fade build on the raster and the recovered curve degrades: the smooth Gamma Ray trace stays largely locked while the sharp Caliper trace fragments first and worst. The published F1 = 35% (10K mixed archive) is a floor that cleaner inputs lift by +12 F1, not a ceiling. The values F1 35%/IoU 30%, Pearson r = 0.62 on GR, the +12-curated point and +5K-CALI are the whitepaper's own; the schematic raster, the recovered curve and the F1 interpolation are illustrative.

The visual above is the mental model we ended up working to: the usable label set is the intersection of where the image exists and where the interpreter actually picked, and the recoverable fraction collapses fast as either coverage shrinks. Everything outside that intersection is not data — it is a depth-indexed trap that, fed to a detector as a positive or a negative, silently corrupts the gradient the model trains on.

Why the axes drift apart

Three mechanisms, none of them exotic, produce the drift — and naming them is half the fix.

Coverage truncation. Image acquisition and interpretation are different jobs run at different times. The interpreter's spreadsheet can extend above and below the logged image interval — picks carried over from another run, or depths typed against a plan rather than the acquired curve. The binary wireline log file is the only file that knows the true top and bottom of the image, so the log-file coverage envelope, not the spreadsheet, has to define the valid labeling window.

Sub-pixel depth resolution. Even inside the covered interval, registration is never exact to the millimetre. One pixel in this binary wireline log corresponds to about 3 cm of borehole, so a constant ±3 cm depth uncertainty is structurally present in every label before anyone makes a mistake. That number sets the floor for how tightly you can ever score a depth match — which is exactly why downstream we evaluate fracture picks in 3, 6, and 9 cm depth bands rather than pretending to millimetre precision.
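Banded scoring of this kind can be sketched in a few lines. The text gives only the band widths, so the matching rule here is an assumption: a true pick counts as found if any predicted pick lies within the band, with no one-to-one pairing enforced. The function name and the example depths are hypothetical.

```python
BANDS_M = (0.03, 0.06, 0.09)  # 3, 6 and 9 cm, expressed in metres


def sensitivity_by_band(true_depths, pred_depths, bands_m=BANDS_M):
    """Fraction of true picks with at least one prediction within each band.

    Simplified rule: matches are not one-to-one, so a single prediction
    may satisfy several nearby true picks.
    """
    result = {}
    for tol in bands_m:
        hits = sum(
            1 for t in true_depths
            if any(abs(t - p) <= tol for p in pred_depths)
        )
        result[tol] = hits / len(true_depths) if true_depths else 0.0
    return result


# Hypothetical depths: prediction offsets of 2 cm, 5 cm, 8 cm and 5 m.
true_picks = [2100.00, 2105.50, 2110.20, 2120.00]
pred_picks = [2100.02, 2105.55, 2110.28, 2125.00]
scores = sensitivity_by_band(true_picks, pred_picks)
print(scores)
```

With these inputs the sensitivity rises from 0.25 at 3 cm to 0.5 at 6 cm and 0.75 at 9 cm, which is the point of reporting bands: the tightest one sits at the pixel floor, and the wider ones show how much of the miss rate is registration noise rather than a missed feature.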

Channel and format ambiguity. The same binary wireline log can carry static and dynamic image variants whose value ranges differ and whose depth extents need not match — on one well the static array reached 16 m deeper than the dynamic. The numeric ranges were anomalous too: a compact-imager log arrived with static and dynamic values squeezed into a 0 to 15 band, where a normal high-resolution borehole image static runs from roughly −10⁴ to 10⁴ and a dynamic from 0 to 255. Pick the wrong channel, or assume both share an axis, and the patch you hand the model is registered to the wrong depth or normalised on the wrong scale.
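Both checks can be expressed as small gates. The sketch below is illustrative only: the expected ranges are the rough figures quoted above, while the `min_span_fraction` and `slack` thresholds, the function names, and the example depths are assumptions. A 0 to 15 band sits numerically inside both expected ranges, so the gate tests the span of the values, not just their bounds.

```python
# Approximate expected value ranges quoted in the text.
EXPECTED = {"static": (-1e4, 1e4), "dynamic": (0.0, 255.0)}


def range_looks_normal(kind, vmin, vmax, min_span_fraction=0.5, slack=0.1):
    """Heuristic gate: values must stay near the expected bounds AND
    cover a reasonable share of the expected span."""
    lo, hi = EXPECTED[kind]
    span = hi - lo
    within = vmin >= lo - slack * span and vmax <= hi + slack * span
    wide_enough = (vmax - vmin) >= min_span_fraction * span
    return within and wide_enough


def shared_depth_window(extents):
    """Intersect per-channel (top, bottom) depth extents; None if they do not overlap."""
    top = max(t for t, _ in extents.values())
    bottom = min(b for _, b in extents.values())
    return (top, bottom) if top < bottom else None


print(range_looks_normal("dynamic", 0.0, 15.0))        # squeezed compact-imager band
print(range_looks_normal("static", -9500.0, 9800.0))   # typical static range
# Hypothetical extents where the static array runs 16 m deeper.
window = shared_depth_window({"static": (2050.0, 2916.0), "dynamic": (2050.0, 2900.0)})
print(window)
```

A log that fails the range gate is not necessarily unusable. It is held back until someone decides how it should be renormalised.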

Reconciliation as a pipeline stage, not a cleanup chore

The instinct on a deadline is to fix this by hand — clip the spreadsheet, eyeball the offsets, move on. We did the opposite, and it is the recommendation that survives the project: make depth reconciliation an explicit, deterministic stage of the data pipeline, with the same rigour you would give a model component. In practice that stage does four things, in order:

Adopt the binary wireline log file as the depth authority. Read the image's true top, bottom, and step from the binary header. That envelope, intersected across the channels you intend to use, is the only region where a label can be valid; everything in the spreadsheet outside it is quarantined, never silently dropped.
Register picks to image rows by physical depth. Map each Excel pick to its nearest log-file depth row, carry the residual as metadata, and reject any pick whose nearest row is farther than the pixel-resolution tolerance. The ±3 cm pixel floor becomes the acceptance band, not a fudge factor.
Distinguish "interpreter skipped" from "no feature." Flag interpreted intervals explicitly and exclude un-interpreted depth from the negative-sampling pool, so the model never learns that an unlabelled but fractured zone is empty rock.
Pin the channel and its normalisation. Resolve static-versus-dynamic per well and refuse logs whose range falls outside the expected envelope until they are renormalised — the same gate that caught the 0–15 compact-imager logs before they could poison a training batch.
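The first two steps above can be sketched as a single registration function. This is a minimal illustration, not the pipeline's actual code: `register_picks`, its record fields, and the synthetic 3 cm row spacing are assumptions, and reading the real row depths from the binary log is deliberately left out.

```python
from bisect import bisect_left


def register_picks(pick_depths, row_depths, tol_m=0.03):
    """Map each pick to its nearest image row by physical depth.

    row_depths must be a non-empty list sorted in ascending order.
    Picks farther than tol_m from any row are quarantined, not dropped.
    """
    accepted, quarantined = [], []
    for d in pick_depths:
        i = bisect_left(row_depths, d)
        # The nearest row is one of the two neighbours of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(row_depths)]
        j = min(candidates, key=lambda k: abs(row_depths[k] - d))
        residual = d - row_depths[j]
        record = {"pick_depth": d, "row": j, "residual_m": residual}
        if abs(residual) <= tol_m:
            accepted.append(record)
        else:
            quarantined.append(record)
    return accepted, quarantined


# Synthetic image axis: rows every 3 cm from 2050.00 m to about 2899.99 m.
row_depths = [2050.0 + 0.03 * i for i in range(28334)]
picks = [2050.42, 2500.00, 2899.99, 2950.00, 3441.27]
accepted, quarantined = register_picks(picks, row_depths)
print(len(accepted), len(quarantined))
print([q["pick_depth"] for q in quarantined])
```

Carrying the residual on every record, accepted or not, keeps the decision auditable: a later reviewer can see how far each quarantined pick sat from the nearest pixel row.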

The point of writing it down as a stage is reproducibility. A hand-clipped spreadsheet is a one-off; a deterministic reconciliation step re-runs when the next ten wells arrive, versions alongside the data, and answers cleanly when a metric moves and someone asks whether the labels or the model changed. In MLOps terms it is the difference between a notebook and a contract — the data contract every downstream component, from patch extractor to detection transformer, is allowed to assume. It is also the stage that makes a training run re-derivable: when an ablation shifts a sensitivity number, you can prove the input tensor was identical and the architecture was the only thing that moved.
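One lightweight way to back that claim is to fingerprint the reconciled label set together with the parameters that produced it. The sketch below assumes labels are plain dictionaries with `row` and `pick_depth` fields; the function name, the field names and the parameter keys are hypothetical.

```python
import hashlib
import json


def label_fingerprint(records, params):
    """Deterministic SHA-256 over reconciled labels and reconciliation parameters.

    Records are sorted first so input order does not change the hash.
    """
    ordered = sorted(records, key=lambda r: (r["row"], r["pick_depth"]))
    payload = {"params": params, "records": ordered}
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Hypothetical reconciled labels and parameters.
labels = [
    {"row": 14, "pick_depth": 2050.42, "residual_m": 0.0},
    {"row": 15000, "pick_depth": 2500.0, "residual_m": 0.0},
]
params = {"tol_m": 0.03, "channel": "dynamic"}

fp_a = label_fingerprint(labels, params)
fp_b = label_fingerprint(list(reversed(labels)), params)
fp_c = label_fingerprint(labels, {"tol_m": 0.06, "channel": "dynamic"})
print(fp_a == fp_b, fp_a == fp_c)
```

Storing the fingerprint with each training run gives a cheap answer to the "labels or model?" question: identical hashes mean the label set and the reconciliation parameters were the same.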

The engineering lesson

None of this is glamorous, and that is precisely the lesson. The headline numbers a borehole-AI program reports — fracture sensitivity, dip and azimuth accuracy, the ablation that shows the model learning — are all downstream of one quiet guarantee: that a pixel and its label point at the same depth in the same rock. Get the registration wrong and every metric above it measures the wrong thing with great precision, and no amount of backbone search, augmentation, or learning-rate tuning recovers a gradient computed against mis-registered ground truth.

Across our subsurface engagements, from Middle East carbonates to operators in the United States, the first weeks are almost never about architecture. They are about making the files agree on where the well is. A team that treats log-file-to-Excel reconciliation as a named, tested, versioned pipeline stage will train a model. A team that treats it as a cleanup chore will train one too — on labels that quietly point at the wrong rock — and spend months blaming the loss function.

Key takeaways

Depth registration, not modelling, is the first real engineering task in a supervised image-log pipeline. On one well, dip picks in Excel ran 2050.42–3441.27 m while the binary wireline log image only covered 2050–2899.99 m — over 500 m of labels with no pixels behind them.
Treat the binary wireline log depth axis as the single source of truth. The image's true top, bottom, and step (intersected across channels) define the only window where a label can be valid; everything in the spreadsheet outside it must be quarantined, not silently joined.
Interpreter pick gaps (e.g. depth jumping 2898→3100, 1731→2054, 917→1109 m) are 'not interpreted', not 'no feature'. Sampling negatives from those gaps teaches the model that fractured intervals are barren — a false-negative factory at the data layer.
Sub-pixel depth resolution sets a hard floor: one binary wireline log pixel ≈ 3 cm, so a ±3 cm registration uncertainty is always present. That floor is exactly why downstream fracture picks are scored in 3/6/9 cm depth bands, not at millimetre precision.
Pin the channel and its normalisation per well. Static vs dynamic arrays can differ in depth extent (one static reached 16 m deeper than its dynamic) and in value range (a compact-imager log arrived at 0–15 where a high-resolution borehole image static is normally ≈ −10⁴–10⁴) — both must be resolved before a patch is registered.
Make reconciliation a deterministic, versioned pipeline stage — a data contract every downstream component can assume — not a hand-clipped, one-off cleanup. Reproducibility is the whole point: it is what lets an ablation attribute a metric shift to the model rather than the labels.
