Converting a Two-Class Mask into a Depth-Indexed Curve

The day the segmentation network finally produced a clean two-curve mask, we celebrated for about an hour. Then someone tried to hand it to a petrophysicist, and the obvious thing became undeniable: nobody loads a mask. A petrophysicist loads a curve, a column of values indexed by depth, and what our model returned was a grid of coloured pixels indexed by row and column. Between the output we were proud of and the file a working geoscientist could open sat a short, unglamorous program nobody had written yet. This is our account of writing it, and of finding that the arithmetic in that program decided whether the whole pipeline was trustworthy or merely impressive.

The deliverable was never the mask

We had spent months on VeerNet, our raster-log digitization model, and the segmentation results felt like the achievement. A scanned well-log went in as a tall grayscale image, and a three-class softmax came out: background, curve one, curve two, one label per pixel. On a good scan the two curve classes traced the analog ink almost exactly. It looked done.

It was not, because the customer artefact is a LAS file, the plain-text digital log standard every petrophysical tool reads, and a LAS file is a table [2]. Its first column is depth, sampled at a fixed step from a start depth in the file header; every other column is a measured value at that depth. A pixel mask has none of that. Its vertical axis counts pixel rows from the top of the scan and its horizontal axis counts pixel columns, and neither means anything to a reservoir model. The mask says where the ink is on the page. The LAS says what the rock reads at four thousand one hundred and twenty feet. Closing that gap was the post-processing step, and we had been treating it as a footnote.
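To make the table concrete: in a LAS 2.0 file the depth column is implied by the ~Well section, whose STRT, STOP and STEP entries give the start depth, stop depth and sample step. The sketch below is a minimal illustration, not a LAS reader; `parse_well_section` and `las_depth_axis` are hypothetical names, and the parser assumes the common `MNEM.UNIT  VALUE : DESCRIPTION` line layout while ignoring the many edge cases a real reader has to handle.

```python
import numpy as np

def parse_well_section(lines):
    """Read STRT, STOP and STEP from LAS 2.0 ~Well section lines.

    Assumes the 'MNEM.UNIT  VALUE : DESCRIPTION' layout, with the unit
    written straight after the dot. A sketch, not a full LAS parser.
    """
    wanted = {"STRT", "STOP", "STEP"}
    header = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "~")):
            continue  # skip blank lines, comments and section markers
        mnemonic, _, rest = line.partition(".")
        mnemonic = mnemonic.strip().upper()
        if mnemonic not in wanted:
            continue
        # The unit runs up to the first space; the value sits before the colon.
        _unit, _, tail = rest.partition(" ")
        header[mnemonic] = float(tail.split(":", 1)[0])
    return header

def las_depth_axis(header):
    """Depth column implied by the header: STRT, STRT + STEP, ..., STOP."""
    start, stop, step = header["STRT"], header["STOP"], header["STEP"]
    n_samples = int(round((stop - start) / step)) + 1
    return start + step * np.arange(n_samples)

# Illustrative header values, not taken from a real well.
well = [
    "~Well",
    "STRT.FT   4100.0000 : START DEPTH",
    "STOP.FT   4120.0000 : STOP DEPTH",
    "STEP.FT      0.5000 : STEP",
    "NULL.     -999.2500 : NULL VALUE",
]
depths = las_depth_axis(parse_well_section(well))
```

The point is only that the whole depth column is computed from three header numbers, so an error in any one of them moves every sample in the file.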

The shape of the gap

The mask is indexed by pixel row and pixel column. The deliverable is indexed by physical depth. Turning the first into the second is one affine map followed by a resample, and every decision inside those two operations is a decision about whether a number a petrophysicist reads off the curve is correct.

What we set out to produce, and the constraints around it

The goal was narrow and testable: take the two-class mask for a single log and emit a depth-versus-value curve for each of the two curves, sampled on the same depth axis the operator's own LAS ground truth used, so the two could be compared value for value. We had LAS files for the wells in our validation set, so the number that would tell us whether the post-processing was honest was the gap between our curve and theirs at every sampled depth.

The constraints make this harder than it sounds on a whiteboard. The scan resolution and the LAS sample step are unrelated; several pixel rows fall inside one depth sample, and the ratio is not a round number. The mask is a band a few pixels thick, not a one-pixel line, so a value has to be reduced out of it before it can be placed on a depth. And the recovered curve and the ground-truth curve almost never share the same native sampling, so to compare them at all, both have to be resampled onto one common axis. We fixed that axis at three hundred interpolated depth points across the validated interval, the resolution our validation notebooks ran every comparison at.
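A small numerical illustration of the first constraint, under assumed numbers (0.137 ft per pixel row and a 0.5 ft LAS step, neither taken from the engagement): the ratio between the two grids is not a whole number, and row-derived depths almost never land exactly on a LAS sample depth.

```python
import numpy as np

# Assumed, illustrative scales (not engagement values).
FT_PER_ROW = 0.137   # depth covered by one pixel row of the scan
LAS_STEP_FT = 0.5    # LAS sample step from the header

rows_per_sample = LAS_STEP_FT / FT_PER_ROW   # roughly 3.65, not a whole number

row_depths = 4100.0 + FT_PER_ROW * np.arange(146)   # row 0 placed at the start depth
las_depths = 4100.0 + LAS_STEP_FT * np.arange(41)   # 4100.0 to 4120.0 ft

# Distance from each row depth to its nearest LAS sample depth.
distance = np.abs(row_depths[:, None] - las_depths[None, :]).min(axis=1)
coincident = int(np.sum(distance < 1e-9))   # rows that fall exactly on a sample
```

With these numbers only the first row coincides with a sample; every other row sits between samples, at most a quarter of a foot from the nearest one, which is why some form of resampling is unavoidable.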

How the indexing actually went

The first version was written the way a first version always is, fast and slightly wrong. We collapsed each curve mask to one horizontal position per pixel row, stacked those positions top to bottom, and called the row index the depth. That produced a curve, and the curve even looked plausible when you plotted it. But the depth axis was pixel rows, not feet, so when we lined our curve up against the LAS ground truth the two were on different rulers and the comparison was meaningless. The error metric came back enormous, and for a confusing afternoon we thought the model had regressed.
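The per-row collapse can be sketched as a column centroid of each curve's band. The centroid is our assumption about the reduction (a median or brightest-pixel rule would also work), `band_centres` is a hypothetical name, and we assume the softmax has already been argmaxed to integer labels, with 1 and 2 for the two curves. Turning the resulting column position into a curve value also needs the track's horizontal scale, which this sketch leaves out.

```python
import numpy as np

def band_centres(label_mask, class_id):
    """Collapse one curve's band to a single column position per pixel row.

    label_mask: 2-D integer array (rows x columns) of per-pixel labels,
    assumed to be 0 for background and 1 or 2 for the two curves.
    Returns the mean column index of the class's pixels in each row,
    or NaN for rows where the class has no pixels.
    """
    hits = np.asarray(label_mask) == class_id
    counts = hits.sum(axis=1)
    columns = np.arange(hits.shape[1])
    column_sums = (hits * columns).sum(axis=1)
    centres = np.full(hits.shape[0], np.nan)
    has_ink = counts > 0
    centres[has_ink] = column_sums[has_ink] / counts[has_ink]
    return centres

# Toy mask: a 3-pixel curve-1 band, a row with only curve 2, a 2-pixel curve-1 band.
mask = np.array([
    [0, 1, 1, 1, 0, 0],
    [0, 0, 2, 2, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
curve1_columns = band_centres(mask, 1)
curve2_columns = band_centres(mask, 2)
```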

The model had not regressed. The indexing had. The fix was to write the affine map explicitly: physical depth equals the depth-start from the LAS header plus the pixel-row index multiplied by the depth-step per row. That is the entire conversion in one line, and it is the line everything else hangs on. Get the start and step right and every pixel row lands on the depth it actually represents. Get either one slightly wrong and the whole curve slides up or down the borehole, and a curve that is shifted by even a fraction of a sample is read against the wrong rock at every point.
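The affine map is a one-liner in code. This sketch follows the formula in the text; the `row_of_start` argument is a hypothetical extra we add for scans where the header's start depth does not fall on the top pixel row, and it accepts fractional rows, which is where sub-sample alignment lives.

```python
import numpy as np

def rows_to_depth(rows, depth_start, depth_step_per_row, row_of_start=0.0):
    """Affine map from pixel row to physical depth.

    depth = depth_start + (row - row_of_start) * depth_step_per_row
    With row_of_start = 0 this is exactly start + row * step. The
    row_of_start argument is a hypothetical extra for scans whose top
    row is not the header's start depth; fractional values are allowed.
    """
    rows = np.asarray(rows, dtype=float)
    return depth_start + (rows - row_of_start) * depth_step_per_row

# Illustrative numbers: start at 4100 ft, 0.25 ft per pixel row.
example_depths = rows_to_depth([0, 1, 10], 4100.0, 0.25)
```

The two constants fail differently. An error in the start shifts every row by the same amount; an error in the step grows with depth, so a step that is 0.01 ft per row too large puts row 2000 twenty feet too deep.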

Then came the rounding decision, where we lost the most time. After the affine map, our row-derived depths did not coincide with the LAS sample depths, because the two grids were never aligned. We had to resample. The naive instinct is to round each row-derived value to the nearest LAS depth and write it there, and that is exactly the decision that quietly destroys accuracy. Rounding to the nearest sample throws away the sub-sample position the affine map worked to compute, and it does so unevenly down the curve, so the recovered values jitter against the truth in a pattern that looks like noise but is an indexing artefact. We replaced the round-and-place step with a proper interpolation, using the standard scientific resampling routine to evaluate each recovered curve at the three hundred LAS depths rather than snapping it to them [3]. On the same model output, that was the difference between a curve that tracked the LAS values and one that disagreed with them sample by sample for no reason a geoscientist could explain.
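The gap between the two resampling choices shows up even on a synthetic, perfectly segmented curve. This sketch assumes an illustrative sinusoidal log response, 0.137 ft per pixel row and a 0.5 ft LAS step. It uses NumPy's `np.interp` for linear interpolation; `scipy.interpolate.interp1d` with its default linear kind would evaluate the same values. The round-and-place branch snaps each row to its nearest LAS sample and writes it there, letting later rows overwrite earlier ones.

```python
import numpy as np

def truth(depth_ft):
    # Illustrative smooth log response with a 10 ft period; not a real curve.
    return np.sin(2 * np.pi * depth_ft / 10.0)

# Row-derived depths (0.137 ft per row) and LAS depths (0.5 ft step) never align.
row_depths = 4100.0 + 0.137 * np.arange(150)
row_values = truth(row_depths)            # a perfect mask: exact values at row depths
las_depths = 4100.0 + 0.5 * np.arange(41)

# Interpolate: evaluate the recovered curve at each LAS depth.
interpolated = np.interp(las_depths, row_depths, row_values)

# Round-and-place: snap each row to its nearest LAS sample and write it there.
snapped = np.full(las_depths.size, np.nan)
nearest = np.rint((row_depths - las_depths[0]) / 0.5).astype(int)
for idx, value in zip(nearest, row_values):
    if 0 <= idx < las_depths.size:
        snapped[idx] = value              # later rows overwrite earlier ones

reference = truth(las_depths)
mae_interp = float(np.mean(np.abs(interpolated - reference)))
mae_snapped = float(np.mean(np.abs(snapped - reference)))
```

Because the mask here is perfect, all of the round-and-place error is indexing error: each LAS sample receives the value from a row that actually sits up to a quarter of a foot away.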

Interpolated depth points on the common LAS axis: 300
Curve 1 / curve 2 CSV MAE vs LAS ground truth: 0.11 / 0.12
Curve 1 / curve 2 CSV MSE vs LAS ground truth (squared, lower is better): 0.03 / 0.04

Reading the recovered curve against the real one

Once the affine map and the interpolation were both correct, we could finally measure the thing that mattered, which was never the mask quality and always the value the curve reports at a depth. We resampled both the recovered curve and the LAS ground truth onto the same three hundred depth points and took the gap between the two functions sample by sample. Under the Dice-loss multiclass model the mean absolute error came back at 0.11 for curve one and 0.12 for curve two, with mean squared errors of 0.03 and 0.04. Those are the numbers that say the post-processing is honest: the recovered curve agrees with the operator's own digitized log closely enough, in the curve's own units, that a petrophysicist can load it and work.
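The comparison itself can be sketched as: interpolate both curves onto one evenly spaced axis over the interval where they overlap, then average the absolute and squared gaps. The function name, the overlap rule and the choice of linear interpolation are our assumptions for illustration; the 300-point default follows the text.

```python
import numpy as np

def compare_on_common_axis(rec_depth, rec_value, las_depth, las_value, n_points=300):
    """MAE and MSE between two depth curves on one shared depth axis.

    Both curves are linearly interpolated onto n_points evenly spaced
    depths spanning the interval where they overlap. Depth arrays must
    be increasing.
    """
    top = max(rec_depth[0], las_depth[0])
    base = min(rec_depth[-1], las_depth[-1])
    axis = np.linspace(top, base, n_points)
    gap = np.interp(axis, rec_depth, rec_value) - np.interp(axis, las_depth, las_value)
    return float(np.mean(np.abs(gap))), float(np.mean(gap ** 2))

# Illustrative curves: the recovered one reads a constant 0.1 high.
las_depth = 4100.0 + 0.5 * np.arange(41)
las_value = np.sin(2 * np.pi * las_depth / 10.0)
rec_depth = 4100.0 + 0.137 * np.arange(150)
rec_value = np.sin(2 * np.pi * rec_depth / 10.0) + 0.1
mae, mse = compare_on_common_axis(rec_depth, rec_value, las_depth, las_value)
```

Both curves go through the same interpolation step, so neither side is favoured by its native sampling; the residual differences from the two interpolations are small compared with the 0.1 offset here.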

The exhibit below is the diagnosis we kept open while we debugged the indexing. It renders the row-to-depth map itself and lets you drag the alignment off its correct setting to watch the consequence. At zero offset the recovered curve lands on the right depth and the reconstruction error sits at the measured floor. Nudge the alignment by even a fraction of a pixel row and the curve walks off the LAS values, the absolute error climbs, and the squared error climbs faster, because squaring weights the large per-sample gaps a slip opens up more heavily than taking their absolute value does. That divergence is exactly why we report both.

A per-pixel mask is indexed by pixel row; a curve a petrophysicist can load is indexed by physical depth. The conversion is one affine map, depth = start + pixelRow × step, followed by a resample onto the common 300-point LAS depth axis, and getting that arithmetic right is what makes the output trustworthy. Drag the row-to-depth alignment offset: at zero offset the recovered curves (teal) land on the correct LAS depths and the reconstruction error sits at its sourced floor, the curve 1 / curve 2 CSV MAE of 0.11 / 0.12 and CSV MSE of 0.03 / 0.04 measured against the LAS ground-truth values under the Dice loss. Push the offset off zero and the recovered curve is read at a depth up to half a sample wrong, so it walks off the grey LAS truth and both error bars climb above the dashed floor, with the MSE diverging faster than the MAE. The 300-point count and the four floor error figures are sourced from the engagement archive; the depth-step constants, the pixel-row band and the curve shapes are illustrative, and the inflation returns exactly the sourced floor when the indexing is aligned.

What the exhibit makes physical is the thing the confusing afternoon taught us the hard way. The reconstruction error is not only a property of the model. A perfect mask, indexed against the wrong depth, produces a curve that scores badly, and a good-enough mask, indexed correctly, produces a curve that scores at the floor. The MAE of 0.11 and 0.12 we shipped was earned twice: once by the segmentation network and once by the four lines of indexing arithmetic that placed its output on the right ruler.
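The claim that a perfect mask on the wrong ruler scores badly can be checked with a toy model. Here the recovered values are exact, but each depth on a 300-point axis reports the value that truly belongs `offset_ft` shallower; the sinusoidal curve, the interval and the period are illustrative assumptions. The floor in this toy is zero rather than 0.11, because nothing else is wrong.

```python
import numpy as np

def slip_errors(offset_ft, n_points=300, top=4100.0, base=4120.0, period_ft=10.0):
    """MAE and MSE of a perfectly recovered curve placed offset_ft too deep."""
    axis = np.linspace(top, base, n_points)
    truth = np.sin(2 * np.pi * axis / period_ft)
    # Each axis depth reports the value that truly belongs offset_ft shallower.
    recovered = np.sin(2 * np.pi * (axis - offset_ft) / period_ft)
    gap = recovered - truth
    return float(np.mean(np.abs(gap))), float(np.mean(gap ** 2))

# Offsets up to half of an assumed 0.5 ft LAS sample.
sweep = {offset: slip_errors(offset) for offset in (0.0, 0.05, 0.1, 0.25)}
```

For small offsets the per-sample gap grows in proportion to the offset, so MAE roughly doubles when the offset doubles while MSE roughly quadruples, the same divergence the exhibit shows.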

Why the segmentation lineage made the last mile easy to underrate

This step is easy to treat as an afterthought for a structural reason that will catch the next team too. The model that produces the mask sits in a long, prestigious research lineage. Per-pixel semantic segmentation with an encoder-decoder is the well-trodden path, and the metrics that lineage rewards, intersection over union and per-class F1, are all computed in pixel space [4]. Everything the research literature trains you to optimize lives on the page, in the geometry of the mask. The depth-indexing step lives outside that frame. It has no place in the segmentation paper, no benchmark, no leaderboard, so it gets written last, quickly, by whoever is closest to the deadline.
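For contrast, the pixel-space metrics that lineage rewards need nothing but two label arrays. This is a minimal sketch of per-class IoU and F1 (`per_class_iou_f1` is a hypothetical name); on a binary mask, F1 is the same quantity as the Dice coefficient. Nothing in it accepts a start depth or a step, so it returns the same score however the mask is later indexed.

```python
import numpy as np

def per_class_iou_f1(pred, truth, class_id):
    """Pixel-space IoU and F1 (equivalently, Dice) for one class label."""
    p = np.asarray(pred) == class_id
    t = np.asarray(truth) == class_id
    tp = int(np.sum(p & t))    # pixels labelled class_id in both masks
    fp = int(np.sum(p & ~t))   # predicted class_id, truth says otherwise
    fn = int(np.sum(~p & t))   # truth says class_id, prediction missed it
    if tp + fp + fn == 0:
        return float("nan"), float("nan")  # class absent from both masks
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)

# Toy masks: three shared curve-1 pixels, one extra and one missed.
truth_mask = np.array([[0, 1, 1, 0],
                       [0, 1, 1, 0]])
pred_mask = np.array([[0, 1, 1, 0],
                      [0, 1, 0, 1]])
iou, f1 = per_class_iou_f1(pred_mask, truth_mask, 1)
```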

The well-logging digitization literature is more honest about this, because it has always had to be. Work on extracting curves from scanned parameter graphs treats the geometry-to-value conversion as a first-class problem rather than a clean-up step, because the people doing it knew the deliverable was a number at a depth, not a trace on an image [1]. We came at the problem from the segmentation side and had to relearn what the digitization side already knew: the value of the whole pipeline is set at the last mile, where pixels become depths, and that mile rewards arithmetic discipline far more than model cleverness.

What the indexing step decided

The deliverable is a depth-versus-value curve, not a mask, so the post-processing has to apply the affine map depth = start + pixelRow x step from the LAS header before any value means anything.
Resampling onto the common axis must interpolate, not round to the nearest sample: rounding throws away the sub-sample position and makes the recovered curve jitter against the truth in a way that looks like model noise but is an indexing artefact.
Measured on three hundred interpolated depth points against the LAS ground truth, the corrected pipeline reached CSV MAE 0.11 and 0.12 and CSV MSE 0.03 and 0.04 for curve one and curve two, and a half-row indexing slip was enough to inflate every one of those figures.

What this engagement changed about how we grade a model

The habit we walked out with is to refuse to grade a digitizer in pixel space ever again. For the rest of the project, a model was only as good as the curve it produced after indexing, measured against a real LAS file in the curve's own units, and the mask metrics became a diagnostic for the segmentation stage rather than a claim about the deliverable. That reordering is small to describe and it changed which problems we worked on, because it made the four lines of indexing arithmetic as important as the loss function, and they turned out to be exactly as easy to get wrong.

The curve a petrophysicist loaded at the end of this work reads 0.11 and 0.12 away from the operator's own log, on average, across three hundred depths. That number belongs partly to a segmentation network and partly to a short, careful program that placed its output on the right ruler and interpolated rather than rounded. We had built the network first and assumed the program was trivial. The program was where the trust lived.

References

[1] Yuan, B. and Yang, Q. (2019). Digitization of Well-Logging Parameter Graphs Based on Gridlines-Elimination Approach. Journal of Petroleum Exploration and Production Technology. https://link.springer.com/article/10.1007/s13202-019-0639-4
[2] Canadian Well Logging Society (1992). LAS Version 2.0: A Digital Standard for Logs (Log ASCII Standard). https://www.cwls.org/products/#products-las
[3] Virtanen, P. et al. (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17. https://www.nature.com/articles/s41592-019-0686-2
[4] Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. ECCV 2018. https://arxiv.org/abs/1802.02611

