What We Gave Up and Gained Switching From Classical CV to a U-Net

The first version of our raster-log curve tracer had no neural network in it at all. It was a few hundred lines of deterministic image processing, and on the scan we tested it against, it worked well enough that for a couple of weeks we wondered whether we needed the rest of the project. Then we fed it the next box of scans, and it fell apart. That experience, watching a careful classical pipeline go from competent to useless because the paper was a little fainter, is the whole story of why we eventually trained a U-Net. It is also why we still keep the classical baseline around, and reach for it more often than people expect.

This is a head-to-head, not a coronation. We want to be precise about what the deterministic pipeline gave us, what it cost us, and where the learned model earned the right to replace it. The honest finding is that neither method is simply better. They sit on opposite sides of a crossover, and the engineering skill is knowing which side of it you are on before you commit.

The deterministic pipeline, and where it comes from

A classical curve tracer for a scanned log is an assembly of public, well-understood parts, each of which predates deep learning by years or decades. It is worth naming them, because the strength of the approach is exactly that every step is a known quantity with a known failure mode.

The pipeline starts by turning a grey scan into black ink and white paper. Otsu's method picks the binarisation threshold automatically from the image histogram, which is the standard first move and the first thing that breaks when illumination is uneven across a sheet. ^[1] For the rigid geometry on a log, such as the decade gridlines on a logarithmic resistivity track, the Hough transform is the right tool: each edge pixel votes for the lines that could pass through it, and a peak in the accumulator is a line in the picture. ^[2] Duda and Hart's polar reformulation is what made that voting robust to lines of any orientation, including the vertical rulings a log is full of. ^[3] The free-form data curve, the part you actually want, is usually found by edge-following on a Canny edge map, tracing a connected contour from one detected point to the next. ^[4] Then morphology cleans up: thinning reduces a thick printed stroke to a one-pixel skeleton you can read depth by depth, with the Zhang-Suen algorithm as the usual workhorse. ^[5]
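To make the first step concrete, here is a minimal pure-Python sketch of Otsu's threshold selection: it scans every candidate grey level and keeps the one that maximises the between-class variance of the histogram. The helper names (`histogram`, `otsu_threshold`, `binarise`) and the toy sheet are illustrative assumptions, not code from our pipeline; a production version would normally call a library implementation instead.

```python
def histogram(image, levels=256):
    """Count how many pixels fall on each grey level (0 = black, 255 = white)."""
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1
    return hist


def otsu_threshold(hist):
    """Return the grey level t that maximises between-class variance.

    Pixels with level <= t form the dark (ink) class, the rest the light
    (paper) class. Ties keep the lowest such t.
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0    # pixel count in the dark class so far
    sum_dark = 0  # sum of grey levels in the dark class so far
    for t, count in enumerate(hist):
        w_dark += count
        sum_dark += t * count
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, variance is undefined
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarise(image, t):
    """Mark pixels at or below the threshold as ink (1), everything else as paper (0)."""
    return [[1 if px <= t else 0 for px in row] for row in image]


# Toy sheet (hypothetical grey levels): ink at 30, paper at 220.
sheet = [
    [220, 220, 30, 220],
    [220, 30, 220, 220],
    [30, 220, 220, 220],
]
t = otsu_threshold(histogram(sheet))
mask = binarise(sheet, t)
```

On a two-level toy image any threshold between the two levels separates them equally well. Real scans have overlapping ink and paper distributions, which is exactly where the choice of threshold starts to matter.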

None of this is a strawman. As recently as 2019, Yuan and Yang built a complete, credible well-log digitisation system on exactly this footing, with a gridlines-elimination strategy doing the heavy lifting and no learned component anywhere. ^[6] A classical pipeline is not the thing you build before you know better. It is a legitimate answer that, on the right input, is hard to beat.

What it actually cost to run

The deterministic pipeline's headline virtue is that it has no training cost. There is no labelled corpus to assemble, no annotation budget, no GPU bill, no overnight run that you discover diverged in the morning. You write the code, you tune a handful of constants against a sample sheet, and it runs. Set against the learned model we eventually trained, which took ten hours of training time on fifteen thousand synthetic instances before it produced a single useful prediction, the classical baseline's zero training hours is not a rounding difference. It is the difference between shipping something on day one and shipping something in week three.

The cost it does carry is hidden in that phrase "tune a handful of constants against a sample sheet." Every threshold, every structuring-element size, every hysteresis bound is fitted to one scan quality. The pipeline is not learning what a curve looks like in general. It is being hand-adjusted to one particular printing and one particular scanner. That tuning is cheap to do once and expensive to keep doing, because the moment the input drifts, you are back at the dials.
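A toy illustration of what "fitted to one scan quality" means in practice. The sketch below renders the same curve twice, once as a clean sheet and once as a faint one, and applies a single hand-tuned threshold to both. All grey levels and the constant `TUNED_THRESHOLD` are hypothetical values chosen for the example, not figures from our scans.

```python
def render(truth, ink_level, paper_level):
    """Paint a binary ink mask as a grey image with the given ink and paper levels."""
    return [[ink_level if is_ink else paper_level for is_ink in row] for row in truth]


def ink_recall(image, truth, threshold):
    """Fraction of true ink pixels that a fixed threshold still marks as ink."""
    found = total = 0
    for img_row, truth_row in zip(image, truth):
        for px, is_ink in zip(img_row, truth_row):
            if is_ink:
                total += 1
                if px <= threshold:
                    found += 1
    return found / total


# One-pixel curve drifting right with depth (1 = ink).
truth = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

clean = render(truth, ink_level=40, paper_level=230)   # crisp reprint (assumed levels)
faint = render(truth, ink_level=160, paper_level=215)  # faded print run (assumed levels)

TUNED_THRESHOLD = 128  # hand-picked against the clean sheet only

recall_clean = ink_recall(clean, truth, TUNED_THRESHOLD)  # 1.0: every ink pixel kept
recall_faint = ink_recall(faint, truth, TUNED_THRESHOLD)  # 0.0: the curve vanishes
```

The constant is not wrong for the sheet it was tuned on. It simply encodes that sheet's contrast, and nothing in the pipeline updates it when the contrast changes.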

The point where it broke

The break, when it came, was not subtle. A fainter print run, a scanner with a different contrast curve, a sheet that had yellowed and foxed in a filing cabinet for thirty years, any of these moved the image histogram enough that Otsu's threshold started cutting the curve in half or merging it into the gridlines. Edge-following, which depends on an unbroken contour, would lose the trace at the first faded gap and wander off onto a gridline instead. Each failure was individually fixable by retuning, and collectively unwinnable, because there is no single set of constants that survives the variance in a real archive.
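The wander-onto-a-gridline failure is easy to reproduce in miniature. The sketch below is a deliberately simplified stand-in for contour following, not a Canny-based tracer: each depth row is reduced to the set of columns that contain ink, and a greedy tracer steps to the nearest ink column within a small search window. The function names, the window size, and the toy geometry are all assumptions made for the illustration.

```python
def make_rows(curve_cols, grid_col, faded_rows=()):
    """Build one sorted list of ink columns per depth row.

    Every row contains the gridline; rows listed in faded_rows have lost
    their curve ink, as on a faint print.
    """
    rows = []
    for depth, col in enumerate(curve_cols):
        ink = {grid_col}
        if depth not in faded_rows:
            ink.add(col)
        rows.append(sorted(ink))
    return rows


def follow_trace(rows, start_col, window=3):
    """Greedy depth-by-depth tracer.

    At each row, step to the nearest ink column within `window` of the
    previous position. Stops early if no ink is within reach.
    """
    trace = [start_col]
    for ink_cols in rows[1:]:
        current = trace[-1]
        reachable = [c for c in ink_cols if abs(c - current) <= window]
        if not reachable:
            break
        # Nearest column wins; ties resolve to the leftmost candidate.
        trace.append(min(reachable, key=lambda c: (abs(c - current), c)))
    return trace


curve = [2, 3, 3, 4, 5, 5, 4, 3]  # true curve column at each depth
GRID_COL = 7                      # a vertical gridline

intact = make_rows(curve, GRID_COL)
faded = make_rows(curve, GRID_COL, faded_rows={4, 5})

trace_intact = follow_trace(intact, start_col=2)  # [2, 3, 3, 4, 5, 5, 4, 3]
trace_faded = follow_trace(faded, start_col=2)    # [2, 3, 3, 4, 7, 7, 7, 7]
```

Once the tracer has jumped to the gridline it never comes back, because the gridline is always the nearest ink. Widening or narrowing the window only moves the failure to a different sheet.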

This is the property we mean by brittleness, and it is not a flaw in any one operator. It is structural. A pipeline of deterministic steps tuned to a sample has no mechanism for generalising past the sample. The learned alternative does, because it is trained on the variance directly: our synthetic generator draws curves across a wide range of degradation on purpose, from clean reprints to faint and speckled sheets, so the model sees that variance during training and learns to hold a flat line against it. The cost of that flatness is the ten hours, the fifteen thousand instances, and a clean-sheet peak that is lower than a well-tuned classical pipeline can reach on its single favoured input.
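To show the idea of training on the variance directly, here is a hedged sketch of a degradation step that a synthetic generator could apply. It is not our generator: the fade factor, the speckle rate, the grey levels, and the names `degrade` and `sample_training_sheet` are all hypothetical. The point is only that severity is sampled across the whole range, so no single scan quality is privileged, while the ground-truth mask stays untouched.

```python
import random


def degrade(sheet, severity, rng, paper=230, speckle_max=0.05):
    """Fade ink toward the paper level and sprinkle dark speckle.

    severity runs from 0.0 (clean reprint) to 1.0 (faint, speckled sheet).
    """
    out = []
    for row in sheet:
        new_row = []
        for px in row:
            # Pull every pixel toward the paper level: ink loses contrast.
            faded = round(px + severity * 0.8 * (paper - px))
            # Occasionally replace the pixel with a dark speck.
            if rng.random() < severity * speckle_max:
                faded = rng.randint(0, 100)
            new_row.append(faded)
        out.append(new_row)
    return out


def sample_training_sheet(clean_sheet, rng):
    """Draw a severity uniformly from [0, 1) and return the degraded sheet with it."""
    severity = rng.random()
    return degrade(clean_sheet, severity, rng), severity


rng = random.Random(0)  # seeded so the example is repeatable
clean_sheet = [[230, 40, 230], [230, 230, 40]]
batch = [sample_training_sheet(clean_sheet, rng) for _ in range(4)]
```

Because only the image is degraded and the label is not, the model is asked to recover the same curve from every version of the sheet.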

That tradeoff is the substance of the comparison, so we built a bench for it. Drag the scan-degradation lever and watch the two fidelity lines diverge: the classical baseline starts ahead on a clean reprint and collapses through the knee, while the learned segmenter holds a flatter line and overtakes it. The crossover is the operating point where paying the training cost begins to make sense.

A head-to-head bench for the two ways to trace a curve off a scanned well log: a deterministic classical computer-vision baseline (adaptive thresholding, edge-following or Hough voting, morphological thinning) against the learned segmenter we use in VeerNet. Drag the scan-degradation lever from a clean reprint toward a faint, speckled, foxed scan. The classical line starts competitive on a clean sheet but falls off a cliff as the scan degrades, because every threshold and structuring-element size was tuned to one scan quality; the learned segmenter, trained on 15,000 deliberately degraded synthetic instances, holds a flatter line. Where the two cross is the operating point at which the 10-hour training cost starts to earn its keep. Left of the crossover the classical baseline is the right call. Sourced figures reported verbatim in the scoreboard: learned peak IoU 0.51, peak F1 0.55, peak recall 0.97, per-curve Dice IoU 0.26 and 0.21, 10 training hours on 15,000 instances; classical baseline 0 training hours. The fidelity-vs-degradation curves are illustrative shapes anchored at the sourced endpoints.
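For readers without the interactive bench, the crossover can be located numerically once two fidelity-versus-degradation curves are in hand. The sketch below uses placeholder shapes only: a logistic cliff for the classical baseline and a gently sloping line for the learned segmenter. Every parameter (`peak`, `knee`, `steepness`, `start`, `slope`) is an assumption made for illustration and is not a measured value; the degradation axis is normalised so that 0 is a clean reprint and 1 is a heavily degraded scan.

```python
import math


def classical_fidelity(d, peak=0.9, knee=0.4, steepness=12.0):
    """Placeholder shape: strong on clean sheets, collapsing past the knee."""
    return peak / (1.0 + math.exp(steepness * (d - knee)))


def learned_fidelity(d, start=0.6, slope=0.1):
    """Placeholder shape: lower clean-sheet peak, nearly flat under degradation."""
    return start - slope * d


def find_crossover(f, g, lo=0.0, hi=1.0, tol=1e-6):
    """Bisect for a degradation level where f and g are equal.

    Returns None if f - g has the same sign at both ends of [lo, hi].
    """
    def gap(d):
        return f(d) - g(d)

    if gap(lo) * gap(hi) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid  # the sign change is in the left half
        else:
            lo = mid  # the sign change is in the right half
    return (lo + hi) / 2


crossover = find_crossover(classical_fidelity, learned_fidelity)
```

With these placeholder shapes the crossover lands a little left of the knee. On real data the two curves would come from measurements at several degradation levels, and the same bisection applies to an interpolation of those points.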

Reading the learned model's real numbers honestly

It would be easy to read the bench as a win for the network and stop there. The measured figures argue for more humility than that. On the multiclass setting that traces two curves at once, our learned segmenter reached a peak intersection-over-union of 0.51, a peak F1 of 0.55, and a peak recall of 0.97. Per curve, the Dice-trained intersection-over-union landed at 0.26 for the first curve and 0.21 for the second. Those are not the numbers of a solved problem. A recall of 0.97 with an IoU near 0.5 is the signature of a model that finds almost all of the curve but is loose about its exact extent, which on a one-pixel target is precisely the hard part. The network did not make tracing easy. It made tracing robust, which on a heterogeneous archive is the property that matters, but it bought that robustness at a fidelity ceiling we are still pushing on.
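The "finds almost all of the curve but is loose about its extent" pattern is easy to see on a toy mask. The sketch below computes precision, recall, F1, and IoU from pixel counts, then scores a prediction that covers a one-pixel curve with a two-pixel band. The masks are invented for the example and `mask_metrics` is a hypothetical helper, not our evaluation code.

```python
def mask_metrics(pred, truth):
    """Pixel-wise precision, recall, F1 and IoU for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # predicted ink that is real ink
            elif p:
                fp += 1  # predicted ink on paper
            elif t:
                fn += 1  # real ink that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


WIDTH = 8
curve_cols = [1, 2, 3, 3, 4, 5]  # one ink pixel per depth row

# Ground truth: a one-pixel curve.
truth = [[1 if c == col else 0 for c in range(WIDTH)] for col in curve_cols]
# Prediction: the curve plus its right-hand neighbour, i.e. a two-pixel band.
pred = [[1 if c in (col, col + 1) else 0 for c in range(WIDTH)] for col in curve_cols]

scores = mask_metrics(pred, truth)
# recall 1.0, precision 0.5, IoU 0.5: nothing missed, yet half the overlap is lost
```

One extra pixel of width is enough to halve IoU on a one-pixel target while leaving recall untouched, which is why recall alone says little about trace fidelity.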

The classical pipeline, by contrast, has no general IoU to report, because it has no general behaviour. On its favoured input it can be tight and clean. Off it, it does not degrade gracefully toward a lower score; it produces garbage you have to detect and discard. That asymmetry is why the two columns of the bench do not line up cell for cell, and why comparing a single fidelity number between them would be dishonest.

Where the deterministic baseline still wins

After all of that, we still run the classical pipeline routinely, and not out of nostalgia. There are three situations where it is the correct call, and a learned model is the wrong one.

The first is rigid, parametric structure. Finding and subtracting decade gridlines, detecting the rectangular track boundaries, normalising the sheet before anything else looks at it: these have closed-form descriptions, and a Hough vote or a morphological opening solves them faster and more reliably than a network ever will. We do not ask the U-Net to find a straight line. We ask the Hough transform, and feed the network a cleaner image as a result.
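As a rough sketch of the gridline case, the code below runs a small Hough vote in the rho-theta parameterisation over a handful of ink points, takes the accumulator peak as the gridline, and removes the points that lie on it. It is a toy on point coordinates rather than an image, and the names `hough_votes` and `subtract_line`, the 5-degree angle step, and the sample geometry are assumptions for illustration.

```python
import math
from collections import Counter


def hough_votes(points, theta_step_deg=5):
    """Accumulate votes in (theta, rho) space using rho = x*cos(theta) + y*sin(theta)."""
    votes = Counter()
    for theta_deg in range(0, 180, theta_step_deg):
        theta = math.radians(theta_deg)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = round(x * cos_t + y * sin_t)  # quantise rho to whole pixels
            votes[(theta_deg, rho)] += 1
    return votes


def subtract_line(points, theta_deg, rho):
    """Keep only the points that do not fall in the detected line's rho bin."""
    theta = math.radians(theta_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x, y) for x, y in points if round(x * cos_t + y * sin_t) != rho]


# A vertical gridline at x = 5 spanning 20 depth rows, plus a few curve points.
gridline = [(5, y) for y in range(20)]
curve = [(1, 2), (2, 5), (3, 9), (2, 12), (1, 15), (3, 18)]
points = gridline + curve

(theta_deg, rho), count = hough_votes(points).most_common(1)[0]
# The peak is theta = 0 degrees, rho = 5: the vertical line x = 5, with 20 votes.

cleaned = subtract_line(points, theta_deg, rho)  # only the curve points remain
```

The peak is found without any training and its parameters read directly as "a vertical line at x = 5", which is the legibility the rigid-structure case depends on.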

The second is the cold start. When a new operator's scans arrive and there is no labelled data for their particular printing yet, a data-hungry model has nothing to stand on, while a deterministic pipeline that needs only a few tuned constants can produce a usable first pass immediately. That first pass is not just a deliverable; it becomes a source of provisional labels that seed the learned model's training set for that domain. The classical pipeline is the network's bootstrapper, not only its rival.

The third is anywhere the output has to be auditable. A threshold and a structuring-element size are legible to a geoscientist who wants to know why a curve was traced the way it was. A learned mask is a confidence surface, not an explanation. On work where someone downstream needs to defend the trace, the deterministic pipeline's transparency is a feature the network cannot offer.

The shape of the answer

So we did not replace classical computer vision with a U-Net. We rearranged them. The deterministic geometry does the rigid, parametric, auditable work it has always been best at, and hands the network a normalised image. The learned segmenter does the one thing no fixed set of constants survives, tracing a free-form curve through faded ink and scanner noise that varies from one box of paper to the next. The attention-refined encoder-decoder we use, drawing on the long-range reasoning that self-attention brought to vision, sits at the end of a lineage that runs straight back through fully convolutional networks and U-Net to the same edge maps the classical pipeline reads. ^[7] ^[8] ^[9]

What we gave up was the comfort of a method whose every step we could trace by hand. What we gained was a model that holds its footing when the paper does not cooperate. The bench above is really a map of that trade, and the only mistake it warns against is picking a side before you have looked at which scan you are holding.

Key takeaways

The classical curve tracer is an assembly of public, decades-old parts (Otsu thresholding, Hough voting, Canny edge-following, Zhang-Suen thinning) and was a credible end-to-end well-log digitiser as recently as 2019. It is a legitimate baseline, not a strawman.
Its decisive virtue is zero training cost: no labelled corpus, no annotation budget, no GPU run. It ships on day one, where the learned model needed ten hours of training on fifteen thousand synthetic instances first.
Its decisive flaw is structural brittleness. Every threshold and structuring-element size is tuned to one scan quality, and a fainter print or a different scanner moves the histogram enough to break the trace. There is no single set of constants that survives a real archive.
The learned segmenter buys robustness, not easy fidelity. Trained on deliberately degraded data it holds a flatter line as scans degrade, but its measured numbers (peak IoU 0.51, F1 0.55, recall 0.97; per-curve Dice IoU 0.26 and 0.21) describe a model that finds almost all of a curve while staying loose about its exact extent.
The deterministic baseline still wins on rigid parametric structure (gridlines, track boundaries), on the cold start with no labelled data for a new operator, and anywhere the trace must be auditable. We did not replace classical CV with the U-Net; we rearranged them around a crossover and learned where it falls.

References

[1] Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics (1979). The classic histogram-based method for choosing a binarisation threshold automatically, the front end of most classical curve-extraction pipelines. https://ieeexplore.ieee.org/document/4310076

[2] Hough, P.V.C. Method and Means for Recognizing Complex Patterns. U.S. Patent 3,069,654 (1962). The original parameter-space voting scheme for detecting parametric shapes such as straight lines in images. https://patents.google.com/patent/US3069654A/en

[3] Duda, R.O. and Hart, P.E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM (1972). The rho-theta polar parameterisation that made the Hough transform practical on lines of any orientation. https://dl.acm.org/doi/10.1145/361237.361242

[4] Canny, J. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (1986). The standard gradient-based edge detector, with hysteresis thresholding and non-maximum suppression, that feeds most classical line and curve extraction. https://ieeexplore.ieee.org/document/4767851

[5] Zhang, T.Y. and Suen, C.Y. A Fast Parallel Algorithm for Thinning Digital Patterns. Communications of the ACM (1984). The widely used thinning method that reduces a thick printed stroke to a one-pixel skeleton you can trace. https://dl.acm.org/doi/10.1145/357994.358023

[6] Yuan, B. and Yang, Q. Digitization of Well-Logging Parameter Graphs Based on a Gridlines-Elimination Approach. Journal of Petroleum Exploration and Production Technology (2019). A morphology-first, fully classical pipeline for extracting curves from scanned well-log graphs. https://link.springer.com/article/10.1007/s13202-019-0700-3

[7] Long, J., Shelhamer, E., and Darrell, T. Fully Convolutional Networks for Semantic Segmentation. CVPR (2015). The shift from per-image classification to learned dense per-pixel labelling. https://arxiv.org/abs/1411.4038

[8] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI (2015). The symmetric encoder-decoder with skip connections that preserves thin-structure detail and learns from scarce labels. https://arxiv.org/abs/1505.04597

[9] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. Attention Is All You Need. NeurIPS (2017). Self-attention as a mechanism for long-range dependency, later folded into vision backbones at the bottleneck of encoder-decoder networks. https://arxiv.org/abs/1706.03762
