From Hough Transforms to U-Nets: how document-image AI learned to read curves

A scanned well log is a picture of a measurement. Somewhere on that sheet a gamma-ray curve wanders down the page, a resistivity track loops past its own decade gridlines, and a caliper trace jitters against the borehole wall. The number you actually want, the value at each depth, was thrown away the moment the curve was printed and the paper went into a filing cabinet. Getting it back means teaching a machine to look at the image and recover the line. That problem, reading a continuous curve out of a raster, is older than deep learning by half a century, and the methods that solve it today are the descendants of a long argument about what it means for an algorithm to see a line at all.

This is a primer on that lineage. It is worth walking because the techniques did not replace each other so much as absorb each other, and the architecture that now reads a scanned log carries fingerprints from every prior generation. Understanding the chain is the fastest way to understand why automated raster-log digitisation became practical when it did.

A line is a vote: the Hough transform

The starting point is a single elegant idea from 1962. Paul Hough, working on bubble-chamber photographs, patented a way to detect straight lines without ever testing candidate lines against the image directly. The trick is to flip the question. Instead of asking "which pixels lie on this line," you let each edge pixel ask "which lines could pass through me," and you tally the answers in a separate accounting space. [1]

A point in the image is consistent with infinitely many lines, and in parameter space those lines trace out a single curve: a straight line in Hough's original slope-and-intercept form, a sinusoid in the angle-and-distance form that followed. Every edge pixel draws its curve, and wherever many curves cross, many pixels agree on the same line. A peak in the accumulator is a line in the picture. Duda and Hart's 1972 reformulation, parameterising each line by its distance from the origin and its angle rather than slope and intercept, removed the infinities that broke the original on vertical lines and made the method something you could actually ship. [2]
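The vote can be sketched in a few lines of Python, using the distance-and-angle form rho = x cos(theta) + y sin(theta). The function name `hough_peak`, the one-degree angle step, the one-pixel distance bins and the toy points are all assumptions made for illustration; this is a sketch of the idea, not a production detector.

```python
import math

def hough_peak(points, n_theta=180, rho_step=1.0):
    """Return (rho, theta in degrees, votes) for the strongest line.

    Each point votes once per angle bin for the line
    rho = x*cos(theta) + y*sin(theta) that passes through it.
    """
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    accumulator = {}
    for x, y in points:
        for t, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (round(rho / rho_step), t)
            accumulator[key] = accumulator.get(key, 0) + 1
    (rho_bin, t), votes = max(accumulator.items(), key=lambda kv: kv[1])
    return rho_bin * rho_step, math.degrees(thetas[t]), votes

# Toy edge map: a horizontal gridline at y = 7 plus a few stray curve pixels.
gridline = [(x, 7) for x in range(60)]
stray = [(3, 20), (10, 31), (25, 12), (40, 44)]
rho, theta_deg, votes = hough_peak(gridline + stray)
```

In this toy case the sixty gridline pixels all land in the same bin, at a distance of 7 and an angle of about 90 degrees, while the stray pixels scatter their votes across the accumulator. That concentration is why the peak survives other ink on the page.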

The Hough transform still matters for log digitisation because so much of a log sheet is rigidly geometric. Decade gridlines on a logarithmic resistivity track are straight, evenly spaced, and predictable, and a Hough vote finds them cleanly even when the curve crosses them. The generalisation to circles and other shapes extends the same voting logic to anything you can describe with a parametric form or a fixed template. What the Hough transform cannot do is read a free-form curve that has no such description fixed in advance, which is most of what a real log curve is.

Cleaning the strokes: morphology and thinning

Before a network ever sees a scanned log, the image has to be made legible, and the classical toolkit for that is mathematical morphology. Most of these pipelines start at the same place, an edge map: Canny's gradient-based detector, with its non-maximum suppression and hysteresis thresholding, became the standard front end that turns a grey scan into clean stroke boundaries for the Hough voter and the morphology that follows. [3] Erosion and dilation, and the opening and closing built from them, are set operations on the black-and-white pixels of a thresholded image. They close the hairline breaks where a printed curve faded, strip away speckle from foxed and yellowed paper, and separate a curve that has bled into an adjacent gridline.

Two operations earn their keep on logs in particular. Morphological thinning reduces a thick printed stroke to a one-pixel-wide skeleton, which is the difference between a smear of ink and a curve you can trace depth by depth. Gridline elimination, in turn, exploits the fact that gridlines and data curves differ in orientation and connectivity, so a directional opening can isolate the grid, which is then subtracted to leave the trace. Yuan and Yang built an entire well-log digitisation pipeline on exactly this gridlines-elimination strategy, recovering curves from scanned parameter graphs with morphology doing the heavy lifting and no learned component at all. [8] Their work is a useful marker: as late as 2019 a careful classical pipeline was a credible answer to this exact problem. The catch is brittleness. Every threshold and structuring-element size is tuned to one scan quality, and a fainter print run or a different scanner breaks the tuning.
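A directional opening is short enough to sketch directly. The example below assumes a binary image stored as a list of rows, horizontal gridlines only, and an odd structuring-element length `k`; the function names and the toy sheet are hypothetical, and this is not the pipeline from [8].

```python
def erode_rows(img, k):
    """Keep a pixel only if the k-wide horizontal window centred on it is all ink."""
    r = k // 2
    w = len(img[0])
    return [[int(all(0 <= x + d < w and row[x + d] for d in range(-r, r + 1)))
             for x in range(w)] for row in img]

def dilate_rows(img, k):
    """Set a pixel if any pixel in the k-wide horizontal window around it is ink."""
    r = k // 2
    w = len(img[0])
    return [[int(any(0 <= x + d < w and row[x + d] for d in range(-r, r + 1)))
             for x in range(w)] for row in img]

def split_grid_and_trace(img, k=9):
    """Horizontal opening isolates long horizontal runs; subtracting them leaves the rest."""
    grid = dilate_rows(erode_rows(img, k), k)
    trace = [[p if not g else 0 for p, g in zip(img_row, grid_row)]
             for img_row, grid_row in zip(img, grid)]
    return grid, trace

# Toy sheet: one full-width gridline on row 5 and a one-pixel diagonal trace.
H, W = 12, 20
sheet = [[0] * W for _ in range(H)]
sheet[5] = [1] * W
for y in range(H):
    sheet[y][4 + y] = 1

grid, trace = split_grid_and_trace(sheet)
```

The diagonal trace never has nine ink pixels in a row, so the opening keeps only the gridline. Subtracting it also removes the single pixel where the trace crosses the line, which is the kind of hairline break a closing is meant to repair.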

When the network started labelling pixels

The break came when segmentation stopped being a sequence of hand-set operations and became something a network learned end to end. The hinge was the fully convolutional network. Long, Shelhamer, and Darrell took a classification network, the kind that emits one label for a whole image, and replaced its dense layers with convolutions so it emitted a label for every pixel. [4] That sounds like a small change. It was the moment "is there a curve in this image" became "which pixels are curve," learned directly from examples rather than engineered rule by rule.
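The dense-to-convolution swap can be illustrated in one dimension. In this sketch the weights are hand-set rather than trained, and the names `dense_score` and `convolutional_scores` are hypothetical; it only shows that the same weights that give one score for a fixed-size patch give a map of scores when slid across a larger input.

```python
def dense_score(window, weights, bias):
    """A dense layer on a fixed-size input: one weighted sum, one score."""
    return sum(v * w for v, w in zip(window, weights)) + bias

def convolutional_scores(signal, weights, bias):
    """The same weights applied at every position: one score per location."""
    k = len(weights)
    return [dense_score(signal[i:i + k], weights, bias)
            for i in range(len(signal) - k + 1)]

# Hand-set weights that respond to a bright pixel between two dark neighbours.
weights, bias = [-1.0, 2.0, -1.0], 0.0

patch = [0.0, 1.0, 0.0]                             # classifier-sized input
row = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]      # a longer scan line

one_label = dense_score(patch, weights, bias)
label_map = convolutional_scores(row, weights, bias)
```

On the patch the two forms agree exactly; on the longer row the convolutional form peaks at the position of the bright pixel, which is the step from "is there a curve" to "where is it".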

Fully convolutional networks downsample hard to build up semantic context, then upsample back to full resolution, and the upsampling cannot restore the fine spatial detail that the downsampling discarded, which is exactly the detail a thin curve lives or dies on. The fix that defined the next decade arrived almost immediately. U-Net, published in 2015 for biomedical microscopy, paired a contracting encoder with a symmetric expanding decoder and wired skip connections straight across, so the high-resolution detail captured early survives all the way to the output. [5] Two properties made U-Net the workhorse for problems like ours. It preserves the thin-structure precision that a one-pixel curve demands, and it learns from very few labelled examples, which is decisive in a domain where ground-truth curves are expensive to produce. The biomedical pedigree is not a coincidence: a neuron membrane and a log curve are the same kind of target, a thin connected line winding through a noisy field.
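A toy one-dimensional example shows what the skip connection buys. It assumes max pooling by a factor of four, nearest-neighbour upsampling, and a hand-set multiplication standing in for the learned fusion; a real U-Net concatenates feature maps and learns convolutions over them, so treat this only as a picture of the information flow.

```python
def max_pool(x, k):
    """Downsample by keeping the strongest response in each block of k."""
    return [max(x[i:i + k]) for i in range(0, len(x), k)]

def upsample_nearest(x, k):
    """Upsample by repeating each coarse value k times."""
    return [v for v in x for _ in range(k)]

# A scan line where a one-pixel-wide curve crosses at index 9.
signal = [0] * 16
signal[9] = 1

coarse = max_pool(signal, 4)             # knows a curve is in the third block
blurry = upsample_nearest(coarse, 4)     # but not where inside that block

# Skip connection: the decoder also receives the encoder's full-resolution
# features. The product below is a hand-set stand-in for learned fusion.
fused = [c * s for c, s in zip(blurry, signal)]
```

Without the skip path the decoder can only say the curve is somewhere in a four-pixel block. With it, the exact crossing at index 9 is recoverable.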

Encoder-decoder, plus the part that looks around

U-Net set the template, and the refinements that followed mostly deepened it rather than discarding it. DeepLabv3+ built on the encoder-decoder spine with atrous (dilated) convolution, which spaces out a filter's taps to widen its field of view without adding weights or giving up resolution, so the network could weigh local stroke detail against the larger layout of the page at once. [6] Multi-scale context, learned, inside the same frame.
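Atrous convolution is easy to see in one dimension. The sketch below assumes zero padding and an odd kernel length, and the names `atrous_conv1d` and `span` are hypothetical; the point is that the same three weights cover three samples at rate 1 and nine samples at rate 4, with the output kept at the input's length.

```python
def atrous_conv1d(x, weights, rate):
    """Same-length 1-D convolution whose taps are spaced `rate` samples apart."""
    k = len(weights)
    half = (k // 2) * rate
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(weights[j] * padded[i + j * rate] for j in range(k))
            for i in range(len(x))]

def span(kernel_size, rate):
    """Number of input samples covered by one application of the filter."""
    return (kernel_size - 1) * rate + 1

# A single impulse makes the footprint of the filter visible in the output.
x = [0.0] * 17
x[8] = 1.0
weights = [1.0, 1.0, 1.0]

dense_out = atrous_conv1d(x, weights, rate=1)
wide_out = atrous_conv1d(x, weights, rate=4)

dense_hits = [i for i, v in enumerate(dense_out) if v != 0.0]
wide_hits = [i for i, v in enumerate(wide_out) if v != 0.0]
```

At rate 1 the impulse is seen from positions 7 to 9; at rate 4 it is seen from positions 4, 8 and 12. The filter looks further without a single extra parameter and without shrinking the feature map.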

The newest ingredient in this period is attention. Vaswani and colleagues introduced self-attention as a mechanism for letting any position in a sequence directly consult any other, regardless of distance. [7] On an image, that is the ability to relate a pixel here to a pixel far away, which is precisely what a curve demands of a model, because the meaning of one point on a trace depends on where the trace was a long way up the page. A short stack of attention layers placed at the compressed bottleneck of an encoder-decoder, where the feature map is small enough that all-pairs comparison is affordable, gives the network long-range continuity reasoning on top of the local precision the convolutions already provide. That hybrid, a residual convolutional encoder, an attention-refined bottleneck, and an upsampling decoder, is the shape of the architecture we use for raster-log digitisation, which we call VeerNet: five encoder stages down, two attention layers across the bottleneck, five decoder stages back up.
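The bottleneck attention step can be sketched with plain scaled dot-product self-attention. This is not VeerNet's code: it assumes identity projections in place of learned query, key and value weights, a hypothetical 512 by 512 input, and a halving of resolution at each of the five encoder stages, purely to show the all-pairs mechanism and why it is only affordable on a small feature map.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Every token scores every other token, then takes a weighted average."""
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = [softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
               for q in tokens]
    mixed = [[sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
             for row in weights]
    return mixed, weights

def attention_pairs(height, width):
    """Number of pairwise comparisons for a feature map of this size."""
    n = height * width
    return n * n

# A 2 x 2 bottleneck flattened to four tokens with two channels each.
# Tokens 0 and 3 sit at opposite corners but carry the same feature.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
mixed, weights = self_attention(tokens)

full_resolution = attention_pairs(512, 512)
bottleneck = attention_pairs(512 >> 5, 512 >> 5)   # five assumed halvings
```

Token 0 attends to the distant token 3 exactly as strongly as to itself, because the weights depend on feature similarity rather than distance. Under the assumed sizes, the bottleneck needs 65,536 comparisons where the full-resolution image would need about 69 billion.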

Figure (interactive timeline): how machines learned to read a continuous curve out of a raster, in three eras. Classical CV (the 1962 Hough transform plus morphology) detects rigid, parametric structure by voting; learned segmentation (FCN and U-Net, both 2015) labels every pixel from examples and keeps thin structure alive with skip connections; encoder-decoder networks with attention (self-attention 2017, DeepLabv3+ 2018) add long-range continuity on the same spine. Each era shows its method, its year, a one-line account of how it reads a curve, and a schematic of a noisy scan recovered at that era's fidelity, cleaner as the lineage accumulates. The methods, years and citations are the documented prior art the article cites; the recovered-curve fidelity is illustrative, not a measured trace.

Why this lineage made raster-log digitisation tractable

Put the chain end to end and the reason the problem became solvable falls out. None of these methods is obsolete; each occupies the rung where it is strongest, and a working pipeline uses all of them.

The Hough transform and morphology still do the deterministic geometry, finding and subtracting gridlines, thinning strokes, normalising a sheet before a network ever sees it, because for rigid structure a closed-form method beats a learned one on both speed and reliability. The learned encoder-decoder then does the part no rule survives, tracing a free-form curve through faded ink, crossing tracks, and scanner noise that no fixed threshold handles. Attention supplies the long-range continuity that keeps a single curve coherent down a page where it repeatedly meets and parts from its neighbours.

The scale of legacy paper is what makes the learned step non-negotiable. Public archives alone run to staggering counts: the Texas Railroad Commission raster set we work from holds 136,771 scanned TIF log images against only 7,781 born-digital LAS files, so the overwhelming majority of the record exists only as pictures. A morphology-only pipeline tuned to one scan quality cannot survive that variance, and hand-digitising an archive of that size is not a real option. A network that learns the curve from examples, scaffolded by the classical geometry it cannot do well itself, can.
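The imbalance is worth putting a number on. The arithmetic below uses only the two counts quoted above and assumes they can be compared file for file, which is a simplification since one well may have several of either.

```python
scanned_tif = 136_771   # scanned raster log images
digital_las = 7_781     # born-digital LAS files

total = scanned_tif + digital_las
raster_share = scanned_tif / total            # fraction that exists only as pictures
rasters_per_las = scanned_tif / digital_las   # scanned images per digital file

summary = f"{raster_share:.1%} raster, about {rasters_per_las:.1f} scans per LAS file"
```

On those counts roughly 94.6% of the files are rasters, or about 17.6 scanned images for every digital one.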

That is the quiet payoff of sixty years of document-image research. Reading a curve off a scan was never a single invention. It was a slow accumulation: a voting scheme for lines, a calculus for cleaning strokes, a network that labels pixels, skip connections that keep thin structure alive, and attention that reasons across the page. Only with the whole stack in hand does turning a filing cabinet of paper logs into queryable digital curves become an engineering task rather than a research gamble.

Key takeaways

Line and curve extraction from scanned documents is a sixty-year lineage, not a single method: the 1962 Hough transform detects parametric shapes by voting in parameter space, and remains the right tool for rigid structure like log gridlines.
Mathematical morphology (erosion, dilation, thinning, gridline elimination) cleans and skeletonises strokes before any learning step; careful classical pipelines were still credible for well-log digitisation as recently as 2019, but they are brittle to scan quality.
Fully convolutional networks (2015) turned segmentation into learned per-pixel labelling; U-Net's encoder-decoder with skip connections preserves thin-structure detail and learns from scarce labels, which is why it became the workhorse for curve-shaped targets.
Encoder-decoder refinements (atrous convolution for multi-scale context) and self-attention (long-range continuity at the bottleneck) extend rather than replace the U-Net frame. VeerNet's five-stage encoder, two attention layers, and five-stage decoder sit at the end of this chain.
The methods compose rather than compete: classical geometry handles deterministic structure, while the learned network traces the free-form curves that no fixed rule survives. That combination is what makes digitising legacy raster archives (136,771 scanned TIF logs against 7,781 digital LAS files in the public Texas RRC set) an engineering task rather than a research gamble.
