When the Reviewer Asks Why Not Deep Learning: Defending a Classical CV Pipeline in 2024

Reviewer 1 asked the question every 2024 reader was already thinking. We had submitted a paper that quantifies dissolution pores, called vugs, in carbonate borehole-image logs using a classical computer-vision pipeline: histogram mode subtraction, adaptive thresholding, contour extraction, and a pair of geometric filters. No neural network anywhere in it. Deep segmentation networks were everywhere in the logging literature by then. Why hand-build an image-processing chain when a model could learn the whole thing?

The reflexive defence is that classical methods are lighter, more interpretable, cheaper to run. All true, and all beside the point. When we went to the literature to answer the question properly, the answer inverted the premise. There was no deep-learning method to fall back from, because the thing a deep-learning method would need did not exist yet. We were not choosing the old tool over the new one. We were building the thing that has to come first.

What the literature check actually returned

The question we set out to answer was narrow and checkable: does a supervised, regression-based vug quantifier exist? Not a vug detector. A quantifier, something that outputs per-vug geometry you can integrate into a porosity number, the way our sinusoid regressor outputs dip and azimuth per feature.

The answer was no. Prior machine-learning work on vugs was classification only. Models labelled a zone or a patch as vuggy or non-vuggy, presence or absence, and the standard framing treated most of a resistivity image as valueless data to be pre-filtered before a handful of geological classes were assigned [1]. None returned the boundary of an individual pore. The reason they stopped at classification was not a modelling failure but a data failure. You cannot train a supervised model to regress or segment vug contours when nobody has ever annotated vug contours. Presence-or-absence is what people could annotate at scale, so presence-or-absence is what the models learned.

That is the whole hinge of the defence. The gap in the literature is not a missing architecture. It is a missing dataset. And a missing dataset is exactly what a classical pipeline is good for.

The pipeline is the dataset factory

Our classical chain does not emit a label per zone. It emits a contour per vug. For every dissolution pore in a section it extracts a closed boundary, then gates that boundary on area and circularity to drop the false positives that image-log artefacts throw off. (The mechanics of that gate are the subject of a separate piece and I will not re-derive them here.) The output is a set of per-pore polygons with geometry attached.
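As a rough illustration of what an area-and-circularity gate looks like, here is a minimal sketch in plain Python. It assumes each contour is a list of (x, y) vertices and uses the standard circularity measure 4πA/P². The function names and both threshold values are hypothetical placeholders, not the values or the implementation used in the paper.

```python
import math

def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) vertices."""
    n = len(points)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter

def circularity(points):
    """4*pi*A / P^2: equals 1.0 for a circle and falls towards 0 for slivers."""
    area, perimeter = polygon_area_perimeter(points)
    if perimeter == 0:
        return 0.0
    return 4.0 * math.pi * area / perimeter ** 2

# Hypothetical thresholds, chosen only for illustration.
MIN_AREA_PX = 20.0
MIN_CIRCULARITY = 0.4

def passes_gate(points, min_area=MIN_AREA_PX, min_circ=MIN_CIRCULARITY):
    """Keep a contour only if it is both large enough and round enough."""
    area, _ = polygon_area_perimeter(points)
    return area >= min_area and circularity(points) >= min_circ
```

Under these placeholder thresholds a 10 by 10 pixel square passes, while a 100 by 1 pixel sliver of the same area is rejected, which is the kind of elongated artefact such a gate is meant to drop.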

Look at what those polygons are. A per-pore polygon over an image is a segmentation mask. The ground truth a supervised segmentation or regression model would need is precisely what our pipeline produces as its ordinary output. Run the classical method across a well and you have not only measured that well; you have manufactured an annotated corpus, image and contour and area and circularity and depth, for every vug, at whatever volume the wells give you.
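To make the polygon-to-mask equivalence concrete, the sketch below rasterises one contour into a binary mask using an even-odd scanline test on pixel centres. It is an illustration under stated assumptions, not the pipeline's own code: the function name, the list-of-rows mask layout and the pixel-centre convention are all choices made here for the example.

```python
def polygon_to_mask(points, height, width):
    """Binary mask (list of rows): 1 where the pixel centre lies inside the polygon."""
    mask = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        y = row + 0.5  # sample at the pixel centre
        crossings = []
        for i in range(n):
            x0, y0 = points[i]
            x1, y1 = points[(i + 1) % n]
            # Half-open test: skips horizontal edges and avoids double-counting vertices.
            if (y0 <= y < y1) or (y1 <= y < y0):
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Even-odd rule: the interior lies between successive pairs of crossings.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    mask[row][col] = 1
    return mask
```

Applied to every contour in a section, a function of this kind would turn the pipeline's ordinary output into the per-pixel labels a segmentation network expects, with area, circularity and depth carried alongside as metadata.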

So the framing that a classical pipeline is a fallback for teams who cannot train a model gets the dependency backwards. There is no model to fall back from. The pipeline is the step that makes the model trainable at all, and the contours it writes today are the masks a future network trains on tomorrow.

Figure: the Reviewer 1 defence of a classical vug pipeline in one plate. Left panel: the literature check that grounds the method, drawn as a ladder. No supervised regression-based vug quantifier exists (prior ML vug work is vuggy / non-vuggy classification only) because no contour-annotated vug dataset exists to train one; the classical pipeline therefore extracts a contour for every vug, and those contours are the masks a future ML or DL model would need. The single orange rung carries the argument: the classical run is the dataset factory, not a fallback. Right panel: the deployability half of the same defence. Six parameters (k = 5, delta-m = 5, centroid difference = 5, IoU merge 20%, Laplacian bounding-box expansion +10%, mean-filter radius 2R) are held universal across every well, orientation and imaging tool and appear as six fixed dots. Only two per-well base values, the Laplacian and mean-based thresholds, are calibrated once from one or two representative sections and then frozen for all depths. Across the three validated wells (vertical and horizontal, two imaging tools) the six dots stay put while only the two calibrated bars change, which makes the 6-versus-2 split visible. Values are sourced from the vug-paper reviewer letters, manuscript and supplementary material: one validation well calibrated to Laplacian 0.70 and mean-based 1.00, another to 0.15 and 0.40, with a reference well whose base value is set once. Every number shown traces to the engagement archive; nothing is illustrative.

We kept this framing consistent through the rest of the review. A second reviewer pushed us to move recent deep-learning references, super-resolution and transformer segmentation work, up into the introduction. We declined and cited them only in the future-scope paragraph, because putting them in the introduction would have implied our method competes with that work. It does not. It precedes it. A pixel-segmentation network of the kind proposed for image-log interpretation still needs a labelled mask to train on and still needs post-hoc fitting to recover geometry [2], which is exactly the gap the contours fill. The classical pipeline is the data-generation stage of a program a segmentation network could one day sit on top of, once the corpus our contours build is large enough to train on.

The other half of the defence: does it actually deploy

A dataset factory is only useful if it runs on more than the well you tuned it on. A method that needs bespoke hand-tuning per well produces a bespoke corpus per well, which is no corpus at all. So the second thing the reviewers pressed on, and the second half of the defence, was parameter governance. If eight knobs have to be re-set for every new well, the deployability claim collapses. They do not. We split the parameter set into two disjoint groups, and that split is the argument.

Six parameters are universal. They are held constant across every well, every orientation, and both imaging tools, and they never change: the number of histogram modes subtracted (five), the mode spacing (five), the centroid-difference threshold for merging duplicate contours (five), the intersection-over-union merge threshold (20 percent), the Laplacian bounding-box expansion (10 percent each side), and the mean-filter enlarged-circle radius (twice the contour radius). We fixed these once, from analysis, and used the same values for the whole study. The mode-count and spacing choice, for instance, comes from a mode-yield analysis that is its own story; the point here is only that once chosen, it is frozen.
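For readers unfamiliar with the intersection-over-union criterion, the sketch below shows how a 20 percent merge threshold might be applied to two candidate contours. It makes several simplifying assumptions that are not taken from the paper: IoU is computed on axis-aligned bounding boxes rather than on the contours themselves, the comparison is inclusive (>=), and the centroid-difference check is left out because the source does not say how the two criteria combine.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1])
             - intersection)
    return intersection / union if union > 0 else 0.0

IOU_MERGE = 0.20  # the universal merge threshold quoted in the text

def is_duplicate(a, b, threshold=IOU_MERGE):
    """Treat two detections as the same vug when their overlap reaches the threshold."""
    return box_iou(a, b) >= threshold
```

Two 10 by 10 boxes offset by half their width share a third of their union and would be merged; two boxes that touch only at a corner pixel would not.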

Two parameters are per-well base values: the Laplacian threshold and the mean-based threshold in the contour-refinement filters. These have to be established for a new well, because the grayscale statistics of a borehole image drift with depth and differ between tools. But they are set once, from one or two representative sections, and then frozen for every depth in that well. On the validation wells the values were concrete: one settled at a Laplacian threshold of 0.70 and a mean-based threshold of 1.00, another at 0.15 and 0.40, alongside a reference well whose base values were set once in the same way. Two numbers, set once, per well. Everything else is fixed.
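One way to enforce this governance in code is to make the split structural, so the universal values cannot be edited per well. The sketch below is a hypothetical configuration layout, not the project's actual code: the class names, field names and well identifiers are invented for illustration, while the numeric values are the ones quoted in the text. The source gives the centroid-difference threshold only as "five", so no unit is assumed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UniversalParams:
    """Six values held constant across every well, orientation and tool."""
    n_modes: int = 5                        # histogram modes subtracted (k)
    mode_spacing: int = 5                   # spacing between modes (delta-m)
    centroid_diff: float = 5.0              # duplicate-contour centroid threshold
    iou_merge: float = 0.20                 # IoU merge threshold (20 percent)
    laplacian_bbox_expand: float = 0.10     # bounding-box expansion per side
    mean_filter_radius_factor: float = 2.0  # enlarged circle = 2 x contour radius

@dataclass(frozen=True)
class WellCalibration:
    """Two base values set once per well, then frozen for all depths."""
    laplacian_threshold: float
    mean_threshold: float

UNIVERSAL = UniversalParams()

# Hypothetical well identifiers; the threshold pairs are those reported in the text.
CALIBRATIONS = {
    "validation_well_a": WellCalibration(0.70, 1.00),
    "validation_well_b": WellCalibration(0.15, 0.40),
}

def params_for(well_id):
    """Return the shared universal set plus the calibration for one well."""
    return UNIVERSAL, CALIBRATIONS[well_id]
```

Because both dataclasses are frozen, an attempt to reassign a field raises an error, so onboarding a new well in this layout means adding one entry to the calibration table and nothing else.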

That six-versus-two split is what makes the corpus reproducible. Six universal parameters keep the extracted contours comparable across wells rather than an artefact of per-well tuning. Two calibrated base values mean a new well costs a few minutes of one-time setup, not a re-derivation of the method. We validated the protocol on three wells that together spanned the hard cases: vertical and horizontal orientations, imaged by two different tools. The universal six held across all of them. Only the two base values had to be re-read.

Why this is the honest answer, not a rationalisation

It would be easy to read all of this as a story a team tells to avoid admitting it did not train a model. One fact from the same engagement rules that reading out.

We did use deep learning elsewhere in the same engagement. The sinusoid-fitting side of the work, detecting fractures and beds and regressing their dip and azimuth, uses a Detection Transformer, because there the ground truth existed: human interpreters had picked thousands of sinusoids we could match against. The choice was never classical-versus-neural as an ideology. It was driven by whether the labelled data needed to train a supervised model was in hand. For sinusoids it was, so we trained one. For vug contours it was not, so we built the pipeline that generates it. Same team, same engagement, opposite tool, and the deciding variable was the dataset both times.

That is the defence in one line. The reviewer asked why not deep learning, and the correct answer was that deep learning for this task does not have a dataset yet, and our classical pipeline is how you get one. The objection assumed we had skipped a step. We had built the step everyone else skips.

Limitations

This argument is scoped to vug quantification, where the literature we surveyed was classification-only at submission time. It is not a general claim that classical CV precedes deep learning for every image-log task; on the sinusoid side of the same project, existing human picks made a supervised transformer the right first choice. The claim that the extracted contours are usable as training masks is an argument from their form, not a trained-model result: we did not, in this work, train a downstream segmentation network on the generated corpus, so the enabler role is demonstrated by construction rather than by a final learned model. The per-well base values reported come from three validation wells; the two-parameter calibration burden could grow on wells with imaging conditions outside that range. The universal-parameter set was fixed from our own analysis on this dataset and would need re-confirmation before being treated as universal on a materially different carbonate system or imaging tool.

References

[1] Ren, W. et al. (2020). Valuable Data Extraction for Resistivity Imaging Logging Interpretation. Tsinghua Science and Technology, 25(2), 281-293.

[2] Wang, W. and Zhou, W. (2023). A W-shaped dual encoder-decoder segmentation network for borehole-image fracture interpretation. Petrophysics, 64(1), 38-49.
