Keypoint Detection vs Direct Dip/Azimuth Regression: The Depth-Angular Trade-Off

When a Detection Transformer picks a fracture off a high-resolution borehole image log (a strip from one of two microresistivity imaging tools, with the cylindrical wellbore unrolled flat), it does not hand you a fracture. It hands you a tuple: a class label plus three numbers (depth, dip, and azimuth), emitted in a single forward pass for every query the decoder keeps. That tuple is where the modelling problem stops being about set prediction and starts being about how you turn the decoder's output into the three quantities a petrophysicist reports. In a roughly twenty-month engagement with a mid-sized Middle East carbonate operator, this is where we spent a disproportionate share of our engineering time, because the obvious design, regressing all three numbers directly off the decoder, has a defect that no amount of training data fixes. This piece is about the two output heads we built, why they fail on opposite axes, and how a 2 cm tolerance and the resolution of a digital-log-format pixel decide which one you should ship.

One transformer, two ways to read its output

Start from the shared backbone. The detector is a DETR-style model — a from-scratch ResNet-10 feature extractor, a four-layer transformer encoder and decoder, the Hungarian bipartite matching loss handling assignment, focal loss on the class and L1 on the parameters. None of that is in question here. What is in question is the head: the last few linear layers that convert each surviving decoder query into the geometry of a sinusoid.
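The assignment step is the one piece every head shares, so it is worth seeing concretely. Below is a minimal sketch of DETR-style set matching, assuming a matching cost of negative fracture probability plus a weighted L1 distance on (depth, dip, azimuth) in comparable (for example normalised) units; `match_queries` and `l1_weight` are hypothetical names. Real implementations solve this assignment with the Hungarian algorithm (SciPy's `scipy.optimize.linear_sum_assignment` is a common choice); this toy version searches every assignment exhaustively, which reaches the same optimum but is only viable for a handful of queries.

```python
import itertools

import numpy as np


def match_queries(probs, pred_params, gt_params, l1_weight=1.0):
    """Optimal one-to-one assignment of ground-truth sinusoids to decoder queries.

    probs:       (Q,) predicted probability that each query is a fracture
    pred_params: (Q, 3) predicted (depth, dip, azimuth) per query
    gt_params:   (T, 3) ground-truth (depth, dip, azimuth), with T <= Q
    Returns a list of (query_index, target_index) pairs.
    """
    probs = np.asarray(probs, dtype=float)
    pred = np.asarray(pred_params, dtype=float)
    gt = np.asarray(gt_params, dtype=float)

    # Pairwise cost: -p(fracture) plus weighted L1 distance between parameter vectors.
    l1 = np.abs(pred[:, None, :] - gt[None, :, :]).sum(axis=-1)  # shape (Q, T)
    cost = -probs[:, None] + l1_weight * l1

    n_queries, n_targets = cost.shape
    best_cost, best = np.inf, None
    # Exhaustive search over injective maps target -> query.
    # The Hungarian algorithm finds the same optimum in polynomial time.
    for queries in itertools.permutations(range(n_queries), n_targets):
        total = sum(cost[q, t] for t, q in enumerate(queries))
        if total < best_cost:
            best_cost, best = total, queries
    return [(q, t) for t, q in enumerate(best)]
```

Only matched queries receive the L1 parameter loss; every unmatched query is supervised toward the no-object class by the focal term.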

There are two ways to do it.

The first, and the one most teams reach for, is direct regression. Add a small regression head — two linear layers — and train it to emit depth, dip, and azimuth straight off the query embedding, supervised by L1 against the matched ground-truth values. Clean, end-to-end, fully differentiable. The model optimises exactly the three numbers the geologist reads.
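A minimal NumPy sketch of such a head, for shape-checking only: two linear layers with a ReLU between them, mapping each query embedding to three raw outputs, plus the L1 loss against matched targets. The class name, `d_model`, the hidden width, and the initialisation are assumptions, and training (backpropagation) is left to whatever framework hosts the real model.

```python
import numpy as np

rng = np.random.default_rng(0)


class RegressionHead:
    """Two linear layers mapping a decoder query embedding to (depth, dip, azimuth)."""

    def __init__(self, d_model=256, hidden=256):
        # Small random init; a real model would learn these weights by backprop.
        self.w1 = rng.normal(0.0, 0.02, size=(d_model, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.02, size=(hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, queries):
        # queries: (num_queries, d_model) -> (num_queries, 3)
        h = np.maximum(queries @ self.w1 + self.b1, 0.0)  # linear + ReLU
        return h @ self.w2 + self.b2  # linear -> (depth, dip, azimuth)


def l1_loss(pred, target):
    """Mean absolute error over matched queries and all three parameters."""
    return float(np.mean(np.abs(pred - target)))
```

Note that the head sees only the query embedding; nothing in it treats depth differently from the two angles.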

The second is keypoint detection plus reconstruction. Instead of regressing the angles, you train the model to localise keypoints on the unrolled image (points the sinusoidal trace passes through) and then, in a post-processing step, fit a sine curve to those keypoints and read dip and azimuth from its amplitude and phase. A planar fracture cut by the wellbore traces exactly depth = Amplitude·sin(0.0175·x + Phase) + Offset on the unrolled image, where x is the position around the borehole in degrees and 0.0175 ≈ π/180 converts it to radians. Amplitude encodes dip (through the borehole diameter), phase encodes azimuth, and offset encodes depth. So if you can pick the keypoints accurately, the geometry falls out of a least-squares fit rather than a learned regressor.
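The reconstruction step is ordinary linear least squares once the sine is expanded into sin and cos terms. This sketch assumes a vertical well, x measured clockwise from north in degrees, depth increasing downward, and a known borehole diameter in the same units as depth. It returns dip relative to the borehole axis, using tan(dip) = peak-to-trough height / diameter, and takes azimuth as the direction of the deepest point of the trace. The function names are hypothetical, and `np.deg2rad` uses the exact π/180 rather than the rounded 0.0175.

```python
import numpy as np


def fit_sinusoid(x_deg, depth):
    """Least-squares fit of depth = A*sin(t + P) + C to picked keypoints.

    Linearised as depth = a*sin(t) + b*cos(t) + C with t = x in radians,
    where a = A*cos(P) and b = A*sin(P), so A = hypot(a, b), P = atan2(b, a).
    """
    t = np.deg2rad(np.asarray(x_deg, dtype=float))
    z = np.asarray(depth, dtype=float)
    design = np.column_stack([np.sin(t), np.cos(t), np.ones_like(t)])
    (a, b, c), *_ = np.linalg.lstsq(design, z, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a), c


def sinusoid_to_geometry(amplitude, phase, offset, borehole_diameter):
    """Convert fitted sinusoid parameters to (depth, dip_deg, azimuth_deg)."""
    # Dip relative to the borehole axis: tan(dip) = 2*amplitude / diameter.
    dip = np.degrees(np.arctan2(2.0 * amplitude, borehole_diameter))
    # Dip direction = azimuth of the deepest point, where sin(t + P) = 1.
    azimuth = np.degrees(np.pi / 2 - phase) % 360.0
    return offset, dip, azimuth
```

In a deviated well the same fit yields apparent dip and azimuth, which still need correcting for borehole orientation.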

Two heads, one model, and — as it turned out — two completely different failure modes.

The defect direct regression cannot shake: depth shift

The direct-regression model was good at angles and quietly bad at depth. Not bad in the sense of large random error — bad in the sense of a systematic vertical shift. Predicted sinusoids would land with the right shape and the right tilt but a few centimetres off in depth, consistently, across the well. For an interpreter trying to tie a fracture to a specific bed boundary, a consistent few-centimetre offset is not noise; it is a registration error that quietly corrupts everything stacked below it.

Two things make this hard to train away. The first is physical and non-negotiable: one pixel of a digital-log-format image is about 3 cm of measured depth, so there is always a ±3 cm depth quantisation baked into the raw data before the model sees a single patch. The second is that an L1 head regressing a continuous offset has no particular pressure to nail the vertical coordinate over the angular ones — the gradient is happy to trade a little depth accuracy for a little angular accuracy, and depth is the axis with the least visual signal to anchor it. The result is a model whose angles you trust and whose depths you don't.
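One way to see the second point: the gradient of an unweighted L1 loss has the same magnitude on every output coordinate, so a 3 cm depth error and a 1° dip error pull equally hard, whatever their geological cost. The sketch below assumes raw centimetre and degree units with no per-axis weighting, and `l1_grad` is a hypothetical helper that computes that gradient directly.

```python
import numpy as np


def l1_grad(pred, target):
    """Gradient of mean |pred - target| with respect to pred (subgradient 0 at ties)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.sign(pred - target) / pred.size


# One matched query: (depth [cm], dip [deg], azimuth [deg]).
pred = np.array([1003.0, 41.0, 118.0])  # 3 cm deep, 1 deg steep, 2 deg short
target = np.array([1000.0, 40.0, 120.0])
grad = l1_grad(pred, target)
# Every axis gets a gradient of the same magnitude (1/3 here), regardless of
# which error actually decides whether the pick clears a 2 cm depth gate.
print(grad)
```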

That matters enormously for how you score it. We defined a true positive geologically, not by IoU: a predicted sinusoid counts as a true positive only if its depth lands within 2 cm of a real fracture, and 2 cm is 6 pixels on these logs. Dip and azimuth error are then measured only on the picks that clear that depth gate. So a depth shift is not a cosmetic problem; it is the difference between a true positive and a false positive. A pick at the correct angle but 3 cm off is, under a 2 cm tolerance, simply wrong, and it never enters the dip/azimuth average at all. The early reconstruction proof-of-concept made the stakes plain: on one blind horizontal well, only 6 of 16 fracture sinusoids fell within the 2 cm / 6 px window, a 37.5% localisation hit rate that no amount of angular precision could rescue, because the gate is depth-first.
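The depth-first gate can be written down directly. This sketch assumes picks and ground truth as (depth in cm, dip in degrees, azimuth in degrees) rows and uses greedy nearest-depth matching in prediction order, a simplification of whatever matching a production evaluator would use; `score_picks` is a hypothetical name. Azimuth error wraps at 360°, and angular errors are averaged only over picks that clear the gate.

```python
import numpy as np


def score_picks(pred, truth, depth_tol_cm=2.0):
    """Depth-first scoring of predicted sinusoids against ground truth.

    A pick is a true positive only if it lies within depth_tol_cm of an
    unmatched ground-truth fracture; dip/azimuth errors are averaged over
    those true positives only.
    """
    pred = np.asarray(pred, dtype=float).reshape(-1, 3)
    truth = np.asarray(truth, dtype=float).reshape(-1, 3)
    used = np.zeros(len(truth), dtype=bool)
    dip_err, az_err = [], []
    for depth, dip, az in pred:
        gaps = np.abs(truth[:, 0] - depth)
        gaps[used] = np.inf  # each real fracture can be matched once
        if gaps.size == 0 or gaps.min() > depth_tol_cm:
            continue  # outside the depth gate: a false positive
        j = int(np.argmin(gaps))
        used[j] = True
        dip_err.append(abs(dip - truth[j, 1]))
        d = abs(az - truth[j, 2]) % 360.0
        az_err.append(min(d, 360.0 - d))  # azimuth error wraps at 360 deg
    tp = len(dip_err)
    return {
        "true_positives": tp,
        "hit_rate": tp / len(truth) if len(truth) else 0.0,
        "dip_mae": float(np.mean(dip_err)) if tp else float("nan"),
        "azimuth_mae": float(np.mean(az_err)) if tp else float("nan"),
    }
```

On the proof-of-concept well above, a `hit_rate` of 6/16 is what the 37.5% figure corresponds to.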

What the keypoint head buys, and what it costs

The keypoint head exists to fix exactly this. By making the model localise the sinusoid's keypoints rather than regress a depth offset, you put the supervision where the binding constraint is. The model learns to put points on the image at the right vertical positions, and depth then comes from where those points sit, not from a regressed scalar. In our comparison the keypoint pipeline gave materially better depth localisation — it tightened the vertical registration that direct regression kept smearing — and that, in turn, moved more picks across the 2 cm gate and into the "counts as a detection" column.

But it is not free, and the trade is clean: keypoints win on depth and lose, slightly, on dip and azimuth. Once you reconstruct angles by curve-fitting a handful of picked points, your angular accuracy is only as good as those points and the fit through them. A keypoint that is one or two pixels off in the horizontal direction barely moves depth but perceptibly perturbs the fitted amplitude and phase — so the very mechanism that sharpens depth slightly blurs the angles. Direct regression, optimising the angles end-to-end against an L1 target, holds onto them better. You are moving the error budget from the angular axes to the depth axis, or back, depending on which head you pick. There is no head that is best on all three at once.
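The mechanism is easy to reproduce with a synthetic trace. The sketch below takes the simplest limiting case: every keypoint's horizontal coordinate is read 2° too far while its depth is left untouched, as a systematic azimuthal misregistration would do, and the sinusoid is refitted. The offset, and therefore depth, does not move, while the phase, and therefore azimuth, shifts by the full 2°. Per-keypoint jitter is messier and also perturbs the fitted amplitude, and with it dip. The trace parameters are made up for illustration and `fit` is a hypothetical helper.

```python
import numpy as np


def fit(x_deg, depth):
    """Least-squares depth = a*sin(t) + b*cos(t) + c; returns (amplitude, phase, offset)."""
    t = np.deg2rad(x_deg)
    design = np.column_stack([np.sin(t), np.cos(t), np.ones_like(t)])
    (a, b, c), *_ = np.linalg.lstsq(design, depth, rcond=None)
    return np.hypot(a, b), np.arctan2(b, a), c


# Synthetic trace: 10 cm amplitude, 30 deg phase, 250 cm offset, 12 picked keypoints.
x = np.arange(0.0, 360.0, 30.0)
z = 10.0 * np.sin(np.deg2rad(x) + np.deg2rad(30.0)) + 250.0
amp0, phase0, off0 = fit(x, z)

# Same keypoints with the horizontal coordinate read 2 degrees too far (depths untouched).
shift_deg = 2.0
amp1, phase1, off1 = fit(x + shift_deg, z)

print("offset change (cm):", off1 - off0)  # effectively zero: depth unaffected
print("amplitude change (cm):", amp1 - amp0)  # effectively zero for a uniform shift
print("phase change (deg):", np.degrees(phase1 - phase0))  # about -2: azimuth moves
```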

The accuracy ladder below is the cleanest way to see why the choice is axis-dependent. Step the depth tolerance and the detection rate is marginal at a tight 3 cm window, clearing a useful bar only once the window loosens; step dip or azimuth and the geometry is already strong at tight tolerance. Depth is the axis you have to loosen before the detector becomes useful. Dip and azimuth already clear the bar.

Figure: GeoBFDT accuracy ladder. The model emits the whole (class, depth, dip, azimuth) tuple in one forward pass, but the three axes are not equally hard. Depth (detection F1): fractures ~65% and beddings ~63% at 3 cm; ~75% and ~69% at 5 cm; horizontal wells ~55% at 4 cm. Dip: ~90% at 3°. Azimuth: ~92% (fractures) and ~84% (beddings) at 15°. The ~70% "useful regime" line is an illustrative reading aid; no exact F1 cutoff is defined.

The asymmetry in that ladder is the whole argument. Detection along the depth axis is the binding constraint: at 3 cm, fracture F1 is only around 65% (beddings ~63%), genuinely marginal, and it reaches the regime a structural geologist will actually use only once tolerance opens to 5 cm (~75% / ~69%). The geometric axes you would expect to be the hard part are not: dip is already ~90% accurate at a 3° threshold, and azimuth ~92% (fractures) at 15°. So the question "which head?" is really the question "is your geologist gated on depth or on angle?", and the model's own error profile says depth is the axis that binds first.
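The ladder itself is just a metric re-evaluated at each tolerance step. This sketch assumes depth detection is scored as F1 under greedy one-to-one matching within the window, and the angular axes as the share of depth-matched picks within the threshold. All names are hypothetical and the toy inputs are not the study's data.

```python
import numpy as np


def depth_f1(pred_depths, true_depths, tol_cm):
    """Detection F1 on depth alone: greedy one-to-one matching within tol_cm."""
    preds = np.asarray(pred_depths, dtype=float)
    truth = np.asarray(true_depths, dtype=float)
    used = np.zeros(truth.size, dtype=bool)
    tp = 0
    for d in preds:
        # Already-matched truths are excluded by giving them infinite distance.
        gaps = np.where(used, np.inf, np.abs(truth - d))
        if gaps.size and gaps.min() <= tol_cm:
            used[int(np.argmin(gaps))] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / preds.size, tp / truth.size
    return 2 * precision * recall / (precision + recall)


def angle_accuracy(abs_errors_deg, tol_deg):
    """Share of depth-matched picks whose angular error is within tol_deg."""
    return float(np.mean(np.asarray(abs_errors_deg, dtype=float) <= tol_deg))


def ladder(metric, tolerances, *args):
    """Re-evaluate a metric at each tolerance step: {tolerance: score}."""
    return {tol: metric(*args, tol) for tol in tolerances}


# Toy usage (illustrative inputs only):
depth_steps = ladder(depth_f1, [2.0, 3.0, 5.0], [101.5, 204.0, 300.2, 400.0], [100.0, 200.0, 300.0])
dip_steps = ladder(angle_accuracy, [1.0, 3.0], [0.5, 2.0, 4.0])
```

Reading the two dictionaries side by side is the same exercise as the figure: find the tolerance at which each axis first clears the bar you care about.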

A small engineering detail that decides the comparison: the inference threshold

One non-obvious thing falls out of running both heads side by side, and it is the kind of detail that separates a fair comparison from a misleading one. The two heads do not want the same operating point. The direct-regression model we shipped runs its query-keep decision at a 0.5 object-probability threshold — recall-leaning, because focal loss is already handling the heavy no-object imbalance and we want it to surface candidate sinusoids. The keypoint model runs at 0.9 — precision-leaning, because the downstream curve fit is fragile and a spurious keypoint poisons the reconstructed angle for the whole sinusoid. Bolt a curve-fitting post-processor onto a recall-tuned detector and you will fit sine waves through noise; run a keypoint model at a permissive threshold and the angular cost we just described gets much worse. The head you choose changes the threshold you tune, and benchmarking them at a single shared cutoff would have quietly handicapped one of them. Picking the operating point is a deployment decision, not a hyperparameter footnote — it lives with the head, and it is the kind of choice you carry into MLOps and keep monitoring once the model is grading real wells.
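Keeping the operating point attached to the head can be as simple as a lookup keyed by head type. The 0.5 and 0.9 values come from the text; the dictionary, the head names, and the choice of an at-or-above comparison are assumptions.

```python
import numpy as np

# Operating points from the text; the layout and head names are hypothetical.
KEEP_THRESHOLD = {
    "direct_regression": 0.5,  # recall-leaning: surface candidate sinusoids
    "keypoint": 0.9,  # precision-leaning: protect the downstream curve fit
}


def keep_queries(object_probs, head):
    """Indices of decoder queries whose object probability clears the head's threshold."""
    probs = np.asarray(object_probs, dtype=float)
    return np.flatnonzero(probs >= KEEP_THRESHOLD[head])
```

Each head's benchmark then runs on its own kept set rather than on a single shared cutoff.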

It is worth saying plainly that neither head is failing because the model is undertrained. Both ride the same data-appetite curve every set-prediction model on this problem rides: going from 8 to 11 training wells bought only about a 0.007 MAE improvement across depth, dip, and azimuth, which suggests the residual depth shift is not a "needs more wells" problem. It is a head-design problem, structural to direct regression, and the keypoint head is an architectural answer rather than a data one.

How to actually pick

The decision rule is short, and it is not "keypoints are better." It is:

If your interpreter's downstream task is structural — tying fractures to bed boundaries, building a depth-registered fracture log, feeding a well-to-well correlation where a few centimetres of drift compounds across the section — then depth is your binding axis and you ship the keypoint head, accepting a small angular cost you can quantify and disclose.

If the downstream task is orientation-driven — a stereonet, a dip-azimuth distribution, a stress or fracture-set analysis where the absolute depth of any single pick matters less than getting the angles right across the population — then direct regression is the better trade, and the residual depth shift is tolerable because nothing downstream is gated tightly on it.

And if you genuinely need both at once, the honest answer from this engagement is that you run both heads off the shared transformer and reconcile them — let direct regression own the angles, let the keypoint reconstruction own the depth, and gate the merge on that same 2 cm tolerance so the registration stays in the physical units the petrophysicist trusts. The transformer is shared; only the last few linear layers differ. That is the cheapest two-headed insurance policy in the whole pipeline.
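A minimal sketch of the merge, assuming each head's picks are already in centimetres and degrees: keypoint depth and regression angles are combined only when the two heads agree within the same 2 cm tolerance. The nearest-depth pairing and the policy of dropping unpaired picks are assumptions for illustration, not the engagement's production logic, and `Pick` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pick:
    depth_cm: float
    dip_deg: float
    azimuth_deg: float


def reconcile(regression: List[Pick], keypoint: List[Pick], tol_cm: float = 2.0) -> List[Pick]:
    """Merge per-head picks: depth from the keypoint head, angles from direct regression.

    Each keypoint pick is paired with the nearest unused regression pick; the pair
    is merged only if their depths agree within tol_cm, otherwise it is dropped.
    """
    used = [False] * len(regression)
    merged = []
    for kp in keypoint:
        best: Optional[int] = None
        for i, rg in enumerate(regression):
            if used[i]:
                continue
            if best is None or abs(rg.depth_cm - kp.depth_cm) < abs(regression[best].depth_cm - kp.depth_cm):
                best = i
        if best is not None and abs(regression[best].depth_cm - kp.depth_cm) <= tol_cm:
            used[best] = True
            rg = regression[best]
            merged.append(Pick(kp.depth_cm, rg.dip_deg, rg.azimuth_deg))
    return merged
```

Dropping picks the heads disagree on is the conservative choice; a team that values recall might instead keep them flagged for manual review.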

The broader lesson, the one that outlives borehole logs, is that the output head is a modelling decision, not a plumbing detail. The same encoded features can be read out as a direct regression or as a localise-then-reconstruct pipeline, and those two readouts have different, almost orthogonal error profiles. Pick the head whose strong axis is the axis your evaluation — and your geologist — actually grades you on. Everything upstream of the head can be identical and you will still ship the wrong model if you get that one choice backwards.

Key takeaways

A DETR-style fracture detector emits a (class, depth, dip, azimuth) tuple per query; the output head that converts decoder features into those numbers is a real modelling choice with two options — direct regression vs keypoint detection plus sinusoid reconstruction — not a plumbing detail.
Direct regression nails dip and azimuth but leaves a systematic vertical depth shift, partly because 1 digital-log-format pixel ≈ 3 cm so a ±3 cm depth quantisation is baked into the raw data, and an L1 head has no special pressure to favour the depth axis over the angular ones.
Keypoint detection + curve-fit reconstruction puts supervision on the vertical positions and fixes depth, but pays a small angular cost: a keypoint a pixel or two off in the horizontal direction barely moves depth yet perturbs the fitted amplitude (dip) and phase (azimuth).
The trade-off is bounded by hard scoring physics: a true positive requires the pick within a 2 cm depth tolerance (= 6 px), and dip/azimuth error is measured only on picks that clear that gate — so depth localisation decides what even counts as a detection. An early reconstruction PoC hit only 6/16 (37.5%) within the 2 cm window.
Practical detail: the two heads want different inference thresholds — direct regression at 0.5 (recall-leaning), keypoints at 0.9 (precision-leaning, because a spurious keypoint poisons the curve fit). Benchmark each at its own operating point. Pick the head whose strong axis — depth for structural work, angle for orientation analysis — is the one your geologist grades you on; or run both and reconcile.