A fracture on a borehole image log is a sinusoid, and a sinusoid can be right or wrong in two ways. There is where it sits (the depth of its centre on the wellbore) and there is what shape it has (the amplitude and phase that a geologist reads back as dip and azimuth). We built a Detection Transformer to pick fractures and bedding planes on imagery from two different microresistivity imaging tools, working with a mid-sized Middle East carbonate operator. The model learned the shape beautifully and the position almost, but not quite. Every prediction carried a small, stubborn vertical shift: the regressed sinusoid was the right wave in the right place to within a pixel or two, a few centimetres of depth, but those few centimetres were enough to flip a true positive into a false one under the depth tolerance the client cared about. This post is about why direct depth regression leaves that residual shift, and how a keypoint-detection post-processing stage reconstructs the sinusoid's position to fix it, trading a sliver of angular accuracy for depth accuracy, down to a hard physical floor that the data itself imposes.
The model that regresses depth directly
The detector is a DETR-style set-prediction model [1], the same one we wrote about when we covered the bipartite matching loss. A ResNet-10 backbone feeds a transformer encoder–decoder; a fixed bank of decoder queries each emits one candidate sinusoid, and a focal classification term plus an L1 regression term train the queries against the Hungarian-matched ground truth. For each matched query the L1 head regresses three numbers directly: a depth, a dip, and an azimuth, each normalised to the unit interval. The production checkpoint, 0.0004_250_resnet10_l1_focal, is the strongest of that family.
Regressing depth directly is the obvious thing to do, and it is where the trouble starts. The model treats depth as one more continuous scalar to minimise L1 error on, with no special structure. But depth is not like dip and azimuth. Dip and azimuth are properties of the shape of the whole sinusoid — the entire trace votes on them. Depth is a property of a single privileged location, the wave's centre, and a direct regressor has to infer that centre from the same pooled feature map it uses for everything else. The result is a head that is excellent on average and biased in particular: across a blind interval the predicted centre line sits a near-constant offset above or below the truth. The wave is right. The wave is just shifted.
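One way to tell a systematic shift from ordinary scatter is to compare the mean signed depth error with its spread over matched picks. The sketch below is only an illustration: the function name, the sign convention (positive means the prediction is deeper than the truth) and the example depths are assumptions, not values from the project.

```python
from statistics import mean, stdev

def depth_bias_summary(pred_depths_m, true_depths_m):
    """Return (mean signed error, spread of the error) in metres.

    Inputs are depths of already-matched picks, in the same order.
    A mean far from zero with a small spread points to a near-constant
    shift rather than random noise.
    """
    errors = [p - t for p, t in zip(pred_depths_m, true_depths_m)]
    return mean(errors), stdev(errors)

# Hypothetical matched picks: every prediction sits about 3 cm too deep.
true_m = [2635.0, 2640.0, 2645.0, 2650.0]
pred_m = [2635.031, 2640.029, 2645.030, 2650.030]
bias_m, spread_m = depth_bias_summary(pred_m, true_m)
```

If the bias is several times larger than the spread, the error is a shift that more training noise-averaging will not remove, which matches the behaviour described above.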
Why a centimetre matters: the 1 px = 3 cm floor
To see why a small shift is a real problem and not a rounding nuisance, you have to look at the units the imagery actually arrives in. The image logs the operator delivered, stored as binary wireline log files, carry an intrinsic resolution of 1 pixel = 3 cm of measured depth. That is a hard physical floor, not a modelling choice: the tool sampled the borehole wall at that pitch, so the finest depth distinction the data can ever express is one pixel, and any depth estimate inherits a baseline ±3 cm uncertainty before the model does anything at all.
Now layer the client's evaluation rule on top. A prediction counts as a true positive only if its picked depth lands within a 2 cm tolerance of a real fracture; a sinusoid that is geometrically perfect but vertically shifted past that band is scored as a false positive, and the real fracture it failed to land on becomes a false negative. So the evaluation tolerance (2 cm) is tighter than the data's own resolution (3 cm). In that regime a one- or two-pixel vertical bias is not cosmetic — it is the difference between a pick that counts and a pick that is thrown away, even though dip and azimuth were correct. The whole margin lives in the depth term, and depth is exactly the term direct regression was getting subtly wrong.
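The arithmetic can be checked with a small scoring sketch. It assumes greedy one-to-one nearest matching between predicted and true depths, which is a stand-in because the client's exact matching rule is not stated here; the function name and the example depths are hypothetical.

```python
PX_CM = 3.0    # 1 pixel = 3 cm of measured depth
TOL_CM = 2.0   # a pick counts only within 2 cm of a real fracture

def score_picks(pred_cm, true_cm, tol_cm=TOL_CM):
    """Return (TP, FP, FN) using greedy one-to-one matching within tol_cm."""
    unmatched = sorted(true_cm)
    tp = 0
    for p in sorted(pred_cm):
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda t: abs(t - p))
        if abs(nearest - p) <= tol_cm:
            unmatched.remove(nearest)  # each true fracture is matched once
            tp += 1
    fp = len(pred_cm) - tp
    fn = len(unmatched)
    return tp, fp, fn

# Hypothetical fracture depths in cm, relative to the top of an interval.
truth = [100.0, 400.0, 900.0]
exact = score_picks(truth, truth)
shifted = score_picks([t + PX_CM for t in truth], truth)  # one-pixel shift
```

With a one-pixel shift every pick lands 3 cm from its fracture, outside the 2 cm band, so each one is counted as a false positive and each fracture as a false negative, even though the shape parameters never changed.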
Reconstructing position from keypoints
The fix reframes what the model should predict. Instead of asking a regression head to name the depth as a scalar, we ask a second, keypoint-detection stage to locate the sinusoid's anatomy and then reconstruct its position from geometry. A sinusoid is fully described by

$$y(x) = A \sin(\omega x + \varphi) + \text{offset}$$
where the amplitude A encodes dip, the phase φ encodes azimuth, and — critically — the offset is the vertical position, the very thing the direct head was shifting. If you can detect a few keypoints on the trace — a peak, a trough, a zero-crossing — you can fit those parameters by curve-fitting rather than predicting them in one shot. The offset that best fits the detected keypoints is the corrected depth, and because it is recovered from the trace's own extrema rather than from a pooled global feature, it stops carrying the regression head's systematic bias.
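A minimal sketch of such a fit, under stated assumptions: the unrolled image spans exactly one revolution of the borehole, so ω is fixed at 2π divided by the image width, and the problem becomes linear in the remaining parameters. The function name `fit_sinusoid` and the sample keypoints are hypothetical, and this plain least-squares solve is not necessarily the fitting routine used in the project.

```python
import math

def fit_sinusoid(points, width_px):
    """Least-squares fit of y = A*sin(w*x + phi) + offset.

    points   : iterable of (x_px, y_px) keypoints on the trace
    width_px : image width; assumes one full period across it
    Returns (A, phi, offset) in pixel units and radians.
    """
    w = 2.0 * math.pi / width_px
    # Linearise: y = a*sin(wx) + b*cos(wx) + c,
    # with a = A*cos(phi) and b = A*sin(phi).
    rows = [(math.sin(w * x), math.cos(w * x), 1.0) for x, _ in points]
    ys = [y for _, y in points]
    # Normal equations (M^T M) p = M^T y as a 3x4 augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * y for r, y in zip(rows, ys))]
           for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("keypoints do not constrain the fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    a, b, c = (aug[i][3] / aug[i][i] for i in range(3))
    return math.hypot(a, b), math.atan2(b, a), c

# Hypothetical keypoints (x_px, y_px) on a 360 px wide unrolled image:
# two zero-crossings, a peak and a trough.
keypoints = [(0, 250.0), (90, 262.0), (180, 250.0), (270, 238.0)]
amplitude, phase, offset = fit_sinusoid(keypoints, width_px=360)
```

At least three keypoints at distinct positions are needed to pin down three parameters; more points make the offset less sensitive to any single misplaced keypoint.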
Mechanically this is a post-processing stage, not a replacement model. The keypoint detector runs on the patch, emits a heatmap of candidate sinusoid points, and we keep the high-confidence ones at a 0.9 probability threshold — a deliberately conservative cut, because here precision on the keypoints matters more than recall: a single spurious point can drag the curve fit off. The surviving keypoints are fed to the sinusoid fit, the fit returns amplitude, phase, and offset, and the offset replaces the directly-regressed depth. Note the threshold difference from the detector's own inference setting — the focal-trained detector runs at a recall-weighted 0.5, while the keypoint stage runs at a precision-weighted 0.9. They are tuned for opposite jobs: find every fracture, then pin each one down exactly.
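The thresholding and the depth swap might look like the sketch below. It assumes the heatmap is a row-major grid of probabilities, that a point is kept when its probability is at least the threshold, and that the fitted offset is measured in pixels from the top of the patch; the function names and the example values are hypothetical. A production stage would likely also suppress neighbouring pixels around each peak, which is omitted here.

```python
KEYPOINT_THRESHOLD = 0.9  # precision-weighted cut for the keypoint stage
M_PER_PX = 0.03           # 1 pixel = 3 cm of measured depth

def select_keypoints(heatmap, threshold=KEYPOINT_THRESHOLD):
    """Return (x, y) pixel coordinates whose probability clears the threshold."""
    return [(x, y)
            for y, row in enumerate(heatmap)
            for x, p in enumerate(row)
            if p >= threshold]

def offset_to_depth_m(offset_px, patch_top_depth_m, m_per_px=M_PER_PX):
    """Convert a fitted vertical offset in pixels to measured depth in metres."""
    return patch_top_depth_m + offset_px * m_per_px

# Hypothetical 2 x 3 heatmap: only two points clear the 0.9 cut.
heatmap = [[0.10, 0.95, 0.20],
           [0.91, 0.50, 0.89]]
kept = select_keypoints(heatmap)

# A fitted offset of 10 px in a patch whose top row sits at 2635.0 m.
corrected_depth_m = offset_to_depth_m(10.0, 2635.0)
```

The surviving points would then go to the sinusoid fit, and the converted offset would overwrite the directly regressed depth for that pick.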
We validated the swap on a held-out blind zone — a continuous interval with no training overlap, around 2635–2660 m — where the keypoint-reconstructed picks tracked the ground-truth centre lines through the band the direct head had been shifting across. The vertical shift, the artifact that had survived every round of fine-tuning on the direct model, was gone.
The trade-off nobody escapes
Keypoint reconstruction is not a free lunch, and the honest version of this story is the trade-off. Recovering depth from detected keypoints made depth localisation clearly better, because the offset is now pinned to the trace's own geometry. But reconstructing dip and azimuth from a handful of keypoints is slightly worse than regressing them directly from the full feature map, because the direct head got to use the entire sinusoid to vote on its shape, whereas the curve fit sees only the points that cleared the threshold. So the two approaches are complementary, not strictly ordered: the direct-regression model is marginally sharper on angle; the keypoint model is decisively sharper on position.
That is the real engineering decision, and it is a decision about which error the geologist cannot tolerate. In a fractured carbonate where picks feed depth-registered well-to-well correlation, a depth that is two pixels off corrupts the spatial register that everything downstream depends on, while a dip that is a fraction of a degree off is comfortably inside interpretive noise. For this operator, depth was the term to protect. And no choice of head changes the floor underneath both of them: with 1 px = 3 cm, ±3 cm of depth uncertainty is baked into the imagery. Keypoint reconstruction does not beat that floor — nothing can — it simply stops the model from adding an avoidable shift on top of the unavoidable one.
Takeaways for the practitioner
The general lesson outlasts the borehole. When a model regresses several quantities jointly and one of them is positional — a depth, a timestamp, a coordinate that a tight downstream tolerance keys on — a single shared regression head will often be excellent on the shape parameters and systematically biased on position, because position is a property of one location while shape is a property of the whole object. Splitting position off into a keypoint-detection-plus-geometric-fit stage lets you recover it from the object's own anatomy rather than from a pooled global feature, which removes the bias. You will usually pay for it in a small loss of accuracy on the parameters you didn't split off — and the right call is set by which error your consumer cannot absorb, measured against the physical resolution floor of the data, not against zero.
Key takeaways
- A DETR-style detector that regresses depth, dip, and azimuth jointly learned fracture shape well but carried a residual vertical shift in depth — because depth is a property of one location (the wave's centre) while dip and azimuth are properties of the whole sinusoid, and a single L1 head infers the centre from a pooled feature map.
- The shift matters because the imagery has a hard 1 pixel = 3 cm log-resolution floor (±3 cm intrinsic depth error), and the evaluation tolerance — a true positive only within 2 cm of a real fracture — is tighter than the data's own resolution, so a one-pixel bias flips a geometrically-correct pick into a false positive.
- A keypoint-detection post-processing stage — run at a deliberately precision-weighted 0.9 confidence threshold, versus the direct detector's recall-weighted 0.5 — detects the sinusoid's peak/trough/zero-crossing and curve-fits A·sin(ωx+φ)+offset; the fitted offset is the corrected depth, recovered from the trace's own geometry instead of a biased global regression.
- The trade-off is real and unavoidable: keypoint reconstruction is decisively better on depth and slightly worse on dip/azimuth (the curve fit sees only thresholded points, not the full feature map). Pick the head by which error your downstream consumer cannot tolerate — here, depth, because it anchors well-to-well correlation.
- No post-processing beats the physical floor: 1 px = 3 cm means ±3 cm is permanent. Keypoint reconstruction only removes the avoidable shift the model added on top of the unavoidable one.
References
[1] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. End-to-End Object Detection with Transformers (DETR). ECCV (2020). The set-prediction backbone the fracture detector is built on. https://arxiv.org/abs/2005.12872