KNN vs GANs vs Interpolation: Filling the Gaps in Borehole Image Logs

A borehole image log is a photograph of the rock taken from the inside of the hole, and like most photographs taken in a hurry it has missing strips. The tool that makes the picture (a high-resolution borehole imager, or its slimmer cousin, the compact microresistivity tool) presses a ring of electrode pads against the wall and reads conductivity. The pads do not wrap the full circumference. Between the pads, and wherever the borehole is wider than the pad array can reach, the log records nothing: vertical bands of null data running the length of the well. Before any detector, classifier, or dip-picking algorithm touches that image, somebody has to decide what goes in those bands. Working a 14-well program logged with two different microresistivity imaging tools in a Middle East carbonate field, we tested four answers to that question, and the one that won was the one nobody writes papers about.

Why the gaps are there, and why they are not noise

The gaps in an image log are not measurement noise to be denoised. They are a geometry problem. A high-resolution borehole imaging tool carries 192 electrodes arranged in two rows across its pads and flaps — 24 copper electrodes per pad — and covers roughly 80% of the borehole wall when the hole is in gauge. The remaining circumference is simply never measured. When the hole washes out, or the tool is run in a larger diameter than its pads were sized for, the unsampled fraction grows and the null bands widen.

Figure: High-resolution borehole image log.

This matters because the features we care about in a carbonate play — bedding planes, open and healed fractures, vugs — appear in the unrolled image as sinusoids. A planar feature cutting a cylindrical borehole projects to a sine wave when you flatten the cylinder into a 2D image; the amplitude encodes the dip and the phase encodes the azimuth. A single fracture is one continuous sinusoid sweeping across the full width of the image. When a pad gap punches a vertical hole through that image, it cuts the sinusoid into pieces. Whatever you fill the gap with is no longer cosmetic — it is now part of the curve a detector will try to trace, and part of the dip and azimuth a geologist will eventually read off.
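The geometry in the paragraph above can be written down in a few lines. This is a minimal sketch under stated assumptions, not the pipeline's code: depth is taken positive downward, the deepest point of the sinusoid is assumed to sit at the dip azimuth, and the 0.2159 m (8.5 in) hole diameter is a hypothetical value chosen for illustration. The function names are ours. The dip and azimuth fed in are the ground-truth values quoted later in the article for the fracture at 2697.17 m.

```python
import numpy as np

def plane_to_sinusoid(azimuth_deg, depth_center_m, dip_deg, dip_azimuth_deg, hole_diameter_m):
    """Depth (positive down) at which a dipping plane meets the borehole wall."""
    # Half the peak-to-trough height is the hole radius times tan(dip).
    amplitude = 0.5 * hole_diameter_m * np.tan(np.radians(dip_deg))
    phase = np.radians(np.asarray(azimuth_deg) - dip_azimuth_deg)
    return depth_center_m + amplitude * np.cos(phase)

def sinusoid_to_dip(peak_to_trough_m, hole_diameter_m):
    """Invert the amplitude relation: dip = arctan(peak-to-trough / diameter)."""
    return np.degrees(np.arctan2(peak_to_trough_m, hole_diameter_m))

# Assumed 8.5 in hole; dip and azimuth are the article's ground-truth pick.
HOLE_DIAMETER_M = 0.2159
azimuths = np.arange(0.0, 360.0, 1.0)
trace = plane_to_sinusoid(azimuths, 2697.17, 74.89, 307.47, HOLE_DIAMETER_M)

# Read the geometry back off the curve: amplitude gives dip, phase gives azimuth.
recovered_dip = sinusoid_to_dip(trace.max() - trace.min(), HOLE_DIAMETER_M)
recovered_azimuth = azimuths[np.argmax(trace)]
print(f"dip ~ {recovered_dip:.2f} deg, azimuth ~ {recovered_azimuth:.0f} deg")
```

Because dip depends on the peak-to-trough height and azimuth on where the extremum falls, a fill that shifts either inside a gap changes the numbers a geologist reads off.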

So the imputation question is unusually well-posed. We are not asking "what is the most plausible pixel value." We are asking "what fill keeps a sine wave a sine wave across a vertical cut." That single criterion — sinusoid continuity — is what separated the four methods.

The four candidates

We benchmarked four families of methods on the dynamic image channel, using the early wells of the dataset: a set of 14 vertical wells in a Middle East carbonate field, imaged with two different microresistivity tools.

1D linear interpolation — fill each gap row by linearly interpolating between the last valid pixel on the left and the first valid pixel on the right. Cheap, embarrassingly parallel, no training.
KNN imputation — scikit-learn's KNNImputer with n_neighbors = 5: each missing pixel is filled from the mean of its nearest neighbours in feature space, where the neighbours are other rows with similar valid pixels.
Iterative imputation — scikit-learn's IterativeImputer: model each feature with missing values as a regression on the others and cycle until convergence.
GAN inpainting — a generative adversarial fill (the GAIN formulation), trained to hallucinate plausible texture into the masked region the way image-inpainting networks fill scratched photographs.
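To make the first two candidates concrete, here is a hedged sketch on a synthetic image; none of it is the benchmark code. The image size, the single synthetic sinusoid, the band positions, and the slow drift of the null bands with depth (standing in for tool rotation) are all assumptions made for illustration. The linear fill uses `np.interp` with `period=360.0` so that a gap at the image edge wraps around the borehole, which is a convenience we add here. `KNNImputer` is called with rows as samples, matching the description above. Note that scikit-learn drops columns that are entirely missing by default, so the sketch keeps at least one valid pixel in every column.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
n_rows, n_cols = 200, 90                      # depth samples x azimuth bins (assumed)
azimuth = np.arange(n_cols) * 360.0 / n_cols  # bin centres in degrees
depth = np.arange(n_rows)

# Synthetic conductivity: one bright sinusoidal feature on a noisy background.
center = 100 + 30 * np.cos(np.radians(azimuth - 300.0))
image = np.exp(-0.5 * ((depth[:, None] - center[None, :]) / 3.0) ** 2)
image += 0.05 * rng.standard_normal(image.shape)

# Four null bands, each 4 bins wide, drifting one bin every 10 rows (assumed).
gapped = image.copy()
for band_start in (0, 22, 45, 67):
    for r in range(n_rows):
        cols = (band_start + r // 10 + np.arange(4)) % n_cols
        gapped[r, cols] = np.nan
missing = np.isnan(gapped)

def fill_linear_rows(img, azimuth_deg):
    """Row-by-row 1D linear interpolation across each gap, wrapping at 360 degrees."""
    filled = img.copy()
    for r in range(img.shape[0]):
        holes = np.isnan(img[r])
        if holes.any() and not holes.all():
            filled[r, holes] = np.interp(
                azimuth_deg[holes], azimuth_deg[~holes], img[r, ~holes], period=360.0
            )
    return filled

linear_filled = fill_linear_rows(gapped, azimuth)

# KNN imputation: each row is a sample, each azimuth bin a feature.
knn_filled = KNNImputer(n_neighbors=5).fit_transform(gapped)

for name, filled in (("1D linear", linear_filled), ("KNN", knn_filled)):
    mae = np.abs(filled[missing] - image[missing]).mean()
    print(f"{name}: mean absolute error inside gaps = {mae:.4f}")
```

The pixel error printed at the end is only a sanity check. As the rest of the article argues, the score that matters is whether the curve survives the gap, not how close individual pixels land.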

On paper the GAN is the sophisticated choice. Inpainting is a solved-looking problem in computer vision, and a generator that has seen enough rock should be able to paint convincing conductivity into a gap. That intuition is exactly where the experiment got interesting.

Figure: A pad gap cuts a vertical null band through the unrolled borehole image, and whatever fills it becomes part of the sinusoid a detector traces. KNN imputation (n_neighbors=5) interpolates along the curve and stays continuous (teal); the GAN inpaints locally realistic texture that breaks the curve and exits at the wrong phase (orange); 1D linear interpolation flattens the curve to a chord and leaves vertical-line artifacts; the iterative imputer stays continuous but is too slow for per-well runs. The method ranking and compute figures (1D ~0.115 s vs KNN ~2.625 s on a 4 m interval; 1D ~11 s whole-well; KNN never finished a whole-well pass) are the article's own; the image texture and the recovered curves are schematic.

What the GAN actually did

The GAN filled the gaps with texture that looked right and was wrong. Conductivity patches inside the null band came out locally plausible — speckle, contrast, the visual signature of carbonate — but the fill did not respect the one constraint that mattered. Where a sinusoid entered the gap on the left at one phase and should have exited on the right at the continuing phase, the GAN had no notion that those two stubs belonged to the same curve. It inpainted each gap as an independent texture-completion problem. The sine wave went in, generic rock came out, and the curve was broken on the far side.

This is not a tuning failure; it is a mismatch between the loss and the task. An adversarial inpainting loss rewards local realism — fool a discriminator that judges patches. Sinusoid continuity is a global geometric constraint that spans the entire image width and ties the two sides of every gap together. Nothing in the GAIN objective encoded "the thing crossing this gap is a single planar feature whose phase is fixed by its azimuth." So the GAN produced the most realistic-looking fills and the least usable curves. We parked it.

The realism trap

Local realism and downstream usefulness are not the same objective. A fill can be visually indistinguishable from real rock and still destroy the global structure — the sinusoid — that every downstream algorithm depends on. When you benchmark an imputation method, score it on the feature you are about to detect, not on how convincing the pixels look.

Why KNN won

KNN imputation preserved sinusoid continuity at the lowest compute cost of any method that worked. The reason is almost banal: by filling each missing pixel from the mean of its five nearest neighbours — rows that already carry the local trend of the curve — KNN interpolates along the structure rather than inventing new structure. It has no generative ambition. Where a sinusoid passes through, the neighbours on either side of the gap encode where the curve is going, and the imputed pixels land on that trajectory. The sine wave stays a sine wave.

It is also cheap enough to be operational, which the iterative imputer was not. The iterative imputer preserved continuity acceptably but was slow: it cycles regressions to convergence, and on a roughly 1.5 GB image log that is a non-starter for a per-well pipeline. 1D linear interpolation was by far the fastest. On a four-metre interval it averaged 0.115 s against KNN's 2.625 s, and it ran a whole well in roughly 11 s, where KNN never finished a whole-well pass at all. But speed bought us artifacts: linear fills stretch and flatten the curve through wide gaps and leave the vertical-line signature that downstream detectors mistake for real edges.

The quantitative tell showed up when we pushed an imputed image through the dip-and-azimuth pipeline and compared against the operator's ground-truth interpretation. At one fracture at 2697.17 m, the KNN-imputed image yielded a predicted dip/azimuth of 65.88° / 271.55° against a ground truth of 74.89° / 307.47°: about 9° low on dip, with a real azimuth gap of about 36°. The alternatives at the same depth were worse. A 1D-interpolated fill drove the estimate to −19.75° / 331.16° and the iterative fill to −37.17° / 260.81°. Negative dips are physically meaningless and a direct symptom of the fill bending the curve. KNN was the only fill that kept the geometry inside the realm of the believable.
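The comparison above is simple arithmetic, and a short sketch makes it repeatable. The dip/azimuth pairs are the ones quoted in the text. The helper name, the wrap-around handling of azimuth, and the 0° to 90° plausibility check are our own illustrative choices, not part of the pipeline described here.

```python
def azimuth_error_deg(predicted, truth):
    """Smallest angular difference between two azimuths, in degrees."""
    diff = abs(predicted - truth) % 360.0
    return min(diff, 360.0 - diff)

# (dip, azimuth) in degrees for the fracture at 2697.17 m, as quoted in the text.
ground_truth = (74.89, 307.47)
predictions = {
    "KNN": (65.88, 271.55),
    "1D linear": (-19.75, 331.16),
    "iterative": (-37.17, 260.81),
}

report = {}
for name, (dip, azimuth) in predictions.items():
    report[name] = {
        "dip_error": abs(dip - ground_truth[0]),
        "azimuth_error": azimuth_error_deg(azimuth, ground_truth[1]),
        # A dip outside 0..90 degrees cannot describe a real plane.
        "plausible": 0.0 <= dip <= 90.0,
    }

for name, row in report.items():
    print(f"{name:10s} dip error {row['dip_error']:6.2f}  "
          f"azimuth error {row['azimuth_error']:6.2f}  plausible={row['plausible']}")
```

On these numbers the 1D fill's azimuth happens to land closer to the truth than KNN's, but its negative dip rules the pick out before azimuth is even considered, which is why the plausibility flag is checked alongside the raw errors.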

The twist: the best fill was sometimes no fill

There is a coda that complicates the clean story. Once we moved from the unsupervised, classical pipeline to a supervised transformer detector for sinusoids, we ran a head-to-head between KNN-imputed input and a non-imputed input where the nulls were simply set to zero — left as an honest "no data here" marker. The non-imputed input won.

Why zeros beat fills

This is not a contradiction. It is a statement about who consumes the fill. The classical sinusoid-fitting pipeline needs a continuous curve to trace with a Hough transform and a least-squares fit; it cannot fit a sine wave that has holes in it, so it needs KNN. A supervised detector with enough labelled examples learns the gap structure itself and is better off seeing an unambiguous "missing" token than a confident fabrication it has to second-guess. For vug detection we landed on a third answer: filling nulls with the local median colour rather than interpolating, specifically so that the imputation does not generate false vugs at the gap edges. The right fill is a function of the downstream task, not a property of the image.
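For readers who have not seen the least-squares step, here is a minimal sketch of one common way to do it, assuming the curve has already been traced into (azimuth, depth) picks. It is not the pipeline's implementation. Writing the sinusoid as z = z0 + a·cos φ + b·sin φ makes the fit linear in (z0, a, b), so ordinary least squares is enough. The function name and the synthetic picks are hypothetical.

```python
import numpy as np

def fit_sinusoid(azimuth_deg, depth_m):
    """Least-squares fit of z = z0 + A*cos(phi - alpha) to traced picks.

    Returns (z0, amplitude A, phase alpha in degrees, 0..360).
    """
    phi = np.radians(np.asarray(azimuth_deg, dtype=float))
    # Linear model: z = z0 + a*cos(phi) + b*sin(phi), with a = A*cos(alpha), b = A*sin(alpha).
    design = np.column_stack([np.ones_like(phi), np.cos(phi), np.sin(phi)])
    (z0, a, b), *_ = np.linalg.lstsq(design, np.asarray(depth_m, dtype=float), rcond=None)
    amplitude = float(np.hypot(a, b))
    phase_deg = float(np.degrees(np.arctan2(b, a)) % 360.0)
    return float(z0), amplitude, phase_deg

# Synthetic picks along a known curve (values chosen only for illustration).
azimuth_picks = np.arange(0.0, 360.0, 4.0)
depth_picks = 10.0 + 0.4 * np.cos(np.radians(azimuth_picks - 307.47))

z0_fit, amplitude_fit, phase_fit = fit_sinusoid(azimuth_picks, depth_picks)
print(f"z0={z0_fit:.3f} m, amplitude={amplitude_fit:.3f} m, phase={phase_fit:.2f} deg")
```

The fit itself is cheap; the hard part in practice is the tracing that produces the picks, which is where the quality of the fill enters.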

Figure: GeoBFDT emits the whole (class, depth, dip, azimuth) tuple in one forward pass, but the three axes are not equally hard. Detection along the depth axis is the binding constraint: at a tight 3 cm window fracture F1 is only ~65% (beddings ~63%), and it clears the useful regime for structural work only once the tolerance loosens to 5 cm (~75% / ~69%); horizontal wells hold ~55% at 4 cm. The geometric axes the interpreter actually fits sinusoids for are already strong at tight tolerance: dip ~90% at 3°, azimuth ~92% (fractures) / ~84% (beddings) at 15°. Depth is the lever you loosen; dip and azimuth are already past the line. All accuracies and tolerances are the article's own; the dashed ~70% 'useful regime' line is an illustrative reading aid (the article names no exact F1 cutoff).

Key takeaways

Pad gaps on these microresistivity imaging tools are a geometry problem, not noise: the tool only covers ~80% of the borehole wall, so null bands cut through every sinusoid. The fill becomes part of the curve a detector traces.
Score imputation on the downstream feature, not on pixel realism. GAN inpainting produced the most realistic-looking fills and the least usable curves — its adversarial loss rewards local realism, but sinusoid continuity is a global geometric constraint it never encoded.
KNN imputation (n_neighbors=5) won for the classical pipeline: it interpolates along existing structure, preserves sinusoid continuity, and is cheap. The iterative imputer also preserved continuity but was too slow for per-well runs; 1D linear interpolation was fastest (~11 s whole-well vs KNN never finishing) but stretched curves and left vertical-line artifacts.
The quantitative tell: at a 2697.17 m fracture, KNN-imputed dip/azimuth (65.88°/271.55°) stayed physically plausible against ground truth (74.89°/307.47°), while 1D and iterative fills produced negative — meaningless — dips.
The best fill depends on the consumer. For the supervised transformer detector, leaving gaps as zeros beat KNN imputation; for vug detection, a local-median fill avoided false vugs. There is no universal imputation; there is only the right fill for the next stage.

What we would build next

The honest limitation of every method here is that none of them knows about sinusoids. KNN won by accident of locality, not by design. The principled next step — flagged in our Phase 1 reporting but not yet built at the time — is a masked-autoencoder approach in the spirit of He et al. (2021): mask the pad gaps deliberately during self-supervised pretraining and let the network learn to reconstruct the rock fabric, including the continuation of planar features, from the surrounding context. A model trained to reconstruct masked image-log patches would, unlike the GAN, have a reconstruction objective that directly penalises a broken sinusoid. That is the version of "fill the gap" worth building — but it earns its keep only against a baseline, and the baseline to beat is not the GAN. It is KNN.
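Since this approach was not built, the following is only a sketch of its two simplest ingredients, under our own assumptions: a mask generator that hides vertical bands shaped like pad gaps, and a reconstruction loss computed on the hidden pixels only, as in the masked-autoencoder recipe. The function names, the band count and width, and the stand-in "model" (a column-mean fill) are hypothetical; a real version would replace the stand-in with a trained encoder-decoder.

```python
import numpy as np

def make_pad_gap_mask(n_rows, n_cols, n_gaps=4, gap_width=4, rng=None):
    """Boolean mask (True = hidden) of evenly spaced vertical bands at a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    offset = int(rng.integers(0, n_cols))
    spacing = n_cols // n_gaps
    for g in range(n_gaps):
        # Wrap around the image edge, since the image is an unrolled cylinder.
        cols = (offset + g * spacing + np.arange(gap_width)) % n_cols
        mask[:, cols] = True
    return mask

def masked_mse(reconstruction, target, mask):
    """Mean squared error over the hidden pixels only."""
    return float(np.mean((reconstruction[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
target = rng.random((64, 90))                  # stand-in for a fully observed image patch
mask = make_pad_gap_mask(64, 90, rng=rng)

# Stand-in "model": fill hidden pixels with the mean of the visible ones.
reconstruction = target.copy()
reconstruction[mask] = target[~mask].mean()

loss = masked_mse(reconstruction, target, mask)
print(f"hidden fraction = {mask.mean():.2f}, masked MSE = {loss:.4f}")
```

Training on artificially hidden bands from measured parts of the image is what would give such a model a ground truth to be penalised against, something none of the four benchmarked fills had.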

References

[1] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick. Masked Autoencoders Are Scalable Vision Learners (2021). arXiv:2111.06377. The self-supervised masked-reconstruction recipe referenced as the principled successor to KNN/GAN gap-filling for image logs. https://arxiv.org/abs/2111.06377
