When a carbonate borehole-image detector is short on labelled wells, the first lever any engineer reaches for is augmentation. Cut the well into overlapping patches, apply a stack of photometric transforms to the ones that carry a sinusoid, and the training count climbs from a few hundred into the low thousands. On the fracture and bedding detector we built for a mid-sized Middle East carbonate operator over a roughly twenty-month engagement, we ran that lever to its end on one thin reservoir interval and measured what it actually bought. The answer changed how we spent the rest of the project. Early on, augmentation earned a large, real gain. Past a point it stopped helping and started to hurt, and the reason was specific to what borehole-image labels are.
The seed was thin, and the copies were many
The interval we augmented was small on purpose, so the arithmetic is easy to follow. Before any augmentation it held 236 patches, of which only 19 carried a sinusoid at all, and those 19 patches contained 32 individual sinusoids between them. That is the whole geological signal in the interval: 32 real curves, picked once by a human.
We augmented the sinusoid-bearing patches ten ways each with the usual photometric set (ColorJitter, GaussianBlur, Sharpen, GaussNoise, Emboss, MedianBlur), at a patch size of 100 by 270 pixels with a stride of 20. The corpus grew more than tenfold: 236 patches became 4212, the 19 sinusoid patches became 2046, and the 32 sinusoids became 3565. On a dataset card that reads as a real training set. It is not. Every one of the 2046 sinusoid patches is a photometric re-transform of one of the same 19 originals, and every one of the 3565 sinusoids traces back to one of the same 32 picks. The number that grew was the count. The number that matters, distinct geology carrying independent label information, never moved off 32.
That distinction is the entire piece. Write the apparent count as the product of an augmentation factor and a fixed real seed, and the illusion is obvious on the page:

$$\text{apparent count} = \text{augmentation factor} \times \text{distinct real picks}$$
Only the left term climbs. The right term, the one the model actually learns generalisable structure from, is pinned by how many real wells and real picks you have.
The gain was real, right up until it was not
None of this means augmentation was useless. The augmentation ablation is unambiguous at its endpoints. Trained with no augmentation on this seed, the detector's class error sat at 100: the model simply did not learn the minority sinusoid class. With augmentation turned on, class error dropped to about 2.618, with the Hungarian matching loss falling from 0.174 to 0.0135 and the parameter loss from 0.575 to 0.062. That is a genuine, large gain, and it is why we augment at all: a handful of real curves, seen only once each, is not enough for a Detection Transformer to fit the class, and photometric variety on those curves teaches the network to ignore contrast and sharpness differences it should ignore.
The problem is what sits between and beyond those two endpoints. Photometric augmentation does not add a single new fracture geometry. It re-lights the 32 you already have. Once the network has seen enough re-lightings to become invariant to lighting, every further copy is redundant with respect to geology and non-redundant with respect to noise: if one of those 19 seed patches was mispicked, or picked in a slightly different hand, augmentation faithfully reproduces that error 10 times and weights it 10 times as heavily in the loss. Synthetic volume is a noise multiplier once the useful invariance has been learned.
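A toy sketch of that multiplier, in Python, may make the bookkeeping concrete. The `Patch` record, its `mispicked` flag, and the choice of exactly one bad seed are all hypothetical and only there for illustration; the 19 seeds and the factor of 10 come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    seed_id: int     # which original patch this row derives from
    mispicked: bool  # hypothetical flag: True if the seed's label is wrong


def augment(seeds, factor):
    """Photometric copies inherit the seed's label, errors included."""
    return [Patch(s.seed_id, s.mispicked) for s in seeds for _ in range(factor)]


def summarize(rows):
    """Count rows, distinct seeds, and rows that carry a wrong label."""
    return {
        "rows": len(rows),
        "distinct_seeds": len({r.seed_id for r in rows}),
        "bad_rows": sum(r.mispicked for r in rows),
    }


# 19 seed patches, one of them assumed mispicked for the sake of the example.
seeds = [Patch(i, mispicked=(i == 0)) for i in range(19)]

if __name__ == "__main__":
    print(summarize(seeds))
    print(summarize(augment(seeds, 10)))
```

The row count and the number of loss terms driven by the bad pick both scale with the factor, while the number of distinct seeds does not move. No clean new geometry arrives to dilute the error.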
The instrument plots that shape. The class-error curve is anchored at both sourced ends, 100 with no augmentation and 2.618 with it, and the illustrative envelope between them dips toward a floor and then climbs again as the factor is pushed past its useful range. We pin that floor and that climb to numbers from a completely different experiment, which is the point of the next section.
Two experiments, one verdict, from opposite directions
We had already run a well-count ablation on the same detector: hold the method fixed and add real wells one group at a time. That curve fell steeply and then did something we noted at the time: class error bottomed at about 0.817 with eleven wells and then rose to 2.536 at fourteen wells, a small non-monotone up-tick at the tail rather than a clean asymptote. The mechanism behind that up-tick, a well whose ground truth was picked in a different style, is a separate case study and we will not re-derive it here. (See The 15th Well Made Our Model Worse: Label Consistency Beats Data Volume; see also How Many Wells Is Enough? A Well-Count Ablation for Fracture Detection.)
What matters here is that the augmentation ablation and the well-count ablation converge. One pushes synthetic volume up while holding real wells fixed; the other pushes real wells up while holding the method fixed. Both curves bottom and then climb. The augmentation curve climbs because copies amplify the label noise in a fixed seed; the well-count curve climbs because a fourteenth well arrives with a differently-styled set of picks. Different levers, same failure surface: added rows stop being independent information and start being correlated noise. When two experiments designed from opposite directions land on the same non-monotone shape, the conclusion is not an artefact of either. The binding constraint is label quality, not row count.
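The "bottom, then climb" check is easy to automate on any ablation curve. The sketch below is a minimal helper under our own naming (`bottom_and_tail` is not from any library). It is fed only the two well-count points quoted above, because the earlier, steeply falling part of that curve is not given numerically here.

```python
def bottom_and_tail(curve):
    """Locate the minimum of an ablation curve and measure the rise after it.

    curve: iterable of (x, class_error) pairs, e.g. (well_count, error).
    """
    points = sorted(curve)
    if not points:
        raise ValueError("curve must contain at least one point")
    x_min, err_min = min(points, key=lambda p: p[1])
    x_last, err_last = points[-1]
    return {
        "x_at_min": x_min,
        "min_error": err_min,
        "tail_rise": err_last - err_min,  # 0.0 if the curve ends at its minimum
        "climbs_after_bottom": x_last > x_min and err_last > err_min,
    }


# The two sourced well-count points: (wells, class error).
WELL_COUNT_CURVE = [(11, 0.817), (14, 2.536)]

if __name__ == "__main__":
    print(bottom_and_tail(WELL_COUNT_CURVE))
```

Run on a swept augmentation factor instead of a well count, the same helper would flag the point past which extra copies stop paying. That sweep is not among the measurements reported here.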
Augmentation also fights the count distribution
There is a second cost that is easy to miss. The sinusoid-count-per-patch distribution in this data was already long-tailed: more than 95% of patches cluster around a single modest sinusoid count, and a thin minority carry far more. A Detection Transformer with a fixed query budget is sensitive to that skew, and the mechanism is worked out in a companion piece rather than repeated here. (See The Sinusoid-Count Skew: Why a Long-Tailed Patch-Label Distribution Wrecked Early Training.)
The relevant fact for augmentation is directional. Copying the sinusoid-bearing patches ten ways does not flatten that tail; it re-samples the existing shape and, because you augment the patches you have rather than the ones you wish you had, it can steepen it. You end up with 2046 sinusoid patches drawn from the same skew that 19 patches had, plus whatever imbalance the empty-patch majority contributed. Volume did not fix the distribution problem. It photocopied it at scale. This is the same lesson a separate model iteration taught us when a much larger augmented corpus of roughly 92k overlapping patches actively hurt performance while real wells helped; the mechanics of that particular cliff are covered elsewhere. (See The 92k-Patch Trap: When More Synthetic Data Made the Model Worse.)
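The "photocopied at scale" point can be checked with a few lines of Python. The per-patch counts below are a made-up toy distribution, not the project's data. The sketch covers only the uniform case, where every patch is copied the same number of times; it does not model the steepening that uneven copying can cause.

```python
from collections import Counter


def count_shares(counts_per_patch):
    """Share of patches at each sinusoid count (a normalised histogram)."""
    total = len(counts_per_patch)
    if total == 0:
        raise ValueError("need at least one patch")
    hist = Counter(counts_per_patch)
    return {k: hist[k] / total for k in sorted(hist)}


def replicate(counts_per_patch, factor):
    """Photometric copies keep each patch's sinusoid count unchanged."""
    return [c for c in counts_per_patch for _ in range(factor)]


# Hypothetical toy: 19 sinusoid-bearing patches with a long tail.
toy_counts = [1] * 15 + [2] * 3 + [7]

if __name__ == "__main__":
    print(count_shares(toy_counts))
    print(count_shares(replicate(toy_counts, 10)))
```

Both printed histograms are identical: ten times the rows, the same skew.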
What we changed after this
The practical output was a rule we applied for the rest of the engagement. Augmentation is a fixed, early-saturating tool: apply enough photometric variety to buy lighting invariance, then stop, because past that point every copy raises the weight of your worst picks. The lever that keeps paying is not the augmentation factor. It is the count of distinct, style-consistent real wells, and the consistency of the picks inside them. We spent the back half of the project on label hygiene and per-class, style-consistent training rather than on denser overlap and more transforms, and that is where the accuracy came from.
For anyone standing at the same fork, the diagnostic is cheap. Compute the ratio of apparent training count to distinct real picks. If it is already large and the model is still short, more augmentation will not save you, because you are noise-limited, not volume-limited. Go get a clean well.
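As a minimal sketch of that diagnostic, assume each training row records which original pick it derives from. The `seed_pick_id` field name is hypothetical, and a real pipeline would need to carry that lineage through its augmentation step. No threshold for "large" is built in, because the text above does not give one.

```python
def apparent_to_distinct_ratio(apparent_count, distinct_picks):
    """Ratio of training rows to distinct real picks behind them."""
    if distinct_picks <= 0:
        raise ValueError("distinct_picks must be positive")
    return apparent_count / distinct_picks


def ratio_from_rows(rows, key="seed_pick_id"):
    """Same ratio, derived from rows that carry a lineage id (hypothetical field)."""
    distinct = {row[key] for row in rows}
    return apparent_to_distinct_ratio(len(rows), len(distinct))


if __name__ == "__main__":
    # Counts reported for the augmented interval: 3565 sinusoids from 32 picks.
    print(f"{apparent_to_distinct_ratio(3565, 32):.1f}")

    # Toy lineage table with the same shape: 3565 rows cycling over 32 ids.
    rows = [{"seed_pick_id": i % 32} for i in range(3565)]
    print(f"{ratio_from_rows(rows):.1f}")
```

For the interval described here the ratio comes out a little above 111, which means each real pick is standing in for more than a hundred training rows.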
Limitations
The two sourced endpoints of the augmentation ablation, class error 100 with no augmentation and 2.618 with it, and the corpus counts 236 to 4212, 19 to 2046, and 32 to 3565, are measured on one thin reservoir interval of a single carbonate operator's wells; the absolute numbers will not transfer to other tools, formations, or annotation teams. The curve shape drawn between and beyond the two endpoints in the instrument is illustrative, anchored at both ends and at the well-count floor and up-tick but not itself a swept measurement. The convergence argument is a claim about the direction of two experiments, not a proof that augmentation and well-count share a single mechanism; they share a failure mode, correlated added rows, through different causes. Finally, this is a statement about photometric augmentation of a fixed real seed. Augmentation that injects genuinely new geometry, rather than re-lighting existing curves, is a different question and is not what these numbers measure.