Augmentation Is a Design Decision, Not a Default

There is a default setting in most computer-vision pipelines, and the default is to reach for the augmentation everyone else reached for. Random crop, horizontal flip, a dash of colour-jitter, maybe a rotation. It is a sensible default precisely because it is generic, and the strongest line of work in the field made that genericity a virtue: instead of hand-picking transforms, search for the policy. AutoAugment framed augmentation as a discrete optimisation problem and learned a policy directly from data, treating the choice of transform, its probability, and its magnitude as things you should not be guessing at (Cubuk et al., 2019). RandAugment then showed you could throw away most of the search, parameterise the whole policy with two numbers, and match or beat the learned recipe at a fraction of the cost (Cubuk et al., 2020). The lesson the field took, correctly, is that you should not be hand-tuning augmentation magnitudes by feel.

We want to make a narrower argument that sits on top of that work rather than against it. The learned-policy results are about which magnitudes to pick from a fixed menu of transforms, and that menu was assembled for natural-image classification benchmarks. The question those papers answer beautifully is not the question a scanned paper well log asks. On a raster well log the menu itself is wrong before any search begins, because most of the transforms on it model corruptions our images cannot carry, and the ones that matter model corruptions a benchmark never sees. Augmentation, for this kind of imagery, is a design decision you make against the failure mode in front of you, and then, if you like, you search the magnitudes of the handful of transforms that survived. This piece is one worked example of making that decision, on raster well-log digitisation, with our network, VeerNet.

What a raster well log actually is

A well log on paper is not a photograph of the world. It is a deterministic rendering of structured source data, a few analogue curves drawn across a ruled depth grid, captured decades later through a flatbed scanner or a phone. That single fact rearranges the whole augmentation question. The input is one grayscale channel, not three colour channels. There is no lighting, no white balance, no scene. There is ink, paper, and a scanning channel that bends and degrades the page in a small number of very specific ways.

So the failure modes are not abstract. A page fed crooked into a scanner shears every depth row. A flatbed or phone capture is almost never square to the page, so the ruled grid the network anchors curves on arrives under a mild perspective warp. Decades-old ink is soft, not crisp, and low scan resolution softens it further. Paper carries fibre texture, foxing, and scanner grain that the clean source rendering never had. These are the corruptions a real raster carries, and they are exactly the corruptions a model trained on clean inputs has never seen. The job of augmentation here is to manufacture those corruptions on purpose so the network meets them in training rather than in production.

The model on the receiving end of that augmentation is VeerNet, our encoder-decoder segmentation network with a transformer refinement stage on the bottleneck, which reads curves off scanned rasters and reconstructs them as numeric traces. We trained it on procedurally generated logs rendered at the engagement's own ragged scales, widths sampled from 3,200 to 12,800 pixels and heights from 480 to 640, two constant curves per log in the multiclass setting against a background class. The renderer hands us pixel-perfect masks for free, which means the only variation the model sees beyond the curve geometry is the variation we choose to inject. That makes the augmentation menu unusually consequential: it is not seasoning on top of a large natural dataset, it is the entire bridge from clean render to degraded scan.
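To make the "pixel-perfect masks for free" point concrete, here is a minimal sketch of a renderer in that spirit. Only the size ranges, the single grayscale channel, and the two-curves-plus-background labelling come from the description above. The function name `render_log`, the random-walk curve shape, and the ink and paper intensities are illustrative assumptions, not VeerNet's actual generator.

```python
import numpy as np

def render_log(rng, width_range=(3200, 12800), height_range=(480, 640)):
    """Render one synthetic grayscale log and its label mask (hypothetical sketch).

    Mask classes: 0 = background, 1 = first curve, 2 = second curve.
    """
    width = int(rng.integers(width_range[0], width_range[1] + 1))
    height = int(rng.integers(height_range[0], height_range[1] + 1))

    image = np.full((height, width), 255, dtype=np.uint8)  # white paper
    mask = np.zeros((height, width), dtype=np.uint8)       # everything starts as background

    cols = np.arange(width)
    for cls in (1, 2):
        # Placeholder curve shape (an assumption): a random walk rescaled to the strip height.
        walk = np.cumsum(rng.normal(0.0, 1.0, size=width))
        span = walk.max() - walk.min() + 1e-9
        rows = ((walk - walk.min()) / span * (height - 1)).astype(int)
        image[rows, cols] = 0    # black ink, one pixel per column
        mask[rows, cols] = cls   # the label is written at the same coordinates

    return image, mask

image, mask = render_log(np.random.default_rng(0))
```

Because the mask is written at the same coordinates as the ink, it is exact by construction. Where the two curves cross, the second label overwrites the first in this sketch.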

Choosing transforms against the failure mode

Walk the standard menu against that channel and most of it falls away immediately.

Colour-jitter is the clearest case. It perturbs hue, saturation, and per-channel brightness, and it is genuinely useful on RGB photographs where illumination and camera response vary. On a single grayscale channel there is no colour to jitter. The transform either does nothing the segmentation task depends on or, worse, spends part of a fixed augmentation budget producing variation the model will never encounter on a real log. Horizontal flip earns nothing either, for a different reason: a depth axis runs one way, curves have a reading order, and a mirrored log is not a log the scanner will ever produce. These transforms are not harmful in some deep sense, but for this channel they are inert, and an inert transform in a tight budget is a cost with no return.
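The claim about colour is easy to check on a single pixel. The sketch below uses only Python's standard `colorsys` module; the helper `jitter_hue_saturation` and the pixel values are illustrative, and it covers only the hue and saturation parts of colour-jitter, not brightness.

```python
import colorsys

def jitter_hue_saturation(rgb, hue_shift, sat_scale):
    """Shift hue and scale saturation of one RGB pixel (components in [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0
    s = min(1.0, s * sat_scale)
    return colorsys.hsv_to_rgb(h, s, v)

grey_pixel = (0.5, 0.5, 0.5)    # a grayscale value replicated into three channels
colour_pixel = (0.8, 0.4, 0.2)  # a genuinely coloured pixel, for contrast

grey_out = jitter_hue_saturation(grey_pixel, hue_shift=0.3, sat_scale=1.8)
colour_out = jitter_hue_saturation(colour_pixel, hue_shift=0.3, sat_scale=1.8)

print(grey_out == grey_pixel)      # True: zero saturation leaves nothing to shift or scale
print(colour_out == colour_pixel)  # False: the coloured pixel does change
```

A grey pixel has zero saturation, so a hue shift has no colour to rotate and a multiplicative saturation change scales zero.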

Now walk the failure modes instead. A small homography that tilts the rendered grid models the off-square capture directly. A rotational skew models the crooked page feed, and in our experience it is the single largest realistic corruption, because it shears every row simultaneously rather than perturbing pixels locally. A Gaussian blur models faded ink and low-resolution scanning. Additive paper noise models the fibre texture and grain. Each of these earns its place in the policy for the same reason colour-jitter does not: it reproduces a corruption the test-time raster actually carries. The interesting structural point is that geometric augmentations like tilt and skew have a principled home in the architecture literature too. Spatial Transformer Networks made the case that a network can learn to undo exactly this family of affine and perspective warps, which is the flip side of the same coin: if the model can learn to be invariant to a geometric corruption, you have to show it that corruption first (Jaderberg et al., 2015). And the original U-Net result, which our encoder-decoder shape inherits, leaned on deformation augmentation rather than colour tricks for precisely this reason: on structured biomedical imagery the realistic variation is geometric and textural, not chromatic (Ronneberger et al., 2015).
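A minimal NumPy sketch of those four transforms follows. The function names, the magnitude ranges, and the nearest-neighbour sampling are all assumptions chosen for illustration, not the pipeline used for VeerNet. The one design point it is meant to show is that geometric warps have to be applied to the image and its mask together, while blur and noise touch the image only.

```python
import numpy as np

def warp(img, H, fill):
    """Warp a 2-D array by homography H (source -> output), nearest-neighbour sampling."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ out_pts          # source location of each output pixel
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=img.dtype)
    out[inside] = img[sy[inside], sx[inside]]
    return out.reshape(h, w)

def skew_matrix(angle_deg, shape):
    """Rotation about the image centre: models a crooked page feed."""
    h, w = shape
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))
    return np.array([[c, -s, cx - c * cx + s * cy],
                     [s, c, cy - s * cx - c * cy],
                     [0.0, 0.0, 1.0]])

def tilt_matrix(kx, ky):
    """Small perspective terms: models a capture that is not square to the page."""
    return np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [kx, ky, 1.0]])

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding: models soft ink and low resolution."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(img.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def paper_noise(img, sigma, rng):
    """Additive Gaussian noise: a crude stand-in for fibre texture and scanner grain."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def augment(image, mask, rng, max_skew_deg=2.0, max_tilt=2e-4, max_sigma=1.5, max_noise=12.0):
    """Apply all four transforms once. The magnitude ranges are placeholders, not tuned values."""
    skew = skew_matrix(rng.uniform(-max_skew_deg, max_skew_deg), image.shape)
    tilt = tilt_matrix(rng.uniform(-max_tilt, max_tilt), rng.uniform(-max_tilt, max_tilt))
    H = tilt @ skew
    image = warp(image, H, fill=255)   # geometry moves the image...
    mask = warp(mask, H, fill=0)       # ...and the mask, with the same homography
    image = gaussian_blur(image, rng.uniform(0.3, max_sigma))   # image only
    image = paper_noise(image, rng.uniform(0.0, max_noise), rng)  # image only
    return image.astype(np.uint8), mask

# Toy strip: blank paper with one flat ink trace and its label.
rng = np.random.default_rng(0)
strip = np.full((64, 256), 255, dtype=np.uint8)
strip_mask = np.zeros((64, 256), dtype=np.uint8)
strip[32, :] = 0
strip_mask[32, :] = 1
out_image, out_mask = augment(strip, strip_mask, rng)
```

Nearest-neighbour sampling keeps the warped mask a set of valid class labels. A production pipeline would likely interpolate the image more carefully and use a faster convolution than the per-row loop here.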

The instrument below makes that decision tactile. Each toggle previews a transform on a synthetic grayscale log strip and states plainly whether it targets a corruption a real scan carries. Perspective tilt, blur, and paper noise reshape the strip in ways the scanner would; contrast and colour-jitter visibly change nothing the task depends on. The verdict per transform is an editorial judgement about this acquisition channel, not a measured score, and the strip preview is illustrative geometry, both flagged on the exhibit.

A dial of four augmentation transforms held against a synthetic grayscale log strip. Toggle a transform to preview its effect on the strip and read the verdict: perspective tilt, blur, and paper noise each model a corruption a real scanned paper log carries, so they earn their place; contrast / colour-jitter targets a colour channel a 1-channel grayscale log does not have, so it spends budget for no robustness. The synthetic strip is rendered at the engagement's sourced configuration (1 grayscale channel, width 3200 to 12800 px, height 480 to 640 px, two constant curves). The alternative, a learned-policy search such as AutoAugment or RandAugment, offers a generic recipe tuned for natural-image benchmarks; the dial instead hand-picks each transform against the failure mode the raster actually presents. The transform previews and the earns-its-place verdict are illustrative editorial judgement about this acquisition channel, not a measured score.

Where this meets the learned-policy work, not against it

It would be a misreading to take any of this as a rebuttal of AutoAugment or RandAugment. Their contribution is to remove human guesswork from selecting magnitudes within a transform set, and that contribution holds on our problem too. Once you have decided that perspective tilt, skew, blur, and noise are the transforms that earn their place, you still have to pick how hard to tilt, how much to blur, how crooked to feed the synthetic page, and those are exactly the magnitudes you should not be eyeballing. RandAugment's reduction of the whole policy to a transform-count and a single magnitude is an attractive shape for that residual search, because it keeps the curated menu small and tunes only the dial that is left. TrivialAugment later pushed the same instinct to its limit, sampling a single transform per image at a uniformly random strength and matching the searched policies with essentially no tuning at all, which is a useful reminder that once the menu is right the magnitude search can be very cheap (Mueller and Hutter, 2021).
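The two sampling schemes are small enough to sketch over the curated menu. In the code below the menu keys, the maximum strengths, and the number of magnitude levels are placeholders chosen for illustration; only the sampling logic (a transform count plus one shared magnitude, versus one transform at a uniformly random strength) follows the description above.

```python
import random

# Curated menu; each value is a hypothetical maximum strength, not a tuned setting.
MENU = {
    "perspective_tilt": 2e-4,  # largest perspective coefficient
    "skew": 3.0,               # largest rotation, in degrees
    "blur": 2.0,               # largest Gaussian sigma, in pixels
    "paper_noise": 15.0,       # largest noise standard deviation, in grey levels
}

def randaugment_style(rng, n_ops, magnitude, levels=30):
    """Pick n_ops transforms uniformly at random; all share one global magnitude."""
    frac = magnitude / levels
    names = [rng.choice(list(MENU)) for _ in range(n_ops)]
    return [(name, frac * MENU[name]) for name in names]

def trivialaugment_style(rng, levels=30):
    """Pick a single transform and a uniformly random strength; nothing is tuned."""
    name = rng.choice(list(MENU))
    frac = rng.randint(0, levels) / levels
    return [(name, frac * MENU[name])]

rng = random.Random(0)
policy = randaugment_style(rng, n_ops=2, magnitude=9)
single = trivialaugment_style(rng)
```

With a four-entry menu, the RandAugment-style search reduces to a small grid over `n_ops` and `magnitude`, which is what keeps the residual search cheap.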

The piece of the augmentation question those methods deliberately do not address is menu construction. They assume the transform set is given and appropriate, because on their benchmarks it is. The whole argument here is that menu construction is the first and most important decision, and it is domain knowledge, not search. Search is the wrong tool for discarding colour-jitter on a grayscale log: a search over magnitudes will at best assign colour-jitter a magnitude of zero or near zero and call it solved, having burned search budget discovering what one look at the input channel would have told you for free. The search optimises within the menu; the engineer is responsible for the menu.

There is a robustness argument layered on top of menu construction that is also worth keeping straight, because it is a different axis. AugMix showed that mixing several augmentation chains under random convex weights and tying the model's predictions across them with a consistency loss widens the support of the training distribution rather than just deepening it, which is how augmentation buys out-of-distribution robustness rather than mere volume (Hendrycks et al., 2020). That mechanism is real and we use it, but it is orthogonal to the point of this piece. AugMix tells you how to combine transforms to widen support; it does not tell you which transforms belong in the chains in the first place. Feed AugMix a menu of colour-jitter and flips on a grayscale log and you will mix inert transforms into inert chains. The menu still has to be right first.
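As a rough sketch of that mechanism, the code below mixes a few augmentation chains under Dirichlet weights, blends the result with the clean image, and computes a Jensen-Shannon consistency term across views. The names `augmix`, `jensen_shannon`, `add_noise`, and `box_blur` are hypothetical, as are the default width, depth, and alpha. The demo chains use only textural operations so that the mixed image still matches its mask; how geometric transforms are combined with mixing is not described here, so the sketch leaves that out.

```python
import numpy as np

def augmix(image, ops, rng, width=3, max_depth=3, alpha=1.0):
    """Mix `width` augmentation chains with convex weights, then blend with the original."""
    weights = rng.dirichlet([alpha] * width)   # random convex weights over the chains
    m = rng.beta(alpha, alpha)                 # share kept by the clean image
    mixed = np.zeros_like(image, dtype=float)
    for w in weights:
        chain = image.astype(float)
        for _ in range(rng.integers(1, max_depth + 1)):
            op = ops[rng.integers(len(ops))]
            chain = op(chain, rng)
        mixed += w * chain
    return m * image + (1.0 - m) * mixed

def jensen_shannon(p_list, eps=1e-12):
    """Jensen-Shannon divergence across several probability vectors (one row per view)."""
    p = np.clip(np.asarray(p_list, dtype=float), eps, 1.0)
    mean = p.mean(axis=0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(mean)), axis=-1)))

def add_noise(x, rng):
    """Additive grain, clipped to the valid grey range."""
    return np.clip(x + rng.normal(0.0, 10.0, x.shape), 0, 255)

def box_blur(x, rng):
    """Three-tap horizontal mean; wraps at the edges, which is fine for a demo."""
    return (np.roll(x, 1, axis=1) + x + np.roll(x, -1, axis=1)) / 3.0

rng = np.random.default_rng(0)
strip = np.full((16, 64), 255, dtype=np.uint8)
strip[8, :] = 0
mixed = augmix(strip, [add_noise, box_blur], rng)

# Consistency term for one pixel's class probabilities under three views of the same input.
consistency = jensen_shannon([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]])
```

The list of operations is an argument to the mixer, not something it decides: whatever menu it is handed is the menu it mixes.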

The recipe, stated plainly

Our augmentation policy for raster well-log digitisation is short, and the shortness is the point. We keep four transforms, all geometric or textural, each chosen because it reproduces a corruption a scanned paper log carries: a perspective tilt for off-square capture, a rotational skew for crooked page feed, a Gaussian blur for faded ink and low scan resolution, and additive paper noise for fibre and grain. We drop colour-jitter and horizontal flip outright, not because they are dangerous but because a single grayscale channel with a reading order gives them nothing to act on. Within that four-transform menu the magnitudes are the part we treat the way the learned-policy literature taught us to, as a search rather than a feel, kept cheap by the smallness of the menu.

Transform	Models which real corruption	Verdict on a grayscale log
Perspective tilt	Flatbed or phone capture not square to page	Earns its place
Skew (rotation)	Crooked page feed; shears every depth row	Earns its place, largest realistic term
Blur	Faded analogue ink, low scan resolution	Earns its place
Paper noise	Fibre texture, foxing, scanner grain	Earns its place
Colour-jitter	A colour channel the input does not have	Dropped, inert on 1 channel
Horizontal flip	A mirrored log the scanner never produces	Dropped, breaks reading order

The loss the policy feeds into matters too, and we set it the same way, by the structure of the problem rather than by default. Curve pixels are a vanishing fraction of any log, so we evaluated an imbalance-aware family and settled on a precision-recall tunable objective, Tversky, for the thin-structure segmentation the augmented data has to support (Salehi et al., 2017). Augmentation and loss are the two design decisions that are most tempting to copy from a template and most rewarding to make from the data, and they interact: there is no point manufacturing realistic skew if the loss cannot see the one-pixel curve the skew is meant to make robust.
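For readers who have not met it, the Tversky objective is short enough to write out. This is a minimal NumPy sketch for a binary curve-versus-background map, assuming soft predictions in [0, 1]; the `alpha` and `beta` defaults are placeholders, not the values used for VeerNet.

```python
import numpy as np

def tversky_loss(prob, target, alpha=0.3, beta=0.7, eps=1e-7):
    """1 - Tversky index. alpha weights false positives, beta weights false negatives."""
    prob = np.asarray(prob, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(prob * target)
    fp = np.sum(prob * (1.0 - target))
    fn = np.sum((1.0 - prob) * target)
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

# Two curve pixels in a row of eight: curve pixels are the rare class.
target = np.array([1, 1, 0, 0, 0, 0, 0, 0])
missed = np.array([1, 0, 0, 0, 0, 0, 0, 0])  # one curve pixel missed (a false negative)
extra = np.array([1, 1, 1, 0, 0, 0, 0, 0])   # one spurious pixel (a false positive)

loss_missed = tversky_loss(missed, target)
loss_extra = tversky_loss(extra, target)
dice_missed = tversky_loss(missed, target, alpha=0.5, beta=0.5)  # alpha = beta = 0.5 is Dice
```

Setting `alpha = beta = 0.5` recovers the Dice loss. Raising `beta` above `alpha` makes a missed curve pixel cost more than it would under Dice, which is the precision-recall dial the paragraph above refers to.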

The decision procedure

Before reaching for a learned policy, write down the failure modes your test-time input actually carries. Keep the transforms that reproduce those failure modes and drop the ones that model corruptions your input cannot carry. Then, and only then, search the magnitudes of what is left. The search is for magnitudes; the menu is for judgement.
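The first two steps of that procedure can be written as a plain filter. The dictionary below simply encodes the transforms and corruptions named in this piece; the labels and the function name `build_menu` are illustrative.

```python
# Step 1: the failure modes the test-time input actually carries.
FAILURE_MODES = {
    "off-square capture",
    "crooked page feed",
    "faded ink or low scan resolution",
    "paper texture and grain",
}

# Candidate transforms, each mapped to the corruption it models.
CANDIDATES = {
    "perspective_tilt": "off-square capture",
    "skew": "crooked page feed",
    "blur": "faded ink or low scan resolution",
    "paper_noise": "paper texture and grain",
    "colour_jitter": "colour variation",
    "horizontal_flip": "mirrored page",
}

def build_menu(candidates, failure_modes):
    """Step 2: keep transforms whose modelled corruption is a real failure mode."""
    kept = {name for name, corruption in candidates.items() if corruption in failure_modes}
    return sorted(kept), sorted(set(candidates) - kept)

kept, dropped = build_menu(CANDIDATES, FAILURE_MODES)
print(kept)     # ['blur', 'paper_noise', 'perspective_tilt', 'skew']
print(dropped)  # ['colour_jitter', 'horizontal_flip']
```

Only the `kept` list goes on to the magnitude search. The judgement lives in the two tables, which a person has to write; the filter itself is trivial.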

Conclusion

The learned-policy work, AutoAugment and RandAugment, is the right answer to the question it asks: stop hand-tuning augmentation magnitudes, search them. We lean on that answer. But the prior question, which transforms belong on the menu at all, is not a search problem on imagery like a scanned paper well log. It is a design decision made against the corruptions the raster actually carries, and on a single grayscale channel that decision deletes half the standard menu before any tuning begins. Our recipe, four geometric and textural transforms chosen for the scanning channel and nothing chosen for colour, is one worked example. The generalisable habit underneath it is the only thing we would ask a reader to take away: look at your input channel, enumerate its real failure modes, and let that, not the template, decide what augmentation gets to exist.

References

[1] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, Q. V. Le. AutoAugment: Learning Augmentation Policies from Data. CVPR 2019. https://arxiv.org/abs/1805.09501

[2] E. D. Cubuk, B. Zoph, J. Shlens, Q. V. Le. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space. NeurIPS 2020. https://arxiv.org/abs/1909.13719

[3] M. Jaderberg, K. Simonyan, A. Zisserman, K. Kavukcuoglu. Spatial Transformer Networks. NeurIPS 2015. https://arxiv.org/abs/1506.02025

[4] O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015. https://arxiv.org/abs/1505.04597

[5] D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, B. Lakshminarayanan. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. ICLR 2020. https://arxiv.org/abs/1912.02781

[6] S. S. M. Salehi, D. Erdogmus, A. Gholipour. Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks. MLMI Workshop, MICCAI 2017. https://arxiv.org/abs/1706.05721

[7] S. G. Mueller, F. Hutter. TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation. ICCV 2021. https://arxiv.org/abs/2103.10158
