A Field Guide to Synthetic Image Generators in 2022

There is a question that gets skipped when a team decides to train on synthetic images: not which generator makes the most convincing picture, but which one hands back the answer key. For a classifier that only needs a label per image, the distinction barely matters. For a segmentation model, which has to assign a class to every pixel, it is the whole decision. The training data behind VeerNet, the encoder-decoder EarthScan uses to lift curves off scanned paper well logs, had to be labelled at the pixel level: which pixels are curve one, which are curve two, which are background. So the generator we picked was judged less on the images it drew than on whether those images arrived already labelled, and that criterion pointed somewhere other than the two families everyone was talking about in 2022.

The three families worth knowing

By 2022 the synthetic-image field had settled into three broad kinds of generator, and it helps to name them by what they are, not by their output. The adversarial family, the generative net trained against a discriminator, is the one that made synthetic faces famous [1]. The diffusion family, the denoising sampler, was the newer arrival and by then clearly the ascendant one for photographic fidelity [2]. Both are learned generators: you train them on a corpus and they sample new images from the distribution they absorbed. The third family is older and less glamorous and does not learn anything. A procedural generator is a drawing program. You write the rules that place every stroke, and it executes them. For our pipeline that program was a pycairo loop that drew log curves onto a canvas from parameterised equations.

Ranked on raw realism of a natural photograph, the ordering is roughly diffusion first, then adversarial, with procedural a distant third, because a hand-written drawing loop cannot invent the texture of skin or foliage the way a trained sampler can. If realism were the axis that decided a segmentation pipeline, the field guide would end there and we would have reached for diffusion. It is not the axis, and the reason is the answer key.

Control is the axis that actually pays

The property that matters for a per-pixel task is control, by which we mean something precise: the fraction of pixels the generator emits with a known class already attached. A learned generator scores near zero on that axis. It produces a plausible image and nothing else. Whether a given pixel is curve or background is a fact the model does not represent and cannot report, so the label has to be recovered afterward by a human or a second model annotating the output. That annotation pass is the hidden bill on generative synthetic data for segmentation: you have manufactured an unlimited supply of images and, along with it, an unlimited supply of unlabelled work.
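The definition of control can be written down in a few lines. This is a minimal sketch, assuming a hypothetical sentinel value `UNKNOWN` for pixels that arrive without a class; the function name and the toy arrays are illustrative and are not part of the original pipeline.

```python
import numpy as np

UNKNOWN = -1  # hypothetical sentinel: no class attached at generation time


def control(labels: np.ndarray) -> float:
    """Fraction of pixels that carry a known class."""
    return float(np.mean(labels != UNKNOWN))


# A learned generator returns an image only, so every pixel starts unlabelled.
sampled = np.full((4, 8), UNKNOWN)

# A renderer knows the class of every pixel (0 = background, 1 = curve).
rendered = np.zeros((4, 8), dtype=int)
rendered[2, :] = 1

print(control(sampled), control(rendered))
```

On these toy arrays the sampled image scores 0.0 and the rendered one 1.0, which is the gap the annotation pass has to pay to close.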

A procedural renderer sits at the other end. Because it drew curve one from a known equation, it knows exactly which pixels curve one occupies; it can emit the segmentation mask in the same pass that renders the visible line, at no extra cost, with no ambiguity. This is not our observation; it is the standard case for renderer-sourced training data. Richter and colleagues made it sharply for driving scenes rendered from game engines: a renderer that draws a scene from known geometry gets the ground-truth labels as a by-product, which is precisely what a photographic source cannot give you [3]. Tremblay and colleagues pushed the same point through domain randomisation, arguing that a procedural generator trains a usable model because it controls, and therefore labels, everything it puts on the canvas [4]. Our contribution was not the insight. It was applying it to well logs, where the structured, drawing-like nature of the imagery makes the procedural route not just cheaper but also faithful.
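The "mask in the same pass" idea can be sketched without any drawing library. The example below is an assumption-laden stand-in: it swaps pycairo for a plain NumPy rasteriser, and the function name `render_log`, the class ids, and the two sinusoidal curve equations are all hypothetical. It only illustrates that the call which paints the ink can write the class id as well.

```python
import numpy as np

BACKGROUND, CURVE_ONE, CURVE_TWO = 0, 1, 2  # hypothetical class ids


def render_log(width, height, curves, thickness=1):
    """Draw each curve into a greyscale image and, in the same pass, into a mask.

    `curves` maps a class id to a function t -> y, with t and y both in [0, 1].
    """
    image = np.full((height, width), 255, dtype=np.uint8)  # white paper
    mask = np.full((height, width), BACKGROUND, dtype=np.uint8)
    xs = np.arange(width)
    for class_id, equation in curves.items():
        ys = (equation(xs / width) * (height - 1)).round().astype(int)
        for dy in range(-thickness, thickness + 1):
            rows = np.clip(ys + dy, 0, height - 1)
            image[rows, xs] = 0        # visible ink
            mask[rows, xs] = class_id  # the answer key, written alongside
    return image, mask


curves = {
    CURVE_ONE: lambda t: 0.3 + 0.1 * np.sin(12 * t),
    CURVE_TWO: lambda t: 0.7 + 0.1 * np.cos(7 * t),
}
image, mask = render_log(width=320, height=64, curves=curves)
```

Two simplifications are worth naming: a steep curve would leave vertical gaps because only one point per column is drawn, and where curves cross, the one drawn later overwrites the earlier label. A real renderer would need a rule for both.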

Three synthetic image-generator families scored on the one axis a segmentation pipeline actually pays for: control, the fraction of pixels emitted with a known class at generation time, plotted against realism. The procedural pycairo renderer sits at full control because every draw call knows its own class, so the pixel-perfect mask falls out with the image; it produced 15,000 labelled multiclass logs (20,000 in the 2-curve dataset) from a 2,000-instance binary start, spanning 3,200-12,800px wide and 480-640px tall, with a tunable augmentation stack of sharpness, blur, jitter, perspective, and erasing. GAN and diffusion families score higher on texture realism but hand back no mask, so their real training cost includes a separate annotation pass, drawn as a dashed arrow that a label budget can only crawl along and never fully close. The label-budget lever moves the label-free families rightward at a fixed human cost per image, while the procedural marker stays pinned at full control with zero annotation cost. The orange marker is the only element that argues: labels for free. The counts, dimensions, and augmentation stack are sourced from the engagement archive; the axis coordinates for each family and the per-image annotation minutes are illustrative editorial scoring, not measured benchmarks.

What the pycairo pipeline actually produced

The concrete output is the part that turns an argument into a decision. We started small, with a 2,000-instance binary dataset to check that a segmentation model could separate a single curve from the background at all. Once that held, the drawing loop was generalised to the multiclass problem and scaled up. It produced 15,000 perfectly labelled multiclass instances, and the two-curve variant of the set reached 20,000 logs, each one carrying its own exact mask because the loop that drew it also recorded it. The images were not fixed-size crops. Real scanned logs come in wildly different shapes, so the generator spanned 3,200 to 12,800 pixels in width and 480 to 640 in height, which forced the downstream model to cope with variable dimensions rather than memorising one aspect ratio.
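One way to picture the variable-size requirement is a per-instance canvas specification drawn before rendering. This is a sketch only: the width and height ranges come from the text, but the uniform sampling, the `CanvasSpec` type, and the `sample_specs` helper are assumptions, since the article does not say how sizes were distributed.

```python
import random
from dataclasses import dataclass

WIDTH_RANGE = (3200, 12800)  # pixels, from the text
HEIGHT_RANGE = (480, 640)    # pixels, from the text


@dataclass(frozen=True)
class CanvasSpec:
    width: int
    height: int
    n_curves: int


def sample_specs(count, n_curves, seed=0):
    """Draw `count` canvas sizes uniformly (an assumption) from the stated ranges."""
    rng = random.Random(seed)  # seeded so a dataset can be regenerated exactly
    return [
        CanvasSpec(rng.randint(*WIDTH_RANGE), rng.randint(*HEIGHT_RANGE), n_curves)
        for _ in range(count)
    ]


specs = sample_specs(count=5, n_curves=2)
```

Seeding the generator is a design choice rather than something the source describes; it makes a rendered dataset reproducible from its parameters alone.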

Realism, the axis where a procedural generator is supposed to lose, we bought back deliberately rather than hoping a learned sampler would supply it. On top of the clean rendered logs we ran an augmentation stack: sharpness and blur to mimic scanner focus, colour and geometric jitter, perspective warps to imitate a page photographed off-square, and random erasing to stand in for stains and torn corners. Every one of those transforms was applied to the image and its mask together, so the label stayed exact through the entire pipeline. That is the move a generative source cannot copy. You can augment a GAN or diffusion sample all you like, but you are augmenting an image whose mask you still do not have.
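Keeping image and mask in lockstep can be sketched as a single function that draws each random parameter once and applies it to both arrays. Everything here is illustrative: `augment_pair` and its parameters are hypothetical, a circular horizontal shift stands in for geometric jitter, and two choices are assumptions rather than the archive's behaviour, namely that an erased region becomes background in the mask and that a photometric change leaves the mask values untouched.

```python
import numpy as np


def augment_pair(image, mask, rng, max_shift=8, erase_frac=0.2, background=0):
    """Apply one random draw of each transform to image and mask together."""
    image, mask = image.copy(), mask.copy()
    h, w = mask.shape

    # Geometric jitter: one shift value moves both arrays (wraps at the edge).
    shift = int(rng.integers(-max_shift, max_shift + 1))
    image = np.roll(image, shift, axis=1)
    mask = np.roll(mask, shift, axis=1)

    # Random erasing: the same rectangle is blanked in both arrays.
    eh, ew = max(1, int(h * erase_frac)), max(1, int(w * erase_frac))
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))
    image[top:top + eh, left:left + ew] = 255
    mask[top:top + eh, left:left + ew] = background

    # Photometric jitter: grey levels change, pixel positions do not.
    gain = rng.uniform(0.8, 1.2)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, mask


rng = np.random.default_rng(0)
mask = np.zeros((32, 128), dtype=np.uint8)
mask[10, :], mask[20, :] = 1, 2
image = np.where(mask > 0, 0, 255).astype(np.uint8)
aug_image, aug_mask = augment_pair(image, mask, rng)
```

For warps that resample pixels, such as perspective, the mask would need nearest-neighbour interpolation, because averaging class ids produces values that belong to no class.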

When the guide points the other way

None of this makes procedural generation the right answer in general, and the field guide is only useful if it is honest about where it inverts. The moment the target imagery is a natural photograph, or anything whose appearance you cannot write down as a drawing rule, the procedural route collapses, because you cannot render what you cannot specify, and the learned generators are the only option that produces a believable image at all. The reason the choice was easy for us is that a well log is closer to a technical drawing than to a photograph: it is lines on a grid, and lines on a grid are exactly what a drawing loop is good at. Read the guide as a decision that turns on one question. If the thing you are generating can be described procedurally and you need per-pixel labels, the renderer wins on control and hands you the labels for free. If it cannot, you are back to a generator plus an annotation budget, and the honest thing is to price that budget in before you start.
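Pricing the budget in is simple arithmetic, and a sketch makes the two branches of the decision explicit. The function `plan_labels` is hypothetical and assumes per-pixel labels are required; the 15,000 image count is the dataset size from the text, while the 5 minutes per mask is a placeholder with no measured basis.

```python
def plan_labels(procedurally_describable, n_images, minutes_per_image):
    """Return the generator choice and the human labelling hours it implies."""
    if procedurally_describable:
        # The renderer emits the mask with the image, so nothing is left to label.
        return {"generator": "procedural renderer", "annotation_hours": 0.0}
    return {
        "generator": "learned generator",
        "annotation_hours": n_images * minutes_per_image / 60.0,
    }


MINUTES_PER_MASK = 5.0  # placeholder value, not a measured labelling cost

renderer_plan = plan_labels(True, 15_000, MINUTES_PER_MASK)
learned_plan = plan_labels(False, 15_000, MINUTES_PER_MASK)
print(renderer_plan["annotation_hours"], learned_plan["annotation_hours"])
```

With the placeholder rate the learned route implies 1,250 hours of annotation for a set of that size; the real figure depends on tooling and on how exact the mask has to be.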

Limitations

This is a field guide grounded in one engagement. The dataset counts, the pixel dimensions, and the augmentation stack are the real archive figures, but the comparison should not be read as a benchmark. We did not train a GAN and a diffusion model on well logs and measure their downstream segmentation accuracy against the procedural set; the case against the learned families here is the structural one, that they do not emit labels, not a head-to-head score. The control-versus-realism positions the instrument plots for each family are editorial scoring meant to carry the argument, and the per-image annotation minutes are illustrative rather than a measured labelling cost, which will vary enormously with tooling and with how forgiving the mask needs to be. The claim that pixel-perfect procedural labels train a good digitiser is also only half the story: whether a model trained on rendered logs transfers to genuinely messy scans is a reality-gap question the augmentation stack addresses but does not close, and it is the question that decides whether any of this was worth doing. Finally, everything here is specific to structured, drawing-like imagery. It says nothing useful about generating faces, scenes, or textures, where the whole calculus reverses.

References

[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative Adversarial Nets. Advances in Neural Information Processing Systems 27 (NIPS 2014). The adversarial formulation that learns to synthesise realistic samples but outputs pixels with no per-pixel class attached. https://proceedings.neurips.cc/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html

[2] Ho, J., Jain, A., and Abbeel, P. Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems 33 (NeurIPS 2020). The denoising sampler that became the other headline generative family, and which likewise produces an image but not a mask. https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html

[3] Richter, S. R., Vineet, V., Roth, S., and Koltun, V. Playing for Data: Ground Truth from Computer Games. European Conference on Computer Vision (ECCV) 2016. The case that a renderer drawing a scene from known geometry hands back pixel-perfect labels as a by-product. https://link.springer.com/chapter/10.1007/978-3-319-46475-6_7

[4] Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T., Cameracci, E., Boochoon, S., and Birchfield, S. Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization. CVPR Workshops 2018. The argument that a procedural generator trains a usable model because it controls, and therefore labels, everything it draws. https://openaccess.thecvf.com/content_cvpr_2018_workshops/w14/html/Tremblay_Training_Deep_Networks_CVPR_2018_paper.html
