The Marginal Cost of the Next Digitised Log Is a Serving Number, Not a Training One

There are two questions a team asks about the compute behind a digitised well-log archive, and they are almost always answered as if they were one. The first is what it cost to build the model: the training run, the epochs, the GPU-hours that went into producing a set of weights that reads curves off scanned paper. The second is what it costs to digitise the next log now that the model exists. Those are different questions with different answers, and blurring them is how a one-off build cost gets quietly divided into a per-log figure that no serving system ever actually pays. This note is about the second question, the one that decides whether an archive of legacy raster logs is economic to process at all [1]. We derive it for VeerNet, the encoder-decoder EarthScan uses to lift curves off scanned paper logs, and we try to keep the accounting honest.

Set the frame first, because it does most of the work. VeerNet is trained once. A 50-epoch pass over the 2,000-log binary segmentation set takes 110 minutes, and a 50-epoch pass over the 15,000-log multiclass set takes 550 minutes. Those are real, measured budgets, and they are one-off costs. You pay them to obtain the weights, and then you keep the weights. The training pass is not repeated per log any more than a printing press is rebuilt per copy. The economics literature has a plain name for this shape: a fixed build cost amortised over volume, which is a separate line item from the marginal cost of producing one more unit [4]. The trap is amortising the build cost badly, dividing 550 minutes by however many logs happen to be in the pilot set and calling the result a per-log price. That number is real arithmetic and economically meaningless, because it moves the moment you process a second archive with the same weights.

What the next log actually consumes

Once the weights exist, digitising a log is a forward pass. The image goes in, the segmenter emits masks, the masks are turned back into depth-indexed curves. No weights are updated, no gradients are computed, no epoch is run. The same trained encoder-decoder serves the first log and the ten-thousandth, which is the ordinary property of a trained network at inference: you refit nothing per input [2]. So the marginal cost of the next log is not a training cost at all. It is a serving cost, and it comes down to two things: how much you are paying for the machine per unit time, and how many logs that machine clears in that time.

That gives a single division. Take the hourly rent of the GPU you are serving on, and divide it by the number of logs the served model clears in an hour. The result is the compute cost of one more digitised log. Everything else in the training story, the 50 epochs, the two budgets, the batch size of 1 that the variable log widths pinned us to, sits on the fixed side of the ledger and does not enter this number. It is worth being precise about that separation, because it is the whole argument: the training facts are one-off, and the per-log figure is set entirely by serving.

The rent side of the division is sourced. The engagement had two rentable GPU tiers: a high-end card at 750 EUR per month and an advanced card at 1,800 EUR per month. Those are the numerators. Convert a monthly tier price to an hourly rate and you have the top of the division. The bottom, the logs cleared per hour at serve time, is where the number is actually decided, and it is also the term I have to be most careful about, because it is not measured. The throughput figures below are illustrative serving assumptions, not logged benchmarks. I use them to show the shape of the cost, not to quote a rate.

Why one model serves every scan width

The reason throughput is a serving property rather than a per-log training cost turns on a specific engineering fact: one trained model serves every scan width. Raster logs are not a uniform size. They arrive at wildly different widths, which is exactly why training was pinned to a batch size of 1 in the first place, since you cannot stack images of different widths into a tensor without handling the mismatch. At serve time the same variability is handled by aligning each image to a 32-pixel grid before it enters the network, so a single set of weights accepts any width instead of needing a width-specific model. That is what makes throughput a lever you can pull without touching the training run: you are feeding more logs through the one model, not building more models.

This matters for the archive because the archive is large. The Texas Railroad Commission raster dataset behind this work runs to 136,771 TIF files. If each of those needed anything resembling a per-image training cost, the arithmetic would be hopeless. Because they do not, because they are all served by the one model, the cost of clearing the whole archive is the cost of one forward pass multiplied by the count, and the cost of one forward pass is that serving division again. The size of the archive stops being a training problem and becomes a throughput problem, which is a much better problem to have.

Where the unit cost actually lives

Here is the reader that makes the division tangible. Toggle the GPU tier to set the hourly rent, then drag the throughput to set how many logs the served model clears in an hour. The orange marker is the per-log cost, and it is the only element that argues: watch it slide down the ladder as throughput rises.

The marginal compute cost of digitising one more raster log, derived as a single division: the GPU rent per hour over the logs the served model clears that hour. Lever A toggles the rentable tier, 750 EUR per month for a high-end card or 1800 EUR per month for an advanced one, which sets the hourly rent. Lever B drags serving throughput in logs per hour, the term that actually moves the number. The ladder plots cost per log against throughput for the chosen tier with the other tier drawn as a faint ghost line, so two facts show at once: the advanced card sits a fixed multiple above the high-end one at any throughput, and both costs collapse toward a floor as logs per hour rises, because one trained model serves every scan width through 32-pixel alignment rather than being rebuilt per log. The orange marker is the only element that argues: the per-log cost sliding down the ladder as you drag throughput up. The tier prices, the 50 epochs, the 110 and 550 minute training budgets, the batch size of 1, and the 32-pixel alignment are sourced from the engagement archive; the hours-per-rented-month divisor and the logs-per-hour serving band are illustrative inputs, and this is an engineering unit cost only, not a price.

Two facts show up at once, and they are the two facts worth keeping. The first is that the tier choice sets a multiple, not the shape. The advanced card sits at a fixed ratio above the high-end one at every throughput, because it is a bigger number in the numerator of the same division. Moving from the high-end tier to the advanced tier scales the per-log cost by the ratio of the rents and does nothing to the curve it rides. The second fact is the one the instrument exists to make undeniable: both tiers collapse toward a floor as logs-per-hour rises. The per-log cost is rent divided by throughput, so it falls as one over throughput. Double the logs cleared per hour and you halve the cost per log, on either tier, without spending a cent more on hardware. That is why the honest lever is throughput, not the tier. The tier is a multiplier you pick once; the throughput is the term that actually moves the number, and it lives entirely on the serving side.

The falling curve is the point, but it is not a promise of zero. The cost bends toward a floor, it does not reach it. Real serving has overhead the division does not show, and the achievable rate is set by whatever part of the pipeline does not speed up when you push more logs through: the disk read of a large TIF, the alignment step, the reassembly of masks into curves [3]. Push throughput high enough and one of those, not the GPU, becomes the thing setting the pace, and the per-log curve stops bending because you are no longer bound by the term you were dividing by. The instrument draws the ideal one-over-throughput shape; the real curve tracks it until a serving bottleneck takes over.

The unit cost, in brief

Training is a one-off, not a per-log cost. The 110-minute binary budget and the 550-minute multiclass budget at 50 epochs are paid once to obtain the weights, then amortised over every log the one model ever clears. Dividing a build cost by a pilot count is real arithmetic and economically meaningless.
The marginal cost of the next log is a single division: hourly GPU rent over logs cleared that hour. No weights update at serve time, so the number is a serving cost, not a training one.
The tier sets a multiple, not the shape. The high-end tier at 750 EUR per month and the advanced tier at 1,800 EUR per month ride the same one-over-throughput curve; switching tiers scales the per-log cost by the ratio of the rents and changes nothing else.
Throughput is the honest lever. Because one model serves every scan width via 32-pixel alignment, doubling logs-per-hour halves the cost per log on either tier without spending more on hardware, which is what makes a 136,771-file archive a throughput problem rather than a training one.
The curve bends toward a floor, not to zero. A real serving bottleneck (disk read, alignment, mask reassembly) eventually sets the pace, so the per-log cost stops falling where the GPU stops being the binding term.

Limitations

The honest boundary of this note is the throughput term. The tier prices, the epoch count, the two training budgets, the batch size of 1, and the 32-pixel alignment are sourced from the engagement archive and are real. The logs-per-hour range and the mapping from a monthly tier price to an hourly rate are illustrative serving assumptions, not measured benchmarks, so the specific per-log figures the instrument prints should be read as the shape of the cost, not a quoted rate for any operator's hardware. The division also treats serving as GPU-bound, which is only true up to the point where a non-GPU step becomes the bottleneck; past that the real per-log cost sits above the ideal curve and stops falling with throughput [3]. And this is an engineering unit cost, the compute consumed by one forward pass, not a price. It excludes storage, ingestion, quality review of the digitised curves, and the human time to fix the logs the model gets wrong, all of which are real costs of running an archive and none of which the model serves away. Whether a digitised curve is usable is a separate question from what it cost to produce, and this note only answers the second.

The number to quote, and the number not to

The discipline this leaves us with is to answer the right question with the right figure. When someone asks what it costs to digitise one more log, the answer is the serving division: rent per hour over logs per hour, on the tier you are actually renting, at the throughput you are actually clearing. It is a small number and it gets smaller as throughput rises, and it belongs entirely to the serving side of the system. When someone asks what the model cost to build, the answer is the one-off training bill, the 110 and 550 minutes paid once. Quoting the first when you mean the second inflates the marginal cost with a fixed cost that the next log does not pay. Quoting the second when you mean the first hides the fact that the archive gets cheaper per log the faster you serve it. Keep the two apart and the economics of the archive read correctly: build once, then let one model clear every scan there is.

References

[1] Koroteev, D., and Tekic, Z. Artificial intelligence in oil and gas upstream: Trends, challenges, and scenarios for the future. Energy and AI (2021). The upstream context for why processing a legacy raster archive at scale is worth doing, and why per-unit cost is the deciding figure. https://www.sciencedirect.com/science/article/pii/S2666546820300033

[2] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI (2015). The encoder-decoder shape whose trained weights serve every input, which is why the next log is a forward pass and not a refit. https://arxiv.org/abs/1505.04597

[3] Amdahl, G. M. Validity of the single processor approach to achieving large scale computing capabilities. AFIPS Spring Joint Computer Conference (1967). The statement that the achievable rate is set by the part that does not speed up, the reason the per-log cost has a floor. https://dl.acm.org/doi/10.1145/1465482.1465560

[4] Crawford, G. C. Learning and the marginal cost curve. Managerial and Decision Economics (1995). The framing of a fixed build cost amortised over volume as distinct from the marginal cost of one more unit. https://onlinelibrary.wiley.com/journal/10991468

The Marginal Cost of the Next Digitised Log Is a Serving Number, Not a Training One

What the next log actually consumes

Why one model serves every scan width

Where the unit cost actually lives

Limitations

The number to quote, and the number not to

References

Continuous AI for explorers

About Earthscan

Products

Legal

Follow us on