Digitization of raster logs — VeerNet, a U-Net + transformer hybrid

“A century of careful subsurface measurement, trapped behind manual digitisation. VeerNet's pitch: turn that ‘multi-quarter project’ into a weekend run.”

Authors: M. Quamer Nasim¹², Narendra Patwardhan¹, Tannistha Maiti¹, Tarry Singh¹

Affiliations: ¹ DeepKapha.ai, Amsterdam, Netherlands ² Indian Institute of Technology, Kharagpur, India

1. Introduction

Well-logging is the process of taking measurements of various rock properties along the length of a well using drilling tools. The digital log responses are functions of lithology, porosity, fluid content, and textural variation of the formation. Well-logging parameters drive the derivation of lithofacies groups and facies-by-facies descriptions of rock properties — the foundation of every reservoir-characterisation pipeline.

Before the advent of digital logging instruments, well-log data was drawn on parameter graphs in curve format. These analog parameter graphs have several disadvantages:

Large physical size, requiring vast archive space
High storage cost when scanned and stored as high-resolution images
Interference from gridlines during automated interpretation
No depth-calibrated machine-readable form

To use this legacy data in modern workflows, well-logging parameter graphs must be converted to (X, Y) coordinates, where X represents parameter values and Y represents depth values. Raster logs (scanned copies of paper logs saved as image files) are an economical intermediate format (Cisco, 1996). Although raster archives are often discarded after vectorisation, they remain the bridge to a global, computer-readable format for the world's legacy hardcopy well-logging data.
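The (X, Y) conversion described above can be sketched as a pair of interpolation helpers. This is a minimal illustration, not the authors' code: the function names, the pixel bounds of the track, and the scale end values are all hypothetical inputs that a calibration step would have to supply.

```python
import math


def pixel_to_value(x_px, left_px, right_px, left_value, right_value, log_scale=False):
    """Map a horizontal pixel position inside a track to a parameter value (X)."""
    frac = (x_px - left_px) / (right_px - left_px)
    if log_scale:
        # On a logarithmic track, interpolate in log10 space.
        lo, hi = math.log10(left_value), math.log10(right_value)
        return 10 ** (lo + frac * (hi - lo))
    return left_value + frac * (right_value - left_value)


def pixel_to_depth(y_px, top_px, bottom_px, top_depth, bottom_depth):
    """Map a vertical pixel position to a depth value (Y), assuming a linear depth axis."""
    frac = (y_px - top_px) / (bottom_px - top_px)
    return top_depth + frac * (bottom_depth - top_depth)


# Hypothetical track: 100 px wide, Gamma Ray scaled 0 to 150 from left to right.
gr_value = pixel_to_value(50, 0, 100, 0.0, 150.0)
# Hypothetical resistivity track on a log scale from 0.2 to 2000.
res_value = pixel_to_value(50, 0, 100, 0.2, 2000.0, log_scale=True)
depth = pixel_to_depth(250, 0, 1000, 5000.0, 5100.0)
```

The log-scale branch matters for resistivity-style tracks, where equal pixel steps correspond to equal ratios rather than equal differences.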

This data is valuable well beyond resource exploration. It supports environmental protection, water management, global change studies, and primary applied research. There's no good substitute for a century of careful subsurface measurement, and the digitisation problem is what stands between that data and modern AI workflows.

VeerNet — full-paper benchmark

35,000: synthetic training images (basin-independent, generated from GP regression)
10,000: real raster logs from Texas RRC paired with LAS ground truth
F1 > 35%: real-dataset score with Lovász loss (the winning ablation)
r = 0.62: Pearson correlation on Gamma Ray vs LAS ground truth

The current manual process

Geologists and reservoir engineers revisit raster logs manually or through software applications that require enormous manual input. Beyond the loss of thousands of person-hours, the existing process is error-prone and tedious. Operators wanting to digitise raster logs efficiently must purchase expensive digitisation software and, even then, accumulate hidden costs through additional servicing and consultation charges.

The most widely-used commercial baseline, Neuralog, is based on SCTR (Scanning, Compressing, Tracing, Rectifying). Background grid interference frequently pauses the software during curve tracking. Several unsupervised computer-vision methods have been proposed to digitise log data embedded in binary images, falling into two camps:

Pixel-based methods — thinning, Global Curve Vectorisation (GCV) (Zheng et al., 2005; Hilaire and Tombre, 2001; Nagasamy and Langrana, 1990). Thinning reduces a line's width to a one-pixel skeleton; GCV is good for line processing but poor for point-line processing (Yuan and Yang, 2019).
Non-pixel-based methods — feature-detection approaches that extract higher-order shape descriptors before vectorising.

Each has structural limitations: thinning loses line-width information and produces wrong branches at intersections; GCV breaks down on noisy grids.

2. Workflow comparison — Neuralog vs the proposed solution

The existing Neuralog workflow consists of two ribbons:

Raster calibration — the user must:

Provide a rectangular region capturing the header and scale areas
Define a set of points capturing the depth of the log tracking
Determine the left and right axis values
Specify whether the scale is logarithmic or linear

Digitisation — the user must:

Identify the track's top and bottom
Define the image track with width
Add depth points and grids
Pick points along each curve, then let the auto-trace function complete the trace
Re-pick when the auto-trace pauses (frequent on busy grids)

This workflow accumulates user-in-the-loop steps that compound error and consume time.

Our proposed solution requires minimal manual intervention. The user does not need to provide left/right input points or specify the width of the track, and the VeerNet architecture has no metadata requirements related to depth grids. The algorithm differentiates between grids and curves end-to-end. The user only needs to provide a cropped raster or paper-log section.

3. Dataset — synthetic and real curves

We trained on two datasets covering complementary regimes.

3.1 Synthetic dataset

Reconstruction of well-log curves through ML techniques like random forests, SVMs (Akinnikawe et al., 2018), and deep neural-network-based prediction models (Kim et al., 2020) is used to generate synthetic logs. Existing methods generate target-field-specific synthetic curves with strong domain dependence. We aimed to develop a generalised model with no specific oil-field bias.

Our hypothesis: curves in a particular track exhibit random overlapping and wrapping factors. Synthetic curves are generated based on mean and standard deviation parameters, with random noise layered on top. The resulting dataset comprises two well-log curves per image, generated through a four-step pipeline:

a) Canvas preparation — set the digital canvas resolution, then add grids. Grids can be:

Linear (regular scale)
Logarithmic (log scale)

To closely match raster images, thickness and brightness of grid lines are randomly assigned — mimicking the erosion introduced in PDF or image prints. Masks are generated identically but without gridlines.

b) Curve generation — define the number of curves per track (1, 2, 3, ..., n). Real well-log curves have:

Local variation — within-segment fluctuations
Global variation — across-segment trends

Random mean, global standard deviation, and local standard deviation are sampled to mimic real curves. A parameter sets the number of global variations (default = 10), giving the curve realistic structural depth.

c) Noise and annotation addition — random text annotations and noise patterns are layered onto the canvas to simulate real-log artefacts.

d) Mask and log generation — the per-curve segmentation masks are produced alongside the rendered log image, giving us paired (image, mask) training examples.

The pipeline produces single-track images and includes up to 3 curves per image.
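The curve-generation and mask steps can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the authors' generator: the sampling ranges, the linear interpolation between global anchors, the clamping to the track width, and the function names are all hypothetical. Only the idea of a random mean, a global and a local standard deviation, and a default of 10 global variations comes from the text.

```python
import random


def synthetic_curve(n_samples=500, n_global=10, width=256, seed=None):
    """Return one x-position per depth sample for a synthetic curve."""
    rng = random.Random(seed)
    mean = rng.uniform(0.3, 0.7) * width          # random mean (assumed range)
    global_std = rng.uniform(0.05, 0.15) * width  # across-segment trend
    local_std = rng.uniform(0.005, 0.02) * width  # within-segment fluctuation
    # One anchor per segment boundary; segments are joined by linear interpolation.
    anchors = [rng.gauss(mean, global_std) for _ in range(n_global + 1)]
    curve = []
    for i in range(n_samples):
        pos = i / (n_samples - 1) * n_global
        k = min(int(pos), n_global - 1)
        t = pos - k
        trend = (1 - t) * anchors[k] + t * anchors[k + 1]
        value = trend + rng.gauss(0.0, local_std)
        # Clamp to the track (real logs may wrap instead; clamping is a simplification).
        curve.append(min(max(value, 0.0), width - 1))
    return curve


def curve_to_mask(curve, width=256):
    """Rasterise the curve into a binary mask with one foreground pixel per row."""
    mask = [[0] * width for _ in curve]
    for row, x in enumerate(curve):
        mask[row][int(round(x))] = 1
    return mask


curve = synthetic_curve(n_samples=200, seed=0)
mask = curve_to_mask(curve)
```

Gridlines, text annotations, and noise would be drawn onto the image only, so the mask stays a clean target.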

3.2 Real dataset

The real dataset was generated from LAS and raster image files obtained from the Texas Railroad Commission (RRC) archive — about 1000 LAS files. The well-log curves used:

Gamma Ray (GR)
Caliper (CALI)
Spontaneous Potential (SP)
Shallow Resistivity
Medium Resistivity
Deep Resistivity
Neutron Porosity
Density logs

Generation pipeline:

Generate mini-batches from the extracted well-log curves of LAS files
Apply Gaussian process regression fits to the mini-batches → ~100 curve distributions
Randomly select and sample from the distributions
Treat NaN values via either:
- (a) Fill with a constant
- (b) Drop rows with NaN

The resulting dataset was the regression-based synthetic-realistic test set used to benchmark generalisation.
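To make the NaN handling and the Gaussian process step concrete, here is a small standard-library sketch. It assumes an RBF kernel with hand-picked length scale and noise, which the text does not specify, and all function names are hypothetical. It computes only the GP posterior mean on a toy mini-batch rather than the full fit-and-sample pipeline.

```python
import math


def drop_nan(xs, ys):
    """Option (b): drop rows whose curve value is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not math.isnan(y)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def fill_nan(ys, constant):
    """Option (a): replace NaN with a constant."""
    return [constant if math.isnan(y) else y for y in ys]


def rbf(a, b, length=1.0):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, length=1.0, noise=1e-6):
    """Posterior mean of a GP with RBF kernel and a constant (sample-mean) prior mean."""
    m = sum(y_train) / len(y_train)
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve_cholesky(cholesky(K), [y - m for y in y_train])
    return [m + sum(rbf(xs, xt, length) * a for xt, a in zip(x_train, alpha))
            for xs in x_test]


depths = [0.0, 1.0, 2.0, 3.0, 4.0]
gr = [50.0, 60.0, float("nan"), 70.0, 65.0]
x_clean, y_clean = drop_nan(depths, gr)
gr_at_2 = gp_posterior_mean(x_clean, y_clean, [2.0])[0]  # value inferred at the gap
```

Dropping rows lets the GP interpolate across the gap, whereas filling with a constant keeps the depth grid regular but can inject an artificial plateau into the fitted distribution.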

4. Architecture — VeerNet

Raster logs have substantially greater resolution than images traditionally consumed by image-segmentation pipelines, which restricts the use of off-the-shelf transfer learning due to memory requirements. As the input has a low signal-to-resolution ratio, rapid downsampling is needed to alleviate unnecessary computation.

VeerNet is a modified U-Net-inspired architecture that strikes a balance between retaining key signal and reducing dimensionality. It uses an attention-augmented read–process–write architecture:

Encoder

Input is downsampled through 4 blocks, each consisting of:

2D convolution that reduces spatial resolution by 2×
Group normalisation
GELU nonlinearity

After 4 such blocks, each spatial dimension of the feature map is 1/16 of the original (four 2× reductions); a 1/32 reduction would need a fifth block.
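As a quick check of the downsampling arithmetic, the sketch below tracks feature-map size through stride-2 convolutions. The kernel size of 3 and padding of 1 are assumptions (the text does not give them), and the 2048×512 input is a hypothetical example. Each block halves both sides, so four blocks shrink each side by 16× and a fifth would bring it to 32×.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(height, width, n_blocks=4):
    """Spatial shape after each stride-2 encoder block, starting with the input."""
    shapes = [(height, width)]
    for _ in range(n_blocks):
        height, width = conv_out(height), conv_out(width)
        shapes.append((height, width))
    return shapes


four_blocks = encoder_shapes(2048, 512)              # ends at (128, 32)
five_blocks = encoder_shapes(2048, 512, n_blocks=5)  # ends at (64, 16)
```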

Transformer middle

The output is tokenised and fed into transformer blocks (Vaswani et al., 2017). The transformer processes and refines the encoder's internal representation, capturing long-range dependencies between distant regions of the log image — the key capability that pure CNN baselines lack.
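The tokenise-then-attend step can be illustrated with a toy, dependency-free sketch. It assumes a single attention head with identity query, key, and value projections, whereas a real transformer block learns those projections and adds feed-forward layers, normalisation, and positional information. The function names are hypothetical.

```python
import math


def tokenise(feature_map):
    """Flatten an H x W x C nested list into H*W tokens of length C."""
    return [pixel for row in feature_map for pixel in row]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Every output token is a weighted mix of all tokens, near or far.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[1.0, 1.0], [0.0, 0.0]]]  # H=2, W=2, C=2
tokens = tokenise(feature_map)
mixed = self_attention(tokens)
```

Because every token attends to every other token, a curve fragment at the top of the log can inform the prediction at the bottom, which a small convolutional receptive field cannot do directly.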

Decoder

Decoder blocks follow:

Bilinear upsampling to recover spatial resolution
2D convolution
Group normalisation
ReLU activation

Residual connections improve signal strength at each decoder level. The final decoder block produces spatial masks signifying the presence of corresponding log curves.

This approach inherits the best of both U-Net and modern transformer-based models: it keeps inference time and memory requirements low through inductive biases for efficient downsampling (convolution), and learns rich, global context-aware representations (transformer attention).

5. Training

Datasets:

Synthetic: 35,000 images
Real: 10,000 images
Single track per image, 2 or 3 well-log curves

Hyperparameters varied:

Loss functions: Dice, Tversky, Lovász, Focal, SCE (full ablation)
Number of transformer layers: 3, 4, 5, 6
Learning rate: 1.5×10⁻³ to 2.5×10⁻³
LR scheduler: Cosine decay
Max epochs: 250
Compute: 5 NVIDIA A100-SXM-80GB GPUs, ~14 hours per run
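The cosine-decay scheduler listed above follows a half-cosine from the base learning rate down to a floor. A minimal sketch, assuming per-epoch stepping and a floor of zero (neither is stated in the text; the function name is hypothetical):

```python
import math


def cosine_decay(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` on a half-cosine from base_lr down to min_lr."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# 250 epochs with a base LR inside the range quoted above.
schedule = [cosine_decay(epoch, 250, 2.0e-3) for epoch in range(251)]
```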

5.1 Loss-function ablation

5.1.1 Dice loss

Optimises the network based on the Dice overlap coefficient between predicted segmentation and ground truth (Zhao et al., 2020). Effective at alleviating the foreground/background imbalance characteristic of segmentation problems.

5.1.2 Tversky loss

Based on the Tversky similarity index (Salehi et al., 2017). Adjusting hyperparameters lets you place explicit emphasis on false negatives during training — useful for highly imbalanced data because it leads to high sensitivity.
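Dice and Tversky are closely related, which a short sketch makes visible. It works on flat lists of per-pixel values for one class and uses the convention from Salehi et al. (2017) in which alpha weights false positives and beta weights false negatives. The smoothing term eps and the function names are assumptions of this sketch.

```python
def confusion(pred, target):
    """Soft true positives, false positives and false negatives for one class."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return tp, fp, fn


def dice_loss(pred, target, eps=1e-7):
    tp, fp, fn = confusion(pred, target)
    return 1 - (2 * tp + eps) / (2 * tp + fp + fn + eps)


def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """alpha weights false positives, beta weights false negatives."""
    tp, fp, fn = confusion(pred, target)
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


pred = [1, 0, 0, 0]    # the model found one of three curve pixels
target = [1, 1, 1, 0]
balanced = tversky_loss(pred, target)                      # matches Dice
fn_heavy = tversky_loss(pred, target, alpha=0.3, beta=0.7)  # missed pixels cost more
```

With alpha = beta = 0.5 the Tversky loss reduces to Dice; raising beta above alpha penalises missed curve pixels more heavily, which is the sensitivity lever described above.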

5.1.3 Lovász loss

A loss function for multiclass semantic segmentation (Berman et al., 2018) that achieves direct optimisation of the mean intersection-over-union loss. It treats IoU as a direct training objective rather than as an evaluation-only metric.
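The core of the Lovász approach is to sort per-pixel errors and weight them by the change in Jaccard loss each one causes. The sketch below follows the structure of the reference algorithm in Berman et al. (2018) for a single class on flat lists. It is a plain-Python illustration with hypothetical function names, not the training implementation.

```python
def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss w.r.t. sorted errors."""
    total = sum(gt_sorted)
    grad, cum_fg, cum_bg, prev = [], 0, 0, 0.0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1 - g
        jaccard = 1.0 - (total - cum_fg) / (total + cum_bg)
        grad.append(jaccard - prev)
        prev = jaccard
    return grad


def lovasz_loss_single_class(probs, labels):
    """Lovász surrogate for one class: probs in [0, 1], labels in {0, 1}."""
    errors = [abs(l - p) for p, l in zip(probs, labels)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    grad = lovasz_grad([labels[i] for i in order])
    return sum(errors[i] * g for i, g in zip(order, grad))


perfect = lovasz_loss_single_class([1.0, 0.0], [1, 0])  # no error, zero loss
inverted = lovasz_loss_single_class([0.0, 1.0], [1, 0])  # IoU of 0, loss of 1
```

A multiclass version would average this quantity over classes, using the softmax probability of each class in turn.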

5.1.4 Focal loss

An extension of cross-entropy that down-weights easy examples and focuses training on hard negatives. Adds the modulating factor (1 − pₜ)ᵞ with a tunable focusing parameter γ ≥ 0.
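The effect of the modulating factor is easiest to see numerically. In this sketch p_t is the predicted probability of the true class; γ = 2 is used only as an illustrative value, since the text does not state which γ was used.

```python
import math


def focal_loss(p_t, gamma=2.0):
    """Per-sample focal loss: cross-entropy scaled by (1 - p_t) ** gamma."""
    return -((1 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(p_t):
    return -math.log(p_t)


easy_ratio = focal_loss(0.9) / cross_entropy(0.9)  # about 0.01: easy pixel down-weighted
hard_ratio = focal_loss(0.1) / cross_entropy(0.1)  # about 0.81: hard pixel mostly kept
```

With γ = 0 the factor is 1 and the loss is plain cross-entropy, so γ controls how strongly abundant, well-classified background pixels are discounted.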

5.1.5 Sparse Cross-Entropy (SCE)

Used when classes are mutually exclusive (each sample belongs to precisely one class) (Totakura et al., 2020). Operates on integer class indices, computing the logarithm only for the index the ground truth indicates — calculating the logarithm once per instance and omitting the summation, leading to better runtime performance.
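A small sketch shows why the sparse form is equivalent to the one-hot form but cheaper: it indexes a single log-probability instead of summing over all classes. The function names and the three-class example (background plus two curves) are assumptions for illustration.

```python
import math


def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sparse_cross_entropy(logits, class_index):
    """Pick the log-probability at the integer class index; no summation needed."""
    return -log_softmax(logits)[class_index]


def one_hot_cross_entropy(logits, one_hot):
    """Full form: sum over every class, most terms multiplied by zero."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))


logits = [2.0, 0.5, -1.0]  # hypothetical pixel scores: background, curve 1, curve 2
sparse = sparse_cross_entropy(logits, 1)
dense = one_hot_cross_entropy(logits, [0, 1, 0])
```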

6. Results

The problem is treated in two stages:

(a) Multi-class classification: identifying well-log curves from background and separating them into individual curves
(b) Regression analysis: finding goodness of fit on individual curves

Train/val split: 80% / 20% of the 10,000-image instances. The model takes the user's paper log as input and applies filters to extract feature maps. The middle layer attends to signals in the paper log, ignoring grid lines and annotations. The decoder expands the feature map back to the original shape, removing everything but the signal. From the resulting signal mask, a 1D signal is extracted and saved as CSV/LAS files.
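The last step, going from a predicted mask to a 1D signal and a CSV file, might look like the sketch below. Taking the mean column of foreground pixels per row is an assumption (the text does not say how the signal is extracted), as are the function names and the depth parameters. The output is still in pixel units and would need scale calibration before it becomes a log value.

```python
import csv
import io


def mask_to_signal(mask):
    """One x-position per row (depth sample): mean column of foreground pixels, or None."""
    signal = []
    for row in mask:
        cols = [c for c, v in enumerate(row) if v]
        signal.append(sum(cols) / len(cols) if cols else None)
    return signal


def signal_to_csv(signal, top_depth, depth_step):
    """Serialise the signal as CSV text, leaving gaps empty."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["depth", "x_pixel"])
    for i, x in enumerate(signal):
        writer.writerow([top_depth + i * depth_step, "" if x is None else x])
    return buf.getvalue()


mask = [[0, 1, 1, 0],
        [0, 0, 0, 0],   # the curve is missing on this row
        [0, 0, 0, 1]]
text = signal_to_csv(mask_to_signal(mask), top_depth=5000.0, depth_step=0.5)
```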

Headline metrics

VeerNet achieves:

Real dataset: F1 > 35%, IoU > 30%
Synthetic dataset: F1 > 30%, IoU > 26%

The maximum IoU score is above 26% on the synthetic dataset and above 30% on the real dataset. Performance metrics for the mask-evaluation stage:

IoU (Intersection over Union) — how well the predicted mask matches ground truth
F1 score — harmonic mean of precision and recall
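Both metrics can be computed from the same pixel counts. A minimal sketch for one class on flattened binary masks follows; the function name is hypothetical, and the treatment of two empty masks as a perfect match is an assumption. How the paper averages across classes and images is not stated here.

```python
def iou_and_f1(pred, target):
    """IoU and F1 for one class from flat binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return iou, f1


iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])  # one hit, one false alarm, one miss
```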

Loss-function comparison

Lovász loss performed best across both datasets. Specifically:

Synthetic dataset (varying lr 0.06 → 0.008): Lovász and Focal showed the best results, with IoU 26%/23% and F1 30%/28%. Best Focal F1 was reported with lr = 0.05; worst F1 at lr = 0.01 with lr-warmup-decay = 0.001.
Real dataset (default lr): Lovász and SCE displayed the best results.

Figure: F1 score per loss function, synthetic vs real datasets

Lovász wins both columns. SCE is the production-stability runner-up on real data — close enough to Lovász (35% vs 34%) that the slightly more stable training dynamics make it the safer default for a deployed system.

Results for the number-of-transformers hyperparameter were inconclusive: SCE and Dice gave better F1 with 3 transformers than with 4, 5, or 6, while Tversky benefited from 6. There is no clean monotonic relationship.

Statistical analysis on derived curves

For curves where ground truth was available, we ran statistical hypothesis tests on the digitised values vs the LAS originals:

Null hypothesis: the calculated and original samples are drawn from the same distribution.
Test: Pearson correlation coefficient + p-value.
Threshold: p < 0.05 to reject the null.

Gamma Ray (GR):

Pearson r = 0.62 (model with Lovász loss) with p < 0.05: under the stated null the distributions are statistically distinguishable, but the correlation shows that the derived values track the original closely, indicating strong goodness of fit to real data.

Caliper (CALI):

Pearson r negative for Lovász; low positive 0.12 for SCE; p < 0.05 for both → inconclusive correlation. The CALI result requires a larger training set to converge.
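For reference, the Pearson coefficient and its test statistic can be computed as in the sketch below. Note that the p-value conventionally reported alongside Pearson's r (for example by scipy.stats.pearsonr) tests the null hypothesis of zero linear correlation, using a t statistic with n − 2 degrees of freedom. The sample size of 100 and the toy series are hypothetical, since the number of samples per curve is not given.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear-correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def t_statistic(r, n):
    """t statistic for H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


digitised = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy digitised curve values
las_truth = [2.0, 1.0, 4.0, 3.0, 5.0]  # toy LAS ground-truth values
r = pearson_r(digitised, las_truth)
t_for_gr = t_statistic(0.62, 100)  # r = 0.62 with a hypothetical n of 100 samples
```

A large t statistic means the correlation is unlikely to be zero; it does not by itself say the two series share a distribution, so r and p are best read together with a plot of the digitised curve against the LAS original.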

7. Conclusion

Our proposed solution for raster well-log digitisation is simpler and requires substantially less manual intervention than the existing techniques. It is fast and scalable. We have introduced a deep-learning model — VeerNet — that:

Achieves an F1 score >35% on the real dataset
Identifies well-log curves cleanly from background grids
Generates a Pearson correlation coefficient of 0.62 against actual LAS data on Gamma Ray, indicating strong goodness of fit
Has no public baseline to compare against — this is a new study introducing the architecture

We also introduced new techniques for synthetic well-log data generation using Gaussian process regression, with the synthetic data being basin-independent — usable across any operator's archive without retraining.

Limitations and future work

Single-track input requirement — currently the user provides a cropped section from the front-end, which can introduce errors in scale-value provision.
OCR for scale reading — designing an OCR architecture that automatically reads the scales of well-log curves would close the loop on full end-to-end digitisation.
Multi-track input — supporting all 3 or 4 tracks of a raster log image without manual cropping.
Hyperparameter ablation — the next iteration of VeerNet will provide a complete analysis of all hyperparameters, including the inconclusive transformer-count relationship.

In production

VeerNet is the model that powers ES Raster Digitizer, the EarthScan product line that turns operator raster archives into queryable digital curves at industry-scale throughput. The 50× throughput improvement over interpreter-only workflows means a basin's worth of historical logs goes from "multi-quarter project" to "weekend run."

Note: the original architecture and results figures from this paper were stored on the Strapi/Cloudinary instance that has since been retired. We're rebuilding them from the source PDF in a follow-up content pass, alongside the next-iteration VeerNet results.

Key takeaways

35K synthetic + 10K real images is the benchmark VeerNet was trained against — the synthetic generation pipeline is basin-independent, deployable across any operator's archive.
Best-in-class scores: F1 > 35%, IoU > 30% on real data; Pearson r = 0.62 on Gamma Ray against LAS ground truth.
Lovász wins on synthetic, Lovász + SCE share the lead on real — making Lovász the default and SCE the safe runner-up.
Number-of-transformers shows no clean monotonic relationship: SCE and Dice do best with 3, Tversky with 6. The next ablation iteration is expected to resolve this.

Glossary

Cosine decay: Learning-rate schedule that decays the LR following a half-cosine curve from initial value to zero across the training horizon. Smoother than step decay, less aggressive than exponential; a common choice for transformer training.
Gaussian process regression: A non-parametric Bayesian regression technique that produces a distribution over functions consistent with observed data. Used here to generate synthetic well-log curves whose statistical properties match real basins without copying any specific well.
GCV: Global Curve Vectorisation, a pixel-based vectorisation method (Hilaire & Tombre, 2001), distinct from thinning, which reduces line widths to one-pixel skeletons. Good for line processing but poor for point-line processing (Yuan and Yang, 2019); breaks down on noisy grids.
GELU: Gaussian Error Linear Unit (Hendrycks & Gimpel, 2016) — a smooth nonlinearity that approximates ReLU but is differentiable everywhere. Standard activation in modern transformer-based architectures (BERT, GPT, ViT).
Group normalisation: Normalises activations across groups of channels (Wu & He, 2018) — independent of batch size. The default normalisation choice when batch sizes are small (a constraint when training high-resolution images on a single GPU).
Lovász loss: A surrogate loss that directly optimises Intersection-over-Union (IoU) via a piecewise-linear extension (Berman et al., 2018). Wins F1/IoU ablations against Dice and cross-entropy when class balance is moderate — including in this paper's ablation.
Pearson correlation: A linear-correlation coefficient between two series, with range [-1, 1]. Used here to compare digitised vs LAS ground-truth curves: r ≈ 0.62 on GR is strong; the negative (Lovász) and low positive (0.12, SCE) r on CALI are inconclusive and may indicate that the architecture struggles with sharp discontinuous traces.
SCE: Sparse Cross-Entropy — a variant of cross-entropy that operates on integer class indices rather than one-hot vectors. Computes the log only for the index the ground truth indicates; faster than full softmax-cross-entropy when classes are mutually exclusive.
SCTR: Scanning, Compressing, Tracing, Rectifying — the algorithm Neuralog (the dominant commercial baseline) uses for raster-log digitisation. Tracks individual curves through user-supplied seed points, with the trace pausing whenever it encounters background grid interference.
Texas RRC: Texas Railroad Commission — the state regulator overseeing Texas's oil and gas operations. Its public well-log archive is the standard real-data source for raster-log-digitisation research because of size, age range, and free-to-use licensing.