OSDU in Plain Language: What the Open Subsurface Standard Means for Data Teams

Ask what an open subsurface data standard buys you and you will usually get an answer about silos: OSDU exists so that a well log loaded by one team can be found, described, and read by another without a bespoke handoff every time. That is true, and it is the right reason to care. But it describes what happens after a curve is already a curve, already a column of depth-indexed numbers with a name a machine can resolve. The part that decides whether any of it applies to your archive sits one step earlier. This note is that step, written from the receiving end, for a data team that has been told to put its logs into an open standard and has just opened the folder to see what it actually has.

We are writing this because we did open that folder. On one archive the count was 136,771 TIF scans against 7,781 LAS files. For every log that arrives as machine-readable numbers, between seventeen and eighteen arrive as an image of a log: a raster scan of paper, curves printed as ink, depth printed as tick marks, nothing a catalogue can index because there is nothing structured to index. OSDU has no opinion about a TIF. It is not that the standard rejects it; it is that the standard has no slot for it until the scan has become a set of curves. So the first honest thing to say about an open standard is that it presupposes the very thing that is hardest to get.
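The ratio is easy to check from the two counts. A minimal sketch, with variable names that are ours for illustration:

```python
# Counts quoted in the text for one archive.
tif_scans = 136_771
las_files = 7_781

# Raster scans per machine-readable file, and raster share of the whole archive.
scans_per_las = tif_scans / las_files
raster_share = tif_scans / (tif_scans + las_files)

print(f"raster scans per LAS file: {scans_per_las:.1f}")        # 17.6
print(f"share of archive that is raster: {raster_share:.1%}")   # 94.6%
```

Run the same two lines against your own folder before planning anything else; the second number is the share of the archive the standard cannot see yet.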

What OSDU actually standardises

OSDU came out of the Open Group forum formed in 2018 to give the energy industry a common, open data platform rather than one silo per vendor [1], and the Mercury release was the first commercial release of the forum's merged open-source code base, the first version teams could actually load data into [2]. Strip away the platform talk and what it gives a data team is a shared grammar. It fixes how a well is identified, how a log is described, what a curve is called, and how you retrieve it later, so that the answer to "give me the gamma-ray curve for this well" is the same query no matter who loaded it or which cloud it sits on. It ingests the formats logs already come in, LAS and WITSML and the rest, and maps them onto one model. That is genuinely valuable, and none of what follows is a complaint about it.

The precise thing to notice is where that grammar begins. It begins at the point where a curve exists as depth-indexed values under a resolvable name. Everything OSDU does well, it does downstream of that point. Upstream of it, in the space where most real archives actually live, the standard is silent by design, because standardising retrieval was the problem it set out to solve and manufacturing the data was not. The gap is not a flaw. It is the standard's boundary, and knowing exactly where it falls is what separates a team that plans for the ingestion work from one that discovers it three months in.

The seam the standard leaves you

So the real question for a data team is not "does OSDU support well logs" but "what state does my archive have to be in before OSDU support means anything," and the answer has two parts, one obvious and one less so.

The obvious part is the raster. Those 136,771 TIF scans have to become curves. That is not a format conversion; it is a reconstruction. Something has to find each curve in the image, follow it down the depth axis, and emit depth-value pairs, and it has to do that per scan across an archive where the images are not even the same size. A standard cannot do this and does not pretend to. It waits at the door for numbers.
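To make "reconstruction" concrete, here is a deliberately toy sketch of the tracing step. It assumes a single curve on a clean binary image, with known depths at the top and bottom edges and known values at the left and right edges. The function name, its parameters, and the synthetic image are ours for illustration. This is not how VeerNet or any production digitiser works; real scans add grid lines, overlapping curves, scale changes, and noise that this ignores.

```python
import numpy as np

def trace_curve(ink, top_depth, bottom_depth, left_value, right_value):
    """Toy tracer: follow one inked curve down the depth axis.

    ink is a 2D boolean array (rows run down the depth axis, True where
    there is ink). Returns (depths, values) for the rows that contain ink.
    """
    n_rows, n_cols = ink.shape
    rows = np.flatnonzero(ink.any(axis=1))
    # Use the mean column of the ink pixels in each row as the curve position.
    cols = np.array([np.flatnonzero(ink[r]).mean() for r in rows])
    # Linear pixel-to-unit scaling on both axes (an assumption of this sketch).
    depths = top_depth + rows / (n_rows - 1) * (bottom_depth - top_depth)
    values = left_value + cols / (n_cols - 1) * (right_value - left_value)
    return depths, values

# Synthetic 5-row, 11-column "scan" with one diagonal trace.
ink = np.zeros((5, 11), dtype=bool)
for r, c in enumerate([0, 2, 5, 8, 10]):
    ink[r, c] = True

depths, values = trace_curve(ink, 1000.0, 1004.0, 0.0, 150.0)
print(list(zip(depths, values)))
```

Even this toy shows why the job is per scan: the pixel-to-depth and pixel-to-value scales belong to the image, so images of different sizes each need their own mapping.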

The less obvious part is that even the 7,781 files that are already digital are not automatically clean. Across those LAS files we found three different conventions for the same idea of a missing reading: -999, the string NA, and the string None. To a person these all mean "no value here." To a loader they are three unrelated tokens, and if you feed them straight into the standard you have not standardised your nulls, you have imported your inconsistency into a system that now looks tidy. This is the ordinary texture of real well-log tables, the same heterogeneity McDonald illustrates on a public set of 118 Norwegian Sea wells and 22 measurement columns, where the missingness has to be made explicit before any downstream use [3]. An open standard makes the tidy version portable; it does not make your version tidy.
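Folding the three conventions into one marker is a small amount of code once you have decided what the marker is. The sketch below assumes NaN as the single missing marker and treats any unrecognised token as an error rather than guessing. The names `NULL_TOKENS` and `to_value` are ours, and the token set covers only the three conventions we found, so it is not exhaustive for other archives.

```python
import math

# The three conventions found in this archive. Other archives will differ.
NULL_TOKENS = {"-999", "NA", "None"}

def to_value(token):
    """Map one raw cell to a float, with NaN as the single missing marker."""
    text = str(token).strip()
    if text in NULL_TOKENS:
        return math.nan
    number = float(text)  # raises ValueError on an unknown token, on purpose
    if number == -999.0:  # catches variants such as "-999.0"
        return math.nan
    return number

raw = ["85.2", "-999", "NA", "None", "-999.0", "91.7"]
clean = [to_value(t) for t in raw]
print(clean)
```

Failing loudly on an unknown token is the design choice that matters here: a fourth null convention should stop the load, not slip through as a string.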

Both parts land at the same seam. Before the catalogue can key a well by its canonical track groups, the curves have to exist, be depth-indexed on a common grid, and carry a single unambiguous marker for absence. That work is upstream of the standard, and it is where the effort actually is.
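Those three preconditions can be written down as a check that runs before anything is offered to the catalogue. This is a hypothetical sketch: the record layout, the track labels, and the function name are our assumptions for illustration, not part of OSDU or of any loader's API.

```python
import math

# Hypothetical labels for the three canonical track groups.
KNOWN_TRACKS = {"GR/SP/CALI", "resistivity", "NPHI/RHOB"}

def readiness_problems(curve, grid_points):
    """Return the reasons a curve record is not ready to catalogue."""
    problems = []
    depths, values = curve["depths"], curve["values"]
    if len(depths) != grid_points or len(values) != grid_points:
        problems.append("not on the common depth grid")
    if any(b <= a for a, b in zip(depths, depths[1:])):
        problems.append("depths not strictly increasing")
    # Any string, or a -999 sentinel, means nulls were never reconciled.
    if any(isinstance(v, str) or v == -999 for v in values):
        problems.append("legacy null token present")
    if curve.get("track") not in KNOWN_TRACKS:
        problems.append("no canonical track group")
    return problems

good = {"track": "GR/SP/CALI",
        "depths": [1000.0, 1000.5, 1001.0],
        "values": [80.1, math.nan, 82.4]}
bad = {"track": None,
       "depths": [1000.0, 1000.5],
       "values": [80.1, "NA"]}

print(readiness_problems(good, grid_points=3))  # []
print(readiness_problems(bad, grid_points=3))
```

The example uses a three-point grid only to stay readable; the grid size is a parameter so the same check works at whatever resolution the upstream step emits.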

Exhibit: The subsurface data pipeline as three columns: what arrives, the digitiser gate, and the standard-conformant catalogue keyed by three canonical track groups (GR/SP/CALI, shallow/medium/deep resistivity, and NPHI/RHOB). The archive arrives as 136,771 TIF raster scans against 7,781 already-digital LAS files, so most of the corpus is pixels rather than numbers, and even the digital side carries inconsistent null conventions (-999, NA, None) that fold to one missing marker. The single orange element is the one that argues: the digitiser gate in the middle, which interpolates every scan to 300 depth points. Toggle it off and the raster lane strands, the catalogue draws only what the existing LAS files already carry, and 136,771 scans never reach the standard; toggle it on and the same catalogue fills across all three track groups. The file counts, null conventions, 300 depth points, and three track groups are sourced from the engagement archive; the bar lengths are scaled from the real TIF-to-LAS counts for legibility, not measured widths.

The exhibit is the argument in one frame. On the right is the standard-conformant catalogue, keyed by the three canonical track groups a petrophysicist reads together: the GR/SP/CALI track, the shallow, medium, and deep resistivity track, and the NPHI/RHOB density-neutron track, which carries two curves. That catalogue is what OSDU is for. In the middle is the one thing that has to exist before it can fill: a digitiser that turns a raster scan into curves and interpolates each to a common grid of 300 depth points. Toggle that gate off and the catalogue draws only what the 7,781 LAS files already carried; the 136,771 scans strand at the door. Toggle it on and the same catalogue fills across all three tracks. The standard did not change. What changed was whether the trapped raster became machine-readable first.
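The keying by track group can be sketched as a lookup from curve name to track. The GR, SP, CALI, NPHI, and RHOB names come from the exhibit; the three resistivity names are placeholders we made up, since the text names the shallow, medium, and deep curves only by role. The track labels and the function are likewise ours, not OSDU identifiers.

```python
TRACK_GROUPS = {
    "GR/SP/CALI": ("GR", "SP", "CALI"),
    # Placeholder names for the shallow, medium and deep resistivity curves.
    "resistivity": ("RES_SHALLOW", "RES_MEDIUM", "RES_DEEP"),
    "NPHI/RHOB": ("NPHI", "RHOB"),
}

# Reverse lookup: curve name -> track group.
TRACK_OF = {name: track
            for track, names in TRACK_GROUPS.items()
            for name in names}

def group_curves(curves):
    """Split {curve_name: values} into {track: {curve_name: values}}.

    Names that match no track are returned separately, not guessed at.
    """
    grouped = {track: {} for track in TRACK_GROUPS}
    unassigned = {}
    for name, values in curves.items():
        key = name.strip().upper()
        track = TRACK_OF.get(key)
        if track is None:
            unassigned[name] = values
        else:
            grouped[track][key] = values
    return grouped, unassigned

curves = {"gr": [1.0], "RHOB": [2.4], "CURVE_7": [0.0]}
grouped, unassigned = group_curves(curves)
print(grouped)
print(unassigned)
```

Keeping an explicit `unassigned` bucket makes the anonymous columns visible as a work list instead of letting them disappear from the catalogue.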

Why the grid and the track groups matter to the standard

It is worth being concrete about why "make the curves exist" is not enough on its own, and why the digitiser's output has to be shaped for the standard rather than just correct. Two scans of the same interval can be printed at different resolutions and cropped to different depth ranges, so the raw curves you recover from them are not directly comparable even once both are numbers. Interpolating every recovered curve to a fixed 300-point depth grid is what makes them comparable: it puts every log on the same ruler, so a query for a depth interval returns the same shape of answer regardless of which scan it came from. That regularity is what a catalogue relies on to treat two wells as instances of the same thing. The track groups do the parallel job for naming: a standard needs to know that this curve is a gamma-ray and that one is a deep resistivity, not just that both are "curve columns," so emitting curves already grouped into the canonical tracks slots the output into the model instead of leaving anonymous numeric columns for someone to label by hand. The upstream step is not merely a precondition for OSDU; it has to be built with OSDU's shape in mind.
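The resampling itself is ordinary linear interpolation. The sketch below uses NumPy's `np.interp` and makes two assumptions we should name: the grid spans each curve's own recovered depth range, and the input has no missing values. The function name is ours, and this shows the idea rather than VeerNet's implementation.

```python
import numpy as np

GRID_POINTS = 300  # the fixed grid size used in this note

def to_common_grid(depths, values, n_points=GRID_POINTS):
    """Resample one curve onto n_points evenly spaced depths."""
    depths = np.asarray(depths, dtype=float)
    values = np.asarray(values, dtype=float)
    # np.interp needs increasing x, so sort by depth first.
    order = np.argsort(depths)
    depths, values = depths[order], values[order]
    grid = np.linspace(depths[0], depths[-1], n_points)
    return grid, np.interp(grid, depths, values)

# The same interval recovered at two different resolutions.
depths_a = np.linspace(1000.0, 1100.0, 57)
depths_b = np.linspace(1000.0, 1100.0, 211)
grid_a, curve_a = to_common_grid(depths_a, 0.5 * depths_a)
grid_b, curve_b = to_common_grid(depths_b, 0.5 * depths_b)

print(curve_a.shape, curve_b.shape)  # (300,) (300,)
```

After resampling, the two curves have the same length and the same depth index, so they can be compared point by point, which the 57-sample and 211-sample originals could not.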

What this means for planning

The practical consequence is a reordering of the work. If your archive is mostly raster, the standard is the last mile, not the first, and treating it as the first is how a data programme stalls. The sequence that actually holds is: get the scans digitised into curves; put those curves on a common depth grid; reconcile the null conventions so absence means one thing; group everything into the canonical tracks; and then, with an archive that is finally all numbers under resolvable names, adopt the standard and get the interoperability it was built to give. Skipping to the last step with a folder that is seventeen-eighteenths images does not fail loudly. It fails quietly, as a catalogue that faithfully indexes the small digital slice of your holdings and silently omits the rest.

We use VeerNet as our own answer to the upstream step, which is why its outputs are shaped the way they are: curves interpolated to 300 depth points, grouped into the canonical tracks, with a single missing marker. But the point of this note is not the tool. It is the boundary. An open subsurface standard is a real asset, and the sooner an archive is genuinely made of curves the sooner that asset pays off. The mistake is to hear "open standard" and think it will meet you where your data is. It meets you exactly one step later than that, and the step in between is yours.

Limitations

This is an explainer grounded in one archive's counts, not a survey of OSDU deployments, and it should be read as the former. The 136,771 TIF and 7,781 LAS figures, the three null conventions, the 300-point depth grid, and the three canonical track groups with two curves in the density-neutron track are real numbers from the engagement we drew on; the ratio of raster to digital in your own archive will differ, and with it the size of the upstream problem we are describing. We have also deliberately treated OSDU at the level a data team first meets it, as a shared grammar with an ingestion boundary, and have not gone into its schema versioning, entitlements, or the specifics of any one cloud provider's hosted platform, all of which matter once you are past the boundary and none of which change where the boundary falls. The digitiser step is illustrated through our own tool and its output contract, so the specific shaping choices, the 300-point grid and the track grouping, are ours and are not prescribed by the standard; a different team could meet the same boundary with different upstream choices. And nothing here speaks to whether a recovered curve is accurate enough to trust, which is a separate question the standard also does not answer and which decides, in the end, whether a well that catalogues cleanly is actually usable.

References

[1] The Open Group. Open Subsurface Data Universe (OSDU) Forum overview. The forum formed in 2018 to build an open, standards-based subsurface data platform for the energy industry, reducing data silos through a common data model and ingestion path. https://www.opengroup.org/membership/forums/open-subsurface-data-universe

[2] The Open Group OSDU Forum. Launch of the OSDU Data Platform Mercury Release, the first commercial release of the merged open-source code base made available through cloud service providers for ingestion and consumption of subsurface data. https://drillingcontractor.org/the-open-group-osdu-forum-launches-osdu-data-platform-mercury-release-59740

[3] McDonald, A. Using the missingno Python library to Identify and Visualise Missing Data Prior to Machine Learning. Towards Data Science (2021). A worked example on 118 Norwegian Sea wells and 22 measurement columns showing that well-log tables arrive with heterogeneous missingness that must be made explicit before modelling. https://towardsdatascience.com/using-the-missingno-python-library-to-identify-and-visualise-missing-data-prior-to-machine-learning-ad9d4f78b4dd/
