
The Inventory App: Where Digitisation Meets Asset Management

A product note on the part of a log-digitisation system that gets the least attention and does the most work: the inventory. Extracting a curve off a scanned raster produces a machine-readable file. But a file that is not registered anywhere is not an asset; it is a result that someone who could not find it will re-extract next quarter. The inventory app is the layer that turns a one-off extraction into a durable, searchable subsurface-data holding, and it is where digitisation stops being a script and starts being asset management. Grounded in an engagement whose source archive was 136,771 TIF scans and 7,781 LAS files, this note walks through the four-step scan-to-CSV loop, shows why registration is the step that creates value, and treats the per-seat access model as a consequence of the inventory rather than of the digitiser.

By Quamer Nasim, Tarry Singh, Narendra Patwardhan · 8 min read
EarthScan insight

Most of the attention in log digitisation goes to the hard, visible part: turning a scanned paper well log into curves a machine can read. That is the part with the model in it, and it is genuinely difficult. But the part that decides whether the work compounds or evaporates is the boring one that sits after the model, and it is almost never in the demo. It is the inventory: the searchable record of what has been digitised, from which scan, into which curves, so that the next person who needs that log finds it instead of digitising it again. This note is about that layer, and about the claim it rests on, which is that a digitised curve is not an asset until it is catalogued. Extraction gives you a file. Registration is what gives you a holding.

We build this distinction into the product on purpose, because the failure it prevents is expensive and quiet. A team stands up a digitiser, runs a batch, and produces thousands of clean CSVs. Six months later a different analyst needs one of those curves, cannot find it, cannot be sure it exists, and re-runs the extraction. The compute was cheap the second time; the lost time and the two slightly different versions of the same curve were not. The digitiser worked perfectly both times. What was missing was the answer to a plain question: is this log already in our holdings, and if so, where? That is an inventory question, not a model question, and it is the reason the inventory app is the spine of the product rather than a nicety bolted on the side.

What the inventory has to hold

The scale is what forces the issue. The source archive on the engagement behind this was 136,771 TIF raster scans and 7,781 LAS files. That is not a set anyone browses by eye or tracks in a spreadsheet. At that size the archive has two states that look identical on disk and are worlds apart in value: files that have been digitised and registered, and files that have been digitised and lost track of. The bytes are the same. The worth is not, because worth here is a function of findability, and findability is a property the inventory supplies, not one the extraction produces.
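The two states can be made concrete by reconciling what is on disk against what the inventory knows. The sketch below is a minimal illustration only. It assumes that digitised outputs are CSV files under a single output directory and that the inventory can export the set of paths it has registered. `find_stranded` and that export are hypothetical, not part of the product.

```python
from pathlib import Path


def find_stranded(output_dir: Path, registered: set[str]) -> list[Path]:
    """List CSVs that exist on disk but have no row in the inventory.

    Assumes `registered` holds resolved path strings exported from the
    inventory (a hypothetical export, used here for illustration only).
    """
    stranded = []
    for csv_path in sorted(output_dir.rglob("*.csv")):
        resolved = csv_path.resolve()
        # The bytes are the same either way; only registration makes a file findable.
        if str(resolved) not in registered:
            stranded.append(resolved)
    return stranded
```

Every path this returns is a digitised file waiting to become a holding. Registering it moves the file across without any new extraction.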

There is a second reason the inventory cannot be an afterthought: a single digitised output is not one thing. Each scan carries several curves laid out across tracks, and the density varies by track. The track holding GR, SP, and CALI and the resistivity track carry three curves each, while the porosity track carries two, NPHI and RHOB. So one registered log is really a small bundle of separately addressable curves, each of which someone might want to pull on its own. An inventory that only knows "log X exists" is barely an inventory. The one worth building knows which curves came off which track of which scan, so a petrophysicist can ask for the RHOB from a specific well and get exactly that, rather than a whole digitised sheet to hunt through.
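Here is a minimal sketch of what "knows which curves came off which track of which scan" can mean in practice, assuming one flat record per curve. The field names, the well and scan identifiers, and the track keys are all hypothetical. The resistivity track is left out because the note does not name its curves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurveRecord:
    """One separately addressable curve in a registered log (hypothetical schema)."""
    well: str      # well the scan belongs to
    scan: str      # source TIF the curve was lifted from
    track: str     # track on that scan
    curve: str     # curve mnemonic, e.g. "RHOB"
    csv_path: str  # depth-indexed CSV holding the curve's column


def records_for_sheet(well: str, scan: str, csv_path: str,
                      tracks: dict[str, tuple[str, ...]]) -> list[CurveRecord]:
    """Expand one digitised sheet into one record per curve per track."""
    return [
        CurveRecord(well, scan, track, curve, csv_path)
        for track, curves in tracks.items()
        for curve in curves
    ]


def find_curve(records: list[CurveRecord], well: str, curve: str) -> list[CurveRecord]:
    """Return exactly the requested curve for one well, not the whole sheet."""
    return [r for r in records if r.well == well and r.curve == curve]


# Illustrative sheet; resistivity-track mnemonics are not given, so they are omitted.
sheet = records_for_sheet(
    "WELL-A", "scan_0001.tif", "scan_0001.csv",
    {"gr_sp_cali": ("GR", "SP", "CALI"), "porosity": ("NPHI", "RHOB")},
)
```

The design choice is that one sheet registers as five rows here, not one "log exists" row. A request for RHOB from `WELL-A` resolves to a single record that points back to its scan and track.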

The four-step loop, and where value actually lands

The digitiser is a loop with four steps, and naming them in order makes it obvious which one the inventory depends on. First, scan: the raster TIF enters the pipeline. Second, segment: the model lifts each curve off its track. Third, extract: the masks become depth-indexed CSV, one column per curve. Fourth, register: the CSV is catalogued into the inventory with its provenance, so it is searchable. The first three steps are the digitisation everyone thinks of. The fourth is the one that turns the output into an asset, and it is the one most homegrown pipelines skip, because it produces no visible artifact of its own. It just writes a row.
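The loop can be sketched as four small functions, with registration as an explicit final call. This is a minimal illustration under stated assumptions, not the product's pipeline. The segmentation model is stood in for by any callable that maps a TIF to `{curve: values at each depth}`, and the inventory is a plain list of dict rows. The names `digitise`, `extract`, `register`, and `Segmenter` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Callable

# Stand-in for the segmentation model: maps a TIF to {curve: values at each depth}.
# This interface is an assumption made for illustration.
Segmenter = Callable[[Path], dict[str, list[float]]]


def extract(curves: dict[str, list[float]], depths: list[float], out_csv: Path) -> Path:
    """Step 3: write a depth-indexed CSV with one column per curve."""
    names = sorted(curves)
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["DEPTH", *names])
        for i, depth in enumerate(depths):
            writer.writerow([depth, *(curves[name][i] for name in names)])
    return out_csv


def register(inventory: list[dict], tif: Path, out_csv: Path, names: list[str]) -> None:
    """Step 4: one catalogue row per curve, carrying provenance back to the scan."""
    for name in names:
        inventory.append({"scan": tif.name, "curve": name, "csv": str(out_csv)})


def digitise(tif: Path, segment: Segmenter, depths: list[float],
             out_dir: Path, inventory: list[dict]) -> Path:
    # Step 1, scan: accept only raster TIFs into the pipeline.
    if tif.suffix.lower() not in {".tif", ".tiff"}:
        raise ValueError(f"not a TIF: {tif}")
    curves = segment(tif)                                            # Step 2, segment
    out_csv = extract(curves, depths, out_dir / f"{tif.stem}.csv")  # Step 3, extract
    register(inventory, tif, out_csv, sorted(curves))                # Step 4, register
    return out_csv
```

The segmenter is passed in so the model can change without touching registration. If the `register` line is deleted, steps one to three still produce a correct CSV, but nothing can answer a query about it.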

That fourth step is the whole argument, so it is worth being precise about what it changes. After step three you have a correct CSV sitting in a directory. It is machine-readable, it is accurate, and it is invisible: nothing in the system can answer "do we have this curve" except a human who happens to remember. After step four the same CSV is a record you can query, filter, and hand to the next person by reference. Nothing about the numbers changed between step three and step four. What changed is that the curve went from a result to a holding. The exhibit below lets you drive that transition yourself.

Exhibit: Inventory ledger · extracted bytes vs findable assets. "A digitised curve becomes an asset only once it is registered and searchable." Panel A, the same archive, two worths: 144,552 files to index (136,771 TIF scans + 7,781 LAS files), split into extracted-but-not-findable and catalogued; at the 28% setting shown, 40,475 files are findable and 104,077 stranded. Panel B, the scan-to-CSV loop that feeds the inventory: 1 Scan, 2 Segment, 3 Extract, 4 Register; only after step 4 is a curve searchable. Control: a cataloguing lever from 0% to 100% of the archive registered. The catalogued share is the lever, not a measured figure.
The inventory app, argued as one ledger: the source archive of 136,771 TIF scans and 7,781 LAS files is a fixed column, and the cataloguing lever splits it into two worths. The teal segment is what has been extracted but not yet registered, a file that exists and that no one can find. The orange segment, the exhibit's only accent colour, is what has been catalogued into the searchable inventory and is therefore a durable, discoverable asset. Drag the lever and the same bytes move from stranded to findable without a single new extraction, which is the whole point: value accrues at registration, not at extraction. The right panel is the four-step scan-to-CSV loop that feeds the ledger, with registration drawn as step four. This makes it clear that steps one through three leave a curve that is machine-readable but invisible. The archive counts, the four-step loop, the 2-to-3 curves per track, and the 1,200 USD per seat per year are sourced from the engagement; the catalogued share is a lever the reader moves, not a claimed figure.

Drag the cataloguing lever and watch the archive split. The teal band is everything that has been extracted but not registered, files that exist and that no one can find; the orange band is the catalogued share, the part that has actually become searchable. The point the exhibit is built to make is that the two bands are the same files. Moving the lever does not run a single new extraction. It only registers what was already digitised, and yet the findable holding grows from almost nothing to the full archive. That is the shape of the claim: on a 144,552-file archive, the value gap between a digitiser and an inventory is not a modelling gap, it is a cataloguing gap, and it is enormous.
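The lever's arithmetic is small enough to write out. This sketch uses only the archive counts from the note. It assumes the exhibit rounds to the nearest whole file, which reproduces the 40,475 findable and 104,077 stranded shown at a 28% catalogued share. The total is constant by construction, because moving the share registers existing files rather than extracting new ones.

```python
TIF_SCANS = 136_771
LAS_FILES = 7_781
ARCHIVE_FILES = TIF_SCANS + LAS_FILES  # 144,552 files to index


def split_archive(catalogued_share: float, total: int = ARCHIVE_FILES) -> tuple[int, int]:
    """Return (findable, stranded) for a catalogued share between 0 and 1.

    The total never changes: moving the share registers existing files,
    it does not extract new ones.
    """
    if not 0.0 <= catalogued_share <= 1.0:
        raise ValueError("catalogued_share must be between 0 and 1")
    findable = round(catalogued_share * total)  # assumes round-to-nearest file
    return findable, total - findable


for share in (0.0, 0.28, 1.0):
    findable, stranded = split_archive(share)
    print(f"{share:.0%} catalogued: {findable:,} findable, {stranded:,} stranded")
```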

Why this is asset management, not filing

Calling the inventory "asset management" rather than "a database of outputs" is a deliberate framing, and it earns its keep in how the system is used. An asset has provenance, a location, and an access policy, and the inventory carries all three. Provenance is the link back from a curve to the exact scan and track it came from, which is what lets a geologist trust a digitised RHOB enough to put it in a decision. Location is the searchable index that answers the existence question before anyone spends compute. Access policy is the part that decides who in the organisation can pull which holding, and it is why the commercial model is what it is: access to the inventory is priced per seat, at 1,200 USD per user per year. That is a subscription to a searchable holding of subsurface data, which is a different thing from paying per scan to run a digitiser. You pay for the digitiser once, per log; you pay for the inventory per person who needs to reach into it, because the inventory is the durable thing and the people are the ones who realise its value.
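Provenance, location, and access policy can be written down as fields on one record. This is a minimal sketch that assumes a group-based access policy; the note does not say how access is scoped. The field and function names are hypothetical. Only the 1,200 USD per seat per year figure comes from the note.

```python
from dataclasses import dataclass, field

SEAT_PRICE_USD_PER_YEAR = 1_200  # list price for inventory access, per the note


@dataclass
class Holding:
    """A catalogued curve treated as an asset (hypothetical schema)."""
    curve: str
    scan: str        # provenance: the exact scan ...
    track: str       # ... and track the curve came from
    location: str    # where the inventory says the CSV lives
    allowed_groups: set[str] = field(default_factory=set)  # access policy


def can_pull(holding: Holding, user_groups: set[str]) -> bool:
    """Group-based access check; the grouping scheme is an assumption."""
    return not holding.allowed_groups.isdisjoint(user_groups)


def annual_access_cost(seats: int) -> int:
    """Access is priced per person who reaches into the inventory, not per scan."""
    if seats < 0:
        raise ValueError("seats must be non-negative")
    return seats * SEAT_PRICE_USD_PER_YEAR
```

For example, ten seats come to 12,000 USD a year of access. That line scales with the number of people, while digitisation is paid per log.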

This is also why the inventory, not the model, is what makes the digitisation compound. A model that gets better makes each new extraction cleaner, which is worth having. But an inventory that holds every extraction, addressable down to the individual curve, is what makes the hundred-thousandth log as easy to find as the first, and what stops the archive from quietly re-digitising itself every time someone new joins and cannot find what is already there. The model is the engine. The inventory is the odometer, the logbook, and the title, and those are the parts that make the thing an asset you own rather than a trip you took.
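One way to keep the hundred-thousandth lookup as cheap as the first is to hold the catalogue in an indexed store and ask the existence question before extracting anything. The minimal sketch below uses Python's built-in `sqlite3` and assumes a single curve-level table. The schema, column names, and helper functions are hypothetical, not the product's.

```python
import sqlite3


def open_inventory(path: str = ":memory:") -> sqlite3.Connection:
    """Create a minimal curve-level inventory (hypothetical schema)."""
    conn = sqlite3.connect(path)
    # UNIQUE: at most one row per curve per track per scan.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS curves (
               id       INTEGER PRIMARY KEY,
               well     TEXT NOT NULL,
               scan     TEXT NOT NULL,
               track    TEXT NOT NULL,
               curve    TEXT NOT NULL,
               csv_path TEXT NOT NULL,
               UNIQUE (scan, track, curve)
           )"""
    )
    # Index the question people actually ask: this curve, for this well.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_well_curve ON curves (well, curve)")
    return conn


def register_curve(conn: sqlite3.Connection, well: str, scan: str,
                   track: str, curve: str, csv_path: str) -> int:
    """Catalogue one curve; the returned id is the reference you hand on."""
    cur = conn.execute(
        "INSERT INTO curves (well, scan, track, curve, csv_path) VALUES (?, ?, ?, ?, ?)",
        (well, scan, track, curve, csv_path),
    )
    conn.commit()
    return cur.lastrowid


def lookup_curve(conn: sqlite3.Connection, well: str, curve: str) -> list[tuple]:
    """Answer 'do we already have this curve?' before anyone re-extracts it."""
    return conn.execute(
        "SELECT id, scan, track, csv_path FROM curves WHERE well = ? AND curve = ? ORDER BY id",
        (well, curve),
    ).fetchall()
```

The UNIQUE constraint is a deliberate design choice. If someone registers the same curve from the same scan and track a second time, the insert fails with `sqlite3.IntegrityError`. The inventory does not quietly accumulate the "two slightly different versions of the same curve" described earlier.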

Where digitisation meets asset management

The two disciplines meet at step four, and the meeting is the product. Digitisation, on its own, is a service you consume: you hand it scans and it hands you curves. Asset management, on its own, has nothing to manage until something has been digitised. Put them together with registration as the hinge, and you get a system where every scan you run makes the holding larger, more searchable, and more worth paying to reach. The 136,771 scans and 7,781 LAS files stop being a backlog to grind through and become an inventory to build up. That is the difference the fourth step makes, and it is why we treat the inventory as the point of the product rather than the place the CSVs happen to land.

Limitations

This is a product argument grounded in one engagement's archive, and it should be read at that altitude. The file counts, 136,771 TIF and 7,781 LAS, are the real source-archive figures, as are the four-step loop, the 2-to-3 curves per track, and the 1,200 USD per seat per year; the catalogued fraction in the exhibit is a lever the reader moves to see the extracted-versus-findable split, not a measured completion rate for any particular deployment, and the exhibit says so on its face. The note also deliberately does not defend the digitiser itself. Whether a given curve is extracted accurately, whether the segmentation holds across the messier scans in a 136,771-file archive, and how the model was built are separate questions, treated elsewhere; the claim here is only that even a perfect extraction is not an asset until it is catalogued. Finally, the per-seat figure is a list price for access, not a statement of realised value; what an inventory is worth to a given operator depends on how often the existence question actually gets asked, which varies with team size and turnover more than with the archive.

© 2026 Copyright. Earthscan