Competing With Enverus and IHS Without Their Data Moat

There is a version of the well-log digitisation pitch that ends the meeting early: the one where you promise to build a better data set than the companies that already own the data. Enverus and IHS have spent decades assembling licensed well headers, production histories, and interpreted logs, and they rent that corpus to nearly everyone who drills. If your plan is to collect a rival corpus and undercut them on price, you have picked the one contest they are structurally certain to win. They started twenty years ago, the data compounds, and every new licence deepens the moat. We are a digitisation team, not a data broker, and our first strategic decision was to stop fighting on that ground at all.

This is a note about the axis we chose instead. It is deliberately not the VeerNet whitepaper, which is about the architecture and the training. It is about the commercial shape: who owns the input, what a unit of output costs to produce, and why those two facts let a small team hold a defensible position next to companies a thousand times its size. We do not compete for the data. We compete for the work of making a customer's own data usable, and that is a different market with a different cost structure.

The moat is real, and it is not the moat you attack

Start by being honest about the incumbent's advantage, because underrating it is how challengers die. The subsurface-data majors own their corpus outright, they license it, and the licence is the product. Christensen's observation applies cleanly: an incumbent optimises the axis it already leads and defends it with everything it has, because that is where its revenue lives [1]. Attacking the corpus means attacking the exact thing the incumbent is best in the world at protecting. You will lose, and you will lose expensively.

The economics of information make the same point from the other side. Shapiro and Varian described why the owner of a large proprietary corpus prices access rather than giving it away: serving one more query costs almost nothing, but the whole collection is enormously valuable, so the owner meters it [2]. That metering is the incumbent's business model, and it is also its blind spot. A licence prices access to data the vendor owns. It says nothing about data the vendor does not own and has no reason to touch.

The data nobody is selling is the data the customer already has

Every operator we have worked with is sitting on an archive of its own scanned logs. Old wells, acquired assets, paper records photographed and filed, raster images that are legally and physically the customer's property and that no incumbent will ever license back to them, because the incumbent does not have them. As a concrete illustration of the scale, a single public archive we trained and tested against holds 136,771 raster TIF scans and 7,781 LAS files. An operator's private version of that pile is the same shape: large, trapped, and worthless as long as the curves live as pixels instead of numbers.

That is the wedge. We are not asking the customer to buy someone else's data. We are turning the customer's own data, which they already paid for once, into something a petrophysicist can compute on. The incumbent cannot follow us here without abandoning the model that makes it money, because there is no corpus to license, only a service to perform. This is the different axis, and it is the whole strategy in one sentence: do not sell data you own, sell the digitisation of data the customer owns.

Figure: Where a digitisation challenger can actually win against data-moat incumbents. The plane sets whose data you monetise (left: the incumbent's proprietary licensed corpus, which you cannot own; right: the customer's own archive, already sitting on their drives) against what one more usable record costs to produce (top: licence-priced by a data owner; bottom: compute-priced by a rented GPU at 750 or 1,800 EUR per month, which collapses toward a floor). The incumbents sit top-left: enormous proprietary data at licence-priced access. A challenger that copies that move lands beside them and loses. The orange marker traces the challenger's move from that losing top-left anchor to the bottom-right wedge: the customer's own trapped archive, illustrated by a 136,771-TIF and 7,781-LAS public scan corpus, turned into usable data at a compute-priced margin. The market read-out changes with it, from chasing the 134B USD TAM the incumbents already own to taking a 3% SOM of the 6.7B USD SAM, 180M USD by EOY5, at 1,200 USD per seat, with 3 letters of intent as early demand. The market figures, the per-seat price, the archive counts, the GPU tiers, and the letters of intent are sourced from the engagement archive; the positions of each firm on the plane are an illustrative positioning judgement, not a measurement.

Why the cost structure holds

A positioning claim is only worth as much as its unit economics, so here is the arithmetic that makes the wedge defensible rather than merely clever. When you sell licensed data, your cost floor is the licence you paid to assemble the corpus. When you sell digitisation of the customer's own archive, your cost floor is compute. One trained model serves every scan, so turning one more raster log into curves costs a slice of a rented GPU, and the rentable tiers we built against were 750 and 1,800 EUR per month. There is no per-record data cost, because there is no data to license and the customer supplies the input for free.

That difference is the reason the position exists. An incumbent's marginal record carries a share of a corpus that took decades and a great deal of money to build. Our marginal record carries a share of a monthly GPU bill. We priced the service at 1,200 USD per seat per year, which works because the thing under the price is compute divided by throughput, not a data licence divided by scarcity. The pricing follows the cost structure, and the cost structure follows the axis we chose.
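To make "compute divided by throughput" concrete, here is a minimal sketch of the per-log cost implied by a flat monthly GPU rental. The 750 and 1,800 EUR tiers come from this note; the throughput values, the utilisation parameter, and the function name are hypothetical placeholders, not measured figures from our serving stack.

```python
# Sketch of the per-log compute cost implied by a flat monthly GPU rental.
# The GPU tiers are from the note; every throughput value is hypothetical.

HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def cost_per_log_eur(gpu_eur_per_month: float,
                     logs_per_hour: float,
                     utilisation: float = 1.0) -> float:
    """Monthly GPU rent divided by the number of logs digitised that month."""
    if logs_per_hour <= 0 or not 0 < utilisation <= 1:
        raise ValueError("throughput must be positive, utilisation in (0, 1]")
    logs_per_month = logs_per_hour * HOURS_PER_MONTH * utilisation
    return gpu_eur_per_month / logs_per_month


GPU_TIERS_EUR = (750, 1800)               # rental tiers quoted in the note
ASSUMED_LOGS_PER_HOUR = (10, 100, 1000)   # hypothetical throughputs

for tier in GPU_TIERS_EUR:
    for rate in ASSUMED_LOGS_PER_HOUR:
        cost = cost_per_log_eur(tier, rate)
        print(f"{tier} EUR/month at {rate} logs/h: {cost:.4f} EUR per log")
```

The point of the sketch is the shape, not the numbers: the per-log cost falls in direct proportion to throughput, and nothing in the formula is a per-record data cost. The seat price is quoted in USD and the GPU tiers in EUR, so the sketch deliberately stays in EUR and does not assume an exchange rate.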

The market that math implies

None of this requires the challenger to win the whole market, and the sizing makes that plain. Framed the way the incumbents frame it, the total addressable market for oil and gas transactions data runs to about 134 billion USD, and the serviceable slice, the oil and gas technology market where a digitisation tool actually competes, is around 6.7 billion USD. We never modelled taking the 134 billion. Out-hoarding the firms that own that market is exactly the fight we refused. We modelled a 3 percent serviceable-obtainable share of the 6.7 billion, roughly 180 million USD by the end of year five.
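The sizing arithmetic can be laid out in a few lines. This is a sketch using only the figures quoted in this note; the variable names are illustrative. It rests on one loud assumption that the note does not state: the seat count treats the 180 million USD as annual revenue earned entirely from per-seat subscriptions.

```python
# Sketch of the market-sizing arithmetic, using only figures quoted in the note.

TAM_USD = 134e9           # total addressable market, as the incumbents frame it
SAM_USD = 6.7e9           # serviceable slice: oil and gas technology market
SOM_SHARE = 0.03          # modelled serviceable-obtainable share
PLANNED_SOM_USD = 180e6   # the note's end-of-year-five figure
SEAT_PRICE_USD = 1200     # per seat per year

sam_fraction_of_tam = SAM_USD / TAM_USD
flat_share_usd = SAM_USD * SOM_SHARE
share_implied_by_plan = PLANNED_SOM_USD / SAM_USD

# ASSUMPTION (not stated in the note): the 180M is annual revenue and all of
# it comes from per-seat subscriptions at the quoted price.
implied_seats = PLANNED_SOM_USD / SEAT_PRICE_USD

print(f"SAM as a fraction of TAM:   {sam_fraction_of_tam:.1%}")
print(f"Flat 3% of SAM:             {flat_share_usd / 1e6:.0f}M USD")
print(f"Share implied by 180M plan: {share_implied_by_plan:.2%}")
print(f"Seats implied at 1,200 USD: {implied_seats:,.0f}")
```

A flat 3 percent of 6.7 billion is about 201 million USD, a little above the roughly 180 million planning figure. This note does not say what accounts for the difference, so the sketch takes both numbers as given. Under the stated assumption, 180 million USD at 1,200 USD per seat corresponds to 150,000 seats, which is a useful sanity check on how much adoption the plan requires.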

Three percent is a modest number on purpose. It is the number you can defend when you are not trying to displace the corpus owner, only to serve a job the corpus owner ignores. Early demand was consistent with that read: we carried three signed letters of intent from prospective customers spanning explorers, a private-equity holder, and an operator, which is a small sample but a real one, and it came from the wedge, not from a promise to out-collect anybody.

What we are not claiming

The honest boundary of this argument is that a good axis is necessary, not sufficient. Choosing to compete on the customer's own archive at a compute-priced margin gets you a position the incumbents have no incentive to defend, per Porter's point that the durable move is to pick ground the leader has not fortified [3]. It does not get you a working product. The digitisation has to be accurate enough that a petrophysicist trusts it, the model has to generalise off the public training corpus onto a given operator's messier scans, and the cost story only holds if serving throughput is high enough to keep the per-log number low. Those are real risks the positioning does not remove. The strategy tells you which fight to have. It does not win the fight.

Limitations

The market figures are the sizing we used, framed as the incumbents frame it, and they estimate a market rather than report audited revenue. The 134 billion USD total, the 6.7 billion USD serviceable slice, the 3 percent target, and the 180 million USD end-of-year-five projection are a plan, and plans miss. The three letters of intent are a genuine but tiny demand signal, not a repeatable sales motion, and none is a booked contract. The 136,771 TIF and 7,781 LAS counts describe the public archive we illustrate the wedge with, not any one customer's private pile, whose size and quality will vary. The 750 and 1,800 EUR per month GPU tiers are real rental figures, but the per-log cost that follows depends on serving throughput, an engineering result covered elsewhere and not defended here. The positions on the map are a strategic judgement, not a measured coordinate. And the whole argument assumes the digitisation is good enough to sell, which this note asserts rather than proves.

The one decision that mattered

If there is a single transferable idea in this, it is that a challenger's most important choice is often subtractive: deciding which contest not to enter. We could have spent our first two years and all our money trying to assemble a corpus to rival companies that had a twenty-year head start, and we would have had nothing to show for it. Instead we conceded the data they own, entirely and on purpose, and went after the data they do not, the customer's own trapped archive, at a cost structure they cannot match without changing what they are. The moat is real. We just chose not to swim across it.

References

[1] Christensen, C. M. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Harvard Business School Press (1997). Why incumbents optimise the axis they already lead, and how an entrant wins on a different one the incumbent has no incentive to defend. https://www.hbs.edu/faculty/Pages/item.aspx?num=46

[2] Shapiro, C., and Varian, H. R. Information Rules: A Strategic Guide to the Network Economy. Harvard Business School Press (1999). The economics of information as an asset, and why the owner of a large proprietary corpus meters access rather than giving it away. https://www.hbs.edu/faculty/Pages/item.aspx?num=170

[3] Porter, M. E. How Competitive Forces Shape Strategy. Harvard Business Review 57, no. 2 (1979), pp. 137-145. Entry barriers and the case for choosing where to compete rather than accepting the ground an incumbent has already fortified. https://hbr.org/1979/03/how-competitive-forces-shape-strategy
