Ask an engineer where the bugs were on their last machine-learning product and they will tell you a story about the model: a stubborn loss curve, a class that would not separate, a metric that refused to climb. Ask the same engineer where the project lost the most time and the answer is usually somewhere else entirely. It was the week the dashboard rendered a curve upside down because the depth axis ran the other way. It was the afternoon the array came back one element short and the chart silently dropped its last sample. It was the standup where the model team said the field was called prediction and the frontend had been reading pred for a fortnight. None of those is a modelling failure. They are all the same failure, which is that the interface between the model and the thing that displays it was never written down as a thing anyone owned.
That interface has a name worth using on purpose. It is a data contract: the agreed, explicit shape of the payload the model emits and the frontend consumes, treated as an artefact in its own right rather than as a byproduct that falls out of whatever the model happens to return this week. The argument of this piece is narrow and, I think, underrated. The contract is where most integration bugs in an ML product actually live, and designing it first, before the model is final, is what lets the two halves of the team move at their own speeds without colliding. When EarthScan built VeerNet, the network we use to digitise raster well logs, the mask-to-CSV contract was the first thing we froze and close to the last thing we changed, and that ordering is most of why the dashboard shipped weeks before the model did.
The interface is the seam where two teams stop being one team
There is a deep idea from distributed systems that transfers cleanly to the model-and-dashboard split, even though no network is involved. When two components are developed by two groups on two schedules, the only thing that keeps them composable is a stable agreement about the messages that cross between them. Martin Fowler called the disciplined version of this a consumer-driven contract: the consumer publishes the expectations it depends on, the provider promises to honour them, and either side is then free to change anything that the contract does not mention [1]. The whole point is to make the surface of mutual dependence small and explicit, so that a change inside one component cannot reach into the other unless it crosses that surface, where a test will catch it [2].
A model and a dashboard are exactly this situation wearing different clothes. The model team wants to retrain, swap a loss, change a backbone, and re-run an ablation without asking permission. The frontend team wants to build the chart, the depth scrubber, the tooltip, and the export button without waiting for the final weights. Those two desires are only compatible if the messages between them are pinned. The pinned thing is the payload schema. Once it is fixed, the model team can replace everything behind it and the dashboard team can replace everything in front of it, and as long as the payload still validates, neither break reaches the other. Steve Newman makes the same case for services: stable, versioned interfaces are what let independently deployable parts stay independent, and the moment two parts share internals instead of a contract, they have quietly become one part again [6].
What goes in the contract, concretely
The abstraction only earns its keep when you write down the actual fields, so here is the contract we use for a single digitised curve, in the spirit of the thing rather than its literal serialisation. The payload is JSON, because JSON is the lingua franca of this boundary and its well-formedness rules are unambiguous [3]. Three clauses do almost all of the work.
The first clause is shape: every curve arrives as an array of exactly 300 interpolated depth points. This number is not cosmetic. Our validation notebooks resample each predicted curve and the ground-truth LAS onto the same depth axis at 300 points precisely so that two one-dimensional functions can be compared sample for sample. Because the frontend knows the array is always length 300, it can allocate the chart, size the scrubber, and lay out the export grid before a single real prediction exists. A fixed length is the cheapest possible contract clause and it eliminates an entire genus of off-by-one rendering bugs.
The second clause is the label channel. The model is a segmentation network with a 3-class mask convention: background, curve one, and curve two. The payload therefore carries, alongside the value at each depth, a class tag drawn from that small enumeration. The frontend does not need to know how the mask was produced; it needs to know that the tag is one of three values and what each one means, so it can colour curve one teal, curve two its own shade, and background nothing. Pinning the enumeration in the contract means a future model that adds a fourth class cannot silently appear in the dashboard as an uncoloured mystery. It would fail validation, loudly, at the boundary [4].
The third clause is the one teams most often skip, and it is the one this whole piece is really about: the contract must declare the error the payload is allowed to carry. VeerNet's recovered curves land at a per-curve mean absolute error of 0.11 on curve one and 0.12 on curve two. That is the accuracy the model actually delivers, and so it is the tightest honest tolerance the contract can promise. A contract that declares a smaller error band is not a stricter contract; it is a false one, because it invites the dashboard to render a precision the model cannot back. The tolerance clause is where a data contract stops being a layout convenience and becomes a statement of truth about the product.
The checker above is the contract made literal. Each schema field validates independently against the same payload, and the two structural fields, the 300-point length and the three-class enumeration, pass by construction. The two error fields are gated by the tolerance lever. Drag the declared band below a curve's real error and that field breaks, in orange, because the contract has begun to promise something the model cannot honour. Watch it and the abstract claim becomes concrete: a contract is honest exactly when its declared band covers the error the model genuinely makes, and dishonest the instant it claims tighter.
Why the contract has to be born before the model
Here is the sequencing that the contract unlocks, and it is the practical heart of the argument. If you wait for the model to be finished before you decide what it returns, the frontend cannot start until the model is done, and the two teams are serialised end to end. If instead you write the contract first, the frontend builds against a fixture, a hand-authored payload that satisfies the schema and contains plausible but fake numbers, and it builds the entire dashboard against that fixture while the model team is still arguing about loss functions.
We did exactly this. The dashboard read a fixture of 300 synthetic depth points with three-class tags and a declared error band long before VeerNet produced a single real curve, and when the real model finally emitted its first payload, the dashboard rendered it with no changes, because the contract the fixture obeyed was the contract the model targeted. The fixture was not throwaway work; it became the seed of the frontend's test suite, the payload that every chart component is asserted against forever after. This is the inversion that surprises people: the contract is more durable than either the model or the dashboard, and treating it as the stable centre rather than as fallout from the model is what lets both ends move quickly without breaking each other.
There is a discipline that keeps this honest, and it is older than any of the web tooling. The robustness principle, be conservative in what you send and liberal in what you accept, comes from the earliest internet protocol design and it is exactly the right posture at a model-frontend seam [5]. The model side should emit payloads that conform strictly to the contract, never a stray extra field or a length of 299. The frontend side should validate defensively and fail visibly rather than guess, so that a contract violation surfaces as a caught error at the boundary instead of a curve quietly drawn wrong three components deep. Conservative emission plus liberal, validating consumption is what turns the contract from a gentleman's agreement into a tested boundary [2].
The two failures a contract is built to prevent
It helps to name the specific bugs this catches, because they are mundane and that is the point. The first is the silent shape drift: the model team changes the resampling to 256 points to save a little compute, the payload is still valid JSON, and the dashboard, which hardcoded an assumption of 300 somewhere, draws a curve that is subtly compressed in depth. With a contract and a validator, that change fails at the boundary the moment the array is the wrong length, and it fails in the model team's own test run rather than in a petrophysicist's afternoon. The second is the false-precision drift: someone tightens the tolerance in the spec to make a slide look better, the contract now declares a band the model never hits, and the dashboard dutifully renders curves to a confidence they have not earned. A contract that names the real per-curve error of 0.11 and 0.12 makes that overclaim impossible to ship, because the declared band is checked against the observed band and the lie does not validate.
Both failures share a structure. Each is a change made in good faith on one side of the seam that quietly violates an assumption on the other side, and each is invisible until a human notices a wrong picture. The contract converts every one of them from a late, human-noticed display bug into an early, machine-noticed validation failure, which is the entire economic case for writing the schema down [4][6].
The shape of the agreement matters more than the model behind it
The thing I want a reader to carry out of this is a reweighting of attention rather than a new technique. We spend the overwhelming majority of our energy on the model, because the model is hard and interesting and ours, and we spend almost none on the half-page of schema that decides whether the model's output can be consumed at all. That ratio is upside down relative to where the integration pain actually accrues. The model is replaceable; we swapped backbones and losses repeatedly and the dashboard never noticed, because the contract absorbed the churn. The contract is not replaceable in the same casual way, because changing it means coordinating both teams, and that asymmetry is precisely the signal that the contract, not the model, is the load-bearing interface.
So write it first, version it, validate against it on both sides, and make its tolerance clause tell the truth about the error your model genuinely carries. A mask-to-CSV agreement of three honest clauses bought us a dashboard that shipped ahead of the model it displays, and bought our two teams the freedom to be fast without being fragile. The best thing we built on that project was not a network that reads paper logs; it was the small, unglamorous document that let everything else proceed in parallel.
Key takeaways
- Most integration bugs in an ML product live in the schema between the model and the dashboard, not in the model or the dashboard, because that interface is rarely owned or written down as a versioned artefact.
- A data contract is the consumer-driven-contract pattern applied to the model-frontend seam: pin the payload shape, let either side change anything the contract does not mention, and catch violations at the boundary instead of in a wrong picture.
- The VeerNet mask-to-CSV contract has three clauses: a fixed length of 300 interpolated depth points per curve, a 3-class mask enumeration (background plus two curves), and a declared error band.
- The tolerance clause must tell the truth: VeerNet's per-curve MAE is 0.11 and 0.12, so a contract declaring a tighter band is not stricter but false, and it invites the dashboard to render precision the model cannot back.
- Writing the contract before the model is final lets the frontend build against a fixture and ship weeks ahead; the contract proved more durable than either the model or the dashboard, which is the signal that it, not the model, is the load-bearing interface.
References
[1] Fowler, M. Consumer-Driven Contracts: A Service Evolution Pattern. martinfowler.com (2006). Consumers publish the expectations a provider must honour, so the interface can evolve without breaking either side. https://martinfowler.com/articles/consumerDrivenContracts.html
[2] Fowler, M. IntegrationTest and the place of contract tests in the test pyramid. martinfowler.com (2018). Why a narrow, agreed interface is cheaper to verify than a broad integration test. https://martinfowler.com/bliki/IntegrationTest.html
[3] Bray, T. (ed.) The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259, IETF (2017). The interchange format the payload is serialised in and the rules for a well-formed document. https://www.rfc-editor.org/rfc/rfc8259
[4] Wright, A., Andrews, H., Hutton, B., and Dennis, G. JSON Schema, draft 2020-12. IETF (2020). A vocabulary for declaring and validating the structure of a JSON payload, which makes a data contract machine-checkable. https://json-schema.org/draft/2020-12/release-notes
[5] Postel, J. (ed.) Transmission Control Protocol. RFC 761 (1980). Source of the robustness principle, be conservative in what you send and liberal in what you accept, which every payload boundary inherits. https://www.rfc-editor.org/rfc/rfc761
[6] Newman, S. Building Microservices, 2nd edition. O'Reilly (2021). Versioned, independently deployable interfaces, and why parts that share internals instead of a contract have quietly become one part again. https://www.oreilly.com/library/view/building-microservices-2nd/9781492034018/