The Reviewer Said Unpublishable: Anatomy of a Peer-Review Rescue for Industrial AI Research

The word we did not expect to read in a second-round review letter was "unpublishable." Not "revise," not "insufficient," but the flat recommendation that the paper should not appear at all. It was one reviewer of three on a manuscript describing a Detection-Transformer model that picks fractures and bedding planes on borehole-image logs, work from a roughly twenty-month engagement with a mid-sized Middle East carbonate operator. The other reviewers wanted revisions. This one wanted the door shut, and gave three reasons for it.

What followed is the part of research that rarely gets written down: the exact sequence of moves that turned that reject recommendation into an acceptance. The constraint at the centre of it, a confidentiality agreement that forbids releasing code or data, is the normal condition of industrial AI, not the exception. Build models on a company's proprietary corpus and you will eventually meet a reviewer who treats closed code as disqualifying, and an editor who offers you an escape hatch you cannot use. This is how one rescue actually ran.

The three grounds for a reject recommendation

The reviewer's case was specific and, to be fair to it, technically literate. First, the train, validation, and test split was said to overestimate performance. We had partitioned at the patch level rather than holding out whole wells, because only fourteen wells existed in the entire company-provided corpus and sacrificing one to a test set removes a meaningful slice of the geology the model has to learn from. To a reviewer reading quickly, patch-level splitting reads as leakage. Second, we were charged with refusing to test the model on other wells. Third, with refusing to compare against popular segmentation methods. Two of the three grounds were accusations of refusal.

That framing stung because it was a misreading, but one we had handed the reviewer by not being explicit enough. There was no other well to test on; fourteen wells were the whole dataset. And we had not refused a comparison so much as judged it structurally impossible without saying so on the page. A reject recommendation built on "they refuse" is answerable only if you can show the refusals were never refusals.

The remedy the editor offered, and why we could not take it

The associate editor did something generous and, in context, almost impossible. Rather than side with the reviewer, the editor proposed a remedy: upload the model code to GitHub so an independent third party could verify it on public data. It is exactly the mechanism modern machine learning trusts. Open the code, point it at an open benchmark, let anyone re-run the numbers. For a benchmark paper it would have settled the question in an afternoon.

We could not do it. The code was trained on and entangled with a national oil company's proprietary well data, and the tripartite confidentiality agreement that let us touch that data forbids releasing it. We did the only honest thing available, which was to go back to the client and ask for permission to release the code anyway. They declined. The PhD candidate on the work is bound by the same agreements. The editor's escape hatch was shut, and it was shut by the very contract that made the research possible in the first place.

That is the hinge of the whole story. With the GitHub remedy off the table, we had no way to substitute open verification for the reviewer's doubt. We had to substitute something else, and the only currency left was evidence produced from inside the confidential boundary.

The rescue path of one fracture-detection manuscript, read left to right. A single reviewer recommended it unpublishable on three grounds: a train / validation / test design said to overestimate performance, an apparent refusal to test other wells, and an apparent refusal to compare against popular methods. The associate editor offered a way out: upload the code to GitHub so a third party could verify it on public data. The confidentiality agreement shut that door, and the door is the orange hinge of the whole story: only fourteen company-provided wells existed, and when the client was re-approached for permission to release code they declined. So the verdict was reversed by evidence rather than by argument, three additions all folded under the journal's 11,000-word ceiling: blind-well predictions from both eleven-well models on the remaining wells, a five-horizontal-well inclination study, and a mapping that showed a head-to-head comparison was infeasible rather than refused, because segmentation needs months of manual masks and never outputs dip or azimuth. Toggle the confidentiality block to trace the counterfactual: with it off, the editor's verification path lights; with it on, the real history routes through the three evidence nodes to acceptance. Every verdict, remedy, number, and evidence node is sourced from the second-revision response letter; only the arrangement of the flow is illustrative.

Three additions, one word ceiling

We answered each ground with a concrete artifact rather than a paragraph of defence.

For the charge that we refused to test other wells, we ran blind-well predictions. Both eleven-well models, the combined bedding-plus-fracture detector and the beddings-only variant, were pointed at the wells they had never seen from within the fourteen, and those predictions went into the results and discussion, the conclusion, and the supplementary material. That is the answer a reviewer cannot wave away: here are the predictions, on held-back geology, from the models as published.

For the charge that the validation overstated performance and that inclination was untested, we extended the study with five horizontal wells. Horizontal geometry is the hard case for a sinusoid detector, because the apparent amplitude of a planar feature shrinks as the borehole tilts into the bedding, so the patch size drops and the parameter estimates get noisier. Detection and localisation held up reasonably; parameter estimation was weaker on the small horizontal sample, and we said so rather than hiding it. Admitting the weaker corner of a result is itself an argument that the strong corners are real.

For the charge that we refused to compare against popular methods, we showed the comparison was infeasible, not declined, with the mechanics laid out below. In short: the popular alternatives are segmentation networks that output masks, not the dip and azimuth an interpreter needs. We never claimed to beat prior work; we claimed to do a different job.

Every one of these additions had to fit under the journal's 11,000-word limit. That ceiling is not a footnote. It forced the new figures into the supplementary material and forced the prose to be denser than any of us would have chosen. A rebuttal is not a place to write expansively; it is a place to spend a strict word budget on the smallest set of artifacts that each close one objection cleanly.

Why the comparison was never a comparison

The segmentation ground is the one reviewers press hardest and teams most often lose on, so it is worth being exact. A segmentation model labels pixels, fracture or not fracture, and its output is a mask. What an interpreter needs is not a mask but two numbers per feature: dip and azimuth. Those come only from fitting a sinusoid to each feature.

the sinusoid whose parameters we regress directly

y = A\,\sin\!\left(\tfrac{\pi}{180}\,x + \varphi\right) + \mathrm{offset}

The amplitude (A) encodes dip, the phase (\varphi) encodes azimuth, the offset is depth. A mask contains none of them; it hands you a shape you must then measure, after paying the months of manual masking a segmentation model needs before it can train at all. Our model skips both steps and emits the parameters directly. So "compare against segmentation" asks us to bench a method that outputs the answer against one that outputs a picture of where the answer might be. Drawing that as a workflow contrast, four steps against three, is what turned a suspected refusal into a demonstrated impossibility.

The unglamorous part

Not every fix was strategic. A second reviewer asked us to cite recent work on crack inpainting with denoising diffusion as relevant to discontinuous fractures, which was a fair ask, and we added it. A third reviewer sent a seventeen-item list of typographical corrections, real ones, "charachterization," "metod," "esitimation," the kind of errors that accumulate in a manuscript revised by several hands across time zones. We fixed all seventeen. It is tempting to treat that list as noise next to the unpublishable verdict, but it is not. A reviewer who finds seventeen typos is a reviewer who read every line, and a clean final manuscript signals that the authors took the whole review seriously, not only the parts that threatened the outcome. Rescue is partly evidence and partly the boring signal that you engaged in good faith.

The playbook, stripped to its parts

The vote reversed. The reject recommendation became an acceptance, on the strength of three additions and a workflow diagram, not on any concession about the confidentiality that started the fight.

The transferable lesson is narrow and, I think, durable. When your data is confidential, the open-verification remedy an editor reaches for is unavailable to you, and you cannot win by insisting the remedy is unreasonable. You win by producing evidence from inside the boundary that answers the specific doubt: blind predictions for "you did not test," an extended-geometry study for "your validation is soft," a workflow contrast for "you refused to compare." Each objection maps to an artifact, the artifacts fit the word budget, and the agreement never has to bend. A reject vote is not a verdict on your science. It is a description of what you have not yet shown, and confidential-data research is often the work of showing it without opening the box.

Limitations

This is one paper's rescue, not a controlled study of what reverses reviewers in general. The three evidence additions worked against the three grounds this reviewer raised; a different objection, say a demand for open data itself rather than open code, would have no analogous fix inside the same confidentiality boundary. The horizontal-well extension improved the case but on a small sample, and we reported its parameter estimates as weaker precisely because five wells cannot carry a strong quantitative claim. The word-limit constraint shaped which artifacts made the main text versus the supplementary, so what a reader sees is already a triaged subset of what we ran. And the account here is anonymised: specific well counts and geometries are reported, but the operator, the field, and the named wells are held back under the same agreement that drove the whole episode.

The Reviewer Said Unpublishable: Anatomy of a Peer-Review Rescue for Industrial AI Research

The three grounds for a reject recommendation

The remedy the editor offered, and why we could not take it

Three additions, one word ceiling

Why the comparison was never a comparison

The unglamorous part

The playbook, stripped to its parts

Limitations

Continuous AI for explorers

About Earthscan

Products

Legal

Follow us on