Why We Need To Fundamentally Rethink Scientific Publishing

Dec 1, 2021

The existing scientific journal system is broken. We discuss possible solutions, including what journals should select for and new possibilities that web3 technologies offer.

Science philosopher David Deutsch has stated that the purpose of science is the discovery of explanatory knowledge about the world that is both true (i.e. replicable and universal) and “hard to vary” (i.e. producing non-arbitrary explanations that are empirically falsifiable and don’t rely on appealing to authority and doctrine).¹

Falsification, criticism and the proposal of new explanations and discoveries are mediated by scientific journals. Ultimately — for the vast majority of fields — being published in prestigious scientific journals confers legitimacy to scientific work, attracts the attention of researchers worldwide, secures grants from funders for future research, and is essential for scientists to find jobs and to get promoted.

In other words, prestigious scientific journals have emerged as the gatekeepers of scientific legitimacy. But is the process of getting accepted in a top journal really the same thing as doing good science? And if not, is there a better solution that is both realistic and viable?

We believe there is. In this series, we look at the current system of scientific production and describe the structural problems that are arising from it, while also exploring how web3 technologies enable us to build a new system to address these problems.

Scientific journals as ranking and curating devices

Under the current paradigm of scientific production, scientists need to constantly provide evidence of their “productivity” in order to advance their careers (i.e. get hired or promoted) and to obtain funding for their future research plans, because this is how they are evaluated by their employers and by funding agencies.

One obstacle in this evaluation process is that evaluators hardly ever have the time to engage fully with the body of research each scientist has produced. Thoroughly studying all previous work of just one scientist would potentially require days, weeks, or even months.

This is an unrealistic demand on evaluators, even the most diligent and well-intentioned ones. Instead, evaluators are forced to rely on heuristics that make it easier to assess a scientist’s body of work, such as how many peer-reviewed publications a scientist has produced, and whether or not they were featured in top journals.

Publication in scientific journals is therefore the current key performance indicator for scientists. This metric has become the gold standard by which science is curated and ranked across many fields. Some journals are considered to be much more prestigious (i.e. harder to get into) than others, and these are weighted more heavily by evaluators of a scientist’s career, while also adding perceived legitimacy to the findings themselves.

The editors of scientific journals therefore exercise a great deal of influence in the scientific world; it is they who decide which submissions fall within their scope and are “good enough” to be evaluated in detail, and they are the ones who make a final decision about whether to accept or reject a submission based on the peer reviews they receive.²

A 2018 survey conducted by Publons gives a good summary of the current state of the peer review system. Notably, most peer reviews are anonymous (i.e. the authors don’t know who their referees were) and they are not shared publicly, even if an article gets accepted for publication. This implies a lack of accountability for what happened during the review process, exposing this critical part of the scientific production function to turf wars, sloppiness, arbitrariness, and conflicts of interest that can be easily hidden. Furthermore, the review process of journals is typically slow, taking months or years until publication, and it is often riddled with journal-specific, arbitrary submission requirements such as formatting instructions that waste scientists’ time as they bounce their submissions around from journal to journal until they finally find a publication outlet.³

Thus, scientific journals play a key gatekeeping role in scientists’ careers, but the way in which articles get selected or rejected by journals is typically opaque to the public, inefficient, and unaccountable. Furthermore, journals decide under which conditions the wider public can access the articles they accepted for publication. For the vast majority of journals, published articles are either hidden behind paywalls or, if they’re open-access, require substantial publication fees (thousands of dollars) which have to be paid by the authors or their employers. We’ll come back to the business model of journals further below.

Citations and impact

One popular proxy for the importance and quality of a scientific publication (its “impact”) is the number of citations it receives. The more citations an article receives, the more important it is perceived to be for the scientific discourse in a particular field. Citations are easy to count and to compare and have thus become popular quantitative heuristics for judging how successful scientists are. This gives scientists a powerful incentive to increase their citations as an end in itself.

But getting a lot of citations is not synonymous with conducting good science. One of many problems with using citations as a proxy for “impact”⁴ ⁵ is that scientific work takes time to disseminate and to accrue citations. On average, scientific papers reach their citation peak 2–5 years after publication.⁶ ⁷ This makes it almost impossible to use citation counts to evaluate the impact of scientists’ most recent work. Because funders and institutions need to make allocative decisions prior to the completion of a discovery’s citation lifecycle (i.e. “long term credit”), a more immediate cue is used (“short term credit”): the prestige of journals is used to judge the impact of recent work by scientists instead of the number of citations, which take a few years to accumulate. In many fields, it is almost impossible for a scientist to get hired or promoted without having at least one or several recent publications in “top journals”, i.e. those that are perceived as most prestigious and most difficult to get into.

The most salient proxy for the prestige of a journal is its impact factor,⁸ which measures the yearly mean number of citations of articles published in the last two years. The prestige of journals and their impact factor are metrics that by design pool reputation across all the papers published in a journal, irrespective of their actual individual quality and impact. But the distribution of citations within journals is typically highly skewed: about half the citable papers in a journal tend to account for 85% of a journal’s total citations.⁵ Because of these dramatic differences in citation patterns between articles published in the same journal, the impact factor of a journal is only a crude proxy for the quality and importance of papers published within a journal.⁹ Furthermore, the impact factor of small journals can be highly sensitive to the inclusion of one or a few articles that amass a high number of citations quickly.
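To make the skew concrete, here is a minimal Python sketch using made-up citation counts for a hypothetical small journal. The numbers and function names are illustrative assumptions, not data from any real journal. It computes an impact-factor-style mean, the share of citations captured by the most-cited half of the articles, and how far the mean moves when the single most-cited article is left out.

```python
from statistics import mean, median

def impact_factor(citations):
    """Mean number of citations received this year by articles from the last two years."""
    return mean(citations)

def top_half_share(citations):
    """Fraction of all citations that goes to the most-cited half of the articles."""
    ranked = sorted(citations, reverse=True)
    return sum(ranked[: len(ranked) // 2]) / sum(ranked)

# Hypothetical citation counts for ten articles (illustrative only).
citations = [120, 40, 22, 10, 8, 5, 3, 1, 1, 0]

jif = impact_factor(citations)        # mean over all ten articles
typical = median(citations)           # what a "typical" article receives
share = top_half_share(citations)     # share held by the five most-cited articles

# Drop the single most-cited article to see how sensitive the mean is.
jif_without_top = impact_factor(sorted(citations)[:-1])

print(jif, typical, round(share, 2), jif_without_top)
```

In this toy example the mean is 21 while the median article receives 6.5 citations, and removing one article roughly halves the mean. That is the sense in which a journal-level average says little about any individual paper.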

Journal impact factors also vary substantially across fields, partly as a function of the prevailing citation culture and the absolute size of an academic discipline, but also as a function of journal size and which types of publications are counted (e.g. letters, editorials, news items, reviews).⁶ Thus, the impact factor of a journal is partly driven by aspects that are unrelated to the quality of the articles it publishes.

The impact factor metric was not originally intended for its current usage as a proxy for journal quality. Instead, it was first devised by Eugene Garfield, and librarians started adopting it to help decide which journals to subscribe to.⁸ Because it has become an important part of journals’ reputations, for-profit subscription-based journals have learned to optimize their impact factor using a wide variety of tactics to game the system.¹⁰

When a metric is optimized for as a target, it often ceases to be a good metric of the underlying object of interest (i.e. the quality and importance of scientific publications).¹⁰ Scientists have been on the receiving end of the adoption of the impact factor and have adopted the norm — even while often decrying it — as a result of institutional demand for immediate proxies of scientific productivity. Problems associated with optimizing over this crude yardstick have been well documented,¹⁰ and despite repeated calls to abandon journal impact factors as a measure of scientific productivity of academics and institutions, it remains the most widely used metric for that purpose, partly due to a lack of agreement about what alternative measure should be used instead.¹¹ ¹²

Responding to the incentives created by this state of affairs, prestigious journals have learned to manage their article portfolio as one would diversify a portfolio of uncertain market bets. Essentially, editors are placing bets on papers according to the number of expected future citations a given article will generate; the more citations a journal’s portfolio generates, the better the impact factor, which in turn drives revenue.

But prestigious journals, once they achieve that prestige, also become market movers: because they have a large “market share” in the attention economy of scholars and journalists, articles published in their outlets are likely to garner more citations, creating a flywheel effect which consolidates the gains of the incumbent journals and makes them extremely hard to displace. A journal with a high impact factor is therefore likely to gather more citations than another journal that publishes an article of the same quality level, moving the impact factor even further away from being a useful metric.

Under the current incentive structure, novelty beats replicability

Independent replication of empirical results is critical to the scientific quest for better explanations for how the world works.¹³ ¹⁴ Without replicability, novel findings can be based on error or fabrication, and we are essentially relying on someone’s authority instead of objective proof. Unfortunately, replications do not score nearly as high in the prestige hierarchy of scientific publications as novel and surprising results. For example, only 3% of all journals in psychology explicitly encourage the submission of replication studies, while many journals explicitly state that they do not publish replications.¹⁵

Thus, scientists have little or no incentive to produce replicable research results. Instead, they face a “publish-or-perish” or even an “impact-or-perish” culture based on novelty and impact that shapes their success in the academy.¹⁰ One of the core issues surrounding the use of citations and impact factors as metrics for scientific productivity is that they do not account for the reproducibility of the published discoveries. Novel, surprising and provocative results are more likely to receive attention and citations, and are therefore sought after by editors and journals, even though novel and surprising findings are also less likely to be true.

The decoupling of replicability from commonly-used performance indicators has contributed to a raging replication crisis in many fields of science.¹³ ¹⁶ ¹⁷ ¹⁸ ¹⁹ ²⁰ The incentives for scientists to produce novel, attention-grabbing results are so strong that many cases of downright data manipulation and fraud have been reported.²¹ ²² ²³ Furthermore, poor research designs and data analysis as well as many researcher degrees of freedom in the analysis of data encourage false-positive findings.¹³ ¹⁶ ²⁴ As a result, recent large-scale replication studies of high-impact papers in the social sciences found that only ~60% of the original results could be replicated.¹⁷ ¹⁹ More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.²⁵ This means widely circulated results are about as likely to be right as wrong.

To make matters worse, non-replicable studies tend to be cited more than replicable studies,²⁶ and citation patterns of papers after strong contradictory replication results adjust only modestly.²⁷ As a result of this bias in favor of novelty and against replicability, the scientific endeavor is not self-correcting efficiently. Because citations in published articles are only looking backwards in time (i.e. they only reflect what parts of previously published literature were cited), it’s nearly impossible for readers of an article to ascertain whether a study’s novel findings are replicable and trustworthy or not. Journals also have an incentive not to facilitate replications because successful replications are not novel enough to garner a lot of attention (i.e. impact and citations), while unsuccessful replications undermine journals’ claims of quality assurance.

In the technical appendix, we explore in more detail the incentives of journal editors to select for novelty and against research that replicates existing results. In contrast, we consider what an ‘ideal’ criterion might look like that maximizes the value of the overall research enterprise. Replications, particularly the first few, would receive significantly more weight in an ideal system of how science is evaluated.

The current separation of replicability from impact, the lack of incentives to replicate existing work, and the lack of incentives to provide “forward-looking” visibility of replication outcomes all contribute to the precarious state of many scientific fields today.¹⁶ Fundamentally, there is a disconnect between the current practice of rewarding scientists for publishing as many “high impact” findings as possible and the goal of the scientific endeavour: developing reliable explanations.

Yet, despite the inherent flaws of this paradigm, prestigious journals and academic institutions continue to operate under it, and scientists have little choice but to play along because their professional future largely depends on it.

The business model of scientific journals

Traditional scientific journals require authors to transfer their copyrights to the publisher. Copyright is a type of intellectual property that gives its owner the exclusive right to make copies of a creative work, thereby creating monopoly power for the copyright owner to monetize the work. The market for scientific publications is largely dominated by five large for-profit companies (Elsevier, Wiley-Blackwell, Taylor & Francis, Springer Nature and SAGE), which together control more than 50% of the market.²⁸ Worldwide sales of access rights to scientific papers amount to more than USD 19 billion, which puts the scientific publication industry between the music industry and the film industry in terms of revenue.

The two leading business models of publication companies are “pay-for-access” and “pay-for-publication”. Both of these models rely on the unpaid labour of scientists to conduct peer review, which amounts to a multi-billion dollar donation from scientists to the publication industry. This boosts the profits of publishing houses, mostly with public funds or researchers’ private time, and denies scientists fair rewards for performing high-quality referee work.²⁹

In the pay-for-access model, journals charge subscription fees to individuals and institutions such as university libraries. Each individual journal usually charges hundreds of dollars for an annual subscription and access to individual articles typically costs between $20 and $100.

Institutional subscribers such as universities, libraries, and governments are presented with bundle “deals”, which often contain not only the most highly ranked journals of a publisher but also a large number of niche or low impact journals which the subscriber might not pay for if not included in a bundle. This practice of exploiting a dominant market position by bundling goods is a powerful anti-competitive strategy to consolidate that market position.³⁰ ³¹ ³² By essentially taking up a large chunk of a library’s budget in one deal, an incumbent can protect its market against competition from newcomers.

Journal subscriptions under this model are a huge burden on public funds.³³ For example, the UK spent $52.3 million for annual journal subscriptions in 2014,³⁴ and the Netherlands was paying over $14 million in 2018 for subscriptions by their public universities to the journals of just one large publishing house (Elsevier). Despite the substantial expenditure of public funds on journal subscription fees, the tax-paying public, which funds most of the research and the journal subscription fees, does not have access to the science its taxes pay for.

In the “pay-for-publication” model, authors pay a fee for each article they publish. In contrast to the “pay-for-access” model, these articles are published under an open-access agreement and are typically accessible to the public online. Publication charges vary across journals and article types, with typical publication fees ranging between $2,000 and $11,000.³⁵ Scientists either have to pay these fees out of their research budgets, or out of their own pockets, or they rely on their employers (e.g. universities) to cover the cost. The total number and the market share of “pay-for-publication” journals continue to grow each year.³⁶ ³⁷

There is a perverse incentive at the heart of the “pay-for-publication” model: authors of an article pay only upon acceptance of their manuscript. This means that, for every rejected manuscript, a journal loses money. Thus, open-access journals need to be less restrictive in their selection to sustain their business models. While open-access journals have lowered the barriers to accessibility of knowledge, and many are well-meaning and high-quality journals, the model as a whole has led to a worldwide epidemic of predatory journals and a lowering of standards, and has opened the floodgates for research of little to no value.³⁸ ³⁹ ⁴⁰ ⁴¹

Our science arbitration system is therefore stuck between a rock and a hard place: on one side, subscription-based publishers control the distribution channels and have proven resilient, immovable, and powerful forces of capital extraction from taxpayer funds. Their highly selective flagship journals allow for profitable bundle deals. Meanwhile, on the other side, the open-access model thrives on volume and has enabled the rise of predatory publishers worldwide, flooding the scientific literature with an onslaught of fraudulent, unsound, or even plagiarized reports masquerading as science.¹⁰

Finally, both the “pay-for-access” and the “pay-for-publication” models prevent the vast majority of scientists at low-prestige institutions and people from developing countries from participating in science, thereby exacerbating inequality and restricting opportunities for progress and development.

In recent years, we have witnessed a rise in free alternatives: preprint platforms such as bioRxiv, medRxiv, or SSRN, which allow scientists to post early versions of their manuscripts online. These preprint platforms follow the lead of physicists, who principally rely on arXiv for disseminating work in their communities. In a similar vein, economists rely on working paper platforms such as NBER, mostly because it often takes multiple years to be published in a reputable economics journal. However, preprints and working papers are not peer-reviewed and often differ substantially from the final version of the published manuscript or never get accepted for publication in a peer-reviewed journal at all. Thus, it is difficult or impossible for lay readers to evaluate if they can trust the results reported in these outlets. As we have seen during the COVID-19 pandemic, preprint platforms, especially in the medical field, can be misused to spread misinformation and unsound scientific results.⁴²

In summary, the current scientific publication ecosystem is highly exploitative and unfair; it restricts scientific progress and opportunities for development; and it primarily benefits the current oligopoly of scientific publishing houses and their shareholders at the expense of the public. While preprint platforms exist as an alternative to academic journals, they lack the rigor of peer-review and are more prone to be the source of incorrect information.

How Web3 technologies offer hope for the future

Technological innovations have historically enabled vast improvements in our ability to produce and share knowledge. Examples include the invention of printing (which made storing and distributing knowledge possible at scale), the development and improvement of scientific instruments, the Internet (which enabled immediate, worldwide access to computer programs, databases, and publications), and supercomputers that now permit fast processing of massive amounts of data.

The latest wave of innovation concerns human coordination at scale using web3 technologies, which enable a decentralized version of the Internet based on peer-to-peer networks that maintain a growing list of publicly available, tamper-proof records. Web3 is a powerful departure from the centralized, opaque, data-hoarding principles of web2, which underlie the attention economy, the success of companies such as Facebook and Google, and the proprietary, vertically-integrated platforms of oligopolistic scientific publishers.

In contrast to these, the core premise of web3 is the widespread distribution of ownership to users and the trustless, censorship-resistant execution of code orchestrated through distributed ledger technology. As web3 adoption gains steam and viable applications continue to be built, one intriguing question is whether elite journals could be restructured as a scientific cooperative on web3.

The potential benefit of restructuring the current scientific publishing paradigm on web3 is that it would enable scientists to earn a stake in the multi-billion dollar business of scientific publishing based on the soundness of their contributions. If this could be done successfully, it would materially address some of the challenges and problems that have arisen under the current, centralized model as outlined above. But while technologically feasible, it would likely be opposed by incumbents: leading publishers have treated shared ownership as a red line not to be crossed, preferring the mass resignation of their editors over setting a precedent so menacing to their bottom line.⁴³ The world’s best scientists create immense value both for the world and for publishers, and web3 offers a new paradigm for this value to be recognized.

Beyond returning the value created by scientists to scientists, web3 offers technological capabilities for new modes of cooperation, incentive systems, and remuneration instruments. As we have seen with DeFi, the finance industry is under pressure from the rise of programmable money (“money legos”). DAOs (decentralized autonomous organisations) are emerging at an increasing pace, ranging from financial services providers (e.g. MakerDAO) to digital art investment collectives (e.g. PleasrDAO). Web3 is burgeoning with radical experimentation, such as new modes of capital allocation for public goods through quadratic funding (e.g. Gitcoin), decentralized identity management, decentralized storage solutions (e.g. IPFS, Arweave, Filecoin), self-custodial collective wallets (e.g. Gnosis), and a blossoming DAO toolkit ecosystem (e.g. Aragon, Commons Stack).

Furthermore, the possibility of pseudonymous identities tied to scientific reputation offers a new horizon for keeping the identity of referees protected even in a completely open, transparent scientific evaluation system.⁴⁴ In web3, we can tie pseudonymous identities to real, highly valued contributions to scientific endeavours in a tamper-proof and auditable way. By combining such a “proof-of-skill” system with pseudonymity, we can create a scientific ecosystem that simultaneously promotes open debate and reduces bias.

At the heart of the web3 ethos is the dream of decentralizing the world towards a more merit-based distribution of value and ownership, and returning to individuals sovereignty over their finances, data, contributions and identity. Now that the building blocks are out there, there is much to be said for the promise of scientific journals as DAO collectives, channeling the value they create back to their communities.

A few pioneers have already taken steps to apply web3 to science, and a niche ecosystem already exists. VitaDAO is an example of a web3 project which brings together some of the world’s great laboratories in longevity research, with an emphasis on funding their efforts and holding a stake in the IP which results from them. Other projects such as ResearchHub are attempting to crowd-source curation of scientific work through Reddit-like social mechanisms.

The scale of the problem we face is truly global, and much of the future of humanity depends on our scientific engine’s ability to self-correct, falsify, criticize, and converge closer to the truth. In his book, David Deutsch argues that as long as these core properties are maintained, humanity has set course towards the beginning of an infinity of progress.¹ Unfortunately, there is empirical evidence that scientific progress has been steadily decelerating in the past few decades, with each dollar invested into science yielding smaller social returns over time.⁴⁵ One possible explanation of this worrying trend is that ideas are getting harder to find.⁴⁵ But the replication crisis and the open floodgates of bad science also point to the faulty functioning of our scientific validation apparatus as a source of decreasing returns to science.

Combined in the right way, web3 technologies could disrupt and substantially improve our engine for conferring scientific legitimacy, all while returning the value created by scientists to scientists.

Technical appendix

Scientific journals as agent-based black boxes that predict the value of manuscripts

To improve the current publication system, it would be useful to define an objective function that describes what journals should select for to maximize the contribution of publications to the creation of knowledge. Based on such an objective function, different selection mechanisms could be compared and ranked in their ability to contribute to the creation of knowledge. This is what we attempt to do here.

As a first step, we can conceptualize journals as prediction pipelines designed to sort and classify scientific work according to its expected value. Each participant in the evaluation process of a journal has a model of the world, or more precisely — of what constitutes valuable science. Participants may or may not agree on what they view as valuable science. And, typically, neither referees nor editors are explicit about what their personal evaluation criteria are. Let us call these potentially heterogeneous models of the world “black boxes”.

At each stage of the scientific publication process, these black boxes produce signals which are combined into a final prediction rendered by the editor. Provided the expected scientific value exceeds a certain journal-set standard, the work is accepted for publication. If it misses the mark, the work is rejected or invited for resubmission provided the referees’ requests can be thoroughly addressed.

Machine-learning framework: Scientific journals as ensemble learning

The majority of current scientific journals can be thought of as a 3-stage predictive process that combines predictions from different black-box algorithms. In machine learning, this is known as ensemble learning. Ensemble learning is the process of combining different predictive algorithms to increase predictive accuracy.⁴⁶ ⁴⁷ The editor, generally a senior scientist, performs an initial prediction (“the desk”) which constitutes the initial filtering on expected scientific impact. Passing the desk brings a paper into the next stage, which involves sending out the submission to peer reviewers. The reviewers perform their own predictions on the expected scientific value of the work. In the final stage, the editor weighs and aggregates these signals with their own to form a final prediction.
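The three stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: scores are on a 0 to 1 scale, the editor combines signals by a weighted average, and the thresholds, the weight, and the function name `evaluate_submission` are all hypothetical. Real editors do not publish their aggregation rule, which is exactly why the text calls them black boxes.

```python
from statistics import mean

def evaluate_submission(editor_score, reviewer_scores,
                        desk_threshold=0.5, editor_weight=0.4, standard=0.7):
    """Toy 3-stage journal pipeline; all parameter values are assumptions."""
    # Stage 1 ("the desk"): the editor's own prediction filters submissions.
    if editor_score < desk_threshold:
        return "desk reject", editor_score

    # Stage 2: each reviewer contributes an independent prediction.
    # (Here the scores are passed in; in practice they are only collected
    # for submissions that pass the desk.)
    reviewer_signal = mean(reviewer_scores)

    # Stage 3: the editor aggregates the reviewers' signal with their own.
    combined = editor_weight * editor_score + (1 - editor_weight) * reviewer_signal
    decision = "accept" if combined >= standard else "reject or revise"
    return decision, combined

print(evaluate_submission(0.8, [0.9, 0.6, 0.75]))
print(evaluate_submission(0.3, [0.9, 0.9, 0.9]))
print(evaluate_submission(0.6, [0.5, 0.7]))
```

Note that in the second call the submission never reaches the reviewers, however favourable their scores would have been: the desk stage gives a single black box the power to veto.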

Agent-based framework: Effort and truth are necessary to prevent noise, collusion and sabotage

In an ideal world, every black box involved in the process a) expends maximum effort and b) truthfully reports its prediction. The former is required because these models of the world are costly to apply: the detailed and minute work required to evaluate the soundness of the methodology and the justifications for the conclusion is a time-consuming process. Every submission is a high-dimensional input that needs to be broken down and evaluated on multiple dimensions to determine its expected scientific impact. By expending insufficient effort, the prediction turns to noise.

When predictions are not reported truthfully, we run the risk of unwarranted gatekeeping (sabotage). Likewise, there is a threat of collusion between authors and peer reviewers to provide each other with inflated reviews. Noise, sabotage and collusion are three failure modes of modern scientific journals’ peer-review process and can only be averted through effort and honesty. This is a particularly acute problem because peer reviewers (and often editors) work pro bono for the publishing house, and there is little to no benefit in providing effortful reviews.¹⁰ ²⁹
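The noise failure mode can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that each reviewer reports the true value of a paper plus Gaussian noise, and that lower effort means a larger noise standard deviation. The true value, the noise levels, and the function name are hypothetical choices; the sketch does not model sabotage or collusion, which are biases and do not average out.

```python
import random

def review_error(noise_sd, n_reviewers, trials=2000, seed=0):
    """Root-mean-square error of the averaged reviewer signal (toy model)."""
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    true_value = 0.6                 # hypothetical true value of the paper
    squared_error = 0.0
    for _ in range(trials):
        signals = [true_value + rng.gauss(0, noise_sd) for _ in range(n_reviewers)]
        estimate = sum(signals) / n_reviewers
        squared_error += (estimate - true_value) ** 2
    return (squared_error / trials) ** 0.5

careful = review_error(noise_sd=0.1, n_reviewers=3)    # high effort
careless = review_error(noise_sd=0.4, n_reviewers=3)   # low effort
crowd = review_error(noise_sd=0.4, n_reviewers=9)      # low effort, more reviewers

print(round(careful, 3), round(careless, 3), round(crowd, 3))
```

Under these assumptions, careless reviews produce a much less accurate aggregate than careful ones, and adding more careless reviewers only partly compensates.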

Formalizing the scientific journal

In an abstract sense, we can think of a research work as determining the truth of a hypothesis, by offering new evidence that is, ideally, very convincing (but may in fact not be so). A hypothesis has the form that condition X leads to outcome Y. The quality of the research contribution (Q) depends on how much we learn (L), i.e. how much the information increases our confidence in the hypothesis, and how important the hypothesis is to the scientific enterprise overall (V). That is, let Q=V∙L.

The value of new knowledge depends on its implications, given our existing knowledge base, and on the potential proceeds from those implications, for example new inventions. These things are difficult to observe. Even similarly qualified referees and editors may disagree to an extent on what V is, because of their subjective understanding of current knowledge, their skill and imagination in envisioning future impact, and their perception as to which problems are most important to solve. We just take it as given here that there is a meaningful true V, and that readers of scientific work “guess” at it. Greater ability tends to produce better guesses.

How much we learn can be understood with reference to Bayes’ rule, P(Y|X)=P(Y)∙P(X|Y)/P(X), where P(Y) is the prior likelihood that outcome Y occurs, and P(Y|X) is the posterior likelihood (when condition X holds in the data). P(Y|X) measures the strength of the inference that X entails Y. We denote this by R. P(X|Y)/P(X) measures how much more likely it is that condition X is observed when the outcome is Y. In other words, P(X|Y)/P(X) captures the information contained in X about Y. We define P(X|Y)/P(X)=1+I, so that I=0 reflects that X is as likely to occur with Y or without Y, and therefore nothing was learned from studying condition X. If I is different from 0, then X changes our expectation of Y. We can write P(Y)=R/(1+I), and therefore L≡P(Y|X)-P(Y)=R-R/(1+I). (Here we assume that positive relationships between X and Y are being tested, i.e. I≥0. There is no loss of generality, since Y can always be relabeled as the opposite outcome to make a negative relationship positive.)
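A short numerical example may help to keep the symbols apart. The Python sketch below computes R, I and L from the three inputs of Bayes’ rule; the probabilities are made-up values chosen for easy arithmetic, and the function name `learning` is an assumption of this sketch.

```python
def learning(p_y, p_x_given_y, p_x):
    """Return (R, I, L) as defined in the text, given P(Y), P(X|Y) and P(X)."""
    info = p_x_given_y / p_x - 1               # I, from P(X|Y)/P(X) = 1 + I
    posterior = p_y * (1 + info)               # R = P(Y|X), by Bayes' rule
    gain = posterior - posterior / (1 + info)  # L = R - R/(1+I) = P(Y|X) - P(Y)
    return posterior, info, gain

# Hypothetical study: prior P(Y) = 0.2, and X is twice as likely when Y holds.
R, I, L = learning(p_y=0.2, p_x_given_y=0.6, p_x=0.3)
print(R, I, L)

# Uninformative condition: X is equally likely with or without Y, so I = 0.
R0, I0, L0 = learning(p_y=0.2, p_x_given_y=0.3, p_x=0.3)
print(R0, I0, L0)
```

In the first case the posterior doubles from 0.2 to 0.4, so L is 0.2. In the second case the posterior equals the prior and nothing is learned.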

The quality of a contribution can now be expressed as Q=V∙(R-R/(1+I)), where V is the (projected) value of being able to predict outcome Y, R is the degree to which Y depends on condition X, and I captures how our beliefs about Y changed due to this research. Note that R and I both affect Q positively, and Q≤V. When nothing new was learned (I=0), or when the condition does not predict the outcome (R=0), or when predicting the outcome is irrelevant (V=0), then Q=0. Note that a replication of a prior result can be a quality contribution, since it might significantly increase support for a hypothesis, especially when it is one of the first replications.¹³ ¹⁴ A negative result (where Y does not occur under condition X) can also be a quality contribution if it corrects the current prior.
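The boundary cases above can be checked with a short Python sketch. The function name `quality` and all input values are hypothetical, chosen only to exercise the formula Q=V∙(R-R/(1+I)).

```python
def quality(v, r, info):
    """Q = V * (R - R / (1 + I)), defined for I > -1."""
    return v * (r - r / (1.0 + info))

# Each of the three degenerate cases gives Q = 0.
print(quality(v=10.0, r=0.4, info=0.0))  # nothing new was learned (I = 0)
print(quality(v=10.0, r=0.0, info=1.0))  # condition does not predict outcome (R = 0)
print(quality(v=0.0, r=0.4, info=1.0))   # predicting the outcome is irrelevant (V = 0)

# A positive contribution: Q grows with R and with I, but stays at or below V.
print(quality(v=10.0, r=0.4, info=1.0))
print(quality(v=10.0, r=0.8, info=1.0))
print(quality(v=10.0, r=0.8, info=9.0))
```

Because R is a probability and I/(1+I) is below 1 for any I≥0, the learned amount L never exceeds 1, which is why Q cannot exceed V.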

An interesting, and probably common, case arises if a paper reports surprising results that are potentially paradigm-shifting, but the results turn out to be false. Intuitively, Q might be smaller than zero in this case, because an influential result that is false could do substantial damage, both in wasted time and effort by scientists and in welfare consequences for society. For example, irreproducible pre-clinical trials create indirect costs for patients and society.⁴⁸ Furthermore, future research that builds on the false discovery may not only waste resources, it may also derail scientific progress into further false discoveries.

When an error is made in the Bayesian model, the evidence does not justify the conclusions. Suppose the hypothesis is misspecified, and the relationship between condition and outcome is actually negative (I<0), but mistakenly reported as positive. Then L=R-R/(1+I)<0, which would make the quality of the contribution Q negative.
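To see why the sign of I drives the sign of L, here is a small Python sketch that sweeps I across negative and positive values for a fixed R. The function name `learned` and the numbers are illustrative assumptions; the sketch also relies on I>-1, so that 1+I stays positive.

```python
def learned(r, info):
    """L = R - R / (1 + I), which equals R * I / (1 + I) when I > -1."""
    return r - r / (1.0 + info)

r = 0.4  # hypothetical reported strength of inference
for info in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(f"I={info:+.1f}  L={learned(r, info):+.3f}")
```

Rewriting L as R∙I/(1+I) makes the point directly: with R>0, L is negative exactly when I is negative, and multiplying by a positive V then gives a negative Q.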

If we think of scientific progress as a linear process, a positive Q value implies that the new discovery makes some kind of positive contribution to scientific advance. A false discovery may not only not contribute to our knowledge, it may actually add confusion and entropy, resulting in scientific regress. Nevertheless, an editor might publish such a paper, misjudging Q.

The stated purpose of scientific journals is to publish contributions that advance knowledge (Q > 0). It is useful at this point to differentiate between what journals should be evaluating in order to advance knowledge (i.e. the normative case) and what journals actually do in practice (i.e. the descriptive case).

In the normative case (i.e. an ideal world), the predictive algorithm of journals should try to identify papers that have high Q values. This is complicated by the fact that the true value of a contribution is inherently difficult to assess and influenced by subjective insight and preferences. In addition, referees and editors need to exert effort to confirm the objective validity of the analysis, but they are not rewarded for doing so.

We shall denote the predicted quality of the contribution by Q’=f(V’,R’,I’), where primes indicate estimated quantities. Referees and editors will not necessarily evaluate Q according to the Bayesian model, but may assign subjective weights to each component. V’ is to a large degree subjective; R’ and I’ can in principle be determined more objectively, but getting them right is effort-intensive, so the task is left mostly to referees. The referees make a report m, the accuracy of which depends on effort e∈[0,1]. In general, m(e)=t+ρ∙(1-e), where t is the true value and ρ is a random variable that is symmetrically (e.g. normally) distributed around zero. Note that the larger the effort, the smaller the potential error ρ∙(1-e).
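The report model can be simulated in a few lines of Python. This is a minimal sketch that assumes ρ is normally distributed with standard deviation `sigma`; the function name `referee_report`, the value of `sigma`, and the true value used are all hypothetical.

```python
import random
import statistics

def referee_report(t, effort, sigma=1.0, rng=random):
    """One report m(e) = t + rho * (1 - e), with rho drawn from Normal(0, sigma)."""
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must lie in [0, 1]")
    rho = rng.gauss(0.0, sigma)
    return t + rho * (1.0 - effort)

rng = random.Random(0)  # fixed seed so the run is repeatable
true_value = 2.0        # hypothetical true value t
for effort in (0.0, 0.5, 1.0):
    reports = [referee_report(true_value, effort, rng=rng) for _ in range(10_000)]
    spread = statistics.pstdev(reports)
    print(f"effort={effort:.1f}  spread of reports={spread:.3f}")
```

The spread of reports shrinks in proportion to (1-e), and at full effort every report equals the true value.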

In a typical process, the Editor, i = 1, performs a first scan of the submission. The submission is sent out for formal review if the editor believes it passes some minimum threshold, which is influenced by the editor’s relative preference for novelty, replicability, etc. If the paper is sent out by the editor for formal review, the referees similarly evaluate it, again giving potentially different weights to different criteria.

The editor then summarizes the evaluations to arrive at a final decision on the paper. If estimated quality exceeds the journal-specific threshold, the paper is either published or returned for a revise-and-resubmit, in which case the process is repeated; otherwise it is rejected.

Some adverse incentives journal editors face tend to bias these decisions toward novelty and against replications. Controversial, or otherwise attention-grabbing, results will tend to garner citations as researchers try to verify them. If maximizing reputation through citations is a goal, then it is rational for journals not to incentivize and reward replication efforts, although they are a crucial component of the scientific enterprise. Replications also suffer from the dilemma that they are, provocatively put, “not interesting” or “not credible.” If a replication study confirms the original result, or negates a result that was published recently and is not yet widely known, it may not be viewed as noteworthy. If it fails to confirm a well-known result, it will likely face doubt. Moreover, if only negative replications are “novel” enough to be publishable in a well-regarded journal, researchers face substantial risk (as well as bias) in attempting such a study, given that it might yield a positive result.

These aspects suggest that the “estimated quality” of an article will be based on weights that do not correspond to the Bayesian learning framework and may reflect differences in priorities between the editor and the referees, who are less motivated to generate future citations for the journal. Ultimately, referee judgments may be reflected in the final decision to a lesser extent than appears, and this would further reduce the referees’ incentive to commit effort.

To summarize the above points:

Editors and referees will not necessarily evaluate articles according to consistently weighted criteria, and their judgments may well deviate from the best possible prediction of true quality.
In particular, editors have incentives to weight novelty more strongly than replicability, and referees have incentives to limit their efforts to verify scientific accuracy. This can lead to a published literature with many low-quality papers (even if referees exert maximum effort due to intrinsic motivations).

Given the small number of referees and editors that evaluate each paper for each journal and their potential heterogeneity, the distribution of realized quality of publications will have a high variance across journals, and each submission of a paper to a different journal is akin to a lottery draw. Since journals require that the papers they evaluate are not under consideration at a different journal at the same time, this implies a substantial loss of time between the moment of first submission to a journal and the point where an article actually gets published. It also implies substantial costs for the authors of the submission, given that many journals have different formatting requirements etc. Thus, the current practice of curating and evaluating scientific contributions is inefficient and a waste of (public) resources.
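The lottery character of submission can be illustrated with a small simulation. The sketch below rests on several assumptions that are not in the text: the editor simply averages the referee reports, every referee exerts the same effort, report noise is normal, and all journals share one threshold. The function names and numbers are hypothetical.

```python
import random
import statistics

def journal_estimate(t, n_referees, effort, rng, sigma=1.0):
    """Average of n referee reports, each of the form t + rho * (1 - e)."""
    reports = [t + rng.gauss(0.0, sigma) * (1.0 - effort) for _ in range(n_referees)]
    return statistics.fmean(reports)

def acceptance_rate(t, threshold, n_referees, effort, n_journals=2_000, seed=0):
    """Share of simulated journals whose averaged estimate exceeds the threshold."""
    rng = random.Random(seed)
    accepted = sum(
        journal_estimate(t, n_referees, effort, rng) > threshold
        for _ in range(n_journals)
    )
    return accepted / n_journals

# Hypothetical paper whose true quality (1.0) sits below the bar (1.2).
for n in (2, 20, 50):
    rate = acceptance_rate(t=1.0, threshold=1.2, n_referees=n, effort=0.5)
    print(f"referees={n:>2}  share of journals accepting={rate:.3f}")
```

Under these assumptions, a paper below the bar is still accepted by a sizable share of journals when only two referees read it, and the share falls as more evaluators are averaged. That is the sense in which resubmitting to another journal resembles drawing another lottery ticket.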

If replicability were over-emphasized, the literature would be dominated by true findings, but there would be little or no advance in what we reliably know.

In an ideal world where journals achieve their stated objective of publishing papers of the highest possible quality:

(a) A logically derived rule is employed for predicting quality from the estimated strength of evidence and novelty of research work.

(b) Referees are given extrinsic incentives to put effort into verification and report truthfully.

If (a) and (b) are fulfilled, progress in the scientific literature would be faster if journals were to allow the simultaneous submission of papers to different publication outlets, and if more researchers were involved in the evaluation process.

Authors: Philipp Koellinger, Christian Roessler, Christopher Hill

Philipp Koellinger: DeSci Foundation, Geneva, Switzerland; University of Wisconsin-Madison, La Follette School of Public Affairs, Madison, WI, USA; Vrije Universiteit Amsterdam, School of Business and Economics, Department of Economics, Amsterdam, The Netherlands

Christian Roessler: Cal State East Bay, Hayward, CA, USA

Christopher Hill: DeSci Foundation, Geneva, Switzerland

References

Deutsch, D. The Beginning of Infinity: Explanations That Transform the World. (Penguin Books, 2012).
Goldbeck-Wood, S. Evidence on peer review — scientific quality control or smokescreen? BMJ 318, 44–45 (1999).
Huisman, J. & Smits, J. Duration and quality of the peer review process: the author’s perspective. Scientometrics 113, 633–650 (2017).
MacRoberts, M. H. & MacRoberts, B. R. Problems of citation analysis. Scientometrics 36, 435–444 (1996).
Adam, D. The counting house. Nature 415, 726–729 (2002).
Amin, M. & Mabe, M. A. Impact factors: use and abuse. Medicina 63, 347–354 (2003).
Min, C., Bu, Y., Wu, D., Ding, Y. & Zhang, Y. Identifying citation patterns of scientific breakthroughs: A perspective of dynamic citation process. Inf. Process. Manag. 58, 102428 (2021).
Garfield, E. The history and meaning of the journal impact factor. JAMA 295, 90 (2006).
Aistleitner, M., Kapeller, J. & Steinerberger, S. Citation patterns in economics and beyond. Sci. Context 32, 361–380 (2019).
Biagioli, M. & Lippman, A. Gaming the Metrics: Misconduct and Manipulation in Academic Research. (MIT Press, 2020).
Seglen, P. O. Why the impact factor of journals should not be used for evaluating research. BMJ 314, 498–502 (1997).
Moed, H. F. Citation analysis of scientific journals and journal impact measures. Curr. Sci. 89, 1990–1996 (2005).
Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
Moonesinghe, R., Khoury, M. J. & Janssens, A. C. J. W. Most published research findings are false - But a little replication goes a long way. PLoS Med. 4, e28 (2007).
Martin, G. N. & Clarke, R. M. Are psychology journals anti-replication? A snapshot of editorial practices. Front. Psychol. 8, 523 (2017).
Smaldino, P. E. & McElreath, R. The natural selection of bad science. R Soc Open Sci 3, 160384 (2016).
Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
Camerer, C. F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav 2, 637–644 (2018).
Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl. Acad. Sci. U. S. A. 112, 15343–15347 (2015).
Verfaellie, M. & McGwin, J. The case of Diederik Stapel. American Psychological Association https://www.apa.org/science/about/psa/2011/12/diederik-stapel (2011).
Grieneisen, M. L. & Zhang, M. A comprehensive survey of retracted articles from the scholarly literature. PLoS One 7, e44118 (2012).
Callaway, E. Report finds massive fraud at Dutch universities. Nature 479, 15 (2011).
Schweinsberg, M. et al. Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis. Organ. Behav. Hum. Decis. Process. 165, 228–249 (2021).
Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).
Serra-Garcia, M. & Gneezy, U. Nonreplicable publications are cited more than replicable ones. Sci Adv 7, (2021).
Hardwicke, T. E. et al. Citation patterns following a strongly contradictory replication result: Four case studies from psychology. Adv. Methods Pract. Psychol. Sci. 4, 251524592110408 (2021).
Hagve, M. The money behind academic publishing. Tidsskr. Nor. Laegeforen. 140, (2020).
Aczel, B., Szaszi, B. & Holcombe, A. O. A billion-dollar donation: estimating the cost of researchers’ time spent on peer review. Res Integr Peer Rev 6, 14 (2021).
Adams, W. J. & Yellen, J. L. Commodity bundling and the burden of monopoly. Q. J. Econ. 90, 475–498 (1976).
Greenlee, P., Reitman, D. & Sibley, D. S. An antitrust analysis of bundled loyalty discounts. Int. J. Ind Organiz 26, 1132–1152 (2008).
Peitz, M. Bundling may blockade entry. Int. J. Ind Organiz 26, 41–58 (2008).
Bergstrom, C. T. & Bergstrom, T. C. The costs and benefits of library site licenses to academic journals. Proc. Natl. Acad. Sci. U. S. A. 101, 897–902 (2004).
Lawson, S., Gray, J. & Mauri, M. Opening the black box of scholarly communication funding: A public data infrastructure for financial flows in academic publishing. Open Library of Humanities 2, (2016).
Else, H. Nature journals reveal terms of landmark open-access option. Nature 588, 19–20 (2020).
Laakso, M. & Björk, B.-C. Anatomy of open-access publishing: a study of longitudinal development and internal structure. BMC Med. 10, 124 (2012).
Solomon, D. J., Laakso, M. & Björk, B.-C. A longitudinal comparison of citation rates and growth among open-access journals. J. Informetr. 7, 642–650 (2013).
Clark, J. & Smith, R. Firm action needed on predatory journals. BMJ 350, h210 (2015).
Grudniewicz, A. et al. Predatory journals: no definition, no defence. Nature 576, 210–212 (2019).
Richtig, G., Berger, M., Lange-Asschenfeldt, B., Aberer, W. & Richtig, E. Problems and challenges of predatory journals. J. Eur. Acad. Dermatol. Venereol. 32, 1441–1449 (2018).
Demir, S. B. Predatory journals: Who publishes in them and why? J. Informetr. 12, 1296–1311 (2018).
Brierley, L. Lessons from the influx of preprints during the early COVID-19 pandemic. Lancet Planet Health 5, e115–e117 (2021).
Singh Chawla, D. Open-access row prompts editorial board of Elsevier journal to resign. Nature (2019) doi:10.1038/d41586-019-00135-8.
Increasing Politicization and Homogeneity in Scientific Funding: An Analysis of NSF Grants, 1990–2020 — CSPI Center. https://cspicenter.org/reports/increasing-politicization-and-homogeneity-in-scientific-funding-an-analysis-of-nsf-grants-1990-2020/ (2021).
Bloom, N., Jones, C. I., Van Reenen, J. & Webb, M. Are Ideas Getting Harder to Find? Am. Econ. Rev. 110, 1104–1144 (2020).
Polikar, R. Ensemble Learning. in Ensemble Machine Learning: Methods and Applications (eds. Zhang, C. & Ma, Y.) 1–34 (Springer US, 2012).
Sagi, O. & Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8, e1249 (2018).
Begley, C. G. & Ellis, L. M. Raise standards for preclinical cancer research. Nature 483, 531–533 (2012).