How Referee is Different

Referee applauds Desci Labs, ResearchHub, and other projects trying to improve how research is done. These initiatives are a great way to embed reliability throughout the research process, serving the same role as TLA+ and static/dynamic code analysis tools in application development. If the world adopted such tools, the need for pentests and cybersecurity bug bounty programs would be greatly reduced. But the world hasn't moved there yet, and there is still a ton of insecure spaghetti code everywhere. It's the same with published research: the world relies on research that has almost no measure of reliability attached to it, so the h-index needs to be modified, or at least paired with another measure. That is the problem Referee's reliability score is intended to fix. Additional differences include:

  • Referee uses a different reward paradigm, grounding bounties in the market theory of value. To be fair, the reward paradigm of the Desci Foundation is unclear from their articles, but it is likely based on the labour theory of value, as is usually the case for academic publications that do reward their referees. Both models can coexist, however, and in fact do in the cybersecurity domain. The value of the bounty system is that the payers of the rewards always get the value they want because they set the bounties. In the labour theory paradigm, referees can deliver more or less value than their compensation - you never know for sure. Were the critiques ones people care about, or just busy work to justify the reward? In the market theory paradigm, only results are rewarded, not effort.

  • Referee uses a tiered framework called the Common Academic Weakness Enumeration (CAWE), similar to the Common Weakness Enumeration (CWE) used for computer system vulnerabilities. A framework of this kind provides several important benefits (a rough sketch of how bounties could be keyed to CAWE entries follows the list below):

  • It ensures bounties can be specifically set on the weaknesses of greatest interest.

  • It helps avoid multiple bounty claims for the same weakness, a known problem in early bug-bounty systems.

  • It improves transparency and clarity on why a paper is considered unreliable.

  • It allows reliable large-scale studies on exactly how research is failing.

  • It enables the creation of a universal reliability score.
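The exact CAWE taxonomy is still to be defined, so the following is only a rough Python sketch of the idea: bounties are keyed to hypothetical CAWE identifiers, a given weakness can be claimed only once per paper, and confirmed weaknesses feed a toy reliability score. The identifiers, field names, and scoring rule are illustrative assumptions, not Referee's specification.

```python
from dataclasses import dataclass, field

# Hypothetical CAWE identifiers -- illustrative only, not the real enumeration.
CAWE = {
    "CAWE-12": "P-hacking / undisclosed multiple comparisons",
    "CAWE-31": "Non-reproducible code or data",
    "CAWE-47": "Statistical test inappropriate for the study design",
}

@dataclass
class Bounty:
    paper_doi: str
    cawe_id: str      # weakness class the funder wants examined
    reward: float     # amount offered, in whatever currency Referee supports

@dataclass
class PaperRecord:
    doi: str
    confirmed: set = field(default_factory=set)  # CAWE ids already claimed

    def claim(self, bounty: Bounty, cawe_id: str) -> bool:
        """Accept a claim only if it targets this paper, matches the bounty's
        weakness class, and that weakness has not been claimed before."""
        if bounty.paper_doi != self.doi or cawe_id != bounty.cawe_id:
            return False
        if cawe_id in self.confirmed:
            return False  # duplicate claims for the same weakness are rejected
        self.confirmed.add(cawe_id)
        return True

    def reliability_score(self) -> float:
        """Toy universal score: starts at 1.0 and drops with each confirmed weakness."""
        return max(0.0, 1.0 - 0.2 * len(self.confirmed))

paper = PaperRecord(doi="10.1234/example")
bounty = Bounty(paper_doi=paper.doi, cawe_id="CAWE-12", reward=500.0)
print(paper.claim(bounty, "CAWE-12"))   # True  -- first valid claim wins the bounty
print(paper.claim(bounty, "CAWE-12"))   # False -- duplicate claim is rejected
print(paper.reliability_score())        # 0.8
```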

  • Referee has a heavy focus on existing research, while the Desci Foundation seems more future-oriented. Why focus on the past at all? Because that's where nearly all the problems are. We hope the Desci Foundation sets a new standard for transparent research, but everything starts with cleaning up the past. Who will pay for this clean-up effort? Ideally, those who funded the research, such as the National Science Foundation in the US. In practice, the same bounty-driven tools that verify current research can also be applied inexpensively to past research that lives in preprint repositories and Google Scholar.

  • Referee envisions reputation staking. As outlined above, researchers would put their reputation (tokens) on the line by staking them on other researchers' papers. Staking would inform bounty rewards and help outsiders learn which research insiders consider reliable (a rough sketch appears at the end of this list).

  • Referee democratizes the human knowledge curation project. Academia is very much a status arena, and access to the most coveted status markers (institutions, journal reputation, etc.) is heavily guarded. Status markers will always exist, but in a decentralized world access will not be gated: anyone is capable of claiming a bounty or building an agent that scans for specific weaknesses. Such democratization is needed given how much research is being published. It's unclear if or how the Desci Foundation intends to democratize the process.

  • Referee will pay in digital fiat or decentralized currencies. As noted above, we believe contributors would appreciate being rewarded in a currency that buys goods and services in the real world.

  • Referee targets specific paper weaknesses. ResearchHub bounties are for ‘high-quality peer reviews’ based on five criteria (overall, impact, methods, results, and discussion), but the content within each is flexible. It’s not clear whether more than one reviewer can claim the bounty or whether the first reviewer’s judgement becomes the standard for all time. This is a problem with paying general bounties under the labour theory of value paradigm. With Referee, multiple parties can claim bounties for different paper weaknesses over time.

  • Referee encourages the use of AI agents to tackle the enormous number of articles that need to be reviewed. ResearchHub’s approach is more restrictive, tolerating AI use in conjunction with detailed human feedback but barring outright AI submissions. We believe AI agents are required on both ends - the submission of reward claims and the evaluation of those submissions. In the end, ResearchHub’s approach generates even more content for human review when that resource is already scarce.

  • Referee doesn’t require reviewers to disclose their own subject-matter weaknesses. If your submission meets the specific criteria for a bounty reward, then the bounty is yours. ResearchHub asks reviewers to include a section on their deficiencies to provide context for their reviews. That requirement stems from vague bounty criteria and, again, creates more content for outsiders to read. The reliability of these deficiency statements is also suspect, as they are self-reported without verification.
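Reputation staking is still at the concept stage. As one possible shape only, the sketch below assumes stakes are simple (researcher, paper, amount) records: their total acts as an insider signal of reliability, and that signal can weight suggested bounty sizes. Every name and formula here is an assumption for illustration, not a committed design.

```python
from collections import defaultdict

class StakeLedger:
    """Toy ledger of reputation tokens staked on papers (assumed design)."""

    def __init__(self):
        self.stakes = defaultdict(dict)  # paper_doi -> {researcher: amount}

    def stake(self, researcher: str, paper_doi: str, amount: float) -> None:
        current = self.stakes[paper_doi].get(researcher, 0.0)
        self.stakes[paper_doi][researcher] = current + amount

    def insider_signal(self, paper_doi: str) -> float:
        """Total tokens staked: a rough proxy for how reliable insiders think the paper is."""
        return sum(self.stakes[paper_doi].values())

    def suggested_bounty(self, paper_doi: str, base: float = 100.0) -> float:
        """One way staking could inform bounties: heavily backed papers attract
        larger rewards, since finding a weakness in them is worth more."""
        return base * (1.0 + 0.01 * self.insider_signal(paper_doi))

ledger = StakeLedger()
ledger.stake("alice", "10.1234/example", 50.0)
ledger.stake("bob", "10.1234/example", 25.0)
print(ledger.insider_signal("10.1234/example"))    # 75.0
print(ledger.suggested_bounty("10.1234/example"))  # 175.0
```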
