ArXiv at 20
Paul Ginsparg, founder of the preprint server, reflects on two decades of sharing results rapidly online — and on the future of scholarly communication.
Ginsparg, P. ArXiv at 20. Nature 476, 145–147 (2011). https://doi.org/10.1038/476145a
Comments
-
Alex REISNER
In the middle of 2001 Ginsparg left the Los Alamos National Laboratory, under less than amicable circumstances, for Cornell University, which allowed him to continue and extend arXiv, using it as a model for research into digital libraries. As Ginsparg described it, the last straw was a recent salary review that described him as "a strictly average performer by overall lab standards; with no particular computer skills contributing to lab programs; easily replaced, and moreover overpaid, according to an external market survey".
Peter Lepage, then chair of the Department of Physics at Cornell, commented sardonically: "Evidently their form didn't have a box for: 'completely transformed the nature and reach of scientific information in physics and other fields'." -
Stevan Harnad
--
ARXIV'S FUNDING PAINS MAY BE A WAKE-UP CALL: CENTRAL VERSUS INSTITUTIONAL ARCHIVING
Arxiv (1991) was an invaluable milestone on the road to Open Access. But it was not the first free research-sharing site: that began in the 1970s, with the internet itself, when authors started making their papers freely accessible to all users net-wide by self-archiving them in their own local institutional anonymous FTP archives.
With the creation of the world wide web in 1990, HTTP sites began replacing FTP sites for the self-archiving of papers on authors' institutional websites. FTP and HTTP sites were mostly local and distributed, but accessible free for all, webwide. Arxiv was the first important central HTTP site for research self-archiving, with physicists webwide all depositing their papers in one central locus (first hosted at Los Alamos). Arxiv's remarkable growth and success were due both to its timeliness and to the fact that it had emerged from a widespread practice among high-energy physicists that predated the web: sharing hard copies of their papers before publication by mailing them to central preprint distribution sites such as SLAC and CERN.
At the same time, while physicists were taking to central self-archiving, distributed self-archiving continued to grow in other disciplines (particularly computer science). Later web developments, notably Google and webwide harvesting and search engines, made distributed self-archiving ever more powerful and attractive. Meanwhile, under the stimulus of Arxiv itself, the Open Archives Initiative (OAI) was created in 1999, defining a metadata-harvesting protocol that made all distributed OAI-compliant websites interoperable, as if their local contents were all in one global, searchable archive.
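[Editor's note: the OAI harvesting model mentioned above is simple enough to sketch in a few lines. A harvester sends a ListRecords request to any OAI-compliant repository and parses the Dublin Core metadata it gets back. This is a minimal illustration using only the Python standard library; the arXiv endpoint and the "physics" set name are examples, not the only possibilities.]

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core namespace used in oai_dc metadata records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc", oai_set=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        params["set"] = oai_set  # optional: restrict the harvest to one set
    return base_url + "?" + urlencode(params)

def titles_from_response(xml_text):
    """Extract all Dublin Core titles from a ListRecords XML response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.iter(DC_NS + "title")]

# Example request URL against arXiv's OAI interface (illustrative):
print(list_records_url("http://export.arxiv.org/oai2", oai_set="physics"))
```

Because every compliant repository answers the same verbs with the same XML shape, a central service can harvest thousands of institutional repositories with exactly this kind of loop, which is the interoperability point being made here.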
Together, Google and OAI probably marked the end of the need for central archives. The cost and effort can instead be distributed across institutions, with all the essential search and retrieval functionality provided by automated central "overlay" services for harvesting, indexing, search and retrieval (e.g., OAIster, Scirus, BASE and Google Scholar). Arxiv continues to flourish, because two decades of invaluable service to the physics community have left several generations of users deeply committed to it. But no other dedicated central archive has arisen since. Like computer scientists, whose local, distributed self-archiving is harvested centrally by CiteseerX, economists, for example, self-archive institutionally, with central harvesting by RePEc.
In biomedicine, PubMed Central looks to be an exception, with direct central depositing rather than local. But PubMed Central was not a direct author initiative, like anonymous FTP, author websites or Arxiv. It was designed by NLM, deposit was mandated by NIH, and deposit is done not only by authors but by publishers.
Open Access is still growing far more slowly than it might, and one of the factors holding it back might be notional conflicts between institutional and central archiving. It is clear that Open Access self-archiving will have to be universally mandated, if all disciplines are to enjoy its benefits (maximized research access, uptake, usage and impact, minimized costs). The universal providers of all research paper output, funded and unfunded, are the world's universities and research institutions, distributed globally across all scholarly and scientific disciplines, all languages, and all national boundaries.
Hence funder self-archiving mandates like NIH's and institutional self-archiving mandates like Harvard's need to join forces to reinforce one another, and the most natural, efficient and economical way to do this is to mandate that all self-archiving should be done locally, in the author's institutional OAI-compliant repository. The contents of the institutional repositories can then be harvested automatically by central OAI-compliant repositories such as PubMed Central (as well as by google and other central harvesters) for global indexing and search.
In this light, Arxiv's self-funding pains may be a blessing in disguise: why should Cornell University (or a "wealthy donor") subsidize a cost that institutions can best "sponsor" by each doing (and mandating) their own distributed archiving locally (a cost thereby reduced, to boot)? After all, no one deposits directly in Google.
See: How to Integrate University and Funder Open Access Mandates
Stevan Harnad, EnablingOpenScholarship
-
Kai von Fintel
Steve Harnad says: "But no other dedicated central archive has arisen since".
That's not true, at least in my corner of the academic world (linguistics & philosophy). In linguistics, there are three dedicated central archives for the main subfields: the Semantics Archive, the Rutgers Optimality Archive and LingBuzz.
In philosophy, there's now PhilPapers (which does a lot of harvesting, but also accepts direct submissions).
I would think that a mixed model of both harvesting and direct submission, like LingBuzz and PhilPapers, is a good way to go.
-
Stevan Harnad
THE MEASURE OF A REPOSITORY (WHETHER CENTRAL OR INSTITUTIONAL)
The spontaneous (unmandated) annual self-archiving rate across all disciplines today is about 15%. That means about 15% of the papers published yearly are made open access by their authors, by self-archiving them free for all, online.
Hence 15% is the figure to beat, if one is arguing for local institution-internal self-archiving versus distal, institution-external (central) self-archiving. (Note that harvested self-archiving does not count in this, because the credit for the items in a harvested site goes to the repository from which they were harvested.)
In high-energy physics, the annual Arxiv self-archiving rate has been close to 100% for over a decade. There are similar success rates in other sectors of Arxiv, for example astrophysics. In the mathematics sector, Arxiv self-archiving rates are lower but still substantially higher than the 15% global baseline. In Arxiv's computer science sector, they may beat 15%, but they are far lower than the percentages for CiteseerX — but CiteseerX is a central harvester of computer science papers, rather like Google, harvesting papers that were self-archived on their authors' institutional websites.
Note that these figures are percentages based on guesstimates of the total annual paper output in each respective field, worldwide.
My question for Kai von Fintel: what percentage of the total annual paper output in — respectively — semantics, phonetic optimality theory and syntactic theory do you think the Semantics Archive, the Rutgers Optimality Archive and LingBuzz capture (by direct deposit)? (Since PhilPapers is largely harvested, the question there is moot.)
The benchmark against which to compare the amount by which these annual central deposit percentages exceed the global average of 15% is the annual institutional deposit percentages for institutions with a deposit mandate. (The percentage for unmandated institutions will be the spontaneous 15% baseline rate.) The institutional deposit rate is not based on the annual worldwide output in any given discipline or field, but on that institution's own total annual paper output, across all its disciplines.
On this basis, mandated institutional self-archiving reaches 60% within about two years of mandate adoption, and continues climbing toward 100% thereafter.
My prediction is that — with the exception of Arxiv (which is unmandated) and PubMed Central (which is mandated, but only for NIH-funded research, not all the rest of the world's biomedical research) — all other unmandated, unharvested central repositories will, exactly like unmandated institutional repositories, be hovering at about 15% of their respective total annual target output.
The only practical measure that has been demonstrated to raise these deposit rates is deposit mandates. Funder mandates are important, but not all research is funded, whereas all research, funded and unfunded, across all disciplines, originates from institutions (universities and research institutions). Hence the slumbering giant of Open Access is institutions: Once they mandate OA, we will have universal OA.
And here's where locus-of-deposit comes in. It's fine to say let a thousand flowers bloom, with "a mixed model of both harvesting and direct [deposit]," but that misses the two fundamental underlying problems: (1) how to get the deposit rate to exceed the 15% spontaneous baseline, and (2) how to fund central repositories.
The solution to both problems is simple: Both institutions and funders must mandate deposit, convergently, in the author's institutional repository and then central services can automatically harvest the content for global search.
This convergent, collaborative "mixed model" of direct deposit and harvesting makes sense and can work. The one that already shows signs of problems is a divergent, competitive "mixed model" in which authors deposit directly in either institutional or institution-external central repositories, depending on where mandates pull. This would either make authors — who are already sluggish about self-archiving spontaneously (85% don't do it unless mandated) — face the prospect of double-depositing their funded papers (once institutionally, once institution-externally), or force institutions (still sluggish about mandating self-archiving at all) to face the prospect of having to harvest back institutional papers that had been deposited institution-externally.
This divergent "mixture" is not only dysfunctional from a practical point of view; it is a disincentive to the adoption of institutional mandates. In contrast, convergent mandates from funders and institutions, both stipulating direct institutional deposit and automated central harvesting, would not only generate a great deal more OA and make institutions the allies of funders in monitoring and ensuring compliance with funder mandates, but would also fill central repositories while naturally distributing their costs across institutions.
See:
The importance of locus of deposit for OA
How to Integrate University and Funder Open Access Mandates
Stevan Harnad, EnablingOpenScholarship
-
Kai von Fintel
Dear Stevan (I got it right this time),
I'm all in favor of open access mandates and in fact was part of the ad hoc committee that designed MIT's mandate and shepherded it through the MIT Faculty Meeting. I was simply disputing your statement that there have been no dedicated central archives in the last two decades.
-
Brian Josephson
In 2005 I wrote in Nature: "ArXiv has become a vital communicative resource for the physics community. The moderators' attitude to any challenge to conventional thinking is likely to result in the loss to science of important innovative ideas." This, regrettably, is still the case today; the archive's administrators still engage in the unimaginative blocking of such ideas. Whether an idea is a deep one or one that is clearly misconceived makes no difference as, assisted by Ginsparg's ingenious mechanised sieving procedures, the moderators perform their rituals.
The way these flagging algorithms work has not been disclosed, but while they may well serve the purpose of stopping crank ideas they are equally effective at impeding significant innovative ones. Once an idea has been flagged, it is up to a moderator to decide whether to allow a paper to appear or not. If the moderator is in doubt, the usual response is to divert the paper to the section 'general physics'. The problem with this is that people tend only to look at papers in their own specialist sections, and do not look at 'general physics'.
That ought not to be a problem, since a cross-listing facility is provided; authors simply have to click on a link, pick a section, and the submission appears in the chosen area also. But the moderators have ways of stopping this in the case of 'problem papers', as this screengrab demonstrates. In this case, the paper was posted by an experienced physicist who had recently been endorsed for the subject area that he had designated for his preprint. Ginsparg says "Incoming abstracts are given a cursory glance by volunteer external moderators for appropriateness to their subject areas", but should a 'cursory glance' override the considered judgement of someone who has spent much time working on a paper?
It is true that in this particular case the majority of people in the subject area concerned, interested only in very technical issues, would have no interest in this particular paper. But they could very quickly ascertain this from the abstract, and would lose only a few moments over it. From the perspective of scientific advance, what is far more important is the minority who would find the ideas in the paper of interest and might help the advance of science by using their own specialist expertise to take the ideas further. As I noted in my letter in Nature referred to above, "Radical changes are required in the way the archive is administered".
-
Brian Josephson
'Science is organised common sense where many a beautiful theory was killed by an ugly fact' (T.H. Huxley). And now, by a supreme irony, it appears that supersymmetry in its basic form has met just this fate (see the BBC report LHC results put supersymmetry theory on the spot). Will arxiv's moderators now be declaring submissions on this subject, of which there are many, 'inappropriate for the archive'? Actually, I think not, because a paper being merely of doubtful validity, rather than being unorthodox and a threat to the ruling paradigm, has hardly ever been grounds for refusing an upload.
-
Brian Josephson
Repressive regimes and the arxiv
I recently received a request from an Iranian science magazine for an email interview. A number of insightful questions were asked, on a range of topics.
The arxiv, on the other hand, has more the air of a repressive regime. In order to try to ensure that they are not seen, ideas that the arxiv admin fear might rock the boat are declared 'inappropriate for cross-posting'. Requests for an explanation of such a declaration in the case of my own preprint in phys-gen, both to the moderation team and to Ginsparg, were ignored.
Even Rupert Murdoch agreed in the end that there were problems in his organisation. Will Prof. Ginsparg now do the same, or will it require the attention of an investigating committee before anything happens?
-
Davy Jones
I don't know much about arXiv, but I think I can characterize it like this:
(+) reliability, convenience of search
(-) strong bias in favor of speculative, unconfirmed research areas (like string theory) that happened to have reached dominance in academia; lack of transparency; lack of insurance and remedies against monopoly control; author registration limitation and endorsement requirement.
HEP is in a long-lasting and deepening crisis, and arXiv is contributing to it by placing an undue burden on those who may have new approaches to offer.
Davy@ jdeg