publications - Immature papers on arXiv

Sunday, 8 July 2018

I am from computer science, where we typically submit to conferences and less often to journals. Publishing pre-prints on arXiv is becoming more and more popular in my field. From the discussion here on AS I get the impression that arXiv is more than just putting something on a personal website.

For example, from "What to do when you spot a paper on arXiv with the same essential material as yours?" I gather that I should discuss arXiv papers if they are related to my work.

Now, a paper for a CS conference often contains about 75% theory and a 25% experimental section. The experimental section takes a lot of time, so I wonder (and fear) whether there is a trend to upload just the bare minimum to arXiv in order to get credit for the idea.

Is this a problem? Is it maybe similar to patent trolling: just uploading vague ideas in the hope of getting citations?

Edit: For clarification: I am working in the field of data mining, where it is common to include experiments to show that your idea does not only work in theory. This question/concern came up as we discussed the pros and cons of uploading our work to arXiv. We have never done this before, but it seems to be becoming more common in our field.

Edit 2: It seems that more people are concerned with this problem, especially in machine learning:

Yoav Goldberg:

This post is also an ideological action w.r.t arxiv publishing: while I agree that short publication cycles on arxiv can be better than the lengthy peer-review process we now have, there is also a rising trend of people using arxiv for flag-planting, and to circumvent the peer-review process. This is especially true for work coming from “strong” groups. Currently, there is practically no downside of posting your (often very preliminary, often incomplete) work to arxiv, only potential benefits.

Yoav Goldberg:

I do not mind posting papers quickly on arxiv. I recognize the obvious benefits of arxiv publishing and fast turnarounds. But one should also acknowledge its shortcomings. In particular, I am concerned about the conflation of science and PR that arxiv facilitates; the rich-get-richer effects and abuse of power; and some of the current arxiv publishing dynamics in the DL community. It is OK to post early on arxiv.

It is NOT OK to misrepresent and over-claim what you did. Sloppy papers with broad titles such as “Adversarial Generation of Natural Language” are harmful. It is exactly the difference between the patent system (which is overall a reasonable idea) and patent trolling (which is a harmful abuse).

[...]

Most people don’t read the papers in depth but only the title and sometimes the abstract and sometimes the intro. And when the papers come from established groups, people tend to trust the claims without verification. “Serious researchers” might not fall for this, but the general population sure does get mislead. And by the general population I mean people who are not actively working in this exact sub-field. This includes practitioners in industry, colleagues, prospective students, prospective reviewers of papers and grants. In the short time since this paper came out, I already heard, on several occasions, “oh, you are interested in generation? have you tried using GANs? I saw this recent paper in which they get cool results with adversarial learning for NLG”. This will be extremely harmful and annoying for NLG researchers who apply for grants in the coming year (remember, many grants are reviewed by a panel of capable but non-specialized experts), as they will have to either waste precious space and effort in dealing with this paper and with Hu et al and explaining why they are irrelevant, or be dismissed as working on this “already solved problem”, despite the fact that neither the paper in question nor Hu et al actually did very much, and despite the fact that both papers have terrible evaluations.

And the follow-up discussion on reddit

Answer

It seems to me that this question is less about the arxiv per se and more about how to navigate doing research in a very fast moving academic field.

I get the impression that arXiv is more than just putting something on a personal website.

It's certainly different. The main differences are:

(i) Many more people will see your paper.
(ii) Your paper will indeed be archived, essentially permanently. (Withdrawing a paper from the arxiv has the effect of uploading a new, empty version. Older versions are still there!) On your own website, you can take things down at least as quickly and easily as you can put them up.
(iii) Some (very obnoxious) journals may regard posting on the arxiv as "prior publication". (This is strictly unheard of in my field, mathematics. My guess is that CS is close enough to math so that it is at least very rare in yours.)
(iv) Minimum standards of completeness and professionalism are enforced on the arxiv. These are enumerated on the site itself, but the gist of it is that they are looking for manuscripts at the last step before conference/journal submission or later. They are not looking for early drafts.

Of these points, probably the last is most relevant to you. If it is standard in your subfield to include 25% experimental data [you say that is standard in "CS", but that is certainly not true across the entire field], then a paper uploaded to the arxiv without that would probably look to many in your field to be incomplete, which is against the spirit and perhaps the rules of the arxiv. So I wouldn't recommend it.

But the situation doesn't fundamentally change for papers that you or others post on your own website. The phrasing in your question suggests that you feel that you might not have to "be responsive" in the academic sense to papers that you find on people's webpages (only). That's not true. As an academic you have to be responsive to others' work wherever you find it.

In terms of the prospect of people uploading "the bare minimum to arXiv in order to get credit for the idea": is this an actual problem for you or just something you are wondering might be a problem? I have never encountered this problem in my work. That you are wondering whether it might be a problem makes me think you may be a quite new researcher and haven't fully grasped the way the academic community works. (Which is fine, and you have only to look forward to understanding it better. But you should talk to others, including advisors and mentors, to try to get a better idea.) Academia places a great premium on completed work for exactly this reason. If you put out a manuscript which, say, modifies an algorithm and hints that it could be faster in some situations, the most likely reaction you'll get is "Go on..."

This question may finally have made me understand what people on this site are on about when they say things like "An idea is worthless". An idea is certainly not worthless, but a vague and unimplemented idea is of highly uncertain value, to the point where rushing to publish "only the vague idea" would be a very poor, um, idea.

By the way, you don't have to immediately drop something because someone else had "the same idea" and put out a paper before you. Much -- perhaps most -- important academic work overlaps with other work and even more of it refines and extends the ideas of others. How to respond to seeing "your idea" in another paper is a topic for a different answer.

Finally, let me say: if what you've done, are doing or want to do has real value, then it is unlikely to be received with thunderous applause this week and totally ignored next week. If you're living in fear that someone else will say what you want to say, maybe slow down and find more to say.
