How I cope with the flood of arXiv papers
I hope it can help you as well
I don't
OK, that is a joke -- I believe I deal with the tons of papers rather well. Here is how.
How and where to look?
- Coarse-to-fine, or funnel. I check 100 paper titles + abstracts, skim through maybe 10, and read one or two papers carefully.
- "Features" I look for: topic, theory/practice, figure quality, how crowded the field is, potential impact.
- arXiv-sanity as the main provider, plus other sources.
Coarse-to-fine scheme
I check 100 paper titles + abstracts, skim through maybe 10, and carefully read one or two. Why? Most papers are not relevant to me as a computer vision researcher. Some of the papers are bad. For the good ones, the most important thing is the main message, not the details. And only a small number of papers are worth reading -- for me. For you it would be a different 1 or 2 papers out of 100, but likely not more.
Questions to ask yourself
I will try to show what I am looking for, and also to formulate for myself what is going on inside my "wet neural network" when I stare at the arXiv-sanity page. I ask myself the following questions, and if the answer to some of them is "yes", the paper gets downloaded. By "read" in this section I mean "skim through the abstract, conclusions, tables and figures to decide".
- My areas? Yes -> read.
WxBS (image matching), 3D reconstruction, image retrieval, metric learning, RANSAC, CNN initialization.
- Opens a new (sub-)area of research? Yes -> read.
E.g. the first papers on GANs, NeRF, Transformers, the lottery ticket hypothesis. It does not matter if the area is not relevant to me, e.g. image-to-image translation. New-area papers are always worth reading.
- From a crowded area? Yes -> skip.
How many papers on the topic are published every day? E.g., in 2014-2015 I followed the research on semantic segmentation, object detection and GANs. Now I mostly skip all papers related to GANs, segmentation and so on, because the improvements have become incremental AND those areas are not mine. Of course, from time to time I read a paper on GANs, but only if it comes through another channel, e.g. recommended to me by a colleague.
The two previous points can be seen as a kind of idf: normalizing a paper's score by the average number of papers on its topic.
- Dataset or large-scale benchmark paper? Yes -> read, regardless of the topic.
Why? It is useful to see how people gather data, clean the data, come up with metrics and so on.
- Simple baseline? Yes -> read, regardless of the area.
- About understanding some aspect of machine learning? Yes -> read if I have time.
E.g. padding, double descent, over-parametrization.
- Theory paper? Yes -> skip, unless it touches a topic that is very important to me.
- Relevant to me as a user? Yes -> read if I have time.
E.g. a new non-linearity, optimizer, etc.
- Am I a reviewer of that paper? If yes -> bad luck.
I really have to read this paper several times, regardless of anything else.
In addition, I use some kind of "paper gestalt": does it look like high-quality work? Is the title over-keyworded? These kinds of things are hard to verbalize.
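For readers who think better in code, the questions above can be sketched as a tiny rule-based filter. This is only an illustration under stated assumptions: the `Paper` fields and the `triage` and `idf_weight` names are hypothetical, and the order in which the rules fire is a guess, since the list does not say which question wins when several answers are "yes". The `idf_weight` helper uses the textbook `log(N / n)` form purely to illustrate the idf analogy; it is not a formula taken from the post.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical yes/no answers, filled in by hand after a look at title + abstract.
    title: str
    i_am_reviewer: bool = False
    my_area: bool = False
    opens_new_area: bool = False
    dataset_or_benchmark: bool = False
    simple_baseline: bool = False
    theory: bool = False
    important_topic_for_me: bool = False
    crowded_area: bool = False
    understanding_ml: bool = False
    useful_as_user: bool = False


def triage(p: Paper) -> str:
    """Map the answers to a decision. The rule order is an assumption."""
    if p.i_am_reviewer:
        return "read several times"  # bad luck, no way around it
    if p.my_area or p.opens_new_area or p.dataset_or_benchmark or p.simple_baseline:
        return "read"
    if p.theory:
        # Theory is skipped unless it touches a very important topic.
        return "read" if p.important_topic_for_me else "skip"
    if p.crowded_area:
        return "skip"
    if p.understanding_ml or p.useful_as_user:
        return "read if time permits"
    return "skip"


def idf_weight(papers_on_topic: int, total_papers: int) -> float:
    """Textbook idf, log(N / n): the more crowded the topic, the lower the weight."""
    return math.log(total_papers / papers_on_topic)


# Made-up example entries, not real papers.
print(triage(Paper("Yet another GAN variant", crowded_area=True)))  # skip
print(triage(Paper("A new benchmark", dataset_or_benchmark=True, crowded_area=True)))  # read
```

In this sketch the "always read" questions come before the "crowded area" question, so a benchmark paper from a crowded field still gets read. Swapping the order would give a stricter filter. The "paper gestalt" part is deliberately left out, as it does not reduce to a boolean.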
Use arXiv-sanity, not arXiv feed
First, given the overwhelming popularity of this site, it is sometimes down. That is good: if it is down, I do not check papers that day. Nothing bad will happen if I skip paper reading for a day.
Second, it shows thumbnails of the first 8 pages along with the abstract. This helps to make a more informed decision on whether to download the paper or not.
Additional (filtered) sources of papers
- Twitter feed. My Twitter is heavily curated -- if someone tweets about a paper, it is going to be relevant.
- ResearchGate "You have a new citation of". That is mostly how I find out about papers that use kornia, so that they can be promoted on the kornia Twitter.
- Google Scholar recommendations. They are slow: they appear a week after ResearchGate shows me the paper. But Google Scholar recommendations cover papers that are not on arXiv. Thanks to university access, I am able to download most of them, although not all.
That's all. Hope this helps :)