# Paperscape

August 18, 2013 4:33 AM

Paperscape is a searchable 2-dimensional visualization of the 800,000+ scientific papers (mostly in physics and math) on the arXiv preprint server.

Neato. I hope they will analyze common paper tags to automatically generate (sub)subtopic labels, because that level of detail would be cool to explore.

[On 2nd thought… is this semantic information even automatable? etc.]

posted by polymodus at 4:56 AM on August 18, 2013

This whole explanation bit is quite hard to find, but makes everything make much more sense:

*Each paper in the map is represented by a circle, with the area of the circle proportional to the number of citations that paper has. In laying out the map, an N-body algorithm is run to determine positions based on references between the papers. There are two “forces” involved in the N-body calculation: each paper is repelled from all other papers using an anti-gravity inverse-distance force, and each paper is attracted to all of its references using a spring modelled by Hooke’s law. We further demand that there is no overlap of the papers.*

*The map is rendered simply as a solid circle for each paper. The colour of the circle denotes the arXiv category of the paper, and the brightness indicates age. Brightness is sometime difficult to discern, and we are working on adding a heat-map overlay to indicate clearly the areas of the map which have the most recent activity.*

Also, if you search for a specific author or paper, the results are the ones with black borders (which are pretty hard to make out). The silvery circles seem to circle a range of other papers that (maybe) cross-cite the ones you are interested in.

posted by lollusc at 5:07 AM on August 18, 2013 [2 favorites]
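For anyone who wants to poke at the layout idea quoted above, here is a rough sketch of it, not Paperscape's actual code. It assumes a zero-rest-length spring, applies the spring force to both ends of each reference, uses plain Euler steps, and leaves out the no-overlap constraint entirely. The names `layout` and `radius` and all of the constants are made up for illustration.

```python
import math
import random


def layout(num_papers, references, steps=200, repulsion=0.05,
           spring=0.1, dt=0.1, seed=0):
    """Toy 2-D layout. `references` is a list of (citing, cited) index pairs.

    All parameter values are illustrative guesses, not Paperscape's.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(num_papers)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(num_papers)]
        # Repulsion: every pair pushes apart with magnitude repulsion / distance.
        for i in range(num_papers):
            for j in range(i + 1, num_papers):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                fx = repulsion * dx / dist ** 2
                fy = repulsion * dy / dist ** 2
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        # Attraction: Hooke's law along each reference, F = spring * displacement.
        # Assumption: the pull acts on both the citing and the cited paper.
        for a, b in references:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += spring * dx
            force[a][1] += spring * dy
            force[b][0] -= spring * dx
            force[b][1] -= spring * dy
        # Overdamped Euler step: move each paper along its net force.
        for i in range(num_papers):
            pos[i][0] += dt * force[i][0]
            pos[i][1] += dt * force[i][1]
    return pos


def radius(citations, scale=1.0):
    """Circle radius such that the circle's area is proportional to citations."""
    return scale * math.sqrt(citations / math.pi)
```

Calling `layout(3, [(0, 1)])` should leave papers 0 and 1 sitting close together while paper 2, which nothing cites, drifts away. That may be roughly what happens to a paper whose references are not on arXiv.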

(So it seems like the mathematicians cite each other way less than other areas do. What's up with that? I mean, the maths people I know don't actually TALK, but I always assumed they read each other's papers, at least.)

posted by lollusc at 5:09 AM on August 18, 2013

Oddly, a paper I wrote on the mechanical dispersion of biological cells (and uploaded to Quantitative Biology) is grouped deep in the sodium cobaltates in a minor smear to the right of superconductivity. I don't see anything familiar nearby, and feel very cold and alone.

A possible reason is that none of the cell mechanics papers I cited have been uploaded to arXiv. I wonder what classification method the creators use at that point.

posted by Mapes at 5:14 AM on August 18, 2013

If you click on the "two labels" icon in the top right, you're sent to a more traditional search engine called "My Paperscape", where you can create your own graphs of related papers. It seems to be more useful.

posted by surrendering monkey at 5:17 AM on August 18, 2013

*(So it seems like the mathematicians cite each other way less than other areas do. What's up with that? I mean, the maths people I know don't actually TALK, but I always assumed they read each other's papers, at least.)*

Ever walked in on a number theory lecture midway through the proof of the day? It's better just to walk back out, go back to your room, and start on your own with a ream of paper, about 30 sharpened #2 pencils, and the Gaussian numbers, and try to derive the fundamental theory of calculus on your own time and publish whatever mistake you make as a paper, since there is a ton to learn from the peripheral tangents you'll be on.

posted by Nanukthedog at 7:43 AM on August 18, 2013

*(So it seems like the mathematicians cite each other way less than other areas do. What's up with that? I mean, the maths people I know don't actually TALK, but I always assumed they read each other's papers, at least.)*

Often, a proof of a mathematical result will only depend on others' results* in a few key spots; the "scholarship" component of math is perhaps proportionally smaller than in other fields. This is, however, a matter of style and choice of particular field of study. I know several mathematicians, including some highly successful ones, who don't really read others' work as a routine matter (they look specific things up, of course), and there are others whose way of doing math depends much more heavily on scholarship and deploying pre-existing machinery in a clever way. It depends on what you do and how you do it.

There is a strong cultural tendency to value self-contained proofs, to the extent that a mathematician making an argument that builds on previous work will sometimes be very careful to structure things in such a way that the previous result is not invoked more times than is strictly logically necessary. These values could also contribute.

*Of course, the whole language and choice of problem and biases about what's relevant, etc. come from a culture born of the sedimentation of the work of many, but this is different from explicit instances of "By Proposition 4.1 of Smith-Jones [SJ12]...."

posted by kengraham at 7:50 AM on August 18, 2013 [2 favorites]

(It's interesting to see how these attitudes propagate. My PhD advisor was strictly from "work it out yourself and let the referee tell you that e.g. the third step of your argument can be simplified by citing so-and-so", and while I'm a little bit more prone to reading random stuff, I tend to operate in a similar way. Other people I know are more likely to start with "What's been done on this problem already?"

Of course, I'm painting it as more black-and-white than it is. People of both types spend lots of time doing the thing attributed to the other type, not least because a major part of doing math is taking pre-existing results or techniques and modifying them to suit other, ideally more general, contexts.)

posted by kengraham at 8:01 AM on August 18, 2013

Paperscape will teach me not to put LaTeX in arXiv abstracts; one of my papers is tagged with something in an unnatural language...

posted by kengraham at 8:05 AM on August 18, 2013

*Ever walked in on a number theory lecture midway through the proof of the day?*

Sure, all the time.

Oh, sorry, was that a rhetorical question?

posted by escabeche at 8:26 AM on August 18, 2013

Also, I think the standards for citation in math are different from the standards in the sciences — or at least in the social sciences, which is the branch of science I know best.

1) In math, a single proof of a theorem is all you need in order to make use of that theorem. A second proof might be interesting or illuminating or whatever, but it doesn't make the theorem "more proven." In the sciences, evidence is cumulative: if there are ten studies which all support a single claim, you cite *all ten*, because that really does make the claim "better supported."

2) In math, there's a lot of concepts you can use without citing anyone. So like you can refer to the real numbers without namechecking Cantor, Cauchy and Dedekind in a footnote. You can talk about the primes without explicitly giving credit to Euclid. In the social sciences, any time you use a theoretical construct, you cite the people who developed it, even if it's common knowledge within the field that they're the ones responsible. And as I understand it, in hard sciences, any time you use a particular concept or modeling assumption, it's the same way: you give credit even if everyone already knows where it comes from.

3) In math, you don't cite someone unless you're actually using a result of theirs. There's no courtesy citations. You aren't expected to cite other people who worked on your topic just "to give context" or "as historical background" or whatever. If your proof doesn't build on theirs in some crucial way, you can leave them out.

posted by Now there are two. There are two _______. at 8:44 AM on August 18, 2013 [2 favorites]

*Ever walked in on a number theory lecture midway through the proof of the day?*

Only with a duck on my head, a rabbi and a talking dog.

posted by yoink at 8:46 AM on August 18, 2013

This is one of those games that I've never played but read hundreds of pages about...

posted by Foci for Analysis at 8:58 AM on August 18, 2013

*I was wondering who that was!*

I call him Quacky McQuackerson. It is, I admit, an odd name for a rabbi.

posted by yoink at 9:13 AM on August 18, 2013

*3) In math, you don't cite someone unless you're actually using a result of theirs. There's no courtesy citations. You aren't expected to cite other people who worked on your topic just "to give context" or "as historical background" or whatever. If your proof doesn't build on theirs in some crucial way, you can leave them out.*

I've seen papers following that convention, but my supervisors have instructed me to (and in their own papers they themselves) summarize previous work on the same or closely related questions, with citations, even if the present paper doesn't directly use those results.

posted by stebulus at 9:18 AM on August 18, 2013

*1) In math, a single proof of a theorem is all you need in order to make use of that theorem. A second proof might be interesting or illuminating or whatever, but it doesn't make the theorem "more proven." In the sciences, evidence is cumulative — if there are ten studies which all support a single claim, you cite all ten, because that really does make the claim "better supported."*

I'd mark this "Best Answer" if the background were the right colour.

*3) In math, you don't cite someone unless you're actually using a result of theirs. There's no courtesy citations. You aren't expected to cite other people who worked on your topic just "to give context" or "as historical background" or whatever. If your proof doesn't build on theirs in some crucial way, you can leave them out.*

Seconding the disagreement with this. One cites related work at minimum to give the editor a good idea of whom to choose as a referee, and to make sure that, if they choose somebody else, you didn't piss off that person by not citing them. There are also less mercenary reasons to give context, not least of which is that mathematical results are often interesting by virtue of their role in some larger narrative.

Many math citations are not to results on which the proofs in the paper depend, but rather come in several other flavours, e.g. "Hurf-Durf [HD79] is the paper that launched 1000 ships, of which this is one..." or "We are generalizing a result of Hurf-Durf [HD79] (in your face, H-D)...", etc.

posted by kengraham at 11:47 AM on August 18, 2013 [1 favorite]

Interesting how the impractical CSci papers are down on the bottom next to the math papers, where they can't contaminate the highly-practical astrophysics masterpieces up on top.

Or maybe the orientation of this universe model is really stochastic and tomorrow CSci will be huddled next to nuclear proliferation studies.

posted by Twang at 5:05 PM on August 18, 2013

Poor solar physics is all off on its own barely-visible island...

posted by Zalzidrax at 7:27 PM on August 18, 2013

This post on the Paperscape blog has some explanation:

"When zoomed out, arXiv categories are displayed, and the position of the category label is computed as the average of all papers in that category. As you zoom in, these category labels disappear, and are replaced by individual labels on top of each paper, so long as that paper is “big enough” on screen. [...] We have now added a third layer to this labelling process: we identify by eye regions of the map that have a definite theme, and give these regions a generic, but not too generic, label."

Without knowing that, it's kind of a pointillist polychrome blotch.

posted by ardgedee at 4:51 AM on August 18, 2013
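The two automatic labelling layers described in that blog quote are simple enough to sketch. This is only a guess at the shape of it: the `(category, x, y)` tuple layout, the function names, and the 20-pixel threshold for "big enough" are all assumptions, and the third, by-eye layer obviously can't be coded.

```python
from collections import defaultdict


def category_label_positions(papers):
    """Place each category label at the mean position of its papers.

    `papers` is an iterable of (category, x, y) tuples (hypothetical format).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for category, x, y in papers:
        entry = sums[category]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {cat: (sx / n, sy / n) for cat, (sx, sy, n) in sums.items()}


def show_paper_label(radius, zoom, min_pixels=20.0):
    """Show a paper's own label once its on-screen radius is "big enough".

    `min_pixels` is an invented threshold, not a value from Paperscape.
    """
    return radius * zoom >= min_pixels
```

Averaging positions would also explain why a category label can land in a spot where few of that category's papers actually sit, if the category is split across several islands.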