Elsevier Says Content-Mining Research Papers 'Could Be Considered' Stealing

linhchi · on Nov 19, 2015

Now they want to define "meta-research" as stealing, too. I can't imagine how anyone would buy that.

Knowledge is something you cannot capture, that's why they have to try to "protect" the symbols. And they can go this far with that deed.

Why can activists read and handle degrading words from rich publishers (theft, stealing...), while the ignorant cannot read the activists' arguments? This is far too confusing to me.

http://www.gnu.org/philosophy/words-to-avoid.en.html

(I think publishers are feeling threatened as the free culture grows)

edit: the speaker explains copyright maths (a special kind of maths used by media lobby experts), very enjoyable :D

http://blog.ted.com/the-numbers-behind-the-copyright-math/

EngineerBetter · on Nov 19, 2015

Is this why they bought Mendeley, a document-management start-up that did a lot of data mining on academic papers, including terabytes of Elsevier's?

Hint: probably. I used to work there.

sklogic · on Nov 19, 2015

Shit. I'm using Mendeley and did not know this. I'll stop using it at once. Any suggestions for alternatives?

wycx · on Nov 19, 2015

I am happy since moving from Mendeley to Zotero. I like that I can store my pdfs on dropbox, and use Zotero on many different PCs. Mendeley's insistence on absolute paths to the database prohibited this.

Zotero's use of Google Scholar to extract metadata from pdfs made it easy when starting with several thousand pdfs.

kriro · on Nov 19, 2015

I use Zotero (+ownCloud for storage) as well, a bit quirky at times but has all the functionality I need. Don't know how good the standalone program is, I only use the browser plugin+libreoffice plugin and occasionally the dreaded ms-office plugin as well (if not working with latex).

_yy · on Nov 19, 2015

Standalone works very well too. Can recommend.

jbaiter · on Nov 19, 2015

Zotero is pretty decent and Open Source.

sklogic · on Nov 19, 2015

A killer feature of Mendeley for me was its ability to extract metadata from PDFs. I remember that I could not find a comparable functionality in Zotero last time I evaluated it. Is there any way to do it now?

EDIT: found it! Great!

spacecowboy_lon · on Nov 19, 2015

Elsevier has its own in-house big data team, HPCC. A few years ago, when I worked for another part of Elsevier, I tried to get New Scientist interested in using ML to monetize the 50 years of archives they had.

But the NS editor was/is a bit of a Luddite, so nothing came of it.

versteegen · on Nov 19, 2015

Are you willing to put in a bit more detail about why you think that? I, for one, would be interested.

diakritikal · on Nov 19, 2015

I think I probably consider taxpayer-funded research being locked up behind a corporate paywall "stealing", so yeah they can go jump in a loch...

rvense · on Nov 19, 2015

That's the thing, Elsevier know stealing when they see it.

scotty79 · on Nov 19, 2015

Isn't theft a crime? Isn't publicly accusing someone of a crime he did not commit slander? Should all content-mining researchers sue Elsevier for slander? I don't see why not.

JupiterMoon · on Nov 19, 2015

Not sure why this is being downvoted. In my personal opinion Elsevier are slandering (actually libelling) researchers here. This could have significant financial implications for the researchers when e.g. grant applications are made.

JadeNB · on Nov 19, 2015

Assuming that this is just a cri de cœur, I agree with it; but assuming that you actually mean it as a legal argument, it's hard to imagine (although IANAL). Otherwise, the "abortion is murder" folks would already have been sued just for that claim. (That's assuming you agree that abortion isn't murder; but your post also assumes, maybe more safely, that one agrees that content mining isn't a crime.)

squat · on Nov 19, 2015

The American Physical Society recently blocked my IP and contacted my university after I downloaded about 30 papers within one hour (manually, while researching an unfamiliar topic). My department's IT team called me and asked me to shut down my mass download bot :D. APS staff were very friendly and lifted all restrictions after I contacted them though.
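For anyone who does script their downloads, here is a minimal sketch of a client-side sliding-window throttle. It is only an illustration: the class name `SlidingWindowLimiter` and the limit of 20 requests per hour are made up, and I have no idea what threshold APS or any other publisher actually applies.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` requests in any `window_seconds` span.

    Hypothetical helper; the limits passed in are assumptions, not any
    publisher's documented policy.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        if max_requests < 1 or window_seconds <= 0:
            raise ValueError("max_requests must be >= 1 and window_seconds > 0")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._stamps: Deque[float] = deque()  # times of recent requests

    def wait_time(self, now: float) -> float:
        """Return how many seconds to wait before a request at `now` is allowed."""
        # Forget requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            return 0.0
        # The oldest remembered request must age out before the next one.
        return self._stamps[0] + self.window_seconds - now

    def record(self, now: float) -> None:
        """Remember that a request was made at time `now`."""
        self._stamps.append(now)


# Assumed limit for illustration only: 20 requests per hour.
limiter = SlidingWindowLimiter(max_requests=20, window_seconds=3600.0)
```

In a real loop you would take `now` from `time.monotonic()`, sleep for `wait_time(now)` seconds if it is non-zero, and call `record()` right before each download. Passing the clock in explicitly just keeps the logic easy to test.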

MrPatan · on Nov 19, 2015

They are on the way out. Yes, they'll take a shit on the carpet before leaving, but they are on the way out.

claudius · on Nov 19, 2015

I don’t quite see the issue here. Elsevier’s terms of service for subscriptions very likely explicitly ban bulk downloading. As such, they have every right to expect subscribers not to bulk download and further every right to ban specific subscribers (i.e. university libraries) if they violate those ToS. Nothing new, nothing exciting.

If you want to be able to freely access and bulk download the papers published by your peers, maybe you should make sure that those peers publish in journals that allow such downloads. Maybe you yourself should also only publish in journals that allow these downloads. Possibly, you could even run your own preprint server where everyone can submit their texts and which then can be made available in bulk to interested parties.

Breaking contracts you (or a person acting on your behalf) agreed to doesn’t help at all in the long run – boycotting those who only offer contracts you don’t like until they either change or go away is a much more viable, ethically acceptable and generally more enjoyable alternative.

And hey, if your method to detect fraud is successful, market pressure alone might be enough: after all, who wants to submit to a shady journal which doesn’t have its articles automatically checked for large-scale fraud regularly? Surely, only fraudsters…

jacquesm · on Nov 19, 2015

They say that for bulk downloading one should use the API because they want to be able to serve the 'normal' customers too. I perceive this as Elsevier having a bandwidth problem, I'd be more than happy to host a bunch of torrents with their papers to help them solve this problem.

claudius · on Nov 19, 2015

It doesn’t really matter what excuse they bring forward, contracts are contracts and terms of service are terms of service. Boycott them or stick by their rules, but don’t make your own.

If you release something under the GPL and some shop decides that they’d rather it be under the MIT license, you’d also be annoyed at them for both using your work and not sticking by your rules.

versteegen · on Nov 19, 2015

The recent UK law providing an exception to copyright for text and data mining gives Elsevier the right to require that access is only through the API provided they do make them accessible without "unreasonable" restrictions. I think that's perfectly fair, and they don't need an "excuse". See:

https://www.elsevier.com/connect/how-does-elseviers-text-min...

jacquesm · on Nov 19, 2015

> The UK law adds weight to our position; we are ensuring that those with "lawful access" (in UK legislation speak) have the right to mine our works.

Emphasis mine, I wonder if that was a slip or intentional.

Blahah · on Nov 20, 2015

In the UK we have a copyright exception for text and data mining for non-commercial research, and contracts that try to restrict that activity are unenforceable. Elsevier's terms of service are irrelevant except where they protect the security of their network. Most of the things they require in their terms of service are more restrictive than the law would allow if they tried to enforce the contract.

yourepowerless · on Nov 19, 2015

Why comply with unjust illegitimate laws created to serve a monopoly?

Why comply with a framework designed not for your benefit or society's benefit but for the purpose of maintaining a monopoly on other people's labor, socializing cost and individualizing profit? Laws are not created for the people and by the people, they are made by corrupted individuals for those with the most wealth and power, complying with them is no better than complying with any other sort of violent oppression.

claudius · on Nov 19, 2015

> unjust illegitimate laws created to serve a monopoly

It’s not unjust illegitimate laws, it’s a largely valid contract you entered into and by which you are very naturally bound. There’s nothing morally wrong about “don’t use this access we give you to download everything”.

If your laws are, for whatever reason, unjust and illegitimate, then fix that instead of fighting against those laws by downloading papers from Elsevier o.O

> Why comply with a framework designed not for your benefit or society's benefit but for the purpose of maintaining a monopoly on other people's labor, socializing cost and individualizing profit?

I don’t get what you’re trying to say. Are you saying the profit from scientific work lies solely or even largely in bulk access to the papers produced by that work?!

> Laws are not created for the people and by the people, they are made by corrupted individuals for those with the most wealth and power

If that is the case, then this is squarely the fault of “the people” to allow that to happen.

> complying with them is no better than complying with any other sort of violent oppression

This is not about complying with laws but complying with a not morally outrageously wrong contract between two private entities.

You are trolling, right?

jjoonathan · on Nov 19, 2015

Claudius seems to think he can have his cake and eat it too.

> boycotting ... is a much more viable, ethically acceptable and generally more enjoyable

As long as he suffers this delusion, there will be no reasoning with him.

bigiain · on Nov 19, 2015

In related news - "Elsevier considers citing research papers to be stealing".

Clowns...

PaulHoule · on Nov 19, 2015

Charging people big $$$ to read research papers that their tax dollars paid for could also be considered stealing.

dropit_sphere · on Nov 19, 2015

Fortunately, we all trust Elsevier implicitly as moral arbiters.

gamesbrainiac · on Nov 19, 2015

I think Elsevier is resorting to things like this because it is incapable of capitalizing on their current assets and infrastructure in a constructive manner.

Consider the fact that Elsevier locks up millions of dollars in publicly funded research behind their paywall, and then you'll see the irony of what they claim.

afandian · on Nov 19, 2015

Not pertinent to this exact story but there's a cross-publisher API for Text and Data Mining via Crossref for those of you who are interested. It defines http links to full text and license info. Elsevier participates in this and their API intersects with the Crossref TDM API.

Info here: http://tdmsupport.crossref.org/
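To make that a bit more concrete, here is a rough sketch of pulling the text-mining links and license URLs out of one metadata record. The field names (`link`, `license`, `URL`, `content-type`, `intended-application`) reflect my understanding of Crossref's work metadata and should be treated as assumptions. The sample record, its DOI, and the example.org URLs are entirely made up, and nothing here talks to the network.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record, shaped the way I understand Crossref work metadata
# to look. The DOI and URLs are invented for illustration.
SAMPLE_RECORD: Dict[str, Any] = {
    "DOI": "10.5555/example.0001",
    "license": [
        {"URL": "https://example.org/licenses/tdm-v1", "content-version": "tdm"},
    ],
    "link": [
        {
            "URL": "https://example.org/fulltext/0001.xml",
            "content-type": "text/xml",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://example.org/fulltext/0001.pdf",
            "content-type": "application/pdf",
            "intended-application": "similarity-checking",
        },
    ],
}


def text_mining_links(record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (url, content_type) pairs for links flagged for text mining."""
    return [
        (link["URL"], link.get("content-type", "unspecified"))
        for link in record.get("link", [])
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


def license_urls(record: Dict[str, Any]) -> List[str]:
    """Return the license URLs attached to the record, if any."""
    return [lic["URL"] for lic in record.get("license", []) if "URL" in lic]


if __name__ == "__main__":
    for url, content_type in text_mining_links(SAMPLE_RECORD):
        print(content_type, url)
    for url in license_urls(SAMPLE_RECORD):
        print("license:", url)
```

The point of checking the license URL alongside the link is that you can decide per article whether mining is permitted before fetching the full text, rather than finding out from a blocked IP.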

versteegen · on Nov 19, 2015

The title is link-bait. That really isn't what Elsevier is saying here. They are complaining that this researcher broke the terms for the university's subscription. On their website they state: [0]

"""Further, we're looking at how we ensure that researchers know what they can and cannot do with content, or where to go for further information, without giving the impression that we are claiming ownership over non-copyrightable facts and data."""

Regarding this case, (IANAL) it seems to me that if their APIs work properly, then Elsevier is completely in the right (especially in the UK) in demanding that papers only be accessed through the API. If they don't work, as the researcher alleges, then I would agree that he's in the right.

A couple of weeks ago I was reading all about Elsevier's APIs for downloading papers [1]. If you're at an institution that has access to ScienceDirect or Scopus then it seems you can easily get a key that gives you full access to everything, including papers in XML instead of PDF if you want (e.g. a MathML-like markup for equations, with paragraphs, figures and tables all marked up). However, Elsevier make it very difficult to find the actual terms and conditions for text mining on their website, despite numerous pages which run you in circles. They are here [4].

To summarise them (again, IANAL): non-commercial research use only. You can't share the raw text mining output except with other people belonging to your institution/subscriber, although you could allow indirect access through e.g. a website. You can also distribute short snippets from the text with a copyright notice. You can keep the downloaded data until your API key expires, which happens if you stop using it or your institution stops subscribing.

I also found it strange that Elsevier forbid redistributing the abstracts of papers [3], considering that they are publicly accessible.

It would be fantastic if publishers couldn't restrict the use of the information in the articles they published, even for commercial use. Anything that any human learns from a copyrighted work can normally be used without restriction, why wouldn't the same apply to machine learning? However I assume that when you sign an agreement providing access to data (e.g. subscribe to Scopus) that you can sign away that sort of right (e.g. NDAs). The Hague Declaration [2] is a great initiative in this direction but unfortunately it's only petitioning for the rights of researchers.

[0] https://www.elsevier.com/connect/how-does-elseviers-text-min...

[1] http://dev.elsevier.com/

[2] https://creativecommons.org/weblog/entry/45456

[3] http://dev.elsevier.com/policy.html

[4] https://www.elsevier.com/__data/assets/pdf_file/0012/102234/...

IanCal · on Nov 19, 2015

> I also found it strange that Elsevier forbid redistributing the abstracts of papers [3], considering that they are publicly accessible.

Publicly accessible and the rights to redistribution are different things. Abstracts are protected by copyright, which may be held by the journal or the author, and so having a blanket "you can use the abstracts" policy is very difficult.

davelnewton · on Nov 19, 2015

Elsevier says a lot of things.

DrNuke · on Nov 19, 2015

They're looking for data scientists and engineers to monetise their archives but also losing market to cheaper and more open publishing networks. Good riddance, I would say.

eveningcoffee · on Nov 19, 2015

I have a simple recommendation. Stop publishing with them. Stop citing articles that have been published with them. Treat them like they do not exist.

bitwize · on Nov 19, 2015

Some of the shit Elsevier's pulled "could be considered" academic fraud.

Immortalin · on Nov 19, 2015

This is rather similar to the web scraping problem i.e. API vs scraping. Nothing new.