Is it legal to scrape data from free resources?
April 3, 2021 12:18 AM

Let's say hypothetically that a textbook publisher has made lots of their textbooks available for free online. Of course, the textbooks are still copyrighted. Still, is it legal for me to scrape these textbooks for questions and create a website based around that? If not, why not?
posted by matkline to Law & Government (18 answers total) 2 users marked this as a favorite
Response by poster: Extended detail: I'd like to create a quiz-style website for my students.
posted by matkline at 12:20 AM on April 3

Are you writing your own questions... Or are you scraping THEIR questions?
posted by kschang at 12:30 AM on April 3 [1 favorite]

If you’re asking whether a copyright is rendered meaningless if a copyright holder allows free access to the content of a copyrighted work, the answer is no. IANAL, but that much seems obvious. “Scraping” isn’t the issue. Re-using someone else’s work without permission is still problematic in the same ways it would be otherwise.
posted by jon1270 at 3:04 AM on April 3 [3 favorites]

Response by poster: As a followup: What if I had my students download the textbooks, then gave them a program that would scrape out the relevant questions to create a quiz?
posted by matkline at 3:18 AM on April 3 [1 favorite]

Then you would be encouraging your students to create derivative works without permission from the copyright holder.

Just because you can, doesn't make it legal. Just because it's useful, doesn't make it legal. Just because it's fair and reasonable, doesn't make it legal. Just because you'd almost certainly get away with it until the usefulness of it made it popular, doesn't make it legal.

If you want to make it legal, the way to do that is to negotiate permission from the copyright owner, and you're less likely to get that permission if you've already breached their copyright before asking.

Perhaps you could pitch it as a money spinner for them, something they could use to generate "bonus" quizzes from their existing textbook IP to make them even more attractive to potential customers.

The ability to create derivative works without needing to seek this kind of explicit permission first is the exact point of Creative Commons licensing.
posted by flabdablet at 4:19 AM on April 3 [8 favorites]

I have something to add. As others have written, no, you may not use copyrighted material that is free. Do note that copyright is for an expression in permanent form. Copyright doesn't cover plots, ideas, inventions.

You can certainly use the plots, ideas, inventions from a copyrighted source, but not the words.

Fair use of copyrighted material is legal. Fair use, however, is not a fully defined concept. It is interpreted by courts. This link is to what the US Copyright Office offers about fair use.
posted by tmdonahue at 5:50 AM on April 3 [1 favorite]

One further point. As others have noted, just because a copyrighted work is available without payment doesn't mean that the copyright is ineffective. Copyright gives the owner the legal right to control the use of the work. But conversely, just because something is copyrighted does not mean that any use of the work is forbidden. It just means that the only legal uses are the uses that are allowed by the owner. (And of course, Fair Use is legal whether or not the owner wishes to allow it.)

In this hypothetical scenario, it would depend on how the copyright owner has made the works available. I'd expect them to distribute them under some sort of license agreement. This might be a CC-0 license, which grants everyone in the world the rights to unrestricted use, but that would be unlikely, as it would allow anyone to print and sell copies of the textbook. So they would probably have some other license agreement associated with the free access that would spell out what uses they wish to allow for free, which are allowed with payment only, and which are not allowed at all. To get a definitive answer, you would need to consult the license under which the works are made available to you.
posted by yuwtze at 6:50 AM on April 3 [3 favorites]

The data has been left unlocked; that does not change how or if you may use it. I would probably not be beyond publicizing the resource, discreetly, to students. You are a teacher; whatever school you are affiliated with has someone who understands fair use, and this is a useful thing to know about.
posted by theora55 at 7:19 AM on April 3 [1 favorite]

The copyright definitely applies to the questions as much as the text. Educational questions represent a real investment of labor and expertise.
posted by amtho at 7:26 AM on April 3

And even if the questions are not part of the book itself, they are still copyrighted.

Copyright exists even if the writer doesn't "register" the work. It happens automatically, at least in the US.
posted by amtho at 7:27 AM on April 3

Assuming you are in the USA, Copyright law has Fair Use provisions with special rules for educational use, with four "fair use" factors to consider:

1. purpose of use, e.g. commercial / for profit vs. educational / non profit
2. nature of work, e.g. fiction, nonfiction, published, unpublished
3. amount of copyrighted work you plan to use compared to the total
4. whether your use alters the market for the copyrighted work.

See Using Copyrighted Material which mentions other factors, such as whether this is for in-person vs. online use.

I think that your use *might* be OK, if you are a nonprofit educational institution.

An analogy could be: Suppose you obtain a set of paper textbooks for free, and that textbook has 10 "test your knowledge" questions within each chapter. Would it be OK for you to copy & paste those 10 questions onto a single page, which you print and hand to your students in each class? I suspect this would be fine.

On the other hand, if you are not a teacher and trying to do this for commercial use for the world at large, I'd bet it's not OK.
posted by soylent00FF00 at 8:01 AM on April 3 [6 favorites]

This could absolutely fall under educational fair use, particularly since the students are expected to purchase the book. Textbook publishers expect you to use the questions, and hosting them on a website for the exclusive use of your students is not a particularly unusual use case.
posted by mr_roboto at 8:24 AM on April 3 [4 favorites]

If you do this, restrict access to the site to your students.

As a followup: What if I had my students download the textbooks, then gave them a program that would scrape out the relevant questions to create a quiz?

You could definitely tell them "Enter the answer to question 134 in your books", though that might be too clunky for what you're envisioning.
posted by trig at 10:51 AM on April 3 [1 favorite]

yuwtze is right; the actual answer to this question is going to be in whatever license agreement the textbook vendor has put online. Given how carefully textbook publishers treat IP I'm almost certain if you look you will find one.

As folks have said, quiz questions seem very likely to be subject to copyright. Absent a license agreement you would be violating their copyright to scrape and republish all their questions.

One alternate bit of copyright that comes up in the context of textbooks is facts. Facts are not subject to copyright, and that extends to whole pages of tables of data and information. However the specific presentation of facts can be copyrighted and the details get a little woolly. Anyway that doesn't apply to your quiz example.
posted by Nelson at 11:18 AM on April 3 [1 favorite]

What if I had my students download the textbooks, then gave them a program that would scrape out the relevant questions to create a quiz?

Probably not. There was a case where someone made a "Christian" DVD player that had information about major films built into it and would skip segments that had been determined to be too naughty. It got sued out of existence.

I think that your use *might* be OK, if you are a nonprofit educational institution.

Being educational is not a silver bullet for fair use. Whether it has an economic impact on the content owner is a big factor. So showing excerpts of a movie to film criticism students is covered, because it's not creating a disincentive to buy the movie, whereas making a derivative work of an educational product is a lot less defensible, particularly if having the quizzes excerpted online might mean you don't need to buy the book.
posted by Candleman at 2:04 PM on April 3 [3 favorites]

As others have said, this is definitely copyrighted. The only thing that would allow you to use the questions as you have indicated, would be Fair Use--or, of course, asking the publisher and getting permission.

Given that you are using the questions in an educational setting, there is some possibility you can justify fair use.

Looking at the four fair use factors:

#1. Purpose & character of use. If you are an educator at an accredited educational institution, using the materials as part of teaching a class, that puts this factor in your favor. If it is an "in-person" class (i.e., the 30-odd students you teach each term; in times of COVID this might be all or partly via Zoom, but NOT an unlimited number you might reach via a telecourse), that tends in your favor. If you restrict it to current students only (via password, etc.), that also tends in your favor.

#2. Nature of the copyrighted work. This one tends against you--a published textbook definitely falls under copyright protection. However, the fact that you are using a portion of the textbook in a class (and, given that these are quiz questions from the text, in a way that is clearly an intended use of these questions--that is, to help students in a class setting learn the material) might tend a bit in your direction. The questions clearly have protection as a creative work, but definitely not the same level of protection as, say, an artistic work like a poem.

#3. Amount of work used in relation to the whole. If you were just using a few selected questions here and there, this would definitely tend in your favor. If you were even using the questions from just one chapter, it would tend in your favor. However, using all of the questions from all chapters of the entire book tends to run pretty strongly against you. This is a pretty large and substantial portion of the textbook.

#4. Effect of the use on the potential market for the book. If the textbook is a required textbook for the course and if you restrict usage of the quiz web site to currently enrolled students in your course, I think this one definitely falls in your favor. Having the quiz web site in no way replaces purchase of the book. The fact that the site is restricted to currently enrolled students only, who are required to purchase the book (or use it online at the publisher's web site) for the course anyway, means that there is no impact at all in reducing the potential market for the book.

Altogether, you can see why no one can just give you a cut-and-dried answer as to whether this would be fair use. If this went to court, the court would weigh all four factors together and somehow add them up to a judgement. I've seen cases where three factors went towards fair use but the remaining factor was so strong in the copyright holder's favor that the judge ruled it copyright infringement nevertheless. (Commonly this happens in a highly creative work like a poem, musical work, play, etc, where the creative element is so strong that borrowing even a few lines amounts to appropriating the heart of the work.)

But altogether, my own sense is that it is pretty clearly fair use, if you are teaching at an educational institution, if students in the class are purchasing the textbook for class use (or otherwise getting access to it in a legal way, such as via the publisher's web site), and you use passwords or other means to restrict usage of the quiz site to currently enrolled students.

The only questionable element for me would be that you are using ALL of the quiz questions in the textbook, which is a substantial portion of the entire work. If you restricted use of the quizzes to those the class is currently studying, or the chapters that are on the current test, or similar, that might help in that regard.
posted by flug at 2:24 PM on April 3

If you're in higher ed (can't speak about K-12, sorry) and your campus library has an Open Educational Resources librarian, copyright unit, or even just a subject librarian for your area, set up a consultation with them. Librarians are used to answering this question from a practical, not-legal-advice perspective and providing alternatives tailored to your course. For example, OpenStax out of Rice has a vetted search for Creative Commons-licensed things like quiz banks and homework.

As a subject librarian, I get this question all the time so I have a conservative answer that I'll put in an email ("No, that doesn't sound legal as described. Let's chat further about options") and a pragmatic answer that I won't. So would an OER or reserves library worker.
posted by librarylis at 8:43 PM on April 3 [1 favorite]

I would just give them the link to the book. Then an email or something with page/question numbers for them to answer.

I'm not sure I see for what reason you would propose having them download the book and run a program to scrape the questions as an alternative to doing it yourself.

Here's a book (link). Please answer the following questions:
* p30 Q 4
* p80 Q 1-3
* p99 Q 7

Why bother?
posted by zengargoyle at 2:48 AM on April 6
