How Far Can Terms of Service Go for Restricting Data Access?
January 11, 2013 6:38 AM

Law portals & content portals use their Terms of Service to restrict access to data beyond what they can restrict by means of copyright. What are the limits to this practice, and the history of enforcement?

So, I'm interested in freeing my city's code (which is public domain) from the cruddy content portals it's currently stored in - previously Westlaw, now Lexis. Since Westlaw and Lexis actually do the editing on the document - they integrate bills and resolutions into the main text - their copy, on cruddy websites and in expensive books, is the only copy. Other states (well, real states) usually have the money to do this themselves, but DC doesn't.

I'll spare you the explanation of why it's useful to liberate this from Westlaw/Lexis; I assume people have used those sites and gotten 'the experience'.

And so, this puts me in a sort of bind: it's public domain data, but Lexis and Westlaw have terms and conditions specifically against the kind of web scraping that's necessary to produce a copy of this data. I'm not clear on what exactly companies can put in their terms of service - for instance, many have stipulations on copying, which you would think is covered by license law rather than terms.

Also worrying is the case against Aaron Swartz, which frames web scraping in general as fraud under 18 U.S.C. § 1030 (though aggravated by Aaron's kind of ridiculous dodging-cameras and working-sneakily tactics).

To make matters even more complex, the specific document changes on a two-week basis, so repeated scraping or some kind of direct transfer would be necessary to keep a better copy up-to-date.

What are the options here, legally? I'm interested in making free data 'actually free' without being charged with a felony.
posted by tmcw to Law & Government (10 answers total)
 
It's not public domain data if Lexis and Westlaw have edited it, formatted it, etc.

You need to get the raw data published by the government.
posted by dfriedman at 7:03 AM on January 11, 2013


The data is free, but their compilation of the raw data is not.

The state (or city or whatever) passes laws in the form of bills. That is the raw data. They say things like "we hereby amend statute 41 to remove paragraph 18, and change paragraph 21 to read 'may' rather than 'shall'." That's the law as it exists.

LexisNexis takes the freely available source and the freely available amendments, and does the work as specified. They consolidate the raw data as provided by the legislature into what the effective law ends up being. That end product is no longer public domain because it is a work that they created. They charge for this service.
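To make "does the work as specified" concrete, here is a minimal sketch of that consolidation step, using the example amendment above. Everything in it is an assumption for illustration: the `Amendment` structure, the `apply_amendments` helper, and the paragraph wording are invented placeholders, not how any publisher actually stores or processes a code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Amendment:
    """One instruction from a bill (hypothetical structure, for illustration only)."""
    paragraph: int
    action: str                 # "remove" or "replace_word"
    old: Optional[str] = None   # word to replace (replace_word only)
    new: Optional[str] = None   # replacement word (replace_word only)


def apply_amendments(statute: Dict[int, str],
                     amendments: List[Amendment]) -> Dict[int, str]:
    """Return the consolidated text without modifying the original statute."""
    consolidated = dict(statute)  # shallow copy; the enacted text stays untouched
    for amendment in amendments:
        if amendment.paragraph not in consolidated:
            raise KeyError(f"paragraph {amendment.paragraph} does not exist")
        if amendment.action == "remove":
            del consolidated[amendment.paragraph]
        elif amendment.action == "replace_word":
            text = consolidated[amendment.paragraph]
            if amendment.old is None or amendment.new is None:
                raise ValueError("replace_word needs both old and new")
            if amendment.old not in text:
                raise ValueError(f"{amendment.old!r} not found in paragraph")
            # Replace only the first occurrence, as a simple convention.
            consolidated[amendment.paragraph] = text.replace(
                amendment.old, amendment.new, 1)
        else:
            raise ValueError(f"unknown action {amendment.action!r}")
    return consolidated


# Placeholder wording for "statute 41"; the real text is not in this thread.
statute_41 = {
    18: "The clerk shall post notices weekly.",
    21: "The board shall review each application.",
}
bill = [
    Amendment(paragraph=18, action="remove"),
    Amendment(paragraph=21, action="replace_word", old="shall", new="may"),
]
effective = apply_amendments(statute_41, bill)
print(effective)
```

The bills are the inputs and the consolidated dictionary is the output, which is the part a reader of the law actually wants to see.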

They are basically a library service. You are free to start your own library, but they want to be paid if you want to use theirs.
posted by gjc at 7:17 AM on January 11, 2013


It seems, from the way you have written your question, that Lexis actually has a contract with DC to maintain the text of its laws. If that is the case, you should take a look at the terms of that contract - it may have specific language as to what is public domain and what is not. There may be a special exception for access to the DC code in Lexis. Lexis may not make this clear in their UI, however, or even in the boilerplate copyright notice they put on every page of their site.
posted by rockindata at 7:24 AM on January 11, 2013


I am an attorney, but I am not your attorney. This is not legal advice. You should consult a competent attorney in your jurisdiction.

You should ignore the answers given so far in this thread. None of them are based on the cases that have dealt with this issue. This is a developing area of the law with different approaches taken by different circuit courts. There may or may not be a clear answer under the law of the DC circuit, but an attorney can provide you with some guidance to weigh against the potentially severe consequences of copyright infringement.
posted by jedicus at 7:27 AM on January 11, 2013 [1 favorite]


The DC Code is available here from Westlaw without cost; however, the interface is awful.

In this situation, if you want to get the DC Code direct from the source, I suggest you contact the offices of council members Tommy Wells and Mary Cheh (and perhaps your own council representative?), who I think would be sympathetic to this effort. Cheh is a law school professor and Tommy is just good.
posted by exogenous at 7:34 AM on January 11, 2013


I'll just point out for the sake of clarity that the link exogenous provided is a link, as far as I can tell, to the official DC Code. That same link is the one at the bottom right of the DC Council website.
posted by OmieWise at 7:48 AM on January 11, 2013


If you're really interested in pursuing this beyond an idle I-want-to-scrape, I'd advise you to reach out to the DC Open Government Coalition. It's in their wheelhouse and there's a lot of water under this bridge, as jedicus says. Rather than try to re-research the legal wheel here, I'd suggest you see what's already been done.
posted by phearlez at 8:02 AM on January 11, 2013


"It's not public domain data if Lexis and Westlaw have edited it, formatted it, etc."
No: the contract with Westlaw, and the new one with Lexis, stipulate that copyright is assigned to the city, not to the portal, and there's case law that supports this - a relatively well-known Supreme Court decision that ruled that Westlaw could not assert copyright over public domain works by adding or modifying documents in this fashion.
"In this situation, if you want to get the DC Code direct from the source."
The issue at hand is that they do not have the raw data; Westlaw, and now Lexis, is the source for the compiled code. It only exists as a portal on the side of Westlaw->Lexis and as printed copies.
"If you're really interested in pursuing this beyond an idle I-want-to-scrape I'd advise you to reach out to the DC Open Government Coalition."
Thanks! I'll reach out.
posted by tmcw at 8:10 AM on January 11, 2013


Hey Tom! If you don't know about it already, you might find it worthwhile to look at Carl Malamud's Public.Resource.Org. They've been working for a few years now collecting and republishing data that should be in the public domain but is hard to get at, including various government codes. Coverage is spotty, and I don't know whether they have anything from DC. But someone there would undoubtedly know the answer to your legal questions. And if you do come up with a good copy of the DC code, I'm sure they'd be glad to host it.
posted by Nelson at 8:55 AM on January 11, 2013


@Nelson - ah, I forgot to email him as well! Just sent off a note, thanks!
posted by tmcw at 9:52 AM on January 11, 2013

