If everything were Creative Commons...
December 2, 2005 11:11 AM

The ethics of screen scraping?

I have an open-source project which queries a large corporate website, parses it and makes the data available for free to other open-source projects in an easily digestible format.

Although it could be viewed as stealing copyrighted information, I feel comfortable doing this because I'm just repackaging freely available data, make no money doing it, people find it useful, and I am clear that my data is not to be used for commercial uses (though I make no effort to enforce this). Basically, the "Robin Hood" effect.

In the couple of years this project has been around, I've received several inquiries from people offering to pay me to adapt the project for commercial uses. It's always been easy to say no because there have been technical or time constraints which made it so. The offer I got last week is pretty damn tempting though.

In this case the source of the data would be a much smaller entity than the large corporate source I'm currently using, but they do provide it freely. Would it be wrong to help enable a company to take it and use it for profit? (I'm not really worried about legal issues, I don't think I'd be liable as a freelance developer, but maybe those issues are in play too.)

posted by If I Had An Anus to Computers & Internet (26 answers total)

Even free data sources typically have a EULA or some sort of usage agreements. As long as you're not violating that, there should be no issue.
posted by GuyZero at 11:22 AM on December 2, 2005

It's that profit part that seems most wrong to me. It certainly isn't fair use, and therefore is probably a violation of copyright, which I raise not just because it's a legal issue but because it should inform your thinking about what you are being asked to do. Without details it's hard to say, but it seems like it's ethically gray at the very least.
posted by OmieWise at 11:24 AM on December 2, 2005

My company was sued, and eventually lost (though I think it was settled), for the same thing. There is legal precedent against it, though I'm not sure how much the developer was involved in the suit.
posted by occhiblu at 11:33 AM on December 2, 2005

(Rereading your question, my company's situation wasn't exactly the same -- there were issues of stealing company secrets, basically -- but in any event, there may be more legal issues here than you think.)
posted by occhiblu at 11:35 AM on December 2, 2005

Thanks, guys. I sort of know this is wrong, but needed help saying no to money.

So what about the open-source project, then? Does being free and open change things (assume it is a violation of the EULA)?
posted by If I Had An Anus at 11:49 AM on December 2, 2005

From a practical perspective, maintaining a screen-scraping-driven app can be a pain in the butt given that every time the site in question makes a change, it can throw your stuff out of whack.
posted by ph00dz at 11:57 AM on December 2, 2005

Yeah, that's a separate issue, ph00dz. My project (and its source) have been pretty stable for a couple years now (there are no regexps involved).
posted by If I Had An Anus at 12:04 PM on December 2, 2005

I say go for it. Sure, it may not be "ethical" but you'll probably not get sued.
posted by delmoi at 12:28 PM on December 2, 2005

occhiblu: It would be the company doing the scraping, not IIHAA who would be liable.
posted by delmoi at 12:29 PM on December 2, 2005

Yeah, but if the question is, "Would it be wrong?," then I think knowing that you might be contributing to an illegal action is worth knowing, even if you yourself can't get sued for it.
posted by occhiblu at 12:38 PM on December 2, 2005

(I mean, is the question "Would it be wrong?" or "Would I get in trouble?" They're different, and it's not entirely clear what IIHAA means.)
posted by occhiblu at 12:41 PM on December 2, 2005

"Would it be wrong?"
posted by If I Had An Anus at 12:43 PM on December 2, 2005

If you're into territory about wrong vs right rather than legality I'd have to say it depends entirely on the purpose of the site being scraped and the purpose for scraping it. Clearly it's information that the megacorp offers for free, the issue is how are they trying to gain value from that free information? If your scraping and re-presenting denies them ad views it's a pretty simple "bad." If it denies them development of their brand and goodwill, I'd still call that bad but a little less obviously. If it's something they have to do for legal purposes of some sort (let's say providing reporting on their company for the sake of SEC disclosure, though it's an inaccurate analogy) then you may not at all be hurting them through your scraping.

Does the re-presenting of this information elsewhere deny them some value in any way?

This sets aside the question of if it's wrong or not based on belief in a creator's complete right to determine how their creative product is presented, of course. By that standard this isn't even something you need to ponder.
posted by phearlez at 1:00 PM on December 2, 2005

(They're different, and it's not entirely clear what IIHAA means.)

IIHAA = If I Had An Anus.
posted by delmoi at 1:04 PM on December 2, 2005

If the company you would do this for is just using it internally, it wouldn't be a problem at all.

If they repackage it or provide copies to 3rd parties, that's when the trouble starts. You will need to talk to a copyright lawyer to make sure of where you stand in that case.
posted by voidcontext at 1:05 PM on December 2, 2005

I've done a bit of similar site scraping contract work of questionable legality. I dodged the issue by making my scraper identify itself as a bot, and following robots.txt rules just like any search engine web crawler does. I also made my scraper's output look as much like a search engine as possible, so if anyone complained, I could just point to Google and Yahoo's prior infringement and explain to them how to exclude my scraper with robots.txt.

None of that really answers the "would it be wrong" question, though. That really depends on your personal view of intellectual property issues.
posted by scottreynen at 1:07 PM on December 2, 2005
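
For readers wondering what "identify itself as a bot and follow robots.txt" might look like in practice, here is a minimal sketch in Python using only the standard library. It is not scottreynen's actual code. The bot name, the contact URL, the example.com addresses and the sample robots.txt are all made up for illustration.

```python
from urllib import request, robotparser

# Hypothetical bot name and contact URL; substitute your own.
USER_AGENT = "ExampleScraperBot/0.1 (+https://example.org/bot-info)"


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url, user_agent=USER_AGENT):
    """Build a request that announces itself as a bot via the User-Agent header."""
    return request.Request(url, headers={"User-Agent": user_agent})


# Made-up robots.txt that a site owner might publish to keep this bot out of /stats/.
SAMPLE_ROBOTS = """\
User-agent: ExampleScraperBot
Disallow: /stats/

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    for page in ("https://example.com/stats/players.html",
                 "https://example.com/about.html"):
        print(page, allowed_by_robots(SAMPLE_ROBOTS, page))
```

A real scraper would fetch the site's own robots.txt (RobotFileParser has set_url() and read() for that) and skip any page for which the check returns False. Whether doing so settles the ethical question is, as the comment above says, a separate matter.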

Just to be clear, I am not taking the freelance project.

Clearly it's information that the megacorp offers for free, the issue is how are they trying to gain value from that free information? If your scraping and re-presenting denies them ad views it's a pretty simple "bad." If it denies them development of their brand and goodwill, I'd still call that bad but a little less obviously....

I like this distillation of the issue. There is a single large ad on the source page (an example) but I doubt it's an important revenue source. The data is provided for the benefit of the entity's fans primarily, so I'd say the brand and "goodwill umbrella" covers it well. My repackaged data serves the same fanbase and helps to provide interest in the product, so in that sense I don't think I'm hurting anybody.

I dodged the issue by making my scraper identify itself as a bot...

Brilliant, thanks for this idea.
posted by If I Had An Anus at 1:22 PM on December 2, 2005

I think you already understood that it would be wrong when you asked your question. If you're still in any doubt HR3261 is pretty explicit about the legality of scraping and spidering. Really, the question you're asking is, "Should I care, particularly if consequences are unlikely?"

Ultimately it's a question of personal ethics, and the only real guidance anyone can give you is to suggest you spend some time thinking about it from the point of view of the company providing the source data.

Without knowing more about that company, I can only assume that if they provide the information 'for free', they are probably hoping to commercialize it in some other way, i.e. by developing a reputation, by attracting people who may choose to use another commercial product / service they provide, etc.

If your open source project uses their information without enabling / encouraging people to go directly to the source, then yes, I believe it's ethically wrong, unless it's clearly stated on the site that the information can be used in the way you intend. After all, the target company is at least paying for hosting that your project will be consuming, which means not only have they invested the time and expertise in creating the information, but could be financially disadvantaged, at least to a small extent, by having their information consumed in a way that provides no return for their effort.
posted by planetthoughtful at 1:32 PM on December 2, 2005

If you're worried about the ethics, as opposed to legality, there's a very simple way to resolve this problem: Contact the owners of the site, and ask how *they* feel about you doing it. If they consent, it's hard to see how it'd be wrong.

If they do not consent, or do not respond, then you're in a more murky area. But if they say, "hell yeah, do it!" then you're pretty secure in knowing it's not wrong.

It also depends on how your program works, how often you "scrape" the site, and what kind of load you put on their resources. If you end up adding significantly to their bandwidth bills or server load, I'd say it's wrong unless they do give permission.
posted by jzb at 2:01 PM on December 2, 2005
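
On the point about load: one common way to keep a scraper from hammering a site is to enforce a minimum delay between requests. The following is a rough Python sketch of that idea, not anything from the project under discussion. The Throttle class and the 10-second interval are illustrative assumptions.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests to one site."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_request = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval has passed since the last call.

        Returns the number of seconds actually waited.
        """
        now = self._clock()
        waited = 0.0
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request = self._clock()
        return waited


if __name__ == "__main__":
    throttle = Throttle(10.0)  # assumed interval; pick one the site can tolerate
    throttle.wait()            # first call returns immediately
    # ... fetch one page here, then call throttle.wait() before the next fetch
```

Calling wait() before every fetch caps the request rate at one per interval. Caching results locally so the same page is not fetched twice would cut the load further.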

There was a realty company in (Ontario?) that had its ass sued by MLS for scraping the MLS listings. Direct contravention of the licensing terms.
posted by five fresh fish at 3:00 PM on December 2, 2005

Really, the question you're asking is, "Should I care, particularly if consequences are unlikely?"

Of course, if the project really does make money, it becomes more and more likely that the company you're screenscraping will notice and care, since there's more money in it for them if they go after you...
posted by clarahamster at 3:38 PM on December 2, 2005

IIHAA = If I Had An Anus.

Heh. Yes, I know. I meant, it was unclear to me what, exactly, he meant to ask.
posted by occhiblu at 4:12 PM on December 2, 2005

Simple solution:

Make sure that they run and manage your scraper and not you. As such, they will be responsible for ensuring that their usage does not go against the conditions of use of the site being scraped.

A lot of sites allow their content to be used for non-commercial personal use - in that case, running someone else's scraper for one person's (non-commercial personal) use would be fine - but anything else would need permission sought by the person/company who wants this information ... and not you.

IANAL though.
posted by mr_silver at 4:29 PM on December 2, 2005

Is this really any different from what Google, Yahoo! and search engines do? As long as you honor "robots.txt", you're in the clear as far as I can see.
posted by SPrintF at 6:13 PM on December 2, 2005

Have you checked the Terms and Conditions page?

All copyright rights in the text, images, photographs, graphics, user interface, and other content provided on the Service, and the selection, coordination, and arrangement of such content, are owned by the NFL PARTNERS, as applicable among us, or their third-party licensors, to the full extent provided under the United States Copyright laws and all international copyright laws. Under applicable copyright laws, you are prohibited from copying, reproducing, modifying, distributing, displaying, performing or transmitting any of the contents of the Service for any purposes. Nothing stated or implied on the Service confers on you any license or right under any copyright of the NFL PARTNERS, or any third party.

The Service and the information contained in reference herein are for informational purposes only. Any reproduction, copying, or redistribution for commercial purposes of any materials or design elements of the Service is strictly prohibited, without the prior written consent of the NFL PARTNERS. Systematic retrieval of data or other content from this Service to create or compile, directly or indirectly, a collection, compilation, database or directory without written permission from the NFL PARTNERS is prohibited.
posted by AmbroseChapel at 11:07 PM on December 2, 2005

If that NFL page is the actual page intended to be scraped, I'd guess that if anyone of any size tried to scrape it, the NFL would sue the sh*t out of them. The NFL makes most of its money from media deals; they aren't going to let someone take it for free and try to use it in any sort of venture that will undermine their media partners. Even that page is presented in partnership with a media company (CBS).
posted by Good Brain at 12:07 AM on December 3, 2005
