Why is Facebook scraping the links in my private messages?
December 20, 2010 1:30 AM

ParanoiaFilter: Should I be worried that Facebook is scraping my links, in my own private messages?

Assuming you are on Facebook, you probably already know that when you put a link in the private message text field, the system automatically tries to grab the contents of the link. And for links that are sent to you in private messages, you can check this by right-clicking on a link: Facebook adds its own URL in front of it, as seen in the status bar.
Now, I understand why Facebook is scraping/tracking this on wall posts, assuming that it helps their system's algorithm bring relevant posts about what's popular to your News Feed. But why are they doing this in my private messages?

I know it's convenient for Facebook users to use the private message feature like email because they are already logged in, but I get paranoid when the link is personal/family related or when it's business-related. And I have no choice but to message some contacts through Facebook because they have their email addresses hidden on their profiles. Besides using trick.ly to password-protect the link (and SMS texting the password to the recipient), what else can I do to make my messages and links more private and safer on Facebook?
posted by querty to Computers & Internet (16 answers total)
Yes, Facebook records and analyses every single action you take on the site. Yes, you should be worried if you are doing anything that you're not completely comfortable with Facebook knowing about.

But the chances of them doing anything with that info beyond advertising to you or placing you in some demographic are relatively slim.
posted by cogat at 1:45 AM on December 20, 2010 [1 favorite]

nothing is private on facebook. to operate in any other way is to be disappointed. much like advice in relationship posts, listen to what mark zuckerberg is telling you, "People just submitted it. I don't know why. They "trust me" Dumb fucks."
posted by nadawi at 1:57 AM on December 20, 2010 [4 favorites]

The usability reason for doing this for private messages is that it gives a visual verification that your link points to where you intend it to point. For novice web users, sending a wrong link is a scary possibility.

It is technically possible to scrape them if those pages are open to anyone, but scraping interesting data from content outside of Facebook is a much more uncertain process than scraping interesting data from inside Facebook, because the pages you link have no known helpful structure for analyzing what data is what (or they have structure, but it can be any of the endless possible structures). If your documents are not blatantly about a competitor to Facebook, or part of a criminal investigation, I don't think they would ever bother putting human eyes to the task. They are able to scrape it, but there needs to be a plan for what exactly to look for in the content and what to do with the findings, or it is just noise.

Of course, you can just break the link so that facebook doesn't recognize it as a link anymore. '*ttp://docs.google.com/... , replace * with h.' could be enough.
posted by Free word order! at 2:01 AM on December 20, 2010

Why don't you just ask your friends for their email address?
posted by ActingTheGoat at 2:04 AM on December 20, 2010 [5 favorites]

They're just running your link through a redirector so they know who is clicking what at all times.
posted by rhizome at 2:11 AM on December 20, 2010

Can you explain a bit more what you mean by "scraping" here? If someone sends you a private message with a link in it, Facebook already knows the URL and would be able to see what it leads to whether you click on it or not. I imagine they're adding www.facebook.com to the front of the link just so they can tell that you've clicked on it (and then instantly redirect you), not because your act of clicking on it gives them access to any other kind of special knowledge about the linked site. The data they gather this way is useful to them somehow, or maybe they're just collecting it in case it becomes useful sometime in the future.
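
To make the "URL in front of the link" idea concrete, here is a minimal Python sketch of how a wrapped link can carry the real destination as a query parameter, and how to read it back out without clicking. The l.php path is mentioned elsewhere in this thread; the parameter name u, the function names, and the sample document URL are assumptions for illustration, not confirmed details of Facebook's system.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed redirector location and parameter name, for illustration only.
REDIRECTOR_HOST = "www.facebook.com"
REDIRECTOR_PATH = "/l.php"
PARAM = "u"


def wrap(destination: str) -> str:
    """Build a redirector link that carries `destination` as a query parameter."""
    query = urlencode({PARAM: destination})
    return f"http://{REDIRECTOR_HOST}{REDIRECTOR_PATH}?{query}"


def unwrap(link: str) -> str:
    """Return the destination inside a wrapped link, or the link itself if it is not wrapped."""
    parts = urlsplit(link)
    if parts.netloc != REDIRECTOR_HOST or parts.path != REDIRECTOR_PATH:
        return link
    values = parse_qs(parts.query).get(PARAM)
    return values[0] if values else link


wrapped = wrap("http://docs.google.com/some-doc?id=1")  # made-up document URL
print(wrapped)          # the link as it would appear in the status bar
print(unwrap(wrapped))  # the original destination, recovered without clicking
```

Note that the destination is visible in the wrapped link itself, so whoever generated the link already knew the URL; the click only adds the fact that someone followed it.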

The way they grab material from the site you've linked in order to display thumbnails and a description is different, but I'm not sure what the issue is with that. They're just excerpting part of the public web, the same part as they could get by just following the link.
posted by A Thousand Baited Hooks at 2:12 AM on December 20, 2010

This isn't exactly new -- Gmail (which you appear to use) has been scraping the content of private email in order to display contextual advertising for years. It's just a question of whether you trust Facebook to handle your private communications responsibly any less than Google, your ISP, or the feds.
posted by Rhaomi at 2:57 AM on December 20, 2010

"They're just running your link through a redirector so they know who is clicking what at all times."
Actually, I think Facebook follows redirectors like bit.ly.
"If someone sends you a private message with a link in it, Facebook already knows the URL and would be able to see what it leads to whether you click on it or not."
I think the problem may be something posted online, but only meant for a few people to see. The obvious example would be prototypes of new websites. If the URL isn't meant to be long-lived, then it's not really a big deal if the URL itself gets stored; what's annoying is the fact that they actually scrape the website and extract the text.

The other annoying thing is that they scrape the text badly. They don't show an actual image preview like google does now for search results, it's just some scraped text.

I would just keep using that password protecting redirector, if you include the password in the message, that would probably be fine.
posted by delmoi at 4:22 AM on December 20, 2010

Occasionally, if the link leads to a virus or a phishing page, they can make the link lead to an error page instead too, I believe.
posted by ferdinandcc at 5:06 AM on December 20, 2010

"you can check this by right clicking on a link and it will add their own URL in front of it"

Putting links through a redirector doesn't necessarily mean that facebook scrapes the content. The two are completely separate things. They could theoretically run a click-tracking redirector and never read one byte of content at the destination. All that the l.php script does is record the click and then send a redirect header to tell the browser to go elsewhere.
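
As a rough illustration of the point above, a click-tracking redirector can be sketched as a tiny WSGI app in Python. This is not Facebook's code: the parameter name u, the in-memory click_log list, and the use of the client address as the "who" are all assumptions made for the sketch. The thing to notice is that the handler never opens a connection to the destination.

```python
from urllib.parse import parse_qs

# Stand-in for whatever storage a real tracker would use (assumption).
click_log = []


def redirector(environ, start_response):
    """WSGI app: record the click, then answer with a redirect header.

    It never fetches or reads the destination page.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    targets = query.get("u")  # "u" is an assumed parameter name
    if not targets:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing destination"]

    destination = targets[0]
    # Record who clicked what; a real system would presumably use the logged-in user.
    click_log.append({"who": environ.get("REMOTE_ADDR"), "url": destination})

    # The Location header tells the browser to go elsewhere.
    start_response("302 Found", [("Location", destination)])
    return [b""]
```

To try it locally, the app can be served with the standard library's wsgiref.simple_server.make_server. Any scraping of the destination would have to be a separate piece of code that this handler does not need.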

(Please note that I'm not claiming that Facebook doesn't also scrape content, just that seeing a redirector in front of a link is in no way proof of anything.)
posted by Rhomboid at 5:09 AM on December 20, 2010 [1 favorite]

You can not use facebook. Facebook is probably the least secure medium you could possibly select, other than maybe posting on a myspace wall. If you're not comfortable with walking around Times Square wearing a billboard with the url on it, you should not be transmitting it over facebook.

If you need serious security, look at Hushmail or running your own secure server over an SSH tunnel.

If you're just worrying about facebook looking over your shoulder, but don't really have anything sensitive, just switch to regular email.

Facebook is not to be trusted. Ever.
posted by T.D. Strange at 5:59 AM on December 20, 2010

As long as you aren't doing anything overtly illegal where they would have to notify the police, Facebook isn't concerned with you, they are concerned with people like you. The money is in understanding what people like you are interested in, what sites you go to, and how they can effectively provide mined demographic information to their clients. A link isn't important; making a cross-section of who else points to the same link and what their common interests are is much more profitable.

There is no free social service; if you think there is, then chances are you are what is being sold.

As for personal safety/paranoia, a better concern would be: Have I positioned myself in pictures, videos and wall scribblings, be it on my own site or my friend's site, such that potential employers, admissions boards, and/or business partners would think twice about my expertise, judgement or what have you?
posted by Nanukthedog at 6:27 AM on December 20, 2010 [1 favorite]

I don't do anything on Facebook that I don't want the world, specifically marketers, to know about. I use FB to see what other people are up to (mostly Farmville), and I occasionally post links that are interesting or funny, but I try to keep it bland.
posted by theora55 at 6:43 AM on December 20, 2010

querty: "ParanoiaFilter: Should I be worried that Facebook is scraping my links, in my own private messages?"

(I don't work at FB, but I develop a lot against their platform and work with their data quite a bit)


From a practical standpoint, Facebook is certainly not the only company to be relentlessly tracking data about you. Google, as Rhaomi noted above, is doing the exact same thing with every bit of data that travels through their servers. As big as FB is, Google is still looking at orders of magnitude more bits about you than FB. Then there's all the URL shorteners. And that's not even touching the ad folks who have been tracking your browsing habits since the web started. Communities like MeFi, which are very protective of user privacy, are far and away the exception rather than the rule.

If your honest answer to this question is "yes", you should pretty much unplug from the internet now. You've already been tracked, profiled, and tagged dozens of times over by now.

On the other hand, Nanukthedog's perspective is spot on. Aside from protecting themselves from illegal activity happening on their servers, there's simply too much data there to care about the micro level. You, as they say, are not a special flower. You're just one among billions of data points used to figure out targeted marketing.

At a technical level, Facebook is diligently trying to catalog the entire web and map relationships sites have not just with each other, but with users. They're pushing people to adopt a standard called Open Graph, which is a series of META tags pages can use to declare themselves more accurately than relying on page scraping. You can read about it here. If a page does not have Open Graph tags, Facebook's bot kind of sucks at figuring out what it's supposed to be, so if you share a link on FB and it can't pull in an image or text that's why.
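
For a sense of what those tags look like to a bot, here is a minimal Python sketch that pulls og: properties out of a page using only the standard library. The sample HTML, the class name, and the function name are made up for illustration; og:title, og:type, and og:image are property names defined by the Open Graph protocol. This is not Facebook's actual crawler, just the general idea of reading declared metadata instead of guessing from page content.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop, content)


def extract_open_graph(html: str) -> dict:
    """Return a dict of Open Graph properties declared in `html`."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Made-up sample page for illustration.
SAMPLE = """<html><head>
<title>Example</title>
<meta name="description" content="Not an Open Graph tag" />
<meta property="og:title" content="An Example Page" />
<meta property="og:type" content="article" />
<meta property="og:image" content="http://example.com/thumb.png" />
</head><body>...</body></html>"""

print(extract_open_graph(SAMPLE))
```

A page with no og: tags yields an empty dict here, which is the situation where a bot has to fall back on guessing from the page content.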
posted by mkultra at 7:53 AM on December 20, 2010 [1 favorite]

"Should I be worried that Facebook is scraping my links, in my own private messages?"

If it's on Facebook, it's NOT private. So no, you don't need to be worried, because Facebook is not an email system therefore you have no private messages; and even if it was email, email isn't particularly secure.
posted by blue_beetle at 8:53 AM on December 20, 2010

Your definition of "private," in the sense of "private message," is not the same as Facebook's. Facebook's definition of "private message" is simply that the message is not viewable by everyone else on your friends list.

According to Facebook, everything that you do, say, click, and type on their site is their property. (When you delete your data, it is not actually deleted. It remains in Facebook's database, but their system hides it from your view.)

Your private messages do not belong to you. They are property of Facebook. This is formalized in the Facebook TOS, and results in things like links in "private" messages being cataloged and used to fine-tune Facebook's data mining.

If this bothers you, then you need to leave Facebook. I did.
posted by ErikaB at 11:23 AM on December 20, 2010
