Explain why big data is scary.
June 18, 2017 5:13 PM

In the Amazon eating Whole Foods thread there are a lot of negative responses to the buyout. I agree. But after discussing it in real life, I'm not sure why, beyond being raised to distrust big companies and monopolies. Everyone I talked to was dismissive of the buyout. Who cares if a luxury grocery store is purchased by a retail giant with great customer service but a poor labor history?

Amazon absorbing Whole Foods seems like a bad idea to me. I can explain why Walmart or 90s Microsoft or telecom monopolies are bad for everyone but the people invested in the company. I feel that Amazon, Facebook, and Google monitoring all of my location data, search data, and flight history is bad, but I don't think I could explain why coherently. I use Adblocker and Ghostery, but mostly for usability reasons, as modern ads cripple my work computer.

Anyway, I'm looking for a primer on why I should be afraid of this stuff. All I can think of are selfish reasons, like past criminal convictions showing up in searches of my name, which has happened and caused some unnecessary conflict in my family. But those are on the public record anyway. I guess I'm finding it difficult to see the big picture. Is the concern mostly about issues like doxxing, or surveillance and harassment of targeted people/groups?

Thinking of this comment:

You need to be connected to their wifi, and not using a VPN. Which means it will only work on the foolish.

Well, obviously fools, people who are less technologically literate, and people who have no other options for various reasons deserve to be exploited by Amazon in every possible way.
posted by Copronymus at 9:04 AM on June 16 [14 favorites]


Seriously, how many people use VPNs on their phones? I've used them on my PC while torrenting, but you can't honestly say a VPN matters when you're looking up something while waiting in line at Target.

As for this comment:
It used to be you'd go into a store and say oh cool, free wifi! I don't have to run up my cellular bill! Then you realized that the free wifi was nothing but a way to gather information about your browsing habits, and that actually surfing the net like you always do is painfully slow and useless. If you go to an airport, it's basically impossible to get to any web site other than "next time you're here, fly with United!"
posted by Melismata at 9:09 AM on June 16 [5 favorites]


I know airport wifi is slow, but what is the issue? The airport tracks what people access via its wifi and sells that data to Amazon and Facebook? I know it's a shitty thing to do, but I don't know why it's scary or will lead to dystopia any more than a million other things. Or is this just one aspect of the decline of our society, one symptom of the increased surveillance apparatus?

TL;DR: Why is big data scary?
posted by kittensofthenight to Computers & Internet (21 answers total) 34 users marked this as a favorite
Well, just as an example, there was a phase about a year ago when five of my coworkers/friends were simultaneously pregnant (we joked there was something in the water). So I was planning a lot of baby showers and looking at onesies and toys on Amazon. Then I started getting targeted ads for prenatal care. Pretty innocuous, but it hit home just how easy it is for an algorithm to make assumptions about a person based on their browsing habits, and how easy it is for those assumptions to be dead wrong. It is a small step to go from "algorithms create a detailed but incorrect picture of a person" to "algorithms are used for surveillance purposes."

I guess it comes down to how much you trust your government to protect your rights vs. trample them in the name of national security.
posted by basalganglia at 5:26 PM on June 18, 2017


It's new, it's hard to understand, and it seems to be the precursor of panopticon control of our lives. There have been some spookily effective stories: in one well-reported case, Target began sending coupons oriented towards new mothers to a teen before she even knew she was pregnant, which rather upset her parents. There is so much data captured about all of us that one saving grace seemed to be that it was too much for any one organization to correlate; now fast, smart algorithms can take that data and interpret more about us than our closest confidants could. How will that information be used? For our benefit, or for profit?
posted by sammyo at 5:28 PM on June 18, 2017


Why is big data scary? Because it is not neutral or objective but pretends to be. I enjoyed reading "Weapons of Math Destruction" for reasons why you should be skeptical of big data. Humans make decisions about employment, insurance, and what people should eat or buy, but hide behind "algorithms". It has great potential to exacerbate inequality. I have not read this book, but I think the argument may be similar: "Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy."
posted by turtlefu at 5:49 PM on June 18, 2017


People can only say that there's nothing to fear if you have nothing to hide as long as they trust that the government and all of these companies are aligned with their own beliefs. The minute you have a government that is prosecuting people for doing things that you believe are unproblematic, you have something to hide and should fear big data uncovering your habits. For example, if your religion becomes illegal, it's a problem for you that Amazon knows you have a habit of purchasing books relating to that religion, and that Facebook knows you followed influential people from that religion, and that's assuming your religion wasn't even listed directly in your profile.

What if the USA decides to crack down on people protesting and organising against Trump? Anyone who has been participating in the metafilter anti-trump threads, anyone who has 'liked' anti-Trump posts on social media, and anyone who has ever been a member of Facebook political protest organising groups is suddenly a target, and that sort of evidence could conceivably be used against them in the event that they were being tried for something they did in the real world.

Or in a more corporate framework, what if your health insurance rates eventually get tied to what companies can infer about your health based on the data they collect about you? For one thing, errors creep in (what about the person who researches heart disease, so their browsing habits look too similar to someone who is ill or has a family history of it?). For another, even if it's perfectly accurate, companies could actively punish people who protect their privacy: if they don't have access to data about you (because you aren't on social media, use a VPN, whatever), then you have to pay the highest possible rate. And then anyone who is vulnerable for the previously mentioned reasons ends up being financially pressured to risk the exposure of their information to all actors.

And you might think that you personally are unlikely to be targeted because you are not in any minority and do not participate in any practices that are considered problematic by the wider public, but if the only people who are agitating for privacy measures and less corporate access to our data are persecuted minorities, that's a huge burden to put on their shoulders. Plus that's exactly the segment of society that doesn't tend to have the power to get people to listen to them and to get laws changed.

Finally, a really big problem with big data is that the algorithms are often black boxes even to the people who deploy them. So right now it's hard enough to get health insurance companies, for example, to reconsider decisions they have made about you, but if those decisions are made on the basis of big data algorithms that they don't even understand, they are just going to refuse to reconsider them, even if it's apparent to you that they must have made a mistake.
posted by lollusc at 5:55 PM on June 18, 2017




Hi! Big Brother here!
First off, I care about you as a person - but I don't care about you... I care about everybody like you. What I mean is, sure - on a personal level, if I watched you trip, I'd rush to your aid and help you up. But I don't care whether you listen to Post Modern Jukebox while you are surfing in retail space - unless everybody like you (for various levels of 'like you') listens to PMJ while they are in our retail space. Then, in that case, if I know the products that you like (er... the products that people just like you like), I can buy adspace as preroll to PMJ. Case one.

But let's stop there for just a second. Who is like you? ...Well, lots of people are - sure, there are all kinds of special snowflake parts about you, but from a big data perspective, the question is: if I feed a model all kinds of random crap about you and about your peers, what different groupings does a computer model put you in - AND - which of those groupings can I best activate to extract wealth from you? Did your purchase of Skippy Peanut Butter, a 12-pack of razors, and a blue hand towel just tell me that you are in the top 10% of people likely to buy plungers? Do I generate my best margin by offering you that plunger at a discount, or can I rely on you to make that purchase anyway, so I don't need to offer you the discount (and instead serve an in-store advert to the next 10-20% group, who are more on the fence about buying plungers today)? Now, there's no obviously logical reason why those 3 items would imply that you'd be most likely to purchase that plunger - but the reality is that those 3 items are solid markers that the machine identifies as a pattern.

Some models are descriptive, and those would be used to describe 'why' you do something. Others, like the example above, are predictive, and those show a propensity to do something. There's a fine art to relating the two, and you can build out some weird sub-models to test those relationships...
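To make the predictive half concrete, here's a minimal sketch of a propensity model in Python - the basket features, the plunger correlation, and the data are all invented for illustration, and real retail models fold in thousands of signals:

```python
# Toy propensity model: score each shopper's likelihood of buying a
# plunger from three basket markers, then bucket shoppers into deciles.
# Features, correlations, and data are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Columns: peanut butter, 12-pack of razors, blue hand towel (1 = bought)
X = rng.integers(0, 2, size=(1000, 3))
# Pretend ground truth: the markers really do correlate with plunger
# purchases (the machine never needs to know *why*).
y = (X.sum(axis=1) + rng.normal(0, 1, 1000) > 2).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]  # propensity to buy a plunger
deciles = (scores.argsort().argsort() * 10) // len(scores)  # 0 = least likely

sure_things = deciles == 9                      # will buy anyway: full price
fence_sitters = (deciles >= 7) & (deciles < 9)  # these folks see the plunger ad
print(f"no coupon needed: {sure_things.sum()}, worth a nudge: {fence_sitters.sum()}")
```

The margin logic lives entirely in that last split: the model's job isn't to find plunger buyers, it's to find the cheapest nudge that changes behavior.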

Now that we've described 'you' in terms of the groups that you shop like, let's throw out the notion that the items you buy belong in hard and fast categories. This is where you start to see affinity modeling bleed into recreating groups of industries, cars driven, even the products and purpose of trips to a given store. Like those Cheaty McCheatenstein Cheaters who convince you to purchase a $0.99 extra life or whatever crap in Candy Witchville, the goal here is to create what feels like a symbiotic relationship, filled with very tightly constructed feedback loops, to get you back into a given store more frequently (or online, or wherever). Any way you slice it, the goal is ultimately to steer you to the channel that costs the least to serve and/or earns the most - whichever maximizes the margin of your relationship...

Anyway, Post Modern Jukebox... I bought adspace there to attract you so you'd buy plungers... except it wasn't really about the plungers, because I could rely on you buying the plungers - I did it so you'd buy the granola bars. For that segment you were in the 20-30% decile, and this nudge will get you to buy them at my store...

So the other thing I might be doing is figuring out which items you've clicked on while browsing online, or which pages you've visited me from in the past... I'm just looking for something that marks you as a solid prospect - maybe figuring out which items you considered adding to your cart and then forgot about and walked away from. (Guess who's getting about 10,000 ads for that Black and Decker toaster oven, following you from CNN to Perez Hilton and every single linked site except that weird blue Metafilter that I don't advertise on? Of all the items that people like you looked at, the one that most of them bought was that toaster oven - so you'd better believe I'm going to serve you that ad until the day you're sitting reading your morning news feed in your boxer shorts and say to yourself, 'you know what, I DO need that toaster oven from Black and Decker!')

But let's jump way way back down to Amazon and Whole Foods... what did Amazon just do?
Amazon just got a grocery store that caters to folks who shop heavily online, folks who know produce, expect quality produce, and are willing to spend money at Whole Paycheck Foods. They also got a cross-country distribution network hellbent on maintaining the highest-quality produce, and experts at selecting quality produce. They bought themselves regional distribution centers (individual stores) where folks select the best produce and box it up for a given customer... They bought mother fucking hippies and hipsters who you no longer need to actually see to be shamed for your conventional purchase - because those folks can now put your produce in some seemingly green distribution container and Amazon will ship it from their store to your home. That hipster is now just a box filler.

I can go on... but... I think I've put up enough to chew on.
posted by Nanukthedog at 6:05 PM on June 18, 2017


Oh, and I know you are in my store - or if not you... I know by checking your extensions, browsers, and installed apps that you are likely 1 of 3 people based on updates and so on... So yeah, I know it is you. Please please please, buy your order with a credit card so I can match your address and first name and have basically everything I need to market to you F-O-R-E-V-E-R. My ads will follow you to competitor stores, to Facebook, to LinkedIn curated sponsorship... everywhere... I know you now... we have a relationship... and now that I know you, it's time you did your part and started buying the things that I am going to hound you to purchase.
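For the curious, a minimal sketch of that kind of cookie-free matching - a toy fingerprint hashed from a few client attributes; real trackers combine dozens more signals (canvas rendering, installed fonts, screen size), and every value below is made up:

```python
# Toy browser fingerprint: hash a handful of client attributes into a
# stable ID that survives cleared cookies. All values are invented.
import hashlib

def fingerprint(user_agent, extensions, timezone):
    material = "|".join([user_agent, ",".join(sorted(extensions)), timezone])
    return hashlib.sha256(material.encode()).hexdigest()[:16]

monday = fingerprint(
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/54.0",
    ["ghostery", "adblock"],
    "America/New_York",
)
friday = fingerprint(
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/54.0",
    ["adblock", "ghostery"],  # same set, different order: same ID
    "America/New_York",
)
print(monday == friday)  # True - no cookie, no login, still "you"
```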
posted by Nanukthedog at 6:12 PM on June 18, 2017


So I was planning a lot of baby showers and looking at onesies and toys on Amazon. Then I started getting targeted ads for prenatal care.

Further to this, and just as one of the problems of mass commercial surveillance: Bulk collection of data is much cheaper than ensuring its accuracy. When they get it wrong, you have basically no recourse.

I don't know if this is 'scary' to you, but I recall an article by a woman who had been placed on a bunch of marketing lists (pregnant women are a gold-mine, and new parents tend to be very brand- and store-loyal). Then she miscarried - but continued to receive marketing bumf regarding the pregnancy. And then, in time, regarding the presumed child. For years.
posted by pompomtom at 7:05 PM on June 18, 2017


Big Data is used to persuade you to buy more. If you are ad-resistant, fine. If not, you end up buying more stuff. Big Data might be used to decide how much you're willing to pay for stuff - at Amazon, at Whole Foods, for insurance, for clothing - so you aren't getting the best deals. Big Data might be used by an employer to decide whether to hire you, based on credit score, etc.
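To sketch how that "how much you're willing to pay" piece could work - a purely hypothetical illustration, with invented segments, multipliers, and profile fields, not any retailer's actual pricing logic:

```python
# Toy personalized pricing: quote each shopper a price keyed to their
# inferred willingness to pay. Segments and multipliers are invented.
BASE_PRICE = 10.00

def quoted_price(profile):
    price = BASE_PRICE
    if profile.get("prime_member"):      # sunk subscription = less price-sensitive
        price *= 1.10
    if profile.get("comparison_shops"):  # seen on competitor sites: discount to win
        price *= 0.90
    if profile.get("urgent_need"):       # searched "broken dishwasher" an hour ago
        price *= 1.25
    return round(price, 2)

print(quoted_price({"prime_member": True, "urgent_need": True}))  # 13.75
print(quoted_price({"comparison_shops": True}))                   # 9.0
```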

Whole Foods bought and closed Wild Oats, which was a nice place to shop. Diversity of vendors means more choices and maybe better prices. Amazon is a hard bargainer, which is ok, but maybe not so great for the book industry or other markets. Amazon is notorious for treating employees poorly.
posted by theora55 at 10:22 PM on June 18, 2017


Why large companies collecting data worries me:

- The larger and more multi-faceted the company, the more kinds of data they collect and the more complete the profile they build of you.
- You don't know to what extent their data on you is correct. There are probably errors in it. You can't check, because you don't get to see it.
- Data can get stolen or subpoenaed. You don't know where your data will end up.
- You don't know how that data will be used in the future.
- You don't know which things will be frowned upon or illegal in the future, so you don't know which parts of your profile will be limiting or incriminating you in the future.

A concrete example: I want to be able to look up information about suicide because I'm worried about a friend or family member, or just curious... without running the risk that I'll see my insurance premiums rise in the future, or be turned down for a job, because somewhere in the data is a marker for a heightened risk of suicide.
posted by Too-Ticky at 2:58 AM on June 19, 2017


Adding to my previous comment:
You don't know where your data will end up and how it will be combined with other data, like your physical location at all times (your cell phone provider knows that), your medical data, your tweets or facebook posts, your emotional responses to things you see on the net, your conversations, the media you consume...

To me, freedom includes the freedom to be unobserved every now and then, instead of being watched at every turn. Our privacy is definitely under attack.
posted by Too-Ticky at 3:04 AM on June 19, 2017


[A couple deleted. Sorry folks, but if you want to chat and debate with each other, do that in the blue, please. This is very borderline chatfilter, but concrete examples and links to explanatory articles, books, and resources that explain "why big data is scary" are the best way to go.]
posted by taz at 4:42 AM on June 19, 2017


Even if you don't care who "knows" your information, Big Data is used to make choices about you, to shape the opportunities available for you online and in the world.

How should we screen job applicants? For example, if you have multiple convictions for fraud and assault, we generally think it's okay for managers to say, "I want to call in the other guy for this interview; I think this shows a record of poor decision making." But what if your recruiter knows that people who install non-default browsers on their machines stay at their jobs longer, and so passes you over because you submitted your resume using Internet Explorer? Stupid you; your past self should have known that correlation.

Should we send people to prison because their parents also went to prison? Tim Brennan, a statistics professor, created an algorithm in 1989. The data model originally had a laudable goal: to see which parolees would need extra supervision to stop them from reoffending. That model needed data on things like whether one or both of your parents had gone to prison, whether you had strong or weak family ties, and whether you were socially isolated - because all of those things are somewhat correlated with a person's recidivism risk. But these types of algorithms are now used for more than deciding whether you need extra supervision. Judges are using them to determine which program criminals are sent to - high security? Work farm? Supervised release? - and to determine the length of sentencing, or whether someone should be incarcerated at all. Oh yeah, they also give black folks higher risk scores than white folks who committed the same crime. Nor is your lawyer always allowed to see how the algorithm works or what it's doing. You just get a score.
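The real models are proprietary black boxes, but a purely illustrative toy score - invented weights, invented features, not Brennan's actual algorithm - shows how proxies punish people for things they never did:

```python
# Toy recidivism-style risk score. Weights and features are invented
# to illustrate the mechanism; the real algorithms are not public.
def risk_score(parent_incarcerated, weak_family_ties,
               socially_isolated, prior_offenses):
    score = 2.0 * prior_offenses                  # the one factor about YOUR acts
    score += 3.0 if parent_incarcerated else 0.0  # punishes your parents' past
    score += 1.5 if weak_family_ties else 0.0
    score += 1.5 if socially_isolated else 0.0
    return score

# Two people, same crime, same record; different families, different fates.
a = risk_score(parent_incarcerated=False, weak_family_ties=False,
               socially_isolated=False, prior_offenses=1)
b = risk_score(parent_incarcerated=True, weak_family_ties=True,
               socially_isolated=True, prior_offenses=1)
print(a, b)  # 2.0 vs 8.0 for the identical offense
```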
posted by Hypatia at 5:43 AM on June 19, 2017


If you're not doing anything wrong, is it OK if I come over and watch you and your family and friends poop? And sell the video to anyone willing to pay? They'll probably just use that information to sell them various types of targeted snake oil, to create and then exploit their individual pooping-related insecurities, but you know, it's out there for everyone to use as they see fit.

But say you don't care. Say you like people watching you poop, or just don't care, or you think it's somehow qualitatively different as long as it's machines and not individual humans doing the watching. Say you trust whoever the hell is buying your toilet cam data to only market the best of pooping accessories to you and yours.

Are you still OK with them doing it to people who do care? Do you know anyone who wouldn't want to share that? Anyone who would be horrified that their private activities are being made public without their consent?

This isn't just speculation, except the poop focus. People are already collecting and predicting intimate details about your life and using it to exploit your weaknesses. Go read Pam Dixon's congressional testimony from 2013 (PDF) about the types of information already being sold by data brokers on the open market, and it's a lot worse than just pooping habits. They're already selling lists of people who've been victims of violent crimes, of people with serious medical conditions, cognitive delays, personal problems, whatever else they think they can glean from the data they've collected.

You might be OK for now and not seeing the effects, but thanks to some shitty data mining, a lot of con artists believe that there's a man in his nineties living in my house, and it's just astonishing the sheer amount of con artistry that arrives here targeted at him: scammy services advertised so they look like bills (particularly to someone statistically likely to have low vision and possible mental deterioration), alarmist fearmongering, outright scams. If you get old someday, there will be predators waiting for you.

And if you're familiar with the concept of redlining, you know it's a practice in which insurers, banks, and others would skirt discrimination laws by not directly discriminating against people based on protected categories, but on factors that strongly correlated with those categories. So you wouldn't refuse loans or charge exorbitant insurance rates or deliver suboptimal services to certain people because they were black, but because they "just happened" to live in an area where most people were black, like it was all some big coincidence. But because there were humans directly involved in that decision making, people were able to show intent and use that as leverage to push back.

Computers don't form intent, though. Computers are not racist or sexist. They are objective, and they learn without personal biases, because they're not people. What they do, though, is learn from systems in which those bigotries are all baked in. They don't know what race or gender or anything is, but they can predict that people who listen to this sort of music, who live in these neighborhoods, who buy these foods, use these services, drive these cars are more or less likely to do specific things, like default on loans, get fired, get arrested, declare bankruptcy, develop chronic health conditions, die young, have lots of kids, or join the military.
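Here's a minimal sketch of that proxy effect on synthetic data: the model below is never shown the protected attribute, but a correlated zip code carries the historical bias straight through. Everything about the data is invented.

```python
# Toy proxy discrimination: train a lending model with the protected
# attribute withheld, and watch zip code carry the bias anyway.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, n)                     # protected attribute (never a feature)
zipcode = ((group == 1) ^ (rng.random(n) < 0.1))  # segregation: zip tracks group ~90%
income = rng.normal(50, 15, n)                    # same distribution for both groups

# Historical approvals were biased against group 1, independent of income.
approved = ((income > 45) & ((group == 0) | (rng.random(n) < 0.5))).astype(int)

X = np.column_stack([zipcode, income])            # note: group is NOT a column
model = LogisticRegression().fit(X, approved)

for g in (0, 1):
    rate = model.predict(X[group == g]).mean()
    print(f"predicted approval rate, group {g}: {rate:.2f}")
# The model reproduces the historical gap through the zip code proxy.
```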

So by unleashing these learning systems onto mass amounts of surreptitiously collected data, we're not only further entrenching existing biases and stereotypes, but we're creating new ones as well, as these models often discover new correlations. Maybe people whose parents are both immigrants from different countries are more likely to be afraid of spiders, maybe people adopted in infancy are more likely to purchase cooking spray, and people who have conflicted relationships with their siblings buy an unusual number of yellow things. Or more relevantly, maybe certain demographics make more insurance claims, get robbed more often, get sick more, are in more weather related traffic accidents, will accept lower salaries, get arrested more, get convicted more, change jobs more.

And it doesn't matter if you individually don't fit those models. Big data doesn't have the time or interest to clean up its inaccuracies. If you're lucky, your perfect driving record might mitigate the effects of your being in a demographic that's been painted as high risk, but you'll never be on even ground with someone born to a demographic that the machine has decided is lower risk.
posted by ernielundquist at 6:39 AM on June 19, 2017


Quick revisit...
One other thing: depending on what I'm training a machine to do (Machine Learning), there are two things to look out for - False Negatives (where you are accidentally excluded from a list I want you on) and False Positives (where you are accidentally included on a list you shouldn't be on). The goal is to minimize the observations that fall into those pools, but let's think about what those observations generally are: people.

So you get denied a loan because of a data anomaly (we'll call this one a false negative), or you get offered a loan that is irresponsible because you can't repay it (a false positive). The first kind is a decrease in my bottom line, so I work to reduce it. Quite likely that produces more false positives - which I can also monetize, by selling the debt once a different model flags the borrower as likely to default, or once they actually start defaulting on the loan... Point being: the machine maximized my profit, and quite likely ruined someone's life in the process.
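In confusion-matrix terms, with invented counts:

```python
# Toy confusion matrix for the loan example. The lender tunes against
# false negatives (missed profit) and can even monetize false positives
# (bad loans resold as debt). All counts are invented.
from collections import Counter

# (model_says_lend, borrower_actually_repays) for a batch of applicants
outcomes = ([(True, True)] * 800 + [(True, False)] * 90
            + [(False, True)] * 40 + [(False, False)] * 70)

counts = Counter(outcomes)
false_neg = counts[(False, True)]  # good customers turned away: lost profit
false_pos = counts[(True, False)]  # irresponsible loans made: debt to resell

print(f"false negatives (the lender's obsession): {false_neg}")
print(f"false positives (someone's ruined life):  {false_pos}")
```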
posted by Nanukthedog at 6:51 AM on June 19, 2017




Last night I searched for "English hats" because I'm headed to England for the aren't-we-just-so-quaint fancypants Henley Regatta. This morning I've received three emails so far, offering me "bespoke" bullshit English clothing. Pain in the ass if nothing else.
posted by JimN2TAW at 9:40 AM on June 19, 2017


Good news, JimN2TAW: generally, abandoned-cart marketing stops after about 2 weeks, and search marketing - if you don't routinely search for it - peters out in a month or so... Until then, aside from unsubscribing and blocking them, you'll need to hold the stirrups tightly, since marketers want to ride their expected cash pony.
posted by Nanukthedog at 12:30 PM on June 19, 2017


Read any and all of journalist Carole Cadwalladr's long articles on how big data influenced the UK's Brexit vote decisively: it can undermine the sovereignty of elected democracy, she argues/demonstrates (you decide!).
https://www.theguardian.com/profile/carolecadwalladr
Anything from Feb 2017 onwards - e.g. the pieces on Robert Mercer, Nigel Farage, Arron Banks, the great British Brexit robbery, and who regulates elections - are all good articles, with little duplication.
http://journalisted.com/carole-cadwalladr?allarticles=yes
if you prefer journalisted
posted by maiamaia at 2:08 PM on August 28, 2017


As usual, I left out the links in the argument. Okay: it makes some people too rich and too powerful in the realm of the internet; there are too few of them, so they all know each other and get together; but worst of all, it concentrates the power of deciding what you get to see in half a dozen people's hands. So look at all the research on how Facebook affects whether you vote, and sells advertising that it claims will influence how you vote. Further, the law has not evolved to deal with this, so what is absolutely illegal in the spirit of the law is not illegal in the letter, and punishment comes as puny fines. All it takes is some people with that power deciding they want to use it...

Here is a fine argument by Cloudflare's founder on why their refusal to help Nazis exist on the internet is a bad thing.

Finally, you need to understand the idea of sovereignty in a country, which is like free will in a person: not being controlled or misled by another, but making your own mind up. So when a corporation lobbies a government and is heard over the electorate, that's a death of sovereignty, and of democracy. Most people haven't a clue what that is, because it's a whole collection of Greek and modern stuff (in the UK): the right to assembly and protest, juries of ordinary citizens, free speech, unions, and a whole bunch of things that are part of how the machinery actually works. Most British people don't get what's wrong with Trump and Russia, because they don't see that it's about sovereignty; where the law actually draws the line is one thing, but the reason the line is there is what makes it matter - otherwise it's just a meaningless quibble, not a problem. I'd say democracy is a philosophy of law.
posted by maiamaia at 2:15 PM on August 28, 2017


Also, Carole Cadwalladr found that for Google searches of 'was the holocaust a lie?', the top result and nearly all front-page results were sites that 'proved' it was... the articles seem to have disappeared, but it dominated the headlines in the UK for a week.
https://www.theguardian.com/commentisfree/2016/dec/11/google-frames-shapes-and-distorts-how-we-see-world
posted by maiamaia at 2:20 PM on August 28, 2017

