What does my online user data look like?
June 25, 2024 7:06 PM

I know social media platforms collect a dossier of info about each user, to help them target ads, suggest videos, etc. What does that info actually look like?

Here's how this data is depicted in the movie The Social Dilemma. How realistic is that?
Can employees at YouTube, Facebook, TikTok, etc., generate a report about a specific user?
If so, what stats would be there, and how would it be organized / formatted?
Are there any online examples of authentic internal user data from social media sites?

I'm very interested in this whole topic, so I welcome broad answers and tangential information as well!
posted by nouvelle-personne to Computers & Internet (7 answers total) 7 users marked this as a favorite
 
This story from The Markup gives you some flavor.
posted by humbug at 7:37 PM on June 25 [2 favorites]


You can look this up for yourself. Here is your data from:

Facebook
Google
TikTok
posted by saeculorum at 7:37 PM on June 25 [4 favorites]


Can employees at YouTube, Facebook, TikTok, etc., generate a report about a specific user?
If so, what stats would be there, and how would it be organized / formatted?


The short answer to the first one is absolutely yes. The longer answer is, it depends on the employee, and what the definition of "a report" is. Which leads to your second question.

"What stats would be there" is a very broad question, considering that these platforms collect an absolutely breathtaking amount of data about you. Seriously. They don't just collect surface-level data like time of use, screens or functions you access, platform of use, etc.; they collect alllllll kinds of things: what time of day you use the app (and what time(s) of day you use certain parts of it), how long you use it, where you go within it, which elements you linger on the longest, who you interact with, which ads attract your attention and for how long, which ads you actually tap/click on, and so on. It's almost impossible to list all the data they're collecting (in no small part because much of it is proprietary).

So I'm not sure it's really realistic to think of a dataset like this in terms of "a report", because that's not really what this data is for; this data is largely designed and collected to optimize the system that it's in, in order to make people spend more time in the system the data is gathered from. The data can be and is packaged and sold to advertisers, sure, but its primary purpose is to figure out better ways to keep you in their ecosystem.
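For a concrete flavor of the fine-grained data described above, here is a minimal sketch, assuming a hypothetical event log where each record carries a user ID, an event type, the item involved, and a timestamp (real schemas are proprietary, so every field name here is made up). It shows how one stat, how long a user had a given ad on screen, can be derived from raw show/hide events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """One hypothetical interaction record; real platforms' schemas are proprietary."""
    user_id: str
    event: str   # e.g. "ad_shown", "ad_hidden", "ad_click"
    item: str    # the ad or post the event refers to
    ts: float    # seconds since some epoch


def dwell_seconds(events: list[Event]) -> dict[tuple[str, str], float]:
    """Sum how long each (user, ad) pair was on screen.

    Each "ad_shown" is paired with the next "ad_hidden" for the same user and ad;
    other event types (like clicks) are ignored for this stat.
    """
    open_since: dict[tuple[str, str], float] = {}
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for ev in sorted(events, key=lambda ev: ev.ts):
        key = (ev.user_id, ev.item)
        if ev.event == "ad_shown":
            open_since[key] = ev.ts
        elif ev.event == "ad_hidden" and key in open_since:
            totals[key] += ev.ts - open_since.pop(key)
    return dict(totals)


log = [
    Event("u1", "ad_shown", "ad_42", 100.0),
    Event("u1", "ad_hidden", "ad_42", 107.5),
    Event("u1", "ad_shown", "ad_42", 200.0),
    Event("u1", "ad_click", "ad_42", 201.0),
    Event("u1", "ad_hidden", "ad_42", 203.0),
]
print(dwell_seconds(log))  # {('u1', 'ad_42'): 10.5}
```

In this sketch the stat is computed from the raw log rather than stored as a line in a per-user report, in keeping with the point that the data mostly feeds the system itself.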

Between that and the segmenting described in humbug's link, "user data" is an amazingly complex, multi-dimensional thing that isn't easily summarized, unfortunately.
posted by pdb at 9:03 PM on June 25 [6 favorites]


Broadly speaking, you can think of user data as falling into three buckets: things that you created/supplied (e.g. profile data, but also your Facebook posts, your Facebook friends list, etc.), your activity (what posts you viewed, for how long, etc.), and data inferred on top of those things (interests would be a likely candidate for any platform doing consumer advertising).
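The three buckets can be sketched as a simple data structure. This is a hypothetical layout, not any platform's actual schema, and the `infer_interests` rule (a topic viewed at least twice becomes an interest) is a toy stand-in for whatever models a real platform uses to fill the inferred bucket.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """Hypothetical container for one user's data, split into the three buckets."""
    supplied: dict = field(default_factory=dict)  # profile fields, posts, friends list
    activity: list = field(default_factory=list)  # what was viewed, for how long, etc.
    inferred: dict = field(default_factory=dict)  # derived on top of the other two


def infer_interests(record: UserRecord, min_views: int = 2) -> None:
    """Toy inference: any topic viewed at least `min_views` times becomes an interest."""
    counts: dict = {}
    for act in record.activity:
        if act["type"] == "view":
            counts[act["topic"]] = counts.get(act["topic"], 0) + 1
    record.inferred["interests"] = sorted(t for t, n in counts.items() if n >= min_views)


user = UserRecord(
    supplied={"username": "nouvelle-personne"},
    activity=[
        {"type": "view", "topic": "cycling", "seconds": 40},
        {"type": "view", "topic": "cycling", "seconds": 95},
        {"type": "view", "topic": "cooking", "seconds": 5},
    ],
)
infer_interests(user)
print(user.inferred)  # {'interests': ['cycling']}
```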

What "reports" exist is going to vary. Because GDPR data subject access requests are a thing, it's likely that a company like Facebook can automatically assemble everything it needs to provide under GDPR (this is far from my area of expertise, but I think the first and third categories would be covered, not so much the second). Other than that, though, reports are going to be narrower in scope and built for a specific task. We can call that respect for user privacy, but the reality is that by the time you're big enough to need this stuff, building one internal tool to rule them all is a coordination nightmare, so your "one tool" will really be a merger of separate tools, each with its own permissions.

A question like "show me every post nouvelle-personne has viewed" is obviously answerable, but it's likely only available as a "report" intended to be read by a person in something like an abuse investigation, if at all (I'm not sure "everything X viewed" would be useful in that context, but all of a user's posts/comments would be). Again, I imagine Facebook has automated a bunch of this stuff due to scale. I had a job at a company you haven't heard of where we'd have to pull things ad hoc for legal once in a blue moon.

Another likely reason to build UIs for viewing user data is debugging ad targeting: did the user tell us they speak Spanish, or did we infer it? (Spotify likes advertising to me in Spanish for some reason. I don't speak Spanish.) That sort of question tends to come up because you deliver a report to the advertiser, who then says "What the heck, I only targeted Spanish speakers, why are you telling me X% of my impressions went to English speakers!?" Then someone reverse engineers that report to find a few user IDs that got counted as English speakers, and looks up their targeting data and whether/when it changed (maybe they switched the language they use the site in after they saw the ad but before the stats were assembled for the advertiser). The goal is to make sure there's not a real problem and figure out what to tell the advertiser. (The other use case is a specific user reporting a weird ad. That user will likely be either an employee or the CEO's next-door neighbor; I can't imagine standard support channels get someone debugging the targeting.)
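The "did it change after they saw the ad?" check boils down to an as-of lookup over the history of a targeting attribute. Here is a minimal sketch, assuming each change is stored with a timestamp and a source (declared by the user vs. inferred); the `Change` record, `value_as_of` helper, and timestamps are all hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    ts: int      # when this value took effect (e.g. Unix seconds)
    value: str   # e.g. "es" or "en"
    source: str  # "declared" by the user or "inferred" by a model


def value_as_of(history: List[Change], ts: int) -> Optional[Change]:
    """Return the entry in effect at time `ts`; `history` must be sorted by ts."""
    times = [c.ts for c in history]
    i = bisect.bisect_right(times, ts)  # number of changes at or before ts
    return history[i - 1] if i else None


language_history = [
    Change(1000, "es", "inferred"),
    Change(5000, "en", "declared"),  # user later switched the site language
]

impression_ts, report_ts = 3000, 6000
at_impression = value_as_of(language_history, impression_ts)
at_report = value_as_of(language_history, report_ts)
print(at_impression)  # Change(ts=1000, value='es', source='inferred')
print(at_report)      # Change(ts=5000, value='en', source='declared')
```

If targeting used the value at impression time but the advertiser's stats were assembled from the value at report time, the two disagree in exactly the way described above.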
posted by hoyland at 10:48 PM on June 25 [2 favorites]


At a streaming media company I have heard of, they didn't have a dossier on a person so much as everything everyone had ever done, and how they responded to what they were shown.

The data was used to improve the company's profits mostly by making the site as positively unsurprising as possible.
posted by zippy at 3:11 AM on June 26 [1 favorite]


From talking with previous and current employees at Meta/FB, I can also tell you that access by rank-and-file employees to sensitive user data is closely monitored by automated systems.

This was typically implemented as an access block on entire database tables, or on specific fields of tables, and required a manual authorization step that generated an audit log entry.
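The table- and field-level blocks described above might look roughly like this. It's a minimal sketch under stated assumptions: the restricted table and field names, the `AccessGate` class, and the in-memory audit log are all hypothetical, and a real system would presumably enforce this in the data layer and keep the log somewhere employees can't edit.

```python
import time
from dataclasses import dataclass, field

# Hypothetical restrictions: whole tables, and individual (table, column) fields.
RESTRICTED_TABLES = {"messages"}
RESTRICTED_FIELDS = {("users", "location")}


def is_restricted(table: str, column: str) -> bool:
    """True if the table is blocked as a whole or this specific field is blocked."""
    return table in RESTRICTED_TABLES or (table, column) in RESTRICTED_FIELDS


@dataclass
class AccessGate:
    grants: set = field(default_factory=set)       # (employee, table, column) approvals
    audit_log: list = field(default_factory=list)  # one entry per manual approval

    def approve(self, employee: str, table: str, column: str,
                approver: str, reason: str) -> None:
        """Manual authorization step; every approval leaves an audit log entry."""
        self.grants.add((employee, table, column))
        self.audit_log.append({
            "ts": time.time(),
            "employee": employee,
            "table": table,
            "column": column,
            "approver": approver,
            "reason": reason,
        })

    def can_read(self, employee: str, table: str, column: str) -> bool:
        """Unrestricted data is readable; restricted data needs a prior approval."""
        if not is_restricted(table, column):
            return True
        return (employee, table, column) in self.grants


gate = AccessGate()
print(gate.can_read("alice", "posts", "title"))     # True
print(gate.can_read("alice", "users", "location"))  # False
gate.approve("alice", "users", "location", approver="bob", reason="abuse ticket")
print(gate.can_read("alice", "users", "location"))  # True
print(len(gate.audit_log))                          # 1
```

The audit log is what makes after-the-fact review possible: each approval records who asked, who approved, and why.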

From what I hear, there was a firing about once a year when the internal security team caught someone abusing the system, usually to look up a former romantic partner.
posted by graphweaver at 5:56 AM on June 26


I realise now my mental model for these things has always been an ever-expanding spreadsheet with a row for me and endless columns with various flags and properties and features. Given this, it’s no surprise that the only data work I do is simple stuff in Excel, I suppose.
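The spreadsheet picture and zippy's "everything everyone had ever done" are two views of the same data. A minimal sketch, using hypothetical event names: the raw log is long (one row per thing that happened), and the spreadsheet is a pivot of it (one row per user, one column per kind of event), which is why the columns never stop growing.

```python
from collections import Counter

# Long format: one row per thing that happened (hypothetical event names).
events = [
    ("u1", "watched:cooking"),
    ("u1", "watched:cooking"),
    ("u1", "skipped:news"),
    ("u2", "watched:news"),
]


def to_wide(events):
    """Pivot (user, feature) events into one row per user with a count per feature."""
    counts = Counter(events)
    columns = sorted({feature for _, feature in events})
    users = sorted({user for user, _ in events})
    # Counter returns 0 for pairs that never occurred, filling the empty cells.
    rows = {u: {c: counts[(u, c)] for c in columns} for u in users}
    return columns, rows


columns, rows = to_wide(events)
print(columns)     # ['skipped:news', 'watched:cooking', 'watched:news']
print(rows["u1"])  # {'skipped:news': 1, 'watched:cooking': 2, 'watched:news': 0}
```

Every new kind of event adds a column for every user, most of them zero, which is part of why the long log is the more natural thing to store.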
posted by oxford blue at 7:43 AM on June 26

