How are cookies used to collect demographic information?
August 16, 2010 4:51 PM

Metafilter (and other sites') traffic demographics: How are cookies or other means used to collect this kind of information?

And if a large percentage of mefites (or some other site's fans) use private browsing, Incognito, or InPrivate browsing modes, would that significantly skew the results? I'm curious if anyone can provide a fairly detailed (but not overly technical) explanation of how traffic monitoring sites collect such extensive information -- e.g. how do websites know about my college education, income level, and/or ethnicity? While that information might occasionally be provided in secure forms, it doesn't seem to me like cookies should be able to access that. Extra kudos if anyone can answer how accurate traffic sites like Quantcast are.
posted by GnomeChompsky to Technology (7 answers total)
A cookie can connect a particular web browser to a user in a web site database. If you previously told a web site about your education and income level in some other form, and they saved it in their database and connected it to an ID that is stored in a cookie they gave you, when you come back to the site they know who you are and what you previously told them.
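A minimal sketch of that lookup in Python, assuming a cookie named `visitor_id` and an in-memory dictionary standing in for the site's database (both hypothetical). The cookie carries only an opaque ID; the education and income details live on the server.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical server-side "database": maps the opaque ID stored in the
# visitor's cookie to whatever the visitor typed into a form earlier.
PROFILES = {
    "a1b2c3": {"education": "college", "income": "50k-75k"},
}

def lookup_profile(cookie_header: str) -> Optional[dict]:
    """Return the stored profile for the ID in a Cookie header, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)           # parses "name=value; name2=value2"
    morsel = cookie.get("visitor_id")    # hypothetical cookie name
    if morsel is None:
        return None                      # no ID cookie: anonymous visitor
    return PROFILES.get(morsel.value)    # an unknown ID also yields None

print(lookup_profile("visitor_id=a1b2c3; theme=dark"))
```

A private-browsing session starts without the site's earlier cookies, so a lookup like this would find no ID and treat the visitor as new.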
posted by willnot at 5:06 PM on August 16, 2010

Response by poster: That basically makes sense. But how does Quantcast gather and automatically aggregate all that information?
posted by GnomeChompsky at 5:07 PM on August 16, 2010

These types of sites usually guesstimate demographic profiles for website audiences from the web activity of sample panels. Panels consist of small subsets of web surfers who are recruited from across the Internet and then asked to answer detailed demographic and behavioral surveys. The survey responses are then combined with traffic metrics to extrapolate trends for entire website audiences. Accuracy depends on how sophisticated the audience metrics company is and how niche the web property is.
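The extrapolation step can be sketched roughly like this, assuming a tiny made-up panel and a single survey question (income bracket). The site names, answers, and the `estimate_profile` helper are hypothetical, and the estimate is unweighted; a real metrics company would likely adjust it in ways not described here.

```python
from collections import Counter

# Hypothetical panel: each member answered a survey and had their browsing
# recorded. All sites and answers are made-up illustration data.
panel = [
    {"income": "high", "sites": {"site-a.example", "site-b.example"}},
    {"income": "low",  "sites": {"site-a.example"}},
    {"income": "high", "sites": {"site-b.example"}},
    {"income": "mid",  "sites": {"site-a.example", "site-c.example"}},
]

def estimate_profile(site, panel):
    """Share of each survey answer among panelists who visited `site`.

    The underlying assumption: the site's whole audience looks like the
    panelists who happened to visit it.
    """
    visitors = [m for m in panel if site in m["sites"]]
    if not visitors:
        return {}  # no panelist saw the site: nothing to extrapolate from
    counts = Counter(m["income"] for m in visitors)
    return {answer: n / len(visitors) for answer, n in counts.items()}

print(estimate_profile("site-a.example", panel))
```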

In sum -- the harder it is to create an accurate sample panel for a web property, the less likely it is that the metrics company will extrapolate the property's audience profile accurately.
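One way to see why niche sites suffer: if only a few panelists happen to visit a site, any share estimated from them is noisy. This sketch uses the textbook margin of error for a proportion, which assumes a simple random sample; real panels are not random samples, so treat the numbers as a best case. The 30% share and the panel sizes are arbitrary illustration values.

```python
import math

def margin_of_error(share, n_panelists, z=1.96):
    """Approximate 95% margin of error for a share estimated from n panelists.

    Normal approximation to the binomial: z * sqrt(p * (1 - p) / n).
    """
    if n_panelists <= 0:
        raise ValueError("need at least one panelist")
    return z * math.sqrt(share * (1 - share) / n_panelists)

for n in (25, 400, 10_000):
    print(n, round(margin_of_error(0.30, n), 3))
```

For a 30% share, 25 visiting panelists give a margin of roughly ±18 percentage points, while 10,000 bring it under one point.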

I wrote an answer to a similar question about Quantcast on Quora.
posted by superfem at 8:38 PM on August 16, 2010

I worked for a dotcom in the 90s which had its web analytics "audited" by a highly-respected third party which charged a fair amount for the service. The auditing process consisted of us uploading our server logs to them and me saying "sure, those files are accurate", even though they couldn't handle the log format we used and I had to run them through a number of scripts before handing them off. Those scripts could have done anything at all to the log files, and they never once asked for a copy, or even an explanation of what the process was. I doubt that the industry has changed much since then. Which is to say, while it's conceivable that they're doing a better job now, chances are that they're making a lot of guesses and assumptions and pulling numbers out of their ass, then putting a polish on them for the investors. "Accuracy" isn't really a word I'd use in that context.
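For context, log-based analytics of that era worked from server log lines such as those in the Apache/NCSA Common Log Format. A minimal sketch, with made-up sample lines and a hypothetical `summarize` helper, of the kind of counting that can be done from such files; nothing in these lines identifies a visitor beyond an IP address.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def summarize(lines):
    """Count successful hits and distinct client hosts in CLF lines."""
    hits = 0
    hosts = set()
    for line in lines:
        m = CLF.match(line)
        if not m or m["status"] != "200":
            continue  # skip malformed lines and non-200 responses
        hits += 1
        hosts.add(m["host"])
    # Distinct hosts is only a rough proxy for visitors: many people can
    # share one IP (office proxy, ISP), and one person can use several.
    return {"hits": hits, "distinct_hosts": len(hosts)}

sample = [
    '10.0.0.1 - - [16/Aug/2010:16:51:00 -0400] "GET / HTTP/1.1" 200 5120',
    '10.0.0.1 - - [16/Aug/2010:16:52:10 -0400] "GET /ask HTTP/1.1" 200 2048',
    '10.0.0.2 - - [16/Aug/2010:16:53:30 -0400] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.3 - - [16/Aug/2010:16:54:45 -0400] "GET / HTTP/1.1" 200 5120',
]
print(summarize(sample))  # {'hits': 3, 'distinct_hosts': 2}
```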
posted by hades at 11:47 PM on August 16, 2010

I doubt that the industry has changed much since then.

It's changed by leaps and bounds since then. Nobody uses web server log files any more -- for one thing, you cannot read or set cookies from a log file. If you hadn't noticed, all advertising and traffic monitoring these days is done by loading bits of JavaScript from a remote site (usually Google or some other ad bureau) and letting them execute on your page. Even though the script is hosted elsewhere, it executes on the domain of the target site, which means it has full access to read and write the cookies on that page.

From JS it is also possible to read all sorts of information about you, including browser version, operating system type and version, screen size, which browser plugins you have installed, and, for the less scrupulous, which fonts you have installed and what other sites you've visited in the last 7 days. All of this information can then be sent back to the ad bureau or the stats-collecting agency, usually by loading a specially crafted 1x1 transparent GIF with the information encoded in its URL.

Put together persistent storage, lots of software details, and a way to send it all back to the mothership, then add the fact that these ad bureaus and stats agencies get their JS snippets onto thousands of different domains, and they can aggregate everything they collect and amplify their stats-building abilities. As someone else already mentioned, all they need is demographic information from one source, and thanks to these unique identification methods they can parlay it into demographic information about all the thousands of sites they're on.
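A rough end-to-end sketch of the pixel-plus-aggregation pattern described above, in Python. Everything here is hypothetical: the `stats.example` endpoint, the parameter names, the IDs, and the demographics. For simplicity the tracker ID travels in the query string; in practice a cookie set on the tracker's own domain would be sent automatically in the pixel request's Cookie header, which is what lets one ID follow a browser across sites.

```python
from collections import Counter, defaultdict
from urllib.parse import urlencode, urlsplit, parse_qs

BEACON = "https://stats.example/pixel.gif"  # hypothetical collector URL

def beacon_url(tracker_id, page_domain, screen):
    """What the embedded script would request: a 1x1 GIF whose query
    string carries the tracker ID and details read in the browser."""
    return BEACON + "?" + urlencode(
        {"uid": tracker_id, "site": page_domain, "screen": screen})

def collect(urls):
    """Collector side: decode each beacon into a (tracker ID, site) pair."""
    seen = []
    for url in urls:
        q = parse_qs(urlsplit(url).query)
        seen.append((q["uid"][0], q["site"][0]))
    return seen

def profile_sites(seen, known_demographics):
    """Spread demographics learned on ONE site across every site the same
    ID was seen on. IDs with no known demographics are skipped."""
    by_site = defaultdict(Counter)
    for uid, site in set(seen):  # count each ID once per site
        if uid in known_demographics:
            by_site[site][known_demographics[uid]] += 1
    return {site: dict(c) for site, c in by_site.items()}

hits = [
    beacon_url("u1", "site-a.example", "1280x800"),
    beacon_url("u1", "site-b.example", "1280x800"),
    beacon_url("u2", "site-b.example", "1024x768"),
    beacon_url("u3", "site-b.example", "1920x1080"),
]
# Pretend u1 and u2 filled in a survey on some partner site.
known = {"u1": "college", "u2": "no college"}
print(profile_sites(collect(hits), known))
```

Here `u1`'s demographics, learned in one place, end up counted for both `site-a.example` and `site-b.example`, while `u3` contributes nothing because no demographics are known for it.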
posted by Rhomboid at 7:36 AM on August 17, 2010

Sorry, yes, clearly the technology has changed. I'm aware of how tracker bugs work. It's the attitude I have doubts about. If the metrics companies have started providing their clients with explanations of their methods and the ways in which they are not 100% accurate, that's a great improvement, and I retract my rant. But I'd put, say, $20 on the proposition that if you paid two different companies for a site analysis, you'd still get reports which were just different enough that someone in Marketing would want me to explain the discrepancy away by finding a technical reason that the company providing the numbers they didn't like was just wrong. Not that I'm bitter about that experience, or anything. *cough*

Or, to put it more succinctly: I'm not willing to trust that an improvement in technology has also improved their business methods. But I guess I've been out of that field for almost ten years now, so what do I know?
posted by hades at 9:38 AM on August 18, 2010

(This is predicated on Quantcast not being owned by Google or Facebook or something like that. Entities with that amount of data to sift through are another story entirely. Is Quantcast operating at that level? Maybe I'm wrong.)
posted by hades at 9:47 AM on August 18, 2010
