WTF do these internet download numbers mean?
May 14, 2009 6:56 AM
Help me figure out how many times my podcast is being downloaded. I have requests, unique hosts, bytes downloaded, etc., and while I understand what each one means individually, I'm not sure which to "trust".
Preface: I'm using Dreamhost and these stats are coming from their panel stat application, which is running Analog 6.0
For a specific example, let me focus on one of my recent podcasts (a single MP3 show, not a series of shows). This show is showing 11613 requests in my stats. This is a LOT higher than most of the shows in that domain (the more "regular" shows get about 200 requests, give or take). I expected this surge because the most recent show is about the new Star Trek movie, and given that movie's popularity it makes sense that this show would get more traffic.
But I had been assuming that a request roughly equated to a listener. I knew it wasn't exactly 1:1, given Google bots, etc., but I thought it was a close approximation. For example, when most shows have 200 requests and this one has almost 12,000, I figured I could take that as a 5000% boost in popularity...
But then I looked at distinct hosts served, and the number is only 3,216. Given how drastically different 3,216 is from 12,000, I thought something was up and dug deeper.
The stats also show the total data transfer for that domain for the month (144.27 GB), and that the Star Trek episode accounts for 27.9% of my bandwidth for the month, so about 40 GB. Dividing that 40 GB by my file size (45 MB), I figure I've only gotten about 917 downloads... again a HUGELY different number than 12,000.
However, for one of the shows with 200 "requests", the same math shows about 100 downloads... So I don't get why most of the shows are off by 100%, while the show that should be most popular DOES have the most requests, but that number is off by 1300%.
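The bandwidth arithmetic above can be written out as a short sketch. It counts "full-download equivalents": bytes served for the episode divided by the file size, so two half-finished downloads count as one. The function name is hypothetical, and it assumes the stats panel's GB and MB use the same base (1 GB = 1024 MB); with decimal units the figure comes out closer to 894.

```python
# Rough "full-download equivalents" from bandwidth, using the figures quoted above.
# Assumption: 1 GB = 1024 MB in the stats panel; decimal units give about 894 instead.

def full_download_equivalents(total_gb: float, share: float, file_mb: float) -> float:
    """Bytes served for one file divided by that file's size."""
    episode_mb = total_gb * share * 1024  # this episode's slice of the month's transfer, in MB
    return episode_mb / file_mb

estimate = full_download_equivalents(total_gb=144.27, share=0.279, file_mb=45)
print(round(estimate))  # 916
```

This lands on essentially the "about 917" in the question; the small difference is rounding.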
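Expressing each show as requests per full-download equivalent makes the gap easier to compare. This sketch just plugs in the figures from the question (about 100 downloads for a typical 200-request show, about 917 for the Star Trek episode); the helper name is hypothetical.

```python
def requests_per_download(requests: int, full_downloads: float) -> float:
    """How many logged requests there are for each full file's worth of bytes served."""
    return requests / full_downloads

regular = requests_per_download(200, 100)      # a typical episode
star_trek = requests_per_download(11613, 917)  # the Star Trek episode
print(f"{regular:.1f} vs {star_trek:.1f}")     # 2.0 vs 12.7
```

A ratio of 2 means requests run 100% above downloads; about 12.7 means requests run roughly 1,170% above them, in the same ballpark as the 1300% estimate.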
Any ideas?
Response by poster: If they click the link directly, it doesn't stream per se, but it does start playing before it's fully downloaded.
I should mention our biggest OS BY FAR is the iPhone OS, which takes quite some time to download these shows over 3G (shudder...Edge) and will effectively "stream" as it downloads, perhaps 2 seconds of show every second.
Also, because our RSS feeds are on various podcast aggregator sites, some of those sites offer Flash players that link to my MP3 and, again, start playing before the download is complete.
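If a player fetches the file in pieces while it plays (for example with HTTP byte-range requests, which return status 206 Partial Content), each piece is logged as its own request. The sketch below assumes the raw logs are in Apache's common log format, which is an assumption about the Dreamhost setup; the sample lines, IP addresses, `/show.mp3` path and `summarize` helper are all hypothetical.

```python
import re
from collections import defaultdict

# Apache common log format: host ident user [time] "METHOD path protocol" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)')

def summarize(lines, path):
    """Count requests, distinct hosts and bytes served for one file."""
    requests = 0
    bytes_by_host = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in this format
        host, _method, url, _status, size = m.groups()
        if url != path:
            continue
        requests += 1
        bytes_by_host[host] += 0 if size == "-" else int(size)
    return requests, len(bytes_by_host), sum(bytes_by_host.values())

FILE_BYTES = 45_000_000  # hypothetical 45 MB episode
sample = [
    # one listener whose player pulls the file in three partial chunks
    '10.0.0.1 - - [14/May/2009:07:00:00 -0700] "GET /show.mp3 HTTP/1.1" 206 15000000',
    '10.0.0.1 - - [14/May/2009:07:05:00 -0700] "GET /show.mp3 HTTP/1.1" 206 15000000',
    '10.0.0.1 - - [14/May/2009:07:10:00 -0700] "GET /show.mp3 HTTP/1.1" 206 15000000',
    # one listener who downloads the whole file in a single request
    '10.0.0.2 - - [14/May/2009:07:20:00 -0700] "GET /show.mp3 HTTP/1.1" 200 45000000',
]
requests, hosts, served = summarize(sample, "/show.mp3")
print(requests, hosts, served / FILE_BYTES)  # 4 2 2.0
```

Two listeners and two files' worth of bytes, but four requests: chunked playback inflates the request count without inflating bandwidth.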
posted by arniec at 7:22 AM on May 14, 2009
I should mention our biggest OS BY FAR is the iPhone OS, and that takes quite some time to download these shows over 3G (shudder...Edge) and will effectively "stream" as it downloads perhaps 2 seconds of show every second.
Also due to our RSS feeds on various podcast aggregator sites some do offer Flash players that link to my MP3 and again start playing before the d/l is complete.
posted by arniec at 7:22 AM on May 14, 2009
Are you taking into account the following factors?
1. the people who may listen more than once (will give you higher requests, and lower uniques)
2. people who start to listen, then stop (counts for request + unique, but lower bandwidth because they don't finish downloading)
3. people who start to listen, stop... come back later, and listen again. (higher requests... same unique, twice the bandwidth)
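The three listener behaviors in the list above can be modeled to show how each one moves requests, uniques and bandwidth differently. This is a toy sketch: the `Listener` class, host names and download fractions are made up for illustration, and only the 45 MB file size comes from the question.

```python
from dataclasses import dataclass
from typing import List

FILE_MB = 45  # episode size from the question

@dataclass
class Listener:
    host: str              # hypothetical host name
    fetched: List[float]   # fraction of the file transferred by each request

def metrics(listeners):
    """Return (requests, unique hosts, megabytes served)."""
    requests = sum(len(who.fetched) for who in listeners)
    unique = len({who.host for who in listeners})
    megabytes = sum(sum(who.fetched) for who in listeners) * FILE_MB
    return requests, unique, megabytes

repeat = Listener("a.example.net", [1.0, 1.0])    # 1. listens twice
quitter = Listener("b.example.net", [0.25])       # 2. stops a quarter of the way in
returner = Listener("c.example.net", [0.5, 1.0])  # 3. stops halfway, later hears it all

for case in (repeat, quitter, returner):
    print(case.host, metrics([case]))
print("all three:", metrics([repeat, quitter, returner]))  # (5, 3, 168.75)
```

Three people produce five requests but only 3.75 files' worth of bandwidth, which is the kind of spread the question describes.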
Also, I'm curious about what your "hosts" metric is referring to. Is it measuring all the way down the host name, or just tracking the parent domain like comcast.com? Because if it's doing the latter, you're obviously going to have lots of requests from Comcast users, but not very many uniques until it starts drilling down the host trail.
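To illustrate the question above (not how Analog actually counts hosts, which I haven't checked), here is a naive sketch comparing distinct full host names with distinct parent domains. The host names are hypothetical, and `parent_domain` simply keeps the last two labels, which is wrong for suffixes like `.co.uk`.

```python
def parent_domain(hostname: str) -> str:
    """Naive parent domain: keep the last two dot-separated labels."""
    return ".".join(hostname.split(".")[-2:])

hosts = [
    "a.dsl.example-isp.net",    # hypothetical subscriber 1
    "b.dsl.example-isp.net",    # hypothetical subscriber 2, same ISP
    "c.cable.other-isp.com",    # hypothetical subscriber 3, different ISP
]
full = len(set(hosts))
collapsed = len({parent_domain(h) for h in hosts})
print(full, collapsed)  # 3 2
```

Collapsing to the parent domain undercounts uniques whenever several listeners share an ISP.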
posted by finitejest at 7:36 AM on May 14, 2009
Yeah, I'm casting my vote for #2 being the reason for the download/bandwidth discrepancy. Let's be conservative and assume that half of the requests are double requests (that is, someone trying to download your podcast twice). That means 1608 (single requests) + 804 (doubly requested) = around 2400. Then let's say one third of them stop playing almost right away, another third listen to just half, and the remaining third listen to the whole podcast. So 800 * 0 + 800 * 0.5 + 800 * 1 = 1200.
Still not exactly right, but closer, and probably a good estimate of how many people actually listened to at least half of your show.
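The estimate above can be reproduced step by step. This is one reading of the arithmetic (starting from the 3,216 distinct hosts, treating half as single requests and the other half as people counted twice); the splits are guesses, exactly as in the comment, and the variable names are mine.

```python
# One reading of the back-of-envelope estimate; every split is an assumption.
distinct_hosts = 3216

single = distinct_hosts // 2      # 1608 counted as one listener each
doubled = single // 2             # 804: the other 1608, read as 804 people counted twice
listeners = single + doubled      # 2412, "around 2400"

third = listeners / 3             # 804.0 listeners per group
# a third stop almost at once, a third hear half, a third hear it all
listened_equivalents = third * 0 + third * 0.5 + third * 1.0
print(listeners, listened_equivalents)  # 2412 1206.0
```

Without rounding to 2400 and 800, the result is 1206 rather than 1200; the conclusion is the same.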
The reason that you're seeing such unusual numbers for this podcast is because it's more popular. For items of focused interest (whoever listened to your podcast before was probably an acquaintance or fan) you're going to get consistent play. For people with a passing interest in your podcast, maybe they'll click on the link then decide against it, maybe they'll listen for 30 seconds and decide it isn't for them.
posted by Deathalicious at 8:09 AM on May 14, 2009
We've got a podcast that is tracked both by Feedburner and by LibSyn, our podcast host, and the numbers are never the same. They vary by a percent or less, though.
But the guidelines that apply here are the same that apply to analyzing web site traffic:
1. Look for a range of possible true answers, not a single one.
2. If you can only use a single answer, use the lower one.
3. If your numbers are embarrassingly low, then avoid giving them out.
4. If you must give them out, then talk about growth percentages as well.
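Applied to the numbers in this thread, the guidelines above look something like the following sketch. The labels are mine, the 1200 figure is the rough estimate from an earlier answer, and the 100-download baseline is the bandwidth-based figure the asker gives for a typical show.

```python
# Estimates for the Star Trek episode quoted in this thread (labels are illustrative).
estimates = {
    "bandwidth (full-download equivalents)": 917,
    "listened to at least half (rough estimate)": 1200,
    "distinct hosts": 3216,
    "requests": 11613,
}
low, high = min(estimates.values()), max(estimates.values())
print(f"range: {low}-{high}, report: {low}")  # range: 917-11613, report: 917

def growth_pct(new: float, old: float) -> float:
    """Percentage growth from old to new."""
    return (new - old) / old * 100

# Compare like with like: bandwidth-based figures for both episodes.
print(f"{growth_pct(917, 100):.0f}% growth")  # 817% growth
```

Comparing the low, bandwidth-based figures for both episodes keeps the growth number honest even if the absolute count is modest.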
posted by Mo Nickels at 8:59 AM on May 14, 2009 [1 favorite]
posted by smackfu at 7:09 AM on May 14, 2009