What's the average length of an email address?
January 16, 2007 3:00 PM
How many characters are in the average email address?
I'm guessing it's somewhere around 25-30 characters, but have no data to back that up. Statistical analysis, including mean and standard deviation (assuming there's a normal distribution) would be extremely helpful.
I'm looking for data across a wide variety of domains, since a sample from a single domain would skew the distribution.
If you can, please cite your source.
Best answer: A user database I have handy says mean 19.57, median 19, mode 18, std 4.09. 50% are 16 to 22 characters, 90% are 12 to 26 and 99% are 6 to 32.
posted by cillit bang at 3:21 PM on January 16, 2007
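For anyone who wants to run the same kind of summary on their own list, here is a minimal Python sketch. It is not the query used for the figures above; the function name `length_summary` and the sample addresses are made up for illustration, and it uses the population standard deviation, which is an assumption about what "std" meant.

```python
import statistics

def length_summary(addresses):
    """Return basic length statistics for a list of email address strings."""
    lengths = [len(a) for a in addresses]
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "mode": statistics.mode(lengths),
        # Population standard deviation (assumption; use stdev() for a sample)
        "std": statistics.pstdev(lengths),
    }

# Hypothetical sample, for illustration only
sample = [
    "a@b.ca",
    "jane.doe@example.com",
    "bob@example.org",
    "info@example.net",
    "amy@example.org",
]
summary = length_summary(sample)
print(summary)
```

On a real user table you would want to filter out obviously invalid entries first, since unvalidated strings can drag the low end down.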
99% are 6 to 32
I can't imagine how a 6 letter email address would work...
posted by yeoz at 3:37 PM on January 16, 2007
yeoz: Like this: a@b.ca, for example, if .ca ever let you register single-letter domains. (I'm nearly sure that some countries have allowed single-letter domains at some point in their history.)
posted by skynxnex at 3:43 PM on January 16, 2007
I can't imagine how a 6 letter email address would work...
Six letter e-mail addresses are possible, i.e. [letter][at sign][letter][period][two letter country code]. However, they're quite rare - I think there is something wrong with cillit bang's data. I don't for a minute believe that they're common enough to fall into the 99.9% bracket, much less 99%.
posted by RichardP at 3:52 PM on January 16, 2007
Response by poster: Most excellent. You guys rock!
Thanks.
posted by jknecht at 3:54 PM on January 16, 2007
There's the possibility that the 6-letter email addresses in these databases are more common than you'd expect because they're not validated. Someone's email address might be "nospam" or something that's equally invalid.
Also, is there anything preventing someone from throwing an MX record on a CCTLD? There could be a foo@tv, opening it up to four-letter email addresses.
posted by hutta at 4:02 PM on January 16, 2007
I don't for a minute believe that they're common enough to fall into the 99.9% bracket, much less 99%.
It could merely be an indication of the size of cillit bang's database, since s/he didn't actually say.
posted by snap, crackle and pop at 4:04 PM on January 16, 2007
Best answer: Hang on, my ranges are wrong. They should read 50% are 17 to 21 characters, 90% are 13 to 25 and 99% are 7 to 31.
I don't for a minute believe that they're common enough to fall into the 99.9% bracket, much less 99%.
I went for symmetrical brackets either side of the median. You are correct in saying the lower end of such brackets is empty.
More usefully, the narrowest bracket that covers 50% is 17 to 21; for 90%, 14 to 26 and for 99%, 11 to 31.
posted by cillit bang at 4:43 PM on January 16, 2007
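To make the difference between the two kinds of bracket concrete, here is a rough Python sketch. It is only one way to compute them, not necessarily how the numbers above were produced; `symmetric_bracket`, `narrowest_bracket` and the sample lengths are hypothetical.

```python
import math
import statistics

def symmetric_bracket(lengths, coverage):
    """Smallest bracket centred on the median that holds at least
    `coverage` (0 < coverage <= 1) of the values."""
    med = statistics.median(lengths)
    deviations = sorted(abs(x - med) for x in lengths)
    k = math.ceil(coverage * len(lengths))  # values the bracket must contain
    d = deviations[k - 1]
    return (med - d, med + d)

def narrowest_bracket(lengths, coverage):
    """Shortest bracket (low, high) that holds at least `coverage` of the values."""
    data = sorted(lengths)
    n = len(data)
    k = math.ceil(coverage * n)
    best = None
    # Slide a window of k consecutive sorted values and keep the tightest one
    for i in range(n - k + 1):
        low, high = data[i], data[i + k - 1]
        if best is None or high - low < best[1] - best[0]:
            best = (low, high)
    return best

# Hypothetical lengths with a long upper tail, for illustration only
sample = [16, 17, 18, 18, 19, 19, 20, 21, 25, 31]
print(symmetric_bracket(sample, 0.9))
print(narrowest_bracket(sample, 0.9))
```

With skewed data like this, the symmetric bracket reaches below the shortest value in the sample, while the narrowest bracket starts where the data actually starts.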
Best answer: Just going by the empirical rule of normal distribution, based on the mean (19.57) and stddev (4.09) of cillit bang's data, a normal distribution of data should reflect this table:
       |     length
       |  min  |  max
-----------------------
68.26% | 15.48 | 23.66
95.45% | 11.39 | 27.75
99.73% |  7.30 | 31.84
posted by ijoshua at 4:46 PM on January 16, 2007
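The empirical-rule arithmetic is just the mean plus or minus 1, 2 and 3 standard deviations. A small Python sketch, assuming the data really are normal (which the thread has not established); the function name `empirical_rule_ranges` is made up for illustration.

```python
import math

def empirical_rule_ranges(mean, std):
    """Return {k: (low, high, coverage)} for k = 1, 2, 3 standard deviations.

    coverage is the fraction of a normal distribution lying within
    k standard deviations of the mean, i.e. erf(k / sqrt(2)).
    """
    return {
        k: (mean - k * std, mean + k * std, math.erf(k / math.sqrt(2)))
        for k in (1, 2, 3)
    }

# Mean and standard deviation quoted from cillit bang's data
ranges = empirical_rule_ranges(19.57, 4.09)
for k, (low, high, coverage) in ranges.items():
    print(f"{coverage:.2%} | {low:.2f} | {high:.2f}")
```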
averages:
22.01 for 12.5 million unverified addresses
29.49 for 964 thousand verified addresses
And my data doesn't appear to be normal, so I didn't do any other stats. It might be log-normal or something, but I really don't care enough to check.
posted by mosch at 10:58 PM on January 16, 2007
Nope, not log-normal either. Plotting it, it looks like there are really two e-mail address distributions, one for human addresses, and a second for computer-generated addresses.
posted by mosch at 11:02 PM on January 16, 2007
My data comes from a database of 323 addresses. The distribution has some upper-end outliers (positively-skewed). It is normally distributed without the outliers (I tested it.)
Min: 12
1st quartile: 19
Mean (w/ outliers): 23.04
Mean (w/o outliers): 22.79
3rd quartile: 26
Max (w/ outliers): 47
Max (w/o outliers): 35
Median: 23
Mode: 24
Std. Dev (w/ outliers): 5.20
Std. Dev (w/o outliers): 4.70
Ranges based on data including outliers
68.2% of data 17.8 - 28.2
95.4% of data 12.6 - 33.4
99.7% of data 7.4 - 38.6
Ranges based on data with outliers excluded
68.2% of data 18.1 - 27.5
95.4% of data 13.4 - 32.2
99.7% of data 8.7 - 36.9
posted by nekton at 8:57 AM on January 17, 2007
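The post does not say how the outliers were identified. One common convention is Tukey's fences (anything more than 1.5 times the interquartile range beyond the quartiles), so here is a hedged Python sketch of that rule. Treat it as an assumption rather than the method actually used; the function names and the sample list are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences from the first and third quartiles."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)

def drop_outliers(lengths, q1, q3, k=1.5):
    """Keep only the values that fall inside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return [x for x in lengths if low <= x <= high]

# Quartiles as reported above: 1st quartile 19, 3rd quartile 26
fences = tukey_fences(19, 26)
print(fences)

# Hypothetical lengths, for illustration only
sample = [12, 19, 23, 24, 26, 35, 47]
kept = drop_outliers(sample, 19, 26)
print(kept)
```

With the reported quartiles the upper fence works out to 36.5, which is at least consistent with a maximum of 47 with outliers and 35 without, though that does not prove this was the rule applied.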
Count Avg StdDev
247042 21 5.27275049270157
posted by sanko at 3:14 PM on January 16, 2007