<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel>
	  <title>Ask MetaFilter questions tagged with statistics</title>
      <link>http://ask.metafilter.com/tags/statistics</link>
      <description>Questions tagged with 'statistics' at Ask MetaFilter.</description>
	  <pubDate>Wed, 23 Dec 2009 12:58:21 -0800</pubDate>
	  <lastBuildDate>Wed, 23 Dec 2009 12:58:21 -0800</lastBuildDate>

      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>	  
	<item>
	<title>Math/Stats: help me analyze a data set and determine the values that created it</title>
	<link>http://ask.metafilter.com/141426/MathStats%2Dhelp%2Dme%2Danalyze%2Da%2Ddata%2Dset%2Dand%2Ddetermine%2Dthe%2Dvalues%2Dthat%2Dcreated%2Dit</link>	
	<description>Mathematics / Statistics Filter: I have some pairs of numbers that are the result of a process.  Given just that data set, and a rule that relates them, can you determine the integer values that could have resulted in those sets? Apologies for the phrasing of the FPP -- I know it doesn&apos;t make much sense.  Hopefully some mathematics / statistics types will click through and see this longer version.&lt;br&gt;
&lt;br&gt;
I have some sets of numbers, shown below.  I&apos;m trying to reverse engineer the numbers that could have resulted in these sets, based on some known mathematical relationships between them.&lt;br&gt;
&lt;br&gt;
In general, a given Device(n) consumes a resource in integer Quantities at a floating point Rate(n), resulting in a total Cost for that consumption run.  What I have is pairs of Device/Cost, for several different Devices over several runs each, and I&apos;m trying to determine the floating point Rate for each Device.  &lt;strong&gt;The Rate is constant for a given Device.&lt;/strong&gt;  The Quantity consumed is different for each run, but the one key here is that I know that the Quantity values are integers.&lt;br&gt;
&lt;br&gt;
So for a given Device run, we have:&lt;br&gt;
&lt;br&gt;
&lt;strong&gt;Quantity(int) x Rate(float) = Cost(float)&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
All I have is Cost data for each Device, but I have multiple sets of these and am hoping there&apos;s some sort of numeric analysis that can tell me the likely Quantity values that fit.&lt;br&gt;
&lt;br&gt;
Here&apos;s a sample of the data:&lt;br&gt;
&lt;br&gt;
Device / Cost&lt;br&gt;
Device1 / 1235&lt;br&gt;
Device1 /  988&lt;br&gt;
Device1 / 1003&lt;br&gt;
Device1 / 1526&lt;br&gt;
Device2 / 3652&lt;br&gt;
Device2 / 1207&lt;br&gt;
Device2 / 1729&lt;br&gt;
Device2 /  518&lt;br&gt;
Device3 /  745&lt;br&gt;
Device3 / 2115&lt;br&gt;
Device3 / 1415&lt;br&gt;
Device3 /  334&lt;br&gt;
&lt;br&gt;
So, for example, using the Device1 / 988 and Device1 / 1003 pair, I could eyeball it and guess that the Cost difference of 15 is due to 1 unit of Quantity difference between the runs.  Thus the first run consumed 66 x 14.97 = 988 and the second run consumed 67 x 14.97 = 1003.  (Alas, the Rate values should be more in the 30-50 range, so 14.97 doesn&apos;t make much sense.)  But I&apos;m hoping that with a larger population of data, there&apos;s some analysis I can do that will give a more confident answer.&lt;br&gt;
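&lt;br&gt;
One way to make that eyeballing systematic, offered only as a rough sketch: try every candidate Rate in the expected 30-50 range and score how far each Cost sits from the nearest whole multiple of it.  The function names, the 0.01 grid step, and the scoring rule are all assumptions made for illustration, not a known method for this data.  With only four integer-rounded Costs per Device, many Rates may fit almost equally well, so the output is a candidate to check rather than an answer.&lt;br&gt;
&lt;br&gt;
```python
def misfit(costs, rate):
    """Total distance of each Cost from the nearest whole multiple of rate."""
    return sum(abs(cost - round(cost / rate) * rate) for cost in costs)


def best_rate(costs, low_cents=3000, high_cents=5000):
    """Grid-search Rates from 30.00 to 50.00 in steps of 0.01 (assumed range)."""
    # Candidates are built from whole hundredths to avoid float drift.
    candidates = [cents / 100 for cents in range(low_cents, high_cents + 1)]
    return min(candidates, key=lambda rate: misfit(costs, rate))


# Device1 Costs copied from the sample data above.
device1_costs = [1235, 988, 1003, 1526]
print(best_rate(device1_costs))
```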
&lt;br&gt;
Perhaps this can even be solved without ensuring that the Quantity values are integers, but it&apos;s a constraint that the data is supposed to have so I thought I&apos;d mention it.&lt;br&gt;
&lt;br&gt;
I&apos;ll monitor this thread for the next couple hours to answer any questions.  And will add better tags!  I used the science category because this seems like the kind of math that a lab scientist might be familiar with, trying to analyze a data set to work out the conditions that created it.  I&apos;m especially hoping for a statistical analysis that produces some sort of confidence measure, because a couple of these data points might be outliers, screwing up what might otherwise be a closed solution.&lt;br&gt;
&lt;br&gt;
Note: this is not homework filter, or even do-my-job filter.  It&apos;s just something I&apos;m trying to reverse engineer.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.141426</guid>
	<pubDate>Wed, 23 Dec 2009 12:58:21 -0800</pubDate>
	<category>math</category>
	<category>statistics</category>
	<dc:creator>intermod</dc:creator>
	</item>
	<item>
	<title>Conditional Probability</title>
	<link>http://ask.metafilter.com/140868/Conditional%2DProbability</link>	
	<description>Stats-filter: Given a binary matrix, if I know the total number of ones in a given row and a given column, can I calculate the probability that a given position contains a one? I have a binary matrix, like so, where every value is either 1 or 0. So, if the first column contains 2 ones, and the first row contains 1 one, what&apos;s the probability that position A contains a one?&lt;br&gt;
&lt;br&gt;
Example:&lt;br&gt;
&lt;code&gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;2&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;________&lt;br&gt;
&amp;nbsp;1&amp;nbsp;|&amp;nbsp;A|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|__|__|__|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|__|__|__|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|__|__|__|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&amp;nbsp;&amp;nbsp;|&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;|__|__|__|&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
(not homework-filter)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.140868</guid>
	<pubDate>Wed, 16 Dec 2009 17:25:54 -0800</pubDate>
	<category>conditional</category>
	<category>math</category>
	<category>probability</category>
	<category>resolved</category>
	<category>statistics</category>
	<dc:creator>chrisamiller</dc:creator>
	</item>
	<item>
	<title>Ask a Stats Nerd</title>
	<link>http://ask.metafilter.com/140720/Ask%2Da%2DStats%2DNerd</link>	
	<description>I need to ruin everyone&apos;s fun by adding a rigorous mathematical scoring system to the company bake-off. Li&apos;l help? We just had an employee bake-off and while the outcome was satisfactory to all, I think the scoring methodology (average of ratings on a 1-10 scale) was too arbitrary to be statistically meaningful.&lt;br&gt;
&lt;br&gt;
How would you build an objective and mathematically sound ranking system based on the following criteria?&lt;br&gt;
&lt;br&gt;
 * Entries will be judged by all participating employees in each of three categories: taste, presentation, and creativity&lt;br&gt;
 * There will be a winner within each category, as well as an overall winner&lt;br&gt;
 * Not everyone has to vote on every entry&lt;br&gt;
&lt;br&gt;
We make software, so yes, the methodology by which we judge pie is &lt;em&gt;quite&lt;/em&gt; critical.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.140720</guid>
	<pubDate>Tue, 15 Dec 2009 08:18:52 -0800</pubDate>
	<category>baking</category>
	<category>math</category>
	<category>statistics</category>
	<dc:creator>sonofslim</dc:creator>
	</item>
	<item>
	<title>How much money does the gossip magazine / gossip blog / tabloid industry make?</title>
	<link>http://ask.metafilter.com/140423/How%2Dmuch%2Dmoney%2Ddoes%2Dthe%2Dgossip%2Dmagazine%2Dgossip%2Dblog%2Dtabloid%2Dindustry%2Dmake</link>	
	<description>So, celebrity gossip is everywhere and lots of $$$ is forked out for images of famous babies, etcetera... BUT how much money does the gossip industry MAKE? I can&apos;t find any online statistics about Gossip Industry Revenues. It&apos;s actually very odd and mysterious.&lt;br&gt;
&lt;br&gt;
I&apos;m looking for numbers on gossip magazines, gossip blogs and/or the tabloid industry, with the goal of comparing the Gossip numbers to the Porn Industry and the Television Industry respectively. I could use global numbers, US or North American numbers. &lt;br&gt;
&lt;br&gt;
It would be useful to find a webpage as comprehensive as this: http://internet-filter-review.toptenreviews.com/internet-pornography-statistics.html&lt;br&gt;
&lt;br&gt;
...but I would take a single, high impact sentence about the tabloids and gossip rags (such as this gem: &quot;US Porn Revenues are greater than the combined revenues of ABC, CBS and NBC&quot;), which was properly cited and verifiable.&lt;br&gt;
&lt;br&gt;
Anyone got anything?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.140423</guid>
	<pubDate>Fri, 11 Dec 2009 12:23:45 -0800</pubDate>
	<category>blogs</category>
	<category>celebrities</category>
	<category>celebrity</category>
	<category>fame</category>
	<category>famewhores</category>
	<category>gossip</category>
	<category>hello</category>
	<category>industry</category>
	<category>magazines</category>
	<category>media</category>
	<category>money</category>
	<category>online</category>
	<category>photographers</category>
	<category>photographs</category>
	<category>porn</category>
	<category>publishers</category>
	<category>revenue</category>
	<category>star</category>
	<category>stars</category>
	<category>statistics</category>
	<category>tabloids</category>
	<category>usweekly</category>
	<category>vultures</category>
	<dc:creator>Elle Vator</dc:creator>
	</item>
	<item>
	<title>What are the odds?</title>
	<link>http://ask.metafilter.com/140235/What%2Dare%2Dthe%2Dodds</link>	
	<description>I&apos;m rolling 5 dice. I win if I roll two 1&apos;s, one 2, or one 3. What are my odds of winning, expressed as a percentage? This question is actually about the board game Battlelore, but I&apos;ve simplified it above for ease of discussion.&lt;br&gt;
&lt;br&gt;
In the game, you roll dice to score hits. Instead of numbers on the dice, there are three banner color icons (red, green, and blue), a &quot;lore&quot; icon, a retreat flag icon, and a bonus strike icon.&lt;br&gt;
&lt;br&gt;
In combat, if you roll the banner color that matches the color of the target, you score a hit. In most circumstances, you also get a hit if you roll the bonus strike icon, and sometimes if you roll the lore icon or the retreat flag icon, just depending on what cards are in play and where the pieces are on the board.&lt;br&gt;
&lt;br&gt;
I&apos;d put together &lt;a href=&quot;http://www.jdharper.com/downloads/battlelore probabilities.xlsx&quot;&gt;a spreadsheet&lt;/a&gt; for determining the odds of almost any given die roll in Battlelore. So, for example, if you want to know the odds of scoring two hits if you&apos;re rolling four dice which hit on bonus strikes and blue banners, you&apos;d look at cell D31 and see that you have 40.74% chance of that happening.&lt;br&gt;
&lt;br&gt;
But there&apos;s this one case from a recent game that&apos;s giving me trouble: I need to score one hit, I&apos;m rolling 5 dice, and I score hits on the blue banner, the retreat flag, and on bonus strikes. &lt;b&gt;BUT&lt;/b&gt;: The target ignores the first bonus strike. &lt;br&gt;
&lt;br&gt;
No problem, I thought: That would be the odds of scoring one hit on two die faces with five dice (cell D8, 86.83%) plus the odds of scoring two hits on one die face with five dice (cell C32, or 19.62%). &lt;br&gt;
&lt;br&gt;
Unfortunately, those results add up to more than 100%, so I know that&apos;s wrong. The odds are high, probably over 90%, but there&apos;s still the chance that I&apos;ll roll all green and red banners or something.&lt;br&gt;
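&lt;br&gt;
The two cases overlap (a single roll can contain both a banner hit and two bonus strikes), so their probabilities cannot simply be added.  A brute-force sketch, using the simplified numbering from the top of this question (two 1s, or any 2, or any 3 wins) and a hypothetical function name, counts the winning rolls directly:&lt;br&gt;
&lt;br&gt;
```python
from fractions import Fraction
from itertools import product


def win_probability(num_dice=5):
    """Exact chance of rolling two or more 1s, or any 2, or any 3."""
    wins = 0
    total = 0
    for roll in product(range(1, 7), repeat=num_dice):
        total += 1
        # The first 1 is ignored, so a lone 1 does not count as a hit.
        enough_ones = roll.count(1) not in (0, 1)
        if enough_ones or 2 in roll or 3 in roll:
            wins += 1
    return Fraction(wins, total)


p = win_probability()
print(p, float(p))  # 11/12, roughly 91.67%
```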
&lt;br&gt;
How do I handle this oddly specific situation?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.140235</guid>
	<pubDate>Wed, 09 Dec 2009 15:59:54 -0800</pubDate>
	<category>battlelore</category>
	<category>boardgames</category>
	<category>dice</category>
	<category>resolved</category>
	<category>statistics</category>
	<dc:creator>JDHarper</dc:creator>
	</item>
	<item>
	<title>How many grandmothers will die in two months?</title>
	<link>http://ask.metafilter.com/140174/How%2Dmany%2Dgrandmothers%2Dwill%2Ddie%2Din%2Dtwo%2Dmonths</link>	
	<description>Actuarial / statistics geeks: a puzzle for you. You are on a two month training program with a group of Korean teachers of English.&lt;br&gt;
&lt;br&gt;
There are 20 teachers, ranging in age between late 20s to early 50s.&lt;br&gt;
&lt;br&gt;
How many of their grandmothers are likely to die during the course?&lt;br&gt;
&lt;br&gt;
Consider the current months (October-November) as the timeframe.  Assume the grandmothers are all residents of Daegu, Korea.&lt;br&gt;
&lt;br&gt;
I probably cannot provide any more specifics than those, but if you want clarification I will try.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.140174</guid>
	<pubDate>Tue, 08 Dec 2009 22:55:12 -0800</pubDate>
	<category>actuarial</category>
	<category>dead</category>
	<category>grandmother</category>
	<category>liklihood</category>
	<category>probability</category>
	<category>resolved</category>
	<category>statistics</category>
	<dc:creator>Meatbomb</dc:creator>
	</item>
	<item>
	<title>What interesting statistical information can I dig out of this medical billing data?</title>
	<link>http://ask.metafilter.com/138564/What%2Dinteresting%2Dstatistical%2Dinformation%2Dcan%2DI%2Ddig%2Dout%2Dof%2Dthis%2Dmedical%2Dbilling%2Ddata</link>	
	<description>What interesting statistical information can I dig out of this medical billing data? I have 2000 rows of medical billing information (mysql) with the following attributes.  I&apos;ve normalized the data and here is a very simple description:&lt;br&gt;
&lt;br&gt;
patient:&lt;br&gt;
id&lt;br&gt;
dob&lt;br&gt;
city/state/zip&lt;br&gt;
primary insurance provider&lt;br&gt;
secondary insurance provider&lt;br&gt;
&lt;br&gt;
provider:&lt;br&gt;
id&lt;br&gt;
&lt;br&gt;
procedure:&lt;br&gt;
procedure id/code&lt;br&gt;
&lt;br&gt;
patient_procedure:(fk to patient &amp;amp; provider &amp;amp; procedure)&lt;br&gt;
procedure date&lt;br&gt;
billed amount&lt;br&gt;
&lt;br&gt;
Unfortunately, the data only spans one month so I can&apos;t provide anything interesting across a date dimension beyond aggregating certain measures by weekday. &lt;br&gt;
&lt;br&gt;
I&apos;ve already produced a lot of information such as&lt;br&gt;
% of patients with no secondary insurance&lt;br&gt;
avg age for procedure&lt;br&gt;
top-5 most frequently occurring procedures with large avg cost&lt;br&gt;
billing  and frequency by zipcode&lt;br&gt;
so on and so forth-measure X across dimension Y&lt;br&gt;
&lt;br&gt;
I&apos;m looking for suggestions for some deeper analysis than the many SQL aggregates that I&apos;ve performed.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.138564</guid>
	<pubDate>Thu, 19 Nov 2009 11:37:28 -0800</pubDate>
	<category>analysis</category>
	<category>data</category>
	<category>mysql</category>
	<category>sql</category>
	<category>statistics</category>
	<dc:creator>neilkod</dc:creator>
	</item>
	<item>
	<title>Paired T Test sounds like parakeet test to me</title>
	<link>http://ask.metafilter.com/138037/Paired%2DT%2DTest%2Dsounds%2Dlike%2Dparakeet%2Dtest%2Dto%2Dme</link>	
	<description>Hi, I have a specific question about paired t-tests!&lt;br&gt;
Let&apos;s say I&apos;m doing a paired t-test for timed trials. &lt;br&gt;
The spreadsheet looks like this:&lt;br&gt;
&lt;br&gt;
Subject A      Subject B            Difference&lt;br&gt;
5                       3                          2&lt;br&gt;
4                       4                          0&lt;br&gt;
5                       6                         -1&lt;br&gt;
&lt;br&gt;
I don&apos;t understand why the &quot;Difference&quot; column allows for negative numbers. That would imply that the order of the two columns is meaningful. If the order isn&apos;t meaningful (it doesn&apos;t matter what data goes in col.1 and what data goes in col.2, since the trials were not completed in a set order), then shouldn&apos;t the &quot;Difference&quot; column convert each difference to a positive number? (For example, the -1 above would become a 1.) Or is the order of the columns meaningful in some way, so that it needs to be preserved through the negative number?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.138037</guid>
	<pubDate>Fri, 13 Nov 2009 11:37:29 -0800</pubDate>
	<category>math</category>
	<category>pairedttest</category>
	<category>resolved</category>
	<category>statistics</category>
	<dc:creator>amethysts</dc:creator>
	</item>
	<item>
	<title>How does the CDC measure the spread of H1N1?</title>
	<link>http://ask.metafilter.com/138006/How%2Ddoes%2Dthe%2DCDC%2Dmeasure%2Dthe%2Dspread%2Dof%2DH1N1</link>	
	<description>There&apos;s no special place to turn up if you think you&apos;ve got the swine flu to be tested or otherwise counted--hospitals and clinics tell people to just stay home unless they are having actual health complications. How is the CDC able to say that &lt;a href=&quot;http://news.yahoo.com/s/ap/20091113/ap_on_he_me/us_med_swine_flu&quot;&gt;22 million people have been infected with H1N1&lt;/a&gt; when if you don&apos;t have to be hospitalized, nobody will even test you for it?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.138006</guid>
	<pubDate>Fri, 13 Nov 2009 02:01:32 -0800</pubDate>
	<category>cdc</category>
	<category>data</category>
	<category>h1n1</category>
	<category>health</category>
	<category>statistics</category>
	<category>swineflu</category>
	<dc:creator>autoclavicle</dc:creator>
	</item>
	<item>
	<title>Interpretation of cross-correlation and cross-covariance plots</title>
	<link>http://ask.metafilter.com/137402/Interpretation%2Dof%2Dcrosscorrelation%2Dand%2Dcrosscovariance%2Dplots</link>	
	<description>How to interpret certain features? I am looking for some help regarding interpretation of both cross-correlation and cross-covariance plots.&lt;br&gt;
&lt;br&gt;
Namely, I have been struggling with these features:&lt;br&gt;
&lt;br&gt;
* sudden negative spike immediately followed by positive spike&lt;br&gt;
* broad (&apos;smeared&apos;) bulge&lt;br&gt;
* oscillation of the plot (i.e. the trace somewhat resembling a sine)&lt;br&gt;
* undulation of the trace (think of it as if you were writing u&apos;s continuously)&lt;br&gt;
&lt;br&gt;
I have been going through google over and over again, but to no avail. Are there any good sources/books/etc. that deal with actual interpretation of cross-correlation and cross-covariance plots?&lt;br&gt;
&lt;br&gt;
BONUS question: to what extent can cross-correlation and cross-covariance be treated as a representation of impulse response function?&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Any thoughts will be appreciated.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.137402</guid>
	<pubDate>Fri, 06 Nov 2009 06:56:44 -0800</pubDate>
	<category>correlation</category>
	<category>covariance</category>
	<category>cross-correlation</category>
	<category>cross-covariance</category>
	<category>statistics</category>
	<dc:creator>noztran</dc:creator>
	</item>
	<item>
	<title>How would I pick which equation to use?</title>
	<link>http://ask.metafilter.com/137295/How%2Dwould%2DI%2Dpick%2Dwhich%2Dequation%2Dto%2Duse</link>	
	<description>I want to determine the difference between distributions. When would I use KL divergence and when would I use RMSE? It seems like both equations reduce deviation to a single number, but I couldn&apos;t find a comparison between them.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.137295</guid>
	<pubDate>Wed, 04 Nov 2009 22:49:49 -0800</pubDate>
	<category>deviation</category>
	<category>distribution</category>
	<category>statistics</category>
	<dc:creator>lpctstr;</dc:creator>
	</item>
	<item>
	<title>Rates of success?</title>
	<link>http://ask.metafilter.com/137228/Rates%2Dof%2Dsuccess</link>	
	<description>Statistics question: is it possible to test sets of cumulative data for significant differences in rate? I have three cumulative percentage graphs, measuring the germination rates of three different seed types. Is there a way to compare them and see if there are any statistically significant differences?&lt;br&gt;
&lt;br&gt;
The seed types were planted in triplicate, on three dishes each (nine overall). Every day for the past few weeks I&apos;ve observed how many seeds on each dish have begun germinating -- so for an individual dish I would have &quot; Day 1: 0 ... Day 7: 14 ... Day 14: 29&quot; etc, with each day&apos;s score a cumulative total. (There are 100 seeds on each dish, so it works as a percentage rate as well)&lt;br&gt;
&lt;br&gt;
In Excel, I&apos;ve graphed the average germination rates of the replicates, for a graph that &lt;a href=&quot;http://trenchfever.files.wordpress.com/2008/03/cumulative-civilian-and-service.jpg&quot;&gt;resembles this one&lt;/a&gt;. (with three lines plotted, and x-axis = time in days, y-axis = percent germinated).&lt;br&gt;
&lt;br&gt;
So is there a way to compare these different rates statistically? I can use Excel, Minitab, SPSS, and R.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.137228</guid>
	<pubDate>Wed, 04 Nov 2009 09:46:36 -0800</pubDate>
	<category>chart</category>
	<category>cumulative</category>
	<category>data</category>
	<category>excel</category>
	<category>graphs</category>
	<category>mathematics</category>
	<category>minitab</category>
	<category>r</category>
	<category>rates</category>
	<category>resolved</category>
	<category>science</category>
	<category>spss</category>
	<category>statistics</category>
	<dc:creator>rollick</dc:creator>
	</item>
	<item>
	<title>Programs to work with Census data?</title>
	<link>http://ask.metafilter.com/136853/Programs%2Dto%2Dwork%2Dwith%2DCensus%2Ddata</link>	
	<description>How do I create a cross-tab from Census data (SF3)? I would like to create a cross tab that shows disability status by household (or family) income at the census tract level of detail.  I cannot get that table from American Factfinder.  Saw the link that said &quot;download the data and you can generate your own cross tabs&quot;.  Downloaded data - got big old text (ascii) file that does not easily import into excel or access (the only two potential programs I have now).  Now I have big old text file and shattered hopes.&lt;br&gt;
&lt;br&gt;
I think I am looking for a (preferably free or shareware) program that will open and analyze this data with a minimum of fussing on my part.  &lt;br&gt;
&lt;br&gt;
Or, maybe I am looking for something completely different.  You&apos;d probably know better than me.&lt;br&gt;
&lt;br&gt;
Thanks in advance.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.136853</guid>
	<pubDate>Fri, 30 Oct 2009 08:51:36 -0800</pubDate>
	<category>census</category>
	<category>crosstab</category>
	<category>freeware</category>
	<category>shareware</category>
	<category>software</category>
	<category>statistics</category>
	<dc:creator>qldaddy</dc:creator>
	</item>
	<item>
	<title>Help me out-Billy Beane my fantasy baseball league!</title>
	<link>http://ask.metafilter.com/136643/Help%2Dme%2DoutBilly%2DBeane%2Dmy%2Dfantasy%2Dbaseball%2Dleague</link>	
	<description>How do I appropriately valuate players in my competitive fantasy baseball league? I just took over a horrendous team in a very competitive fantasy baseball league. I&apos;m very familiar with sabermetric stats employed by real-life GMs, but it occurs to me that those same stats are not necessarily going to be helpful to me in fantasy baseball. (Especially because so many of the stats we score on are non-sabermetric in nature.)&lt;br&gt;
&lt;br&gt;
To wit: real-life GMs use sabermetric stats to find players who avoid making outs (hitters) or get outs most efficiently (pitchers). In my league, I&apos;m looking to maximize or minimize the cumulative results in specific statistical categories across my lineup.&lt;br&gt;
&lt;br&gt;
To that end, I&apos;m trying to put together my own valuation system for players, but it&apos;s been a while since my econometrics and statistics courses in college. &lt;b&gt;Can you help me put together an appropriate formula - or point me in the direction of how I should be analyzing/comparing the statistics - to properly valuate the players in my league?&lt;/b&gt;&lt;br&gt;
&lt;br&gt;
The details: I&apos;m in a weekly head-to-head rotisserie league based upon the following statistical categories:&lt;br&gt;
&lt;br&gt;
HITTERS&lt;br&gt;
Runs&lt;br&gt;
Singles&lt;br&gt;
Doubles&lt;br&gt;
Triples&lt;br&gt;
Home Runs&lt;br&gt;
RBI&lt;br&gt;
Walks&lt;br&gt;
Strikeouts (lowest total wins this category)&lt;br&gt;
Stolen Bases&lt;br&gt;
Batting Average&lt;br&gt;
&lt;br&gt;
PITCHERS&lt;br&gt;
Walks (lowest total wins this category)&lt;br&gt;
Strikeouts&lt;br&gt;
Complete Games&lt;br&gt;
Wins&lt;br&gt;
Losses&lt;br&gt;
Saves&lt;br&gt;
Holds&lt;br&gt;
ERA&lt;br&gt;
WHIP (lowest total wins this category)&lt;br&gt;
K/9&lt;br&gt;
&lt;br&gt;
I&apos;m not sure if any other parameters are needed, but any help you might be able to provide would be most appreciated.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.136643</guid>
	<pubDate>Wed, 28 Oct 2009 07:06:34 -0800</pubDate>
	<category>baseball</category>
	<category>fantasybaseball</category>
	<category>sabermetrics</category>
	<category>statistics</category>
	<dc:creator>po822000</dc:creator>
	</item>
	<item>
	<title>Not all random numbers are created equal</title>
	<link>http://ask.metafilter.com/136426/Not%2Dall%2Drandom%2Dnumbers%2Dare%2Dcreated%2Dequal</link>	
	<description>How do I get a controlled distribution of random numbers to fairly determine a start position? In a sporting event, start position is decided based on the last digit of your registration number. Each week, random numbers are drawn to decide the start order. For example, the random draw order for a single week is 4, 0, 3, 5, 1, 8, 7, 9, 6, 2. So everyone with a number ending in 4 starts first. Everyone with a number ending in 0 starts second. And so on. The next week the draw order is again random.&lt;br&gt;
&lt;br&gt;
While this works, the distribution of numbers can end up being unfair (one particular number can be &quot;lucky&quot; or &quot;unlucky&quot; for many weeks). Statistically, how would one generate a set of &quot;random&quot; start orders so that the value of each registration number is roughly equal over the course of a season (for ease of calculation, let&apos;s say 10 weeks)?&lt;br&gt;
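&lt;br&gt;
One possible approach, sketched under assumptions: a Latin square, in which each last digit takes each start position exactly once over a 10-week season.  The function name and the seed parameter are hypothetical.  The shuffles randomize which digit gets which slot in which week, and shuffling the column offsets makes it unlikely that one digit always follows the same other digit.&lt;br&gt;
&lt;br&gt;
```python
import random


def season_start_orders(num_digits=10, seed=None):
    """Return one start order per week; every digit gets every position once."""
    rng = random.Random(seed)
    digits = list(range(num_digits))
    rng.shuffle(digits)            # random relabelling of the digits
    column_offsets = list(range(num_digits))
    rng.shuffle(column_offsets)    # random arrangement of the start positions
    weeks = [
        [digits[(week + offset) % num_digits] for offset in column_offsets]
        for week in range(num_digits)
    ]
    rng.shuffle(weeks)             # random order of the weeks themselves
    return weeks


for week_number, order in enumerate(season_start_orders(seed=2009), start=1):
    print(week_number, order)
```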
&lt;br&gt;
I don&apos;t know anything about math or statistics, so my description of this situation probably uses lots of words incorrectly. I&apos;d google this, but I don&apos;t even know how to start.&lt;br&gt;
&lt;br&gt;
Basically, is it possible for the &lt;strong&gt;value &lt;/strong&gt;of all of the numbers to even out, but in a &lt;em&gt;random &lt;/em&gt;order? So, for example, 0&apos;s aren&apos;t always going after 4&apos;s, and one group doesn&apos;t always start in the middle.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.136426</guid>
	<pubDate>Sun, 25 Oct 2009 23:30:14 -0800</pubDate>
	<category>numbers</category>
	<category>probability</category>
	<category>statistics</category>
	<dc:creator>monkeystronghold</dc:creator>
	</item>
	<item>
	<title>BookSuggestionFilter: I need to learn R (and about statistics) in a hurry.</title>
	<link>http://ask.metafilter.com/136197/BookSuggestionFilter%2DI%2Dneed%2Dto%2Dlearn%2DR%2Dand%2Dabout%2Dstatistics%2Din%2Da%2Dhurry</link>	
	<description>BookSuggestionFilter: I need to learn about R (and statistical modeling) in a hurry. I need to learn R in a hurry. I&apos;m heading full steam into a new software development project at work where I will be working with some hard-core statistical modelers who are building models in R (as well as using other stuff like SAS).&lt;br&gt;
&lt;br&gt;
I have a degree in engineering so I am ok with the math, but it has been over a decade since I did anything with statistics (and it was pretty rough even back then).&lt;br&gt;
&lt;br&gt;
I&apos;m looking for some book recommendations for the following:&lt;br&gt;
1) a refresher on general statistics and some introductory material with statistical modeling&lt;br&gt;
2) a book about R for someone who is an experienced software developer</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.136197</guid>
	<pubDate>Thu, 22 Oct 2009 19:17:29 -0800</pubDate>
	<category>books</category>
	<category>R</category>
	<category>statistics</category>
	<dc:creator>kenliu</dc:creator>
	</item>
	<item>
	<title>Why are postseason baseball games so long?</title>
	<link>http://ask.metafilter.com/135819/Why%2Dare%2Dpostseason%2Dbaseball%2Dgames%2Dso%2Dlong</link>	
	<description>Why are postseason baseball games so long?  (Data inside.) As I start to write this question, it&apos;s 9:09 pm.  The Phillies and Dodgers started their game tonight at 8:07 pm and &lt;i&gt;just&lt;/i&gt; finished the second inning.  At this rate (31 minutes per inning) the game will take around four and a half hours.&lt;br&gt;
&lt;br&gt;
Now, it&apos;s 6-0 Phillies, and it takes time to get all those men on base, so this isn&apos;t typical.  But in Major League Baseball, playoff games in general seem to take longer than regular-season games.&lt;br&gt;
&lt;br&gt;
Last year&apos;s postseason game times (obtained somewhat laboriously from &lt;a href=&quot;http://www.baseball-reference.com/postseason/&quot;&gt;baseball-reference.com&lt;/a&gt;):&lt;br&gt;
World Series: 3:23, 3:05, 3:41, 3:08, 3:28&lt;br&gt;
NLCS: 2:36, 3:33, 2:57, 3:44, 3:14&lt;br&gt;
ALCS: 3:25, 5:27 (11), 3:23, 3:07, 4:08, 3:48, 3:31&lt;br&gt;
NLDS (Phillies-Brewers): 2:39, 3:00, 3:31, 2:53&lt;br&gt;
NLDS (Dodgers-Cubs): 3:10, 3:10, 3:03&lt;br&gt;
ALDS (Rays-White Sox): 3:10, 3:10, 3:07, 3:13&lt;br&gt;
ALDS (Red Sox-Angels): 3:14, 3:51, 5:19 (12), 2:50&lt;br&gt;
&lt;br&gt;
If you just glance at this, you see that games over three hours predominate.  Median game length is 3:13; mean is 3:24.&lt;br&gt;
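&lt;br&gt;
For anyone who wants to check those two figures, a short sketch (helper names are made up for illustration) converts the game times listed above to minutes and recomputes the median and mean, truncating to whole minutes:&lt;br&gt;
&lt;br&gt;
```python
from statistics import mean, median

# Game times copied from the list above, one series per line.
game_times = """
3:23 3:05 3:41 3:08 3:28
2:36 3:33 2:57 3:44 3:14
3:25 5:27 3:23 3:07 4:08 3:48 3:31
2:39 3:00 3:31 2:53
3:10 3:10 3:03
3:10 3:10 3:07 3:13
3:14 3:51 5:19 2:50
""".split()


def to_minutes(text):
    """Convert an H:MM string to a whole number of minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(total_minutes):
    """Convert minutes back to H:MM, dropping any fraction of a minute."""
    hours, minutes = divmod(int(total_minutes), 60)
    return f"{hours}:{minutes:02d}"


lengths = [to_minutes(t) for t in game_times]
print(len(lengths), to_clock(median(lengths)), to_clock(mean(lengths)))
```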
&lt;br&gt;
For comparison, the median length of games played by the same eight teams during the regular season was 2:53, the mean 2:55. (In the interest of full disclosure, games between two of those teams are counted twice.)&lt;br&gt;
&lt;br&gt;
Why are postseason games longer?  More pitching changes?  Longer breaks between innings?  (They don&apos;t &lt;i&gt;seem&lt;/i&gt; longer, but an extra thirty seconds at every commercial break is ten minutes or so over the course of a game.)  Something else I haven&apos;t thought of?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.135819</guid>
	<pubDate>Sun, 18 Oct 2009 18:38:10 -0800</pubDate>
	<category>baseball</category>
	<category>statistics</category>
	<dc:creator>madcaptenor</dc:creator>
	</item>
	<item>
	<title>What&apos;s the statistical technique for combining several test results into one?</title>
	<link>http://ask.metafilter.com/135754/Whats%2Dthe%2Dstatistical%2Dtechnique%2Dfor%2Dcombining%2Dseveral%2Dtest%2Dresults%2Dinto%2Done</link>	
	<description>I&apos;m a statistics n00b trying to learn how to combine the results of several tests into one. Basically, I&apos;d like to learn how to categorize entities in some experimental data by combining the scores from several domain-specific tests into one unified score.  A practical but hypothetical example would be writing a computer program that given the sound of a car engine will try to identify what model of car it came from.  Say that there are 10 possible cars each sound can be matched with, and three independent tests that are applied to each sound, each test producing a number between zero and one for each car model indicating how likely it is that the sound came from that model.  I could combine the test results naively by simply adding them together, but that could produce crappy results if one of the tests is much more accurate than the others, possibly worse results than just using that test by itself. &lt;br&gt;
&lt;br&gt;
There are established ways to do this that I&apos;ve seen used in research before, but I don&apos;t know any of the math and haven&apos;t had any luck Googling for info.  I&apos;m not looking for a detailed explanation, just some pointers to what I should research to teach myself.  Thanks!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.135754</guid>
	<pubDate>Sat, 17 Oct 2009 20:39:39 -0800</pubDate>
	<category>datamining</category>
	<category>experiments</category>
	<category>research</category>
	<category>statistics</category>
	<dc:creator>gsteff</dc:creator>
	</item>
	<item>
	<title>Does chi-square work for this?</title>
	<link>http://ask.metafilter.com/135201/Does%2Dchisquare%2Dwork%2Dfor%2Dthis</link>	
	<description>StatisticAnalysisFilter: I took (pretty close to) scientific observations of the general populace in a neighborhood for a few months (personal project, long story). I measured the number of people who had trait X (or did not have trait X) in two locations, A and B. Now, I want to test the statistical significance of these results. Is the chi-square test sufficient for this? Or is there a better option? Effectively, I&apos;ve observed that Trait X is much more common in location A than location B. But I want to let the numbers do the talking, of course, and see if this is a statistically significant difference, or if it is due to chance. Standard science stuff, but I want to make sure I&apos;m doing this correctly, since I haven&apos;t done this in a while!</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.135201</guid>
	<pubDate>Sun, 11 Oct 2009 15:23:30 -0800</pubDate>
	<category>chi</category>
	<category>chisquare</category>
	<category>chi-square</category>
	<category>math</category>
	<category>resolved</category>
	<category>science</category>
	<category>square</category>
	<category>statistics</category>
	<dc:creator>MoreForMad</dc:creator>
	</item>
	<item>
	<title>Where can I find detailed statistics on the garbage sitting in landfills?</title>
	<link>http://ask.metafilter.com/134923/Where%2Dcan%2DI%2Dfind%2Ddetailed%2Dstatistics%2Don%2Dthe%2Dgarbage%2Dsitting%2Din%2Dlandfills</link>	
	<description>I&apos;m looking for statistics on the types of garbage currently sitting in American landfills. Does anybody have any good resources for this? To be clear, I&apos;m trying to find the percentage of paper, plastics, metal, etc. as they contribute to the greater whole.
Statistics from other countries are welcome too.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.134923</guid>
	<pubDate>Thu, 08 Oct 2009 06:20:47 -0800</pubDate>
	<category>garbage</category>
	<category>landfill</category>
	<category>refuse</category>
	<category>rubbish</category>
	<category>statistics</category>
	<dc:creator>wandergeek</dc:creator>
	</item>
	<item>
	<title>significantly negative about the education I got in statistics</title>
	<link>http://ask.metafilter.com/133994/significantly%2Dnegative%2Dabout%2Dthe%2Deducation%2DI%2Dgot%2Din%2Dstatistics</link>	
	<description>Is a contingency table the right thing for this kind of data (detailed within), and either way, how do I analyze it in Prism 5.0? The experiment includes at least two different treatments, sometimes more.  There are always 10 experimental samples.  Each treatment is replicated three times, for a total of 30 experimental samples at each treatment.  Each experimental sample can have one of three outcomes: positive, negative, or inconclusive (if necessary, we could turn the inconclusive to &quot;negative&quot;).  My stats seek to answer: does one treatment or another lead to significantly more positive or more negative samples?  &lt;br&gt;
&lt;br&gt;
Any recommended texts on the subject would be greatly appreciated -- I&apos;m going to need to have sound documentation for how I&apos;m doing this at some point, although for the time being, at least understanding would be a great help.&lt;br&gt;
&lt;br&gt;
If it helps any, these experiments are sterilization testing (think autoclave or bleach, not tubal ligation...) and involve treatments that can achieve: total inactivation/sterilization (no positives), fractional inactivation (some positives, some negatives), or what I would view as probable lack of inactivation (all positives).  Most of the results I am seeing now involve two treatments with fractional inactivation and I am trying to get a handle on whether there is a way to say that one treatment resulting in fractional inactivation is better at inactivating than another treatment resulting in fractional inactivation.  Coworkers and boss want to think that more negatives = better inactivation,  but when two treatments have 4/30 positives versus 10/30 positives, I am really not sure how to calculate whether that 4/30 is significantly more inactivated than 10/30. &lt;br&gt;
&lt;br&gt;
I&apos;ve searched MetaFilter for info on contingency tables, and Google for the same, while also trying to figure out whether I should be using some other type of analysis... I am more confused than when I began.  It seems like the examples I find involve more complex data than what I am working with...</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.133994</guid>
	<pubDate>Mon, 28 Sep 2009 06:03:59 -0800</pubDate>
	<category>contingencytables</category>
	<category>statistics</category>
	<category>sterilization</category>
	<dc:creator>Tandem Affinity</dc:creator>
	</item>
	<item>
	<title>Help me determine rate of improvement for my players!</title>
	<link>http://ask.metafilter.com/133314/Help%2Dme%2Ddetermine%2Drate%2Dof%2Dimprement%2Dfor%2Dmy%2Dplayers</link>	
	<description>I coach a high school soccer team. Each practice we go through a routine which involves repetitions of various skills. The number of repetitions accomplished by a player for each skill is recorded in an Excel spreadsheet for ranking purposes. I want to determine the rate of improvement so the players can see their progress and who&apos;s improving most. Is there a simple Excel function that would help me do this?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.133314</guid>
	<pubDate>Sat, 19 Sep 2009 13:28:55 -0800</pubDate>
	<category>excel</category>
	<category>sport</category>
	<category>statistics</category>
	<category>teams</category>
	<dc:creator>TheManticore</dc:creator>
	</item>
	<item>
	<title>stay awhile</title>
	<link>http://ask.metafilter.com/132221/stay%2Dawhile</link>	
	<description>My website&apos;s stats app says that visitors stay for an average of five minutes and ten seconds.  I know that this is pretty decent ... but does it put us in the 70th percentile?  The 90th percentile?  Is there anywhere you know of where I can go to find lots of comparative statistics for duration of visits to sites?</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.132221</guid>
	<pubDate>Mon, 07 Sep 2009 19:04:18 -0800</pubDate>
	<category>analytics</category>
	<category>site</category>
	<category>statistics</category>
	<category>web</category>
	<dc:creator>dacoit</dc:creator>
	</item>
	<item>
	<title>Parvovirus: odds of a puppy getting it?</title>
	<link>http://ask.metafilter.com/131750/Parvovirus%2Dodds%2Dof%2Da%2Dpuppy%2Dgetting%2Dit</link>	
	<description>I understand the danger parvovirus poses to puppies, but what are the &lt;i&gt;odds&lt;/i&gt; of a puppy contracting the disease in the US (specifically Alameda County, California)? I have been reading about parvovirus in dogs (including &lt;a href=&quot;http://ask.metafilter.com/91038/Roger-baby-its-a-wild-world&quot;&gt;this discussion&lt;/a&gt;), and understand how serious the illness is.&lt;br&gt;
&lt;br&gt;
What I can&apos;t seem to find is any indication of risk or prevalence. What are the odds a dog will get parvo, and how many cases of it are there a year in my area?&lt;br&gt;
&lt;br&gt;
The more mathematical and bounded the answer, the better. I know I can&apos;t be assured to the fifth decimal place about anything, but I want to know: Parvo, this terrible disease, are the odds 1%, 10%, or 100%?&lt;br&gt;
&lt;br&gt;
More details below, in the hope that they may allow more exact bounding of the answer.&lt;br&gt;
&lt;br&gt;
My dog is five weeks old. He was one of the larger dogs in the litter (with two or three brothers and a sister), which I understand tends to confer longer maternal immunity. I intend to start him on a full vaccine series for parvo.&lt;br&gt;
&lt;br&gt;
He&apos;s 3/4 Australian Cattle Dog, 1/4 Fox Terrier. He was born in a remote rural area of Humboldt County, California, and as of a few days ago now lives in a semi-urban area in Alameda County.&lt;br&gt;
&lt;br&gt;
I keep him mostly indoors, with trips to the back and front yard for exercise. I understand that completely preventing exposure to parvo is impossible (as the virus is hardy and survives for long periods in the soil), but also that minimizing exposure to parvo greatly reduces the chances for infection.&lt;br&gt;
&lt;br&gt;
I would like to know: &lt;br&gt;
&lt;br&gt;
How common is parvo in Humboldt County and in Alameda County? Or, if these specific numbers aren&apos;t available, then whatever numbers are available for California or the US. A link to numbers of cases per year would be ideal.&lt;br&gt;
&lt;br&gt;
What are the odds of a puppy getting parvo between the ages of 5 and 16 weeks if he&apos;s allowed to a: socialize with known dogs (with shots), or b: occasionally visit parks and meet other non-wild dogs?&lt;br&gt;
&lt;br&gt;
Links to scholarly papers are fine, and links to the dog equivalent of the CDC would also be appreciated.&lt;br&gt;
&lt;br&gt;
If this is too specific, or if there isn&apos;t enough information, please let me know. Also, I do know how bad the illness itself is.</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.131750</guid>
	<pubDate>Tue, 01 Sep 2009 23:34:45 -0800</pubDate>
	<category>alamedacounty</category>
	<category>australiancattledog</category>
	<category>berkeley</category>
	<category>berkeleyca</category>
	<category>blueheeler</category>
	<category>california</category>
	<category>canine</category>
	<category>cattledog</category>
	<category>disease</category>
	<category>dog</category>
	<category>dogs</category>
	<category>humboltcounty</category>
	<category>odds</category>
	<category>parvo</category>
	<category>parvovirus</category>
	<category>puppies</category>
	<category>puppy</category>
	<category>resolved</category>
	<category>rural</category>
	<category>semi-urban</category>
	<category>statistics</category>
	<category>stats</category>
	<category>urban</category>
	<category>usa</category>
	<dc:creator>zippy</dc:creator>
	</item>
	<item>
	<title>Maintaining Server Privacy</title>
	<link>http://ask.metafilter.com/131323/Maintaining%2DServer%2DPrivacy</link>	
	<description>If you do a search for my company&#8217;s website on Google, one of the top hits is a page at Quantcast.com that purports to display &#8220;visitor statistics&#8221; for our domain.  These statistics are inaccurate by orders of magnitude and make it look as though our site gets only a tiny fraction of the visitors it actually does.  What&#8217;s more, there are all kinds of demographic charts and info, purportedly about our site, that don&#8217;t seem to have any connection to reality whatsoever.  How do I get them to stop misrepresenting us?  In their FAQs, Quantcast answers the question &#8220;how do I remove my site from Quantcast&#8217;s listing?&#8221; with &#8220;We do not remove sites from our listing.&#8221;  They propose that one join their Quantcast Publisher program, but this program forces you to give them your actual server stats, which they will use and publish whether you like it or not.&lt;br&gt;
&lt;br&gt;
I&#8217;m galled that there seem to be only two choices:&lt;br&gt;
&lt;br&gt;
a.) putting up with a page of highly visible misinformation about my company that causes us actual financial harm; and&lt;br&gt;
&lt;br&gt;
b.) signing up for some company&#8217;s service and giving them access to our server stats.&lt;br&gt;
&lt;br&gt;
Surely there&apos;s a third choice?  My question is this:&lt;br&gt;
Do you think there&#8217;s any way to word a letter to a company like this to get them to remove our site from their listing?&lt;br&gt;
&lt;br&gt;
(BTW, I know there are a couple of other companies, like Alexa, that do this kind of thing, but they don&#8217;t bother me because their stats are vague and don&apos;t pretend to be accurate.  It&apos;s the false &quot;accurate&quot; picture that I can&apos;t get over.)</description>
	<guid isPermaLink="false">tag:ask.metafilter.com,2009:site.131323</guid>
	<pubDate>Thu, 27 Aug 2009 16:27:15 -0800</pubDate>
	<category>law</category>
	<category>online</category>
	<category>privacy</category>
	<category>statistics</category>
	<category>www</category>
	<dc:creator>dacoit</dc:creator>
	</item>
	
	</channel>
</rss>

