Comments on: I was told there would be no math.
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math/
Comments on Ask MetaFilter post I was told there would be no math.
Sun, 21 Aug 2016 10:22:30 -0800; Sun, 21 Aug 2016 10:32:57 -0800
Question: I was told there would be no math.
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math
I've crossed a bunch of plants with one another, and now have a bunch of berries on those plants. Each berry can contain way more seeds than I have room to pot up and grow out individually, so I'm trying to find the point of diminishing returns, where potting up additional seedlings stops giving me new and interesting results, but my math background is inadequate. <br /><br /> A previous cross has given me thirteen distinguishable colorations. I imagine this isn't typical, but it's all the information I have so let's roll with that. <br>
<br>
In the previous cross, out of 75 seedlings, result A has occurred 22 times, B 15 times, C 7 times, D 5 times, E 5 times, F 4 times, G 3 times, H 3 times, I 3 times, J 3 times, K 2 times, L 2 times, and M 1 time. <br>
<br>
Assuming these results to be typical, how would I:<br>
• calculate the number of unique outcomes likely to result from potting up <i>n</i> seedlings<br>
• calculate the number of seedlings to pot up in order to have an x% chance of result Y?
Spathe Cadet, Sun, 21 Aug 2016 10:22:30 -0800
Tags: math, statistics, genetics, plant, seedlings, breeding, plantbreeding, resolved
By: Tandem Affinity
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339232
I'm not a plant geneticist, but unless the trait for the characteristic you are looking at has simple Mendelian inheritance patterns, this may be impossible to answer...<br>
<br>
I think it's going to depend on knowing the trait and plant you are working with for anyone to start helping with this...
Tandem Affinity, Sun, 21 Aug 2016 10:32:57 -0800
By: Spathe Cadet
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339240
To clarify:<br>
<br>
The fact that we're talking about plant breeding specifically isn't relevant to the question. I know that whatever answers are generated aren't going to be "right," from a genetic standpoint. <br>
<br>
If it helps, treat it as a gumball machine with 13 different flavors of gumball in it, and tell me how to calculate the number of unique flavors I'd get from buying <i>n</i> gumballs, and the number of gumballs I'd need to buy in order to have an x% chance of getting a watermelon one, where watermelon is present in the proportion Y/75.
Spathe Cadet, Sun, 21 Aug 2016 10:56:36 -0800
By: Johnny Assay
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339245
The second question is the easier one to answer. If the probability that you get outcome Y is p<sub>Y</sub>, then:<br>
<ul><li> the probability that you do <i>not</i> get outcome Y in one event is 1 - p<sub>Y</sub>; </li><br>
<li>the probability that you never get outcome Y in any of N events is (1 - p<sub>Y</sub>)<sup>N</sup>; and</li><li>the probability that you don't never get outcome Y (i.e., that you get it at least once) in N events is 1 - (1 - p<sub>Y</sub>)<sup>N</sup>.</li></ul><br>
In particular, if you want there to be a probability p<sub>fail</sub> that you <i>don't</i> get outcome Y (so p<sub>fail</sub> = 1 - x/100 for an x% chance of getting it), then N must satisfy p<sub>fail</sub> = (1 - p<sub>Y</sub>)<sup>N</sup>, which implies that N = log(p<sub>fail</sub>)/log(1 - p<sub>Y</sub>), rounded up to the next whole seedling.<br>
<br>
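The formula above can be sketched in a few lines of Python. This is only an illustration: the function name <code>seedlings_needed</code> and the 90% target are assumptions made for the example, and the 1-in-75 rate for category M comes from the counts in the question.

```python
import math


def seedlings_needed(p_y, chance):
    """Smallest N such that 1 - (1 - p_y)**N >= chance.

    p_y    -- probability that a single seedling shows outcome Y
    chance -- desired probability of seeing Y at least once (e.g. 0.90)
    """
    if not (0 < p_y < 1) or not (0 < chance < 1):
        raise ValueError("p_y and chance must both be strictly between 0 and 1")
    p_fail = 1 - chance
    # N = log(p_fail) / log(1 - p_y), rounded up to a whole seedling
    return math.ceil(math.log(p_fail) / math.log(1 - p_y))


# Hypothetical example: category M appeared once in 75 seedlings,
# and we ask for a 90% chance of seeing it at least once.
print(seedlings_needed(1 / 75, 0.90))
```

Rarer categories dominate the answer: the same 90% target needs far fewer seedlings for a common category such as A (22 in 75).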
I'll have to think about how to address the first question.
Johnny Assay, Sun, 21 Aug 2016 11:12:36 -0800
By: dilaudid
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339246
It's easier just to simulate the first question, although I'm sure there's a formula.<br>
<br>
I did 10^5 trials for each sample size from 1-50 and plotted <a href="http://i.imgur.com/vy0SAsn.png">the results</a>.<br>
<br>
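The code behind that plot is not shown in this thread, so the following is only a sketch of one way such a simulation could be written. It assumes the "infinite gumball machine" reading (each seedling is an independent draw with the observed frequencies), and the names <code>COUNTS</code> and <code>simulate_distinct</code> are made up for the example.

```python
import random

# Observed counts for categories A..M from the question (they sum to 75).
COUNTS = [22, 15, 7, 5, 5, 4, 3, 3, 3, 3, 2, 2, 1]


def simulate_distinct(counts, n, trials=10_000, seed=0):
    """Average number of distinct categories seen among n seedlings.

    Assumes independent draws with probabilities proportional to counts
    (sampling with replacement).
    """
    rng = random.Random(seed)
    categories = list(range(len(counts)))
    total = 0
    for _ in range(trials):
        draws = rng.choices(categories, weights=counts, k=n)
        total += len(set(draws))
    return total / trials


for n in (10, 25, 50):
    print(n, round(simulate_distinct(COUNTS, n), 2))
```

Fewer trials than the 10^5 mentioned above are used here to keep the run short; raising <code>trials</code> tightens the estimate.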
<a href="https://gist.github.com/anonymous/6f1f6565482d13df8f86d0a46a6739bc">Raw data</a>.
dilaudid, Sun, 21 Aug 2016 11:17:48 -0800
By: SaltySalticid
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339262
'Sampling effort' is what you are thinking about. Each seed planted and checked is a sample, and you want to do that enough times to cover most of the possibilities but not so many as to waste time. Entire books have been written on this. The urn model is one way to go: it will help you rule out insane over-effort, but I think it will generally overpredict what you need to get the bulk of the common outcomes while underpredicting the effort you need to get the really rare combos.<br>
<br>
<br>
Also, what kind of plants are we talking about here? You could also send seeds around to interested parties as a way of increasing sampling effort :)
SaltySalticid, Sun, 21 Aug 2016 12:34:48 -0800
By: Kalmya
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339266
I'm wondering if you're combining traits in your outcomes, like yellow flowers and a double-leaf trait. As it stands, we don't have a big enough sample size to really give you these answers. But if you can back out to individual traits, maybe you do.
Kalmya, Sun, 21 Aug 2016 13:06:28 -0800
By: Johnny Assay
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339272
After further thought, I think that the best way to answer the "expected number of outcomes" question is via a simulation like <b>dilaudid</b> did. <br>
<br>
It also occurred to me that you can ask another question: if you kept planting & raising seedlings until you got one of each type, what is the average number of seedlings that you can expect to plant? This is basically a version of the <a href="https://en.wikipedia.org/wiki/Coupon_collector%27s_problem">Coupon Collector's Problem</a> with a non-uniform probability distribution of the "coupons". At the bottom of the page, there's a formula for the expected amount of time to get all the options, in terms of an integral; for your numbers, it works out to 105.8 plants.
Johnny Assay, Sun, 21 Aug 2016 13:12:54 -0800
By: Valancy Rachel
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339278
Look up the geometric distribution, and the cumulative geometric distribution. You should get decent info from an intro stats text, or just the Internet.
Valancy Rachel, Sun, 21 Aug 2016 13:32:52 -0800
By: Thisandthat
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339279
I have been informed that, in the case of the gumball situation, it's important to know how many balls of each flavor are in the machine.<br>
(asked my mathematician friends to look at this one)<br>
<br>
Edit: Or whether there are infinite gumballs
Thisandthat, Sun, 21 Aug 2016 13:34:28 -0800
By: Spathe Cadet
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339295
The plants in question are holiday cacti (<i>Schlumbergera</i>). <br>
<br>
There are photos of them all <a href="http://plantsarethestrangestpeople.blogspot.com/p/schlumbergera-seedling-gallery.html">here</a>. Seedlings 003A to 114A are the ones I divide into 13 different color categories.[1] The ones after 114 may or may not be from the same parents; not enough of them have bloomed yet to be able to guess.<br>
<br>
The berries I'm trying to plan for are a mix of crosses between store-bought varieties, store-bought varieties with my own seedlings, and my-seedling/my-seedling crosses.<br>
<br>
Not sure how to answer the gumballs in the machine question. Each berry contains about 70-100 seeds, on average, but the number of <i>possible</i> seeds from a particular cross is astronomical. So I guess either it's a gumball machine that holds 70-100 gumballs with the specified distribution, or it's an infinite, frictionless, spherical gumball machine.<br>
<br>
-<br>
<br>
[1] (The casual observer will see a bunch of interchangeable orangeness, but as they are my babies, I'm better than most people at telling them apart, and I say there are 13 categories. We hit the point of diminishing returns with those a while ago, but naming them entertains me, and there hasn't been another batch of plants mature enough to bloom until very recently so there's been no particular reason to stop them from blooming and getting named.)
Spathe Cadet, Sun, 21 Aug 2016 14:08:54 -0800
By: deludingmyself
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339310
I can't work out from your question(s) whether you're asking about calculating the possibility of finding <em>new</em> colorations beyond your existing 13 variations, or just the likelihood of each color you've obtained so far given those frequencies. The latter is doable, the former not really. But even if you're 'ignoring' genetics to get the likelihood of each coloration, I think you should still do your math on how many to pot up <em>per cross</em> and not just in aggregate.
deludingmyself, Sun, 21 Aug 2016 14:51:02 -0800
By: Spathe Cadet
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339357
@deludingmyself:<br>
<br>
The <i>ultimate</i> question, which I am <i>not</i> asking here, is: how many seedlings do I need to plan for from each new cross I make, if I'm trying to minimize the amount of space each batch of seedlings takes up while maximizing the number of interesting (visually distinct) results?<br>
<br>
The <i>immediate</i> question, the question I <i>am</i> asking here, is: assuming that the results I've gotten from the one batch are typical, what is the mathematical relationship between the number of seedlings I grow out and the number of visually distinct results they produce?<br>
<br>
(The assumption that current and future batches of seedlings will be similarly variable is, no doubt, a bad one, but the current batch's information is the only information I can extrapolate from.)
Spathe Cadet, Sun, 21 Aug 2016 16:24:16 -0800
By: aws17576
http://ask.metafilter.com/299519/I-was-told-there-would-be-no-math#4339432
I'm going to take for granted that you know the probability of occurrence of each type of seedling, though this is a pretty tenuous assumption.<br>
<br>
As Johnny Assay said, if each seedling is type Y with probability p<sub>Y</sub> (independently of the other seedlings), then the probability of getting at least one type Y seedling in N tries is 1 - (1 - p<sub>Y</sub>)<sup>N</sup>.<br>
<br>
The <em>average</em> number of distinct types you'll get in N seedlings is simply the sum of 1 - (1 - p<sub>Y</sub>)<sup>N</sup> over all types Y. As an example, if you had three types which occur with probability 1/2, 1/3, and 1/6, and you planted 4 seedlings, then the average number of different types among those 4 seedlings would be [1 - (1/2)<sup>4</sup>] + [1 - (2/3)<sup>4</sup>] + [1 - (5/6)<sup>4</sup>], which is 2.26. This summing trick works because of <a href="https://en.wikipedia.org/wiki/Expected_value#Linearity">linearity of expected value</a> (I can expand on this if you want to know more).<br>
<br>
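As a rough sketch of the summing trick above, the expected number of distinct types can be computed directly. The function name <code>expected_distinct</code> is made up for this example, and the probabilities are simply the observed counts divided by 75, which carries the same tenuous assumption noted at the top of this comment.

```python
def expected_distinct(probs, n):
    """Expected number of distinct types among n independent seedlings.

    Sums 1 - (1 - p)**n over all types, using linearity of expectation.
    """
    return sum(1 - (1 - p) ** n for p in probs)


# The three-type illustration from the text: rates 1/2, 1/3, 1/6, four seedlings.
print(round(expected_distinct([1 / 2, 1 / 3, 1 / 6], 4), 2))

# The 13 observed categories, treated as the true probabilities (an assumption).
counts = [22, 15, 7, 5, 5, 4, 3, 3, 3, 3, 2, 2, 1]
probs = [c / 75 for c in counts]
for n in (10, 25, 50, 75):
    print(n, round(expected_distinct(probs, n), 2))
```

Plotting this value against n gives the diminishing-returns curve the question asks about, without any random simulation.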
You can also calculate the average number of seedlings you'll need to plant to obtain the full set of types. This gets pretty gnarly, though. If the probabilities of all the types are p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>r</sub>, then the average number of seedlings needed to obtain a full set is S<sub>1</sub> - S<sub>2</sub> + S<sub>3</sub> - ..., where the signs alternate between + and -, and:<br>
<br>
S<sub>1</sub> = 1/p<sub>1</sub> + 1/p<sub>2</sub> + ... + 1/p<sub>r</sub><br>
S<sub>2</sub> = 1/(p<sub>1</sub>+p<sub>2</sub>) + 1/(p<sub>1</sub>+p<sub>3</sub>) + ... + 1/(p<sub>r-1</sub>+p<sub>r</sub>), with a term for every combination of two types<br>
S<sub>3</sub> = 1/(p<sub>1</sub>+p<sub>2</sub>+p<sub>3</sub>) + 1/(p<sub>1</sub>+p<sub>2</sub>+p<sub>4</sub>) + ... + 1/(p<sub>r-2</sub>+p<sub>r-1</sub>+p<sub>r</sub>), with a term for every combination of three types<br>
etc.<br>
<br>
This formula is derived from the geometric distribution (mentioned by Valancy Rachel above) and the <a href="https://en.wikipedia.org/wiki/Maximum-minimums_identity">maximum-minimums identity</a>. Continuing the illustration above with three types of seedlings that occur at rates 1/2, 1/3, and 1/6, the average number of seedlings you'd need to "catch them all" would be<br>
1/(1/2) + 1/(1/3) + 1/(1/6) - 1/(1/2 + 1/3) - 1/(1/2 + 1/6) - 1/(1/3 + 1/6) + 1/(1/2 + 1/3 + 1/6) = 7.3.<br>
<br>
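The alternating sum S<sub>1</sub> - S<sub>2</sub> + S<sub>3</sub> - ... can be evaluated by brute force over every combination of types. The following is a minimal sketch, with a made-up function name; for 13 types it loops over 8191 combinations, which a computer handles quickly.

```python
from itertools import combinations


def expected_seedlings_for_full_set(probs):
    """Average number of seedlings needed to see every type at least once.

    Evaluates S_1 - S_2 + S_3 - ..., where S_k sums 1 / (p_a + ... + p_k)
    over every combination of k types.
    """
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1 if k % 2 == 1 else -1
        s_k = sum(1 / sum(subset) for subset in combinations(probs, k))
        total += sign * s_k
    return total


# The three-type illustration from the text: rates 1/2, 1/3, 1/6.
print(round(expected_seedlings_for_full_set([1 / 2, 1 / 3, 1 / 6]), 1))

# The 13 observed categories, treated as the true probabilities (an assumption).
counts = [22, 15, 7, 5, 5, 4, 3, 3, 3, 3, 2, 2, 1]
print(round(expected_seedlings_for_full_set([c / 75 for c in counts]), 1))
```

The second figure can be compared against the 105.8 plants quoted earlier in the thread from the integral form.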
Neat as it is, this formula is insanely unwieldy for 13 types and you will need a computer to evaluate it. At that point, you might just want to run a random simulation instead. (Edited to add: I think this formula is doing the same thing as the integral Johnny Assay also mentioned above. It looks about equally taxing to calculate.)
aws17576, Sun, 21 Aug 2016 18:22:28 -0800