Lengths of Runs
In Coin Flips
Around the beginning of the year 2000, a question appeared on the AP stats newsgroup that asked, “How do I find the probability that in ten flips of a coin I will get a run of three or more heads or three or more tails.” In February, Joshua Zucker sent a beautiful solution related to the idea of a Fibonacci sequence. Later I had opportunity to respond with a somewhat generalized solution for the problem for runs of other lengths. I have posted Joshua’s post first, with my generalization below.
In May of 2004 a similar problem was submitted by Mike Shelly. He asked the probability that a basketball player who shot 85% would have a string of 4 misses in a row in a string of 100 shots. I have added "notes on that question below the first set...
Here is a link to a Fathom file that I have on this. It dates back to 2001 and seems too well organized to be my work, but I don't know where I got it...(Maybe Jill or Bill or someone at Fathom,, oh well, Thanks to whomever). It runs several thousand simulations of ten flips (or change to another number) and shows the distribution of max run length for each trial..
First Joshua’s epiphany:
I'm going to count the number of ways with at least 3 consecutive
heads or tails by counting the complement: how many have strings of
only 1, or of 1 or 2, consecutive.
Well, only 1 is easy: HTHTHTHTHT or THTHTHTHTH ... two ways.
In fact everything starting with H has its symmetric buddy starting with
T (just turn all the Hs into Ts and vice-versa), so I'll assume we
start with H from here on out, and just double it at the end.
OK, so how many sequences are there with 1 or 2 steps at a time?
Let's start out with length 1 and look for a pattern. Each time
I look at each of the previous things and see if adding H or T is
legal (doesn't create a string of 3):
Length 1: H, 1 way.
Length 2: HH or HT, 2 ways
Length 3: HHT or HTH or HTT, 3 ways
Length 4: HHTH or HHTT or HTHH or HTHT or HTTH, 5 ways.
Length 5: HHTHH or HHTHT or HHTTH or HTHHT or HTHTH or HTHTT or HTTHH or HTTHT,
Well, looks Fibonacci to me! Why?
Let's see if I can explain why there will be 13 such strings of length 6.
After some messing around, I finally see an explanation ... mostly by
remembering some related problems I've seen before.
Here's the trick.
Any string of length 6 either ends with a double letter (HH or TT at the
end) or with alternate letters (HT or TH).
I can make all the length 6 strings that end with alternate letters
by sticking the alternate letter onto the end of one of the length 5
strings. So there are 8 length 6 strings that end with alternate letters.
For example, HHTHH leads to HHTHHT, and HHTHT leads to HHTHTH.
I can make all the length 6 strings that end with double letters by
sticking the (alternate) double letter onto one of the length 4 strings.
That is, for example, HHTH can't have a double H added to it, so it
leads to HHTHTT.
Combining the two methods indeed gives every possible thing that
starts with HHTH, so I'm convinced that this thing works.
Thus the number of length 10 strings is, um,
13, 21, 34, 55, 89 ... 89.
so (2 with strict alternation + 89 with ...
oh wait, the strictly alternating ones are counted here too.
OK, so (89 sequences with no string of 3 in a row) / 1024 ...
oh wait again, I need to multiply by 2 because they could start with
H or with T.
So, fine, it's 89/512.
That's the probability I DON'T want, so 1 - 89/512 ... .826, to
three decimal places. Looks like I caught all my arithmetic and
counting mistakes (or carefully made offsetting ones!)
Thanks for the fun problem,
Gunn HS, Palo Alto, CA
Joshua's wonderful answer appears to be only a teaser to a more grand
generalization. The same arguments could be applied to runs of four
or five or N consecutive H or T. And the solution appears to be a
simple extension of the fibonacci sequence that he has explained. I
have convinced myself (which is far easier than a proof) that the
following generalizations are true, and that probably Joshua already
knew all this and omitted it as an exercise for the reader...
To count the number of ways of strings of length K without N
consecutive H or T, you simply add the N-1 previous values. A table
below shows the sums to 10 for the cases of up to 5 in a row.
Number of coin flips
1 2 3 4 5 6 7 8 9 10 Prb
N=2 2 2 2 2 2 2 2 2 2 2
N=3 (J's case)2 4 6 10 16 26 42 68 110 178
N=4 2 4 8 14 26 48 88 162 298 548
N=5 2 4 8 16 30 58 112 216 416 802
N=6 2 4 8 16 32 62 122 240 472 928
And of course the probability that you do have a string as long as N
is the 1-fN(10)/1024 . It should be observed that the 178 cases for
N=3 includes the 2 cases for N=2, so we can also find the number of
strings which must have at least two but not three if I am thinking
this through correctly....
That would mean that if you flipped a coin ten flips, of the 1024
2 of them have a max run length of 1,
178-2 = 176 of them have a max run length of 2,
548-178= 370 have a max run length of 3
802-548= 254 have a max run length of 4
928-802= 126 have a max run length of 5
and less than 100 have a rund length of 6 or more.... and we see that
runs of length three seem to be the most common....
I simulated 1000 trials with FATHOM, (too stupid to set it for 1024,
sorry) and got 99 outcomes with a run of six or more. There were 171
with runs of max length two... and by coincedence, 370 with max runs
of length three; 231 with length of four, 125 with length five, and a
total of four with no repeats in the string.
I hope I have my mind on straight and didn't mess this up too greatly,
but it seems like a realy nice approach to attack problems of runs, in
fact, it almost takes the mystery out of the whole situation....
...when I posed it to a student (varsity basketball player) after he had finished his daily work: "What is the probability that an 85% free-throw shooter will miss 4 shots in a row during a string of 100 attempts?" We are working on ways to simulate this, but we've not hit on a reasonable approach (yet). We are also playing with computational approaches, but we seem to keep running into "overlaps" as we try to decompose the possible sequences into successes and failures. Has anyone completed a problem such as this? Any insights would be appreciated. Mike Shelly Andover High School
My first response suggested a modification of the original Fathom simulation, and then I realized that the problem could be done with a Markov Matrix approach, and so I submitted a second post.
Mike and all, ON the slow drive home through one of the national forests here in East Anglia I thought about the problem and realized that it should submit to a Markov matrix approach, so I tried it at home and it comes up with essentially the same answer as the simulation... about .041. To do the problem think of there being five states for the shooter to be in... If he ever gets to the state of "four misses in a row" then he stays there forever, what is called an absorbing state, because no matter how many he makes (or misses) after that he will still have had a string of four misses in this trial. For the other states (number of consecutive misses up to last shot) he has a .85 probability of moving back to zero misses, and a .15 probability of moving to the next higher number of misses. The matrix looks like this
If we raise this to the second power we get the probability of moving from any state to another state in 2 shots... and if we raise it to the 100th power, the first row of the result tells us the probability of being in any given state after our 100 shots.....
My Ti-83 (gotta love that litte machine) gives the first row as .815, .122, .018, .0027, .041 as the probabilities of being in state zero, one, etc after 100 shots.....
A good weekend to you all