vpbauer1 wrote:
Don't feel stoopid. I too wish these explanations were in plain
English. Maybe with real life examples.
Of course, part of the problem is that when the topic being discussed
is statistics, it can take great pains to carry on the discussion in
terms where the jargon of statistics doesn't predominate.
To write in the clearest and most coherent fashion requires
considerable focus and effort. When jotting something off the cuff
for a post, the result will often be lacking in that regard.
And there's the fact that many, such as myself, might enter the
discussion without a consummate grasp of the subject. I have what
likely equates to something a little more comprehensive than a minor
in college math under my belt (my degree being in finance, but my first
couple of years of study were as a math major). So I'm going to be
stumbling awkwardly a bit in what I contribute ... hoping that someone
more conversant will set me straight.
Underlying the original post in this thread was the general question
of how statistics can be interpreted or applied in video poker. The
answer is: only very poorly when it comes to the short term (such as
session-to-session results, or even longer stretches of 100,000
hands).
What's commonly offered up is that "video poker results aren't
normally distributed", so you can't apply the basic statistical
relations that only hold for normally distributed (bell curve shaped)
data (e.g., about 68% of data lies within +/- 1 sd of the mean).
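To make that concrete, here's a small sketch comparing the one-hand
payout distribution against the normal curve's one-sd rule. The pay
table is an assumption for illustration: approximate, commonly quoted
9/6 Jacks or Better figures (optimal play, max-coin royal paying 800
per unit bet), not numbers from this thread. PAY_TABLE and
single_hand_moments are made-up names.

```python
import math

# Assumed, approximate 9/6 Jacks or Better figures (optimal play, max coin):
# hand type -> (payout per unit bet, probability per hand). Illustration only.
PAY_TABLE = {
    "royal flush":     (800, 0.000025),
    "straight flush":  (50, 0.000109),
    "four of a kind":  (25, 0.002363),
    "full house":      (9, 0.011512),
    "flush":           (6, 0.011015),
    "straight":        (4, 0.011229),
    "three of a kind": (3, 0.074449),
    "two pair":        (2, 0.129279),
    "jacks or better": (1, 0.214585),
    "nothing":         (0, 0.545435),
}

def single_hand_moments(table):
    """Return (mean, standard deviation) of a single hand's payout."""
    total_p = sum(p for _, p in table.values())  # absorb rounding in the table
    mean = sum(x * p for x, p in table.values()) / total_p
    var = sum((x - mean) ** 2 * p for x, p in table.values()) / total_p
    return mean, math.sqrt(var)

mean, sd = single_hand_moments(PAY_TABLE)
total_p = sum(p for _, p in PAY_TABLE.values())

# Share of single-hand outcomes that land within one sd of the mean.
within = sum(p for x, p in PAY_TABLE.values() if abs(x - mean) <= sd) / total_p
normal_within = math.erf(1 / math.sqrt(2))  # same share for a normal curve

print(f"mean {mean:.4f}, sd {sd:.3f}")
print(f"one-hand payouts within 1 sd: {within:.1%} (normal curve: {normal_within:.1%})")
```

With these assumed figures roughly 97.5% of single-hand outcomes fall
within one sd of the mean, versus about 68% for a normal curve; the
royal at 800 sits far out in a long right tail.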
That's, of course, very true. But the statement, on its face, should
leave one a little uneasy. What's not "natural" about the deal/draw
of a deck of cards that would prevent a normal distribution? The answer
is that every aspect of how the cards fall adheres to a normal
distribution (strictly speaking, the count of each hand type is
binomial, which approaches a normal curve once enough occurrences are
expected). It's that when you discuss video poker results, you're
talking about an accumulation of several normal distributions (the
patterns with which pairs, flushes, straights, etc. fall). The payouts
from each are normally distributed, but because of their unequal
frequencies, they don't add up to a cumulative normal distribution;
instead, in the short to medium term, the total takes on a skewed,
long-tailed shape.
However, over a large number of hands, the skewness of that shape
diminishes, and the shape ultimately approaches (though never quite
reaches) the bell shape of a normal distribution.
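One way to watch that happen: if hands are treated as independent,
the third cumulant of the total grows like n while the sd grows like
sqrt(n), so the skewness of an n-hand total is the one-hand skewness
divided by sqrt(n). A sketch, again assuming approximate 9/6 Jacks or
Better figures; the function names are hypothetical.

```python
import math

# Assumed, approximate 9/6 Jacks or Better figures (optimal play, max-coin
# royal): (payout per unit bet, probability per hand). Illustration only.
PAYOUT_PROBS = [
    (800, 0.000025), (50, 0.000109), (25, 0.002363), (9, 0.011512),
    (6, 0.011015), (4, 0.011229), (3, 0.074449), (2, 0.129279),
    (1, 0.214585), (0, 0.545435),
]

def one_hand_skewness(payout_probs):
    """Skewness (third standardized moment) of a single hand's payout."""
    total_p = sum(p for _, p in payout_probs)  # absorb rounding in the table
    mean = sum(x * p for x, p in payout_probs) / total_p
    var = sum((x - mean) ** 2 * p for x, p in payout_probs) / total_p
    m3 = sum((x - mean) ** 3 * p for x, p in payout_probs) / total_p
    return m3 / var ** 1.5

def session_skewness(payout_probs, n_hands):
    """Skewness of the total payout over n independent hands:
    third cumulants add, the sd grows like sqrt(n), so it scales
    as one-hand skewness / sqrt(n)."""
    return one_hand_skewness(payout_probs) / math.sqrt(n_hands)

for n in (1, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7,} hands: skewness {session_skewness(PAYOUT_PROBS, n):7.2f}")
```

The skewness keeps shrinking with n but is never exactly zero for any
finite play length, which is the "approaches but never reaches"
picture.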
Now the phrase "long term" is tossed about pretty casually ... so
casually that some will inaccurately say "no player achieves the long
term; only the machine does" ... as if to imply that most of the
concepts underlying "optimal play" aren't really applicable to one's play.
But what "long term" actually refers to is the point at which play
results can be expected to track expectation within certain desired
thresholds. That's hardly a precise definition ... those thresholds
are arbitrary. But when they're set in practical terms such as "how
many hands must be played for there to be strong confidence of a
positive outcome" (a concept introduced here as "N0" by
nightoftheiguanna), the "long term" becomes something very tangible
and within the scope of an active player's play.
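For a rough feel for that kind of threshold, here's a sketch using a
plain normal approximation: the net result after n hands is treated as
normal with mean n*edge and sd sd*sqrt(n), so the chance of being
ahead reaches a chosen confidence once n >= (z*sd/edge)^2. That's my
assumed reading, not necessarily how N0 was defined in the thread (at
z = 1, about 84% confidence, it reduces to sd^2/edge^2). The helper
name and the 101%-return, sd-5 game are hypothetical, and since it
leans on the normal approximation it's only trustworthy once n is well
past the skewed short term.

```python
from statistics import NormalDist

def hands_for_confidence(edge, sd, confidence):
    """Hands needed before P(net result > 0) reaches `confidence`.

    Hypothetical helper, normal approximation: the net result after n hands
    is ~ Normal(n * edge, sd * sqrt(n)), so we need n * edge >= z * sd * sqrt(n),
    i.e. n >= (z * sd / edge) ** 2.
    edge: expected net win per hand, in bets; sd: per-hand sd, in bets.
    """
    if edge <= 0:
        raise ValueError("only meaningful for a positive-expectation game")
    if not 0.5 < confidence < 1:
        raise ValueError("confidence must be between 0.5 and 1")
    z = NormalDist().inv_cdf(confidence)
    return (z * sd / edge) ** 2

# Hypothetical game: 101% return (edge 0.01 bets/hand), sd 5 bets/hand.
for conf in (0.8413, 0.90, 0.95):
    n = hands_for_confidence(0.01, 5.0, conf)
    print(f"{conf:.1%} confidence: about {n:,.0f} hands")
```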
What I sought to address, as an additional thought appended to a post
in this thread, is the question of why the "long term" varies from one
game to the next. What is it about each game that drives that "long
term"? For that matter, what is it that ultimately converts the
short-term skewed distribution of a vp game into one that approaches a
normal distribution in the longer term?
Well, it's at this point that I begin to do a fair amount of less than
fully satisfactory hand waving. Intuitively I grasp what's going on,
but I haven't articulated that understanding often enough to do so
succinctly ... nor am I so conversant in statistics that I'm apt to do
so off the cuff.
But to start with, it's helpful to visualize a short-term vp
distribution (at DW2K's prompting I offered up:
http://www.jazbo.com/videopoker/curves.html) and picture how the
individual normally distributed hand payouts (HP, 2P, S, F ...)
aggregate to form such a curve.
Now that, in itself, is a bit of a stretch. You have to picture
what's represented in the overall short-term distribution: every
combination of hits over the n hands charted (at one extreme, n hands
without any hits at all; at the other extreme, n successive RF hits,
both with minuscule probability and therefore flat-lined at "0" on
the chart).
All of these possible hand combinations are directly related to the
underlying distributions of each potential hand type payout (i.e., for
a high pair, the frequency with which you hit 1 HP, 2 HP, ... over the
course of those n hands). But, to put it simply, the overall
distribution is a rather complex combination of those individual
normally distributed hand frequency charts.
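Each of those individual charts is the distribution of how many times
one hand type turns up in n hands, which (treating hands as
independent) is binomial. A small sketch, assuming approximate 9/6
Jacks or Better frequencies: the high pair over 100 hands gives a
fairly symmetric hump, while the royal over a single 40,000-hand
cycle is piled up at 0 and 1 with a tail to the right. count_pmf is a
made-up name.

```python
from math import comb

def count_pmf(p, n, k):
    """P(a hand with per-hand probability p shows up exactly k times in n hands)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed per-hand probabilities (approximate 9/6 Jacks or Better, optimal play).
P_HIGH_PAIR = 0.214585
P_ROYAL = 0.000025

# High pairs in 100 hands: a roughly symmetric hump around 100 * p, about 21.5.
hp = [count_pmf(P_HIGH_PAIR, 100, k) for k in range(101)]
hp_mode = max(range(101), key=hp.__getitem__)
print("most likely high-pair count in 100 hands:", hp_mode)

# Royals in one 40,000-hand cycle: lopsided, with a long right tail.
rf = [count_pmf(P_ROYAL, 40_000, k) for k in range(5)]
print("P(0..4 royals in 40,000 hands):", [round(x, 3) for x in rf])
```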
But here's a key point: if all the hand types had the same frequency,
then the resulting overall chart of result frequencies would also
reflect a normal distribution. It's the disparate frequencies that
give rise to the skewness (the differing payouts affect the shape of
that skewness, but aren't responsible for the skewness itself).
But the crux of that skewness lies in a related concept: if you
sample a population for some measure and chart the sample, then it
generally takes about 20 or more measurements for that chart to
approximate a normal distribution. Until that point, you're apt to
sample only a portion of the overall distribution. Beyond 20
measurements, the more measurements you pull, the more closely that
chart will approach a normal distribution.
So you can extend this concept to charting the distribution of vp
results over a length of n hands. If that sample isn't large enough
that you expect at least 20 occurrences of each hand (most notably,
if it doesn't span at least 20 RF cycles), then the contributions of
the respective hands won't approach anything normally distributed.
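One way to put a number on that "20 occurrences" idea: the skewness
of a binomial count is (1 - 2p)/sqrt(n*p*(1 - p)), which for a rare
hand is roughly 1/sqrt(expected count). A sketch, assuming a royal
frequency of about 1 in 40,000: the royal count's skewness is about 1
after one cycle and about 0.22 after 20 cycles, still shrinking but
never zero.

```python
from math import sqrt

def count_skewness(p, n):
    """Skewness of a binomial(n, p) hand count: (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

P_ROYAL = 0.000025  # assumed: roughly one royal per 40,000 hands

for cycles in (1, 5, 20, 30, 100):
    n = round(cycles / P_ROYAL)  # hands spanning that many royal cycles
    print(f"{cycles:>3} royal cycles ({n:,} hands): skewness {count_skewness(P_ROYAL, n):.3f}")
```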
Having suggested that if your charted play length spans at least 20
cycles of each hand, the contribution of each hand's cumulative payout
will approach a normal distribution, it might seem that for any game
you would expect the "long term" to be reached within about 800,000
hands (20 x 40,000, taking an RF cycle as roughly 40,000 hands).
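The arithmetic generalizes: to expect k occurrences of a hand with
per-hand probability p you need about k/p hands, and the rarest hand
sets the bar. A sketch using assumed, approximate 9/6 Jacks or Better
frequencies (not figures from this thread):

```python
# Assumed per-hand probabilities (approximate 9/6 Jacks or Better, optimal play).
PROBS = {
    "royal flush": 0.000025,
    "straight flush": 0.000109,
    "four of a kind": 0.002363,
    "full house": 0.011512,
    "flush": 0.011015,
    "straight": 0.011229,
    "three of a kind": 0.074449,
    "two pair": 0.129279,
    "jacks or better": 0.214585,
}

CYCLES = 20  # expected occurrences wanted for every hand type

# Hands needed so that each hand type is expected CYCLES times.
hands_needed = {name: CYCLES / p for name, p in PROBS.items()}

for name, hands in sorted(hands_needed.items(), key=lambda kv: -kv[1]):
    print(f"{name:>16}: {hands:>11,.0f} hands")

bottleneck = max(hands_needed, key=hands_needed.get)
print(f"bottleneck: {bottleneck} ({hands_needed[bottleneck]:,.0f} hands)")
```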
However, the key is that for each hand the payout distribution only
"approaches" a normal distribution. When you look at a game such as
DDB, with more than one high-paying hand with a frequency of less than
1 in 5,000 hands, the cumulative disparity of those hands at 20 RF
cycles is still enough to skew the overall distribution
significantly. You need to chart a play length considerably beyond 20
RF cycles before you see a result that's bell shaped.
In the case of a low-variance game such as Jacks or Better, even if
the quad and RF counts were truly normally distributed, their
cumulative deviation (the SF, with its relatively low payout given its
infrequency, is a much more negligible factor) means that something
greater than 20 RF cycles is necessary before the overall payout
distribution looks reasonably bell shaped (about 30 cycles, or 1.2
million hands, does the trick).
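To see which hands actually drive the skew, you can split the
single-hand third central moment, the sum of p*(x - mean)^3, into one
term per hand type, then scale the overall skewness by 1/sqrt(n) as
play length grows. This is a sketch under assumed, approximate 9/6
Jacks or Better figures (max-coin royal paying 800 per unit bet) with
independent hands; a DDB table would need its own numbers.

```python
import math

# Assumed, approximate 9/6 Jacks or Better figures (optimal play, max coin):
# hand type -> (payout per unit bet, probability per hand). Illustration only.
PAY_TABLE = {
    "royal flush":     (800, 0.000025),
    "straight flush":  (50, 0.000109),
    "four of a kind":  (25, 0.002363),
    "full house":      (9, 0.011512),
    "flush":           (6, 0.011015),
    "straight":        (4, 0.011229),
    "three of a kind": (3, 0.074449),
    "two pair":        (2, 0.129279),
    "jacks or better": (1, 0.214585),
    "nothing":         (0, 0.545435),
}

total_p = sum(p for _, p in PAY_TABLE.values())  # absorb rounding in the table
mean = sum(x * p for x, p in PAY_TABLE.values()) / total_p
var = sum((x - mean) ** 2 * p for x, p in PAY_TABLE.values()) / total_p

# Each hand type's term in the third central moment of one hand's payout.
m3_terms = {name: (x - mean) ** 3 * p / total_p for name, (x, p) in PAY_TABLE.items()}
m3 = sum(m3_terms.values())
share = {name: term / m3 for name, term in m3_terms.items()}

for name in ("royal flush", "four of a kind", "straight flush"):
    print(f"{name:>15}: {share[name]:6.2%} of the third moment")

def skew_at(n_hands):
    """Skewness of the total payout over n independent hands."""
    return m3 / var ** 1.5 / math.sqrt(n_hands)

for cycles in (20, 30):
    n = cycles * 40_000  # assumed: about 40,000 hands per royal cycle
    print(f"{cycles} royal cycles ({n:,} hands): skewness {skew_at(n):.3f}")
```

With these assumed numbers the royal term dominates the third moment,
with the quad term a distant second and the SF smaller still.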
Trust me when I say I've played fast and loose with the facts in
spilling this out. Anyone knowledgeable can find much to critique
(something I welcome, if they can shed greater light on the
discussion). But I'm fairly confident that this carries the essence
of what's going on here.
This may well not address what you were really after, or even
interested in ... but it's my best shot "off the cuff".
- Harry