vpFREE2 Forums

Some VERY Disturbing Statistics

--- In vpFREE@yahoogroups.com, "nightoftheiguana2000"
<nightoftheiguana2000@...> wrote:

In the case of the American Coin machines, it was impossible to hit a
full coin royal, which was a mistake and probably why they were
caught, as eventually people figured that out.

It was an idiotic mistake... just to clarify.

Chandler

Do you track the non-Deuces, non-Royal return percentage?

~31 royal cycles (your ~1420k hands in 2008) isn't that much when it
comes to royals or deuces (~290 deuce cycles), but it is for the other
hands (i.e., it's ~2550 WRF cycles and many more cycles of the other
hands).
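For reference, those cycle counts follow directly from the hand count.
A minimal sketch -- the cycle lengths below are the commonly quoted
approximate FPDW frequencies (the royal cycle is stated later in the
thread; the others are rounded figures, so treat them as rough):

```python
# Approximate FPDW cycle lengths (average hands per occurrence).
HANDS = 1_420_000
CYCLE_LENGTH = {
    "natural royal": 45_282,   # quoted later in this thread
    "four deuces": 4_909,      # approximate published frequency
    "wild royal": 557,         # approximate published frequency
}
for hand, cycle in CYCLE_LENGTH.items():
    print(f"{hand}: ~{HANDS / cycle:.0f} cycles")
# natural royal: ~31 cycles
# four deuces: ~289 cycles
# wild royal: ~2549 cycles
```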

If you're fairly certain that you aren't giving up a lot of EV in
mistakes or distractions, then I'd expect your non-deuce, non-royal
return to be pretty close to the expected 94.922%.

Natural royals and deuces happen infrequently enough that it takes
millions of hands (just run a few billion hand simulations using your
favorite VP software) for the deviations to work themselves out to
reach the expected frequency.

···

--- In vpFREE@yahoogroups.com, "alpine205" <alpine205@...> wrote:

First of all, I have to warn you -- I'm both a math guy and a
compulsive scorekeeper so if you're bored by numbers, you may want to
stop reading right now. In fact, my wife claims that the only reason
I play VP is so that I'll have an excuse to create a spreadsheet.
That, however, is only partially true.

> Do you track the non-Deuces, non-Royal return percentage?

Royals - expected = 1.77%, actual(Apr-Dec) = 0.94%
Deuces - expected = 4.07%, actual(Apr-Dec) = 3.71%
Other - expected = 94.92%, actual(Apr-Dec) = 94.83%

These returns are well within the expected variations of FPDW for the
number of hands played (and probably would have been even closer to the
expected returns had you indicated your full play for the year). I'm
not seeing anything here that would make me wonder about the "fairness"
of the game or disturb me in any manner.

Not being a local, my wife and I only get to LV 4 times or so a year.
In 2008 we made it to LV 4 times and played ~102,000 hands and got only
1 Royal and 14 Deuces -- way below the expected 2 Royals and 21
Deuces. At the same time, our other return percentage was 95.176%,
above the expected 94.922%.

Add in our 3 trips in 2007 and our hand count goes up to ~167,000 with
5 royals (1 more than expected), 34 Deuces (right on the expectation)
and an other return percentage of 94.859% (a bit below expectation).

Our overall play (101.319%, due to that 1 extra royal) was well within
the expected variations of the game for the number of hands played
(which was only about 1/10th of what you and your wife played in 2008
alone).
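The expected counts quoted above can be checked with the same
approximate cycle lengths as in the earlier sketch (royal ~1/45,282,
four deuces ~1/4,909):

```python
# Expected royal/deuce counts for the trip totals quoted above,
# using approximate FPDW cycle lengths.
for hands in (102_000, 167_000):
    print(f"{hands:,} hands: ~{hands / 45_282:.1f} royals, "
          f"~{hands / 4_909:.1f} deuces expected")
# 102,000 hands: ~2.3 royals, ~20.8 deuces
# 167,000 hands: ~3.7 royals, ~34.0 deuces
```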

···

--- In vpFREE@yahoogroups.com, "alpine205" <alpine205@...> wrote:

> Do you track the non-Deuces, non-Royal return percentage?

Royals - expected = 1.77%, actual(Apr-Dec) = 0.94%
Deuces - expected = 4.07%, actual(Apr-Dec) = 3.71%
Other - expected = 94.92%, actual(Apr-Dec) = 94.83%

These returns are well within the expected variations of FPDW for the
number of hands played

Could you elaborate on this statement a little? I'm curious as to what
you would consider to be outside the range of expected variations --
especially as it applies to royals? I thought I made a pretty good
case that my royal stats were, at the least, on the outer edge.

alpine205 wrote:

> These returns are well within the expected variations of FPDW for
> the number of hands played

Could you elaborate on this statement a little? I'm curious as to
what you would consider to be outside the range of expected
variations -- especially as it applies to royals? I thought I made
a pretty good case that my royal stats were, at the least, on the
outer edge.

I personally couldn't say that we're talking about "expected
variations", per se. But I come back to the fact that you're
arbitrarily making time divisions.

You've looked back and selected a specific period of time and
observed, "look how good <or bad> this was". I guarantee you that
most people will see swings in their play -- perhaps not quite as
remarkable, but sufficient so as to make your observations "not so
remarkable".

I have no problem buying into the fact that for a given period you've
seen a best-in-<1% scenario, and for another you've seen worst-in-<1%.
But unless you use these experiences to predefine a time-frame within
which you're going to examine the results of another co-player *with
no pre-knowledge of those results*, they aren't particularly
significant ... just interesting.

- Harry

> These returns are well within the expected variations of FPDW for
> the number of hands played

Could you elaborate on this statement a little? I'm curious as to
what you would consider to be outside the range of expected
variations -- especially as it applies to royals? I thought I made a
pretty good case that my royal stats were, at the least, on the outer
edge.

I would consider 6 sigma to be outside of the expected variations.
Anything less than this will happen more often than anyone would like
(unless it's on the + side of the curve). The 3 sigma that you think
is so disturbing happens about once in a hundred trials. Not that
unusual at all.

Go ahead and run a couple of hundred trials of 934k hands using VPW and
you'll see what I mean.

···

--- In vpFREE@yahoogroups.com, "alpine205" <alpine205@...> wrote:

-3 sigma or more is 0.14%, about once in a thousand trials.

···

--- In vpFREE@yahoogroups.com, "vp_player" <weharter@...> wrote:

The 3 sigma that you think is
so disturbing happens about once in a hundred trials.

--- In vpFREE@yahoogroups.com, "nightoftheiguana2000"
<nightoftheiguana2000@...> wrote:

> The 3 sigma that you think is
> so disturbing happens about once in a hundred trials.

-3 sigma or more is 0.14%, about once in a thousand trials.

The 0.14% number (around 1 in 700) is for data that is normally
distributed. Even though Deuces Wild is far from normally
distributed, 934,000 hands is sufficient to make Deuces Wild data
pretty close to "normal".

To check this, I used DRA-VP to run 5000 trials of a 934,000-
round "year" of Deuces Wild. I got just 4 outcomes that were more
than 3 standard deviations below the mean. That's 0.08%. It's
clearly not in the 1% range that vp_player put forward.

Let me spell out what I did, in case I made an error…

1. The EV of 934,000 hands of Deuces Wild is 934,000*0.76% = +7098
units.

2. The variance of 934,000 hands of Deuces Wild is 934,000*25.8 =
24,097,200 units squared. (25.8 is the variance of a 1-unit hand of
FPDW)

3. The standard deviation of the 934,000 hands = SQRT(variance) =
SQRT(24,097,200) = 4909 units.

A result of minus 3 standard deviations is EV - 3*SD, or 7098 - 3*4909
= -7628 units.

So, any result worse than -7628 units is a loss of more than 3
standard deviations.

Using Dunbar's Risk Analyzer for Video Poker, I got a result worse
than -7628 in just 4 of 5000 tries.

--Dunbar
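For readers who want to reproduce the arithmetic without DRA-VP, here
is a minimal Python sketch using only the figures from the post above
(edge, variance, trial count). It repeats the analytic steps and adds
the normal-theory tail for comparison -- it does not re-run the
simulation itself:

```python
from math import erf, sqrt

hands = 934_000
ev_per_hand = 0.0076      # FPDW edge: +0.76% per one-unit hand (from the post)
var_per_hand = 25.8       # variance of a one-unit FPDW hand (from the post)

ev = hands * ev_per_hand             # step 1: ~ +7,098 units
sd = sqrt(hands * var_per_hand)      # steps 2-3: ~ 4,909 units
threshold = ev - 3 * sd              # ~ -7,628 units

# One-tailed normal probability of landing below EV - 3*SD:
p = 0.5 * (1 + erf(-3 / sqrt(2)))    # ~0.00135, i.e. about 1 in 740

print(f"EV {ev:+,.0f}, SD {sd:,.0f}, -3 SD threshold {threshold:,.0f}")
print(f"normal theory: {p:.5f} -> about {5000 * p:.1f} of 5000 trials")
# The 4-of-5000 (0.08%) DRA-VP result is consistent with this at n = 5000.
```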

···

--- In vpFREE@yahoogroups.com, "vp_player" <weharter@> wrote:

For the record I'll just summarize my conclusion of alpine205's
results. He reported 11 royals in 20.7 cycles, a result that is 0.8%
possible with a fair average royal cycle, but 11.7% possible if the
royal cycle has been doubled. I'll assume an initial chance of this
being done of 1%, I think that's a reasonable assumption given that
there are many floor supervisors who are not happy about having to put
positive machines on their floor (they do it because it is necessary
to bring in business in these tough times). Others might start with
higher or lower assumptions, that's up to you. Given the results (11
royals in 20.7 cycles), and using Bayes' theorem, the new estimate is
that it's 13% possible that the royal cycle has been doubled on these
machines.
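A sketch of that update, modeling the royal count as Poisson (an
assumption on my part, though it closely matches the binomial at these
cycle lengths and reproduces the quoted likelihoods); the 1% prior is
NOTI's stated assumption:

```python
from math import exp, factorial

def poisson(k: int, mean: float) -> float:
    """P(exactly k events) under a Poisson model with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

royals, cycles = 11, 20.7
p_fair = poisson(royals, cycles)        # ~0.0077: normal royal cycle
p_rigged = poisson(royals, cycles / 2)  # ~0.117: cycle doubled -> half the mean
prior = 0.01                            # NOTI's assumed chance of tampering

posterior = prior * p_rigged / (prior * p_rigged + (1 - prior) * p_fair)
print(f"P(11 royals | fair)   = {p_fair:.2%}")    # ~0.77%
print(f"P(11 royals | rigged) = {p_rigged:.2%}")  # ~11.7%
print(f"P(rigged | 11 royals) = {posterior:.0%}") # ~13%
```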

nightoftheiguana2000 wrote:

Given the results (11 royals in 20.7 cycles), and using Bayes'
theorem, the new estimate is that it's 13% possible that the royal
cycle has been doubled on these machines.

You'll offer up a statistic such as this on a period of play that's
been selected in hindsight?

I'm very disappointed. Very. :wink:

- H.

--- In vpFREE@yahoogroups.com, "nightoftheiguana2000"
<nightoftheiguana2000@...> wrote:

> The 3 sigma that you think is
> so disturbing happens about once in a hundred trials.

-3 sigma or more is 0.14%, about once in a thousand trials.

Yes, I was rounding perhaps a bit loosely.

For a normally distributed population the standard deviation
percentages are

1σ 68.27%
2σ 95.450%
3σ 99.7300%
4σ 99.993666%
5σ 99.99994267%
6σ 99.9999998027%
7σ 99.9999999997440%

FPDW at nearly 1 million hands is sufficiently "normal" that these
percentages are more than just in the ball park.

1 out of 371 trials will lie outside of the 3σ range (not the 1 out
of a hundred I said before, but it was close enough, definitely not an
order of magnitude out).

To me at least, a 1 in 371 occurrence is within the expected range.

OTOH, only about 1 in 507,000,000 trials lie outside of the 6σ
population.

I would find results outside of the 6σ population VERY disturbing,
but consider a result that lies at about the 3σ range to be well
within expected variance.
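Those coverages (and the 1-in-371 and 6σ figures) can be checked with
the error function; a minimal sketch:

```python
from math import erf, sqrt

for k in range(1, 8):
    inside = erf(k / sqrt(2))  # two-tailed coverage within k sigma
    print(f"{k} sigma: {inside:.10%} inside, "
          f"1 in {1 / (1 - inside):,.0f} outside")
# 3 sigma -> about 1 in 370 outside (two-tailed)
# 6 sigma -> about 1 in 507 million outside
```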

···

--- In vpFREE@yahoogroups.com, "vp_player" <weharter@> wrote:

I think this has gone on far enough. NOTI did nothing wrong or even
disappointing, and neither did the other posters who computed things
"in hindsight". The only mistakes that were made were in people's
interpretation of results (as in the meaning they took from things)
and the ridiculous assumption that the statistical test needs to be
determined before data is collected. Bayes clearly proved that
assumption was wrong -- as does common sense. This is not just an
issue of the difference between "a priori" and "a posteriori" -- and
I'm not a fan of throwing these terms around in statistics -- it's
far better to just define what was actually done, as NOTI and others
did, rather than personal philosophy.

FWIW, it's common practice to collect the data first, and then decide
on what statistical tests should be done on it. In many cases it is
unknown (a priori, lol) how much data is needed, and the only way to
determine if enough data has been collected is to test it, varying
the amount of data used in each test (after collecting all of it).
Likewise, it might not be known if too much data (too long of a
period) is to be used for a test until you collect too much (this
often happens when the underlying process is not strictly stationary;
that is, it is time-varying). The important thing is to understand
what the tests that were done mean (how to interpret them).

···

--- In vpFREE@yahoogroups.com, "Harry Porter" <harry.porter@...> wrote:

nightoftheiguana2000 wrote:
> Given the results (11 royals in 20.7 cycles), and using Bayes'
> theorem, the new estimate is that it's 13% possible that the royal
> cycle has been doubled on these machines.

You'll offer up a statistic such as this on a period of play that's
been selected in hindsight?

I'm very disappointed. Very. :wink:

- H.

Not so fast. The tail of the distribution (out there in those big
sigmas) is NEVER normal... nor is it symmetric around the mean or EV
(that is, you can always win more than you can lose). It's not
completely trivial to compute really accurate numbers for the tails --
even though we know the starting distribution (pay table) perfectly.

···

--- In vpFREE@yahoogroups.com, "vp_player" <weharter@...> wrote:

FPDW at nearly 1 million hands is sufficiently "normal" that these
percentages are more than just in the ball park.

> nightoftheiguana2000 wrote:
> > Given the results (11 royals in 20.7 cycles), and using Bayes'
> > theorem, the new estimate is that it's 13% possible that the
> > royal cycle has been doubled on these machines.

Harry Porter wrote:

> You'll offer up a statistic such as this on a period of play
> that's> been selected in hindsight?
>
> I'm very disappointed. Very. :wink:
>
> - H.

cdfsrule wrote:

I think this has gone on far enough. NOTI did nothing wrong or
even disappointing, and nor did the other posters who computed
things "in hindsight" ...

You did catch the "emoticon", right?

"Disappointed" is in the eye of the beholder (even when observed with
a *wink*).

Yes, someone can take an observed sequence of events and offer up a
probability that an apparent aberration is due to the introduction of
an outside influence as opposed to just statistical flux.

But to just toss out a statistic without addressing the greater
context of the thread isn't something I'm particularly fond of. Of
course, I'm speaking solely for myself. Anyone is welcome to spin
numbers as it suits them.

Were I in NOTI's shoes (I'm not), I might have closed with:
"Disturbing? Not particularly."

- H.

My choice, based on the math, would be "potentially disturbing". Like
I said, I assumed 1% as an initial guess, some might assume 0% or some
other figure, which would change the results. Given the human
tendency, I would never assume 0%. I also see no problem in selecting
all data from an arbitrary date (presumably the date the fix was in)
to the present. If my results prior to that date were better than
average, that just gives another reason for the fix.

···

--- In vpFREE@yahoogroups.com, "Harry Porter" <harry.porter@...> wrote:

Were I in NOTI's shoes (I'm not), I might have closed with:
"Disturbing? Not particularly."

This is not just an issue of the difference between "a priori" and
"a posteriori" -- and I'm not a fan of throwing these terms around in
statistics -- it's far better to just define what was actually done,
as NOTI and others did, rather than personal philosophy.

FWIW, it's common practice to collect the data first, and then decide
on what statistical tests should be done on it.

"common practice" is not a good measure of best practice. I agree
with you that data mining is common; but that doesn't lessen the
fuzziness that results from trying to extract meaning from the mined
results..

In science when a data set produces an outcome that was not part of a
rigorous start/end sort of experiment, it is invariably noted
that "further tests need to be done". And, in fact, it is very often
the case that the further tests do not substantiate the initial
claim.

In many cases it is unknown (a priori, lol) how much data is needed,
and the only way to determine if enough data has been collected is to
test it, varying the amount of data used in each test (after
collecting all of it).

As I mentioned earlier, if you are willing to collect data for a long
enough period, the chance of eventually finding yourself 3 sd's below
your mean result will approach 100%. So what can you conclude from
an open ended test?

Likewise, it might not be known if too much data (too long of a
period) is to be used for a test until you collect too much (this
often happens when the underlying process is not strictly stationary;
that is, it is time-varying). The important thing is to understand
what the tests that were done mean (how to interpret them).

The main objection raised to the initial post was to the conclusion
that there was some special significance to the results obtained
after creating an after-the fact division point. Further, I and
others thought it was incorrect to use that arbitrary subset as
evidence that something isn't on the level at Red Rock.

I can play one hand of FPDW, and if I'm dealt Jc5d8sKc4c, say, "wow,
what were the odds of THAT happening?" More than a million to one.
That's an extreme example of not defining a problem and a test ahead
of time.

Btw, I thought NOTI's posts were interesting and thoughtful. As he
made clear, he had to make two assumptions to do the calculation in
his 2nd post. He assumed there was a 1% chance that the game is
rigged, and he assumed that the rigging consists of doubling the RF
cycle. I think Bayes' theorem is more useful when you have
something more to base the assumption on than pure conjecture. I
agree with NOTI that the chance the game is rigged is greater than
zero, but I think he would have a tough time showing me evidence that
the chance is closer to 1% than 0.1%, for example.
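That sensitivity to the prior is easy to make concrete. A sketch
re-running the earlier update with a 0.1% prior in place of 1%
(likelihoods as computed in the earlier Poisson sketch; the helper
function is mine, not any poster's):

```python
# Likelihoods from the earlier Poisson computation:
P_RIGGED = 0.117  # P(11 royals in 20.7 cycles | royal cycle doubled)
P_FAIR = 0.0077   # P(11 royals in 20.7 cycles | fair royal cycle)

def posterior(prior: float) -> float:
    """Bayes update for P(rigged | 11 royals in 20.7 cycles)."""
    return prior * P_RIGGED / (prior * P_RIGGED + (1 - prior) * P_FAIR)

for prior in (0.01, 0.001):
    print(f"prior {prior:.1%} -> posterior {posterior(prior):.1%}")
# prior 1.0% -> posterior 13.3%
# prior 0.1% -> posterior  1.5%
```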

Bottom line:
I agree with you when you said, "The important thing is to understand
what the tests that were done mean. (how to interpret them)." So,
how would you interpret the Red Rock data? IMO, there is nothing to
interpret, in the same way that there is nothing to interpret if I
just keep playing until I am 3 standard deviations behind.

--Dunbar

···

--- In vpFREE@yahoogroups.com, "cdfsrule" <vpfree_digests@...> wrote:

--- In vpFREE@yahoogroups.com, "Harry Porter" <harry.porter@> wrote:
> You'll offer up a statistic such as this on a period of play that's
> been selected in hindsight?
>
> I'm very disappointed. Very. :wink:
>
> - H.

At what point would you begin to suspect an altered machine?

···

--- In vpFREE@yahoogroups.com, "dunbar_dra" <h_dunbar@...> wrote:

Bottom line:
I agree with you when you said, "The important thing is to understand
what the tests that were done mean. (how to interpret them)." So,
how would you interpret the Red Rock data? IMO, there is nothing to
interpret, in the same way that there is nothing to interpret if I
just keep playing until I am 3 standard deviations behind.

dunbar_dra wrote:

Btw, I thought NOTI's posts were interesting and thoughtful. As he
made clear, he had to make two assumptions to do the calculation in
his 2nd post. He assumed there was a 1% chance that the game is
rigged, and he assumed that the rigging consists of doubling the RF
cycle. I think Bayes theorem is more useful when you have
something more to base the assumption on than pure conjecture. I
agree with NOTI that the chance the game is rigged is greater than
zero, but I think he would have a tough time showing me evidence
that the chance is closer to 1% than 0.1%, for example.

Dunbar, I trust I'm not beating a dead horse at this point by using
your on-target comments here as impetus for one last observation that
better crystallizes my perspective on this.

------

NOTI's calculations are dead on and the methodology impeccable. Since
he didn't extrapolate them into any fixed observation about the
fairness of the machine (just the probability that the results might
be a consequence of rigging under a given hypothesis <doubling the
royal cycle>), I see nothing to draw exception to.

Still, it's my guess that most people would be inclined to extend his
comments into a suggestion that the machine fairness was dubious.
That's compounded a bit by the "potentially disturbing" remark. While
I get that he likely states this in the sense of "since the statistics
don't very strongly rule out the possibility, it's something not to
dismiss entirely," I doubt I'm in a minority in reading this, on
initial read, as "good cause for concern".

NOTI's not responsible for how people read his words, but it's prudent
to be sensitive to it ... particularly in the context in which the
subject's been raised (which I've gathered to be something a little
more significant than, "gee, isn't this unusual?")

Bottom line for me: No problem with NOTI's assertion. Its
significance: No info re the fairness of the machines. Strong
hypothesis from which a prospective test can be performed, if so
motivated.

- Harry

Yes, NOTI has the math correct, i.e., 0.77% probability of 11 RF in
20.7 RF cycles, and 11.70% chance of 11 RF in 20.7 RF cycles ASSUMING
the RF cycle has been tampered with to be 1 in 90,564 hands instead
of the normal 45,282 hands, but why would you even contemplate this
assumption based on only ONE data point of 20.7 cycles?

The mean number of RF for 20.7 cycles is, of course, 20.7. The
standard deviation is 4.55. So, 11 RF is a little over 2 SD from the
mean on the - side of the distribution. If you graph the binomial
distribution, 11 RF is still visibly up the curve and isn't even in
the "zero" tail yet. An unfortunate result to be sure (particularly
if it's your play that you're talking about), but not that unusual.

What made this supposedly "very disturbing" was that over the 10.8 RF
cycles prior to these 20.7 cycles, 19 RF had been obtained.
Amazingly enough, the probability of 19 RF in 10.8 cycles is 0.72%,
which is about 2.5 SD on the + side of the 10.8 cycle distribution.

I would agree that it is unusual for the same 2 players' combined
results to be on very opposite sides of the distribution in
back-to-back trials of an unequal number of total hands, but would it
make you think the machines had suddenly been "gaffed"?

I guess to some it would indeed raise that possibility, but for me,
I'd have to see many more 20.7 cycle trials before I'd even start
speculating about "gaffed" machines.

Suppose instead of looking at this play as 10.8 cycles followed by
20.7 cycles, it was looked at as two periods of 15.75 cycles each?
Then there would have been 22 RF in the first 15.75 cycles and 8 in
the second 15.75 cycles. The probability of 22 in 15.75 is 2.81% and
8 in 15.75 is 1.35%. Would these results have been disturbing, again
given now just 2 trials worth of data points?
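All of the per-period probabilities above can be verified in a few
lines, again treating royal counts as Poisson (my assumption; it is a
close approximation to the exact binomial at these cycle counts):

```python
from math import exp, factorial, sqrt

def poisson(k: int, mean: float) -> float:
    """P(exactly k events) under a Poisson model with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

print(f"SD over 20.7 cycles: {sqrt(20.7):.2f} royals")  # ~4.55
for k, cycles in [(11, 20.7), (19, 10.8), (22, 15.75), (8, 15.75)]:
    print(f"{k:2d} RF in {cycles} cycles: {poisson(k, cycles):.2%}")
# Close to the 0.77%, 0.72%, 2.81%, and 1.35% figures quoted above.
```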

The point is there just isn't enough data to come to any conclusions
about the fairness of the machines.

This is why the question was asked about given just a single data
point, how many standard deviations away from the mean it would need
to be before one would wonder about a machine being "gaffed". My
original response was 6 SD, but now, after having actually done the
math and graphed the curve, I wouldn't argue with a case being made
for 4 SD (only 1 or 2 RFs in 20.7 cycles) to make one wonder about
the fairness.
