vpFREE2 Forums

how to tell if your machine is fair?

What has it come to, this sensible life?

···

On Apr 16, 2012, at 3:58 PM, Mitchell Tsai <tsai@cs.ucla.edu> wrote:

Frank,

This is where Bayesian theory (and more accurate "a priori" beliefs)
allows more accurate probability calculations.

If you use past data, and assume P(all events) = equal, then you often
run into pre-selection bias; e.g. I picked a weird set of data.
So Bayesian analysis will use P(my data set is unusual) = whatever you
set.

Another example: say I'm considering video poker games at
1) a major casino in Las Vegas - P(prior belief in gaffed machine) < 0.01
2) a no-name casino on an Indian reservation where other people are
reporting suspicious results - P(prior belief in gaffed machines) = 0.25

Then P(belief machine is gaffed after test | prior belief) = function
of test result and P(prior belief in gaffed machines).

If you use a non-random set of data (e.g. data you have gathered
before), then
P(belief machine is gaffed) = function of (test result, P(prior
belief in gaffed machines), and the selection bias in the original data)
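A minimal sketch of that update in Python, assuming a hypothetical test that flags a gaffed machine 90% of the time and falsely flags a fair machine 5% of the time (both error rates are invented for illustration, not properties of any real test):

# Bayes update for "machine is gaffed" after the test flags it.
# The test error rates below are made-up illustration numbers.
def posterior_gaffed(prior, p_flag_if_gaffed=0.90, p_flag_if_fair=0.05):
    flagged_and_gaffed = p_flag_if_gaffed * prior
    flagged_overall = flagged_and_gaffed + p_flag_if_fair * (1.0 - prior)
    return flagged_and_gaffed / flagged_overall

print(posterior_gaffed(prior=0.01))   # major Vegas casino   -> ~0.15
print(posterior_gaffed(prior=0.25))   # rumored bad machines -> ~0.86

The same flagged result moves a 1% prior only to about 15%, but moves a 25% prior to about 86%.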

Mitchell

A similar example of selection bias is one about weather.
My friend tells me that last week it rained 6 out of 7 days, and asks
how unusual that is...

Most people will just calculate how unlikely it is to have rain 6 of 7
days.
A better calculation will take into account that my friend is only
telling me this because it is "somehow weird" (e.g. no royals in
120,000 hands) and factor in the "selection bias".

I have no idea what you just said.

TC___


I have no idea what you just said.

Sorry. Lots of mathematics.

Say you run a test and it says there is a 25% chance of a gaffed machine.
This is usually not quite accurate, because it assumes a "random dataset" and "everything else being equal".

To paraphrase Frank's concern, when you run a test on "old data", often you have selective memory, and are remembering a particularly "bad" or "good" result.
...so the "25% chance of a gaffed machine" may actually be "5%".

If I'm in Las Vegas where I strongly believe the machines are fair, I may chalk up the "25%" result to bad luck, and still believe in a "<1%" chance of a bad machine.

If I'm in an Indian casino where there have been rumors/stories of bad machines, I may believe there is a "90%" chance of a bad machine.

Hope this helps.
Mitchell

P.S. Bayesian inference is one of the math techniques to combine knowledge from multiple sources.
  1) old data, new data, data not randomly created
  2) reliability of Las Vegas machines (which are regulated)
  3) rumors/stories from other people

If John believes 50% in bad machines at Casino A and Mark believes 25%, we have multiple data sets, and we run some tests.
Bayesian analysis creates a network of nodes, with arrows connecting the nodes, and propagation rules to send the information/calculations back and forth.

What Bayesian analysis did is show mathematicians that we usually double-counted, overcompensated, or undercompensated for multiple information sources when they interconnect with each other and we try to calculate the overall probability by hand.

The basic P(A or B) = P(A) + P(B) - P(A and B) is unchanged in Bayesian analysis.
It's the messy business of combining everything together that changed.
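As a toy illustration of combining multiple sources, here is a sketch that chains Bayes updates, assuming the sources are conditionally independent given the machine's true state (a strong assumption -- Bayesian networks exist precisely because real sources interconnect; every number below is invented):

# Chain two information sources into one belief via repeated Bayes updates.
def update(prior, likelihood_if_bad, likelihood_if_fair):
    joint_bad = likelihood_if_bad * prior
    return joint_bad / (joint_bad + likelihood_if_fair * (1 - prior))

belief = 0.05                        # starting belief the machine is bad
belief = update(belief, 0.7, 0.3)    # rumor: more likely to surface if bad -> ~0.11
belief = update(belief, 0.9, 0.05)   # a test flags the machine             -> ~0.69
print(f"belief machine is bad: {belief:.2f}")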

I'm aware of this, but I doubt most people are. That's why I'm suggesting using the test on only new data. Yes, there are ways to account for selection bias, but they are complicated and require understanding on the part of the user.

This utility is being designed for people who don't completely understand how it works. To target this demographic we have to keep it simple... but thanks for the suggestions.

~FK

···

--- In vpFREE@yahoogroups.com, Mitchell Tsai <tsai@...> wrote:

This is where Bayesian theory (and more accurate "a priori" beliefs) allows more accurate probability calculations.

That's likely because you have not studied Bayesian statistics and Bayesian inference. What he said made complete sense to me, but one would have to know the mathematical concepts behind it for it to make sense.

Don't worry I'm keeping such stuff out of the utility and keeping it simple.

~FK

···

--- In vpFREE@yahoogroups.com, Tabbycat <tabbycat@...> wrote:

I have no idea what you just said.

TC___

I like the analogy; this immediately brought another question to mind - can people (not the machine) be gaffed? I don't mean this in a bad way, just that in the big picture, in almost every study I have seen, people are distributed along a probability curve. I didn't look at the data, but I even remember a study about lightning showing that certain people were more or less predisposed to being struck, and that those struck once were more predisposed to a second strike.

My point is, what if everyone's baseline varied from the expected for every VP hand when examined on many different machines for a large N value of hands? For example, I thought I was not converting 4 to a flush near the expected frequency regardless of what machine I played. I then kept count over several sessions on 9/6 JOB and several different machines. I only kept count for 200 consecutive events; 23 converted out of this lot. Perhaps I am gaffed in a negative manner for this hand?

···

--- In vpFREE@yahoogroups.com, "Frank" <frank@...> wrote:

OK. You completely misunderstood what I was saying. It will completely invalidate the testing utility I'm making if people use their currently existing data. Why? Imagine this.

You post in the newspaper that you'd like to do a study into how likely it is to be hit by lightning. Not surprisingly, the people that answer your ad are those most concerned about this issue (AKA people that have been hit). After looking at all your volunteer test subjects you conclude that the chances of being hit by lightning are 1 in 1.

Problem: All the people who weren't hit by lightning didn't volunteer.

Solution: Take the volunteers, but toss out all that has happened to them in their lives before they signed up for your study. Dismiss their preexisting data, and collect new data from this point on.

The rule of thumb with statistical tests is never to use the data that made you want to do the test. Test forward from the point in time you decide to do the test and dismiss what's gone before.

All data by definition is past data. The past I'm talking about here, that should be ignored, is what's happened before you decided to do the test.

~FK

--- In vpFREE@yahoogroups.com, "cdfsrule" <cdfsrule@> wrote:
>
> I know I am taking this quote out of context (sorry FK), but your statement:
>
> --- In vpFREE@yahoogroups.com, "Frank" <frank@> wrote:
> >
> >Statistical tests cannot be used on anything that's already happened, or else one opens the door for selective recruitment and confirmation bias.
> >
> > ~FK
> >
>
> is absolutely not true. In fact, statistical tests can only be used on "data" -- that is, on stuff that has already been observed, computed, recorded, etc. In fact, statistical tests are used in determining (in the sense of ascribing a probability to) whether there is or was bias, selective recruitment, etc. in events (and associated data) that have already occurred.
>
> Take a look at: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
>

My point is, what if everyone's baseline varied from the expected for every VP hand when examined on many different machines for a large N value of hands? For example, I thought I was not converting 4 to a flush near the expected frequency regardless of what machine I played. I then kept count over several sessions on 9/6 JOB and several different machines. I only kept count for 200 consecutive events; 23 converted out of this lot. Perhaps I am gaffed in a negative manner for this hand?

When you look at LOTS of items, some are usually far from the norm.
Say you have a 1% chance of having very few 4As.

If you look at 4As, 4-2s, ..., 4-Ks, 4 to a spade flush, 4 to a diamond flush, etc...
Overall you might have a 25% or 50% chance of having at least 1 fluke result.

Sometimes researchers who use statistics don't understand this "meta-result".
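A quick back-of-envelope for this, treating each hand category as an independent 1% shot at looking freakish (independence is a simplification; real hand categories overlap):

# P(at least one category looks freakish) = 1 - P(none do).
p_fluke = 0.01
for k in (13, 26, 52, 69):
    p_at_least_one = 1 - (1 - p_fluke) ** k
    print(f"{k:3d} categories: P(at least one fluke) = {p_at_least_one:.2f}")

Around 30 categories you already pass 25%, and by about 69 you reach 50%.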

Mitchell

OK, you converted 23 of 200 flushes. That's 11.5%. Whenever you have 4 cards to a flush, there are 9 cards left in the deck that can make the hand and 48 cards total. So, you should expect to make 18.75% of them, or 37.5 hands out of 200. Was what you experienced bad luck? Consult the binomial distribution. http://en.wikipedia.org/wiki/Binomial_distribution

Your n=200 and p=.1875, np=37.5 and the variance is np(1-p)=30.5. The square root of that is the standard deviation = 5.5. So, what happened to you was (37.5-23)/5.5 = 2.6 standard deviations below the mean.

In general, you had a bad day. What happened to you (the down swing) happens less than 0.5% of the time (just under 0.4%), about 4 out of every 1,000 times. You should be pissed, but not ready to take legal action. Try it again randomly (without the "selection bias" the other poster refers to -- you may have only remembered this bad stretch) and if the same thing happens, then something might be wrong. Since you had a large expected number of made hands (37.5) you can approximate all this stuff with the normal distribution too via some online calculator.
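For the exact tail probability rather than the normal approximation, a quick check (a sketch using the post's n = 200, p = .1875):

# Exact left-tail binomial probability P(X <= 23) for n=200, p=0.1875.
from math import comb

n, p, k_obs = 200, 0.1875, 23
p_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_obs + 1))
print(f"P(X <= {k_obs}) = {p_tail:.4f}")   # ~0.004, about 4 in 1,000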

···

--- In vpFREE@yahoogroups.com, "armchairpresident" <smellypuppy@...> wrote:

I only kept count for 200 consecutive events; 23 converted out of this lot. Perhaps I am gaffed in a negative manner for this hand?

Thanks for the refresher on binomial distributions. I haven't seen
the np(1-p) variance thing in so many years.
P.S. 47 cards left for the draw (not 48), so p is really 9/47 ≈ 0.1915.

What would be interesting: for x hands, what is the expected best/worst streak of certain types of hands (and what's the std dev on those streaks)? My mind is just too lazy to sit down & figure out the equations, but the result would be something like...

If you have 1,000 flush draws, your worst & best 50-hand flush stretches will probably be 3/50 (+/- 2) and 20/50 (+/- 5).
Any mathematicians out there handy with this stuff?

Streaks often have some really non-intuitive behaviors.
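Lacking the closed-form equations, a Monte Carlo sketch gives rough numbers (p = 9/47 per the correction above; the other constants just mirror the question):

# Best and worst 50-draw windows among 1,000 simulated flush draws.
import random

def best_worst_windows(n_draws=1000, window=50, p=9/47):
    hits = [random.random() < p for _ in range(n_draws)]
    count = sum(hits[:window])          # running count over a sliding window
    lo = hi = count
    for i in range(window, n_draws):
        count += hits[i] - hits[i - window]
        lo, hi = min(lo, count), max(hi, count)
    return lo, hi

trials = [best_worst_windows() for _ in range(2000)]
print("average worst window: %.1f/50" % (sum(w for w, _ in trials) / len(trials)))
print("average best  window: %.1f/50" % (sum(b for _, b in trials) / len(trials)))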

···

On Apr 18, 2012, at 8:59 AM, matt20482002 wrote:

OK, you converted 23 of 200 flushes.

-----
Here's another quirky streak thing.

If we play a 50-50 game, in the *long run* we will be highly likely to be close to even.

Guess how often we will flip from "losing" to "winning" (or vice-versa)?
I think in 100,000 trials (or 1,000,000 trials), the expected number of crossings is 5-6. (I haven't seen the result in decades, so I could be a little off here.)

This matches our real-world experience, that we tend to either "get
lucky" and have a long winning streak, or have a bad losing streak.

------
To translate to the flush draw issue: once you are on a bad streak of flush draws, the streak will probably go on for quite a while before rebalancing.

e.g. If you converted 23 of 200 flushes, what is the expected number of flush draws you will need before you are statistically even? I don't know, but my raw guess would be that you might need 1,000-2,000 flushes before you draw even.

Mitchell

P.S. I have seen some analysis along these lines for "Risk of Ruin"
and "How much do you expect to lose before going permanently positive
when playing blackjack?"
e.g. When playing $5 blackjack (with $5-25 spread), you might expect
to lose $600 before going permanently positive ($600 is totally made
up).


Shorter version of my previous message.

It would be cool to know, if I played 5,000 or 10,000 flush draws, how likely I am to have a bad streak where only 23/200 flushes connect.
Or... what is the number of flush draws I need to play (e.g. 1,000, 10,000) before I have a 50% chance of seeing a bad streak of 23/200 or worse?

Mitchell

Here's a link I found:

Recently I’ve come across a task to calculate the probability that a run of at least K successes occurs in a series of N (K≤N) Bernoulli trials (weighted coin flips), i.e. “what’s the probability that in 50 coin tosses one has a streak of 20 heads?”
  http://www.askamathematician.com/2010/07/q-whats-the-chance-of-getting-a-run-of-k-successes-in-n-bernoulli-trials-why-use-approximations-when-the-exact-answer-is-known/

It's messy.
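Messy in closed form, but the exact number also falls out of a small dynamic program over the current run length -- a sketch:

# Exact P(a run of at least k successes occurs in n Bernoulli(p) trials).
def prob_run(n, k, p=0.5):
    states = [1.0] + [0.0] * (k - 1)   # states[r] = P(alive, trailing run = r)
    absorbed = 0.0                     # probability a run of k has occurred
    for _ in range(n):
        new = [0.0] * k
        for run, prob in enumerate(states):
            new[0] += prob * (1 - p)           # a failure resets the run
            if run + 1 == k:
                absorbed += prob * p           # run of k completed
            else:
                new[run + 1] += prob * p       # run grows by one
        states = new
    return absorbed

print(prob_run(50, 20))   # a 20-head streak in 50 tosses: very rare
print(prob_run(50, 5))    # a 5-head streak in 50 tosses: roughly even money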

Does anyone know of any links to "bad streaks of 23/200" as opposed to "runs of 23"?

···

On Apr 18, 2012, at 8:59 AM, matt20482002 wrote:

OK, you converted 23 of 200 flushes.

Hi.

Sorry for the triple post.

I found an interesting 2010 paper about "mean time" and "waiting time" and using it to detect changes in underlying behavior (e.g. a gaffed game).
  http://ff.im/USw3M

So what are the "mean time" and "waiting time" to see a bad streak of 23/200 flushes?
How many hands?

If we have seen 23/200 flushes way too soon, maybe the underlying machine is "gaffed".
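A rough Monte Carlo for that waiting time (p = 9/47 per the earlier correction; the cap just keeps runs bounded):

# Draws until the first 200-draw window converts 23 or fewer flushes.
import random
from collections import deque

def draws_until_bad_window(p=9/47, window=200, threshold=23, cap=2_000_000):
    recent = deque(maxlen=window)
    hits = 0
    for n in range(1, cap + 1):
        made = random.random() < p
        if len(recent) == window:
            hits -= recent[0]           # drop the draw leaving the window
        recent.append(made)
        hits += made
        if len(recent) == window and hits <= threshold:
            return n
    return cap                          # no bad window within the cap

samples = sorted(draws_until_bad_window() for _ in range(20))
print(f"median waiting time: {samples[len(samples) // 2]:,} draws")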

Mitchell

The authors of the paper say: "The statistics of waiting time may not justify the prediction by the gambler’s fallacy, but paying attention to streaks in the hot hand belief appears to be meaningful in detecting the changes in the underlying process."

The debate over the statistical validity of the hot hand belief has lasted more than twenty years (e.g., Bar-Eli et al., 2006), and it is not likely to be ended by simply introducing a new set of statistics. However, pattern time statistics do seem to support some of the existing theories.

In particular, it has been suggested that the hot hand belief arises when people are evaluating human performance, and people pay particular attention to streak patterns in order to detect a change in the performance, for example, fluctuations in the shooting accuracy of basketball players (e.g., Ayton & Fischer, 2004; Burns, 2004; Burns & Corpus, 2004; Sun, 2004).

By such account, the prediction to continue a streak is actually valid on the basis of a higher probability of a single outcome (e.g., a higher shooting accuracy, a higher probability of heads in case of a biased coin).

It can be shown that by the measure of either mean time or waiting time, streak patterns are indeed a good indicator for detecting the changes in the probability of single outcomes.

Yeah, but to play a 50-50 game out to the long run requires an infinite bankroll. For real-world bankrolls you have equal chances of busting out or doubling up, and zero chance of breaking even. Or, if you play until you bust out, your chance of busting out is 100%.
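That's the classic gambler's-ruin result: in a fair game with bankroll b and a quit-when-doubled target of 2b, P(doubling) = b/2b = 1/2. A quick sketch (the 20-unit bankroll is an arbitrary illustration value):

# Fair-game gambler's ruin: play until busted or doubled.
import random

def doubles_before_busting(bankroll=20):
    target, b = 2 * bankroll, bankroll
    while 0 < b < target:
        b += 1 if random.random() < 0.5 else -1
    return b == target

trials = 10_000
wins = sum(doubles_before_busting() for _ in range(trials))
print(f"doubled up in {wins}/{trials} trials (expect about half)")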

···

--- In vpFREE@yahoogroups.com, Mitchell Tsai <tsai@...> wrote:

If we play a 50-50 game, in the *long run* we will be highly likely
close to even.

The Wizard of Odds did one for me, to figure out how likely it was that I had had a streak of royals in a short period during my entire career, and he solved it using matrix algebra. Apparently, that works quite well. Since I did not do the math I can't tell you the exact equations he used. You could email him and ask.

An additional point for anyone listening in: The chance that something will happen to you in a fixed number of future trials and the chance of something having happened to you at any time in the past are very different and require very different math.

The latter requires looking at your entire life and including anything that happened which was unlikely enough to flag your notice... NOT merely the thing that happened.

Example: You hit 4 eights eight times in a row. Wow, that's unlikely. (Saw it happen, BTW.)

If one wished to look at how likely this was to have happened in the past you'd have to include the fact that you would have been just as surprised having hit any single 4K eight times in a row. So it isn't the chance to hit 4 eights, it's really the chance to have hit any 4K that many times sequentially.
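Back-of-envelope for that adjustment -- with 13 ranks that could each have produced an equally surprising streak, the "any 4K" probability is roughly 13 times the single-rank one (p_one below is a placeholder, not a computed video poker figure):

# "Any of 13 ranks" vs. one specific rank, for a rare streak event.
p_one = 1e-9                          # hypothetical P(a specific quad streaks 8x)
p_any = 1 - (1 - p_one) ** 13         # ~13 * p_one when p_one is tiny
print(f"{p_any / p_one:.2f}x more likely")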

~FK

···

--- In vpFREE@yahoogroups.com, Mitchell Tsai <tsai@...> wrote:

> On Apr 18, 2012, at 8:59 AM, matt20482002 wrote:
>> OK, you converted 23 of 200 flushes.
>>

Shorter version of my previous message.