kyrie1618 @ 2008-08-15 16:52:00
Proof by contradiction -or- why small samples demand robust nonparametric statistics.
I roll a die and it comes up 2. Then I roll it again and it comes up 1. Correcting for symmetries (either order, plus the mirror-image pair 5 and 6), we should see an event like this about (exactly, in the long run) 1 time out of every 9. So this is a perfectly reasonable example.
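The 1-in-9 figure can be checked by brute force. This is a minimal sketch that assumes "correcting for symmetries" means counting the pair in either order together with its mirror image at the top of the die (5 and 6); that reading is an assumption, and the function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def prob_extreme_adjacent_pair():
    """Chance that two fair rolls show {1, 2} or {5, 6}, in either order."""
    outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs
    targets = ({1, 2}, {5, 6})                       # assumed symmetry class
    hits = [pair for pair in outcomes if set(pair) in targets]
    return Fraction(len(hits), len(outcomes))


print(prob_extreme_adjacent_pair())  # 1/9
```

Four of the 36 ordered outcomes qualify: (1,2), (2,1), (5,6) and (6,5).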
Now suppose I need to know the population mean. The sample has a mean of 1.5 and a sample standard deviation of 0.71. The Standard Error Theorem ( http://classweb.gmu.edu/tkeller/HANDOUTS/Handout6.pdf ) divides that standard deviation by the square root of the sample size, so it says that the sample suggests the population mean has a mean of 1.5 and a standard deviation of 0.71/sqrt(2) = 0.5 .
You may have heard of Motorola's "six-sigma". This result is not quite there. We're only (3.5-1.5)/.5 = 4 sigmas out. Were we to assume Gaussian/normal error, the actual population mean being at least that far off should only happen ( http://www.math.unb.ca/~knight/utility/NormTble.htm ) about 0.006% of the time. Be careful! ( http://meganmcardle.theatlantic.com/archives/2008/06/pabpba.php ) "the probability that you are Jewish, given that you are a rabbi, is 100%. The probability that you are a rabbi, given that you are Jewish, is a small fraction of 1%." But point taken, I hope?
Nobody sane would ever have tried to apply six-sigma here. The sample is way too small, and we know a priori that the population is not Gaussian. But what should we apply in its place? The Standard Error Theorem is solid. The population mean, given that fairly likely sample, really does have a mean of 1.5 and a standard deviation of 0.5 .
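As a sanity check on the arithmetic, here is a sketch using only the Python standard library. It works out the sigma count and the two-sided normal tail probability twice: once dividing by the raw sample standard deviation (about 0.71), and once dividing by the standard error of the mean, s/sqrt(n) (0.5), so the two can be compared side by side. The variable and function names are illustrative.

```python
import math
import statistics

sample = [2, 1]
true_mean = 3.5  # mean of a fair six-sided die


def two_sided_normal_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))


mean = statistics.mean(sample)       # 1.5
s = statistics.stdev(sample)         # sample standard deviation (n - 1 divisor), about 0.71
se = s / math.sqrt(len(sample))      # standard error of the mean, about 0.5

for label, sigma in (("sample stdev", s), ("standard error", se)):
    z = (true_mean - mean) / sigma
    tail = two_sided_normal_tail(z)
    print(f"{label}: sigma={sigma:.2f}  z={z:.2f}  normal tail={tail:.4%}")
```

Either way, the normal assumption calls the true mean of 3.5 a rare event, even though a sample like this one turns up 1 time in 9.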
So what went wrong?
The population isn't skewed at all, but the tails are way too light. The normal bell curve has a specific shape where 32% of the time you fall at least 1 sigma away, 4.6% of the time you fall at least 2 sigmas away, 0.27% of the time you fall at least 3 sigmas away, and 0.0063% of the time you fall at least 4 sigmas away.
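Those bell-curve percentages can be reproduced without a lookup table. A minimal sketch, using the identity P(|Z| >= k) = erfc(k / sqrt(2)) for a standard normal Z; the helper name is illustrative.

```python
import math


def two_sided_normal_tail(k):
    """Probability that a standard normal lands at least k sigmas from its mean."""
    return math.erfc(k / math.sqrt(2))


for k in (1, 2, 3, 4):
    print(f"at least {k} sigma away: {two_sided_normal_tail(k):.4%}")
```

The results come out to roughly 31.7%, 4.55%, 0.27% and 0.0063%.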
Kurtosis is the problem. The die's kurtosis is nowhere near the normal distribution's (its tails are non-normal). What's the population here? A mean of 3.5 and a stdev of 1.71, and since no face is more than 2.5 away from the mean, you fall at least 1.5 sigmas away 0% of the time. This is not a bell curve. It's /nothing/ like a bell curve. And the sample we got in the beginning did not warn us about the kurtosis.
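The die's population moments are easy to compute exactly. A sketch, treating the die as a uniform distribution over the faces 1 to 6 and using the convention that a normal distribution has kurtosis 3; the variable names are illustrative.

```python
import math
from fractions import Fraction

faces = range(1, 7)
n = len(faces)

mean = Fraction(sum(faces), n)                    # 7/2
var = sum((f - mean) ** 2 for f in faces) / n     # 35/12
m4 = sum((f - mean) ** 4 for f in faces) / n      # fourth central moment

stdev = math.sqrt(var)                            # about 1.71
kurtosis = m4 / var ** 2                          # 303/175, about 1.73 (normal: 3)
max_sigmas = float(max(abs(f - mean) for f in faces)) / stdev  # about 1.46

print(f"mean={float(mean)}  stdev={stdev:.2f}")
print(f"kurtosis={float(kurtosis):.2f}  excess={float(kurtosis - 3):.2f}")
print(f"farthest face is {max_sigmas:.2f} sigmas from the mean")
```

The farthest faces (1 and 6) sit under 1.5 sigmas from the mean, which is why nothing ever lands past that point.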
In order for a sample to estimate the mean you need at least one data point. To estimate the variance or stdev you need at least two data points. Estimating the skew requires at least three, and the kurtosis needs at least four. Sure, in estimating the population mean, we knew its mean and stdev. But we didn't know anything else, and because of that we were unable to guess how often really unlikely-seeming events would happen.
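One way to see where those minimum counts come from is a sketch that assumes the common bias-adjusted sample formulas for skewness and excess kurtosis. Their denominators contain (n-2) and (n-3), so they are undefined below three and four points respectively. The function name and the dictionary layout are made up for illustration.

```python
import statistics


def sample_moments(xs):
    """Return whichever of mean, stdev, skewness, excess kurtosis the sample size allows."""
    n = len(xs)
    out = {}
    if n >= 1:
        out["mean"] = statistics.mean(xs)
    if n >= 2:
        out["stdev"] = statistics.stdev(xs)  # n - 1 divisor, needs two points
    s = out.get("stdev")
    if n >= 3 and s:  # also skip when every point is identical (s == 0)
        z = [(x - out["mean"]) / s for x in xs]
        out["skewness"] = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
        if n >= 4:
            out["excess_kurtosis"] = (
                n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
            )
    return out


print(sample_moments([2, 1]))        # only mean and stdev are available
print(sample_moments([1, 2, 3, 4]))  # all four estimates are available
```

With the two-roll sample, the function has nothing to say about skew or kurtosis.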
This is perhaps why a 500-year storm seems to hit us every 15 years. Or perhaps Al Gore is right and anthropogenic global warming is changing the earth's weather. I'm not sure. Anyone want to pay me to go find out?