Kobi Abayomi
Contents
1. Topics Covered
2. The Big Picture
2.1. The Model, The Parameters
2.2. The Random Variable
2.3. The Probability Distribution
2.4. Functions on the Probability Distribution
2.5. The Sample
2.6. The Central Limit Theorem
2.7. The Normal Distribution is Closed under Linear Transforms
2.8. Inference: Hypothesis Testing
2.9. Inference: Confidence Intervals
2.10. Inference: Bayesian
2.11. Bayesian Computation
3. Bayesian Posterior Intervals
1. Topics Covered
This class is different every time I teach it: below is what we usually cover in the first semester of an introductory sequence.
This Semester we Covered:
- Illustrating Data: Histograms, Contingency Tables.
- Statistics: Sample Average, Sample Variance, Order Statistics.
- Probability and Experiments: Set Operations, The Sample Space, Independence, Conditional Distributions, Expectation, Variance, Covariance, Correlation.
- Counting Methods: The Counting Principle, Permutations and Combinations.
- Discrete Distributions: Bernoulli, Binomial, Negative Binomial.
- Continuous Distributions: Uniform, Normal.
- Statistics and Sampling: The Central Limit Theorem, The Distribution of the Sample Mean, The Likelihood.
- Inference: The Test Statistic, Hypothesis Testing, Confidence Intervals, Bayesian Inference.
2. The Big Picture
Statistics is about saying something about how you believe the world works from what you have observed of it. Statistics is the science of experimentation: what we can observe, and all that we can say about it, is the result of our assumptions about the Experiment or Model that generated it.
We always start off with a model: we’ve used the Bernoulli experiment to
illustrate.
2.1. The Model, The Parameters
The simplest possible experiment has only two outcomes, Success or Failure. We can enumerate the sample space for this experiment.
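For the two-outcome Bernoulli experiment the enumeration is short:

\[
\Omega = \{\text{Success}, \text{Failure}\}.
\]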
2.2. The Random Variable
We then assign a Random Variable, a mapping from the Sample Space (Ω) to the real numbers.
For example, one particular outcome on the sample space is ω = Success. We typically assign X(ω) = 1 when ω = Success, and X(ω) = 0 otherwise.
2.3. The Probability Distribution
Now we can immediately write a probability distribution for X: its probability mass function (the density function, for continuous random variables).
Then we can write its distribution function immediately
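For the Bernoulli variable above, writing p for the probability of Success, the standard forms are:

\[
f_X(x) = P(X = x) = p^x (1-p)^{1-x}, \qquad x \in \{0, 1\},
\]
\[
F_X(x) = P(X \le x) =
\begin{cases}
0 & x < 0 \\
1-p & 0 \le x < 1 \\
1 & x \ge 1.
\end{cases}
\]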
And so on with Expectation, Variance, Covariance (if we have more than one)...all
functions on the probability mass function.
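For the Bernoulli variable, for instance, these work out to:

\[
E[X] = 1 \cdot p + 0 \cdot (1-p) = p, \qquad
\mathrm{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p).
\]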
2.4. Functions on the Probability Distribution
We know that these functions have nice properties, and that Variance and Covariance are basically just Expectations...discrete or continuous averages of functions on the probability mass function.
We know how to transform the random variable and generate a new distribution for the new random variable after transformation. We know that the Expectation is a linear operator, so linear transformations pass through it.
From this we get results on summation and integration 'passing' through expectations, and on variances adding under independence.
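For two random variables X and Y and constants a and b, for example:

\[
E[aX + b] = aE[X] + b, \qquad E[X + Y] = E[X] + E[Y],
\]
and, if X and Y are independent,
\[
E[XY] = E[X]\,E[Y], \qquad \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).
\]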
We use these results to investigate the properties of multiple, repeated outcomes from the experiment, typically assumed identical and independent: the Sample.
2.5. The Sample
Suppose we take n observations from our Bernoulli Random Variable model. We devise an estimator for the probability of success that is just the average of the observed successes.
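That is, writing the observations as X_1, . . . , X_n:

\[
\hat{p} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.
\]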
We can see right away, from the properties of the Expectation, that E[p̂] = p and Var(p̂) = p(1 − p)/n.
2.6. The Central Limit Theorem
The CLT tells us that when we average things, the averages tend to a Normal distribution, no matter the initial distribution we draw from. It is enough to assume that the initial distribution - or model - is stable, with a finite mean and variance, and that we take enough samples and then average them.
The parameters - i.e. the particular values which give us a particular distribution - of this Normal distribution for the average are governed by the initial mean and variance of the underlying Random Variable (Experiment/Process).
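In the Bernoulli setting, for example, the CLT says that for large n the sampling distribution of the average is, approximately,

\[
\hat{p} \sim \mathrm{Normal}\!\left(p, \; \frac{p(1-p)}{n}\right).
\]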
2.7. The Normal Distribution is Closed under Linear Transforms
One of the nice properties of the Normal Distribution is that its quantiles are easy to remember and that it is closed under linear transforms.
Set Z = (T − E[T]) / √Var(T); then by the properties of the Expectation, E[Z] = 0 and Var(Z) = 1, and if T ∼ Normal then Z ∼ Normal. Let’s write it again, in one line.
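For T with mean μ and standard deviation σ:

\[
Z = \frac{T - \mu}{\sigma}, \qquad E[Z] = 0, \quad \mathrm{Var}(Z) = 1, \qquad
T \sim \mathrm{Normal}(\mu, \sigma^2) \;\Rightarrow\; Z \sim \mathrm{Normal}(0, 1).
\]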
2.8. Inference: Hypothesis Testing
So, we have an easy mapping from the sampling distribution of our estimator p̂ to the Z-score, which is scaled in terms of standard deviation units.
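Concretely, under a hypothesized null value p = p_0 (the subscript 0 marking the null value is our notation):

\[
Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}.
\]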
But remember: What we really want is to say something about the true state of the Experiment/Model/Nature/The World.
One way is to set up a choice between beliefs about the parameters of our model - a null hypothesis and an alternative hypothesis - and a rule for choosing one set of plausible values over the other, given our tolerance α for choosing the alternative hypothesis when it isn't true.
It is reasonable that when our Z statistic is large in magnitude we should take that as evidence that we are not at our null hypothesis, and reject it in favor of our alternative hypothesis: we set the Z statistic up that way, as just the difference between what we observe and what we expect.
We call our ability to choose the alternative when it is in fact true the power of the test.
The Rejection Region is, of course, set by our choice of α.
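For example, for a two-sided test at α = 0.05 we reject the null hypothesis when

\[
|Z| > z_{\alpha/2} = z_{0.025} \approx 1.96.
\]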
2.9. Inference: Confidence Intervals
We can frame the hypothesis testing paradigm in terms of the values that we think are plausible for the parameter of interest, at a particular tolerance α. Starting from P(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α, using algebra to solve for p (and plugging in p̂ for p in the standard error) yields the interval

p̂ ± z_{α/2} · √( p̂(1 − p̂)/n ),

which is a 100(1 − α)% Confidence Interval for our true parameter of interest p.
This is to say that out of M total experiments, where each time we generate an estimator p̂ and its interval, (1 − α) · M of the intervals should cover p.
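A minimal simulation sketch of this coverage claim in Python (the particular values of p, n, and M below are illustrative choices, not values from the notes):

import numpy as np

rng = np.random.default_rng(0)
p, n, M, alpha = 0.3, 100, 10_000, 0.05
z = 1.96  # z_{alpha/2} for alpha = 0.05

covered = 0
for _ in range(M):
    x = rng.binomial(1, p, size=n)                  # n Bernoulli(p) observations
    p_hat = x.mean()                                # the estimator p-hat
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)     # half-width of the interval
    covered += (p_hat - half <= p <= p_hat + half)  # does this interval cover p?

print(covered / M)  # should land near 1 - alpha = 0.95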
2.10. Inference: Bayesian
Lastly, we can make statements about the estimators using Bayesian inference, if we are willing to explicitly state that the parameter of interest is random, i.e. we are not certain in our belief about it.
This allows us to quantify any prior beliefs about the parameter, look at data, and construct a set of posterior beliefs about the parameter.
The Bayesian approach augments ‘frequentist’ procedures by including ‘prior’ information about the parameter of interest. In the ‘frequentist approach’ p is a constant, which we estimate, say via the likelihood, Lik(x|p) = ∏_{i=1}^{n} f_p(x_i), for example. We derive estimates using the likelihood of the data, or the sampling distribution. A common estimate is p̂ = x̄.
In the ‘Bayesian approach’ p is an instance of a random variable with a PDF, say π(p), and now we derive estimates using the additional randomness of π(p) via Bayes’ Equation, shown below; in the denominator we get the marginal distribution for the data.
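With prior π(p) and likelihood Lik(x|p) as above (writing m(x) for the marginal is our notation):

\[
\pi(p \mid x) = \frac{\mathrm{Lik}(x \mid p)\, \pi(p)}{m(x)}, \qquad
m(x) = \int \mathrm{Lik}(x \mid p)\, \pi(p)\, dp.
\]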
2.11. Bayesian Computation
In the Bayesian approach, the posterior for θ, π(θ|x) is a full PDF,
or distribution. This distribution is the tool or method by which we conduct inference.
Example: We write down a prior for p and the Bernoulli likelihood, and form the posterior via Bayes’ Equation. Suppose we observe the single data point x_obs = 0; then the posterior π(p | x_obs = 0) follows directly.
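As a sketch of such a computation, take a Uniform(0, 1) prior, π(p) = 1 on [0, 1] (an illustrative assumption; any Beta prior works similarly), together with the single observation x_obs = 0. Then:

\[
\mathrm{Lik}(x_{\mathrm{obs}} \mid p) = p^{x_{\mathrm{obs}}}(1-p)^{1-x_{\mathrm{obs}}} = 1 - p,
\]
\[
m(x_{\mathrm{obs}}) = \int_0^1 (1-p)\, dp = \tfrac{1}{2}, \qquad
\pi(p \mid x_{\mathrm{obs}} = 0) = \frac{1-p}{1/2} = 2(1-p), \quad 0 \le p \le 1.
\]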
Now, with a full PDF for p, we can find ‘posterior’ estimates
of parameters. Contrast this with the ‘frequentist’ approach where we found point estimates and
used the sampling distribution; in the Bayesian approach the sampling distribution's role in inference on the parameter is replaced by the Bayesian posterior distribution.
Example: The posterior mean, E[p | x_obs], is one possible estimate for the parameter p. Another possible estimate is the posterior mode, the value of p at which the posterior π(p | x_obs) is largest.
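Continuing the Uniform-prior sketch above (again, that prior is an illustrative assumption), where π(p | x_obs = 0) = 2(1 − p):

\[
E[p \mid x_{\mathrm{obs}} = 0] = \int_0^1 p \cdot 2(1-p)\, dp = 2\left(\tfrac{1}{2} - \tfrac{1}{3}\right) = \tfrac{1}{3},
\]
and the posterior mode is p = 0, where 2(1 − p) is largest.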
3. Bayesian Posterior Intervals
In the frequentist approach the Confidence Interval is an interval, say I, constructed from the data so that P(p ∈ I) = 1 − α over repeated samples. In the Bayesian approach the Confidence Interval is the interval I such that ∫_I π(p | x) dp = 1 − α.
Example: For a given prior on p and observed data x_obs, the 95% Bayes Interval is the interval (a, b) such that ∫_a^b π(p | x_obs) dp = 0.95.
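Under the Uniform-prior sketch from above (an illustrative assumption), where π(p | x_obs = 0) = 2(1 − p), an equal-tailed 95% interval solves

\[
\int_0^a 2(1-p)\, dp = 0.025, \qquad \int_b^1 2(1-p)\, dp = 0.025,
\]
which gives

\[
a = 1 - \sqrt{0.975} \approx 0.013, \qquad b = 1 - \sqrt{0.025} \approx 0.842.
\]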