PracWeek4 - Complete
School
Macquarie University
Course
1170
Subject
Statistics
Date
May 31, 2024
Introduction to Distributions
Employability Skills
As you complete this exercise, think about which employability skills you are using.
Today’s Practical is in two parts.
What will we cover in this Part?
In this practical exercise we will:
Examine the distribution of means from Normal populations.
Examine the distribution of means from non-Normal populations.
Explore the Central Limit Theorem for means.
Saving your work
Don’t forget that it is useful to save your work. Save your work to your storage device to retain a copy.
IQs are normally distributed with a population mean of 100 and a population standard deviation of 15. The file IQ.xlsx contains five samples from this population. Download IQ.xlsx from iLearn and open it.
Open the IQ.xlsx file and look at the data. The name of the worksheet is IQ Data. You will see five columns, each of length 100, titled Sample 1, Sample 2, …, Sample 5. Each column represents a random sample from a population with a mean of 100 (i.e. µ = 100) and a standard deviation of 15 (i.e. σ = 15).
Introduction to Distributions Copyright Macquarie University 2020
Open the IQ data
Individuals data – summarising numerically and graphically
Each of the five samples of IQ scores is stored in a separate column (A to E). We begin by obtaining descriptive statistics:
Click Data and Data Analysis. Select Descriptive Statistics.
Select all five columns of data. Select that you have Labels in First Row. Check Summary Statistics and New Worksheet. Then click OK.
The numerical summaries for all selected columns should appear on a new worksheet. Give your new worksheet a meaningful name.
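The same kind of numerical summary can be sketched outside Excel. The example below is only an illustration: it does not read IQ.xlsx but simulates five columns of 100 values from a Normal population with mean 100 and standard deviation 15, so the names, the seed and the simulated values are assumptions rather than the worksheet data. It reports the mean and the sample standard deviation (dividing by n − 1) for each column.

```python
import random
import statistics

rng = random.Random(1170)  # arbitrary seed (an assumption) so the run is repeatable

MU, SIGMA = 100, 15        # population mean and standard deviation of IQ scores
SAMPLE_SIZE = 100          # each worksheet column holds 100 values

# Five simulated columns standing in for Sample 1 ... Sample 5 (not the real data).
samples = {
    f"Sample {i}": [rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)]
    for i in range(1, 6)
}

# Mean and sample standard deviation for each column.
summary = {
    name: (statistics.mean(values), statistics.stdev(values))
    for name, values in samples.items()
}

for name, (mean, sd) in summary.items():
    print(f"{name}: mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

Changing the seed gives five different sets of summaries, which is the same sample-to-sample variation you see across the five worksheet columns.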
Write down the mean and standard deviation for each sample (correct to 2dp):

                          Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
Mean (ȳ)                    102.09      95.74     101.83     101.48      97.79
Standard deviation (s)       14.51      13.59      15.23      13.37      14.54

Are the sample means equal? No
Are the sample means close to the population mean? Yes
Are the sample standard deviations equal? No
Are the sample standard deviations close to the population standard deviation? Yes
Why are the samples different and why do they differ from their expected values? Every time we take a random sample from a population, the samples are likely to differ. The difference between samples is known as sampling variability or sampling error. If a sample is representative of the population and is large enough, the sample statistics (such as the mean and standard deviation) should be close to the population values.
We will now produce a histogram of the data in Sample 1:
Return to the IQ Data worksheet. Click Data and Data Analysis. Select Histogram. Select Input Range A1:A101. Use the default bins. Select Labels. Check Chart Output and New Worksheet. Then click OK.
Format the bars of the histogram to remove the gaps. Give your histogram a meaningful title and x-axis title. Give your worksheet a meaningful name.
Sketch the shape of the histogram below:
How would you describe the shape of the histogram? The histogram appears unimodal and symmetric – an approximately Normal distribution.
Why is this the result that you would expect to see? Samples should resemble the population from which they are taken. Since this sample came from a Normal population, the sample should follow an approximately Normal distribution.
Means data – creating and summarising
So far, the analysis above has summarised samples of individual values.
Now we are going to create a column of means by calculating the mean of each row of data, using the random values from columns A to E. These means come from samples of size n=5 because we have 5 values in each row. We will store the means that we calculate in column G.
Return to the IQ Data worksheet. In cell G2, type =AVERAGE(A2:E2)
Now we want to use the same function for cells G3 to G101. The easiest way to do this is to left click in cell G2. At the bottom right-hand corner of the cell you can see a little square. Hover over that square until the Excel cursor becomes a thin +. Using the left-hand mouse button, click and hold on the + and drag the cursor down to G101. The formula should copy to cells G3 to G101. Add a title of Row Means in cell G1.
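The row-mean step can also be sketched in code. This is a minimal illustration under stated assumptions: the worksheet is replaced by 100 simulated rows of five Normal(100, 15) values, and the seed and variable names are hypothetical. Each entry of `row_means` plays the role of one cell in column G.

```python
import random
import statistics

rng = random.Random(5)     # arbitrary seed (an assumption) so the run is repeatable

MU, SIGMA = 100, 15
ROWS, COLUMNS = 100, 5

# Simulated worksheet: 100 rows, each holding five IQ values (columns A to E).
rows = [[rng.gauss(MU, SIGMA) for _ in range(COLUMNS)] for _ in range(ROWS)]

# Column G: the mean of each row, like =AVERAGE(A2:E2) filled down to G101.
row_means = [statistics.mean(row) for row in rows]

# Column A on its own, for comparison with the column of means.
column_a = [row[0] for row in rows]

print(f"Column A: mean = {statistics.mean(column_a):.1f}, sd = {statistics.stdev(column_a):.1f}")
print(f"Column G: mean = {statistics.mean(row_means):.1f}, sd = {statistics.stdev(row_means):.1f}")
```

Both columns should be centred near 100, but the column of means should be noticeably less spread out than the column of individual values.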
Obtain descriptive statistics for the column of means:
Click Data and Data Analysis. Select Descriptive Statistics. Select Column G. Select Labels in First Row. Check Summary Statistics and New Worksheet. Then click OK. Give your new worksheet a meaningful name.
Produce a histogram for the column of means:
Click Data and Data Analysis. Select Histogram. Select Input Range G1:G101. Use the default bins. Select Labels. Check Chart Output and New Worksheet. Then click OK.
Give your histogram a meaningful title and x-axis title. Give your new worksheet a meaningful name.
Sketch the shape of the histogram below:
How would you describe the shape of the histogram? The histogram appears unimodal and symmetric – an approximately Normal distribution.
Now compare the summary statistics for Sample 1 in column A with the row means you have calculated in column G. Find the sample statistics for column A and column G (correct to 1dp) and fill in the table below:

IQ data                              Mean    Median   SD     Range   Min    Max
Individuals (Column A: Sample 1)     102.1   101.7    14.5   72.1    64.6   136.7
Means (Column G: Row Means n=5)       99.8    99.6     5.5   26.9    86.2   113.1

Comment on the range of the data in Column G compared to Column A. Individuals had a range of 72.1 IQ points, from a minimum of 64.6 to a maximum of 136.7 IQ points. Mean IQ scores for samples of size 5 had a much narrower range of 26.9 IQ points, from a minimum of 86.2 to a maximum of 113.1 IQ points.
How do the standard deviations of column A and column G compare? The standard deviation of Column A is about 2.6 times the standard deviation of Column G.
In lectures you learnt that when samples of size n are randomly selected from a population with mean µ and standard deviation σ, the distribution of the sample means has a mean of µ and a standard error of σ/√n.
For the IQ data for individuals:
the original population is normally distributed
the population mean is µ = 100
the population standard deviation is σ = 15
So we would expect the mean of the individuals (in Column A) to be close to 100 and the standard deviation to be close to 15.
When dealing with these sample means from samples of size n = 5, theory tells us that:
the expected mean of these sample means is: µ = 100
the expected standard error is: σ/√n = 15/√5 ≈ 6.71
Are the means in the table above close to their expected values? Yes
Why do they differ from their expected value? The difference between the sample statistics and the population parameters is due to sampling variability.
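As a quick arithmetic check on the standard error, the short sketch below compares the theoretical value σ/√n with the spread actually observed in the table. The figures 14.5 and 5.5 are the two sample standard deviations recorded above; the variable names are illustrative only.

```python
import math

sigma = 15.0   # population standard deviation of IQ scores
n = 5          # each row mean averages five values

expected_se = sigma / math.sqrt(n)   # standard error of the mean, sigma / sqrt(n)
observed_ratio = 14.5 / 5.5          # SD of Column A divided by SD of Column G (from the table)
expected_ratio = math.sqrt(n)        # theory: individuals vary sqrt(n) times as much as means

print(f"expected standard error: {expected_se:.2f}")   # 6.71
print(f"observed SD ratio: {observed_ratio:.2f}")      # 2.64
print(f"expected SD ratio: {expected_ratio:.2f}")      # 2.24
```

The observed ratio is in the same region as √5 but not equal to it, which is what sampling variability in 100 rows would lead us to expect.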
Even though the size of each individual sample that went to make up the mean was only five, the distribution of means was approximately Normal. Why is this? When the original population is a Normal distribution, the sample means will follow a Normal distribution, regardless of sample size.
The Central Limit Theorem: Non-Normal populations
Now we are going to look at some non-Normal data, calculate means for different sample sizes and see how the distributions change. The chi-squared distribution is generally a right-skewed distribution. We will use this distribution in later practical exercises.
The file Chisquared.xlsx contains samples of data from a chi-squared distribution. This specific chi-squared distribution has a population mean µ = 5 and a population standard deviation σ = 3.16.
Download Chisquared.xlsx from iLearn and open it. Look at the data. You will see twenty-five columns, each of length 100, titled Sample 1, Sample 2, …, Sample 25. Each column represents a random sample from a chi-squared population with a mean of 5 (i.e. µ = 5) and a standard deviation of 3.16 (i.e. σ = 3.16).
Produce descriptive statistics and a histogram for Sample 1 in Column A. Sketch the shape of the histogram below.
How would you describe the shape, centre and spread of the histogram?
The histogram appears unimodal and right skewed. The sample mean is 5.60 and the median is 4.75. The minimum value is 0.37 and the maximum is 18.05.
Now we will calculate means for rows using different numbers of columns. Begin by calculating the mean for 4 columns. We will calculate the mean for each row, using the random values from columns A to D to produce a column of means. These means come from samples of size n=4 because of the 4 values in each row. We will store the result in column AB.
Scroll right on the Chisquared Data worksheet until you reach column AB. In cell AB2, type =AVERAGE(A2:D2)
Use the fill-down procedure described earlier in this practical exercise to copy the formula to cells AB3 to AB101. Add a title of Means n=4 in cell AB1.
Now perform the same process for calculating row means, but include all 25 of the columns of chi-squared data (columns A to Y) and store the result in column AC. Add a title of Means n=25 in cell AC1.
Calculate descriptive statistics and plot histograms for Column A, Column AB and Column AC. Summarise the results in the table and sketch the histograms in the space below:
Chi-squared data: population mean (µ), sample mean (ȳ), population standard deviation (σ), sample standard deviation (s) and shape of the distribution.

Individuals (Column A: Sample 1)
  µ = 5    ȳ = 5.6    σ = 3.16    s = 3.9
  Shape: unimodal, right skewed
Means n=4 (Column AB: Means n=4)
  µ = 5    ȳ = 5.10    s = 1.64
  Shape: unimodal but not symmetric
Means n=25 (Column AC: Means n=25)
  µ = 5    ȳ = 5.09    s = 0.645
  Shape: unimodal, fairly symmetric – approximately Normal distribution
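The pattern in these results can be reproduced with a small simulation. This is a sketch under assumptions, not the worksheet data: a chi-squared distribution with 5 degrees of freedom is used because it has mean 5 and standard deviation √10 ≈ 3.16, matching the stated population values; 2000 rows are simulated instead of 100 to make the pattern easier to see; and the seed and the helper names (`chi_squared`, `skewness`, `row_means`) are hypothetical.

```python
import random
import statistics

rng = random.Random(2024)  # arbitrary seed (an assumption) so the run is repeatable

def chi_squared(df=5):
    """One chi-squared draw: the sum of df squared standard Normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def skewness(values):
    """Moment-based sample skewness; close to 0 for a symmetric sample."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def row_means(n, rows=2000):
    """Means of `rows` independent samples, each of size n."""
    return [statistics.fmean([chi_squared() for _ in range(n)]) for _ in range(rows)]

POPULATION_SD = 10 ** 0.5  # sqrt(2 * df) for df = 5, about 3.16

results = {}
for n in (1, 4, 25):       # individuals, means of 4, means of 25
    means = row_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    mean, sd, skew = results[n]
    theory_sd = POPULATION_SD / n ** 0.5
    print(f"n={n:>2}: mean={mean:.2f}  sd={sd:.2f}  (sigma/sqrt(n)={theory_sd:.2f})  skewness={skew:.2f}")
```

In a run of this sketch the mean should stay near 5 for every n, the standard deviation should shrink roughly like σ/√n, and the skewness should move towards 0 as n grows.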
Are the sample means close to the population mean? Yes
Are the sample standard deviations close to the population standard deviation? Yes
What can you say about the shape of the histograms as you go from individuals data (Column A) to means of size 4 (Column AB) to means of size 25 (Column AC)? As the sample size increases, the distribution of means of samples taken from this population becomes approximately Normal.
You have just demonstrated the Central Limit Theorem! That is: even if the original population is not normally distributed, the distribution of means of samples taken from this population is approximately Normal (provided n is “large enough”), and that approximation improves as the sample size increases.
Probabilities in Excel
What will we cover in this Part?
In this practical exercise we will:
Find areas (probabilities) under a Normal curve using Excel.
Saving your work
Don’t forget that it is useful to save your work. Save your work to your storage device to retain a copy.
Excel function for Normal distribution probabilities
In lectures we learned that we find probabilities for distributions by calculating the area under the distribution curve. We can use Excel to find probabilities for a Normal distribution by using the NORM.DIST function. This function will find the area to the left of a given value of y for any Normal distribution:
Function: =NORM.DIST(y,µ,σ,TRUE)
Description: Gives the area to the left of y in a Normal distribution with a mean of µ and a population standard deviation of σ.
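For readers who want to cross-check Excel, the same left-hand area can be obtained from Python's standard library. The sketch below wraps `statistics.NormalDist(mu, sigma).cdf(y)` in a helper; the name `norm_dist` is hypothetical and chosen only to mirror the Excel function, and the IQ example value of 115 is an illustration, not part of the practical.

```python
from statistics import NormalDist

def norm_dist(y, mu, sigma):
    """Area to the left of y under a Normal(mu, sigma) curve,
    mirroring =NORM.DIST(y, mu, sigma, TRUE)."""
    return NormalDist(mu, sigma).cdf(y)

# IQ scores: mean 100, standard deviation 15. What proportion fall below 115?
p_below_115 = norm_dist(115, 100, 15)
print(round(p_below_115, 4))  # 0.8413
```

A score of 115 is exactly one standard deviation above the mean, so about 84% of the area lies to its left.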
In lectures, we learned that the total area under the Normal distribution curve is equal to 1. Using this property we can use the NORM.DIST function in the following way:
Function: =1-NORM.DIST(y,µ,σ,TRUE)
Description: Gives the area to the right of y in a Normal distribution with a mean of µ and a population standard deviation of σ.
We can also use NORM.DIST to find the area between y₁ and y₂. Note that when we are using the function as shown below, y₂ should be the larger of the two values of Y.
Function: =NORM.DIST(y₂,µ,σ,TRUE)-NORM.DIST(y₁,µ,σ,TRUE)
Description: Gives the area between y₁ and y₂ in a Normal distribution with a mean of µ and a population standard deviation of σ.
Using Excel to find probabilities for the Normal distribution
The red-legged pademelon is a species of small Australian marsupial, related to kangaroos, which lives in rainforest habitat. Their name is thought to have come from the word ‘Paddymalla’, which is an Aboriginal term for ‘small kangaroo from the forest’. The average weight of a mature female red-legged pademelon is 4.1 kg. We will assume the weights of mature female red-legged pademelons have a Normal distribution and a standard deviation of 0.3 kg.
Answer the following questions in relation to the weight of mature female red-legged pademelons. For each question, record the probability (area) and draw a Normal distribution with the required area shaded.
What is the probability that a female weighs less than 4kg?
=NORM.DIST(4,4.1,0.3,TRUE) = 0.36944
What is the probability that a female weighs more than 5kg?
=1-NORM.DIST(5,4.1,0.3,TRUE) = 0.00135
What is the probability that a female weighs between 4kg and 5kg?
=NORM.DIST(5,4.1,0.3,TRUE)-NORM.DIST(4,4.1,0.3,TRUE) = 0.62921
The standard Normal distribution is a particular Normal distribution with a population mean µ = 0 and a population standard deviation σ = 1.
In lectures we learned that we can standardise a value of Y from any Normal distribution. When we standardise, we convert the value of Y to the equivalent value of Z on the standard Normal distribution, that is, z = (y − µ)/σ. This z-score is a measure of the number of standard deviations that a value of Y is from its mean.
Use the appropriate Excel functions to find the following areas in the standard Normal distribution. For each part, record the probability (area) and draw a Normal distribution with the required area shaded.
above a z-score of -2.25?
=1-NORM.DIST(-2.25,0,1,TRUE) = 0.9878
above a z-score of 2.25?
=1-NORM.DIST(2.25,0,1,TRUE) = 0.0122
between a z-score of -1 and a z-score of 1.5?
=NORM.DIST(1.5,0,1,TRUE)-NORM.DIST(-1,0,1,TRUE) = 0.9332 - 0.1587 = 0.7745
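The link between standardising and the probabilities above can be checked with a short sketch. It converts the pademelon weights to z-scores and looks up every area on the standard Normal distribution only, using Python's `statistics.NormalDist`. The helper `z_score` and the variable names are illustrative assumptions.

```python
from statistics import NormalDist

standard_normal = NormalDist(0, 1)   # mean 0, standard deviation 1
MU, SIGMA = 4.1, 0.3                 # pademelon weights in kg, from the practical

def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean."""
    return (y - mu) / sigma

z4 = z_score(4, MU, SIGMA)           # about -0.33
z5 = z_score(5, MU, SIGMA)           # about 3

# Pademelon questions, answered through the standard Normal distribution.
p_less_than_4 = standard_normal.cdf(z4)
p_more_than_5 = 1 - standard_normal.cdf(z5)
p_between_4_and_5 = standard_normal.cdf(z5) - standard_normal.cdf(z4)
print(f"{p_less_than_4:.5f} {p_more_than_5:.5f} {p_between_4_and_5:.5f}")
# 0.36944 0.00135 0.62921

# Standard Normal questions.
p_above_minus_2_25 = 1 - standard_normal.cdf(-2.25)
p_above_2_25 = 1 - standard_normal.cdf(2.25)
p_between_minus_1_and_1_5 = standard_normal.cdf(1.5) - standard_normal.cdf(-1)
print(f"{p_above_minus_2_25:.4f} {p_above_2_25:.4f} {p_between_minus_1_and_1_5:.4f}")
# 0.9878 0.0122 0.7745
```

The first line of output matches the NORM.DIST answers computed with µ = 4.1 and σ = 0.3, which shows that standardising first and then using the standard Normal distribution gives the same areas.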