PracWeek4 - Complete

School

Macquarie University

Course

1170

Subject

Statistics

Date

May 31, 2024

Type

docx

Pages

9

Uploaded by JudgeKnowledge14243

Introduction to Distributions

Employability Skills
As you complete this exercise, think about which of these employability skills you are using:

Today’s Practical is in two parts.

What will we cover in this Part?
In this practical exercise we will:
  • Examine the distribution of means from Normal populations.
  • Examine the distribution of means from non-Normal populations.
  • Explore the Central Limit Theorem for means.

Saving your work
Don’t forget that it is useful to save your work. Save your work to your storage device to retain a copy.

Open the IQ data
IQs are normally distributed with a population mean of 100 and a population standard deviation of 15. The file IQ.xlsx contains five samples from this population.

Download IQ.xlsx from iLearn, open it and look at the data. The name of the worksheet is IQ Data. You will see five columns, each of length 100, titled Sample 1, Sample 2, …, Sample 5. Each column represents a random sample from a population with a mean of 100 (i.e. µ = 100) and a standard deviation of 15 (i.e. σ = 15).

1 | Introduction to Distributions Copyright Macquarie University 2020
Individuals data – summarising numerically and graphically
Each of the five samples of IQ scores is stored in a separate column (A to E). We begin by obtaining descriptive statistics:
  • Click Data and Data Analysis.
  • Select Descriptive Statistics.
  • Select all five columns of data.
  • Select that you have Labels in First Row.
  • Check Summary Statistics and New Worksheet. Then click OK.

The numerical summaries for all selected columns should appear on a new worksheet. Give your new worksheet a meaningful name.

Write down the mean and standard deviation for each sample (correct to 2 dp):

                          Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
Mean (ȳ)                    102.09      95.74     101.83     101.48      97.79
Standard deviation (s)       14.51      13.59      15.23      13.37      14.54

Are the sample means equal? No
Are the sample means close to the population mean? Yes
Are the sample standard deviations equal? No
Are the sample standard deviations close to the population standard deviation? Yes

Why are the samples different and why do they differ from their expected values?
Every time we take a random sample from a population, the samples are likely to differ. The difference between samples is known as sampling variability or sampling error. If a sample is representative of the population and is large enough, the sample statistics (such as the mean and standard deviation) should be close to the population values.

We will now produce a histogram of the data in Sample 1:
  • Return to the IQ Data worksheet.
  • Click Data and Data Analysis.
  • Select Histogram.
  • Select Input Range A1:A101.
  • Use the default bins.
  • Select Labels.
  • Check Chart Output and New Worksheet. Then click OK.
  • Format the bars of the histogram to remove the gaps.
  • Give your histogram a meaningful title and x-axis title.
  • Give your worksheet a meaningful name.

Sketch the shape of the histogram below:
How would you describe the shape of the histogram?
The histogram appears unimodal and symmetric: an approximately Normal distribution.

Why is this the result that you would expect to see?
Samples should resemble the population from which they are taken. Since this sample came from a Normal population, the sample should be approximately Normally distributed.

Means data – creating and summarising
So far, the analysis has summarised samples of individual values. Now we are going to create a column of means by calculating the mean of each row of data, using the random values from columns A to E. These means come from samples of size n = 5 because we have 5 values in each row. We will store the means in column G.
  • Return to the IQ Data worksheet.
  • In cell G2, type =AVERAGE(A2:E2)
  • Now we want to use the same function for cells G3 to G101. The easiest way to do this is to left-click in cell G2. At the bottom right-hand corner of the cell you can see a little square. Hover over that square until the Excel cursor becomes a thin +. Using the left mouse button, click and hold on the + and drag the cursor down to G101. The formula should copy to cells G3 to G101.
  • Add a title of Row Means in cell G1.

Obtain descriptive statistics for the column of means:
  • Click Data and Data Analysis.
  • Select Descriptive Statistics.
  • Select Column G.
  • Select Labels in First Row.
  • Check Summary Statistics and New Worksheet. Then click OK.
  • Give your new worksheet a meaningful name.

Produce a histogram for the column of means:
  • Click Data and Data Analysis.
  • Select Histogram.
  • Select Input Range G1:G101.
  • Use the default bins.
  • Select Labels.
  • Check Chart Output and New Worksheet. Then click OK.
  • Give your histogram a meaningful title and x-axis title.
  • Give your new worksheet a meaningful name.

Sketch the shape of the histogram below:

How would you describe the shape of the histogram?
The histogram appears unimodal and symmetric: an approximately Normal distribution.

Now compare the summary statistics for Sample 1 in column A with the row means you have calculated in column G. Find the sample statistics for column A and column G (correct to 1 dp) and fill in the table below:

IQ data                                Mean    Median   SD     Range   Min    Max
Individuals (Column A: Sample 1)       102.1   101.7    14.5   72.1    64.6   136.7
Means (Column G: Row Means, n=5)        99.8    99.6     5.5   26.9    86.2   113.1

Comment on the range of the data in Column G compared to Column A.
Individuals had a range of 72.1 IQ points, from a minimum of 64.6 to a maximum of 136.7 IQ points. Mean IQ scores for samples of size 5 had a much narrower range of 26.9 IQ points, from a minimum of 86.2 to a maximum of 113.1 IQ points.

How do the standard deviations of column A and column G compare?
The standard deviation of Column A (14.5) is about 2.6 times the standard deviation of Column G (5.5).

In lectures you learnt that when samples of size n are randomly selected from a population with mean µ and standard deviation σ, the distribution of the sample means has a mean of µ and a standard error of σ/√n.

For the IQ data for individuals:
  • the original population is normally distributed
  • the population mean is µ = 100
  • the population standard deviation is σ = 15

So we would expect the mean of the individuals (in Column A) to be close to 100 and the standard deviation to be close to 15.

When dealing with sample means from samples of size n = 5, theory tells us that:
  • the expected mean of these sample means is: 100
  • the expected standard error is: σ/√n = 15/√5 ≈ 6.71

Are the means in the table above close to their expected values? Yes

Why do they differ from their expected value?
The difference between the sample statistics and the population parameters is due to sampling variability.

Even though the size of each individual sample that went to make up the mean was only five, the distribution of means was approximately Normal. Why is this?
When the original population is a Normal distribution, the sample means will follow a Normal distribution, regardless of sample size.

The Central Limit Theorem: Non-Normal populations
Now we are going to look at some non-Normal data, calculate means for different sample sizes and see how the distributions change.

The chi-squared distribution is generally right skewed. We will use this distribution in later practical exercises. The file Chisquared.xlsx contains samples of data from a chi-squared distribution. This specific chi-squared distribution has a population mean of 5 and a population standard deviation of 3.16.

Download Chisquared.xlsx from iLearn, open it and look at the data. You will see twenty-five columns, each of length 100, titled Sample 1, Sample 2, …, Sample 25. Each column represents a random sample from a chi-squared population with a mean of 5 (i.e. µ = 5) and a standard deviation of 3.16 (i.e. σ = 3.16).

Produce descriptive statistics and a histogram for Sample 1 in Column A. Sketch the shape of the histogram below.

How would you describe the shape, centre and spread of the histogram?
The histogram appears unimodal and right skewed. The mean of the distribution is 5.60 and the median is 4.75. The minimum value is 0.37 and the maximum is 18.05.

Now we will calculate means for rows using different numbers of columns. Begin by calculating the mean for 4 columns. We will calculate the mean for each row, using the random values from columns A to D to produce a column of means. These means come from samples of size n = 4 because of the 4 values in each row. We will store the result in column AB.
  • Scroll right on the Chisquared Data worksheet until you reach column AB.
  • In cell AB2, type =AVERAGE(A2:D2)
  • Use the copy-down procedure described earlier in this practical exercise to copy the formula to cells AB3 to AB101.
  • Add a title of Means n=4 in cell AB1.

Now perform the same process for calculating row means, but include all 25 of the columns of chi-squared data and store the result in column AC. Add a title of Means n=25 in cell AC1.

Calculate descriptive statistics and plot histograms for Column A, Column AB and Column AC. Summarise the results in the table and sketch the histograms in the space below:

Chi-squared data                   Population   Sample     Population   Sample      Shape of the distribution
                                   mean (µ)     mean (ȳ)   SD (σ)       SD (s)
Individuals (Column A: Sample 1)   µ = 5        ȳ = 5.6    σ = 3.16     s = 3.9     Unimodal, right skewed
Means n=4 (Column AB)              µ = 5        ȳ = 5.10                s = 1.64    Unimodal but not symmetric
Means n=25 (Column AC)             µ = 5        ȳ = 5.09                s = 0.645   Unimodal, fairly symmetric; approximately Normal
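The row-means exercise above can also be sketched outside Excel. The following is a minimal Python simulation, not the worksheet's own method. It assumes the population is chi-squared with 5 degrees of freedom, since that distribution has mean 5 and standard deviation √10 ≈ 3.16, matching the stated population values; the function names and the simulated row count are illustrative choices. It compares the standard deviation of simulated row means with the theoretical standard error σ/√n.

```python
import math
import random
import statistics

# Assumption: 5 degrees of freedom, which gives mean 5 and SD sqrt(10) ≈ 3.16.
DF = 5
POP_MEAN = DF
POP_SD = math.sqrt(2 * DF)


def standard_error(sigma, n):
    """Theoretical standard deviation of the mean of n independent draws."""
    return sigma / math.sqrt(n)


def simulate_row_means(n, rows=100, seed=0):
    """Return `rows` means, each averaging n chi-squared(DF) draws.

    Each mean plays the role of one worksheet row averaged across n columns.
    """
    rng = random.Random(seed)
    # A chi-squared(k) variable is a Gamma variable with shape k/2 and scale 2.
    return [
        statistics.fmean(rng.gammavariate(DF / 2, 2.0) for _ in range(n))
        for _ in range(rows)
    ]


if __name__ == "__main__":
    for n in (1, 4, 25):
        means = simulate_row_means(n, rows=10_000)
        print(
            f"n={n:2d}  mean of means={statistics.fmean(means):.2f}  "
            f"sd of means={statistics.stdev(means):.2f}  "
            f"sigma/sqrt(n)={standard_error(POP_SD, n):.2f}"
        )
```

With many simulated rows, the standard deviation of the means should sit close to σ/√n for each n, which is the pattern the sample values in the table (1.64 for n=4 and 0.645 for n=25) follow.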
Are the sample means close to the population mean? Yes

Are the sample standard deviations close to the population standard deviation? Yes. Note that for the columns of means the relevant comparison is with the standard error σ/√n (3.16/√4 = 1.58 and 3.16/√25 = 0.63), which the sample values of 1.64 and 0.645 match closely.

What can you say about the shape of the histograms as you go from individuals data (Column A) to means of size 4 (Column AB) to means of size 25 (Column AC)?
As the sample size increases, the distribution of means of samples taken from this population becomes approximately Normal.

You have just demonstrated the Central Limit Theorem! That is: even if the original population is not normally distributed, the distribution of means of samples taken from this population is approximately Normal (provided n is “large enough”), and that approximation improves as the sample size increases.

Probabilities in Excel

What will we cover in this Part?
In this practical exercise we will:
  • Find areas (probabilities) under a Normal curve using Excel.

Saving your work
Don’t forget that it is useful to save your work. Save your work to your storage device to retain a copy.
Excel function for Normal distribution probabilities
In lectures we learned that we find probabilities for distributions by calculating the area under the distribution curve. We can use Excel to find probabilities for a Normal distribution by using the NORM.DIST function. This function finds the area to the left of a given value of y for any Normal distribution:

Function:     =NORM.DIST(y,µ,σ,TRUE)
Description:  Gives the area to the left of y in a Normal distribution with a mean of µ and a population standard deviation of σ.

In lectures, we learned that the total area under the Normal distribution curve is equal to 1. Using this property we can use the NORM.DIST function in the following way:

Function:     =1-NORM.DIST(y,µ,σ,TRUE)
Description:  Gives the area to the right of y in a Normal distribution with a mean of µ and a population standard deviation of σ.

We can also use NORM.DIST to find the area between y1 and y2. Note that when we use the function as shown below, y2 should be the larger of the two values of y.

Function:     =NORM.DIST(y2,µ,σ,TRUE)-NORM.DIST(y1,µ,σ,TRUE)
Description:  Gives the area between y1 and y2 in a Normal distribution with a mean of µ and a population standard deviation of σ.

Using Excel to find probabilities for the Normal distribution
The red-legged pademelon is a species of Australian marsupial which lives in rainforest habitat. Its name is thought to have come from the word ‘Paddymalla’, which is an Aboriginal term for ‘small kangaroo from the forest’. The average weight of a mature female red-legged pademelon is 4.1 kg. We will assume the weights of mature female red-legged pademelons have a Normal distribution with a standard deviation of 0.3 kg.

Answer the following questions in relation to the weight of mature female red-legged pademelons. For each question, draw a Normal distribution and shade the required area.
Probability (area)                                             Normal distribution with required area shaded

What is the probability that a female weighs less than 4 kg?
=NORM.DIST(4,4.1,0.3,TRUE) = 0.36944
What is the probability that a female weighs more than 5 kg?
=1-NORM.DIST(5,4.1,0.3,TRUE) = 0.00135

What is the probability that a female weighs between 4 kg and 5 kg?
=NORM.DIST(5,4.1,0.3,TRUE)-NORM.DIST(4,4.1,0.3,TRUE) = 0.62921

The standard Normal distribution is a particular Normal distribution with a population mean of µ = 0 and a population standard deviation of σ = 1.

In lectures we learned that we can standardise a value of Y from any Normal distribution. When we standardise, we convert the value of Y to the equivalent value of Z on the standard Normal distribution. This z-score is a measure of the number of standard deviations that a value of Y is from its mean.

Use the appropriate Excel functions to find the following areas in the standard Normal distribution. For each part, draw a Normal distribution and shade the required area.

Probability (area)                                             Normal distribution with required area shaded

Above a z-score of -2.25?
=1-NORM.DIST(-2.25,0,1,TRUE) = 0.9878

Above a z-score of 2.25?
=1-NORM.DIST(2.25,0,1,TRUE) = 0.0122

Between a z-score of -1 and a z-score of 1.5?
=NORM.DIST(1.5,0,1,TRUE)-NORM.DIST(-1,0,1,TRUE) = 0.93319 - 0.1587 = 0.7745
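The Excel answers above can be cross-checked in Python. This is a minimal sketch rather than part of the practical: it uses the standard library's statistics.NormalDist, whose cdf method returns the area to the left of a value, as a stand-in for NORM.DIST with the cumulative argument set to TRUE. The helper names area_left, area_right and area_between are illustrative and do not come from the worksheet.

```python
from statistics import NormalDist


def area_left(y, mu, sigma):
    """Area to the left of y, the counterpart of =NORM.DIST(y,mu,sigma,TRUE)."""
    return NormalDist(mu, sigma).cdf(y)


def area_right(y, mu, sigma):
    """Area to the right of y, using the fact that the total area is 1."""
    return 1.0 - area_left(y, mu, sigma)


def area_between(y1, y2, mu, sigma):
    """Area between y1 and y2; the two values are sorted so order does not matter."""
    low, high = sorted((y1, y2))
    return area_left(high, mu, sigma) - area_left(low, mu, sigma)


if __name__ == "__main__":
    # Pademelon weights: mean 4.1 kg, standard deviation 0.3 kg.
    print(f"P(Y < 4)     = {area_left(4, 4.1, 0.3):.5f}")
    print(f"P(Y > 5)     = {area_right(5, 4.1, 0.3):.5f}")
    print(f"P(4 < Y < 5) = {area_between(4, 5, 4.1, 0.3):.5f}")
    # Standard Normal distribution: mean 0, standard deviation 1.
    print(f"P(Z > -2.25)    = {area_right(-2.25, 0, 1):.4f}")
    print(f"P(Z > 2.25)     = {area_right(2.25, 0, 1):.4f}")
    print(f"P(-1 < Z < 1.5) = {area_between(-1, 1.5, 0, 1):.4f}")
```

Sorting the two bounds inside area_between is a small design choice that removes the need to remember which value must come first, a requirement the Excel formula does have.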