Key_Lab 3
School
Drexel University
Course
410
Subject
Statistics
Date
Feb 20, 2024
10/17/23, 9:36 PM
Key_Lab 3
file:///Users/zacharykey/Key_Lab-3-.html
1/9
Key_Lab 3
2023-10-11
#Calculating Statistics
##Question 1
cod_data=read.csv("/Users/zacharykey/Downloads/coddata.csv")
#Mean
sum(cod_data$X6)/49
## [1] 11.85714
#Median,1st,and 3rd quartiles
sort(cod_data$X6)
## [1] 4 5 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9
## [26] 9 10 10 10 11 11 11 12 12 12 12 12 12 14 15 15 16 18 19 20 21 21 32 72
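The by-hand median and quartiles can be cross-checked against the sorted values above. This is a minimal sketch, assuming the 49 observations are rebuilt from that sorted output; the names `values`, `counts`, `x`, and the `*_by_hand` variables are illustrative and not part of the original script.

```r
# Rebuild the 49 sorted observations (distinct values and how often each occurs)
values <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 32, 72)
counts <- c(1, 5, 4, 4, 6, 6, 3, 3, 6, 1, 2, 1, 1, 1, 1, 2, 1, 1)
x <- sort(rep(values, counts))

n <- length(x)        # odd sample size, so the median is a single middle value
mid <- (n + 1) / 2    # position of the median in the sorted data
med_by_hand <- x[mid]

# Median-of-halves quartiles: drop the median, then take the median of each half
lower_half <- x[1:(mid - 1)]
upper_half <- x[(mid + 1):n]
q1_by_hand <- median(lower_half)
q3_by_hand <- median(upper_half)

# Compare with the built-in functions (quantile() uses type 7 by default)
c(median = med_by_hand, Q1 = q1_by_hand, Q3 = q3_by_hand)
median(x)
quantile(x, c(0.25, 0.75))
```

For this data set the median-of-halves method and `quantile()` agree. In general the two can differ, because `quantile()` interpolates between neighboring values.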
#Mode
COD=table(cod_data)
COD
## X6
##  4  5  6  7  8  9 10 11 12 14 15 16 18 19 20 21 32 72
##  1  5  4  4  6  6  3  3  6  1  2  1  1  1  1  2  1  1
#Range
max(cod_data$X6)-min(cod_data$X6)
## [1] 68
#Standard Deviation
MeanDiff=cod_data$X6 - 11.85714
SquareDiff=MeanDiff^2
SumSquareDiff=sum(SquareDiff)
AgesVar=SumSquareDiff/(48)
sqrt(AgesVar)
## [1] 10.27335
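As a cross-check on the hand calculation of the standard deviation, here is a minimal sketch that repeats the same steps with the mean and `n - 1` computed from the data instead of typed in as `11.85714` and `48`. It assumes the observations are rebuilt from the frequency table in this question; `values`, `counts`, `x`, `x_bar`, and `sd_by_hand` are illustrative names.

```r
# Rebuild the 49 observations from the frequency table
values <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 32, 72)
counts <- c(1, 5, 4, 4, 6, 6, 3, 3, 6, 1, 2, 1, 1, 1, 1, 2, 1, 1)
x <- rep(values, counts)

n <- length(x)
x_bar <- sum(x) / n                   # mean at full precision, not rounded
sum_sq_diff <- sum((x - x_bar)^2)     # sum of squared deviations from the mean
sample_var <- sum_sq_diff / (n - 1)   # sample variance divides by n - 1
sd_by_hand <- sqrt(sample_var)

sd_by_hand
sd(x)                                 # built-in sample standard deviation
```

Deriving `n` and the mean from the data keeps the calculation correct if the file changes, whereas hard-coded constants would silently go stale.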
I computed each statistic by hand and checked my answers using the R functions. For the median, I used the sort
function to lay the data out from least to greatest, then counted to the middle value (the 25th of 49), which is 9.
I followed the same process for the 1st and 3rd quartiles: I split the data into a lower and an upper half around
the median and found the middle of each half. The 1st quartile is 7 and the 3rd quartile is 12. For the mode, I used
the method from Lab 2 of putting the data into a frequency table, which shows that 8, 9, and 12 are tied as the most
frequent values, each appearing 6 times.
##Question 2
set1=read.csv("/Users/zacharykey/Downloads/dataset_1.csv")
hist(set1$Data,breaks=200,main="Histogram of DataSet 1",xlab="Number",xlim=c(0,350))
#Mean
mean(set1$Data)
## [1] 7.2
#Median
median(set1$Data)
## [1] 3
#Interquartile Range
IQR(set1$Data)
## [1] 2
#Range
314-0
## [1] 314
#Standard Dev
sd(set1$Data)
## [1] 35.93651
The mean comes out to 7.2 and the median comes out to 3. The better measure of the center of the data set in
this case would be the median. This is because most of the numbers are on the lower side but there is a single
outlier at 314 that raises the mean significantly.
The large difference between the standard deviation and the interquartile range is caused by the outlier
in the data. The interquartile range measures the spread of the middle half of the data: it is the upper quartile
minus the lower quartile. In data set 1 most of the numbers are on the lower side, and the single outlier does not
affect the interquartile range because quartiles depend on the position of values in the sorted data, not on their
sum. The standard deviation, however, measures how dispersed the data are in relation to the mean. Calculating it
requires summing the squared differences from the mean, so the outlier’s much higher value has a far larger
effect.
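The effect of a single outlier on each statistic can be illustrated with a small made-up example. This is a sketch only: `base_values` is hypothetical data, not the contents of dataset_1.csv, and `summarise_spread` is an illustrative helper name. The value 314 is borrowed from the outlier discussed above.

```r
# Hypothetical small data set, clustered on the low side
base_values <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
# Same data with one extreme value appended
with_outlier <- c(base_values, 314)

# Center and spread measures for one vector
summarise_spread <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
}

# One row per version of the data
comparison <- rbind(
  without_outlier = summarise_spread(base_values),
  with_outlier    = summarise_spread(with_outlier)
)
round(comparison, 2)
```

In this sketch the median is unchanged and the IQR barely moves when the outlier is added, while the mean and standard deviation both jump, which matches the reasoning in the paragraph above.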
##Question 3
set2=read.csv("/Users/zacharykey/Downloads/dataset_2.csv")
hist(set2$Data,breaks=10,main="Histogram of DataSet 2",xlab="Numbers")
#Mean
mean(set2$Data)
## [1] 3.067568
#Median
median(set2$Data)
## [1] 3
#Range
max(set2$Data)-min(set2$Data)
## [1] 8
#Interquartile Range
IQR(set2$Data)
## [1] 2
#Standard Dev
sd(set2$Data)
## [1] 1.537932
Data set 2 has no extreme outliers, so the mean is the better measure of the center of the data. The mean
summarizes every value in the data, while the median only identifies the central value. Here the two are nearly
identical (3.07 and 3).
##Question 4
set3=read.csv("/Users/zacharykey/Downloads/dataset_3.csv")
hist(set3$Data,breaks=6,main="Histogram of DataSet 3",xlab="Numbers")
#Mean
mean(set3$Data)
## [1] 5.769231
#Median
median(set3$Data)
## [1] 1
#Range
max(set3$Data)-min(set3$Data)
## [1] 17
#Interquartile Range
IQR(set3$Data)
## [1] 9.5
#Standard Dev
sd(set3$Data)
## [1] 5.569945
Data set 3 does not necessarily have an outlier, but a single value is repeated many times. The mean would best
represent the center of this data because it takes all of the data into account, giving a value of about 5.77. Because
the number 1 is repeated so often, the median comes out to 1, which doesn’t represent the center of the data because
it doesn’t reflect the higher numeric values.
##Question 5
fish=read.csv("/Users/zacharykey/Downloads/bluegill.csv")
#Population Mean
mean(fish$Length)
## [1] 120.0338
#Standard Dev
sd(fish$Length)
## [1] 41.94847
#Sample Mean and SD of 10 fish
fish_10=sample(fish$Length,size=10,replace=F)
mean(fish_10)
## [1] 134.2
sd(fish_10)
## [1] 34.49573
#Sample Mean and SD of 100 fish
fish_100=sample(fish$Length,size=100,replace=F)
mean(fish_100)
## [1] 125.81
sd(fish_100)
## [1] 37.26609
#Sample Mean and SD of 500 fish
fish_500=sample(fish$Length,size=500,replace=F)
mean(fish_500)
## [1] 120.622
sd(fish_500)
## [1] 41.83649
As the sample size gets bigger, the sample mean and standard deviation get closer to the population values, because
a larger sample includes more of the population. The population mean was 120.03, and the sample means moved toward it
as the sample size increased (134.2, 125.81, then 120.62). The same applies to the standard deviation: the sample
values (34.50, 37.27, then 41.84) moved toward the population value of 41.95.
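The same experiment can be repeated over several sample sizes in one pass. This is a minimal sketch under stated assumptions: `population` is a simulated stand-in for `fish$Length` (the real bluegill data are not reproduced here), its size of 5000 is arbitrary, and the seed is chosen only so the run is repeatable.

```r
set.seed(410)  # arbitrary seed so the sketch is reproducible

# Hypothetical population, simulated near the reported mean (120) and SD (42)
population <- rnorm(5000, mean = 120, sd = 42)

# The last "sample" is the whole population, for reference
sample_sizes <- c(10, 100, 500, length(population))

# Draw one sample without replacement per size and record its mean and SD
results <- t(sapply(sample_sizes, function(n) {
  s <- sample(population, size = n, replace = FALSE)
  c(n = n, mean = mean(s), sd = sd(s))
}))

results
c(pop_mean = mean(population), pop_sd = sd(population))
```

A single small sample can land close to the population values by chance, so the trend is clearer when the sampling is repeated several times per size. Sampling every member of the population without replacement always reproduces the population mean and standard deviation.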
#Boxplots
##Question 1
boxplot(cod_data$X6,main="Boxplot of Cod Weights",ylab="Weight (kg)")
The advantage of a histogram compared to a boxplot is that it shows the frequency of values across the whole range
of the variable, so the shape of the distribution is clear. Its disadvantage is that it does not summarize the data
the way a boxplot does: the histogram does not mark the exact maximum or minimum values or highlight the outliers in
the data. The boxplot also shows the median and the IQR of the data.
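The numbers behind each plot can be inspected without drawing anything. This sketch assumes the cod observations are rebuilt from the frequency table in the first part of the lab; `values`, `counts`, `x`, `bp`, and `h` are illustrative names.

```r
# Rebuild the 49 cod observations from the frequency table
values <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 32, 72)
counts <- c(1, 5, 4, 4, 6, 6, 3, 3, 6, 1, 2, 1, 1, 1, 1, 2, 1, 1)
x <- rep(values, counts)

# What the boxplot draws: whisker ends, hinges, median, and flagged outliers
bp <- boxplot.stats(x)
bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # points more than 1.5 box lengths beyond the hinges

# What the histogram draws: bin edges and the count in each bin
h <- hist(x, plot = FALSE)
h$breaks
h$counts
```

For the cod data, `bp$out` lists 20, 21, 21, 32, and 72 as outliers, detail that the histogram only shows as sparsely filled bins on the right.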
##Question 2
Mosq=read.csv("/Users/zacharykey/Downloads/Mosquitofish.csv")
split_gender=split(Mosq,Mosq$Gender)
Male=split_gender$M
Female=split_gender$F
Male_data=Male$FishLength
Female_data=Female$FishLength
boxplot(Male_data,Female_data, col=c('green','blue'), names=c('Male Mosquito Fish','Female Mosquito Fish'), ylab='Length', main="Comparison of Mosquito Fish Length by Gender")
The box plots show that female mosquito fish tend to be longer than male mosquito fish. There is, however,
some overlap in the data, as the outliers among the male mosquito fish are around the same size as the median of the
female fish data. The size of the female box also shows that the female data are more spread out than the male data.
There are also more outliers in the female data, as seen by the white dots above the top whisker.
Looking more closely at each boxplot, the male data are concentrated at the lower end while the female data are
concentrated at the higher end. This can be seen from where the black line (the median) divides each box.
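The visual comparison can be backed with per-group numbers, and the grouped boxplot can also be drawn with R's formula interface instead of splitting the data first. This is a sketch with made-up values: the `Mosq` data frame below is a hypothetical stand-in that only reuses the column names `Gender` and `FishLength`, not the contents of Mosquitofish.csv.

```r
# Hypothetical stand-in data with the same column names as the lab's file
Mosq <- data.frame(
  Gender = rep(c("M", "F"), each = 6),
  FishLength = c(20, 21, 22, 22, 23, 30, 28, 33, 35, 38, 40, 41)
)

# Median and IQR of length within each gender
group_medians <- tapply(Mosq$FishLength, Mosq$Gender, median)
group_iqrs <- tapply(Mosq$FishLength, Mosq$Gender, IQR)
group_medians
group_iqrs

# Formula interface: one box per level of Gender (levels sort as F, then M)
boxplot(FishLength ~ Gender, data = Mosq,
        col = c("blue", "green"),
        ylab = "Length",
        main = "Comparison of Mosquito Fish Length by Gender")
```

The formula form avoids the intermediate `split()` variables and keeps the group labels tied to the data, at the cost of the boxes appearing in factor-level order rather than a hand-picked order.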