STATS 10 Assignment 1
Lalonye Calhoun 006059433
Discussion 3A/B
Please submit both parts of the assignment in one single PDF file. You can use any PDF editor
software to merge the two parts into one file. Please make sure that the questions are in the
correct order and clearly labeled, and that the answers are legible and easy to read.
To submit your assignment, upload the PDF file under the designated assignment page on the
course website before the deadline specified. Email or hard copy submissions are not accepted.
Part I
Include both the R commands and their corresponding outputs, results, or answers for all
exercise questions in Part I.
1. Vectors:
a. Create a vector named heights that contains the heights, in inches, of yourself and two
students near you. Print the contents of this vector.
b. Create a vector named names that contains the names of these people. Print the contents
of this vector.
c. Try typing cbind(heights, names). What did this command do? What class is this new
object?
Hint: Try the class() function.
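A minimal sketch of question 1, using made-up heights and names (the values below are placeholders, not data from the assignment). A matrix can hold only one type, so cbind() converts the numeric heights to character strings when it binds the two vectors as columns.

```r
# Hypothetical values for illustration only
heights <- c(64, 70, 67)
names <- c("Ana", "Ben", "Cy")

print(heights)
print(names)

# cbind() binds the two vectors as columns of one matrix
combined <- cbind(heights, names)
print(combined)

class(combined)   # a matrix (reported as "matrix" "array" in R 4.0 and later)
typeof(combined)  # "character": the heights were coerced to strings
```

Because the heights are stored as text in the combined object, arithmetic such as mean() no longer works on that column without converting it back with as.numeric().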
2. Downloading data:
a. Download the data set births.csv from the course site and upload it into RStudio. Name
the data frame NCbirths.
b. Demonstrate that you have been successful by typing head(NCbirths) and copying and
pasting the output into your word processing document.
  Gender Premie weight Apgar1 Fage Mage
1   Male     No    124      8   31   25
2 Female     No    177      8   36   26
3   Male     No    107      3   30   16
4 Female     No    144      6   33   37
5   Male     No    117      9   36   33
6 Female     No     98      4   31   29
  Feduc Meduc TotPreg Visits   Marital
1    13    14       1     13   Married
2     9    12       2     11 Unmarried
3    12     8       2     10 Unmarried
4    12    14       2     12 Unmarried
5    10    16       2     19   Married
6    14    16       3     20   Married
  Racemom Racedad Hispmom Hispdad Gained
1   White   White NotHisp NotHisp     40
2   White   White Mexican Mexican     20
3   White Unknown Mexican Unknown     70
4   White   White NotHisp NotHisp     50
5   White   Black NotHisp NotHisp     40
6   White   White NotHisp NotHisp     21
      Habit MomPriorCond BirthDef
1 NonSmoker         None     None
2 NonSmoker         None     None
3 NonSmoker At Least One     None
4 NonSmoker         None     None
5 NonSmoker At Least One     None
6 NonSmoker         None     None
     DelivComp BirthComp
1 At Least One      None
2 At Least One      None
3 At Least One      None
4 At Least One      None
5         None      None
6         None      None
3. Package loading
a. Install the maps package. Verify its installation by typing find.package("maps") and
include the output in your answer.
- [1] "C:/Users/lalon/AppData/Local/R/win-library/4.3/maps"
b. Type library(maps) to load up the package. Type map("state") and include the plot output
in your answer.
Use the births data set for questions 4-11
4. Perform vector operations
a. Extract the weight variable as a vector from the data frame
- weights <- NCbirths$weight
b. What units do you think the weights are in?
- Ounces
c. Create a new vector named weights_in_pounds which are the weights of the babies in
pounds. You can look up conversion factors on the internet.
d. Demonstrate your success by typing weights_in_pounds[1:20] and including the output in
your word processing document.
   Gender Premie weight Apgar1 Fage Mage Feduc
1    Male     No    124      8   31   25    13
2  Female     No    177      8   36   26     9
3    Male     No    107      3   30   16    12
4  Female     No    144      6   33   37    12
5    Male     No    117      9   36   33    10
6  Female     No     98      4   31   29    14
7    Male     No    147      8   33   30    12
8    Male     No    138      9   22   20    14
9  Female     No    104      9   30   21    12
10 Female     No    123      9   23   18    12
11 Female     No    153      8   32   31    16
12   Male     No    129      8   34   22     9
13   Male     No    119      8   27   28    17
14   Male     No    108      9   30   16    12
15 Female     No    106      8   28   29    13
16 Female     No    125      6   30   37    12
17 Female     No    115      8   30   18    12
18   Male     No    128      9   21   19    12
19   Male     No    132      8   33   21    10
20 Female     No     83      9   30   30     9
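The output pasted above shows the first 20 rows of the data frame rather than the vector requested in parts c and d. A minimal sketch of the conversion follows, assuming the weights are recorded in ounces (16 ounces per pound). The 20 weights are typed in from the weight column of that output so the example runs without births.csv.

```r
# First 20 values of the weight column, copied from the output above (ounces)
weights <- c(124, 177, 107, 144, 117, 98, 147, 138, 104, 123,
             153, 129, 119, 108, 106, 125, 115, 128, 132, 83)

# Assumption: 16 ounces per pound
weights_in_pounds <- weights / 16

weights_in_pounds[1:20]
```

With the full data set loaded, the first assignment would instead be weights <- NCbirths$weight, and the rest stays the same.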
5. What is the mean weight of the babies in pounds?
- 116 ounces, which is 116 / 16 = 7.25 pounds
a. What percentage of the mothers in the sample smoke? Hint: use the tally function with
the format argument. Use the help screen for guidance.
- About 9.3% of the mothers are smokers; 90.6% are non-smokers.
b. According to the Centers for Disease Control, approximately 21% of adult Americans are
smokers. How far off is the percentage you found in part a from the CDC’s report?
- 11.7 percentage points (21% - 9.3%)
6. Produce three different histograms of the weights in pounds. Use 3 bins, 20 bins, and
100 bins. Which histogram seems to give the best visualization, and why?
- The 20-bin histogram gives the best visualization: it is readable and groups the data
finely enough to show the shape of the distribution, with neither too few nor too many bins.
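A sketch of the three histograms, rebuilt here from the 20 weights visible in the question 4 output so that it runs on its own. hist() treats a single number passed to breaks as a suggestion only, so this sketch passes explicit, equally spaced break points to get exactly the requested number of bins. The helper name bin_hist is hypothetical.

```r
# 20 weights from the question 4 output, converted from ounces to pounds
weights_in_pounds <- c(124, 177, 107, 144, 117, 98, 147, 138, 104, 123,
                       153, 129, 119, 108, 106, 125, 115, 128, 132, 83) / 16

# Hypothetical helper: draw a histogram with exactly n_bins equal-width bins
bin_hist <- function(x, n_bins) {
  edges <- seq(min(x), max(x), length.out = n_bins + 1)
  hist(x, breaks = edges,
       main = paste(n_bins, "bins"),
       xlab = "Weight (pounds)")
}

h3   <- bin_hist(weights_in_pounds, 3)
h20  <- bin_hist(weights_in_pounds, 20)
h100 <- bin_hist(weights_in_pounds, 100)
```

With the full data set, the first assignment would be replaced by the weights_in_pounds vector built from NCbirths$weight.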
7. We can use the syntax boxplot(vector1, vector2) to make a side by side box plot.
Create a side by side boxplot of the mother’s ages and the father’s ages. Which gender
tends to be older?
- Females tend to be older
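As a rough check on this answer, the sketch below compares the parents' ages in the 20 rows printed under question 4 (the Fage and Mage columns, typed in by hand). It assumes those rows are representative, which the full data set may not bear out. In these rows the fathers are older in most pairs, so the boxplot from the full data is worth a second look.

```r
# Father's and mother's ages from the 20 rows shown under question 4
fage <- c(31, 36, 30, 33, 36, 31, 33, 22, 30, 23,
          32, 34, 27, 30, 28, 30, 30, 21, 33, 30)
mage <- c(25, 26, 16, 37, 33, 29, 30, 20, 21, 18,
          31, 22, 28, 16, 29, 37, 18, 19, 21, 30)

median(fage)      # 30
median(mage)      # 25.5
sum(fage > mage)  # 15: the father is older in 15 of the 20 rows

# Side-by-side boxplot, mothers first
boxplot(mage, fage, names = c("Mother", "Father"), ylab = "Age (years)")
```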
8. Try typing histogram(~ weight | Habit, data = NCbirths, layout = c(1, 2)). Describe what
this code does. Based on the graph, do you see any major differences between baby
weights from smoking moms vs. non-smoking moms?
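- It draws a separate histogram of baby weight for each level of Habit (smoking and
non-smoking mothers), stacked in one column of two rows so the two distributions can be compared.
- I don't see any major difference, except that the weights for smoking mothers seem to
vary more and smoking mothers seem more likely to have premature or borderline-premature
babies (around 6 pounds).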
9. Produce a dot plot of the weights in pounds.
10. Consider the other categorical variables in this data. Of those that record the health
of the baby, which do you think will be associated with the mother’s smoking and why?
Make a two-way Summary Table to check your hypothesis. Do you have evidence that
this variable is associated with smoking? Why?
- Birth complications, because I would expect smoking to affect how the baby is born.
I have evidence of at least one case consistent with this hypothesis.
- Birth defects, because I have heard that drugs can have physical effects on the fetus.
I have evidence of at least one case consistent with this hypothesis.
- Hospital visits: smoking could make it harder for pregnant women to breathe, so
more visits might be necessary. Alternatively, mothers who smoke might attend fewer
appointments because they are already aware of their habit.
11. Produce a nicely formatted scatter plot of the weight of the baby vs. the mother’s age.
plot(NCbirths$Mage, NCbirths$weight, xlab = "Mother's age (years)", ylab = "Baby's weight (ounces)",
main = "Baby's weight vs. mother's age")
Part II
You may choose to type or write your answers electronically or scan your handwritten
solutions. Please ensure that you show all steps and explanations to receive full credit,
unless otherwise instructed.
1. A data set on Shark Attacks Worldwide posted on StatCrunch records data on all shark
attacks in recorded history including attacks before 1800. The data set can be viewed
here: https://www.statcrunch.com/app/index.html?dataid=2188687
a. How many variables are contained in the data?
- 15 variables
b. Which of the following questions could not be answered using this data set? Briefly explain.
i. In what month do most shark attacks occur?
- July
ii. Are shark attacks more likely to occur in warm temperatures or cooler temperatures?
- I would say warmer water, judging by the locations and seasons.
iii. Attacks by which species of shark are more likely to result in a fatality?
- This can't be answered, because the species often isn't listed when there is a fatality,
and when it is listed I can't see the name in the data set.
iv. What country has the most shark attacks per year?
- Australia
c. A researcher wants to understand the age of the people in the data set and proposed some
questions of interest: Are the reported cases mostly younger people or older people?
How is the age distributed? How would you help the researcher answer these questions?
What statistical tools (e.g., graphs, measures) will you use? (You only need to describe
your approach)
- The reported cases are often people in either their early 20s or their late 50s.
- I would use a scatterplot or histogram of the ages so that their distribution is easy to see.
2. The scores of a quiz are displayed in the graph below.
a. Describe the shape of distribution
- It is left-skewed.
b. Would the mean score be greater than, less than, or about the same as the median score?
Explain.
- The mean is probably less than the median, since the distribution is left-skewed and the
low scores in the tail pull the mean down.
c. What measures would you use to report the center and spread? Explain.
- For the center, I would find the mean, median, and mode.
- For the spread, I would find the range and standard deviation.
3. The distribution of test scores in a class is unimodal and symmetric with a mean of 80
pts and a standard deviation of 7pts. Based on the information, Adam estimated that his
score is higher than approximately 97.5% of the students in class. What score did Adam
receive? Explain.
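- His score was 94. By the empirical rule, about 95% of scores lie within 2 standard
deviations of the mean, so about 97.5% lie below 80 (the mean) + 2 × 7 (the standard deviation) = 94.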
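The arithmetic above can be sketched in R. The second calculation assumes the scores are exactly normal, which the problem does not state (it only says unimodal and symmetric), so it is shown for comparison only.

```r
mean_score <- 80
sd_score <- 7

# Empirical rule: about 97.5% of scores fall below mean + 2 standard deviations
empirical_cutoff <- mean_score + 2 * sd_score
empirical_cutoff   # 94

# For comparison, the exact 97.5th percentile under a normal model (an assumption)
normal_cutoff <- qnorm(0.975, mean = mean_score, sd = sd_score)
normal_cutoff      # about 93.72
```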
4. Assume that both men and women’s heights have symmetric and unimodal
distributions. Women’s distribution has a mean of 64 inches and a standard deviation of
2.5 inches. Men’s distribution has a mean of 69 inches and a standard deviation of 3
inches.
a. What women’s height corresponds with a z-score of -1.50?
- 60.25 inches, from 64 + (-1.50)(2.5)
b. Professional basketball player Evelyn Akhator is 75 inches tall and plays in the WNBA
(women’s league). Professional basketball player Draymond Green is 79 inches tall and
plays in the NBA (men’s league). Compared to their own peers, who is taller?
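- Evelyn Akhator is taller relative to her peers. Her z-score is (75 - 64) / 2.5 = 4.4, while
Draymond Green's is (79 - 69) / 3 ≈ 3.33, so she is further above the women's mean than he is
above the men's mean.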
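A short sketch of the z-score calculations for question 4, using only the means and standard deviations given in the problem. The helper name z_score is hypothetical.

```r
# Hypothetical helper: how many standard deviations x lies from the mean
z_score <- function(x, mean, sd) (x - mean) / sd

# Part a: height of a woman with z = -1.50
women_height <- 64 + (-1.50) * 2.5
women_height   # 60.25

# Part b: compare each player to their own league's distribution
z_evelyn   <- z_score(75, mean = 64, sd = 2.5)
z_draymond <- z_score(79, mean = 69, sd = 3)
z_evelyn     # 4.4
z_draymond   # about 3.33
```

The larger z-score identifies the player who is taller relative to their own peers.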
5. The top ten movies based on Marvel comic book characters for the U.S. box office as
of fall 2017 are shown in the following table, with domestic gross rounded to the nearest
hundred million. (Source: ultimatemovieranking.com)
a. Report the five-number summary of the domestic gross income.
- Min: 363
- Q1: 408
- Med: 428.5
- Q3: 471
- Max: 677
b. Interpret the five-number summary in context, i.e., what information can you obtain
about the distribution of the domestic gross income?
- Older movies grossed more than the newer ones (2017).
- The Avengers grossed both the most and the least.
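The five-number summary reported in part a also describes spread and skew directly. A minimal sketch, using only those five values and the common 1.5 × IQR rule for flagging potential outliers (the rule is an assumption here, not something the question specifies):

```r
# Five-number summary reported in part a
five_num <- c(min = 363, q1 = 408, median = 428.5, q3 = 471, max = 677)

# The middle half of the movies spans this range
iqr <- five_num[["q3"]] - five_num[["q1"]]
iqr           # 63

# Fences under the 1.5 * IQR rule
lower_fence <- five_num[["q1"]] - 1.5 * iqr
upper_fence <- five_num[["q3"]] + 1.5 * iqr
lower_fence   # 313.5
upper_fence   # 565.5

# The maximum lies above the upper fence, so it is a potential outlier
five_num[["max"]] > upper_fence   # TRUE
```

The distance from Q3 to the maximum (206) is much larger than from the minimum to Q1 (45), which points to a right-skewed distribution.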
6. The data set below shows the number of central public libraries in 32 states.
The five-number summary is given as:
Minimum   Q1   Median   Q3    Maximum
1         62   91       218   756
Sketch a boxplot using the five-number summary above and the data below.
Mark the values of the quartiles, the lower whisker, the upper whisker, and any potential
outliers in the boxplot. Explain how you determined the length of the whiskers.
(The scale of the plot does not need to be accurate)
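A sketch of how the whisker limits could be worked out from the five-number summary, assuming the usual 1.5 × IQR convention for boxplots. The exact end of the upper whisker depends on the individual data values, which are not reproduced here.

```r
# Five-number summary given in the question
lib_min <- 1
q1 <- 62
med <- 91
q3 <- 218
lib_max <- 756

iqr <- q3 - q1
iqr           # 156

# Fences under the 1.5 * IQR convention
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
lower_fence   # -172
upper_fence   # 452

# The minimum is inside the lower fence, so the lower whisker ends at the minimum
lib_min >= lower_fence   # TRUE

# The maximum is beyond the upper fence, so it is a potential outlier; the upper
# whisker ends at the largest data value that is at most upper_fence
lib_max > upper_fence    # TRUE
```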