STATS 10 Assignment 1

School

University of California, Los Angeles

Course

10

Subject

Mathematics

Date

Apr 3, 2024

STATS 10 Assignment 1
Lalonye Calhoun
006059433
Discussion 3A/B

Please submit both parts of the assignment in one single PDF file. You can use any PDF editor software to merge the two parts into one file. Please make sure that the questions are in the correct order and clearly labeled, and that the answers are legible and easy to read. To submit your assignment, upload the PDF file under the designated assignment page on the course website before the deadline specified. Email or hard copy submissions are not accepted.

Part I
Include both the R commands and their corresponding outputs, results, or answers for all exercise questions in Part I.

1. Vectors:
a. Create a vector named heights that contains the heights, in inches, of yourself and two students near you. Print the contents of this vector.
b. Create a vector named names that contains the names of these people. Print the contents of this vector.
c. Try typing cbind(heights, names). What did this command do? What class is this new object? Hint: Try the class() function.

2. Downloading data:
a. Download the data set births.csv from the course site and upload it into RStudio. Name the data frame NCbirths.
b. Demonstrate that you have been successful by typing head(NCbirths) and copying and pasting the output into your word processing document.

  Gender Premie weight Apgar1 Fage Mage
1   Male     No    124      8   31   25
2 Female     No    177      8   36   26
3   Male     No    107      3   30   16
4 Female     No    144      6   33   37
5   Male     No    117      9   36   33
6 Female     No     98      4   31   29
  Feduc Meduc TotPreg Visits   Marital
1    13    14       1     13   Married
2     9    12       2     11 Unmarried
3    12     8       2     10 Unmarried
4    12    14       2     12 Unmarried
5    10    16       2     19   Married
6    14    16       3     20   Married
  Racemom Racedad Hispmom Hispdad Gained
1   White   White NotHisp NotHisp     40
2   White   White Mexican Mexican     20
3   White Unknown Mexican Unknown     70
4   White   White NotHisp NotHisp     50
5   White   Black NotHisp NotHisp     40
6   White   White NotHisp NotHisp     21
      Habit MomPriorCond BirthDef
1 NonSmoker         None     None
2 NonSmoker         None     None
3 NonSmoker At Least One     None
4 NonSmoker         None     None
5 NonSmoker At Least One     None
6 NonSmoker         None     None
     DelivComp BirthComp
1 At Least One      None
2 At Least One      None
3 At Least One      None
4 At Least One      None
5         None      None
6         None      None

3. Package loading
a. Install the maps package. Verify its installation by typing find.package("maps") and include the output in your answer.
- [1] "C:/Users/lalon/AppData/Local/R/win-library/4.3/maps"
b. Type library(maps) to load up the package. Type map("state") and include the plot output in your answer.

Use the births data set for questions 4-11.

4. Perform vector operations
a. Extract the weight variable as a vector from the data frame.
- weights <- NCbirths$weight
b. What units do you think the weights are in?
- ounces
c. Create a new vector named weights_in_pounds which are the weights of the babies in pounds. You can look up conversion factors on the internet.
d. Demonstrate your success by typing weights_in_pounds[1:20] and including the output in your word processing document.
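The following is a minimal sketch for parts c and d. It assumes that weight is recorded in ounces, as answered in part b, so dividing by 16 converts each value to pounds. births.csv is not reproduced here, so the vector hard-codes the first 20 weights from the NCbirths rows printed in this assignment. With the data loaded, you would start from weights <- NCbirths$weight instead. The output pasted for part d shows data-frame rows, not the converted vector. The indexing line below prints what weights_in_pounds[1:20] returns for those rows.

```r
# Assumption: weight is in ounces; 1 pound = 16 ounces.
# With births.csv loaded this would be: weights <- NCbirths$weight
# Here: the first 20 weights from the NCbirths rows printed in this assignment.
weights <- c(124, 177, 107, 144, 117, 98, 147, 138, 104, 123,
             153, 129, 119, 108, 106, 125, 115, 128, 132, 83)

# Part c: R arithmetic is vectorized, so every element is converted at once
weights_in_pounds <- weights / 16

# Part d: first 20 converted weights (7.75, 11.0625, 6.6875, ...)
weights_in_pounds[1:20]

# Converting the mean gives the same value as averaging the converted weights
mean(weights_in_pounds)
mean(weights) / 16
```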
   Gender Premie weight Apgar1 Fage Mage Feduc
1    Male     No    124      8   31   25    13
2  Female     No    177      8   36   26     9
3    Male     No    107      3   30   16    12
4  Female     No    144      6   33   37    12
5    Male     No    117      9   36   33    10
6  Female     No     98      4   31   29    14
7    Male     No    147      8   33   30    12
8    Male     No    138      9   22   20    14
9  Female     No    104      9   30   21    12
10 Female     No    123      9   23   18    12
11 Female     No    153      8   32   31    16
12   Male     No    129      8   34   22     9
13   Male     No    119      8   27   28    17
14   Male     No    108      9   30   16    12
15 Female     No    106      8   28   29    13
16 Female     No    125      6   30   37    12
17 Female     No    115      8   30   18    12
18   Male     No    128      9   21   19    12
19   Male     No    132      8   33   21    10
20 Female     No     83      9   30   30     9

5. What is the mean weight of the babies in pounds?
- 116 (this value is on the ounce scale; converted to pounds it is about 116 / 16 = 7.25)
a. What percentage of the mothers in the sample smoke? Hint: use the tally function with the format argument. Use the help screen for guidance.
- 90.6% are non-smokers and 9.3% are smokers.
b. According to the Centers for Disease Control, approximately 21% of adult Americans are smokers. How far off is the percentage you found in part a from the CDC's report?
- 11.7 percentage points (21% - 9.3%)

6. Produce three different histograms of the weights in pounds. Use 3 bins, 20 bins, and 100 bins. Which histogram seems to give the best visualization, and why?
- The 20-bin histogram gives the best representation. It is readable and uses neither too few nor too many bins: 3 bins lump the data together and hide the shape of the distribution, while 100 bins split it into so many narrow bars that the plot becomes noisy.

7. We can use the syntax boxplot(vector1, vector2) to make a side-by-side boxplot. Create a side-by-side boxplot of the mothers' ages and the fathers' ages. Which gender tends to be older?
- Females tend to be older.

8. Try typing histogram(~ weight | Habit, data = NCbirths, layout = c(1, 2)). Describe what this code does. Based on the graph, do you see any major differences between baby weights from smoking moms vs. non-smoking moms?
- It draws two histograms of baby weight, one for each level of Habit (smoking and non-smoking mothers). They are stacked in a single column (layout = c(1, 2)) so the two groups can be compared on the same axis.
- I don't see any major difference, except that smoking mothers' babies seem to have more variable weights. They may also be more likely to be premature or borderline premature (around 6 pounds).

9. Produce a dot plot of the weights in pounds.
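The following is a minimal base-R sketch of questions 6, 7 and 9. It assumes that only the 20 NCbirths rows printed in this assignment are available, so the weights, mothers' ages (Mage) and fathers' ages (Fage) are typed in by hand. With the full data you would use NCbirths$weight / 16, NCbirths$Mage and NCbirths$Fage instead. hist() treats breaks = n as a suggestion only and picks "pretty" cut points near n bins, so the bin counts are approximate.

```r
# Assumption: only the 20 NCbirths rows printed in this assignment are used.
weights_in_pounds <- c(124, 177, 107, 144, 117, 98, 147, 138, 104, 123,
                       153, 129, 119, 108, 106, 125, 115, 128, 132, 83) / 16
mage <- c(25, 26, 16, 37, 33, 29, 30, 20, 21, 18,
          31, 22, 28, 16, 29, 37, 18, 19, 21, 30)  # mothers' ages (years)
fage <- c(31, 36, 30, 33, 36, 31, 33, 22, 30, 23,
          32, 34, 27, 30, 28, 30, 30, 21, 33, 30)  # fathers' ages (years)

# Q6: histograms with about 3, 20 and 100 bins, side by side
par(mfrow = c(1, 3))
h3   <- hist(weights_in_pounds, breaks = 3,   main = "About 3 bins",   xlab = "Weight (lb)")
h20  <- hist(weights_in_pounds, breaks = 20,  main = "About 20 bins",  xlab = "Weight (lb)")
h100 <- hist(weights_in_pounds, breaks = 100, main = "About 100 bins", xlab = "Weight (lb)")
par(mfrow = c(1, 1))

# Q7: side-by-side boxplots of mothers' and fathers' ages
boxplot(mage, fage, names = c("Mother", "Father"), ylab = "Age (years)")

# Q9: dot plot; method = "stack" piles up repeated values instead of overplotting
stripchart(weights_in_pounds, method = "stack", pch = 19, xlab = "Weight (lb)")
```

In these 20 rows the median father's age is 30 and the median mother's age is 25.5. Whether that pattern holds for the whole data set has to be checked with the full NCbirths columns.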
10. Consider the other categorical variables in this data. Of those that record the health of the baby, which do you think will be associated with the mother's smoking and why? Make a two-way summary table to check your hypothesis. Do you have evidence that this variable is associated with smoking? Why?
- Birth complications, because I think smoking could affect how the baby is born. I have evidence: there is at least one case consistent with this hypothesis.
- Birth defects, because I have heard that drugs can have physical effects on the fetus. I have evidence that at least one case is linked to this hypothesis.
- Hospital visits: smoking could make it harder for pregnant women to breathe, so more visits might be necessary. Alternatively, smoking mothers might make fewer appointments, since they are already aware of their bad habit.

11. Produce a nicely formatted scatter plot of the weight of the baby vs. the mother's age.
plot(NCbirths$Mage, NCbirths$weight, xlab = "Mother's age (years)", ylab = "Baby's weight (ounces)", main = "Baby's weight vs. mother's age")

Part II
You may choose to type or write your answers electronically or scan your handwritten solutions. Please ensure that you show all steps and explanations to receive full credit,
unless otherwise instructed.

1. A data set on Shark Attacks Worldwide posted on StatCrunch records data on all shark attacks in recorded history, including attacks before 1800. The data set can be viewed here: https://www.statcrunch.com/app/index.html?dataid=2188687
a. How many variables are contained in the data?
- 15 variables
b. Which of the following questions could not be answered using this data set? Briefly explain.
i. In what month do most shark attacks occur?
- July
ii. Are shark attacks more likely to occur in warm temperatures or cooler temperatures?
- I would say warmer water, judging by the locations and seasons.
iii. Attacks by which species of shark are more likely to result in a fatality?
- This can't be answered. The species often isn't listed when there is a fatality, and when it is, I can't see the name in the data set.
iv. What country has the most shark attacks per year?
- Australia
c. A researcher wants to understand the age of the people in the data set and proposed some questions of interest: Are the reported cases mostly younger people or older people? How is the age distributed? How would you help the researcher answer these questions? What statistical tools (e.g., graphs, measures) will you use? (You only need to describe your approach.)
- The reported cases are often either people in their early 20s or late 50s.
- I would use a histogram (or a boxplot) of the victims' ages so the shape of the distribution is easy to see. I would also report the median and quartiles to summarize the center and spread of the ages.

2. The scores of a quiz are displayed in the graph below.
a. Describe the shape of the distribution.
- It is left-skewed.
b. Would the mean score be greater than, less than, or about the same as the median score? Explain.
- The mean is probably less than the median. The distribution is left-skewed, and the low scores in the left tail pull the mean down.
c. What measures would you use to report the center and spread? Explain.
- For the center, I'd find the mean, median, and mode. Because the distribution is skewed, the median best represents a typical score.
- For the spread, I'd find the range and standard deviation, along with the IQR, which is less affected by the skew.

3. The distribution of test scores in a class is unimodal and symmetric with a mean of 80 pts and a standard deviation of 7 pts. Based on the information, Adam estimated that his score is higher than approximately 97.5% of the students in class. What score did Adam receive? Explain.
- His score was 94. By the Empirical Rule, about 95% of scores fall within 2 standard deviations of the mean, which leaves about 2.5% above the mean + 2 SD. A score higher than about 97.5% of the class is therefore 80 + 2 × 7 = 94.
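The arithmetic in question 3 can be checked with a short R sketch. The Empirical Rule step uses only the given mean and SD. The qnorm() line adds an assumption the problem does not make, namely that scores are exactly normal, to show how close the rule-of-thumb answer is.

```r
mu <- 80    # class mean (points)
sigma <- 7  # class standard deviation (points)

# Empirical Rule: ~95% of scores lie within 2 SDs of the mean, leaving ~2.5%
# in the upper tail, so mean + 2 SD sits near the 97.5th percentile.
empirical_score <- mu + 2 * sigma

# Extra assumption (not in the problem): scores are exactly normal.
# qnorm() returns the score with 97.5% of the distribution below it.
normal_score <- qnorm(0.975, mean = mu, sd = sigma)

empirical_score         # 94
round(normal_score, 2)  # 93.72
```

The two answers differ by less than a third of a point. This is why the Empirical Rule answer of 94 is a reasonable estimate.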
4. Assume that both men's and women's heights have symmetric and unimodal distributions. Women's distribution has a mean of 64 inches and a standard deviation of 2.5 inches. Men's distribution has a mean of 69 inches and a standard deviation of 3 inches.
a. What women's height corresponds with a z-score of -1.50?
- 60.25 inches (64 - 1.5 × 2.5 = 60.25)
b. Professional basketball player Evelyn Akhator is 75 inches tall and plays in the WNBA (women's league). Professional basketball player Draymond Green is 79 inches tall and plays in the NBA (men's league). Compared to their own peers, who is taller?
- Evelyn. Her z-score is (75 - 64) / 2.5 = 4.4, so she is 4.4 standard deviations above the mean for women. Draymond's z-score is (79 - 69) / 3 ≈ 3.33, so he is only about 3.33 standard deviations above the mean for men.

5. The top ten movies based on Marvel comic book characters for the U.S. box office as of fall 2017 are shown in the following table, with domestic gross rounded to the nearest hundred million. (Source: ultimatemovieranking.com)
a. Report the five-number summary of the domestic gross income.
- Min: 363
- Q1: 408
- Median: 428.5
- Q3: 471
- Max: 677
b. Interpret the five-number summary in context, i.e., what information can you obtain about the distribution of the domestic gross income?
- Older movies grossed more than the newer ones (2017).
- The Avengers grossed the most and the least.

The data set below shows the number of central public libraries in 32 states. The five-number summary is given as:

Minimum  Q1  Median  Q3   Maximum
1        62  91      218  756

Sketch a boxplot using the five-number summary above and the data below. Mark the values of the quartiles, the lower whisker, the upper whisker, and any potential outliers in the boxplot. Explain how you determined the length of the whiskers. (The scale of the plot does not need to be accurate.)
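The z-score arithmetic in question 4 and the fences that usually set whisker lengths can be sketched in R. The helper names z_score(), from_z() and fences() are hypothetical and serve only as illustration. The 1.5 × IQR rule is assumed here because the assignment does not name a rule. It is the common convention and matches the default of base R's boxplot() (range = 1.5).

```r
# Hypothetical helpers for illustration.
# z-score: how many SDs a value lies above (+) or below (-) its group's mean
z_score <- function(x, mean, sd) (x - mean) / sd
# Inverse: the value that has a given z-score
from_z <- function(z, mean, sd) mean + z * sd

# Question 4a: women's height with z = -1.50
from_z(-1.5, mean = 64, sd = 2.5)               # 60.25

# Question 4b: compare each player with their own league's distribution
z_evelyn   <- z_score(75, mean = 64, sd = 2.5)  # 4.4
z_draymond <- z_score(79, mean = 69, sd = 3)    # about 3.33

# 1.5 x IQR rule: points beyond these fences are potential outliers,
# and each whisker stops at the most extreme data value inside its fence
fences <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

fences(408, 471)  # Marvel domestic gross: lower 313.5, upper 565.5
fences(62, 218)   # central public libraries: lower -172, upper 452
```

For the libraries, no state can fall below the lower fence of -172, so the lower whisker runs to the minimum, 1. The maximum, 756, lies beyond the upper fence of 452, so it is a potential outlier. The upper whisker ends at the largest value that is at most 452, and that value has to be read from the data list. For the Marvel films, the maximum of 677 likewise lies above the upper fence of 565.5.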