ANguyen - Drills with R Week 6

School

University of South Florida

Course

6217

Subject

Statistics

Date

Apr 3, 2024

Uploaded by qinhann

Week 6 – Drills with R
An Nguyen
University of the Cumberlands
Statistics for Data Science (MSDS-531-M30) – Full Term
Dr. Ora Denton
February 14th, 2024

1. A study investigates the distribution of annual income for heads of households living in public housing in Chicago. For a random sample of size 30, the annual incomes (in thousands of dollars) are in the Chicago data file.

a) Based on a descriptive graphic, describe the shape of the sample data distribution. Find and interpret point estimates of the population mean and standard deviation.

- R code to do so:

#Question 1a
#Assign the Chicago data to var chicagoData
chicagoData <- read.table("https://stat4ds.rwth-aachen.de/data/Chicago.dat", header = TRUE)
#Get numeric summary for chicagoData
sumChicagoData <- summary(chicagoData$income)
#Print the summary
print(sumChicagoData)
#Install ggplot2 only if it is missing, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)
#Generate the histogram (base R graphics; straight quotes are required in R)
hist(chicagoData$income, breaks = 10, xlab = "Income (k dollars)")
#Point estimates of the population mean and standard deviation
chicagoMean <- mean(chicagoData$income)
chicagoSD <- sd(chicagoData$income)
print(chicagoMean)
print(chicagoSD)

- Results:

- Data distribution shape:

The income distribution for the sample of breadwinners in Chicago public housing is clearly skewed to the right: most of these households have low incomes, and only a few have high annual incomes. The numerical summary also shows that the mean annual income is higher than the median, which is consistent with a right-skewed distribution, because a few large incomes pull the mean upward.

- Mean & Standard deviation interpretation:

The sample mean shows that heads of households in Chicago's public housing earn an average of $k per year. This is the point estimate of the population mean annual income. The sample standard deviation, $k, is the point estimate of the population standard deviation and describes how far incomes typically fall from the mean.

b) Construct a 95% confidence interval for μ, using R software.

- R code to do so:

#Question 1b
# Perform a one-sample t-test; its output includes the 95% confidence interval for the mean
result <- t.test(chicagoData$income)
print(result)

- Result:
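As a rough illustration of where the interval reported by t.test() comes from, the sketch below rebuilds a 95% t interval by hand as mean ± t critical value × standard error and compares it with t.test(). The income vector here is made up for illustration and is not the Chicago sample; the variable names are also assumptions.

```r
# Hypothetical incomes in thousands of dollars (NOT the Chicago data)
income <- c(12, 15, 18, 20, 22, 25, 31, 40)

n     <- length(income)          # sample size
xbar  <- mean(income)            # point estimate of the population mean
se    <- sd(income) / sqrt(n)    # estimated standard error of the mean
tcrit <- qt(0.975, df = n - 1)   # t quantile for a two-sided 95% interval

# Interval built by hand
manual_ci <- c(xbar - tcrit * se, xbar + tcrit * se)

# Interval reported by t.test (default conf.level = 0.95)
ttest_ci <- as.numeric(t.test(income)$conf.int)

print(manual_ci)
print(ttest_ci)
```

The two printed intervals should agree, which is a quick way to check that the t.test() output is being read correctly.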


2. The Anorexia data file contains results for the cognitive behavioral and family therapies and the control group. Using data for the 17 girls who received the family therapy:

a) Conduct a descriptive statistical analysis using graphs and numerical summaries.

- R code to do so:

#Question 2a
# Assign the anorexia data to var anorexiaData
anorexiaData <- read.table("https://stat4ds.rwth-aachen.de/data/Anorexia.dat", header = TRUE)
# Inspect anorexiaData structure
str(anorexiaData)
# Convert anorexiaData$therapy to factor type
anorexiaData$therapy <- as.factor(anorexiaData$therapy)
#Create boxplot to visualize weight change distribution (requires library(ggplot2); straight quotes are required in R)
ggplot(anorexiaData, aes(x = therapy, y = after - before)) +
  geom_boxplot() +
  xlab("Therapy") +
  ylab("Weight Change") +
  ggtitle("Weight Change by Therapy")

- Result:

- Descriptive statistical analysis:

b) Construct a 95% confidence interval for the difference between the population mean weight changes for the family therapy and the control.

- R code to do so:

#Question 2b
#Subset data for the family therapy group
familyData <- subset(anorexiaData, therapy == "f")
#Subset data for control therapy group

- Result:
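For reference, a 95% confidence interval for a difference between two population means can be obtained by passing two samples to t.test(). The sketch below is only an illustration under stated assumptions: the two vectors are made-up weight changes, not the Anorexia data, the names family_change and control_change are hypothetical, and it uses R's default Welch interval (var.equal = FALSE), which may or may not match the method used in the original answer.

```r
# Hypothetical weight changes (NOT the Anorexia data)
family_change  <- c(4, 6, 9, 11, 7, 3, 8, 10)
control_change <- c(1, -2, 0, 3, -1, 2, -3, 4)

# Welch two-sample t-test; conf.int is the interval for
# mean(family_change) - mean(control_change)
welch <- t.test(family_change, control_change, conf.level = 0.95)

# Point estimate of the difference in means
diff_means <- mean(family_change) - mean(control_change)

print(diff_means)
print(welch$conf.int)
```

If the interval excludes 0, the data suggest the two population mean weight changes differ at the 5% significance level.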