ANguyen - Drills with R Week 1

Week 1 – Drills with R
An Nguyen
University of the Cumberlands
Statistics for Data Science (MSDS-531-M30) – Full Term
Dr. Ora Denton
January 21st, 2024

1. The student directory for a large university has 400 pages with 130 names per page, a total of 52,000 names. Using software, show how to select a simple random sample of 10 names.

- I assume every student has an ID number assigned, and that if the directory is saved as a data frame it has an index column running from 1 to 52000, one value per student. The easiest way to pick 10 random names is then to draw a random sample of 10 index numbers from 1:52000, without replacement.
- R code to do so:

    # Create a variable to store the index numbers from 1 to 52000
    studentDirectory <- 1:52000
    # Take a random sample of 10 (without replacement, the default), assign it to a variable and print it
    randomSample <- sample(studentDirectory, 10)
    print(randomSample)

- Result:

2. From the Murder data file, use the variable murder, which is the murder rate (per 100,000 population) for each state in the U.S. in 2017 according to the FBI Uniform Crime Reports. At first, do not use the observation for D.C. (DC). Using software:

a) Find the mean and standard deviation and interpret their values.

- R code to do so:

    # Question 2a
    # Read the murder data into murderAll
    murderAll <- read.table("https://stat4ds.rwth-aachen.de/data/Murder.dat", header = TRUE)
    # Keep every row except DC in murderNoDC
    murderNoDC <- murderAll[murderAll$state != "DC", ]
    # Calculate the mean murder rate of all states except DC
    meanNoDC <- mean(murderNoDC$murder)
    # Calculate the standard deviation of the murder rate of all states except DC
    sdNoDC <- sd(murderNoDC$murder)
    # Print meanNoDC and sdNoDC
    print(meanNoDC)
    print(sdNoDC)

- Results:
- Mean & standard deviation interpretation: Excluding DC, the states have an average murder rate of 4.874 per 100,000 population. This is an unweighted average of the state rates, so it describes where the state murder rates are centered rather than the national rate. The standard deviation of 2.586291 measures the dispersion of the murder rates about that mean. However, without further details about the distribution of the murder rates, it's hard to say whether that standard deviation is considered high or low. What I can say is that a low standard deviation would mean most states have murder rates relatively close to the mean of 4.874, while a high standard deviation would indicate otherwise.

b) Find the five-number summary and construct the corresponding boxplot.

- R code to do so:

    # Question 2b
    # Get the five-number summary of the murder rate by state, without DC
    # (summary() also reports the mean alongside the five numbers)
    sumNoDC <- summary(murderNoDC$murder)
    # Print the summary
    print(sumNoDC)
    # Generate a boxplot of the murder rate by state, without DC
    boxplot(murderNoDC$murder, ylab = "Murder Rate")

- Result:
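The link between the five-number summary and the boxplot in part (b) can be sketched on a small made-up vector. This is only an illustration under stated assumptions: the `rates` values are hypothetical and are not taken from the Murder data file, and the variable names are chosen here for readability. It uses `fivenum()` for Tukey's five numbers and the usual 1.5 × hinge-spread rule that `boxplot()` applies by default when it draws points beyond the whiskers.

```r
# Hypothetical rates, made up for illustration (NOT the Murder data)
rates <- c(1.0, 2.2, 2.5, 3.1, 3.9, 4.6, 5.0, 5.3, 6.1, 7.8, 24.2)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
fiveNum <- fivenum(rates)
names(fiveNum) <- c("min", "lowerHinge", "median", "upperHinge", "max")
print(fiveNum)

# Fences at 1.5 times the hinge spread beyond each hinge
hingeSpread <- fiveNum[["upperHinge"]] - fiveNum[["lowerHinge"]]
lowerFence <- fiveNum[["lowerHinge"]] - 1.5 * hingeSpread
upperFence <- fiveNum[["upperHinge"]] + 1.5 * hingeSpread

# Observations outside the fences are the ones a boxplot draws as separate points
flagged <- rates[rates < lowerFence | rates > upperFence]
print(flagged)

# Cross-check against the values boxplot() itself would flag
print(boxplot.stats(rates)$out)
```

Computing the fences by hand makes it easier to name which observations the boxplot treats as outliers, rather than reading them off the plot.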

c) Now include the observation for D.C. What is affected more by this outlier: The mean or the median?

- R code to do so:

    # Question 2c
    # Calculate the mean murder rate of all states including DC
    meanWDC <- mean(murderAll$murder)
    # Calculate the median murder rate of all states including DC
    medianWDC <- median(murderAll$murder)
    # Print meanWDC and medianWDC
    print(meanWDC)
    print(medianWDC)
    # To understand the statistics better, generate a boxplot of the murder rate by state
    boxplot(murderAll$murder, ylab = "Murder Rate")

- Result:

- Without DC, the mean and median were 4.874 and 4.85, respectively. After taking DC into consideration, the mean increased by 0.378941 to 5.252941, while the median increased by only 0.15, to 5. The mean was therefore affected by the outlier (DC's murder rate) more than the median.

3. The Houses data file lists the selling price (thousands of dollars), size (square feet), tax bill (dollars), number of bathrooms, number of bedrooms, and whether the house is new (1 = yes, 0 = no) for 100 home sales in Gainesville, Florida. Let's analyze the selling prices.

a) Construct a frequency distribution and a histogram.

- R code to do so:

    # Question 3a
    # Read the house data into houses
    houses <- read.table("https://stat4ds.rwth-aachen.de/data/Houses.dat", header = TRUE)
    # Construct a frequency table: 10 equally spaced break points give 9 intervals
    priceSeq <- seq(min(houses$price), max(houses$price), length.out = 10)
    housesFreqTab <- table(cut(houses$price, breaks = priceSeq, include.lowest = TRUE))
    print(housesFreqTab)
    # Construct a histogram on the same break points
    hist(houses$price, breaks = priceSeq, xlab = "Selling Prices", ylab = "Frequency")

- Result:
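As a rough sketch of how the frequency table and the histogram in part (a) relate, the example below bins a small made-up vector and checks that `cut()` plus `table()` gives the same counts as `hist()` on the same breaks. Everything in it is an assumption for illustration: the `prices` values are hypothetical (not the Houses data), and the rounded breaks in `priceBreaks` are a hand-picked alternative to equally spaced breaks between the minimum and maximum.

```r
# Hypothetical selling prices in thousands of dollars (made up, NOT the Houses data)
prices <- c(95, 120, 135, 150, 150, 175, 210, 240, 310, 480)

# Hand-picked round break points (an assumption, chosen for readable class limits)
priceBreaks <- seq(0, 500, by = 100)

# Frequency distribution: intervals are (a, b], with the lowest one closed on the left
freqTab <- table(cut(prices, breaks = priceBreaks, include.lowest = TRUE))

# Relative frequencies in percent
relFreq <- round(100 * prop.table(freqTab), 1)

# Show counts and percentages side by side
freqDist <- data.frame(
  interval = names(freqTab),
  count = as.vector(freqTab),
  percent = as.vector(relFreq)
)
print(freqDist)

# hist() with plot = FALSE returns the bin counts without drawing anything
histCounts <- hist(prices, breaks = priceBreaks, plot = FALSE)$counts
print(histCounts)
```

Round break points tend to give class limits that are easier to report than limits derived from the sample minimum and maximum, at the cost of choosing the range by hand.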

b) Find the percentage of observations that fall within one standard deviation of the mean.

- R code to do so:

    # Question 3b
    # Calculate the mean and sd, then the lower and upper limits of the one-sd range
    meanPrice <- mean(houses$price)
    sdPrice <- sd(houses$price)
    lowerRange <- meanPrice - sdPrice
    upperRange <- meanPrice + sdPrice
    # Count the observations that fall inside the range
    withinRange <- sum(houses$price >= lowerRange & houses$price <= upperRange)
    # Calculate the % of observations that fall within 1 sd of the mean
    percWithinRange <- (withinRange / nrow(houses)) * 100
    print(percWithinRange)

- Result:

c) Construct a boxplot.

- R code to do so:

    # Question 3c
    boxplot(houses$price, ylab = "Selling Price")

- Result:
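Returning to part (b), the within-one-standard-deviation calculation can be wrapped in a small reusable function and compared with the roughly 68% that a normal distribution would give. This is a minimal sketch under stated assumptions: `percentWithinSd` is a hypothetical helper written for this note (it is not a base R function), and the `prices` vector is made up rather than taken from the Houses data.

```r
# Hypothetical helper (not part of base R): percentage of values within k SDs of the mean
percentWithinSd <- function(x, k = 1) {
  x <- x[!is.na(x)]                           # drop missing values first
  centre <- mean(x)
  spread <- sd(x)
  100 * mean(abs(x - centre) <= k * spread)   # share of values inside the range, in percent
}

# Made-up, right-skewed prices in thousands of dollars (NOT the Houses data)
prices <- c(95, 120, 135, 150, 150, 175, 210, 240, 310, 480)

withinOneSd <- percentWithinSd(prices, k = 1)
print(withinOneSd)

# Benchmark: share of a normal distribution within one SD of its mean
normalBenchmark <- 100 * (pnorm(1) - pnorm(-1))
print(normalBenchmark)
```

A sample percentage well above or below the normal benchmark is one informal hint that the distribution is skewed or has outliers, which is also what the boxplot in part (c) is meant to show.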
