Lab 3


School

Syracuse University

Course

687

Subject

Statistics

Date

Feb 20, 2024


9/14/23, 11:54 AM Lab3.knit file:///C:/Users/mahaj/Downloads/IDS/Lab3.html 1/8

Intro to Data Science - Lab 3
Copyright 2022, Jeffrey Stanton and Jeffrey Saltz
Please do not post online.

Week 3 - Using Descriptive Statistics & Writing Functions

# Enter your name here: Swapnil Deore

Please include nice comments.

Instructions: Run the necessary code on your own instance of R-Studio.
Save the code: It will be useful on your homework!

Attribution statement: (choose only one and delete the rest)
# 2. I did this lab assignment with help from the book and the professor and these Internet sources:

1. Get an explanation of the contents of the state.x77 data set: help("state.x77")

    help("state.x77")
    ## starting httpd help server ... done

2. Create a dataframe from the built-in state.x77 data set, store in a variable named dfStates77

    dfStates77 <- as.data.frame(state.x77)

3. Summarize the variables in your dfStates77 data set - using the summary() function

    summary(dfStates77)

    ##    Population        Income       Illiteracy       Life Exp
    ##  Min.   :  365   Min.   :3098   Min.   :0.500   Min.   :67.96
    ##  1st Qu.: 1080   1st Qu.:3993   1st Qu.:0.625   1st Qu.:70.12
    ##  Median : 2838   Median :4519   Median :0.950   Median :70.67
    ##  Mean   : 4246   Mean   :4436   Mean   :1.170   Mean   :70.88
    ##  3rd Qu.: 4968   3rd Qu.:4814   3rd Qu.:1.575   3rd Qu.:71.89
    ##  Max.   :21198   Max.   :6315   Max.   :2.800   Max.   :73.60
    ##      Murder          HS Grad          Frost             Area
    ##  Min.   : 1.400   Min.   :37.80   Min.   :  0.00   Min.   :  1049
    ##  1st Qu.: 4.350   1st Qu.:48.05   1st Qu.: 66.25   1st Qu.: 36985
    ##  Median : 6.850   Median :53.25   Median :114.50   Median : 54277
    ##  Mean   : 7.378   Mean   :53.11   Mean   :104.46   Mean   : 70736
    ##  3rd Qu.:10.675   3rd Qu.:59.15   3rd Qu.:139.75   3rd Qu.: 81163
    ##  Max.   :15.100   Max.   :67.30   Max.   :188.00   Max.   :566432

4. Calculate the total population of the U.S. by adding together the populations of each of the individual states in dfStates77. Store the result in a new variable called totalPop77.

    totalPop77 <- sum(dfStates77$Population)

5. Use R code to read a CSV data file directly from the web. Store the dataset into a new dataframe, called dfStates17. The URL is: "https://intro-datascience.s3.us-east-2.amazonaws.com/statesNew.csv"
Note: Use the function read_csv( ) to read in the data. You will need to run library(tidyverse) before you can run read_csv( ). If that generates an error, then you first need to do install.packages("tidyverse")

    library(tidyverse)
    ## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ## ✔ dplyr     1.1.3     ✔ readr     2.1.4
    ## ✔ forcats   1.0.0     ✔ stringr   1.5.0
    ## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
    ## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
    ## ✔ purrr     1.0.2
    ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ## ✖ dplyr::filter() masks stats::filter()
    ## ✖ dplyr::lag()    masks stats::lag()
    ## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

    url <- "https://intro-datascience.s3.us-east-2.amazonaws.com/statesNew.csv"
    dfStates17 <- read_csv(url)

    ## Rows: 50 Columns: 19
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## chr  (15): state, slug, code, nickname, website, capital_city, capital_url, ...
    ## dbl   (3): admission_number, population, population_rank
    ## date  (1): admission_date
    ##
    ## ℹ Use `spec()` to retrieve the full column specification for this data.
    ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

6. Summarize the variables in your new data set, using the summary() command.

    summary(dfStates17)

    ##     state               slug               code             nickname
    ##  Length:50          Length:50          Length:50          Length:50
    ##  Class :character   Class :character   Class :character   Class :character
    ##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
    ##
    ##
    ##
    ##    website          admission_date       admission_number  capital_city
    ##  Length:50          Min.   :1787-12-07   Min.   : 1.00     Length:50
    ##  Class :character   1st Qu.:1790-08-06   1st Qu.:13.25     Class :character
    ##  Mode  :character   Median :1836-10-05   Median :25.50     Mode  :character
    ##                     Mean   :1840-03-14   Mean   :25.50
    ##                     3rd Qu.:1874-03-24   3rd Qu.:37.75
    ##                     Max.   :1959-08-21   Max.   :50.00
    ##  capital_url          population        population_rank  constitution_url
    ##  Length:50          Min.   :  582658   Min.   : 1.00     Length:50
    ##  Class :character   1st Qu.: 1857857   1st Qu.:13.25     Class :character
    ##  Mode  :character   Median : 4510382   Median :25.50     Mode  :character
    ##                     Mean   : 6309648   Mean   :25.50
    ##                     3rd Qu.: 6901760   3rd Qu.:37.75
    ##                     Max.   :38332521   Max.   :50.00
    ##  state_flag_url     state_seal_url     map_image_url
    ##  Length:50          Length:50          Length:50
    ##  Class :character   Class :character   Class :character
    ##  Mode  :character   Mode  :character   Mode  :character
    ##
    ##
    ##
    ##  landscape_background_url skyline_background_url twitter_url
    ##  Length:50                Length:50              Length:50
    ##  Class :character         Class :character       Class :character
    ##  Mode  :character         Mode  :character       Mode  :character
    ##
    ##
    ##
    ##  facebook_url
    ##  Length:50
    ##  Class :character
    ##  Mode  :character
    ##
    ##
    ##

7. The data you now have stored in dfStates17 were collected in 2017. As such, about 40 years passed between the two data collections. Calculate the total 2017 population of the U.S. in dfStates17 by adding together the populations of each of the individual states. Store the result in a new variable called totalPop17.

    colnames(dfStates17)

    ##  [1] "state"                    "slug"
    ##  [3] "code"                     "nickname"
    ##  [5] "website"                  "admission_date"
    ##  [7] "admission_number"         "capital_city"
    ##  [9] "capital_url"              "population"
    ## [11] "population_rank"          "constitution_url"
    ## [13] "state_flag_url"           "state_seal_url"
    ## [15] "map_image_url"            "landscape_background_url"
    ## [17] "skyline_background_url"   "twitter_url"
    ## [19] "facebook_url"

    # The column is named `population` (lowercase). Column names are case-sensitive:
    # dfStates17$Population does not exist, triggers an "Unknown or uninitialised
    # column" warning, and sums to 0.
    totalPop17 <- sum(dfStates17$population)

8. Create and interpret a ratio of totalPop77 to totalPop17. Check to ensure that the result makes sense!

Create a function that, given population and area, calculates population density by dividing a population value by an area value. Here is the core of the function:

    popDensity <- function(pop, area) {
      # Add your code below here:
      # Next, divide pop by area and store the result in a
      # variable called popDens
      return(popDens) # This provides the function's output
    }

    # state.x77 reports Population in thousands (the largest value is 21198),
    # while dfStates17 reports individual people, so convert before dividing.
    ratio <- (totalPop77 * 1000) / totalPop17
    cat("The ratio of total population in 1977 to total population in 2017 is:", ratio, "\n")

    # summary() above reports means of about 4246 (thousands) and 6309648 over 50
    # states, so the totals are roughly 212 million and 315 million and the ratio
    # is roughly 0.67.
    if (ratio > 1) {
      cat("The total population in 1977 was larger than in 2017.")
    } else if (ratio < 1) {
      cat("The total population in 1977 was smaller than in 2017.")
    } else {
      cat("The total population in 1977 was the same as in 2017.")
    }
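Column names in R are case-sensitive, so a total like the one in step 7 depends on matching the name exactly. The following is a minimal sketch, not part of the lab, of summing a column in a way that fails loudly when the name is wrong. The data frame dfDemo (three made-up rows) and the helper sumColumn() are hypothetical names introduced only for illustration.

```r
# Hypothetical stand-in for dfStates17: made-up values, lowercase column name
dfDemo <- data.frame(state = c("A", "B", "C"),
                     population = c(1000000, 2500000, 500000))

# Hypothetical helper: sum a column, but stop if the name does not exist
sumColumn <- function(df, colName) {
  if (!(colName %in% colnames(df))) {
    stop("No column named '", colName, "'. Available columns: ",
         paste(colnames(df), collapse = ", "))
  }
  sum(df[[colName]])
}

totalDemo <- sumColumn(dfDemo, "population")

# A misspelled name raises an error instead of quietly producing 0
misspelled <- tryCatch(sumColumn(dfDemo, "Population"),
                       error = function(e) "error")

# For comparison, `$` with a wrong name returns NULL, and sum(NULL) is 0
silentTotal <- sum(dfDemo$Population)
```

Dividing by a total that is silently 0 gives Inf, which is why checking colnames() before summing is worthwhile.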

    popDensity <- function(pop, area) {
      # Calculate population density by dividing pop by area
      popDens <- pop / area
      return(popDens)
    }

    # Calculate population density for a hypothetical population of 1000 and area of 50 square miles
    density <- popDensity(1000, 50)
    cat("Population density is", density, "people per square mile.")
    ## Population density is 20 people per square mile.

9. After you finish your function, make sure to run all of the lines of code in it so that the function becomes known to R.

10. Make a fresh copy of state.x77 into dfStates77

    dfStates77 <- as.data.frame(state.x77)

11. Store the population vector in a variable called tempPop. Adjust the tempPop as needed (based on your analysis above)

    adjustment_factor <- totalPop77 / totalPop17
    tempPop <- dfStates77$Population
    tempPop_adjusted <- tempPop * adjustment_factor
    # Note: tempPop_adjusted is not used below. The densities are computed from the
    # unadjusted tempPop, which state.x77 reports in thousands of people.

12. Store the area vector in a variable, called tempArea

    tempArea <- dfStates77$Area

13. Now use tempPop and tempArea to call your function: popDensity(tempPop, tempArea)

    # Area in state.x77 is in square miles, so the unit is thousand people per square mile
    density <- popDensity(tempPop, tempArea)
    cat("Population density is:", density, "thousand people per square mile.")
    ## Population density is: 0.07129053 0.0006443845 0.01950325 0.04061989 0.1355709
    ## 0.02448779 0.6375977 0.2921292 0.1530227 0.08491037 0.1350973 0.009833448
    ## 0.2008503 0.1471867 0.05114317 0.02787729 0.08542245 0.08470955 0.03421734
    ## 0.4167425 0.7429083 0.1603569 0.049452 0.04949679 0.06909196 0.005124084
    ## 0.02018749 0.005369054 0.08995237 0.9750033 0.009422462 0.3779139 0.1115005
    ## 0.009195502 0.261989 0.03947254 0.02374615 0.2637548 0.8875119 0.09316791
    ## 0.008965835 0.1009727 0.04668223 0.01465358 0.05093342 0.1252137 0.05346252
    ## 0.07474034 0.08425749 0.003868193 thousand people per square mile.

14. Store the results from the previous task in a column of the dfStates77 dataframe, called popDensity.

    dfStates77$popDensity <- popDensity(tempPop, tempArea)
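Because division in R is vectorized, the same popDensity() call works on whole columns at once. The sketch below is one possible way to express the result in people per square mile, assuming that Population in state.x77 is in thousands and Area is in square miles. The names dfDemo and peoplePerSqMile are introduced here for illustration only.

```r
# Same one-line density calculation as in the lab
popDensity <- function(pop, area) {
  pop / area
}

dfDemo <- as.data.frame(state.x77)

# Assumption: Population is in thousands, Area is in square miles,
# so multiply by 1000 to get people per square mile
peoplePerSqMile <- popDensity(dfDemo$Population * 1000, dfDemo$Area)
names(peoplePerSqMile) <- rownames(dfDemo)

# Look at the two extremes by name
round(peoplePerSqMile[c("New Jersey", "Alaska")], 2)
```

Scaling the population before calling the function keeps popDensity() itself unit-agnostic: it only divides whatever it is given.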

15. Use which.max( ) and which.min( ) to reveal which is the most densely populated and which is the least densely populated state. Make sure that you understand the number that is revealed as well as the name of the state.

    # which.max() and which.min() return row positions, not state names
    most_densely_populated_index <- which.max(dfStates77$popDensity)
    least_densely_populated_index <- which.min(dfStates77$popDensity)

    # Look up the state names (stored as row names) at those positions
    most_densely_populated_state <- rownames(dfStates77)[most_densely_populated_index]
    least_densely_populated_state <- rownames(dfStates77)[least_densely_populated_index]

    most_densely_populated_density <- dfStates77$popDensity[most_densely_populated_index]
    least_densely_populated_density <- dfStates77$popDensity[least_densely_populated_index]

    # Population is in thousands and Area in square miles
    cat("Most densely populated state:", most_densely_populated_state, "with a density of", most_densely_populated_density, "thousand people per square mile.\n")
    ## Most densely populated state: New Jersey with a density of 0.9750033 thousand people per square mile.

    cat("Least densely populated state:", least_densely_populated_state, "with a density of", least_densely_populated_density, "thousand people per square mile.\n")
    ## Least densely populated state: Alaska with a density of 0.0006443845 thousand people per square mile.

16. Using tidyverse, sort the dataframe using the popDensity attribute, then using the slice() function, show the first row in the sorted database.

    library(dplyr)
    dfStates77_sorted <- dfStates77 %>% arrange(popDensity)
    first_row <- dfStates77_sorted %>% slice(1)
    print(first_row)
    ##        Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
    ## Alaska        365   6315        1.5    69.31   11.3    66.7   152 566432
    ##          popDensity
    ## Alaska 0.0006443845

17. How was the dataframe sorted (was the minimum first or the maximum)? Explain in a comment.

    dfStates77_sorted <- dfStates77 %>% arrange(popDensity)
    # The arrange() function with popDensity as the sorting variable sorts the
    # dataframe in ascending order by default, so the states with the lowest
    # population density appear at the beginning of the sorted dataframe.
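Since arrange() sorts in ascending order by default, getting the maximum first requires wrapping the column in desc(). The following is a small sketch of both directions, assuming dplyr is installed. The names dfDemo, leastDense, and mostDense are illustrative, and the state names are copied into a regular column so the result does not depend on row names surviving the sort.

```r
library(dplyr)

dfDemo <- as.data.frame(state.x77)
dfDemo$state <- rownames(dfDemo)   # keep state names as an ordinary column
dfDemo$popDensity <- dfDemo$Population / dfDemo$Area

# Ascending (default): the least dense state is in the first row
leastDense <- dfDemo %>% arrange(popDensity) %>% slice(1)

# Descending: the most dense state is in the first row
mostDense <- dfDemo %>% arrange(desc(popDensity)) %>% slice(1)

# Base R cross-check using the positions from which.min() and which.max()
checkMin <- dfDemo$state[which.min(dfDemo$popDensity)]
checkMax <- dfDemo$state[which.max(dfDemo$popDensity)]
```

Both approaches should agree with the which.min() and which.max() results from step 15: Alaska at one end and New Jersey at the other.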