Lab 3
School: Syracuse University
Course: 687
Subject: Statistics
Date: Feb 20, 2024
Pages: 8
Uploaded by DeanTigerMaster997
9/14/23, 11:54 AM
Lab3.knit
file:///C:/Users/mahaj/Downloads/IDS/Lab3.html
1/8
Intro to Data Science - Lab 3
Copyright 2022, Jeffrey Stanton and Jeffrey Saltz. Please do not post online.
Week 3 - Using Descriptive Statistics & Writing Functions
# Enter your name here: Swapnil Deore
Please include nice comments.
Instructions:
Run the necessary code on your own instance of RStudio. Save the code: it will be useful for your homework!
Attribution statement: (choose only one and delete the rest)
# 2. I did this lab assignment with help from the book and the professor and these Internet sources:
1. Get an explanation of the contents of the state.x77 data set:
help("state.x77")
## starting httpd help server ... done
2. Create a dataframe from the built-in state.x77 data set and store it in a variable named dfStates77.
dfStates77 <- as.data.frame(state.x77)
3. Summarize the variables in your dfStates77 data set using the summary() function.
summary(dfStates77)
##    Population        Income       Illiteracy       Life Exp
##  Min.   :  365   Min.   :3098   Min.   :0.500   Min.   :67.96
##  1st Qu.: 1080   1st Qu.:3993   1st Qu.:0.625   1st Qu.:70.12
##  Median : 2838   Median :4519   Median :0.950   Median :70.67
##  Mean   : 4246   Mean   :4436   Mean   :1.170   Mean   :70.88
##  3rd Qu.: 4968   3rd Qu.:4814   3rd Qu.:1.575   3rd Qu.:71.89
##  Max.   :21198   Max.   :6315   Max.   :2.800   Max.   :73.60
##      Murder          HS Grad          Frost             Area
##  Min.   : 1.400   Min.   :37.80   Min.   :  0.00   Min.   :  1049
##  1st Qu.: 4.350   1st Qu.:48.05   1st Qu.: 66.25   1st Qu.: 36985
##  Median : 6.850   Median :53.25   Median :114.50   Median : 54277
##  Mean   : 7.378   Mean   :53.11   Mean   :104.46   Mean   : 70736
##  3rd Qu.:10.675   3rd Qu.:59.15   3rd Qu.:139.75   3rd Qu.: 81163
##  Max.   :15.100   Max.   :67.30   Max.   :188.00   Max.   :566432
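The summary() table is assembled from a handful of base R descriptive statistics. A minimal sketch, assuming only the built-in state.x77 matrix, that recomputes the Population figures one at a time (the variable names pop, popMin, popQuart and so on are illustrative). quantile() with its default type 7 should give the same quartiles that summary() reports, although summary() rounds what it displays, so 1079.5 here appears as 1080 in the table. The values are in thousands of people.

```r
# Population column of the built-in state.x77 matrix (thousands of people)
pop <- state.x77[, "Population"]

# The same statistics summary() reports, computed one at a time
popMin   <- min(pop)
popQuart <- quantile(pop, probs = c(0.25, 0.5, 0.75))  # default type = 7
popMean  <- mean(pop)
popMax   <- max(pop)

# summary() does not report a standard deviation; sd() adds a spread measure
popSd <- sd(pop)

cat("Min:", popMin, "\n")
cat("1st Qu., Median, 3rd Qu.:", popQuart, "\n")
cat("Mean:", popMean, "\n")
cat("Max:", popMax, "\n")
cat("SD:", popSd, "\n")
```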
4. Calculate the total population of the U.S. by adding together the populations of each of the individual states in dfStates77. Store the result in a new variable called totalPop77.
totalPop77 <- sum(dfStates77$Population)
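According to help("state.x77"), Population holds estimates as of July 1, 1975 in thousands, so totalPop77 counts thousands of people rather than people. A small sketch showing two equivalent ways to take the total (sum() on the data frame column, colSums() on the original matrix) and making the unit explicit; stateMatrix, totalViaMatrix and totalPersons77 are names introduced here for illustration.

```r
stateMatrix <- state.x77                   # built-in matrix, one row per state
dfStates77  <- as.data.frame(stateMatrix)

# Two equivalent totals of the Population column (values are in thousands)
totalPop77     <- sum(dfStates77$Population)
totalViaMatrix <- colSums(stateMatrix)[["Population"]]

# Convert thousands to a head count before comparing with data in persons
totalPersons77 <- totalPop77 * 1000

cat("Total (thousands):", totalPop77, "\n")
cat("Total (people):", format(totalPersons77, big.mark = ","), "\n")
```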
5. Use R code to read a CSV data file directly from the web. Store the dataset in a new dataframe called dfStates17. The URL is: https://intro-datascience.s3.us-east-2.amazonaws.com/statesNew.csv
Note: Use the function read_csv() to read in the data. You will need to run library(tidyverse) before you can run read_csv(). If that generates an error, then you first need to run install.packages("tidyverse").
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
url <- "https://intro-datascience.s3.us-east-2.amazonaws.com/statesNew.csv"
dfStates17 <- read_csv(url)
## Rows: 50 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): state, slug, code, nickname, website, capital_city, capital_url, ...
## dbl (3): admission_number, population, population_rank
## date (1): admission_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
6. Summarize the variables in your new data set using the summary() command.
summary(dfStates17)
##     state               slug               code             nickname
##  Length:50          Length:50          Length:50          Length:50
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##    website          admission_date       admission_number capital_city
##  Length:50          Min.   :1787-12-07   Min.   : 1.00    Length:50
##  Class :character   1st Qu.:1790-08-06   1st Qu.:13.25    Class :character
##  Mode  :character   Median :1836-10-05   Median :25.50    Mode  :character
##                     Mean   :1840-03-14   Mean   :25.50
##                     3rd Qu.:1874-03-24   3rd Qu.:37.75
##                     Max.   :1959-08-21   Max.   :50.00
##  capital_url          population       population_rank constitution_url
##  Length:50          Min.   :  582658   Min.   : 1.00   Length:50
##  Class :character   1st Qu.: 1857857   1st Qu.:13.25   Class :character
##  Mode  :character   Median : 4510382   Median :25.50   Mode  :character
##                     Mean   : 6309648   Mean   :25.50
##                     3rd Qu.: 6901760   3rd Qu.:37.75
##                     Max.   :38332521   Max.   :50.00
##  state_flag_url     state_seal_url     map_image_url
##  Length:50          Length:50          Length:50
##  Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character
##  landscape_background_url skyline_background_url twitter_url
##  Length:50                Length:50              Length:50
##  Class :character         Class :character       Class :character
##  Mode  :character         Mode  :character       Mode  :character
##  facebook_url
##  Length:50
##  Class :character
##  Mode  :character
7. The data you now have stored in dfStates17 were collected in 2017. As such, about 40 years passed between the two data collections. Calculate the total 2017 population of the U.S. in dfStates17 by adding together the populations of each of the individual states. Store the result in a new variable called totalPop17.
colnames(dfStates17)
##  [1] "state"                    "slug"
##  [3] "code"                     "nickname"
##  [5] "website"                  "admission_date"
##  [7] "admission_number"         "capital_city"
##  [9] "capital_url"              "population"
## [11] "population_rank"          "constitution_url"
## [13] "state_flag_url"           "state_seal_url"
## [15] "map_image_url"            "landscape_background_url"
## [17] "skyline_background_url"   "twitter_url"
## [19] "facebook_url"
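The colnames() listing shows that the column is spelled population in lowercase. On a tibble, dfStates17$Population raises an "Unknown or uninitialised column" warning and returns NULL, and sum(NULL) is 0, so a capitalisation slip turns into a silent zero total. A minimal sketch of a hypothetical helper, sumColumn() (not part of any package), that fails loudly instead; the two-row data frame is made-up stand-in data, not statesNew.csv.

```r
# Hypothetical helper: sum a column only when its name matches exactly
sumColumn <- function(df, colName) {
  if (!colName %in% names(df)) {
    stop("Column '", colName, "' not found. Available columns: ",
         paste(names(df), collapse = ", "))
  }
  sum(df[[colName]])
}

# Small stand-in data frame; the values are illustrative, not from statesNew.csv
dfExample <- data.frame(state = c("A", "B"), population = c(100, 250))

sumColumn(dfExample, "population")  # 350

# A capitalised name now fails loudly instead of silently summing to 0
caught <- tryCatch(sumColumn(dfExample, "Population"),
                   error = function(e) conditionMessage(e))
print(caught)
```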
# The column is named population (lowercase). dfStates17$Population does not
# exist, so the tibble returns NULL with the warning
# "Unknown or uninitialised column: `Population`", and sum(NULL) is 0.
totalPop17 <- sum(dfStates17$population)
8. Create and interpret a ratio of totalPop77 to totalPop17. Check to ensure that the result makes sense!
Create a function that, given population and area, calculates population density by dividing a population value by an area value. Here is the core of the function:
popDensity <- function (pop, area) {
# Add your code below here:
# Next, divide pop by area and store the result in a
# variable called popDens
return(popDens) # This provides the function's output
}
# state.x77 reports Population in thousands, while dfStates17 reports persons,
# so convert totalPop77 to persons before taking the ratio
ratio <- (totalPop77 * 1000) / totalPop17
cat("The ratio of total population in 1977 to total population in 2017 is:", ratio, "\n")
if (ratio > 1) {
  cat("The total population in 1977 was larger than in 2017.\n")
} else if (ratio < 1) {
  cat("The total population in 1977 was smaller than in 2017.\n")
} else {
  cat("The total population in 1977 was the same as in 2017.\n")
}
# Interpretation: the ratio is roughly 0.67 (about 212.3 million people vs.
# about 315.5 million, the latter estimated from the summary() mean of
# 6,309,648 x 50 states), so the 1977 total was about two thirds of the 2017
# total. A ratio of Inf, or a "larger in 1977" conclusion, signals that
# totalPop17 summed to 0 because of the wrong column name.
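Interpreting the ratio is easier once both totals share a unit and the change is also expressed as growth. A sketch with a hypothetical helper, popRatio(); the 2017 total here is an approximation (the summary() mean of 6,309,648 times 50 states), not the exact column sum, so treat the printed figures as rough.

```r
# Hypothetical helper: ratio of an older total given in thousands
# to a newer total given in persons
popRatio <- function(olderThousands, newerPersons) {
  (olderThousands * 1000) / newerPersons
}

totalPop77  <- sum(state.x77[, "Population"])  # thousands of people
approxPop17 <- 6309648 * 50                     # approximation: summary() mean x 50 states

ratio     <- popRatio(totalPop77, approxPop17)
growthPct <- (1 / ratio - 1) * 100             # percent change from 1977 to 2017

cat("Ratio (1977 / 2017):", round(ratio, 3), "\n")
cat("Implied growth:", round(growthPct, 1), "percent\n")
```

A ratio below 1 means the earlier total is the smaller one; a value near 0.67 corresponds to growth of a little under 50 percent over the period.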
popDensity <- function(pop, area) {
  # Calculate population density by dividing pop by area
  popDens <- pop / area
  return(popDens)
}
# Calculate population density for a hypothetical population of 1000 and area of 50 square miles
density <- popDensity(1000, 50)
cat("Population density is", density, "people per square mile.")
## Population density is 20 people per square mile.
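Because pop / area is element-wise in R, popDensity() works on whole columns as well as single numbers, but R also silently recycles vectors of different lengths and happily divides by zero. A sketch of a hypothetical, stricter variant, popDensityChecked(), that rejects those cases; the validation rules are an assumption about what "sensible input" means here, not part of the lab.

```r
# Hypothetical variant of popDensity() that checks its inputs first
popDensityChecked <- function(pop, area) {
  if (length(pop) != length(area)) {
    stop("pop and area must have the same length")  # avoid silent recycling
  }
  if (anyNA(area) || any(area <= 0)) {
    stop("area values must be positive and non-missing")
  }
  pop / area  # element-wise, so scalars and whole columns both work
}

popDensityChecked(1000, 50)                 # 20
popDensityChecked(c(1000, 500), c(50, 25))  # 20 20
```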
9. After you finish your function, make sure to run all of the lines of code in it so that the function becomes
known to R.
10. Make a fresh copy of state.x77 into dfStates77.
dfStates77 <- as.data.frame(state.x77)
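11. Store the population vector in a variable called tempPop. Adjust the tempPop as needed (based on your analysis above).
# state.x77 stores Population in thousands (1975 estimates). The earlier
# adjustment_factor (totalPop77 / totalPop17) mixed thousands with persons,
# and its result was never used, so it is dropped. tempPop is kept in
# thousands here; multiply by 1000 for densities in people (rather than
# thousands of people) per square mile.
tempPop <- dfStates77$Population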
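If the adjustment is taken to mean converting thousands to people, the densities move up by a factor of 1000 and read in people per square mile (Area in state.x77 is land area in square miles). A sketch of that conversion, with illustrative variable names tempPopPersons and densityPersons and the division written inline rather than through popDensity().

```r
dfStates77 <- as.data.frame(state.x77)

# Population is in thousands, so multiply by 1000 to count people
tempPopPersons <- dfStates77$Population * 1000
tempArea       <- dfStates77$Area          # land area in square miles

# Same division popDensity() performs, now in people per square mile
densityPersons <- tempPopPersons / tempArea
names(densityPersons) <- rownames(dfStates77)

round(densityPersons[c("New Jersey", "Alaska")], 2)
```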
12. Store the area vector in a variable called tempArea.
tempArea <- dfStates77$Area
13. Now use tempPop and tempArea to call your function:
popDensity(tempPop, tempArea)
density <- popDensity(tempPop, tempArea)
# Population is in thousands and Area in square miles
cat("Population density is:", density, "thousand people per square mile.")
## Population density is: 0.07129053 0.0006443845 0.01950325 0.04061989 0.1355709 0.02448779 0.6375977 0.2921292 0.1530227 0.08491037 0.1350973 0.009833448 0.2008503 0.1471867 0.05114317 0.02787729 0.08542245 0.08470955 0.03421734 0.4167425 0.7429083 0.1603569 0.049452 0.04949679 0.06909196 0.005124084 0.02018749 0.005369054 0.08995237 0.9750033 0.009422462 0.3779139 0.1115005 0.009195502 0.261989 0.03947254 0.02374615 0.2637548 0.8875119 0.09316791 0.008965835 0.1009727 0.04668223 0.01465358 0.05093342 0.1252137 0.05346252 0.07474034 0.08425749 0.003868193 thousand people per square mile.
14. Store the results from the previous task in a column of the dfStates77 dataframe called popDensity.
dfStates77$popDensity <- popDensity(tempPop, tempArea)
15. Use which.max() and which.min() to reveal which is the most densely populated and which is the least densely populated state. Make sure that you understand the number that is revealed as well as the name of the state.
most_densely_populated_index <- which.max(dfStates77$popDensity)
least_densely_populated_index <- which.min(dfStates77$popDensity)
most_densely_populated_state <- rownames(dfStates77)[most_densely_populated_index]
least_densely_populated_state <- rownames(dfStates77)[least_densely_populated_index]
most_densely_populated_density <- dfStates77$popDensity[most_densely_populated_index]
least_densely_populated_density <- dfStates77$popDensity[least_densely_populated_index]
cat("Most densely populated state:", most_densely_populated_state, "with a density of", most_densely_populated_density, "thousand people per square mile.\n")
## Most densely populated state: New Jersey with a density of 0.9750033 thousand people per square mile.
cat("Least densely populated state:", least_densely_populated_state, "with a density of", least_densely_populated_density, "thousand people per square mile.\n")
## Least densely populated state: Alaska with a density of 0.0006443845 thousand people per square mile.
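The "number that is revealed" by which.max() and which.min() is a row position. When the input vector carries names, the result carries the matching name as well, so one call gives both pieces. A short sketch working directly on the built-in matrix, where dividing two named columns keeps the state names; dens, idxMax and idxMin are illustrative names.

```r
# Density (thousands of people per square mile) as a named vector
dens <- state.x77[, "Population"] / state.x77[, "Area"]

idxMax <- which.max(dens)  # named integer: row position plus state name
idxMin <- which.min(dens)

print(idxMax)
print(idxMin)
cat("Most dense:", names(idxMax), "at position", unname(idxMax), "\n")
cat("Least dense:", names(idxMin), "at position", unname(idxMin), "\n")
```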
16. Using tidyverse, sort the dataframe using the popDensity attribute, then, using the slice() function, show the first row in the sorted dataframe.
library(dplyr)
dfStates77_sorted <- dfStates77 %>% arrange(popDensity)
first_row <- dfStates77_sorted %>% slice(1)
print(first_row)
## Population Income Illiteracy Life Exp Murder HS Grad Frost Area
## Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
## popDensity
## Alaska 0.0006443845
17. How was the dataframe sorted (was the minimum first or the maximum)? Explain in a comment.
dfStates77_sorted <- dfStates77 %>% arrange(popDensity)
# The arrange() function with popDensity as the sorting variable sorts the dataframe in ascending order by default,
# so the states with the lowest population density appear at the beginning of the sorted dataframe.
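To put the maximum first instead, dplyr's arrange(desc(popDensity)) reverses the order. The sketch below uses the base R counterpart, order(..., decreasing = TRUE), so it runs without any packages; it rebuilds dfStates77 and its popDensity column (in thousands of people per square mile) from state.x77, and ascending and descending are illustrative names.

```r
dfStates77 <- as.data.frame(state.x77)
dfStates77$popDensity <- dfStates77$Population / dfStates77$Area

# Ascending (the default), matching arrange(popDensity)
ascending <- dfStates77[order(dfStates77$popDensity), ]
# Descending, the base R counterpart of arrange(desc(popDensity))
descending <- dfStates77[order(dfStates77$popDensity, decreasing = TRUE), ]

rownames(ascending)[1]   # least dense state first
rownames(descending)[1]  # most dense state first
```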