Lab 7
School: Syracuse University
Course: 687
Subject: Statistics
Date: Feb 20, 2024
Pages: 11
Uploaded by DeanTigerMaster997
10/12/23, 11:43 PM
Lab7a.knit
file:///C:/Users/mahaj/Downloads/IDS/Lab7a.html
1/11
Intro to Data Science - Lab 7
Copyright 2023, Jeffrey Stanton and Jeffrey Saltz. Please do not post online.
Week 7 - Using ggplot to Build Complex Data Displays
# Enter your name here: Swapnil Deore
Please include nice comments.
Instructions:
Run the necessary code on your own instance of R-Studio.
Attribution statement: (choose only one and delete the rest)
# 1. I did this lab assignment by myself, with help from the book and the professor.
Geology rocks but geography is where it’s at. . . (famous dad joke). In a global economy, geography has an
important influence on everything from manufacturing to marketing to transportation. As a result, most data
scientists will have to work with map data at some point in their careers.
An add-on to the ggplot2 package, called ggmap, provides powerful tools for plotting and shading maps.
Make sure to install the maps, mapproj, and ggmap packages before running the following:
library(ggplot2); library(maps); library(ggmap); library(mapproj)
us <- map_data("state")
us$state_name <- tolower(us$region)
map <- ggplot(us, aes(map_id= state_name))
map <- map + aes(x=long, y=lat, group=group) +
geom_polygon(fill = "white", color = "black")
map <- map + expand_limits(x=us$long, y=us$lat)
map <- map + coord_map() + ggtitle("USA Map")
map
1. Paste the code below and add a comment for each line, explaining what that line of code does.
#install.packages("maps")
#install.packages("ggmap")
#install.packages("mapproj")
library(ggplot2); library(maps); library(ggmap); library(mapproj)  # load the plotting, map-data, map-tile, and projection packages
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, were retired in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## ℹ Google's Terms of Service: <https://mapsplatform.google.com>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
us <- map_data("state")  # map_data() collects the outline coordinates of every US state into the data frame 'us'
us$state_name <- tolower(us$region)  # new column state_name holds the region (state name) in lower case
map <- ggplot(us, aes(map_id = state_name))  # create the plot object, using state_name as the map identifier
map <- map + aes(x = long, y = lat, group = group) +  # longitude on x, latitude on y, one group per polygon
  geom_polygon(fill = "white", color = "black")  # draw each polygon with a white fill and a black border
map <- map + expand_limits(x = us$long, y = us$lat)  # extend the axis limits so the entire map is visible
map <- map + coord_map() + ggtitle("USA Map")  # apply a map projection and set the title
map  # display the map
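The tolower() line matters because joins match key strings exactly, including case. map_data() already returns lower-case region names, so here it acts as a safeguard, but any table joined to the map must use the same spelling. A minimal base-R sketch, using the built-in state.name and state.area vectors from R's datasets package purely as stand-in data (they are not part of this lab):

```r
# Stand-in lookup table from R's datasets package: one row per state, capitalized names
state_info <- data.frame(state = state.name, area = state.area)

# A key spelled the way map_data() spells regions: all lower case
map_keys <- data.frame(region = "new york")

# Case-sensitive join: "new york" is not equal to "New York", so nothing matches
no_match <- merge(map_keys, state_info, by.x = "region", by.y = "state")

# Lower-casing the lookup keys first makes the two spellings agree
state_info$state <- tolower(state_info$state)
matched <- merge(map_keys, state_info, by.x = "region", by.y = "state")

nrow(no_match)  # 0
nrow(matched)   # 1
```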
2. The map you just created fills in the area of each state in white while outlining it with a thin black line. Use the fill= and color= commands inside the call to geom_polygon( ) to reverse the color scheme.
Now paste and run the following code:
ny_counties <- map_data("county","new york")
ggplot(ny_counties) + aes(long,lat, group=group) + geom_polygon(fill = "white", color = "black")
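map <- ggplot(us, aes(x = long, y = lat, group = group)) + coord_map() + ggtitle("USA Map") +
  geom_polygon(fill = "black", color = "white")  # rebuilt with the colors reversed instead of stacking a second polygon layer over the white one
map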
ny_counties <- map_data("county","new york")
ggplot(ny_counties) + aes(long,lat, group=group) + geom_polygon(fill = "white", color = "black")
3. Just as in step 2, the map you just created fills in the area of each county in white while outlining it with thin black lines. Use the fill= and color= commands inside the call to geom_polygon( ) to reverse the color scheme.
ny_counties <- map_data("county","new york")
ggplot(ny_counties) + aes(long,lat, group=group) + geom_polygon(fill = "black", color = "white")
4. Run head(ny_counties) to verify how the county outline data looks.
head(ny_counties)
## long lat group order region subregion
## 1 -73.78550 42.46763 1 1 new york albany
## 2 -74.25533 42.41034 1 2 new york albany
## 3 -74.25533 42.41034 1 3 new york albany
## 4 -74.27252 42.41607 1 4 new york albany
## 5 -74.24960 42.46763 1 5 new york albany
## 6 -74.22668 42.50774 1 6 new york albany
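Each row of ny_counties is one vertex of a county outline: group identifies the polygon a vertex belongs to, and order records the sequence of vertices along that outline. geom_polygon() connects each group's vertices in the order the rows appear, so row order matters. A base-R sketch with a made-up unit square (not lab data) illustrates this: the shoelace formula measures the enclosed area, and visiting the same vertices in a scrambled order traces a self-crossing bow tie instead of the square.

```r
# Area enclosed by a simple polygon whose vertices are listed in drawing order (shoelace formula)
shoelace_area <- function(x, y) {
  x_next <- c(x[-1], x[1])  # pair each vertex with the next one, wrapping back to the first
  y_next <- c(y[-1], y[1])
  abs(sum(x * y_next - x_next * y)) / 2
}

# A unit square listed in the order its outline is traced
square <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1), order = 1:4)
in_order <- shoelace_area(square$long, square$lat)

# The same four vertices in a different row order form a bow tie whose two lobes cancel out
scrambled <- square[c(1, 3, 2, 4), ]
out_of_order <- shoelace_area(scrambled$long, scrambled$lat)

# Sorting on the order column restores the intended outline
restored <- scrambled[order(scrambled$order), ]

in_order                                   # 1
out_of_order                               # 0
shoelace_area(restored$long, restored$lat) # 1
```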
5. Make a copy of your code from step 3 and add the following subcommand to your ggplot( ) call (don't forget to put a plus sign after the geom_polygon( ) statement to tell R that you are continuing to build the command):
coord_map(projection = "mercator")
In what way is the map different from the previous map? Be prepared to explain what a Mercator projection is.
ny_counties <- map_data("county","new york")
ggplot(ny_counties) + aes(long,lat, group=group) + geom_polygon(fill = "black", color = "white") + coord_map(projection = "mercator")
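# Compared with step 3, the map is no longer stretched east-west. Without coord_map(), ggplot draws one degree of longitude as long as one degree of latitude, but at New York's latitude (about 42° N) a degree of longitude is only about cos(42°) ≈ 0.74 times as long.
# Mercator is a cylindrical, conformal projection: it preserves angles and local shapes, which is why it is widely used for world maps and navigation, but it increasingly exaggerates areas toward the poles.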
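The Mercator comment above can be checked numerically. This is a minimal sketch of the textbook spherical Mercator equations, x = λ and y = ln(tan(π/4 + φ/2)) on a unit sphere, with angles in radians; it is not mapproj's own code, which may differ in scaling details.

```r
deg2rad <- function(deg) deg * pi / 180

# Spherical Mercator on a unit sphere: x grows linearly with longitude, y stretches with latitude
mercator <- function(lon_deg, lat_deg) {
  lambda <- deg2rad(lon_deg)
  phi <- deg2rad(lat_deg)
  data.frame(x = lambda, y = log(tan(pi / 4 + phi / 2)))
}

# Local scale factor sec(phi): how much the projection enlarges lengths at a given latitude
scale_factor <- function(lat_deg) 1 / cos(deg2rad(lat_deg))

mercator(c(-74, -74), c(0, 42))  # the equator maps to y = 0
scale_factor(c(0, 42, 60))       # 1 at the equator, about 1.346 at 42° N, 2 at 60° N
```

Because the scale factor applies equally in every direction at a point, shapes and angles are preserved locally (the conformal property), while areas grow with the square of sec(φ). At 42° N, one degree of latitude is drawn about 1.35 times as tall as one degree of longitude is wide, which is why the projected New York map no longer looks stretched sideways.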
6. Grab a copy of the nyData.csv data set from: https://intro-datascience.s3.us-east-2.amazonaws.com/nyData.csv
Read that data set into R with read_csv(). This requires that you have installed and loaded the tidyverse package. The next step assumes that you have named the resulting data frame nyData.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ purrr::map()    masks maps::map()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nyData <- read_csv("https://intro-datascience.s3.us-east-2.amazonaws.com/nyData.csv")
## Rows: 62 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): county
## num (4): pop2010, pop2000, sqMiles, popDen
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#nyData
7. Next, merge your ny_counties data from the first set of questions with your new nyData data frame, with this code:
mergeNY <- merge(ny_counties,nyData,all.x=TRUE,by.x="subregion",by.y="county")
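mergeNY <- merge(ny_counties,nyData,all.x=TRUE,by.x="subregion",by.y="county")
mergeNY <- mergeNY[order(mergeNY$order), ]  # merge() sorts rows by county, so restore the vertex drawing order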
#mergeNY
8. Run head(mergeNY) to verify how the merged data looks.
head(mergeNY)
## subregion long lat group order region pop2010 pop2000 sqMiles
## 1 albany -73.78550 42.46763 1 1 new york 304204 294565 522.8
## 2 albany -74.25533 42.41034 1 2 new york 304204 294565 522.8
## 3 albany -74.25533 42.41034 1 3 new york 304204 294565 522.8
## 4 albany -74.27252 42.41607 1 4 new york 304204 294565 522.8
## 5 albany -74.24960 42.46763 1 5 new york 304204 294565 522.8
## 6 albany -74.22668 42.50774 1 6 new york 304204 294565 522.8
## popDen
## 1 581.87
## 2 581.87
## 3 581.87
## 4 581.87
## 5 581.87
## 6 581.87
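The repeated pop2000 and popDen values in the output above are expected: merge() with all.x = TRUE is a left join, so every vertex row of ny_counties is kept and copies its county's values, and any county missing from nyData would get NA. Because merge() sorts its result by the key column, re-sorting on order afterwards is a common step before drawing polygons. A small base-R sketch with made-up tables (not the lab data) shows all three behaviours:

```r
# Hypothetical vertex table: several rows per county, like ny_counties
vertices <- data.frame(
  subregion = c("albany", "albany", "albany", "bronx", "bronx", "kings"),
  order     = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical county table: one row per county, like nyData (made-up numbers)
counties <- data.frame(county = c("albany", "bronx"), pop2000 = c(100, 200))

# Left join: every vertex is kept (all.x = TRUE) and picks up its county's values
joined <- merge(vertices, counties, all.x = TRUE, by.x = "subregion", by.y = "county")

# merge() sorts by the key, so put the vertices back in drawing order
joined <- joined[order(joined$order), ]

nrow(joined)    # 6: one row per vertex, same as the vertex table
joined$pop2000  # 100 100 100 200 200 NA ("kings" has no match)
```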
9. Now drive the fill color inside each county by adding the fill aesthetic inside of your geom_polygon( ) subcommand (fill based on pop2000).
ggplot(mergeNY) + aes(long,lat, group=group) + geom_polygon(aes(fill= pop2000))
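With fill mapped to a continuous variable such as pop2000, a continuous scale rescales the values to [0, 1] across the data's range and picks a color along a gradient. The base-R sketch below imitates that idea with hypothetical populations. It assumes ggplot2's default gradient endpoints (#132B43 to #56B1F7) but interpolates in RGB, whereas ggplot2 interpolates in a perceptual color space, so the exact shades differ. It also shows why a few very populous counties can leave most of the map in nearly one shade, which a log transformation of the fill scale (for example scale_fill_continuous(trans = "log10") in ggplot2) counteracts.

```r
# Hypothetical populations with the right skew typical of county data (not nyData values)
pop <- c(5000, 50000, 300000, 2500000)

# A continuous scale first rescales each value to [0, 1] across the data's range
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Two-color ramp between ggplot2's default gradient endpoints (interpolated in RGB here)
ramp <- grDevices::colorRamp(c("#132B43", "#56B1F7"))
to_hex <- function(p) {
  m <- ramp(p)  # one row per value; columns are red, green, blue on a 0-255 scale
  grDevices::rgb(m[, 1], m[, 2], m[, 3], maxColorValue = 255)
}

linear_pos <- rescale01(pop)         # the smaller counties are squeezed near 0
log_pos    <- rescale01(log10(pop))  # on a log scale they spread out

to_hex(linear_pos)
to_hex(log_pos)
```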
10. Create a bar chart using ggplot (each county is a bar; the height should be based on pop2000).
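ggplot(nyData, aes(x = county, y = pop2000)) +  # nyData has one row per county, so each bar is one county's population
  geom_col(fill = "skyblue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +  # rotate the county names so they do not overlap
  labs(title = "County Population")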
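Drawing this bar chart from mergeNY rather than nyData gives misleading bars: mergeNY has one row per outline vertex, and geom_bar(stat = "identity") (like geom_col()) stacks all of a county's rows, so each bar's height becomes its population times its number of vertices. A base-R sketch with made-up numbers (hypothetical, not lab data) shows the difference:

```r
# Hypothetical merged table: one row per outline vertex, county values repeated
merged <- data.frame(
  subregion = c("albany", "albany", "albany", "bronx", "bronx"),
  pop2000   = c(100, 100, 100, 200, 200)
)

# Stacking every row makes a bar's height the sum over that county's rows
stacked_height <- tapply(merged$pop2000, merged$subregion, sum)

# Keeping one row per county gives the true population
one_per_county <- unique(merged[, c("subregion", "pop2000")])
true_height <- setNames(one_per_county$pop2000, one_per_county$subregion)

stacked_height  # albany 300, bronx 400: population times number of vertices
true_height     # albany 100, bronx 200
```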
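11. In a comment, compare the visualizations in 9 and 10. Is one easier to understand? (You must explain why.)
# The map in step 9 is easier to understand: its colors show at a glance where in the state the population is concentrated, whereas the bar chart in step 10 crowds 62 county names onto one axis with no geographic context, so it is hard to read and hard to compare.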
12. Extra (not required):
a. Read in the following JSON datasets:
https://gbfs.citibikenyc.com/gbfs/en/station_information.json
https://gbfs.citibikenyc.com/gbfs/en/station_status.json
b. Merge the datasets, based on station_id.
c. Clean the merged dataset to only include useful information. For this work, you only need lat, lon and the number of bikes available.
d. Create a stamen map using get_stamenmap(). Have the limits of the map be defined by the lat and lon of the stations.
e. Show the stations as points on the map.
f. Show the number of bikes available as a color.
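Steps b to d can be sketched in base R on tiny made-up tables standing in for the two JSON feeds (in the real feeds the station tables are info$data$stations and status$data$stations). The station IDs, coordinates, and counts below are hypothetical. The bounding box follows the c(left, bottom, right, top) order, that is longitude before latitude, which ggmap's get_stamenmap() expects for its bbox argument.

```r
# Hypothetical stand-ins for the two station tables (made-up IDs, coordinates, counts)
info <- data.frame(
  station_id = c("a1", "b2", "c3"),
  name       = c("Station A", "Station B", "Station C"),
  lat        = c(40.70, 40.75, 40.80),
  lon        = c(-74.01, -73.98, -73.95)
)
status <- data.frame(
  station_id          = c("c3", "a1", "b2"),  # deliberately in a different row order
  num_bikes_available = c(7, 0, 12)
)

# b. Join on the shared key: rows are matched by station_id, not by position
merged <- merge(info, status, by = "station_id")

# c. Keep only the columns the map needs
cleaned <- merged[, c("lat", "lon", "num_bikes_available")]

# d. Bounding box in left, bottom, right, top order (longitude first)
bbox <- c(left = min(cleaned$lon), bottom = min(cleaned$lat),
          right = max(cleaned$lon), top = max(cleaned$lat))
bbox
```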
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
info <- fromJSON('https://gbfs.citibikenyc.com/gbfs/en/station_information.json')
status <- fromJSON('https://gbfs.citibikenyc.com/gbfs/en/station_status.json')
merged_data <- merge(info$data$stations, status$data$stations, by = "station_id")
cleaned_data <- merged_data %>% select(station_id, lat, lon, num_bikes_available)
# get_stamenmap() and ggmap() come from the ggmap package loaded at the start of the lab
# get_stamenmap() expects the bounding box as c(left, bottom, right, top), i.e. longitude first
map_limits <- c(left = min(cleaned_data$lon), bottom = min(cleaned_data$lat),
                right = max(cleaned_data$lon), top = max(cleaned_data$lat))
stamen_map <- get_stamenmap(map_limits, zoom = 11)  # a higher zoom than 8 shows street-level detail
## ℹ Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
ggmap(stamen_map) +  # draw the downloaded tiles as the base layer
  geom_point(data = cleaned_data, aes(x = lon, y = lat, color = num_bikes_available), size = 1) +
  scale_color_gradient(low = "red", high = "yellow") +
  labs(title = "Citibike Stations on Map", color = "Bikes available")
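# The original plot most likely looked black because geom_text() drew a station_id label for every station and the overlapping labels covered everything; the Stamen tiles never appeared because the plot was started with ggplot() instead of ggmap(stamen_map).
# Note: Stamen's hosted map tiles have since moved to Stadia Maps, and newer ggmap releases provide get_stadiamap(), which requires a Stadia Maps API key.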