Lab 7

School: Syracuse University
Course: 687
Subject: Statistics
Date: Feb 20, 2024
Pages: 11

Uploaded by DeanTigerMaster997

10/12/23, 11:43 PM Lab7a.knit file:///C:/Users/mahaj/Downloads/IDS/Lab7a.html 1/11

Intro to Data Science - Lab 7
Copyright 2023, Jeffrey Stanton and Jeffrey Saltz
Please do not post online.

Week 7 - Using ggplot to Build Complex Data Displays

# Enter your name here: Swapnil Deore

Please include clear comments.

Instructions: Run the necessary code on your own instance of RStudio.

Attribution statement: (choose only one and delete the rest)
# 1. I did this lab assignment by myself, with help from the book and the professor.

Geology rocks but geography is where it's at... (famous dad joke). In a global economy, geography has an important influence on everything from manufacturing to marketing to transportation. As a result, most data scientists will have to work with map data at some point in their careers. An add-on to the ggplot2 package, called ggmap, provides powerful tools for plotting and shading maps. Make sure to install the maps, mapproj, and ggmap packages before running the following:

library(ggplot2); library(maps); library(ggmap); library(mapproj)
us <- map_data("state")
us$state_name <- tolower(us$region)
map <- ggplot(us, aes(map_id = state_name))
map <- map + aes(x = long, y = lat, group = group) + geom_polygon(fill = "white", color = "black")
map <- map + expand_limits(x = us$long, y = us$lat)
map <- map + coord_map() + ggtitle("USA Map")
map

1. Paste the code below and add a comment for each line, explaining what that line of code does.

#install.packages("maps")
#install.packages("ggmap")
#install.packages("mapproj")
library(ggplot2); library(maps); library(ggmap); library(mapproj)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, were retired in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## Google's Terms of Service: <https://mapsplatform.google.com>
## Please cite ggmap if you use it! Use `citation("ggmap")` for details.

us <- map_data("state") # map_data() converts the "state" outlines from the maps package into a data frame of polygon vertices
us$state_name <- tolower(us$region) # New column state_name: the state name (region) in lower case
map <- ggplot(us, aes(map_id = state_name)) # Create the plot object, linking each row to its state via map_id
map <- map + aes(x = long, y = lat, group = group) + geom_polygon(fill = "white", color = "black") # Draw each outline as a polygon with white fill and black border; group keeps separate outlines apart
map <- map + expand_limits(x = us$long, y = us$lat) # Extend the axis limits so the entire map is visible
map <- map + coord_map() + ggtitle("USA Map") # Apply a map projection (Mercator by default) and set the title
map # Display the map

2. The map you just created fills in the area of each state in white while outlining it with a thin black line. Use the fill= and color= arguments inside the call to geom_polygon( ) to reverse the color scheme.
Now paste and run the following code:

ny_counties <- map_data("county", "new york")
ggplot(ny_counties) + aes(long, lat, group = group) + geom_polygon(fill = "white", color = "black")

map <- map + geom_polygon(fill = "black", color = "white") # Add a black-filled, white-bordered polygon layer on top of the original white one
map # Display the reversed map

ny_counties <- map_data("county", "new york") # County outlines for New York State
ggplot(ny_counties) + aes(long, lat, group = group) + geom_polygon(fill = "white", color = "black") # White counties with black borders
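Both maps set group = group. In map_data() output each row is one vertex, and group marks which vertices belong to the same closed outline (a state or county can have several pieces, such as islands). Without it, geom_polygon() would join every vertex into a single shape. The base-R sketch below uses a small, made-up vertex table (the hypothetical data frame `toy`) in the same long format to show how the rows split into separate polygons.

```r
# Hypothetical vertex table in the same long format as map_data() output:
# one row per vertex, `group` identifies each closed polygon and `order`
# gives the drawing order. All values are made up for illustration.
toy <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 2.5),
  lat   = c(0, 0, 1, 1, 0, 0, 1),
  group = c(1, 1, 1, 1, 2, 2, 2),
  order = c(1, 2, 3, 4, 5, 6, 7)
)

# Split the vertices into one data frame per polygon, which is what the
# group aesthetic asks geom_polygon() to do.
pieces <- split(toy, toy$group)

# Number of vertices in each polygon.
vertex_counts <- sapply(pieces, nrow)
print(vertex_counts)
```

Each element of `pieces` is one closed outline, drawn in `order`.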
3. Just as in step 2, the map you just created fills in the area of each county in white while outlining it with thin black lines. Use the fill= and color= arguments inside the call to geom_polygon( ) to reverse the color scheme.

ny_counties <- map_data("county", "new york")
ggplot(ny_counties) + aes(long, lat, group = group) + geom_polygon(fill = "black", color = "white") # Black counties with white borders
4. Run head(ny_counties) to verify what the county outline data looks like.

head(ny_counties)
##        long      lat group order   region subregion
## 1 -73.78550 42.46763     1     1 new york    albany
## 2 -74.25533 42.41034     1     2 new york    albany
## 3 -74.25533 42.41034     1     3 new york    albany
## 4 -74.27252 42.41607     1     4 new york    albany
## 5 -74.24960 42.46763     1     5 new york    albany
## 6 -74.22668 42.50774     1     6 new york    albany

5. Make a copy of your code from step 3 and add the following subcommand to your ggplot( ) call (don't forget to put a plus sign after the geom_polygon( ) statement to tell R that you are continuing to build the command):

coord_map(projection = "mercator")

In what way is the map different from the previous map? Be prepared to explain what a Mercator projection is.

ny_counties <- map_data("county", "new york")
ggplot(ny_counties) + aes(long, lat, group = group) + geom_polygon(fill = "black", color = "white") + coord_map(projection = "mercator")
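To make the Mercator answer concrete, here is a small base-R sketch of the standard spherical Mercator formulas: x is proportional to longitude, y = ln(tan(pi/4 + phi/2)) for latitude phi in radians, and the local scale factor is 1/cos(phi). The function names `mercator_y` and `mercator_scale` are illustrative, and the latitudes used are approximate bounds for New York State, not values taken from the lab data.

```r
# Spherical Mercator: projected y for a latitude given in degrees.
mercator_y <- function(lat_deg) {
  phi <- lat_deg * pi / 180
  log(tan(pi / 4 + phi / 2))
}

# Local scale factor: Mercator stretches distances by 1 / cos(phi) in
# every direction, so local angles are kept but areas grow with latitude.
mercator_scale <- function(lat_deg) 1 / cos(lat_deg * pi / 180)

# Approximate southern and northern latitudes of New York State.
lats <- c(40.5, 45)
print(round(mercator_scale(lats), 3))
```

Because the stretch is the same in both directions at a given latitude, shapes look right locally, while a region near 60 degrees is drawn at twice the linear scale of the equator.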
# Compared with the previous map, which plotted raw longitude and latitude on ordinary Cartesian axes, coord_map() fixes the aspect ratio according to the projection, so the shape of New York is no longer stretched to fill the plotting area.
# Mercator is a cylindrical, conformal projection: it preserves local angles (and therefore local shapes), which is why it is widely used for world and web maps, but it increasingly exaggerates areas toward the poles.

6. Grab a copy of the nyData.csv data set from: https://intro-datascience.s3.us-east-2.amazonaws.com/nyData.csv

Read that data set into R with read_csv(). This requires that you have installed and loaded the tidyverse package. The next step assumes that you have named the resulting data frame nyData.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## dplyr     1.1.3     readr     2.1.4
## forcats   1.0.0     stringr   1.5.0
## lubridate 1.9.2     tibble    3.2.1
## purrr     1.0.2     tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## dplyr::filter() masks stats::filter()
## dplyr::lag()    masks stats::lag()
## purrr::map()    masks maps::map()
## Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nyData <- read_csv("https://intro-datascience.s3.us-east-2.amazonaws.com/nyData.csv")
## Rows: 62 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): county
## num (4): pop2010, pop2000, sqMiles, popDen
##
## Use `spec()` to retrieve the full column specification for this data.
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
#nyData

7. Next, merge your ny_counties data from the first set of questions with your new nyData data frame, with this code:

mergeNY <- merge(ny_counties, nyData, all.x = TRUE, by.x = "subregion", by.y = "county")

mergeNY <- merge(ny_counties, nyData, all.x = TRUE, by.x = "subregion", by.y = "county") # Keep every county vertex (all.x = TRUE) and attach the matching county statistics
#mergeNY

8. Run head(mergeNY) to verify how the merged data looks.

head(mergeNY)
##   subregion      long      lat group order   region pop2010 pop2000 sqMiles
## 1    albany -73.78550 42.46763     1     1 new york  304204  294565   522.8
## 2    albany -74.25533 42.41034     1     2 new york  304204  294565   522.8
## 3    albany -74.25533 42.41034     1     3 new york  304204  294565   522.8
## 4    albany -74.27252 42.41607     1     4 new york  304204  294565   522.8
## 5    albany -74.24960 42.46763     1     5 new york  304204  294565   522.8
## 6    albany -74.22668 42.50774     1     6 new york  304204  294565   522.8
##   popDen
## 1 581.87
## 2 581.87
## 3 581.87
## 4 581.87
## 5 581.87
## 6 581.87

9. Now drive the fill color inside each county by adding the fill aesthetic inside of your geom_polygon( ) subcommand (fill based on pop2000).

ggplot(mergeNY) + aes(long, lat, group = group) + geom_polygon(aes(fill = pop2000))
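One detail worth checking before step 9: merge() returns its rows sorted by the key column (here the county name), not in the original vertex drawing order, and geom_polygon() connects vertices in row order. A common safeguard is to re-sort by the order column after merging. The sketch below uses hypothetical mini tables `vertices` and `counties` (made-up numbers) with the same call shape as step 7.

```r
# Hypothetical mini versions of ny_counties (one row per vertex) and nyData
# (one row per county). All numbers are made up for illustration.
vertices <- data.frame(
  subregion = c("queens", "queens", "queens", "albany", "albany", "albany"),
  group     = c(1, 1, 1, 2, 2, 2),
  order     = c(1, 2, 3, 4, 5, 6)
)
counties <- data.frame(county = c("albany", "queens"), pop2000 = c(100, 200))

# Same call shape as step 7: keep every vertex, attach county columns.
merged <- merge(vertices, counties, all.x = TRUE,
                by.x = "subregion", by.y = "county")

# Sorted by subregion, so albany (order 4-6) now comes before queens.
print(is.unsorted(merged$order))   # TRUE

# Restore the drawing order before handing the rows to geom_polygon().
merged <- merged[order(merged$order), ]
print(is.unsorted(merged$order))   # FALSE
```

Applied to the lab, the same idea would be `mergeNY <- mergeNY[order(mergeNY$order), ]` right after the merge.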
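10. Create a bar chart using ggplot (each county is a bar, the height should be based on pop2000).

# Use nyData (one row per county): mergeNY has one row per polygon vertex, so geom_bar(stat = "identity") on it stacks each county's pop2000 once per vertex and inflates every bar
ggplot(nyData, aes(x = county, y = pop2000)) + geom_col(fill = "skyblue") + labs(title = "County Population (2000)", x = "County", y = "Population in 2000") + theme(axis.text.x = element_text(angle = 90, hjust = 1))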
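Why the source data frame matters for the bar chart: with stat = "identity", geom_bar() stacks every row that shares an x value, so bars built from the merged vertex table have height pop2000 times the number of vertices. The base-R sketch below reproduces that effect on a hypothetical table `merged` (made-up numbers) and shows that keeping one row per county restores the intended heights.

```r
# Hypothetical merged table: 3 vertices for albany, 2 for queens, with the
# county population repeated on every vertex row (made-up numbers).
merged <- data.frame(
  subregion = c("albany", "albany", "albany", "queens", "queens"),
  pop2000   = c(100, 100, 100, 200, 200)
)

# Stacking every row is equivalent to summing pop2000 within each county.
stacked <- tapply(merged$pop2000, merged$subregion, sum)
print(stacked)   # albany 300, queens 400: inflated by the vertex count

# Keeping one row per county gives the intended bar heights.
one_per_county <- unique(merged[, c("subregion", "pop2000")])
print(one_per_county)
```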
11. In a comment, compare the visualizations in 9 and 10. Is one easier to understand? (You must explain why.)

# The map in step 9 is easier to understand than the bar chart in step 10. The map places each county's population in its geographic context, so the most populous areas stand out at a glance. The bar chart has to squeeze 62 county names along one axis and gives no sense of where each county is, which makes it hard to read.

12. Extra (not required):
a. Read in the following JSON datasets:
https://gbfs.citibikenyc.com/gbfs/en/station_information.json
https://gbfs.citibikenyc.com/gbfs/en/station_status.json
b. Merge the datasets, based on station_id.
c. Clean the merged dataset to only include useful information. For this work, you only need lat, lon and the number of bikes available.
d. Create a Stamen map using get_stamenmap(). Have the limits of the map be defined by the lat and lon of the stations.
e. Show the stations as points on the map.
f. Show the number of bikes available as a color.

library(jsonlite)
##
## Attaching package: 'jsonlite'
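## The following object is masked from 'package:purrr':
##
##     flatten

info <- fromJSON('https://gbfs.citibikenyc.com/gbfs/en/station_information.json') # Station locations
status <- fromJSON('https://gbfs.citibikenyc.com/gbfs/en/station_status.json') # Live station availability
merged_data <- merge(info$data$stations, status$data$stations, by = "station_id") # Join the two tables on station_id
cleaned_data <- merged_data %>% select(station_id, lat, lon, num_bikes_available) # Keep only the columns needed for the map

#library(leaflet)
#install.packages("ggmap")
#library(stamenmap)

# get_stamenmap() expects the bounding box as c(left, bottom, right, top), i.e. longitude before latitude
map_limits <- c(left = min(cleaned_data$lon), bottom = min(cleaned_data$lat), right = max(cleaned_data$lon), top = max(cleaned_data$lat))
stamen_map <- get_stamenmap(map_limits, zoom = 12) # City-scale zoom; zoom = 8 gives a very coarse image for an area this small
## Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
# Note: Stamen tiles moved to Stadia Maps in late 2023; newer ggmap versions provide get_stadiamap(), which needs an API key set with register_stadiamaps()

# Start from ggmap(stamen_map) so the tiles are drawn, then add the stations colored by available bikes
ggmap(stamen_map) + geom_point(data = cleaned_data, aes(x = lon, y = lat, color = num_bikes_available), size = 0.8) + scale_color_gradient(low = "red", high = "yellow") + labs(title = "Citibike Stations on Map", color = "Bikes available")

# Why the first attempt looked black: the tiles in stamen_map were never drawn (the plot started from ggplot(cleaned_data, ...) instead of ggmap(stamen_map)), the bounding box listed latitude before longitude, and a black station_id text label on every one of the stations covered the plot.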