Lab5.knit

School

Syracuse University

Course

687

Subject

Computer Science

Date

Feb 20, 2024

Type

pdf

Pages

7

9/28/23, 10:55 PM Lab5.knit file:///C:/Users/morea/Downloads/Lab5.html 1/7

Intro to Data Science - Lab 5

Copyright 2022, Jeffrey Stanton and Jeffrey Saltz. Please do not post online.

Week 5 - Obtaining and Using Data from a JSON API

# Enter your name here: Adesh More

Please include nice comments.

Instructions: Run the necessary code on your own instance of R-Studio.

Attribution statement: (choose only one and delete the rest)

# 1. I did this lab assignment by myself, with help from the book and the professor.

JSON (JavaScript Object Notation) provides a transparent, user-friendly data exchange format that many organizations use to make their data available over the web. JSON is human readable, but is also highly structured, and its support for nested data structures makes it highly flexible.

Today we will use JSON to obtain data from the NYC CitiBike project. The CitiBike project provides an application programming interface (API) that members of the public can access to get up-to-date information on the status of more than 800 bike stations. You may need to install the RCurl and jsonlite packages to get the code to work.

    station_link <- 'https://gbfs.citibikenyc.com/gbfs/en/station_status.json'
    apiOutput <- getURL(station_link)
    apiData <- fromJSON(apiOutput)
    stationStatus <- apiData$data$stations
    cols <- c('num_bikes_disabled','num_docks_disabled', 'station_id',
              'num_ebikes_available', 'num_bikes_available', 'num_docks_available')
    stationStatus = stationStatus[,cols]

1. Explain what you see if you type the station_link URL into a browser (in a comment, write what you see)

    #It is a JSON file. It contains key-value pairs. It gives information about the CitiBike stations.

2. Paste the code from above here and provide a comment explaining each line of code.

    #install.packages("RCurl")
    library(RCurl)
    library(jsonlite)
    station_link <- 'https://gbfs.citibikenyc.com/gbfs/en/station_status.json' #store the feed URL in a variable
    apiOutput <- getURL(station_link) #download the raw JSON text from the URL
    apiData <- fromJSON(apiOutput) #parse the JSON text into an R list
    stationStatus <- apiData$data$stations #extract the data frame of stations
    cols <- c('num_bikes_disabled','num_docks_disabled', 'station_id',
              'num_ebikes_available', 'num_bikes_available', 'num_docks_available') #names of the columns to keep
    stationStatus = stationStatus[,cols] #keep only the columns named in cols

3. Use str( ) to find out the structure of apiOutput and apiData. Report (via a comment) what you found and explain the difference between these two objects.

    #str(stationStatus)
    str(apiData)
    ## List of 4
    ##  $ data        :List of 1
    ##   ..$ stations:'data.frame': 2097 obs. of 14 variables:
    ##   .. ..$ is_installed             : int [1:2097] 0 0 0 0 0 0 0 0 1 1 ...
    ##   .. ..$ last_reported            : int [1:2097] 86400 86400 86400 86400 86400 86400 86400 86400 1695955827 1695955826 ...
    ##   .. ..$ num_bikes_available      : int [1:2097] 0 0 0 0 0 0 0 0 6 8 ...
    ##   .. ..$ num_docks_disabled       : int [1:2097] 0 0 0 0 0 0 0 0 0 0 ...
    ##   .. ..$ is_returning             : int [1:2097] 0 0 0 0 0 0 0 0 1 1 ...
    ##   .. ..$ is_renting               : int [1:2097] 0 0 0 0 0 0 0 0 1 1 ...
    ##   .. ..$ station_id               : chr [1:2097] "1854538387432581604" "1851254468343716806" "c638ec67-9ac0-416f-944f-619926144931" "b442a648-e9f4-4893-951a-64d258bc0e55" ...
    ##   .. ..$ legacy_id                : chr [1:2097] "1854538387432581604" "1851254468343716806" "4675" "3699" ...
    ##   .. ..$ num_ebikes_available     : int [1:2097] 0 0 0 0 0 0 0 0 2 1 ...
    ##   .. ..$ eightd_has_available_keys: logi [1:2097] FALSE FALSE FALSE FALSE FALSE FALSE ...
    ##   .. ..$ num_bikes_disabled       : int [1:2097] 0 0 0 0 0 0 0 0 1 0 ...
    ##   .. ..$ num_docks_available      : int [1:2097] 0 0 0 0 0 0 0 0 16 12 ...
    ##   .. ..$ num_scooters_available   : int [1:2097] NA NA NA NA NA NA NA NA 0 0 ...
    ##   .. ..$ num_scooters_unavailable : int [1:2097] NA NA NA NA NA NA NA NA 0 0 ...
    ##  $ last_updated: int 1695955971
    ##  $ ttl         : int 60
    ##  $ version     : chr "1.1"

    str(apiOutput)
    ## chr "{\"data\": {\"stations\": [{\"is_installed\": 0, \"last_reported\": 86400, \"num_bikes_available\": 0, \"num_do"| __truncated__

    #apiOutput is a single character string holding the raw JSON text, whereas apiData is that
    #text parsed into a list of 4 elements (data, last_updated, ttl, version).
    #apiData$data$stations is a data frame of 2097 stations and 14 variables.

4. The apiOutput object can also be examined with a custom function from the jsonlite package called prettify( ). Run this command and explain what you found (in a comment).

    library(jsonlite)
    #prettify(apiOutput)
    #prettify() adds indentation and line breaks to the JSON text, which makes it much easier to read.

5. Explain stationStatus (what type of object, what information is available).

    str(stationStatus)
    ## 'data.frame': 2097 obs. of 6 variables:
    ##  $ num_bikes_disabled  : int 0 0 0 0 0 0 0 0 1 0 ...
    ##  $ num_docks_disabled  : int 0 0 0 0 0 0 0 0 0 0 ...
    ##  $ station_id          : chr "1854538387432581604" "1851254468343716806" "c638ec67-9ac0-416f-944f-619926144931" "b442a648-e9f4-4893-951a-64d258bc0e55" ...
    ##  $ num_ebikes_available: int 0 0 0 0 0 0 0 0 2 1 ...
    ##  $ num_bikes_available : int 0 0 0 0 0 0 0 0 6 8 ...
    ##  $ num_docks_available : int 0 0 0 0 0 0 0 0 16 12 ...

    #It is a data frame with 2097 observations (stations) and 6 variables/columns:
    #5 integer columns and 1 character column (station_id).

6. Generate a histogram of the number of docks available

    hist(stationStatus$num_docks_available)
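
The contrast in questions 3 to 5 between raw JSON text, the parsed list, and the stations data frame can be reproduced offline on a tiny payload. This is only a sketch: the two-station JSON string below is hypothetical (invented values shaped loosely like the feed), and it assumes the jsonlite package is installed.

```r
library(jsonlite)

# Hypothetical two-station payload; the values are invented for illustration.
raw_json <- '{"data": {"stations": [
  {"station_id": "a1", "num_bikes_available": 6, "num_docks_available": 16},
  {"station_id": "b2", "num_bikes_available": 0, "num_docks_available": 12}
]}, "ttl": 60}'

# Before parsing, the JSON is just one character string.
class(raw_json)

# fromJSON() turns the text into nested R objects.
parsed <- fromJSON(raw_json)
class(parsed)

# With jsonlite's default simplification, an array of objects becomes a data frame.
class(parsed$data$stations)
str(parsed$data$stations)
```

Running str() on raw_json and on parsed shows the same kind of difference as str(apiOutput) and str(apiData) above, just at a size that fits on one screen.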

7. Generate a histogram of the number of bikes available

    hist(stationStatus$num_bikes_available)
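
The two histograms above use hist() with its defaults. As a hedged sketch of how the bins and labels can be controlled, the example below uses a small hypothetical vector in place of the feed data (the numbers are invented) and sets explicit breaks, a title, and an axis label.

```r
# Hypothetical counts standing in for num_docks_available; not real feed data.
docks <- c(0, 0, 3, 5, 8, 12, 12, 16, 19, 27)

# Explicit breaks give bins [0,10], (10,20], (20,30].
h <- hist(docks,
          breaks = seq(0, 30, by = 10),
          main = "Docks available per station (toy data)",
          xlab = "num_docks_available")

# hist() also returns the bin counts, which is handy for checking the picture.
h$counts   # 5 4 1
```

Inspecting h$counts alongside the plot makes it easier to describe the distribution in a comment, for example how many stations fall in the lowest bin.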

8. How many stations have at least one ebike?

    OneBike <- sum(stationStatus$num_ebikes_available>0)
    OneBike
    ## [1] 1547

9. Explore stations with at least one ebike and create a new dataframe that only has stations with at least one eBike.

    #AtleastOne <- c(stationStatus$num_ebikes_available > 0)
    #NewStation <- stationStatus[AtleastOne,]
    stations_with_ebikes <- stationStatus[stationStatus$num_ebikes_available > 0, ]

10. Calculate the mean of num_docks_available for this new dataframe.

    mean(stations_with_ebikes$num_docks_available)
    ## [1] 11.76147

11. Calculate the mean of num_docks_available for the full stationStatus dataframe. In a comment, explain how different the two means are.

    mean(stationStatus$num_docks_available) #the mean the question asks for (its output was not recorded in this run)
    mean(stationStatus$num_bikes_available)
    ## [1] 14.89413

    # The recorded value 14.89413 is the mean of num_bikes_available, not num_docks_available.
    # It is taken over all 2097 stations, including stations with zero bikes available, whereas
    # the previous mean (11.76147) covers only the stations with at least one ebike.

12. Create a new attribute, called stationSize, which is the total number of slots available for a bike (that might, or might not, have a bike in it now). Just add together these fields: stationStatus$num_bikes_available, stationStatus$num_bikes_disabled, stationStatus$num_docks_disabled. Run a histogram on this variable and review the distribution.

    # num_ebikes_available is included here although the prompt does not list it; if ebikes are
    # already counted in num_bikes_available, this sum counts them twice.
    stationStatus$stationSize <- stationStatus$num_ebikes_available +
      stationStatus$num_bikes_available +
      stationStatus$num_docks_available +
      stationStatus$num_bikes_disabled +
      stationStatus$num_docks_disabled
    hist(stationStatus$stationSize)

13. Use the plot( ) command to produce an X-Y scatter plot with the number of occupied docks on the X-axis and the number of available bikes on the Y-axis. Explain the results plot.

    # The x-axis here is num_docks_available (empty docks), not a count of occupied docks.
    plot(x = stationStatus$num_docks_available, y = stationStatus$num_bikes_available)

    #Many stations have between 0 and 40 available docks and between 0 and 40 available bikes.
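
The counting, subsetting, mean comparison, and derived-column steps from questions 8 to 12 can be checked by hand on a toy data frame. The sketch below is not the lab data: the four stations and their values are hypothetical, and the choice of fields summed into stationSize (available bikes, available docks, and both disabled counts) is an assumption about what counts as a slot.

```r
# Hypothetical four-station data frame using the lab's column names; values are invented.
toy <- data.frame(
  station_id           = c("a", "b", "c", "d"),
  num_ebikes_available = c(0, 2, 1, 0),
  num_bikes_available  = c(0, 6, 8, 3),
  num_docks_available  = c(0, 16, 12, 20),
  num_bikes_disabled   = c(0, 1, 0, 0),
  num_docks_disabled   = c(0, 0, 0, 1)
)

# A logical vector marks the rows of interest; sum() counts the TRUE values.
has_ebike <- toy$num_ebikes_available > 0
n_with_ebike <- sum(has_ebike)                        # 2

# The same logical vector selects rows to build the subset.
with_ebike <- toy[has_ebike, ]

# Compare the same column on the subset and on the full data frame.
mean_subset <- mean(with_ebike$num_docks_available)   # (16 + 12) / 2 = 14
mean_all    <- mean(toy$num_docks_available)          # (0 + 16 + 12 + 20) / 4 = 12

# Derived column: assumed definition of station size (see the note above).
toy$stationSize <- toy$num_bikes_available + toy$num_docks_available +
  toy$num_bikes_disabled + toy$num_docks_disabled

toy[, c("station_id", "stationSize")]
```

Comparing mean_subset with mean_all uses the same column in both calls, which is what makes the two numbers comparable.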