
Problem Set 7

Tony Zhang

3/13/2024

# Use Headers

Use headers to organize your document. The first-level heading is denoted by a single pound sign (hash tag), `#`. Each new problem or exercise should get a Level 1 heading. For subparts, increase the heading level by adding more hash tags. For example, if Problem 1 has Parts A (with parts i and ii) and B, your R Markdown file would contain the following:

```
# Problem 1
[text]
## Part A
[text]
### Part i
[text]
### Part ii
[text]
## Part B
[text]
```

# Code

There are two ways to include code in your document: inline and chunks.

## Inline Code

To add inline code, type a grave mark `` ` `` (the key to the left of the numeral 1 key), followed by a lowercase r, a space, the R commands you wish to run, and a final grave. For example, `` `r nrow(dataFrame)` `` would return the number of rows in the data frame named "dataFrame". Inline code is good for calling values you have stored and doing quick calculations on those values. Inline code will not be added to the Code Appendix.

## Code Chunks

For more complicated code, such as data manipulation and cleaning, creating graphs or tables, and model building and testing, you'll want to use code chunks. You can create one in two ways:

- Click the Insert button just above RStudio's editor pane (its icon is a white circle with a green plus sign next to a green square with a white C) and select R from the drop-down list.
- Type three graves in a row, press Return twice, and type three more graves. The editor should shade those three lines gray. Write your code starting on the middle blank line.

In the first line, right after the third grave, set the chunk options, including the coding language and chunk name, as well as any other options (e.g., figure caption and dimensions).

# Mathematics

To type mathematical formulas, use LaTeX commands. For inline mathematics, enclose the expression between `$` and `$`. For display math (on its own line and centered), enclose the expression between `\[` and `\]`.

The following code automatically creates your Code Appendix by gathering all of your code chunks and writing that code here. Take a moment to look through the appendix and make sure that your code is fully readable. Use comments in your code to mark what each piece of code does.
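As an illustration of the inline (`$`) and display (`\[`, `\]`) delimiters, the 5% population threshold from Problem 11.5, Part 3 could be typed as follows. The symbols $p_{\text{city}}$ and $p_{\text{state}}$ are illustrative names chosen here, not notation from the assignment.

```latex
A city qualifies when $p_{\text{city}} / p_{\text{state}} > 0.05$.

\[
  \text{Ratio} = \frac{p_{\text{city}}}{p_{\text{state}}} > 0.05
\]
```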

# Code Appendix

```r
# This template file is based off of a template created by Alex Hayes
# https://github.com/alexpghayes/rmarkdown_homework_template

# Setting Document Options
knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.align = "center"
)

# Load the data
install.packages("dcData", repos = "http://cran.us.r-project.org")
library(dcData)
data("ZipGeography")

# Problem 11.5, Part 1: number of distinct county names
num_counties <- length(unique(ZipGeography$County))
num_counties

# Part 2: number of states in which each city name appears
city_states_count <- table(ZipGeography$CityName, ZipGeography$State)
city_states_summary <- data.frame(
  CityName = rownames(city_states_count),
  NumStates = rowSums(city_states_count > 0)
)
city_states_summary <- city_states_summary[trimws(city_states_summary$CityName) != "", ]
city_states_summary <- city_states_summary[order(-city_states_summary$NumStates), ]
head(city_states_summary)

# Part 3: cities holding more than 5% of their state's population
state_populations <- aggregate(Population ~ State, data = ZipGeography, sum)
city_state_populations <- aggregate(Population ~ CityName + State, data = ZipGeography, sum)
city_state_population_ratio <- merge(city_state_populations, state_populations, by = "State")
city_state_population_ratio$Ratio <-
  city_state_population_ratio$Population.x / city_state_population_ratio$Population.y
cities_over_5_percent <- subset(city_state_population_ratio, Ratio > 0.05)
result_df <- data.frame(CityName = character(), NumStates = numeric())
for (state in unique(cities_over_5_percent$State)) {
  state_population <- subset(state_populations, State == state)$Population
  cities_in_state <- subset(cities_over_5_percent, State == state & Ratio > 0.05)
  result_df <- rbind(result_df, data.frame(
    CityName = cities_in_state$CityName,
    NumStates = nrow(cities_in_state)
  ))
}
result_df <- result_df[order(-result_df$NumStates), ]
head(result_df)

# Part 4: states with more than one time zone
state_timezones <- aggregate(Timezone ~ State, data = ZipGeography,
                             FUN = function(x) length(unique(x)))
total_states_multiple_timezones <- sum(state_timezones$Timezone > 1)
total_states_multiple_timezones

# Part 5: city names with more than one time zone
city_timezones <- aggregate(Timezone ~ CityName, data = ZipGeography,
                            FUN = function(x) length(unique(x)))
total_cities_multiple_timezones <- sum(city_timezones$Timezone > 1)
total_cities_multiple_timezones

# Part 6: county names with more than one time zone
county_timezones <- aggregate(Timezone ~ County, data = ZipGeography,
                              FUN = function(x) length(unique(x)))
total_counties_multiple_timezones <- sum(county_timezones$Timezone > 1)
total_counties_multiple_timezones
```

# Problem 11.1

The join family of data verbs requires a data table as one of its arguments because a join combines two data tables, matching their cases on common variables (keys).
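To see why a join needs a second data table rather than a single variable, here is a minimal sketch using base R's `merge()` on two small illustrative tables (`codes` and `capitals` are hypothetical names, not objects from the assignment). The tables share the key variable `country` but differ in length and row order, and the join still lines the rows up by key.

```r
# Two small illustrative tables that share the key variable "country"
codes <- data.frame(
  country = c("Canada", "Mexico", "United States"),
  iso_a3  = c("CAN", "MEX", "USA")
)
capitals <- data.frame(
  country = c("United States", "Canada"),  # different length and row order
  capital = c("Washington", "Ottawa")
)

# Inner join: keep only countries that appear in both tables
joined <- merge(codes, capitals, by = "country")
print(joined)

# Left join: keep every row of `codes`; a missing capital becomes NA
left_joined <- merge(codes, capitals, by = "country", all.x = TRUE)
print(left_joined)
```

In dplyr, `inner_join()` and `left_join()` express the same two joins; in either form the second table is a required argument.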


# Problem 11.2

The tables cannot be combined correctly by placing them side by side, because they have different numbers of rows and list the countries in a different order. A proper combination would use a join operation that matches rows on a common identifier, such as the country.

# Problem 11.5

## Part 1

```r
num_counties <- length(unique(ZipGeography$County))
num_counties
```

```
## [1] 1910
```

## Part 2

```r
city_states_count <- table(ZipGeography$CityName, ZipGeography$State)
city_states_summary <- data.frame(
  CityName = rownames(city_states_count),
  NumStates = rowSums(city_states_count > 0)
)
city_states_summary <- city_states_summary[trimws(city_states_summary$CityName) != "", ]
city_states_summary <- city_states_summary[order(-city_states_summary$NumStates), ]
head(city_states_summary)
```

```
##              CityName NumStates
## Franklin     Franklin        27
## Clinton       Clinton        26
## Madison       Madison        26
## Washington Washington        26
## Chester       Chester        24
## Greenville Greenville        24
```

The city names used in the most states are Franklin (27 states); Clinton, Madison, and Washington (26 each); and Chester and Greenville (24 each).

## Part 3

```r
state_populations <- aggregate(Population ~ State, data = ZipGeography, sum)
city_state_populations <- aggregate(Population ~ CityName + State, data = ZipGeography, sum)
city_state_population_ratio <- merge(city_state_populations, state_populations, by = "State")
city_state_population_ratio$Ratio <-
  city_state_population_ratio$Population.x / city_state_population_ratio$Population.y
cities_over_5_percent <- subset(city_state_population_ratio, Ratio > 0.05)
result_df <- data.frame(CityName = character(), NumStates = numeric())
for (state in unique(cities_over_5_percent$State)) {
  state_population <- subset(state_populations, State == state)$Population
  cities_in_state <- subset(cities_over_5_percent, State == state & Ratio > 0.05)
  result_df <- rbind(result_df, data.frame(
    CityName = cities_in_state$CityName,
    NumStates = nrow(cities_in_state)
  ))
}
result_df <- result_df[order(-result_df$NumStates), ]
head(result_df)
```

```
##    CityName NumStates
## 1    Aguada         9
## 2 Aguadilla         9
## 3  Carolina         9
## 4  Guaynabo         9
## 5   Hatillo         9
## 6  San Juan         9
```

The city names that account for more than 5% of their state's population and are used in the most states are Aguada, Aguadilla, Carolina, Guaynabo, Hatillo, and San Juan.

## Part 4

```r
state_timezones <- aggregate(Timezone ~ State, data = ZipGeography,
                             FUN = function(x) length(unique(x)))
total_states_multiple_timezones <- sum(state_timezones$Timezone > 1)
total_states_multiple_timezones
```

```
## [1] 41
```

There are 41 states with more than one time zone.

## Part 5

```r
city_timezones <- aggregate(Timezone ~ CityName, data = ZipGeography,
                            FUN = function(x) length(unique(x)))
total_cities_multiple_timezones <- sum(city_timezones$Timezone > 1)
total_cities_multiple_timezones
```

```
## [1] 3042
```

There are 3042 city names that appear with more than one time zone.

## Part 6

```r
county_timezones <- aggregate(Timezone ~ County, data = ZipGeography,
                              FUN = function(x) length(unique(x)))
total_counties_multiple_timezones <- sum(county_timezones$Timezone > 1)
total_counties_multiple_timezones
```

```
## [1] 386
```

There are 386 county names that appear with more than one time zone.

# Problem 11.6

## Part 1

To create a data table from MigrationFlows, you can use the `rename()` function.

## Part 2

The join matches `countryA` in the first table with `countryB` in the second table, and `countryB` in the first table with `countryA` in the second table.
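Matching `countryA` with `countryB` and vice versa amounts to joining a table with itself on swapped keys. A minimal sketch with base R's `merge()`, assuming a hypothetical toy table `flows` with columns `countryA`, `countryB`, and a made-up count `migrants` (these are not the actual MigrationFlows columns or values):

```r
# Hypothetical toy table: one row per ordered (countryA, countryB) pair
flows <- data.frame(
  countryA = c("USA", "MEX", "USA", "CAN"),
  countryB = c("MEX", "USA", "CAN", "USA"),
  migrants = c(100, 900, 50, 300)  # made-up values
)

# Pair each A -> B row with the B -> A row of the same table:
# x's countryA is matched to y's countryB, and x's countryB to y's countryA
paired <- merge(
  flows, flows,
  by.x = c("countryA", "countryB"),
  by.y = c("countryB", "countryA"),
  suffixes = c(".AtoB", ".BtoA")
)
print(paired)
```

With dplyr, the same pairing could be written as `inner_join(flows, flows, by = c("countryA" = "countryB", "countryB" = "countryA"))`.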

## Part 3

The variables matched for `infantA` are `iso_a3` from HealthIndicators and `countryA` from the glyph-ready table.

## Part 4

The variables matched for `infantB` are `iso_a3` from HealthIndicators and `countryB` from the glyph-ready table.
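Because `infantA` and `infantB` come from the same health table matched on different keys, that table is joined twice and the new column is renamed after each join. A minimal sketch with base R's `merge()`, using hypothetical toy tables `glyph_ready` and `health` whose column names follow the answers above; the `infant` values are placeholders, not real data:

```r
# Hypothetical glyph-ready table: one row per country pair
glyph_ready <- data.frame(
  countryA = c("USA", "MEX"),
  countryB = c("MEX", "CAN")
)
# Hypothetical health table keyed by iso_a3 (placeholder values)
health <- data.frame(
  iso_a3 = c("USA", "MEX", "CAN"),
  infant = c(1, 2, 3)
)

# infantA: match iso_a3 against countryA, then rename the new column
with_a <- merge(glyph_ready, health, by.x = "countryA", by.y = "iso_a3", all.x = TRUE)
names(with_a)[names(with_a) == "infant"] <- "infantA"

# infantB: the same health table, this time matched against countryB
with_b <- merge(with_a, health, by.x = "countryB", by.y = "iso_a3", all.x = TRUE)
names(with_b)[names(with_b) == "infant"] <- "infantB"
print(with_b)
```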
