2023_f_hw13_prompt_v1-1

School

University of California, Santa Barbara

Course

145

Subject

Economics

Date

Feb 20, 2024

Econ 145 - Fall 2023

Homework 13: Long: A Lot of Merging!

Overview

The goal of this assignment is to analyze the relationship between the performance and salary of NBA players. There are five datasets for this assignment:

  • player_salary: contains salary per season for players who played in the NBA between 2015 and 2020, scraped from the ESPN website.
  • CPI_U_minneapolis_fed: contains CPI values between 2000 and 2023 from the Minneapolis Fed.
  • nba_games_season: contains NBA game dates and seasons.
  • nba_games_perf_points: contains the number of points scored per game for players who played in the NBA between 2015 and 2020.
  • nba_games_perf_minutes: contains the number of minutes played per game for players who played in the NBA between 2015 and 2020.

To Receive Credit

  • Save the scripting file (i.e. your R program file) as assignment_13.R. Make sure your capitalization is correct, as the autograder is case-sensitive.
  • Save your PERMID at the top of the R script (i.e. PERMID xxxx).
  • Be sure to include your first name, last name, and perm number on your one-page write-up.
  • Make sure all changes to the original dataset are done within the R script.
  • Your one-page write-up must be submitted as a .pdf to receive credit.

Grading on Coding Questions

Grading on the coding portion of the homework comes in two types of questions: Public Questions and Private Questions.

Public Questions can be submitted to the autograder as many times as you like, and the autograder will give detailed feedback. Additionally, the TAs will help you with the Public Questions on Nectir and in office hours.

Private Questions, on the other hand, can be thought of as a mini quiz within the homework. You can still upload your answer as many times as you want, but the autograder will not provide any feedback, and the TAs and Professor Startz will not provide any guidance or assistance (getting advice from classmates on Nectir or elsewhere is completely okay).

Private Questions will be marked on the homework assignment.

Copyright UCSB 2023
Things that can break the autograder:

  • Please read the autograder_instructions.pdf file on Canvas.
  • For each homework assignment, words colored in magenta indicate a variable/vector/tibble that will be graded by the autograder. Pay close attention to these colored texts and be sure not to miss any.
  • You always always always need to include PERMID xxx to receive credit. (Use your real permid, not “xxx”, but you knew that.)

    rm(list = ls())  # clear the environment
    setwd(dirname(rstudioapi::getSourceEditorContext()$path))
    #-------Import necessary packages here-------------------#
    # This is how you load a package in R
    #------ Uploading PERMID --------------------------------#
    PERMID <- "ABC1234"  # Type your PERMID with the quotation marks
    PERMID <- as.numeric(gsub("\\D", "", PERMID))  # Don't touch
    set.seed(PERMID)  # Don't touch
    #------- Answers ----------------------------------------#

Part 1: Coding assignment

Go to the data website for the class and download the four datasets related to NBA players described in the overview section of this homework. Then go to Canvas and download the CPI_U_minneapolis_fed.csv data under the module for week 6.

The goal of this exercise is first to merge the five datasets presented above, then to look for a relationship between the number of points scored and future salaries. Remember, before you merge any data, you need to identify the key that uniquely identifies the observations in the data you want to combine. One way to do this is to run group_by() |> count() |> filter(n > 1) on the candidate key columns (refer to the guided exercises for more details).

1. Merging salary data and CPI data:

   a) (Private Question) First, load the salary and the CPI data and name them player_salary and cpi_fed respectively. Make sure to remove the unnecessary first column in the salary data. For the CPI data, only keep the year and CPI columns.
   b) (Public Question) The salary data contains many players for each year, but the CPI data is only yearly, so many observations in the salary data match only one observation in the CPI data. This is what we call a many-to-one merge. We have not covered many-to-one merging yet, so we will guide you through this. Use the code for the many-to-one merge below to merge the player salary and CPI data. The name of the merged data is player_salary_cpi1.

    # Code for the many-to-one merging.
    player_salary_cpi1 <- player_salary %>%
      left_join(cpi_fed, by = c("season" = "year"), relationship = "many-to-one")

   c) (Public Question) Once you finish merging the data, update the player_salary_cpi1 tibble by computing the real salary of each player in 2015 dollars in a new column called real_2015_salary, using the formula provided by the Federal Reserve Bank of Minneapolis. The year-to-year price conversion formula is provided here. Save this as player_salary_cpi2. The first few observations of this table should look like Table 1.
   d) (Private Question) Update the player_salary_cpi2 tibble by creating a new column called real_2015_salary_mill, which is real_2015_salary divided by 1,000,000. Save this as player_salary_cpi. If you did everything right, the first few observations of player_salary_cpi should look like Table 2.

   e) (Private Question) Report the minimum, average, and maximum real_2015_salary_mill by year as shown in Table 3. Name the table salary_by_year_stat.

2. Merging performance in minutes and points

   a) (Private Question) Load the nba_games_perf_minutes data, remove the first column, convert the player names to lower case, and name it nba_games_perf_minutes1. The observations in this data are uniquely identified by the player_name and game_id columns (together these are what we call the key). However, there are duplicates in these columns. Report the duplicates in this data according to the key in a table similar to Table 4 and name it nba_games_perf_minutes_dups.

   b) (Public Question) Update the nba_games_perf_minutes1 data by dropping those duplicates and save it as nba_games_perf_minutes. To test whether you did this right: if you compute nba_games_perf_minutes_dups again after dropping the duplicates, it should be empty. Make sure that the final nba_games_perf_minutes_dups in your code is the one with the duplicates.

   c) (Private Question) Do the same as above for the nba_games_perf_points data. Load the data, remove the first column, convert the player names to lower case, and name it nba_games_perf_points1. Report the duplicates in a table called nba_games_perf_points_dups, then update nba_games_perf_points1 by dropping the duplicates and save it as nba_games_perf_points.

   d) (Private Question) Now nba_games_perf_minutes and nba_games_perf_points are uniquely identified. Merge them by keeping only the observations that are in nba_games_perf_minutes AND in nba_games_perf_points. Name the merged data nba_games_perf. Then, using tidyr::extract(), create a new column called minutes with the numeric value of the first two digits in the min column. The first few observations of nba_games_perf should look like Table 5.

   e) (Private Question) To see if you did everything right, report the average minutes played (from the first two digits in the min column) and the average points scored by the players as in Table 6. Name this nba_games_perf_avg.

3. Merging performance and NBA game season

   a) (Private Question) Load the nba_games_season data, remove the first column, and name it nba_games_season. Then merge it with nba_games_perf. This is another many-to-one merge. Name the merged data nba_games_perf_season.

   b) (Private Question) Compute the average minutes played and the average points scored by season in a table called nba_games_perf_season_avg as in Table 7. Arrange this table by season and remove any NA in the season column.

4. (Private Question) Merge salary_by_year_stat and nba_games_perf_season_avg in a tibble called player_salary_perf. This should look like Table 8.
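The key check and the many-to-one merge described in question 1 can be sketched on toy data. This is a minimal illustration, not the assignment's solution: the two small tibbles and all their values are made up, and the real-dollar conversion assumes the usual CPI ratio (CPI in the target year divided by CPI in the salary's year), which you should check against the formula linked in the assignment.

```r
library(dplyr)

# Toy stand-ins for the real files (all values below are made up).
player_salary <- tibble(
  player_name = c("player a", "player b", "player a"),
  season      = c(2015, 2015, 2016),
  salary      = c(1000000, 2000000, 1500000)
)
cpi_fed <- tibble(
  year = c(2015, 2016),
  CPI  = c(200, 250)
)

# Key check: an empty result means `year` uniquely identifies rows of cpi_fed.
cpi_key_dups <- cpi_fed |>
  group_by(year) |>
  count() |>
  filter(n > 1)

# Many-to-one merge: many salary rows per season, exactly one CPI row per year.
# dplyr raises an error if the "one" side turns out to have repeated keys.
merged <- player_salary |>
  left_join(cpi_fed, by = c("season" = "year"), relationship = "many-to-one")

# Assumed conversion: nominal salary * (CPI in 2015 / CPI in the salary's year).
cpi_2015 <- cpi_fed |>
  filter(year == 2015) |>
  pull(CPI)

merged <- merged |>
  mutate(real_2015_salary = salary * cpi_2015 / CPI)

print(merged)
```

With these toy numbers, a 2016 salary of 1,500,000 becomes 1,500,000 × 200 / 250 = 1,200,000 in 2015 dollars, while 2015 salaries are unchanged, which matches the pattern in Table 1 where salary and real_2015_salary coincide for the 2015 season.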
Part 2: Write-up

Write a newspaper article discussing NBA players’ salary and performance overall and over time between 2015 and 2020. To do this, use the data provided for the assignment to create 3 interesting graphs that support your discussion. You can either graph the tables you constructed in the coding part or write code for different graphs, whichever you find more interesting. Note that you do not need to report any of the tables in the write-up; only report the 3 figures that you constructed on your own. Do not submit the code for the plots to the autograder.

Your write-up must be written as a narrative and should be 2 pages long: one page of writing, and another separate page for your tables/graphs and citations if you are referring to other sources (any citation format is fine as long as we can verify your sources). Because you are asked to report multiple tables and graphs, it’s fine if your tables and graphs take more than a page, but the writing part of your write-up cannot be more than a page.

The grade on the write-up will depend on how clear the answer is, how good the writing is, and how clearly labeled your tables/graphs are. For instance: no bullet points, no screenshots of R output for tables, and no R variable names in your tables/graphs (e.g. for a variable named grade_cat or gradeCat in your code, you should have a clear label such as "grade category" in your tables/graphs). To make grading easier for the TAs, use this format: 12pt font for the text body, Times New Roman, 1 inch margins, 1.5 spacing.

Note: The values in the tables below are just examples. The values in your correct tables should be different because each student will receive a slightly different dataset.
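The labeling requirement can be illustrated with a short plotting sketch. This is only one possible graph, assuming ggplot2 is available; the data frame name salary_by_year is hypothetical, and the values are the example numbers from Table 3, so your own figures will differ.

```r
library(ggplot2)

# Example values copied from Table 3 of this assignment; yours will differ.
salary_by_year <- data.frame(
  season     = 2015:2020,
  avg_salary = c(3.976103, 4.346288, 4.930743, 5.393391, 6.313585, 6.694146)
)

# labs() replaces the R variable names with reader-friendly labels.
p <- ggplot(salary_by_year, aes(x = season, y = avg_salary)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Average NBA salary by season",
    x     = "Season",
    y     = "Average salary (millions of 2015 dollars)"
  )

print(p)
```

The point of the sketch is the labs() call: axis titles such as "Season" appear in the figure instead of variable names such as avg_salary.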
Table 1: Example of player_salary_cpi2

player_name       salary    season  CPI  real_2015_salary
kobe bryant       23500000  2015    237  23500000
joe johnson       23180790  2015    237  23180790
carmelo anthony   22458401  2015    237  22458401
dwight howard     21436271  2015    237  21436271
chris bosh        20644400  2015    237  20644400
lebron james      20644400  2015    237  20644400
chris paul        20068563  2015    237  20068563
deron williams    19754465  2015    237  19754465
rudy gay          19317326  2015    237  19317326
kevin durant      18995624  2015    237  18995624

Table 2: Example of player_salary_cpi

player_name       salary    season  CPI  real_2015_salary  real_2015_salary_mill
kobe bryant       23500000  2015    237  23500000          23.50000
joe johnson       23180790  2015    237  23180790          23.18079
carmelo anthony   22458401  2015    237  22458401          22.45840
dwight howard     21436271  2015    237  21436271          21.43627
chris bosh        20644400  2015    237  20644400          20.64440
lebron james      20644400  2015    237  20644400          20.64440
chris paul        20068563  2015    237  20068563          20.06856
deron williams    19754465  2015    237  19754465          19.75446
rudy gay          19317326  2015    237  19317326          19.31733
kevin durant      18995624  2015    237  18995624          18.99562

Table 3: Example of salary_by_year_stat

season  min_salary  avg_salary  max_salary
2015    0.0294830   3.976103    23.50000
2016    0.0305019   4.346288    24.68750
2017    0.0055764   4.930743    29.94018
2018    0.0043492   5.393391    35.35383
2019    0.0043906   6.313585    34.71782
2020    0.1425361   6.694146    36.84284
Table 4: Example of nba_games_perf_minutes_dups

player_name        game_id   n
aaron nesmith      22000070  2
adam mokoka        22000015  2
andre drummond     22000023  2
andre iguodala     22000071  2
andrew wiggins     22000001  2
andrew wiggins     22000006  2
andrew wiggins     22000078  2
anthony edwards    22000074  2
aron baynes        22000066  2
avery bradley      22000058  2
ben simmons        22000013  2
boban marjanovic   22000071  2
bobby portis       22000003  2
bobby portis       22000051  2
bogdan bogdanovic  22000072  2

Table 5: Example of nba_games_perf

player_name        game_id   min       minutes  pts
mikal bridges      42000406  39:28:00  39       7
jae crowder        42000406  40:33:00  40       15
deandre ayton      42000406  36:12:00  36       12
devin booker       42000406  46:15:00  46       19
chris paul         42000406  39:13:00  39       26
cameron johnson    42000406  16:04:00  16       3
frank kaminsky     42000406  NA        NA       6
cameron payne      42000406  10:27:00  10       10
torrey craig       42000406  00:49:00  0        0
ty-shon alexander  42000406  NA        NA       NA

Table 6: Example of nba_games_perf_avg

avg_min   avg_pts
21.88141  9.985323

Table 7: Example of nba_games_perf_season_avg

season  avg_min   avg_pts
2015    21.89275  9.468590
2016    21.82957  9.733824
2017    21.90958  9.847761
2018    21.93770  10.294734
2019    21.88641  10.327704
2020    21.82600  10.345884
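The shapes of Table 4 and Table 5 can be reproduced on toy data. The sketch below is one possible approach, not the graded solution: the tibble perf and its rows are made up, and using distinct() to drop duplicates is an assumption about what "dropping those duplicates" means (it keeps the first row for each key).

```r
library(dplyr)
library(tidyr)

# Toy rows shaped like Table 5; the second row is a deliberate duplicate.
perf <- tibble(
  player_name = c("player a", "player a", "player b", "player c"),
  game_id     = c(1, 1, 1, 1),
  min         = c("39:28:00", "39:28:00", "00:49:00", NA),
  pts         = c(7, 7, 0, NA)
)

# Report keys (player_name, game_id) that occur more than once, as in Table 4.
perf_dups <- perf |>
  group_by(player_name, game_id) |>
  count() |>
  filter(n > 1)

# Keep the first row for each key; .keep_all = TRUE retains the other columns.
perf_unique <- perf |>
  distinct(player_name, game_id, .keep_all = TRUE)

# Capture the first two digits of `min` into a new `minutes` column.
# remove = FALSE keeps `min`; convert = TRUE turns the text "39" into a number.
perf_minutes <- perf_unique |>
  tidyr::extract(min, into = "minutes", regex = "^(\\d{2})",
                 remove = FALSE, convert = TRUE)

print(perf_dups)
print(perf_minutes)
```

A missing min value stays NA in minutes, which is consistent with the NA rows shown in Table 5. Any averages computed from such a column need to handle NA explicitly (for example with na.rm = TRUE in mean()).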
Table 8: Example of player_salary_perf

season  min_salary  avg_salary  max_salary  avg_min   avg_pts
2015    0.0294830   3.976103    23.50000    21.89275  9.468590
2016    0.0305019   4.346288    24.68750    21.82957  9.733824
2017    0.0055764   4.930743    29.94018    21.90958  9.847761
2018    0.0043492   5.393391    35.35383    21.93770  10.294734
2019    0.0043906   6.313585    34.71782    21.88641  10.327704
2020    0.1425361   6.694146    36.84284    21.82600  10.345884