2023_f_final_prompt_v1-3

Econ 145 - Fall 2023

ECON 145: Final Assignment

Overview

For the final project, every question is private, so there will be no feedback on the coding portion. However, you will be able to submit your assignment as many times as you want. You will see how many points you earn on each part of the assignment, but you will not be given feedback. Similar to the format of the long homework, the final project is divided into 2 parts: a coding part that you will submit through Gradescope, and a write-up part that you will submit in PDF format. The coding section consists mainly of cleaning and preparing the data for the analysis you will conduct in the write-up. Finally, this is a 2-week assignment, so start early!

To Receive Credit

  • Save the scripting file (i.e. your R program file) as assignment_final.R. Make sure your capitalization is correct, as the autograder is case-sensitive.
  • Save your PERMID at the top of the R script (i.e. PERMID xxxx).
  • Be sure to include your first name, last name, and perm number on your final write-up.
  • Make sure all changes to the original dataset are done within the R script.
  • You must submit the write-up in .pdf format to receive credit.

Grading on Coding Questions

The coding portion of the homework has two types of questions: Public Questions and Private Questions. Public Questions can be submitted to the autograder as many times as you like, and the autograder will give detailed feedback. Additionally, the TAs will help you on Nectir and in office hours with the Public Questions. Private Questions, on the other hand, can be thought of as a mini quiz within the homework. While you can still upload your answer as many times as you want, the autograder will not provide any feedback, and the TAs and Professor Startz will not provide any guidance or assistance (but getting advice from classmates on Nectir or elsewhere is completely okay). Private Questions will be marked on the homework assignment.
Things that can break the autograder: Please read the autograder_instructions.pdf file on Canvas.

Copyright UCSB 2023
For each homework assignment, words colored in magenta indicate a variable/vector/tibble that will be graded by the autograder. Pay close attention to these colored texts and be sure not to miss any. You always always always need to include PERMID xxx to receive credit. (Use your real permid, not “xxx”–but you knew that.)

    rm(list = ls()) # clear the environment
    setwd(dirname(rstudioapi::getSourceEditorContext()$path))
    #-------Import necessary packages here-------------------#
    # This is the only package you need for the coding assignment
    # Including other packages for the autograder may cause some issues
    library(tidyverse)
    #------ Uploading PERMID --------------------------------#
    PERMID <- "ABC1234" #Type your PERMID with the quotation marks
    PERMID <- as.numeric(gsub("\\D", "", PERMID)) #Don't touch
    set.seed(PERMID) #Don't touch
    #------- Answers ----------------------------------------#

Coding Assignment

For the final assignment, you will be given 5 different types of data:

  • education_data.csv: a biennial dataset containing the records of student debt from major US universities between 2010 and 2020.
  • cost_data.csv: contains information on the net out-of-pocket costs that families pay for each university.
  • graduates_income.csv: contains information on the income of students graduating in 2018. You will use this data for the write-up.
  • data_description.csv: contains the descriptions of the variables in the 3 datasets described above.
  • CPI_U_minneapolis_fed.csv: the CPI data from the Federal Reserve Bank of Minneapolis between 2000 and 2023.

First, for the coding assignment, go to Canvas and download the CPI_U_minneapolis_fed.csv and the data_description.csv data. Then go to the data website for the class and download the other data corresponding to the final project. You should get 2 datasets from the website: education_data.csv and cost_data.csv.
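To see what the two "Don't touch" lines in the script template do, the following minimal sketch runs them on the placeholder string "ABC1234" from the template. The names digits_only, seed_value, first_draw, and second_draw are hypothetical and used only for illustration.

```r
# Placeholder PERMID string taken from the template; replace with your own
PERMID <- "ABC1234"

# "\\D" matches any non-digit character, so gsub() deletes every non-digit
digits_only <- gsub("\\D", "", PERMID)

# Convert the remaining digit string to a number and use it as the random seed
seed_value <- as.numeric(digits_only)
set.seed(seed_value)
first_draw <- runif(1)

# Re-seeding with the same value reproduces the same random draw
set.seed(seed_value)
second_draw <- runif(1)

print(digits_only)
print(identical(first_draw, second_draw))
```

Because the seed is derived from the PERMID, any random step in the script gives the same result every time the script is rerun with the same PERMID.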
The graduates_income.csv data is also on Canvas, but you will not need it for the coding part, only the write-up part.

As an analyst, a main part of your job is to understand the data, including its structure and variables. education_data.csv, cost_data.csv, and graduates_income.csv are described in the file called data_description.csv. Before starting, read this file to make sure that you understand the data.

Again, every question is private!

Part 1: Cleaning Education data

1. Import the CPI data (make sure not to rename CPI_U_minneapolis_fed.csv when you download it from Canvas), and select only the first and second columns. Name it cpi_data.

2. Import the education data. This data should have 11 variables. First, rename the variable Year to the lower case year. Then, rename the other 10 variables according to the rename column in data_description.csv. Then, convert the school names to lower case. Finally, convert the 5 variables listed below to numeric type (do not worry about the warning message "NAs introduced by coercion"). Save it as education_data.
   median_debt_low_income, median_debt_med_income, median_debt_high_income, default_rate, avg_family_income

3. Update the column called institution_type to be equal to "public" if the school is public; set institution_type to "private" otherwise. Save this as education_data_clean.

4. Filter education_data_clean to include only the schools that predominantly offer a bachelor’s degree. Name this data education_data_BA1.

5. Merge education_data_BA1 with cpi_data by keeping only the values in education_data_BA1 and save it as education_data_BA. Then convert the debt values and average family income to 2018 dollar values using the formula from the Federal Reserve Bank of Minneapolis (you have done this in a previous homework). You can find the formula here. Rename the variables such that:
  • the 2018 dollar value of median_debt_low_income is called real_debt_low_income
  • the 2018 dollar value of median_debt_med_income is called real_debt_med_income
  • the 2018 dollar value of median_debt_high_income is called real_debt_high_income
  • the 2018 dollar value of avg_family_income is called real_family_income
   Finally, drop the 5 variables listed below, which include the median variables, average family income, and CPI values. Make sure to update education_data_BA. If you did everything right, the first few columns of education_data_BA should look like Table 1.
   median_debt_low_income, median_debt_med_income, median_debt_high_income, avg_family_income, CPI

Part 2: Cleaning Cost Data

1. Import the cost dataset, name it cost_data1, and select the following 9 variables:
   UNITID, INSTNM, YEAR, NPT41_PUB, NPT43_PUB, NPT45_PUB, NPT41_PRIV, NPT43_PRIV, NPT45_PRIV

2. Using cost_data1, create cost_data2 by doing the following: rename YEAR to year and rename the other 8 columns according to the rename column in data_description.csv. Then convert the school names to lower case and convert the following variables to numeric values:
   mean_cost_low_income_public, mean_cost_med_income_public, mean_cost_high_income_public, mean_cost_low_income_private, mean_cost_med_income_private, mean_cost_high_income_private

3. Create a new column called mean_cost_low_income which is equal to mean_cost_low_income_public. Then replace the NA values in mean_cost_low_income with mean_cost_low_income_private; otherwise keep the original value of mean_cost_low_income. Do the same for median and high income cost:
  • create a new column called mean_cost_med_income which is equal to mean_cost_med_income_public. Then replace the NA values in mean_cost_med_income with mean_cost_med_income_private; otherwise keep the original value of mean_cost_med_income.
  • create a new column called mean_cost_high_income which is equal to mean_cost_high_income_public. Then replace the NA values in mean_cost_high_income with mean_cost_high_income_private; otherwise keep the original value of mean_cost_high_income.
   Then remove the variables below:
   mean_cost_low_income_public, mean_cost_low_income_private, mean_cost_med_income_public, mean_cost_med_income_private, mean_cost_high_income_public, mean_cost_high_income_private
   Then save it as cost_data3. This should look like Table 2.
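The "start from the public column, then fill its NA values from the private column" pattern in Part 2, step 3 can be illustrated on toy data. This is only a sketch under stated assumptions: toy_cost and its values are made up for illustration and are not the assignment data, and if_else() is one of several ways to express the replacement.

```r
library(dplyr)

# Hypothetical toy data: each school has a cost in at most one of the two columns
toy_cost <- tibble(
  school_id = c(1, 2, 3),
  mean_cost_low_income_public  = c(6232, NA, NA),
  mean_cost_low_income_private = c(NA, 10096, NA)
)

toy_cost <- toy_cost %>%
  mutate(
    # Start from the public column
    mean_cost_low_income = mean_cost_low_income_public,
    # Where it is NA, take the private value; otherwise keep the existing value
    mean_cost_low_income = if_else(
      is.na(mean_cost_low_income),
      mean_cost_low_income_private,
      mean_cost_low_income
    )
  ) %>%
  # Drop the two source columns once the combined column exists
  select(-mean_cost_low_income_public, -mean_cost_low_income_private)

print(toy_cost)
```

A school with no value in either column (school 3 here) stays NA. dplyr's coalesce() expresses the same "first non-missing value" idea in a single call.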
4. Merge cost_data3 with cpi_data by keeping only the values in cost_data3 and save it as cost_data4. Update cost_data4 by creating 3 new columns which are the cost values converted to 2018 dollar values using the formula from the Federal Reserve Bank of Minneapolis. You can find the formula here. The names of the new columns should be:
  • the 2018 dollar value of mean_cost_low_income is called real_cost_low_income
  • the 2018 dollar value of mean_cost_med_income is called real_cost_med_income
  • the 2018 dollar value of mean_cost_high_income is called real_cost_high_income

5. Finally, remove the mean and CPI variables from cost_data4 and save it as cost_data. If you did everything right, the first few observations of cost_data should look like Table 3. The variables to remove are:
   mean_cost_low_income, mean_cost_med_income, mean_cost_high_income, CPI

Part 3: Merging debt and cost data

1. Merge education_data_BA and cost_data by year and school_id, keeping only the values in education_data_BA, and save it as education_data_BA_cost. You should have 2 different school_name columns after the merge, one called school_name.x and the other school_name.y. Use this to verify that you merged correctly. Then drop school_name.y in education_data_BA_cost. The first few observations of education_data_BA_cost should look like Table 4.

2. Using education_data_BA_cost, replicate Table 5 and save it as debt_cost_sumstat_year.

3. Using debt_cost_sumstat_year, replicate Table 6 and save it as debt_cost_data_by_year. This is harder than the tables we usually make. (Hint: one way to do it is to first make one table with the debt column using the function pivot_longer() (here is a good tutorial on pivot_longer()), then make another table with the cost column using the function pivot_longer(), then combine the two tables by year, institution_type, and income_category using inner_join().)

4. Using education_data_BA_cost, replicate Tables 7 to 10 below:
   4a. name Table 7 as debt_sumstat_school_type
   4b. Table 8 as debt_sumstat_year
   4c. Table 9 as cost_sumstat_school_type
   4d. Table 10 as cost_sumstat_year

Note: The values in the tables below are just examples; the values in your correct tables should be different because each student will receive a slightly different dataset.
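The hint in Part 3, step 3 (two pivot_longer() calls followed by inner_join()) can be sketched on a small wide table. The example below is a hedged illustration, not the graded solution: toy_wide holds only the 2010 low- and high-income values from the example Table 5, and the names toy_wide, debt_long, cost_long, and toy_long are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy wide table: one debt and one cost column per income level (2010 rows of Table 5)
toy_wide <- tibble(
  year = c(2010, 2010),
  institution_type = c("private", "public"),
  mean_debt_for_low_income  = c(14126.87, 11029.85),
  mean_debt_for_high_income = c(14864.47, 11423.39),
  mean_cost_for_low_income  = c(19276.789, 7927.721),
  mean_cost_for_high_income = c(29780.979, 17673.230)
)

# Long table of debt values: one row per year, institution type, and income category
debt_long <- toy_wide %>%
  select(year, institution_type, starts_with("mean_debt_for_")) %>%
  pivot_longer(
    cols = starts_with("mean_debt_for_"),
    names_to = "income_category",
    names_prefix = "mean_debt_for_",  # strip the prefix so both tables share labels
    values_to = "debt"
  )

# Long table of cost values, built the same way
cost_long <- toy_wide %>%
  select(year, institution_type, starts_with("mean_cost_for_")) %>%
  pivot_longer(
    cols = starts_with("mean_cost_for_"),
    names_to = "income_category",
    names_prefix = "mean_cost_for_",
    values_to = "cost"
  )

# Combine on the three shared keys, then tidy the labels ("low_income" -> "low income")
toy_long <- inner_join(
  debt_long, cost_long,
  by = c("year", "institution_type", "income_category")
) %>%
  mutate(income_category = gsub("_", " ", income_category))

print(toy_long)
```

Stripping the different prefixes before joining matters: the join can only match rows if income_category holds identical labels in both long tables.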
Write-up format

Your write-up must be written as a narrative and should be 4 pages of writing (please do not write more than 4 pages). Attach your graphs, tables, and citations at the end of the 4 pages of writing (any citation format is fine as long as we can verify your sources). Because you are asked to report multiple tables and graphs, it’s fine if your tables and graphs take more than a page, but the writing part of your write-up cannot be more than 4 pages. The grade on the write-up will depend on how clear the answer is, how good the writing is, and how clearly labeled your tables/graphs are. For instance: no bullet points, no screenshots of R output for tables, and no R variable names in your tables/graphs (e.g., for a variable named grade_cat or gradeCat in your code, you should have a clear label such as grade category in your tables/graphs). The legend and axis labels should be readable, and your tables/figures should have titles. Overall, you will need to report 8 figures and no tables for this final write-up.

To make grading easier for the TAs, use the format: 12pt font for text body, Times New Roman, 1 inch margin, 1.5 spacing.

The final write-up will follow the format of a formal written report and must include the sections below (each section will be clearly described later in the prompt):

  • Section 1: Introduction (half a page)
  • Section 2: Debt and cost analysis (one page and a half)
  • Section 3: Debt, cost, and earnings after graduation (one page and a half)
  • Section 4: Conclusion (half a page)

Write-up - Main prompt

Suppose that you work for a state (pick any state of your choice from education_data_BA_cost) and your job is to analyze the trends in student debt and default rate, education cost, family income, and earnings after graduation in your state.
The content of each section of the report to your supervisor is described below:

Section 1: Introduction

This should be one paragraph (about half a page) describing the overall content of your report.

Section 2: Debt and cost analysis

In this section, you will be reporting on the analysis we conducted in the coding section. This section is divided into 2 parts: a data description and cleaning process part, and a debt and cost analysis part. This section should be about a page and a half long, including the two parts described below.

Section 2 - Part 1: Data description and cleaning process

This is where you first describe the original data that was handed to you (education data, cost data, CPI data). For instance, what information do these data contain and what are the variables reported in them? Then, describe the process you took to get to education_data_BA_cost, such as how you cleaned the education and cost data and how you combined them, how you converted the values to real 2018 dollar values and why this is necessary (give a brief description of the conversion formula; your supervisor may not be an economist and may not understand your formula), etc. Then, finally, finish by describing the clean dataset education_data_BA_cost.

For the final write-up, we give you the opportunity to work on real-world data, and that includes dealing with missing values. If you look at the tables we constructed in the coding portion, there are many missing values, especially for the cost data. Which missing values you get will depend on your individual data. We give you freedom in how to handle these missing values, but you need to clearly explain in this section how you handle the missing values and why your method makes sense. The only method you cannot use is dropping them. For instance, if one state is missing the entire income data, one method would be to use the income data of another comparable state. Or, if the value for a particular year is missing, you can replace the missing value for that year with the value of the previous year. But what if the previous year is also missing? Well, now you have to make a decision again, but make sure to clearly explain your decision in this section. Please do not go on Nectir or ask the TAs whether your method is correct. There are many ways to do this and there is no right or wrong way; the most important thing is that you can clearly explain why what you are doing makes sense. It is also possible that you use different methods for the different tasks that you want to complete.

Section 2 - Part 2: Debt and cost analysis

Do not include the code for the figures in your autograder submission.

First, using education_data_BA_cost, replicate figures 1, 2, and 3 from the prompt and report them in your write-up. Note that figures 1, 2, and 3 are not clearly labeled; it is your job to report clearly labeled versions of figures 1, 2, and 3. Start by studying the overall trend in student debt, and discuss what you can learn from figures 1 and 2. Then, proceed by comparing your state to the average values of the other states as shown in figure 3.

Finally, complement the analysis you have done so far by analyzing the family income, out-of-pocket cost (real cost variables), and default rate variables from education_data_BA_cost for your state. Does the trend in family income over time keep up with the trend in out-of-pocket cost?
How is that related to the amount of student debt and the default rate? What policy would you recommend to support the educational cost of the students in your state? Report 2 different clearly labeled figures to support your point. Again, you might have some missing values in your data; this is where you apply the method you explained in the data section.

Section 3: Debt, cost, and earnings after graduation

Do not submit the code from this part to the autograder.

In this part, it is your turn to conduct an analysis of your choice that you find interesting. These are the rules that you need to follow:

  • This part should be about a page and a half, discussing student debt, education cost, and earnings after graduation. This section does not have to be about the state you chose earlier.
  • Provide one relevant policy from your analysis.
  • You have to use the graduates_income_2018.csv data from Canvas and the education and cost data provided in the coding part. However, how you use and combine the different data will depend on what analysis you want to conduct.
  • Similar to section 2, you need to have at least 2 parts: one describing clearly how you clean and combine the different data (such as in section 2), and another one describing your analysis and results. If you think that you need more than these two parts, then you are welcome to add more.
  • Include 3 clearly labeled graphs to support your analysis.
  • Similar to the previous section, you might be dealing with missing values in your data; this is where you can apply the missing-value method you explained in the data section.
  • Your grade will depend on the quality of your analysis. We expect a level of analysis like in section 2, or better. We have conducted many data analyses throughout the quarter, so you can refer to previous assignments as well for guidance. This is designed to be an individual analysis, so the TAs will not be able to provide much guidance on this section.
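One of the missing-value options mentioned in Section 2, Part 1 (replace a missing year with the previous year's value) can be sketched with tidyr's fill(). This is only one possible approach under stated assumptions: toy_panel and its values are hypothetical, and the sketch deliberately leaves a leading NA in place to show the case where there is no previous year to copy from.

```r
library(dplyr)
library(tidyr)

# Hypothetical state-by-year panel with gaps in the cost variable
toy_panel <- tibble(
  state_id = c("CA", "CA", "CA", "TX", "TX", "TX"),
  year = c(2010, 2012, 2014, 2010, 2012, 2014),
  real_cost_low_income = c(8000, NA, 8500, NA, 7000, NA)
)

toy_filled <- toy_panel %>%
  # Sort so that "previous row" really means "previous year" within a state
  arrange(state_id, year) %>%
  group_by(state_id) %>%
  # Carry the last observed value forward within each state
  fill(real_cost_low_income, .direction = "down") %>%
  ungroup()

print(toy_filled)
```

The first TX row remains NA because nothing precedes it, so a second rule is still needed for such cases, and whichever rule is chosen has to be justified in the write-up.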
Section 4: Conclusion

This is where you summarize the main takeaways from your analysis. This should be one paragraph (about half a page long).

Note: The values in the tables/figures below are just examples; the values in your correct tables/figures can be different because each student will receive a slightly different dataset.
Table 1: Example of education_data_BA

school_id | school_name | year | state_id | predominant_degree | institution_type | default_rate | real_debt_low_income | real_debt_med_income | real_debt_high_income | real_family_income
110404 | california institute of technology | 2016 | CA | 3 | private | 0.000 | NA | NA | 15441.081 | 77672.31
193070 | mesivtha tifereth jerusalem of america | 2012 | NY | 3 | private | 0.000 | NA | NA | NA | 36414.03
239080 | marian university | 2020 | WI | 3 | private | 0.066 | 13826.024 | 17315.03 | 17808.889 | NA
146533 | lakeview college of nursing | 2016 | IL | 3 | private | 0.031 | 25658.235 | 20597.52 | 15693.750 | 53698.84
148016 | principia college | 2012 | IL | 3 | private | NA | NA | NA | NA | NA
225502 | university of houston-victoria | 2012 | TX | 3 | public | 0.074 | 7185.222 | 10115.09 | 8202.308 | 40096.40
233718 | sweet briar college | 2018 | VA | 3 | private | 0.018 | 19300.000 | 12000.00 | 12000.000 | NA
219471 | university of south dakota | 2012 | SD | 3 | public | 0.085 | 11341.058 | 13287.74 | 12030.052 | 67938.76
181394 | university of nebraska at omaha | 2020 | NE | 3 | public | 0.020 | 12128.091 | 11941.80 | 13098.338 | NA
101879 | university of north alabama | 2018 | AL | 3 | public | 0.116 | 14000.000 | 14770.00 | 15000.000 | NA

Table 2: Example of cost_data3

school_id | school_name | year | mean_cost_low_income | mean_cost_med_income | mean_cost_high_income
195526 | skidmore college | 2012 | 10096 | 21063 | 39529
375939 | yti career institute-altoona | 2014 | NA | NA | NA
155177 | hesston college | 2010 | 12066 | 14698 | 19176
218353 | midlands technical college | 2012 | 6232 | 10021 | 11145
366270 | delta college of arts & technology | 2010 | NA | NA | NA
122755 | san jose state university | 2020 | 8107 | 13431 | 18029
457660 | allure school of cosmetology | 2018 | NA | NA | NA
102058 | selma university | 2020 | 4147 | NA | NA
447962 | compass career college | 2012 | NA | NA | NA
449126 | miami-jacobs career college-springboro | 2016 | 15387 | NA | NA

Table 3: Example of cost_data

school_id | school_name | year | real_cost_low_income | real_cost_med_income | real_cost_high_income
195526 | skidmore college | 2012 | 11041.401 | 23035.36 | 43230.54
375939 | yti career institute-altoona | 2014 | NA | NA | NA
155177 | hesston college | 2010 | 13891.667 | 16921.91 | 22077.46
218353 | midlands technical college | 2012 | 6815.571 | 10959.38 | 12188.63
366270 | delta college of arts & technology | 2010 | NA | NA | NA
122755 | san jose state university | 2020 | 7865.795 | 13031.39 | 17492.59
457660 | allure school of cosmetology | 2018 | NA | NA | NA
102058 | selma university | 2020 | 4023.616 | NA | NA
447962 | compass career college | 2012 | NA | NA | NA
449126 | miami-jacobs career college-springboro | 2016 | 16098.649 | NA | NA

Table 4: Example of education_data_BA_cost

school_id | school_name.x | year | state_id | predominant_degree | institution_type | default_rate | real_debt_low_income | real_debt_med_income | real_debt_high_income | real_family_income | real_cost_low_income | real_cost_med_income | real_cost_high_income
110404 | california institute of technology | 2016 | CA | 3 | private | 0.000 | NA | NA | 15441.081 | 77672.31 | NA | NA | NA
193070 | mesivtha tifereth jerusalem of america | 2012 | NY | 3 | private | 0.000 | NA | NA | NA | 36414.03 | 6787.137 | 6889.939 | 7764.852
239080 | marian university | 2020 | WI | 3 | private | 0.066 | 13826.024 | 17315.03 | 17808.889 | NA | NA | NA | NA
146533 | lakeview college of nursing | 2016 | IL | 3 | private | 0.031 | 25658.235 | 20597.52 | 15693.750 | 53698.84 | NA | NA | NA
148016 | principia college | 2012 | IL | 3 | private | NA | NA | NA | NA | NA | NA | NA | NA
225502 | university of houston-victoria | 2012 | TX | 3 | public | 0.074 | 7185.222 | 10115.09 | 8202.308 | 40096.40 | NA | NA | NA
233718 | sweet briar college | 2018 | VA | 3 | private | 0.018 | 19300.000 | 12000.00 | 12000.000 | NA | NA | NA | NA
219471 | university of south dakota | 2012 | SD | 3 | public | 0.085 | 11341.058 | 13287.74 | 12030.052 | 67938.76 | NA | NA | NA
181394 | university of nebraska at omaha | 2020 | NE | 3 | public | 0.020 | 12128.091 | 11941.80 | 13098.338 | NA | NA | NA | NA
101879 | university of north alabama | 2018 | AL | 3 | public | 0.116 | 14000.000 | 14770.00 | 15000.000 | NA | NA | NA | NA

Table 5: Example of debt_cost_sumstat_year

year | institution_type | mean_debt_for_low_income | mean_debt_for_median_income | mean_debt_for_high_income | mean_cost_for_low_income | mean_cost_for_median_income | mean_cost_for_high_income
2010 | private | 14126.87 | 15782.08 | 14864.47 | 19276.789 | 23210.106 | 29780.979
2010 | public | 11029.85 | 12594.10 | 11423.39 | 7927.721 | 13316.988 | 17673.230
2012 | private | 14378.51 | 17048.40 | 16796.53 | 6787.137 | 6889.939 | 7764.852
2012 | public | 12031.34 | 12727.37 | 12063.49 | NaN | NaN | NaN
2014 | private | 17887.31 | 19820.16 | 18439.76 | 12571.973 | 13087.540 | NaN
2014 | public | 11670.26 | 12101.01 | 11148.14 | NaN | NaN | NaN
2016 | private | 18084.04 | 20024.72 | 18446.16 | NaN | NaN | NaN
2016 | public | 13689.14 | 14238.46 | 14213.70 | 7994.396 | 12549.769 | 15616.327
2018 | private | 14698.52 | 16553.38 | 16084.76 | 25029.000 | 25569.000 | 25501.000
2018 | public | 15969.58 | 16387.17 | 15969.42 | NaN | NaN | NaN
2020 | private | 15441.87 | 17272.09 | 16889.94 | 14854.001 | 18570.048 | 20750.194
2020 | public | 14410.79 | 14353.45 | 14190.35 | 5190.823 | 10213.793 | 10081.840
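The 2018-dollar conversion asked for in Part 1, step 5 and Part 2, step 4 links Table 2 to Table 3. The sketch below assumes the usual CPI ratio form of the conversion (value in 2018 dollars = nominal value × CPI in 2018 / CPI in the original year). The CPI numbers and the column names year and CPI in toy_cpi are assumptions for illustration; in the assignment they come from cpi_data. With these assumed values, the first row of Table 2 reproduces the first row of Table 3.

```r
library(dplyr)

# Assumed annual CPI values, for illustration only; use cpi_data in the assignment
toy_cpi <- tibble(year = c(2012, 2018), CPI = c(229.6, 251.1))

# First row of the example Table 2 (nominal dollars)
toy_cost <- tibble(
  school_name = "skidmore college",
  year = 2012,
  mean_cost_low_income = 10096
)

# CPI of the target year (2018)
cpi_2018 <- toy_cpi$CPI[toy_cpi$year == 2018]

toy_real <- toy_cost %>%
  # Keep every row of the cost table and attach the CPI of its own year
  left_join(toy_cpi, by = "year") %>%
  # Scale the nominal value by CPI_2018 / CPI_year
  mutate(real_cost_low_income = mean_cost_low_income * cpi_2018 / CPI) %>%
  # Drop the nominal value and the CPI once the real value exists
  select(-mean_cost_low_income, -CPI)

print(toy_real)
```

Values from 2018 are unchanged by this ratio, earlier years are scaled up when prices have risen since then, and the conversion makes amounts from different years comparable.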
Table 6: Example of debt_cost_data_by_year

year | institution_type | income_category | debt | cost
2010 | private | low income | 14126.87 | 19276.789
2010 | private | median income | 15782.08 | 23210.106
2010 | private | high income | 14864.47 | 29780.979
2010 | public | low income | 11029.85 | 7927.721
2010 | public | median income | 12594.10 | 13316.988
2010 | public | high income | 11423.39 | 17673.230
2012 | private | low income | 14378.51 | 6787.137
2012 | private | median income | 17048.40 | 6889.939
2012 | private | high income | 16796.53 | 7764.852
2012 | public | low income | 12031.34 | NaN
2012 | public | median income | 12727.37 | NaN
2012 | public | high income | 12063.49 | NaN
2014 | private | low income | 17887.31 | 12571.973
2014 | private | median income | 19820.16 | 13087.540
2014 | private | high income | 18439.76 | NaN
2014 | public | low income | 11670.26 | NaN
2014 | public | median income | 12101.01 | NaN
2014 | public | high income | 11148.14 | NaN
2016 | private | low income | 18084.04 | NaN
2016 | private | median income | 20024.72 | NaN
2016 | private | high income | 18446.16 | NaN
2016 | public | low income | 13689.14 | 7994.396
2016 | public | median income | 14238.46 | 12549.769
2016 | public | high income | 14213.70 | 15616.327
2018 | private | low income | 14698.52 | 25029.000
2018 | private | median income | 16553.38 | 25569.000
2018 | private | high income | 16084.76 | 25501.000
2018 | public | low income | 15969.58 | NaN
2018 | public | median income | 16387.17 | NaN
2018 | public | high income | 15969.42 | NaN
2020 | private | low income | 15441.87 | 14854.001
2020 | private | median income | 17272.09 | 18570.048
2020 | private | high income | 16889.94 | 20750.194
2020 | public | low income | 14410.79 | 5190.823
2020 | public | median income | 14353.45 | 10213.793
2020 | public | high income | 14190.35 | 10081.840
Table 7: Debt by institution

institution_type | mean_debt_for_low_income | mean_debt_for_median_income | mean_debt_for_high_income | mean_family_income
private | 15790.18 | 17779.69 | 16943.32 | 59504.99
public | 13030.71 | 13680.19 | 13089.68 | 58666.46

Table 8: Debt by year

year | mean_debt_for_low_income | mean_debt_for_median_income | mean_debt_for_high_income | mean_family_income
2016 | 16885.43 | 18409.95 | 17291.85 | 56320.48
2012 | 13664.15 | 15733.30 | 15356.04 | 60316.44
2020 | 15202.08 | 16560.23 | 16231.50 | NaN
2018 | 15110.76 | 16492.94 | 16042.82 | NaN
2014 | 16296.90 | 17749.17 | 16483.47 | 58959.43
2010 | 13223.57 | 14832.47 | 13839.47 | 61180.80

Table 9: Out-of-pocket cost by institution

institution_type | mean_cost_for_low_income | mean_cost_for_median_income | mean_cost_for_high_income
private | 18639.994 | 22228.81 | 28350.09
public | 7749.706 | 13058.96 | 16984.07

Table 10: Out-of-pocket cost by year

year | mean_cost_for_low_income | mean_cost_for_median_income | mean_cost_for_high_income
2016 | 7994.396 | 12549.769 | 15616.327
2012 | 6787.137 | 6889.939 | 7764.852
2020 | 11632.942 | 15784.630 | 17194.076
2018 | 25029.000 | 25569.000 | 25501.000
2014 | 12571.973 | 13087.540 | NaN
2010 | 15845.676 | 20147.950 | 26055.518

[Figure 1: Example of debt_plot. Line plot over the years 2010 to 2020; axis label: mean_debt_for_low_income; colour legend: Debt high income, Debt low income, Debt median income.]
[Figure 2: Example of debt_plot by institution type. Two panels (private, public) over the years 2010 to 2020; axis label: mean_debt_for_low_income; colour legend: Debt high income, Debt low income, Debt median income.]

[Figure 3: Example of debt_vs_cost_plot. Four panels (public Avg_other_states, public My_State, private Avg_other_states, private My_State) over the years 2010 to 2020; axis label: mean_debt_for_low_income; colour legend: Debt high income, Debt low income, Debt median income.]