Exam PA October 11, 2022 Project Statement

This model solution is provided so that candidates may better prepare for future sittings of Exam PA. It includes both a sample solution, in plain text, and commentary from those grading the exam, in italics. In many cases there is a range of fully satisfactory approaches. This solution presents one such approach, with commentary on some alternatives, but there are valid alternatives not discussed here.

General Information for Candidates

This examination has 12 tasks numbered 1 through 12 with a total of 100 points. The points for each task are indicated at the beginning of the task, and the points for subtasks are shown with each subtask. Each task pertains to the business problem (and related data files) and data dictionary described below. Additional information on the business problem may be included in specific tasks; where additional information is provided, including variations in the target variable, it applies only to that task and not to other tasks.

An .Rmd file accompanies this exam and provides useful R code for importing the data and, for some tasks, additional analysis and modeling. There are five datasets used in this exam. They are all subsets of a larger dataset that is not given to candidates. The .Rmd file has a chunk for each task. Each chunk starts by reading in one or more data files into one or more dataframes that will be used in the task. This ensures a common starting point for candidates for each task and allows the tasks to be answered in any order. When the datafile is read, the variables it contains are assigned a type (e.g., "numerical," "factor"). The code that assigns variable types is easily changed (e.g., if month is read in as "numeric" but you want to treat it as a factor).

The responses to each specific subtask should be written after the subtask and the answer label, which is typically ANSWER, in this Word document.
Each subtask will be graded individually, so be sure any work that addresses a given subtask is done in the space provided for that subtask. Some subtasks have multiple labels for answers where multiple items are asked for; each answer label should have an answer after it. Where code, tables, or graphs from your own work in R are required, they should be copied and pasted into this Word document.

Each task will be graded on the quality of your thought process (as documented in your submission), your conclusions, and the quality of the presentation. The answer should be confined to the question as set. No response to any task needs to be written as a formal report. Unless a subtask specifies otherwise, the audience for the responses is the examination grading team and technical language can be used. When "for a general audience" is specified, write for an audience not familiar with analytics acronyms (e.g., RMSE, GLM) or analytics concepts (e.g., log link, binarization).

Prior to uploading your Word file, save it and rename it with your five-digit candidate number in the file name. If any part of your exam was answered in French, also include "French" in the file name. Please keep the exam date as part of the file name. It is not required to upload your .Rmd file or other files used in determining your responses, as needed items from work in R will be copied over to the Word file as specified in the subtasks.
The Word file that contains your answers must be uploaded before the five-minute upload period expires.

Business Problem

Your boss recently started a consulting firm, PA Consultants, specializing in predictive analytics. You and your assistant are the only other employees. Your boss informs you that a local politician from Baton Rouge, Louisiana, USA has hired your firm. Baton Rouge, a city of about 230,000 residents, is the capital of the state of Louisiana. The client is about to launch a campaign with the mottos "Clean up Baton Rouge" and "Treat all Neighborhoods Equally – including yours!" The client wants to improve garbage and waste collection. In particular, the client cares about shortening resolution times and ensuring equitable resolution times throughout the city.

The client wants your ideas and input on the following:
- Understanding time trends
- Seeing whether different responding departments have different resolution times for similar tasks
- Predicting resolution times for any type(s) of complaint

Your boss directs you to use a dataset of public data (source: City of Baton Rouge / Parish of East Baton Rouge) that includes all the service requests from January 2016 through March 2022. There are over 300,000 service requests in this time period. Your assistant has prepared five subsets of the public data and has provided the following data dictionary that contains all the variables appearing in the subsets. Note that not all variables appear in every subset datafile.
Data Dictionary

Variable Name            Variable Values
Time.to.resolution       Days from service request to resolution
quarter                  "Q1", "Q2", "Q3", "Q4"; quarter of service request
month                    1 to 12; month of service request
year                     2016 to 2022; year of service request
year.mo                  201601 to 202203; 100*year + month
DEPARTMENT               "GROUNDS", "BLIGHT", "SANITATION"
LATITUDE                 Latitude of service location, 30.2 to 30.6
LONGITUDE                Longitude of service location, -91.3 to -90.9
area                     "N", "W", "D", "LSU"; neighborhood of service location
Latitude_Binned          Latitude range for binned data (geo.grid.csv only)
Longitude_Binned         Longitude range for binned data (geo.grid.csv only)
Ave.time.to.resolution   Average Time.to.resolution for binned data (geo.grid.csv only)
call.count               Number of service requests for binned data (geo.grid.csv only)
TYPEid                   An id representing a specific type of service request

Comments: Requests for service do not appear in the dataset until they are resolved.
Task 1 (7 points)

Your boss asks you to review the quality of the data below. The data shows Time to Resolution for calls to pick up unwanted garbage carts. (This data is not found in any of the supplied files.)

(a) (2 points) Review the box plot below that your assistant made and describe an issue with the data.

Candidates received full credit for identifying outliers with very high time to resolution as an issue and describing how the outliers may arise, patterns in the outliers, or how the outliers could cause problems in addressing the business problem. A common mistake was misidentifying the outliers as the body of the distribution and stating that the actual boxplot represents unreasonable zero values, when in fact this is an artifact of the scale of the y-axis caused by the high outliers.

ANSWER: The plot shows many outlier resolution times greater than one year. These resolution times are unreasonable for trash services. This suggests either that services were never performed or that the cases were not closed at the time service was completed.
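The outlier screening described in the answer can be sketched in base R. The data frame name, the simulated resolution times, and the one-year cutoff are all assumptions for illustration, since the actual data is not supplied:

```r
set.seed(42)
# Hypothetical garbage-cart data: plausible resolution times plus a few
# multi-year outliers of the kind the boxplot reveals
df.carts <- data.frame(
  Time.to.resolution = c(rpois(500, lambda = 8), 400, 700, 1100)
)

# Flag implausible resolution times (here, anything over one year)
cutoff <- 365
outliers <- df.carts$Time.to.resolution > cutoff
sum(outliers)  # number of flagged requests

# boxplot.stats() reports the points a boxplot would draw as outliers
box.out <- boxplot.stats(df.carts$Time.to.resolution)$out
```

With the body of the distribution near 8 days, even the default 1.5 x IQR boxplot rule flags the multi-year values, matching the pattern described in the answer.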
(b) (1 point) List three options for handling the data issue.

Candidates received full credit for listing three distinct options that addressed the data issue. The most common mistakes were listing options to improve the graph rather than handle the data issue (e.g., using a log scale) and giving vague responses (e.g., listing "further investigation" as an option).

ANSWER:
1. Remove outliers with very high time to resolution from the dataset
2. Leave the outliers in the dataset without any modification
3. Censor the time to resolution variable

(c) (2 points) Select and explain which option from part (b) you would recommend.

Candidates performed well on this task overall. The most common recommendation was removing the outliers, but full credit was granted for any recommendation with a reasonable explanation.

ANSWER: I recommend removing the data with excessive resolution times. It seems likely that the requests were not closed when the service was performed, because these response times stretch over multiple years.

(d) (2 points) Your assistant produces the following output from a GLM. (Note your assistant redefined year as years since 2016.)

This is a relatively straightforward calculation task, and candidates performed well overall. Varying amounts of partial credit were awarded to candidates with incorrect answers. Calculation errors (e.g., missing a coefficient in the formula) were awarded more partial credit than incorrect formulas (e.g., ignoring or misapplying the link function, incorrect residual calculation).
Calculate the residual for the predicted time to resolution using the values in the following table for a single observation. Show both the formula(s) used (with values substituted for variables) and the final value to two decimal places.

TYPEid    month    year    Area    Time to Resolution
173023    2        4       N       5

ANSWER:
ŷ = exp(2.66173 − 0.637010 − 0.124969 − 4 × 0.123720 − 0.056956) = 3.85 days
r = y − ŷ = 5 − 3.85 = 1.15 days
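The calculation can be reproduced in R. The role of each coefficient (intercept, TYPEid 173023, month 2, year per years-since-2016, area N) is inferred from the solution above, since the GLM summary itself is not reproduced here:

```r
# Linear predictor with the coefficient values substituted in
# (coefficient-to-variable mapping assumed; summary output not shown)
eta <- 2.66173 - 0.637010 - 0.124969 - 4 * 0.123720 - 0.056956

# Log link: invert with exp() to get the predicted time to resolution
y.hat <- exp(eta)

# Residual = observed - predicted
resid <- 5 - y.hat

round(y.hat, 2)  # 3.85
round(resid, 2)  # 1.15
```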
Task 2 (11 points)

The client is interested in improving the debris collection performance.

Candidates performed well on this task overall. For full credit, both a correct, legible table and the R code used to generate the table were required.

(a) (2 points) Create a table showing the number of observations by year and month. Paste the R code and the table below.

ANSWER:
R Code: table(df.debris$year, df.debris$month)
Table:
          1    2    3    4    5    6    7    8    9   10   11   12
2016    239  330  457  483  633  883  603  372  707  910  706  521
2017    716  665 1012  912  907  763  854  851  968  832  672  427
2018    385  613 2030 1806 1527 1455 1436 1275 1025 1042  595  606
2019    729  754 1113 1547 1737 1857 1836 1609 1172  722  542  602
2020    566  484  895  881 1366 1367 1601 1125 1024  763  554  695
2021    361  225  918  607 1112 2321 1248 1889  960 1032  693  640
2022    404  241  100    0    0    0    0    0    0    0    0    0

(b) (2 points) Recommend which time period you will choose to use for your analysis (in terms of years and months). Justify your recommendation.

Candidates performed well on this task overall. For full credit, candidates needed to address the incomplete 2022 data and provide a justification grounded in the business problem.

ANSWER: I recommend using all months of data from years 2018-2021. Starting with 2018 gives four years of data, which is enough to see recent trends. I recommend not using 2022 data since there is no data for all months, and we are only excluding 745 observations by removing the 2022 data.

Your boss told your assistant to use stratified sampling when separating the chosen dataset into a training dataset and a testing dataset.

(c) (2 points) Discuss the benefits of stratified sampling.

Candidate performance was mixed on this task. Partial credit was awarded for a definition of stratified sampling, but a clear discussion of the benefits was required for full credit.

ANSWER:
Stratified sampling results in test and train datasets that are similar with respect to the stratification variables. To the extent that the stratification variables are related to the target variable, stratification will allow for more precise train and test estimates. Not stratifying on important predictor variables would add variance to the model because it would be fit to the particular segmentation of the training data, which is similar to overfitting to noise in the dataset. The test dataset would have a different segmentation, and therefore the model may not fit the test data as well as the train data.

Your assistant has stratified the entire dataset, based on month and year, and divided it into train and test datasets. You need to remove any observations that you decided not to use in (b).

(d) (2 points) Remove the observations that you decided in (b) not to use from the train and test datasets. Copy the code to adjust the datasets.

This is a straightforward question on R coding. The majority of candidates received full credit.

ANSWER:
Code to adjust datasets:
debris_train <- debris_train[debris_train$year < 2022, ]
debris_train <- debris_train[debris_train$year > 2017, ]
debris_test <- debris_test[debris_test$year < 2022, ]
debris_test <- debris_test[debris_test$year > 2017, ]

Your assistant has prepared glm1 and glm2. Run the .Rmd file to fit the models.

(e) (3 points) State the better of the two models, based on RMSE. Copy the code (i.e., the glm command and any further lines of code) for both of the models that you used to make the choice.

Candidate performance was mixed on this task. Many candidates received some amount of partial credit due to coding errors, incorrectly stating that higher RMSE indicates better model performance, or identifying a better model based on criteria other than RMSE.
ANSWER:
Model choice: Poisson GLM
RMSE for Gamma GLM: 60.2934
RMSE for Poisson GLM: 55.3739
Code to calculate Gamma GLM RMSE:
glm1.time <- predict(model.glm1, newdata = debris_test, type = "response")
RMSE.1 <- sqrt(mean((debris_test$Time.to.resolution - glm1.time)^2)) Code to calculate Poisson GLM RMSE: glm2.time <- predict(model.glm2, newdata=debris_test, type = "response") RMSE.2 <- sqrt(mean((debris_test$Time.to.resolution - glm2.time)^2))
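The stratified split described in part (c) can be sketched in base R by sampling within each year-month cell. The simulated data frame and the 70/30 split fraction are assumptions for illustration; the exam's actual split was prepared by the assistant:

```r
set.seed(2022)
# Hypothetical request-level data with year and month strata
df.debris <- data.frame(
  year  = sample(2018:2021, 1000, replace = TRUE),
  month = sample(1:12, 1000, replace = TRUE),
  Time.to.resolution = rgamma(1000, shape = 2, scale = 5)
)

# Sample 70% of the rows within each year-month stratum for training,
# so both splits have a similar mix of periods
strata <- interaction(df.debris$year, df.debris$month)
train.idx <- unlist(lapply(split(seq_len(nrow(df.debris)), strata),
                           function(i) sample(i, floor(0.7 * length(i)))))
debris_train <- df.debris[train.idx, ]
debris_test  <- df.debris[-train.idx, ]
```

Because the sampling is done per stratum, each year-month combination appears in roughly the same proportion in both datasets, which is the benefit discussed in the answer.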
Task 3 (9 points)

Your boss is interested in providing an update to the client about interesting findings coming out of the exploratory data analysis. Specifically, your boss would like to present on how resolution times vary by year and whether there are any differences by department.

(a) (3 points) Create a graph to show how resolution times vary by year, split out by department type. Paste the graph and paste code for the graph below. (Refer to the code provided in the RMD file.)

Candidates performed very well on this task, with most candidates receiving full credit. The most common approaches receiving full credit were box plots with the department dimension reflected using facet wrap or facet grid (as in the model solution) or the fill (color) aesthetic. Partial credit was awarded for graphs that showed only the average resolution time rather than a distribution. Partial credit was also awarded for graphs that showed the full distribution in a way that was difficult to compare across year and department (e.g., creating 18 different histograms).

ANSWER:
Graph: [Box plots of Time.to.resolution by year, faceted by DEPARTMENT; not reproduced here.]
Code:
p1 <- ggplot(data = df.task1, aes(x = as.factor(year), y = Time.to.resolution)) +
  geom_boxplot()
p1 + facet_grid(rows = vars(DEPARTMENT)) +
  labs(title = "Resolution Times by Year & Department")
(b) (2 points) Describe any trends seen in the graph.

Candidate performance was mixed on this task. For full credit, candidates were expected to comment on a trend across departments and also a trend across years, or an observed interaction between the two variables. Trends based on the median or the distribution/skewness were accepted.

ANSWER: Median resolution times are consistently lower across all years for the Sanitation department compared to the other two departments. The Grounds department shows less variation in resolution times over the years, whereas the Sanitation and Blight departments seem to have a decreasing trend in median resolution times from 2018 onwards.

You have asked your assistant to use stepwise selection as a possible method to select predictors in a final model.

(c) (2 points) Contrast best subset and stepwise selection for selecting predictors.

Candidates mostly performed well on this task. Full credit responses provided a description of both methods, mentioning the two key differences: (1) best subset will find the optimal set of predictors, i.e., the global minimum, whereas stepwise selection will find a local minimum, which may have a higher error than the global minimum; (2) best subset will require more computational time, especially when many variables are being considered. Partial credit was awarded for only discussing one of the key differences, or discussing other, less relevant differences such as best subset being potentially more susceptible to overfitting. A common mistake was candidates contrasting forward and backward selection rather than contrasting best subset and stepwise selection as the task required. No credit was awarded for these responses.
ANSWER:
Best subset selection is performed by fitting all p models that contain exactly one predictor (where p is the total number of predictors being considered) and picking the model with the smallest deviance, then fitting all p choose 2 models that contain exactly two predictors and picking the model with the lowest deviance, and so forth. Then a single best model is selected from the models picked, using a metric such as AIC. In general, there are 2^p models that are fit, which can be quite a large search space as p increases.

Stepwise selection is an alternative to best subset selection that is computationally more efficient, since it considers a much smaller set of models. For example, forward stepwise selection begins with a model containing no predictors, and then adds predictors to the model one at a time until adding a predictor leads to a worse model by a measure such as AIC. At each step the predictor that gives the greatest additional improvement to the fit is added to the model. The best model is the one fit just before adding a variable leads to a decrease in performance. It is not guaranteed that stepwise selection will find the best possible model out of all 2^p models.

Your assistant believes that stepwise selection could lead to a suboptimal model being fit and that best subset selection should always be performed.

(d) (2 points) Critique the assistant's assertion that best subset selection should always be used since stepwise selection could lead to a suboptimal choice for the model.

Candidates performed well on this task. Most candidates correctly noted that the computational burden of best subset is onerous in many circumstances, and that stepwise selection may be a satisfactory alternative in these situations.

ANSWER: The assistant is right in saying that stepwise selection could yield a suboptimal model in comparison with best subset selection, since it is performed on a restricted search space of models. However, best subset selection can lead to a very large search space computationally. This is why stepwise selection can often be used as a more efficient alternative, especially when working with a large number of predictors, which may often be the case with large datasets where a variety of predictors are being tested.
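As a concrete illustration of forward stepwise selection (on R's built-in mtcars data rather than the exam dataset, which is not supplied for this discussion), base R's step() performs AIC-based selection; the choice of candidate predictors here is arbitrary:

```r
# Forward stepwise selection with AIC, starting from the null model
null.fit <- lm(mpg ~ 1, data = mtcars)

# scope gives the largest model the search may grow to
fwd.fit <- step(null.fit,
                scope = mpg ~ wt + hp + disp + qsec + drat,
                direction = "forward",
                trace = 0)  # trace = 1 prints each step's AIC comparison

# The selected predictors (a subset of the scope, found greedily)
names(coef(fwd.fit))
```

Each step adds the single predictor giving the largest AIC improvement, stopping when no addition helps, which is exactly the greedy search contrasted with best subset in the answer above.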
Task 4 (7 points)

Your boss wants you to build a tree model to better understand the Time.to.resolution for discarded couches and mattresses. (The data used is not found in any of the supplied files.)

(a) (2 points) Describe two ways impurity measures are used in a classification tree.

Candidates performed well on this task. Full credit responses clearly explained two or more ways that impurity measures are used in building, pruning, or evaluating classification trees.

ANSWER: One way that impurity measures are used in a classification tree is to decide which split in the decision tree (if any) should be made next. A second way impurity measures can be used is to decide which branches of the tree to prune back after building a decision tree, by removing branches that don't achieve a defined threshold of impurity reduction through cost-complexity pruning.

Your assistant has built a tree model and noticed that for all values of the cp parameter the model never splits on DEPARTMENT but always includes splits on TYPEid. Your assistant also generated a summary table by DEPARTMENT and TYPEid for you to review.

DEPARTMENT   TYPEid    n()   mean(Time.to.resolution)
BLIGHT       173034     19   22.421053
GROUNDS      173021      3    5
GROUNDS      174362    337   20.175074
SANITATION   173020      1    3
SANITATION   173023     13    8.615385
SANITATION   173024   3012   10.134462
SANITATION   173025      1   18
SANITATION   173026     23    8.347826
SANITATION   173027    686   15.262391
SANITATION   173029      1    5
SANITATION   173031      1   19
SANITATION   173032      9    5.333333
SANITATION   173033     17   16.647059

(b) (2 points) Explain why the classification tree does not split on DEPARTMENT.

Candidates performed well on this task. Most full credit answers fell into one of two categories: (1) responses correctly identifying that DEPARTMENT adds no marginal predictive power to TYPEid, as in the model solution; (2) responses that identified that TYPEid has more levels than DEPARTMENT and provided a strong explanation for why decision trees prefer to split on variables with more levels.

ANSWER: The DEPARTMENT variable is perfectly predicted by the TYPEid variable, since each TYPEid maps to only one DEPARTMENT. This means that the DEPARTMENT variable adds no additional information to the model beyond what is included in the TYPEid variable, and thus the tree does not need to split on it.

Based on the preliminary findings, your boss suggests you round the values of the LONGITUDE and LATITUDE variables to 1 decimal place.

(c) (3 points) Explain potential issues with the LONGITUDE and LATITUDE variables before they were rounded and how your boss's suggestion would address these concerns.

Candidate performance was mixed on this task, with the majority of candidates receiving some amount of partial credit. Full credit responses discussed how a tree built on the unrounded variables would be susceptible to overfitting and explained how the boss's suggestion effectively transforms both variables into factor variables with few levels. Although the modeling considerations were considered most relevant, responses discussing privacy concerns (with no mention of modeling implications) were awarded partial credit.
ANSWER: Tree models favor variables that have many ways to split, including continuous numerical variables and categorical variables with many levels. In the dataset, both LONGITUDE and LATITUDE are numerical with 5 decimal places. The original tree model sequentially splits on both LONGITUDE and LATITUDE, and that could easily create bias and result in local overfitting.
The suggestion to round LONGITUDE and LATITUDE to 1 decimal reduces the opportunity for overfitting (there are only 4 possible combinations of LATITUDE and LONGITUDE after the adjustment). It also ensures that model splits reflect meaningful differences in location.
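The boss's suggestion amounts to coarse binning. A minimal sketch, using simulated coordinates within the ranges given in the data dictionary (the real data is not supplied for this task):

```r
set.seed(1)
# Simulated service locations within the data dictionary's stated ranges
df.loc <- data.frame(
  LATITUDE  = runif(200, 30.2, 30.6),
  LONGITUDE = runif(200, -91.3, -90.9)
)

# Round to 1 decimal place and treat the result as a factor, so a tree
# sees a handful of coarse levels instead of near-unique 5-decimal values
df.loc$lat.bin  <- factor(round(df.loc$LATITUDE, 1))
df.loc$long.bin <- factor(round(df.loc$LONGITUDE, 1))

nlevels(df.loc$lat.bin)   # at most 5 levels: 30.2 through 30.6
nlevels(df.loc$long.bin)  # at most 5 levels: -91.3 through -90.9
```

With so few candidate split points, the tree can no longer carve out tiny geographic slivers, which is the overfitting risk described above.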
Task 5 (5 points)

Your assistant fit a GLM to predict the resolution time for garbage cart requests from new residents. (The data used is not in any of the supplied files.) The assistant chose to fit two different distributions, a Poisson and a quasi-Poisson distribution. Refer to the output below:

[Poisson and quasi-Poisson model summaries; not reproduced here.]
(a) (3 points) Assess the two chosen distributions with respect to reasonability in modeling Time.to.resolution as a target variable, using the output provided by the assistant.

Few candidates received full credit on this task. Full credit responses identified overdispersion from the model output and explained its implications in interpreting and using output from models using the respective distributions. Partial credit was awarded for responses that identified numerical differences between the output of the two models but did not connect the differences to properties of the two distributions. Minimal partial credit was awarded to candidates who remarked only on the applicability of the Poisson distribution without contrasting it with the quasi-Poisson.

ANSWER: An underlying assumption of Poisson regression is that the mean and variance are equal. The assistant's code shows that the variance is greater than the mean for new resident requests, indicating evidence of overdispersion. Quasi-Poisson regression is equipped to deal with the problem of overdispersion. Notably, the estimates of the coefficients are the same when compared to the Poisson output. However, the standard errors are all higher and fewer coefficients are statistically significant (the coefficients for months 2, 4, 7, and 12 and LONGITUDE are not significant in the quasi-Poisson output, whereas only month 7 is not significant in the Poisson output). While both distributions could be used for modeling as they ultimately lead to the same predictions, if any further analysis is conducted, such as deriving confidence intervals or conducting hypothesis tests, the quasi-Poisson distribution should be used.

Your boss would like you to consider other distributions for the GLM.

(b) (2 points) Recommend two additional distributions along with link functions that are reasonable choices to model Time.to.resolution. Justify your recommendations.

Candidates performed well on this task overall.
Full credit responses recommended distributions and link functions with justification based on the characteristics of the target variable, such as the domain and skewness of the data. A common reason for candidates receiving only partial credit was justifying only the distribution or only the link function.

ANSWER:
I recommend fitting the following, with the target variable being Time.to.resolution + 1:
(1) Gamma distribution with a log link function
(2) Inverse Gaussian distribution with a log link function

Adding 1 (or another small positive value) to Time.to.resolution is necessary since the gamma and inverse Gaussian distributions do not support 0 values, which do exist in the Time.to.resolution variable. Using the log link function ensures that the predictions are positive.
The two recommended distributions support continuous values while the target variable contains only integer values. However, these distributions are still practical choices given the target variable is positive and right-skewed.
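A sketch of the first recommendation on simulated data (the exam dataset is not supplied for this task); note the +1 shift applied on the left-hand side to avoid zero values:

```r
set.seed(7)
# Simulated right-skewed integer resolution times, including some zeros
n <- 500
x <- runif(n)
y <- round(rgamma(n, shape = 1.2, rate = 0.2) * exp(0.5 * x))

# Gamma GLM with log link on the shifted target Time.to.resolution + 1
fit.gamma <- glm(I(y + 1) ~ x, family = Gamma(link = "log"))

# The log link guarantees strictly positive fitted values
pred <- predict(fit.gamma, type = "response")
```

The same pattern applies to the second recommendation by swapping in family = inverse.gaussian(link = "log").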
Task 6 (10 points)

The client is interested in improving furniture disposal pickup times. Your assistant prepares a GLM and a decision tree that model Time.to.resolution using LATITUDE and LONGITUDE as predictor variables. (The data used is not in any of the supplied files.)

(a) (2 points) Contrast using a GLM versus a decision tree given the client's goals and the variables chosen to use in these models.

Note: There was overlap in the rationale candidates provided in tasks 6(a) and 6(b). Credit was awarded for reasonable justifications whether they were provided as a response to 6(a) or 6(b). This commentary applies to both 6(a) and 6(b). Performance was mixed on these tasks. Full credit responses described how each model would interpret the geospatial data and contrasted those observations in the context of the client's goal to treat neighborhoods equally. Most full credit responses identified a weakness in using a GLM based on the raw geospatial variables and how a decision tree can overcome the weakness. Credit was awarded for discussing either why the linear relationship assumption is unreasonable or why a GLM cannot identify interactions on its own. Partial credit was awarded to responses that addressed these weaknesses in vague terms, e.g., stating that decision trees have more flexibility to fit to a variable.

ANSWER: A GLM trained on LATITUDE and LONGITUDE as predictor variables will produce coefficients representing how the target variable (Time.to.resolution) varies linearly from west to east and from south to north across the city. If Time.to.resolution does vary geographically, it seems more likely that there will be certain sections of the city (e.g., neighborhoods, districts) where the Time.to.resolution may be higher or lower than the city-wide average, as compared to a linear function of Time.to.resolution across the city.

A decision tree will be more adept at identifying sections of the city with higher or lower times to resolution compared to the city-wide average by determining several splits in which the city may be divided into any number of rectangular sections. Also, decision trees do not assume that the target variable has a linear dependence on the predictor variables.

(b) (2 points) Recommend either the GLM or decision tree to use and justify your recommendation.

See the note for 6(a).

ANSWER: A decision tree will produce splits that would allow the client to divide the city into areas with longer and shorter times to resolution. The decision tree also does not assume linear dependence on the predictor variables like the GLM does, and there is no reason to expect that a linear relationship should exist. For these reasons, I recommend using the decision tree for this problem.
Your assistant produced a decision tree to predict Time.to.resolution using LATITUDE and LONGITUDE. Your assistant provides you with the following code and output below.

(c) (3 points) Interpret a few select components of this plot by filling out the table below:

Candidates performed well on this task overall. Most candidates were able to interpret the decision tree with either no errors or only minor errors.

ANSWER:

Component of Plot | Interpretation

55.43 | The average Time.to.resolution for all records in the training data where LONGITUDE is less than -91.17.

12.06% | The percentage of all records in the training data where LONGITUDE is less than -91.17.

Latitude < 30.41 | The split that produces the greatest reduction in node impurity among all records of the training data with LONGITUDE less than -91.17. In other words, within those records, splitting on LATITUDE < 30.41 produces the greatest difference in average Time.to.resolution between the resulting leaves, while still satisfying the tree's hyperparameter constraints (e.g., the minimum number of records per leaf). Records with LATITUDE less than 30.41 fall into the left leaf of this node, and records with LATITUDE greater than or equal to 30.41 fall into the right leaf.

38.46 | The average Time.to.resolution for all records that end up in this leaf: those with LONGITUDE less than -91.17 and LATITUDE greater than or equal to 30.51.

5.72% | The percentage of all records that end up in this leaf: those with LONGITUDE less than -91.17 and LATITUDE between 30.41 and 30.51.

Your assistant wants to recommend that the client include shortening furniture disposal service request resolution times as part of their campaign.

(d) (3 points) Critique your assistant's recommendation and consider model efficacy and potential equity concerns.

Candidates struggled with this task. Full credit responses addressed both equity and efficacy. The most common model efficacy concern that received full credit was that the model is built only on geospatial variables, and other variables should be considered to add predictive power. Similarly, the most common equity concern was that geospatial variables could be a proxy for protected classes.

ANSWER: Based on the plot of the decision tree in subtask (b), the Time.to.resolution for service requests relating to furniture varies significantly by geography. The assistant's recommendation to the client seems reasonable based on the results of the decision tree. However, there are several considerations that should be discussed with the client before moving forward with this recommendation:

- Allocating resources geographically could have negative downstream effects if not done carefully. Race, ethnicity, socio-economic status, age, and other demographics often vary geographically. Focusing garbage collection resources in some districts over others could indirectly produce differences in resource access across these demographic categories, which could have negative political or social impacts on the client.

- The decision tree supporting this decision is fairly simple and does not contain details about the distribution of time until resolution by node. For example, if a few outliers are driving the node with the long time until resolution, this could skew the interpretation of the plot and give the client a false focus.

- This decision tree uses only LATITUDE and LONGITUDE to predict time until resolution. There may be other strong predictors as well. It would be wise to have a more complete picture of the predictors before reaching out to the client with recommendations.
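The node statistics interpreted above can be reproduced on toy data. The sketch below is Python rather than the exam's R, and the longitude values and the 50-versus-10-day pattern are made up for illustration: it searches for the single split that most reduces the sum of squared errors, then reports the left leaf's mean response and share of records, the two numbers printed in each node of an rpart-style plot.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: resolution times jump at a longitude boundary
longitude = rng.uniform(-92.0, -90.0, 200)
time_to_resolution = np.where(longitude < -91.17, 50.0, 10.0) + rng.normal(0.0, 2.0, 200)

def best_split(x, y):
    """Return the split point that minimizes total within-node SSE."""
    best_s, best_sse = None, np.inf
    for s in np.unique(x)[1:]:          # candidate thresholds at the data values
        left, right = y[x < s], y[x >= s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_s, best_sse = s, sse
    return best_s

split = best_split(longitude, time_to_resolution)
left = time_to_resolution[longitude < split]
# Each tree node displays the mean response and the share of training records
print(f"split at {split:.2f}: mean {left.mean():.2f}, share {100 * len(left) / 200:.2f}%")
```

The recovered split lands at the true boundary, and the two printed quantities are exactly the node labels being interpreted in the table above.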
Task 7 (9 points)

As part of the exploratory data analysis process, your assistant decides to focus on service requests involving mattresses, couches, and sofas. To reduce the number of observations, your assistant also restricts the dataset to include only observations from prior to 2020. (The data used is not in any of the supplied files.)

Using the dataset described above, your assistant decides to use K-means clustering on the LATITUDE and LONGITUDE variables and produces the plot shown below.

(a) (2 points) Identify the type of plot and explain what it depicts.

This was a straightforward task, and candidates performed well overall. Full credit was awarded for identifying the plot as an elbow plot and providing a description.

ANSWER: This is an elbow plot, which depicts the proportion of variance explained as each new cluster is added. The y-axis depicts the ratio of between-cluster variance to total variance.

(b) (2 points) Recommend the number of clusters to use and justify your recommendation.

This task was also straightforward, with candidates performing well overall. Partial credit was awarded for choices of K other than 3, depending on the strength of the justification.
ANSWER: The optimal value of K is 3, where the marginal increase in the ratio of between-cluster to total sum of squares levels off. When the marginal increase in this ratio is large, adding another cluster separates observations with materially different characteristics into different clusters. On the other hand, when the marginal increase in the ratio is small, adding another cluster could result in two or more clusters of observations with similar characteristics.

Your assistant is using K-means clustering to create a feature to be used as a predictor variable in a GLM where the target variable is time to resolution. Your assistant wants to use the following three variables as inputs to the K-means clustering algorithm: LATITUDE, LONGITUDE, and Time.to.resolution.

(c) (2 points) Critique the recommendation described above.

Candidate performance was mixed on this task. Full credit responses identified the target leakage concern, with or without directly using the technical term "target leakage," and described why this is an issue. Many candidates provided general critiques of K-means without identifying the significant target leakage issue. These candidates were awarded minimal partial credit.

ANSWER: Clustering is an unsupervised learning method where a target variable is not specified. Clustering can be useful in exploratory data analysis for exploring relationships between the target and the predictor variables. However, predicting Time.to.resolution using clusters based on Time.to.resolution introduces target leakage, where the model would need to know the target variable in order to "predict" it, which is inappropriate.

Using the variables LATITUDE, LONGITUDE, and Time.to.resolution, your assistant performs K-means clustering with K=2. Your assistant performs the analysis with and without variable scaling but forgets to label the output properly. When clustering is done on scaled variables, the plot is made using the unscaled values.
Note that the plots below depict only two of the three variables.
(d) (3 points) Identify which plot reflects the version where the variables were scaled, and discuss how scaling created such large differences in how the data appear in these plots.
Candidates performed relatively well on this task overall. Some candidates did not recognize that a third variable was used in the clustering, an observation that was key to deducing the correct answer.

ANSWER: Plot B contains the results of the scaled clustering algorithm.

The K-means clustering algorithm groups observations by proximity, typically measured by Euclidean distance. In this case, Time.to.resolution is on a much larger scale than LATITUDE and LONGITUDE. In plot A, the groupings are determined predominantly by unscaled Time.to.resolution. Since the plot presents unscaled LONGITUDE and LATITUDE, which contribute very little to determining the clusters, the groupings look random.

In plot B, all three variables are on the same scale, so LONGITUDE and LATITUDE contribute meaningfully to determining the clusters. Therefore, we can see a clear pattern between LONGITUDE and LATITUDE in plot B.
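The scaling effect can be demonstrated with a small simulation. This Python sketch (the exam itself uses R, and the two "neighborhoods" and resolution times below are made up) runs a basic Lloyd's-algorithm K-means on three variables where the time variable dwarfs the coordinates, then repeats it on standardized variables. Unscaled, the clusters track the time variable; scaled, they recover the geographic groups.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
geo = rng.integers(0, 2, n)                      # two hypothetical neighborhoods
lat = np.where(geo == 0, 30.2, 30.8) + rng.normal(0, 0.02, n)
lon = np.where(geo == 0, -91.3, -90.9) + rng.normal(0, 0.02, n)
ttr = rng.uniform(0, 200, n)                     # resolution time, far larger scale
X_raw = np.column_stack([lat, lon, ttr])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

def kmeans(X, k=2, iters=30):
    """Plain Lloyd's algorithm: assign to nearest center, recompute means."""
    # Deterministic init: first point, plus the point farthest from it
    centers = np.stack([X[0], X[np.argmax(((X - X[0]) ** 2).sum(axis=1))]])
    for _ in range(iters):
        labels = ((X[:, None, :] - centers) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

def agreement(labels, truth):
    """Cluster labels are arbitrary, so score the better of the two matchings."""
    m = (labels == truth).mean()
    return max(m, 1 - m)

agree_raw = agreement(kmeans(X_raw), geo)        # driven by unscaled ttr
agree_std = agreement(kmeans(X_std), geo)        # geography now matters
print(f"agreement with true neighborhoods - unscaled: {agree_raw:.2f}, scaled: {agree_std:.2f}")
```

On the unscaled data the clusters split near the median resolution time and agree with the true neighborhoods only about half the time; on standardized data the geographic grouping is recovered almost perfectly, mirroring plots A and B.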
Task 8 (11 points)

Your assistant shared with you a decision tree built to model Time.to.resolution for furniture-related complaints. The predictor variables are year, LONGITUDE, and LATITUDE. The resulting tree appears to be overly complex. Your assistant seeks your guidance to help improve this model. (The data used is not in any of the supplied files.)

(a) (3 points) Describe the cost-complexity pruning algorithm and what purpose it serves.

Candidate performance was mixed on this task, with most candidates receiving some amount of partial credit. Full credit responses included both specific details about how the cost-complexity pruning algorithm works and its purpose. Many candidates went into detail describing the cross-validation algorithm instead of the cost-complexity pruning algorithm, and no additional credit was awarded for this discussion.

ANSWER: Pruning is a technique used to reduce the complexity of a decision tree and protect against overfitting. Each split is evaluated to determine whether it is necessary for optimal model performance. If removing a split (i.e., turning it into a terminal node and erasing the subtree below it) does not decrease the model's accuracy by more than the complexity penalty, then that split is removed ("pruned"). This process is repeated for each remaining split until any further pruning would cost more in accuracy than it saves in complexity.

(b) (3 points) Describe two common approaches for choosing a complexity parameter based on cross-validation results.

Candidates performed well on this task overall, with most candidates providing adequate descriptions of the minimum cross-validation error and 1se approaches. The most common errors were not mentioning a second approach and poor or incorrect descriptions of the 1se approach.

ANSWER: One approach to selecting a complexity parameter is to choose the value that results in the minimum cross-validation error. An alternative is to employ the one-standard-error (1se) rule. This approach uses the complexity parameter of the smallest model within one standard error of the minimum cross-validation error, which results in a simpler model.
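Both selection rules can be applied mechanically to a CP table. Below is a Python sketch (the exam output comes from R's rpart; this table is a simplified stand-in whose last row echoes the xerror and xstd values discussed in part (c)) that picks the row with minimum cross-validation error and the simplest row within one standard error of it.

```python
# Simplified stand-in for an rpart CP table: (cp, n_splits, xerror, xstd)
cp_table = [
    (0.100000000, 0, 1.0000000, 0.07600000),
    (0.030000000, 2, 0.9600000, 0.07600000),
    (0.010000000, 5, 0.9300000, 0.07550000),
    (0.004381703, 7, 0.9263861, 0.07548503),
]

# Rule 1: minimum cross-validation error
best = min(cp_table, key=lambda row: row[2])

# Rule 2: simplest tree whose xerror is within one SE of the minimum
threshold = best[2] + best[3]
one_se = next(row for row in cp_table if row[2] <= threshold)

print(f"min-xerror rule: {best[1]} splits; 1se rule: {one_se[1]} splits")
```

With these numbers the 1se threshold exceeds 1, so the rule nominally selects the 0-split tree, the same degenerate outcome noted in the part (c) solution.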
Your assistant shares the results of the complexity parameter below.

(c) (2 points) Apply both methods described in part (b). Recommend the number of splits to use in the pruned tree. Justify your recommendation.

Most candidates received partial credit on this task. Many candidates only applied one method correctly, failed to make a recommendation, or recommended a third method that could not be justified with the given information. Since the 1se rule results in xerror > 1, candidates were awarded full credit for selecting either the 0-split model or the 2-split model as the result of the 1se rule.

ANSWER: The tree with 7 splits has the lowest xerror in the table, 0.9263861. This is the tree that would be chosen using the minimum-xerror approach.

Using the 1se rule, the threshold is 0.9263861 + 0.07548503 > 1, so even the 0-split tree qualifies. However, it does not make sense to prune a model down to zero splits, so the tree with 2 splits is chosen.

I recommend a decision tree with 7 splits and a complexity parameter of 0.004381703. This results in the minimal cross-validation error (see the xerror column of the included CP table).

(d) (3 points) Recommend either to prune the overly complex tree with an optimally selected complexity parameter or to build a new tree with that same complexity parameter. Justify your recommendation.

Candidates struggled with this task overall. Most full credit responses chose to prune the complex tree and recognized that valuable splits occurring after poor splits would be retained if the complex model was pruned. A common partial credit response was recommending pruning based solely on the fact that the complexity parameter was initially tuned based on pruning.

ANSWER: I recommend pruning the overly complex tree with an optimally selected complexity parameter. By starting with an overly complex tree and then pruning back, we retain more valuable splits that occur after less valuable splits.
However, if we built a new tree with the same complexity parameter, those
less valuable splits would not occur in the first place, and thus the more valuable splits would never be discovered.
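The advantage of grow-then-prune can be seen with a contrived XOR-style pattern (a Python sketch with made-up data). The first split on x1 barely reduces error on its own, so greedily growing a new tree with a complexity penalty would stop at the root; but the full two-level tree is nearly perfect, so cost-complexity pruning of an already-deep tree would keep the whole subtree.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(-1, 1, (2, 400))
y = ((x1 > 0) ^ (x2 > 0)).astype(float)   # XOR target: each split alone looks worthless

def sse(v):
    return ((v - v.mean()) ** 2).sum()

root_sse = sse(y)
# Error reduction from the first split (on x1) alone
first_sse = sse(y[x1 > 0]) + sse(y[x1 <= 0])
# Error after the full two-level tree (four quadrant leaves)
quadrants = [(x1 > 0) & (x2 > 0), (x1 > 0) & (x2 <= 0),
             (x1 <= 0) & (x2 > 0), (x1 <= 0) & (x2 <= 0)]
full_sse = sum(sse(y[m]) for m in quadrants)

gain_first = 1 - first_sse / root_sse     # tiny: growing fresh would reject this split
gain_full = 1 - full_sse / root_sse       # large: pruning a deep tree keeps the subtree
print(f"first split gain {gain_first:.3f}, full subtree gain {gain_full:.3f}")
```

Because pruning evaluates whole subtrees rather than individual splits in isolation, the near-useless root split survives once its valuable children are taken into account.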
Task 9 (10 points)

Your client has a goal of resolving missed-pickup service calls in fewer than two days. Your boss wants you to build a model to evaluate this and suggests using AUC as a performance metric. (The data used is not in any of the supplied files.)

(a) (2 points) Explain the difference between accuracy and AUC in terms of overall model assessment.

Candidate performance was mixed on this task. Full credit responses provided correct definitions of both accuracy and AUC and recognized the key difference that accuracy is calculated at a single cutoff point while AUC is calculated across all possible cutoff points. Most candidate responses failed to provide a complete response.

ANSWER: Accuracy is the ratio of the number of correct predictions to the total number of predictions made. It does not directly use the modeled probabilities, but rather the classifications based on a fixed cutoff point.

AUC measures the area under the ROC curve. It assesses overall model performance by measuring how the true positive rate (TPR) and false positive rate (FPR) trade off across a range of possible classification thresholds.

AUC measures performance across the full range of thresholds, while accuracy measures performance only at the selected threshold.

Your assistant built a model and plotted the ROC curve below.
(b) (2 points) Explain why the ROC curve always goes through (0,0) and (1,1).

Candidate performance was mixed on this task. Common errors were not mentioning how points on the ROC curve vary by threshold and incorrectly defining sensitivity or specificity.

ANSWER: The ROC curve plots sensitivity on the y-axis and (1 - specificity) on the x-axis. The point (0,0) corresponds to a sensitivity of 0 and a specificity of 1, meaning 1 - specificity equals 0. Setting the classification threshold so that the classifier never identifies a positive case produces (0,0), because the true positive rate (sensitivity) is zero, and with every case classified as negative the true negative rate (specificity) is 1. The point (1,1) is the opposite case: everything is classified as positive, the true positive rate (sensitivity) rises to 1, and the true negative rate (specificity) drops to 0.

Your boss suggests a boosted tree can increase model performance by reducing bias; however, setting hyperparameters is critical. You are asked to build a gradient boosting machine (GBM) tree model to assess the performance improvement. The GBM tree model performance using the test dataset is shown below.

(c) (2 points) Explain why model performance improves at the beginning, then deteriorates as the number of trees increases.

Candidate performance was mixed on this task. Full credit responses identified that the pattern of performance on the test data results from overfitting and provided enough description of the mechanics of a GBM to explain how the overfitting occurred.
ANSWER: A GBM iteratively builds trees fit to the residuals of prior trees. Depending on the hyperparameters, this can produce a very complex model that is susceptible to overfitting patterns in the training data. In this model, AUC on the test data increases until the number of trees reaches about 500. As more trees are added beyond 500, AUC on the test data starts to drop, which indicates the model is overfit to the training data.

(d) (2 points) Describe two hyperparameters you could adjust to improve model performance.

This task was straightforward, and candidates performed well overall. Any GBM hyperparameter could be chosen, provided a correct justification was given.

ANSWER:

Early stopping: Early stopping criteria, such as the improvement in the performance metric from each subsequent tree, can stop training when the improvement becomes marginal. This avoids overfitting.

Learning rate: The learning rate controls the impact of each subsequent tree on the overall model outcome. This reduces the extent to which a single tree can influence the model fitting process.

(e) (2 points) Explain the process of how to tune a hyperparameter.

Candidate performance was mixed on this task. Full credit responses described cross-validation and how it is used in the hyperparameter tuning process. Many candidates provided overly vague responses, often with no reference to cross-validation.

ANSWER: Tuning a hyperparameter requires first varying the hyperparameter across a range of possible values and performing cross-validation at each value. Performance is then evaluated using a cross-validation performance metric, for example AUC, and the hyperparameter value with the best performance on this metric is selected.
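The tuning loop just described can be sketched end to end. This Python example is illustrative only (the exam context would use R, and the data here are simulated): it grid searches a ridge penalty lambda with 5-fold cross-validation and keeps the value with the lowest CV error. Tuning a GBM's learning rate or number of trees follows the same pattern, just with a different model inside the loop.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]                  # only three informative predictors
y = X @ beta + rng.normal(0.0, 1.0, n)

def cv_error(lam, k=5):
    """Mean squared error of ridge regression, estimated by k-fold CV."""
    folds = np.array_split(np.arange(n), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        A = X[train].T @ X[train] + lam * np.eye(p)
        b_hat = np.linalg.solve(A, X[train].T @ y[train])
        errs.append(np.mean((y[fold] - X[fold] @ b_hat) ** 2))
    return float(np.mean(errs))

grid = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]   # candidate hyperparameter values
scores = {lam: cv_error(lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(f"selected lambda = {best_lam}")
```

With a strong signal and modest noise, heavy shrinkage (lambda = 100 or 1000) visibly hurts the cross-validated error, so the search settles on a small penalty.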
Task 10 (6 points)

The client is interested in estimating the impact of various predictors on Time.to.resolution for two common complaints: "MISSED GARBAGE SERVICE DAY (GENERAL PICK-UP)" and "MISSING GARBAGE CART." The client is interested in resolution time trends. Another concern is whether resolution times differ for certain areas within the city. Run the given code and use the output to answer the following.

(a) (3 points) Interpret the coefficients for the time variables (year, quarter) for the two models (one for each complaint) using the summary() output. Also describe the trends of resolution times for each of the two complaints.

Candidates performed well on this task overall. Full credit responses interpreted coefficients in the context of the model form, interpreted the coefficient signs, and compared the coefficient trends across models.

ANSWER:

Model 1: Missed Garbage Service Day (General Pick-up)

Variable | Coefficient | exp(Coefficient) | Change | Interpretation
Year | -0.82764 | 0.43709 | -56% | Resolution times are expected to decrease by 56% each year, all other variables held equal.
Q2 | -0.15566 | 0.85585 | -14% | Q2 resolution times are expected to be 14% lower than Q1, all other variables held equal.
Q3 | -0.74749 | 0.47355 | -53% | Q3 resolution times are expected to be 53% lower than Q1, all other variables held equal.
Q4 | -0.83595 | 0.43346 | -57% | Q4 resolution times are expected to be 57% lower than Q1, all other variables held equal.

Model 2: Missing Garbage Cart

Variable | Coefficient | exp(Coefficient) | Change | Interpretation
Year | -0.17612 | 0.83852 | -16% | Resolution times are expected to decrease by 16% each year, all other variables held equal.
Q2 | 0.25359 | 1.28865 | +29% | Q2 resolution times are expected to be 29% higher than Q1, all other variables held equal.
Q3 | 0.34085 | 1.40615 | +41% | Q3 resolution times are expected to be 41% higher than Q1, all other variables held equal.
Q4 | 0.38793 | 1.47393 | +47% | Q4 resolution times are expected to be 47% higher than Q1, all other variables held equal.

The year coefficients are negative for both complaint TYPEs, implying that resolution times are decreasing from year to year. For the "missed general pickup" TYPE, Time.to.resolution improves in later quarters of the year. For the "missing garbage cart" TYPE we see the opposite trend, with Time.to.resolution increasing in later quarters of the year.

(b) (3 points) Using the summary and drop1 output, compare and contrast the significance of the area variables in the two models. Quantify significant differences in resolution times.

Candidates performed well on this task overall. Full credit responses discussed the output of the summary and drop1 functions, provided justification for why variables were or were not significant, and quantified significant differences in resolution time over the baseline.

ANSWER:

Based on the summary output: In model 1, the area variable is significant, with a p-value of 3.02e-08. The coefficient areaW = 0.20493 means that Time.to.resolution for areaW is higher than for the reference level, areaD, by exp(0.20493) - 1 = 22.74%. In model 2, none of the area categories are significant, with all p-values greater than 0.5.

Based on the drop1 output: In model 1, dropping the area variable gives a higher AIC, suggesting that the area variable should not be dropped. In model 2, dropping the area variable gives a lower AIC, suggesting that the area variable should be dropped.
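The percentage changes quoted above come from exponentiating the log-link coefficients. A quick Python check, using the model 2 coefficients as they appear in the summary output:

```python
import math

# Coefficients for model 2 (missing garbage cart), from the summary output
coefs = {"year": -0.17612, "Q2": 0.25359, "Q3": 0.34085, "Q4": 0.38793}

# Under a log link, a one-unit increase multiplies the mean by exp(coef),
# i.e., changes it by (exp(coef) - 1) * 100 percent, all else held equal.
pct_change = {k: (math.exp(v) - 1.0) * 100.0 for k, v in coefs.items()}
for k, v in pct_change.items():
    print(f"{k}: {v:+.1f}%")
```

This reproduces the -16%, +29%, +41%, and +47% figures in the model 2 table.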
Task 11 (8 points)

You are investigating data on calls for damaged carts using Time.to.resolution as the target variable. This dataset includes an additional variable, Service.Request.Id. This variable is set to 1 for the first request and incremented by one for each subsequent request. Your assistant has removed this variable, arguing that it is of no value for predicting Time.to.resolution given that it is merely a counter reflecting the row of the observation. (The data used is not in any of the supplied files.)

(a) (2 points) Critique the assistant's recommendation.

Most candidates received partial credit on this task. Full credit responses stated an opinion of the assistant's recommendation and justified it with a thoughtful discussion of how the variable could be used in modeling.

ANSWER: I disagree with the assistant's recommendation as stated. This variable should not be removed without further investigation. Changes in the ID values over time could reveal useful information, such as changes in systems or in data collection approaches. Before removing a predictor you should at least check for correlation between it and the target or the other predictors, and look for patterns or other characteristics that may be of use in feature generation. You may also need to consider multicollinearity with other date and time variables, since multicollinearity can cause issues with many model fitting algorithms.

(b) (1 point) Define an interaction effect.

Candidates generally performed well on this recall question.

ANSWER: An interaction effect is when the target variable has a relationship with a combination of input variables, in addition to potentially having a relationship with each of those variables on its own.

Many service calls for damaged carts have resolution times over 60 days. You have been asked to look at these in more detail.
Your assistant has built an initial model to predict whether a damaged cart call will take more than 60 days to service. The predictor variables used are: year, month, DEPARTMENT, LATITUDE, LONGITUDE. Consider interactions among the predictor variables.

(c) (2 points) Propose two variables to make an interaction term that may improve model accuracy. Justify your proposal.

Candidates performed well on this task overall. The most common full credit responses proposed an interaction between DEPARTMENT and year. However, some candidates received full credit for proposing an interaction between LATITUDE and LONGITUDE.
ANSWER: An interaction term between year and DEPARTMENT could improve the model fit. The various departments may trend differently over time, and this nuance can be captured through an interaction between a time variable and department.

You continue working on a model to predict whether a call for a damaged cart will have a resolution time over 60 days. A new indicator variable, Over60, has been created to identify records with a resolution time greater than 60 days. Your assistant is testing different link functions for predicting Over60. Your assistant notes that some model predictors are highly statistically significant with certain link functions but not with others.

(d) (3 points) Explain how changing the link function in the GLM impacts the model fitting and how this can impact predictor significance.

Candidates struggled with this task overall. Full credit responses defined what a link function is and explained how the link function impacts the error terms, which ultimately affects model fitting and whether predictor variables are statistically significant.

ANSWER: The link function specifies the functional relationship between the linear predictor and the mean of the outcome distribution conditional on the predictor variables. Different link functions have different shapes and can therefore fit different nonlinear relationships between the predictors and the target variable. For example, if the predictors have a very linear relationship to the mean, a link function that preserves that linearity (like the identity link g(u) = u) should provide a better fit than a link function that imposes a more nonlinear, curved relationship. When the link function matches the relationship of a predictor variable, the mean of the outcome distribution (the prediction) will generally be closer to the actual target values, resulting in smaller residuals and more significant p-values.
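The "different shapes" point can be made concrete by evaluating three common binary-response inverse links at the same linear predictor values (a Python sketch; the probit inverse link is the standard normal CDF, computed here via math.erf). The symmetric links agree at zero while the complementary log-log does not, and this differing curvature is what changes fitted coefficients, and hence p-values, between link choices.

```python
import math

def logit_inv(eta):
    """Inverse logit link: the logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inv(eta):
    """Inverse probit link: standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_inv(eta):
    """Inverse complementary log-log link: asymmetric around 0.5."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"eta={eta:+.0f}: logit={logit_inv(eta):.3f}, "
          f"probit={probit_inv(eta):.3f}, cloglog={cloglog_inv(eta):.3f}")
```

Because each link maps the same linear predictor to different probabilities, refitting Over60 under a different link changes the estimated coefficients and their standard errors, which is why a predictor can look highly significant under one link and not under another.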
Task 12 (7 points)

Your assistant mentions that using latitude and longitude for each service call would allow mapping each call to a zip code. Using publicly available census information, the data by zip code could then be combined with information such as average age, predominant race, and average household income.

(a) (1 point) Define proxy variable.

Candidate performance was mixed on this task. Responses that demonstrated understanding that proxy variables provide information about a variable that was not directly measured received full credit. A common response that received no credit was defining a proxy variable as a variable that contains sensitive information.

ANSWER: Proxy variables are variables used in place of other information, usually because the desired information is impossible or impractical to measure. For a variable to be a good proxy, it must have a close relationship with the variable of interest.

(b) (3 points) Evaluate your assistant's recommendation for any potential legal or ethical concerns, including whether proxy variables should be used in this project.

Candidate performance was mixed on this task. Full credit responses included a description of legal/ethical issues, a discussion of whether there is problematic information in the data that should be addressed, and additional justification based on the context of the business problem.

ANSWER: Data such as race, age, and income are generally considered sensitive information. Some jurisdictions have legal constraints on the use of sensitive information, so before proceeding, any applicable law should be considered.

There are no clear rules for what constitutes ethical use of data. Good professional judgment must be used to ensure that inappropriate discrimination does not occur within the model or the project. Public perception should also be considered: the politician or the city could suffer bad press if there is a belief that the project discriminates inappropriately.

Since LATITUDE and LONGITUDE were used to look up race, age, and income, including them in a model clearly creates proxy variables for this sensitive information. Simply not using the census data may not be enough to eliminate potential inappropriate discrimination in the model or the project. However, given the project goal of identifying inequities, using the census data may be a valid way to test that the model and project decisions are fair, or even to improve equity among demographic groups.

Your assistant states that the values for latitude and longitude are too granular and proposes that the data be grouped for modeling. Your assistant groups the data by splitting the ranges of both latitude and longitude into 20 equally spaced bins, creating factor variables Latitude_Binned and Longitude_Binned. For each combination of Department, year, month, Latitude_Binned and
Longitude_Binned, the average Time.to.resolution and the total count are stored in the variables Ave.Time.to.resolution and call_count.

Using this grouped data, your assistant then models Ave.Time.to.resolution using two Poisson regression models, Poisson.1 and Poisson.2. The code for these models is provided.

(c) (3 points) Assess the differences between the two models, including fitted parameters, coefficient estimates, and goodness of fit.

Candidates performed reasonably well on this task overall. Common reasons for partial credit were not correctly identifying the difference between the specifications of the two models and not comparing the goodness of fit of each model.

ANSWER: Poisson.1 gives equal weight to each observation, while Poisson.2 weights each observation by its number of calls. This means observations with a small number of service calls contribute as much to fitting Poisson.1 as observations with a high number of calls. Observations with a small number of calls generally show more variation, since the target variable is averaged across fewer calls. Poisson.1 gives these observations relatively more weight than Poisson.2, leading to larger deviance. This is why Poisson.1 has a worse fit (AIC = 136.07) than Poisson.2 (AIC = 98.686).

Poisson.2 generally has coefficients closer to 0 than Poisson.1. Both models find the intercept, DEPARTMENTSANITATION, and year to be significant, though Poisson.1 has more significant p-values.
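The weighting difference can be illustrated with an intercept-only analogue (a Python sketch with made-up grouped data; for a log-link Poisson fit with no covariates, the fitted mean is simply the, possibly weighted, average of the responses). A low-volume cell with a noisy average drags the unweighted estimate far more than the call-count-weighted one:

```python
# Hypothetical grouped data: average resolution time and call count per cell
ave_time = [30.0, 28.0, 31.0, 2.0]    # last cell averages only a handful of calls
calls = [500, 400, 450, 3]

# Unweighted fit (Poisson.1 analogue): every cell counts equally
unweighted_mean = sum(ave_time) / len(ave_time)

# Weighted fit (Poisson.2 analogue): cells weighted by call volume
weighted_mean = sum(a * w for a, w in zip(ave_time, calls)) / sum(calls)

print(f"unweighted: {unweighted_mean:.2f}, weighted: {weighted_mean:.2f}")
```

The 3-call cell pulls the unweighted estimate down sharply while barely moving the weighted one, mirroring why Poisson.1's fit is more distorted by high-variance, low-volume cells than Poisson.2's.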