Questions Only — Exam I
Page 1 October 17, 2023 Project Statement © 2023 Society of Actuaries

Exam PA October 17 Project Statement

IMPORTANT NOTICE – THIS IS THE OCTOBER 17, 2023 PROJECT STATEMENT. IF TODAY IS NOT OCTOBER 17, 2023, SEE YOUR TEST CENTER ADMINISTRATOR IMMEDIATELY.

General Information for Candidates

This examination has 10 tasks numbered 1 through 10 with a total of 70 points. The points for each task are indicated at the beginning of the task, and the points for subtasks are shown with each subtask.

Each task pertains to the business problem described below. Additional information on the business problem may be included in specific tasks. Where additional information is provided, including variations in the target variable, it applies only to that task and not to other tasks.

For this exam there is no data file or .Rmd file provided. Neither R nor RStudio are available or required.

The responses to each specific subtask should be written after the subtask and the answer label, which is typically ANSWER, in this Word document. Each subtask will be graded individually, so be sure any work that addresses a given subtask is done in the space provided for that subtask. Some subtasks have multiple labels for answers where multiple items are asked for. Each answer label should have an answer after it.

Each task will be graded on the quality of your thought process (as documented in your submission), conclusions, and quality of the presentation. The answer should be confined to the question as set. No response to any task needs to be written as a formal report. Unless a subtask specifies otherwise, the audience for the responses is the examination grading team and technical language can be used.

Prior to uploading your Word file, it should be saved and renamed with your five-digit candidate number in the file name. If any part of your exam was answered in French, also include “French” in the file name. Please keep the exam date as part of the file name.

The Word file that contains your answers must be uploaded before the five-minute upload period time expires.
Business Problem

You are a consultant and your client has asked you to perform a study related to outcomes in university in the United States. Your client is interested in better understanding the drivers of several key variables and developing models to predict them. These target variables include:

• tuition prices
• students who are defaulting on student loans
• future earnings of students
• student loan repayment rates
• university admission rates

To answer these questions, you decide to use a publicly available dataset [1] that includes aggregated data from 2,180 universities in the United States for the 2020-2021 school year.

[1] Source: United States Department of Education
Data Dictionary

| Variable | Data Type: Range/Levels | Description |
|---|---|---|
| UNITID | Numeric: 100654 to 495767 | ID for the institution |
| INSTNMH | String: N/A | Institution name |
| REGION | Factor: 10 levels | Region (IPEDS) |
| CONTROL | Factor: 3 levels ("Public", "Private, non-profit", "Private, for-profit") | Control of institution |
| LOCALE | Factor: 4 levels ("City", "Suburb", "Town", "Rural") | Locale of institution |
| ADMIT_TIER | Factor: 5 levels ("MOST SELECTIVE", "EXTREMELY SELECTIVE", "VERY SELECTIVE", "MODERATELY SELECTIVE", "NOT SELECTIVE") | How selective the institution is |
| TEST_REQ | Factor: 4 levels ("Required", "Recommended", "Neither required nor recommended", "Considered but not required") | Does the institution require standardized tests |
| ADM_RATE | Numeric: 0.0244 to 1.0 | Admission rate |
| SATVRMID | Numeric: 395 to 760 | Midpoint of SAT critical reading scores |
| SATMTMID | Numeric: 350 to 795 | Midpoint of SAT math scores |
| SATWRMID | Numeric: 280 to 765 | Midpoint of SAT writing scores |
| UGDS | Numeric: 2 to 109,233 | Number of undergraduate certificate/degree-seeking students |
| SCHOOL_SIZE | Factor: 3 levels ("Small", "Medium", "Large") | The size of the university based on number of students |
| TUITIONFEE_IN | Numeric: 480 to 61,671 | In-state tuition and fees |
| TUITIONFEE_OUT | Numeric: 480 to 61,671 | Out-of-state tuition and fees |
Data Dictionary (continued)

| Variable | Data Type: Range/Levels | Description |
|---|---|---|
| AVGFACSAL | Numeric: 547 to 21,143 | Average faculty salary per month |
| PFTFAC | Numeric: 0.0339 to 1.0 | Proportion of faculty that is full-time |
| PCTPELL | Numeric: 0.0054 to 1.0 | Percentage of undergraduates who receive a Pell Grant |
| PCTFLOAN | Numeric: 0.0015 to 1.0 | Percent of undergraduate students receiving a federal student loan |
| MD_EARN_WNE_P10 | Numeric: 13,438 to 132,969 | Median earnings of students working and not enrolled 10 years after entry |
| COMPL_RPY_7YR_RT | Numeric: 0.2059 to 0.9814 | Seven-year repayment rate for completers |
| NONCOM_RPY_7YR_RT | Numeric: 0.1130 to 0.9314 | Seven-year repayment rate for non-completers |
| GRAD_DEBT_MDN | Numeric: 2,334 to 48,148 | The median debt for students who have completed |
| WDRAW_DEBT_MDN | Numeric: 2,352 to 24,167 | The median debt for students who have not completed |
| COSTT4_A | Numeric: 5,663 to 81,531 | Average cost of attendance |
| CDR3 | Numeric: 0.001 to 0.357 | Three-year cohort default rate |
| LOAN_EVER | Numeric: 0.0139 to 0.9856 | Percent of students who received a federal loan while in school |
| AGE_ENTRY | Numeric: 17.43 to 51.60 | Average age of entry into the institution |
| FEMALE | Numeric: 0.04156 to 0.97957 | Share of female students |
| MARRIED | Numeric: 0.0027 to 0.8154 | Share of married students |
| FIRST_GEN | Numeric: 0.08867 to 0.85091 | Share of first-generation students |
| MD_FAMINC | Numeric: 1,680 to 179,864 | Median family income |
Task 1 (5 points)

Your client wants to understand the factors influencing university admission rates. Your client is interested in ensuring that the analysis has proportional representation with respect to different regions of the country (REGION) and population densities (LOCALE).

(a) (3 points) Describe the steps for developing a stratified sample based on your client’s goals.

ANSWER:
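
As a rough illustration of proportional stratified sampling on REGION and LOCALE, the sketch below groups rows by the two variables and draws the same fraction from each group. The function name `stratified_sample`, the toy rows, the region labels, and the 10% fraction are all assumptions made for the example. None of them come from the exam dataset.

```python
import random
from collections import defaultdict

def stratified_sample(rows, strata_keys, fraction, seed=0):
    """Draw the same fraction from every stratum so each keeps its population share."""
    rng = random.Random(seed)
    # Group rows by their combination of stratification variables.
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[k] for k in strata_keys)].append(row)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Round to a whole number of rows, keeping at least one per stratum.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical toy population: 2 regions x 2 locales (not the exam data).
rows = (
    [{"REGION": "Plains", "LOCALE": "City"}] * 40
    + [{"REGION": "Plains", "LOCALE": "Rural"}] * 20
    + [{"REGION": "Southeast", "LOCALE": "City"}] * 30
    + [{"REGION": "Southeast", "LOCALE": "Rural"}] * 10
)
sample = stratified_sample(rows, ["REGION", "LOCALE"], fraction=0.1)
print(len(sample))
```

Sampling within each REGION and LOCALE combination, rather than from the pooled data, is what keeps small strata from being dropped by chance.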
Your client is also interested in student opinions about the university. You are given a dataset with written responses to a university satisfaction survey.

(b) (2 points) Discuss the advantages and disadvantages of using this kind of unstructured data in a predictive model.

ANSWER:
Task 2 (11 points)

Your assistant is interested in understanding the relationship between the features admission rate (ADM_RATE) and in-state tuition (TUITIONFEE_IN) and is considering whether to perform a K-means analysis or a hierarchical clustering analysis to better understand the relationship.

(a) (4 points) Describe two similarities and two differences between K-means clustering and hierarchical clustering.

ANSWER:

Your assistant prepared an elbow plot of K-means clustering using the in-state tuition (TUITIONFEE_IN) and admission rate (ADM_RATE) features, shown below.

(b) (3 points) Explain the tradeoff between selecting a value of K=2 and K=4. Recommend a value for K and justify your recommendation.

ANSWER:
Your assistant now wants to include more features in the K-means clustering analysis and is suggesting adding the following five variables:

| Variable | Data Type: Range/Levels | Description |
|---|---|---|
| CONTROL | Factor: 3 levels ("Public", "Private, non-profit", "Private, for-profit") | Control of institution |
| ADMIT_TIER | Factor: 5 levels ("MOST SELECTIVE", "EXTREMELY SELECTIVE", "VERY SELECTIVE", "MODERATELY SELECTIVE", "NOT SELECTIVE") | How selective the institution is |
| PFTFAC | Numeric: 0.0339 to 1.0 | Proportion of faculty that is full-time |
| MARRIED | Numeric: 0.0027 to 0.8154 | Share of married students |
| FIRST_GEN | Numeric: 0.08867 to 0.85091 | Share of first-generation students |

(c) (4 points) Critique your assistant’s suggestion to add these features to the K-means analysis. Include at least three considerations in your critique.

ANSWER:
Task 3 (7 points)

You are investigating variables that impact the expected future earnings of students. The variable (MD_EARN_WNE_P10) represents the median earnings of students working and not enrolled 10 years after entry.

(a) (3 points) Explain three differences between fitting a normal linear regression to log(MD_EARN_WNE_P10) compared to fitting a GLM with a log link function to the unaltered MD_EARN_WNE_P10 variable.

ANSWER:
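
One of the differences can be seen numerically in the simplest possible case, an intercept-only model. The sketch below uses made-up earnings values (an assumption, not exam data). Regressing log(y) on a constant and back-transforming yields the geometric mean, while a log-link GLM models log(E[y]) and so recovers the arithmetic mean.

```python
import math

# Hypothetical earnings values, not taken from the exam dataset.
earnings = [20_000, 30_000, 40_000, 130_000]

# Intercept-only linear regression on log(y): the fitted value is mean(log y),
# so exponentiating it gives the geometric mean of y.
log_ols_prediction = math.exp(sum(math.log(y) for y in earnings) / len(earnings))

# Intercept-only GLM with a log link: the link is applied to E[y], not to y,
# so the fitted mean on the original scale is the arithmetic mean of y.
glm_log_link_prediction = sum(earnings) / len(earnings)

print(round(log_ols_prediction), glm_log_link_prediction)
```

The gap between the two numbers grows with the right skew of the data, which is why the choice matters for a variable such as earnings.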
A GLM was fit on the formula MD_EARN_WNE_P10 ~ SATVRMID + SATMTMID + MD_FAMINC + AVGFACSAL. The plot of residuals vs. predicted values is as follows.

(b) (2 points) Analyze the residual plot.

ANSWER:

You are asked to construct a model to predict which individual students will default on their student loans.

(c) (2 points) Evaluate if this university dataset is appropriate for developing a model for predicting individual student loan defaults. No points will be awarded for referencing ethical issues as part of your evaluation.

ANSWER:
Task 4 (6 points)

Your boss is interested in ranking the list of universities by their median earnings of students (MD_EARN_WNE_P10). Your boss asks you and your assistant to build tree-based models to predict this variable.

Your assistant creates two decision trees. Each was trained on the data using a bootstrap sample obtained without replacement. You are also provided one row from the testing data set.

Decision Tree One:

Decision Tree Two:

Note: When reading the decision trees assume all node values are rounded to the nearest thousand. For example, “63e+3” should be interpreted as 63,000.

One row of data from the testing data set:

| MD_EARN_WNE_P10 | TUITIONFEE_IN | SATMTMID | SATVRMID | RET_FT4 | PCTPELL | AVGFACSAL |
|---|---|---|---|---|---|---|
| 32,084 | 11,068 | 462 | 485 | 0.6202 | 0.7368 | 7,194 |

(a) (4 points) Calculate the change in the Absolute Error, using the testing data row, between the first decision tree model and building a bagged model using both decision trees. State which of these two approaches yields a better result for this observation. Show all work.

ANSWER:
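
The arithmetic behind a two-tree bagged prediction can be sketched as follows. The actual value 32,084 is the MD_EARN_WNE_P10 of the test row given in the task. The two leaf predictions are placeholders (assumptions), since the real values have to be read off Decision Tree One and Decision Tree Two.

```python
actual = 32_084  # MD_EARN_WNE_P10 for the test row given in the task

# Hypothetical leaf predictions; the real values come from following the
# test row down each of the two decision trees.
tree_one_pred = 40_000
tree_two_pred = 30_000

# With two regression trees, the bagged prediction is their simple average.
bagged_pred = (tree_one_pred + tree_two_pred) / 2

abs_error_tree_one = abs(actual - tree_one_pred)
abs_error_bagged = abs(actual - bagged_pred)

# A negative change means the bagged model is closer to the actual value.
change = abs_error_bagged - abs_error_tree_one
print(abs_error_tree_one, abs_error_bagged, change)
```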
Your assistant is building a decision tree and does not fully understand the Complexity Parameter used in the process. You generate the Complexity Parameter (CP) table below.

(b) (2 points) Interpret the Complexity Parameter table. Recommend and justify a CP value to use for the model.

ANSWER:
Task 5 (9 points)

Your client is interested in obtaining a deeper understanding of how tuition prices and the size of the universities are reflected in the dataset.

(a) (3 points) Suggest two numerical variables from the Data Dictionary for this analysis and describe two univariate techniques that can be used to explore them.

ANSWER:

(b) (2 points) Suggest a categorical variable from the Data Dictionary for this analysis and describe a univariate technique to explore the variable.

ANSWER:

Your client is interested in obtaining a deeper understanding of the relationship between tuition prices and the university’s size.

(c) (2 points) Describe a bivariate visualization that can be applied to understand the relationship between a numeric variable and a categorical variable.

ANSWER:

Your client is interested in the relationship between tuition prices and the percentage of students receiving a federal loan. A plot is provided between TUITIONFEE_IN and PCTFLOAN variables where the data is split between public and private universities.

(d) (2 points) Interpret the plot above.

ANSWER:
Task 6 (2 points)

Your assistant is building a random forest model. They are asking for help with hyperparameter tuning and selection. Your assistant has provided the following output:
(a) (2 points) Recommend a value for each parameter in the random forest. Justify your recommendation.

ANSWER:

| Parameter | Value |
|---|---|
| Mtry | |
| Ntrees | |
Task 7 (8 points)

You ask your assistant to create a generalized linear model using variables in the dataset to predict an institution’s 7-year loan repayment rate for students who completed their degree (COMPL_RPY_7YR_RT). Your assistant creates both an ordinary linear model and a generalized linear model with a log link function and gamma distribution. Each model has the following output:

Linear Model 1:

Generalized Linear Model 1:

The CONTROL variable has three levels: Public, Private non-profit, and Private for-profit.

(a) (2 points) Describe the impact that each of the three levels of the CONTROL variable has on the 7-year loan repayment rate in Linear Model 1.

ANSWER:

(b) (3 points) Calculate the model’s predicted 7-year loan repayment rate for each scenario below and show your work:

ANSWER:

| Model | Scenario | Predicted 7-year loan repayment rate |
|---|---|---|
| Linear Model 1 | Public institution with 100% of undergrads receiving Pell grants. | |
| Generalized Linear Model 1 | Private for-profit institution with 50% of undergrads receiving Pell grants. | |

Your assistant wants to add a cost of attendance variable to the model. You hypothesize that because public universities usually have a lower cost of attendance than private universities, the cost of attendance for public and private schools may have a different relationship to loan repayment rates. You create the following graph:
Based on the graph, you ask your assistant to create an interaction variable between public schools and cost of attendance. Your assistant adds the cost of attendance variable (COSTT4_A) and the interaction variable (public_costt4a = COSTT4_A * CONTROL) to the GLM and produces the following output:

Your assistant suggests including the new interaction variable in the model.

(c) (3 points) Critique your assistant’s suggestion using the information above.

ANSWER:
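
To make the mechanics of a log-link GLM with an interaction term concrete, the sketch below evaluates a linear predictor and back-transforms it with exp(). Every coefficient is a made-up placeholder, and the coding of CONTROL as a 0/1 "public" indicator is an assumption, since the model output is not reproduced here.

```python
import math

# Hypothetical coefficients for illustration only (not the exam's model output).
coef = {
    "intercept": -0.2,
    "PCTPELL": -0.5,
    "public": 0.1,               # assumed coding: 1 if CONTROL is Public, else 0
    "COSTT4_A": 0.000002,
    "public_costt4a": 0.000003,  # interaction: COSTT4_A * public indicator
}

def predict_log_link(pctpell, public, cost):
    """Predicted mean from a log-link GLM: exp(linear predictor)."""
    eta = (
        coef["intercept"]
        + coef["PCTPELL"] * pctpell
        + coef["public"] * public
        + coef["COSTT4_A"] * cost
        + coef["public_costt4a"] * cost * public
    )
    return math.exp(eta)

# The interaction lets the cost effect differ for public institutions.
print(predict_log_link(0.5, 1, 20_000), predict_log_link(0.5, 0, 20_000))
```

Under a log link each coefficient acts multiplicatively on the predicted rate, whereas in the ordinary linear model the same coefficient would add a fixed amount.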
Task 8 (8 points)

Your assistant performed a principal components analysis (PCA) on the three features with midpoints of SAT scores (SATVRMID, SATMTMID, SATWRMID). The output of the PCA is shown below. Note that the SAT scores were standardized for the PCA analysis, and that institutions with missing SAT scores in the data were excluded.

(a) (4 points) Interpret standard deviation and proportion of variance in the output. Discuss the implications.

ANSWER:

(b) (4 points) Interpret the “Loadings of Principal Components” for PC1 and PC2 and describe the relationship among the three SAT features.

ANSWER:
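
The link between the standard deviation and proportion of variance rows of a PCA summary, as asked about in subtask (a), can be sketched as follows. The three standard deviations are hypothetical placeholders, not the values from the exam output.

```python
# Hypothetical standard deviations of PC1..PC3 (not the exam's PCA output).
sdev = [1.60, 0.55, 0.37]

# Each component's variance is its standard deviation squared.
variances = [s ** 2 for s in sdev]
total_variance = sum(variances)

# Proportion of variance: each component's share of the total.
prop_var = [v / total_variance for v in variances]

# Cumulative proportion: running total of the shares.
cum_prop = []
running = 0.0
for p in prop_var:
    running += p
    cum_prop.append(running)

print([round(p, 4) for p in prop_var])
print([round(c, 4) for c in cum_prop])
```

With three standardized features, the component variances would be expected to sum to about 3, one unit per feature.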
Task 9 (7 points)

Your client is working with a national bank to provide student loans to universities and wants to have a better understanding of how certain variables impact loan repayment after graduation (COMPL_RPY_7YR_RT), particularly the variables TEST_REQ, SATVRMID, SATMTMID and SATWRMID.

Your assistant provides you the following exploratory data analyses.
The table below shows each combination of the TEST_REQ with SATVRMID, SATMTMID, SATWRMID, and COMPL_RPY_7YR_RT variables. It shows N: the total number of data points with that value of TEST_REQ, n_miss: the number of times the specified variable is missing, and pct_miss: n_miss / N.
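
As a small illustration of how the pct_miss column is derived, the sketch below applies pct_miss = n_miss / N to a few rows. The counts are invented for the example (an assumption) and do not reproduce the exam's table.

```python
# Hypothetical counts for illustration; not the values from the exam table.
missing_summary = [
    {"TEST_REQ": "Required", "variable": "SATVRMID", "N": 200, "n_miss": 10},
    {"TEST_REQ": "Required", "variable": "COMPL_RPY_7YR_RT", "N": 200, "n_miss": 0},
    {"TEST_REQ": "Recommended", "variable": "SATWRMID", "N": 50, "n_miss": 50},
]

for row in missing_summary:
    # Share of institutions in this TEST_REQ level with the variable missing.
    row["pct_miss"] = row["n_miss"] / row["N"]

for row in missing_summary:
    print(row["TEST_REQ"], row["variable"], f"{row['pct_miss']:.1%}")
```

A pct_miss of 1.0 for a level means the variable is never observed there, so any model that drops incomplete rows loses that level entirely.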
Your assistant used these variables to create an OLS model. You are provided with the model summary:

(a) (3 points) Describe 3 weaknesses of your assistant’s model with regard to model choice and data issues.

ANSWER:

(b) (2 points) Explain the reason that variable TEST_REQ has only one level shown in the model output.

ANSWER:

Your assistant replaces the missing values of the SAT scores with the mean of each score and runs two models with different interaction terms:

- Model 1 with the following three interaction terms: TEST_REQ * SATWRMID, TEST_REQ * SATVRMID, and TEST_REQ * SATMTMID
- Model 2 with one interaction term: TEST_REQ * SATWRMID
You are provided with model output:

Model 1:
Model 2:

(c) (2 points) Explain the reason that some interaction coefficients in Model 1 are NAs.

ANSWER:
Task 10 (7 points)

Looking at the target variable showing median earnings of students (MD_EARN_WNE_P10), your assistant notices outliers in the data and is concerned about poor model fit.

(a) (2 points) Explain why tree-based models are resilient to outliers in predictor variables.

ANSWER:

Your assistant asks which of the two metrics, Root Mean Square Error (RMSE) or Mean Absolute Error (MAE), to use in evaluating model performance. Your assistant wants to use a metric that is more robust to outliers in the target variable.

(b) (2 points) Recommend which metric to use and justify your recommendation.

ANSWER:

Your assistant built the tree below with median earnings (MD_EARN_WNE_P10) as the target variable and provided two data points pulled from the test data.

Note: When reading the decision tree assume all node values are rounded to the nearest thousand. For example, “63e+3” should be interpreted as 63,000.
| | SATMTMID | AVGFACSAL | TUITIONFEE_IN | MD_EARN_WNE_P10 |
|---|---|---|---|---|
| Data Point 1 | 630 | 9,000 | 35,000 | 120,000 |
| Data Point 2 | 630 | 9,000 | 35,000 | 200,000 |

(c) (3 points) Calculate the RMSE and MAE for the test data above using the tree model. Show your work.

ANSWER:
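
The two metrics can be computed as in the sketch below. The actual values 120,000 and 200,000 are the two test data points from the task. Because both points share the same predictor values, the tree assigns them the same prediction. The value 100,000 used here is a hypothetical placeholder (an assumption), since the real leaf value must be read off the tree.

```python
import math

actuals = [120_000, 200_000]  # MD_EARN_WNE_P10 for Data Points 1 and 2

# Hypothetical leaf prediction; both points have identical predictors,
# so they land in the same leaf and get the same predicted value.
predicted = 100_000

errors = [a - predicted for a in actuals]

# MAE averages the absolute errors, so each error counts in proportion to its size.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE squares the errors before averaging, so the larger error dominates.
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

print(mae, round(rmse, 2))
```

With these placeholder numbers the RMSE exceeds the MAE, which reflects how squaring gives the 200,000 outlier extra weight.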