MA322_Lab_3_-_MLR(1)
School: Fashion Institute Of Technology
Course: 322
Subject: Statistics
Date: Feb 20, 2024
Pages: 1
Uploaded by unknownuser123.
Lab 3 – Prediction using Multiple Regression
Insurance companies make money by collecting more in yearly premiums than they spend on medical care for their
beneficiaries. Insurers invest a great deal to develop models that attempt to predict medical charges beneficiaries will
incur. Medical expenses are notorious for being difficult to estimate because the most costly conditions are rare and
seemingly random. Nevertheless, there are patterns that emerge for certain segments of the population. For instance,
lung cancer is more likely among smokers than non-smokers, and heart disease may be more likely among the obese. Our
overarching goal will be to use patient data from the past calendar year in order to predict the cost of new patients and
thereby use this information to determine suitable yearly premiums.
Use R and present all your answers/explanations/visualizations in a text document, using a consistent font.
The data are stored in comma-separated values (CSV) form in the file “insurance.csv”.
You will need to watch the tutorials to complete some of these tasks.
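A minimal sketch of importing a comma-separated file into a data frame, assuming the column names that the tasks below mention (age, sex, bmi, children, smoker, charges). The rows are made-up toy values written to a temporary file so that the snippet runs on its own; in the lab you would point read.csv at your own copy of insurance.csv instead. The name ins follows the note in Q4, and the category labels "yes"/"no" and "female"/"male" are assumptions.

```r
# Toy rows (made up for illustration; NOT taken from insurance.csv).
toy <- data.frame(
  age      = c(19, 33, 45, 52, 23, 61),
  sex      = c("female", "male", "male", "female", "male", "female"),
  bmi      = c(27.9, 22.7, 30.1, 35.2, 24.3, 29.8),
  children = c(0, 1, 2, 3, 0, 1),
  smoker   = c("yes", "no", "no", "yes", "no", "no"),
  charges  = c(16800, 4400, 8200, 41000, 2400, 13500)
)

# Write the toy rows to a temporary CSV so that read.csv has a file to read.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Import; stringsAsFactors = TRUE stores the text columns as factors,
# so R treats them as categorical features.
ins <- read.csv(path, stringsAsFactors = TRUE)

str(ins)    # one line per column: name, type, first few values
nrow(ins)   # number of rows (examples)
```

str() is a quick way to see which columns are numerical and which are categorical before subsetting.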
1. Which feature is your target and what type of feature is it? What general family of methods have we learned to
predict this type of feature?
2. Which are your potential independent features? What type of variable is each of these? What relationship do
you expect each of these features to have with the target?
3. How many examples do we have in this data set? What does each example represent here?
4. Construct a scatterplot matrix of the numerical features in this data set. You must subset the data to exclude the
categorical features. Try:
pairs(ins[ , c("age", "bmi", "children", "charges")])
Note: ins is what I called my data.frame when importing; this part may be different for you.
5. Calculate r, Pearson’s correlation coefficient, for every pair of numerical features in this data. A correlation
matrix should do the trick. Describe the direction and strength of a few of the strong relationships in this data.
Does this agree with what you learned from your scatterplot matrix?
Again you will have to subset as in Q4.
Try:
cor(ins[ , c("age", "bmi", "children", "charges")])
6. Use the lm function to learn an MLR model on this data using age, children, bmi, sex, and smoker as independent
features. Report the regression coefficients and write the equation in
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$ form.
Don’t use y and x; instead use the appropriate feature names with the corresponding coefficients. Describe one
of the partial slopes in the equation to show you understand what this means.
7. Report $R^2$ and Adjusted $R^2$ for the model you created in Q6, and interpret them.
8. Create a new MLR model the same as Q6 but excluding sex. Compare the new Adjusted $R^2$ to the Adjusted $R^2$ from
the larger model in Q6. Did we need the larger model from Q6 or did the smaller model you created here have as
much explanatory power? Hint: compare Adjusted $R^2$ from the two models.
9. Use the regression model you learned from Q8 to predict the charges for a new patient applying for health
insurance who is 22 years old, a non-smoker, has no children, and has a bmi of 30. Compare that to the
prediction for a smoker with all other information the same.
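The modelling steps in tasks 5 through 9 can be sketched as follows. This is only an illustration on simulated data, not a solution: every value below is generated from an arbitrary rule chosen for the example, so the coefficients and predictions it prints say nothing about the real insurance.csv. The object names fit_full, fit_small, num_cols and new_patients are hypothetical, and the factor labels "no"/"yes" and "female"/"male" are assumptions about how the file codes those features.

```r
# Simulated data (made up for illustration; NOT the lab's insurance.csv).
set.seed(1)
n <- 200
ins <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  bmi      = round(rnorm(n, mean = 30, sd = 6), 1),
  children = sample(0:4, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE,
                           prob = c(0.8, 0.2)))
)
# Charges follow an arbitrary linear rule plus random noise.
ins$charges <- 250 * ins$age + 300 * ins$bmi + 500 * ins$children +
  20000 * (ins$smoker == "yes") + rnorm(n, sd = 5000)

# Correlation matrix of the numerical features only.
num_cols <- c("age", "bmi", "children", "charges")
round(cor(ins[ , num_cols]), 2)

# Larger model, and the smaller model that leaves out sex.
fit_full  <- lm(charges ~ age + children + bmi + sex + smoker, data = ins)
fit_small <- lm(charges ~ age + children + bmi + smoker, data = ins)

coef(fit_full)                      # intercept and partial slopes
summary(fit_full)$r.squared         # R-squared of the larger model
summary(fit_full)$adj.r.squared     # adjusted R-squared, larger model
summary(fit_small)$adj.r.squared    # adjusted R-squared, smaller model

# Two new patients who differ only in smoking status.
new_patients <- data.frame(
  age      = c(22, 22),
  children = c(0, 0),
  bmi      = c(30, 30),
  smoker   = factor(c("no", "yes"), levels = levels(ins$smoker))
)
pred <- predict(fit_small, newdata = new_patients)
pred
```

lm turns each factor into an indicator variable, so the coefficient it labels smokeryes is the partial slope for smokers relative to non-smokers with the other features held fixed. The new data frame passed to predict must use the same column names and factor levels as the data used to fit the model.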