SAS Assignment #1
School: Texas A&M University, Corpus Christi
Course: 5315
Subject: Statistics
Date: Feb 20, 2024
Framingham Heart Study: Data Preparation
Industry Aligned Activity
Purpose
This activity focuses on preparing data from the Framingham Heart Study for future statistical analyses as well as exploring data through descriptive statistics.
SAS Software
This activity can be performed using any SAS programming environment, including SAS Studio in SAS OnDemand for Academics.
Industry Alignment
This activity aligns with the healthcare industry. It uses data from a clinical study conducted to identify characteristics contributing to cardiovascular disease.
Table of Contents
Framingham Heart Study: Data Preparation
  Purpose
  SAS Software
  Industry Alignment
Activity Notes and Requirements
  Learning Objectives
  Estimated Completion Time
  Experience Level
  Prerequisite Knowledge
    Software
    Content Knowledge
  Additional Notes
Data Source
  Introduction
  Description of Variables
Framingham Heart Study: Data Preparation Activity
  Part 1: Understanding the Variables
  Part 2: Creating New Variables and Subsetting the Data
Appendix
  Appendix A: Access Software
  Appendix B: Helpful Documentation
  Appendix C: Recommended Learning
Activity Notes and Requirements
Learning Objectives
This activity provides practice with skills such as:
Implementing data changes and manipulations
Preparing data for future possible statistical analyses
Exploring data through descriptive statistics, including:
  Understanding variables and their values within the data
  Recognizing the need for changes in the data
Estimated Completion Time
This activity will take students approximately 3 hours to complete.
Experience Level
To complete this activity students should have the following levels of experience:
Intermediate skill in SAS programming
Beginner skill in statistics
Prerequisite Knowledge
Software
Students should have experience with the following:
Foundations of programming with the SAS Data Step including using functions and if/then/else conditional statements.
SAS descriptive procedures such as PROC PRINT, PROC CONTENTS, PROC FREQ, PROC MEANS, and PROC UNIVARIATE.
Content Knowledge
Students should have experience/knowledge with the following concepts:
Descriptive statistics such as mean, median, counts, and percentages
Conditional if/then/else logic
Additional Notes
This activity pairs well with the following activities that you will complete:
Framingham Heart Study: Descriptive Analysis, Industry Applied Activity
Framingham Heart Study: Statistical Analysis, Industry Applied Activity
Data Source
Introduction
This activity uses the HEART dataset in the SASHELP library. To access the SASHELP library in SAS, select View > Explorer. In the Explorer window, select Libraries > Sashelp.
The data came from the landmark Framingham Heart Study (https://framinghamheartstudy.org/). The purpose of the Framingham Heart Study was to identify characteristics contributing to cardiovascular disease. Important links between cardiovascular disease and high blood pressure, high cholesterol levels, cigarette smoking, and many other health factors were first established using its data. The original cohort of the Framingham Heart Study consisted of 5,209 men and women between the ages of 28 and 62 living in Framingham, Massachusetts. The first visit of data collection for participants in this cohort occurred between 1948 and 1953, and participants were assessed every two years thereafter through April 2014, almost 7 decades! The complete Framingham Heart Study data consists of hundreds of datasets taken over time at 32 biennial exams and has led to over 3000 (wow!) published journal articles. To simplify analyses for illustrative purposes, the SASHELP.HEART dataset includes a snapshot of selected primary study variables taken at one of the biennial exams.
Description of Variables
The variables used for this exercise are:
Variable          Description
Status            Alive or dead
DeathCause        Cause of death
AgeCHDdiag        Age at which CHD was diagnosed
Sex               Male or female
AgeAtStart        Age at the entry into the Framingham Heart Study
Height            Height in inches
Weight            Weight in pounds
Diastolic         Diastolic blood pressure
Systolic          Systolic blood pressure
MRW               Metropolitan Relative Weight
Smoking           Number of packs of cigarettes smoked per week
AgeAtDeath        Age at death
Cholesterol       Total cholesterol
Chol_Status       Total cholesterol categorized into groups
BP_Status         Diastolic and systolic blood pressure categorized into groups
Weight_Status     Height and weight categorized into groups
Smoking_Status    Number of packs of cigarettes smoked per week categorized into groups
Framingham Heart Study: Data Preparation Activity
This activity consists of two parts. Part one outlines how to explore the data to understand the variables for analysis. Part two outlines how to prepare the data for future analyses by creating new variables and subsetting the data.
Part 1: Understanding the Variables
Deciding an appropriate path for analysis often requires many steps. An important first step is exploring and examining the data. An initial exploratory data analysis provides understanding of the meaning of study variables and can provide crucial clues into data preparations needed before analyzing the data.
1. Open and examine the SASHELP.HEART dataset and its variables. Familiarize yourself with the context and meanings behind the variables and their values.
a. How many observations are in the dataset?
There are 5,209 observations in the SASHELP.HEART dataset.
b. How many variables are in the dataset? How many are numeric? How many are character?
There are 17 variables in the dataset: 10 are numeric and the remaining 7 are character.
Exploring the assigned values of character variables can demonstrate patterns and inherent orderings. The default ordering of levels in SAS is alphabetical order. The levels of many character variables have an inherent ordering of magnitude. For example, non-smokers smoke less than light smokers who smoke less than moderate smokers.
2. Tabulate the levels of the character variables in the SASHELP.HEART dataset. For each of the character variables:
a. What data values or levels are observed for each?
Status has two values: Alive (3218) and Dead (1991).
DeathCause has five values: Cancer (539), Cerebral Vascular Disease (378), Coronary Heart Disease (605), Other (357), and Unknown (112). A blank value indicates an individual who is still alive.
Sex has two values: Female (2873) and Male (2336).
Chol_Status has three values: Borderline (1861), Desirable (1405), and High (1791).
BP_Status has three values: High (2267), Normal (2143), and Optimal (799).
Weight_Status has three values: Normal (1472), Overweight (3550), and Underweight (181).
Smoking_Status has five values: Heavy (16-25) (1046), Light (1-5) (579), Moderate (6-15) (576), Non-smoker (2501), and Very Heavy (> 25) (471).
b. Which variables have an inherent ordering of magnitude? Does alphabetical order of the levels correspond to ordering levels by magnitude for any of these character variables?
Chol_Status, Weight_Status, BP_Status, and Smoking_Status have an inherent ordering of magnitude, because each describes the size of an underlying measurement, such as the number of cigarettes smoked for Smoking_Status.
The alphabetical order of the levels does not run from lowest to highest for any of these variables. For instance, Weight_Status is listed as Normal, Overweight, and Underweight in alphabetical order, which does not reflect the correct magnitude ordering.
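Questions 1 and 2 can be approached with two of the procedures listed under Prerequisite Knowledge. The following is a minimal sketch, not the activity's official solution: PROC CONTENTS reports the observation count and each variable's type, and PROC FREQ with the `_CHARACTER_` variable list tabulates every character variable in the default (alphabetical) level order.

```sas
/* Question 1: observation count, variable names, and types */
proc contents data=sashelp.heart;
run;

/* Question 2: levels and counts for every character variable.   */
/* Levels print in alphabetical order, which is the SAS default. */
proc freq data=sashelp.heart;
   tables _character_;
run;
```

PROC FREQ reports the number of missing values beneath each table, which is also useful for the later questions about missing data.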
Examining the values of numeric variables can provide insights into their magnitude, spread, and symmetry. Variables with a symmetric distribution will have roughly equal mean and median, so can be summarized with either statistic. Variables with substantially different mean and median values indicate a non-symmetric distribution. Such variables may be better summarized with a median. Additionally, some numeric variables may have few unique values, so could be better summarized as categorical variables.
3. Generate descriptive statistics and histograms for the numeric variables in the SASHELP.HEART dataset.
a. What is the minimum, maximum, median, and mean of each variable?
AgeCHDdiag: Min 33, Max 90, Median 63, Mean 63.30.
AgeAtStart: Min 28.5, Max 61.5, Median 43, Mean 44.07.
Height: Min 51.5, Max 75.5, Median 64.5, Mean 64.81.
Weight: Min 67, Max 300, Median 150, Mean 153.09.
Diastolic: Min 50, Max 160, Median 84, Mean 85.36.
Systolic: Min 82, Max 300, Median 132, Mean 136.91.
MRW: Min 67, Max 268, Median 118, Mean 119.96.
Smoking: Min 0, Max 60, Median 1, Mean 9.37.
AgeAtDeath: Min 36, Max 93, Median 71, Mean 70.54.
Cholesterol: Min 96, Max 568, Median 223, Mean 227.42.
b. Do the mean and median seem substantially different for any of the variables?
The mean and median of Smoking are substantially different: the median is 1 while the mean is 9.37. For the other variables, the mean and median are close to each other.
c. Does Smoking seem to be better suited to be analyzed as a categorical variable or a continuous variable?
Smoking is better suited to be analyzed as a categorical variable.
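One possible way to produce the statistics and histograms for question 3 is sketched below. It assumes the default ODS graphics output is acceptable; the choice of statistics keywords and `MAXDEC=2` is illustrative rather than required by the activity.

```sas
/* Minimum, maximum, median, and mean for every numeric variable */
proc means data=sashelp.heart n nmiss min max median mean maxdec=2;
   var _numeric_;
run;

/* One histogram per numeric variable; NOPRINT suppresses the */
/* long tables of moments and quantiles but keeps the plots.  */
proc univariate data=sashelp.heart noprint;
   var _numeric_;
   histogram;
run;
```

Reading the minimum and maximum from the PROC MEANS table rather than from histogram axes avoids confusing bin midpoints with actual data values.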
The SASHELP.HEART dataset contains several categorical variables whose levels were originally created from values of continuous variables in the dataset. Understanding the relationships between related continuous and categorical predictors in a dataset can inform choices of predictors in later statistical analyses.
4. Explore the variables Weight_Status, Smoking_Status, Chol_Status, and BP_Status as follows:
a. Variables Weight_Status, MRW, and Weight:
i. What are the ranges (minimum and maximum) of variables MRW and Weight for each level of Weight_Status?
Normal: Weight 92-186, MRW 91-109.
Overweight: Weight 104-300, MRW 110-268.
Underweight: Weight 67-150, MRW 67-90.
ii. Are the ranges of MRW for levels of Weight_Status overlapping?
No, the ranges of MRW for levels of Weight_Status do not overlap.
iii. Are the ranges of Weight for levels of Weight_Status overlapping?
Yes, the ranges of Weight for levels of Weight_Status overlap. For example, Normal spans 92-186 while Overweight starts at 104 and Underweight extends to 150.
iv. Using your answers to the previous two questions, when this dataset was created which values, MRW or Weight, were used to create the levels for Weight_Status?
MRW values were used to create the levels for Weight_Status. The MRW ranges are contiguous and non-overlapping (67-90, 91-109, 110-268), whereas the Weight ranges overlap across levels and so cannot define the levels consistently.
b. Variables Smoking_Status and Smoking:
i. Which values of Smoking are categorized as Smoking_Status=Non-smoker? Light? Moderate? Heavy? Very Heavy?
Non-smoker is 0. Light is 1-5. Moderate is 6-15. Heavy is 16-25. Very Heavy is greater than 25.
ii. Are any values of Smoking categorized into more than one level of Smoking_Status?
No, each value of Smoking falls into exactly one level of Smoking_Status.
c. Variables Chol_Status and Cholesterol:
i. What are the ranges (minimum and maximum) of Cholesterol for each level of Chol_Status?
Borderline: minimum 200, maximum 239.
Desirable: minimum 96, maximum 199.
High: minimum 240, maximum 568.
ii. Are the ranges of Cholesterol for levels of Chol_Status overlapping?
No, the ranges of Cholesterol for levels of Chol_Status do not overlap.
d. Variable BP_Status:
i. What are the ranges (minimum and maximum) of Diastolic and Systolic for each level of BP_Status?
High: Diastolic 52-160, Systolic 112-300.
Normal: Diastolic 54-88, Systolic 101-140.
Optimal: Diastolic 50-78, Systolic 82-118.
ii. Are the ranges of Diastolic for levels of BP_Status overlapping?
Yes, the ranges of Diastolic for levels of BP_Status overlap. For example, High starts at 52 while Optimal extends to 78.
iii. Are the ranges of Systolic for levels of BP_Status overlapping?
Yes, the ranges of Systolic for levels of BP_Status overlap. For example, High starts at 112 while Normal extends to 140.
iv. Normal levels of blood pressure are usually defined as under 120 for systolic blood pressure and under 80 for diastolic blood pressure. Based on your answers to the previous questions, are one or both of systolic and diastolic blood pressure required to be high for the individual to be categorized as BP_Status=High?
Only one of the two needs to be high. The High level includes diastolic values as low as 52 and systolic values as low as 112, both of which fall inside the normal range, so an individual can be categorized as BP_Status=High when just one of the two measurements is elevated.
Exploring patterns of missingness in a dataset gives insight into data collection procedures for the study generating the dataset and may also indicate data entry or data collection errors.
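Looking back at question 4, the within-level ranges can be obtained with PROC MEANS and a CLASS statement. This is a sketch under the assumption that minimum and maximum alone are enough to judge overlap; the cross-tabulation at the end is one way to check whether any single value of Smoking lands in more than one level of Smoking_Status.

```sas
/* Ranges of MRW and Weight within each level of Weight_Status */
proc means data=sashelp.heart min max maxdec=1;
   class Weight_Status;
   var MRW Weight;
run;

/* Ranges of Cholesterol within each level of Chol_Status */
proc means data=sashelp.heart min max maxdec=1;
   class Chol_Status;
   var Cholesterol;
run;

/* Ranges of Diastolic and Systolic within each level of BP_Status */
proc means data=sashelp.heart min max maxdec=1;
   class BP_Status;
   var Diastolic Systolic;
run;

/* Each value of Smoking against its Smoking_Status level;  */
/* LIST prints one row per observed combination.            */
proc freq data=sashelp.heart;
   tables Smoking*Smoking_Status / list missing;
run;
```

If the minimum of one level lies below the maximum of another, the ranges overlap and that continuous variable alone cannot have defined the levels.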
5. Examine missing data in the SASHELP.HEART dataset.
a. Which variables have no missing data?
Status, Sex, BP_Status, AgeAtStart, Diastolic, and Systolic have no missing data.
b. Which variables have missing data?
DeathCause, Chol_Status, Weight_Status, Smoking_Status, AgeCHDdiag, Height, Weight, MRW, Smoking, AgeAtDeath, and Cholesterol have missing data.
c. For each variable with missing data, what percent of the data is missing?
DeathCause: 3218 missing values (61.78% of 5209 observations).
Chol_Status: 152 missing values (2.92%).
Height, Weight, MRW, and Weight_Status: 6 missing values each (0.12% for each variable).
Smoking and Smoking_Status: 36 missing values each (0.69%).
AgeCHDdiag: 3760 missing values (72.18%).
AgeAtDeath: 3218 missing values (61.78%).
Cholesterol: 152 missing values (2.92%).
d. Using what you currently know about the dataset, given the definition of the variable(s) or given values of other variables in the dataset, which variable(s) have patterns of missingness that could be expected?
DeathCause and AgeAtDeath have patterns of missingness that could be expected, because they are not recorded for participants who are still alive.
6. Examine patterns of missingness on certain groups of variables as follows:
a. If MRW is non-missing, are both Height and Weight always non-missing?
No, if MRW is non-missing, Height is still missing 0.08% of the time.
b. If Weight_Status is non-missing, are both Height and Weight always non-missing?
No, if Weight_Status is non-missing, Height is still missing 0.08% of the time.
c. If Smoking is non-missing is Smoking_Status always non-missing, and vice versa?
Yes, if Smoking is non-missing, Smoking_Status is always non-missing, and vice versa.
d. If Cholesterol is non-missing, is Chol_Status always non-missing, and vice versa?
Yes, if Cholesterol is non-missing, Chol_Status is always non-missing, and vice versa.
e. Analyze DeathCause and AgeAtDeath grouped by Status.
i. Are DeathCause and AgeAtDeath ever missing when Status=Dead?
No, DeathCause and AgeAtDeath are never missing when Status=Dead.
ii. Are DeathCause and AgeAtDeath ever non-missing when Status=Alive?
No, DeathCause and AgeAtDeath are always missing when Status=Alive.
f. Analyze AgeCHDdiag grouped by DeathCause. Is AgeCHDdiag ever missing when DeathCause=Coronary Heart Disease?
No, AgeCHDdiag is never missing when DeathCause=Coronary Heart Disease.
Missing values can also impact later statistical analyses. SAS statistical procedures perform what is called a complete case analysis, which is to say that analyses will exclude any observation with a missing value for any variable involved in the analysis. Such exclusions can substantially decrease the number of observations in a dataset that are used in a later statistical analysis.
7. Tabulate the percent of observations in the SASHELP.HEART dataset that have non-missing values for all the predictor variables that you will use in later analyses: AgeAtStart, BP_Status, Chol_Status, Cholesterol, Diastolic, Height, MRW, Sex, Smoking, Smoking_Status, Systolic, Weight, and Weight_Status.
Does the SASHELP.HEART dataset seem to have a high amount of missing data for any of these predictors?
No. Cholesterol and Chol_Status have the most missing values at 152 each (2.92%). Smoking and Smoking_Status are missing for 36 observations (0.69%), and Height, Weight, MRW, and Weight_Status for 6 each (0.12%). The remaining predictors have no missing data.
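The missing-data questions 5 through 7 can be sketched as follows. The format names `nmissfmt` and `$cmissfmt`, the dataset `WORK.HEART_CHECK`, and the variable `Complete` are hypothetical names introduced for illustration. The formats collapse every value to "Missing" or "Not missing" so that PROC FREQ reports counts and percents of missingness directly, and the CMISS function counts missing values across both numeric and character arguments.

```sas
/* Formats that collapse values to Missing / Not missing (hypothetical names) */
proc format;
   value  nmissfmt  .  = 'Missing' other = 'Not missing';
   value $cmissfmt ' ' = 'Missing' other = 'Not missing';
run;

/* Question 5: percent missing per variable.                   */
/* Question 6: joint missingness patterns for selected groups. */
proc freq data=sashelp.heart;
   format _numeric_ nmissfmt. _character_ $cmissfmt.;
   tables _all_ / missing nocum;
   tables MRW*Height*Weight
          Weight_Status*Height*Weight
          Smoking*Smoking_Status
          Cholesterol*Chol_Status
          Status*DeathCause*AgeAtDeath / list missing;
run;

/* Question 7: flag observations complete on all predictors of interest */
data work.heart_check;
   set sashelp.heart;
   Complete = (cmiss(AgeAtStart, BP_Status, Chol_Status, Cholesterol,
                     Diastolic, Height, MRW, Sex, Smoking,
                     Smoking_Status, Systolic, Weight, Weight_Status) = 0);
run;

proc freq data=work.heart_check;
   tables Complete;
run;
```

The percent of observations with `Complete = 1` is the share that would survive a complete case analysis using all of these predictors.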
Part 2: Creating New Variables and Subsetting the Data
An important next step after exploring a dataset is to create any new variables needed for later analyses. The primary outcome of the Framingham Heart Study is whether a patient developed coronary heart disease. Interestingly, this variable is not included in the SASHELP.HEART dataset.
1. Use information in the variable AgeCHDdiag to create a variable describing whether a patient developed coronary heart disease. Specifically, if AgeCHDdiag is non-missing, then the individual had coronary heart disease, and if AgeCHDdiag is missing, the individual did not have coronary heart disease.
a. Create a new numeric variable named CHD.
b. Store this new variable in a temporary dataset named WORK.HEART1.
c. Code this variable so that CHD=1 if AgeCHDdiag takes a value from 0 to 999 and CHD=0 otherwise.
After creating any new variable, make sure to check your work.
2. Generate descriptive statistics for the variable AgeCHDdiag grouped by CHD.
a. Is CHD a numeric variable?
Yes, CHD is a numeric variable.
b. When CHD=1, is AgeCHDdiag always non-missing?
Yes, when CHD=1, AgeCHDdiag is always non-missing.
c. When CHD=0, is AgeCHDdiag always missing?
Yes, when CHD=0, AgeCHDdiag is always missing.
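Steps 1 and 2 above can be sketched as a short DATA step followed by a check. This is one possible implementation, not necessarily the intended solution. It relies on the fact that SAS treats a numeric missing value as smaller than any number, so the range test is false when AgeCHDdiag is missing.

```sas
/* Step 1: create CHD in a temporary dataset */
data work.heart1;
   set sashelp.heart;
   /* A missing AgeCHDdiag fails the range test, so CHD is set to 0 */
   if 0 <= AgeCHDdiag <= 999 then CHD = 1;
   else CHD = 0;
run;

/* Step 2: check the new variable against its source */
proc means data=work.heart1 n nmiss min max;
   class CHD;
   var AgeCHDdiag;
run;

/* Confirm that CHD was created as a numeric variable */
proc contents data=work.heart1;
run;
```

In the PROC MEANS output, the CHD=0 row should show only missing values for AgeCHDdiag and the CHD=1 row should show none.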
Let's now turn to creating new predictor variables. Statistical analyses can determine which variables collected in the Framingham Heart Study are predictive of development of coronary heart disease. To facilitate comparison of levels of categorical predictors, levels of categorical predictors must be recoded so that alphabetical order of the levels also corresponds to ordering the levels by magnitude. This is desirable since statistical procedures use the alphabetically last level as a reference level by default. Re-coding is also useful so that levels appear in a logical order in plots.
3. Re-code categorical variables in the SASHELP.HEART dataset as follows:
a. Use WORK.HEART1 as the input dataset.
b. Create an output dataset named WORK.HEART2.
c. Create a new variable Chol_StatusNew by recoding Chol_Status as follows:
High = 1 High
Borderline = 2 Borderline
Desirable = 3 Desirable
d. Create a new variable Sex_New by recoding Sex as follows:
Male = 1 Male
Female = 2 Female
e. Create a new variable Weight_StatusNew by recoding Weight_Status as follows:
Overweight = 1 Overweight
Normal = 2 Normal
Underweight = 3 Underweight
f. Create a new variable Smoking_StatusNew by recoding Smoking_Status as follows:
Very Heavy (> 25) = 1 Very Heavy
Heavy (16-25) = 2 Heavy
Moderate (6-15) = 3 Moderate
Light (1-5) = 4 Light
Non-smoker = 5 Non-smoker
g. Tabulate each of your new variables as follows to check your work:
i. Tabulate levels of each of the four new variables over all observations.
ii. Tabulate levels of Chol_StatusNew grouped by Chol_Status.
iii. Tabulate levels of Sex_New grouped by Sex.
iv. Tabulate levels of Weight_StatusNew grouped by Weight_Status.
v. Tabulate levels of Smoking_StatusNew grouped by Smoking_Status.
vi. Do you see the expected ordering of levels within each variable (in part i) as well as the expected combinations of levels of re-coded and original variables (in parts ii-v)?
No, the only variable that has the expected ordering of levels is Smoking Status. The other variables are ordered only alphabetically.
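The recoding in step 3 can be sketched with IF/THEN/ELSE logic, which the activity lists as prerequisite knowledge. The length of 15 is an assumption chosen to hold the longest new label. The `=:` operator compares only the leading characters, which is used here so that the code does not depend on the exact spacing inside the parenthesized ranges of Smoking_Status.

```sas
data work.heart2;
   set work.heart1;
   /* Declare lengths first so that longer labels are not truncated */
   length Chol_StatusNew Sex_New Weight_StatusNew Smoking_StatusNew $ 15;

   if Chol_Status = 'High' then Chol_StatusNew = '1 High';
   else if Chol_Status = 'Borderline' then Chol_StatusNew = '2 Borderline';
   else if Chol_Status = 'Desirable' then Chol_StatusNew = '3 Desirable';

   if Sex = 'Male' then Sex_New = '1 Male';
   else if Sex = 'Female' then Sex_New = '2 Female';

   if Weight_Status = 'Overweight' then Weight_StatusNew = '1 Overweight';
   else if Weight_Status = 'Normal' then Weight_StatusNew = '2 Normal';
   else if Weight_Status = 'Underweight' then Weight_StatusNew = '3 Underweight';

   /* =: matches on the leading characters only */
   if Smoking_Status =: 'Very Heavy' then Smoking_StatusNew = '1 Very Heavy';
   else if Smoking_Status =: 'Heavy' then Smoking_StatusNew = '2 Heavy';
   else if Smoking_Status =: 'Moderate' then Smoking_StatusNew = '3 Moderate';
   else if Smoking_Status =: 'Light' then Smoking_StatusNew = '4 Light';
   else if Smoking_Status =: 'Non-smoker' then Smoking_StatusNew = '5 Non-smoker';
run;

/* Step g: overall tabulations, then old-by-new cross-checks */
proc freq data=work.heart2;
   tables Chol_StatusNew Sex_New Weight_StatusNew Smoking_StatusNew;
   tables Chol_Status*Chol_StatusNew
          Sex*Sex_New
          Weight_Status*Weight_StatusNew
          Smoking_Status*Smoking_StatusNew / list missing;
run;
```

Because no final ELSE assigns a value, observations with a missing original level keep a blank recoded level. With the numeric prefixes in place, alphabetical order and magnitude order coincide, so each LIST table should show exactly one new level per original level.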
We have now finished creating new variables. In part 1, question 7, you tabulated the amount of missing data for the set of predictor variables of interest in the SASHELP.HEART dataset. From this, you noticed that only a small percentage (<5%) of observations in the SASHELP.HEART dataset have missing data for any of these variables. Ideally, statistical analyses for the SASHELP.HEART dataset should be performed only on observations with no missing data for all these predictors. This ensures that all analyses, regardless of the predictors included, use the same number of observations. Given that the amount of missing data is small, analyses can simply exclude any observation with missing data on at least one of the predictors of interest. Other strategies such as single or multiple imputation could be employed, but those are beyond the scope of this exercise.
4. Create a new permanent dataset that can be used for later statistical analyses.
a. Use WORK.HEART2 as the input dataset.
b. Create a library named HEARTLIB.
c. Create an output dataset named HEARTLIB.MYHEART that contains only those observations that have non-missing values for the variables below:
AgeAtStart
Height
Systolic
BP_Status
MRW
Weight
Chol_StatusNew
Sex_New
Weight_Status
Cholesterol
Smoking
Weight_Status2
Diastolic
Smoking_StatusNew
This dataset should have 5039 observations.
d. Check your work for the dataset HEARTLIB.MYHEART by tabulating values of character variables and generating descriptive statistics for numeric variables. Do you see any missing values in any of the tabulations or statistics generated?
Yes, there are missing values for AgeCHDdiag, AgeAtDeath, and DeathCause, none of which were used to subset the data.
Congratulations, you have completed data preparation for the Framingham Heart Study dataset! A next step in exploring relationships between coronary heart disease and predictors of interest is to perform additional descriptive analyses by creating logit plots. The related Framingham Heart Study: Descriptive Analysis, Industry Applied Activity provides practice in generating logit plots. Following this, logistic regression models can be fit to formalize the statistical relationships between coronary heart disease and predictors of interest. The related Framingham Heart Study: Statistical Analysis, Industry Applied Activity provides practice in fitting these logistic regression models. These activities can be found in the Academic Hub.
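Step 4 above can be sketched as follows. The folder path in the LIBNAME statement is a placeholder that must be replaced with a location you can write to. The variable list names Weight_Status2, which is not created anywhere in this activity, so this sketch assumes Weight_StatusNew was intended.

```sas
/* Placeholder path: replace with a folder you can write to */
libname heartlib '/path/to/your/folder';

data heartlib.myheart;
   set work.heart2;
   /* Subsetting IF: keep only observations with no missing values */
   /* among the listed variables (Weight_StatusNew is assumed in   */
   /* place of Weight_Status2).                                    */
   if cmiss(AgeAtStart, Height, Systolic, BP_Status, MRW, Weight,
            Chol_StatusNew, Sex_New, Weight_Status, Cholesterol,
            Smoking, Weight_StatusNew, Diastolic, Smoking_StatusNew) = 0;
run;

/* Step d: check character levels and numeric summaries */
proc freq data=heartlib.myheart;
   tables _character_;
run;

proc means data=heartlib.myheart n nmiss min max mean;
   var _numeric_;
run;
```

The SAS log reports the number of observations written to HEARTLIB.MYHEART, which can be compared with the 5039 stated in the activity.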
Appendix
Appendix A: Access Software
SAS OnDemand for Academics (ODA) is a free, full suite of cloud-based software that supports the analytics life cycle, from data, to discovery, to deployment. Students can use SAS OnDemand for Academics to get access to SAS Studio for free. Click here to access ODA.
Note: You need to have an established SAS profile linked to an academic affiliation. If you don't have a SAS Profile, click here to set one up. Check out Frequently Asked Questions for more support.
Appendix B: Helpful Documentation
Below are helpful links to documentation regarding the procedures used in the activity.
The CONTENTS procedure
The PRINT procedure
The MEANS procedure
The FREQ procedure
The UNIVARIATE procedure
Base SAS Procedures Guide
DATA Step Statements: Reference
Appendix C: Recommended Learning
The SAS Global Academic Program offers free e-learning courses for students to learn SAS through the Student Skill Builder. The following e-learning courses and paths are recommended to help with this activity:
SAS Programming 1: Essentials
SAS Programming 2: Data Manipulation Techniques
Statistics 1: Introduction to ANOVA, Regression, and Logistic Regression