Assignment5
School
DePaul University
Course
323
Subject
Statistics
Date
May 23, 2024
Type
Pages
3
Uploaded by SargentDiscoveryWolf39
DATA ANALYSIS AND REGRESSION
Assignment-5 | Total Points: 36 pts for DSC 323; 41 pts for DSC 423

Note:
• All assignments should be submitted as a single MS WORD file; no PDFs or any other file types will be accepted. If you submit any other file type, it will not be graded.
• No extensions will be given except for a documented reason specified in the syllabus. No late assignments will be accepted, even a couple of minutes past the due date, as you have an extra day (7 days) to submit your assignments.
• Submitting work that is not yours is grounds for an automatic ‘F’ for the entire course. This includes taking content and ideas from others, or consulting anyone other than your instructor to complete your deliverables.
• The SAS software and virtual server can stall, slow down, and crash, so start early and keep multiple backups in multiple places/mediums. Late submission or inability to do the assignment due to server and/or software issues will not be accepted. For any issues relating to SAS, contact IS using the phone number provided in the syllabus; I won’t be able to help you with DePaul software-related issues.
• Make sure to double-check your submissions. After you submit the assignment, log out of D2L, log back in, and click on your submission to see whether you submitted the right file(s) and the correct version. Wrong submissions will not be graded.
Note: For all questions, regardless of whether the relevant output is asked to be attached, make sure to include it. Also, it is important to include the sign (negative/positive or increase/decrease) and the units of measurement (e.g. $ or $99 million, %, etc.); otherwise points will be deducted.

PROBLEM 1 [25 pts] – to be answered by everyone

This problem asks you to build a model for the college dataset (college.csv) that contains the following variables:

School        School name
Private       Public/private indicator: YES if university is private, NO if university is public
Accept.pct    Percentage of applicants accepted
Elite10       Elite schools with majority of students from the top 10% of their high school class (0 = Not Elite, 1 = Elite)
F.Undergrad   Number of full-time undergraduate students
P.Undergrad   Number of part-time undergraduate students
Outstate      Out-of-state tuition
Room.Board    Room and board costs
Books         Estimated book costs
Personal      Estimated personal spending
PhD           Percent of faculty with PhD
Terminal      Faculty with terminal degrees (a terminal degree is a university degree that is either highest on the academic track or highest on the professional track in a given field of study)
S.F.Ratio     Student/faculty ratio
perc.alumni   Percent of alumni who donate
Expend        Instructional expenditure per student
Grad.Rate     Graduation rate in 4 years

Apply regression analysis techniques to analyze the relationship among the observed variables and build a model to predict Graduation Rates (Grad.Rate).

Note: Depending on how you import your data (INFILE or IMPORT), SAS may relabel the column names. Make sure to use the variable names that appear when you use a proc print.

Note: Before you start, open the college.csv file and examine the data.
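
The relabeling mentioned in the note above can be checked before any modeling. The following is a minimal sketch, assuming the file is read with PROC IMPORT; the file path and the dataset name college are placeholders, not names given in the assignment. Under the default VALIDVARNAME=V7 setting, PROC IMPORT typically replaces characters that are not valid in SAS names (such as the period) with an underscore, so Accept.pct may appear as Accept_pct. Confirm this against your own output rather than relying on it.

```sas
/* Placeholder path: point DATAFILE= at your own copy of college.csv. */
proc import datafile="college.csv"
    out=college      /* assumed dataset name, used only in this sketch */
    dbms=csv
    replace;
    getnames=yes;    /* take variable names from the header row */
run;

/* List the variable names and types exactly as SAS stored them. */
proc contents data=college;
run;

/* Print the data to see the column names to use in later steps.
   This prints every row; restrict the rows if the listing is too long. */
proc print data=college;
run;
```

Whatever names appear in this output are the ones to use in every later procedure.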

Answer the following questions.

a) Analyze the distribution of Grad.Rate and discuss whether the distribution is symmetric, or whether you need to apply any transformation. (This is the data exploration stage, so use the appropriate statistics to explore your data.)
b) Create scatterplots for Grad.Rate vs each of the independent variables. What conclusions can you draw about the relationships between Grad.Rate and the independent variables? (No need to include the scatterplots in your submission.)
c) Build boxplots to evaluate whether graduation rates vary by university type (private vs public) and by status (elite vs not elite). Include the boxplots and discuss your findings. (See the SAS Procedures section on D2L if you need the code to generate a boxplot.)
d) Run the full model. For the full model, discuss the parameter estimates, significance, goodness-of-fit and AdjR2 values. Include the relevant output.
e) Does multicollinearity seem to be a problem here? What is your evidence? Compute and analyze the VIF statistics. Include the relevant output and discuss your answer.
f) Apply TWO selection methods to find an optimal subset of independent variables to predict Grad.Rate. You can choose any two procedures among the ones we learned in class: backward selection, forward selection, adj-R2, Cp, stepwise. Make sure to include the output of the two selection methods and state which methods were used. No need to discuss the models; just include the outputs.
g) Fit a final regression model M1 for Grad.Rate based on the results in f), i.e. the optimal model. Explain your choice. Write down the expression of the estimated model M1.
h) Draw a plot of the studentized residuals against the predicted values. Does the plot show any striking pattern indicating problems in the regression analysis? Include the outputs and explain.
i) Analyze the normal probability plot of residuals. Is there any evidence that the assumption of normality is not satisfied? Include the outputs and explain.
j) Are there any outliers or influential points? Compute appropriate statistics. Include the outputs. Take any action you think is necessary and explain why you did or did not take it.
k) Analyze the AdjR2 value for the final model and discuss how well the model explains the variation in graduation rates among the universities.
l) Draw conclusions on graduation rates based on your regression analysis. What are the most important predictors in your model? Does your model show a significant difference in graduation rates between private and public universities? Do “elite” universities have higher graduation rates? Explain.
m) Use the final regression model to predict the graduation rate for the following values. Using SAS, compute the predicted graduation rate, the 95% confidence interval and the prediction interval for your estimate. Make sure to use SAS coding to determine the values. Include all relevant outputs. Discuss your findings.

Private       Yes          Books         250
Accept.pct    0.87         Personal      1350
Elite10       Not Elite    PhD           40
F.Undergrad   3000         Terminal      34
P.Undergrad   524          S.F.Ratio     30.2
Outstate      6500         perc.alumni   13
Room.Board    3300         Expend        5201

n) Copy and paste your FULL SAS code into the Word document along with your answers.

PROBLEM 2 [11 pts] – to be answered by everyone

Answer the following strictly based on the lecture discussions:

1. [2 pts] What is the main difference between R2 and AdjR2?
2. [2 pts] What was the main difference between the Cp and AdjR2 selection methods compared to the forward, backward or stepwise methods?
3. [3 pts] One of the selection methods was both a selection method and a criterion for determining if a model is good. What was it? Explain why you think so.
4. Based on the lab activity that was done today:
a. [2 pts] What code did we use to display 5 observations?
b. [2 pts] What was the reason for assigning zero to dj3 and dj4 when we did predictions?

PROBLEM 3 [5 pts] – For Graduate Students ONLY

1. What selection methods did you use in Problem-1 (f)?
2. Explain why you chose these two selection methods in Problem-1 (f) as opposed to the other methods. The reason(s) should have a methodological basis. If you say you wanted to try it, or that it is the easiest to use, or something to that effect, you will receive zero. Provide reason(s) for both.