Assignment5
School: DePaul University
Course: 323
Subject: Statistics
Date: May 23, 2024
DATA ANALYSIS AND REGRESSION
Assignment-5 | Total Points: 36 pts for DSC 323; 41 pts for DSC 423

Note:
• All assignments should be submitted as a single MS WORD file; no PDFs or any other file types will be accepted. If you submit any other file type, it will not be graded.
• No extensions will be given except for a documented reason specified in the syllabus. No late assignments will be accepted past the due date, even a couple of minutes late, as you have an extra day (7 days) to submit your assignments.
• Submitting work that is not yours is grounds for an automatic ‘F’ for the entire course. This includes taking content and ideas from others, or consulting anyone other than your instructor to complete your deliverables.
• The SAS software and virtual server can stall, slow down, and crash, so start early and keep multiple backups in multiple places/mediums. Late submission or inability to do the assignment due to server and/or software issues will not be accepted. For any issues relating to SAS, contact IS using the phone number provided in the syllabus; I won’t be able to help you with DePaul software-related issues.
• Make sure to double-check your submissions. After you submit the assignment, log out of D2L, log back in, and click on your submission to confirm that you submitted the right file(s) and that it is the correct version. Wrong submissions will not be graded.
Note: For all questions, regardless of whether the relevant output is requested, make sure to include it. Also, it is important to include the sign (negative/positive or increase/decrease) and the units of measurement (e.g. $ or $99 million, %, etc.); otherwise points will be deducted.

PROBLEM 1 [25 pts] – to be answered by everyone

This problem asks you to build a model for the college dataset (college.csv), which contains the following variables:

Variable      Description
School        School name
Private       Public/private indicator: YES if the university is private, NO if it is public
Accept.pct    Percentage of applicants accepted
Elite10       Elite schools with the majority of students from the top 10% of their high school class (0 = Not Elite, 1 = Elite)
F.Undergrad   Number of full-time undergraduate students
P.Undergrad   Number of part-time undergraduate students
Outstate      Out-of-state tuition
Room.Board    Room and board costs
Books         Estimated book costs
Personal      Estimated personal spending
PhD           Percent of faculty with a PhD
Terminal      Faculty with terminal degrees (a terminal degree is a university degree that is either the highest on the academic track or the highest on the professional track in a given field of study)
S.F.Ratio     Student/faculty ratio
perc.alumni   Percent of alumni who donate
Expend        Instructional expenditure per student
Grad.Rate     Graduation rate in 4 years

Apply regression analysis techniques to analyze the relationships among the observed variables and build a model to predict graduation rates (Grad.Rate).

Note: Depending on how you import your data (INFILE or IMPORT), SAS may relabel the column names. Make sure to use the variable names that appear when you run a proc print.

Note: Before you start, open the college.csv file and examine the data.
Answer the following questions.

a) Analyze the distribution of Grad.Rate and discuss whether the distribution is symmetric or whether you need to apply a transformation. (This is the data exploration stage, so use the appropriate statistics to explore your data.)
b) Create scatterplots for Grad.Rate vs. each of the independent variables. What conclusions can you draw about the relationships between Grad.Rate and the independent variables? (No need to include the scatterplots in your submission.)
c) Build boxplots to evaluate whether graduation rates vary by university type (private vs. public) and by status (elite vs. not elite). Include the boxplots and discuss your findings. (See the SAS Procedures section on D2L if you need the code to generate a boxplot.)
d) Run the full model. For the full model, discuss the parameter estimates, significance, goodness-of-fit and AdjR² values. Include the relevant output.
e) Does multicollinearity seem to be a problem here? What is your evidence? Compute and analyze the VIF statistics. Include the relevant output and discuss your answer.
f) Apply TWO selection methods to find an optimal subset of independent variables to predict Grad.Rate. You can choose any two procedures among the ones we learned in class: backward selection, forward selection, AdjR², Cp, stepwise. Make sure to include the output of the two selection methods and state which methods were used. No need to discuss the models; just include the outputs.
g) Fit a final regression model M1 for Grad.Rate based on the results in f), i.e. the optimal model. Explain your choice. Write down the expression of the estimated model M1.
h) Draw a plot of the studentized residuals against the predicted values. Does the plot show any striking pattern indicating problems in the regression analysis? Include the outputs and explain.
i) Analyze the normal probability plot of the residuals. Is there any evidence that the assumption of normality is not satisfied? Include the outputs and explain.
j) Are there any outliers or influential points? Compute appropriate statistics. Include the outputs. Take any action you think is necessary and explain why you did or did not take it.
k) Analyze the AdjR² value for the final model and discuss how well the model explains the variation in graduation rates among the universities.
l) Draw conclusions on graduation rates based on your regression analysis. What are the most important predictors in your model? Does your model show a significant difference in graduation rates between private and public universities? Do “elite” universities have higher graduation rates? Explain.
m) Use the final regression model to predict the graduation rate for the following values. Using SAS, compute the predicted graduation rate, the 95% confidence interval and the prediction interval for your estimate. Make sure to use SAS code to determine the values. Include all relevant outputs. Discuss your findings.

   Variable      Value        Variable      Value
   Private       Yes          Books         250
   Accept.pct    0.87         Personal      1350
   Elite10       Not Elite    PhD           40
   F.Undergrad   3000         Terminal      34
   P.Undergrad   524          S.F.Ratio     30.2
   Outstate      6500         perc.alumni   13
   Room.Board    3300         Expend        5201

n) Copy and paste your FULL SAS code into the Word document along with your answers.

PROBLEM 2 [11 pts] – to be answered by everyone

Answer the following strictly based on the lecture discussions:

1. [2 pts] What is the main difference between R² and AdjR²?
2. [2 pts] What was the main difference between the Cp and AdjR² selection methods compared with the forward, backward or stepwise methods?
3. [3 pts] One of the selection methods was both a selection method and a criterion for determining whether a model is good. What was it? Explain why you think so.
4. Based on the lab activity that was done today:
   a. [2 pts] What code did we use to display 5 observations?
   b. [2 pts] What was the reason for assigning zero to dj3 and dj4 when we did predictions?

PROBLEM 3 [5 pts] – For Graduate Students ONLY

1. What selection methods did you use in Problem-1 (f)?
2. Explain why you chose these two selection methods in Problem-1 (f) as opposed to the other methods. The reason(s) should have a methodological basis. If you say that you wanted to try it, that it is the easiest to use, or something to that effect, you will receive zero. Provide reason(s) for both methods.
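
For reference, the two summary statistics this assignment leans on most, the AdjR² value (Problem 1 d and k) and the VIF (Problem 1 e), can be sketched with the standard textbook definitions. This is a minimal illustration in Python rather than SAS, and it is not taken from the course materials: the function names `adjusted_r2` and `vif` are hypothetical, the input numbers are made up and not derived from college.csv, and the lecture may present the formulas differently.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors (intercept not counted)
    fitted on n observations: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters (n > p + 1)")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def vif(r2_j: float) -> float:
    """Variance inflation factor for predictor j, where r2_j is the R^2
    obtained by regressing predictor j on all the other predictors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the college data).
    print(adjusted_r2(0.50, n=101, p=10))  # penalized below the raw R^2 of 0.50
    print(vif(0.90))                       # predictor largely explained by the others
```

The adjustment shrinks R² as predictors are added without a matching gain in fit, and a VIF grows without bound as a predictor becomes closer to a linear combination of the others. A VIF above roughly 10 is a commonly cited rule of thumb for problematic collinearity; the course may use a different cutoff.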