HW1
School: University of Texas, Dallas
Course: 6337
Subject: Statistics
Date: Feb 20, 2024
Predictive Analytics Using SAS
BUAN 6337.0W1
Group No: 08
Vallabhapurapu Naveen Sreeram - VXN21001
Sai Harshith Mothiki - SXM90166
Meera Katkam - MXK210014
Tanmayee Kodumagulla - TXK210004
Pranusha Yallala - PXY210001
Question 1:
a) Examine the raw data file Pizza.csv and read it into SAS using the IMPORT procedure. Print the data set (on the results screen). Print a report that describes the contents of the data set to make sure all the variables are the correct type.
Ans: We examined Pizza.csv and confirmed that it is a comma-separated file. The file was imported into SAS using the IMPORT procedure, and the data set was printed with PROC PRINT.
[Screenshot: Pizza.csv data in SAS using the IMPORT procedure]
Using PROC CONTENTS, we obtained the contents of the data set.
[Screenshot: PROC CONTENTS output for the imported data set]
b) Open the raw data file in a simple editor like WordPad and compare the data values to the output from part a) to make sure that they were read correctly into SAS. In a comment in your report, identify any problems with the SAS data set that cannot be resolved using the IMPORT procedure. Explain what is causing the problem.
Ans: Comparing the CSV file with the SAS data set, we found two problems. First, the columns Shrimp and Eggplant were read as character variables even though they hold numeric ratings. Second, the leading zeros in the first column, survey number, are missing. Because the first two digits of the survey number represent the month in which the survey was taken, we changed this column to a character variable of length 4.
These errors occur because, by default, the IMPORT procedure scans only the first 20 observations to guess each variable's type. Shrimp and Eggplant have missing values within the first 20 observations of this data set, so they were read as character variables. The errors can be corrected by modifying the DATA step code that the IMPORT procedure writes to the log.
c) Read the same raw data file, Pizza.csv, this time using a DATA step (instead of the IMPORT procedure). Be sure to resolve any issues identified above.
Ans: We read the file with a DATA step that declares Shrimp and Eggplant as numeric variables and survey number as a character variable, which resolves the issues found above.
[Screenshot: Pizza.csv data in SAS using a DATA step]
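The idea in part c), declaring each column's type instead of letting it be guessed from the first rows, can be illustrated outside SAS. The following is a minimal Python sketch, not the DATA step used in the report. The column names and the three sample rows are hypothetical, since the real Pizza.csv is not shown here.

```python
import csv
import io

# Hypothetical sample that mimics the layout described above.
# The real Pizza.csv and its column names are not shown in the report.
SAMPLE = """SurveyNum,Cheese,Shrimp,Eggplant
0101,4,,
0102,5,2,1
1203,3,,4
"""


def to_number(text):
    """Convert a CSV field to float, treating an empty field as missing (None)."""
    text = text.strip()
    return float(text) if text else None


def read_pizza(handle):
    """Read rows with explicit types rather than types guessed from early rows."""
    rows = []
    for record in csv.DictReader(handle):
        # Keep the survey number as text so its leading zeros survive.
        row = {"SurveyNum": record["SurveyNum"]}
        for name, value in record.items():
            if name != "SurveyNum":
                # Every rating column is numeric, even when early rows are blank.
                row[name] = to_number(value)
        rows.append(row)
    return rows


rows = read_pizza(io.StringIO(SAMPLE))
print(rows[0])
```

In this sketch a blank rating becomes a missing value in a numeric column, and the survey number stays a four-character string.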
Using PROC CONTENTS, we obtained the contents of the data set.
[Screenshot: PROC CONTENTS output for the DATA step data set]
d) Create a new dataset with the average ratings for each topping.
Ans: Using PROC MEANS, we calculated the average rating for each topping.
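To make the averaging step concrete, here is a small Python sketch of a per-topping mean that skips missing ratings, which is also how PROC MEANS treats missing values. It is an illustration only, not the report's SAS code, and the ratings below are hypothetical.

```python
from statistics import mean

# Hypothetical ratings; None marks a missing response.
ratings = [
    {"Cheese": 4, "Shrimp": None, "Eggplant": None},
    {"Cheese": 5, "Shrimp": 2, "Eggplant": 1},
    {"Cheese": 3, "Shrimp": None, "Eggplant": 4},
]


def topping_averages(rows):
    """Average each topping over its non-missing ratings."""
    averages = {}
    for topping in rows[0]:
        values = [row[topping] for row in rows if row[topping] is not None]
        # A topping with no ratings at all has no defined average.
        averages[topping] = mean(values) if values else None
    return averages


averages = topping_averages(ratings)
print(averages)
```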
Question 2:
a) Examine the raw data file Hotel.dat and read it into SAS. Next, create date variables for the check-in and check-out dates, and format them to display as readable dates.
Ans: We examined Hotel.dat in Notepad and found that it is a column-formatted file. The data was read into SAS using a DATA step that assigns a variable name to each range of columns.
Using the CATX function, we concatenated the check-in variables (month, day, year) into in_date and the check-out variables (month, day, year) into out_date.
Because in_date and out_date are built with the CATX function, SAS treats them as character variables. We therefore converted them to Checkin_Date and Checkout_Date in a DATA step using 'mmddyy10.' so that SAS treats them as date values, and printed them with PROC PRINT using the format 'mmddyy10.' so that the dates display correctly.
[Screenshot: Hotel Data]
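The date handling in part a) amounts to two steps: build a true date value from the separate month, day, and year fields, then display it as mm/dd/yyyy, which is the layout the 'mmddyy10.' format produces. A minimal Python sketch of those two steps follows. It is not the report's SAS code, and the sample dates are hypothetical.

```python
from datetime import date


def make_date(month, day, year):
    """Combine separate month, day, and year fields into one date value."""
    return date(year, month, day)


def as_mmddyy10(value):
    """Render a date as mm/dd/yyyy, the layout of the SAS mmddyy10. format."""
    return value.strftime("%m/%d/%Y")


# Hypothetical stay; the real Hotel.dat values are not shown in the report.
checkin = make_date(2, 5, 2014)
checkout = make_date(2, 9, 2014)

# Subtracting two dates gives the length of stay in days.
length_of_stay = (checkout - checkin).days

print(as_mmddyy10(checkin), as_mmddyy10(checkout), length_of_stay)
```

Storing real date values rather than text is what makes the subtraction possible, and the same subtraction gives the length of stay needed for the subtotal.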
b) Create a variable that calculates the subtotal as the room rate times the number of days in the stay, plus a per person rate ($10 per day for each person beyond one guest; for example, for 3 guests the total per person rate will be (3-1)*10 = $20), plus an Internet service fee ($9.95 for a one-time activation and $4.95 per day of use).
Ans: We created a new variable, Subtotal, calculated from the Room_Rate, No_of_Guests, and Internet variables and the length of stay (Checkout_Date - Checkin_Date), and formatted it with the format '8.2' to display the result rounded to 2 decimal places.
We used IF/THEN statements to calculate the subtotal for four combinations:
Case-1: Number of guests = 1 and has Internet: Subtotal = (roomrate*lengthofstay) + (Internet_charges)
Case-2: Number of guests = 1 and no Internet: Subtotal = (roomrate*lengthofstay)
Case-3: Number of guests > 1 and has Internet: Subtotal = (roomrate*lengthofstay) + (Internet_charges) + (10*(Guests-1)*(lengthofstay))
Case-4: Number of guests > 1 and no Internet: Subtotal = (roomrate*lengthofstay) + (10*(Guests-1)*(lengthofstay))
[Screenshot: Hotel data with Subtotal]
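The four cases above can be checked with a short arithmetic sketch. This Python version is an illustration, not the report's IF/THEN code. It assumes the Internet usage is available as a number of days (the hypothetical parameter internet_days, with 0 meaning no Internet); the report does not show how the Internet variable is actually coded.

```python
def subtotal(room_rate, nights, guests, internet_days):
    """Room charge plus per-person and Internet fees, rounded to 2 decimals.

    internet_days is an assumed encoding: 0 means the guest did not use
    the Internet service.
    """
    total = room_rate * nights
    if guests > 1:
        # $10 per day for each guest beyond the first.
        total += 10 * (guests - 1) * nights
    if internet_days > 0:
        # $9.95 one-time activation plus $4.95 per day of use.
        total += 9.95 + 4.95 * internet_days
    return round(total, 2)


# Hypothetical stays, one per case in the list above.
print(subtotal(100, 3, 1, 2))  # Case-1: one guest, Internet
print(subtotal(100, 3, 1, 0))  # Case-2: one guest, no Internet
print(subtotal(100, 3, 3, 2))  # Case-3: several guests, Internet
print(subtotal(100, 3, 3, 0))  # Case-4: several guests, no Internet
```

Because the guest surcharge and the Internet fee do not depend on each other, two independent conditions cover the same four combinations.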
c) Create a variable that calculates the grand total as the subtotal plus sales tax at 7.75%. The result should be rounded to two decimal places.
Ans: We created a new variable, Grandtotal, to calculate the total including tax, and formatted it with the format '8.2' to display the result rounded to 2 decimal places.
[Screenshot: Hotel data with GrandTotal]
d) View the resulting data set. In a comment in your report, state the value for the grand total for room 211.
Ans: Based on the results above, the grand total for room 211 is $1357.65.
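The grand total calculation in part c) is the subtotal multiplied by 1.0775 and rounded to two decimal places. The Python sketch below shows that arithmetic with decimal rounding. It is an illustration rather than the report's SAS code. The subtotal of 1260.00 used in the example is back-calculated from the reported grand total for room 211 and is not a value taken from the report.

```python
from decimal import Decimal, ROUND_HALF_UP

TAX_RATE = Decimal("0.0775")  # 7.75% sales tax


def grand_total(subtotal):
    """Subtotal plus sales tax, rounded half-up to two decimal places."""
    amount = Decimal(str(subtotal)) * (1 + TAX_RATE)
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# A subtotal of 1260.00 is consistent with the reported $1357.65 for room 211.
# It is inferred here, not read from the data.
room_211 = grand_total(1260)
print(room_211)
```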