HW1
School: University of Texas, Dallas
Course: 6337
Subject: Statistics
Date: Feb 20, 2024
Pages: 10
Uploaded by BailiffNeutron16521
Predictive Analytics Using SAS, BUAN 6337.0W1
Group No. 08
Vallabhapurapu Naveen Sreeram (VXN21001)
Sai Harshith Mothiki (SXM90166)
Meera Katkam (MXK210014)
Tanmayee Kodumagulla (TXK210004)
Pranusha Yallala (PXY210001)
Question 1:
a) Examine the raw data file Pizza.csv and read it into SAS using the IMPORT procedure. Print the data set (on the results screen). Print a report that describes the contents of the data set to make sure all the variables are the correct type.
Ans: We examined Pizza.csv and found that it is a comma-separated file. We imported it into SAS with PROC IMPORT and printed it with PROC PRINT; the output is shown below.
Pizza.csv data in SAS using PROC IMPORT:
Using PROC CONTENTS, we obtained the following description of the data set.
b. Open the raw data file in a simple editor like WordPad and compare the data values to the output from part a) to make sure that they were read correctly into SAS. In a comment in your report, identify any problems with the SAS data set that cannot be resolved using the IMPORT procedure. Explain what is causing the problem.
Ans: Comparing the CSV file with the SAS data set, we found two problems. First, the Shrimp and Eggplant columns were read as character variables even though they hold numeric ratings. Second, the leading zeros of the first column, the survey number, were dropped. Because the first two digits of the survey number give the month in which the survey was taken, these zeros matter, so we read the survey number as a character variable of length 4.
The type problem occurs because PROC IMPORT, by default, scans only the first 20 observations to guess each variable's type. In this data set, Shrimp and Eggplant have missing values in those first 20 rows, so SAS read them as character variables. This can be fixed by editing the DATA step code that PROC IMPORT writes to the log and rerunning it with the correct variable types.
c. Read the same raw data file, Pizza.csv, this time using a DATA step (instead of the IMPORT procedure). Be sure to resolve any issues identified above.
Ans: We read the file with a DATA step that declares Shrimp and Eggplant as numeric variables and the survey number as a character variable, which resolves the issues found above. The data as read into SAS is shown below.
Pizza.csv data in SAS using a DATA step:
Using PROC CONTENTS, we obtained the following description of the data set.
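To make the type-guessing problem from part b concrete, here is a simplified Python model (not SAS's actual implementation) of a reader that guesses each column's type from its first 20 values. It assumes missing ratings appear as empty fields, and the survey number 0412 is hypothetical. It shows why a scan that sees only missing values yields a character column, and why reading the survey number as a number drops its leading zero.

```python
GUESSING_ROWS = 20  # number of leading values scanned, as described in part b


def guess_type(values: list[str], guessing_rows: int = GUESSING_ROWS) -> str:
    """Simplified type guess: 'numeric' only if the scanned values include at
    least one non-empty value and every non-empty value parses as a number."""
    scanned = [v.strip() for v in values[:guessing_rows]]
    present = [v for v in scanned if v]
    if not present:
        # Only missing values in the scan: no numeric evidence, so the
        # column falls back to character.
        return "character"
    for v in present:
        try:
            float(v)
        except ValueError:
            return "character"
    return "numeric"


def read_survey_number(raw: str, as_character: bool) -> str:
    """Return the survey number as it would be stored and displayed."""
    # Read as a number, "0412" becomes 412 and the leading zero of the month is lost.
    return raw if as_character else str(int(raw))


# Hypothetical column: blank for the first 20 surveys, ratings afterwards.
shrimp = [""] * 20 + ["4", "5"]
print(guess_type(shrimp))                              # character
print(guess_type(shrimp, guessing_rows=len(shrimp)))   # numeric
print(read_survey_number("0412", as_character=False))  # 412
print(read_survey_number("0412", as_character=True))   # 0412
```

Declaring the types explicitly, as the DATA step in part c does, removes the dependence on whatever happens to appear in the first rows.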
d. Create a new dataset with the average ratings for each topping.
Ans: Using PROC MEANS, we calculated the average rating for each topping.
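The averaging step can be sketched in Python as follows (an illustration, not the SAS code used). It assumes each survey is a dict of topping ratings with None marking a missing rating and, like PROC MEANS, excludes missing values from each average. The topping names and ratings are hypothetical.

```python
from statistics import mean
from typing import Optional

Rating = Optional[float]  # None marks a missing rating


def topping_means(surveys: list[dict[str, Rating]], toppings: list[str]) -> dict[str, Rating]:
    """Average rating per topping, skipping missing ratings.

    A topping with no non-missing ratings gets None, i.e. a missing mean.
    """
    averages: dict[str, Rating] = {}
    for topping in toppings:
        ratings = [s[topping] for s in surveys if s.get(topping) is not None]
        averages[topping] = mean(ratings) if ratings else None
    return averages


surveys = [
    {"Shrimp": 4.0, "Eggplant": None},
    {"Shrimp": 2.0, "Eggplant": 3.0},
    {"Shrimp": None, "Eggplant": 5.0},
]
print(topping_means(surveys, ["Shrimp", "Eggplant"]))  # {'Shrimp': 3.0, 'Eggplant': 4.0}
```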
Question 2:
a) Examine the raw data file Hotel.dat and read it into SAS. Next, create date variables for the check-in and check-out dates, and format them to display as readable dates.
Ans: We examined Hotel.dat in Notepad; it is a column-formatted (fixed-width) file. We read the data into SAS with a DATA step, assigning each variable name to its column range, as shown below. Using the CATX function, we then concatenated the check-in variables (month, day, year) and the check-out variables (month, day, year), as shown below:
Because in_date and out_date are created with the CATX function, SAS stores them as character variables. We therefore converted them in a DATA step, using the mmddyy10. informat, into the SAS date variables Checkin_Date and Checkout_Date, and used PROC PRINT with the mmddyy10. format to display them as readable dates:
Hotel Data:
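The steps in part a can be sketched in Python: slice fixed columns out of each line, then build real date values from the month, day, and year fields and print them as mm/dd/yyyy (the same layout as mmddyy10.). The column positions and the sample record are hypothetical, not the actual Hotel.dat layout. As a design note, SAS's MDY function builds a date value directly from numeric month, day, and year, which avoids the CATX-then-convert round trip.

```python
from datetime import date

# Hypothetical fixed-column layout as (start, end) slices; the real Hotel.dat
# positions are not reproduced here.
LAYOUT = {
    "room": (0, 3),
    "in_month": (4, 6), "in_day": (7, 9), "in_year": (10, 14),
    "out_month": (15, 17), "out_day": (18, 20), "out_year": (21, 25),
}


def parse_line(line: str) -> dict[str, str]:
    """Column input: each field is read from a fixed character range."""
    return {name: line[start:end].strip() for name, (start, end) in LAYOUT.items()}


def make_date(month: str, day: str, year: str) -> date:
    """Build a true date value from separate month, day, and year fields."""
    return date(int(year), int(month), int(day))


def read_stay(line: str) -> tuple[str, date, date]:
    rec = parse_line(line)
    checkin = make_date(rec["in_month"], rec["in_day"], rec["in_year"])
    checkout = make_date(rec["out_month"], rec["out_day"], rec["out_year"])
    return rec["room"], checkin, checkout


room, checkin, checkout = read_stay("101 02 14 2024 02 17 2024")
print(room, checkin.strftime("%m/%d/%Y"), checkout.strftime("%m/%d/%Y"))
# 101 02/14/2024 02/17/2024
print((checkout - checkin).days)  # 3, the length of stay used in part b
```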
b) Create a variable that calculates the subtotal as the room rate times the number of days in the stay, plus a per person rate ($10 per day for each person beyond one guest, for example for 3 guests, the total per person rate will be (3-1)*10 = $20), plus an Internet service fee ($9.95 for a one-time activation and $4.95 per day of use).
Ans: We created a new variable, Subtotal, from Room_Rate, No_of_Guests, Internet, and the length of stay (Checkout_Date - Checkin_Date; SAS date values count days, so this difference is the number of days in the stay). We displayed it with two decimal places using the 8.2 format.
We used IF/THEN statements to cover four combinations, where Internet_charges = 9.95 + (4.95*lengthofstay):
Case 1: Number of guests = 1 and has Internet: Subtotal = (roomrate*lengthofstay) + (Internet_charges)
Case 2: Number of guests = 1 and no Internet: Subtotal = (roomrate*lengthofstay)
Case 3: Number of guests > 1 and has Internet: Subtotal = (roomrate*lengthofstay) + (Internet_charges) + (10*(Guests-1)*lengthofstay)
Case 4: Number of guests > 1 and no Internet: Subtotal = (roomrate*lengthofstay) + (10*(Guests-1)*lengthofstay)
Hotel data with Subtotal:
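The four IF/THEN cases collapse into a single expression, because the extra-guest charge is zero when there is one guest and the Internet charge is zero when the service is not used. Below is a Python sketch of that formula, using Decimal to keep cent amounts exact; the function and argument names are illustrative, not the report's SAS variable names.

```python
from decimal import Decimal

EXTRA_GUEST_PER_DAY = Decimal("10.00")  # per guest beyond the first, per day
INTERNET_ACTIVATION = Decimal("9.95")   # one-time fee
INTERNET_PER_DAY = Decimal("4.95")


def subtotal(room_rate: Decimal, days: int, guests: int, has_internet: bool) -> Decimal:
    room = room_rate * days
    # Cases 3 and 4: charge applies only to guests beyond the first.
    extra_guests = EXTRA_GUEST_PER_DAY * max(guests - 1, 0) * days
    # Cases 1 and 3: activation fee plus a daily fee when Internet is used.
    internet = INTERNET_ACTIVATION + INTERNET_PER_DAY * days if has_internet else Decimal("0")
    return room + extra_guests + internet


print(subtotal(Decimal("100"), 3, 3, True))   # 384.80  (300 + 60 + 9.95 + 14.85)
print(subtotal(Decimal("100"), 3, 1, False))  # 300.00
```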
c) Create a variable that calculates the grand total as the subtotal plus sales tax at 7.75%. The result should be rounded to two decimal places.
Ans: We created a new variable, Grandtotal, equal to Subtotal plus 7.75% sales tax, and displayed it rounded to two decimal places using the 8.2 format. (The 8.2 format rounds only the displayed value; the stored value keeps full precision unless the ROUND function is applied.)
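A minimal Python sketch of the grand-total step, assuming the subtotal is already in dollars: it applies the 7.75% tax and then rounds the stored value itself to cents, with halves rounded up. For positive amounts this is intended to resemble SAS's ROUND(x, 0.01); a display format alone would leave the stored value unrounded.

```python
from decimal import Decimal, ROUND_HALF_UP

SALES_TAX_RATE = Decimal("0.0775")
CENT = Decimal("0.01")


def grand_total(subtotal: Decimal) -> Decimal:
    """Subtotal plus 7.75% sales tax, rounded half up to two decimal places."""
    return (subtotal * (1 + SALES_TAX_RATE)).quantize(CENT, rounding=ROUND_HALF_UP)


print(grand_total(Decimal("384.80")))  # 414.62  (384.80 * 1.0775 = 414.622)
print(grand_total(Decimal("30")))      # 32.33   (30 * 1.0775 = 32.325, half rounds up)
```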
Hotel data with GrandTotal:
d) View the resulting data set. In a comment in your report, state the value for the grand total for room 211.
Ans: Based on the results above, the grand total for room 211 is $1357.65.