HW1

School: University of Texas, Dallas
Course: 6337
Subject: Statistics
Date: Feb 20, 2024
Predictive Analytics Using SAS
BUAN 6337.0W1
Group No: 08

Vallabhapurapu Naveen Sreeram - VXN21001
Sai Harshith Mothiki - SXM90166
Meera Katkam - MXK210014
Tanmayee Kodumagulla - TXK210004
Pranusha Yallala - PXY210001
Question 1:

a) Examine the raw data file Pizza.csv and read it into SAS using the IMPORT procedure. Print the data set (on the results screen). Print a report that describes the contents of the data set to make sure all the variables are the correct type.

Ans: We examined the Pizza.csv file and found that it is a comma-separated file. We imported it into SAS using the IMPORT procedure and printed the resulting data set with PROC PRINT.

[Screenshot: Pizza.csv data in SAS using the IMPORT procedure]
Using PROC CONTENTS, we obtained a description of the contents of the data set.

[Screenshot: PROC CONTENTS output for the imported data set]
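
The import, print, and contents steps described in part a) can be sketched as follows. This is a minimal illustration, not the group's actual program: the fileref `pizza`, the sample rows, and the column names `SurveyNum` and `Cheese` are assumptions made so that the example runs without the real Pizza.csv (only Shrimp, Eggplant, and a survey number column are named in the report).

```sas
/* Hypothetical stand-in for Pizza.csv, written to a temporary file. */
/* Column names SurveyNum and Cheese and all values are assumptions. */
filename pizza temp;

data _null_;
    file pizza;
    put 'SurveyNum,Cheese,Shrimp,Eggplant';
    put '0101,5,,';
    put '0102,4,3,';
    put '1203,3,2,1';
run;

/* Read the comma-separated file; DBMS=CSV is stated explicitly     */
/* because the temporary file has no .csv extension.                */
proc import datafile=pizza out=work.pizza_import dbms=csv replace;
    getnames=yes;
run;

/* Print the data set on the results screen. */
proc print data=work.pizza_import;
run;

/* Report variable names, types, and lengths. */
proc contents data=work.pizza_import;
run;
```

With the real file, `datafile=` would point at the location of Pizza.csv instead of the temporary fileref.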
b) Open the raw data file in a simple editor like WordPad and compare the data values to the output from part a) to make sure that they were read correctly into SAS. In a comment in your report, identify any problems with the SAS data set that cannot be resolved using the IMPORT procedure. Explain what is causing the problem.

Ans: Comparing the CSV file with the SAS data set, we found that the Shrimp and Eggplant columns were read as character variables even though they hold numeric values. In the first column, the survey number, the leading zeros were dropped. Because the first two digits represent the month in which the survey was taken, we changed this variable to a character variable of length 4. The type errors occur because, by default, PROC IMPORT scans only the first 20 observations to guess each variable's type. In this data set, Shrimp and Eggplant have missing values within those first 20 observations, so they were read as character variables. This can be corrected by modifying the DATA step code that the IMPORT procedure generates automatically and writes to the log.

c) Read the same raw data file, Pizza.csv, this time using a DATA step (instead of the IMPORT procedure). Be sure to resolve any issues identified above.

Ans: We read the data set with a DATA step, specifying Shrimp and Eggplant as numeric variables and the survey number as a character variable. This resolves the issues found above.

[Screenshot: Pizza.csv data in SAS using the DATA step]
Using PROC CONTENTS, we obtained a description of the contents of the data set.

[Screenshot: PROC CONTENTS output for the DATA step data set]
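
A DATA step along the lines of part c) might look like the sketch below. The sample rows, the fileref `pizza`, and the names `SurveyNum` and `Cheese` are hypothetical; the point is only that declaring the types explicitly keeps the leading zeros and makes Shrimp and Eggplant numeric regardless of missing values.

```sas
/* Hypothetical stand-in for Pizza.csv (names and values assumed). */
filename pizza temp;

data _null_;
    file pizza;
    put 'SurveyNum,Cheese,Shrimp,Eggplant';
    put '0101,5,,';
    put '0102,4,3,';
    put '1203,3,2,1';
run;

data work.pizza;
    /* DSD treats consecutive commas as missing values;             */
    /* FIRSTOBS=2 skips the header row; TRUNCOVER handles short rows. */
    infile pizza dsd firstobs=2 truncover;
    /* Character survey number of length 4 keeps the leading zeros. */
    length SurveyNum $ 4;
    /* Variables listed without $ are read as numeric. */
    input SurveyNum $ Cheese Shrimp Eggplant;
run;

/* Confirm the variable types. */
proc contents data=work.pizza;
run;
```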
d) Create a new dataset with the average ratings for each topping.

Ans: Using the MEANS procedure (PROC MEANS), we calculated the average rating for each topping.
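
One way to store the averages in a new data set is the OUTPUT statement of PROC MEANS, sketched below. The input rows and the names `SurveyNum`, `Cheese`, and `topping_means` are assumptions for illustration; the report does not show which options the group used.

```sas
/* Hypothetical ratings; missing values are entered as periods. */
data work.pizza;
    input SurveyNum $ Cheese Shrimp Eggplant;
    datalines;
0101 5 . .
0102 4 3 .
1203 3 2 1
;
run;

/* NOPRINT suppresses the printed table; OUTPUT writes the means  */
/* to a data set. MEAN= with no names keeps the original names.   */
proc means data=work.pizza noprint;
    var Cheese Shrimp Eggplant;
    output out=work.topping_means(drop=_type_ _freq_) mean=;
run;

proc print data=work.topping_means;
run;
```

PROC MEANS leaves missing ratings out of each average, so each topping is averaged over the surveys that actually rated it.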
Question 2:

a) Examine the raw data file Hotel.dat and read it into SAS. Next, create date variables for the check-in and check-out dates, and format them to display as readable dates.

We examined the raw data file Hotel.dat in Notepad and found that it is a column-formatted file. We read the data into SAS with a DATA step, specifying the columns for each variable name.

Using the CATX function, we concatenated the check-in variables (month, day, year) and the check-out variables (month, day, year) into the variables in_date and out_date.
Because in_date and out_date are built with the CATX function, SAS treats them as character variables. We therefore converted them in a DATA step (to Checkin_Date and Checkout_Date) using 'mmddyy10.' so that SAS treats them as date variables, and used PROC PRINT with the format 'mmddyy10.' so that the dates are printed in a readable form.

[Screenshot: Hotel data]
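
The CATX-then-convert approach can be sketched as follows. The input variable names (`In_Month`, `In_Day`, and so on), the use of the INPUT function for the conversion, and the sample row are assumptions; the real Hotel.dat is a column-formatted file whose layout is not shown in the report.

```sas
data work.hotel_dates;
    /* Hypothetical list-input layout standing in for Hotel.dat. */
    input Room In_Month In_Day In_Year Out_Month Out_Day Out_Year;

    /* CATX joins the parts with a slash, giving text like 2/7/2014. */
    in_date  = catx('/', In_Month, In_Day, In_Year);
    out_date = catx('/', Out_Month, Out_Day, Out_Year);

    /* INPUT with the MMDDYY10. informat turns the text into a     */
    /* numeric SAS date value.                                      */
    Checkin_Date  = input(in_date,  mmddyy10.);
    Checkout_Date = input(out_date, mmddyy10.);

    /* The MMDDYY10. format displays the values as readable dates. */
    format Checkin_Date Checkout_Date mmddyy10.;
    drop in_date out_date;
    datalines;
101 2 7 2014 2 12 2014
;
run;

proc print data=work.hotel_dates;
run;
```

As an alternative to concatenating and converting, the `MDY(month, day, year)` function builds the same date value directly from the three numeric parts.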
b) Create a variable that calculates the subtotal as the room rate times the number of days in the stay, plus a per person rate ($10 per day for each person beyond one guest, for example for 3 guests, the total per person rate will be (3-1)*10 = $20), plus an Internet service fee ($9.95 for a one-time activation and $4.95 per day of use).

We created a new variable, Subtotal, calculated for each stay from the Room_Rate, No_of_Guests, and Internet variables and the length of stay (Checkout_Date - Checkin_Date), and formatted it with the format '8.2' to display the result rounded to 2 decimal places.

We used IF/THEN statements to calculate the subtotal for four combinations:

Case 1: Number of guests = 1 and has Internet: Subtotal = (roomrate * lengthofstay) + (Internet_charges)
Case 2: Number of guests = 1 and no Internet: Subtotal = (roomrate * lengthofstay)
Case 3: Number of guests > 1 and has Internet: Subtotal = (roomrate * lengthofstay) + (Internet_charges) + (10 * (Guests - 1) * lengthofstay)
Case 4: Number of guests > 1 and no Internet: Subtotal = (roomrate * lengthofstay) + (10 * (Guests - 1) * lengthofstay)

[Screenshot: Hotel data with Subtotal]
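
The subtotal logic can be sketched in a single DATA step. This version is not the group's four-branch IF/THEN code: it relies on the per person term being zero when there is one guest, so only the Internet fee needs a condition, and it gives the same result in all four cases. The variable `Internet_Days`, the YES/NO coding of `Internet`, and the sample rows are assumptions.

```sas
data work.hotel_subtotal;
    /* Hypothetical rows; the colon modifier reads dates with an informat. */
    input Room No_of_Guests Checkin_Date :mmddyy10. Checkout_Date :mmddyy10.
          Internet $ Internet_Days Room_Rate;

    Length_of_Stay = Checkout_Date - Checkin_Date;

    /* Room charge plus $10 per day for each guest beyond the first. */
    Subtotal = Room_Rate * Length_of_Stay
             + 10 * (No_of_Guests - 1) * Length_of_Stay;

    /* One-time $9.95 activation plus $4.95 per day of Internet use. */
    if Internet = 'YES' then
        Subtotal = Subtotal + 9.95 + 4.95 * Internet_Days;

    format Checkin_Date Checkout_Date mmddyy10. Subtotal 8.2;
    datalines;
101 3 02/07/2014 02/09/2014 YES 2 100
102 1 02/07/2014 02/08/2014 NO 0 80
;
run;

proc print data=work.hotel_subtotal;
run;
```

For the first sample row the pieces are 100 * 2 = 200.00 for the room, 10 * (3 - 1) * 2 = 40.00 for extra guests, and 9.95 + 4.95 * 2 = 19.85 for Internet, giving a subtotal of 259.85. The second row has one guest and no Internet, so its subtotal is 80.00.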
c) Create a variable that calculates the grand total as the subtotal plus sales tax at 7.75%. The result should be rounded to two decimal places.

We created a new variable, Grandtotal, to calculate the total including tax, and formatted it with the format '8.2' to display the result rounded to 2 decimal places.

[Screenshot: Hotel data with GrandTotal]

d) View the resulting data set. In a comment in your report, state the value for the grand total for room 211.

Based on the results above, the grand total for room 211 is $1357.65.
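
A minimal sketch of the grand total calculation is shown below, using hypothetical subtotals rather than the Hotel.dat values. Note one difference from the approach described above: a format such as 8.2 only changes how the value is displayed, whereas the ROUND function used here rounds the stored value itself.

```sas
data work.hotel_total;
    /* Hypothetical subtotals, not taken from Hotel.dat. */
    input Room Subtotal;

    /* Add 7.75% sales tax and round the stored value to cents. */
    Grandtotal = round(Subtotal * 1.0775, 0.01);

    format Subtotal Grandtotal 8.2;
    datalines;
101 259.85
102 80
;
run;

proc print data=work.hotel_total;
run;
```

For the first sample row, 259.85 * 1.0775 = 279.988375, which rounds to 279.99; for the second, 80 * 1.0775 = 86.20.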