McMaster University
Department of Mathematics and Statistics
STATS 3A03: Applied Regression Analysis with SAS
Fall 2022

SAS Lab 1: Week of September 12-16, 2022

Topics Covered in this Lab

1. The SAS University Edition Editor and SAS on Demand
2. Folders and libraries
3. Writing/submitting code
   (a) Basics
   (b) Input data via SAS Editor (the DATA step)
   (c) Importing data from csv files using PROC IMPORT
   (d) PROC PRINT
   (e) PROC MEANS
4. Saving your SAS output

Note: In these notes, it is assumed you are using SAS University Edition or SAS on Demand.

1. SAS University Edition Editor/SAS on Demand: First, open your virtual machine (VirtualBox) and start SAS University Edition. You can then connect to SAS University Edition by entering this address in your browser: http://localhost:10080. Next, open a SAS editor by clicking on the tab “Server Files and Folders”. You can now write your SAS program in the new editor.

If you are using SAS on Demand, log in using your username and password. If you have not already done so, you will need to create a SAS on Demand account before you can use SAS.

2. Folders and Libraries: Under “Server Files and Folders” one can create folders and files (.sas files). Furthermore, SAS uses what are called “libraries”. A SAS library is a folder, located on a user’s disk drive or on the internet, that is designated for use by SAS, i.e., it is a collection of SAS files.

SAS University Edition and SAS OnDemand for Academics both use the SAS Studio interface. It is possible to use SAS libraries in both of these programs, but since they run on “virtual machines”, the file paths for libraries will be different from those you would use on a desktop installation. These programs will not recognize file paths to locations on your own computer.

When beginning a SAS session, SAS automatically creates at least two libraries: Work and Sasuser, with the former being temporary and the latter being permanent.
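As a small illustration of the temporary Work library, the sketch below creates a dataset using a one-level name. The dataset name scratch and its single variable x are made up for this example and are not part of the lab data.

```sas
/* Hypothetical dataset: a one-level name is stored in the temporary Work library */
data scratch;
    x = 1;
run;

/* The same dataset referred to explicitly by its two-level name */
proc print data=work.scratch;
run;
```

Because Work is temporary, work.scratch will no longer exist once the SAS session ends.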
You can also create a SAS library yourself. In such a case, you will generally give your libraries names by which you will refer to them. This is done using the libname command. It is generally a good idea to run this command as soon as you open up SAS. When you are using SAS University Edition, your library file paths will always begin with /folders/myfolders/....

There are a few guidelines to keep in mind when you create the name of your library. Library names:

1. Are limited to eight characters;
2. Must begin with a letter or underscore;
3. Can contain only letters, numbers, or underscores. Blanks are not allowed.

For example, suppose that we wish to use a directory (folder) under the shared folder “myfolders”. If the directory is called “Lab1”, then we would set it up as a SAS library by writing:

    libname lab1 '/folders/myfolders/Lab1/';
    run;

In SAS on Demand, your library paths will always begin with /home/<your-user-name>/. To create a library, you will need to do the following:

1. Select the folder that you want to upload the data to. You can expand My Folders and then select an existing folder or use the New icon to create a new folder.
2. Right click on the folder and select Create Library.
3. Provide a name for the library. The library will be created and will be available from the Libraries panel on the left-hand side.
4. Expand the Libraries panel, right click on the library that you just created, and then click Properties.
5. Copy or make note of the library path so that you can define a libname statement. Note that this path will be different for every user. For example, if the library path is /home/u58890640 and we wanted to create a library named lab1, then we would use:

       libname lab1 '/home/u58890640/';
       run;

6. We can now use the libname statement within our SAS code.

A major advantage of setting up a library in this way is that SAS will save a dataset in the specified library using a special format. In subsequent SAS sessions, it is not necessary to read in the dataset again.
All you need to do is to make sure you start each SAS session with the libname command as above. Note that the library name you use for a directory does not need to stay the same the next time you run SAS, but it is generally best to keep the same library name so that the same code will still run.

Libraries can be temporary or permanent. A SAS dataset in a library should be referred to in the form <libname>.<dataname>, for example lab1.oldfaithful1, which is used below. Because SAS calls directories that contain datasets libraries, and a common purpose of a library is the storage of datasets, SAS libraries are sometimes referred to as “SAS data libraries”.

When you end a SAS session, any library definitions that you have made will be lost. This just means that when you restart SAS, you will need to run the libname command again in order to access the library’s contents.
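To check what a library contains at the start of a new session, one option is PROC CONTENTS with the special member name _all_. This is a minimal sketch, assuming the SAS University Edition path used above; replace the path with your own library folder.

```sas
/* Assumed path: replace with the folder you use for your library */
libname lab1 '/folders/myfolders/Lab1/';

/* List the datasets stored in the library;
   NODS suppresses the variable-level details of each dataset */
proc contents data=lab1._all_ nods;
run;
```

Any dataset listed there can then be used directly by its two-level name, without importing it again.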
3(a) Basics: SAS “code” is made up of statements. The basic rules of SAS statements are:

1. every SAS statement ends with a semicolon
2. statements can be upper or lower case
3. statements can continue on the next line
4. statements can be on the same line as other statements
5. statements can start in any column
6. options in one statement are separated by a space.

We can comment our code using a slash followed by an asterisk or just an asterisk. If we just want to explain code, we start with an asterisk and end with a semicolon (for example, * read in the data;). If we want to “comment out” code, that is, to indicate that it should not be run, we start with a slash and an asterisk and end with an asterisk and then a slash (for example, /* run; */).

To submit the code to SAS, click on the Submit icon, which looks like a running man! By default this will submit every command in the SAS editor; to submit just part of it, use your mouse to highlight the appropriate part and click on the Submit icon.

You can save your program using the Save icon in the “CODE” tab. Save it as a .sas file.

3(b) The DATA step: There are two basic building blocks in SAS: DATA steps and PROC steps. DATA steps, like PROC steps, are made up of statements. After “DATA” comes a name that you make up for a SAS dataset. The purpose of the step can be to read or modify data, or to enter data manually. In each case, the end result is a SAS dataset.

As an example, consider the “Old Faithful Geyser Data”, which describe the waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. There are two variables in this dataset: (1) eruption time in minutes and (2) waiting time to the next eruption in minutes. We save the data in the library lab1 and name it oldfaithful1.

    DATA lab1.oldfaithful1;
        input eruptions waiting;
        datalines;
    2.0 50
    1.8 57
    3.7 55
    2.2 47
    2.1 53
    2.4 50
    2.6 62
    2.8 57
    3.3 72
    3.5 62
    3.7 63
    3.8 70
    4.5 85
    4.7 75
    4.0 77
    4.0 70
    1.7 43
    1.8 48
    4.9 70
    4.2 79
    4.3 72
    ;
    run;

3(c) Importing Data from files: We can also import a dataset from a file. In this course most of the data will be made available as csv files on the course website > Content > Datasets. These can be imported into SAS using PROC IMPORT. For example, suppose we would like to import the complete “Old Faithful Geyser Data”, which is stored in the file “faithful.csv”. First, we would save this file in a folder. Let’s suppose that we have done this and saved the file in myfolders/Labs/Lab1. We can then load it into a SAS dataset in our library lab1 using

    PROC IMPORT datafile='/folders/myfolders/Labs/Lab1/faithful.csv'
        out=lab1.oldfaithful2
        dbms=csv
        replace;
    run;

In SAS on Demand, first upload “faithful.csv” using the Upload icon in the Server Files and Folders tab on the left-hand side. You can upload the file into a folder for “Lab 1” or just into the general file list on the left-hand side. To import the complete “Old Faithful Geyser Data”, we would then use:

    PROC IMPORT datafile='/home/u58890640/Labs/Lab1/faithful.csv'
        out=lab1.oldfaithful2
        dbms=csv
        replace;
    run;

Note that to find the file name to use in the datafile= option, you can right click on the “faithful.csv” file in the Server Files and Folders tab on the left-hand side and select Properties. This will pop up a window from which you can copy the “Location” of the file.

The out=lab1.oldfaithful2 says that the SAS dataset should be saved in the library lab1 and will be called oldfaithful2 in that library. Note: the dataset can have any valid name. The ‘/folders/myfolders/Labs/Lab1/faithful.csv’ or ‘/home/u58890640/Labs/Lab1/faithful.csv’ tells SAS where to find the file with the data and what it is called. Replace the path used above with the correct path to where you saved the downloaded file. The dbms=csv says that this will be a csv file. These are all options in the PROC IMPORT statement and as such are part of a single statement, so the semicolon comes after them to end the statement. The replace option tells SAS to overwrite an existing SAS dataset. If you do not specify the replace option, the IMPORT procedure does not overwrite an existing dataset. Finally, run; says we are finished with statements modifying PROC IMPORT.

When you have completed this you should see, in the directory that you used in the libname statement earlier, a new file called oldfaithful2.sas7bdat. This is the SAS dataset, and it will now always be available in the library, so you will not need to import the data again.

3(d) PROC PRINT: Most things in SAS are done using procedures, which start with the word PROC. After writing the code for a procedure together with any statements that it requires, add a line with just run; which tells SAS that you want to run the previous procedure. Remember that every statement must end in a semicolon (;).

This procedure allows us to print out the data to the SAS RESULTS tab. This can be very useful for making sure that the data were imported correctly. You need to tell the procedure the name of the dataset.

    PROC PRINT Data=lab1.oldfaithful2;
    run;

By default, the procedure will print the whole dataset, which is quite long in this case, so we may only want to print the first few rows to ensure that the data were imported correctly. We can print the first six rows using

    PROC PRINT Data=lab1.oldfaithful2(obs=6);
    run;

3(e) PROC MEANS: This procedure will calculate and display summary statistics such as the mean, standard deviation, minimum, and maximum. We need to specify which variables in a dataset we want to look at using the Var statement.

    PROC MEANS Data=lab1.oldfaithful2;
        Var eruptions;
    run;

We can give other options to the procedure to calculate a number of other quantities, such as

    PROC MEANS Data=lab1.oldfaithful2 N Mean STD Min Q1 Median Q3 Max;
        Var waiting;
    run;

This will give the sample size, mean, and standard deviation, as well as the 5-number summary, for each variable listed in the Var statement.

A useful statement in PROC MEANS is the BY statement, which allows us to get information on one variable for different values of another variable. It is important, however, that the dataset is sorted by the values of the variable defining the groups. The original dataset is not sorted by waiting time, so we must first sort it with PROC SORT and then call PROC MEANS.

    PROC SORT Data=lab1.oldfaithful2 OUT=lab1.oldfaithful2_sorted;
        BY waiting;
    run;

    PROC MEANS Data=lab1.oldfaithful2_sorted N Mean STD Min Q1 Median Q3 Max;
        Var eruptions;
        BY waiting;
    run;

4. Saving SAS Output: By default, SAS output is sent to the “RESULTS” tab, where you can look at it. Often, however, you will want to save it to another file outside of SAS. In SAS University Edition, you can save the output as an HTML file, a PDF file, or an RTF file. To do this, when in the SAS “RESULTS” tab, click on the icon indicating the file type. The output will be saved to your default download folder. In SAS OnDemand, you can take a screenshot of the appropriate tables in the results tab or save the file as an HTML file.

Saving your output in this way will create a new file that can be viewed outside SAS. It will also allow you to pick out the parts of the output that you want and re-arrange them. This will be particularly useful for your assignments.

Always make sure that you save your output before quitting SAS, or it will be lost and you will need to redo your analysis. It’s also a good idea to save the contents of the SAS editor frequently.
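The icons in the RESULTS tab are the route described in this lab. As an alternative, output can also be sent to a file from within the code using the Output Delivery System (ODS). The sketch below is only an illustration: the file name lab1_output.pdf and its folder are assumptions, so replace them with a location you can write to.

```sas
/* Assumed output location and file name: adjust to your own folder */
ods pdf file='/folders/myfolders/Lab1/lab1_output.pdf';

/* Any procedure output produced here is also written to the PDF */
proc means data=lab1.oldfaithful2 n mean std;
    var eruptions waiting;
run;

/* Close the PDF destination so that the file is completed */
ods pdf close;
```

Everything printed between the two ODS statements ends up in the file, which makes it easy to regenerate the same output by resubmitting the program.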
Exercise: The Diabetes in Pima Indian Women (PIMA) data describe women who were at least 21 years old, of Pima Indian heritage, and living near Phoenix, Arizona, and who were tested for diabetes according to World Health Organization criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases. The dataset contains the following columns:

1. Number of pregnancies;
2. Plasma glucose concentration in an oral glucose tolerance test;
3. Diastolic blood pressure (mm Hg);
4. Triceps skin fold thickness (mm);
5. Body mass index;
6. Diabetes pedigree function;
7. Age in years;
8. Yes or No, for diabetic according to WHO criteria.

1. Import pima.csv from the course website > Contents > SAS Labs + Tutorials > Lab 1.
   (a) What are the names of the variables in the dataset?
   (b) Print out the first 10 rows of the dataset.
   (c) Find the number of observations, sample mean, variance, min, Q1, median, Q3, and max for “bmi” (body mass index) and “glu” (plasma glucose concentration in an oral glucose tolerance test).
   (d) Sort the data and use the BY statement in PROC MEANS to find the mean and variance of “npreg” (number of pregnancies) for diabetic women vs non-diabetic women. Also find the 95% confidence intervals for the mean number of pregnancies (CLM in the PROC MEANS statement).
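The CLM keyword in part (d) and a way of listing variable names are not demonstrated earlier in the lab, so here is a minimal sketch of both on the Old Faithful data rather than on the exercise data. It assumes that lab1.oldfaithful2 was imported as in Section 3(c); ALPHA=0.05 is the default and is written out only to make the 95% level explicit.

```sas
/* List the variable names and types stored in a dataset */
proc contents data=lab1.oldfaithful2;
run;

/* Sample size, mean, variance, and 95% confidence limits for the mean */
proc means data=lab1.oldfaithful2 n mean var clm alpha=0.05;
    var eruptions;
run;
```

The same pattern can be adapted to the exercise by changing the dataset and variable names.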