McMaster University
Department of Mathematics and Statistics
STATS 3A03: Applied Regression Analysis with SAS
Fall 2022
SAS Lab 1: Week of September 12-16, 2022
Topics Covered in this Lab
1. The SAS University Edition Editor and SAS on Demand
2. Folders and libraries
3. Writing/submitting code
(a) Basics
(b) Input data via SAS Editor (the DATA step).
(c) Importing data from csv files using PROC IMPORT.
(d) PROC PRINT.
(e) PROC MEANS.
4. Saving your SAS output.
Note: In these notes, it is assumed you are using SAS University Edition or SAS on Demand.
1. SAS University Edition Editor/SAS on Demand: First, please open your virtual
machine (VirtualBox) and start your SAS University Edition. Then you can connect to your SAS
University Edition software by entering this address in your browser: http://localhost:10080.
Then you can open a SAS editor by clicking on the tab “Server Files and Folders”. Now you
can write your SAS program in the new editor.
If you are using SAS on Demand, please log in using your username and password. If you have
not already done so, you will need to create a SAS on Demand account before being able to use
SAS.
2. Folders and Libraries: Under “Server Files and Folders” one can create folders and files
(.sas files). Furthermore, SAS uses what are called “libraries”. A SAS library is a folder located
on a user’s disk drive or on the internet that is designated for use by SAS, i.e., it is a collection
of SAS files.
SAS University Edition and SAS OnDemand for Academics
both use the SAS Studio interface. It is possible to use SAS libraries in both of these programs,
but since they work on “virtual machines”, the file paths for libraries will be different than if
you were using a desktop installation.
These programs will not recognize file paths to locations on your
computer.
When beginning a SAS session, SAS automatically creates at least two libraries: Work
and Sasuser, with the former being temporary and the latter being permanent. You can
also create a SAS library yourself. In such a case, you will generally give your libraries names
by which you will refer to them. This is done using the libname command. It is generally a
good idea to run this command as soon as you open up SAS.
When you are using the SAS University Edition, your library file paths will always begin with
/folders/myfolders/...
There are a few guidelines to keep in mind when you create the
name of your library. Library names:
1. Are limited to eight characters;
2. Must begin with a letter or underscore;
3. Can contain only letters, numbers, or underscores. Blanks are not allowed.
For example: suppose that we wish to use a directory (folder) under the shared folder “myfolders”.
Suppose that we have a directory called “Lab1”, then we would set it up as a SAS library by
writing:
libname lab1 '/folders/myfolders/Lab1/';
run;
In SAS on Demand, your library paths will always begin with /home/<your user name>/.
To create a library, you will need to do the following:
1. Select the folder that you want to upload the data to. You can expand My Folders and
then select an existing folder or use the New icon to create a new folder.
2. Right click on the folder and select Create → Library.
3. Provide a name for the library. The library will be created and will be available from the
libraries panel on the left-hand side.
4. Expand the Libraries panel, right click on the library that you just created, and then click
Properties.
5. Copy or make note of the library path so that you can define a libname statement.
Note that this path will be different for every user. For example, if the library path name
is /home/u58890640, and we wanted to create a library named lab1, then we would use:
libname lab1 '/home/u58890640/';
run;
6. We can now use the libname statement within our SAS code.
A major advantage of setting up a library in this way is that SAS will save a dataset in the
specified library using a special format. In subsequent SAS sessions, it is not necessary to read
in the dataset again. All you need to do is to make sure you start each SAS session with the
libname command as above. Note that the library name you use for a directory does not need
to stay the same the next time you run SAS but it is generally best if you keep the same library
name to ensure the same code will still run. Libraries can be temporary or permanent. A SAS
dataset in a library should be referred to in the form <libname>.<dataname>, for example
lab1.oldfaithful1, which is used below.
SAS calls directories that contain datasets libraries, and because a common purpose of a
library is the storage of data sets, SAS libraries are sometimes referred to as “SAS data
libraries”. When you end a SAS session, any library names that you’ve defined with libname
will be lost; the files in the folder itself remain. This just means that when you
restart SAS, you will need to run the libname statement again in order to access the library's contents.
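As a minimal sketch of what this looks like at the start of a later session, assuming the SAS University Edition path from the example above and that the dataset lab1.oldfaithful1 (created in Section 3(b)) was saved in an earlier session:

```sas
/* Re-assign the library name at the start of the new session. */
/* The path is the example path used above; yours may differ.  */
libname lab1 '/folders/myfolders/Lab1/';

/* The dataset saved earlier can be used again without re-entering it. */
proc print data=lab1.oldfaithful1;
run;
```

Only the libname statement has to be rerun; the data itself is read from the file stored in the library folder.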
3(a) Basics: SAS “code” is made up of statements. The basic rules of SAS statements are:
1. every SAS statement ends with a semicolon
2. statements can be upper or lower case
3. statements can continue on the next line
4. statements can be on the same line as other statements
5. statements can start in any column
6. options in one statement are separated by a space.
We can comment our code using a slash followed by an asterisk or just an asterisk. If we
just want to explain code, we start with the asterisk and end with a semicolon. If we want to
“comment out” code, that is, to indicate that it should not be run, we start with a slash and an asterisk and
end with an asterisk and then a slash.
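The two comment styles can be sketched as follows. The example assumes the library lab1 has been assigned and uses the dataset lab1.oldfaithful1 that is created in Section 3(b); any existing dataset would work in its place.

```sas
* This comment explains the next step and ends with a semicolon;
proc print data=lab1.oldfaithful1;
run;

/* Everything between the slash-asterisk and asterisk-slash is ignored,
   so the procedure below is "commented out" and will not run:
proc means data=lab1.oldfaithful1;
   var eruptions;
run;
*/
```

The slash-asterisk form is the safer choice for commenting out code, since the code itself contains semicolons that would end an asterisk-style comment early.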
To submit the code to SAS, click on the Submit icon, which looks like a running man. By default
this will submit every command in the SAS editor; to submit just part of it, use your mouse to
highlight the appropriate part and click on the Submit icon. You can save your program using
the Save icon in the “CODE” tab. Save it as a .sas file.
3(b) The DATA step: There are two basic building blocks in SAS: DATA steps and PROC
steps. DATA steps, like PROC steps, are made up of statements. After “DATA” is a name that
you make up for a SAS data set. The purpose of the step can be to read or modify data, or to
enter data manually. In the end, though, you create a SAS data set.
As an example, consider the data set “Old Faithful Geyser Data”. The “Old Faithful Geyser
Data” describes the waiting time between eruptions and the duration of the eruption for the
Old Faithful geyser in Yellowstone National Park, Wyoming, USA. There are two variables in
this dataset: (1) Eruption time in mins and (2) Waiting time to next eruption in mins. We save
the data in the library “lab1” and name it “oldfaithful1”.
DATA lab1.oldfaithful1;
input eruptions waiting ;
datalines;
2.0 50
1.8 57
3.7 55
2.2 47
2.1 53
2.4 50
2.6 62
2.8 57
3.3 72
3.5 62
3.7 63
3.8 70
4.5 85
4.7 75
4.0 77
4.0 70
1.7 43
1.8 48
4.9 70
4.2 79
4.3 72
;
run;
3(c) Importing Data from files: We can also import a dataset from a file. In this course
most of the data will be made available as csv files on the course website > Content > Datasets.
These can be imported into SAS using PROC IMPORT. For example, suppose we would like to
import the complete “Old Faithful Geyser Data”, which is stored in the file “faithful.csv”.
First, we would save this file in a folder. Let’s suppose that we have done this and saved the
file in myfolders/Labs/Lab1. We can then load it into a SAS dataset in our library lab1 using:
PROC IMPORT datafile='/folders/myfolders/Labs/Lab1/faithful.csv'
    out=lab1.oldfaithful2
    dbms=csv
    replace;
run;
In SAS on Demand, first upload “faithful.csv” using the Upload icon in the Server Files
and Folders tab on the left-hand side. You can upload the file into a folder for “Lab 1” or
just into the general file list on the left-hand side. To import the complete “Old Faithful Geyser
Data”, we would then use:
PROC IMPORT datafile='/home/u58890640/Labs/Lab1/faithful.csv'
    out=lab1.oldfaithful2
    dbms=csv
    replace;
run;
Note here that to find the file name to use in the datafile option, right click on the
“faithful.csv” file in the Server Files and Folders tab on the left-hand side and select
Properties. This will pop up a window and you can copy the “Location” of the file to use in
the datafile option.
The out=lab1.oldfaithful2 option says that the SAS dataset should be saved in the library lab1
and will be called oldfaithful2 in that library. Note: the data set can have any valid name.
The '/folders/myfolders/Labs/Lab1/faithful.csv' or '/home/u58890640/Labs/Lab1/faithful.csv'
tells SAS where to find the file with the data and its name. Replace the path that is used above
with the correct path to where you saved the downloaded file. The dbms=csv option says that this will
be a csv file.
These are all options in the PROC IMPORT statement and as such are part of a single statement, so
the semicolon comes after these to end the statement. The replace option tells SAS to overwrite
an existing SAS dataset. If you do not specify the replace option, the IMPORT procedure does
not overwrite an existing dataset.
Finally, run; says we are finished with statements modifying PROC IMPORT. When you have
completed this you should see in the directory, which you used in the libname statement earlier,
that there is a new file called oldfaithful2.sas7bdat. This is the SAS dataset and will now
always be available in the library so you will not need to import the data again.
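One quick way to check what was imported is PROC CONTENTS, which is not covered in this lab but is a standard SAS procedure. The sketch below assumes the import above has been run, so that lab1.oldfaithful2 exists:

```sas
/* List the variable names, their types, and the number of   */
/* observations in the imported dataset, to confirm that the */
/* csv file was read as expected.                            */
proc contents data=lab1.oldfaithful2;
run;
```

If the variable names or the number of observations look wrong, the path or the csv file itself is the first thing to check.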
3(d) PROC PRINT: Most things in SAS are done using procedures, which start with the word
PROC. After writing the code for a procedure together with any statements that it requires, add
a line with just run; which tells SAS that you want to run the previous procedure. Remember
that every statement must end in a semicolon (;). PROC PRINT allows us to print out the data
to the SAS RESULTS tab. This can be very useful to make sure that the data were imported
correctly. You need to tell the procedure the name of the dataset.
PROC PRINT Data=lab1.oldfaithful2;
run;
By default, the procedure will print the whole dataset, which is quite long in this case, so we
may only want to print the first few rows to ensure that the data were imported correctly. We
can print the first six rows using
PROC PRINT Data=lab1.oldfaithful2(obs=6);
run;
3(e) PROC MEANS: This procedure will calculate and display summary statistics such as
the mean, standard deviation, minimum, and maximum. We need to specify which variables in
a dataset we want to look at using the Var statement.
PROC MEANS Data=lab1.oldfaithful2;
Var eruptions;
run;
We can give other options to the procedure to calculate a number of other quantities, such as
PROC MEANS Data=lab1.oldfaithful2 N Mean STD Min Q1 Median Q3 Max;
Var waiting;
run;
This will give the sample size, mean, and standard deviation, as well as the 5-number summary
for each variable listed in the Var statement.
A useful statement in PROC MEANS is the BY statement, which allows us to get information on
one variable for different values of another variable. It is important, however, that the dataset
is sorted by values of the variable defining the classes. The original dataset is not sorted by
waiting time so we must first sort it and then call PROC MEANS. The PROC SORT will sort the
dataset and then we apply PROC MEANS.
PROC SORT Data=lab1.oldfaithful2 OUT=lab1.oldfaithful2_sorted;
BY waiting;
run;
PROC MEANS Data=lab1.oldfaithful2_sorted N Mean STD Min Q1 Median Q3 Max;
Var eruptions;
BY waiting;
run;
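As an aside that is not part of the original lab, PROC MEANS also has a CLASS statement, which groups the output in a similar way but does not require the dataset to be sorted first. A minimal sketch, assuming the same dataset lab1.oldfaithful2:

```sas
/* Group the summary statistics for eruptions by the values of waiting. */
/* Unlike BY, the CLASS statement works on the unsorted dataset.        */
proc means data=lab1.oldfaithful2 n mean std min q1 median q3 max;
   class waiting;
   var eruptions;
run;
```

The layout of the results differs: BY produces a separate table for each group, while CLASS reports all groups in a single table.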
4. Saving SAS Output: By default, SAS output is sent to the “RESULTS” tab where you
can look at it. Often, however, you will want to save it to another file outside of SAS.
In SAS University Edition, you can save the output as an HTML file, a PDF file, or an RTF file.
To do this, when in the SAS “RESULTS” tab, click on the icons indicating the file type. The
output will be saved to your default download folder.
In SAS OnDemand, you can take a screenshot of the appropriate tables in the results tab or
save the file as an HTML file.
Saving your output in this way will create a new file that can be viewed outside SAS. It will also
allow you to use bits you want from the output and re-arrange them. This will be particularly
useful for your assignments. Always make sure that you save your output before quitting SAS or
it will be lost and you will need to redo your analysis. It’s also a good idea to save the contents
of the SAS editor frequently.
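Output can also be written to a file from within the code using the ODS (Output Delivery System) statements, which the lab does not cover. The following is a sketch only: the file name lab1_output.pdf is a hypothetical choice, and the path assumes the SAS University Edition folder used earlier (in SAS on Demand it would need to be a path under your own home folder).

```sas
/* Open a PDF destination; lab1_output.pdf is a hypothetical file name. */
ods pdf file='/folders/myfolders/Lab1/lab1_output.pdf';

/* Any procedure output produced here is written to the PDF as well. */
proc means data=lab1.oldfaithful2;
   var eruptions waiting;
run;

/* Close the destination so the file is completed and saved. */
ods pdf close;
```

The output still appears in the RESULTS tab; the PDF is an additional copy that stays on disk after the session ends.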
Exercise:
The Diabetes in Pima Indian Women (PIMA) data describe a population of women who were at least 21 years
old, of Pima Indian heritage and living near Phoenix, Arizona, and who were tested for diabetes according
to World Health Organization criteria. The data were collected by the US National Institute of
Diabetes and Digestive and Kidney Diseases. The dataset contains the following columns:
1. Number of pregnancies;
2. Plasma glucose concentration in an oral glucose tolerance test;
3. Diastolic blood pressure (mm Hg);
4. Triceps skin fold thickness (mm);
5. Body mass index;
6. Diabetes pedigree function;
7. Age in years;
8. Yes or No, for diabetic according to WHO criteria.
1. Import the pima.csv from the course website > Contents > SAS Labs + Tutorials > Lab 1.
(a) What are the names of the variables in the dataset?
(b) Print out the first 10 rows of the dataset.
(c) Find the number of observations, sample mean, variance, min, Q1, median, Q3, and
max for “bmi” (body mass index) and “glu” (plasma glucose concentration in an oral
glucose tolerance test).
(d) Sort the data and use the BY statement in PROC MEANS to find the mean and variance for
“npreg” (number of pregnancies) for diabetic women vs non-diabetic women. Also
find the 95% confidence intervals for the mean number of pregnancies (CLM in the
PROC MEANS statement).