HW1_SPSS
School: DePaul University
Course: 403
Subject: Statistics
Date: Apr 3, 2024
Uploaded by AmbassadorHorsePerson1079
IT 223
HOMEWORK 1
EXPLORATORY DATA ANALYSIS IN SPSS
Total: 30 Points
The goal of this assignment is to guide you through the exploratory analysis of a dataset using SPSS.
You should do the following exercise in SPSS and submit the results of Part 5 as explained at the end of the exercise.
This problem should be completed after doing the reading assignments and the practice exercises, and after viewing the SPSS video tutorials on exploratory data analysis.
Data description
The data set gss2004.xls contains 575 observations on 5 variables:
SEX = Respondent's sex (1 = Male, 2 = Female)
AGE = Age of respondent
WWWHR = Hours on the WWW per week for Internet users
NEWS30 = Respondent has used a news site in the past 30 days (1 = “never”, 2 = “1-2 times”, 3 = “3-5 times”, 4 = “more than 5 times”)
EMAILHR = Hours of e-mail per week for Internet users
The data were collected from the 2004 General Social Survey for adult respondents (18 years of age or older) living in the United States. The GSS is one of the largest and longest projects that have been conducted to monitor social change and the growing complexity of American society (see http://www.norc.org for more information).
The analysis described below will study the number of hours spent by Internet users using email.
The study will also explore whether men and women use email differently.
PART 0: Download the data to your hard drive:
1) Log in to the course website at http://d2l.depaul.edu
2) Go to Segment 1 (select Content on the top navigation bar)
3) Click on the Datasets link on the left navigation bar of Segment 1 and download the Excel file gss2004.xls to your computer.
4) Open the SPSS program
(If you are running SPSS on the CDM terminals, apply the steps above from the terminal server you are logged on to.)
PART 1: Import an Excel file in SPSS
1) Click on File > Open > Data… under the top menu in SPSS. A dialog box to select files will pop up.
2) Go to the folder where you saved the data file gss2004.xls and select it. You need to choose “xls” data files in the “Files of type” box for the file to be listed.
3) Click OK.
4) The data should now appear in an SPSS data worksheet. Save it in a .sav file using the SAVE AS… option.
5) If the data are successfully imported in SPSS, you should have 5 columns of data, which are the variables described above.
PART 2: Define variable properties in the Variable View
1) Click on the Variable View of the SPSS data editor. This view enables you to specify properties of the variables in the dataset, such as changing the type or adding a label.
2) Type in meaningful labels for each variable under the Label column (you can use the labels specified above). This step helps you remember what the variables are about.
3) Add value labels under the Values column for SEX and NEWS30. This will help you remember what the codes {1,2,…} denote. The labels will be used in the SPSS output.
4) Select the correct Measure for each variable. Remember that an ordinal variable has values that can be ranked (e.g. preferences); a nominal variable has values with no ranking (e.g. gender); and a scale variable is a measurement variable that takes numeric values (e.g. salary).
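For readers who like to see the coding scheme written down outside SPSS, here is a minimal Python sketch of the value labels and measurement levels. The codes and labels come from the data description above; the names SEX_LABELS, NEWS30_LABELS, MEASURE and label are made up for this illustration and are not part of SPSS.

```python
# Value labels taken from the data description; the dictionary names are hypothetical.
SEX_LABELS = {1: "Male", 2: "Female"}
NEWS30_LABELS = {1: "never", 2: "1-2 times", 3: "3-5 times", 4: "more than 5 times"}

# Measurement level per variable, following the definitions in step 4.
MEASURE = {
    "SEX": "nominal",     # categories with no ranking
    "AGE": "scale",       # numeric measurement
    "WWWHR": "scale",
    "NEWS30": "ordinal",  # categories that can be ranked
    "EMAILHR": "scale",
}


def label(code, labels):
    """Return the label for a code, or the code itself as text if it has no label."""
    return labels.get(code, str(code))


print(label(2, SEX_LABELS))
print(label(3, NEWS30_LABELS))
```

NEWS30 is treated as ordinal here because its categories run from least to most frequent use.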
PART 3: Creating a Histogram
Create a histogram of WWWHR: the hours per week spent on the WWW for Internet users.
1) Select Graphs > Legacy Dialogs > Histogram… under the top bar menu.
2) The Histogram dialog box will appear. Select the variable to be analyzed (WWWHR) and click on the “>” arrow button to move the variable into the “Variable” box.
3) Click on the “Titles…” button to add a title to the graph. Just use your intuition to navigate the other screens.
4) Click OK.
5) A histogram of WWWHR should appear in the Output window.
6) Double-click on the histogram chart in the Output window and the Chart Editor should appear.
7) To display only non-negative values on the X-axis (WWWHR cannot be negative), click on the X-axis (or go to “Edit > Select X Axis” on the Chart Editor menu). Select the “Scale” tab and change the minimum value to 0. Then click OK and close the Chart Editor.
8) To change the histogram intervals, double-click on the histogram bars and select the Binning tab. Check “Custom” and change the interval width to a small number. How does the histogram change?
9) Now try a large interval width. What happens?
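To get a feel for what steps 8 and 9 are asking, the following Python sketch counts the same values with a narrow and a wide interval width. The hours list is made up for illustration and is not taken from gss2004.xls, and bin_counts is a hypothetical helper, not an SPSS function.

```python
from collections import Counter

# Made-up weekly hours, NOT values from gss2004.xls.
hours = [0, 1, 1, 2, 2, 2, 3, 5, 7, 8, 10, 14, 20, 35]


def bin_counts(values, width):
    """Count values per bin, where each bin covers [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


narrow = bin_counts(hours, 2)   # many bins, each holding few values
wide = bin_counts(hours, 20)    # few bins, detail is lost

print(narrow)
print(wide)
```

With the narrow width the counts are spread over many short bars; with the wide width almost everything falls into the first bar and the shape of the distribution is hidden.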
PART 4: Compute descriptive statistics for number of hours for email (EMAILHR)
METHOD 1: Simple procedure to compute a few descriptive statistics
1) Select Analyze > Descriptive Statistics > Descriptives…
2) Choose the variables to analyze
3) Click on the Options button to select the statistics
4) Click OK
METHOD 2: BETTER! More statistics, lots of information!
1) Select Analyze > Descriptive Statistics > Explore…
2) Choose the variables to analyze and move them to the “Dependent List” box.
3) Click on the Statistics button and check Percentiles.
4) Click on the Plots button, check Histogram to create a histogram, and uncheck Stem-and-leaf.
Use both methods and compare the results.
PART 5: TO BE SUBMITTED - Compute the descriptive statistics for email time (EMAILHR) by sex of respondents
1) Select Analyze > Descriptive Statistics > Explore…
2) Move the variable EMAILHR into the Dependent List box
3) Move the SEX variable into the Factor List box
4) Click on Statistics to select the statistics to compute
5) Click on Plots…, select “Factor levels together” under Boxplots, and check Histogram and Normality plots with tests.
6) Click Continue
7) Click OK in the “Explore” box
ANSWER THE FOLLOWING QUESTION: Do men spend more time writing/reading emails than women?
Compute the following summary statistics (mean, standard deviation, first quartile, median, third quartile,
max and min from SPSS) for the number of weekly hours that men and women spend on emails.
Write the statistics in the table below.
Gender   Variable   Mean   St.Dev.   Max   Min
Male     EMAILHR    6.33   9.506     50    0
Female   EMAILHR    5.93   8.884     50    0

Gender   Variable   Median   First quartile   Third quartile
Male     EMAILHR    2        1                8
Female   EMAILHR    2        1                7
1. Analyze the descriptive statistics and graphs for men and women. Do you see any difference in the amount of time that men and women spend on email? Describe the shape, center and spread of the distribution and explain in plain English.
[Figure: Histogram of emailhr for sex = Male. Mean = 6.33, Std. Dev. = 9.506, N = 239]
[Figure: Histogram of emailhr for sex = Female. Mean = 5.93, Std. Dev. = 8.884, N = 336]
Shape:
The distribution of weekly email time is right skewed (positively skewed) for both men and women. Most people spend fewer than 10 hours per week reading and writing email, while a few spend as many as 50 hours per week.
Center:
The median (Q2) is 2 hours for both groups: half of the respondents spend 2 hours or less per week on email.
Spread:
The interquartile range (IQR = Q3 - Q1) measures how spread out the middle 50% of the data is. The IQR is 7 hours for men and 6 hours for women, which is large relative to the median of 2 hours, so email time differs considerably from person to person. The range from min = 0 to max = 50 hours confirms that email time varies widely.
2. Based on the statistics computed above and on the shape of the distribution, which statistic would you use to summarize the center of the data and why?
Since the distribution is right skewed, I’ll use the median to describe the center. The mean (6.33 for men, 5.93 for women) is pulled upward by the few very large values, while the median (2 hours for both) is not.
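The preference for the median under right skew can be illustrated with a small Python sketch. The sample below is made up for the purpose and is not the GSS data; it only shows how a single large value moves the mean far more than the median.

```python
import statistics

# Made-up right-skewed sample, NOT values from gss2004.xls.
hours = [0, 1, 1, 2, 2, 2, 3, 5, 7, 8, 10, 14, 20, 50]

mean_all = statistics.mean(hours)
median_all = statistics.median(hours)

# Drop the single largest value and recompute both measures.
trimmed = sorted(hours)[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print("mean:  ", round(mean_all, 2), "->", round(mean_trimmed, 2))
print("median:", median_all, "->", median_trimmed)
```

Removing one extreme value shifts the mean by more than three hours but the median by only one, which is why the median is the steadier summary of a typical respondent here.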
3. What does the five-number summary tell us about the time spent on email? (Hint: interpret the five-number summary in plain English.)
The five-number summary shows that 25% of people spend 1 hour or less on email per week, 50% spend 2 hours or less, and 75% spend 8 hours or less (men) or 7 hours or less (women). The minimum for both men and women is 0 hours (less than one hour per week) and the maximum is 50 hours per week.
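As a rough illustration of how a five-number summary is assembled, here is a Python sketch using the standard statistics module. The sample is made up and is not the GSS data. The "exclusive" method places quartiles at position (n + 1) × p, which, as far as I can tell, is the same rule as the Weighted Average (Definition 1) percentiles in the SPSS output; treat that correspondence as an assumption.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the (n + 1) * p quartile position."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)


# Made-up sample, NOT values from gss2004.xls.
hours = [0, 0, 1, 1, 2, 2, 3, 6, 8, 12, 50]

summary = five_number_summary(hours)
print(summary)
```

Reading the result left to right gives the same plain-English statements as in the answer above: the smallest value, the points below which 25%, 50% and 75% of the values fall, and the largest value.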
4. What do the boxplot and the normality test show? Explain.
[Figure: Normal Q-Q Plot of emailhr for sex = Male (horizontal axis: Observed Value)]
[Figure: Normal Q-Q Plot of emailhr for sex = Female (horizontal axis: Observed Value)]
The Q-Q plots show that email time for both men and women departs from a normal distribution: the points do not fall along the diagonal line, and there are many outliers. The Tests of Normality table in the SPSS output agrees (Sig. < .001 for both Kolmogorov-Smirnov and Shapiro-Wilk in both groups).
The boxplot supports this finding: for both men and women, values above roughly 20 hours of email per week are flagged as outliers.
5. Use the 1.5 × IQR rule to identify possible outliers. List the cutoff points for outliers and show your work. Explain what you found out. (Hint: Is there any excessive time spent on email for males, females, or both?)
[Figure: Boxplots of emailhr by sex (Female, Male), vertical axis 0 to 50, points labeled by case number. o = outliers, * = far outliers]
Male:
Upper cutoff = Q3 + (1.5 × IQR) = 8 + (1.5 × 7) = 18.5 (values above 18.5 hours are outliers)
Lower cutoff = Q1 - (1.5 × IQR) = 1 - (1.5 × 7) = -9.5
Because the lower cutoff is negative and time cannot be negative, there are no low outliers.
It is, however, unusual to spend more than 18 hours and 30 minutes per week reading or writing email; a man who spends more than this is not typical.
Female:
Upper cutoff = Q3 + (1.5 × IQR) = 7 + (1.5 × 6) = 16 (values above 16 hours are outliers)
Lower cutoff = Q1 - (1.5 × IQR) = 1 - (1.5 × 6) = -8
Because the lower cutoff is negative and time cannot be negative, there are no low outliers.
It is, however, unusual to spend more than 16 hours per week reading or writing email; a woman who spends more than this is not typical.
In general, the medians show that men and women spend the same typical amount of time on email, 2 hours per week, and the maximum for both is 50 hours. The upper outlier cutoff for men is 2.5 hours higher than for women (18.5 vs. 16), which is not much of a difference.
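The cutoff arithmetic can be double-checked with a few lines of Python. The quartiles are the ones reported in the summary table (Q1 = 1 for both groups, Q3 = 8 for men and 7 for women); the function name tukey_fences is only a label chosen for this sketch.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


male_low, male_high = tukey_fences(1, 8)      # IQR = 7
female_low, female_high = tukey_fences(1, 7)  # IQR = 6

print("Male:  ", male_low, male_high)
print("Female:", female_low, female_high)

# Both lower cutoffs are negative, and the minimum of 0 hours lies above them,
# so only the upper cutoffs can flag any observations.
print("Difference in upper cutoffs:", male_high - female_high)
```

The upper cutoffs come out 2.5 hours apart, so the rule draws the "unusual" line at nearly the same place for both groups.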
SUBMISSION INSTRUCTIONS: Copy the relevant SPSS tables, graphs and your answers into a document. Bring a printed copy to class on the due date and also submit it on the course webpage at http://d2l.depaul.edu.
Write your name on the document you submit.
Keep a copy of all your submissions!
If you have questions about the homework, email me BEFORE the deadline. Please pay attention to the due date. No late homework will be accepted.
Graphs
Notes
Output Created: 14-SEP-2023 12:02:57
Comments:
Input
  Data: C:\Users\AMOHAPAT\Downloads\HW_1.sav
  Active Dataset: DataSet1
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 575
Syntax:
  GRAPH
    /HISTOGRAM=wwwhr.
Resources
  Processor Time: 00:00:03.41
  Elapsed Time: 00:00:01.39
[Figure: Histogram of wwwhr. Mean = 7.16, Std. Dev. = 9.67, N = 575]
Explore
Notes
Output Created: 14-SEP-2023 12:36:50
Comments:
Input
  Data: C:\Users\AMOHAPAT\Downloads\HW_1.sav
  Active Dataset: DataSet1
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 575
Missing Value Handling
  Definition of Missing: User-defined missing values for dependent variables are treated as missing.
  Cases Used: Statistics are based on cases with no missing values for any dependent variable or factor used.
Syntax:
  EXAMINE VARIABLES=emailhr
    /PLOT BOXPLOT HISTOGRAM
    /COMPARE GROUPS
    /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
    /STATISTICS DESCRIPTIVES
    /CINTERVAL 95
    /MISSING LISTWISE
    /NOTOTAL.
Resources
  Processor Time: 00:00:00.77
  Elapsed Time: 00:00:00.31
Case Processing Summary
           Valid N   Valid %   Missing N   Missing %   Total N   Total %
emailhr    575       100.0%    0           0.0%        575       100.0%

Descriptives (emailhr)
                                            Statistic   Std. Error
Mean                                        6.10        .381
95% Confidence Interval for Mean, Lower     5.35
95% Confidence Interval for Mean, Upper     6.84
5% Trimmed Mean                             4.68
Median                                      2.00
Variance                                    83.567
Std. Deviation                              9.142
Minimum                                     0
Maximum                                     50
Range                                       50
Interquartile Range                         6
Skewness                                    2.667       .102
Kurtosis                                    7.696       .203

Percentiles (emailhr)
Percentile                         5     10    25     50     75     90      95
Weighted Average (Definition 1)    .00   .00   1.00   2.00   7.00   18.80   28.00
Tukey's Hinges                                 1.00   2.00   7.00
emailhr
[Figure: Histogram of emailhr. Mean = 6.1, Std. Dev. = 9.142, N = 575]
[Figure: Boxplot of emailhr, vertical axis 0 to 50, outlying points labeled by case number]
Explore
Notes
Output Created: 14-SEP-2023 18:52:08
Comments:
Input
  Data: C:\Users\AMOHAPAT\Downloads\HW_1.sav
  Active Dataset: DataSet1
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 575
Missing Value Handling
  Definition of Missing: User-defined missing values for dependent variables are treated as missing.
  Cases Used: Statistics are based on cases with no missing values for any dependent variable or factor used.
Syntax:
  EXAMINE VARIABLES=emailhr BY sex
    /PLOT BOXPLOT HISTOGRAM NPPLOT
    /COMPARE GROUPS
    /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
    /STATISTICS DESCRIPTIVES
    /CINTERVAL 95
    /MISSING LISTWISE
    /NOTOTAL.
Resources
  Processor Time: 00:00:03.50
  Elapsed Time: 00:00:01.08
sex
Case Processing Summary
           sex      Valid N   Valid %   Missing N   Missing %   Total N   Total %
emailhr    Male     239       100.0%    0           0.0%        239       100.0%
emailhr    Female   336       100.0%    0           0.0%        336       100.0%
Descriptives (emailhr by sex)
                                            Male                      Female
                                            Statistic   Std. Error    Statistic   Std. Error
Mean                                        6.33        .615          5.93        .485
95% Confidence Interval for Mean, Lower     5.12                      4.98
95% Confidence Interval for Mean, Upper     7.54                      6.88
5% Trimmed Mean                             4.85                      4.57
Median                                      2.00                      2.00
Variance                                    90.364                    78.924
Std. Deviation                              9.506                     8.884
Minimum                                     0                         0
Maximum                                     50                        50
Range                                       50                        50
Interquartile Range                         7                         6
Skewness                                    2.658       .157          2.675       .133
Kurtosis                                    7.585       .314          7.834       .265
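The Std. Error of the mean in the Descriptives output can be sanity-checked by hand, since it is the standard deviation divided by the square root of N. The Python sketch below uses the N, mean and standard deviation reported for each sex, and also recombines the two group means to compare against the overall mean of 6.10 from the ungrouped Explore run. The variable names are chosen for this illustration only.

```python
import math

# (N, mean, standard deviation) for emailhr, copied from the SPSS output.
groups = {"Male": (239, 6.33, 9.506), "Female": (336, 5.93, 8.884)}


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


se = {name: round(standard_error(sd, n), 3) for name, (n, mean, sd) in groups.items()}

# The overall mean is the N-weighted average of the two group means.
total_n = sum(n for n, _, _ in groups.values())
overall_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

print(se)
print(total_n, round(overall_mean, 2))
```

Both standard errors agree with the table to three decimals, and the weighted mean rounds to the overall mean reported for all 575 cases, so the grouped and ungrouped outputs are consistent with each other.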
Percentiles (emailhr by sex)
Percentile                                  5     10    25     50     75     90      95
Weighted Average (Definition 1)   Male      .00   .00   1.00   2.00   8.00   20.00   30.00
Weighted Average (Definition 1)   Female    .00   .00   1.00   2.00   7.00   15.90   28.00
Tukey's Hinges                    Male                  1.00   2.00   7.50
Tukey's Hinges                    Female                1.00   2.00   7.00

Tests of Normality (emailhr)
           Kolmogorov-Smirnov (a)          Shapiro-Wilk
sex        Statistic   df    Sig.          Statistic   df    Sig.
Male       .253        239   <.001         .649        239   <.001
Female     .262        336   <.001         .651        336   <.001
a. Lilliefors Significance Correction
emailhr
Histograms
[Figure: Histogram of emailhr for sex = Male. Mean = 6.33, Std. Dev. = 9.506, N = 239]
[Figure: Histogram of emailhr for sex = Female. Mean = 5.93, Std. Dev. = 8.884, N = 336]
Normal Q-Q Plots
[Figure: Normal Q-Q Plot of emailhr for sex = Male (horizontal axis: Observed Value)]
[Figure: Normal Q-Q Plot of emailhr for sex = Female (horizontal axis: Observed Value)]
Detrended Normal Q-Q Plots
[Figure: Detrended Normal Q-Q Plot of emailhr for sex = Male (horizontal axis: Observed Value)]
[Figure: Detrended Normal Q-Q Plot of emailhr for sex = Female (horizontal axis: Observed Value)]
[Figure: Boxplots of emailhr by sex (Female, Male), vertical axis 0 to 50, outlying points labeled by case number]