HW1_SPSS

School: DePaul University
Course: 403
Subject: Statistics
Date: Apr 3, 2024
IT 223 HOMEWORK 1
EXPLORATORY DATA ANALYSIS IN SPSS
Total: 30 Points

The goal of this assignment is to guide you through the exploratory analysis of a dataset using SPSS. You should do the following exercise in SPSS and submit the results of Part 5 as explained at the end of the exercise. This problem should be completed after doing the reading assignments and the practice exercises, and after viewing the SPSS video tutorials on exploratory data analysis.

Data description

The data set gss2004.xls contains 575 observations on 5 variables:

SEX = Respondent's sex {1 for Male, 2 for Female}
AGE = Age of respondent
WWWHR = Hours on the WWW per week for Internet users
NEWS30 = Respondent has used a news site in the past 30 days (1 = "never", 2 = "1-2 times", 3 = "3-5 times", 4 = "more than 5 times")
EMAILHR = Hours of e-mail per week for Internet users

The data were collected from the 2004 General Social Survey for adult respondents (18 years of age or older) living in the United States. The GSS is one of the largest and longest-running projects conducted to monitor social change and the growing complexity of American society (see http://www.norc.org for more information).

The analysis described below will study the number of hours that Internet users spend on email. The study will also explore whether men and women use email differently.

PART 0: Download the data to your hard drive
1) Log in to the course website at http://d2l.depaul.edu
2) Go to Segment 1 (select Content on the top navigation bar).
3) Click on the Datasets link on the left navigation bar of Segment 1 and download the Excel file gss2004.xls to your computer.
4) Open the SPSS program. (If you are running SPSS on the CDM terminals, apply the steps above from the terminal server you are logged on to.)

PART 1: Import an Excel file in SPSS
1) Click on File > Open > Data… under the top menu in SPSS. A dialog box to select files will pop up.
2) Go to the folder where you saved the data file gss2004.xls and select it. You need to search for "xls" data files in the "Files of type" box.
3) Click OK.
4) The data should now appear in an SPSS data worksheet. Save it as a .sav file using the Save As… option.
5) If the data are successfully imported into SPSS, you should have 5 columns of data, which are the variables described above.

PART 2: Define variable properties in the Variable View
1) Click on the Variable View of the SPSS data editor. This view lets you specify properties of the variables in the dataset, such as changing the type, adding a label, etc.
2) Type in meaningful labels for each variable under the Label column (you can use the labels specified above). This step helps you remember what the variables are about.
3) Add value labels under the Values column for SEX and NEWS30. This will help you remember what the codes {1, 2, …} denote. The labels will be used in the SPSS output.
4) Select the correct Measure for each variable. Remember that an ordinal variable has values that can be ranked (e.g., preferences); a nominal variable has values with no ranking (e.g., gender); and a scale variable is a measurement variable that takes numeric values (e.g., salary).

PART 3: Creating a Histogram
Create a histogram of WWWHR: the hours per week spent on the WWW for Internet users.
1) Select Graphs > Legacy Dialogs > Histogram… under the top bar menu.
2) The Histogram dialog box will appear. Select the variable to be analyzed (WWWHR) and click on the ">" arrow button to move the variable into the "Variable" box.
3) Click on the "Titles…" button to add a title to the graph. Just use your intuition to navigate the other screens.
4) Click OK.
5) The following histogram should appear.
6) Double-click on the histogram chart in the Output window and the Chart Editor should appear.
7) To display only non-negative values on the X-axis (WWWHR cannot be negative), click on the X-axis (or go to "Edit > Select X axis" on the Chart Editor menu). Select the "Scale" tab and change the minimum value to 0. Then click OK and close the Chart Editor.
8) To change the histogram intervals, double-click on the histogram bars and select the Binning tab. Check "Custom" and change the interval width to a small number. How does the histogram change?
9) Now try a large interval width. What happens?

PART 4: Compute descriptive statistics for number of hours for email (EMAILHR)

METHOD 1: Simple procedure to compute a few descriptive statistics
1) Select Analyze > Descriptive Statistics > Descriptives…
2) Choose the variables to analyze.
3) Click on the Options button to select the statistics.
4) Click OK.

METHOD 2: BETTER! More statistics, lots of information!
1) Select Analyze > Descriptive Statistics > Explore…
2) Choose the variables to analyze and move them to the "Dependent List" box.
3) Click on the Statistics button and check Percentiles.
4) Click on the Plots button, check Histogram to create a histogram, and uncheck Stem-and-leaf.

Use both methods and compare the results.
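As a rough illustration of the kind of summary Method 2 reports, the sketch below computes a mean, standard deviation and quartiles for a small made-up sample of weekly hours. The values in `hours` are hypothetical (they are not taken from gss2004.xls), and Python's standard `statistics` module is only a stand-in for SPSS here; its default "exclusive" quantile method interpolates at position p(n + 1), which may not agree with every percentile definition SPSS offers.

```python
import statistics

# Hypothetical weekly hours (NOT from gss2004.xls), sorted for readability.
hours = [0, 0, 1, 1, 2, 2, 3, 5, 8, 12, 20, 50]

mean = statistics.mean(hours)
std_dev = statistics.stdev(hours)  # sample standard deviation (divides by n - 1)

# n=4 returns the three cut points Q1, median, Q3 (default method="exclusive").
q1, median, q3 = statistics.quantiles(hours, n=4)

print(f"mean={mean:.2f} sd={std_dev:.2f} min={min(hours)} max={max(hours)}")
print(f"Q1={q1} median={median} Q3={q3} IQR={q3 - q1}")
```

For this sample the quartiles come out as 1.0, 2.5 and 11.0, so the IQR is 10; the mean (about 8.67) sits well above the median because of the single 50-hour value.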
PART 5: TO BE SUBMITTED - Compute the descriptive statistics for email time (EMAILHR) by sex of respondents
1) Select Analyze > Descriptive Statistics > Explore…
2) Move the variable EMAILHR into the Dependent List box.
3) Move the SEX variable into the Factor List box.
4) Click on Statistics to select the statistics to compute.
5) Click on Plots…, select "Factor levels together" under Boxplots, and check Histogram and Normality plots with tests.
6) Click Continue.
7) Click OK in the "Explore" box.
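Conceptually, putting SEX in the Factor List splits the dependent variable into one group per factor level and summarizes each group separately. A minimal sketch of that idea follows; the `records` are invented for illustration, `describe_by_factor` is a hypothetical helper name, and only the value labels (1 = Male, 2 = Female) come from the data description.

```python
from collections import defaultdict
import statistics

# Value labels from the data description: SEX is coded 1 = Male, 2 = Female.
SEX_LABELS = {1: "Male", 2: "Female"}

# Hypothetical (sex_code, emailhr) records, NOT the real survey rows.
records = [(1, 0), (1, 2), (1, 10), (2, 1), (2, 2), (2, 3), (2, 30)]


def describe_by_factor(rows):
    """Group the dependent values by factor code and summarize each group."""
    groups = defaultdict(list)
    for code, value in rows:
        groups[code].append(value)
    return {
        SEX_LABELS[code]: {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for code, values in groups.items()
    }


summary = describe_by_factor(records)
for label, stats in summary.items():
    print(label, stats)
```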
ANSWER THE FOLLOWING QUESTION: Do men spend more time writing/reading emails than women?

Compute the following summary statistics (mean, standard deviation, first quartile, median, third quartile, max and min from SPSS) for the number of weekly hours that men and women spend on emails. Write the statistics in the table below.

Gender   Variable   Mean   St.Dev.   Max   Min
Male     EMAILHR    6.33   9.506     50    0
Female   EMAILHR    5.93   8.884     50    0

Gender   Variable   Median   First quartile   Third quartile
Male     EMAILHR    2        1                8
Female   EMAILHR    2        1                7

1. Analyze the descriptive statistics and graphs for men and women. Do you see any difference in the amount of time that men and women spend on email? Describe the shape, center and spread of the distribution and explain in plain English.

[Figure: Histogram of emailhr for sex = Male. Mean = 6.33, Std. Dev. = 9.506, N = 239]
[Figure: Histogram of emailhr for sex = Female. Mean = 5.93, Std. Dev. = 8.884, N = 336]

SHAPE: The distribution of email time is right skewed (positively skewed) for both men and women. Most people spend fewer than 10 hours per week reading/writing email, but a few spend up to 50 hours per week.

CENTER: The median (Q2) shows that half of the respondents spend 2 hours or less per week on email.

SPREAD: The interquartile range (IQR = Q3 - Q1) describes how spread out the middle half of the data is. The IQR is 7 hours for men (6 hours for women), which is large relative to the median of 2 hours, so email hours differ considerably from one respondent to another. The minimum of 0 and the maximum of 50 hours also show that email time varies widely.

2. Based on the statistics values computed above and on the shape of the distribution, which statistics would you use to summarize the center of the data and why?

Since the distribution is right skewed, I would use the median to describe the center.
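The answer to question 2 rests on the median being much less sensitive than the mean to a few very large values. A minimal sketch of that behaviour, using a made-up sample rather than the survey data (both lists below are hypothetical):

```python
import statistics

# Hypothetical weekly email hours (NOT the survey data).
typical = [0, 1, 1, 2, 2, 3, 8]

# Same sample, but the largest value is replaced by an extreme 50-hour user.
with_extreme = typical[:-1] + [50]

for name, sample in [("typical", typical), ("with_extreme", with_extreme)]:
    print(
        name,
        "mean =", round(statistics.mean(sample), 2),
        "median =", statistics.median(sample),
    )
```

The median stays at 2 in both samples, while the mean rises from about 2.43 to about 8.43, which is the pattern seen in the right-skewed EMAILHR data, where the means (6.33 and 5.93) sit well above the medians (2).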
3. What does the five-number summary tell us about the time spent on email? (Hint: interpret the five-number summary in plain English.)

The five-number summary shows that 25% of the people spend 1 hour or less on email per week, 50% spend 2 hours or less, and 75% spend 8 hours or less (7 hours or less for women). The minimum time spent by both men and women is 0 hours per week (i.e., under an hour), and the maximum is 50 hours per week.

4. What do the boxplot and the normality test show? Explain.

[Figure: Normal Q-Q Plot of emailhr for sex = Male]
[Figure: Normal Q-Q Plot of emailhr for sex = Female]

The Q-Q plots show that email time for both men and women departs from a normal distribution: the points do not fall along the diagonal line, and there are many outliers. Both tests in the Tests of Normality table give Sig. < .001, which points the same way. The boxplot supports this finding: there appear to be outliers beyond about 20 hours of email per week for both men and women.

5. Use the 1.5 x IQR rule to identify possible outliers. List the cutoff points for outliers and show your work. Explain what you found. (Hint: Is there any excessive time spent on email for men, women, or both?)

[Figure: Boxplots of emailhr by sex (Female, Male), with flagged case numbers; o = outliers, * = far outliers]
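The 1.5 x IQR rule from question 5 can be sketched as a small function. The quartiles below are the ones reported in the summary table (Q1 = 1 and Q3 = 8 for men, Q1 = 1 and Q3 = 7 for women); the function name `iqr_fences` is just an illustrative choice.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported in the summary table (weighted-average percentiles).
quartiles = {"Male": (1, 8), "Female": (1, 7)}

for sex, (q1, q3) in quartiles.items():
    lower, upper = iqr_fences(q1, q3)
    # Hours cannot be negative, so a negative lower cutoff flags nothing.
    print(f"{sex}: lower cutoff = {lower}, upper cutoff = {upper}")
```

Note that the Percentiles output also lists Tukey's hinges (7.50 for the male third quartile), so cutoffs based on hinges, which a boxplot may use, can differ slightly from the ones computed here.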
Male:
Upper cutoff = Q3 + (1.5 x IQR) = 8 + (1.5 x 7) = 18.5 (values above 18.5 hours are outliers)
Lower cutoff = Q1 - (1.5 x IQR) = 1 - (1.5 x 7) = -9.5
Because the lower cutoff is negative and time cannot be negative, there are no low outliers. It is, however, unusual to spend more than 18 hours and 30 minutes per week reading or writing email, so a man who spends more than this is not typical.

Female:
Upper cutoff = Q3 + (1.5 x IQR) = 7 + (1.5 x 6) = 16
Lower cutoff = Q1 - (1.5 x IQR) = 1 - (1.5 x 6) = -8
Because the lower cutoff is negative and time cannot be negative, there are no low outliers. It is, however, unusual to spend more than 16 hours per week reading or writing email, so a woman who spends more than this is not typical.

In general, the medians show that men and women spend the same amount of time on email, i.e., 2 hours per week, and the maximum for both is 50 hours. The upper outlier cutoff for men (18.5 hours) is 2.5 hours higher than for women (16 hours), which does not seem like much of a difference.

SUBMISSION INSTRUCTIONS: Copy the relevant SPSS tables, graphs and your answers into a document. Bring a printed copy to class on the due date and also submit it on the course webpage at http://d2l.depaul.edu. Write your name on the document you submit. Keep a copy of all your submissions! If you have questions about the homework, email me BEFORE the deadline. Please pay attention to the due date. No late homework will be accepted.

Graphs

Notes
Output Created: 14-SEP-2023 12:02:57
Comments:
Input Data: C:\Users\AMOHAPAT\Downloads\HW_1.sav
Active Dataset: DataSet1
Filter: <none>
Weight: <none>
Split File: <none>
N of Rows in Working Data File: 575
Syntax:
    GRAPH
      /HISTOGRAM=wwwhr.
Resources, Processor Time: 00:00:03.41
Resources, Elapsed Time: 00:00:01.39

[Figure: Histogram of wwwhr. Mean = 7.16, Std. Dev. = 9.67, N = 575]

Explore

Notes
Output Created: 14-SEP-2023 12:36:50
Comments:
Input Data: C:\Users\AMOHAPAT\Downloads\HW_1.sav
Active Dataset: DataSet1
Filter: <none>
Weight: <none>
Split File: <none>
N of Rows in Working Data File: 575
Missing Value Handling, Definition of Missing: User-defined missing values for dependent variables are treated as missing.
Missing Value Handling, Cases Used: Statistics are based on cases with no missing values for any dependent variable or factor used.
Syntax:
    EXAMINE VARIABLES=emailhr
      /PLOT BOXPLOT HISTOGRAM
      /COMPARE GROUPS
      /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
      /STATISTICS DESCRIPTIVES
      /CINTERVAL 95
      /MISSING LISTWISE
      /NOTOTAL.
Resources, Processor Time: 00:00:00.77
Resources, Elapsed Time: 00:00:00.31

Case Processing Summary
           Valid            Missing          Total
           N     Percent    N     Percent    N     Percent
emailhr    575   100.0%     0     0.0%       575   100.0%

Descriptives (emailhr)
                                             Statistic   Std. Error
Mean                                         6.10        .381
95% Confidence Interval for Mean
    Lower Bound                              5.35
    Upper Bound                              6.84
5% Trimmed Mean                              4.68
Median                                       2.00
Variance                                     83.567
Std. Deviation                               9.142
Minimum                                      0
Maximum                                      50
Range                                        50
Interquartile Range                          6
Skewness                                     2.667       .102
Kurtosis                                     7.696       .203

Percentiles (emailhr)
                                   5      10     25     50     75     90      95
Weighted Average (Definition 1)    .00    .00    1.00   2.00   7.00   18.80   28.00
Tukey's Hinges                                   1.00   2.00   7.00
[Figure: Histogram of emailhr. Mean = 6.1, Std. Dev. = 9.142, N = 575]
[Figure: Boxplot of emailhr, with flagged case numbers]
Explore

Notes
Output Created: 14-SEP-2023 18:52:08
Comments:
Input Data: C:\Users\AMOHAPAT\Downloads\HW_1.sav
Active Dataset: DataSet1
Filter: <none>
Weight: <none>
Split File: <none>
N of Rows in Working Data File: 575
Missing Value Handling, Definition of Missing: User-defined missing values for dependent variables are treated as missing.
Missing Value Handling, Cases Used: Statistics are based on cases with no missing values for any dependent variable or factor used.
Syntax:
    EXAMINE VARIABLES=emailhr BY sex
      /PLOT BOXPLOT HISTOGRAM NPPLOT
      /COMPARE GROUPS
      /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
      /STATISTICS DESCRIPTIVES
      /CINTERVAL 95
      /MISSING LISTWISE
      /NOTOTAL.
Resources, Processor Time: 00:00:03.50
Resources, Elapsed Time: 00:00:01.08
sex

Case Processing Summary (emailhr)
           Valid            Missing          Total
sex        N     Percent    N     Percent    N     Percent
Male       239   100.0%     0     0.0%       239   100.0%
Female     336   100.0%     0     0.0%       336   100.0%

Descriptives (emailhr by sex)
                                     Male                       Female
                                     Statistic   Std. Error     Statistic   Std. Error
Mean                                 6.33        .615           5.93        .485
95% Confidence Interval for Mean
    Lower Bound                      5.12                       4.98
    Upper Bound                      7.54                       6.88
5% Trimmed Mean                      4.85                       4.57
Median                               2.00                       2.00
Variance                             90.364                     78.924
Std. Deviation                       9.506                      8.884
Minimum                              0                          0
Maximum                              50                         50
Range                                50                         50
Interquartile Range                  7                          6
Skewness                             2.658       .157           2.675       .133
Kurtosis                             7.585       .314           7.834       .265

Percentiles (emailhr by sex)
                                   sex      5      10     25     50     75     90      95
Weighted Average (Definition 1)    Male     .00    .00    1.00   2.00   8.00   20.00   30.00
                                   Female   .00    .00    1.00   2.00   7.00   15.90   28.00
Tukey's Hinges                     Male                   1.00   2.00   7.50
                                   Female                 1.00   2.00   7.00

Tests of Normality (emailhr)
           Kolmogorov-Smirnov (a)         Shapiro-Wilk
sex        Statistic   df    Sig.         Statistic   df    Sig.
Male       .253        239   <.001        .649        239   <.001
Female     .262        336   <.001        .651        336   <.001
a. Lilliefors Significance Correction
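The Std. Error and confidence-interval columns in the Descriptives table by sex can be cross-checked from the reported mean, standard deviation and N. The sketch below does this with a normal approximation (z of about 1.96), which is an assumption made here for simplicity: statistical packages typically use a t critical value, and the inputs are the rounded figures printed in the table, so agreement is only expected to about two decimals.

```python
import math
from statistics import NormalDist

# (mean, standard deviation, n) as printed in the Descriptives table by sex.
reported = {"Male": (6.33, 9.506, 239), "Female": (5.93, 8.884, 336)}


def standard_error(std_dev, n):
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)


def approx_ci(mean, std_dev, n, level=0.95):
    """Normal-approximation interval (an assumption; not a t-based interval)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error(std_dev, n)
    return mean - half_width, mean + half_width


for sex, (mean, sd, n) in reported.items():
    low, high = approx_ci(mean, sd, n)
    print(f"{sex}: SE = {standard_error(sd, n):.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The two intervals overlap almost entirely (roughly 5.12 to 7.54 for men and 4.98 to 6.88 for women), which is consistent with the conclusion that the groups do not differ much in average email time.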
emailhr

Histograms
[Figure: Histogram of emailhr for sex = Male. Mean = 6.33, Std. Dev. = 9.506, N = 239]
[Figure: Histogram of emailhr for sex = Female. Mean = 5.93, Std. Dev. = 8.884, N = 336]

Normal Q-Q Plots
[Figure: Normal Q-Q Plot of emailhr for sex = Male]
[Figure: Normal Q-Q Plot of emailhr for sex = Female]
Detrended Normal Q-Q Plots
[Figure: Detrended Normal Q-Q Plot of emailhr for sex = Male]
[Figure: Detrended Normal Q-Q Plot of emailhr for sex = Female]

[Figure: Boxplots of emailhr by sex (Female, Male), with flagged case numbers]