Demo Questions and Solutions V63 (1)

School

Carleton University

Course

102

Subject

Statistics

Date

Feb 20, 2024

Uploaded by MegaRiver12287

DEMO QUESTIONS and Solutions: STATISTICS 151 LAB

Please title, label, and include appropriate scales on graphs, charts, plots and tables.

1. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column ALBBEST, a variable that records which Alberta political party students think had the best platform (Conservative, Green, Liberal, NDP).

a) What type of variable is ALBBEST?

Categorical (not ordinal)

b) Use SOFTWARE to tally the counts and percents for the various outcomes for this data. Fill in the chart below.

SOFTWARE OUTPUT

Party          Count   Percent
Conservative   21      35.0000%
Green          5       8.3333%
Liberal        5       8.3333%
NDP            29      48.3333%   (HIGHEST COUNT) (MODE)
Total          60      99.9999%

c) Use the information to help you fill in the table below (by hand).

Party          Frequency   Relative Frequency (f/n) (Proportion)
Conservative   21          21/60 = 0.3500
Green          5           5/60 = 0.0833
Liberal        5           5/60 = 0.0833
NDP            29          29/60 = 0.4833
TOTAL          60          0.9999

The totals read 99.9999% and 0.9999 rather than 100% and 1 only because each entry is rounded to four decimal places.
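The tally in parts b) and c) can be checked with a few lines of code. The following is a minimal sketch in Python, not the lab's own SOFTWARE. It assumes the four counts are typed in from the output above instead of being read from the survey file, and the function names relative_frequencies and modes are made up for illustration.

```python
from collections import Counter

# Counts copied from the software output for ALBBEST (n = 60).
# Assumption: the raw survey file is not read here.
counts = Counter({"Conservative": 21, "Green": 5, "Liberal": 5, "NDP": 29})


def relative_frequencies(counts):
    """Return each category's proportion f/n."""
    n = sum(counts.values())
    return {category: f / n for category, f in counts.items()}


def modes(counts):
    """Return every category tied for the highest count, in alphabetical order."""
    highest = max(counts.values())
    return sorted(c for c, f in counts.items() if f == highest)


n = sum(counts.values())
rel = relative_frequencies(counts)

# Print a frequency table: count, proportion, and percent for each party.
print(f"{'Party':<14}{'Count':>6}{'Proportion':>12}{'Percent':>10}")
for party in sorted(counts):
    print(f"{party:<14}{counts[party]:>6}{rel[party]:>12.4f}{100 * rel[party]:>9.4f}%")
print(f"{'Total':<14}{n:>6}{sum(rel.values()):>12.4f}{100 * sum(rel.values()):>9.4f}%")
print("Mode(s):", modes(counts))
```

Because the proportions are summed before rounding, the total row here comes out as 1.0000, while the hand table sums already-rounded entries and gets 0.9999. The modes function returns a list so that a tie for the highest count reports every tied category.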

d) Use the information above to draw a frequency bar chart and a percent bar chart describing this data.

[Frequency Bar Chart]

[Percent Bar Chart]

e) The mode is the outcome (choice) of a categorical variable that occurs most often (i.e. has the highest count). In this case, the mode for the ALBBEST column is NDP, as that bar on the graphs has the highest count (29 students).

2. Recently, statistics students at MacEwan completed a survey. Raw data can be found in the file STATISTICSSTUDENTSURVEYFORR. It contains the column LIKEENGLISH, a categorical ordinal variable that measures how much students like studying English at school (1_Dislike_Very_Much, 2_Dislike, 3_Neutral, 4_Like, 5_Like_Very_Much).

a) What kind of variable is LIKEENGLISH?

Categorical Ordinal

b) Use SOFTWARE to create a bar chart and a pie chart to illustrate the counts of the data for the various choices in the LIKEENGLISH variable. Paste them below.

c) The mode is the most common outcome (choice) of a categorical variable. This outcome has the highest bar in its bar chart. Here, we have a tie, as both 2_Dislike and 3_Neutral have the same count (and the same bar height in the bar chart).

d) Do you prefer the bar chart or the pie chart for illustrating this data? Explain your choice.

The bar chart works nicely for ordinal data, as most people readily think from least to most (or lowest to highest, or 1_Dislike_Very_Much to 5_Like_Very_Much). With the pie chart, it takes a bit more cognitive work to move among the outcomes (choices) of the categorical variable.

3. Allan recently recorded the raw scores obtained by 20 of his students on a calculus midterm. You are interested in the number of students in the following classes (also called intervals or bins): [40,45), [45,50), [50,55), [55,60), [60,65), [65,70). Data can be found in the file ALLANSMIDTERMDATA.

a) A [ bracket means that the left end of the interval is closed, while a ) bracket means that the right end of the interval is open. For example, a value of exactly 45 would be assigned to the interval [45,50). To what interval would a value of 60 be assigned?

[60,65)

b) Cutpoints are the values at the left [ edge and the right ) edge of each interval. For example, the cutpoints for the interval [45,50) are 45 and 50. What are the cutpoints for the interval [60,65)?

60 and 65

c) The midpoint of an interval is the point in the middle of the interval. For example, the midpoint of the interval [40,45) is 42.5. What is the midpoint of the interval [60,65)?

62.5

R, by default, makes intervals of the form (a,b]. The default used by the textbook (and in lecture) when making a histogram by hand is intervals of the form [a,b). The histograms will be similar, but not exactly the same, depending on whether they are generated by R or by hand.

d) Use SOFTWARE to make a frequency histogram of this data.
The software will automatically choose the cutpoints at the edges of each interval and the midpoints in the middle of each interval. In this case, it happens to choose the ones that interest Allan. (This is not always the case!) Allan’s midterm data is pasted below. Students can verify that R uses intervals of the form (a,b]. That is, R uses the intervals (40,45], (45,50], (50,55], (55,60], (60,65], (65,70].
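The difference between the [a,b) and (a,b] conventions only matters for values that land exactly on a cutpoint. The following is a minimal sketch in Python, not R and not the lab's SOFTWARE, that assigns values to intervals under either convention. The function names and the list hypothetical_scores are made up for illustration. The scores are NOT Allan's data, and the sketch applies the (a,b] rule strictly, which may not match every detail of R's own histogram defaults.

```python
from bisect import bisect_left, bisect_right

CUTPOINTS = [40, 45, 50, 55, 60, 65, 70]  # cutpoints from Question 3


def assign_interval(value, cutpoints, left_closed=True):
    """Return the (lower, upper) cutpoints of the interval holding value, or None.

    left_closed=True  uses intervals of the form [a, b).
    left_closed=False uses intervals of the form (a, b].
    """
    if left_closed:
        i = bisect_right(cutpoints, value) - 1  # a value equal to a cutpoint moves up
    else:
        i = bisect_left(cutpoints, value) - 1   # a value equal to a cutpoint stays down
    if i < 0 or i >= len(cutpoints) - 1:
        return None  # value lies outside every interval
    return (cutpoints[i], cutpoints[i + 1])


def midpoint(interval):
    """Return the point halfway between the two cutpoints."""
    lower, upper = interval
    return (lower + upper) / 2


def bin_counts(values, cutpoints, left_closed=True):
    """Count how many values fall in each interval, in cutpoint order."""
    counts = {(lo, hi): 0 for lo, hi in zip(cutpoints, cutpoints[1:])}
    for v in values:
        interval = assign_interval(v, cutpoints, left_closed)
        if interval is not None:
            counts[interval] += 1
    return counts


# Hypothetical scores chosen so that several sit exactly on a cutpoint.
hypothetical_scores = [42, 45, 45, 50, 58, 60, 60, 64, 69]

print(assign_interval(60, CUTPOINTS))                     # [a, b) convention
print(assign_interval(60, CUTPOINTS, left_closed=False))  # (a, b] convention
print(midpoint((60, 65)))
print(bin_counts(hypothetical_scores, CUTPOINTS))
print(bin_counts(hypothetical_scores, CUTPOINTS, left_closed=False))
```

With these made-up scores, the two conventions give the same total but different bar heights, because every score of exactly 45, 50 or 60 shifts one interval to the left under (a,b]. This is why a histogram drawn by hand and one drawn by R can differ slightly on the same data.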