Notes2-Descriptive_SS

School

University of Wisconsin, Madison

Course

324

Subject

Statistics

Date

Feb 20, 2024

Statistics University of Wisconsin Madison – Chelsey Green chelseygreen@wisc.edu

NOTES 2: A FIRST LOOK AT RSTUDIO AND DESCRIPTIVE STATISTICS

Graphical and numeric summaries make overall trends in data more apparent. The most appropriate graphical and numeric summaries depend on the type and amount of data you have. We only have time to look at a subset of the available summary techniques in this course, so we will focus on the most common.

Oxide Layer Thicknesses Example: Computer chips contain electronic circuits and are sealed with a thin layer of silicon dioxide. The manufacturer considered using recycled silicon wafers instead of new ones to reduce cost. Oxide thickness measurements (in Angstroms, Å) from 18 test runs using new wafers are given below:

90.0, 92.2, 94.9, 92.7, 91.6, 88.2, 92.0, 98.2, 96.0, 91.1, 89.8, 91.5, 91.5, 90.6, 93.1, 88.9, 92.5, 92.4

Oxide Layer Thicknesses Example (e): Create or access the Notes2 R Markdown file. Save the file into your Stats folder. Define a vector named Thickness to store the 18 observations. Re-save the Thickness vector ordered from smallest to largest so it is easier to look at.

SELECTING APPROPRIATE GRAPHICAL SUMMARIES: How many variables, what type of data, and how many observations do you have?

Summarizing 1 Variable
- Numeric/Quantitative Data:
    Large Data: Histograms, Box plots
    Small Data: Stem-and-Leaf, Dot Plot
- Categorical/Qualitative Data: Bar Charts, Pareto Charts, Pie Charts*, frequency table
* Pie Charts are most useful when there are only a few categories and there is a large distinction between percentages
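For part (e) of the Oxide Layer Thicknesses Example, one possible way to define and re-save the vector is sketched below. Only the name Thickness comes from the notes; the use of c() and sort() is one reasonable choice, not necessarily the approach taken in class.

```r
# Oxide thickness measurements (in Angstroms) from the 18 test runs
Thickness <- c(90.0, 92.2, 94.9, 92.7, 91.6, 88.2, 92.0, 98.2, 96.0,
               91.1, 89.8, 91.5, 91.5, 90.6, 93.1, 88.9, 92.5, 92.4)

# Re-save the vector ordered from smallest to largest
Thickness <- sort(Thickness)

# Check the number of observations and look at the ordered values
length(Thickness)
Thickness
```

Because sort() returns a new vector rather than changing the original, the result has to be assigned back to Thickness for the ordering to be kept.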

Summarizing More Than 1 Variable
- 2 Numeric/Quantitative values on each subject: Scatterplot
- Single Numeric Data on 2 or more groups: Comparative Histograms or Box Plots
- Categorical Data on 2 or more groups: Contingency table, mosaic plot

GRAPHING QUANTITATIVE DATA

Dot Plot: chart with a number line and a point for each datum above the line at its value. Repeated values are often stacked.
a. Draw a number line
b. Draw a dot above the number line at the value of each datum
In R: stripchart(x, method = "stack", …)

Histogram: chart that displays the frequency, percentage, or density of measurements falling into each range of values, using rectangles with heights equal to the frequency, percentage, or density respectively.
a. Divide the range (difference between the maximum and minimum measurement) by the number of class intervals desired (usually 5-20 intervals) and round to get a convenient width for each class interval. (Equal bins are most commonly used)
b. Compute the frequency or relative frequency of measurements falling into each class interval (set up a convention for values that fall on a boundary)
c. Density Histogram: Compute the density = (relative freq)/(width of bin) of measurements falling into each class interval
d. Divide up an x axis according to the class intervals chosen and construct rectangles with heights according to frequency, relative frequency, or density.
*For discrete data with only a few values, rectangles are often centered at the individual values
In R: hist(x, breaks = "Sturges", freq = NULL, probability = !freq, include.lowest = TRUE, …)
*Notice, by default R puts the values that land on breaks into the lower bin.
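The histogram steps (a) through (d) can be worked by hand in R for the oxide thickness data, as a rough sketch. The breaks every 2 Angstroms from 88 to 100 are an arbitrary choice made for this illustration, and the variable names (breaks, bins, freq, rel_freq, density) are not from the notes.

```r
# Oxide thickness measurements (in Angstroms) from the 18 test runs
Thickness <- c(90.0, 92.2, 94.9, 92.7, 91.6, 88.2, 92.0, 98.2, 96.0,
               91.1, 89.8, 91.5, 91.5, 90.6, 93.1, 88.9, 92.5, 92.4)

# (a) The range is 98.2 - 88.2 = 10, so breaks every 2 Angstroms
#     from 88 to 100 give 6 equal-width classes (an assumed choice)
breaks <- seq(88, 100, by = 2)
width  <- diff(breaks)

# (b) Frequency and relative frequency in each class.
#     right = TRUE and include.lowest = TRUE mirror the hist() defaults:
#     a value that lands on a break is counted in the lower bin
bins     <- cut(Thickness, breaks = breaks, right = TRUE, include.lowest = TRUE)
freq     <- as.vector(table(bins))
rel_freq <- freq / length(Thickness)

# (c) Density = relative frequency / bin width
density <- rel_freq / width

data.frame(bin = levels(bins), freq, rel_freq, density)

# (d) hist() does the same bookkeeping; plot = FALSE returns the numbers only
h <- hist(Thickness, breaks = breaks, plot = FALSE)
all.equal(h$counts, freq)
all.equal(h$density, density)

# The rectangle areas of a density histogram add up to 1
sum(density * width)
```

Note that 90.0, 92.0, and 96.0 each sit exactly on a break, so they show where the boundary convention matters: with the defaults they are counted in the class that ends at that value.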

Boxplot: graphic that displays the 5 number summary and outlying values in a box with extending lines.
a. Draw and label a vertical or horizontal axis that spans the range of the data
b. Draw longer lines at Q1, Median, Q3 perpendicular to the axis
c. Connect the ends of Q1 and Q3 to create a box (and give a visual display of the IQR)
d. Identify any point outside [Q1-1.5*IQR, Q3+1.5*IQR] as an outlier and plot each outlier on the axis with a dot. (This is default R behavior, but can be adjusted)
e. Draw lines from the box to the largest non-outlier and from the box to the smallest non-outlier
In R: boxplot(x, …) or boxplot(y~grp, …)

Oxide Layer Thicknesses Example (f): Construct a dot plot, frequency histogram, relative frequency histogram, and density histogram for the data to summarize the numeric observations. Compare the tools and explain how changing the number of classes/bins affects the histograms' appearance.

SUMMARIZING SHAPE OF QUANTITATIVE DATA

Graphing numeric data allows us to see the shape of the data.

Symmetric Data: upper and lower half of the data have approximately the same shape
E.g.: repeated measurement of the same thing
*Mean ≈ median with symmetric data
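A minimal sketch of part (f) and the boxplot steps for the oxide thickness data follows. The axis label, plot titles, and the bin counts of 4 and 12 are assumptions made for illustration. The relative frequency histogram is drawn by rescaling the counts of a histogram object, which is one common workaround rather than a dedicated hist() option.

```r
# Oxide thickness measurements (in Angstroms), ordered smallest to largest
Thickness <- sort(c(90.0, 92.2, 94.9, 92.7, 91.6, 88.2, 92.0, 98.2, 96.0,
                    91.1, 89.8, 91.5, 91.5, 90.6, 93.1, 88.9, 92.5, 92.4))
xlab <- "Oxide thickness (Angstroms)"  # label chosen for this sketch

# Dot plot: repeated values (91.5 appears twice) are stacked
stripchart(Thickness, method = "stack", pch = 16, xlab = xlab)

# Frequency histogram (the default) and density histogram (freq = FALSE)
hist(Thickness, xlab = xlab, main = "Frequency histogram")
hist(Thickness, freq = FALSE, xlab = xlab, main = "Density histogram")

# Relative frequency histogram: rescale the counts, then plot the object
h <- hist(Thickness, plot = FALSE)
h$counts <- h$counts / sum(h$counts)
plot(h, xlab = xlab, ylab = "Relative frequency",
     main = "Relative frequency histogram")

# Fewer vs. more bins; a single number for breaks is only a suggestion to R
hist(Thickness, breaks = 4, xlab = xlab, main = "About 4 bins")
hist(Thickness, breaks = 12, xlab = xlab, main = "About 12 bins")

# Boxplot; the returned list holds the values that were drawn
b <- boxplot(Thickness, horizontal = TRUE, xlab = xlab)
b$stats  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
b$out    # points plotted as outliers

# Step (d) by hand with quantile(); boxplot() uses hinges from fivenum(),
# so its Q1 and Q3 can differ slightly from these values
q      <- quantile(Thickness, c(0.25, 0.75))
iqr    <- IQR(Thickness)
fences <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)
outliers <- Thickness[Thickness < fences[1] | Thickness > fences[2]]
outliers

# Shape check: mean and median are close when the data are roughly symmetric
mean(Thickness)
median(Thickness)
```

Comparing the two histograms with different breaks shows how fewer bins smooth over detail while more bins expose gaps, and comparing mean(Thickness) with median(Thickness) gives a quick numeric check on how much the largest values pull the mean upward.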
