Basic Statistics_Parker
School: Liberty University
Course: EDCO 735
Subject: Statistics
Date: Jan 9, 2024
Uploaded by: azff1989
BASIC STATISTICS
Basic Statistics
School of Behavioral Sciences, Liberty University
Author Note
I have no known conflict of interest to disclose.
Correspondence concerning this article should be addressed to.
Email:@liberty.edu
Basic Statistics
Central Tendency
For variables whose data represent a number or amount (called quantitative variables),
information can be condensed through three basic descriptive statistics: the mean, median,
and mode. These descriptive statistics make up the measures of central tendency. They are
typically reported alongside the minimum and maximum values, the range, the variance, and
the standard deviation, which describe variability (Warner, 2020). An outlier is a data point
that deviates greatly from most of the dataset, being either significantly larger or smaller,
which in turn complicates the interpretation of variable relationships (Leys et al., 2019).
An example scenario where the measures of central tendency are skewed because of
outliers could be test scores where most scores were in the 80s and 90s, but one person
gets a 40 on the test. The score of 40 will pull the mean down, resulting in a mean
that is lower than what most of the students scored. Another example would be examining the
salaries of area fire departments to determine the average salary. Most salaries range from
$65,000 to $75,000, yet in a more prominent city, the comparable salary is $220,000. When the
average is calculated, the mean will be skewed in the opposite direction from the test scores:
the mean salary will be higher than the typical salary. These
outliers create an inaccurate representation of the typical value.
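The effect of a single low score can be sketched in a few lines of Python. The scores below are hypothetical values chosen to match the scenario (most in the 80s and 90s, one score of 40); they are not data from any study.

```python
import statistics

# Hypothetical test scores: seven typical scores plus one outlier of 40.
scores = [85, 88, 90, 92, 87, 91, 89, 40]
typical_scores = [x for x in scores if x != 40]

mean_with_outlier = statistics.mean(scores)      # 82.75
median_with_outlier = statistics.median(scores)  # 88.5
mean_without_outlier = statistics.mean(typical_scores)

print("Mean with outlier:   ", mean_with_outlier)
print("Median with outlier: ", median_with_outlier)
print("Mean without outlier:", round(mean_without_outlier, 2))
```

In this sketch the mean falls below every typical score, while the median stays among them, which is why the median is often reported when outliers are present.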
Detecting outliers is crucial, but deciding whether to exclude them from the dataset is
equally essential. An extreme data point does not automatically qualify as an outlier requiring
removal; it might simply be a legitimate extreme value. There are a few ways to identify a
skewed measure of central tendency. Data plots (such as scatter plots or box-and-whisker plots)
can reveal outliers visually. If a Z-score is calculated for each data point, then data points
with high absolute Z-scores can be flagged as outliers (Obaid Merza & Jasim Al-Anber, 2021).
Once an outlier has been identified, it can be addressed. A mathematical algorithm can be used
to identify patterns among detected outliers. Values that are clearly data entry errors can be
corrected or removed. Robust statistical methods and models, such as robust regression, can
reduce the impact of outliers on parameter estimation (Leys et al., 2019; Warner, 2020).
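A minimal sketch of Z-score screening follows. The function name `find_outliers`, the hypothetical scores, and the cutoff of 2.0 are all assumptions made for illustration; the paper does not specify a cutoff, and in very small samples a stricter cutoff such as 3 may be impossible to exceed.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose absolute Z-score exceeds the threshold.

    The threshold is an illustrative assumption, not a fixed rule.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (N - 1)
    return [x for x in values if abs((x - mean) / sd) > threshold]

# Hypothetical test scores with one very low score.
scores = [85, 88, 90, 92, 87, 91, 89, 40]
flagged = find_outliers(scores, threshold=2.0)
print("Flagged as possible outliers:", flagged)
```

A flagged value is only a candidate; whether it is removed, corrected, or kept remains a judgment call, as described above.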
The Sum of Squares
The sum of squares (SS) measures the extent to which the individual data points in a
dataset deviate from the dataset's mean (Warner, 2020). Going back to the previous
example of test scores, the SS is used to calculate the variance, which tells us how much
the test scores vary or spread out. Looking at the scores (minus the outlier), the mean is
calculated and then subtracted from each individual data point. Each resulting deviation is
then squared. Finally, adding the squared deviations together provides the sum of squares.
A larger SS indicates greater variability, meaning that the data points are spread out over a
broader range; a smaller SS indicates that the data points are closer to the mean (Finding and
Using Health Statistics, n.d.).
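The steps described above can be sketched as follows. The scores are hypothetical values standing in for the test-score example with the outlier removed, and dividing SS by N − 1 to obtain the sample variance is the usual convention, assumed here rather than stated in the text.

```python
import statistics

def sum_of_squares(values):
    """Sum of squared deviations of each value from the mean."""
    mean = statistics.fmean(values)              # step 1: compute the mean
    deviations = [x - mean for x in values]      # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]       # step 3: square each deviation
    return sum(squared)                          # step 4: add them up

# Hypothetical test scores with the outlier removed.
scores = [85, 88, 90, 92, 87, 91, 89]
ss = sum_of_squares(scores)
variance = ss / (len(scores) - 1)  # sample variance from SS

print("SS:", round(ss, 3))
print("Sample variance:", round(variance, 3))
```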
The SS cannot be a negative number. Squaring a negative number makes it
positive, so squaring the deviations prevents positive and negative
deviations from offsetting each other and adding up to zero. After the deviations are squared
and summed, the resulting non-negative number is the sum of squares. When all data points in
the dataset are the same, SS equals zero because there is no variability. In this scenario,
every data point matches the mean, so every squared deviation is zero and the SS is zero as
well (Statistics Canada, 2021).
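Written out as a worked equation, the argument looks like this. The symbols X_i (an individual score) and N (the number of scores) are notation assumed here for illustration; M is the sample mean.

```latex
% Raw deviations from the mean always cancel out:
\sum_{i=1}^{N} (X_i - M) = \sum_{i=1}^{N} X_i - N M = N M - N M = 0

% Squared deviations cannot cancel, so SS is never negative:
SS = \sum_{i=1}^{N} (X_i - M)^2 \ge 0

% SS is zero only when every score equals the mean:
SS = 0 \iff X_i = M \text{ for all } i
```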
Sampling Distribution
A sampling distribution for a statistic is formed by drawing a large number
of samples from a specific population and computing the statistic for each sample
(Eidous & Abu-Shareefa, 2019). The collection of M
values from numerous samples of the same size drawn from the same population is called the
sampling distribution of M (Warner, 2020). This distribution shows how likely each value of a
statistic is when samples are taken repeatedly from a specific population. It provides a picture
of what outcomes to expect for a sample statistic, such as the mean or mode. The
differences between the M values and the mean of the sampling distribution offer insights into
the sizes of prediction errors across numerous samples (Warner, 2020).
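A small simulation can illustrate how a sampling distribution of M is built. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 100 and standard deviation of 15, and the function name `sample_means`, the sample size, and the number of samples are illustrative choices.

```python
import random
import statistics

def sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's mean M."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: mean 100, standard deviation 15.
means = sample_means(mu=100, sigma=15, n=25, num_samples=5000)

# The mean of the M values should land close to the population mean,
# and their spread should be close to sigma / sqrt(n) = 15 / 5 = 3.
print("Mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 2))
```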
Standard Deviation
In statistics, σ represents the population standard deviation, reflecting the extent of the
distribution's spread (Warner, 2020). Understanding standard deviation contributes to a researcher's
comprehension of the theoretical sampling distribution in terms of its characteristics and shape
by helping them gauge the degree of variability and spread in the population (Lind et al., 2019).
This knowledge aids in predicting and interpreting the shape, central tendency, and variability of
the sampling distribution. This is especially true when applying the Central Limit Theorem,
calculating standard errors, and making inferences about population parameters. When dealing
with a continuous outcome variable in a study, it is essential to estimate the inherent standard
deviation (σ) of the observational errors to make sense of the findings (Walter et al., 2021).
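The link between the standard deviation and the standard error can be sketched as below. The scores are hypothetical, and because σ is rarely known in practice, the sketch assumes the sample standard deviation s is used as its estimate.

```python
import math
import statistics

# Hypothetical test scores.
scores = [85, 88, 90, 92, 87, 91, 89]
n = len(scores)

m = statistics.fmean(scores)
ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
s = math.sqrt(ss / (n - 1))             # sample standard deviation
se = s / math.sqrt(n)                   # standard error of the mean

print("Sample SD:", round(s, 3))
print("Standard error of M:", round(se, 3))
```

The standard error is smaller than the standard deviation because it describes the spread of sample means rather than the spread of individual scores.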
T Distribution and Z Distribution
The normal (Z) distribution, with its characteristic bell-shaped curve, is fundamental in
mathematical statistics. The normal distribution is applicable in regression analysis,
experimental design, and approximating other distributions such as the binomial and
hypergeometric (Mahbobi & Tiemann, 2020). It is a crucial aspect of probability theory, often
supported by central limit theorems: as sample size increases, the sampling distribution of the
mean tends toward normal, even when the population distribution is not normal. Many phenomena
and measurement errors adhere to or resemble a normal distribution (Eidous & Abu-Shareefa, 2019).
The standard normal
distribution does not change; it always has a mean of 0 and a standard deviation of 1.
The t-distribution is a modified Z-distribution. It can be generated by repeatedly sampling
from a normal population with the same sample size and calculating the t-statistic for each
sample. To picture a t-distribution, imagine lowering the peak of the normal distribution and
letting its tails become thicker; this corrects problems that arise when the sample standard
deviation is used to estimate the population standard deviation (Warner, 2020). The
t-distribution is often used when dealing with small sample sizes or when the population
standard deviation is unknown, and it is commonly used in hypothesis testing and in
constructing confidence intervals for the population mean. This makes the t-distribution more
flexible than the Z-distribution in those situations (Eidous & Abu-Shareefa, 2019).
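The thicker tails can be seen in a short simulation that follows the description above: draw many small samples from a normal population and compute a t-statistic for each. The function name `simulate_t`, the sample size of 5, and the number of samples are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist

def simulate_t(n, num_samples, mu=0.0, sigma=1.0, seed=0):
    """Return t-statistics from repeated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    t_values = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)  # s estimates the unknown sigma
        t_values.append((m - mu) / (s / math.sqrt(n)))
    return t_values

# Cutoff that leaves 5% in the two tails of the standard normal (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

t_values = simulate_t(n=5, num_samples=20000)
tail_share = sum(abs(t) > z_crit for t in t_values) / len(t_values)

# Under the Z-distribution this share would be 0.05; with N = 5 it is larger.
print("Share of |t| beyond the Z cutoff:", round(tail_share, 3))
```

Because more than 5% of the simulated t-statistics fall beyond the Z cutoff, using Z critical values with a small sample would understate the uncertainty.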
How Sample Size Affects the Confidence Interval
The sample size (N) directly influences the width and precision of a confidence
interval (CI). As N increases, more information is available, which reduces the margin of error
and decreases the width of the confidence interval. Increasing N generally leads to a more
precise confidence interval with a smaller range of possible values, better approximating the
population parameter and reducing sampling error. Estimates from larger samples are less
sensitive to the particular cases that happen to be drawn. In contrast, a smaller N results in
wider CIs, and the estimates can be influenced more by the specific characteristics of the
sample (Lakens, 2022).
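The relationship between N and CI width can be sketched with a simple calculation. This example assumes a Z-based interval with a known population standard deviation (a hypothetical σ of 15); the function name `ci_width` is illustrative, and with an unknown σ a t-based interval would be used instead.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence=0.95):
    """Width of a Z-based confidence interval for the mean (sigma known)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin_of_error = z * sigma / math.sqrt(n)
    return 2 * margin_of_error

# Hypothetical population standard deviation of 15.
for n in (25, 100, 400):
    print("N =", n, "-> CI width:", round(ci_width(15, n), 2))
```

Because the margin of error depends on the square root of N, quadrupling the sample size only halves the width of the interval.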
References
Eidous, O. M., & Abu-Shareefa, R. (2019). New Approximations for Standard Normal
Distribution Function. Communications in Statistics - Theory and Methods, 49(6), 1357–1374.
https://doi.org/10.1080/03610926.2018.1563166
Finding and Using Health Statistics. (n.d.). Www.nlm.nih.gov.
https://www.nlm.nih.gov/oet/ed/stats/02-900.html#:~:text=A%20standard%20deviation%20close%20to
Lakens, D. (2022). Sample Size Justification. Collabra: Psychology, 8(1), 33267.
https://doi.org/10.1525/collabra.33267
Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and
Manage Univariate and Multivariate Outliers, With Emphasis on Pre-registration.
International Review of Social Psychology, 32(1).
https://psycnet.apa.org/doi/10.5334/irsp.289
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2019). Basic Statistics for Business &
Economics. Mcgraw-Hill Education.
Mahbobi, M., & Tiemann, T. K. (2020). Chapter 2. The Normal and T-Distributions.
Pressbooks.nscc.ca.
https://pressbooks.nscc.ca/introductorybusinessstatistics/chapter/the-normal-and-t-distributions-2/
Obaid Merza, E., & Jasim Al-Anber, N. (2021). Fast Ways to Detect Outliers. Journal of
Techniques, 3(1), 66–73.
https://www.iasj.net/iasj/download/6e11d19b03519a29
Statistics Canada. (2021, September 2). Statistics: Power from Data! Variance and Standard
Deviation. Statcan.gc.ca.
https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch12/5214891-eng.htm
Walter, S. D., Rychtář, J., Taylor, D., & Balakrishnan, N. (2021). Estimation of Standard
Deviations and Inverse‐Variance Weights From an Observed Range. Statistics in Medicine,
41(2), 242–257.
https://doi.org/10.1002/sim.9233
Warner, R. M. (2020). Applied Statistics I: Basic Bivariate Techniques (3rd ed.).
Sage Publications, Inc. (US).
https://mbsdirect.vitalsource.com/books/9781506352817