Basic Statistics_Parker

School

Liberty University

Course

EDCO 735

Subject

Statistics

Date

Jan 9, 2024

BASIC STATISTICS

Basic Statistics

School of Behavioral Sciences, Liberty University

Author Note

I have no known conflict of interest to disclose. Correspondence concerning this article should be addressed to. Email:@liberty.edu

Basic Statistics

Central Tendency

For a variable whose values represent a number or amount (a quantitative variable), information can be condensed using three basic descriptive statistics: the mean, the median, and the mode. Together these make up the measures of central tendency. Other descriptive statistics include the minimum and maximum values, the range, the variance, and the standard deviation (Warner, 2020). An outlier is a data point that deviates greatly from most of the dataset, being either much larger or much smaller than the other values, and it can complicate the interpretation of relationships among variables (Leys et al., 2019). One scenario in which an outlier skews the measures of central tendency is a set of test scores where most students scored in the 80s and 90s but one student scored 40. The score of 40 pulls the mean down, producing a mean that is lower than what most of the students actually scored. Another example is a comparison of salaries across area fire departments. Most salaries range from $65,000 to $75,000, yet in a larger city the same position pays $220,000. When the average is calculated, this outlier skews the mean in the opposite direction from the test-score example: the mean salary will be higher than what a typical firefighter in the area earns. In both cases, the outlier makes the mean an inaccurate representation of the typical value. Detecting outliers is crucial, but deciding whether to exclude them from the dataset is equally important. An extreme data point does not automatically qualify as an outlier requiring removal; it might simply be a legitimate extreme value.

There are several ways to identify outliers that may be skewing the measures of central tendency. Data plots, such as scatter plots or box-and-whisker plots, can reveal outliers visually. Alternatively, a z-score can be calculated for each data point, and data points with high absolute z-scores can be flagged as outliers (Obaid Merza & Jasim Al-Anber, 2021). Once an outlier has been identified, it can be addressed. A mathematical algorithm can be used to identify patterns among the detected outliers. Values that are obviously erroneous, such as data entry errors, can be removed. Robust statistical methods and models, such as robust regression, can reduce the impact of outliers on parameter estimates (Leys et al., 2019; Warner, 2020).

The Sum of Squares

The sum of squares (SS) measures the extent to which the individual data points in a dataset deviate from the dataset's mean (Warner, 2020). Returning to the test-score example, the SS is used to calculate the variance (for a sample, the variance equals SS divided by N - 1), which describes how much the test scores vary or spread out. Using the scores without the outlier, first calculate the mean and subtract it from each individual data point. Next, square each of these deviations. Finally, add the squared deviations together to obtain the sum of squares. A larger SS indicates greater variability, meaning the data points are spread over a broader range, while a smaller SS indicates that the data points lie closer to the mean (Finding and Using Health Statistics, n.d.). The SS cannot be negative. Squaring a negative deviation makes it positive, so squaring prevents positive and negative deviations from offsetting each other; the raw deviations from the mean always add up to zero. After the deviations are squared and summed, the result, the sum of squares, is zero or a positive number. When all data points in the dataset are the same, SS equals zero because there is no variability: every data point matches the mean, each squared difference is zero, and so the SS is zero as well (Statistics Canada, 2021).
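
The test-score and salary examples in the Central Tendency section can be made concrete with a minimal Python sketch. The numbers below are hypothetical, chosen only to match the pattern the text describes (scores mostly in the 80s and 90s plus one 40; salaries between $65,000 and $75,000 plus one at $220,000), and the helper `summarize` is an illustrative name, not part of any library. Only the standard-library `statistics` module is used.

```python
from statistics import mean, median


def summarize(label: str, values: list[float]) -> tuple[float, float]:
    """Print and return the (mean, median) of a list of values."""
    m, md = mean(values), median(values)
    print(f"{label:<26} mean = {m:>10,.1f}   median = {md:>10,.1f}")
    return m, md


# Hypothetical test scores: most in the 80s and 90s, plus one low outlier (40).
scores = [82, 85, 88, 90, 91, 93, 95, 40]
# Hypothetical fire-department salaries: most between $65,000 and $75,000,
# plus one large-city department at $220,000 (a high outlier).
salaries = [65_000, 68_000, 70_000, 72_000, 75_000, 220_000]

summarize("scores (with 40)", scores)
summarize("scores (without 40)", [s for s in scores if s != 40])
summarize("salaries (with $220k)", salaries)
summarize("salaries (without $220k)", [s for s in salaries if s != 220_000])
```

With these made-up values, the score of 40 lowers the mean from about 89.1 to 83.0 but moves the median only from 90 to 89, and the $220,000 salary raises the mean from $70,000 to $95,000, above every other department's salary, while the median moves only from $70,000 to $71,000. This resistance to extreme values is one reason the median is often reported alongside, or instead of, the mean for skewed data.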

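The z-score rule for flagging outliers can be sketched as follows, again on hypothetical test scores. The cutoffs are assumptions for illustration (|z| > 3 is a common convention, not a value given in the text). The second function swaps in a different, robust technique, the modified z-score of Iglewicz and Hoaglin, which replaces the mean and standard deviation with the median and the median absolute deviation (MAD) and conventionally uses a cutoff of 3.5. The function names are illustrative only.

```python
from statistics import mean, median, stdev


def z_scores(values: list[float]) -> list[float]:
    """Ordinary z-scores: distance from the mean in sample-SD units."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def modified_z_scores(values: list[float]) -> list[float]:
    """Robust alternative (Iglewicz-Hoaglin modified z-score):
    0.6745 * (x - median) / MAD, where MAD is the median absolute deviation."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in values]


def flag(values: list[float], scores: list[float], cutoff: float) -> list[float]:
    """Return the values whose score exceeds the cutoff in absolute value."""
    return [x for x, z in zip(values, scores) if abs(z) > cutoff]


# Hypothetical test scores with one low outlier (40).
scores = [82, 85, 88, 90, 91, 93, 95, 40]

print(flag(scores, z_scores(scores), cutoff=3.0))           # → []
print(flag(scores, z_scores(scores), cutoff=2.0))           # → [40]
print(flag(scores, modified_z_scores(scores), cutoff=3.5))  # → [40]
```

In a small sample the outlier itself inflates the standard deviation, so its ordinary z-score can stay below a strict cutoff: here the score of 40 has z of about -2.41 and is flagged only with the looser cutoff of 2, while its modified z-score (about -8.3) is flagged clearly. This masking effect is one motivation for the robust methods mentioned in the text.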

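The three steps described in The Sum of Squares section (subtract the mean, square each deviation, add the squares) translate directly into code. This is a minimal sketch on hypothetical test scores with the outlier already removed; the helper names are illustrative. Dividing SS by N - 1 gives the sample variance, which the sketch checks against the standard library's `statistics.variance`.

```python
from statistics import mean, variance


def deviations_from_mean(values: list[float]) -> list[float]:
    """Step 1: subtract the mean from each data point."""
    m = mean(values)
    return [x - m for x in values]


def sum_of_squares(values: list[float]) -> float:
    """Steps 2 and 3: square each deviation, then add the squares."""
    return sum(d * d for d in deviations_from_mean(values))


# Hypothetical test scores with the outlier (40) already removed.
scores = [82, 85, 88, 90, 91, 93, 95]

ss = sum_of_squares(scores)
# Raw deviations cancel out, which is why they are squared first.
print(f"sum of raw deviations = {sum(deviations_from_mean(scores)):.10f}")
print(f"SS = {ss:.3f}")
print(f"sample variance = SS / (N - 1) = {ss / (len(scores) - 1):.3f}")
print(f"statistics.variance for comparison = {variance(scores):.3f}")
print(sum_of_squares([85, 85, 85, 85, 85]))  # identical scores: SS is 0
```

For these made-up scores, SS = 860/7 (about 122.86) and the sample variance is about 20.48. The last line shows the zero-variability case from the text: when every score equals the mean, every squared deviation is zero.
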
Sampling Distribution

A sampling distribution for a statistic is formed by drawing a large number of samples from a specific population and computing the statistic for each sample (Eidous & Abu-Shareefa, 2019). The collection of M values (sample means) from numerous samples drawn from the same population is called the sampling distribution of M (Warner, 2020). This distribution shows the probability of obtaining each possible value of the statistic when samples are drawn repeatedly from a specific population. It provides a picture of what outcomes to expect for a sample statistic, such as the mean, that is used to estimate a population parameter. The differences between the M values and the population mean offer insight into the sizes of prediction errors across numerous samples (Warner, 2020).

Standard Deviation

In statistics, σ represents the population standard deviation, which reflects how spread out the distribution is (Warner, 2020). Understanding the standard deviation helps researchers comprehend the characteristics and shape of the theoretical sampling distribution, because it lets them gauge the degree of variability and spread in the population (Lind et al., 2019). This knowledge aids in predicting and interpreting the shape, central tendency, and variability of the sampling distribution. This is especially true when applying the central limit theorem, calculating standard errors (the standard error of the mean is σ/√N), and making inferences about population parameters. When a study has a continuous outcome variable, it is essential to estimate the inherent standard deviation (σ) of the observational errors to make sense of the findings (Walter et al., 2021).

T distribution and Z distribution. The normal (z) distribution, with its characteristic bell-shaped curve, is fundamental in mathematical statistics. The z-distribution is applied in regression analysis and experimental design and is used to approximate other distributions, such as the binomial and hypergeometric (Mahbobi & Tiemann, 2020). It is central to probability theory, largely because of the central limit theorem: as sample size increases, the sampling distribution of the mean tends toward a normal distribution even when the population distribution is not normal. Many phenomena and measurement errors follow or closely resemble a normal distribution (Eidous & Abu-Shareefa, 2019). The standard normal distribution does not change; it always has a mean of 0 and a standard deviation of 1.

The t-distribution is a modified z-distribution. It can be created by repeatedly drawing samples of the same size from a normal population and calculating the t-statistic for each sample. To picture a t-distribution, imagine lowering the peak of the normal distribution and letting its tails become thicker; this corrects the problems that arise when the sample standard deviation is used to estimate the population standard deviation (Warner, 2020). The t-distribution is often used when sample sizes are small or when the population standard deviation is unknown, and it is commonly used in hypothesis testing and in constructing confidence intervals for the population mean. Because it accounts for the extra uncertainty of estimating σ from the sample, the t-distribution is more flexible and applicable than the z-distribution when dealing with smaller samples or unknown population standard deviations (Eidous & Abu-Shareefa, 2019). As the degrees of freedom increase, the t-distribution approaches the standard normal distribution.

How Sample Size Affects the Confidence Interval

The sample size (N) directly influences the width and precision of a confidence interval (CI). As N increases, the sample provides more information, which reduces the margin of error and narrows the confidence interval. Increasing N therefore generally yields a more precise CI with a smaller range of plausible values, one that better approximates the population parameter because sampling error is reduced. Estimates from larger samples are less sensitive to chance variation in which members of the population happen to be sampled. Conversely, a smaller N results in wider CIs, and estimates from small samples can be influenced more by the specific characteristics of the sample (Lakens, 2022).
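
To illustrate the sampling distribution of M and the role of σ, the following simulation draws many samples from a deliberately skewed population. The population (an exponential distribution with mean 1 and σ = 1), the sample sizes, the number of replications, and the random seed are all assumptions made for illustration, and the function names are hypothetical. The standard deviation of the simulated M values should come out close to σ/√N, the standard error of the mean, and the skewness of the M values should shrink as N grows, consistent with the central limit theorem.

```python
import math
import random
from statistics import fmean, pstdev

# Hypothetical skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0


def simulate_sample_means(n: int, reps: int = 5_000, seed: int = 2024) -> list[float]:
    """Draw `reps` samples of size `n` and return the mean (M) of each sample."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness (0 for perfectly symmetric data)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((x - m) / s) ** 3 for x in values)


for n in (5, 25, 100):
    means = simulate_sample_means(n)
    print(
        f"N = {n:>3}: mean of M = {fmean(means):.3f} (mu = {MU}), "
        f"SD of M = {pstdev(means):.3f} (sigma/sqrt(N) = {SIGMA / math.sqrt(n):.3f}), "
        f"skewness of M = {skewness(means):.2f}"
    )
```

A simulation like this is only an approximation of the theoretical sampling distribution; more replications make the simulated summary values track μ and σ/√N more closely.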

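The link between sample size and CI width can be checked numerically. The sketch below computes the 95% CI margin of error, t* × s / √N, for several sample sizes, assuming a hypothetical sample standard deviation of 10 points. Because Python's standard library has no t-distribution, the critical value t* is found here by integrating the t density with Simpson's rule and inverting it by bisection; this numerical approach is an illustrative assumption, not necessarily how statistical packages compute it. The z critical value comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for Student's t with `df` degrees of freedom, for x >= 0.

    Uses symmetry (P(T <= 0) = 0.5) plus Simpson's rule on the density
    from 0 to x. `steps` must be even.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)

    def pdf(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(df: int, confidence: float = 0.95) -> float:
    """Two-sided critical value t* such that P(-t* < T < t*) = confidence."""
    p = 0.5 + confidence / 2          # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:          # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):               # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)  # about 1.96
s = 10.0  # hypothetical sample standard deviation of test scores

for n in (5, 10, 30, 100, 1000):
    t_star = t_critical(n - 1)
    margin = t_star * s / math.sqrt(n)
    print(f"N = {n:>4}: t* = {t_star:.3f} (z* = {z_star:.3f}), "
          f"95% CI = M ± {margin:.2f}")
```

As N grows, the margin of error shrinks roughly in proportion to 1/√N, and t* falls toward the z value of about 1.96, reflecting how the t-distribution approaches the normal distribution as the degrees of freedom increase.
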
References

Eidous, O. M., & Abu-Shareefa, R. (2019). New Approximations for Standard Normal Distribution Function. Communications in Statistics - Theory and Methods, 49(6), 1357–1374. https://doi.org/10.1080/03610926.2018.1563166

Finding and Using Health Statistics. (n.d.). www.nlm.nih.gov. https://www.nlm.nih.gov/oet/ed/stats/02-900.html#:~:text=A%20standard%20deviation%20close%20to

Lakens, D. (2022). Sample Size Justification. Collabra: Psychology, 8(1), 33267. https://doi.org/10.1525/collabra.33267

Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-registration. International Review of Social Psychology, 32(1). https://psycnet.apa.org/doi/10.5334/irsp.289

Lind, D. A., Marchal, W. G., & Wathen, S. A. (2019). Basic Statistics for Business & Economics. McGraw-Hill Education.

Mahbobi, M., & Tiemann, T. K. (2020). Chapter 2. The Normal and T-Distributions. pressbooks.nscc.ca. https://pressbooks.nscc.ca/introductorybusinessstatistics/chapter/the-normal-and-t-distributions-2/

Obaid Merza, E., & Jasim Al-Anber, N. (2021). Fast Ways to Detect Outliers. Journal of Techniques, 3(1), 66–73. https://www.iasj.net/iasj/download/6e11d19b03519a29

Statistics Canada. (2021, September 2). Statistics: Power from Data! Variance and Standard Deviation. statcan.gc.ca. https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch12/5214891-eng.htm


Walter, S. D., Rychtář, J., Taylor, D., & Balakrishnan, N. (2021). Estimation of Standard Deviations and Inverse-Variance Weights From an Observed Range. Statistics in Medicine, 41(2), 242–257. https://doi.org/10.1002/sim.9233

Warner, R. M. (2020). Applied Statistics I: Basic Bivariate Techniques (3rd ed.). Sage Publications, Inc. (US). https://mbsdirect.vitalsource.com/books/9781506352817
