Data Screening Basics_Parker

School

Liberty University

Course

EDCO 735

Subject

Statistics

Date

Jan 9, 2024

Type

docx

Pages

7

Uploaded by azff1989
DATA SCREENING 1

Data Screening Basics

School of Behavioral Sciences, Liberty University

Author Note

I have no known conflict of interest to disclose. Correspondence concerning this article should be addressed to Email: @liberty.edu
Data Basics

Percentile Rank

It does not make sense for a researcher to find a percentile rank from a z-score and the standard normal distribution table when the empirical frequency distribution is extremely non-normal. Converting a z-score to a percentile with the table assumes that the scores follow a normal curve (Warner, 2020). When the data are extremely non-normal, that assumption is violated and the table-based percentile is inappropriate (Siebert & Siebert, 2018). In such cases, percentiles should be computed directly from the observed data, and researchers should opt for nonparametric methods that do not assume normality (Warner, 2020).

Bimodal Distributions

Bimodal distributions have two distinct peaks or modes, which often indicates two different groups or subpopulations within the data. They can occur when two underlying processes or groups are being measured. For example, a researcher who measures heights in a population that includes both adults and children might observe a bimodal distribution. Bimodal distributions have a substantial share of their observations far from the center. Their kurtosis values are also more negative than those of heavy-tailed distributions such as the t distribution. Bimodal distributions occur in research when the phenomenon under study divides into distinct groups, such as high versus low income ranges, males versus females, or young versus old participants.

Skewed Distribution

Skewed distributions are asymmetrical: the data points are unevenly distributed around the central point, so one side has a longer tail while the other has a shorter tail (Warner, 2020). Negative (left) skew indicates a longer, thinner tail on the left side of the distribution, whereas positive (right) skew indicates a longer, thinner tail on the right. The sign of the skew describes the direction of the asymmetry, and its size describes how pronounced the asymmetry is. Skewed distributions occur in research when studying counts of events or behaviors, such as the number of states people have visited, how many siblings a person has, or how many pairs of shoes a person owns. The variables in these scenarios have a lowest possible value of 0 but can extend to large numbers, which results in a positively skewed distribution shape (Warner, 2020).

Absence of Data Screening

Recognizing and addressing errors is crucial in the scientific process, leading to the principle that science corrects itself (Brown et al., 2018). Conversely, if errors are not acknowledged and rectified, science cannot genuinely embody this self-correcting principle, which has been extensively debated. Some argue that errors are necessary for the progress of science, as adhering strictly to established thinking and methods constrains the growth of knowledge.

One of the most significant issues that stems from an absence of data screening is the failure to detect outliers. An outlier is an erroneous or heavily skewed sample that deviates significantly from most samples in the calibration set (Menezes et al., 2009). Without proper screening, erroneous or outlying data points may go unnoticed, leading to inaccuracies in the dataset. The data might also have missing values or inconsistencies that affect its overall quality. Unidentified outliers can distort statistical analyses by influencing measures of central tendency and dispersion. Skewed or non-normally distributed data may violate the assumptions of specific statistical tests, affecting the validity of the results (Knief & Forstmeier, 2021). Flawed data can lead to incorrect conclusions and interpretations, undermining the reliability of the study's
findings (Brown et al., 2018). In extreme cases, it may result in misidentifying trends or relationships within the data.

Quantitative Rule for Univariate Outliers

Whether a case qualifies as an outlier can be assessed using the interquartile range (IQR), the distance between the first and third quartiles, which forms the box in a boxplot. The 1.5(IQR) rule classifies a case as a univariate outlier if its value falls at least 1.5 times the length of the IQR box beyond either edge of the box (Mowbray et al., 2018). A case that falls between 1.5 and 3 box lengths beyond an edge is considered a mild outlier. In contrast, any case that falls more than 3 box lengths beyond an edge is identified as an extreme outlier. Established guidelines for managing errors, genuine missing data, and extreme values are integral to best practices (Van den Broeck et al., 2005).

Researcher degrees of freedom refers to the latitude researchers have in deciding how to analyze their data, manage errors, and address missing data. These decisions can include deleting or omitting a case or participant (Leys et al., 2019). This latitude allows researchers to modify their data when a case has a value that deviates significantly from the overall pattern (an extreme outlier). Extreme outliers may be candidates for removal, especially if they are suspected to be due to errors or anomalies (Frost, 2019). If outliers violate assumptions and alternative methods are not appropriate, excluding them may be considered. Removing cases may be justified to maintain data integrity if there is evidence of data entry mistakes that cannot be corrected and that will influence the analysis. If participants fail to adhere to the study protocol and their data are deemed unreliable or invalid due to noncompliance, excluding their data may be justified (McCoy, 2017).
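The 1.5(IQR) rule described in this section can be sketched in a few lines of Python. This is an illustration rather than a procedure taken from the cited sources: the function name `classify_outliers`, the sample scores, and the choice of the "inclusive" quartile method are all assumptions, and different software packages compute quartiles slightly differently, which can shift the fences.

```python
from statistics import quantiles


def classify_outliers(values):
    """Label each value 'ok', 'mild', or 'extreme' using the 1.5(IQR) rule."""
    # Quartile method is an assumption; other methods give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # length of the box in a boxplot

    labels = []
    for v in values:
        # Distance beyond the nearest box edge (0 if the value is inside the box).
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * iqr:
            labels.append("extreme")   # more than 3 box lengths beyond an edge
        elif distance > 0 and distance >= 1.5 * iqr:
            labels.append("mild")      # between 1.5 and 3 box lengths beyond an edge
        else:
            labels.append("ok")
    return labels


# Hypothetical scores, invented for illustration only.
scores = [10, 12, 13, 14, 15, 16, 17, 18, 26, 40]
for score, label in zip(scores, classify_outliers(scores)):
    if label != "ok":
        print(score, label)
```

The sketch only flags cases for review. It does not delete anything, so the decision about whether a flagged case should be corrected, retained, or excluded remains with the researcher.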
While there are circumstances where deleting a case is justified, it is essential to exercise caution and transparency in such decisions. Researchers must thoroughly document the rationale for exclusions and consider the potential impact on the study's validity and generalizability.
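Returning to the percentile rank point made at the start of the paper, the gap between a table-based percentile and one computed directly from the data can be illustrated with a short Python sketch. The data, the function names, and the "percent of scores at or below x" definition of percentile rank are assumptions made for this illustration; other definitions of percentile rank exist.

```python
from statistics import NormalDist, mean, stdev


def empirical_percentile_rank(values, x):
    """Percent of observed values at or below x (no normality assumption)."""
    return 100 * sum(v <= x for v in values) / len(values)


def normal_percentile_rank(values, x):
    """Percentile implied by x's z-score under an assumed normal curve."""
    z = (x - mean(values)) / stdev(values)
    return 100 * NormalDist().cdf(z)  # same lookup a standard normal table provides


# Hypothetical, strongly right-skewed counts (e.g., number of states visited).
visits = [0, 0, 0, 1, 1, 1, 2, 2, 3, 40]

print(empirical_percentile_rank(visits, 3))  # computed from the observed data
print(normal_percentile_rank(visits, 3))     # misleading when the data are skewed
```

In this invented sample, a score of 3 is at or above nine of the ten observations, yet its z-score is negative because the single extreme value pulls the mean upward, so the normal-curve percentile falls below 50.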
References

Brown, A. W., Kaiser, K. A., & Allison, D. B. (2018). Issues with data and analyses: Errors, underlying themes, and potential solutions. Proceedings of the National Academy of Sciences, 115(11), 2563–2570. https://doi.org/10.1073/pnas.1708279115

Frost, J. (2019, October 23). Guidelines for removing and handling outliers in data. Statistics by Jim. https://statisticsbyjim.com/basics/remove-outliers/

Knief, U., & Forstmeier, W. (2021). Violating the normality assumption may be the lesser of two evils. Behavior Research Methods, 53. https://doi.org/10.3758/s13428-021-01587-5

Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to classify, detect, and manage univariate and multivariate outliers, with emphasis on pre-registration. International Review of Social Psychology, 32(1). https://doi.org/10.5334/irsp.289

McCoy, E. (2017). Understanding the intention-to-treat principle in randomized controlled trials. Western Journal of Emergency Medicine, 18(6), 1075–1078. https://doi.org/10.5811/westjem.2017.8.35985

Menezes, J. C., Ferreira, A. P., Rodrigues, L. O., Brás, L. P., & Alves, T. P. (2009). Chemometrics role within the PAT context: Examples from primary pharmaceutical manufacturing. Comprehensive Chemometrics, 313–355. https://doi.org/10.1016/b978-044452701-1.00012-0

Mowbray, F. I., Fox-Wasylyshyn, S. M., & El-Masri, M. M. (2018). Univariate outliers: A conceptual overview for the nurse researcher. Canadian Journal of Nursing Research, 51(1), 31–37. https://doi.org/10.1177/0844562118786647
Siebert, C. F., & Siebert, D. C. (2018). Data analysis with small samples and non-normal data: Nonparametric and other strategies. Oxford University Press.

Van den Broeck, J., Argeseanu Cunningham, S., Eeckels, R., & Herbst, K. (2005). Data cleaning: Detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10), e267. https://doi.org/10.1371/journal.pmed.0020267

Warner, R. M. (2020). Applied Statistics I (3rd ed.). SAGE Publications, Inc. (US). https://mbsdirect.vitalsource.com/books/9781506352817