MIS 661 Topic 2 DQ 1

School

Grand Canyon University

Course

661

Subject

Computer Science

Date

Feb 20, 2024

Uploaded by MasterTitanium11775

Data sets often have missing values, commonly known as null values. The most appropriate technique depends on why the data are missing. Discuss the techniques used to address missing values and the scenarios in which each is most appropriate.

Missing data are values that are not stored (or not present) for one or more variables in a dataset. Incomplete data can bias the results of machine learning models and reduce their accuracy.

The three types of missing data are Missing Completely At Random (MCAR), where the missingness is unrelated to any observed or unobserved values; Missing At Random (MAR), where it depends only on other observed variables; and Missing Not At Random (MNAR), where it depends on the missing value itself.

There are two primary ways of handling missing values:

1. Deleting the missing values. There are two ways to do this:
   - Deleting the entire row (listwise deletion): If a row has many missing values, you can drop the entire row. If every row is missing a value in at least one column, however, you may end up deleting the whole dataset.
   - Deleting the entire column: If a certain column has many missing values, you can drop the entire column.

2. Imputing the missing values. Some ways of replacing them:
   - Replacing with an arbitrary value: If you can make an educated guess about the missing value, you can replace it with that chosen value.
   - Replacing with the mean: This is the most common method for numeric columns. The mean is not appropriate when there are outliers, because they distort it; in such cases the outliers need to be treated first.
   - Replacing with the mode: The mode is the most frequently occurring value. It is used for categorical features.
   - Replacing with the median: The median is the middle value of the sorted data. It is the better choice for numeric columns that contain outliers.
   - Replacing with the previous value (forward fill): In some cases, imputing with the previous value is more appropriate than using the mean, mode, or median. This is mostly used in time series data.
   - Replacing with the next value (backward fill): The missing value is imputed using the next observed value.
   - Interpolation: Missing values can also be estimated from neighboring values. The pandas interpolate method supports different interpolation methods such as 'linear', 'polynomial', and 'quadratic'. The default method is 'linear'.

Missing data is a problem everyone faces when dealing with real-life data, and it can reduce the quality and accuracy of results. Understanding the different types of missing data and their potential impact on the analysis is crucial for selecting an appropriate handling method. Each method has its advantages and disadvantages and suits different types of missing data.
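The techniques above map fairly directly onto pandas calls. The following is a minimal sketch, not part of the original post: the toy DataFrame, its column names (age, city, income), the threshold of 4 non-missing values, and the "Unknown" placeholder are all invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0, np.nan],
    "city": ["Phoenix", "Tempe", None, "Phoenix", "Phoenix"],
    "income": [52000.0, 48000.0, np.nan, 250000.0, 51000.0],
})

# 1. Deletion
# Listwise deletion: keep only rows with no missing values.
rows_dropped = df.dropna()
# Column deletion: keep only columns with at least 4 non-missing values
# (the threshold 4 is an arbitrary choice for this example).
cols_dropped = df.dropna(axis=1, thresh=4)

# 2. Imputation
imputed = df.copy()
# Mean for a numeric column without extreme outliers.
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())
# Median for a numeric column with an outlier (250000 here).
imputed["income"] = imputed["income"].fillna(imputed["income"].median())
# Mode for a categorical column.
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

# Arbitrary value: an explicit placeholder chosen by the analyst.
placeholder = df["city"].fillna("Unknown")

# Time series: forward fill, backward fill, and linear interpolation.
series = pd.Series(
    [10.0, np.nan, np.nan, 16.0, 18.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
forward = series.ffill()        # carry the previous observation forward
backward = series.bfill()       # pull the next observation backward
linear = series.interpolate()   # default method is 'linear'

print(rows_dropped)
print(cols_dropped.columns.tolist())
print(imputed)
print(pd.DataFrame({"ffill": forward, "bfill": backward, "linear": linear}))
```

In this toy data the single large income pulls the mean well above the typical values, which is why the sketch uses the median for that column and the mean only for age.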
Ogunbiyi, I. (2022). How to Handle Missing Data in a Dataset. FreeCodeCamp. https://www.freecodecamp.org/news/how-to-handle-missing-data-in-a-dataset/
Tamboli, N. (2023). Effective Strategies for Handling Missing Values in Data Analysis. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/10/handling-missing-value/