MIS 661 Topic 2 DQ 1

School: Grand Canyon University
Course: 661
Subject: Computer Science
Date: Feb 20, 2024
Type: docx
Uploaded by MasterTitanium11775

Data sets often have missing values, commonly known as null values. The most appropriate technique depends on why the data are missing. Discuss the techniques used to address the missing values and the scenarios in which they are the most appropriate.

Incomplete data can bias the results of machine learning models and reduce their accuracy. Missing data are values that are not stored (or not present) for one or more variables in a given dataset. There are three types of missing data:

- Missing Completely At Random (MCAR): the probability that a value is missing is unrelated to any data, observed or unobserved.
- Missing At Random (MAR): the probability that a value is missing depends only on other observed variables.
- Missing Not At Random (MNAR): the probability that a value is missing depends on the unobserved value itself.

There are two primary ways of handling missing values:

1. Deleting the missing values. There are two ways to delete missing data:

- Deleting the entire row (listwise deletion). If a row has missing values, you can drop the entire row; strict listwise deletion drops every row with at least one missing value. If every row has some column value missing, you might end up deleting the whole dataset. Listwise deletion is least likely to introduce bias when the data are MCAR.
- Deleting the entire column. If a certain column has many missing values, you can choose to drop the entire column.

2. Imputing the missing values. Some ways of replacing the missing values:

- Replacing with an arbitrary value. If you can make an educated guess about the missing value, you can replace it with a chosen constant.
- Replacing with the mean. This is the most common method of imputing missing values in numeric columns. The mean is sensitive to outliers, so if outliers are present it is not appropriate unless the outliers are treated first.
- Replacing with the mode. The mode is the most frequently occurring value. It is used for categorical features.
- Replacing with the median. The median is the middle value of the sorted data. It is a better choice than the mean for imputation when outliers are present.
- Replacing with the previous value (forward fill). In some cases, imputing with the previous value is more appropriate than using the mean, mode, or median. This is called forward fill and is mostly used with time series data.
- Replacing with the next value (backward fill). In backward fill, the missing value is imputed using the next value.
- Interpolation. Missing values can also be imputed using interpolation. Pandas' interpolate method can replace the missing values using different interpolation methods such as 'polynomial', 'linear', and 'quadratic'. The default method is 'linear'.

Missing data is a problem everyone faces when dealing with real-life data, and it can reduce the quality and accuracy of results. Understanding the different types of missing data and their potential impact on the analysis is crucial for selecting an appropriate handling method. Each method has its advantages and disadvantages and suits different types of missing data.
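The two deletion options described earlier can be sketched with pandas' `dropna`. This is a minimal illustration, not a recommended pipeline: the toy DataFrame, its column names, and the `thresh=3` cutoff are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, 58000],
    "notes": [np.nan, np.nan, np.nan, "ok"],
})

# Listwise deletion: drop every row that has at least one missing value.
rows_dropped = df.dropna(axis=0)

# Column deletion: keep only columns with at least 3 non-missing values
# (the cutoff of 3 is an arbitrary choice for this sketch).
cols_dropped = df.dropna(axis=1, thresh=3)

print(rows_dropped.shape)          # only one of the four rows survives
print(list(cols_dropped.columns))  # the sparse "notes" column is gone
```

Because three of the four rows have a missing value somewhere, listwise deletion keeps only a single row here, which shows how quickly this approach can shrink a dataset.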
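Mean, median, and mode imputation can be sketched with pandas' `fillna`. The data below are invented for illustration (the `salary` and `city` columns and the outlier value of 400 are assumptions), chosen so that the effect of an outlier on the mean is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 400.0 is a deliberate outlier.
df = pd.DataFrame({
    "salary": [40.0, 42.0, np.nan, 44.0, 400.0],
    "city": ["Phoenix", "Tempe", "Phoenix", None, "Phoenix"],
})

# Mean imputation: the outlier pulls the filled value far from typical values.
mean_filled = df["salary"].fillna(df["salary"].mean())

# Median imputation: robust to the outlier.
median_filled = df["salary"].fillna(df["salary"].median())

# Mode imputation for a categorical column; mode() returns a Series,
# so take its first entry.
mode_filled = df["city"].fillna(df["city"].mode().iloc[0])

print(mean_filled.iloc[2])    # 131.5
print(median_filled.iloc[2])  # 43.0
print(mode_filled.iloc[3])    # Phoenix
```

The mean-imputed value (131.5) is far from the typical salaries of about 40 to 44, while the median-imputed value (43.0) stays close to them, which matches the advice to prefer the median when outliers are present.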
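Forward fill, backward fill, and linear interpolation can be compared side by side on a small time series. The series values and dates below are assumptions made up for this sketch; only the default 'linear' interpolation method is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with gaps.
s = pd.Series(
    [10.0, np.nan, np.nan, 16.0, np.nan, 20.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Forward fill: carry the last observed value forward.
forward = s.ffill()

# Backward fill: pull the next observed value backward.
backward = s.bfill()

# Linear interpolation: fill gaps on a straight line between neighbors,
# treating the observations as equally spaced.
linear = s.interpolate(method="linear")

print(forward.tolist())   # [10.0, 10.0, 10.0, 16.0, 16.0, 20.0]
print(backward.tolist())  # [10.0, 16.0, 16.0, 16.0, 20.0, 20.0]
print(linear.tolist())    # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Forward and backward fill produce step-like series, whereas interpolation assumes the value changes gradually between observations; which assumption fits depends on the process that generated the data.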

References

Ogunbiyi, I. (2022). How to handle missing data in a dataset. freeCodeCamp. https://www.freecodecamp.org/news/how-to-handle-missing-data-in-a-dataset/

Tamboli, N. (2023). Effective strategies for handling missing values in data analysis. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/10/handling-missing-value/
