DQ 1

School

Grand Canyon University

Course

620

Subject

Statistics

Date

Feb 20, 2024

Uploaded by ProfField9531

Topic 3 DQ 1
Feb 1-3, 2024

Review the data set below and discuss whether it violates the tidy data principles. Provide an explanation of how it does or does not violate these principles. Describe how the data set could be manipulated to be considered tidy. Discuss some advantages and disadvantages of the tidy data approach.

Name      Age   Blue   Brown   Green   Other   Height (inches)
Anthony   34    0      1       0       0       70
Belinda   51    1      0       0       0       67
Casper    19    0      0       1       0       68
Douglas   70    0      0       1       0       73

According to Wickham et al. (2016), three interrelated rules determine whether a dataset is tidy: every variable is a column and every column is a variable; every observation is a row and every row is an observation; and every value is a cell and every cell is a single value.

By these rules, the dataset violates the tidy data principles because not every column is a variable. Blue, Brown, Green, and Other are not variables in their own right; they are the possible values of a single color variable, spread across four indicator columns. The other two rules are satisfied: every row is an observation (one person), and every cell holds a single value.

Ideally, one column would represent color, and each color would be a value in that column. To be considered tidy, the dataset should have four columns: Name, Age, Color (named more specifically to say what the color describes, e.g., hair color or eye color), and Height. This makes the data in the table more readable.

According to Tranistics (2023), the benefits of data cleaning include better decision-making, better customer relations, enhanced data security, enhanced data consistency, enhanced accuracy, enhanced completeness, increased productivity, cost savings, and regulatory compliance.

A major drawback of data tidying is that it is a time-consuming and extremely costly operation. When tidying includes eliminating outliers and missing observations, the resulting data are incomplete, and incomplete data may prevent analysts from getting useful insights. Automation can make the problem worse: certain automated data cleaning technologies lack intelligence and may fail to handle certain observations correctly (Longe, 2023).
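
The reshaping described in the post, collapsing the Blue, Brown, Green, and Other indicator columns into one Color column, could be sketched roughly as follows. This is a minimal illustration in plain Python, not part of the original post; the names wide_rows, COLOR_COLUMNS, and tidy_row are hypothetical, and it assumes each person has exactly one indicator set to 1.

```python
from collections import Counter

# Wide rows as given in the discussion prompt: one 0/1 indicator column per color.
wide_rows = [
    {"Name": "Anthony", "Age": 34, "Blue": 0, "Brown": 1, "Green": 0, "Other": 0, "Height (inches)": 70},
    {"Name": "Belinda", "Age": 51, "Blue": 1, "Brown": 0, "Green": 0, "Other": 0, "Height (inches)": 67},
    {"Name": "Casper", "Age": 19, "Blue": 0, "Brown": 0, "Green": 1, "Other": 0, "Height (inches)": 68},
    {"Name": "Douglas", "Age": 70, "Blue": 0, "Brown": 0, "Green": 1, "Other": 0, "Height (inches)": 73},
]

# Hypothetical helper name: the columns that are really values of one variable.
COLOR_COLUMNS = ("Blue", "Brown", "Green", "Other")


def tidy_row(row):
    """Collapse the 0/1 color indicator columns into a single Color value."""
    flagged = [color for color in COLOR_COLUMNS if row[color] == 1]
    # Assumption: exactly one indicator is set per person.
    if len(flagged) != 1:
        raise ValueError(f"expected exactly one color for {row['Name']}, got {flagged}")
    return {
        "Name": row["Name"],
        "Age": row["Age"],
        "Color": flagged[0],
        "Height (inches)": row["Height (inches)"],
    }


tidy_rows = [tidy_row(row) for row in wide_rows]

# With one Color column, a per-color tally needs no knowledge of the column names.
color_counts = Counter(row["Color"] for row in tidy_rows)

for row in tidy_rows:
    print(row)
```

Once color is a single column, questions such as "how many people have each color?" reduce to grouping on that one variable, which is one practical advantage of the tidy layout.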
References

Longe, B. (2023). Data Cleaning. 7 Techniques + Steps to Cleanse Data. Retrieved on February 2, 2024, from https://www.formpl.us/blog/data-cleaning

Tranistics. (2023). Benefits of Cleaning Data. Retrieved on February 2, 2024, from https://www.tranistics.com/benefits-of-data-cleaning/

Wickham, H. et al. (2016). R for Data Science. Retrieved on February 2, 2024, from https://r4ds.hadley.nz/data-tidy