Kaelyn_Murphy_4-2 Project_One

School: Southern New Hampshire University
Course: 300
Subject: Statistics
Date: Apr 3, 2024
Pages: 6

4-2 Project One
Kaelyn Murphy
Kaelyn.murphy@snhu.edu
Southern New Hampshire University
DAT-300-T3038 Data Valid: Getting Right Data
Organizational Challenges
National Motors has acquired a small firm called Kansas City Motor and needs to merge the data from its various warehouses into a single database. The datasets provided by Kansas City Motor reflect the total number of motors sold. National Motors tracks motors sold and runs monthly inventory reports to project inventory for future sales. The datasets are in two different formats and come from two different databases, so they cannot be combined in their current state. The Word document cannot be transferred into Excel without changes, and the Excel document likewise cannot be transferred into Word without changes; converted either way as-is, the data would not be comprehensible. The data is relevant to the organizational problem of merging the two companies' data because it contains the motor inventory for both companies. Each company collects and stores its data differently and in a different database. Both datasets cover a 28-month period. The Kansas City data was stored in scientific notation in a defunct AS400 database and extracted into the Excel spreadsheet, whereas the warehouse uses Microsoft SQL Server, which is not compatible with it.
Data Usability
Dirty data is data that is faulty in some way, whether inaccurate, outdated, incomplete, or duplicated (Couwenbergh, 2023). Cleaning data is important because it ensures the data is accurate, efficient to work with, and reliable. Clean data means we do not waste time dealing with errors during analysis, our analysis is not misled, and we can trust our results. The dataset from Kansas City's AS400 database contains missing data and errors. Its values are also in scientific notation, which does not match how the rest of the data is formatted. The data from the Monthly Totals Word document could be entered manually into the Kansas City Excel workbook, which would correct the formatting issue. This would be a simple, quick fix that would not take much time or many resources. The risk of manual entry is human error. Once cleaned, the data could be used to perform a gap analysis.
Data Completeness and Accuracy
The data on the Extracted Kansas City Store tab contains errors on lines 9, 19, and 26 and is missing data on line 4. Only 27 months are accounted for on the Extracted Kansas City Store tab, whereas the Warehouse Data tab contains a full 28 months. The data on the Extracted Kansas City Store tab is in scientific notation, whereas the Warehouse Data tab is in standard numerical format. I would recommend converting the Extracted Kansas City Store tab data to numerical format, as it is a simple fix. The data appears to be unique, with no duplicate entries. The Kansas City Monthly Totals Word document does list the 28th month, but the amount sold is well below that of the other months, so I would recommend verifying it for accuracy. These errors may exist because the information was not updated properly and regularly. To prevent errors and inaccurate information in the future, I recommend that the data be stored in the same database, formatted consistently, and verified regularly for accuracy.
Data Retention
I do not see any data provided for National Motors; all of the data provided appears to be for Kansas City Motors, unless I have misread whom the data in the Excel workbook belongs to. The data National Motors collects covers motors sold and represents inventory, not sales, while the data provided by Kansas City Motors represents motors sold. To merge the data for both companies, the data would have to represent the same information. The data needs to be cleaned in both the Excel workbook and the Word document. Once the data is cleaned, I believe all of it should be retained, as it is relevant to the problem and representative of the needs of the issue at hand. I do not see any junk data in what was provided.
Data Limitations
The data does not contain any information whose collection or storage would be considered unethical. The data provided pertains to sales, which is typically readily available to all parties within the company. To help keep the data clean, it should be analyzed only by those performing the research needed to address the organizational problem. There do not appear to be any legal restrictions on sharing sales data among multiple parties. The only ethical or legal issues I see arising are if someone mishandled the data and used it to misrepresent something, such as stating that sales were higher in a given month than they truly were.
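Several of the fixes recommended in this report (converting the scientific-notation values to whole numbers, filling the missing and erroneous rows from the Kansas City Monthly Totals document, checking that all 28 months are present, and flagging an unusually low month for verification) could also be scripted to reduce the human-error risk of manual entry. The Python sketch below is a minimal illustration under stated assumptions, not part of the original analysis. It assumes the raw cells from the Extracted Kansas City Store tab have already been read into a dictionary keyed by month number (reading the workbook itself is not shown), and that line N on the tab holds month N - 1, consistent with line 29 holding the 28th month. The function names and the 50%-of-median threshold for a "low" month are hypothetical choices; only the four correction values come from this report.

```python
from decimal import Decimal, InvalidOperation
from statistics import median

EXPECTED_MONTHS = 28  # both datasets are meant to cover 28 months

# Verified totals listed in this report, keyed by month number.
# Assumption: line N of the tab holds month N - 1 (line 1 is a header row).
KC_CORRECTIONS = {3: 3140, 8: 3030, 18: 2970, 25: 2965}


def parse_units(raw):
    """Convert a cell such as '3.14E+03' or '3,140' to an int.

    Returns None for blank, non-numeric, negative, infinite, NaN, or
    fractional values so they can be routed to manual review.
    """
    if raw is None:
        return None
    text = str(raw).strip().replace(",", "")
    if not text:
        return None
    try:
        value = Decimal(text)  # Decimal accepts scientific notation like 3.14E+03
    except InvalidOperation:
        return None  # e.g. an error string such as '#VALUE!'
    if not value.is_finite() or value < 0:
        return None
    if value != value.to_integral_value():
        return None  # a fractional motor count is treated as an error
    return int(value)


def clean_kc_series(extracted, corrections):
    """Build a month -> units table for months 1..EXPECTED_MONTHS.

    extracted:   hypothetical dict of month -> raw cell from the tab.
    corrections: dict of month -> verified total from the Monthly Totals doc.
    Returns (cleaned, needs_review): verified values override the tab, and
    any month that is still blank or unparseable is listed for review.
    """
    cleaned = {}
    needs_review = []
    for month in range(1, EXPECTED_MONTHS + 1):
        if month in corrections:
            cleaned[month] = corrections[month]
            continue
        value = parse_units(extracted.get(month))
        if value is None:
            needs_review.append(month)
        else:
            cleaned[month] = value
    return cleaned, needs_review


def flag_low_months(cleaned, ratio=0.5):
    """Return months whose total is below ratio * median (threshold is an assumption)."""
    if not cleaned:
        return []
    mid = median(cleaned.values())
    return sorted(m for m, v in cleaned.items() if v < ratio * mid)
```

A call such as `clean_kc_series(extracted_cells, KC_CORRECTIONS)`, where `extracted_cells` is the hypothetical dictionary of raw cells, returns the cleaned table plus a list of months still needing attention. A month that is blank or unparseable and has no verified correction (for example, the 28th month if line 29 is empty) lands in that review list instead of being silently filled, which keeps a person involved for the values this report says must be verified.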
Organizational Challenges/Solutions
• Insert a ‘Month’ column on the Extracted Kansas City Store tab to match the layout of the Warehouse Data tab.
• Format the data on the Extracted Kansas City Store tab as Numerical.
• Insert 3140 on line 4 of the Extracted Kansas City Store tab (3rd month from Kansas City Monthly Totals).
• Insert 3030 on line 9 of the Extracted Kansas City Store tab (8th month from Kansas City Monthly Totals).
• Insert 2970 on line 19 of the Extracted Kansas City Store tab (18th month from Kansas City Monthly Totals).
• Insert 2965 on line 26 of the Extracted Kansas City Store tab (25th month from Kansas City Monthly Totals).
• Verify the total reported for the 28th month on the Kansas City Monthly Totals.
• Insert the verified value on line 29 of the Extracted Kansas City Store tab (28th month from Kansas City Monthly Totals).
• Verify all totals listed on the Extracted Kansas City Store tab against the Kansas City Monthly Totals Word document.
References
Couwenbergh, S. (2023, September 11). The importance of cleaning dirty data for improved operations and customer success. Validity. https://www.validity.com/blog/dirty-data/#:~:text=Dirty%20data%2C%20or%20unclean%20data,incomplete%2C%20inaccurate%2C%20or%20inconsistent.