Kaelyn_Murphy_4-2 Project_One
docx
keyboard_arrow_up
School
Southern New Hampshire University *
*We aren’t endorsed by this school
Course
300
Subject
Statistics
Date
Apr 3, 2024
Type
docx
Pages
6
Uploaded by BaronKuduPerson693
1
4-2 Project One
Kaelyn Murphy
Kaelyn.murphy@snhu.edu
Southern New Hampshire University
DAT-300-T3038 Data Valid: Getting Right Data
2
Organizational Challenges
National Motors has acquired a small firm called Kansas City Motor and needs to merge the data from the various warehouses into a single database. The datasets provided by Kansas City Motor reflect the total number of motors sold. National Motors collects motors sold and runs monthly inventory reports to project inventory for future sales. The datasets provided are in two different formats and two different databases. The datasets are not able to be combined in their current states. The Word document is not in a format
that can be transferred into Excel without making changes to it. The Excel document is also not formatted in a way that the data can be transferred into Word without making changes to it. The data would not be comprehensible if converted either way. The data is relevant to the organizational problem of merging the data from the two different companies as it contains the inventory of motors for both companies. Each company collects and stores their data in different ways and in different databases. Both data sets contain 28 months of data. The data for Kansas City was stored in a defunct AS400 database in scientific format and extracted into the excel spreadsheet whereas the Warehouse uses a Microsoft SQL server which is not compatible. Data Usability
Dirty data is data that is faulty whether it be inaccurate, outdated, incomplete, duplicated, etc. (Couwenbergh, 2023). It is important to clean data so we can ensure accuracy, efficiency, and reliability of the data. When data is clean we are ensuring that we are not wasting time dealing with errors when we are analyzing the data, we are not misleading our analysis, and we are able to trust our results. The data set from the Kansas City’s AS400 database contains missing
data and errors. The data is also in scientific notation which does not align with how the rest of
3
the data is formatted. The data from the Monthly Totals word document could be manually input into the excel doc for Kansas City which would correct the formatting issue. This would also be a simple and quick fix that would not take a large amount of time or resources to do. The risk with manually inputting the data is human error. The data could be used to perform a gap analysis once it has been cleaned. Data Completeness and Accuracy
The data from on the Extracted Kansas City Store tab contains errors on line 9, 19, 26 and is missing data on line 4. There are only 27 months accounted for in the Extracted Kansas City Store tab whereas the Warehouse Data tab contains a full 28 months. The data on the Extracted Kansas City Store tab is in scientific format whereas the Warehouse Data tab is in numerical format. I would recommend formatting the Extracted Kansas City Store tab data in numerical format as it is a simple fix. The data appears to be unique without multiple instances being seen. Utilizing the Kansas City Monthly Totals word document, I do see month 28
th
listed but the amount sold is well under what the rest of the months are so I would recommend verifying it for accuracy. These errors could exist due to information not being updated properly and regularly. For the prevention of errors and inaccurate information in the future, I recommend
that the data be stored in the same database, formatted the same, and verified regularly for accuracy. Data Retention
I do not see any data provided for National Motors as all the data provided is listed for Kansas City Motors unless I misread who the data in the excel doc is for. The data National Motors collects is for motors sold and represents inventory not sales. The data provided by
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
4
Kansas City Motors represents motors sold. To merge the data for both companies, the data would have to represent the same information. The data needs to be cleaned in both the excel doc
and the word document. Once the data is cleaned, I believe all data should be retained as it is relevant to the problem and representative of the needs for the issue at hand. I do not see any junk data provided. Data Limitations
The data does not contain any information that would be considered unethical when it comes to collection and storage. The data provide pertains to sales which is typically readily available to all parties in the company. The data should only be analyzed by those who are performing the research needed to address the organizational data to ensure the data is kept clean. There are not any legal restrictions when it comes to sales data being shared among multiple parties. The only ethical or legal issues I see that could arise are if someone mishandled the data and used it to represent something that was false like stating sales were higher one month than they truly were.
5
Organizational Challenges/Solutions
Insert ‘Month’ column for Extracted Kansas City Store Tab to align with how Warehouse Data tab is listed
Format data provided on Extracted Kansas City Store tab to Numerical
Insert 3140 for Extracted Kansas City Store tab line 4 (3
rd
month from Kansas City Monthly Totals)
Insert 3030 for Extracted Kansas City Store tab line 9 (8
th
month from Kansas City Monthly Totals)
Insert 2970 for Extracted Kansas City Store tab line 19 (18
th
month from Kansas City Monthly Totals)
Insert 2965 for Extracted Kansas City Store tab line 26 (25
th
month from Kansas City Monthly Totals)
Verify total reported for the 28
th
month on the Kansas City Monthly Totals.
Insert verified value for Extracted Kansas City Store tab line 29 (28
th
month from Kansas City Monthly Totals)
Verify all totals listed on Extracted Kansas City Store tab with Kansas City Monthly word
doc
References
6
Couwenbergh, S. (2023, September 11). The importance of cleaning dirty data for improved operations and customer success
. Validity. https://www.validity.com/blog/dirty-data/#:~:text=Dirty%20data%2C%20or%20unclean
%20data,incomplete%2C%20inaccurate%2C%20or%20inconsistent.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Related Questions
please. help me answer this question. thank you
arrow_forward
tion 2 of 15
Last summer, the Smith family drove through seven different states and visited various popular landmarks. The prices of gasoline
in dollars per gallon varied from state to state and are listed below.
$2.34, $2.75, $2.48, $3.58, $2.87, $2.53, $3.31
Click to download the data in your preferred format.
CrunchIt! CSV Excel JMP Mac Text Minitab PC Text R SPSS TI Calc
Calculate the range of the price of gas. Give your solution to the nearest cent.
range:
dollars per gallon
DELL
&
4.
7
8.
arrow_forward
Vacation destinations often run on a seasonal basis, depending on the primary activities in that location. Amanda Wang is the owner of a travel agency in Cincinnati, Ohio. She has built a database of the number of vacation packages (Vacation) that she has sold over the last 12 years. The accompanying file includes quarterly data on the number of vacation packages sold.
Year
Quarter
Vacation
2008
1
500
2008
2
147
2008
3
1314
2008
4
673
2009
1
984
2009
2
584
2009
3
1449
2009
4
1125
2010
1
209
2010
2
458
2010
3
1357
2010
4
544
2011
1
483
2011
2
280
2011
3
1392
2011
4
780
2012
1
340
2012
2
1003
2012
3
1692
2012
4
783
2013
1
781
2013
2
409
2013
3
1657
2013
4
792
2014
1
671
2014
2
769
2014
3
1068
2014
4
520
2015
1
616
2015
2
775
2015
3
1215
2015
4
514
2016
1
232
2016
2
492
2016
3
1581
2016
4
1028
2017
1
551
2017
2
869
2017
3
1114
2017
4
476
2018
1
405
2018
2
412
2018
3
1121
2018
4
554
2019
1
666…
arrow_forward
. Primary Data Source and Secondary Data Source ?
arrow_forward
A company produces handmade shawls and afghans. They spin the yarn, dye it, and then weave it. A shawl requires
1 hour of spinning, 1 hour of dyeing, and 1 hour of weaving. An afghan requires 1 hour of spinning, 2 hours of dyeing,
and 4 hours of weaving. There are at most 8 hours available each day for spinning, 10 hours for dyeing, and 16 hours
for weaving. Summarize the data in a table. Use the table to write a system of inequalities. Then graph the feasible
region.
Complete the table.
Hours Hours
Number Spinning Dyeing
Shawls x
Afghans y
Maximum Number
of Hours Available
8
10
Hours
Weaving
16
arrow_forward
Please answer C. and D.
Palisades Eco-Park is a small ecological reserve that admits a relatively small number of visitors on any day, but provides both educational and entertaining lectures, exhibitions, and opportunities to observe nature. The company has collected the following data on labor costs and number of visitors to the park over the last 30 months.
Month
Labor Cost
Visitors
1
$
25,820
1,700
2
$
36,290
2,592
3
$
39,968
2,811
4
$
31,648
2,222
5
$
34,809
2,373
6
$
32,672
2,181
7
$
32,023
2,205
8
$
38,334
2,676
9
$
38,568
2,760
10
$
46,933
3,572
11
$
42,157
2,976
12
$
39,030
2,804
13
$
34,444
2,415
14
$
33,173
2,336
15
$
36,971
2,508
16
$
26,500
2,099
17
$
30,663
3,397
18
$
29,961
2,786
19
$
27,590
2,412
20
$
30,003
2,972
21
$
25,692
2,000
22
$
24,655
1,794
23
$
28,892
3,267
24
$
30,153
3,817
25
$
28,119
2,592
26
$
31,690
3,274
27
$
28,651
2,989
28
$
29,927
3,050
29
$
28,438
2,511
30
$
27,874
2,075…
arrow_forward
A company produces handmade shawls and afghans. They spin the yarn, dye it, and then weave it. A shawl requires 1 hour of spinning, 1 hour of dyeing, and 1 hour of
weaving. An afghan requires 1 hour of spinning, 4 hours of dyeing, and 2 hours of weaving. There are at most 6 hours available each day for spinning, 14 hours for
dyeing, and 8 hours for weaving. Summarize the data in a table. Use the table to write a system of inequalities. Then graph the feasible region.
Complete the table.
Hours
Hours
Hours
Number Spinning Dyeing Weaving
Shawls
Afghans
Maximum Number
y
14
8
of Hours Available
arrow_forward
United Dairies, Inc., supplies milk to several independent grocers throughout Dade County, Florida. Managers at United Dairies want to develop a forecast of the number of half-gallons of milk sold per week. Sales data for the past 12 weeks follow and have been transferred in the data file UDFMilk.xlsx download(also posted under week 11 module).
Week
Sales
Week
Sales
1
2750
7
3300
2
3100
8
3100
3
3250
9
2950
4
2800
10
3000
5
2900
11
3200
6
3050
12
3150
By Thursday, April 15 (11:55pm PST), answer the following 2 questions and attach a word doc or excel file to show your work.
Use the data in the file UDFMilk.xlsx downloadto construct a time series plot. What type of pattern exists in the data?
Use exponential smoothing with to develop a forecast of demand for week 13.
arrow_forward
Johnson Filtration, Inc. provides maintenance service for water-filtration systems. Suppose that in addition to information on the
number of months since the machine was serviced and whether a mechanical or an electrical repair was necessary, the managers
obtained a list showing which repairperson performed the service. The revised data follow.
Click on the datafile logo to reference the data.
DATA file
Repair Time
Months Since
in Hours
Last Service
Type of Repair
Repairperson
2.9
Electrical
Dave Newton
3.0
Mechanical
Dave Newton
4.8
8.
Electrical
Bob Jones
1.8
Mechanical
Dave Newton
2.9
Electrical
Dave Newton
4.9
Electrical
Bob Jones
4.2
6.
Mechanical
Bob Jones
4.8
8.
Mechanical
Bob Jones
4.4
4.
Electrical
Bob Jones
4.5
Electrical
Dave Newton
a. Ignore for now the months since the last maintenance service (1 ) and the repairperson who performed the service. Develop the
estimated simpe linear regression equation to predict the repair time (y) given the type of repair (2 ). Recall that…
arrow_forward
In IBM SPSS, what does clicking on this icon do?
arrow_forward
A1
1
2
3
7
5
5
7
B
9
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
0
1
2
A
Day
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
X
B
Discharged_Patients
A
54
63
110
105
131
137
80
63
75
92
105
112
120
95
72
128
126
106
129
136
94
74
107
135
124
113
140
83
62
106
fx
Day
с
Records_not_Processed
18
18
52
29
70
57
26
29
14
27
49
38
47
26
28
49
45
39
27
38
44
25
29
62
44
60
46
38
25
39
D
E
F
G
H
I
J
arrow_forward
Just need help with part 3, thank you!
arrow_forward
What is meant by strategic mapping, and why is this technique especially useful in healthcare strategic planning?
arrow_forward
I need help with this problem please.
arrow_forward
Which type of data source provides real-time data?
A. Historical data
B. Survey data
C. Transactional data
D. Social media data
arrow_forward
Create scatterplot using Excel
arrow_forward
Suppose you have access to a database with the variables listed below. Choose two variables you are interested in exploring that may be related (a predictor and outcome).
Which variable is your y (outcome) and which is your x (predictor)? *Hint: which one may predict the other? Example: hot weather may predict sunburns.
arrow_forward
SEE MORE QUESTIONS
Recommended textbooks for you

Elementary Geometry for College Students
Geometry
ISBN:9781285195698
Author:Daniel C. Alexander, Geralyn M. Koeberlein
Publisher:Cengage Learning
Related Questions
- please. help me answer this question. thank youarrow_forwardtion 2 of 15 Last summer, the Smith family drove through seven different states and visited various popular landmarks. The prices of gasoline in dollars per gallon varied from state to state and are listed below. $2.34, $2.75, $2.48, $3.58, $2.87, $2.53, $3.31 Click to download the data in your preferred format. CrunchIt! CSV Excel JMP Mac Text Minitab PC Text R SPSS TI Calc Calculate the range of the price of gas. Give your solution to the nearest cent. range: dollars per gallon DELL & 4. 7 8.arrow_forwardVacation destinations often run on a seasonal basis, depending on the primary activities in that location. Amanda Wang is the owner of a travel agency in Cincinnati, Ohio. She has built a database of the number of vacation packages (Vacation) that she has sold over the last 12 years. The accompanying file includes quarterly data on the number of vacation packages sold. Year Quarter Vacation 2008 1 500 2008 2 147 2008 3 1314 2008 4 673 2009 1 984 2009 2 584 2009 3 1449 2009 4 1125 2010 1 209 2010 2 458 2010 3 1357 2010 4 544 2011 1 483 2011 2 280 2011 3 1392 2011 4 780 2012 1 340 2012 2 1003 2012 3 1692 2012 4 783 2013 1 781 2013 2 409 2013 3 1657 2013 4 792 2014 1 671 2014 2 769 2014 3 1068 2014 4 520 2015 1 616 2015 2 775 2015 3 1215 2015 4 514 2016 1 232 2016 2 492 2016 3 1581 2016 4 1028 2017 1 551 2017 2 869 2017 3 1114 2017 4 476 2018 1 405 2018 2 412 2018 3 1121 2018 4 554 2019 1 666…arrow_forward
- . Primary Data Source and Secondary Data Source ?arrow_forwardA company produces handmade shawls and afghans. They spin the yarn, dye it, and then weave it. A shawl requires 1 hour of spinning, 1 hour of dyeing, and 1 hour of weaving. An afghan requires 1 hour of spinning, 2 hours of dyeing, and 4 hours of weaving. There are at most 8 hours available each day for spinning, 10 hours for dyeing, and 16 hours for weaving. Summarize the data in a table. Use the table to write a system of inequalities. Then graph the feasible region. Complete the table. Hours Hours Number Spinning Dyeing Shawls x Afghans y Maximum Number of Hours Available 8 10 Hours Weaving 16arrow_forwardPlease answer C. and D. Palisades Eco-Park is a small ecological reserve that admits a relatively small number of visitors on any day, but provides both educational and entertaining lectures, exhibitions, and opportunities to observe nature. The company has collected the following data on labor costs and number of visitors to the park over the last 30 months. Month Labor Cost Visitors 1 $ 25,820 1,700 2 $ 36,290 2,592 3 $ 39,968 2,811 4 $ 31,648 2,222 5 $ 34,809 2,373 6 $ 32,672 2,181 7 $ 32,023 2,205 8 $ 38,334 2,676 9 $ 38,568 2,760 10 $ 46,933 3,572 11 $ 42,157 2,976 12 $ 39,030 2,804 13 $ 34,444 2,415 14 $ 33,173 2,336 15 $ 36,971 2,508 16 $ 26,500 2,099 17 $ 30,663 3,397 18 $ 29,961 2,786 19 $ 27,590 2,412 20 $ 30,003 2,972 21 $ 25,692 2,000 22 $ 24,655 1,794 23 $ 28,892 3,267 24 $ 30,153 3,817 25 $ 28,119 2,592 26 $ 31,690 3,274 27 $ 28,651 2,989 28 $ 29,927 3,050 29 $ 28,438 2,511 30 $ 27,874 2,075…arrow_forward
- A company produces handmade shawls and afghans. They spin the yarn, dye it, and then weave it. A shawl requires 1 hour of spinning, 1 hour of dyeing, and 1 hour of weaving. An afghan requires 1 hour of spinning, 4 hours of dyeing, and 2 hours of weaving. There are at most 6 hours available each day for spinning, 14 hours for dyeing, and 8 hours for weaving. Summarize the data in a table. Use the table to write a system of inequalities. Then graph the feasible region. Complete the table. Hours Hours Hours Number Spinning Dyeing Weaving Shawls Afghans Maximum Number y 14 8 of Hours Availablearrow_forwardUnited Dairies, Inc., supplies milk to several independent grocers throughout Dade County, Florida. Managers at United Dairies want to develop a forecast of the number of half-gallons of milk sold per week. Sales data for the past 12 weeks follow and have been transferred in the data file UDFMilk.xlsx download(also posted under week 11 module). Week Sales Week Sales 1 2750 7 3300 2 3100 8 3100 3 3250 9 2950 4 2800 10 3000 5 2900 11 3200 6 3050 12 3150 By Thursday, April 15 (11:55pm PST), answer the following 2 questions and attach a word doc or excel file to show your work. Use the data in the file UDFMilk.xlsx downloadto construct a time series plot. What type of pattern exists in the data? Use exponential smoothing with to develop a forecast of demand for week 13.arrow_forwardJohnson Filtration, Inc. provides maintenance service for water-filtration systems. Suppose that in addition to information on the number of months since the machine was serviced and whether a mechanical or an electrical repair was necessary, the managers obtained a list showing which repairperson performed the service. The revised data follow. Click on the datafile logo to reference the data. DATA file Repair Time Months Since in Hours Last Service Type of Repair Repairperson 2.9 Electrical Dave Newton 3.0 Mechanical Dave Newton 4.8 8. Electrical Bob Jones 1.8 Mechanical Dave Newton 2.9 Electrical Dave Newton 4.9 Electrical Bob Jones 4.2 6. Mechanical Bob Jones 4.8 8. Mechanical Bob Jones 4.4 4. Electrical Bob Jones 4.5 Electrical Dave Newton a. Ignore for now the months since the last maintenance service (1 ) and the repairperson who performed the service. Develop the estimated simpe linear regression equation to predict the repair time (y) given the type of repair (2 ). Recall that…arrow_forward
- In IBM SPSS, what does clicking on this icon do?arrow_forwardA1 1 2 3 7 5 5 7 B 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 A Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 X B Discharged_Patients A 54 63 110 105 131 137 80 63 75 92 105 112 120 95 72 128 126 106 129 136 94 74 107 135 124 113 140 83 62 106 fx Day с Records_not_Processed 18 18 52 29 70 57 26 29 14 27 49 38 47 26 28 49 45 39 27 38 44 25 29 62 44 60 46 38 25 39 D E F G H I Jarrow_forwardJust need help with part 3, thank you!arrow_forward
arrow_back_ios
SEE MORE QUESTIONS
arrow_forward_ios
Recommended textbooks for you
- Elementary Geometry for College StudentsGeometryISBN:9781285195698Author:Daniel C. Alexander, Geralyn M. KoeberleinPublisher:Cengage Learning

Elementary Geometry for College Students
Geometry
ISBN:9781285195698
Author:Daniel C. Alexander, Geralyn M. Koeberlein
Publisher:Cengage Learning