Case Study 5- Data Curation Project
Date: Apr 3, 2024
Running Head: Case Study 5- Data Curation Project
UNIVERSITY OF THE POTOMAC
DACS640:7: Online Data Integration, Warehousing, Provenance, and Analysis
Dr. Daryl R. Brydie
Megha Makol
Case Study 5- Data Curation Project
Abstract
As data environments grow in variety and volume, organizations need processes and technologies
that let them produce and maintain high-quality data while facilitating data reuse, accessibility,
and analysis. Data curation infrastructures play an important role in addressing challenges
common to many data production and consumption environments. Recent changes in the data
landscape have placed new demands on data curation processes and technologies. This document
examines how the emerging big data landscape is defining new requirements for data curation
infrastructures, as well as how curation infrastructures are evolving to meet these challenges.
Introduction
One of the fundamental principles of data analytics is that the quality of the analysis is
determined by the quality of the data being analyzed. According to Gartner, more than 25% of
critical data in the world's top companies is flawed (Gartner 2007). Data quality issues can have
a significant impact on business operations, particularly when it comes to organizational
decision-making processes. Data curation offers methodological and technological data
management support to address data quality issues while maximizing data usability. In the words
of Cragin et al. (2007), "Data curation is the active and on-going management of data through its
lifecycle of interest and usefulness; … curation activities enable data discovery and retrieval,
maintain quality, add value, and provide for re-use over time." As the number of data sources
and platforms for data generation grows, data curation emerges as a critical data management
process.
The figure below depicts the position of big data curation within the overall big data value chain.
Content creation, selection, classification, transformation, validation, and preservation are all
examples of data curation processes. Selecting and implementing a data curation process is a
multidimensional problem shaped by the interaction of incentives, economics, standards, and
technology. This document examines the data dynamics within which data curation operates,
investigates future data curation requirements and emerging trends, and briefly describes
exemplar case studies (Data Value Chain – Data Economy).
For this data curation project, I intend to use data normalization to clean and integrate multiple
datasets. Normalizing the data will let me create a single dataset with consistent and uniform
data points, which in turn will allow me to analyze the data more precisely and produce more
meaningful results. To demonstrate this technique, I will use a dataset of public-school enrollment
statistics from across the United States. This dataset includes information on the number of
students enrolled in various grade levels, the number of educators, the number of schools, and
other demographic data (White).
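The cleaning and integration step described above might be sketched as follows. This is only an illustration under stated assumptions: the two source layouts, the field names, and the helper functions `clean_year`, `clean_count`, and `integrate` are hypothetical and do not come from the source dataset. The counts are the 1999-00 and 2000-01 figures from Table 1.1.

```python
import re

def clean_year(label: str) -> str:
    """Normalize school-year labels such as '1999-2000' or '1999/00' to '1999-00'."""
    start, end = re.split(r"[-/]", label.strip())
    return f"{start}-{end[-2:]}"

def clean_count(value) -> int:
    """Strip thousands separators and whitespace, then convert to an integer."""
    return int(str(value).replace(",", "").strip())

# Hypothetical source A: student counts with comma separators and long year labels.
students_src = [
    {"year": "1999-2000", "students": "342,840"},
    {"year": "2000-2001", "students": "448,362"},
]

# Hypothetical source B: school counts with a different key name and year format.
schools_src = [
    {"Year": "1999/00", "schools": 1526},
    {"Year": "2000/01", "schools": 1989},
]

def integrate(students_rows, schools_rows):
    """Merge both sources into one list of records keyed by the cleaned year."""
    merged = {}
    for row in students_rows:
        year = clean_year(row["year"])
        merged.setdefault(year, {})["students"] = clean_count(row["students"])
    for row in schools_rows:
        year = clean_year(row["Year"])
        merged.setdefault(year, {})["schools"] = clean_count(row["schools"])
    return [{"year": year, **values} for year, values in sorted(merged.items())]

combined = integrate(students_src, schools_src)
for record in combined:
    print(record)
```

Keying the merge on a cleaned year label is one simple way to make records from differently formatted sources line up; a real project would also need to decide how to handle years present in only one source.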
First, I will normalize the data by converting all values to a standardized unit of measurement.
This can be accomplished by expressing numerical values as percentages or as a common ratio,
such as students per school. Following that, I will combine the data from the various sources into
a single table. This combined dataset will contain the same values as the original sources, but in
a more consistent and uniform format.
TABLE 1.1: NUMBER OF CHARTER STUDENTS AND SCHOOLS FROM 1992-93 THROUGH 2020-21
Year     Charter Students  Students % Change  Charter Schools  Schools % Change  Closed  Open  Open/Closed Same Year  Share of Public Students
1992-93  0                 -                  1                -                 0       1     0                      0.0%
1993-94  6193              -                  23               2200.0%           0       22    0                      0.0%
1994-95  21100             240.7%             68               195.7%            0       45    0                      0.0%
1995-96  34939             65.6%              135              98.5%             0       67    0                      0.1%
1996-97  55200             58.0%              217              60.7%             0       82    0                      0.1%
1997-98  83908             52.0%              354              63.1%             1       137   0                      0.2%
1998-99  168864            101.2%             660              86.4%             0       278   29                     0.4%
1999-00  342840            103.0%             1526             131.2%            23      846   49                     0.7%
2000-01  448362            30.8%              1989             30.3%             72      493   42                     0.9%
2001-02  571197            27.4%              2347             18.0%             118     448   24                     1.2%
2002-03  667002            16.8%              2579             9.9%              114     357   17                     1.4%
2003-04  790496            18.5%              2966             15.0%             140     483   35                     1.6%
2004-05  888048            12.3%              3381             14.0%             150     561   29                     1.8%
2005-06  1020533           14.9%              3770             11.5%             189     504   64                     2.1%
2006-07  1160003           13.7%              4079             8.2%              147     536   26                     2.3%
2007-08  1278106           10.2%              4393             7.7%              159     458   29                     2.6%
2008-09  1438509           12.6%              4727             7.6%              168     497   25                     2.9%
2009-10  1611568           12.0%              5033             6.5%              201     470   29                     3.2%
2010-11  1792997           11.3%              5342             6.1%              205     513   26                     3.6%
2011-12  2060138           14.9%              5798             8.5%              192     624   63                     4.1%
2012-13  2271860           10.3%              6140             5.9%              285     568   29                     4.5%
2013-14  2527799           11.3%              6566             6.9%              260     712   27                     5.0%
2014-15  2723622           7.7%               6882             4.8%              360     560   43                     5.4%
2015-16  2866814           5.3%               7057             2.5%              226     560   18                     5.7%
2016-17  3033344           5.8%               7227             2.4%              290     393   21                     6.0%
2017-18  3170471           4.5%               7349             1.7%              307     406   27                     6.2%
2018-19  3323014           4.8%               7581             3.2%              320     545   21                     6.5%
2019-20  3456978           4.0%               7697             1.5%              180     436   21                     6.8%
2020-21  3695769           6.9%               7821             1.6%              0       325   0                      7.5%
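The normalization described before Table 1.1 (expressing values as a common ratio such as students per school, or as percentages) can be illustrated with a short sketch. The three rows are copied from the last three years of Table 1.1; the function and field names are illustrative assumptions, not part of the source dataset.

```python
# (year, charter students, charter schools) copied from Table 1.1
rows = [
    ("2018-19", 3323014, 7581),
    ("2019-20", 3456978, 7697),
    ("2020-21", 3695769, 7821),
]

def students_per_school(students, schools):
    """Ratio that puts years with different school counts on a common scale."""
    return students / schools if schools else None

def percent_change(previous, current):
    """Year-over-year change as a percentage; None when there is no valid baseline."""
    if not previous:
        return None
    return (current - previous) / previous * 100

normalized = []
previous_students = None
for year, students, schools in rows:
    normalized.append({
        "year": year,
        "students_per_school": students_per_school(students, schools),
        "students_pct_change": percent_change(previous_students, students),
    })
    previous_students = students

for record in normalized:
    print(record)
```

Rounded to one decimal place, the computed percent changes for 2019-20 and 2020-21 agree with the 4.0% and 6.9% reported in the table, which is a useful consistency check on the source figures. Returning `None` when the baseline is zero or missing mirrors the empty percent-change cells in the earliest rows.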
Finally, I will draw insights and conclusions from the data using a variety of data analysis
techniques. This could include examining the relationship between student demographics and
academic performance, comparing enrollment trends across states, or analyzing the relationship
between student achievement and teacher experience.
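As one small example of trend analysis on the curated table, the sketch below compares average annual growth in charter students and charter schools between 2010-11 and 2020-21. The choice of compound annual growth rate and the helper name `cagr` are assumptions made for illustration; the input values come from Table 1.1.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two observations `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Values from Table 1.1 for 2010-11 and 2020-21 (ten school years apart).
students_2010_11, students_2020_21 = 1792997, 3695769
schools_2010_11, schools_2020_21 = 5342, 7821

student_growth = cagr(students_2010_11, students_2020_21, years=10)
school_growth = cagr(schools_2010_11, schools_2020_21, years=10)

print(f"Average annual growth in charter students: {student_growth:.1%}")
print(f"Average annual growth in charter schools:  {school_growth:.1%}")
```

Because enrollment grew faster than the number of schools over this period, the average charter school became larger, which is the kind of observation the normalized students-per-school ratio makes easy to see.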
By using data normalization to clean and integrate multiple datasets, I will be able to gain a
better understanding of public-school enrollment trends and draw meaningful conclusions from
the data (Kidd).
Conclusion
Data normalization is a technique for cleaning and integrating multiple datasets by ensuring that
all data points are consistent and uniform. This is accomplished by converting all the data values
to a standard unit of measurement, such as percentages. Once the data has been formatted
consistently, it can be combined into a single dataset. All the same data points will be included in
this single dataset, but in a more organized and consistent format.
After the data has been normalized and combined into a single dataset, a variety of data analysis
techniques can be used to extract insights from it. This could include examining the relationship
between student demographics and academic performance, comparing enrollment trends across
states, or analyzing the relationship between student achievement and teacher experience. By
using data normalization to clean and integrate multiple datasets, I will be able to gain a better
understanding of public-school enrollment trends and draw meaningful conclusions from the data.
References
1. Data Value Chain – Data Economy. dataeconomy.eu/data-value-chain/#page-content.
2. Kidd, Chrissy. “Data Normalization Explained: How to Normalize Data.” Splunk-Blogs, 28 Oct. 2022, www.splunk.com/en_us/blog/learn/data-normalization.html.
3. White, Jamison. “1. How Many Charter Schools and Students Are There?” National Alliance for Public Charter Schools, data.publiccharters.org/digest/charter-school-data-digest/how-many-charter-schools-and-students-are-there.