Case Study 5- Data Curation Project

Course

640

Subject

Information Systems

Date

Apr 3, 2024

Running Head: Case Study 5- Data Curation Project
UNIVERSITY OF THE POTOMAC
DACS640:7: Online Data Integration, Warehousing, Provenance, and Analysis
Dr. Daryl R. Brydie
Megha Makol
Abstract
As data environments grow in variety and volume, organizations need processes and technologies that let them produce and maintain high-quality data while supporting data reuse, accessibility, and analysis. Data curation infrastructures address challenges that are common to many data production and consumption environments in modern data management. Recent changes in the data landscape have placed significant new demands on data curation processes and technologies. This document examines how the emerging big data landscape is defining new requirements for data curation infrastructures, and how those infrastructures are evolving to meet them.
Introduction
One of the fundamental principles of data analytics is that the quality of an analysis is determined by the quality of the data being analyzed. According to Gartner, more than 25% of critical data in the world's top companies is flawed (Gartner 2007). Data quality issues can have a significant impact on business operations, particularly on organizational decision-making. Data curation offers methodological and technological data management support to address data quality issues while maximizing data usability. In the words of Cragin et al. (2007), "Data curation is the active and on-going management of data through its lifecycle of interest and usefulness; … curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time". As the number of data sources and platforms for data generation grows, data curation emerges as a critical data management process. The figure below depicts the position of big data curation within the overall big data value chain. Data curation processes include content creation, selection, classification, transformation, validation, and preservation. Selecting and implementing a data curation process is a multidimensional problem shaped by the interaction of incentives, economics, standards, and technology. This document examines the data dynamics in which data curation operates, investigates future data curation requirements and emerging trends, and briefly describes exemplar case studies.
(Data Value Chain – Data Economy)
I intend to use data normalization to clean and integrate multiple datasets for this data curation project. Normalizing the data yields a single dataset with consistent, uniform data points, which allows the data to be analyzed more precisely and produces more meaningful results. To demonstrate the technique, I'll use a dataset of public-school enrollment statistics from across the United States. The dataset includes the number of students enrolled in various grade levels, the number of educators, the number of schools, and other demographic data (White).
First, I'll normalize the data by converting all values to a standard unit of measurement, for example by expressing numerical values as percentages or as a ratio such as students per school. Following that, I will combine the data from the various sources into a single table. This single dataset will contain the same values as the original sources, but in a more consistent and uniform format.
TABLE 1.1: NUMBER OF CHARTER STUDENTS AND SCHOOLS FROM 1992-93 THROUGH 2020-21
Year | Charter Students | Percent Change | Charter Schools | Percent Change | Closed | Open | Open/Closed Same Year | Share of Public Students
1992-93 | 0 | | 1 | | 0 | 1 | 0 | 0.0%
1993-94 | 6193 | | 23 | 2200.0% | 0 | 22 | 0 | 0.0%
1994-95 | 21100 | 240.7% | 68 | 195.7% | 0 | 45 | 0 | 0.0%
1995-96 | 34939 | 65.6% | 135 | 98.5% | 0 | 67 | 0 | 0.1%
1996-97 | 55200 | 58.0% | 217 | 60.7% | 0 | 82 | 0 | 0.1%
1997-98 | 83908 | 52.0% | 354 | 63.1% | 1 | 137 | 0 | 0.2%
1998-99 | 168864 | 101.2% | 660 | 86.4% | 0 | 278 | 29 | 0.4%
1999-00 | 342840 | 103.0% | 1526 | 131.2% | 23 | 846 | 49 | 0.7%
2000-01 | 448362 | 30.8% | 1989 | 30.3% | 72 | 493 | 42 | 0.9%
2001-02 | 571197 | 27.4% | 2347 | 18.0% | 118 | 448 | 24 | 1.2%
2002-03 | 667002 | 16.8% | 2579 | 9.9% | 114 | 357 | 17 | 1.4%
2003-04 | 790496 | 18.5% | 2966 | 15.0% | 140 | 483 | 35 | 1.6%
2004-05 | 888048 | 12.3% | 3381 | 14.0% | 150 | 561 | 29 | 1.8%
2005-06 | 1020533 | 14.9% | 3770 | 11.5% | 189 | 504 | 64 | 2.1%
2006-07 | 1160003 | 13.7% | 4079 | 8.2% | 147 | 536 | 26 | 2.3%
2007-08 | 1278106 | 10.2% | 4393 | 7.7% | 159 | 458 | 29 | 2.6%
2008-09 | 1438509 | 12.6% | 4727 | 7.6% | 168 | 497 | 25 | 2.9%
2009-10 | 1611568 | 12.0% | 5033 | 6.5% | 201 | 470 | 29 | 3.2%
2010-11 | 1792997 | 11.3% | 5342 | 6.1% | 205 | 513 | 26 | 3.6%
2011-12 | 2060138 | 14.9% | 5798 | 8.5% | 192 | 624 | 63 | 4.1%
2012-13 | 2271860 | 10.3% | 6140 | 5.9% | 285 | 568 | 29 | 4.5%
2013-14 | 2527799 | 11.3% | 6566 | 6.9% | 260 | 712 | 27 | 5.0%
2014-15 | 2723622 | 7.7% | 6882 | 4.8% | 360 | 560 | 43 | 5.4%
2015-16 | 2866814 | 5.3% | 7057 | 2.5% | 226 | 560 | 18 | 5.7%
2016-17 | 3033344 | 5.8% | 7227 | 2.4% | 290 | 393 | 21 | 6.0%
2017-18 | 3170471 | 4.5% | 7349 | 1.7% | 307 | 406 | 27 | 6.2%
2018-19 | 3323014 | 4.8% | 7581 | 3.2% | 320 | 545 | 21 | 6.5%
2019-20 | 3456978 | 4.0% | 7697 | 1.5% | 180 | 436 | 21 | 6.8%
2020-21 | 3695769 | 6.9% | 7821 | 1.6% | 0 | 325 | 0 | 7.5%

Finally, I will draw insights and conclusions from the data using a variety of data analysis techniques. This could include examining the relationship between student demographics and academic performance, comparing enrollment trends across states, or analyzing the relationship between student achievement and teacher experience.
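As a rough illustration of the normalization step, the sketch below takes three rows from Table 1.1 and derives two comparable measures: students per school and year-over-year percent change. The function and field names are assumptions made for this example and are not part of the original project. Recomputing the percent changes also serves as a simple check against the values reported in the table.

```python
# Rows copied from Table 1.1: (school year, charter students, charter schools).
ROWS = [
    ("1994-95", 21100, 68),
    ("1995-96", 34939, 135),
    ("1996-97", 55200, 217),
]


def students_per_school(students, schools):
    """Express enrollment in a common unit: students per school."""
    return students / schools


def percent_change(previous, current):
    """Year-over-year change in percent; None when the base value is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100


def rounded_change(previous, current):
    """Percent change rounded to one decimal place, or None if undefined."""
    change = percent_change(previous, current)
    return None if change is None else round(change, 1)


def normalize(rows):
    """Return one record per year holding the derived, comparable measures."""
    records = []
    previous = None  # the preceding row, used as the base for percent change
    for year, students, schools in rows:
        record = {
            "year": year,
            "students_per_school": round(students_per_school(students, schools), 1),
            "student_change_pct": None,
            "school_change_pct": None,
        }
        if previous is not None:
            record["student_change_pct"] = rounded_change(previous[1], students)
            record["school_change_pct"] = rounded_change(previous[2], schools)
        records.append(record)
        previous = (year, students, schools)
    return records


for record in normalize(ROWS):
    print(record)
```

For 1995-96 and 1996-97 the recomputed changes (65.6% and 58.0% for students, 98.5% and 60.7% for schools) match the Percent Change columns of Table 1.1. The first row has no preceding year in this excerpt, so its change fields stay empty.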
By using data normalization to clean and integrate multiple datasets, I will be able to gain a better understanding of public-school enrollment trends and draw meaningful conclusions from the data (Kidd).
Conclusion
Data normalization is a technique for cleaning and integrating multiple datasets by ensuring that all data points are consistent and uniform. This is accomplished by converting all the data values to a standard unit of measurement, such as percentages. Once the data has been formatted consistently, it can be combined into a single dataset that includes all the same data points in a more organized and consistent format. A variety of data analysis techniques can then be used to extract insights from it, such as examining the relationship between student demographics and academic performance, comparing enrollment trends across states, or analyzing the relationship between student achievement and teacher experience. Applied to the enrollment data, this approach supports a better understanding of public-school enrollment trends and more meaningful conclusions.
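The combining step can be sketched in the same spirit. The example below assumes, purely for illustration, that student counts and school counts arrive from two sources that label years differently (a school-year label such as "1994-95" versus a starting year such as 1994). The values are taken from Table 1.1, but the split into two sources, the key formats, and all names are hypothetical.

```python
# Two illustrative "sources" holding Table 1.1 values under different year keys.
# The split into two sources and both key formats are assumptions for this sketch.
STUDENTS_BY_SCHOOL_YEAR = {"1994-95": 21100, "1995-96": 34939, "1996-97": 55200}
SCHOOLS_BY_START_YEAR = {1994: 68, 1995: 135, 1996: 217}


def start_year(label):
    """Convert a school-year label such as '1994-95' to its starting year (1994)."""
    return int(label.split("-")[0])


def combine(students_by_label, schools_by_year):
    """Join both sources on the starting year; keep only years present in both."""
    students_by_year = {
        start_year(label): count for label, count in students_by_label.items()
    }
    shared_years = sorted(students_by_year.keys() & schools_by_year.keys())
    return [
        {
            "start_year": year,
            "charter_students": students_by_year[year],
            "charter_schools": schools_by_year[year],
        }
        for year in shared_years
    ]


for row in combine(STUDENTS_BY_SCHOOL_YEAR, SCHOOLS_BY_START_YEAR):
    print(row)
```

Joining only on years present in both sources is one possible design choice. It keeps the combined table free of partially filled rows, at the cost of silently dropping years that appear in only one source.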
References:
1. Data Value Chain – Data Economy. dataeconomy.eu/data-value-chain/#page-content.
2. Kidd, Chrissy. “Data Normalization Explained: How to Normalize Data.” Splunk-Blogs, 28 Oct. 2022, www.splunk.com/en_us/blog/learn/data-normalization.html.
3. White, Jamison. “1. How Many Charter Schools and Students Are There?” National Alliance for Public Charter Schools, data.publiccharters.org/digest/charter-school-data-digest/how-many-charter-schools-and-students-are-there.