DAT 325 Project One
School: Southern New Hampshire University
Course: 325
Subject: Industrial Engineering
Date: Dec 6, 2023
DAT 325 Project One Template
Data Quality Plan
Purpose Statement:
Obtaining and maintaining high-quality data is vital to making data-driven decisions. Without it, the
decisions we make and the processes we adopt based on data analysis may be flawed. Acting on incomplete
or inaccurate data can lead to inefficiencies, lost revenue, and wasted time that could have been spent
on other projects and initiatives with better results. This risk should be taken seriously, and it is
within our control.
Organizational Goals:
Before joining the data received from Wayne Enterprises to our system, we must ensure that it adheres to
our data quality requirements. If we do not ensure that high-quality data enters our system from the
start, we may face incomplete or inaccurate data down the line, which will likely have already affected
business decisions made along the way. We must also establish a standard method by which we extract,
transform, and load (ETL) the data. This method must be reproducible so that we receive the same
high-quality data into our system each time. Lastly, we must ensure that those handling the ETL process
are aligned on what high data quality means. Stated more simply, we must:
1. Complete an initial data assessment – this will provide insight into the data's state before it joins
our system, and will allow us as an organization to locate obstacles and address areas of
opportunity in source data quality.
2. Create a process for ETL of the source data – a standardized ETL procedure will ensure
that issues with data quality and integrity are addressed each time we load new data. A
repeatable process will help us maintain high data quality standards throughout the ETL process
and minimize data quality loss.
3. Align the organization with data quality expectations – having each stakeholder involved in the
ETL process aligned with data quality expectations will ensure that we continue to follow
industry and organizational standards. The job of ensuring data quality does not begin and end
at the analyst level, so all parties who work with the data should also be involved in the data
quality process. This alignment will be achieved through initial and follow-up training and regular
audits of the data at each step in the ETL process.
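As an illustration of the first goal, an initial data assessment can start with a simple per-column profile of the incoming records. The sketch below is a minimal example under stated assumptions: the records are already parsed into dictionaries, and the field names (customer_id, email, signup_date) are hypothetical and do not come from the Wayne Enterprises data.

```python
from collections import Counter


def assess(records, missing=("", None)):
    """Profile each column: row count, missing count, and observed value types."""
    # Collect every column name that appears in any record.
    columns = sorted({key for row in records for key in row})
    report = {}
    for col in columns:
        # A column absent from a row counts as missing (row.get returns None).
        values = [row.get(col) for row in records]
        present = [v for v in values if v not in missing]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical sample records, for illustration only.
sample = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2023-01-05"},
    {"customer_id": 2, "email": "", "signup_date": "2023-02-11"},
    {"customer_id": "3", "email": "c@example.com"},
]

for column, stats in assess(sample).items():
    print(column, stats)
```

Running the same function on every load is one way to keep the assessment repeatable: a column whose missing count rises, or whose type mix changes, is flagged before the data is joined.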
Data Quality Characteristics and Procedures:
Data quality can be measured by many characteristics; the typical measures are completeness,
timeliness, validity, consistency, and integrity.
Completeness refers to the amount of data populated, measured against the total possible data
entries for a specific category (Gawande, 2022), essentially checking for missing records.
Timeliness measures the time between an actual event occurring and the data for that event being
captured in the system and made available for use (Gawande, 2022). A significant lag in capturing
data can cause downstream processes to suffer due to missing data.
Validity measures the closeness of the data value to the predetermined values or calculations
(Gawande, 2022). Invalid data, such as a value of the wrong data type, can cause issues with
downstream calculations.
Consistency measures how closely your data aligns with another dataset or a reference dataset
(Gawande, 2022). When adding data to an existing dataset, the data being loaded should be
consistent with previous entries in number of values and data types.
Integrity measures the degree to which a defined relational constraint is implemented between
two datasets (Gawande, 2022). Cardinality and referential integrity should be considered
when adding new data to existing datasets.
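The five characteristics above can each be expressed as a simple ratio between 0 and 1. The following is a minimal sketch, not a definitive implementation: the function names, the order and customer records, and the one-hour lag threshold are all assumptions made for illustration, and each organization would substitute its own rules.

```python
from datetime import datetime, timedelta


def completeness(rows, field):
    """Share of rows in which `field` is populated (not None or empty)."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows) if rows else 0.0


def timeliness(rows, event_field, loaded_field, max_lag):
    """Share of rows captured within `max_lag` of the event they describe."""
    on_time = sum(1 for r in rows if r[loaded_field] - r[event_field] <= max_lag)
    return on_time / len(rows) if rows else 0.0


def validity(rows, field, is_valid):
    """Share of rows whose `field` passes the caller-supplied rule `is_valid`."""
    valid = sum(1 for r in rows if is_valid(r.get(field)))
    return valid / len(rows) if rows else 0.0


def schema(row):
    """Map each field name to the type name of its value."""
    return {name: type(value).__name__ for name, value in row.items()}


def consistency(rows, reference_row):
    """Share of rows with the same fields and value types as a reference row."""
    expected = schema(reference_row)
    matching = sum(1 for r in rows if schema(r) == expected)
    return matching / len(rows) if rows else 0.0


def integrity(child_rows, fk, parent_rows, pk):
    """Share of child rows whose foreign key exists in the parent dataset."""
    parent_keys = {p[pk] for p in parent_rows}
    linked = sum(1 for c in child_rows if c.get(fk) in parent_keys)
    return linked / len(child_rows) if child_rows else 0.0


# Hypothetical data: four orders referencing two known customers.
customers = [{"customer_id": 10}, {"customer_id": 11}]
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0,
     "event": datetime(2023, 12, 1, 9, 0), "loaded": datetime(2023, 12, 1, 9, 30)},
    {"order_id": 2, "customer_id": 11, "amount": -5.0,
     "event": datetime(2023, 12, 1, 10, 0), "loaded": datetime(2023, 12, 2, 12, 0)},
    {"order_id": 3, "customer_id": 99, "amount": None,
     "event": datetime(2023, 12, 1, 11, 0), "loaded": datetime(2023, 12, 1, 11, 5)},
    {"order_id": 4, "customer_id": 10, "amount": 40.0,
     "event": datetime(2023, 12, 1, 12, 0), "loaded": datetime(2023, 12, 1, 12, 20)},
]

print(completeness(orders, "amount"))                              # 0.75
print(timeliness(orders, "event", "loaded", timedelta(hours=1)))   # 0.75
print(validity(orders, "amount",
               lambda v: isinstance(v, float) and v >= 0))         # 0.5
print(consistency(orders, orders[0]))                              # 0.75
print(integrity(orders, "customer_id", customers, "customer_id"))  # 0.75
```

Scoring every dimension on the same 0-to-1 scale makes it straightforward to set a threshold per dimension and to compare results from one load to the next.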
Security and Personnel Responsibility Plan:
Although many stakeholders are involved in data quality, and everyone involved in the data analysis is
expected to take part in it, security standards and requirements limit that involvement. Access to the
data must be restricted to those with a specific business need, in order to protect personal or
sensitive data and comply with industry regulations. This poses some significant challenges. For
example, data consumers need to understand what high-quality data means for the business functions
that use it, yet their access to the source data may be limited by the sensitivity level of the data
being analyzed. With the industry's shift to a cloud computing model, the security of our data is more
important than ever.
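One way to reconcile restricted access with broad involvement in data quality is to give each role a masked view of the data. The sketch below assumes a hypothetical policy: the role names, the sensitive field names, and the masking value are all illustrative and are not part of this plan's requirements.

```python
# Hypothetical policy: fields treated as sensitive, and which of them each role may see.
SENSITIVE_FIELDS = {"ssn", "email"}
ROLE_ACCESS = {
    "data_steward": {"ssn", "email"},
    "analyst": {"email"},
    "report_viewer": set(),
}


def restricted_view(row, role):
    """Return a copy of `row` with sensitive fields the role may not see masked."""
    if role not in ROLE_ACCESS:
        raise PermissionError(f"unknown role: {role!r}")
    allowed = ROLE_ACCESS[role]
    return {
        # Non-sensitive fields pass through; sensitive ones only if the role allows.
        field: value if field not in SENSITIVE_FIELDS or field in allowed else "***"
        for field, value in row.items()
    }


# Hypothetical record, for illustration only.
record = {"customer_id": 10, "email": "a@example.com", "ssn": "123-45-6789"}

print(restricted_view(record, "analyst"))
print(restricted_view(record, "report_viewer"))
```

Because masked fields still appear as populated, checks such as completeness can still be run by roles that cannot read the underlying values.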
References:
Gawande, S. (2022, February 22). A Guide for Data Quality (DQ) and 6 Data Quality Dimensions. ICEDQ.
https://icedq.com/6-data-quality-dimensions#what_is_integrity_data_quality_dimension