LearningJournal1
School: Universidade Federal do Estado do Rio de Janeiro - UNIRIO
Course: 3440
Subject: Information Systems
Date: Nov 24, 2024
Type: docx
Pages: 3
Uploaded by mthlucena
Reflect on the learning from this week around the fundamentals of big data
and respond to the following:
● Compare and contrast the three base elements of big data (volume,
velocity, and variety).
● What role do you feel data quality plays in the overall importance of big
data collection and analysis? How does it impact these three base elements?
The Learning Journal entry should be a minimum of 500 words and not more
than 750 words. Use APA citations and references if you use ideas from the
readings or other sources.
This assignment will be assessed by your instructor using the rubric below.
Hello, Instructor Jeff Wolgast.
I will use this space below to submit my work for the Learning Journal.
The principles of big data revolve around three key elements that start with
the letter V: volume, velocity, and variety. Together, these components define
the unique characteristics and challenges of managing and analyzing
large-scale datasets. Data quality also plays a crucial role in the overall
importance of big data collection and analysis, as it directly affects the
accuracy, reliability, efficiency, and usefulness of the insights derived from
the data.
Comparing and contrasting these three main elements of Big Data:
Volume refers to the vast amount of data generated and collected, which is
usually too large for traditional data management systems to handle. The scale
and magnitude of data are therefore a defining part of big data, which aims to
handle petabytes, exabytes, or even zettabytes of information. Examples include
social media posts, sensor data, customer transactions, and log files
(Taylor, 2022).
Velocity relates to the speed at which data is generated, processed, and
analyzed. It underscores the real-time or near real-time nature of data
streams. Velocity differs from volume in that it emphasizes how quickly data
flows in and is processed rather than the sheer size of the data
(Oracle, n.d.). Examples include data from IoT devices, social media feeds,
financial market transactions, and website clickstreams. The velocity of data
requires efficient data ingestion, processing, and analysis techniques to
derive timely insights.
Variety highlights the diversity of data types and sources that exist and may
be used in big data. Data is commonly classified into three categories:
structured, semi-structured, and unstructured. Structured data has a
predefined format and fixed schema, such as the tables of a relational
database. Semi-structured data exhibits some organizational structure but does
not adhere to a rigid schema, and is commonly found in XML or JSON files.
Unstructured data lacks a predefined format and includes text, multimedia
(images, videos), and social media posts. Big data's variety is driven by the
need to analyze data from various sources and formats, enabling a
comprehensive view of information (Taylor, 2022). Variety is concerned with
handling different formats and sources of data, in contrast with volume and
velocity, which focus on the scale and speed of data.
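To make the three categories more concrete, here is a minimal Python sketch. The
sample records, field names, and email address are hypothetical and were
invented only for illustration; they do not come from the readings.

```python
import csv
import io
import json

# Structured: a fixed schema, every row has the same columns (hypothetical sample).
structured_csv = "customer_id,name,country\n1,Ana,BR\n2,Liam,US\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing JSON in which fields may vary per record.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["vip"]}, {"id": 2, "email": "liam@example.com"}]'
)
field_sets = [sorted(record.keys()) for record in semi_structured]

# Unstructured: free text with no schema; any structure has to be derived.
unstructured = "Loved the fast delivery, but the box arrived damaged."
tokens = unstructured.lower().replace(",", "").replace(".", "").split()

print(rows[0]["name"])  # column access works because the schema is fixed
print(field_sets)       # the two JSON records expose different fields
print(len(tokens))      # words recovered from the free text
```

The point of the sketch is that each category needs a different way of reading
it: column lookup for structured data, per-record field inspection for
semi-structured data, and some form of text processing for unstructured data.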
Below, I describe in more detail how data quality is a relevant concern for
each of the key elements of big data:
For volume, poor data quality, such as duplicate, inconsistent, or erroneous data, can
significantly impact storage requirements, processing times, and overall
efficiency. High data quality ensures that only relevant and accurate data is
stored and processed, minimizing storage costs and enhancing the
effectiveness of data analysis. For example, in a customer database, data
quality measures can help eliminate duplicate records, ensuring an accurate
representation of the customer base.
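The duplicate-record example can be sketched in a few lines of Python. This is
only an illustration under my own assumptions: the customer records are
invented, and I assume that a trimmed, lower-cased email address is enough to
identify a customer, which would not always hold in a real database.

```python
def normalize(record):
    """Build a comparison key from the email (an assumed unique identifier)."""
    return record["email"].strip().lower()


def deduplicate(records):
    """Keep the first record seen for each normalized email."""
    seen = {}
    for record in records:
        seen.setdefault(normalize(record), record)
    return list(seen.values())


# Hypothetical customer records; the second is a duplicate of the first
# that differs only in spacing and letter case.
customers = [
    {"name": "Ana Souza", "email": "ana@example.com"},
    {"name": "Ana Souza", "email": " Ana@Example.com "},
    {"name": "Liam Chen", "email": "liam@example.com"},
]

unique_customers = deduplicate(customers)
print(len(customers), "->", len(unique_customers))
```

Removing the duplicate reduces what has to be stored and processed, and it
keeps a simple count of customers from being inflated.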
For velocity, real-time or near real-time analysis also depends heavily on the
quality of the data being handled. Timeliness and accuracy are essential when
data streams are processed and analyzed rapidly (Oracle, n.d.). Inaccurate or
incomplete data can lead to erroneous insights or delayed decision-making, so
ensuring the quality of real-time data streams is necessary for maintaining
the integrity and reliability of analysis results. For example, in financial
trading systems, high data quality is crucial to ensure accurate and timely
analysis of market data, enabling traders to make informed decisions.
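As a rough sketch of what a quality check on a fast data stream might look
like, the Python below filters market ticks for completeness and freshness.
The tick fields, the symbols, and the two-second freshness threshold are all
assumptions I made for the example, not values from any real trading system.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(seconds=2)  # assumed freshness threshold


def is_valid_tick(tick, now):
    """Accept a tick only if it is complete, positive, and recent enough."""
    if tick.get("symbol") is None or tick.get("price") is None:
        return False  # incomplete record
    if tick["price"] <= 0:
        return False  # clearly erroneous value
    age = now - tick["timestamp"]
    return timedelta(0) <= age <= MAX_AGE  # reject stale or future-dated ticks


now = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)

# Hypothetical ticks: one good, one missing its price, one too old.
ticks = [
    {"symbol": "ABC", "price": 10.5, "timestamp": now - timedelta(seconds=1)},
    {"symbol": "ABC", "price": None, "timestamp": now},
    {"symbol": "XYZ", "price": 20.0, "timestamp": now - timedelta(seconds=30)},
]

accepted = [tick for tick in ticks if is_valid_tick(tick, now)]
print([tick["symbol"] for tick in accepted])
```

Because the check runs on each record as it arrives, it has to be cheap;
heavier cleaning would add delay and work against the goal of timely insight.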
Data quality is instrumental in handling diverse data types effectively. Each
data type requires specific processing techniques, algorithms, and
technologies. Inaccurate or inconsistent data across different data types can
introduce biases or inaccuracies in analysis. Data quality measures, such as
data cleansing and validation, are necessary to ensure the reliability and
accuracy of insights derived from structured, semi-structured, and
unstructured data. For example, in sentiment analysis of social media data,
data quality techniques can help remove noise, ensuring accurate sentiment
classification.
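The noise removal mentioned for social media data could look something like
the following Python sketch. The posts are invented, and treating URLs,
@mentions, and hashtag symbols as noise is my own simplifying assumption; a
real sentiment pipeline would likely need more careful rules.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_post(text):
    """Strip URLs and @mentions, drop '#' symbols, and normalize spacing and case."""
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    text = text.replace("#", "")
    return WHITESPACE_PATTERN.sub(" ", text).strip().lower()


# Hypothetical posts; the first two become identical once cleaned.
raw_posts = [
    "Loving the new phone!! https://example.com/promo @brand #happy",
    "loving the new phone!!   #happy",
    "@support my order never arrived",
]

cleaned = []
for post in raw_posts:
    text = clean_post(post)
    if text and text not in cleaned:  # skip empty results and repeats
        cleaned.append(text)

print(cleaned)
```

After cleaning, the classifier only sees the words that carry sentiment, and
repeated posts no longer count twice.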
Data quality impacts the overall success and reliability of big data analysis.
Poor data quality can result in misleading insights, erroneous predictions, and
flawed decision-making. Conversely, high data quality enhances the
trustworthiness, credibility, and usefulness of analysis results, enabling
organizations to make more precise decisions based on reliable information.
Data quality practices, like data profiling, cleansing, and validation, are essential
steps in ensuring the accuracy and reliability of big data analysis.
References
Oracle. (n.d.). What is big data? https://www.oracle.com/big-data/what-is-big-data/
Taylor, D. (2022, March 26). What is big data? Introduction, types,
characteristics, examples. Guru99. https://www.guru99.com/what-is-big-data.html