LearningJournal1

School

Universidade Federal do Estado do Rio de Janeiro - UNIRIO

Course

3440

Subject

Information Systems

Date

Nov 24, 2024

Type

docx

Pages

3

Uploaded by mthlucena

Reflect on the learning from this week around the fundamentals of big data and respond to the following: Compare and contrast the three base elements of big data (volume, velocity, and variety). What role do you feel data quality plays in the overall importance of big data collection and analysis? How does it impact these three base elements? The Learning Journal entry should be a minimum of 500 words and not more than 750 words. Use APA citations and references if you use ideas from the readings or other sources. This assignment will be assessed by your instructor using the rubric below.

Hello, Instructor Jeff Wolgast. I will use the space below to submit my work for the Learning Journal.

The principles of big data revolve around three key elements that all start with V: volume, velocity, and variety. Together, these components define the characteristics and challenges of managing and analyzing large-scale datasets. Data quality also plays a crucial role in big data collection and analysis, because it directly affects the accuracy, reliability, efficiency, and usefulness of the insights derived from the data.

Comparing and contrasting these three main elements of big data:

Volume refers to the vast amount of data generated and collected, which is usually too large for traditional data management systems to handle. Scale is therefore a defining part of big data, where systems may need to handle petabytes, exabytes, or even zettabytes of information. Examples include social media posts, sensor data, customer transactions, and log files (Taylor, 2022).

Velocity relates to the speed at which data is generated, processed, and analyzed. It underscores the real-time or near real-time nature of data streams.
Velocity differs from volume in that it concerns how fast data flows and must be processed rather than how large the data is (Oracle, n.d.). Examples include data from IoT devices, social media feeds, financial market transactions, and website clickstreams. The velocity of data calls for efficient ingestion, processing, and analysis techniques so that insights arrive in time to be useful.

Variety highlights the diversity of data types and sources used in big data. Data is commonly classified into three categories: structured, semi-structured, and unstructured. Structured data has a predefined format and a fixed schema, such as the tables of a relational database. Semi-structured data has some organizational structure but does not adhere to a rigid schema, as is common in XML or JSON files. Unstructured data lacks a predefined format and includes free text, multimedia (images, videos), and social media posts. Big data's variety is driven by the need to analyze data from various sources and formats, enabling a comprehensive view of information (Taylor, 2022). Variety is concerned with handling different formats and sources of data, in contrast with volume and velocity, which focus on scale and speed.

Below I describe in more detail how data quality matters for each of the three key elements of big data:

For volume, poor data quality, such as duplicate, inconsistent, or erroneous data, can significantly increase storage requirements and processing times and reduce overall efficiency. High data quality helps ensure that only relevant and accurate data is stored and processed, which lowers storage costs and makes data analysis more effective. For example, in a customer database, data quality measures can help eliminate duplicate records, ensuring an accurate representation of the customer base.

For velocity, real-time or near real-time analysis also depends heavily on the quality of the data being handled. Timeliness and accuracy are crucial when data streams are processed and analyzed rapidly (Oracle, n.d.). Inaccurate or incomplete data can lead to erroneous insights or delayed decision-making, so ensuring the quality of real-time data streams is essential for maintaining the integrity and reliability of analysis results.
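
The two data quality checks described so far, removing duplicate customer records and rejecting incomplete or late stream events, can be sketched in Python as follows. This is only an illustrative sketch: the record fields (`name`, `email`, `price`, `timestamp`), the function names, and the 5-second freshness threshold are my own assumptions and do not come from the readings.

```python
from datetime import datetime, timedelta, timezone


def deduplicate_customers(records):
    """Keep the first record seen for each normalized email address."""
    seen = set()
    unique = []
    for record in records:
        # Normalize so that case and surrounding spaces do not hide duplicates.
        key = record["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def is_valid_event(event, now, max_age=timedelta(seconds=5)):
    """Accept a stream event only if it is complete and recent enough."""
    if event.get("price") is None or event.get("timestamp") is None:
        return False  # incomplete event
    age = now - event["timestamp"]
    # Reject events from the future and events older than max_age.
    return timedelta(0) <= age <= max_age


# Hypothetical customer records: the first two differ only in formatting.
customers = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ana S.", "email": " ANA@example.com "},
    {"name": "Bruno", "email": "bruno@example.com"},
]
unique_customers = deduplicate_customers(customers)

# Hypothetical market events checked against a fixed reference time.
now = datetime(2024, 11, 24, 12, 0, 0, tzinfo=timezone.utc)
events = [
    {"price": 10.5, "timestamp": now - timedelta(seconds=1)},   # fresh and complete
    {"price": None, "timestamp": now - timedelta(seconds=1)},   # missing price
    {"price": 10.7, "timestamp": now - timedelta(seconds=30)},  # too old
]
valid_events = [e for e in events if is_valid_event(e, now)]
```

In this sketch the duplicate check addresses volume (fewer redundant records to store and process), while the completeness and freshness check addresses velocity (only usable, timely events reach the analysis step). A real system would likely need fuzzier matching and thresholds tuned to its own data.
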
For example, in financial trading systems, high data quality is crucial to ensure accurate and timely analysis of market data, enabling traders to make informed decisions.

For variety, data quality is instrumental in handling diverse data types effectively. Each data type requires specific processing techniques, algorithms, and technologies. Inaccurate or inconsistent data across data types can introduce biases or inaccuracies into the analysis. Data quality measures such as data cleansing and validation are necessary to ensure the reliability and accuracy of insights derived from structured, semi-structured, and unstructured data. For example, in sentiment analysis of social media data, data quality techniques can help remove noise, leading to more accurate sentiment classification.

Data quality affects the overall success and reliability of big data analysis. Poor data quality can result in misleading insights, erroneous predictions, and flawed decision-making. Conversely, high data quality enhances the trustworthiness, credibility, and usefulness of analysis results, enabling organizations to make more precise decisions based on reliable information. Data quality practices such as data profiling, cleansing, and validation are essential steps in ensuring the accuracy and reliability of big data analysis.

References

Oracle. (n.d.). What is big data? https://www.oracle.com/big-data/what-is-big-data/

Taylor, D. (2022, March 26). What is big data? Introduction, types, characteristics, examples. Guru99. https://www.guru99.com/what-is-big-data.html