Discussion Forum Unit 2

docx

School

University of Notre Dame *

*We aren’t endorsed by this school

Course

CS4407

Subject

Information Systems

Date

Apr 3, 2024

Type

docx

Pages

2

Uploaded by Beff_Jezos

Report
Discussion Forum Unit 2 Hadoop has become a popular technology for Big Data and Analytics applications. As part of your response to this unit’s discussion question, describe what Hadoop is and how it functions. Further, discuss why Hadoop is such an important analytics technology. What is Hadoop? Hadoop is simply put, a process by which large amounts of data is distributed among clusters of computers to help with processing it. We could illustrate Hadoop as a factory where there are many workers working on a different item, but then at the end of the day all the different items are combined to make one product. The "workers" represent the computers or nodes in a Hadoop cluster, each working on different parts of the data (like different items in the factory) (Apache, 2006). How Hadoop functions As in our analogy above, the function of Hadoop is to make what would otherwise be a large and unmanageable task become more manageable by breaking it down into smaller more manageable tasks. We can think of it as building a jumbo jet. Imagine asking one person to single-handedly build a jumbo jet, this would no doubt be a monumental task for just one individual. The Hadoop framework breaks the job down into smaller segments that can be handled by multiple individuals, each focusing on a specific aspect of the construction process. Hadoop breaks down large data processing tasks into smaller chunks that can be processed in parallel by multiple computers, resulting in faster and more efficient data analysis (Amazon Web Services, 2024). Why is Hadoop important? The most important function of Hadoop is that it “Stores and processes humongous data at a faster rate. The data may be structured, semi- structured, or unstructured”. It is also important because it helps to protect application and data processing against hardware failures. So, if a node in the cluster goes down Hadoop makes sure that the work it was doing gets done by another
computer instead, so that the programs and data keep running smoothly and don't get interrupted (Thamas, 2019). References Amazon Web Services. (2024). What is Hadoop? - apache hadoop explained - AWS. https://aws.amazon.com/what-is/hadoop/ Apache. (2006). Apache Hadoop. https://hadoop.apache.org/ Thamas, Y. (2019, November 19). Importance of Big Data Analytics Tool: Hadoop. Data Science Central. https://www.datasciencecentral.com/importance-of-big-data-analytics-tool- hadoop/
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help