CS 3440 Big Data
Written Assignment Unit 7
University of the People
Instructions

For this week's written assignment, answer the following questions: In this week's reading, Lev-Libfeld and Margolin (2019) discuss the fact that the MapReduce paradigm is based on several assumptions, which include the completeness of data, independence of data set calculations, and relevancy distinguishability. Describe what each of these assumptions means. How will it impact the MapReduce paradigm if you fail to evaluate these assumptions? What will the effect of that impact on big data security be?

You will be assessed on:
• Description of the completeness of data, independence of data set calculations, and relevancy distinguishability
• Explanation of the impact of failing to evaluate the assumptions noted in the MapReduce paradigm, and the effect on big data security
• Organization and style (including APA formatting)

Submit a paper that is at least 2 pages in length, exclusive of the reference page, double-spaced using 12-point Times New Roman font. The paper must cite a minimum of two sources in APA format and be well written. Check all content for grammar and spelling, and be sure you have correctly cited all resources (in APA format) used. Refer to the UoPeople APA Tutorials in the LRC for help with APA citations.
Assignment

The MapReduce paradigm, widely used for processing big data, is built on certain assumptions that, while serving as a foundation for data processing, can lead to challenges and inaccuracies when not thoroughly evaluated (Lev-Libfeld & Margolin, 2019). I will examine each assumption and its implications below.

1. Completeness of Data
Assumption: The MapReduce paradigm assumes that the data required to solve a problem is available in its entirety.
Implications: Failing to validate data completeness can result in incomplete or inaccurate results. In practical scenarios, data may be continuously generated or arrive in a dynamic fashion, so the entirety of the data needed for a precise solution may never be available. This can undermine the reliability of outcomes and of the decisions based on them.

2. Independence of Data Set Calculations
Assumption: The paradigm assumes that individual data set calculations are independent and can be performed concurrently without influencing each other.
Implications: In real-world scenarios, data sets can have interdependencies that violate this assumption. When such dependencies exist, concurrent processing can yield inconsistent or incorrect results, leading to incorrect conclusions and flawed insights.

3. Relevancy Distinguishability
Assumption: The paradigm assumes that relevant data can be distinguished from irrelevant data for a particular computation.
Implications: Identifying relevant data can be complex, especially when the relevancy of data evolves over time. If not properly addressed, irrelevant data might be included in computations, leading to distorted insights. Moreover, accurately defining relevancy criteria is challenging, and incorrect assumptions can degrade the quality of processed data.

4. Contextual Completeness
Assumption: Data is contextually complete only within its arrival context, which is time-bound.
Implications: The contextual completeness of data introduces temporal limitations. Data cannot be expected to possess universal completeness across all contexts or moments. Ignoring this reality may lead to processing outdated or contextually irrelevant data, skewing analytical outcomes.
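To make the first three assumptions concrete, the following is a minimal, self-contained Python word-count sketch of the map and reduce steps. It is my own illustration, not code from Lev-Libfeld and Margolin (2019); the chunk data and stop-word list are invented for the example.

# Minimal word-count sketch of the MapReduce flow, written only to
# illustrate the assumptions discussed above (illustrative data only).
from collections import Counter
from functools import reduce

STOPWORDS = {"the", "a", "and"}  # tokens treated as "irrelevant" for this job

def map_chunk(chunk):
    # Map phase: each chunk is processed on its own (independence assumption),
    # filtering out irrelevant tokens (relevancy-distinguishability assumption).
    words = (w.lower() for w in chunk.split())
    return Counter(w for w in words if w not in STOPWORDS)

def reduce_counts(left, right):
    # Reduce phase: merge partial counts; the merged total is only correct if
    # every chunk was available and processed (completeness assumption).
    return left + right

chunks = ["the cat sat", "a cat and a dog", "the dog sat"]

full = reduce(reduce_counts, map(map_chunk, chunks), Counter())
partial = reduce(reduce_counts, map(map_chunk, chunks[:2]), Counter())

print(dict(full))     # {'cat': 2, 'sat': 2, 'dog': 2}  -- all chunks seen
print(dict(partial))  # {'cat': 2, 'sat': 1, 'dog': 1}  -- missing chunk, silently wrong

Production engines such as Hadoop distribute these same two steps across many machines, which is precisely why the assumptions matter: the framework itself cannot tell a late or missing chunk from a dataset that was complete to begin with.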
5. Partially Dependent Calculation
Assumption: Calculations can be interdependent, with dependencies diminishing down the processing chain.
Implications: As data undergoes a series of calculations, its interdependencies shift, and calculations become progressively less reliant on prior data states. The MapReduce model's assumption of complete independence might therefore not hold, and overlooking this partial dependence can lead to inaccuracies in outcomes (the closing sketch at the end of this paper illustrates the point).

6. Emergent Relevancy
Assumption: Determining data relevance can be ambiguous, with optimal answers emerging through testing.
Implications: The actual relevance of data might only emerge through iterative testing, so selecting the best answer may involve trial and error. Overlooking this emergent aspect can hinder the discovery of optimal solutions and reduce the accuracy of results.

Impact of Failing to Evaluate Assumptions
Failing to thoroughly evaluate these assumptions can undermine the effectiveness of the MapReduce paradigm and significantly impact data processing and analysis. Inaccurate or incomplete results can lead to misguided decision-making and lost business opportunities. Moreover, computational resources may be wasted on processing irrelevant or unnecessary data, diminishing efficiency.

Effect on Big Data Security
The impact of neglecting these assumptions extends to big data security. For instance, incomplete data may lead to erroneous security evaluations, potentially leaving vulnerabilities undetected. Inaccurate data processing could also result in misidentifying security threats or unintentionally including sensitive data. Additionally, dependencies between data sets, when ignored, might compromise data integrity and confidentiality, posing security threats.

To mitigate these challenges, organizations should adopt a nuanced approach to data processing that acknowledges the limitations of the MapReduce paradigm's assumptions. Implementing adaptive data processing models, leveraging real-time analytics, and incorporating machine learning techniques can help address these limitations. By carefully considering the context and dynamics of their data, organizations can ensure more accurate, reliable, and secure big data processing, leading to better decision-making and reduced risk.

Word Count: 620 words (excluding assignment instructions and question commands)
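Closing sketch. As a final illustration of the partial-dependence and security points, the Python example below is my own (not from the reading), with invented log data and an assumed threshold factor: the per-user reduction can run chunk by chunk, but the anomaly-flagging step depends on a global mean that only exists once every chunk has been reduced. Running it on an incomplete dataset does not fail loudly; it simply never flags the threat.

# Hypothetical example: flagging anomalous login counts. The per-chunk map
# output is independent, but the final flagging step needs a global statistic
# of the *complete* totals, so the stages are only partially independent.
from statistics import mean

# Simulated per-chunk map output: (user, login_count) pairs.
chunk_outputs = [
    [("alice", 3), ("bob", 5)],
    [("alice", 2), ("mallory", 40)],
    [("bob", 4)],
]

def reduce_totals(chunks):
    # Reduce phase: sum login counts per user across the chunks we received.
    totals = {}
    for chunk in chunks:
        for user, count in chunk:
            totals[user] = totals.get(user, 0) + count
    return totals

def flag_anomalies(totals, factor=2.0):
    # Second stage: depends on the global mean of the complete totals;
    # 'factor' is an assumed, illustrative threshold.
    global_mean = mean(totals.values())
    return [user for user, total in totals.items() if total > factor * global_mean]

complete = reduce_totals(chunk_outputs)
print(flag_anomalies(complete))            # ['mallory'] -- flagged against the true mean

partial = reduce_totals(chunk_outputs[:1])  # missing chunks: 'mallory' never appears
print(flag_anomalies(partial))             # [] -- the threat is never seen, let alone flagged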
Reference

Lev-Libfeld, A., & Margolin, A. (2019). Fast data: Moving beyond from Big Data's map-reduce. arXiv. https://arxiv.org/ftp/arxiv/papers/1906/1906.10468.pdf