(a)
Definition of the term Hadoop.
Explanation of Solution
Hadoop is a framework that makes it possible to deal with large amounts of data using cheap commodity hardware. Storing and processing such data on a single machine is costly, because a machine with that much computing power and memory is very expensive. Hadoop instead combines the power of many cheap commodity machines by storing and processing the data in a distributed fashion across a cluster of them.
Hadoop uses the popular MapReduce technique (explained in the next part) to achieve this.
(b)
Definition of the term MapReduce.
Explanation of Solution
MapReduce is a Java-based processing technique used in Hadoop. It combines two individual processing steps:
- Map: takes the input data and transforms it into another set of data made up of tuples (key/value pairs).
- Reduce: as the name suggests, reduces or combines the output of the map step into a smaller set of tuples.
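The two steps can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's Java API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration. The `shuffle` step stands in for the grouping by key that the framework performs between map and reduce.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) key/value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key (done by the framework between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single (key, total) pair."""
    return (key, sum(values))


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two short lines of text.
counts = word_count(["big data big cluster", "data node"])
```

In a real cluster the map calls would run in parallel on different machines, each over its own part of the input, and only the grouped intermediate pairs would travel to the reducers.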
(c)
Definition of the term HDFS.
Explanation of Solution
HDFS (Hadoop Distributed File System) is a distributed file system.
HDFS follows a master-slave architecture, in which the Namenode acts as the master and the Datanodes act as slaves.
Namenode: It manages the file system namespace, controls clients' access to files, and handles operations such as renaming, opening, and closing files.
Datanode: It serves file I/O (read/write) requests and performs block creation, deletion, and replication on instruction from the Namenode.
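The division of labour between the two roles can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the class names `NameNode` and `DataNode`, the `read_file` helper, the tiny block size, and the round-robin placement are hypothetical and far simpler than what HDFS actually does. The point is only that the master holds metadata (which blocks make up a file and where they live) while the slaves hold the data itself.

```python
class DataNode:
    """Toy Datanode: stores the blocks it is told to store and serves reads."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Toy Namenode: keeps the namespace and block locations, never file data."""

    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size    # unrealistically small, for illustration
        self.replication = replication  # copies kept of each block
        self.namespace = {}             # path -> [(block_id, [DataNode, ...])]
        self._next_block = 0

    def create(self, path, data):
        """Split data into blocks and instruct Datanodes to store replicas."""
        entries = []
        for start in range(0, len(data), self.block_size):
            block_id = self._next_block
            self._next_block += 1
            # Simplified round-robin placement of replicas.
            targets = [
                self.datanodes[(block_id + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            for node in targets:
                node.write_block(block_id, data[start:start + self.block_size])
            entries.append((block_id, targets))
        self.namespace[path] = entries

    def rename(self, old_path, new_path):
        """A namespace operation: only metadata changes, no block moves."""
        self.namespace[new_path] = self.namespace.pop(old_path)

    def locate(self, path):
        return self.namespace[path]


def read_file(namenode, path):
    """Client read: ask the Namenode for locations, fetch blocks from Datanodes."""
    return b"".join(
        nodes[0].read_block(block_id) for block_id, nodes in namenode.locate(path)
    )


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.create("/a.txt", b"hello hdfs")
namenode.rename("/a.txt", "/b.txt")
content = read_file(namenode, "/b.txt")
```

Renaming touches only the Namenode's namespace, which is why it is cheap: the blocks on the Datanodes stay where they are.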
(d)
Definition of the term NoSQL.
Explanation of Solution
As the name suggests, NoSQL means non-relational. In a nutshell, NoSQL is a
(e)
Definition of the term Pig.
Explanation of Solution
Pig, as the name suggests (pigs eat anything), can work with any kind of data. It is an abstraction layer on top of MapReduce for analyzing data by expressing the analysis as a data flow.
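To show what "a data flow" means here, the sketch below chains a few steps in plain Python. It is only an analogy: real Pig scripts are written in Pig Latin and compiled to MapReduce jobs, and the sample records and variable names are made up for illustration. Each step is labelled with the Pig Latin operator it loosely resembles.

```python
from itertools import groupby
from operator import itemgetter

# Like LOAD: start from a relation of (user, category, amount) tuples.
# These sample records are hypothetical.
records = [
    ("alice", "books", 12.0),
    ("bob", "games", 30.0),
    ("alice", "games", 8.0),
    ("carol", "books", 3.0),
]

# Like FILTER: keep only purchases of at least 5.0.
filtered = [row for row in records if row[2] >= 5.0]

# Like GROUP: collect rows by user (groupby needs its input sorted by the key).
by_user = groupby(sorted(filtered, key=itemgetter(0)), key=itemgetter(0))

# Like FOREACH ... GENERATE: compute one total per group.
totals = {user: sum(row[2] for row in rows) for user, rows in by_user}
```

Each statement names an intermediate result and feeds it to the next step, which is the style Pig encourages, rather than hand-writing separate map and reduce functions.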
Chapter 10 Solutions
Modern Database Management