- Big Data is a term for collections of data that are huge in size and still growing exponentially over time.
- Examples of Big Data sources include stock exchanges, social media sites, and jet engines.
- Big Data can be 1) structured, 2) unstructured, or 3) semi-structured.
- Volume, Variety, Velocity, and Variability are a few characteristics of Big Data.
- Improved customer service, better operational efficiency, and better decision making are a few advantages of Big Data.
An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.
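The split between metadata and data can be illustrated with a toy in-memory model. This is only a sketch and not the real HDFS API: the class names, the `write_file` and `read_file` helpers, the 4-byte block size, the replication factor of 2, and the round-robin placement are all assumptions chosen for illustration.

```python
# Toy model of the NameNode/DataNode split. Not the real HDFS API.
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Stores only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # filename -> list of (block_id, [datanode names])


def write_file(namenode, datanodes, filename, data, replication=2):
    """Cut data into blocks, store copies on DataNodes, record locations."""
    entries = []
    for offset in range(0, len(data), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block_id = f"{filename}#{index}"
        # Hypothetical round-robin placement; real placement is rack-aware.
        targets = [datanodes[(index + r) % len(datanodes)]
                   for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
        entries.append((block_id, [node.name for node in targets]))
    namenode.block_map[filename] = entries


def read_file(namenode, datanodes, filename):
    """Look up block locations in the metadata, then fetch each block."""
    by_name = {node.name: node for node in datanodes}
    parts = []
    for block_id, locations in namenode.block_map[filename]:
        parts.append(by_name[locations[0]].blocks[block_id])
    return b"".join(parts)
```

With three DataNodes, writing the 10-byte payload `b"hello hdfs"` produces three blocks, each stored on two nodes, while the NameNode holds only the block-to-node mapping and never the bytes themselves.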
Read Operation In HDFS
Write Operation In HDFS
A MapReduce job goes through four phases of execution: splitting, mapping, shuffling, and reducing.
- JobTracker: acts as the master (responsible for the complete execution of a submitted job)
- Multiple TaskTrackers: act as slaves, each performing part of the job
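The four phases named above (splitting, mapping, shuffling, reducing) can be sketched as a single-process word count. This is a minimal illustration in plain Python, not Hadoop code: the function names are hypothetical, and everything runs sequentially in one process rather than being distributed across TaskTrackers.

```python
from collections import defaultdict


def split_input(text):
    """Splitting: cut the input into chunks (here, one chunk per line)."""
    return [line for line in text.splitlines() if line.strip()]


def map_chunk(chunk):
    """Mapping: emit a (word, 1) pair for every word in a chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffling: group all values that share a key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_group(key, values):
    """Reducing: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(text):
    chunks = split_input(text)
    # Flatten the output of every map call into one list of pairs.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))
    # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In a real cluster, each call to `map_chunk` and `reduce_group` would run as a separate task on a worker node, and the shuffle step would move data between nodes over the network.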