Apache Hadoop is an open-source software framework for storing and processing very large datasets on clusters of commodity hardware. Hadoop is an Apache top-level project built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.
The Apache Hadoop framework consists of the following four modules:
1. Hadoop Common: the common libraries and utilities required by the other Hadoop modules
2. Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
3. Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications
4. Hadoop MapReduce: a programming model for large-scale data processing.
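To make the MapReduce programming model concrete, here is a minimal in-process sketch of its three conceptual phases (map, shuffle, reduce) using a word count. This is plain Python for illustration only; it does not use the Hadoop API, and on a real cluster the framework would run the map and reduce tasks in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key (here, sum the counts)."""
    return (key, sum(values))

def word_count(documents):
    # On a cluster these phases run distributed; here we simply
    # chain them in a single process to show the data flow.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["the quick brown fox", "the lazy dog"])
print(counts["the"])  # "the" appears in both documents -> 2
```

The key property this sketch illustrates is that map and reduce are independent per record and per key, which is what lets Hadoop parallelize them freely across a cluster.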
All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines or entire racks of machines) are common and should therefore be automatically handled in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
Beyond HDFS, MapReduce, and YARN, the Apache Hadoop "framework" is now commonly taken to include a number of related projects as well: Apache Hive, Apache Pig, Apache HBase, and others.
For end users, although MapReduce code is typically written in Java, any programming language can be used with Hadoop Streaming to implement the "map" and "reduce" parts of the user's program. Related projects such as Apache Pig and Apache Hive expose higher-level user interfaces, namely Pig Latin and a SQL dialect, respectively. The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.
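As a sketch of how Hadoop Streaming lets any language supply the map and reduce steps: a Streaming mapper reads raw input lines on stdin and writes tab-separated key/value lines to stdout; Hadoop sorts those by key and pipes them into the reducer. The script below follows that contract for a word count; the filename and the way it is split into roles are illustrative assumptions, not a fixed convention.

```python
import sys
from itertools import groupby

# Sketch of a Hadoop Streaming word-count pair. Mapper and reducer each
# consume lines and emit "key\tvalue" lines; between them, Hadoop sorts
# the mapper output by key. Written as functions over line iterables so
# the same logic can also be exercised outside Hadoop.

def mapper(lines):
    """Emit one "word\t1" line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum counts per word; input lines must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Run as a stdin/stdout filter, the way Streaming invokes tasks,
    # e.g. `python wordcount.py map` or `python wordcount.py reduce`.
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    stage = mapper if role == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

A job using such a script would be submitted through the Streaming jar with its `-input`, `-output`, `-mapper`, and `-reducer` options; the exact paths and script name here are placeholders.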