Hadoop:
Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of commodity hardware nodes and to handle thousands of terabytes of data. Its distributed file system provides rapid data transfer rates among nodes and allows the system to continue operating when a node fails. This approach lowers the risk of catastrophic system failure and unexpected data loss, even if a significant number of nodes become inoperative. The second iteration of Hadoop (Hadoop 2) improved resource management and scheduling.
Hadoop is a:
- Programming model for expressing distributed computations at a massive scale
- Execution framework for organizing and performing such computations
Hadoop uses Hadoop Common as a kernel to provide the framework's essential libraries. The other core components are:
- Hadoop Distributed File System (HDFS), which stores data across thousands of commodity servers to achieve high bandwidth between nodes
- Hadoop YARN (Yet Another Resource Negotiator), which provides resource management and scheduling for user applications
- Hadoop MapReduce, the programming model used to tackle large distributed data processing: mapping data across the cluster and reducing it to a result (see the sketch below)
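To make the map/reduce model concrete, here is a minimal sketch of the canonical WordCount job written against the org.apache.hadoop.mapreduce API (class and path names are illustrative). The map step emits a (word, 1) pair per token; the reduce step sums the counts per word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The job reads its input from and writes its output to HDFS, and YARN handles scheduling its map and reduce tasks across the cluster, which is how the three components above fit together.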
Hadoop Streaming refers to the ability to use an arbitrary language to define a job's map and reduce processes: any executable that reads records from standard input and writes tab-separated key/value pairs to standard output can serve as a mapper or reducer (a sketch follows below).
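A minimal sketch of a Streaming-compatible mapper, assuming the word-count task from the previous example. Streaming is usually paired with scripting languages, but any executable works; Java is used here only to keep this document's examples in one language.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

// A Streaming mapper: Hadoop feeds input records on stdin and expects
// "key<TAB>value" lines on stdout. No Hadoop libraries are required.
public class StreamingWordCountMapper {
  public static void main(String[] args) throws Exception {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        // Emit (word, 1); the reducer sums counts per word, as before.
        System.out.println(itr.nextToken() + "\t1");
      }
    }
  }
}
```

Such a mapper would be launched through the hadoop-streaming JAR using its -input, -output, -mapper, and -reducer options, with a reducer built the same stdin/stdout way.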
Hadoop-related projects: Ambari, Avro, Cassandra, HBase, Hive, Mahout, Pig, Spark, Tez, ZooKeeper, Hue