10 Hours to Get Started with Big Data (Part 2): A First Look at Hadoop

1, Introduction to Hadoop

Hadoop is an open-source platform for distributed storage and distributed computing.

2, What Hadoop can do

Build large data warehouses, and store, process, analyze, and compute statistics on PB-level data.

Typical uses: search engines, log analysis, business intelligence, and data mining.

3, HDFS, the distributed file system among Hadoop's core components

Features: scalability, fault tolerance, and massive data storage.

HDFS splits each file into blocks of a specified size and stores them on multiple machines, keeping multiple copies of each block.

Data splitting, replication, and fault tolerance are all transparent to the user.
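The block splitting and multi-copy storage described above can be sketched in a few lines of Python. This is only a toy model, not how HDFS is implemented: the function names, the node names, and the round-robin placement are assumptions made for illustration (real HDFS placement also considers racks and node load). The 128 MB block size and replication factor of 3 are example values that match common HDFS defaults.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the size in bytes of each block for a file of `file_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block only holds the leftover bytes, if there are any.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign every block to `replication` distinct nodes (simple round-robin).

    Hypothetical placement policy, used here only to show the idea of
    keeping several copies of each block on different machines.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# A 300 MB file with 128 MB blocks, stored on a 4-node cluster with 3 copies.
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)

for block_id, holders in placement.items():
    print(f"block {block_id}: {blocks[block_id] // MB} MB on {holders}")
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block, which is the intuition behind the fault tolerance listed above.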

4, YARN, the resource scheduling system among Hadoop's core components

YARN (Yet Another Resource Negotiator) is responsible for managing and scheduling the resources of the entire cluster.

Features: scalability, fault tolerance, and unified resource scheduling across multiple frameworks.
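To make "unified scheduling" a little more concrete, here is a toy Python sketch of one component handing out cluster memory to jobs from different frameworks. It does not use or mirror the real YARN API: `ToyResourceManager`, `Node`, the first-fit policy, and the application names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this machine


class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource scheduler."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app, mem_mb):
        """Grant memory on the first node that has enough, else return None."""
        for node in self.nodes:
            if node.free_mb >= mem_mb:
                node.free_mb -= mem_mb
                return (app, node.name, mem_mb)
        return None  # no node can satisfy the request right now


rm = ToyResourceManager([Node("node1", 4096), Node("node2", 2048)])

# Jobs from different frameworks ask the same scheduler for resources.
grants = [
    rm.allocate("mapreduce-job", 3072),
    rm.allocate("other-framework-job", 2048),
    rm.allocate("mapreduce-job", 2048),
]

for grant in grants:
    print(grant)
```

The point of the sketch is that all requests go through a single scheduler that knows the state of every node, so different computing frameworks can share one cluster instead of each needing its own.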

5, MapReduce, the distributed computing framework among Hadoop's core components
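The MapReduce model is usually introduced with a word count. The sketch below runs the three stages (map, shuffle, reduce) in a single Python process, so it shows only the programming model and not Hadoop's distributed execution. The function names and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster, map tasks would run in parallel on different machines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

Each map call looks at only one line and each reduce call at only one key, which is what lets a framework spread the work across many machines.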

6, Advantages of Hadoop

1. High reliability
* Data storage: each data block is stored in multiple copies
* Data computation: a job's computation is rescheduled if it fails

2. Scalability: when storage or computing resources run short, the cluster can be scaled out horizontally by adding machines, with capacity growing roughly linearly
* A single cluster can contain many nodes

3. Other
* Runs on inexpensive machines, which reduces cost
* Mature ecosystem