HDFS
( Notes from "Hadoop: The Definitive Guide", 4th edition )

Hadoop is written in Java.

Good for: very large files, streaming data access, commodity hardware.
Not suitable for: low-latency applications, lots of small files, multiple writers, and arbitrary file modifications.

Blocks: a disk block is different from a filesystem block. A filesystem block (of a single disk) consists of multiple disk blocks; a disk block is typically 512 bytes. HDFS also has a concept of a block, but it is much larger than a disk block or a regular single-disk filesystem block (128 MB by default). The HDFS block size is large so that reads are dominated by the disk's high transfer rate rather than its comparatively slow seek time: with a ~10 ms seek time and a ~100 MB/s transfer rate, a block of about 100 MB keeps seeking down to roughly 1% of the transfer time. Map tasks in MapReduce normally operate on one block at a time.

Why HDFS blocks? A file can be larger than any single disk; fixed-size blocks simplify storage management; and blocks fit well with replication. (See the Java sketch after these notes for inspecting a file's blocks.)

Block cache: frequently used file blocks can be cached in datanode memory; administered by adding cache directives to cache pools. (A programmatic sketch follows below.)

Namenode/datanode => master/slave.
Namenode (master): manages the filesystem namespace; maintains the filesystem tree and the metadata for all files and directories.
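To make the block discussion concrete, here is a minimal Java sketch that asks the namenode for a file's block size and block locations through the standard FileSystem API. The cluster URI and file path are placeholder assumptions, not from the notes:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list the blocks of an HDFS file and where their replicas live.
// The namenode URI and file path are hypothetical.
public class ListBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

    Path file = new Path("/data/example.log");  // hypothetical file
    FileStatus status = fs.getFileStatus(file);
    System.out.println("block size: " + status.getBlockSize());  // 128 MB by default

    // One BlockLocation per block; each knows its offset, length,
    // and the datanodes holding its replicas.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.printf("offset=%d len=%d hosts=%s%n",
          b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
    }
    fs.close();
  }
}
```

The per-block host list is what lets MapReduce schedule a map task on a node that already holds the block, which is why map tasks normally work on one block at a time.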
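And a hedged sketch of the block cache admin step: adding a cache pool and a cache directive programmatically via DistributedFileSystem. The pool name and path are made up, the cluster is assumed to have caching enabled (dfs.datanode.max.locked.memory > 0), and the same operations are normally done from the command line with the hdfs cacheadmin tool (-addPool, -addDirective):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

// Sketch: pin a frequently read file into the datanodes' block cache.
// Pool name, path, and namenode URI are hypothetical.
public class PinHotFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem)
        FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

    // Cache pools group directives and carry permissions and quotas.
    dfs.addCachePool(new CachePoolInfo("hot-data"));

    // A directive tells the namenode which path to cache, in which pool.
    long directiveId = dfs.addCacheDirective(
        new CacheDirectiveInfo.Builder()
            .setPath(new Path("/data/lookup-table"))  // hypothetical hot file
            .setPool("hot-data")
            .build());
    System.out.println("cache directive id: " + directiveId);
    dfs.close();
  }
}
```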