Hadoop can work directly with any mountable distributed file system, such as Local FS, HFTP FS, S3 FS, and others, but the most common file system used by Hadoop is the Hadoop Distributed File System (HDFS). HDFS is based on the Google File System and provides a distributed file system designed to run on large clusters (thousands of computers) of small machines in a reliable, fault-tolerant manner.
Perhaps my perspective is affected by the fact that I worked closely on the underlying Google File System, but I still believe Google's sharp contrast with Yahoo on infrastructure offers powerful lessons about building a sustainable business, especially in the rapidly transforming technology landscape.
But in nearby Mountain View, Google began work on engineering its own software-defined infrastructure, ultimately known as the Google File System, which would function as a platform that could serve a diverse range of use cases for all the services Google would offer as part of its future ecosystem.
It was inspired by Google's MapReduce and Google File System and cultivated at Yahoo.
The big data architecture (Figure 2) contains a filesystem at the lowest level, which allows creation of files and directories (the Hadoop Distributed File System, known as HDFS, or the Google File System
The Google File System (GFS) allowed clusters of commodity servers to present their internal disk storage as a unified file system and inspired the Hadoop Distributed File System (HDFS).
MapReduce and its pals, Chubby and GFS (the Google File System), are among Google's core innovations.
In 2003 and 2004, Google published details of its Google File System and its MapReduce method.
Perhaps with this in mind, Google recently launched a preview release of Google App Engine, a way for developers to run their web applications on Google's infrastructure. With Google App Engine, developers can write web applications based on the same building blocks that Google uses, such as the Google File System (GFS) and BigTable (its distributed storage system for structured data).
"The Google File System (GFS) and the Parascale Virtual Storage Network (VSN) employ similar architectures," explained Robin Harris of StorageMojo.
The AltaVista.com engineers contributed to such innovations as the Google File System (smart enough to avoid the message passing that can choke a parallelized system), MapReduce (a framework that lets parallel computations be shared among Google's server clusters to process large data sets), Chubby (a record locking and unlocking service that bypassed traditional relational database methods of managing cell and row locks), and other innovations that continue to give Google a performance and cost advantage 10 years after Google opened for business.