Hadoop is a toolkit for splitting work into pieces, computing them on separate servers, and then joining the results into a final product. Google pioneered the idea when it needed to choreograph a vast army of servers to crawl the Web, and Hadoop now offers a general framework that's being used in similar situations.
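The split-then-join pattern Hadoop popularized is usually called MapReduce. A minimal sketch of the idea in plain Python (no Hadoop required; the function names and sample documents here are illustrative, not part of any Hadoop API) might look like this:

```python
from collections import defaultdict

def map_phase(docs):
    # Map: each document is processed independently, emitting
    # (word, 1) pairs -- the "pieces" that separate servers
    # could compute in parallel.
    for doc in docs:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce: pairs are grouped by key and their counts joined
    # into a final product.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the web the web", "crawl the web"]
print(reduce_phase(map_phase(docs)))  # {'the': 3, 'web': 3, 'crawl': 1}
```

On a real cluster, Hadoop distributes the map tasks across machines, shuffles the intermediate pairs by key, and runs the reduce step on the grouped results; this sketch just shows the shape of the computation.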
There are many spinoffs that bundle Hadoop with code for tackling specific problems. Mahout is a scalable machine-learning framework that analyzes large data sets for patterns that might emerge. Hive offers a data warehouse that can be queried in parallel with HiveQL, a popular approach for dealing with massive quantities of Web logs.