The Google File System (labs.google.com/papers/gfs-sosp2003.pdf) is, as the name implies, a file system. What makes it different from the file systems common on our desktops and laptops is that it is optimized, well, for Google. As users, we do many different things with our computers: sometimes we surf the Internet, sometimes we edit pictures, sometimes we edit music, sometimes we watch movies, and so on. And each of us does these and other things at different times and in different combinations. Basically, we need computers, operating systems, and supporting subsystems (e.g. file systems) that are general purpose. This is not so for Google. Google uses its computers and its data in very specific ways, and for that reason it created a file system that suits it better than any general purpose file system likely would, or could.
Why Create a New File System?
It is not uncommon for large organizations to rely on specialized hardware to help ensure uptime, faster processing, and so on. Google takes a different approach, using predominantly commodity hardware, and lots of it. The problem with commodity hardware (not unlike a typical desktop) is that it is not reliable. Let me clarify that statement: an individual desktop computer may almost never fail, but if you have thousands of desktops, the likelihood that at least one of them fails at any point in time is quite high.
This idea is referred to as Mean Time To Failure (MTTF). MTTF is the average amount of time between failures. So, say you have a desktop computer that has been running for 5 years without ever having an interruption in service (e.g. blue screen, automatic reboot, power surge, etc.); we might be inclined to say it has an MTTF of 5 years. Now, let's say we have 1,000 of those desktops making up some computer system, and that in order for the system to deliver the required services, 100% of those computers must be functioning. The MTTF for that system is 5 years / 1,000, or a little under two days. That means that for such a system composed of 1,000 computers, each with an MTTF of 5 years, we should expect a failure, on average, more than once every two days.
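The back-of-the-envelope calculation above can be sketched in a few lines of Python. This is just the simple model from the paragraph (independent failures, system MTTF = single-machine MTTF divided by machine count); the function name and the numbers are from this article's hypothetical example, not from the GFS paper.

```python
def system_mttf_days(single_machine_mttf_years: float, machine_count: int) -> float:
    """Expected days between failures anywhere in the fleet,
    assuming independent failures and that every machine must be up."""
    return single_machine_mttf_years * 365 / machine_count

# 1,000 machines, each with a 5-year MTTF:
print(system_mttf_days(5, 1000))  # 1.825 days, i.e. a failure roughly every two days
```

Doubling the fleet to 2,000 machines halves the result to under a day, which is why failure handling has to be built into the software rather than bolted on.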
In order to deliver services like search, Google runs a system composed of well over 1,000 computers, mostly commodity hardware. Needless to say, hardware failures are the norm at Google, not the exception. Google also relies heavily on data, so the outage of any single computer in, say, the system that supports search must not mean a data outage or a service outage. GFS was created to guard against exactly this kind of problem.
GFS was also created to work with really, really big files. And these aren't just any kind of files; they are the kind of files that Google needs to work with. Allow me to explain. The files I use on my computer aren't typically larger than a gigabyte. Sure, I have movies, virtual drives, and so on that can get pretty large, but most of my files are in the megabyte range. I also do a lot of things with my files: I append to them, I update them, and I delete them. None of these actions is uncommon for me, but for Google they would be. Google adds far more data than it overwrites or deletes; in fact, overwriting almost never happens there. So both the size of the files Google uses and the way it uses those files are very different from most general purpose computing. Therefore, it stands to reason that Google might find some benefit in a file system that is optimized to work the way it works.