by Asif Khan
About 10 years ago, I decided that I needed a major career change. I felt that I had sucked the marrow out of my marketing job and wanted to try something new. I wanted to get more technical and focus my career on data center infrastructure technologies. I thought this track would better train me to take advantage of a trend I was starting to see forming. At the time, they called it Web 2.0 or utility computing. It eventually came to be called Cloud Computing.
I posed a question to a few RSGs (ridiculously smart guys) I knew at the time: “what will be the most interesting infrastructure technology in the coming decade?” Storage was the most popular answer. Then I asked which storage technology was the most interesting. The answer surprised me.
“Most storage technologies are pretty much the same but NetApp is doing some cool things…and besides, it’s a great place to work.” At the time, I had never heard of WAFL or RAID-DP nor did I know what storage virtualization meant, but I trusted their collective opinions. Were they right? Regardless, several months later, I sweet-talked my way into a Systems Engineer job at one of the best places to work in America, according to Fortune Magazine. Gooooal!
I quickly learned how NetApp’s file structure, WAFL (Write Anywhere File Layout), allowed for a much simpler way to manage data access compared to what most of its competitors were doing at the time.
The traditional way to manage data access was to create “high performance” RAID groups (RAID10, FibreChannel disks) and “high capacity” RAID groups (RAID5, SATA disks). An experienced (and highly paid) storage architect would have to figure out how to place data appropriately to get the right mix of performance and capacity. Let’s call this a vertical architecture. NetApp, on the other hand, developed a horizontal architecture which placed all data on a single tier that offered a perfect balance of performance and capacity.
Horizontal vs Vertical Storage Architecture
A single hard disk can only handle a limited number of I/Os per second (IOPS). At the low end, a SATA disk performs at less than 100 IOPS while a solid state disk (SSD) can exceed a couple thousand IOPS. A decade ago, with a vertical architecture, a storage architect had to determine how many disks of which type to place where and how to organize them in order to develop the most cost-effective solution. See Part 1 for more detail. Then, s/he had to determine what type of data to place on each type of disk (and when to move it) in order to meet application SLOs (Service Level Objectives).
NetApp’s WAFL simplified this process by creating a large horizontal stripe (or aggregate, as NetApp calls it) of the same type of disk. If you string together a row of 30 SATA disks for example, each providing 100 IOPS, you would have roughly 3,000 IOPS with a lot more capacity than if you tried to achieve the same IOPS with a couple of SSDs (duly noting that SSDs didn’t exist a decade ago).
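To make the stripe arithmetic concrete, here is a minimal Python sketch. The per-disk IOPS figures come from the text above; the per-disk capacities are assumed purely for illustration.

```python
# Back-of-the-envelope sizing for a horizontal stripe (aggregate).
# Per-disk capacities below are illustrative assumptions, not vendor specs.

def stripe_totals(disk_count: int, iops_per_disk: int, tb_per_disk: float):
    """Aggregate IOPS and raw capacity for a stripe of identical disks."""
    return disk_count * iops_per_disk, disk_count * tb_per_disk

# 30 SATA disks at ~100 IOPS each (capacity assumed: 1 TB per disk)
sata_iops, sata_tb = stripe_totals(30, 100, 1.0)

# Matching ~3,000 IOPS with SSDs at ~2,000 IOPS each
# (capacity assumed: 0.2 TB per SSD)
ssd_count = -(-3000 // 2000)  # ceiling division: 2 SSDs needed
ssd_iops, ssd_tb = stripe_totals(ssd_count, 2000, 0.2)

print(f"SATA stripe: {sata_iops} IOPS, {sata_tb} TB raw")
print(f"SSD option:  {ssd_iops} IOPS, {ssd_tb} TB raw")
```

The point of the comparison: the SATA stripe delivers the same ballpark IOPS with roughly two orders of magnitude more raw capacity, which is exactly the balance the horizontal architecture traded on.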
Enter Data Tiering
In the last few years, most leading vendors have adopted storage virtualization, which enabled automated data tiering…which in turn helped to neutralize some of NetApp’s inherent architectural advantages. Further, SSDs offered an extremely high performance tier, tilting the advantage toward the vendors who implemented vertical tiers with SSDs in their storage architectures.
NOTE: NetApp’s FAS and IBM’s XIV are the two leading “horizontal” architectures on the market today. Most other established vendors employ a “vertical” architecture. There are also a number of storage startups that are rewriting the rules to create entirely new architectures.
So whether you call them aggregates (as NetApp does) or storage pools (as EMC, HP and other vendors do), you can achieve the same end result: a balance of performance and capacity that is easy to manage, albeit via technically different methods.
In 2008, NetApp introduced Performance Acceleration Modules (PAM, later renamed FlashCache) as a high speed read-only caching technology to improve read performance and compete against vertical data tiering architectures that employ SSDs. Then in 2010, new NetApp CEO Tom Georgens boldly proclaimed, “Frankly I think the entire concept of tiering is dying,” just as tiering was starting to take off for every vendor except NetApp.
So it should come as no surprise that two years later, NetApp introduced the Virtual Storage Tier Suite. VST is a set of tools that provides SSD-based read/write caching (originally called hybrid aggregates until the marketing team wisely renamed them FlashPools) and server-side PCIe caching (FlashAccel software manages FusionIO cards on the server) to deliver real-time data tiering.
Playing Field. Leveled.
In the last decade, Dell severed its reseller relationship with EMC and acquired EqualLogic and Compellent to contend for mid-market storage leadership. HP pretty much abandoned its EVA product line and handed the keys to its 2010 acquisition, 3PAR. IBM has assembled a line of storage systems for every need (most incompatible with each other). NetApp acquired LSI’s Engenio (Georgens’ former employer) and then introduced tiering and clustering to expand its storage portfolio, in the process forgoing its once-famous “one architecture” strategy. EMC acquired VMware, Data Domain and Isilon to expand its market lead.
So, as those ridiculously smart guys correctly predicted a decade ago, storage has indeed been really interesting. And most storage technologies, once again, are pretty much the same (including NetApp!).
All This Can Only Mean One Thing…
The next decade is going to bring some really interesting storage innovations that will once again shake up the status quo and keep us storage geeks entertained. I love this industry!
In Part 3, we’ll discuss some of my predictions on what storage might look like in the coming decade.