by Asif Khan
To recap what we have covered in the Data Tiering series so far: in Part 1, we discussed the history and evolution of data tiering for moving data BETWEEN storage arrays. In Part 2, we discussed how data tiering is applied today to move data WITHIN storage arrays. In this, the third and final installment of the series, we will discuss what the future holds for data tiering (HINT: we will transition to moving data ACROSS the data center). As always, these are just my semi-informed opinions, and if I’m proven wrong in the future, I will deny everything :-).
For proper context, let’s start with a discussion of the role of the data center. At its most fundamental level, a data center performs four key functions:
- Process data
- Transfer data
- Store data
- Protect data
If you apply those functions to a storage array, data tiering simply takes data from its original STORED location and TRANSFERS it so that it can be more easily and quickly accessed and PROCESSED, all while ensuring it is PROTECTED by governance, risk mitigation, regulatory compliance and security policies.
When you get right down to it, data tiering is really just about moving data to where it is needed at the moment (and then stepping aside for newly requested data when the old data is no longer needed). So I’m going to take the liberty of expanding the scope of our discussion beyond the storage array to the entire data center. The following is my prediction on where the fundamental design of the storage array in general, and data tiering in particular, is headed.
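To make the "move data to where it is needed, then step aside" idea concrete, here is a minimal sketch of an access-frequency tiering policy. This is purely illustrative: the class name, thresholds and the FAST/SLOW labels are my own inventions, not any vendor's actual promotion algorithm.

```python
import time
from collections import Counter

class TieringPolicy:
    """Toy tiering sketch: blocks accessed often are promoted to a fast
    tier; blocks idle past a time window are demoted back to a slow tier.
    All parameter names and thresholds here are hypothetical."""

    def __init__(self, promote_after=3, idle_seconds=60):
        self.promote_after = promote_after   # accesses before promotion
        self.idle_seconds = idle_seconds     # idle time before demotion
        self.access_counts = Counter()
        self.last_access = {}
        self.tier = {}                       # block_id -> "FAST" | "SLOW"

    def record_access(self, block_id, now=None):
        now = time.time() if now is None else now
        self.access_counts[block_id] += 1
        self.last_access[block_id] = now
        if self.access_counts[block_id] >= self.promote_after:
            # Hot data moves closer to where it is processed.
            self.tier[block_id] = "FAST"
        else:
            self.tier.setdefault(block_id, "SLOW")

    def demote_cold(self, now=None):
        now = time.time() if now is None else now
        for block_id, last in list(self.last_access.items()):
            if self.tier.get(block_id) == "FAST" and \
                    now - last > self.idle_seconds:
                # Old data steps aside for newly requested data.
                self.tier[block_id] = "SLOW"
                self.access_counts[block_id] = 0
```

Real arrays track access statistics at sub-LUN granularity and batch their data movement, but the decision loop (count accesses, promote hot, demote cold) is the same shape.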
The Future of Data Tiering
I believe the storage array, as we know it, is about to go through a major transformation. Just as the mainframe morphed into a distributed client/server architecture and the network dispersed itself into a core/distribution/access tiered architecture, the monolithic storage array is becoming an anachronism. A central repository for archived or inactive data makes sense, but active data needs to be closer to where it is being processed or accessed…think of Google’s MapReduce algorithm.
A very expensive array that stores all the data for an enterprise just won’t work going forward. Today, 70% of a typical enterprise’s data is stored as blocks (LUNs). That number is expected to shrink to 30% within five to ten years, according to some industry experts. The majority of future data generated will be unstructured and most likely stored on scale-out NAS platforms.
Block-based arrays will have to adapt to a similar scale-out strategy at a much lower cost than what is available today. EMC describes its flagship, enterprise-class Symmetrix VMAX platform as “Scale-Out SAN.” HDS’ VSP architecture is marketed in a similar fashion. HP 3Par and maybe even NetApp Cluster-Mode can also fit into the category of “Scale-Out SAN.” And these platforms will continue to be the kings of storage for a large enterprise’s most critical data: financial, HR and other “Tier1” applications.
But how most enterprises store the vast majority of their non-critical (or aged) data will have to become nimbler and cheaper…and will most likely transmit over Ethernet. The following are two examples where I’ve run into this issue with clients.
1) Mid-Sized IT Organization
A retailer with 1,000 servers hired Accenture to see if it was possible to move 100% of their data to NAS platforms within five years. They were tired of paying for top-tier block-based storage arrays and dedicated SAN infrastructure, and the high costs associated with supporting these technologies. They wanted to virtualize and centralize all their data for easier management and lower operating costs.
We concluded that it was not possible to quit block-based storage altogether due to application support issues. The best we could do was 80-90% NAS. The client was disappointed because their fixed costs for supporting block-based storage would actually go UP in this scenario.
2) Large IT Organization
A healthcare provider with over 15,000 servers, including 3,000 UNIX servers and mainframes, had fewer than 5,500 of their servers SAN-attached. The majority of their compute nodes used direct-attached islands of storage, primarily due to Fibre Channel port costs and the lack of a NAS strategy.
This client hired Accenture to come up with a strategy to store over 70% of their total data on NAS platforms within five years. They were not interested in migrating existing data off their current platforms due to support issues and the complexity of migration; instead, they wanted to store as much newly created data on NAS going forward as possible.
Both organizations are typical for IT shops of their respective sizes. Both expect their data to grow at unsustainable compound annual growth rates (CAGRs) for the foreseeable future, with no clear strategy for significantly reducing their overall per-gigabyte cost of storage (especially for block-based data).
I believe there are three emerging technologies that could change all that. The terms are mostly my own coinage. Let’s discuss.
[CONTINUED ON PAGE 2]