Are Virtual Desktops (Finally) About to go Mainstream? Part 2

Part 2 – The VDI Building Blocks

by Asif Khan

Part 1 discussed the “what” of making VDI more affordable to the masses. In this segment, let’s discuss the “how.”

With the release of the Cisco UCS C260 M2, all the technology components are now in place to build the ultimate VDI appliance. With a little software tweaking, we can dramatically cut the cost of implementing VDI.

Some smart readers commented on part 1 that the C260 is a perfect catalyst for Hyper-V or XenDesktop. And it may very well be. For the scope of this post, let’s keep the configuration confined to VMware, EMC and Cisco (only because I’m familiar with this combo). However, other vendors’ technologies may also work if some of the advanced features are available. A clever system integrator can definitely build something interesting here.

So let’s dive right in. Here are the components we need in order to build this solution:

1) Compute
The goal is to configure an appliance that can support up to 250 concurrent sessions. Let’s further define this as 250 light users = 200 task workers = 150 knowledge workers = 100 advanced users. Your mileage may vary. The key to any VDI configuration is defining the right mix of CPU, RAM and I/O to avoid choke points that limit scalability. To keep it simple, let’s assume that this appliance will only support Windows 7. The average user will use up to 2GB RAM and 1 vCPU. Each user will need an average of 10 steady-state IOPS (I/O operations per second) and a 95th percentile burst of 40 IOPS. Let’s further assume a 75% average concurrency rate and a 50/50 read/write ratio to start. Finally, let’s assume 12.5 concurrent desktops per core (VMware typically estimates 10-15 per core; I took the average and applied it to the lightest user; heavier users can be estimated by scaling linearly from there).

The C260 M2 supports 20 cores, which gets us to 250 concurrent users (20 * 12.5). If we use low-cost 4GB DIMMs to populate all 64 sockets, that gives us 256GB RAM, or 2GB vRAM per user with roughly 50% overcommit (about 1GB of physical RAM per user). Or we may want to consider using more expensive 8GB DIMMs to reach 512GB RAM, which gives us 2GB per user with zero overcommit. Since we are assuming 75% concurrency, somewhere between these two numbers is optimal…skew higher to handle bursts, which we’ll discuss next.
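Here is a quick back-of-the-envelope sketch of that sizing in Python. The numbers are the ones used above; the function and variable names are made up for illustration, and the concurrency figure is applied to memory demand as a simplifying assumption.

```python
# Compute and memory sizing sketch. All figures come from the text above;
# the names are illustrative, not part of any vendor tool.
CORES = 20                # cores in the C260 M2, as stated above
DESKTOPS_PER_CORE = 12.5  # midpoint of the 10-15 per core estimate
DIMM_SLOTS = 64           # DIMM sockets to populate
VRAM_PER_USER_GB = 2      # vRAM allocated to each desktop
CONCURRENCY = 0.75        # assumed average concurrency rate


def max_sessions(cores: int, desktops_per_core: float) -> int:
    """Concurrent desktops the CPU can carry at the given density."""
    return int(cores * desktops_per_core)


def ram_summary(dimm_gb: int, sessions: int) -> dict:
    """Compare installed RAM with the vRAM the desktops could ask for."""
    installed_gb = DIMM_SLOTS * dimm_gb
    allocated_gb = sessions * VRAM_PER_USER_GB
    return {
        "installed_gb": installed_gb,
        "allocated_gb": allocated_gb,
        # Assumption: only the concurrent share of desktops needs RAM at once.
        "concurrent_demand_gb": allocated_gb * CONCURRENCY,
        "physical_gb_per_user": installed_gb / sessions,
    }


sessions = max_sessions(CORES, DESKTOPS_PER_CORE)
for dimm_gb in (4, 8):
    print(f"{dimm_gb}GB DIMMs:", ram_summary(dimm_gb, sessions))
```

At 75% concurrency the demand works out to 375GB, which sits between the 256GB and 512GB configurations. That is why the sweet spot lands somewhere in the middle.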

2) Storage
The goal is to make the appliance self-sufficient for everything but the user data and roaming profile data. The UCS C260 has 16 * 2.5″ disk slots. It supports SSD, SAS and NL-SAS and comes with a choice of RAID controllers. Assuming we use VMware View Composer, we will need to put the OS image (replica image) on SSD and put the linked clones (user delta disks) on SAS. We could use SAS or NL-SAS for swap files and other metadata.

VDI tends to exhibit frequent bursty behavior (boot storms, anti-virus scans, batch refresh/recompose, etc.), so we need a good auto-tiering solution to deal with the negative impact on IOPS. EMC’s FAST Suite (Fully Automated Storage Tiering and FastCache) does an awesome job of tiering up “hot” data to SSD and tiering down “cold” data to mitigate the effects of bursts and to conform to the Pareto Principle (20% of the input will generate 80% of the output). If only an EMC array with FAST Suite were available as a VSA (Virtual Storage Appliance) which can run on the C260 to manage the internal disks…

Check out the UberVSA from my vSpecialist brother Nick Weaver (@lynxbat) on his excellent blog, Nickapedia. If you read the UberVSA user guide, you might recognize one of the authors, yours truly…or read the excellent user guide by my other vSpecialist brother, Simon Seagrave (@Kiwi_Si), on his TechHead blog. The UberVSA is a virtual Celerra designed for testing only and is not supported for production environments. There is also a new VNX UberVSA, but it too is NFS only (DART-based) for now. This is where the closed-door negotiations could come in handy. LeftHand Networks made a name for itself doing this sort of thing. We would need the Flare-based VSA to support caching, auto-tiering and virtual pooling. And it needs to be productized and supported for production environments. What d’ya say, EMC?

I have configured several VDI solutions for up to 500 users with 24 disks or fewer with caching, tiering and pooling enabled. So we can easily create a 250-user RAID6 configuration with hot spares using the 16 disk slots on the C260 if we have the proper tiering, caching and virtual pooling capabilities that EMC’s Flare30 code provides. We should probably set up the SSDs as RAID10 for faster replica reads.
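To see why caching and tiering matter so much here, it helps to rough out the I/O load the disks would face without them. The sketch below uses the per-user IOPS, concurrency and read/write figures from the Compute section. The RAID write penalties (2 for RAID10, 6 for RAID6) are the usual rule-of-thumb values and are my assumption, not something measured on this box, and applying concurrency to IOPS is likewise a simplification.

```python
# Aggregate IOPS sketch. Per-user figures come from the Compute section;
# the RAID write penalties are common rule-of-thumb values (an assumption).
USERS = 250
CONCURRENCY = 0.75
STEADY_IOPS_PER_USER = 10
BURST_IOPS_PER_USER = 40
WRITE_FRACTION = 0.5  # 50/50 read/write ratio

WRITE_PENALTY = {"RAID10": 2, "RAID6": 6}


def frontend_iops(iops_per_user: float) -> float:
    """IOPS the desktops generate, before any RAID overhead."""
    return USERS * CONCURRENCY * iops_per_user


def backend_iops(frontend: float, raid_level: str) -> float:
    """Disk IOPS after the write penalty, ignoring cache hits entirely."""
    writes = frontend * WRITE_FRACTION
    reads = frontend - writes
    return reads + writes * WRITE_PENALTY[raid_level]


steady = frontend_iops(STEADY_IOPS_PER_USER)
burst = frontend_iops(BURST_IOPS_PER_USER)
for raid_level in WRITE_PENALTY:
    print(raid_level,
          "steady:", backend_iops(steady, raid_level),
          "burst:", backend_iops(burst, raid_level))
```

The burst figure on RAID6 is far more than a handful of spinning disks could absorb on their own, which is the whole argument for putting hot data on SSD and letting the cache soak up the spikes.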

We only need disk space on the appliance for the replicas, linked clones and transient user metadata (swap, temp, etc). User documents, roaming profiles and streamed apps can reside on a separate NAS (CIFS share) using folder redirection. We need to keep the user’s stuff on a fully redundant backend array with disk-based backup and (self-serve) restore capabilities. Talk to your EMC rep for a fabulous solution that can provide you with all that!

3) Network
A typical VMware View session using PCoIP requires somewhere between 128kbps and 256kbps with less than 150ms latency to avoid major user complaints. Let’s take the high end of the range and double the bandwidth to ensure that everyone can browse Facebook, YouTube and Flickr all at once. So 512kbps * 250 users * 75% concurrency = 96Mbps sustained. A 1Gb Ethernet connection wouldn’t even break a sweat. But UCS supports 10Gb fabrics. Boom. Done. Further, remember that the storage is direct-attached on the UCS, so there is no need to carve out bandwidth for a storage network (except for user data on the NAS, which is not high-priority traffic).
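The bandwidth math is simple enough to sanity check in a few lines of Python. The inputs are the ones above; the helper names are hypothetical, and I am using decimal units (1000 kbps per Mbps) to match the 96Mbps figure.

```python
# PCoIP bandwidth sketch using the figures from the paragraph above.
SESSION_KBPS = 256 * 2  # high end of the range, doubled for headroom
USERS = 250
CONCURRENCY = 0.75


def sustained_mbps(session_kbps: float, users: int, concurrency: float) -> float:
    """Aggregate sustained display traffic in Mbps (decimal units)."""
    return session_kbps * users * concurrency / 1000


def link_utilization(mbps: float, link_gbps: float) -> float:
    """Fraction of a link of the given speed that the traffic would use."""
    return mbps / (link_gbps * 1000)


total = sustained_mbps(SESSION_KBPS, USERS, CONCURRENCY)
print("sustained Mbps:", total)
for link_gbps in (1, 10):
    print(f"{link_gbps}Gb link utilization: {link_utilization(total, link_gbps):.2%}")
```

That works out to just under 10% of a 1Gb link and under 1% of a 10Gb fabric, so display traffic is nowhere near the bottleneck.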

4) Hypervisor
This is the fun (and controversial) part. The latest VMware best practices dictate using non-persistent linked clones with roaming profiles for most use cases. So if a desktop is compromised (blue screen, virus, registry conflict, etc), launch the roaming profile on another non-persistent disk on the same server. If the entire server crashes, the users can restart on another (hot spare) compute node. If you have classes of users that have different SLAs, View Manager can direct them to specific nodes optimized for speed or availability or both.

HA? FT? DRS? vMotion? Who cares? VMware offers a vSphere Essentials Bundle for ROBO users for around $1000 for a 3-server cluster. This is essentially a feature-limited version of ESX and vCenter at a dramatically reduced cost. Couldn’t they do the same for a View package? Let’s call it View Essentials. Start with embedded ESXi (did I mention that the UCS C260 has redundant SD Card slots?! How perfect if you are not booting from SAN…since there is no SAN!). Disable the enterprise features. Add View Manager, View Composer, Security Server and vCenter. License it for $25,000 per node, which breaks down to $100 per user assuming 250 users. Offer a “View Essentials Plus” package which includes the ThinApp bundle for $37.5K-$50K ($150-200 per user, assuming 250 users).

It costs about $500 to buy a new desktop PC these days. Since most VDI projects start with virtualizing task workers’ desktops (rather than mobile workers’ laptops), we need to get the cost per user down to $500 (or less) for this to work. It also makes budgeting easy. If the customer can budget, say, $450 per user (in packages of 100-250), then it becomes trivial to compare the cost to buying a physical desktop. This enables an IT manager to use their annual refresh cycles to introduce VDI to their users gradually with no high upfront costs.

Customers frequently ask me to do exactly this for them. Last month, I configured a “per user” price for a customer wanting to implement VDI for 4,000 users at 500 users per refresh cycle. We came up with a very reasonable cost per user (blade+disk packs)…assuming they spent $500K upfront to buy the VDI infrastructure. They haven’t called back yet…I’ll let you know if they do.

So can we get the “all in” cost below $500 per user? By the way, by “all in”, I’m referring to server side infrastructure only. This does not include the following:

1) End-user devices – Thin clients have a much longer depreciation schedule…or customers can reuse existing desktops
2) Client Software licenses – there are too many variables here (per user, per concurrent user, per device, VDA licensing, etc)
3) Network costs, NAS for user data, backup/restore tools, power, space and other data center costs

To get to that magic number of $500 per user, we need to price out the following:

1) Cisco UCS infrastructure (fully loaded server and fabric interconnects for single point of management)
2) *Totally made up* EMC VSA license
3) *Totally made up* VMware “View Essentials” license

I’m not a sales guy so I have no idea how much any of this costs. So let’s reverse engineer this. Let’s say we need to price it at $500 per user list price (customers can negotiate to get below that magic number). Max Users (250) * Per User ($500) = $125,000. We’ve determined that the *totally made up* VMware View Essentials should be $100 per user. Let’s say the *totally made up* EMC VSA is $50 per user. For a 250 user max config, that is $12,500 for the storage software (disks sold separately). I think that’s a very fair price for just the software.

Which leaves $350 per user for the UCS infrastructure (a fully configured C260 with lots of RAM, 4 SSDs and 12 SAS disks and the UCS fabric interconnect). $350 * 250 users = $87,500. Can you configure a fully loaded C260 for $87,500 list? If so, this just might work.
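The reverse engineering above can be laid out as a small Python sketch. The prices are the same *totally made up* numbers from this post, and the function name is hypothetical. It simply subtracts the software from the target and shows each component's total and share of revenue.

```python
# Reverse-engineered budget sketch. Prices are the hypothetical figures
# from this post, not real list prices.
MAX_USERS = 250
TARGET_PER_USER = 500
SOFTWARE_PER_USER = {"View Essentials": 100, "EMC VSA": 50}


def budget(max_users: int, target: int, software: dict) -> dict:
    """Split the per-user target into software and leftover hardware budget."""
    hardware_per_user = target - sum(software.values())
    per_user = {**software, "UCS infrastructure": hardware_per_user}
    return {
        name: {
            "per_user": price,
            "total": price * max_users,
            "share": price / target,
        }
        for name, price in per_user.items()
    }


breakdown = budget(MAX_USERS, TARGET_PER_USER, SOFTWARE_PER_USER)
for name, line in breakdown.items():
    print(f"{name}: ${line['per_user']}/user, ${line['total']:,} total, {line['share']:.0%}")
```

Changing any one of the software prices immediately shows how much room is left for the hardware, which is handy when the numbers inevitably get negotiated.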

OK, I admit that there are some huge risks to address and major obstacles to overcome in order to make all this work. If the three partners (VMware, Cisco and EMC) agree to do this, will EMC be happy to get just 10% of the total revenue…and let Cisco sell the 16 disks while EMC only sells the VSA software? Will VMware be happy to get just 20% of the total revenue while disabling the very features that set it apart from just about every other hypervisor out there (vMotion, DRS, FT, HA, etc)? Will Cisco co-develop a similar appliance with NetApp and Citrix and piss off the other partners?

What do you think? Will this work? Is my UCS pricing estimate in the ballpark? If not, is there a cheaper way to build this with other vendors’ hardware and, say, open source software? Can the partnership agree on the terms and the inequitable revenue split? Does this need to be sold at a flat price regardless of user count or should it be priced on a per user basis with, say, a 100 user minimum?

So the question is: should we charge a higher “per user” fee for heavier users…a progressive tax structure? In other words, we could charge a flat fee ($125K = $500 per user fully loaded) for this appliance whether you have 100 (presumably advanced) users or 250 (presumably light) users. Does it make sense for the customer to pay a higher “per user” fee for fewer users if those users require more system resources? I think that sounds fair. But I’d love to hear from you in the comments.
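To put numbers on that “progressive tax” idea, here is a minimal sketch that spreads the flat $125K fee across the user classes from the Compute section. The capacities are the ones defined there; the names and the relative-weight calculation are my own illustration.

```python
# Flat-fee sketch: what each class of user effectively pays per seat.
# Capacities come from the Compute section; names are illustrative.
APPLIANCE_PRICE = 125_000
CAPACITY_BY_CLASS = {
    "light": 250,
    "task worker": 200,
    "knowledge worker": 150,
    "advanced": 100,
}


def per_user_fee(price: int, capacity: int) -> float:
    """Effective per-user cost when the appliance is full of one user class."""
    return price / capacity


fees = {name: per_user_fee(APPLIANCE_PRICE, cap)
        for name, cap in CAPACITY_BY_CLASS.items()}
# Resource weight relative to a light user (a light user counts as 1.0).
weights = {name: CAPACITY_BY_CLASS["light"] / cap
           for name, cap in CAPACITY_BY_CLASS.items()}

for name in CAPACITY_BY_CLASS:
    print(f"{name}: ${fees[name]:,.2f} per user, weight {weights[name]:.2f}")
```

An advanced user ends up at 2.5 times the cost of a light user, which lines up with consuming 2.5 times the share of the box.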

Next segment, we’ll talk about why most VDI projects fail before they ever get started…and why something like this proposal is so desperately needed. I’ll be at EMC World all this week doing booth and/or lab duty. Look me up. See you there!

UPDATE: Today at EMC World, Pat Gelsinger announced “Project Lightning” designed to bring SSDs mainstream (cost-wise) and also introduced a new server side PCIe-based cache technology. Read Chad Sakac’s VirtualGeek post on this topic to understand how this works. I think the server side caching trends favor this VDI appliance model. Note especially where Chad discusses ESX running in an array. What if we were to reverse that and have the array co-reside on the ESX server (in this case, the C260)? That is essentially the model I propose here. Stay tuned to see how this plays out.

