This is a compare and contrast of ESX and Hyper-V from a practitioner's, and hopefully impartial, viewpoint.
I will discuss the following areas:
Resilience / High Availability
Networking
Storage
Deployment
Manageability
Conclusions
I will outline the technologies available in each solution and throw in my own thoughts for good measure.
ESX = 3.x (3.5 and 3i) unless stated
HV = Hyper-V
HV2 = Hyper-V R2
Resilience
The key concern for a virtualised environment is the high availability of the physical platform on which the VMs reside. It is quite clear that a physical production server tends to be fairly important in the modern IT environment, and its loss is usually mitigated through a variety of strategies. Consider the transition from the physical to the virtual. With one physical server running 20 or more virtual machines, a single point of failure becomes a potentially catastrophic failure. How do we address this?
For all solutions I assume that VMs are located on shared storage. I am not interested in stand-alone deployments for the reason mentioned above.
ESX
The ESX suite has a number of ways of achieving resilience.
HA – High Availability
ESX servers are deployed in a “cluster” configuration, with each ESX server monitoring its fellow cluster members and its own console port gateway address. If a member of the cluster is isolated (network issue) or physically disappears (hardware issue), the other members take ownership of its VMs and, subject to resource criteria (see DRS below), start spinning up the VMs to restore service.
Pros:
Technology is tried and tested, works and is very simple to configure.
A simple solution based on ICMP: hosts ping each other and the gateway. If a host is isolated from both the cluster and the gateway, it shuts down its VMs on the assumption that another ESX server will take on the workload. If the ESX server is isolated from its peers but can still ping the gateway, it continues as normal, assuming that its peers are at fault.
Cons:
Downtime can be significant, up to 15 minutes.
Although not strictly an HA issue, it is possible to configure ESX (via VirtualCenter) in a way that prevents the successful migration of VMs due to resource allocation conflicts.
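The isolation rule described under Pros can be sketched as a small decision function. This is only an illustration of the logic as I have described it, not VMware's implementation; the names HostAction and isolation_response are made up for the example, and the two booleans stand in for the results of the real pings.

```python
from enum import Enum


class HostAction(Enum):
    """Hypothetical outcomes for a host after a round of pings."""
    CONTINUE = "continue running VMs"
    SHUT_DOWN_VMS = "shut down VMs so a peer can restart them"


def isolation_response(peers_reachable: bool, gateway_reachable: bool) -> HostAction:
    """Decide what a host does, following the rule described in the text."""
    if not peers_reachable and not gateway_reachable:
        # Cut off from both the cluster and the gateway: assume this host
        # is the isolated one and release its VMs for another host.
        return HostAction.SHUT_DOWN_VMS
    # Either the peers or the gateway still answer: carry on as normal.
    return HostAction.CONTINUE
```

A host that has lost its peers but can still ping the gateway gets HostAction.CONTINUE, which matches the "peers are at fault" assumption above.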
VMotion
Targeted at the graceful migration of a VM from one ESX server to another. The technology, in its simplest terms, starts up a copy of the VM to be migrated on the destination ESX server. The two VMs are then brought into CPU step with each other, and when they are fully synchronised a reverse ARP is issued that updates the switch port MAC address tables, resulting in all network traffic being directed to the new VM. This final step takes a few milliseconds, during which there is no network connectivity to either VM. Once complete, the original VM is powered down.
Requires a separate 1Gb (Gigabit) VMotion network to facilitate the synchronisation in a timely manner.
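The order of the VMotion steps can be shown with a toy model. This is a sketch of the sequence as described above and nothing more: ToyVM, toy_migrate and the dictionary standing in for the switch MAC table are all invented for illustration, and the real synchronisation is far more involved than copying a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM: a host name plus some state."""
    host: str
    state: dict = field(default_factory=dict)
    powered_on: bool = True


def toy_migrate(source: ToyVM, dest_host: str, mac_table: dict, mac: str) -> ToyVM:
    """Walk through the migration steps in the order the text gives them."""
    # 1. Start a copy of the VM on the destination host.
    copy = ToyVM(host=dest_host)
    # 2. Synchronise until the copy matches the source.
    while copy.state != source.state:
        copy.state = dict(source.state)
    # 3. Stand-in for the reverse ARP: the MAC now points at the new host.
    mac_table[mac] = dest_host
    # 4. Power down the original VM.
    source.powered_on = False
    return copy
```

The point of the ordering is that traffic is only redirected (step 3) once the copy is in sync (step 2), which is why the loss of connectivity is limited to the switch-over itself.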
Storage VMotion
Much like VMotion, except that the actual VM files are moved from one storage system to another. This enables VMs that have become disk I/O bound to be migrated without downtime.
ESX v4 introduces the concept of HA VMs
A VM is run on two separate physical ESX servers, utilising specific hardware capabilities of the new Intel chipset. The VMs are CPU synced: all CPU cycles are executed on both the primary host and the secondary host, so that in the event of a failure a reverse ARP is issued and the secondary VM is instantly available.
Cons:
This requires specific hardware to implement
Pros:
If this works it will be the single most effective HA solution available to the Windows Systems Administrator. At present MSCS Clusters are the best option for true HA applications.
HV
Utilises Microsoft Cluster services for HA. Currently there is no zero downtime migration capability.
MSCS – Microsoft Cluster Services
This is an area where the future is bright for HV, but right now it is a poor relation to ESX. In order to create HA solutions for HV, an MSCS cluster is created and VMs are stored on the shared volume.
This brings its own problems:
1) In order to facilitate a failover between hosts, MSCS needs to gain LUN-level ownership for a host. This means that, in order to provide a failover solution that only affects the specific VM that needs migration (think network card failure), a VM must have its own LUN.
2) As each VM needs its own LUN, SAN configuration becomes complex very quickly. Invariably this will add extra load to the SAN storage processors and a much higher administration overhead on fabric management. Furthermore, a host can rapidly run out of drive letters to assign to LUNs.
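The drive letter problem is simple arithmetic, sketched below. The set of letters already in use (A and B for floppy, C for the system disk, D for an optical drive) is my assumption for a typical host, and the function names are made up for the example.

```python
import string

# Assumption: these letters are already taken on the host.
# Adjust the set to match the real server.
ASSUMED_LETTERS_IN_USE = {"A", "B", "C", "D"}


def free_drive_letters(in_use=ASSUMED_LETTERS_IN_USE):
    """Return the drive letters still available for LUNs."""
    return [letter for letter in string.ascii_uppercase if letter not in in_use]


def luns_without_a_letter(vm_count, in_use=ASSUMED_LETTERS_IN_USE):
    """With one LUN per VM, count the LUNs that cannot get a drive letter."""
    return max(0, vm_count - len(free_drive_letters(in_use)))
```

Under these assumptions 22 letters remain, so 20 VMs just fit, while 30 VMs would leave 8 LUNs without a drive letter.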
HV R2 – Cluster Shared Volumes
The reason that ESX has VMotion and Storage VMotion is the VMFS file system. Put simply, VMFS allows multiple hosts to see the same LUN at the same time. Access is at the file level, and therefore multiple hosts can write to a LUN simultaneously without affecting the integrity of the LUN. This is not the case with NTFS. NTFS has its hands tied here; however, there is light at the end of the tunnel…
CSV allows multiple hosts to connect to a LUN at the same time. All writes are effectively proxied through a master host (one per CSV) and any host in the cluster can read from the LUN at any time.
This, combined with geographically diverse clusters, is a very useful technology and has uses beyond virtualisation. Think geographically diverse SQL databases… and, while I think about it, possibly load-balanced SQL DBs for web applications, e.g. predominantly read-orientated DB access. (I am looking at a MySQL load-balanced configuration with ZXTMs at the moment. V. clever.)
Comments:
At Tech.Ed this technology was demonstrated and looks the business. Is it as easy to set up and manage as the ESX alternatives? I would have to say no.
HA / Resilience – ESX 1 HV 0
Useful Links
Hyper-V
Hyper-V Step-by-Step Guide: Hyper-V and Failover Clustering
VMware Infrastructure 3 Online Library (HTML)
In the next part I will look at Networking….