Microsoft S2D Fights Back Against Nutanix & VMware VSAN

Intro:

Gartner predicts Hyper-Converged Infrastructure will be a multi-billion-dollar industry by 2020. Having worked with VMware VSAN on VxRail and with Nutanix, alongside customers and vendors, for some time now, I see the GA release of Microsoft Windows Server 2016 with Storage Spaces Direct as a great turning point for vendors and customers alike.

HCI Vendors:

Dell EMC VxRail supports only the VMware vSphere hypervisor, because it relies on VSAN, which runs in the vSphere kernel. Nutanix supports vSphere, Hyper-V, Acropolis, and lately XenServer, because it uses a VSA architecture in which a Controller VM (CVM) runs on each node. Microsoft Storage Spaces Direct supports only Hyper-V 2016 in a hyper-converged deployment, where it runs in kernel mode, or a Scale-Out File Server deployment, which requires separate storage nodes. Microsoft S2D is a Datacenter edition feature, so make sure licensing is taken into consideration. For those of you who want to argue that this incurs a licensing cost, so why not invest in a more established technology, aka “Nutanix”, think again.

Whatever hypervisor you plan to deploy, be it vSphere, Acropolis, XenServer, or KVM, 99% of the time it will end up running more than two Windows Server VMs. That means you have to pay for the Microsoft Datacenter license even though you are not running Hyper-V, so that your guest Windows Server VMs are licensed correctly, unless you want to buy an OLP license for each Windows Server VM, which would be plain stupid.

Just for the sake of completeness, SimpliVity uses dedicated hardware, the OmniStack Accelerator Card, for data efficiency (compression, deduplication, and optimization). So yes, it performs better than all the other vendors in that area, but it requires special hardware, which in a sense goes against software-defined. I believe that after being acquired by HPE they will have a very good product overall. Not to forget Cisco HyperFlex, which runs the recently acquired Springpath for SDS and as of now supports only vSphere (Hyper-V support is coming soon).

vTax & Vendor Lockdown:

Honestly speaking, I love Nutanix. I have tried to work for them before (there is quite a story behind that) and signed up for their Technology Champion program, but some of their sales pitches really tick me off. Vendor lockdown, or vTaxation, or whatever you want to call it, is the most senseless, unrealistic marketing statement Nutanix has brought to this market. The reason I say that is very simple: choosing a hypervisor is not like buying potatoes. If and when you choose to change your hypervisor, it is not to be taken as lightly as the Nutanix sales pitch makes it out to be, let alone the technical side of things. From a vendor lock-in perspective, if you choose vSphere, Nutanix will tell you to go with them since VxRail or VSAN supports only vSphere, while with Nutanix you can change at any time to Acropolis, Hyper-V, or Xen. That is completely true, but for heaven's sake, when did a customer ever change hypervisors after just purchasing and setting up a new environment? And if that is the case, would they do it on the same hardware and within its lifecycle?

How many customers have changed hypervisors after deployment, especially knowing that they most probably chose vSphere because they have been using it and are sure it is stable? If you choose Hyper-V or Acropolis from the start, then simply enough VxRail and VSAN are NOT an option. It is how the product was architected and built, and it is not a point for sales debates. If you choose to change later, that is also a no-brainer: VxRail and VSAN are out of the equation, so you plan your new hardware and environment based on your new hypervisor's requirements. Make your point valid and tell us how many Nutanix customers changed from vSphere to another hypervisor on the same hardware within that hardware's lifecycle. I doubt it is more than 1%.

From a vTaxation perspective, a phrase invented by Nutanix to stick it to VMware, all I have to say is that it is a very cheap shot. When you tell customers they are paying a tax for software that has been on the market for more than 20 years, with constant development, innovation, and stability, and that owns around 70% of the hypervisor market, that is just plain low, even from a sales/marketing perspective. VMware has the right to charge as much as it wants, and customers have the right to evaluate and choose what is right for them. Anyhow, Microsoft will keep making money out of all of them.

Endless Debate:

A long, endless debate exists in the HCI world between the distributed RAID and data locality architectures, mainly between VMware VSAN and Nutanix. In essence, each vendor twists the technology's advantages and disadvantages to its own benefit (which makes sense from a vendor perspective).

Nutanix data locality does not depend heavily on the network, since a local copy of the data is always served from the node the VM is hosted on. Read I/O should therefore have low latency, because it does not leave the node. When a failure occurs, the VM starts on another node and reads over the network until a new local copy of the data is built on that node. VMware VSAN distributed RAID, on the other hand, relies heavily on the network: no locality exists except in a stretched cluster scenario, and a VM's data is spread across different nodes and read from different disks.

The issue with Nutanix data locality is mainly DRS and, to some extent, maintenance mode. Imagine (and only imagine, since I don't want to argue about this with Nutanix, especially with the very public war going on between Nutanix and VMware professionals over the VMUG leaders, vExperts, and NTC issues, although Nutanix's strategy is mainly to hire VMware experts) that DRS is invoked every 10 minutes on a cluster with 1000 VMs and 20 servers. The performance impact of fully automated DRS constantly moving, say, 10% of the VMs is huge, because the data always has to follow the VM, and until a local copy has been created the VM reads over the network.

The issue with VMware VSAN distributed RAID is mainly the network. VMs read from different hosts and disks at all times (except when the data happens to be on the same node or in cache), and all of that traffic travels over the network, which means heavy network utilization and reliance. Another issue being propagated in the market is that SSDs are getting faster than 10Gb networks, so very soon, with NVMe SSDs, the disk is going to be faster than the network, which would add latency and reduce performance. Both vendors have good answers to these so-called limitations, and I won't go over them because it is a really long debate. I haven't even touched on kernel versus VSA architecture, auto-tiering, RAID 5/6/EC support on all-flash only or on hybrid as well, cache limitations, and others.
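To put rough numbers on the disk-versus-network argument, here is a minimal Python sketch that converts nominal Ethernet link speeds into bytes per second and compares them with a single drive. The function names and the 3.0 GB/s NVMe figure are assumptions for illustration only, not measurements, and protocol overhead is ignored.

```python
def link_gbytes_per_sec(link_gbits: float) -> float:
    """Convert a nominal link speed in Gbit/s to GB/s (8 bits per byte, no overhead)."""
    return link_gbits / 8.0


def drives_to_saturate(link_gbits: float, drive_gbytes_per_sec: float) -> float:
    """How many drives of the given throughput it takes to fill the link."""
    return link_gbytes_per_sec(link_gbits) / drive_gbytes_per_sec


# Hypothetical sequential read throughput of one NVMe SSD, for illustration only.
ASSUMED_NVME_GBYTES_PER_SEC = 3.0

if __name__ == "__main__":
    for link in (10, 25, 40):
        ratio = drives_to_saturate(link, ASSUMED_NVME_GBYTES_PER_SEC)
        print(f"{link} Gb link = {link_gbytes_per_sec(link):.3f} GB/s, "
              f"filled by {ratio:.2f} assumed NVMe drive(s)")
```

Under that assumed drive speed, a single NVMe device could outrun a 10Gb link by itself, which is the point the market argument is making.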

Microsoft Storage Spaces Direct:

Back to our subject. Microsoft S2D mirroring provides the fastest possible reads and writes, and it is the best option for acutely performance-sensitive workloads or when vast amounts of data are being actively, randomly written, so-called “hot” data. The downside is its lower storage efficiency. A RAID1 (2-way mirror) configuration requires a minimum of two servers and provides 50% of total storage as usable capacity; a RAID1 (3-way mirror) configuration requires a minimum of three servers and provides 33%.

Parity is best for infrequently written, so-called “cold” data, and for data that is written in bulk, such as archival or backup workloads. RAID5 (single parity) needs a minimum of three servers, but this configuration is not recommended by Microsoft. RAID6 (dual parity), which uses LRC, requires a minimum of four servers and tolerates two server failures (the same as a 3-way mirror) with 50% usable capacity. The added benefit is that the more servers you add, the better the storage ratio gets (the storage efficiency of dual parity increases the more hardware fault domains you have, from 50% up to 80%. For example, at seven, which with Storage Spaces Direct means seven servers, the efficiency jumps to 66.7%: to store 4 TB of data, you need just 6 TB of physical storage capacity).
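The percentages above follow from simple ratios, sketched here in Python. The function names are my own, and the 2+2 and 4+2 data/parity layouts are assumptions inferred from the 50% and 66.7% figures quoted above rather than taken from S2D itself.

```python
def mirror_efficiency(copies: int) -> float:
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    if copies < 2:
        raise ValueError("a mirror needs at least two copies")
    return 1.0 / copies


def parity_efficiency(data_symbols: int, parity_symbols: int) -> float:
    """Usable fraction when each stripe holds data symbols plus parity symbols."""
    return data_symbols / (data_symbols + parity_symbols)


def raw_capacity_needed(usable_tb: float, efficiency: float) -> float:
    """Raw capacity required to store `usable_tb` at the given efficiency."""
    return usable_tb / efficiency


if __name__ == "__main__":
    print(f"2-way mirror:      {mirror_efficiency(2):.1%}")
    print(f"3-way mirror:      {mirror_efficiency(3):.1%}")
    print(f"dual parity (2+2): {parity_efficiency(2, 2):.1%}")
    print(f"dual parity (4+2): {parity_efficiency(4, 2):.1%}")
    needed = raw_capacity_needed(4.0, parity_efficiency(4, 2))
    print(f"raw TB for 4 TB at 4+2: {needed:.1f}")
```

The same 4 TB stored on a 3-way mirror would need 12 TB of raw capacity, which is why parity becomes attractive for cold data as the cluster grows.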

Beginning in Windows Server 2016, one Storage Spaces Direct volume can be part mirror and part parity. Based on read/write activity, the new Resilient File System (ReFS) intelligently moves data between the two resiliency types in real time to keep the most active data in the mirror part. Effectively, this is using mirroring to accelerate erasure coding, giving the best of both: fast, cheap writes of hot data, and better storage efficiency for cooler data. To mix three-way mirror and dual parity, you need at least four fault domains, meaning four servers.

On a side note, ReFS does not support deduplication as of yet. An NTFS volume with deduplication can be created on S2D; nevertheless, auto-tiering (multi-resiliency) is not supported with NTFS. Take note that “ReFS is equipped with a new ‘copy on write’ feature to battle torn writes, write operations which have not been completed for whatever reason, for example due to a power failure. When ReFS driver needs to update file system metadata, it does not just overwrite the previous data, like NTFS does, but creates a new copy of the metadata in another location”.

Microsoft Windows Server 2016 Storage Spaces Direct (S2D) uses distributed RAID with a bit of erasure coding magic in the form of Local Reconstruction Codes (LRC), developed by Microsoft. Storage Spaces supports RAID1 (2-way and 3-way mirror), RAID5 (single parity), RAID6 (dual parity), erasure coding, and auto-tiering across different resiliency settings using ReFS (Resilient File System), which also accelerates .vhdx operations. S2D supports locally attached SATA, SAS, and NVMe drives, and RDMA-capable 10Gb/40Gb NICs are recommended. S2D configures itself automatically based on the type and number of disks, and the fastest disks are always used for caching. If three types of disks are available, say 2 NVMe, 2 SSD, and 2 HDD, S2D would configure the NVMe disks as cache, the SSD disks as a 1-way mirror performance tier, and the HDD disks as a 1-way mirror capacity tier.

If three types of disks are available with more drives, say 2 NVMe, 4 SSD, and 4 HDD, the configuration would be the same as before but with a 2-way mirror and/or parity. When all drives are of the same type, no cache is configured automatically, although you have the option to manually configure higher-endurance drives to cache for lower-endurance drives of the same type. So if only SSDs exist (without NVMe), no caching is used, whereas if SSDs are combined with NVMe, the NVMe drives are automatically used for caching (this can be changed through PowerShell). Any configuration you want can be set up through PowerShell, and to be honest it is the only way to get a decent outcome, as the GUI is a bit limiting, especially when choosing different resiliency settings.
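The automatic cache choice described above can be modeled in a few lines of Python. This is only a sketch of the rule as stated in this post (the fastest drive type becomes cache, and a single drive type means no automatic cache); the function name and speed ranking are my own, and it is not how S2D is actually implemented.

```python
# Relative speed ranking of the drive types mentioned in this post (higher is faster).
SPEED_RANK = {"NVMe": 3, "SSD": 2, "HDD": 1}


def pick_cache_and_capacity(drive_types):
    """Return (cache_type, capacity_types) for a list of drive type names.

    cache_type is None when only one drive type is present, mirroring the
    "no cache is configured automatically" case.
    """
    present = set(drive_types)
    unknown = present - set(SPEED_RANK)
    if unknown:
        raise ValueError(f"unknown drive types: {sorted(unknown)}")
    if len(present) < 2:
        return None, sorted(present)
    fastest = max(present, key=SPEED_RANK.get)
    # Remaining types hold the data, listed from fastest to slowest.
    capacity = sorted(present - {fastest}, key=SPEED_RANK.get, reverse=True)
    return fastest, capacity


if __name__ == "__main__":
    print(pick_cache_and_capacity(["NVMe"] * 2 + ["SSD"] * 4 + ["HDD"] * 4))
    print(pick_cache_and_capacity(["SSD"] * 4))
```

The manual override mentioned above (higher-endurance drives caching for lower-endurance drives of the same type) is deliberately left out of this model.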

Storage Spaces Direct can be deployed in hyper-converged infrastructure mode, where VMs run on the same servers that have the pooled storage, or as a Scale-Out File Server, where VMs run on different nodes than the ones hosting the Storage Spaces Direct storage. For once Microsoft has done a good job documenting Storage Spaces Direct: https://technet.microsoft.com/en-us/windows-server-docs/storage/storage-spaces/storage-spaces-direct-overview . Nevertheless, a very bad job has been done on the GUI side of S2D management, as it requires several different tools and at least one PowerShell command to enable S2D. The exception is Microsoft SCVMM, where creating an S2D cluster is pretty simple and 100% GUI. Check the IOPS performance testing at https://blogs.technet.microsoft.com/filecab/2016/07/26/storage-iops-update-with-storage-spaces-direct/ .

In terms of networking, the optimal solution would be 25Gb or 40Gb networks, but that is not the case for most customers. With Server 2016, use Switch Embedded Teaming on 2 x 10Gb Mellanox ConnectX NICs with RDMA enabled, aka “RoCE”, on top of enabling Data Center Bridging (DCB), QoS, and PFC (Priority Flow Control). This might sound complex, but it is much easier done than said. On the Hyper-V side, all that needs to be done is to enable RoCE on the Mellanox cards and enable the Data Center Bridging feature. On the switch side, DCB and PFC have to be enabled, and almost all switch vendors have guides on how to configure this for RDMA over Converged Ethernet with a couple of commands.

The Windows Server version 1709 Semi-Annual Channel release announced deduplication support for ReFS, but that release is not supported with Storage Spaces Direct as of now and can only be installed as Server Core (no GUI). I believe official support will come in the next LTSB release, which would make S2D that much stronger.

The Microsoft Azure Stack offering, which Microsoft is investing in heavily, is based completely on Storage Spaces Direct for providing storage services and hosting the Azure Stack components. After 1709 came out, many thought the plug had been pulled on S2D, but that is far from true; it is here to stay. Moreover, Microsoft recommends S2D clusters on Azure when looking for highly available solutions for profile/data management in VDI/RDS deployments.

Last but not least, Microsoft S2D can be built on any Microsoft-supported hardware or bought through partner vendors like Dell, HP, Lenovo, and others. In my experience the most important parts are the storage controller, which should support pass-through, and the network cards, either Mellanox or Chelsio (iWARP on Chelsio does not require RoCE support or any switch/server-related configuration, so it's my favorite).

Conclusion:

HCI will only get better, and so will Microsoft Storage Spaces Direct. As Azure Stack gains more ground, so will S2D, along with Microsoft's commitment to continuously developing and improving it. This will help push prices down at other HCI vendors and will increase market awareness of the importance of hyper-converged solutions in a soon-to-be hybrid cloud IT world.

Salam.

One thought

  1. Deduplication is pretty much a standard requirement these days with virtualization. I have encouraged many customers to adopt ReFS, especially for Veeam backup scenarios, but it needs to evolve more. Look at the filesystems from the open source community; Microsoft can do better.
