New VSAN Features

VMware VSAN: What’s New

I’ll start with the little things. In addition to pushing out new features with each major release, VMware is great at delivering usability and management tools that make adoption and deployment easy. While many tend to focus on the flashy numbers in new releases (“Support for a bazillion cores!”; “245% more cloud!”), my focus is generally on the management and back-end features that let my team and customers accomplish more in less time.

On the front end, management of VSAN has been made “even more radically simple” in this release. With a click of a button, you will now be able to:

  • trigger the notification lights for drives,
  • evacuate and remove individual drives,
  • flag drives as SSD (or not),
  • and remove existing partitions to prepare a disk for VSAN.

While hunting down device numbers through the CLI was fun, it’s good to replace “./storcli /c0 /e252 /s2 start locate” with a click on the green light icon, making drive replacement a less technical and less complex experience.
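For reference, the old routine looked something like this sketch (controller 0, enclosure 252, and slot 2 are just the example values from the command above; paths vary by controller):

    ./storcli /c0 show                      # list the controller's enclosures and drives
    ./storcli /c0 /e252 /s2 start locate    # blink the locate LED on that slot
    ./storcli /c0 /e252 /s2 stop locate     # turn the LED off once the drive is swapped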


A lot of proofs of concept and brownfield deployments depend on RAID-0-only controllers, and this should remove the need to manually flag those devices as flash. Granular evacuations will allow for simple, safe, and quick drive-by-drive upgrades, and they simplify the proactive removal of drives nearing the end of their life. These enhancements effectively eliminate my need to keep a StorCLI reference guide on my desk.
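For context, here is roughly what those buttons replace, as a sketch (naa.xxxx is a placeholder device ID; the SSD tagging follows the manual SATP-rule method from VMware KB 2013188):

    # tag a RAID-0-backed drive as SSD the old way:
    esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device naa.xxxx --option "enable_ssd"
    esxcli storage core claiming reclaim -d naa.xxxx
    # wipe leftover partitions so VSAN can claim the disk:
    partedUtil getptbl /vmfs/devices/disks/naa.xxxx    # inspect the partition table first
    partedUtil delete /vmfs/devices/disks/naa.xxxx 1   # then delete partition 1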


On the back end, some major low-level housecleaning has been done. The old VMFS-L file system used by VSAN 5.5 is replaced with the new log-structured file system. This was made possible by VMware’s acquisition of Virsto in 2013. Virsto brought an innovative file system to the table, allowing not only significant improvement to random write workloads (VDI and OLTP users rejoice!) but also radically different snapshots that carried lower overhead. It appears some of these concepts have finally made it to VSAN with the release of the new file system. While VSAN 5.5 buffers bursts of writes, this new system will radically improve steady-state (non-burst) write IOPS.

In addition, a new log-structured checkpoint system for snapshots allows for rapid snapshot creation and removal without the significant performance penalty that traditional redo-log snapshots entail. With today’s vmfsSparse snapshots, it is not recommended to leave snapshots open for any length of time, as snapshot consolidation can “stun” a VM with a burst of write IO. Further, running a VM with multiple open snapshots can lead to a 50% or greater decline in performance. The new vsanSparse snapshots should deliver performance comparable to traditional hardware array snapshots. This should drastically reduce issues with snapshot merges stunning VMs under high load and with snapshot consolidation errors, and it should allow regularly scheduled virtual machine snapshots to augment backups as part of your data protection strategy. QA and test/dev environments that use deep snapshot chains should see orders-of-magnitude better disk performance.
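If you want to measure the difference yourself, one quick approach is to build a snapshot chain with vim-cmd, run a steady write workload in the guest, and watch latency as the chain deepens. A sketch (the VM ID of 42 is a placeholder; look yours up with vim-cmd vmsvc/getallvms):

    # build a five-level snapshot chain on VM 42 (no memory state, no quiescing):
    for i in 1 2 3 4 5; do
      vim-cmd vmsvc/snapshot.create 42 "level-$i" "chain depth test" 0 0
    done
    # consolidation is where redo-log snapshots hurt the most:
    vim-cmd vmsvc/snapshot.removeall 42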


Migrations to the new format will be non-disruptive (and will be managed through the RVC console), and though the legacy file system will still be supported, we expect quick adoption of the new format.
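Based on what has been shown so far, the upgrade flow should look roughly like this from RVC (a sketch; the command names come from pre-GA builds and may change, and the cluster path is a placeholder):

    rvc administrator@vsphere.local@vcenter.example.com
    > vsan.disks_stats /vcenter.example.com/DC/computers/VSAN-Cluster         # check the current on-disk format
    > vsan.v2_ondisk_upgrade /vcenter.example.com/DC/computers/VSAN-Cluster   # rolling, host-by-host format upgrade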


37 replies
  1. Zsolt Pamuki says:

    Thanks for this article. Multiple open snapshots are a pain in the back indeed. This new filesystem looks really good.

    • John Nicholson says:

      Hoping to do some benchmarks before and after once the GA code is released. The stuff I saw back in the day (around the Virsto acquisition) was a 3-6% impact, versus the 80+% impact of nested snapshots with normal redo-log snapshots.
      I’ll try to do a mini white paper similar to your VSAN Switch HCL one.

      • Zsolt Pamuki says:

        Some benchmarks would be really good. Bookmarked your blog and I’m going to check it from time to time for your results :).

    • John Nicholson says:

      We actually use VSAN (and help customers deploy and manage it), so the “little things” of management are critical to us. VMware has been really responsive, and this release fixes all of my complaints and management “quirks”.

    • John Nicholson says:

      If you need any help on the design or sizing, give me a buzz. We’ve found that with the right hardware vendors we can do a “no compromise VSAN” that scales down very low for the SMB market.

  2. Andres Rojas says:

    Thanks for this great post. I can see how your approach is different from the other blogs that I have seen.
    You go to the meat, and I like it!

    • John Nicholson says:

      There’s a fling floating around that will allow for scheduling these snaps and using them as rapid checkpoints (not backup replacements!). That, combined with better Veeam/VDP performance, should make the new VSAN a strong contender for high-write transactional databases, where snapshot removal normally gets painful. I’m looking to Veeam/Commvault to release snapshot management functionality in their products to handle advanced scheduling and recovery using these new snaps.
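      Until the fling or the backup vendors ship something, a cron-driven loop around vim-cmd could stand in for scheduled checkpoints. A rough sketch (VM ID 42, the snapshot names, and the <oldest-id> placeholder are all hypothetical values; this is not a backup replacement either!):

          vim-cmd vmsvc/snapshot.create 42 "checkpoint-$(date +%Y%m%d-%H%M)" "scheduled checkpoint" 0 0
          vim-cmd vmsvc/snapshot.get 42                  # list the chain and note the oldest snapshot ID
          vim-cmd vmsvc/snapshot.remove 42 <oldest-id>   # prune so the chain stays short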

    • John Nicholson says:

      Yup. It does involve putting each host in maintenance mode and then running an RVC (Ruby vSphere Console) command, but it doesn’t look too bad. I’m migrating our production cluster as soon as I have the final code. Look for a blog post on the process.

  3. Peter Mensah says:

    Looking forward to the non-disruptive capabilities for migration while still supporting the legacy file system. If a host will be put in maintenance mode, then there is some level of downtime. Kindly clarify.

    • John Nicholson says:

      In short, disruption to a VSAN host doesn’t mean disruption to my running virtual machines.

      It’s simple. ESXi accesses the VMs through a VMFS shim layer, so it’s a rolling upgrade (one node at a time) across your cluster.
      In 99% of the environments we deploy, we have VMware vMotion, which allows a running virtual machine to be moved from one host to another. Before maintenance mode is entered, all VMs are moved off (either by DRS or by hand). Once maintenance mode is entered, the VMs access their data from copies on other hosts over the VSAN VMkernel network (VSAN/vSphere doesn’t care whether a VM is running on a host where its data is local).

      Setting up DRS and Admission Control when deploying the cluster makes these operations a lot less work :)

      When putting the host in maintenance mode, the VSAN option for full data migration or ensure accessibility would need to be chosen (I would prefer full data migration in this case; see the CLI sketch below).
      Cormac has a good blog going over how VSAN and maintenance mode work.
      http://blogs.vmware.com/vsphere/2013/11/virtual-san-maintenance-mode-monitoring.html
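      For the CLI-inclined, the same choice appears as a decommission mode when entering maintenance mode from the host's shell. A sketch, assuming the vsanmode flag as exposed by esxcli in the current builds:

          esxcli system maintenanceMode set -e true -m evacuateAllData            # full data migration
          esxcli system maintenanceMode set -e true -m ensureObjectAccessibility  # quicker; keeps one copy reachable
          esxcli system maintenanceMode set -e false                              # exit maintenance mode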

  4. Kirk Tran says:

    Thanks for the great post. Liked how you showed how the new features help make administrative tasks easier.

    • John Nicholson says:

      I’m curious what you define as an “established” all-flash-array vendor (and why anyone would care).

      Would we want an “all flash, born for flash” vendor because they are faster?
      Out of the top 10 SPC-1 benchmarks, only Kaminario falls into this category (meanwhile HDS’s VSP, which has a relatively old code base that was never originally designed for flash, is “winning”). (I’m also told by third parties testing NVMe and DIMM-based flash that they are seeing VSAN clusters pushing millions of IOPS, certainly an option, at reasonable prices.)

      Do we pick them because they are from a big company? EMC’s XtremIO, whose promise of non-disruptive upgrades proved to be more than they could deliver on, shipped without replication and other functionality. VSAN at least shipped with replication support (in the form of vSphere Replication).

      Do we pick them because they’ve been around a while? Whiptail (now Cisco Invicta) has been doing all flash for a long time, yet they are still not selling an HA solution and are still working to ship a fix for the data corruption issue that halted their sales from Q3 until now. VSAN’s teething issues have largely come from LSI/VMware drivers, or from people going off the HCL.

      Honestly, I’ll defer to Chris Evans on why defining AFA as a category is kind of silly.

  5. Shawn says:

    Thanks for the article; every bit of information is helpful. I have been looking at the possibility of introducing VDI in our environment on VSAN for the cost savings. It’s starting to look like the performance may be where it should be.
