VMware

Performance issues on new HW

As part of a project, old server hardware was replaced with shiny new hardware. Besides the server hardware, the storage hardware and infrastructure were also replaced. The new hardware was installed alongside the old hardware, and because the customer has a high virtualization ratio, nearly all servers were VMs, so the migration of the VMs was done without downtime. The customer uses a Windows 2008 R2 failover cluster for file services and MS SQL Server. The MS SQL Server is the database for the ERP software. This cluster used in-guest iSCSI, and because of this we were able to move it online to the new server hardware and migrate the cluster disks later. At a certain point we had the cluster nodes on the new hardware and were able to do a direct comparison of the performance of the new hardware. The runtime of batch jobs and the experience of the users showed us that the new hardware was slower. We were puzzled…

VMware vCenter: Host state 'not responding' flapping

While I was onsite at a customer to decommission an old storage system, one of my very first tasks was to unmount and detach some old datastores. No big deal, until I saw that one ESXi host after another went to “not responding”. Time for a heart attack, but hey: why should a host run into a PDL/APD while I was unmounting datastores on the vSphere layer? The LUNs were still there and accessible. The hosts came back quickly, and from that point on I watched the hosts flapping between “connected” and “not responding”. Time for an investigation. My first thought was that it must have something to do with the network. But the network was okay: no problems with interfaces, (M/R)STP or similar. Then I checked the logs and found this
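When hosts flap like this, it helps to record the connection state of all hosts over a period of time instead of watching the vSphere Client. Below is a minimal pyVmomi sketch (not part of the original post); the vCenter hostname, credentials and polling interval are placeholders.

```python
#!/usr/bin/env python
# Minimal sketch: poll the connection state of all ESXi hosts via pyVmomi.
# vCenter hostname, user and password are placeholders (assumptions).
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only, skips certificate checks
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for _ in range(10):                       # ten samples, 30 seconds apart
        for host in view.view:
            # connectionState is 'connected', 'disconnected' or 'notResponding'
            print(host.name, host.runtime.connectionState)
        time.sleep(30)
    view.DestroyView()
finally:
    Disconnect(si)
```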

VMware vSphere Metro Storage Cluster with HP 3PAR Peer Persistence – Part II

The first part of this (short) blog series covered the basics of VMware vSphere Metro Storage Cluster (vMSC) with HP 3PAR Peer Persistence. This second part will cover the basic tasks to configure Peer Persistence. Please note that this blog post relies on the features and supported configurations of 3PAR OS 3.1.3! This is essential to know, because 3.1.3 brought some important enhancements with respect to 3PAR Remote Copy.

Fibre-Channel zoning

One of the very first tasks is to create zones between the Remote Copy Fibre Channel (RCFC) ports. I used two ports from a quad-port FC adapter for Remote Copy. This matrix shows the zone members in each Fibre Channel fabric. 3PAR OS 3.1.3 supports up to four RCFC ports per node; earlier versions of 3PAR OS only supported one RCFC port per node.

Trouble due to changed vDS default security policy

A customer contacted me because he had trouble moving a VM between two clusters. The hosts in the source cluster used vNetwork Standard Switches (vSS), the hosts in the destination cluster a vNetwork Distributed Switch (vDS). Because of this, a host in the destination cluster had an additional vSS with the same port groups that were used in the source cluster. This configuration allowed the customer to do vMotion without shared storage between the two clusters. The setup worked fine, until the customer moved a specific VM to the new cluster and switched the port group of the VM from the vSS to the vDS: the VM lost its connection to the network. A switch back to the vSS restored network connectivity for the VM. While troubleshooting this issue I noticed that the port was blocked due to an L2 security violation.
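An L2 security violation on a vDS points to the port group's security policy (promiscuous mode, MAC address changes, forged transmits), which is stricter by default than the classic vSS defaults. The following is a minimal pyVmomi sketch, not from the original post, that prints the default security policy of each distributed port group; it assumes an already connected ServiceInstance si as in the sketch above.

```python
# Minimal sketch: print the default security policy of all distributed port
# groups. Assumes an existing, connected ServiceInstance "si" (see above).
from pyVmomi import vim

def print_dvpg_security(si):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
    for pg in view.view:
        policy = pg.config.defaultPortConfig.securityPolicy
        print("%s: promiscuous=%s macChanges=%s forgedTransmits=%s" % (
            pg.name,
            policy.allowPromiscuous.value,
            policy.macChanges.value,
            policy.forgedTransmits.value))
    view.DestroyView()
```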

VMware vSphere Metro Storage Cluster with HP 3PAR Peer Persistence - Part I

The title of this blog post mentions two terms that have to be explained. First, a VMware vSphere Metro Storage Cluster (or VMware vMSC) is a configuration of a VMware vSphere cluster that is based on a stretched storage cluster. Second, HP 3PAR Peer Persistence adds functionality to the HP 3PAR Remote Copy software and HP 3PAR OS so that two 3PAR storage systems form a nearly continuously available storage system. HP 3PAR Peer Persistence allows you to create a VMware vMSC configuration and to achieve a new level of availability and reliability.

VMware jumps on the fast-moving hyper-converged train

The whole story began with a tweet and a picture:

This picture, in combination with rumors about Project Mystic, motivated Christian Mohn to publish an interesting blog post. Today, two and a half months later, “Marvin”, or Project Mystic, got its final name: EVO:RAIL.

What is EVO:RAIL?

Firstly, we have to learn a new acronym: Hyper-Converged Infrastructure Appliance (HCIA). EVO:RAIL will be exactly this: an HCIA. IMHO, EVO:RAIL is VMware’s attempt to jump on the fast-moving hyper-converged train. EVO:RAIL combines different VMware products (vSphere Enterprise Plus, vCenter Server, Virtual SAN and vCenter Log Insight) along with EVO:RAIL deployment, configuration and management into a hyper-converged infrastructure appliance. Appliance? Yes, an appliance. A single stock keeping unit (SKU) including hardware, software and support. To be honest: VMware will not try to sell hardware. The hardware will be provided by partners (currently Dell, EMC, Fujitsu, Inspur, NetOne and SuperMicro).

New HP 3PAR StoreServ AFA, VMware VVols and some thoughts

At HP Discover in June 2013 (I wrote 2014, sorry for that typo), HP announced the HP 3PAR StoreServ 7450 All-Flash Array. To optimize the StoreServ platform for all-flash workloads, HP made some changes to the hardware of the nodes. The 7450 uses 8-core Intel Xeon CPUs instead of 6-core 1.8 GHz CPUs, and the cache was doubled from 64 GB to 128 GB. HP also made some changes to 3PAR OS: additional cache flush queues were added to separate the flushing of cache for rotating rust and SSD devices. They also made some write I/O optimizations and added the ability to perform fragmented writes: instead of writing full 16 KB blocks, 3PAR OS is now able to write only 4 KB of a 16 KB block. These software-based changes may also be used on the 7200 and 7400. This leads to the new…

Memory management: VMware ESXi vs. Microsoft Hyper-V

Virtualization is an awesome technology. In the last weeks I visited a customer and we took a walk through their data centers. While standing in one of their data centers I thought: imagine that all the servers they currently run as VMs were physical. I’m still impressed by the impact of virtualization. The idea is so simple: you share the resources of the physical hardware (I/O, network bandwidth, CPU cycles and memory) between multiple virtual instances. After nearly 10 years of experience with server virtualization I can tell that memory in particular is one of the weak points. When a customer experiences performance problems, they are mostly caused by a lack of storage I/O or memory.
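On ESXi, memory pressure typically shows up as ballooning and hypervisor swapping, so the per-VM quick stats are a good first indicator. Below is a minimal pyVmomi sketch (not part of the original post); it assumes an already connected ServiceInstance si as in the earlier sketch.

```python
# Minimal sketch: list ballooned and swapped guest memory per VM, a quick
# indicator of memory pressure. Assumes a connected ServiceInstance "si".
from pyVmomi import vim

def print_memory_pressure(si):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        qs = vm.summary.quickStats
        # balloonedMemory and swappedMemory are reported in MB
        if qs.balloonedMemory or qs.swappedMemory:
            print("%s: ballooned=%s MB, swapped=%s MB" % (
                vm.name, qs.balloonedMemory, qs.swappedMemory))
    view.DestroyView()
```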

Install VMware Tools from VMware repository

Today I stumbled over a nice workaround. While installing a CentOS 6 VM, I needed to install the VMware Tools. I don’t know why, but I got an error message regarding a non-accessible VMware Tools ISO.

I remembered a blog post I had read a few months ago about a VMware online repository from which the VMware Tools can be installed. You can download the repository information here. The RPM for RHEL can also be used for CentOS. Simply download and install the RPM:
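The actual commands are cut off in the excerpt above. As a rough sketch of the described steps, here is a small Python wrapper around rpm and yum; the repository URL and the package name are assumptions and have to be adapted to the ESXi version and the guest OS.

```python
# Rough sketch only: install the VMware Tools repository RPM and the tools
# packages via yum. URL and package name are placeholders, not the original
# post's values.
import subprocess

REPO_RPM_URL = "http://packages.vmware.com/tools/esx/<version>/repos/<repo>.rpm"  # placeholder
TOOLS_PACKAGE = "vmware-tools-esx-nox"  # assumed package name for a VM without X

def install_vmware_tools_from_repo():
    # install the repository definition (rpm can install directly from a URL)
    subprocess.run(["rpm", "-Uvh", REPO_RPM_URL], check=True)
    # install the VMware Tools packages from the repository
    subprocess.run(["yum", "install", "-y", TOOLS_PACKAGE], check=True)

if __name__ == "__main__":
    install_vmware_tools_from_repo()
```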

Patch available: VMware vSphere 5.5 U1 NFS APD bug

In April 2014, a bug was discovered in vSphere 5.5 U1 that can lead to APD events with NFS datastores. iSCSI, FC and FCoE aren’t affected by this bug, but potentially every NFS installation running vSphere 5.5 U1 was at risk. This bug is described in KB2076392. Luckily none of my customers ran into this bug, but this is mostly due to the fact that most of my customers use FC/FCoE or iSCSI. Until today, the only solution was to avoid the upgrade to U1 and to use vSphere 5.5 GA (with some patches to fix the Heartbleed bug).