Shared Storage for Failover Clusters

Shared disk storage is a requirement for Windows Server 2008 R2 failover clusters that use the Node and Disk Majority Quorum or the Disk Only Quorum model. Shared storage devices can be part of any cluster configuration, and when they are used, the disks, disk volumes, or LUNs must be presented to the Windows systems as basic Windows disks.

All storage drivers must be digitally signed and certified for use with Windows Server 2008 R2. Many storage devices certified for Windows Server 2003 or even Windows Server 2008 might not work with Windows Server 2008 R2 and either simply cannot be used for failover cluster shared storage, or might require a firmware and driver upgrade to be supported. One main reason for this is that all failover shared storage must comply with SCSI-3 Architecture Model SAM-2. This includes any and all legacy and serial attached SCSI controllers, Fibre Channel host bus adapters, and iSCSI hardware- and software-based initiators and targets. If the cluster attempts to perform an action on a LUN or shared disk and the attempt causes an interruption in communication to the other nodes in the cluster or any other system connected to the shared storage device, data corruption can occur and the entire cluster and each storage area network (SAN) connected system might lose connectivity to the storage.

When LUNs are presented to failover cluster nodes, each LUN must be presented to each node in the cluster. Also, when the shared storage is accessed by the cluster and other systems, the LUNs must be masked or presented only to the cluster nodes and the shared storage device controllers to ensure that no other systems can access or disrupt the cluster communication. There are strict requirements for shared storage support, especially with failover clusters. Deployments that use SANs or other types of shared storage must meet the following requirements and recommendations:

» All Fibre, SAS, and iSCSI host bus adapters (HBAs) and Ethernet cards used with iSCSI software initiators must obtain the “Designed for Microsoft Windows” logo for Windows Server 2008 R2 and have suitable signed device drivers.

» SAS, Fibre, and iSCSI HBAs must use StorPort device drivers to provide targeted LUN resets and other functions inherent to the StorPort driver specification. SCSIport was at one point supported for two-node clusters, but if a StorPort driver is available, it should be used to ensure support from the hardware vendors and Microsoft.

» All shared storage HBAs and back-end storage devices, including iSCSI targets, Fibre, and SAS storage arrays, must support SCSI-3 standards and must also support persistent bindings or reservations of LUNs.

» All shared storage HBAs must be deployed with matching firmware and driver versions. Failover clusters using shared storage require a very stable infrastructure and applying the latest storage controller driver to an outdated HBA firmware can cause very undesirable situations and might disrupt access to data.

» All nodes in the cluster should contain the same HBAs and use the same version of drivers and firmware. Each cluster node should be an exact duplicate of each other node when it comes to hardware selection, configuration, and driver and firmware revisions. This allows for a more reliable configuration and simplifies management and standardization.

» When iSCSI software initiators are used to connect to iSCSI software- or hardware-based targets, the network adapter used for iSCSI communication should be connected to a dedicated switch, should not be used for any cluster communication, and cannot be a teamed network adapter because teamed adapters are not supported with iSCSI.

For Microsoft to officially support failover clusters and shared storage, in addition to the hardware meeting the requirements listed previously, the entire configuration of the server brand and model, local disk configuration, HBA or network card controller firmware and driver version, iSCSI software initiator software, storage array, and storage array controller firmware or SAN operating system version must be tested as a whole system before it will be considered a “Windows Server 2008 R2 Failover Cluster Supported Configuration.” The point to keep in mind is that if a company really wants to consider using failover clusters, they should research and find a suitable solution that will meet their budget. If a tested and supported solution cannot be found within their price range, the company should consider alternative solutions that can restore systems in about an hour or a few hours if not within a few minutes. The truth is that failover clusters are not for everyone, they are not for the faint of heart, and they are not within every organization’s information technology budget from an implementation, training, and support standpoint. Administrators who want to test failover cluster configurations to gain knowledge and experience can leverage several low-cost shared storage alternatives, including using the Windows iSCSI initiator and a software-based iSCSI target, but they must remember that the configuration may not be supported by Microsoft in case a problem is encountered or data loss results.
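
For administrators who do build such a test environment, the Validate a Configuration Wizard, or its PowerShell equivalent, is the quickest way to see whether a given combination of nodes and storage would even be considered supportable. The following is a minimal sketch, assuming the Failover Clustering feature and its PowerShell module are installed; the node names, cluster name, and IP address are placeholders:

    # Validate a proposed two-node cluster before creating it (NODE1, NODE2,
    # CLUSTER01, and the IP address are placeholders).
    Import-Module FailoverClusters

    # Run the full set of validation tests; the cmdlet returns the path to the
    # HTML validation report.
    Test-Cluster -Node NODE1,NODE2

    # If validation passes, create the cluster with the same nodes.
    New-Cluster -Name CLUSTER01 -Node NODE1,NODE2 -StaticAddress 192.168.1.50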


Serial Attached SCSI (SAS) Storage Arrays
Serial Attached SCSI or SAS storage arrays provide organizations with affordable, entry-level, hardware-based direct attached storage arrays suitable for Windows Server 2008 R2 clusters. SAS storage arrays are commonly limited to four hosts, but some models support extenders to add additional hosts as required. One of the major issues with direct attached storage is that replication of the data within the storage is usually not achievable without involving one of the host systems or software provided by the hardware vendor.


Fibre Channel Storage Arrays
Using Fibre Channel (FC) HBAs, Windows Server 2008 R2 can access both shared and nonshared disks residing on a SAN connected to a common FC switch. This allows both the shared storage and operating system volumes to be located on the SAN, if desired, to provide diskless servers. In many cases, however, diskless servers might not be desired if the operating system performs many paging actions because the cache on the storage controllers can be used up very fast and can cause delays in disk read and write operations for dedicated cluster storage. If this is desired, however, the SAN must support this option and be configured to present the operating system dedicated LUNs to only a single host exclusively. The LUNs defined for shared cluster storage must be zoned and presented to every node in the cluster, and no other systems. The LUN zoning or masking in many cases is configured on the Fibre Channel switch that connects the cluster nodes and the shared storage device. This is a distinct difference between direct attached storage and FC or iSCSI shared storage. Both FC and iSCSI require a common fiber or Ethernet switch and network to establish and maintain connections between the hosts and the storage.

A properly configured FC zone for a cluster will include the World Wide Port Number (WWPN) of each cluster host’s FC HBAs and the WWPN of the HBA controller(s) from the shared storage device. If either the server or the storage device utilizes multiple HBAs to connect to a single or multiple FC switches to provide failover or load-balancing functionality, this is known as Multipath I/O (MPIO) and a qualified driver for MPIO management and communication must be used. Also, the function of either MPIO failover and/or MPIO load balancing must be verified as approved for Windows Server 2008 R2. Consult the shared storage vendor, including the Fibre Channel switch vendor, for documentation and supported configurations, and check the cluster Hardware Compatibility List (HCL) on the Microsoft website to find approved configurations.


iSCSI Storage
When organizations want to utilize iSCSI storage for Windows Server 2008 R2 failover clusters, security and network isolation are highly recommended. iSCSI utilizes an initiator on the host that requires access to the LUNs or iSCSI targets. Targets are located or hosted on iSCSI target portals. Using the target portal interface, the target must be configured to be accessed by multiple initiators in a cluster configuration. Both the iSCSI initiators and target portals come in software- and hardware-based models, but both models utilize IP networks for communication between the initiators and the targets. The targets need to be presented to Windows as basic disks. When standard network cards will be used for iSCSI communication on Windows Server 2008 R2 systems, the built-in Windows Server 2008 R2 iSCSI initiator can be used, as long as the iSCSI target supports the authentication and security options that will be used.
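
As an illustration, the built-in initiator can be driven from the command line with iscsicli.exe. The following is a minimal sketch only; the target portal address and target IQN are placeholders and will differ for every environment:

    # Start the Microsoft iSCSI Initiator service and set it to start automatically.
    Set-Service msiscsi -StartupType Automatic
    Start-Service msiscsi

    # Register the target portal, list the targets it exposes, and log in to one.
    iscsicli QAddTargetPortal 192.168.10.50
    iscsicli ListTargets
    iscsicli QLoginTarget iqn.2010-01.com.example:cluster-lun1

Once the login completes, the LUN appears in Disk Management as a new basic disk that can be brought online, initialized, and formatted before cluster validation.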

Regardless of the choice of the Microsoft iSCSI initiator, software- or hardware-based initiators, or targets, iSCSI communication should be deployed on isolated network segments and preferably on dedicated network switches and network interface cards. Furthermore, the LUNs presented to the failover cluster should be masked and secured from any systems that are not nodes participating in the cluster, by using authentication and IPSec communication, when possible. Within the Windows Server 2008 R2 operating system, the iSCSI HBA or designated network card should not be used for any cluster communication and cannot be deployed using network teaming software, or the configuration will not be supported by Microsoft.

Hopefully by now, it is very clear that Microsoft only wants to support organizations that deploy failover clusters on tested and approved entire systems, but in many cases, failover clusters can still be deployed and can function, as the Create a Cluster Wizard will allow a cluster to be deployed that is not in a supported configuration.


Multipath I/O
Windows Server 2008 R2 supports Multipath I/O to external storage devices such as SANs and iSCSI targets when multiple HBAs are used in the local system or by the shared storage. Multipath I/O can be used to provide failover access to disk storage in case of a controller or HBA failure, but some drivers also support load balancing across HBAs in both standalone and failover cluster deployments. Windows Server 2008 R2 provides a built-in Multipath I/O driver for iSCSI that can be leveraged when the manufacturer conforms to the necessary specifications to allow for the use of this built-in driver. The iSCSI initiator built in to Windows Server 2008 R2 is very user friendly and makes adding iSCSI targets simple and easy by making new targets reconnect by default. Multipath I/O (MPIO) support is also installed by default, and this is different from previous releases of the iSCSI initiator software.
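
As a rough sketch, the following shows one way to enable MPIO for iSCSI-attached storage on a Windows Server 2008 R2 node, assuming the Microsoft DSM is used and that the Multipath I/O feature may still need to be added; a reboot is required after devices are claimed:

    Import-Module ServerManager

    # Install the Multipath I/O feature if it is not already present.
    Add-WindowsFeature Multipath-IO

    # Claim iSCSI-attached devices for the Microsoft DSM; -r schedules the
    # required reboot.
    mpclaim.exe -r -i -d "MSFT2005iSCSIBusType_0x9"

    # After the reboot, list the MPIO disks and their load-balancing policies.
    mpclaim.exe -s -d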


Volume Shadow Copy for Shared Storage Volume
The Volume Shadow Copy Service (VSS) is supported on shared storage volumes. Volume Shadow Copy can take a point-in-time snapshot of an entire volume, enabling administrators and users to recover data from a previous version. Furthermore, failover clusters and the entire Windows Server Backup architecture utilize VSS to store backup data. Many of today’s services and applications that are certified to work on Windows Server 2008 R2 failover clusters are VSS compliant; careful choice and consideration should be made when choosing an alternative backup system, unless the system is provided by the shared storage manufacturer and certified to work in conjunction with VSS, Windows Server 2008 R2, and the service or application running on the failover cluster.
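
A quick way to confirm that VSS is healthy on the node that currently owns a shared storage volume is the built-in vssadmin utility. The following sketch assumes S: is a clustered data volume; the drive letter is a placeholder:

    # List the registered VSS writers and providers; failed writers are a common
    # cause of backup problems on clustered services and applications.
    vssadmin list writers
    vssadmin list providers

    # Show existing shadow copies for the volume and create a new point-in-time
    # snapshot (creating shadow copies from the command line is available on
    # server editions of Windows).
    vssadmin list shadows /for=S:
    vssadmin create shadow /for=S: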

Source of Information : Sams - Windows Server 2008 R2 Unleashed (2010)

Choosing Applications for Failover Clusters

Many applications can run on failover clusters, but it is important to choose and test those applications wisely. Although many can run on failover clusters, the application might not be optimized for clustering or supported by the software vendor or Microsoft when deployed on Windows Server 2008 R2 failover clusters. Work with the vendor to determine requirements, functionality, and limitations (if any). Other major criteria that should be met to ensure that an application can benefit and adapt to running on a cluster are the following:

» Because clustering is IP-based, the cluster application or applications must use an IP-based protocol.

» Applications that require access to local databases must have the option of configuring where the data can be stored so a drive other than the system drive can be specified for data storage that is separate from the storage of the application core files.

» Some applications need to have access to data regardless of which cluster node they are running on. With these types of applications, it is recommended that the data is stored on a shared disk resource that will failover with the Services and Applications group. If an application will run and store data only on the local system or boot drive, the Node Majority Quorum or the Node and File Share Majority Quorum model should be used, along with a separate file replication mechanism for the application data.

» Client sessions must be able to reestablish connectivity if the application encounters a network disruption or fails over to an alternate cluster node. During the failover process, there is no client connectivity until an application is brought back online. If the client software does not try to reconnect and simply times out when a network connection is broken, this application might not be well suited for failover or NLB clusters.
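
To illustrate the last point, the sketch below shows the kind of client-side retry behavior that rides out a short failover window instead of timing out on the first error. It is illustrative only; \\CLUSTERFS\Data is a placeholder Client Access Point and share name:

    $path        = "\\CLUSTERFS\Data"
    $maxAttempts = 12                    # roughly two minutes at 10-second intervals

    for ($i = 1; $i -le $maxAttempts; $i++) {
        if (Test-Path $path) {
            Write-Host "Connected to $path on attempt $i"
            break
        }
        Write-Host "Share unavailable (attempt $i); retrying in 10 seconds..."
        Start-Sleep -Seconds 10
    }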

Cluster-aware applications that meet all of the preceding criteria are usually the best applications to deploy in a Windows Server 2008 R2 failover cluster. Many services built in to Windows Server 2008 R2 can be clustered and will failover efficiently and properly. If a particular application is not cluster-aware, be sure to investigate all the implications of the application deployment on Windows Server 2008 R2 failover clusters before deploying or spending any time prototyping the solution.

If you’re purchasing a third-party software package to use for Windows Server 2008 R2 failover clustering, be sure that both Microsoft and the software manufacturer certify that it will work on Windows Server 2008 R2 failover clusters; otherwise, support will be limited or nonexistent when troubleshooting is necessary.

Source of Information : Sams - Windows Server 2008 R2 Unleashed (2010)

Failover Cluster Quorum Models

Windows Server 2008 R2 failover clusters support four different cluster quorum models. Each of these four models is best suited for specific configurations but if all the nodes and shared storage are configured, specified, and available during the installation of the failover cluster, the best-suited quorum model is automatically selected.


Node Majority Quorum
The Node Majority Quorum model has been designed for failover cluster deployments that contain an odd number of cluster nodes. When determining the quorum state of the cluster, only the number of available nodes is counted. A cluster using the Node Majority Quorum is called a Node Majority cluster. A Node Majority cluster remains up and running if the number of available nodes exceeds the number of failed nodes. As an example, in a five-node cluster, three nodes must be available for the cluster to remain online. If three nodes fail in a five-node Node Majority cluster, the entire cluster is shut down. Node Majority clusters have been designed and are well suited for geographically or network dispersed cluster nodes, but for this configuration to be supported by Microsoft, it takes serious effort, quality hardware, a third-party mechanism to replicate any back-end data, and a very reliable network. Once again, this model works well for clusters with an odd number of nodes.


Node and Disk Majority Quorum
The Node and Disk Majority Quorum model determines whether a cluster can continue to function by counting the number of available nodes and the availability of the cluster witness disk. Using this model, the cluster quorum is stored on a cluster disk that is accessible and made available to all nodes in the cluster through a shared storage device using Serial Attached SCSI (SAS), Fibre Channel, or iSCSI connections. This model is the closest to the traditional single-quorum device cluster configuration model and is composed of two or more server nodes that are all connected to a shared storage device. In this model, only one copy of the quorum data is maintained on the witness disk. This model is well suited for failover clusters using shared storage, all connected on the same network with an even number of nodes. For example, on a 2-, 4-, 6-, 8-, or 16-node cluster using this model, the cluster continues to function as long as half of the total nodes are available and can contact the witness disk. In the case of a witness disk failure, a majority of the nodes need to remain up and running for the cluster to continue to function. To calculate this, take half of the total nodes and add one and this gives you the lowest number of available nodes that are required to keep a cluster running when the witness disk fails or goes offline. For example, on a 6-node cluster using this model, if the witness disk fails, the cluster will remain up and running as long as 4 nodes are available, but on a 2-node cluster, if the witness disk fails, both nodes will need to remain up and running for the cluster to function.


Node and File Share Majority Quorum
The Node and File Share Majority Quorum model is very similar to the Node and Disk Majority Quorum model, but instead of a witness disk, the quorum is stored on a file share. The advantage of this model is that it can be deployed similarly to the Node Majority Quorum model, but as long as the witness file share is available, this model can tolerate the failure of half of the total nodes. This model is well suited for clusters with an even number of nodes that do not utilize shared storage or clusters that span sites. This is the preferred and recommended quorum configuration for geographically dispersed failover clusters.


No Majority: Disk Only Quorum
The No Majority: Disk Only Quorum model is best suited for testing the process and behavior of deploying built-in or custom services and/or applications on a Windows Server 2008 R2 failover cluster. In this model, the cluster can sustain the failure of all nodes except one, as long as the disk containing the quorum remains available. The limitation of this model is that the disk containing the quorum becomes a single point of failure, which is why this model is not well suited for production deployments of failover clusters.

As a best practice, before deploying a failover cluster, determine if shared storage will be used, verify that each node can communicate with each LUN presented by the shared storage device, and when the cluster is created, add all nodes to the list. This ensures that the correct recommended cluster quorum model is selected for the new failover cluster. When the recommended model utilizes shared storage and a witness disk, the smallest available LUN will be selected. This can be changed, if necessary, after the cluster is created.
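
The quorum model can be reviewed and changed after the fact with the failover clustering PowerShell module. The following is a minimal sketch; the cluster name, witness disk name, and file share path are placeholders:

    Import-Module FailoverClusters

    # Show the quorum model and witness resource currently in use.
    Get-ClusterQuorum -Cluster CLUSTER01

    # Switch to Node and Disk Majority using a specific witness disk...
    Set-ClusterQuorum -Cluster CLUSTER01 -NodeAndDiskMajority "Cluster Disk 2"

    # ...or to Node and File Share Majority for a multisite cluster.
    Set-ClusterQuorum -Cluster CLUSTER01 -NodeAndFileShareMajority \\WITNESS01\ClusterWitness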

Source of Information : Sams - Windows Server 2008 R2 Unleashed (2010)

Determining the Correct Clustering Technology

For either of the Windows Server 2008 R2 fault-tolerant clustering technologies to be most effective, administrators must carefully choose which technology and configuration best fits their application or service requirements. NLB is best suited to provide connectivity to TCP/IP-based services such as Remote Desktop Services, web-based services and applications, VPN services, streaming media, and proxy services. NLB is easily scalable and the number of clients that can be supported is based on the number of clients a single NLB cluster node can support multiplied by the number of nodes in the cluster. Windows Server 2008 R2 failover clusters provide system failover functionality for mission-critical applications, such as enterprise messaging, databases, file servers, print services, DHCP services, Hyper-V virtualization services, and many other built-in Windows Server 2008 R2 roles, role services, and features.

Although Microsoft does not support using both NLB and failover clusters on the same server, multitiered applications can take advantage of both technologies by using NLB to load-balance front-end application servers and using failover clusters to provide failover capabilities to back-end databases that contain data too large to replicate during the day or if the back end cannot withstand more than a few minutes of downtime if a node or service encounters a failure.


Failover Clusters
Windows Server 2008 R2 failover clusters are a clustering technology that provides system-level fault tolerance by using a process called failover. Failover clusters are best used to provide access to resources such as file shares, print queues, email or database services, and back-end applications. Applications and network services defined and managed by the failover cluster, along with cluster hardware including shared disk storage and network cards, are called cluster resources. When services and applications are cluster-aware or certified to work with Windows Server 2008 R2 failover clusters, they are monitored and managed by the cluster service to ensure proper operation.

When a problem is encountered with a cluster resource, the failover cluster service attempts to fix the problem by restarting the resource and any dependent resources. If that doesn’t work, the Services and Applications group the resource is a member of is failed over to another available node in the cluster, where it can then be restarted. Several conditions can cause a Services and Applications group to failover to a different cluster node. Failover can occur when an active node in the cluster loses power or network connectivity or suffers a hardware or software failure. In most cases, the failover process is either noticed by the clients as a short disruption of service or is not noticed at all. Of course, if failback is configured on a particular Services and Applications group and the group is simply not stable but all possible nodes are available, the group will be continually moved back and forth between the nodes until the failover threshold is reached. When this happens, the group will be taken offline by the cluster service and will remain offline.
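
Failover and failback behavior is controlled per group. The following sketch shows how the relevant thresholds and failback window could be set with the failover clustering PowerShell module; the cluster, group, and node names are placeholders, and the values shown are examples rather than recommendations:

    Import-Module FailoverClusters

    $group = Get-ClusterGroup -Name "File Services" -Cluster CLUSTER01

    # Allow at most three failovers within a six-hour window before the group is
    # left offline.
    $group.FailoverThreshold = 3
    $group.FailoverPeriod    = 6

    # Enable failback to the preferred node, restricted to an overnight window.
    $group.AutoFailbackType    = 1      # 0 = prevent failback, 1 = allow failback
    $group.FailbackWindowStart = 1      # 1:00 a.m.
    $group.FailbackWindowEnd   = 3      # 3:00 a.m.

    # Define the preferred owners used during failover and failback.
    Set-ClusterOwnerNode -Group "File Services" -Owners NODE1,NODE2 -Cluster CLUSTER01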

To avoid unwanted failover, power management should be disabled on each of the cluster nodes in the motherboard BIOS, on the network interface cards (NICs), and in the Power applet in the operating system’s Control Panel. Power settings that allow a display to shut off are okay, but the administrator must make sure that the disks, as well as each of the network cards, are configured to never go into Standby mode.

Cluster nodes can monitor the status of resources running on their local system, and they can also keep track of other nodes in the cluster through private network communication messages called heartbeats. Heartbeat communication is used to determine the status of a node and send updates of cluster configuration changes and the state of each node to the cluster quorum.

The cluster quorum contains the cluster configuration data necessary to restore a cluster to a working state. Each node in the cluster needs to have access to the quorum resource, regardless of which quorum model is chosen, or the node will not be able to participate in the cluster. This requirement prevents something called “split-brain” syndrome, in which two nodes in the same cluster both believe they are the active node and try to control the shared resource at the same time, or worse, each node presents its own set of data when separate data sets are available, which causes changes in both data sets and a whirlwind of ensuing issues.


Network Load Balancing
The second clustering technology provided with Windows Server 2008 R2 is Network Load Balancing (NLB). NLB clusters provide high network performance, availability, and redundancy by balancing client requests across several servers with replicated configurations. When client load increases, NLB clusters can easily be scaled out by adding more nodes to the cluster to maintain or provide better response time to client requests. One important point to note now is that NLB does not itself replicate server configuration or application data sets.

Two great features of NLB are that no proprietary hardware is needed and an NLB cluster can be configured and up and running literally in minutes. One important point to remember is that within NLB clusters, each server’s configuration must be updated independently. The NLB administrator is responsible for making sure that application or service configuration, version and operating system security, and updates and data are kept consistent across each NLB cluster node.
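
Windows Server 2008 R2 also includes a PowerShell module for NLB. The following is a minimal sketch of creating a two-node NLB cluster; the host names, interface name, cluster name, and virtual IP address are placeholders:

    Import-Module NetworkLoadBalancingClusters

    # Create the cluster on the first host's "LAN" interface with a shared virtual IP address.
    New-NlbCluster -HostName WEB01 -InterfaceName "LAN" -ClusterName WEBFARM -ClusterPrimaryIP 192.168.20.100

    # Join the second host to the new cluster.
    Get-NlbCluster -HostName WEB01 | Add-NlbClusterNode -NewNodeName WEB02 -NewNodeInterface "LAN"

As noted above, the application or website configuration itself still has to be kept consistent on each node by the administrator.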

Source of Information : Sams - Windows Server 2008 R2 Unleashed (2010)

Windows Server 2008 R2 Cluster Terminology

Before failover or NLB clusters can be designed and implemented, the administrator deploying the solution should be familiar with the general terms used to define the clustering technologies. The following list contains the many terms associated with Windows Server 2008 R2 clustering technologies:

» Cluster—A cluster is a group of independent servers (nodes) that are accessed and presented to the network as a single system.

» Node—A node is an individual server that is a member of a cluster.

» Cluster resource—A cluster resource is a service, application, IP address, disk, or network name defined and managed by the cluster. Within a cluster, cluster resources are grouped and managed together using cluster resource groups, now known as Services and Applications groups (a short PowerShell sketch following this list shows how these objects can be enumerated).

» Services and Applications group—Cluster resources are contained within a cluster in a logical set called a Services and Applications group or historically referred to as a cluster group. Services and Applications groups are the units of failover within the cluster. When a cluster resource fails and cannot be restarted automatically, the Services and Applications group this resource is a part of will be taken offline, moved to another node in the cluster, and the group will be brought back online.

» Client Access Point—A Client Access Point is a term used in Windows Server 2008 R2 failover clusters that represents the combination of a network name and associated IP address resource. By default, when a new Services and Applications group is defined, a Client Access Point is created with a name and an IPv4 address. IPv6 is supported in failover clusters but an IPv6 resource either needs to be added to an existing group or a generic Services and Applications group needs to be created with the necessary resources and resource dependencies.

» Virtual cluster server—A virtual cluster server is a Services and Applications group that contains a Client Access Point, a disk resource, and at least one additional service or application-specific resource. Virtual cluster server resources are accessed either by the domain name system (DNS) name or a NetBIOS name that references an IPv4 or IPv6 address. A virtual cluster server can in some cases also be directly accessed using the IPv4 or IPv6 address. The name and IP address remain the same regardless of which cluster node the virtual server is running on.

» Active node—An active node is a node in the cluster that is currently running at least one Services and Applications group. A Services and Applications group can only be active on one node at a time and all other nodes that can host the group are considered passive for that particular group.

» Passive node—A passive node is a node in the cluster that is currently not running any Services and Applications groups.

» Active/passive cluster—An active/passive cluster is a cluster that has at least one node running a Services and Applications group and additional nodes the group can be hosted on, but are currently in a waiting state. This is a typical configuration when only a single Services and Applications group is deployed on a failover cluster.

» Active/active cluster—An active/active cluster is a cluster in which each node is actively hosting or running at least one Services and Applications group. This is a typical configuration when multiple groups are deployed on a single failover cluster to maximize server or system usage. The downside is that when an active system fails, the remaining system or systems need to host all of the groups and provide the services and/or applications on the cluster to all necessary clients.

» Cluster heartbeat—The cluster heartbeat is a term used to represent the communication that is kept between individual cluster nodes that is used to determine node status. Heartbeat communication can occur on a designated network but is also performed on the same network as client communication. Due to this internode communication, network monitoring software and network administrators should be forewarned of the amount of network chatter between the cluster nodes. The amount of traffic that is generated by heartbeat communication is not large based on the size of the data but the frequency of the communication might ring some network alarm bells.

» Cluster quorum—The cluster quorum maintains the definitive cluster configuration data and the current state of each node, each Services and Applications group, and each resource and network in the cluster. Furthermore, when each node reads the quorum data, depending on the information retrieved, the node determines if it should remain available, shut down the cluster, or activate any particular Services and Applications groups on the local node. To extend this even further, failover clusters can be configured to use one of four different cluster quorum models and essentially the quorum type chosen for a cluster defines the cluster. For example, a cluster that utilizes the Node and Disk Majority Quorum can be called a Node and Disk Majority cluster.

» Cluster witness disk or file share—The cluster witness disk or witness file share is used to store the cluster configuration information and to help determine the state of the cluster when some, if not all, of the cluster nodes cannot be contacted.

» Generic cluster resources—Generic cluster resources were created to define and add new or undefined services, applications, or scripts that are not already included as available cluster resources. Adding a custom resource provides the ability for that resource to be failed over between cluster nodes when another resource in the same Services and Applications group fails. Also, when the group the custom resource is a member of moves to a different node, the custom resource will follow. One disadvantage or lack of functionality with custom resources is that the Failover Clustering feature cannot actively monitor the resource and, therefore, cannot provide the same level of resilience and recoverability as with predefined cluster resources. Generic cluster resources include the generic application, generic script, and generic service resource.

» Shared storage—Shared storage is a term used to represent the disks and volumes presented to the Windows Server 2008 R2 cluster nodes as LUNs. In particular, shared storage can be accessed by each node on the cluster, but not simultaneously.

» Cluster Shared Volumes—A Cluster Shared Volume is a disk or LUN defined within the cluster that can be accessed by multiple nodes in the cluster simultaneously. This is unlike any other cluster volume, which normally can only be accessed by one node at a time, and currently the Cluster Shared Volume feature is only used on Hyper-V clusters but its usage will be extended in the near future to any failover cluster that will support live migration.

» LUN—LUN stands for Logical Unit Number. A LUN is used to identify a disk or a disk volume that is presented to a host server or multiple hosts by a shared storage array or a SAN. LUNs provided by shared storage arrays and SANs must meet many requirements before they can be used with failover clusters but when they do, all active nodes in the cluster must have exclusive access to these LUNs.

» Failover—Failover is the process of a Services and Applications group moving from the current active node to another available node in the cluster when a cluster resource fails. Failover occurs when a server becomes unavailable or when a resource in the cluster group fails and cannot recover within the failure threshold.

» Failback—Failback is the process of a cluster group automatically moving back to a preferred node after the preferred node resumes operation. Failback is a nondefault configuration that can be enabled within the properties of a Services and Applications group. The cluster group must have a preferred node defined and a failback threshold defined as well, for failback to function. A preferred node is the node you would like your cluster group to be running or hosted on during regular cluster operation when all cluster nodes are available. When a group is failing back, the cluster is performing the same failover operation but is triggered by the preferred node rejoining or resuming cluster operation instead of by a resource failure on the currently active node.

» Live Migration—Live Migration is a new feature of Hyper-V that is enabled when Virtual Machines are deployed on a Windows Server 2008 R2 failover cluster. Live Migration enables Hyper-V virtual machines on the failover cluster to be moved between cluster nodes without disrupting communication or access to the virtual machine. Live Migration utilizes a Cluster Shared Volume that is accessed by all nodes in the group simultaneously and it transfers the memory between the nodes during active client communication to maintain availability. Live Migration is currently only used with Hyper-V failover clusters but will most likely extend to many other Microsoft services and applications in the near future.

» Quick Migration—With Hyper-V virtual machines on failover clusters, Quick Migration provides the option for failover cluster administrators to move the virtual machine to another node without shutting the virtual machine off. This utilizes the virtual machine’s shutdown settings options and if set to Save, the default setting, performing a Quick Migration will save the current memory state, move the virtual machine to the desired node, and resume operation shortly. End users should only encounter a short disruption in service and should reconnect without issue depending on the service or application hosted within that virtual machine. Quick Migration does not require Cluster Shared Volumes to function.

» Geographically dispersed clusters—These are clusters that span physical locations and sometimes networks to provide failover functionality in remote buildings and data centers, usually across a WAN link. These clusters can now span different networks and can provide failover functionality, but network response and throughput must be good and data replication is not handled by the cluster.

» Multisite cluster—Geographically dispersed clusters are commonly referred to as multisite clusters as cluster nodes are deployed in different Active Directory sites. Multisite clusters can provide access to resources across a WAN and can support automatic failover of Services and Applications groups defined within the cluster.

» Stretch clusters—A stretch cluster is a common term that, in some cases, refers to geographically dispersed clusters in which different subnets are used but each of the subnets is part of the same Active Directory site—hence, the term stretch, as in stretching the AD site across the WAN. In other cases, this term is used to describe a geographically dispersed cluster, as in the cluster stretches between geographic locations.
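
Many of the terms in the preceding list map directly to cmdlets in the failover clustering PowerShell module included with Windows Server 2008 R2. The following sketch simply enumerates those objects; the cluster, group, and node names are placeholders:

    Import-Module FailoverClusters

    Get-ClusterNode     -Cluster CLUSTER01    # nodes and their Up/Down state
    Get-ClusterGroup    -Cluster CLUSTER01    # Services and Applications groups
    Get-ClusterResource -Cluster CLUSTER01    # resources and their owner groups
    Get-ClusterNetwork  -Cluster CLUSTER01    # cluster and heartbeat networks
    Get-ClusterQuorum   -Cluster CLUSTER01    # quorum model and witness resource

    # Manually move a group to another node (a controlled failover).
    Move-ClusterGroup -Name "File Services" -Node NODE2 -Cluster CLUSTER01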

Source of Information : Sams - Windows Server 2008 R2 Unleashed (2010)

Windows Server 2008 R2 Clustering Technologies

Windows Server 2008 R2 provides two clustering technologies, which are both included on the Enterprise and Datacenter Editions. Clustering is the grouping of independent server nodes that are accessed and viewed on the network as a single system. When a service and/or application is run from a cluster, the end user can connect to a single cluster node to perform his work, or each request can be handled by multiple nodes in the cluster. In cases where data is read-only, the client might request data from one server in the cluster and the next request might be made to a different server and the client would never know the difference. Also, if a single node on a multiple node cluster fails, the remaining nodes will continue to service client requests and only the clients that were originally connected to the failed node may notice either a slight interruption in service, or their entire session might need to be restarted depending on the service or application in use and the particular clustering technology that is in use for that cluster.

The first clustering technology provided with Windows Server 2008 R2, Enterprise and Datacenter Editions is failover clustering. Failover clusters provide system fault tolerance through a process called failover. When a system or node in the cluster fails or is unable to respond to client requests, the clustered services or applications that were running on that particular node are taken offline and moved to another available node where functionality and access are restored. Failover clusters, in most deployments, require access to shared data storage and are best suited, but not necessarily limited to, the deployment of the following services and applications:

» File services—File services deployed on failover clusters provide much of the same functionality a standalone Windows Server 2008 R2 system can provide, but when deployed as clustered file services, a single data storage repository can be presented and accessed by clients through the currently assigned and available cluster node without replicating the file data.

» Print services—Print services deployed on failover clusters have one main advantage over a standalone print server: If the active print server fails, each of the shared printers is made available to clients using another designated print server in the cluster. Although deploying and replacing printers to computers and users is easily managed using Group Policy deployed printers, when standalone print servers fail, the impact can be huge, especially when servers, devices, services, and applications that cannot be managed with group policies access these printers.

» Database services—When large organizations deploy line-of-business applications, e-commerce, or any other critical services or applications that require a back-end database system that must be highly available, deploying database services on failover clusters is the preferred method. Also, in many cases configuring enterprise database services can take hours, and the size of the databases, indexes, and logs can be huge, so a system failure on a standalone database server may result in several hours of undesired downtime during repair or restore, instead of the quick recovery a failover cluster provides.

» Back-end enterprise messaging systems—For many of the same reasons as cited previously for deploying database services, enterprise messaging services have become critical to many organizations and are best deployed in failover clusters.

» Hyper-V virtual machines—As many organizations move toward server consolidation and conversion of physical servers to virtual servers, providing a means to also maintain high availability and reliability has become even more essential when a single physical Hyper-V host has several critical virtual machines running on it.

The second Windows Server 2008 R2 clustering technology is Network Load Balancing (NLB), which is best suited to provide fault tolerance for front-end web applications and websites, Remote Desktop Services Session Host server systems, VPN servers, streaming media servers, and proxy servers. NLB provides fault tolerance by having each server in the cluster individually run the network services or applications, removing any single points of failure. Depending on the particular needs of the service or application deployed on an NLB cluster, there are different configuration or affinity options to determine how clients will be connected to the back-end NLB cluster nodes. For example, on a read-only website, client requests can be directed to any of the NLB cluster nodes; during a single visit to a website, a client might be connected to different NLB cluster nodes. As another example, when a client attempts to utilize an e-commerce application to purchase goods or services provided through a web-based application on an NLB cluster, the client session should be initiated and serviced by a single node in the cluster, as this session will most likely be using Secure Sockets Layer (SSL) encryption and will also contain specific session data, including the contents of the shopping cart and the end-user specific information.
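
Affinity is configured per port rule. The following is a minimal sketch of setting Single affinity on an HTTPS port rule with the NLB PowerShell module; the host name and port are placeholders:

    Import-Module NetworkLoadBalancingClusters

    # Set Single affinity on the rule covering TCP port 443 so that each client
    # is serviced by the same node for the life of its session.
    Get-NlbClusterPortRule -HostName WEB01 |
        Where-Object { $_.StartPort -le 443 -and $_.EndPort -ge 443 } |
        Set-NlbClusterPortRule -NewAffinity Single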

Source of Information : Sams - Windows Server 2008 R2 Unleashed (2010)

HYPER-V Migrations

Migrating servers is not new; for years, IS professionals have been migrating servers from one piece of hardware to another. Let us face it; we are not quite to the point of hardware life cycles matching up with software life cycles, so there is a really good chance that the brand-new enterprise application you are implementing today will outlive the server platform it is being deployed on; anybody supporting legacy applications knows this well. The introduction of virtualization to the mainstream began to change the rules. Of course, nobody wants to jump right in and be the first to move revenue-generating powerhouse applications to a virtual environment, no matter how hard our inner geek is screaming for us to do so. So starting in the late 1990s, IS professionals began to use virtual environments for testing—not just for applications, but the virtualization technology itself. In its infancy, virtualization technology simply did not allow for migrating existing platforms into the virtual world; as a result, over the years, we all became very fluent at building virtual servers and workstations from scratch. Of course, this process evolved into creating templates or prestaging copies of virtual operating systems that could be implemented within minutes. It was only a matter of time before the technology would catch up to the desire to import an existing physical server platform into the virtual world. We are now on the cusp of flawless migrations of physical server platforms into the virtual world, and doing so seamlessly while the server is in use by end users, without impacting performance.

In the brave new world of using virtualization to support frontline production operations, the upper echelon of management is looking to their own engineers and trusted vendors to provide the confidence and expertise needed to begin moving more toward not only data center virtualization but also a truly dynamic data center environment. You should expect hesitation in the discussions to implement production supporting virtualization; there is a mind-set obstacle that many managers need to overcome when it comes to the use of virtualization. Managers comment on preferring a “real” server over a virtual platform. For some it is as simple as preferring to have a solid object to visualize in their mind, whereas others simply have skepticism about the technology itself. Regardless of the reasoning, your design should be written in a way to accommodate and address all concerns. Your approach needs to be focused on accomplishing your goals in a feasible, appropriate way. Do not virtualize for the sake of virtualizing.

Microsoft calls out some specific best practices when migrating specialized server platforms. Familiarize yourself with these best practices thoroughly before attempting to migrate any of the specialty servers.

Source of Information : Elsevier-Microsoft Virtualization Master Microsoft Server Desktop Application and Presentation

Microsoft Clustering Concepts

When planning a Hyper-V environment, it may be useful to go over some of the concepts around clustering and Microsoft’s solutions for providing clustering technology.

Clustering is done by using technology to group two or more servers together into a single functional unit. Traditional clustering techniques require the individual servers, or “nodes,” to be physically identical—that is, they need to have the same hardware configuration, memory size, CPU speed, etc. They also need to be identical in software configuration, with the same OS and (ideally) patch levels, and the same applications installed. In addition, all nodes of a cluster must have access to the same data storage, at the same speed; this is typically done with a high performance SAN.

There are two common types of clusters: fail-over and load-balanced. Failover clusters consist of a single node that typically handles all of the client requests, called the Primary node, and one or more nodes that are largely inactive unless the Primary node goes offline; these are called Secondary nodes. In a load-balanced cluster, all of the nodes participate actively in serving client requests. In most cases, a load-balanced cluster can also serve as a fail-over cluster, since one or more nodes of the load-balanced cluster can typically fail without the other nodes being impacted.

In any cluster, two major challenges present themselves: determining the status of a node member (particularly in fail-over clusters), and determining which node of a cluster currently controls a clustered application and its data. The first challenge is met with a heartbeat network, which is typically a physically separate set of network cards that communicate a signal, or heartbeat, to determine the status of each node. Data ownership is tracked by a data partition called the Quorum. The Quorum is a separate partition from the shared data that also needs to be equally accessible to all nodes in a cluster. The quorum tracks which node is the owner of a given set of applications or data.

Source of Information : Elsevier-Microsoft Virtualization Master Microsoft Server Desktop Application and Presentation

Virtual Machine Manager

Virtual Machine Manager (VMM) 2008 is part of Microsoft’s comprehensive System Center system management suite. VMM enables you to manage your virtual environment from one central console with the tools needed to get the most out of your data center. Leveraging VMM facilitates maximization of physical server resources, improves agility and virtual machine deployment, and allows your company to continue to leverage the skill sets of your existing IT staff.

If you are responsible for a medium-sized to a large virtualized enterprise, then at some point you will likely realize that the management of multiple virtual host machines can be very cumbersome and time-consuming when using the standard console for each machine. VMM simplifies their management by consolidating everything you need into one console. In fact, you can manage your Hyper-V, Virtual Server, and even VMware ESX hosts all via VMM.

If you are already using System Center Operations Manager (SCOM), you can use the data it collects via its monitoring capabilities to further advantage. Performance and Resource Optimization (PRO) leverages it and recommends actions to be taken to improve the performance of your virtual machines. You can even configure it to automatically make certain adjustments on your behalf in order to maintain the level of performance your customers require.

VMM not only consolidates the functionality built into Hyper-V but also adds to it. With VMM, performing physical-to-virtual (P2V) migrations is greatly simplified and can be done without service interruption. VMM will also convert your VMware machines to VHDs using a similar technique called virtual-to-virtual transfers.

For development and testing environments, VMM provides a self-service Web portal you can configure to delegate virtual machine provisioning while maintaining management control of the VMs.

By implementing a centralized library for the storage of virtual machine components, you can leverage these building blocks to quickly stand up new virtual machines as demand dictates.

You can also create scripts to increase your level of automation, because VMM is built on Windows PowerShell. The various wizards included in VMM are typically just a pretty interface to generate a PowerShell script. VMM provides the functionality to view the code behind these scripts to help expand your knowledge of the scripting language.
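
As a small illustration, the sketch below uses the VMM 2008 PowerShell snap-in that installs with the VMM Administrator Console; the server name is a placeholder, and cmdlet behavior can differ between VMM versions:

    # Load the VMM snap-in and connect to the VMM server.
    Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager
    Get-VMMServer -ComputerName VMM01

    # List managed hosts and the virtual machines VMM knows about.
    Get-VMHost | Select-Object Name
    Get-VM | Select-Object Name, Status, VMHost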


System Requirements
Before designing your VMM architecture, you should decide whether you want to implement VMM on a single server or share the load of VMM’s multiple components across separate servers. As a rule of thumb, if you need to support 20 or fewer virtual machine hosts, you can use a smaller single processor server. If you suspect you will eventually have a greater number, up to 150 or more servers, then consider a multiple processor server. If you will be supporting groups of virtual machine hosts in diverse locations, you might want to use a multiple server approach. For a complete list of hardware requirements and software prerequisites for either deployment option, visit the TechNet Web site at http://www.microsoft.com/systemcenter/virtualmachinemanager/en/us/system-requirements.aspx#Server.

Source of Information : Elsevier-Microsoft Virtualization Master Microsoft Server Desktop Application and Presentation

Hardware requirements for Hyper-V

The hardware requirements for Hyper-V are not much different from the base requirements for Windows Server 2008. Windows Server 2008 is available in both 32-bit and 64-bit configurations, while Windows Server 2008 R2 is 64-bit only. Hyper-V, however, is only available on the 64-bit editions of Windows Server. The CPU must have the necessary virtualization extensions available and turned on in the BIOS. Both the major processor manufacturers (Intel and AMD) have CPUs with these extensions available.

In addition, the CPU must support hardware-enforced Data Execution Prevention, or DEP. For Intel processors, this requires enabling the XD bit; for AMD processors it is the NX bit. These functions are found in your computer’s BIOS settings.
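
A quick way to confirm that hardware-enforced DEP is visible to the operating system is a WMI query; whether the virtualization extensions themselves are enabled is reported by the BIOS setup screens or by a utility such as Sysinternals Coreinfo. A minimal sketch:

    # True indicates hardware-enforced DEP (XD/NX) is available to Windows.
    Get-WmiObject Win32_OperatingSystem |
        Select-Object DataExecutionPrevention_Available, DataExecutionPrevention_SupportPolicy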

Memory is another consideration. In addition to the underlying (or “host”) operating system, each virtualized (or “guest”) operating system also requires its share of RAM; the more guests you plan to run simultaneously, the more memory you need. A good minimum to aim for is 4 GB of RAM, as this leaves sufficient memory for the host and one or two guests. HP has a sizing tool available at http://g3w1656g-vip.houston.hp.com/SB/Installs/HyperV_SizingTool.zip. Again, this is a minimum number. We have seen Microsoft recommendations as high as 8 GB.

For hard drive space, you similarly want to consider the needs of the host and each guest operating system. In the case of the guests, you need enough disk space to accommodate all of the installed guests, applications, and data simultaneously, regardless of how many you intend to run at the same time.

For best results, Hyper-V requires a minimum of two physical network adapters: one for hypervisor management and one for VM to network connectivity. If you plan to cluster your devices, install a third adapter.

Source of Information : Elsevier-Microsoft Virtualization Master Microsoft Server Desktop Application and Presentation

Cloud storage is for blocks too, not just files

One of the misconceptions about cloud storage is that it is only useful for storing files. This assumption comes from the popularity of file...