Wiki source code of High Availability and Disaster Recovery Technologies

Last modified by Billie D on 2021/02/15 17:38
{{box cssClass="floatinginfobox" title="**Contents**"}}
{{toc /}}
{{/box}}

[[image:About Us.WebHome@myitguide_small_01.jpg]]

SQL Server (2019) provides a full suite of technologies for implementing high availability and disaster recovery. The following sections provide an overview of these technologies.

= AlwaysOn Failover Clustering =

An **AlwaysOn Failover Clustered Instance** (**FCI**) is an instance of SQL Server that spans the servers within this group. If one of the servers within this group fails, another server takes ownership of the instance.  Its most appropriate usage is for high availability scenarios where the **databases are large or have high write** profiles. This is because **clustering relies on shared storage**, meaning the data is only written to disk once.  With SQL Server–level HA technologies, write operations occur on the primary database and then again on all secondary databases, before the commit on the primary completes.

This can cause performance issues. Even though it is possible to stretch a cluster across multiple sites, this involves **SAN replication**, which means that a cluster is normally configured within a single site.  Each server within a cluster is called a **node**.  __Each node__ within a cluster has the __SQL Server binaries installed__, but the SQL __Server service is only started on **one** of the nodes__, which is known as the **active node**.  Each node within the cluster also **shares the same storage** for the **SQL Server data and log files**. The storage, however, is **only attached to the active node**.

If the __active node fails__, then the __SQL Server service is stopped__ and the __storage is detached__. The storage is then __reattached to one of the other nodes in the cluster,__ and the __SQL Server service is started__ on this node, which is now the active node.  The instance is also assigned its **own network name and IP address**, which are also bound to the **active node**.  This means that __applications__ can __connect seamlessly__ to the instance, __regardless of which node has ownership__.  Each node still has a **dedicated system volume**. This volume contains the **SQL Server binaries**.

[[image:SQLAG001.JPG||height="582" width="792"]]

In **geographically dispersed clusters** (known as **geoclusters** or **stretch clusters**), __each server is attached to different storage__. The volumes are updated by **SAN replication** or **Windows Storage Replica** (a Windows Server technology, introduced in __Windows Server 2016__, which performs storage replication). The cluster regards the __two volumes as a single__, **shared volume**, which can __only be attached to one node at a time__.


== Active/Active Configuration ==

It is also possible to have an **active/active configuration**.  Although it is __not possible for more than one node at a time to own a single instance__, and therefore it is __not possible to implement load balancing__, it is, however, **possible to install multiple instances on a cluster**, and a
**different node may own each instance**.  __**Multiple instance**__.  In this scenario, each node has its __own unique network name__ and __IP address__. **Each** instance’s shared storage also **consists of a unique set of volumes**.

Therefore, in an active/active configuration, during normal operations, **Node1** may **host Instance1** and **Node2** may **host Instance2**. If **Node1 fails**, **both instances** are then **hosted by Node2**, __and vice versa__.

[[image:SQLAG002.JPG||height="593" width="867"]]

== Three-Plus Node Configurations ==

It is possible to have up to 64 nodes in a cluster. When you have __three or more nodes__, it is __unlikely that you will want to have a single active node and two redundant nodes__, due to the associated **costs**. Instead, you can choose to implement an **N+1** or **N+M** configuration.  In an N+1 configuration, you have **multiple active nodes** and a **single passive node**. If a **failure** occurs on **any of the active nodes**, they **fail over to the passive node**.

In an active/active cluster, it is **important to consider resources in the  event of failover**. For example, if each node has 128GB of RAM and the instance hosted on each node is using 96GB of RAM and locking pages in memory, then when one node fails over to the other node, this node fails as well, because it does not have enough memory to allocate to both instances. Make sure you** plan both memory and processor requirements as if the two nodes are a single server**. For this reason, __active/active clusters are **not generally recommended** for SQL Server__. 

[[image:SQLAG003.JPG||height="686" width="1117"]]

**Multiple nodes may fail **over to the passive node. For this reason, you must be **very careful when you plan resources** to __ensure that the passive node is able to support multiple instances__. However, you can __mitigate this issue by using __an **N+M configuration**.


== Five-Node N+M Configuration ==

An N+M cluster has **multiple active** nodes and **multiple passive nodes**, although there are usually fewer passive nodes than there are active nodes.

Instance3 is configured to always fail over to one of the passive nodes, whereas Instance1 and Instance2 are configured to always fail over to the other passive node. This gives you the **flexibility** to **control resources on the passive nodes**, but you can also configure the cluster to
**allow any of the active nodes** to fail over **to either of the passive nodes**

[[image:SQLAG004.JPG||height="766" width="1338"]]

== Quorum ==

So that automatic failover can occur, the cluster service needs to know if a node goes down. In order to achieve this, you must form a quorum. The definition of a **quorum** is “__//The minimum number of members required in order for business to be carried out//__.” In
terms of high availability, this means that each node within a cluster, and optionally a **witness device** (which may be a **cluster disk**, a **file share** that is external to the cluster or **Azure BLOB storage**), **receives a vote,** is to avoid a situation known as a **//__split brain__//**

[[image:SQLAG005.JPG||height="411" width="1297"]]

To explain a **split-brain** scenario, imagine that you have three nodes in Data Center 1 and three nodes in Data Center 2. Now imagine that you lose network connectivity between the two data centers, yet all six nodes remain online. The three nodes in Data Center 1 believe that all of the nodes in Data Center 2 are unavailable. Conversely, the nodes in Data Center 2 believe that the nodes in Data Center 1 are unavailable. This leaves both sides (known as partitions) of the cluster thinking that  they should take control. This can have **unpredictable and undesirable consequences for any application** that successfully connects to one or the other partition. The **Quorum** = __(Voting Members / 2) **+ 1**__ formula protects against this scenario.

__If your cluster loses quorum__, then you **can force one partition online**, by **starting the cluster service** using the **/fq switch**. If you are using Windows Server 2012 R2 or higher, then the **partition that you force online** is considered the **authoritative partition**. This means that other partitions can automatically rejoin the cluster when connectivity is reestablished.

Although the __default option__ is __one node, one vote__, it is __possible to manually remove a node’s vote __by changing the **NodeWeight** property to **zero**. This is useful if you have a multi-subnet cluster (a cluster in which the nodes are split across multiple sites). In this scenario, it is recommended that you use a file share witness in a third site. This helps you avoid a cluster outage as a result of network failure between data centers.

Modern versions of Windows Server also support the concepts of **Dynamic Quorum** and **Tie Breaker for 50% Node Split**. When Dynamic Quorum is enabled, the __cluster service automatically decides__ whether or __not to give the quorum witness a vote__, depending on the number of nodes in the cluster. If you have an __even number of nodes, then it is assigned a vote__. If you have an __odd number of nodes, it is not assigned a vote__.

= AlwaysOn Availability Groups =

**AlwaysOn Availability Groups** (**AOAG**) replaces database mirroring and is essentially a merger of database mirroring and clustering technologies.

SQL Server is installed as a **stand-alone instance** (as opposed to an AlwaysOn Failover Clustered Instance) on each node of a cluster. A cluster-aware application, called an **Availability Group Listener,** is then installed on the cluster; it is used __to direct traffic to the correct node__. Instead of
relying on shared disks, however, **AOAG compresses the log stream** and **sends it to the other nodes**, in a __similar fashion to database mirroring__.   When you are using AOAG, **failover **__does not occur at the database level__, __nor at the instance level.__ Instead, failover occurs at the **level of the availability group,  **__availability group__ is a concept that allows you to __group related databases together__ so that they can __fail over as an atomic unit, it allows you to group together the databases that map to a single application.__

__No hard limits are imposed for the number of availability groups you can configure on an instance, nor are there any hard limits for the number of databases on an instance that can take part in AOAG.__  **Microsoft**, however, has tested up to, and officially recommends, __a maximum of 100__ databases and 10 availability groups per instance.  When you are using AOAG, failover does not occur at the database level, nor at the instance level. Instead, failover occurs at the level of the availability group.  The availability group is a concept that allows you __to group related databases__ together so that they can fail over as an __atomic unit, to group together the databases that map to a single application.__  The main limiting factor in scaling the number of databases is that AOAG uses a database mirroring endpoint and there can only be one per instance. This means that the __log stream for all data modifications__ is sent over the **same endpoint.**

 AOAG is the most appropriate technology for high availability in scenarios where you have small databases with **low write profiles**.  This is because, when used **synchronously**, it requires that the __data is committed on all synchronous replicas__ before it is committed on the primary database.

Availability Groups, on the other hand, allow you to configure one or more **replicas as readable**. The **only limitation** is that **readable replicas** and **automatic failover** **__//cannot be configured//__** on the **same secondaries**.   The norm, however, would be to __configure readable secondary__
replicas in **asynchronous commit mode** so that they do not impair performance.   To further simplify this, the __Availability Group Replica __checks for the **read-only** or **read-intent** properties in an //__applications connection string__// and points the application  to the appropriate node. This means that you can easily **scale reporting and database maintenance routines horizontally** with __very little development effort__ and with the __applications being able __to use a __single connection string__.

[[image:SQLAG006.JPG||height="845" width="656"]]

You can have up to **eight replicas**, including **three synchronous replicas**. AOAG may also be the most appropriate technology for implementing high availability in a **virtualized environment**.  This is because the **shared disk required by clustering** may __**not be compatible**__ with some features of the __virtual estate__.

**AOAG **is the most appropriate technology for **DR** when you have a **proactive failover requirement** but when you **do not need to implement a load delay**. AOAG may also be suitable for **disaster recovery in scenarios** where you wish to utilize your **DR server for offloading reporting**. This allows the **redundant servers to be utilized**. When used for __disaster recovery, AOAG works in an **asynchronous** mode__. This means that it is possible to lose data in the event of a failover. The RPO is nondeterministic and is based on the time of the last uncommitted transaction.

If you’re sole requirement is **read-scaling**, as opposed to HA or DR, then it is actually possible to configure **Availability Groups with no cluster,** from __SQL Server 2017 onward__. In this case, there is **no cluster service** and hence **no automatic redirection**. Replicas within the Availability Groups **__use certificate__** when communicating with each other. This is also true if you configure Availability Groups **without AD, in a workgroup**, __or cross-domain__.

Because AOAG allows you to combine synchronous replicas (with or without automatic failover), asynchronous replicas, and replicas for read-only access, it allows you to satisfy high availability, __disaster recovery,__ and __reporting scale-out requirements__ using a single technology.


The diagram in //Figure 2-6// depicts an AlwaysOn Availability Group topology. In this example, there are four nodes in the cluster and a disk witness. Node1 is hosting the primary replicas of the databases, Node2 is being used for automatic failover, Node3 is being used to offload reporting, and Node4 is being used for DR. Because the cluster is stretched across two data centers, multi-subnet clustering has been implemented. Because there is no shared storage, however, there is no need for SAN replication between the sites.

[[image:SQLAG007.JPG||height="860" width="818"]]


== Automatic Page Repair ==

If a page becomes corrupt in a database configured as a replica in an AlwaysOn Availability Group topology, then SQL Server attempts to fix the corruption by obtaining a copy of the pages from one of the secondary replicas. This means that a logical corruption can be resolved without you needing to perform a restore or for you to run DBCC CHECKDB with a repair option.

Automatic repair

* If the primary replica fails to read a page because it is corrupt, it first logs the page in the MSDB.dbo.suspect_pages table. It then checks that at least one replica is in the SYNCHRONIZED state and that transactions are still being sent to the replica. If these conditions are met, then the __primary sends a broadcast to all replicas__, specifying the PageID and LSN (log sequence number) at the end of the flushed log. The __page is then marked as restore pending__, meaning that any attempts to access it will fail, with error code 829.
* After receiving the broadcast, __the secondary__ replicas __wait, until they have redone transactions up to the LSN __specified in the broadcast message. At this point, they try to access the page. If they cannot access it, they return an error. If they can access the page, __they send the page back to the primary replica__. The primary replica accepts the page from the first secondary to respond.
* The__ primary replica will then replace the corrupt copy of the page__ with the version that it received from the secondary replica. When this process completes, it updates the page in the MSDB.dbo.suspect_pages table to reflect that it has been repaired by setting the event_type column to a value of 5 (**Repaired**).
* __If the secondary replica fails__ to read a page while redoing the log because it is corrupt,__ it places the secondary into the SUSPENDED state__. It then logs the page in the MSDB.dbo.suspect_pages table and requests a copy of the page from the primary replica. The primary replica attempts to access the page. If it is inaccessible, then it returns an error and the secondary replica remains in the SUSPENDED state.
* If it can access the page, then it sends it to the secondary replica that requested it. The __secondary replica replaces the corrupt page__ with the version that it obtained from the primary replica. It then updates the MSDB.dbo.suspect_pages table with an event_id of 5. Finally, it attempts to resume the AOAG session.

It is possible to manually resume the session, but if you do, the corrupt page is hit again during the synchronization. Make sure you repair or restore the page on the primary replica first.

= Log Shipping =

Log shipping is a technology that you can use to implement **disaster recovery**.

* It works by backing up the transaction log on the principle server,
* copying it to the secondary server,
* and then restoring it.

It is most appropriate to use log shipping in **DR scenarios** in which you **require a load delay**, because this is __not possible with AOAG__. As an example of where a load delay may be useful, consider a scenario in which a user accidently deletes all of the data from a table. If there is a delay before the database on the DR server is updated, then it is possible to recover the data for this table, from the DR server, and then repopulate the production server. This means that you do not need to restore a backup to recover the data.

Log shipping is **not appropriate for high availability**, since there is **no automatic failover** functionality. The diagram in Figure 2-7 illustrates a log shipping topology.

[[image:SQLAG008.JPG||height="661" width="739"]]


== Recovery Modes ==

In a log shipping topology, there is always exactly **one principle server**, which is the production server. It is possible to have **multiple secondary servers**, however, and these servers can be a mix of **DR servers** and **servers used to offload reporting**.

When you restore a transaction log, you can specify three recovery modes:

1. Recovery,
1. NoRecovery,
1. and Standby.

The __**Recovery** mode__ brings the database online, which is __not supported with log shipping__. The **NoRecovery** mode keeps the __database offline__ so that more backups can be restored. This is the normal configuration for log shipping and is the __appropriate choice for **DR** scenarios__.

The **Standby** option brings the __database online__, but in a** read-only state** so that you can restore further backups. This functionality works by maintaining a **TUF** (Transaction Undo File). The TUF file records __any uncommitted transactions__ in the transaction log.  This means that you can roll back these uncommitted transactions in the transaction log,  which allows the database to be more accessible (although it is read-only). The next time a restore needs to be applied, you can reapply the uncommitted transaction in the TUF file to the log before the redo phase of the next log restore begins.

[[image:SQLAG009.JPG||height="626" width="830"]]

== Remote Monitor Server ==

__Optionally__, you can configure a **monitor server** in your log shipping topology. This helps you __centralize monitoring and alerting__. When you implement a monitor server, the __history and status of all backup__, __copy, and restore operations are stored on the monitor server__. A monitor server also allows you to have a __single alert job__, which is configured to monitor the backup, copy, and restore operations on all servers, as opposed to it needing separate alerts on each server in the topology.

If you wish to use a **monitor server**, it is __important to configure it when you set up log shipping__. After log shipping has been configured, the only way to add a monitor server is to tear down and reconfigure log shipping.

== Failover ==

Unlike other high availability and disaster recovery technologies, an amount of administrative effort is associated with failing over log shipping. __To fail over log shipping__, you must **back up the tail end of the transaction log** and __copy it, along with any other uncopied backup files__, to the secondary server.
\\You now need __to apply the remaining transaction log backups__ to the secondary server __in sequence__, finishing with the tail-log backup. All of the restores are applied with the __WITH NORECOVERY,__ expect for __the final restore__, which you apply using the __WITH RECOVERY__ option to bring the database back online in a consistent state. If you are not planning to fail back, you can reconfigure log shipping with the secondary server as the new primary server.

= Combining Technologies =

To meet your business objectives and nonfunctional requirements (**NFRs**), you need __to combine multiple high availability__ and __disaster recovery technologies togethe__r to **create a reliable, scalable platform**. A classic example of this is the requirement to combine an //__AlwaysOn Failover Cluster with AlwaysOn Availability Groups__//.
\\The reason you may need to combine these technologies is that when you use __AlwaysOn Availability Groups in synchronous mode__, which you must do __for automatic failover,__ it can **cause a performance impediment**.  As discussed earlier in this chapter, the performance issue is caused by **the transaction being committed on the secondary server before being committed on the primary server**.  //__Clustering does not suffer from this issue__//, however, because **it relies on a shared disk resource**, and therefore the **transaction is only committed once**.
\\Therefore, it is **common practice to first use a cluster** to achieve **high availability** and then use **AlwaysOn Availability Groups to perform DR and/or offload reporting**. The diagram in Figure 2-9 illustrates a HA/DR topology that combines clustering and AOAG to achieve high availability and disaster recovery, respectively.

== Clustering and AlwaysOn Availability Groups Combined ==

The diagram in Figure 2-9 shows that the **primary replica** of the database is hosted on a **two-node active/passive cluster**. If the active node fails, the rules of clustering apply, and the **shared storage**, network name, and IP address are reattached to the passive node, which then becomes the active node. If both nodes are inaccessible, however, the availability group listener points the traffic to the **third node of the cluster**, which is situated in the** DR site** and is synchronized using **log stream replication**. Of course, when **asynchronous mode** is used, the database must be **failed over manually by a DBA**.

[[image:SQLAG0010.JPG||height="695" width="992"]]


== Clustering Combined with Log Shipping ==

Another common scenario is the combination of a cluster and log shipping **to achieve high availability and disaster recovery,** respectively. This combination works in much the same way as clustering combined with AlwaysOn Availability Groups and is illustrated in Figure 2-10.

[[image:SQLAG0011.JPG||height="750" width="1026"]]


The diagram shows that a __two-node active/passive cluster__ has been configured in the primary data center. The __transaction log(s)__ of the database(s) hosted on this instance are then __shipped to a stand-alone server in the DR data center__. Because the cluster uses shared storage, you should also use shared storage for the backup volume and add the backup volume as a resource in the role. This means that when the instance fails over to the other node, the backup share also fails over, and log shipping continues to synchronize, uninterrupted.

If failover occurs while the log shipping backup or copy jobs are in progress, then log shipping may become unsynchronized and require manual intervention. This means that after a failover, you should check the health of your log shipping jobs.