Monitoring AlwaysOn Availability Groups
- Contents
Once you have implemented Availability Groups, you need to monitor them and respond to any errors or warnings that could affect the availability of your data.If you have many availability groups implemented throughout the enterprise, use an enterprise monitoring tool, such as
- SOC (Systems Operations Center).
If you only have a small number of availability groups, however,
- then SQL Server provides the AlwaysOn Dashboard
- and the AlwaysOn Health Trace.
- You can also create your own Extended Events sessions to monitor Availability Groups.
AlwaysOn Dashboard
The AlwaysOn Dashboard is an interactive report that allows you to view the health of your AlwaysOn environment and drill through, or roll up elements within the topology.
- see that currently synchronization is in a healthy state
- add the Estimated Recovery Time and Estimated Data Loss measures.
- Estimated Data Loss - gauge impact in the event of an incident
- three possible synchronization states
- SYNCHRONIZED,
- SYNCHRONIZING,
- and NOT SYNCHRONIZING
- A synchronous replica should be in the SYNCHRONIZED state, and any other state is unhealthy.
- An asynchronous replica, however, will never be in the SYNCHRONIZED state, state of SYNCHRONIZING is considered healthy.
- Regardless of the mode, NOT SYNCHRONIZING indicates that the replica is not connected.
- PENDING_FAILOVER,
- PENDING,
- ONLINE,
- OFFLINE,
- FAILED,
- FAILED_NO_QUORUM, and
- NULL (when the replica is disconnected).
- The operational state of a replica can be viewed using the sys.dm_hadr_availability_replica_states DMV
- At the top right of the report, there are links to the failover wizard
- Cluster Quorum Information screen
AlwaysOn Health Trace
The AlwaysOn Health Trace is an Extended Events session, which is created when you create you first availability group. Drilling through the session exposes the session’s package, and from the context menu of the package, you can view previously captured events
Right-clicking a column header in the top pane of the window will expose a context menu, which allows you to search for a text or a value in a specific column, group by the values within a column or sort the results sets by a specific column. You can also use the context menu to add or remove columns from the result set.
Monitoring AlwaysOn with Extended Events
Extended Events are a lightweight monitoring system in SQL Server, which capture events using WMI. Because the architecture uses so few system resources, they scale very well and allow you to monitor instances, with minimal impact on user activity. They are also highly configurable, giving you a wide range of options for capturing details from a very fine grain, such as page splits, to courser-grain information, such as CPU utilization. You can also correlate Extended Events with operating system data to provide a holistic picture when troubleshooting issues. The predecessor to Extended Events was SQL Trace and its GUI, called Profiler, which is deprecated.
Extended Events Concepts
Extended Events have a rich architecture, which consists of the following. These artifacts are stored within a Package, which is, in turn, stored within a module, which can be either a .dll or an executable.
- Events,
- Targets,
- Actions,
- Predicates,
- Types,
- Maps,
- and Sessions.
Packages
A package is a container for the objects used within Extended Events. Here are the four types of SQL Server package:
- Package0 – The default package, used for Extended Events system objects.
- Sqlserver – Used for SQL Server–related objects.
- Sqlos – Used for SQLOS-related objects.
- SecAudit – Used by SQL Audit; however, its objects are not exposed.
Events
An event is an occurrence of interest that you can trace. It may be a SQL batch completing, a cache miss, or a page split, or virtually anything else that can happen within the Database Engine, depending on the nature of the trace that you are configuring. Each event is categorized by channel and keyword (also known as category). A channel is a high-level categorization, and all events in SQL Server 2016 fall into one of the channels
- Admin - Well-known events with well-known resolutions. For example, deadlocks, server starts, CPU thresholds being exceeded, and the use of deprecated features.
- Operational - Used for troubleshooting issues. For example, bad memory being detected an AlwaysOn Availability Group replica changing its state, and a long IO being detected are all events that fall within the Operational channel.
- Analytic - High-volume events that you can use for troubleshooting issues such as performance. For example, a transaction beginning, a lock being acquired, and a file read completing are all events that fall within the Analytic channel.
- Debug - Used by developers to diagnose issues by returning internal data. The events in the Debug channel are subject to change in future versions of SQL Server, so you should avoid them when possible.
SQL Server exposes 122 events relating to AlwaysOn.
Targets
A target is the consumer of the events; essentially, it is the device to which the trace data will be written.
Actions
Also known as Global Fields, Actions are commands that allow additional information to be captured when an event fires. An action is fired synchronously when an event occurs and the event is unaware of the action. There are 50 actions available that allow you to capture a rich array of information, including the statement that caused the event to fire, the security context under which the statement ran, the transaction ID, the CPU ID, and the call stack.
Predicates
Predicates are filter conditions that you can apply before the system sends events to the target. It is possible to create simple predicates, such as filtering statements completing based on a database ID, but you can also create more complex predicates, such as only capturing the role change of an AlwaysOn Availability Group replica if it happens more than twice.
Predicates also fully support short-circuiting. This means that if you use multiple conditions within a predicate, then the order of predicates is important, because if the evaluation of the first predicate fails, the second predicate will not be evaluated. Because predicates are evaluated synchronously, this can have an impact on performance. Therefore, it is sensible to design your predicates, so that predicates which are least likely to evaluate to true are placed before predicates that are very likely to evaluate to true. For example, imagine that you are planning to filter on a specific database (with a database ID of 6), but this database accounts for a high percentage of the activity on the instance. You also plan to filter on a specific user ID (UserA), which is responsible for a low percentage of the activity. In this scenario, you would use the WHERE (([sqlserver].[username]='UserA') AND ([sqlserver].[database_id]=(6))) predicate to first filter out activity that does not relate to UserA, before then filtering out activity that does not relate to database ID 6. Types All objects within a package are assigned a type. This type is used to interpret the data stored within the byte collection of an object. Objects are assigned one of the following types:
- Action
- Event
- Pred_compare (retrieve data from events)
- Pred_source (compare data types)
- Target
- Type
Maps
A map is a dictionary that maps internal ID vales to strings that DBAs can understand. Map keys are only unique within their context and are repeated between contexts. For example, within the statement_recompile_cause context, a map_key of 1 relates to a map_value of Schema Changed. Within the context of a database_sql_statement type, however, a map_key of 1 relates to a map_value of CREATE DATABASE. You can find a
complete list of mappings by using the sys.dm_xe_map_values DMV.Sessions
A session is essentially a trace. It can contain events from multiple packages, actions, targets, and predicates. When you start or stop a session, you are turning the trace on or off. When a session starts, events are written to memory buffers and have predicates applied before they are sent to the target. Therefore, when creating a session, you need to configure properties, such as how much memory the session can use for buffering, what
events can be dropped if the session experiences memory pressure, and the maximum latency before the events are sent to the target.Creating an Event Session to Monitor Availability Group
You can create an event session using either the New Session Wizard, the New Session Dialog Box, or via T-SQL. To create a session using the New Session Wizard, drill through Management ➤ Extended Events in Object Explorer, and select New Session Wizard from the context menu of Sessions. This will cause the Introduction page of the New Session Wizard to be displayed. After passing through the Introduction page, you will find the Set Session Properties page. Here, you can configure a name for the Session and also specify if the Session should automatically be started on creation.