SQL Server 2019 Intelligent Performance

Last modified by Billie D on 2021/02/12 07:30

Introduction ...myitguide_small_01.jpg

The idea was that SQL Server is building capabilities that include intelligence into the engine to detect, adapt, and provide insight like never before.

Each of these areas contains built-in intelligence in the SQL Server engine to help you get better performance out of your systems, in many cases without any changes at all.  In other situations, SQL Server has built-in capabilities to automatically take advantage of new innovations in hardware.

This includes the following new enhancements in SQL Server 2019:
• Intelligent Query Processing
• Lightweight Query Profiling
• In-Memory Database
• Last-Page Insert Contention

Intelligent Query Processing

In SQL Server 2014, our engineering team made a bold decision to introduce a new set of code for the query processor within the engine that makes decisions for cardinality estimation (CE).  The new “CE model” would take effect if a database used a compatibility level of 120 or later (120 is the default for SQL Server 2014). link.   Thus was born a new feature family of enhancements in SQL Server 2017 called Adaptive Query Processing (AQP). The concept was to build into the query processor the ability to adapt as a query executed (or before it executed again) to provide faster execution, without any user intervention or application changes.  Examples.

AQP_001.JPG

Let’s take a look at each new capability you see in gray in Figure 2-1, with examples of how each works.  

It is so important to keep in mind as you read through this section that we built these capabilities so you don’t have to know about them. In time, if we have done our job well, Intelligent Query Processing is “just” the query processor, and you as an application developer, DBA, or data professional are just used to an engine that is flexible, intelligent, and adaptive to your workload. You can see all the capabilities of Adaptive Query Processing as part of the new documentation on IQP at here.  (In all scenarios except for Approximate Count Distinct, you can enable the capabilities of Intelligent Query Processing by changing the database compatibility level of the database to 150. Approximate Count Distinct is a T-SQL function that is new to SQL Server 2019 and does not require a database compatibility level of 150.) 

Example here uses large dataset.  WideWorldImportersDW

  • run the script extendwwidw.sql to create two large tables
    • Extending this database will increase its size, including the transaction log, to about 8Gb overall. One of these tables is called Fact.OrderHistory.  Based on the Orders table, we will make this table much larger and not use a columnstore index.  We will create another table called Fact.OrderHistoryExtended. This will be based on Fact. OrderHistory but will have even more rows
    • Almost all the examples come with two methods
      • A set of T-SQL scripts you can use with any tool like SQL Server Management Studio, Azure Data Studio, or sqlcmd
      • A T-SQL notebook that requires Azure Data Studio. Take a close look on how to run notebooks with Azure Data Studio 
      • One example requires a Windows client, as it uses the famous ostress.exe tool

Intelligent Query Processing is all about a smarter query processor meeting the needs of your query workloads without making major application changes. Most of the functionality is available by simply changing your database compatibility level to 150. I look forward to more enhancements in the future as the query  processor takes on new scenarios powered by your feedback.

Memory Grant Feedback Row Mode

One of the toughest problems I’ve seen customers face when it comes to performance is problems with memory grants. What is a memory grant?

SQL Server allocates memory for all kinds of reasons. When SQL Server executes a query, memory may be used to cache buffers associated with pages belonging to indexes or tables in the query. In most SQL Server instances that have been up and running, the buffer pool may be already in allocated memory so bringing in pages doesn’t require additional memory.

Some query operations are intensive and require some type of temporary area to store data. Two such operations are hash joins (or even just hash operators) and sorts. To perform a hash join, SQL Server effectively has to build a mini-table in memory in
order to perform the operation
. Any type of data sort can require some type of array or structure to sort data. SQL Server has to have some place to perform these operations so it allocates memory outside the buffer pool. The process for allocating this memory by the query execution engine is called a memory grant.

Sounds simple enough. Here is the problem: Memory grants are based on what the optimizer knows about the query plan as it is first being executed. And the “what” for these decisions usually comes down to cardinality estimation or unique number of rows for an operation. If SQL Server thinks a sort operation as part of a query plan will be on data columns that are 100 bytes in total but with an estimated 1 billion rows, it must acquire a memory grant enough to allocate memory to sort that many rows of data of that size.

However, what if the memory grants are based on cardinality estimates that are not accurate? Two types of problems can occur:

  • The memory grant can be too small for what is really needed, resulting in the infamous and painful “tempdb spill.” SQL Server will not allow a hash join operator or sort to get all the memory it wants. If the memory request is too large (we don’t document what is too large, because we might change it, and wouldn’t want you to rely on it), the current allocated memory must be saved.  Saved where? You guessed it… tempdb. Think of this like a paging system much like the how operating system pages memory when physical RAM is exhausted.
     
  • The memory grant is too large for what is really needed. This could  squeeze memory pressure for other parts of the SQL Server engine, but what is more likely is multiple users run queries that have excessive memory grants, and SQL Server will throttle queries. The result is some users experience bottlenecks on a wait_type called RESOURCE_SEMAPHORE. Both of these problems can lead to performance problems.

In SQL Server 2017, we introduced a concept called memory grant feedback for batch mode. This feature is a perfect example of adapting. When a query has completed execution, SQL Server knows how much memory was used for a grant vs. what was originally requested.

If the memory used was far less than what was granted, why keep asking for too much memory the next time the same cached plan is executed? Same goes for if the memory used was far greater than the requested original grant. Why keep spilling to tempdb for a cached query plan over and over? Memory grant feedback solves this problem by storing information in the cached query plan for what the correct memory grant should be for future executions. To the user, it feels like SQL Server healed itself. This feature was great for SQL Server 2017, but only for batch mode operations, which meant it only worked for columnstore index operations. As you will learn in a later section of this chapter titled “Batch Mode on Rowstore,” SQL Server supports batch mode operations on more than just columnstore. However, why not support memory grant feedback even when batch mode is not used? The result is an adaptive SQL Server engine for memory grant scenarios no matter what type of table or index is being used. Enabling memory grant feedback row mode is as simple as changing the database compatibility level (dbcompat) to 150.

You can disable or enable memory grant feedback row mode even with dbcompat at 150 using the ROW_MODE_MEMORY_GRANT_FEEDBACK option for ALTER DATABASE SCOPED CONFIGURATION.  You can also disable this feature at the query
level using the DISABLE_ROW_MODE_MEMORY_GRANT_FEEDBACK query option.  Link.

USE WideWorldImportersDW
GO
ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;
GO

/* To re-enable row mode memory grant feedback for all query executions originating from the database, execute the following within the context of the applicable database: */

USE WideWorldImportersDW
GO
ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = ON;
GO

-- Use Query Hint
USE AdventureWorks2019
GO
SELECT * FROM Person.Address   
WHERE City = 'SEATTLE' AND PostalCode = 98104
OPTION (USE HINT ('DISABLE_ROW_MODE_MEMORY_GRANT_FEEDBACK'));
GO

Spills are really ugly because this is not data placed into a buffer pool page associated to tempdb. Tempdb data files become the paging file for memory grants for hash joins (these are not tempdb pages for temporary tables. This is yet another reason why I often call tempdb the garbage dump of SQL Server).

Even with the feedback system, in some cases, the actual needed memory grant can be very large. Large enough that concurrent users will encounter RESOURCE_SEMAPHORE waits to cause memory pressure within SQL Server.  In these cases, you can use resource governor to limit the amount of memory for grants. See the documentation here on how to change this. In SQL Server 2019, this value can now be a float value so values < 1% are valid. This could be important with systems with a large
amount of memory. In addition, you can set these values at the query level. See the documentation here.

This system is well designed and could really help save you time on expensive tuning for workloads requiring memory grants. There are a few scenarios where memory grant feedback will not be enabled or will not take effect:

  • There is no spill detected, or 50% of the granted memory is used.
  • There is a fluctuation where the memory grant is being reduced and increased constantly.

Table Variable Deferred Compilation

As long as table variables have been around, they have the inherent problem that the cardinality estimation by the SQL Server optimizer is always one row, no matter how many rows are populated into the table variable, the optimizer doesn’t know how many rows are actually in a table variable, since they are defined and typically populated as part of a batch or stored procedure.  See also trace flag 2453.

The leadership of the Query Processor team had an idea they wanted to implement in SQL Server 2019 as part of Intelligent Query Processing called table variable deferred compilation.

Table variable deferred  compilation defers compilation of a statement that references a table variable until the first actual run of the statement. This deferred compilation behavior is the same as that of temporary tables. This change results in the use of actual cardinality instead of the original one-row guess.    

-- Change WideWorldImportersDW to dbcompat 130 to see legacy behavior
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE
GO
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 130
GO



-- Change WideWorldImportersDW to dbcompat 150 to see legacy behavior
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE
GO
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 150
GO

Batch Mode on Rowstore

SQL Server 2012 added a nifty (what an understatement!), now famous capability called Columnstore Indexes, through Project Apollo

As a part of delivering this feature, the query processor was enhanced to use batch mode processing of rows with columnstore indexes.  Up to this time, plan operators, like scans, execute and process data based on a single row (and the entire row). Batch mode provides a new paradigm to allow operators to process data based on batches of rows that are organized by column and include vectors to identify qualify rows. This concept aligns very well with columnstore indexes, which are organized by columns, not rows.  Any table or index that is not organized with a columnstore index is aptly named a rowstore.

In SQL Server 2019, the query processor can automatically detect whether your query qualifies for batch mode processing on a rowstore.

Batch mode, again, may not make sense for all queries, so a few basics must apply. 

  • Your query needs to process a large number of rows and involve operations that require aggregates (think count(∗) or sum(), joins, or sorts).  Batches make sense when there is a flow of data between several operators of a large number of rows to execute a query, the threshold is generally 128K rows
  • Blog post by SQL Community Expert, Dima Pilugin, who debugged the magical 128K number.

You can read all the details of batch mode on rowstore, including enabling and disabling this capability here.

-- Disabling batch mode on rowstore
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;

-- Enabling batch mode on rowstore
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON;

------------------------------------------------------------------
-- Clear the procedure cache and change dbcompat to 130 to ensure batch mode for rowstore is not enabled
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE
GO
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 130
GO


-- Clear the procedure cache and change dbcompat to 150 to enable batch mode
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE
GO
ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 150
GO

In other words, batches make sense when there is a flow of data between several operators of a large number of rows to execute a query.
 

You will see that Batch Mode on Rowstore can have a significant performance difference, especially when dealing with tables of large datasets. Additionally, Batch Mode now works for both columnstore (implemented in SQL Server 2017) and rowstore, so you shouldn’t need to worry about it. The query processor knows when to use it and how it can help boost performance of your query.
 

Scalar UDF Inlining

SQL Server has long had a concept called a user-defined function (UDF). The concept is that you build some T-SQL code inside a FUNCTION that takes one or more parameters, and this function returns a value. You could then use the function in any T-SQL SELECT statement. It is a popular way for code reuse like a stored procedure, but a function has the nice property of being part of the SELECT statement.

There are two types of user-defined functions:
Scalar, which returns a single value
Table-valued, which returns a result set in the form of a TABLE type

Despite the popularity and programming advantages of UDFs, their use can lead to performance problems because of the limitations in how they are compiled and integrated into the overall query plan. 

Any time a scalar UDF is used to return a value as part of a list of columns, each row that is part of the table being accessed is applied to the code in the UDF one row at a time.
 

Now comes along scalar UDF inlining. The query processor can take the code (a UDF could have multiple T-SQL statements) and is able to integrate those statements with the overall query, hence the term inlining.  

  • How to enable scalar UDF inlining by using dbcompat in the documentation.  
  • How to disable and enable scalar UDF inlining without changing dbcompat here.
/* You can make workloads automatically eligible for Scalar UDF Inlining by enabling compatibility level 150 for the database. You can set this using Transact-SQL */
ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150;

--Disabling Scalar UDF Inlining without changing the compatibility level
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;

-- To re-enable Scalar UDF Inlining for the database, execute the following statement within the context of the applicable database:
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

Example is loosely based off of examples in the following blog post by Microsoft, which also has some really nice details on the previous limits of scalar UDF functions and how Intelligent Query Processing has enabled significant
performance improvements.

You can see the power of scalar UDF inlining; now you should feel more empowered to use scalar UDFs in your applications. You can read more about scalar UDF inlining, including all the requirements and restrictions here.

Approximate Count Distinct

There are scenarios where you need to count the number of rows in any table. That’s  easy. But there are also situations where you need to know the number of distinct values of a column across all rows of a table.

The only problem is how the query processor must do work to figure out what are all the distinct values.  For SQL Server, this often requires the use of a Hash Match operator. This operator is similar to a Hash Join, in that a “hash table” is used to build a list of all the distinct values
to count.  A Hash Join requires a memory grant, so all the issues with memory grants can come into play. Furthermore, it can take a great deal of compute resources to use a hash table to count all the distinct values.
 

Is there a better way? Well, there is a different way that could be faster, at the cost of a slightly less precise answer. The solution is a new T-SQL function called APPROX_COUNT_DISTINCT(). This is a built-in function that will count distinct values of a column based on a sample approximation. This is not an enhancement for the COUNT() function. This is an entirely new function, which is why it does not require dbcompat = 150. This function uses a concept called HyperLogLog.

Using an approximation of the count of distinct values comes with a 2% error rate on a 97% probability. This means if you can get back an answer that you are pretty confident will be close to the real truth, you can use this function.

You may wonder why I forced the dbcompat to 130 – to prove to you that you don’t have to use the latest dbcompat of 150 to take advantage of this capability. This is because the new T-SQL function APPROX_COUNT_DISTINCT() just comes with the SQL Server 2019 engine but doesn’t require a new dbcompat like other Intelligent Query Processing features.  

  • If you look at the execution plan, notice the Hash Match operator. If you move your mouse over that operator, you will see it uses Row Mode and has to process the entire 29M rows in the hash operator.
  • The second query should only take a second or two – about 50% faster than before. Again, this could be significant on very large datasets. If you look at the execution plan, it looks similar but with less operators, notice the Hash Match operator doesn’t have a “thick line” as
    output because the approximation operation is applied with this operator yielding only one row to the rest of the plan.  As you can see, the use of approximation for counting distinct values can provide better performance provided you need only a “close enough” value.

Lightweight Query Profiling

SQL Server has amazing diagnostics for performance problems, including Dynamic Management Views (DMVs) and Extended Events. We had built DMVs to be a great mechanism to see what is running at any point in time. This is a great way to find out about active sessions and what queries they are running. But often, to solve a complex performance problem, you need details of the query plan.

This team had built the concept already of live query stats, but lightweight profiling allows to do much more .  In fact, what the team had built was a query execution statistics profile infrastructure, or standard profiling. This capability gives you actual execution plan statistics at the
operator level for rows, CPU, and I/O. This is key information to profile a query, but there is a catch. You must enable this before running the query or enable Extended Events  for all queries which can be impactful to production workloads. You can read more
about standard profiling.

“Can we make this better?”    They created the lightweight query execution statistics profiling infrastructure, or lightweight profiling. The concept is to get profiling for queries without the overhead required from standard profiling.  However, to make it “light,” we had to take out collecting CPU statistics so you still get “per operator” rowcount and I/O statistics.  This is great, but… you still need to turn this on to make it work.   The true answer is to just have lightweight profiling running by default.  You only get rowcount information from actively running queries, but
often that is enough to help look at performance problems. But there is a bonus. We’ve added the ability to get the last actual execution plan for most cached queries.

When lightweight query profiling is on by default in SQL Server 2019, row_count is the only statistics captured. It can be expensive to capture statistics like CPU and I/O by default. You can capture these with standard profiling.

  • How do you find out whether this query will finish anytime soon or should be killed and corrected?  How built-in, on by default, Lightweight Query Profiling can help give you the answer.
    • Example:  What does “No Join Predicate” mean? It means there is a major problem with the JOIN operator in the query. It means there really is no “equi” join.   You will see that this query simply joins a table to itself. A simple
      typo of si vs. sil is the problem. This query will almost never finish.

Lightweight query profiling also includes extended events and query hint support to enable it. You can read more about how to enable these, plus how to disable the feature per database.

Lightweight profiling is enabled by default on SQL Server 2019 (15.x) and Azure SQL Database. Starting with SQL Server 2019 (15.x), trace flag 7412 has no effect. Lightweight profiling can be disabled at the database level using the LIGHTWEIGHT_QUERY_PROFILING database scoped configuration: ALTER DATABASE SCOPED CONFIGURATION SET LIGHTWEIGHT_QUERY_PROFILING = OFF;.

A new DMF sys.dm_exec_query_plan_stats is introduced to return the equivalent of the last known actual execution plan for most queries, and is called last query plan statistics. The last query plan statistics can be enabled at the database level using the LAST_QUERY_PLAN_STATS database scoped configuration: ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;.

Starting with lightweight profiling v2 and its low overhead, any server that is not already CPU bound can run lightweight profiling continuously, and allow database professionals to tap into any running execution at any time, for example using Activity Monitor or directly querying sys.dm_exec_query_profiles, and get the query plan with runtime statistics.

Now with the ability to see the actual execution plan at any time vs. having to turn on special flags that may be expensive or run the query offline, this becomes a powerful part of your toolkit for query performance tuning and troubleshooting. Lightweight query profiling is just plain cool! If you have spent anytime supporting production SQL Servers for performance issues, having built-in diagnostics available anytime, anywhere is a breath of fresh air.

-- What does the estimated plan say? Looks like the right plan based on estimates
SELECT st.text, cp.plan_handle, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.dbid = db_id('WideWorldImporters')
GO

-- What does the last actual plan say? Ooops. Actual vs Estimates way off
SELECT st.text, cp.plan_handle, qps.query_plan, qps.*
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS qps
WHERE qps.dbid = db_id('WideWorldImporters')
GO

In-Memory Database

In SQL Server 2014, we introduced a feature called In-Memory OLTP, which centered on the concept of memory-optimized tables. For this feature, the entire table is stored in memory, but it is the optimized (read: latch-free) access that makes it special. We made significant enhancements to In-Memory OLTP in SQL Server 2016. You can read more details about that feature.
 

For SQL Server 2019

The engineering team met together and collectively decided to call a new suite of features In-Memory Database to add to the capabilities of In-Memory OLTP.  Together, the following features have become the In-Memory Database feature suite:  You can see the full collection of this feature suite at here.
• In-Memory OLTP
• Memory-Optimized TempDB Metadata
• Hybrid Buffer Pool
• Persistent Memory Support

Memory-Optimized TempDB Metadata

Even since I’ve been involved with SQL Server, the concurrency performance of workloads using temporary tables has been an issue. This has led for almost every SQL Server administrator to configure tempdb to use multiple files. You can read more about the history of this adventure with these resources:

  • Inside TempDB talk by Bob Ward from the 2017 PASS Summit (link)
  • Paul Randal’s blog on adding tempdb files (link)

One aspect to using tempdb files that most SQL professionals don’t realize (because it is just common culture now) is that you are creating a partitioning scheme for the SQL Server engine to access allocation pages such as PFS, GAM, and SGAM pages. This type of scheme is useful because a workload using temporary tables results in a heavy create table, allocate pages, drop table cycle. This creates contention on these system allocation pages in the form of latches. By creating multiple files, you are spreading out the contention for latches on these pages, which increases performance for concurrent tempdb workloads.

After creating multiple files (starting with SQL Server 2016, the setup program can do this for you automatically, or you can configure it manually), you may see with a heavier concurrent tempdb-based workload more page latch waits, but these waits are on pages belonging to objects you may not recognize – objects like sysschobjs. These page latch waits are for system table pages in tempdb. When you create and drop tables at a rapid pace, SQL Server must perform internal read/write operations on pages for system tables to keep the metadata of tables consistent. These operations result in page latch pressure across users. In the past when customers would run into high page latch contention on system tables and contact me in support, I would answer, “Unfortunately you must reduce the load of tempdb usage to avoid this problem.”  Pam Lahoud describes this problem very well in her blog.

Along comes a solution for SQL Server 2019, memory-optimized tempdb metadata.  Memory-optimized tables (remember the famous project Hekaton) are latch-free by their nature, and the data for these tables all exist in memory. If the memory-optimized tables are “schema only,” they don’t even have durability constraints. This is a perfect platform for tempdb system tables. Since tempdb is recreated with each server restart, the system tables don’t need to be durable. And since the only data being stored in memory-optimized tables is metadata, the
memory consumption for these should be small. 

Tempdb metadata does not use memory-optimized tables by default when you install SQL Server. You must run the following T-SQL command to enable this capability.   You can see me demo this feature at the 2019 SQLBits keynote.

ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON

You can see the huge benefits of this feature built into the engine. You only turn on a server configuration option, restart SQL Server, and you are ready to go. If you access to catalog views in tempdb, you will see there are a few restrictions when using memory-optimized tempdb metadata. You can read about these restrictions and all the details of this capability.  Link.

Hybrid Buffer Pool

Persistent memory devices have been around for a few years but are now starting to become more popular. The concept is memory-based hardware that have persistence through a power source. Think the speed of RAM but any data stored is guaranteed to survive a power restart. One of the more popular persistent memory offerings is by Intel, called Optane.

Our SQL Server engineering team is always looking to find ways to optimize access to data, and, with persistent memory, there are several opportunities. In fact, SQL Server 2016 included a feature called “tail of the log caching” based on persistent memory.

Since persistent memory is in fact memory, SQL Server can access any data stored on a persistent memory device like it is really memory. This means SQL Server can find creative ways to bypass kernel code for I/O processing when accessing data on persistent memory devices.

One such new capability is Hybrid Buffer Pool. The concept is that if you place your database data files on a persistent memory device, SQL Server can simply access pages on the data file from this device without having to copy data from the data file into a buffer pool page. Hybrid buffer pool uses memory-mapped kernel calls to make this a reality. If a database page is modified, it must then be copied into the buffer pool and then eventually written back to the persistent memory device. Performance results vary on the benefits of using hybrid buffer pool, but you can
typically expect some boost from this technology, especially on read-heavy workloads. For SQL Server, provided you have placed one or more database files on a persistent memory device, you can enable Hybrid Buffer Pool for all databases for SQL Server with the T-SQL statement:

-- entire server
ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED HYBRID_BUFFER_POOL = ON;

-- specific database
ALTER DATABASE <databaseName> SET MEMORY_OPTIMIZED = ON

When you enable hybrid buffer pool for all databases, you must restart SQL Server.

To read more about how to enable your devices for persistent memory for databases, how to disable hybrid buffer pool, and best practices for using hybrid buffer pool, consult the documentation.

Persistent Memory Support

If you don’t want to enable hybrid buffer pool but would like SQL Server to take advantage of reading and writing both data and transaction log data to persistent memory devices, you can configure your device as a persistent memory device on Linux.  SQL Server will automatically detect it and use memory-based operations to move data into SQL Server cache and the device, bypassing the Linux kernel I/O stack. This process is called enlightenment.  Dell EMC was able to see significant performance improvements with enlightenment as documented here.  According
to Dell, “With new Intel® Optane™ DC persistent memory, customers can accelerate in-memory databases, virtualization and data analytics workloads with up to 2.5 times more memory capacity for select PowerEdge Servers.

You can read all the details about how to enable your persistent memory device on Linux for SQL Server here.  

Last-Page Insert Contention

Here is a common problem for SQL Server users for a very long time. You want to build a table with a primary key that will be used in a clustered index. And this primary key is a sequential value. In other words, each insert of a row leads to a new value in an incrementing order. The most common form of this type of key is a column using a SEQUENCE object or an IDENTITY property.

While the design works fine in most cases, it presents a challenging problem for application performance. Each time a query needs to modify a page, SQL Server must physically protect other queries from changing or reading the page structure at the same time (even with row-level locking) using a page latch.

If many users were all trying to modify the same page, your application might suffer in performance due to page latch contention. If you build a clustered index on a sequential key, the data is sorted on that key. Each insert will be trying to insert a new row in the last page of the clustered index leaf level. And if many users are concurrently executing inserts, they could all end up trying to modify the last page of the index, hence the term last-page insert contention.

While this contention is not ideal, it normally is not too much of a problem, until a phenomenon called a latch convoy occurs.   SQL Server and the last-page insert contention problem, a page split is an example of a scenario where a convoy can build up. A page split can easily occur when there are  not enough rows on the page for a new INSERT, and a new page must be created in the clustered index.

Many users in the SQL community, technical support, and engineering have tackled this problem in many different ways over the years. This support article mentions many of them, link.

The solution now exists in the form of a new option for indexes called OPTIMIZE_FOR_SEQUENTIAL_KEY. By adding this option to your index or primary key constraint, you are telling SQL Server to enable new code to try and avoid the convoy problem. This option doesn’t eliminate latches or prevent a latch contention problem. What it does is try to avoid the dreaded convoy problem so that your workload throughput is consistent. You can read more about this option and how to use it in our documentation.  If you use this option, you may notice a new wait_type called BTREE_INSERT _FLOW_CONTROL. This is normal and part of the mechanism to avoid or reduce the convoy problem.

This option is not for everyone. If you don’t use a sequential key for a clustered index or don’t see heavy contention, then I wouldn’t recommend using this option. In fact, you might experience worse performance by blindly applying this to any clustered index.

It is possible the techniques as described in article may provide you better performance, but this new method for an index may give you the consistent performance you need and is far less intrusive to your application.

 

Created by Billie D on 2021/02/07 08:29