Toggle navigation
Log-in
myITGuide
Page Index
User Index
Application Index
Global
Wiki Index
Home
Blog
Links
Database
SQL Server
Cassandra
BigData
PostgreSQL
Redis
MariaDB
MongoDB
Tools
T-SQL
Powershell
Python
Scala
R
Jupyter
Java
Solution
Streaming
Spark
Storm
Search
Solr
ElasticSearch
Kafka
Puppet
Kibana
Docker
Kubernetes
OS
Windows
Linux
About
About Us
Email Us
SQL Server
SQL Server 2019 Intelligent Performance
Wiki source code of
SQL Server 2019 Intelligent Performance
Last modified by
Billie D
on 2021/02/12 07:30
Show line numbers
{{box cssClass="floatinginfobox" title="**Contents**"}} {{toc /}} {{/box}} = Introduction ...[[image:About Us.WebHome@myitguide_small_01.jpg]] = The idea was that SQL Server is building capabilities that include intelligence into the engine to detect, adapt, and provide insight like never before. Each of these areas contains built-in intelligence in the SQL Server engine to help you get better performance out of your systems, in many cases without any changes at all. In other situations, SQL Server has built-in capabilities to automatically take advantage of new innovations in hardware. This includes the following new enhancements in SQL Server 2019: • Intelligent Query Processing • Lightweight Query Profiling • In-Memory Database • Last-Page Insert Contention = Intelligent Query Processing = In SQL Server 2014, our engineering team made a bold decision to introduce a new set of code for the query processor within the engine that makes decisions for cardinality estimation (CE). The new “CE model” would take effect if a database used a compatibility level of 120 or later (120 is the default for SQL Server 2014). [[link>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]. Thus was born a new feature family of enhancements in SQL Server 2017 called Adaptive Query Processing (AQP). The concept was to build into the query processor the ability to adapt as a query executed (or before it executed again) to provide faster execution, without any user intervention or application changes. [[Examples>>https://github.com/Microsoft/bobsql/tree/master/demos/sqlserver/aqp||target="_blank" rel="noopener noreferrer"]]. [[image:AQP_001.JPG||height="435" width="1072"]] Let’s take a look at each new capability you see in gray in Figure 2-1, with examples of how each works. It is so important to keep in mind as you read through this section that we built these capabilities so you don’t have to know about them. In time, if we have done our job well, Intelligent Query Processing is “just” the query processor, and you as an application developer, DBA, or data professional are just used to an engine that is flexible, intelligent, and adaptive to your workload. You can see all the capabilities of Adaptive Query Processing as part of the new documentation on IQP at [[here>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/intelligent-query-processing?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]. (In all scenarios except for Approximate Count Distinct, you can enable the capabilities of Intelligent Query Processing by __//changing the database compatibility level of the database to 150//__. Approximate Count Distinct is a T-SQL function that is new to SQL Server 2019 and does not require a database compatibility level of 150.) Example here uses large dataset. [[WideWorldImportersDW>>https://docs.microsoft.com/en-us/sql/samples/wide-world-importers-dw-database-catalog?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]] * run the script //__extendwwidw.sql__// to create two large tables ** Extending this database will increase its size, including the transaction log, to about 8Gb overall. One of these tables is called Fact.OrderHistory. Based on the Orders table, we will make this table much larger and not use a columnstore index. We will create another table called Fact.OrderHistoryExtended. This will be based on Fact. OrderHistory but will have even more rows ** Almost all the examples come with two methods *** A set of T-SQL scripts you can use with any tool like SQL Server Management Studio, Azure Data Studio, or sqlcmd *** A T-SQL notebook that requires Azure Data Studio. Take a close look on how to [[run notebooks>>https://docs.microsoft.com/en-us/sql/azure-data-studio/notebooks/notebooks-guidance?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]] with Azure Data Studio *** One example requires a Windows client, as it uses the famous [[**ostress.exe**>>https://www.sqlservercentral.com/blogs/sql-server-stress-testing-using-ostress-%E2%80%93-rml-utilities||target="_blank" rel="noopener noreferrer"]] tool Intelligent Query Processing is all about a smarter query processor meeting the needs of your query workloads without making major application changes. Most of the functionality is available by simply changing your __database compatibility level to **150**__. I look forward to more enhancements in the future as the query processor takes on new scenarios powered by your feedback. == Memory Grant Feedback Row Mode == One of the toughest problems I’ve seen customers face when it comes to performance is problems with memory grants. What is a memory grant? SQL Server allocates memory for all kinds of reasons. When __SQL Server executes a query__, **memory may be used to cache buffers associated with pages belonging to indexes or tables in the query**. In most SQL Server instances that have been up and running, __the buffer pool may be already in allocated memory so bringing in pages doesn’t require additional memory__. Some query operations are intensive and require some type of temporary area to store data. Two such operations are **hash joins** (or even just hash operators) and **sorts**. To perform a //__hash join__//, SQL Server effectively has to //build a mini-table in memory in order to perform the operation//. Any type of data sort can require some type of array or structure to sort data. SQL Server has to have some place to perform these operations so **it allocates memory outside the buffer pool**. The process for allocating this memory by the query execution engine is called a **memory grant**. Sounds simple enough. Here is the problem: **Memory grants** are **based on what the optimizer** knows about the query plan as it is first being executed. And the “what” for these decisions usually comes down to //**cardinality estimation** or unique number of rows for an operation//. If SQL Server thinks a sort operation as part of a query plan will be on data columns that are 100 bytes in total but with an estimated 1 billion rows, it must acquire a memory grant enough to allocate memory to sort that many rows of data of that size. However, what if the memory grants are based on cardinality estimates that are not accurate? Two types of problems can occur: * The memory grant can be **too small** for what is really needed, resulting in the infamous and painful “**tempdb spill**.” SQL Server will not allow a hash join operator or sort to get all the memory it wants. If the memory request is too large (we don’t document what is too large, because we might change it, and wouldn’t want you to rely on it), the current allocated memory must be saved. Saved where? You guessed it… tempdb. __//Think of this like a paging system//__ much like the how operating system pages memory when physical RAM is exhausted. * The memory grant is **too large** for what is really needed. This **could squeeze memory pressure for other parts of the SQL Server engine**, but what is more likely is multiple users run queries that have excessive memory grants, and SQL Server will throttle queries. The result is some users experience bottlenecks on a wait_type called **RESOURCE_SEMAPHORE**. Both of these problems can lead to performance problems. In __SQL Server 2017__, we introduced a concept called __memory grant feedback for **batch mode**__. This feature is a perfect example of adapting. When a query has completed execution, SQL Server knows how much memory was used for a grant vs. what was originally requested. If the memory used was far less than what was granted, why keep asking for too much memory the next time the same cached plan is executed? Same goes for if the memory used was far greater than the requested original grant. Why keep spilling to tempdb for a cached query plan over and over? __Memory grant feedback solves this problem__ by __storing information in the cached query plan__ for what the __correct memory grant should be for future executions__. To the user, it feels like SQL Server healed itself. This f__eature was great for SQL Server 2017__, **but only for batch mode operations**, which meant it **only worked for columnstore index operations**. As you will learn in a later section of this chapter titled “**Batch Mode on Rowstore**,” **SQL Server supports batch mode operations on __more than just columnstore__**. However, why not support memory grant feedback even when batch mode is not used? The result is an adaptive SQL Server engine for memory grant scenarios no matter what type of table or index is being used. Enabling memory grant feedback row mode is as **simple as changing the database compatibility level (dbcompat) to 150**. You can **disable or enable memory grant feedback row mode** even with dbcompat at 150 using the ROW_MODE_MEMORY_GRANT_FEEDBACK option for ALTER DATABASE SCOPED CONFIGURATION. You can also disable this feature at the query level using the DISABLE_ROW_MODE_MEMORY_GRANT_FEEDBACK query option. [[Link>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/intelligent-query-processing?view=sql-server-ver15#row-mode-memory-grant-feedback||target="_blank" rel="noopener noreferrer"]]. {{code language="SQL"}} USE WideWorldImportersDW GO ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF; GO /* To re-enable row mode memory grant feedback for all query executions originating from the database, execute the following within the context of the applicable database: */ USE WideWorldImportersDW GO ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = ON; GO -- Use Query Hint USE AdventureWorks2019 GO SELECT * FROM Person.Address WHERE City = 'SEATTLE' AND PostalCode = 98104 OPTION (USE HINT ('DISABLE_ROW_MODE_MEMORY_GRANT_FEEDBACK')); GO {{/code}} **Spills** are really ugly because this is not data placed into a buffer pool page associated **to tempdb**. Tempdb data files become the paging file for memory grants for hash joins (these are not tempdb pages for temporary tables. This is yet another reason why I often call tempdb the garbage dump of SQL Server). Even with the feedback system, in some cases, the actual needed memory grant can be very large. Large enough that concurrent users will encounter RESOURCE_SEMAPHORE waits to cause memory pressure within SQL Server. In these cases, you can use resource governor to limit the amount of memory for grants. See the documentation [[here>>https://docs.microsoft.com/en-us/sql/t-sql/statements/create-workload-group-transact-sql||target="_blank" rel="noopener noreferrer"]] on how to change this. In SQL Server 2019, this value can now be a float value so values < 1% are valid. This could be important with systems with a large amount of memory. In addition, you can set these values at the query level. See the documentation [[here>>https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15#arguments||target="_blank" rel="noopener noreferrer"]]. This system is well designed and could really help save you time on expensive tuning for workloads requiring memory grants. There are a few scenarios where memory grant feedback will not be enabled or will not take effect: * There is no spill detected, or 50% of the granted memory is used. * There is a fluctuation where the memory grant is being reduced and increased constantly. == Table Variable Deferred Compilation == As long as table variables have been around, they have the inherent problem that the cardinality estimation by the SQL Server optimizer is always one row, no matter how many rows are populated into the table variable, the optimizer doesn’t know how many rows are actually in a table variable, since they are defined and typically populated as part of a batch or stored procedure. See also [[trace flag 2453>>https://docs.microsoft.com/en-ca/archive/blogs/psssql/having-performance-issues-with-table-variables-sql-server-2012-sp2-can-help||target="_blank" rel="noopener noreferrer"]]. The leadership of the Query Processor team had an idea they wanted to implement in SQL Server 2019 as part of Intelligent Query Processing called table variable deferred compilation. **[[Table variable>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/intelligent-query-processing?view=sql-server-ver15#table-variable-deferred-compilation||target="_blank" rel="noopener noreferrer"]]** deferred compilation defers compilation of a statement that references a table variable until the first actual run of the statement. This deferred compilation behavior is the same as that of temporary tables. This change results in the use of actual cardinality instead of the original one-row guess. {{code language="SQL"}} -- Change WideWorldImportersDW to dbcompat 130 to see legacy behavior ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE GO ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 130 GO -- Change WideWorldImportersDW to dbcompat 150 to see legacy behavior ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE GO ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 150 GO {{/code}} == Batch Mode on Rowstore == SQL Server 2012 added a nifty (what an understatement!), now famous capability called Columnstore Indexes, through [[Project Apollo>>https://cloudblogs.microsoft.com/sqlserver/2011/08/04/columnstore-indexes-a-new-feature-in-sql-server-known-as-project-apollo/||target="_blank" rel="noopener noreferrer"]]. As a part of delivering this feature, the query processor was enhanced to use batch mode processing of rows with columnstore indexes. Up to this time, plan operators, like scans, execute and process data based on a single row (and the entire row). Batch mode provides a new paradigm to allow operators to process data based on batches of rows that are organized by column and include vectors to identify qualify rows. This concept aligns very well with columnstore indexes, which are organized by columns, not rows. Any table or index that is not organized with a columnstore index is aptly named a rowstore. In SQL Server 2019, the query processor can automatically detect whether your query qualifies for batch mode processing on a rowstore. Batch mode, again, may not make sense for all queries, so a few basics must apply. * Your query needs to process a large number of rows and involve operations that require aggregates (think **count(∗)** or **sum()**, **joins**, or **sorts**). Batches make sense when there is a flow of data between several operators of a large number of rows to execute a query, the threshold is generally 128K rows * [[Blog>>http://www.queryprocessor.com/batch-mode-on-row-store/||target="_blank" rel="noopener noreferrer"]] post by SQL Community Expert, Dima Pilugin, who debugged the magical 128K number. You can read all the details of batch mode on rowstore, including enabling and disabling this capability [[here>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/intelligent-query-processing?view=sql-server-ver15#batch-mode-on-rowstore||target="_blank" rel="noopener noreferrer"]]. {{code language="SQL"}} -- Disabling batch mode on rowstore ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF; -- Enabling batch mode on rowstore ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON; ------------------------------------------------------------------ -- Clear the procedure cache and change dbcompat to 130 to ensure batch mode for rowstore is not enabled ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE GO ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 130 GO -- Clear the procedure cache and change dbcompat to 150 to enable batch mode ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE GO ALTER DATABASE WideWorldImportersDW SET COMPATIBILITY_LEVEL = 150 GO {{/code}} In other words, batches make sense when there is a flow of data between several operators of a **large number of rows** to execute a query. You will see that **Batch Mode** on __Rowstore__ can have a significant performance difference, especially when dealing with tables of large datasets. Additionally, Batch Mode now works for **both** __columnstore__ (implemented in SQL Server 2017) a__nd rowstore__, __so you shouldn’t need to worry about it__. The query processor knows when to use it and how it can help boost performance of your query. == Scalar UDF Inlining == SQL Server has long had a concept called a __user-defined function__ (**UDF**). The concept is that you build some __T-SQL code inside a FUNCTION__ that takes one or more parameters, and this function returns a value. You could then use the function in any T-SQL SELECT statement. It is a popular way for code reuse like a stored procedure, but a function has the nice property of being part of the SELECT statement. There are two types of user-defined functions: • **Scalar**, which returns a single value • **Table-valued**, which returns a result set in the form of a TABLE type Despite the popularity and programming advantages of UDFs, their use **can lead to performance problems** because of the __limitations in how they are compiled__ and integrated into the overall query plan. Any time a scalar UDF is used to return a value as part of a list of columns, each row that is part of **the table being accessed** is **applied to the code in the UDF** __**one row at a time**__. Now comes along **scalar UDF inlining**. The __query processor__ can __take the code__ (a UDF could have multiple T-SQL statements) and is able **to integrate those statements with the overall query**, hence the term **//inlining//**. * How to enable scalar UDF inlining by using dbcompat in the [[documentation>>https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=sql-server-ver15#enabling-scalar-udf-inlining||target="_blank" rel="noopener noreferrer"]]. * How to disable and enable scalar UDF inlining without changing dbcompat [[here>>https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=sql-server-ver15#disabling-scalar-udf-inlining-without-changing-the-compatibility-level||target="_blank" rel="noopener noreferrer"]]. {{code language="SQL"}} /* You can make workloads automatically eligible for Scalar UDF Inlining by enabling compatibility level 150 for the database. You can set this using Transact-SQL */ ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150; --Disabling Scalar UDF Inlining without changing the compatibility level ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF; -- To re-enable Scalar UDF Inlining for the database, execute the following statement within the context of the applicable database: ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON; {{/code}} Example is loosely based off of examples in the following [[blog post>>https://docs.microsoft.com/en-ca/archive/blogs/sqlserverstorageengine/introducing-scalar-udf-inlining||target="_blank" rel="noopener noreferrer"]] by Microsoft, which also has some really nice details on the previous limits of scalar UDF functions and how Intelligent Query Processing has enabled significant performance improvements. You can see the power of scalar UDF inlining; now you should feel more empowered to use scalar UDFs in your applications. You can read more about scalar UDF inlining, including all the requirements and restrictions [[here>>https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]. == Approximate Count Distinct == There are scenarios where you need to count the number of rows in any table. That’s easy. But there are also situations where __you need to know the number of distinct values of a column__ across all rows of a table. The only problem is how the query processor must do work to figure out what are all the distinct values. For SQL Server, this often requires the use of a Hash Match operator. This operator is similar to a Hash Join, in that a “hash table” is used to build a list of all the distinct values to count. A **Hash Join** requires a **memory grant**, so __all the issues with memory grants can come into play__. Furthermore, it can take a//__ great deal of compute resources to use a hash table to count all the distinct values__//. Is there a better way? Well, there is a different way that could be faster, at the cost of a slightly less precise answer. The solution is a new T-SQL function called **APPROX_COUNT_DISTINCT()**. This is a built-in function that will count __distinct values of a column based on a sample approximation__. This is not an enhancement for the COUNT() function. This is an entirely new function, which is why it does not require dbcompat = 150. This function uses a concept called [[HyperLogLog>>https://en.wikipedia.org/wiki/HyperLogLog||target="_blank" rel="noopener noreferrer"]]. Using an approximation of the count of distinct values comes with a **2% error rate** on a **97% probability**. This means if you can get back an __answer__ that you are __pretty confident will be close to the real truth__, you can use this function. You may wonder why I forced the dbcompat to 130 – to prove to you that __you don’t have to use the latest dbcompat of 150__ to take advantage of this capability. This is because the new T-SQL function [[**APPROX_COUNT_DISTINCT()**>>https://docs.microsoft.com/en-us/sql/t-sql/functions/approx-count-distinct-transact-sql?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]] just comes with the SQL Server 2019 engine but doesn’t require a new dbcompat like other Intelligent Query Processing features. * If you look at the execution plan, notice the **Hash Match operator**. If you move your mouse over that operator, you will see it uses **Row Mode** and __has to process the entire 29M rows__ in the hash operator. * The second query should only take a second or two – about** 50% faster** than before. Again, this could be significant on very large datasets. If you look at the execution plan, it looks similar but with less operators, notice the Hash Match operator doesn’t have a “thick line” as output because the approximation operation is applied with this operator yielding only one row to the rest of the plan. As you can see, the use of approximation for counting distinct values can provide better performance provided you need only a “close enough” value. = Lightweight Query Profiling = SQL Server has amazing diagnostics for performance problems, including //Dynamic Management Views// (**DMVs**) and **Extended Events**. We had built DMVs to be a great mechanism to see what is running at any point in time. This is a great way to find out about active sessions and what queries they are running. But often, to solve a complex performance problem, you need details of the query plan. This team had built the concept already of [[live query stats>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/live-query-statistics?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]], but lightweight profiling allows to do much more . In fact, what the team had built was a __query execution statistics profile infrastructure__, or **standard profiling**. This capability__ gives you actual execution plan statistics__ at the __operator level__ for rows, CPU, and I/O. This is key information to profile a query, but there is a catch. You **must enable this before running the query** or **enable Extended Events **for all queries which can be impactful to production workloads. You can read more about [[**standard profiling**>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/query-profiling-infrastructure?view=sql-server-ver15#the-standard-query-execution-statistics-profiling-infrastructure||target="_blank" rel="noopener noreferrer"]]. “Can we make this better?” They created the __lightweight query execution statistics profiling infrastructure__, or [[**lightweight profiling**>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/query-profiling-infrastructure?view=sql-server-ver15#lwp||target="_blank" rel="noopener noreferrer"]]. The concept is to get profiling for queries **without the overhead** required from standard profiling. However, to make it “light,” we had __to take out collecting__ __CPU statistics__ so you still get __“per operator” rowcount__ and__ I/O statistics__. This is great, but… you still need to turn this on to make it work. The true answer is to **just have** __lightweight profiling running **by default**.__ You __only__ get __rowcount information__ from actively running queries, but often that is enough to help look at performance problems. But there is a bonus. We’ve added the ability to get the __last actual execution plan__ for most cached queries. When **lightweight query profiling** is on by default in SQL Server 2019, row_count is the only statistics captured. It can be expensive to capture statistics like CPU and I/O by default. You can capture these with **standard profiling**. * How do you find out whether this query will finish anytime soon or should be killed and corrected? How built-in, on by default, Lightweight Query Profiling can help give you the answer. ** Example: What does “__No Join Predicate__” mean? It means there is a major problem with the JOIN operator in the query. It means there really is no “equi” join. You will see that this query simply joins a table to itself. A simple typo of si vs. sil is the problem. This query will almost never finish. **Lightweight query profiling** also includes extended events and query hint support to enable it. You can read more about [[how to enable>>https://docs.microsoft.com/en-us/sql/relational-databases/performance/query-profiling-infrastructure?view=sql-server-ver15#lwp||target="_blank" rel="noopener noreferrer"]] these, plus how to disable the feature per database. Lightweight profiling is enabled by default on SQL Server 2019 (15.x) and Azure SQL Database. Starting with SQL Server 2019 (15.x), trace flag 7412 has no effect. Lightweight profiling can be disabled at the database level using the LIGHTWEIGHT_QUERY_PROFILING [[database scoped configuration>>url:https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-database-scoped-configuration-transact-sql?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]: ALTER DATABASE SCOPED CONFIGURATION SET LIGHTWEIGHT_QUERY_PROFILING = OFF;. A new DMF [[sys.dm_exec_query_plan_stats>>url:https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-query-plan-stats-transact-sql?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]] is introduced to return the equivalent of the last known actual execution plan for most queries, and is called //last query plan statistics//. The last query plan statistics can be enabled at the database level using the LAST_QUERY_PLAN_STATS [[database scoped configuration>>url:https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-database-scoped-configuration-transact-sql?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]: ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;. Starting with lightweight profiling v2 and its low overhead, any server that is not already CPU bound can run lightweight profiling **continuously**, and allow database professionals to tap into any running execution at any time, for example using Activity Monitor or directly querying sys.dm_exec_query_profiles, and get the query plan with runtime statistics. Now with the ability to see the actual execution plan at any time vs. having to turn on special flags that may be expensive or run the query offline, this becomes a powerful part of your toolkit for query performance tuning and troubleshooting. Lightweight query profiling is just plain cool! If you have spent anytime supporting production SQL Servers for performance issues, having built-in diagnostics available anytime, anywhere is a breath of fresh air. {{code language="SQL"}} -- What does the estimated plan say? Looks like the right plan based on estimates SELECT st.text, cp.plan_handle, qp.query_plan FROM sys.dm_exec_cached_plans AS cp CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp WHERE qp.dbid = db_id('WideWorldImporters') GO -- What does the last actual plan say? Ooops. Actual vs Estimates way off SELECT st.text, cp.plan_handle, qps.query_plan, qps.* FROM sys.dm_exec_cached_plans AS cp CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS qps WHERE qps.dbid = db_id('WideWorldImporters') GO {{/code}} = In-Memory Database = In SQL Server 2014, we introduced a feature called In-Memory OLTP, which centered on the concept of memory-optimized tables. For this feature, the entire table is stored in memory, but it is the optimized (read: latch-free) access that makes it special. We made significant enhancements to In-Memory OLTP in SQL Server 2016. You can read more details about that [[feature>>https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/in-memory-oltp-in-memory-optimization?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]. For SQL Server 2019 The engineering team met together and collectively decided to call a new suite of features In-Memory Database to add to the capabilities of In-Memory OLTP. Together, the following features have become the In-Memory Database feature suite: You can see the full collection of this feature suite at [[here>>https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-database?view=sql-server-ver15&viewFallbackFrom=sqlallproducts-allversions||target="_blank" rel="noopener noreferrer"]]. • In-Memory OLTP • Memory-Optimized TempDB Metadata • Hybrid Buffer Pool • Persistent Memory Support == Memory-Optimized TempDB Metadata == Even since I’ve been involved with SQL Server, the concurrency performance of workloads using temporary tables has been an issue. This has led for almost every SQL Server administrator to configure tempdb to use multiple files. You can read more about the history of this adventure with these resources: * Inside TempDB talk by Bob Ward from the 2017 PASS Summit ([[link>>https://www.youtube.com/watch?v=SvseGMobe2w||target="_blank" rel="noopener noreferrer"]][[)>>https://www.youtube.com/watch?v=SvseGMobe2w)||target="_blank" rel="noopener noreferrer"]] * Paul Randal’s blog on adding tempdb files ([[link>>https://www.sqlskills.com/blogs/paul/correctly-adding-data-files-tempdb/||target="_blank" rel="noopener noreferrer"]]) One aspect to using tempdb files that most SQL professionals don’t realize (because it is just common culture now) is that you are __creating a partitioning scheme__ for the SQL Server engine **to access allocation pages** such as **PFS**, **GAM**, and **SGAM **pages. This type of scheme is useful because a w__orkload using temporary tables __results in a __heavy create table, allocate pages, drop table cycle__. This creates contention on these system allocation pages in the form of **latches**. **By creating multiple files**, you are __spreading out the contention for latches on these pages__, which increases performance for concurrent tempdb workloads. After creating multiple files (starting with SQL Server 2016, the setup program can do this for you automatically, or you can configure it manually), you may see with a heavier concurrent tempdb-based workload more page latch waits, but these waits are on pages belonging to objects you may not recognize – objects like sysschobjs. These page latch waits are for system table pages in tempdb. When you create and drop tables at a rapid pace, SQL Server must perform internal read/write operations on pages for system tables to keep the metadata of tables consistent. These operations result in page latch pressure across users. In the past when customers would run into high page latch contention on system tables and contact me in support, I would answer, “//Unfortunately you must reduce the load of tempdb usage to avoid this problem.//” Pam Lahoud describes this problem very well in her [[blog>>https://docs.microsoft.com/en-ca/archive/blogs/sql_server_team/tempdb-files-and-trace-flags-and-updates-oh-my||target="_blank" rel="noopener noreferrer"]]. Along comes a solution for **SQL Server 2019**, **__memory-optimized tempdb metadata__**. Memory-optimized tables (remember the famous project Hekaton) are latch-free by their nature, and the data for these tables all exist in memory. If the memory-optimized tables are “schema only,” they don’t even have durability constraints. This is a perfect platform for tempdb system tables. Since tempdb is recreated with each server restart, the system tables don’t need to be durable. And since the only data being stored in memory-optimized tables is metadata, the memory consumption for these should be small. Tempdb metadata does not use memory-optimized tables by default when you install SQL Server. You must run the following T-SQL command to enable this capability. You can see me [[demo>>https://sqlbits.com/Sessions/Event18/Keynote||target="_blank" rel="noopener noreferrer"]] this feature at the 2019 SQLBits keynote. {{code language="SQL"}} ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON {{/code}} You can see the huge benefits of this feature built into the engine. You only turn on a server configuration option, restart SQL Server, and you are ready to go. If you access to catalog views in tempdb, you will see there are a few restrictions when using memory-optimized tempdb metadata. You can read about these restrictions and all the details of this capability. [[Link>>https://docs.microsoft.com/en-us/sql/relational-databases/databases/tempdb-database?view=sql-server-ver15&viewFallbackFrom=sqlallproducts-allversions#memory-optimized-tempdb-metadata||target="_blank" rel="noopener noreferrer"]]. == Hybrid Buffer Pool == Persistent memory devices have been around for a few years but are now starting to become more popular. The concept is memory-based hardware that have persistence through a power source. Think the speed of RAM but any data stored is guaranteed to survive a power restart. One of the more popular persistent memory offerings is by Intel, called [[Optane>>https://www.intel.com/content/www/us/en/architecture-and-technology/optane-technology/optane-for-data-centers.html||target="_blank" rel="noopener noreferrer"]]. Our SQL Server engineering team is always looking to find ways to optimize access to data, and, with persistent memory, there are several opportunities. In fact, SQL Server 2016 included a feature called “[[tail of the log caching>>https://docs.microsoft.com/en-ca/archive/blogs/sqlserverstorageengine/transaction-commit-latency-acceleration-using-storage-class-memory-in-windows-server-2016sql-server-2016-sp1||target="_blank" rel="noopener noreferrer"]]” based on persistent memory. Since persistent memory is in fact memory, SQL Server can access any data stored on a persistent memory device like it is really memory. This means SQL Server can find creative ways to bypass kernel code for I/O processing when accessing data on persistent memory devices. One such new capability is **Hybrid Buffer Pool**. The concept is that if you **place your database data files on a persistent memory device**, SQL Server can simply access pages on the data file from this device without having to copy data from the data file into a buffer pool page. Hybrid buffer pool uses memory-mapped kernel calls to make this a reality. If a database page is modified, it must then be copied into the buffer pool and then eventually written back to the persistent memory device. Performance results vary on the benefits of using hybrid buffer pool, but you can typically expect some boost from this technology, especially on read-heavy workloads. For SQL Server, provided you have placed one or more database files on a persistent memory device, you can enable Hybrid Buffer Pool for all databases for SQL Server with the T-SQL statement: {{code language="SQL"}} -- entire server ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED HYBRID_BUFFER_POOL = ON; -- specific database ALTER DATABASE <databaseName> SET MEMORY_OPTIMIZED = ON {{/code}} When you enable hybrid buffer pool for all databases, you must restart SQL Server. To read more about how to enable your devices for persistent memory for databases, how to disable hybrid buffer pool, and best practices for using hybrid buffer pool, consult the [[documentation>>https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/hybrid-buffer-pool?view=sql-server-ver15||target="_blank" rel="noopener noreferrer"]]. == Persistent Memory Support == If you don’t want to enable hybrid buffer pool but would like SQL Server to take advantage of reading and writing both data and transaction log data to persistent memory devices, you can configure your device as a persistent memory device **on Linux**. SQL Server will automatically detect it and use memory-based operations to move data into SQL Server cache and the device, bypassing the Linux kernel I/O stack. This process is called **enlightenment**. Dell EMC was able to see significant performance improvements with enlightenment as documented [[here>>https://corporate.delltechnologies.com/en-us/newsroom/announcements/2019/04/20190402-01.htm||target="_blank" rel="noopener noreferrer"]]. According to Dell, “With new Intel® Optane™ DC persistent memory, customers can accelerate in-memory databases, virtualization and data analytics workloads with up to 2.5 times more memory capacity for select PowerEdge Servers. You can read all the details about how to enable your persistent memory device on Linux for SQL Server [[here>>https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-pmem?view=sql-server-ver15&viewFallbackFrom=sqlallproducts-allversions||target="_blank" rel="noopener noreferrer"]]. = Last-Page Insert Contention = Here is a common problem for SQL Server users for a very long time. You want to build a table with a __primary key__ that will be used in a __clustered index__. And this primary key is a sequential value. In other words, each insert of a row leads to a new value in an incrementing order. The most common form of this type of key is a column using a SEQUENCE object or an __IDENTITY__ property. While the design works fine in most cases, it presents a challenging problem for application performance. __Each time a query needs to modify a page__, SQL Server must physically __protect other queries from changing or reading the page__ structure at the same time (even with row-level locking) using a **page latch**. If many users were all trying to modify the same page, your __application might suffer in performance__ due to **page latch contention**. If you build a __clustered index on a sequential key__, the data is sorted on that key. Each insert will be trying to insert a new row in the last page of the clustered index leaf level. And if many users are concurrently executing inserts, they could all end up trying to modify the last page of the index, hence the term last-page insert contention. While this contention is not ideal, it normally is not too much of a problem, until a phenomenon called a latch [[**__//convoy//__**>>https://blog.acolyer.org/2019/07/01/the-convoy-phenomenon/||target="_blank" rel="noopener noreferrer"]] occurs. SQL Server and the **last-page insert contention problem**, a **page split **is an example of a scenario where a **convoy** can build up. A page split can easily occur when there are not enough rows on the page for a new INSERT, and a new page must be created in the clustered index. Many users in the SQL community, technical support, and engineering have tackled this problem in many different ways over the years. This support article mentions many of them, [[link>>https://docs.microsoft.com/en-US/troubleshoot/sql/performance/resolve-pagelatch-ex-contention||target="_blank" rel="noopener noreferrer"]]. The solution now exists in the form of a **new option** for **indexes** called **OPTIMIZE_FOR_SEQUENTIAL_KEY**. By adding this option to your index or primary key constraint, you are telling SQL Server to enable new code to try and avoid the convoy problem. This option doesn’t eliminate latches or prevent a latch contention problem. What it does is try to avoid the dreaded convoy problem so that your workload throughput is consistent. You can read more about this option and how to use it in our [[documentation>>https://docs.microsoft.com/en-us/sql/t-sql/statements/create-index-transact-sql?view=sql-server-ver15&viewFallbackFrom=sqlallproducts-allversions#sequential-keys||target="_blank" rel="noopener noreferrer"]]. If you use this option, you may notice a new wait_type called BTREE_INSERT _FLOW_CONTROL. This is normal and part of the mechanism to avoid or reduce the convoy problem. This option is not for everyone. If you don’t use a sequential key for a clustered index or don’t see heavy contention, then I wouldn’t recommend using this option. In fact, you might experience worse performance by blindly applying this to any clustered index. It is possible the techniques as described in [[article>>https://docs.microsoft.com/en-US/troubleshoot/sql/performance/resolve-pagelatch-ex-contention||target="_blank" rel="noopener noreferrer"]] may provide you better performance, but this new method for an index may give you the consistent performance you need and is far less intrusive to your application.