SQL Server Architecture
- Contents
DATABASE TRANSACTIONS
A transaction is a unit of work in a database that must complete all of its commands in their entirety, or none of them (atomicity).
ACID Properties
- Atomicity
- Atomicity means that all the effects of the transaction must complete successfully or the changes are rolled back.
- Consistency
- The consistency requirement ensures that the transaction cannot break the integrity rules of the database; it must leave the database in a consistent state.
- Isolation
- Each transaction must be entirely self-contained, and changes it makes must not be readable by any other transaction. SQL Server lets you control the degree of isolation in order to find a balance between business and performance requirements.
- Durability
- Once a transaction is committed, it must persist even if there is a system failure — that is, it must be durable.
SQL Server Transactions
- Implicit transactions are used automatically by SQL Server to guarantee the ACID properties of single commands.
- Explicit transactions are started by using the BEGIN TRANSACTION T-SQL command and are stopped by using the COMMIT TRANSACTION or ROLLBACK TRANSACTION commands.
- Explicit transactions are used to group together changes to which you want to apply the ACID properties as a whole, which also enables you to roll back the changes at any point if your business logic determines that you should cancel the change.
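An explicit transaction can be sketched as follows; the table name and the TRY/CATCH error-handling pattern are illustrative assumptions, not part of the notes above:

```sql
BEGIN TRANSACTION;

BEGIN TRY
    -- Both statements succeed or both are rolled back (atomicity)
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- An error (or your business logic) determined the change should be cancelled
    ROLLBACK TRANSACTION;
END CATCH;
```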
The Life Cycle of a Query
SQL Server is divided into two main engines: the Relational Engine and the Storage Engine.
- The Relational Engine is also sometimes called the query processor because its primary function is query optimization and execution. It contains a Command Parser to check query syntax and prepare query trees; a Query Optimizer that is arguably the crown jewel of any database system; and a Query Executor responsible for execution.
- The Storage Engine is responsible for managing all I/O to the data, and it contains the Access Methods code, which handles I/O requests for rows, indexes, pages, allocations and row versions; and a Buffer Manager, which deals with SQL Server’s main memory consumer, the buffer pool. It also contains a Transaction Manager, which handles the locking of data to maintain isolation (ACID properties) and manages the transaction log.
The Buffer Pool
The buffer pool is the largest consumer of memory in SQL Server. It contains all the different caches in SQL Server, including the plan cache and the data cache.
SQL Server Network Interface (SNI)
- SNI is the protocol layer that establishes the network connection between the client and the server.
- It consists of a set of APIs that are used by both the database engine and the SQL Server Native Client (SNAC).
- Once the connection is established, SNI creates a secure connection to a TDS endpoint (described next) on the server, which is then used to send requests and receive data.
- SNI replaces the net-libraries found in SQL Server 2000 and the Microsoft Data Access Components (MDAC), which are included with Windows.
- SNI isn’t configurable directly; you just need to configure a network protocol on the client and the server.
- Protocol SQL Server supports
- Shared Memory
- Simple and fast, shared memory is the default protocol used to connect from a client running on the same computer as SQL Server. It can only be used locally, has no configurable properties, and is always tried first when connecting from the local machine.
- TCP/IP
- This is the most commonly used access protocol for SQL Server. It enables you to connect to SQL Server by specifying an IP address and a port number (TCP port 1433 and UDP port 1434 by default).
- Your internal name resolution system resolves the hostname part of the instance name to an IP address.
- You connect to the default TCP port 1433 for a default instance, or the SQL Server Browser service finds the right port for a named instance using UDP port 1434.
- Named Pipes
- Named Pipes was developed for local area networks (LANs), but it can be inefficient across slower networks such as wide area networks (WANs). It uses TCP port 445.
- You create a SQL Server alias that connects to the server using Named Pipes.
- VIA
- Virtual Interface Adapter is a protocol that enables high-performance communications between two systems. It requires specialized hardware at both ends and a dedicated connection.
- You create a SQL Server alias that connects to the server using VIA.
- While SQL Server 2012 still supports the VIA protocol, it will be removed from a future version so new installations using this protocol should be avoided.
Tabular Data Stream (TDS) Endpoints
- TDS is a Microsoft-proprietary protocol originally designed by Sybase that is used to interact with a database server.
- Once a connection has been made using a network protocol such as TCP/IP, a link is established to the relevant TDS endpoint that then acts as the communication point between the client and the server.
- There is one TDS endpoint for each network protocol and an additional one reserved for use by the dedicated administrator connection (DAC).
- Once connectivity is established, TDS messages are used to communicate between the client and the server
- The SELECT statement is sent to SQL Server as a TDS message across a TCP/IP connection (TCP/IP is the default protocol).
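As a sketch, you can see the TDS endpoints configured on an instance (one per network protocol, plus the one reserved for the DAC) by querying the endpoints catalog view:

```sql
-- One row per endpoint, including the Dedicated Admin Connection (DAC)
SELECT name, protocol_desc, type_desc, state_desc
FROM sys.endpoints;
```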
Protocol Layer
- When the protocol layer in SQL Server receives your TDS packet, it has to reverse the work of the SNI at the client and unwrap the packet to find out what request it contains.
- The protocol layer is also responsible for packaging results and status messages to send back to the client as TDS messages.
Command Parser
- The Command Parser’s role is to handle T-SQL language events.
- It first checks the syntax and returns any errors back to the protocol layer to send to the client.
- If the syntax is valid, then the next step is to generate a query plan or find an existing plan.
- A query plan contains the details about how SQL Server is going to execute a piece of code. It is commonly referred to as an execution plan.
- To check for a query plan, the Command Parser generates a hash of the T-SQL and checks it against the plan cache to determine whether a suitable plan already exists.
- The plan cache is an area in the buffer pool used to cache query plans.
Plan Cache
- The plan cache, part of SQL Server’s buffer pool, is used to store execution plans in case they are needed later.
- If no cached plan is found, then the Command Parser generates a query tree based on the T-SQL. A query tree is an internal structure whereby each node in the tree represents an operation in the query that needs to be performed.
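To illustrate plan caching and reuse, here is a hedged example of inspecting the plan cache; the ordering by usecounts is just one way to surface the most-reused plans:

```sql
-- Cached plans and the T-SQL text they were compiled from
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.[text]
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```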
Query Optimizer
- The Query Optimizer is the most prized possession of the SQL Server team and one of the most complex and secretive parts of the product.
- It is a “cost-based” optimizer: candidate plans are assigned an estimated cost, and the cheapest suitable plan wins.
- Its goal is to find a good plan in a reasonable amount of time, rather than the best plan; the time spent finding the most efficient plan could outweigh its benefit.
- It performs multi-stage optimization, increasing the number of decisions available to find a good plan at each stage:
- Pre-optimization: simple queries get trivial plans, removing the need for cost-based optimization.
- Phase 0: looks only at nested loop joins and won’t consider parallel operators; aims for a transaction processing (TP) plan with a cost below 0.2.
- Phase 1: looks for common patterns to produce a quick plan with a cost below 1.0.
- Phase 2: the final phase, where the optimizer pulls out all the stops and is able to use all of its optimization rules. It also looks at parallelism and indexed views, balancing the cost of the plan found against the time spent optimizing (a full plan).
Query Executor
- it executes the query plan by working through each step it contains and interacting with the Storage Engine to retrieve or modify data
- The SELECT query needs to retrieve data, so the request is passed to the Storage Engine through an OLE DB interface to the Access Methods.
Access Methods
- provides the storage structures for your data and indexes, as well as the interface through which data is retrieved and modified.
- but it doesn’t actually perform the operation itself; it passes the request to the Buffer Manager.
- In this example, the rows being requested are all on a single page. The Access Methods code will ask the Buffer Manager to retrieve the page so that it can prepare an OLE DB rowset to pass back to the Relational Engine.
Buffer Manager
- manages the buffer pool, which represents the majority of SQL Server’s memory usage.
- If the page isn’t already in cache, then the Buffer Manager gets the page from the database on disk, puts it in the data cache, and passes the results to the Access Methods.
- The key point to take away from this is that you only ever work with data in memory.
- Every new data read that you request is first read from disk and then written to memory (the data cache) before being returned as a result set.
- This is why SQL Server needs to maintain a minimum level of free pages in memory; you wouldn’t be able to read any new data if there were no space in cache to put it first.
- The Buffer Manager checked whether it already had it in the data cache, and then loaded it from disk into the cache when it couldn’t find it.
Data Cache
- The data cache is usually the largest part of the buffer pool; therefore, it’s the largest memory consumer within SQL Server.
- It is here that every data page that is read from disk is written to before being used.
- The sys.dm_os_buffer_descriptors DMV contains one row for every data page currently held in cache. You can use this script to see how much space each database is using in the data cache:
SELECT count(*) * 8 / 1024 AS 'Cached Size (MB)',
       CASE database_id
           WHEN 32767 THEN 'ResourceDb'
           ELSE db_name(database_id)
       END AS 'Database'
FROM sys.dm_os_buffer_descriptors
GROUP BY db_name(database_id), database_id
ORDER BY 'Cached Size (MB)' DESC
- The amount of time that pages stay in cache is determined by a least recently used (LRU) policy.
- The header of each page in cache stores details about the last two times it was accessed, and a periodic scan through the cache examines these values. A counter is maintained that is decremented if the page hasn’t been accessed for a while; and when SQL Server needs to free up some cache, the pages with the lowest counter are flushed first. (lazy writer process)
- You can view how long SQL Server expects to be able to keep a page in cache by looking at the MSSQL$<instance>:Buffer Manager\Page Life Expectancy counter in Performance Monitor. Page life expectancy (PLE) is the amount of time, in seconds, that SQL Server expects to be able to keep a page in cache.
- Under memory pressure, data pages are flushed from cache far more frequently. Microsoft has a long-standing recommendation of a minimum of 300 seconds for PLE, but these days a good value is generally considered to be in the thousands of seconds.
- The data needed by the query is now in the data cache in the buffer pool and will have an entry in the sys.dm_os_buffer_descriptors DMV. Now that the Buffer Manager has the result set, it’s passed back to the Access Methods to make its way to the client.
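Instead of Performance Monitor, the same counter can also be read from inside SQL Server; a minimal sketch:

```sql
-- Page life expectancy in seconds, as exposed by the Buffer Manager object
SELECT [object_name], counter_name, cntr_value AS [PLE (seconds)]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';
```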
Basic SELECT query
1. The SQL Server Network Interface (SNI) on the client established a connection to the SNI on the SQL Server using a network protocol such as TCP/IP. It then created a connection to a TDS endpoint over the TCP/IP connection and sent the SELECT statement to SQL Server as a TDS message.
2. The SNI on the SQL Server unpacked the TDS message, read the SELECT statement, and passed a “SQL Command” to the Command Parser.
3. The Command Parser checked the plan cache in the buffer pool for an existing, usable query plan that matched the statement received. When it didn’t find one, it created a query tree based on the SELECT statement and passed it to the Optimizer to generate a query plan.
4. The Optimizer generated a “zero cost” or “trivial” plan in the pre-optimization phase because the statement was so simple. The query plan created was then passed to the Query Executor for execution.
5. At execution time, the Query Executor determined that data needed to be read to complete the query plan so it passed the request to the Access Methods in the Storage Engine via an OLE DB interface.
6. The Access Methods needed to read a page from the database to complete the request from the Query Executor and asked the Buffer Manager to provision the data page.
7. The Buffer Manager checked the data cache to see if it already had the page in cache. It wasn’t in cache so it pulled the page from disk, put it in cache, and passed it back to the Access Methods.
8. Finally, the Access Methods passed the result set back to the Relational Engine to send to the client.
A Simple Update Query
The good news is that the process is exactly the same as the process for the SELECT statement you just looked at until you get to the Access Methods.
The Access Methods need to make a data modification this time, so before the I/O request is passed on, the details of the change need to be persisted to disk. That is the job of the Transaction Manager.
The actual data modification can only be performed when confirmation is received that the operation has been physically written to the transaction log. This is why transaction log performance is so crucial.
Transaction Manager
- Lock Manager
- is responsible for providing concurrency to the data, and it delivers the configured level of isolation (as defined in the ACID properties)
- Log Manager
- The Access Methods code requests that the changes it wants to make are logged, and the Log Manager writes the changes to the transaction log. This is called write-ahead logging (WAL).
- Writing to the transaction log is the only part of a data modification transaction that always needs a physical write to disk because SQL Server depends on being able to reread that change in the event of system failure.
- What’s actually stored in the transaction log isn’t a list of modification statements but only details of the page changes that occurred as the result of a modification statement.
- Once confirmation is received by the Access Methods, it passes the modification request on to the Buffer Manager to complete.
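Because the physical write to the transaction log gates every modification, log write latency is worth monitoring. A sketch using the file-stats DMF; the assumption that file_id 2 is the log file holds for most databases but is not guaranteed:

```sql
-- Average write stall per log write; file_id 2 is typically the transaction log
SELECT DB_NAME(database_id) AS [Database],
       num_of_writes,
       io_stall_write_ms,
       io_stall_write_ms / NULLIF(num_of_writes, 0) AS [Avg write stall (ms)]
FROM sys.dm_io_virtual_file_stats(NULL, 2);
```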
Buffer Manager
- The page that needs to be modified is already in cache, so all the Buffer Manager needs to do is modify the page required by the update as requested by the Access Methods.
- The page is modified in the cache, and confirmation is sent back to Access Methods and ultimately to the client.
- the UPDATE statement has changed the data in the data cache, not in the actual database file on disk.
- This is done for performance reasons, and the page is now what’s called a dirty page because it’s different in memory from what’s on disk.
- The durability of the modification (as defined in the ACID properties) is guaranteed because you can re-create the change using the transaction log.
Recovery
- Modifications are written to the transaction log first and are then actioned in memory only.
- This is done for performance reasons and enables you to recover the changes from the transaction log if
necessary.
Dirty Pages
- When a page is read from disk into memory it is regarded as a clean page because it’s exactly the same as its counterpart on the disk.
- However, once the page has been modified in memory it is marked as a dirty page.
- Clean pages can be flushed from cache using DBCC DROPCLEANBUFFERS, which can be handy when you’re troubleshooting development and test environments because it forces subsequent reads to be fulfilled from disk, rather than cache, but doesn’t touch any dirty pages.
- A dirty page is simply a page that has changed in memory since it was loaded from disk and is now different from the on-disk page.
SELECT db_name(database_id) AS 'Database', count(page_id) AS 'Dirty Pages'
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY db_name(database_id)
ORDER BY count(page_id) DESC
- These dirty pages will be written back to the database file periodically whenever the free buffer list is low or a checkpoint occurs.
- SQL Server always tries to maintain a number of free pages in cache in order to allocate pages quickly, and these free pages are tracked in the free buffer list.
- Whenever a worker thread issues a read request, it gets a list of 64 pages in cache and checks whether the free buffer list is below a certain threshold. If it is, it will try to age-out some pages in its list, which causes any dirty pages to be written to disk. Another thread called the lazy writer also works based on a low free buffer list.
Lazy Writer
- The lazy writer is a thread that periodically checks the size of the free buffer list.
- When it’s low, it scans the whole data cache to age-out any pages that haven’t been used for a while.
- If it finds any dirty pages that haven’t been used for a while, they are flushed to disk before being marked as free in memory.
- The lazy writer also monitors the free physical memory on the server and will release memory from the free buffer list back to Windows in very low memory conditions.
- When SQL Server is busy, it will also grow the size of the free buffer list to meet demand (and therefore the buffer pool) when there is free physical memory and the configured Max Server Memory threshold hasn’t been reached
Checkpoint Process
- A checkpoint is a point in time created by the checkpoint process at which SQL Server can be sure that any committed transactions have had all their changes written to disk.
- This checkpoint then becomes the marker from which database recovery can start.
- The checkpoint process ensures that any dirty pages associated with a committed transaction will be flushed to disk.
- It can also flush uncommitted dirty pages to disk to make efficient use of writes
- but unlike the lazy writer, a checkpoint does not remove the page from cache; it ensures the dirty page is written to disk and then marks the cached page as clean in the page header.
- By default, on a busy server, SQL Server will issue a checkpoint roughly every minute, which is marked in the transaction log.
- The time between checkpoints, therefore, represents the amount of work that needs to be done to roll forward any committed transactions that occurred after the last checkpoint, and to roll back any transactions that were not committed.
- By checkpointing every minute, SQL Server is trying to keep the recovery time when starting a database to less than one minute, but it won’t automatically checkpoint unless at least 10MB has been written to the log within the period.
- Checkpoints can also be manually called by using the CHECKPOINT T-SQL command, and can occur because of other events happening in SQL Server.
- Trace flag 3502 records in the error log when a checkpoint starts and stops.
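The manual checkpoint and the trace flag mentioned above can be sketched as follows; enabling a trace flag instance-wide is a diagnostic step you would normally reserve for troubleshooting:

```sql
-- Manually issue a checkpoint in the current database
CHECKPOINT;

-- Record checkpoint start/stop events in the error log, instance-wide
DBCC TRACEON (3502, -1);
```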
Recovery Interval
- Recovery Interval is a server configuration option that can be used to influence the time between checkpoints
- By default, the recovery interval is set to 0;
- this enables SQL Server to choose an appropriate interval
- which usually equates to roughly one minute between automatic checkpoints
- A value greater than 0 represents the number of minutes you want to allow between checkpoints
- Under most circumstances you won’t need to change this value
- SQL Server even throttles checkpoint I/O to stop it from affecting the disk subsystem too much, so it’s quite good at self-governing.
- If you see the SLEEP_BPOOL_FLUSH wait type on your server, it means checkpoint I/O was throttled to maintain overall system performance.
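Changing the recovery interval is rarely needed, but for completeness, a sketch of how it would be done (it is an advanced option, so it must be exposed first):

```sql
-- 'recovery interval' is an advanced option, so expose advanced options first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Allow roughly 5 minutes between automatic checkpoints
EXEC sp_configure 'recovery interval', 5;
RECONFIGURE;
```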
Recovery Models
- SQL Server has three database recovery models: full, bulk-logged, and simple
- affects the way the transaction log is used and how big it grows, your backup strategy, and your restore options.
Full Recovery Model
- Databases using the full recovery model have all their operations fully logged in the transaction log and must have a backup strategy that includes full backups and transaction log backups.
- Starting with SQL Server 2005, full backups don’t truncate the transaction log
- This is done so that the sequence of transaction log backups isn’t broken and it gives you an extra recovery option if your full backup is damaged.
- Databases that require the highest level of recoverability should use the full recovery model.
- Point-in-time recovery is possible, based on the interval of the transaction log backups.
Bulk-logged
- special recovery model because it is intended to be used only temporarily to improve the performance of certain bulk operations by minimally logging them
- all other operations are fully logged just like the full recovery model.
- For minimally logged operations, only the information required to roll back the transaction is logged. Redo information is not logged, which means you also lose point-in-time recovery for log backups containing bulk operations.
- These bulk operations include the following:
➤ BULK INSERT
➤ Using the bcp executable
➤ SELECT INTO
➤ CREATE INDEX
➤ ALTER INDEX REBUILD
➤ DROP INDEX
BULK-LOGGED AND TRANSACTION LOG BACKUPS
Using bulk-logged mode is intended to make your bulk-logged operation complete faster. It does not reduce the disk space requirement for your transaction log backups.
Simple Recovery Model
- all committed transactions are truncated from the transaction log every time a checkpoint occurs.
- ensures that the size of the log is kept to a minimum and that transaction log backups are not necessary (or even possible)
- If the potential to lose all changes since the last full or differential backup still meets your business requirements, then simple recovery might be the way to go.
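A sketch of checking and changing a database’s recovery model; the database name is hypothetical:

```sql
-- Check the current recovery model of every database
SELECT name, recovery_model_desc
FROM sys.databases;

-- Switch a database (hypothetical name) to the simple recovery model
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
```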
EXECUTION MODEL AND THE SQLOS
- SQLOS is a thin user-mode layer that sits between SQL Server and Windows
- for low-level operations such as
- scheduling,
- I/O completion,
- memory management,
- and resource management.
Execution Model
- When an application authenticates to SQL Server, it establishes a connection in the context of a session, which is identified by a session_id (in older versions of SQL Server this was called a SPID).
- You can view a list of all authenticated sessions by querying the sys.dm_exec_sessions DMV
- When an execution request is made within a session, SQL Server divides the work into one or more tasks and then associates a worker thread to each task for its duration. Each thread can be in one of three states (that you need to care about)
Running
- A processor can only execute one thing at a time and the thread currently executing on a processor will have a state of running.
Suspended
- SQL Server has a co-operative scheduler so running threads will yield the processor and become suspended while they wait for a resource. This is what we call a wait in SQL Server.
Runnable
- When a thread has finished waiting, it becomes runnable which means that it’s ready to execute again. This is known as a signal wait.
- If no worker threads are available and max worker threads has not been reached, then SQL Server will allocate a new worker thread. If the max worker threads count has been reached, then the task will wait with a wait type of THREADPOOL until a thread becomes available.
- The default max workers count is based on the CPU architecture (64 or 32 bit) and the number of logical processors.
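The three thread states above can be observed live; a minimal sketch joining requests to sessions (the is_user_process filter is just an assumption to hide system sessions):

```sql
-- Current requests and their state: running, runnable, or suspended,
-- plus the wait type for suspended threads
SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s ON s.session_id = r.session_id
WHERE s.is_user_process = 1;
```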
INCREASING THE MAX WORKER THREADS SETTING
Running out of worker threads (THREADPOOL wait type) is often a symptom of large numbers of concurrent parallel execution plans (since one thread is used per processor), or it can even indicate that you’ve reached the performance capacity of the server and need to buy one with more processors. Either way, you’re usually better off trying to solve the underlying problem rather than overriding the default Max Worker Threads setting.
- Each worker thread requires 2MB of RAM on a 64-bit server and 0.5MB on a 32-bit server, so SQL Server creates threads only as it needs them, rather than all at once.
- The sys.dm_os_workers DMV contains one row for every worker thread, so you can see how many threads SQL Server currently has
Schedulers
- Each thread has an associated scheduler, which has the function of scheduling time for each of its threads on a processor.
- The number of schedulers available to SQL Server equals the number of logical processors that SQL Server can use plus an extra one for the dedicated administrator connection (DAC). sys.dm_os_schedulers DMV.
- Windows is a general-purpose OS and is not optimized for server-based applications such as SQL Server.
- the goal of the Windows development team is to ensure that all applications, written by a wide variety of developers inside and outside Microsoft, will work correctly and have good performance
- the scheduling in Windows is very basic to ensure that it’s suitable for a common cause
- but if an application does its own scheduling then there is more intelligence about what to choose next, such as assigning some threads a higher priority or deciding that choosing one thread for execution will avoid other threads being blocked later
- The basic scheduler in Windows is known as a pre-emptive scheduler and it assigns slices of time known as quantums to each task to be executed
- the downside is that execution can be interrupted at any point as Windows balances execution requests from multiple processes
- All versions of SQL Server up to and including version 6.5 used the Windows scheduler to take advantage of the work that the Windows team had done through a long history of optimizing processor usage.
- For SQL Server 7.0, Microsoft decided that SQL Server should handle its own scheduling, and created the User Mode Scheduler (UMS) to do just that.
- The UMS was designed as a co-operative scheduling model whereby threads aren’t forcibly interrupted during execution but instead voluntarily yield the processor when they need to wait for another resource.
- When a thread yields the processor, a wait type is assigned to the task to help describe the wait and aid you in troubleshooting performance issues.
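The schedulers described above (one per logical processor) can be inspected directly; a sketch:

```sql
-- One scheduler per logical processor (status VISIBLE ONLINE),
-- with current task and runnable-queue lengths
SELECT scheduler_id, cpu_id, [status],
       current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE';
```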
SQLOS
Prior to SQLOS (which was first implemented in SQL Server 2005), low-level operations such as scheduling, I/O completion, memory management, and resource management were all handled by different teams, which resulted in a lot of duplication of effort as the product evolved.
The idea for SQLOS was to consolidate all these efforts of the different internal SQL Server development teams to provide performance improvements on Windows, putting them in a single place with a single team that can continue to optimize these low-level functions. This enables the other teams to concentrate on challenges more specific to their own domain within SQL Server.
Another benefit to having everything in one place is that you now get better visibility of what’s happening at that level than was possible prior to SQLOS. You can access all this information through dynamic management views (DMVs). Any DMV that starts with sys.dm_os_ provides an insight into the workings of SQLOS
Many of the components make calls to the SQLOS in order to fulfill low-level functions required to support their roles. The SQLOS doesn’t replace Windows. Ultimately, everything ends up using the documented Windows system services; SQL Server just uses them in a way optimized for its own specific scenarios.
➤ sys.dm_os_schedulers — Returns one row per scheduler (remember, there is one user scheduler per logical processor) and displays information about scheduler load and health.
➤ sys.dm_os_waiting_tasks — Returns one row for every executing task that is currently waiting for a resource, as well as the wait type
➤ sys.dm_os_memory_clerks — Memory clerks are used by SQL Server to allocate memory. Significant components within SQL Server have their own memory clerk. This DMV shows all the memory clerks and how much memory each one is using.
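As a sketch of the last DMV in the list, the top memory consumers by clerk type can be summarized like this; note the pages_kb column applies to SQL Server 2012 and later (earlier versions split it into single_pages_kb and multi_pages_kb):

```sql
-- Top 10 memory consumers by clerk type, in MB
SELECT TOP (10) [type],
       SUM(pages_kb) / 1024 AS [Memory (MB)]
FROM sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY SUM(pages_kb) DESC;
```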