Feb 9, 2011

The Effect of NOLOCK on Performance

When this article was first published, it produced some lively debate. It was believed by some that the article misled readers into thinking that NOLOCK should be a tool for performance tuning, which it is not. There is a divided opinion on the appropriate use of NOLOCK, and therefore I am adding this foreword to my article in an effort to clarify matters.
NOLOCK is a query optimizer hint. It has advantages, and disadvantages, and it potentially breaks one of the fundamental rules of databases – data integrity through the use of a locking mechanism. In a nutshell, NOLOCK does not take locks on the data it reads. This provides benefits for concurrency and performance, but data integrity suffers.
I can't speak for Microsoft or the MVP's, but if Microsoft offers an ISOLATION LEVEL which does the same thing as NOLOCK, then there must be an appropriate place for the use of NOLOCK. We've discussed the advantages of NOLOCK, so let's take a look at some of the disadvantages.
Firstly, when reading with NOLOCK you are reading uncommitted data. This means the data has not been committed to the database, and could be rolled back (undone) after you have read it. So, you may find your application is processing invalid data. This is not so much of a problem with Name and Address data, but is particularity problematic with Accounts, Finance and Sales data. This is where your data integrity would suffer.
Secondly, as noted by one of the SQLServerCentral.com forum posters, Itzik Ben-Gan demonstrated to the UK SQL Server User Group that NOLOCK can return duplicate rows when reading data. To quote the forum posting: "Tony Rogerson's blog has code which demonstrates this. http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/16/1345.aspx ".
This article was intended as a clinical approach to the effect of NOLOCK on performance, without advocating you use NOLOCK in an attempt to speed up your queries. If you do decide to use it that way, I would like you to know the risks.

Introduction - How locking works

Using the NOLOCK query optimiser hint is generally considered good practice in order to improve concurrency on a busy system. When the NOLOCK hint is included in a SELECT statement, no locks are taken when data is read. The result is a Dirty Read, which means that another process could be updating the data at the exact time you are reading it. There are no guarantees that your query will retrieve the most recent data.
The advantage to performance is that your reading of data will not block updates from taking place, and updates will not block your reading of data. This means that you will have improved concurrency on your system, and more tasks can be performed at the same time. Without NOLOCK each access to the database results in a lock being made on the row being read, or the page on which the data is located. SELECT statements take Shared (Read) locks. This means that multiple SELECT statements are allowed simultaneous access, but other processes are blocked from modifying the data. The updates will queue until all the reads have completed, and reads requested after the update will wait for the updates to complete. The result to your system is delay, otherwise known as Blocking.
Blocks often take milliseconds to resolve themselves, and this is not always noticeable to the end-user. At other times, users are forced to wait for the system to respond. Performance problems can be difficult to diagnose and resolve, but blocking is often easily resolved by the use of NOLOCK.
One solution to blocking is to place the NOLOCK query optimiser hint on all select statements where a dirty read is not going to cause problems for your data integrity. You would generally avoid this practice in your Finance stored procedures, for example.

Faster reading of data with NOLOCK

Logically, a select using NOLOCK should return data faster than without, as the taking of locks invokes an overhead. Ever curious, I decided to investigate the impact of NOLOCK on the performance of a SELECT query. In order to prove this I needed data - lots of it. The following example shows code I ran to create two tables: Products and Orders, and populate them with 5 million rows each.
You will need to create a database called NOLOCKPerformanceTest, and I recommend you allocate 400MB of initial space for the data file. This code could take over an hour to complete, depending on your system. I am running this on an Intel Dual Core 2Ghz laptop, with a 7200rpm SATA drive and 2GB RAM. The database edition is SQL Server 2005 Developer, with SP1 and hotfix 2153 installed.
I decided to use very large tables, as the longer it takes my test to run, the more noticeable any differences.
USE NOLOCKPerformanceTest
GO
CREATE TABLE Orders (
               OrderID        int IDENTITY(1,1) PRIMARY KEY,
               ProductID      int NOT NULL,
               OrderDate      varchar(50) NOT NULL
               )
        
DECLARE @i                     int,
        @OrderDate             datetime,
        @DateDifferenceDays    int,
        @ProductID             int
SET @i = 1
SET @DateDifferenceDays = 0
SET @ProductID = 0
WHILE @i < 5000000
BEGIN
  -- The order date will decrease by one day every 1000 orders
  -- Drop the time using CONVERT
  SET @OrderDate = CONVERT(datetime,CONVERT(varchar(10),getdate()-@DateDifferenceDays,112))
  IF @i % 1000 = 0
        SET @DateDifferenceDays = @DateDifferenceDays + 1     
  -- Ensure the Referential Integrity with the Product tables
  SET @ProductID = @ProductID + @i
  IF @ProductID > 500000
        SET @ProductID = 1
  INSERT       Orders (ProductID, OrderDate)
  VALUES       (@ProductID , @OrderDate)
  SET @i = @i + 1
END
GO
Example 1 Load table Orders
USE NOLOCKPerformanceTest
GO
CREATE TABLE Products (
               ProductID      int IDENTITY(1,1) PRIMARY KEY,
               ProductDescription varchar(100) NOT NULL,
               ProductPrice   money
               )
DECLARE @i                     int,
        @ProductDescription    varchar(100)
SET @i = 1
WHILE @i < 5000000
BEGIN
        SET @ProductDescription = 'Product ' + CONVERT(varchar, @i)
        INSERT  Products (ProductDescription, ProductPrice)
        VALUES  (@ProductDescription , @i)
        SET @i = @i + 1
END
GO
Example 2 Load table Products
I started the exercise by running a simple join query between the two tables, while checking the runtime with NOLOCK, and without NOLOCK. I flush the data buffer before each execution, and I drew an average over 5 executions.
USE NOLOCKPerformanceTest
GO
DBCC DROPCLEANBUFFERS
DECLARE @StartTime datetime
SET @StartTime = getdate()
SELECT  Orders.OrderID,
        Orders.OrderDate,
        Products.ProductID,
        Products.ProductDescription
FROM    Orders
        INNER JOIN Products
        ON orders.ProductID = Products.ProductID
WHERE   Orders.OrderDate BETWEEN CONVERT(datetime, '01 Jan 2006') and getdate()
SELECT getdate() - @StartTime
GO
USE NOLOCKPerformanceTest
GO
DBCC DROPCLEANBUFFERS
DECLARE @StartTime datetime
SET @StartTime = getdate()
SELECT  Orders.OrderID,
        Orders.OrderDate,
        Products.ProductID,
        Products.ProductDescription
FROM    Orders WITH (NOLOCK)
        INNER JOIN Products WITH (NOLOCK)
        ON orders.ProductID = Products.ProductID
WHERE   Orders.OrderDate BETWEEN CONVERT(datetime, '01 Jan 2006') and getdate()
SELECT getdate() - @StartTime
GO


The Results


The results for execution times are listed below. It takes a while to run, as 310,001 rows are returned. If you are running these queries, execution times will vary depending on the specification of your server. The format for these results is hh:mm:ss.nnn, where nnn is milliseconds.
Without NOLOCK:
Run1: 00:00:29.470
Run2: 00:00:30.467
Run3: 00:00:28.877
Run4: 00:00:29.123
Run5: 00:00:29.407
Average: 00:00:29:469
With NOLOCK:
Run1: 00:00:25.060
Run2: 00:00:25.157
Run3: 00:00:25.107
Run4: 00:00:25.140
Run5: 00:00:25.893
Average: 00:00:25:271
This test shows the average execution time is less when using NOLOCK. The average time saved is 00:00:04.197, which equates to a saving in elapsed time of approximately 14%.
My next step was to check the amount of CPU and IO used in both queries. I added the following code immediately before the first SELECT in examples 3 and 4, and ran both queries again:
SET STATISTICS IO ON
SET STATISTICS TIME ON
The execution times were the same as before, and the results for the SET STATISTICS options were as follows:
Without NOLOCK:
Table 'Orders'. Scan count 3, logical reads 27266, physical reads 10, read-ahead reads 24786, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 3, logical reads 27119, physical reads 78, read-ahead reads 24145, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 13844 ms, elapsed time = 27480 ms.
With NOLOCK:
Table 'Orders'. Scan count 3, logical reads 24845, physical reads 0, read-ahead reads 24826, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Products'. Scan count 3, logical reads 24711, physical reads 0, read-ahead reads 24689, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 13813 ms, elapsed time = 24999 ms.
While not being an expert at IO, I nevertheless noticed lower physical and logical reads for the NOLOCK version. I ran both queries several times, and noticed the same pattern. Using NOLOCK definitely results in lower IO.
The CPU time is similar between the versions, though you can see that once again the elapsed time is lower when using NOLOCK.
Changing the date in the WHERE clause (see examples 3 and 4) to "01 Dec 2006" results in no rows being returned. However, the execution time for the without NOLOCK example shows an average of about 24 seconds. With NOLOCK it runs for approximately 22 seconds each time the query is run.

Conclusion

This article has hopefully proven that using NOLOCK when querying data results in lower IO and faster response times. I have executed these queries at least a dozen times each over several weeks, and the expected result is always the same.

No comments:

Post a Comment

Hi,

Thanks for your visit to this blog.
We would be happy with your Queries/Suggestions.