Transmokopter database solutions

Category: Performance tuning

OPTION(USE HINT) – New SQL Server 2016 SP1 feature

Many SQL Server developers and admins found, after upgrading to SQL Server 2014, that some queries started taking much longer than before. The reason is the new cardinality estimation formula introduced in SQL Server 2014. Cardinality estimation is done all the time by the SQL Server optimizer: to produce a query plan, the optimizer makes assumptions about how many rows exist for each condition on the table. In most cases, the new cardinality estimation formula in SQL Server 2014 and onwards gives slightly better estimates, and the optimizer therefore produces slightly better plans. In some cases, however, mostly when there are predicates on more than one column in a WHERE clause or JOIN clause, the 2014 cardinality estimation is a lot worse than in previous versions of SQL Server.

I’m doing an upgrade of SQL Server, from 2012 to 2016. Some days ago, a stored procedure went from three minutes to 15 minutes. Analyzing query plans for the procedure showed an operator estimating 10,000 rows but with an actual row count of nearly three million rows. In SQL Server 2012, the estimate for the same combination of predicates was two million rows. The difference between two and three million rows is negligible. The difference between 10,000 and three million rows is far from negligible, and in our case it caused the optimizer to choose key lookups instead of scanning small and medium-sized tables.

There was a new trace flag introduced in SQL Server 2014, 9481, which can be used to instruct SQL Server to use the old cardinality estimation formula. But a trace flag is global, and we probably wouldn’t want to change the behaviour for a whole instance when we have a problem with one specific query.
In SQL Server 2016, database scoped configuration was introduced. Database scoped configuration moves some of the configuration options previously only available on the instance level down to the database level. One database scoped configuration option is LEGACY_CARDINALITY_ESTIMATION, which gives the same behaviour as trace flag 9481, but only for a single database. I’m pleased with database scoped configuration, but I still don’t want to force a certain behaviour upon ALL queries in a database when I only have a problem with ONE single query.
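For reference, the database scoped option is set like this, run in the database in question:

```sql
--Enable legacy cardinality estimation for the whole database
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

--And turn it off again
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF;
```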
Both in SQL Server 2014 and SQL Server 2016, one can use the query hint QUERYTRACEON to get the behaviour of certain trace flags for a single query. That’s almost good enough. Almost.

USE AdventureWorks2014;
--Create a user with low privileges
CREATE USER u WITHOUT LOGIN;
GRANT SELECT ON Sales.SalesOrderHeader TO u;
--Impersonate user u
EXECUTE AS USER = 'u';
--Query using QUERYTRACEON to enforce legacy cardinality estimation
SELECT * FROM Sales.SalesOrderHeader
OPTION (QUERYTRACEON 9481);
--Stop impersonating user u
REVERT;

And the result is this

Msg 2571, Level 14, State 3, Line 11
User 'u' does not have permission to run DBCC TRACEON.

QUERYTRACEON requires sysadmin privileges, and I certainly don’t want all my users or application services to run with sysadmin privileges.

Which brings me to the real topic of this post. There has been a LOT of fuss about SP1 for SQL Server 2016, and most of it has been about all the cool Enterprise features that made it into Standard edition. Less talked about is the new query hint USE HINT. It provides a method to add behaviour to a single query, behaviour which was previously only available as trace flags or database scoped configuration options. One option available through USE HINT is FORCE_LEGACY_CARDINALITY_ESTIMATION.

It does the same as QUERYTRACEON 9481 and the syntax is quite simple:

SELECT col1, col2, col3...
FROM SomeTable
WHERE col1='somevalue' AND col2='someothervalue' AND col3='somethirdvalue'
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

And if we run our test using the low privilege user “u” again, but with USE HINT instead of QUERYTRACEON, it just works.

USE AdventureWorks2014;
--Impersonate the low privilege user u from the previous example
EXECUTE AS USER = 'u';
--Query using USE HINT to enforce legacy cardinality estimation
SELECT * FROM Sales.SalesOrderHeader
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
--Stop impersonating user u
REVERT;

To me, this is fantastic, because it lets us fix the very, very few queries where the cardinality estimator goes bananas after an upgrade, without having to change the behaviour of the other several thousand queries.

And of course, there’s a sibling hint to FORCE_LEGACY_CARDINALITY_ESTIMATION, and that is FORCE_DEFAULT_CARDINALITY_ESTIMATION, which overrides both trace flag 9481 and the database scoped setting LEGACY_CARDINALITY_ESTIMATION=ON.
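So if a database runs with LEGACY_CARDINALITY_ESTIMATION = ON, a single query can still opt in to the new estimator. With the same placeholder names as above:

```sql
SELECT col1, col2, col3
FROM SomeTable
WHERE col1 = 'somevalue'
OPTION (USE HINT ('FORCE_DEFAULT_CARDINALITY_ESTIMATION'));
```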

Another really useful hint available through USE HINT is ENABLE_QUERY_OPTIMIZER_HOTFIXES. When a hotfix is released to fix some odd behaviour in the optimizer, Microsoft has decided that it’s not going to be enabled by default. That is because a hotfix may fix an odd behaviour for certain queries, but it might introduce other problems for other queries. Instead of globally enabling trace flag 4199 or using the database scoped configuration option QUERY_OPTIMIZER_HOTFIXES=ON, we can now enable query optimizer hotfixes on a per-query basis. Really cool if you ask me!
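The syntax follows the same pattern as the other USE HINT options:

```sql
SELECT col1, col2, col3
FROM SomeTable
WHERE col1 = 'somevalue'
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));
```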

The rest of the hints available through USE HINT (there are nine of them) are documented in Books Online, here: https://msdn.microsoft.com/en-us/library/ms181714.aspx

Statistics on ascending columns

I have had an annoying problem for a while. In a database used for a statistical survey system, reporting is painfully slow at the beginning of each reporting period.

The system
The tables contain a few million rows. Ola Hallengren’s index maintenance (which includes UPDATE STATISTICS) is running weekly. Each month is a new reporting period. When a new reporting period opens, there are no rows for the current period. From the first day of the month, we receive input, each input being less than 2000 new rows in the table.

The problem
Reporting on any previous period is always consistent in execution time – around 3 seconds to produce a full report. That’s acceptable performance. But when reporting is done for the current period early in a reporting period, execution takes up to 10 minutes.

The reason
Looking at the execution plans for the problem queries, I see lookups for the current period where the same query for a previous reporting period uses a scan operator. Comparing the estimated number of rows with the actual number of rows shows that the issue is bad statistics: the estimated plan shows 1 row from each of the lookups, while the actual plan shows some ten thousand rows. Since the queries always have the reporting period as part of the WHERE clause, it’s logical that this happens. We run UPDATE STATISTICS each week. When the statistics are collected, there are zero rows for the current period. So when we then add some thousand rows for the current period, the statistics still show zero rows for it. This has been verified using DBCC SHOW_STATISTICS.
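The verification is a matter of looking at the histogram steps. With the obfuscated table and index names used in the job later in this post, it looks like this:

```sql
--Show the statistics histogram for the index on the period column.
--If the current period is missing from RANGE_HI_KEY (or has EQ_ROWS = 0),
--the optimizer will estimate close to zero rows for it.
DBCC SHOW_STATISTICS ('myTable', 'ix_Period') WITH HISTOGRAM;
```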

Possible solutions
I have been looking at different solutions to this problem. One would be to add a query hint, but the system uses an OR-mapper (Entity Framework) and I don’t even think it’s possible to add query hints to Linq2Entities. A stored procedure mapped as a function in Entity Framework would perhaps have worked, but I would rather solve the problem without having to deploy new .NET-code.
Next possible solution would be to enable trace flags 2389 and 2390, to tell SQL Server to try to identify ASCENDING columns and treat them differently in the optimizer (use an average estimated number of rows based on previous values). This might or might not solve our problem with this specific query (we run four consecutive UPDATE STATISTICS before any increments to the period column, so it’s still not certain that the column would be marked ASCENDING). But 2389 and 2390 have other problems, mainly that the trace flags are enabled on the whole instance, not just one database or one table.
What we did until recently was to manually run UPDATE STATISTICS when the users reported bad performance. Sometimes that wasn’t enough – we also had to rebuild an index containing the reporting period column. But that’s not really a long term solution either, since it requires a DBA/DBDev to be around to quickly respond to the situation.

The “final” solution
What we have done to answer this problem as a long-term solution is to implement a SQL Agent job which compares the distribution of values in the histogram from DBCC SHOW_STATISTICS with the actual values in the table. If the row count in the histogram differs too much from the actual row count (500% difference is the threshold we have used), the job updates statistics on the table and rebuilds a specific index. The job runs every 10 minutes. When there’s no big difference, the job finishes within a second. When statistics are updated and the index is rebuilt, it takes a little longer, but since the rebuild is done online, it doesn’t affect system performance too much.
The job is very specific to the table(s) in our system. I have been thinking about making it more general, but haven’t had the time to really do it. It requires some logic to find out which column(s) to group on, which datatype the columns have etc. So for now, a solution specific to our table(s) will do.
Here’s the code for the job (obfuscated in terms of table/column names).

DECLARE @UpperThreshold numeric(10,1) = 5.0;
DECLARE @LowerThreshold numeric(10,1) = 0.2;
DECLARE @indexname sysname = N'ix_Period';

--DBCC SHOW_STATISTICS ... WITH HISTOGRAM returns five columns
CREATE TABLE #histogram 
(   RANGE_HI_KEY char(6), 
    RANGE_ROWS bigint, 
    EQ_ROWS bigint, 
    DISTINCT_RANGE_ROWS bigint,
    AVG_RANGE_ROWS bigint); 

CREATE INDEX ix_range_hi_key ON #histogram(RANGE_HI_KEY) INCLUDE (EQ_ROWS);

INSERT INTO #histogram 
    EXEC ('DBCC SHOW_STATISTICS([myTable],' + @indexname + ') WITH HISTOGRAM');
--Now we have the statistics estimation of number of rows for each period in #histogram

IF (SELECT COUNT(*) FROM (
        SELECT h.RANGE_HI_KEY, COALESCE(h.EQ_ROWS, 1) AS eq_rows, d.period, COUNT(*) AS cnt
        FROM #histogram h
        --RIGHT OUTER JOIN to the table in order to catch those periods not in the histogram at all
        RIGHT OUTER JOIN myTable d 
            ON h.RANGE_HI_KEY = d.period
        GROUP BY h.RANGE_HI_KEY, h.EQ_ROWS, d.period
        HAVING CAST(COUNT(*) AS numeric(10,1)) / CAST(COALESCE(h.EQ_ROWS, 1) AS numeric(10,1)) > @UpperThreshold 
            OR CAST(COUNT(*) AS numeric(10,1)) / CAST(COALESCE(h.EQ_ROWS, 1) AS numeric(10,1)) < @LowerThreshold
    ) AS t) > 0
BEGIN
    PRINT 'Index ix_Period ON myTable is rebuilt and statistics for myTable are updated';
    UPDATE STATISTICS myTable;
    EXEC ('ALTER INDEX ' + @indexname + ' ON myTable REBUILD');
END;
DROP TABLE #histogram;

DELETE and Non Clustered Indexes

I’ll start off with a disclaimer: I’m going to tell you about something that happened in a specific system environment. There’s no general advice you can build from this specific scenario. I’m just posting it because I was myself surprised by what order of magnitude I was able to speed up a specific query by removing some of the work from the execution plan.

The other day I helped troubleshoot a database system. In a table with some 400 million records, a subset (50-60 million records) was to be deleted. The application kept timing out on this delete operation, so I advised the developer to split the delete operation into smaller chunks. I even helped write a T-SQL script to perform the delete in one-million-row chunks. The script was pretty basic: a WHILE loop which checked whether any rows fulfilling the WHERE condition of the delete were left in the table, and inside the loop a DELETE TOP (one million) followed by an explicit checkpoint.
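A sketch of that chunked delete script; the table name, column name and date value are made up for this post:

```sql
--Delete in chunks of one million rows to keep each transaction
--(and therefore locking and log usage) reasonably small.
--dbo.BigTable, SomeDate and the date literal are hypothetical.
WHILE EXISTS (SELECT 1 FROM dbo.BigTable WHERE SomeDate < '2016-01-01')
BEGIN
    DELETE TOP (1000000)
    FROM dbo.BigTable
    WHERE SomeDate < '2016-01-01';

    --Flush dirty pages so log space can be reused between chunks
    --(assumes simple recovery model; in full recovery, take log backups instead)
    CHECKPOINT;
END;
```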

I was sure that this script would do the trick, but I found that just deleting one million rows was a 20 minute job. 20 minutes for one million rows. And the table has 50-60 million rows. That’s something like a 15 hour job. No wonder the application kept timing out… Looking at the query plan for a delete of one million rows, I saw that a SORT-operator and a Non Clustered delete took 70% of the execution and I had a warning about spill to tempdb in the SORT operator. Knowing something about the disk latency for tempdb, I could also tell that this part of the execution plan was in fact taking more than 70% of the time.

So I started advising on partitioning the data, in order to switch out the data in a specific partition to a staging area and truncate it. Though this would be the “ultimate” solution in terms of performance, it should also be said that this is a rather advanced topic and it would require more future maintenance than not partitioning the data.
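As a sketch, the switch-and-truncate approach would look roughly like this, assuming the table is partitioned on the delete predicate column and that a staging table with identical structure exists on the same filegroup (the partition number and table names are made up):

```sql
--Switch the partition containing the rows to be deleted out to a
--staging table, then truncate the staging table.
--Both operations are metadata-only and near-instant.
ALTER TABLE dbo.BigTable SWITCH PARTITION 3 TO dbo.BigTable_Staging;
TRUNCATE TABLE dbo.BigTable_Staging;
```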

I had the SORT and Non Clustered Delete operators in the back of my head and decided to try disabling the Non Clustered Indexes before performing the delete. I tested how long a Non Clustered Index rebuild would take, and to my surprise I saw that it was only an eight minute job. Eight minutes to rebuild an NCI on 400 million rows, compared to 15 hours for deleting 50-60 million rows. So I built another T-SQL script, which disabled the NCIs on the table, performed the delete operation and then rebuilt the NCIs. I tested only with deleting one million rows. The result: zero seconds for disabling the NCIs. Nine seconds for the delete (!!!). Eight minutes for rebuilding the index. If I could delete one million rows in nine seconds, I was looking at something like 7 minutes for deleting all 50-60 million rows. That’s 0.8% of the time it would have taken to just perform the delete. If I add on the time it takes to rebuild the index, I’m up to a 15 minute job, compared to a 15 hour job before disabling the indexes.
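That second script followed this pattern; the index and table names are, again, made up:

```sql
--Disable the non clustered indexes on the table (metadata-only, instant).
--Disable them one by one: ALTER INDEX ALL ... DISABLE would also
--disable the clustered index and take the table offline.
ALTER INDEX ix_SomeDate ON dbo.BigTable DISABLE;
ALTER INDEX ix_SomeOtherColumn ON dbo.BigTable DISABLE;

--Perform the delete; only the clustered index needs to be maintained
DELETE FROM dbo.BigTable WHERE SomeDate < '2016-01-01';

--Rebuild the indexes, which also re-enables them
ALTER INDEX ix_SomeDate ON dbo.BigTable REBUILD;
ALTER INDEX ix_SomeOtherColumn ON dbo.BigTable REBUILD;
```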

BUT, someone would argue – if I disable and then rebuild the indexes, the table would be slow on query performance for the first seven minutes, and in the worst case it would be offline for the following eight, during the index rebuild (if it’s not done online). But imagine the query performance on a table in which a delete operation goes on for 15 hours.

The disk latency on tempdb, and actually also on the transaction log, is pretty high on this machine, and that slows down performance. But even with an OK or even good disk configuration, I would say that disabling the NCIs before the DELETE would save us a lot of time.

I’m not going to argue that you should ALWAYS do this or that, but if you find yourself struggling with updates, inserts or deletes that go on forever, check your execution plan to see how much of the work really has to do with the DML operation and how much has to do with updating/inserting/deleting in non clustered indexes. Before going to your manager to ask for better hardware, look at what you can do to optimize large workloads with a mix of DDL and DML operations. In my case, we had spill to tempdb already with one million rows. If I had tried fighting that with more memory on the server, I would have had to ask for 150GB more memory in a box with 3GB of available memory. Hardware is sometimes a real bottleneck, but I would argue that you should fix your code before adjusting the hardware.

Partitioning the table and/or implementing compression on the table would probably help even more. But these are both Enterprise features, which come at a rather high cost. Not every organisation can afford that. If you don’t have the knowledge to troubleshoot really problematic queries yourself, it’s probably worth investing in a few days of expert help before wasting money on new hardware or more expensive licenses.

© 2019 tsql.nu
