Transmokopter database solutions

I spoke on GroupBy October 9th

October 9 and October 10 2019 were GroupBy days. New this time was the two-day schedule: five European sessions running in European office hours, and six North American sessions running in Pacific office hours.

I did my presentation on partitioning, “Eight hours of work in 20 minutes – partitioning rocks”. I’ve done that a few times before, but I slightly adjusted it to fit the online format a little better, and I did more prepping than usual, in terms of pre-executing setup scripts. When doing an in-your-face classroom presentation, it’s easier to “wing it” when a script takes twice as long as usual (it always does on demo day). In an online presentation, it’s harder to go around the room and ask questions or take questions while a script is finishing.

I went to Daniel Hutmacher’s (http://www.sqlsunday.com) office to do the presentation, mainly because Daniel was moderating the session but also because I needed a space where I knew I wouldn’t be disturbed. So thank you Daniel for lending some office space to me. Most things worked as planned. The final version of the ETL I was demoing took twice the usual time to run (again, it’s demo day) but it was ok – that was just another 30 seconds of anxiously waiting for it to finish. But again, overall it felt OK. Not too many were online for it, but those that were posted relevant and interesting questions.

Overall, it was a great experience and I’d be happy to do more presentations this way. Physically travelling to a conference adds value with more human interaction and most of all the Speakers’ dinner. But travelling costs time and money, plus travelling usually means getting into a fossil fuel-driven airplane, which doesn’t feel great every time. I’m thinking about getting more involved in PASS Virtual chapters, if family and work schedules allow for it.

After GroupBy, I created a “Presentations” page on this blog, which I will use to keep track of my public speaking and to upload presentation material. I know all the cool kids use GitHub for such, but I’m not one of the cool kids. Heck, I don’t even own a skateboard anymore.

Finally: Thank you Brent Ozar for all the work (and money, I’d assume) you put into GroupBy. And thank you everyone who volunteered to take over when Brent stopped doing GroupBy. I love the format. My five cents for next time: Please don’t do back-to-back timings. Leave half an hour between sessions to allow for more questions and general silly talk between the sessions. That is not time wasted between sessions, it’s value added.

Data Quality Client – .NET Framework Error Occurred

I sometimes train the course “20767 – Implementing a SQL Data Warehouse”. One module in the course is about Data Quality Services. Every time I prep for the course, I run into an error message when trying to connect the Data Quality Client to (LOCAL): “A .NET Framework error occurred”, followed by a stack trace.

I’m writing this blog post so that, when this happens for the millionth time, I’ll remember what to do to fix the problem, and can spend my time actually running through the course demos instead of googling for a fix to a DLL-hell problem. And that’s just what it is – DLL files, or assemblies if you like, having the wrong version.

Fixing the problem is as easy as it is irritating to forget how it was fixed last time…

Step-by-step instruction:

– In a command prompt, run “c:\program files\microsoft sql server\mssql.\MSSQL\Binn\DQSInstaller.exe -upgradedlls”
– Wait until the command has completed.
– Done.

Duplicate key in sysclsobjs using DBCC CLONEDATABASE

This blog post is about an error message I got the other day when using DBCC CLONEDATABASE in a T-SQL script. But first some background to DBCC CLONEDATABASE.

I was pretty excited about the DBCC CLONEDATABASE command, which was introduced in SQL Server 2014 SP2 and SQL Server 2016 SP1. It creates a schema-only (that means all the database objects, but no data) copy of a database, keeping all statistics data, so that you can troubleshoot query plans for certain queries without having to copy all the data. Before DBCC CLONEDATABASE (and to be honest probably also afterwards, DBCC CLONEDATABASE doesn’t replace all the needs) one had to make a full copy of a database to get the statistics data along. That’s usually copied to a test box. If the test box is identical to your production box, you’re almost fine. But on your test box, you don’t have the cached execution plans from the production box. Therefore, you might end up with very different query plans on your test box. With DBCC CLONEDATABASE, you get a read-only copy of a database on your production box, and you can use that to tweak your queries and see what new estimated execution plans they get.
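In its simplest form, the command takes just the source and target database names. A minimal sketch, using AdventureWorks2014 as an example:

```sql
--Create a schema- and statistics-only clone of a database
DBCC CLONEDATABASE('AdventureWorks2014', 'AdventureWorks2014_Clone');
--The clone is created as a read-only database on the same instance,
--and estimated execution plans against it use the copied statistics
```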

But there are limitations to what you can do in terms of troubleshooting. You can’t really use it to test real queries, since the clone has no data. Not only will you miss the actual data when running queries, you won’t know anything about CPU or IO times for a Query execution. And you can’t add or change indexes to see how a Query will behave – you can of course make the clone a readwrite database and alter/add indexes. But any new or altered index will screw up the statistics objects – since the database contains no data, the statistics for the objects will be empty.

To me, DBCC CLONEDATABASE is not that useful for troubleshooting. My use for DBCC CLONEDATABASE is as a replacement for scripting a database. I’m creating a database which should represent a subset of another database. When I first developed the scripts for it, I used Adventureworks2014 and sampled 250 rows from Sales.Customer, and wanted to get a complete Adventureworks2014 database for these 250 customers, including their orders, the order lines for these orders, the salespersons for these customers, the rows from Person.Person, Sales.Store, Production.Product etc. I got the script to work OK on my own laptop, but when trying to run it against a server and another database, I got an error message:

Msg 2601, Level 14, State 1, Line 4
Cannot insert duplicate key row in object ‘sys.sysclsobjs’ with unique index ‘clst’. The duplicate key value is (50, 5).

I created a Connect item for this error, but couldn’t really let it go, so I did some researching. sys.sysclsobjs is not available unless you open a dedicated admin connection (DAC) to your instance. So I did that, and this is what I found:

I have a row in sys.sysclsobjs with class=50 and id=5. I found out that this row has information about a schema which exists in my source database. I also found that all rows with class=50 have information about my schemas (id=1 is dbo etc). The schema_id for each schema is represented by the id column in sys.sysclsobjs. But this didn’t help me much – there was nothing strange with the schema. So I let it go for a while, had some coffee, helped some developers troubleshoot queries and went home from work.
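For reference, this is roughly the query I used over the admin connection to inspect the schema rows. Note that querying system base tables like sys.sysclsobjs requires the DAC (prefix the server name with ADMIN: when connecting):

```sql
--class 50 holds one row per schema; the id column maps to schema_id
SELECT class, id, name
FROM sys.sysclsobjs
WHERE class = 50;
```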
But I still couldn’t let it go, so I fired up my home computer and tried to figure out what is different on my own laptop’s SQL Server instance compared to the server instance. Since the source database was the same on both my laptop and the server instance, the SQL Server build number is the same, I started looking at what could possibly be different between the instances.
Without going through all the attempts leading to a dead end (they were many…) I finally looked at the model database on my two instances. In the instance where I got the error, I had a schema created. A schema created by monitoring software. This schema had schema_id=5.
I removed all the user created schemas from model. And voila, DBCC CLONEDATABASE works.
What does this mean? To me, it looks like (this has not been verified, I draw my conclusions based on symptoms, not on any knowledge about the bits and pieces inside DBCC) DBCC CLONEDATABASE creates a new database using a normal CREATE DATABASE command, and then creates database objects and copies statistics. All schema_ids and object_ids are copied to the clone, and this is what causes the command to throw the duplicate key error. When schema_id=5 is copied from Adventureworks2014 to Adventureworks2014_clone, there’s already a schema with schema_id=5 – the one copied from the model system database.
I was pleased to find the cause of the error and on our test boxes, I’m ok with dropping schema_id=5 in Model and go on with life. But I won’t be equally happy when we start using my script for something more useful than subsetting Adventureworks2014. Subsetting a production database for troubleshooting purposes won’t be possible. Unless I let my script drop any non-system-schemas in Model in our production instances.
Here’s a script to reproduce the error:

--First create a user database
CREATE DATABASE CloneTest;
GO
USE CloneTest;
GO
CREATE SCHEMA CloneTestSchema;
GO
SELECT SCHEMA_ID('CloneTestSchema');
--Now create a schema in Model
USE Model;
SELECT * FROM sys.schemas ORDER BY schema_id;
--Gives these first rows
--	name				schema_id	principal_id
--	dbo				1			1
--	guest				2			2
--	sys				4			4
--	db_owner			16384			16384
GO
CREATE SCHEMA model_schema;
GO
SELECT SCHEMA_ID('model_schema');
USE master;
--Now try to clone the CloneTest database
DBCC CLONEDATABASE('CloneTest','CloneTest_Clone');
--Database cloning for 'CloneTest' has started with target as 'CloneTest_Clone'.
--Msg 2601, Level 14, State 1, Line 32
--Cannot insert duplicate key row in object 'sys.sysclsobjs' with unique index 'clst'. The duplicate key value is (50, 5).
--Now remove the schema with schema_id=5 from Model
USE model;
DROP SCHEMA model_schema;
USE master;
--Clone again
DBCC CLONEDATABASE('CloneTest','CloneTest_Clone');
--Database cloning for 'CloneTest' has started with target as 'CloneTest_Clone'.
--Database cloning for 'CloneTest' has finished. Cloned database is 'CloneTest_Clone'.
--Database 'CloneTest_Clone' is a cloned database. A cloned database should be used for diagnostic purposes only and is not supported for use in a production environment.
--DBCC execution completed. If DBCC printed error messages, contact your system administrator.

I thought for a while that it would be OK if the schema in model and the source database had the same schema_id/schema_name combination. But that also fails. So the only way to use DBCC CLONEDATABASE is if there are no user created schemas in model.

OPTION(USE HINT) – New SQL Server 2016 SP1 feature

Many SQL Server developers and admins found, after upgrading to SQL Server 2014, that some queries started taking much longer than before. The reason is the new cardinality estimation formula which was introduced in SQL Server 2014. Cardinality estimation is done all the time by the SQL Server optimizer. To produce a query plan, the optimizer makes some assumptions about how many rows exist for each condition in the table. In most cases, the new cardinality estimation formula in SQL Server 2014 and onwards gives slightly better estimates, and the optimizer therefore produces slightly better plans. In some cases however, mostly when there are predicates on more than one column in a WHERE clause or JOIN clause, the 2014 cardinality estimation is a lot worse than in previous versions of SQL Server.

I’m doing an upgrade of SQL Server, from 2012 to 2016. We had a stored procedure which went from three minutes to 15 minutes some days ago. Analyzing query plans for the procedure showed an operator estimating 10.000 rows but with an actual rowcount of nearly three million rows. In SQL Server 2012, the estimation for the same combination of predicates was two million rows. The difference between two and three million rows is negligible. The difference between 10.000 and three million rows is far from negligible, and in our case it caused the optimizer to choose key lookups instead of scanning small and medium-sized tables.

There was a new trace flag introduced in SQL Server 2014, 9481, which can be used to instruct SQL Server to use the old cardinality estimation formula. But a trace flag is global, and we probably wouldn’t want to change the behaviour of a whole instance when we have a problem with one specific query.
In SQL Server 2016, database scoped configuration was introduced. Database scoped configuration moves some of the configuration options previously only available at the instance level down to the database level. One database scoped configuration option is LEGACY_CARDINALITY_ESTIMATION, which gives the same behaviour as trace flag 9481, but only for a single database. I’m pleased with database scoped configuration, but I still don’t want to force a certain behaviour upon ALL queries in a database when I only have a problem with ONE single query.
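Setting that option is a one-liner, run in the context of the database it should apply to:

```sql
--Turn on legacy cardinality estimation for the current database only
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
```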
Both in SQL Server 2014 and SQL Server 2016, one can use the query hint QUERYTRACEON, to use the behaviour of certain trace flags for a single query. That’s almost good enough. Almost.

USE AdventureWorks2014;
--Create a user with low privileges
CREATE USER u WITHOUT LOGIN;
GRANT SELECT ON Sales.SalesOrderHeader TO u;
--Impersonate user u
EXECUTE AS USER='u';
--Query using QUERYTRACEON to enforce legacy cardinality estimation
SELECT * FROM Sales.SalesOrderHeader
OPTION (QUERYTRACEON 9481);
--Stop impersonating user u
REVERT;

And the result is this

Msg 2571, Level 14, State 3, Line 11
User 'u' does not have permission to run DBCC TRACEON.

QUERYTRACEON requires sysadmin privileges, and I certainly don’t want all my users or application services to run with sysadmin privileges.

Which brings me to the real topic of this post. There has been a LOT of fuss about SP1 for SQL Server 2016, and most of it has been about all the cool Enterprise features that made it into Standard edition. Less talked about is the new query hint USE HINT. It provides a method to add behaviour to a single query – behaviour which was previously only available through trace flags or database scoped configuration options. One option available through USE HINT is FORCE_LEGACY_CARDINALITY_ESTIMATION.

It does the same as QUERYTRACEON 9481 and the syntax is quite simple:

SELECT col1, col2, col3...
FROM SomeTable
WHERE col1='somevalue' AND col2='someothervalue' AND col3='somethirdvalue'
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

And if we run our test using the low-privilege user “u” again, but with USE HINT instead of QUERYTRACEON, it just works.

USE AdventureWorks2014;
--Create a user with low privileges
CREATE USER u WITHOUT LOGIN;
GRANT SELECT ON Sales.SalesOrderHeader TO u;
--Impersonate user u
EXECUTE AS USER='u';
--Query using USE HINT to enforce legacy cardinality estimation
SELECT * FROM Sales.SalesOrderHeader
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
--Stop impersonating user u
REVERT;

To me, this is fantastic, because it lets us fix the very, very few queries where the cardinality estimator goes bananas after an upgrade, without having to change the behaviour of the other several thousand queries.

And of course, there’s a sibling hint to FORCE_LEGACY_CARDINALITY_ESTIMATION, and that is FORCE_DEFAULT_CARDINALITY_ESTIMATION, which overrides both trace flag 9481 and the database scoped setting LEGACY_CARDINALITY_ESTIMATION=ON.

Another really useful hint available through USE HINT is ENABLE_QUERY_OPTIMIZER_HOTFIXES. When a hotfix is released to fix some odd behaviour in the optimizer, Microsoft has decided that it’s not going to be enabled by default. That is because a hotfix may fix an odd behaviour for certain queries, but it might introduce other problems with other queries. Instead of globally enabling trace flag 4199 or using the database scoped configuration option QUERY_OPTIMIZER_HOTFIXES=ON, we can now enable query optimizer hotfixes on a per-query basis. Really cool if you ask me!
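The syntax follows the same pattern as the other USE HINT options (SomeTable being a made-up table name for illustration):

```sql
--Enable query optimizer hotfixes for this query only
SELECT col1, col2
FROM SomeTable
WHERE col1 = 'somevalue'
OPTION (USE HINT('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));
```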

All the rest of the hints available through USE HINT (there are nine of them) are documented in Books Online, here: https://msdn.microsoft.com/en-us/library/ms181714.aspx

What is NULL?

If you ever studied normalisation of databases, you have probably come to the same conclusion as I have regarding NULL: It is best if NULL values in the database can be avoided but it is not always easy to achieve a NULL-free database. Let’s look at an example:

An object has certain properties. A vehicle has a number of gears, a certain length and height, a wheel size etc. Typically there’s just one value for each of these properties – a vehicle has a fixed number of gears, its length is fixed etc. So if you want to create a data model for a vehicle, you would probably create a Vehicles table, with Length, Height, Gears and so forth as columns. So far, so good. But what if some properties are not applicable for certain vehicles? Like a Tesla car, which doesn’t use the type of transmission we are used to. What if the number of gears is a configurable option, not set until a buyer wants to become the owner of a specific vehicle? If you try to model your database to avoid NULL values, you would have to rethink your data model. The Vehicles table would maybe have to become less wide, and certain properties be taken out of the table and put into a VehiclePropertyValues table instead. If a certain property value is missing, there will simply be no row for that property value in the VehiclePropertyValues table. This is all fine, but it makes reporting slightly less intuitive – SQL queries against the database would have to contain a JOIN to the VehiclePropertyValues table, values in the VehiclePropertyValues table would have to be pivoted to fit a report with fixed columns, etc.
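As a sketch of what such a remodelling could look like (table and column names are made up for illustration):

```sql
CREATE TABLE dbo.Vehicles(
  VehicleID int PRIMARY KEY,
  ModelName varchar(100) NOT NULL
);
CREATE TABLE dbo.VehiclePropertyValues(
  VehicleID int REFERENCES dbo.Vehicles(VehicleID),
  PropertyName varchar(50),
  PropertyValue varchar(100),
  PRIMARY KEY(VehicleID, PropertyName)
);
--A Tesla simply has no 'Gears' row; reporting needs a JOIN and possibly a PIVOT
```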

This is a rather simple example, but already with a simple data model, you are likely to end up with NULL-able columns.

To me, as a database designer and developer, NULL values are not forbidden. They are sometimes a problem, but my reality is that some tables will contain NULL values. In this blog post, I will discuss which possible implications NULL values can have and clear out some misunderstandings about NULL.

First of all – What is NULL? What does it mean?
To me as a user, NULL can mean one of two things. It can be used to indicate that a value is MISSING or NOT APPLICABLE. It can also be used to indicate that a value is UNKNOWN. I can usually tell which type of NULL is which. Sometimes I have to make an assumption, but usually I know if NULL means “MISSING” or “UNKNOWN”. That’s because I know the business requirements. I know that a NULL value in the Gears column means that no gear has been chosen yet – the value is MISSING (so far). But me knowing whether NULL means a value is MISSING or UNKNOWN doesn’t help much when it comes to SQL Server interpreting a query against a database. SQL Server (or SQL in general – this is not an implementation issue, it’s a SQL language issue) will always treat NULL as UNKNOWN. This is how the SQL language is designed, and there’s nothing we can do about it. Whenever possible, I will try to design the database in such a way that I can treat MISSING differently from UNKNOWN. In the Gears column, I might put a -1 to indicate that the value is MISSING or NOT APPLICABLE. But sometimes (or very often, even), I’m not in charge of designing the database and/or the system surrounding it. I’m supposed to work with whatever data is in the database.
Many words. “Too long, didn’t read” might be your reaction, but just remember this one thing: To a person, NULL would mean either MISSING or UNKNOWN. To SQL, it always means UNKNOWN.

Why does it matter what NULL means?
To answer that question, I would like to keep using the Gears column in the Vehicles table and make an example out of it. I’m faced with answering two questions from the business: How many bikes in our registry have five gears? How many bikes do NOT have five gears? To answer that, most developers would write two queries:

--How many vehicles have five gears?
SELECT
  COUNT(*) AS WithFiveGears
FROM dbo.Vehicles
WHERE
  Gears = 5;

--How many vehicles do NOT have five gears?
SELECT
  COUNT(*) AS NotHavingFiveGears_WRONG
FROM dbo.Vehicles
WHERE
  Gears <> 5;

You see that I labelled the count from the second query NotHavingFiveGears_WRONG. Why WRONG? Because it potentially returns the wrong results. And I say potentially. Like so often – “it depends”. This time, it depends on how we want to treat NULLs. Does NULL mean that the value is in fact UNKNOWN (there will be gears in this vehicle but the number of gears is set at a later stage) or does it mean that the value is MISSING or NOT APPLICABLE (A Tesla won’t have gears at all)?
If the NULL value is to be treated as UNKNOWN, we might have to rewrite the first query as well. It should perhaps take into consideration that a number of the NULL values WILL get the value “5” in the future, and that should be part of our report. But if the business says that UNKNOWN should be treated as “not 5”, our first query will return the correct results, and the second won’t. And that’s because NULL is UNKNOWN to SQL. The WHERE clause in the first query gives us the rows where Gears = 5. That filters out anything which is not “5”, and therefore NULL values are not returned. Fine. And the second query has a WHERE clause which gives us the rows where Gears is NOT “5”. And that should give us all rows which are not explicitly “5”, right? Well, no. It doesn’t. It gives us the rows where the comparison Gears<>5 evaluates to TRUE. All WHERE clauses evaluate to TRUE or FALSE, right? Well, no. Not always. Anything compared to NULL evaluates to UNKNOWN, because SQL doesn’t know if the column contains a certain value or not. The value is UNKNOWN, and therefore the comparison could be TRUE or FALSE. The WHERE clause will only return rows where the comparison is TRUE. And UNKNOWN is not TRUE. Therefore, the rows with NULL values will not be returned for the inequality predicate either.
Do you feel, when reading this, that you have made this error? I know I have. I’m not even sure I get it right every time nowadays either. But I know that I get it right more often than some years ago. I know that because my errors have had implications which were not desirable.
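A quick demo of the effect, sketched with a temp table instead of the hypothetical Vehicles table:

```sql
CREATE TABLE #Vehicles(Gears int NULL);
INSERT #Vehicles(Gears) VALUES (5),(3),(NULL);

SELECT COUNT(*) AS WithFiveGears FROM #Vehicles WHERE Gears = 5;  --1
SELECT COUNT(*) AS NotFiveGears  FROM #Vehicles WHERE Gears <> 5; --1, not 2

--The NULL row is returned by neither query: both predicates evaluate to UNKNOWN for it
DROP TABLE #Vehicles;
```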

So what do we do about these d**n NULLs?
Well. To begin with, we try as hard as we can to only have NULL values which mean UNKNOWN in the database. That’s not always easy, and it could mean we have to remodel the database. If we can’t remodel the database, we could use a constant, like -1 for Gears. But how would that affect an AVG aggregate query? Or a median calculation? Or even “how many transmission boxes do we have to produce based on the number of vehicles sold last year”? You might end up with one problem solved and a number of new, interesting problems.
In the end, you will have to learn to write SQL code which takes NULL into consideration. And each time you consider NULL, you have to understand whether NULL means UNKNOWN or MISSING/NOT APPLICABLE.
I will not try to answer when NULL means a certain thing. Instead, I will just end with a thing which you probably already know and have seen, but which you might not always remember to use. The IS keyword.
The IS keyword is used to compare something with NULL. It works both for equality (IS NULL) and inequality (IS NOT NULL). And it will return TRUE or FALSE, not UNKNOWN, because SQL does know whether a value is a NULL value or not.
So to finalise, I will rewrite the second reporting query to take the NULL value into consideration and in this case treat NULL as “not 5”.

SELECT
  COUNT(*) AS NotFiveGears_CORRECT
FROM dbo.Vehicles
WHERE
  Gears <> 5 OR Gears IS NULL;

An alternative solution would be to use either the ISNULL or COALESCE function to replace NULL values with something else, and then compare that with “5”. This will however be problematic in terms of performance. ISNULL has to be applied to each row, returning either the value of the Gears column for that row OR whatever replacement value we have chosen. The result of that is then compared with “<> 5”. If we have an index on the Gears column, we will end up scanning the whole index, applying the function to each row, thus making the index more or less useless. But semantically, using ISNULL or COALESCE would solve our problem, and therefore I show how it can be done using, in this case, COALESCE. COALESCE and ISNULL are very similar; if you want to compare them, look them up in Books Online. One thing not mentioned there (I think, at least) is how the two behave differently when used together with SELECT INTO.

SELECT
  COUNT(*) AS NotFiveGears_CORRECT
FROM dbo.Vehicles
WHERE
  COALESCE(Gears, -1) <> 5;
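Since I mentioned the SELECT INTO difference: a sketch of the behaviour I’m thinking of, again using the hypothetical Vehicles table. ISNULL over a NULLable column produces a NOT NULL column in the new table, while COALESCE keeps the column NULLable:

```sql
SELECT ISNULL(Gears, -1)   AS GearsIsnull,   --becomes NOT NULL in #VehiclesCopy
       COALESCE(Gears, -1) AS GearsCoalesce  --stays NULLable in #VehiclesCopy
INTO #VehiclesCopy
FROM dbo.Vehicles;
```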

Archiving with the OUTPUT clause

Most database developers have been faced with the task to archive old data. It could look something like this:

CREATE TABLE dbo.Cars(
  CarID int identity(1,1) PRIMARY KEY,
  BrandName varchar(100),
  ModelName varchar(100),
  releaseYear smallint
);

CREATE TABLE dbo.Cars_Archive(
  CarID int,
  BrandName varchar(100),
  ModelName varchar(100),
  releaseYear smallint,
  ArchivedDateTime datetime DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT PK_Cars_Archive PRIMARY KEY(CarID, ArchivedDateTime)
);
And updating a row would often require a stored procedure and some explicit transactions

--Note that this is very simplified.
CREATE PROC dbo.UpdateCars(@CarID int, @BrandName varchar(100), @ModelName varchar(100), @releaseYear smallint)
AS
BEGIN
  BEGIN TRY
    BEGIN TRANSACTION;

    INSERT dbo.Cars_Archive(CarID, BrandName, ModelName, releaseYear)
      SELECT CarID, BrandName, ModelName, releaseYear
      FROM dbo.Cars
      WHERE CarID = @CarID;

    UPDATE dbo.Cars
      SET BrandName = @BrandName,
          ModelName = @ModelName,
          releaseYear = @releaseYear
      WHERE CarID = @CarID;

    COMMIT TRANSACTION;
  END TRY
  BEGIN CATCH
    --You would of course do more than just rolling back the transaction here, but this works as an example
    ROLLBACK TRANSACTION;
  END CATCH
END

This works. But complexity always grows quite a bit when you have to use explicit transactions. And if you (like me) are lazy, you want to write as little code as possible.
And this is where the OUTPUT-clause comes into play.
The OUTPUT clause can be added to UPDATE, INSERT, DELETE and MERGE statements and provides a way to view how rows look before and after a change.
I will provide an example of how the above UPDATE-procedure could be rewritten to a single SQL-statement, thus getting rid of the explicit transaction handling (a single statement is atomic and will either commit or fail, all by itself).

INSERT dbo.Cars_Archive(CarID, BrandName, ModelName, releaseYear)
SELECT CarID, BrandName, ModelName, releaseYear FROM
(
  UPDATE dbo.Cars SET BrandName='SAAB'
  OUTPUT deleted.CarID, deleted.BrandName, deleted.ModelName, deleted.releaseYear
) as t;

The above code will both update the Cars table and archive the old row from the Cars table into the Cars_Archive table.

The syntax of the OUTPUT clause isn’t really that complicated – it resembles a lot of what you can do inside a trigger. You have an inserted and a deleted table, just like in a trigger.
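The same pattern works for archiving deleted rows. With OUTPUT ... INTO, the deleted rows can be sent straight to the archive table, still as one atomic statement (the WHERE condition here is made up for illustration):

```sql
--Archive rows as part of the delete itself
DELETE dbo.Cars
OUTPUT deleted.CarID, deleted.BrandName, deleted.ModelName, deleted.releaseYear
INTO dbo.Cars_Archive(CarID, BrandName, ModelName, releaseYear)
WHERE releaseYear < 1990;
```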

Read more about the OUTPUT-clause here: https://msdn.microsoft.com/en-us/library/ms177564(v=sql.105).aspx

Statistics on ascending columns

I have had an annoying problem for a while. In a database used for a statistical survey system, reporting is painfully slow at the beginning of each reporting period.

The system
The tables contain a few million rows. Ola Hallengren’s index maintenance (which includes UPDATE STATISTICS) is running weekly. Each month is a new reporting period. When a new reporting period opens, there are no rows for the current period. From the first day of the month, we receive input, each input being less than 2000 new rows in the table.

The problem
Reporting of any previous period is always consistent in execution time – around 3 seconds to produce a full report. That’s an OK performance. But when reporting is done for current period early in a reporting period, execution takes up to 10 minutes.

The reason
Looking at the execution plans for the problem queries, I see that there are some lookups for the current period, where the same query for a previous reporting period uses a scan operator. Comparing the estimated number of rows with the actual number of rows shows that the issue is bad statistics. The estimated plan shows 1 row from each of the lookups, while the actual shows some ten thousand rows. Since the queries always have the reporting period as part of the WHERE clause, it’s logical that this happens. We run UPDATE STATISTICS each week. When the statistics are collected, there are zero rows for the current period. So when we add some thousand rows for the current period, the statistics still show zero rows for it. This has been verified using DBCC SHOW_STATISTICS.
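The verification was along these lines, a sketch using the obfuscated table and index names that appear in the job script later in this post:

```sql
--Inspect the statistics histogram for the period index
DBCC SHOW_STATISTICS('myTable', 'ix_Period') WITH HISTOGRAM;
--No RANGE_HI_KEY row for the current period means the optimizer
--has no information about it and estimates very few rows
```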

Possible solutions
I have been looking at different solutions to this problem. One would be to add a query hint, but the system uses an OR-mapper (Entity Framework) and I don’t even think it’s possible to add query hints to Linq2Entities. A stored procedure mapped as a function in Entity Framework would perhaps have worked, but I would rather solve the problem without having to deploy new .NET-code.
Next possible solution would be to enable trace flags 2389 and 2390, to tell SQL Server to try and identify ASCENDING columns and treat them differently in the optimizer (use average estimated number of rows for previous values). This might or might not solve our problem with this specific query (we run four consecutive UPDATE STATISTICS before any increments to the period column, so it’s still not certain that the column would be marked ASCENDING). But 2389 and 2390 have other problems, mainly that the trace flags are enabled on the whole instance, not just one database or one table.
What we did until recently was to manually run UPDATE STATISTICS when the users reported bad performance. Sometimes that wasn’t enough – we also had to rebuild an index containing the reporting period column. But that’s not really a long term solution either, since it requires a DBA/DBDev to be around to quickly respond to the situation.

The “final” solution
What we have done as a long-term answer to this problem is to implement a SQL Agent job which compares the distribution of values in the histogram from DBCC SHOW_STATISTICS with the actual values in the table. If the histogram rowcount differs too much from the actual rowcount (500% difference is the threshold we have used), the job updates statistics on the table and rebuilds a specific index. This job runs once every 10 minutes. When there’s no big difference, the job finishes in a second. When statistics are updated and the index is rebuilt, it takes a little longer, but using an online rebuild, it doesn’t affect system performance too much.
The job is very specific to the table(s) in our system. I have been thinking about making it more general, but haven’t had the time to really do it. It requires some logic to find out which column(s) to group on, which datatype the columns have etc. So for now, a solution specific to our table(s) will do.
Here’s the code for the job (obfuscated in terms of table/column names).

DECLARE @UpperThreshold numeric(10,2)=5;
DECLARE @LowerThreshold numeric(10,2)=0.2;
DECLARE @indexname sysname=N'ix_Period';

CREATE TABLE #histogram
(   RANGE_HI_KEY char(6),
    RANGE_ROWS bigint,
    EQ_ROWS bigint,
    DISTINCT_RANGE_ROWS bigint,
    AVG_RANGE_ROWS bigint);

CREATE INDEX ix_range_hi_key ON #histogram(RANGE_HI_KEY) INCLUDE (EQ_ROWS);

INSERT INTO #histogram
    EXEC ('DBCC SHOW_STATISTICS([myTable],' + @indexname + ') WITH HISTOGRAM');
--Now we have the statistics estimation of number of rows for each period in #histogram

IF (SELECT COUNT(*) FROM(
	SELECT h.RANGE_HI_KEY, COALESCE(h.EQ_ROWS,1) AS eq_rows, d.period, COUNT(*) AS cnt
	FROM #histogram h
	--RIGHT OUTER JOIN to the table in order to catch those periods not in the histogram at all
	RIGHT OUTER JOIN myTable d
	ON h.RANGE_HI_KEY = d.period
	GROUP BY h.RANGE_HI_KEY, h.EQ_ROWS, d.period
	HAVING cast(COUNT(*) AS numeric(10,2)) / cast(COALESCE(h.EQ_ROWS,1) AS numeric(10,2)) > @UpperThreshold
	    OR cast(COUNT(*) AS numeric(10,2)) / cast(COALESCE(h.EQ_ROWS,1) AS numeric(10,2)) < @LowerThreshold
	) as t) > 0
BEGIN
	PRINT 'Index ix_Period ON myTable is rebuilt and statistics for myTable are updated';
	EXEC('ALTER INDEX ' + @indexname + ' ON myTable REBUILD');
END
DROP TABLE #histogram;

Generate time slots

Here’s an inline table valued function (TVF) for generating time slots from a start date to an end date, given a slot length in minutes.

This would be useful for many applications, like scheduling systems, sales statistics broken down into certain slices of time etc. The function does have some limitations, e.g. there can’t be more than 100.000 minutes between start and end time. This is easily fixed by just adding another CROSS JOIN to cte2, or by changing the DATEADD functions to use hour instead of minute if that fits your purpose.

The function uses a numbers table together with DATEADD to generate the time slots. I’ve seen tons of numbers table solutions using CROSS JOINs over the table sys.columns. This solution is faster, since it only holds the numbers table in memory and won’t have to do any logical reads or scans against sys.columns. My solution uses zero logical reads, compared to 12 logical reads for 3,600 time slots using sys.columns CROSS JOIN sys.columns. Not a dramatic difference, but if you CROSS APPLY this function with some large dataset, you might end up with a more dramatic performance difference.


Anyway, here’s the code for the function.


CREATE FUNCTION GetTimeSlots(@starttime datetime, @endtime datetime, @interval int)
RETURNS TABLE
AS
RETURN
--Generate the numbers 0 to N-1, where N is the number of slots
WITH cte AS (
	--Ten rows from an in-memory table value constructor; five CROSS JOINs give up to 100,000 rows
	SELECT n FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS t(n)
), cte2 AS (
	SELECT TOP(DATEDIFF(minute,@starttime, @endtime) / @interval)
		ROW_NUMBER() OVER(ORDER BY c1.n)-1 AS rownum
	FROM cte c1
	CROSS JOIN cte c2
	CROSS JOIN cte c3
	CROSS JOIN cte c4
	CROSS JOIN cte c5
)
SELECT
	DATEADD(minute,rownum*@interval,@starttime) AS slotfrom,
	DATEADD(minute,(rownum+1)*@interval,@starttime) AS slotto
FROM cte2;
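Once created, the function can be used like this (the dbo schema is assumed here):

```sql
--Generate 30-minute slots across one working day.
--DATEDIFF gives 540 minutes, so this should return 18 slots: 08:00-08:30 through 16:30-17:00.
SELECT slotfrom, slotto
FROM dbo.GetTimeSlots('2015-09-01 08:00', '2015-09-01 17:00', 30);
```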

DELETE and Non Clustered Indexes

I’ll start off with a disclaimer: I’m going to tell you about something that happened in a specific system environment. There’s no general advice you can build on this specific scenario. I’m just posting it because I was myself surprised by what order of magnitude I was able to speed up a specific query by removing some of the work from the execution plan.

The other day I helped troubleshoot a database system. In a table with some 400 million records, a subset (50-60 million records) were to be deleted. The application kept timing out on this delete operation, so I advised the developer to split the delete operation into smaller chunks. I even helped write a T-SQL script to perform the delete in one-million-row chunks. The script was pretty basic: a WHILE loop which checked if any rows fulfilling the WHERE condition of the delete were left in the table, and inside the loop a DELETE TOP (one million) followed by an explicit checkpoint.
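A sketch of that kind of chunked delete. The table and column names here are made up for illustration; the real system used different ones, and the simple recovery model is assumed:

```sql
--Hypothetical names: dbo.BigTable and the SomeDate condition stand in for the real ones.
WHILE EXISTS (SELECT 1 FROM dbo.BigTable WHERE SomeDate < '20140101')
BEGIN
    DELETE TOP (1000000)
    FROM dbo.BigTable
    WHERE SomeDate < '20140101';

    --With the simple recovery model, an explicit checkpoint lets log space
    --be reused between chunks instead of growing the log file.
    CHECKPOINT;
END;
```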

I was sure that this script would do the trick, but I found that just deleting one million rows was a 20-minute job. 20 minutes for one million rows. And the table has 50-60 million rows to delete. That’s something like a 15-hour job. No wonder the application kept timing out… Looking at the query plan for a delete of one million rows, I saw that a Sort operator and a non-clustered index delete took 70% of the estimated cost, and I had a warning about a spill to tempdb on the Sort operator. Knowing something about the disk latency for tempdb, I could also tell that this part of the execution plan was in fact taking more than 70% of the time.

So I started advising on partitioning the data, in order to switch out the data in a specific partition to a staging area and truncate it. Though this would be the “ultimate” solution in terms of performance, it should also be said that this is a rather advanced topic and it would require more future maintenance than not partitioning the data.

I had the Sort and non-clustered index delete operators in the back of my head and decided to try disabling the non-clustered indexes before performing the delete. I tested how long a non-clustered index rebuild would take, and to my surprise I saw that it was only an eight-minute job. Eight minutes to rebuild an NCI on 400 million rows, compared to 15 hours for deleting 50-60 million rows. So I built another T-SQL script, which disabled the NCIs on the table, performed the delete operation and then rebuilt the NCIs. I tested only with deleting one million rows. The result: zero seconds for disabling the NCIs. Nine seconds for the delete (!!!). Eight minutes for rebuilding the indexes. If I could delete one million rows in nine seconds, I was looking at something like 7 minutes for deleting all 50-60 million rows. That’s 0.8% of the time it would have taken to just perform the delete. If I add the time it takes to rebuild the indexes, I’m up to a 15-minute job, compared to a 15-hour job before disabling the indexes.
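The disable/delete/rebuild pattern could look roughly like this, again with hypothetical names. Note that ALTER INDEX ALL … DISABLE would also disable the clustered index and take the table offline, so the non-clustered indexes are named individually:

```sql
--Hypothetical index/table names. Disabling an NCI drops its data but keeps its
--definition in the metadata; REBUILD re-enables it.
ALTER INDEX ix_SomeDate ON dbo.BigTable DISABLE;

--The chunked delete runs here without any NCI maintenance overhead
DELETE TOP (1000000)
FROM dbo.BigTable
WHERE SomeDate < '20140101';

ALTER INDEX ix_SomeDate ON dbo.BigTable REBUILD;
```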

BUT, someone would argue: if I disable and then rebuild the indexes, the table would have poor query performance for the first seven minutes, and in the worst case it would be offline for the following eight, during the index rebuild (if it’s not done online). But imagine the query performance on a table in which a delete operation goes on for 15 hours.

The disk latency on tempdb, and actually also on the transaction log, is pretty high on this machine, and that’s something which slows down performance. But even with an OK or even good disk configuration, I would say that disabling the NCIs before the DELETE would save us a lot of time.

I’m not going to argue that you should ALWAYS do this or that, but if you find yourself struggling with updates, inserts or deletes which go on forever, check your execution plan to see how much of the work really has to do with the DML operation itself and how much has to do with updating/inserting/deleting in non-clustered indexes. Before going to your manager and asking for better hardware, look at what you can do to optimize large workloads with a mix of DDL and DML operations. In my case, we had a spill to tempdb already with one million rows. If I had tried fighting that with more memory on the server, I would have had to ask for 150GB more memory in a box with 3GB of available memory. Hardware is sometimes a real bottleneck, but I would argue that you should fix your code before adjusting the hardware.

Partitioning the table and/or implementing compression on the table would probably help even more. But these are both Enterprise features, which come with a rather high cost. Not every organisation can afford that. If you don’t have the knowledge to troubleshoot really problematic queries yourself, it’s probably worth investing in a few days of expert help, before wasting money on new hardware or more expensive licenses.

Impressions from SQL Saturday #433

I’m currently on a train from Gothenburg back home to Enköping. I have attended my first SQL Saturday (thanks Mikael Wedham and crew for a great event!). I also did my first ever public SQL presentation at the event – a session about SQL Server partitioning.

The presentation and demo scripts can be downloaded from http://www.sqlsaturday.com/433/Sessions/Details.aspx?sid=38722

Initially, I was scheduled to have my session at 10:15 – the second session slot of the day. The night before the conference – while on the way to a fantastic Speakers’ Dinner out on the island Vinga – someone told me to check the schedule again, as there had been some last-minute changes to it. I did, and found myself scheduled for the last session slot, at 4PM.

When I started prepping for the session in the morning, my PC was slow. Not a little bit slower than usual, but painfully slow. A query which normally takes about 15 seconds (which, by the way, is a pretty long time to fill out with chatting when you’re a speaker) now took a full minute. Not only would my presentation be filled with “Now we’ll just have to wait a minute, does someone have a nice joke to tell while we wait” moments – the time for the presentation just wouldn’t be enough. I was already somewhat nervous about presenting, and this sure didn’t make it better. After looking at performance counters for a while, I unplugged my power supply and bam! My PC was back to full speed again and I could see the CPU running at full speed. Perfect! Or at least that’s what I thought for a short while, until I realised that some of my demos are rather CPU- and IO-intensive and I wasn’t sure my battery would last the full session. There wasn’t much to do about that – I would just have to hope for the best, and plug in the power supply as a last resort if the battery gave up.

With the preparations more or less done the first session kicked off. I attended Cathrine Wilhelmsen’s session “Biml for Beginners: Generating SSIS packages with BimlScript“. A conference couldn’t have had a better start. If you ever get a chance to listen to Cathrine, just do it!

Next session, instead of presenting myself, I listened to Bob Duffy’s “When good SQL design goes bad” – a presentation about when Best Practice simply isn’t “Best”, not even “Good” or “Decent”.

The next two sessions were given by Håkan Winther and Steinar Andersen from SQL Service. I have listened to both before and knew they have both technical and presentation skills. Håkan presented “How to kill SQL Server Performance“, showing how to make your hardware work really slowly, in order to convince your manager to get new hardware 🙂 Steinar presented “SQL Server Magic buttons: What are Trace Flags, and why should I care“. From both these sessions, I take a few very specific points with me to some very specific problems at work.

I skipped the 3PM session and made some final adjustments to my own presentation instead, had some coffee, fixed some resolution problems with the projector in the session room and started waiting nervously. Would anyone show up? Did I want many people to show up? What if everyone in the room knows much more about partitioning than I do? As we got closer to 4PM, the room filled up. I had promised the organisers to give some practical information. The same moment I said “sure, I can do that”, I totally forgot about it.

My presentation started and everything went really smoothly. Every demo worked, and I got some good questions from the audience. But I realised that when I practiced my presentation, I hadn’t noted times for any of the presentation steps, so I didn’t really know if I was ahead of or behind schedule. I started thinking my presentation would finish too soon, and it probably would have if I had kept up the speed of the first few slides. So I stopped rushing through the slides and took some time to elaborate on some of the points. In the end, I wasn’t ahead of schedule at all – I had to skip most of the last demo and rush over a few things. But I did save some time to tell a really nerdy SQL DBA joke, got myself a round of applause and some good feedback on my way to the raffle.

First ever public SQL presentation done! There are a few things I would change if I give this presentation again – I would spend some more time showing the internals of partitions and more time elaborating on partition elimination. But the main points I’m taking with me for future presentations are:

  • Plan for things to go wrong. Had I not found out that it was the power supply which made my PC slow, I would have been in trouble. In this particular case, I could (thanks for the advice, Steinar) have created one or more databases and scripted them to certain states, in order to have a backout plan if, e.g., an index rebuild takes too long to wait for.
  • Practice the presentation back to back and note some checkpoints, so you know whether finishing topic A after 20 minutes is too soon or too late.
  • Decide which parts of the presentation could be skipped – or save all the optional stuff for the last slides. Not getting through the whole material is no disaster – no one but you knows what you did NOT talk about.
  • Don’t be intimidated by the audience being really smart. They came there by choice and they all read the session abstract before, so they know what you are going to talk about.
  • Interact. There will be people in the room with specialist knowledge about some details. Use their knowledge to improve the session. In the end, you might end up learning new things yourself during the presentation, which you can put into your next presentation.
  • If you can, find someone to listen to your presentation before the conference.

Finally, if you ever get a chance to speak at a SQL Saturday, do it. If you can’t (or do not want to) speak, attend the next SQL Saturday near you. Free training and a chance to network with other nerds is an opportunity you should take.


© 2019 tsql.nu
