Performance tuning

14 September, 202114 September, 2021

Video: T-SQL to Generate Time Slots. FAST, with a Tally Table

In my previous post, I demonstrated how we can use a Tally Table, or a numbers table, to generate all possible combinations of four letter codes from the letters A-Z.

Today, I have another example of the usage of Tally Tables. I have used this approach many times, to generate a calendar dimension for a Data Warehouse. Or more often, when I need to do reporting with fixed periods (hourly, daily, weekly etc) but there’s no underlying data for some of the periods.

In the video, I show how the technique with generating variable length time slots work.

As in my previous post, I have finalised the code and made a function that you can use. It’s an inline table valued function, so you can join or cross apply to it without having to worry too much about performance issues with it. Having said that, cardinality estimation may or may not be a performance issue for you if you use this function to drive eg a report. Because SQL Server might make some funky assumptions about the number of rows returned from the query. But let’s first look at the code for the actual function.

CREATE FUNCTION dbo.GenerateTimeSlots(@Start DATETIME, @End DATETIME, @IntervalLengthSeconds INT)
RETURNS TABLE AS
RETURN(
WITH ten AS(
	SELECT n FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)
), millions AS (
	SELECT 
	TOP(DATEDIFF(SECOND,@start,@end)/@IntervalLengthSeconds)
	ROW_NUMBER() OVER(ORDER BY (SELECT 'no specific order')) AS n 
	FROM ten t1
	CROSS JOIN ten t2
	CROSS JOIN ten t3 
	CROSS JOIN ten t4 
	CROSS JOIN ten t5 
	CROSS JOIN ten t6
	ORDER BY n
)SELECT 
	DATEADD(SECOND,@IntervalLengthSeconds * (n-1),@start) AS TimeSlotStart, n AS TimeSlotNumber
FROM millions);

Depending on how you call this function, you may or may not suffer from bad cardinality estimation. Let’s start with an example where SQL Server estimates cardinality properly

SELECT * FROM dbo.GenerateTimeSlots('2020-03-11',CURRENT_TIMESTAMP, 3600);

In the above call, SQL Server estimates 13255 rows to be returned from the query. Which turns out to be exactly right (depending on when you run it of course, since I’m using CURRENT_TIMESTAMP for the end-date).

Now, try calling the function with these lines of code.

DECLARE @IntervalLengthSeconds INT=3600;
SELECT * FROM dbo.GenerateTimeSlots('2020-03-11',CURRENT_TIMESTAMP,@IntervalLengthSeconds)

This time, SQL Server estimates 100 rows to be returned from the query. Which of course isn’t correct. It will still return 13255 rows. This is because SQL Server sees a variable for the @IntervalLengthSeconds parameter and doesn’t know the value of the parameter.

If you WILL use a variable for any of the parameters, and you know how many time slots you _most often_ will return from the query, you can make an adjustment to add your own estimation using the query hint OPTIMIZE FOR. Like this:

DECLARE @IntervalLengthSeconds INT=3600;
SELECT * FROM dbo.GenerateTimeSlots('2020-03-11',CURRENT_TIMESTAMP,@IntervalLengthSeconds)
OPTION(OPTIMIZE FOR(@IntervalLengthSeconds=3600));

In the above example, this of course looks a bit stupid, because I set the variable value to a constant and then optimize for the variable to have that value. It would be much easier to just call the function using that constant. But you might pull up that value from a table in a previous statement, and in that case, you can use this technique to force a certain value, for the cardinality estimation.

Happy playing with Tally Tables!

13 November, 201913 November, 2019

I spoke on SQL Server User Group Sweden

SQL Server User Group (SQLUG) Sweden had a meeting on October 24th. This one felt a bit special to me, as it was kind of a replay of my first time public speaking, in two ways. My first time speaking in public about SQL Server was on SQLUG Sweden, a few years ago. SQLUG Sweden arranged a Local Community Edition, with local speakers. That first time, it was me and Daniel Hutmacher (Twitter , Blog) speaking. And on October 24th, it was, again, me and Daniel Hutmacher speaking.

10 February, 201731 October, 2019

OPTION(USE HINT) – New SQL Server 2016 SP1 feature

Many SQL Server developers and admins found, after upgrading to SQL Server 2014, that some queries started taking much longer time than before. The reason is the new cardinality estimation formula which was introduced in SQL Server 2014. Cardinality Estimation is done all the time by the SQL Server optimizer. To produce a Query plan, the optimizer makes some assumptions about how many rows exist for each condition in the table. In most cases, the new cardinality estimation formula in SQL Server 2014 and onwards gives slightly better estimates and the optimizer therefore produces slightly better plans. In some cases however, mostly when there are predicates on more than one column in a WHERE clause or JOIN clause, the 2014 cardinality estimation is a lot worse than in previous versions of SQL Server.

5 November, 201531 October, 2019

Statistics on ascending columns

I have had an annoying problem for a while. In a database used for a statistical survey system reporting is painfully slow in the beginning of each reporting period.

The system
The tables contain a few million rows. Ola Hallengren’s index maintenance (which includes UPDATE STATISTICS) is running weekly. Each month is a new reporting period. When a new reporting period opens, there are no rows for the current period. From the first day of the month, we receive input, each input being less than 2000 new rows in the table.

The problem
Reporting of any previous period is always consistent in execution time – around 3 seconds to produce a full report. That’s an OK performance. But when reporting is done for current period early in a reporting period, execution takes up to 10 minutes.

12 September, 201531 October, 2019

DELETE and Non Clustered Indexes

I’ll start off with a disclaimer: I’m going to tell you about something that happened in a specific system Environment. There’s no such thing as a general advice you can build on this specific scenario. I’m just posting it because I was myself surprised by what order of magnitude I was able to speed up a specific query by slightly removing some of the work in the execution plan.

The other day I helped troubleshooting a database system. In a table with some 400 million records, a subset (50-60 million records) were to be deleted. The application kept timing out on this delete operation so I adviced the developer to split the delete operation into smaller chunks. I even helped writing a T-SQL script to perform the delete in one million row chunks. The script was pretty basic – a WHILE-loop which checked if any rows fulfilling the WHERE-condition of the delete was left in the table, and inside the loop a DELETE TOP(one million) followed by an explicit checkpoint.