I have SQL Server running in a container with /var/opt/mssql saved on a volume. Now I want to clone said volume to a new volume and spin up another container using the cloned volume, but doing so gives me an Access Denied error when starting the second container.
To reproduce the problem
# Create container sql1
docker run --name sql1 -v sql1volume:/var/opt/mssql -p1501:1433 -d -e "ACCEPT_EULA=YES" -e "SA_PASSWORD=Pa55w.rd" mcr.microsoft.com/mssql/server:latest
# Stop sql1 and clone sql1volume
docker stop sql1
docker volume create sql2volume
docker run --rm -i -t -v sql1volume:/original -v sql2volume:/clone alpine sh -c "cp -avr /original/* /clone"
# Start new container using the cloned volume
docker run --name sql2 -v sql2volume:/var/opt/mssql -p1502:1433 -d -e "ACCEPT_EULA=YES" -e "SA_PASSWORD=Pa55w.rd" mcr.microsoft.com/mssql/server:latest
# Check out the logs from sql2
docker logs sql2
Logs will look like this
SQL Server 2022 will run as non-root by default.
This container is running as user mssql.
Your master database file is owned by mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
/opt/mssql/bin/sqlservr: Error: The system directory [/.system] could not be created. File: LinuxDirectory.cpp:420 [Status: 0xC0000022 Access Denied errno = 0xD(13) Permission denied]
Ok, error reproduced. Why does this happen?
It happens because Microsoft did good. Microsoft's SQL Server container images don't run as root. They run as the user mssql, with user-id 10001. This is awesome, just as awesome as _not_ running SQL Server for Windows with a service account that is local admin on the server. However, the alpine container we use to clone the volume doesn't run as user mssql. In fact, it doesn't know who mssql is.
Thanks for the lesson. Now, how do we solve this mess?
It isn't that hard to fix, really. We need to create a user with user-id 10001 in our alpine container, and we need to change the ownership of the files on the cloned volume to that user. So let's clean up and start cloning the volume again, this time creating the user too.
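A sketch of the updated cloning process – adjust names and flags as needed:
# Clean up the failed attempt and the cloned volume
docker rm sql2
docker volume rm sql2volume
# Clone again, this time creating user-id 10001 in the alpine container and
# handing ownership of the copied files to that user. adduser will prompt
# for a password - type anything, the container is thrown away afterwards.
docker volume create sql2volume
docker run --rm -i -t -v sql1volume:/original -v sql2volume:/clone alpine sh -c "adduser -u 10001 mssql; cp -avr /original/* /clone; chown -R mssql /clone"
# Start the new container and check the logs again
docker run --name sql2 -v sql2volume:/var/opt/mssql -p1502:1433 -d -e "ACCEPT_EULA=YES" -e "SA_PASSWORD=Pa55w.rd" mcr.microsoft.com/mssql/server:latest
docker logs sql2
This time, sql2 starts up and the log ends with a successful recovery: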
2023-08-31 23:23:10.38 spid65s [4]. Feature Status: PVS: 0. CTR: 0. ConcurrentPFSUpdate: 1. ConcurrentGAMUpdate: 1. ConcurrentSGAMUpdate: 1, CleanupUnderUserTransaction: 0. TranLevelPVS: 0
2023-08-31 23:23:10.38 spid65s Starting up database 'msdb'.
2023-08-31 23:23:10.49 spid47s [2]. Feature Status: PVS: 0. CTR: 0. ConcurrentPFSUpdate: 1. ConcurrentGAMUpdate: 1. ConcurrentSGAMUpdate: 1, CleanupUnderUserTransaction: 0. TranLevelPVS: 0
2023-08-31 23:23:10.49 spid47s Starting up database 'tempdb'.
2023-08-31 23:23:10.81 spid47s The tempdb database has 8 data file(s).
2023-08-31 23:23:10.82 spid62s The Service Broker endpoint is in disabled or stopped state.
2023-08-31 23:23:10.82 spid62s The Database Mirroring endpoint is in disabled or stopped state.
2023-08-31 23:23:10.83 spid62s Service Broker manager has started.
2023-08-31 23:23:10.83 spid45s Recovery is complete. This is an informational message only. No user action is required.
Final comments
This was actually pretty easy, but it wasn't so easy to find on the interwebs, thus this blog post. When running the updated cloning process, the alpine container (or rather the adduser command) will ask for a password. Just type anything there – that password will only end up in the passwd file of the alpine container, which is thrown away as soon as it has finished copying the files. But it's a bit annoying – getting prompted for passwords makes it impossible to automate the process. There's probably a flag for adduser to skip setting a password, but I didn't find it at first glance and it's getting late here. Tomorrow perhaps…
If you already run containers, and even SQL Server in containers, this post is not for you. This is something to get you started.
And I will start very basic. You heard the word “Container” and you heard it's good. But you wonder: what the heck IS a container? Great. This is for you.
I created a video about creating your very first container. This post contains some more detail, and it's a lot easier to copy/paste from a blog than from a YouTube video.
What the heck is a Container?
A container is a way to run a process, or a program. In my examples, that program is going to be the SQL Server executable. The surrounding operating system lends the container some resources, like a namespace, so that whatever is running inside the container can't see everything in the host operating system. Once inside the container, the processes can't see what's running outside of it. The operating system also controls how many resources the processes inside the container can use – how much CPU, how much RAM and so on. Since the operating system gives the container some resources of its own, it can look to the processes running inside the container as if they are running on a host of their own – with its own hostname.
I heard this is all Linux stuff, I don’t know how that works
It is true that containers are mostly a Linux thing. You can run Windows containers as well, but the Windows operating system isn't as good as Linux at creating this isolated environment for processes, which means Windows containers are substantially larger. That means more time to create them and more resources to host them. Therefore, containers are mostly a Linux thing. But you don't need to know all that about Linux. If you run SQL Server in a container, you need to know something about SQL Server on Linux – but not much, really. If you run a .NET Core application in a container, you just need to know about .NET Core. The Linux operating system, and how to configure it to be able to run the containers – that's what you have container software for.
How do I run a container?
To be able to run containers on Windows 10 or Windows 11, you need software to maintain and run the containers. I prefer Docker Desktop for Windows, which you can download from docker.com. You also need a Linux environment running inside your Windows operating system, either in a Hyper-V VM or in Windows Subsystem for Linux version 2 (WSL2). These are Windows features which you simply enable. Once you have enabled them, you configure Docker Desktop to use either Hyper-V or WSL2. If you run in Hyper-V, you don't need to create, configure or even log in to the Linux VM. Docker Desktop takes care of that for you.
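For the WSL2 route, enabling it looks roughly like this on recent Windows builds (a sketch – run from an elevated PowerShell prompt, expect a reboot, and check the Docker Desktop documentation for the current prerequisites):
# Enables WSL2 (and installs a default Linux distro) in one go
wsl --install
# Or enable the underlying Windows features individually
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform -All
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux -All
# If you prefer the Hyper-V backend instead
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All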
Ok, I have Docker Desktop, now what?
Once you have installed and configured Docker Desktop, you're ready to start playing with containers. I will use SQL Server containers. But there are container images for basically everything. In some of my demos, I use a .NET Core SDK container to build a .NET Core application, which I then run in an ASP.NET Runtime container. Imagine something that can run in Linux. That something can run in a container, and there's a 99,9% chance someone else already wanted to do it and created a container image for it. And despite what some people may think of people, people are mostly friendly and like to share things, so look and you'll find.
Some simple docker commands to get you started
Fire up your favourite terminal and start playing. Here are some commands to get you started.
# Pull, or download if you will, the layers that make up the
# container image for SQL Server 2022, latest build.
# Think of "pull" as "download installation media". Or even
# "download a snapshot of a preinstalled VM with SQL Server".
# Only there's no VM, because a container is not a VM, it's just
# isolated processes running in Linux.
docker pull mcr.microsoft.com/mssql/server:2022-latest
# Use docker create to create a container from an image.
# If that image isn't already downloaded,
# docker create will take care of the pull for you.
# parameters to docker create
# -e Environment variables to create inside the container
# -p Port-mapping. Map local port X to container internal port Y
# -v Volumes. Mount a persistent docker volume to a path
# inside the container file system.
# Put the name of the image you want to create the container
# from last in the docker create command
docker create -e"SA_PASSWORD=Pa55w.rd" -e"ACCEPT_EULA=Y" -p1450:1433 -v datavolume:/var/opt/mssql --name myveryfirstcontainer mcr.microsoft.com/mssql/server:2022-latest
docker start myveryfirstcontainer
# docker start will start a container created with docker create
# docker stop will stop a container that is running
# docker rm will remove a container. Stop it before removing
# docker ps will show you running containers
# docker ps -a will show you all containers
# docker run is a combination of docker create and docker start.
# You probably want to use the -d switch to start the container detached
# Otherwise your terminal will run the container process interactively and when
# you close the terminal, your container dies
docker run -e"SA_PASSWORD=Pa55w.rd" -e"ACCEPT_EULA=Y" -p1450:1433 -v datavolume:/var/opt/mssql --name myveryfirstcontainer -d mcr.microsoft.com/mssql/server:2022-latest
# finally, docker exec. It will run a command inside the container.
# if you want to run it as "fire and forget", or rather if you want an unattended
# execution, you want to run it with -d, detached that is.
docker exec -d myveryfirstcontainer touch /var/opt/mssql/emptyfile
# Sometimes you want to run commands interactively inside the container
# Then you want a shell. Bash or sh for example.
# Then you want to run them with the -i and -t switches.
docker exec -it myveryfirstcontainer /bin/sh
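And since it's SQL Server running in my containers, the command I often run interactively is sqlcmd. A sketch – the path to sqlcmd inside the container varies between image versions (newer images ship mssql-tools18 and may need the -C switch to trust the self-signed certificate):
# Run sqlcmd inside the container (older images; for newer ones try /opt/mssql-tools18/bin/sqlcmd -C)
docker exec -it myveryfirstcontainer /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "Pa55w.rd"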
Happy playing with containers. Once you have played some, you will want to learn some more. When that happens, there's a chance I have written some more on the topic and you will return here. I of course didn't learn in a vacuum. Andrew Pruski knows a lot more than I do about containers, and what he writes and presents will take you to the next level once you start playing. Start on Andrew Pruski's website: DBA From The Cold | Ramblings on working as a SQL Server DBA
The title of this post: A while ago, I did a Teams presentation for my son's class, about my occupation. And to make it a bit more fun, I told them I'm a Database Specialist and a YouTuber. The latter didn't amuse my son. He was upset when he came home from school. "You don't even have 1000 subscribers on your channel, you're not a 'tuber'," he said. I still don't have 1000 subscribers, but it's getting closer, so maybe I'll have to start printing swag and call myself a SQL Influencer instead of a consultant soon 🙂
TL;DR: Keep videos short if you want viewers to watch the whole thing. That's really hard if the video is a recording of a 60 minute long presentation 😂
Since I started organising and hosting SQL Friday, I have recorded every presentation. Most of them are published on my YouTube-channel (the rest are editing-work-in-progress and will show up shortly).
“Rank” 3 and 4 are my own videos, so not SQL Friday recordings. They are about temporal tables and about tally tables.
YouTube Studio offers some more analytics than just the views, and this is where it becomes interesting and actionable for content publishers. As a publisher, you probably want viewers to watch the whole video, right? Sorry for you, but that's not going to happen. At least not all viewers. People are going to watch the beginning and then drop off. Or watch some sections but fast-forward through other sections. Or watch 20 minutes, and then the phone rings and they shut down their browser, etc.
What I found out is, publishing full episodes from SQL Friday is not ideal. SQL Friday presentations are on average just over one hour, including the after-presentation chats which I usually leave in the video. That's almost the length of your typical Hollywood movie. And that's honestly too long for a YouTube video. But what if my goal isn't to get as many views as possible? I mean, of course I want everyone to watch and learn from the fantastic presentations that speakers have given to the SQL Friday members. But I realise that one-hour-long videos about SQL Server have a somewhat narrow possible audience. So I'm happy if just a handful of people, who missed the live presentation, watch the videos end-to-end.
For Deepthi's video about SQL Server internals, the average view time is 8 minutes 45 seconds. That's only 13,4% of the total video, so it must be really bad, right? Not really. Or not necessarily. Because only 1/3 of the viewers watched past the first 40 seconds. That's very, very typical for a YouTube viewer. You watch something, and when the video ends, you get a new suggestion. You start watching, and maybe, just maybe, that wasn't an awesome suggestion, so you skip to the next suggestion. So if viewers watch 13,4% of Deepthi's video, and 2/3 drop off after less than 40 seconds, that means the 1/3 that stays past 40 seconds watch roughly 40% of the video. That's a whole lot better than 13,4%, right?
Using YouTube Studio's analytics section, you can go into depth with every video, and try to figure out what makes people lose attention, or keep watching. For real "tubers", this will have a direct impact on income. For me, I don't run the channel to make money on ads. But I do like it when the people who find the videos keep watching them, because that means they probably like the content and think they can learn something from it.
What’s the ideal video then? I have no idea to be honest 🙂 But it looks like at least “my” viewers stay on the video for a larger percentage of time if the video is around 10 minutes compared to 60 minute videos. So perhaps “Keep it short” is the advice? On the other hand, if the video is a recording of a 60 minute presentation, it’s pretty hard to make it 10 minutes long.
What is Implicit Conversion? Is it good or bad?
I recently started seeing lots of implicit conversions in execution plans in a system for which I oversee databases. Implicit conversions happen when two values with different datatypes are to be compared. Here's first an example of implicit conversion that does NOT affect performance all that much. I'm running the query in the AdventureWorks2014 database.
SELECT
BusinessEntityID,
FirstName,
MiddleName,
LastName
FROM
Person.Person
WHERE
LastName='Sánchez';
What's gonna happen when this query executes? The SQL Server optimizer is going to find an index, ix_Person_LastName_FirstName_MiddleName, with the names as key columns and BusinessEntityID on the leaf level (BusinessEntityID is the clustering key, and the clustering key is always included on the leaf level of non-clustered indexes). So the optimizer is going to choose to do an index seek on that covering index. But before seeking into the index, SQL Server has to CONVERT values, because 'Sánchez' is a varchar value and the LastName column is an nvarchar column. So implicit conversion is going to happen. The optimizer is going to choose to convert the varchar value to nvarchar and then perform the seek operation. The reason for that is that nvarchar has higher precedence than varchar. This makes sense. Nvarchar can store text that varchar can't. Doing it the other way around – converting nvarchar to varchar – could lose data in the conversion and therefore give us the wrong results.
Here's where you can see the implicit conversion in the execution plan.
Hover over the Index Seek operator and you'll see that the varchar value is converted with CONVERT_IMPLICIT to nvarchar(4000). You could also right-click on the operator and look at the properties for the Seek Predicates of the Index Seek operator to find out more.
Will this implicit conversion affect the performance of our query? No, it won't. At least not noticeably. Our constant is converted to nvarchar(4000) and can then be used to seek into the index.
Let’s look at a more problematic example. Let’s run two queries against the Sales.Customer table.
SELECT
CustomerId,
AccountNumber
FROM
Sales.Customer
WHERE
AccountNumber = 'AW00000013';
SELECT
CustomerId,
AccountNumber
FROM
Sales.Customer
WHERE
AccountNumber = N'AW00000013';
What do you think will happen with these two very similar queries? The first one will perform an index seek into the index AK_Customer_AccountNumber. The second one will perform an index scan of the index AK_Customer_AccountNumber.
And it's not only seek vs scan. We get an extra Compute Scalar and a Filter in the lower plan. But the important operators are the Index Scan and the Filter. The tooltip of the Filter operator shows the predicate, and it's the cause of the bad performance.
What happens here? The upper query performs just the way we expect: use the index, seek into it. But in the lower query, we send in an nvarchar value. Since the table column is varchar, we get implicit conversion on the COLUMN, not on the parameter/constant expression. This is really bad. It means we are scanning every single row of the table and passing it on to the Filter operator, where the predicate is evaluated.
Dapper
Now we have looked at what implicit conversion is. We also now know a little more about when it can be bad for us.
I said in the beginning that I saw implicit conversion in Query Store. And it wasn’t the good kind of implicit conversion, where parameters or constant values are converted. I saw implicit conversion on the table side of things, resulting in index scans.
Investigating things got me to queries sent from a service, in a code path using Dapper. Dapper is a library that can be used to map database results to application code objects. It's relatively easy to use, and lots of developers favour it over using SqlClient.SqlCommand directly, or other frameworks like Entity Framework.
Dapper. And parameters.
Dapper provides a really simple way to pass parameters to queries. It looks a little something like this:
var template = new SalesCustomer { AccountNumber = "AW00000013" };
var parameters = new DynamicParameters(template);
var sql = "select CustomerId, AccountNumber from Sales.Customer where AccountNumber = @AccountNumber";
using (var connection = new SqlConnection(connString))
{
var customers = connection.QuerySingle<SalesCustomer>(sql, parameters);
}
See how simple the parameter passing is? Just create a template from a SalesCustomer class with a specific AccountNumber and create a DynamicParameters collection from this template. But the parameters aren't strongly typed. This means Dapper/.NET will choose a datatype for us. This will translate to an sp_executesql call with defined parameters. AccountNumber will be passed in as an nvarchar(4000) parameter to the query.
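If you capture what actually reaches SQL Server (with Extended Events, for example), the call looks roughly like this – the exact text may differ, but the nvarchar(4000) declaration is the important part:
exec sp_executesql N'select CustomerId, AccountNumber from Sales.Customer where AccountNumber = @AccountNumber',
                   N'@AccountNumber nvarchar(4000)',
                   @AccountNumber = N'AW00000013';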
How is this bad?
As we saw in the execution plans above, when we send in an nvarchar value to a query that uses it to compare it to a varchar column, we will get implicit conversion on the table side of things. SQL Server will scan every row of the index and then apply the filter predicate. That’s not an effective use of a covering index. Effective use would have been an index seek.
What should we do then?
We saved a couple of characters of code by not specifying the datatype for the parameters when we created the Dapper code. But we make the database suffer every single time we run the query. What’s the correct way of doing this?
In Dapper, we could instead create an empty parameters collection and use the Add-method to add parameters, with datatype, direction and more. In Dapper, it would look a little something like this.
var parameters = new DynamicParameters();
var accountNumber = "AW00000013";
parameters.Add("@AccountNumber", accountNumber, DbType.AnsiString, ParameterDirection.Input, accountNumber.Length);
var sql = "select CustomerId, AccountNumber from Sales.Customer where AccountNumber = @AccountNumber";
using (var connection = new SqlConnection(connString))
{
var customers = connection.QuerySingle<SalesCustomer>(sql, parameters);
}
In the above example, we explicitly define the AccountNumber parameter as DbType.AnsiString, which will translate to varchar in SQL. Using DbType.String would give us an nvarchar parameter. But in our case, we want to pass a varchar parameter. And while we're at it, we're setting the LENGTH of that varchar parameter, because otherwise it's going to be sent as varchar(8000). For the comparison itself this doesn't matter too much. But the larger the datatypes sent in as parameters, the larger the memory grant you'll need to execute the query. And if you're going to explicitly set the datatype, you might as well explicitly set the length too.
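With AnsiString and an explicit length, the call should instead come out along these lines (again a sketch of what reaches the server, not an exact capture), and the comparison can use an index seek on AK_Customer_AccountNumber:
exec sp_executesql N'select CustomerId, AccountNumber from Sales.Customer where AccountNumber = @AccountNumber',
                   N'@AccountNumber varchar(10)',
                   @AccountNumber = 'AW00000013';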
Conclusion
Be explicit. A few extra lines of code could potentially save you from that 2AM call when you're on call, because that one important batch brought the database to its knees. Use AnsiString for varchar and String for nvarchar.
In part 1 I showed in a video how to create a new temporal table for row versioning in SQL Server. In this post, I will show how to convert an existing table to a system-versioned (temporal) table.
CREATE TABLE dbo.PriceList(
ProductID INT NOT NULL CONSTRAINT PK_PriceList PRIMARY KEY CLUSTERED,
ListPrice MONEY NOT NULL
);
INSERT INTO dbo.PriceList (ProductID, ListPrice)
VALUES
(1,10),
(2,11),
(3,12);
We start with this small table, the same as we used in part 1. Only this table already contains data. The same principles apply to existing tables: we need to add the two columns for start and end dates.
So we could try this.
ALTER TABLE dbo.PriceList ADD
RowStart DATETIME2(7) GENERATED ALWAYS AS ROW START,
RowEnd DATETIME2(7) GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME(RowStart, RowEnd);
But that gives us this result
Msg 4901, Level 16, State 1, Line 12
ALTER TABLE only allows columns to be added that can contain nulls, or have a DEFAULT definition specified, or the column being added is an identity or timestamp column, or alternatively if none of the previous conditions are satisfied the table must be empty to allow addition of this column. Column 'RowStart' cannot be added to non-empty table 'PriceList' because it does not satisfy these conditions.
So we can do this in different ways. We could add the columns as normal columns, set values in them and then decorate them with "GENERATED ALWAYS". Instead, I'm going to add two default constraints to the columns. RowStart can be set to whatever you find suitable. Current date and time perhaps? I'm going with 1900-01-01, to indicate that the actual start date is unknown.
ALTER TABLE dbo.PriceList ADD
RowStart DATETIME2(7) GENERATED ALWAYS AS ROW START CONSTRAINT DF_PriceList_RowStart DEFAULT '1900-01-01',
RowEnd DATETIME2(7) GENERATED ALWAYS AS ROW END CONSTRAINT DF_PriceList_RowEnd DEFAULT '9999-12-31 23:59:59.9999999',
PERIOD FOR SYSTEM_TIME(RowStart, RowEnd);
More important than the value for RowStart is the value for RowEnd. It has to be the max available value for the datatype. Since I'm using datetime2 with the highest resolution, I need to set the date with seven decimals (three for milliseconds, three for microseconds and one for 100-nanosecond intervals).
Now I have prepared the table for system versioning. All that’s left to do is to alter the table to make it system versioned.
ALTER TABLE dbo.PriceList SET (SYSTEM_VERSIONING=ON(HISTORY_TABLE=dbo.PriceList_History));
And now, if we want to, we can drop the default constraints. They are kind of redundant, since the columns are decorated with Generated Always.
ALTER TABLE dbo.PriceList DROP CONSTRAINT DF_PriceList_RowStart;
ALTER TABLE dbo.PriceList DROP CONSTRAINT DF_PriceList_RowEnd;
And finally, let’s change values in the main table.
UPDATE dbo.PriceList SET ListPrice = 100 WHERE ProductID = 1;
If we now check the main table and the history table, we can see that the automatic row versioning is indeed in effect.
SELECT * FROM dbo.PriceList WHERE ProductID = 1;
SELECT * FROM dbo.PriceList_History WHERE ProductID = 1;
Done!
I hope you found this post useful. In part 3, we will look at how we can convert existing temporal pattern tables and make them automagically system versioned, temporal tables. That is: Tables where we have data in both a main table and a history table.
I’m working on a video series about Temporal Tables. In this first video I’m showing how to create a table which has System Versioning switched on. You might have heard the word Temporal Tables. What it means is that System Versioning is switched on for the table, and that all changes to the table are recorded in its history table.
Part 1 of my Temporal Tables video series.
In upcoming videos and blog posts, we’re going to look at some more complex scenarios, such as
Add system versioning to an existing table
Switch from home-cooked versioning to automatic system versioning
Use the FOR SYSTEM_TIME features in Transact-SQL
Performance considerations for temporal tables
Do you have any specific use cases for temporal tables that you want me to bring up in this series? Drop a comment here, or on the YouTube video, and we’ll see what we can do.
Today, I have another example of the usage of Tally Tables. I have used this approach many times, to generate a calendar dimension for a Data Warehouse. Or more often, when I need to do reporting with fixed periods (hourly, daily, weekly etc) but there’s no underlying data for some of the periods.
In the video, I show how the technique with generating variable length time slots work.
As in my previous post, I have finalised the code and made a function that you can use. It's an inline table-valued function, so you can join or cross apply to it without having to worry too much about performance issues. Having said that, cardinality estimation may or may not be a performance issue for you if you use this function to drive e.g. a report, because SQL Server might make some funky assumptions about the number of rows returned from the query. But let's first look at the code for the actual function.
CREATE FUNCTION dbo.GenerateTimeSlots(@Start DATETIME, @End DATETIME, @IntervalLengthSeconds INT)
RETURNS TABLE AS
RETURN(
WITH ten AS(
SELECT n FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)
), millions AS (
SELECT
TOP(DATEDIFF(SECOND,@start,@end)/@IntervalLengthSeconds)
ROW_NUMBER() OVER(ORDER BY (SELECT 'no specific order')) AS n
FROM ten t1
CROSS JOIN ten t2
CROSS JOIN ten t3
CROSS JOIN ten t4
CROSS JOIN ten t5
CROSS JOIN ten t6
ORDER BY n
)SELECT
DATEADD(SECOND,@IntervalLengthSeconds * (n-1),@start) AS TimeSlotStart, n AS TimeSlotNumber
FROM millions);
Depending on how you call this function, you may or may not suffer from bad cardinality estimation. Let’s start with an example where SQL Server estimates cardinality properly
SELECT * FROM dbo.GenerateTimeSlots('2020-03-11',CURRENT_TIMESTAMP, 3600);
In the above call, SQL Server estimates 13255 rows to be returned from the query. Which turns out to be exactly right (depending on when you run it of course, since I’m using CURRENT_TIMESTAMP for the end-date).
Now, try calling the function with these lines of code.
DECLARE @IntervalLengthSeconds INT=3600;
SELECT * FROM dbo.GenerateTimeSlots('2020-03-11',CURRENT_TIMESTAMP,@IntervalLengthSeconds)
This time, SQL Server estimates 100 rows to be returned from the query. Which of course isn’t correct. It will still return 13255 rows. This is because SQL Server sees a variable for the @IntervalLengthSeconds parameter and doesn’t know the value of the parameter.
If you WILL use a variable for any of the parameters, and you know how many time slots you _most often_ will return from the query, you can make an adjustment to add your own estimation using the query hint OPTIMIZE FOR. Like this:
DECLARE @IntervalLengthSeconds INT=3600;
SELECT * FROM dbo.GenerateTimeSlots('2020-03-11',CURRENT_TIMESTAMP,@IntervalLengthSeconds)
OPTION(OPTIMIZE FOR(@IntervalLengthSeconds=3600));
In the above example, this of course looks a bit stupid, because I set the variable value to a constant and then optimize for the variable to have that value. It would be much easier to just call the function using that constant. But you might pull up that value from a table in a previous statement, and in that case, you can use this technique to force a certain value for the cardinality estimation.
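For illustration, something like this – dbo.ReportSettings is just a hypothetical settings table:
DECLARE @IntervalLengthSeconds INT;
-- Pick up the configured slot length from a settings table (hypothetical table and columns)
SELECT @IntervalLengthSeconds = IntervalLengthSeconds
FROM dbo.ReportSettings
WHERE ReportName = 'HourlyUsage';
-- Optimize for the value we most often expect, whatever was actually picked up
SELECT * FROM dbo.GenerateTimeSlots('2020-03-11', CURRENT_TIMESTAMP, @IntervalLengthSeconds)
OPTION(OPTIMIZE FOR(@IntervalLengthSeconds=3600));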
This is my first attempt at short Transact-SQL videos. I was answering a question in a SQL Server related Facebook group. The question was about generating four-letter codes containing all possible combinations of letters A-Z. There are 26 letters in the English alphabet, meaning the T-SQL code should generate 26^4 rows, or 456,976 rows.
But I worked a bit further on the solution, and created a stored procedure which, with parameters, can create codes of any length and also persist the data in a table instead of returning the rows to the caller.
What I’ve done is generate a string with SQL-code, based on the parameters, and then execute the SQL-code with the system stored procedure sys.sp_executesql.
CREATE OR ALTER PROC dbo.GenerateLetterStrings
(
@LetterCount TINYINT,
@ResultTable NVARCHAR(128) = NULL
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql NVARCHAR(MAX)
= N'
WITH alphabet AS (
SELECT CHAR(ASCII(''A'')+ ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) -1) AS n
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)
)SELECT
<<selectlist>> AS word
<<intoclause>>
FROM alphabet AS t0
<<crossjoin>>
' ;
DECLARE @SelectList NVARCHAR(MAX) = N't0.n',
@CrossJoin NVARCHAR(MAX) = N'';
WITH eight
AS (SELECT n
FROM
(
VALUES
(1),
(1),
(1),
(1),
(1),
(1),
(1),
(1)
) t (n) ),
twofiftysix
AS (SELECT TOP (@LetterCount - 1)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM eight
CROSS JOIN eight e2
ORDER BY 1)
SELECT @CrossJoin = @CrossJoin + CONCAT('CROSS JOIN alphabet as t', n, '
'),
@SelectList = @SelectList + CONCAT('+t', n, '.n')
FROM twofiftysix;
SET @sql = REPLACE(@sql, N'<<selectlist>>', @SelectList);
SET @sql = REPLACE(@sql, N'<<crossjoin>>', @CrossJoin);
SET @sql = REPLACE(@sql, N'<<intoclause>>', COALESCE(N'INTO ' + @ResultTable, N''));
EXEC sys.sp_executesql @sql;
END;
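Calling it looks something like this (the table name in the second call is just an example):
-- Return all four-letter codes to the caller
EXEC dbo.GenerateLetterStrings @LetterCount = 4;
-- Or persist them in a table instead of returning them
EXEC dbo.GenerateLetterStrings @LetterCount = 4, @ResultTable = N'dbo.FourLetterWords';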
I’m trying to summarise 2020. Don’t know how to describe the year. It’s been…. rich in content.
This post will be slightly unstructured, as I've tried to write down just the things that come to mind.
Picture: mr_write at morguefile.com
New Year, New Opportunities
2020 started pretty normal. After some time off with the family, I started a new, one-year consultant assignment with the company I was employed at before I started my company. Though I had been away for some years, it was nice to be back. A lot had changed, but even more was the same. Most of all, it was nice to catch up with some very awesome people that I hadn’t seen for a couple of years.
I had set out to visit more SQL Saturdays and other conferences than previous years, and I started with a trip to SQL Saturday Vienna on January 24th (yes, that’s a Friday, the Austrians don’t know their weekdays like the rest of us😊).
Covid-19
I'm not sure when we first heard about Covid-19, but it can't have been long after SQL Saturday Vienna. It started as something in China. Then there were some travel restrictions, and before we knew it, several European countries were reporting the virus spreading out of control.
In my team at work, key people were sent home to work from home, to make sure we could maintain operations if the virus took out the whole office. Within two weeks, the rest of us were sent home. This must have been by the end of March. Initially, we were meant to work from home for a few months, perhaps until summer vacations. And when people came back from summer vacations, the word was that we would be slowly moving back to working in the office instead of working from home. And then came the second wave.
Since the end of March, I haven't set foot in my client's office. Instead, I rented a small office room in my home town in early November, so that I wouldn't have to spend afternoons saying "daddy is in a meeting, please keep your voice down" to my kids. After all, the house is their home more than it is my office. It feels nice to leave the house and go to the office. There are a few more people on the office floor where I rent my room; on a busy day there are as many as four or five. We keep our distance, and I work with the door to my room closed, so it feels safe.
This pandemic will be over at some point. We don't yet know when. Vaccines are being distributed as I write, but it will take months before enough people have been vaccinated and we can relax.
I think this pandemic and the timing of it will change the way many of us work. Obviously, some jobs can’t be done remotely. Teaching and learning will be mainly an in-person activity. Construction workers, nurses and doctors can’t work from home. But for all of us that CAN work from a remote location, I think the “new normal” will be remote work, with fixed days when we meet in a common location. At least I’m not ready to go back to spending 12,5 hours per week commuting anytime soon.
One year working with this client has been really amazing, and I'm happy to have signed a contract for 2021 as well. It's been tremendous team work, as we have migrated lots of systems to new infrastructure, with the goal of improving documentation and becoming more cloud-ready with the applications. I have become more fluent with Powershell and I have learned more about networking, DNS and load balancers than I anticipated. And of course, I have tuned a SQL statement or two. After all, DBA is my main responsibility within the team.
Data Weekender
With SQL Saturdays cancelled or postponed, there was a vacuum in the Data Community. Speakers didn’t have conferences to attend and data professionals had far less learning and networking opportunities when countries went into lockdown mode. I responded to a tweet from Kevin Chant, where he asked if anyone was interested in trying to organise a virtual conference. Damir Matesic, Gethyn Ellis, Asgeir Gunnarsson and Mark Hayes also replied to Kevin’s tweet.
We had a first Zoom meeting to discuss the format of a virtual conference. Mark did some magic with a cool picture from Kevin's honeymoon, where he was sitting in a camper van, and we decided the name of the conference would be Data Community Weekender Europe (which was later changed to just Data Weekender). https://www.dataweekender.com
On this first zoom-call, we also decided on the date for the first edition of Data Weekender. And looking back, I can totally see how unrealistic it was. We were to organise a conference within 30 days, and all we had was a name and a cool picture.
But we did it. We opened up the Call for Speakers and left it open for only ten days, April 8 – April 17. We organised Data Weekender on a zero budget, which meant we didn’t have a marketing budget. Instead, we relied on Twitter and LinkedIn to spread the word. And it worked! We anticipated a conference with two or three tracks but we got 196 submissions from 76 individual speakers and ended up running the conference on six tracks with a total of 42 sessions.
The first edition of Data Weekender was May 2nd and we had roughly 600 individuals participating in the conference.
On October 17th, we did #DataWeekender #TheSQL, with a few more sessions and with some lightning talks. More or less the same number of participants, which we think was a pretty good result, as by then, there were many, many more virtual events being organised.
Thank you Kevin, Damir, Gethyn, Asgeir and Mark! I look forward to The Van running again in 2021!
SQL Friday
Just after the first edition of Data Weekender, I went public with my idea to run a weekly online lunch event, SQL Friday (https://sqlfriday.net) . I didn’t publish a Call for Speakers for the whole season, but instead made sure I had speakers for the first few weeks, started tweeting about it and hoped for the best.
And it worked out pretty well. SQL Friday episode #1 had Damir Matesic as the guest star. The topic was T-SQL and JSON, and it had 120 registered attendees.
The format of SQL Friday is casual. There’s no powerpoint template. There are no sponsors to thank (my company is organising the event and people participating is enough thanks) and speakers can choose to do 60 minutes of demos or 60 minutes of just powerpoint slides. It’s all up to the speaker.
There have been a couple of “bloopers” this season, and I’m sure we will have more of them in 2021. When Mark Hayes had some problems with his audio, I told him for a minute or two that we could now hear him, while he was changing settings. But I was muted, so he didn’t hear me. But the worst one was probably when Gianluca Sartori got a blue screen in the middle of a demo. “Gianluca, hello, are you there?”. But he was back on the call within five minutes and continued like nothing happened. Impressive!
I want to thank all the speakers and attendees for joining me for 29 fridays in 2020. The schedule for January-June 2021 is published and we have 25 really good sessions to look forward to, the first one being on January 8. If you haven’t already, join the Meetup group at https://meetup.com/sql-friday.
SQL User Group activity
With in-person events out of the question, SQL Server Usergroup Sweden went virtual, as did many other groups. And we’re still virtual, at least for the coming few months. Lately, we have started having bi-weekly user group meetings, even when there are no speakers. We just meet and have a chat about work and personal life. It’s nice to see some familiar faces and get that important networking going.
Public speaking
With in-person events cancelled, I have had more opportunities to speak at events and user groups I wouldn't have been able to visit otherwise. For example, I did a talk about dbatools for the Quad City User Group in April, and another one about partitioning for the PASS380 group the day before. It was the middle of the night for me, but I'm a night owl anyway, so it was all right. I "went to" Singapore in the autumn and I spoke at Data Platform Summit in the winter. And I've done a few virtual SQL Saturday talks as well.
In September, I did an in-person talk at SQL Saturday Gothenburg. There were not many attendees – I think Mikael, who organised the event, had set a limit of 50. But regardless of the size of the event, it felt so, so good to be in a classroom and actually meet people, to be able to stay for an informal Q&A session in the hallway after the talk, and to be able to hang out in the hotel bar with Asgeir Gunnarsson and Erland Sommarskog in the evening.
MVP
I can’t summarize 2020 without mentioning I was awarded Microsoft Most Valuable Professional. I know a lot of MVPs say the same, but I was honestly not expecting this award. Thank you so much Damir Matesic for nominating me, and thank you Microsoft for the honor. I’m gonna do my best to continue contributing to the Data Platform community. And perhaps a few contributions to the Powershell community in the year to come.
PASS
The sad news about PASS taking its last breath on January 15th 2021 leaves more questions than answers. Will anyone take over SQL Saturday? Will anyone take over PASS Summit, the largest Data Platform conference in the world? And do we need a global organisation for the Data Platform community? We will see. But I do know that PASS-driven events took me to where I am today. My first data conference was SQL Rally Nordic, in Denmark. That's when I started thinking about public speaking. A PASS user group was where I did my first SQL related presentation. And a PASS SQL Saturday was the first conference I presented at.
Thank you PASS for all these years. And thank you all amazing people within the Data Community. #SqlFamily will remain, we’re yet to see in which shapes.
Health
Despite all the professional development and all the community work I have done in 2020, the most important thing has been my personal health. After some stressful years of starting my company, commuting for hours every day and working a lot more than what's healthy, I see the period of working from home as a real blessing. I mean, I don't see Covid-19 as a blessing, but the consequence has been that I'm home a lot more than before, I spend a lot more time with my family and I have picked up running again.
Running. That used to be an important part of my life. But I haven't prioritised it for years. I have eaten unhealthy food and I have gained weight. In just a few years, I put on 25kg. When I started working from home, I weighed more than 100kg, with a Body Mass Index of 32,5 (obesity). After working from home (or from my nearby office room), I have lost 14kg and my Body Mass Index is now at 28. That's still considered overweight, but it's a lot better than 32,5. And I love running again. The last seven days of 2020, I have run every day. And I'm not finished. My goals for 2021, which I'm sure I will meet, are to lose another 10kg and to run a marathon before 2021 is over.
2021
I don't know what 2021 will bring, other than what I said in the previous paragraph. I do know that there will be another Data Weekender event. And that there will be SQL Friday events. I'm sure that before 2021 is over, there will be in-person data events to visit as well. I have already made a promise to be on the front row when Nikola Ilic makes his first in-person conference presentation, so I guess that's one conference (wherever it may be) that I will visit.
But the biggest take-away from 2020 is to not plan too far ahead. Plans change. Some changes are minor, some are huge. And we will see change in 2021 as well.
I have worked quite a bit (too much?) with SQL Server replication. Mostly transactional replication, and to a lesser extent merge replication.
A very common scenario I come across in both real life and in forum discussions is SQL Server Replication and Availability Groups.
This is the first in a short series of posts about replication and Availability Groups. As time permits, I will also finish posts on the Distributor and Subscriber roles in Availability Groups and finalize with a post on war stories from trying to move a distributor and subscribers from standalone instances to Availability Groups.
I’m not going to give very detailed instructions. That’s already well described in official documentation. Use this page as a starting point: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-replication-for-always-on-availability-groups-sql-server?view=sql-server-ver15
Instead, I'm gonna try to summarize which scenarios are supported and which are not, and give a brief overview of the steps involved. I will also share some of my experience: gotchas and success factors.
Setup Publisher in Availability Group
A publisher can be part of an availability group. Both transactional replication and merge replication are supported scenarios.
Assuming you have already set up the distributor, these are the steps. Order does matter. I'm also assuming you have already set up the Availability Group before configuring the publisher. If that's not the case – i.e. you already have a publisher and then add it to an availability group – that is also supported, and you just skip the steps in this list that have to do with setting up the primary replica as a publisher. A rough T-SQL sketch of the calls follows after the list.
Install replication features on all replicas.
Configure the primary replica and all secondary replicas as publishers in the distributor (with sp_adddistpublisher).
Configure the distributor at all replicas (with sp_adddistributor).
Enable the database(s) for replication (with sp_replicationdboption) in the primary replica.
(Documentation says linked server connections are to be set up from the publisher to push subscribers. I don't do this – more specifically, I don't always know which subscribers I will have when I set up the publications.)
On the distributor, redirect the publisher to the Availability Group listener name (using sp_redirect_publisher).
Validate that redirection works (with sp_validate_replica_hosts_as_publishers). This step will fail for secondary replicas not set up for read access.
It might seem strange that the validation step fails. Usually you don't want red errors in your result. But to get rid of the red errors in the validation, you would need to pay for additional Enterprise licenses for the secondary replicas. If you won't use the secondary replicas for read-only workloads, it's not gonna be worth spending the extra money on those licenses.
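Here's the rough T-SQL sketch of those steps. All server and database names (NODE1, NODE2, AGLISTEN, DIST1, MyPublishedDb) and the password are placeholders, and the exact parameters you need will depend on your environment – use the documentation page linked above as the authoritative reference.
-- On the distributor, in the distribution database: register every replica as a publisher
EXEC sp_adddistpublisher @publisher = N'NODE1', @distribution_db = N'distribution';
EXEC sp_adddistpublisher @publisher = N'NODE2', @distribution_db = N'distribution';
-- On each replica: point it at the (remote) distributor
EXEC sp_adddistributor @distributor = N'DIST1', @password = N'<distributor admin password>';
-- On the primary replica: enable the database for replication
EXEC sp_replicationdboption @dbname = N'MyPublishedDb', @optname = N'publish', @value = N'true';
-- On the distributor: redirect the original publisher to the listener name
EXEC sp_redirect_publisher
    @original_publisher = N'NODE1',
    @publisher_db = N'MyPublishedDb',
    @redirected_publisher = N'AGLISTEN';
-- Validate the redirection (expect errors for secondaries without read access, as noted above)
EXEC sp_validate_replica_hosts_as_publishers
    @original_publisher = N'NODE1',
    @publisher_db = N'MyPublishedDb',
    @redirected_publisher = N'AGLISTEN';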
Test, test, test
I strongly encourage you to test this scenario and also test failing over to all secondary replica hosts.
On a busy system, you will have very, very, very nasty side effects when your publisher isn't able to send stuff to the distributor. You will see transaction logs fill up and you will see REPLICATION as log_reuse_wait_desc in sys.databases. What that means is that your transaction logs will grow, grow and then grow some more, until there's no more disk to grow on and your databases will stop. If you miss this, and get a call in the middle of the night (these things always happen around 2AM), you're gonna find yourself in a fun situation. A fast way to get rid of traces of replication is to set the database to the simple recovery model and then run sp_removedbreplication. But it's an Availability Group you're dealing with here, and the simple recovery model doesn't play well with Availability Groups. So your other option will be to reconfigure replication on the now-primary replica, publish something and then remove replication. Either way, you're in a mess you don't want to deal with in production.
Script, script, script
As part of setting up replication publishers on Availability Groups in your sandbox environment (that is where you should start – if you don't have such an environment, create one, it's worth the effort and money), create scripts and make sure those scripts work.
Script to setup replication.
Script to add secondary replicas including setting up replication on them.
Script to fail back to your original primary replica (where you have a greater chance of logreader agents actually working and not filling up transaction logs).
Script to remove all traces of replication.
The script to remove all traces of replication is an important one. It might have to start with enabling replication – with the distribution database on the publisher – and then removing it, as some commands to remove replication won't work if you can't reach the distributor.
It doesn’t matter if you create SQL-scripts, Powershell scripts or anything else. But please do what you can to automate creating and destruction of replication. Test these scripts. Not just the normal flow, but the odd quirky ones too. Test from the DR-site, to make sure you have firewalls open to everything you need. Test on all replicas. Test what happens when distributor is offline.
Monitor, monitor, monitor
There are a million ways to monitor your SQL Server environment. Choose your favourite flavour. But make sure to monitor your replication setup. Monitor transaction log usage on the publisher side. If your published databases’ transaction log usage remains high after transaction log backups, you want to set off alarms. You want to monitor messages from the Log Reader Agents. Your Log Reader Agents might seem to work just fine, in the way that they are in a running state. That doesn’t mean they are sending anything at all to the distributor. Try to monitor end-to-end. Does an inserted or changed row from a published table in the publisher end up in the distributor? If it doesn’t, you have a problem. And you want to solve that problem before your index maintenance jobs start filling up the transaction log the next night.
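As a minimal starting point for the log monitoring, run something like this on the publisher; if it keeps returning rows for your published databases long after the Log Reader Agent should have caught up, it's time to investigate.
-- Databases where log truncation is currently waiting on replication,
-- i.e. the Log Reader Agent hasn't forwarded all changes to the distributor yet
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE log_reuse_wait_desc = N'REPLICATION';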
Do I hate replication?
Reading my warnings above, it might look like I hate SQL Server replication. That’s not the case. Transactional replication is a much cheaper solution than readonly AG secondaries for reporting scenarios. And you can choose to replicate only the stuff you need for reporting instead of the whole database. And you can add your reporting-specialised indexes on your reporting database. And, and, and…
But be careful. You don't want to be called in at 2AM because your production database died. Especially if the reason it died is that you weren't careful.