Backup to Azure Storage Container – overcome the 200GB limit with striped backups

I was recently tasked with backing up some databases to an Azure Storage Account, using BACKUP TO URL in SQL Server. It is quite simple: create a Shared Access Signature (SAS) for the storage container and then create a CREDENTIAL in SQL Server, using 'SHARED ACCESS SIGNATURE' as the identity and the SAS token as the secret.
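For reference, the credential part might look something like this (a minimal sketch; the SAS token is a placeholder, and in a real setup the credential name is the full https://<storage account>.blob.core.windows.net/<container> URL of the container you back up to):

-- Create a credential whose name matches the container URL used in BACKUP TO URL.
-- IDENTITY must be the literal string 'SHARED ACCESS SIGNATURE';
-- the SAS token goes in as the secret, without the leading '?'.
CREATE CREDENTIAL [https://saunderlandetdatabasebackups/productionbackups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<your SAS token here>';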

In an Azure Storage Container, one writes either page blobs or block blobs. Think of a blob as a file in a file system. When SQL Server backs up to a URL, it creates a block blob. The size limit you run into for a block blob written this way is 200GB. For a large database, the backup file will be larger than 200GB and you will get some funny I/O device errors. The error message will not tell you that you hit the 200GB threshold, it will just tell you there was an I/O failure on the backup device. And that error won't come at the start of the backup; it will come when SQL Server has written 200GB.

Why 200GB? The limit is actually 50,000 blocks per blob, and a block can be at most 4MB: 50,000 x 4MB ≈ 200GB. Use the MAXTRANSFERSIZE parameter of the backup command to set the block size to 4MB (4194304 bytes). Also set the BLOCKSIZE parameter to 64KB (65536 bytes); together these settings maximize the size of a single blob built from those 50,000 blocks.

BACKUP DATABASE [Underlandet]
  TO URL = 'https://saunderlandetdatabasebackups/productionbackups/underlandet.bak'
  WITH BLOCKSIZE = 65536, MAXTRANSFERSIZE = 4194304, FORMAT, INIT, CHECKSUM, COMPRESSION

Ok, now that's quite a limitation, you might think. Large databases get large backup files. Even with compression, large enough databases will have backup files larger than 200GB. You could of course back up different filegroups to different backup files to work around this. But sooner or later, as your database and your filegroups grow, you will hit the 200GB limit.

The way to overcome this limit is to stripe the backup across multiple files. You stripe a backup simply by adding multiple URL=<storage container path> entries to your backup command. Something like this:

BACKUP DATABASE [Underlandet] TO
  URL = 'https://saunderlandetdatabasebackups/productionbackups/underlandet-1-of-2.bak',
  URL = 'https://saunderlandetdatabasebackups/productionbackups/underlandet-2-of-2.bak'
  WITH BLOCKSIZE = 65536, MAXTRANSFERSIZE = 4194304, FORMAT, INIT, CHECKSUM, COMPRESSION

You can stripe a backup across up to 64 files. This gives you 200GB x 64 ≈ 12.8TB, which at least for all the databases I normally work with is enough. If you still hit the size limit, you will have to split your database into multiple filegroups and back up individual filegroups instead of the whole database at once. If your database is THAT large, you probably have it split into multiple filegroups already anyway (some of them read-only and not backed up with the same cadence as the read-write filegroups).
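If you do end up backing up individual filegroups, the command could look something like the sketch below (the filegroup name CurrentData and the file names are made up for the example):

-- Back up a single filegroup, striped across two blobs
BACKUP DATABASE [Underlandet]
  FILEGROUP = 'CurrentData'
  TO URL = 'https://saunderlandetdatabasebackups/productionbackups/underlandet-currentdata-1-of-2.bak',
     URL = 'https://saunderlandetdatabasebackups/productionbackups/underlandet-currentdata-2-of-2.bak'
  WITH BLOCKSIZE = 65536, MAXTRANSFERSIZE = 4194304, FORMAT, INIT, CHECKSUM, COMPRESSION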

Clone a docker volume used by a SQL Server container

Erland asked me to have a look at an MS Q&A question today. The question in short was this:

I have SQL Server running in a container with /var/opt/mssql saved on a volume. Now I want to clone said volume to a new volume and spin up another container using the cloned volume, but doing so gives me an Access Denied error when starting the second container.

To reproduce the problem

# Create container sql1
docker run --name sql1 -v sql1volume:/var/opt/mssql -p1501:1433 -d -e "ACCEPT_EULA=YES" -e "SA_PASSWORD=Pa55w.rd" mcr.microsoft.com/mssql/server:latest

# Stop sql1 and clone sql1volume
docker stop sql1
docker volume create sql2volume
docker run --rm -i -t -v sql1volume:/original -v sql2volume:/clone alpine sh -c "cp -avr /original/* /clone"

# Start new container using the cloned volume
docker run --name sql2 -v sql2volume:/var/opt/mssql -p1502:1433 -d -e "ACCEPT_EULA=YES" -e "SA_PASSWORD=Pa55w.rd" mcr.microsoft.com/mssql/server:latest

# Check out the logs from sql2
docker logs sql2

Logs will look like this

SQL Server 2022 will run as non-root by default.
This container is running as user mssql.
Your master database file is owned by mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
/opt/mssql/bin/sqlservr: Error: The system directory [/.system] could not be created. File: LinuxDirectory.cpp:420 [Status: 0xC0000022 Access Denied errno = 0xD(13) Permission denied]

Ok, error reproduced. Why does this happen?

It happens because Microsoft did good. Microsoft's SQL Server container images don't run as root. They run as the user mssql, with user-id 10001. This is awesome, just as awesome as _not_ running SQL Server for Windows with a service account that is local admin on the server. However, the alpine container we use to clone the volume doesn't run as the user mssql. In fact, it doesn't know who mssql is, so the ownership on the cloned volume doesn't end up the way SQL Server expects.
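If you want to see this for yourself, a quick sanity check (my own, not part of the original question) is to list the original volume with numeric ownership from an alpine container; the data files should show up as owned by user-id 10001, a uid the alpine image has no name for:

# List the original volume with numeric uid/gid instead of user names
docker run --rm -v sql1volume:/original alpine ls -ln /original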

Thanks for the lesson. Now, how do we solve this mess?

It isn't that hard to fix, really. We need to create a user with user-id 10001 in our alpine container, and we need to change ownership of the files on the cloned volume to that user. So let's clean up and start cloning the volume again, this time creating the user too.

# cleanup
docker rm sql2
docker volume rm sql2volume

# Clone again
docker run --rm -i -t -v sql1volume:/original -v sql2volume:/clone alpine sh -c "adduser -u 10001 mssql;cp -avr /original/* /clone;chown mssql -R /clone"
docker run --name sql2 -v sql2volume:/var/opt/mssql -p1502:1433 -d -e "ACCEPT_EULA=YES" -e "SA_PASSWORD=Pa55w.rd" mcr.microsoft.com/mssql/server:latest

# check logs
docker logs sql2

Voila, logs look good!

2023-08-31 23:23:10.38 spid65s     [4]. Feature Status: PVS: 0. CTR: 0. ConcurrentPFSUpdate: 1. ConcurrentGAMUpdate: 1. ConcurrentSGAMUpdate: 1, CleanupUnderUserTransaction: 0. TranLevelPVS: 0
2023-08-31 23:23:10.38 spid65s     Starting up database 'msdb'.
2023-08-31 23:23:10.49 spid47s     [2]. Feature Status: PVS: 0. CTR: 0. ConcurrentPFSUpdate: 1. ConcurrentGAMUpdate: 1. ConcurrentSGAMUpdate: 1, CleanupUnderUserTransaction: 0. TranLevelPVS: 0
2023-08-31 23:23:10.49 spid47s     Starting up database 'tempdb'.
2023-08-31 23:23:10.81 spid47s     The tempdb database has 8 data file(s).
2023-08-31 23:23:10.82 spid62s     The Service Broker endpoint is in disabled or stopped state.
2023-08-31 23:23:10.82 spid62s     The Database Mirroring endpoint is in disabled or stopped state.
2023-08-31 23:23:10.83 spid62s     Service Broker manager has started.
2023-08-31 23:23:10.83 spid45s     Recovery is complete. This is an informational message only. No user action is required.

Final comments

This was actually pretty easy, but it wasn't so easy to find on the interwebs, thus this blog post. When running the updated cloning process, the alpine container (or rather the adduser command) will ask for a password. Just type anything there – that password will only exist in the passwd file of the alpine container, which is thrown away as soon as it has finished copying the files. But it's a bit annoying – getting prompted for passwords makes it impossible to automate the process. There's probably a flag for adduser to not set a password, but I didn't find it at first glance and it's getting late here. Tomorrow perhaps…
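For what it's worth, a non-interactive variant might look something like this, assuming BusyBox's adduser -D flag (which, if I remember correctly, creates the user without assigning a password):

# Clone without a password prompt: -D should skip the password assignment
docker run --rm -v sql1volume:/original -v sql2volume:/clone alpine sh -c "adduser -D -u 10001 mssql; cp -avr /original/* /clone; chown -R mssql /clone"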