Partitioning Tables to Speed up SQL Server Backup and Restores

We’re continuing to cover methods for speeding up your SQL Server database backup and restore processes. There’s one rule that applies to all of these techniques: code changes must not be necessary!

If you frequently need to restore production databases into QA/Dev environments, these techniques may well prove very useful in speeding up the process. If you need some help speeding up your SQL Server backup and restore operations, please contact us or take a look at our services.

Previous posts in this series:

  1. How To Speed Up Backups and Restores in SQL Server
  2. Backups and Restores in SQL Server – Filegroups

What is SQL Server Partitioning?

Partitioning has been a feature of SQL Server Enterprise Edition since 2005, and since SQL Server 2016 SP1 it has also been available in Standard Edition. Briefly, horizontal partitioning splits the data in one (usually very large) table into multiple, smaller parts. In this blog, we’re going to demonstrate how SQL Server Partitioning allows you to target your database maintenance operations at sections of your table data, rather than the whole thing. When backing up and restoring data into QA/Dev environments, this can be hugely beneficial, as you handle only the portions of the data that you really need to restore. You don’t need to change your applications and scripts to take advantage of Partitioning. We’re going to go through the following steps:

  1. Create new filegroups and files
  2. Create a Partition Function and Scheme
  3. Move the data to the new partitions
      • This step is likely to be the most time-consuming and to require the most planning. If you’re considering Partitioning, then chances are you have large tables. In a perfect world, you would have implemented Filegroups and Partitioning when designing your database, not once you’ve filled it with terabytes of data.

We’re using the 10GB version of the Stack Overflow database, which has been modified to include an extra (fictional) table named ‘Audit’ (you can find the script to create that table here, but you’ll need to insert the data yourself). This table keeps track of activity in the system, so it has grown very large over time.
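
For reference, the Audit table looks roughly like this. It’s an illustrative sketch based on the columns used later in this post; the linked script is the authoritative version and may include keys, indexes, or additional columns:

-- Illustrative sketch only; see the linked script for the real definition.
CREATE TABLE [dbo].[Audit]
(
    [RecordId] [int] NOT NULL,
    [UserId] [int] NULL,
    [Date] [datetime] NULL,
    [Notes] [nvarchar](1000) NULL
);
GO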

In order to speed up our backup and restore processes, we’re going to partition the Audit table based on its ‘date’ column so that the data created in historic years is stored in different Filegroups. This will ultimately allow us to configure our backups to only include the Filegroups containing the data that we want to restore. First of all, we need to create some new Filegroups:
    
ALTER DATABASE [StackOverflow2010] ADD FILEGROUP [Year2018AndPrior];
GO
ALTER DATABASE [StackOverflow2010] ADD FILEGROUP [Year2019];
GO
ALTER DATABASE [StackOverflow2010] ADD FILEGROUP [Year2020];
GO

If you’re not familiar with Filegroups, read our previous blog in this series.

The code above simply adds three new Filegroups to our database. We’ve named the Filegroups based on the partitioned data that we intend to store in them, i.e. separated by year, with everything prior to 2019 belonging to a single Filegroup (more on that below). Of course, you may decide to partition on a non-date value, such as an integer.

This leaves us with four Filegroups – the Primary Filegroup, and the three new Filegroups that we’ve just created.
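
If you want to confirm the Filegroups exist before moving on, a quick check against sys.filegroups (not part of the migration itself, just a sanity check) will list them:

-- Lists the database's Filegroups; the three new ones should appear alongside PRIMARY.
SELECT [name], [type_desc], [is_default]
FROM [sys].[filegroups];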

Next, we need to add a new physical data file to each of the Filegroups:

    
ALTER DATABASE [StackOverflow2010]
ADD FILE
(
    NAME = [Part2018AndPrior],
    FILENAME = 'C:\Databases\StackOverflow2010\StackOverflow_2018AndPrior.ndf',
    SIZE = 3072KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 1024KB
)
TO FILEGROUP [Year2018AndPrior];
GO

ALTER DATABASE [StackOverflow2010]
ADD FILE
(
    NAME = [Part2019],
    FILENAME = 'C:\Databases\StackOverflow2010\StackOverflow_2019.ndf',
    SIZE = 3072KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 1024KB
)
TO FILEGROUP [Year2019];
GO

ALTER DATABASE [StackOverflow2010]
ADD FILE
(
    NAME = [Part2020],
    FILENAME = 'C:\Databases\StackOverflow2010\StackOverflow_2020.ndf',
    SIZE = 3072KB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 1024KB
)
TO FILEGROUP [Year2020];
GO

We’ve now created the necessary file structure that will support our partitioned data. The next step is to create a Partition Function. The Partition Function tells SQL Server how we want to map the data rows in our table to our partitions:

    
CREATE PARTITION FUNCTION [PartitioningByYear] (DATETIME)
AS RANGE LEFT FOR VALUES
(
    '20190101',
    '20200101',
    '20210101'
);
GO

The RANGE LEFT syntax above means that each boundary value belongs to the partition on its left. In other words: rows dated on or before 1st Jan 2019 go in the first partition, rows after that up to and including 1st Jan 2020 go in the second, rows up to and including 1st Jan 2021 go in the third, and anything later falls into a fourth partition (more on where that lives below).
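
You can sanity-check that behaviour with the $PARTITION function, which returns the partition number that a given value maps to. This isn’t part of the migration itself, just a quick way to see RANGE LEFT in action:

-- The boundary value itself ('20190101') maps to partition 1 because of RANGE LEFT.
SELECT $PARTITION.PartitioningByYear('20181215') AS [Dec2018],  -- 1
       $PARTITION.PartitioningByYear('20190101') AS [Boundary], -- 1
       $PARTITION.PartitioningByYear('20190615') AS [Jun2019],  -- 2
       $PARTITION.PartitioningByYear('20210615') AS [Jun2021];  -- 4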

Although we’ve mapped our data to our partitions using the Partition Function, we haven’t actually mapped the partitions themselves to the respective Filegroups.

To do that, we need to create a Partition Scheme:

    
CREATE PARTITION SCHEME [PartitionByYear]
AS PARTITION [PartitioningByYear]
TO
(
    [Year2018AndPrior],
    [Year2019],
    [Year2020],
    [DEFAULT]
);
GO
   
The code above creates a Partition Scheme that uses the Partition Function ‘PartitioningByYear’ and maps each of its ranges to a Filegroup. Anything not caught by our three ranges goes to the fourth entry, [DEFAULT], which means the database’s default Filegroup (PRIMARY, in our case).

The final step is to move our data. There are several ways of moving data into a new Filegroup. Our previous post in this series demonstrated one way, which is to rebuild the clustered index. In this post, we’re going to take a different, arguably simpler approach, which is to add the data to a new table that is mapped to the new Partition Scheme:
    
CREATE TABLE [dbo].[AuditPartitioned]
(
    [RecordId] [int] NOT NULL,
    [UserId] [int] NULL,
    [Date] [datetime] NULL,
    [Notes] [nvarchar](1000) NULL
) ON [PartitionByYear] ([Date]);
GO

INSERT INTO [dbo].[AuditPartitioned]
(
    [RecordId],
    [UserId],
    [Date],
    [Notes]
)
SELECT [A].[RecordId],
       [A].[UserId],
       [A].[Date],
       [A].[Notes]
FROM [dbo].[Audit] AS [A];
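
For completeness, the clustered-index approach mentioned above would look roughly like the following. It’s only a sketch: the index name is made up, and it assumes the existing Audit table doesn’t already have a clustered index. Building a clustered index on [Date] and placing it on the Partition Scheme moves the existing table’s data into the partitions without creating a second table:

-- CIX_Audit_Date is a hypothetical name; adjust if Audit already has a clustered index.
CREATE CLUSTERED INDEX [CIX_Audit_Date]
ON [dbo].[Audit] ([Date])
ON [PartitionByYear] ([Date]);
GO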


Before we go any further, let’s confirm that our data is now partitioned as we expect:

    
SELECT [p].[partition_number] AS [PartitionNumber],
       [f].[name] AS [PartitionFilegroup],
       [p].[rows] AS [NumberOfRows]
FROM [sys].[partitions] AS [p]
    JOIN [sys].[destination_data_spaces] AS [dds]
        ON [p].[partition_number] = [dds].[destination_id]
    JOIN [sys].[filegroups] AS [f]
        ON [dds].[data_space_id] = [f].[data_space_id]
WHERE OBJECT_NAME([p].[object_id]) = 'AuditPartitioned';
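
Another quick way to see the same distribution (again, not essential, just useful) is to group by the $PARTITION function directly:

-- Counts the rows currently sitting in each partition of AuditPartitioned.
SELECT $PARTITION.PartitioningByYear([Date]) AS [PartitionNumber],
       COUNT(*) AS [NumberOfRows]
FROM [dbo].[AuditPartitioned]
GROUP BY $PARTITION.PartitioningByYear([Date])
ORDER BY [PartitionNumber];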


Let’s rename the tables so our applications and scripts are interacting with our newly partitioned table rather than the old version:

    
EXEC sp_rename 'dbo.Audit', '_Audit';
GO
EXEC sp_rename 'dbo.AuditPartitioned', 'Audit';

…and finally, drop the ‘old’ Audit table:

    
DROP TABLE [dbo].[_Audit];

Our data is now partitioned so that we can target our backups and restores at only the data we care about. Any new or modified rows will be allocated to the correct partition.

In our example above, if we back up and restore only the Primary Filegroup, we’re handling only 856 rows from the Audit table, rather than the entire set:

    
BACKUP DATABASE [StackOverflow2010]
FILEGROUP = 'PRIMARY'
TO DISK = 'C:\Databases\Backups\StackOverflowPrimaryFG.bak'
WITH COPY_ONLY; /*optional*/

RESTORE DATABASE [StackOverflow2010]
FILEGROUP = 'PRIMARY'
FROM DISK = 'C:\Databases\Backups\StackOverflowPrimaryFG.bak';
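
If you want to confirm what a backup file actually contains before restoring it, RESTORE FILELISTONLY lists the database files (and the Filegroups they belong to) that were captured:

-- Only the file(s) belonging to the PRIMARY filegroup should be listed for this backup.
RESTORE FILELISTONLY
FROM DISK = 'C:\Databases\Backups\StackOverflowPrimaryFG.bak';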


We hope you find this technique useful when backing up and restoring your production databases into test and development environments!

If you’re struggling to speed up your backup and restore processes and need some assistance, please contact us or take a look at our services.

Thanks for reading!


Shaun Austin

Shaun has been working with SQL Server for over 20 years, as both a developer and DBA. He is passionate about optimising and troubleshooting SQL Server environments, ensuring that customers get the maximum value from their cloud or on-premise infrastructure. Shaun regularly blogs about SQL Server administration and performance tuning, frequently contributing to user group channels on platforms such as LinkedIn.