3 Ways to Synchronise Data Between SQL Server and Azure SQL Database

An increasing number of businesses are moving towards utilising Microsoft cloud technologies to make their data highly available to other applications, processes and people. If you are one such business, then you may well find yourself wondering how you can synchronise your critical on-premises SQL Server data with Azure SQL Database.

Synchronise Data Between SQL Server and Azure SQL Database Icon

In this blog, we’re going to introduce three very different ways to synchronise your data, so that you can make an informed decision based on your use-case.

Transactional Replication

SQL Server transactional replication is a well-established technology for copying data and objects from one database (publisher) to other databases (subscribers). Replication has been around for a long time, and although it’s not perfect, it’s reliable and well understood.

It’s an excellent choice for this scenario or almost any scenario where you want to synchronise databases because:

  • Synchronise only the data you need to – Replication allows fine control over what data is synchronised. Let’s say you have a 2TB on-premise database. Replication allows you to select the specific tables you’re interested in and even filter that data (e.g. based on a date column), so there’s no need to sync that 500GB log table if you don’t need it!
  • Low cost – Replication is available in all supported versions and editions of SQL Server. The more basic editions support subscriber-only (and there are some limitations when it comes to backwards compatibility). Furthermore, there is no specific additional cost for replicating to Azure (beyond the cost of the Azure SQL database itself). This makes it a fairly low-cost option compared to alternatives.
  • Tune the database for the Azure workload – In a replication configuration, subscriber databases are not read-only, which means that you can add additional tables, views etc, and you can also create a completely different set of indexes than those in the publisher database. It’s very common for the workload of a synchronised database to be different from the workload of an on-premises OLTP-style database (reporting being a common scenario here), and those queries will often benefit from a targeted set of indexes.
  • Guaranteed transactional consistency – Replication works by reading transactions from the transaction log and executing them against the subscriber database. A set of statements within a transaction is guaranteed to either succeed or fail, so only part of a transaction will never be applied to a subscriber, even for just a brief period of time. This isn’t the case with some of the alternatives discussed below.
  • Low performance impact – The day-to-day performance impact of implementing transactional replication on the publisher database/server is pretty low, although the initial snapshot generation is often best completed out-of-hours if you’re dealing with a large database.

However, before choosing this way of synchronising your data, you’ll need to consider the following:

  • Initial setup can hurt – If the database is very large, then the initial snapshot creation and synchronisation can take a long time, and adversely affect the performance of the source database or server. It can also cause a significant amount of network traffic. For this reason, it may be best to add tables to your publication incrementally.
  • One-way synchronisation only – As explained in the Microsoft documentation, Azure SQL Database can be a push subscriber in a one-way transactional or snapshot replication topology. Bi-directional replication, such as merge replication is not supported.

Azure Data Factory Integration

Azure Data Factory (ADF) is a service for creating data integration pipelines. The pipelines can be configured to pull data from numerous types of data source, including an on-premise SQL Server database(s). This data can then be stored in a target Azure SQL database(s). ADF is ideally suited to synchronising a subset of the data, rather than the entire dataset, due to the complexities in managing related data.

When synchronising data with Azure SQL Database, ADF can be a good choice when you require the following:

  • Transform and enrich data – If you’re looking to do more than simply synchronise your data, then ADF provides the opportunity to transform, enrich and apply custom business logic to your data, prior to storing it.
  • Synchronise only the data you need to – When synchronising with an on-premises database, then the data you synchronise is typically based on the queries you write, so you have complete control over what data and objects are synchronised.
  • Error handling and alerting – ADF allows you to build custom pipelines with robust error handling and email workflows.
  • You can’t modify the source database in any way – Replication requires both modifications to the source database and the creation of a distributor In circumstances where this is not possible or desirable, ADF offers a viable alternative.

However, before choosing this way of synchronising your data, you’ll need to consider the following:

  • Complex to implement – When compared to transactional replication, setting up synchronisation using ADF is a more manual process. You’ll need to identify which rows have changed in your source database (since your pipeline last ran). If transactional consistency with the source database is important, then this could be very complex to implement in a large database that has many related tables.
  • Heavier performance impact – You’ll need to write your own queries in order to pull out the data you want to synchronise. These queries will inevitably impact other transactions that are running on the source database (and let’s face it, we don’t always write good queries!). Transactional replication on the other hand, reads transactions from the transaction log and therefore the impact on other transactions is usually minimal.
  • Higher cost – ADF is a paid service and you should familiarise yourself with the pricing model.

SQL Data Sync for Azure

SQL Data Sync is an Azure service that allows for the synchronisation of a group of databases, for example, between those in Azure and on-premises.

Given the name, you may well assume this to be the best option. Without a doubt, there are scenarios where it’s a great choice because:

  • Bi-directional synchronisation – SQL Data Sync allows you to synchronise both ways between SQL Server and Azure SQL Database at the database level (or group of databases). Transactional replication does not support this.
  • Eventual consistency is guaranteed – Microsoft guarantees that all changes will be synchronised and no data loss will occur.

However, before choosing this way of synchronising your data, you’ll need to consider the following:

  • Table limitations – There are numerous limitations that affect which tables can be synchronised and some of them are extremely common to come across, such as no support for columns with user-defined types.
  • The Hub database must be an Azure SQL Database – SQL Data Sync uses a ‘hub and spoke’ model, whereby the Hub database is responsible for synchronising the changes between each member database (Spoke). Each member database can be either a SQL Server or Azure SQL database, but the Hub database must be an Azure SQL database. The resource requirements for the Hub database can be significant, especially if you add a lot of databases to the sync group.
  • Heavier performance impact SQL Data Sync uses triggers to track the changes to the source tables. This typically will have a higher performance impact on the source database than transactional replication.
  • Lack of transactional consistency – Although eventual consistency is guaranteed, transactional consistency is not. If your downstream processes depend on the consistency of your data, this may be problematic.
  • Sync the entire table – Unlike with transactional replication, SQL Data Sync for Azure doesn’t natively support filtering the data horizontally within a table, although you can choose which tables and columns you want to include.
    •  

For a comparison with transactional replication, see this Microsoft article.

Can Log Shipping be used to synchronise data between SQL Server and Azure SQL Database?

This question has been raised in these conversations in the past, so I wanted to address it. Log Shipping isn’t supported by Azure SQL database. That aside, it often isn’t the best choice for a readable secondary environment due to its requirement to put the database in single-user mode whilst restoring the T-Logs.

Conclusion

So there you go – three different options, each with different pros and cons. The most common requirement we see at Koderly is a one-way synchronisation between SQL Server and Azure SQL Database. In that scenario, it’s hard to look past transactional replication as the tool of choice.

If you’re interested in talking to us about our database services, then please contact us.

Thanks for reading!

Subscribe to our newsletter to be the first to hear about new features, vacancies, and industry news.

Picture of Shaun Austin

Shaun Austin

Shaun has been working with SQL Server for over 20 years, as both a developer and DBA. He is passionate about optimising and troubleshooting SQL Server environments, ensuring that customers get the maximum value from their cloud or on-premise infrastructure. Shaun regularly blogs about SQL Server administration and performance tuning, frequently contributing to user group channels on platforms such as LinkedIn.