SQL Server … GDPR … one of these is my favourite thing to blog about … the other is GDPR.
Fortunately, this blog isn’t about GDPR! It’s about SQL Server, or more specifically, how to secure your data in SQL Server. However, if you care about GDPR (and let’s face it, we’re constantly told we should!) and you happen to have a few SQL Server databases kicking around, laying bare to the world, then it’s probably worth your while reading on.
Over recent months, most of our customers have enquired about what they should be doing database-wise to make sure they’re protecting their data. SQL Server gives you a few different options on this front, depending on what version you have. Unfortunately, there’s no one-size-fits-all answer. However, I’ll attempt to summarise the key characteristics of each of these options.
TRANSPARENT DATA ENCRYPTION (TDE)
What is it?
TDE is nothing new. In fact, it’s been a DBA staple since 2008. TDE encrypts the physical files – including backups – by using a certificate, much like most other forms of encryption. However, it does NOT encrypt the data itself.
When is this a good option?
Well…if someone walks off with your backups, or even the raw data files, they’re going to have a pretty tough time restoring your databases. So if this is a concern, then TDE is a great option. It’s also super easy to implement – your applications won’t need to change one bit!
However…
- It’s an Enterprise-only feature (boo-hoo!), and your disaster recovery (DR) server will need Enterprise too if you ever need to restore the backup!
- It doesn’t encrypt the data across the wire, so if someone intercepts your data in transit, TDE won’t protect you. If you have a rogue DBA in your data centre, TDE isn’t going to stop them simply dumping all your data into another database and walking off with it.
- If you’re prone to forgetting passwords, or losing encryption keys, then you’re not going to have a fun time with TDE.
Further reading
ALWAYS ENCRYPTED (AE)
Always Encrypted was new in SQL Server 2016. AE means that the data is always encrypted within the database, and in transit. The data can only be decrypted by a client application driver, such as OLE DB. This requires a certificate to be installed on your app server. The database server on the other hand hasn’t got a clue about how to decrypt the data. So anyone stealing your database, or even just querying AE columns through SSMS, are going to see the data in it’s encrypted state (i.e. a load of junk).
AE encrypts the data at column level, so you target specific columns that you want to encrypt, such as passwords, credit card numbers, and other bits of Personally Identifiable Information (PII). Basically, you get to pick and choose (to some extent) what data you want to encrypt.
When is this a good option?
- It’s available in all editions (yay!)
- The data is protected all the way down to the application server, so that rogue DBA in the remote data centre isn’t going to be able to see your data. Equally, anyone intercepting the data in transit will only see the encrypted version. It’s a great solution for providing separation between those who own the data and those who manage it.
- It involves very little in the way of application changes – you just need to make sure you’re using an Always Encrypted enabled driver in your application.
However…
- Remember, SQL Server doesn’t hold the encryption keys, which makes things tricky when you want to compare data. Range scan queries e.g. LIKE, <. >, BETWEEN have issues. Some of this logic will need to be shifted to your application layer. Not fun! However ask yourself, do I actually need to perform these kind of searches on my encrypted columns?
- Full-text indexes on an encrypted column aren’t supported.
- Every client application that needs access to the data (e.g. a data warehouse) needs a copy of the certificate, so there’s a maintenance overhead.
Further reading
https//www.brentozar.com/archive/2017/08/use-always-encrypted/
DYNAMIC DATA MASKING (DDM)
Yes, that’s MASK, not encrypt, hash, or any other fancy stuff. Quite simply, DDM allows you to put a configurable mask in front of a column’s data, so undeserving prying eyes can see only some, or none of it. The amount they can see depends on the masking function you apply. Phone numbers and email addresses are obvious uses here.
When is this a good option?
- You want column-level control of what data is protected.
- You want to use SQL Server permissions to control who can see behind the mask.
- You want to search the masked data using range-scans (LIKE, ‘<>’ etc).
- You want something that’s fast to apply to existing columns.
- You don’t want to modify your existing queries.
However…
- It’s not completely secure – just because you can’t see the data, doesn’t mean you can’t figure out what it is. For example, you may ‘mask’ the SALARY column in your HR database, but someone running ….WHERE Name = ‘Joe Bloggs’ AND SALARY BETWEEN @VAL1 and @VAL2…. isn’t going to have to work too hard to figure out Joe’s salary.
- If your application uses a single login throughout, DDM is going to be harder to implement. DDM relies on the login context, so you’re going to need to use EXECUTE AS in your queries, or multiple connection strings in your application. The latter has the downside of preventing connection pooling and spinning up multiple database connections.
Further reading
ROW-LEVEL SECURITY (RLS)
Like AE, RLS was new in 2016 (all editions). RLS is concerned with removing rows of data from the results of a query (as opposed to masking or encrypting), depending on permissions. If a user isn’t allowed to read/modify a particular subset of rows from a table, then they’re filtered out. The filtering is performed by a user-defined function (UDF), so you can apply your own business logic as required.
When is this a good option?
- You want to restrict access to specific rows in a table (both READS and WRITES), based on the user context.
- When you want to centralise the access-restriction logic for your application and ensure consistency.
- When you have complex business rules that govern who can modify what data.
However…
- Think about performance! You are applying a predicate function to the query and it’s going to slow it down.
- Like DDM, RLS requires apps to execute in the context of the end user, which could lead to many open connections. Using a dynamic ‘EXECUTE AS [user]’ could be one way around this.
Further reading
Hopefully this has given you a basic overview of what your options are. Which is right for you will depend on your business requirements. If you have enterprise edition, then turning on TDE from the outset seems like a no-brainer, right? The other options are going to require a little more thought. Just spare one of those thoughts for the App development team – you’re likely going to need work with them on this!
Thanks for reading!