Aug 18, 2011

Database Snapshots To Test Loads

When testing data warehouse ETL routines it’s often necessary to be able to reload the same data several times before you get the code dialed in and ready for production work.  One way to be able to do this is to simply TRUNCATE the target table, assuming there are no foreign keys on it.
If you want to save data from previous loads and only remove the records from the most recent load you can DELETE the data.  Of course this is no big deal because you have the load date in the new records, right?
Yet another method to identify the newly added records would be to use an OUTPUT clause on the INSERT statement.  Then it would be easy to identify the newly inserted records from the OUTPUT table.  Likewise, any records that are UPDATED could also be restored to their original state from the OUTPUT table.  The downside is the INSERT and UPDATE statements would need to be modified to use the OUTPUT clause for testing purposes.  Then you would have to DELETE newly inserted records and UPDATE the updated records to restore the database to it’s original state.  If several tables are involved in the ETL process you have your hands full to capture the new and changed records and then reset them back.
A much easier way to reset the data warehouse to its original state after an ETL test load is to use a database SNAPSHOT.  To me this whole concept sounded kind of intimidating but after I got past my reluctance and gave it a try I realized the power of this tool in my SQL toolbox.
This post will provide a general introduction to what a database snapshot is…  To demonstrate how snapshots can be used to keep data fully recoverable, even in an environment where database backups are not an option… And ultimately, to increase our understanding and awareness of database snapshots.

Advantages of Snapshots
Easy method to roll back changes to the database to a previous point in time.
It is a suitable alternative to saving a table to a tmp_ table; saving data to an OUTPUT table; rolling back a transaction.
Snapshots are not limited to data.  Changes to stored procedures, tables, triggers and other DDL structures can also be reverted back to their original state.
Reverting the database to its original state from a SNAPSHOT is just one T-SQL command.
SNAPSHOT databases take up a minimal amount of space based on the amount of data that is changed or added.
Disadvantages of Snapshots
They do take up some space on the server.  The more the data changes, the more the snapshot will grow.
To revert the database back using the snapshot you’ll need to have exclusive access to the database.  Kick everyone out.  Still, this was also be the case if you needed to restore data from a backup.
Snapshots can be abused.  It could be very easy to have a bunch of ever growing snapshots that no body knows who owns them.
There is a minimal performance hit when writing the existing records out to the snapshot.
If a db is reverted back it could undo the DDL work of someone else.  This is all the more reason to save stored procedures and T-SQL to a source control system.

So let’s set how snapshots really work by doing a test run.
The first step is to take create the snapshot database.  Although you can see snapshots in SQL Server Management Studio (SSMS) there is no way to create a snapshot through the interface.   It can only be created by executing a simple T-SQL command.
Before we can create the snapshot we’ll need to know what data files make up the database we’re shooting.
USE master
GO

EXEC sp_helpdb AdventureWorks;


From this command we just learned that the data file for this database is on:
C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\SnapShots\AdventureWorks_Data.mdf
In a production database there will probably be several data files involved.  No worry.  For the purposes of snapshots they are all the same.
Now that we know the data files we can create the snapshot:
CREATE DATABASE AdventureWorks_SS ON
(  NAME = AdventureWorks_Data,
   FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\SnapShots\AdventureWorks_Data.mdf' )
AS SNAPSHOT OF AdventureWorks;

GO
In this case, all we did was create a database with the “_SS” extension and we stored the datafile in a SnapShots directory.  You can use whatever convention you want to name your database and storage location.  I like to keep my snapshot storage separate from my other storage locations.
Also, if the SnapShots directory does not exist the CREATE DATABASE command will fail.  You must have the directory created BEFORE executing the command.  If you don’t, the error message will be clear.
You have noticed that we are not executing this command from the AdventureWorks database itself.  You’ll need to run the command from any other database on the server.  Master is always a good choice for these type of activities.
At initial creation time the SNAPSHOT database does not have any records.  Any  INSERT, UPDATE and DELETE operations against the AdventureWorks database will result in the “original” records being written to the snapshot.
USE AdventureWorks;
GO

UPDATE    Person.Contact
SET        Suffix = 'XVIII'
WHERE    LastName = 'Powell';
--(116 row(s) affected)

SELECT    TOP (5) FirstName,
        LastName,
        Suffix
FROM    Person.Contact
WHERE    LastName = 'Powell';
Isabella    Powell    XVIII
Natalie    Powell    XVIII
Alexandra    Powell    XVIII
Sydney    Powell    XVIII
Katherine    Powell    XVIII
So we just updated all 116 of the Powells to have a suffix if XVIII.  I don’t know about you but I don’t know any XVIIIs… Except Louis.  This update was probably a mistake.
If we didn’t have a snapshot of this data we would have been in trouble.  We just don’t know how many of these folks had suffixes such as Sr., Jr., III, IV, MD, JD, etc.
Let’s use our snapshot to restore the database to it’s original state:
USE master;
GO

RESTORE DATABASE AdventureWorks FROM DATABASE_SNAPSHOT = 'AdventureWorks_SS';


This “restore” took 5 seconds on my slow, 4-year old laptop.  Certainly no worse than the amount of time it would have taken me to restore from a backup, if I had one, or from a temp table.
But did it work?

SELECT    COUNT(*)
 FROM    Person.Contact
 WHERE    LastName = 'Powell'
 AND    Suffix = 'XVIII';

This query returned ZERO, nary, nilch, 0.
Now for the last step, clean up your work and delete the snapshot database.
DROP DATABASE AdventureWorks_SS;


And you are finished.
Conclusion:
Although there are some limitations, snapshots offer a low cost option, in time and system resources, to rollback the state of an entire database to a previous point in time without the need to restore the database from backups.  The amount of disk space needed for snap shots is directly proportional to the amount of data that actually changed, not the size of the original database.  The amount of time to create the snapshot is minimal, only requiring the execution of a single T-SQL command.  The time and effort to restore the database to the point in time when the snapshot was taken is minimal, assuming millions of records have not changed.

No comments:

Post a Comment

Hi,

Thanks for your visit to this blog.
We would be happy with your Queries/Suggestions.