Aug 18, 2011

DELETE using SYNONYMS

It’s all about how we used a synonym for a table in a DELETE statement.
Since SYNONYMS are not commonly used, I’ll take a moment to describe them. Synonyms, besides being tough to say, are used in SQL Server to create an alternate name for a database object.  In my case, I like to use synonyms to replace long table names.  I don’t use synonyms in production code, but if I’m testing and doing corrective updates they are fair game.
I’m not going to use the actual tables to demonstrate this anomaly, lest the legal team pay me an unwanted visit.
In this block of code I’ll create a fictitious table, create a synonym for it (“lt” = LONG TABLE), load data into the table using the synonym, and select from the synonym to show the data actually loaded:
USE tempdb;
GO
CREATE TABLE dbo.this_is_one_long_table_name (
    id                int,
    whocaresaboutdata varchar(50));
CREATE SYNONYM lt FOR dbo.this_is_one_long_table_name;

INSERT INTO lt VALUES    ( 1, 'I do'),
                        ( 2, 'Do You?');

SELECT * FROM lt;

The result of that query is:
id          whocaresaboutdata
----------- --------------------------------------------------
1           I do
2           Do You?

So far nothing too exciting.
Anyone who has ever had to do anything other than the most simple DELETE knows that DELETE can have two FROM clauses.  It’s awkward, but that’s how it is.
First let’s finish loading our little table…
INSERT INTO dbo.this_is_one_long_table_name (id)
    VALUES    ( 3 ), (4), (5), (6), (7), (8), (9), (0);

SELECT * FROM lt ORDER BY id;

OK, so now we have 0 – 9 loaded into our test table.  We don’t need much data to run this test.
Now let’s delete some data out of our table.  To do that we’ll create another table, delete_these_records, that will have the IDs that we want to delete:

CREATE TABLE dbo.delete_these_records ( id    int );
INSERT INTO dbo.delete_these_records VALUES (2), (4), (8);

-- DELETE the data using a JOIN and SYNONYM
BEGIN TRAN
DELETE FROM dbo.this_is_one_long_table_name
FROM    lt lt
        INNER JOIN dbo.delete_these_records dr
        ON dr.id = lt.id;
-- (10 row(s) affected)

ROLLBACK;

If you’re following this, you’ll expect the above query to delete the rows where the ID IN (2, 4, 8).  Instead, it deleted all of the records in this_is_one_long_table_name.  Another happy time for ROLLBACK.
It turns out, after some experimentation and a review of the execution plans, that the issue is the difference between the first and second FROM clauses.  If both clauses consistently use either the full table name or the synonym, the query works as expected.
For example, both of these queries work as expected, i.e., they delete IDs 2, 4 and 8.
BEGIN TRAN
DELETE FROM dbo.this_is_one_long_table_name
FROM    dbo.this_is_one_long_table_name lt
        INNER JOIN dbo.delete_these_records dr
        ON dr.id = lt.id;
ROLLBACK

BEGIN TRAN
DELETE FROM lt
FROM    lt lt
        INNER JOIN dbo.delete_these_records dr
        ON dr.id = lt.id;
ROLLBACK

At this point I surmised that if a table has a synonym associated with it, and a DELETE statement uses BOTH the actual table name and the synonym in the same query, the second FROM clause basically gets ignored.
It turns out that if a careless SQL developer writes a second FROM clause that does not link back to the table being deleted, ALL OF THE ROWS WILL BE DELETED.  The second FROM clause and any JOINs in it are logically ignored.  No warnings.  No errors.  All data is gone!
The following example will delete ALL 10 records.  The second FROM clause does not include a reference to the table in the first FROM clause.  The query passes the parser, algebrizer and optimizer…  and proceeds to delete the entire table.

BEGIN TRAN
DELETE FROM dbo.this_is_one_long_table_name
FROM    dbo.delete_these_records d1
        INNER JOIN dbo.delete_these_records d2
        ON d1.id = d2.id;
ROLLBACK

I first observed this query behavior in SQL 2005 Enterprise Edition.  The query for this post is in SQL Server 2008 Developer Edition.  The version of SQL Server does not appear to be a factor.
CONCLUSION:  The second FROM clause in a DELETE statement must reference the exact table name used in the first FROM clause.  If it doesn't, the DELETE acts as if the second FROM clause doesn't even exist.  (Actually, that's not quite true either, because a review of the query plan shows the filter from the second FROM clause being applied and then joined against the entire dataset… resulting in all records being deleted.  So the query actually does more work than a straightforward DELETE.)
The use of SYNONYMS MUST BE CONSISTENTLY APPLIED to both FROM clauses.  If one clause uses the SYNONYM and the other the actual table name, the second FROM clause will be logically disconnected from the query and ALL RECORDS IN THE TABLE will be deleted.
Going one step further, we also found that if the second FROM clause does not reference the table named in the first FROM clause, the table actually being deleted, the DELETE ignores the second FROM clause and deletes every record in the target table.
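One way to sidestep the ambiguity entirely is to skip the second FROM clause and filter the target with a subquery instead.  A quick sketch against the same test tables:

-- No second FROM clause to get out of sync with the target
BEGIN TRAN
DELETE FROM dbo.this_is_one_long_table_name
WHERE id IN (SELECT id FROM dbo.delete_these_records);
-- Expect (3 row(s) affected): IDs 2, 4 and 8
ROLLBACK;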

Uncommitted Transactions

One way to find out if a transaction is causing a block is with the sp_who2 system stored procedure.  The scope of this procedure is server-wide.  It identifies all connections to the database, the login name, the application that is running, and in the context of locks and blocks, any SPIDs that are causing a block.  The offending SPID will be in the BlkBy column.
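Running it is as simple as:

EXEC sp_who2;    -- any SPID listed in the BlkBy column is the one doing the blocking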
I often find myself using transactions when doing DELETE or UPDATE operations to allow myself the option of ROLLING BACK the transaction in case something goes awry.  Just as often, I forget to COMMIT the transaction and move on to the next order of business, resulting in a lock on the table and essentially blocking everyone out of its use (unless they use the WITH (NOLOCK) hint in their query).
If I ran the following query and forgot to commit the transaction what would happen?
USE AdventureWorks;
GO

BEGIN TRAN
INSERT INTO person.contact (FirstName, LastName, EmailAddress, PasswordHash, PasswordSalt)
VALUES ('Dave', 'Turpin', 'dave.turpin@acm.org', 'goblygook', 'Salt');

The table would lock up until either a commit or rollback happens.  But how would I know if there are any open transactions, either mine or somebody else's?  sp_who2 would not indicate any blocks because there aren't any; there's only an open transaction.  And as long as no other processes are trying to use that table there will not be any blocking/locking issues.
DBCC OPENTRAN will list the earliest open transaction.  In our simple example, the DBCC will return:
[DBCC OPENTRAN output: oldest active transaction, SPID 54, started at 5:08 AM]
In this case we can see that SPID 54 has had a transaction open since 5:08 AM.  Now we can take a look at SPID 54, see who it is, and initiate either a rollback or commit.  Chances are, if an uncommitted transaction is hanging out there, there's either a major bug in someone's code, we have a legitimate locking/blocking problem, or it's an ad hoc batch.
Since DBCC OPENTRAN only identifies the earliest open transaction, you may have to resolve someone else’s transaction before actually getting to yours.  Hopefully there will not be any open transactions for very long and the point is moot. 
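For reference, here's a minimal sketch of the checks involved, using the AdventureWorks example above:

-- Show the earliest open transaction in the current (or a named) database
DBCC OPENTRAN;                       -- or: DBCC OPENTRAN ('AdventureWorks');

-- Is my own session the culprit?  Anything greater than zero means I have an open transaction.
SELECT @@TRANCOUNT AS open_tran_count;

-- Then finish the work one way or the other
-- COMMIT;   -- or ROLLBACK;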

Escaping a Quote

I needed to query a column in a table that had a single quote embedded in the string.
This is one of those little SQL tricks that can really slow you down.  I don’t know when or where I learned it but knew right away to escape the single quote with another single quote.
USE master;
GO

SELECT 'D'Agostino';

This doesn’t work because the parser sees the quoted ‘D’ but doesn’t know what to do with the trailing Agostino’ string.

SELECT 'D''Agostino'

This query works!  The first quote is the start of the string.  The second quote is the ESCAPE for the third quote.  Basically the ESCAPE makes the third quote a literal string.  Finally the fourth quote ends the string.  The parser knows what this string is and works just fine.
SELECT 'D''Agostino'''

In the above query the result set looks like: D’Agostino’.  At first glance it looks a little awkward but the 3rd from last quote is the ESCAPE for the second from last quote; the last quote simply terminates the string.
Are there other ways to do the same thing?  In SQL Server there are ALWAYS other ways to get things done and this one is no exception.
We could use the CHAR function to generate the ASCII character for the single quote.  In this case it will look like this:
SELECT 'D' + CHAR(39) + 'Agostino'
 
This is simple stuff, but only when you know it.  If you don’t know it you can spend way too much time figuring it out on your own.
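The question usually comes up again when the value arrives in a variable and you're building dynamic SQL.  Here's a minimal sketch, assuming a hypothetical @last_name variable; you can double the embedded quotes with REPLACE, or better yet hand the value to sp_executesql as a parameter and let it worry about the quoting:

DECLARE @last_name varchar(50) = 'D''Agostino';

-- Doubling the embedded quote while concatenating dynamic SQL
DECLARE @sql nvarchar(200) =
    N'SELECT ''' + REPLACE(@last_name, '''', '''''') + N''' AS last_name;';
EXEC (@sql);

-- The safer route: pass it as a parameter
EXEC sp_executesql N'SELECT @name AS last_name;', N'@name varchar(50)', @name = @last_name;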

Indexed Views - Benefits/Side Effects

Indexed views, also known as materialized views, can offer significant performance benefits when properly implemented.
A co-worker asked me to take a look at a query he was running that was behaving in a bizarre way.  The query seemed simple enough…  a couple of joins and 2 columns in the result set.  The problem was that the WHERE clause had a WHERE column_value = 1 in it, but it wasn't working.  To make it even more perplexing, the identical query was working on a copy of the database on a different server.  The copy had an additional day's worth of data in it; otherwise the two databases were identical.
We went through the typical checks… data types of the columns, indexes, consistency of the tables between the two databases.  We looked at the compatibility level of the databases.  We looked at the service pack levels of the two engines.  Everything matched up perfectly.
Finally, we looked at the actual query plan that was used.  Voila!  The database that was not returning the expected result set was using a view, an indexed view, to get the data; the other database that was working correctly was using the base table.
So here’s the lesson…  Indexed views may be substituted by the query optimizer if it is determined that the cost of the query will be less than using the base table.  (Note:  This only applies to Enterprise, Developer and Data Center editions of SQL Server).
In our particular case, the indexed view that was chosen by the optimizer filtered out the desired rows the query was intended to select.  The optimizer chose this view, probably because there were way fewer rows in the view than the base table.
Unfortunately for my associate, the database he was working on uses indexed views heavily.  I’ll even go as far as to say indiscriminately.  An additional challenge is that the data is accessed from the database with third party tools that may not have the ability to force the query plan to use the base table.  The good news is that the applications seldom access the data from the base tables directly, so the unintentional substitution of the incorrect index view for the base table may be the exception.
We identified 3 possible workarounds that will guarantee that the optimizer would not substitute the indexed view (would use the base table) even if the query plan cost is less:
1.  Use the query hint OPTION (EXPAND VIEWS).  Not a great option considering the third party reporting tools.  It would also need to be applied to every SELECT statement (and possibly UPDATE and DELETE too!)
2.  Downgrade to SQL Server Standard edition.  This was not very popular…
3.  Set one of the SET options that must be in a specific state for indexed views to be used by the optimizer, such as SET NUMERIC_ROUNDABORT ON instead of OFF.  The other potential candidates are ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL and QUOTED_IDENTIFIER.
We tested #1 and #3 and they both forced the query optimizer to utilize the base table instead of the indexed view.  Option #2 we’ll leave as an exercise for the reader.
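For reference, here's a minimal sketch of what #1 looks like (the table, view and column names below are stand-ins, not our actual schema).  OPTION (EXPAND VIEWS) tells the optimizer not to substitute an indexed view; the WITH (NOEXPAND) table hint does the opposite and forces a specific indexed view:

-- Force the optimizer to read the base table, not a matching indexed view
SELECT  t.order_id, t.column_value
FROM    dbo.base_table t
WHERE   t.column_value = 1
OPTION  (EXPAND VIEWS);

-- The reverse: force the indexed view to be used
SELECT  v.order_id, v.column_value
FROM    dbo.indexed_view v WITH (NOEXPAND);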
I looked to see if there is an option to override the automatic substitution of the view for the base table when the view is created, i.e., to emulate the view behavior as if the edition were STANDARD, but did not see anything.
SQL Server BOL describes the many benefits of using indexed views, including the benefit that the query optimizer will use an indexed view instead of the base table if the cost of that plan is lowest.  Unfortunately, in our case, the view didn't contain the data we needed.  I recall reading about horizontally partitioned indexed views, which at the time I interpreted as partitioned tables / views (partition function, partition scheme, etc.), so I may need to go back and dig deeper in this area.  By creating a view that filters on a column's value, we have essentially partitioned the base table.
Another cost of indexed views is that the data is actually stored.  By applying the clustered index we are creating a copy of the data.  So if you have several indexed views on a single table, you essentially have several copies of the data, along with the statistics of the data and the indexes and all of the overhead that goes with it.
It only seems reasonable that new records inserted into the base table will also result in inserted rows in the indexed view(s) that are SCHEMA bound to the table.  Likewise with DELETE and UPDATE operations… the cost of these operators will be multiplied by the number of indexed views.
And anyone who indiscriminately implements indexed views that call indexed views… layers of indexed views… should be publicly flogged.  That’s just my opinion and is unlikely to be deployed.
CONCLUSION:  The overuse of indexed views can be like winding a watch with a wrench.  It seems like it'll work great, but something will eventually break.  In a way it's like the overuse of triggers… the invisible stored procedures…  They have their place in production systems, but they should be implemented in moderation and with a full understanding of their positive and negative impacts.  Unintended side effects can have significant costs that totally negate the upside of their use in the first place.
Obviously, this is a big topic and this posting just raises some of the issues.  I’m certain others have hit this issue and have additional methods to avoid some of the ill-intended side effects of improper/over use of indexed views.

Jobs Running Time

Knowing how long a job takes to run is relevant, but knowing how much it deviates from the "norm" is also useful.  Even more useful would be to plot the job run times over time.  In that context it might be interesting to spot jobs that are taking longer to run as the data sets grow.
Although I did not attempt the time-based plot, getting the Agent runtimes still has value.  You could take it one step further and place the call to the stored procedure in an Excel file and send it to your manager.  All she would have to do is hit Data ==> Refresh.
Just a little background.  For starters, the SQL Agent information is stored in the msdb database.  We'll capture the job name out of the sysjobs table and the job run times out of the sysjobhistory table.
Regardless, here’s what the code looks like:
USE [msdb]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[usp_jobs_summary_report]
AS

SELECT    j.[name],
        COUNT(jh.run_duration) AS 'Sample Size',
        CONVERT(DECIMAL(4, 2),
            MIN((    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 1, 2)) * 3600) +
                    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 3, 2)) * 60) +
                     CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 5, 2)))/3600.0))
                        AS 'MIN Runtime (hours)',
        CONVERT(DECIMAL(4, 2),
            MAX((    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 1, 2)) * 3600) +
                    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 3, 2)) * 60) +
                     CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 5, 2)))/3600.0))
                        AS 'MAX Runtime (hours)',
        CONVERT(DECIMAL(4, 2),
            AVG((    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 1, 2)) * 3600) +
                    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 3, 2)) * 60) +
                     CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 5, 2)))/3600.0))
                        AS 'AVG Runtime (hours)',
        CONVERT(DECIMAL(4, 2),
            STDEV((    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 1, 2)) * 3600) +
                    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 3, 2)) * 60) +
                     CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 5, 2)))/3600.0))
                        AS 'Std Dev.',
        CONVERT(DECIMAL(4, 2),
            VAR((    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 1, 2)) * 3600) +
                    (CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 3, 2)) * 60) +
                     CONVERT(int, SUBSTRING(STR(jh.run_duration, 6, 0), 5, 2)))/3600.0))
                        AS 'Variance'
FROM        sysjobs j
        INNER JOIN
            sysjobhistory jh
                ON    jh.job_id = j.job_id
WHERE    jh.run_status = 1    -- Succeeded
AND        jh.step_id = 0        -- Outcome
GROUP BY j.name
ORDER BY 5 DESC

No rocket science here.  Just plain old SQL.
And here’s the useless output from my notebook instance of SQL Server.  I’m happy to be getting the one row I got.  Try running this on a production server… then it’ll have real value.
[Query output: one row of runtime statistics for the single job on my notebook instance]
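To run the report, just execute the procedure from msdb (or wire it up to the Excel refresh mentioned above):

EXEC msdb.dbo.usp_jobs_summary_report;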

Control the Size of UPDATE Transactions

In several of my production databases I have to deal with tables that have hundreds of millions of records.  Querying these tables is challenging; doing updates to them can be nightmarish.
I recently had to add a column to a table that had 65 million records.  The table is an aggregate table so the new column needed to be loaded from a table that had 180 million records.
Doing an unmanaged UPDATE would only serve to fill the transaction log and waste a bunch of time and system resources.
After doing a little tinkering I was able to break the transactions into pieces by using a view.  First let’s layout the setup for this.
CREATE TABLE tmp_top_view ( id           int   identity (1, 1),
                            ContactID    int,
                            test_bit     bit,
                            loopcount    int );

INSERT INTO tmp_top_view ( ContactID )
    SELECT    ContactID
    FROM    Person.Contact;

CREATE VIEW v_tmp_top_view AS
    SELECT    TOP 500 id
    FROM    tmp_top_view  WITH (NOLOCK)
    WHERE    test_bit is null;

The tmp_top_view table stands in for the production table we want to update.  In a production situation it would have tens of millions of records.  Certainly more than you would want in a single UPDATE transaction.
The test_bit  column is the target column we want to update from some other table, or just set to a value.
The loopcount column is just something we'll use to show how many records get updated per transaction.
The INSERT statement is just populating the table with some dummy data from the AdventureWorks Person.Contact table.  In this case we are inserting 19,972 records into our table.
So far nothing interesting…
The CREATE VIEW is a key part of the solution.  You’ll notice the TOP 500 in the SELECT statement of the view.  This is literally setting the max size of the UPDATE transactions.
The next key piece of the puzzle is the WHERE clause that filters on NULL test_bit.  This column will be updated in our query, below.  When the column becomes non-NULL it will drop out of the view.
Also, very important, notice the use of the NOLOCK hint.  Without this hint the batch will run much slower.
OK, we’re all setup.  Now let’s get some work done:
DECLARE @rowcount    int,
        @loopcount    int;

SELECT  @rowcount = 1,
        @loopcount = 1;

WHILE @rowcount > 0
BEGIN
    BEGIN TRAN

    UPDATE  t
    SET     test_bit = 1,
            loopcount = @loopcount
    FROM        tmp_top_view t
            INNER JOIN v_tmp_top_view v
                ON v.id = t.id;

    SELECT  @rowcount = @@rowcount,
            @loopcount += 1;

    COMMIT;
END

Observe the test_bit column getting set to one.  The value we set it to does not matter.  What's important is that it's no longer NULL.  Once it stops being NULL, it stops being a member of the view.
Fortunately, in practice, the newly updated records don't drop out of the view in mid-statement.  If they did, the view would keep pulling new rows into itself and the entire 19,972 record dataset would be updated in one transaction, which is exactly what we are trying to avoid.
In this sample set the data gets updated with 40 transactions; the first 39 are 500 record transactions; the last transaction updates 472 records.
To see how the records were updated use the following query:
SELECT  loopcount,
        COUNT(*) AS [Count of loopcount]
FROM        tmp_top_view
GROUP BY loopcount
ORDER BY loopcount;


Finding ways to neatly break up large transactions is an art as well as a science.  What I didn’t discuss here is the 2 hours I spent trying to figure out the optimal transaction size to get all 64M records to update in the fastest time.
A key ingredient of the performance is the NOLOCK hint in the view definition.
I attempted to populate an indexed temp table to capture a subset of the main table (instead of the TOP 500 view) but found the time to write to the temp table and manage the index was too high.
I took a swag at using an indexed view but that doesn’t fly because of the use of the TOP in the view definition.
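Another option along the same lines (just a sketch of the batching idea, without the loopcount bookkeeping) is to put the TOP directly on the UPDATE and loop until nothing qualifies:

DECLARE @batch_rows int = 1;

WHILE @batch_rows > 0
BEGIN
    -- Update at most 500 not-yet-touched rows per pass
    UPDATE TOP (500) tmp_top_view
    SET    test_bit = 1
    WHERE  test_bit IS NULL;

    SET @batch_rows = @@ROWCOUNT;
END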
What solutions have you devised to break up large UPDATE transactions?  Have you found anything unique about DELETE vs UPDATE vs INSERT transactions?  Send me a pointer to your solutions!

Quick Startup of SSMS

Have you ever noticed how long it takes to start up SQL Server Management Studio (SSMS)?  I always assumed it was because it is the biggest and most powerful tool in my toy box and it deserves the load time.
You can SIGNIFICANTLY reduce the load time by avoiding Microsoft’s marketing message at startup, the splash screen.
[SSMS splash screen]
To automatically skip over the splash screen follow these simple steps:
1. START ==> RIGHT click the SSMS icon ==> Properties
2. Click the Shortcut tab.
3. Add the option -nosplash at the end of the target string, outside of the quotes.  The final string should look something like:
"C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\SqlWb.exe" -nosplash
4. Hit OK.
5. Start up SSMS and watch your wait time drop!
This also works with MS Visual Studio.

Database Snapshots To Test Loads

When testing data warehouse ETL routines it’s often necessary to be able to reload the same data several times before you get the code dialed in and ready for production work.  One way to be able to do this is to simply TRUNCATE the target table, assuming there are no foreign keys on it.
If you want to save data from previous loads and only remove the records from the most recent load you can DELETE the data.  Of course this is no big deal because you have the load date in the new records, right?
Yet another method to identify the newly added records would be to use an OUTPUT clause on the INSERT statement.  Then it would be easy to identify the newly inserted records from the OUTPUT table.  Likewise, any records that are UPDATEd could be restored to their original state from the OUTPUT table.  The downside is that the INSERT and UPDATE statements would need to be modified to use the OUTPUT clause for testing purposes.  Then you would have to DELETE the newly inserted records and UPDATE the updated records to restore the database to its original state.  If several tables are involved in the ETL process, you have your hands full capturing the new and changed records and then resetting them all.
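Here's a minimal sketch of the OUTPUT approach, with made-up table names (fact_sales and staging_sales are hypothetical stand-ins for whatever the ETL actually loads):

-- Capture the keys of the rows this test load inserts
DECLARE @inserted_rows TABLE (id int, load_date datetime);

INSERT INTO dbo.fact_sales (id, load_date)
OUTPUT inserted.id, inserted.load_date INTO @inserted_rows
SELECT id, GETDATE()
FROM   dbo.staging_sales;

-- Undo the test load by deleting only what we just inserted
DELETE f
FROM   dbo.fact_sales f
       INNER JOIN @inserted_rows i
           ON i.id = f.id;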
A much easier way to reset the data warehouse to its original state after an ETL test load is to use a database SNAPSHOT.  To me this whole concept sounded kind of intimidating but after I got past my reluctance and gave it a try I realized the power of this tool in my SQL toolbox.
This post provides a general introduction to what a database snapshot is, demonstrates how snapshots can be used to keep data fully recoverable even in an environment where database backups are not an option, and, ultimately, aims to increase our understanding and awareness of database snapshots.

Advantages of Snapshots
Easy method to roll back changes to the database to a previous point in time.
It is a suitable alternative to saving a table to a tmp_ table; saving data to an OUTPUT table; rolling back a transaction.
Snapshots are not limited to data.  Changes to stored procedures, tables, triggers and other DDL structures can also be reverted back to their original state.
Reverting the database to its original state from a SNAPSHOT is just one T-SQL command.
SNAPSHOT databases take up a minimal amount of space based on the amount of data that is changed or added.
Disadvantages of Snapshots
They do take up some space on the server.  The more the data changes, the more the snapshot will grow.
To revert the database back using the snapshot you'll need to have exclusive access to the database.  Kick everyone out.  Still, this would also be the case if you needed to restore data from a backup.
Snapshots can be abused.  It would be very easy to end up with a bunch of ever-growing snapshots that nobody remembers creating or owning (see the query sketch after this list).
There is a minimal performance hit when writing the existing records out to the snapshot.
If a db is reverted back it could undo the DDL work of someone else.  This is all the more reason to save stored procedures and T-SQL to a source control system.
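To help keep tabs on that, here's a quick query for listing the snapshots on an instance; any database with a non-NULL source_database_id in sys.databases is a snapshot:

SELECT  ss.name        AS snapshot_name,
        src.name       AS source_database,
        ss.create_date
FROM    sys.databases ss
        INNER JOIN sys.databases src
            ON src.database_id = ss.source_database_id
ORDER BY ss.create_date;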

So let's see how snapshots really work by doing a test run.
The first step is to create the snapshot database.  Although you can see snapshots in SQL Server Management Studio (SSMS), there is no way to create one through the interface.  It can only be created by executing a simple T-SQL command.
Before we can create the snapshot we'll need to know what data files make up the database we're snapshotting.
USE master
GO

EXEC sp_helpdb AdventureWorks;


From this command we just learned the logical data file name (AdventureWorks_Data) and that the data file for this database is at:
C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\AdventureWorks_Data.mdf
In a production database there will probably be several data files involved.  No worry.  For the purposes of snapshots they are all the same.
Now that we know the data files we can create the snapshot:
CREATE DATABASE AdventureWorks_SS ON
(  NAME = AdventureWorks_Data,
   FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\SnapShots\AdventureWorks_Data.mdf' )
AS SNAPSHOT OF AdventureWorks;

GO
In this case, all we did was create a database with an "_SS" suffix and store its data file in a SnapShots directory.  You can use whatever naming convention and storage location you want; I like to keep my snapshot storage separate from my other storage locations.
Also, if the SnapShots directory does not exist the CREATE DATABASE command will fail.  You must have the directory created BEFORE executing the command.  If you don’t, the error message will be clear.
You may have noticed that we are not executing this command from the AdventureWorks database itself.  You can run the command from any other database on the server; master is always a good choice for these types of activities.
At initial creation time the SNAPSHOT database does not have any records.  Any  INSERT, UPDATE and DELETE operations against the AdventureWorks database will result in the “original” records being written to the snapshot.
USE AdventureWorks;
GO

UPDATE    Person.Contact
SET        Suffix = 'XVIII'
WHERE    LastName = 'Powell';
--(116 row(s) affected)

SELECT    TOP (5) FirstName,
        LastName,
        Suffix
FROM    Person.Contact
WHERE    LastName = 'Powell';
Isabella    Powell    XVIII
Natalie    Powell    XVIII
Alexandra    Powell    XVIII
Sydney    Powell    XVIII
Katherine    Powell    XVIII
So we just updated all 116 of the Powells to have a suffix of XVIII.  I don't know about you, but I don't know any XVIIIs… except Louis.  This update was probably a mistake.
If we didn’t have a snapshot of this data we would have been in trouble.  We just don’t know how many of these folks had suffixes such as Sr., Jr., III, IV, MD, JD, etc.
Let's use our snapshot to restore the database to its original state:
USE master;
GO

RESTORE DATABASE AdventureWorks FROM DATABASE_SNAPSHOT = 'AdventureWorks_SS';


This “restore” took 5 seconds on my slow, 4-year old laptop.  Certainly no worse than the amount of time it would have taken me to restore from a backup, if I had one, or from a temp table.
But did it work?

SELECT    COUNT(*)
 FROM    Person.Contact
 WHERE    LastName = 'Powell'
 AND    Suffix = 'XVIII';

This query returned ZERO, nary a one, zilch, 0.
Now for the last step, clean up your work and delete the snapshot database.
DROP DATABASE AdventureWorks_SS;


And you are finished.
Conclusion:
Although there are some limitations, snapshots offer a low-cost option, in both time and system resources, to roll back the state of an entire database to a previous point in time without the need to restore the database from backups.  The amount of disk space needed for snapshots is directly proportional to the amount of data that actually changed, not the size of the original database.  The time to create the snapshot is minimal, requiring only the execution of a single T-SQL command.  The time and effort to restore the database to the point in time when the snapshot was taken is also minimal, assuming millions of records have not changed.

Search Column Name

The following query will search the entire AdventureWorks database for any column that looks like ‘address’.
USE AdventureWorks;
GO

SELECT    OBJECT_NAME(id),
        name
FROM    syscolumns
WHERE    name like '%address%'
ORDER BY 1 ASC;

Here’s what we get for the output:
[Query output: 32 rows of object and column names containing 'address']
You can also see it returns 32 rows, covering both tables and views, which greatly narrows your search.  With a tiny bit of insight you can probably narrow it down to 2 or 3 tables.
It’s a simple little query but one that always saves me a bunch of time. 
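The same trick works with the newer catalog views if you'd rather avoid the deprecated syscolumns; a quick equivalent:

SELECT  OBJECT_NAME(c.object_id) AS object_name,
        c.name                   AS column_name
FROM    sys.columns c
WHERE   c.name LIKE '%address%'
ORDER BY 1, 2;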

List of Databases

If you plug this query into a SQL Server Central Management server you’ll have a snapshot of all your instances that are part of the CMS.
This query will return the database name, the overall size of the database including all data and log files, the create date, owner, compatibility level, online vs offline, and update stats info.
It's interesting to find databases owned by staff who are no longer around, compatibility levels set to old versions for no obvious reason, and auto create/update stats options that are turned off.
So here’s the query:

Select    Convert(int, Convert(char, current_timestamp, 112)) as CaptureDate
        , d.database_id
        , d.[name]
        , CAST(SUM(mf.size * 8192.0 / 1000000000) AS NUMERIC(18, 2)) AS [db_size (G)]
        , d.recovery_model_desc AS [Recovery Model]
        , d.create_date
        , suser_sname(d.owner_sid) AS [Owner]
        , d.[compatibility_level]
        , d.state_desc AS [State]
        , d.is_auto_create_stats_on
        , d.is_auto_update_stats_on
FROM        sys.databases d
        Inner Join sys.master_files mf
            ON    mf.database_id = d.database_id
WHERE    d.database_id > 4    -- Exclude the system databases.
GROUP BY d.database_id
        , d.[name]
        , d.recovery_model_desc
        , d.create_date
        , d.owner_sid
        , d.[compatibility_level]
        , d.state_desc
        , d.is_auto_create_stats_on
        , d.is_auto_update_stats_on

RowCount

I had a simple enough task at hand.  An existing EXTRACT that loads a result set to a DATA file needed to have a corresponding COUNT flat file that contained the number of rows in the extract’s data file.

Step 1
How Not To Do It #1: Use the SSIS ROW COUNT Transformation
SSIS has a built-in, ready to go, out of the box transformation called ROW COUNT.  Basically what it does is set the value of a variable to the number of records passing through it.  This was my first approach and it seemed like such a simple task to write the value out to a flat file, until I got to the point of writing the file.
So here’s how I did it:
1.  Create an OLE DB query to pull some records out of a database.  In this case I’ll use AdventureWorks.  I’ll grab about 2 percent of the 19,000+ records in the Sales.Customer table using TABLESAMPLE.
[screenshot: OLE DB Source query pulling about 2% of Sales.Customer with TABLESAMPLE]
2.  Create an INTEGER variable called RowCount.
3.  Add a ROWCOUNT transform after the OLE DB. Configure the ROWCOUNT transform to store the results to a variable.
[screenshot: Row Count transformation configured to write to the RowCount variable]
4.  Add a DERIVED COLUMN transformation and attach it after the ROWCOUNT transformation.
5.  Configure the DERIVED COLUMN transform to use the RowCount variable by dragging the User::RowCount variable down to the "Derived Column Name" field.
[screenshot: Derived Column transformation referencing the User::RowCount variable]
6.  Finally, add the flat file destination and configure it to wherever you want the file to be created.  Configure the mappings to include only the RowCount; exclude the other columns:
[screenshot: Flat File Destination mappings with only RowCount included]
7.  Run the package.  The final product will look like this:
[screenshot: the completed data flow]
8.  Check the output file.  Oh no!  There are 188 zeros in it!
So what went wrong?
1.  We still have 188 rows.  To get this down to a single row carrying the value of the RowCount variable, we could add the Row Sampling transform, roughly the equivalent of the T-SQL TOP command, and set the row sample size to 1.
2.  But we still have a problem: the row count is still zero!  The variable set by the ROWCOUNT transformation does not get set until the last row has passed through the data flow.  In other words, the user variable is not usable until the data flow exits.  Please reference the MSDN page for the ROW COUNT transform for details.
Either of those 2 issues alone might seem like a show stopper.  Not quite, though, and the learning process is working in all its glory.
In step 2, we will modify this package to write the row count we are looking for to the flat file.

Download a file from the web using SSIS

There are some tasks, no matter how seemingly simple, I just would not want to have to do again.  Today’s post is about one of them…
At work we use a data cleansing tool purchased from the USPS (Post Office) called NCOA (National Change of Address).  All of the big list brokers use NCOA to update old addresses and keep their lists as current as possible.
One of the requirements for NCOA is a daily delete file that is used to make minor changes to the local NCOA database.  As an NCOA service bureau we are licensed, and even required, to capture the DAILYDEL.DAT file by downloading it from the USPS's web site.
Since no one in their right mind is going to take on the task of downloading a file by hand every day, it's curious why the USPS doesn't just issue a script to go get it.  So I'll share my code with anyone that needs it.
To be fair, I figured this one out with the support of numerous posts from other people.  Since it's been a while since I got this running, I cannot be specific as to who you are.
So here’s the objective of the task:   Go to the USPS web site and download a file every day.   The website URL and the file name do not change.  Pretty simple…
Unfortunately SSIS doesn't have a native transform for HTTP.  As unbelievable as that is, the only way I could figure out how to do it was with an EXECUTE SCRIPT task.
Here’s the code:
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic
' The ScriptMain class is the entry point of the Script Task.
Imports System
Imports System.IO
Imports System.Text
'Imports System.Windows.Forms
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
Public Sub Main()

        ' Get the unmanaged connection object.
        Dim nativeObject As Object = Dts.Connections("HTTP Connection to USPS").AcquireConnection(Nothing)

        ' Create a new HTTP client connection
        Dim connection As New HttpClientConnection(nativeObject)

        ' Save the file from the connection manager to SBDQS01
        Dim filename As String = "\\dqs01\NCOALink\DAILYDEL.DAT"
        connection.DownloadFile(filename, True)

        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class
If you need to capture the NCOA DAILYDEL.DAT file just grab this code, drop it into an EXECUTE SCRIPT task, change the path to wherever you want to store the file, and voila, you're done.
I just scheduled a SQL Server Agent job to go get the file every night just before midnight (everything starts at midnight, right?).  The download finishes in about 5 seconds.