Delete Unused Rows in SQL Server

Recently, I’ve been developing a SQL Server Data Warehouse solution where I needed to remove all the “unused” rows from particular dimension tables, that is, rows that weren’t referenced by a foreign key relationship.

In itself, this isn’t particularly tricky. I had code that did something like this (using the AdventureWorks2008 sample database):

DELETE p
FROM [Person].[Address] p
WHERE NOT EXISTS (
SELECT 1 FROM [Person].[BusinessEntityAddress] f
WHERE f.[AddressID] = p.[AddressID])

I added a NOT EXISTS clause for each foreign key relationship and table.

Now, as you would expect, as the number of tables and relationships grew, this became cumbersome and prone to error. I reasoned that SQL Server “knows” what can and can’t be deleted and if you try to delete a row that is referenced from another table will report an error such as:

The DELETE statement conflicted with the REFERENCE constraint “FK_BusinessEntityAddress_Address_AddressID”

As a result, I developed a script that will check the foreign key relationships on a given table and only attempt to delete the rows that are not referenced. If tables and relationships are added at a later date, I no longer needed to worry about updating my code to take this into account.

The script (for SQL Server 2008) is shown here:

-- Set the name of the schema and table here
DECLARE @SCHEMA sysname
SET @SCHEMA = 'Person'
DECLARE @TABLE sysname
SET @TABLE = 'Address';
 
-- Determine unused rows for the above specified table
DECLARE @SQL nvarchar(max);
WITH ForeignKeys AS
(
SELECT
	ROW_NUMBER() OVER(ORDER BY fk.object_id) as RowNumber,
	SCHEMA_NAME(fk.schema_id) as ReferencingSchemaName,
	OBJECT_NAME(fk.parent_object_id) AS ReferencingTableName,
	COL_NAME(col.parent_object_id, col.parent_column_id) AS ReferencingColumnName,
	COL_NAME(col.referenced_object_id, col.referenced_column_id) AS ReferencedColumnName
FROM sys.foreign_keys AS fk
INNER JOIN sys.foreign_key_columns AS col
	ON fk.object_id = col.constraint_object_id
WHERE fk.referenced_object_id = OBJECT_ID(@SCHEMA + '.' + @TABLE)
	AND is_disabled = 0
)
SELECT @SQL = ISNULL(@SQL, 'DELETE p FROM [' + @SCHEMA + '].[' + @TABLE + '] p WHERE 1=1')
	+ ' AND NOT EXISTS (SELECT 1 FROM [' + ReferencingSchemaName + '].['+ ReferencingTableName + '] f' + cast(RowNumber as varchar)
	+ ' WHERE f' + cast(RowNumber as varchar) + '.[' + ReferencingColumnName + '] = p.[' + ReferencedColumnName + '])'
FROM ForeignKeys
EXEC sp_ExecuteSQL @SQL

Currently, it is only suitable for tables that are referenced using a single column key. I’ll expand it to handle multiple columns in another post.

Leave a Reply

Your email address will not be published. Required fields are marked *