
Remove TDE completely. Really?

I must be getting old, two blog posts in the same year!🙂

Anyway, I did some additional testing as part of a TDE (Transparent Data Encryption, that is) implementation and I noticed a couple of interesting things when you want to remove TDE completely from the server:

  • When checking sys.databases, tempdb is shown as not encrypted (is_encrypted = 0, not 1, as one would expect).
    You will get the correct status, though, under sys.dm_database_encryption_keys (see the quick check after this list).
  • A simple right-click and restart of the service would not work in this case. One must stop and then start the SQL Server service in order for TDE to be completely removed.
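
Here's the quick check I used to compare the two (a minimal sketch; per Books Online, encryption_state = 3 means encrypted):

SELECT	d.[name],
	d.[is_encrypted],		-- reports 0 for tempdb here
	k.[encryption_state]	-- still reports 3 (encrypted) until the service is stopped and started
FROM sys.databases d
LEFT JOIN sys.dm_database_encryption_keys k ON k.[database_id] = d.[database_id]
WHERE d.[name] = 'tempdb';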

As a side note, I'm on SQL Server 2012 SP2; I did not find the time to check other versions.

How to remove TDE, you might wonder? Well, the lovely guys at simple-talk have the answer for you, so I'm not going to repeat it here.

Best regards,
io

To OR or …?

Yep, a year later, and here we go, posting a quick blog again.

A colleague of mine pinged me on Skype about a query that was running slow. Well, "slow" means it ran for 90 seconds and then it timed out.

Here is the little repro of the challenge:


USE [AdventureWorks2012]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[test](
[ProductID] [INT] NOT NULL,
[FirstDescription] [INT] NULL,
[SecondDescription] [INT] NULL,
CONSTRAINT [PK_ProductID_test] PRIMARY KEY CLUSTERED
(
[ProductID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

I know, I’m too lazy NOT to generate the table from SSMS🙂.

Moving on, generate some data for the table – I regularly use SQL Data Generator. Again, lazy-lazy, I know…
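
If you don't have the tool at hand, a quick and dirty population script along these lines works too (just a sketch; the row count and value ranges are arbitrary):

-- Insert 100,000 rows with pseudo-random values; roughly 1 in 10 FirstDescription values is NULL
;WITH n AS
(
	SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
	FROM sys.all_columns a
	CROSS JOIN sys.all_columns b
)
INSERT INTO [dbo].[test] ([ProductID], [FirstDescription], [SecondDescription])
SELECT	rn,
	CASE WHEN rn % 10 = 0 THEN NULL ELSE ABS(CHECKSUM(NEWID())) END,
	ABS(CHECKSUM(NEWID()))
FROM n;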

Now, moving on to the actual code:


DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SET STATISTICS IO, TIME ON

SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250 -- your own value here
OR FirstDescription IS NULL
AND SecondDescription = 84830679; -- your own value here

I made sure I had a cold cache, then I simply selected all columns where the first description column has a particular value, or the first description column is NULL and the second column has a particular value.

Nothing magic here.

And the results with STATISTICS IO and TIME set to ON:


SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 2 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(2 row(s) affected)
Table 'test'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 35 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

And now for the fun part, the execution plan:

[Figure: execution plan 1]

So we're seeking the same index twice AND also "helping" it with a very expensive sort? Well, a picture is worth a thousand words, right?

Any other way to re-write the query? How about:


SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250
UNION ALL
SELECT *
FROM [dbo].[test]
WHERE FirstDescription IS NULL
AND SecondDescription = 84830679;

So we replaced the OR with UNION ALL. I will spare you all the details, but I will show you the new execution plan:

[Figure: execution plan 2]

So we got rid of the expensive sort! How cool is that!

As a closing remark, I urge you to understand that this worked in this particular scenario. To clear the fog: if you try the same trick with multiple ORs and UNION ALLs on the same column, the result will favor the OR:


SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250
OR FirstDescription = 1684182851
OR FirstDescription = 364563112;

SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250
UNION ALL
SELECT *
FROM [dbo].[test]
WHERE FirstDescription = 1684182851
UNION ALL
SELECT *
FROM [dbo].[test]
WHERE FirstDescription = 364563112;

The execution plans, side by side:

[Figure: execution plan 3]

Of course, I barely scratched the surface here. Please feel free to do your own testing and let the rest of us know the results.

Best regards,
io

 

Why it is important to separate data from indexes

Hello again.

You have all probably heard here and there that separating data from indexes is a good thing. Some of you even went down that road and actually did it. Which is good.

There are also a ton of articles on this topic; the ones that quickly come to mind are the ones authored by Paul Randal (here, here and here). These are most likely not the only ones.

I will try not to repeat what others have already said as to why. I will, however, try to show you why.

Here's the setup. There will be two tests: one with a one-filegroup database (PRIMARY) and one with a three-filegroup database (PRIMARY, DATA and INDEX). To make the results more visible, I've placed the databases on 2 USB sticks. The TLOG was placed on the local drive, to avoid skewing the results with the I/O related to it.

The test itself will have the following steps:

  • create the db and the tables
  • populate the tables
  • start PerfMon
  • run random scripts (well, pseudo-random🙂, you’ll see )
  • cleanup

While doing the above, I'll use a simple PerfMon session to capture the disk response times.

So let's get started. We'll create 3 tables, same schema for all 3. We'll populate each table and then run some scripts against them. You can download the scripts here. You might want to adjust the paths and the sizes of the files. In the script that populates the tables, you can change the number of records inserted by changing the number of batches (GO 5000).
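
For reference, the multi-filegroup database looked roughly like this (a sketch only; the file paths and sizes are placeholders, the downloadable script has the real ones):

CREATE DATABASE [MultiFG]
ON PRIMARY
	( NAME = N'MultiFG_sys',   FILENAME = N'C:\SQLData\MultiFG_sys.mdf',   SIZE = 256MB ),
FILEGROUP [DATA]
	( NAME = N'MultiFG_data',  FILENAME = N'F:\SQLData\MultiFG_data.ndf',  SIZE = 2GB ),
FILEGROUP [INDEX]
	( NAME = N'MultiFG_index', FILENAME = N'H:\SQLData\MultiFG_index.ndf', SIZE = 2GB )
LOG ON
	( NAME = N'MultiFG_log',   FILENAME = N'C:\SQLLogs\MultiFG_log.ldf',   SIZE = 1GB );
GO

-- Tables then go to [DATA] and nonclustered indexes to [INDEX], for example:
-- CREATE TABLE dbo.t1 (...) ON [DATA];
-- CREATE NONCLUSTERED INDEX ix_t1 ON dbo.t1 (...) ON [INDEX];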

To run the scripts, I've used SQLQueryStress (oldie but goldie – available here). The stored procedure I've used generates a list of scripts from a given folder (see the code) and, based on a randomly generated number, executes one of them. Depending on the number of scripts, you might want to adjust the modulo operand (in my case 12, as I have 12 scripts). Please also note that the stored procedure called by SQLQueryStress uses xp_cmdshell. Make sure this is enabled before running it. Or, if this is an issue, feel free to change the code to use PowerShell, for example. Alternatively, you can adjust the whole process to be dynamic code stored in a temp table of some sort (copy and paste the workload scripts into a VARCHAR(MAX) column).
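
The gist of that stored procedure, heavily simplified (a sketch, not the actual code; the folder path and the sqlcmd call are my assumptions):

-- List the workload scripts, pick one pseudo-randomly, run it
DECLARE @scripts TABLE (id INT IDENTITY(1,1), script_name VARCHAR(512));

INSERT INTO @scripts (script_name)
EXEC master..xp_cmdshell 'dir /b C:\Workload\*.sql';

DELETE FROM @scripts WHERE script_name IS NULL;		-- xp_cmdshell returns a trailing NULL row

DECLARE @pick INT = (ABS(CHECKSUM(NEWID())) % 12) + 1;	-- 12 = number of scripts, adjust the modulo
DECLARE @cmd VARCHAR(1000);

SELECT @cmd = 'sqlcmd -S . -E -i "C:\Workload\' + script_name + '"'
FROM @scripts
WHERE id = @pick;

EXEC master..xp_cmdshell @cmd;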

And now let’s see the results. Before anything else, briefly, the setup:

  • One FG: TLOG located on the local drive and PRIMARY FG located on F: (USB stick)
  • MultiFG: TLOG and PRIMARY FG located on the local drive, DATA located on F: (USB stick) and INDEX on H: (also a USB stick)

Here’s a summary of the PerfMon data (file can be downloaded here).

[Figure: PerfMon_Summary]

Looking at the figures above, it is clear that having multiple FGs definitely helps. The average response times (Avg Disk Sec/Read and Avg Disk Sec/Write) improve considerably: the read response time improved by more than 100% and the write response time by more than 200% on the F: drive. As there is no one-filegroup data for the H: drive, there is nothing to compare against there. The max response times also improve, as can be seen in the highlighted cells.

All the PerfMon graphs can be seen here, as a gallery.

Considering that my setup was a rather simple one, in a real-life scenario where things can get complicated really quickly, this approach can help improve performance big time. Of course, as always, it is a matter of "it depends", but as a rule of thumb, if a table is going to be hammered pretty heavily, it may make sense to split the data from the indexes. Maybe having its own filegroup will help even more (other factors need to be taken into account here, like table size, usage pattern, expected fragmentation, etc.). Placing these on a LUN or on storage that is not so hammered may further help balance out the I/O. I will not go any further into this, as there are a lot of other articles on the matter (see the 3 links from the beginning of this post, for example).

So next time you work on the physical design of a database, think twice before just running the CREATE DATABASE statement or blindly using the default options. This approach can also be used when troubleshooting performance issues: one thing that should always be checked is the underlying physical design of the affected tables.

What could be done differently to test this more thoroughly? Well, one could use a server-side trace (SST) as well and correlate the 2 traces (PerfMon and SST) in SQL Profiler. This way you can also see the impact of each test script in terms of response times. Extended Events can also be used to track the activity, but cannot be correlated in Profiler as nicely as an SST (or at least I am not aware of such a feature).

In addition to all of the above, sys.dm_io_virtual_file_stats can also be queried to see the impact on the affected files from inside SQL Server. Just make sure to query it in a loop (or before and after each test), as the result set is cumulative.
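
Something along these lines works (a sketch; the database names are placeholders), taking a snapshot before and after each test and diffing the numbers:

SELECT	DB_NAME(vfs.[database_id])	AS [database_name],
	mf.[name]			AS [logical_file],
	mf.[physical_name],
	vfs.[num_of_reads],
	vfs.[io_stall_read_ms],
	vfs.[num_of_writes],
	vfs.[io_stall_write_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
INNER JOIN sys.master_files mf ON mf.[database_id] = vfs.[database_id]
				AND mf.[file_id] = vfs.[file_id]
WHERE DB_NAME(vfs.[database_id]) IN (N'OneFG', N'MultiFG');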

Another thing that can be done differently (and this is something you can try out) is to simulate the workloads separately: first only reads, then only writes and then combined. This will let you see the improvement for each I/O pattern.

Another interesting thing to watch is the change in the I/O graphs when the tables are populated. But I’ll let you discover that for yourselves.

Enjoy.

Reporting on AutoGrowth Events

We all know that autogrowth events are bad, for a lot of reasons. For starters, they increase the number of VLFs (virtual log files) in the transaction log. Second, depending on your settings, they may also slow down your server if they happen often enough. Third, they mean that you, as a DBA, missed something in the configuration of the database.

So, how can we see these events?

Well, there’s Brent Ozar’s sp_Blitz which you can use to check for this and a whole lot of other issues. Kendra Little has a nice article on this, here.

In addition to that, if you fancy creating your own, one can relatively quickly build a mechanism to notify you when such an event is detected. Well, not real-time…

Here’s how I did it. You can change the object names, the database names, schema name any way you want. Just make the required changes to the stored procedure as well.

The script I used (the one reading the entries from the default trace) has been available for quite a while on various blogs, two of them being Aaron Bertrand's and Tibor Karaszi's. Just keep in mind that the default trace is marked as a deprecated feature in SQL 2014.

I started by creating a table to log these events. Having the events logged may also help in troubleshooting some of the processes that run on your servers. For example, it may show you frequent tempdb autogrowth events; you can then check the active processes, try to figure out which process is causing them and address the issue there (is there a tempdb spill, is tempdb used when other methods could be used, etc.). Another example would be autogrowth events affecting transaction logs: first, you will have to address the increased number of VLFs, and second, you'll have to identify what is causing the growth and see whether it's a TLOG misconfiguration or a process running a really long transaction which can be changed.

USE [DBA]
GO

IF OBJECT_ID('[monitor].[AutoGrowthEvents]') IS NULL
BEGIN
	CREATE TABLE [monitor].[AutoGrowthEvents]
	(	[server_name] [varchar](128) NULL,
		[event_datetime] [datetime] NULL,
		[event_id] INT,
		[event_name] [varchar](50) NULL,
		[database_id] INT,
		[database_name] [varchar](128) NULL,
		[logical_filename] [varchar](256) NULL,
		[physical_name] [varchar](256) NULL,
		[current_size_mb] [float] NULL,
		[is_percent_growth] [varchar](3) NULL,
		[growth_setting] [float] NULL,
		[growth_size_mb] [float] NULL,
		[event_duration_ms] [float] NULL,
		[reported] BIT DEFAULT(0)
	) ON [DATA]

	CREATE CLUSTERED INDEX [CIX_servername_eventid_databaseid_logicalname_eventdatetime] ON [monitor].[AutoGrowthEvents]
	(	[server_name],
		[event_id],
		[database_id],
		[logical_filename],
		[event_datetime],
		[growth_size_mb]
	)
	WITH (	PAD_INDEX = OFF,
			STATISTICS_NORECOMPUTE = OFF,
			ALLOW_PAGE_LOCKS = ON,
			ALLOW_ROW_LOCKS = ON,
			ONLINE = ON,
			SORT_IN_TEMPDB = ON
	) ON DATA
END

Then, I created a stored procedure that reads the default trace files and looks for EventClasses 92 (Data File Autogrowth) and 93 (Log File Autogrowth). Once the stored procedure was created, I set up a job that runs it every hour.

USE [DBA]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[dba_CheckAutoGrowEvents]
AS
BEGIN
	DECLARE @filename NVARCHAR(1000);
	DECLARE @bc INT;
	DECLARE @ec INT;
	DECLARE @bfn VARCHAR(1000);
	DECLARE @efn VARCHAR(10);
	DECLARE @operator VARCHAR(100); -- email distribution list
	DECLARE @profile	VARCHAR(100)
	DECLARE @ReportHTML  NVARCHAR(MAX);
	DECLARE @Subject NVARCHAR (250);

	DECLARE @FQN	VARCHAR(150)
	DECLARE @Domain VARCHAR(100)

	EXEC master.dbo.xp_regread 'HKEY_LOCAL_MACHINE', 'SYSTEM\CurrentControlSet\services\Tcpip\Parameters', N'Domain',@Domain OUTPUT   
	SELECT @FQN = CAST(SERVERPROPERTY('MachineName') AS VARCHAR) + '.' + @Domain 

	-- Set email distribution list value
	SET @operator = 'dba@corporatemail.local'
	SET @profile  = 'Email Profile'

	-- Get the name of the current default trace
	SELECT @filename = CAST(value AS NVARCHAR(1000))
	FROM ::fn_trace_getinfo(DEFAULT)
	WHERE traceid = 1 AND property = 2;

	-- rip apart file name into pieces
	SET @filename = REVERSE(@filename);
	SET @bc = CHARINDEX('.',@filename);
	SET @ec = CHARINDEX('_',@filename)+1;
	SET @efn = REVERSE(SUBSTRING(@filename,1,@bc));
	SET @bfn = REVERSE(SUBSTRING(@filename,@ec,LEN(@filename)));

	-- set filename without rollover number
	SET @filename = @bfn + @efn

	INSERT INTO [DBA].[monitor].[AutoGrowthEvents]
	SELECT 
		CAST(@FQN AS VARCHAR(128)) AS [server_name],
		CAST(ftg.[StartTime] AS DATETIME) AS [event_datetime], 
		CAST(te.[trace_event_id] AS INT) AS [event_id],
		CAST(te.[name] AS VARCHAR(50)) AS [event_name], 
		CAST(ftg.[DatabaseID] AS INT) AS [database_id],
		CAST(DB_NAME(ftg.[DatabaseID]) AS VARCHAR(128)) AS [database_name], 
		CAST(ftg.[FileName] AS VARCHAR(256)) AS [logical_filename], 
		CAST((SELECT [physical_name] FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS VARCHAR(256)) AS [physical_name],
		CAST((SELECT ([size] * 8) / 1024. FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS FLOAT) AS [current_size_mb],
		CAST((SELECT SUBSTRING('No Yes',3 * [is_percent_growth] + 1, 3) FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS VARCHAR(3)) AS [is_percent_growth],
		CAST((SELECT CASE WHEN [is_percent_growth] = 1 THEN [growth] ELSE ([growth] * 8) / 1024. END FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS FLOAT) AS [growth_setting],
		CAST((ftg.IntegerData*8)/1024.0 AS FLOAT) AS [growth_size_mb], 
		CAST((ftg.duration/1000) AS FLOAT) AS [event_duration_ms],
		0 AS [reported]
	FROM ::fn_trace_gettable(@filename, DEFAULT) AS ftg 
	INNER JOIN sys.trace_events AS te ON ftg.EventClass = te.trace_event_id  
	LEFT JOIN [DBA].[monitor].[AutoGrowthEvents] age ON age.[server_name] = @FQN
													AND age.[event_id] = ftg.[eventclass] 
													AND age.[database_id] = ftg.[DatabaseID]
													AND age.[logical_filename] = ftg.[FileName]
													AND age.[event_datetime] = ftg.[StartTime]
													AND age.[growth_size_mb] = (ftg.[IntegerData] * 8) / 1024.

	WHERE (		EventClass = 92  -- Data File Auto-grow
			OR	EventClass = 93) -- Log File Auto-grow 
		--AND StartTime > DATEADD(hh,-1,GETDATE()) -- Less than 1 hour ago
		AND (	age.[server_name] IS NULL 
			AND age.[event_id] IS NULL
			AND age.[database_id] IS NULL
			AND age.[logical_filename] IS NULL
			AND age.[event_datetime] IS NULL
			AND age.[growth_size_mb] IS NULL
		)

	IF @@ROWCOUNT > 0
	BEGIN
		/* Any events that occurred were added to the local collection */
		SET @ReportHTML = '<html><body style="font-family:Calibri;font-size:10pt;">' +
		N'<H2>' + N'Auto-grow Events for ' + @FQN + 
		CASE WHEN SERVERPROPERTY('InstanceName') IS NULL 
			THEN ''  
			ELSE N'\' +  CAST(SERVERPROPERTY('InstanceName') AS NVARCHAR(128)) 
		END +
		N'</H2>' +
		N'These are the auto-growth events that occurred in the last hour for databases mentioned in the table below. If log shipping is enabled for any of the mentioned databases, please check log shipping.' + 
		N'<table border="1" style="font-family:Calibri;font-size:10pt;">' +
		N'<tr><th>Event DateTime</th>' +
		N'<th>Event Name</th>' +
		N'<th>Database Name</th>' +
		N'<th>File Name (path)</th>' +
		N'<th>Growth in MB</th>' +
		N'<th>Duration in MS</th></tr>' +
		CAST((	SELECT 
					td = [event_datetime], '',
					td = [event_name], '',
					td = [database_name], '',
					td = [logical_filename] + '(path=' + [physical_name] + ')', '',
					td = [growth_size_mb], '', 
					td = [event_duration_ms]
				FROM [DBA].[monitor].[AutoGrowthEvents]
				WHERE [reported] = 0
				ORDER BY [event_datetime] 
				FOR XML PATH('tr'), TYPE 
			) AS NVARCHAR(MAX) ) +
		N'</table></body></html>' ;
    
		UPDATE [DBA].[monitor].[AutoGrowthEvents]
		SET [reported] = 1
		WHERE [reported] = 0

		-- Build the subject line with server and instance name
		SET @Subject = 'ALERT: (WARNING) Auto-grow Events in Last hour on ' + CAST(SERVERPROPERTY('ServerName') AS NVARCHAR(128))

		IF @ReportHTML <> ''
		BEGIN
			-- Send email to distribution list.     
			EXEC msdb.dbo.sp_send_dbmail 
				@profile_name = @profile, 
				@recipients=@operator,
				@subject = @Subject,  
				@body = @ReportHTML,
				@body_format = 'HTML'
		END; -- If there were autogrowth events to report
	END
END

The stored procedure, though, does a trick. Let's say your workload is not always very heavy and you have some quiet(er) times; in those quieter times auto-growth events are unlikely. Now let's say your workload increases suddenly overnight: if the job simply read the default trace on every run, you could end up sending alerts again for growth events that were already reported. And here's where the log table comes in handy.

The trick the stored procedure does is to compare the data read from the trace file against the data from previous runs. If it finds an auto-growth event that was already captured, it discards it. Any new event, however, is recorded and then an email is sent. The email is formatted as HTML, so it looks nice🙂

Creating the job should not be that big of a deal, right?
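
Still, for completeness, here's a minimal sketch of the hourly job (the job name and schedule details are placeholders; adjust to your own standards):

USE [msdb];
GO
EXEC dbo.sp_add_job         @job_name = N'DBA - Check AutoGrowth Events';
EXEC dbo.sp_add_jobstep     @job_name = N'DBA - Check AutoGrowth Events',
                            @step_name = N'Run dba_CheckAutoGrowEvents',
                            @subsystem = N'TSQL',
                            @database_name = N'DBA',
                            @command = N'EXEC [dbo].[dba_CheckAutoGrowEvents];';
EXEC dbo.sp_add_jobschedule @job_name = N'DBA - Check AutoGrowth Events',
                            @name = N'Every hour',
                            @freq_type = 4,            -- daily
                            @freq_interval = 1,
                            @freq_subday_type = 8,     -- hours
                            @freq_subday_interval = 1; -- every 1 hour
EXEC dbo.sp_add_jobserver   @job_name = N'DBA - Check AutoGrowth Events';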

Fine-tuning the process
Once you have the process up and running, you can correlate / check your process against the SSMS built-in report called "Disk Usage" (see the post from Kendra Little I mentioned at the beginning of this article). This report shows the auto-growth events for the selected database. If you see events there that are not reported by your process, you should run the job more often; it's very likely that the default trace files were rolled over between two runs of your job.

Enjoy.

PS: Don’t forget to add a process that prunes this table every now and then.🙂
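
Something as simple as this would do (a sketch; the 90-day retention is just an example):

DELETE FROM [DBA].[monitor].[AutoGrowthEvents]
WHERE [reported] = 1
	AND [event_datetime] < DATEADD(DAY, -90, GETDATE());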

The joy of FILEGROUP_NAME (and probably others as well)

Part of my script collection is a script that returns the details of a table: storage, metadata (sys.objects), partitioning, etc. I had been using this script for quite a while and it ran OK. A few days ago, however, after I disabled an index on a partitioned table, it failed and the error was a bit puzzling:

Msg 220, Level 16, State 1, Line 8
Arithmetic overflow error for data type smallint, value = 65601.

An excerpt of the whole script that failed is here (adapted to use a test database I built with this script – remove the .DOC extension):

USE [TestFG]

DECLARE @tablename VARCHAR(256), @schemaname VARCHAR(64)

SET @tablename = 'test_fg'
SET @schemaname = 'dbo'

/* physical storage */
SELECT 'Physical storage info' AS Category

;WITH spaceused AS
(
SELECT DISTINCT
	COALESCE(f.[name], FILEGROUP_NAME(ds.[data_space_id])) AS fg_name,
	OBJECT_SCHEMA_NAME(p.[object_id], DB_ID()) + '.' + OBJECT_NAME(p.[object_id]) AS table_name,
	OBJECTPROPERTY(p.[object_id],'IsMSShipped') AS system_object,
	i.[name] AS index_name,
	i.[type_desc] AS index_type,
	COALESCE(ps.[name],'unpartitioned') AS ps_name,
	COALESCE(pf.[name],'unpartitioned') AS pf_name,
	/* number of indexes, excluding the clustered index */
	COALESCE((SELECT COUNT(1) FROM sys.indexes WHERE [index_id] > 1 AND [object_id] = p.[object_id]),0) AS num_of_indexes,
	(SELECT SUM([ROWS]) FROM sys.partitions WHERE [index_id] < 2 AND [OBJECT_ID] = p.[object_id]) AS num_rows,
	(SELECT SUM(reserved_page_count) * 8 FROM sys.dm_db_partition_stats WHERE [OBJECT_ID] = p.[object_id]) AS reserved_space_KB,
	(SELECT SUM(reserved_page_count) * 8 FROM sys.dm_db_partition_stats p1, sys.internal_tables it WHERE 
				it.parent_id = p1.[object_id] AND it.internal_type IN (202,204,211,212,213,214,215,216) AND p1.[object_id] = it.[object_id]
			AND p1.[object_id] = p.[object_id]) AS xml_ft_reserved_space_KB,
	(SELECT SUM(used_page_count) * 8 FROM sys.dm_db_partition_stats WHERE [OBJECT_ID] = p.[object_id]) AS used_space_KB,
	(SELECT SUM(used_page_count) * 8 FROM sys.dm_db_partition_stats p1, sys.internal_tables it WHERE 
				it.parent_id = p1.[object_id] AND it.internal_type IN (202,204,211,212,213,214,215,216) AND p1.[object_id] = it.[object_id]
			AND p1.[object_id] = p.[object_id]) AS xml_ft_used_space_KB,
	(SELECT SUM(
			CASE
				WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
				ELSE lob_used_page_count + row_overflow_used_page_count
			END
			) * 8 FROM sys.dm_db_partition_stats WHERE [OBJECT_ID] = p.[object_id]) AS data_space_KB
FROM    sys.partitions p WITH(NOLOCK) 
LEFT JOIN sys.indexes i WITH(NOLOCK)  ON p.object_id = i.object_id AND p.index_id = i.index_id
LEFT JOIN sys.data_spaces ds WITH(NOLOCK)  ON i.data_space_id = ds.data_space_id
LEFT JOIN sys.partition_schemes ps WITH(NOLOCK)  ON ds.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_functions pf WITH(NOLOCK)  ON ps.function_id = pf.function_id
LEFT JOIN sys.destination_data_spaces dds with(nolock)  ON dds.partition_scheme_id = ds.data_space_id AND p.partition_number = dds.destination_id
LEFT JOIN sys.filegroups AS f WITH(nolock) ON dds.data_space_id = f.data_space_id
WHERE 
	/* Only get clustered tables / heaps */
	p.[index_id] <= 1

)
SELECT 
	fg_name,
	CAST((	SELECT	SUM(([size] * 8) / 1024.) AS [fg_size_MB],
					SUM((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) AS [fg_used_MB],
					SUM(([size] * 8) / 1024.) - SUM((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) AS [fg_free_space_MB],
					100 * (SUM((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) / (SUM(([size] * 8) / 1024.) )) AS [fg_percent_used],
					COUNT([file_id]) AS [fg_number_of_files]
			FROM sys.database_files df
			WHERE FILEGROUP_NAME([data_space_id]) = fg_name
			FOR XML PATH(''), ROOT('filegroup')) AS XML) AS [fg_space_details],
	CAST((	SELECT	[name] AS [logical_name], 
					[physical_name],
					([size] * 8) / 1024. AS [size_MB],
					(FILEPROPERTY([name],'SpaceUsed') * 8) / 1024. AS [used_MB],
					100 * ((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) / (([size] * 8) / 1024. ) AS [percent_used],
					(([size] * 8) / 1024. ) - ((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) AS [free_space_MB]
			FROM sys.database_files df
			WHERE FILEGROUP_NAME([data_space_id]) = fg_name
			ORDER BY [name]
			FOR XML PATH('file'), ROOT('datafiles')) AS XML) AS data_files_status,
	table_name,
	num_of_indexes,
	system_object,
	ps_name,
	pf_name,
	num_rows,
	reserved_space_KB + COALESCE(xml_ft_reserved_space_KB,0) AS reserved_space_KB,
	used_space_KB + COALESCE(xml_ft_used_space_KB,0) AS used_space_KB,
	data_space_KB,
	/* index_space_KB = used_space_KB - data_space_KB */
	CASE
		WHEN (used_space_KB + COALESCE(xml_ft_used_space_KB,0)) > data_space_KB THEN (used_space_KB + COALESCE(xml_ft_used_space_KB,0)) - data_space_KB
		ELSE 0
	END AS index_space_KB
FROM spaceused
WHERE 
	1=1
	/* exclude system objects */
	AND system_object = 0
	AND table_name = @schemaname + '.' + @tablename
ORDER BY fg_name, table_name

When I saw the error message, I immediately thought of something related to partitioning, but when I checked the joining columns' datatypes, all were INT, as I knew they should be. So what was wrong?

I commented out all the columns and added them back one by one; when I reached the one for the filegroup name, it failed again with the same error. I checked sys.data_spaces and data_space_id was INT. Again, what was wrong? And then I checked the FILEGROUP_NAME function itself. Books Online says:

Syntax:
FILEGROUP_NAME ( filegroup_id )

filegroup_id
Is the filegroup ID number for which to return the filegroup name. filegroup_id is smallint.

Remarks:
filegroup_id corresponds to the data_space_id column in the sys.filegroups catalog view.

That's weird… data_space_id in sys.filegroups is INTEGER. If filegroup_id corresponds to data_space_id, why would the FILEGROUP_NAME function expect it as SMALLINT?

So let’s test it.
Scenario 1 – partitioned table, all indexes enabled.
Script runs, no errors.

Scenario 2 – partitioned table, 1 index disabled, script looks for all indexes.
Script fails.

The workaround is simple: exclude all disabled indexes. But the root cause is still there: the inconsistency between the data type expected by this function and the data type of its source column. And if someone would like to list all indexes and just flag the disabled ones, they will have no other choice but to script them separately. The first time, the script worked because a filegroup name was found in sys.filegroups and the FILEGROUP_NAME function was never used. But for the disabled index there was no filegroup retrieved, so to get the data for that part of the script it fell back to the FILEGROUP_NAME function, which received a data_space_id value too large for a SMALLINT.
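
For the simpler listing case, one way to sidestep the function entirely is to join sys.data_spaces yourself (a sketch against the test table used above; a partition scheme name comes back instead of a filegroup name for partitioned indexes):

SELECT	ix.[name] AS [index_name],
	ix.[is_disabled],
	ds.[name] AS [data_space_name],		-- filegroup OR partition scheme name
	ds.[type_desc]
FROM sys.indexes ix
LEFT JOIN sys.data_spaces ds ON ds.[data_space_id] = ix.[data_space_id]
WHERE ix.[object_id] = OBJECT_ID('dbo.Test_FG');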

The same error is raised when you try a simpler version, but still against a partitioned table:

USE [TestFG]

SELECT 
	FILEGROUP_NAME(ix.[data_space_id]) AS [fg_name] 
FROM sys.indexes ix
WHERE [object_id] = OBJECT_ID('dbo.Test_FG')

BOL states the same for all SQL versions, up to 2014.

So, is this a bug by mistake or by design?

Aggregation and partitioned tables

One of my main tasks is to keep a close eye on performance. Every time we have a major release, I have to start looking at performance, find the main culprits and fix them. Well, not just me, but work with the developers and find a solution to the issues at hand. And our databases are rather large: historical databases are in the terabytes, current databases are in the hundreds of gigs, if not terabytes as well for a few of them.

Statistically (in my work), partition elimination in all its flavors (aggregating partitioned tables, etc.) is one of the recurring patterns in the offending queries I have found over the past few months. Actually, it's the lack of it🙂. Now, what's so special about it? Well, it helps keep I/O down to an acceptable level by not reading more data pages than we actually need. In fact, there are a few other complementary methods, like proper (SARGable) WHERE clauses which can be pushed down by the query optimizer, and so on and so forth.

Working on a query a few weeks ago, I came across a scenario where partition elimination could not be easily used. The task at hand was to get the MAX value of a column without being able to filter it in any way. Similar to this one:

SET @variable = CONVERT(NVARCHAR(100), (SELECT MAX([column1]) FROM dbo.Table1 WITH (NOLOCK)))

Apart from the obvious issues with the above statement, the MAX value was retrieved all right, but the problem was that it had to scan the whole table (I couldn't add any WHERE clause to it). The fact that the table was partitioned on a datetime column made me think of ways of somehow getting the partition boundaries and using partition elimination. [Column1] was an integer value which only increases, so my initial thought was, of course, that the MAX value would be among the last ones inserted. The table was also one of the biggest we have in that database, heavily read from and written into, and due to some constraints I wasn't able to add any index that would help.

My first attempt was to limit the search to only a few days' worth of data. There was an improvement: it read far fewer records (39K as opposed to 43 million) and it scanned only the last few partitions, but business rules made this approach not feasible.

SELECT @variable = MAX([Column1]) FROM dbo.Table1 WHERE [DateTimeColumn] >= @CutoffDate

So how would I go through the whole table without it being so expensive? One way would be the direct way, using a WHERE clause containing the required criteria. But that was already deemed unfeasible. And then there's an indirect way, which has limited applicability, but it's a beautiful trick, especially when the direct way is difficult to achieve (like in my example). I learned it from a colleague of mine, Octavian Vaida (thank you, Tavi), and because I find it really interesting, I decided to share it.

So, let's set up the test environment. For the sake of experimentation, I've swapped the columns from my work query and decided to see if the same applies when trying to get the MAX of a DATETIME column on a table partitioned on a BIGINT column.

First we create a database and then the required objects: one partition function and 2 partition schemes (one for data and one for indexes).

IF EXISTS (SELECT 1
           FROM sys.databases
           WHERE [name] = 'Test'
)
BEGIN
    DROP DATABASE [Test]
END
GO

CREATE DATABASE [Test]

The script that creates the PF and PSs was generated by this (Generate_PFs_and_PSs.sql) script, and certain sections have been removed as they are not relevant for our case. Feel free to download it (you'll have to rename it from .DOC to .SQL or .TXT), use it, change it, improve it.

USE [Test]
GO

/*
Creating partitioning objects for [test_part_elim] table:
PF          : pf_bigint_left_test_part_elim
PF datatype : bigint
PF type     : left
Data PS     : ps_primary_test_part_elim_data
Index PS    : ps_primary_test_part_elim_index
Data FG     : primary
Index FG    : primary
*/

/* Cleanup phase */
PRINT 'Cleaning phase - dropping existing objects'
IF EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_data')
BEGIN
    DROP PARTITION SCHEME [ps_primary_test_part_elim_data]
    PRINT '  ** [ps_primary_test_part_elim_data] partition scheme dropped'
END

IF EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_index')
BEGIN
    DROP PARTITION SCHEME [ps_primary_test_part_elim_index]
    PRINT '  ** [ps_primary_test_part_elim_index] partition scheme dropped'
END

IF EXISTS(SELECT 1 FROM [sys].[partition_functions] WHERE [name] = 'pf_bigint_left_test_part_elim')
BEGIN
    DROP PARTITION FUNCTION [pf_bigint_left_test_part_elim]
    PRINT '  ** [pf_bigint_left_test_part_elim] partition function dropped'
END

/* Create PF and PSs */
PRINT '(Re-)Creating objects...'
PRINT '  * Adding partition function'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_functions] WHERE [name] = 'pf_bigint_left_test_part_elim')
BEGIN
    CREATE PARTITION FUNCTION [pf_bigint_left_test_part_elim](bigint) AS RANGE left FOR VALUES
    (    0,  10000,  20000,  30000,  40000,  50000,
     60000,  70000,  80000,  90000, 100000, 110000,
    120000, 130000, 140000, 150000, 160000, 170000,
    180000, 190000, 200000, 210000, 220000, 230000,
    240000, 250000, 260000, 270000, 280000, 290000,
    300000, 310000, 320000, 330000, 340000, 350000,
    360000, 370000, 380000, 390000, 400000, 410000,
    420000, 430000, 440000, 450000, 460000, 470000,
    480000, 490000, 500000, 510000, 520000, 530000,
    540000, 550000, 560000, 570000, 580000, 590000,
    600000, 610000, 620000, 630000, 640000, 650000,
    660000, 670000, 680000, 690000, 700000, 710000,
    720000, 730000, 740000, 750000, 760000, 770000,
    780000, 790000, 800000, 810000, 820000, 830000,
    840000, 850000, 860000, 870000, 880000, 890000,
    900000, 910000, 920000, 930000, 940000, 950000,
    960000, 970000, 980000, 990000,1000000,1010000,
   1020000,1030000,1040000,1050000
    )
    PRINT '  ** [pf_bigint_left_test_part_elim] partition function created'
END

PRINT '  * Adding data partition scheme'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_data')
BEGIN
    CREATE PARTITION SCHEME [ps_primary_test_part_elim_data] AS PARTITION [pf_bigint_left_test_part_elim]
    ALL TO (    [PRIMARY]    )
    PRINT '  ** [ps_primary_test_part_elim_data] partition scheme created'
END

PRINT '  * Adding index partition scheme'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_index')
BEGIN
    CREATE PARTITION SCHEME [ps_primary_test_part_elim_index] AS PARTITION [pf_bigint_left_test_part_elim]
    ALL TO  (    [PRIMARY]    )
    PRINT '  ** [ps_primary_test_part_elim_index] partition scheme created'
END

And now, let’s create the table:

USE [Test]
GO

CREATE TABLE [test_part_elim]
(    c1    BIGINT IDENTITY(1,1),
     c2    INT    NOT NULL DEFAULT( CEILING(RAND()*1000)),
     c3    VARCHAR(100) NOT NULL DEFAULT('ABC'),
     c4    DATETIME NOT NULL DEFAULT(GETDATE())
)
ON [ps_primary_test_part_elim_data](c1)

CREATE CLUSTERED INDEX [ix_c1] ON [test_part_elim]([c1])
WITH
(   SORT_IN_TEMPDB = ON,
    ONLINE = ON,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
)
ON [ps_primary_test_part_elim_data](c1)

CREATE NONCLUSTERED INDEX [ix_c2] ON [test_part_elim]([c2],[c1])
WITH
(   SORT_IN_TEMPDB = ON,
    ONLINE = ON,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
)
ON [ps_primary_test_part_elim_index]([c1])

Second, we populate the table.

INSERT INTO [test_part_elim]([c2],[c3],[c4])
VALUES (DEFAULT, DEFAULT, DEFAULT)
GO 1050000

Third, we start testing. Let’s assume we want to get the MAX of [c4] (which is datetime) and it’s not the partitioning column. One way of doing it is:

SELECT
    MAX([c4]) AS [c]
FROM [test_part_elim]

But the execution plan doesn't look so good. When dealing with large tables, clustered index scans are rarely good news. The index scan operator cost is close to the subtree cost. The I/O statistics show these numbers:

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Let’s test the same but with MAXDOP 1.

SELECT
    MAX([c4]) AS [c]
FROM [test_part_elim]
OPTION (MAXDOP 1)

The execution plan doesn’t look any better. Same clustered index scan, same range of costs. If we look at the subtree cost for the whole statement, it shows a cost of 5.71247. The operator cost is, again, close to this value. The I/O for the serial plan is the same:

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

But what if we consider the partitions as smaller tables that can be scanned individually? Let's try this approach to get the MAX value we need:

SELECT 
    MAX(y.[c])
FROM sys.partitions p
CROSS APPLY ( SELECT TOP 1
                  MAX([c4]) AS [c]
              FROM [test_part_elim]
              WHERE p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
            ) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')

For starters, the execution plan looks totally different. If we examine it, we'll see that the subtree cost is now 0.109098. That's a dramatic change in cost, even though the script scans all 106 partitions in both cases. The operator cost also dropped considerably.

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)
Table 'Worktable'. Scan count 428, logical reads 1281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Some may argue that the costs we're seeing would drop if we had a covering index in place. So, what if we create a covering index for column [c4]? Will the query have a lower cost with the second approach? One way to find out… Let's create the index first.

CREATE NONCLUSTERED INDEX [ix_c4] ON [test_part_elim]([c4],[c1])
WITH
	(	SORT_IN_TEMPDB = ON,
		ONLINE = ON,
		ALLOW_ROW_LOCKS = ON,
		ALLOW_PAGE_LOCKS = ON
	)
ON [ps_primary_test_part_elim_index]([c1])

And let's see what's happening with the queries. The first thing I noticed is that the new index is being used.

With script 1, we have these I/O figures.

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 3255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

The logical reads have dropped by approximately 30%. The execution plan looks a bit different (we no longer get a parallel plan because of the new index), the subtree cost has dropped a bit from 5.71247 down to 4.31247 and the operator cost dropped as well, from 5.08247 down to 3.68247. It's good, but let's see the second approach.

With script 2, the I/O looks way better, at only close to 12% of the previous reads.

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)
Table 'test_part_elim'. Scan count 321, logical reads 630, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

The execution plan looks similar but check that subtree cost and index scan operator cost!

We can go even further with our tests by adding a WHERE clause on one of the non-partitioned columns, maybe even on the same [c4] we're reading. To not make this post too long, let me just say that during my tests the same improvement pattern was seen when using WHERE clauses.

Here are the scripts I've used to test (two pairs, filtering on [c1] and then on [c4]). In the execution plans, check the subtree costs and index scan operator costs and do the math yourself.

SELECT 
	MAX([c4]) AS [c]
FROM [test_part_elim]
WHERE 
	[c1] > 0 AND [c1] < 10000000


SELECT MAX(y.[c])
FROM sys.partitions p
CROSS APPLY (
	SELECT TOP 1
		MAX([c4]) AS [c]
	FROM [test_part_elim]
	WHERE 1=1
		AND p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
		AND [c1] > 0 AND [c1] < 10000000
) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')



SELECT 
	MAX([c4]) AS [c]
FROM [test_part_elim]
WHERE 
	[c4] > DATEADD(DAY, -5, GETDATE())

SELECT MAX(y.[c])
FROM sys.partitions p
CROSS APPLY (
	SELECT TOP 1
		MAX([c4]) AS [c]
	FROM [test_part_elim]
	WHERE 1=1
		AND p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
		AND [c4] > DATEADD(DAY, -5, GETDATE())
) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')

To summarize, we have the following subtree and operator costs:
[Figure: ComparisonTable]

And how do we explain this?

With method one, we're scanning all records in all partitions in one go (with or without parallelism) and this is how we get the MAX value. With method two, we're getting one MAX value for each partition and then a MAX across those per-partition MAX values is calculated. This explains why the cost dropped so considerably.

Enjoy.

Corruption she (the DBCC) wrote….

I recently received an email from a friend asking for my help, as he was facing a problem. He runs a small business in the healthcare industry and his IT needs should be, and are, minimal, if you want. And that's expected.

Of course, I asked what the problem was about and he said: “We had our systems down for quite a while and starting 2 days ago, we cannot access our database anymore.” Long story short, it looked like a database corruption of some sort. I paid him a visit and had a look through the logs. Wow… Loads of mini-dumps all over the SQL error log.

The affected database seemed OK: it was online and, from what I was able to tell, readable. I was able to read metadata and a few user tables (spot checks). But looking in the Windows logs, I found an entry that usually sends cold chills up any DBA's spine and makes them sweat instantly: error 824.
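
While there, a quick check I also like to run is against msdb's suspect_pages table, which records 823/824 and checksum incidents (a sketch; the event_type meanings are in Books Online):

SELECT	DB_NAME([database_id]) AS [database_name],
	[file_id],
	[page_id],
	[event_type],		-- 1-3 are the bad reads (823/824, bad checksum, torn page)
	[error_count],
	[last_update_date]
FROM msdb.dbo.suspect_pages
ORDER BY [last_update_date] DESC;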

I started asking for a bit of history and found out that lately the computer had been running slower and slower, and that 2 days before they'd had an incident when the computer froze and they decided to give CHKDSK a go. Now, to have the full picture: the computer was not server hardware, rather an all-in-one desktop computer playing in the big league…

Let me take a break here and explain why I wouldn't recommend having a desktop computer act as both desktop and server. The main reason: there's a human using it day in, day out, for all kinds of stuff: Excel spreadsheets, browsing the Internet, Word documents, Solitaire, you name it, it's there. Plus the antivirus. Plus the occasional freeze, when humans do what they overhear in buses or on the street: power it down and then power it up. Or reset it, if you're lucky enough to own a computer which still has a reset button.

For really small offices, such as in this case, a reasonably spec'd desktop could do the job, provided the human factor is out of the equation and the desktop is fairly reliable.

Back to the case: I started wondering how a CHKDSK could cause such havoc, because it shouldn't. And here's the gotcha. The drive (yes, the only drive in the system) had been removed, set up as a slave in another computer, and CHKDSK was able to finish its job. Hurray, job done. Well, sort of, kind of. Because after the drive was installed back in, the second SQL Server started up: bang! The problems started.

When I learned all these last details, everything started to make sense. It didn't really matter anymore why the computer was running slower and slower (eventually, I figured that out, I guess – read on); the only thing that mattered now was that the database *was* corrupted. And as it turned out later, it was corrupted big time.

Ran a DBCC CHECKDB and got these beauties back, after only 8 minutes:

DBCC results for 'DBNAME'.
Warning: NO_INDEX option of checkdb being used. Checks on non-system indexes will be skipped.
Msg 8921, Level 16, State 1, Line 10
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.
Msg 8998, Level 16, State 2, Line 10
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 5 pages from (1:695568) to (1:703655). See other errors for cause.
Msg 8998, Level 16, State 2, Line 10
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 5 pages from (1:703656) to (1:711743). See other errors for cause.
Msg 8939, Level 16, State 5, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_headerVersion == HEADER_7_0) failed. Values are 104 and 1.
Msg 8939, Level 16, State 7, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_freeData >= PageHeaderOverhead () && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8192.
Msg 8939, Level 16, State 11, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_ghostRecCnt <= m_slotCnt) failed. Values are 29811 and 0.
Msg 8939, Level 16, State 5, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_headerVersion == HEADER_7_0) failed. Values are 104 and 1.
Msg 8939, Level 16, State 7, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_freeData >= PageHeaderOverhead () && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8192.
Msg 8939, Level 16, State 11, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_ghostRecCnt <= m_slotCnt) failed. Values are 29811 and 0.
Msg 8939, Level 16, State 5, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 362258295026614272 (type Unknown), page (1:703656). Test (m_headerVersion == HEADER_7_0) failed. Values are 104 and 1.
Msg 8939, Level 16, State 7, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 362258295026614272 (type Unknown), page (1:703656). Test (m_freeData >= PageHeaderOverhead () && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8192.
CHECKDB found 7 allocation errors and 3 consistency errors not associated with any single object.
CHECKDB found 7 allocation errors and 3 consistency errors in database 'DBNAME'.

I have come across corruption, I think, twice in my career and I was lucky enough to have it fixed in almost no time. In one case some indexes got corrupted, and in the other case it was some third-tier table that we were able to rebuild completely from scratch. But this one, for me, was a first. “Master, I have a bad feeling about this”, I said to myself. A quick search on the errors showed that almost all results, if not all, pointed to a non-repairable case, where the only salvation would be a reliable backup.

Yes, you guessed it, there was none. As in never, not since the database was created.

So, I created a backup of the database and of the tail-log (well, in the reverse order) and it was time to get my hands dirty.
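
The backups themselves were nothing fancy; roughly something like this (a sketch, names and paths made up), with CONTINUE_AFTER_ERROR letting the backups complete despite the damaged pages:

BACKUP LOG [DBNAME]
	TO DISK = N'D:\Rescue\DBNAME_taillog.trn'
	WITH NO_TRUNCATE, CONTINUE_AFTER_ERROR, INIT;

BACKUP DATABASE [DBNAME]
	TO DISK = N'D:\Rescue\DBNAME_full.bak'
	WITH COPY_ONLY, CONTINUE_AFTER_ERROR, INIT;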

I restored the backup on a virtual machine and started working. Two things I noticed were that the VM was running slower and slower and that the drive holding the SQL binaries was filling up quite quickly. Then I remembered those mini-dumps. Checking their default location: big surprise, that's what was filling up the drive. Thousands and thousands of files. Because there's no way of stopping them, I added a new drive to the VM, changed the mini-dump settings to point to that drive and let it fill up (SQL Server will not stop when that drive is full anyway, it will just roll over the files). That would explain the symptom my friend noticed in the first place; I guess the corruption had been lurking around for some time.

To stop the dumps from being generated, I switched the corrupted DB to read-only. But there's a catch with switching it to read-only, at least that's what I noticed: doing so seemed to limit my ability to read the records that were still salvageable from the corrupted tables. As soon as I switched it back to R/W, I was able to read whatever records were still accessible. I played around with this a bit and it seems to be directly correlated. Once that was out of the way, I moved on to recovering the data, or what was still recoverable.

I started by scripting out the database. Once that was done, I put together a script to export everything I could still read. This way, I managed to isolate 9 tables that were seriously impacted. The big problem was that 2 of those were at the core of the whole application and 2 more held a lot of scanned documents, which are really hard to rescan. So yeah, no pressure.

The impacted tables were still readable up to a certain point, so I took advantage of that and exported whatever records were still readable; that was pretty much everything I could salvage. Now the reconstruction work could begin.

I used the script I created when I scripted out the database and created an empty one. Once all the salvageable data was exported, I switched the damaged database to R/O. Exporting the unaffected tables was a breeze: I created a dedicated schema (call it [export]) and then I simply used SELECT * … INTO [export].[table] FROM [table] and that was it. For the affected ones, I created another schema (let's call it [staging]) and then created the required empty tables. A couple of notes here (a rough sketch follows after these notes):

  • I created the tables with no constraints whatsoever
  • I created the tables with generic column datatypes (VARCHAR(MAX)) to avoid any incompatibilities
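
As a rough illustration of the two-schema approach (schema, table and column names below are just placeholders):

-- Untouched tables: straight copy into the [export] schema
SELECT *
INTO [export].[SomeCleanTable]
FROM [dbo].[SomeCleanTable];

-- Damaged tables: a loose staging copy, no constraints, everything as VARCHAR(MAX)
-- so nothing gets rejected on the way in
CREATE TABLE [staging].[SomeDamagedTable]
(	[Col1] VARCHAR(MAX) NULL,
	[Col2] VARCHAR(MAX) NULL,
	[Col3] VARCHAR(MAX) NULL
);

INSERT INTO [staging].[SomeDamagedTable] ([Col1], [Col2], [Col3])
SELECT	CAST([Col1] AS VARCHAR(MAX)),
	CAST([Col2] AS VARCHAR(MAX)),
	CAST([Col3] AS VARCHAR(MAX))
FROM [dbo].[SomeDamagedTable];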

Once the staging tables were populated, I could start reconstructing whatever lost data was recoverable. And here everything depends on the data model, on what relations / foreign keys are in place; a lot of factors are at play. In some cases, reconstruction is not even possible.

The last step will be to import the salvaged data into the final tables. Here the data model comes into play: loading the tables in the right order, having all the FKs check out, and so on and so forth. It's going to be a tedious process, but it has to be done. Or at least tried🙂.

The final chapter of this is yet to be written: I’m at the stage where the staging tables are all populated and I have to try and reconstruct whatever data is possible. And that is a lot of manual work: studying the database diagrams, all the relationships between the tables, etc.

I also issued a list of recommendations to my friend and he was more than happy to follow them; we're now in the process of fixing things: buying new hardware for the server role, changing how things are done. It's good to see that people understand the risks, but it's sad to see it only after disaster has struck. The bad part of this story is that only 4%-5% of the impacted data was salvageable and there's no money in the world that can make up for that. The green area below shows how many records I was able to recover from the affected tables, out of how many there were. The biggest issues are with the tables in rows #1, #3, #4 and #8. All of them were important, but these ones were really all that mattered.

[Figure: RecoverPercentages]

An added bonus to the whole incident? There's another instance on that computer which, fortunately, was not affected by this incident. YET. It runs SQL Server Express, has some backups in place, but… they sit on the same hard drive. In case something goes wrong with the whole computer, both databases are doomed, no matter that the second one has some sort of backup.

Lessons learned?

No matter how small your database is, disaster *will* strike. It's not a matter of IF, it's a matter of WHEN. Just make sure you're prepared:

  • Have proper backup in place
  • Have proper backup testing process in place (because having backup is only half of the protection)
  • Have the backup files stored off of that box on another computer, at least, if not in some cloud or some NAS
  • Check your logs, SQL and Windows
  • Run periodic DBCC CHECKDBs (how frequent, another topic)
  • Make sure you’re looking for early signs of hardware failure (have hard drives with SMART capabilities or similar technologies)
  • Make sure the computer you’re using as server is as isolated in terms of access and workload as possible
  • no other activities should take place on this server other than SQL (or whatever other processes you may have, but definitely no user activities)
  • Invest in some sort of data redundancy technology (software RAID1 at least)
  • Monitor your hardware (HDD space, resource usage, etc)

All the best.