
Number of CPUs per NUMA node

A word of advice: the script assumes that you have the exact same number of CPUs on every NUMA node, WHICH MIGHT NOT BE THE CASE IN YOUR SCENARIO:

SELECT COUNT(DISTINCT memory_node_id) AS numnodes,
    (
        SELECT TOP (1)
            SUM(online_scheduler_count) OVER (PARTITION BY memory_node_id)
        FROM sys.dm_os_nodes
        ORDER BY SUM(online_scheduler_count) OVER (PARTITION BY memory_node_id) DESC
    ) AS numcores
FROM sys.dm_os_nodes
WHERE node_id < 64;
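
If you suspect that the schedulers are not evenly spread across the nodes, a quick per-node breakdown (a minimal sketch, not part of the original script) will show you the count for each node:

SELECT memory_node_id,
    SUM(online_scheduler_count) AS online_schedulers
FROM sys.dm_os_nodes
WHERE node_id < 64 -- exclude the DAC node
GROUP BY memory_node_id
ORDER BY memory_node_id;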

Remove TDE completely. Really?

I must be getting old, two blog posts in the same year! 🙂

Anyway, I did some additional testing as part of a TDE (Transparent Data Encryption, that is) implementation and I noticed a couple of interesting things when you want to remove TDE completely from the server:

  • When checking sys.databases, tempdb is shown as not encrypted (is_encrypted = 0, not 1, as one would expect).
    You will get the correct status, though, from sys.dm_database_encryption_keys (see the check query after this list).
  • A simple right-click and Restart of the service will not work in this case. One must stop and then start the SQL Server service in order for TDE to be completely removed.
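
For reference, here is a minimal sketch of the two checks (the encryption_state codes are the documented ones):

SELECT [name], is_encrypted
FROM sys.databases;

SELECT DB_NAME(database_id) AS database_name,
    encryption_state -- 1 = unencrypted, 2/4/5/6 = in progress, 3 = encrypted
FROM sys.dm_database_encryption_keys;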

As a side note, I'm on SQL Server 2012 SP2 and did not find the time to check other versions.

How to remove TDE, you might wonder? Well, the lovely guys at simple-talk have the answer for you, so I'm not going to repeat it here.

Best regards,
io

To OR or …?

Yep, a year later, here we go, posting a quick blog again.

A colleague of mine skyped me about a query that was running slow. Well, "slow" meaning it ran for 90 seconds and then timed out.

Here is the little repro of the challenge:


USE [AdventureWorks2012]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[test](
[ProductID] [INT] NOT NULL,
[FirstDescription] [INT] NULL,
[SecondDescription] [INT] NULL,
CONSTRAINT [PK_ProductID_test] PRIMARY KEY CLUSTERED
(
[ProductID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

I know, I’m too lazy NOT to generate the table from SSMS :-).

Moving further, generate some data for the table – I regularly use SQL Data Generator. Again, lazy-lazy, I know…

Now, moving on to the actual code:


DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SET STATISTICS IO, TIME ON

SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250 -- your own value here
OR FirstDescription IS NULL
AND SecondDescription = 84830679; -- your own value here

I made sure I had a cold cache, then I simply selected all columns where the first description column has a particular value, or the first description column is NULL and the second column has a particular value.

Nothing magic here.

And the results from statistics and time being set to ON:


SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 2 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(2 row(s) affected)
Table 'test'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 35 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

And now for the fun part, the execution plan:

execution plan 1

So we’re seeking the same index twice AND also “helping” it with a very expensive sort? Well, a picture is worth a thousand words, right?

Any other way to re-write the query? How about:


SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250
UNION ALL
SELECT *
FROM [dbo].[test]
WHERE FirstDescription IS NULL
AND SecondDescription = 84830679;

So we replaced the OR with UNION ALL. I will spare you all the details, but I will show you the new execution plan:

execution plan 2

So we got rid of the expensive sort! How cool is that!

As a closing remark, I urge you to understand that this worked in this particular scenario. To clear the fog: if you try the same trick with multiple ORs and UNIONs on the same column, the result will favor the OR version:


SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250
OR FirstDescription = 1684182851
OR FirstDescription = 364563112;

SELECT *
FROM [dbo].[test]
WHERE [FirstDescription] = 299002250
UNION ALL
SELECT *
FROM [dbo].[test]
WHERE FirstDescription = 1684182851
UNION ALL
SELECT *
FROM [dbo].[test]
WHERE FirstDescription = 364563112;

The execution plans, side by side:

execution plan 3

Of course, I barely scratched the surface here. Please feel free to do your own testing and let the rest of us know the results.

Best regards,
io

 

Why it is important to separate data from indexes

Hello again.

You have all probably heard here and there that separating data from indexes is good. Some of you even went down that road and actually did it. Which is good.

There are also a ton of articles on this topic; the ones that come quickly to my mind are those authored by Paul Randal (here, here and here). These are most likely not the only ones.

I will try not to repeat what others have already said about why. I will, however, try to show you why.

Here’s the setup. There will be 2 tests: one with a one-filegroup database (PRIMARY) and one with a 3-filegroup database (PRIMARY, DATA and INDEX). To make the differences more visible, I’ve placed the databases on 2 USB sticks. The TLOG was placed on the local drive, to avoid skewing the results with the I/O related to it.
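
Just to illustrate the multi-filegroup layout, here is a minimal sketch (the database name, file paths and the sample table are made up for illustration; the actual scripts are in the download mentioned below):

USE [master]
GO

CREATE DATABASE [MultiFG]
ON PRIMARY
    ( NAME = N'MultiFG', FILENAME = N'C:\SQLData\MultiFG.mdf' ),
FILEGROUP [DATA]
    ( NAME = N'MultiFG_data', FILENAME = N'F:\SQLData\MultiFG_data.ndf' ),
FILEGROUP [INDEX]
    ( NAME = N'MultiFG_index', FILENAME = N'H:\SQLData\MultiFG_index.ndf' )
LOG ON
    ( NAME = N'MultiFG_log', FILENAME = N'C:\SQLLogs\MultiFG_log.ldf' );
GO

USE [MultiFG]
GO

-- data (and the clustered index) go to DATA, nonclustered indexes go to INDEX
CREATE TABLE [dbo].[SampleTable]
(   [id]   INT IDENTITY(1,1) NOT NULL,
    [col1] VARCHAR(100) NOT NULL,
    CONSTRAINT [PK_SampleTable] PRIMARY KEY CLUSTERED ([id])
) ON [DATA];

CREATE NONCLUSTERED INDEX [IX_SampleTable_col1] ON [dbo].[SampleTable]([col1]) ON [INDEX];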

The test itself will have the following steps:

  • create the db and the tables
  • populate the tables
  • start PerfMon
  • run random scripts (well, pseudo-random :), you’ll see )
  • cleanup

While doing the above, I’ll use a simple PerfMon session to capture the disks response times.

So let’s get started. We’ll create 3 tables, with the same schema for all 3. We’ll populate each table and then run some scripts against them. You can download the scripts here. You might want to adjust the paths and the sizes of the files. In the script that populates the tables, you can change the number of records inserted by changing the number of batches (GO 5000).

To run the scripts, I’ve used SQLQueryStress (oldie, but goldie – available here). The stored procedure I’ve used generates a list of scripts from a given folder (see the code) and, based on a randomly generated number, executes one of them; a minimal sketch of the idea follows below. Depending on the number of scripts, you might want to adjust the modulo operand (in my case 12, as I have 12 scripts). Please also note that the stored procedure called by SQLQueryStress uses xp_cmdshell. Make sure this is enabled before running it. Or, if this is an issue, feel free to change the code to use PowerShell, for example. Alternatively, you can adjust the whole process to use dynamic code stored in a temp table of some sort (copy and paste the workload scripts into a VARCHAR(MAX) column).
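
The gist of the random-pick logic is roughly this (just a sketch: the folder path is a placeholder and 12 is the number of scripts; the actual stored procedure is in the download):

-- build the list of available workload scripts (requires xp_cmdshell to be enabled)
DECLARE @scripts TABLE (id INT IDENTITY(1,1), script_name NVARCHAR(260));

INSERT INTO @scripts (script_name)
EXEC master.dbo.xp_cmdshell N'dir /b C:\Workload\*.sql';

-- pick one of the 12 scripts pseudo-randomly
DECLARE @pick INT = (ABS(CHECKSUM(NEWID())) % 12) + 1;

DECLARE @file NVARCHAR(260);
SELECT @file = script_name FROM @scripts WHERE id = @pick;

-- the chosen script is then executed, e.g. by shelling out to sqlcmd via xp_cmdshell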

And now let’s see the results. Before anything else, briefly, the setup:

  • One FG: TLOG located on local drive and PRIMARY FG located on F: (USB stick)
  • MultiFG: TLOG and PRIMARY FG located on the local drive, DATA located on F (USB stick) and INDEX on H (also a USB stick)

Here’s a summary of the PerfMon data (file can be downloaded here).

PerfMon_Summary

Looking at the figures above, it is clear that having multiple FGs definitely helps. The average response times (Avg Disk Sec/Read and Avg Disk Sec/Write) improve considerably: on the F drive, the read response time improved by more than 100% and the write response time by more than 200%. As the H drive is not used in the one-filegroup case, there is no data to compare it against. The max response times also improve, as can be seen in the highlighted cells.

All the PerfMon graphs can be seen here, as a gallery:

Considering that my setup was a rather simple one, in a real-life scenario where things can get really ugly, this approach can help improve performance big time. Of course, as always, it is a matter of “it depends”, but as a rule of thumb, if a table is going to be hammered pretty heavily, it may make sense to split the data from the indexes. Maybe having its own filegroup will help even more (here some other factors will need to be taken into account, like table size, usage pattern, expected fragmentation, etc.). Placing these on a LUN or on storage that’s not so hammered may further help balance out the I/O. I will not go any further into this, as there are a lot of other articles on the matter (see the 3 links from the beginning of this post, for example).

So next time you have to work out the physical design of a database, think twice before just running the CREATE DATABASE statement or blindly using the default options. This approach can also be used when troubleshooting performance issues: one thing that should always be checked is the underlying physical design of the affected tables.

What could be done differently to test this more thoroughly? Well, one could use a server-side trace (SST) as well and correlate the 2 traces (PerfMon and SST) in SQL Profiler. This way you can also see the impact of each test script in terms of response times. Extended Events can also be used to track the activity, but they cannot be correlated in Profiler as nicely as an SST (or at least I am not aware of such a feature).

In addition to all of the above, sys.dm_io_virtual_file_stats can also be queried to see the impact on the affected files from inside SQL Server. Just make sure to query it in a loop (or snapshot it before and after), as the result set is cumulative.
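
A minimal sketch of the snapshot/delta idea (the one-minute delay is arbitrary; run the workload in between):

IF OBJECT_ID('tempdb..#io_before') IS NOT NULL DROP TABLE #io_before;

-- first snapshot of the cumulative counters
SELECT [database_id], [file_id], num_of_reads, num_of_writes,
    io_stall_read_ms, io_stall_write_ms
INTO #io_before
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:01:00'; -- let the workload run

-- second snapshot minus the first = activity during the interval
SELECT DB_NAME(vfs.[database_id]) AS [database_name],
    mf.[name] AS [logical_file],
    vfs.num_of_reads - b.num_of_reads AS reads_delta,
    vfs.num_of_writes - b.num_of_writes AS writes_delta,
    vfs.io_stall_read_ms - b.io_stall_read_ms AS read_stall_ms_delta,
    vfs.io_stall_write_ms - b.io_stall_write_ms AS write_stall_ms_delta
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN #io_before b ON b.[database_id] = vfs.[database_id] AND b.[file_id] = vfs.[file_id]
JOIN sys.master_files mf ON mf.[database_id] = vfs.[database_id] AND mf.[file_id] = vfs.[file_id]
ORDER BY write_stall_ms_delta DESC;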

Another thing that can be done differently (and this is something that you can try out) is to simulate the workloads separately: first only reads, then only writes and then combined. This will allow you to see the improvement for each I/O pattern.

Another interesting thing to watch is the change in the I/O graphs when the tables are populated. But I’ll let you discover that for yourselves.

Enjoy.

Reporting on AutoGrowth Events

We all know that autogrowth events are bad, for a lot of reasons. For starters, they increase the number of VLFs in the transaction log. Second, depending on your settings, they may also slow down your server if they happen often enough. Third, they mean that you, as a DBA, missed something in the configuration of the database.

So, how can we see these events?

Well, there’s Brent Ozar’s sp_Blitz which you can use to check for this and a whole lot of other issues. Kendra Little has a nice article on this, here.

In addition to that, if you fancy creating your own, you can relatively quickly build a mechanism that notifies you when such an event is detected. Well, not quite real-time…

Here’s how I did it. You can change the object names, the database names and the schema name any way you want. Just make the required changes to the stored procedure as well.

The script I used (the one reading the entries from the default trace) has been available for quite a while on various blogs, two of them being Aaron Bertrand’s and Tibor Karaszi’s. Just keep in mind that the default trace is marked as a deprecated feature in SQL 2014.

I started by creating a table to log these events. Having these events logged may also help in troubleshooting some of the processes that run on your servers. For example, it may show you frequent tempdb autogrowth events. You can then check the active processes, try to figure out which process is causing this and address the issue there (is there a tempdb spill, is tempdb used when other methods could be used, etc.). Another example would be autogrowth events affecting transaction logs: one, you will have to address the increased number of VLFs and, second, you’ll have to identify what is causing the growth and see whether it’s a tlog misconfiguration or a process running a really long transaction which can be changed.

USE [DBA]
GO

IF OBJECT_ID('[monitor].[AutoGrowthEvents]') IS NULL
BEGIN
	CREATE TABLE [monitor].[AutoGrowthEvents]
	(	[server_name] [varchar](128) NULL,
		[event_datetime] [datetime] NULL,
		[event_id] INT,
		[event_name] [varchar](50) NULL,
		[database_id] INT,
		[database_name] [varchar](128) NULL,
		[logical_filename] [varchar](256) NULL,
		[physical_name] [varchar](256) NULL,
		[current_size_mb] [float] NULL,
		[is_percent_growth] [varchar](3) NULL,
		[growth_setting] [float] NULL,
		[growth_size_mb] [float] NULL,
		[event_duration_ms] [float] NULL,
		[reported] BIT DEFAULT(0)
	) ON [DATA]

	CREATE CLUSTERED INDEX [CIX_servername_eventid_databaseid_logicalname_eventdatetime] ON [monitor].[AutoGrowthEvents]
	(	[server_name],
		[event_id],
		[database_id],
		[logical_filename],
		[event_datetime],
		[growth_size_mb]
	)
	WITH (	PAD_INDEX = OFF,
			STATISTICS_NORECOMPUTE = OFF,
			ALLOW_PAGE_LOCKS = ON,
			ALLOW_ROW_LOCKS = ON,
			ONLINE = ON,
			SORT_IN_TEMPDB = ON
	) ON DATA
END

Then, I created a stored procedure that reads the default trace files and looks for EventClasses 92 (Data File Auto Grow) and 93 (Log File Auto Grow). Once the stored procedure was in place, I created a job that runs it every hour.

USE [DBA]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[dba_CheckAutoGrowEvents]
AS
BEGIN
	DECLARE @filename NVARCHAR(1000);
	DECLARE @bc INT;
	DECLARE @ec INT;
	DECLARE @bfn VARCHAR(1000);
	DECLARE @efn VARCHAR(10);
	DECLARE @operator VARCHAR(100); -- email distribution list
	DECLARE @profile	VARCHAR(100)
	DECLARE @ReportHTML  NVARCHAR(MAX);
	DECLARE @Subject NVARCHAR (250);

	DECLARE @FQN	VARCHAR(150)
	DECLARE @Domain VARCHAR(100)

	EXEC master.dbo.xp_regread 'HKEY_LOCAL_MACHINE', 'SYSTEM\CurrentControlSet\services\Tcpip\Parameters', N'Domain',@Domain OUTPUT   
	SELECT @FQN = CAST(SERVERPROPERTY('MachineName') AS VARCHAR) + '.' + @Domain 

	-- Set email distrubution list value
	SET @operator = 'dba@corporatemail.local'
	SET @profile  = 'Email Profile'

	-- Get the name of the current default trace
	SELECT @filename = CAST(value AS NVARCHAR(1000))
	FROM ::fn_trace_getinfo(DEFAULT)
	WHERE traceid = 1 AND property = 2;

	-- rip apart file name into pieces
	SET @filename = REVERSE(@filename);
	SET @bc = CHARINDEX('.',@filename);
	SET @ec = CHARINDEX('_',@filename)+1;
	SET @efn = REVERSE(SUBSTRING(@filename,1,@bc));
	SET @bfn = REVERSE(SUBSTRING(@filename,@ec,LEN(@filename)));

	-- set filename without rollover number
	SET @filename = @bfn + @efn

	INSERT INTO [DBA].[monitor].[AutoGrowthEvents]
	SELECT 
		CAST(@FQN AS VARCHAR(128)) AS [server_name],
		CAST(ftg.[StartTime] AS DATETIME) AS [event_datetime], 
		CAST(te.[trace_event_id] AS INT) AS [event_id],
		CAST(te.[name] AS VARCHAR(50)) AS [event_name], 
		CAST(ftg.[DatabaseID] AS INT) AS [database_id],
		CAST(DB_NAME(ftg.[DatabaseID]) AS VARCHAR(128)) AS [database_name], 
		CAST(ftg.[FileName] AS VARCHAR(256)) AS [logical_filename], 
		CAST((SELECT [physical_name] FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS VARCHAR(256)) AS [physical_name],
		CAST((SELECT ([size] * 8) / 1024. FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS FLOAT) AS [current_size_mb],
		CAST((SELECT SUBSTRING('No Yes',3 * [is_percent_growth] + 1, 3) FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS VARCHAR(3)) AS [is_percent_growth],
		CAST((SELECT CASE WHEN [is_percent_growth] = 1 THEN [growth] ELSE ([growth] * 8) / 1024. END FROM sys.master_files WHERE [name] = ftg.[FileName] AND [database_id] = ftg.[DatabaseID]) AS FLOAT) AS [growth_setting],
		CAST((ftg.IntegerData*8)/1024.0 AS FLOAT) AS [growth_size_mb], 
		CAST((ftg.duration/1000) AS FLOAT) AS [event_duration_ms],
		0 AS [reported]
	FROM ::fn_trace_gettable(@filename, DEFAULT) AS ftg 
	INNER JOIN sys.trace_events AS te ON ftg.EventClass = te.trace_event_id  
	LEFT JOIN [DBA].[monitor].[AutoGrowthEvents] age ON age.[server_name] = @FQN
													AND age.[event_id] = ftg.[eventclass] 
													AND age.[database_id] = ftg.[DatabaseID]
													AND age.[logical_filename] = ftg.[FileName]
													AND age.[event_datetime] = ftg.[StartTime]
													AND age.[growth_size_mb] = (ftg.[IntegerData] * 8) / 1024.

	WHERE (		EventClass = 92  -- Data File Auto-grow
			OR	EventClass = 93) -- Log File Auto-grow 
		--AND StartTime > DATEADD(hh,-1,GETDATE()) -- Less than 1 hour ago
		AND (	age.[server_name] IS NULL 
			AND age.[event_id] IS NULL
			AND age.[database_id] IS NULL
			AND age.[logical_filename] IS NULL
			AND age.[event_datetime] IS NULL
			AND age.[growth_size_mb] IS NULL
		)

	IF @@ROWCOUNT > 0
	BEGIN
		/* Any Events Occur were added to the local collection */
		SET @ReportHTML = '<html><body style="font-family:Calibri;font-size:10pt;">' +
		N'<H2>' + N'Auto-grow Events for ' + @FQN + 
		CASE WHEN SERVERPROPERTY('InstanceName') IS NULL 
			THEN ''  
			ELSE N'\' +  CAST(SERVERPROPERTY('InstanceName') AS NVARCHAR(128)) 
		END +
		N'</H2>' +
		N'These are the auto-growth events that occurred in the last hour for the databases mentioned in the table below. If log shipping is enabled for any of the mentioned databases, please check log shipping.' +
		N'<table border="1" style="font-family:Calibri;font-size:10pt;">' +
		N'<tr><th>Event DateTime</th>' +
		N'<th>Event Name</th>' +
		N'<th>Database Name</th>' +
		N'<th>File Name (path)</th>' +
		N'<th>Growth in MB</th>' +
		N'<th>Duration in MS</th></tr>' +
		CAST((	SELECT 
					td = [event_datetime], '',
					td = [event_name], '',
					td = [database_name], '',
					td = [logical_filename] + '(path=' + [physical_name] + ')', '',
					td = [growth_size_mb], '', 
					td = [event_duration_ms]
				FROM [DBA].[monitor].[AutoGrowthEvents]
				WHERE [reported] = 0
				ORDER BY [event_datetime] 
				FOR XML PATH('tr'), TYPE 
			) AS NVARCHAR(MAX) ) +
		N'</table></body></html>' ;
    
		UPDATE [DBA].[monitor].[AutoGrowthEvents]
		SET [reported] = 1
		WHERE [reported] = 0

		-- Build the subject line with server and instance name
		SET @Subject = 'ALERT: (WARNING) Auto-grow Events in Last hour on ' + CAST(SERVERPROPERTY('ServerName') AS NVARCHAR(128))

		IF @ReportHTML <> ''
		BEGIN
			-- Send email to distribution list.     
			EXEC msdb.dbo.sp_send_dbmail 
				@profile_name = @profile, 
				@recipients=@operator,
				@subject = @Subject,  
				@body = @ReportHTML,
				@body_format = 'HTML'
		END; -- If there are autogrows in the last day
	END
END

The stored procedure, though, does a trick. Let’s say your workload is not always very heavy and you have some quiet(er) times. In those quieter times, auto-growth events are unlikely. Now let’s say your workload increases suddenly overnight: if you run the job too often, you may end up sending false alerts about a growth event that was already reported. And here’s where the log table comes in handy.

The trick the stored procedure does is to compare the data read from the trace file against the data from previous runs. If it finds an auto-growth event that was already captured, it discards it. Any new event, however, is recorded and then an email is sent. The email is formatted as HTML so it looks nice 🙂

Creating a job should not be that big of a deal, right?
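
Just in case, here is a rough sketch of the hourly job (job and step names are made up; adjust the schedule and target database to your environment):

USE [msdb]
GO

DECLARE @jobId UNIQUEIDENTIFIER;

EXEC dbo.sp_add_job
	@job_name = N'DBA - Check AutoGrowth Events',
	@job_id = @jobId OUTPUT;

EXEC dbo.sp_add_jobstep
	@job_id = @jobId,
	@step_name = N'Run dba_CheckAutoGrowEvents',
	@subsystem = N'TSQL',
	@database_name = N'DBA',
	@command = N'EXEC [dbo].[dba_CheckAutoGrowEvents];';

EXEC dbo.sp_add_jobschedule
	@job_id = @jobId,
	@name = N'Every hour',
	@freq_type = 4,            -- daily
	@freq_interval = 1,        -- every day
	@freq_subday_type = 8,     -- unit: hours
	@freq_subday_interval = 1; -- every 1 hour

EXEC dbo.sp_add_jobserver
	@job_id = @jobId,
	@server_name = N'(local)';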

Fine-tuning the process
Once you have the process up and running, you can correlate/check your process against the SSMS built-in report called “Disk Usage” (see the post from Kendra Little mentioned at the beginning of this article). This report shows the auto-growth events for the selected database. If you see events there that are not reported by your process, you should run the job more often; it’s very likely that the default trace files rolled over between two runs of your job.

Enjoy.

PS: Don’t forget to add a process that prunes this table every now and then. 🙂
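
Something as simple as this would do (the 3-month retention is just an assumption; keep as much history as you need):

DELETE FROM [DBA].[monitor].[AutoGrowthEvents]
WHERE [reported] = 1
	AND [event_datetime] < DATEADD(MONTH, -3, GETDATE());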

The joy of FILEGROUP_NAME (and probably others as well)

Part of my scripts collection is a script that returns the details of a table: storage, metadata (sys.objects), partitioning, etc. I had been using this script for quite a while and it ran OK. A few days ago, however, after I disabled an index on a partitioned table, it failed and the error was a bit puzzling:

Msg 220, Level 16, State 1, Line 8
Arithmetic overflow error for data type smallint, value = 65601.

An excerpt of the whole script that failed is here (adapted to use a test database I built with this script – remove the .DOC extension):

USE [TestFG]

DECLARE @tablename VARCHAR(256), @schemaname VARCHAR(64)

SET @tablename = 'test_fg'
SET @schemaname = 'dbo'

/* physical storage */
SELECT 'Physical storage info' AS Category

;WITH spaceused AS
(
SELECT DISTINCT
	COALESCE(f.[name], FILEGROUP_NAME(ds.[data_space_id])) AS fg_name,
	OBJECT_SCHEMA_NAME(p.[object_id], DB_ID()) + '.' + OBJECT_NAME(p.[object_id]) AS table_name,
	OBJECTPROPERTY(p.[object_id],'IsMSShipped') AS system_object,
	i.[name] AS index_name,
	i.[type_desc] AS index_type,
	COALESCE(ps.[name],'unpartitioned') AS ps_name,
	COALESCE(pf.[name],'unpartitioned') AS pf_name,
	/* number of indexes, excluding the clustered index */
	COALESCE((SELECT COUNT(1) FROM sys.indexes WHERE [index_id] > 1 AND [object_id] = p.[object_id]),0) AS num_of_indexes,
	(SELECT SUM([ROWS]) FROM sys.partitions WHERE [index_id] < 2 AND [OBJECT_ID] = p.[object_id]) AS num_rows,
	(SELECT SUM(reserved_page_count) * 8 FROM sys.dm_db_partition_stats WHERE [OBJECT_ID] = p.[object_id]) AS reserved_space_KB,
	(SELECT SUM(reserved_page_count) * 8 FROM sys.dm_db_partition_stats p1, sys.internal_tables it WHERE 
				it.parent_id = p1.[object_id] AND it.internal_type IN (202,204,211,212,213,214,215,216) AND p1.[object_id] = it.[object_id]
			AND p1.[object_id] = p.[object_id]) AS xml_ft_reserved_space_KB,
	(SELECT SUM(used_page_count) * 8 FROM sys.dm_db_partition_stats WHERE [OBJECT_ID] = p.[object_id]) AS used_space_KB,
	(SELECT SUM(used_page_count) * 8 FROM sys.dm_db_partition_stats p1, sys.internal_tables it WHERE 
				it.parent_id = p1.[object_id] AND it.internal_type IN (202,204,211,212,213,214,215,216) AND p1.[object_id] = it.[object_id]
			AND p1.[object_id] = p.[object_id]) AS xml_ft_used_space_KB,
	(SELECT SUM(
			CASE
				WHEN (index_id < 2) THEN (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
				ELSE lob_used_page_count + row_overflow_used_page_count
			END
			) * 8 FROM sys.dm_db_partition_stats WHERE [OBJECT_ID] = p.[object_id]) AS data_space_KB
FROM    sys.partitions p WITH(NOLOCK) 
LEFT JOIN sys.indexes i WITH(NOLOCK)  ON p.object_id = i.object_id AND p.index_id = i.index_id
LEFT JOIN sys.data_spaces ds WITH(NOLOCK)  ON i.data_space_id = ds.data_space_id
LEFT JOIN sys.partition_schemes ps WITH(NOLOCK)  ON ds.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_functions pf WITH(NOLOCK)  ON ps.function_id = pf.function_id
LEFT JOIN sys.destination_data_spaces dds with(nolock)  ON dds.partition_scheme_id = ds.data_space_id AND p.partition_number = dds.destination_id
LEFT JOIN sys.filegroups AS f WITH(nolock) ON dds.data_space_id = f.data_space_id
WHERE 
	/* Only get clustered tables / heaps */
	p.[index_id] <= 1

)
SELECT 
	fg_name,
	CAST((	SELECT	SUM(([size] * 8) / 1024.) AS [fg_size_MB],
					SUM((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) AS [fg_used_MB],
					SUM(([size] * 8) / 1024.) - SUM((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) AS [fg_free_space_MB],
					100 * (SUM((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) / (SUM(([size] * 8) / 1024.) )) AS [fg_percent_used],
					COUNT([file_id]) AS [fg_number_of_files]
			FROM sys.database_files df
			WHERE FILEGROUP_NAME([data_space_id]) = fg_name
			FOR XML PATH(''), ROOT('filegroup')) AS XML) AS [fg_space_details],
	CAST((	SELECT	[name] AS [logical_name], 
					[physical_name],
					([size] * 8) / 1024. AS [size_MB],
					(FILEPROPERTY([name],'SpaceUsed') * 8) / 1024. AS [used_MB],
					100 * ((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) / (([size] * 8) / 1024. ) AS [percent_used],
					(([size] * 8) / 1024. ) - ((FILEPROPERTY([name],'SpaceUsed') * 8) / 1024.) AS [free_space_MB]
			FROM sys.database_files df
			WHERE FILEGROUP_NAME([data_space_id]) = fg_name
			ORDER BY [name]
			FOR XML PATH('file'), ROOT('datafiles')) AS XML) AS data_files_status,
	table_name,
	num_of_indexes,
	system_object,
	ps_name,
	pf_name,
	num_rows,
	reserved_space_KB + COALESCE(xml_ft_reserved_space_KB,0) AS reserved_space_KB,
	used_space_KB + COALESCE(xml_ft_used_space_KB,0) AS used_space_KB,
	data_space_KB,
	/* index_space_KB = used_space_KB - data_space_KB */
	CASE
		WHEN (used_space_KB + COALESCE(xml_ft_used_space_KB,0)) > data_space_KB THEN (used_space_KB + COALESCE(xml_ft_used_space_KB,0)) - data_space_KB
		ELSE 0
	END AS index_space_KB
FROM spaceused
WHERE 
	1=1
	/* exclude system objects */
	AND system_object = 0
	AND table_name = @schemaname + '.' + @tablename
ORDER BY fg_name, table_name

When I saw the error message, I immediately thought of something related to partitioning, but when I checked the joining columns’ datatypes, all were INT, as I knew they should be. So what was wrong?

I commented out all the columns and added them back one by one; when I reached the one for the filegroup name, it failed again with the same error. I checked sys.data_spaces and data_space_id was INT. Again, what was wrong? And then I checked the FILEGROUP_NAME function itself. Books Online says:

Syntax:
FILEGROUP_NAME ( filegroup_id )

filegroup_id
Is the filegroup ID number for which to return the filegroup name. filegroup_id is smallint.

Remarks:
filegroup_id corresponds to the data_space_id column in the sys.filegroups catalog view.

That’s weird… data_space_id in sys.filegroups is INT. If filegroup_id corresponds to data_space_id, why would the FILEGROUP_NAME function expect it as SMALLINT?

So let’s test it.
Scenario 1 – partitioned table, all indexes enabled.
Script runs, no errors.

Scenario 2 – partitioned table, 1 index disabled, script looks for all indexes.
Script fails.

The workaround is simple: exclude all disabled indexes. But the root cause is still there: an inconsistency between the data type expected by this function and the source data for its parameter. And if someone wants to list all indexes and just flag the disabled ones, they will have no other choice but to script them separately. The first time around, the script worked because a filegroup name was found in sys.filegroups and the script didn’t use the FILEGROUP_NAME function. But for the disabled index, no filegroup was retrieved and, to get the data for that part of the script, it fell back to FILEGROUP_NAME, which was handed a partition scheme’s data_space_id (65601, as the error message shows), a value that does not fit into a SMALLINT.
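
If you’d rather keep the disabled indexes in the list, one alternative (a sketch, not the original script) is to join sys.data_spaces directly instead of calling FILEGROUP_NAME, so the partition scheme’s data_space_id never gets converted to SMALLINT:

USE [TestFG]

SELECT
	ix.[name] AS [index_name],
	ix.[is_disabled],
	ds.[name] AS [data_space_name],
	ds.[type_desc] -- ROWS_FILEGROUP or PARTITION_SCHEME
FROM sys.indexes ix
JOIN sys.data_spaces ds ON ds.[data_space_id] = ix.[data_space_id]
WHERE ix.[object_id] = OBJECT_ID('dbo.Test_FG')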

The same error is raised when you try a simpler version, but still against a partitioned table:

USE [TestFG]

SELECT 
	FILEGROUP_NAME(ix.[data_space_id]) AS [fg_name] 
FROM sys.indexes ix
WHERE [object_id] = OBJECT_ID('dbo.Test_FG')

BOL states the same for all SQL versions, up to 2014.

So, is this a bug by mistake or by design?

Aggregation and partitioned tables

One of my main tasks is to keep a close eye on performance. Every time we have a major release, I have to start looking at performance, find the main culprits and fix them. Well, not me alone, but work with the developers and find a solution to the issues at hand. And our databases are rather large: historical databases are in the terabytes, and current databases are in the hundreds of gigs, if not terabytes as well for a few of them.

Statistically (in my work), partition elimination in all its flavors (aggregating partitioned tables, etc.) is one of the recurring patterns in the offending queries I have found over the past few months. Actually, it’s the lack of it :). Now, what’s so special about it? Well, it helps keep I/O down to an acceptable level by not reading more data pages than we actually need. In fact, there are a few other complementary methods, like proper WHERE clauses (SARGable WHERE clauses, etc.) which can be pushed down by the query optimizer, and so on and so forth.

Working on a query a few weeks ago, I came across a scenario where partition elimination could not be easily used. The task at hand was to get the MAX value of a column without being able to filter it in any way. Similar to this one:

SET @variable = CONVERT(NVARCHAR(100), (SELECT MAX([column1]) FROM dbo.Table1 WITH (NOLOCK)))

Apart from the obvious issues with the above statement, the MAX value was retrieved all right, but the problem was that it had to scan the whole table (I couldn’t add any WHERE clauses to it). The fact that the table was partitioned on a datetime column made me think of ways to somehow get the partition boundaries and use partition elimination. [Column1] was an integer value which only increases, so my initial thought was, of course, that the MAX value would be among the last rows inserted. The table was also one of the biggest we have in that database, it was heavily read from and written to and, due to some constraints, I wasn’t able to add any index that would help.

My first attempt was to limit the search to only a few days’ worth of data. There was an improvement: it read far fewer records (39K as opposed to 43 million) and scanned only the last few partitions, but business rules made this approach not feasible.

SELECT @variable = MAX([Column1]) FROM dbo.Table1 WHERE [DateTimeColumn] >= @CutoffDate

So how could I go through the whole table without it being so expensive? One way would be the direct way, using a WHERE clause containing the required criteria. But that was already deemed unfeasible. And then there’s an indirect way, which has limited applicability, but it’s a beautiful trick, especially when the direct way is difficult to achieve (as in my example). I learned this from a colleague of mine, Octavian Vaida (thank you, Tavi), and because I find it really interesting, I decided to share it.

So, let’s set up the test environment. For the sake of experimentation, I’ve swapped the columns from my work query and decided to see if the same applies when trying to get the MAX of a DATETIME column on a table partitioned on a BIGINT column.

First we create a database and then the required objects: one partition function and 2 partition schemes (one for data and one for indexes).

IF EXISTS (SELECT 1
           FROM sys.databases
           WHERE [name] = 'Test'
)
BEGIN
    DROP DATABASE [Test]
END
GO

CREATE DATABASE [Test]

The script that creates the PF and PSs was generated by this (Generate_PFs_and_PSs.sql) script; certain sections have been removed, as they are not relevant for our case. Feel free to download it (you’ll have to rename it from .DOC to .SQL or .TXT), use it, change it, improve it.

USE [Test]
GO

/*
Creating partitioning objects for [test_part_elim] table:
PF          : pf_bigint_left_test_part_elim
PF datatype : bigint
PF type     : left
Data PS     : ps_primary_test_part_elim_data
Index PS    : ps_primary_test_part_elim_index
Data FG     : primary
Index FG    : primary
*/

/* Cleanup phase */
PRINT 'Cleaning phase - dropping existing objects'
IF EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_data')
BEGIN
    DROP PARTITION SCHEME [ps_primary_test_part_elim_data]
    PRINT '  ** [ps_primary_test_part_elim_data] partition scheme dropped'
END

IF EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_index')
BEGIN
    DROP PARTITION SCHEME [ps_primary_test_part_elim_index]
    PRINT '  ** [ps_primary_test_part_elim_index] partition scheme dropped'
END

IF EXISTS(SELECT 1 FROM [sys].[partition_functions] WHERE [name] = 'pf_bigint_left_test_part_elim')
BEGIN
    DROP PARTITION FUNCTION [pf_bigint_left_test_part_elim]
    PRINT '  ** [pf_bigint_left_test_part_elim] partition function dropped'
END

/* Create PF and PSs */
PRINT '(Re-)Creating objects...'
PRINT '  * Adding partition function'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_functions] WHERE [name] = 'pf_bigint_left_test_part_elim')
BEGIN
    CREATE PARTITION FUNCTION [pf_bigint_left_test_part_elim](bigint) AS RANGE left FOR VALUES
    (    0,  10000,  20000,  30000,  40000,  50000,
     60000,  70000,  80000,  90000, 100000, 110000,
    120000, 130000, 140000, 150000, 160000, 170000,
    180000, 190000, 200000, 210000, 220000, 230000,
    240000, 250000, 260000, 270000, 280000, 290000,
    300000, 310000, 320000, 330000, 340000, 350000,
    360000, 370000, 380000, 390000, 400000, 410000,
    420000, 430000, 440000, 450000, 460000, 470000,
    480000, 490000, 500000, 510000, 520000, 530000,
    540000, 550000, 560000, 570000, 580000, 590000,
    600000, 610000, 620000, 630000, 640000, 650000,
    660000, 670000, 680000, 690000, 700000, 710000,
    720000, 730000, 740000, 750000, 760000, 770000,
    780000, 790000, 800000, 810000, 820000, 830000,
    840000, 850000, 860000, 870000, 880000, 890000,
    900000, 910000, 920000, 930000, 940000, 950000,
    960000, 970000, 980000, 990000,1000000,1010000,
   1020000,1030000,1040000,1050000
    )
    PRINT '  ** [pf_bigint_left_test_part_elim] partition function created'
END

PRINT '  * Adding data partition scheme'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_data')
BEGIN
    CREATE PARTITION SCHEME [ps_primary_test_part_elim_data] AS PARTITION [pf_bigint_left_test_part_elim]
    ALL TO (    [PRIMARY]    )
    PRINT '  ** [ps_primary_test_part_elim_data] partition scheme created'
END

PRINT '  * Adding index partition scheme'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_index')
BEGIN
    CREATE PARTITION SCHEME [ps_primary_test_part_elim_index] AS PARTITION [pf_bigint_left_test_part_elim]
    ALL TO  (    [PRIMARY]    )
    PRINT '  ** [ps_primary_test_part_elim_index] partition scheme created'
END

And now, let’s create the table:

USE [Test]
GO

CREATE TABLE [test_part_elim]
(    c1    BIGINT IDENTITY(1,1),
     c2    INT    NOT NULL DEFAULT( CEILING(RAND()*1000)),
     c3    VARCHAR(100) NOT NULL DEFAULT('ABC'),
     c4    DATETIME NOT NULL DEFAULT(GETDATE())
)
ON [ps_primary_test_part_elim_data](c1)

CREATE CLUSTERED INDEX [ix_c1] ON [test_part_elim]([c1])
WITH
(   SORT_IN_TEMPDB = ON,
    ONLINE = ON,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
)
ON [ps_primary_test_part_elim_data](c1)

CREATE NONCLUSTERED INDEX [ix_c2] ON [test_part_elim]([c2],[c1])
WITH
(   SORT_IN_TEMPDB = ON,
    ONLINE = ON,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
)
ON [ps_primary_test_part_elim_index]([c1])

Second, we populate the table.

INSERT INTO [test_part_elim]([c2],[c3],[c4])
VALUES (DEFAULT, DEFAULT, DEFAULT)
GO 1050000

Third, we start testing. Let’s assume we want to get the MAX of [c4], which is DATETIME and is not the partitioning column. One way of doing it is:

SELECT
    MAX([c4]) AS [c]
FROM [test_part_elim]

But the execution plan doesn’t look so good. When dealing with large tables, clustered index scans are never good. The index scan operator cost is close to the subtree cost. The I/O statistics show these numbers:

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Let’s test the same but with MAXDOP 1.

SELECT
    MAX([c4]) AS [c]
FROM [test_part_elim]
OPTION (MAXDOP 1)

The execution plan doesn’t look any better. Same clustered index scan, same range of costs. If we look at the subtree cost for the whole statement, it shows a cost of 5.71247. The operator cost is, again, close to this value. The I/O for the serial plan is the same:

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

But what if we consider the partitions as smaller tables that can be scanned individually? Let’s try this approach to get the MAX value we need:

SELECT 
    MAX(y.[c])
FROM sys.partitions p
CROSS APPLY ( SELECT TOP 1
                  MAX([c4]) AS [c]
              FROM [test_part_elim]
              WHERE p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
            ) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')

For starters, the execution plan looks totally different. If we examine it, we’ll see that the subtree cost is now 0.109098. That’s a dramatic change in cost, even though the script scans all 106 partitions in both cases. The operator cost also dropped considerably.

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)
Table 'Worktable'. Scan count 428, logical reads 1281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Some may argue that the costs we’re seeing would drop if we had a covering index in place. So, what if we create a covering index for column [c4]? Will the query have a lower cost with the second approach? One way to find out… Let’s create the index first.

CREATE NONCLUSTERED INDEX [ix_c4] ON [test_part_elim]([c4],[c1])
WITH
	(	SORT_IN_TEMPDB = ON,
		ONLINE = ON,
		ALLOW_ROW_LOCKS = ON,
		ALLOW_PAGE_LOCKS = ON
	)
ON [ps_primary_test_part_elim_index]([c1])

And let’s see what’s happening with the queries. The first thing I noticed is that the new index is being used.

With script 1, we have these I/O figures.

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 3255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

The logical reads have dropped by approximately 30%. The execution plan looks a bit different (we’re no longer getting a parallel plan, because of the new index), the subtree cost has dropped a bit, from 5.71247 down to 4.31247, and the operator cost dropped as well, from 5.08247 down to 3.68247. That’s good, but let’s see the second approach.

With script 2, the I/O looks way better, at only about 12% of the previous reads.

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)
Table 'test_part_elim'. Scan count 321, logical reads 630, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

The execution plan looks similar but check that subtree cost and index scan operator cost!

We can go even further with our tests by adding a WHERE clause on one of the non-partitioning columns, maybe even on the same [c4] we’re reading. So as not to make this post too long, let me just say that during my tests the same improvement pattern was seen when using WHERE clauses.

Here are 2 scripts I’ve used to test. In the execution plans, check the subtree costs and index scan operator costs and do the math yourself.

SELECT 
	MAX([c4]) AS [c]
FROM [test_part_elim]
WHERE 
	[c1] > 0 AND [c1] < 10000000


SELECT MAX(y.[c])
FROM sys.partitions p
CROSS APPLY (
	SELECT TOP 1
		MAX([c4]) AS [c]
	FROM [test_part_elim]
	WHERE 1=1
		AND p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
		AND [c1] > 0 AND [c1] < 10000000
) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')



SELECT 
	MAX([c4]) AS [c]
FROM [test_part_elim]
WHERE 
	[c4] > DATEADD(DAY, -5, GETDATE())

SELECT MAX(y.[c])
FROM sys.partitions p
CROSS APPLY (
	SELECT TOP 1
		MAX([c4]) AS [c]
	FROM [test_part_elim]
	WHERE 1=1
		AND p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
		AND [c4] > DATEADD(DAY, -5, GETDATE())
) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')

To summarize, we have the following subtree and operator costs:
ComparisonTable

And how do we explain this?

With the first method, we scan all records in all partitions in one go (with or without parallelism) and that is how we get the MAX value. With the second method, we get one MAX value for each partition and then calculate a MAX across these per-partition MAX values. This explains why the cost dropped so considerably.

Enjoy.