
Aggregation and partitioned tables

One of my main tasks is to keep a close eye on performance. Every time we have a major release, I have to look at performance, find the main culprits and fix them. Well, not me exactly, but work with the developers and find a solution to the issues at hand. And our databases are rather large: the historical databases are in the terabytes, and the current ones are in the hundreds of gigabytes, with a few of them in the terabytes as well.

Statistically (in my work), partition elimination in all its flavors (aggregating partitioned tables, etc.) is one of the recurring patterns in the offending queries I have found over the past few months. Actually, it's the lack of it :). Now, what's so special about it? Well, it helps keep I/O down to an acceptable level by not reading more data pages than we actually need. There are also a few complementary methods, like proper (SARGable) WHERE clauses that the query optimizer can push down, and so on and so forth.

Working on a query a few weeks ago, I came across a scenario where partition elimination could not easily be used. The task at hand was to get the MAX value of a column without being able to filter it in any way. Similar to this one:

SET @variable = CONVERT(NVARCHAR(100), (SELECT MAX([column1]) FROM dbo.Table1 WITH (NOLOCK)))

Apart from the obvious issues with the above statement, the MAX value was retrieved all right, but the problem was that it had to scan the whole table (I couldn't add any WHERE clause to it). The fact that the table was partitioned on a datetime column made me think of ways to somehow get the partition boundaries and use partition elimination. [Column1] was an integer value which only increases, so my initial thought was, of course, that the MAX value would be among the last rows inserted. The table was also one of the biggest we have in that database, it was heavily read from and written to, and due to some constraints I wasn't able to add any index that would help.

My first attempt was to limit the search to only a few days' worth of data. There was an improvement: it read far fewer records (39K as opposed to 43 million) and it scanned only the last few partitions, but business rules made this approach unfeasible.

SELECT @variable = MAX([Column1]) FROM dbo.Table1 WHERE [DateTimeColumn] >= @CutoffDate

So how could I go through the whole table without it being so expensive? One way would be the direct way, using a WHERE clause containing the required criteria. But that was already deemed unfeasible. And then there is an indirect way, which has limited applicability, but it's a beautiful trick, especially when the direct way is difficult to achieve (as in my example). I learned it from a colleague of mine, Octavian Vaida (thank you, Tavi), and because I find it really interesting, I decided to share it.

So, let's set up the test environment. For the sake of experimentation, I've swapped the columns from my work query and decided to see if the same applies when I try to get the MAX of a DATETIME column on a table partitioned on a BIGINT column.

First we create a database and then the required objects: one partition function and 2 partition schemes (one for data and one for indexes).

IF EXISTS (SELECT 1
           FROM sys.databases
           WHERE [name] = 'Test'
)
BEGIN
    DROP DATABASE [Test]
END
GO

CREATE DATABASE [Test]

The script that creates the PF and PSs was generated by this (Generate_PFs_and_PSs.sql) script; certain sections have been removed, as they are not relevant for our case. Feel free to download it (you'll have to rename it from .DOC to .SQL or .TXT), use it, change it, improve it.

USE [Test]
GO

/*
Creating partitioning objects for [test_part_elim] table:
PF          : pf_bigint_left_test_part_elim
PF datatype : bigint
PF type     : left
Data PS     : ps_primary_test_part_elim_data
Index PS    : ps_primary_test_part_elim_index
Data FG     : primary
Index FG    : primary
*/

/* Cleanup phase */
PRINT 'Cleaning phase - dropping existing objects'
IF EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_data')
BEGIN
    DROP PARTITION SCHEME [ps_primary_test_part_elim_data]
    PRINT '  ** [ps_primary_test_part_elim_data] partition scheme dropped'
END

IF EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_index')
BEGIN
    DROP PARTITION SCHEME [ps_primary_test_part_elim_index]
    PRINT '  ** [ps_primary_test_part_elim_index] partition scheme dropped'
END

IF EXISTS(SELECT 1 FROM [sys].[partition_functions] WHERE [name] = 'pf_bigint_left_test_part_elim')
BEGIN
    DROP PARTITION FUNCTION [pf_bigint_left_test_part_elim]
    PRINT '  ** [pf_bigint_left_test_part_elim] partition function dropped'
END

/* Create PF and PSs */
PRINT '(Re-)Creating objects...'
PRINT '  * Adding partition function'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_functions] WHERE [name] = 'pf_bigint_left_test_part_elim')
BEGIN
    CREATE PARTITION FUNCTION [pf_bigint_left_test_part_elim](bigint) AS RANGE left FOR VALUES
    (    0,  10000,  20000,  30000,  40000,  50000,
     60000,  70000,  80000,  90000, 100000, 110000,
    120000, 130000, 140000, 150000, 160000, 170000,
    180000, 190000, 200000, 210000, 220000, 230000,
    240000, 250000, 260000, 270000, 280000, 290000,
    300000, 310000, 320000, 330000, 340000, 350000,
    360000, 370000, 380000, 390000, 400000, 410000,
    420000, 430000, 440000, 450000, 460000, 470000,
    480000, 490000, 500000, 510000, 520000, 530000,
    540000, 550000, 560000, 570000, 580000, 590000,
    600000, 610000, 620000, 630000, 640000, 650000,
    660000, 670000, 680000, 690000, 700000, 710000,
    720000, 730000, 740000, 750000, 760000, 770000,
    780000, 790000, 800000, 810000, 820000, 830000,
    840000, 850000, 860000, 870000, 880000, 890000,
    900000, 910000, 920000, 930000, 940000, 950000,
    960000, 970000, 980000, 990000,1000000,1010000,
   1020000,1030000,1040000,1050000
    )
    PRINT '  ** [pf_bigint_left_test_part_elim] partition function created'
END

PRINT '  * Adding data partition scheme'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_data')
BEGIN
    CREATE PARTITION SCHEME [ps_primary_test_part_elim_data] AS PARTITION [pf_bigint_left_test_part_elim]
    ALL TO (    [PRIMARY]    )
    PRINT '  ** [ps_primary_test_part_elim_data] partition scheme created'
END

PRINT '  * Adding index partition scheme'
IF NOT EXISTS(SELECT 1 FROM [sys].[partition_schemes] WHERE [name] = 'ps_primary_test_part_elim_index')
BEGIN
    CREATE PARTITION SCHEME [ps_primary_test_part_elim_index] AS PARTITION [pf_bigint_left_test_part_elim]
    ALL TO  (    [PRIMARY]    )
    PRINT '  ** [ps_primary_test_part_elim_index] partition scheme created'
END

And now, let’s create the table:

USE [Test]
GO

CREATE TABLE [test_part_elim]
(    c1    BIGINT IDENTITY(1,1),
     c2    INT    NOT NULL DEFAULT( CEILING(RAND()*1000)),
     c3    VARCHAR(100) NOT NULL DEFAULT('ABC'),
     c4    DATETIME NOT NULL DEFAULT(GETDATE())
)
ON [ps_primary_test_part_elim_data](c1)

CREATE CLUSTERED INDEX [ix_c1] ON [test_part_elim]([c1])
WITH
(   SORT_IN_TEMPDB = ON,
    ONLINE = ON,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
)
ON [ps_primary_test_part_elim_data](c1)

CREATE NONCLUSTERED INDEX [ix_c2] ON [test_part_elim]([c2],[c1])
WITH
(   SORT_IN_TEMPDB = ON,
    ONLINE = ON,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON
)
ON [ps_primary_test_part_elim_index]([c1])

Second, we populate the table:

INSERT INTO [test_part_elim]([c2],[c3],[c4])
VALUES (DEFAULT, DEFAULT, DEFAULT)
GO 1050000
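
The single-row loop above works, but it is slow. A set-based alternative along these lines fills the table in one go (just a sketch; the row source and the random expression are my own choices, not part of the original test):

;WITH nums AS
(   SELECT TOP (1050000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [n]
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
)
INSERT INTO [test_part_elim] ([c2],[c3],[c4])
SELECT CEILING(RAND(CHECKSUM(NEWID())) * 1000),   -- per-row random 1..1000, same spirit as the column default
       'ABC',
       GETDATE()
FROM nums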

Third, we start testing. Let's assume we want to get the MAX of [c4], which is a DATETIME and is not the partitioning column. One way of doing it is:

SELECT
    MAX([c4]) AS [c]
FROM [test_part_elim]
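
The I/O figures quoted throughout this post come from the session statistics; to reproduce them, turn them on first (standard session settings; the exact numbers will vary with your data and build):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;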

But the execution plan doesn't look so good. On large tables, a full clustered index scan is rarely what we want, because it reads far more pages than the result actually needs. The index scan operator cost is close to the subtree cost. The I/O statistics show these numbers:

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Let’s test the same but with MAXDOP 1.

SELECT
    MAX([c4]) AS [c]
FROM [test_part_elim]
OPTION (MAXDOP 1)

The execution plan doesn’t look any better. Same clustered index scan, same range of costs. If we look at the subtree cost for the whole statement, it shows a cost of 5.71247. The operator cost is, again, close to this value. The I/O for the serial plan is the same:

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

But what if we consider the partitions as smaller tables that can be scanned individually? Let's try this approach to get the MAX value we need:

SELECT 
    MAX(y.[c])
FROM sys.partitions p
CROSS APPLY ( SELECT TOP 1
                  MAX([c4]) AS [c]
              FROM [test_part_elim]
              WHERE p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
            ) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')

For starters, the execution plan looks totally different. Examining it, we see that the subtree cost is now 0.109098. That's a dramatic change in cost, even though the script scans all 107 partitions in both cases. The operator cost also dropped considerably.

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)
Table 'Worktable'. Scan count 428, logical reads 1281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'test_part_elim'. Scan count 107, logical reads 5145, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)
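
To see for yourself how many partitions there are and how the rows are spread across them, a quick look at sys.partitions is enough (a small helper query of mine, not part of the original test; the [rows] column is an approximation):

SELECT p.[partition_number], p.[rows]
FROM sys.partitions AS p
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')
  AND p.[index_id] IN (0, 1)    -- heap or clustered index only
ORDER BY p.[partition_number];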

Some may argue that the costs we're seeing would drop if we had a covering index in place. So, what if we create a covering index for column [c4], will the query have a lower cost with the second approach? One way to find out… Let's create the index first.

CREATE NONCLUSTERED INDEX [ix_c4] ON [test_part_elim]([c4],[c1])
WITH
	(	SORT_IN_TEMPDB = ON,
		ONLINE = ON,
		ALLOW_ROW_LOCKS = ON,
		ALLOW_PAGE_LOCKS = ON
	)
ON [ps_primary_test_part_elim_index]([c1])

And let's see what happens with the queries. The first thing I noticed is that the new index is being used.

With script 1, we have these I/O figures.

(1 row(s) affected)
Table 'test_part_elim'. Scan count 107, logical reads 3255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

The logical reads have dropped by roughly a third (from 5,145 to 3,255). The execution plan looks a bit different (we no longer get a parallel plan, because of the new index), the subtree cost has dropped from 5.71247 down to 4.31247 and the operator cost has dropped as well, from 5.08247 down to 3.68247. It's good, but let's see the second approach.

With script 2, the I/O looks far better, with only about 12% of the previous reads.

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)
Table 'test_part_elim'. Scan count 321, logical reads 630, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

The execution plan looks similar but check that subtree cost and index scan operator cost!

We can go even further with our tests by adding a WHERE clause on one of the non-partitioning columns, maybe even on the same [c4] we're reading. To keep this post from getting too long, let me just say that during my tests the same improvement pattern showed up when using WHERE clauses.

Here are the scripts I used to test. In the execution plans, check the subtree costs and index scan operator costs and do the math yourself.

SELECT 
	MAX([c4]) AS [c]
FROM [test_part_elim]
WHERE 
	[c1] > 0 AND [c1] < 10000000


SELECT MAX(y.[c])
FROM sys.partitions p
CROSS APPLY (
	SELECT TOP 1
		MAX([c4]) AS [c]
	FROM [test_part_elim]
	WHERE 1=1
		AND p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
		AND [c1] > 0 AND [c1] < 10000000
) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')



SELECT 
	MAX([c4]) AS [c]
FROM [test_part_elim]
WHERE 
	[c4] > DATEADD(DAY, -5, GETDATE())

SELECT MAX(y.[c])
FROM sys.partitions p
CROSS APPLY (
	SELECT TOP 1
		MAX([c4]) AS [c]
	FROM [test_part_elim]
	WHERE 1=1
		AND p.[partition_number] = $PARTITION.pf_bigint_left_test_part_elim([c1])
		AND [c4] > DATEADD(DAY, -5, GETDATE())
) y
WHERE p.[object_id] = OBJECT_ID('[test_part_elim]')

To summarize, we have the following subtree and operator costs:

[ComparisonTable: subtree and operator costs for the two approaches, without and with the covering index]

And how do we explain this?

With the first method, we scan all records in all partitions in one go (with or without parallelism) and that is how we get the MAX value. With the second method, we get one MAX value per partition and then calculate a MAX across those MAX values. This explains why the cost dropped so considerably.
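
If it helps to see that logic spelled out, here is a conceptually equivalent rewrite (my own illustration, not part of the original tests; the optimizer will not necessarily produce the same plan as the CROSS APPLY version): one MAX per partition via $PARTITION, then the MAX of those.

SELECT MAX(x.[c]) AS [c]
FROM
(   /* one MAX per partition */
    SELECT
        $PARTITION.pf_bigint_left_test_part_elim([c1]) AS [partition_number],
        MAX([c4]) AS [c]
    FROM [test_part_elim]
    GROUP BY $PARTITION.pf_bigint_left_test_part_elim([c1])
) AS x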

Enjoy.

Corruption she (the DBCC) wrote….

I recently received an email from a friend asking for my help, as he was facing a problem. He runs a small business in the healthcare industry and his IT needs are, and should be, minimal. And that's expected.

Of course, I asked what the problem was about and he said: "We had our systems down for quite a while and, starting 2 days ago, we cannot access our database anymore." Long story short, it looked like a database corruption of some sort. I paid him a visit and had a look through the logs. Wow… Loads of mini-dumps all over the SQL error log.

The affected database seemed OK: it was online and, from what I was able to tell, readable. I was able to read the metadata and a few user tables (spot checks). But looking in the Windows logs, I found an entry that usually sends cold chills up any DBA's spine and makes them sweat instantly: 824.

I started asking for a bit of history and found out that lately the computer had been running slower and slower, and that 2 days earlier they had an incident where the computer froze, so they decided to give CHKDSK a go. Now, to have the full picture: the computer was not server hardware, but an all-in-one desktop computer playing in the big league…

Let me take a break here and explain why I wouldn't recommend having a desktop computer act as both desktop and server. The main reason: there's a human using it day in, day out, for all kinds of stuff: Excel spreadsheets, browsing the Internet, Word documents, Solitaire, you name it, it's there. Plus the antivirus. Plus the occasional freeze, when humans do what they overhear in buses and on the street: power it down and then power it up. Or reset it, if you're lucky enough to own a computer which still has a reset button.

For really small offices, such as in this case, a reasonably spec'd desktop could do the job, provided the human factor is out of the equation and the desktop is fairly reliable.

Back to the case: I started wondering how a CHKDSK could cause such havoc, because it shouldn't. And here's the gotcha. The drive (yes, the only drive in the system) had been removed, set up as a slave in another computer, and there CHKDSK was able to finish its job. Hurray, job done. Well, sort of. Because after the drive was installed back in and SQL Server started up again, bang! The problems started.

When I learned all these details, everything started to make sense. It didn't really matter anymore why the computer had been running slower and slower (eventually I figured that out too, I guess – read on); the only thing that mattered now was that the database *was* corrupted. And as it turned out later, it was corrupted big time.

I ran a DBCC CHECKDB on it; the command was something along these lines (DBNAME stands in for the real database name, and the NOINDEX option matches the warning in the output):
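
DBCC CHECKDB (N'DBNAME', NOINDEX);   -- DBNAME is a placeholder

And after only 8 minutes, I got these beauties back: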

DBCC results for 'DBNAME'.
Warning: NO_INDEX option of checkdb being used. Checks on non-system indexes will be skipped.
Msg 8921, Level 16, State 1, Line 10
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.
Msg 8998, Level 16, State 2, Line 10
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 5 pages from (1:695568) to (1:703655). See other errors for cause.
Msg 8998, Level 16, State 2, Line 10
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 5 pages from (1:703656) to (1:711743). See other errors for cause.
Msg 8939, Level 16, State 5, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_headerVersion == HEADER_7_0) failed. Values are 104 and 1.
Msg 8939, Level 16, State 7, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_freeData >= PageHeaderOverhead () && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8192.
Msg 8939, Level 16, State 11, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_ghostRecCnt <= m_slotCnt) failed. Values are 29811 and 0.
Msg 8939, Level 16, State 5, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_headerVersion == HEADER_7_0) failed. Values are 104 and 1.
Msg 8939, Level 16, State 7, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_freeData >= PageHeaderOverhead () && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8192.
Msg 8939, Level 16, State 11, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 77687093572141056 (type Unknown), page (1:695568). Test (m_ghostRecCnt <= m_slotCnt) failed. Values are 29811 and 0.
Msg 8939, Level 16, State 5, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 362258295026614272 (type Unknown), page (1:703656). Test (m_headerVersion == HEADER_7_0) failed. Values are 104 and 1.
Msg 8939, Level 16, State 7, Line 10
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 362258295026614272 (type Unknown), page (1:703656). Test (m_freeData >= PageHeaderOverhead () && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8192.
CHECKDB found 7 allocation errors and 3 consistency errors not associated with any single object.
CHECKDB found 7 allocation errors and 3 consistency errors in database 'DBNAME'.

I had come across corruption, I think, twice in my career, and I was lucky enough to have it fixed in almost no time. Once, some indexes got corrupted; in the other case it was some third-tier table that we were able to rebuild completely from scratch. But this one, for me, was a first. "Master, I have a bad feeling about this", I said to myself. I did a quick search on the errors and almost all, if not all, of the results pointed to a non-repairable case, where the only salvation would be a reliable backup.

Yes, you guessed it, there was none. Like in never, since the database was created.

So, I created a backup of the database and of the tail of the log (well, in the reverse order) and it was time to get my hands dirty.
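
For reference, that pair of backups looks roughly like this (database name and paths are placeholders; NO_TRUNCATE is what allows the tail of the log to be backed up on a damaged database):

BACKUP LOG [DBNAME]
    TO DISK = N'D:\Backup\DBNAME_tail.trn'
    WITH NO_TRUNCATE, INIT;

BACKUP DATABASE [DBNAME]
    TO DISK = N'D:\Backup\DBNAME_full.bak'
    WITH INIT;   -- CONTINUE_AFTER_ERROR may also be needed if the backup chokes on damaged pages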

I restored the backup on a virtual machine and started working. Two things I noticed were that the VM was running slower and slower and that the drive holding the SQL binaries was filling up quite quickly. Then I remembered those mini-dumps. Checking their default location: big surprise, that's what was filling up the drive. Thousands and thousands of files. Because there's no way of stopping those, I added a new drive to the VM, changed the settings for the mini-dumps to point to that drive and let it fill up (SQL Server will not stop when the drive is full anyway, it will just roll over the files). That would also explain the symptom my friend noticed in the first place; I guess the corruption had been lurking around for some time. To stop the dumps from being generated, I switched the corrupted database to read-only. But there's a catch with switching it to read-only, at least that's what I noticed: it seemed to limit my ability to read the records that were still salvageable from the corrupted tables. As soon as I switched it back to read-write, I was able to read whatever records were still accessible. I played around with this a bit and it does seem to be directly correlated. Once that was out of the way, I moved on to recovering the data, or what was still recoverable.

I started by scripting out the database. Once that was done, I put together a script to export everything I could still read. This way, I managed to isolate 9 tables that were seriously impacted. The big problem was that 2 of those were at the core of the whole application and 2 others held a lot of scanned documents, which are really hard to rescan. So yeah, no pressure.

The impacted tables were still readable up to a certain point, so I took advantage of that and exported whatever records were still readable; that was pretty much everything I could salvage. Now the reconstruction work could begin.

I used the script I created when I scripted out the database and created an empty one. Once all the salvageable data was exported, I switched the damaged database to R/O. Exporting the unaffected tables was a breeze: I created a dedicated schema (call it [export]) and then simply used SELECT * … INTO [export].[table] FROM [table], and that was it. For the affected ones, I created another schema (let's call it [staging]) and then created the required empty tables; a sketch follows these notes. A couple of notes here:

  • I created the tables with no constraints whatsoever
  • I created the tables with generic column datatypes (VARCHAR(MAX)) to avoid any incompatibilities
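
A sketch of both flavors, with made-up table and column names (the real schema is obviously different):

/* Unaffected table: straight copy into the [export] schema */
SELECT *
INTO [export].[SomeHealthyTable]
FROM [dbo].[SomeHealthyTable];

/* Affected table: loose staging copy, no constraints, generic datatypes */
CREATE TABLE [staging].[SomeDamagedTable]
(   [col1] VARCHAR(MAX) NULL,
    [col2] VARCHAR(MAX) NULL
);

INSERT INTO [staging].[SomeDamagedTable] ([col1],[col2])
SELECT [col1], [col2]
FROM [dbo].[SomeDamagedTable];   -- reads stop at the corrupted range, so only the cleanly readable rows land here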

Once the staging tables have been populated, I can start reconstructing whatever data is recoverable. And here, everything depends on the data model and on what relations / foreign keys are in place; a lot of factors are at play. In some cases, it's not even possible.

The last step will be to import the salvaged data into the final tables. Here the data model comes into play: loading the tables in the right order, having all the FKs check out, and so on and so forth. It's going to be a tedious process, but it has to be done. Or at least tried :).

The final chapter of this is yet to be written: I’m at the stage where the staging tables are all populated and I have to try and reconstruct whatever data is possible. And that is a lot of manual work: studying the database diagrams, all the relationships between the tables, etc.

I also gave my friend a list of recommendations, which he was more than happy to follow, and we're now in the process of fixing things: buying new hardware for the server role, changing how things are done. It's good to see that people understand the risks, but it's sad that it takes a disaster for that to happen. The bad part of this story is that only 4%-5% of the impacted data was salvageable, and there's no money in the world that can make up for that. The green area in the chart below shows how many records I was able to recover from the affected tables, out of how many there were. The biggest issues are with the tables in rows #1, #3, #4 and #8. All of them were important, but these were the ones that really mattered.

[RecoverPercentages: chart of recovered vs. total record counts for each affected table]

An added bonus to the whole incident? There's another instance on that computer which, fortunately, was not affected. YET. It runs SQL Server Express Edition and has some backups in place, but… they sit on the same hard drive. If something goes wrong with the whole computer, both databases are doomed, no matter that the second one has some sort of backup.

Lessons learned?

No matter how small your database is, disaster *will* strike. It’s not IF, it’s WHEN. Just make sure you’re prepared:

  • Have proper backups in place
  • Have a proper backup testing process in place (because having backups is only half of the protection)
  • Have the backup files stored off that box, on another computer at least, if not in some cloud or on a NAS
  • Check your logs, SQL and Windows
  • Run periodic DBCC CHECKDBs (how frequently is another topic)
  • Make sure you're looking for early signs of hardware failure (have hard drives with SMART capabilities or similar technologies)
  • Make sure the computer you're using as a server is as isolated in terms of access and workload as possible; no activities other than SQL Server (or whatever other server processes you may have) should take place on it, and definitely no user activities
  • Invest in some sort of data redundancy technology (software RAID 1 at least)
  • Monitor your hardware (HDD space, resource usage, etc.)

All the best.

DATETIME and its surprises

Well, it’s about time I write another blog post (with a promise that I’ll do it more than once a year).

I'm going to start a series of "bad habits to kick" posts, as it is important to me that we all have a common ground on SQL programming.

1. Consider the following scenario:

SET LANGUAGE British;
SELECT CAST('02/12/2007' AS DATETIME);

SET LANGUAGE us_english;
SELECT CAST('02/12/2007' AS DATETIME);

SET LANGUAGE British;
SELECT CAST('2007-02-12' AS DATETIME);

SET LANGUAGE us_english;
SELECT CAST('2007-02-12' AS DATETIME);

SET LANGUAGE British;
SELECT CAST('20070212' AS DATETIME);

SET LANGUAGE us_english;
SELECT CAST('20070212' AS DATETIME);

And let’s see the results:

Changed language setting to British.
-----------------------
2007-12-02 00:00:00.000
(1 row(s) affected)

Changed language setting to us_english.
-----------------------
2007-02-12 00:00:00.000
(1 row(s) affected)

Changed language setting to British.
-----------------------
2007-12-02 00:00:00.000
(1 row(s) affected)

Changed language setting to us_english.
-----------------------
2007-02-12 00:00:00.000
(1 row(s) affected)

Changed language setting to British.
-----------------------
2007-02-12 00:00:00.000
(1 row(s) affected)

Changed language setting to us_english.
-----------------------
2007-02-12 00:00:00.000
(1 row(s) affected)

So the first two pairs of results differ depending on the language setting, while the last pair returns the same date regardless! What's going on?

It has to do with the way Microsoft implemented the date and time data types, starting with SQL Server 2008.

We have two similar data types, DATETIME and DATETIME2. What people fail to realize is that the two behave quite differently (from Books Online: "Use the time, date, datetime2 and datetimeoffset data types for new work. These types align with the SQL Standard. They are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications." – see here).

DATETIME preserves the old, language-dependent (non-standard) interpretation of date strings in order to stay backwards compatible, while DATETIME2 is pretty much aligned with the standard and always treats 'yyyy-mm-dd' as ISO.
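
A quick way to see the difference side by side (a small demo of mine; under British language settings DATETIME applies dmy rules to the dashed format, while DATETIME2 always reads it as ISO):

SET LANGUAGE British;

SELECT
    CAST('2007-02-12' AS DATETIME)  AS [dt],    -- 2007-12-02 (language-dependent interpretation)
    CAST('2007-02-12' AS DATETIME2) AS [dt2];   -- 2007-02-12 (always ISO)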

2. How can we address this issue?

The answer lies in the code itself (repeated here for clarity):

SET LANGUAGE British;
SELECT CAST('20070212' AS DATETIME);

SET LANGUAGE us_english;
SELECT CAST('20070212' AS DATETIME);

This will return the same date and time; and the reason lies (again) in Books Online, for CAST and CONVERT:

For date and time styles, we have:

Style   Standard   Format
12      ISO        yymmdd
112     ISO        yyyymmdd

In other words, even though DATETIME does not follow the ISO standard by default, the literal we pass to CAST can be written in the ISO-aligned yyyymmdd format, thus producing the correct result every time.

I hope this helps.

Create certificate from backup fails on SQL Server 2008

This is so going to bite yer ass if you don’t pay attention…

As usual, a very quick post: I tried to create (ok, restore) a certificate from the production server to our DR server, so that the automated restores would work. When I ran the script (yes, CREATE CERTIFICATE … FROM FILE […]), I got this:

Msg 15208, Level 16, State 1, Line 1
The certificate, asymmetric key, or private key file does not exist or has invalid format.
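
For reference, the statement was of this general shape (the certificate name, file paths and password below are placeholders, not the real ones):

CREATE CERTIFICATE [BackupCert]                       -- placeholder name
FROM FILE = N'E:\Certs\BackupCert.cer'                -- placeholder path
WITH PRIVATE KEY
(   FILE = N'E:\Certs\BackupCert.pvk',                -- placeholder path
    DECRYPTION BY PASSWORD = N'SomeStrongPassword'    -- placeholder password
);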

Hm… apart from the usual stuff you would find on the web, I'll add another one: make sure that the machine name is the same as the SQL Server (default instance) name. In my case, the computer name was DR20, while the SQL Server default instance name was DR20DB. Once the SQL Server default instance name was changed to DR20, everything went fine.

I hope this helps some poor soul out there.

Best regards,
io

Say What? – “Management Studio: must be set in single thread apartment (STA) mode before OLE calls can be made”

Yes, I have another couple of minutes to document this weird message.

I was trying to access the database properties for one of our databases. Guess what: I couldn't, as there was no owner! Well, the easiest thing to do is to alter the authorization on the database; but, as you may have guessed already, I didn't pay any kind of attention to the error message and quickly jumped into the magic world of… creating a diagram (actually the mouse pointer was in the wrong location and I just clicked, so it was not my fault, right?). Well, this time I got to take a print screen:

OK, now let's copy that stupid message without reading it, select some random text from it, and do a blind search in the hope of sheer luck. Click the Copy icon… and there you go, another weird message:

Now this one finally got my attention.

It turns out that it used to be a bug (check here), but it was supposed to be fixed. Well, apparently not all the cases were covered.

My wild guess is that:

If you have a database that does not have an owner (say, someone left the company and we removed the user account – I need to check this) and you want to create a diagram, you will see the first pop-up error message. Trying to copy the message itself (via the Copy feature – yeap, those little sheets of paper in the lower-left corner) will yield the second error.
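
If you want to check which databases are in that ownerless state before SSMS surprises you, a small helper query like this one (my own sketch, not from the original post) looks for owner SIDs that no longer map to a server principal:

SELECT d.[name] AS [database_name]
FROM sys.databases AS d
LEFT JOIN sys.server_principals AS sp
    ON sp.[sid] = d.[owner_sid]
WHERE sp.[sid] IS NULL;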

Of course, the solution is the most difficult one:

ALTER AUTHORIZATION ON DATABASE::MyDB TO sa;

I hope it helps.

Best regards,
io

Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. Error source: Microsoft OLE DB Provider for SQL Server – revisited

This blog post has been waiting for me to find some time for a couple of months now. The issue bit me back then, and it bit me again yesterday.

“Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. Error source: Microsoft OLE DB Provider for SQL Server” – now, I must emphasize that I'm not going to reproduce the plethora of info that is already out there; instead, I might be among the select few that indeed had issues with AD domains. If you're curious about that, read on; if you're looking for a few lines to add to some connection string, you'd better go have a coffee or something.

Background: the company that I work for migrated all their SQL Server 2005 servers (with DAS – Direct Attached Storage) to SQL Server 2008 clustering (on a SAN). This obviously means a lot (I mean A LOT) of fun, such as deprecated code and so on. Besides this, one of the required steps is to swap the IP addresses between the SQL Server 2005 machine and the SQL Server 2008 resource on the cluster (of course, anybody knows that SQL Server 2008 is just a resource on a 2008 cluster, right? – I learned this the hard way).

And now for the fun:

“ONLY jobs that need to run to C:\Program Files (x86)\Microsoft SQL Server\80\Tools\Binn\DTSRun are failing.
The ones that are running C:\Program Files (x86)\Microsoft SQL Server\90\DTS\Binn\dtexec are running fine.
Secondly, the ones running under 80\Tools are the ones using the server IP (10.0.0.31). when you ping the IP address, it still returns SQL03DB as the server.”

Rough translation of the above: all the jobs running DTSX packages run fine, while all the jobs running DTS packages (SQL Server 2000 – yes, I know, that old) fail on SQL Server 2008. Besides that, when I run ping -a 10.0.0.31, I still get the name of the SQL Server 2005 machine instead of the SQL Server 2008 cluster resource.

This is something that happened early May, and I completely forgot about it. And it happened again yesterday.

Sorry for all that blur stuff, but this is a prod machine. Anyway, as you can see, the IP address of my SQL Server cluster resource is 10.0.0.10.

Now, if I ping -a 10.0.0.10, I get the following:

So instead of getting something along the lines of "28DBCLU", I still get the old SQL Server machine name, which used to be "01db".

Again, oddly enough, this manifested itself only for the DTS packages, which are running DTSRun, found in the \80 folder.

Solution? A very simple one, just ask the sys admins to make the necessary changes in the DNS (yes, that DNS, related to AD, DHCP, all that jazz). Also, it would be very helpful not to have hard-coded values in your hosts file.

And for the grand finale: I posted a question on Twitter, under #sqlhelp, asking whether I should have my system databases on the SAN (which means automatic failover in case something goes wrong with the active node), or whether they should stay on the default installation path (yes, that Program Files thing). While there was some variation in the answers, my personal choice is to have them fail over automatically. Just imagine what would happen if you have a failover and the msdb and master databases are not the same.

Best regards,
io

My mail to SQL Sentry that will win me (yeah, right) a free seat to SQL Skills training!

The amazing guys @SQL Sentry (www.sqlsentry.net) are giving away a chance to win a free seat at the SQL Skills Internals and Performance training (check out the details: http://www.sqlsentry.net/sqlskillsimmersion.asp).

Of course, I decided to give it a try, so I wrote to the guys; you can check it below:

“Hello there.

Now, last night at around 2:30 AM I got out of bed, as the little ones (we have two kids) had been kicking me in the head for far too long. Since I'm a hopeless DBA, the very first thing to do was to open the laptop and check the emails. I almost fell off my chair when I saw Paul's email regarding your offer. And I said to myself: OK, I'm still asleep and dreaming, let's read that again in the morning. Which I did. And there it was, the offer.

I so MUST HAVE to be there! Let’s see why:

I live in Romania, which, by all standards, had a strict communist regime until 1989. While it was strict with regard to computers (we didn't even have a word for computer), there was a plethora of underground illegal stuff, such as cable TV. This is how we got introduced to good movies with hot chicks. Movies that, immediately after the Romanian revolution (December 1989 for the revolution, early '90s for the rest of the story), turned into even better-looking chicks. Of course, the entire gang fell in love with such angels. Even back then we knew that we had no chance with them, but at least we could try to get something similar in Romania.

By all standards, none of us were quite hot-shots, so we decided that the best way to pick up gorgeous girls is to have money – a truth that happens to be the same worldwide. It also happened that unfortunately all of us were raised to be honest, not to steal, all that jazz, so we realized that we would have to make a career that would pay enough money to stand out from the crowd. There you go, we got into IT.

This was a big mistake for several reasons: as everywhere else in the world, the true hot shots never go to work, ever; as they say, if you work, you don't have time to make money. But worse was to come: we all fell in love with this new… thing, which was computers. Not to mention that back then the Internet changed it all (yeap, I'm that old, I keep saying that Internet and internet are not the same). So before any of us could realize what was happening, we pretty much forgot about the movies and the hot chicks and got our hands dirty with all sorts of things, such as networking (myself included) and programming (myself excluded).

Fast forward a few years, to around the beginning of the millennium: most of us were working in the IT field (by the way, it is still growing exponentially in Romania). Now, there is an old saying: he who can do, will do; he who cannot, will teach. Of course, most of the gang members got to work, while I got to teach (as a technical trainer in various capacities).

Quick jump into 2008: I got a job offer as a database administrator in SQL Server. Does it have icons? Can I work with a mouse? Yes. OK, I’m in. It was back then when I fell in love with the damn thing. Then I got to read great blogs from great people, such as Brent Ozar and Paul Randal, just to name a few. Back then, I would have said that thanks to them, I kept my love for SQL Server growing.

Another quick jump into 2011: I got another job as a SQL Server DBA, but this time the mouse was the least of my worries. Just to give you an example: guys, do any of you know anything about clustering? None of us did. OK, you have three days to pick it up, we’re migrating all of our databases to SQL Server 2008 clusters. We did pick it up.

This year (2012) I got to TechEd Orlando, only to make a fool of myself in front of Brent Ozar (if you check his presentation, he makes fun of me for asking him for a card to show to my wife).

To conclude: while in 2008 I would have said "thanks to these guys…", this year I'm saying "because of these guys…". Because of these guys I still love the damn thing (SQL Server, that is). Because of these guys I'm still wasting nights, trying to pick up things on the fly. You cannot imagine the sheer joy and happiness of my wife when I have to work almost every weekend to replace and consolidate the database servers.

So, if it's all because of "these guys", why not let them handle me? At least in part? They made me forget my initial goal of being a hot shot and turned me into some sort of weird kookoo, trying to do his best. Worse yet, because of them I got back into training, blogging, and trying to help other people.

I absolutely need this course (and not only this one): it will massively shorten the path to great knowledge and less work. I'm planning to do that ASAP; right now, the kids still call me daddy, but with the crazy hours that I have to put in, I'm not sure that will still be the case in the near future."