improve.dk
Just another mindless drone looking for the perfect stack
posts - 227, comments - 489

Avoiding page splits by… Splitting pages

Written on May 24, 2011 by Mark S. Rasmussen in SQL Server: Optimization

Continuing my review of my old database designs, I stumbled upon yet another mind numbing design decision. Back then, I’d just recently learned about the whole page split problem and how you should always use sequentially valued clustered keys.

This specific table needed to track a number of views for a given entity, storing a reference to the entity and the time of the view. As I knew page splits where bad, I added a clustered index key like so:

CREATE TABLE EntityViews
(
	ViewID int identity PRIMARY KEY CLUSTERED,
	EntityID int NOT NULL,
	Created datetime NOT NULL,
	OtherData char(20)
)

With a schema like this, insertions won’t cause fragmentation as they’ll follow the nice & sequential ViewID identity value. However, I did realize that all of my queries would be using EntityID and Created as a predicate, reading most, if not all, of the columns. By clustering on ViewID, I’d have to scan the entire table for all queries. As that obviously wouldn’t be efficient, I added a nonclustered index:

CREATE NONCLUSTERED INDEX IDX_EntityID_Created ON EntityViews (EntityID, Created) INCLUDE (OtherData)

If you’re shaking your head by now, good. This index solved my querying issue as I could now properly seek my data using (WHERE EntityID = x AND Created BETWEEN y AND z) predicates. However, the nonclustered index contains all of my columns, including ViewID as that’s the referenced clustered key. And thus I’m storing all my data twice! My clustered index is neatly avoiding fragmentation, but my nonclustered index (that contains all the same data!) is experiencing the exact fragmentation issues that I originally wanted to avoid!

Realizing this fact, the correct schema would’ve been:

CREATE TABLE EntityViews
(
	EntityID int NOT NULL,
	Created datetime NOT NULL,
	OtherData char(20)
)
CREATE CLUSTERED INDEX CX_EntityID_Created ON EntityViews (EntityID, Created)

We save the surrogate key value bytes for each row and all the data is stored only once. There’s no need for secondary indexes as all the data is stored in the natural querying order. However, page splitting will occur as EntityID won’t be sequential. This is easily avoided by scheduling REINDEX & REUBILD as appropriate.

Furthermore, as the clustered key is sorted on Created secondarily, older non-fragmented data won’t be affected – it’ll only affect the most recently added pages, which are probably in memory anyways and thus won’t cause problems for querying.

Feedback

Gravatar

Sjang wrote on 5/24/2011 5:46 PM

I know you know, but to clarify, then it's perhaps worth mentioning, that to avoid those page splits by doing rebuilds, you would need to rebuild the indexes with a fill factor of less than 100, to leave room for the new rows. How much lower than 100 depends on the volume of inserts happening between the rebuild schedules.
If the fill factor is left to the default of 100, then you would still experience page splits :)
Gravatar

Mark S. Rasmussen wrote on 5/24/2011 6:04 PM

Actually I would still use a fill factor of 0/100. Given that my clustered key is (EntityID, Created), I'm not touching any of my archival data, I'm only adding to the end of the page chain.

If I were to rebuild with a fill factor of less than 100, I'd waste a lot of space on my archival data pages, while only the last page (partitioned by EntityID) would actually gain something from the extra space.

Given that, it's much cheaper just to throw a REORGANIZE at the index regularly as only a small portion of the table will be fragmented.

Had we not had a sequential column as the second part of the CX, then yes, I'd definitely go for a lower fill factor :)

Post Comment

Name  
Email
Url
Comment
Please add 2 and 4 and type the answer here: