improve.dk
Just another mindless drone looking for the perfect stack
posts - 227, comments - 489

Saving space by storing decimal values in integer data types

Written on May 31, 2011 by Mark S. Rasmussen in SQL Server: Internals, SQL Server: Tricks, SQL Server: Data Types

I recently stumbled upon a question on Stack Overflow on how best to reduce their data size as it’s growing out of hand. As the original author hasn’t replied back yet (as of writing this post, I’m making some assumptions on the scenario – so take it as an abstract scenario). The basic scenario is that they have a number of measuring stations, each one of those containing a lot of equipment reporting back to a SQL Server in a schema like the following abstract:

CREATE TABLE Measurements
(
	DataID bigint IDENTITY,
	StationID int,
	MeasurementA real,
	MeasurementB real,
	MeasurementC real
	... 100 more columns
)

They’re willing to loose some precision of the data, for the purpose of saving space. As some of the data is measuring wind speed in meters/sec and air pressure, I’m making the assumptions that most of the data will be in the 0-200 and 500-2000 ranges, depending on the scale used.

If the wind speed does not need accuracy further than two decimals, storing it in a 4 byte real column is a lot of waste. Instead we might store it in a smallint column, saving 2 bytes per column. The data would be converted like so:

35.7   => 35.7   * 100 = 3,570
1.38   => 1.38   * 100 = 138
155.29 => 155.29 * 100 = 15,529
84.439 => 84.439 * 100 = 8,443 (with the .9 being rounded off due to integer math)

So by multiplying all the values by 100, we achieve a precision of two decimal points, with all further decimal points being cropped. As the smallint max value is 32,767, the maximum value we could store in this format would be:

327.67 => 327.67 * 100 = 32,767

Which is probably enough for most wind measurements. Hopefully.

For the larger values in the 500-2000 ranges, we can employ the same technique by multiplying by 10. This only gives us a single digit of precision, but allows for values in the –3,276.8 to 3,276.7 range, stored using just 2 bytes per column. Employing the same technique we could also store values between 0 and 2.55 in a single byte tinyint column, with a precision of two digits.

Unless you really need to save those bytes, I wouldn’t recommend you do this as it’s usually better to store the full precision. However, this does show that we can store decimals in integer data types with a bit of math involved.

Feedback

Gravatar

Christian Horsdal wrote on 5/31/2011 1:08 PM

Seems fixed point values would also do the trick in some cases.
Gravatar

Mark S. Rasmussen wrote on 5/31/2011 1:15 PM

Indeed they can. However, the samllest fixed point value in SQL Server takes up 4 bytes on disk so it'd only make sense, in regards of saving space, if you're currently underutilizing a 5+ byte data type for the same data.
Gravatar

Henrik Staun Poulsen wrote on 8/26/2011 12:45 PM

I've also been involved in an experiment with user defined types.

This does indeed save space, as a Decimal(5,2) can be stored in 2 bytes, if your range is between -327 and +327

But at a higher CPU cost, so we ended up buying more disks instead.

Best regards,
Henrik
Gravatar

Mark S. Rasmussen wrote on 8/26/2011 6:39 PM

To my knowledge, the smallest decimal/numeric value you can store is 5 bytes. One byte to store the sign bit/byte, and 4 bytes to store the value from precision 1-9 (decimal(x,y) with x < 5 will internally be stored just as a decimal (9, y)).

I'm not sure whether the actual data part of a vardecimal might be smaller, but either way, vardecimal itself will take up two extra bytes for the variable length offset array, so it'd end up more or less the same.

Post Comment

Name  
Email
Url
Comment
Please add 5 and 5 and type the answer here: