Weighted random selections in SQL Server
UPDATE
After testing my code through based on JP’s comments, I’ve realized my implementation was way too naïve and cannot be used for most datasets. For a correct weighted random implementation, see Dems’ answer on StackOverflow.
Original (flawed) implementation
There are no built-in functions for selecting weighted averages in SQL Server. Fortunately it's a simple task to do so oneself.
We'll use this table as an example:
CREATE TABLE #tmp
(
Name varchar(64),
Points int
)
INSERT INTO #tmp VALUES ('Mark', 25);
INSERT INTO #tmp VALUES ('Jakob', 12);
INSERT INTO #tmp VALUES ('Peter', 17);
INSERT INTO #tmp VALUES ('Anders', 0);
INSERT INTO #tmp VALUES ('Kirsten', 33);
INSERT INTO #tmp VALUES ('Mads', 4);
This table represents a list of players in an arbitrary game. The more points you have, the bigger the chance of winning. It has to be weighted, meaning that the person with just 4 points may win, but is unlikely to do so.
The RAND() function in SQL Server returns a floating point number between 0 and 1. Multiplying that with our points gives a random weight based on the amount of points. Unfortunately the RAND() function is seeded once for each query, not for each row - meaning that for each row RAND() will yield the same result, effectively multiplying the points with a constant all the way through. We need to provide a new seed for the RAND() function for each row. NEWID() returns a new unique identifier that may be used as a seed if cast to VARBINARY:
SELECT Name, Points, RAND(CAST(NEWID() AS VARBINARY)) * Points AS Weight FROM #tmp ORDER BY Weight DESC
Name Points Weight
Peter 17 15,9795741766356
Mark 25 14,9122204505153
Kirsten 33 9,67888480542761
Jakob 12 9,38697608441358
Mads 4 0,833340539027792
Anders 0 0
And here we have the result ordered by weight. As you can see, although Kirsten has the most points, Peter ended up winning the competition.
FIG1: Showing the statistical distribution of using RAND(NEWID())
DECLARE @SampleSize int = 100000;
WITH RND AS
(
select top (@SampleSize) ROUND(RAND(CAST(NEWID() AS VARBINARY)), 1, 1) AS RandValue from sys.objects cross join sys.columns
)
SELECT
COUNT(*),
RandValue,
COUNT(*) / CAST(@SampleSize AS float) * 100 AS [%]
FROM
RND
GROUP BY
RandValue
ORDER BY
COUNT(*) DESC
Results:
COUNT VAL %
10117 0,8 10,117
10091 0,4 10,091
10073 0 10,073
10034 0,9 10,034
9996 0,5 9,996
9993 0,2 9,993
9956 0,7 9,956
9927 0,6 9,927
9923 0,3 9,923
9890 0,1 9,89