UPDATE
After testing my code through based on JP’s comments, I’ve realized my implementation was way too naïve and cannot be used for most datasets. For a correct weighted random implementation, see Dems’ answer on StackOverflow.
Original (flawed) implementation
There are no built-in functions for selecting weighted averages in SQL Server. Fortunately it’s a simple task to do so oneself.
We’ll use this table as an example:
CREATE TABLE #tmp
(
Name varchar(64),
Points int
)
INSERT INTO #tmp VALUES ('Mark', 25);
INSERT INTO #tmp VALUES ('Jakob', 12);
INSERT INTO #tmp VALUES ('Peter', 17);
INSERT INTO #tmp VALUES ('Anders', 0);
INSERT INTO #tmp VALUES ('Kirsten', 33);
INSERT INTO #tmp VALUES ('Mads', 4);
This table represents a list of players in an arbitrary game. The more points you have, the bigger the chance of winning. It has to be weighted, meaning that the person with just 4 points may win, but is unlikely to do so.
The RAND() function in SQL Server returns a floating point number between 0 and 1. Multiplying that with our points gives a random weight based on the amount of points. Unfortunately the RAND() function is seeded once for each query, not for each row - meaning that for each row RAND() will yield the same result, effectively multiplying the points with a constant all the way through. We need to provide a new seed for the RAND() function for each row. NEWID() returns a new unique identifier that may be used as a seed if cast to VARBINARY:
SELECT Name, Points, RAND(CAST(NEWID() AS VARBINARY)) * Points AS Weight FROM #tmp ORDER BY Weight DESC
Name Points Weight
Peter 17 15,9795741766356
Mark 25 14,9122204505153
Kirsten 33 9,67888480542761
Jakob 12 9,38697608441358
Mads 4 0,833340539027792
Anders 0 0
And here we have the result ordered by weight. As you can see, although Kirsten has the most points, Peter ended up winning the competition.
FIG1: Showing the statistical distribution of using RAND(NEWID())
DECLARE @SampleSize int = 100000;
WITH RND AS
(
select top (@SampleSize) ROUND(RAND(CAST(NEWID() AS VARBINARY)), 1, 1) AS RandValue from sys.objects cross join sys.columns
)
SELECT
COUNT(*),
RandValue,
COUNT(*) / CAST(@SampleSize AS float) * 100 AS [%]
FROM
RND
GROUP BY
RandValue
ORDER BY
COUNT(*) DESC
Results:
COUNT VAL %
10117 0,8 10,117
10091 0,4 10,091
10073 0 10,073
10034 0,9 10,034
9996 0,5 9,996
9993 0,2 9,993
9956 0,7 9,956
9927 0,6 9,927
9923 0,3 9,923
9890 0,1 9,89