- The Most Accurate Way to Calculate the Median in MySQL
- Use UDFs (User-Defined Functions) to Add Functionality to MySQL
In this tutorial, we will show a couple of ways to efficiently calculate the median of a series of numbers in MySQL.
MySQL does not have a built-in function to calculate the statistical median of a series of numbers. That is why, if you need to calculate a median, you must do it using the custom code.
The median is the middle element of the series of numbers you are analyzing. Detecting this element can be a slow process if you work with a large sample and do not do it efficiently. The process can also be tricky in some situations. For example, some algorithms may work on odd sets of numbers but not on even sets, or vice versa.
The Most Accurate Way to Calculate the Median in MySQL
Suppose you have a table
dataset with a column
MyNumber of type float, and you need to get the median of the data series of
MyNumber. The most accurate way to calculate the median with an algorithm that works in all cases would be to use the following script:
SELECT AVG(mid_vals) AS 'median' FROM ( SELECT tab1.MyNumber AS 'mid_vals' FROM ( SELECT @row:=@row+1 AS 'row', a.MyNumber FROM dataset AS a, (SELECT @row:=0) AS r ORDER BY a.MyNumber ) AS tab1, ( SELECT COUNT(*) as 'count' FROM dataset x ) AS tab2 WHERE tab1.row >= tab2.count/2 and tab1.row <= ((tab2.count/2) +1)) AS tab3;
If the table has a large row count, you will need an index on column
MyNumber and make sure that index is used for filtering and sorting for the statement to perform well.
Use UDFs (User-Defined Functions) to Add Functionality to MySQL
Math and statistical functions are very commonplace, and you should not reinvent the wheel if you need to use a function such as the median. Therefore, you can grab a well-tested user-defined function that does the job.
On this site, you will find a bunch of UDFs you could use free of charge to add statistical functionality to MySQL. One of those UDFs is the
median function. It has one input parameter and one output parameter, both of type
REAL. The input parameter is the column that contains the set of numbers to calculate the median, and the output parameter is the calculated median.
After you download the source code, you add the UDF by using this command:
CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';
Then, you can use the function within a SELECT statement, like this:
SELECT MEDIAN(MyNumber) FROM dataset