MySQL ANY_VALUE() Function Explained

In MySQL, ANY_VALUE() is an aggregate function that allows us to include nonaggregated columns in the SELECT list when using the GROUP BY clause.

The ANY_VALUE() function is commonly used to resolve issues that arise when a query includes columns that are not part of the GROUP BY clause or aggregate functions. It can be useful in scenarios where we want to include descriptive columns in a grouped result set without affecting the grouping behaviour.

Syntax

The syntax goes like this:

ANY_VALUE(arg)

Where arg represents the column or value we want to include in the result set. It can be a column name, a literal value, or an expression involving columns and operators.

Functionality

The primary purpose of the ANY_VALUE() function is to eliminate the ambiguity that arises when nonaggregated columns are included in a GROUP BY query.

By default, when the ONLY_FULL_GROUP_BY SQL mode is enabled, MySQL requires that all nonaggregated columns in the SELECT list be included in the GROUP BY clause. However, in some cases, we may want to include additional columns without affecting the grouping behaviour.

When we apply the ANY_VALUE() function to a nonaggregated column within a GROUP BY query, MySQL selects an arbitrary value from the group to display in the result set. It effectively informs MySQL that any value from the group is acceptable, resolving the ambiguity and allowing the query to execute successfully.

Example

Suppose we run the following query:

SELECT 
    District, 
    Name, 
    Population 
FROM City 
WHERE CountryCode = 'AUS';

Result:

+-----------------+---------------+------------+
| District        | Name          | Population |
+-----------------+---------------+------------+
| New South Wales | Sydney        |    3276207 |
| Victoria        | Melbourne     |    2865329 |
| Queensland      | Brisbane      |    1291117 |
| West Australia  | Perth         |    1096829 |
| South Australia | Adelaide      |     978100 |
| Capital Region  | Canberra      |     322723 |
| Queensland      | Gold Coast    |     311932 |
| New South Wales | Newcastle     |     270324 |
| New South Wales | Central Coast |     227657 |
| New South Wales | Wollongong    |     219761 |
| Tasmania        | Hobart        |     126118 |
| Victoria        | Geelong       |     125382 |
| Queensland      | Townsville    |     109914 |
| Queensland      | Cairns        |      92273 |
+-----------------+---------------+------------+

This query returns a list of cities and their populations, along with the district that each city belongs to.

Now suppose we want to group the results by district, and include the population for each district. And we don’t want to include the city names in the grouping, but we would like to have a sample city returned for each district.

We can do the following to achieve that outcome:

SELECT 
    District, 
    ANY_VALUE(Name), 
    SUM(Population) FROM City
WHERE CountryCode = 'AUS'
GROUP BY District;

Result:

+-----------------+-----------------+-----------------+
| District        | ANY_VALUE(Name) | SUM(Population) |
+-----------------+-----------------+-----------------+
| New South Wales | Sydney          |         3993949 |
| Victoria        | Melbourne       |         2990711 |
| Queensland      | Brisbane        |         1805236 |
| West Australia  | Perth           |         1096829 |
| South Australia | Adelaide        |          978100 |
| Capital Region  | Canberra        |          322723 |
| Tasmania        | Hobart          |          126118 |
+-----------------+-----------------+-----------------+

This query does exactly what we want. It returns the total population of each district, and displays a sample city from within each district. The sample cities were generated by the ANY_VALUE() function.

In the above query, the ANY_VALUE() function allows us to include the Name column in the result set without including it in the GROUP BY clause. We are still able to group the results by the District column and obtain the total population of each district. The ANY_VALUE() function ensures that any city name within the given district is displayed in the result set, even though it is not part of the grouping operation.

Here’s what happens when I remove the ANY_VALUE() function:

SELECT 
    District, 
    Name, 
    SUM(Population) FROM City
WHERE CountryCode = 'AUS'
GROUP BY District;

Result:

ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'world.City.Name' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I get an error, because my ONLY_FULL_GROUP_BY SQL mode is enabled. Let’s verify that:

SELECT @@sql_mode;

Result:

+-----------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                            |
+-----------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
+-----------------------------------------------------------------------------------------------------------------------+

Yes, we can see the first item is ONLY_FULL_GROUP_BY.

When ONLY_FULL_GROUP_BY is enabled, MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.

We can either disable ONLY_FULL_GROUP_BY or use ANY_VALUE() to overcome this restriction and display results like in the above example. The ANY_VALUE() function allows us to generate a result without changing our sql_mode.

Just to be sure, here’s what happens if we add the Name column to the GROUP BY clause:

SELECT 
    District, 
    Name, 
    SUM(Population) FROM City
WHERE CountryCode = 'AUS'
GROUP BY District, Name;

Result:

+-----------------+---------------+-----------------+
| District        | Name          | SUM(Population) |
+-----------------+---------------+-----------------+
| New South Wales | Sydney        |         3276207 |
| Victoria        | Melbourne     |         2865329 |
| Queensland      | Brisbane      |         1291117 |
| West Australia  | Perth         |         1096829 |
| South Australia | Adelaide      |          978100 |
| Capital Region  | Canberra      |          322723 |
| Queensland      | Gold Coast    |          311932 |
| New South Wales | Newcastle     |          270324 |
| New South Wales | Central Coast |          227657 |
| New South Wales | Wollongong    |          219761 |
| Tasmania        | Hobart        |          126118 |
| Victoria        | Geelong       |          125382 |
| Queensland      | Townsville    |          109914 |
| Queensland      | Cairns        |           92273 |
+-----------------+---------------+-----------------+

We no longer get the error, but it has completely screwed up our results in the process. We no longer get the total population for each district, which was our main goal. It’s basically outputting the results of our original query without the GROUP BY clause. This demonstrates the benefit of the ANY_VALUE() function.

That said, sometimes we might want to include all the cities in our query results (while also obtaining the total population of each district). In such cases, we could use the WITH ROLLUP modifier to calculate the totals for each district.

Conclusion

The ANY_VALUE() function in MySQL enables the inclusion of nonaggregated columns in the SELECT clause without affecting the grouping behaviour. It serves as a handy option for enhancing query flexibility and addressing the challenges associated with nonaggregated columns in GROUP BY queries.