How NULL Values Can Affect your Results When Using the COUNT() Function in SQL

The SQL COUNT() function is a handy tool for telling us how many rows would be returned in a query. We can pass a column name to the function or we can pass the asterisk (*) wildcard to indicate all columns.

If a column contains NULL values, we could get different results, depending on whether we use the column name or the asterisk (*).

Basically, it works like this:

  • Using COUNT(column_name) eliminates NULL values from evaluation.
  • Using COUNT(*) includes NULL values in the total.

Example

To demonstrate this, suppose we create a table with a single column and populate that column with NULL in all rows:

CREATE TABLE t1 (
  c1 VARCHAR(50)
);

INSERT INTO t1 VALUES (NULL);
INSERT INTO t1 VALUES (NULL);
INSERT INTO t1 VALUES (NULL);

SELECT c1 FROM t1;

Result:

+------+
| c1   |
+------+
| NULL |
| NULL |
| NULL |
+------+
3 rows in set (0.00 sec)

We inserted three rows with a single column, all of which contain NULL. The SELECT statement returned all three rows, even when we explicitly used the column name our SELECT list.

Now let’s run a query with COUNT(column_name) against that table:

SELECT COUNT(c1) FROM t1;

Result:

+-----------+
| COUNT(c1) |
+-----------+
|         0 |
+-----------+

The COUNT() function tells us that zero rows match our criteria.

And now let’s use COUNT(*) instead:

SELECT COUNT(*) FROM t1;

Result:

+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+

This time all NULLs were included in the total. This is true even though there’s only one column in the table (and therefore no other columns can interfere with our results).

So in a nutshell, we got different results depending on whether we passed the column name or the asterisk (*) wildcard.

Factoring in Multiple Columns

The above example is overly simplistic in that the table only contains a single column. In reality, we’ll more than likely use tables with multiple columns.

Here’s an example with two columns:

CREATE TABLE t2 (
  c1 INT,
  c2 INT
);

INSERT INTO t2 VALUES (1, NULL);
INSERT INTO t2 VALUES (2, NULL);
INSERT INTO t2 VALUES (NULL, NULL);
INSERT INTO t2 VALUES (NULL, NULL);

SELECT COUNT(c1) AS "c1" FROM t2;
SELECT COUNT(c2) AS "c2" FROM t2;
SELECT COUNT(*) AS "*" FROM t2;

Result:

+----+
| c1 |
+----+
|  2 |
+----+

+----+
| c2 |
+----+
|  0 |
+----+

+---+
| * |
+---+
| 4 |
+---+

As expected, whenever we passed the column name to COUNT(), it excluded NULL values from evaluation. Regarding the * symbol, its result reflected the total number of rows, including the rows where both columns were NULL.

So in summary, we inserted four rows into the table and COUNT(*) returned 4, even though some of those rows contained nothing but NULL. The COUNT(column_name) syntax on the other hand, returned just the non-NULL rows. Its result depended on the column passed to the COUNT() function.

The DISTINCT Operator

The COUNT() function accepts a DISTINCT operator that removes duplicate rows from its result. Therefore, this can cause COUNT() to return a lower number than we’d get if we didn’t use DISTINCT.

We can also use DISTINCT outside of the COUNT() function (in order to actually return the distinct rows – not just count them).

It’s important to know that the DISTINCT operator could yield different results, depending on whether it’s part of the COUNT() function or not. This is because COUNT(column_name) removes NULL values from evaluation, as we’ve already seen, but NULL values are treated differently outside of aggregate functions like this.

If we simply select all rows from our t2 table:

SELECT c1 FROM t2;

Result:

+------+
| c1   |
+------+
|    1 |
|    2 |
| NULL |
| NULL |
+------+
4 rows in set (0.00 sec)

We get four rows, two of which are NULL.

But if we apply the DISTINCT operator:

SELECT DISTINCT c1 FROM t2;

Result:

+------+
| c1   |
+------+
|    1 |
|    2 |
| NULL |
+------+
3 rows in set (0.01 sec)

Only three rows are returned. This is because multiple NULLs are treated as duplicates. Therefore, the query returned just one NULL in its result (i.e. it removed redundant duplicate NULL values).

But COUNT(DISTINCT column_name) shows a different story:

SELECT COUNT(DISTINCT c1) FROM t2;

Result:

+--------------------+
| COUNT(DISTINCT c1) |
+--------------------+
|                  2 |
+--------------------+

Here, we get just two rows. All NULL values have been removed. The COUNT() function removed all NULL values from its evaluation.