How APPROX_COUNT_DISTINCT() Works in SQL Server

APPROX_COUNT_DISTINCT() is one of the new functions introduced in SQL Server 2019. This function returns the approximate number of unique non-null values in a group.

Basically, you can use it to get an approximate count of the distinct values in a column of a large table or result set. It works similarly to the COUNT_BIG() and COUNT() functions (when using the DISTINCT clause), but it returns an approximate number rather than a precise one.

APPROX_COUNT_DISTINCT() is aimed mainly at big data scenarios. It’s designed for accessing large data sets with more than a million rows, and aggregation of a column or columns that have many distinct values. It is intended for scenarios where responsiveness is more critical than absolute precision.

Microsoft states that the function implementation guarantees up to a 2% error rate within a 97% probability.
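
As a rough sketch of the comparison described above, the following query runs the exact and approximate counts side by side. The table dbo.Orders and the column CustomerId are hypothetical names used only for illustration; substitute a large table of your own.

```sql
-- Hypothetical table and column names, for illustration only.
-- COUNT(DISTINCT ...) returns the exact number of distinct non-null values;
-- APPROX_COUNT_DISTINCT(...) returns an approximation of the same figure.
SELECT
    COUNT(DISTINCT CustomerId)        AS ExactCount,
    APPROX_COUNT_DISTINCT(CustomerId) AS ApproxCount
FROM dbo.Orders;
```

On a small table the two figures will often match; the approximate version is meant for large data sets where responsiveness matters more than precision.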

At the time of writing, APPROX_COUNT_DISTINCT() is a public preview feature. It was introduced in SQL Server 2019, which is also currently in preview status.

Note that Microsoft states that preview features are not intended for production use.

COUNT() vs COUNT_BIG() in SQL Server: What’s the Difference?

In SQL Server, the COUNT_BIG() and COUNT() functions do essentially the same thing: return the number of items found in a group. Basically, you can use these functions to find out how many rows are in a table or result set.

In many cases, you’ll be able to choose whichever one you prefer. However, there’s a difference between these two functions that might dictate that you use one over the other.

The difference is that COUNT() returns its result as an int, whereas COUNT_BIG() returns its result as a bigint.

In other words, you’ll need to use COUNT_BIG() if you expect its results to be larger than 2,147,483,647 (i.e. if the query returns more than 2,147,483,647 rows).
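
One way to see the difference in return types is to ask SQL Server for the base type of each result. This is a minimal sketch that counts rows in the sys.objects catalog view, chosen here only because it exists in every database.

```sql
-- SQL_VARIANT_PROPERTY reports the base data type of each aggregate's result.
-- Per the description above, COUNT() should report int and COUNT_BIG() bigint.
SELECT
    SQL_VARIANT_PROPERTY(COUNT(*), 'BaseType')     AS CountType,
    SQL_VARIANT_PROPERTY(COUNT_BIG(*), 'BaseType') AS CountBigType
FROM sys.objects;
```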

How COUNT_BIG() Works in SQL Server

In SQL Server, the COUNT_BIG() function returns the number of items found in a group. You can use it to find out how many rows are in a table or result set.

This function works similarly to the COUNT() function. The difference is that COUNT() returns its result as an int, whereas COUNT_BIG() returns its result as a bigint.

Therefore, COUNT_BIG() can come in handy if you expect your result set to have a very large number of rows (i.e. more than 2,147,483,647).
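
A minimal usage sketch follows. The inline VALUES list stands in for a real table so that the query can be run as-is; the Team and Player columns are made up for illustration.

```sql
-- COUNT_BIG() accepts the same forms as COUNT(): * or a column, with optional DISTINCT.
SELECT
    COUNT_BIG(*)                AS TotalRows,      -- all rows in the set
    COUNT_BIG(DISTINCT v.Team)  AS DistinctTeams   -- distinct non-null Team values
FROM (VALUES
    ('Red',  'Ann'),
    ('Red',  'Bob'),
    ('Blue', 'Cat')
) AS v(Team, Player);
```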

How NTILE() Works in SQL Server

In SQL Server, the NTILE() function allows you to distribute the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at 1. For each row, NTILE() returns the number of the group to which the row belongs.

You simply provide the number of groups you desire when calling the function.
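
For instance, the sketch below splits six rows into three groups. The inline VALUES list and its Player and Score columns are made-up sample data.

```sql
-- NTILE(3) distributes the ordered rows into 3 numbered groups.
-- With 6 rows, each group receives 2 rows: groups 1, 1, 2, 2, 3, 3.
SELECT
    v.Player,
    v.Score,
    NTILE(3) OVER (ORDER BY v.Score DESC) AS GroupNumber
FROM (VALUES
    ('A', 90), ('B', 85), ('C', 80),
    ('D', 75), ('E', 70), ('F', 65)
) AS v(Player, Score);
```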

How ROW_NUMBER() Works in SQL Server

In SQL Server, the ROW_NUMBER() function allows you to number the output of a result set. It returns the sequential number of each row, starting at 1.

If you specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).
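
The restart behaviour can be sketched as follows, again using made-up inline data rather than a real table.

```sql
-- Numbering restarts at 1 for each Team because of PARTITION BY.
SELECT
    v.Team,
    v.Player,
    ROW_NUMBER() OVER (PARTITION BY v.Team ORDER BY v.Player) AS RowNum
FROM (VALUES
    ('Red',  'Ann'),
    ('Red',  'Bob'),
    ('Blue', 'Cat'),
    ('Blue', 'Dan'),
    ('Blue', 'Eve')
) AS v(Team, Player);
```

Here the Blue rows are numbered 1 to 3 and the Red rows 1 to 2.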

How DENSE_RANK() Works in SQL Server

In SQL Server, the DENSE_RANK() function returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of distinct ranks that come before the row.

This function is similar to RANK(), but without the gaps in the ranking values that can occur with RANK() when ties exist in the result set.
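
To illustrate the difference, this sketch ranks a small made-up set of scores that contains a tie, using both functions side by side.

```sql
-- Two players tie on 85.
-- RANK() yields 1, 2, 2, 4 (a gap after the tie);
-- DENSE_RANK() yields 1, 2, 2, 3 (no gap).
SELECT
    v.Player,
    v.Score,
    RANK()       OVER (ORDER BY v.Score DESC) AS RankValue,
    DENSE_RANK() OVER (ORDER BY v.Score DESC) AS DenseRankValue
FROM (VALUES
    ('A', 90), ('B', 85), ('C', 85), ('D', 80)
) AS v(Player, Score);
```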

How to Include Results that Tie for Last Place when Using the TOP Clause in SQL Server

When using the TOP clause in a query in SQL Server, you may encounter occasions where two or more results tie for last place. You probably wouldn’t even know when this happens, because the default behavior of TOP is to return no more than the number of rows you specify.

The TOP clause accepts a WITH TIES argument that allows you to specify whether or not to include all results that tie for last place. Rows can tie for last place due to their ORDER BY column containing the same value. Using this argument may therefore result in more rows being returned than you actually specified.
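
A minimal sketch of this behaviour, using made-up inline data in which two rows share the second-highest score:

```sql
-- TOP (2) alone would return 2 rows. WITH TIES also returns any row that
-- ties with the last one on the ORDER BY column, so this returns 3 rows
-- (A, B and C). WITH TIES requires an ORDER BY clause.
SELECT TOP (2) WITH TIES
    v.Player,
    v.Score
FROM (VALUES
    ('A', 90), ('B', 85), ('C', 85), ('D', 80)
) AS v(Player, Score)
ORDER BY v.Score DESC;
```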
