How APPROX_COUNT_DISTINCT() Works in SQL Server

APPROX_COUNT_DISTINCT() is one of the new functions introduced in SQL Server 2019. This function returns the approximate number of unique non-null values in a group.

Basically, you can use it to get an approximate idea of the number of non-duplicate rows in a large table or result set. It works similar to the COUNT_BIG() and COUNT() functions (when using the DISTINCT clause), but it returns an approximate number rather than a precise number.

APPROX_COUNT_DISTINCT() is aimed mainly at big data scenarios. It’s designed for accessing large data sets with more than a million rows, and aggregation of a column or columns that have many distinct values. It is intended for scenarios where responsiveness is more critical than absolute precision.

Microsoft states that the function implementation guarantees up to a 2% error rate within a 97% probability.

At the time of writing, APPROX_COUNT_DISTINCT() is a public preview feature. It was introduced in SQL Server 2019, which is also currently in preview status.

Note that Microsoft states that preview features are not intended for production use.

Continue reading

Install SQL Server 2019 on a Mac

I previously explained how to install SQL Server on a Mac via a Docker container. When I wrote that, SQL Server 2017 was the latest version of SQL Server, and it had just been made available for Linux and Docker (which means that you can also install it on MacOS systems).

In late 2018, Microsoft announced SQL Server 2019 Preview, and subsequently announced general release in late 2019. The installation process for SQL Server 2019 is exactly the same as for SQL Server 2017. The only difference is that you need to use the container image for SQL Server 2019 instead of the 2017 image. Here I show you how to do that.

Also, if you already have SQL Server 2017 installed, and you want to install SQL Server 2019 without removing the 2017 version, you’ll need to allocate a different port number on your host. I show you how to do that too.

But first… if you’re using a Mac with the M1 chip, see How to Install SQL Server on an M1 Mac (ARM64).

Continue reading

COUNT() vs COUNT_BIG() in SQL Server: What’s the Difference?

In SQL Server, the COUNT_BIG() function and the COUNT() do essentially the same thing: return the number of items found in a group. Basically, you can use these functions to find out how many rows are in a table or result set.

In many cases, you’ll be able to choose whichever one you prefer. However, there’s a difference between these two functions that might dictate that you to use one over the other.

The difference is that COUNT() returns its result as an int, whereas COUNT_BIG() returns its result as a bigint.

In other words, you’ll need to use COUNT_BIG() if you expect its results to be larger than 2,147,483,647 (i.e. if the query returns more than 2,147,483,647 rows).

Continue reading

How COUNT_BIG() Works in SQL Server

In SQL Server, the COUNT_BIG() function returns the number of items found in a group. You can use it to find out how many rows are in a table or result set.

This function works similar to the COUNT() function. The difference is that COUNT() returns its result as an int, whereas COUNT_BIG() returns its result as a bigint.

Therefore COUNT_BIG() could come in handy if you expect your result set to have a very large number of rows (i.e. larger than 2,147,483,647).

Continue reading

How to Include Results that Tie for Last Place when Using the TOP Clause in SQL Server

When using the TOP clause in a query in SQL Server, you may encounter occasions where two or more results tie for last place. You probably wouldn’t even know when this happens, because the default behavior of TOP is to return no more than the number of rows you specify.

The TOP clause accepts a WITH TIES argument that allows you to specify whether or not to include all results that tie for last place. Rows can tie for last place due to their ORDER BY column containing the same value. Using this argument may therefore result in more rows being returned than you actually specified.

Continue reading