In SQL databases, COUNT()
is a commonly used aggregation function that returns the number of rows in a group. In this article, I run some examples of the COUNT()
function in DuckDB. DuckDB is a high-performance analytical database system that’s designed to be fast, reliable, portable, and easy to use.
Syntax
We can use the following variations when using the COUNT()
function:
count()
count(*)
count(arg)
The first two return the number of rows in a group. They’re basically synonyms – they do the same thing.
The third variation returns the number of non-null values in arg
.
Basic COUNT()
Usage
Let’s start by creating some sample data to work with:
CREATE TABLE customers (
customer_id INTEGER,
name VARCHAR,
country VARCHAR,
age INTEGER
);
INSERT INTO customers VALUES
(1, 'Jacy Brittle', 'USA', 35),
(2, 'Sofia Garcia', 'Spain', 28),
(3, 'Wei Li', 'China', 42),
(4, 'Sal Chen', 'USA', 31),
(5, 'Ahmed Hassan', 'Egypt', 39);
Let’s now use the COUNT()
function to get the number of customers:
SELECT COUNT() AS total_customers
FROM customers;
Result:
+-----------------+
| total_customers |
+-----------------+
| 5 |
+-----------------+
As mentioned, we have the option of using the asterisk wildcard (*
). The function produces the same results whether we use it or not:
SELECT
COUNT(),
COUNT(*)
FROM customers;
Result:
+--------------+--------------+
| count_star() | count_star() |
+--------------+--------------+
| 5 | 5 |
+--------------+--------------+
We can see that both variations returned the same result. Interestingly, the result uses count_star()
as the column name for both variations.
The COUNT(arg)
Variation
We can also specify column names when using the COUNT()
function. When we do this, the function returns the number of non-null values in the column.
Let’s create another table:
-- Create a table with some NULL values
CREATE TABLE orders (
order_id INTEGER,
customer_id INTEGER,
amount DECIMAL,
discount DECIMAL
);
INSERT INTO orders VALUES
(1, 1, 100.00, NULL),
(2, 1, 75.50, 10.00),
(3, 2, 200.00, 25.00),
(4, 3, 150.00, NULL),
(5, 2, 300.00, 50.00);
Now let’s find out how many orders have a discount:
SELECT
COUNT(discount) AS orders_with_discount
FROM orders;
Result:
+----------------------+
| orders_with_discount |
+----------------------+
| 3 |
+----------------------+
This query simply counted up the non-null values in the discount
column, which tells us that three orders received a discount.
The DISTINCT
Clause
The DISTINCT
clause is a popular clause in SQL that allows us to narrow the results down to just the unique values. DuckDB supports this clause, and so we can go ahead and use it with the COUNT()
function to get the number of unique customers in the table.
Example:
SELECT
COUNT(customer_id) AS all_customer_rows,
COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
Result:
+-------------------+------------------+
| all_customer_rows | unique_customers |
+-------------------+------------------+
| 5 | 3 |
+-------------------+------------------+
We can see that there are five rows with (non-null) customer_id
values, but only three of those are unique (due to some customer_id
s being listed more than once). The reason these customer_id
s are listed multiple times is because some customers have more than one order. The DISTINCT
clause provides a quick and easy way to “de-dupe” these customer_id
rows.
COUNT()
with Filtering
We can use COUNT()
to count the number of rows that would be returned from a filtered query.
Suppose we have a transactions
table:
CREATE TABLE transactions (
transaction_id INTEGER,
customer_id INTEGER,
amount DECIMAL,
transaction_date DATE
);
INSERT INTO transactions VALUES
(1, 1, 100.00, '2024-01-01'),
(2, 1, 75.50, '2024-01-02'),
(3, 2, 200.00, '2024-01-02'),
(4, 3, 150.00, '2024-01-03'),
(5, 2, 300.00, '2024-01-03'),
(6, 1, 50.00, '2024-01-03'),
(7, 4, 125.00, '2024-01-04');
The next two examples run filtered queries against this table.
COUNT()
with a WHERE
Clause
This query uses a WHERE
clause to filter it to just those rows where the amount of the transaction is over 100:
SELECT
COUNT(*) AS high_value_transactions
FROM transactions
WHERE amount > 100;
Result:
+-------------------------+
| high_value_transactions |
+-------------------------+
| 4 |
+-------------------------+
COUNT()
with a HAVING
Clause
And here’s an example with a HAVING
clause:
SELECT
customer_id,
COUNT(*) AS transaction_count
FROM transactions
GROUP BY customer_id
HAVING COUNT(*) > 1;
Result:
+-------------+-------------------+
| customer_id | transaction_count |
+-------------+-------------------+
| 1 | 3 |
| 2 | 2 |
+-------------+-------------------+
This query returns a list of customers who have more than one transaction, along with their respective transaction count.
COUNT()
with GROUP BY
Let’s create and populate a products
table:
-- Create a products table
CREATE TABLE products (
product_id INTEGER,
category VARCHAR,
supplier_id INTEGER,
price DECIMAL
);
INSERT INTO products VALUES
(1, 'Electronics', 1, 499.99),
(2, 'Electronics', 2, 299.99),
(3, 'Clothing', 1, 49.99),
(4, 'Clothing', 3, 79.99),
(5, 'Electronics', 2, 799.99),
(6, 'Furniture', 3, 599.99);
Here’s an example of using COUNT()
in a query with a GROUP BY
clause:
SELECT
category,
COUNT(*) AS product_count,
COUNT(*) FILTER (WHERE price > 400) AS expensive_products
FROM products
GROUP BY category
ORDER BY category;
Result:
+-------------+---------------+--------------------+
| category | product_count | expensive_products |
+-------------+---------------+--------------------+
| Clothing | 2 | 0 |
| Electronics | 3 | 2 |
| Furniture | 1 | 1 |
+-------------+---------------+--------------------+
And here’s one that includes the supplier ID:
SELECT
category,
supplier_id,
COUNT(*) AS product_count,
COUNT(*) FILTER (WHERE price > 500) AS expensive_products
FROM products
GROUP BY category, supplier_id
ORDER BY category, supplier_id;
Result:
+-------------+-------------+---------------+--------------------+
| category | supplier_id | product_count | expensive_products |
+-------------+-------------+---------------+--------------------+
| Clothing | 1 | 1 | 0 |
| Clothing | 3 | 1 | 0 |
| Electronics | 1 | 1 | 0 |
| Electronics | 2 | 2 | 1 |
| Furniture | 3 | 1 | 1 |
+-------------+-------------+---------------+--------------------+
A More Comprehensive Analysis
Suppose we have a sales
table:
-- Create a sales table
CREATE TABLE sales (
sale_id INTEGER,
product_id INTEGER,
customer_id INTEGER,
sale_date DATE,
quantity INTEGER,
status VARCHAR
);
INSERT INTO sales VALUES
(1, 1, 1, '2024-01-01', 2, 'Completed'),
(2, 2, 2, '2024-01-01', 1, 'Pending'),
(3, 1, 3, '2024-01-02', 3, 'Completed'),
(4, 3, 1, '2024-01-02', 1, 'Cancelled'),
(5, 2, 4, '2024-01-03', 2, 'Completed');
Here’s a query that combines multiple COUNT()
operations to provide a comprehensive analysis:
SELECT
sale_date,
COUNT(*) AS total_sales,
COUNT(DISTINCT customer_id) AS unique_customers,
COUNT(*) FILTER (WHERE status = 'Completed') AS completed_sales,
COUNT(*) FILTER (WHERE status = 'Cancelled') AS cancelled_sales,
COUNT(*) FILTER (WHERE quantity > 1) AS multi_item_sales
FROM sales
GROUP BY sale_date
ORDER BY sale_date;
Result:
+------------+-------------+------------------+-----------------+-----------------+------------------+
| sale_date | total_sales | unique_customers | completed_sales | cancelled_sales | multi_item_sales |
+------------+-------------+------------------+-----------------+-----------------+------------------+
| 2024-01-01 | 2 | 2 | 1 | 0 | 1 |
| 2024-01-02 | 2 | 2 | 1 | 1 | 1 |
| 2024-01-03 | 1 | 1 | 1 | 0 | 1 |
+------------+-------------+------------------+-----------------+-----------------+------------------+