How UNCOMPRESSED_LENGTH() Works in MariaDB

In MariaDB, UNCOMPRESSED_LENGTH() is a built-in function that returns the length of a compressed string before it was compressed with the COMPRESS() function.

Syntax

The syntax goes like this:

UNCOMPRESSED_LENGTH(compressed_string)

Where compressed_string is the compressed string.

Example

Here’s a basic example:

SELECT UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('z', 50)));

Result:

+------------------------------------------------+
| UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('z', 50))) |
+------------------------------------------------+
|                                             50 |
+------------------------------------------------+

Here, I used the REPEAT() function to repeat the same letter 50 times. Repeating the character makes it a good candidate for compression.

Compared to the Compressed String

Here it is again, compared to the compressed string:

SELECT
    LENGTH(COMPRESS(REPEAT('z', 50))) AS "Compressed",
    UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('z', 50))) AS "Uncompressed";

Result:

+------------+--------------+
| Compressed | Uncompressed |
+------------+--------------+
|         16 |           50 |
+------------+--------------+

Byte Length vs Character Length

The UNCOMPRESSED_LENGTH() reports on the length in bytes (as opposed to just character length).

We can test this by using a multi-byte character, such as the copyright symbol (©), then passing it to functions such as LENGTH() to get the length in bytes, and CHAR_LENGTH() to get the character length:

SELECT 
    UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('©', 50))) AS "UNCOMPRESSED_LENGTH()",
    LENGTH(REPEAT('©', 50)) AS "Uncompressed (Bytes)",
    LENGTH(COMPRESS(REPEAT('©', 50))) AS "Compressed (Bytes)",
    CHAR_LENGTH(REPEAT('©', 50)) AS "Uncompressed (Char)",
    CHAR_LENGTH(COMPRESS(REPEAT('©', 50))) AS "Compressed (Char)";

Result (using vertical output):

UNCOMPRESSED_LENGTH(): 100
 Uncompressed (Bytes): 100
   Compressed (Bytes): 17
  Uncompressed (Char): 50
    Compressed (Char): 17

Here’s another example that uses a Thai character:

SELECT 
    UNCOMPRESSED_LENGTH(COMPRESS(REPEAT('อ', 50))) AS "UNCOMPRESSED_LENGTH()",
    LENGTH(REPEAT('อ', 50)) AS "Uncompressed (Bytes)",
    LENGTH(COMPRESS(REPEAT('อ', 50))) AS "Compressed (Bytes)",
    CHAR_LENGTH(REPEAT('อ', 50)) AS "Uncompressed (Char)",
    CHAR_LENGTH(COMPRESS(REPEAT('อ', 50))) AS "Compressed (Char)";

Result (using vertical output):

UNCOMPRESSED_LENGTH(): 150
 Uncompressed (Bytes): 150
   Compressed (Bytes): 18
  Uncompressed (Char): 50
    Compressed (Char): 18

This Thai character uses 3 bytes, and therefore we end up with 150 bytes for the uncompressed string, even though the string is only 50 characters long.

Null Arguments

Passing null returns null:

SELECT UNCOMPRESSED_LENGTH(null);

Result:

+---------------------------+
| UNCOMPRESSED_LENGTH(null) |
+---------------------------+
|                      NULL |
+---------------------------+

Missing Argument

Calling UNCOMPRESSED_LENGTH() with the wrong number of arguments, or without passing any arguments results in an error:

SELECT UNCOMPRESSED_LENGTH();

Result:

ERROR 1582 (42000): Incorrect parameter count in the call to native function 'UNCOMPRESSED_LENGTH'