In Redis, the LCS
command implements the longest common subsequence algorithm.
The longest common subsequence algorithm finds the longest subsequence common to all sequences in a set of sequences. Note that this is different to the longest common string algorithm (also known as the longest common substring algorithm), which requires that matching characters in the string are contiguous. The longest common subsequence algorithm, on the other hand, doesn’t require matching characters to be contiguous.
The LCS command was introduced in Redis 7.0.0.
Syntax
LCS key1 key2 [LEN] [IDX] [MINMATCHLEN len] [WITHMATCHLEN]
When no arguments are specified, the string representing the longest common subsequence is returned.
The arguments work as follows:
LEN | Returns the length of the longest common subsequence. |
IDX | Returns an array with the LCS length and all the ranges in both the strings, start and end offset for each string, where there are matches. |
MINMATCHLEN | When used with IDX , limits the results to only those matches of a given minimal length. |
WITHMATCHLEN | When used with IDX , returns the length of each match. |
These arguments are demonstrated in the following examples.
Example
Let’s set two keys:
MSET firstname "Rick" lastname "Richardson"
Result:
OK
Now let’s use LCS to find the longest common subsequence between the two keys:
LCS firstname lastname
Result:
"Ric"
Only the first three characters matched.
Also, in this case, all three characters were contiguous (i.e. consecutive). However, LCS
doesn’t require them to be contiguous.
Let’s use values that contain some non-contiguous common values:
MSET firstname "Randi" lastname "Rancid"
Result:
OK
Now let’s run LCS
against those two keys again:
LCS firstname lastname
Result:
"Rani"
We can see that the Rani
sequence is common to both values, even though not all characters are contiguous.
We can also see that, although both keys contain the letter d
, that character is out of sequence, and so it’s not included in the result.
The LEN
Argument
We can use the LEN
argument to get the length of the resulting string:
LCS firstname lastname LEN
Result:
(integer) 4
Here we get an integer reply of 4
because that’s the length of the resulting LCS operation.
The IDX
Argument
We can use the IDX
argument to get the position of each match:
LCS firstname lastname IDX
Result:
1) "matches" 2) 1) 1) 1) (integer) 4 2) (integer) 4 2) 1) (integer) 4 2) (integer) 4 2) 1) 1) (integer) 0 2) (integer) 2 2) 1) (integer) 0 2) (integer) 2 3) "len" 4) (integer) 4
Matches are returned from the last one to the first one.
The above array means that the first match (second element of the array) is between positions 0 and 2 of the first string and 0 and 2 of the second. This is reporting on the Ran
match.
Then there is another match at position 4 in the first string and 4 in the second. This is reporting on the i
match.
The MINMATCHLEN
Argument
We can use the MINMATCHLEN
argument to specify a minimum length:
LCS firstname lastname IDX MINMATCHLEN 3
Result:
1) "matches" 2) 1) 1) 1) (integer) 0 2) (integer) 2 2) 1) (integer) 0 2) (integer) 2 3) "len" 4) (integer) 4
In this case I specified a minimum length of 3, and so only those matches with a length of three or more are returned.
The WITHMATCHLEN
Argument
We can also use the WITHMATCHLEN
argument to return the actual length of each match:
LCS firstname lastname IDX MINMATCHLEN 1 WITHMATCHLEN
Result:
1) "matches" 2) 1) 1) 1) (integer) 4 2) (integer) 4 2) 1) (integer) 4 2) (integer) 4 3) (integer) 1 2) 1) 1) (integer) 0 2) (integer) 2 2) 1) (integer) 0 2) (integer) 2 3) (integer) 3 3) "len" 4) (integer) 4
Here, I lowered the MINMATCHLEN
to 1
in order to better demonstrate WITHMATCHLEN
.
Case Sensitivity
Be mindful of case sensitivity when using LCS
:
MSET firstname "Rick" lastname "ricK"
Result:
OK
Now let’s use LCS to find the longest common subsequence between the two keys:
LCS firstname lastname
Result:
"ic"
Only sequences with matching case were returned.