Programming Question: Knuth-Morris-Pratt Algorithm for String Search

Mipsmonsta
6 min readMar 24, 2020

Part 1: How to construct a Longest Prefix Suffix Table

Photo by Omar Flores on Unsplash

Usual Question Set-Up

Given a pattern and a string, find the first index of the string at which the pattern can be found fully in the string. This is the usual set-up. Immediately, any seasoned programmer would be able to think of the naive solution which is as follows:

  1. Compare first character in pattern with first character in string.
  2. If matched, checked remaining characters in pattern as we traverse along string.
  3. If not matched, compare first character in pattern with second character in string…

Notice that we are abandoning all the progress in matching the pattern with string whenever we hit a mis-match. This is wasteful. And the big O notation for time complexity is O(n*m) where n is the length of the string and m is the length of the pattern.

Stop Waste — Progress with Abortive Work

Along came three very smart gentlemen (Knuth, Morris and Pratt) in the 1970s that decided to cut waste and devise a way to do the pattern matching in linear time of big O notation i.e. O(n+m).

So what’s the savings? Notice in point 3 above that if there are mismatch encountered between the pattern and string, the next comparison will be to jump back to the first character of the pattern. What if the “jump back” could be smarter and only need to go back to a point? So what is this point?

The Point is…

Suppose the pattern is ‘dwzacdwgx’ and the string is very long but has the sequence ‘….dwzacdwza….’. Assuming the pattern and string matched till before the highlighted ‘z’, we obviously don’t have a full match. But we don’t have to restart from first character of the pattern but just go back to index 2 of the pattern i.e. “dwz…” to continue matching with the string with the encountered mismatched character.

To have the guiding index, we need to build such a “guide” table. The jargon is to build a Longest Prefix Suffix Table or “LPS” for short.

LPS Building Receipe


Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : - - - - - - - - -
Start. Let j be at position of index 0 and i at index 1, set LPS[0] to 0.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 - - - - - - - -
Compare characters at position of j and i to see if matched. They don't match. Since j is at position index 0, set LPS[i] = 0.
Then Increment only i to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 - - - - - - -
Compare characters at position of j and i to see if matched. They don't match. Since j is at position index 0, set LPS[i] = 0.
Then Increment only i to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 - - - - - -
Compare characters at position of j and i to see if matched. They don't match. Since j is at position index 0, set LPS[i] = 0.
Then Increment only i to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 - - - - -
Compare characters at position of j and i to see if matched. They don't match. Since j is at position index 0, set LPS[i] = 0.
Then Increment only i to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 0 - - - -
Compare characters at position of j and i to see if matched. They matched. Set LPS[i] = j's position at index + 1 = 0 + 1 = 1
Then increment i and j to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 0 1 - - -
Compare characters at position of j and i to see if matched. They matched. Set LPS[i] = j's position at index + 1 = 1+ 1 = 2
Then increment i and j to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 0 1 2 - -
Compare characters at position of j and i to see if matched. They matched. Set LPS[i] = j's position at index + 1 = 2 + 1 = 3
Then increment i and j to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 0 1 2 3 -
Compare characters at position of j and i to see if matched. They don't.
Special Handling: Token j is not in position index 0. Look at LPS Table for the left adjacent neighbour of j i.e. LPS[j-1] = LPS[2] = 0. Move j to position index of LPS[2] = 0.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 0 1 2 3 -
Compare characters at position of j and i to see if matched. They don't. Since j is at position index 0, set LPS[i] = 0.
Then Increment only i to right by one position.
j i
Pattern : d w z a c d w z a
index : 0 1 2 3 4 5 6 7 8
LPS Table : 0 0 0 0 0 1 2 3 0
Since i is outside pattern index range, end.

If we analyze the variations of instructions, you will only see four types:

One:

Start. Let j be at position of index 0 and i at index 1, set LPS[0] to 0.

Two:

Compare characters at position of j and i to see if matched. They don't match. Since j is at position index 0, set LPS[i] = 0. 
Then Increment only i to right by one position.

Three:

Compare characters at position of j and i to see if matched. They matched. Set LPS[i] = j's position at index + 1 = 2 + 1 = 3
Then increment i and j to right by one position.

Four:

Compare characters at position of j and i to see if matched. They don't. 
Special Handling: Token j is not in position index 0. Look at LPS Table for the left adjacent neighbour of j i.e. LPS[j-1] = LPS[2] = 0. Move j to position index of LPS[2] = 0.

Armed with the above. Can you write code to churn out the LPS table based on the alignment of the four variations of instructions. Have a go. Scroll down for the Python code once you are ready for spoilers.

Code — See Function prefixSuffixTable

class Solution: #O(n+m) is a solution by KMP
"""
@param: source: A source string
@param: target: A target string
@return: An integer as index
"""
def strStr2(self, source, target):
pass #not implemented yet.


def prefixSuffixTable(self, target):
length = len(target)
lpsTable = [None]*length
i = 1
j = 0
lpsTable[0] = 0
while i < length:
if target[i] == target[j]: #match, so we log in lps table the index of j + 1, then increment positions of i and j
#by 1
lpsTable[i] = j + 1
i += 1
j += 1
else: #don't match i.e. target[i] == target[j]
if j == 0: #j cannot back to the left anymore
lpsTable[i] = 0
i += 1
else:
j = lpsTable[j - 1]
return lpsTable

That’s it for Part 1. In part 2, we will see how to use the LPS table for actual matching of the pattern with the string.

--

--

Mipsmonsta

Writing to soothe the soul, programming to achieve flow