word break: now where have I seen this before?

I’m now doing some paired leetcode study sessions with a former ZipRecruiter co-worker, Marcellus Pelcher (LinkedIn). As he says, “I’ve been doing these [leetcode exercises] for a long time!” and it’s been great both to get to spend some time with him and to work together on some problems.

I’m generally doing OK on easy problems so after getting our video call working, we looked at the word break problem. This essentially boils down to a pattern match: given a list of words, and a string of characters, figure out if the string can be broken into pieces that all match a word in the list of words.

Examples?

  • leetcode and ["leet", "code"] succeeds (by inspection!)
  • catsandog and ["cat", "cats", "sand", "dog"] fails (no combo lets us match the final og).

We agreed that this was probably meant to be a dynamic programming problem (way to saddle myself with a hard one right out of the gate!). I put together a recursive greedy algorithm (similar to the good old SNOBOL4 pattern matcher) in ten minutes or so, with a certain amount of flailing at accessing the strings, but it took too long to get to a solution, and it was a huge memory hog. Marcellus did a dynamic programming solution in Java that used a bit array to track the matches, and had it done and passing in about 15-20 minutes. So one for him and zero for me! 🙂

I let the solution rattle around in my head and after taking a shower, it suddenly dawned on me that this was very much like the coin change problem, but instead of reducing the count, we’re trimming off substrings. I went back to work from there and finally realized that the bit array was tracking the ends of successful matches, which made the dynamic programming recurrence go like this:

  • Start at the beginning of the string (position 0) and find all words that match at that position. Record where they end. That’s our starting configuration.
  • Continue moving forward through the string, comparing all the words at the current position.
  • If the word matches at the current position, it successfully extends a match if and only if a previous match ended right before where this one starts. If this is true, then we record the end of the newly-matched word. Otherwise we just continue to the next word.
  • If a match ends at the end of the string, we’ve successfully found some match or another; for this problem, that’s all that matters.

Let’s do an example. Let’s say our string is “catsupondog” and our words are “cat”, “cats”, “catsup”, “upon”, “on”, and “dog”. At position zero, “cat”, “cats”, and “catsup” all matched, so we’ll record the ending indexes in a bit array. After our startup matches, things look like this:

catsupondog
0123456789A
00110100000
 AA A
 || |
 || +-- catsup matched up to here
 |+------ cats matched up to here
 +-------- cat matched up to here

As the index moves up through the string, we keep checking all the words against it. If we match another word, we then look to see if the position just before the start of this new match marks the end of a successful match.

When we get to position 4, “upon” matches, and position 3 is true, so this extends the match, and we mark position 7 as the end of a successful match.

catsupondog
0123456789A
00110101000
   A   A
   |   |
   |   +--- "upon" extends the "cats" match
   +------- "cats" match ended here

When we get to position 6, “on” matches, and we mark position 7 as end of a match again. (Note that marking it twice is fine; we just have two different successful matches that end there, and for this problem, which one got us to 7 doesn’t matter.)

catsupondog
0123456789A
00110101000
     A A
     | |
     | +--- "on" extends the "catsup" match
     +----- "catsup" match ended here

When we finally get to position 8, “dog” matches, and position 7 is true, so this extends the match again. We’re at the end, so we’re done, and successful. We don’t have to check any more past position 8.

catsupondog
0123456789A
00110101001
       A  A
       |  |
       |  +--- "dog" matches, reaches the end, and succeeds
       +------ end of "on" and "upon" matches; which one doesn't matter
        

Enough chatter, here’s some code.

func wordBreak(s string, wordDict []string) bool {
    
    // Records where matches ended successfully.
    matched := make([]bool, len(s))

    for i := 0; i < len(s); i++ {
        // Check every word at the current position.
        for _,word := range(wordDict) {
            if i + len(word)-1 < len(s) {
                // Word can fit in remaining space
                if s[i:i+len(word)] == word {
                    // matches at the current index
                    if i == 0 || matched[i-1] {
                        // If we're at zero, we always succeed.
                        // Otherwise, check if a successful match ended right
                        // before this one. If it did, mark the end of this
                        // match as successful too.
                        matched[i+len(word)-1] = true
                        if i+len(word)-1 == len(s)-1 {
                            // Short-circuit if we just successfully
                            // reached the end with this match
                            break
                        }
                    }
                }
            }
        }
    }
    return matched[len(s)-1]
}

Bottom 7% in performance, but I’m giving myself a win because I actually figured out a dynamic programming problem by getting what the recurrence was. Marcellus told me, I just didn’t follow his explanation! [EDIT: I removed a fmt.Println of the bit array inside the loop and now it’s top 80%! Calling that a pass for an interview too.]

Reply