leetcode day 29 – longest substring

First solution worked, but was slow and used a lot of memory (5th percentile for both):

func lengthOfLongestSubstring(s string) int {
    if s == "" {
        return 0
    }
    
    // This looks like a sliding window.
    longest := []byte{}
    current := []byte{}
    m := make(map[byte]int)
    bs := []byte(s)
    
    i := 0
    scan := 0

    for {
        // scan from current point.
        m[bs[scan]]++
        if m[bs[scan]] < 2 {
            // still no repeat, now add it
            current = append(current, bs[scan])
            scan++
            if scan == len(bs) {
                // no more chars to add, current finished
                // and string finished
                break
            }
        } else {
            // repeat; restarting scan at next point
            // for a new possible substring, see if 
            // the new substring is the new longest.
            if len(current) > len(longest) {
                longest = current
            }
            // in either case, we have to try for a new
            // non-repeater in the next window
            i++
            if i == len(bs) {
                // no new window left, finish up
                break
            }
            scan = i
            current = []byte{}
            m = make(map[byte]int)
        }
    }
    if len(current) > len(longest) {
        longest = current
    }
    return len(longest)
}

So this is the right track — the map to manage the characters we’ve seen is definitely right, but there has to be a trick here — and that is to use the index of the character as the hash value, and use that to help us manage the sliding window.

func lengthOfLongestSubstring(s string) int {
    if s == "" {
        return 0
    }

    m := make(map[byte]int)
    bs := []byte(s)
    
    start := 0
    maxLen := 0
    currentLen := 0

    // Iterate through the string and slide the window each
    // time you find a duplicate.
	// end is the end of current substring
	for end := 0; end < len(bs); end++ {
		// Check if the current character is a duplicate
		if dupIndex, ok := m[bs[end]]; ok {
            // New max?
            if maxLen < end - start {
                maxLen = end - start
            }
			// forget index for all characters before the dup
			m = make(map[byte]int)

			// Slide the window to just after the dup.
			start = dupIndex + 1
            currentLen = 0
            end = start
		}
		// Add the current character to hashmap
		m[bs[end]] = end
        currentLen++
	}
    if currentLen > maxLen {
        maxLen = currentLen
    }
    return maxLen
}

That’s a little better, but it’s still pretty slow. 13th percentile runtime, 20th percentile CPU. (If you use strings, the percentiles drop to 5 and 5. No better than the dumb solution!) Looking at the solutions, it is possible to do the whole thing with indexes…but I definitely do not grok it.

func lengthOfLongestSubstring(s string) int {
    if s == "" {
        return 0
    }
    if len(s) == 1 {
        return 1
    }

    // brackets current longest found
    startMax := 0
    endMax := 0

    // start of window
    start := 0
    dupIndex := 0

     // Scan end pointer of window forward
    for end := 1; end < len(s); end++ {
        // The "I didn't find it" value
        dupIndex = -1
        // looking for duplicate inside window
        for j := start; j < end; j++ {
            if s[j] == s[end] {
                // Found it
                dupIndex = j
                break
            }
        }
        // Dup found inside window
        if dupIndex >= 0 {
            if (end - start - dupIndex) > (endMax - startMax) {
              startMax = dupIndex
              endMax = end
            }
            start = dupIndex + 1
        } else {
            // no dup in window
            if (end - start + 1) > (endMax - startMax) {
              startMax = start
              endMax = end + 1
            }
        }
    }
    return endMax - startMax
}

71st percentile CPU, 96th percentile memory (memory makes sense, we barely use anything other than indexes into the string). However, the big winner is dynamic programming:

func lengthOfLongestSubstring(s string) int {
    rv := 0
    // array as hash for extra speed and no dynamic memory.
    // create it, mark it all invalid.
    last := [256]int{}
    for i := range last {
        last[i] = -1
    }

    // define min inline.
    min := func(a, b int) int {
        if a < b {
            return a
        }
        return b
    }

    size := 0
    for i, b := range []byte(s) {
        size = min(size + 1, i - last[b])
        last[b] = i
        rv = rv + size - min(rv, size) // max(rv, size)
    }

    return rv
    
}

This is black magic and I do not understand it yet!

Comments

Leave a Reply