Author: Joe McMahon

  • Making change via dynamic programming: easy when you see it

    I failed to see that this is an induction problem, and I didn’t think of an elegant way to represent “we have no solution” for a given amount.

    The problem description is “given a set of coins of n denominations, figure out the minimum number of coins needed to make change for a given amount. If you can’t make change, return -1.”

    So here’s the induction:

    • Start by assuming it’s impossible to make change for every amount up to the target: initialize the coin count for each amount to something greater than any possible answer. (For this problem 2**32 works, and I’ll refer to that value as “impossible”.)
    • For an amount of 0, it takes 0 coins to make change.
    • For each amount after 0, the number of coins needed to make change is 1 (the current coin) plus however many coins it took to make change for amount - current coin value.
    • If we couldn’t make change for amount - current coin value (value for that is impossible), then try the next coin.
    • If we run out of coins, we can’t make change for that amount, leave it set to impossible, and we move on to the next amount up.
    • When we reach the final amount, we’ve either found a chain of sub-solutions that reach 0 (change is made with n coins, the sum of all the sub-solutions) or it’s impossible.

    Because this builds a table of solutions from the bottom up, we’re always guaranteed that the solution for any amount < the current amount has already been solved, and we always try all the coins for every amount, so we’re guaranteed to find a solution if there is one, even if the coins are not in sorted order.

    I chose to use int64s and set the FAILURE sentinel to a value the problem description guarantees is larger than any possible answer. You could do it with a map[int]int, checking for “no entry”, but using FAILURE lets the lookback always work with just a less-than test, so it’s probably faster. I’ve updated the variable names to make it clearer how this works.
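    For comparison, here’s what that map[int]int alternative might look like; this is my sketch of the idea, not the code I submitted:

    ```go
    package main

    import "fmt"

    // Sketch of the map-based alternative: absence from the map means
    // "no solution known", so there's no FAILURE sentinel, at the cost
    // of a comma-ok lookup instead of a plain less-than test.
    func coinChangeMap(coins []int, amount int) int {
        changeFor := map[int]int{0: 0} // change for 0 is 0 coins
        for current := 1; current <= amount; current++ {
            for _, coin := range coins {
                if coin > current {
                    continue
                }
                prev, ok := changeFor[current-coin]
                if !ok {
                    continue // subproblem was insoluble; try the next coin
                }
                if best, seen := changeFor[current]; !seen || prev+1 < best {
                    changeFor[current] = prev + 1
                }
            }
        }
        if best, ok := changeFor[amount]; ok {
            return best
        }
        return -1
    }

    func main() {
        fmt.Println(coinChangeMap([]int{1, 2, 5}, 11)) // 5+5+1 = 3 coins
        fmt.Println(coinChangeMap([]int{2}, 3))        // impossible: -1
    }
    ```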

    
    func coinChange(coins []int, amount int) int {
        const FAILURE int64 = 5000000000
    
        // Assume everything is impossible...
        changeFor := make([]int64, amount+1)
        for i := 0; i < len(changeFor); i++ {
            // effectively Inf
            changeFor[i] = FAILURE
        }
    
        // ...except 0. Change for 0 is 0 (it takes no coins to make change
        // for 0.)
        changeFor[0] = 0
    
        // For every amount up to the target amount (i.e., solve every
        // subproblem):
        for currentAmount := 1; currentAmount <= amount; currentAmount++ {
    
            // Try every coin we have to solve the problem.
            for _, coin := range(coins) {
    
                // If this coin might work...
                if coin <= currentAmount {
    
                    // Get the answer for the subproblem.
                    lookBack := changeFor[currentAmount - coin]
                    
                     // This if statement is doing a lot of elegant
                     // heavy lifting.
                     //
                     // If the subproblem had no solution, the lookback is
                     // FAILURE, and FAILURE + 1 is greater than the current
                     // value (FAILURE or less). This guarantees that we don't
                     // overwrite any solution that was already found, and that
                     // we leave it as FAILURE for this coin!
                     //
                     // If the subproblem DID have a solution, and it's better
                     // (less) than what we have (some number of coins or
                     // FAILURE), then we change the current solution for this
                     // amount to the new count.
                     //
                     // This is why the order of the coins in the coin array
                     // doesn't matter: we will always try ALL the coins for
                     // a given amount, and if a different coin has a better
                     // (smaller) solution, we'll change the solution for the
                     // current amount for the better one!
                     //
                     // This means we're guaranteed to have the best solution
                     // for every amount < the current amount (including 0),
                     // so composing a solution for the current amount from
                     // previous solutions always gets the best one for this
                     // amount.
                     //
                     // The fact that it takes a comment THIS LONG to accurately
                     // explain TWO LINES OF CODE...!
                    if lookBack + 1 < changeFor[currentAmount] {
                        // Previous number of coins solving this, plus one more
                        // for the current coin solving this amount better than
                        // we previously did. We never get here if we don't find
                        // a solution for any of the coins, leaving this set to
                        // FAILURE.
                        changeFor[currentAmount] = lookBack + 1
                    }
                }
            }
        }
        // If we never found a solution, the caller wants -1.
        if changeFor[amount] == FAILURE {
            return -1
        }
        // Retrieve the solution from the solution cache. If we were
        // going to use this a lot, we'd save the cache.
        return int(changeFor[amount])
    }
    

    I hope this commented code shows how disgustingly elegant this is. I didn’t find this solution, but I feel like working through why it works has helped me get a better grasp on dynamic programming.

    Lessons learned:

    • What can you solve by inspection?
    • How do you mark things as “not solved” or “insoluble”?
    • Don’t be afraid to solve a lot of seemingly unneeded subproblems if you can use them to break down the problem you have.

    I initially tried a greedy solution for this problem, and that is actually optimal if the set of coins you’re given can always make up any number. (E.g., US coins are 1 cent, 5 cents, 10 cents, 25 cents, and 50 cents; you can use those to make any number of cents from 1 to 100.) If the set of coins doesn’t guarantee this, then you have to use the dynamic programming approach to solve it.
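    For the record, the greedy attempt looks roughly like this (my reconstruction, not the code I submitted); the second call shows it going wrong on a non-canonical coin set, where 3+3 beats greedy’s 4+1+1:

    ```go
    package main

    import (
        "fmt"
        "sort"
    )

    // Greedy change-making: repeatedly take as many of the largest
    // remaining coin as fit. Optimal for canonical systems like US
    // coins, but NOT in general.
    func greedyChange(coins []int, amount int) int {
        sorted := append([]int{}, coins...)
        sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
        count := 0
        for _, coin := range sorted {
            count += amount / coin
            amount %= coin
        }
        if amount != 0 {
            return -1 // greedy couldn't finish (or change is impossible)
        }
        return count
    }

    func main() {
        fmt.Println(greedyChange([]int{25, 10, 5, 1}, 63)) // 25+25+10+1+1+1 = 6 coins
        fmt.Println(greedyChange([]int{4, 3, 1}, 6))       // greedy says 3; optimum is 2
    }
    ```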

    The second, recursive try might have worked if I’d cached solutions found along the way, effectively turning the iterative loop here into an equivalent memoized recursion, but I didn’t, and it DNF’ed.
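    A sketch of what that cached recursion could have been; the memo plays the same role as the FAILURE table above, with -1 cached to mean “known impossible”:

    ```go
    package main

    import "fmt"

    // Top-down coin change with a memo: each amount is solved once,
    // then looked up. This is my illustration of the approach, not
    // the attempt that DNF'ed.
    func coinChangeMemo(coins []int, amount int) int {
        memo := map[int]int{0: 0} // base case: 0 coins for amount 0
        var solve func(int) int
        solve = func(amt int) int {
            if got, ok := memo[amt]; ok {
                return got
            }
            best := -1
            for _, coin := range coins {
                if coin > amt {
                    continue
                }
                // sub < 0 means the subproblem is insoluble.
                if sub := solve(amt - coin); sub >= 0 && (best < 0 || sub+1 < best) {
                    best = sub + 1
                }
            }
            memo[amt] = best // cache failures too, or this blows up
            return best
        }
        return solve(amount)
    }

    func main() {
        fmt.Println(coinChangeMemo([]int{1, 2, 5}, 11)) // 3
        fmt.Println(coinChangeMemo([]int{2}, 3))        // -1
    }
    ```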

    Right, stats. 79th percentile on runtime, 80th on memory. Pretty good.

  • leetcode month review

    Summary: I am going to have to do a lot more practice, and brush up on some things, especially dynamic programming, if I’m going to do well on medium to hard questions.

    The good:

    • The practice in the past few weeks has got the idiosyncrasies of Go slices and arrays mostly under my belt. Stuff I was struggling with in the first few exercises is now pretty reflexive. I’m reasonably certain that I can manipulate slices as I need to.
    • I really do remember pointer algorithms from my systems programming days. I was, in some cases, finding it easier to implement algorithms using linked lists for variable linear storage than slices. (Slices have improved a lot though, so I’m not reaching for linked lists first.)
    • I’m doing okay with recursion now.
    • Tree algorithms are not as bad. The medium level-order problem was easy for me to conceptualize; I just had trouble getting the slice of slices to work.
    • If the code can use a queue-based algorithm, I have no problem writing it. 01 matrix was a breeze once I committed to trying the queue, and the same kind of solution worked well for the “clone a graph” problem. I can also see how I could have used it for flood fill.
    • I knocked out the k points closest to the origin problem well under the time limit with a very fast solution; it was pure “how do I fill and then sort a custom data structure?”.
    • sort.Ints, sort.Slice, and sort.Search are now my very good friends and help immensely.
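    For my own reference, all three sort helpers in one toy demo (nothing from the actual problems here, just the calls):

    ```go
    package main

    import (
        "fmt"
        "sort"
    )

    // Toy demo of sort.Ints, sort.Slice, and sort.Search.
    func sortDemo() ([]int, []string, int) {
        nums := []int{42, 7, 19, 7, 3}
        // sort.Ints: ascending, in place.
        sort.Ints(nums)

        // sort.Slice: sort anything with a custom less function.
        words := []string{"leetcode", "go", "dyn"}
        sort.Slice(words, func(i, j int) bool { return len(words[i]) < len(words[j]) })

        // sort.Search: binary search on sorted data for the first
        // index where the predicate becomes true.
        idx := sort.Search(len(nums), func(i int) bool { return nums[i] >= 19 })
        return nums, words, idx
    }

    func main() {
        nums, words, idx := sortDemo()
        fmt.Println(nums)  // [3 7 7 19 42]
        fmt.Println(words) // [go dyn leetcode]
        fmt.Println(idx)   // 3
    }
    ```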

    The bad:

    • I am pretty bad at numerical algorithms, like 2sum/3sum, or the inserted interval. I am not intuiting the algorithms at all well.
    • I am still not properly catching myself falling down rabbit holes or noticing I’m not on the right track. I need some heuristics. Maybe “if the if statements are nested more than 2 deep, you may be off course”.
    • If I get a dynamic programming problem, I’m sunk. I might be able to code something if it ends up having a Fibonacci relation, but anything more complex is currently beyond me. The DP solution to the length of the longest substring without repeats is just completely black magic. It works, but I cannot see how it possibly could.

    leetcode feels like programming’s highs and lows all condensed into a single highly compressed experience: sometimes it’s absolutely crystal clear, and you just need to get it all typed in (all of the copying of the graph into an adjacency array and re-cloning the nodes worked the first time; the only thing that didn’t was the link builder, which I had left one line out of).

    So sometimes I’m enjoying it, and sometimes I am very much not. I sorely miss a real debugger, but I’m getting along with fmt.Println. I would give a lot to be able to build things the way I prefer, bottom-up with tests, but leetcode is not gonna work that way.

  • leetcode day 30: RPN

    Take an array of strings in RPN and evaluate it.

    This is all about managing the stack. I started trying to code a Stack class, but dumped that for simply using append and slicing to manage the stack. Performance was okay: 73% runtime, 41% memory; probably good enough for a pass.

    func evalRPN(tokens []string) int {
        stack := []int{}
        for _, token := range tokens {
            switch token {
            case "+":
                op2 := stack[len(stack)-1]
                op1 := stack[len(stack)-2]
                stack = stack[:len(stack)-1]
                stack[len(stack)-1] = op1 + op2

            case "-":
                op2 := stack[len(stack)-1]
                op1 := stack[len(stack)-2]
                stack = stack[:len(stack)-1]
                stack[len(stack)-1] = op1 - op2

            case "/":
                op2 := stack[len(stack)-1]
                op1 := stack[len(stack)-2]
                stack = stack[:len(stack)-1]
                stack[len(stack)-1] = op1 / op2

            case "*":
                op2 := stack[len(stack)-1]
                op1 := stack[len(stack)-2]
                stack = stack[:len(stack)-1]
                stack[len(stack)-1] = op1 * op2

            default:
                i, err := strconv.Atoi(token)
                if err != nil {
                    panic(fmt.Sprintf("unparseable token %s", token))
                }
                stack = append(stack, i)
            }
        }
        return stack[len(stack)-1]
    }

    My guess is that if I used a real stack pointer to manage the stack, it’d be faster.

    func evalRPN(tokens []string) int {
        stack := make([]int, 10000)
        sp := -1
        for _, token := range tokens {
            switch token {
            case "+":
                op2 := stack[sp]
                op1 := stack[sp-1]
                sp--
                stack[sp] = op1 + op2

            case "-":
                op2 := stack[sp]
                op1 := stack[sp-1]
                sp--
                stack[sp] = op1 - op2

            case "/":
                op2 := stack[sp]
                op1 := stack[sp-1]
                sp--
                stack[sp] = op1 / op2

            case "*":
                op2 := stack[sp]
                op1 := stack[sp-1]
                sp--
                stack[sp] = op1 * op2

            default:
                i, err := strconv.Atoi(token)
                if err != nil {
                    panic(fmt.Sprintf("unparseable token %s", token))
                }
                sp++
                stack[sp] = i
            }
        }
        return stack[sp]
    }

    And no, it is not! Exact same runtime percentile, and used a little less memory (35th percentile).

    A couple of quite elegant ideas in the solutions; I really liked this one:

    func evalRPN(tokens []string) int {
        stack := make([]int, 10000)
        sp := -1
        ops := map[string]func(int,int) int{
            "+": func(i, j int) int { return i + j},
            "-": func(i, j int) int { return i - j},
            "/": func(i, j int) int { return i / j},
            "*": func(i, j int) int { return i * j},
        }
        for _,token := range(tokens) {
            i, err := strconv.Atoi(token)
            if err != nil {
                // operator
                stack[sp-1] = ops[token](stack[sp-1], stack[sp])
                sp--
            } else {
                sp++
                stack[sp] = i
            }
        }
        return stack[sp]
    }

    We get rid of the switch altogether using the fact that the tokens are either valid numbers or operators, and we make the operators into anonymous functions in a map. If the token fails to convert to a number, we look it up and call the appropriate function. This is far more fragile, though: any bad token comes back out of the ops map as a nil function, and calling that panics…and we only gain 3% in the runtime. I’ll settle for the first one as good enough for a pass.
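    If I wanted to keep the map trick but lose the fragility, a comma-ok lookup on the ops map would do it, so a junk token fails loudly instead of calling a nil function. My tweak, not from the posted solutions:

    ```go
    package main

    import (
        "fmt"
        "strconv"
    )

    // Map-of-functions RPN evaluator with a comma-ok lookup: a token
    // is an operator only if it's actually in the ops map; anything
    // else must parse as a number or we panic with a clear message.
    func evalRPN(tokens []string) int {
        stack := make([]int, 0, len(tokens))
        ops := map[string]func(int, int) int{
            "+": func(i, j int) int { return i + j },
            "-": func(i, j int) int { return i - j },
            "/": func(i, j int) int { return i / j },
            "*": func(i, j int) int { return i * j },
        }
        for _, token := range tokens {
            if op, ok := ops[token]; ok {
                stack[len(stack)-2] = op(stack[len(stack)-2], stack[len(stack)-1])
                stack = stack[:len(stack)-1]
            } else {
                n, err := strconv.Atoi(token)
                if err != nil {
                    panic(fmt.Sprintf("unparseable token %q", token))
                }
                stack = append(stack, n)
            }
        }
        return stack[len(stack)-1]
    }

    func main() {
        fmt.Println(evalRPN([]string{"2", "1", "+", "3", "*"})) // (2+1)*3 = 9
    }
    ```

    Note that negative literals like "-3" still work: they aren’t the "-" key, so they fall through to Atoi.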

  • leetcode day 30: clone a graph

    Given a graph, clone it and return it with all links intact.

    I chose to use a caching crawl of the graph, and then reconstruct the new graph from the cache.

    /**
     * Definition for a Node.
     * type Node struct {
     *     Val int
     *     Neighbors []*Node
     * }
     */
    
    
    func cloneGraph(node *Node) *Node {
        // need to crawl the graph and make sure we've visited all
        // the nodes, saving the connections at each node. Once we
        // have all the nodes and connections, we can rebuild the
        // graph.
    
        // start at the first node. add it to the queue.
        // while the queue is not empty:
        //  - pull the next item
        //  - if this node is not in the map, add it, and add all the
        //    values from the nodes it links to as the adjacency list.
        //  - for each node in the adjacency list:
        //      if the node is not in the map, add it to the queue
        //         to be explored
        if node == nil {
            return nil
        }
    
        rootNodeVal := node.Val  // will be used to find and return new root
    
        queue := []*Node{node}
        // adjacency list for scanned nodes:
        //   key is node pointer
        //   value is list of adjacent nodes
        adj := map[*Node][]*Node{}
    
        for len(queue) != 0 {
            item := queue[0]
            if len(queue) == 1 {
                queue = []*Node{} // empty it
            } else {
                queue = queue[1:]
            }
            for _,neighbor := range(item.Neighbors) {
                if _, ok := adj[neighbor]; !ok {
                    // not visited, add to queue
                    queue = append(queue, neighbor)
                }
            }
            adj[item] = item.Neighbors
        }
    
        // all nodes crawled. each entry in the map is a node address ->
        // list of neighbors.
        
        type task struct {
            from int
            to int
        }
    
        valToNode := map[int]*Node{}
        linkTasks := []task{}
        for nodeptr, neighbors := range(adj) {
            // we have a pointer to the old node, and all the things it points to.
            // - allocate a new node and record it.
            n := &Node{
                Val: nodeptr.Val,
                Neighbors: []*Node{},
            }
            valToNode[nodeptr.Val] = n
            for _,linkedNode := range(neighbors) {
                if target, ok := valToNode[linkedNode.Val]; ok {
                    // Node has been reallocated, link it
                    n.Neighbors = append(n.Neighbors, target)
                } else {
                    // not allocated yet, add link task for it
                    linkTasks = append(linkTasks, task{from:n.Val, to:linkedNode.Val})
                }
            }
        }
        fmt.Println(valToNode)
        // At this point we have allocated all the nodes, but have links
        // left to handle in the queue. for each item in the queue:
        //    - look up the from and to nodes in the map
        //    - append a link to the 'to' node to the 'from' node's link list.
        // when the queue is empty, the links have been built, and we're done.
        for _,task := range(linkTasks) {
            fmt.Println(task.from, task.to)
            fmt.Println(valToNode[task.from])
            fmt.Println(valToNode[task.to])
            fromNode := valToNode[task.from]
            toNode := valToNode[task.to]
            fromNode.Neighbors = append(fromNode.Neighbors, toNode)
        }
        // return the root node.
        return valToNode[rootNodeVal]
    }

    Works, but slow: 32nd percentile runtime, 8th percentile memory. So there must be some tricks here. (Betting on arrays instead of maps.) Yes, I’ve overengineered it: there are 100 nodes at max, and node.Val will be 1-100. So I can allocate an array instead of a map for my valToNode map. Let’s try that first and see how much speed we get…absolutely none! Exactly the same performance!

    Well. What else can I do to speed this up? All right, let’s make the initial adjacency computation just store the integer node values. Since these are unique per node, and we have 100 or less, we can make that [][]int and see what happens.

    Had a little trouble that I tracked down to the queue drop getting lost in the adjacency computation. Once that was back in, it worked fine, and now is perfectly acceptable: 78th percentile CPU. Memory still sucky, but I’ll take it.
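    The array-for-map swap boils down to indexing on Val instead of hashing. Here’s a compressed sketch of the idea using a recursive clone rather than my two-pass crawl (my illustration, leaning on the problem’s guarantee that Val is 1-100 and unique):

    ```go
    package main

    import "fmt"

    // Definition matching leetcode's Node.
    type Node struct {
        Val       int
        Neighbors []*Node
    }

    // cloneGraph sketch: a 101-entry array stands in for the
    // valToNode map, and a nil entry means "not cloned yet".
    func cloneGraph(node *Node) *Node {
        if node == nil {
            return nil
        }
        var clones [101]*Node
        var clone func(n *Node) *Node
        clone = func(n *Node) *Node {
            if c := clones[n.Val]; c != nil {
                return c
            }
            c := &Node{Val: n.Val}
            clones[n.Val] = c // record BEFORE recursing, to break cycles
            for _, nb := range n.Neighbors {
                c.Neighbors = append(c.Neighbors, clone(nb))
            }
            return c
        }
        return clone(node)
    }

    func main() {
        // Two-node cycle: 1 <-> 2.
        a := &Node{Val: 1}
        b := &Node{Val: 2}
        a.Neighbors = []*Node{b}
        b.Neighbors = []*Node{a}
        c := cloneGraph(a)
        fmt.Println(c.Val, c.Neighbors[0].Val, c != a) // 1 2 true
    }
    ```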

  • leetcode day 30 – I am not friends with slices: level-order tree

    Visit the nodes of a binary tree in level order: all nodes at the root level, left-to-right, all nodes at level 1, left-to-right, etc.

    I did try to solve this with slices and boy did I have trouble. I eventually stopped trying to use them and went to a list-oriented solution with a double closure.

    /**
     * Definition for a binary tree node.
     * type TreeNode struct {
     *     Val int
     *     Left *TreeNode
     *     Right *TreeNode
     * }
     */
    
    type MyList struct {
        head *MyNode
        tail *MyNode
    }
    
    type MyNode struct {
        Val int
        Next *MyNode
    }
    
    func levelOrder(root *TreeNode) [][]int {
        // Ideally, I want to end up with an array
        // of arrays of nodes, added in left-to-right order.
        getStash, helper := createClosures()
        helper(root, 0)
        stash := getStash()
    
        out := [][]int{}
        for i := 0; i < len(stash); i++ {
            out = append(out, []int{})
            thisList := stash[i]
            for scan := thisList.head; scan != nil; scan = scan.Next {
                out[i] = append(out[i], scan.Val)
            }
        }
        return out
    }
    
    func createClosures() (func() []MyList, func(*TreeNode, int)) {
        stash := []MyList{}
    
        var helper func(*TreeNode, int)
        helper = func(root *TreeNode, level int) {
                if root == nil {
                    // Nothing to do at this level
                    return
                }
                // Current node gets stashed at the end of the list for this level.
                // stash[level] is the list to append to.
                // Add new node to list at this level
                if len(stash) <= level {
                    stash = append(stash, MyList{})
                }
                
                n := &MyNode{Val: root.Val}
                if stash[level].head == nil {
                    stash[level].head = n
                    stash[level].tail = n
                } else {
                    stash[level].tail.Next = n
                    stash[level].tail = n
                }
     
                // add the left and right subtrees at the next level down
                helper(root.Left, level + 1)
                helper(root.Right, level + 1)
            }
        return func() []MyList { return stash }, helper
    }

    This solves the problem, but it’s poor on memory (8th percentile) and runtime (26%). On the other hand, the list manipulation worked correctly the first time. Let’s try to speed it up. First let’s try to remove the lists.

    /**
     * Definition for a binary tree node.
     * type TreeNode struct {
     *     Val int
     *     Left *TreeNode
     *     Right *TreeNode
     * }
     */
    
    func levelOrder(root *TreeNode) [][]int {
        // Ideally, I want to end up with an array
        // of arrays of nodes, added in left-to-right order.
        getStash, helper := createClosures()
        helper(root, 0)
        return getStash()
    }
    
    func createClosures() (func() [][]int, func(*TreeNode, int)) {
        // This will be our output structure. As we go down the
        // levels, we add more []int slices to hold the next level
        // of nodes. Because we work left to right, we always add
        // new nodes at a level to the correct slice, and they always
        // are added to the right end of the slice, giving us the
        // output data structure we want.
        stash := [][]int{}
    
        // We have to declare a named variable to be closed over for
        // a Go anonymous function to be able to call itself.
        var helper func(*TreeNode, int)
    
        // The real meat of the process.
        helper = func(root *TreeNode, level int) {
                if root == nil {
                    // Nothing to do at this level
                    return
                }
                // Current node gets stashed at the end of the slice
                //  for this level.
                // stash[level] is the slice to append to.
                if len(stash) <= level {
                    // We have never accessed this level. Add a slice
                    // so appends will work. You CANNOT just assign to
                    // stash[level], because it doesn't exist yet and
                    // you'll get an out of bounds error.
                    stash = append(stash, []int{})
                }
                // Add this node's value to the end of the list at this
                // level.
                stash[level] = append(stash[level], root.Val)
    
                // add the left and right subtrees at the next level down.
                // Because we're tracking the level, we always append to the
                // right slice in the stash, which lives outside the recursion.
                helper(root.Left, level + 1)
                helper(root.Right, level + 1)
            }
        // The two functions the main function needs to call. Stash will be
        // exactly the needed data structure when helper finishes.
        return func() [][]int { return stash }, helper
    }

    That’s a significant improvement; 71st percentile runtime, 24th percentile memory. The problem I had the first time around was not properly managing the slices. To add an element to a slice after the current end, you must use append(), and I needed to do this for each level as I got there. The code to build the output structure from the lists was the learning experience I needed to get that part right in the rewrite.
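    That append-a-level-before-you-index pattern, isolated into a toy helper (my example, nothing problem-specific):

    ```go
    package main

    import "fmt"

    // addAtLevel appends val to stash[level], creating any missing
    // levels with append first. Assigning stash[level] before the
    // level exists panics with an index-out-of-range error.
    func addAtLevel(stash [][]int, level, val int) [][]int {
        for len(stash) <= level {
            stash = append(stash, []int{})
        }
        stash[level] = append(stash[level], val)
        return stash
    }

    func main() {
        stash := [][]int{}
        stash = addAtLevel(stash, 0, 3)
        stash = addAtLevel(stash, 1, 9)
        stash = addAtLevel(stash, 1, 20)
        fmt.Println(stash) // [[3] [9 20]]
    }
    ```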

    Still definitely more comfortable manipulating data structures. Let’s try removing the closure and see how that goes; we’ll just let scoping make the stash visible to the helper.

    /**
     * Definition for a binary tree node.
     * type TreeNode struct {
     *     Val int
     *     Left *TreeNode
     *     Right *TreeNode
     * }
     */
    
    func levelOrder(root *TreeNode) [][]int {
        // Ideally, I want to end up with an array
        // of arrays of nodes, added in left-to-right order.
        stash := [][]int{}
    
        var helper func(*TreeNode, int)
        helper = func(root *TreeNode, level int) {
                if root == nil {
                    // Nothing to do at this level
                    return
                }
                // Current node gets stashed at the end of the list for this level.
                // stash[level] is the slice to append to.
                // Add new node to list at this level
                if len(stash) <= level {
                    stash = append(stash, []int{})
                }
                stash[level] = append(stash[level], root.Val)
    
                // add the left and right subtrees at the next level down
                helper(root.Left, level + 1)
                helper(root.Right, level + 1)
            }
    
        helper(root, 0)
        return stash
    }

    Interestingly, this is actually worse according to leetcode; 68th percentile now (3ms instead of 1ms). Honestly I think we’re in the noise at this point. No change in the memory usage.

    I think if I’d gotten this in an interview I’d call the second or third solution good, and the first marginal. The rewrite time from the list version to the working slice version was only a couple minutes, and I never felt like I didn’t grasp the necessities of the problem.

  • leetcode day 30 – definitely struggling

    3 sum: given a list of numbers, find all distinct triples that sum to 0. The hint was to transform this to a 2 sum problem, and look for the 2 sum that solves the problem for the current index.

    My initial try spent a lot of effort trying to track the used values in maps, and I did not learn my lesson: when the solution feels too hard, you’re probably doing it wrong.

    After I ran out of time, I started looking at solutions; I had started trying to sort the answers and see if there were duplicates, and it sort of worked, but failed some test cases in a non-obvious way.

    The real attack for this kind of problem starts out by sorting the numbers first. You can then start depending on math identities to help winnow the solutions, eliminate duplicates, and shrink the search space all at once.

    Here’s the solution that was clearest, with my comments and variable name adjustments to make the code clearer yet.

    func threeSum(nums []int) [][]int {
        // Output slice.
        out := [][]int{}
        // Sort the numbers ascending first.
        sort.Ints(nums)
    
        var second, third, sum int
    
        // We're going to search for three terms. We're going to just move
        // left to right for the first one.
        for first, target := range nums {
            // Skip duplicate terms at the start. We always look at nums[0]
            // and start deduping after that.
            if first > 0 && target == nums[first-1] {
                // Same as previous so any solutions using this would be dups.
                // Skip to the next item.
                continue
            }
            // Start with second item just after the first and third at the end.
            second, third = first + 1, len(nums) - 1
            // As long as the second pointer hasn't hit the third, keep looking.
            for second < third {
                // calculate the sum
                sum = target + nums[second] + nums[third]
                if sum == 0 {
                    // This is a solution, save it.
                    out = append(out, []int{target, nums[second], nums[third]})
                    // Skip duplicates of the second term. We increment once to
                    // move to the item after the one we're deduping.
                    second++
                    for second < third && nums[second] == nums[second-1] {
                        second++
                    }
                } else if sum < 0 {
                    // second number is too small to work; third is the largest
                    // as-yet-unused value, so the second has to be larger to
                    // move the sum closer to zero.
                    second++
                } else { // sum is > 0
                    // third number is too large to work. The second number is
                    // already the smallest possible for the next solution, so
                    // we have to reduce the third to move closer to zero.
                    third--
                }
            }
        }
        return out
    }
    
    

    This is good clever as opposed to bad clever, as it’s clear how this actually works. Notice how we keep shrinking the search space as we move the first pointer upward, and how the duplicate elimination at the first and second pointers speeds this up even further. We’re still O(N^2), but we’re discarding as much possible duplicate work as we can.

    This is the kind of problem that I find quite difficult; I am not very friendly with numbers, despite a lot of programming over the years. I have spent more time chasing control block chains and securing inputs than I have shuffling numbers, and it shows: I have a lot less trouble with linked lists than these numerical problems.

    Not feeling great about my chances to pass one of these live at the moment. Maybe it’s just “more practice needed”. I hope it is, anyway.

  • leetcode day 29 – longest substring

    First solution worked, but was slow and used a lot of memory (5th percentile for both):

    func lengthOfLongestSubstring(s string) int {
        if s == "" {
            return 0
        }
        
        // This looks like a sliding window.
        longest := []byte{}
        current := []byte{}
        m := make(map[byte]int)
        bs := []byte(s)
        
        i := 0
        scan := 0
    
        for {
            // scan from current point.
            m[bs[scan]]++
            if m[bs[scan]] < 2 {
                // still no repeat, now add it
                current = append(current, bs[scan])
                scan++
                if scan == len(bs) {
                    // no more chars to add, current finished
                    // and string finished
                    break
                }
            } else {
                // repeat; restarting scan at next point
                // for a new possible substring, see if 
                // the new substring is the new longest.
                if len(current) > len(longest) {
                    longest = current
                }
                // in either case, we have to try for a new
                // non-repeater in the next window
                i++
                if i == len(bs) {
                    // no new window left, finish up
                    break
                }
                scan = i
                current = []byte{}
                m = make(map[byte]int)
            }
        }
        if len(current) > len(longest) {
            longest = current
        }
        return len(longest)
    }

    So this is the right track — the map to manage the characters we’ve seen is definitely right, but there has to be a trick here — and that is to use the index of the character as the hash value, and use that to help us manage the sliding window.

    func lengthOfLongestSubstring(s string) int {
        if s == "" {
            return 0
        }

        m := make(map[byte]int)
        bs := []byte(s)

        start := 0
        maxLen := 0
        currentLen := 0

        // Iterate through the string and slide the window each
        // time you find a duplicate. end is the end of the current
        // substring.
        for end := 0; end < len(bs); end++ {
            // Check if the current character is a duplicate.
            if dupIndex, ok := m[bs[end]]; ok {
                // New max?
                if maxLen < end-start {
                    maxLen = end - start
                }
                // Forget all the saved indexes; we rescan from
                // just past the dup.
                m = make(map[byte]int)

                // Slide the window to just after the dup.
                start = dupIndex + 1
                currentLen = 0
                end = start
            }
            // Add the current character to the map.
            m[bs[end]] = end
            currentLen++
        }
        if currentLen > maxLen {
            maxLen = currentLen
        }
        return maxLen
    }

    That’s a little better, but it’s still pretty slow: 13th percentile runtime, 20th percentile memory. (If you use strings instead of []byte, the percentiles drop to 5 and 5. No better than the dumb solution!) Looking at the solutions, it is possible to do the whole thing with indexes…but I definitely do not grok it.

    func lengthOfLongestSubstring(s string) int {
        if s == "" {
            return 0
        }
        if len(s) == 1 {
            return 1
        }
    
        // brackets current longest found
        startMax := 0
        endMax := 0
    
        // start of window
        start := 0
        dupIndex := 0
    
         // Scan end pointer of window forward
        for end := 1; end < len(s); end++ {
            // The "I didn't find it" value
            dupIndex = -1
            // looking for duplicate inside window
            for j := start; j < end; j++ {
                if s[j] == s[end] {
                    // Found it
                    dupIndex = j
                    break
                }
            }
            // Dup found inside window
            if dupIndex >= 0 {
            if (end - dupIndex) > (endMax - startMax) {
                  startMax = dupIndex
                  endMax = end
                }
                start = dupIndex + 1
            } else {
                // no dup in window
                if (end - start + 1) > (endMax - startMax) {
                  startMax = start
                  endMax = end + 1
                }
            }
        }
        return endMax - startMax
    }

    71st percentile CPU, 96th percentile memory (memory makes sense, we barely use anything other than indexes into the string). However, the big winner is dynamic programming:

    func lengthOfLongestSubstring(s string) int {
        rv := 0
        // array as hash for extra speed and no dynamic memory.
        // create it, mark it all invalid.
        last := [256]int{}
        for i := range last {
            last[i] = -1
        }
    
        // define min inline.
        min := func(a, b int) int {
            if a < b {
                return a
            }
            return b
        }
    
        size := 0
        for i, b := range []byte(s) {
            size = min(size + 1, i - last[b])
            last[b] = i
            rv = rv + size - min(rv, size) // max(rv, size)
        }
    
        return rv
    }

    This is black magic and I do not understand it yet!
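    My best attempt so far at dispelling the magic: `size` is the length of the longest duplicate-free substring ending at the current index, and `rv + size - min(rv, size)` is just a branch-free `max(rv, size)`. Here’s an annotated rewrite with the tricks spelled out as plain branches (the name `lengthOfLongestDP` is mine, so it can sit next to the original):

```go
// lengthOfLongestDP: size is the length of the longest duplicate-free
// substring ending at index i; rv is the best size seen so far.
func lengthOfLongestDP(s string) int {
    rv := 0

    // last[b] is the index of the most recent occurrence of byte b,
    // or -1 if b hasn't been seen yet.
    last := [256]int{}
    for i := range last {
        last[i] = -1
    }

    size := 0
    for i, b := range []byte(s) {
        if i-last[b] < size+1 {
            // b occurred inside the current window: clip the window
            // so it starts just past the previous occurrence of b.
            size = i - last[b]
        } else {
            // b is new to the window: the window just grows by one.
            size = size + 1
        }
        last[b] = i
        if size > rv { // the branch-free version: rv + size - min(rv, size)
            rv = size
        }
    }
    return rv
}
```

    Written this way, it’s ordinary sliding-window bookkeeping; the original just compresses the clip into `min(size+1, i-last[b])` and the max into arithmetic.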

  • leetcode day 29 – seriously? k closest points

    I was going to go to bed after the 01 matrix, but I couldn’t resist peeking at the next one.

    Given a list of [x,y] coordinates, return the k points closest to the origin, rated “medium”.

    I read it over and thought, you just need to compute the distances for each point, sort by distance, and then return the first k items. Am I reading this wrong? Is it harder than this? Am I missing something?

    So I read it over again. No, no weird input or output format. No tricky limits or ranges.

    Shrug, okay.

    // Convenience structure to make it stupid easy.
    type point struct {
        x int
        y int
        dist float64
    }
    
    func kClosest(points [][]int, k int) [][]int {
        // this seems like I'd sort the points by distance and return the
        // slice containing the right number of items...and yeah, that was it.
    
        // Create a slice to put the points and distances in.
        distancedPoints := []point{}
    
        for i := 0; i < len(points); i++ {
            // stash coordinates
            p := point{
                x: points[i][0],
                y: points[i][1],
            }
            // Go's sqrt wants float64, so convert the coords to float64
            // and do the calculation that's even spelled out in the description.
            X := float64(p.x)
            Y := float64(p.y)
            p.dist = math.Sqrt(X * X + Y * Y)
            // All saved together.
            distancedPoints = append(distancedPoints, p)
        }
    
        // sorted in increasing distance...
        sort.Slice(distancedPoints, 
            func(i, j int) bool {
                return distancedPoints[i].dist < distancedPoints[j].dist
            })
    
        // Take the first k points. We create and return the new slice
        // because we have to get rid of the distance.
        out := [][]int{}
        for i := 0; i < k; i++ {
            out = append(out, []int{distancedPoints[i].x, distancedPoints[i].y})
        }
    
        // That was it. Really.
        return out
    }

    I really don’t get why this is a medium. I did have to check the argument type on math.Sqrt, and double-check the sort.Slice syntax, but in terms of what to do, this is not a medium.

    97th percentile on speed, 28th on memory. I suppose you could sort parallel arrays for memory savings, but why bother? It just makes the code more complex without adding a lot of value.

  • leetcode day 29: 01 matrix

    Transform an arbitrarily-shaped rectangular matrix of zeroes and ones into a map of distance from the nearest zero, using a NSEW neighborhood.

    My first impulse was, “oh, find all the zeroes, and then work out from each one to fix the ones”, followed by “that sounds hard to do, I’ll try the recurrence relation and caching”. The recurrence relation and caching simply does not work because the problem space is too big. Even with caching the intermediate results I still ran out of memory because the neighborhood processing recursed 4x for every cell.

    So I went back to the original thought. I didn’t get much of anywhere until I got the hint that what I needed to do was mark all the 1’s as 999999 (an impossible value, marking them as “not done”). After that, I pulled up my memories of linked-list processing and it was actually kind of easy! We start by creating a queueItem struct:

    type queueItem struct {
        i    int
        j    int
        next *queueItem
    }
    

    Now we need to find all the zeros and add them to the queue, and mark all the ones as “not done yet”. In hindsight, I should have made the queue operations at least a separate sub — mostly because there was a test case with a single zero, which messed up my enqueue/dequeue logic! — but they’re inline:

        var queue, tail *queueItem

        m := len(mat)
        n := len(mat[0])
    
        for i := 0; i < m; i++ {
            for j := 0; j < n; j++ {
                if mat[i][j] == 0 {
                    item := &queueItem{
                        i: i,
                        j: j,
                        next: queue,
                    }
                    if queue == nil {
                        // tail stays here while we add to the head
                        tail = item
                    }
                    queue = item
                } else {
                    mat[i][j] = 999999 // impossible value: "not done yet"
                }
            }
        }

    Pretty straightforward — now we have all the zeroes on the queue and can work our way out from each one. The 999999 is a value guaranteed greater than the largest possible real distance in the matrix. All that’s left is to process the neighborhood around each cell we take off the queue. If we find a neighbor we need to update, we know from the definition of the problem that it is one step farther from a zero than the cell we took off the queue (for the zero cells, those neighbors will all become ones). We set its value in the matrix, then add the cell we just updated to the tail of the queue. This way we process all the zeroes, then all the ones, and so on, until we’ve processed every cell that needs it. When the queue empties, we’re done and can return the updated matrix.

        neighborhood := [][]int{ {1,0}, {-1,0}, {0, -1}, {0, 1}}
    
        for queue != nil {
        // pop the next cell off the head of the queue
            cell := queue
            queue = queue.next
            for _, dir := range neighborhood {
                // Figure out the next cell in the neighborhood of the one we
                // pulled off the queue.
                offsetI, offsetJ := cell.i + dir[0], cell.j + dir[1]
    
                // If we're in bounds...
                if offsetI >= 0 && offsetI < m && offsetJ >= 0 && offsetJ < n {
                    // ...and the cell we're looking at has a value > the current
                    // cell + 1, update it to the value of the current cell + 1
                    // and queue it. Note that neighbors closer to a zero, or
                    // zeroes themselves, will fail the test and will be
                    // ignored; they're already done.
                    if mat[offsetI][offsetJ] > mat[cell.i][cell.j] + 1 {
                        newCell := &queueItem{
                            i: offsetI,
                            j: offsetJ,
                        }
                        // This is the special case that bit me: it's possible
                        // that the queue emptied. Reset the head and tail 
                        // pointers to the cell we're adding.
                        if queue == nil {
                            queue = newCell
                            tail = newCell
                        } else {
                            // Didn't empty, so add to the tail. This guarantees
                            // that we process all items closer to a zero first.
                            tail.next = newCell 
                            tail = tail.next
                        }
                        // This updates the cell to the right distance and
                        // removes the "unprocessed" marker.                    
                        mat[offsetI][offsetJ] = mat[cell.i][cell.j] + 1
                    }
                }
            }
        }
    
        // when we arrive here there are no more queue items to process,
        // we've marked all the cells and are done.
        return mat

    So that’s it! The queue is stupid easy to do after the other linked-list exercises (it’s a plain FIFO, but because cells are enqueued in increasing-distance order it behaves like a priority queue), and I should have trusted my initial idea and tried it first. The problem I got at Bloomberg was actually a variation on this, and I could solve it fairly fast now.

    I will have to figure out dynamic programming at some point, but this problem didn’t need it.
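    Taking my own advice from above about factoring out the queue operations, here’s roughly what that separate sub could look like (a sketch; the fifo wrapper and its method names are mine, and the struct is restated so the snippet stands alone). Keeping the empty-queue special case in one place is exactly what would have saved me from the single-zero test case:

```go
// queueItem as in the solution above.
type queueItem struct {
    i, j int
    next *queueItem
}

// fifo wraps the head and tail pointers so the empty-queue special
// case lives in one place instead of being repeated inline.
type fifo struct {
    head, tail *queueItem
}

// enqueue adds a cell at the tail, handling the empty-queue case.
func (q *fifo) enqueue(i, j int) {
    item := &queueItem{i: i, j: j}
    if q.head == nil {
        q.head, q.tail = item, item
        return
    }
    q.tail.next = item
    q.tail = item
}

// dequeue removes the head cell; ok is false when the queue is empty.
func (q *fifo) dequeue() (i, j int, ok bool) {
    if q.head == nil {
        return 0, 0, false
    }
    item := q.head
    q.head = item.next
    if q.head == nil {
        q.tail = nil // queue emptied: drop the tail too
    }
    return item.i, item.j, true
}
```

    With this, the BFS loop shrinks to enqueue/dequeue calls and the bounds-and-distance check.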

  • leetcode day 29: insert interval big fail

    I chose completely the wrong way to solve this problem.

    Instead of first eliminating all the elements that could not interact with the inserted interval, I attempted to simultaneously inspect each interval while retaining the old ones and composing a new one. This led to deeply nested conditionals, and numerous flags attempting to track whether I’d finished composing the new interval, whether I’d inserted it yet, and so on.

    This problem is really about figuring out the elements that cannot contribute to the merged interval, and skipping over them as fast as possible. So we have two assertions:

    • The element strictly occurs before the new interval (end of the interval is < start of new interval)
    • The element strictly occurs after the new interval (beginning of the interval is > end of the new interval)

    With those in mind, the only thing we really have to do is collapse the new interval together with every interval that overlaps it. To collapse an old interval into the new one, we take the minimum of the two starts as the new start, and the maximum of the two ends as the new end; this stretches the new interval down and up as needed to cover both.

    We keep collapsing until the next interval (if there is one) is strictly after the new interval, which is guaranteed once the end of the stretched interval is < the start of the next one. At that point we can simply append the stretched interval to the output.

    The finish appends the remaining intervals to the output, and then returns it.

    func insert(intervals [][]int, newInterval []int) [][]int {
        // Interval belongs somewhere in the list.
        // Look for insert point for new interval
        output := [][]int{}
        
        // Assertion: if interval[i][1] < newInterval[0], this interval 
        //            occurs before the new interval.
        // I is NOT in the for so its value hangs around after the loop ends.
        i := 0
        for ; i < len(intervals) && intervals[i][1] < newInterval[0]; i++ {
            output = append(output, intervals[i])
        }
    
        // At this point, everything in output ends strictly before the
        // start of newInterval. Collapse overlapping intervals until we
        // find one that starts after the end of newInterval. i still
        // points at the next interval to examine.
        for ; i < len(intervals) && intervals[i][0] <= newInterval[1]; i++ {
            // adjust the start of the new interval to the earliest point
            if intervals[i][0] < newInterval[0] {
                newInterval[0] = intervals[i][0]
            }
            if newInterval[1] < intervals[i][1] {
                newInterval[1] = intervals[i][1]
            }
        }
    
        // add the stretched interval
        output = append(output, newInterval)
    
        // Copy anything left.
        for ; i  < len(intervals); i++ {
            output = append(output, intervals[i])
        }
        return output
    }

    I checked the solutions and found that there’s a function in the sort package, sort.Search, that takes a length and a boolean function and binary-searches a sorted range for the first index where the function returns true. So we can eliminate all of the explicit loops and write it like this:

    func insert(intervals [][]int, newInterval []int) [][]int {
        n := len(intervals)
    
        // Find the first interval whose start is strictly after the
        // start of the new one. That means i-1, if it exists, is the
        // last interval starting at or before the new interval's start.
        i := sort.Search(n, func(i int) bool {
           return intervals[i][0] > newInterval[0] 
        })
    
        // Same thing for the other end: j will be the first interval
        // whose end is strictly after the end of the new interval.
        j := sort.Search(n, func(j int) bool {
           return intervals[j][1] > newInterval[1]
        })
    
        // Collapse at the low end. If there is an interval before the
        // insertion point, and its end reaches the start of the new
        // interval, they overlap. Stretch the bottom of the new
        // interval down and absorb the overlapped interval.
        if i-1 >= 0 && newInterval[0] <= intervals[i-1][1] {
            newInterval[0] = intervals[i-1][0]
            i--
        }
    
        // If we're inside the array, and the FIRST interval with
        // an end AFTER the end of the new interval has a start <=
        // the end of the new interval, then they overlap. Stretch the
        // top of the new interval up and discard the overlapped interval.
        if j < n && newInterval[1] >= intervals[j][0] {
            newInterval[1] = intervals[j][1]
            j++
        }
    
        return append(
            // the slice up to the last interval strictly ending before
            // the start of the new interval
            intervals[:i],            
            append(
                // the new, possibly stretched interval
                [][]int{newInterval}, 
                // the slice from the first interval strictly starting
                // after the end of the new interval
                intervals[j:]...)...)
    }

    Internally, sort.Search does a binary search for the criterion function, so it’s very efficient. (And I should go back to the binary search questions and see if I can solve them more easily with sort.Search!) However, thinking through the conditions is a bit of a beast. Unless you have very large arrays to work on, the triple for loop may be easier to code on the spot and get right. Given the major improvement in speed, it’s worth practicing with!

    The sort.Search solution is very good: 94th percentile on speed, 92nd percentile on memory. Not surprising, as we’ve swapped the linear iteration over the intervals to a binary search, and we only compose the output array at the end with two appends.

    Interestingly, the linear recomposition three-loop version is quite fast too! 95th percentile, though the memory usage is only about the 50th percentile. I’d say stick with that solution unless there’s a definite case for the harder-to-code one.

    Lessons for me on this first “medium” problem, that I didn’t finish at all quickly? Spend more time thinking about how to eliminate data from the process. What is the fastest way to do that?

    My “sort it into place and track everything” solution was too complicated by far. If the logic to iterate over an array starts getting deeply nested, you are doing it wrong.