Articles by Joe McMahon

You are currently browsing Joe McMahon’s articles.

Didn’t do so well on this one on the first pass.

You’re given an ordered array of prices, and are asked to find the maximum inter-day profit. Constraints: the buy must be before the sell (first price array index < second price array index), and profit is 0 if there is no positive delta.

I originally did a brute force solution that essentially came down to a double loop. Worked fine for small examples, but failed on time for long ones.

So I went and took a shower, and got half of it: if I find the minimum value, then no price after that can possibly have a bigger delta. However, it’s possible that there was a bigger delta preceding that, and I got stuck there for a while, wondering about iterating backward, but couldn’t see a definite siimple algorithm.

I got a hint that I could use a sliding window, and that gave me the right solution.

Start at index 0 as the tentative minimum.
Walk a second index forward.
If the walking index finds a new minimum, move the minimum pointer.
Otherwise, calculate the delta between the current minimum and the walking index price.
If the delta is > the old max delta, save it.

My original insight would eliminate whatever part of the array that followed the global minimum, but would have missed possible local maximum deltas (e.g., [10, 100, 1, 5] would have a true max delta of 90; scanning only after the global minimum would only have found a delta of 4). The sliding window finds that first delta of 90, and then keeps it, even though we find a new minimum at 1.

func maxProfit(prices []int) int {
    // iterate over the array from the start.
    // take the current item as the start price.
    // find the max price after this one that leads to a profit, or 0.
    // repeat for each day up to the last-1.
    // if the profit for a given iteration is > the current, save it instead.
    maxProfit := 0
    tentativeMinIndex := 0

    for i := tentativeMinIndex; i < len(prices); i++ {
        if prices[i] < prices[tentativeMinIndex] {
            tentativeMinIndex = i
            continue
        }
        // haven't moved minimum, calc profit
        thisProfit := prices[i] - prices[tentativeMinIndex]
        if thisProfit > maxProfit {
            maxProfit = thisProfit
        }
    }
    return maxProfit
}

How’d I do? Bad on the initial O(N^2) iteration, but fab on the sliding window version: top 99% in speed, top 93% in memory.

Taking a shower helped, but I won’t be able to do that in an interview, alas. (Another good argument for working at home.)

Next is palindromes, and we’ll be back to fun with Go strings again.

Tags:

Today’s challenge is Valid parens. And it reminded me of several of the things that I don’t particularly like in Go.

Problem statement boils down to “here’s a string of brackets: parens, square brackets, braces. Tell us if they match properly (true) or don’t (false)”. Algorthmically this is dead simple:

Start with an empty stack.
for each character in the string:
  if it's an open bracket, push it.
  if its a close bracket:
    if the stack is empty, return false.
    if the top of the stack isn't the matching open, return false.
    pop the stack.

When we're done, return true if the stack is empty, false otherwise.

In Perl or Python, I’d have a stack pointer, and move it up and down an array as I pushed and popped characters by altering the stack pointer and assigning to the appropriate point in the array. If the stack pointer went below 0, I’ve gotten too many close parens, and I return false.

This approach doesn’t work well at all with Go, because Go arrays and slices Do Not Work Like That.

To push onto an existing array, we need to use a := append(a, new). Just assigning to an index outside of the slice’s current boundaries will panic.

We could still use the stack pointer to move backward, but then we’d have to have more complex code to decide if we need to append or just move the pointer. (Side note: doing it that way would probably use less memory — the working version of my solution used more memory than 90% of the other solutions). Instead, we just use the slice notation to pop the stack instead, with a := a[:len(a)-1].

My original iterations with a real stack pointer failed miserably because I wasn’t properly tracking the length of the array, and was perpetually getting the stack pointer position wrong, causing the code to panic. It was only after completely discarding the stack pointer that I got it to finally work.

Items of note:

  • I remembered I’d need to use range() to iterate over the string, but forgot that I would have to string() the characters I was getting so they were strings instead of runes. I wasted a good chunk of time trying to mess with runes before I realized that I needed string() instead. Runes are considerably more second-class citizens.
  • Still running afoul of using make() properly. Finally figured out the syntax to create an empty string array, but it took some Googling to get it.
  • I decided to make the code less complex by creating a pairs map that mapped left brackets to right ones. This meant I could check pairs[left] == right for whether I’d matched the left and right brackets. It also meant that if we ever added more bracket pairs in the future, it’d be be easier to implement them.
  • I got fenceposted by a[len(a)-1] accessing the last element, and a[:len([a)-1] dropping the last element. I naively expected that since a[len(a)-1] accessed the last element, I’d want a[:len(a)-2] to drop that last element, but that takes one too many, running me into another fencepost. Yes, it’s good that it’s symmetric, and now I’ll remember, but I definitely had to look it up to figure out both of them.
  • Forgot the check for “did we close all the opens”. I probably would not have missed it if I was making my own test cases, but it wasn’t in the tests. Note to self: watch out for poor initial test cases. There is an opportunity to add more, I think?

So how’d I do overall? We saw I was in the bottom 10% in memory usage, boo, but runtime? “Beats 100.00% of users with Go”.

I’ll definitely take that as a win, but it’s definitely obvious I need to keep practicing and getting these idioms back under my belt.

func isValid(s string) bool {
    // We'll run through the string, doing the following:
    // - if we see an open bracket, we stack it.
    // - if we see a close bracket, we pop the stack. If the
    //   brackets match, we continue, otherwise we fail.
    // - when we run out of brackets, the stack must be empty or we fail.

    stack := make([]string,0)

    pairs := map[string]string{
        "(": ")", 
        "[": "]", 
        "{": "}",
    }

    for i := range s {
        char := string(s[i])
       _, ok := pairs[char]
       if ok {
           // left, push it
           stack = append(stack, char)
       } else {
           // right, stack empty?
           if (len(stack) == 0) {
               fmt.Println("pop of empty stack")
               return false
           }
           // Not empty. match?
           if (pairs[stack[len(stack)-1]] != char) {
               fmt.Println ("mismatch")
               // mismatched bracket, fail
               return false
           }
           // Match. Pop stack.
           stack = stack[:len(stack)-1]
       }
    }
    // Check for "did we close all the open brackets".
    return len(stack) == 0
}

That’s one for today, and I need to clean the house, so I’ll call it a day.

Tags:

Well, these were fun, and definitely showed me I’m not ready to walk into an interview at the moment!

Twosum

Twosum is an easy challenge: you are given an array of numbers and a target value; your job is to find the only two numbers at different indexes in the array that sum to the target value and return those two indexes.

The brute-force strategy just repeatedly walks through the array, trying to find the right pair of numbers. This ends up being O(n^2) because we essentially run over the upper triangle of a square of the elements of the array> If in the example below were looking for numbers that sum to 9, we’d run over the numbers four times finally finding it on the fourth pass, and doing 4+3+2+1 = 10 checks. (We can save operations by precomputing the second term in the sum: subtract the first search value from the target sum, and just compare that against the value in the array at that index instead of summing the two.)

>1  2  3  4  5
 1 >2  3  4  5
 1  2 >3  4  5
 1  2  3 >4  5

But the old Perl adage “when someone says search, consider a hash” comes in handy here. We can precompute exactly the number we want, so if we can make the values the keys of a hash (and we can, as we know that there is only one pair that sums to the target value), then we can make the indexes the values.

We walk through the array twice:

  • The first iteration builds the hash by taking each value and inserting it as the key with the index as its value. And since we’re doing one iteration over the whole array anyway at this point, we can check each item as we insert it to see if it hits the target with the first item!
  • If we don’t luck out during the insert, then we iterate over items 2 to N, calculating the value we’d need to hit the target, and then doing a hash lookup to see if it’s there The hash lookup is O(1), and the pass over the array (including the load) is also O(1), so we’ve made a pretty good reduction in overall runtime.

Things I’d forgotten in the last year of pretty much nothing but Scala:

  • The only loop structure in Go is the for loop.
  • var, not val. val is Scala, and as one tries for as much immutable data as possible in Scala, my fingers are trained to type val at every turn.
  • Go array initializer syntax. Square brackets, then the type, then the values in curly braces.
  • I remembered I needed to make() a map, but used Scala syntax instead of Go syntax trying to specify the types.
  • Go does not consider [2]int to be the same as []int. Fair.
  • Gotta have the parens for every function. Perl and Scala made me sloppy about that.

Things discovered:

  • Adding the “try to catch it in the load pass” made a significant speed difference when the first term was at index 0.
  • Putting the first index in the array initializer for the result array instead of assigning it vastly increased the speed of the search loop — I went from the top 75% to the top 95% with that one change.
  • The hash solution uses more memory; I was at the 15th percentile on memory, but I’ll definitely take being faster than 95% of the other solutions as better.
  • I missed the “must be different indexes” restriction on my first pass; fortunately the second test case was for 6 with [3, 4, 2], and I got a test fail for [0,0].

Here’s the final version. I do comment even when writing these because I can line out the code with comments and put the actual code itself in later. Writing down the assumptions helps clarify them as well (thanks loads, ADHD).

func twoSum(nums []int, target int) []int {
    // Given that
    // - there is only one solution, then all entries must be unique
    // - therefore they can be used as keys to a hash
    // - this means we can iterate O(1) over the array for term 1
    //   and lookup term 2 in the hash, getting its index. No need
    //   for a linear search for it.

    // Fill the hash...and see if we can catch the solution
    // during this pass, since we have to do it anyway.
    locations := make(map[int]int)
    prematch := target - nums[0]
    for i := 0; i < len(nums); i++ {
        if i > 0 && nums[i] == prematch {
            return []int{0, i}
        }
        locations[nums[i]] = i
    }

    // scan the nums, looking for a match.
    // first match wins.
    for i := 0; i < len(nums); i++ {
        result := []int{i, -1}
        term2 := target - nums[i]
        j, ok := locations[term2]
        if ok {
            // Disqualify same-entry match. Missed this first time,
            // thank you test cases.
            if i == j {
                continue
            }
            // Found
            result[1] = j
            return result
        }
    }
    // This is a "should not get here"; meant to show me if I've totally
    // blown the loop.
    panic("did not find a solution")
}

Add two numbers

This one was recommended by leetcode as a “hot” problem, and it looked like fun, so I did it. It’s not in the Grind 75 list, but it’s a nice pointer manipulation problem, and I was a bit stale on pointers.

The function is passed two linked lists; each list has an integer from 0-9 and a pointer to the next digit. The digits are presented in right-to-left order. The function is to take these two lists, and return a new list that represents the sum of the two numbers. Of course, the lists can be different lengths; if they are, then the list that runs out first is treated as if it were returning zeroes for the purposes of the sum.

It was instantly clear that the challenge here would be maintaining the scan pointers for the two numbers and the result, and that trying to do bookkeeping for all of it in the loop would suck. So my approach was to construct a function to return a closure that would follow the rules for reading the numbers off the list:

  • If the scan pointer is null, return 0 and false
  • If the scan pointer is not null, return the node’s value and true, advancing the pointer to the next node.

This meant that the list management was dead simple:

first  := iterator(L1)
second := iterator(L2)

fVal, fState := first()
sVal, sState := second()

Each call to the first and second closures returns the next value to use in the sum; when both are false, the sum is complete.

So all I had to manage in the loop was the scan pointer for the sum, and the carry from the addition operation: adding it in, clearing it, and recalculating it after the current operation.

Things learned/relearned:

  • Had to remember when I needed * and when I didn’t for variable types.
  • The first run segfaulted. I realized that I’d mismanaged the scan pointer; I needed to save the root node, and then scan forward while doing the operations.
  • Once I fixed that, I found out I wasn’t clearing the carry. Whoops. Easy to fix.
  • The closures worked perfectly.
  • The “end the loop” and “restart the loop” keywords are break and continue. Trivial, but I had to go look it up.

I did make one major mistake: I missed that the output was supposed to be another list, and started calculating the sum as an integer. This wasn’t too terribly off from the desired solution; I had figured out that I’d need the carry tracking, and I had a power variable that I was multiplying by 10 to switch columns in the output variable, but that probably wasted 5 or 10 minutes and might have made the difference between a pass and a fail in an interview.

It was pretty easy to switch to the list building, but lesson hammered home again: be sure you read the problem statement correctly and know what the output is. Confirming with the interviewer that I got the details right is probably a good idea in a timed problem.

**
 * Definition for singly-linked list.
 * type ListNode struct {
 *     Val int
 *     Next *ListNode
 * }
 */
func addTwoNumbers(l1 *ListNode, l2 *ListNode) *ListNode {
    // The tricky bit with this one is going to be watching out
    // for the end of the list.

    // the dumbest and easiest way is to have a bool for each
    // list; when one list has no more entries, we set its bool
    // to false and return a zero value for it.

    // when both lists return a false value, we're done.
    first := iterator(l1)
    second := iterator(l2)
    var currentDigit *ListNode
		var root *ListNode
		carry := 0

    for {
        fVal, fState := first()
        sVal, sState := second()

        if (!fState && !sState) {
            // run off the end of both. Stop loop, finalize sum.
						fmt.Println("Done")
            break
        }
        // At least one still returning a value. (The other returns 0
        // if there's no value.)
        // Sum the digits and the curent carry; if > 9,
        // take mod 10 and set the carry to 1. (no two
        // digits can sum to > 18).
        digitSum := fVal + sVal + carry
        carry = 0
        if digitSum > 9 {
            carry = 1
            digitSum = digitSum - 10
        }
        // Add space for a new digit, append it, continue.
        if currentDigit != nil {
            currentDigit.Next = &ListNode{digitSum, nil}
		    currentDigit = currentDigit.Next
        } else {
		    // Create and save root digit
            currentDigit = &ListNode{digitSum, nil}
			root = currentDigit
        }
    }
    if (carry != 0) {
        // last addition had a carry we need to keep
        currentDigit.Next = &ListNode{carry, nil}
    }
    return root
}

func iterator(l *ListNode) func()(int, bool) {
    // Close over the pointer.
    p := l
    return func()(int, bool) {
        if p == nil {
            // Reached end. Keep returning 0 and false.
            return 0, false
        } else {
            // Capture next digit, advance pointer,
            // return next digit and true. If new pointer
            // is nil, we'll just return 0 and end-of-list signal
            // forever.
            v := p.Val
            p = p.Next 
            return v, true
        }
    }
} 

So how’d I do overall in comparison? Pretty fuckin’ good. I was in the top 95% on speed, and the top 91% in memory use. I suspect that managing all the bookkeeping in the loop might make it a tad faster (no call overhead for the closures), but at the cost of significantly more complexity. This solution is fast and small, and I’ll take it.

Conclusions

  • My Go is pretty good. I didn’t spend a huge amount of time chasing bugs, and had it in a couple tries. I have a good grasp of the concepts and know mostly how to do things.
  • My Go is fairly stale. I don’t remember how to say things!
  • I had to look up a lot of things that I should have just remembered, like len() for array length (not .len, that’s Scala!). I need this practice!

Tomorrow I’ll start off with Valid Parentheses.

Tags:

Most people do a novel for NaNoWriMo, and that’s a great thing.

I am not a great fiction writer; maybe I’d be better if I practiced, but right now, writing code makes more money, so I’m going to spend the month of November working through the Grind 75 leetcode practice set.

I would very much prefer a Perl job, but there just aren’t that many places now that really want someone whose primary is Perl. Python’s fine, but it’s mostly tied up in LLMs and AI at the moment, neither of which I actually find very interesting. (They seem to be more “build a model and hope it does the work you wanted” as opposed to “write the code to do the job you want done”, which I find much more satisfying.)

I haven’t done much with Rust yet, and I think it’d be a fun language to work in, but I don’t see a lot of demand for it yet. Scala is fun to write but I’d rather jump out a window than use the JVM toolchain again. (This also crosses off Java, Dart, and Flutter for me as well.) I have not in general found C++ to be that much fun, and Javascript is what it is. I can write it, I just don’t enjoy it.

So it comes down to two languages, really: Go and Swift. I enjoy coding in both (for different reasons), but at the moment I’m seeing more Go jobs than Swift ones, though that may just be my settings. Certainly when you need a Mac/iOS programmer, you need someone who at least understands Swift.

So I’ll probably end up trying it with both, and seeing how it goes. Onward!

Tags:

A lot of progress on multiple fronts to get me to the point where the the models are tested and working, and the registration controller works when manually tested. Its support functions have been unit tested as well.

Settings model

Working on through the code, I found and fixed a couple typos (SearchPage, not SeatchPage!), and then put all the model functions under test. The settings are properly initialized with sensible values for the fields that can be pre-initialized, and the load and save functions work. Overall I found the Mojo ORM to sometimes work very intuitively, and other times to be opaque enough that I just fell back to creating and running placeholder queries.

To prevent multiple sets of settings from being created, the model always sets the ID of the settings record to 1, guaranteeing that we have only one set of settings.

User model

We have the usual suite of methods for a user: do they exist, are they verified, can they log in. I decided to use Crypt::Passphrase as the hashing function and manage passwords myself instead of using the Mojo plugin. Since this is all inside the model, it’s not terrible if I decide to change it later.

Originally I thought that I should probably block multiple IDs from the same email, but I decided that I would allow it so that people could have IDs with multiple privilege sets. This becomes more necessary if the folks using the wiki start using access lists, especially if there are disjoint groups with people in more than one of them. Again, another decision that’s easy to change if I do change my mind.

The principle problem I had here was that I had set up the user table with an id primary key, but a lot of operations depend on using the username as a key. SQLite can do multiple primary keys, but the Mojo SQLite module doesn’t. It was easier to write a method that does a SELECT * FROM users where username = ? and returns the user than try to work around it.

The initial version didn’t have any password constraints; I added a function that does some very basic ones (> 10 chars, at least one upper, one lower, one digit, one special character. I used the POSIX character classes to try to start toward a more UTF-8-ish future.

The tests were getting crowded, and a bit too long, so I reorganized the tests by creating
a subdirectory for each model and controller, and copying the appropriate tests into it. I then went through the tests, making multiple copies, stripping each one down to testing just one thing. I last added a new test to verify that password validation worked, including whether the user was verified or not (not verified == failed login).

Controllers

Next was refining the controllers.

I reworked the login controller to use the User model instead of doing everything itself. and set it aside for the moment.

The rest of this sprint went into the registration controller; I wanted to be able to add users natively before testing the login controller. The only tweak it needed to be able to just run was to add in an import of the Users model so it could indeed create users.

A quick run showed me that I’d need to shuffle around the template and work on the validation code; the fields were fine for the original demo, but there were ones I didn’t need (middle name), ones that were missing (username), and the “flash” error message from validation was at the bottom of the page. Swapped things around and everything looked good.

I then went in and disassembled the logic of the controller into input validation, username creation, and the actual messing about with the User model. I left the model manipulation inline, but thinking again about it, I think I want to isolate that in a separate method and unit test that as well.

Wrote the tests for both of those, and did some cleanup on the error messaging in the validation. It now gathers all the validation errors and constructs a properly-punctuated list:

  • The item alone, capitalized, if there’s one field wrong.
  • "Item_one and item_two" (no serial comma) if there are two wrong.
  • "Item_one, Item_two,...Item_n_minus_one, and item_n" if there are more than two.

I decided to just grab the fields and drop them into a hash, then pass the hash to the utility functions; this actually led to a lot of impedance matching problems, and it might have been better to build a tiny class to carry the data instead. (If I were doing this in Go, I’d use a struct and be sure that I couldn’t assign to or use the wrong field names because the compiler would catch me.)

The username construction adds a very nice module, Text::Unidecode, which does a pretty decent job of translating Unicode characters into an ASCII-coded equivalent. I decided to do this to preserve the simplicity of the link checking code; later on, when we get to the page decoding, that code will look for words that match the wiki link syntax and automatically transform them into links to a page of the same name. Making the link syntax more complex would mean that it would be easier to accidentally create links; it’s possible to use `` before a linkname to prevent this, but having to do that a lot makes using the wiki less pleasurable.

The decoded strings sometimes contain spaces, so a s/\s(.)/uc($1)/eg was needed to collapse the space and capitalize the letter after it.

I tested this a little bit manually as well, and the controller seems to work fine. The page is a little stark, but can probably be made nicer with some CSS later.

Registration controller’s going to need integration tests to cover the rest, and the login controller has exactly one line that isn’t directly concerned with logging the user in, so it’ll need integration tests as well. Next up is those integration tests and then I’ll start on the actual page display code. Most of the rest of the functionality is inside special processing attached to pages.

A pretty productive sprint!

Tags:

Last month, while working in the Perl debugger, I came across this strange behavior:

  DB<4> l 1.2
1.2:	print "this is a placeholder";

Well, that doesn’t look right.

  DB<5> l
2.2 	
3.2 	sub lorem {
4.2:	  for(my $i = 1; $i < 100; $i++) {
5.2:	    print <<EOF;
6.2 	    Lorem
7.2 	    ipsum
8.2 	    dolor
9.2 	    sit
10.2 	    amet,
11.2 	EOF

Okay, that really doesn’t look right.

  DB<5> l 10.0.0.5
10.0.0.5 	    amet,

…I think there’s definitely something wrong here.

 l $17$
Scalar found where operator expected (Missing operator before "$;"?) at (eval 153)[/Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/perl5db.pl:742] line 1, near "$17$;"

Oh, no. No no no.

So I dug in to the debugger code, and found this line:

2805 elsif (my ($s, $e) = $spec =~ /^(?:(-?[\d\$\.]+)(?:[-,]([\d\$\.]+))?)?/ ) {

This is the code that matches the arguments for the debugger’s l command for either l start,stop or l start-stop. The problem is that there are a lot of things that this matches that absolutely do not make any sense, like

l ......
l -.
l $
l -.$.
l -12
1 3.14
l 127.0.0.1
l 17$

Worse, as we saw above, some of them kind of work. Others…just don’t do anything, but are accepted.

What causes this?

First question: why does this work at all? The basic reason is that Perl is all about implicit conversions. Several of those expressions evaluate to 0 in a numeric context, so the debugger effectively executes a l 0. This doesn’t list anything, because the program source in the debugger’s magic source code array starts at line 1. But Perl arrays start at 0, and therefore index 0 in the magic source array exists; its just empty. So the debugger happily lists nothing.

For the floating point numbers and IP addresses, Perl captures the list command argument as a string, and then uses it in a loop that uses the “line number” as an index into that same magic source code array, incrementing it by 1 each time. The floating-point-looking strings convert to an actual float, which can be incremented by 1, and the index into the array is automatically truncated to an int when we try to use it. So the debugger finds the lines in the array, and prints the current value of the index it’s using as the line number, resulting in floating-point line numbers.

So why is this complicated regex here at all? Why aren’t we being a lot more strict about this?

Digging into the commit history

I sat down with the perl5 repo from Github and went spelunking. Going back through the commits shows that exact pattern’s been there for over 30 years – since Perl 3. It’s just been cargo-culted forward since then.

Let’s establish some history before we do anything else.

(Aside: this was an interesting exercise, and I learned a bit more about navigating a very long and complex commit history with git. The big thing to keep in mind is that you can navigate much more easily through time with git reset --hard than with git checkout. I still ended up hitting a breaking commit of the OS/2 Makefile in there that prevented me from jumping all the way back to HEAD, and…actually kind of breaks the repo. There’s probably an easy way to repair it, but I ended up deleting the repo altogether and recloning to fix it.)

Here’s a summary of the evolution of the debugger and its l command over time, all the way back to the original creation of the debugger. I got this by repeatedly using git blame to find the commit that added a change to the line with the regex, and then git reset -hard commit^ to jump back in time, repeating until we get so far back that there’s no debugger anymore.

CommitVersion and notesRange-match regex
a559c25918bperl 1.0 patch 8

This is the original perldb commit. Parse loop is a for(;;) starting at line 99.

l min-max List lines, lines 131-135
l line List line, lines 137-140
l List the whole program, lines 141-146

Pattern is very simple, but doesn’t outlaw non-digits in the range.
/^l (.*)[-,](.*)/
13281fa4f8perl 2.0 patch 1, still perldb at the top level of the source directory

parse loop is still for(;;) starting at line 100

No changes in the l command support; lines have moved down 1. Pattern for range is the same.
Same
a687059cbafperl 3.0, moved to lib/perldb.pl

Command parse while() loop at line 64.

Command parsing depends heavily on fallthrough to work.

Lines 145-157 handle l subroutine_name with @sub, falling through to range code to do the command.
Lines 166-168 handle bare l: set up l start-end and fall through
Lines 169-173 handle l start+number: set up corresponding l start-end and fall through
Lines 174-168 actually do the l start-end

First appearance of the complex match including $ and .. Only . gets special handling (current line). No leading - accepted yet, and no handling of $, though the match accepts it.
/^l\s*(([\d\$.]+)(-,)?)?/
fe14fcc35f7perl 4.0.00

Command parse now a while loop with CMD label at line 90; command use next to restart the REPL.

No changes to the supported commands from the 3.0 debugger. Patterns now use \b to delimit the command from its arguments, but the actual commands remain the same; processing still uses fallthrough as before.

Lines 201-219 handle l subroutine: find in @sub, stack l start-end, fall through.
Lines 231-235 handle l nnnn+mmm: compute end for l start-end, fall through.
Lines 236-248 handle l range

Complex match is extended to match all of the arguments. $ and . accepted.
/^l\b\s*(([\d\$.]+)([-,]([\d\$\.]+))?)?/
d338d6fe1dfperl 5.0, “Ilya’s new debugger”, lib/perl5db.pl

l min+incr List incr+1 lines starting at min.
l min-max List lines min through max.
l line List single line.
l subname List first window of lines from subroutine.
l List next window of lines.

Lines 377-398 handle l subname by looking up the subroutine in the debugger’s magical structures, stacking an l start-end and falling through.
The w and - commands are implemented as special cases of l, stacking an l range command and falling through.
The l start+n command is also implemented with a stacked l range fallthrough.
Lines 415-432 actually implement the l range command. . is still specially processed as current line number, but $ is still ignored.

Complex match is unchanged.
Same
54d04a52ebeperl 5.0 beta 3

Same commands as last change.

l command code does not change in any significant way. The handling of the end of the range changes slightly, but there’s no new special case support.

First appearance of the leading - in the complex pattern. Nothing added to specifically support it.
/^l\b\s*((-?[\d\$\.]+)([-,]([\d\$\.]+)?)?/
492652be590perl 5.7.2

Major reorg of commands. Fallthrough code in mainline CMD loop moves to cmd_l.

LIne 1687 l – lists whole file (first occurrence)
Line 1688 l $name for subref list command.
Line 1696 implements l function_name
Line 1724 implements bare l command
Line 1728 implements l line+nnnn
Line 1734 retains the complex match for l start-end and l start,end. . is still special-cased. No change to the regex.
/^((-?[\d\$\.]+)([-,]([\d\$\.]+))?)?/
eda6e075b0cperl 5.9.X

Big documentation patch, no functional changes. Code moves a bit, but doesn’t change at all.
Same
d12a485175bperl 5.9.X

Retracts erroneous doc patch. No functional changes.
Same
69893cffeb6perl 5.9.x

Rearranges things a bit to minimize delta between 5.8 and 5.9 debuggers, but no functional changes.
Same
e22ea7cc8dbperl 5.9.x

Revert previous patch.
Same
613bf352ae2perl 5.10.0

“Extract _cmd_l_range“.

Beginning of the modern debugger architecture. Regex is unchanged, but the code is reorganized like this:
1) The regexes have been updated to remove the leading command character.
2) cmd_l parses the different commands, and each is handed off to a specific implementation routine. The complex regex routes to _cmd_l_range.
3) . is processed as current line in _cmd_l_range. $ is still unprocessed.
/^(?:(-?[\d\$\]+)(?:[-,]([\d\$\.]+))?)?/
be43a6d374eMinor changes to remove extra comments and streamline names. No other changes.Same
401da5225b7More mostly-cosmetic changes. No significant logic changes.Same
b7a96fc9f53More code rearrangement and streamlining of cmd_l. No logic changes.Same
Change history for the debugger l command.

So there are essentially four eras in the evolution of the debugger:

  • Era 1, the initial version in Perl 1 and Perl 2, which used an explicit simple range match.
  • Era 2, the appearance of the complex match in Perl 3.0.
  • Era 2.5, the transition to Perl 5.
  • Era 3, the extraction of commands in Perl 5.7.2 to separate subs.
  • Era 4, the modern (and testable) command architecture in Perl 5.10.0.

From looking over all these changes, it seems to me that what happened was that the pattern was intended to be a catchall for all the range-ish commands, but it wasn’t explicitly broken up so that it was clear exactly what those commands were supposed to be. My best guess is that Larry intended to implement l $var somehow in Perl 3; I believe assigning globs sort of worked like references at that point, and it would probably have been possible to expose the sub slot of the glob to the debugger…but this never happened.

Since the handling was all in one block with fallthroughs, creating the “match all the possible things” regex makes more sense in that context; I’m again guessing that the code was just going to ignore nonsense the way it does now, and that no one was expected to be silly enough to type floating point numbers or IP addresses at it.

The implicit conversion to integer when these were used as array indexes into the magic line array meant that it worked “well enough”. The increment of the variable containing the current line number cleaned it up enough that really dumb stuff like IP addresses and strings of periods were handled by implicit conversion to integers when looking for lines and didn’t cause enough of a problem to require that it be fixed.

I suppose you could be lazy and call it an “Easter egg”, but having all the weird things accepted feels sloppy.

What’s the “right thing”?

My best guess at what’s actually supposed to have been implemented sometime between Perl 3 and now is

  • l – : list the whole file. This does work, but probably should be called out as a specific case.
  • It’s likely that l -nnnn (list from here to line nnnn) was intended to work, since the match is there for it, but doesn’t.
  • It would seem logical that l nnnn- (list from line nnnn to EOF) should also be wanted and work, but also doesn’t.
  • Variations on using . as one or the other of the arguments: l nnnn,., (list from some previous line to the current one) and l .,nnnn (from here to some later line) probably ought to work. l .+5 should also work. Either these don’t work at all, or don’t work as one would expect.
  • Technically, l .,. should be legal and work, and do the same thing as l ., but admittedly it’s silly and I think we’d be forgiven if it didn’t. (It currently doesn’t.)
  • Bare $ is a mystery. You can enter it, and it matches, but it does nothing. There’s nothing in the code or comments that points to an intended use. Perhaps it was meant to be a magic “last line” substitute, or maybe it’s an artifact of “we should be able to match stuff in a scalat at some point” which ended up implemented as a completely separate case. Since bare $ has never worked, I think we’re OK dropping it.
  • Several other things match and the debugger tries to process them, but they’re nonsensical. l $$$$$$ is effectively l. l $fred$ matches but causes the debugger to throw an error; it should probably be rejected altogether.
  • There is no “I don’t like what you entered” clause. Either something weird happens, nothing happens, or there’s an error.

The problem is that complex regex right now matches far more than it ought to. We really need to establish what the acceptable forms are, and break that regex up so that it doesn’t match nonsense and cmd_l explicitly rejects nonsense. We’re only getting along because of Perl’s forgiving nature — but that means that peculiar things are accepted and (sort of) work.

Next steps

I’d suggest that we decide what the valid inputs are, and what they should do, and that the giant catchall regex should go away in favor of something much more like Larry’s original Perl 1 version.

I’m going to put a patch together in the short term that will simply not match most of the weird options, and add a “list range specification invalid” catchall else in _cmd_l_range or _cmd_l. We can decide if we want to add support for some of the more esoteric options (like l .,nnnn and so on) later; just making sure that we don’t assign nonsense to the start of the range that keeps propagating through further list commands seems like enough of a fix for now.

I flailed around for a while trying to get a good handle on how to properly start moving the old wiki functions from the script in hold/ to someplace proper in the Mojolicious tree. After a good bit of reading I finally stumbled across Ashutosh’s great intro article on assembling a model and controller. I used his registration and login controller articles to model the following:

  • A model for users, WebWebXNG::Model::Users. I used his basic model and changed it up a bit, adding a validated field and a field for the wiki username.
  • A model for settings, WebWebXNG::Model::Settings; this was a little tricky because I wanted only one possible settings record, and coding an SQLite UPSERT operation led to complicated code tracking which fields were updated and which weren’t. I got around this by using a fixed record ID and always either inserting or updating that record by simply checking if it’s there first.
  • A registration controller, WebWebXNG::Controller::RegistrationController, that displays the registration form on a GET and tries to use the form contents to register the user on POST. The registration code builds a wiki username in four steps, checking each time whether the name conforms to wiki syntax and trying the next step if not:
    • If the user supplied their own username, use it.
    • If they did not supply a username, try first_name . last_name without changing capitalization.
    • If that doesn’t work, try ucfirst(lc(first_name)) . ucfirst(lc(last_name)).
    • If that doesn’t work, it’s likely that the name contains non-ASCII characters, and we use Text::Unidecode to try to decode it into close-as-we-can ASCII.
  • A login controller, WebWebXNG::Controller::LoginController, that checks if the user’s been validated, and if so, then checks if the password is good. This most closely resembles the old code (though notably, that code did not do any user validation. Any bozo could sign up, but fortunately the NASA firewall was tight enough that the wiki wasn’t visible!)
  • A display controller is sketched in but not ready yet.
  • Templates for the login and register pages, and a start on the display page.

I added helpers (created with $self->helper) for sqlite (the database), users (a model instance that we pass the sqlite helper to), and settings (same as for users). I moved the routes back into the mainline code of WebWebXNG.pm because factoring them out seemed to just create more complexity for no real benefit.

WebWebXNG now also has some 12-factorish support in that we can specify a list of environment variables that are to be reflected into the config; the only one added so far is SQLITE_FILE, which sets the filename of the SQLite database. There’s also code in place to ensure that the config meets the minimum standards (again, SQLITE_FILE is the only hard requirement so far, but PAGEARCHIVE_DIR and LOCKDIR are probably next).

All this is now in place, but sadly under-tested. The “did we set up the database” test is in and passes, but little else. I did run the service and verify that I can display the login and register forms, but the new code added means that I can’t just throw in some data and test everything.

It was necessary to add a good bit of POD as well to get dzil test to pass, but it does, and all the code is committed and pushed up to GitHub.

Next sprint will be devoted only to getting all this under test.

And now the real work starts.

I’ve gotten the support code in pretty good shape, so now what’s needed is to tackle upgrading the actual processing to Mojolicious. 

I started out by moving the App::WebWebXNG module that contained the core code into a new hold/ directory; now I needed to create the Mojolicious app. I originally though I’d be able to just create this piecemeal, so created a new dummy lib/App/WebWebXNG.pm modulino that just loaded Mojolicious and did main unless caller; with a dummy main. I then needed to add enough POD to get dzil test to pass, so added some skeleton docs in the new lib/App/WebWebXNG.pm.

I changed main to startup, since Mojo wanted that, and added stub versions of the routes for all of the functions in WebWebX. This was easy because I could simply take the dispatch hash table and change all the keys to calls to $self->routes->get, like this:

  $self->routes->get('/SearchRefs')->to("dev#hello");
  $self->routes->get('/ViewPage')->to("dev#hello");
  $self->routes->get('/ShowDiffs')->to("dev#hello");
  ...

Getting a little further into it, I realized that I really wanted a properly-structured Mojolicious application instead of a Lite one, which would have more closely resembled the original, so I used mojo generate app to generate a full-up app skeleton. I originally tried this inside the existing directory, but this created a nested app instead, so I moved up to the containing directory, ran mojo generate app there, and copied the resulting files and directories into the existing repo. I then moved the route creation into the new lib/WebWebXNG.pm and deleted lib/App/WebWebXNG.pm, and of course needed POD again. Also removed the 01load.t test for the no-longer-extant module and some use statements for it that were lying around.

Around this time I started getting irritating messages about prototypes from Perl::Critic; apparently the current version of one of the Perl::Critic core tests doesn’t properly detect signatures, and the fix is not yet forthcoming. I opened .perlcriticrc and realized that I’d been using the template version I’d been copying into code since I started working at Zip, and it was still referencing Zip Perl modules. Cleaned those up, moved the .perlcriticrc to perlcritic.rc, which is where Dist::Zilla expects it, deleted the lib/App/WebWebXNG/Controller/Dev.pm controller, since I didn’t want or need it, added POD to the generated lib/WebWebXNG/Controller/Example.pm so it would pass the POD tests, added [-Subroutines::ProhibitSubroutinePrototypes] to turn off the failing “prototype” checks, and stopped work for the day.

Tests are passing again; the code doesn’t do anything yet, but we’re on the way to start reimplementing the controllers and creating the HTML templates.

On the Disquiet Junto Slack, one of our members posted that they’d had a dream:

I had a dream about a piece of gear last night. I wouldn’t say that it was “dream gear,” though it was still cool. It was a small black metal box, about the size of three DVD cases stacked on top of each other. There were a few knobs and sliders, a small 3-inch speaker, headphone out, and a telescoping antenna, so it kinda looked like a little radio at first. The antenna was there for radio reception but there was other stuff going on. It was intended to be used as a meditation/sleep aid/ASMR machine. There were sliders for a four-band EQ and a tuning knob for the radio. The tuning knob had a secondary function that tuned a drone sound (kinda sounded like a triangle wave fed through a wavefolder/resonance thinger). The other feature of this box was something like a numbers stations generator. Another slider was for the mix between the drone and a woman’s voice speaking random numbers and letters from the NATO alphabet in a Google Assistant-/Alexa-/Siri-type voice but with far less inflection. The four-band EQ was to be used like a mixer as well in that it was how a person could adjust how much of the radio signal was audible over the drone/numbers by using the output gain of the EQ. There was also a switch that fed the drone/numbers signal into the EQ as well. The EQ was intentionally low-quality so that when you took it above 0dB, it would distort.

The Disquiet Junto Slack, #gear channel

Now what was weird was that I’m been doing something like this in AUM; I had a quiet ambient Dorian sequence driven by ZOA on several instances of KQ Dixie (a DX7 emulator), and was using Radio Unit (a radio streaming AU) to layer in some birdsong. I realized I could mostly emulate the dream box if I added another Radio Unit to pull in some random stations, but generating the “numbers station” audio was more of a challenge – until I remembered that OS X has the say command, that will let you use the built-in speech synthesizers to pronounce text from the command line.

I sat down, and after some fiddling (and looking up “how to add arbitrary pauses” so the rhythm was right), I created NATO::Synth to create the strings I wanted and pass them to say. It has a few nice little tweaks, like caching the strings created so it can decide to repeat itself, and properly inflecting the start and end of each “sentence”.

I saved the generated audio (recorded with Audio Hijack) to iCloud, loaded it into AUM, and then recorded the results. Very pleased with it!

I spent some time in this last set of work cleaning up some more stuff in PageArchive and File::LockDir.

First, I renamed the PageArchive::RCS class to PageArchive::Sequential, because it really doesn’t share much with RCS other than a file naming convention (name,version  to denote file version). It actually just stores each version of the file; no deltas, no branches, just a linear set of files for each version as it’s saved. 

This has the advantage of being so stupid that it’s hard to get wrong, but the disadvantage that there’s a lot of duplicated data. Casting my mind back to 2000, I think that the decision was “we want a wiki running as soon as possible, and we don’t think it’ll be that big. Let’s just get this going.” I think that we did eventually run into space constraints, given that the delete() and purge() functions are in there for cleaning up old versions. For now, we’ll leave this as is, but it definitely needs addressing down the line.

I started off with a tidy of the class, and committing that by itself to make it easier to track actual changes vs. whitespace fiddling.

Second, I came back to fatal() and note(); these were way more complicated than they needed to be. I was originally wrapping up the callbacks so that they’d look like native methods, and cute as that was, it was actually less straightforward than simply returning a sub ref, and then just using that directly:

    $archive->fatal->(“this is our message”)

I went though the code I had previously and cleaned it all up to match this spec instead. In addition, the default fatal() callback now uses carp(), so we see the error in the proper context instead of down inside the class where the error was found, but not where it happened.

The setError and getError method names weren’t consistent, so I renamed them to set_error() and get_error().

The dirhandle that the PageArchive class needs was being referred to via a direct lookup in the object hash, which is a code smell. Wrapped this up in a method too. Some other code was also factored out into methods, notably _dirname (setter/getter for the directory we’re using for the archive) and _page_in_archive (co-locating all of the “synthesize a name for a file in the archive from the object, a name, and a version). dh_reset was renamed rewind(), because it’s essentially a rewinddir() call. We retained the method to allow Win32 to close and reopen the dirhandle, as that’s the only way Win32 could do that operation in 2000. (It may have gotten better, but I don’t run Win32.)

The big change at this point was committing to Test2 for all my testing (thanks, Chad!). It in particular made testing multi-process locking far easier to do. I still have to fork and wait (a little), but it’s much less work than before, and I can do tests in both the parent and child processes.

PageArchive::Sequential was further reorganized to make the methods peculiar to it private; this gets me further toward creating a PageArchive interface — or perhaps an abstract base class — to make swapping PageArchive engines easier.

I’d been away for a bit, so I added the Dist::Zilla Cover class to allow me to easily run Devel::Cover and see how good my test coverage was. Not great; File::LockDir was only 50% covered, so I decided to focus on getting that closer to covered before doing anything else. nlock_state wasn’t tested at all yet, and I was definitely going to need that for PageArchive::Sequential, so I wrote basic tests to verify that it worked. Discovered a few bugs, most notably that I wasn’t using the carefully-maintained locking cache to prevent having to read the directory each time.

Now I was ready to start in on PageArchive::Sequential. The basic new() test was already done, but it was time to start testing the actual page manipulation functions. I started with put() and page_exists(), as those two are kind of the most basic. In the process, I realized that I hadn’t properly tested the “there’s no such directory”, “this is a file, not a directory”, and “I can’t write to this” cases. Added tests for those.

The put() function was very straightforward, since Storable was pretty much handling all of it. I just needed to make sure that the files I expected did end up in the archive directory, and that was easy with -e and PageArchive’s page name generator method.

I fixed my first TODO: remove the path crawl down the supplied path and change that over to make_path(). That simplified new() significantly.

I now started to get into testing File::LockDir in context, and there were some rough edges here. I think most of them were introduced by swapping File::LockDir’s implementation to be an object, but there were a few issues that were there before that I either never saw, or lucked out along the happy path and didn’t hit. 

One interesting aside: I had added Try::Tiny to PageArchive::Sequential…but I added it right after the use strict and use warnings, and not inside the package! Fortunately Perl::Critic pointed this out for me; the code otherwise would silently have executed both blocks…

While running the tests, I saw a lot more tracing output in the log (printed to STDERR), and wrapped up more of it in if $self->debug. This entailed adding debug() to PageArchive::Sequential, as it didn’t have it. 

I had used “lockee” to refer to the locking user, and besides being an esoteric word, it was also wrong. Changed this to locking_user to be accurate. In addition, while doing this cleanup, the tests showed me that I wasn’t being consistent in what was returned for locked vs. not locked; I had to reorder the code slightly and return different values to ensure that was working right. I also noticed that the owner file inside the locking dir was generated via concatenation and a magic string, so I moved that to a method and switched it to File::Spec->catdir.

I added extra checks to the locking tests in PageArchive::Sequential to verify that locking one version marked all versions locked; this was because I wanted to prevent someone fiddling with the “commit” history while someone else has the file locked. I verified it was still possible to read the old versions via get() while the page was locked — this is desirable as someone may want to look at the page or older versions while someone else is editing.

In the process, I found that max_version was throwing some weird errors, though it seemed to be working. I tracked this down to the readdir of the archive directory quite correctly returning the locking directory names as well as the page file names. Adding a grep to eliminate them cleaned that up. In addition, I rewrote the code that took the filenames and found the max version to be a full-up Schwartzian transform, so the code now looks like this:

    my @pages = grep { /,\d+$/ } $self->page_exists($name);
    return unless @pages;

     my @indexes = map  { $_->[1] }
                  sort { $b->[1] <=> $a->[1] }
                  map  { /^.*,(.*)$/; [$_, $1] }
                  @pages;

     return shift @indexes;

We take the filtered page list, extract the version with a pattern match, create an array of [$name, $version], sort that descending numerically, and then shift it to return the highest number. Note also that we catch the “there aren’t any versions of this” case as well.

Last, I tested page gets: can I get an arbitrary version? If I specify no version, do I get the most recent one? Do gets still work when the page is locked? If I specify a bad version, do I get the correct error? 

Then it was tests for delete() and purge:

  • delete of a nonexistent version fails
  • delete with no version deletes newest
  • delete with explicit version deletes only that version
  • purge of a nonexistent page works (silently)
  • purge of an existing page deletes all versions

There’s an inconsistency between delete and purge; I think I’ll probably need to go more in-depth into App::WebWebXNG to figure out if this was purposeful, or just a brain fart.

Last test was for iterator(), which returns the highest revision of every file in the archive. (This would be trivial with a more modern SCM!) I did find another $^W = 0, which is a code smell…and no longer blocks the undef error I get in this code anyway. I removed it, changed the code that builds the hash recording the max version of each file so it does an exists check first, and everything now works.

That’s the breakpoint for today. File::LockDir is 96% covered, and PageArchive::Sequential is 97% covered; taking a quick look at the code, everything left mostly seems to be difficult-to-test error cases, so I’ll move on to App::WebWebXNG in the next “sprint” and see if anything bites me.

« Older entries § Newer entries »