Author: Joe McMahon

  • This bug is…32 years old?

    Last month, while working in the Perl debugger, I came across this strange behavior:

      DB<4> l 1.2
    1.2:	print "this is a placeholder";

    Well, that doesn’t look right.

      DB<5> l
    2.2 	
    3.2 	sub lorem {
    4.2:	  for(my $i = 1; $i < 100; $i++) {
    5.2:	    print <<EOF;
    6.2 	    Lorem
    7.2 	    ipsum
    8.2 	    dolor
    9.2 	    sit
    10.2 	    amet,
    11.2 	EOF

    Okay, that really doesn’t look right.

      DB<5> l 10.0.0.5
    10.0.0.5 	    amet,

    …I think there’s definitely something wrong here.

     l $17$
    Scalar found where operator expected (Missing operator before "$;"?) at (eval 153)[/Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/perl5db.pl:742] line 1, near "$17$;"

    Oh, no. No no no.

    So I dug in to the debugger code, and found this line:

    2805 elsif (my ($s, $e) = $spec =~ /^(?:(-?[\d\$\.]+)(?:[-,]([\d\$\.]+))?)?/ ) {

    This is the code that matches the arguments to the debugger’s l command, for either l start,stop or l start-stop. The problem is that this pattern matches a lot of things that make no sense at all, like

    l ......
    l -.
    l $
    l -.$.
    l -12
    l 3.14
    l 127.0.0.1
    l 17$

    Worse, as we saw above, some of them kind of work. Others…just don’t do anything, but are accepted.
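
    You can convince yourself of this by feeding those examples to the pattern on its own. This is a standalone sketch (not debugger code) that just borrows the regex quoted above from perl5db.pl:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Standalone sketch: run the nonsense examples against the same pattern
      # the debugger uses (quoted above). It never fails to match, because
      # every part of it is optional.
      my $range_pat = qr/^(?:(-?[\d\$\.]+)(?:[-,]([\d\$\.]+))?)?/;

      for my $spec ('......', '-.', '$', '-.$.', '-12', '3.14', '127.0.0.1', '17$') {
          my ($s, $e) = $spec =~ $range_pat;
          printf "%-11s start=%-11s end=%s\n", $spec, $s // '(none)', $e // '(none)';
      }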

    What causes this?

    First question: why does this work at all? The basic reason is that Perl is all about implicit conversions. Several of those expressions evaluate to 0 in a numeric context, so the debugger effectively executes an l 0. This doesn’t list anything, because the program source in the debugger’s magic source code array starts at line 1. But Perl arrays start at 0, so index 0 in the magic source array exists; it’s just empty. So the debugger happily lists nothing.

    For the floating-point numbers and IP addresses, Perl captures the list command’s argument as a string, then uses that “line number” as an index into the same magic source code array, incrementing it by 1 each time. The floating-point-looking strings convert to actual floats, which can be incremented by 1, and the index is automatically truncated to an int when it’s used to access the array. So the debugger finds the lines in the array and prints the current value of its index as the line number, which is how we end up with floating-point line numbers.
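
    The conversions themselves are easy to demonstrate outside the debugger; a tiny illustration (not debugger code):

      use strict;
      use warnings;
      no warnings 'numeric';

      my @source = ('', 'line one', 'line two', 'line three');  # index 0 exists, but is empty

      my $start = 0 + '......';                      # nonsense numifies to 0
      print "l $start lists: '$source[$start]'\n";   # the empty slot at index 0

      print 0 + '127.0.0.1', "\n";                   # 127 -- why "l 127.0.0.1" finds real lines

      my $i = '3.14';                                # captured as a string...
      print "$i: $source[$i]\n";                     # ...truncated to 3 when used as an index
      $i++;                                          # ...and incremented to 4.14 as a number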

    So why is this complicated regex here at all? Why aren’t we being a lot more strict about this?

    Digging into the commit history

    I sat down with the perl5 repo from GitHub and went spelunking. Going back through the commits shows that that exact pattern has been there for over 30 years – since Perl 3. It’s just been cargo-culted forward ever since.

    Let’s establish some history before we do anything else.

    (Aside: this was an interesting exercise, and I learned a bit more about navigating a very long and complex commit history with git. The big thing to keep in mind is that you can navigate much more easily through time with git reset --hard than with git checkout. I still ended up hitting a breaking commit of the OS/2 Makefile in there that prevented me from jumping all the way back to HEAD, and…actually kind of breaks the repo. There’s probably an easy way to repair it, but I ended up deleting the repo altogether and recloning to fix it.)

    Here’s a summary of the evolution of the debugger and its l command over time, all the way back to the original creation of the debugger. I got this by repeatedly using git blame to find the commit that added a change to the line with the regex, and then git reset --hard commit^ to jump back in time, repeating until we get so far back that there’s no debugger anymore.

    Change history for the debugger l command:

    Commit a559c25918b: perl 1.0 patch 8

    This is the original perldb commit. The parse loop is a for(;;) starting at line 99.

    l min-max: list lines (lines 131-135)
    l line: list a single line (lines 137-140)
    l: list the whole program (lines 141-146)

    The pattern is very simple, but doesn’t outlaw non-digits in the range.

    Range-match regex: /^l (.*)[-,](.*)/

    Commit 13281fa4f8: perl 2.0 patch 1, still perldb at the top level of the source directory

    The parse loop is still a for(;;), now starting at line 100. No changes in the l command support; the lines have moved down by 1, and the pattern for the range is the same.

    Range-match regex: unchanged.

    Commit a687059cbaf: perl 3.0, moved to lib/perldb.pl

    Command parse while() loop at line 64. Command parsing depends heavily on fallthrough to work.

    Lines 145-157 handle l subroutine_name with @sub, falling through to the range code to do the command.
    Lines 166-168 handle bare l: set up l start-end and fall through.
    Lines 169-173 handle l start+number: set up the corresponding l start-end and fall through.
    Lines 174-168 actually do the l start-end.

    First appearance of the complex match including $ and .; only . gets special handling (current line). No leading - is accepted yet, and there’s no handling of $, though the match accepts it.

    Range-match regex: /^l\s*(([\d\$.]+)(-,)?)?/

    Commit fe14fcc35f7: perl 4.0.00

    The command parse is now a while loop with a CMD label at line 90; commands use next to restart the REPL.

    No changes to the supported commands from the 3.0 debugger. Patterns now use \b to delimit the command from its arguments, but the actual commands remain the same; processing still uses fallthrough as before.

    Lines 201-219 handle l subroutine: find it in @sub, stack an l start-end, fall through.
    Lines 231-235 handle l nnnn+mmm: compute the end for l start-end, fall through.
    Lines 236-248 handle l range.

    The complex match is extended to match all of the arguments. $ and . are accepted.

    Range-match regex: /^l\b\s*(([\d\$.]+)([-,]([\d\$\.]+))?)?/

    Commit d338d6fe1df: perl 5.0, “Ilya’s new debugger”, lib/perl5db.pl

    l min+incr: list incr+1 lines starting at min.
    l min-max: list lines min through max.
    l line: list a single line.
    l subname: list the first window of lines from the subroutine.
    l: list the next window of lines.

    Lines 377-398 handle l subname by looking up the subroutine in the debugger’s magical structures, stacking an l start-end and falling through.
    The w and - commands are implemented as special cases of l, stacking an l range command and falling through.
    The l start+n command is also implemented with a stacked l range fallthrough.
    Lines 415-432 actually implement the l range command. . is still specially processed as the current line number, but $ is still ignored.

    Range-match regex: unchanged.

    Commit 54d04a52ebe: perl 5.0 beta 3

    Same commands as the previous entry. The l command code does not change in any significant way; the handling of the end of the range changes slightly, but there’s no new special-case support.

    First appearance of the leading - in the complex pattern. Nothing is added to specifically support it.

    Range-match regex: /^l\b\s*((-?[\d\$\.]+)([-,]([\d\$\.]+)?)?/

    Commit 492652be590: perl 5.7.2

    Major reorganization of the commands. The fallthrough code in the mainline CMD loop moves to cmd_l.

    Line 1687: l - lists the whole file (first occurrence).
    Line 1688: l $name, the subref list command.
    Line 1696 implements l function_name.
    Line 1724 implements the bare l command.
    Line 1728 implements l line+nnnn.
    Line 1734 retains the complex match for l start-end and l start,end. . is still special-cased. No change to the regex.

    Range-match regex: /^((-?[\d\$\.]+)([-,]([\d\$\.]+))?)?/

    Commit eda6e075b0c: perl 5.9.x

    Big documentation patch, no functional changes. Code moves a bit, but doesn’t change at all.

    Range-match regex: unchanged.

    Commit d12a485175b: perl 5.9.x

    Retracts the erroneous doc patch. No functional changes.

    Range-match regex: unchanged.

    Commit 69893cffeb6: perl 5.9.x

    Rearranges things a bit to minimize the delta between the 5.8 and 5.9 debuggers, but no functional changes.

    Range-match regex: unchanged.

    Commit e22ea7cc8db: perl 5.9.x

    Reverts the previous patch.

    Range-match regex: unchanged.

    Commit 613bf352ae2: perl 5.10.0

    “Extract _cmd_l_range”. The beginning of the modern debugger architecture. The regex is unchanged, but the code is reorganized like this:

    1) The regexes have been updated to remove the leading command character.
    2) cmd_l parses the different commands, and each is handed off to a specific implementation routine. The complex regex routes to _cmd_l_range.
    3) . is processed as the current line in _cmd_l_range. $ is still unprocessed.

    Range-match regex: /^(?:(-?[\d\$\.]+)(?:[-,]([\d\$\.]+))?)?/

    Commit be43a6d374e: minor changes to remove extra comments and streamline names. No other changes. Range-match regex: unchanged.

    Commit 401da5225b7: more mostly-cosmetic changes. No significant logic changes. Range-match regex: unchanged.

    Commit b7a96fc9f53: more code rearrangement and streamlining of cmd_l. No logic changes. Range-match regex: unchanged.

    So there are essentially four eras in the evolution of the debugger (plus a transitional half-step into Perl 5):

    • Era 1, the initial version in Perl 1 and Perl 2, which used an explicit simple range match.
    • Era 2, the appearance of the complex match in Perl 3.0.
    • Era 2.5, the transition to Perl 5.
    • Era 3, the extraction of commands in Perl 5.7.2 to separate subs.
    • Era 4, the modern (and testable) command architecture in Perl 5.10.0.

    From looking over all these changes, it seems to me that what happened was that the pattern was intended to be a catchall for all the range-ish commands, but it wasn’t explicitly broken up so that it was clear exactly what those commands were supposed to be. My best guess is that Larry intended to implement l $var somehow in Perl 3; I believe assigning globs sort of worked like references at that point, and it would probably have been possible to expose the sub slot of the glob to the debugger…but this never happened.

    Since the handling was all in one block with fallthroughs, creating the “match all the possible things” regex makes more sense in that context; I’m again guessing that the code was just going to ignore nonsense the way it does now, and that no one was expected to be silly enough to type floating point numbers or IP addresses at it.

    The implicit conversion to integer when these values were used as indexes into the magic line array meant that the command worked “well enough”: really dumb stuff like IP addresses and strings of periods was still converted to integers when looking up lines, the increment of the current-line variable kept things moving, and it all never caused enough of a problem to force a fix.

    I suppose you could be lazy and call it an “Easter egg”, but having all the weird things accepted feels sloppy.

    What’s the “right thing”?

    My best guess at what’s actually supposed to have been implemented sometime between Perl 3 and now is

    • l -: list the whole file. This does work, but it probably should be called out as a specific case.
    • It’s likely that l -nnnn (list from here to line nnnn) was intended to work, since the match is there for it, but doesn’t.
    • It would seem logical that l nnnn- (list from line nnnn to EOF) should also be wanted and work, but also doesn’t.
    • Variations on using . as one or the other of the arguments: l nnnn,. (list from some previous line to the current one) and l .,nnnn (from here to some later line) probably ought to work. l .+5 should also work. Either these don’t work at all, or they don’t work as one would expect.
    • Technically, l .,. should be legal and work, and do the same thing as l ., but admittedly it’s silly and I think we’d be forgiven if it didn’t. (It currently doesn’t.)
    • Bare $ is a mystery. You can enter it, and it matches, but it does nothing. There’s nothing in the code or comments that points to an intended use. Perhaps it was meant to be a magic “last line” substitute, or maybe it’s an artifact of “we should be able to match stuff in a scalar at some point”, which ended up implemented as a completely separate case. Since bare $ has never worked, I think we’re OK dropping it.
    • Several other things match and the debugger tries to process them, but they’re nonsensical. l $$$$$$ is effectively l. l $fred$ matches but causes the debugger to throw an error; it should probably be rejected altogether.
    • There is no “I don’t like what you entered” clause. Either something weird happens, nothing happens, or there’s an error.

    The problem is that the complex regex currently matches far more than it ought to. We really need to establish what the acceptable forms are, then break that regex up so that it doesn’t match nonsense and have cmd_l explicitly reject anything that falls through. We’re only getting along because of Perl’s forgiving nature, which means that peculiar things are accepted and (sort of) work.

    Next steps

    I’d suggest that we decide what the valid inputs are, and what they should do, and that the giant catchall regex should go away in favor of something much more like Larry’s original Perl 1 version.

    I’m going to put a patch together in the short term that will simply not match most of the weird options, and add a “list range specification invalid” catchall else in _cmd_l_range or _cmd_l. We can decide if we want to add support for some of the more esoteric options (like l .,nnnn and so on) later; just making sure that we don’t assign nonsense to the start of the range that keeps propagating through further list commands seems like enough of a fix for now.
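
    For what it’s worth, here’s a minimal sketch of the stricter shape I have in mind. The helper names (list_range, reject) are placeholders, not actual perl5db.pl routines, and the accepted forms are deliberately limited to plain line numbers and .:

      use strict;
      use warnings;

      # Placeholders for whatever the real implementation routines end up being.
      sub list_range { my ($start, $end) = @_; print "list $start to $end\n" }
      sub reject     { my ($msg) = @_;          print "$msg\n" }

      my $line_num = qr/\d+|\./;    # a plain line number, or "." for the current line

      for my $spec ('5', '5-10', '.,20', '127.0.0.1', '17$') {
          if ($spec =~ /^($line_num)(?:[-,]($line_num))?$/) {
              my ($start, $end) = ($1, defined $2 ? $2 : $1);
              list_range($start, $end);
          }
          else {
              reject("List range '$spec' is invalid.");
          }
      }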

  • Mojo Models and Controllers

    I flailed around for a while trying to get a good handle on how to properly start moving the old wiki functions from the script in hold/ to someplace proper in the Mojolicious tree. After a good bit of reading I finally stumbled across Ashutosh’s great intro article on assembling a model and controller. I used his registration and login controller articles to model the following:

    • A model for users, WebWebXNG::Model::Users. I used his basic model and changed it up a bit, adding a validated field and a field for the wiki username.
    • A model for settings, WebWebXNG::Model::Settings; this was a little tricky because I wanted only one possible settings record, and coding an SQLite UPSERT operation led to complicated code tracking which fields were updated and which weren’t. I got around this by using a fixed record ID and always either inserting or updating that record by simply checking if it’s there first.
    • A registration controller, WebWebXNG::Controller::RegistrationController, that displays the registration form on a GET and tries to use the form contents to register the user on POST. The registration code builds a wiki username in four steps, checking each time whether the name conforms to wiki syntax and trying the next step if not (there’s a sketch of this fallback after the list):
      • If the user supplied their own username, use it.
      • If they did not supply a username, try first_name . last_name without changing capitalization.
      • If that doesn’t work, try ucfirst(lc(first_name)) . ucfirst(lc(last_name)).
      • If that doesn’t work, it’s likely that the name contains non-ASCII characters, and we use Text::Unidecode to try to decode it into close-as-we-can ASCII.
    • A login controller, WebWebXNG::Controller::LoginController, that checks if the user’s been validated, and if so, then checks if the password is good. This most closely resembles the old code (though notably, that code did not do any user validation. Any bozo could sign up, but fortunately the NASA firewall was tight enough that the wiki wasn’t visible!)
    • A display controller is sketched in but not ready yet.
    • Templates for the login and register pages, and a start on the display page.
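
    Here’s roughly what that four-step username fallback looks like. This is a sketch: is_wiki_word(), its wiki-name rule, and build_wiki_username() are placeholders, not the actual registration controller code.

      use strict;
      use warnings;
      use Text::Unidecode qw(unidecode);

      # Assumed wiki-name rule: two or more capitalized "words" run together.
      sub is_wiki_word { return $_[0] =~ /^(?:[A-Z][a-z0-9]+){2,}$/ }

      sub build_wiki_username {
          my ($supplied, $first, $last) = @_;
          $_ //= '' for $supplied, $first, $last;

          my @candidates = (
              $supplied,                                     # 1. user-supplied name, if any
              $first . $last,                                # 2. names exactly as entered
              ucfirst(lc $first) . ucfirst(lc $last),        # 3. normalized capitalization
              ucfirst(lc unidecode($first)) . ucfirst(lc unidecode($last)),  # 4. ASCII-folded
          );

          for my $candidate (@candidates) {
              return $candidate if length($candidate) && is_wiki_word($candidate);
          }
          return;   # nothing usable; the caller reports the failure
      }

    With that rule, build_wiki_username('', 'joe', 'mcmahon') falls through to step 3 and yields JoeMcmahon.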

    I added helpers (created with $self->helper) for sqlite (the database), users (a model instance that we pass the sqlite helper to), and settings (same as for users). I moved the routes back into the mainline code of WebWebXNG.pm because factoring them out seemed to just create more complexity for no real benefit.
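
    The wiring looks something like this. It’s a sketch: it assumes Mojo::SQLite and that the model constructors accept the handle as an attribute, which may not match the real code exactly.

      # Inside WebWebXNG::startup() -- sketch of the helper wiring.
      use Mojo::SQLite;
      use WebWebXNG::Model::Users;
      use WebWebXNG::Model::Settings;

      my $sql = Mojo::SQLite->new('sqlite:' . $self->config('SQLITE_FILE'));

      $self->helper(sqlite   => sub { $sql });
      $self->helper(users    => sub { WebWebXNG::Model::Users->new(sqlite => shift->sqlite) });
      $self->helper(settings => sub { WebWebXNG::Model::Settings->new(sqlite => shift->sqlite) });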

    WebWebXNG now also has some 12-factorish support in that we can specify a list of environment variables that are to be reflected into the config; the only one added so far is SQLITE_FILE, which sets the filename of the SQLite database. There’s also code in place to ensure that the config meets the minimum standards (again, SQLITE_FILE is the only hard requirement so far, but PAGEARCHIVE_DIR and LOCKDIR are probably next).
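
    The reflection itself is only a few lines; a sketch of the idea (only SQLITE_FILE is required today, and the other names are the likely next additions):

      # Reflect selected environment variables into the Mojolicious config,
      # then enforce the minimum required set.
      my @reflected = qw(SQLITE_FILE PAGEARCHIVE_DIR LOCKDIR);
      my @required  = qw(SQLITE_FILE);

      $self->config($_ => $ENV{$_}) for grep { defined $ENV{$_} } @reflected;

      for my $var (@required) {
          die "Required configuration value '$var' is missing\n"
              unless defined $self->config($var);
      }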

    All this is now in place, but sadly under-tested. The “did we set up the database” test is in and passes, but little else. I did run the service and verify that I can display the login and register forms, but the new code added means that I can’t just throw in some data and test everything.

    It was necessary to add a good bit of POD as well to get dzil test to pass, but it does, and all the code is committed and pushed up to GitHub.

    Next sprint will be devoted only to getting all this under test.

  • Getting my Mojo workin’

    And now the real work starts.

    I’ve gotten the support code in pretty good shape, so now what’s needed is to tackle upgrading the actual processing to Mojolicious. 

    I started out by moving the App::WebWebXNG module that contained the core code into a new hold/ directory; now I needed to create the Mojolicious app. I originally thought I’d be able to just create this piecemeal, so I created a new dummy lib/App/WebWebXNG.pm modulino that just loaded Mojolicious and did main unless caller; with a dummy main. I then needed to add enough POD to get dzil test to pass, so I added some skeleton docs in the new lib/App/WebWebXNG.pm.

    I changed main to startup, since Mojo wanted that, and added stub versions of the routes for all of the functions in WebWebX. This was easy because I could simply take the dispatch hash table and change all the keys to calls to $self->routes->get, like this:

      $self->routes->get('/SearchRefs')->to("dev#hello");
      $self->routes->get('/ViewPage')->to("dev#hello");
      $self->routes->get('/ShowDiffs')->to("dev#hello");
      ...

    Getting a little further into it, I realized that I really wanted a properly-structured Mojolicious application instead of a Lite one, which would have more closely resembled the original, so I used mojo generate app to generate a full-up app skeleton. I originally tried this inside the existing directory, but this created a nested app instead, so I moved up to the containing directory, ran mojo generate app there, and copied the resulting files and directories into the existing repo. I then moved the route creation into the new lib/WebWebXNG.pm and deleted lib/App/WebWebXNG.pm, and of course needed POD again. Also removed the 01load.t test for the no-longer-extant module and some use statements for it that were lying around.

    Around this time I started getting irritating messages about prototypes from Perl::Critic; apparently the current version of one of the Perl::Critic core policies doesn’t properly detect signatures, and the fix is not yet forthcoming. I opened .perlcriticrc and realized that I’d been using the template version I’d been copying into code since I started working at Zip, and it was still referencing Zip Perl modules. I cleaned those up and moved the .perlcriticrc to perlcritic.rc, which is where Dist::Zilla expects it. I also deleted the lib/App/WebWebXNG/Controller/Dev.pm controller, since I didn’t want or need it, added POD to the generated lib/WebWebXNG/Controller/Example.pm so it would pass the POD tests, added [-Subroutines::ProhibitSubroutinePrototypes] to turn off the failing “prototype” checks, and stopped work for the day.

    Tests are passing again; the code doesn’t do anything yet, but we’re on the way to start reimplementing the controllers and creating the HTML templates.

  • Using Perl to simulate a numbers station

    On the Disquiet Junto Slack, one of our members posted that they’d had a dream:

    I had a dream about a piece of gear last night. I wouldn’t say that it was “dream gear,” though it was still cool. It was a small black metal box, about the size of three DVD cases stacked on top of each other. There were a few knobs and sliders, a small 3-inch speaker, headphone out, and a telescoping antenna, so it kinda looked like a little radio at first. The antenna was there for radio reception but there was other stuff going on. It was intended to be used as a meditation/sleep aid/ASMR machine. There were sliders for a four-band EQ and a tuning knob for the radio. The tuning knob had a secondary function that tuned a drone sound (kinda sounded like a triangle wave fed through a wavefolder/resonance thinger). The other feature of this box was something like a numbers stations generator. Another slider was for the mix between the drone and a woman’s voice speaking random numbers and letters from the NATO alphabet in a Google Assistant-/Alexa-/Siri-type voice but with far less inflection. The four-band EQ was to be used like a mixer as well in that it was how a person could adjust how much of the radio signal was audible over the drone/numbers by using the output gain of the EQ. There was also a switch that fed the drone/numbers signal into the EQ as well. The EQ was intentionally low-quality so that when you took it above 0dB, it would distort.

    The Disquiet Junto Slack, #gear channel

    Now, what was weird was that I’d been doing something like this in AUM already; I had a quiet ambient Dorian sequence driven by ZOA on several instances of KQ Dixie (a DX7 emulator), and was using Radio Unit (a radio streaming AU) to layer in some birdsong. I realized I could mostly emulate the dream box if I added another Radio Unit to pull in some random stations, but generating the “numbers station” audio was more of a challenge – until I remembered that OS X has the say command, which lets you use the built-in speech synthesizers to pronounce text from the command line.

    I sat down, and after some fiddling (and looking up “how to add arbitrary pauses” so the rhythm was right), I created NATO::Synth to create the strings I wanted and pass them to say. It has a few nice little tweaks, like caching the strings created so it can decide to repeat itself, and properly inflecting the start and end of each “sentence”.
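
    The core of the idea is small enough to sketch. This isn’t the NATO::Synth source; the voice name and the [[slnc]] pause markup are assumptions about what the macOS say command accepts:

      use strict;
      use warnings;

      # Sketch of a numbers-station generator: random NATO words and digits,
      # with embedded pauses, handed off to the macOS speech synthesizer.
      my @nato   = qw(alfa bravo charlie delta echo foxtrot golf hotel india juliett
                      kilo lima mike november oscar papa quebec romeo sierra tango
                      uniform victor whiskey xray yankee zulu);
      my @digits = (0 .. 9);

      sub sentence {
          my @items = map { rand() < 0.5 ? $nato[rand @nato] : $digits[rand @digits] } 1 .. 5;
          return join ' [[slnc 400]] ', @items;   # short pause between items
      }

      for (1 .. 3) {
          my $text = sentence();
          print "$text\n";
          system('say', '-v', 'Samantha', $text);   # voice name is an assumption
      }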

    I saved the generated audio (recorded with Audio Hijack) to iCloud, loaded it into AUM, and then recorded the results. Very pleased with it!

  • Further testing of File::LockDir and starting on PageArchive

    I spent some time in this last set of work cleaning up some more stuff in PageArchive and File::LockDir.

    First, I renamed the PageArchive::RCS class to PageArchive::Sequential, because it really doesn’t share much with RCS other than a file naming convention (name,version to denote a file’s version). It actually just stores each version of the file: no deltas, no branches, just a linear set of files for each version as it’s saved.

    This has the advantage of being so stupid that it’s hard to get wrong, but the disadvantage that there’s a lot of duplicated data. Casting my mind back to 2000, I think that the decision was “we want a wiki running as soon as possible, and we don’t think it’ll be that big. Let’s just get this going.” I think that we did eventually run into space constraints, given that the delete() and purge() functions are in there for cleaning up old versions. For now, we’ll leave this as is, but it definitely needs addressing down the line.

    I started off with a tidy of the class, and committing that by itself to make it easier to track actual changes vs. whitespace fiddling.

    Second, I came back to fatal() and note(); these were way more complicated than they needed to be. I was originally wrapping up the callbacks so that they’d look like native methods, and cute as that was, it was actually less straightforward than simply returning a sub ref, and then just using that directly:

        $archive->fatal->("this is our message")

    I went through the code I had previously and cleaned it all up to match this spec instead. In addition, the default fatal() callback now uses carp(), so we see the error reported in the caller’s context rather than down inside the class, where the error was detected but didn’t actually happen.
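
    The accessors end up being about this shape (a sketch, not the verbatim PageArchive::Sequential code; it assumes new() stashed any caller-supplied callbacks in the object):

      use Carp qw(carp);

      sub fatal {
          my ($self) = @_;
          return $self->{fatal} //= sub { carp @_ };   # default: report from the caller's context
      }

      sub note {
          my ($self) = @_;
          return $self->{note} //= sub { print STDERR @_, "\n" };   # default: simple trace output
      }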

    The setError and getError method names weren’t consistent, so I renamed them to set_error() and get_error().

    The dirhandle that the PageArchive class needs was being referred to via a direct lookup in the object hash, which is a code smell, so I wrapped that up in a method too. Some other code was also factored out into methods, notably _dirname (a setter/getter for the directory we’re using for the archive) and _page_in_archive (which co-locates all of the “synthesize a name for a file in the archive from a name and a version” logic). dh_reset was renamed rewind(), because it’s essentially a rewinddir() call. We retained the method to allow Win32 to close and reopen the dirhandle, as that was the only way Win32 could do that operation in 2000. (It may have gotten better, but I don’t run Win32.)

    The big change at this point was committing to Test2 for all my testing (thanks, Chad!). It in particular made testing multi-process locking far easier to do. I still have to fork and wait (a little), but it’s much less work than before, and I can do tests in both the parent and child processes.

    PageArchive::Sequential was further reorganized to make the methods peculiar to it private; this gets me further toward creating a PageArchive interface — or perhaps an abstract base class — to make swapping PageArchive engines easier.

    I’d been away for a bit, so I added the Dist::Zilla Cover class to allow me to easily run Devel::Cover and see how good my test coverage was. Not great; File::LockDir was only 50% covered, so I decided to focus on getting that closer to covered before doing anything else. nlock_state wasn’t tested at all yet, and I was definitely going to need that for PageArchive::Sequential, so I wrote basic tests to verify that it worked. Discovered a few bugs, most notably that I wasn’t using the carefully-maintained locking cache to prevent having to read the directory each time.

    Now I was ready to start in on PageArchive::Sequential. The basic new() test was already done, but it was time to start testing the actual page manipulation functions. I started with put() and page_exists(), as those two are kind of the most basic. In the process, I realized that I hadn’t properly tested the “there’s no such directory”, “this is a file, not a directory”, and “I can’t write to this” cases. Added tests for those.

    The put() function was very straightforward, since Storable was pretty much handling all of it. I just needed to make sure that the files I expected did end up in the archive directory, and that was easy with -e and PageArchive’s page name generator method.

    I fixed my first TODO: remove the path crawl down the supplied path and change that over to make_path(). That simplified new() significantly.

    I now started to get into testing File::LockDir in context, and there were some rough edges here. I think most of them were introduced by swapping File::LockDir’s implementation to be an object, but there were a few issues that were there before that I either never saw, or lucked out along the happy path and didn’t hit. 

    One interesting aside: I had added Try::Tiny to PageArchive::Sequential…but I added it right after the use strict and use warnings, and not inside the package! Fortunately Perl::Critic pointed this out for me; the code otherwise would silently have executed both blocks…

    While running the tests, I saw a lot more tracing output in the log (printed to STDERR), and wrapped up more of it in if $self->debug. This entailed adding debug() to PageArchive::Sequential, as it didn’t have it. 

    I had used “lockee” to refer to the locking user, and besides being an esoteric word, it was also wrong. Changed this to locking_user to be accurate. In addition, while doing this cleanup, the tests showed me that I wasn’t being consistent in what was returned for locked vs. not locked; I had to reorder the code slightly and return different values to ensure that was working right. I also noticed that the owner file inside the locking dir was generated via concatenation and a magic string, so I moved that to a method and switched it to File::Spec->catdir.

    I added extra checks to the locking tests in PageArchive::Sequential to verify that locking one version marked all versions locked; this was because I wanted to prevent someone fiddling with the “commit” history while someone else has the file locked. I verified it was still possible to read the old versions via get() while the page was locked — this is desirable as someone may want to look at the page or older versions while someone else is editing.

    In the process, I found that max_version was throwing some weird errors, though it seemed to be working. I tracked this down to the readdir of the archive directory quite correctly returning the locking directory names as well as the page file names. Adding a grep to eliminate them cleaned that up. In addition, I rewrote the code that took the filenames and found the max version to be a full-up Schwartzian transform, so the code now looks like this:

        my @pages = grep { /,\d+$/ } $self->page_exists($name);
        return unless @pages;

        my @indexes = map  { $_->[1] }
                      sort { $b->[1] <=> $a->[1] }
                      map  { /^.*,(.*)$/; [$_, $1] }
                      @pages;

        return shift @indexes;

    We take the filtered page list, extract the version with a pattern match, create an array of [$name, $version], sort that descending numerically, and then shift it to return the highest number. Note also that we catch the “there aren’t any versions of this” case as well.

    Last, I tested page gets: can I get an arbitrary version? If I specify no version, do I get the most recent one? Do gets still work when the page is locked? If I specify a bad version, do I get the correct error? 

    Then it was tests for delete() and purge:

    • delete of a nonexistent version fails
    • delete with no version deletes newest
    • delete with explicit version deletes only that version
    • purge of a nonexistent page works (silently)
    • purge of an existing page deletes all versions

    There’s an inconsistency between delete and purge; I think I’ll probably need to go more in-depth into App::WebWebXNG to figure out if this was purposeful, or just a brain fart.

    Last test was for iterator(), which returns the highest revision of every file in the archive. (This would be trivial with a more modern SCM!) I did find another $^W = 0, which is a code smell…and no longer blocks the undef error I get in this code anyway. I removed it, changed the code that builds the hash recording the max version of each file so it does an exists check first, and everything now works.

    That’s the breakpoint for today. File::LockDir is 96% covered, and PageArchive::Sequential is 97% covered; taking a quick look at the code, everything left mostly seems to be difficult-to-test error cases, so I’ll move on to App::WebWebXNG in the next “sprint” and see if anything bites me.

  • iTunes Swedish Death Cleaning

    If you haven’t heard of “Swedish Death Cleaning”, the idea is that when you finally do drop dead, it’d be polite not to saddle whoever is taking care of your stuff with a big job of “is this important? should I keep it? should I just give all this away, or throw it away, because it’s just too much?”. It’s also about living with just the stuff that actually means something to you on a daily basis, as opposed to “I may want this someday, so I’ll keep it in my life, gathering dust and generating clutter.”

    I definitely need to do more of that in my physical life, but this weekend I embarked on it in my digital one. Like most people, when I finally had iTunes and no longer had an actual “how full are my shelves?” physical limit, I started hoarding music. I had a lot of stuff from my old CD collection, music I’d bought from iTunes, the StillStream library from when I was maintaining the music library for that station’s ambient robot, music from friends who’d lent me CDs, stuff I’d borrowed from the library and timeshifted into iTunes to listen to “later”, free releases from Amazon…basically a huge pile of stuff. Worse, I’d put all this in iTunes Match, so even if I cleaned out my library, turning iTunes Match on again would just put all the crud back.

    In addition, my partner didn’t have a music library at all, because the internal disk on her laptop was too small to keep all of her work and optional stuff as well. There was an offline copy of her old music library, and it too had grown over the years from music lent to her, music I thought she might like, and so on. She wanted to be able to pack up her CD collection and put it into storage, and maybe get rid of some of it as well. So we needed to take our old libraries and clean out anything that we didn’t want, and then see what each other might have that the other person might want afterward.

    I spent a couple evenings last week ripping the CDs she didn’t have online yet into a separate library, so they wouldn’t be part of the existing mess, and then went through and did the following in a brand new library:

    • Anything she actually owned got copied in. iPhoto’s ability to let me photograph the discs on the shelf and copy the text off of them came in very handy to make sure I got them all.
    • Anything I didn’t find in the library on that pass got ripped into this new library.
    • The not-previously ripped CDs in the secondary library were copied in.

    At this point, she had a clean “definitely mine” library. Now it was time to clean mine up. I had done one pass already to strip it down, but I wanted to make sure that I both cleaned out my iTunes Match library and made a conscious decision, “keep or not” for anything in there that I didn’t already have in the stripped-down library.

    The easiest way to do this was to create a brand new, empty library, and connect that to iTunes Match, after turning on the “I want lossless copies” option — this is apparently new in Ventura, and is very welcome. Once this synced up, I could download and copy in only the things I knew I wanted to keep. This meant I would actually have to look at the music and ask, “do I really want to listen to this again?”, but not having to pull it out of an existing library would help.

    In addition, my partner had asked me to give her a copy of music of mine that I know she likes; we share a liking for world music, and several different other artists. After a little thought, I came up with the following:

    • There’s probably music in iTunes Match that we both want, and there’s definitely music I want. So let’s do this:
      • Create a new folder on a scratch disk that will contain music to add to her library.
      • Do the same for music I want to add to mine.
      • Drag those into the favorites in the finder.
      • Drag the Media folder from my target library to the sidebar as well. This will let me quickly check whether a given release is already in my library, and if it is I can skip downloading it altogether, unless I want to give my partner a copy.
      • As I process each release in the Match library, I do the following:
        • If my partner would like it, download it.
        • If I want to keep it myself, open a Finder window using the Media folder shortcut and check if I have it.
          • If I do, simply delete it from the iTunes Match library (which also takes it out of iTunes Match).
          • If I don’t, download it.
        • If I downloaded it, right-click on one track in the iTunes list, and “Show in Finder”. This pops up a new Finder window with all the tracks for the release in it.
        • Command-Click on the folder name in the top bar of the window and go up one level to see the release in its enclosing folder.
        • Drag the release folder to the sidebar aliases for the “music to add” folders as appropriate.
        • Delete the tracks in iTunes. This removes them from the iTunes Match library, and iTunes Match as well.

    This took the better part of two days to finish, but I now have two cleaned-up music libraries, and an empty iTunes Match. I am considering whether to retain iTunes Match, mostly because it’s not a “backup” — it’s just a convenient way to share music across my devices, and doesn’t guarantee I’ll get the original file back.

    I’ve probably lost fidelity on some of the tracks I added to Match, and it’s possible some of them now have DRM. I will do another pass at some point and see; I’m not sure if it really makes a lot of difference to me right now, but I can always play them through Audio Hijack and re-record them to remove the DRM if I decide I want to.

    We also wanted a list of “what you have that I don’t” for both the final libraries; I was able to do that with Google Sheets, but I’ll post that as a separate article.

  • A full day of file locking

    Yesterday’s job was tackling File::LockDir, finishing up the conversion to a class, and actually putting it under test.

    Locking files on NFS

    The original version of this library was designed to do locking of a shared directory on an NFS fileserver. This meant that we had to figure out some workarounds for the inherently asynchronous filesystem.

    Our big breakthrough was figuring out that mkdir() was atomic on NFS. This meant that we could use the creation and removal of a directory as a semaphore; creating it was atomic, and if it already existed, trying to create it would return an error. Code that needed sequential access to a resource on NFS could make a directory using a standard naming convention. Once it was able to do so, it knew it had locked the corresponding resource. Releasing the lock was a simple matter of removing anything in the directory (we write a file there to record the owner and how long they’ve had it) and then doing rmdir() to release the resource. Anyone trying to get the resource when it was locked could spin on a mkdir and sleep loop until either they timed out or got the lock, which would then be cached in the locked files cache.

    Diving in: the very basics

    The original version of this code had a package global for the locked file cache, and twiddled the symbol table to add its logger and fatal-error callbacks, effectively making them package globals as well. I wanted to clean all this up and get everything tested.

    First things first: the old library exports the locking functions, and we don’t want to do that anymore. They should only be called as instance methods. Took out the use of Exporter, and got rid of the @ISA and @EXPORT arrays.

    Adding tests, the first set of tests is for new().

    • We need to test if the logger and fatal parameter validation and installation works. The new test adds throws_ok tests that add an invalid argument for each of those, and a lives_ok one for valid arguments.
    • We then also want to test that the valid arguments work, so I passed in closures that pushed data onto arrays in the test’s namespace. Calling these via the File::LockDir methods allowed me to verify that the data was captured for both note() and fatal(), and that fatal() properly did a croak so that we could see where the error had actually occurred.
    • Some minor tweaking of the code in File::LockDir was needed to set the values in the object, to add the default note() and fatal() actions, and then the actual code that gets the methods properly in place.

    Items of note:

    • It was easiest to call the callbacks by assigning them to a scalar, and then calling $callback->(@args). There’s probably a cool way to dereference and call all in one shot, but this works and made it clear what was going on — a better investment of time.
    • I forgot several of the getters until I was actually testing the code because the original code had just stored everything in package globals. This was simple and I probably could have left it, but package globals are a code smell, and I wanted to be able to have each File::LockDir object be completely independent. Simpler for testing if nothing else.
    • I upgraded testing libraries to Test2::V0 partway through, and it vastly simplified getting the tests written.

    The locked file cache

    Originally, this was yet another package global hash, which has the advantage of being dead simple to access, but means that all instances of File::LockDir were potentially fighting over the hash. Rather than setting myself up for data races, I decided to move this to the object as well.

    The second set of tests actually instantiated the File::LockDir object, and then verified it had properly validated its arguments and saved the data into the object. I chose to implement the method as if it were polymorphic (there’s a sketch after the list):

    • If called with one argument, locked_files() assumed this was an initializer hash and saved the supplied hash reference, after validating that it actually was a hash reference.
    • If it’s not a hashref, then it assumes this must be a key to look up a locked file and return the lock holder’s info. It does a hash lookup with the key and returns the value. (Unlocked files will populate the hash with undefs for each lookup; this probably should be insulated a bit with exists() calls to keep from vivifying hash entries that don’t contain anything.)
    • Last, if there are two arguments, they are assumed to be a key and a value, and this key-value pair is stored in the hash.
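
    A sketch of that accessor shape (not the verbatim File::LockDir code):

      sub locked_files {
          my ($self, @args) = @_;

          if (@args == 1 && ref $args[0] eq 'HASH') {        # initializer hash
              $self->{locked_files} = $args[0];
          }
          elsif (@args == 1) {                               # key: look up the lock holder info
              return $self->{locked_files}{ $args[0] };
          }
          elsif (@args == 2) {                               # key + value: record a lock
              $self->{locked_files}{ $args[0] } = $args[1];
          }
          return $self->{locked_files};
      }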

    How the locking data works, and testing the locking

    We start off with a base directory; it doesn’t have to be the same path as the page repository, but nothing stops it from being the same. (In our original deployment, it was easier for us to have the page repository and the locking directory share the same storage; this wasn’t a problem because the filenames of the pages couldn’t contain non-alphabetics that are used to name the lock directory.)

    In that directory, we do a mkdir() for filename.LOCK. This is atomic in all the filesystems we care about, and fails if the directory already exists. This gives us an atomic test-and-set operation for the file locking semaphore. Once we’ve created the lock directory, no one else is (by convention) going to access it, so it’s now safe to open, write, and close a file inside that semaphore directory. We open filename.LOCK/owner and write the username of the owner plus the date and time.
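
    Stripped of retries, logging, and the cache, the acquire/release pair is essentially this (a sketch of the scheme, not the File::LockDir source):

      use strict;
      use warnings;
      use File::Spec;

      sub acquire_lock {
          my ($lock_base, $file, $who) = @_;
          my $lock_dir = File::Spec->catdir($lock_base, "$file.LOCK");

          # mkdir is atomic: exactly one caller can create the directory.
          return 0 unless mkdir $lock_dir;

          open my $fh, '>', File::Spec->catfile($lock_dir, 'owner')
              or die "can't record lock owner: $!";
          print {$fh} "$who " . localtime() . "\n";
          close $fh;
          return 1;
      }

      sub release_lock {
          my ($lock_base, $file) = @_;
          my $lock_dir = File::Spec->catdir($lock_base, "$file.LOCK");
          unlink File::Spec->catfile($lock_dir, 'owner');
          rmdir $lock_dir or die "can't remove $lock_dir: $!";
      }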

    This requires the following tests:

    1. Validate that we got a directory name, that the directory exists, and that we can write to it. (If the lock directory can’t be written, we can’t maintain locks.)
    2. Verify we return success and the lock info if the file is in locked_files. (The cache works.)
    3. With an empty locked_files, try to lock a file in the lock directory. Verify it succeeds, sets the locked_files cache and that the requestor’s username is in the locking data. (We can lock a file and cache it.)
    4. With an empty locked_files and a .LOCK directory in the locks directory for the file, verify that we fail when we try to lock the file again, and that we get the stored locking data. (Trying to relock a file that has a .LOCK directory fails, loads the locked_files cache, and returns the current owner.)
    5. Create an empty directory and cache. Lock a file. Verify it got locked. Fork another process that unlocks the file and exits, while trying to lock the same file with a different user. The unlocking process should succeed, the locking process should succeed, and the file should now be locked by the second user. (The retry process works, and properly relocks the file if another lock owner unlocks it.)

    The only really tricky test is the two-process one; that gets handled with Test2::AsyncSubtest, and looks like this:

    # 5. Happy-ish path: lock exists, but goes away before tries elapse.
    # Locking succeeds.
    my $dir = tempdir(CLEANUP => 1);
    $path = File::Spec->catfile($dir, 'foo');
    # callback capture arrays. Not used this test, but clear them anyway
    @l = ();
    @f = ();
    # Create a new File::LockDir object for this run.
    $locker = new_locker(tries => 10, sleep => 1);
    # Fork: the child immediately unlocks the file and exits;
    #       the parent spins waiting for the lock, gets it,
    #       and waits for the child to exit.
    my $ast = Test2::AsyncSubtest->new(name => 'unlocker');
    $ast->run_fork(
      sub {
        my $locker2 = new_locker();
        $locker2->nfunlock($path);
        # This is a subtest, so at least one test has to happen.
        pass "Async unlock succeeded";
    });
    # Parent: try to lock.
    ($status, $owner) = $locker->nflock($path, 0, "RealUser");
    is($status, 1, "lock successfully switched");
    like($owner, qr/RealUser/, "now locked by RealUser");
    # close out the forked subtest.
    $ast->finish;

    Other notes and changes

    The original spin loop was much more complex than it needed to be. It was trying to calculate if we’d overstayed our welcome on retries with time calculations when a simple countdown was far more straightforward. I added a new method, tries(), and added a tries parameter in new(). The logic now uses the value of the count as the while loop test, causing it to drop out if the count gets to zero. The “should we retry” checks are all “are we out of retries” instead of complicated offsets from the start time of the loop.
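
    In outline, the retry logic is now just a countdown. This is a paraphrase, not the exact nflock() code, and the sleep_time() accessor and the owner-file helpers are illustrative names:

      my $tries_left = $self->tries;
      while ($tries_left-- > 0) {
          if (mkdir $lock_dir) {
              $self->_write_owner($lock_dir, $requestor);   # record who holds the lock
              return (1, $requestor);
          }
          sleep $self->sleep_time;
      }
      # Out of retries: report failure and the current holder.
      return (0, $self->_read_owner($lock_dir));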

    Things to do

    Things that I might want to do, based on today’s work

    Waiting fractional seconds with more iterations might be better. Probably needs some benchmarking to figure out.

    The locking data definitely should be changed to something structured to make it easier to generate and consume.

  • WebWebXNG: history and goals

    [It seems like a good idea to lay out some history of this project and its goals, so I’m posting this before today’s progress update.]

    WebWebXNG Overview

    We’ve been concentrating on the nuts and bolts of the conversion, but maybe we should step back and look at the project as a whole, so we can get some perspective on where it started, where we are, and where we might go from here.

    “Ed’s Whiteboard” and Wikis

    Around 1997, we had just recently converted to Unix machines from IBM mainframes at the Goddard Space Flight Center, where I was then working as a contractor on a relatively large Perl project. Collaboration tooling was severely lacking. Bug trackers were very difficult to install or very expensive.

    Our process was mostly email, and one actual physical whiteboard in our project lead’s office had become the definitive source of truth where everything was recorded. There were numerous DO NOT ERASE signs on the door, on the whiteboard, next to it…it was definitely a single point of failure for the project, and it literally required you to go walk over to Ed’s office to get the project status, or to take notes on paper and transfer them to one’s own whiteboard/files/etc. If you needed to know something about the project, “Ed’s whiteboard” was where you found it.

    Our process was a weekly status meeting: we’d agree on who was doing what, Ed – and only Ed! – would write it on his whiteboard, and as the week went on we’d mail him our updates, which he’d then transfer to the board. It did give us a running status, a list of bugs, and assignments, but it was clear that we needed something better, more flexible, and less fragile than something that could be destroyed by an accidental brush up against it or a well-meaning maintenance person.

    Bettering the whiteboard

    It was about this time that I stumbled on the original c2.com WikiWiki. The idea that a website could be implemented in such a way that it could extend itself was a revelation. (Most websites were pretty simple-minded in 1997; you’d code up a bunch of HTML by hand and deploy it to your web server, and that was it.) It took a few days for the penny to drop, but I realized that, hey, we could move Ed’s whiteboard to a website! Instead of writing things on a physical whiteboard and worrying that it might get erased, we could record everything on a virtual one, share it among all the team members, and have historical backups of the project state.

    We could discuss things in sort-of real time and have a record of the conversation to refer to later, and link the discussion to a bug, or a feature request, or…

    We could track bugs on one page, assignments on another, have room to discuss implementation, record the minutes of our status meetings, and just generally document things we needed to share amongst ourselves. It was Ed’s whiteboard plus, and best of all, we could do it for free!

    We did have a few extra requirements. The biggest one was that we needed to provide different levels of access for reading and editing the site, depending on who you were. After some searching around, I found David McNicol’s WebWeb.

    WebWeb and its evolution to WebWebX

    WebWebX started off as an extended version of WebWeb, a derivative of Ward Cunningham’s original WikiWiki, written by David McNicol at the University of Strathclyde. WebWeb was a monolithic CGI script that added a few extra features to the original WikiWiki codebase, most notably access levels, which we needed more than anything else. We wanted some pages publicly readable but only writable by project members, and some project tracking readable by a subset of users and editable only by a yet smaller subset — the dev team.

    WebWeb did have some limitations that made it not quite good enough out of the box. The biggest issue was the original data storage, which used Storable and Perl DBM files; pages were “frozen” with Storable, making them into strings. They were then stored in the DBM file under a key composed of the page name and a version number; this made operations like history, removing old versions, searching, etc. all relatively easy, since a DBM file looked like a hash to the Perl code.

    The biggest problem with this was that the size of a page was limited by the per-item storage capacity of the underlying DBM implementation, meaning that a page that got “too big” suddenly couldn’t be saved any more. This was a real usability issue, as it was very difficult to predict when you’d exceed the allowable page size — and worse, different Perl implementations on different machines might have radically different limitations on the allowable page size.

    I undertook a rewrite of WebWeb to make it more modular, easier to maintain, and more performant, most specifically focusing on the page size issue. It was clear that we’d need to fix that problem, but most of the rest of WebWeb was fine as it was.

    RCS and PageArchive

    I started out by “factoring out” (not really, because there was no test suite!) the DBM code into a separate class, which I dubbed PageArchive, giving the page management code an interface of its own. A reasonable choice to allow me to change the underlying implementation; I’d learned enough OO programming to have the idea of Liskov substitution, but none of us had really internalized that writing a test suite for a module was a good idea yet.

    This complexified the mainline code a bit, as accessing the pages needed to use function calls instead of hash accesses, but it wasn’t too bad — the overall size of the project was fairly small, and the vast majority of the lines of code was inlined HTML heredocs.

    With the page storage code isolated in PageArchive, I could start replacing the mechanism it used for storage. One of the other tools we’d recently started using was RCS to do source code management, mostly because it was easy to start using and it was free with the operating system. We might have been able to use CVS, but it would have required a lot more coordination with our system administrator; with RCS, we just had to decide to start using it.

    From 25 years later, RCS looks hideously primitive. Each file is individually versioned, with a hidden file next to it containing the deltas for each version. For our code, even though it helped us a lot on a day-to-day basis, it made release management very difficult— a code freeze had to be imposed, and a tar file built up out of the frozen codebase. This then had to be deployed to a separate machine and manually tested against a list of features and bugs to ensure that we had a clean release. This all fell on the shoulders of our release manager, who was, fortunately for us, very meticulous! Not something that would work nowadays, but for 1997, it was a huge improvement.

    For storing pages in a wiki, on the other hand, RCS was great. We really were more interested in maintaining the history of each individual page rather than of the wiki as a whole anyway, so RCS was perfectly reasonable for this. I updated PageArchive to use RCS versioning to store stringified versions of the pages instead of storing them in DBM. Because I’d already abstracted the operations on the pages, it was easy to swap out one implementation for another. Pages could now be whatever size we wanted!

    Edit races and locking

    The wiki was a huge success. We were able to move everything from the physical whiteboard to “Ed’s Whiteboard” the website. Unfortunately, success came with a problem: we were updating the pages a lot, and very often we’d have an edit race:

    1. Alice starts editing a page.
    2. Bob starts editing the same page.
    3. Bob saves their changes.
    4. Alice saves their changes.
    5. Bob is confused because their edits aren’t there, but something else is. Did Alice remove their edits?

    This was recoverable, as the “previous” version of the page had Bob’s changes, but there had to be a phone call or an email: “Alice, did you change my edits to X? You didn’t? OK, thanks!”, and then Bob had to look back through the page archive for his edits, copy them out, and then re-edit the page. In the meantime, Carol has started yet another set of edits, and after Bob makes his changes, Carol saves…lather, rinse, repeat.

    On busy days, our “productivity tool” could end up being pretty unproductive for slow typists. We needed a way to serialize edits, and I came up with the idea of the edit lock. When a page was actively being edited, it was marked as locked (and by whom) when someone else accessed it, even for read. This made it clear that editing was in progress and prevented edit races by simply not allowing simultaneous edits at all. Because the person editing the page was flagged, it was possible for a second person to call or email them to ask them to save and release the lock. This did have the problem that if someone started an edit and went to lunch or went home for the day, the page would be locked until they came back. This was fixed by adding a “break edit lock” feature that turned off Alice’s lock and allowed Bob to edit the page. The wiki emailed Alice to let her know that the edit lock had been broken.

    This only worked because we had a limited number of users editing the wiki; it wouldn’t have worked for something the size of Wikipedia, for instance, but our site only had about a half-dozen active users who all knew each other’s phone numbers and emails. If someone had a page busy for an extended time, we generally called to ask them to save so someone else could edit — lock breaking was infrequent, and mostly only used when someone had had a machine crash while they were editing.

    This was our primary project tracking tool up until around 2005, and it served us pretty well.

    Fast-forward 25 years…

    Tooling has improved mightily since 1997, or even 2005. GitHub, GitLab, JIRA…lots of options that integrate source control, bug tracking and even wikis for documentation. Every once in a while, though, a standalone wiki is handy to have. There are services that provide just wikis, such as Notion, but a wiki that provides both public and private access, for free, is hard to find.

    I’m one of the DJs and maintainers at RadioSpiral (radiospiral.net), and we have a lot of station management stuff to track: artists who have signed our agreement (we are completely non-profit, so artists who have released their music under copyright have to waive their ASCAP/BMI/etc. rates so we don’t personally go broke having to pay licensing fees); URLs and ports for the listening streams; configurations to allow us to broadcast on the site; and lots more.

    Some of this info is public, some very private — for instance, we don’t want to publish the credentials needed to stream audio to the station for just anybody, but having them at hand for our DJs is very useful. Putting all this in a wiki to make it easy to update and have it centrally located is a big plus, but that wiki needs what the old one at Goddard had: delineated access levels.

    High-level project goals

    • Get WebWebX working as a standalone application that doesn’t require extensive CGI configuration to work. The original WebWebX was tightly integrated with Apache, and required Apache configuration, ScriptAlias entries, and a lot of other tedious work. Ideally, the new version should be as close to “install and run” as possible.
    • Modernize the codebase; most importantly, add tests. WebWebX worked incredibly well for code without tests, but I’m no longer so sure of myself! In addition, the HTML “templating” is all inline print() statements, and I’d really prefer to do this better.
    • Convert the code to a contemporary stack with a minimum of requirements to install it. I’ve chosen Mojolicious because it’s quite self-contained (there’s a minimal example of what that looks like below). I did not choose Catalyst or Dancer; both of those are great, but they definitely require a lot more prerequisites to install.
    • Make this project something that’s generally useful for folks who just want a controlled-access wiki that’s easy to install, easy to deploy, and easy to manage.

    Ideally, I want something that can be deployed to Heroku or Digital Ocean by checking out the code, setting some environment variables, and running it. We’ll see how close I can come to this ideal with a Perl stack.
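
    To give a sense of why Mojolicious appeals here — and this is just an illustrative sketch, not the eventual application — a Mojolicious::Lite app with no prerequisites beyond Mojolicious itself fits in a handful of lines:

      #!/usr/bin/env perl
      use Mojolicious::Lite;

      # Placeholder route standing in for the wiki UI.
      get '/' => sub {
          my $c = shift;
          $c->render( text => 'WebWebXNG will live here' );
      };

      # "perl app.pl daemon" serves it; "morbo app.pl" adds auto-reload
      # during development.
      app->start;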

  • More codebase cleanup, perlcritic, and POD coverage

    Time to take a look at all the other stuff lying around in the codebase.

    • INSTALL is pretty tied up with the details of hooking a non-mod_perl CGI script into Apache. Some of it will be salvageable, but mostly not. The section on first-run may be useful, but there will definitely need to be a way to set up a privileged account before the first run.
    • LICENSE will need to be updated with the original release date. I’ll go look at the ibiblio site for that.
    • Making a note to move THANKS into README.
    • USING is mostly okay. It’s a very quick intro to using a wiki. The section on logging in will need some editing, as it assumes the WikiWiki model of “anyone can add an account”.
    • The bin/insert-mail script was a hack specifically for our use as a bug tracker. We probably don’t need it, and there are significant security issues to address if we decide we do. Deleting this; we can always get it out of Git if we change our minds.
    • The cgi-bin directory can go away; the script there really just calls the code we moved to App::WebWebXNG.pm.
    • The docs directory contains a set of fixed HTML documents. They probably want a reformatting and possibly a rewriting, but we can leave them as they are right now.
    • Everything in old-lib has been moved elsewhere; that can go away.

    Back to the code!

    I revamped the dist.ini file to use [@Basic] and removed the duplicate actions. It now looks like this:

    [AutoPrereqs]
    [@Basic]
    [PruneCruft]
    [ExtraTests]
    [Test::Perl::Critic]
    [PodCoverageTests]
    [PodSyntaxTests]
    [@Git]

    The next step is to get everything tidied up and passing the perlcritic tests. To that end, I moved the start of main() in App::WebWebXNG to only encompass the actual old main program and added a leading underscore to _setup_kludge. That keeps us from having to document something we’ll be removing anyway, and un-nests the rest of the methods in that module to get rid of a huge number of perlcritic errors.

    I’ve also moved the old PasswordManager code to App::WebWebX::AuthManager; the old code manages an Apache htpasswd basic auth file, but the structure will do as an interface to some kind of more modern authentication management. (Notable in there: no password requirements! It was a simpler time.)

    Next is to remove the code we definitely don’t need or want: the insert-mail script, the CGI wrapper, everything in old-lib, and the license file from GitHub (Dist::Zilla will generate one).

    File::LockDir fiddles with the symbol table, and I don’t want to do that any more. I’ll restructure it as a class. It’ll also need some tests, and I’ll have to start writing those and fixing up the code to pass.
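
    As a sketch of the direction — not the actual rewrite, and the method names here are guesses for illustration — the class version would own its lock directory as instance data instead of exporting into the caller’s symbol table:

      package File::LockDir;

      use strict;
      use warnings;
      use Carp qw(croak);
      use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
      use File::Spec;

      # The object owns the directory where lock files live.
      sub new {
          my ( $class, %args ) = @_;
          croak "lock directory required" unless $args{dir};
          return bless { dir => $args{dir} }, $class;
      }

      # Take the lock for a page on behalf of a user; O_EXCL makes the
      # create atomic, so only one caller can win.
      sub lock {
          my ( $self, $page, $owner ) = @_;
          my $lockfile = File::Spec->catfile( $self->{dir}, "$page.lock" );
          sysopen( my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL ) or return 0;
          print {$fh} $owner;
          close $fh;
          return 1;
      }

      sub unlock {
          my ( $self, $page ) = @_;
          unlink File::Spec->catfile( $self->{dir}, "$page.lock" );
          return;
      }

      1;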

    The perlcritic tests and POD coverage tests are running, and failing, so I’ll need to start fixing those. I started on this and actually realized that I hadn’t committed the tidied code before starting to work on it, so I created a branch, wound the history back with git reset, committed the tidied code, and then cherry-picked back to the current state. This let me keep the critic and POD changes separate.

    For the modules failing the POD tests, I’d actually added block comments that would work perfectly fine as POD when I originally wrote the code, so I just needed to do the mechanical task of converting them. There were a lot of them, but it was very easy editing, so I just went ahead and cleaned that up by hand.
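
    The conversion itself was nothing fancy; it was mostly turning a comment block like the one below (the method name is invented for the example) into the equivalent POD:

      # Before: a block comment the POD coverage test can't see.
      #
      # get_page($name)
      #   Fetch the named page from the archive; returns undef if
      #   the page doesn't exist.

      # After: the same text as POD, which satisfies the coverage test.

      =head2 get_page($name)

      Fetch the named page from the archive; returns undef if the page
      doesn't exist.

      =cut

      sub get_page { ... }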

    Critic fixes were primarily making all of the loop variables lexical and removing bareword filehandles. There are two methods that store and reload the global config via a string eval(); they’re marked ## no critic for now, but I want to think about a better setup for that. My current reflexes say “database”, but I’m trying to minimize dependencies. Let’s see how that goes and defer the decision.
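
    For illustration (variable names invented), the loop-variable fix and the stringy-eval annotation look like this:

      # Before: the loop variable is an implicit package global.
      #   foreach $page (@pages) { ... }
      # After: lexical, scoped to the loop.
      foreach my $page (@pages) { ... }

      # The config save/load keeps its string eval for now, explicitly
      # waved through until there's a better answer.
      my $config = eval $saved_config;    ## no critic (ProhibitStringyEval)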

    I started adding some tests: if PageArchive gets no directory, it should die.
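
    A minimal sketch of that first validation test, assuming the constructor is still reachable as PageArchive::RCS->new() — the exact class name and error message will differ:

      use strict;
      use warnings;
      use Test::More;

      use PageArchive::RCS;

      # Constructing an archive with no directory should die, not limp
      # along with an undefined storage location.
      my $lived = eval { PageArchive::RCS->new(); 1 };
      my $err   = $@;
      ok( !$lived, 'PageArchive dies when no directory is supplied' );
      like( $err, qr/\S/, '...and gives a reason' );

      done_testing;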

    At this point, we pass all of the tests that we have; the code is barely tested, but the POD and critic tests, which had a lot of errors, are all fixed, and the couple of validation tests I added are passing.

    That will do for this pass.

  • Clearing the decks: removing ancient Perlisms and stripping down

    The next task is getting App::WebWebXNG to build, let alone pass any tests.

    First up: I’ve changed the name of the page archive library, so I need to change the use statement, and fix up the new() call (making it direct invocation syntax while I’m at it).
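
    That change is mechanical: trade the Perl 4-era indirect object syntax for an explicit method call on the class (variable names here are invented, and the real archive class may be named differently):

      # Before: indirect object syntax -- ambiguous to the parser and to humans.
      $archive = new PageArchive::RCS($data_dir);

      # After: direct method invocation on the class.
      my $archive = PageArchive::RCS->new($data_dir);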

    The defined %hash syntax is no longer valid, so we need to fix that. The usages we have in this script are really “is there anything in this hash” checks, so keys will work to fix these.
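
    In other words (hash name invented for the example):

      # Before: a fatal error on modern perls ("Can't use 'defined(%hash)'").
      #   if ( defined %page_index ) { ... }

      # After: same intent -- "does this hash have anything in it?"
      if ( keys %page_index ) { ... }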

    It uses a lot of globals. This results from repackaging a Perl 4 script and making as few changes as possible to get it running. The vast majority are defined in webwebx.pl, but there are a couple – no, sorry, a bunch – that come from the CGI script. We need to add a use vars for these. Found two on the first run; after the defined %hash issues were fixed, there were a bunch more. I’m adding them as I find them.
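
    The declarations themselves are one line per batch discovered; the names below are stand-ins, not the real list from webwebx.pl:

      # Declare the globals the CGI wrapper used to set up, so that
      # use strict stops complaining about them. ("our" would be the
      # modern spelling, but use vars matches the existing code.)
      use vars qw($ScriptUrl $StaticDir $DataDir %GlobalConfig);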

    “Replacement list is longer than search list”. There’s an interesting one! This is a tr that should be an s//g.
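
    The warning is the giveaway: tr/// maps characters to characters, so a multi-character “replacement” means someone really wanted a substitution. With made-up search and replacement strings:

      # Before: warns "Replacement list is longer than search list",
      # because tr/// only ever swaps one character for one character.
      #   $text =~ tr/ /&nbsp;/;

      # After: a global substitution does what was actually intended.
      $text =~ s/ /&nbsp;/g;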

    Okay, load test passes! It doesn’t actually do anything, but that much is working. Good.

    Let’s go look at the CGI script and see what it’s doing to initialize the globals we had to add; those are going to have to be set up somehow (for now, I think I’ll just add a setup_kludge function to do it). The variables we’re setting up here are mostly related to knowing where the script is hosted so that the internal link URLs are right, the location of the static files, and the location that stores all the data. Mojolicious should allow us to dispense with a lot of this and build the URLs as relative rather than absolute.
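
    The kludge itself is likely to be little more than the following; the variable names and defaults are invented, and the real values come from the old CGI configuration:

      package App::WebWebXNG;

      # Temporary shim: hand the old globals reasonable values until the
      # Mojolicious routing makes most of them unnecessary.
      sub setup_kludge {
          our ( $ScriptUrl, $StaticDir, $DataDir );
          $ScriptUrl = '/wiki';          # where the app is mounted
          $StaticDir = 'public';         # static assets
          $DataDir   = 'data/pages';     # page storage
          return;
      }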

    Now for some serious cleaning up. Let’s set up Perl::Tidy and Perl::Critic. Perl::Tidy is pretty critical, because the indentation is all over the place, and it’s hard to read the code. And Perl::Critic is just good insurance. I’m using policies similar to those we used at Zip.

    Running those found a lot of things that needed neatening up…and several outright bugs!

    1. App::WebWebXNG had one perlcritic issue, a my with a trailing conditional. Not too bad for 25-year-old code.
    2. However, PageArchive::RCS had a lot of things to fix up.
      1. No use warnings. Okay, that one’s pretty easy.
      2. Tried to set the Rewound attribute for a directory; the code was after a return so it couldn’t be reached. When it was moved to be reachable, it was using a variable that didn’t exist! Needed to be using the instance variable for the object.
      3. All of the open() calls used the old two-argument syntax. It’s still supported, but it’s lousy practice, so I edited all of the open() calls in App::WebWebXNG and in PageArchive::RCS (there’s a sketch of this fix, and the next one, after this list).
      4. There were several places where an if (my $foo = ...) condition’s $foo was referenced outside of the block. The scoping here changed sometime between Perl 5.6 and 5.38 (which I’m testing this with), so those declarations had to be moved out of the condition.
      5. Finally, one method in PageArchive::RCS tried to use $self without creating it in scope. This would result in never getting error messages back, and may have hidden other bugs. We’ll see.
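
    Here’s the shape of those two fixes, with file names and methods invented for the example:

      # Before: two-argument open with a bareword handle.
      #   open FH, "$dir/$page,v" or return;
      # After: three-argument open with a lexical handle.
      open my $fh, '<', "$dir/$page,v"
          or return $self->error("can't read $page: $!");

      # Before: my() in the if condition is scoped to that if, so the
      # later use of $dh outside the block fails under strict.
      #   if ( my $dh = $self->dirhandle() ) { ... }
      #   rewinddir $dh;
      # After: the declaration moves out in front of the test.
      my $dh = $self->dirhandle();
      if ($dh) { ... }
      rewinddir $dh;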

    We’re back to all tests passing, perlcritic happy, and perltidy happy. Created the repo on GitHub, pushed the work to date. Hang on, need to add a WIP marker…okay, got it.

    A good morning’s work!