Category: Perl

  • Just barely not Perl 4: diving into the old WebWebX codebase

    Hoo boy.

    I’ve put the basics in place now: there’s an App::WebWebXNG.pm module, and I’ve moved the page management and file locking modules into /lib. The load tests for the existing library modules pass, but there aren’t any functional tests yet.

    Now, on to the old core script, webwebx.pl.

    I’ve imported it as close to as-is as possible into App::WebWebX.pm, and added a main() that runs if there’s no caller(), so the old script can still be run as the main program.

    This script was just barely converted from Perl 4. There’s a giant pile of globals, and the majority of the “database” stuff it does is in DBM (if anyone still remembers that). I don’t even know if DBM still exists in more modern Perls!
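
    For anyone who’s forgotten the idiom, DBM access from that era typically looked something like this (a sketch, not the actual WebWebX code; and yes, dbmopen does still work in modern Perls, though tie is the current spelling):

    # Sketch of the Perl 4-era DBM idiom (not the actual WebWebX code).
    # dbmopen() binds a hash to an on-disk DBM file; reads and writes
    # to the hash go straight to disk.
    my %pages;
    dbmopen(%pages, "pages", 0644) or die "Can't open DBM file: $!";
    $pages{HomePage} = "Welcome to the wiki!";
    print $pages{HomePage}, "\n";
    dbmclose(%pages);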

    All of the HTML generation is from interpolated print statements. There’s no CSS (browsers didn’t even support such a thing at the time; it was Mosaic or nothing. Okay, maybe IE, but the number of Windows machines on base at GSFC that were being used by our user community was probably countable on one hand.).

    This should be convertible to Mojo::Template relatively easily, which is good. And the command dispatch is driven off a hash of code references, so that should work fairly well too.
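
    As a sketch of that dispatch pattern (the command names here are invented for illustration, not WebWebX’s actual ones):

    use strict;
    use warnings;

    # Hypothetical commands; WebWebX's real table maps its own commands
    # to the subs that implement them.
    my %dispatch = (
      view => sub { print "viewing $_[0]\n" },
      edit => sub { print "editing $_[0]\n" },
    );

    my $command = 'view';    # would come from the request parameters
    my $handler = $dispatch{$command} or die "Unknown command '$command'";
    $handler->('HomePage');

    Adding a new command is just adding a new key, which is part of why this style ports forward so painlessly.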

    It’s not terrible, it’s just old. Well, off to see how much will work!

  • WebWebXNG: revisiting a 25-year-old project

    The past

    Back in 1998 or so, not long after I’d switched from system administrator to web developer, I stumbled across Ward Cunningham’s original WikiWiki. It was, at the time, a mind-blowing idea: a website that people could edit and extend themselves, without any programming at all. Simply sign in to the wiki, and start editing. Adding a specially-formatted word automatically generated a link to another page, either an existing one…or a brand new one, that you could start expanding on yourself.

    I can’t say that I conceived of Wikipedia when I saw this, but I absolutely zeroed in on how we could use it for several problems we had:

    • We didn’t have a bug tracker/project tracker for our project. With a wiki, we could just have a page that linked to all of the features we were working on and the bugs we were fixing.
    • We didn’t have a formal release process at all, or much in the way of source control. We started using RCS and noting the version number(s) of files that fixed bugs. We still had to build up a canonical checkout of everything, but we at least had some tracking that way.
    • We really wanted (and needed) a reference manual for our users that was easy for them to browse and search, and easy for us to keep up to date.

    We (okay, I) decided to try a wiki. The original WikiWiki didn’t have a number of features we really felt we needed for this to work: no authorized users and no access control being the big issues. I found WebWeb, originally written by (I will have to look at the WebWebX source!), which had part, but not all, of what I needed, and with their permission, I created an extended version, rather unimaginatively called WebWebX.


    The present

    RadioSpiral has a lot of stuff that we need to have documented: how to connect to the streams, configs, where Spud lives and how to reboot him, policies, etc., and it’d be nice to have all that in a wiki instead of in documents (our last update of our docs was 5 years ago!). I remembered that we’d had a private Notion instance at ZipRecruiter — it wasn’t great, but it was usable, and private. So I signed up for Notion…and discovered that for a mere $720 a year, I could have the level of support that included a private wiki.

    Given that RadioSpiral’s income is in the red at all times — it’s 100% a labor of love, and a place for us to have fun while playing good music — that was just not a tenable solution. I didn’t want to run the old Zip wiki either — it was written in Haskell, and I didn’t feel like learning a whole new programming paradigm just to get a private wiki.

    Then I remembered: well, I have the old WebWebX source out there, and it did have access control. Maybe I could get it running again, and modernize it in the process. I’ve pulled the source from ibiblio and started working on the conversion. First things first: I’ve installed Dist::Zilla so I can build it out in some kind of reasonable fashion, and I’ve decided to base the whole thing on Mojolicious to try to make it as self-contained as possible.

    My goal is a private wiki that can be deployed with a bare minimum of effort. That will probably entail a lot of effort to write and fix up, but it’s time better spent than trying to find a free alternative somewhere that I’ll have to accept compromises in, or yet another paid service that I’ll have to pay for myself.

    So far, I’ve created the initial README.md, initialized Dist::Zilla in the new App::WebWebXNG repo, and imported the old code into the repo to start work. I’m choosing to implement the main program as a modulino, to make it easy to test (did I mention that the old code has exactly zero tests?).
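
    The modulino pattern itself is tiny; a minimal sketch (the real entry point will be fleshed out as the port progresses):

    package App::WebWebXNG;
    use strict;
    use warnings;

    sub main {
      my (@args) = @_;
      # Actual application startup will go here.
      return 0;
    }

    # Run main() only when executed as a script. Under require (e.g. in
    # a test), nothing runs, and the test calls main() itself.
    exit main(@ARGV) unless caller();

    1;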

    Updates to follow!

  • Broken iframes and HTML::TreeBuilder

    We had a situation last week where someone had entered a broken <iframe> tag in a job description and our cleanup code didn’t properly remove it. This caused the text after the <iframe> to render as escaped HTML.

    We needed to prefilter the HTML and just remove the <iframe>s. The most difficult part was figuring out what HTML::TreeBuilder was emitting and what I needed to do with it to do the cleanup. It was obvious that this would have to be recursive, since HTML is recursive (there could be nested, or multiple unclosed, iframes!). Several tries at it failed, until I finally dumped out the data structure in the debugger and spotted that HTML::TreeBuilder was adding “implicit” nodes. These essentially help it do bookkeeping, but don’t contain anything that has to be re-examined to properly do the cleanup. Worse, the first node contains all the text for the current level, so recursing on them was leading me off into infinite depths, as I kept looking for iframes in the content of the leftmost node, finding them, and uselessly recursing again on the same HTML.

    The other interesting twist: once I dropped the implicit nodes with a grep, I still needed to handle the HTML in the non-implicit nodes two different ways. If a node had no iframe tags, it could be returned untouched via as_HTML, and we’re done with that subtree. If it had one or more iframes, I needed to use the content method to take the node apart and process the pieces; the recursion un-nests the iframes and lets us clean up the individual subtrees.

    Lastly, any text returned from content comes back as an array of strings, so I needed to check for that case and recurse on all the items in the array to be sure I’ve filtered everything properly. My initial case checks for the trivial “no input so no output”, and “not a reference” to handle the starting string.

    We do end up doing multiple invocations of HTML::TreeBuilder on the text as we recurse, but we don’t recurse at all unless there’s an iframe, and it’s unusual to have more than one.

    Here’s the code:

    sub _filter_iframe_content {
      my($input) = @_;
      return '' unless $input;
    
      my $root;
      # We've received a string. Build the tree.
      if (!ref $input) {
        # Build a tree to process recursively.
        $root = HTML::TreeBuilder->new_from_content($input);
        # There are no iframe tags, so we're done with this segment of the HTML.
        return $input unless $root->look_down(_tag=>'iframe');
      } elsif (ref $input eq 'ARRAY') {
        # We got multiple strings from a content call; handle each one in order, and
        # return them, concatenated, to finish them up.
        return join '', map { _filter_iframe_content($_) } @$input;
      } else {
        # The input was a node, so make that the root of the (sub)tree we're processing.
        $root = $input;
      }
    
      # The 'implicit' nodes contain the wrapping HTML created by
      # TreeBuilder. Discard that.
      my @descendants = grep { ! $_->implicit } $root->descendants;
    
      # If there is not an iframe below the content of the node, return
      # it as HTML. Else recurse on the content to filter it.
      my @results;
      for my $node (@descendants) {
        # Is there an iframe in here?
        my $tree = HTML::TreeBuilder->new_from_content($node->as_HTML);
        if ($tree->look_down(_tag=>'iframe')) {
          # Yes. Recurse on the node, taking it apart.
          push @results, _filter_iframe_content($node->content);
        } else {
          # No, just return the whole thing as HTML, and we're done with this subtree.
          push @results, $node->as_HTML;
        }
      }
      return join '', @results;
    }
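
    A hypothetical call site (not the production code) looks like this:

    my $dirty = '<p>Apply now!</p><iframe src="https://example.com/x"></iframe><p>Benefits!</p>';
    my $clean = _filter_iframe_content($dirty);
    # $clean keeps both paragraphs; the iframe is gone.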
    
  • Scraping Patchstorage

    I lost an important VCVRack patch a couple days before Mountain Skies 2019. It was based on a patch I’d gotten from patchstorage.com, but I couldn’t remember which patch it was. I tried paging through the patches on the infinite scroll, but it wasn’t helping me much. I knew the patch had Clocked and the Impromptu 16-step sequencer, but I couldn’t remember anything else about it after seriously altering it for my needs.

    I decided this was going to have to be automated if I was going to find the base patch again in time to recreate my performance patch. I hammered out the following short Perl script to download the patches:

    use strict;
    use warnings;
    use WWW::Mechanize;
    use WWW::Mechanize::TreeBuilder;
    
    $|++;
    
    use constant SLEEP_TIME => 2;
    
    my $base_url = "https://patchstorage.com/platform/vcv-rack/page/";
    my $mech = WWW::Mechanize->new(autocheck => 0);
    WWW::Mechanize::TreeBuilder->meta->apply($mech);
    
    my $seq = 1;
    my $working = 1;
    while ($working) {
      print "page $seq\n";
      $mech->get($base_url.$seq);
      sleep(SLEEP_TIME);
      my @patch_pages = $mech->look_down('_tag', 'a');
      # Keep only the hrefs that actually point to patch posts.
      my @patch_links = grep {
        defined $_ and
        !m[/upload\-a\-patch/] and
        !m[/login/] and
        !m[/new\-tutorial/] and
        !m[/explore/] and
        !m[/registration/] and
        !m[/new\-question/] and
        !m[/platform/] and
        !m[/tag/] and
        !m[/author/] and
        !m[/wp\-content/] and
        !m[/category/] and
        !/\#$/ and
        !/\#respond/ and
        !/\#comments/ and
        !/mailto:/ and
        !m[/privacy\-policy] and
        !/discord/ and
        !m[https://vcvrack] and
        !/javascript:/ and
        !/action=lostpassword/ and
        !m[patchstorage\.com/$] and
        $_ ne ''
      } map { $_->attr('href') } @patch_pages;
    
      # Dedupe the links with a hash slice.
      my %links;
      @links{@patch_links} = ();
      @patch_links = keys %links;
      print scalar @patch_links, " links found\n";
    
      # A page with no links means we've paged past the end; stop.
      $working = 0 unless @patch_links;
    
      for my $link (@patch_links) {
        next unless $link;
        print $link;
        my @parts = split /\//, $link;
        my $patch_name = $parts[-1];
        if (-f "/Users/jmcmahon/Downloads/$patch_name") {
          print "...skipped\n";
          next;
        }
        print "\n";
        $mech->get($link);
        sleep(SLEEP_TIME);
        my @patches = $mech->look_down('id', "DownloadPatch");
        for my $patch (@patches) {
          my $p_link = $patch->attr('href');
          next unless $p_link;
          print "$patch_name...";
          $mech->get($p_link);
          sleep(SLEEP_TIME);
          open my $fh, ">", "/Users/jmcmahon/Downloads/$patch_name"
            or die "Can't open $patch_name: $!";
          print $fh $mech->content;
          close $fh;
          print "saved\n";
        }
      }
      $seq++;
    }
    

    Notable items here:

    • The infinite scroll is actually a chunk of Javascript wrapped around a standard WordPress page setup, so I can “page” back through the patches for Rack by incrementing the page number and pulling off the links to the actual posts with the patches in them.
    • That giant grep and map cleans up the links I get off the individual pages to just the ones that are actually links to patches.
    • I have a couple checks in there for “have I already downloaded this?” to allow me to restart the script if it dies partway through the process.
    • The script kills itself off once it gets a page with no links on it. I haven’t actually gotten that far yet, but I think it should work.

    Patchstorage folks: I apologize for scraping the site, but this is for my own use only; I’m not republishing. If I weren’t desperate to retrieve the patch for Friday, I would have just left it alone.

  • A HOWTO for Test::Mock::LWP

    I was clearing out my CPAN RT queue today, and found a question in the tickets for Test::Mock::LWP from dcantrell:

    It’s not at all clear how to use this module. I have a module which (partly) wraps around LWP::UserAgent which I use to fetch data which my module then manipulates. Obviously I need to test that my module handles webby errors correctly, for instance that it correctly detects when the remote sites don’t respond; and I need to be able to feed known data to my module so I can test that it does those manipulations correctly.

    Test::Mock::LWP is the obvious candidate for faking up LWP::UserAgent, but I just can’t figure out how to use it. Please can you write a HOWTO and add it to the docs.

    I’m adding the HOWTO tonight, even though the question was asked 12 years ago (I really need to get to my RT queue more often). The module’s description as it stands is pretty opaque; this explanation should, I hope, make it much more clear.

    HOWTO use Test::Mock::LWP

    Test::Mock::LWP is designed to provide you a quick way to mock out LWP calls.

    Exported variables

    Test::Mock::LWP’s interface is exposed via the variables it exports:

    • $Mock_ua – mocks LWP::UserAgent
    • $Mock_req / $Mock_request – mocks HTTP::Request
    • $Mock_resp / $Mock_response – mocks HTTP::Response
    All of these are actually Test::MockObject objects, so you call mock() on them to change how they operate dynamically. Here’s an example.

      Let’s say you wanted the next response to an LWP call to return the content foo and an HTTP status code of 201. You’d do this:

       
      BEGIN {
        # Load the mock modules *first*.
        use Test::Mock::LWP::UserAgent;
        use Test::Mock::HTTP::Response;
        use Test::Mock::HTTP::Request;
      }
      
      # Load the modules you'll use to actually do LWP operations.
      # These will automatically be mocked for you.
      use LWP::UserAgent;
      use HTTP::Response;
      use HTTP::Request;
      
      # Now set up the response you want to get back.
      $Mock_resp->mock( content => sub { 'foo' });
      $Mock_resp->mock( code    => sub { 201 });
      
      # Pretend we're making a request to a site.
      # Pretend we're making a request to a site (twice).
      for (1..2) {
        my $req   = HTTP::Request->new(GET => 'http://nevergoeshere.com');
        my $agent = LWP::UserAgent->new;
        my $res   = $agent->simple_request($req);
        # The values you added to the mock are now there.
        printf("The site returned %d %s\n", $res->code, $res->content);
      }
      

      This will print

      The site returned 201 foo
      The site returned 201 foo
      

      Getting more than one value out of the mocks: repeated re-mocks

      Note that the values are constrained to what you’ve sent to the mocks. The mock here will simply keep returning 201 and foo for as many times as you call it. You’ll need to re-mock the content and code methods each time you want to change them.

      my $req   = HTTP::Request->new(GET => 'http://nevergoeshere.com');
      my $agent = LWP::UserAgent->new;
      
      $Mock_resp->mock( content => sub { 'foo' });
      $Mock_resp->mock( code    => sub { 201 });
      my $res = $agent->simple_request($req);
      
      printf("The site returned %d %s\n", $res->code, $res->content);
      # The site returned 201 foo
      
      $Mock_resp->mock( content => sub { 'bar' });
      $Mock_resp->mock( code    => sub { 400 });
      $res = $agent->simple_request($req);
      
      printf("The site returned %d %s\n", $res->code, $res->content);
      # The site returned 400 bar
      

      Moving the logic into the mocks

      If you have a fixed sequence of items to return, just add them all to the mocks and have the mocks step through them. Here’s an example where we hand off two lists of values to the mocks:

      use strict;
      BEGIN {
        # Load the mock modules *first*.
        use Test::Mock::LWP::UserAgent;
        use Test::Mock::HTTP::Response;
        use Test::Mock::HTTP::Request;
      }
      
      # Load the modules you'll use to actually do LWP operations.
      # These will automatically be mocked for you.
      use LWP::UserAgent;
      use HTTP::Response;
      use HTTP::Request;
      
      my @contents = qw(foo bar baz);
      my @codes    = qw(404 400 200);
      
      # Initialize the counters to 2 so the first call wraps each one
      # around to index 0.
      my $code_counter = 2;
      my $content_counter = 2;
      
      my $content_sub = sub {
        $content_counter += 1;
        $content_counter %= 3;
        $contents[$content_counter];
      };
      
      my $code_sub = sub {
        $code_counter += 1;
        $code_counter %= 3;
        return $codes[$code_counter];
      };
          
      $Mock_resp->mock(content => $content_sub);
      $Mock_resp->mock(code    => $code_sub);
          
      my $req   = HTTP::Request->new(GET => 'http://nevergoeshere.com');
      my $agent = LWP::UserAgent->new;
          
      for (0..5) {
        my $res   = $agent->simple_request($req);
        printf("The site returned %d %s\n", $res->code, $res->content);
      }
      

      This will print

          The site returned 404 foo
          The site returned 400 bar
          The site returned 200 baz
          The site returned 404 foo
          The site returned 400 bar
          The site returned 200 baz
      

      Remember: the key is to make sure that the mock is ready to return the next item when you make the next request to the user agent.
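
      A variation on the same idea: if you only need each canned response once, you can skip the modular counters and just shift values off a queue. Once the queues run dry, the mocks return undef, so queue as many values as you’ll make requests:

      my @contents = qw(foo bar baz);
      my @codes    = qw(404 400 200);
      
      # Each call consumes one queued value.
      $Mock_resp->mock( content => sub { shift @contents } );
      $Mock_resp->mock( code    => sub { shift @codes    } );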

  • Recovering my old Scape files

    My original iPad finally bit the dust in August, just before I could get a final good backup of it. Most of the stuff on it was already backed up elsewhere (GMail, Dropbox, iCloud), but Scape was the exception.

    Scape isn’t (at least not yet) able to back up its files to the cloud, so there wasn’t anyplace else to restore from — except that I had taken advantage of the fact that under iOS 5, the files in the app were still directly readable using Macroplant’s iExplorer, and I had grabbed all the raw Scape files and even the Scape internal resources. Sometime I’ll write up what I’ve figured out about Scape from those files…

    The Scape files themselves are just text files that tell Scape what to put on the screen and play, so the files themselves were no problem; they don’t include checksums or anything that would make them hard to work with.


    Version:0.20
    Mood:7
    Date:20121113025954
    Separation:0.50
    HarmonicComplexity:0.50
    Mystery:0.50
    Title:Scape 117
    Steam Factory,0.50,0.50,1.0000
    Spirit Sine Dry,0.23,0.31,3.1529
    Spirit Sine Dry,0.40,0.36,3.4062
    Spirit Sine Dry,0.64,0.19,3.9375
    Spirit Sine Dry,0.55,0.49,1.0065
    Spirit Sine Dry,0.26,0.67,3.5039
    Spirit Sine Dry,0.76,0.54,3.1211
    Spirit Sine Dry,0.49,0.79,3.8789
    Spirit Sine Dry,0.46,0.17,3.9766
    Spirit Sine Dry,0.85,0.27,2.0732
    Spirit Sine Dry,0.90,0.53,1.5154
    Spirit Sine Dry,0.66,0.72,3.6680
    Spirit Sine Dry,0.15,0.55,2.2527
    Spirit Sine Dry,0.11,0.80,1.9320
    Spirit Sine Dry,0.32,0.88,4.1289
    Spirit Sine Dry,0.18,0.14,3.2779
    Spirit Sine Dry,0.81,0.11,3.0752
    Spirit Sine Dry,0.49,0.56,1.7528
    Spirit Sine Dry,0.82,0.80,3.3783
    Bass Pum,0.53,0.46,1.8761
    Thirds Organ Pulsar Rhythm,0.50,0.50,1.0000
    End

    I wrote to Peter Chilvers, who is a mensch, and asked if there was any way to just import these text files. He replied that there unfortunately wasn’t, but suggested that if I still had access to a device that had the scapes on it, I could use the share feature and mail them one by one to my new iPad, where I could tap them in Mail to open them in Scape and then save them.

    At first I thought I was seriously out of luck, but then I figured, why not share one from the new iPad and see what was in the mail? I did, and found it was just an attachment of the text file, with a few hints to iOS as to what app wanted to consume them:


    Content-Type: application/scape; name="Scape 10";x-apple-part-url=Scape 10ar; name="Scape 10ar.scape"
    Content-Disposition: inline; filename="Scape 10ar.scape"
    Content-Transfer-Encoding: base64

    Fab, so all I have to do is look through five or six folders containing bunches of scape files that may or may not be duplicates, build emails, and…this sounds like work. Time to write some scripts. First, I used this script to ferret through the directories, find the scapes, and bring them together.


    use strict;
    use warnings;
    use File::Find::Rule;

    my $finder = File::Find::Rule->new;
    # Skip the Scape.app directory itself; find the saved scape text files.
    my $scapes = $finder->or(
      $finder->new
             ->directory
             ->name('Scape.app')
             ->prune
             ->discard,
      $finder->new
             ->name('*_scape.txt')
    );

    my $seq = "a";
    for my $scape ($scapes->in('.')) {
      (my $base = $scape) =~ s/_scape.txt//;

      # Pull the title out of the scape file.
      my $title;
      open my $fh, "<", $scape or die "can't open $scape: $!";
      while (<$fh>) {
        chomp;
        next unless /Title:(.*)$/;
        $title = $1;
        last;
      }
      $title =~ s[/][\\/]g;

      # Disambiguate duplicate titles with a one-letter suffix.
      if (-e "$title.scape") {
        $title = "$title$seq";
        $seq++;
        die if $seq gt "z";
      }
      system qq(mv "$scape" "$title.scape");
      system qq(mv "$base.jpg" "$title.jpg");
    }

    I decided it was easier to do a visual sort using the .jpg thumbnails to spot the duplicates and filter them out; I probably could have more easily done it by checksumming the files and eliminating all the duplicates, but I wanted to cull a bit as well.
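
    For the record, the checksum version would have been only a few lines; a sketch of what I didn’t run:

    use strict;
    use warnings;
    use Digest::SHA;

    # Keep the first file seen for each digest; report the rest as dupes.
    my %seen;
    for my $file (glob '*.scape') {
      my $digest = Digest::SHA->new(256)->addfile($file)->hexdigest;
      if (my $original = $seen{$digest}) {
        print "$file duplicates $original\n";
      } else {
        $seen{$digest} = $file;
      }
    }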

    So now I’ve got these, and I need to get them to my iPad. Time for another script to build me the mail I need:

    #!/usr/bin/env perl

    =head1 NAME

    bulk_scapes.pl - recover scape files in bulk

    =head1 SYNOPSIS

    MAIL_USER=gmail.sendername@gmail.com \
    MAIL_PASSWORD='seekrit' \
    RECIPIENT='icloud_user@me.com' \
    bulk_scapes.pl

    =head1 DESCRIPTION

    C<bulk_scapes.pl> will collect up all the C<.scape> files in a directory
    and mail them to an iCloud user. That user can then open the mail on their
    iPad and tap the attachments to restore them to Scape.

    This script assumes you’ll be using GMail to send the files; create an app
    password in your Google account to use this script to send the mail.

    =cut

    use strict;
    use warnings;
    use Email::Sender::Simple qw(sendmail);
    use Email::Sender::Transport::SMTP;
    use MIME::Entity;

    my $top = MIME::Entity->build(
      Type    => "multipart/mixed",
      From    => $ENV{MAIL_USER},
      To      => $ENV{RECIPIENT},
      Subject => "recovered scapes",
    );

    # Loop over files and attach. MIME type is 'application/scape'.
    my $n = 1;
    for my $file (`ls -1 *.{scape,playlist}`) {
      chomp $file;
      my ($part, undef) = split /\./, $file;

      # Use the scape's own title as the attachment name, if it has one.
      open my $fh, "<", $file or die "Can't open $file: $!\n";
      my $name;
      while (<$fh>) {
        chomp;
        next unless /Title/;
        (undef, $name) = split /:/;
        last;
      }
      close $fh;
      unless ($name) {
        $name = "Untitled $n";
        $n++;
      }

      $top->attach(
        Path => $file,
        Type => "application/scape; name=\"$name\";x-apple-part-url=$part",
      );
    }

    my $transport = Email::Sender::Transport::SMTP->new(
      host          => 'smtp.gmail.com',
      port          => 587,
      ssl           => 'starttls',
      sasl_username => $ENV{MAIL_USER},
      sasl_password => $ENV{MAIL_PASSWORD},
    );

    sendmail($top, { transport => $transport });

    I was able to receive this on my iPad, tap on the attachments, and have them open in Scape. Since there were a lot of these, it took several sessions over a week to get them all loaded, listened to, saved, and renamed using Scape’s edit function (the titles did not transfer, unfortunately).

    So now I have all my Scapes back, and I’m working through the program, trying to get to the point where I have all the objects enabled again. I haven’t played with it in a while, and I’m glad to be rediscovering what a gem this app is.