Tag: perl

  • Controller unit testing

    A lot of progress on multiple fronts to get me to the point where the models are tested and working, and the registration controller works when manually tested. Its support functions have been unit tested as well.

    Settings model

    Working through the code, I found and fixed a couple typos (SearchPage, not SeatchPage!), and then put all the model functions under test. The settings are properly initialized with sensible values for the fields that can be pre-initialized, and the load and save functions work. Overall, I found the Mojo ORM to sometimes work very intuitively, and at other times to be opaque enough that I just fell back to creating and running placeholder queries.

    To prevent multiple sets of settings from being created, the model always sets the ID of the settings record to 1, guaranteeing that we have only one set of settings.
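
    Conceptually, the save path looks something like this (a minimal sketch, assuming a Mojo::SQLite-backed model; the method and column names here are hypothetical, not the project’s actual schema):

    sub save {
        my ($self, $settings) = @_;
        $settings->{id} = 1;    # clamp every save to the single settings row
        $self->sqlite->db->query(
            'INSERT OR REPLACE INTO settings (id, wiki_name, search_page)
             VALUES (?, ?, ?)',
            @{$settings}{qw(id wiki_name search_page)}
        );
    }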

    User model

    We have the usual suite of methods for a user: do they exist, are they verified, can they log in. I decided to use Crypt::Passphrase as the hashing function and manage passwords myself instead of using the Mojo plugin. Since this is all inside the model, it’s not terrible if I decide to change it later.
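
    For reference, the heart of the Crypt::Passphrase usage is roughly this (a sketch assuming the Argon2 encoder, which needs Crypt::Passphrase::Argon2 installed; the surrounding model code is left to the imagination):

    use Crypt::Passphrase;

    my $passphrase = Crypt::Passphrase->new(encoder => 'Argon2');

    my $hash = $passphrase->hash_password($password);      # store this in the user row
    my $ok   = $passphrase->verify_password($password, $hash);    # check at login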

    Originally I thought that I should probably block multiple IDs from the same email, but I decided that I would allow it so that people could have IDs with multiple privilege sets. This becomes more necessary if the folks using the wiki start using access lists, especially if there are disjoint groups with people in more than one of them. Again, another decision that’s easy to change if I do change my mind.

    The principal problem I had here was that I had set up the user table with an id primary key, but a lot of operations depend on using the username as a key. SQLite can do composite primary keys, but the Mojo SQLite module doesn’t support them. It was easier to write a method that does a SELECT * FROM users WHERE username = ? and returns the user than to try to work around it.
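
    With Mojo::SQLite, that method is essentially a one-liner (the method name here is hypothetical):

    sub find_by_username {
        my ($self, $username) = @_;
        # One row as a hashref, or undef if there's no such user.
        return $self->sqlite->db->query(
            'SELECT * FROM users WHERE username = ?', $username
        )->hash;
    }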

    The initial version didn’t have any password constraints; I added a function that does some very basic ones (> 10 chars, at least one upper, one lower, one digit, one special character). I used the POSIX character classes to try to start toward a more UTF-8-ish future.
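
    The checks amount to something like this (a sketch; the real function’s name and its error reporting differ):

    sub password_ok {
        my ($password) = @_;
        return length($password) > 10
            && $password =~ /[[:upper:]]/
            && $password =~ /[[:lower:]]/
            && $password =~ /[[:digit:]]/
            && $password =~ /[[:punct:]]/;
    }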

    The tests were getting crowded, and a bit too long, so I reorganized them by creating a subdirectory for each model and controller and copying the appropriate tests into it. I then went through the tests, making multiple copies and stripping each one down to testing just one thing. Lastly, I added a new test to verify that password validation worked, including whether the user was verified or not (not verified == failed login).
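
    The layout now looks roughly like this (illustrative only; the actual test names differ):

    t/
        models/
            settings/
                initialize.t
                load_save.t
            users/
                create.t
                password_rules.t
        controllers/
            register/
                validation.t
                username.t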

    Controllers

    Next was refining the controllers.

    I reworked the login controller to use the User model instead of doing everything itself, and set it aside for the moment.

    The rest of this sprint went into the registration controller; I wanted to be able to add users natively before testing the login controller. The only tweak it needed to run at all was an import of the User model so it could indeed create users.

    A quick run showed me that I’d need to shuffle around the template and work on the validation code; the fields were fine for the original demo, but there were ones I didn’t need (middle name), ones that were missing (username), and the “flash” error message from validation was at the bottom of the page. Swapped things around and everything looked good.

    I then went in and disassembled the logic of the controller into input validation, username creation, and the actual messing about with the User model. I left the model manipulation inline, but thinking again about it, I think I want to isolate that in a separate method and unit test that as well.

    Wrote the tests for both of those, and did some cleanup on the error messaging in the validation. It now gathers all the validation errors and constructs a properly-punctuated list:

    • The item alone, capitalized, if there’s one field wrong.
    • "Item_one and item_two" (no serial comma) if there are two wrong.
    • "Item_one, item_two, …, item_n_minus_one, and item_n" (serial comma) if there are more than two.

    I decided to just grab the fields and drop them into a hash, then pass the hash to the utility functions; this actually led to a lot of impedance matching problems, and it might have been better to build a tiny class to carry the data instead. (If I were doing this in Go, I’d use a struct and be sure that I couldn’t assign to or use the wrong field names because the compiler would catch me.)

    The username construction brings in a very nice module, Text::Unidecode, which does a pretty decent job of transliterating Unicode characters into ASCII equivalents. I decided to do this to preserve the simplicity of the link-checking code; later on, when we get to the page decoding, that code will look for words that match the wiki link syntax and automatically transform them into links to a page of the same name. Making the link syntax more complex would make it easier to accidentally create links; it’s possible to escape a linkname to prevent this, but having to do that a lot makes using the wiki less pleasurable.

    The decoded strings sometimes contain spaces, so a s/\s(.)/uc($1)/eg was needed to collapse the space and capitalize the letter after it.
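
    Put together, the transformation goes something like this (an illustrative example):

    use utf8;
    use Text::Unidecode;

    my $name = unidecode('José García');    # 'Jose Garcia'
    $name =~ s/\s(.)/uc($1)/eg;             # 'JoseGarcia'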

    I tested this a little bit manually as well, and the controller seems to work fine. The page is a little stark, but can probably be made nicer with some CSS later.

    The registration controller is going to need integration tests to cover the rest, and the login controller has exactly one line that isn’t directly concerned with logging the user in, so it’ll need integration tests as well. Next up are those integration tests, and then I’ll start on the actual page display code. Most of the rest of the functionality is inside special processing attached to pages.

    A pretty productive sprint!

  • Intro to Testing at SVPerl

    As promised, here are the slides from my talk: Intro to Testing.

    This version of the talk covers a little more ground than the previous one: it gives you the preferred ways of dealing with leaking output and warnings, and includes a new section on Cucumber.

    You can also play with the demo programs which live on GitHub.

  • Shellshock scanner

    So I had a bunch of machines with a standard naming convention that I needed to scan for the Shellshock bug. Since I just needed to run a command on each one and check the output, and I had SSH access, it seemed easy enough to put together a quick script to manage the process.

    Here’s a skeleton of that script, with the details of which machines I was logging into elided. It does a pretty reasonable job, checking 300 machines in about a minute. You need a reasonably recent copy of Parallel::ForkManager, as versions prior to 1.0 don’t have the ability to return a data structure from the child.

    $|++;
    use strict;
    use warnings;
    use Parallel::ForkManager 1.07;
    
    my $MAX_PROCESSES = 25;
    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
    my @servers = @SERVER_NAMES;    # elided: substitute your own list of hostnames
    my %statuses;
    my @diagnostics;
    $pm->run_on_finish(
        sub {
            my($pid, $exit_code, $ident, $exit_signal, $core_dump,
               $data_structure_reference) = @_;
            if (defined($data_structure_reference)) { 
                my ($host_id, $status, $results) = @{$data_structure_reference};
                if ($status eq 'Unknown') {
                    push @diagnostics, $host_id, $results;
                } else {
                    push @{ $statuses{$status} }, $host_id;
                }
            } else { 
                warn qq|No message received from child process $pid!\n|;
            }
        }
    );
    
    print "Testing servers: ";
    for my $host_id (@servers) {
        my $pid = $pm->start and next;
    my $result = <<`EOF`;    # backtick heredoc: run the ssh command, capture its output
    ssh -o StrictHostKeyChecking=no $host_id <<'ENDSSH' 2>&1
    env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
    ENDSSH
    EOF
        my $status;
        if ($result =~ /Permission denied/is) {
       $status = q{Inaccessible};
        } elsif ($result =~ /key verification failed/s) {
           $status = q{Key changed};
        } elsif ($result =~ /timed out/is) {
           $status = q{Timed out};
        } elsif ($result =~ /vulnerable/s) {
       $status = q{Vulnerable};
        } elsif ($result =~ /ignoring function definition attempt/s) {
           $status = q{Patched};
        } elsif ($result =~ /Name or service not known/s) {
           $status = q{Nonexistent};
        } else {
       $status = q{Unknown};
        }
        print "$host_id, ";
        $pm->finish(0, [$host_id, $status, $result]);
    }
    $pm->wait_all_children;
    print "done!\n";
    for my $status (keys %statuses) {
        print "$status: ",join(',', @{$statuses{$status}}), "\n";
    }
    print "The following hosts returned an undiagnosed status:",
          join("\n", @diagnostics), "\n";

    Note that this doesn’t test the most recent version (#3) of the bug; I have modified it slightly to test for that, but that’s a reasonable exercise for the reader.

  • ETL into WordPress: lessons learned

    I had a chance this weekend to do a little work on importing a large site (4000 or so articles and pages) into WordPress. It was an interesting bit of work, with a certain amount of learning required on my part – which translated into some flailing around to establish the toolset.

    Lesson 1: ALWAYS use a database in preference to anything else when you can.

    I wasted a couple hours trying to clean up the data for CSV import using any of a number of WordPress plugins. Unfortunately, CSV import is half-assed at best – more like quarter-assed – and any cleanup in Excel is excruciatingly slow.

    Some of the data came out with mismatched quotes, leaving me with aberrant entries in the spreadsheet that caused Excel to throw an out-of-memory error and refuse to process them when I tried to delete the bad rows, or even cells from those bad rows.

    Even attempting to work with the CSV data using Text::CSV in Perl was problematic, because the site export data (from phpMyAdmin) was fundamentally broken. I chalk that up partially to the charset problems we’ll talk about later.

    I loaded up the database using MAMP, which worked perfectly well, and was able to use Perl DBI to pull the pages and posts out without a hitch, even the ones with weirdo character set problems.

    Lesson 2: address character set problems first

    I had a number of problems with the XMLRPC interface to WordPress (which is otherwise great, see below) when the data contained improperly encoded non-ASCII characters. I was eventually forced to write code to swap the strings into hex, find the bad 3- and 4-character runs, and replace them with the appropriate Latin-1 substitutes (note that these don’t quite match that table – I had to look for the 'e2ac' or 'c3' delimiter characters in the input to figure out where the bad characters were). Once I hit on this idea, it worked very well.
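
    The core of the hex-swap idea looks something like this (a simplified sketch covering only the two-byte 'c3' case; my real code also handled the longer runs, and the function name here is hypothetical):

    sub fix_latin1 {
        my ($text) = @_;
        my $hex = unpack 'H*', $text;    # the whole string as hex digits
        # UTF-8 encodes U+00C0..U+00FF as a C3 byte followed by
        # (codepoint - 0x40); fold those pairs back into single
        # Latin-1 bytes. (Naive: a byte-aligned scan would be safer.)
        $hex =~ s/c3([89ab][0-9a-f])/sprintf '%02x', hex($1) + 0x40/ge;
        return pack 'H*', $hex;
    }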

    Lesson 3: build in checkpointing from the start for large import jobs

    The various problems ended up causing me to repeatedly wipe the WordPress posts database and restart the import, which wasted a lot of time. (I did not count that toward the overall time needed to complete when I charged my client; if I had, it would have been more like 20-24 hours instead of 6.) Fortunately the imports were, until a failure occurred, a start-it-and-forget-it process. It was necessary to wipe the database between tries because WordPress otherwise very carefully preserves all the previous versions, and cleaning them out is even slower.

    I hit on the expedient of recording the row ID of each item as it successfully imported and dumping that list out in a Perl END block. If the program fell over and exited due to a charset problem, I got a list of the rows that had processed OK, which I could then add to an ignore list. Subsequent runs could simply exclude those records to get me straight to the stuff I hadn’t done yet and to avoid duplicate entries.

    I had previously tried just logging the bad ones and going back to redo those, but it turned out to be easier to exclude than include.
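
    The checkpoint pattern is simple enough to sketch (the file name and the import helpers here are hypothetical):

    my %already_done = map { $_ => 1 } read_id_list('imported.ids');
    my @imported;

    END {
        # Runs even if we die mid-import: append the IDs that made it in.
        if (open my $fh, '>>', 'imported.ids') {
            print $fh "$_\n" for @imported;
        }
    }

    for my $row (@rows) {
        next if $already_done{ $row->{id} };    # skip rows from earlier runs
        import_post($row);                      # dies on a charset failure
        push @imported, $row->{id};             # only reached on success
    }

    sub read_id_list {
        my ($file) = @_;
        open my $fh, '<', $file or return;    # no file yet on the first run
        chomp(my @ids = <$fh>);
        return @ids;
    }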

    Lesson 4: WordPress::API and WordPress XMLRPC are *great*

    I was able to find the WordPress::API module on CPAN, which provides a nice object-oriented wrapper around WordPress XMLRPC. With it, I was able to programmatically add posts and pages about as fast as I could pull them out of the local database.
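
    Adding a post via the underlying WordPress::XMLRPC interface goes roughly like this (a sketch from memory; treat the field names as assumptions and check the module’s docs, and $row here stands in for a record pulled from the local database):

    use WordPress::XMLRPC;

    my $wp = WordPress::XMLRPC->new({
        username => 'admin',
        password => 'secret',
        proxy    => 'http://example.com/xmlrpc.php',
    });

    # metaWeblog-style content hash; the second argument publishes immediately.
    my $post_id = $wp->newPost(
        { title => $row->{title}, description => $row->{body} },
        1,
    );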

    Lesson 5: XMLRPC just doesn’t support some stuff

    You can’t add users or authors via XMLRPC, sadly. In the future, the better approach would probably be to log directly into the server you’re configuring, load the old data into the database, and use the PHP API calls directly to create users and authors as well as to load the data into WordPress. I decided not to embark on that this time because I’m faster and more able in Perl than in PHP, and I judged it would be quicker to go that way than to try to teach myself a new programming language and solve the problem simultaneously.

    Overall

    I’d call this mostly successful. The data made it into the WordPress installation, and I have an XML dump from WordPress that will let me restore it at will. All of the data ended up where it was supposed to go, and it all looks complete. I have a stash of techniques and sample code to work with if I need to do this again.
  • Mojolicious Revolutions

    Third in my series of talks at SVPerl about Mojolicious; this one reviews using the server-side features for building bespoke mock servers, and adds a quick overview of the Mojo client features, which I had missed until last week. Color me corrected.

    Mojolicious Revolutions


  • Pure majority rule considered harmful

    I’ve been discussing an issue on Perlmonks over the past couple days; specifically the potential for abuse of the anonymous posting feature. I’ve seen numerous threads go by discussing this, most of which have focused on restricting the anonymous user. Since the anonymous user’s current feature set seems to be a noli me tangere, I proposed an alternative solution similar to Twitter’s blocking feature. One of the site maintainers very cordially explained why my proposal was not going to be adopted, and in general I’d just let this drop – but I received another comment that I can’t just let pass without comment. To quote:

    I’m saying “This isn’t a problem for the overwhelming majority, therefore it is not a problem.”

    I’d like to take a second and talk about this particular argument against change, and why it is problematic. This is not about Perlmonks. This is not about any particular user. This is about a habit of thought that can be costly both on a job-related and personal level.

    Software engineering is of necessity conservative. It’s impossible to do everything that everyone wants, therefore we have to find reasons to choose some things and not others. And as long as the reasons are honest and based on fact and good reasoning, then they are good reasons. They may not make everyone happy (impossible to do everything), but they do not make anyone feel as if their needs are not being carefully considered. But, because we’re all human, sometimes we take our emotional reactions to a proposal and try to justify those with a “reason” that “proves” our emotional reaction is right.

    In this case, what is said here is something I’ve seen in many places, not just at Perlmonks: the assumption that unless the majority of the people concerned have a problem, there’s no good reason to change; the minority must put up with things as they are or leave. Secondarily, if there is no “perfect” solution (read: a solution that I like), then doing nothing is better than changing.

    There is a difference between respectfully acknowledging that a problem exists, and taking the time to lay out why there are no good solutions within the existing framework, including the current proposal, as the maintainer did – and with which I’m satisfied – and saying “everyone else is happy with things as they are”, end of conversation.

    The argument that the majority is perfectly happy with the status quo says several things by implication: the complainer should shut up and go along; the complainer is strange and different and there’s something wrong with them; they do not matter enough for us to address this.

    Again, what I’m talking about is not about Perlmonks.

    As software engineers, we tend to lean on our problem-solving skills, inventiveness, and intelligence. We use them every day, and they fix our problems and are valuable (they are why we get paid). This means we tend to take them not only to other projects, but into our personal lives. What I would want you to think about is whether you have accepted that stating “everyone else is happy with things as they are” is a part of your problem-solving toolkit. The idea that “the majority doesn’t have a problem with this” can morph into “I see myself as a member of the majority, so my opinions must be the majority’s opinions; since the majority being happy is sufficient to declare a problem solved, asserting my opinion is sufficient – the majority rule applies because I represent the majority”.

    This shift can be poisonous to personal relationships, and embodies a potential for the destruction of other projects – it becomes all too easy to say the stakeholders are being “too picky” or “unrealistic”, or to assume that a romantic partner or friend should always think the same way you do because “most people like this” or “everybody wants this” or “nobody needs this” – when in actuality you like it or want it or don’t need it. The other person may like, need, or want it very much – and you’ve just said by implication that to you they’re “nobody” – that they don’t count. No matter how close a working or personal relationship is, this will sooner or later break it.

    Making sure you’re acknowledging that what others feel, want, and need is as valid as what you feel, want, and need will go a long way toward dismantling these implicit assumptions that you are justified in telling them how they feel and what should matter to them.

  • Test::Routine slides

    This is my Test::Routine slide deck for the presentation I ended up doing from memory at the last SVPerl.org meeting. I remembered almost all of it except for the Moose trigger and modifier demos – but since I didn’t have any written yet, we didn’t miss those either!

    Update: My WordPress installation seems to have misplaced this file. I’ll look around for it and try to put it back soon.

  • Intro to Perl Testing at SVPerl

    A nice evening at SVPerl – we talked about the basic concepts of testing, and walked through some examples of using Test::Simple, Test::More, and Test::Exception to write tests. We did a fair amount of demo that’s not included in the slides – we’ll have to start recording these sometime – but you should be able to get the gist of the talk from the slides.

  • CrashPlan folder date recovery

    The situation: a friend had a MacBook Air whose motherboard went poof. Fortunately she had backups (almost up-to-date) in CrashPlan, so she did a restore of her home directory, which worked fine in that she had her files, but not so fine in that all the folder last-changed dates now ran from the current date to a couple days previous (it takes a couple days to recover ~60GB of data).

    This was a problem for her, because she partly uses the last-changed date on her folders to help her keep organized. “When was the last time I did anything on project X?” (I should note: she uses Microsoft Word and a couple different photo library managers, so git or the equivalent doesn’t work well for her workflow. She is considering git or the like now for her future text-based work…)

    A check with CrashPlan let us know that they did not track folder update dates and couldn’t restore them. We therefore needed to come up with a way to re-establish as best we could what the dates were before the crash.

    Our original thought was simply to start at the bottom and recursively restore the folder last-used dates using touch -t, taking the most-recently-updated file in the folder as the folder’s last-updated date. Some research and thought turned up the following:

    • Updating a file updated the folder’s last-updated date.
    • Updating a folder did not update the containing folder’s last-updated date.

    This meant that we couldn’t guarantee that a folder’s last-updated date would accurately reflect the last update of its contents. We decided in the end that the best strategy for her was to “bubble up” the last-updated dates by checking both the files and the folders contained in a subject folder. This way, if a file deep in the hierarchy is updated but the files and folders above it have not been, the file’s last-updated date is applied to its containing folder, and then propagates upward through each enclosing folder (since we’re checking both files and folders, there’s always a folder along the path whose last-updated date corresponds to the one on the deeply-nested file). This seemed like the better choice for her, as she had no other records of what had been worked on when, and runs a very nested set of folders.

    If you were running a flatter hierarchy, only updating the folders to the last-updated date of the files might be a better choice. Since I was writing a script to do this anyway, it seemed reasonable to implement both, letting you choose whether or not to bubble up, and also letting you selectively bubble up (or not) in a single directory.

    This was the genesis of date-fixer.pl. Here’s the script. A more detailed example of why neither approach to restoring the folder dates is perfect is contained in the POD.

    use strict;
    use warnings;
    use 5.010;
    
    =head1 NAME
    
    date-fixer.pl - update folder dates to match newest contained file
    
    =head1 SYNOPSIS
    
    date-fixer.pl --directory top_dir_to_fix
                 [--commit]
                 [--verbose]
                 [--includefolders]
                 [--single]
    
    =head1 DESCRIPTION
    
    date-fixer.pl is meant to be used after you've used something like CrashPlan
    to restore your files. The restore process will put the files back with their
    proper dates, but the folders containing those files will be updated to the
    current date (the last time any operation was done in this folder -
    specifically, putting the files back).
    
    date-fixer.pl's default operation is to tell you what it would do; if you want
    it to actually do anything, you need to add the --commit argument to force it
    to actually execute the commands that change the folder dates.
    
    If you supply the --verbose argument, date-fixer.pl will print all the commands
    it is about to execute (and if you didn't specify --includefolders, warn you
    about younger contained folders - see below). You can capture these from STDOUT
    and further process them if you like.
    
    =head2 Younger contained folders and --includefolders
    
    Consider the following:
    
        folder1           (created January 2010 - date is April 2011)
        veryoldfile1  (updated March 2011)
            oldfile2      (updated April 2011)
            folder2       (created June 2012 - date is July 2012)
                newfile   (updated July 2012)
    
    If we update folder1 to only match the files within it, we won't catch that
    folder2's date could actually be much more recent than that of either of the
    files directly contained by folder1. However, if we use contained folder dates
    as well as contained file dates to calculate the "last updated" date of the
    current folder, we may make the date of the current folder considerably more
    recent than it may actually have been.
    
    Example: veryoldfile1 and oldfile2 were updated in March and April 2011.
    Folder2 was updated in June 2012, and newfile was added to it in July 2012.
    The creation of folder2 updates the last-updated date of folder1 to June 2012;
    the addition of newfile updates folder2's last-updated date to July 2012 --
    but the last-updated date of folder1 does not change - it remains June 2012.
    
    If we restore all the files and try to determine the "right" dates to set the
    folder update dates to, we discover that there is no unambiguous way to decide
    what the "right" dates are. If we use the file dates, alone, we'll miss that
    folder2 was created in June (causing folder1 to update to June); if we use
    both file and folder dates, we update folder1 to July 2012, which is not
    accurate either.
    
    date-fixer.pl takes a cautious middle road, defaulting to only using the files
    within a folder to update that folder's last-modified date. If you prefer to
    ensure that the newest date buried in a folder hierarchy always "bubbles up"
    to the top, add the --includefolders option to the command.
    
    date-fixer will, in verbose mode, print a warning for every folder that
    contains a folder younger than itself; you may choose to go back and adjust
    the dates on those folders with
    
    date-fixer.pl --directory fixthisone --includefolders --single
    
    This will, for this one folder, adjust the folder's last-updated date to the
    most recent date of any of the items contained in it.
    
    =head1 USAGE
    
    To fix all the dates in a directory and all directories below it, "bubbling
    up" dates from later files:
    
        date-fixer.pl --directory dir --commit --includefolders
    
    To fix the dates in just one directory based on only the files in it and
    ignoring the dates on any directories it contains:
    
        date-fixer.pl --directory dir --commit --single
    
    To see in detail what date-fixer is doing while recursively fixing dates,
    "bubbling up" folder dates:
    
        date-fixer.pl --directory dir --commit --verbose --includefolders
    
    =head1 NOTES
    
    "Why didn't you use File::Find?"
    
    I conceived the code as a simple recursion; it seemed much easier to go ahead and read the directories
    myself than to go through the mental exercise of transforming the treewalk into an iteration such as I
    would need to use File::Find instead.
    
    =head1 AUTHOR
    
    Joe McMahon, mcmahon@cpan.org
    
    =head1 LICENSE
    
    This code is licensed under the same terms as Perl itself.
    
    =cut
    
    use Getopt::Long;
    use Date::Format;
    
    my($commit, $start_dir, $verbose, $includefolders, $single);
    GetOptions(
        'commit' => \$commit,
        'directory=s' => \$start_dir,
        'verbose|v' => \$verbose,
        'includefolders' => \$includefolders,
        'single' => \$single,
    );
    
    $start_dir or die "Must specify --directory\n";
    
    set_date_from_contained_files($start_dir);
    
    sub set_date_from_contained_files {
        my($directory) = @_;
        return unless defined $directory;
    
        opendir my $dirhandle, $directory
            or die "Can't read $directory: $!\n";
        # Read everything in the directory in one pass.
        my @contents = readdir $dirhandle;
        closedir $dirhandle;
    
        @contents = grep { $_ ne '.' && $_ ne '..' } @contents;    # skip . and ..
        my @dirs = grep { -d "$directory/$_" } @contents;
    
        # Hash the directory names so we can split files from folders.
        my %dirmap;
        @dirmap{@dirs} = ();

        my @files = grep { !exists $dirmap{$_} } @contents;
    
        # Recursively apply the same update criteria unless --single is on.
        unless ($single) {
            foreach my $dir (@dirs) {
                set_date_from_contained_files("$directory/$dir");
            }
        }
    
        my $most_recent_date;
        if (! $includefolders) {
            $most_recent_date = most_recent_date($directory, @files);
            my $most_recent_folder = most_recent_date($directory, @dirs);
            # Warn (verbose mode only) when a contained folder really is
            # younger than the newest contained file, as described in the POD.
            if ($verbose and defined $most_recent_folder
                and (!defined $most_recent_date
                     or $most_recent_folder gt $most_recent_date)) {
                warn "Folders in $directory are more recent ($most_recent_folder) "
                   . "than the most-recent file ("
                   . ($most_recent_date // 'none') . ")\n";
            }
        }
        else {
            $most_recent_date = most_recent_date($directory, @files, @dirs);
        }
    
        if (defined $most_recent_date) {
            # system's list form doesn't go through the shell, so no quoting
            # of odd directory names is needed.
            my @command = (qw(touch -t), $most_recent_date, $directory);
            print "@command\n" if $verbose;
            system @command if $commit;
        }
        else {
            warn "$directory unchanged because it is empty\n" if $verbose;
        }
    }
    
    sub most_recent_date {
        my ($directory, @items) = @_;
        my @dates     = map { (stat "$directory/$_")[9] } @items;
        my @formatted = map { time2str("%Y%m%d%H%M.%S", $_) } @dates;
        # Sort descending so the newest timestamp lands in slot 0.
        my @ordered   = sort { $b cmp $a } @formatted;
        return $ordered[0];
    }