Tag: tools

  • Postgres array_to_string() and array_agg() to decouple your interface

    Let’s say you’ve got a collection of job data for a set of users that looks like this, and you want to display a nice one-line summary for each user: a count of how many jobs they have in each status.

     │id │ user_id │ job │ status    │
     ├───┼─────────┼─────┼───────────┤
     │ 1 │ 12      │ 1   │ Completed │
     │ 2 │ 12      │ 2   │ Cancelled │
     │ 3 │ 14      │ 3   │ Ready     │
     │ 5 │ 14      │ 4   │ Completed │
     │ 6 │ 14      │ 4   │ Completed │
     │ 7 │ 14      │ 4   │ Cancelled │
     ...

    Here’s the report of summarized job statuses for each user that you want.

    │ user_id │ summary                           │
    ├─────────┼───────────────────────────────────┤
    │ 12      │ 1 Cancelled, 1 Completed          │
    │ 14      │ 1 Cancelled, 2 Completed, 1 Ready │

    I’ll show you how it’s possible to provide this solely with a Postgres SELECT.
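
    To give the flavor before the full walkthrough, here’s a minimal sketch of one way the query can be shaped. (I’m assuming the table is named jobs; the names here are mine for illustration, not necessarily the ones the full solution uses.)

    SELECT user_id,
           array_to_string(array_agg(counted ORDER BY status), ', ') AS summary
      FROM (SELECT user_id, status,
                   count(*) || ' ' || status AS counted
              FROM jobs
             GROUP BY user_id, status) AS per_status
     GROUP BY user_id
     ORDER BY user_id;

    The inner query counts jobs per user and status; the outer one glues those counts into a single ordered string per user, so the display layer never needs to know how the counting works.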


  • Git: undoer of bad decisions

    I’ve been working on a major Perl code refactor since March; this is a fairly critical subsystem that unifies two slightly different ways of doing the same thing under the One True Way. I’m finally starting to come out the far end of this process, having learned several things very much the hard way.

    The biggest mistake was not working out the most stepwise possible attack on the problem. I tackled a monolith of code and created a new monolith. The changeset was over a thousand lines of moved, tweaked, hoisted, rewritten, and fixed code – a recipe for failed code review. No matter how simple it seems to you because you’ve been living with the code for months on end, the reviewer will come up against a massive wall of code and bounce off it.

    Second, I didn’t clearly establish a baseline set of tests for this code. It was, essentially, not tested. A few features were cursorily tested, but the majority of the code was uncovered. In addition, some code needed to live on the Apache web servers, and some on dumb database servers without Apache, so the structure of the code ended up being two communicating monoliths hooked up to mod_perl.

    Third, I squashed too soon. Fifty-some commits were turned into a single commit that, to be fair to me, contained only 90 lines of new code – but in fairness to everyone else, shifted a thousand lines of code around, hoisting a lot of it into new common libraries and changing one set to match the other.

    The code worked, and was provably correct by my tests — but it was an utter failure as far as software engineering was concerned.

    After a very educational conversation with my tech lead, Rachel, I headed back to revisit this change and make it into something my co-workers and I could live with.

    First: build the infrastructure. I learned from the first try at the code that unit-testing it would not work well. Some of it could be unit-tested, but other parts simply died because they weren’t running under mod_perl, and couldn’t be mocked up to work without it. The best approach seemed to be behavior-driven development: write the necessary interactions as tests against an Apache instance running enough of the stack for me to exercise the code. Luckily, this particular part of the code had very little JavaScript, and none along the critical path, so I’d be able to write interaction tests using WWW::Mechanize, and verify that the right things had happened by checking the headers, the cookie jar, and the database.

    I started off by creating tiny commits to add a couple of support functions for the Web testing — a WWW::Mechanize subclass optimized for our site, and a couple of support methods to make constructing URLs easier.

    I then wrote a set of tests, each exercising a specific part of the code in question, and overall verifying that we had a set of tests that described the system behavior as it should be, both for good and bad inputs.
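
    A sketch of the shape these tests took, using stock Test::WWW::Mechanize rather than our subclass (the URLs, form fields, and assertions are invented for illustration; the real suite’s checks were specific to our site):

    use strict;
    use warnings;
    use Test::More;
    use Test::WWW::Mechanize;

    # Hypothetical example: log in and verify the happy path end-to-end.
    my $mech = Test::WWW::Mechanize->new;
    $mech->get_ok( 'http://localhost:8080/login', 'login page loads' );
    $mech->submit_form_ok(
        { fields => { user => 'testuser', password => 'sekrit' } },
        'login form submits',
    );
    $mech->content_contains( 'Welcome', 'login succeeded' );

    done_testing;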

    Once this was done, I turned back to the giant monolithic commit. I knew I wanted to unsquash the commits, but I wasn’t sure how, or what was safest. After some reading, I found a good description of using git reflog and git cherry-pick to restore a branch to its unsquashed shape, and a Stack Overflow post with more hints. With a little extra consideration and checking of git cherry-pick’s options, I was able to recover the original set of commits on a new branch. Here’s how:

    1. Start with the output from git reflog. This tracks all your commits and branch switches. As long as something still references your squashed-away commits (in this case, the reflog does), git won’t discard them.
    2. Scan back for the first reference to the branch that you need to unsquash. Note its SHA1, open another window, and git checkout this SHA1. You’ll now be in “detached head” state.
    3. git checkout -b some-name to create a new branch at the same point your desired branch was in when it was created.
    4. Now scroll back through the reflog, and git cherry-pick all the commits on this branch. You can save time by cherry-picking ranges (sha1..sha1), which git applies in reflog order to the branch.
    5. Continue until you’ve reached the last commit you want on this branch. You may end up skipping sets of commits if you’ve been working on other branches too; watch for branch switches away from the desired branch and then back to it.

    You may hit minor problems re-applying the commits to the branch; resolve these as you normally would, and then use git cherry-pick --continue to complete the commit or continue the set of commits.
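
    Condensed into commands, the recipe looks something like this (the SHA1s and branch name are placeholders):

    $ git reflog                        # find the branch's first commit and note its SHA1
    $ git checkout 1a2b3c4              # detached HEAD at the branch's starting point
    $ git checkout -b recovered-branch  # new branch rooted there
    $ git cherry-pick 1a2b3c4..9f8e7d6  # re-apply the original commits in reflog order
    $ git cherry-pick --continue        # after resolving any conflicts along the way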

    Once I had my original commits, I was able to cherry-pick these off the newly-recovered (and working) branch, a few at a time, and create smaller pull requests (add a test, add another test; shift code, verify the test still works; and so on).

    The final result was a series of pull requests: tests to validate the current behavior, and then a series of hoists and refactors to get half of the code to the desired point, and then another series to bring the second half in line with the first.

    Overall, this was more than a dozen pull requests, and I’m sure that my co-workers got sick of seeing more PRs from me every day, but the result was a properly-tested and properly-refactored set of code, and no one had any complaints about that.

  • xmonad on OS X Mavericks

    I installed XQuartz today, and while looking around for a low-distraction window manager, I came across xmonad. It looked interesting, and I started following the installation instructions and found they were out of date. Here’s an updated set of instructions for installing xmonad.

    1. Install XQuartz.
    2. Install homebrew if you don’t already have it.
    3. brew update
    4. brew install ghc cabal-install wget
    5. cabal update
    6. export LIBRARY_PATH=/usr/local/lib:/usr/X11/lib
    7. cabal install xmonad
    8. Launch XQuartz and go to Preferences (command-,). Set the following:
      •  Output
        • Enable “Full-screen mode”
      •  Input
        • Enable “Emulate three button mouse”
        • Disable “Follow system keyboard layout”
        • Disable “Enable key equivalents under X11”
        • Enable “Option keys sent Alt_L and Alt_R”
      •  Pasteboard
        • Enable all of the options

    xmonad has been installed in $HOME/.cabal/bin/xmonad. You now need to create a ~/.xinitrc that will make XQuartz run xmonad. Edit ~/.xinitrc and add these lines:

    [[ -f ~/.Xresources ]] && xrdb -load ~/.Xresources
    xterm &
    $HOME/.cabal/bin/xmonad

    You can now launch XQuartz; nothing seems to happen, but press command-option-A and the xmonad “desktop” (one huge xterm) will appear, covering the whole screen. Great! It’s using the default teeny and nasty xterm font, though. Let’s pretty it up a bit by making it use Monaco instead. Edit ~/.Xresources and add these lines:

    xterm*background: Black
    xterm*foreground: White
    xterm*termName: xterm-color
    xterm*faceName: Monaco
    

    Quit XQuartz with command-Q, relaunch it, and then hit command-option-A again to bring up the xmonad desktop. The terminal should now be displaying in Monaco.
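
    If you want to tweak xmonad itself, it reads an optional ~/.xmonad/xmonad.hs at startup; here’s a minimal sketch (the settings shown are illustrations, not anything you need):

    -- ~/.xmonad/xmonad.hs
    import XMonad

    main :: IO ()
    main = xmonad defaultConfig
        { terminal    = "xterm"  -- what mod-shift-return launches
        , borderWidth = 2        -- focused-window border, in pixels
        }

    Restart xmonad with mod-q to compile and load the new configuration.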

    At this point, you should take a look at the guided tour and get familiar with xmonad. If you’re looking for a distraction-free working environment, this might be good for you. I’m going to give it a try and see how it works out.

  • youtube-dl: it just works

    I was having trouble watching the Théâtre du Châtelet performance of Einstein on the beach at home; my connection was stuttering and buffering, which makes listening to highly-pulsed minimalist music extremely unrewarding. Nothing like a hitch in the middle of the stream to throw you out of the zone that Glass is trying to establish. (This is a brilliant staging of this opera and you should go watch it Right Now.)

    So I started casting around for a way to download the video and watch it at my convenience. (Public note: I would never redistribute the recording; this is solely to allow me to timeshift the recording such that I can watch it continuously.) I looked at the page and thought, “yeah, I could work this out, but isn’t there a better way?” I searched for a downloader for the site in question, and found it mentioned in a comment in the GitHub pages for youtube-dl.

    I wasn’t 100% certain that this would work, but a quick perusal seemed to indicate that it was a nicely sophisticated Python script that ought to be able to do the job. I checked it out and tried a run; it needed a few things installed, most importantly ffmpeg. At this point I started getting a little excited, as I knew ffmpeg should be quite able to do any re-encoding and so on that the stream might need.

    A quick brew install later, I had ffmpeg, and I asked for the download (this is as far as it had gotten while I was writing this post):

    $ youtube_dl/__main__.py http://culturebox.francetvinfo.fr/einstein-on-the-beach-au-theatre-du-chatelet-146813
     [culturebox.francetvinfo.fr] einstein-on-the-beach-au-theatre-du-chatelet-146813: Downloading webpage
     [culturebox.francetvinfo.fr] EV_6785: Downloading XML config
     [download] Destination: Einstein on the beach au Théâtre du Châtelet-EV_6785.mp4
     ffmpeg version 1.2.1 Copyright (c) 2000-2013 the FFmpeg developers
     built on Jan 12 2014 20:50:55 with Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
     configuration: --prefix=/usr/local/Cellar/ffmpeg/1.2.1 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-nonfree --enable-hardcoded-tables --enable-avresample --enable-vda --cc=cc --host-cflags= --host-ldflags= --enable-libx264 --enable-libfaac --enable-libmp3lame --enable-libxvid
     libavutil 52. 18.100 / 52. 18.100
     libavcodec 54. 92.100 / 54. 92.100
     libavformat 54. 63.104 / 54. 63.104
     libavdevice 54. 3.103 / 54. 3.103
     libavfilter 3. 42.103 / 3. 42.103
     libswscale 2. 2.100 / 2. 2.100
     libswresample 0. 17.102 / 0. 17.102
     libpostproc 52. 2.100 / 52. 2.100
     [h264 @ 0x7ffb5181ac00] non-existing SPS 0 referenced in buffering period
     [h264 @ 0x7ffb5181ac00] non-existing SPS 15 referenced in buffering period
     [h264 @ 0x7ffb5181ac00] non-existing SPS 0 referenced in buffering period
     [h264 @ 0x7ffb5181ac00] non-existing SPS 15 referenced in buffering period
     [mpegts @ 0x7ffb52deb000] max_analyze_duration 5000000 reached at 5013333 microseconds
     [mpegts @ 0x7ffb52deb000] Could not find codec parameters for stream 2 (Unknown: none ([21][0][0][0] / 0x0015)): unknown codec
     Consider increasing the value for the 'analyzeduration' and 'probesize' options
     [mpegts @ 0x7ffb52deb000] Estimating duration from bitrate, this may be inaccurate
     [h264 @ 0x7ffb51f9aa00] non-existing SPS 0 referenced in buffering period
     [h264 @ 0x7ffb51f9aa00] non-existing SPS 15 referenced in buffering period
     [hls,applehttp @ 0x7ffb51815c00] max_analyze_duration 5000000 reached at 5013333 microseconds
     [hls,applehttp @ 0x7ffb51815c00] Could not find codec parameters for stream 2 (Unknown: none ([21][0][0][0] / 0x0015)): unknown codec
     Consider increasing the value for the 'analyzeduration' and 'probesize' options
     Input #0, hls,applehttp, from 'http://ftvodhdsecz-f.akamaihd.net/i/streaming-adaptatif/evt/pf-culture/2014/01/6785-1389114600-1-,320x176-304,512x288-576,704x400-832,1280x720-2176,k.mp4.csmil/index_2_av.m3u8':
     Duration: 04:36:34.00, start: 0.100667, bitrate: 0 kb/s
     Program 0
     Metadata:
     variant_bitrate : 0
     Stream #0:0: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p, 704x396, 12.50 fps, 25 tbr, 90k tbn, 50 tbc
     Stream #0:1: Audio: aac ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 102 kb/s
     Stream #0:2: Unknown: none ([21][0][0][0] / 0x0015)
     Output #0, mp4, to 'Einstein on the beach au Théâtre du Châtelet-EV_6785.mp4.part':
     Metadata:
     encoder : Lavf54.63.104
     Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 704x396, q=2-31, 12.50 fps, 90k tbn, 90k tbc
     Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 48000 Hz, stereo, 102 kb/s
     Stream mapping:
     Stream #0:0 -> #0:0 (copy)
     Stream #0:1 -> #0:1 (copy)
     Press [q] to stop, [?] for help
     frame=254997 fps=352 q=-1.0 size= 1072839kB time=02:49:59.87 bitrate= 861.6kbits/s

    Son of a gun. It works.

    I’m waiting for the download to complete to be sure I got the whole video, but I am pretty certain this is going to work. Way better than playing screen-capture games. We’ll see how it looks when we’re all done, but I’m quite pleased to have it at all. The download appears to be happening at about 10x realtime, so I should have it all in about 24 minutes, give or take (it’s a four-hour, or 240 minute, presentation).

    Update: Sadly, does not work for PBS videos, but you can actually buy those; I can live with that.

  • CrashPlan folder date recovery

    The situation: a friend had a MacBook Air whose motherboard went poof. Fortunately she had backups (almost up-to-date) in CrashPlan, so she did a restore of her home directory, which worked fine in that she had her files, but not so fine in that all the folder last-changed dates now ran from the current date to a couple days previous (it takes a couple days to recover ~60GB of data).

    This was a problem for her, because she partly uses the last-changed date on her folders to help her keep organized. “When was the last time I did anything on project X?” (I should note: she uses Microsoft Word and a couple different photo library managers, so git or the equivalent doesn’t work well for her workflow. She is considering git or the like now for her future text-based work…)

    A check with CrashPlan let us know that they did not track folder update dates and couldn’t restore them. We therefore needed to come up with a way to re-establish as best we could what the dates were before the crash.

    Our original thought was simply to start at the bottom and recursively restore the folder last-updated dates using touch -t, taking the most-recently-updated file in each folder as the folder’s last-updated date. Some research and thought turned up the following:

    • Updating a file updated the folder’s last-updated date.
    • Updating a folder did not update the containing folder’s last-updated date.

    This meant that we couldn’t guarantee that a folder’s last-updated date would accurately reflect the last update of its contents. We decided in the end that the best strategy for her was to “bubble up” the last-updated dates by checking both the files and the folders contained in a subject folder. This way, if a file deep in the hierarchy is updated but nothing above it has been, the file’s last-updated date is applied to its containing folder, and then in turn to each folder above that: because we check folders as well as files, there is always a folder at each level carrying the deeply-nested file’s date upward. This seemed like the better choice for her, as she had no other records of what had been worked on when, and runs a very nested set of folders.

    If you were running a flatter hierarchy, updating the folders only to the last-updated date of the files they contain might be a better choice. Since I was writing a script to do this anyway, it seemed reasonable to implement both, so that you can choose whether to bubble up or not, and even make that choice for a single directory.
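
    Either way, the underlying operation is just touch -t with the newest timestamp found. For reference, a sketch (the path and date are invented):

    # set the folder's modification time to 2012-07-15 09:30:00;
    # touch -t takes [[CC]YY]MMDDhhmm[.SS]
    touch -t 201207150930.00 ~/Documents/project-x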

    This was the genesis of date-fixer.pl. Here’s the script. A more detailed example of why neither approach to restoring the folder dates is perfect is contained in the POD.

    use strict;
    use warnings;
    use 5.010;
    
    =head1 NAME
    
    date-fixer.pl - update folder dates to match newest contained file
    
    =head1 SYNOPSIS
    
    date-fixer.pl --directory top_dir_to_fix
                 [--commit]
                 [--verbose]
                 [--includefolders]
                 [--single]
    
    =head1 DESCRIPTION
    
    date-fixer.pl is meant to be used after you've used something like CrashPlan
    to restore your files. The restore process will put the files back with their
    proper dates, but the folders containing those files will be updated to the
    current date (the last time any operation was done in this folder -
    specifically, putting the files back).
    
    date-fixer.pl's default operation is to tell you what it would do; if you want
    it to actually do anything, you need to add the --commit argument to force it
    to actually execute the commands that change the folder dates.
    
    If you supply the --verbose argument, date-fixer.pl will print all the commands
    it is about to execute (and if you didn't specify --includefolders, warn you
    about younger contained folders - see below). You can capture these from STDOUT
    and further process them if you like.
    
    =head2 Younger contained folders and --includefolders
    
    Consider the following:
    
        folder1           (created January 2010 - date is April 2011)
            veryoldfile 1 (updated March 2011)
            oldfile2      (updated April 2011)
            folder2       (created June 2012 - date is July 2012)
                newfile   (updated July 2012)
    
    If we update folder1 to only match the files within it, we won't catch that
    folder2's date could actually be much more recent than that of either of the
    files directly contained by folder1. However, if we use contained folder dates
    as well as contained file dates to calculate the "last updated" date of the
    current folder, we may make the date of the current folder considerably more
    recent than it may actually have been.
    
    Example: veryoldfile1 and oldfile2 were updated in March and April 2011.
    Folder2 was created in June 2012, and newfile was added to it in July 2012.
    The creation of folder2 updates the last-updated date of folder1 to June 2012;
    the addition of newfile updates folder2's last-updated date to July 2012 --
    but the last-updated date of folder1 does not change - it remains June 2012.
    
    If we restore all the files and try to determine the "right" dates to set the
    folder update dates to, we discover that there is no unambiguous way to decide
    what the "right" dates are. If we use the file dates, alone, we'll miss that
    folder2 was created in June (causing folder1 to update to June); if we use
    both file and folder dates, we update folder1 to July 2012, which is not
    accurate either.
    
    date-fixer.pl takes a cautious middle road, defaulting to only using the files
    within a folder to update that folder's last-modified date. If you prefer to
    ensure that the newest date buried in a folder hierarchy always "bubbles up"
    to the top, add the --includefolders option to the command.
    
    date-fixer will, in verbose mode, print a warning for every folder that
    contains a folder younger than itself; you may choose to go back and adjust
    the dates on those folders with
    
    date-fixer.pl --directory fixthisone --includefolders --single
    
    This will, for this one folder, adjust the folder's last-updated date to the
    most recent date of any of the items contained in it.
    
    =head1 USAGE
    
    To fix all the dates in a directory and all directories below it, "bubbling
    up" dates from later files:
    
        date-fixer.pl --directory dir --commit --includefolders
    
    To fix the dates in just one directory based on only the files in it and
    ignoring the dates on any directories it contains:
    
        date-fixer.pl --directory dir --commit --single
    
    To see in detail what date-fixer is doing while recursively fixing dates,
    "bubbling up" folder dates:
    
        date-fixer.pl --directory dir --commit --verbose --includefolders
    
    =head1 NOTES
    
    "Why didn't you use File::Find?"
    
    I conceived the code as a simple recursion; it seemed much easier to read the
    directories myself than to go through the mental exercise of recasting the
    treewalk to fit File::Find's model.
    
    =head1 AUTHOR
    
    Joe McMahon, mcmahon@cpan.org
    
    =head1 LICENSE
    
    This code is licensed under the same terms as Perl itself.
    
    =cut
    
    use Getopt::Long;
    use Date::Format;
    
    my($commit, $start_dir, $verbose, $includefolders, $single);
    GetOptions(
        'commit' => \$commit,
        'directory=s' => \$start_dir,
        'verbose|v' => \$verbose,
        'includefolders' => \$includefolders,
        'single' => \$single,
    );
    
    $start_dir or die "Must specify --directory\n";
    
    set_date_from_contained_files($start_dir);
    
    sub set_date_from_contained_files {
        my($directory) = @_;
        return unless defined $directory;
    
        opendir my $dirhandle, $directory
            or die "Can't read $directory: $!\n";
        # readdir() in list context returns every entry at once; drop the
        # . and .. entries explicitly.
        my @contents = grep { $_ ne '.' && $_ ne '..' } readdir $dirhandle;
        closedir $dirhandle;

        my @dirs = grep { -d "$directory/$_" } @contents;

        # Hash slice: a quick lookup table of the directory names.
        my %dirmap;
        @dirmap{@dirs} = ();

        my @files = grep { !exists $dirmap{$_} } @contents;
    
        # Recursively apply the same update criteria unless --single is on.
        unless ($single) {
            foreach my $dir (@dirs) {
                set_date_from_contained_files("$directory/$dir");
            }
        }
    
        my $most_recent_date;
        if (! $includefolders) {
            $most_recent_date = most_recent_date($directory, @files);
            my $most_recent_folder = most_recent_date($directory, @dirs);
            # In verbose mode, flag folders whose contained folders are
            # younger than their youngest contained file (see the POD).
            warn "Folders in $directory are more recent ($most_recent_folder) than the most-recent file ($most_recent_date)\n"
                if $verbose
                   and defined $most_recent_folder
                   and defined $most_recent_date
                   and $most_recent_folder gt $most_recent_date;
        }
        else {
            $most_recent_date = most_recent_date($directory, @files, @dirs);
        }
    
        if (defined $most_recent_date) {
            # List-form system() bypasses the shell, so no quoting is needed.
            my @command = (qw(touch -t), $most_recent_date, $directory);
            print "@command\n" if $verbose;
            system @command if $commit;
        }
        else {
            warn "$directory unchanged because it is empty\n" if $verbose;
        }
    }
    
    sub most_recent_date {
        my ($directory, @items) = @_;
        # mtime is field 9 of stat(); format it the way touch -t expects.
        my @dates     = map { (stat "$directory/$_")[9] } @items;
        my @formatted = map { time2str("%Y%m%d%H%M.%S", $_) } @dates;
        # Sort descending so the newest date is first; undef if @items is empty.
        my @ordered   = sort { $b cmp $a } @formatted;
        return $ordered[0];
    }
    
  • pemungkah.com migrated to WordPress

    I’ve decided to consolidate both my website and blog into a single WordPress site. This greatly decreases the friction in adding new stuff to the site, and makes it much easier to update and maintain. I’m currently working on a series of posts about Scape and its internals – first one will be a tour of the basics and how to back up your scapes and name them.

    I’ve grabbed the few posts I had on Blogger under the blog of the same name and have republished them here (with edits).

  • Reassembling the Studio

    My studio’s been half-assembled for the last six months (other things have been taking up my free time). I’ve decided that today I’ll try to get it reassembled and back to the setup I had when I was playing at Different Skies. That lets me use everything together again, gets the Vortex back in business, and lets me get some music recorded again. I have gotten back to the point where MIDI into the laptop works, but a hard disk crash on one of my server disks has caused me to temporarily disconnect the studio machine to get that sorted out again. I do have the Firebox working for MIDI and audio input again, but don’t yet have the Vortex reintegrated.

    Surprisingly, I’m finding that more and more I prefer the virtual instruments I have on the laptop to the internal sounds in the SD-1. Those are still good, mind you – I’ve still not got as playable a sax, flute, violin, and cello anywhere else – but the GarageBand pianos feel better than the internal ones, and they sound right. I’ve just started to scratch the surface on Live at this point, and am considering whether I should spring for the Suite upgrade now so I get the free Live 9 upgrade when it comes out.

    I’m still working on the portable setup, and actually have played out live with the iPad and my computer speakers (it all fit in the backpack except for the subwoofer, which is pretty good). Interesting new iPad instruments keep coming out and are quite impressive; I am wondering whether a Mini would do as a performance device if I stuck to MIDI; the compacted display is already a little tight for the apps that have on-screen keyboards, and I can’t see that being better. I am starting to see things evolving away from my iPad 1 at this point – Scape won’t run in the background there, and Borderlands flat out says it’s not good on the iPad 1.