Author: Joe McMahon

“PG::UnableToSend: SSL error: sslv3 alert bad record mac”: not Postgres, not SSL
If you see this error in the build log for your Rails app (we were getting it during our Cucumber tests):

PG::UnableToSend: SSL error: sslv3 alert bad record mac : ...SQL query...

then what you have is probably not an SSL problem, and probably not a Postgres problem. You should not turn off the SSL connection to Postgres, because more than likely not SSL causing the problem, and it’s not (directly) Postgres either, so don’t disable SSL in Postgres.

What it most likely is, especially if it’s intermittent, is Postgres dropping a connection and then Rails trying to reuse it. This is still not Postgres’s fault.

In my particular case, it turned out my app was using Puma to serve the app, and when you’ve got Puma in there, you need to properly configure it to work with Postgres by setting up a connection pool, like this:
```
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads threads_count, threads_count

preload_app!

rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  ActiveRecord::Base.establish_connection
end

on_worker_shutdown do
  ActiveRecord::Base.connection.disconnect!
end
```
This is a minimal configuration; we’re probably going to want to add more workers later. The important part is the on_worker_boot block. This tells ActiveRecord that it’s time for a new connection pool for this worker. I’ve cut the number of threads to one, as connections cannot be shared across threads.

Once I had this in place, our build passed and all seems good. We may have to add some more workers once we go to production, but that’s a simple matter of updating the Ansible playbook to bump WEB_CONCURRENCY to an acceptable level.

Why didn’t this show up until today? Probably because we hadn’t gotten our application’s test suite up to the point that we used up all our connections and had some drop on us. We’re still early on in our development and are just starting to hit the database a fair amount.

I’m posting this because I searched for hours today for a solution that wasn’t just “turn off your security and it’ll be fine” – which wouldn’t have actually fixed the issue anyway.

[Edited 11/24/15 to add on_worker_shutdown block.]
November 24, 2015
Note on Vagrant UID conflicts
If you’ve restored from backup on OS X and are seeing this message when you issue vagrant up:
```
The VirtualBox VM was created with a user that doesn't match the
current user running Vagrant. VirtualBox requires that the same
user be used to manage the VM that was created. Please re-run 
Vagrant with that user. This is not a Vagrant issue.

The UID used to create the VM was: 503

Your UID is: 2612
```
Vagrant is lying. There’s a file in the directory where you’re trying to start the Vagrant box named .vagrant/machines/default/virtualbox/creator_uid; the UID of the creator is in that file. Change it to your current UID and everything will be fine again.
October 9, 2015
Git: undoer of bad decisions
I’ve been working on a major Perl code refactor since March; this is a fairly critical subsystem that unifies two slightly different ways of doing the same thing under the One True Way. I’m finally starting to come out the far end of this process, having learned several things very much the hard way.

The biggest mistake was not working out the most stepwise possible attack on the problem. I tackled a monolith of code and created a new monolith. The changeset was over a thousand lines of moved, tweaked, hoisted, rewritten, and fixed code – a recipe for failed code review. No matter how simple it seems to you because you’ve been living with the code for months on end, the reviewer will come up against a massive wall of code and bounce off it.

Second, I didn’t clearly establish a baseline set of tests for this code. It was, essentially, not tested. A few features were cursorily tested, but the majority of the code was uncovered. In addition, some code needed to live on the Apache web servers, and some on dumb database servers without Apache, so the structure of the code ended up being two communicating monoliths hooked up to mod_perl.

Third, I squashed too soon. Fifty-some commits were turned into a single commit that, to be fair to me, contained only 90 lines of new code – but in fairness to everyone else, shifted a thousand lines of code around, hoisting a lot to new common libraries, and changing one set to mach another.

The code worked, and was provably correct by my tests — but it was an utter failure as far as software engineering was concerned.

After a very educational conversation with my tech lead, Rachel, I headed back to revisit this change and make it into something my co-workers and I could live with.

First: build the infrastructure. I learned from the first try at the code that unit-testing it would not work well. Some of it could be unit-tested, but others simply died because they weren’t running under mod_perl, and couldn’t be mocked up to work without it. The best approach seemed to be to use a behavior-driven development approach: write the necessary interactions as interactions with an Apache instance running enough of the stack for me to test it. I decided that since, luckily, this particular part of the code had very little Javascript, and none along the critical path, I’d be able to write interaction tests using WWW::Mechanize, and verify that the right things had happened by checking over the headers and cooke jar and database.

I started off by creating tiny commits to add a couple of support functions for the Web testing — a WWW::Mechanize subclass optimized for our site, and a couple of support methods to make constructing URLs easier.

I then wrote a set of tests, each exercising a specific part of the code in question, and overall verifying that we had a set of tests that described the system behavior as it should be, both for good and bad inputs.

Once this was done, I turned back to the giant monolithic commit. I knew I wanted to unsquash the commits, but I wasn’t sure how, or what was safest. After some reading, I found a good description of using git reflog and git cherry-pick to restore a branch to its unsquashed shape, and a Stack Overflow post with more hints. With a little extra consideration and checking of git cherry-pick’s options, I was able to recover the original set of commits on a new branch. Here’s how:
1. Start with the output from git reflog. This tracks all your commits and branch switches. As long as your squashed commits point to something (in this case it’s the reflog), git won’t discard them.
2. Scan back for the first reference to the branch that you need to unsquash. Note its SHA1, open another window, and git checkout this SHA1. You’ll now be in “detached head” state.
3. git checkout -b some-name to create a new branch at the same point your desired branch was in when it was created.
4. Now scroll back through the reflog, and git cherry-pick all the commits on this branch. You can save time by cherry-picking ranges (sha1..sha1), which will apply them in reflog order to the branch
5. Continue until you’ve reached the last commit you want on this branch. You may end up skipping sets of commits if you’ve been working on other branches too; watch for branch switches away from the desired branch and then back to it.
You may hit minor problems re-applying the commits to the branch; resolve these as you normally would, and then use git cherry-pick –continue to complete the commit or continue the set of commits.

Once I had my original commits, I was able to cherry-pick these off the newly-recovered (and working) branch, a few at a time, and create smaller pull requests (add a test, add another test; shift code, verify the test still works; and so on).

The final result was a series of pull requests: tests to validate the current behavior, and then a series of hoists and refactors to get half of the code to the desired point, and then another series to bring the second half in line with the first.

Overall, this was more than a dozen pull requests, and I’m sure that my co-workers got sick of seeing more PRs from me every day, but the result was a properly-tested and properly-refactored set of code, and no one had any complaints about that.
October 2, 2015
Intro to Testing at SVPerl

As promised, here are the slides from my talk: Intro to Testing.

This version of the talk covers a little more ground than the previous version, and gives you the preferred ways of dealing with leaking output and warnings, and includes the new section on Cucumber.

You can also play with the demo programs which live on GitHub.

June 6, 2015
Sleep Number beds, QA techniques, and (not) saving $300
We’ve owned a Sleep Number bed since the mid-1990’s – an original two-cell queen, with the original old model wired-remote, loud-as-hell pump. It’s served us well and is still functioning fine after many years of service. When Shymala moved to Connecticut in 2010, she bought a new single-cell full-sized mattress, and took it with her when she moved to Brooklyn in 2012.

This last winter, we moved her Sleep Number bed from Brooklyn to San Jose. In Brooklyn, it had worked fine: it stayed inflated well and was solid and comfortable. We carefully did our research about moving inflatable beds and noted that the consensus was that the bed should be deflated to about 30 (out of 100) while going over the Rockies; the change in altitude would cause the air inside at sea-level pressures to expand significantly, and since the bed was quite new, we wanted to make sure it was going to survive the trip.

We followed the suggested procedure: deflated the bed, check; removed the hose, check; capped it off with the supplied plastic cap, check. Very much like the previous move of this same bed from Connecticut to Brooklyn, which worked just fine. We bagged it up and the movers put it in a box, and off it went to California.

We set it up on the bed frame in the same way it had been used in Brooklyn, reconnected the pump, and inflated it. Halfway through the first night, Shymala found herself all the way down on the slats of the bed frame. The cell had almost completely deflated. We tried it again, and it didn’t deflate as badly, but still too much to be comfortable. We experimented with how much it was inflated, whether rolling around on the bed made it deflate, but couldn’t get any kind of consistent result.

We finally hit on a temporary solution of inflating it to somewhat above the desired firmness, then popping the hose off and installing the cap. This kept the cell reasonably well inflated, but it still deflated slowly over the course of a couple days.

At this point we suspected that there had to be a slow leak in the cell but we couldn’t figure out why. We called Select Comfort, and they were as mystified as we were, which wasn’t encouraging. Their suggestion was that we needed a new cell, so we ordered one. After getting off the phone, we started thinking that since the cell was holding pressure better with the cap on than with the pump, maybe it was the pump instead. So we called back and ordered a pump.

The pump came, and we tested it with the pump. Same result: inflating it to full then waiting resulted in the cell slowly deflating over an hour or so to a setting of 5 on the remote. This meant the pump was not the problem.

The pump’s been used for about an hour, so we thought, “great, we can return the pump since it’s not the problem.” We called Select Comfort to arrange a return. Unfortunately, the customer service reps did not mention when we purchased the pump and cell that any parts you buy are non-returnable.

Let me emphasize that, for anyone else trying to debug a Select Comfort problem: items are 100% non-returnable. Do not expect to be able to swap parts to diagnose a problem.

So there’s $118 gone west, and the bed is still screwed up. They did offer to let us return the cell since the reps hadn’t told us that about the “you bought it, you’re stuck with it” policy. We hung up, and then started trying to figure out how to debug the problem without blowing any more money.

At this point, we could not be sure where the problem lay for certain. Was it that the pumps both had a similar issue that our configuration was exposing, or was the cell really messed up somehow in a non-obvious way? This resembles the case of a client-server problem in software: the two aren’t communicating, and you don’t have a way to unit test the client and server.

We had unit-tested the cell as best we could: we inflated it, and used the cap to simulate the pump sitting there not running. The slow deflation didn’t tell us anything because we weren’t sure that the two situations were exactly equivalent. We didn’t have a way to unit-test the pump (such as a manometer to verify that the check valve in the pump wasn’t leaking). This meant we had to come up with a way to do a better integration test.

After some thinking, we came up with the idea that there was a second Sleep Number bed available with a different model of pump altogether. If we swapped the pumps, we’d know whether the pump had an issue or if the cell did. (If the new pump held pressure on the old bed, the pump was OK; if the old pump failed to hold pressure on the new cell, the cell was bad.)

The test showed that the cell was bad: the new pump held pressure on the old bed just fine, and the old pump slowly lost pressure on the new bed.

Okay, so that meant that if we had opened the cell first, we would have not needed to open the pump at all. Bad guess on our part! We opened the new cell and installed it on the bed. The hose was considerably more difficult to get on the new cell, and observation showed that there were two O-rings that…aw, crap. I finished installing the hose, pumped up the bed, and as expected, it held pressure just fine with the original pump that we’d brought from Brooklyn.

I then went out and looked at the old cell. The attachment had a small groove in it which contained…no O-ring. Apparently when we disassembled the bed in Brooklyn, the O-ring failed or was pulled off. The connection in California had a good-enough seal to inflate the bed and keep it inflated for a while, but not a good enough one to sleep on. We essentially ended up spending $200 to replace a 25-cent O-ring.

Conclusions:
- Make sure you know baseline conditions before you start testing – the classic “known-good state”. If we’d had a baseline state that included a check for the O-ring, we would have solved this by a short trip to the hardware store.
- Make sure that your oracle, if you have one, is sufficient. Select Comfort’s customer service does not have a good diagnostic tree available to help them spot this problem, and therefore couldn’t tell us to check for this particular issue.
- Make sure you know the costs for testing. In this case, swapping parts to fix a Select Comfort bed is a problem, as everything is non-returnable.
- Know your problem space. When looking at a pneumatic connection, check to see if there are supposed to be O-rings there and verify that they’re still there.
- Search for the problem; if you have to solve it yourself, document it. No search turned up “look for the O-ring”, so here’s a tip: if you have an intermittent deflation problem with a Select Comfort bed after a move, look for the missing O-ring first.
April 4, 2015
Samsung SMH7178SME charcoal filter replacement tip

I’ve been in my new condo for 4 months now, and as always when you move into a place, you keep finding out things. I now know what the mystery switch in the hall is supposed to do: it’s for a wall sconce that the previous owner removed from the dining room wall. This results in their being no lighting at all in the dining room other than light from the kitchen. The wall sconce at the bottom of the stairs to the upper bedroom was very pretty, but it was open at the top, so if you turned on the light to go downstairs at night, you got bare bulb right in the eye. I’ve since replaced it with a nice cylindrical fixture from Lowe’s that is closed at the top. The light switches feel a bit worn, and I’ve been replacing them one by one with the flat rocker-style switches, which fit the modern style of the place better anyway.

I also have an over-the-stove microwave that vents inside rather than out. Not optimal, but I’m used to that from my old place in Redwood Shores, so this isn’t a big deal. It’s a Samsung model SMH7178SME, very pretty – all stainless steel, with a cute little flap that tilts out when the vent fan’s running, and closes when it’s not so you have a smooth continuous surface when it’s off. Feature-wise, I prefer the GE models, but this one is perfectly fine, if a bit on the high-powered side.

Being at the personal altitude I am, I couldn’t help noticing that when the fan was running, I could see that the grille inside the flap was fairly gunked up with dust trapped in greasy residue. That didn’t seem right, so I (finally) this week downloaded the manual to check out the right way to clean it.

After reading through, I discovered another little gotcha from he previous owner: both the grease filters and the charcoal filter were missing. (I’m guessing they were horrible and she just removed then instead of replacing them. Very much in line with just removing the wall sconce and conveniently forgetting there was a $2000 bill for deck repairs she hadn’t paid.)

I was able to pick up replacements from PartSelect.com (half the price of Sears Parts Direct – sorry, Sears!). They sent the filters in a bubble envelope, which wasn’t really quite enough protection. The filters got bent up a bit in transit. I was able to straighten the grease filters out sufficiently to get them to fit properly, and the charcoal filter was hefty enough that it was okay.

Installing it, however, was a different issue altogether. The manual says you need to remove two screws at the top of the microwave and then “pull off the grille” to access the place where the filter goes. It leaves out that you need to push the grille to the left first to get the tabs at the bottom to unseat!

Once this is done, you can simply pull the whole grille assembly off toward you to pop it off the front of the microwave, and follow the rest of the instructions from the manual – there’s a little place to plop the filter into, where it sits at an angle, tilted toward you.

Putting the grille back was challenging until I hit on opening the flap so I could see inside and line up the bottom tabs; after that it was less than a minute to all back together again.

I’ll try it out later today when I make a batch of pasta sauce and see how well it works to disperse odors. I don’t mind my house smelling of good food; I just like the choice of whether it does or not.

November 9, 2014
Automating ingest of StillStream signups

I help manage the StillStream internet radio site. We’re in the midst of a conversion of the site to WordPress to make it a lot more usable for both our listeners and our DJs, but at the moment we’re still running off our old custom PHP site coded by Palancar back when he first created the station.

One of the things this means is that we get all new signups by artists as emails delivered to me, and I have to manually enter the signup details into the site via PHPmyadmin. Our new system will automatically store the data in the database and let us approve the artists to be shown publicly, but for right now it’s a lot of cutting and pasting. Tedious and potentially easy to make a mistake, so I decided to see if I could automate it at least a little.

First, I’ve already got Gmail tagging the signups with their own label. gmvault, a nice Python program that can connect to Gmail and back up one’s emails, has recently been enhanced to allow you to pull down emails with a specific label. I log in to Gmail, check what the proper “in:” string is to select those emails, and then run gmvault to download just those:

gmvault sync –type custom –gmail-req “in:stillstream-inbox-artists” my.email@gmail.com

Joes-MacBook-Pro:~ joemcmahon$ cd gmvault-db/db/

Joes-MacBook-Pro:db joemcmahon$ ls

2014-05 2014-06 2014-07 2014-08 2014-09 2014-10 chats

(The ‘chats’ directory only contains Google Chat messages, so I’ll just ignore it.) Now I’ve got the emails, and I need to process them. Each email is essentially a dump of data from the signup forms. Here’s a sample email body – I’ve dropped the headers because they’re all the same and don’t contain any data I use for the process, and the field contents are redacted so my example artist doesn’t get spammed.

An artist has digitally signed the StillStream agreement.

Full Name: [John Doe]

Artist Name: [Does in the Woods]

URL: [http:/doesinthewoods.com]

Addr1: [123 Someplace Street]

Addr2: []

City: [Bay City]

State: [AK]

Zip: [99999]

Country: [US]

Email: [john.doe@doesinthewoods.com]

Phone: [715-555-1212]

Agreement Version: [3]

REMEMBER – artist is not in the database automatically any more – you must add them BY HAND.

What I need to do now is transform this into a nice MySQL UPDATE or INSERT statement I can execute on the StillStream server to add the artist to our database. A quick little Perl program seems the right idea.

October 25, 2014
xmonad on OS X Mavericks
I installed XQuartz today, and while looking around for a low-distraction window manager, I came across xmonad. It looked interesting, and I started following the installation instructions and found they were out of date. Here’s an updated set of instructions for installing xmonad.
1. Install XQuartz.
2. Install homebrew if you don’t already have it.
3. brew update
4. brew install ghc cabal-install wget
5. cabal update
6. export LIBRARY_PATH=/usr/local/lib:/usr/X11/lib
7. cabal install xmonad
8. Launch XQuartz and go to Preferences (command-,). Set the following:
  - Output
    
    Enable “Full-screen mode”
  - Input
    
    Enable “Emulate three button mouse”
    
    Disable “Follow system keyboard layout”
    
    Disable “Enable key equivalents under X11”
    
    Enable “Option keys sent Alt_L and Alt_R”
  - Pasteboard
    
    Enable all of the options
monad has been installed in $HOME/.cabal/bin/xmonad. You now need to create an .xinitrc that will make XQuartz run monad. Edit ~/.xinitrc and add these lines:
```
[[ -f ~/.Xresources ]] && xrdb -load ~/.Xresources
xterm &
$HOME/.cabal/bin/xmonad
```
You can now launch XQuartz; nothing seems to happen, but press command-option-A and the xmonad “desktop” (one huge xterm) will appear, covering the whole screen. Great! It’s using the default teeny and nasty xterm font, though. Let’s pretty it up a bit by making it use Monaco instead. Edit ~/.xresources and add these lines:
```
xterm*background: Black
xterm*foreground: White
xterm*termName: xterm-color
xterm*faceName: Monaco
```
Quit XQuartz with command-Q, and then relaunch, then hit command-option-A again to see the XQuartz desktop. The terminal should now be displaying in Monaco.

At this point, you should take a look at the guided tour and get familiar with xmonad. If you’re looking for a distraction-free working environment, this might be good for you. I’m going to give it a try and see how it works out.
October 23, 2014

Shellshock scanner

So I had a bunch of machines with a standard naming convention that I needed to scan for the Shellshock bug. Since I just needed to run a command on each one and check the output, and I had SSH access, it seemed easy enough to put together a quick script to manage the process.

Here’s a skeleton of that script, with the details on what machines I was logging into elided. This does a pretty reasonable job, checking 300 machines in about a minute. You need to have a more recent copy of Parallel::ForkManager, as versions prior to 1.0 don’t have the ability to return a data structure from the child.

$|++;
use strict;
use warnings;
use Parallel::ForkManager 1.07;

my $MAX_PROCESSES = 25;
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
my @servers = @SERVER_NAMES;
my %statuses;
my @diagnostics;
$pm-> run_on_finish (
    sub {
        my($pid, $exit_code, $ident, $exit_signal, $core_dump,
           $data_structure_reference) = @_;
        if (defined($data_structure_reference)) { 
            my ($host_id, $status, $results) = @{$data_structure_reference};
            if ($status eq 'Unknown') {
                push @diagnostics, $host_id, $results;
            } else {
                push @{ $statuses{$status} }, $host_id;
            }
        } else { 
            warn qq|No message received from child process $pid!\n|;
        }
    }
);

print "Testing servers: ";
for my $host_id (@servers) {
    my $pid = $pm->start and next;
    my $result = << `EOF`;
ssh -o StrictHostKeyChecking=no $host_id <<'ENDSSH' 2>&1
env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
ENDSSH
EOF
    my $status;
    if ($result =~ /Permission denied/is) {
       $status = q{Inacessible};
    } elsif ($result =~ /key verification failed/s) {
       $status = q{Key changed};
    } elsif ($result =~ /timed out/is) {
       $status = q{Timed out};
    } elsif ($result =~ /vulnerable/s) {
           $status = q{Vulnerable};
    } elsif ($result =~ /ignoring function definition attempt/s) {
       $status = q{Patched};
    } elsif ($result =~ /Name or service not known/s) {
       $status = q{Nonexistent};
    } else {
       $status = q{Unknown}
    }
    print "$host_id, ";
    $pm->finish(0, [$host_id, $status, $result]);
}
$pm->wait_all_children;
print "done!\n";
for my $status (keys %statuses) {
    print "$status: ",join(',', @{$statuses{$status}}), "\n";
}
print "The following hosts returned an undiagnosed status:",
      join("\n", @diagnostics), "\n";

Note that this doesn’t test the most recent version (#3) of the bug; I have modified it slightly to test for that, but that’s a reasonable exercise for the reader.

September 28, 2014

Laundry part 3: solved

University Electric came through like champs. I found a 24″ GE unitized washer-dryer that would fit and that had generally positive reviews, checked that they were a GE dealer, called them up on last Tuesday, and asked if they could get it for me. “Yes. Saturday.” Well then. That’s faster than I expected. They called me Friday to let me know that yes, I was on the schedule 8-11 AM tomorrow. They arrived at 8:30, and they were done and I was taking care of the queued laundry by 9:30.

All in all a very satisfactory experience; I do recommend that you figure out what you want yourself, though – the last-year’s Bosch that they had would have been fine, I’m sure, but the reviews were too up-and-down for me to feel comfortable spending almost $700 more than I would have for the original full-size pair I tried to get in here. I was also a bit doubtful about getting service.

The new machine is a 2.0 cu. ft. washer/4.0 cu ft. dryer, so it’s not large, but neither is it hideously small. Seems to do a fine job both washing and drying. It has a 240V vented dryer, so it can actually manage to dry the clothes, getting around the problem that people were complaining about the non-vented and 120V dryers. Doing a good job so far; I’ll wait for a few months’ experience before I try to rate it.

September 28, 2014