Git: undoer of bad decisions

I’ve been working on a major Perl code refactor since March; this is a fairly critical subsystem that unifies two slightly different ways of doing the same thing under the One True Way. I’m finally starting to come out the far end of this process, having learned several things very much the hard way.

The biggest mistake was not working out the most stepwise possible attack on the problem. I tackled a monolith of code and created a new monolith. The changeset was over a thousand lines of moved, tweaked, hoisted, rewritten, and fixed code – a recipe for failed code review. No matter how simple it seems to you because you’ve been living with the code for months on end, the reviewer will come up against a massive wall of code and bounce off it.

Second, I didn’t clearly establish a baseline set of tests for this code. It was, essentially, not tested. A few features were cursorily tested, but the majority of the code was uncovered. In addition, some code needed to live on the Apache web servers, and some on dumb database servers without Apache, so the structure of the code ended up being two communicating monoliths hooked up to mod_perl.

Third, I squashed too soon. Fifty-some commits were turned into a single commit that, to be fair to me, contained only 90 lines of new code – but in fairness to everyone else, shifted a thousand lines of code around, hoisting a lot to new common libraries, and changing one set to match another.

The code worked, and was provably correct by my tests — but it was an utter failure as far as software engineering was concerned.

After a very educational conversation with my tech lead, Rachel, I headed back to revisit this change and make it into something my co-workers and I could live with.

First: build the infrastructure. I learned from the first try at the code that unit-testing it would not work well. Some of it could be unit-tested, but other parts simply died because they weren’t running under mod_perl, and couldn’t be mocked up to work without it. The best approach seemed to be behavior-driven development: write the necessary interactions as interactions with an Apache instance running enough of the stack for me to test it. Since, luckily, this particular part of the code had very little JavaScript, and none along the critical path, I decided I’d be able to write interaction tests using WWW::Mechanize, and verify that the right things had happened by checking over the headers, cookie jar, and database.

I started off by creating tiny commits to add a couple of support functions for the Web testing — a WWW::Mechanize subclass optimized for our site, and a couple of support methods to make constructing URLs easier.

I then wrote a set of tests, each exercising a specific part of the code in question, and overall verifying that we had a set of tests that described the system behavior as it should be, both for good and bad inputs.

Once this was done, I turned back to the giant monolithic commit. I knew I wanted to unsquash the commits, but I wasn’t sure how, or what was safest. After some reading, I found a good description of using git reflog and git cherry-pick to restore a branch to its unsquashed shape, and a Stack Overflow post with more hints. With a little extra consideration and checking of git cherry-pick’s options, I was able to recover the original set of commits on a new branch. Here’s how:

  1. Start with the output from git reflog. This tracks every commit and branch switch you make. As long as something still points to your pre-squash commits (in this case, the reflog), git won’t discard them.
  2. Scan back for the first reference to the branch that you need to unsquash. Note its SHA1, open another window, and git checkout this SHA1. You’ll now be in “detached head” state.
  3. git checkout -b some-name to create a new branch at the same point your desired branch was in when it was created.
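  4. Now scroll back through the reflog, and git cherry-pick all the commits on this branch. You can save time by cherry-picking ranges (sha1..sha1), which applies the commits in the range oldest first. Note that a range excludes its starting commit; write sha1^..sha1 if you want the first commit included.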
  5. Continue until you’ve reached the last commit you want on this branch. You may end up skipping sets of commits if you’ve been working on other branches too; watch for branch switches away from the desired branch and then back to it.

You may hit minor problems re-applying the commits to the branch; resolve these as you normally would, and then use git cherry-pick --continue to complete the commit or continue the set of commits.
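The steps above can be sketched in a throwaway repository. This is only a minimal sketch under stated assumptions: the branch names (feature, recovered), the file, and the commit messages are made up for illustration, and in a real repository you would read the SHA-1s off the git reflog output by eye rather than capturing them with git rev-parse.

```shell
#!/bin/sh
set -e

# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Baseline commit: the point where the feature branch starts.
echo base > file.txt
git add file.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Three small commits on a (hypothetical) feature branch.
git checkout -q -b feature
for n in 1 2 3; do
    echo "change $n" >> file.txt
    git commit -q -am "step $n"
done
last=$(git rev-parse HEAD)

# Squash too soon: collapse the three commits into one.
git reset -q --soft "$base"
git commit -q -m "squashed"

# Step 1: the pre-squash commits still show up here.
git reflog

# Steps 2-3: check out the starting point (detached HEAD), then name it.
git checkout -q "$base"
git checkout -q -b recovered

# Steps 4-5: replay the original commits; base..last excludes base itself.
git cherry-pick "$base..$last" >/dev/null

recovered_count=$(git rev-list --count "$base..recovered")
subjects=$(git log --format=%s "$base..recovered")
echo "recovered $recovered_count commits"
echo "$subjects"
```

The squashed feature branch is left alone, so you can diff it against the recovered branch to confirm that both end up with the same content.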

Once I had my original commits, I was able to cherry-pick these off the newly-recovered (and working) branch, a few at a time, and create smaller pull requests (add a test, add another test; shift code, verify the test still works; and so on).

The final result was a series of pull requests: tests to validate the current behavior, and then a series of hoists and refactors to get half of the code to the desired point, and then another series to bring the second half in line with the first.

Overall, this was more than a dozen pull requests, and I’m sure that my co-workers got sick of seeing more PRs from me every day, but the result was a properly-tested and properly-refactored set of code, and no one had any complaints about that.

Posted in Uncategorized | Leave a comment

Intro to Testing at SVPerl

As promised, here are the slides from my talk: Intro to Testing.

This version of the talk covers a little more ground than the previous version, and gives you the preferred ways of dealing with leaking output and warnings, and includes the new section on Cucumber.

You can also play with the demo programs which live on GitHub.


Sleep Number beds, QA techniques, and (not) saving $300

We’ve owned a Sleep Number bed since the mid-1990s – an original two-cell queen, with the original old-model wired-remote, loud-as-hell pump. It’s served us well and is still functioning fine after many years of service. When Shymala moved to Connecticut in 2010, she bought a new single-cell full-sized mattress, and took it with her when she moved to Brooklyn in 2012.

This last winter, we moved her Sleep Number bed from Brooklyn to San Jose. In Brooklyn, it had worked fine: it stayed inflated well and was solid and comfortable. We carefully did our research about moving inflatable beds and noted that the consensus was that the bed should be deflated to about 30 (out of 100) while going over the Rockies; the change in altitude would cause the air inside at sea-level pressures to expand significantly, and since the bed was quite new, we wanted to make sure it was going to survive the trip.

We followed the suggested procedure: deflated the bed, check; removed the hose, check; capped it off with the supplied plastic cap, check. Very much like the previous move of this same bed from Connecticut to Brooklyn, which worked just fine. We bagged it up and the movers put it in a box, and off it went to California.

We set it up on the bed frame in the same way it had been used in Brooklyn, reconnected the pump, and inflated it. Halfway through the first night, Shymala found herself all the way down on the slats of the bed frame. The cell had almost completely deflated. We tried it again, and it didn’t deflate as badly, but still too much to be comfortable. We experimented with how much it was inflated, whether rolling around on the bed made it deflate, but couldn’t get any kind of consistent result.

We finally hit on a temporary solution of inflating it to somewhat above the desired firmness, then popping the hose off and installing the cap. This kept the cell reasonably well inflated, but it still deflated slowly over the course of a couple days.

At this point we suspected that there had to be a slow leak in the cell but we couldn’t figure out why. We called Select Comfort, and they were as mystified as we were, which wasn’t encouraging. Their suggestion was that we needed a new cell, so we ordered one. After getting off the phone, we started thinking that since the cell was holding pressure better with the cap on than with the pump, maybe it was the pump instead. So we called back and ordered a pump.

The pump came, and we tested the cell with it. Same result: inflating it to full then waiting resulted in the cell slowly deflating over an hour or so to a setting of 5 on the remote. This meant the pump was not the problem.

The pump had been used for about an hour, so we thought, “great, we can return the pump since it’s not the problem.” We called Select Comfort to arrange a return. Unfortunately, when we purchased the pump and cell, the customer service reps had not mentioned that any parts you buy are non-returnable.

Let me emphasize that, for anyone else trying to debug a Select Comfort problem: items are 100% non-returnable. Do not expect to be able to swap parts to diagnose a problem.

So there’s $118 gone west, and the bed is still screwed up. They did offer to let us return the cell since the reps hadn’t told us about the “you bought it, you’re stuck with it” policy. We hung up, and then started trying to figure out how to debug the problem without blowing any more money.

At this point, we could not be sure where the problem lay for certain. Was it that the pumps both had a similar issue that our configuration was exposing, or was the cell really messed up somehow in a non-obvious way? This resembles the case of a client-server problem in software: the two aren’t communicating, and you don’t have a way to unit test the client and server.

We had unit-tested the cell as best we could: we inflated it, and used the cap to simulate the pump sitting there not running. The slow deflation didn’t tell us anything because we weren’t sure that the two situations were exactly equivalent. We didn’t have a way to unit-test the pump (such as a manometer to verify that the check valve in the pump wasn’t leaking). This meant we had to come up with a way to do a better integration test.

After some thinking, we came up with the idea that there was a second Sleep Number bed available with a different model of pump altogether. If we swapped the pumps, we’d know whether the pump had an issue or if the cell did. (If the new pump held pressure on the old bed, the pump was OK; if the old pump failed to hold pressure on the new cell, the cell was bad.)

The test showed that the cell was bad: the new pump held pressure on the old bed just fine, and the old pump slowly lost pressure on the new bed.

Okay, so that meant that if we had opened the cell first, we would not have needed to open the pump at all. Bad guess on our part! We opened the new cell and installed it on the bed. The hose was considerably more difficult to get on the new cell, and observation showed that there were two O-rings that…aw, crap. I finished installing the hose, pumped up the bed, and as expected, it held pressure just fine with the original pump that we’d brought from Brooklyn.

I then went out and looked at the old cell. The attachment had a small groove in it which contained…no O-ring. Apparently when we disassembled the bed in Brooklyn, the O-ring failed or was pulled off. The connection in California had a good-enough seal to inflate the bed and keep it inflated for a while, but not a good enough one to sleep on. We essentially ended up spending $200 to replace a 25-cent O-ring.


  • Make sure you know baseline conditions before you start testing – the classic “known-good state”. If we’d had a baseline state that included a check for the O-ring, we would have solved this by a short trip to the hardware store.
  • Make sure that your oracle, if you have one, is sufficient. Select Comfort’s customer service does not have a good diagnostic tree available to help them spot this problem, and therefore couldn’t tell us to check for this particular issue.
  • Make sure you know the costs for testing. In this case, swapping parts to fix a Select Comfort bed is a problem, as everything is non-returnable.
  • Know your problem space. When looking at a pneumatic connection, check to see if there are supposed to be O-rings there and verify that they’re still there.
  • Search for the problem; if you have to solve it yourself, document it. No search turned up “look for the O-ring”, so here’s a tip: if you have an intermittent deflation problem with a Select Comfort bed after a move, look for the missing O-ring first.

Samsung SMH7178SME charcoal filter replacement tip

I’ve been in my new condo for 4 months now, and as always when you move into a place, you keep finding out things. I now know what the mystery switch in the hall is supposed to do: it’s for a wall sconce that the previous owner removed from the dining room wall. This results in there being no lighting at all in the dining room other than light from the kitchen. The wall sconce at the bottom of the stairs to the upper bedroom was very pretty, but it was open at the top, so if you turned on the light to go downstairs at night, you got bare bulb right in the eye. I’ve since replaced it with a nice cylindrical fixture from Lowe’s that is closed at the top. The light switches feel a bit worn, and I’ve been replacing them one by one with the flat rocker-style switches, which fit the modern style of the place better anyway.

I also have an over-the-stove microwave that vents inside rather than out. Not optimal, but I’m used to that from my old place in Redwood Shores, so this isn’t a big deal. It’s a Samsung model SMH7178SME, very pretty – all stainless steel, with a cute little flap that tilts out when the vent fan’s running, and closes when it’s not so you have a smooth continuous surface when it’s off. Feature-wise, I prefer the GE models, but this one is perfectly fine, if a bit on the high-powered side.

Being at the personal altitude I am, I couldn’t help noticing that when the fan was running, I could see that the grille inside the flap was fairly gunked up with dust trapped in greasy residue. That didn’t seem right, so I (finally) this week downloaded the manual to check out the right way to clean it.

After reading through, I discovered another little gotcha from the previous owner: both the grease filters and the charcoal filter were missing. (I’m guessing they were horrible and she just removed them instead of replacing them. Very much in line with just removing the wall sconce and conveniently forgetting there was a $2000 bill for deck repairs she hadn’t paid.)

I was able to pick up replacements from (half the price of Sears Parts Direct – sorry, Sears!). They sent the filters in a bubble envelope, which wasn’t really quite enough protection. The filters got bent up a bit in transit. I was able to straighten the grease filters out sufficiently to get them to fit properly, and the charcoal filter was hefty enough that it was okay.

Installing it, however, was a different issue altogether. The manual says you need to remove two screws at the top of the microwave and then “pull off the grille” to access the place where the filter goes. It leaves out that you need to push the grille to the left first to get the tabs at the bottom to unseat!

Once this is done, you can simply pull the whole grille assembly off toward you to pop it off the front of the microwave, and follow the rest of the instructions from the manual – there’s a little place to plop the filter into, where it sits at an angle, tilted toward you.

Putting the grille back was challenging until I hit on opening the flap so I could see inside and line up the bottom tabs; after that it took less than a minute to get it all back together again.

I’ll try it out later today when I make a batch of pasta sauce and see how well it works to disperse odors. I don’t mind my house smelling of good food; I just like the choice of whether it does or not.


xmonad on OS X Mavericks

I installed XQuartz today, and while looking around for a low-distraction window manager, I came across xmonad. It looked interesting, and I started following the installation instructions and found they were out of date. Here’s an updated set of instructions for installing xmonad.

  1. Install XQuartz.
  2. Install homebrew if you don’t already have it.
  3. brew update
  4. brew install ghc cabal-install wget
  5. cabal update
  6. export LIBRARY_PATH=/usr/local/lib:/usr/X11/lib
  7. cabal install xmonad
  8. Launch XQuartz and go to Preferences (command-,). Set the following:
    •  Output
      • Enable “Full-screen mode”
    •  Input
      • Enable “Emulate three button mouse”
      • Disable “Follow system keyboard layout”
      • Disable “Enable key equivalents under X11”
      • Enable “Option keys sent Alt_L and Alt_R”
    •  Pasteboard
      • Enable all of the options

xmonad has been installed in $HOME/.cabal/bin/xmonad. You now need to create an .xinitrc that will make XQuartz run xmonad. Edit ~/.xinitrc and add these lines:

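[[ -f ~/.Xresources ]] && xrdb -load ~/.Xresources
xterm &
# start xmonad as the window manager (path from the cabal install step)
exec $HOME/.cabal/bin/xmonad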

You can now launch XQuartz; nothing seems to happen, but press command-option-A and the xmonad “desktop” (one huge xterm) will appear, covering the whole screen. Great! It’s using the default teeny and nasty xterm font, though. Let’s pretty it up a bit by making it use Monaco instead. Edit ~/.Xresources and add these lines:

xterm*background: Black
xterm*foreground: White
xterm*termName: xterm-color
xterm*faceName: Monaco

Quit XQuartz with command-Q, and then relaunch, then hit command-option-A again to see the XQuartz desktop. The terminal should now be displaying in Monaco.

At this point, you should take a look at the guided tour and get familiar with xmonad. If you’re looking for a distraction-free working environment, this might be good for you. I’m going to give it a try and see how it works out.


Shellshock scanner

So I had a bunch of machines with a standard naming convention that I needed to scan for the Shellshock bug. Since I just needed to run a command on each one and check the output, and I had SSH access, it seemed easy enough to put together a quick script to manage the process.

Here’s a skeleton of that script, with the details on what machines I was logging into elided. This does a pretty reasonable job, checking 300 machines in about a minute. You need to have a more recent copy of Parallel::ForkManager, as versions prior to 1.0 don’t have the ability to return a data structure from the child.

use strict;
use warnings;
use Parallel::ForkManager 1.07;

# $MAX_PROCESSES and @SERVER_NAMES are placeholders for the elided details;
# define them for your own environment before running.
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
my @servers = @SERVER_NAMES;
my %statuses;
my @diagnostics;

# Runs in the parent each time a child exits and collects that child's result.
$pm->run_on_finish(
    sub {
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump,
            $data_structure_reference) = @_;
        if (defined($data_structure_reference)) {
            my ($host_id, $status, $results) = @{$data_structure_reference};
            if ($status eq 'Unknown') {
                push @diagnostics, $host_id, $results;
            } else {
                push @{ $statuses{$status} }, $host_id;
            }
        } else {
            warn qq|No message received from child process $pid!\n|;
        }
    }
);

print "Testing servers: ";
for my $host_id (@servers) {
    # Parent gets the child's PID and moves on; the child falls through.
    my $pid = $pm->start and next;

    # Backtick heredoc: run the probe over SSH and capture stdout and stderr.
    my $result = <<`EOF`;
ssh -o StrictHostKeyChecking=no $host_id <<'ENDSSH' 2>&1
env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
ENDSSH
EOF

    # Classify the output; connection failures are checked first.
    my $status;
    if ($result =~ /Permission denied/is) {
        $status = q{Inaccessible};
    } elsif ($result =~ /key verification failed/s) {
        $status = q{Key changed};
    } elsif ($result =~ /timed out/is) {
        $status = q{Timed out};
    } elsif ($result =~ /vulnerable/s) {
        $status = q{Vulnerable};
    } elsif ($result =~ /ignoring function definition attempt/s) {
        $status = q{Patched};
    } elsif ($result =~ /Name or service not known/s) {
        $status = q{Nonexistent};
    } else {
        $status = q{Unknown};
    }
    print "$host_id, ";

    # Hand the result back to the parent's run_on_finish callback.
    $pm->finish(0, [$host_id, $status, $result]);
}

# Wait for every child so that all results have been collected.
$pm->wait_all_children;
print "done!\n";

for my $status (keys %statuses) {
    print "$status: ", join(',', @{ $statuses{$status} }), "\n";
}
print "The following hosts returned an undiagnosed status:\n",
      join("\n", @diagnostics), "\n";

Note that this doesn’t test the most recent version (#3) of the bug; I have modified it slightly to test for that, but that’s a reasonable exercise for the reader.
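For checking a single machine by hand, the same classification can be sketched in plain shell. This is a sketch, not a port of the script: classify_result is a hypothetical helper name, it copies only the patterns the Perl code matches, and it is case-sensitive throughout (the Perl uses /i on a few of them). A bash with later patches may print neither the "vulnerable" line nor the warning, in which case the result is Unknown.

```shell
#!/bin/sh

# classify_result TEXT: map captured probe output to a status label.
# Hypothetical helper; the patterns and their order follow the Perl script.
classify_result() {
    case "$1" in
        *"Permission denied"*)                    echo "Inaccessible" ;;
        *"key verification failed"*)              echo "Key changed" ;;
        *"timed out"*)                            echo "Timed out" ;;
        *vulnerable*)                             echo "Vulnerable" ;;
        *"ignoring function definition attempt"*) echo "Patched" ;;
        *"Name or service not known"*)            echo "Nonexistent" ;;
        *)                                        echo "Unknown" ;;
    esac
}

# Run the original probe against the local bash and classify what comes back.
output=$(env x='() { :;}; echo vulnerable' bash -c "echo this is a test" 2>&1)
echo "local bash: $(classify_result "$output")"
```

Keeping the connection-failure patterns ahead of the vulnerability patterns means an SSH error is never mistaken for a clean result.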


Laundry part 3: solved

University Electric came through like champs. I found a 24″ GE unitized washer-dryer that would fit and that had generally positive reviews, checked that they were a GE dealer, called them up last Tuesday, and asked if they could get it for me. “Yes. Saturday.” Well then. That’s faster than I expected. They called me Friday to let me know that yes, I was on the schedule 8-11 AM tomorrow. They arrived at 8:30, and they were done and I was taking care of the queued laundry by 9:30.

All in all a very satisfactory experience; I do recommend that you figure out what you want yourself, though – the last-year’s Bosch that they had would have been fine, I’m sure, but the reviews were too up-and-down for me to feel comfortable spending almost $700 more than I would have for the original full-size pair I tried to get in here. I was also a bit doubtful about getting service.

The new machine is a 2.0 cu. ft. washer/4.0 cu. ft. dryer, so it’s not large, but neither is it hideously small. Seems to do a fine job both washing and drying. It has a 240V vented dryer, so it can actually manage to dry the clothes, getting around the problem that people were complaining about with the non-vented and 120V dryers. Doing a good job so far; I’ll wait for a few months’ experience before I try to rate it.



Laundry, part 2, bicycles, and too much sun

First, if you read my blog, and you send music to, please note that my address has changed; check the site for the new address. The new tenant in my old apartment is quite confused by the CDs he’s getting even though I’ve set up forwarding for my mail. Now on to the trivia of everyday life.

So I still haven’t actually gotten anything into the new place to do laundry with. Obviously I’m going to need to do this sometime soon as I cannot wait until I run out of underwear to make the decision on this. Well I can, but I won’t be very popular.

So today I am headed over to University Electric in Santa Clara to see what they can do for me in terms of a washer-dryer that will actually fit into the space that I have. It looks like I’m either going to have to go with a stacked unit similar to the one I had in here before (I wouldn’t wash anybody’s clothes in that, and I suppose it’s just as well the Best Buy guys took it away), or I’m going to have to go with a European washer and dryer. Those are still not very popular here in the US, so I don’t have a very good basis on which to judge them. The ratings tend to be all over the place, from “oh my God best washer ever” to “this is a terrible piece of junk and I wasted my money and I hate life”, so it’s difficult to get a bead on exactly how good or bad they are.

Sorry – just got distracted by a hummingbird in the tree outside the window. Where was I?

I also made a slight misjudgment as far as the crime rate in the local neighborhood. Understand, the place is safe to walk around in, even at night, but there is apparently a potential for petty theft. (Apparently there’s a problem with some of the local high school age kids.) When I arrived, I put my bicycle in the bicycle rack inside the parking garage, and the rest of the bicycles didn’t seem to be locked up. So I figured, “Oh, this must be plenty secure then,” and left it unlocked and didn’t think anything further about it. About a week later I came home, thinking, “hey, I should probably take my bike out for a ride today,” and…no bike. Apparently during the time when the outside of the place was being painted someone came into the garage and lifted my bicycle.

Not really happy about this because I really did like that bicycle quite a lot – it wasn’t the world’s most wonderful or expensive bicycle but it was my bicycle. (It may have been one of the local homeless folks, and in that case I don’t feel quite so bad, but I really wasn’t planning to give my bicycle away – I was planning to ride it.)

A neighbor happened to have what looks like a 1990s-vintage Specialized Ground Control bicycle sitting in his garage which he gave me; according to the folks at REI when I took it in to see what repairs it needed, it’s not worth repairing. I’m going to check in with a local Specialized bicycle shop and see if they have a different take on this; it looks like a really nice mountain bike. If it’s not too terribly expensive to fix up I actually kind of like it. Looks like it’ll need new front forks and probably a new rear shock; the tires are probably also going to need replacing and the brake pads are shot… Okay, so the frame is in good shape…

The REI guy said that I probably ought to consider saving up for a new bike instead because he could probably get me into something for around $200-300, which I’m guessing means that he thought it would be at least that much to fix it. I’ll get a second opinion today at Mike’s Bikes, which is a Specialized shop, and if they say the same, I’ll consider the bike a lost cause, and take it over to Goodwill to drop off.

The other thing today is that I realized that the clerestory windows I have in my main room, beautiful as they are, really let a lot of sun in. I really haven’t spent enough time here to this point to notice this. The AC cools the place off again okay, but they’re going to have to be blocked off at least part of the time; I got toasted enough by the hot sunlight that I needed to put on some anti-sunburn lotion and drink a lot of water. I need to talk to my real estate agent and see if putting in remote-controlled blinds for those windows is a good idea, or if I’ll have to take them down again when I want to sell the place, in which case it’s not worth doing. And I suppose I have to check with the HOA as well and make sure that this is not breaking one of the covenants.

Anyway, overall the new house is really quite nice and livable, or at least it will be as soon as I get all of these bloody boxes out of here. Still in the process of unpacking, and there’s always more stuff you find out you have to have, bring in, assemble, and then get rid of the boxes from that too. My weekends will not be idle for a while yet.

Off to the appliance store; back later.


Best Buy and appliances: avoid

If this had not actually happened to me I would say it had to be made up, but this is a precise report of exactly how badly Best Buy managed to handle a recent attempt to purchase a new washer and dryer.

I spent quite a lot of time researching and finally picked out a washer and dryer for my new place. I did make a mistake as far as size; the units I picked would have fit, but they didn’t leave enough clearance on either side. So I’ll own up to that.

Delivery was on July 19th. Install crew number 1 removed the old washer/dryer combo (side note: if they do not install the new unit do not let them take the old one). They looked at the taps and said, “oh, hey, those look like they might be leaking, you need to get that checked. We can’t install this.” So I am left with no working washer and dryer, and the new ones in the middle of my floor. I get the plumber in, he looks and says, “yep, those need tightening up”. He fixed them, did a pressure test, all good.

I call Best Buy, they can’t get anyone out for a week. Washer and dryer in the middle of the living room.

Second install crew comes, says, “oh, you didn’t buy the installation stuff from Best Buy, we can’t install this”. Despite the fact that the stuff in question was identical to the Best Buy materials. They refused to install it with my parts even when I said fine, I don’t care, I just want working appliances. Nope. They measured and said “It’ll stick out about 3 inches, is that OK?” Fine by me if I can wash my clothes. Washer and dryer still in the living room. I’m starting to think of them as an art piece by this point.

I go to Best Buy and buy the parts they say I need. Three more days before install crew 3 comes out.

Install crew three arrives. “Oh, we can’t install this, it won’t fit.”

I am at this point rendered speechless. I call dispatch and tell them, “you have two guys and a truck here who refuse to install the appliances. Fuck this. I want them gone, now.”

“We can’t do that. We’ll have to send another crew.”

At this point it was lucky I was unarmed. Another two days, crew #4 shows up and takes them away. I call dispatch, who assures me that they’ll do a refund as soon as the appliances get back to dispatch.

It’s August 1st now, in case you’re counting.

I get the return letter on Saturday, and figure it’ll take till Monday. So I wait. Monday, no refund. Tuesday, no refund.

I call.

“You’ll have to go to the store where you purchased the item to finish the return.”

I bought it on the Internet, so I didn’t buy it in a store.

Still have to go to the Santana Row Best Buy to get my refund.

Amazon, those guys are not.


ETL into WordPress: lessons learned

I had a chance this weekend to do a little work on importing a large (4000 or so articles and pages) site into WordPress. It was an interesting bit of work, with a certain amount of learning required on my part – which translated into some flailing around to establish the toolset.

Lesson 1: ALWAYS use a database in preference to anything else when you can.

I wasted a couple hours trying to clean up the data for CSV import using any of a number of WordPress plugins. Unfortunately, CSV import is half-assed at best – more like about quarter-assed, and any cleanup in Excel is excruciatingly slow.

Some of the data came out with mismatched quotes, leaving me with aberrant entries in the spreadsheet that caused Excel to throw an out-of-memory error and refuse to process them when I tried to delete the bad rows or even cells from those bad rows.

Even attempting to work with the CSV data using Text::CSV in Perl was problematic because the site export data (from phpMyAdmin) was fundamentally broken. I chalk that partially up to the charset problems we’ll talk about later.

I loaded up the database using MAMP, which worked perfectly well, and was able to use Perl DBI to pull the pages and posts out without a hitch, even the ones with weirdo character set problems.

Lesson 2: address character set problems first

I had a number of problems with the XMLRPC interface to WordPress (which otherwise is great, see below) when the data contained improperly encoded non-ASCII characters. I was eventually forced to write code to convert the strings into hex, find the bad 3- and 4-character runs, and replace them with the appropriate Latin-1 substitutes (note that these don’t quite match that table – I had to look for the ‘e2ac’ or ‘c3’ delimiter characters in the input to figure out where the bad characters were). Once I hit on this idea, it worked very well.
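The hex-scanning half of that idea can be sketched in shell. This is only a minimal sketch, not the Perl code from the import: to_hex and has_marker are hypothetical helper names, the sample string is made up, and it shows only how to spot a marker byte, not the replacement table.

```shell
#!/bin/sh

# to_hex: read bytes on stdin and print them as one lowercase hex string.
to_hex() {
    od -An -v -tx1 | tr -d ' \n'
}

# has_marker HEX BYTE: succeed if the two-digit BYTE occurs in HEX on a
# byte boundary (so "6c36" does not count as containing "c3").
has_marker() {
    printf '%s\n' "$1" | fold -w 2 | grep -qx "$2"
}

# Sample input: "café" encoded as UTF-8, where the é is the byte pair c3 a9.
hex=$(printf 'caf\303\251' | to_hex)
echo "$hex"

if has_marker "$hex" c3; then
    echo "contains a c3 lead byte"
fi
```

Splitting the hex string into two-character chunks before matching keeps the search aligned to whole bytes, which matters once you start replacing runs.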

Lesson 3: build in checkpointing from the start for large import jobs

The various problems ended up causing me to repeatedly wipe the WordPress posts database and restart the import, which wasted a lot of time. I did not count that toward the overall time needed to complete when I charged my client. If I had, it would have been more like 20-24 hours instead of 6. Fortunately the imports were, until a failure occurred, a start-it-and-forget-it process. It was necessary to wipe the database between tries because WordPress otherwise very carefully preserves all the previous versions, and cleaning them out is even slower.

I hit on the expedient of recording the row ID of an item each time one successfully imported and dumping that list out in a Perl END block. If the program fell over and exited due to a charset problem, I got a list of the rows that had processed OK which I could then add to an ignore list. Subsequent runs could simply exclude those records to get me straight to the stuff I hadn’t done yet and to avoid duplicate entries.

I had previously tried just logging the bad ones and going back to redo those, but it turned out to be easier to exclude than include.
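The checkpoint-and-skip idea can be sketched in a few lines of shell. This is a sketch rather than the original Perl: it appends each row ID to a file as soon as that row succeeds instead of dumping a list from an END block, and import_row, the checkpoint file, the FIXED flag, and the row IDs are all made-up placeholders.

```shell
#!/bin/sh

done_file=$(mktemp)   # hypothetical checkpoint file: one finished row ID per line
FIXED=""              # set to 1 to simulate having fixed the bad row

# import_row ID: stand-in for the real import; row 3 fails until FIXED is set,
# simulating a crash on a charset problem.
import_row() {
    if [ "$1" = "3" ] && [ -z "$FIXED" ]; then
        return 1
    fi
    return 0
}

# run_import IDS...: import every row not already checkpointed,
# stopping at the first failure.
run_import() {
    for id in "$@"; do
        # Skip rows recorded by an earlier run.
        if grep -qxF "$id" "$done_file"; then
            continue
        fi
        if import_row "$id"; then
            echo "$id" >> "$done_file"
        else
            echo "row $id failed; fix it and rerun" >&2
            return 1
        fi
    done
}

run_import 1 2 3 4 5 || true     # first run falls over on row 3
first_pass=$(tr '\n' ' ' < "$done_file")

FIXED=1
run_import 1 2 3 4 5             # second run skips rows 1 and 2
second_pass=$(tr '\n' ' ' < "$done_file")

echo "after first run: $first_pass"
echo "after second run: $second_pass"
```

Writing the checkpoint after every successful row, rather than once at exit, means the list survives even if the process is killed outright.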

Lesson 4: WordPress::API and WordPress XMLRPC are *great*.

I was able to find the WordPress::API module on CPAN, which provides a nice object-oriented wrapper around WordPress XMLRPC. With that, I was able to programmatically add posts and pages about as fast as I could pull them out of the local database.

Lesson 5: XMLRPC just doesn’t support some stuff

You can’t add users or authors via XMLRPC, sadly. In the future, the better thing to do would probably be to log directly in to the server you’re configuring, load the old data into the database, and use the PHP API calls directly to create users and authors as well as directly load the data into WordPress. I decided not to embark on this, this time, because I’m faster and more able in Perl than I am in PHP, and I decided it would be faster to go that way than try to teach myself a new programming language and solve the problem simultaneously.

I’d call this mostly successful. The data made it into the WordPress installation, and I have an XML dump from WordPress that will let me restore it at will. All of the data ended up where it was supposed to go, and it all looks complete. I have a stash of techniques and sample code to work with if I need to do it again.