Category: Programming

  • A quick note on Mac malware

    The most recent bit of OS X malware making the rounds is a fake file converter program that installs a PHP backdoor accessible via Tor, allowing some rando to rummage around on your machine. As usual, you, the victim, have to download the software, run it, and allow it to install its backdoor code.

    If you want to protect yourself from malware on your Mac, there are three principal things you can do.

    1. Don’t download software if you’re not 100% sure of its provenance, function, and dependability. Just because it’s on MacUpdate doesn’t mean it’s okay.
    2. If you do accidentally download something – say you’re using one of those dodgy file-sharing sites with fifteen ads masquerading as the real download button – don’t run it. Just delete it.
    3. If, despite all this, you did download something and run it, under no circumstances enter your administrator password when it asks to install “support software” or the like unless you know exactly what’s going to happen if you do.

    You still have the backstop that nothing is going to get installed as a persistent backdoor if you don’t enter your administrator password when prompted, but it’s trivially easy to build a backdoor that runs only when the software is running. Don’t run random programs you aren’t sure you can trust. Find a trusted third party that knows for certain that the program is safe.

    If you insist on living dangerously, there are a couple of utilities I’ll mention below that will allow you to try to prevent the damage, and I emphasize try. They are no guarantee of safety; if you download pirated software or random things to “try them out” or “see if they work”, you’re sooner or later going to mess yourself up. These monitors count on your own software sophistication to protect you from harm. They are only useful if you understand what they are telling you.

    I run the programs I mention below not because they magically keep me safe from bad programs, but because I like to know what’s going on behind the scenes on my Mac. If you aren’t certain you know what ~/Library/LaunchAgents does, or what a connection over port 443 means, you may not want to try using these programs, because they will confuse you; if you try to use them by simply blocking everything, you’ll find that things that are actually supposed to make outgoing connections (like Mail) will stop working, and that software that really needs to install agents, like Dropbox, will break. Conversely, if you just say “yes” to everything, things like the fake file converter mentioned above will get to install their hooks and they will allow who knows who to read your mail and download your naked selfies.

    If I haven’t lost you at this point – you understand OS X/Unix well enough to know which connections are good and which are bad, and you know what a file in ~/Library/LaunchAgents is for:

    • Little Snitch is a program that sits in the background and alerts you whenever your machine tries to make a network connection, whether incoming or outgoing. If you don’t respond, the connection is automatically blocked. You can add rules to automatically allow or block connections. This utility will let you know if someone is actively trying to connect to your machine, or if your machine is trying to make an unexpected outgoing connection.
    • BlockBlock is a utility that monitors attempts to install long-running processes, like the one that constitutes the Tor/PHP backdoor, and reports them to you with the option to block them. In the case of EasyDoc Converter, it’d be pretty easy to spot that the software was up to no good, as it tries to install items named “dropbox” in an attempt to pass its nasty software off as part of something legitimate.

    As helpful and useful as these monitors are – I run them, and I like them – they’re still not going to 100% protect you from what happens if you run random things you download from the Internet, especially if you say “sure, why not?” when they ask for your administrator password.

    Just avoid the off-the-wall random links and wait until someone reputable has said, “yeah, that’s good” before trying it.

  • Squashing commits in the same branch

    Okay, I’m sure this is documented lots of places, but I just now figured out that git rebase -i is perfectly happy to rebase from a commit on the same branch, so if you’ve got a branch that you’d like to smoosh some commits on, do a git log to get the SHA1 of the commit just before the first one you want to squash, then

    git rebase -i <SHA1>

    It will happily show you all the commits and you can then twiddle to your heart’s content to clean up.
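
    For the curious, the editor that comes up shows something like the list below (the SHA1s and messages here are invented). Change pick to squash on each commit you want folded into the one above it (or fixup if you also want to throw its message away), then save and quit, and git will combine them:

    pick 1a2b3c4 Add first cut of the widget parser
    squash 5d6e7f8 Fix off-by-one in the widget parser
    squash 9f8e7d6 Typo in parser error message

    The one thing to watch is the first commit in the list: it has to stay pick, since there’s nothing above it to squash into.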

  • Creating a submissions log with Airtable

    Okay, so Twitter ads were useful for once. They pointed me to Airtable, a very slick site that sits halfway between a spreadsheet and a database. You create tables, like you would in a database, but adding, updating, and deleting columns is as easy as doing the same in a spreadsheet – plus you can link between tables easily.

    I was asked to put together a very simple submissions tracking application. “Show me a list of markets with deadlines, and a list of what I’ve sent where, the date I sent it, and the date I heard back”. I was able, with the help of Airtable, to put this together very quickly. (I would note that the sample sales tracking database was instrumental in understanding how to set things up; I encourage anyone who wants to try their hand at a similar application to take a look at the samples, which are really well done.)

    One design note: you’ll make it easier on yourself if you go bottom-up in your design. It’s possible to go top-down, but it means creating columns that you have to backfill later with links; in my case, I created the submissions table first, which is dependent on the pieces and markets. Because I did it that way, I ran into the restriction that the first column can’t be a link, and I had to change it to a formula text column instead. I’ll show it last, because it’s easier to understand that way.

    I created a table of “pieces” – things to be submitted. This was pretty simple: a title, the kind of piece, date written, and a notes field – “everything needs a notes field.”

    [Screenshot: the Pieces table]

    The kind is a selection column:

    [Screenshot: the piece type selection field]

    Then a table of markets – places to submit things. That had a name, a deadline date (most markets have submission deadlines; even though those change, it’s easier to put in the current one and update it later than to try to have a table of them and fill in the right one – simple is the watchword), and of course notes.

    [Screenshot: the Markets table]

    Now we can properly fill out the Submissions table. The first column, because it can’t be a table link, was set to

    Piece & "sent to " & Market & "on " & {Date submitted}

    (This means empty records have a note that reads “sent to on”. Oh well.) Market is set to a link to the Markets table, and Piece to the Pieces table. Date submitted and response date are of course dates, and the Response is another selection column:

    [Screenshot: the Submission status field]

    Plus as always, notes. Here’s the final table:

    [Screenshot: the final Submissions table]

    To use this, one adds pieces to the Pieces table and markets and deadlines to the Markets table. The Markets table can be sorted by date to see where pieces can usefully be submitted, and the submissions logged in the Submissions table by clicking on the Piece and Market cells to fill them in from the available options, followed by the submission date once they’re sent out. The table has to be updated manually once a response comes back.

    The filters and sorting can be used to figure out what’s out to a given market, what’s been rejected or accepted by a given market, what’s been sent to a market, and so on – and it provides a handy log, when sorted into submission order, of the work done; filtering even lets one see how much has been done over a period.

    This was demoed and found to be very usable. The flexibility to add more data to better slice and dice how submissions are going should make this easy to customize as needed.

    There are some shortcomings:

    • Sharing is only by email, so I can’t put up a public copy of this Airtable to show off.
    • Lock-in: I can’t run this anywhere else; when it passes 1200 records, I’ll have to upgrade to a pay tier – and if Airtable goes poof, so does my database.
    • I can’t do full-up relational queries, like “show me all the short stories rejected by market X or Y” easily. I can use filters to select only the short stories and items with status rejected and in market X, but “or” operations are harder.
    • Automatically creating many-to-one, one-to-many, and many-to-many mappings may be possible, but it’s not straightforward.
    • It’s possible to do backups, but not easy. Snapshots are OK unless there’s data loss at Airtable itself, in which case the data might be permanently gone. My best option is probably to write a program using the Airtable API to pull the data and back it up locally (a rough sketch follows this list), and another to reload it if that ever happens. (The tables would have to be reconstructed by hand; there’s no way to download or upload a schema, either.)
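
    If I ever go that route, a minimal sketch of the backup half might look something like this. The base ID, table name, and output filename are placeholders, and I’m relying on the public REST API’s basic shape: requests carry a bearer token, and results come back a page at a time with an offset token for the next page.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use JSON qw(decode_json encode_json);

    # Placeholders: fill in your own base ID and table name, and put your
    # Airtable API key in the AIRTABLE_API_KEY environment variable.
    my $base_id = 'appXXXXXXXXXXXXXX';
    my $table   = 'Submissions';
    my $api_key = $ENV{AIRTABLE_API_KEY} or die "Set AIRTABLE_API_KEY first\n";

    my $ua  = LWP::UserAgent->new;
    my $url = "https://api.airtable.com/v0/$base_id/$table";
    my (@records, $offset);

    # Keep asking for pages until no offset token comes back.
    do {
        my $page_url = defined $offset ? "$url?offset=$offset" : $url;
        my $response = $ua->get($page_url, Authorization => "Bearer $api_key");
        die 'Airtable request failed: ', $response->status_line, "\n"
            unless $response->is_success;
        my $data = decode_json($response->decoded_content);
        push @records, @{ $data->{records} };
        $offset = $data->{offset};
    } while (defined $offset);

    # Dump everything to a local JSON file as the backup.
    open my $fh, '>', 'submissions-backup.json' or die $!;
    print {$fh} encode_json(\@records);
    close $fh;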

    Overall, because I needed to put together something simple, usable, flexible, and done in a very short time, Airtable was the right call for this application.


  • “PG::UnableToSend: SSL error: sslv3 alert bad record mac”: not Postgres, not SSL

    If you see this error in the build log for your Rails app (we were getting it during our Cucumber tests):

    PG::UnableToSend: SSL error: sslv3 alert bad record mac
    : ...SQL query...

    then what you have is probably not an SSL problem, and probably not a Postgres problem. Don’t turn off the SSL connection to Postgres: more than likely SSL isn’t what’s causing the problem, and it isn’t (directly) Postgres either.

    What it most likely is, especially if it’s intermittent, is Postgres dropping a connection and then Rails trying to reuse it. This is still not Postgres’s fault.

    In my particular case, it turned out we were using Puma to serve the app, and when you’ve got Puma in there, you need to configure it properly to work with Postgres, setting up a fresh connection pool in each worker, like this:

    workers Integer(ENV['WEB_CONCURRENCY'] || 2)
    threads_count = Integer(ENV['MAX_THREADS'] || 1)
    threads threads_count, threads_count
    
    preload_app!
    
    rackup DefaultRackup
    port ENV['PORT'] || 3000
    environment ENV['RACK_ENV'] || 'development'
    
    on_worker_boot do
      # Worker specific setup for Rails 4.1+
      ActiveRecord::Base.establish_connection
    end
    
    on_worker_shutdown do
      ActiveRecord::Base.connection.disconnect!
    end

    This is a minimal configuration; we’re probably going to want to add more workers later. The important part is the on_worker_boot block. This tells ActiveRecord that it’s time for a new connection pool for this worker. I’ve cut the number of threads to one, as connections cannot be shared across threads.

    Once I had this in place, our build passed and all seems good. We may have to add some more workers once we go to production, but that’s a simple matter of updating the Ansible playbook to bump WEB_CONCURRENCY to an acceptable level.

    Why didn’t this show up until today? Probably because we hadn’t gotten our application’s test suite up to the point that we used up all our connections and had some drop on us. We’re still early on in our development and are just starting to hit the database a fair amount.

    I’m posting this because I searched for hours today for a solution that wasn’t just “turn off your security and it’ll be fine” – which wouldn’t have actually fixed the issue anyway.

    [Edited 11/24/15 to add on_worker_shutdown block.]

  • Note on Vagrant UID conflicts

    If you’ve restored from backup on OS X and are seeing this message when you issue vagrant up:

    The VirtualBox VM was created with a user that doesn't match the
    current user running Vagrant. VirtualBox requires that the same
    user be used to manage the VM that was created. Please re-run 
    Vagrant with that user. This is not a Vagrant issue.
    
    The UID used to create the VM was: 503
    
    Your UID is: 2612
    

    Vagrant is lying. There’s a file in the directory where you’re trying to start the Vagrant box named .vagrant/machines/default/virtualbox/creator_uid; the UID of the creator is in that file. Change it to your current UID and everything will be fine again.
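
    For example, from the directory containing the Vagrantfile, something along these lines does the trick (id -u prints the UID you’re currently running as):

    echo -n "$(id -u)" > .vagrant/machines/default/virtualbox/creator_uid
    vagrant up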

  • Git: undoer of bad decisions

    I’ve been working on a major Perl code refactor since March; this is a fairly critical subsystem that unifies two slightly different ways of doing the same thing under the One True Way. I’m finally starting to come out the far end of this process, having learned several things very much the hard way.

    The biggest mistake was not working out the most stepwise possible attack on the problem. I tackled a monolith of code and created a new monolith. The changeset was over a thousand lines of moved, tweaked, hoisted, rewritten, and fixed code – a recipe for failed code review. No matter how simple it seems to you because you’ve been living with the code for months on end, the reviewer will come up against a massive wall of code and bounce off it.

    Second, I didn’t clearly establish a baseline set of tests for this code. It was, essentially, not tested. A few features were cursorily tested, but the majority of the code was uncovered. In addition, some code needed to live on the Apache web servers, and some on dumb database servers without Apache, so the structure of the code ended up being two communicating monoliths hooked up to mod_perl.

    Third, I squashed too soon. Fifty-some commits were turned into a single commit that, to be fair to me, contained only 90 lines of new code – but in fairness to everyone else, shifted a thousand lines of code around, hoisted a lot of it into new common libraries, and changed one set to match the other.

    The code worked, and was provably correct by my tests — but it was an utter failure as far as software engineering was concerned.

    After a very educational conversation with my tech lead, Rachel, I headed back to revisit this change and make it into something my co-workers and I could live with.

    First: build the infrastructure. I learned from the first try at the code that unit-testing it would not work well. Some of it could be unit-tested, but other parts simply died because they weren’t running under mod_perl and couldn’t be mocked up to work without it. The best approach seemed to be behavior-driven development: write the necessary interactions as interactions with an Apache instance running enough of the stack for me to test it. Since, luckily, this particular part of the code had very little Javascript, and none along the critical path, I decided I’d be able to write interaction tests using WWW::Mechanize, and verify that the right things had happened by checking over the headers, the cookie jar, and the database.

    I started off by creating tiny commits to add a couple of support functions for the Web testing — a WWW::Mechanize subclass optimized for our site, and a couple of support methods to make constructing URLs easier.

    I then wrote a set of tests, each exercising a specific part of the code in question, and overall verifying that we had a set of tests that described the system behavior as it should be, both for good and bad inputs.
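
    To give a flavor of what these look like, here’s a stripped-down sketch of one such test; the URLs, form fields, and messages are invented for illustration, not lifted from the real suite:

    use strict;
    use warnings;
    use Test::More;
    use WWW::Mechanize;

    # Hypothetical test server; the real tests talk to a locally-running
    # Apache/mod_perl stack with enough of the site loaded to exercise it.
    my $base = 'http://localhost:8080';
    my $mech = WWW::Mechanize->new(autocheck => 0);

    # A good login should land us on the account page and set a session cookie.
    $mech->get("$base/login");
    $mech->submit_form(
        with_fields => { username => 'testuser', password => 'testpass' },
    );
    is($mech->status, 200, 'login succeeded');
    like($mech->content, qr/Welcome back/, 'landed on the account page');
    like($mech->cookie_jar->as_string, qr/session/, 'session cookie was set');

    # A bad password should bounce us back to the login form with an error.
    $mech->get("$base/login");
    $mech->submit_form(
        with_fields => { username => 'testuser', password => 'wrong' },
    );
    like($mech->content, qr/Invalid username or password/, 'bad login rejected');

    done_testing();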

    Once this was done, I turned back to the giant monolithic commit. I knew I wanted to unsquash the commits, but I wasn’t sure how, or what was safest. After some reading, I found a good description of using git reflog and git cherry-pick to restore a branch to its unsquashed shape, and a Stack Overflow post with more hints. With a little extra consideration and checking of git cherry-pick’s options, I was able to recover the original set of commits on a new branch. Here’s how:

    1. Start with the output from git reflog. This tracks all your commits and branch switches. As long as something still points to your original commits (in this case, the reflog), git won’t discard them.
    2. Scan back for the first reference to the branch that you need to unsquash. Note its SHA1, open another window, and git checkout this SHA1. You’ll now be in “detached head” state.
    3. git checkout -b some-name to create a new branch at the same point your desired branch was in when it was created.
    4. Now scroll back through the reflog, and git cherry-pick all the commits on this branch. You can save time by cherry-picking ranges (sha1..sha1), which will apply them in order onto the branch.
    5. Continue until you’ve reached the last commit you want on this branch. You may end up skipping sets of commits if you’ve been working on other branches too; watch for branch switches away from the desired branch and then back to it.

    You may hit minor problems re-applying the commits to the branch; resolve these as you normally would, and then use git cherry-pick --continue to complete the commit or continue the set of commits.
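
    Concretely, the whole recovery looks something like this (the SHA1s are placeholders for the ones you’d read out of your own reflog):

    git reflog                            # find where the branch was before the squash
    git checkout abc1234                  # the commit the branch started from; detached HEAD
    git checkout -b my-feature-unsquashed
    git cherry-pick def5678..aaa9999      # re-apply the original run of commits
    git cherry-pick --continue            # only needed after resolving a conflict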

    Once I had my original commits, I was able to cherry-pick these off the newly-recovered (and working) branch, a few at a time, and create smaller pull requests (add a test, add another test; shift code, verify the test still works; and so on).

    The final result was a series of pull requests: tests to validate the current behavior, and then a series of hoists and refactors to get half of the code to the desired point, and then another series to bring the second half in line with the first.

    Overall, this was more than a dozen pull requests, and I’m sure that my co-workers got sick of seeing more PRs from me every day, but the result was a properly-tested and properly-refactored set of code, and no one had any complaints about that.

  • Intro to Testing at SVPerl

    As promised, here are the slides from my talk: Intro to Testing.

    This version of the talk covers a little more ground than the previous version, and gives you the preferred ways of dealing with leaking output and warnings, and includes the new section on Cucumber.

    You can also play with the demo programs which live on GitHub.

  • xmonad on OS X Mavericks

    I installed XQuartz today, and while looking around for a low-distraction window manager, I came across xmonad. It looked interesting, and I started following the installation instructions and found they were out of date. Here’s an updated set of instructions for installing xmonad.

    1. Install XQuartz.
    2. Install homebrew if you don’t already have it.
    3. brew update
    4. brew install ghc cabal-install wget
    5. cabal update
    6. export LIBRARY_PATH=/usr/local/lib:/usr/X11/lib
    7. cabal install xmonad
    8. Launch XQuartz and go to Preferences (command-,). Set the following:
      •  Output
        • Enable “Full-screen mode”
      •  Input
        • Enable “Emulate three button mouse”
        • Disable “Follow system keyboard layout”
        • Disable “Enable key equivalents under X11”
        • Enable “Option keys send Alt_L and Alt_R”
      •  Pasteboard
        • Enable all of the options

    xmonad has been installed in $HOME/.cabal/bin/xmonad. You now need to create a .xinitrc that will make XQuartz run xmonad. Edit ~/.xinitrc and add these lines:

    [[ -f ~/.Xresources ]] && xrdb -load ~/.Xresources
    xterm &
    $HOME/.cabal/bin/xmonad

    You can now launch XQuartz; nothing seems to happen, but press command-option-A and the xmonad “desktop” (one huge xterm) will appear, covering the whole screen. Great! It’s using the default teeny and nasty xterm font, though. Let’s pretty it up a bit by making it use Monaco instead. Edit ~/.Xresources (the file the .xinitrc above loads) and add these lines:

    xterm*background: Black
    xterm*foreground: White
    xterm*termName: xterm-color
    xterm*faceName: Monaco
    

    Quit XQuartz with command-Q, relaunch it, and hit command-option-A again to bring up the xmonad desktop. The terminal should now be displaying in Monaco.

    At this point, you should take a look at the guided tour and get familiar with xmonad. If you’re looking for a distraction-free working environment, this might be good for you. I’m going to give it a try and see how it works out.

  • Shellshock scanner

    So I had a bunch of machines with a standard naming convention that I needed to scan for the Shellshock bug. Since I just needed to run a command on each one and check the output, and I had SSH access, it seemed easy enough to put together a quick script to manage the process.

    Here’s a skeleton of that script, with the details on what machines I was logging into elided. This does a pretty reasonable job, checking 300 machines in about a minute. You need to have a more recent copy of Parallel::ForkManager, as versions prior to 1.0 don’t have the ability to return a data structure from the child.

    $|++;
    use strict;
    use warnings;
    use Parallel::ForkManager 1.07;
    
    my $MAX_PROCESSES = 25;
    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
    my @servers = @SERVER_NAMES;
    my %statuses;
    my @diagnostics;
    $pm->run_on_finish(
        sub {
            my($pid, $exit_code, $ident, $exit_signal, $core_dump,
               $data_structure_reference) = @_;
            if (defined($data_structure_reference)) { 
                my ($host_id, $status, $results) = @{$data_structure_reference};
                if ($status eq 'Unknown') {
                    push @diagnostics, $host_id, $results;
                } else {
                    push @{ $statuses{$status} }, $host_id;
                }
            } else { 
                warn qq|No message received from child process $pid!\n|;
            }
        }
    );
    
    print "Testing servers: ";
    for my $host_id (@servers) {
        my $pid = $pm->start and next;
        my $result = << `EOF`;
    ssh -o StrictHostKeyChecking=no $host_id <<'ENDSSH' 2>&1
    env x='() { :;}; echo vulnerable' bash -c "echo this is a test"
    ENDSSH
    EOF
        my $status;
        if ($result =~ /Permission denied/is) {
       $status = q{Inaccessible};
        } elsif ($result =~ /key verification failed/s) {
           $status = q{Key changed};
        } elsif ($result =~ /timed out/is) {
           $status = q{Timed out};
    } elsif ($result =~ /vulnerable/s) {
       $status = q{Vulnerable};
        } elsif ($result =~ /ignoring function definition attempt/s) {
           $status = q{Patched};
        } elsif ($result =~ /Name or service not known/s) {
           $status = q{Nonexistent};
        } else {
       $status = q{Unknown};
        }
        print "$host_id, ";
        $pm->finish(0, [$host_id, $status, $result]);
    }
    $pm->wait_all_children;
    print "done!\n";
    for my $status (keys %statuses) {
        print "$status: ",join(',', @{$statuses{$status}}), "\n";
    }
    print "The following hosts returned an undiagnosed status:",
          join("\n", @diagnostics), "\n";

    Note that this doesn’t test the most recent version (#3) of the bug; I have modified it slightly to test for that, but that’s a reasonable exercise for the reader.

  • ETL into WordPress: lessons learned

    I had a chance this weekend to do a little work on importing a large (4000 or so articles and pages) site into WordPress. It was an interesting bit of work, with a certain amount of learning required on my part – which translated into some flailing around to establish the toolset.

    Lesson 1: ALWAYS use a database in preference to anything else when you can. 
    I wasted a couple hours trying to clean up the data for CSV import using any of a number of WordPress plugins. Unfortunately, CSV import is half-assed at best – more like about quarter-assed, and any cleanup in Excel is excruciatingly slow.
    Some of the data came out with mismatched quotes, leaving me with aberrant entries in the spreadsheet that caused Excel to throw an out-of-memory error and refuse to process them when I tried to delete the bad rows or even cells from those bad rows.
    Even attempting to work with the CSV data using Text::CSV in Perl was problematic because the site export data (from phpMyAdmin) was fundamentally broken. I chalk that partially up to the charset problems we’ll talk about later.
    I loaded up the database using MAMP, which worked perfectly well, and was able to use Perl DBI to pull the pages and posts out without a hitch, even the ones with weirdo character set problems.
    Lesson 2: address character set problems first
    I had a number of problems with the XMLRPC interface to WordPress (which is otherwise great, see below) when the data contained improperly encoded non-ASCII characters. I was eventually forced to write code to swap the strings into hex, find the bad 3- and 4-character runs, and replace them with the appropriate Latin-1 substitutes (note that these don’t quite match that table – I had to look for the ‘e2ac’ or ‘c3’ delimiter characters in the input to figure out where the bad characters were). Once I hit on this idea, it worked very well.
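
    As a rough sketch of the idea (the sub name is mine, and the two substitutions are just examples of the kind of thing I was fixing, not the full set):

    # Convert the string to a run of hex digits, patch up known-bad UTF-8
    # byte sequences, and convert it back.
    sub fix_charset {
        my ($text) = @_;
        my $hex = unpack 'H*', $text;
        $hex =~ s/e28099/27/g;    # UTF-8 right single quote -> plain apostrophe
        $hex =~ s/c3a9/e9/g;      # UTF-8 e-acute -> Latin-1 e-acute
        return pack 'H*', $hex;
    }
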
    Lesson 3: build in checkpointing from the start for large import jobs
    The various problems ended up causing me to repeatedly wipe the WordPress posts database and restart the import, which wasted a lot of time. I did not count that toward the overall time when I charged my client; if I had, it would have been more like 20-24 hours instead of 6. Fortunately the imports were, until a failure occurred, a start-it-and-forget-it process. It was necessary to wipe the database between tries because WordPress otherwise very carefully preserves all the previous versions, and cleaning those out is even slower.
    I hit on the expedient of recording the row ID of each item as it successfully imported and dumping that list out in a Perl END block. If the program fell over and exited due to a charset problem, I got a list of the rows that had processed OK, which I could then add to an ignore list. Subsequent runs could simply exclude those records, getting me straight to the stuff I hadn’t done yet and avoiding duplicate entries.
    I had previously tried just logging the bad ones and going back to redo those, but it turned out to be easier to exclude than include.
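
    The mechanics of the checkpointing were nothing fancy – something like this, with @rows and import_post() standing in for the real DBI fetch and XMLRPC call:

    # Read the IDs that made it in on previous runs, skip them this time,
    # and append whatever succeeds on this run when the script exits;
    # the END block runs even if the import dies partway through.
    my %already_done;
    if (open my $fh, '<', 'imported-ids.txt') {
        chomp(my @ids = <$fh>);
        @already_done{@ids} = ();
        close $fh;
    }

    my @done_this_run;

    END {
        if (@done_this_run && open(my $fh, '>>', 'imported-ids.txt')) {
            print {$fh} "$_\n" for @done_this_run;
            close $fh;
        }
    }

    for my $row (@rows) {
        next if exists $already_done{ $row->{id} };
        import_post($row);                  # dies if it hits a bad character, etc.
        push @done_this_run, $row->{id};
    }
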
    Lesson 4: WordPress::API and WordPress XMLRPC are *great*.
    I was able to find the WordPress::API module on CPAN, which provides a nice object-oriented wrapper around WordPress XMLRPC. With that, I was able to programmatically add posts and pages about as fast as I could pull them out of the local database.
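
    The gist of a single insert, going from memory of the module’s synopsis rather than from working code, and with the credentials and endpoint obviously faked – double-check the method names against the CPAN docs before trusting this:

    use strict;
    use warnings;
    use WordPress::XMLRPC;

    # Placeholder connection details.
    my $wp = WordPress::XMLRPC->new({
        username => 'importer',
        password => 'secret',
        proxy    => 'http://example.com/xmlrpc.php',
    });

    # newPost takes a hashref of metaWeblog-style fields plus a publish flag
    # and returns the new post's ID.
    my $id = $wp->newPost(
        {
            title       => 'An old article',
            description => '<p>Body of the old article, charset-fixed.</p>',
        },
        1,
    );
    print "Created post $id\n";
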
    Lesson 5: XMLRPC just doesn’t support some stuff
    You can’t add users or authors via XMLRPC, sadly. In the future, the better approach would probably be to log directly in to the server you’re configuring, load the old data into the database there, and use the PHP API calls to create users and authors as well as to load the data into WordPress. I decided not to embark on that this time: I’m faster and more able in Perl than I am in PHP, and it would have been slower to teach myself a new programming language and solve the problem at the same time.
    Overall
    I’d call this mostly successful. The data made it into the WordPress installation, and I have an XML dump from WordPress that will let me restore it at will. All of the data ended up where it was supposed to go, and it all looks complete. I also have a stash of techniques and sample code to work with if I need to do it again.