Category: Git

  • Fixing a commit in the middle of a set

    Another tip for those who’ve needed to do this: let’s say you’ve created a feature branch and are adding tests to the code, and you realize that one of your tests is incorrect several commits further on. What do you do?

    If this is a plain old feature branch, you can just make the fixup commit and have two commits for that test. This is perfectly fine.

    If however, you’re constructing a series of commits to be cherry-picked later, it’s better to have all the related changes together.

    You can do this by doing a git log and capturing the output in order back to the incomplete commit. Save that output, then git reset --hard oldcommit.

    The incomplete commit is now the current one, so you can make any fixes you need, git add them, and then git commit --amend to include them in the current (formerly incomplete) commit.

    Now go back to the log output, and find the commit just after the current one; record that, and then record the old HEAD commit. git cherry-pick nextcommit..oldhead will then reapply all of the commits following the one you just repaired, and your branch will be back where it was, with the one incorrect commit repaired.

  • Avoiding gigantic changesets when refactoring

    This may seem obvious to many folks but for those who haven’t had to do something like this before, but I think it’ll be very illuminating to those who haven’t.

    The starting point and a naive solution

    I’m currently working on a cleanup in our codebase at ZipRecruiter, where we want to remove dependencies shared between two parts of the codebase into a common area. The part I’m working on now is well-defined, but touches a very large number of modules throughout the codebase.

    If I chose a naive way of doing the port, I’d do the the following:

    1. Port the functions to be moved to a new module in the common area and ensure they’re all under test.
    2. Go through the rest of the codebase, find all references to the ported functions, and update the modules to use the new module instead of the old.
    3. Remove the functions from the old module now that all the other code has been ported.

    If the functions in the old module aren’t used much, then step 2 is fine as is, but this is a utility module that’s used in a lot of modules. This makes step 2 a gigantic risk point because the changeset for a large number of modules essentially has to be done all at once, whether we do it as one big commit or a lot of small ones that still have to be applied at the same time.

    Doing these all at once is going to be a code review and QA nightmare, so this isn’t going to fly.  Unfortunately, I started on the port and didn’t come to this understanding until I’d gotten pretty far into the changes. I needed to save the work I’d done, but reuse it in a way that was safe and not an unbearable load for my reviewers and QA.

    (more…)

  • Squashing commits in the same branch

    Okay, I’m sure this is documented lots of places, but I just now figured out that git rebase -i is perfectly happy to rebase from a commit on the same branch, so if you’ve got a branch that you’d like to smoosh some commits on, do a git log to get the SHA1 of the initial commit, then

    git rebase -i <SHA1>

    It will happily show you all the commits and you can then twiddle to your heart’s content to clean up.

  • Git: undoer of bad decisions

    I’ve been working on a major Perl code refactor since March; this is a fairly critical subsystem that unifies two slightly different ways of doing the same thing under the One True Way. I’m finally starting to come out the far end of this process, having learned several things very much the hard way.

    The biggest mistake was not working out the most stepwise possible attack on the problem. I tackled a monolith of code and created a new monolith. The changeset was over a thousand lines of moved, tweaked, hoisted, rewritten, and fixed code – a recipe for failed code review. No matter how simple it seems to you because you’ve been living with the code for months on end, the reviewer will come up against a massive wall of code and bounce off it.

    Second, I didn’t clearly establish a baseline set of tests for this code. It was, essentially, not tested. A few features were cursorily tested, but the majority of the code was uncovered. In addition, some code needed to live on the Apache web servers, and some on dumb database servers without Apache, so the structure of the code ended up being two communicating monoliths hooked up to mod_perl.

    Third, I squashed too soon. Fifty-some commits were turned into a single commit that, to be fair to me, contained only 90 lines of new code – but in fairness to everyone else, shifted a thousand lines of code around, hoisting a lot to new common libraries, and changing one set to mach another.

    The code worked, and was provably correct by my tests — but it was an utter failure as far as software engineering was concerned.

    After a very educational conversation with my tech lead, Rachel, I headed back to revisit this change and make it into something my co-workers and I could live with.

    First: build the infrastructure. I learned from the first try at the code that unit-testing it would not work well. Some of it could be unit-tested, but others simply died because they weren’t running under mod_perl, and couldn’t be mocked up to work without it. The best approach seemed to be to use a behavior-driven development approach: write the necessary interactions as interactions with an Apache instance running enough of the stack for me to test it. I decided that since, luckily, this particular part of the code had very little Javascript, and none along the critical path, I’d be able to write interaction tests using WWW::Mechanize, and verify that the right things had happened by checking over the headers and cooke jar and database.

    I started off by creating tiny commits to add a couple of support functions for the Web testing — a WWW::Mechanize subclass optimized for our site, and a couple of support methods to make constructing URLs easier.

    I then wrote a set of tests, each exercising a specific part of the code in question, and overall verifying that we had a set of tests that described the system behavior as it should be, both for good and bad inputs.

    Once this was done, I turned back to the giant monolithic commit. I knew I wanted to unsquash the commits, but I wasn’t sure how, or what was safest. After some reading, I found a good description of using git reflog and git cherry-pick to restore a branch to its unsquashed shape, and a Stack Overflow post with more hints. With a little extra consideration and checking of git cherry-pick’s options, I was able to recover the original set of commits on a new branch. Here’s how:

    1. Start with the output from git reflog. This tracks all your commits and branch switches. As long as your squashed commits point to something (in this case it’s the reflog), git won’t discard them.
    2. Scan back for the first reference to the branch that you need to unsquash. Note its SHA1, open another window, and git checkout this SHA1. You’ll now be in “detached head” state.
    3. git checkout -b some-name to create a new branch at the same point your desired branch was in when it was created.
    4. Now scroll back through the reflog, and git cherry-pick all the commits on this branch. You can save time by cherry-picking ranges (sha1..sha1), which will apply them in reflog order to the branch
    5. Continue until you’ve reached the last commit you want on this branch. You may end up skipping sets of commits if you’ve been working on other branches too; watch for branch switches away from the desired branch and then back to it.

    You may hit minor problems re-applying the commits to the branch; resolve these as you normally would, and then use git cherry-pick –continue to complete the commit or continue the set of commits.

    Once I had my original commits, I was able to cherry-pick these off the newly-recovered (and working) branch, a few at a time, and create smaller pull requests (add a test, add another test; shift code, verify the test still works; and so on).

    The final result was a series of pull requests: tests to validate the current behavior, and then a series of hoists and refactors to get half of the code to the desired point, and then another series to bring the second half in line with the first.

    Overall, this was more than a dozen pull requests, and I’m sure that my co-workers got sick of seeing more PRs from me every day, but the result was a properly-tested and properly-refactored set of code, and no one had any complaints about that.