CrashPlan folder date recovery

The situation: a friend had a MacBook Air whose motherboard went poof. Fortunately she had backups (almost up-to-date) in CrashPlan, so she did a restore of her home directory, which worked fine in that she had her files, but not so fine in that all the folder last-changed dates now ran from the current date to a couple days previous (it takes a couple days to recover ~60GB of data).

This was a problem for her, because she partly uses the last-changed date on her folders to help her keep organized. “When was the last time I did anything on project X?” (I should note: she uses Microsoft Word and a couple different photo library managers, so git or the equivalent doesn’t work well for her workflow. She is considering git or the like now for her future text-based work…)

A check with CrashPlan let us know that they did not track folder update dates and couldn’t restore them. We therefore needed to come up with a way to re-establish as best we could what the dates were before the crash.

Our original thought was simply to start at the bottom and recursively restore the folder last-used dates using touch -t, taking the most-recently-updated file in the folder as the folder’s last-updated date. Some research and thought turned up the following:

  • Updating a file updated the folder’s last-updated date.
  • Updating a folder did not update the containing folder’s last-updated date.

This meant that we couldn’t precisely guarantee that the folder’s last-updated date would accurately reflect the last update of its contents. We decided in the end that the best strategy for her was to “bubble up” the last-updated dates by checking both files and folders contained in a subject folder. This way, if a file deep in the hierarchy is updated, but the files and folders above it have not been, the file’s last-updated date is applied to its containing folder, and subsequently is applied also to each containing folder (since we’re checking both files and folders, and there’s always a folder that has the last-updated date that corresponds to the one on the deeply-nested file). This seemed like the better choice for her as she had no other records of what had been worked on when, and runs a very nested set of folders.

If you were running a flatter hierarchy, only updating the folders to the last-updated date of the files might be a better choice.  Since I was writing a script to do this anyway, it seemed reasonable to go ahead and implement it so that you could choose to bubble up or not as you liked, and to also allow you to selectively bubble-up or not in a single directory.

This was the genesis of date-fixer.pl. Here’s the script. A more detailed example of why neither approach to restoring the folder dates is perfect is contained in the POD.
[wpdm_file id=1]

use strict;
use warnings;
use 5.010;

=head1 NAME

date-fixer.pl - update folder dates to match newest contained file

=head1 SYNOPSIS

date-fixer.pl --directory top_dir_to_fix
             [--commit]
             [--verbose]
             [--includefolders]
             [--single]

=head1 DESCRIPTION

date-fixer.pl is meant to be used after you've used something like CrashPlan
to restore your files. The restore process will put the files back with their
proper dates, but the folders containing those files will be updated to the
current date (the last time any operation was done in this folder -
specifically, putting the files back).

date-fixer.pl's default operation is to tell you what it would do; if you want
it to actually do anything, you need to add the --commit argument to force it
to actually execute the commands that change the folder dates.

If you supply the --verbose argument, date-fixer.pl will print all the commands
it is about to execute (and if you didn't specify --includefolders, warn you
about younger contained folders - see below). You can capture these from STDOUT
and further process them if you like.

=head2 Younger contained folders and --includefolders

Consider the following:

    folder1           (created January 2010 - date is April 2011)
        veryoldfile 1 (updated March 2011)
        oldfile2      (updated April 2011)
        folder2       (created June 2012 - date is July 2012)
            newfile   (updated July 2012)

If we update folder1 to only match the files within it, we won't catch that
folder2's date could actually be much more recent that that of either of the
files directly contained by folder1. However, if we use contained folder dates
as well as contained file dates to calculate the "last updated" date of the
current folder, we may make the date of the current folder considerably more
recent than it may actually have been.

Example: veryoldfile1 and oldfile2 were updated in March and April 2011.
Folder2 was updated in June 2012, and newfile was added to in in July 2012.
The creation of folder2 updates the last-updated date of folder1 to June 2012;
the addition of newfile updates folder2's last-updated date to that date --
but the last-updated date of folder1 does not change - it remains June 2012.

If we restore all the files and try to determine the "right" dates to set the
folder update dates to, we discover that there is no unambiguous way to decide
what the "right" dates are. If we use the file dates, alone, we'll miss that
folder2 was created in June (causing folder1 to update to June); if we use
both file and folder dates, we update folder1 to July 2012, which is not
accurate either.

date-fixer.pl takes a cautious middle road, defaulting to only using the files
within a folder to update that folder's last-modified date. If you prefer to
ensure that the newest date buried in a folder hierarchy always "bubbles up"
to the top, add the --includefolders option to the command.

date-fixer will, in verbose mode, print a warning for every folder that
contains a folder younger than itself; you may choose to go back and adjust
the dates on those folders with

date-fixer.pl --directory fixthisone --includefolders --single

This will, for this one folder, adjust the folder's last-updated date to the
most recent date of any of the items contained in it.

=head1 USAGE

To fix all the dates in a directory and all directories below it, "bubbling
up" dates from later files:

    date-fixer.pl --directory dir --commit --includefolders

To fix the dates in just one directory based on only the files in it and
ignoring the dates on any directories it contains:

    date-fixer.pl --directory dir --commit --single

To see in detail what date-fixer is doing while recursively fixing dates,
"bubbling up" folder dates:

    date-fixer.pl --directory dir --commit --verbose --includefolders

=head1 NOTES

"Why didn't you use File::Find?"

I conceived the code as a simple recursion; it seemed much easier to go ahead and read the directories
myself than to go through the mental exercise of transforming the treewalk into an iteration such as I
would need to use File::Find instead.

=head1 AUTHOR

Joe McMahon, mcmahon@cpan.org

=head1 LICENSE

This code is licensed under the same terms as Perl itself.

=cut

use Getopt::Long;
use Date::Format;

my($commit, $start_dir, $verbose, $includefolders, $single);
GetOptions(
    'commit' => \$commit,
    'directory=s' => \$start_dir,
    'verbose|v' => \$verbose,
    'includefolders' => \$includefolders,
    'single' => \$single,
);

$start_dir or die "Must specify --directory\n";

set_date_from_contained_files($start_dir);

sub set_date_from_contained_files {
    my($directory) = @_;
    return unless defined $directory;

    opendir my $dirhandle, $directory
        or die "Can't read $directory: $!\n";
    my @contents;
    push @contents, $_ while readdir($dirhandle);
    closedir $dirhandle;

    @contents = grep { !/\.$|\.\.$/ } @contents;
    my @dirs = grep { -d "$directory/$_" } @contents;

    my %dirmap;
    @dirmap{@{[@dirs]}} = ();

    my @files = grep { !exists $dirmap{$_}} @contents;

    # Recursively apply the same update criteria unless --single is on.
    unless ($single) {
        foreach my $dir (@dirs) {
            set_date_from_contained_files("$directory/$dir");
        }
    }

    my $most_recent_date;
    if (! $includefolders) {
         $most_recent_date = most_recent_date($directory, @files);
         my $most_recent_folder = most_recent_date($directory, @dirs);
         warn "Folders in $directory are more recent ($most_recent_folder) than the most-recent file ($most_recent_date)\n";
    }
    else {
         $most_recent_date = most_recent_date($directory, @files, @dirs);
    }

    if (defined $most_recent_date) {
        (my $requoted = $directory) =~ s/'/\\'/g;
        my @command = (qw(touch -t), $most_recent_date, $directory);
        print "@command\n" if $verbose;
        system @command if $commit;
    }
    else {
        warn "$directory unchanged because it is empty\n" if $verbose;
    }
}

sub most_recent_date {
    my ($directory, @items) = @_;
    my @dates =     map  { (stat "$directory/$_")[9] } @items;
    my @formatted = map  { time2str("%Y%m%d%H%M.%S", $_) } @dates;
    my @ordered =   sort { $a lt $b } @formatted;
    return $ordered[0];
}

Tags: , ,

Reply