I lost an important VCV Rack patch a couple of days before Mountain Skies 2019. It was based on a patch I’d gotten from patchstorage.com, but I couldn’t remember which one. I tried paging through the patches on the infinite scroll, but that wasn’t getting me anywhere. I knew the patch had Clocked and the Impromptu 16-step sequencer in it, but I’d altered it so heavily for my own needs that I couldn’t remember anything else about it.
I decided that automating the search was the only way I’d find the base patch in time to recreate my performance patch. I hammered out the following short Perl script to download the patches:
    use strict;
    use warnings;
    use WWW::Mechanize;
    use WWW::Mechanize::TreeBuilder;

    $|++;    # unbuffered output so progress shows up right away

    use constant SLEEP_TIME => 2;    # seconds to pause between requests

    my $base_url = "https://patchstorage.com/platform/vcv-rack/page/";
    my $mech = WWW::Mechanize->new(autocheck => 0);
    WWW::Mechanize::TreeBuilder->meta->apply($mech);

    my $seq     = 1;
    my $working = 1;
    while ($working) {
        print "page $seq\n";
        $mech->get($base_url . $seq);
        sleep(SLEEP_TIME);

        # Grab every link on the listing page, then discard everything
        # that is not a link to a patch post.
        my @patch_pages = $mech->look_down('_tag', 'a');
        my @patch_links = grep {
            defined $_ and $_ ne ''
            and !m[/upload\-a\-patch\/] and !m[/login/]
            and !m[/new\-tutorial/] and !m[/explore/]
            and !m[/registration/] and !m[/new\-question/]
            and !m[/platform/] and !m[/tag/] and !m[/author/]
            and !m[/wp\-content/] and !m[/category/]
            and !/\#$/ and !/\#respond/ and !/\#comments/
            and !/mailto:/ and !/\/privacy\-policy/ and !/discord/
            and !/https:\/\/vcvrack/ and !/javascript:/
            and !/action=lostpassword/ and !/patchstorage\.com\/$/
        } map { $_->attr('href') } @patch_pages;

        # De-duplicate the links.
        my %links;
        @links{@patch_links} = ();
        @patch_links = keys %links;
        print scalar @patch_links, " links found\n";

        # A page with no patch links means we have run off the end.
        unless (@patch_links) {
            $working = 0;
            next;
        }

        for my $link (@patch_links) {
            next unless $link;
            print $link;
            my @parts = split /\//, $link;
            my $patch_name = $parts[-1];
            my $file = "/Users/jmcmahon/Downloads/$patch_name";

            # Already downloaded on an earlier run? Skip it.
            if (-f $file) {
                print "...skipped\n";
                next;
            }
            print "\n";

            $mech->get($link);
            sleep(SLEEP_TIME);
            my @patches = $mech->look_down('id', "DownloadPatch");
            for my $patch (@patches) {
                my $p_link = $patch->attr('href');
                next unless $p_link;
                print "$patch_name...";
                $mech->get($p_link);
                sleep(SLEEP_TIME);
                open my $fh, ">", $file
                    or die "Can't open $patch_name: $!";
                print $fh $mech->content;
                close $fh;
                print "saved\n";
            }
        }
        $seq++;
    }
Notable items here:
- The infinite scroll is actually a chunk of JavaScript wrapped around a standard WordPress page setup, so I can “page” back through the patches for Rack by incrementing the page number in the URL and pulling the links to the actual patch posts off each page.
- That giant grep and map takes every link on a listing page and filters the set down to just the ones that actually point to patches.
- I have a couple checks in there for “have I already downloaded this?” to allow me to restart the script if it dies partway through the process.
- The script kills itself off once it gets a page with no links on it. I haven’t actually gotten that far yet, but I think it should work.
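The filtering, de-duplication, and “have I already downloaded this?” steps from the list above can be pulled out into small functions that are testable without touching the network. This is only a sketch: the helper names (`is_patch_link`, `unique_patch_links`, `local_path_for`) and the example URLs are made up for illustration, and the exclusion patterns are simply the ones from the script.

```perl
use strict;
use warnings;
use File::Spec;

# Patterns for links that are NOT patch posts (taken from the script above).
my @NOT_A_PATCH = (
    qr{/upload-a-patch/}, qr{/login/},        qr{/new-tutorial/},
    qr{/explore/},        qr{/registration/}, qr{/new-question/},
    qr{/platform/},       qr{/tag/},          qr{/author/},
    qr{/wp-content/},     qr{/category/},     qr{#\z},
    qr{#respond},         qr{#comments},      qr{mailto:},
    qr{/privacy-policy},  qr{discord},        qr{https://vcvrack},
    qr{javascript:},      qr{action=lostpassword},
    qr{patchstorage\.com/\z},
);

# Hypothetical helper: true if the href looks like a link to a patch post.
sub is_patch_link {
    my ($href) = @_;
    return 0 unless defined $href && length $href;
    for my $pattern (@NOT_A_PATCH) {
        return 0 if $href =~ $pattern;
    }
    return 1;
}

# Hypothetical helper: filter and de-duplicate, keeping first-seen order.
sub unique_patch_links {
    my %seen;
    return grep { is_patch_link($_) && !$seen{$_}++ } @_;
}

# Hypothetical helper: where a patch post's download would be saved.
# The last path segment of the post URL becomes the file name.
sub local_path_for {
    my ($link, $dir) = @_;
    my @parts = split m{/}, $link;
    return File::Spec->catfile($dir, $parts[-1]);
}

# Example with made-up hrefs, as they might come back from look_down().
my @hrefs = (
    'https://patchstorage.com/some-patch/',
    'https://patchstorage.com/some-patch/',
    'https://patchstorage.com/tag/drone/',
    'https://patchstorage.com/',
    undef, '', '#',
);
for my $link (unique_patch_links(@hrefs)) {
    my $file = local_path_for($link, '/tmp/patches');
    print -f $file ? "skip $link\n" : "fetch $link -> $file\n";
}
```

With the filter in one place, the exit condition is just “`unique_patch_links` returned an empty list,” and adding a new pattern to ignore is a one-line change.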
Patchstorage folks: I apologize for scraping the site, but this is for my own use only; I’m not republishing anything. If I weren’t desperate to retrieve the patch for Friday, I would have just left it alone.