• Experiments in Automation

    With the knockout success of my ttrss service rollout, I thought it might be fun to look into other self-hosted services that I might find useful. Now, let’s be very clear: this was, on its face, entirely a make-work project to give me something fun to do with my spare time. But the outcome has proven surprisingly useful!

    It all began when I came across Huginn. Huginn is an open source implementation of the kind of service offered by IFTTT, Zapier, and I’m sure others (Microsoft Flow popped up while I was finding the links to those services). The general idea is that these services allow you to plumb or connect various other services together to effect an automated workflow. For example, you might receive tweets on one end and shoot them off to, say, a Slack channel on the other.

    Okay, so what would I do with this?

    Well, as a bit of background, I’m an avid reader of Matt Levine. Mr. Levine offers a newsletter that one can subscribe to, delivered daily to one’s email inbox. Notably, if you want to read this content on the Bloomberg website, it’s hidden behind a decidedly effective paywall that happens to defeat web scrapers. That means getting this content into my RSS feed isn’t directly possible.

    But wouldn’t it be nice if I could take those emails, scrape out the content, and republish them to a private RSS feed that I could incorporate into ttrss?

    Well, with Huginn I can do just that!

    First, I set up a rule in gmail to apply a label to, and then archive, the newsletter emails.

    Then, I set up a Huginn pipeline that does the following:

    1. Use the Imap Folder Agent to connect to my gmail account and retrieve any new emails with the label applied (making sure to use the text/html MIME enclosure so the full message body is available).
    2. Use the Website Agent to parse the email body and pull out the link to the article on Bloomberg.
    3. Use the Data Output agent to republish the content as an RSS feed.
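    For the curious, here’s a rough sketch of the options for each agent. The option names are from memory, and the credentials, label/folder name, XPath expressions, and feed secret are all placeholders, so treat this as a starting point and lean on each agent’s inline documentation in Huginn rather than copying it verbatim.

    The Imap Folder Agent watches the gmail label (exposed over IMAP as a folder) and emits an event per new message, keeping only the text/html part:

        {
          "host": "imap.gmail.com",
          "ssl": true,
          "username": "me@example.com",
          "password": "<app-specific password>",
          "folders": ["Money Stuff"],
          "conditions": { "mime_types": ["text/html"] }
        }

    The Website Agent then parses the email body out of the incoming event (rather than fetching a URL) and pulls out the pieces the feed needs; the XPath here is a guess at the newsletter’s markup:

        {
          "data_from_event": "{{ body }}",
          "type": "html",
          "mode": "all",
          "extract": {
            "title": { "xpath": "//title", "value": "string(.)" },
            "link": { "xpath": "(//a[contains(@href, 'bloomberg.com')]/@href)[1]", "value": "string(.)" },
            "description": { "xpath": "//body", "value": "string(.)" }
          }
        }

    And the Data Output Agent turns those events into a feed served at a secret URL:

        {
          "secrets": ["money-stuff"],
          "expected_receive_period_in_days": 3,
          "template": {
            "title": "Money Stuff",
            "description": "Matt Levine's newsletter, privately republished",
            "item": {
              "title": "{{ title }}",
              "description": "{{ description }}",
              "link": "{{ link }}"
            }
          }
        }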

    Finally, in ttrss I subscribe to the feed and… voila!

    Matt Levine RSS Feed

    Now that is pretty darn useful!

    Since then I’ve also set up Gotify and integrated it with Huginn and other services in my home to notify me when, for example, my offsite backup process completes (and yeah, I could do that with just Gotify, but piping the events through Huginn gives me more flexibility later to do other things with them… like… publish them to an RSS feed? I dunno…).
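    On the Gotify end, a notification is just an HTTP POST to the server’s /message endpoint, so the last line of a backup script (or a downstream Huginn agent) can fire one off. The hostname and application token below are placeholders:

        # notify Gotify that the offsite backup finished (host and token are placeholders)
        curl "https://gotify.example.com/message?token=<app-token>" \
          -F "title=Offsite backup" \
          -F "message=rclone push to Google Drive completed" \
          -F "priority=5"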

    This is some very nice infrastructure! I’m now very curious how else I might leverage this stuff, or what other services I could deploy (some of which are listed here)…

  • Homegrown Backups Redux

    Over the last couple of years I’ve written extensively about backup solutions. The whole thing started as I tried to find a use for my NUC, which I initially turned into a Hackintosh, a solution that was, frankly, in search of a problem.

    macOS ran fairly nicely on the thing, but eventually I ran into issues which ultimately led me to just convert it over to an Ubuntu 18.04 installation. In the end, Linux is, at least in my experience, a much better home server OS for mixed-OS environments (taking the SMB issues on the Mac as a perfect example).

    Anyway, I still needed a backup solution, and I originally settled on a combination of a few things:

    • For Windows machines:
        ◦ A Samba file share on the server
        ◦ Windows 10’s built-in file copy backup capabilities
    • For Linux machines:
        ◦ Syncthing for real-time storage redundancy
        ◦ rclone for transferring backups to Google Drive for off-site replication

    The whole thing stalled out when I screwed up the rclone mechanism and inadvertently deleted a bunch of items in my broader Google Drive account.

    Oops.

    And so I became gun shy and paused the whole thing.

    The other big change is that I switched over to Ubuntu on my X1 Carbon, which meant I now needed to sort out a backup solution for a Linux client as well. Syncthing is great for redundancy, but it’s not itself a backup solution.

    So, a couple of things changed recently that allowed me to close those gaps and resolve those issues.

    First off, when it comes to rclone and Google Drive, I changed two settings:

    • Set the authentication scope to “drive.file”
    • Set the root_folder_id to the location on Drive where I want the backups stored

    The first setting restricts rclone’s authorization so that it can only see and manipulate files it creates itself. Google Drive should therefore prevent rclone from accidentally touching anything but the backups it’s transferring.

    The second setting is belt-and-suspenders. By setting the root_folder_id, even if Google Drive somehow screwed up, rclone would never look outside of the target folder I selected.
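    Put together, the relevant stanza of rclone.conf ends up looking roughly like this; the remote name and folder ID are placeholders, and the client_id/client_secret lines are the custom OAuth key I mention in the update below:

        [gdrive-backups]
        type = drive
        scope = drive.file
        root_folder_id = <folder id copied from the Drive folder's URL>
        client_id = <your-client-id>.apps.googleusercontent.com
        client_secret = <your-client-secret>
        token = {"access_token":"...","refresh_token":"...","expiry":"..."}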

    So, the accidental deletion problem should be well behind me.

    The fix for backups on the Linux side was to expand my use of Syncthing to include the additional folders on my laptop that I want stored on my backup server. This ensures that my laptop is always maintaining a real-time replica of critical data in another location.

    Finally, I adopted Restic for producing snapshot backups of content that I replicate to my backup server.

    Basically, I create a local replica of data on the server (either with Syncthing, rclone, lftp, or other mechanisms) and then use Restic to produce a backup repository from those local copies. Restic then takes care of de-duplication, snapshotting, restoration, and so on. The Restic repositories then get pushed out to Google Drive via rclone.
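    Concretely, the server-side loop boils down to a handful of commands like these; the repository and replica paths, the remote name, and the retention policy are all placeholders, and restic picks up the repository password from RESTIC_PASSWORD (or --password-file):

        # one time: initialize the restic repository
        restic -r /srv/backups/restic/laptop init

        # each run: snapshot the locally replicated data
        restic -r /srv/backups/restic/laptop backup /srv/replicas/laptop

        # apply a retention policy and prune unreferenced data
        restic -r /srv/backups/restic/laptop forget --keep-daily 7 --keep-weekly 4 --prune

        # push the repository off-site through the drive.file-scoped remote
        rclone sync /srv/backups/restic/laptop gdrive-backups:restic/laptop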

    I’ve also extended this backup strategy to the contents of my linode instance (where this blog is hosted), and to Lenore’s blog. Specifically, I use rclone (or lftp) to create/update a local copy of the data on those respective servers, and then use Restic to produce a backup repository from those copies. And, again, those repositories are then pushed out to Drive.
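    For those remote machines, the “create/update a local copy” step is just a mirror. With lftp over SFTP it looks something like this (host, user, and paths are placeholders; rclone’s sync command works equally well where it’s available):

        # pull an incremental copy of the remote blog content down to the backup server
        lftp -u deploy -e "mirror --delete --verbose /var/www/blog /srv/replicas/blog; quit" sftp://blog.example.com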

    Overall, I think this stack should work nicely! And I like that it neatly separates the various stages of the process (data transfer, backup, off-siting) into a set of discrete stages that I can independently monitor and control.

    Update:

    Just a quick, handy tidbit: when using rclone for backup purposes like this, it’s a good idea to create a custom OAuth API key for use with Google Drive. Out of the box, rclone uses a default API key shared by all other rclone users, which means you’re sharing the API quota as well. As a result, you get much better performance with your own key (though, unless you’re willing to jump through a lot of hoops, you’re stuck with the “drive.file” scope… which, again, for this purpose isn’t just fine, it’s desirable).