With the knockout success of my ttrss service rollout, I thought it might be fun to look into other self-hosted services that I might find useful. Now, let’s be very clear, this was, on its face, entirely a make-work project to give me something fun to do with my spare time. But the outcome has proven surprisingly useful!
It all began when I came across Huginn. Huginn is an open source implementation of the kind of service offered by IFTTT, Zapier, and I’m sure others (Microsoft Flow popped up while I was finding the links to those services). The general idea is that these services allow you to plumb or connect various other services together to effect an automated workflow. For example, you might receive tweets on one end and shoot them off to, say, a Slack channel on the other.
Okay, so what would I do with this?
Well, as a bit of background, I’m an avid reader of Matt Levine. Mr. Levine offers a newsletter that one can subscribe to that is delivered daily to one’s email inbox. Notably, if you want to read this content on the Bloomberg website, it’s hidden behind a decidedly effective paywall that happens to defeat web scrapers. That means getting this content into my RSS feed isn’t directly possible.
But wouldn’t it be nice if I could take those emails, scrape out the content, and republish them to a private RSS feed that I could incorporate into ttrss?
Well, with Huginn I can do just that!
First, I set up a rule in gmail to apply a label to, and then archive, the newsletter emails.
Then, I set up a Huginn pipeline that does the following:
- Use the Imap Folder Agent to connect to my gmail account and retrieve any new emails with the label applied (making sure to use the text/html MIME enclosure so the full message body is available).
- Use the Website Agent to parse the email body and pull out the link to the article on Bloomberg.
- Use the Data Output agent to republish the content as an RSS feed.
Finally, in ttrss I subscribe to the feed and… voila!
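For the curious, the extract-and-republish steps above boil down to something like the following Python sketch. To be clear, this is not how Huginn does it (Huginn agents are configured through its web UI, not written in Python). The function names, the sample email body, and the feed title and URL are all made up for illustration, and it naively assumes the article is the first link in the message that mentions bloomberg.com.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects every href found in <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def first_article_link(email_html, domain="bloomberg.com"):
    """Return the first link containing the given domain, or None.

    The substring match is deliberately naive; a real pipeline would
    want a stricter rule.
    """
    collector = LinkCollector()
    collector.feed(email_html)
    for href in collector.links:
        if domain in href:
            return href
    return None


def build_feed(title, feed_link, items):
    """Render (title, link) pairs as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = title
    for item_title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


# Hypothetical email body and feed details, for illustration only.
sample_email = (
    '<p><a href="https://example.com/unsubscribe">Unsubscribe</a> '
    '<a href="https://www.bloomberg.com/opinion/some-article">Read online</a></p>'
)
article_link = first_article_link(sample_email)
feed_xml = build_feed(
    "Newsletter (private)",
    "https://huginn.example.com/feed",
    [("Today's column", article_link)],
)
```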
Now that is pretty darn useful!
Since then I’ve also set up Gotify and integrated it with Huginn and other services in my home to notify me when, for example, my offsite backup process is completed (and yeah, I could do that with just Gotify, but piping the events through Huginn gives me more flexibility later to do other things with them… like… publish them to an RSS feed? I dunno…).
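For reference, pushing a message straight to Gotify is a single HTTP POST to its /message endpoint. Here is a minimal Python sketch, assuming a made-up server URL and a placeholder application token; the function name is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_gotify_request(base_url, app_token, title, message, priority=5):
    """Build (but do not send) a POST to Gotify's /message endpoint."""
    payload = json.dumps(
        {"title": title, "message": message, "priority": priority}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/message?token={app_token}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server and token, for illustration only.
request = build_gotify_request(
    "https://gotify.example.com",
    "APP_TOKEN",
    "Offsite backup",
    "Backup completed",
)
# To actually send it: urllib.request.urlopen(request)
```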
This is some very nice infrastructure! I’m now very curious how else I might leverage this stuff, or what other services I could deploy (some of which are listed here)…
I’ve been a huge fan of RSS for a very long time now. For those not aware, RSS is a protocol that allows websites (news organizations, blogs, aggregators, etc) to push out a feed of content as they publish it. As an example, the CBC publishes a list of RSS feeds that any reader can subscribe to.
The reader then uses an RSS feed reader to subscribe to the feed and consume it.
Now, that by itself sounds just okay, but the real magic happens when you subscribe to a large number of feeds. What most folks don’t realize–even those familiar with RSS–is that RSS feeds are extremely common and widely available across many web properties. In my case, I subscribe to a number of news sources (CBC, BBC, NYT, etc), some technology aggregators (Hacker News, Reddit Programming), plus a number of random blogs and other outlets.
The RSS feed reader can then combine these streams of content in various ways. Personally, my preference is to just see a single list of all the most recently published articles that I can then scroll through. The best services allow me to consume that stream of content on multiple devices–in particular, on a desktop or on a phone–so that no matter where I am, my RSS feeds are at my fingertips, showing me a stream of all the content I’ve chosen to subscribe to.
Ultimately, what this amounts to is something like the Facebook news feed, except I’m personally selecting my sources rather than having content selected for me by some proprietary algorithm on a social network.
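The "single list of the most recent articles" idea is simple enough to sketch. The Python below is only an illustration of the concept, not how any particular reader implements it: it assumes well-formed RSS 2.0 where every item has a pubDate, and the two sample feeds are invented.

```python
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def parse_items(feed_xml):
    """Yield (published, title, link) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        yield published, item.findtext("title"), item.findtext("link")


def merged_timeline(feeds):
    """Combine several feeds into one list, newest entry first."""
    entries = [entry for feed_xml in feeds for entry in parse_items(feed_xml)]
    return sorted(entries, key=lambda entry: entry[0], reverse=True)


# Two invented feeds, for illustration only.
feed_a = """<rss version="2.0"><channel><title>A</title>
<item><title>A1</title><link>https://a.example/1</link>
<pubDate>Mon, 01 Jan 2024 10:00:00 +0000</pubDate></item>
<item><title>A2</title><link>https://a.example/2</link>
<pubDate>Wed, 03 Jan 2024 10:00:00 +0000</pubDate></item>
</channel></rss>"""
feed_b = """<rss version="2.0"><channel><title>B</title>
<item><title>B1</title><link>https://b.example/1</link>
<pubDate>Tue, 02 Jan 2024 10:00:00 +0000</pubDate></item>
</channel></rss>"""

timeline = merged_timeline([feed_a, feed_b])
titles = [title for _, title, _ in timeline]
```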
Now up until 2013 folks widely agreed that Google Reader was one of the best feed readers out there.
Unfortunately, Google, in their infinite wisdom, decided to shut Google Reader down.
Fortunately, there are plenty of fine alternatives out there, and for a very long time Feedly was my tool of choice. The web interface is clean and functional, the Android app is excellent, and it has a lot of interesting features if you’re willing to pay for their subscription. If you’re interested in dipping a toe into the RSS waters, I highly recommend it!
However, there are a couple of things about RSS that can be a bit of a nuisance.
First, news sources frequently only publish their article titles, perhaps a brief excerpt, and a link, so that you have to leave the feed reader and visit their website to consume the content. I can understand why that is (i.e. ad revenue), but it’s a real pain: the context switch to the website is always a bit jarring (and on a phone, a bit slow); each site has a different layout, which means the reading experience isn’t consistent; and if I want to read the content offline, I’m out of luck.
Second, some types of feeds, notably Reddit and Hacker News, publish links to their aggregation service rather than to the article content itself, often without any excerpt at all. The result is a rather bland, difficult-to-use feed.
Third, call me paranoid, but I’m not thrilled about having a third party tracking what I’m reading.
And then I discovered tt-rss.
tt-rss is a self-hosted RSS feed service. This means you stand up a server and run the application yourself (which, unfortunately, means the barrier to entry is pretty high, even for technical folks). It ships with a decent out-of-the-box web UI (that can be spruced up with themes), and it can be paired with an Android app for the phone. Once set up, the experience is vaguely analogous to Feedly: You subscribe to feeds in tt-rss, then view them on the web browser or mobile device.
But the real magic with tt-rss is the fact that it’s open source and extensible with plugins. And that means you can customize.
And customize I did!
The first thing I did was set up the mercury_fulltext plugin for tt-rss (and its associated service). The mercury_fulltext plugin processes each entry in the RSS feed and replaces the included excerpt with a stripped down version of the content directly from the source.
This means that the full article content is inlined right into the feed, and so there’s no need to leave the RSS client to read the content. And since the presentation is now controlled by the reader, you get a nice, consistent reading experience right in the app.
But it gets better!
The second thing I did was hack the mercury_fulltext plugin so it understands Reddit and Hacker News feeds. So, for those sources, the original source article’s content is inlined right into the RSS feed. To make it clear where the content came from, the URL for the source is displayed at the top of the article. And finally, the Reddit or Hacker News comment link is appended to the end of the article so that I can get to their site to see the user commentary.
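The resulting entry layout (source URL on top, article in the middle, comment link at the bottom) can be sketched in a few lines. This is Python purely for illustration; the real plugin is a tt-rss plugin and this is not its code. The function name and the sample URLs are hypothetical, and the article HTML is assumed to have already been extracted.

```python
from html import escape


def inline_source_article(source_url, comments_url, article_html):
    """Wrap extracted article HTML with a source link above and a comments link below."""
    safe_source = escape(source_url)
    header = f'<p>Source: <a href="{safe_source}">{safe_source}</a></p>'
    footer = f'<p><a href="{escape(comments_url)}">Comments</a></p>'
    return header + article_html + footer


# Hypothetical values, for illustration only.
entry_html = inline_source_article(
    "https://blog.example.com/post",
    "https://news.ycombinator.com/item?id=1",
    "<p>Full article text.</p>",
)
```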
This wouldn’t be possible with a closed source application or web service. The fact that the whole thing runs on a server I control means I can make the experience my own.
And I also retain control over my data.
Here’s what it looks like:
As an aside, while working on this project, I realized that the RSS feed for this blog was horribly broken, so that’s fixed now. For folks who were previously subscribed (LOL, as if you exist…), apologies for the sudden flood of entries. For those not subscribed, give it a shot!
Many years ago I experimented with running IPv6 in my home network (dual-stacked, not IPv6-only… I’m not that crazy!). At the time this was mainly an intellectual exercise. While a lot of major services already offered IPv6 (including Google, Facebook, and Netflix), the big draw of v6 is the ability to completely do away with NAT and simplify access to services and P2P applications running out of my home. But without broad v6 support, even if my home network was available via v6, the rest of the world wouldn’t be able to access it, which pretty severely curtailed the utility of the whole thing.
But, it was still an interesting exercise!
Until, that is, Netflix started cracking down on VPNs.
The way v6 was deployed in my network was via a tunnel supplied by Hurricane Electric. That tunnel terminated in California, and, while not intentional, it allowed me to watch US Netflix in Canada.
That is until Netflix realized people were abusing those tunnels and started blocking inbound traffic via HE.
I considered potential workarounds, but I could never figure out a satisfying solution (in large part thanks to closed devices like Chromecasts).
And so I shut down v6 in my network. While, previously, v6 didn’t provide a lot of value, it also didn’t cause me any problems. Once this issue surfaced, it was no longer worth the effort.
Recently I decided to take another look at the situation to see if anything had changed.
Well, unfortunately Netflix still blocks traffic coming from Hurricane Electric tunnels that terminate in the US.
However, it turns out, back in 2013, HE added new Points of Presence (POPs) in both Calgary and Manitoba. That meant I could set up a tunnel with an exit point inside the country.
Would Netflix block that?
It turns out, the answer is: No!
So I now have IPv6 back up in my home network.
But has the connectivity story changed? Yes!
Much to my astonishment, I discovered that in the last couple of years, AT&T, Rogers, and Telus have all deployed native IPv6 inside their networks. That means that, when I’m out and about in both Canada and the US, I have direct v6 connectivity back to my home network! Even my mother-in-law’s house has access thanks to her Telus internet package.
That’s a huge expansion in coverage!
In fact, ironically enough, of the places I frequent, the only location that lacks v6 connectivity is my workplace. Go figure. But, in that case, I can always just tunnel through my linode VPS, which has had v6 connectivity for many many years.
IPv6 adoption may be taking a while, but it is happening!
Over the last couple of years I’ve written extensively about backup solutions. The whole thing started as I tried to find a use for my NUC, which I initially turned into a Hackintosh, a solution that was, frankly, in search of a problem.
macOS ran fairly nicely on the thing, but eventually I ran into issues which ultimately led me to just convert the thing over to an Ubuntu 18.04 installation. In the end, Linux is just, at least in my experience, a much better home server OS for mixed-OS environments (taking the SMB issues on the Mac as a perfect example).
Anyway, I still needed a backup solution, and I originally settled on a combination of a few things:
- For Windows machines
  - A Samba file share on the server
  - Windows 10 built-in file copy backup capabilities
- For Linux machines
  - Syncthing for real-time storage redundancy
  - rclone for transferring backups to Google Drive for off-site replication
The whole thing stalled out when I screwed up the rclone mechanism and inadvertently deleted a bunch of items in my broader Google Drive account.
And so I became gun shy and paused the whole thing.
The other big change is I switched over to Ubuntu on my X1 Carbon, which meant that I now needed to sort out the backup solution for a Linux client as well. Syncthing is great for redundancy, but it’s not itself a backup solution.
So a couple of things changed, recently, that allowed me to close those gaps and resolve those issues.
First off, when it comes to rclone and Google Drive, I enabled two features:
- Set the authentication scope to “drive.file”
- Set the root_folder_id to the location on Drive where I want the backups stored
The first setting restricts rclone’s authorization so that it can only manipulate files it creates. So Google Drive should prevent rclone from accidentally touching anything other than the backups it’s transferring.
The second setting is belt-and-suspenders. By setting the root_folder_id, even if Google Drive somehow screwed up, rclone would never look outside of the target folder I selected.
So, the accidental deletion problem should be well behind me.
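Both settings live in the remote’s section of rclone.conf, which is a plain INI file. The Python sketch below just renders what such a section might look like; the remote name and folder ID are placeholders, the helper function is hypothetical, and a real config would also carry the OAuth token that `rclone config` writes for you.

```python
import configparser
import io


def render_rclone_remote(name, root_folder_id):
    """Render an rclone.conf section for a Drive remote limited to its own files."""
    config = configparser.ConfigParser()
    config[name] = {
        "type": "drive",
        "scope": "drive.file",  # only files created by rclone are visible
        "root_folder_id": root_folder_id,  # treat this folder as the remote's root
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder remote name and folder ID, for illustration only.
remote_section = render_rclone_remote("gdrive-backup", "FOLDER_ID_PLACEHOLDER")
print(remote_section)
```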
The solution to the issue of backups on Linux was to expand my use of Syncthing to include additional folders on my laptop that I want stored on my backup server. This ensures that my laptop is always maintaining a real-time replica of critical data in another location.
Finally, I adopted Restic for producing snapshot backups of content that I replicate to my backup server.
Basically, I create a local replica of data on the server (either with Syncthing, rclone, lftp, or other mechanisms) and then use Restic to produce a backup repository from those local copies. Restic then takes care of de-duplication, snapshotting, restoration, and other mechanisms. The Restic repositories then get pushed out to Google Drive via rclone.
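As a rough illustration, the last two stages (snapshot with Restic, then push the repository to Drive) might be driven by something like the Python sketch below. The paths and remote name are invented, the helper functions are hypothetical, and it assumes the Restic repository has already been initialized and that its password is supplied through the RESTIC_PASSWORD environment variable. It defaults to a dry run that only prints the commands.

```python
import subprocess


def backup_commands(source_dir, restic_repo, drive_remote):
    """Return the commands for the snapshot and off-site stages, in order."""
    return [
        # Snapshot the local replica into the Restic repository.
        ["restic", "-r", restic_repo, "backup", source_dir],
        # Mirror the Restic repository to Google Drive.
        ["rclone", "sync", restic_repo, drive_remote],
    ]


def run_pipeline(commands, dry_run=True):
    """Run each stage in order, stopping at the first failure."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Invented paths and remote name, for illustration only.
commands = backup_commands(
    "/srv/replicas/laptop",
    "/srv/restic/laptop",
    "gdrive-backup:restic/laptop",
)
run_pipeline(commands)
```

Keeping the stages as separate commands mirrors the separation described above: each one can fail, be retried, or be monitored on its own.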
I’ve also extended this backup strategy to the contents of my linode instance (where this blog is hosted), and to Lenore’s blog. Specifically, I use rclone (or lftp) to create/update a local copy of the data on those respective servers, and then use Restic to produce a backup repository from those copies. And, again, those repositories are then pushed out to Drive.
Overall, I think this stack should work nicely! And I like that it neatly separates the process (data transfer, backup, off-siting) into a set of discrete stages that I can independently monitor and control.
Just a quick handy tidbit: When using rclone for backup purposes like this, it’s a good idea to create a custom OAuth API key for use with Google Drive. By default rclone uses an API key shared with all other rclone users, which means you’re sharing the API quota as well. As a result, you get much better performance with your own key (though, unless you’re willing to jump through a lot of hoops, you’re stuck with “drive.file” scope… which, again, for this purpose isn’t just fine, it’s desirable).