Dripfeed (a niche Python utility)

The last couple of weekends I’ve been working on a little Python side project: dripfeed. This is partly a tool to help me keep my webcomics addiction under control, and partly an excuse to try a bunch of Python tools that I haven’t played with yet.

The official excuse (“it scratches an itch”)

Whenever I discover a new webcomic, I’m faced with the problem of getting myself up to speed: reading the archives in order, so that I’m ready to start reading along at the rate the author is writing. Typically I solve this problem with a single binge session, reading as far into the night (and morning) as it takes. This tactic is only marginally compatible with the realities of parenthood, however.

When I discovered The Abominable Charles Christopher I tried something different: I rationed my catchup sessions, and bookmarked my progress. Of course this led to the predictable frustrations: some nights I forgot to bookmark, some days I wanted to have a quick catchup session somewhere my bookmarks aren’t synched, and I ended up with a whole lot of “single-shot” bookmarks that needed tidying up.

What I really wanted was to do my catchup in the same place I read my regular webcomics: my RSS feed reader. And thus was dripfeed born.

The idea is terribly simple: it constructs an RSS feed from a comic archive,¹ and you control how often the feed updates. If this sounds useful to you, grab it and give it a spin.
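To make the idea concrete, here is a minimal sketch of following “next” links with an XPath expression. It is not dripfeed’s actual code: the page markup, the URLs, the `rel='next'` convention and the function names are all assumptions made for illustration, and the standard library’s ElementTree only copes with well-formed markup, so a real tool would want a more forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath for the "next" link; every comic site needs its own.
NEXT_XPATH = ".//a[@rel='next']"

# A tiny in-memory "archive" standing in for pages fetched over HTTP.
# The URLs and markup are invented for this sketch.
PAGES = {
    "/archive/0001": (
        "<html><body>"
        "<img src='/comics/0001.png' alt='Page 1'/>"
        "<a rel='next' href='/archive/0002'>Next</a>"
        "</body></html>"
    ),
    "/archive/0002": (
        "<html><body>"
        "<img src='/comics/0002.png' alt='Page 2'/>"
        "<a rel='prev' href='/archive/0001'>Previous</a>"
        "</body></html>"
    ),
}


def find_next_url(page_source, xpath=NEXT_XPATH):
    """Return the href of the first link matching xpath, or None."""
    root = ET.fromstring(page_source)
    link = root.find(xpath)
    return None if link is None else link.get("href")


def walk_archive(pages, start_url, limit):
    """Follow "next" links from start_url, collecting at most limit URLs."""
    urls = []
    url = start_url
    while url is not None and url in pages and len(urls) < limit:
        urls.append(url)
        url = find_next_url(pages[url])
    return urls
```

Calling `walk_archive(PAGES, "/archive/0001", limit=1)` once per update would add a single page to the feed each time, which is roughly the rationing described above.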

The real reason (“it scratches a different itch”)

I don’t actually have a new webcomic to catch up with at the moment, but I do have a bunch of Python tools and techniques that I’ve wanted to play with but that we don’t need at work. Putting this package together let me try out:

cookiecutter (a templating system for initialising new projects)
docopt (commandline arg parsing, configured by writing the --help text)
Travis CI testing (on a mercurial project hosted on bitbucket!)
supporting both Python 2 and Python 3 from the same codebase

Here are some lessons I learned on the way:

Cookiecutter is cool

I used someone else’s template, and it wasn’t set up quite how I wanted; I probably could have set up my repository myself in about the same time I spent rearranging theirs, but this way I got reminded about the bits I might have forgotten, and I also got introduced to some new utilities. Next time, though, I’m writing a template myself.

Docopt is less cool in practice than it sounds in theory

In principle the idea behind docopt is great: just write your --help text output (including usage examples) and it will produce a commandline parser to match.

In practice, though, I found that it wasn’t quite flexible enough in the places I needed it to be. The dripfeed interface has a couple of commands (init, update, list, info) which take arguments and options, and there are a few global options for controlling logging and verbosity. The suggested way to handle this structure with docopt is to make separate sub-parsers for each command, but this seemed overkill for the very small command set dripfeed has. I hacked together an alternative, but it felt like I was working against the grain of the library in doing so, and the result isn’t as friendly as I would like.

There’s a second way in which I started falling out of love with docopt in this project: it’s not composable. I added a set of options for controlling logging: --quiet, --verbose, --log-level, and --log for specifying an output file.² Ideally I would like to extract the handling for these into a package which I can drop into any of my commandline scripts, so that I don’t have to repeat myself whenever I want these (very standard-looking) options. But docopt makes that difficult: the spec for the options has to be intermingled with the spec for the rest of the commandline interface. For this reason, I’ll be looking at click instead for my next side project.
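For a sense of what “composable” could look like, here is a minimal sketch using the standard library’s argparse and its parent-parser mechanism. It uses neither docopt nor click, and it is not how dripfeed handles its options. The option names come from the list above; their exact semantics, the level choices and the helper names are assumptions.

```python
import argparse
import logging


def logging_options():
    """Build a reusable parser holding only the logging options."""
    # add_help=False so the parent does not clash with the child's --help.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--quiet", action="store_true", help="only report errors")
    parser.add_argument("--verbose", action="store_true", help="report debug detail")
    parser.add_argument(
        "--log-level",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="explicit level; overrides --quiet and --verbose",
    )
    parser.add_argument("--log", metavar="FILE", help="write the log to FILE")
    return parser


def resolve_level(args):
    """Turn the parsed options into a logging level (assumed precedence)."""
    if args.log_level:
        return getattr(logging, args.log_level)
    if args.quiet:
        return logging.ERROR
    if args.verbose:
        return logging.DEBUG
    return logging.INFO


def build_parser():
    """A stand-in command line that inherits the shared logging options."""
    parser = argparse.ArgumentParser(
        prog="dripfeed-sketch", parents=[logging_options()]
    )
    parser.add_argument("command", choices=["init", "update", "list", "info"])
    return parser
```

Any other script could pass `parents=[logging_options()]` to its own parser and reuse `resolve_level` unchanged, which is the kind of drop-in reuse that is hard to get when the options live inside one usage string.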

I need to either learn Git, or raise my Hg game

The commit history for this project is an embarrassment. I’ve always been a fan of the mercurial philosophy that “history is sacred”;³ looking at some of these commits, though, I start to appreciate git’s philosophy that “history is sacred”.⁴ Git makes it easy to commit often for trivial changes (which is good for short-term development) and still to squash those commits into reasonable-sized coherent updates before pushing them out into the world (which is good for long-term not looking like a doofus). I understand that modern mercurial can do this too, but it’s certainly not the way I use it at the moment.

Contributing to this feeling is the fact that Travis CI is so self-evidently the obvious choice for CI testing… but it doesn’t support bitbucket, which is equally self-evidently the obvious place to host mercurial projects. I’ve “solved” this problem this time around by making a github clone of the repository, which I update with the hg-git mercurial extension, but the setup is decidedly rickety.

Supporting both Python 2 and Python 3 is … interesting

Name changes are pretty cleanly dealt with using the six package, but more fine-grained changes still need some hackery; generally, though, this part of the process is pretty simple.
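As an illustration of the kind of name change involved, here is a minimal sketch of the hand-rolled pattern that six wraps up for you. The choice of `urljoin` and the `text_type` alias are examples picked for this sketch, not code taken from dripfeed; six offers equivalents as `six.moves.urllib.parse` and `six.text_type`.

```python
import sys

# The module holding urljoin was renamed between Python 2 and Python 3.
try:
    from urllib.parse import urljoin  # Python 3 location
except ImportError:
    from urlparse import urljoin  # Python 2 location

PY2 = sys.version_info[0] == 2

# The unicode string type is called `unicode` on Python 2 and `str` on 3.
# The conditional expression only evaluates the branch it selects, so the
# name `unicode` is never looked up on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def absolute_url(page_url, link):
    """Resolve a possibly relative link against the page it came from."""
    return text_type(urljoin(page_url, link))
```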

What’s much more interesting is dealing with the changes in string types and semantics. Being forced to be explicit about the distinction between bytes and unicode strings for Python 3 actually helped me catch what I think would have been a very subtle bug in the original code. It’s very unlikely that it would ever have been discovered,⁵ but I still think this counts as a point for the New Deal.⁶
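The notes put the bug down to opening RSS files in text mode rather than binary mode. The following is a minimal Python 3 sketch of that general class of problem, not a reproduction of dripfeed’s code: the feed content and function names are invented, and the text-mode read is pinned to UTF-8 so that the behaviour does not depend on the platform’s default encoding.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# A hypothetical feed that declares, and really uses, a non-UTF-8 encoding.
SAMPLE_FEED = (
    u'<?xml version="1.0" encoding="iso-8859-1"?>'
    u'<rss version="2.0"><channel>'
    u"<title>Caf\u00e9 Comics</title>"
    u"</channel></rss>"
).encode("iso-8859-1")


def write_sample_feed():
    """Write the sample feed to a temporary file and return its path."""
    handle, path = tempfile.mkstemp(suffix=".rss")
    with os.fdopen(handle, "wb") as out:
        out.write(SAMPLE_FEED)
    return path


def title_from_bytes(path):
    """Read raw bytes and let the XML parser honour the declared encoding."""
    with open(path, "rb") as feed:
        return ET.fromstring(feed.read()).findtext("channel/title")


def text_mode_fails(path):
    """Report whether decoding the same file as UTF-8 text raises an error."""
    try:
        with open(path, "r", encoding="utf-8") as feed:
            feed.read()
    except UnicodeDecodeError:
        return True
    return False
```

Reading in binary mode leaves the decoding to the XML parser, which knows the encoding the feed declares. Reading in text mode makes Python guess before the parser ever sees that declaration.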

Was it worth it?

Well, I’m currently dripfeeding Gunnerkrigg Court (as a running test and for nostalgia value). So far it’s working beautifully.⁷ And I tuned my toolset, which was a whole lot of fun.

Notes:

¹ In fact, from any sequence of HTML pages all of which have a “next” link that can be reliably extracted with an XPath expression.
² Yes, this is total overkill for this tiny little script. I said I was playing with tools, right?
³ It should record everything that happened in the project.
⁴ It should be a readable summary of significant changes.
⁵ I don’t expect this tool to get heavy use.
⁶ The bug was: I was opening RSS files in text mode ('r+') instead of binary mode ('r+b'). I think this could have caused problems with a non-unicode-encoded RSS feed, but I’m not 100% sure.
⁷ I am feeling the pain of not scraping the images into the feed, but that’s a deliberate decision that I don’t see changing.