Dripfeed (a niche Python utility)
The last couple of weekends I’ve been working on a little Python side project: dripfeed. This is partly a tool to help me keep my webcomics addiction under control, and partly an excuse to try a bunch of Python tools that I haven’t played with yet.
The official excuse (“it scratches an itch”)
Whenever I discover a new webcomic, I’m faced with the problem of getting myself up to speed: reading the archives in order, so that I’m ready to start reading along at the rate the author is writing. Typically I solve this problem with a single binge session, reading as far into the night (and morning) as it takes. This tactic is only marginally compatible with the realities of parenthood, however.
When I discovered The Abominable Charles Christopher I tried something different: I rationed my catchup sessions, and bookmarked my progress. Of course this led to the predictable frustrations: some nights I forgot to bookmark, some days I wanted to have a quick catchup session somewhere my bookmarks aren’t synched, and I ended up with a whole lot of “single-shot” bookmarks that needed tidying up.
What I really wanted was to do my catchup in the same place I read my regular webcomics: my RSS feed reader. And thus was dripfeed born.
The idea is terribly simple: it constructs an RSS feed from a comic archive,[1] and you control how often the feed updates. If this sounds useful to you, grab it and give it a spin.
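To make the idea concrete, here is a rough sketch of the approach rather than dripfeed's actual code. It assumes well-formed XHTML pages held in a hypothetical in-memory dictionary (PAGES), and it uses the limited XPath support in the standard library's ElementTree; real comic pages would need a more forgiving HTML parser. The names walk_archive and build_feed, and the limit parameter standing in for "how far the feed has dripped so far", are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a comic archive: URL -> well-formed XHTML.
PAGES = {
    "/comic/1": "<html><body><a rel='next' href='/comic/2'>Next</a></body></html>",
    "/comic/2": "<html><body><a rel='next' href='/comic/3'>Next</a></body></html>",
    "/comic/3": "<html><body><p>Latest page, no next link.</p></body></html>",
}


def walk_archive(fetch, start_url, next_xpath=".//a[@rel='next']", limit=None):
    """Yield page URLs in reading order by following each page's "next" link."""
    url, seen = start_url, set()
    while url is not None and url not in seen:
        if limit is not None and len(seen) >= limit:
            break
        seen.add(url)
        yield url
        # Find the "next" link on the current page, if there is one.
        link = ET.fromstring(fetch(url)).find(next_xpath)
        url = link.get("href") if link is not None else None


def build_feed(title, urls):
    """Build a bare-bones RSS 2.0 document with one item per page URL."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for url in urls:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


# "Drip" only the first two pages into the feed for now.
drip = list(walk_archive(PAGES.__getitem__, "/comic/1", limit=2))
feed = build_feed("Catch-up feed", drip)
```

Raising limit on each scheduled run is one way the feed could grow at a chosen rate; how dripfeed really tracks its progress is not shown here.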
The real reason (“it scratches a different itch”)
I don’t actually have a new webcomic to catch up with at the moment, but I do have a bunch of Python tools and techniques that I’ve wanted to play with but that we don’t need at work. Putting this package together let me try out:
- cookiecutter (a templating system for initialising new projects)
- docopt (commandline arg parsing, configured by writing the --help text)
- Travis CI testing (on a mercurial project hosted on bitbucket!)
- supporting both Python 2 and Python 3 from the same codebase
Here are some lessons I learned on the way:
Cookiecutter is cool
I used someone else’s template, and it wasn’t set up quite how I wanted; I probably could have set up my repository myself in about the same time I spent rearranging theirs, but this way I got reminded about the bits I might have forgotten, and I also got introduced to some new utilities. Next time, though, I’m writing a template myself.
Docopt is less cool in practice than it sounds in theory
In principle the idea behind docopt is great: just write your --help text output (including usage examples) and it will produce a commandline parser to match.
In practice, though, I found that it wasn’t quite flexible enough in the places I needed it to be. The dripfeed interface has a handful of commands (init, update, list, info) which take arguments and options, and there are a few global options for controlling logging and verbosity. The suggested way to handle this structure with docopt is to make separate sub-parsers for each command, but this seemed overkill for the very small command set dripfeed has. I hacked together an alternative, but it felt like I was working against the grain of the library in doing so, and the result isn’t as friendly as I would like.
There’s a second way in which I started falling out of love with docopt in this project: it’s not composable. I added a set of options for controlling logging: --quiet, --verbose, --log-level, and --log for specifying an output file.[2] Ideally I would like to extract the handling for these into a package which I can drop into any of my commandline scripts, so that I don’t have to repeat myself whenever I want these (very standard-looking) options. But docopt makes that difficult: the spec for the options has to be intermingled with the spec for the rest of the commandline interface. Next side project I play with I’ll be looking at click instead, for this reason.
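For a sense of what "composable" could look like, here is a sketch using the standard library's argparse and its parent-parser mechanism. This is a swapped-in technique for illustration: it is neither docopt nor click, and not what dripfeed does. The flag names come from the list above, but their exact behaviour (defaults, and which flag wins when several are given) is my assumption, and logging_options and resolve_level are hypothetical names.

```python
import argparse
import logging


def logging_options():
    """Return a reusable parent parser that holds only the logging flags."""
    parent = argparse.ArgumentParser(add_help=False)
    group = parent.add_argument_group("logging")
    group.add_argument("--quiet", action="store_true", help="only show errors")
    group.add_argument("--verbose", action="store_true", help="show debug output")
    group.add_argument("--log-level", default=None, help="explicit level name, e.g. INFO")
    group.add_argument("--log", metavar="FILE", default=None, help="write log output to FILE")
    return parent


def resolve_level(args):
    """Turn the parsed flags into a logging level (assumed precedence)."""
    if args.log_level:
        return getattr(logging, args.log_level.upper())
    if args.quiet:
        return logging.ERROR
    if args.verbose:
        return logging.DEBUG
    return logging.WARNING


# Any script can pull in the shared flags without restating them.
parser = argparse.ArgumentParser(prog="dripfeed", parents=[logging_options()])
parser.add_argument("command", choices=["init", "update", "list", "info"])

args = parser.parse_args(["--verbose", "--log", "drip.log", "update"])
level = resolve_level(args)
```

The point is only that the logging flags live in one function that knows nothing about the rest of the interface, which is the separation the help-text-as-spec approach makes awkward.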
I need to either learn Git, or raise my Hg game
The commit history for this project is an embarrassment. I’ve always been a fan of the mercurial philosophy that “history is sacred”;[3] looking at some of these commits, though, I start to appreciate git’s philosophy that “history is sacred”.[4] Git makes it easy to commit often for trivial changes (which is good for short-term development) and still to squash those commits into reasonable-sized coherent updates before pushing them out into the world (which is good for long-term not looking like a doofus). I understand that modern mercurial can do this too, but it’s certainly not the way I use it at the moment.
Contributing to this feeling is the fact that Travis CI is so self-evidently the obvious choice for CI testing… but it doesn’t support bitbucket, which is equally self-evidently the obvious place to host mercurial projects. I’ve “solved” this problem this time around by making a github clone of the repository, which I update with the hg-git mercurial extension, but the setup is decidedly rickety.
Supporting both Python 2 and Python 3 is … interesting
Name changes are pretty cleanly dealt with using the six package, but more fine-grained changes still need some hackery; generally, though, this part of the process is pretty simple.
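As a flavour of the sort of hackery involved, here is a small sketch that runs on both Python 2 and Python 3 without six. It is illustrative only and not taken from dripfeed: PY2, text_type and ensure_text are names I have made up here, and six offers ready-made equivalents such as six.text_type and the six.moves renamed modules.

```python
import sys

# A renamed module: try the Python 3 location first, then fall back.
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

PY2 = sys.version_info[0] == 2

# 'unicode' only exists on Python 2; the conditional expression means it is
# never looked up on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def ensure_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding bytes if necessary."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


next_url = urljoin("http://example.com/comic/1", "2")
title = ensure_text(b"Caf\xc3\xa9")
```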
What’s much more interesting is dealing with the changes in string types and semantics. Being forced to be explicit about the distinction between bytes and unicode strings for Python 3 actually helped me catch what I think would have been a very subtle bug in the original code. It’s very unlikely that the bug would ever have been discovered,[5] but I still think this counts as a point for the New Deal.[6]
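Here is one way a text-mode versus binary-mode mix-up could bite, sketched under some assumptions: a hypothetical feed file that declares ISO-8859-1, and a text-mode read that decodes as UTF-8. I use plain 'r' and 'rb' rather than the 'r+' and 'r+b' modes from the original code, and I am not claiming this is exactly what would have gone wrong in dripfeed.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# A tiny feed that declares a non-UTF-8 encoding (hypothetical content).
rss = (u'<?xml version="1.0" encoding="iso-8859-1"?>'
       u'<rss><channel><title>Caf\xe9</title></channel></rss>')
path = os.path.join(tempfile.mkdtemp(), "feed.rss")
with open(path, "wb") as f:
    f.write(rss.encode("iso-8859-1"))

# Binary mode: the XML parser sees the raw bytes and honours the declared
# encoding itself.
with open(path, "rb") as f:
    title_from_bytes = ET.parse(f).getroot().findtext("channel/title")

# Text mode: the file is decoded before the parser ever sees it, here as
# UTF-8, which does not match the bytes on disk.
try:
    with io.open(path, "r", encoding="utf-8") as f:
        f.read()
    text_mode_ok = True
except UnicodeDecodeError:
    text_mode_ok = False
```

In binary mode the title survives intact; in text mode the read fails before any XML parsing happens.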
Was it worth it?
Well, I’m currently dripfeeding Gunnerkrigg Court (as a running test and for nostalgia value). So far it’s working beautifully.[7] And I tuned my toolset, which was a whole lot of fun.
Notes:
1. In fact, from any sequence of html pages all of which have a “next” link that can be reliably extracted with an XPath expression.
2. Yes, this is total overkill for this tiny little script. I said I was playing with tools, right?
3. It should record everything that happened in the project.
4. It should be a readable summary of significant changes.
5. I don’t expect this tool to get heavy use.
6. The bug was: I was loading RSS files with open('r+') instead of with open('r+b'). I think this could have caused problems with a non-unicode-encoded RSS feed, but I’m not 100% sure.
7. I am feeling the pain of not scraping the images into the feed, but that’s a deliberate decision that I don’t see changing.
Comments
So the footnotes were worth it after all!
Shorter parental-advisory version: I wrestled with a python, and mostly I won.
I'm glad you've started on Charles Christopher! This is a good time to start reading: things are getting very interesting just now. If you want another guinea-pig for dripfeed, Derelict recently wrapped up book one, and definitely benefits from non-binge reading.
I feel your pain regarding embarrassing commit histories. My workflow these days seems to be:
    hg st
    hg commit
    [thirty seconds pass]
    "OH BUGGER"
    hg rollback
    [tappity tappity tap]
    hg commit
    hg push
Usually I catch it before the push. I didn't realize that batched changes were possible. Histedit's docs make it look a little dangerous... if Git can do this nicely, maybe it will finally convince me to try it out seriously. So far I have been unable to suppress the suspicion that using it looks something like this.
Oh I caught up with CC and Derelict ages ago (when you first recommended them, I think -- thanks!). This is the sort of itch-scratching I can only do when I don't currently have an itch.
Git has --amend and modern mercurial does too: another of those cases where my usage lags behind the state of the tool. There is a significant difference though: because git has built-in the notion of "staging" your commits, it encourages you to make lots of quick commits as you work, then to clean them up into a bundle before pushing them out to the outside world. You can do the same with mercurial, it just takes a bit more discipline and you'll have to switch on some extensions.
Somewhere I remember reading a mailing list thread where someone with serious hg chops gave a list of advanced features matching the stuff in git that involves "rewriting history"; my google-fu is too weak, sadly.
Oh, and the git man-page generator is fantastic. That is definitely one barrier to switching to git: its commandline interface is freakin' nuts.
I seem to recall you ostensibly refusing CC and Derelict on my first attempt; should've known it wouldn't last. And --amend looks like exactly what I'm after -- thanks. (The changeset evolution in 3.0 also looks fantastic; I think I'll come back to it circa 2016 when other people have found all the bugs and written gentle tutorials for its use.)
I bet I claimed I had no time to start on new comics -- that's what produces all-night catchup binges, exactly and completely.
I understood most of that except between the words "The last couple of weekends..." and "was a whole lot of fun."