Python's trailing-comma syntax (a complaint)
I’ve been using Python for something over a year now, and I love it. I learned it on-the-fly, and it says a lot for the design of the language that I could pick it up so easily but that I still find it powerful and flexible a year later. But there is one feature of the language design that I consider a terrible mistake; it is unintuitive to learn, but much more importantly, getting it wrong (which I continue to do regularly, after a year’s experience) produces bugs that often surface far from the original mistake. I’m talking about the trailing-comma one-element tuple syntax.
Python’s tuples: a quick reminder
A Python “tuple” is a fixed-length immutable list. This is a handy thing to have for efficiency reasons (they’re used internally to pass function arguments, for instance): allowing list modification, especially the insertion or removal of internal elements, means mutable lists have to carry extra bookkeeping overhead that tuples can do without. A Python tuple is usually written with commas and parentheses: (1,2,3) is a triple, for instance.[1]
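A minimal sketch of the "immutable" part, for readers who want to see it at the prompt. The variable names here are illustrative only and not taken from any real code base.

```python
triple = (1, 2, 3)
print(triple)           # (1, 2, 3)

as_list = [1, 2, 3]
as_list[0] = 99         # lists can be modified in place

try:
    triple[0] = 99      # tuples cannot
except TypeError as exc:
    error = str(exc)

print(error)
```

The list accepts the assignment; the tuple raises a TypeError.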
For reasons I don’t know (although I’m sure they were carefully thought through at the time)[2] the syntax actually only depends on the commas, not the surrounding parentheses. This allows some nice idioms: x,y = y,x exchanges the values in two variables, for instance. But it gives rise to two rather odd features.
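To make the "commas, not parentheses" point concrete, here is a short sketch of the swap idiom along with plain packing and unpacking. The values are arbitrary.

```python
x, y = 1, 2
x, y = y, x             # the right-hand side is packed into a tuple, then unpacked
print(x, y)             # 2 1

point = 3, 4            # no parentheses needed: the comma builds the tuple
print(point)            # (3, 4)

row, col = point        # unpacking works the same way in reverse
```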
Firstly, the zero-element tuple does rely on parentheses: it is written (). An odd little inconsistency, but not a very important one. Given this fact, though, one might wonder how to write the one-element tuple containing some value x. The obvious, but wrong, answer would be (x). In fact, this is simply the value of x, with the parentheses giving only (trivial) grouping. The correct answer is x, (note the comma!). One can for clarity write (x,) but the parentheses are strictly optional.
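The four spellings discussed above can be checked side by side. This is only a sketch with an arbitrary value for x.

```python
x = 42

grouped = (x)           # just x: the parentheses only group
bare = x,               # one-element tuple, comma only
wrapped = (x,)          # the same tuple, written more visibly
empty = ()              # the zero-element tuple needs the parentheses

print(type(grouped).__name__)   # int
print(type(bare).__name__)      # tuple
print(bare == wrapped)          # True
print(len(empty))               # 0
```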
All right. This is odd, and certainly gives rise to some beginner’s errors before you fully internalise it, but surely it’s not so terrible? You learn the rule, you deal with it. Well yes, but there is at least one code cleanup task that I do fairly frequently, that makes it very easy to trip over this bug.
A refactoring that frequently goes wrong
Suppose you have a function or method call with a large number of arguments, vertically aligned for clarity:
    my_object.wordy_method_call(arg1, arg2, another_arg,
                                kw_arg_1=complicated_calculation(x,y,z),
                                kw_arg_2=another_calculation(a,b,c), # (*)
                                kw_arg_3='a string for a change')
Some time later you realise that you are going to reuse that kw_arg_2 value at (*) further down, so you pull it out of the method call by copy/paste:
    kw_arg_2=another_calculation(a,b,c), # (*)
    my_object.wordy_method_call(arg1, arg2, another_arg,
                                kw_arg_1=complicated_calculation(x,y,z),
                                kw_arg_2=kw_arg_2,
                                kw_arg_3='a string for a change')
Spot the error?
The value being passed to wordy_method_call as kw_arg_2 isn’t the result of another_calculation(a,b,c) any more; it’s a tuple containing that result. The trailing comma is innocent within the argument list, but definitely dangerous outside it. (A trailing comment carried along with the copy/paste makes this error even harder to spot.)
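A runnable version of the mistake, for anyone who wants to watch it happen. The two functions below are hypothetical stand-ins that borrow the names used above; the real ones could do anything.

```python
def another_calculation(a, b, c):
    # Hypothetical stand-in: any function returning a plain value will do.
    return a + b + c


def wordy_method_call(*args, kw_arg_2=None, **kwargs):
    # Hypothetical stand-in: hands back exactly what it was given.
    return kw_arg_2


# Inside the call, the comma merely separates arguments.
before = wordy_method_call(1, 2,
                           kw_arg_2=another_calculation(1, 2, 3),
                           kw_arg_3='a string for a change')

# Pulled out by copy/paste, the same comma now builds a tuple.
kw_arg_2 = another_calculation(1, 2, 3),  # (*)
after = wordy_method_call(1, 2,
                          kw_arg_2=kw_arg_2,
                          kw_arg_3='a string for a change')

print(before)   # 6
print(after)    # (6,)
```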
If you’re lucky, wordy_method_call does some sanity checking of its input values. (Python is dynamically typed, so you won’t get warned about this unless you’re explicitly looking for it.) If you’re unlucky, it either passes its arguments immediately off to other functions[3] or was expecting a tuple there anyway (just not one containing a tuple…).
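As a sketch of the lucky and unlucky cases, assume a hypothetical function that expects a number and doubles it. Both function names below are made up for illustration.

```python
def doubled_unchecked(value):
    # No checking: a tuple is silently "doubled" by repetition.
    return value * 2


def doubled_checked(value):
    # Explicit sanity check, assuming a number is what the caller meant to pass.
    if not isinstance(value, (int, float)):
        raise TypeError(
            "expected a number, got " + type(value).__name__)
    return value * 2


print(doubled_unchecked((6,)))      # (6, 6)

try:
    doubled_checked((6,))
except TypeError as exc:
    message = str(exc)

print(message)                      # expected a number, got tuple
```

Without the check, nothing fails at the call site; the wrong value simply travels onward.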
What can we do about it?
Not much. My best suggestion is a style convention, that tuples on the right side of an assignment should always be parenthesised.[4] Then a good IDE should be able to raise a warning flag whenever an assignment ends with a comma.[5] That’s about the best we can do, because while it’s often not what you mean, a trailing comma is valid Python and can be perfectly legitimate.
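Such a check need not live in an IDE. Below is a rough sketch using the standard-library tokenize module; the function name find_trailing_commas is made up for this example. It reports every statement whose last token, outside any brackets, is a comma, so under the convention above each hit is either a bug or a tuple that should have been parenthesised.

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def find_trailing_commas(source):
    """Return line numbers of statements ending in a comma outside brackets."""
    flagged = []
    depth = 0       # current bracket nesting level
    last = None     # (text, depth) of the last meaningful token in the statement
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            if tok.string in OPENERS:
                depth += 1
            elif tok.string in CLOSERS:
                depth -= 1
        if tok.type == tokenize.NEWLINE:    # end of a logical line
            if last == (",", 0):
                flagged.append(tok.start[0])
            last = None
        elif tok.type not in IGNORED:
            last = (tok.string, depth)
    return flagged


sample = (
    "kw_arg_2 = another_calculation(a, b, c),  # (*)\n"
    "result = f(a,\n"
    "           b,\n"
    "           )\n"
    "pair = 1, 2\n"
    "single = (3,)\n"
)
print(find_trailing_commas(sample))     # [1]
```

Only the first line of the sample is flagged: the comma inside the multi-line call sits within brackets, and the parenthesised one-element tuple ends with a closing parenthesis rather than a comma.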
Notes:
[1] “Usually written” is necessarily vague, but this is how the Python interpreter prints them.
[2] I don’t mean this facetiously. Programming language design is a hugely complicated balancing act. Just changing this piece of syntax would certainly introduce problems elsewhere; if it makes any existing expression ambiguous then that ambiguity has to be resolved somehow, and the effects continue to flow outwards.
[3] A recent unpleasant example: writing a tuple-containing-a-unicode-string into an HTTP query parameter, instead of the unicode string itself: &q=(u'value',) wasn’t quite the desired effect.
[4] See the exchange-values idiom above for an example where this doesn’t read very nicely though.
[5] PyCharm, which I use, doesn’t include this check. It does look for trailing semicolons though, in I think roughly the same syntactic position, so adding it shouldn’t be hard.