Professionalism in writing LaTeX documents

This essay comes from my experiences proofreading LaTeX documents, both semi-professionally and for friends. It is intended mainly for technical writing, where LaTeX really comes into its own and where attention to detail can both greatly assist the proofreader or typesetter, and greatly improve the readability of the manuscript. This is neither a description of how to use LaTeX nor a replacement for a decent style guide for technical writing. I will discuss concepts that should be kept in mind while writing, and give some suggestions on how to use LaTeX effectively to manage them.

(Note: I refer throughout to LaTeX. These ideas apply just as much to TeX, and in fact to any markup-based document preparation system. The examples are in LaTeX, because that’s what I use.)

Writing LaTeX for the reader and for the typesetter

Two readers of your manuscript must be kept in mind: the eventual reader of the published product, and the typesetter at the publishing office who must correct your typos and apply the house style to what you have written. For the first, the main concerns are consistency and clarity. For the second, I advocate writing LaTeX as a programming language as well as a typesetting system. These concerns overlap to some extent; good programming practices help maintain consistency, while simplicity and clarity are of paramount importance for writing code (as anyone who has debugged code written by someone else can confirm). Even if you are your own typesetter (for student papers, for example, or a thesis) the techniques I describe will be useful. They’ll make your manuscript easier to write, and much easier to edit.

First I’ll discuss writing for the reader, then afterwards how considering LaTeX as a programming language can make things easier. For those who don’t consider themselves programmers, this second section is non-technical and even more important for you than for the code junkies, since these simple techniques are likely to be both unfamiliar and very helpful.

Consistency and clarity in technical writing

There are two major areas where consistency becomes an issue in technical writing: in technical notation, and in the basic business of document management. The first category includes mathematical expressions, and whatever specialised notation you might happen to need for your subject. By “document management” I mean such details as cross-references, citations, and fonts and styling. I assume the reader is familiar with \ref{}, \cite{}, \section{} and so on. I’m going to discuss some more general methods for using these consistently.

Technical notation

The biggest thing you should be conscious of here is that choice of font styling matters. When used technically, the same letter “x” set in three different styles (roman, italic and bold, say) could refer to three different concepts or objects. As an author, you should choose your styling deliberately and with care. There are a number of generally established principles that can help you. (In this section I refer to “font” and “styling” more or less interchangeably.)

Variables are italic

The most general rule I know of is that variables are given italics. I mean “variable” in the mathematical sense: ‘placeholder’ words whose referent is unspecified, or changes over time. (The canonical example is the use of x to represent an unspecified number in a mathematical expression.) Note that variables do not have to be mathematical! The sentence “I say something, and you say something back” has a very different meaning when “something” is set in italics and so interpreted as a variable.

In LaTeX you usually put variables in math mode. A common mistake to avoid is mentioning a variable in text mode when it also appears in a mathematical expression (“…the formula is 2x(x+y) where x and y are positive…”). The correct code is where $x$ and $y$ are, or equivalently where \(x\) and \(y\) are, producing “where x and y are” with both letters in italics. (The pitfalls of math mode are many! Some more common mistakes are discussed later. The most important is that math mode should not be used to produce italics over whole-word variables, since the spacing is incorrect; try it with “effect” and see for yourself.)
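To see the difference on the page, here is a minimal sketch built from the sentence quoted above. Nothing beyond the standard article class is assumed.

```latex
\documentclass{article}
\begin{document}
% Wrong: x and y are set upright, unlike the same letters in the formula.
The formula is $2x(x+y)$ where x and y are positive.

% Right: every mention of a variable goes in math mode.
The formula is $2x(x+y)$ where $x$ and $y$ are positive.
\end{document}
```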

Another consequence of the variables-are-italic rule is that most non-variable entities should not be italicised. Another common misuse is to spell out functions such as sine and cosine in math mode. Since the referents of the words “sin” and “cos” are fixed, they should be in roman. LaTeX provides builtins for many common mathematical functions, but you need to take care of your own notation yourself. (For example, if discussing semantics you might refer to the denotation of the noun “man” as “man(x)” with “man” in roman; setting “man” in math italics would be incorrect.) Some techniques to make this less of a chore are discussed later.
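A short sketch of the roman-for-fixed-names rule follows. It assumes the amsmath package for \DeclareMathOperator, and the operator name \den is made up purely for illustration.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \DeclareMathOperator
% \den is a hypothetical operator name, invented for this example
\DeclareMathOperator{\den}{den}
\begin{document}
Wrong: $sin x$ is typeset as a product of the variables $s$, $i$, $n$ and $x$.

Right: $\sin x$ uses the builtin upright function name.

Your own notation: $\mathrm{man}(x)$, or a declared operator such as $\den(x)$.
\end{document}
```

Declaring an operator costs one line in the preamble and also gets the spacing around the name right, which \mathrm{} alone does not attempt.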

“But I’m using someone else’s notation!”

This is sometimes a tricky one. Some people (particularly non-mathematicians) are simply font-blind, and don’t see the sorts of distinctions I’m discussing here. When using notation invented by such people, I usually feel free to standardise it (a common telltale is the use of the same word in different fonts for the same object or concept).

On the other hand, if the notational system is font-conscious, but doesn’t follow these guidelines, then you should be very cautious about making changes.

Choosing fonts carefully

At the opposite extreme from the completely font-blind, you can use a different font or style for every conceptually different type of object you discuss. This has the same advantage that type-checking does in a programming language: it helps you make sure you’re writing what you think you’re writing. However it can produce such a profusion of information that the result is largely unreadable. (For an example, see the notation in this cognitive linguistics paper.) Semantic markup is a technique giving you the type-checking advantage, without the loss in clarity.

More important, though, is careful choice of notation to emphasise the important distinctions and gloss over the unimportant ones. You should choose your examples to make as many distinctions as possible intuitively obvious, so that they need not be indicated with a profusion of font shifts. However, if the distinctions are important, try to avoid “overloading” fonts (using italics for emphasis, technical terms and quotes, for instance). Minimising markup means that one ‘marker’ is enough: there’s no need to (for example) italicise a quote and also surround it with quotation marks.

If the same word is used for many related concepts, you might consider using subscripts instead of font shifts to distinguish between them. (This is common for instance in cognitive linguistics, where you might want to distinguish between the word “man”, an abstract meaning “man(x)”, a particular conceptual implementation of the meaning in someone’s head, and so on. Another example is distinguishing types from tokens.) Note that the subscripts should be in roman! (They refer to fixed entities —in this case types— not variables.)
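As a rough sketch of roman subscripts, the snippet below uses the subscript labels “type” and “token” suggested by the example above. The exact labels are my assumption, not a fixed convention.

```latex
\documentclass{article}
\begin{document}
% A subscript naming a fixed kind of thing is upright;
% only a subscript that is itself a variable, such as the index i, stays italic.
$\mathrm{man}_{\mathrm{type}}$, $\mathrm{man}_{\mathrm{token}}$, $x_i$
\end{document}
```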

In most cases where the typography carries meaning, it is wise to explicitly state the conventions you employ. Having to describe these conventions for every font shift may also help you keep such complications to the minimum needed. (Bear in mind that many people will not pay close attention to these points. Using slanted and italic styles in LaTeX for different purposes is probably unwise, despite their real differences, since they’re so easy to confuse.)

Common math mode mixups

The confusion regarding italics is the commonest problem I see with LaTeX math mode. Close behind are the people who let math mode put their variables in italics, but use long variable names (such as “effort”). The reason this is a mistake is that spacing in math mode is based on the assumption that each letter is a separate variable. You’ll see this very clearly if your word contains the letter “f”, since this is unusually wide. The solution is to use the math italic font \mathit{}, since this is spaced as ordinary text.
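The two words mentioned so far make a quick test. This is only a sketch for comparing the spacing side by side.

```latex
\documentclass{article}
\begin{document}
% Each letter is treated as a separate variable: look at the gaps around the f's.
$effort$ and $effect$

% \mathit sets the whole word in text italic, with normal letter spacing.
$\mathit{effort}$ and $\mathit{effect}$
\end{document}
```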

Spacing in math mode is much more complex than it appears. This only becomes apparent when it goes wrong, of course. In fact there are six or seven different classes of math entities, all of which are given different treatment for spacing. So for instance you should use “\colon” instead of “:” for function notation f\colon D \rightarrow R, because of the altered spacing. (You can also specify directly the spacing class you want an expression to have. More details in the Short Math Guide.)
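A small sketch of both points, using plain LaTeX without extra packages. The relation written with \sim and a subscript is an invented example.

```latex
\documentclass{article}
\begin{document}
% ":" is classed as a relation, so it gets space on both sides.
$f : D \rightarrow R$

% \colon is classed as punctuation: no space before it, a thin space after.
$f\colon D \rightarrow R$

% Stating the class directly: \mathrel makes the whole group behave as one relation.
$a \mathrel{\sim_R} b$
\end{document}
```

The companions of \mathrel{} include \mathbin{}, \mathop{}, \mathord{} and \mathpunct{}, one for each class you might want to force.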

Another, more subtle, mistake I see frequently is the failure to distinguish between the mathematical part of a formula and the textual part. For instance, I often see expressions such as let $v \in V, e \in V \times V,$ and. It is mere pedantry to insist that the commas are textual elements and do not belong in math mode, since the visual appearance is exactly the same. What does matter, though, is the lack of a space between “\in V” and “e \in” — which belongs there because the comma is not a mathematical operator (as in e_1,\dotsc,e_n) but a textual one. The reverse confusion becomes highly visible when a mathematical variable is included in a text paragraph without putting it into math mode. (Depending on the font, the same may hold for numerals as well, although thankfully this is not the case in Computer Modern, the LaTeX default font.)
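The difference is easiest to judge with the two versions one above the other. This sketch assumes amsmath only because \dotsc comes from that package.

```latex
\documentclass{article}
\usepackage{amsmath} % for \dotsc
\begin{document}
% One formula: the commas are math punctuation, followed by a thin space only.
Let $v \in V, e \in V \times V,$ and so on.

% Two formulas: the commas are text, followed by a normal interword space.
Let $v \in V$, $e \in V \times V$, and so on.

% Here the commas really are part of the mathematics.
The edges $e_1,\dotsc,e_n$ are distinct.
\end{document}
```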

Managing your document

LaTeX already provides good features for the low-level details of keeping your document consistent. If you use \label{}, \ref{}, \cite{} and friends, you no longer need to worry about the numerical details of your cross-references. But there are subtler matters of consistency that LaTeX will not handle for you, or at least not without a bit of prompting.

I have in mind things like whether figures are referred to as “Fig. 13” or “figure 13”, whether to use double or single quotes for quotations, and so on. I don’t advocate any particular style here; for that you need to read a good style guide (and in fact your decisions may be overridden by the typesetter of your publishing house). The most important thing is that you apply one set of decisions consistently throughout your document.

For some of these issues, you can make LaTeX do some of the work for you. For instance, you might define a macro \figref{} that works like \ref{} but also adds the word “figure”. This has two advantages: you’re sure to always get the same consistent style, and if you want to change that style you only have to change your macro definition.
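One possible definition of such a macro is sketched below. The label fig:example and the placeholder figure are invented so that the file compiles on its own.

```latex
\documentclass{article}
% \figref: the word "figure", a non-breaking space, then the number.
\newcommand{\figref}[1]{figure~\ref{#1}}
\begin{document}
\begin{figure}
  \centering
  \fbox{placeholder} % stands in for a real graphic
  \caption{A placeholder figure.}
  \label{fig:example} % hypothetical label
\end{figure}

The effect is visible in \figref{fig:example}.
\end{document}
```

Switching the whole document to “Fig. 1” is then a matter of editing the one \newcommand line.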

A word of warning: this sort of thing is worse than useless if you don’t, or can’t, use it! A \figref{} that produces a lower-case “figure” is a bad definition, because you can’t use it if the reference is the first word of a sentence. You might be tempted to write the word by hand, but then you lose the consistency-enforcing benefits of the macro. Either choose a style that you can use sentence-initially, or define another macro for that case. I wrote a small package (allrefs) that assists with this process for reference-like constructs, but in general you’ll have to think carefully through how you expect to use the macros you define, and make sure they cover all the situations you’ll run into. A partial solution is worse than none at all, for enforcing consistency. Some rules have so many exceptions it’s better to implement them by hand than with macros.
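The “another macro” option might look like the sketch below. The capitalised name \Figref is my own choice for illustration (it is not taken from the allrefs package), and the label is again invented.

```latex
\documentclass{article}
\newcommand{\figref}[1]{figure~\ref{#1}} % mid-sentence form
\newcommand{\Figref}[1]{Figure~\ref{#1}} % sentence-initial form (hypothetical name)
\begin{document}
\begin{figure}
  \caption{A placeholder figure.}
  \label{fig:example} % hypothetical label
\end{figure}

\Figref{fig:example} is empty. Compare the text with \figref{fig:example}.
\end{document}
```

Keeping the two definitions on adjacent lines makes it hard to change one style and forget the other.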

Also be aware that someone else will have to read your manuscript. If you’re going to use non-standard macros like this, you should apply the ideas outlined below to make your decisions as clear and easy to edit as possible for your typesetter. (If you are your own typesetter, this is still good practice and will certainly pay off in a large project such as a thesis.)

There exist a few LaTeX packages designed to solve this particular problem (the words attached to a \ref{}). Most of them try to analyse the label they’re given, to decide if it is a figure or a table or whatever (so you’re committed to a naming scheme such as \label{fig:myfig}, which is in itself useful discipline). I don’t use them because they’re complicated to learn and use correctly. In general I advocate simple solutions to consistency problems, for two reasons. Firstly, these are minor matters of professional presentation. While they’re important, they’re not worth devoting too large a portion of your writing time to, for instance in mastering a complex reference-synthesis package. Secondly, as discussed below, the simpler and clearer your markup is the easier it is to read and alter. Complex solutions have a tendency to hide the actual text that gets produced underneath multiple layers of macros, which is generally unhealthy for the maintenance of your document. (See the section on LaTeX as a programming language for why I think your document needs “maintaining”.)

Semantic markup for consistency

A very useful technique for maintaining consistency is semantic markup. This idea is behind the LaTeX macro \emph{}, which seems superfluous when we already have \textit{} and \itshape. At least, it seems superfluous until you look at the result of \emph{within this text \emph{some words} are more important than others}. The \emph{} macro is not a simple font switch, but really emphasises its argument differently depending on the environment.
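You can check the behaviour of nested emphasis with a two-line experiment. It is sketched here with the sentence from the paragraph above.

```latex
\documentclass{article}
\begin{document}
% The inner \emph switches back to upright, so it still stands out.
\emph{within this text \emph{some words} are more important than others}

% The inner \textit changes nothing: everything is already italic.
\textit{within this text \textit{some words} are more important than others}
\end{document}
```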

Not all semantic markup need be so cleverly constructed. The important point is that the name and usage of the macro are defined by the intended meaning for the reader, not by the intended typographical effect.

This is good for several reasons. As with the reference example above, you can change your style decisions quickly and easily. But you can also embed extra information in your document that the reader never receives, but which is available for you or for your typesetter!

Consider for example the common convention of introducing new technical terms in italics. You can do this using \textit{}. You can be clever and use \emph{}, so that the words stand out in italicised environments (such as theorems under some styles, for instance). Even better, though, is to define a macro \term{} that applies \emph{} to its argument.

Then if you change your mind about how you wish to mark terms, just change the macro. If you want to construct an index of terms showing where they are defined, put an indexing command in \term{}! Even if you in the end only apply \emph{}, the distinction between italics for real emphasis and for technical terms is preserved in the LaTeX source. (That this is a Good Thing comes down again to document maintenance.)
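A minimal sketch of a \term{} macro that both emphasises and indexes its argument. It assumes the standard makeidx package, and the example sentence is invented.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeindex
% \term marks a technical term at the point where it is defined:
% typeset it emphasised and record it in the index.
\newcommand{\term}[1]{\emph{#1}\index{#1}}
\begin{document}
A \term{macro} is a named piece of markup that expands to other markup.

\printindex % stays empty until makeindex has been run on the .idx file
\end{document}
```

To drop the index again, or to mark terms in bold instead, only the \newcommand line changes; every use of \term{} in the text stays as it is.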

Semantic markup is especially good for technical notation. You can include all the type-related information you need (to keep your notions straight) in your document, while only showing the parts that are important for the example at hand. You can even recycle the same example with different styling definitions for different purposes.
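For the linguistics example used earlier, a sketch might look like this. The macro names \lexeme and \concept are hypothetical, as is the styling chosen for each.

```latex
\documentclass{article}
% Hypothetical semantic macros; restyle a whole class of notation by editing one line.
\newcommand{\lexeme}[1]{``#1''}       % a word under discussion (text mode)
\newcommand{\concept}[1]{\mathrm{#1}} % the abstract meaning of a word (math mode)
\begin{document}
The noun \lexeme{man} denotes $\concept{man}(x)$.
\end{document}
```

If a later example needs the two kinds of object to look more distinct, redefining \concept to use \mathsf{} is enough, and the source still records which is which.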

Writing for maintenance: LaTeX as a programming language

Software engineering emphasises the importance of maintaining a program: far more time is usually spent debugging and altering a piece of software than writing it. In contrast, a document is apparently written once and thereafter altered little, if at all.

Apparently. In fact, you should be spending quality time editing your writing if you want to end up with a polished result. Most people during the writing process make rearrangements to their material, whether large or small, and the result is often the better for it. It’s easy to be put off making major changes by the thought of making sure everything fits together again afterward. Before electronic document production, this problem was much worse: rearranging material meant renumbering sections, figures, footnotes, and even pages. Many people still use word processors this way, for instance writing a table of contents by hand (although any serious modern word processor can automate all of this). In this sense, document maintenance is something to plan for early (if you’re considering making major changes, chances are the original is really not what you wanted, so making changes possible should be a priority!).

However, document production using LaTeX is in another way more similar to writing a program than to typing an essay on a typewriter. TeX, the typesetting engine on which LaTeX is built, is a programming language. Some LaTeX constructions are much more like miniature programs than simple typesetting instructions (anyone who has used the XY-pic system for diagrams will agree). And LaTeX constructions have the capacity to be just as baffling as the worst obfuscated C code you could imagine. This is particularly the case if you’re writing complex mathematical expressions, which TeX is very good at but which can be almost impossible to decipher or alter.

The reason this can become important is simple: publication. If you’re getting published, at least one other person probably has to read some of your LaTeX code. You want to please this person, you want to make them as happy as you can, because they have the power to alter anything you have written and leave your name attached to it, for all the world to see. Less cynically, they will likely have to make some alterations to what you have written, for layout reasons and to apply the conventions of their publishing house. The clearer your intentions are, the more likely the end result is more or less what you wanted. (Remember semantic markup? This is where it really comes into its own.)

Write it how it looks

A general principle that helps the reader enormously is that the code should look as much as possible like the result on the page. For example, take the display-mode math delimiters \[ and \]. You can write the main theorem \[E=m\mathrm{c}^3\] states that, or you can write

  the main theorem
  \[
    E=m\mathrm{c}^3
  \]
  states that

Because LaTeX treats spaces intelligently, you can lay the code out pretty much as you wish. When you later see that you’ve used the wrong exponent, you’ll find it more easily if it’s laid out in the second manner (especially given that searching is sometimes tricky, with markup and variable spacing to contend with).

For the same sorts of reasons, when defining macros you should choose names that are related to the text produced, as far as this is possible (these are generally good semantically-oriented names anyway). As much as possible you want a typesetter to be able to read your document source and make correct guesses about how the output will look, without having to keep in mind the details of how your macros are defined.

There is a certain conflict between this aim and the ideal of semantic markup (the “emph” of \emph{} doesn’t appear in the text, for instance). This is a delicate matter calling for taste and discretion ;-) Your markup shouldn’t overwhelm the actual text, but low-level font switches should occur very infrequently, if at all. The worst cases are things like \mathcal{}, which have long names and will probably be applied very frequently to single letters. Any semantic markup will likely be just as bad (\eventvariable{}) and you’re probably better off moving more towards the as-it-looks end of the spectrum:

   \newcommand{\event}[1]{\mathcal{#1}} % use for event variables (default \mathcal has capitals only)
   % commonly used event vars; capitals also avoid clashing with LaTeX's own \a
   \newcommand{\E}{\event{E}}
   \newcommand{\A}{\event{A}}

Be especially careful with these ultra-short forms, since they’re error-prone and difficult to interpret when reading. If you’re using them as convenience shortcuts for sensible longer forms, use a smarter editor instead.

Indent for clarity

Like most programming languages, LaTeX has a block structure (indicated by {, } and \begin{} and \end{} environment delimiters). Keeping track of this intelligently aids the readability of your code, as well as helping avoid errors. A good text editor should do this automatically.
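A small sketch of what indentation by block structure looks like in practice. The two-space step is just my habit, not a rule.

```latex
\documentclass{article}
\begin{document}
\begin{enumerate}
  \item First point.
  \item Second point, with a nested list:
    \begin{itemize}
      \item a nested item
    \end{itemize}
\end{enumerate}
\end{document}
```

With each \end{} lined up under its \begin{}, a missing one shows up as a step in the left margin that never comes back.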

Commenting

Whenever you define new macros, you should comment extensively. Not what the macros do (unless it’s such hairy TeX code that it’s not obvious) but why. Mnemonic names are good, but realistically macro names should be kept short as well. The comment is for everything that got thrown out from the name. For macros taking arguments, or that should only be used in certain ways, an example usage might help.

  \newcommand{\myFigref}[1]{Figure~\ref{#1}} % eg. \myFigref{fig:3}

  \newcommand{\axiom}[1]{\mathsf{#1}} % named axioms or axiom systems, eg. \axiom{ZFC}
  % some common axioms and systems
  \newcommand{\ZFC}{\axiom{ZFC}}
  \newcommand{\DC}{\axiom{DC}}
  \newcommand{\FOD}{\axiom{FOD}} % first-order-definable, eg. $\FOD(\theta)$

Use tools intelligently

Yes, you can write LaTeX in Microsoft Word, or even Notepad. But you don’t want to. You want a tool that makes it easy, so you can spend more time deciding what to write. My tool of choice is Emacs; your mileage may vary. But there are a few things that your tool should provide.

Syntax highlighting and brace matching. This is really really important. These things have to do with LaTeX as a programming language, so they’re important to get right. You should know where each brace pair starts and finishes without having to count, because that’s a single argument to some LaTeX macro, so if you get it wrong it’s going to show. It might generate an error, which is irritating but actually a good thing, because then you know you have something to fix. Or it might just silently make a three-line section heading for you to catch on proofreading. Likewise you should know where math mode turns on and off just by glancing at your source code. Indenting, as mentioned above, comes into this category as well. Properly indenting code makes certain common errors (forgetting to \end{} an environment, for instance) much less likely.
Completion for macro names. With a good completion mechanism, you don’t have to define short but unreadable versions only for convenience’s sake.
Label, reference and citation checking. The builtin LaTeX mechanisms of course can’t cope if you misspell your label names. A good system warns you when you \ref{} something that doesn’t exist in your document.
Multi-file handling. For large projects, being able to split your file into several pieces has big advantages. Your tool should be aware that these are all parts of the same document (for instance enabling document-wide find-and-replace, reference checking, and so on).

And now a shameless plug: your mileage may vary, but you should really take Emacs for a test-drive. At least, if you’re unfamiliar with Emacs you should try XEmacs, in either case with the AUC-TeX package and RefTeX, which together provide everything mentioned above and more. Where this package really shines is in the treatment of math mode, allowing you to enter expressions like $f \colon \alpha \rightarrow \beta$ as quickly as the (less clear) shortcut forms $f \: \a \ra \b$ which you might otherwise be tempted to define.

Summary

After outright errors, nothing shows up an unprofessional manuscript like a lack of consistent styling. LaTeX does a lot of work to ensure that this is not a problem, but you can still make a hash of it all. (It goes without saying that you should use the mechanisms LaTeX provides, such as \label{} and \ref{}!) Being aware of the need for consistency is the first step towards a professional-looking manuscript.

Style guides can help make many decisions about presentation, but often you’ll be left with a large number of possible approaches. Emphasising clarity and simplicity in presentation will make your document more readable to its audience, while well-chosen semantic markup and layout designed for reading will help you or your typesetter find and fix errors, and leave you free to make stylistic changes.

The facilities LaTeX provides can be used to make this much less of a chore than it appears at first. To reap the full benefits you’ll have to learn some TeX programming, but for basic semantic markup all you need to know is \newcommand{}{}. Combine this with attention to matters of consistency, and a basic knowledge of a few general conventions, and your writings will be the better for it.