A better LaTeX environment


Although I had not needed to write LaTeX for many years, I recently wanted to resurrect some old documents I wrote years ago.

I could find most of the .tex files I wrote, and most of the final .pdf files, but some were missing.

I also noticed, with horror (as I had spent months writing those documents), that I was not able to reproduce the final PDF output.

So I’ve been trying to recover the missing .tex files and recreate the .pdf files I did not save separately.

Revision control systems

The first step was adding everything I had to a revision control system.

Although it is one of the first tools a professional programmer needs to learn, the only "system" I knew when I wrote those notes was making copies of the files I was working on.

Compared to plain copies, a revision control system is much better because

  • it gives the possibility to attach a comment to the saved changes (the commit message)

  • it makes it possible to test different variations (branches)

  • it integrates tools for seeing how and when a file has been changed (history and diffs)

And there are a lot of other advantages, but those three alone would have made it so much easier to track my work and avoid losing important files.
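As a minimal sketch of those three points, assuming git as the revision control system (the text above does not name one) and using a hypothetical notes.tex in a scratch directory, the workflow could look like this:

```shell
#!/bin/sh
# Sketch only: git is an assumption, notes.tex and the branch name are made up.
set -e
workdir=$(mktemp -d)
cd "$workdir"

git init -q .
# Set the identity locally so the example does not depend on global settings.
git config user.name "Example Author"
git config user.email "author@example.com"

printf '%s\n' '\documentclass{article}' '\begin{document}' \
    'Hello.' '\end{document}' > notes.tex
git add notes.tex
# The commit message is the comment attached to the saved changes.
git commit -q -m "Add first draft of the notes"

# Try a variation on a branch without touching the main line of work.
git checkout -q -b alternative-intro
printf '%s\n' '\documentclass{article}' '\begin{document}' \
    'Hello, world.' '\end{document}' > notes.tex
git commit -q -a -m "Try a different opening sentence"

# History and diffs: when and how did the file change?
git --no-pager log --oneline -- notes.tex
git --no-pager diff HEAD~1 HEAD -- notes.tex
```

Any other revision control system offers equivalent commands; the point is that comments, variations, and history come for free.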

Reproducible builds

After I had secured all the documents I could find, I started trying to rebuild them.

At the time I wrote those documents, "reproducible builds" were not as mainstream as they are today (although they were already a thing in 1990!), but being able to reproduce the exact same document is a very useful feature.

pdflatex does not create reproducible documents. The document might look similar, even identical, but some PDF metadata changes every time.

While it is possible to make a diff between the visible part of the document (more on that later), it is much easier to see if two documents are semantically the same if they are binary identical.

Since 2017, it is possible to create reproducible files.

The easiest way is setting the SOURCE_DATE_EPOCH environment variable.
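A minimal sketch of how this could look in a shell follows. The timestamp value is arbitrary (here 2020-01-01 00:00:00 UTC) and filename.tex is a placeholder; whether the result is really byte-identical may still depend on the TeX Live version and on other metadata, so the snippet checks it with cmp instead of assuming it.

```shell
#!/bin/sh
# Pin the timestamp that ends up in the PDF metadata.
# The value is arbitrary; it only has to stay the same between builds.
SOURCE_DATE_EPOCH=1577836800
export SOURCE_DATE_EPOCH

# Build only if pdflatex is installed and the placeholder file exists.
if command -v pdflatex >/dev/null 2>&1 && [ -f filename.tex ]; then
    # Two runs so that the table of contents is settled.
    pdflatex -interaction=nonstopmode filename.tex >/dev/null
    pdflatex -interaction=nonstopmode filename.tex >/dev/null
    cp filename.pdf first-build.pdf

    # Build once more and compare the two results byte by byte.
    pdflatex -interaction=nonstopmode filename.tex >/dev/null
    if cmp -s first-build.pdf filename.pdf; then
        echo "builds are identical"
    else
        echo "builds differ"
    fi
fi
```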

If you are stuck with an older TeX Live distribution, you might need to look at which metadata changes and whether it is possible to control it directly.

Setting /CreationDate and /ModDate explicitly might help, as might the external program faketime.

Automate the build job

I remember that at the beginning I typed pdflatex filename.tex. Then I recalled the command from the shell history, and in the end I settled on writing a simple makefile (which was already more advanced than what most people I knew did at the time), so that I only needed to type make.

The makefile looked like this:

FILE=filename

.PHONY: all clean

all: $(FILE).pdf

# Recipe lines must start with a tab character, not spaces.
$(FILE).pdf: $(FILE).tex
	pdflatex $(FILE).tex
	pdflatex $(FILE).tex

clean:
	rm -f $(FILE).pdf $(FILE).aux $(FILE).log $(FILE).out $(FILE).pdfsync $(FILE).toc

You might have noticed that my target calls pdflatex twice. Why? Because on the first run pdflatex only collects the data needed for the table of contents and writes it to auxiliary files. The table of contents itself appears in the PDF only on the second run.

As pdflatex, for my needs, was fast enough, I simply executed it twice every time.

Only later did I discover that in some scenarios you need to execute pdflatex three or even four times before the final result contains all the data.

This is why I decided to search online for a better solution and found latexmk.

Long story short, instead of writing pdflatex input.tex, use latexmk -pdf input.tex. It will take care of running pdflatex multiple times if necessary.

I have no idea why I had never heard of latexmk; every single source I had found used pdflatex directly for creating PDF documents.
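To illustrate the idea of "run as often as necessary", here is a rough sketch of a rerun-until-stable loop. It is not how latexmk is implemented (latexmk tracks many more files and dependencies); the function name run_until_stable and its arguments are made up for this example, and it only watches a single file such as the .aux file.

```shell
#!/bin/sh
# Rerun a build command until a watched file stops changing.
# Usage: run_until_stable WATCHED_FILE MAX_RUNS COMMAND [ARGS...]
run_until_stable() {
    watched=$1
    max_runs=$2
    shift 2
    runs=0
    previous=""
    while [ "$runs" -lt "$max_runs" ]; do
        # Stop immediately if the build command itself fails.
        "$@" || return 1
        runs=$((runs + 1))
        # A checksum is enough to detect whether the file changed.
        current=$(cksum < "$watched") || return 1
        if [ "$current" = "$previous" ]; then
            echo "stable after $runs runs"
            return 0
        fi
        previous=$current
    done
    echo "still changing after $runs runs" >&2
    return 1
}

# Possible use (filename.tex is a placeholder):
# run_until_stable filename.aux 5 pdflatex -interaction=nonstopmode filename.tex
```

The loop needs at least two runs, because the first one has nothing to compare against. In practice latexmk is the better choice; the sketch only shows why a fixed number of runs in a makefile is a guess.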

Better interaction with errors

The output of pdflatex is long and noisy, and the same holds for latexmk. It is difficult to spot warnings and errors.

I’ve been using -halt-on-error -interaction=nonstopmode as additional parameters, because they make the build behave like most other build tools when an error occurs: the job stops instead of waiting for input.

I have never needed to interactively fix an error.
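To make warnings and errors easier to spot, the log file can be filtered after the build. The following is a small sketch, assuming the usual log conventions where error lines start with "!" and warnings contain the word "Warning"; the function name show_problems and filename.tex are placeholders.

```shell
#!/bin/sh
# Print only the lines of a LaTeX log that look like errors or warnings,
# prefixed with their line number in the log.
show_problems() {
    grep -n -E '^!|Warning' "$1"
}

# Possible use: build quietly, then show only what needs attention.
if command -v pdflatex >/dev/null 2>&1 && [ -f filename.tex ]; then
    pdflatex -halt-on-error -interaction=nonstopmode filename.tex >/dev/null
    show_problems filename.log
fi
```

The filter is deliberately crude: multi-line messages are cut off after the first line, so the full log is still the reference when something is unclear.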

Automatically build while writing

My workflow used to be writing some text and compiling it after a while. I then checked whether there were any errors and, from time to time (more frequently when there were non-trivial formulas), looked at the generated output.

It worked well enough that I did not bother finding a better way to write my documents.

While trying to recover the source code and recompile the missing documents, I came to appreciate automatic rebuilds a lot.

Thanks to entr, it is easy to build the document on file save. As entr is not TeX specific, it can be used as easily when editing other types of documents, like AsciiDoc or Markdown.

cd <dir with tex source code>
ls | entr latexmk -pdf -halt-on-error -interaction=nonstopmode "file.tex"

diffpdf

This program has been extremely useful for recreating my documents.

As some documents were many pages long, comparing them by eye at every iteration would have been an error-prone and time-consuming job. Thanks to diffpdf I was able to execute latexmk and glance at what was different or missing compared to my previous version.

It might be less useful while writing a new document, but it can come in handy when comparing two revisions, especially after graphical adjustments, for example to compare how a formula is rendered before and after a change.
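A small wrapper can combine the binary comparison with the visual one. This is a sketch under the assumption that diffpdf accepts the two files to compare as command-line arguments; the function name compare_pdfs and the file names are made up.

```shell
#!/bin/sh
# Compare two PDF files: first byte by byte, then visually if they differ.
compare_pdfs() {
    if cmp -s "$1" "$2"; then
        # Binary identical, nothing else to check.
        echo "identical"
    elif command -v diffpdf >/dev/null 2>&1; then
        # Open the graphical comparison only when it is needed.
        diffpdf "$1" "$2"
    else
        echo "different"
    fi
}

# Possible use:
# compare_pdfs old/filename.pdf filename.pdf
```

With reproducible builds the first branch is the common case, and the graphical tool is only opened when something really changed.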

How to structure documents

To avoid working on big files, the content is split over multiple files, each one containing a chapter.

This makes it easier to structure the document, and not get lost in a gigantic file.

Also, macros, definitions, and other reusable components should go in a separate file. Ideally, those can be shared as-is between different documents.

One sentence per line. In most programming languages it is common practice to put one statement per line; similar considerations hold when writing text. It makes it easier to read the content, and handle diffs and merges, especially when working from multiple devices or with someone else.

As the generated document is not affected by how many files or lines the content is spread over, there is no reason to compromise on the readability and practicality of the source structure.

Beyond that, the recommended practices for Asciidoctor contain some useful guidelines.

