The PHP logo, by Colin Viebrock, licensed under CC BY-SA 4.0

PHP as static site generator


9 - 11 minutes read, 2193 words
Categories: web
Keywords: asciidoc asciidoctor css hugo javascript php pygments web

I’ve used Hugo for organizing my notes online, and I have been pretty happy with it.

Before Hugo, I used some CMS platforms like wordpress.com, but they had some major disadvantages that always bothered me:

  • I needed to write the content online, with little to no possibility of testing it offline (short of maintaining a second website)

  • I had little choice in what format to use. Back in the day, for example, I would have preferred to write in LaTeX or directly in HTML instead of using the provided visual editor.

  • My content was saved in an online database instead of plain text files that I could simply copy to my systems

Hugo, like most other static site generators, overcomes those disadvantages:

  • the website is created offline and can be tested with a simple local server (installing and maintaining WordPress is not as easy)

  • content is written mainly in markdown or asciidoc. Multiple formats are supported, and since they are all mainly textual, they do not need a dedicated WYSIWYG editor.

Before choosing Hugo, I tried to compare different static site generators. I do not remember which ones I compared, but I do remember why I preferred Hugo over similar programs:

  • little to no dependencies

  • fat binary

I’m very cautious before deciding to use programs that are not packaged officially. Yes, it is a high bar for many programs, but as an end-user I’m more confident (of course there is no guarantee) that a packaged program will work with the rest of the system, that it is somewhat trustworthy, and that I can manage it like all other programs.

As Hugo had not reached version 1.0, having a single executable was actually a big plus. I want to store my content online easily; I do not want to learn a new framework for creating websites. So if Hugo made some major changes, I could simply keep using the old binary, at least until I had the time (and nothing better to do) to check how the newer versions work and switch to them.

And this is what I did until now.

Hugo has a couple of (for me) big downsides. The first one is that it needs a theme to work, which means there is a non-trivial dependency.

The second one is that Hugo requires a piece of YAML inside asciidoc files for adding metadata. This makes the files no longer valid asciidoc, so they are not recognized anymore by some editors (for example for syntax highlighting or previews while writing) or by asciidoctor. It is an unfortunate choice, as asciidoc already has a way to add this information.
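In asciidoc, that information lives in the document header: a `= Title` line followed by attribute entries such as `:description:` or `:keywords:`, ending at the first blank line. A minimal PHP sketch of reading it, assuming such a simple header (`parse_adoc_header` is a hypothetical name; author and revision lines, line continuations, and attribute references are deliberately ignored):

```php
<?php
// Hypothetical helper: extract the title and attribute entries from a
// simple asciidoc header, e.g.
//   = PHP as static site generator
//   :keywords: asciidoc, php
// Author/revision lines, line continuations and {attribute} references
// are NOT handled; this is a sketch, not a full asciidoc parser.
function parse_adoc_header(string $adoc): array
{
    $meta = ['title' => null, 'attributes' => []];
    foreach (preg_split('/\R/', $adoc) as $line) {
        if (trim($line) === '') {
            break; // the header ends at the first empty line
        }
        if ($meta['title'] === null && preg_match('/^=\s+(.+)$/', $line, $m)) {
            $meta['title'] = trim($m[1]); // level-0 title: "= Title"
        } elseif (preg_match('/^:([\w-]+):\s*(.*)$/', $line, $m)) {
            $meta['attributes'][$m[1]] = trim($m[2]); // ":name: value"
        }
    }
    return $meta;
}
```

The keywords can then be split with `array_map('trim', explode(',', $meta['attributes']['keywords']))`, and the file stays plain asciidoc that editors and asciidoctor understand.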

And another thing I did not like is that it encourages the use of javascript for colorizing code snippets.

As for the theme, I could not find any decent documentation on how to create one.

I’ve used (and adapted) the hugo-bootstrap theme, which works well and looks simple. But under the hood, it is pretty bloated: the assets folder of the theme is more than 180MB in size! Considering that most of my content is textual, and that the total size of what I wrote (comments excluded) and uploaded is below 10MB, it feels like there is room for improvement.

Is this actually a problem? As long as no-one downloads the content, it’s just some data on the drive.

It turns out that most of the space is taken by a js subfolder, and 130MB of that subfolder are fonts.

I have no idea (I could have checked, but apparently I was not that interested…) how many of those files were needed and downloaded every time I visited my website. I am pretty sure that I do not need any special fonts: most of the time I surf the web with javascript disabled, and the content of my website is still accessible.

The second thing I did not like about the theme is that with javascript disabled you feel like a second-class citizen: even if all the content is still available, some icons (because of Font Awesome) only work with javascript enabled. Again, I did not like it, but I decided to stick with it.

Then a new version of Hugo was released, and the theme I was using no longer worked with it. I checked whether there was a new version of the theme, but there was none. I also checked whether there was finally a default theme, but no, the issue is still unresolved.

I could have tried updating the theme myself, but that meant learning how Hugo works internally. If I had enough time I would gladly do it, as learning new things (or how the tools I use work) is most of the time a good exercise, but I’m already struggling to write my thoughts down in a presentable way.

I could have chosen another theme, tested it, removed the old one, and updated Hugo. Unfortunately, most themes are not tested and/or updated regularly (at least those I checked), and many made layout decisions I did not like. Also, even if I did find a new theme that suited me, a future update of Hugo could break it again, so it would only be a short-term solution.

Of course one could blame the theme authors, as the theme should get updated, or Hugo, as it should be more stable or provide a default or example theme to test against. Surely it is also my fault, as it is clearly stated that Hugo has not reached version 1.0 yet.

So I decided to back up my fat binary and continue using it.

And so I did, even if from time to time I thought about how I could stop using the old version of Hugo.

I even thought about writing a static site generator from scratch, but for a very short amount of time: I do not want to spend too much time on it.

And then it struck me: what if I used PHP for statically generating the website?

Hugo uses Go’s template engine, which seems like a great idea. But with PHP it is possible to do much more: HTML and PHP can be interleaved in the same document, giving me the same features and more (like syntax highlighting, IDE support, …) for free.
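A minimal illustration of that interleaving, assuming an article is represented as an array with hypothetical `title` and `keywords` keys: everything outside the PHP tags is literal HTML, the short echo tags print escaped values, and output buffering captures the result as a string instead of sending it to a browser.

```php
<?php
// Render one article as an HTML string (hypothetical array layout:
// 'title' => string, 'keywords' => list of strings).
// Outside the PHP tags everything is literal HTML; short echo tags
// print values, escaped with htmlspecialchars().
function render_article(array $article): string
{
    ob_start(); // capture the output instead of sending it
    ?>
<article>
  <h1><?= htmlspecialchars($article['title']) ?></h1>
  <ul class="keywords">
<?php foreach ($article['keywords'] as $keyword): ?>
    <li><?= htmlspecialchars($keyword) ?></li>
<?php endforeach; ?>
  </ul>
</article>
<?php
    return ob_get_clean(); // stop buffering and return the captured HTML
}
```

Because the template is ordinary PHP, an editor or IDE highlights and checks both the HTML and the PHP parts without any extra plugin.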

The "classic" usage of PHP is to have an instance running on the server that generates the resulting page and sends it to the end-user. With this approach, pages are still created dynamically on the server, even though, in my case, they could just as well be served to the user as static pages.
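Running a PHP template once, from the command line, turns PHP into a static generator: the output is captured and written to a file instead of being sent to a browser. A hypothetical sketch of such a build step (`build_page` and the `public/` output directory are assumptions, not this site’s actual layout), where a template is an ordinary PHP file and its variables are passed in an array:

```php
<?php
// Hypothetical build step: render one PHP template to a static HTML file.
// Keys of $vars become local variables inside the template (e.g. $title).
function build_page(string $template, array $vars, string $outFile)
{
    // Render in an isolated scope so the template only sees $vars.
    $render = static function (string $__template, array $__vars): string {
        extract($__vars, EXTR_SKIP); // never overwrite $__template/$__vars
        ob_start();
        try {
            include $__template;
            return ob_get_clean();
        } catch (Throwable $e) {
            ob_end_clean(); // discard partial output on error
            throw $e;
        }
    };
    $html = $render($template, $vars);

    $dir = dirname($outFile);
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        throw new RuntimeException("cannot create $dir");
    }
    if (file_put_contents($outFile, $html) === false) {
        throw new RuntimeException("cannot write $outFile");
    }
}
```

A build script would call this once per page; the resulting directory can then be previewed with PHP’s built-in web server, for example `php -S localhost:8000 -t public`.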

Generating webpages dynamically has some disadvantages. For one, it makes testing more difficult: I cannot, for example, list all HTML pages on my drive and validate them. I need a running PHP instance that serves them, and then feed the validator with the result. Granted, creating such a server is not difficult (and in the end, I still needed it for other reasons), and there are HTML validators that accept URLs and not only stdin or files.

Also, generating pages dynamically makes the server, by definition, susceptible to certain types of errors. Suppose that internal links to other articles are generated, for example, by concatenating some strings. Testing that all links that could be generated are valid is normally not possible: one would need to execute every code path that can generate a link and then validate the result. If the pages are generated statically, then it’s trivial to see whether there was any error: either all (internal) links in all HTML files are correct, or some are not.
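That check can be automated. A rough sketch, assuming the generated site lives in a single directory and directory links resolve to `index.html`: it parses every generated HTML file with PHP’s DOM extension and reports internal links whose target file does not exist. External and fragment-only links are skipped, and fragment targets themselves are not verified. `find_broken_links` is a hypothetical name.

```php
<?php
// Hypothetical link checker: report internal links in generated HTML files
// that do not resolve to a file under $root. Links with a scheme
// (https:, mailto:, ...) or starting with // count as external and are
// skipped, as are fragment-only links. Directory links resolve to index.html.
function find_broken_links(string $root): array
{
    $previous = libxml_use_internal_errors(true); // HTML5 tags make libxml warn
    $broken = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'html') {
            continue;
        }
        $html = file_get_contents($file->getPathname());
        if ($html === false || $html === '') {
            continue; // loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if ($href === '' || $href[0] === '#'
                || preg_match('~^([a-z][a-z0-9+.-]*:|//)~i', $href)) {
                continue;
            }
            // Keep only the path part (drop query and fragment), decode %XX.
            $path = parse_url($href, PHP_URL_PATH);
            if (!is_string($path) || $path === '') {
                continue;
            }
            $path = rawurldecode($path);
            $target = $path[0] === '/'
                ? $root . $path                      // site-absolute link
                : $file->getPath() . '/' . $path;    // relative to this file
            if (is_dir($target)) {
                $target = rtrim($target, '/') . '/index.html';
            }
            if (!is_file($target)) {
                $broken[] = $file->getPathname() . ' -> ' . $href;
            }
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $broken;
}
```

An empty result only says that the links present in the current output resolve, which is exactly the guarantee described here.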

This does not mean, of course, that there cannot be an implementation error that creates an invalid link. I just know that, with the given content, the generated links are not broken. If there is an error that could cause a broken link, it is not relevant (yet).

Of course, there are other types of errors, like a link to a wrong page, that are also hard to detect with static HTML pages.

Still, deciding to leave my current setup for something else could easily turn into a time-sink, as I had to:

  • write and test HTML and CSS, without forgetting to test on both desktop and mobile

  • rewrite/update some parts of the articles: for example, even though I wrote them in asciidoc, Hugo uses YAML for storing metadata, which makes all those files non-conforming

  • check how to transform an asciidoc document to HTML so that it can be embedded in another page

  • define a "build system". It should not only handle PHP files, but also asciidoc, images, CSS, and probably something else.

  • parse and transform metadata, for example, title, date, description, keywords, etc.

  • ensure that most links are not invalidated. Either reuse the same structure or add a redirection.

  • ensure pages can be tested locally

  • generate a sitemap, an RSS feed, …
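The sitemap is a good example of a task that becomes easy once the metadata is available. A minimal sketch following the sitemaps.org 0.9 format, assuming each page is described by a hypothetical array with a site-relative `path` and an optional `date` already in `YYYY-MM-DD` form (`build_sitemap` is also a hypothetical name):

```php
<?php
// Minimal sitemap generator following the sitemaps.org protocol (0.9).
// Each page is a hypothetical array: ['path' => '/posts/x/', 'date' => 'YYYY-MM-DD'],
// where 'date' is optional. All values are escaped for XML.
function build_sitemap(string $baseUrl, array $pages): string
{
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($pages as $page) {
        // Absolute URL = base URL without trailing slash + site-relative path.
        $xml .= '  <url><loc>' . $esc(rtrim($baseUrl, '/') . $page['path']) . '</loc>';
        if (isset($page['date'])) {
            $xml .= '<lastmod>' . $esc($page['date']) . '</lastmod>';
        }
        $xml .= "</url>\n";
    }
    return $xml . "</urlset>\n";
}
```

The result can be written next to the generated pages, e.g. `file_put_contents('public/sitemap.xml', build_sitemap('https://example.com', $pages));`, with `https://example.com` standing in for the real base URL.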

As I do not have particular needs (as long as I find the content readable and it is fast to load), most tasks were easy to implement. Unfortunately, others were more complex than expected. I also noticed how many things I took for granted, and asked myself whether I really wanted to go down this route. The to-do list I had to complete before I could consider dropping Hugo for my local solution is at least four times longer than the one above, and it kept growing as I noticed that this or that feature was still missing (for example word count, a page listing the categories, …). Also, many things I needed to test were open questions I had about the theme I was using.

So, all in all, writing my own static site generator has been an interesting and revealing exercise. After realizing how much Hugo did for me, I’m happy that it did not take that long to write something that I think I can use and maintain. It has also been interesting because normally I do not care that much about how much time I’m going to spend fixing an issue, no matter how small it is. Most of the time I do not consider a task done until all known issues are fixed, and once I reach that point, I try to make sure there are no unknown issues.

This time, I also wore the hat of a "product manager", and needed to tell myself: "ok, yes, it’s a bug, but it can wait, I need an MVP to replace Hugo and then add the missing, but less important, features".

It has not been easy, but I micromanaged it :-) and made a second (and then a third and fourth) roadmap.

My static site generator has a lot of open issues that I know of. I did not fix them all: the most important thing was to create an MVP.

Also, because I just needed it to play well with the content I’ve written, I was able to take a lot of shortcuts (gasp).

For example, I did not always encode URLs correctly. Most of the time it was not an issue, as there were no whitespaces or special characters, until I added c# and #pragma as keywords. At that point, I had to reorganize some code. Similar issues might still be lurking around, but the important thing is that none of the generated links are broken, which means that the underlying implementation is good enough (for now). If some future link is broken, I will search for all places where links are added, or find a better abstraction for dealing with this problem once and for all.
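The root of this kind of bug is that the same keyword ends up in two places with different rules: as a directory name on disk, where `c#` is fine, and inside an `href`, where an unencoded `#` starts a fragment, so `/keywords/c#/` silently points to `/keywords/c`. A minimal sketch of keeping each rule in one helper, assuming a hypothetical `/keywords/<name>/` layout (all three function names are illustrative); `rawurlencode` percent-encodes `#` as `%23`:

```php
<?php
// Hypothetical URL layout: /keywords/<encoded keyword>/
// rawurlencode() encodes per RFC 3986: '#' -> %23, ' ' -> %20, '+' -> %2B.
function keyword_url(string $keyword): string
{
    return '/keywords/' . rawurlencode($keyword) . '/';
}

// On disk the keyword keeps its raw form, e.g. public/keywords/c#/index.html
// (assumes keywords never contain '/').
function keyword_dir(string $outDir, string $keyword): string
{
    return $outDir . '/keywords/' . $keyword;
}

// Inside HTML, the URL and the label still need attribute/text escaping.
function keyword_link(string $keyword): string
{
    return '<a href="' . htmlspecialchars(keyword_url($keyword), ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($keyword, ENT_QUOTES, 'UTF-8') . '</a>';
}
```

With every link going through `keyword_url`, adding a keyword like `#pragma` no longer requires touching the places that emit links.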

Currently, my strict dependencies are PHP (at least version 7, but the code can be adapted to older versions), pygmentize, asciidoctor, sed, and other common CLI utilities.
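Calling these tools from PHP only takes a small wrapper around `proc_open`. A sketch, assuming the tools are on the `PATH`: `asciidoctor --no-header-footer` produces an HTML fragment without `<html>`/`<head>` that can be embedded in a page template, and `pygmentize -f html` produces highlighted markup on the server, whose CSS can be generated once with `pygmentize -S default -f html`. The function names are hypothetical.

```php
<?php
// Hypothetical helper: pipe $input through an external command and return
// its stdout; throws if the command exits with a non-zero status.
// Writing all input before reading is fine for filters that read everything
// first (asciidoctor, pygmentize); huge streaming outputs would need
// non-blocking I/O to avoid filling the pipe buffers.
function run_filter(array $argv, string $input): string
{
    $cmd = implode(' ', array_map('escapeshellarg', $argv));
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("cannot start: $cmd");
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signal end of input
    $out = stream_get_contents($pipes[1]);
    $err = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $status = proc_close($proc);
    if ($status !== 0) {
        throw new RuntimeException("$cmd failed ($status): $err");
    }
    return $out;
}

// Asciidoc source (read from stdin via "-") to an embeddable HTML fragment.
function asciidoc_fragment(string $adoc): string
{
    return run_filter(['asciidoctor', '--no-header-footer', '--out-file', '-', '-'], $adoc);
}

// Server-side highlighting of a snippet, no javascript needed in the browser.
function highlight(string $code, string $lexer): string
{
    return run_filter(['pygmentize', '-l', $lexer, '-f', 'html'], $code);
}
```

Keeping the external tools behind one small function also makes them easy to swap, should one of them ever become unavailable.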

So I have more dependencies, but compared with what comes, for example, from npm, I consider them stable, as they are packaged for most distributions. And if for some reason they are not available, it is easy to replace them (except for PHP and asciidoctor, because of their central role).

The overhead got reduced from approximately 180MB to less than 1MB. There is no javascript (as of now), and the CSS file is less than 5KB. For every article, the generated HTML is generally about twice the size of the raw source: if an article is 30KB, the resulting HTML page is approximately 60KB (excluding images, CSS, and other referenced resources). Also, adding and changing things is easier, both because I have no issues with backward compatibility and because the code base is very small.

Turning what I’ve written into a real framework would not be a good idea. I already had to implement more Hugo features that I had not thought of at the beginning than I would like to admit.

A better alternative would be to provide a library, or some tooling, to automate common tasks, and let the user decide how and when to use it. This would keep the code as small and specialized as possible, while a framework, generally speaking, inevitably needs to add more and more features.