GCC: A Templating Engine, if you ask it nicely
Epsilon-1 Gamma on 24/11/17
PREFACE
This is a rewrite of an older article of mine, with improved wording and formatting for the Eta-based templating system that has since replaced these GCC shenanigans, as I needed more advanced features. The information is still valid, however, and interesting enough to warrant keeping up.
BACKGROUND
In my various website writing endeavours I inevitably reached a point where manual replication of HTML on every single file got unwieldy and annoying. Really it was a small miracle it took as long as it did, but regardless, I needed a templating system now. Of course, a well-adjusted web developer would choose a proper templating engine such as Jinja. However, I am not a well-adjusted web developer, and decided that such a solution is overkill for my rather limited templating needs.
So, rather than learning a new, overkill solution, I set out to solve the problem on my own. Looking closer at the problem, a templating engine on the scale I needed one is little more than an automatic copy-paste engine. Which of course led me to think of the C preprocessor, which essentially is a templating engine for C, at least when it comes to the #include directive.
Conveniently enough, PSR003, the server this site is running on, already has GCC installed like most Linux machines, and GCC happens to come with a C preprocessor. At that point the choice was rather obvious.
IMPLEMENTATION
So, off to do templating in the C preprocessor. To start off, we need some test files. Assume that we are working with the following HTML files for this experiment:
A.html:
<body>
<p>Lorem Ipsum</p>
</body>
B.html:
<h1>hi i'm a title</h1>
The simplest thing to try was of course a plain #include without anything extra. It is also worth noting, for those unaware, that C preprocessor directives should be at the beginning of lines. This attempt yielded the following file:
A.html:
<body>
#include "B.html"
<p>Lorem Ipsum</p>
</body>
Upon attempting to compile A.html as if it were a C file, GCC of course spat out an error, as I really should have expected. According to GCC's output it failed to recognise the file format, and the linker also returned code 1. Makes sense, I did just try to cram HTML of all things into it after all.
The next thing to try, after taking a look at GCC's documentation, appeared to be the -E flag. This is described as "Preprocess only; do not compile, assemble, or link." This seemed like exactly what was needed; however, when I tried it out, GCC spat out... a linker error? But the linker wasn't supposed to run in the first place? Weird.
Turns out, the solution is also weird: if the input file is specifically piped into GCC from stdin, GCC doesn't bother invoking a linker. Absolutely baffling behaviour, but oh well, I could work with this.
Now that GCC was finally vaguely cooperating, I ran gcc -E -o result.html - < A.html, which did indeed produce a result:
result.html:
# 0 "<stdin>"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "<stdin>"
<body>
<p>Lorem Ipsum</p>
</body>
This was clearly progress in the right direction: the files were glued together as they should be. That left cleaning up the junk the preprocessor leaves behind, namely line markers. This would be trivial to handle with sed or a similar tool; however, by this point I was committed to doing as much as possible with GCC alone. Luckily enough, GCC has the -P flag for suppressing them, making this an easy fix.
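For comparison, the sed route might look something like the sketch below. It is not what this article ends up using, strip_linemarkers is a made-up helper name, and the pattern assumes line markers always have the shape seen in the output above: a hash, a space, a line number, a quoted file name, and optional numeric flags.

```shell
#!/bin/sh
# Sketch: remove GCC line markers with sed instead of using -P.
# strip_linemarkers is a hypothetical helper, not part of GCC.
strip_linemarkers() {
    # Delete lines shaped like:  # 1 "/usr/include/stdc-predef.h" 1 3 4
    sed -E '/^# [0-9]+ "[^"]*"( [0-9]+)*$/d'
}

# Demo on a few lines shaped like the -E output shown earlier.
printf '%s\n' \
    '# 0 "<stdin>"' \
    '# 1 "/usr/include/stdc-predef.h" 1 3 4' \
    '<body>' \
    '</body>' | strip_linemarkers
# prints the two HTML lines only
```

In a real run, the output of gcc -E would be piped through strip_linemarkers. The obvious downside is that an HTML line that happens to look like a line marker would be deleted too, which is one more reason to prefer -P.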
Another problem, invisible with these minimal test files, appears when using, for example, <pre> tags for ASCII art: GCC likes to strip whitespace, which breaks the formatting. Luckily this is another single-flag fix, as GCC provides --traditional-cpp, which attempts to emulate legacy pre-standardisation preprocessors. It also turns out that working with non-C languages is actually the flag's intended purpose according to GNU. I do suspect that HTML wasn't among the expected languages when designing it, though.
Anyhow, this left me with gcc -P -E --traditional-cpp -o result.html - < A.html as the final command. Running this produces exactly what I wanted:
result.html:
<body>
<p>Lorem Ipsum</p>
</body>
I will note, however, that at least on some of my systems, GCC likes to add a pile of additional blank lines at the top of the file. These are a non-issue as far as HTML goes, and removing them is left as an exercise to the reader, as this is a rather trivial task with, for example, sed. Or you can just copy what I did from this example script.
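One way to do that cleanup is a classic sed one-liner. This is a sketch and not necessarily what the linked example script does; strip_leading_blank_lines is a made-up helper name, and it only treats completely empty lines as blank.

```shell
#!/bin/sh
# Sketch: drop the empty lines GCC may leave at the top of its output.
# strip_leading_blank_lines is a hypothetical helper name.
strip_leading_blank_lines() {
    # Keep everything from the first non-empty line to the end of input,
    # delete whatever comes before it.
    sed '/./,$!d'
}

# Demo: three empty lines, then some HTML with an empty line inside.
printf '\n\n\n<body>\n\n</body>\n' | strip_leading_blank_lines
# prints <body>, an empty line, then </body>
```

Blank lines further down the file are left alone, so the spacing inside the page stays as written.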