GCC: A Templating Engine, if you ask it nicely

Epsilon-1 Gamma on 24/11/17

PREFACE

This is a rewrite of an older article of mine, with improved wording and with formatting adapted to the Eta-based templating system that has since replaced these GCC shenanigans, as I needed more advanced features. The information is still valid, however, and interesting enough to warrant keeping up.

BACKGROUND

In my various website writing endeavours I inevitably reached a point where manual replication of HTML on every single file got unwieldy and annoying. Really it was a small miracle it took as long as it did, but regardless, I needed a templating system now. Of course, a well-adjusted web developer would choose a proper templating engine such as Jinja. However, I am not a well-adjusted web developer, and decided that such a solution is overkill for my rather limited templating needs.
So, rather than learning a new, overkill solution, I set out to solve the problem on my own. Looking closer at the problem, a templating engine on the scale I needed is little more than an automatic copy-paste engine. That of course led me to think of the C preprocessor, which is essentially a templating engine for C, at least when it comes to the #include directive.
Conveniently enough, PSR003, the server this site is running on, already has GCC installed like most Linux machines, and GCC happens to come with a C preprocessor. At that point the choice was rather obvious.

IMPLEMENTATION

So, off to do templating in the C preprocessor. To start off, we need some test files. Assume that we are working with the following HTML files for this experiment:
```html
<!-- A.html -->
<body>
    <!-- !!insert B.html here!! -->
    <p>Lorem Ipsum</p>
</body>
```

```html
<!-- B.html -->
<h1>hi i'm a title</h1>
```

The simplest thing to try was of course a plain #include without anything extra. For those unaware, it is also worth noting that C preprocessor directives should start at the very beginning of a line, with the # in the first column. This attempt yielded the following file:
```html
<!-- A.html -->
<body>
#include "B.html"
    <p>Lorem Ipsum</p>
</body>
```

Upon attempting to compile A.html as if it were a C file, GCC of course spat out an error, as I really should have expected. According to its output, it failed to recognise the file format, and the linker also returned code 1. Makes sense, I did just try to cram HTML of all things into it after all.
The next thing to try, after a look at GCC's documentation, appeared to be the -E flag, described as "Preprocess only; do not compile, assemble, or link." This seemed like exactly what was needed; however, when I tried it out, GCC spat out... a linker error? But the linker wasn't supposed to run in the first place? Weird.
Turns out, the solution is also weird: if the input file is specifically piped into GCC from stdin, GCC doesn't bother invoking a linker. Absolutely baffling behaviour, but oh well, I could work with this.
Now that GCC was finally vaguely cooperating, I ran gcc -E -o result.html - < A.html, which did indeed produce a result:
```html
# 0 "<stdin>"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "<stdin>"
<!-- A.html -->
<body>
# 1 "B.html" 1
<!-- B.html -->
<h1>hi i'm a title</h1>
# 4 "<stdin>" 2
    <p>Lorem Ipsum</p>
</body>
```

This was clearly progress in the right direction: the files were glued together as they should be. That left cleaning up the junk the preprocessor leaves behind, namely the line markers (the lines starting with #). This would be trivial to handle with sed or a similar tool, but by this point I was committed to doing as much as possible with GCC alone. Luckily enough, GCC has the -P flag for suppressing them, making this an easy fix.
Another problem is not visible with these minimal test files, but appears when, for example, using <pre> tags for ASCII art: GCC likes to collapse whitespace, breaking the formatting. Luckily this is another single-flag fix, as GCC provides --traditional-cpp, which attempts to emulate legacy pre-standardisation preprocessors. It also turns out that working with non-C languages is the flag's intended purpose according to GNU. I do suspect that HTML wasn't among the expected languages when designing it, though.
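To make the whitespace problem concrete, here is a minimal sketch that feeds one line with a deliberately wide gap through the preprocessor twice, once in standard mode and once in traditional mode. The sample line and variable names are made up for illustration, and -traditional-cpp is the single-dash spelling of the same flag.

```shell
# One line with a wide gap, standing in for ASCII art inside a <pre> block.
sample='<pre>o      o</pre>'

# Standard mode: runs of whitespace between tokens get collapsed.
standard=$(printf '%s\n' "$sample" | gcc -E -P - 2>/dev/null)

# Traditional mode: the line should come through with its spacing intact.
traditional=$(printf '%s\n' "$sample" | gcc -E -P -traditional-cpp - 2>/dev/null)

printf 'standard:    %s\n' "$standard"
printf 'traditional: %s\n' "$traditional"
```

On the GCC versions I am aware of, only the traditional line keeps the six spaces; the exact blank lines around the output may vary between systems.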
Anyhow, this left me with gcc -P -E --traditional-cpp -o result.html - < A.html as the final command. Running this produces exactly what I wanted:
```html
<!-- A.html -->
<body>
<!-- B.html -->
<h1>hi i'm a title</h1>
    <p>Lorem Ipsum</p>
</body>
```

I will note, however, that at least on some of my systems GCC likes to add a pile of additional blank lines at the top of the file. These are a non-issue as far as HTML goes, and removing them is left as an exercise for the reader, as it is a rather trivial task with, for example, sed. Or you can just copy what I did from this example script.
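For the impatient, one possible sed approach is sketched below. It is not necessarily what the example script does; the helper name strip_leading_blank_lines is hypothetical, and the input is simulated with printf rather than taken from a real GCC run.

```shell
# Hypothetical helper: delete every line before the first non-empty one.
# The address range /./,$ covers the first line containing any character
# through the end of input; !d deletes everything outside that range.
strip_leading_blank_lines() {
    sed '/./,$!d'
}

# Simulated preprocessor output: three blank lines, then the page.
cleaned=$(printf '\n\n\n<!-- A.html -->\n<body>\n\n</body>\n' | strip_leading_blank_lines)
printf '%s\n' "$cleaned"
```

Blank lines inside the document are left alone, and a line containing only spaces counts as non-empty here. Chained onto the final command, this would look something like gcc -P -E --traditional-cpp - < A.html | strip_leading_blank_lines > result.html.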