Peter Krautzberger · on the web

(Print) "Algorithm Layout" in HTML and CSS

July 2022. I started this piece well over three years ago (some time before June 2019 in any case). As it came along grew out of proportion, I had tried turning it into three pieces but never got far enough to start publishing it. Then I tried to recombine them (some time in 2020?) in the hopes of finally pushing them out but didn't finish the last few pieces. Then 2021 was worse than 2020 and now it's 2022. So I'm thinking I will make this a new kind of piece around here: a partially complete piece of writing that may (or more likely: may not) get edits and updates. Let's jump in then.

A while ago I started to think about "algorithm layout". Not syntax highlighting of source code (which prism.js handles nicely) but the kind of print traditions captured in the following image.

Euclid's algorithm typeset using LaTeX's algorithmicx package

Part 1. Prior art and the problem of content conversion

If you're familiar with LaTeX packages such as algorithmicx or algorithm2e you'll recognize this particular style of layout. As the names of those packages imply, this type of content is more about the abstract algorithms and less about implementation details in a specific language.

Realizing this specific style on the web is an interesting challenge. On the one hand, it's a problem of realizing print layout traditions on the web, an area I deal with professionally on a daily basis. On the other hand, there's an authoring problem behind this: realizing print layout is one thing, realizing it in a way that allows for a good (web!) reading and authoring experience is another.

When I started this side project, I had only had a quick look at some tools that solve this problem and before I wanted to write about what I did, I thought I should take another, closer look at them to provide context (and check my own work). I'm glad I didn't look too closely earlier since that may have sent me off in a completely different direction. It also reminded me of the pitfalls of authoring, in particular print content and conversion.

Prior art 1: algorithmicx

To pick a starting point, let's use a classic in every sense of the word: Euclid's algorithm.

Here's what you'd write using the algorithmicx LaTeX package.

\begin{algorithm}
\caption{Euclid's algorithm}\label{euclid}
\begin{algorithmic}[1]
\Procedure{Euclid}{$a,b$}\Comment{The g.c.d. of a and b}
\State $r\gets a\bmod b$
\While{$r\not=0$}\Comment{We have the answer if r is 0}
\State $a\gets b$
\State $b\gets r$
\State $r\gets a\bmod b$
\EndWhile\label{euclidendwhile}
\State \textbf{return} $b$\Comment{The gcd is b}
\EndProcedure
\end{algorithmic}
\end{algorithm}

and you'd get the image from earlier. It's fun to note that this source is straight from the package documentation, including inconsistencies (such as a mismatch of math mode in states and comments).

Let's naively HTML-ify this:

<algorithm>
<caption id="euclid">Euclid's algorithm</caption>
<algorithmic numberLineSkip="1">
<procedure>
<label>Euclid</label>
<start>a,b</start><comment>The g.c.d. of a and b</comment>
<state>r ← a mod b</state>
<while>
<start>r ≠ 0</start> <comment>We have the answer if r is 0</comment>
<state>a ← b</state>
<state>b ← r</state>
<state>r ← a mod b</state>
</while>
<state> <strong>return</strong> b</state> <comment>The gcd is b</comment>
</procedure>
</algorithmic>
</algorithm>

That seems fairly straight forward even if the label "euclidendwhile" has been lost.

I'd like to focus on two problems

Or put differently: how to move from content conversion to content authoring.

Pior art 2: latexml

While I personally wouldn't recommend it, latexml is an interesting option to consider because it is so heavily focuses on extracting semantic structure from print documents.

Here's a codepen embedding of what you'd get from it:

See the Pen Algorithms via LaTeXML by Peter Krautzberger (@pkra) on CodePen.

So to me this is a terrible mess of markup and layout but it's not surprising given what latexml focuses on. When you get into the business of generating HTML and CSS from things that are fairly far removed from HTML and CSS (here: long form print documents) you are bound to create such a mess of a markup. That's not a criticism; I think it's simply inevitable.

After all, an author who uses latexml was not interested in authoring web content. Instead, they authored print content using LaTeX and then used a second piece of software (filled to the brim with heuristics) to generate a close approximation of the print product in a custom XML. In a third step, they created HTML and CSS from that.

Nevertheless, there is a lot of noise in this which may make it hard to explore the example. Here's another codepen embedding with a more minimal version of the latexml output that has (hopefully) less noise and still exhibits the inherent issues I see with this approach towards authoring of web content.

See the Pen Algorithms via LaTeXML, reduced by Peter Krautzberger (@pkra) on CodePen.

A quick list of things that jump out when going through this.

I'm not trying to complain too much here. Again, this is mostly what you'd expect when you build a tool like latexml. Print is just a crappy format to generate web content from and even good LaTeX can't save you from that fact.

Prior art 2: pseudocode

The next example is a probably less known JS library called pseudocode.js.

Here's again a codepen embedding of what we might get.

See the Pen pseudocode.js by Peter Krautzberger (@pkra) on CodePen.

Cleaning it up a bit seems harder given. Maybe that's because it is closer to thinking in terms of web design (as opposed to print-reproduction). But we can delete the equation layout mess (which is weirdly inconsistent in the algorithmicx source anyway).

See the Pen pseudocode.js, reduced by Peter Krautzberger (@pkra) on CodePen.

Again some random observations.

Coda Part 1.

In both cases, the main reason why I find these approaches lacking lies, I think, in the fact that they are the result of (print) content conversion. The disconnect between the authoring and the layout is too large to get one in touch with the other. This is a loss for both sides: the layout gets messy and the author comes no closer to understanding the medium they author for.

A typical side effect of this shows when you want to modify the output: neither output can be re-arranged efficiently. Usually, all you can do is go to the source, change it, and convert it again. We'llcome back to that later.

To be clear, I'm not saying that some of the "bad" design decisions could be avoided; I definitely haven't found a way around every pain point. But my experience has been that the law of the instrument is at the heart of the problem. Content conversion inevitably leads to certain types of hacks (such as meaningless DOM structures, class names and inline styles) to allow authors customizations without them having to think about the actual medium. If it's content conversion from a format meant for print, this is problem exacerbated because the media and their traditions often match poorly. But once those hacky instruments are in the tool belt, it's hard not to use them and you end up with hacks all the way down.

In the next part, I'll explore a different approach, also thinking about it from the perspective of the source material but hopefully with some level of detachment. After that, perhaps I'll find time to jot down some ideas on how to move towards authoring. But this piece has been more fun than I thought it would be; it's been interesting to dive a little deeper into these two after trying things on my own; good to see similarities and differences.

Part 2. A first draft

Let's go back to the beginning.

Euclid's algorithm typeset using LaTeX's algorithmicx package

In Part 1 I started at the wrong end, so to speak, and I looked at solutions that try to be keep the spirit of the print source, both in authoring "convenience" and layout design. I also outlined what I think of as issues and limitations of such (print) content conversion.

Still, it is painfully fascinating. I wanted to find another way of dealing with this problem and, ideally, learning (with detachement) from the print authoring experience to enable web authoring.

Back to the source: algorithmicx (again)

Again, we keep to the classics and Euclid's algorithm.

Here's what you'd write in algorithmicx.

\begin{algorithm}
\caption{Euclid's algorithm}\label{euclid}
\begin{algorithmic}[1]
\Procedure{Euclid}{$a,b$}\Comment{The g.c.d. of a and b}
\State $r\gets a\bmod b$
\While{$r\not=0$}\Comment{We have the answer if r is 0}
\State $a\gets b$
\State $b\gets r$
\State $r\gets a\bmod b$
\EndWhile\label{euclidendwhile}
\State \textbf{return} $b$\Comment{The gcd is b}
\EndProcedure
\end{algorithmic}
\end{algorithm}

and you'd get the image from earlier. And here's our naive HTML-ification again:

<algorithm>
<caption id="euclid">Euclid's algorithm</caption>
<algorithmic numberLineSkip="1">
<procedure>
<label>Euclid</label>
<start>a,b</start><comment>The g.c.d. of a and b</comment>
<state>r ← a mod b</state>
<while>
<start>r ≠ 0</start> <comment>We have the answer if r is 0</comment>
<state>a ← b</state>
<state>b ← r</state>
<state>r ← a mod b</state>
</while>
<state> <strong>return</strong> b</state> <comment>The gcd is b</comment>
</procedure>
</algorithmic>
</algorithm>

Some random thoughts when looking at this:

head on

So let's try not to overthink this, let's try not to take more than inspiration from the source, let's try to think about the layout, its meaning, and how it could work with the web's grain.

What do we need at first sight?

Again, the obvious modern choice for two dimensional layouts is CSS grid. And CSS counters should help as well:

.algorithm {
counter-reset: algolinecounter;
display: grid;
grid-template-columns: repeat(3, max-content);
grid-column-gap: 1em;
}

To try it out, here's a dummy set of two "lines", first as part of the page then as code.

Counter
Content
Comment
Counter
Content
Comment
<section class="algorithm">
<div>Counter</div>
<div>Content</div>
<div>Comment</div>
<div>Counter</div>
<div>Content</div>
<div>Comment</div>
</section>

Nothing really to see yet, so let's move on.

To handle our line counters, we use the counters (obviously) but we also get some critical help from grid: since we might not have comments, we don't know whether the line is full. But that's not a problem with grid - we can just force our counter to always start in the first column

.counter {
grid-column-start: 1;
counter-increment: algolinecounter;
}

.counter::before {
content: counter(algolinecounter) ': ';
}

To try it out, here's two lines with only the second on having a comment. Again, first as part of the page, then as source.

Content
Content
Comment
<section class="algorithm">
<div class="counter"></div>
<div>Content</div>
<div class="counter"></div>
<div>Content</div>
<div>Comment</div>
</section>

This is a good point to think about accessiblity. CSS generated content is of course ok but we might want to add some ARIA attributes to indicate the visual effect of starting a line; then again, maybe that tends to be clear from context and we shouldn't add noise. It's tempting to aria-hide the counters and use table roles which would gives AT users the advantage of table walkers - a grid role would be more appropriate but then we'd have to write our own walker so definitely out of scope for a first draft. Anyway, we don't have to do this right now, we just have to keep it in mind.

Next up, let's deal with the other fixed property: comments. This is simple anyway:

.comment {
margin-left: 3em;
}

.comment::before {
content: '▷ ';
}

This looks nice enough but we should probably hide that from non-visual users. We could indicate that it's a comment but again it might be enough to grok this from context. After all, context is all we have visually. Let's take a look, in the page and source:

Content
Comment, longer
Content
Comment
<section class="algorithm">
<div class="counter"></div>
<div>Content</div>
<div class="comment">Comment, longer</div>
<div class="counter"></div>
<div>Content</div>
<div class="comment">Comment</div>
</section>

Oooh, look at that alignment! Isn't it pretty?

So, 9 lines of CSS and our basic layout is done. Nice.

Since this is a first draft, let's continue not to over think this. The begin/end routines seem to always have, well, a beginning and an end. So let's just make that happen.

.while::before,
.while::after
{
font-weight: bolder;
}

.while__start::before {
content: 'while ';
}

.while__start::after {
content: ' do';
}

.while__end::before {
content: 'end while';
}

In action and as source:

Some condition
Otherwise, we're done.
We do some work
<section class="algorithm">
<div class="counter"></div>
<div class="while while__start">
Some condition
</div>
<div class="comment">
Otherwise, we're done.
</div>
<div class="counter"></div>
<div>We do some work</div>
<div class="counter"></div>
<div class="while while__end"></div>
</section>

The final piece doesn't look very nice but neither does the original source.

There's a worse glitch though: if you look at the original, the part "inside" the loop is indented. This is slightly tricky since we don't have a good way of writing a selector for those kinds of elements. Luckily, this is a first draft and we're lazy, so let's just add a helper class.

.ml1 {
margin-left: 1em;
}

Problem solved - let's have a look:

Some condition
Otherwise, we're done.
We do some work
<section class="algorithm">
<div class="counter"></div>
<div class="while while__start">
Some condition
</div>
<div class="comment">
Otherwise, we're done.
</div>
<div class="counter"></div>
<div class="ml1">We do some work</div>
<div class="counter"></div>
<div class="while while__end"></div>
</section>

Moving on to the last issue, nested subroutines, so let's copy and paste our loop.

Some condition
Otherwise, we're done.
Some condition
Otherwise, we're done.
We do some work
<section class="algorithm">
<div class="counter"></div>
<div class="while while__start">
Some condition
</div>
<div class="comment">
Otherwise, we're done.
</div>
<div class="counter"></div>
<div class="while while__start">
Some condition
</div>
<div class="comment">
Otherwise, we're done.
</div>
<div class="counter"></div>
<div class="ml1">We do some work</div>
<div class="counter"></div>
<div class="while while__end"></div>
<div class="counter"></div>
<div class="while while__end"></div>
</section>

But of course now we have the same margin problem all over again, so we don't think too hard and add another helper.

.ml2 {
margin-left: 2em;
}

Then we can make it:

Some condition
Otherwise, we're done.
Some condition
Otherwise, we're done.
We do some work
<section class="algorithm">
<div class="counter"></div>
<div class="while while__start">
Some condition
</div>
<div class="comment">
Otherwise, we're done.
</div>
<div class="counter"></div>
<div class="ml1 while while__start">
Some condition
</div>
<div class="comment">
Otherwise, we're done.
</div>
<div class="counter"></div>
<div class="ml2">We do some work</div>
<div class="counter"></div>
<div class="ml1 while while__end"></div>
<div class="counter"></div>
<div class="while while__end"></div>
</section>

At this point, it's clear that we have a lurking problem but luckily this was just a first draft and we can finish our example. Let's just add a procedure class as follows.

.procedure::before {
font-weight: bolder;
}

.procedure__start::before {
content: 'procedure ';
}

.procedure__end::before {
content: 'end procedure';
}

And it looks like we're there.

Euclid(a, b)
The g.c.d. of a and b
r ← a mod b
r ≠ 0
We have the answer if r is 0
a ← b
b ← r
r ← a mod b
return b
The gcd is b
<section class="algorithm">
<div class="counter"></div>
<div class="procedure procedure__start">
<span class="sc">Euclid</span>(<i>a, b</i>)
</div>
<div class="comment">
The g.c.d. of <i>a</i> and <i>b</i>
</div>
<div class="counter"></div>
<div class="state ml1"><i>r ← a </i>mod<i> b</i></div>
<div class="counter"></div>
<div class="while while__start ml1">
<i>r ≠ </i>0
</div>
<div class="comment">
We have the answer if <i>r</i> is 0
</div>
<div class="counter"></div>
<div class="ml2"><i>a ← b</i></div>
<div class="counter"></div>
<div class="state ml2"><i>b ← r</i></div>
<div class="counter"></div>
<div class="ml2"><i>r ← a </i>mod<i> b</i></div>
<div class="counter"></div>
<div class="while while__end ml1"></div>
<div class="counter"></div>
<div class="ml1"><strong>return</strong> <i>b</i></div>
<div role="region" aria-label="comment" class="comment">
The gcd is <i>b</i>
</div>
<div role="region" aria-label="line" class="counter"></div>
<div class="procedure procedure__end"></div>
</section>

Coda Part 2.

Not bad for a first draft. Grid makes so many things just easy, sprinkle some counters.

There are a couple of glaring issues at this point

Inevitably, these will lead us into the weeds. But at least the web's weeds.

Part 3. But what is it

Let's go back to our starting point.

Euclid's algorithm typeset using LaTeX's algorithmicx package

In Part 1, I started at the wrong end, if you will: I looked at a TeX source and then at existing solutions that aim to stay close to print, both in authoring "convenience" and layout design. In Part 2, I went through a rought draft of realizing a simlar layout in CSS "properly".

Now, let's try to let go.

Forget the source, Luke.

What can we observe for this layout?

The first draft was trying to be so minimalistic that it skipped on the two primary semantic pieces: lines and subgroups of lines. Isn't that odd?

So thinking about those two, where would we start?

Authoring with batteries included

There are a couple of conveniences when it comes to authoring that I'd like to see preserved. The core problem is change: removing and injecting content into other content. You want it to magically work and of course it often doesn't. In particular copy&paste is a key usability challenge that good authoring tools try to solve. Obviously, this is is a hard problem in general.

It also leads to other factors such as good (automated) default behavior alongside customizability aka "macros".

The first challenge from the first draft is: how do we solve the indentation problem, i.e., how can we avoid the .ml1, .ml2 etc classes. Because if we cannot solve that, authors will have to a) add those classes themselves and b) update them whenever they move or add something to an existing structure.

Coda

This coda marks where the "original" draft ended, i.e, the draft from which I started in June 2022 (though many edits were done in June / July 2022). Handling the indentation is really the only "hard" problem for this layout so I'm not sure why I stopped here.

I have a couple of codepens from that time that might be worth writing about. And then there's the solution I adopted at work to talk about (which, inevitably, is different). Let's stop here and see if I get around to those updates.