HTML Self-closing Tags and Mental Models

July 08, 2023

My commentary on "The Case Against Self-closing Tags in HTML." Did you know the HTML parser ignores them?

An image for the link preview so it's not my face. 😬

Context

I have a confession: I’m a longtime internet lurker, and this is one of the few times I’ve come out of the woodwork to actually say anything on social media. Embarrassingly, I still don’t fully understand how to use Twitter (Am I supposed to reply to myself? Am I supposed to end with “/1” or something if it doesn’t fit in one tweet? How does anyone use this and why does it exist?).

Anyways, I stumbled upon this post from Jake Archibald, @jaffathecake:

He followed up with a blog post of the main counterpoints he was getting in the Twitter thread.

This was such an interesting read for me. My foray into web development started with XHTML when I was probably 13 years old. I remember fondly running my HTML through the W3C validator and learning a bit by trial and error, and the pride and joy of putting the validated sticker on every website I made.

Seeing the old <!DOCTYPE ...> and <?xml version="1.0" ...> stuff again in Jake’s blog was a blast of nostalgia. It was a good refresher and put into context a lot of things I had learned and remembered but never fully understood or placed on a timeline.

I ended up having a bit of a back and forth on the Twitter thread, but I feel like I used Twitter wrong and it was hard to express my thoughts that concisely. I thought to myself, “Well, I’ve got this blog over here, and this smells like content!” so here I am writing a blog post.

I want to preface this article by saying I thoroughly enjoyed reading Jake’s blog post and he makes a lot of great points. I respect his opinions, but I still have my own. I also think there’s an interesting difference in our mental models that I’m curious to explore.

Worth noting here is that I come from an XHTML/HTML background and not so much an XML one. Everything here also seems prone to stylistic choices and opinions, so I’m not going to “disprove” or “discredit” anything, nor do I think I’m “right” or that Jake’s “wrong” with the choices and opinions.

In fact, I tend to agree with Jake’s take on what Prettier should do:

I think Prettier should […] fix cases where /> is actively misleading.

My version of this would be stripping /> from all non-void elements. A void element (as an aside, in that same link MDN addresses how there are no self-closing tags in HTML) is an element that cannot have children, like <img> or <br>, and doesn’t have a closing tag. So for example, my proposal would have Prettier take something like:

<div />
Hello
<img src="xyz">

…and reformat it to:

<div>
Hello
<img src="xyz" />
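The rule above can be sketched in a few lines of Python. To be clear, this is only a rough illustration and not how Prettier works: the function name normalize_self_closing is made up, and it uses a naive regex rather than a real HTML parser. It assumes plain HTML with quoted attribute values, and it deliberately ignores comments, script contents, and foreign content like SVG or MathML (where /> actually does something).

```python
import re

# Void elements as listed in the HTML Living Standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

# Naive start-tag matcher (an assumption for this sketch, not a real parser):
# group 1 is the tag name, group 2 is the attribute text; any trailing
# whitespace and optional "/" before ">" is matched but thrown away.
TAG_RE = re.compile(
    r"""<([A-Za-z][A-Za-z0-9-]*)((?:"[^"]*"|'[^']*'|[^'">])*?)\s*/?>"""
)


def normalize_self_closing(html: str) -> str:
    """Keep ' />' only on void elements; strip it everywhere else."""

    def fix(match: re.Match) -> str:
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return f"<{name}{attrs} />"
        return f"<{name}{attrs}>"

    return TAG_RE.sub(fix, html)


print(normalize_self_closing('<div />\nHello\n<img src="xyz">'))
# <div>
# Hello
# <img src="xyz" />
```

Closing tags like </div> are left alone because the pattern only matches tags whose name starts right after the <.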

Finally, admittedly, you might be asking if it’s really worth fighting over such trivial things. This is definitely very close to the “tabs vs. spaces” religious debate in my mind.

I can relate to Kent C. Dodds @kentcdodds:

I’ve spent way too much time dissecting something that, ironically, I really don’t think I care that much about. It was a good exercise in writing, though, and it’s revealing of some other concepts and modes of thinking, and (hopefully) insightful as an exploration for you, dear reader!

My Commentary

My general takeaway is I’m not convinced that /> is actively harmful when used only on void elements. In fact, I still think there’s some marginal utility for keeping it around, even if it has no functional use in HTML parsers and is ignored (which seems to be the crux of Jake’s argument). Where Jake sees the use of /> as more confusing and troublesome for newcomers, and a potential source for bugs and weird behavior, I see it as a helpful visual indicator that can help newcomers more easily identify these void elements at a glance.

On the Space

In XML, it would generally be formatted like <this/>, without the space before the /, but Netscape Navigator 4 couldn’t cope with <input type="text"/>, where the / immediately followed an attribute, so the spec recommended a space before the /.

I agree with Jake in that it would be silly to keep this around just for Netscape. However, I’m not fully convinced that removing the space would be better. I don’t really see a strong benefit. Do we need the space? No. But it does add a bit of pleasant separation. Yes, in other tags there’s no space between the last attribute or the tag name and the closing >, so I could be swayed, but after years of seeing that space it provokes a weird unsettling feeling to not see it.

I think this “awkwardness” feeling would likely go away over time, but I’m still resistant. I feel like I’m the old timer still using two spaces after a period in my emails and trying to convince younger folks that I’m right for doing so because that’s how I was taught. Despite knowing it’s not necessary anymore, I just don’t see much reason to remove the space in self-closing tags. I’m not convinced there’s enough cause for me to change my habits.

I acknowledge that Jake isn’t advocating for me to change my habits so much as he’s advocating for the autoformatter to remove unnecessary self-closing tags rather than insert them. In my double space analogy, he’s proposing that autocorrect remove the double spaces rather than insert them.

On Error Tolerance

During the history lesson Jake shares at the start of his blog post, he briefly mentions the following as an explanation for why we moved away from strict XHTML:

Ask yourself: If you visit the website of your local doctor’s surgery to find out the opening hours, which browser is best: The one that displays the opening hours of the surgery, or the one that displays an XML parsing error message?

This is so interesting. I’ve adopted the “fail fast” over “error tolerance” mindset pretty hard, so from the point of view of the programmer I think I’d rather know that the markup was invalid. But Jake is absolutely right here: the consumer wants to see the website!

Last week, we had a bug at work where something was broken for weeks and went unnoticed because we had clever fallback logic. While that fallback logic helped fix what we showed the users, it didn’t fix our backend and wasn’t meant to be the main path. Although better telemetry and alerting would be the best fix for this, it’s hard to ignore that the error would have been caught sooner without the masking fallback - we would have immediately seen it in testing.

In the case of rendering HTML, however, I understand why browsers wanted error tolerance. I imagine it’s the result of incentives - if one browser were to tolerate errors, even if the agreed spec was to “throw up” and refuse to render anything, users would gravitate to the browser that “just worked” - even if it went against spec. It’s probably the right choice, too - the web is a messy place and things should still function as best we can make them. I’m sure many, many websites are chock full of little errors.

This didn’t have much impact on the overall conclusion, but it was fun to explore. It challenged my “fail fast” mindset and will probably make me less dogmatic and more critical in my thinking in the future.

On JSX

Some Twitter users commented about how Prettier’s behavior was to keep compatibility with JSX.

Jake responds to that directly:

JSX and HTML are different formats. They aren’t consistent with each other. Pretending they’re consistent is misleading.

While this is definitely true, I find it’s unfair to pretend one of the most popular use cases of JSX right now isn’t writing components for React that look and feel very, very similar to HTML. The points about className vs class and other differences of behavior are valid - there are little nuances to understand when developing in one or the other. But when handwriting JSX and HTML, it makes sense to try to make the experiences as similar as possible, to have the easiest time context switching between the two (and, for a tool like Prettier, to write the formatter in a way that allows the most reuse between the languages). I think the consistency argument is dismissed simply because the computer (the parsers) treats them differently, but the human also matters here, and not having to think hard about the nuances between the two systems makes our lives easier. I’d argue that writing e.g. <br /> in both JSX and HTML helps facilitate that. Put another way, just because some things are different about writing JSX vs HTML doesn’t mean more things should be different.

On XML

Similarly, Jake fielded complaints about XML compatibility and responded on his blog:

Call me some sort of purist, but if I want to parse an HTML document, I’ll use an HTML parser. I wouldn’t try and write JSON so it can be parsed by a YAML parser, so I don’t see why I’d do the same with HTML and XML.

And that’s fair. But it’s hard to deny that HTML looks and feels very, very similar to XML, and XML’s influence is especially clear after the history lesson about XHTML earlier in his blog post. Divergence from writing XML-like syntax doesn’t seem like it’s entirely in the programmer’s best interest. If I’m coming from knowing XML and I’m reading <br> and see no closing tag, that’s another addition to the learning curve for me to learn HTML. A small, super minor one, but one that doesn’t need to exist.

I have to acknowledge that if I’m coming from XML and discover I can’t do <div /> and get what I expect, that is a bigger learning curve bump, and I’m sure Jake would argue that using <br /> and the others makes this transition less clear - he’d probably prefer XML writers more quickly unlearn /> entirely and instead learn to write more closely to what the parser actually uses (i.e. unlearn XML and relearn HTML as a separate concept). I think there’s a balance to strike here between throwing out everything you know and keeping things familiar, and I’m probably more on the latter side of the spectrum than Jake.

What’s more, Prettier’s existing implementation works with XHTML and HTML out of the box, which is pretty cool (even if nobody writes XHTML anymore).
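As a small illustration of the XML side of this, here’s a sketch using Python’s standard xml.etree.ElementTree as a stand-in for a strict XML parser (my choice for the example, nothing to do with Prettier itself). The helper name parses_as_xml and the sample markup are made up. The self-closed form is accepted, while the bare <br> is rejected because XML expects every element to close.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml("<p>Open 9-5<br /></p>"))  # True
print(parses_as_xml("<p>Open 9-5<br></p>"))    # False: <br> is never closed
```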

This is a somewhat weak argument, but I think it’s another example of a difference in the way Jake and I approach the topic: I’m coming from the mindset of someone sitting and reading/writing HTML as a new experience/black box, whereas I think he’s coming at it with more knowledge of the parser and spec. To him, /> probably feels extraneous or superfluous, given he knows the parser ignores it. But to me, it feels more correct even if it’s useless - sure, it doesn’t do anything, but it makes the markup seem more correct to me.

My internal mental model is “everything should close and thus have a visible closing tag or be a self-closing tag” despite the parser not actually matching that model. If anything, to me, the parser allowing the lack of self-closing tags feels like the parser being extra robust and tolerant of errors. The truth (according to Jake) is that /> is less “correct” per the HTML spec (HTML doesn’t have self-closing tags!), and the parser is being error tolerant by allowing you to include it and ignoring it. This feels as jarring as when I learned how to spell “Berenstain Bears” correctly.
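To make the “ignored” part concrete, here’s a toy tree builder in Python. It is very much a simplified model and not the real HTML parsing algorithm: build_tree and its dict-based nodes are invented for this sketch, the tokenizer is a naive regex, and foreign content (SVG, MathML) is left out entirely. The one thing it tries to show is that the decision to close comes from the element’s name, and the trailing / is never consulted.

```python
import re

# Void elements as listed in the HTML Living Standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

# Naive tokenizer: a start/end tag (attributes and any "/" are skipped
# over by [^>]*) or a run of text. Breaks on ">" inside attribute values.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|([^<]+)")


def build_tree(html: str) -> dict:
    root = {"tag": "#root", "children": []}
    stack = [root]  # currently open elements, innermost last
    for match in TOKEN_RE.finditer(html):
        closing, name, text = match.group(1), match.group(2), match.group(3)
        if text is not None:
            if text.strip():
                stack[-1]["children"].append(text.strip())
            continue
        name = name.lower()
        if closing:
            # Close the nearest open element with this name, if there is one.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["tag"] == name:
                    del stack[i:]
                    break
            continue
        node = {"tag": name, "children": []}
        stack[-1]["children"].append(node)
        if name not in VOID_ELEMENTS:
            # Only the tag name matters here; "/>" changes nothing.
            stack.append(node)
    return root


tree = build_tree("<div />Hello<br>World")
div = tree["children"][0]
print([c if isinstance(c, str) else c["tag"] for c in div["children"]])
# ['Hello', 'br', 'World']
```

In this model <div /> stays open and swallows everything after it, while <br> and <br /> produce exactly the same tree.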

On Newcomers

This is probably my biggest disagreement with Jake, and the one I spent time going back and forth with him on Twitter.

I think that’s particularly bad for newcomers. Imagine you’d never seen <img src="…"> before. You’d see, unlike other elements, it doesn’t have a closing tag. Debuggers and validators don’t complain about it, suggesting there’s something particular about this element you need to learn – it doesn’t need to close, it self-closes, and it’s particular in this behaviour.

Jake thinks it’s good for newcomers to learn that <img> is a unique tag that doesn’t require a closing tag, and on that much we fully agree! This is the right takeaway for newcomers coming across an <img> in the wild. But how long does it take to recognize that a tag doesn’t have a closing tag? It could be a lot of lines and scrolling before they realize that all that text isn’t inside an <img> tag. And how likely are they to assume it’s not an error? Everything else closes, and there’s nothing on the surface that makes these elements look special or suspicious.

He goes on to say:

Now imagine you’d never seen <img src="…" /> before. You look up this new syntax you’ve discovered, and learn that it means a tag is “self-closing”. At this point, why wouldn’t you assume <iframe /> is self closing too? Or that <img src="…"></img> is valid?

I have a really interesting difference in mental model for how I learned about self-closing tags. It’s likely due to not coming from an XML background, but when I saw <img /> my takeaway wasn’t “the /> makes any tag self-closing”; it was “the <img> is a special element that can’t have children, and thus uses /> instead of a closing tag.” I never saw self-closing tags for anything besides the special void elements. There weren’t ever examples with <iframe /> or <div /> or <script />. From seeing that example, I didn’t learn the XML feature of /> as a separate functional thing that changes behavior, but rather that /> was something uniquely tied to these special elements.

“But this is wrong!” I can hear Jake argue, “The self-closing tag isn’t linked to those elements at all. The parser doesn’t recognize it ever - it ignores it! You’re learning the wrong thing!” And there’s truth in that - I learned to always self-close my <img> tags, and hung onto that even after moving from XHTML to HTML5, none the wiser that my self-closing tags were being ignored. But my point is to highlight that I never tried self-closing other tags. I actually thought about it for <script> tags, but reasoned that they can have a child (inline scripts!) so that’s why they couldn’t use the self-closing tag. The hypothetical of a newcomer trying <iframe /> after learning about /> never happened for me. That’s mostly due to how I think newcomers learn - they pattern match. I never saw <iframe /> so I never made an <iframe />. I never saw <img></img> so I never wrote <img></img>. While I may have lacked the larger point of “learning the meanings and why,” using self-closing tags didn’t result in any confusion or weird behavior. To me, the fear of /> being bad for newcomers seems overblown.

On the flip side, I would argue self-closing tags are actually helpful, primarily when reading HTML as a human and when used only on void elements. Jake acknowledges this point briefly, saying:

But, does /> have to work to be useful? Code comments don’t ‘work’. Just like />, they’re an indication, they might be misleading, but that isn’t a good argument for removing code comments.

However, he disagrees with me on the usefulness of /> as it “doesn’t look like a comment” and “doesn’t always behave like a comment,” referencing the foreign content case.

I’m not convinced that “doesn’t look like a comment” is a particularly strong argument. The “comment” analogy is just that, an analogy - it doesn’t need to behave or look similar to existing comments to be useful. It’s either a helpful visual indicator that a tag is a void element, or it’s unhelpful clutter.

On the foreign content case, I think as long as usage lines up with expectation, it’s not an issue. Since I only advocate for self-closing tags on void elements, the parser ignoring the self-closing tag but closing the tag anyway because it’s special is effectively the same to the writer as though the parser closed the tag due to the self-closing tag. I don’t see that being a problem nor misleading unless you try self-closing non-void elements in HTML.

An Analogy for Self-closing Tags

Despite being told I’m terrible at analogies and metaphors, I continue to attempt them. Hopefully this is a good one:

Consider the use of a capital letter at the start of a sentence.

  • It’s not semantically important: it’s not necessary for a “parser” to determine the start of a sentence, as it would be able to infer it from grammatical rules and the use of punctuation. In other words, it doesn’t do anything. A capital letter doesn’t indicate the start of a sentence, the punctuation does. Much like the /> doesn’t close the tag, the tag is self-closed based on the type and the /> is ignored.
  • One could argue it’s confusing: proper nouns are also capitalized, for example, and now it’s ambiguous if the start of a sentence is using a proper noun or not since it’s capitalized. Readers might think that first word “Readers” is proper and write Readers capitalized elsewhere, when it’s not. Much like Jake’s argument how if /> is used in void elements, readers might try to use /> on non-void elements too.
  • But it’s useful to human readers, as we don’t easily see the period-space as the periods are small (and look very similar to commas!) and spaces are everywhere in sentences.
  • And there’s nothing really to be gained by switching. It’s just extra pain and relearning for nothing.

Here’s an example paragraph with no capital letters to start sentences:

try reading this paragraph. or rather, try skimming it carefully, and notice how it feels like a run-on sentence. it never seems to terminate. try skimming it quickly, and count how many sentences it has. you might argue that's not something anyone really does. maybe instead try to find the fifth sentence, like you're trying to find a quote. but also, note how you feel reading this paragraph. you're not used to it. something feels off or "incorrect" about it. maybe it gets worse if I use a proper noun like Seattle, so there are capital letters in the middle of the sentence. that might cause more confusion because your human eye is probably trained to scan for those big capital letters, even if it's not useful here. maybe I should use something proper after a comma. the writer of the original article, Jake, made several arguments against the use of self-closing tags.

Compare that last paragraph to its equivalent with capital letters at the start of sentences:

Try reading this paragraph. Or rather, try skimming it carefully, and notice how it feels like a run-on sentence. It never seems to terminate. Try skimming it quickly, and count how many sentences it has. You might argue that's not something anyone really does. Maybe instead try to find the fifth sentence, like you're trying to find a quote. But also, note how you feel reading this paragraph. You're not used to it. Something feels off or "incorrect" about it. Maybe it gets worse if I use a proper noun like Seattle, so there are capital letters in the middle of the sentence. That might cause more confusion because your human eye is probably trained to scan for those big capital letters, even if it's not useful here. Maybe I should use something proper after a comma. The writer of the original article, Jake, made several arguments against the use of self-closing tags.

It’s a subtle stylistic difference. I suspect many people will read this and conclude that actually, we should get rid of self-closing tags and capital letters at the start of sentences, and that instead of defending self-closing tags I argued against capital letters. “We can retrain our brains to look for periods!” you argue. If so, then I think we just have a difference in stylistic choice and preference. We also probably have different thresholds for change - think about all the new textbooks that would have to be written to that new styleguide, the teachers that would need to relearn and teach it, and all the cascading impacts if the English world were to adopt the “no capital start of sentences” rule. It’s a lot of retraining, and I just don’t really see a true benefit to it.

Conclusion

It’s interesting to me that Jake and I approach this topic in such different ways. His approach is close to the core and the technology: It’s about the parser and semantic utility and correctness. I approach it from the natural evolution of easing humans into new paradigms: It’s about prioritizing what makes sense to the human developers and what fits their existing mental models.

I made a similar statement in my final Tweet:

That said, I agree somewhat with Jake’s conclusion on what Prettier should do. Given the similar conclusions, this article might seem weird. The interest I had in writing this article was less about the direct question of “should we continue using self-closing tags” and more about analyzing the arguments and exploring the mindsets and mental models. It’s also been especially useful as an exercise for myself to explore and understand why I’m so impulsively hesitant to let /> go.

In some ways, I’d liken Jake’s approach to a “prescriptive” one, where the spec defines how we should read and write HTML. If you’ll let me make another attempt at an analogy from the English language, it feels to me like a teacher informing you to say “Yes that’s she over there” instead of “Yes that’s her over there” in your papers. Meanwhile I’d liken my approach to a “descriptive” one, where I feel like if the number of people that read/write HTML a certain way is at a critical mass, then it’s canon. To continue the analogy, I’d be the annoying student that would argue with the wiser, more experienced, and technically correct teacher about how saying “Yes that’s she over there” sounds terribly wrong and awkward, and we should all stick to saying “Yes that’s her over there” instead.

Overall, this was a fun exercise and deep dive. I don’t think it warrants being categorized as a technical “deep dive” though (especially since I don’t think I’m particularly “correct”), so I’m making a new category called “soapbox” and sticking it in there. I am also considering a category called “hot-takes” or “things I’m actively wrong about but I don’t care,” but we’ll see. Anyways, congratulations on making it to the end! I’ll get off my soapbox now.



Written by Marcus Pasell, a programmer who doesn't know anything. Don't listen to him.