HTML Self-closing Tags and Mental Models

July 08, 2023

My commentary on "The Case Against Self-closing Tags in HTML." Did you know the HTML parser ignores them?

An image for the link preview so it's not my face. 😬

Context

I have a confession: I’m a longtime internet lurker, and this is one of the few times I’ve come out of the woodwork to actually say anything on social media. Embarrassingly, I still don’t fully understand how to use Twitter (Am I supposed to reply to myself? Am I supposed to end with “/1” or something if it doesn’t fit in one tweet? How does anyone use this and why does it exist?).

Anyways, I stumbled upon this post from Jake Archibald, @jaffathecake:

He followed up with a blog post of the main counterpoints he was getting in the Twitter thread.

This was such an interesting read for me. My foray into web development started with XHTML when I was probably 13 years old. I remember fondly running my HTML through the W3C validator and learning a bit by trial and error, and the pride and joy of putting the validated sticker on every website I made.

Seeing the old <!DOCTYPE ...> and <?xml version="1.0" ...> stuff again in Jake’s blog was a blast of nostalgia. It was a good refresher and put into context a lot of things I had learned and remembered but didn’t fully understand or couldn’t place on a timeline.

I ended up having a bit of a back and forth on the Twitter thread, but I feel like I used Twitter wrong and it was hard to express my thoughts that concisely. I thought to myself, “Well, I’ve got this blog over here, and this smells like content!” so here I am writing a blog post.

I want to preface this article by saying I thoroughly enjoyed reading Jake’s blog post and he makes a lot of great points. I respect his opinions, but I still have my own. I also think there’s an interesting difference in our mental models that I’m curious to explore.

Worth noting here is I come from an XHTML/HTML background and not so much an XML one. Also, everything here seems prone to stylistic choices and opinions, so I’m not going to “disprove” or “discredit” anything, nor do I think I’m “right” or that Jake’s “wrong” about these choices and opinions.

In fact, I tend to agree with Jake’s take on what Prettier should do:

I think Prettier should […] fix cases where /> is actively misleading.

My version of this would be stripping /> from all non-void elements. A void element (as an aside, in that same link MDN addresses how there are no self-closing tags in HTML) is an element that cannot have children, like <img> or <br>, and doesn’t have a closing tag. So for example, my proposal would have Prettier take something like:

<div />
Hello
<img src="xyz">

…and reformat it to:

<div>
Hello
<img src="xyz" />
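
As a rough illustration of that proposal, here is a minimal Python sketch of the rewrite rule. It is not how Prettier works: the function name normalize_self_closing and the regex are made up for this example. It assumes attribute values are quoted and contain no > characters, and it knows nothing about SVG or MathML, where /> really does self-close, so a real formatter would have to leave those alone.

```python
import re

# Void elements listed in the HTML Standard: no children, no closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}

# Matches a start tag: tag name, attribute text, and an optional trailing "/".
# Simplification (assumption): no ">" inside attribute values, and no special
# handling of comments, <script> bodies, or unquoted attribute values.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)((?:\s+[^<>]*?)?)\s*(/?)>")


def normalize_self_closing(html: str) -> str:
    """Strip "/>" from non-void elements and write void elements with " />"."""

    def fix(match: re.Match) -> str:
        name = match.group(1)
        attrs = match.group(2).rstrip()
        if name.lower() in VOID_ELEMENTS:
            return f"<{name}{attrs} />"
        return f"<{name}{attrs}>"

    return START_TAG.sub(fix, html)


before = '<div />\nHello\n<img src="xyz">'
print(normalize_self_closing(before))
```

Run on the example above, this turns <div /> into <div> and <img src="xyz"> into <img src="xyz" />, and leaves closing tags and text alone.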

Finally, admittedly, you might be asking if it’s really worth fighting over such trivial things. This is definitely very close to the “tabs vs. spaces” religious debate in my mind.

I can relate to Kent C. Dodds @kentcdodds:

I’ve spent way too much time dissecting something that, ironically, I really don’t think I care that much about. It was a good exercise in writing, though, and it’s revealing of some other concepts and modes of thinking and (hopefully) insightful as an exploration to you, dear reader!

My Commentary

My general takeaway is I’m not convinced that /> is actively harmful when used only on void elements. In fact, I still think there’s some marginal utility for keeping it around, even if it has no functional use in HTML parsers and is ignored (which seems to be the crux of Jake’s argument). Where Jake sees the use of /> as more confusing and troublesome for newcomers, and a potential source for bugs and weird behavior, I see it as a helpful visual indicator that can help newcomers more easily identify these void elements at a glance.

On the Space

In XML, it would generally be formatted like <this/>, without the space before the /, but Netscape Navigator 4 couldn’t cope with <input type="text"/>, where the / immediately followed an attribute, so the spec recommended a space before the /.

I agree with Jake in that it would be silly to keep this around just for Netscape. However, I’m not fully convinced that removing the space would be better. I don’t really see a strong benefit. Do we need the space? No. But it does add a bit of pleasant separation. Yes, in other tags there’s no space between the last attribute or the tag name and the closing >, so I could be swayed, but after years of seeing that space it provokes a weird unsettling feeling to not see it.

I think this “awkwardness” feeling would likely go away over time, but I’m still resistant. I feel like I’m the old timer still using two spaces after a period in my emails and trying to convince younger folks that I’m right for doing so because that’s how I was taught. Despite knowing it’s not necessary anymore, I just don’t see much reason to remove the space in self-closing tags. I’m not convinced there’s enough cause for me to change my habits.

I acknowledge that Jake isn’t advocating for me to change my habits as much as he’s advocating for the autoformatter to remove unnecessary self-closing tags rather than inserting them. In my double space analogy, he’s proposing that autocorrect removes the double spaces rather than inserting them.

On Error Tolerance

During the history lesson Jake shares at the start of his blog post, he briefly mentions the following as an explanation for why we moved away from strict XHTML:

Ask yourself: If you visit the website of your local doctor’s surgery to find out the opening hours, which browser is best: The one that displays the opening hours of the surgery, or the one that displays an XML parsing error message?

This is so interesting. I’ve adopted the “fail fast” over “error tolerance” mindset pretty hard, so from the point of view of the programmer I think I’d rather know that the markup was invalid. But Jake is absolutely right here: the consumer wants to see the website!

Last week, we had a bug at work where something was broken for weeks, and went by unnoticed because we had clever fallback logic. While that fallback logic helped fix what we showed the users, it didn’t fix our backend and wasn’t meant to be the main path. Although better telemetry and alerting would be the best solve for this, it’s hard to ignore that the error would have been caught sooner without the masking fallback - we would have immediately seen it in testing.

In the case of rendering HTML, however, I understand why browsers wanted error tolerance. I imagine it’s the result of incentives - if one browser were to tolerate errors, even if the agreed spec was to “throw up” and refuse to render anything, users would gravitate to the browser that “just worked” - even if it went against spec. It’s probably the right choice, too - the web is a messy place and things should still function as best we can make them. I’m sure many, many websites are chock full of little errors.

This didn’t have much impact on the overall conclusion, but it was fun to explore. It challenged my “fail fast” mindset and will probably make me less dogmatic and more critical in my thinking in the future.

On JSX

Some Twitter users commented about how Prettier’s behavior was to keep compatibility with JSX.

Jake responds to that directly:

JSX and HTML are different formats. They aren’t consistent with each other. Pretending they’re consistent is misleading.

While this is definitely true, I find it’s unfair to pretend one of the most popular use cases of JSX right now isn’t writing components for React that look and feel very, very similar to HTML. The points about className vs class and other differences of behavior are valid - there are little nuances to understand when developing one or the other. But when handwriting JSX and HTML, it makes sense to try and make the experiences as similar as possible to have the easiest time context switching between the two (and for a tool like Prettier, writing the formatter in a way that allows the most reuse between the languages). I think the consistency argument is dismissed simply due to the computer (parsers) treating them differently, but the human also matters here and not having to think hard about the nuances between the two systems makes our lives easier. I’d argue that writing e.g. <br /> in both JSX and HTML helps facilitate that. Put another way, just because some things are different about writing JSX vs HTML doesn’t mean more things should be different.

On XML

Similarly, Jake fielded complaints about XML compatibility and responded on his blog:

Call me some sort of purist, but if I want to parse an HTML document, I’ll use an HTML parser. I wouldn’t try and write JSON so it can be parsed by a YAML parser, so I don’t see why I’d do the same with HTML and XML.

And that’s fair. But it’s hard to deny that HTML looks and feels very, very similar to XML, and XML’s influence is especially clear after the history lesson about XHTML earlier in his blog post. Divergence from writing XML-like syntax doesn’t seem like it’s entirely in the programmer’s best interest. If I’m coming from knowing XML and I’m reading <br> and see no closing tag, that’s another addition to the learning curve for me to learn HTML. A small, super minor one, but one that doesn’t need to exist.

I have to acknowledge that if I’m coming from XML and discover I can’t do <div /> and get what I expect, that is a bigger learning curve bump, and I’m sure Jake would argue that using <br /> and the others makes this transition less clear - he’d probably prefer XML writers more quickly unlearn /> entirely and instead learn to write more closely to what the parser actually uses (i.e. unlearn XML and relearn HTML as a separate concept). I think there’s a balance to strike here between throwing out everything you know and keeping things familiar, and I’m probably more on the latter side of the spectrum than Jake.

What’s more, Prettier’s existing implementation works with XHTML and HTML out of the box, which is pretty cool (even if nobody writes XHTML anymore).

This is a somewhat weak argument. But I think it’s another example of a difference in the way Jake and I approach the topic: I’m coming from the mindset of someone sitting and reading/writing HTML as a new experience/black box, whereas I think he’s coming at it with more knowledge of the parser and spec. To him, it probably feels extraneous or superfluous, given he knows the parser ignores it. But to me, it feels more correct even if it’s useless - sure, it doesn’t do anything, but it makes it seem more correct to me.

My internal mental model is “everything should close and thus have a visible closing tag or be a self-closing tag” despite the parser not actually matching that model. If anything, to me, the parser allowing the lack of self-closing tags feels like the parser being extra robust and tolerant of errors. The truth (according to Jake) is that /> is less “correct” per the HTML spec (HTML doesn’t have self-closing tags!), and the parser is being error tolerant by allowing you to include it and ignoring it. This feels as jarring as when I learned how to spell “Berenstain Bears” correctly.
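
To make the gap between the two mental models concrete, here is a toy model in Python of the rule the HTML parser follows: the trailing slash is ignored, and whether an element stays open depends only on its type. This is a sketch under heavy assumptions. Python's html.parser is used purely as a tokenizer (it is not a spec-compliant HTML parser), the OutlineBuilder class and outline function are hypothetical names for this example, and real parsers have many more rules (implied end tags, foreign content, and so on).

```python
from html.parser import HTMLParser

# Void elements listed in the HTML Standard.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class OutlineBuilder(HTMLParser):
    """Records each start tag and text node together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of tag names that are still open
        self.outline = []        # (depth, label) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.outline.append((len(self.open_elements), f"<{tag}>"))
        # Only the element type decides whether the element stays open.
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Model the HTML rule: "/>" is ignored, so treat it as a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open element.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.outline.append((len(self.open_elements), data.strip()))


def outline(html):
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.outline


with_slashes = outline("<div />Hello<br />World")
without_slashes = outline("<div>Hello<br>World")
print(with_slashes == without_slashes)
```

In this model both inputs produce the same outline: the text ends up inside the <div> either way, and the <br> never has children, with or without the slash.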

On Newcomers

This is probably my biggest disagreement with Jake, and the one I spent time going back and forth with him on Twitter.

I think that’s particularly bad for newcomers. Imagine you’d never seen <img src="…"> before. You’d see, unlike other elements, it doesn’t have a closing tag. Debuggers and validators don’t complain about it, suggesting there’s something particular about this element you need to learn – it doesn’t need to close, it self-closes, and it’s particular in this behaviour.

Jake thinks it’s good for newcomers to learn that <img> is a unique tag that doesn’t require a closing tag, and on that much we fully agree! This is the right takeaway for newcomers coming across an <img> in the wild. But how long does it take to recognize that a tag doesn’t have a closing tag? It could be a lot of lines and scrolling before they realize that all that text isn’t inside an <img> tag. And how likely are they to assume it’s not an error? Everything else closes, and there’s nothing on the surface that makes these elements look special or suspicious.

He goes on to say:

Now imagine you’d never seen <img src="…" /> before. You look up this new syntax you’ve discovered, and learn that it means a tag is “self-closing”. At this point, why wouldn’t you assume <iframe /> is self closing too? Or that <img src="…"></img> is valid?

I have a really interesting difference in mental model for how I learned about self-closing tags. It’s likely due to not coming from an XML background, but when I saw <img /> my takeaway wasn’t “the /> makes any tag self-closing” it was “the <img> is a special element that can’t have children, and thus uses /> instead of a closing tag.” I never saw self-closing tags for anything besides the special void elements. There weren’t ever examples with <iframe /> or <div /> or <script />. From seeing that example, I didn’t learn the XML feature of /> as a separate functional thing that changes behavior, but rather that /> was something uniquely tied to these special elements.

“But this is wrong!” I can hear Jake argue, “The self-closing tag isn’t linked to those elements at all. The parser doesn’t recognize it ever - it ignores it! You’re learning the wrong thing!” And there’s truth in that - I learned to always self-close my <img> tags, and hung onto that even after moving from XHTML to HTML5, none the wiser that my self-closing tags were being ignored. But my point is to highlight that I never tried self-closing other tags. I actually thought about it for <script> tags, but reasoned that they can have a child (inline scripts!) so that’s why they couldn’t use the self-closing tag. The hypothetical of a newcomer trying <iframe /> after learning about /> never happened for me. That’s due, in large part, to how I think newcomers learn - they pattern match. I never saw <iframe /> so I never made an <iframe />. I never saw <img></img> so I never wrote <img></img>. While I may have lacked the larger point of “learning the meanings and why,” using self-closing tags didn’t result in any confusion or weird behavior. The fear of /> being bad for newcomers seems overblown to me.

On the flip side, I would argue self-closing tags are actually helpful, primarily when reading HTML as a human and when used only on void elements. Jake acknowledges this point briefly, saying:

But, does /> have to work to be useful? Code comments don’t ‘work’. Just like />, they’re an indication, they might be misleading, but that isn’t a good argument for removing code comments.

However, he disagrees with me on the usefulness of /> as it “doesn’t look like a comment” and “doesn’t always behave like a comment,” referencing the foreign content case.

I’m not convinced that “doesn’t look like a comment” is a particularly strong argument. The “comment” analogy is just that, an analogy - it doesn’t need to behave or look similar to existing comments to be useful. It’s either a helpful visual indicator that a tag is a void element, or it’s unhelpful clutter.

On the foreign content case, I think as long as usage lines up with expectation, it’s not an issue. Since I only advocate for self-closing tags on void elements, the parser ignoring the self-closing tag but closing the tag anyway because it’s special is effectively the same to the writer as though the parser closed the tag due to the self-closing tag. I don’t see that being a problem nor misleading unless you try self-closing non-void elements in HTML.
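
The "unless" in that last sentence is something a tool could check for. Below is a hedged Python sketch of such a check: it flags /> only where it is misleading, meaning on a non-void HTML element outside of <svg> or <math> (foreign content is where /> really does close the element). The class name MisleadingSlashFinder is hypothetical, html.parser is used only as a tokenizer, and foreign content tracking is simplified to "inside an <svg> or <math> element", which ignores details such as <foreignObject>.

```python
from html.parser import HTMLParser

# Void elements listed in the HTML Standard.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}
# Elements that switch the parser into foreign content (simplified).
FOREIGN_ROOTS = {"svg", "math"}


class MisleadingSlashFinder(HTMLParser):
    """Collects (line, tag) for every "/>" the HTML parser would ignore on a non-void element."""

    def __init__(self):
        super().__init__()
        self.foreign_depth = 0
        self.misleading = []

    def handle_starttag(self, tag, attrs):
        if tag in FOREIGN_ROOTS:
            self.foreign_depth += 1

    def handle_endtag(self, tag):
        if tag in FOREIGN_ROOTS and self.foreign_depth > 0:
            self.foreign_depth -= 1

    def handle_startendtag(self, tag, attrs):
        # Inside foreign content, and on <svg />/<math /> themselves, "/>" works.
        if self.foreign_depth > 0 or tag in FOREIGN_ROOTS:
            return
        # On void elements the slash is ignored but harmless.
        if tag in VOID_ELEMENTS:
            return
        line, _column = self.getpos()
        self.misleading.append((line, tag))


def find_misleading_slashes(html):
    finder = MisleadingSlashFinder()
    finder.feed(html)
    finder.close()
    return finder.misleading


sample = '<div />\n<img src="xyz" />\n<svg><circle r="1" /></svg>\n<iframe />'
for line, tag in find_misleading_slashes(sample):
    print(f"line {line}: <{tag} /> does not self-close in HTML")
```

With this sample, only the <div /> and the <iframe /> are reported; the <img /> and the <circle /> inside the SVG are left alone.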

An Analogy for Self-closing Tags

Despite being told I’m terrible at analogies and metaphors, I continue to attempt them. Hopefully this is a good one:

Consider the use of a capital letter at the start of a sentence.

  • It’s not semantically important: it’s not necessary for a “parser” to determine the start of a sentence, as it would be able to infer it from grammatical rules and the use of punctuation. In other words, it doesn’t do anything. A capital letter doesn’t indicate the start of a sentence, the punctuation does. Much like the /> doesn’t close the tag, the tag is self-closed based on the type and the /> is ignored.
  • One could argue it’s confusing: proper nouns are also capitalized, for example, and now it’s ambiguous if the start of a sentence is using a proper noun or not since it’s capitalized. Readers might think that first word “Readers” is proper and write Readers capitalized elsewhere, when it’s not. Much like Jake’s argument how if /> is used in void elements, readers might try to use /> on non-void elements too.
  • But it’s useful to human readers, as we don’t easily see the period-space as the periods are small (and look very similar to commas!) and spaces are everywhere in sentences.
  • And there’s nothing really to be gained by switching. It’s just extra pain and relearning for nothing.

Here’s an example paragraph with no capital letters to start sentences:

try reading this paragraph. or rather, try skimming it carefully, and notice how it feels like a run-on sentence. it never seems to terminate. try skimming it quickly, and count how many sentences it has. you might argue that's not something anyone really does. maybe instead try to find the fifth sentence, like you're trying to find a quote. but also, note how you feel reading this paragraph. you're not used to it. something feels off or "incorrect" about it. maybe it gets worse if I use a proper noun like Seattle, so there are capital letters in the middle of the sentence. that might cause more confusion because your human eye is probably trained to scan for those big capital letters, even if it's not useful here. maybe I should use something proper after a comma. the writer of the original article, Jake, made several arguments against the use of self-closing tags.

Compare that last paragraph to its equivalent with capital letters at the start of sentences:

Try reading this paragraph. Or rather, try skimming it carefully, and notice how it feels like a run-on sentence. It never seems to terminate. Try skimming it quickly, and count how many sentences it has. You might argue that's not something anyone really does. Maybe instead try to find the fifth sentence, like you're trying to find a quote. But also, note how you feel reading this paragraph. You're not used to it. Something feels off or "incorrect" about it. Maybe it gets worse if I use a proper noun like Seattle, so there are capital letters in the middle of the sentence. That might cause more confusion because your human eye is probably trained to scan for those big capital letters, even if it's not useful here. Maybe I should use something proper after a comma. The writer of the original article, Jake, made several arguments against the use of self-closing tags.

It’s a subtle stylistic difference. I suspect many people will read this and conclude that actually, we should get rid of self-closing tags and capital letters at the start of sentences, and that instead of defending self-closing tags I argued against capital letters. “We can retrain our brains to look for periods!” you argue. If so, then I think we just have a difference in stylistic choice and preference. We also probably have different thresholds for change - think about all the new textbooks that would have to be written to that new styleguide, the teachers that would need to relearn and teach it, and all the cascading impacts if the English world were to adopt the “no capital start of sentences” rule. It’s a lot of retraining, and I just don’t really see a true benefit to it.

Conclusion

It’s interesting to me that Jake and I approach this topic in such different ways. His approach is close to the core and the technology: It’s about the parser and semantic utility and correctness. I approach it from the natural evolution of easing humans into new paradigms: It’s about prioritizing what makes sense to the human developers and what fits their existing mental models.

I made a similar statement in my final Tweet:

That said, I agree somewhat with Jake’s conclusion on what Prettier should do. Given the similar conclusions, this article might seem weird. The interest I had in writing this article was less about the direct question of “should we continue using self-closing tags” and more about analyzing the arguments and exploring the mindsets and mental models. It’s also been especially useful as an exercise for myself to explore and understand why I’m so impulsively hesitant to let /> go.

In some ways, I’d liken Jake’s approach to a “prescriptive” one, where the spec defines how we should read and write HTML. If you’ll let me make another attempt at an analogy from the English language, it feels to me like a teacher informing you to say “Yes that’s she over there” instead of “Yes that’s her over there” in your papers. Meanwhile I’d liken my approach to a “descriptive” one, where I feel like if the number of people that read/write HTML a certain way is at a critical mass, then it’s canon. To continue the analogy, I’d be the annoying student that would argue with the wiser, more experienced, and technically correct teacher about how saying “Yes that’s she over there” sounds terribly wrong and awkward, and we should all stick to saying “Yes that’s her over there” instead.

Overall, this was a fun exercise and deep dive. I don’t think it warrants being categorized as a technical “deep dive” though (especially since I don’t think I’m particularly “correct”), so I’m making a new category called “soapbox” and sticking it in there. I am also considering a category called “hot-takes” or “things I’m actively wrong about but I don’t care,” but we’ll see. Anyways, congratulations on making it to the end! I’ll get off my soapbox now.


Written by Marcus Pasell, a programmer who doesn't know anything. Don't listen to him.