HTML Self-closing Tags and Mental Models
July 08, 2023
Context
I have a confession: I'm a longtime internet lurker, and this is one of the few times I've come out of the woodwork to actually say anything on social media. Embarrassingly, I still don't fully understand how to use Twitter (Am I supposed to reply to myself? Am I supposed to end with "/1" or something if it doesn't fit in one tweet? How does anyone use this and why does it exist?).
Anyways, I stumbled upon this post from Jake Archibald, @jaffathecake:
It still seems weird to me that Prettier will turn <br> into <br />.
The / is there from XHTML - a standard that became redundant in the late 2000s.
The space before the / is there for compatibility with Netscape Navigator 4 - an engine that was dropped in the late 90s.
- Jake Archibald (@jaffathecake) July 6, 2023
He followed up with a blog post of the main counterpoints he was getting in the Twitter thread.
I got tired of responding to the same points, so here's a blog post https://t.co/U9Nf8Q9Hh5
- Jake Archibald (@jaffathecake) July 6, 2023
This was such an interesting read for me. My foray into web development started with XHTML when I was probably 13 years old. I remember fondly running my HTML through the W3C validator and learning a bit by trial and error, and the pride and joy of putting the validated sticker on every website I made.
Seeing the old <!DOCTYPE ...> and <?xml version="1.0" ...> stuff again in Jake's blog was a blast of nostalgia. It was a good refresher, and it put into context a lot of things I had learned but didn't fully understand, along with a timeline I only half remembered.
I ended up having a bit of a back and forth on the Twitter thread, but I feel like I used Twitter wrong and it was hard to express my thoughts that concisely. I thought to myself, "Well, I've got this blog over here, and this smells like content!" so here I am writing a blog post.
I want to preface this article by saying I thoroughly enjoyed reading Jake's blog post and he makes a lot of great points. I respect his opinions, but I still have my own. I also think there's an interesting difference in our mental models that I'm curious to explore.
Worth noting here is that I come from an XHTML/HTML background and not so much an XML one. Everything here is also prone to stylistic choices and opinions, so I'm not going to "disprove" or "discredit" anything, nor do I think I'm "right" or that Jake's "wrong" in his choices and opinions.
In fact, I tend to agree with Jake's take on what Prettier should do:
I think Prettier should […] fix cases where /> is actively misleading.
My version of this would be stripping /> from all non-void elements. A void element (as an aside, in that same link MDN addresses how there are no self-closing tags in HTML) is an element that cannot have children, like <img> or <br>, and doesn't have a closing tag. So for example, my proposal would have Prettier take something like:
<div />
Hello
<img src="xyz">
…and reformat it to:
<div>
Hello
<img src="xyz" />
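The rule proposed above can be sketched in a few lines of TypeScript. This is only an illustration, not how Prettier works: `normalizeSelfClosing` is a hypothetical helper, and the regex is a simplifying assumption. It does not handle comments, `<script>` or `<style>` contents, or foreign content such as SVG, where /> is meaningful. A real formatter would work from a parsed tree.

```typescript
// Void elements per the HTML Standard; these never take a closing tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Matches a start tag: tag name, attributes (quoted or unquoted values),
// optional whitespace, and an optional trailing "/".
const START_TAG =
  /<([a-zA-Z][a-zA-Z0-9-]*)((?:\s+[^\s"'<>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>=]+))?)*)\s*(\/?)>/g;

// Hypothetical helper (not part of Prettier): strip "/>" from non-void
// elements and write void elements consistently as "<name ... />".
function normalizeSelfClosing(html: string): string {
  return html.replace(START_TAG, (_match: string, name: string, attrs: string) => {
    return VOID_ELEMENTS.has(name.toLowerCase())
      ? `<${name}${attrs} />` // void element: keep the visual marker
      : `<${name}${attrs}>`; // non-void element: the "/" would be misleading
  });
}

console.log(normalizeSelfClosing('<div />\nHello\n<img src="xyz">'));
```

Run on the example input, this prints the reformatted version shown above: the misleading /> is dropped from the <div>, and the <img> gains one.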
Finally, admittedly, you might be asking if it's really worth fighting over such trivial things. This is definitely very close to the "tabs vs. spaces" religious debate in my mind.
I can relate to Kent C. Dodds @kentcdodds:
I don't know why I'm arguing about this. All of this doesn't matter for me because I very rarely write HTML that's not processed before it hits the browser
- Kent C. Dodds (@kentcdodds) July 6, 2023
I've spent way too much time dissecting something that, ironically, I really don't think I care that much about. It was a good exercise in writing, though, and it's revealing of some other concepts and modes of thinking, and (hopefully) insightful as an exploration for you, dear reader!
My Commentary
My general takeaway is I'm not convinced that /> is actively harmful when used only on void elements. In fact, I still think there's some marginal utility in keeping it around, even if it has no functional use in HTML parsers and is ignored (which seems to be the crux of Jake's argument). Where Jake sees the use of /> as more confusing and troublesome for newcomers, and a potential source of bugs and weird behavior, I see it as a helpful visual indicator that can help newcomers more easily identify these void elements at a glance.
On the Space
In XML, it would generally be formatted like <this/>, without the space before the /, but Netscape Navigator 4 couldn't cope with <input type="text"/>, where the / immediately followed an attribute, so the spec recommended a space before the /.
I agree with Jake in that it would be silly to keep this around just for Netscape. However, I'm not fully convinced that removing the space would be better. I don't really see a strong benefit. Do we need the space? No. But it does add a bit of pleasant separation. Yes, in other tags there's no space between the last attribute or the tag name and the closing >, so I could be swayed, but after years of seeing that space it provokes a weird unsettling feeling to not see it.
I think this "awkwardness" feeling would likely go away over time, but I'm still resistant. I feel like I'm the old timer still using two spaces after a period in my emails and trying to convince younger folks that I'm right for doing so because that's how I was taught. Despite knowing it's not necessary anymore, I just don't see much reason to remove the space in self-closing tags. I'm not convinced there's enough cause for me to change my habits.
I acknowledge that Jake isn't advocating for me to change my habits as much as he's advocating for the autoformatter to remove unnecessary self-closing tags rather than inserting them. In my double space analogy, he's proposing that autocorrect removes the double spaces rather than inserting them.
On Error Tolerance
During the history lesson Jake shares at the start of his blog post, he briefly mentions the following as an explanation for why we moved away from strict XHTML:
Ask yourself: If you visit the website of your local doctor's surgery to find out the opening hours, which browser is best: The one that displays the opening hours of the surgery, or the one that displays an XML parsing error message?
This is so interesting. I've adopted the "fail fast" over "error tolerance" mindset pretty hard, so from the point of view of the programmer I think I'd rather know that the markup was invalid. But Jake is absolutely right here: the consumer wants to see the website!
Last week, we had a bug at work where something was broken for weeks and went unnoticed because we had clever fallback logic. While that fallback logic helped fix what we showed the users, it didn't fix our backend and wasn't meant to be the main path. Although better telemetry and alerting would be the best fix for this, it's hard to ignore that the error would have been caught sooner without the masking fallback - we would have immediately seen it in testing.
In the case of rendering HTML, however, I understand why browsers wanted error tolerance. I imagine it's the result of incentives - if one browser were to tolerate errors, even if the agreed spec was to "throw up" and refuse to render anything, users would gravitate to the browser that "just worked" - even if it went against spec. It's probably the right choice, too - the web is a messy place and things should still function as best we can make them. I'm sure many, many websites are chock full of little errors.
This didn't have much impact on the overall conclusion, but it was fun to explore. It challenged my "fail fast" mindset and will probably make me less dogmatic and more critical in my thinking in the future.
On JSX
Some Twitter users commented that Prettier's behavior exists to keep compatibility with JSX.
Jake responds to that directly:
JSX and HTML are different formats. They aren't consistent with each other. Pretending they're consistent is misleading.
While this is definitely true, I find it's unfair to pretend one of the most popular use cases of JSX right now isn't writing components for React that look and feel very, very similar to HTML. The points about className vs class and other differences of behavior are valid - there are little nuances to understand when developing one or the other. But when handwriting JSX and HTML, it makes sense to try and make the experiences as similar as possible to have the easiest time context switching between the two (and for a tool like Prettier, writing the formatter in a way that allows the most reuse between the languages). I think the consistency argument is dismissed simply due to the computer (parsers) treating them differently, but the human also matters here, and not having to think hard about the nuances between the two systems makes our lives easier. I'd argue that writing e.g. <br /> in both JSX and HTML helps facilitate that. Put another way, just because some things are different about writing JSX vs HTML doesn't mean more things should be different.
On XML
Similarly, Jake fielded complaints about XML compatibility and responded on his blog:
Call me some sort of purist, but if I want to parse an HTML document, I'll use an HTML parser. I wouldn't try and write JSON so it can be parsed by a YAML parser, so I don't see why I'd do the same with HTML and XML.
And that's fair. But it's hard to deny that HTML looks and feels very, very similar to XML, and XML's influence is especially clear after the history lesson about XHTML earlier in his blog post. Divergence from writing XML-like syntax doesn't seem like it's entirely in the programmer's best interest. If I'm coming from knowing XML and I'm reading <br> and see no closing tag, that's another addition to the learning curve for me to learn HTML. A small, super minor one, but one that doesn't need to exist.
I have to acknowledge that if I'm coming from XML and discover I can't do <div /> and get what I expect, that is a bigger learning curve bump, and I'm sure Jake would argue that using <br /> and the others makes this transition less clear - he'd probably prefer XML writers more quickly unlearn /> entirely and instead learn to write more closely to what the parser actually uses (i.e. unlearn XML and relearn HTML as a separate concept). I think there's a balance to strike here between throwing out everything you know and keeping things familiar, and I'm probably more on the latter side of the spectrum than Jake.
What's more, Prettier's existing implementation works with XHTML and HTML out of the box, which is pretty cool (even if nobody writes XHTML anymore).
This is a somewhat weak argument, but I think it's another example of a difference in the way Jake and I approach the topic: I'm coming from the mindset of someone sitting and reading/writing HTML as a new experience/black box, whereas I think he's coming at it with more knowledge of the parser and spec. To him, it probably feels extraneous or superfluous, given he knows the parser ignores it. But to me, it feels more correct even if it's useless - sure, it doesn't do anything, but it makes it seem more correct to me.
My internal mental model is "everything should close and thus have a visible closing tag or be a self-closing tag" despite the parser not actually matching that model. If anything, to me, the parser allowing the lack of self-closing tags feels like the parser being extra robust and tolerant of errors. The truth (according to Jake) is that /> is less "correct" per the HTML spec (HTML doesn't have self-closing tags!), and the parser is being error tolerant by allowing you to include it and ignoring it. This feels as jarring as when I learned how to spell "Berenstain Bears" correctly.
On Newcomers
This is probably my biggest disagreement with Jake, and the one I spent time going back and forth with him on Twitter.
I think that's particularly bad for newcomers. Imagine you'd never seen <img src="…"> before. You'd see, unlike other elements, it doesn't have a closing tag. Debuggers and validators don't complain about it, suggesting there's something particular about this element you need to learn - it doesn't need to close, it self-closes, and it's particular in this behaviour.
Jake thinks it's good for newcomers to learn that <img> is a unique tag that doesn't require a closing tag, and on that much we fully agree! This is the right takeaway for newcomers coming across an <img> in the wild. But how long does it take to recognize that a tag doesn't have a closing tag? It could be a lot of lines and scrolling before they realize that all that text isn't inside an <img> tag. And how likely are they to assume it's not an error? Everything else closes, and there's nothing on the surface that makes these elements look special or suspicious.
He goes on to say:
Now imagine you'd never seen <img src="…" /> before. You look up this new syntax you've discovered, and learn that it means a tag is "self-closing". At this point, why wouldn't you assume <iframe /> is self closing too? Or that <img src="…"></img> is valid?
I have a really interesting difference in mental model for how I learned about self-closing tags. It's likely due to not coming from an XML background, but when I saw <img /> my takeaway wasn't "the /> makes any tag self-closing"; it was "the <img> is a special element that can't have children, and thus uses /> instead of a closing tag." I never saw self-closing tags for anything besides the special void elements. There weren't ever examples with <iframe /> or <div /> or <script />. From seeing that example, I didn't learn the XML feature of /> as a separate functional thing that changes behavior, but rather that /> was something uniquely tied to these special elements.
"But this is wrong!" I can hear Jake argue, "The self-closing tag isn't linked to those elements at all. The parser doesn't recognize it ever - it ignores it! You're learning the wrong thing!" And there's truth in that - I learned to always self-close my <img> tags, and hung onto that even after moving from XHTML to HTML5, none the wiser that my self-closing tags were being ignored. But my point is to highlight that I never tried self-closing other tags. I actually thought about it for <script> tags, but reasoned that they can have a child (inline scripts!) so that's why they couldn't use the self-closing tag. The hypothetical of a newcomer trying <iframe /> after learning about /> never happened for me. That's due, in large part, to how I think newcomers learn - they pattern match. I never saw <iframe /> so I never made an <iframe />. I never saw <img></img> so I never wrote <img></img>. While I may have lacked the larger point of "learning the meanings and why," using self-closing tags didn't result in any confusion or weird behavior. To me, the fear of /> being bad for newcomers seems overblown.
On the flip side, I would argue self-closing tags are actually helpful, primarily when reading HTML as a human and when used only on void elements. Jake acknowledges this point briefly, saying:
But, does /> have to work to be useful? Code comments don't "work". Just like />, they're an indication, they might be misleading, but that isn't a good argument for removing code comments.
However, he disagrees with me on the usefulness of /> as it "doesn't look like a comment" and "doesn't always behave like a comment," referencing the foreign content case.
I'm not convinced that "doesn't look like a comment" is a particularly strong argument. The "comment" analogy is just that, an analogy - it doesn't need to behave or look similar to existing comments to be useful. It's either a helpful visual indicator that a tag is a void element, or it's unhelpful clutter.
On the foreign content case, I think as long as usage lines up with expectation, it's not an issue. Since I only advocate for self-closing tags on void elements, the parser ignoring the self-closing tag but closing the tag anyway because it's special is effectively the same to the writer as though the parser closed the tag due to the self-closing tag. I don't see that being a problem nor misleading unless you try self-closing non-void elements in HTML.
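To make that reasoning concrete, here is a toy model of the behavior being discussed. It is a deliberate simplification of the HTML parsing rules, not a parser, and `closesImmediately` is a hypothetical name made up for this sketch. The assumption is that in HTML content the trailing / is ignored and only the element type matters, while in foreign content (SVG, MathML) the / is what closes the element.

```typescript
type Namespace = "html" | "svg" | "mathml";

// Void elements per the HTML Standard.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical toy model: does a start tag leave the element closed,
// i.e. unable to contain whatever markup follows it?
function closesImmediately(
  tagName: string,
  hasTrailingSlash: boolean,
  ns: Namespace = "html",
): boolean {
  if (ns === "html") {
    // In HTML content the "/" is ignored; only the element type matters.
    return VOID_ELEMENTS.has(tagName.toLowerCase());
  }
  // In foreign content the "/" really does self-close the element.
  return hasTrailingSlash;
}

console.log(closesImmediately("br", true)); // true
console.log(closesImmediately("br", false)); // true
console.log(closesImmediately("div", true)); // false
console.log(closesImmediately("circle", true, "svg")); // true
```

For a void element the result is the same with or without the slash, which is why the two explanations look identical to the writer. The model and the expectation only disagree for a case like <div />, the one I'd have the formatter strip.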
An Analogy for Self-closing Tags
Despite being told I'm terrible at analogies and metaphors, I continue to attempt them. Hopefully this is a good one:
Consider the use of a capital letter at the start of a sentence.
- It's not semantically important: it's not necessary for a "parser" to determine the start of a sentence, as it would be able to infer it from grammatical rules and the use of punctuation. In other words, it doesn't do anything. A capital letter doesn't indicate the start of a sentence, the punctuation does. Much like the /> doesn't close the tag, the tag is self-closed based on the type and the /> is ignored.
- One could argue it's confusing: proper nouns are also capitalized, for example, and now it's ambiguous if the start of a sentence is using a proper noun or not since it's capitalized. Readers might think that first word "Readers" is proper and write Readers capitalized elsewhere, when it's not. Much like Jake's argument that if /> is used in void elements, readers might try to use /> on non-void elements too.
- But it's useful to human readers, as we don't easily see the period-space as the periods are small (and look very similar to commas!) and spaces are everywhere in sentences.
- And there's nothing really to be gained by switching. It's just extra pain and relearning for nothing.
Here's an example paragraph with no capital letters to start sentences:
Compare that last paragraph to its equivalent with capital letters at the start of sentences:
It's a subtle stylistic difference. I suspect many people will read this and conclude that actually, we should get rid of self-closing tags and capital letters at the start of sentences, and that instead of defending self-closing tags I argued against capital letters. "We can retrain our brains to look for periods!" you argue. If so, then I think we just have a difference in stylistic choice and preference. We also probably have different thresholds for change - think about all the new textbooks that would have to be written to that new styleguide, the teachers that would need to relearn and teach it, and all the cascading impacts if the English world were to adopt the "no capital start of sentences" rule. It's a lot of retraining, and I just don't really see a true benefit to it.
Conclusion
It's interesting to me that Jake and I approach this topic in such different ways. His approach is close to the core and the technology: It's about the parser and semantic utility and correctness. I approach it from the natural evolution of easing humans into new paradigms: It's about prioritizing what makes sense to the human developers and what fits their existing mental models.
I made a similar statement in my final Tweet:
Your conclusion is basically: "Given that parsers ignore ` />` across the board, programmers should not include ` />` even when tags are self closing."
My argument is: "Regardless of parsers, the ` />` is a useful visual indicator to human readers when used appropriately"
- RickyRombo (@rickyrombo) July 7, 2023
That said, I agree somewhat with Jake's conclusion on what Prettier should do. Given the similar conclusions, this article might seem weird. The interest I had in writing this article was less about the direct question of "should we continue using self-closing tags" and more about analyzing the arguments and exploring the mindsets and mental models. It's also been especially useful as an exercise for myself to explore and understand why I'm so impulsively hesitant to let /> go.
In some ways, I'd liken Jake's approach to a "prescriptive" one, where the spec defines how we should read and write HTML. If you'll let me make another attempt at an analogy from the English language, it feels to me like a teacher informing you to say "Yes that's she over there" instead of "Yes that's her over there" in your papers. Meanwhile I'd liken my approach to a "descriptive" one, where I feel like if the number of people that read/write HTML a certain way is at a critical mass, then it's canon. To continue the analogy, I'd be the annoying student that would argue with the wiser, more experienced, and technically correct teacher about how saying "Yes that's she over there" sounds terribly wrong and awkward, and we should all stick to saying "Yes that's her over there" instead.
Overall, this was a fun exercise and deep dive. I don't think it warrants being categorized as a technical "deep dive" though (especially since I don't think I'm particularly "correct"), so I'm making a new category called "soapbox" and sticking it in there. I am also considering a category called "hot-takes" or "things I'm actively wrong about but I don't care," but we'll see. Anyways, congratulations on making it to the end! I'll get off my soapbox now.