Pitfalls in HTML

HTML_MISTAKE is an HTML document that discusses some mistakes that the HTML language seems to foster.


HTML_MISTAKE is available in a C version, a FORTRAN77 version, a FORTRAN90 version, an HTML version, and a MATLAB version.

Special Characters:

It should be no surprise to a writer of HTML code that certain characters are going to be treated specially. In particular, an HTML document is liable to be simply infested with the "less-than" and "greater-than" characters, also known as "angle brackets", since these are used to delimit many HTML tags.

However, the "less-than" and "greater-than" characters have those names precisely because they have an important role in mathematical formulas. A mathematician will frequently wish to say that a quantity x is less than a quantity y, and can be forgiven for writing this in HTML code hastily as x

Oops, did you not see what I wrote there? I wrote "x less than y" but I used the symbol for "less than". Let's try to do that again: "x < y".

The first time, I didn't put spaces around the less than sign. Unfortunately, HTML uses the less than sign as a metacharacter: it doesn't mean "less than", it means "This is the beginning of some special HTML information." Using the less than sign incorrectly (that is, correctly in a normal world, but in the HTML world you have to ask permission to do so!) is enough to confuse a browser into thinking I have begun some HTML tag, which ought to be invisible to the reader. The browser assumes that the tag continues until either a matching "greater than" sign is encountered or (HTML being a surprisingly forgiving parenthesis language) a new "less than" sign is encountered. In the second case, since a new tag is being started, the browser must assume the previous tag was abruptly terminated.

Now that means you can sometimes get away with using a less than sign simply by placing a space between it and any subsequent character, since at least some browsers will assume that this means a literal less than was intended, not a tag.
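If you would rather not trust your eyes, a few lines of script can point out the dangerous spots before the text goes into a page. What follows is only a rough sketch in Python, and it rests on an assumption that is mine, not something tested in this document: that a browser treats a less than sign as the start of markup only when the very next character is a letter, a slash, an exclamation mark or a question mark, which is how I read the tokenizer rules of the current HTML standard. The name find_risky_less_than is made up for the illustration, and the function is meant for plain text such as a formula, not for a page that already contains real tags.

```python
import re

# Assumption: a "<" starts markup only when the next character is a letter,
# "/", "!" or "?". Followed by anything else (a space, a digit), the "<"
# is displayed literally.
_MARKUP_START = re.compile(r"<(?=[A-Za-z/!?])")


def find_risky_less_than(text):
    """Return the offsets of "<" characters likely to be read as markup."""
    return [match.start() for match in _MARKUP_START.finditer(text)]


if __name__ == "__main__":
    for sample in ("x < y", "x<y", "a<b and c < d"):
        print(repr(sample), find_risky_less_than(sample))
```

An empty list means the text would probably survive as typed. Any offset in the list marks a less than sign that should be escaped or at least followed by a space.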

You might, however, prefer not to have unnecessary spaces, in which case you might be tempted to try the <pre>text</pre> construction, known as "preformatted text", which seems to allow you to type something between the tags that will then be reproduced verbatim. The trouble is, of course, that once the browser sees the opening <pre> tag, it has to read the subsequent text in such a way that it catches the closing </pre> to close the quote. But a browser may also notice anything else in the intended quote region that looks like a tag, and mistakenly treat it as such. Let's try a couple of examples:

Here is <pre> x < y </pre> :

 x < y 

Here is <pre>x<y</pre> :


Wow, that stray misunderstood tag has really messed up the page now!

Of course, HTML offers you an escape. Simply replace every literal occurrence of less than by &lt; and every literal occurrence of greater than by &gt;. This is a solution, but it's an annoyingly ugly one. On top of that, HTML seems to let you sneak by sometimes, when you have spaces that suggest that a less than is NOT a tag, and your browser is forgiving, so you forget that you are in the danger zone.
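When the page is produced by a program rather than typed by hand, the ugly replacement can at least be done automatically. Here is a minimal sketch using the escape function from Python's standard html module. The helper names to_html_text and to_pre_block are invented for this example. Note that escape also replaces the ampersand, which is necessary because the ampersand is the character that introduces &lt; and &gt; in the first place.

```python
from html import escape


def to_html_text(text):
    """Replace "&", "<" and ">" by their HTML escapes."""
    # quote=False leaves quotation marks alone; they are harmless in body text.
    return escape(text, quote=False)


def to_pre_block(text):
    """Wrap the escaped text in a preformatted block."""
    return "<pre>" + to_html_text(text) + "</pre>"


if __name__ == "__main__":
    print(to_html_text("x<y"))   # x&lt;y
    print(to_pre_block("x<y"))   # <pre>x&lt;y</pre>
```

The escaping is still required inside the preformatted block; the tags only preserve spacing and line breaks, they do not switch off tag recognition.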

Of course, the whole question of escape characters brings me back to the infuriating question of "Where's that META key they keep talking about?" and the fact that, really, when we write about writing, we essentially need to use two keyboards, one in the "target" language and one in the "meta" language.

In any case, that's the choice HTML made, and now we have to live with the fact that human beings (in the increasingly rare cases when it's a human being writing an HTML page!) will gladly fall into any pit that has been prepared for them.


Last revised on 07 July 2016.