Special Characters --
Character and Entity References



Certain characters, such as the left angle bracket (<) and the ampersand (&), are reserved by HTML for special purposes, such as marking the start of an HTML element or of a reference to a graphic character. In addition, there are many ISO Latin-1 characters that you may wish to include in a document but that are not readily available on a standard keyboard.

HTML provides two mechanisms for representing these special characters: character references and entity references.

Character References

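Character references are composed of three parts: a leading ampersand and hash sign (&#), the decimal number of the character in the character set, and a terminating semicolon (;). For example, the character reference for the less-than symbol (<) is &#60;.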
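As a rough illustration, a character reference can be assembled from its three parts: the leading &#, the decimal code of the character, and the closing semicolon. The sketch below uses Python only to show the arithmetic; the helper name to_char_ref is hypothetical, and html.unescape from the standard library is used to decode the result back.

```python
import html

def to_char_ref(ch: str) -> str:
    """Build a decimal character reference: '&#' + code number + ';'."""
    return f"&#{ord(ch)};"

ref = to_char_ref("<")
print(ref)                 # &#60;
print(html.unescape(ref))  # <
```

Note that ord() returns the Unicode code point, which coincides with the ISO Latin-1 position for the first 256 characters.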

Note that this number depends on the character set being used -- for example, in some character sets, the character at position 60 may not be the less-than symbol. Thus it is more convenient (and universal) to have a symbolic reference for a character, as opposed to an absolute numeric reference. In HTML (and SGML) such references are called entity references.

Entity References

Entity references are similar, but use symbolic names to represent the characters. Entity references also have three parts: a leading ampersand (&), the symbolic name of the character, and a terminating semicolon (;). Thus the entity reference for the less-than symbol (<) is &lt;.
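The same idea can be sketched for entity references, which consist of an ampersand, a symbolic name, and a semicolon. The helper to_entity_ref below is hypothetical; name2codepoint, html.escape, and html.unescape come from Python's standard library, whose entity table follows a later HTML version than the ones discussed here.

```python
import html
from html.entities import name2codepoint

def to_entity_ref(name: str) -> str:
    """Build an entity reference: '&' + symbolic name + ';'."""
    return f"&{name};"

ref = to_entity_ref("lt")
print(ref)                          # &lt;
print(html.unescape(ref))           # <
print(name2codepoint["lt"])         # 60, the same number used in &#60;
print(html.escape("a < b & c"))     # a &lt; b &amp; c
```

In practice, a function such as html.escape is the usual way to replace the reserved characters in text before placing it in an HTML document.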

Beware of Certain Entity References!

Note that, in HTML 2, not all valid characters have corresponding entity references. In these cases you may need to use the numeric character references directly. HTML 3 and 3.2 attempted to rectify this by defining a number of additional references, but some of the newer entity references are not understood by all browsers. These newer entity references are shown slightly indented and in italics in the ISO data table, which also gives the decimal numeric codes for all the ISO Latin-1 characters.
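One cautious strategy is to use an entity reference only when its name is known to be widely understood, and to fall back to the numeric character reference otherwise. The sketch below assumes this approach; the function reference_for and the set widely_understood are hypothetical and purely illustrative, and the set is not a list of the entities defined by any particular HTML version.

```python
from html.entities import codepoint2name

def reference_for(ch: str, known_names: set) -> str:
    """Return an entity reference if the name is trusted, else a numeric one."""
    code = ord(ch)
    name = codepoint2name.get(code)  # None if no entity name is defined
    if name in known_names:
        return f"&{name};"
    return f"&#{code};"

# Illustrative set only; choose names according to the browsers you target.
widely_understood = {"lt", "gt", "amp", "quot"}

print(reference_for("<", widely_understood))       # &lt;
print(reference_for("\u00a9", widely_understood))  # &#169; (copyright sign)
```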

Character Set/Entity Reference Test Documents

The ISO data table document lists all the ISO Latin-1 characters, alongside their numerical positions in the character set (both decimal -- used by HTML character references, and hexadecimal -- used by URL character encodings) and the corresponding entity reference, if defined.
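To illustrate how the decimal and hexadecimal positions relate, the sketch below derives both forms for a single ISO Latin-1 character. The helper latin1_codes is hypothetical; urllib.parse.quote is the standard-library function for URL encoding, called here with encoding="latin-1" on the assumption that the URL is meant to carry ISO Latin-1 bytes.

```python
from urllib.parse import quote

def latin1_codes(ch: str) -> tuple:
    """Return (HTML character reference, URL encoding) for one Latin-1 character."""
    code = ch.encode("latin-1")[0]  # position in ISO Latin-1 (0-255)
    return f"&#{code};", f"%{code:02X}"

print(latin1_codes("<"))                    # ('&#60;', '%3C')
print(latin1_codes("\u00e9"))               # ('&#233;', '%E9') for e with acute accent
print(quote("\u00e9", encoding="latin-1"))  # %E9
```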