Specials is the name of a range of characters in the Unicode character encoding standard.
The Specials code block contains code values that are neither control characters nor graphic characters, but are provided to facilitate current software practices. Of the 16 reserved code points, only 5 have been allocated as of Unicode 3.2.
Byte Order Mark (BOM)
The special character code U+FFFE is guaranteed never to be a valid Unicode character. It is used in conjunction with U+FEFF zero width no-break space (in the Arabic Presentation Forms-B block) to identify character set and byte order. By convention, a zero width no-break space is often placed at the beginning of a Unicode text file, where it neither adds semantics nor alters the display.
When Unicode is stored as 16-bit integers (UTF-16), the concept of byte order rears its ugly head. If your Unicode file begins with the 16-bit value FFFE, you've most likely got a valid Unicode file in the reverse byte order from what your machine expects. Similarly, if you have an unknown file that begins with the bytes FF FE or FE FF, you're probably looking at a UTF-16 text file. If the file starts with EF BB BF, you're probably looking at a UTF-8 encoded Unicode file (EF BB BF is the UTF-8 encoding of U+FEFF). Files starting with 00 00 FE FF or FF FE 00 00 are probably UTF-32. Note that FF FE 00 00 also begins with FF FE, so the four-byte signatures have to be checked before the two-byte ones.
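The byte patterns described above can be checked mechanically. The following is a minimal sketch, not a complete encoding detector: the function name `sniff_bom` and the table `_BOMS` are made up for illustration, while the `codecs.BOM_*` constants come from the Python standard library.

```python
import codecs

# Longest signatures first: the UTF-32 little-endian BOM (FF FE 00 00)
# starts with the UTF-16 little-endian BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    sample = "hello".encode("utf-16")  # Python prepends a BOM for "utf-16"
    encoding, skip = sniff_bom(sample)
    print(encoding, sample[skip:].decode(encoding))
```

A file without a BOM yields `(None, 0)`, so the caller still needs some other way to choose an encoding in that case.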
In some applications, there is annotating text that is related to a string of annotated text. In these cases, there are some operations which need to ignore the annotations, and others that want to include them. To this end, Unicode provides three markup characters: an anchor, a separator and a terminator. To specify out-of-band data this way, the text stream stores, in order:
interlinear annotation anchor
The Annotated Text
interlinear annotation separator
The Annotating Text
interlinear annotation terminator
Multiple occurrences of interlinear annotation separator are allowed, which would then delimit the annotating text into application-specific sections. Annotations may be nested.
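The anchor / separator / terminator layout can be sketched in a few lines. This is only an illustration under assumptions: the helper names `annotate` and `strip_annotations` are hypothetical, and the stripping routine represents one possible "ignore the annotations" operation, not a behaviour prescribed by the standard.

```python
ANCHOR = "\ufff9"      # interlinear annotation anchor
SEPARATOR = "\ufffa"   # interlinear annotation separator
TERMINATOR = "\ufffb"  # interlinear annotation terminator


def annotate(base: str, *notes: str) -> str:
    """Wrap base text and one or more annotating sections in the markers."""
    return ANCHOR + base + "".join(SEPARATOR + n for n in notes) + TERMINATOR


def strip_annotations(text: str) -> str:
    """Keep the annotated text; drop annotating text and the markers.

    Nested annotations are tracked with a stack. Unbalanced separators or
    terminators outside any annotation are passed through unchanged.
    """
    out = []
    stack = []  # one flag per open annotation: True once a separator was seen
    for ch in text:
        if ch == ANCHOR:
            stack.append(False)
        elif ch == SEPARATOR and stack:
            stack[-1] = True
        elif ch == TERMINATOR and stack:
            stack.pop()
        elif not any(stack):
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    s = "read " + annotate("漢字", "かんじ") + " now"
    print(strip_annotations(s))
```

An operation that wants to include the annotations would instead collect the characters seen while the top-of-stack flag is set.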
U+FFFC object replacement character is used as an insertion point for objects located within a stream of text. All information about the object is kept outside of the character stream. This character simply provides an anchor to assure correct placement of the object within the text stream.
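As a rough sketch of the idea, the objects can live in a side table while the text holds one U+FFFC per object. The function `render_plain` and the list-based side table are assumptions made for this example; real applications keep object data in whatever structure suits them.

```python
OBJECT_REPLACEMENT = "\ufffc"


def render_plain(text: str, objects: list) -> str:
    """Replace each U+FFFC with a bracketed label for the matching object.

    Objects are matched to placeholders by position. A placeholder with no
    corresponding entry is rendered as "[missing object]".
    """
    remaining = iter(objects)
    out = []
    for ch in text:
        if ch == OBJECT_REPLACEMENT:
            obj = next(remaining, "missing object")
            out.append(f"[{obj}]")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    text = "See \ufffc and \ufffc."
    print(render_plain(text, ["chart.png", "table 2"]))
```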
U+FFFD replacement character is a catchall for characters that cannot otherwise be encoded in terms of known Unicode values.
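One common place to meet U+FFFD is when decoding bytes that are not valid in the expected encoding. The short example below uses Python's built-in `errors="replace"` handler; the sample bytes are made up for illustration.

```python
REPLACEMENT = "\ufffd"

# 0xE9 is "é" in Latin-1, but on its own it is not valid UTF-8.
raw = b"caf\xe9 ok"

# errors="replace" substitutes U+FFFD for the undecodable byte
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(text.count(REPLACEMENT), "replacement character(s) in", repr(text))
```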
As described above, U+FFFE will never be assigned and is reserved for use in determining byte order.
U+FFFF will also never be a valid Unicode character, and is suitable for use as an error code or other non-character value.
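A check for these values might look like the sketch below. The function name `is_noncharacter` is hypothetical. Beyond the two code points discussed here, the Unicode Standard also designates U+FDD0..U+FDEF and the last two code points of every plane as noncharacters, and this sketch covers that full set.

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the code points permanently reserved as noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


if __name__ == "__main__":
    for cp in (0xFFFD, 0xFFFE, 0xFFFF):
        print(f"U+{cp:04X}", is_noncharacter(cp))
```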
Unicode's Specials code block reserves the 16 code points from U+FFF0 to U+FFFF, of which 5 are currently assigned.
Number of characters added in each version of the Unicode standard :
Unicode 1.1 : 1
Unicode 2.1 : 1
Unicode 3.0 : 3
Number of characters in each General Category :
Symbol, Other So : 2
Other, Format Cf : 3
All the characters in this code block are in bidirectional category Other Neutral ON
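These counts can be cross-checked against the character database that ships with Python. This is a small sketch using the standard `unicodedata` module; the variable names are illustrative, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata
from collections import Counter

# All 16 code points of the Specials block, U+FFF0..U+FFFF.
block = [chr(cp) for cp in range(0xFFF0, 0x10000)]

# "Cn" is the category reported for unassigned code points and noncharacters.
assigned = [ch for ch in block if unicodedata.category(ch) != "Cn"]

by_category = Counter(unicodedata.category(ch) for ch in assigned)
bidi_classes = {unicodedata.bidirectional(ch) for ch in assigned}

if __name__ == "__main__":
    print(len(block), "code points,", len(assigned), "assigned")
    print(dict(by_category), bidi_classes)
```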
The columns below should be interpreted as :
- The Unicode code for the character
- The character in question
- The Unicode name for the character
- The Unicode General Category for the character
- The Unicode version when this character was added
If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.
Used internally for Japanese Ruby (furigana), etc.
- U+FFF9 interlinear annotation anchor Cf 3.0
  - marks start of annotated text
- U+FFFA interlinear annotation separator Cf 3.0
  - marks start of annotating character(s)
- U+FFFB interlinear annotation terminator Cf 3.0
  - marks end of annotation block
- U+FFFC ￼ object replacement character So 2.1
  - used as placeholder in text for an otherwise unspecified object
- U+FFFD � replacement character So 1.1
  - used to replace an incoming character whose value is unknown or unrepresentable in Unicode
  - compare the use of 001A as a control character to indicate the substitute function
Some prose may have been lifted verbatim from unicode.org,