The Unicode standard encodes eighteen different space characters, differing in width and layout behavior.
The most commonly used space character is U+0020 space.
Also widely used is its non-breaking counterpart, U+00A0 no break space.
These two characters have the same width, but behave differently for line breaking. U+00A0 no break space behaves like a numeric separator for the purposes of bidirectional layout (see Bidirectional Behavior).
In ideographic text, U+3000 ideographic space is commonly used because its width matches that of the ideographs (i.e. it is a fullwidth character).
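The line-breaking difference between U+0020 and U+00A0 can be illustrated with a small sketch. It assumes Python's standard textwrap module, which is a simple word wrapper that only breaks at ASCII whitespace, not a full implementation of Unicode line breaking; the sample strings are made up for illustration.

```python
import textwrap

# Hypothetical sample strings; only the character between "100" and "km" differs.
with_space = "up to 100 km away"
with_nbsp = "up to 100\u00a0km away"

# textwrap treats only ASCII whitespace as a break opportunity,
# so U+00A0 keeps "100" and "km" on the same line.
lines_space = textwrap.wrap(with_space, width=10)
lines_nbsp = textwrap.wrap(with_nbsp, width=10)

print(lines_space)
print(lines_nbsp)
```

With the ordinary space, the wrapper is free to separate the number from its unit; with the no break space, the two always stay together.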
The main difference among other space characters is their width.
U+2000 to U+2006 are standard quad widths used in typography.
U+2007 figure space has the same width as a digit.
U+2008 punctuation space has the same width as a period.
The fixed-width space characters U+2000 to U+200A are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. When they are used, they typically do not expand during justification, except for U+2009 thin space which sometimes does.
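As a rough illustration of the quad widths, the sketch below converts nominal em fractions into points for a given type size. The fractions are taken from the character names and the notes in the list further down; the table name and the helper function are hypothetical, and actual widths depend on the font.

```python
from fractions import Fraction

# Nominal widths as fractions of an em (assumed from the character names
# and notes in this document; real fonts may differ).
NOMINAL_EM_FRACTION = {
    0x2000: Fraction(1, 2),   # en quad
    0x2001: Fraction(1),      # em quad
    0x2002: Fraction(1, 2),   # en space: half an em
    0x2003: Fraction(1),      # em space: equal to the type size
    0x2004: Fraction(1, 3),   # three per em space
    0x2005: Fraction(1, 4),   # four per em space
    0x2006: Fraction(1, 6),   # six per em space
    0x2009: Fraction(1, 5),   # thin space: a fifth (or sometimes a sixth)
    0x205F: Fraction(4, 18),  # medium mathematical space
}

def nominal_width_pt(code_point, type_size_pt):
    """Return the nominal width in points for a space at the given type size."""
    return NOMINAL_EM_FRACTION[code_point] * type_size_pt

for cp in sorted(NOMINAL_EM_FRACTION):
    print(f"U+{cp:04X}: {float(nominal_width_pt(cp, 12)):.2f} pt at 12 pt")
```

U+2007 figure space and U+2008 punctuation space are left out because their widths are defined relative to glyphs of the font rather than as a fraction of an em.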
Space characters with special behavior in word or line breaking are described in Line and Word Breaking and Layout Controls.
The use of U+FEFF zero width no break space as a spacing character was deprecated in Unicode 3.2. The character U+2060 word joiner should be used instead, allowing U+FEFF to be used exclusively for its most common role as a Byte Order Mark (BOM).
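One way to apply this recommendation to existing text is sketched below. The function name is hypothetical, and the policy of dropping a leading U+FEFF (treating it as a BOM) while converting every other occurrence to U+2060 is an assumption, not something the standard mandates.

```python
BOM = "\ufeff"          # U+FEFF zero width no break space / byte order mark
WORD_JOINER = "\u2060"  # U+2060 word joiner

def modernize_feff(text):
    """Drop a leading BOM, then replace any remaining U+FEFF with U+2060."""
    if text.startswith(BOM):
        text = text[1:]
    return text.replace(BOM, WORD_JOINER)

sample = "\ufeffab\ufeffcd"
print([f"U+{ord(ch):04X}" for ch in modernize_feff(sample)])
```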
Note that the list below contains every Unicode character with the General Category Zs (Separator, Space).
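The set of Zs characters can be checked against a local Unicode database. The sketch below assumes Python's standard unicodedata module, which ships a newer version of the standard than the one described here, so the result may not match the eighteen characters listed in this document; for instance, later versions reclassified U+180E Mongolian vowel separator as a format character.

```python
import sys
import unicodedata

# Scan every code point and keep those whose General Category is Zs.
zs = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

for cp in zs:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

print(len(zs), "Zs characters in Unicode", unicodedata.unidata_version)
```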
As of version 4.0, the Unicode standard has 26 semantically distinct variants of the space character. They are enumerated below, separated by code block.
Number of characters added in each version of the Unicode standard :
Unicode 1.1 : 22
Unicode 3.0 : 3
Unicode 3.2 : 1
Number of characters in each General Category :
Separator, Space Zs : 18
Separator, Line Zl : 1
Separator, Paragraph Zp : 1
Other, Control Cc : 6
Number of characters in each Bidirectional Category :
Common Number Separator CS : 1
Paragraph Separator B : 4
Segment Separator S : 2
Whitespace WS : 19
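The tallies above can be recomputed for the 26 listed code points. This is a sketch that assumes Python's standard unicodedata module; because that module tracks a newer version of the standard than 4.0, a few properties may have changed since, and the printed counts may differ slightly from the figures above.

```python
from collections import Counter
import unicodedata

# The 26 code points enumerated in this document.
LISTED = (
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0,
     0x1680, 0x180E]
    + list(range(0x2000, 0x200B))          # U+2000 to U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

# Count characters per General Category and per Bidirectional Category.
by_category = Counter(unicodedata.category(chr(cp)) for cp in LISTED)
by_bidi = Counter(unicodedata.bidirectional(chr(cp)) for cp in LISTED)

print("General Category:", dict(by_category))
print("Bidirectional Category:", dict(by_bidi))
```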
The columns below should be interpreted as :
- The Unicode code for the character
- The character in question
- The Unicode name for the character
- The Unicode General Category for the character
- The Unicode Bidirectional Category for the character
- The Unicode version when this character was added
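For readers who want to process the list mechanically, the column layout can be parsed roughly as follows. The record type, the regular expression, and the function name are hypothetical; the pattern assumes each entry fits on one line in the form "- U+XXXX (character) name category bidi version", and annotation lines (sgml, aka, ref, and notes) are simply skipped.

```python
import re
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    code_point: int
    char: str
    name: str
    general_category: str
    bidi_category: str
    version: str

# One field per column described above; DOTALL lets the character column
# hold any single character, including a line break.
ENTRY_RE = re.compile(
    r"^- U\+(?P<code>[0-9A-F]{4,6}) "
    r"\((?P<char>.*?)\) "
    r"(?P<name>.+) "
    r"(?P<gc>[A-Z][a-z]) "
    r"(?P<bidi>[A-Z]{1,3}) "
    r"(?P<version>\d+\.\d+)$",
    re.DOTALL,
)

def parse_entry(line: str) -> Optional[Entry]:
    """Parse one entry line, or return None for annotation lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(int(m["code"], 16), m["char"], m["name"],
                 m["gc"], m["bidi"], m["version"])

print(parse_entry("- U+2003 (\u2003) em space Zs WS 1.1"))
print(parse_entry("- aka mutton"))
```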
If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.
Basic Latin
- U+0009 ( ) character tabulation Cc S 1.1
- sgml &Tab;
- aka horizontal tabulation (ht), tab
- U+000A () line feed Cc B 1.1
- sgml &NewLine;
- aka line feed (lf)
- aka new line (nl), end of line (eol)
- U+000B () line tabulation Cc S 1.1
- U+000C () form feed Cc WS 1.1
- aka form feed (ff)
- U+000D () carriage return Cc B 1.1
- aka carriage return (cr)
ASCII punctuation and symbols
- U+0020 ( ) space Zs WS 1.1
- * sometimes considered a control code
- * other space characters: 2000-200A
- ref U+00A0 no break space (Latin-1 Supplement)
- ref U+200B zero width space (General Punctuation)
- ref U+2060 word joiner (General Punctuation)
- ref U+3000 ideographic space (CJK Symbols and Punctuation)
- ref U+FEFF zero width no break space (Arabic Presentation Forms B)
Latin-1 Supplement
C1 controls
- U+0085 () next line Cc B 1.1
- aka next line (nel)
Latin-1 punctuation and symbols
- U+00A0 ( ) no break space Zs CS 1.1
- html &nbsp;
- sgml &nbsp;
- aka nbsp
- ref U+0020 space (Basic Latin)
- ref U+2007 figure space (General Punctuation)
- ref U+202F narrow no break space (General Punctuation)
- ref U+2060 word joiner (General Punctuation)
- ref U+FEFF zero width no break space (Arabic Presentation Forms B)
Ogham
Punctuation
- U+1680 ( ) Ogham space mark Zs WS 3.0
- * glyph is blank in "stemless" style fonts
Mongolian
Format controls
- U+180E () Mongolian vowel separator Zs WS 3.0
- aka mvs
General Punctuation
Spaces
- U+2000 ( ) en quad Zs WS 1.1
- U+2001 ( ) em quad Zs WS 1.1
- aka mutton quad
- U+2002 ( ) en space Zs WS 1.1
- html &ensp;
- sgml &ensp;
- aka nut
- * half an em
- U+2003 ( ) em space Zs WS 1.1
- html &emsp;
- sgml &emsp;
- aka mutton
- * nominally, a space equal to the type size in points
- * may scale by the condensation factor of a font
- U+2004 ( ) three per em space Zs WS 1.1
- sgml &emsp13;
- aka thick space
- U+2005 ( ) four per em space Zs WS 1.1
- sgml &emsp14;
- aka mid space
- U+2006 ( ) six per em space Zs WS 1.1
- * in computer typography sometimes equated to thin space
- U+2007 ( ) figure space Zs WS 1.1
- sgml &numsp;
- * space equal to tabular width of a font
- * this is equivalent to the digit width of fonts with fixed-width digits
- U+2008 ( ) punctuation space Zs WS 1.1
- sgml &puncsp;
- * space equal to narrow punctuation of a font
- U+2009 ( ) thin space Zs WS 1.1
- html &thinsp;
- sgml &thinsp;
- * a fifth of an em (or sometimes a sixth)
- U+200A ( ) hair space Zs WS 1.1
- sgml &hairsp;
- * thinner than a thin space
- * in traditional typography, the thinnest space available
Formatting characters
- U+2028 () line separator Zl WS 1.1
- * may be used to represent this semantic unambiguously
- U+2029 () paragraph separator Zp B 1.1
- * may be used to represent this semantic unambiguously
- U+202F ( ) narrow no break space Zs WS 3.0
- aka nnbsp
- ref U+00A0 no break space (Latin-1 Supplement)
Space
- U+205F ( ) medium mathematical space Zs WS 3.2
- sgml &MediumSpace;
- aka mmsp
- * four-eighteenths of an em
CJK Symbols and Punctuation
CJK symbols and punctuation
- U+3000 ( ) ideographic space Zs WS 1.1
- ref U+0020 space (Basic Latin)
http://unicode.org