In computer programming languages, a token is a small number
that replaces a longer word (usually a keyword) in the source code.
Many early versions of BASIC tokenized their source code as you typed it in.
Since there is a one-to-one correspondence between tokens and keywords,
it is easy to list the program by simply emitting the words represented
by the tokens. Many interpreted languages tokenize their source code
before interpreting it.
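As a rough illustration, here is a minimal Python sketch of the idea.
The keyword-to-byte table is hypothetical (real BASICs, such as
Commodore BASIC, defined their own token values), and a real tokenizer
would operate on raw bytes rather than whitespace-split words:

    # Toy BASIC-style tokenization: keywords shrink to one-byte tokens,
    # everything else is stored as-is. Token values here are made up.
    KEYWORD_TO_TOKEN = {"PRINT": 0x99, "GOTO": 0x89, "IF": 0x8B, "THEN": 0xA7}
    TOKEN_TO_KEYWORD = {v: k for k, v in KEYWORD_TO_TOKEN.items()}

    def tokenize(line: str) -> list:
        """Replace each keyword with its token as the line is entered."""
        return [KEYWORD_TO_TOKEN.get(word, word) for word in line.split()]

    def detokenize(tokens: list) -> str:
        """LIST the program by emitting the keyword each token represents."""
        return " ".join(TOKEN_TO_KEYWORD[t] if isinstance(t, int) else t
                        for t in tokens)

    stored = tokenize("IF X THEN PRINT X")  # [0x8B, 'X', 0xA7, 0x99, 'X']
    print(detokenize(stored))               # IF X THEN PRINT X

Because the mapping is one-to-one, detokenize recovers the original
line exactly, which is why listing a tokenized BASIC program is cheap.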
While tokens can be considered a form of bytecode, the two are
distinct: tokens directly represent elements of the syntax, whereas
bytecode is closer to machine language. Bytecode may be optimized, and
it typically does not retain the original symbol names or other
surface syntax, making it difficult to reverse back into
source code.
Compilers also use tokens in their early stages.
For example, the first pass of a compiler might be written
in lex (or flex), which generates a lexical analyzer.
The lexical analyzer recognizes simple units of syntax (including strings and numbers) and passes back a token,
which is then fed to the finite-state machine (FSM) that parses
the grammar. In this application, the token may also have a value
associated with it (e.g., token: NUMBER, value: 42; token: STRING, value: "stuff in quotes").
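A hand-rolled Python sketch of this stage (not generated by lex; the
token names and regular expressions are illustrative assumptions)
might look like:

    import re

    # A minimal lexical analyzer: recognize simple units of syntax and
    # hand back (token, value) pairs for a parser to consume.
    TOKEN_SPEC = [
        ("NUMBER", r"\d+"),
        ("STRING", r'"[^"]*"'),
        ("IDENT",  r"[A-Za-z_]\w*"),
        ("SKIP",   r"\s+"),
    ]
    MASTER = re.compile("|".join(f"(?P<{name}>{pat})"
                                 for name, pat in TOKEN_SPEC))

    def lex(text):
        """Yield (token, value) pairs, skipping whitespace."""
        for m in MASTER.finditer(text):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    print(list(lex('x 42 "stuff in quotes"')))
    # [('IDENT', 'x'), ('NUMBER', '42'), ('STRING', '"stuff in quotes"')]

Each pair corresponds to the token-plus-value form described above:
the token names the syntactic category, and the value carries the
matched text for the parser and later compiler passes.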