Before going into the details of the algorithms, some terminology:

The divisor is the number being divided; for example, in 5/7 the divisor is 5.

The dividend is the number of divisions; for example, in 5/7 the dividend is 7.

Logic shift left rolls a 0-valued bit into the right [least-significant] bit
of a register, and rolls the left [most-significant] bit into a carry flag
[rolling all the intermediate bits too, of course].

Shift left rolls the carry flag into the right bit, and then rolls the left bit into
the carry flag [also rolling all the intermediate bits].

Rotate left rolls the left bit into the right bit *and* into the carry flag
[rolling all the intermediate bits].

For the sake of these code snippets, the value of the divisor is represented by
`x`, and is considered to be an `m`-bit value, and the value of the
dividend is represented by `y`, and is considered to be an `n`-bit
value. [Typically, `m` = `n`, however, algorithms are more useful
the more generic they are, so I choose not to restrict my writeup to that particular
special case.]

One special case that always needs to be kept in mind when doing *any* kind of
division is division by 0. The details for handling that case are left for another
writeup. Most of this writeup concerns dividing two unsigned values, the rules for
using signed values are explained near the end.

One method of dividing is by multiplying by a reciprocal.
[When using the method described below, another special case is dividing by 1 - this method will not
work when `y` is 1, so for completeness a test for (`y` = 1) and special
handling of that trivial case will have to be added to the algorithm described below.]
To divide `x` [stored in an `m`-bit wide register
`A`] by `y` (`y` being greater than 1), let `L` be a
`p`-bit register [`p` ≥ (`m` + `n`)]
containing the value (2^{p} / `y`) [rounded
down]. Multiply `A` by `L`, using an `m`-bit by
`p`-bit multiply routine, add `L` to the result, then drop the least
significant `p` bits. [Mathematically, multiplying by `L` then
adding `L` is the same as incrementing the value of `x` before
multiplying, but if the value of `x` were (2^{m}
- 1) then an increment would push the value beyond the representable limits of register
`A`. Therefore, the safest implementation is to multiply then add.] The
result turns out to be floor(((`X` + 1) *
floor(2^{p} / `y`)) /
2^{p}), or simply ((`x` + 1) / `y`) with
a rounding error introduced by the 'floor' functions that, in the end, gives the value
floor(`x` / `y`). As long as `p` is not less than (`m`
+ `n`), the final rounding error is always less than 1 and so it never affects the
result. This method has an advantage in that some processors have a very fast,
hardware-implemented `w`-bit by `w`-bit multiply routines [where
`w` is the size of the data word for that particular processor], so if
(`m` + `n`) ≤ `w` then this can be used by extending the
register `A` to `w` bits and by using `p` = `w`. This
normally works very well, as the result of the multiplication is usually stored in two
halves so dropping `w` LSBs from the result usually means just ignoring the lower
half of the result. It has the disadvantage that `L` must be calculated, by
dividing 2^{p} by `y` - so some division has to
be done sometime, anyway. This method is most suitable where `y` can be fixed at
compile-time, so `L` is a constant which can be hard-coded as a literal, or
where many registers must be divided by the same value [compute `L` once
using a slower division routine, then perform all the necessary divisions using the faster
multiply routine].

To more traditional division routines: to divide `x`, stored in
`m`-bit register `A`, by `y`, putting the integer result in `A`
and the remainder in an `n`-bit [or wider] register `C`, using
an integer counter `I` for counting between 0 and `m`:

`C` := 0

`I` := `m`

while (`I` > 0)

Logic shift left `A`

Shift left `C`

if ((carry flag ≠ 0) || (`C` ≥ `y`)) then

`C` := (`C` - `y`)

`A` := (`A` | 0x01)

end if

`I` := (`I` - 1)

end while

At the beginning of the loop, the rightmost `I` bits of `A` contain floor((`x` / `y`) * (2 ^ (`I` - `m`))). [This value is guaranteed to be at most `I` bits wide because `x` fits in `m` bits and so is less than (2 ^ `m`) and `y` ≥ 1, so ((`x` / `y`) * (2 ^ (-`m`))) is less than 1.] The leftmost (`m` - `I`) bits contain the value (`x` mod (2 ^ (`m` - `I`))).

The practice of using one register to hold two values, one value in the leftmost bits and one value in the rightmost bits, is used in other algorithms such as multiplication algorithms. In trying to understand such algorithms, it helps to remember that one register or variable does not always correspond to one value. This makes the algorithms less intuitive, but it does make them more efficient in terms of data memory (and even speed).

To store the integer result of the division in a different `m`-bit register,
`D`, and preserve the value of `A`:

`C` := 0

`I` := `m`

while (`I` > 0)

Logic shift left `D`

Rotate left `A`

Shift left `C`

if ((carry flag ≠ 0) || (`C` ≥ `y`)) then

`C` := (`C` - `y`)

`D` := (`D` | 0x01)

endif

`I` := (`I` - 1)

end while

If the contents of `A` need not be kept, the 'Rotate left `A`' can be
replaced with a 'Logic shift left `A`' [in which case `A` will contain
0 on exit] or a 'Shift left `A`' [in which case `A` will contain
the previous contents of `D` on exit].

Fixed-point division is useful in certain areas, for example sometimes one wishes to
divide and round to the closest integer rather than round down. To do this, use a fixed-point
division with one more bit of precision than integer division, shift the result right one
place, then increment if there is a carry. Every value with a fractional value of a half
or greater will be rounded up, all other values will be rounded down. To perform
fixed-point division, where the programmer wishes a result of `o` bits, and
`o` > `m`, putting the result in `D` [now `o`
bits wide], the following algorithm will do:

`C` := 0

`I` := `o`

while (`I` > 0)

Logic shift left `D`

Logic shift left `A`

Shift left `C`

if ((carry flag ≠ 0) || (`C` ≥ `y`)) then

`C` := (`C` - `y`)

`D` := (`D` | 0x01)

end if

`I` := (`I` - 1)

end while

`A` will be cleared.

To preserve `A`, one may use an extra register [`E`, (`o`
- `m`) bits wide]:

`E` := 0

`C` := 0

`I` := o

while (`I` > 0)

Logic shift left `D`

Rotate left `A`:`E`

Shift left `C`

if ((carry flag ≠ 0) || (`C` ≥ `y`)) then

`C` := (`C` - `y`)

`D` := (`D` | 0x01)

end if

`I` := (`I` - 1)

end while

'Rotate left `A`:`E`' can be accomplished by:

Logic shift left `E`

Shift left `A`

if (carry flag ≠ 0) then

`E` := (`E` | 0x01)

end if

`E` will be clear at the end of this routine.

Fixed-point divide with `o` < `m` is uncommon in my experience,
however, for completeness, here's an algorithm for that [using a register
`E`, which is `o` bits wide]:

`E` := [most significant `o` bits of `x`]

`C` := 0

`I` := `o`

while (`I` > 0)

Logic shift left `E`

Rotate left `C`

Shift left `C`

if ((carry flag ≠ 0) || (`C` ≥ `y`)) then

`C` := (`C` - `y`)

`D` := (`D` | 0x01)

endif

`I` := (`I` - 1)

end while

`E` is clear on exit. [If `A` need not be preserved, the register
`E` can be done away with and 'Logic shift left `E`' could be replaced
with 'Rotate left `A`', 'Logic shift left `A`', or 'Shift left
`A`'.]

Getting back to the method of multiplying by a reciprocal, the following algorithm will
divide 2^{p} by `y`, putting the result in
`p`-bits wide register `D`, provided `y` is not 1:

`C` := 1

`I` := `p`

while (`I` > 0)

Logic shift left `D`

Logic shift left `C`

if ((carry flag ≠ 0) || (`C` ≥ `y`)) then

`C` := (`C` - `y`)

`D` := (`D` | 0x01)

end if

`I` := (`I` - 1)

end while

To divide signed values, first determine the sign of the result (if the signs of the
divisor and dividend are the same, the sign of the result will be +; if the signs are
different, the sign of the result will be -). Next, convert (as necessary) divisor and
dividend to nonnegative values. Perform unsigned division, then apply the sign of the
result as determined in the first step. Care should be taken if rounding to the nearest
integer in signed division - for example, 7.5 normally rounds to 8, but -7.5 normally rounds
to -7. If rounding -7.5 to -8 is acceptable, then no worries; otherwise, the appropriate
value should be taken as shown in the table below. `R` represents the result from
the unsigned divide, `C` is the result of the `C` register after the
division:

\ result is +ve result is -ve
\
((2 * `C`) < `y`) `R` -`R`
((2 * `C`) = `y`) (`R` + 1) -`R`
((2 * `C`) > `y`) (`R` + 1) -(`R` + 1)

Finally, it is worth noting that none of the above examples of division are reentrant.
In general, it's very difficult to make a division routine reentrant, the simplest way would
probably be to have local shadow registers for every instance of the rountine, copy
the values of the dividend and divisor at the beginning of the routine in a short critical
section (i.e. a section with interrupts disabled), work on the shadow registers, then
copy the result back to the main processing context in a critical section at the end of
the routine.