In programming languages and mathematical notation, there are a few different ways to do array indexing - the first choice is to start at 0 or at 1. If you want to refer to an interval (or slice) [ a : b ] of an array, there is a second choice - should the interval include or exclude point b. And thirdly, there is the option of using negative array indexing, and deciding what that should mean.

The node "Array indexes as measure of a language's analness" mentions that there are different schools of thought on this subject. Personally I believe there is One True Way of indexing, and all the rest are just bad design. In this node I will explain what the right way to do indexing and slicing is, and why. Not incidentally, Python designer Guido van Rossum had just the same thoughts, and I will give Python examples.

One True Way of Indexing
Given an array/list (from now on list, since that's the Python term) L, the One True Way of indexing is:

  • L[0] is the first item of the list.
  • L[a:b] is the slice of all the items L[x], for a <= x < b. So this excludes the endpoint, it is the half-open interval that is often written [ a, b ) in math.
  • Negative indices count from the end of the list. That is, L[-x] = L[length(L)-x] for x > 0.

What makes one way of indexing better than the other? It's better if it can lead to simpler expressions in common situations. Off-by-one errors are an extremely common source of bugs. The fewer "-1" or "+1" terms clutter your expressions the better. The One True Way of indexing leads to very elegant expressions, especially when intervals and negative indexing are involved. Without the intervals, I don't think there is much difference between starting with 0 or 1. With them...

Python writes a slice L[0:a] as L[:a], and L[a:len(L)] as L[a:] (and L[:] for the identity slice, or copy of the original list).

In the following list, the equations hold for the One True Way; in italics are the equivalent expressions assuming that we start counting at 1, an interval includes the end point, and there is no negative indexing:

  • The length of a slice L[a:b] is b-a (if b >= a, 0 otherwise) (b-a+1)
  • The first n characters of L: L[:n] (also L[:n])
  • The last n characters of L: L[-n:] (L[len(L)-n+1:])
  • The identity slice. L == L[0:len(L)] == L[:] (L[1:len(L)])
  • The empty slice: L[a:a] is empty for any a. (perhaps L[a:a-1]?)
  • A slice of length n, from point a, is L[a:a+n] (L[a:a+n-1])
  • For any index a, L == L[0,a]+L[a,len(L)] == L[:a]+L[a:] (L[1:a]+L[len(L)-a+1])
  • Given a slice L[a:b] and an index c between them, L[a:b] == L[a:c]+L[c:b] (works for negative indices as well) (L[a:c-1]+L[c:b])

The negative indexing makes it easy to do array operations 'backwards', and most of the above equations still hold for negative a and b. Not a +1 or -1 in sight for the True Way! On the other hand, I'm even in doubt whether I wrote all the expressions correctly for the other option...

Visualization help
One thing people often complain about is that this style of indexing is counter-intuitive. People start counting items at 1, and when they say "items 1 to 3" they mean items 1, 2 and 3.

If you have trouble with this, you should visualize lists a little differently; the indices refer to points between the items in the list. This way, there is no ambiguity possible (no different interpretations), and the math works out exactly the same. Consider the array with the letters of the alphabet:

  | a | b | c | d | e | f | g | h | i | ....
  0   1   2   3   4   5   6   7   8   9 

Here, it is clear that [2:5] refers to c, d, and e, and nothing else. It's also natural that you would slice between numbers and not right through them.

The same ideas work well in Computer Science, correctness proofs and the like; this is not just for programming languages. Use half-open intervals.

Log in or register to write something here or to contact authors.