Here is a simpler solution that is also one-pass and needs at most n swaps. Start at index 0 and swap a[i] with a[i+m], which brings the element that belongs at position i into place. Repeat for increasing i until i+m would run past the end. Everything is now in place except the last m elements. Those are almost in place, but they are rotated by n % m (the remainder) positions. So, if necessary, rotate them back recursively using the same algorithm.
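To make the first pass concrete, here is a minimal sketch for n = 10 and m = 3. The helper name `first_pass` is only for illustration; it runs the swap loop alone and deliberately leaves out the recursive fix-up.

```python
def first_pass(a, m):
    """Run only the swap loop described above, without the recursive fix-up."""
    n = len(a)
    for i in range(n - m):
        # After this swap, position i holds its final element.
        a[i], a[i + m] = a[i + m], a[i]
    return a

a = first_pass(list(range(10)), 3)
print(a[:7])  # [3, 4, 5, 6, 7, 8, 9]  first n - m elements are final
print(a[7:])  # [1, 2, 0]              last m elements, off by n % m = 1
```

The tail `[1, 2, 0]` should be `[0, 1, 2]`, which is what the recursive call on the last m elements repairs.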
I guess if m << n this version should have fairly nice cache properties, be easy to unroll, and work well enough with tail-call optimization, since the recursive call is the last thing the function does.
In Python:
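    def swap(a, i, j):
        a[i], a[j] = a[j], a[i]

    def rotate(a, m, start=0):
        # Rotate a[start:] left by m positions, in place.
        n = len(a) - start
        if n <= 0:
            return  # nothing to rotate; also avoids m % 0
        m = m % n
        if m != 0:
            # Pull the element at i+m down to its final position i.
            for i in range(start, start + n - m):
                swap(a, i, i + m)
            # The last m elements are still rotated by n % m; fix them up.
            rotate(a, m - n % m, start + n - m)

    a = list(range(10))
    rotate(a, 3)
    print(a)  # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]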
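CPython does not eliminate tail calls, and for inputs such as m = n - 1 the recursion depth of `rotate` grows linearly with m. So in Python it may be worth turning the tail call into a loop by hand. The following is a sketch of that idea, not a tuned implementation; the name `rotate_iter` is my own.

```python
def rotate_iter(a, m):
    """Left-rotate list a by m positions in place, looping instead of recursing."""
    start = 0
    n = len(a)
    while n > 0:
        m %= n
        if m == 0:
            break
        # Same swap pass as in rotate: position i receives its final element.
        for i in range(start, start + n - m):
            a[i], a[i + m] = a[i + m], a[i]
        # What would be the recursive call: continue on the last m elements,
        # which still need a left rotation by m - n % m.
        start, n, m = start + n - m, m, m - n % m
    return a

data = list(range(10))
print(rotate_iter(data, 3))  # [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

The tuple assignment evaluates its right-hand side with the old values of `start`, `n` and `m`, so it mirrors the arguments of the recursive call exactly.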