Following some discussion with Roland, and patterned after the style anointed by Linus last week, here is a new version of the 32-bit MMIO copy routine needed by our InfiniPath device.

The name of the routine has changed from memcpy_toio32 to __raw_memcpy_toio32. This reflects the basic nature of the routine: it does not guarantee the order in which writes are performed, nor does it perform a memory barrier after it is done. The reason is that our chip treats the first and last writes to some MMIO regions specially; our driver performs those directly using writel, and uses __raw_memcpy_toio32 for the bits in between.

Regarding the specialised x86_64 implementation, Andi Kleen asked me to measure its performance impact. It makes a difference of about 5% in performance on moderately large copies over the HyperTransport bus, compared to the generic implementation.
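
To make the split between writel and __raw_memcpy_toio32 concrete, here is a minimal userspace sketch of the pattern described above. It is not the patch itself. Every sketch_* name is hypothetical, the two accessors are plain volatile stores standing in for the kernel's __raw_writel() and writel(), and the assumption that the count is given in 32-bit words is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the kernel MMIO accessors (hypothetical names).
 * The real writel() also provides ordering that this sketch does not model.
 */
static inline void sketch_raw_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

static inline void sketch_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/*
 * Copy `count` 32-bit words to `to` using only 32-bit stores.
 * Like the routine described above, it promises nothing about the order
 * of the stores and issues no barrier when it is done.
 * Assumption: `count` is in 32-bit words, not bytes.
 */
static void sketch_raw_memcpy_toio32(volatile void *to, const void *from,
				     size_t count)
{
	volatile uint32_t *dst = to;
	const uint32_t *src = from;
	const uint32_t *end = src + count;

	/* 32-bit stores need 32-bit aligned source and destination. */
	assert(((uintptr_t)to & 3) == 0 && ((uintptr_t)from & 3) == 0);

	while (src < end)
		sketch_raw_writel(*src++, dst++);
}

/*
 * Hypothetical driver-side helper: the first and last words go out with
 * the ordered accessor, everything in between with the raw copy.
 */
static void sketch_send_buffer(volatile uint32_t *buf, const uint32_t *data,
			       size_t nwords)
{
	if (nwords == 0)
		return;

	sketch_writel(data[0], &buf[0]);

	if (nwords > 2)
		sketch_raw_memcpy_toio32(&buf[1], &data[1], nwords - 2);

	if (nwords > 1)
		sketch_writel(data[nwords - 1], &buf[nwords - 1]);
}
```

Because the helper in this sketch handles the two special words itself, the raw copy in the middle can stay free of ordering constraints, which is what leaves room for an architecture-specific version.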