From mboxrd@z Thu Jan 1 00:00:00 1970
From: h.feurstein@gmail.com (Hubert Feurstein)
Date: Fri, 13 Nov 2009 12:32:41 +0100
Subject: ARM: big performance waste in memcpy_{from,to}io
In-Reply-To:
References: <200911121749.49676.h.feurstein@gmail.com>
Message-ID: <200911131232.41716.h.feurstein@gmail.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Thursday, 12 November 2009 19:44:40, Alexander Clouter wrote:
> > Are there any drawbacks when using the good-and-fast "memcpy" ? On my
> > Micro9-board everything is running fine so far.
>
> From the small bit of MTD work I have done, some NAND's (and I guess
> NOR's too) do *not* support 32bit wide read's. For example, if I
> remember correctly, the NAND driver for orion will let you read 32bits
> but write only 8bits at a time. Other platforms are only 8bit wide in
> both direction. I guess the 'slow' memcpy version is used as
> *everything* supports 8bit reads....I *guess* :)
>

The Spansion NOR flashes (like the S29GL...) have a 16-bit wide data
bus. With two of them in parallel they look like a 32-bit flash to the
system, so 32-bit reads from the flash work just fine.

But in fact that is not the point at all, and it is not an issue of the
MTD driver. That is why I posted this only to the ARM Linux mailing list
and not to the MTD list.

The memcpy_{to,from}io functions don't have to care about the bus width
of the attached peripheral, because this is already handled correctly by
the static memory controller of your ARM derivative (of course, the
controller has to be configured correctly for the peripheral's bus
width). In the rare case where you do have to take care of that, it is
a bad idea to use a memcpy_xxio function anyway.

I've checked the implementations of some other architectures.
A lot of them already have optimized memcpy_{from,to}io functions:

- alpha/io.c
- parisc/lib/io.c
- powerpc/kernel/io.c
- avr32/include/asm/io.h
- blackfin/include/asm/io.h
- cris/include/asm/io.h
- frv/include/asm/io.h
- h8300/include/asm/io.h
- m32r/include/asm/io.h
- m68k, microblaze, mips, mn10300, ...

One architecture which also uses the simple and slow version is 'sh'
(and maybe there are a few others).

I just want to ask the community whether there is a really good reason
why this bottleneck is still in the ARM kernel.

@Russell: What's your opinion on that?

Best regards,
Hubert
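
For illustration, the difference under discussion can be sketched in
plain user-space C. This is only a sketch under stated assumptions, not
the ARM kernel code or any architecture's actual implementation: it
assumes the "simple and slow" variant is a byte-at-a-time loop, and the
names copy_fromio_bytewise and copy_fromio_wordwise are hypothetical.
Real kernel code would use __iomem pointers and the readb()/readl()
accessors instead of plain volatile dereferences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: models the "simple and slow" variant,
 * assumed here to issue one 8-bit read per byte copied. */
void copy_fromio_bytewise(void *to, const volatile void *from, size_t count)
{
    uint8_t *dst = to;
    const volatile uint8_t *src = from;

    assert(to != NULL && from != NULL);
    while (count--)
        *dst++ = *src++;
}

/* Hypothetical name: models an optimized variant that reads 32 bits
 * at a time once the source address is 32-bit aligned. */
void copy_fromio_wordwise(void *to, const volatile void *from, size_t count)
{
    uint8_t *dst = to;
    const volatile uint8_t *src = from;

    assert(to != NULL && from != NULL);

    /* Head: single bytes until the source is 32-bit aligned. */
    while (count && ((uintptr_t)src & 3)) {
        *dst++ = *src++;
        count--;
    }

    /* Body: one aligned 32-bit read per four bytes. */
    while (count >= 4) {
        uint32_t word = *(const volatile uint32_t *)src;

        memcpy(dst, &word, sizeof word); /* destination may be unaligned */
        src += 4;
        dst += 4;
        count -= 4;
    }

    /* Tail: remaining bytes, if any. */
    while (count--)
        *dst++ = *src++;
}
```

Both functions produce the same destination contents. The word-wise
variant just issues roughly a quarter of the bus reads for large
copies, which is where the performance difference would come from.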