From mboxrd@z Thu Jan 1 00:00:00 1970
From: Scott Wood
Date: Mon, 26 Nov 2012 17:39:19 -0600
Subject: [U-Boot] [PATCH 1/4] Optimized nand_read_buf for kirkwood
In-Reply-To: <1353925988-6859-1-git-send-email-phil.sutter@viprinet.com> (from phil.sutter@viprinet.com on Mon Nov 26 04:33:08 2012)
Message-ID: <1353973159.2383.19@tyr>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 11/26/2012 04:33:08 AM, Phil Sutter wrote:
> The basic idea is taken from the linux-kernel, but further optimized.
>
> First align the buffer to 8 bytes, then use ldrd/strd to read and store
> in 8 byte quantities, then do the final bytes.
>
> Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'.
> Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this
> patch in place, reading the same amount of data was done in 27s
> (~4.89MB/s). So read performance is increased by ~80%!
>
> Signed-off-by: Nico Erfurth
> Tested-by: Phil Sutter
> Cc: Prafulla Wadaskar
> ---
>  drivers/mtd/nand/kirkwood_nand.c | 29 +++++++++++++++++++++++++++++
>  1 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/mtd/nand/kirkwood_nand.c b/drivers/mtd/nand/kirkwood_nand.c
> index bdab5aa..e04a59f 100644
> --- a/drivers/mtd/nand/kirkwood_nand.c
> +++ b/drivers/mtd/nand/kirkwood_nand.c
> @@ -38,6 +38,34 @@ struct kwnandf_registers {
>  static struct kwnandf_registers *nf_reg =
>  	(struct kwnandf_registers *)KW_NANDF_BASE;
>
> +
> +/* The basic idea is stolen from the linux kernel, but the inner loop is optimized a bit more */
> +static void kw_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
> +{
> +	struct nand_chip *chip = mtd->priv;
> +
> +	while (len && (unsigned long)buf & 7)
> +	{

Brace goes on the previous line.

> +		*buf++ = readb(chip->IO_ADDR_R);
> +		len--;
> +	};
> +
> +	asm volatile (
> +		".LFlashLoop:\n"
> +		" subs\t%0, #8\n"
> +		" ldrpld\tr2, [%2]\n" // Read 2 words
> +		" strpld\tr2, [%1], #8\n" // Read 2 words
> +		" bpl\t.LFlashLoop\n" // This results in one additional loop if len%8 <> 0
> +		" addne\t%0, #8\n"
> +		: "+&r" (len), "+&r" (buf)
> +		: "r" (chip->IO_ADDR_R)
> +		: "r2", "r3", "memory", "cc"
> +	);

Use a real tab (or a space) rather than \t (which only helps readability
in the asm output, rather than the C source that people actually look
at).

Should probably use a numeric label to avoid any possibility of conflict.

Would this make more sense as a more generic optimized memcpy_fromio()
or similar?

-Scott
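
For reference, the align/bulk/tail structure the commit message describes
could be sketched in portable C roughly as follows. This is only an
illustration under stated assumptions: fifo_read_buf is a hypothetical
name, not an existing U-Boot or Linux function, and it assumes the data
port is 8-byte aligned, tolerates both byte-wide and 64-bit reads, and
returns the next bytes of the stream on every access. A real helper would
go through the platform's I/O accessors (or ldrd/strd on ARM) rather than
plain volatile dereferences.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (name and signature are assumptions for this sketch):
 * drain len bytes from a fixed-address data port into buf.
 * Assumes port is 8-byte aligned and readable at 8- and 64-bit widths.
 */
static void fifo_read_buf(const volatile void *port, uint8_t *buf, size_t len)
{
	const volatile uint8_t *port8 = port;
	const volatile uint64_t *port64 = port;

	/* Head: single-byte reads until buf is 8-byte aligned. */
	while (len && ((uintptr_t)buf & 7)) {
		*buf++ = *port8;
		len--;
	}

	/* Bulk: one 64-bit read and one 64-bit store per iteration. */
	while (len >= 8) {
		uint64_t v = *port64;

		memcpy(buf, &v, sizeof(v));
		buf += 8;
		len -= 8;
	}

	/* Tail: the remaining 0..7 bytes, again one byte at a time. */
	while (len) {
		*buf++ = *port8;
		len--;
	}
}
```

Checking len >= 8 before each bulk iteration means the loop never
overshoots, so no correction of len is needed afterwards.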