[patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}()

public inbox for linux-mtd@lists.infradead.org

* [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}()
@ 2008-06-09 10:13 David Brownell
  2008-06-09 11:31 ` Haavard Skinnemoen
  0 siblings, 1 reply; 6+ messages in thread
From: David Brownell @ 2008-06-09 10:13 UTC
  To: lkml, linux-mtd; +Cc: Nicolas Ferre, Haavard Skinnemoen

This uses __raw_{read,write}s{b,w}() primitives to access data on NAND
chips for more efficient I/O.

On an arm926 with memory clocked at 100 MHz, this reduced the elapsed
time for a 64 MByte read by 16%.  ("dd" /dev/mtd0 to /dev/null, with
an 8-bit NAND using hardware ECC and 128KB blocksize.)

Also some minor section tweaks:

  - Use platform_driver_probe() so no pointer to probe() lingers
    after that code has been removed at run-time.

  - Use __exit and __exit_p so the remove() code will normally be
    removed by the linker.

Since these buffer read/write calls are new, this increases the runtime
code footprint (by 88 bytes on my build, after the section tweaks).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
---
Yeah, this does make you wonder why the *default* nand r/w code isn't
using these primitives; this speedup shouldn't be platform-specific.

Posting this now since I think this should either be incorporated into
the new atmel_nand.c code or into drivers/mtd/nand/nand_base.c ...
both arm and avr32 support these calls; I'm not sure whether some
other platforms lack them.
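
For readers unfamiliar with the string I/O helpers, here is a minimal
userspace sketch of the calling convention only. Everything named mock_*
or sketch_* is hypothetical and stands in for the NAND data register and
the kernel helpers; it is not the kernel implementation, and it cannot
show the speedup itself, which comes from the architecture's optimized
assembly. The point it illustrates is that the count passed to the
16-bit variant is in 16-bit units, hence the "len / 2" in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a NAND data port: each read returns the next unit. */
struct mock_nand {
	const uint8_t *data;
	size_t pos;
};

static uint8_t mock_read8(struct mock_nand *n)
{
	return n->data[n->pos++];
}

static uint16_t mock_read16(struct mock_nand *n)
{
	uint16_t v;

	memcpy(&v, n->data + n->pos, sizeof(v));
	n->pos += sizeof(v);
	return v;
}

/* Mimics the readsb() convention: count is a number of bytes. */
static void mock_readsb(struct mock_nand *n, void *buf, int count)
{
	uint8_t *p = buf;

	while (count-- > 0)
		*p++ = mock_read8(n);
}

/* Mimics the readsw() convention: count is a number of 16-bit words. */
static void mock_readsw(struct mock_nand *n, void *buf, int count)
{
	uint8_t *p = buf;

	while (count-- > 0) {
		uint16_t v = mock_read16(n);

		memcpy(p, &v, sizeof(v));
		p += sizeof(v);
	}
}

/* 8-bit bus: len is already a byte count. */
static void sketch_read_buf(struct mock_nand *n, uint8_t *buf, int len)
{
	mock_readsb(n, buf, len);
}

/* 16-bit bus: len is still in bytes, so halve it for the word count. */
static void sketch_read_buf16(struct mock_nand *n, uint8_t *buf, int len)
{
	mock_readsw(n, buf, len / 2);
}
```

Both wrappers consume the same number of bytes for a given len; only
the unit of the count handed to the helper differs.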

 drivers/mtd/nand/at91_nand.c |   46 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

--- a/drivers/mtd/nand/at91_nand.c	2008-04-28 11:05:34.000000000 -0700
+++ b/drivers/mtd/nand/at91_nand.c	2008-04-28 21:59:34.000000000 -0700
@@ -146,6 +146,37 @@ static void at91_nand_disable(struct at9
 }
 
 /*
+ * Minimal-overhead PIO for data access.
+ */
+static void at91_read_buf(struct mtd_info *mtd, u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_readsb(nand_chip->IO_ADDR_R, buf, len);
+}
+
+static void at91_read_buf16(struct mtd_info *mtd, u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_readsw(nand_chip->IO_ADDR_R, buf, len / 2);
+}
+
+static void at91_write_buf(struct mtd_info *mtd, const u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_writesb(nand_chip->IO_ADDR_W, buf, len);
+}
+
+static void at91_write_buf16(struct mtd_info *mtd, const u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_writesw(nand_chip->IO_ADDR_W, buf, len / 2);
+}
+
+/*
  * write oob for small pages
  */
 static int at91_nand_write_oob_512(struct mtd_info *mtd,
@@ -440,8 +471,14 @@ static int __init at91_nand_probe(struct
 
 	nand_chip->chip_delay = 20;		/* 20us command delay time */
 
-	if (host->board->bus_width_16)		/* 16-bit bus width */
+	if (host->board->bus_width_16) {	/* 16-bit bus width */
 		nand_chip->options |= NAND_BUSWIDTH_16;
+		nand_chip->read_buf = at91_read_buf16;
+		nand_chip->write_buf = at91_write_buf16;
+	} else {
+		nand_chip->read_buf = at91_read_buf;
+		nand_chip->write_buf = at91_write_buf;
+	}
 
 	platform_set_drvdata(pdev, host);
 	at91_nand_enable(host);
@@ -548,7 +585,7 @@ err_ecc_ioremap:
 /*
  * Remove a NAND device.
  */
-static int __devexit at91_nand_remove(struct platform_device *pdev)
+static int __exit at91_nand_remove(struct platform_device *pdev)
 {
 	struct at91_nand_host *host = platform_get_drvdata(pdev);
 	struct mtd_info *mtd = &host->mtd;
@@ -565,8 +602,7 @@ static int __devexit at91_nand_remove(st
 }
 
 static struct platform_driver at91_nand_driver = {
-	.probe		= at91_nand_probe,
-	.remove		= at91_nand_remove,
+	.remove		= __exit_p(at91_nand_remove),
 	.driver		= {
 		.name	= "at91_nand",
 		.owner	= THIS_MODULE,
@@ -575,7 +611,7 @@ static struct platform_driver at91_nand_
 
 static int __init at91_nand_init(void)
 {
-	return platform_driver_register(&at91_nand_driver);
+	return platform_driver_probe(&at91_nand_driver, at91_nand_probe);
 }
 
 


* Re: [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}()
  2008-06-09 10:13 [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}() David Brownell
@ 2008-06-09 11:31 ` Haavard Skinnemoen
  2008-06-09 16:49   ` Haavard Skinnemoen
  2008-06-09 17:07   ` [patch 2.6.26-rc5-git] at91_nand speedup via {read, write}s{b, w}() David Brownell
  0 siblings, 2 replies; 6+ messages in thread
From: Haavard Skinnemoen @ 2008-06-09 11:31 UTC
  To: David Brownell; +Cc: Nicolas Ferre, linux-mtd, lkml

David Brownell <david-b@pacbell.net> wrote:
> This uses __raw_{read,write}s{b,w}() primitives to access data on NAND
> chips for more efficient I/O.
> 
> On an arm926 with memory clocked at 100 MHz, this reduced the elapsed
> time for a 64 MByte read by 16%.  ("dd" /dev/mtd0 to /dev/null, with
> an 8-bit NAND using hardware ECC and 128KB blocksize.)

Nice. Here are some numbers from my setup (256 MB, 8-bit, software ECC).

Before:
real	2m38.131s
user	0m0.228s
sys	2m37.740s

After:
real	2m27.404s
user	0m0.180s
sys	2m27.068s

which is a 6.8% speedup. I guess hardware ECC helps...though I can't
seem to get it to work properly. Is there anything I need to do besides
flash_eraseall when changing the ECC layout?
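
For reference, the 6.8% figure follows from the two "real" times above.
A small sketch of the arithmetic (the helper names are made up for
illustration):

```c
/* Convert a time(1)-style "XmY.YYYs" reading into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage reduction in elapsed time going from before to after. */
static double percent_faster(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With before = to_seconds(2, 38.131) and after = to_seconds(2, 27.404),
the difference is 10.727 s out of 158.131 s, which rounds to 6.8%.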

Also, I wonder if we can use the DMA engine framework to get rid of all
that "sys" time...?

> Also some minor section tweaks:
> 
>   - Use platform_driver_probe() so no pointer to probe() lingers
>     after that code has been removed at run-time.
> 
>   - Use __exit and __exit_p so the remove() code will normally be
>     removed by the linker.
> 
> Since these buffer read/write calls are new, this increases the runtime
> code footprint (by 88 bytes on my build, after the section tweaks).

Yeah, I spotted a bug in __raw_readsb on avr32, so I guess those
functions haven't actually been used before...

> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
> ---
> Yeah, this does make you wonder why the *default* nand r/w code isn't
> using these primitives; this speedup shouldn't be platform-specific.
> 
> Posting this now since I think this should either be incorporated into
> the new atmel_nand.c code or into drivers/mtd/nand/nand_base.c ...
> both arm and avr32 support these calls, I'm not sure whether or not
> some platforms don't support them.

I'll leave it up to the MTD people to decide whether or not to update
nand_base.c. Below is your patch rebased onto my patchset. I'll include
it in my next series after I figure out where to send it.

Haavard

From ad420ea11f9c8aa0fcad2ce1c3af69c02a2dc447 Mon Sep 17 00:00:00 2001
From: David Brownell <david-b@pacbell.net>
Date: Mon, 9 Jun 2008 03:13:28 -0700
Subject: [PATCH] atmel_nand speedup via {read,write}s{b,w}()

This uses __raw_{read,write}s{b,w}() primitives to access data on NAND
chips for more efficient I/O.

On an arm926 with memory clocked at 100 MHz, this reduced the elapsed
time for a 64 MByte read by 16%.  ("dd" /dev/mtd0 to /dev/null, with
an 8-bit NAND using hardware ECC and 128KB blocksize.)

Also some minor section tweaks:

  - Use platform_driver_probe() so no pointer to probe() lingers
    after that code has been removed at run-time.

  - Use __exit and __exit_p so the remove() code will normally be
    removed by the linker.

Since these buffer read/write calls are new, this increases the runtime
code footprint (by 88 bytes on my build, after the section tweaks).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
[haavard.skinnemoen@atmel.com: rebase onto atmel_nand rename]
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
---
 drivers/mtd/nand/atmel_nand.c |   46 ++++++++++++++++++++++++++++++++++++----
 1 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index 325ce29..d9f7a5d 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -142,6 +142,37 @@ static int atmel_nand_device_ready(struct mtd_info *mtd)
 }
 
 /*
+ * Minimal-overhead PIO for data access.
+ */
+static void atmel_read_buf(struct mtd_info *mtd, u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_readsb(nand_chip->IO_ADDR_R, buf, len);
+}
+
+static void atmel_read_buf16(struct mtd_info *mtd, u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_readsw(nand_chip->IO_ADDR_R, buf, len / 2);
+}
+
+static void atmel_write_buf(struct mtd_info *mtd, const u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_writesb(nand_chip->IO_ADDR_W, buf, len);
+}
+
+static void atmel_write_buf16(struct mtd_info *mtd, const u8 *buf, int len)
+{
+	struct nand_chip	*nand_chip = mtd->priv;
+
+	__raw_writesw(nand_chip->IO_ADDR_W, buf, len / 2);
+}
+
+/*
  * write oob for small pages
  */
 static int atmel_nand_write_oob_512(struct mtd_info *mtd,
@@ -436,8 +467,14 @@ static int __init atmel_nand_probe(struct platform_device *pdev)
 
 	nand_chip->chip_delay = 20;		/* 20us command delay time */
 
-	if (host->board->bus_width_16)		/* 16-bit bus width */
+	if (host->board->bus_width_16) {	/* 16-bit bus width */
 		nand_chip->options |= NAND_BUSWIDTH_16;
+		nand_chip->read_buf = atmel_read_buf16;
+		nand_chip->write_buf = atmel_write_buf16;
+	} else {
+		nand_chip->read_buf = atmel_read_buf;
+		nand_chip->write_buf = atmel_write_buf;
+	}
 
 	platform_set_drvdata(pdev, host);
 	atmel_nand_enable(host);
@@ -546,7 +583,7 @@ err_nand_ioremap:
 /*
  * Remove a NAND device.
  */
-static int __devexit atmel_nand_remove(struct platform_device *pdev)
+static int __exit atmel_nand_remove(struct platform_device *pdev)
 {
 	struct atmel_nand_host *host = platform_get_drvdata(pdev);
 	struct mtd_info *mtd = &host->mtd;
@@ -564,8 +601,7 @@ static int __devexit atmel_nand_remove(struct platform_device *pdev)
 }
 
 static struct platform_driver atmel_nand_driver = {
-	.probe		= atmel_nand_probe,
-	.remove		= atmel_nand_remove,
+	.remove		= __exit_p(atmel_nand_remove),
 	.driver		= {
 		.name	= "atmel_nand",
 		.owner	= THIS_MODULE,
@@ -574,7 +610,7 @@ static struct platform_driver atmel_nand_driver = {
 
 static int __init atmel_nand_init(void)
 {
-	return platform_driver_register(&atmel_nand_driver);
+	return platform_driver_probe(&atmel_nand_driver, atmel_nand_probe);
 }
 
 
-- 
1.5.5.3


* Re: [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}()
  2008-06-09 11:31 ` Haavard Skinnemoen
@ 2008-06-09 16:49   ` Haavard Skinnemoen
  2008-06-09 17:07   ` [patch 2.6.26-rc5-git] at91_nand speedup via {read, write}s{b, w}() David Brownell
  1 sibling, 0 replies; 6+ messages in thread
From: Haavard Skinnemoen @ 2008-06-09 16:49 UTC
  To: David Brownell; +Cc: Nicolas Ferre, linux-mtd, lkml, kernel

Haavard Skinnemoen <haavard.skinnemoen@atmel.com> wrote:
> which is a 6.8% speedup. I guess hardware ECC helps...though I can't
> seem to get it to work properly. Is there anything I need to do besides
> flash_eraseall when changing the ECC layout?

Turns out there's an AP7000 erratum that hasn't made it to the data
sheet yet. The IC designers have already come up with a workaround,
which I've implemented below. This brings the time down to

real	2m0.934s
user	0m0.140s
sys	2m0.700s

which is a nice improvement.
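
To illustrate why resetting the ECC controller between the address and
data cycles helps, here is a toy userspace model. It is only a sketch:
the toy_* names are hypothetical, and a plain XOR parity stands in for
the real ECC calculation, which is considerably more involved. The model
assumes an affected controller accumulates over address cycles as well
as data cycles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy ECC engine: XOR parity stands in for the real ECC algorithm. */
struct toy_ecc {
	uint8_t parity;
	bool sees_address_cycles;	/* true models the erratum */
};

static void toy_ecc_reset(struct toy_ecc *e)
{
	e->parity = 0;
}

/* On an affected chip, address bytes leak into the calculation. */
static void toy_address_cycle(struct toy_ecc *e, uint8_t addr)
{
	if (e->sees_address_cycles)
		e->parity ^= addr;
}

static void toy_data_cycle(struct toy_ecc *e, uint8_t data)
{
	e->parity ^= data;
}

/* Issue address cycles, optionally reset, then clock in the data. */
static uint8_t toy_read_page(struct toy_ecc *e,
			     const uint8_t *addr, size_t naddr,
			     const uint8_t *data, size_t ndata,
			     bool workaround)
{
	size_t i;

	toy_ecc_reset(e);
	for (i = 0; i < naddr; i++)
		toy_address_cycle(e, addr[i]);

	/* Workaround: discard whatever the address cycles contributed. */
	if (workaround)
		toy_ecc_reset(e);

	for (i = 0; i < ndata; i++)
		toy_data_cycle(e, data[i]);

	return e->parity;
}
```

In this model the affected controller only matches an unaffected one
when the reset is applied before the data cycles.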

Haavard

From 57d4f806c28a068baae12558794733e838016a71 Mon Sep 17 00:00:00 2001
From: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Date: Mon, 9 Jun 2008 18:31:25 +0200
Subject: [PATCH] atmel_nand: Work around AT32AP7000 errata

The ALE signal isn't correctly wired up to the ECC controller on the
AP7000, so it starts calculating ECC during the address cycles.

Work around this by resetting the ECC controller between the address and
data cycles.

Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
---
 drivers/mtd/nand/atmel_nand.c |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/atmel_nand.c b/drivers/mtd/nand/atmel_nand.c
index d9f7a5d..b769ef3 100644
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@ -33,6 +33,7 @@
 #include <asm/io.h>
 
 #include <asm/arch/board.h>
+#include <asm/arch/cpu.h>
 
 #ifdef CONFIG_MTD_NAND_ATMEL_ECC_HW
 #define hard_ecc	1
@@ -264,6 +265,19 @@ static int atmel_nand_read_page(struct mtd_info *mtd,
 	uint8_t *ecc_pos;
 	int stat;
 
+	/*
+	 * Errata: ALE is incorrectly wired up to the ECC controller
+	 * on the AP7000, so it will include the address cycles in the
+	 * ECC calculation.
+	 *
+	 * Workaround: Reset the parity registers before reading the
+	 * actual data.
+	 */
+	if (cpu_is_at32ap7000()) {
+		struct atmel_nand_host *host = chip->priv;
+		ecc_writel(host->ecc, CR, ATMEL_ECC_RST);
+	}
+
 	/* read the page */
 	chip->read_buf(mtd, p, eccsize);
 
@@ -377,9 +391,16 @@ static int atmel_nand_correct(struct mtd_info *mtd, u_char *dat,
 }
 
 /*
- * Enable HW ECC : unsused
+ * Enable HW ECC : unused on most chips
  */
-static void atmel_nand_hwctl(struct mtd_info *mtd, int mode) { ; }
+static void atmel_nand_hwctl(struct mtd_info *mtd, int mode)
+{
+	if (cpu_is_at32ap7000()) {
+		struct nand_chip *nand_chip = mtd->priv;
+		struct atmel_nand_host *host = nand_chip->priv;
+		ecc_writel(host->ecc, CR, ATMEL_ECC_RST);
+	}
+}
 
 #ifdef CONFIG_MTD_PARTITIONS
 static const char *part_probes[] = { "cmdlinepart", NULL };
-- 
1.5.5.3


* Re: [patch 2.6.26-rc5-git] at91_nand speedup via {read, write}s{b, w}()
  2008-06-09 11:31 ` Haavard Skinnemoen
  2008-06-09 16:49   ` Haavard Skinnemoen
@ 2008-06-09 17:07   ` David Brownell
  2008-06-09 17:48     ` [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}() Haavard Skinnemoen
  1 sibling, 1 reply; 6+ messages in thread
From: David Brownell @ 2008-06-09 17:07 UTC
  To: Haavard Skinnemoen; +Cc: Nicolas Ferre, linux-mtd, lkml

On Monday 09 June 2008, Haavard Skinnemoen wrote:
> David Brownell <david-b@pacbell.net> wrote:
> > This uses __raw_{read,write}s{b,w}() primitives to access data on NAND
> > chips for more efficient I/O.
> > 
> > On an arm926 with memory clocked at 100 MHz, this reduced the elapsed
> > time for a 64 MByte read by 16%.  ("dd" /dev/mtd0 to /dev/null, with
> > an 8-bit NAND using hardware ECC and 128KB blocksize.)
> 
> Nice. Here are some numbers from my setup (256 MB, 8-bit, software ECC).
> 
> Before:
> real	2m38.131s
> user	0m0.228s
> sys	2m37.740s
> 
> After:
> real	2m27.404s
> user	0m0.180s
> sys	2m27.068s
> 
> which is a 6.8% speedup. I guess hardware ECC helps...

The AVR32 versions of readsb/writesb didn't look to me as if they'd
be quite as fast as the ARM ones either.  If AVR32 has some analogue
of "stmia r1!, {r3 - r6}" for burst 16 byte stores, it's not using
it right now.  (What was the bug you found in its readsb?)

Yes, I'd think the win would be most visible with hardware ECC, since
without it you've still got a second manual scan of each block.  (And
I see you observed this too, after applying a workaround for an ECC
erratum you just learned about...)  My numbers for one pair of trials
(the "16%" was an average of 6 runs) had a *lot* less system time.
Which oddly enough went *up* after the switch to readsb/writesb:

Before:
real    0m24.199s
user    0m0.000s
sys     0m5.630s

After:
real    0m20.226s
user    0m0.010s
sys     0m6.000s

However, the fact that you got a win even with soft ECC (and, I'm
guessing, slower RAM and slower readsb) suggests that this speedup
should be pretty generally applicable!

> though I can't 
> seem to get it to work properly. Is there anything I need to do besides
> flash_eraseall when changing the ECC layout?

I wouldn't know.  Just be sure not to lose all your badblocks data
when you convert ...

> Also, I wonder if we can use the DMA engine framework to get rid of all
> that "sys" time...?

It's another one of those cases where the framework overhead has to be
low enough to make that practical.  Last time I looked, the overhead to
set up and wait for a DMA of a couple KBytes was a significant chunk of
the cost to readsb()/writesb() the same data ... and that's even before
the data starts transferring.

Plus, the MTD layer currently assumes DMA is never used.  Some of the
buffers it passes are not suitable for dma_map_single() since they
come from vmalloc.

> > 	...
> > 	
> > Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
> > ---
> > Yeah, this does make you wonder why the *default* nand r/w code isn't
> > using these primitives; this speedup shouldn't be platform-specific.
> > 
> > Posting this now since I think this should either be incorporated into
> > the new atmel_nand.c code or into drivers/mtd/nand/nand_base.c ...
> > both arm and avr32 support these calls, I'm not sure whether or not
> > some platforms don't support them.
> 
> I'll leave it up to the MTD people to decide whether or not to update
> nand_base.c. Below is your patch rebased onto my patchset. I'll include
> it in my next series after I figure out where to send it.

Sounds fair to me.  Thanks; this has been sitting in my tree for many
months now, I finally made time to measure it and was pleasantly
surprised by the size of the win!

- Dave


* Re: [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}()
  2008-06-09 17:07   ` [patch 2.6.26-rc5-git] at91_nand speedup via {read, write}s{b, w}() David Brownell
@ 2008-06-09 17:48     ` Haavard Skinnemoen
  2008-06-09 18:21       ` [patch 2.6.26-rc5-git] at91_nand speedup via {read, write}s{b, w}() David Brownell
  0 siblings, 1 reply; 6+ messages in thread
From: Haavard Skinnemoen @ 2008-06-09 17:48 UTC
  To: David Brownell; +Cc: Nicolas Ferre, linux-mtd, lkml

David Brownell <david-b@pacbell.net> wrote:
> On Monday 09 June 2008, Haavard Skinnemoen wrote:
> > David Brownell <david-b@pacbell.net> wrote:
> > > This uses __raw_{read,write}s{b,w}() primitives to access data on NAND
> > > chips for more efficient I/O.
> > > 
> > > On an arm926 with memory clocked at 100 MHz, this reduced the elapsed
> > > time for a 64 MByte read by 16%.  ("dd" /dev/mtd0 to /dev/null, with
> > > an 8-bit NAND using hardware ECC and 128KB blocksize.)
> > 
> > Nice. Here are some numbers from my setup (256 MB, 8-bit, software ECC).
> > 
> > Before:
> > real	2m38.131s
> > user	0m0.228s
> > sys	2m37.740s
> > 
> > After:
> > real	2m27.404s
> > user	0m0.180s
> > sys	2m27.068s
> > 
> > which is a 6.8% speedup. I guess hardware ECC helps...
> 
> The AVR32 versions of readsb/writesb didn't look to me as if they'd
> be quite as fast as the ARM ones either.  If AVR32 has some analogue
> of "stmia r1!, {r3 - r6}" for burst 16 byte stores, it's not using
> it right now.  (What was the bug you found in its readsb?)

Note that I'm talking about the __raw_ versions of those, which are a
bit more optimized than the non-raw versions. They do

1:	ldins.b	r8:t, r12[0]
	ldins.b	r8:u, r12[0]
	ldins.b	r8:l, r12[0]
	ldins.b r8:b, r12[0]
	st.w	r11++, r8
	sub	r10, 4
	brge	1b

I don't think we have an instruction that can store multiple registers
to the same address...it would of course be acceptable to store to
incrementing addresses when dealing with NAND flash, but I don't think
it's a good idea in a general __raw_readsb implementation.
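
For anyone who doesn't read AVR32 assembly, the loop above can be
sketched in portable C roughly as follows. This is a userspace
approximation, not the kernel code: mock_port and mock_read8() are
hypothetical stand-ins for repeated reads of the same I/O address, the
byte-wise tail for leftover bytes is an assumption about how the
remainder is handled, and the explicit big-endian store mirrors what
st.w does on a big-endian CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port: every read of the "same address" yields the next byte. */
struct mock_port {
	const uint8_t *data;
	size_t pos;
};

static uint8_t mock_read8(struct mock_port *port)
{
	return port->data[port->pos++];
}

static void sketch_readsb(struct mock_port *port, uint8_t *buf, size_t len)
{
	/* Main loop: four port reads packed into one word, one word store. */
	while (len >= 4) {
		uint32_t w;

		w  = (uint32_t)mock_read8(port) << 24;	/* ldins.b r8:t */
		w |= (uint32_t)mock_read8(port) << 16;	/* ldins.b r8:u */
		w |= (uint32_t)mock_read8(port) << 8;	/* ldins.b r8:l */
		w |= (uint32_t)mock_read8(port);	/* ldins.b r8:b */

		/* st.w r11++, r8 on a big-endian machine */
		buf[0] = (uint8_t)(w >> 24);
		buf[1] = (uint8_t)(w >> 16);
		buf[2] = (uint8_t)(w >> 8);
		buf[3] = (uint8_t)w;

		buf += 4;
		len -= 4;
	}

	/* Assumed tail: any leftover bytes are copied one at a time. */
	while (len-- > 0)
		*buf++ = mock_read8(port);
}
```

The saving is in the stores: one word write to memory per four port
reads, instead of four byte writes.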

Here's the bug I found, btw:

--- a/arch/avr32/lib/io-readsb.S
+++ b/arch/avr32/lib/io-readsb.S
@@ -41,7 +41,7 @@ __raw_readsb:
 2:     sub     r10, -4
        reteq   r12
 
-3:     ld.uh   r8, r12[0]
+3:     ld.ub   r8, r12[0]
        sub     r10, 1
        st.b    r11++, r8
        brne    3b

Not sure how easy it is to trigger since that code is only executed for
odd sizes.

> Yes, I'd think the win would be most visible with hardware ECC, since
> without it you've still got a second manual scan of each block.  (And
> I see you observed this too, after applying a workaround for an ECC
> erratum you just learned about...)  My numbers for one pair of trials
> (the "16%" was an average of 6 runs) had a *lot* less system time.
> Which oddly enough went *up* after the switch to readsb/writesb:
> 
> Before:
> real    0m24.199s
> user    0m0.000s
> sys     0m5.630s
> 
> After:
> real    0m20.226s
> user    0m0.010s
> sys     0m6.000s

Hmm, that's odd. What's the CPU doing during the remaining 14 seconds?
It can't possibly be sleeping?

Ah, it's I/O wait, isn't it? Because you're going through the block
layer?

> However, the fact that you got a win even with soft ECC (and, I'm
> guessing, slower RAM and slower readsb) suggests that this speedup
> should be pretty generally applicable!

Yes, I would think so...although I've seen gcc generate somewhat crappy
code for the I/O accessors, and we do some address mangling in the
non-raw I/O accessors on avr32 which might explain some of the
difference.

> > though I can't 
> > seem to get it to work properly. Is there anything I need to do besides
> > flash_eraseall when changing the ECC layout?
> 
> I wouldn't know.  Just be sure not to lose all your badblocks data
> when you convert ...

Seems like flash_eraseall skips the bad blocks as it should.

> > Also, I wonder if we can use the DMA engine framework to get rid of all
> > that "sys" time...?
> 
> It's another one of those cases where the framework overhead has to be
> low enough to make that practical.  Last time I looked, the overhead to
> set up and wait for a DMA of a couple KBytes was a significant chunk of
> the cost to readsb()/writesb() the same data ... and that's even before
> the data starts transferring.

Right. I guess we should take a look at how to reduce that overhead at
some point...

> Plus, the MTD layer currently assumes DMA is never used.  Some of the
> buffers it passes are not suitable for dma_map_single() since they
> come from vmalloc.

Aw...the MTD layer uses vmalloc() all over the place :-(

> Sounds fair to me.  Thanks; this has been sitting in my tree for many
> months now, I finally made time to measure it and was pleasantly
> surprised by the size of the win!

Yeah...I'm still not sure where to send it though, since it touches
three different subsystems. I can set up a separate tree for it like
I've done a couple of times before...though I'm not sure if anyone ever
pulls it.

Haavard


* Re: [patch 2.6.26-rc5-git] at91_nand speedup via {read, write}s{b, w}()
  2008-06-09 17:48     ` [patch 2.6.26-rc5-git] at91_nand speedup via {read,write}s{b,w}() Haavard Skinnemoen
@ 2008-06-09 18:21       ` David Brownell
  0 siblings, 0 replies; 6+ messages in thread
From: David Brownell @ 2008-06-09 18:21 UTC
  To: Haavard Skinnemoen; +Cc: Nicolas Ferre, linux-mtd, lkml

On Monday 09 June 2008, Haavard Skinnemoen wrote:
> > real    0m20.226s
> > user    0m0.010s
> > sys     0m6.000s
>
> Hmm, that's odd. What's the CPU doing during the remaining 14 seconds?
> It can't possibly be sleeping?
>
> Ah, it's I/O wait, isn't it? Because you're going through the block
> layer?

Some of it is surely data copying, but yes /dev/mtdblock0 might
have something to do with it.  I was puzzled by this too, which
is part of why I quoted only elapsed time.


> Yeah...I'm still not sure where to send it though, since it touches
> three different subsystems. I can set up a separate tree for it like
> I've done a couple of times before...though I'm not sure if anyone ever
> pulls it.

Three subsystems ... you mean, ARM, AVR32, MTD?  If MTD patches
merged more promptly, I'd suggest it goes through there.  Else
maybe you should just get acks from the other maintainers and
push the rename+ directly to Linus once 2.6.27-rc0 starts.

- Dave
