public inbox for u-boot@lists.denx.de
 help / color / mirror / Atom feed
* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
@ 2009-11-25 13:57 Nick Thompson
  2009-12-01  0:55 ` Scott Wood
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Thompson @ 2009-11-25 13:57 UTC (permalink / raw)
  To: u-boot

Improve read performance from Large Page NAND devices.

This patch employs the following concepts to produce a ~37% improvement in
oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page
read is now 260us, 7.88MB/s (was 357us, 5.74MB/s). oob_first is probably
the best case improvement.

Provides a new config option to allow building for large page devices only.
reducing code size by ~800 bytes. [CONFIG_SYS_NAND_NO_SMALL_PAGE]
This almost exactly compensates for the code increase due to other changes.

Removes waiting after read0 and readoob large page cmdfunc calls. This allows
the caller to continue processing in parallel to the NAND cache load.

Provides a large page only nand_wait_cache_load function for detection of end
of cache loading in the NAND after a (possibly much) earlier read request. This
uses chip->dev_ready or NAND STATUS register reads, never falling back to fixed
udelay calls.

Provides a persistent state store over a buffer read cycle, to allow "page
read" functions to control their own read sequencing.

Requires "page read" functions to initiate read transfers themselves and allows
them to pre-request the next page over multiple page reads. NAND cache load can
then proceed in parallel with software operation. nand_do_read_ops maintains a
flag in the state register indicating if the following page can be requested
(and will be read) after the current page read is complete. This flag also
prevents pre-requests on autoinc NAND devices.

Uses random read command, when available, in oob_first to avoid a second read0
by rewinding the cache column counter to 0 instead.

oob_first page read uses a "continuous and in order" OOB ECC assumption for a
3x improvement in ECC extraction time (34us->10us). These seems to be valid for
all current standard OOB definitions, but may not be valid in all cases. This
10us is a still a 3.9% overhead in read times on LP devices.

Optimised continual function call indirections in nand_command_lp.

Signed-off-by: Nick Thompson <nick.thompson@gefanuc.com>
---
Note: For comments only. This patch is not intended to be committed anywhere.
Other files replace some of these functions and require matching modification.

Equivalent changes may also be useful in Linux.

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 426bb95..64e6d19 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -90,6 +90,22 @@
 #define CONFIG_SYS_NAND_RESET_CNT 200000
 #endif
 
+/* NAND Read States */
+#define NAND_RSTATE_INIT	0
+#define NAND_RSTATE_READY	1
+#define NAND_RSTATE_NO_REQ	(1 << 31) /* Don't pre-request next page */
+
+#define nand_rstate_is_init(x)	((x & ~NAND_RSTATE_NO_REQ) == NAND_RSTATE_INIT)
+#define nand_next_page_req(x)	(!(x & NAND_RSTATE_NO_REQ))
+
+#define CONFIG_SYS_NAND_NO_SMALL_PAGE
+/* Is the device a Large Page device? */
+#ifdef CONFIG_SYS_NAND_NO_SMALL_PAGE
+#define nand_is_lp_device(mtd)	(1)
+#else
+#define nand_is_lp_device(mtd)	(mtd->writesize > 512)
+#endif
+
 /* Define default oob placement schemes for large and small page devices */
 static struct nand_ecclayout nand_oob_8 = {
 	.eccbytes = 3,
@@ -143,6 +159,8 @@ static int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
 
 static int nand_wait(struct mtd_info *mtd, struct nand_chip *this);
 
+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *this);
+
 /*
  * For devices which display every fart in the system on a separate LED. Is
  * compiled away when LED support is disabled.
@@ -385,6 +403,8 @@ static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip)
 	if (chip->options & NAND_BUSWIDTH_16) {
 		chip->cmdfunc(mtd, NAND_CMD_READOOB, chip->badblockpos & 0xFE,
 			      page);
+		if (nand_is_lp_device(mtd))
+			nand_wait_cache_load(mtd, chip);
 		bad = cpu_to_le16(chip->read_word(mtd));
 		if (chip->badblockpos & 0x1)
 			bad >>= 8;
@@ -392,6 +412,8 @@ static int nand_block_bad(struct mtd_info *mtd, loff_t ofs, int getchip)
 			res = 1;
 	} else {
 		chip->cmdfunc(mtd, NAND_CMD_READOOB, chip->badblockpos, page);
+		if (nand_is_lp_device(mtd))
+			nand_wait_cache_load(mtd, chip);
 		if (chip->read_byte(mtd) != 0xff)
 			res = 1;
 	}
@@ -644,6 +666,8 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 {
 	register struct nand_chip *chip = mtd->priv;
 	uint32_t rst_sts_cnt = CONFIG_SYS_NAND_RESET_CNT;
+	void (*cmd_ctrl)(struct mtd_info *mtd, int cmd, unsigned int ctrl);
+	cmd_ctrl = chip->cmd_ctrl;
 
 	/* Emulate NAND_CMD_READOOB */
 	if (command == NAND_CMD_READOOB) {
@@ -663,21 +687,21 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 			/* Adjust columns for 16 bit buswidth */
 			if (chip->options & NAND_BUSWIDTH_16)
 				column >>= 1;
-			chip->cmd_ctrl(mtd, column, ctrl);
+			cmd_ctrl(mtd, column, ctrl);
 			ctrl &= ~NAND_CTRL_CHANGE;
-			chip->cmd_ctrl(mtd, column >> 8, ctrl);
+			cmd_ctrl(mtd, column >> 8, ctrl);
 		}
 		if (page_addr != -1) {
-			chip->cmd_ctrl(mtd, page_addr, ctrl);
-			chip->cmd_ctrl(mtd, page_addr >> 8,
+			cmd_ctrl(mtd, page_addr, ctrl);
+			cmd_ctrl(mtd, page_addr >> 8,
 				       NAND_NCE | NAND_ALE);
 			/* One more address cycle for devices > 128MiB */
 			if (chip->chipsize > (128 << 20))
-				chip->cmd_ctrl(mtd, page_addr >> 16,
-					       NAND_NCE | NAND_ALE);
+				cmd_ctrl(mtd, page_addr >> 16,
+					 NAND_NCE | NAND_ALE);
 		}
 	}
-	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);
+	cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE);
 
 	/*
 	 * program and erase have their own busy handlers
@@ -710,29 +734,28 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command,
 		if (chip->dev_ready)
 			break;
 		udelay(chip->chip_delay);
-		chip->cmd_ctrl(mtd, NAND_CMD_STATUS,
-			       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
-		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
-			       NAND_NCE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_STATUS,
+			 NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_NONE,
+			 NAND_NCE | NAND_CTRL_CHANGE);
 		while (!(chip->read_byte(mtd) & NAND_STATUS_READY) &&
 			(rst_sts_cnt--));
 		return;
 
 	case NAND_CMD_RNDOUT:
 		/* No ready / busy check necessary */
-		chip->cmd_ctrl(mtd, NAND_CMD_RNDOUTSTART,
-			       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
-		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
-			       NAND_NCE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_RNDOUTSTART,
+			 NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_NONE,
+			 NAND_NCE | NAND_CTRL_CHANGE);
 		return;
 
 	case NAND_CMD_READ0:
-		chip->cmd_ctrl(mtd, NAND_CMD_READSTART,
-			       NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
-		chip->cmd_ctrl(mtd, NAND_CMD_NONE,
-			       NAND_NCE | NAND_CTRL_CHANGE);
-
-		/* This applies to read commands */
+		cmd_ctrl(mtd, NAND_CMD_READSTART,
+			 NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE);
+		cmd_ctrl(mtd, NAND_CMD_NONE,
+			 NAND_NCE | NAND_CTRL_CHANGE);
+		return;
 	default:
 		/*
 		 * If we don't have access to the busy pin, we apply the given
@@ -863,6 +886,15 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this)
 	else
 		this->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
 
+	/* The chip might be ready by now, don't lose anymore time */
+	if (this->dev_ready) {
+		if (this->dev_ready(mtd))
+			goto ready;
+	} else {
+		if (this->read_byte(mtd) & NAND_STATUS_READY)
+			goto ready;
+	}
+
 	reset_timer();
 
 	while (1) {
@@ -879,6 +911,8 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this)
 				break;
 		}
 	}
+
+ready:
 #ifdef PPCHAMELON_NAND_TIMER_HACK
 	reset_timer();
 	while (get_timer(0) < 10);
@@ -889,16 +923,45 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *this)
 #endif
 
 /**
+ * Wait for cache ready after read request.
+ *
+ * Returns to read state before returning.
+ *
+ * @mtd:	mtd info structure
+ * @chip:	nand chip info structure
+ */
+static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
+{
+	int state = nand_wait(mtd, chip);
+	chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CLE | NAND_CTRL_CLE |
+						NAND_CTRL_CHANGE);
+	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_CLE | NAND_CTRL_CHANGE);
+	return state;
+}
+
+/**
  * nand_read_page_raw - [Intern] read raw page data without ecc
  * @mtd:	mtd info structure
  * @chip:	nand chip info structure
  * @buf:	buffer to store read data
  */
 static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
-			      uint8_t *buf, int page)
+			      uint8_t *buf, int page, uint32_t *rstate)
 {
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		*rstate = NAND_RSTATE_READY;
+	}
+
+	if (nand_is_lp_device(mtd))
+		nand_wait_cache_load(mtd, chip);
+
 	chip->read_buf(mtd, buf, mtd->writesize);
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+
+	if (nand_next_page_req(*rstate))
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
+
 	return 0;
 }
 
@@ -909,7 +972,7 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
  * @buf:	buffer to store read data
  */
 static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf, int page)
+				uint8_t *buf, int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -919,7 +982,7 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_code = chip->buffers->ecccode;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
 
-	chip->ecc.read_page_raw(mtd, chip, buf, page);
+	chip->ecc.read_page_raw(mtd, chip, buf, page, rstate);
 
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize)
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
@@ -950,7 +1013,9 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
  * @readlen	data length
  * @buf:	buffer to store read data
  */
-static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint32_t data_offs, uint32_t readlen, uint8_t *bufpoi)
+static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip,
+			     uint32_t data_offs, uint32_t readlen,
+			     uint8_t *bufpoi, int page, uint32_t *rstate)
 {
 	int start_step, end_step, num_steps;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
@@ -959,6 +1024,11 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
 	int datafrag_len, eccfrag_len, aligned_len, aligned_pos;
 	int busw = (chip->options & NAND_BUSWIDTH_16) ? 2 : 1;
 
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		*rstate = NAND_RSTATE_READY;
+	}
+
 	/* Column address wihin the page aligned to ECC size (256bytes). */
 	start_step = data_offs / chip->ecc.size;
 	end_step = (data_offs + readlen - 1) / chip->ecc.size;
@@ -969,6 +1039,10 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
 	eccfrag_len = num_steps * chip->ecc.bytes;
 
 	data_col_addr = start_step * chip->ecc.size;
+
+	if (nand_is_lp_device(mtd))
+		nand_wait_cache_load(mtd, chip);
+
 	/* If we read not a page aligned data */
 	if (data_col_addr != 0)
 		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, data_col_addr, -1);
@@ -1007,6 +1081,9 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
 		chip->read_buf(mtd, &chip->oob_poi[aligned_pos], aligned_len);
 	}
 
+	if (nand_next_page_req(*rstate))
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
+
 	for (i = 0; i < eccfrag_len; i++)
 		chip->buffers->ecccode[i] = chip->oob_poi[eccpos[i + start_step * chip->ecc.bytes]];
 
@@ -1032,7 +1109,7 @@ static int nand_read_subpage(struct mtd_info *mtd, struct nand_chip *chip, uint3
  * Not for syndrome calculating ecc controllers which need a special oob layout
  */
 static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
-				uint8_t *buf, int page)
+				uint8_t *buf, int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -1042,6 +1119,14 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *ecc_code = chip->buffers->ecccode;
 	uint32_t *eccpos = chip->ecc.layout->eccpos;
 
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		*rstate = NAND_RSTATE_READY;
+	}
+
+	if (nand_is_lp_device(mtd))
+		nand_wait_cache_load(mtd, chip);
+
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		chip->ecc.hwctl(mtd, NAND_ECC_READ);
 		chip->read_buf(mtd, p, eccsize);
@@ -1049,6 +1134,9 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 	}
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
 
+	if (nand_next_page_req(*rstate))
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
+
 	for (i = 0; i < chip->ecc.total; i++)
 		ecc_code[i] = chip->oob_poi[eccpos[i]];
 
@@ -1081,29 +1169,52 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
  * overwriting the NAND manufacturer bad block markings.
  */
 static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
-	struct nand_chip *chip, uint8_t *buf, int page)
+					  struct nand_chip *chip, uint8_t *buf,
+					  int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
 	int eccsteps = chip->ecc.steps;
 	uint8_t *p = buf;
 	uint8_t *ecc_code = chip->buffers->ecccode;
-	uint32_t *eccpos = chip->ecc.layout->eccpos;
 	uint8_t *ecc_calc = chip->buffers->ecccalc;
+	uint8_t * const oob_poi = chip->oob_poi;
+	uint8_t *ecc_p;
+	uint32_t eccpos;
+	const int lp_device = nand_is_lp_device(mtd);
 
-	/* Read the OOB area first */
-	chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
-	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
-	chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
+		*rstate = NAND_RSTATE_READY;
+	}
 
-	for (i = 0; i < chip->ecc.total; i++)
-		ecc_code[i] = chip->oob_poi[eccpos[i]];
+	if (lp_device)
+		nand_wait_cache_load(mtd, chip);
+
+	chip->read_buf(mtd, oob_poi, mtd->oobsize);
+
+	/* Read from start of page  */
+	if (lp_device)
+		chip->cmdfunc(mtd, NAND_CMD_RNDOUT, 0, -1);
+	else
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+
+	/* extract ECC codes while we wait */
+	ecc_p = ecc_code;
+	eccpos = chip->ecc.layout->eccpos[0];
+	for (i = chip->ecc.total; i > 0; i--)
+		*ecc_p++ = oob_poi[eccpos++];
 
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		int stat;
 
 		chip->ecc.hwctl(mtd, NAND_ECC_READ);
 		chip->read_buf(mtd, p, eccsize);
+
+		/* kick off new read if next page required */
+		if (eccsteps == 1 && nand_next_page_req(*rstate))
+			chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page+1);
+
 		chip->ecc.calculate(mtd, p, &ecc_calc[i]);
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], NULL);
@@ -1125,7 +1236,7 @@ static int nand_read_page_hwecc_oob_first(struct mtd_info *mtd,
  * we need a special oob layout and handling.
  */
 static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
-				   uint8_t *buf, int page)
+				   uint8_t *buf, int page, uint32_t *rstate)
 {
 	int i, eccsize = chip->ecc.size;
 	int eccbytes = chip->ecc.bytes;
@@ -1133,6 +1244,14 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	uint8_t *p = buf;
 	uint8_t *oob = chip->oob_poi;
 
+	if (nand_rstate_is_init(*rstate)) {
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+		*rstate = NAND_RSTATE_READY;
+	}
+
+	if (nand_is_lp_device(mtd))
+		nand_wait_cache_load(mtd, chip);
+
 	for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
 		int stat;
 
@@ -1166,6 +1285,9 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	if (i)
 		chip->read_buf(mtd, oob, i);
 
+	if (nand_next_page_req(*rstate))
+		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
+
 	return 0;
 }
 
@@ -1233,11 +1355,11 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 	struct nand_chip *chip = mtd->priv;
 	struct mtd_ecc_stats stats;
 	int blkcheck = (1 << (chip->phys_erase_shift - chip->page_shift)) - 1;
-	int sndcmd = 1;
 	int ret = 0;
 	uint32_t readlen = ops->len;
 	uint32_t oobreadlen = ops->ooblen;
 	uint8_t *bufpoi, *oob, *buf;
+	uint32_t rstate = NAND_RSTATE_INIT;
 
 	stats = mtd->ecc_stats;
 
@@ -1260,20 +1382,33 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 		if (realpage != chip->pagebuf || oob) {
 			bufpoi = aligned ? buf : chip->buffers->databuf;
 
-			if (likely(sndcmd)) {
-				chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
-				sndcmd = 0;
+			/*
+			 * Autoinc devices automatically fetch the next page,
+			 * but can't increment over a block boundary
+			 */
+			if (NAND_CANAUTOINCR(chip)) {
+				if (((page + 1) & blkcheck) != 0)
+					rstate |= NAND_RSTATE_NO_REQ;
+				else
+					rstate &= ~NAND_RSTATE_NO_REQ;
 			}
 
+			/* Last read from this chip? */
+			if (((readlen - bytes) == 0) ||
+			    (((realpage + 1) & (chip->pagemask)) == 0))
+				rstate |= NAND_RSTATE_NO_REQ;
+
 			/* Now read the page into the buffer */
 			if (unlikely(ops->mode == MTD_OOB_RAW))
 				ret = chip->ecc.read_page_raw(mtd, chip,
-						bufpoi, page);
+						bufpoi, page, &rstate);
 			else if (!aligned && NAND_SUBPAGE_READ(chip) && !oob)
-				ret = chip->ecc.read_subpage(mtd, chip, col, bytes, bufpoi);
+				ret = chip->ecc.read_subpage(mtd, chip, col,
+							     bytes, bufpoi,
+							     page, &rstate);
 			else
 				ret = chip->ecc.read_page(mtd, chip, bufpoi,
-						page);
+							  page, &rstate);
 			if (ret < 0)
 				break;
 
@@ -1336,12 +1471,6 @@ static int nand_do_read_ops(struct mtd_info *mtd, loff_t from,
 			chip->select_chip(mtd, -1);
 			chip->select_chip(mtd, chipnr);
 		}
-
-		/* Check, if the chip supports auto page increment
-		 * or if we have hit a block boundary.
-		 */
-		if (!NAND_CANAUTOINCR(chip) || !(page & blkcheck))
-			sndcmd = 1;
 	}
 
 	ops->retlen = ops->len - (size_t) readlen;
@@ -1406,6 +1535,8 @@ static int nand_read_oob_std(struct mtd_info *mtd, struct nand_chip *chip,
 {
 	if (sndcmd) {
 		chip->cmdfunc(mtd, NAND_CMD_READOOB, 0, page);
+		if (nand_is_lp_device(mtd))
+			nand_wait_cache_load(mtd, chip);
 		sndcmd = 0;
 	}
 	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
@@ -1434,12 +1565,15 @@ static int nand_read_oob_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 	for (i = 0; i < chip->ecc.steps; i++) {
 		if (sndrnd) {
 			pos = eccsize + i * (eccsize + chunk);
-			if (mtd->writesize > 512)
+			if (nand_is_lp_device(mtd))
 				chip->cmdfunc(mtd, NAND_CMD_RNDOUT, pos, -1);
 			else
 				chip->cmdfunc(mtd, NAND_CMD_READ0, pos, page);
-		} else
+		} else {
 			sndrnd = 1;
+			if (nand_is_lp_device(mtd))
+				nand_wait_cache_load(mtd, chip);
+		}
 		toread = min_t(int, length, chunk);
 		chip->read_buf(mtd, bufpoi, toread);
 		bufpoi += toread;
@@ -1503,7 +1637,7 @@ static int nand_write_oob_syndrome(struct mtd_info *mtd,
 	chip->cmdfunc(mtd, NAND_CMD_SEQIN, pos, page);
 	for (i = 0; i < steps; i++) {
 		if (sndcmd) {
-			if (mtd->writesize <= 512) {
+			if (!nand_is_lp_device(mtd)) {
 				uint32_t fill = 0xFFFFFFFF;
 
 				len = eccsize;
@@ -1832,6 +1966,8 @@ static int nand_write_page(struct mtd_info *mtd, struct nand_chip *chip,
 #ifdef CONFIG_MTD_NAND_VERIFY_WRITE
 	/* Send command to read back the data */
 	chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+	if (nand_is_lp_device(mtd))
+		nand_wait_cache_load(mtd, chip);
 
 	if (chip->verify_buf(mtd, buf, mtd->writesize))
 		return -EIO;
@@ -2468,7 +2604,11 @@ static void nand_set_defaults(struct nand_chip *chip, int busw)
 
 	/* check, if a user supplied command function given */
 	if (chip->cmdfunc == NULL)
+#ifdef CONFIG_SYS_NAND_NO_SMALL_PAGE
+		chip->cmdfunc = nand_command_lp;
+#else
 		chip->cmdfunc = nand_command;
+#endif
 
 	/* check, if a user supplied wait function given */
 	if (chip->waitfunc == NULL)
@@ -2627,7 +2767,7 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 	chip->chip_shift = ffs(chip->chipsize) - 1;
 
 	/* Set the bad block position */
-	chip->badblockpos = mtd->writesize > 512 ?
+	chip->badblockpos = nand_is_lp_device(mtd) ?
 		NAND_LARGE_BADBLOCK_POS : NAND_SMALL_BADBLOCK_POS;
 
 	/* Get chip options, preserve non chip based options */
@@ -2651,9 +2791,14 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 	else
 		chip->erase_cmd = single_erase_cmd;
 
+#ifdef CONFIG_SYS_NAND_NO_SMALL_PAGE
+	if (!nand_is_lp_device(mtd))
+		return ERR_PTR(-ENODEV);
+#else
 	/* Do not replace user supplied command function ! */
-	if (mtd->writesize > 512 && chip->cmdfunc == nand_command)
+	if (nand_is_lp_device(mtd) && chip->cmdfunc == nand_command)
 		chip->cmdfunc = nand_command_lp;
+#endif
 
 	MTDDEBUG (MTD_DEBUG_LEVEL0, "NAND device: Manufacturer ID:"
 	          " 0x%02x, Chip ID: 0x%02x (%s %s)\n", *maf_id, dev_id,
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index cb7c19a..85b7c3c 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -269,17 +269,20 @@ struct nand_ecc_ctrl {
 					   uint8_t *calc_ecc);
 	int			(*read_page_raw)(struct mtd_info *mtd,
 						 struct nand_chip *chip,
-						 uint8_t *buf, int page);
+						 uint8_t *buf, int page,
+						 uint32_t *rstate);
 	void			(*write_page_raw)(struct mtd_info *mtd,
 						  struct nand_chip *chip,
 						  const uint8_t *buf);
 	int			(*read_page)(struct mtd_info *mtd,
 					     struct nand_chip *chip,
-					     uint8_t *buf, int page);
+					     uint8_t *buf, int page,
+					     uint32_t *rstate);
 	int			(*read_subpage)(struct mtd_info *mtd,
 					     struct nand_chip *chip,
 					     uint32_t offs, uint32_t len,
-					     uint8_t *buf);
+					     uint8_t *buf, int page,
+					     uint32_t *rstate);
 	void			(*write_page)(struct mtd_info *mtd,
 					      struct nand_chip *chip,
 					      const uint8_t *buf);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-11-25 13:57 Nick Thompson
@ 2009-12-01  0:55 ` Scott Wood
  2009-12-01 10:13   ` Nick Thompson
  0 siblings, 1 reply; 14+ messages in thread
From: Scott Wood @ 2009-12-01  0:55 UTC (permalink / raw)
  To: u-boot

Nick Thompson wrote:
> Improve read performance from Large Page NAND devices.
> 
> This patch employs the following concepts to produce a ~37% improvement in
> oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page
> read is now 260us, 7.88MB/s (was 357us, 5.74MB/s). oob_first is probably
> the best case improvement.
> 
> Provides a new config option to allow building for large page devices only.
> reducing code size by ~800 bytes. [CONFIG_SYS_NAND_NO_SMALL_PAGE]
> This almost exactly compensates for the code increase due to other changes.

Could we make it more orthogonal?  I.e. CONFIG_NAND_512B_PAGE, 
CONFIG_NAND_2K_PAGE, CONFIG_NAND_4K_PAGE?  As is, it does nothing to 
keep things from growing for small-page-only boards.

As it would determine what support is present rather than what the 
hardware actually is, I don't think it would go in CONFIG_SYS.

>  
> +	/* The chip might be ready by now, don't lose anymore time */
> +	if (this->dev_ready) {
> +		if (this->dev_ready(mtd))
> +			goto ready;
> +	} else {
> +		if (this->read_byte(mtd) & NAND_STATUS_READY)
> +			goto ready;
> +	}

Does it really take a noticeable amount of time to do reset_timer() and 
get_timer() once?

> + * Wait for cache ready after read request.
> + *
> + * Returns to read state before returning.
> + *
> + * @mtd:	mtd info structure
> + * @chip:	nand chip info structure
> + */
> +static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
> +{
> +	int state = nand_wait(mtd, chip);
> +	chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CLE | NAND_CTRL_CLE |
> +						NAND_CTRL_CHANGE);

NAND_CTRL_CLE includes NAND_CLE.

Why nand_wait() before READSTART?  The existing nand_command_lp() 
doesn't do this AFAICT.

This change will break drivers that support large page and use the 
default read_page functions, but do not implement cmd_ctrl (they replace 
cmdfunc instead).  This includes fsl_elbc_nand, mxc_nand, and 
mpc5121_nfc.  While I'd like to move them to implementing their own 
read_page-type functions instead of cmdfunc, is there any way to make it 
a smoother transition?

> +	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_CLE | NAND_CTRL_CHANGE);

Shouldn't this be NAND_NCE | NAND_CTRL_CHANGE?  Don't we want to drop 
CLE here?

> +
> +	if (nand_next_page_req(*rstate))
> +		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);

Spaces around binary operators.

> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
> index cb7c19a..85b7c3c 100644
> --- a/include/linux/mtd/nand.h
> +++ b/include/linux/mtd/nand.h
> @@ -269,17 +269,20 @@ struct nand_ecc_ctrl {
>  					   uint8_t *calc_ecc);
>  	int			(*read_page_raw)(struct mtd_info *mtd,
>  						 struct nand_chip *chip,
> -						 uint8_t *buf, int page);
> +						 uint8_t *buf, int page,
> +						 uint32_t *rstate);
>  	void			(*write_page_raw)(struct mtd_info *mtd,
>  						  struct nand_chip *chip,
>  						  const uint8_t *buf);
>  	int			(*read_page)(struct mtd_info *mtd,
>  					     struct nand_chip *chip,
> -					     uint8_t *buf, int page);
> +					     uint8_t *buf, int page,
> +					     uint32_t *rstate);
>  	int			(*read_subpage)(struct mtd_info *mtd,
>  					     struct nand_chip *chip,
>  					     uint32_t offs, uint32_t len,
> -					     uint8_t *buf);
> +					     uint8_t *buf, int page,
> +					     uint32_t *rstate);

Does rstate really need to be a parameter, or could it be part of the 
mtd struct?  I'd really like nand_do_read_ops() to not have to know 
about any of this, and have it be internal to the read_page functions -- 
other than perhaps clearing the value on exit, or some other way to let 
read_page know that its context has changed.

If we need to communicate to the read_page function whether this is the 
last page, then that can be a separate flag that isn't tied up with what 
the hardware is capable of, or whether a boundary is being spanned.

-Scott

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-01  0:55 ` Scott Wood
@ 2009-12-01 10:13   ` Nick Thompson
  2009-12-01 11:34     ` Nick Thompson
  2009-12-01 18:38     ` Scott Wood
  0 siblings, 2 replies; 14+ messages in thread
From: Nick Thompson @ 2009-12-01 10:13 UTC (permalink / raw)
  To: u-boot

On 01/12/09 00:55, Scott Wood wrote:
> Nick Thompson wrote:
>> Improve read performance from Large Page NAND devices.
>>
>> This patch employs the following concepts to produce a ~37% improvement in
>> oob_first read speed (on a 300MHz ARM9). The time for a mid-buffer 2k page
>> read is now 260us, 7.88MB/s (was 357us, 5.74MB/s). oob_first is probably
>> the best case improvement.
>>
>> Provides a new config option to allow building for large page devices only.
>> reducing code size by ~800 bytes. [CONFIG_SYS_NAND_NO_SMALL_PAGE]
>> This almost exactly compensates for the code increase due to other changes.
> 
> Could we make it more orthogonal?  I.e. CONFIG_NAND_512B_PAGE, 
> CONFIG_NAND_2K_PAGE, CONFIG_NAND_4K_PAGE?  As is, it does nothing to 
> keep things from growing for small-page-only boards.

I guess that would be possible. The proposed usage is to determine
if the incompatible small page command set is supported as well as
the ONFI large page command set. So it excludes nand_command, which
should now have an unused attribute, and optimises away conditionals
in many other functions.

It's not currently there to decided what pages sizes are supported.

I could put in a CONFIG_NAND_NO_LARGE_PAGE as well, but since small
page devices are EOL now I was only looking to make it easier to
remove legacy code (a bit like the Museum IDs in nand_ids.c).

> 
> As it would determine what support is present rather than what the 
> hardware actually is, I don't think it would go in CONFIG_SYS.

Right, that's fine.

>> +	/* The chip might be ready by now, don't lose anymore time */
>> +	if (this->dev_ready) {
>> +		if (this->dev_ready(mtd))
>> +			goto ready;
>> +	} else {
>> +		if (this->read_byte(mtd) & NAND_STATUS_READY)
>> +			goto ready;
>> +	}
> 
> Does it really take a noticeable amount of time to do reset_timer() and 
> get_timer() once?

On ARM the timer functions use divides to adjust the timer values.
it takes several microseconds longer to spot the ready status when
the chip is in fact already ready. It's not major, but it is easily
measurable.

> 
>> + * Wait for cache ready after read request.
>> + *
>> + * Returns to read state before returning.
>> + *
>> + * @mtd:	mtd info structure
>> + * @chip:	nand chip info structure
>> + */
>> +static int nand_wait_cache_load(struct mtd_info *mtd, struct nand_chip *chip)
>> +{
>> +	int state = nand_wait(mtd, chip);
>> +	chip->cmd_ctrl(mtd, NAND_CMD_READSTART, NAND_CLE | NAND_CTRL_CLE |
>> +						NAND_CTRL_CHANGE);
> 
> NAND_CTRL_CLE includes NAND_CLE.

Yes, will fix.

> 
> Why nand_wait() before READSTART?  The existing nand_command_lp() 
> doesn't do this AFAICT.

In fact I'm only after the functionality in nand_wait. nand_wait always
sends a "read status" command. To get out of read status state you have
to send a new command and, if waiting for a read to complete, the obvious
command is read start as this puts the chip back into read state.

The full sequence is "read", "read start", (chip goes busy), "read
status", (polling status reads...), (chip goes ready), "read start",
(data reads...).

Currently nand_command_lp always uses a the "ready busy" read function, or
a timer if the read busy pin is not accessible. Using a timer is out of the
question here, since (we are hoping) significant time has already elapsed.

> This change will break drivers that support large page and use the 
> default read_page functions, but do not implement cmd_ctrl (they replace 
> cmdfunc instead).  This includes fsl_elbc_nand, mxc_nand, and 
> mpc5121_nfc.  While I'd like to move them to implementing their own 
> read_page-type functions instead of cmdfunc, is there any way to make it 
> a smoother transition?

Yes, as it stands they would need modifying simultaneously and I have no
way to test such a change myself. The only required change in cmdfunc is
not to wait after a read0 request. You maybe in a better position to decide
if this has wider repercussions, but I will take a look at the above
drivers as well. [This is the main reason I made this an RFC].

> 
>> +	chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_CLE | NAND_CTRL_CHANGE);
> 
> Shouldn't this be NAND_NCE | NAND_CTRL_CHANGE?  Don't we want to drop 
> CLE here?

You are right, will fix.

> 
>> +
>> +	if (nand_next_page_req(*rstate))
>> +		chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page+1);
> 
> Spaces around binary operators.

Okay.

> 
>> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
>> index cb7c19a..85b7c3c 100644
>> --- a/include/linux/mtd/nand.h
>> +++ b/include/linux/mtd/nand.h
>> @@ -269,17 +269,20 @@ struct nand_ecc_ctrl {
>>  					   uint8_t *calc_ecc);
>>  	int			(*read_page_raw)(struct mtd_info *mtd,
>>  						 struct nand_chip *chip,
>> -						 uint8_t *buf, int page);
>> +						 uint8_t *buf, int page,
>> +						 uint32_t *rstate);
>>  	void			(*write_page_raw)(struct mtd_info *mtd,
>>  						  struct nand_chip *chip,
>>  						  const uint8_t *buf);
>>  	int			(*read_page)(struct mtd_info *mtd,
>>  					     struct nand_chip *chip,
>> -					     uint8_t *buf, int page);
>> +					     uint8_t *buf, int page,
>> +					     uint32_t *rstate);
>>  	int			(*read_subpage)(struct mtd_info *mtd,
>>  					     struct nand_chip *chip,
>>  					     uint32_t offs, uint32_t len,
>> -					     uint8_t *buf);
>> +					     uint8_t *buf, int page,
>> +					     uint32_t *rstate);
> 
> Does rstate really need to be a parameter, or could it be part of the 
> mtd struct?  I'd really like nand_do_read_ops() to not have to know 
> about any of this, and have it be internal to the read_page functions -- 
> other than perhaps clearing the value on exit, or some other way to let 
> read_page know that its context has changed.
> 
> If we need to communicate to the read_page function whether this is the 
> last page, then that can be a separate flag that isn't tied up with what 
> the hardware is capable of, or whether a boundary is being spanned.

Only do_read_ops knows if this is the last page read, so that does need
to be passed to read page. I didn't want to add such low level stuff into
mtd struct as it is very transitory information and bloats the struct.

The page read function also needs to know if this is the first page read.
Currently rstate's main use is this first (INIT) and/or last (NO_REQ)
state.

The page read functions can handle the boundaries and autoinc status, but
only at the expense of repeating the same tests in each read page function.
I'm okay with that, if its acceptable. It will make the code slightly
larger, but I don't really like the way I'm forcing a "last page" state to
avoid sending reads in the autoinc case as it is masking the true purpose
of the flag.

Thanks,
Nick.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-01 10:13   ` Nick Thompson
@ 2009-12-01 11:34     ` Nick Thompson
  2009-12-01 18:43       ` Scott Wood
  2009-12-01 18:38     ` Scott Wood
  1 sibling, 1 reply; 14+ messages in thread
From: Nick Thompson @ 2009-12-01 11:34 UTC (permalink / raw)
  To: u-boot

On 01/12/09 10:13, Nick Thompson wrote:
> On 01/12/09 00:55, Scott Wood wrote:

>> This change will break drivers that support large page and use the 
>> default read_page functions, but do not implement cmd_ctrl (they replace 
>> cmdfunc instead).  This includes fsl_elbc_nand, mxc_nand, and 
>> mpc5121_nfc.  While I'd like to move them to implementing their own 
>> read_page-type functions instead of cmdfunc, is there any way to make it 
>> a smoother transition?
> 
> Yes, as it stands they would need modifying simultaneously and I have no
> way to test such a change myself. The only required change in cmdfunc is
> not to wait after a read0 request. You maybe in a better position to decide
> if this has wider repercussions, but I will take a look at the above
> drivers as well. [This is the main reason I made this an RFC].

How about, if nand_wait_cache_load was replaceable (by a no-op in your case)
and the pre-fetch optimisation could be disabled by setting rstate to 
(INIT | NO_REQ) on every page read function call #ifdef
CONFIG_NAND_NO_PREFETCH_READS?

This leaves a problem with NAND_CMD_RNDOUT which is used by oob_first
page reads but not supported by fsl_elbc_cmdfunc. I expect you don't use
oob_first though..?

I believe this would allow you to restore the original sequences and keep you
going until you can define your own page read functions.

[BTW these changes applied quite cleanly to my Linux tree and give similar
performance gains there as well. If we can reach agreement here, I will
make patches for Linux as well. Unfortunately, my tree is 2.6.18 with
backported NAND support from 2.6.32rc1, but I can deal with that.]

Nick.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-01 10:13   ` Nick Thompson
  2009-12-01 11:34     ` Nick Thompson
@ 2009-12-01 18:38     ` Scott Wood
  1 sibling, 0 replies; 14+ messages in thread
From: Scott Wood @ 2009-12-01 18:38 UTC (permalink / raw)
  To: u-boot

Nick Thompson wrote:
> I could put in a CONFIG_NAND_NO_LARGE_PAGE as well, but since small
> page devices are EOL now I was only looking to make it easier to
> remove legacy code (a bit like the Museum IDs in nand_ids.c).

I don't think small page support is going away any time soon.

>> Does it really take a noticeable amount of time to do reset_timer() and 
>> get_timer() once?
> 
> On ARM the timer functions use divides to adjust the timer values.

Yay mandatory 1000 Hz clock rate. :-P

Does ARM have a 64-bit multiply that can be used to speed up 
division-by-constant?  If not, can the division be pulled out of 
reset_timer by doing (new - old) / rate instead of new/rate - old/rate?

I assume it's OK for reset_timer to reset completely to zero, including 
the remainder.

> it takes several microseconds longer to spot the ready status when
> the chip is in fact already ready. It's not major, but it is easily
> measurable.

Wow, I know divide is expensive, but several microseconds?

>> Why nand_wait() before READSTART?  The existing nand_command_lp() 
>> doesn't do this AFAICT.
> 
> In fact I'm only after the functionality in nand_wait. nand_wait always
> sends a "read status" command. To get out of read status state you have
> to send a new command and, if waiting for a read to complete, the obvious
> command is read start as this puts the chip back into read state.
> 
> The full sequence is "read", "read start", (chip goes busy), "read
> status", (polling status reads...), (chip goes ready), "read start",
> (data reads...).

Ah, I see.  So this is specifically to help the case where you don't 
have access to the busy pin?  I'm wondering if it will slow things down 
when the busy pin is hooked up, versus not having to issue the extra 
commands and read the status.

>> If we need to communicate to the read_page function whether this is the 
>> last page, then that can be a separate flag that isn't tied up with what 
>> the hardware is capable of, or whether a boundary is being spanned.
> 
> Only do_read_ops knows if this is the last page read, so that does need
> to be passed to read page.

Right, the "if" was wondering what the impact would be of just 
unconditionally fetching the next page.

> I didn't want to add such low level stuff into
> mtd struct as it is very transitory information and bloats the struct.

It's internal state of functions supplied by the generic code; that 
seems the place for such things.

Though I don't really care that much, as long as nand_do_read_ops() 
doesn't have to manage the state beyond saying "start" and "end".

> The page read function also needs to know if this is the first page read.

Clearing the state on entry to the read ops function would allow that, 
with only minor dictation of how read_page uses the field.

> The page read functions can handle the boundaries and autoinc status, but
> only at the expense of repeating the same tests in each read page function.

The tests can be factored out into helper functions -- but it should be 
the replaceable read_page functions that contain such low-level 
knowledge.  nand_do_read_ops() should just be a high-level loop around 
read_page(), plus the OOB and partial page shuffling.

-Scott

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-01 11:34     ` Nick Thompson
@ 2009-12-01 18:43       ` Scott Wood
  0 siblings, 0 replies; 14+ messages in thread
From: Scott Wood @ 2009-12-01 18:43 UTC (permalink / raw)
  To: u-boot

Nick Thompson wrote:
> On 01/12/09 10:13, Nick Thompson wrote:
>> On 01/12/09 00:55, Scott Wood wrote:
> 
>>> This change will break drivers that support large page and use the 
>>> default read_page functions, but do not implement cmd_ctrl (they replace 
>>> cmdfunc instead).  This includes fsl_elbc_nand, mxc_nand, and 
>>> mpc5121_nfc.  While I'd like to move them to implementing their own 
>>> read_page-type functions instead of cmdfunc, is there any way to make it 
>>> a smoother transition?
>> Yes, as it stands they would need modifying simultaneously and I have no
>> way to test such a change myself. The only required change in cmdfunc is
>> not to wait after a read0 request. You maybe in a better position to decide
>> if this has wider repercussions, but I will take a look at the above
>> drivers as well. [This is the main reason I made this an RFC].
> 
> How about, if nand_wait_cache_load was replaceable (by a no-op in your case)

Or it could be off by default, and enabled only on those platforms where 
it works and is beneficial.

> and the pre-fetch optimisation could be disabled by setting rstate to 
> (INIT | NO_REQ) on every page read function call #ifdef
> CONFIG_NAND_NO_PREFETCH_READS?
> 
> This leaves a problem with NAND_CMD_RNDOUT which is used by oob_first
> page reads but not supported by fsl_elbc_cmdfunc. I expect you don't use
> oob_first though..?

Right.  It's also used on swecc, but we don't use that either.

> I believe this would allow you to restore the original sequences and keep you
> going until you can define your own page read functions.
> 
> [BTW these changes applied quite cleanly to my Linux tree and give similar
> performance gains there as well. If we can reach agreement here, I will
> make patches for Linux as well. Unfortunately, my tree is 2.6.18 with
> backported NAND support from 2.6.32rc1, but I can deal with that.]

If possible, it would be nice to run these patches by the Linux list 
now, so we get their feedback earlier rather than later.

-Scott

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
@ 2009-12-08 15:33 Nick Thompson
  2009-12-08 22:06 ` Wolfgang Denk
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Thompson @ 2009-12-08 15:33 UTC (permalink / raw)
  To: u-boot

Improve read performance from Large Page NAND devices.

This patch produces a ~31% improvement in oob_first read speed (on a
300MHz ARM9). The time for a mid-buffer 2k page read is now 293us,
6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case
improvement.

Signed-off-by: Nick Thompson <nick.thompson@ge.com>
---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-08 15:33 [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices Nick Thompson
@ 2009-12-08 22:06 ` Wolfgang Denk
  2009-12-09  9:43   ` Nick Thompson
  2009-12-09 11:02   ` Wolfgang Denk
  0 siblings, 2 replies; 14+ messages in thread
From: Wolfgang Denk @ 2009-12-08 22:06 UTC (permalink / raw)
  To: u-boot

Dear Nick Thompson,

In message <4B1E71D9.6080802@ge.com> you wrote:
> Improve read performance from Large Page NAND devices.
> 
> This patch produces a ~31% improvement in oob_first read speed (on a
> 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us,
> 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case
> improvement.
> 
> Signed-off-by: Nick Thompson <nick.thompson@ge.com>

I tested this on mpc5121ads (sector size 128 KiB) and sequoia (sector
size 16 KiB).

The patch applied not cleanly against "master" (but was easy to fix).

However, I did not notice any changes to the speed for a "nand read"
at all.

Is this not the right flash types, or not thr right type of test?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
The Gates in my computer are AND, OR and NOT; they are not Bill.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-08 22:06 ` Wolfgang Denk
@ 2009-12-09  9:43   ` Nick Thompson
  2009-12-09 11:02   ` Wolfgang Denk
  1 sibling, 0 replies; 14+ messages in thread
From: Nick Thompson @ 2009-12-09  9:43 UTC (permalink / raw)
  To: u-boot

On 08/12/09 22:06, Wolfgang Denk wrote:
> Dear Nick Thompson,
> 
> In message <4B1E71D9.6080802@ge.com> you wrote:
>> Improve read performance from Large Page NAND devices.
>>
>> This patch produces a ~31% improvement in oob_first read speed (on a
>> 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us,
>> 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case
>> improvement.
>>
>> Signed-off-by: Nick Thompson <nick.thompson@ge.com>
> 
> I tested this on mpc5121ads (sector size 128 KiB) and sequoia (sector
> size 16 KiB).
> 
> The patch applied not cleanly against "master" (but was easy to fix).
> 
> However, I did not notice any changes to the speed for a "nand read"
> at all.
> 
> Is this not the right flash types, or not thr right type of test?
> 
> Best regards,
> 
> Wolfgang Denk
> 

Hi Wolfgang,

Thanks for testing this. I think this has worked as I would have hoped.
Your test is fine. I use "nand read" and make measurements on a mixed
signal 'scope.

On the mpc5121ads, mpc5121_nfc.c is used which defines its own command
function. This function always waits for read commands to complete and
the config file doesn't define the CONFIG_NAND_READ_CMD_NO_WAIT config
(and in this case it can't) so the read should behave more or less
identically.

[Possibly the command function could be changed to not wait for read
to complete...]

On Sequoia, ndfc.c is used which appears to use the default command
functions. If the sector size is 16kBytes I assume this is a small page
NAND device and you should see no change at all. If it was a large page
device you would also see no difference, unless the config file set
CONFIG_NAND_READ_CMD_NO_WAIT, in which case the read_page function for
H/W ECC, should fetch the next page in parallel to doing ECC correction,
leading to an improvement in performance.

All in all your tests was very successful, and though I left you
disappointed this time, from my point of view I'm very pleased :)

Thanks,
Nick.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-08 22:06 ` Wolfgang Denk
  2009-12-09  9:43   ` Nick Thompson
@ 2009-12-09 11:02   ` Wolfgang Denk
  2009-12-09 11:43     ` Nick Thompson
  1 sibling, 1 reply; 14+ messages in thread
From: Wolfgang Denk @ 2009-12-09 11:02 UTC (permalink / raw)
  To: u-boot

Dear Nick Thompson,

In message <4B1E71D9.6080802@ge.com> you wrote:
> Improve read performance from Large Page NAND devices.
> 
> This patch produces a ~31% improvement in oob_first read speed (on a
> 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us,
> 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case
> improvement.
> 
> Signed-off-by: Nick Thompson <nick.thompson@ge.com>

Also tested on Canyonlands (460EX); here I actually see a slightj
improvement (5.5% faster, i. e. time to read 126 MB from NAND goes
down from 28.8 to 27.2 seconds (4.4 -> 4.6 MiB/s).

Tested-by: Wolfgang Denk <wd@denx.de>

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Any time things appear to be going better, you have overlooked  some-
thing.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-09 11:02   ` Wolfgang Denk
@ 2009-12-09 11:43     ` Nick Thompson
  2010-01-16  1:51       ` Josh Gelinske
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Thompson @ 2009-12-09 11:43 UTC (permalink / raw)
  To: u-boot

On 09/12/09 11:02, Wolfgang Denk wrote:
> Dear Nick Thompson,
> 
> In message <4B1E71D9.6080802@ge.com> you wrote:
>> Improve read performance from Large Page NAND devices.
>>
>> This patch produces a ~31% improvement in oob_first read speed (on a
>> 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us,
>> 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case
>> improvement.
>>
>> Signed-off-by: Nick Thompson <nick.thompson@ge.com>
> 
> Also tested on Canyonlands (460EX); here I actually see a slightj
> improvement (5.5% faster, i. e. time to read 126 MB from NAND goes
> down from 28.8 to 27.2 seconds (4.4 -> 4.6 MiB/s).
> 
> Tested-by: Wolfgang Denk <wd@denx.de>

Hi Wolfgang,

Thanks again.

It seems the raw page data transfer rate is quite low on that board.
The patch saves time between page data transfers, so the percentage
improvement seen is better if you can get the page data out quicker.

The default read_buf (and write_buf) in nand_base.c are safe, but slow.
I put in davinci specific optimised versions (DMA or multibyte read
ticks might be used) to double my raw transfer rate. Without that, my
measurements would show ~15% improvement only.

In total on da830evm I'm getting a >300% speed improvement. 1.69MB/s
changes to 6.99MB/s. (5.31MB/s without this patch.)

Nick.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2009-12-09 11:43     ` Nick Thompson
@ 2010-01-16  1:51       ` Josh Gelinske
  2010-01-18 12:48         ` Nick Thompson
  0 siblings, 1 reply; 14+ messages in thread
From: Josh Gelinske @ 2010-01-16  1:51 UTC (permalink / raw)
  To: u-boot


What kind of CPU usage are you seeing? I am throughput of ~1.9MBs for writes
on a Samsung K9WBG08U1M 4GB with 4K page but with high cpu usage.



Nick Thompson-9 wrote:
> 
> On 09/12/09 11:02, Wolfgang Denk wrote:
>> Dear Nick Thompson,
>> 
>> In message <4B1E71D9.6080802@ge.com> you wrote:
>>> Improve read performance from Large Page NAND devices.
>>>
>>> This patch produces a ~31% improvement in oob_first read speed (on a
>>> 300MHz ARM9). The time for a mid-buffer 2k page read is now 293us,
>>> 6.99MB/s (was 385us, 5.31MB/s). oob_first is probably the best case
>>> improvement.
>>>
>>> Signed-off-by: Nick Thompson <nick.thompson@ge.com>
>> 
>> Also tested on Canyonlands (460EX); here I actually see a slightj
>> improvement (5.5% faster, i. e. time to read 126 MB from NAND goes
>> down from 28.8 to 27.2 seconds (4.4 -> 4.6 MiB/s).
>> 
>> Tested-by: Wolfgang Denk <wd@denx.de>
> 
> Hi Wolfgang,
> 
> Thanks again.
> 
> It seems the raw page data transfer rate is quite low on that board.
> The patch saves time between page data transfers, so the percentage
> improvement seen is better if you can get the page data out quicker.
> 
> The default read_buf (and write_buf) in nand_base.c are safe, but slow.
> I put in davinci specific optimised versions (DMA or multibyte read
> ticks might be used) to double my raw transfer rate. Without that, my
> measurements would show ~15% improvement only.
> 
> In total on da830evm I'm getting a >300% speed improvement. 1.69MB/s
> changes to 6.99MB/s. (5.31MB/s without this patch.)
> 
> Nick.
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
> 
> 

-- 
View this message in context: http://old.nabble.com/-U-Boot---PATCH-RFC--NAND%3A-Improve-read-performance-from-Large-Page%09NAND-devices-tp26695701p27186018.html
Sent from the Uboot - Users mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2010-01-16  1:51       ` Josh Gelinske
@ 2010-01-18 12:48         ` Nick Thompson
  2010-01-18 15:16           ` Josh Gelinske
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Thompson @ 2010-01-18 12:48 UTC (permalink / raw)
  To: u-boot

On 16/01/10 01:51, Josh Gelinske wrote:
> 
> What kind of CPU usage are you seeing? I am throughput of ~1.9MBs for writes
> on a Samsung K9WBG08U1M 4GB with 4K page but with high cpu usage.
> 

I'm not sure what you are asking here. There is no idle loop to measure so
CPU is running at 100%.

The code does poll the NAND device for data ready, but in my case the NAND
loads its cache faster than the CPU can finish sorting out ECC checks
(despite ECC H/W assistance). The result is the CPU never actually on the
NAND.

This patch (not yet formally submitted) removes this potential waiting time
that was present - plus some redundant command sequences due to incorrect
function levelling. Most of my performance gain comes from optimising the
data transfer (to and) from the NAND - see drivers/mtd/nand/davinci_nand.c
- which is already in the mainline.

Apologies if that doesn't answer your question.

Nick.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices
  2010-01-18 12:48         ` Nick Thompson
@ 2010-01-18 15:16           ` Josh Gelinske
  0 siblings, 0 replies; 14+ messages in thread
From: Josh Gelinske @ 2010-01-18 15:16 UTC (permalink / raw)
  To: u-boot

I hadn't dug into it yet to find the polling behavior but like you said
with no idle the CPU. It was just different behavior from what I see
with the SD which is some CPU time spent waiting on I/O (maybe because
of the DMA vs polling).

I did find a few of simple optimizations that gained me ~1MBs increase
on NAND and SD as well.
- /proc/sys/vm/dirty_expire_centiseconds = 2
- /proc/sys/vm/dirty_background_ratio = 5
- Unchecked KConfig "Optimize for size" which was enabled by default.

Thanks for your feedback.


Appareo Systems, LLC
1810 NDSU Research Circle N
Fargo, ND 58102


JOSH GELINSKE
FIRMWARE ENGINEERING MANAGER

Appareo Systems, LLC
1810 NDSU Research Circle N
Fargo, ND 58102
P: (701) 356-2200 Ext 251
C: (701) 306-0899
F: (701) 356-3157

http://www.appareo.com
jgelinske at appareo.com

NOTICE: This message (including attachments) is covered by the Electronic Communication Privacy Act, 18 U.S.C. sections 2510-2521, is CONFIDENTIAL and may also be protected by ATTORNEY-CLIENT OR OTHER PRIVILEGE. If you believe that it has been sent to you in error, do not read it. If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited. Please reply to the sender that you have received the message in error and then delete it.


-----Original Message-----
From: Nick Thompson [mailto:nick.thompson at ge.com] 
Sent: Monday, January 18, 2010 6:48 AM
To: Josh Gelinske
Cc: u-boot at lists.denx.de
Subject: Re: [U-Boot] [PATCH RFC] NAND: Improve read performance from
Large Page NAND devices

On 16/01/10 01:51, Josh Gelinske wrote:
> 
> What kind of CPU usage are you seeing? I am throughput of ~1.9MBs for
writes
> on a Samsung K9WBG08U1M 4GB with 4K page but with high cpu usage.
> 

I'm not sure what you are asking here. There is no idle loop to measure
so
CPU is running at 100%.

The code does poll the NAND device for data ready, but in my case the
NAND
loads its cache faster than the CPU can finish sorting out ECC checks
(despite ECC H/W assistance). The result is the CPU never actually on
the
NAND.

This patch (not yet formally submitted) removes this potential waiting
time
that was present - plus some redundant command sequences due to
incorrect
function levelling. Most of my performance gain comes from optimising
the
data transfer (to and) from the NAND - see
drivers/mtd/nand/davinci_nand.c
- which is already in the mainline.

Apologies if that doesn't answer your question.

Nick.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-01-18 15:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-08 15:33 [U-Boot] [PATCH RFC] NAND: Improve read performance from Large Page NAND devices Nick Thompson
2009-12-08 22:06 ` Wolfgang Denk
2009-12-09  9:43   ` Nick Thompson
2009-12-09 11:02   ` Wolfgang Denk
2009-12-09 11:43     ` Nick Thompson
2010-01-16  1:51       ` Josh Gelinske
2010-01-18 12:48         ` Nick Thompson
2010-01-18 15:16           ` Josh Gelinske
  -- strict thread matches above, loose matches on Subject: below --
2009-11-25 13:57 Nick Thompson
2009-12-01  0:55 ` Scott Wood
2009-12-01 10:13   ` Nick Thompson
2009-12-01 11:34     ` Nick Thompson
2009-12-01 18:43       ` Scott Wood
2009-12-01 18:38     ` Scott Wood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox