* NAND BBT corruption on MPC83xx @ 2011-06-17 20:54 Matthew L. Creech 2011-06-17 21:34 ` Scott Wood 0 siblings, 1 reply; 15+ messages in thread From: Matthew L. Creech @ 2011-06-17 20:54 UTC (permalink / raw) To: linuxppc-dev Hi, I posted this on the Linux-MTD list but haven't gotten any hits. Since it looks like it could be MPC83xx-specific, I'm reposting here. Rick Johnson noted a problem in fsl_elbc_nand.c back in May which might be related: http://lists.infradead.org/pipermail/linux-mtd/2011-May/035372.html We've gotten some devices back from the field which all suffer from this same problem on bootup when attaching UBI (these messages are from U-Boot): ... Bad block table found at page 524224, version 0x01 Bad block table found at page 524160, version 0x01 nand_bbt: ECC error while reading bad block table ...(long stream of bogus bad blocks)... UBI: attaching mtd1 to ubi0 UBI: physical eraseblock size: 131072 bytes (128 KiB) UBI: logical eraseblock size: 129024 bytes UBI: smallest flash I/O unit: 2048 UBI: sub-page size: 512 UBI: VID header offset: 512 (aligned 512) UBI: data offset: 2048 UBI error: vtbl_check: volume table check failed: record 0, error 9 UBI error: ubi_init: cannot attach mtd1 UBI error: ubi_init: UBI error: cannot initialize UBI, error -22 UBI init error -22 Full console dumps from 2 devices are here: http://mcreech.com/work/bbt-ecc-error.txt http://mcreech.com/work/bbt-ecc-error2.txt Another device encountered a slightly different error, but which I assume is due to the same underlying problem: UBI error: init_volumes: not enough PEBs, required 8061, available 8059 UBI error: ubi_wl_init_scan: no enough physical eraseblocks (-2, need 1) A full dump from that one is here: http://mcreech.com/work/bbt-ecc-error3.txt Are there any known issues that could cause the BBT to become corrupt like this? I noticed that the reported bad blocks were all aligned at multiples of 0x80000 (with one exception). 
Dump #1 shows: - one BBT with lots of bytes that have their lower 1 or 2 bits un-set (e.g. 0xfe instead of 0xff): this explains all the each-4th-block alignment. - the other BBT shows only one factory-marked bad block at 0x062e0000, which is presumably correct. This is preserved in the bogus BBT, and is the only non-0x80000-aligned bad block in the table. - Only the first 1024 bytes of the BBT contain bogus info - the latter half of the BBT is all correct It seems like the original BBT somehow had 0-2 bits corrupted at the low end of each of its bytes, either while in memory or when the BBT was written to NAND. Any ideas on what I can do to isolate the problem? Thanks in advance! More info on this board: - MPC 8313 SoC - 1GB Samsung NAND flash (K9K8G08U0B) - Linux 2.6.31 - U-Boot 2009.06 -- Matthew L. Creech ^ permalink raw reply [flat|nested] 15+ messages in thread
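Editor's sketch of the alignment observation above. It assumes the on-flash BBT uses 2 bits per block with 0b11 meaning "good" (the usual Linux nand_bbt layout) and the 128 KiB erase block size from the UBI output. The function name bbt_bad_offsets is made up for illustration and is not part of any driver. With four blocks per byte, clearing only the low 1 or 2 bits of byte i marks block 4*i as bad, which lands on offset i * 0x80000.

```c
#include <stddef.h>
#include <stdint.h>

/* Geometry taken from the report: 128 KiB erase blocks. */
#define BLOCK_SIZE      0x20000u
/* Assumption: 2 bits per block in the on-flash BBT, 0b11 == good block. */
#define BITS_PER_BLOCK  2u
#define BLOCKS_PER_BYTE (8u / BITS_PER_BLOCK)

/*
 * Walk a raw BBT image and record the flash offset of every block whose
 * 2-bit entry is not 0b11. Returns the number of bad entries found; at
 * most `max` offsets are written to `out`.
 */
size_t bbt_bad_offsets(const uint8_t *bbt, size_t len,
                       uint64_t *out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < len; i++) {
        for (unsigned int slot = 0; slot < BLOCKS_PER_BYTE; slot++) {
            unsigned int entry = (bbt[i] >> (slot * BITS_PER_BLOCK)) & 0x3u;

            if (entry == 0x3u)
                continue;       /* block is marked good */
            if (found < max)
                out[found] = (uint64_t)(i * BLOCKS_PER_BYTE + slot) * BLOCK_SIZE;
            found++;
        }
    }
    return found;
}
```

Under the same assumptions, the factory-marked block at 0x062e0000 is block 791, i.e. slot 3 of byte 197, which is why it is the one entry that does not fall on a 0x80000 boundary.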
* Re: NAND BBT corruption on MPC83xx 2011-06-17 20:54 NAND BBT corruption on MPC83xx Matthew L. Creech @ 2011-06-17 21:34 ` Scott Wood 2011-06-18 17:55 ` Mike Hench ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Scott Wood @ 2011-06-17 21:34 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linux-mtd, linuxppc-dev On Fri, 17 Jun 2011 16:54:27 -0400 "Matthew L. Creech" <mlcreech@gmail.com> wrote: > Hi, I posted this on the Linux-MTD list but haven't gotten any hits. > Since it looks like it could be MPC83xx-specific, I'm reposting here. > Rick Johnson noted a problem in fsl_elbc_nand.c back in May which > might be related: > > http://lists.infradead.org/pipermail/linux-mtd/2011-May/035372.html It seems that the generic code always passes -1 with PAGEPROG, and only provides the actual page address on SEQIN. I don't think the ECC readback is needed, and the fact that it looks like it has always been broken would seem to confirm that. It's broken in other ways, too -- it assumes a particular ECC layout. Let's get rid of it. As for the corruption, could it be degradation from repeated reads of that one page? > More info on this board: > - MPC 8313 SoC > - 1GB Samsung NAND flash (K9K8G08U0B) > - Linux 2.6.31 > - U-Boot 2009.06 Hmm, 2.6.31... it's probably not related to this problem, but you should cherry pick b3a70f0bc32d1b70584bcaa6019fa4260b0da92e and 476459a6cf46d20ec73d9b211f3894ced5f9871e. -Scott ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: NAND BBT corruption on MPC83xx 2011-06-17 21:34 ` Scott Wood @ 2011-06-18 17:55 ` Mike Hench 2011-06-20 11:22 ` Atlant Schmidt 2011-06-20 15:20 ` Matthew L. Creech 2011-07-05 19:58 ` Matthew L. Creech 2 siblings, 1 reply; 15+ messages in thread From: Mike Hench @ 2011-06-18 17:55 UTC (permalink / raw) To: Scott Wood, Matthew L. Creech; +Cc: linuxppc-dev, linux-mtd Scott Wood wrote: > As for the corruption, could it be degradation from repeated reads of that > one page? Read Disturb. I did not know SLC did that. It just takes 10x as long as MLC, on the order of a million reads. Supposedly erasing the block fixes it. It is not a permanent damage thing. I was seeing ~9 hours before failure with heavy writes. ~4GByte/hour = 2M pages per hour, total ~18 million reads before errors in that last block showed up. Cool. Now we know. Thanks. Mike Hench ^ permalink raw reply [flat|nested] 15+ messages in thread
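Editor's sketch of the arithmetic in the message above, assuming 2 KiB pages (the page size that makes "4GByte/hour = 2M pages" work out) and one stray read-back per page written. The helper name total_page_ops is hypothetical.

```c
#include <stdint.h>

/*
 * Number of page operations performed when `bytes_per_hour` bytes are
 * written for `hours` hours on a device with `page_size`-byte pages.
 */
uint64_t total_page_ops(uint64_t bytes_per_hour, uint32_t page_size,
                        uint32_t hours)
{
    uint64_t pages_per_hour = bytes_per_hour / page_size;

    return pages_per_hour * hours;
}
```

With 4 GiB per hour, 2048-byte pages and 9 hours, this gives 2,097,152 pages per hour and 18,874,368 operations in total, in line with the "~18 million reads" estimate.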
* RE: NAND BBT corruption on MPC83xx 2011-06-18 17:55 ` Mike Hench @ 2011-06-20 11:22 ` Atlant Schmidt 2011-06-23 8:31 ` Artem Bityutskiy 0 siblings, 1 reply; 15+ messages in thread From: Atlant Schmidt @ 2011-06-20 11:22 UTC (permalink / raw) To: 'Mike Hench', Scott Wood, Matthew L. Creech Cc: linux-mtd@lists.infradead.org, linuxppc-dev@lists.ozlabs.org Mike: > It is not a permanent damage thing. A "read disturb" does no permanent damage to the chip but if the read disturb event involves more bits than can be corrected by your ECC code, it can do permanent damage to the *DATA* you've stored in that block. For this reason, a good flash management system manages to at least occasionally read through *ALL* of the in-use blocks in the device so that single-bit errors can be scrubbed out (read and successfully corrected) before an adjacent bit in the block also fails (which would eventually lead to a multi-bit error that might be beyond the ability to be corrected by the ECC). As far as I know (and I'm sure the list will correct me if I'm wrong! ;-) ), neither UBI nor UBIFS nor any Linux layer provides this routine scrubbing; you have to code it up yourself, probably by accessing the device at the UBI (underlying block device/LEB) layer. Atlant ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: NAND BBT corruption on MPC83xx 2011-06-20 11:22 ` Atlant Schmidt @ 2011-06-23 8:31 ` Artem Bityutskiy 0 siblings, 0 replies; 15+ messages in thread From: Artem Bityutskiy @ 2011-06-23 8:31 UTC (permalink / raw) To: Atlant Schmidt Cc: Scott Wood, linuxppc-dev@lists.ozlabs.org, linux-mtd@lists.infradead.org, Matthew L. Creech, 'Mike Hench' On Mon, 2011-06-20 at 07:22 -0400, Atlant Schmidt wrote: > > As far as I know (and I'm sure the list will correct > me if I'm wrong! ;-) ), neither UBI nor UBIFS nor any > Linux layer provides this routine scrubbing; you have > to code it up yourself, probably by accessing the > device at the UBI (underlying block device/LEB) layer. UBI will scrub all LEBs with bit-flips once they are read. But if you have bit-flips in an LEB and it is never read, it will never be scrubbed. And erasures of the neighboring PEBs may turn bit-flips into hard errors. To force scrubbing, the easiest way is to just read all volumes, like dd if=/dev/ubi0_i of=/dev/null bs=4096 for each volume number i. -- Best Regards, Artem Bityutskiy ^ permalink raw reply [flat|nested] 15+ messages in thread
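Editor's sketch of the "read every volume" approach described above, for setups where dd is not convenient. It is only an illustration: read_and_discard is a made-up helper, it uses plain stdio, and it relies on UBI noticing bit-flips during the reads as the message states. Volume device names such as /dev/ubi0_0 follow the naming used in the message.

```c
#include <stdio.h>

/*
 * Read `path` from start to end in 4096-byte chunks and throw the data
 * away. Returns the number of bytes read, or -1 if the file could not be
 * opened or a read error occurred.
 */
long long read_and_discard(const char *path)
{
    unsigned char buf[4096];
    long long total = 0;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;

    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        total += (long long)n;

    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return total;
}
```

A periodic job could call read_and_discard() once per UBI volume device; how often to do so depends on the read and erase load of the system and is not something the thread specifies.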
* Re: NAND BBT corruption on MPC83xx 2011-06-17 21:34 ` Scott Wood 2011-06-18 17:55 ` Mike Hench @ 2011-06-20 15:20 ` Matthew L. Creech 2011-07-05 19:58 ` Matthew L. Creech 2 siblings, 0 replies; 15+ messages in thread From: Matthew L. Creech @ 2011-06-20 15:20 UTC (permalink / raw) To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, mhench On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood <scottwood@freescale.com> wrote: > > As for the corruption, could it be degradation from repeated reads of that > one page? > Could be. I think Mike's theory was that the -1 page_addr sort of "wrapped around", and caused us to read in the last block on flash each time NAND_CMD_PAGEPROG was performed. So with a lot of writes happening, we could end up with a BBT that looks like this. That makes sense I guess, since set_addr() in fsl_elbc_nand.c uses page_addr to set FBAR. I don't see anything about it in the manual, but if FBAR wraps beyond the end of the chip, maybe the bits that don't make sense are simply ignored. (In which case we should probably add a check in set_addr() to prevent anything like this in the future) In theory I should be able to prove it out by running 2 devices in parallel - one with that block of code still there, and one with it removed. If the former device sees bit-flips in the BBT and the latter one doesn't, we'll be sure of the culprit. I'll try this and come back with the results. Thanks! -- Matthew L. Creech ^ permalink raw reply [flat|nested] 15+ messages in thread
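Editor's sketch of the wrap-around theory above. This is a model, not the driver's set_addr(): it assumes the block address is the page address divided by 64 (128 KiB blocks, 2 KiB pages) and that the chip ignores address bits beyond its 8192 blocks, which is the hypothesis in the message and is not confirmed by the manual. The names effective_block and page_addr_valid are made up for illustration.

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 64u     /* 128 KiB block / 2 KiB page */
#define NUM_BLOCKS      8192u   /* 1 GiB chip / 128 KiB block */

/*
 * Block that ends up addressed if out-of-range address bits are simply
 * dropped (assumed behaviour, see the lead-in).
 */
uint32_t effective_block(int page_addr)
{
    uint32_t block = (uint32_t)page_addr / PAGES_PER_BLOCK;

    return block & (NUM_BLOCKS - 1u);
}

/* The kind of range check suggested for set_addr(): 1 if valid, else 0. */
int page_addr_valid(int page_addr)
{
    return page_addr >= 0 &&
           (uint32_t)page_addr < NUM_BLOCKS * PAGES_PER_BLOCK;
}
```

Under these assumptions a page address of -1 maps to block 8191, the last block on the chip, which is the same block that holds the BBT reported at page 524224 (524224 / 64 = 8191).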
* Re: NAND BBT corruption on MPC83xx 2011-06-17 21:34 ` Scott Wood 2011-06-18 17:55 ` Mike Hench 2011-06-20 15:20 ` Matthew L. Creech @ 2011-07-05 19:58 ` Matthew L. Creech 2011-07-05 19:59 ` [PATCH] mtd: eLBC NAND: remove bogus ECC read-back Matthew L. Creech 2011-07-11 15:30 ` NAND BBT corruption on MPC83xx Matthew L. Creech 2 siblings, 2 replies; 15+ messages in thread From: Matthew L. Creech @ 2011-07-05 19:58 UTC (permalink / raw) To: Scott Wood; +Cc: linux-mtd, linuxppc-dev On Fri, Jun 17, 2011 at 5:34 PM, Scott Wood <scottwood@freescale.com> wrote: > > It seems that the generic code always passes -1 with PAGEPROG, and only > provides the actual page address on SEQIN. > > I don't think the ECC readback is needed, and the fact that it looks like > it has always been broken would seem to confirm that. It's broken in > other ways, too -- it assumes a particular ECC layout. Let's get rid of it. > > As for the corruption, could it be degradation from repeated reads of that > one page? > I modified nanddump to do repeated reads, and compare the data obtained from the first iteration with that obtained later (to detect bit-flips). I tried 3 different variations: - one which reads the first page (2k) of the last block - one which reads the second page (2k) of the last block - one which reads the entire last block (128k), just for comparison As I understand it, read-disturb would primarily come into play when the second page is read, since it's adjacent to the first page (please correct me if I'm wrong there). Anyway, all 3 of these tests were run for at least 50 million read cycles, with no bit-flips detected. So I'm somewhat doubtful that this is the cause of the BBT corruption I've been seeing. ==== Separately, I set up 2 test devices to run while I was away last week. 
One of them contained 2 patches: - Mike Hench's patch which eliminates this block of code in fsl_elbc_nand.c - Adam Thomson's patch (http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html) which initializes oob_poi correctly Upon my return, the device with these patches saw no problems at all, and had no additional bad blocks. The device without these patches had some 200+ blocks which had been newly marked as bad in the BBT over the course of 10 days. After rebooting, this latter device then failed to boot, as shown here: http://mcreech.com/work/bbt-ecc-error4.txt I'm currently running another test to verify which of the two patches actually fixed this problem (which might take a few days), but it seems like removing that block of code in fsl_elbc_nand.c is a good idea. -- Matthew L. Creech ^ permalink raw reply [flat|nested] 15+ messages in thread
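Editor's sketch of the comparison step in the repeated-read test described above: keep the data from the first read as a reference and count how many bits differ in each later read. The actual nanddump modification is not shown in the thread, so this is only one plausible way to do the comparison; count_bit_flips is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the bits that differ between a reference buffer (first read) and
 * a later read of the same page or block.
 */
size_t count_bit_flips(const uint8_t *ref, const uint8_t *cur, size_t len)
{
    size_t flips = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = ref[i] ^ cur[i];

        /* Clear the lowest set bit until none remain. */
        while (diff) {
            diff &= (uint8_t)(diff - 1);
            flips++;
        }
    }
    return flips;
}
```

A result of zero over all iterations corresponds to the "no bit-flips detected" outcome reported for the 50 million read cycles.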
* [PATCH] mtd: eLBC NAND: remove bogus ECC read-back 2011-07-05 19:58 ` Matthew L. Creech @ 2011-07-05 19:59 ` Matthew L. Creech 2011-07-05 20:15 ` Scott Wood 2011-07-11 15:30 ` NAND BBT corruption on MPC83xx Matthew L. Creech 1 sibling, 1 reply; 15+ messages in thread From: Matthew L. Creech @ 2011-07-05 19:59 UTC (permalink / raw) To: linux-mtd; +Cc: scottwood, linuxppc-dev, rick22, mhench From: Mike Hench <mhench@elutions.com> The eLBC NAND driver currently follows up each program/write operation with a read-back of the page, in order to [ostensibly] fill in ECC data for the caller. However, the page address used for this read is always -1, so the read will never work correctly. Remove this useless (and potentially problematic) block of code. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> --- drivers/mtd/nand/fsl_elbc_nand.c | 17 ----------------- 1 files changed, 0 insertions(+), 17 deletions(-) diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c index 0bb254c..050a2fc 100644 --- a/drivers/mtd/nand/fsl_elbc_nand.c +++ b/drivers/mtd/nand/fsl_elbc_nand.c @@ -455,23 +455,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command, fsl_elbc_run_command(mtd); - /* Read back the page in order to fill in the ECC for the - * caller. Is this really needed? - */ - if (full_page && elbc_fcm_ctrl->oob_poi) { - out_be32(&lbc->fbcr, 3); - set_addr(mtd, 6, page_addr, 1); - - elbc_fcm_ctrl->read_bytes = mtd->writesize + 9; - - fsl_elbc_do_read(chip, 1); - fsl_elbc_run_command(mtd); - - memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6, - &elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3); - elbc_fcm_ctrl->index += 3; - } - elbc_fcm_ctrl->oob_poi = NULL; return; } -- 1.6.3.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] mtd: eLBC NAND: remove bogus ECC read-back 2011-07-05 19:59 ` [PATCH] mtd: eLBC NAND: remove bogus ECC read-back Matthew L. Creech @ 2011-07-05 20:15 ` Scott Wood 2011-07-05 22:35 ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech 0 siblings, 1 reply; 15+ messages in thread From: Scott Wood @ 2011-07-05 20:15 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linuxppc-dev, linux-mtd, mhench, rick22 On Tue, 5 Jul 2011 15:59:57 -0400 "Matthew L. Creech" <mlcreech@gmail.com> wrote: > From: Mike Hench <mhench@elutions.com> > > The eLBC NAND driver currently follows up each program/write operation with a > read-back of the page, in order to [ostensibly] fill in ECC data for the > caller. However, the page address used for this read is always -1, so the read > will never work correctly. Remove this useless (and potentially problematic) > block of code. > > Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> > --- > drivers/mtd/nand/fsl_elbc_nand.c | 17 ----------------- > 1 files changed, 0 insertions(+), 17 deletions(-) > > diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c > index 0bb254c..050a2fc 100644 > --- a/drivers/mtd/nand/fsl_elbc_nand.c > +++ b/drivers/mtd/nand/fsl_elbc_nand.c > @@ -455,23 +455,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command, > > fsl_elbc_run_command(mtd); > > - /* Read back the page in order to fill in the ECC for the > - * caller. Is this really needed? 
> - */ > - if (full_page && elbc_fcm_ctrl->oob_poi) { > - out_be32(&lbc->fbcr, 3); > - set_addr(mtd, 6, page_addr, 1); > - > - elbc_fcm_ctrl->read_bytes = mtd->writesize + 9; > - > - fsl_elbc_do_read(chip, 1); > - fsl_elbc_run_command(mtd); > - > - memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6, > - &elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3); > - elbc_fcm_ctrl->index += 3; > - } > - > elbc_fcm_ctrl->oob_poi = NULL; > return; > } All references to elbc_fcm_ctrl->oob_poi (not chip->oob_poi) can be removed now. -Scott ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi 2011-07-05 20:15 ` Scott Wood @ 2011-07-05 22:35 ` Matthew L. Creech 2011-07-05 23:01 ` Scott Wood 2011-07-05 23:14 ` [PATCH v3] " Matthew L. Creech 0 siblings, 2 replies; 15+ messages in thread From: Matthew L. Creech @ 2011-07-05 22:35 UTC (permalink / raw) To: linux-mtd; +Cc: scottwood, linuxppc-dev, rick22, mhench From: Mike Hench <mhench@elutions.com> The eLBC NAND driver currently follows up each program/write operation with a read-back of the page, in order to [ostensibly] fill in ECC data for the caller. However, the page address used for this read is always -1, so the read will never work correctly. Remove this useless (and potentially problematic) block of code. v2: elbc_fcm_ctrl->oob_poi is removed entirely, since this code block was the only place it was actually used. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> --- drivers/mtd/nand/fsl_elbc_nand.c | 25 ------------------------- 1 files changed, 0 insertions(+), 25 deletions(-) diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c index 0bb254c..5e4fbf5 100644 --- a/drivers/mtd/nand/fsl_elbc_nand.c +++ b/drivers/mtd/nand/fsl_elbc_nand.c @@ -75,7 +75,6 @@ struct fsl_elbc_fcm_ctrl { unsigned int use_mdr; /* Non zero if the MDR is to be set */ unsigned int oob; /* Non zero if operating on OOB data */ unsigned int counter; /* counter for the initializations */ - char *oob_poi; /* Place to write ECC after read back */ }; /* These map to the positions used by the FCM hardware ECC generator */ @@ -454,25 +453,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command, } fsl_elbc_run_command(mtd); - - /* Read back the page in order to fill in the ECC for the - * caller. Is this really needed? 
- */ - if (full_page && elbc_fcm_ctrl->oob_poi) { - out_be32(&lbc->fbcr, 3); - set_addr(mtd, 6, page_addr, 1); - - elbc_fcm_ctrl->read_bytes = mtd->writesize + 9; - - fsl_elbc_do_read(chip, 1); - fsl_elbc_run_command(mtd); - - memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6, - &elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3); - elbc_fcm_ctrl->index += 3; - } - - elbc_fcm_ctrl->oob_poi = NULL; return; } @@ -752,13 +732,8 @@ static void fsl_elbc_write_page(struct mtd_info *mtd, struct nand_chip *chip, const uint8_t *buf) { - struct fsl_elbc_mtd *priv = chip->priv; - struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = priv->ctrl->nand; - fsl_elbc_write_buf(mtd, buf, mtd->writesize); fsl_elbc_write_buf(mtd, chip->oob_poi, mtd->oobsize); - - elbc_fcm_ctrl->oob_poi = chip->oob_poi; } static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv) -- 1.6.3.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi 2011-07-05 22:35 ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech @ 2011-07-05 23:01 ` Scott Wood 2011-07-05 23:14 ` Matthew L. Creech 2011-07-05 23:14 ` [PATCH v3] " Matthew L. Creech 1 sibling, 1 reply; 15+ messages in thread From: Scott Wood @ 2011-07-05 23:01 UTC (permalink / raw) To: Matthew L. Creech; +Cc: linuxppc-dev, linux-mtd, rick22, mhench On Tue, 5 Jul 2011 18:35:02 -0400 "Matthew L. Creech" <mlcreech@gmail.com> wrote: > From: Mike Hench <mhench@elutions.com> > > The eLBC NAND driver currently follows up each program/write operation with a > read-back of the page, in order to [ostensibly] fill in ECC data for the > caller. However, the page address used for this read is always -1, so the read > will never work correctly. Remove this useless (and potentially problematic) > block of code. > > v2: elbc_fcm_ctrl->oob_poi is removed entirely, since this code block was the > only place it was actually used. Just noticed, full_page can come out as well. Otherwise, ACK -Scott ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi 2011-07-05 23:01 ` Scott Wood @ 2011-07-05 23:14 ` Matthew L. Creech 0 siblings, 0 replies; 15+ messages in thread From: Matthew L. Creech @ 2011-07-05 23:14 UTC (permalink / raw) To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, mhench, rick22 On Tue, Jul 5, 2011 at 7:01 PM, Scott Wood <scottwood@freescale.com> wrote: > > Just noticed, full_page can come out as well. > > Otherwise, ACK > Oh right, didn't notice that - thanks. -- Matthew L. Creech ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi 2011-07-05 22:35 ` [PATCH v2] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi Matthew L. Creech 2011-07-05 23:01 ` Scott Wood @ 2011-07-05 23:14 ` Matthew L. Creech 2011-07-06 7:23 ` Artem Bityutskiy 1 sibling, 1 reply; 15+ messages in thread From: Matthew L. Creech @ 2011-07-05 23:14 UTC (permalink / raw) To: linux-mtd; +Cc: scottwood, linuxppc-dev, rick22, mhench From: Mike Hench <mhench@elutions.com> The eLBC NAND driver currently follows up each program/write operation with a read-back of the page, in order to [ostensibly] fill in ECC data for the caller. However, the page address used for this read is always -1, so the read will never work correctly. Remove this useless (and potentially problematic) block of code. v2: elbc_fcm_ctrl->oob_poi is removed entirely, since this code block was the only place it was actually used. v3: local 'full_page' variable is no longer used either. Signed-off-by: Matthew L. Creech <mlcreech@gmail.com> --- drivers/mtd/nand/fsl_elbc_nand.c | 33 ++------------------------------- 1 files changed, 2 insertions(+), 31 deletions(-) diff --git a/drivers/mtd/nand/fsl_elbc_nand.c b/drivers/mtd/nand/fsl_elbc_nand.c index 0bb254c..b4d310f 100644 --- a/drivers/mtd/nand/fsl_elbc_nand.c +++ b/drivers/mtd/nand/fsl_elbc_nand.c @@ -75,7 +75,6 @@ struct fsl_elbc_fcm_ctrl { unsigned int use_mdr; /* Non zero if the MDR is to be set */ unsigned int oob; /* Non zero if operating on OOB data */ unsigned int counter; /* counter for the initializations */ - char *oob_poi; /* Place to write ECC after read back */ }; /* These map to the positions used by the FCM hardware ECC generator */ @@ -435,7 +434,6 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command, /* PAGEPROG reuses all of the setup from SEQIN and adds the length */ case NAND_CMD_PAGEPROG: { - int full_page; dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_PAGEPROG " "writing %d bytes.\n", elbc_fcm_ctrl->index); @@ 
-445,34 +443,12 @@ static void fsl_elbc_cmdfunc(struct mtd_info *mtd, unsigned int command, * write so the HW generates the ECC. */ if (elbc_fcm_ctrl->oob || elbc_fcm_ctrl->column != 0 || - elbc_fcm_ctrl->index != mtd->writesize + mtd->oobsize) { + elbc_fcm_ctrl->index != mtd->writesize + mtd->oobsize) out_be32(&lbc->fbcr, elbc_fcm_ctrl->index); - full_page = 0; - } else { + else out_be32(&lbc->fbcr, 0); - full_page = 1; - } fsl_elbc_run_command(mtd); - - /* Read back the page in order to fill in the ECC for the - * caller. Is this really needed? - */ - if (full_page && elbc_fcm_ctrl->oob_poi) { - out_be32(&lbc->fbcr, 3); - set_addr(mtd, 6, page_addr, 1); - - elbc_fcm_ctrl->read_bytes = mtd->writesize + 9; - - fsl_elbc_do_read(chip, 1); - fsl_elbc_run_command(mtd); - - memcpy_fromio(elbc_fcm_ctrl->oob_poi + 6, - &elbc_fcm_ctrl->addr[elbc_fcm_ctrl->index], 3); - elbc_fcm_ctrl->index += 3; - } - - elbc_fcm_ctrl->oob_poi = NULL; return; } @@ -752,13 +728,8 @@ static void fsl_elbc_write_page(struct mtd_info *mtd, struct nand_chip *chip, const uint8_t *buf) { - struct fsl_elbc_mtd *priv = chip->priv; - struct fsl_elbc_fcm_ctrl *elbc_fcm_ctrl = priv->ctrl->nand; - fsl_elbc_write_buf(mtd, buf, mtd->writesize); fsl_elbc_write_buf(mtd, chip->oob_poi, mtd->oobsize); - - elbc_fcm_ctrl->oob_poi = chip->oob_poi; } static int fsl_elbc_chip_init(struct fsl_elbc_mtd *priv) -- 1.6.3.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v3] mtd: eLBC NAND: remove elbc_fcm_ctrl->oob_poi 2011-07-05 23:14 ` [PATCH v3] " Matthew L. Creech @ 2011-07-06 7:23 ` Artem Bityutskiy 0 siblings, 0 replies; 15+ messages in thread From: Artem Bityutskiy @ 2011-07-06 7:23 UTC (permalink / raw) To: Matthew L. Creech; +Cc: scottwood, linuxppc-dev, linux-mtd, mhench, rick22 On Tue, 2011-07-05 at 19:14 -0400, Matthew L. Creech wrote: > From: Mike Hench <mhench@elutions.com> > > The eLBC NAND driver currently follows up each program/write operation with a > read-back of the page, in order to [ostensibly] fill in ECC data for the > caller. However, the page address used for this read is always -1, so the read > will never work correctly. Remove this useless (and potentially problematic) > block of code. Pushed to l2-mtd-2.6.git, thanks! -- Best Regards, Artem Bityutskiy ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NAND BBT corruption on MPC83xx 2011-07-05 19:58 ` Matthew L. Creech 2011-07-05 19:59 ` [PATCH] mtd: eLBC NAND: remove bogus ECC read-back Matthew L. Creech @ 2011-07-11 15:30 ` Matthew L. Creech 1 sibling, 0 replies; 15+ messages in thread From: Matthew L. Creech @ 2011-07-11 15:30 UTC (permalink / raw) To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, rick22, Mike Hench On Tue, Jul 5, 2011 at 3:58 PM, Matthew L. Creech <mlcreech@gmail.com> wrote: > > Separately, I set up 2 test devices to run while I was away last week. > One of them contained 2 patches: > > - Mike Hench's patch which eliminates this block of code in fsl_elbc_nand.c > - Adam Thomson's patch > (http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html) > which initializes oob_poi correctly > > Upon my return, the device with these patches saw no problems at all, > and had no additional bad blocks. The device without these patches > had some 200+ blocks which had been newly marked as bad in the BBT > over the course of 10 days. After rebooting, this latter device then > failed to boot, as shown here: > > http://mcreech.com/work/bbt-ecc-error4.txt > > I'm currently running another test to verify which of the two patches > actually fixed this problem (which might take a few days), but it > seems like removing that block of code in fsl_elbc_nand.c is a good > idea. > Just an update: my tests confirmed that the patch to fsl_elbc_nand.c (http://lists.infradead.org/pipermail/linux-mtd/2011-July/036893.html) seems to have fixed these BBT corruption problems. I ran a torture test on 2 devices for several days: the one which had only that patch had no further issues, while the one which didn't have it (but did have the other oob_poi patch from Adam) experienced BBT corruption. Thanks everyone -- Matthew L. Creech ^ permalink raw reply [flat|nested] 15+ messages in thread