* bbt and bitflip @ 2011-04-21 13:40 Matthieu CASTET 2011-04-21 17:17 ` Matthieu CASTET 2011-04-22 8:08 ` Artem Bityutskiy 0 siblings, 2 replies; 8+ messages in thread From: Matthieu CASTET @ 2011-04-21 13:40 UTC (permalink / raw) To: linux-mtd@lists.infradead.org Hi, the current bad block table implementation doesn't seem robust against bit flip. at boot we call : - search_read_bbts which scan for bbt using oob pattern. - check_create -- read_abs_bbt --- read_bbt which ignore ecc bit flip/error So if bit flip happen in BBT, we never scrub it. And if bit flip accumulate and we can't correct it anymore, the code will parse the corrupted data and our bad block info will be wrong (valid block can be marked as bad and we lose bad, bad block can be see as valid). Also the pattern and version in oob isn't protected by ecc. They can be corrupted. Are bbt safe to use ? Are there any plan to make the bbt more robust ? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-04-21 13:40 bbt and bitflip Matthieu CASTET @ 2011-04-21 17:17 ` Matthieu CASTET 2011-04-22 8:14 ` Artem Bityutskiy 2011-04-22 8:15 ` Artem Bityutskiy 2011-04-22 8:08 ` Artem Bityutskiy 1 sibling, 2 replies; 8+ messages in thread From: Matthieu CASTET @ 2011-04-21 17:17 UTC (permalink / raw) To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org [-- Attachment #1: Type: text/plain, Size: 828 bytes --] Matthieu CASTET a écrit : > Hi, > > the current bad block table implementation doesn't seem robust against bit flip. > > at boot we call : > - search_read_bbts which scan for bbt using oob pattern. > - check_create > -- read_abs_bbt > --- read_bbt which ignore ecc bit flip/error > > So if bit flip happen in BBT, we never scrub it. > And if bit flip accumulate and we can't correct it anymore, the code will parse > the corrupted data and our bad block info will be wrong (valid block can be > marked as bad and we lose bad, bad block can be see as valid). > > > Also the pattern and version in oob isn't protected by ecc. They can be corrupted. > > Are bbt safe to use ? > > Are there any plan to make the bbt more robust ? > Here a quick and dirty patch to make them more robust. Any comment are welcomed. Matthieu [-- Attachment #2: diff --] [-- Type: text/plain, Size: 1733 bytes --] diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c index 5fedf4a..7b85b54 100644 --- a/drivers/mtd/nand/nand_bbt.c +++ b/drivers/mtd/nand/nand_bbt.c @@ -171,7 +171,7 @@ static int check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td) static int read_bbt(struct mtd_info *mtd, uint8_t *buf, int page, int num, int bits, int offs, int reserved_block_code) { - int res, i, j, act = 0; + int res, i, j, act = 0, ret = 0; struct nand_chip *this = mtd->priv; size_t retlen, len, totlen; loff_t from; @@ -188,6 +188,12 @@ static int read_bbt(struct mtd_info *mtd, uint8_t *buf, int page, int num, printk(KERN_INFO "nand_bbt: Error reading bad block table\n"); return res; } + if (res != -EUCLEAN) { + printk(KERN_INFO "nand_bbt: Error reading bad block table2\n"); + return res; + } + /* inform caller that there is bit flips */ + ret |= res; printk(KERN_WARNING "nand_bbt: ECC error while reading bad block table\n"); } @@ -220,7 +226,7 @@ static int read_bbt(struct mtd_info *mtd, uint8_t *buf, int page, int num, totlen -= len; from += len; } - return 0; + return ret; } /** @@ -900,7 +906,20 @@ static int check_create(struct mtd_info *mtd, uint8_t *buf, struct nand_bbt_desc writecheck: /* read back first ? */ if (rd) - read_abs_bbt(mtd, buf, rd, chipsel); + res = read_abs_bbt(mtd, buf, rd, chipsel); + if (!rd2) { + if (res == -EBADMSG) { + /* bad recreate it */ + rd = NULL; + writeops = 0x03; + goto create; + } + else if (!rd2 && res == -EUCLEAN) { + /* rewrite it */ + writeops = 0x03; + } + } + /* If they weren't versioned, read both. */ if (rd2) read_abs_bbt(mtd, buf, rd2, chipsel); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-04-21 17:17 ` Matthieu CASTET @ 2011-04-22 8:14 ` Artem Bityutskiy 2011-04-22 8:15 ` Artem Bityutskiy 1 sibling, 0 replies; 8+ messages in thread From: Artem Bityutskiy @ 2011-04-22 8:14 UTC (permalink / raw) To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote: > } > + if (res != -EUCLEAN) { > + printk(KERN_INFO "nand_bbt: Error reading bad block table2\n"); > + return res; > + } Just a side note: I tend to think that we need to veto all mtd patches which touch/add prints and force people to first turn all the ugly MTD printing mess into dev_dbg/dev_err/etc messages... But I cannot actually propose it before I clean up UBI/UBIFS in this respect :-))) -- Best Regards, Artem Bityutskiy (Артём Битюцкий) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-04-21 17:17 ` Matthieu CASTET 2011-04-22 8:14 ` Artem Bityutskiy @ 2011-04-22 8:15 ` Artem Bityutskiy 2011-06-23 16:36 ` Brian Norris 1 sibling, 1 reply; 8+ messages in thread From: Artem Bityutskiy @ 2011-04-22 8:15 UTC (permalink / raw) To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote: > Here a quick and dirty patch to make them more robust. > Any comment are welcomed. Would be great if you could test it and submit a nice patch with proper commit message and Signed-off-by. -- Best Regards, Artem Bityutskiy (Артём Битюцкий) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-04-22 8:15 ` Artem Bityutskiy @ 2011-06-23 16:36 ` Brian Norris 2011-06-24 19:55 ` Artem Bityutskiy 0 siblings, 1 reply; 8+ messages in thread From: Brian Norris @ 2011-06-23 16:36 UTC (permalink / raw) To: dedekind1; +Cc: linux-mtd@lists.infradead.org, Matthieu CASTET Hello, On Fri, Apr 22, 2011 at 1:15 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote: >> Here a quick and dirty patch to make them more robust. >> Any comment are welcomed. > > Would be great if you could test it and submit a nice patch with proper > commit message and Signed-off-by. I am interested in Artem's comments on the robustness of flash-based BBT (here, and more recently on http://lists.infradead.org/pipermail/linux-mtd/2011-June/036557.html). I recently have moved to using flash-based BBT (in-band, actually), and it seemed like several NAND drivers use flash-based BBT as well. Is it really that un-trustworthy? So I guess I'm looking for areas of improvement. I see a few suggestions here: "Also the pattern and version in oob isn't protected by ecc. They can be corrupted." I noticed this one recently. My hardware ECC actually *can* do ECC for what little OOB is actually free, but I realized that the nand_base code doesn't even try to check the ECC stats (in nand_do_read_oob()) whereas some similar code for nand_do_read_ops *does* check the ECC stats. Is it fair to adapt the code from nand_do_read_ops to nand_do_read_oob so that both check for ECC errors, just in case the hardware supports it? Would this cause any API problems, if users didn't expect OOB reads to return ECC error statuses? For now, this would only have any effect if a driver replaces the chip->ecc.read_oob function with something that performs ECC operations independently and increments the ECC counters. And speaking of BBT in OOB: Anyone know why the flash-based ident pattern and version is "traditionally" stored in OOB? It was quite recently that Sebastian Andrzej Siewior added the NAND_USE_FLASH_BBT_NO_OOB flag (which is slated to be renamed/replaced, FYI). It seems like ...NO_OOB is a generally good (or at least better) idea. Then we wouldn't even have to worry much about ECC in OOB. "read_bbt which ignore ecc bit flip/error" If I understand right, read_bbt just prints warning message on ECC flips/errors and otherwise ignores them? Would Matthieu's "quick and dirty" patch be an OK start for fixing this? (where in the absence of a valid backup tableb, an ECC error causes a flash rescan and an ECC bitflip causes an erase/rewrite "scrub"?) "The bbt should be protected with CRC and if it gets corrupted we should re-scan the flash and re-create it." Wouldn't CRC just be a lesser replacement for proper ECC protection? Or am I missing something? Brian ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-06-23 16:36 ` Brian Norris @ 2011-06-24 19:55 ` Artem Bityutskiy 2011-06-24 20:36 ` Matthew L. Creech 0 siblings, 1 reply; 8+ messages in thread From: Artem Bityutskiy @ 2011-06-24 19:55 UTC (permalink / raw) To: Brian Norris; +Cc: linux-mtd@lists.infradead.org, Matthieu CASTET On Thu, 2011-06-23 at 09:36 -0700, Brian Norris wrote: > I am interested in Artem's comments on the robustness of flash-based > BBT (here, and more recently on > http://lists.infradead.org/pipermail/linux-mtd/2011-June/036557.html). > I recently have moved to using flash-based BBT (in-band, actually), > and it seemed like several NAND drivers use flash-based BBT as well. > Is it really that un-trustworthy? If you confirm that everything is great and robust when the on-flash BBT gets corrupted - the NAND core for sure notices any corruptions and falls-back to the traditional scanning method and restores the on-flash BBT - then I apologize for saying that I do not trust it. Also, I do not really know the details of this, so I may be completely wrong. > "The bbt should be protected with CRC and if it gets corrupted we > should re-scan the flash and re-create it." > > Wouldn't CRC just be a lesser replacement for proper ECC protection? > Or am I missing something? I'd say ECC and CRC play different roles. ECC is about handling NAND PITAs like read/write/erase disturb, and CRC is about noticing any corruption and recover, instead of reporting inaccurate information up - e.g., reporting a good block as bad. -- Best Regards, Artem Bityutskiy (Битюцкий Артём) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-06-24 19:55 ` Artem Bityutskiy @ 2011-06-24 20:36 ` Matthew L. Creech 0 siblings, 0 replies; 8+ messages in thread From: Matthew L. Creech @ 2011-06-24 20:36 UTC (permalink / raw) To: dedekind1; +Cc: Brian Norris, linux-mtd@lists.infradead.org, Matthieu CASTET On Fri, Jun 24, 2011 at 3:55 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote: > > If you confirm that everything is great and robust when the on-flash BBT > gets corrupted - the NAND core for sure notices any corruptions and > falls-back to the traditional scanning method and restores the on-flash > BBT - then I apologize for saying that I do not trust it. Also, I do not > really know the details of this, so I may be completely wrong. > I'm still trying to pinpoint the cause of the BBT corruptions that I encountered, but _if_ they were caused by bit-flips, then it definitely appears that your initial statement was accurate: the devices were in many cases unbootable because the corrupted BBT was treated as legit. This also seems to be the case when looking at the code in nand_bbt.c: if an ECC error is encountered in read_bbt(), it prints a warning but then continues to process the BBT data. Maybe there's an additional assumption that I'm overlooking, but at a glance it seems like that would allow bit errors to accumulate and wrongly flag good blocks as bad. -- Matthew L. Creech ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip 2011-04-21 13:40 bbt and bitflip Matthieu CASTET 2011-04-21 17:17 ` Matthieu CASTET @ 2011-04-22 8:08 ` Artem Bityutskiy 1 sibling, 0 replies; 8+ messages in thread From: Artem Bityutskiy @ 2011-04-22 8:08 UTC (permalink / raw) To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org Matthieu, nice finding. On Thu, 2011-04-21 at 15:40 +0200, Matthieu CASTET wrote: > Hi, > > the current bad block table implementation doesn't seem robust against bit flip. > > at boot we call : > - search_read_bbts which scan for bbt using oob pattern. > - check_create > -- read_abs_bbt > --- read_bbt which ignore ecc bit flip/error > > So if bit flip happen in BBT, we never scrub it. This should probably be fixed. > And if bit flip accumulate and we can't correct it anymore, the code will parse > the corrupted data and our bad block info will be wrong (valid block can be > marked as bad and we lose bad, bad block can be see as valid). The bbt should be protected with CRC and if it gets corrupted we should re-scan the flash and re-create it. > Also the pattern and version in oob isn't protected by ecc. They can be corrupted. > > Are bbt safe to use ? It does not look like. > Are there any plan to make the bbt more robust ? I would guess no unless you do it :-) -- Best Regards, Artem Bityutskiy (Артём Битюцкий) ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-06-24 20:37 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-21 13:40 bbt and bitflip Matthieu CASTET 2011-04-21 17:17 ` Matthieu CASTET 2011-04-22 8:14 ` Artem Bityutskiy 2011-04-22 8:15 ` Artem Bityutskiy 2011-06-23 16:36 ` Brian Norris 2011-06-24 19:55 ` Artem Bityutskiy 2011-06-24 20:36 ` Matthew L. Creech 2011-04-22 8:08 ` Artem Bityutskiy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox