* bbt and bitflip
@ 2011-04-21 13:40 Matthieu CASTET
2011-04-21 17:17 ` Matthieu CASTET
2011-04-22 8:08 ` Artem Bityutskiy
0 siblings, 2 replies; 8+ messages in thread
From: Matthieu CASTET @ 2011-04-21 13:40 UTC (permalink / raw)
To: linux-mtd@lists.infradead.org
Hi,
the current bad block table implementation doesn't seem robust against bit flip.
at boot we call :
- search_read_bbts which scan for bbt using oob pattern.
- check_create
-- read_abs_bbt
--- read_bbt which ignore ecc bit flip/error
So if bit flip happen in BBT, we never scrub it.
And if bit flip accumulate and we can't correct it anymore, the code will parse
the corrupted data and our bad block info will be wrong (valid block can be
marked as bad and we lose bad, bad block can be see as valid).
Also the pattern and version in oob isn't protected by ecc. They can be corrupted.
Are bbt safe to use ?
Are there any plan to make the bbt more robust ?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-04-21 13:40 bbt and bitflip Matthieu CASTET
@ 2011-04-21 17:17 ` Matthieu CASTET
2011-04-22 8:14 ` Artem Bityutskiy
2011-04-22 8:15 ` Artem Bityutskiy
2011-04-22 8:08 ` Artem Bityutskiy
1 sibling, 2 replies; 8+ messages in thread
From: Matthieu CASTET @ 2011-04-21 17:17 UTC (permalink / raw)
To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org
[-- Attachment #1: Type: text/plain, Size: 828 bytes --]
Matthieu CASTET a écrit :
> Hi,
>
> the current bad block table implementation doesn't seem robust against bit flip.
>
> at boot we call :
> - search_read_bbts which scan for bbt using oob pattern.
> - check_create
> -- read_abs_bbt
> --- read_bbt which ignore ecc bit flip/error
>
> So if bit flip happen in BBT, we never scrub it.
> And if bit flip accumulate and we can't correct it anymore, the code will parse
> the corrupted data and our bad block info will be wrong (valid block can be
> marked as bad and we lose bad, bad block can be see as valid).
>
>
> Also the pattern and version in oob isn't protected by ecc. They can be corrupted.
>
> Are bbt safe to use ?
>
> Are there any plan to make the bbt more robust ?
>
Here a quick and dirty patch to make them more robust.
Any comment are welcomed.
Matthieu
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1733 bytes --]
diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 5fedf4a..7b85b54 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -171,7 +171,7 @@ static int check_short_pattern(uint8_t *buf, struct nand_bbt_descr *td)
static int read_bbt(struct mtd_info *mtd, uint8_t *buf, int page, int num,
int bits, int offs, int reserved_block_code)
{
- int res, i, j, act = 0;
+ int res, i, j, act = 0, ret = 0;
struct nand_chip *this = mtd->priv;
size_t retlen, len, totlen;
loff_t from;
@@ -188,6 +188,12 @@ static int read_bbt(struct mtd_info *mtd, uint8_t *buf, int page, int num,
printk(KERN_INFO "nand_bbt: Error reading bad block table\n");
return res;
}
+ if (res != -EUCLEAN) {
+ printk(KERN_INFO "nand_bbt: Error reading bad block table2\n");
+ return res;
+ }
+ /* inform caller that there is bit flips */
+ ret |= res;
printk(KERN_WARNING "nand_bbt: ECC error while reading bad block table\n");
}
@@ -220,7 +226,7 @@ static int read_bbt(struct mtd_info *mtd, uint8_t *buf, int page, int num,
totlen -= len;
from += len;
}
- return 0;
+ return ret;
}
/**
@@ -900,7 +906,20 @@ static int check_create(struct mtd_info *mtd, uint8_t *buf, struct nand_bbt_desc
writecheck:
/* read back first ? */
if (rd)
- read_abs_bbt(mtd, buf, rd, chipsel);
+ res = read_abs_bbt(mtd, buf, rd, chipsel);
+ if (!rd2) {
+ if (res == -EBADMSG) {
+ /* bad recreate it */
+ rd = NULL;
+ writeops = 0x03;
+ goto create;
+ }
+ else if (!rd2 && res == -EUCLEAN) {
+ /* rewrite it */
+ writeops = 0x03;
+ }
+ }
+
/* If they weren't versioned, read both. */
if (rd2)
read_abs_bbt(mtd, buf, rd2, chipsel);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-04-21 13:40 bbt and bitflip Matthieu CASTET
2011-04-21 17:17 ` Matthieu CASTET
@ 2011-04-22 8:08 ` Artem Bityutskiy
1 sibling, 0 replies; 8+ messages in thread
From: Artem Bityutskiy @ 2011-04-22 8:08 UTC (permalink / raw)
To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org
Matthieu,
nice finding.
On Thu, 2011-04-21 at 15:40 +0200, Matthieu CASTET wrote:
> Hi,
>
> the current bad block table implementation doesn't seem robust against bit flip.
>
> at boot we call :
> - search_read_bbts which scan for bbt using oob pattern.
> - check_create
> -- read_abs_bbt
> --- read_bbt which ignore ecc bit flip/error
>
> So if bit flip happen in BBT, we never scrub it.
This should probably be fixed.
> And if bit flip accumulate and we can't correct it anymore, the code will parse
> the corrupted data and our bad block info will be wrong (valid block can be
> marked as bad and we lose bad, bad block can be see as valid).
The bbt should be protected with CRC and if it gets corrupted we should
re-scan the flash and re-create it.
> Also the pattern and version in oob isn't protected by ecc. They can be corrupted.
>
> Are bbt safe to use ?
It does not look like.
> Are there any plan to make the bbt more robust ?
I would guess no unless you do it :-)
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-04-21 17:17 ` Matthieu CASTET
@ 2011-04-22 8:14 ` Artem Bityutskiy
2011-04-22 8:15 ` Artem Bityutskiy
1 sibling, 0 replies; 8+ messages in thread
From: Artem Bityutskiy @ 2011-04-22 8:14 UTC (permalink / raw)
To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org
On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote:
> }
> + if (res != -EUCLEAN) {
> + printk(KERN_INFO "nand_bbt: Error reading bad block table2\n");
> + return res;
> + }
Just a side note:
I tend to think that we need to veto all mtd patches which touch/add
prints and force people to first turn all the ugly MTD printing mess
into dev_dbg/dev_err/etc messages... But I cannot actually propose it
before I clean up UBI/UBIFS in this respect :-)))
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-04-21 17:17 ` Matthieu CASTET
2011-04-22 8:14 ` Artem Bityutskiy
@ 2011-04-22 8:15 ` Artem Bityutskiy
2011-06-23 16:36 ` Brian Norris
1 sibling, 1 reply; 8+ messages in thread
From: Artem Bityutskiy @ 2011-04-22 8:15 UTC (permalink / raw)
To: Matthieu CASTET; +Cc: linux-mtd@lists.infradead.org
On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote:
> Here a quick and dirty patch to make them more robust.
> Any comment are welcomed.
Would be great if you could test it and submit a nice patch with proper
commit message and Signed-off-by.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-04-22 8:15 ` Artem Bityutskiy
@ 2011-06-23 16:36 ` Brian Norris
2011-06-24 19:55 ` Artem Bityutskiy
0 siblings, 1 reply; 8+ messages in thread
From: Brian Norris @ 2011-06-23 16:36 UTC (permalink / raw)
To: dedekind1; +Cc: linux-mtd@lists.infradead.org, Matthieu CASTET
Hello,
On Fri, Apr 22, 2011 at 1:15 AM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Thu, 2011-04-21 at 19:17 +0200, Matthieu CASTET wrote:
>> Here a quick and dirty patch to make them more robust.
>> Any comment are welcomed.
>
> Would be great if you could test it and submit a nice patch with proper
> commit message and Signed-off-by.
I am interested in Artem's comments on the robustness of flash-based
BBT (here, and more recently on
http://lists.infradead.org/pipermail/linux-mtd/2011-June/036557.html).
I recently have moved to using flash-based BBT (in-band, actually),
and it seemed like several NAND drivers use flash-based BBT as well.
Is it really that un-trustworthy?
So I guess I'm looking for areas of improvement. I see a few suggestions here:
"Also the pattern and version in oob isn't protected by ecc. They can
be corrupted."
I noticed this one recently. My hardware ECC actually *can* do ECC for
what little OOB is actually free, but I realized that the nand_base
code doesn't even try to check the ECC stats (in nand_do_read_oob())
whereas some similar code for nand_do_read_ops *does* check the ECC
stats. Is it fair to adapt the code from nand_do_read_ops to
nand_do_read_oob so that both check for ECC errors, just in case the
hardware supports it? Would this cause any API problems, if users
didn't expect OOB reads to return ECC error statuses? For now, this
would only have any effect if a driver replaces the chip->ecc.read_oob
function with something that performs ECC operations independently and
increments the ECC counters.
And speaking of BBT in OOB:
Anyone know why the flash-based ident pattern and version is
"traditionally" stored in OOB? It was quite recently that Sebastian
Andrzej Siewior added the NAND_USE_FLASH_BBT_NO_OOB flag (which is
slated to be renamed/replaced, FYI). It seems like ...NO_OOB is a
generally good (or at least better) idea. Then we wouldn't even have
to worry much about ECC in OOB.
"read_bbt which ignore ecc bit flip/error"
If I understand right, read_bbt just prints warning message on ECC
flips/errors and otherwise ignores them? Would Matthieu's "quick and
dirty" patch be an OK start for fixing this? (where in the absence of
a valid backup tableb, an ECC error causes a flash rescan and an ECC
bitflip causes an erase/rewrite "scrub"?)
"The bbt should be protected with CRC and if it gets corrupted we
should re-scan the flash and re-create it."
Wouldn't CRC just be a lesser replacement for proper ECC protection?
Or am I missing something?
Brian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-06-23 16:36 ` Brian Norris
@ 2011-06-24 19:55 ` Artem Bityutskiy
2011-06-24 20:36 ` Matthew L. Creech
0 siblings, 1 reply; 8+ messages in thread
From: Artem Bityutskiy @ 2011-06-24 19:55 UTC (permalink / raw)
To: Brian Norris; +Cc: linux-mtd@lists.infradead.org, Matthieu CASTET
On Thu, 2011-06-23 at 09:36 -0700, Brian Norris wrote:
> I am interested in Artem's comments on the robustness of flash-based
> BBT (here, and more recently on
> http://lists.infradead.org/pipermail/linux-mtd/2011-June/036557.html).
> I recently have moved to using flash-based BBT (in-band, actually),
> and it seemed like several NAND drivers use flash-based BBT as well.
> Is it really that un-trustworthy?
If you confirm that everything is great and robust when the on-flash BBT
gets corrupted - the NAND core for sure notices any corruptions and
falls-back to the traditional scanning method and restores the on-flash
BBT - then I apologize for saying that I do not trust it. Also, I do not
really know the details of this, so I may be completely wrong.
> "The bbt should be protected with CRC and if it gets corrupted we
> should re-scan the flash and re-create it."
>
> Wouldn't CRC just be a lesser replacement for proper ECC protection?
> Or am I missing something?
I'd say ECC and CRC play different roles. ECC is about handling NAND
PITAs like read/write/erase disturb, and CRC is about noticing any
corruption and recover, instead of reporting inaccurate information up -
e.g., reporting a good block as bad.
--
Best Regards,
Artem Bityutskiy (Битюцкий Артём)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: bbt and bitflip
2011-06-24 19:55 ` Artem Bityutskiy
@ 2011-06-24 20:36 ` Matthew L. Creech
0 siblings, 0 replies; 8+ messages in thread
From: Matthew L. Creech @ 2011-06-24 20:36 UTC (permalink / raw)
To: dedekind1; +Cc: Brian Norris, linux-mtd@lists.infradead.org, Matthieu CASTET
On Fri, Jun 24, 2011 at 3:55 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>
> If you confirm that everything is great and robust when the on-flash BBT
> gets corrupted - the NAND core for sure notices any corruptions and
> falls-back to the traditional scanning method and restores the on-flash
> BBT - then I apologize for saying that I do not trust it. Also, I do not
> really know the details of this, so I may be completely wrong.
>
I'm still trying to pinpoint the cause of the BBT corruptions that I
encountered, but _if_ they were caused by bit-flips, then it
definitely appears that your initial statement was accurate: the
devices were in many cases unbootable because the corrupted BBT was
treated as legit.
This also seems to be the case when looking at the code in nand_bbt.c:
if an ECC error is encountered in read_bbt(), it prints a warning but
then continues to process the BBT data. Maybe there's an additional
assumption that I'm overlooking, but at a glance it seems like that
would allow bit errors to accumulate and wrongly flag good blocks as
bad.
--
Matthew L. Creech
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-06-24 20:37 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-21 13:40 bbt and bitflip Matthieu CASTET
2011-04-21 17:17 ` Matthieu CASTET
2011-04-22 8:14 ` Artem Bityutskiy
2011-04-22 8:15 ` Artem Bityutskiy
2011-06-23 16:36 ` Brian Norris
2011-06-24 19:55 ` Artem Bityutskiy
2011-06-24 20:36 ` Matthew L. Creech
2011-04-22 8:08 ` Artem Bityutskiy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox