From: Richard Weinberger <richard@nod.at>
To: Arnaud Mouiche <arnaud.mouiche@invoxia.com>,
Artem Bityutskiy <dedekind1@gmail.com>,
David Woodhouse <dwmw2@infradead.org>,
Brian Norris <computersforpeace@gmail.com>,
linux-mtd@lists.infradead.org,
boris.brezillon@free-electrons.com, peterpansjtu@gmail.com
Subject: Re: [RFC] UBI torture test fails to detect some bad blocks.
Date: Fri, 8 Apr 2016 10:24:11 +0200
Message-ID: <57076AAB.7050008@nod.at>
In-Reply-To: <1460100214-31298-1-git-send-email-arnaud.mouiche@invoxia.com>
Hi!
On 08.04.2016 at 09:23, Arnaud Mouiche wrote:
> Hi all.
>
> Just some details about what I experienced recently with some bad blocks on
> an MX35LF1GE4AB SPI NAND device (SLC, 1Gb, 4-bit ECC per 512-byte sub-page),
> where a UBI partition is attached to manage the rootfs and so on (as usual).
>
> I got hold of some devices that refused to boot.
> Analysis of the erase counters shows that some PEBs were erased
> more than 100K times, while the majority have an EC below 20!
Ouch.
> Looking at the bad ones, they run through the following scenario almost in a loop:
> - Linux reads some file inside the rootfs,
> - a bitflip is detected,
> - scrubbing is scheduled,
> - the scrubbing targets a PEB with a fairly high EC,
> - this high EC is itself due to frequent bitflips in the target PEB in the past,
> - while the PEB data is being moved, a bitflip is detected, scheduling a torture test,
> - the torture test *ALWAYS* passes (whereas bitflips are *VERY* frequent for
> the same PEB when it is read through the filesystem).
>
> So it seems obvious that the PEBs in question are bad.
> The question now is why the torture test passes.
>
> Reproducing the test patterns by hand on this block gives the same result.
> But applying different patterns to different pages within the block shows that
> the content of some pages is affected by the content of other pages.
> In particular, for this block, if the first page is full of 0xFF and the rest
> of the block is full of 0x00, I can count more than 100 bitflips (!)
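A minimal userspace sketch of the pattern described above, assuming the raw page data (ECC correction disabled) can be read back into a buffer: build the "page 0 = 0xFF, remaining pages = 0x00" image and count the bits that differ from the read-back. The geometry constants and function names are hypothetical, and the raw read itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 2048u /* hypothetical page size */
#define PAGES_PER_BLOCK 64u   /* hypothetical pages per erase block */

/* Number of set bits in one byte. */
static unsigned popcount8(uint8_t v)
{
	unsigned n = 0;

	while (v) {
		v &= (uint8_t)(v - 1); /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Bits that differ between the written image and the raw read-back. */
static size_t count_bitflips(const uint8_t *expected, const uint8_t *actual,
			     size_t len)
{
	size_t flips = 0;

	assert(expected != NULL && actual != NULL);
	for (size_t i = 0; i < len; i++)
		flips += popcount8(expected[i] ^ actual[i]);
	return flips;
}

/* Page 0 all 0xFF, every other page of the block all 0x00. */
static void fill_ff_then_00(uint8_t *block)
{
	assert(block != NULL);
	memset(block, 0x00, (size_t)PAGE_SIZE_BYTES * PAGES_PER_BLOCK);
	memset(block, 0xFF, PAGE_SIZE_BYTES);
}
```

Counting over the whole block gives a total; running count_bitflips per 512-byte slice instead would show whether any single ECC step exceeds what the ECC can correct.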
100 flips per ECC step? Shouldn't that lead to an uncorrectable ECC error?
I have no idea how many bits your ECC can fix.
Which bitflip threshold do you have? UBI sees bitflips only after a threshold
is reached. If it is too low, UBI scrubs too often, which seems to be the case here.
It is perfectly fine to have bitflips.
So we need to dig a bit deeper first.
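As a rough model of the threshold logic (my reading of the MTD layer, not a copy of it): a read is reported as -EUCLEAN, which UBI treats as a bitflip worth scrubbing, only when the highest number of corrected bits in any single ECC step reaches mtd->bitflip_threshold; more flips than the ECC strength in one step cannot be corrected at all. NAND drivers typically default the threshold to about 3/4 of the ECC strength, and it can be tuned through the bitflip_threshold sysfs attribute of the MTD device. The function names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value, in case the libc headers lack it */
#endif

/* Assumed default: 3/4 of the per-step ECC strength, rounded up. */
static unsigned default_bitflip_threshold(unsigned ecc_strength)
{
	return (ecc_strength * 3 + 3) / 4;
}

/*
 * Outcome of one read, given the worst ECC step of that read:
 *   -EBADMSG : more flips than the ECC can correct,
 *   -EUCLEAN : corrected, but at or above the threshold (UBI will scrub),
 *   0        : corrected silently; UBI never hears about these flips.
 */
static int read_status(unsigned max_flips_per_step, unsigned ecc_strength,
		       unsigned threshold)
{
	assert(threshold > 0 && threshold <= ecc_strength);
	if (max_flips_per_step > ecc_strength)
		return -EBADMSG;
	return max_flips_per_step >= threshold ? -EUCLEAN : 0;
}
```

With the 4-bit ECC mentioned above, the default threshold works out to 3, so one or two flips per step stay invisible to UBI, while 100 flips within a single 512-byte step would be reported as uncorrectable.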
> What kind of pattern should be added to detect these kinds of issues?
This is a very hard question and almost impossible to answer as it is vendor
specific.
> We could consider testing every page one by one, but given the relatively large
> number of pages in a block, it doesn't sound realistic.
> The easiest way could be to use a random pattern and try it a relatively low
> number of times.
> Indeed, this simple random test is effective at detecting every bad block of this device.
> If the random test passes once (because it is random), chances are
> that the next bitflip detection will trigger a new torture test, and in the end
> the block will finally be detected as bad.
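The argument above can be put in numbers under a simplifying assumption: if each torture run independently catches a given bad PEB with probability p, the chance that it is flagged within n runs is 1 - (1 - p)^n. The sketch below computes this; the independence assumption and the function name are illustrative, not something measured on this device.

```c
#include <assert.h>

/*
 * Probability that a bad PEB is flagged within n torture runs, assuming
 * each run independently catches it with probability p.
 */
static double detect_within(double p, unsigned n)
{
	double miss = 1.0; /* probability that every run so far missed it */

	assert(p >= 0.0 && p <= 1.0);
	for (unsigned i = 0; i < n; i++)
		miss *= 1.0 - p;
	return 1.0 - miss;
}
```

For example, a test that catches the block only half the time flags it within three runs with probability 0.875.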
Having an additional random pattern is not a bad idea.
This is definitely something we can consider adding to UBI.
But I'm not happy with your implementation.
peb_rnd_buff = kmalloc(ubi->peb_size, GFP_KERNEL);
... is a big no-no. peb_size can be a few megabytes.
What about repeating a few random bytes over and over?
> And the implementation is pretty obvious...
;-)
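A minimal sketch of the "repeat a few random bytes" idea, simulated in userspace on a plain array standing in for the PEB: only SEED_LEN random bytes plus one page-sized buffer are held in memory, never a peb_size allocation. SEED_LEN, MAX_PAGE, the xorshift generator, and all function names are hypothetical; an in-kernel version would program and read real pages through UBI's page I/O helpers and would draw the seed from the kernel RNG (e.g. get_random_bytes()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEED_LEN 31u    /* hypothetical: prime, so it never divides a 2^n page size */
#define MAX_PAGE 4096u  /* hypothetical upper bound on the page size */

/* Marsaglia xorshift32; the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Draw the few random bytes that will be repeated over the whole PEB. */
static void make_seed(uint8_t seed[SEED_LEN], uint32_t *state)
{
	assert(*state != 0);
	for (size_t i = 0; i < SEED_LEN; i++)
		seed[i] = (uint8_t)xorshift32(state);
}

/* Fill one page; 'off' is the page's byte offset in the PEB, so the
 * repeated seed continues seamlessly from one page to the next. */
static void fill_page(uint8_t *buf, size_t page_size, size_t off,
		      const uint8_t seed[SEED_LEN])
{
	for (size_t i = 0; i < page_size; i++)
		buf[i] = seed[(off + i) % SEED_LEN];
}

/* "Program" every page, one page buffer at a time. */
static void write_pattern(uint8_t *peb, size_t page_size, size_t pages,
			  const uint8_t seed[SEED_LEN])
{
	uint8_t buf[MAX_PAGE];

	assert(page_size <= MAX_PAGE);
	for (size_t p = 0; p < pages; p++) {
		fill_page(buf, page_size, p * page_size, seed);
		memcpy(peb + p * page_size, buf, page_size); /* stands in for a page write */
	}
}

/* Regenerate each expected page and compare; returns the number of bad pages. */
static size_t verify_pattern(const uint8_t *peb, size_t page_size, size_t pages,
			     const uint8_t seed[SEED_LEN])
{
	uint8_t expect[MAX_PAGE];
	size_t bad = 0;

	assert(page_size <= MAX_PAGE);
	for (size_t p = 0; p < pages; p++) {
		fill_page(expect, page_size, p * page_size, seed);
		if (memcmp(peb + p * page_size, expect, page_size) != 0) /* stands in for a page read */
			bad++;
	}
	return bad;
}
```

SEED_LEN is chosen as a prime so it never divides a power-of-two page size: the pattern phase then shifts from page to page and neighbouring pages do not hold identical data, which seems relevant given the page-to-page interaction Arnaud observed.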
Thanks,
//richard
Thread overview: 4+ messages
2016-04-08 7:23 [RFC] UBI torture test fails to detect some bad blocks Arnaud Mouiche
2016-04-08 7:23 ` [RFC] UBI: harden torture_peb to miss less " Arnaud Mouiche
2016-04-08 8:24 ` Richard Weinberger [this message]
2016-04-08 9:02 ` [RFC] UBI torture test fails to detect some " arnaud.mouiche