From: Richard Weinberger <richard@nod.at>
To: Arnaud Mouiche <arnaud.mouiche@invoxia.com>,
    Artem Bityutskiy <dedekind1@gmail.com>,
    David Woodhouse <dwmw2@infradead.org>,
    Brian Norris <computersforpeace@gmail.com>,
    linux-mtd@lists.infradead.org,
    boris.brezillon@free-electrons.com, peterpansjtu@gmail.com
Subject: Re: [RFC] UBI torture test fails to detect some bad blocks.
Date: Fri, 8 Apr 2016 10:24:11 +0200
Message-ID: <57076AAB.7050008@nod.at>
In-Reply-To: <1460100214-31298-1-git-send-email-arnaud.mouiche@invoxia.com>
Hi!
On 08.04.2016 at 09:23, Arnaud Mouiche wrote:
> Hi all.
>
> Just some details about what I experienced recently with some bad blocks on
> a MX35LF1GE4AB spinand device (SLC, 1Gb, 4 bits ECC per 512 sub-page),
> where a UBI partition is attached to manage rootfs & co (as usual).
>
> I got my hands on some devices refusing to boot.
> Analysis of the Erase Counters shows that some blocks were erased
> more than 100K times, while the majority have an EC below 20!
Ouch.
> Looking at the bad ones, they run the following scenario nearly in a loop:
> - Linux reads some file inside the rootfs
> - a bitflip is detected
> - scrubbing is scheduled.
> - the scrubbing targets a PEB with a pretty high EC,
> - this high EC is also due to frequent bitflips in the target PEB in the past.
> - while the PEB data is moved, a bitflip is detected, scheduling a torture test.
> - the torture test *ALWAYS* passes (whereas bitflips are *VERY* frequent for
> the same PEB when the read comes from a filesystem read).
>
> So, it seems obvious the PEBs in question are bad PEBs.
> The question now is why the torture test passes.
>
> Reproducing the pattern test by hand on this block shows the same result.
> But applying different patterns on different pages within the block shows that
> the content of some pages is affected by the content of the other pages.
> In particular, for this block, if the first page is full of FF and the rest
> of the block is full of 00, I can count more than 100 bitflips (!)
100 flips per ECC step? Shouldn't this lead to an uncorrectable ECC error?
I have no idea how many bits your ECC can fix.
Which bitflip threshold do you have? UBI sees bitflips only after a threshold
is reached. If it is too low, UBI scrubs too often, which seems to be the case here.
It is perfectly fine to have bitflips.
So, we need to dig a bit deeper first.
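
To make the "per ECC step" question concrete, here is a minimal user-space
sketch that compares an expected pattern against raw read-back data and reports
the worst bitflip count in any single ECC step. It is an illustration only, not
UBI or MTD code: count_bitflips() and max_bitflips_per_step() are hypothetical
helper names, and the step size is a caller-supplied assumption (for example
512 bytes, if that is what "4bits ECC per 512 sub-page" means on this device).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the bits that differ between two buffers. */
unsigned int count_bitflips(const uint8_t *expected, const uint8_t *actual,
                            size_t len)
{
    unsigned int flips = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = expected[i] ^ actual[i];

        /* Clear the lowest set bit until none are left. */
        while (diff) {
            diff = (uint8_t)(diff & (diff - 1));
            flips++;
        }
    }
    return flips;
}

/*
 * Hypothetical helper: split the buffers into ECC steps of 'step' bytes
 * (assumed value, supplied by the caller) and return the largest number
 * of bitflips found in any one step.
 */
unsigned int max_bitflips_per_step(const uint8_t *expected,
                                   const uint8_t *actual,
                                   size_t len, size_t step)
{
    unsigned int worst = 0;

    if (step == 0)
        return 0;

    for (size_t off = 0; off < len; off += step) {
        size_t n = (len - off < step) ? (len - off) : step;
        unsigned int f = count_bitflips(expected + off, actual + off, n);

        if (f > worst)
            worst = f;
    }
    return worst;
}
```

If the worst step stays at or below what the ECC can correct, the read still
succeeds and the only open question is whether the count reaches the bitflip
threshold; 100 flips spread over many steps is a very different situation from
100 flips in one step.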
> What kind of pattern should be added to detect this kind of issue?
This is a very hard question and almost impossible to answer as it is vendor
specific.
> We can think of testing every page one by one, but given the relatively large
> number of pages in a block, it doesn't sound realistic.
> The easiest way could be to use a random pattern, and try it a relatively low
> number of times.
> Indeed, this simple random test is efficient at detecting every bad block of this device.
> If the random test passes once (because this is a random test), chances are
> that the next bitflip detection will trigger a new torture test, and in the end,
> the block will finally be detected as bad.
Having an additional random pattern is not a bad idea.
This is definitely something we can consider adding to UBI.
But I'm not happy with your implementation.
peb_rnd_buff = kmalloc(ubi->peb_size, GFP_KERNEL);
... is a big no-no. peb_size can be a few megabytes.
What about repeating a few random bytes over and over?
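
A rough user-space sketch of what I mean, not a patch: generate a short random
seed once and tile it across whatever buffer is already available, so no
PEB-sized allocation is needed. fill_repeating() and check_repeating() are
hypothetical names, and how the seed is generated is left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: tile a short seed across 'buf'.
 * The last copy is truncated if 'len' is not a multiple of 'seed_len'.
 */
void fill_repeating(uint8_t *buf, size_t len,
                    const uint8_t *seed, size_t seed_len)
{
    size_t off = 0;

    if (seed_len == 0)
        return;

    while (off < len) {
        size_t n = (len - off < seed_len) ? (len - off) : seed_len;

        memcpy(buf + off, seed, n);
        off += n;
    }
}

/*
 * Hypothetical helper: verify read-back data against the tiled seed.
 * Returns the offset of the first mismatching byte, or -1 if all match.
 */
long check_repeating(const uint8_t *buf, size_t len,
                     const uint8_t *seed, size_t seed_len)
{
    if (seed_len == 0)
        return -1;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != seed[i % seed_len])
            return (long)i;
    }
    return -1;
}
```

One thing to keep in mind: if the seed length divides the page size, every page
gets identical content. Picking a seed length that does not divide the page
size shifts the phase from page to page, which may matter if the failure
depends on neighbouring pages holding different data.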
> And the implementation is pretty obvious...
;-)
Thanks,
//richard