Re: [RFC] UBI torture test fails to detect some bad blocks.

linux-mtd.lists.infradead.org archive mirror

From: Richard Weinberger <richard@nod.at>
To: Arnaud Mouiche <arnaud.mouiche@invoxia.com>,
	Artem Bityutskiy <dedekind1@gmail.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Brian Norris <computersforpeace@gmail.com>,
	linux-mtd@lists.infradead.org,
	boris.brezillon@free-electrons.com, peterpansjtu@gmail.com
Subject: Re: [RFC] UBI torture test fails to detect some bad blocks.
Date: Fri, 8 Apr 2016 10:24:11 +0200
Message-ID: <57076AAB.7050008@nod.at>
In-Reply-To: <1460100214-31298-1-git-send-email-arnaud.mouiche@invoxia.com>

Hi!

On 08.04.2016 at 09:23, Arnaud Mouiche wrote:
> Hi all.
> 
> Just some details about what I experienced recently with some bad blocks on
> an MX35LF1GE4AB SPI NAND device (SLC, 1Gb, 4-bit ECC per 512-byte sub-page),
> where a UBI partition is attached to manage the rootfs and so on (as usual).
> 
> I got my hands on some devices that refuse to boot.
> Analysis of the erase counters shows that some PEBs were erased
> more than 100K times, while the majority have an EC below 20!

Ouch.

> Looking at the bad ones, they run the following scenario almost in a loop:
> - Linux reads some file inside the rootfs.
> - A bitflip is detected.
> - Scrubbing is scheduled.
> - The scrubbing target is a PEB with a fairly high EC.
> - This high EC is itself caused by frequent bitflips in the target PEB in the past.
> - While the PEB data is being moved, a bitflip is detected, scheduling a torture test.
> - The torture test *ALWAYS* passes (whereas bitflips are *VERY* frequent on
>   the same PEB when the read comes from a filesystem read).
> 
> So it seems obvious that the PEBs in question are bad PEBs.
> The question now is why the torture test passes.
> 
> Reproducing the pattern test by hand on this block gives the same result.
> But applying different patterns to different pages within the block shows that
> the content of some pages is affected by the content of other pages.
> In particular, for this block, if the first page is full of 0xFF and the rest
> of the block is full of 0x00, I can count more than 100 bitflips (!)
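Counting such flips only works on a raw read-back (ECC disabled), since a normal read returns corrected data. Below is a minimal userspace sketch, assuming the 4-bit-per-512-byte ECC quoted above; `count_bitflips()` and `expected_byte()` are hypothetical helpers, not UBI or MTD code. It reports the worst ECC step as well as the total, because the two answer different questions: assuming 2 KiB pages and 64 pages per block (128 KiB, i.e. 256 ECC steps), 100 flips spread over the whole block could in principle all be correctable, while 5 flips concentrated in one step could not.

```c
#include <stddef.h>
#include <stdint.h>

/* From the thread: 4-bit ECC per 512-byte sub-page. */
#define ECC_STEP_SIZE 512
#define ECC_STRENGTH  4

/* Number of set bits in one byte (clears the lowest set bit each pass). */
static unsigned popcount8(uint8_t v)
{
	unsigned n = 0;

	while (v) {
		v &= (uint8_t)(v - 1);
		n++;
	}
	return n;
}

/*
 * Compare written data (expected) with a raw read-back (got).
 * Returns the worst per-ECC-step bitflip count; *total gets the sum.
 */
static unsigned count_bitflips(const uint8_t *expected, const uint8_t *got,
			       size_t len, unsigned *total)
{
	unsigned worst = 0, step = 0;

	*total = 0;
	for (size_t i = 0; i < len; i++) {
		step += popcount8(expected[i] ^ got[i]);
		/* close the current ECC step at its boundary or at the end */
		if ((i + 1) % ECC_STEP_SIZE == 0 || i + 1 == len) {
			if (step > worst)
				worst = step;
			*total += step;
			step = 0;
		}
	}
	return worst;
}

/* Expected byte for the pattern above: page 0 all 0xFF, other pages 0x00. */
static uint8_t expected_byte(size_t page)
{
	return page == 0 ? 0xFF : 0x00;
}
```

A worst-step count above `ECC_STRENGTH` means that step would be uncorrectable in a normal read.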

100 flips per ECC step? Shouldn't this lead to an uncorrectable ECC error?
I have no idea how many bits your ECC can fix.
Which bitflip threshold do you have? UBI sees bitflips only after a threshold
is reached. If it is too low, UBI scrubs too often, which seems to be the case here.
It is perfectly fine to have bitflips.
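The threshold rule can be modelled roughly as follows. This is a simplified sketch, assuming the MTD convention that a read with corrected bitflips returns -EUCLEAN only when the worst ECC step reaches `bitflip_threshold` (tunable per device via /sys/class/mtd/mtdN/bitflip_threshold). -EUCLEAN is what UBI treats as "bitflips" and what leads it to schedule scrubbing. `classify_read()` is a hypothetical name, not a kernel function.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, defined here in case libc lacks it */
#endif

/*
 * Simplified classification of a read before UBI sees it.
 * max_bitflips: worst corrected bitflip count over the ECC steps read.
 */
static int classify_read(unsigned max_bitflips, unsigned ecc_strength,
			 unsigned bitflip_threshold)
{
	if (max_bitflips > ecc_strength)
		return -EBADMSG;	/* uncorrectable: data is lost */
	if (ecc_strength == 0)
		return 0;		/* no ECC information to report */
	/* correctable; only reported upwards at or above the threshold */
	return max_bitflips >= bitflip_threshold ? -EUCLEAN : 0;
}
```

With a 4-bit ECC and a threshold of 1, every single corrected flip is reported and scrubbing kicks in; with a threshold of 3, one or two flips per step go unreported.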

So, we need to dig a bit deeper first.

> What kind of pattern should be added to detect this kind of issue?

This is a very hard question and almost impossible to answer, as it is
vendor-specific.

> We could think of testing every page one by one, but given the relatively large
> number of pages in a block, that doesn't sound realistic.
> The easiest way could be to use a random pattern and try it a relatively small
> number of times.
> Indeed, this simple random test is effective at detecting every bad block of this device.
> If the random test passes once (because it is random), chances are
> that the next bitflip detection will trigger a new torture test, and in the end
> the block will be detected as bad.
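A userspace sketch of the random-pattern idea, under stated assumptions: 2 KiB pages, 64 pages per PEB, and hypothetical `peb_ops` hooks standing in for UBI's real I/O helpers. This is not the RFC patch. Each page's pattern is regenerated from (seed, page) with a small xorshift32 PRNG, so only two page-sized buffers are needed instead of a PEB-sized one, and each round uses a new seed. Unlike UBI's torture test, it does not verify that the erase left the block all 0xFF.

```c
#include <stdint.h>
#include <string.h>

/* Assumed geometry for illustration: 2 KiB pages, 64 pages per PEB. */
#define DEMO_PAGE_SIZE     2048
#define DEMO_PAGES_PER_PEB 64

/* Hypothetical device hooks standing in for UBI's real I/O helpers. */
struct peb_ops {
	int (*erase)(void *dev);
	int (*write_page)(void *dev, int page, const uint8_t *buf);
	int (*read_page)(void *dev, int page, uint8_t *buf);
};

/* xorshift32: cheap deterministic PRNG; the state must stay non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/* Regenerate one page of the pattern from (seed, page). */
static void fill_random_page(uint8_t *buf, uint32_t seed, int page)
{
	uint32_t s = seed ^ (0x9E3779B9u * (uint32_t)(page + 1));

	if (s == 0)
		s = 1;
	for (int i = 0; i < DEMO_PAGE_SIZE; i += 4) {
		uint32_t r = xorshift32(&s);
		memcpy(buf + i, &r, sizeof(r));
	}
}

/*
 * Returns 0 if every round passes, 1 on a data mismatch, -1 on I/O error.
 * All pages are written before any is read back.
 */
static int torture_random(const struct peb_ops *ops, void *dev,
			  uint32_t seed, int rounds)
{
	uint8_t wbuf[DEMO_PAGE_SIZE], rbuf[DEMO_PAGE_SIZE];

	for (int r = 0; r < rounds; r++) {
		uint32_t round_seed = seed + (uint32_t)r;	/* new pattern each round */
		if (ops->erase(dev))
			return -1;
		for (int p = 0; p < DEMO_PAGES_PER_PEB; p++) {
			fill_random_page(wbuf, round_seed, p);
			if (ops->write_page(dev, p, wbuf))
				return -1;
		}
		for (int p = 0; p < DEMO_PAGES_PER_PEB; p++) {
			fill_random_page(wbuf, round_seed, p);	/* expected data */
			if (ops->read_page(dev, p, rbuf))
				return -1;
			if (memcmp(wbuf, rbuf, DEMO_PAGE_SIZE) != 0)
				return 1;
		}
	}
	return 0;
}

/* In-memory stand-in for a PEB, only to exercise the loop in userspace. */
struct ram_peb {
	uint8_t data[DEMO_PAGES_PER_PEB][DEMO_PAGE_SIZE];
	int flip_page;	/* page whose reads get bit 0 of byte 0 flipped, or -1 */
};

static int ram_erase(void *dev)
{
	struct ram_peb *peb = dev;
	memset(peb->data, 0xFF, sizeof(peb->data));
	return 0;
}

static int ram_write(void *dev, int page, const uint8_t *buf)
{
	struct ram_peb *peb = dev;
	memcpy(peb->data[page], buf, DEMO_PAGE_SIZE);
	return 0;
}

static int ram_read(void *dev, int page, uint8_t *buf)
{
	struct ram_peb *peb = dev;
	memcpy(buf, peb->data[page], DEMO_PAGE_SIZE);
	if (page == peb->flip_page)
		buf[0] ^= 0x01;	/* injected fault */
	return 0;
}

static const struct peb_ops ram_ops = {
	.erase = ram_erase, .write_page = ram_write, .read_page = ram_read,
};
```

Writing the whole PEB before reading anything back is the design choice that matters here: a page whose content is disturbed by later writes to other pages only shows up if the check happens after those writes. In kernel code the two page buffers would be allocated rather than placed on the stack.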

Having an additional random pattern is not a bad idea.
This is definitely something we can consider adding to UBI.
But I'm not happy with your implementation.

peb_rnd_buff = kmalloc(ubi->peb_size, GFP_KERNEL);
... is a big no-no. peb_size can be a few megabytes.

What about repeating a few random bytes over and over?
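One possible reading of that suggestion, as a sketch: keep only a few random bytes (in kernel code they could come from get_random_bytes()) and expand them into one page-sized buffer at a time, so memory use is bounded by the page size rather than by peb_size. `fill_tiled()`, `check_tiled()` and the per-page `shift` are hypothetical; rotating the seed per page is an extra assumption meant to keep adjacent pages from holding identical data, given the observation that page contents influence each other.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill one page buffer by repeating a short random seed.
 * 'shift' rotates the seed (e.g. shift = page index) so neighbouring
 * pages do not hold identical data.
 */
static void fill_tiled(uint8_t *buf, size_t len,
		       const uint8_t *seed, size_t seed_len, size_t shift)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = seed[(i + shift) % seed_len];
}

/* Return the number of bytes that differ from the tiled pattern. */
static size_t check_tiled(const uint8_t *buf, size_t len,
			  const uint8_t *seed, size_t seed_len, size_t shift)
{
	size_t bad = 0;

	for (size_t i = 0; i < len; i++)
		if (buf[i] != seed[(i + shift) % seed_len])
			bad++;
	return bad;
}
```

Verification needs no copy of the written data: each read-back page is compared against the same seed and shift.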

> And the implementation is pretty obvious...

;-)

Thanks,
//richard


Thread overview: 4+ messages
2016-04-08  7:23 [RFC] UBI torture test fails to detect some bad blocks Arnaud Mouiche
2016-04-08  7:23 ` [RFC] UBI: harden torture_peb to miss less " Arnaud Mouiche
2016-04-08  8:24 ` Richard Weinberger [this message]
2016-04-08  9:02   ` [RFC] UBI torture test fails to detect some " arnaud.mouiche
