Re: ubi on MLC nand flash

From: Ivan Djelic <ivan.djelic@parrot.com>
To: Mike Dunn <mikedunn@newsguy.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: ubi on MLC nand flash
Date: Sun, 6 Nov 2011 18:35:28 +0100
Message-ID: <20111106173528.GA25467@parrot.com>
In-Reply-To: <4EB6A6A8.7010703@newsguy.com>

On Sun, Nov 06, 2011 at 03:24:24PM +0000, Mike Dunn wrote:
> Hi everyone,
> 
> I recently started to do serious testing of UBI on the diskonchip G4 MLC nand
> driver I'm finishing up.  I started with the io_basic ubi test in mtd-utils. 
> What I find is that, after a few minutes, enough PEBs are marked as bad to
> exhaust the reserve PEB pool, UBI switches to r/o mode, and the test fails.  The
> reason is that - on this device at least - bit flips seem to be persistent;
> i.e., you will get e.g. 1 bit flip every time you read a certain page. 
> Consequently, when the bit flip occurs and the PEB gets scrubbed, the torture
> test fails because the bit flip reoccurs, and the PEB is marked bad.

Hi Mike,
I had the same results on recent (34 nm) SLC devices.

> I expected that eventually I might have to dig into the "program disturb",
> "read-disturb" or "paired pages" MLC issues, but the problem seems more
> fundamental.  My general impression is that UBI is too unforgiving for this
> device.  The ecc can correct up to 4 bit flips, so 1 bit flip seems to not be a
> big deal.  I'm new to UBI so this is not a critique or a proposal, I'm just
> hoping some experts can offer some advice or opinions.  The obvious remedy is to
> set a higher threshold for marking a PEB as bad, say 2 or 3 bit flips.

I discussed the matter with a nand manufacturer a while ago; the information I
could get (for SLC devices, not MLC) can be summarized as follows:

1. A block should be marked bad if, after an erase/program operation, more
bitflips are detected than the ecc is able to correct, or if the operation
failed with a status error.

2. If the maximum number of correctable bitflips is reached during a read
operation, data should be relocated to another block, without marking the block
as bad.
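
As a rough sketch, the two rules above could be written as two C predicates.
All names here are hypothetical (they come neither from a datasheet nor from
the mtd/UBI code), and the bitflip counts are assumed to be supplied by the
caller:

```c
#include <stdbool.h>

/*
 * Rule 1 (after erase/program): the block is bad if the operation reported
 * a status error, or if more bitflips were found than the ecc can correct.
 */
bool block_bad_after_write(bool status_error,
                           unsigned int bitflips,
                           unsigned int ecc_capability)
{
    return status_error || bitflips > ecc_capability;
}

/*
 * Rule 2 (on read): relocate the data once the number of corrected bitflips
 * reaches the maximum the ecc can correct. The block is NOT marked bad.
 */
bool relocate_after_read(unsigned int corrected_bitflips,
                         unsigned int ecc_capability)
{
    return corrected_bitflips >= ecc_capability;
}
```

With a 4-bit ecc, a read that needed 4 corrections triggers relocation, while
a read that needed 1 does not.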

I could not get definitive information about the handling of persistent
bitflips, apart from the fact that they are expected and should not cause a
block to be marked as bad (as long as the ecc capability is not exceeded).
Most nand datasheets I have seen are also vague on the subject; they lack
a precise description of the error handling strategy for multi-bitflip devices.

Point 2 above seems reasonable as long as bitflips are reversible (i.e.
cancelled by an erase operation); but what if the maximum number of correctable
errors is reached during a read, and those errors are caused by persistent
bitflips? Should the block be considered bad (IMHO it should be scrubbed, then
marked bad), or should the data simply be relocated?
When I put the latter question to a nand manufacturer, his recommendation
was (quoting):
"(...) not to mark the block bad (because the error is correctable), and
to keep a copy of critical data in another location as backup" (!).

I suggest the following strategy:

Upon reading, when errors are detected (and corrected by ecc):
 - if (nb of errors <  ecc capability (*)) then no scrubbing, do nothing
 - if (nb of errors == ecc capability (*)) then
    - scrub block, then torture it and compute nb of persistent bitflips
    - if (nb of persistent errors <  ecc capability (*)) then block is OK
    - if (nb of persistent errors == ecc capability (*)) then mark block as bad
      [because a single additional bitflip (e.g. a read disturb) would cause
      data loss]

(*) In order to improve reliability, thresholds can be used instead of max ecc
capability.
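
The strategy above can be sketched as two small C predicates, one per
decision point. This is only an illustration under stated assumptions: the
function names and the `threshold` parameter are hypothetical, not existing
UBI/mtd interfaces, and `threshold` stands for either the max ecc capability
or a lower value as per (*). Because a lower threshold can be exceeded, the
"==" tests are written as ">=":

```c
#include <stdbool.h>

/*
 * Step 1: after a read whose errors were corrected by ecc, decide whether
 * the block must be scrubbed and tortured. Below the threshold: do nothing.
 */
bool needs_scrub_and_torture(unsigned int nb_errors, unsigned int threshold)
{
    return nb_errors >= threshold;
}

/*
 * Step 2: after scrubbing and torturing, decide from the number of
 * persistent bitflips whether the block must be marked bad. At the
 * threshold, one more bitflip (e.g. a read disturb) could lose data.
 */
bool should_mark_bad(unsigned int nb_persistent, unsigned int threshold)
{
    return nb_persistent >= threshold;
}
```

For a 4-bit ecc with the threshold set to 4, a page with one persistent
bitflip is never scrubbed, and a block that shows one persistent bitflip
after torture stays in use.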

I'm interested to hear opinions from mtd users/nand experts on the subject; I
know that at least a few of us have had to implement ecc thresholds recently.
UBI/mtd should be modified to support this (IIRC Artem was pushing in that
direction a while ago).

BR,
--
Ivan
