From: Ivan Djelic <ivan.djelic@parrot.com>
To: Mike Dunn <mikedunn@newsguy.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: ubi on MLC nand flash
Date: Sun, 6 Nov 2011 18:35:28 +0100
Message-ID: <20111106173528.GA25467@parrot.com>
In-Reply-To: <4EB6A6A8.7010703@newsguy.com>
On Sun, Nov 06, 2011 at 03:24:24PM +0000, Mike Dunn wrote:
> Hi everyone,
>
> I recently started to do serious testing of UBI on the diskonchip G4 MLC nand
> driver I'm finishing up. I started with the io_basic ubi test in mtd-utils.
> What I find is that, after a few minutes, enough PEBs are marked as bad to
> exhaust the reserve PEB pool, UBI switches to r/o mode, and the test fails. The
> reason is that - on this device at least - bit flips seem to be persistent;
> i.e., you will get e.g. 1 bit flip every time you read a certain page.
> Consequently, when the bit flip occurs and the PEB gets scrubbed, the torture
> test fails because the bit flip reoccurs, and the PEB is marked bad.
Hi Mike,
I had the same results on recent (34 nm) SLC devices.
> I expected that eventually I might have to dig into the "program disturb",
> "read-disturb" or "paired pages" MLC issues, but the problem seems more
> fundamental. My general impression is that UBI is too unforgiving for this
> device. The ecc can correct up to 4 bit flips, so 1 bit flip seems to not be a
> big deal. I'm new to UBI so this is not a critique or a proposal, I'm just
> hoping some experts can offer some advice or opinions. The obvious remedy is to
> set a higher threshold for marking a PEB as bad, say 2 or 3 bit flips.
I discussed the matter with a nand manufacturer a while ago; the information I
could get (for SLC devices, not MLC) can be summarized as follows:
1. A block should be marked bad if, after an erase or program operation, more
   bitflips are detected than ecc is able to correct, or if the operation
   failed with a status error.
2. If the maximum number of correctable bitflips is reached during a read
   operation, data should be relocated to another block, without marking the
   block as bad.
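
As a rough illustration, the two rules above could be sketched in Python as
follows. The function names and the boolean return convention are hypothetical
and chosen only for this sketch; they do not correspond to any mtd or UBI API.
Uncorrectable reads are not covered by the two rules and are left out here.

```python
def mark_bad_after_erase_or_program(status_ok, bitflips, ecc_capability):
    """Rule 1: True if the block should be marked bad after erase/program.

    status_ok: False if the erase/program operation reported a status error.
    bitflips: number of bitflips detected after the operation.
    ecc_capability: maximum number of bitflips ecc is able to correct.
    """
    return (not status_ok) or bitflips > ecc_capability


def relocate_after_read(bitflips, ecc_capability):
    """Rule 2: True if data should be relocated after a read.

    The block itself is NOT marked bad in this case.
    """
    return bitflips == ecc_capability
```

With a 4-bit ecc, a read showing 1 bitflip triggers nothing under these rules,
while a read showing 4 bitflips triggers relocation only.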
I could not get definitive information about the handling of persistent
bitflips, apart from the fact that they are expected and should not cause a
block to be marked as bad (as long as the ecc capability is not exceeded).
Most nand datasheets I have seen are also vague on the subject; they lack a
precise description of an error handling strategy for multi-bitflip devices.
Point 2 above seems reasonable as long as bitflips are reversible (i.e.
cancelled by an erase operation); but what if the maximum number of correctable
errors is reached during a read, those errors being caused by persistent
bitflips? Should the block be considered bad (IMHO it should be scrubbed then
marked bad), or should data simply be relocated?
When I put the latter question to a nand manufacturer, his recommendation
was (quoting):
"(...) not to mark the block bad (because the error is correctable), and
to keep a copy of critical data in another location as backup" (!).
I suggest the following strategy:
Upon reading, when errors are detected (and corrected by ecc):
- if (nb of errors < ecc capability (*)) then no scrubbing, do nothing
- if (nb of errors == ecc capability (*)) then
  - scrub block, then torture it and compute nb of persistent bitflips
  - if (nb of persistent errors < ecc capability (*)) then block is OK
  - if (nb of persistent errors == ecc capability (*)) then mark block as bad
    [because a single additional bitflip (e.g. a read disturb) would cause
    data loss]
(*) In order to improve reliability, thresholds can be used instead of max ecc
capability.
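
The strategy above can be sketched in Python as a small decision function.
This is only an illustration under stated assumptions: `Action`,
`handle_corrected_read` and the `torture` callback are hypothetical names, not
part of mtd or UBI. The `torture` callback stands for "scrub the block, torture
it, and return the number of persistent bitflips". The sketch also compares
with `>=` where the list says `==`, on the assumption that this is the natural
reading once a threshold lower than the max ecc capability is used.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "no scrubbing, do nothing"
    BLOCK_OK = "block scrubbed and tortured, still usable"
    MARK_BAD = "block scrubbed and tortured, marked bad"


def handle_corrected_read(nb_errors, threshold, torture):
    """Decide what to do after a read whose errors were corrected by ecc.

    nb_errors: number of bitflips corrected during the read.
    threshold: ecc capability, or a lower threshold (see the (*) note).
    torture: hypothetical callback that scrubs and tortures the block,
             then returns the number of persistent bitflips it found.
    """
    if nb_errors < threshold:
        return Action.NOTHING
    # Threshold reached: scrub the block, then measure persistent bitflips.
    nb_persistent = torture()
    if nb_persistent < threshold:
        return Action.BLOCK_OK
    # One more bitflip (e.g. a read disturb) would cause data loss.
    return Action.MARK_BAD
```

For the case described at the top of the thread (1 persistent bitflip, ecc able
to correct 4), `handle_corrected_read(1, 4, torture)` returns `Action.NOTHING`
without scrubbing or torturing the block at all.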
I'm interested to hear opinions from mtd users/nand experts on the subject; I
know that at least a few of us had to implement ecc thresholds recently. And
UBI/mtd should be modified to support this (IIRC Artem was pushing in that
direction a while ago).
BR,
--
Ivan