From: Artem Bityutskiy <dedekind1@gmail.com>
To: Ivan Djelic <ivan.djelic@parrot.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
David Peverley <pev@sketchymonkey.com>,
Ricard Wanderlof <ricard.wanderlof@axis.com>
Subject: Re: CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC
Date: Fri, 25 Feb 2011 14:12:10 +0200
Message-ID: <1298635930.2798.96.camel@localhost>
In-Reply-To: <20110225113609.GB21841@parrot.com>
On Fri, 2011-02-25 at 12:36 +0100, Ivan Djelic wrote:
> On Fri, Feb 25, 2011 at 10:29:22AM +0000, Artem Bityutskiy wrote:
> (...)
> > Currently the mechanism to mark a block as bad is the torture function
> > failure: we write a pattern, read it back, compare, and do this several
> > times with different patterns. In case of any error in any step, or if
> > we read back something we did not write, or even if we get a bit-flip
> > when we read back the data, we mark the eraseblock as bad. Otherwise it
> > is returned to the pool of free eraseblocks.
> >
> > See torture_peb() in drivers/mtd/ubi/io.c
> >
> > This procedure is not ideal, and could be improved:
> >
> > a) we could store the number of times the eraseblock was tortured. Since we
> > torture only if there was a write error, too many torture sessions would
> > indicate that the eraseblock is unstable.
> > b) we could take into account the erase count somehow.
> >
> > But yes, the threshold would probably be set up by the system designer in
> > the end.
>
> The fact that a bitflip detected during torture is enough to decide that a
> block is bad causes problems on some 4-bit ecc devices we are using. If we
> stick to this policy, we end up with a _lot_ of blocks being marked as bad
> (i.e. way too many).
I see. Maybe in your case 1-bit errors are completely harmless, but 2-
and 3-bit errors are not?
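
For reference, the torture policy being discussed can be modelled roughly
as below. This is only a toy sketch, not the code in torture_peb(): the
in-memory "eraseblock", the stuck_mask field, the toy_* helpers and the
pattern values are all made up for illustration, and any mismatch on
read-back stands in for a bit-flip reported by the lower layer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PEB_SIZE 64 /* toy eraseblock size, hypothetical */

/* Toy in-memory eraseblock; stuck_mask simulates worn cells in byte 0. */
struct toy_peb {
	uint8_t data[PEB_SIZE];
	uint8_t stuck_mask; /* bits of byte 0 that always read back as 0 */
};

static void toy_erase(struct toy_peb *p)
{
	memset(p->data, 0xFF, PEB_SIZE);
}

static void toy_write(struct toy_peb *p, uint8_t pattern)
{
	memset(p->data, pattern, PEB_SIZE);
}

static void toy_read(const struct toy_peb *p, uint8_t *buf)
{
	memcpy(buf, p->data, PEB_SIZE);
	buf[0] &= (uint8_t)~p->stuck_mask; /* inject the simulated flips */
}

static int all_equal(const uint8_t *buf, uint8_t val)
{
	for (size_t i = 0; i < PEB_SIZE; i++)
		if (buf[i] != val)
			return 0;
	return 1;
}

/* Returns 0 if the block survives the torture, -1 if it should be marked bad. */
static int toy_torture(struct toy_peb *p)
{
	static const uint8_t patterns[] = { 0xA5, 0x5A, 0x00 }; /* illustrative */
	uint8_t buf[PEB_SIZE];

	for (size_t i = 0; i < sizeof(patterns); i++) {
		toy_erase(p);
		toy_read(p, buf);
		if (!all_equal(buf, 0xFF))
			return -1; /* erased block did not read back as all 0xFF */

		toy_write(p, patterns[i]);
		toy_read(p, buf);
		if (!all_equal(buf, patterns[i]))
			return -1; /* even a single flipped bit is fatal here */
	}
	return 0;
}
```

The point of the sketch is the last check: a single flipped bit is enough
to fail the block, which is exactly the policy that hurts on 4-bit ECC
parts.
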
> Our NAND manufacturer tells us that, as long as a block erase operation
> completes without a failure reported by the device, it should not be classified
> as bad, even if it has bitflips (which sounds risky at best).
For any number of flipped bits per page? Sounds a bit scary.
> Right now, we implement a bitflip threshold, below which we correct ecc errors
> without reporting them. When the bitflip threshold is reached, we report the
> amount of corrected errors, triggering block scrubbing, etc.
> This is not ideal, but it prevents UBI from torturing and marking too many
> blocks as bad.
Hmm... Working around UBI behavior does not sound like the best
solution.
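
If I understand the workaround correctly, it amounts to something like the
sketch below. The function name and signature are hypothetical (this is not
taken from any real driver); it only illustrates hiding corrected errors
from the upper layers until the threshold is reached.

```c
/*
 * Hypothetical driver-side filter: 'corrected' is the number of bits the
 * ECC fixed for a read, 'threshold' is the driver's private bitflip
 * threshold. Below the threshold nothing is reported upward; at or above
 * it the real count is reported, so scrubbing and the like get triggered.
 */
static unsigned int reported_corrected(unsigned int corrected,
				       unsigned int threshold)
{
	return corrected >= threshold ? corrected : 0;
}
```

With a threshold of 2, a read with one corrected bit is reported as clean,
while reads with 2, 3 or 4 corrected bits are reported as they are. The
drawback is that the upper layers never learn about the hidden flips.
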

How about changing the MTD interface a little and teaching it to:

1. Report the bit-flip level (or pick a better name): the number of
   bits flipped in this NAND page (or sub-page). If we read more than one
   NAND page in one go, and several pages had bit-flips of different levels,
   report the maximum.
2. Let drivers set the "bit-flip tolerance threshold" (invent a better
   name please), which is the lowest bit-flip level that should be
   considered harmful. E.g., in your case, the threshold could be 2.
3. Make UBI react only to bit-flips of a level higher than or equal to
   the threshold. In your case, UBI would then ignore all level 1 bit-flips
   and react only to level 2, 3, and 4 bit-flips.

Does this sound sensible?
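
To make the three points a bit more concrete, here is a rough sketch. None
of these names exist in MTD or UBI today; the struct and both helpers are
invented purely for illustration.

```c
#include <stddef.h>

/* Hypothetical per-page result produced by the driver after ECC correction. */
struct page_read_result {
	unsigned int bitflips; /* bits corrected in this page (or sub-page) */
};

/* Point 1: a read spanning several pages reports the maximum level seen. */
static unsigned int max_bitflip_level(const struct page_read_result *pages,
				      size_t npages)
{
	unsigned int max = 0;

	for (size_t i = 0; i < npages; i++)
		if (pages[i].bitflips > max)
			max = pages[i].bitflips;
	return max;
}

/*
 * Points 2 and 3: the driver supplies 'threshold'; the upper layer (UBI)
 * reacts only when the reported level is at or above it.
 */
static int bitflips_need_action(unsigned int level, unsigned int threshold)
{
	return level >= threshold;
}
```

For example, with a threshold of 2, a read covering pages with 0, 1, 3 and
2 flipped bits would report level 3 and UBI would react, while a read where
no page has more than 1 flipped bit would be ignored.
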
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)