Re: CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Artem Bityutskiy <dedekind1@gmail.com>
To: Ivan Djelic <ivan.djelic@parrot.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	David Peverley <pev@sketchymonkey.com>,
	Ricard Wanderlof <ricard.wanderlof@axis.com>
Subject: Re: CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC
Date: Fri, 25 Feb 2011 14:12:10 +0200	[thread overview]
Message-ID: <1298635930.2798.96.camel@localhost> (raw)
In-Reply-To: <20110225113609.GB21841@parrot.com>

On Fri, 2011-02-25 at 12:36 +0100, Ivan Djelic wrote:
> On Fri, Feb 25, 2011 at 10:29:22AM +0000, Artem Bityutskiy wrote:
> (...)
> > Currently the mechanism to mark a block is bad is the torture function
> > failure: we write a pattern, read it back, compare, and do this several
> > times with different patterns. In case of any error in any step, or if
> > we read back something we did not write, or even if we get a bit-flip
> > when we read back the data, we bark the eraseblock as bad. Otherwise it
> > is returned to the pull of free eraseblocks.
> > 
> > See torture_peb() in drivers/mtd/ubi/io.c
> > 
> > This procedure is not ideal, and could be improved:
> > 
> > a) we could store amount of times the eraseblock was tortured. Since we
> > torture only if there was a write error, too many torture session would
> > indicate that the eraseblock is unstable.
> > b) we could take into account the erase count somehow.
> > 
> > But yes, the threshold would probably set up by the system designer at
> > the end.
> 
> The fact that a bitflip detected during torture is enough to decide that a
> block is bad causes problems on some 4-bit ecc devices we are using. If we
> stick to this policy, we end up with a _lot_ of blocks being marked as bad
> (i.e. way too many).

I see. May be in your case 1 bit errors are completely harmless, but 2
and 3 are not?

> Our NAND manufacturer tells us that, as long as a block erase operation
> completes without a failure reported by the device, it should not be classified
> as bad, even if it has bitflips (which sounds risky at best).

For any amount of flipped bits per page? Sounds a bit scary.

> Right now, we implement a bitflip threshold, below which we correct ecc errors
> without reporting them. When the bitflip threshold is reached, we report the
> amount of corrected errors, triggering block scrubbing, etc.
> This is not ideal, but it prevents UBI from torturing and marking too many
> blocks as bad.

Hmm... Working around UBI behavior does not sound like a the best
solution.

How about changing the MTD interface a little and teach it to:

1. Report the bit-flip level (or you name it properly) - the amount of
bits flipped in this NAND page (or sub-page). If we read more than one
NAND page at one go, and several pages had bit-flips of different level,
report the maximum.

2. Make it possible for drivers to set the "bit-flip tolerance
threshold" (invent a better name please), which is lowest the bit-flip
level which should be considered harmful. E.g., in your case, the
threshold could be 2.

3. Make UBI only react on bit-flips with order higher or equivalend to
the threshold. In your case then, UBI would ignore all level 1 bit-flips
and react only to level 2, 3, and 4 bit-flips.

Does this sound sensible?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

next prev parent reply	other threads:[~2011-02-25 12:16 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-15 12:35 CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC David Peverley
2011-02-15 13:02 ` Ricard Wanderlof
2011-02-15 14:00   ` David Peverley
2011-02-15 15:01     ` Ricard Wanderlof
2011-02-15 17:58       ` David Peverley
2011-02-17 10:04         ` Ricard Wanderlof
2011-02-25  8:42           ` Artem Bityutskiy
2011-02-25  9:09             ` Ricard Wanderlof
2011-02-25 10:29               ` Artem Bityutskiy
2011-02-25 11:36                 ` Ivan Djelic
2011-02-25 12:12                   ` Artem Bityutskiy [this message]
2011-02-25 12:59                     ` David Peverley
2011-02-25 13:21                       ` Artem Bityutskiy
2011-02-25 18:27                       ` Ivan Djelic
2011-02-25 14:44                     ` Ivan Djelic
2011-02-25 16:41                       ` Artem Bityutskiy
2011-02-25 12:22                   ` Artem Bityutskiy
2011-02-25 15:14                     ` Ivan Djelic
2011-02-25  8:31     ` Artem Bityutskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1298635930.2798.96.camel@localhost \
    --to=dedekind1@gmail.com \
    --cc=ivan.djelic@parrot.com \
    --cc=linux-mtd@lists.infradead.org \
    --cc=pev@sketchymonkey.com \
    --cc=ricard.wanderlof@axis.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.