Re: CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC

public inbox for linux-mtd@lists.infradead.org

From: Artem Bityutskiy <dedekind1@gmail.com>
To: Ivan Djelic <ivan.djelic@parrot.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	David Peverley <pev@sketchymonkey.com>,
	Ricard Wanderlof <ricard.wanderlof@axis.com>
Subject: Re: CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC
Date: Fri, 25 Feb 2011 14:12:10 +0200
Message-ID: <1298635930.2798.96.camel@localhost>
In-Reply-To: <20110225113609.GB21841@parrot.com>

On Fri, 2011-02-25 at 12:36 +0100, Ivan Djelic wrote:
> On Fri, Feb 25, 2011 at 10:29:22AM +0000, Artem Bityutskiy wrote:
> (...)
> > Currently the mechanism for marking a block as bad is failure of the
> > torture function: we write a pattern, read it back, compare, and repeat
> > this several times with different patterns. If any step fails, if we
> > read back something we did not write, or even if we get a bit-flip when
> > reading the data back, we mark the eraseblock as bad. Otherwise it is
> > returned to the pool of free eraseblocks.
> > 
> > See torture_peb() in drivers/mtd/ubi/io.c
> > 
> > This procedure is not ideal, and could be improved:
> > 
> > a) we could store the number of times the eraseblock was tortured. Since
> > we torture only if there was a write error, too many torture sessions
> > would indicate that the eraseblock is unstable.
> > b) we could take the erase count into account somehow.
> > 
> > But yes, the threshold would probably be set up by the system designer
> > in the end.
> 
> The fact that a bitflip detected during torture is enough to decide that a
> block is bad causes problems on some 4-bit ECC devices we are using. If we
> stick to this policy, we end up with a _lot_ of blocks being marked as bad
> (i.e. way too many).
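To make concrete why a single corrected bit-flip is enough to fail a block under the policy quoted above, here is a minimal C sketch of that torture loop. It is modelled only on the description in this thread, not on torture_peb() itself: the in-memory eraseblock, PEB_SIZE, the fake_erase()/fake_write()/fake_read() helpers and the pattern values are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one physical eraseblock (PEB).
 * These helpers are illustrative assumptions, not UBI's io.c API. */
#define PEB_SIZE 64

struct fake_peb {
    uint8_t data[PEB_SIZE];
    int bitflips_on_read; /* simulated corrected bit-flips per read (<0 = read error) */
    bool erase_fails;     /* simulate an erase failure */
};

static int fake_erase(struct fake_peb *peb)
{
    if (peb->erase_fails)
        return -1;
    memset(peb->data, 0xFF, PEB_SIZE); /* erased NAND reads back as all ones */
    return 0;
}

static int fake_write(struct fake_peb *peb, const uint8_t *buf, size_t len)
{
    assert(len <= PEB_SIZE);
    /* NAND programming can only clear bits (1 -> 0). */
    for (size_t i = 0; i < len; i++)
        peb->data[i] &= buf[i];
    return 0;
}

/* Returns the simulated number of corrected bit-flips; negative = read error. */
static int fake_read(const struct fake_peb *peb, uint8_t *buf, size_t len)
{
    assert(len <= PEB_SIZE);
    memcpy(buf, peb->data, len);
    return peb->bitflips_on_read;
}

/* For each pattern: erase, write, read back, compare. Any failure, any
 * mismatch, or even one corrected bit-flip marks the block bad. */
static bool torture_is_bad(struct fake_peb *peb)
{
    static const uint8_t patterns[] = { 0xA5, 0x5A, 0x00 }; /* illustrative */
    uint8_t wbuf[PEB_SIZE], rbuf[PEB_SIZE];

    for (size_t p = 0; p < sizeof(patterns); p++) {
        if (fake_erase(peb) != 0)
            return true;
        memset(wbuf, patterns[p], PEB_SIZE);
        if (fake_write(peb, wbuf, PEB_SIZE) != 0)
            return true;
        if (fake_read(peb, rbuf, PEB_SIZE) != 0) /* read error or any bit-flip */
            return true;
        if (memcmp(wbuf, rbuf, PEB_SIZE) != 0)
            return true;
    }
    return false;
}
```

With a 4-bit ECC device that routinely reports one corrected bit per read, `torture_is_bad()` returns true for every block it sees, which is the "way too many bad blocks" effect described above.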

I see. Maybe in your case 1-bit errors are completely harmless, but 2-
and 3-bit errors are not?

> Our NAND manufacturer tells us that, as long as a block erase operation
> completes without a failure reported by the device, it should not be classified
> as bad, even if it has bitflips (which sounds risky at best).

For any amount of flipped bits per page? Sounds a bit scary.

> Right now, we implement a bitflip threshold below which we correct ECC errors
> without reporting them. When the bitflip threshold is reached, we report the
> number of corrected errors, triggering block scrubbing, etc.
> This is not ideal, but it prevents UBI from torturing and marking too many
> blocks as bad.
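The driver-side workaround described above can be sketched as a filter on the "corrected" counter. This minimal C sketch assumes the convention that the NAND read path returns -EUCLEAN when the corrected-bit counter grows during a read, which is how UBI learns about bit-flips; struct ecc_stats_sketch, account_sector(), read_page_sketch() and the threshold parameter are hypothetical stand-ins, not real driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; defined here in case the libc lacks it */
#endif

/* Minimal stand-in for the "corrected" field of MTD's ECC statistics. */
struct ecc_stats_sketch {
    unsigned int corrected;
};

/* Hypothetical per-sector hook: count corrected bit-flips only when they
 * reach the threshold, so smaller counts stay invisible to upper layers. */
static void account_sector(struct ecc_stats_sketch *stats,
                           unsigned int corrected, unsigned int threshold)
{
    if (corrected >= threshold)
        stats->corrected += corrected;
}

/* Read path: report -EUCLEAN only if the counter grew during this read. */
static int read_page_sketch(struct ecc_stats_sketch *stats,
                            const unsigned int *corrected_per_sector,
                            size_t nsectors, unsigned int threshold)
{
    unsigned int before = stats->corrected;

    assert(threshold >= 1); /* a threshold of 0 would hide nothing */
    for (size_t i = 0; i < nsectors; i++)
        account_sector(stats, corrected_per_sector[i], threshold);
    return stats->corrected != before ? -EUCLEAN : 0;
}
```

With a threshold of 2, a page whose sectors each needed at most one correction reads back cleanly (return 0), so UBI never sees it; a sector with three corrected bits makes the read return -EUCLEAN.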

Hmm... Working around UBI behavior does not sound like the best
solution.

How about changing the MTD interface a little and teach it to:

1. Report the bit-flip level (or whatever you name it): the number of
bits flipped in this NAND page (or sub-page). If we read more than one
NAND page in one go and several pages had bit-flips of different levels,
report the maximum.

2. Make it possible for drivers to set a "bit-flip tolerance threshold"
(please invent a better name): the lowest bit-flip level that should be
considered harmful. E.g., in your case, the threshold could be 2.

3. Make UBI react only to bit-flips with a level greater than or equal to
the threshold. In your case, UBI would then ignore all level 1 bit-flips
and react only to level 2, 3, and 4 bit-flips.
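The three proposed steps can be sketched in C as follows. All names here (struct mtd_sketch, bitflip_threshold, max_bitflip_level(), ubi_should_react()) are hypothetical placeholders, since the proposal deliberately leaves naming open; the sketch shows only the decision logic and is not an MTD or UBI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Step 2: the driver sets the lowest bit-flip level considered harmful. */
struct mtd_sketch {
    unsigned int bitflip_threshold;
};

/* Step 1: when one read spans several pages, report the maximum level. */
static unsigned int max_bitflip_level(const unsigned int *flips_per_page,
                                      size_t npages)
{
    unsigned int max = 0;

    for (size_t i = 0; i < npages; i++)
        if (flips_per_page[i] > max)
            max = flips_per_page[i];
    return max;
}

/* Step 3: UBI reacts only to levels at or above the threshold. */
static bool ubi_should_react(const struct mtd_sketch *mtd, unsigned int level)
{
    assert(mtd->bitflip_threshold >= 1);
    return level >= mtd->bitflip_threshold;
}
```

With a threshold of 2, a read whose worst page had one flipped bit is ignored, while a read containing a page with three flipped bits triggers UBI's usual bit-flip handling. Reporting the maximum rather than the sum keeps the level comparable with a per-page ECC strength.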

Does this sound sensible?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


Thread overview: 19+ messages
2011-02-15 12:35 CONFIG_MTD_NAND_VERIFY_WRITE with Software ECC David Peverley
2011-02-15 13:02 ` Ricard Wanderlof
2011-02-15 14:00   ` David Peverley
2011-02-15 15:01     ` Ricard Wanderlof
2011-02-15 17:58       ` David Peverley
2011-02-17 10:04         ` Ricard Wanderlof
2011-02-25  8:42           ` Artem Bityutskiy
2011-02-25  9:09             ` Ricard Wanderlof
2011-02-25 10:29               ` Artem Bityutskiy
2011-02-25 11:36                 ` Ivan Djelic
2011-02-25 12:12                   ` Artem Bityutskiy [this message]
2011-02-25 12:59                     ` David Peverley
2011-02-25 13:21                       ` Artem Bityutskiy
2011-02-25 18:27                       ` Ivan Djelic
2011-02-25 14:44                     ` Ivan Djelic
2011-02-25 16:41                       ` Artem Bityutskiy
2011-02-25 12:22                   ` Artem Bityutskiy
2011-02-25 15:14                     ` Ivan Djelic
2011-02-25  8:31     ` Artem Bityutskiy
