From: Brian Norris <computersforpeace@gmail.com>
To: David Mosberger <davidm@egauge.net>
Cc: gsi@denx.de, linux-mtd@lists.infradead.org,
Pekon Gupta <pekon@ti.com>,
Artem Bityutskiy <dedekind1@gmail.com>
Subject: Re: [PATCH v4 5/5] mtd: nand: Improve bitflip detection for on-die ECC scheme.
Date: Tue, 1 Apr 2014 11:01:28 -0700
Message-ID: <20140401180128.GJ29542@ld-irv-0074>
In-Reply-To: <20140401173305.GI29542@ld-irv-0074>

On Tue, Apr 01, 2014 at 10:33:05AM -0700, Brian Norris wrote:
> (Re-constructing CC list and leaving message intact, since you missed
> the "Reply-All" button)
>
> On Tue, Apr 01, 2014 at 10:03:00AM -0600, David Mosberger wrote:
> > On Tue, Apr 1, 2014 at 1:50 AM, Brian Norris <computersforpeace@gmail.com> wrote:
> > > On Mon, Mar 31, 2014 at 05:28:57PM -0600, David Mosberger wrote:
> > >> +
> > >> +	if (on)
> > >> +		data[0] = ONFI_FEATURE_ARRAY_OP_MODE_ENABLE_ON_DIE_ECC;
> > >> +
> > >> +	return chip->onfi_set_features(mtd, chip,
> > >> +				       ONFI_FEATURE_ADDR_ARRAY_OP_MODE, data);
> > >> +}
> > >
> > > This should be implemented on a per-vendor basis and provided as a
> > > callback (perhaps chip->set_internal_ecc()?). Then, you would only make
> > > chip->set_internal_ecc non-NULL for flash that support it.
> >
> > It's not clear at all to me how (un-)standardized this stuff is. It
> > may be Micron specific, but it may not be. I don't know. Since it's
> > only called for Micron chips with on-die enabled, the code is safe as
> > it is.

The point is that, where possible, we don't write code into the generic
framework that assumes Micron is the only vendor to implement this. This
type of replaceable feature is best left as a callback which can be set
to NULL, I think. Or if you can find a better point at which the
implementation specifics can be abstracted, that can work as well. But
regardless, my high-level comment must be addressed -- you wrote this
code as if Micron is the only one to implement on-die ECC.
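
To make the callback idea concrete, here is a minimal standalone sketch
in userspace C, not kernel code. struct demo_chip and every demo_* name
are hypothetical stand-ins; only the set_internal_ecc name comes from
the suggestion quoted above, and that was itself only a "perhaps". The
real hook would live in struct nand_chip and issue SET FEATURES.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct nand_chip. */
struct demo_chip {
	int ecc_enabled;
	/* NULL when the flash cannot toggle its internal ECC. */
	int (*set_internal_ecc)(struct demo_chip *chip, int on);
};

/*
 * Hypothetical vendor-specific implementation. A Micron-style chip
 * would send its SET FEATURES command here; this sketch only records
 * the requested state.
 */
static int demo_micron_set_internal_ecc(struct demo_chip *chip, int on)
{
	chip->ecc_enabled = !!on;
	return 0;
}

/* Generic code checks the callback and knows nothing about vendors. */
static int demo_set_internal_ecc(struct demo_chip *chip, int on)
{
	if (!chip->set_internal_ecc)
		return -EOPNOTSUPP;
	return chip->set_internal_ecc(chip, on);
}
```

With this shape, a chip that cannot switch its ECC off simply leaves
the pointer NULL and generic callers get an error instead of a
vendor-specific command.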

FWIW, the Toshiba BENAND (Built-in ECC NAND) datasheet I saw doesn't
advertise the ability to disable its ECC, but it does report per-sector
bitflip information (nice!). Also, I think it hides the ECC syndrome
bytes from the user, so most drivers could possibly ignore the built-in
ECC entirely if desired.

> > > Do you actually need to re-read, or can you use the existing data? Or at
> > > least, you could overwrite the databuf, instead of using a new chkbuf.
> >
> > In general, you have to (re-)read. Consider read_oob or read_subpage.
> >
> > >> +
> > >> +	/* Re-read page with on-die ECC off: */
> > >> +	set_on_die_ecc(mtd, chip, 0);
> > >> +	chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> > >> +	chip->read_buf(mtd, rawbuf, read_size);
> > >> +	set_on_die_ecc(mtd, chip, 1);
> > >> +
> > >> +	chkoob = chkbuf + mtd->writesize;
> > >> +	rawoob = rawbuf + mtd->writesize;
> > >> +	eccpos = chip->ecc.layout->eccpos;
> > >> +	for (i = 0; i < chip->ecc.steps; ++i) {
> > >> +		/* Count bit flips in the actual data area: */
> > >> +		flips = bitdiff(chkbuf, rawbuf, chip->ecc.size);
> > >> +		/* Count bit flips in the ECC bytes: */
> > >> +		for (j = 0; j < chip->ecc.bytes; ++j) {
> > >> +			flips += hweight8(chkoob[*eccpos] ^ rawoob[*eccpos]);
> > >
> > > Why didn't you use bitdiff() here too?
> >
> > Because the data is not contiguous and I didn't think the overhead
> > of an extra function call was warranted for individual bytes. But yeah,
> > we could certainly use bitdiff() here on individual bytes, if you prefer.

Sorry, I misread the loop. Never mind. (Although it does then suggest
that maybe the bitdiff() function doesn't really need to stand alone,
for symmetry. Your call.)
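
For reference, the per-step counting in the quoted hunk can be sketched
in plain userspace C. This is an illustration under assumptions, not the
patch itself: demo_hweight8() stands in for the kernel's hweight8(),
demo_bitdiff() is a guess at what the patch's bitdiff() does (its body
is not quoted here), and the buffer advancement and max tracking are
assumed because the quoted loop is cut off.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one byte (stand-in for the kernel's hweight8()). */
static unsigned int demo_hweight8(uint8_t v)
{
	unsigned int n = 0;

	while (v) {
		n += v & 1u;
		v >>= 1;
	}
	return n;
}

/* Number of differing bits between two contiguous buffers. */
static unsigned int demo_bitdiff(const uint8_t *a, const uint8_t *b, size_t len)
{
	unsigned int flips = 0;
	size_t i;

	for (i = 0; i < len; i++)
		flips += demo_hweight8(a[i] ^ b[i]);
	return flips;
}

/*
 * Largest bitflip count over all ECC steps. The data area of each step
 * is contiguous; the ECC bytes are scattered in the OOB area and are
 * located through eccpos, so they are compared byte by byte.
 */
static unsigned int demo_max_bitflips(const uint8_t *chk, const uint8_t *raw,
				      const uint8_t *chkoob, const uint8_t *rawoob,
				      const unsigned int *eccpos, size_t steps,
				      size_t ecc_size, size_t ecc_bytes)
{
	unsigned int max = 0;
	size_t i, j;

	for (i = 0; i < steps; i++) {
		unsigned int flips = demo_bitdiff(chk, raw, ecc_size);

		for (j = 0; j < ecc_bytes; j++, eccpos++)
			flips += demo_hweight8(chkoob[*eccpos] ^ rawoob[*eccpos]);
		if (flips > max)
			max = flips;
		/* Assumed: move on to the next step's data area. */
		chk += ecc_size;
		raw += ecc_size;
	}
	return max;
}
```

Here chk holds the ECC-corrected read and raw holds the re-read with
on-die ECC switched off; the caller would compare the returned maximum
against its bitflip threshold.
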
> > >> /*
> > >> - * Simple but suboptimal: any page with a single stuck
> > >> - * bit will be unusable since it'll be rewritten on
> > >> - * each read...
> > >> + * The Micron chips turn on the REWRITE status bit for
> > >> + * ANY bit flips. Some pages have stuck bits, so we
> > >> + * don't want to migrate a block just because of
> > >> + * single bit errors because otherwise, that block
> > >> + * would effectively become unusable. So, work out in
> > >> + * software what the max number of flipped bits is for
> > >> + * all subpages in a page:
> > >
> > > Can you shorten this comment? It's rather verbose, and it's making
> > > assumptions about upper-layer "migrations". I think we can leave it at
> > > something much simpler, like:
> > >
> > > /*
> > >  * Micron on-die ECC doesn't report the number of bitflips, so
> > >  * we have to count them ourself to see if the error rate is too
> > >  * high.
> > >  */
> >
> > Sure, I did add "This is particularly important for pages with stuck
> > bits." since I think that is an important case to think about here.

Use your judgment, but please at least kill the migration comment.

Brian