public inbox for linux-mtd@lists.infradead.org
From: Brian Norris <computersforpeace@gmail.com>
To: David Mosberger <davidm@egauge.net>
Cc: gsi@denx.de, linux-mtd@lists.infradead.org,
	Pekon Gupta <pekon@ti.com>,
	Artem Bityutskiy <dedekind1@gmail.com>
Subject: Re: [PATCH v4 5/5] mtd: nand: Improve bitflip detection for on-die ECC scheme.
Date: Tue, 1 Apr 2014 10:33:05 -0700
Message-ID: <20140401173305.GI29542@ld-irv-0074>
In-Reply-To: <CALnQHM2Afp8LD6MtGQTT5jrcb9xJdYXRGD0TZ_s5GASZsbRZeg@mail.gmail.com>

(Re-constructing CC list and leaving message intact, since you missed
the "Reply-All" button)

On Tue, Apr 01, 2014 at 10:03:00AM -0600, David Mosberger wrote:
> Brian,
> 
> On Tue, Apr 1, 2014 at 1:50 AM, Brian Norris
> <computersforpeace@gmail.com> wrote:
> > On Mon, Mar 31, 2014 at 05:28:57PM -0600, David Mosberger wrote:
> 
> >> +static int
> >> +set_on_die_ecc(struct mtd_info *mtd, struct nand_chip *chip, int on)
> >> +{
> >> +     u8 data[ONFI_SUBFEATURE_PARAM_LEN] = { 0, };
> >> +
> >> +     if (chip->ecc.mode != NAND_ECC_HW_ON_DIE)
> >> +             return 0;
> >
> > I think this check is unnecessary, and probably wrong. The caller should
> > make sure not to call this for devices that don't support it. Or else,
> > there should at least be an error code, like -EOPNOTSUPP.
> 
> Fair enough.  I removed the check for ecc.mode.
> 
> >> +
> >> +     if (on)
> >> +             data[0] = ONFI_FEATURE_ARRAY_OP_MODE_ENABLE_ON_DIE_ECC;
> >> +
> >> +     return chip->onfi_set_features(mtd, chip,
> >> +                                    ONFI_FEATURE_ADDR_ARRAY_OP_MODE, data);
> >> +}
> >
> > This should be implemented on a per-vendor basis and provided as a
> > callback (perhaps chip->set_internal_ecc()?). Then, you would only make
> > chip->set_internal_ecc non-NULL for flash that support it.
> 
> It's not clear at all to me how (un-)standardized this stuff is.  It
> may be Micron specific, but it may not be.  I don't know.  Since it's
> only called for Micron chips with on-die enabled, the code is safe as
> it is.
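
For readers following the callback suggestion above, here is a minimal userspace sketch of the pattern: the hook is left NULL for flash that lacks the feature, and the wrapper returns -EOPNOTSUPP in that case. Everything in it is an assumption for illustration. struct fake_chip is a hypothetical stand-in for struct nand_chip, and set_internal_ecc is only the name floated in this thread, not an existing kernel field.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct nand_chip. */
struct fake_chip {
	/* NULL unless the vendor code knows the chip has on-die ECC. */
	int (*set_internal_ecc)(struct fake_chip *chip, int on);
	int ecc_on;	/* stands in for the chip's feature register */
};

/* Hypothetical vendor hook; a real one would issue SET FEATURES. */
static int micron_set_internal_ecc(struct fake_chip *chip, int on)
{
	chip->ecc_on = !!on;
	return 0;
}

/* Generic wrapper: report an error instead of silently doing nothing. */
static int chip_set_internal_ecc(struct fake_chip *chip, int on)
{
	if (!chip->set_internal_ecc)
		return -EOPNOTSUPP;
	return chip->set_internal_ecc(chip, on);
}
```

With this shape, the generic code never needs to test chip->ecc.mode; whether the hook is populated is the only capability check.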
> 
> > Do you actually need to re-read, or can you use the existing data? Or at
> > least, you could overwrite the databuf, instead of using a new chkbuf.
> 
> In general, you have to (re-)read.  Consider read_oob or read_subpage.
> 
> >> +
> >> +     /* Re-read page with on-die ECC off: */
> >> +     set_on_die_ecc(mtd, chip, 0);
> >> +     chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> >> +     chip->read_buf(mtd, rawbuf, read_size);
> >> +     set_on_die_ecc(mtd, chip, 1);
> >> +
> >> +     chkoob = chkbuf + mtd->writesize;
> >> +     rawoob = rawbuf + mtd->writesize;
> >> +     eccpos = chip->ecc.layout->eccpos;
> >> +     for (i = 0; i < chip->ecc.steps; ++i) {
> >> +             /* Count bit flips in the actual data area: */
> >> +             flips = bitdiff(chkbuf, rawbuf, chip->ecc.size);
> >> +             /* Count bit flips in the ECC bytes: */
> >> +             for (j = 0; j < chip->ecc.bytes; ++j) {
> >> +                     flips += hweight8(chkoob[*eccpos] ^ rawoob[*eccpos]);
> >
> > Why didn't you use bitdiff() here too?
> 
> Because the data is not contiguous and I didn't think the overhead
> of an extra function call was warranted for individual bytes.  But yeah,
> we could certainly use bitdiff() here on individual bytes, if you prefer.
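
The patch's bitdiff() is not shown in this message, so the following is only a userspace sketch of what such a helper might look like, next to a variant for the non-contiguous ECC bytes. The names bitdiff_sketch, bitdiff_at and popcount8 are hypothetical; popcount8 plays the role of the kernel's hweight8().

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one byte (userspace stand-in for hweight8()). */
static unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Count differing bits between two contiguous buffers. */
static unsigned int bitdiff_sketch(const uint8_t *a, const uint8_t *b,
				   size_t len)
{
	unsigned int flips = 0;
	size_t i;

	for (i = 0; i < len; i++)
		flips += popcount8(a[i] ^ b[i]);
	return flips;
}

/* Same count, but only at the byte offsets listed in pos[]. */
static unsigned int bitdiff_at(const uint8_t *a, const uint8_t *b,
			       const uint32_t *pos, size_t npos)
{
	unsigned int flips = 0;
	size_t i;

	for (i = 0; i < npos; i++)
		flips += popcount8(a[pos[i]] ^ b[pos[i]]);
	return flips;
}
```

The second helper is one way to cover scattered ECC positions without a function call per byte at the call site.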
> 
> >>               /*
> >> -              * Simple but suboptimal: any page with a single stuck
> >> -              * bit will be unusable since it'll be rewritten on
> >> -              * each read...
> >> +              * The Micron chips turn on the REWRITE status bit for
> >> +              * ANY bit flips.  Some pages have stuck bits, so we
> >> +              * don't want to migrate a block just because of
> >> +              * single bit errors because otherwise, that block
> >> +              * would effectively become unusable.  So, work out in
> >> +              * software what the max number of flipped bits is for
> >> +              * all subpages in a page:
> >
> > Can you shorten this comment? It's rather verbose, and it's making
> > assumptions about upper-layer "migrations". I think we can leave it at
> > something much simpler, like:
> >
> >         /*
> >          * Micron on-die ECC doesn't report the number of bitflips, so
> >          * we have to count them ourselves to see if the error rate is
> >          * too high.
> >          */
> 
> Sure, I did add "This is particularly important for pages with stuck
> bits." since I think that is an important case to think about here.
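
To make the stuck-bit case concrete, here is a small userspace sketch of the counting idea discussed above: compare the ECC-corrected buffer against the raw re-read and take the maximum flip count over the ECC steps, rather than the total for the page. It covers the data area only and leaves out the ECC bytes in the OOB. The function names and flat buffer layout are assumptions for illustration, not the patch's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Count differing bits between two buffers of equal length. */
static unsigned int count_flips(const uint8_t *a, const uint8_t *b, size_t len)
{
	unsigned int flips = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t x = a[i] ^ b[i];

		for (; x; x &= x - 1)	/* one iteration per set bit */
			flips++;
	}
	return flips;
}

/*
 * Return the worst per-step flip count. chk is the ECC-corrected data,
 * raw is the same page read with on-die ECC switched off.
 */
static unsigned int max_flips_per_step(const uint8_t *chk, const uint8_t *raw,
				       size_t steps, size_t step_size)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < steps; i++) {
		unsigned int f = count_flips(chk + i * step_size,
					     raw + i * step_size, step_size);

		if (f > max)
			max = f;
	}
	return max;
}
```

A page with one stuck bit in a single step then reports 1, which an upper layer can compare against its threshold instead of reacting to a bare "rewrite recommended" status.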
> 
>   --david
> -- 
> eGauge Systems LLC, http://egauge.net/, 1.877-EGAUGE1, fax 720.545.9768


