From: Brian Norris <computersforpeace@gmail.com>
To: "Gupta, Pekon" <pekon@ti.com>
Cc: Stefan Roese <sr@denx.de>,
linux-mtd <linux-mtd@lists.infradead.org>,
"Balbi, Felipe" <balbi@ti.com>,
Artem Bityutskiy <dedekind1@gmail.com>
Subject: Re: [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes
Date: Fri, 7 Mar 2014 11:11:23 -0800 [thread overview]
Message-ID: <20140307191123.GC5232@ld-irv-0074> (raw)
In-Reply-To: <20980858CB6D3A4BAE95CA194937D5E73EA6FB16@DBDE04.ent.ti.com>
Hi Pekon,
On Wed, Feb 12, 2014 at 10:31:02AM +0000, Pekon Gupta wrote:
> >From: Brian Norris [mailto:computersforpeace@gmail.com]
> >>On Fri, Jan 17, 2014 at 12:43:14PM +0530, Pekon Gupta wrote:
> >> This patch removes dependency on any marker byte in ecc-layout, to differentiate
> >> between erased-page v/s programeed-page. Instead a page is considered erased if
> >> (a) all(OOB) == 0xff, .i.e., number of zero-bits in read_ecc[] == 0
> >> (b) number of zero-bits in (Data + OOB) is less than ecc-strength
[...]
> >> @@ -1410,24 +1441,47 @@ static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
> >> }
> >>
> >> if (eccflag == 1) {
> >> - /*
> >> - * Set threshold to minimum of 4, half of ecc.strength/2
> >> - * to allow max bit flip in byte to 4
> >> - */
> >> - unsigned int threshold = min_t(unsigned int, 4,
> >> - info->nand.ecc.strength / 2);
> >> + bitflip_count = 0;
> >> + /* count zero-bits in OOB region */
> >> + err = count_zero_bits(read_ecc, eccbytes,
> >> + eccstrength, &bitflip_count);
> >> + if (err) {
> >> + /*
> >> + * number of zero-bits in OOB > ecc-strength
> >> + * either un-correctable or programmed-page
> >> + */
> >> + page_is_erased = false;
> >> + } else if (bitflip_count == 0) {
> >> + /* OOB == all(0xff) */
> >> + page_is_erased = true;
> >
> >This else-if is still incorrect. You cannot assume that just because the
> >OOB is all 0xff that the page is erased.
> >
> Now this part is most important, where I would like to get more clarity
> and feedback before I proceed.
> ----------------------------
> if OOB of an page is == all(0xff)
> (a) As per general NAND spec, chip->write_buf() _only_ fills the internal buffers of
> NAND device's with information to be written. Actual write to NAND array cells
> happen only after 'NAND_CMD_PAGEPROG' or 'NAND_CMD_CACHEDPROG' is issued.
I'm not really sure why you bring up chip->write_buf(); not all
implementations actually use this. But anyway...
> Thus both Main-area and OOB-area are programmed simultaneously inside NAND device.
> if there is any disruption (like power-cut) in on-going NAND_CMD_PAGEPROG
> cycle, then the data should be assumed to be garbage, and it may have un-stable bits.
OK. But this is not at all what we're trying to check; we are checking
*only* whether this particular page reads mostly-0xff, due to regular
bitflips on an otherwise unprogrammed page. The "assumed garbage"
conclusion is really irrelevant here.
> (b) if OOB==all(0xff) then there is _no_ ECC stored in OOB to check the read_data
I disagree. Can you guarantee that there is *no* data pattern in which
the matching ECC syndrome bytes are all 0xff?
> Hence driver cannot distinguish between genuine data and bit-flips.
>
> Now based on (a) & (b), it's safe to assume that:
> if (OOB == all 0xff)
> /* page has no-data | garbage-data*/
> Agree ?
No. But more importantly, why do you want to make this conclusion?
Aren't we *only* trying to make a decision of "is this page
mostly-0xff"?
Or more directly: you are using the above conclusion ("page has garbage
data") to then erase the buffer entirely. This is totally, 100% bogus
because the data area might have junk (partially-programmed page?) but
the OOB could be all 0xff -- and you are now lying to the upper layers,
saying the page is cleanly 0xff!!!
I believe one of the two of us has a very dire misunderstanding here.
Please help me decide which of us this is :)
> (This is where I call it page_is_erased==true, Though I could have used better
> variable name as page_has_no_valid_data = true).
> ----------------------------
>
> And also, driver should return 'total number zero-bits in both Main-area + OOB'
> if (0 < 'total number of zero-bits in both OOB + DATA region' < mtd->bitflip_threshold)
> return -EUCLEAN;
> else
> return -EBADMSG;
>
> ** However there is a bug in current patch version which I just figured out.
> I don't include the number of zero-bits of Main-area, if OOB==all(0xff).
> + } else if (bitflip_count == 0) {
> + /* OOB == all(0xff) */
> + page_is_erased = true;
> + /* missing */ bitflip_count += count_zero_bits(buf, eccsize,
> + eccstrength, &bitflip_count);
> ------------------------
>
> >> + } else {
> >> + /*
> >> + * OOB has some bits as '0' but
> >> + * number of zero-bits in OOB < ecc-strength.
> >> + * It may be erased-page with bit-flips so,
> >> + * further checks are needed for confirmation
> >> + */
> >> + /* count zero-bits in DATA region */
> >> + buf = &data[eccsize * i];
> >> + err = count_zero_bits(buf, eccsize,
> >> + eccstrength, &bitflip_count);
> >> + if (err) {
> >> + /*
> >> + * total number of zero-bits in OOB
> >> + * and DATA exceeds ecc-strength
> >> + */
> >> + page_is_erased = false;
> >> + } else {
> >> + /* number of zero-bits < ecc-strength */
> >> + page_is_erased = true;
> >> + }
> >> + }
> >
> >This whole block (where you call count_zero_bits() twice) is a
> >convoluted and buggy way of just doing the following, AFAIU:
> >
> > page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> > ecc->strength, &bitflip_count) &&
> > !count_zero_bits(&data[ecc->size * i], ecc->size,
> > ecc->strength, &bitflip_count);
> >
> >You could modify that to your liking and add a comment, but it's much
> >more concise than your version, and from what I can tell, it's actually
> >correct.
> >
> Firstly, In you above statement 's/&&/||'
No. I wrote it exactly as I meant it. We need to check both the data
area and the OOB, and if either cause us to exceed the ECC strength,
then the page is not erased.
> because as per above statement, if count_zero_bits(oob) == 0, then
> my patch assumes 'page_has_no_valid_data = true'.
And that assumption is 100% wrong, AIUI.
> page_is_erased = !count_zero_bits(read_ecc, ecc->bytes,
> ecc->strength, &bitflip_count) ||
> !count_zero_bits(&data[ecc->size * i], ecc->size,
> ecc->strength, &bitflip_count);
>
>
> Secondly, You are missing the that here NAND driver is relaxing the limit of
> number-of-acceptable bit-flips in an erased-page, because some MLC and
> Toshiba SLC NANDs show bit-flips almost on every second already erased-page.
No, I'm not missing this. The MTD/NAND layers already take care of the
"number-of-acceptable bit-flips" using the mtd->bitflip_threshold
parameter.
> >> + if (err) {
> >> + /*
> >> + * total number of zero-bits in OOB
> >> + * and DATA exceeds ecc-strength
> >> + */
> >> + page_is_erased = false;
> >> + } else {
> >> + /* number of zero-bits < ecc-strength */
> >> + page_is_erased = true;
> >> + }
>
> In above patch I can replace ecc->strength with mtd->bitflip_threshold
> to make it more user controllable. As bitflip_threshold is configurable via
> /sys/class/mtd/mtdX/bitflip_threshold
> What is your opinion ?
The hardware driver should not be comparing with 'bitflip_threshold'.
Comparing against ecc_strength is correct, as we are simulating the
current ECC correction strength.
Brian
next prev parent reply other threads:[~2014-03-07 19:12 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-17 7:13 [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 1/6] mtd: nand: omap: add field to indicate current ecc-scheme in 'struct omap_nand_info' Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 2/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: rename ambiguous variable 'eccsize' and 'ecc_vector_size' Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 3/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix erased-page detection for BCHx_HW ECC schemes Pekon Gupta
2014-02-11 21:34 ` Brian Norris
2014-02-12 10:31 ` Gupta, Pekon
2014-03-07 19:11 ` Brian Norris [this message]
2014-01-17 7:13 ` [PATCH v7 4/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: remove redundant bit-flip counting for erased-page Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 5/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: cleanup for future enhancements Pekon Gupta
2014-01-17 7:13 ` [PATCH v7 6/6] mtd: nand: omap: ecc.correct: omap_elm_correct_data: fix programmed-page bit-flip correction logic Pekon Gupta
2014-01-29 13:18 ` [PATCH v7 0/6] mtd: nand: omap: optimized chip->ecc.correct() for H/W ECC schemes Stefan Roese
2014-02-04 11:37 ` Gupta, Pekon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140307191123.GC5232@ld-irv-0074 \
--to=computersforpeace@gmail.com \
--cc=balbi@ti.com \
--cc=dedekind1@gmail.com \
--cc=linux-mtd@lists.infradead.org \
--cc=pekon@ti.com \
--cc=sr@denx.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.