flash bbt broken due to unitialized bitflip

public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed

* flash bbt broken due to unitialized bitflip_threshold?
@ 2012-06-05 22:06 Sascha Hauer
  2012-06-06  9:50 ` Shmulik Ladkani
  0 siblings, 1 reply; 15+ messages in thread
From: Sascha Hauer @ 2012-06-05 22:06 UTC (permalink / raw)
  To: linux-mtd; +Cc: Artem Bityutskiy, Mike Dunn

Hi All,

nand_scan_tail calls chip->scan_bbt which in case of a flash based bbt
calls mtd_read. mtd_read returns -EUCLEAN when the number of errors
exceeds bitflip_threshold. The problem is that bitflip_threshold is
uninitialized at that time, it is initialized in add_mtd_device which is
called after nand_scan. This is seen on the mxc_nand controller, but
probably on other drivers aswell.

Am I missing something?

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-05 22:06 flash bbt broken due to unitialized bitflip_threshold? Sascha Hauer
@ 2012-06-06  9:50 ` Shmulik Ladkani
  2012-06-06 13:30   ` Artem Bityutskiy
  0 siblings, 1 reply; 15+ messages in thread
From: Shmulik Ladkani @ 2012-06-06  9:50 UTC (permalink / raw)
  To: Sascha Hauer, Mike Dunn; +Cc: Artem Bityutskiy, linux-mtd

Hi Sascha,

On Wed, 6 Jun 2012 00:06:47 +0200 Sascha Hauer <s.hauer@pengutronix.de> wrote:
> Hi All,
> 
> nand_scan_tail calls chip->scan_bbt which in case of a flash based bbt
> calls mtd_read. mtd_read returns -EUCLEAN when the number of errors
> exceeds bitflip_threshold. The problem is that bitflip_threshold is
> uninitialized at that time, it is initialized in add_mtd_device which is
> called after nand_scan. This is seen on the mxc_nand controller, but
> probably on other drivers aswell.
> 
> Am I missing something?

Yep, you are correct.
The result is that the BBTs will always be scrubbed (re-programmed).

I think the below fix is missing, should do the job.

Mike, does it look sane to you?

It assigns a 'bitflip_threshold' for the master mtd
(defaults to ecc_strength unless previously set by the driver).

It makes page reads of the BBT area adhere to an erase-block "health"
policy, either initially set by the driver, or set by default to
ecc_strength.

Note that if multiple partitions are later registered, one can't
later override the master's bitflip_threshold via sysfs.
But I guess this isn't important as the BBT is read once, when the nand
is probed.

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 61805e7..46c7d2b 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -3484,6 +3484,8 @@ int nand_scan_tail(struct mtd_info *mtd)
 	/* propagate ecc info to mtd_info */
 	mtd->ecclayout = chip->ecc.layout;
 	mtd->ecc_strength = chip->ecc.strength;
+	if (!mtd->bitflip_threshold)
+		mtd->bitflip_threshold = mtd->ecc_strength;
 
 	/* Check, if we should skip the bad block table scan */
 	if (chip->options & NAND_SKIP_BBTSCAN)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06  9:50 ` Shmulik Ladkani
@ 2012-06-06 13:30   ` Artem Bityutskiy
  2012-06-06 15:15     ` Shmulik Ladkani
  0 siblings, 1 reply; 15+ messages in thread
From: Artem Bityutskiy @ 2012-06-06 13:30 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: linux-mtd, Sascha Hauer, Mike Dunn

[-- Attachment #1: Type: text/plain, Size: 471 bytes --]

On Wed, 2012-06-06 at 12:50 +0300, Shmulik Ladkani wrote:
> +	if (!mtd->bitflip_threshold)
> +		mtd->bitflip_threshold = mtd->ecc_strength;

Hmm, why se default is to report bit-flips only when one more flipping
bit would cause an ECC error? The default be 1 - report on any bit-flip.
The same should be done in 'add_mtd_device()' - we should preserve the
old behavior, as it was before these patches. Do I miss something?

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06 13:30   ` Artem Bityutskiy
@ 2012-06-06 15:15     ` Shmulik Ladkani
  2012-06-06 15:46       ` Artem Bityutskiy
  0 siblings, 1 reply; 15+ messages in thread
From: Shmulik Ladkani @ 2012-06-06 15:15 UTC (permalink / raw)
  To: artem.bityutskiy; +Cc: linux-mtd, Sascha Hauer, Mike Dunn

Hi Artem,

On Wed, 06 Jun 2012 16:30:53 +0300 Artem Bityutskiy <artem.bityutskiy@linux.intel.com> wrote:
> On Wed, 2012-06-06 at 12:50 +0300, Shmulik Ladkani wrote:
> > +	if (!mtd->bitflip_threshold)
> > +		mtd->bitflip_threshold = mtd->ecc_strength;
> 
> Hmm, why se default is to report bit-flips only when one more flipping
> bit would cause an ECC error? The default be 1 - report on any bit-flip.
> The same should be done in 'add_mtd_device()' - we should preserve the
> old behavior, as it was before these patches. Do I miss something?

Mike's patchset modified behavior of mtd_read() to return EUCLEAN only
if max number of bitflips (per ecc step) exceeds the bitflip_threshold:

	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;

mtd->bitflip_threshold can be set in the following ways:
- By the lowlevel driver
- By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT
  previously set by the driver.
- Using sysfs attribute

Currently, no driver explicitly sets bitflip_threshold, so it defaults
to ecc_strength. This is not necessarily 1; it depends on the hardware
driver, or software algorithm, etc...

So the "old behavior" was not preserved by the patchset, at least for
those setups with ecc_strength greater than 1. And I don't think the
intention was to preserve 'bitflip_threshold' as 1.

As Sascha spotted, the patchset created a problem for 'scan_bbt'.
This is since 'scan_bbt' calls 'mtd_read()' to read the BBT pages,
however 'bitflip_threshold' is usually NOT YET assigned at that point,
which results in mtd_read's condition (quoted above) to ALWAYS
return -EUCLEAN.
This leads to constantly scrubbing the BBT. Ouch.

My suggestion is to assign the default value of 'ecc_strength' to
'bitflip_threshold' at the end of 'nand_scan_tail', PRIOR the call to
scan_bbt().
Hence, BBT will be scrubbed only if maximum bitflips exceeded the
'ecc_strength' of this mtd (which is the default behavior).

Does it make sense to you?

Or would you like to scrub the BBT upon every bitflip, regardless
ecc_strength?

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06 15:15     ` Shmulik Ladkani
@ 2012-06-06 15:46       ` Artem Bityutskiy
  2012-06-06 16:08         ` Shmulik Ladkani
  2012-06-06 17:55         ` Ivan Djelic
  0 siblings, 2 replies; 15+ messages in thread
From: Artem Bityutskiy @ 2012-06-06 15:46 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: Mike Dunn, Sascha Hauer, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 699 bytes --]

On Wed, 2012-06-06 at 18:15 +0300, Shmulik Ladkani wrote:
> - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT
>   previously set by the driver.

But this is wrong. If I use the old doc2000 driver, with ecc_strength =
2, and it works fine for me, and I am happy that UBI scrubs for a single
bit-flip, why should my system become broken because someone decided
that now UBI should start scrubbing on 2 bit-flips?

We should not change the defaults - if I do not set the threshold via
sysfs of in the driver, it should be 1.

Unless I am completely confused, we should change this, CC -stable if
needed, and ask dwmw2 to merge that.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06 15:46       ` Artem Bityutskiy
@ 2012-06-06 16:08         ` Shmulik Ladkani
  2012-06-06 17:55         ` Ivan Djelic
  1 sibling, 0 replies; 15+ messages in thread
From: Shmulik Ladkani @ 2012-06-06 16:08 UTC (permalink / raw)
  To: artem.bityutskiy; +Cc: Mike Dunn, Sascha Hauer, linux-mtd

Hi,

On Wed, 06 Jun 2012 18:46:15 +0300 Artem Bityutskiy <artem.bityutskiy@linux.intel.com> wrote:
> On Wed, 2012-06-06 at 18:15 +0300, Shmulik Ladkani wrote:
> > - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT
> >   previously set by the driver.
> 
> But this is wrong. If I use the old doc2000 driver, with ecc_strength =
> 2, and it works fine for me, and I am happy that UBI scrubs for a single
> bit-flip, why should my system become broken because someone decided
> that now UBI should start scrubbing on 2 bit-flips?

As I remember, the motivation was to reduce unnecessary scrubbing for
devices exposing high rate of bitflips but having strong ECC to
compensate for.
For these users, the system was "broken" in their perspective, as it
performed too many scrubbing...

Anyway, I understand your point preserving the old behavior.
I thought you were aware of this change.

> We should not change the defaults - if I do not set the threshold via
> sysfs of in the driver, it should be 1.

Fair point. This is safer, backwords compatible.

But eventually, wouldn't we end up with all the drivers assigning
bitflip_threshold to ecc_strength?

> Unless I am completely confused, we should change this, CC -stable if
> needed, and ask dwmw2 to merge that.

No you're not confused.
If we'd like to preserve old behavior, the default assignment needs to
be changed.

BTW this is independent of the 'scan_bbt' bug spotted by Sascha.

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06 15:46       ` Artem Bityutskiy
  2012-06-06 16:08         ` Shmulik Ladkani
@ 2012-06-06 17:55         ` Ivan Djelic
  2012-06-07  7:36           ` Artem Bityutskiy
  2012-06-07  7:43           ` Artem Bityutskiy
  1 sibling, 2 replies; 15+ messages in thread
From: Ivan Djelic @ 2012-06-06 17:55 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: linux-mtd@lists.infradead.org, Sascha Hauer, Mike Dunn,
	Shmulik Ladkani

On Wed, Jun 06, 2012 at 04:46:15PM +0100, Artem Bityutskiy wrote:
> On Wed, 2012-06-06 at 18:15 +0300, Shmulik Ladkani wrote:
> > - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT
> >   previously set by the driver.
> 
> But this is wrong. If I use the old doc2000 driver, with ecc_strength =
> 2, and it works fine for me, and I am happy that UBI scrubs for a single
> bit-flip, why should my system become broken because someone decided
> that now UBI should start scrubbing on 2 bit-flips?
> 
> We should not change the defaults - if I do not set the threshold via
> sysfs of in the driver, it should be 1.
> 
> Unless I am completely confused, we should change this, CC -stable if
> needed, and ask dwmw2 to merge that.
> 

Hi Artem,

The "safest" bitflip_threshold value actually depends on the nand device you are using.
If I want to use today's 4-bit and 8-bit nand devices, a threshold of 1
will make my system unusable (because of constant scrubbing and block torturing).

The rationale behind Mike's patch is that it should be safe to delay block scrubbing
until the nand controller has reached its maximum correction capability; at this point,
any additional bitflip would produce uncorrectable errors, so scrubbing is necessary.
This is the recommended policy in nand datasheets when the device ecc requirement matches
the controller capability. In your doc2000 example, why would it break with a threshold of 2 ?

On the other hand, I agree that we should not force this kind of changes on existing systems.
Let us list use cases:

1. on legacy systems with 1-bit nand and strength = 1, default bitflip_threshold is 1
2. on legacy systems with 1-bit nand and strength > 1, default bitflip_threshold is 'strength'
3. on new systems   with 2+ bit nand and strength > 1, default bitflip_threshold is 'strength'

Case 1 is OK, no change.

Case 2 introduces an unwanted change.

Case 3 (which was not properly handled until Mike's patch) seems OK to me, it fits current datasheet
requirements and the driver still has the possibility to explicitly set bitflip_threshold on a
per-board basis (because it really depends on the nand device).

So rather than reverting to a default bitflip_threshold of 1 for everyone, I suggest we revert the
change on case 2 legacy systems like the doc2000 by explicitly setting bitflip_threshold to 1 in the driver.
What do you think ?

BR,
-- 
Ivan Djelic

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06 17:55         ` Ivan Djelic
@ 2012-06-07  7:36           ` Artem Bityutskiy
  2012-06-07 14:02             ` Shmulik Ladkani
  2012-06-07 17:34             ` Mike Dunn
  2012-06-07  7:43           ` Artem Bityutskiy
  1 sibling, 2 replies; 15+ messages in thread
From: Artem Bityutskiy @ 2012-06-07  7:36 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Mike Dunn, Sascha Hauer, linux-mtd@lists.infradead.org,
	Shmulik Ladkani

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

On Wed, 2012-06-06 at 19:55 +0200, Ivan Djelic wrote:
> 1. on legacy systems with 1-bit nand and strength = 1, default bitflip_threshold is 1
> 2. on legacy systems with 1-bit nand and strength > 1, default bitflip_threshold is 'strength'
> 3. on new systems   with 2+ bit nand and strength > 1, default bitflip_threshold is 'strength'

Ivan, Shmulik,

I've gave this another though, and I think it is OK to leave this as it
is now. Your replies were very helpful, thanks.

Shmulik, would you please send your BBT read fix as a nice patch?
Please, add a not to the commit message that this fixes an issue
introduced in 3.5-rc1 and should be merged before 3.5. Thanks!

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-06 17:55         ` Ivan Djelic
  2012-06-07  7:36           ` Artem Bityutskiy
@ 2012-06-07  7:43           ` Artem Bityutskiy
  1 sibling, 0 replies; 15+ messages in thread
From: Artem Bityutskiy @ 2012-06-07  7:43 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Mike Dunn, Sascha Hauer, linux-mtd@lists.infradead.org,
	Shmulik Ladkani

[-- Attachment #1: Type: text/plain, Size: 419 bytes --]

On Wed, 2012-06-06 at 19:55 +0200, Ivan Djelic wrote:
> In your doc2000 example, why would it break with a threshold of 2 ?

Yes, "break" was a wrong word. It would just change the behavior. But I
guess it would in fact _improve_ the behavior. Let's not try treat the
legacy specially, I think we can indeed treat this change as an
improvement and se if anyone complains.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-07  7:36           ` Artem Bityutskiy
@ 2012-06-07 14:02             ` Shmulik Ladkani
  2012-06-07 17:34             ` Mike Dunn
  1 sibling, 0 replies; 15+ messages in thread
From: Shmulik Ladkani @ 2012-06-07 14:02 UTC (permalink / raw)
  To: artem.bityutskiy
  Cc: Mike Dunn, Ivan Djelic, linux-mtd@lists.infradead.org,
	Sascha Hauer

On Thu, 07 Jun 2012 10:36:10 +0300 Artem Bityutskiy <artem.bityutskiy@linux.intel.com> wrote:
> Shmulik, would you please send your BBT read fix as a nice patch?
> Please, add a not to the commit message that this fixes an issue
> introduced in 3.5-rc1 and should be merged before 3.5. Thanks!

Will do.

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-07  7:36           ` Artem Bityutskiy
  2012-06-07 14:02             ` Shmulik Ladkani
@ 2012-06-07 17:34             ` Mike Dunn
  2012-06-07 21:07               ` Shmulik Ladkani
  2012-06-10  7:08               ` Shmulik Ladkani
  1 sibling, 2 replies; 15+ messages in thread
From: Mike Dunn @ 2012-06-07 17:34 UTC (permalink / raw)
  To: artem.bityutskiy
  Cc: Ivan Djelic, linux-mtd@lists.infradead.org, Sascha Hauer,
	Shmulik Ladkani

On 06/07/2012 12:36 AM, Artem Bityutskiy wrote:
> On Wed, 2012-06-06 at 19:55 +0200, Ivan Djelic wrote:
>> 1. on legacy systems with 1-bit nand and strength = 1, default bitflip_threshold is 1
>> 2. on legacy systems with 1-bit nand and strength > 1, default bitflip_threshold is 'strength'
>> 3. on new systems   with 2+ bit nand and strength > 1, default bitflip_threshold is 'strength'
> 
> Ivan, Shmulik,
> 
> I've gave this another though, and I think it is OK to leave this as it
> is now. Your replies were very helpful, thanks.


Yes, many thanks guys.  It seems I picked the wrong couple days to be away from
email.

This is not an issue on my docg4 because it does not use a flash-based BBT, but
instead scans the whole device for blocks that are marked bad in oob. EUCLEAN is
ignored in this case.  The following code is present in both scan_block_full()
and scan_block_fast():

	/* Ignore ECC errors when checking for BBM */
	if (ret && !mtd_is_bitflip_or_eccerr(ret))
		return ret;

Digging into this, it turns out this is a problem only in the case of:

  (1) nand->td != NULL (flash-based BBT present)
  (2) NAND_BBT_NO_OOB is not set

Here's the call stack for the above case, and with NAND_BBT_ABSPAGE not set
(this is true for the mxc_nand controller).  The problem occurs in
scan_read_raw_oob()...

nand_scan_bbt()
|
+-> search_read_bbts()                ignores return code
    |
    +-> search_bbt()                  always returns 1
        |
        +-> scan_read_raw()           -EUCLEAN propagated up
            |
            +-> scan_read_raw_oob()   returns without updating buf, len, offs
                |
                +-> mtd_read_oob()    -EUCLEAN returned


I addition to the patch suggested by Shmulik, I would also suggest the
following, in the interest of consistency with the bad block scanning code, and
also thoroughness:

diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 30d1319..ed59aa8 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -319,7 +319,7 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t
*buf, loff_t offs,

                res = mtd_read_oob(mtd, offs, &ops);

-               if (res)
+               if (res && !mtd_is_bitflip_or_eccerr(res))
                        return res;

                buf += mtd->oobsize + mtd->writesize;

Shmulik, please let me know if yuo'd like me to submit the patch you suggested,
and I will do so promptly.  Otherwise, thanks again!

More gory details... by comparison, here's the call stack for the same case,
except NAND_BBT_NO_OOB is set.  Here, there's no problem.

nand_scan_bbt()
|
+-> search_read_bbts()                ignores return code
    |
    +-> search_bbt()                  always returns 1
        |
        +-> scan_read_raw()           -EUCLEAN propagated up
            |
            +-> scan_read_raw_data()  -EUCLEAN propagated up
                |
                +-> mtd_read_oob()    -EUCLEAN returned


Thanks, and sorry for the oversight,
Mike

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-07 17:34             ` Mike Dunn
@ 2012-06-07 21:07               ` Shmulik Ladkani
  2012-06-10  7:08               ` Shmulik Ladkani
  1 sibling, 0 replies; 15+ messages in thread
From: Shmulik Ladkani @ 2012-06-07 21:07 UTC (permalink / raw)
  To: Mike Dunn, artem.bityutskiy
  Cc: Ivan Djelic, linux-mtd@lists.infradead.org, Sascha Hauer

Hi Mike,

On Thu, 07 Jun 2012 10:34:42 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> I addition to the patch suggested by Shmulik, I would also suggest the
> following, in the interest of consistency with the bad block scanning code, and
> also thoroughness:
> 
> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 30d1319..ed59aa8 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -319,7 +319,7 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t
> *buf, loff_t offs,
> 
>                 res = mtd_read_oob(mtd, offs, &ops);
> 
> -               if (res)
> +               if (res && !mtd_is_bitflip_or_eccerr(res))
>                         return res;
> 
>                 buf += mtd->oobsize + mtd->writesize;

Saw you submitted it.

I'll review it thoroughly later on, didn't grasp it on first glance.

> Shmulik, please let me know if yuo'd like me to submit the patch you suggested,
> and I will do so promptly.  Otherwise, thanks again!

Thanks Mike.

I'll work on the patch, as I already digged further, and found two
more problematic spots where same "threshold initialization" is needed.
Obviously, your review would be much appreciated.
I'll probably submit during Friday or Sunday.

Artem, is the ETA ok with your merge plans?

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-07 17:34             ` Mike Dunn
  2012-06-07 21:07               ` Shmulik Ladkani
@ 2012-06-10  7:08               ` Shmulik Ladkani
  2012-06-22 20:39                 ` Brian Norris
  1 sibling, 1 reply; 15+ messages in thread
From: Shmulik Ladkani @ 2012-06-10  7:08 UTC (permalink / raw)
  To: Mike Dunn
  Cc: artem.bityutskiy, Ivan Djelic, dwmw2,
	linux-mtd@lists.infradead.org, Sascha Hauer

Hi,

On Thu, 07 Jun 2012 10:34:42 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> I addition to the patch suggested by Shmulik, I would also suggest the
> following, in the interest of consistency with the bad block scanning code, and
> also thoroughness:
> 
> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 30d1319..ed59aa8 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -319,7 +319,7 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t
> *buf, loff_t offs,
> 
>                 res = mtd_read_oob(mtd, offs, &ops);
> 
> -               if (res)
> +               if (res && !mtd_is_bitflip_or_eccerr(res))
>                         return res;
> 
>                 buf += mtd->oobsize + mtd->writesize;

While I'm trying to figure out the issue this patch adresses, I found
out we have an inconsistent API, WRT reporting -EUCLEAN.

The 'mtd_read()' API is currently responsible for returning EUCLEAN,
after examining return value of the 'mtd->_read()' method, quote:

	ret_code = mtd->_read(mtd, from, len, retlen, buf);
	if (unlikely(ret_code < 0))
		return ret_code;
	if (mtd->ecc_strength == 0)
		return 0;	/* device lacks ecc */
	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;

However, 'mtd_read_oob()' does NOT have the 'ret_code >= bitflip_threshold'
comparison, quote:

static inline int mtd_read_oob(struct mtd_info *mtd, loff_t from,
			       struct mtd_oob_ops *ops)
{
	ops->retlen = ops->oobretlen = 0;
	if (!mtd->_read_oob)
		return -EOPNOTSUPP;
	return mtd->_read_oob(mtd, from, ops);
}

Meaning, 'mtd_read_oob()' returns the direct result of the '_read_oob'
call to the MTD user.

Now let's examine what '_read_oob' returns in the nand case, implemented
in 'nand_read_oob'; There are 2 possible uses of 'mtd_read_oob':

1. Supplying NULL 'ops->datbuf', to indicate only the OOB needs to be
   read.
   Execution goes into 'nand_do_read_oob()', which may return EUCLEAN
   (independently of the bitflip_threshold).
   OK. This was discussed in [1].

2. Supplying a non-null 'ops->datbuf', to indicate read the page content
   along with its OOB.
   Execution goes into 'nand_do_read_ops', which returns max_bitflips.
   Not OK.

To summarize, return value of the 'mtd_read_oob()' API is
inconsistent (sometimes max_bitflips, other times EUCLEAN), and does not
adhere to return code policy of 'mtd_read()' - return EUCLEAN to the users,
they're not interested with internals such as max_bitflips.

This might affect users of 'mtd_read_oob()' which might falsely think
the OOB read has failed.

Mike, care to take a look and amend if necessary?
I think this needs to be fixed pre v3.5 as well.

Regards,
Shmulik

[1] http://lists.infradead.org/pipermail/linux-mtd/2012-May/041407.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-10  7:08               ` Shmulik Ladkani
@ 2012-06-22 20:39                 ` Brian Norris
  2012-06-25 17:44                   ` Mike Dunn
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Norris @ 2012-06-22 20:39 UTC (permalink / raw)
  To: Artem Bityutskiy, Mike Dunn
  Cc: linux-mtd@lists.infradead.org, Ivan Djelic, dwmw2, Sascha Hauer

On Sun, Jun 10, 2012 at 12:08 AM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> To summarize, return value of the 'mtd_read_oob()' API is
> inconsistent (sometimes max_bitflips, other times EUCLEAN), and does not
> adhere to return code policy of 'mtd_read()' - return EUCLEAN to the users,
> they're not interested with internals such as max_bitflips.
>
> This might affect users of 'mtd_read_oob()' which might falsely think
> the OOB read has failed.
>
> Mike, care to take a look and amend if necessary?
> I think this needs to be fixed pre v3.5 as well.

Mike, are you planning on handling this?

I think it would be safe to basically C&P the max_bitflips code from
mtd_read() into mtd_read_oob(). This would simply pass through the
EUCLEAN that may be returned for OOB-only case (only for my driver?),
and it would translate the 'ret_code > 0' case properly for data+OOB
reads. I'll resend officially if this looks OK. Or I'll defer to Mike
if he's working on this.

Brian

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 5757307..37c9820 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -821,6 +821,27 @@ int mtd_read(struct mtd_info *mtd, loff_t from,
size_t len, size_t *retlen,
 }
 EXPORT_SYMBOL_GPL(mtd_read);

+int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
+{
+       int ret_code;
+       ops->retlen = ops->oobretlen = 0;
+       if (!mtd->_read_oob)
+               return -EOPNOTSUPP;
+       /*
+        * In cases where ops->datbuf != NULL, mtd->_read_oob() can have
+        * semantics similar to mtd->_read(), regarding max bitflips. In other
+        * cases, mtd->_read_oob() may return -EUCLEAN. In all cases, perform
+        * similar logic to mtd_read() (see above).
+        */
+       ret_code = mtd->_read_oob(mtd, from, ops);
+       if (unlikely(ret_code < 0))
+               return ret_code;
+       if (mtd->ecc_strength == 0)
+               return 0;       /* device lacks ecc */
+       return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
+}
+EXPORT_SYMBOL_GPL(mtd_read_oob);
+
 int mtd_write(struct mtd_info *mtd, loff_t to, size_t len, size_t *retlen,
              const u_char *buf)
 {
diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
index 63dadc0..81d61e7 100644
--- a/include/linux/mtd/mtd.h
+++ b/include/linux/mtd/mtd.h
@@ -265,14 +265,7 @@ int mtd_write(struct mtd_info *mtd, loff_t to,
size_t len, size_t *retlen,
 int mtd_panic_write(struct mtd_info *mtd, loff_t to, size_t len,
size_t *retlen,
                    const u_char *buf);

-static inline int mtd_read_oob(struct mtd_info *mtd, loff_t from,
-                              struct mtd_oob_ops *ops)
-{
-       ops->retlen = ops->oobretlen = 0;
-       if (!mtd->_read_oob)
-               return -EOPNOTSUPP;
-       return mtd->_read_oob(mtd, from, ops);
-}
+int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops);

 static inline int mtd_write_oob(struct mtd_info *mtd, loff_t to,
                                struct mtd_oob_ops *ops)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: flash bbt broken due to unitialized bitflip_threshold?
  2012-06-22 20:39                 ` Brian Norris
@ 2012-06-25 17:44                   ` Mike Dunn
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Dunn @ 2012-06-25 17:44 UTC (permalink / raw)
  To: Brian Norris
  Cc: Artem Bityutskiy, Ivan Djelic, dwmw2, Sascha Hauer,
	linux-mtd@lists.infradead.org

On 06/22/2012 01:39 PM, Brian Norris wrote:
> On Sun, Jun 10, 2012 at 12:08 AM, Shmulik Ladkani
> <shmulik.ladkani@gmail.com> wrote:
>> To summarize, return value of the 'mtd_read_oob()' API is
>> inconsistent (sometimes max_bitflips, other times EUCLEAN), and does not
>> adhere to return code policy of 'mtd_read()' - return EUCLEAN to the users,
>> they're not interested with internals such as max_bitflips.
>>
>> This might affect users of 'mtd_read_oob()' which might falsely think
>> the OOB read has failed.
>>
>> Mike, care to take a look and amend if necessary?
>> I think this needs to be fixed pre v3.5 as well.
> 
> Mike, are you planning on handling this?
> 
> I think it would be safe to basically C&P the max_bitflips code from
> mtd_read() into mtd_read_oob(). This would simply pass through the
> EUCLEAN that may be returned for OOB-only case (only for my driver?),
> and it would translate the 'ret_code > 0' case properly for data+OOB
> reads. I'll resend officially if this looks OK. Or I'll defer to Mike
> if he's working on this.


I'm sorry Brian, I missed Shmulik's post and thought this issue was put to bed
with the patch that initialized bitflip_threshold before the nand_scan_tail()
call.

I will take a look at your patch set that addresses this.  Sorry for dropping
the ball.

Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-06-25 17:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-05 22:06 flash bbt broken due to unitialized bitflip_threshold? Sascha Hauer
2012-06-06  9:50 ` Shmulik Ladkani
2012-06-06 13:30   ` Artem Bityutskiy
2012-06-06 15:15     ` Shmulik Ladkani
2012-06-06 15:46       ` Artem Bityutskiy
2012-06-06 16:08         ` Shmulik Ladkani
2012-06-06 17:55         ` Ivan Djelic
2012-06-07  7:36           ` Artem Bityutskiy
2012-06-07 14:02             ` Shmulik Ladkani
2012-06-07 17:34             ` Mike Dunn
2012-06-07 21:07               ` Shmulik Ladkani
2012-06-10  7:08               ` Shmulik Ladkani
2012-06-22 20:39                 ` Brian Norris
2012-06-25 17:44                   ` Mike Dunn
2012-06-07  7:43           ` Artem Bityutskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox