[RFC/PATCH] mtd: mtd_read: Fix bitflips_threshold comparison to allow max bitflips

* [RFC/PATCH] mtd: mtd_read: Fix bitflips_threshold comparison to allow max bitflips
@ 2013-12-30 12:40 Stefan Roese
  2014-01-08  7:29 ` Stefan Roese
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Roese @ 2013-12-30 12:40 UTC
  To: linux-mtd; +Cc: Brian Norris, Pekon Gupta

On a custom AM335x based platform with a Toshiba NAND device
(TC58NVG1S3H) we are currently seeing quite a few of these UBI messages:

[   18.044967] UBI: fixable bit-flip detected at PEB 50
[   18.050252] UBI: schedule PEB 50 for scrubbing
...

After a bit of debugging I found that those messages are only printed when
the OMAP NAND driver has detected 8 (corrected) bitflips / 512 bytes on
a read. We're using HW BCH8 and the Toshiba chip supports 8 bit ECC for
each 512Byte. I was wondering why 8 bitflips resulted in these UBI
messages and e.g. 7 bitflips didn't. Hence I discovered the comparison
"ret_code >= mtd->bitflip_threshold" in mtd_read().

With this patch applied all tests (UBIFS) I've done so far didn't produce
any of these "UBI: fixable bit-flip" messages any more.

Note that I'm sending this patch as an RFC for now, to get some feedback
from other MTD / NAND developers on this issue. The main question is:
Should mtd_read() return -EUCLEAN if the corrected bitflips are equal to
the bitflip-threshold value? Or should it return 0 since the bitflips
have been corrected?

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/mtdcore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 92311a5..28500a1 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -824,7 +824,7 @@ int mtd_read(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen,
 		return ret_code;
 	if (mtd->ecc_strength == 0)
 		return 0;	/* device lacks ecc */
-	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
+	return ret_code > mtd->bitflip_threshold ? -EUCLEAN : 0;
 }
 EXPORT_SYMBOL_GPL(mtd_read);

-- 
1.8.5.2

* Re: [RFC/PATCH] mtd: mtd_read: Fix bitflips_threshold comparison to allow max bitflips
  2013-12-30 12:40 [RFC/PATCH] mtd: mtd_read: Fix bitflips_threshold comparison to allow max bitflips Stefan Roese
@ 2014-01-08  7:29 ` Stefan Roese
  2014-01-11 19:06   ` Brian Norris
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Roese @ 2014-01-08  7:29 UTC
  To: linux-mtd; +Cc: Brian Norris, Pekon Gupta

On 30.12.2013 13:40, Stefan Roese wrote:
> On a custom AM335x based platform with a Toshiba NAND device
> (TC58NVG1S3H) we are currently seeing quite a few of these UBI messages:
> 
> [   18.044967] UBI: fixable bit-flip detected at PEB 50
> [   18.050252] UBI: schedule PEB 50 for scrubbing
> ...
> 
> After a bit of debugging I found that those messages are only printed when
> the OMAP NAND driver has detected 8 (corrected) bitflips / 512 bytes on
> a read. We're using HW BCH8 and the Toshiba chip supports 8 bit ECC for
> each 512Byte. I was wondering why 8 bitflips resulted in these UBI
> messages and e.g. 7 bitflips didn't. Hence I discovered the comparison
> "ret_code >= mtd->bitflip_threshold" in mtd_read().
> 
> With this patch applied all tests (UBIFS) I've done so far didn't produce
> any of these "UBI: fixable bit-flip" messages any more.
> 
> Note that I'm sending this patch as an RFC for now, to get some feedback
> from other MTD / NAND developers on this issue. The main question is:
> Should mtd_read() return -EUCLEAN if the corrected bitflips are equal to
> the bitflip-threshold value? Or should it return 0 since the bitflips
> have been corrected?

Brian, do you have any comments? Is this patch good as is? Should I
resend it as non-RFC?

Thanks,
Stefan

* Re: [RFC/PATCH] mtd: mtd_read: Fix bitflips_threshold comparison to allow max bitflips
  2014-01-08  7:29 ` Stefan Roese
@ 2014-01-11 19:06   ` Brian Norris
  2014-01-13  8:03     ` Ricard Wanderlof
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Norris @ 2014-01-11 19:06 UTC
  To: Stefan Roese; +Cc: linux-mtd, Pekon Gupta, Artem Bityutskiy

On Wed, Jan 08, 2014 at 08:29:32AM +0100, Stefan Roese wrote:
> On 30.12.2013 13:40, Stefan Roese wrote:
> > On a custom AM335x based platform with a Toshiba NAND device
> > (TC58NVG1S3H) we are currently seeing quite a few of these UBI messages:
> > 
> > [   18.044967] UBI: fixable bit-flip detected at PEB 50
> > [   18.050252] UBI: schedule PEB 50 for scrubbing
> > ...
> > 
> > After a bit of debugging I found that those messages are only printed when
> > the OMAP NAND driver has detected 8 (corrected) bitflips / 512 bytes on
> > a read. We're using HW BCH8 and the Toshiba chip supports 8 bit ECC for
> > each 512Byte. I was wondering why 8 bitflips resulted in these UBI
> > messages and e.g. 7 bitflips didn't. Hence I discovered the comparison
> > "ret_code >= mtd->bitflip_threshold" in mtd_read().
> > 
> > With this patch applied all tests (UBIFS) I've done so far didn't produce
> > any of these "UBI: fixable bit-flip" messages any more.
> > 
> > Note that I'm sending this patch as an RFC for now, to get some feedback
> > from other MTD / NAND developers on this issue. The main question is:
> > Should mtd_read() return -EUCLEAN if the corrected bitflips are equal to
> > the bitflip-threshold value? Or should it return 0 since the bitflips
> > have been corrected?

-EUCLEAN is a purposeful part of the MTD API. It means that mtd_read()
encountered some high level of (correctable) bitflips, and MTD is
intentionally informing the upper layer(s) that they may want to take
corrective action by scrubbing the affected area, so that it doesn't
accumulate more bitflips and corrupt your data. MTD is acting correctly.

At a higher layer, I think UBI should be scrubbing the block (erasing
and moving or rewriting data) at least once, to refresh the data. After
that, wear leveling *should* prevent the block from being reused in the
near future. If that isn't the case, then we may need to fix UBI.
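
As an illustration of the split described above, here is a minimal
user-space sketch of how an upper layer might sort mtd_read() return
values into actions. The enum and the function name are hypothetical and
are not taken from UBI; the only thing borrowed from the kernel is the
meaning of the return codes (0, -EUCLEAN, other negative errors).

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; fallback for hosts that lack it */
#endif

/* Hypothetical actions an upper layer could take after a read. */
enum read_action {
	READ_OK,	/* data is good, nothing to do */
	READ_OK_SCRUB,	/* data is good (corrected), but refresh the block soon */
	READ_FAILED,	/* data cannot be trusted (e.g. -EBADMSG, -EIO) */
};

/* Map an mtd_read()-style return code to an action (illustrative only). */
enum read_action classify_read_result(int ret)
{
	if (ret == 0)
		return READ_OK;
	if (ret == -EUCLEAN)
		return READ_OK_SCRUB;
	return READ_FAILED;
}
```

The point of the middle case is that -EUCLEAN is not a failure: the data
handed back is valid, and the code is only a hint to refresh the block.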

> Brian, do you have any comments? Is this patch good as is? Should I
> resend it as non-RFC?

No, this patch is not acceptable. It effectively ignores all bitflips
until it's too late (i.e., there are more bitflips than we can correct).
Also, if you really did want this effect, you could simply increase your
threshold to 9 via sysfs. But I don't recommend that.

Brian

* Re: [RFC/PATCH] mtd: mtd_read: Fix bitflips_threshold comparison to allow max bitflips
  2014-01-11 19:06   ` Brian Norris
@ 2014-01-13  8:03     ` Ricard Wanderlof
  0 siblings, 0 replies; 4+ messages in thread
From: Ricard Wanderlof @ 2014-01-13  8:03 UTC
  To: Brian Norris
  Cc: Stefan Roese, linux-mtd@lists.infradead.org, Pekon Gupta,
	Artem Bityutskiy


On Sat, 11 Jan 2014, Brian Norris wrote:

>>> After a bit of debugging I found that those messages are only printed when
>>> the OMAP NAND driver has detected 8 (corrected) bitflips / 512 bytes on
>>> a read. We're using HW BCH8 and the Toshiba chip supports 8 bit ECC for
>>> each 512Byte. I was wondering why 8 bitflips resulted in these UBI
>>> messages and e.g. 7 bitflips didn't. Hence I discovered the comparison
>>> "ret_code >= mtd->bitflip_threshold" in mtd_read().
>>>
>>> With this patch applied all tests (UBIFS) I've done so far didn't produce
>>> any of these "UBI: fixable bit-flip" messages any more.
>>>
>>> Note that I'm sending this patch as an RFC for now, to get some feedback
>>> from other MTD / NAND developers on this issue. The main question is:
>>> Should mtd_read() return -EUCLEAN if the corrected bitflips are equal to
>>> the bitflip-threshold value? Or should it return 0 since the bitflips
>>> have been corrected?
>
> -EUCLEAN is a purposeful part of the MTD API. It means that mtd_read()
> encountered some high level of (correctable) bitflips, and MTD is
> intentionally informing the upper layer(s) that they may want to take
> corrective action by scrubbing the affected area, so that it doesn't
> accumulate more bitflips and corrupt your data. MTD is acting correctly.
>
> At a higher layer, I think UBI should be scrubbing the block (erasing
> and moving or rewriting data) at least once, to refresh the data. After
> that, wear leveling *should* prevent the block from being reused in the
> near future. If that isn't the case, then we may need to fix UBI.
>
>> Brian, do you have any comments? Is this patch good as is? Should I
>> resend it as non-RFC?
>
> No, this patch is not acceptable. It effectively ignores all bitflips
> until it's too late (i.e., there are more bitflips than we can correct).
> Also, if you really did want this effect, you could simply increase your
> threshold to 9 via sysfs. But I don't recommend that.

Besides, assuming you are using BCH-8, you can never get more than 8
fixable bitflips; beyond that you'll get an 'uncorrectable' error. So
either applying the patch or setting the bitflip_threshold to 9 means
that no corrective action (e.g. UBI scrubbing the block when it sees
-EUCLEAN) is taken until it's too late.
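
To make that concrete, here is a small user-space model of the final
comparison in mtd_read(). It is only a sketch: model_read_status() and
its "strict" flag are made up for illustration, and the uncorrectable
case is simplified to "more flips than ecc_strength" (a real driver
reports the failure itself rather than a count).

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; fallback for hosts that lack it */
#endif

/*
 * Hypothetical model, not kernel code.
 * max_bitflips:      worst corrected-bitflip count in any ECC step
 * ecc_strength:      bits the ECC can correct per step (8 for BCH-8)
 * bitflip_threshold: count at which the read is reported as -EUCLEAN
 * strict:            false models the current ">=", true the proposed ">"
 */
int model_read_status(unsigned int max_bitflips, unsigned int ecc_strength,
		      unsigned int bitflip_threshold, bool strict)
{
	if (ecc_strength == 0)
		return 0;		/* device lacks ecc */
	if (max_bitflips > ecc_strength)
		return -EBADMSG;	/* simplified: uncorrectable */
	if (strict)
		return max_bitflips > bitflip_threshold ? -EUCLEAN : 0;
	return max_bitflips >= bitflip_threshold ? -EUCLEAN : 0;
}
```

With ecc_strength = 8 and bitflip_threshold = 8, the ">=" variant
returns -EUCLEAN at exactly 8 bitflips, while the ">" variant returns 0
for every correctable count (0..8) and jumps straight to -EBADMSG at 9.
Setting the threshold to 9 with ">=" behaves the same way.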

If anything I'd suggest lowering the bitflip_threshold a few steps under 
what the ECC algorithm can correct, e.g. 6 in the case of BCH-8, the 
rationale being that one could imagine getting 6 bitflips one day and 8 
bitflips another, i.e. the number of bitflips may not necessarily increase 
in steps of one.
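
A rough sketch of how such a margin could be applied from user space
follows. Both helper names are invented for this example, and the sysfs
path in the usage note assumes the NAND partition is mtd0; adjust for
the actual device.

```c
#include <stdio.h>

/*
 * Hypothetical helper: choose a threshold 'margin' steps below the ECC
 * strength, but never below 1 (0 is returned only when there is no ECC).
 */
unsigned int threshold_with_margin(unsigned int ecc_strength,
				   unsigned int margin)
{
	if (ecc_strength == 0)
		return 0;
	return ecc_strength > margin ? ecc_strength - margin : 1;
}

/*
 * Write the threshold as a decimal string to a sysfs-style attribute
 * file. Returns 0 on success, -1 on any open/write/close failure.
 */
int write_bitflip_threshold(const char *attr_path, unsigned int threshold)
{
	FILE *f = fopen(attr_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", threshold) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For BCH-8 with a margin of 2, threshold_with_margin(8, 2) gives 6, which
could then be passed to write_bitflip_threshold() with a path such as
"/sys/class/mtd/mtd0/bitflip_threshold" (root privileges required).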

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30
