public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] MTD: treat any negative return value from correct() as an error
@ 2007-10-17 21:33 mattjreimer
  2007-10-20 10:42 ` Jörn Engel
  0 siblings, 1 reply; 12+ messages in thread
From: mattjreimer @ 2007-10-17 21:33 UTC (permalink / raw)
  To: linux-mtd; +Cc: Matt Reimer

From: Matt Reimer <mreimer@vpop.net>

Treat any negative return value from a NAND driver's correct() function
as a failure, rather than just -1. Some drivers (e.g. diskonchip) can
return error values such as -EIO, which ended up being treated as the
number of bits corrected, which in turn resulted in nand_do_read_ops()
returning -EUCLEAN rather than -EBADMSG.

Signed-off-by: Matt Reimer <mreimer@vpop.net>
---
 drivers/mtd/nand/nand_base.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 24ac677..6d40edf 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -789,7 +789,7 @@ static int nand_read_page_swecc(struct mtd_info *mtd, struct nand_chip *chip,
 		int stat;
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
-		if (stat == -1)
+		if (stat < 0)
 			mtd->ecc_stats.failed++;
 		else
 			mtd->ecc_stats.corrected += stat;
@@ -833,7 +833,7 @@ static int nand_read_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip,
 		int stat;
 
 		stat = chip->ecc.correct(mtd, p, &ecc_code[i], &ecc_calc[i]);
-		if (stat == -1)
+		if (stat < 0)
 			mtd->ecc_stats.failed++;
 		else
 			mtd->ecc_stats.corrected += stat;
@@ -874,7 +874,7 @@ static int nand_read_page_syndrome(struct mtd_info *mtd, struct nand_chip *chip,
 		chip->read_buf(mtd, oob, eccbytes);
 		stat = chip->ecc.correct(mtd, p, oob, NULL);
 
-		if (stat == -1)
+		if (stat < 0)
 			mtd->ecc_stats.failed++;
 		else
 			mtd->ecc_stats.corrected += stat;
-- 
1.5.3.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] MTD: treat any negative return value from correct() as an error
  2007-10-17 21:33 [PATCH] MTD: treat any negative return value from correct() as an error mattjreimer
@ 2007-10-20 10:42 ` Jörn Engel
  2007-10-20 11:11   ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Jörn Engel
  2007-10-20 16:37   ` [PATCH] MTD: treat any negative return value from correct() as an error Matt Reimer
  0 siblings, 2 replies; 12+ messages in thread
From: Jörn Engel @ 2007-10-20 10:42 UTC (permalink / raw)
  To: mattjreimer; +Cc: Matt Reimer, linux-mtd

On Wed, 17 October 2007 14:33:23 -0700, mattjreimer@gmail.com wrote:
> 
> Treat any negative return value from a NAND driver's correct() function
> as a failure, rather than just -1. Some drivers (e.g. diskonchip) can
> return error values such as -EIO, which ended up being treated as the
> number of bits corrected, which in turn resulted in nand_do_read_ops()
> returning -EUCLEAN rather than -EBADMSG.

What an interesting patch.

NACK on patch description.  Several things are wrong with this:
o -EIO shouldn't become -EBADMSG any more than -EUCLEAN.  It should
  remain -EIO and nothing else.
o I cannot see how diskonchip can return -EIO.  Only -ERANGE.
o rtc_from4_correct_data() and doc200x_correct_data() will basically
  pass on -ERANGE from decode_rs8() or decode_rs16(), respectively.
  Afaict decode_rsXX() should just BUG() instead.

However I like the patch itself, because it allows one to rename those
magic -1 return values to self-documenting -EBADMSG.

Jörn

-- 
It is the mark of an educated mind to be able to entertain a thought
without accepting it.
-- Aristotle

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16()
  2007-10-20 10:42 ` Jörn Engel
@ 2007-10-20 11:11   ` Jörn Engel
  2007-10-20 11:13     ` [PATCH] Replace -1 with -EBADMSG in nand error correction code Jörn Engel
                       ` (2 more replies)
  2007-10-20 16:37   ` [PATCH] MTD: treat any negative return value from correct() as an error Matt Reimer
  1 sibling, 3 replies; 12+ messages in thread
From: Jörn Engel @ 2007-10-20 11:11 UTC (permalink / raw)
  To: Jörn Engel
  Cc: David Woodhouse, Thomas Gleixner, Matt Reimer, linux-mtd,
	mattjreimer

Returning -ERANGE should never happen.

--- linux-2.6.22cow/lib/reed_solomon/decode_rs.c~rs_ERANGE	2006-10-13 15:58:51.000000000 +0200
+++ linux-2.6.22cow/lib/reed_solomon/decode_rs.c	2007-10-20 13:07:57.000000000 +0200
@@ -39,8 +39,7 @@
 
 	/* Check length parameter for validity */
 	pad = nn - nroots - len;
-	if (pad < 0 || pad >= nn)
-		return -ERANGE;
+	BUG_ON(pad < 0 || pad >= nn);
 
 	/* Does the caller provide the syndrome ? */
 	if (s != NULL)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] Replace -1 with -EBADMSG in nand error correction code
  2007-10-20 11:11   ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Jörn Engel
@ 2007-10-20 11:13     ` Jörn Engel
  2007-10-21  8:08       ` Thomas Gleixner
  2007-10-21  8:04     ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Thomas Gleixner
  2007-10-21  8:06     ` Thomas Gleixner
  2 siblings, 1 reply; 12+ messages in thread
From: Jörn Engel @ 2007-10-20 11:13 UTC (permalink / raw)
  To: Jörn Engel
  Cc: David Woodhouse, Thomas Gleixner, Matt Reimer, linux-mtd,
	mattjreimer

Magic numerical values are just bad style.  Particularly so when
undocumented.

--- linux-2.6.22cow/lib/reed_solomon/decode_rs.c~nand_minusone	2007-10-20 12:46:52.000000000 +0200
+++ linux-2.6.22cow/lib/reed_solomon/decode_rs.c	2007-10-20 12:51:12.000000000 +0200
@@ -202,7 +202,7 @@
 		 * deg(lambda) unequal to number of roots => uncorrectable
 		 * error detected
 		 */
-		count = -1;
+		count = -EBADMSG;
 		goto finish;
 	}
 	/*
--- linux-2.6.22cow/lib/reed_solomon/reed_solomon.c~rs_ERANGE	2007-05-16 02:02:00.000000000 +0200
+++ linux-2.6.22cow/lib/reed_solomon/reed_solomon.c	2007-10-20 12:48:12.000000000 +0200
@@ -320,6 +320,7 @@ EXPORT_SYMBOL_GPL(encode_rs8);
  *  The syndrome and parity uses a uint16_t data type to enable
  *  symbol size > 8. The calling code must take care of decoding of the
  *  syndrome result and the received parity before calling this code.
+ *  Returns the number of corrected bits or -EBADMSG for uncorrectable errors.
  */
 int decode_rs8(struct rs_control *rs, uint8_t *data, uint16_t *par, int len,
 	       uint16_t *s, int no_eras, int *eras_pos, uint16_t invmsk,
@@ -363,6 +364,7 @@ EXPORT_SYMBOL_GPL(encode_rs16);
  *  @corr:	buffer to store correction bitmask on eras_pos
  *
  *  Each field in the data array contains up to symbol size bits of valid data.
+ *  Returns the number of corrected bits or -EBADMSG for uncorrectable errors.
  */
 int decode_rs16(struct rs_control *rs, uint16_t *data, uint16_t *par, int len,
 		uint16_t *s, int no_eras, int *eras_pos, uint16_t invmsk,
--- linux-2.6.22cow/drivers/mtd/nand/diskonchip.c~nand_minusone	2007-10-20 12:29:54.000000000 +0200
+++ linux-2.6.22cow/drivers/mtd/nand/diskonchip.c	2007-10-20 12:50:14.000000000 +0200
@@ -225,7 +225,7 @@ static int doc_ecc_decode(struct rs_cont
 		}
 	}
 	/* If the parity is wrong, no rescue possible */
-	return parity ? -1 : nerr;
+	return parity ? -EBADMSG : nerr;
 }
 
 static void DoC_Delay(struct doc_priv *doc, unsigned short cycles)
@@ -1039,7 +1039,7 @@ static int doc200x_correct_data(struct m
 		WriteDOC(DOC_ECC_DIS, docptr, Mplus_ECCConf);
 	else
 		WriteDOC(DOC_ECC_DIS, docptr, ECCConf);
-	if (no_ecc_failures && (ret == -1)) {
+	if (no_ecc_failures && (ret == -EBADMSG)) {
 		printk(KERN_ERR "suppressing ECC failure\n");
 		ret = 0;
 	}
--- linux-2.6.22cow/drivers/mtd/nand/nand_ecc.c~nand_minusone	2007-02-12 11:33:26.000000000 +0100
+++ linux-2.6.22cow/drivers/mtd/nand/nand_ecc.c	2007-10-20 12:52:56.000000000 +0200
@@ -189,7 +189,7 @@ int nand_correct_data(struct mtd_info *m
 	if(countbits(s0 | ((uint32_t)s1 << 8) | ((uint32_t)s2 <<16)) == 1)
 		return 1;
 
-	return -1;
+	return -EBADMSG;
 }
 EXPORT_SYMBOL(nand_correct_data);
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] MTD: treat any negative return value from correct() as an error
  2007-10-20 10:42 ` Jörn Engel
  2007-10-20 11:11   ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Jörn Engel
@ 2007-10-20 16:37   ` Matt Reimer
  2007-10-20 17:40     ` Jörn Engel
  1 sibling, 1 reply; 12+ messages in thread
From: Matt Reimer @ 2007-10-20 16:37 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Matt Reimer, linux-mtd

On 10/20/07, Jörn Engel <joern@logfs.org> wrote:
> On Wed, 17 October 2007 14:33:23 -0700, mattjreimer@gmail.com wrote:
> >
> > Treat any negative return value from a NAND driver's correct() function
> > as a failure, rather than just -1. Some drivers (e.g. diskonchip) can
> > return error values such as -EIO, which ended up being treated as the
> > number of bits corrected, which in turn resulted in nand_do_read_ops()
> > returning -EUCLEAN rather than -EBADMSG.
>
> What an interesting patch.
>
> NACK on patch description.  Several things are wrong with this:
> o -EIO shouldn't become -EBADMSG any more than -EUCLEAN.  It should
>   remain -EIO and nothing else.

My mistake: I did most of this analysis in the 2.6.21 code, where
cafe_ecc.c:cafe_correct_ecc() did return -EIO, not diskonchip.

But as for the rest of the description, I'm just reading the code
that's there. An error return from ecc.correct bumps stats.failed,
which in nand_do_read_ops does become -EBADMSG:

        if (mtd->ecc_stats.failed - stats.failed)
                return -EBADMSG;

        return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;

So I was just following what I found there, assuming it was standard practice.

> o I cannot see how diskonchip can return -EIO.  Only -ERANGE.

Again, my bad; I was confusing it with an earlier version of
cafe_ecc.c (which I see now didn't pass -EIO up all the way).

Matt

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] MTD: treat any negative return value from correct() as an error
  2007-10-20 16:37   ` [PATCH] MTD: treat any negative return value from correct() as an error Matt Reimer
@ 2007-10-20 17:40     ` Jörn Engel
  2007-10-20 17:59       ` Matthew Reimer
  0 siblings, 1 reply; 12+ messages in thread
From: Jörn Engel @ 2007-10-20 17:40 UTC (permalink / raw)
  To: Matt Reimer; +Cc: Jörn Engel, Matt Reimer, linux-mtd

On Sat, 20 October 2007 09:37:16 -0700, Matt Reimer wrote:
> 
> My mistake: I did most of this analysis in the 2.6.21 code, where
> cafe_ecc.c:cafe_correct_ecc() did return -EIO, not diskonchip.
> 
> But as for the rest of the description, I'm just reading the code
> that's there. An error return from ecc.correct bumps stats.failed,
> which in nand_do_read_ops does become -EBADMSG:
> 
>         if (mtd->ecc_stats.failed - stats.failed)
>                 return -EBADMSG;
> 
>         return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
> 
> So I was just following what I found there, assuming it was standard practice.

Fair enough.  Description aside, your patch, my two, plus your patch to
return -1 on uncorrectable ecc errors instead of 0 should be turned into
a patch series.  Better yet, change your patch to return -EBADMSG.

Do you volunteer or should I do it?

Jörn

-- 
Data expands to fill the space available for storage.
-- Parkinson's Law

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] MTD: treat any negative return value from correct() as an error
  2007-10-20 17:40     ` Jörn Engel
@ 2007-10-20 17:59       ` Matthew Reimer
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Reimer @ 2007-10-20 17:59 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mtd, Matt Reimer

On Sat, 2007-10-20 at 19:40 +0200, Jörn Engel wrote:
> On Sat, 20 October 2007 09:37:16 -0700, Matt Reimer wrote:
> > 
> > My mistake: I did most of this analysis in the 2.6.21 code, where
> > cafe_ecc.c:cafe_correct_ecc() did return -EIO, not diskonchip.
> > 
> > But as for the rest of the description, I'm just reading the code
> > that's there. An error return from ecc.correct bumps stats.failed,
> > which in nand_do_read_ops does become -EBADMSG:
> > 
> >         if (mtd->ecc_stats.failed - stats.failed)
> >                 return -EBADMSG;
> > 
> >         return  mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
> > 
> > So I was just following what I found there, assuming it was standard practice.
> 
> Fair enough.  Description aside, your patch, my two, plus your patch to
> return -1 on uncorrectable ecc errors instead of 0 should be turned into
> a patch series.  Better yet, change your patch to return -EBADMSG.
> 
> Do you volunteer or should I do it?

I'll do it, if the next couple of days is soon enough. I'd rather keep
you free to work on LogFS. :-)

Matt

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16()
  2007-10-20 11:11   ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Jörn Engel
  2007-10-20 11:13     ` [PATCH] Replace -1 with -EBADMSG in nand error correction code Jörn Engel
@ 2007-10-21  8:04     ` Thomas Gleixner
  2007-10-21  8:06     ` Thomas Gleixner
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2007-10-21  8:04 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, Matt Reimer, linux-mtd, mattjreimer

[-- Attachment #1: Type: TEXT/PLAIN, Size: 578 bytes --]

On Sat, 20 Oct 2007, Jörn Engel wrote:

> Returning -ERANGE should never happen.
> 
> --- linux-2.6.22cow/lib/reed_solomon/decode_rs.c~rs_ERANGE	2006-10-13 15:58:51.000000000 +0200
> +++ linux-2.6.22cow/lib/reed_solomon/decode_rs.c	2007-10-20 13:07:57.000000000 +0200
> @@ -39,8 +39,7 @@
>  
>  	/* Check length parameter for validity */
>  	pad = nn - nroots - len;
> -	if (pad < 0 || pad >= nn)
> -		return -ERANGE;
> +	BUG_ON(pad < 0 || pad >= nn);

No. This is not a critical error. You escalate a situation where the
kernel can still survive to a fatal error.

       tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16()
  2007-10-20 11:11   ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Jörn Engel
  2007-10-20 11:13     ` [PATCH] Replace -1 with -EBADMSG in nand error correction code Jörn Engel
  2007-10-21  8:04     ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Thomas Gleixner
@ 2007-10-21  8:06     ` Thomas Gleixner
  2007-10-21  8:48       ` Jörn Engel
  2 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-10-21  8:06 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, Matt Reimer, linux-mtd, mattjreimer

[-- Attachment #1: Type: TEXT/PLAIN, Size: 613 bytes --]

On Sat, 20 Oct 2007, Jörn Engel wrote:

> Returning -ERANGE should never happen.
> 
> --- linux-2.6.22cow/lib/reed_solomon/decode_rs.c~rs_ERANGE	2006-10-13 15:58:51.000000000 +0200
> +++ linux-2.6.22cow/lib/reed_solomon/decode_rs.c	2007-10-20 13:07:57.000000000 +0200
> @@ -39,8 +39,7 @@
>  
>  	/* Check length parameter for validity */
>  	pad = nn - nroots - len;
> -	if (pad < 0 || pad >= nn)
> -		return -ERANGE;
> +	BUG_ON(pad < 0 || pad >= nn);


No. It should never happen, but escalating an error which the kernel
can survive under a lot of circumstances to a fatal death error is
wrong.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Replace -1 with -EBADMSG in nand error correction code
  2007-10-20 11:13     ` [PATCH] Replace -1 with -EBADMSG in nand error correction code Jörn Engel
@ 2007-10-21  8:08       ` Thomas Gleixner
  2007-10-21  8:42         ` Jörn Engel
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-10-21  8:08 UTC (permalink / raw)
  To: Jörn Engel; +Cc: David Woodhouse, Matt Reimer, linux-mtd, mattjreimer

[-- Attachment #1: Type: TEXT/PLAIN, Size: 215 bytes --]

On Sat, 20 Oct 2007, Jörn Engel wrote:

> Magic numerical values are just bad style.  Particularly so when
> undocumented.

Please resend with a proper Signed-off-by.
Please add my Acked-by as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Replace -1 with -EBADMSG in nand error correction code
  2007-10-21  8:08       ` Thomas Gleixner
@ 2007-10-21  8:42         ` Jörn Engel
  0 siblings, 0 replies; 12+ messages in thread
From: Jörn Engel @ 2007-10-21  8:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Woodhouse, Jörn Engel, Matt Reimer, linux-mtd,
	mattjreimer

On Sun, 21 October 2007 10:08:40 +0200, Thomas Gleixner wrote:
> On Sat, 20 Oct 2007, Jörn Engel wrote:
> 
> > Magic numerical values are just bad style.  Particularly so when
> > undocumented.
> 
> Please resend with a proper Signed-off-by.
> Please add my Acked-by as well.

Already resent privately to dwmw2 (just added a Signed-Off-By: didn't
seem to justify a huge Cc: list).  He merged it.

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16()
  2007-10-21  8:06     ` Thomas Gleixner
@ 2007-10-21  8:48       ` Jörn Engel
  0 siblings, 0 replies; 12+ messages in thread
From: Jörn Engel @ 2007-10-21  8:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Woodhouse, Jörn Engel, Matt Reimer, linux-mtd,
	mattjreimer

On Sun, 21 October 2007 10:06:53 +0200, Thomas Gleixner wrote:
> 
> No. It should never happen, but escalating an error which the kernel
> can survive under a lot of circumstances to a fatal death error is
> wrong.

I am unimpressed.  This error is not some rare hardware error, it is a
programming error, a bug in the kernel.  And until now it was handled by
hiding it as a _corrected_ crc error.  If you want to handle it in a
non-fatal way, you can do it.  But that requires more than just
reverting this patch - you should also properly document why an
"impossible" error is indeed possible and how calling code should deal
with it.  And of course make sure current code follows documentation and
does the right thing.

Is all that worth it?  I have serious doubts.

Jörn

-- 
Never argue with idiots - first they drag you down to their level,
then they beat you with experience.
-- unknown

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-10-21 11:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-17 21:33 [PATCH] MTD: treat any negative return value from correct() as an error mattjreimer
2007-10-20 10:42 ` Jörn Engel
2007-10-20 11:11   ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Jörn Engel
2007-10-20 11:13     ` [PATCH] Replace -1 with -EBADMSG in nand error correction code Jörn Engel
2007-10-21  8:08       ` Thomas Gleixner
2007-10-21  8:42         ` Jörn Engel
2007-10-21  8:04     ` [PATCH] BUG() when passing illegal parameters to decode_rs8() or decode_rs16() Thomas Gleixner
2007-10-21  8:06     ` Thomas Gleixner
2007-10-21  8:48       ` Jörn Engel
2007-10-20 16:37   ` [PATCH] MTD: treat any negative return value from correct() as an error Matt Reimer
2007-10-20 17:40     ` Jörn Engel
2007-10-20 17:59       ` Matthew Reimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox