* single bit flip detector.
@ 2006-08-01 18:44 Dave Jones
2006-08-01 20:15 ` Bill Davidsen
2006-08-01 22:14 ` Alan Cox
0 siblings, 2 replies; 21+ messages in thread
From: Dave Jones @ 2006-08-01 18:44 UTC (permalink / raw)
To: Linux Kernel
In case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug. In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.
This patch adds an extra line to the slab debug messages
in those cases, in the hope that users will try memtest before
they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Run memtest86.
Signed-off-by: Dave Jones <davej@redhat.com>
--- linux-2.6.16.noarch/mm/slab.c~ 2006-03-22 18:29:27.000000000 -0500
+++ linux-2.6.16.noarch/mm/slab.c 2006-03-22 18:30:58.000000000 -0500
@@ -1516,10 +1516,33 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
+ unsigned char total=0, bad_count=0;
printk(KERN_ERR "%03x:", offset);
- for (i = 0; i < limit; i++)
+ for (i = 0; i < limit; i++) {
+ if (data[offset+i] != POISON_FREE) {
+ total += data[offset+i];
+ ++bad_count;
+ }
printk(" %02x", (unsigned char)data[offset + i]);
+ }
printk("\n");
+ if (bad_count == 1) {
+ switch (total) {
+ case POISON_FREE ^ 0x01:
+ case POISON_FREE ^ 0x02:
+ case POISON_FREE ^ 0x04:
+ case POISON_FREE ^ 0x08:
+ case POISON_FREE ^ 0x10:
+ case POISON_FREE ^ 0x20:
+ case POISON_FREE ^ 0x40:
+ case POISON_FREE ^ 0x80:
+ printk (KERN_ERR "Single bit error detected. Possibly bad RAM.\n");
+#ifdef CONFIG_X86
+ printk (KERN_ERR "Run memtest86 or other memory test tool.\n");
+#endif
+ return;
+ }
+ }
}
#endif
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 18:44 single bit flip detector Dave Jones
@ 2006-08-01 20:15 ` Bill Davidsen
2006-08-01 22:14 ` Alan Cox
1 sibling, 0 replies; 21+ messages in thread
From: Bill Davidsen @ 2006-08-01 20:15 UTC (permalink / raw)
To: Dave Jones, Linux Kernel
eventuallyDave Jones wrote:
> In case where we detect a single bit has been flipped, we spew
> the usual slab corruption message, which users instantly think
> is a kernel bug. In a lot of cases, single bit errors are
> down to bad memory, or other hardware failure.
>
> This patch adds an extra line to the slab debug messages
> in those cases, in the hope that users will try memtest before
> they report a bug.
>
> 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Possibly bad RAM. Run memtest86.
Given the probability of hardware vs. kernel, you could replace
"possible" with "probable" and not get any argument from me.
--
Bill Davidsen <davidsen@tmr.com>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one errors occurs during
wildcard (glob) expansion.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 18:44 single bit flip detector Dave Jones
2006-08-01 20:15 ` Bill Davidsen
@ 2006-08-01 22:14 ` Alan Cox
2006-08-01 22:30 ` Dave Jones
1 sibling, 1 reply; 21+ messages in thread
From: Alan Cox @ 2006-08-01 22:14 UTC (permalink / raw)
To: Dave Jones; +Cc: Linux Kernel
Ar Maw, 2006-08-01 am 14:44 -0400, ysgrifennodd Dave Jones:
> + case POISON_FREE ^ 0x01:
> + case POISON_FREE ^ 0x02:
> + case POISON_FREE ^ 0x04:
> + case POISON_FREE ^ 0x08:
> + case POISON_FREE ^ 0x10:
> + case POISON_FREE ^ 0x20:
> + case POISON_FREE ^ 0x40:
> + case POISON_FREE ^ 0x80:
> + printk (KERN_ERR "Single bit error detected. Possibly bad RAM.\n");
> +#ifdef CONFIG_X86
> + printk (KERN_ERR "Run memtest86 or other memory test tool.\n");
> +#endif
> + return;
Gack .. NAK
#1: Do we want memtest86 or memtest86+ ?
#2: The check is horrible and there is an elegant implementation for
single bit.
errors = value ^ expected;
if (errors && !(errors & (errors - 1)))
printk(KERN_ERR "Single bit error detected....");
Alan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 22:14 ` Alan Cox
@ 2006-08-01 22:30 ` Dave Jones
2006-08-01 22:36 ` Dave Jones
0 siblings, 1 reply; 21+ messages in thread
From: Dave Jones @ 2006-08-01 22:30 UTC (permalink / raw)
To: Alan Cox; +Cc: Linux Kernel
On Tue, Aug 01, 2006 at 11:14:27PM +0100, Alan Cox wrote:
> Ar Maw, 2006-08-01 am 14:44 -0400, ysgrifennodd Dave Jones:
> > + case POISON_FREE ^ 0x01:
> > + case POISON_FREE ^ 0x02:
> > + case POISON_FREE ^ 0x04:
> > + case POISON_FREE ^ 0x08:
> > + case POISON_FREE ^ 0x10:
> > + case POISON_FREE ^ 0x20:
> > + case POISON_FREE ^ 0x40:
> > + case POISON_FREE ^ 0x80:
> > + printk (KERN_ERR "Single bit error detected. Possibly bad RAM.\n");
> > +#ifdef CONFIG_X86
> > + printk (KERN_ERR "Run memtest86 or other memory test tool.\n");
> > +#endif
> > + return;
>
> Gack .. NAK
>
> #1: Do we want memtest86 or memtest86+ ?
I doubt it really matters.
> #2: The check is horrible and there is an elegant implementation for
> single bit.
>
> errors = value ^ expected;
> if (errors && !(errors & (errors - 1)))
> printk(KERN_ERR "Single bit error detected....");
Good call, I'll hack that up.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 22:30 ` Dave Jones
@ 2006-08-01 22:36 ` Dave Jones
2006-08-01 23:00 ` Alexey Dobriyan
0 siblings, 1 reply; 21+ messages in thread
From: Dave Jones @ 2006-08-01 22:36 UTC (permalink / raw)
To: Alan Cox, Linux Kernel
On Tue, Aug 01, 2006 at 06:30:11PM -0400, Dave Jones wrote:
> On Tue, Aug 01, 2006 at 11:14:27PM +0100, Alan Cox wrote:
> > Ar Maw, 2006-08-01 am 14:44 -0400, ysgrifennodd Dave Jones:
> > > + case POISON_FREE ^ 0x01:
> > > + case POISON_FREE ^ 0x02:
> > > + case POISON_FREE ^ 0x04:
> > > + case POISON_FREE ^ 0x08:
> > > + case POISON_FREE ^ 0x10:
> > > + case POISON_FREE ^ 0x20:
> > > + case POISON_FREE ^ 0x40:
> > > + case POISON_FREE ^ 0x80:
> > > + printk (KERN_ERR "Single bit error detected. Possibly bad RAM.\n");
> > > +#ifdef CONFIG_X86
> > > + printk (KERN_ERR "Run memtest86 or other memory test tool.\n");
> > > +#endif
> > > + return;
> >
> > Gack .. NAK
> >
> > #1: Do we want memtest86 or memtest86+ ?
>
> I doubt it really matters.
>
> > #2: The check is horrible and there is an elegant implementation for
> > single bit.
> >
> > errors = value ^ expected;
> > if (errors && !(errors & (errors - 1)))
> > printk(KERN_ERR "Single bit error detected....");
>
> Good call, I'll hack that up.
Take #2.
In case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug. In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.
This patch adds an extra line to the slab debug messages
in those cases, in the hope that users will try memtest before
they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Run memtest86.
Signed-off-by: Dave Jones <davej@redhat.com>
diff --git a/mm/slab.c b/mm/slab.c
index 21ba060..39f1183 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1638,10 +1638,29 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
+ unsigned char total=0, bad_count=0;
printk(KERN_ERR "%03x:", offset);
- for (i = 0; i < limit; i++)
+ for (i = 0; i < limit; i++) {
+ if (data[offset+i] != POISON_FREE) {
+ total += data[offset+i];
+ ++bad_count;
+ }
printk(" %02x", (unsigned char)data[offset + i]);
+ }
printk("\n");
+
+ if (bad_count == 1) {
+ errors = total ^ POISON_FREE;
+ if ((errors && !(errors & (errors-1))) {
+ printk (KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+#ifdef CONFIG_X86
+ printk (KERN_ERR "Run memtest86+ or similar memory test tool.\n");
+#else
+ printk (KERN_ERR "Run a memory test tool.\n");
+#endif
+ return;
+ }
+ }
}
#endif
--
http://www.codemonkey.org.uk
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 22:36 ` Dave Jones
@ 2006-08-01 23:00 ` Alexey Dobriyan
2006-08-01 23:16 ` Dave Jones
2006-08-02 7:08 ` Jan Engelhardt
0 siblings, 2 replies; 21+ messages in thread
From: Alexey Dobriyan @ 2006-08-01 23:00 UTC (permalink / raw)
To: Dave Jones, Alan Cox, Linux Kernel
On Tue, Aug 01, 2006 at 06:36:22PM -0400, Dave Jones wrote:
> 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1638,10 +1638,29 @@ static void poison_obj(struct kmem_cache
> static void dump_line(char *data, int offset, int limit)
> {
> int i;
> + unsigned char total=0, bad_count=0;
" = 0"
> printk(KERN_ERR "%03x:", offset);
> - for (i = 0; i < limit; i++)
> + for (i = 0; i < limit; i++) {
> + if (data[offset+i] != POISON_FREE) {
" + i"
> + total += data[offset+i];
ditto
> + ++bad_count;
How about post increment?
> + }
> printk(" %02x", (unsigned char)data[offset + i]);
> + }
> printk("\n");
> +
> + if (bad_count == 1) {
> + errors = total ^ POISON_FREE;
undeclared "errors"
> + if ((errors && !(errors & (errors-1))) {
^^^
Turn on CONFIG_DEBUG_SLAB before compiling. ;-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 23:00 ` Alexey Dobriyan
@ 2006-08-01 23:16 ` Dave Jones
2006-08-01 23:27 ` Dave Jones
2006-08-01 23:28 ` Andreas Schwab
2006-08-02 7:08 ` Jan Engelhardt
1 sibling, 2 replies; 21+ messages in thread
From: Dave Jones @ 2006-08-01 23:16 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Alan Cox, Linux Kernel
On Wed, Aug 02, 2006 at 03:00:03AM +0400, Alexey Dobriyan wrote:
> Turn on CONFIG_DEBUG_SLAB before compiling. ;-)
Well, that was silly. Here's a properly compile tested patch :-)
Dave
In case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug. In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.
This patch adds an extra line to the slab debug messages
in those cases, in the hope that users will try memtest before
they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Run memtest86.
Signed-off-by: Dave Jones <davej@redhat.com>
diff --git a/mm/slab.c b/mm/slab.c
index 21ba060..39f1183 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1638,10 +1638,29 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
+ unsigned char total = 0, bad_count = 0, errors = 0;
printk(KERN_ERR "%03x:", offset);
- for (i = 0; i < limit; i++)
+ for (i = 0; i < limit; i++) {
+ if (data[offset + i] != POISON_FREE) {
+ total += data[offset + i];
+ bad_count++;
+ }
printk(" %02x", (unsigned char)data[offset + i]);
+ }
printk("\n");
+
+ if (bad_count == 1) {
+ errors = total ^ POISON_FREE;
+ if (errors && !(errors & (errors-1))) {
+ printk (KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+#ifdef CONFIG_X86
+ printk (KERN_ERR "Run memtest86+ or similar memory test tool.\n");
+#else
+ printk (KERN_ERR "Run a memory test tool.\n");
+#endif
+ return;
+ }
+ }
}
#endif
--
http://www.codemonkey.org.uk
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 23:16 ` Dave Jones
@ 2006-08-01 23:27 ` Dave Jones
2006-08-01 23:28 ` Andreas Schwab
1 sibling, 0 replies; 21+ messages in thread
From: Dave Jones @ 2006-08-01 23:27 UTC (permalink / raw)
To: Alexey Dobriyan, Alan Cox, Linux Kernel
On Tue, Aug 01, 2006 at 07:16:03PM -0400, Dave Jones wrote:
> On Wed, Aug 02, 2006 at 03:00:03AM +0400, Alexey Dobriyan wrote:
>
> > Turn on CONFIG_DEBUG_SLAB before compiling. ;-)
>
> Well, that was silly. Here's a properly compile tested patch :-)
The grammar police found me, here's hopefully the final rendition..
Dave
In case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug. In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.
This patch adds an extra line to the slab debug messages
in those cases, in the hope that users will try memtest before
they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Signed-off-by: Dave Jones <davej@redhat.com>
diff --git a/mm/slab.c b/mm/slab.c
index 21ba060..39f1183 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1638,10 +1638,29 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
+ unsigned char total = 0, bad_count = 0, errors = 0;
printk(KERN_ERR "%03x:", offset);
- for (i = 0; i < limit; i++)
+ for (i = 0; i < limit; i++) {
+ if (data[offset + i] != POISON_FREE) {
+ total += data[offset + i];
+ bad_count++;
+ }
printk(" %02x", (unsigned char)data[offset + i]);
+ }
printk("\n");
+
+ if (bad_count == 1) {
+ errors = total ^ POISON_FREE;
+ if (errors && !(errors & (errors-1))) {
+ printk (KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+#ifdef CONFIG_X86
+ printk (KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
+#else
+ printk (KERN_ERR "Run a memory test tool.\n");
+#endif
+ return;
+ }
+ }
}
#endif
--
http://www.codemonkey.org.uk
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 23:16 ` Dave Jones
2006-08-01 23:27 ` Dave Jones
@ 2006-08-01 23:28 ` Andreas Schwab
2006-08-01 23:51 ` Dave Jones
1 sibling, 1 reply; 21+ messages in thread
From: Andreas Schwab @ 2006-08-01 23:28 UTC (permalink / raw)
To: Dave Jones; +Cc: Alexey Dobriyan, Alan Cox, Linux Kernel
Dave Jones <davej@redhat.com> writes:
> diff --git a/mm/slab.c b/mm/slab.c
> index 21ba060..39f1183 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1638,10 +1638,29 @@ static void poison_obj(struct kmem_cache
> static void dump_line(char *data, int offset, int limit)
> {
> int i;
> + unsigned char total = 0, bad_count = 0, errors = 0;
No need to initialize errors here.
> printk(KERN_ERR "%03x:", offset);
> - for (i = 0; i < limit; i++)
> + for (i = 0; i < limit; i++) {
> + if (data[offset + i] != POISON_FREE) {
> + total += data[offset + i];
> + bad_count++;
> + }
> printk(" %02x", (unsigned char)data[offset + i]);
> + }
> printk("\n");
> +
> + if (bad_count == 1) {
> + errors = total ^ POISON_FREE;
> + if (errors && !(errors & (errors-1))) {
> + printk (KERN_ERR "Single bit error detected. Probably bad RAM.\n");
> +#ifdef CONFIG_X86
> + printk (KERN_ERR "Run memtest86+ or similar memory test tool.\n");
> +#else
> + printk (KERN_ERR "Run a memory test tool.\n");
> +#endif
> + return;
Useless return.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 23:28 ` Andreas Schwab
@ 2006-08-01 23:51 ` Dave Jones
2006-08-02 0:16 ` Dave Jones
0 siblings, 1 reply; 21+ messages in thread
From: Dave Jones @ 2006-08-01 23:51 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Alexey Dobriyan, Alan Cox, Linux Kernel
On Wed, Aug 02, 2006 at 01:28:49AM +0200, Andreas Schwab wrote:
> Dave Jones <davej@redhat.com> writes:
>
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 21ba060..39f1183 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -1638,10 +1638,29 @@ static void poison_obj(struct kmem_cache
> > static void dump_line(char *data, int offset, int limit)
> > {
> > int i;
> > + unsigned char total = 0, bad_count = 0, errors = 0;
>
> No need to initialize errors here.
I'm going for the record of 'most times a patch gets submitted in one day'.
And to think we were complaining that patches don't get enough review ? :)
If every change had this much polish, we'd be awesome.
Dave
In case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug. In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.
This patch adds an extra line to the slab debug messages
in those cases, in the hope that users will try memtest before
they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Run memtest86.
Signed-off-by: Dave Jones <davej@redhat.com>
diff --git a/mm/slab.c b/mm/slab.c
index 21ba060..39f1183 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1638,10 +1638,28 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
+ unsigned char total = 0, bad_count = 0, errors;
printk(KERN_ERR "%03x:", offset);
- for (i = 0; i < limit; i++)
+ for (i = 0; i < limit; i++) {
+ if (data[offset + i] != POISON_FREE) {
+ total += data[offset + i];
+ bad_count++;
+ }
printk(" %02x", (unsigned char)data[offset + i]);
+ }
printk("\n");
+
+ if (bad_count == 1) {
+ errors = total ^ POISON_FREE;
+ if (errors && !(errors & (errors-1))) {
+ printk (KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+#ifdef CONFIG_X86
+ printk (KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
+#else
+ printk (KERN_ERR "Run a memory test tool.\n");
+#endif
+ }
+ }
}
#endif
--
http://www.codemonkey.org.uk
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 23:51 ` Dave Jones
@ 2006-08-02 0:16 ` Dave Jones
2006-08-02 6:20 ` Rolf Eike Beer
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Dave Jones @ 2006-08-02 0:16 UTC (permalink / raw)
To: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
On Tue, Aug 01, 2006 at 07:51:09PM -0400, Dave Jones wrote:
> I'm going for the record of 'most times a patch gets submitted in one day'.
> And to think we were complaining that patches don't get enough review ? :)
> If every change had this much polish, we'd be awesome.
Sigh. Spaces before printk. Whatever next.
I am now officially bored of seeing this patch.
Dave
In case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug. In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.
This patch adds an extra line to the slab debug messages
in those cases, in the hope that users will try memtest before
they report a bug.
000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Run memtest86.
Signed-off-by: Dave Jones <davej@redhat.com>
diff --git a/mm/slab.c b/mm/slab.c
index 21ba060..39f1183 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1638,10 +1638,28 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
+ unsigned char total = 0, bad_count = 0, errors;
printk(KERN_ERR "%03x:", offset);
- for (i = 0; i < limit; i++)
+ for (i = 0; i < limit; i++) {
+ if (data[offset + i] != POISON_FREE) {
+ total += data[offset + i];
+ bad_count++;
+ }
printk(" %02x", (unsigned char)data[offset + i]);
+ }
printk("\n");
+
+ if (bad_count == 1) {
+ errors = total ^ POISON_FREE;
+ if (errors && !(errors & (errors-1))) {
+ printk(KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+#ifdef CONFIG_X86
+ printk(KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
+#else
+ printk(KERN_ERR "Run a memory test tool.\n");
+#endif
+ }
+ }
}
#endif
--
http://www.codemonkey.org.uk
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-02 0:16 ` Dave Jones
@ 2006-08-02 6:20 ` Rolf Eike Beer
2006-08-02 7:08 ` Jan Engelhardt
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Rolf Eike Beer @ 2006-08-02 6:20 UTC (permalink / raw)
To: Dave Jones; +Cc: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
[-- Attachment #1: Type: text/plain, Size: 920 bytes --]
Dave Jones wrote:
> On Tue, Aug 01, 2006 at 07:51:09PM -0400, Dave Jones wrote:
> > I'm going for the record of 'most times a patch gets submitted in one
> > day'. And to think we were complaining that patches don't get enough
> > review ? :) If every change had this much polish, we'd be awesome.
>
> Sigh. Spaces before printk. Whatever next.
> I am now officially bored of seeing this patch.
> diff --git a/mm/slab.c b/mm/slab.c
> index 21ba060..39f1183 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1638,10 +1638,28 @@ static void poison_obj(struct kmem_cache
> static void dump_line(char *data, int offset, int limit)
> {
> int i;
> + unsigned char total = 0, bad_count = 0, errors;
> printk(KERN_ERR "%03x:", offset);
> - for (i = 0; i < limit; i++)
> + for (i = 0; i < limit; i++) {
You might want to add a newline before the actual code to make it more
readable :)
Eike
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-01 23:00 ` Alexey Dobriyan
2006-08-01 23:16 ` Dave Jones
@ 2006-08-02 7:08 ` Jan Engelhardt
1 sibling, 0 replies; 21+ messages in thread
From: Jan Engelhardt @ 2006-08-02 7:08 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: Dave Jones, Alan Cox, Linux Kernel
>
>> + ++bad_count;
>
>How about post increment?
>
There would no difference, in effect.
Same goes for the return; mentioned elsewhere in the thread.
Jan Engelhardt
--
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-02 0:16 ` Dave Jones
2006-08-02 6:20 ` Rolf Eike Beer
@ 2006-08-02 7:08 ` Jan Engelhardt
2006-08-06 11:05 ` Geert Uytterhoeven
2006-08-02 15:24 ` Patrick McLean
2006-08-04 21:19 ` Andrew Morton
3 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2006-08-02 7:08 UTC (permalink / raw)
To: Dave Jones; +Cc: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
> printk(" %02x", (unsigned char)data[offset + i]);
Remove cast. (Or does it spew a warning message for you?)
Jan Engelhardt
--
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-02 0:16 ` Dave Jones
2006-08-02 6:20 ` Rolf Eike Beer
2006-08-02 7:08 ` Jan Engelhardt
@ 2006-08-02 15:24 ` Patrick McLean
2006-08-02 16:12 ` Randy.Dunlap
2006-08-04 21:19 ` Andrew Morton
3 siblings, 1 reply; 21+ messages in thread
From: Patrick McLean @ 2006-08-02 15:24 UTC (permalink / raw)
To: Dave Jones, Andreas Schwab, Alexey Dobriyan, Alan Cox,
Linux Kernel
Dave Jones wrote:
> On Tue, Aug 01, 2006 at 07:51:09PM -0400, Dave Jones wrote:
> > I'm going for the record of 'most times a patch gets submitted in one day'.
> > And to think we were complaining that patches don't get enough review ? :)
> > If every change had this much polish, we'd be awesome.
>
> Sigh. Spaces before printk. Whatever next.
> I am now officially bored of seeing this patch.
>
> Dave
>
>
> In case where we detect a single bit has been flipped, we spew
> the usual slab corruption message, which users instantly think
> is a kernel bug. In a lot of cases, single bit errors are
> down to bad memory, or other hardware failure.
>
> This patch adds an extra line to the slab debug messages
> in those cases, in the hope that users will try memtest before
> they report a bug.
>
> 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Possibly bad RAM. Run memtest86.
>
> Signed-off-by: Dave Jones <davej@redhat.com>
>
> + if (errors && !(errors & (errors-1))) {
> + printk(KERN_ERR "Single bit error detected. Probably bad RAM.\n");
> +#ifdef CONFIG_X86
#if defined(CONFIG_X86) || defined(CONFIG_X86_64)
memtest86+ runs fine on x86_64 machines as well.
> + printk(KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
> +#else
> + printk(KERN_ERR "Run a memory test tool.\n");
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-02 15:24 ` Patrick McLean
@ 2006-08-02 16:12 ` Randy.Dunlap
0 siblings, 0 replies; 21+ messages in thread
From: Randy.Dunlap @ 2006-08-02 16:12 UTC (permalink / raw)
To: Patrick McLean
Cc: Dave Jones, Andreas Schwab, Alexey Dobriyan, Alan Cox,
Linux Kernel
On Wed, 02 Aug 2006 11:24:13 -0400 Patrick McLean wrote:
> Dave Jones wrote:
> > On Tue, Aug 01, 2006 at 07:51:09PM -0400, Dave Jones wrote:
> > > I'm going for the record of 'most times a patch gets submitted in one day'.
> > > And to think we were complaining that patches don't get enough review ? :)
> > > If every change had this much polish, we'd be awesome.
> >
> > Sigh. Spaces before printk. Whatever next.
> > I am now officially bored of seeing this patch.
> >
> > Dave
> >
> >
> > In case where we detect a single bit has been flipped, we spew
> > the usual slab corruption message, which users instantly think
> > is a kernel bug. In a lot of cases, single bit errors are
> > down to bad memory, or other hardware failure.
> >
> > This patch adds an extra line to the slab debug messages
> > in those cases, in the hope that users will try memtest before
> > they report a bug.
> >
> > 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > Single bit error detected. Possibly bad RAM. Run memtest86.
> >
> > Signed-off-by: Dave Jones <davej@redhat.com>
> >
> > + if (errors && !(errors & (errors-1))) {
> > + printk(KERN_ERR "Single bit error detected. Probably bad RAM.\n");
> > +#ifdef CONFIG_X86
>
> #if defined(CONFIG_X86) || defined(CONFIG_X86_64)
CONFIG_X86 is set on both x86-32 and x86-64.
> memtest86+ runs fine on x86_64 machines as well.
>
> > + printk(KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
> > +#else
> > + printk(KERN_ERR "Run a memory test tool.\n");
> -
---
~Randy
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-02 0:16 ` Dave Jones
` (2 preceding siblings ...)
2006-08-02 15:24 ` Patrick McLean
@ 2006-08-04 21:19 ` Andrew Morton
2006-08-04 22:06 ` Dave Jones
3 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2006-08-04 21:19 UTC (permalink / raw)
To: Dave Jones; +Cc: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
On Tue, 1 Aug 2006 20:16:26 -0400
Dave Jones <davej@redhat.com> wrote:
> In case where we detect a single bit has been flipped, we spew
> the usual slab corruption message, which users instantly think
> is a kernel bug. In a lot of cases, single bit errors are
> down to bad memory, or other hardware failure.
>
> This patch adds an extra line to the slab debug messages
> in those cases, in the hope that users will try memtest before
> they report a bug.
Well boy, this has to be the most-reviewed patch ever. You'd think that
I'd apply it with great confidence and warm fuzzies.
However...
From: Andrew Morton <akpm@osdl.org>
- one decl per line is more patching-friendly and a bit more idiomatic.
- make `bad_count' an int: a uchar might overflow
- Put a blank line between decls and code
- rename `total' to `error', remove `errors'.
- there's no need to sum up the errors.
- don't need to check for non-zero `errors': we know it is != POISON_FREE.
- make it look non-crapful in an 80-col window.
- add missing spaces in arithmetic
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
diff -puN mm/slab.c~single-bit-flip-detector-tidy mm/slab.c
--- a/mm/slab.c~single-bit-flip-detector-tidy
+++ a/mm/slab.c
@@ -1637,11 +1637,13 @@ static void poison_obj(struct kmem_cache
static void dump_line(char *data, int offset, int limit)
{
int i;
- unsigned char total = 0, bad_count = 0, errors;
+ unsigned char error = 0;
+ int bad_count = 0;
+
printk(KERN_ERR "%03x:", offset);
for (i = 0; i < limit; i++) {
if (data[offset + i] != POISON_FREE) {
- total += data[offset + i];
+ error = data[offset + i];
bad_count++;
}
printk(" %02x", (unsigned char)data[offset + i]);
@@ -1649,11 +1651,13 @@ static void dump_line(char *data, int of
printk("\n");
if (bad_count == 1) {
- errors = total ^ POISON_FREE;
- if (errors && !(errors & (errors-1))) {
- printk(KERN_ERR "Single bit error detected. Probably bad RAM.\n");
+ error ^= POISON_FREE;
+ if (!(error & (error - 1))) {
+ printk(KERN_ERR "Single bit error detected. Probably "
+ "bad RAM.\n");
#ifdef CONFIG_X86
- printk(KERN_ERR "Run memtest86+ or a similar memory test tool.\n");
+ printk(KERN_ERR "Run memtest86+ or a similar memory "
+ "test tool.\n");
#else
printk(KERN_ERR "Run a memory test tool.\n");
#endif
_
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-04 21:19 ` Andrew Morton
@ 2006-08-04 22:06 ` Dave Jones
2006-08-04 22:25 ` Andrew Morton
0 siblings, 1 reply; 21+ messages in thread
From: Dave Jones @ 2006-08-04 22:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
On Fri, Aug 04, 2006 at 02:19:55PM -0700, Andrew Morton wrote:
> On Tue, 1 Aug 2006 20:16:26 -0400
> Dave Jones <davej@redhat.com> wrote:
>
> > In case where we detect a single bit has been flipped, we spew
> > the usual slab corruption message, which users instantly think
> > is a kernel bug. In a lot of cases, single bit errors are
> > down to bad memory, or other hardware failure.
> >
> > This patch adds an extra line to the slab debug messages
> > in those cases, in the hope that users will try memtest before
> > they report a bug.
>
> Well boy, this has to be the most-reviewed patch ever. You'd think that
> I'd apply it with great confidence and warm fuzzies.
I should stick to one-liner fixes.
> - one decl per line is more patching-friendly and a bit more idiomatic.
> - make `bad_count' an int: a uchar might overflow
> - Put a blank line between decls and code
> - rename `total' to `error', remove `errors'.
> - there's no need to sum up the errors.
> - don't need to check for non-zero `errors': we know it is != POISON_FREE.
> - make it look non-crapful in an 80-col window.
> - add missing spaces in arithmetic
With this much iteration, I do have one question..
Does it still work ? :-)
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-04 22:06 ` Dave Jones
@ 2006-08-04 22:25 ` Andrew Morton
2006-08-04 22:28 ` Dave Jones
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2006-08-04 22:25 UTC (permalink / raw)
To: Dave Jones; +Cc: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
On Fri, 4 Aug 2006 18:06:44 -0400
Dave Jones <davej@redhat.com> wrote:
> Does it still work ? :-)
Awww man. And I thought everyone _else_ was being fussy.
I expect it'll work. We'll find out when someone hits a single-bit error.
The worst it can do is crash the machine, except it's already crashed...
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-04 22:25 ` Andrew Morton
@ 2006-08-04 22:28 ` Dave Jones
0 siblings, 0 replies; 21+ messages in thread
From: Dave Jones @ 2006-08-04 22:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andreas Schwab, Alexey Dobriyan, Alan Cox, Linux Kernel
On Fri, Aug 04, 2006 at 03:25:07PM -0700, Andrew Morton wrote:
> On Fri, 4 Aug 2006 18:06:44 -0400
> Dave Jones <davej@redhat.com> wrote:
>
> > Does it still work ? :-)
>
> Awww man. And I thought everyone _else_ was being fussy.
>
> I expect it'll work. We'll find out when someone hits a single-bit error.
> The worst it can do is crash the machine, except it's already crashed...
Ok, if it turns out to be broken, and my version worked, you owe
me a beer at the next conference :-)
Apathetically-Signed-off-by: Dave Jones <davej@redhat.com>
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: single bit flip detector.
2006-08-02 7:08 ` Jan Engelhardt
@ 2006-08-06 11:05 ` Geert Uytterhoeven
0 siblings, 0 replies; 21+ messages in thread
From: Geert Uytterhoeven @ 2006-08-06 11:05 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Dave Jones, Andreas Schwab, Alexey Dobriyan, Alan Cox,
Linux Kernel
On Wed, 2 Aug 2006, Jan Engelhardt wrote:
> > printk(" %02x", (unsigned char)data[offset + i]);
>
> Remove cast. (Or does it spew a warning message for you?)
No warning, but you still want the cast...
On PPC and ARM that will work fine, since char is unsigned.
But on most other platforms char is signed, and contrary to popular belief,
`%02x' doesn't mean `limit this field to 2 characters', so it would print e.g.
ffffffff instead of ff for -1.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2006-08-06 11:06 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-01 18:44 single bit flip detector Dave Jones
2006-08-01 20:15 ` Bill Davidsen
2006-08-01 22:14 ` Alan Cox
2006-08-01 22:30 ` Dave Jones
2006-08-01 22:36 ` Dave Jones
2006-08-01 23:00 ` Alexey Dobriyan
2006-08-01 23:16 ` Dave Jones
2006-08-01 23:27 ` Dave Jones
2006-08-01 23:28 ` Andreas Schwab
2006-08-01 23:51 ` Dave Jones
2006-08-02 0:16 ` Dave Jones
2006-08-02 6:20 ` Rolf Eike Beer
2006-08-02 7:08 ` Jan Engelhardt
2006-08-06 11:05 ` Geert Uytterhoeven
2006-08-02 15:24 ` Patrick McLean
2006-08-02 16:12 ` Randy.Dunlap
2006-08-04 21:19 ` Andrew Morton
2006-08-04 22:06 ` Dave Jones
2006-08-04 22:25 ` Andrew Morton
2006-08-04 22:28 ` Dave Jones
2006-08-02 7:08 ` Jan Engelhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox