public inbox for linux-kernel@vger.kernel.org
* discriminate single bit error hardware failure from slab corruption.
From: Dave Jones @ 2006-02-02 19:24 UTC
  To: Linux Kernel

In the case where we detect a single bit has been flipped, we spew
the usual slab corruption message, which users instantly think
is a kernel bug.  In a lot of cases, single bit errors are
down to bad memory, or other hardware failure.

This patch adds an extra line to the slab debug messages in those
cases, in the hope that users will try memtest before they report a bug.

000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Possibly bad RAM. Please run memtest86.

Signed-off-by: Dave Jones <davej@redhat.com>

--- linux-2.6.15/mm/slab.c~	2006-01-09 13:25:17.000000000 -0500
+++ linux-2.6.15/mm/slab.c	2006-01-09 13:26:01.000000000 -0500
@@ -1313,8 +1313,11 @@ static void poison_obj(kmem_cache_t *cac
 static void dump_line(char *data, int offset, int limit)
 {
 	int i;
+	unsigned char total=0;
 	printk(KERN_ERR "%03x:", offset);
 	for (i = 0; i < limit; i++) {
+		if (data[offset+i] != POISON_FREE)
+			total += data[offset+i];
 		printk(" %02x", (unsigned char)data[offset + i]);
 	}
 	printk("\n");
@@ -1019,6 +1023,18 @@ static void dump_line(char *data, int of
 		}
 	}
 	printk("\n");
+	switch (total) {
+		case 0x36:
+		case 0x6a:
+		case 0x6f:
+		case 0x81:
+		case 0xac:
+		case 0xd3:
+		case 0xd5:
+		case 0xea:
+			printk (KERN_ERR "Single bit error detected. Possibly bad RAM. Please run memtest86.\n");
+			return;
+	}
 }
 #endif
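
For reference, the values a single flipped bit can produce from one poison byte are easy to enumerate. The following is a minimal user-space sketch, not part of the patch. It assumes POISON_FREE is 0x6b, as the sample dump above suggests, and the helper names are hypothetical rather than kernel functions.

```c
#include <stdio.h>

#define POISON_FREE 0x6b	/* assumed: matches the 6b bytes in the sample dump */

/* Fill out[0..7] with POISON_FREE with bit k flipped, for k = 0..7. */
void single_bit_neighbours(unsigned char out[8])
{
	int k;

	for (k = 0; k < 8; k++)
		out[k] = (unsigned char)(POISON_FREE ^ (1u << k));
}

/* Print the eight values, one per bit position. */
void print_single_bit_neighbours(void)
{
	unsigned char v[8];
	int k;

	single_bit_neighbours(v);
	for (k = 0; k < 8; k++)
		printf("bit %d: 0x%02x\n", k, v[k]);
}
```

Calling print_single_bit_neighbours() lists every byte value that differs from the poison byte in exactly one bit, which is the set a switch statement on a single corrupted byte would need to match.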
 

* Re: discriminate single bit error hardware failure from slab corruption.
From: linux @ 2006-02-03  9:25 UTC
  To: davej; +Cc: linux-kernel

Um... case values are allowed to be constant expressions.

Isn't
+	switch (total) {
+		case POISON_FREE ^ 0x01:
+		case POISON_FREE ^ 0x02:
+		case POISON_FREE ^ 0x04:
+		case POISON_FREE ^ 0x08:
+		case POISON_FREE ^ 0x10:
+		case POISON_FREE ^ 0x20:
+		case POISON_FREE ^ 0x40:
+		case POISON_FREE ^ 0x80:
+			printk(KERN_ERR "Single bit error detected. Possibly bad RAM\n");

Infinitely clearer, even without the comments?  Or, if you want to
be cleverer:

	total ^= POISON_FREE;
	if ((total & (total-1)) == 0) {
		printk(KERN_ERR "Single bit error detected. Possibly bad RAM\n");
	}
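
One caveat with the power-of-two test: (x & (x - 1)) == 0 is also true for x == 0, so on its own it would fire when the byte is identical to the poison value. Here is a minimal user-space sketch that excludes that case. The function name is hypothetical and the 0x6b value for POISON_FREE is an assumption taken from the sample dump.

```c
#define POISON_FREE 0x6b	/* assumed value, as in the sample dump */

/* Return 1 if byte differs from POISON_FREE in exactly one bit, else 0. */
int is_single_bit_flip(unsigned char byte)
{
	unsigned char diff = byte ^ POISON_FREE;

	/* diff != 0 rules out "no corruption"; the mask test rules out 2+ bits. */
	return diff != 0 && (diff & (diff - 1)) == 0;
}
```

For example, is_single_bit_flip(0x6a) is true, while 0x6b (unchanged) and 0x68 (two bits flipped) are both rejected.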


If you wanted to get the bit-counting exactly accurate, you'd do:

	unsigned char total = 0, total2 = 0;

 	for (i = 0; i < limit; i++) {
		unsigned char delta = data[offset+i];
 		printk(" %02x", delta);
		delta ^= POISON_FREE;
		total2 |= total & delta;
		total |= delta;
 	}
 	printk("\n");

	/* If total2 has 0 bits set and total has at most 1 bit set... */
	if (!total2 && !(total & (total - 1))) {
		printk(KERN_ERR "Single bit error detected. Possibly bad RAM\n");
	}
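
The same accumulation can be tried outside the kernel. This is a rough user-space sketch under stated assumptions: POISON_FREE is taken to be 0x6b, the function name is hypothetical, and unlike the snippet above it also requires that at least one bit differs, so a clean line is not reported.

```c
#include <stddef.h>

#define POISON_FREE 0x6b	/* assumed value, as in the sample dump */

/* Return 1 if exactly one bit in buf[0..len) differs from the poison pattern. */
int line_has_single_bit_error(const unsigned char *buf, size_t len)
{
	unsigned char seen = 0;		/* bits flipped in at least one byte */
	unsigned char twice = 0;	/* bits flipped in two or more bytes */
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char delta = buf[i] ^ POISON_FREE;

		twice |= seen & delta;
		seen |= delta;
	}

	/* One bit in total: something flipped, nothing repeated, single bit set. */
	return seen != 0 && twice == 0 && (seen & (seen - 1)) == 0;
}
```

Tracking the repeated bits separately matters because OR-ing the deltas alone cannot tell one flipped bit from the same bit flipped in two different bytes.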


Thread overview: 16+ messages
2006-02-02 19:24 discriminate single bit error hardware failure from slab corruption Dave Jones
2006-02-02 19:28 ` Randy.Dunlap
2006-02-02 19:38 ` Jesper Juhl
2006-02-02 19:53 ` Pekka Enberg
2006-02-03  0:44 ` Avi Kivity
2006-02-03  1:46   ` Dave Jones
2006-02-03  2:05     ` Avi Kivity
2006-02-03  4:20       ` Dave Jones
2006-02-03  4:41         ` Roland Dreier
2006-02-03  5:03           ` Dave Jones
2006-02-03 14:12           ` Jan Engelhardt
2006-02-03 11:05         ` Olivier Galibert
2006-02-06 20:19     ` Pavel Machek
2006-02-03 14:09 ` Jan Engelhardt
  -- strict thread matches above, loose matches on Subject: below --
2006-02-03  9:25 linux
2006-02-03 14:14 ` Jan Engelhardt
