Re: 2.6.17-mm2

linuxppc-dev.lists.ozlabs.org archive mirror

* Re: 2.6.17-mm2
       [not found] ` <449FF3A2.8010907@mbligh.org>
@ 2006-06-27 15:37   ` Martin J. Bligh
  2006-06-28 10:42     ` 2.6.17-mm2 Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Martin J. Bligh @ 2006-06-27 15:37 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, linuxppc64-dev, Linux Kernel Mailing List,
	Martin J. Bligh

Martin J. Bligh wrote:
> Martin J. Bligh wrote:
> 
>> Panic on PPC64. I'm guessing it's the same as the i386 panics I sent
>> you yesterday, just more cryptic ;-) But for the record ...
>>
>> http://test.kernel.org/abat/37737/debug/console.log
>>
>> cpu 0x2: Vector: 300 (Data Access) at [c0000000f99f78c0]
>>     pc: c0000000000c6a34: .s_show+0x178/0x364
>>     lr: c0000000000c696c: .s_show+0xb0/0x364
>>     sp: c0000000f99f7b40
>>    msr: 8000000000001032
>>    dar: fd528000
>>  dsisr: 40000000
>>   current = 0xc0000000f23e0000
>>   paca    = 0xc00000000046e300
>>     pid   = 17653, comm = cp
>> enter ? for help
>>
> 
> Eeek, this is definitely an intermittent thing. I was trawling older
> results, and it shows up (on PPC only) in 2.6.17-git10, so it's not
> just an -mm thing ;-(

OK, still happens in -mm3, though in a different workload now. I also
get a new panic that may be related but is more informative:



cpu 0x0: Vector: 700 (Program Check) at [c0000000024938b0]
     pc: c0000000000c3218: .free_block+0xe4/0x240
     lr: c0000000000c3514: .drain_array+0xf4/0x170
     sp: c000000002493b30
    msr: 8000000000021032
   current = 0xc0000000025457f0
   paca    = 0xc0000000004f9f00
     pid   = 14, comm = events/0
kernel BUG in list_del at include/linux/list.h:160!

Plus one with an actual backtrace from PPC64 that looks more like the
i386 ones:

SMP NR_CPUS=32 NUMA
Modules linked in:
NIP: C0000000000A311C LR: C0000000000A30D4 CTR: C0000000000A3024
REGS: c0000007725b38d0 TRAP: 0300   Not tainted  (2.6.17-mm3-autokern1)
MSR: 8000000000001032 <ME,IR,DR>  CR: 28224424  XER: 00000000
DAR: 000000077BCC6180, DSISR: 0000000040000000
TASK = c00000002fc74670[29812] 'cp' THREAD: c0000007725b0000 CPU: 2
GPR00: 0000000000000000 C0000007725B3B50 C00000000063B828 C00000001E303EC0
GPR04: 0000000000000010 0000000000000000 0000000000000000 FFFFFFFFFFFFFFFD
GPR08: 0000000000000001 0000000000000000 000000077BCC6180 0000000000000000
GPR12: 0000000000000000 C00000000051FF80 0000000000000000 0000000000000001
GPR16: 0000000000000000 0000000000000004 0000000000020000 0000000000000000
GPR20: 0000000000000000 0000000000000000 C0000007759F9D00 0000000000000000
GPR24: 0000000000000E42 0000000000000000 000000000000474A C00000001E30F300
GPR28: 0000000000000000 0000000000000000 C000000000537288 C00000001E303E80
NIP [C0000000000A311C] .s_show+0xf8/0x364
LR [C0000000000A30D4] .s_show+0xb0/0x364
Call Trace:
[C0000007725B3B50] [C0000000000A3334] .s_show+0x310/0x364 (unreliable)
[C0000007725B3C20] [C0000000000D5E84] .seq_read+0x2f4/0x450
[C0000007725B3D00] [C0000000000AADF8] .vfs_read+0xe0/0x1b4
[C0000007725B3D90] [C0000000000AAFD4] .sys_read+0x54/0x98
[C0000007725B3E30] [C00000000000871C] syscall_exit+0x0/0x40
Instruction dump:
3b180001 7c004a78 79290020 7c0bfe70 7f5a4a14 7d600278 7c005850 54000ffe
7c094038 2c090000 41820008 ebbe80b0 <e96a0000> 2fab0000 419e0008 7c005a2c


* Re: 2.6.17-mm2
  2006-06-27 15:37   ` 2.6.17-mm2 Martin J. Bligh
@ 2006-06-28 10:42     ` Andrew Morton
  2006-06-28 10:47       ` 2.6.17-mm2 Andrew Morton
  2006-06-28 15:43       ` 2.6.17-mm2 Jeremy Fitzhardinge
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 10:42 UTC (permalink / raw)
  To: Martin J. Bligh, Jeremy Fitzhardinge
  Cc: linuxppc64-dev, linux-kernel, mbligh, mbligh

On Tue, 27 Jun 2006 08:37:45 -0700
"Martin J. Bligh" <mbligh@mbligh.org> wrote:

> SMP NR_CPUS=32 NUMA
> Modules linked in:
> NIP: C0000000000A311C LR: C0000000000A30D4 CTR: C0000000000A3024
> REGS: c0000007725b38d0 TRAP: 0300   Not tainted  (2.6.17-mm3-autokern1)
> MSR: 8000000000001032 <ME,IR,DR>  CR: 28224424  XER: 00000000
> DAR: 000000077BCC6180, DSISR: 0000000040000000
> TASK = c00000002fc74670[29812] 'cp' THREAD: c0000007725b0000 CPU: 2
> GPR00: 0000000000000000 C0000007725B3B50 C00000000063B828 C00000001E303EC0
> GPR04: 0000000000000010 0000000000000000 0000000000000000 FFFFFFFFFFFFFFFD
> GPR08: 0000000000000001 0000000000000000 000000077BCC6180 0000000000000000
> GPR12: 0000000000000000 C00000000051FF80 0000000000000000 0000000000000001
> GPR16: 0000000000000000 0000000000000004 0000000000020000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 C0000007759F9D00 0000000000000000
> GPR24: 0000000000000E42 0000000000000000 000000000000474A C00000001E30F300
> GPR28: 0000000000000000 0000000000000000 C000000000537288 C00000001E303E80
> NIP [C0000000000A311C] .s_show+0xf8/0x364
> LR [C0000000000A30D4] .s_show+0xb0/0x364
> Call Trace:
> [C0000007725B3B50] [C0000000000A3334] .s_show+0x310/0x364 (unreliable)
> [C0000007725B3C20] [C0000000000D5E84] .seq_read+0x2f4/0x450
> [C0000007725B3D00] [C0000000000AADF8] .vfs_read+0xe0/0x1b4
> [C0000007725B3D90] [C0000000000AAFD4] .sys_read+0x54/0x98
> [C0000007725B3E30] [C00000000000871C] syscall_exit+0x0/0x40

This is caused by the vsprintf() changes.  Right now, if you do

	snprintf(buf, 4, "1111111111111");

the memory at `buf' gets [31 31 31 31 00], which is not good.

This'll plug it, but I didn't check very hard whether it still has any
off-by-ones, or if it breaks the intent of Jeremy's patch.  I think it's OK..

--- a/lib/vsprintf.c~c
+++ a/lib/vsprintf.c
@@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
 	int len;
 	unsigned long long num;
 	int i, base;
-	char *str, *end, c;
+	char *str;		/* Where we're writing to */
+	char *end;		/* The last byte we can write to */
+	char c;
 	const char *s;
 
 	int flags;		/* flags to number() */
@@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
 	}
 
 	str = buf;
-	end = buf + size;
+	end = buf + size - 1;
 
 	/* Make sure end is always >= buf */
-	if (end < buf) {
+	if (end < buf - 1) {
 		end = ((void *) ~0ull);
-		size = end - buf;
+		size = end - buf + 1;
 	}
 
 	for (; *fmt ; ++fmt) {
_
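
To make the failure mode concrete, here is a minimal userspace sketch (not kernel code) of the check implied above. It assumes a C99-conforming snprintf() from libc; the function name snprintf_respects_size and the 0xAA canary byte are made up for illustration. Given a size of 4, a correct implementation should store three characters plus the terminating NUL, i.e. [31 31 31 00], and leave the fifth byte alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format `text` into a 4-byte region that is followed by a canary byte.
 * Returns 1 if snprintf() kept to the 4 bytes it was given and
 * NUL-terminated inside them, 0 otherwise.
 * (Hypothetical helper, for illustration only.)
 */
int snprintf_respects_size(const char *text)
{
	char region[5];
	int ret;

	memset(region, 0xAA, sizeof(region));	/* region[4] is the canary */
	ret = snprintf(region, 4, "%s", text);

	/* C99: the return value is the length without truncation. */
	assert(ret == (int)strlen(text));

	/* The canary must survive and a NUL must sit within the 4 bytes. */
	return (unsigned char)region[4] == 0xAA &&
	       memchr(region, '\0', 4) != NULL;
}
```

Called with the 13-character string from the message, this should return 1 against a conforming libc. An implementation with the bug described here would put the NUL in the fifth byte, clobber the canary, and return 0.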


* Re: 2.6.17-mm2
  2006-06-28 10:42     ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 10:47       ` Andrew Morton
  2006-06-28 14:43         ` 2.6.17-mm2 Martin J. Bligh
  2006-06-28 15:43       ` 2.6.17-mm2 Jeremy Fitzhardinge
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 10:47 UTC (permalink / raw)
  To: mbligh, jeremy, mbligh, linux-kernel, apw, linuxppc64-dev

On Wed, 28 Jun 2006 03:42:15 -0700
Andrew Morton <akpm@osdl.org> wrote:

> This is caused by the vsprintf() changes.  Right now, if you do
> 
> 	snprintf(buf, 4, "1111111111111");
> 
> the memory at `buf' gets [31 31 31 31 00], which is not good.
> 
> This'll plug it, but I didn't check very hard whether it still has any
> off-by-ones, or if it breaks the intent of Jeremy's patch.  I think it's OK..

That diff was against an older kernel and doesn't apply.  This is against
mainline:

--- a/lib/vsprintf.c~vsnprintf-fix
+++ a/lib/vsprintf.c
@@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
 	int len;
 	unsigned long long num;
 	int i, base;
-	char *str, *end, c;
+	char *str;		/* Where we're writing to */
+	char *end;		/* The last byte we can write to */
+	char c;
 	const char *s;
 
 	int flags;		/* flags to number() */
@@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
 	}
 
 	str = buf;
-	end = buf + size;
+	end = buf + size - 1;
 
 	/* Make sure end is always >= buf */
-	if (end < buf) {
+	if (end < buf - 1) {
 		end = ((void *)-1);
-		size = end - buf;
+		size = end - buf + 1;
 	}
 
 	for (; *fmt ; ++fmt) {
@@ -494,7 +496,6 @@ int vsnprintf(char *buf, size_t size, co
 	/* the trailing null byte doesn't count towards the total */
 	return str-buf;
 }
-
 EXPORT_SYMBOL(vsnprintf);
 
 /**
_
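
The convention this patch adopts, with `end` naming the last writable byte rather than one past it, can be illustrated with a toy bounded copy. This is a sketch under that assumption only, not the kernel's vsnprintf(): bounded_copy is a hypothetical name, and unlike the kernel loop it stops advancing str at the end of the buffer so that it stays within standard C pointer rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy bounded copy (hypothetical, NOT the kernel's vsnprintf()).
 * Writes at most `size` bytes to `buf`, including the terminating NUL,
 * and returns the length the output would have had without truncation.
 */
size_t bounded_copy(char *buf, size_t size, const char *src)
{
	size_t len = strlen(src);	/* what we would write given room */
	char *str, *end;

	if (size == 0)			/* no byte may be touched */
		return len;

	assert(buf != NULL);
	str = buf;			/* where we're writing to */
	end = buf + size - 1;		/* the last byte we can write to */

	for (; *src && str <= end; ++src)
		*str++ = *src;

	if (str <= end)
		*str = '\0';		/* everything fit */
	else
		*end = '\0';		/* truncated: NUL replaces the last char */

	return len;
}
```

With size 4 and the 13-character input, buf ends up holding "111" plus the NUL and the byte after the buffer is untouched, while the return value is still 13 so the caller can detect truncation.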


* Re: 2.6.17-mm2
  2006-06-28 10:47       ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 14:43         ` Martin J. Bligh
  2006-06-28 15:06           ` 2.6.17-mm2 Andy Whitcroft
  2006-06-28 19:11           ` 2.6.17-mm2 Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Martin J. Bligh @ 2006-06-28 14:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

Andrew Morton wrote:
> On Wed, 28 Jun 2006 03:42:15 -0700
> Andrew Morton <akpm@osdl.org> wrote:
> 
> 
>>This is caused by the vsprintf() changes.  Right now, if you do
>>
>>	snprintf(buf, 4, "1111111111111");
>>
>>the memory at `buf' gets [31 31 31 31 00], which is not good.
>>
>>This'll plug it, but I didn't check very hard whether it still has any
>>off-by-ones, or if it breaks the intent of Jeremy's patch.  I think it's OK..

Aha, you're a genius! How the hell did you figure that one out?

Andy / Steve ... any chance one of you could kick this through the
harness? Against -git10 or so, I'd think

Thanks,

M.

> That diff was against an older kernel and doesn't apply.  This is against
> mainline:
> 
> --- a/lib/vsprintf.c~vsnprintf-fix
> +++ a/lib/vsprintf.c
> @@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
>  	int len;
>  	unsigned long long num;
>  	int i, base;
> -	char *str, *end, c;
> +	char *str;		/* Where we're writing to */
> +	char *end;		/* The last byte we can write to */
> +	char c;
>  	const char *s;
>  
>  	int flags;		/* flags to number() */
> @@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
>  	}
>  
>  	str = buf;
> -	end = buf + size;
> +	end = buf + size - 1;
>  
>  	/* Make sure end is always >= buf */
> -	if (end < buf) {
> +	if (end < buf - 1) {
>  		end = ((void *)-1);
> -		size = end - buf;
> +		size = end - buf + 1;
>  	}
>  
>  	for (; *fmt ; ++fmt) {
> @@ -494,7 +496,6 @@ int vsnprintf(char *buf, size_t size, co
>  	/* the trailing null byte doesn't count towards the total */
>  	return str-buf;
>  }
> -
>  EXPORT_SYMBOL(vsnprintf);
>  
>  /**
> _
> 


* Re: 2.6.17-mm2
  2006-06-28 14:43         ` 2.6.17-mm2 Martin J. Bligh
@ 2006-06-28 15:06           ` Andy Whitcroft
  2006-06-28 19:11           ` 2.6.17-mm2 Andrew Morton
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Whitcroft @ 2006-06-28 15:06 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, jeremy, drfickle, linux-kernel, mbligh,
	linuxppc64-dev

Martin J. Bligh wrote:
> Andrew Morton wrote:
> 
>> On Wed, 28 Jun 2006 03:42:15 -0700
>> Andrew Morton <akpm@osdl.org> wrote:
>>
>>
>>> This is caused by the vsprintf() changes.  Right now, if you do
>>>
>>>     snprintf(buf, 4, "1111111111111");
>>>
>>> the memory at `buf' gets [31 31 31 31 00], which is not good.
>>>
>>> This'll plug it, but I didn't check very hard whether it still has any
>>> off-by-ones, or if it breaks the intent of Jeremy's patch.  I think it's
>>> OK..
> 
> 
> Aha, you're a genius! How the hell did you figure that one out?
> 
> Andy / Steve ... any chance one of you could kick this through the
> harness? Against -git10 or so, I'd think
> 
> Thanks,


Suitably kicked ... against 2.6.17-git10.

-apw


* Re: 2.6.17-mm2
  2006-06-28 14:43         ` 2.6.17-mm2 Martin J. Bligh
  2006-06-28 15:06           ` 2.6.17-mm2 Andy Whitcroft
@ 2006-06-28 19:11           ` Andrew Morton
  2006-06-28 19:22             ` 2.6.17-mm2 Jeremy Fitzhardinge
  2006-06-28 19:36             ` 2.6.17-mm2 Martin Bligh
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 19:11 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

On Wed, 28 Jun 2006 07:43:14 -0700
"Martin J. Bligh" <mbligh@google.com> wrote:

> Andrew Morton wrote:
> > On Wed, 28 Jun 2006 03:42:15 -0700
> > Andrew Morton <akpm@osdl.org> wrote:
> > 
> > 
> >>This is caused by the vsprintf() changes.  Right now, if you do
> >>
> >>	snprintf(buf, 4, "1111111111111");
> >>
> >>the memory at `buf' gets [31 31 31 31 00], which is not good.
> >>
> >>This'll plug it, but I didn't check very hard whether it still has any
> >>off-by-ones, or if it breaks the intent of Jeremy's patch.  I think it's OK..
> 
> Aha, you're a genius!

That's not what my kids say.

> How the hell did you figure that one out?

Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
tight loop.  With that happening, a little two-way wasn't able to make
it through `dbench 4' without soiling the upholstery.  Then bisection-searching.
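
For anyone who prefers a compiled reproducer to a shell loop, the reader half of this recipe might look roughly like the C sketch below. It only mimics `cat FILE > /dev/null' repeated a fixed number of times; read_repeatedly is a hypothetical name, the 4096-byte chunk size is an arbitrary choice, and reading /proc/slabinfo may need root on some systems. The dbench load would still have to run alongside it.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Open `path`, read it to EOF and discard the data, `iterations` times.
 * Returns the total number of bytes read, or -1 if the file cannot be
 * opened. (Hypothetical helper, for illustration only.)
 */
long read_repeatedly(const char *path, int iterations)
{
	char chunk[4096];
	long total = 0;

	assert(path != NULL);

	for (int i = 0; i < iterations; i++) {
		FILE *f = fopen(path, "r");
		size_t n;

		if (!f)
			return -1;

		/* Each pass re-reads the file from the start to EOF. */
		while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0)
			total += (long)n;

		fclose(f);
	}
	return total;
}
```

A caller would pass "/proc/slabinfo" and a large iteration count, for example read_repeatedly("/proc/slabinfo", 100000), while the benchmark runs in another shell.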


* Re: 2.6.17-mm2
  2006-06-28 19:11           ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 19:22             ` Jeremy Fitzhardinge
  2006-06-28 19:49               ` 2.6.17-mm2 Andrew Morton
  2006-06-28 19:36             ` 2.6.17-mm2 Martin Bligh
  1 sibling, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-28 19:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: drfickle, linux-kernel, mbligh, Martin J. Bligh, linuxppc64-dev

Andrew Morton wrote:
> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> tight loop.  With that happening, a little two-way wasn't able to make
> it through `dbench 4' without soiling the upholstery.  Then bisection-searching.
>   
It's surprising it was so subtle.  I'd been running with that code for a 
month or so without a peep of problem...

    J


* Re: 2.6.17-mm2
  2006-06-28 19:22             ` 2.6.17-mm2 Jeremy Fitzhardinge
@ 2006-06-28 19:49               ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 19:49 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: drfickle, linux-kernel, mbligh, mbligh, linuxppc64-dev

On Wed, 28 Jun 2006 12:22:02 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Andrew Morton wrote:
> > Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> > tight loop.  With that happening, a little two-way wasn't able to make
> > it through `dbench 4' without soiling the upholstery.  Then bisection-searching.
> >   
> It's surprising it was so subtle.  I'd been running with that code for a 
> month or so without a peep of problem...
> 

It'll only bite if someone does snprintf() into a too-short buffer.  That's
rare (it's usually a bug).  But it looks like the seq_file code does it
when someone is trying to generate more than PAGE_SIZE's worth of data,
like /proc/slabinfo.
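
The pattern described here, formatting into a fixed buffer and relying on snprintf() to report truncation, can be sketched in userspace as below. This is loosely modelled on my understanding of how seq_file grows its buffer and retries; it is not the kernel code, and format_records, the record text and the doubling policy are all assumptions made for illustration. The point is that the caller passes exactly the space left, so a formatter that writes even one byte beyond that bound lands outside the allocation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format `count` records into a heap buffer, doubling the buffer and
 * starting over whenever snprintf() reports truncation.
 * Returns a NUL-terminated string the caller must free(), or NULL.
 * (Hypothetical helper, for illustration only.)
 */
char *format_records(int count, size_t initial_size)
{
	size_t size = initial_size ? initial_size : 1;

	for (;;) {
		char *buf = malloc(size);
		size_t used = 0;
		int overflowed = 0;

		if (!buf)
			return NULL;
		buf[0] = '\0';

		for (int i = 0; i < count; i++) {
			/* The bound passed in is exactly the space left. */
			int n = snprintf(buf + used, size - used,
					 "record %d\n", i);

			if (n < 0) {
				free(buf);
				return NULL;
			}
			if ((size_t)n >= size - used) {	/* truncated */
				overflowed = 1;
				break;
			}
			used += (size_t)n;
		}

		if (!overflowed) {
			assert(strlen(buf) == used);
			return buf;
		}

		free(buf);		/* too small: retry with more room */
		size *= 2;
	}
}
```

Starting from a deliberately tiny buffer, for example format_records(3, 4), forces several truncated attempts before the output fits, which is the path a large /proc file would exercise.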


* Re: 2.6.17-mm2
  2006-06-28 19:11           ` 2.6.17-mm2 Andrew Morton
  2006-06-28 19:22             ` 2.6.17-mm2 Jeremy Fitzhardinge
@ 2006-06-28 19:36             ` Martin Bligh
  2006-06-29  0:17               ` 2.6.17-mm2 Martin J. Bligh
  1 sibling, 1 reply; 11+ messages in thread
From: Martin Bligh @ 2006-06-28 19:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

>>How the hell did you figure that one out?
> 
> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> tight loop.  With that happening, a little two-way wasn't able to make
> it through `dbench 4' without soiling the upholstery.  Then bisection-searching.

Aha. We probably trigger it because the automated test harness dumps a
bunch of crap out of /proc before and after running dbench then ;-)

Thanks!

M.


* Re: 2.6.17-mm2
  2006-06-28 19:36             ` 2.6.17-mm2 Martin Bligh
@ 2006-06-29  0:17               ` Martin J. Bligh
  0 siblings, 0 replies; 11+ messages in thread
From: Martin J. Bligh @ 2006-06-29  0:17 UTC (permalink / raw)
  To: Martin Bligh
  Cc: Andrew Morton, jeremy, drfickle, linux-kernel, mbligh,
	linuxppc64-dev

Martin Bligh wrote:
>>> How the hell did you figure that one out?
>>
>>
>> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
>> tight loop.  With that happening, a little two-way wasn't able to make
>> it through `dbench 4' without soiling the upholstery.  Then 
>> bisection-searching.
> 
> 
> Aha. We probably trigger it because the automated test harness dumps a
> bunch of crap out of /proc before and after running dbench then ;-)

OK, your patch does seem to fix it for the automated tests. That's not
100% conclusive, since the failure was a little intermittent before, but
it looks good.

Thanks to both Andrew and Andy.

M.


* Re: 2.6.17-mm2
  2006-06-28 10:42     ` 2.6.17-mm2 Andrew Morton
  2006-06-28 10:47       ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 15:43       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-28 15:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linuxppc64-dev, linux-kernel, mbligh, Martin J. Bligh

Andrew Morton wrote:
> This is caused by the vsprintf() changes.  Right now, if you do
>
> 	snprintf(buf, 4, "1111111111111");
>
> the memory at `buf' gets [31 31 31 31 00], which is not good.
>
> This'll plug it, but I didn't check very hard whether it still has any
> off-by-ones, or if it breaks the intent of Jeremy's patch.  I think it's OK..
>   
Damn.  This patch doesn't look right; the intent is that 'end' point to
just beyond the formatted string.  I'm pretty sure I tested this, since
it's the most obvious test.  Clearly not enough.  I'll look into it.

    J


end of thread, other threads:[~2006-06-29  0:17 UTC | newest]

Thread overview: 11+ messages
     [not found] <449D5D36.3040102@google.com>
     [not found] ` <449FF3A2.8010907@mbligh.org>
2006-06-27 15:37   ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 10:42     ` 2.6.17-mm2 Andrew Morton
2006-06-28 10:47       ` 2.6.17-mm2 Andrew Morton
2006-06-28 14:43         ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 15:06           ` 2.6.17-mm2 Andy Whitcroft
2006-06-28 19:11           ` 2.6.17-mm2 Andrew Morton
2006-06-28 19:22             ` 2.6.17-mm2 Jeremy Fitzhardinge
2006-06-28 19:49               ` 2.6.17-mm2 Andrew Morton
2006-06-28 19:36             ` 2.6.17-mm2 Martin Bligh
2006-06-29  0:17               ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 15:43       ` 2.6.17-mm2 Jeremy Fitzhardinge
