* Re: 2.6.17-mm2
From: Martin J. Bligh @ 2006-06-27 15:37 UTC
To: Martin J. Bligh
Cc: Andrew Morton, linuxppc64-dev, Linux Kernel Mailing List, Martin J. Bligh

Martin J. Bligh wrote:
> Martin J. Bligh wrote:
>
>> Panic on PPC64. I'm guessing it's the same as the i386 panics I sent
>> you yesterday, just more cryptic ;-) But for the record ...
>>
>> http://test.kernel.org/abat/37737/debug/console.log
>>
>> cpu 0x2: Vector: 300 (Data Access) at [c0000000f99f78c0]
>>     pc: c0000000000c6a34: .s_show+0x178/0x364
>>     lr: c0000000000c696c: .s_show+0xb0/0x364
>>     sp: c0000000f99f7b40
>>    msr: 8000000000001032
>>    dar: fd528000
>>  dsisr: 40000000
>>   current = 0xc0000000f23e0000
>>   paca    = 0xc00000000046e300
>>     pid   = 17653, comm = cp
>> enter ? for help
>>
>
> Eeek, this is definitely an intermittent thing. I was trawling older
> results, and it shows up (on PPC only) in 2.6.17-git10, so it's not
> just an -mm thing ;-(

OK, still happens in -mm3, though in a different workload now.

I also get a new panic, that's maybe related but more informative

cpu 0x0: Vector: 700 (Program Check) at [c0000000024938b0]
    pc: c0000000000c3218: .free_block+0xe4/0x240
    lr: c0000000000c3514: .drain_array+0xf4/0x170
    sp: c000000002493b30
   msr: 8000000000021032
  current = 0xc0000000025457f0
  paca    = 0xc0000000004f9f00
    pid   = 14, comm = events/0
kernel BUG in list_del at include/linux/list.h:160!

Plus one with an actual backtrace from PPC64 that looks more like the
i386 ones

SMP NR_CPUS=32 NUMA
Modules linked in:
NIP: C0000000000A311C LR: C0000000000A30D4 CTR: C0000000000A3024
REGS: c0000007725b38d0 TRAP: 0300   Not tainted  (2.6.17-mm3-autokern1)
MSR: 8000000000001032 <ME,IR,DR>  CR: 28224424  XER: 00000000
DAR: 000000077BCC6180, DSISR: 0000000040000000
TASK = c00000002fc74670[29812] 'cp' THREAD: c0000007725b0000 CPU: 2
GPR00: 0000000000000000 C0000007725B3B50 C00000000063B828 C00000001E303EC0
GPR04: 0000000000000010 0000000000000000 0000000000000000 FFFFFFFFFFFFFFFD
GPR08: 0000000000000001 0000000000000000 000000077BCC6180 0000000000000000
GPR12: 0000000000000000 C00000000051FF80 0000000000000000 0000000000000001
GPR16: 0000000000000000 0000000000000004 0000000000020000 0000000000000000
GPR20: 0000000000000000 0000000000000000 C0000007759F9D00 0000000000000000
GPR24: 0000000000000E42 0000000000000000 000000000000474A C00000001E30F300
GPR28: 0000000000000000 0000000000000000 C000000000537288 C00000001E303E80
NIP [C0000000000A311C] .s_show+0xf8/0x364
LR [C0000000000A30D4] .s_show+0xb0/0x364
Call Trace:
[C0000007725B3B50] [C0000000000A3334] .s_show+0x310/0x364 (unreliable)
[C0000007725B3C20] [C0000000000D5E84] .seq_read+0x2f4/0x450
[C0000007725B3D00] [C0000000000AADF8] .vfs_read+0xe0/0x1b4
[C0000007725B3D90] [C0000000000AAFD4] .sys_read+0x54/0x98
[C0000007725B3E30] [C00000000000871C] syscall_exit+0x0/0x40
Instruction dump:
3b180001 7c004a78 79290020 7c0bfe70 7f5a4a14 7d600278 7c005850 54000ffe
7c094038 2c090000 41820008 ebbe80b0 <e96a0000> 2fab0000 419e0008 7c005a2c

* Re: 2.6.17-mm2
From: Andrew Morton @ 2006-06-28 10:42 UTC
To: Martin J. Bligh, Jeremy Fitzhardinge
Cc: linuxppc64-dev, linux-kernel, mbligh, mbligh

On Tue, 27 Jun 2006 08:37:45 -0700
"Martin J. Bligh" <mbligh@mbligh.org> wrote:

> SMP NR_CPUS=32 NUMA
> Modules linked in:
> NIP: C0000000000A311C LR: C0000000000A30D4 CTR: C0000000000A3024
> REGS: c0000007725b38d0 TRAP: 0300   Not tainted  (2.6.17-mm3-autokern1)
> MSR: 8000000000001032 <ME,IR,DR>  CR: 28224424  XER: 00000000
> DAR: 000000077BCC6180, DSISR: 0000000040000000
> TASK = c00000002fc74670[29812] 'cp' THREAD: c0000007725b0000 CPU: 2
> GPR00: 0000000000000000 C0000007725B3B50 C00000000063B828 C00000001E303EC0
> GPR04: 0000000000000010 0000000000000000 0000000000000000 FFFFFFFFFFFFFFFD
> GPR08: 0000000000000001 0000000000000000 000000077BCC6180 0000000000000000
> GPR12: 0000000000000000 C00000000051FF80 0000000000000000 0000000000000001
> GPR16: 0000000000000000 0000000000000004 0000000000020000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 C0000007759F9D00 0000000000000000
> GPR24: 0000000000000E42 0000000000000000 000000000000474A C00000001E30F300
> GPR28: 0000000000000000 0000000000000000 C000000000537288 C00000001E303E80
> NIP [C0000000000A311C] .s_show+0xf8/0x364
> LR [C0000000000A30D4] .s_show+0xb0/0x364
> Call Trace:
> [C0000007725B3B50] [C0000000000A3334] .s_show+0x310/0x364 (unreliable)
> [C0000007725B3C20] [C0000000000D5E84] .seq_read+0x2f4/0x450
> [C0000007725B3D00] [C0000000000AADF8] .vfs_read+0xe0/0x1b4
> [C0000007725B3D90] [C0000000000AAFD4] .sys_read+0x54/0x98
> [C0000007725B3E30] [C00000000000871C] syscall_exit+0x0/0x40

This is caused by the vsprintf() changes. Right now, if you do

	snprintf(buf, 4, "1111111111111");

the memory at `buf' gets [31 31 31 31 00], which is not good: that is
five bytes written into a four-byte limit.

This'll plug it, but I didn't check very hard whether it still has any
off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's OK..

--- a/lib/vsprintf.c~c
+++ a/lib/vsprintf.c
@@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
 	int len;
 	unsigned long long num;
 	int i, base;
-	char *str, *end, c;
+	char *str;		/* Where we're writing to */
+	char *end;		/* The last byte we can write to */
+	char c;
 	const char *s;
 
 	int flags;		/* flags to number() */
@@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
 	}
 
 	str = buf;
-	end = buf + size;
+	end = buf + size - 1;
 
 	/* Make sure end is always >= buf */
-	if (end < buf) {
+	if (end < buf - 1) {
 		end = ((void *) ~0ull);
-		size = end - buf;
+		size = end - buf + 1;
 	}
 
 	for (; *fmt ; ++fmt) {
_

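For reference, the contract being violated can be checked from userspace. The following is only a sketch against the C library's snprintf(), not the kernel's vsnprintf(); the helper name snprintf_stays_in_bounds() and the GUARD value are made up for illustration. It fills a scratch area with a guard pattern, formats into the first `size' bytes, and reports whether the byte just past the limit survived.

```c
#include <stdio.h>
#include <string.h>

#define GUARD 0x5a	/* arbitrary fill pattern, chosen for this sketch */

/*
 * Format `text` with a limit of `size` bytes and check the result.
 * Returns 1 if the output stayed inside the limit, 0 if it did not,
 * and -1 if `size` is outside what this sketch can check.
 */
static int snprintf_stays_in_bounds(size_t size, const char *text)
{
	unsigned char area[64];

	if (size == 0 || size >= sizeof(area))
		return -1;

	memset(area, GUARD, sizeof(area));
	snprintf((char *)area, size, "%s", text);

	/* On truncation the terminator must occupy the last byte of the window ... */
	if (strlen(text) >= size && area[size - 1] != '\0')
		return 0;

	/* ... and the first byte after the window must be untouched. */
	return area[size] == GUARD;
}
```

With a size of 4 and the thirteen-character string from the report, a conforming implementation leaves [31 31 31 00] followed by the guard byte, so the helper returns 1; the behaviour described above would make it return 0.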
* Re: 2.6.17-mm2
From: Andrew Morton @ 2006-06-28 10:47 UTC
To: mbligh, jeremy, mbligh, linux-kernel, apw, linuxppc64-dev

On Wed, 28 Jun 2006 03:42:15 -0700
Andrew Morton <akpm@osdl.org> wrote:

> This is caused by the vsprintf() changes. Right now, if you do
>
> 	snprintf(buf, 4, "1111111111111");
>
> the memory at `buf' gets [31 31 31 31 00], which is not good.
>
> This'll plug it, but I didn't check very hard whether it still has any
> off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's OK..

That diff was against an older kernel and doesn't apply. This is against
mainline:

--- a/lib/vsprintf.c~vsnprintf-fix
+++ a/lib/vsprintf.c
@@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
 	int len;
 	unsigned long long num;
 	int i, base;
-	char *str, *end, c;
+	char *str;		/* Where we're writing to */
+	char *end;		/* The last byte we can write to */
+	char c;
 	const char *s;
 
 	int flags;		/* flags to number() */
@@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
 	}
 
 	str = buf;
-	end = buf + size;
+	end = buf + size - 1;
 
 	/* Make sure end is always >= buf */
-	if (end < buf) {
+	if (end < buf - 1) {
 		end = ((void *)-1);
-		size = end - buf;
+		size = end - buf + 1;
 	}
 
 	for (; *fmt ; ++fmt) {
@@ -494,7 +496,6 @@ int vsnprintf(char *buf, size_t size, co
 	/* the trailing null byte doesn't count towards the total */
 	return str-buf;
 }
-
 EXPORT_SYMBOL(vsnprintf);
 
 /**
_

* Re: 2.6.17-mm2
From: Martin J. Bligh @ 2006-06-28 14:43 UTC
To: Andrew Morton
Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

Andrew Morton wrote:
> On Wed, 28 Jun 2006 03:42:15 -0700
> Andrew Morton <akpm@osdl.org> wrote:
>
>> This is caused by the vsprintf() changes. Right now, if you do
>>
>> 	snprintf(buf, 4, "1111111111111");
>>
>> the memory at `buf' gets [31 31 31 31 00], which is not good.
>>
>> This'll plug it, but I didn't check very hard whether it still has any
>> off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's OK..

Aha, you're a genius! How the hell did you figure that one out?

Andy / Steve ... any chance one of you could kick this through the
harness? Against -git10 or so, I'd think.

Thanks,

M.

> That diff was against an older kernel and doesn't apply. This is against
> mainline:
>
> --- a/lib/vsprintf.c~vsnprintf-fix
> +++ a/lib/vsprintf.c
> @@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
>  	int len;
>  	unsigned long long num;
>  	int i, base;
> -	char *str, *end, c;
> +	char *str;		/* Where we're writing to */
> +	char *end;		/* The last byte we can write to */
> +	char c;
>  	const char *s;
>  
>  	int flags;		/* flags to number() */
> @@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
>  	}
>  
>  	str = buf;
> -	end = buf + size;
> +	end = buf + size - 1;
>  
>  	/* Make sure end is always >= buf */
> -	if (end < buf) {
> +	if (end < buf - 1) {
>  		end = ((void *)-1);
> -		size = end - buf;
> +		size = end - buf + 1;
>  	}
>  
>  	for (; *fmt ; ++fmt) {
> @@ -494,7 +496,6 @@ int vsnprintf(char *buf, size_t size, co
>  	/* the trailing null byte doesn't count towards the total */
>  	return str-buf;
>  }
> -
>  EXPORT_SYMBOL(vsnprintf);
>  
>  /**
> _
>

* Re: 2.6.17-mm2
From: Andy Whitcroft @ 2006-06-28 15:06 UTC
To: Martin J. Bligh
Cc: Andrew Morton, jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

Martin J. Bligh wrote:
> Andrew Morton wrote:
>
>> On Wed, 28 Jun 2006 03:42:15 -0700
>> Andrew Morton <akpm@osdl.org> wrote:
>>
>>> This is caused by the vsprintf() changes. Right now, if you do
>>>
>>> 	snprintf(buf, 4, "1111111111111");
>>>
>>> the memory at `buf' gets [31 31 31 31 00], which is not good.
>>>
>>> This'll plug it, but I didn't check very hard whether it still has any
>>> off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's
>>> OK..
>
> Aha, you're a genius! How the hell did you figure that one out?
>
> Andy / Steve ... any chance one of you could kick this through the
> harness? Against -git10 or so, I'd think.
>
> Thanks,

Suitably kicked ... against 2.6.17-git10.

-apw

* Re: 2.6.17-mm2
From: Andrew Morton @ 2006-06-28 19:11 UTC
To: Martin J. Bligh
Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

On Wed, 28 Jun 2006 07:43:14 -0700
"Martin J. Bligh" <mbligh@google.com> wrote:

> Andrew Morton wrote:
> > On Wed, 28 Jun 2006 03:42:15 -0700
> > Andrew Morton <akpm@osdl.org> wrote:
> >
> >> This is caused by the vsprintf() changes. Right now, if you do
> >>
> >> 	snprintf(buf, 4, "1111111111111");
> >>
> >> the memory at `buf' gets [31 31 31 31 00], which is not good.
> >>
> >> This'll plug it, but I didn't check very hard whether it still has any
> >> off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's OK..
>
> Aha, you're a genius!

That's not what my kids say.

> How the hell did you figure that one out?

Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
tight loop. With that happening, a little two-way wasn't able to make
it through `dbench 4' without soiling the upholstery. Then bisection-searching.

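The reproducer is a shell loop, but for anyone who wants to drive it from a test program, a rough C equivalent might look like the sketch below. It assumes nothing beyond POSIX open()/read(); drain_file() and hammer_file() are names invented here, and on many systems /proc/slabinfo is readable by root only, so the open may simply fail.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read `path` to EOF, discarding the data. Returns bytes read, or -1 on error. */
static long drain_file(const char *path)
{
	char chunk[4096];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, chunk, sizeof(chunk))) > 0)
		total += n;
	close(fd);
	return n < 0 ? -1 : total;
}

/* Drain `path` `rounds` times in a row. Returns how many rounds succeeded. */
static unsigned long hammer_file(const char *path, unsigned long rounds)
{
	unsigned long ok = 0, i;

	for (i = 0; i < rounds; i++)
		if (drain_file(path) >= 0)
			ok++;
	return ok;
}
```

Calling hammer_file("/proc/slabinfo", n) with a large n while `dbench 4' runs alongside would approximate the setup described above; whether it triggers the panic depends on the kernel under test.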
* Re: 2.6.17-mm2
From: Jeremy Fitzhardinge @ 2006-06-28 19:22 UTC
To: Andrew Morton
Cc: drfickle, linux-kernel, mbligh, Martin J. Bligh, linuxppc64-dev

Andrew Morton wrote:
> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> tight loop. With that happening, a little two-way wasn't able to make
> it through `dbench 4' without soiling the upholstery. Then bisection-searching.
>
It's surprising it was so subtle. I'd been running with that code for a
month or so without a peep of problem...

    J

* Re: 2.6.17-mm2
From: Andrew Morton @ 2006-06-28 19:49 UTC
To: Jeremy Fitzhardinge
Cc: drfickle, linux-kernel, mbligh, mbligh, linuxppc64-dev

On Wed, 28 Jun 2006 12:22:02 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Andrew Morton wrote:
> > Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> > tight loop. With that happening, a little two-way wasn't able to make
> > it through `dbench 4' without soiling the upholstery. Then bisection-searching.
> >
> It's surprising it was so subtle. I'd been running with that code for a
> month or so without a peep of problem...
>

It'll only bite if someone does snprintf() into a too-short buffer. That's
rare (it's usually a bug). But it looks like the seq_file() code does it
when someone is trying to generate more than PAGE_SIZE's worth of data.
Like /proc/slabinfo.

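To make the "snprintf() into a too-short buffer on purpose" pattern concrete, here is a userspace sketch of a producer that relies on truncation being safe: it formats into whatever room is left, treats a return value that does not fit as "overflowed", and starts over with a bigger buffer. This is an analogy only, not the kernel's seq_file code; struct out_buf, out_append() and out_generate() are hypothetical names, and the "cache N" line format is invented.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical output buffer for this sketch; not the kernel's struct seq_file. */
struct out_buf {
	char *data;
	size_t size;	/* bytes allocated */
	size_t used;	/* bytes filled, excluding the terminator */
};

/* Append one formatted line. Returns 0 on success, -1 if it did not fit. */
static int out_append(struct out_buf *b, const char *name, unsigned long n)
{
	size_t room = b->size - b->used;
	int len = snprintf(b->data + b->used, room, "%s %lu\n", name, n);

	if (len < 0 || (size_t)len >= room)
		return -1;	/* truncated: the caller has to retry with more room */
	b->used += (size_t)len;
	return 0;
}

/*
 * Emit `count` lines. On truncation, throw the buffer away, double its
 * size and regenerate from the start. Returns a malloc()ed string the
 * caller must free, or NULL if allocation fails.
 */
static char *out_generate(unsigned long count, size_t initial_size)
{
	struct out_buf b = { NULL, initial_size ? initial_size : 1, 0 };

	for (;;) {
		unsigned long i;

		b.data = malloc(b.size);
		if (!b.data)
			return NULL;
		b.used = 0;
		b.data[0] = '\0';

		for (i = 0; i < count; i++)
			if (out_append(&b, "cache", i) < 0)
				break;
		if (i == count)
			return b.data;

		free(b.data);
		b.size *= 2;
	}
}
```

The point of the sketch is that every failed attempt calls snprintf() with less room than the output needs. If the formatter writes even one byte past `room', each retry corrupts whatever sits after the allocation, which would fit the intermittent, allocator-related panics reported earlier in the thread.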
* Re: 2.6.17-mm2
From: Martin Bligh @ 2006-06-28 19:36 UTC
To: Andrew Morton
Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

>> How the hell did you figure that one out?
>
> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> tight loop. With that happening, a little two-way wasn't able to make
> it through `dbench 4' without soiling the upholstery. Then bisection-searching.

Aha. We probably trigger it because the automated test harness dumps a
bunch of crap out of /proc before and after running dbench then ;-)

Thanks!

M.

* Re: 2.6.17-mm2
From: Martin J. Bligh @ 2006-06-29 0:17 UTC
To: Martin Bligh
Cc: Andrew Morton, jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev

Martin Bligh wrote:
>>> How the hell did you figure that one out?
>>
>> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
>> tight loop. With that happening, a little two-way wasn't able to make
>> it through `dbench 4' without soiling the upholstery. Then
>> bisection-searching.
>
> Aha. We probably trigger it because the automated test harness dumps a
> bunch of crap out of /proc before and after running dbench then ;-)

OK, your patch does seem to fix it for the automated tests. The result is
not 100% reliable, since the failure was a little intermittent before, but
it looks good.

Thanks to both Andrew and Andy.

M.

* Re: 2.6.17-mm2
From: Jeremy Fitzhardinge @ 2006-06-28 15:43 UTC
To: Andrew Morton
Cc: linuxppc64-dev, linux-kernel, mbligh, Martin J. Bligh

Andrew Morton wrote:
> This is caused by the vsprintf() changes. Right now, if you do
>
> 	snprintf(buf, 4, "1111111111111");
>
> the memory at `buf' gets [31 31 31 31 00], which is not good.
>
> This'll plug it, but I didn't check very hard whether it still has any
> off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's OK..
>

Damn. This patch doesn't look right; the intent is that 'end' point to
just beyond the formatted string. I'm pretty sure I tested this, since
it's the most obvious test. Clearly not enough. I'll look into it.

    J

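The two readings of `end' discussed here (last writable byte versus one past it) differ only in where the terminator lands on truncation. As a sketch of the one-past-the-end reading, assuming that is the intent Jeremy describes, a bounded string copy could be written as below. bounded_puts() is a made-up helper for illustration and is not taken from lib/vsprintf.c.

```c
#include <stddef.h>

/*
 * Copy `src` into buf[0..size), with `end` pointing one past the last
 * writable byte. Returns the length the untruncated output would have
 * had, excluding the terminator, in the style of snprintf().
 */
static size_t bounded_puts(char *buf, size_t size, const char *src)
{
	char *str = buf;		/* next byte to write */
	char *end = buf + size;		/* one past the last writable byte */
	size_t total = 0;		/* length of the untruncated result */

	for (; *src; ++src, ++total) {
		if (str < end)
			*str++ = *src;	/* only store while inside the buffer */
	}

	if (size > 0) {
		if (str < end)
			*str = '\0';	/* output fit: terminate right after it */
		else
			end[-1] = '\0';	/* truncated: terminator takes the last byte */
	}
	return total;
}
```

With size 4 and the thirteen-character test string this leaves [31 31 31 00] in the buffer and returns 13, with nothing written at buf[4]. The sketch keeps `str' from ever moving past `end', which sidesteps out-of-range pointer arithmetic in portable C; the kernel code is free to make different choices.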
Thread overview: 11+ messages
[not found] <449D5D36.3040102@google.com>
[not found] ` <449FF3A2.8010907@mbligh.org>
2006-06-27 15:37 ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 10:42 ` 2.6.17-mm2 Andrew Morton
2006-06-28 10:47 ` 2.6.17-mm2 Andrew Morton
2006-06-28 14:43 ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 15:06 ` 2.6.17-mm2 Andy Whitcroft
2006-06-28 19:11 ` 2.6.17-mm2 Andrew Morton
2006-06-28 19:22 ` 2.6.17-mm2 Jeremy Fitzhardinge
2006-06-28 19:49 ` 2.6.17-mm2 Andrew Morton
2006-06-28 19:36 ` 2.6.17-mm2 Martin Bligh
2006-06-29 0:17 ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 15:43 ` 2.6.17-mm2 Jeremy Fitzhardinge