* Re: 2.6.17-mm2
[not found] ` <449FF3A2.8010907@mbligh.org>
@ 2006-06-27 15:37 ` Martin J. Bligh
2006-06-28 10:42 ` 2.6.17-mm2 Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Martin J. Bligh @ 2006-06-27 15:37 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andrew Morton, linuxppc64-dev, Linux Kernel Mailing List,
Martin J. Bligh
Martin J. Bligh wrote:
> Martin J. Bligh wrote:
>
>> Panic on PPC64. I'm guessing it's the same as the i386 panics I sent
>> you yesterday, just more cryptic ;-) But for the record ...
>>
>> http://test.kernel.org/abat/37737/debug/console.log
>>
>> cpu 0x2: Vector: 300 (Data Access) at [c0000000f99f78c0]
>> pc: c0000000000c6a34: .s_show+0x178/0x364
>> lr: c0000000000c696c: .s_show+0xb0/0x364
>> sp: c0000000f99f7b40
>> msr: 8000000000001032
>> dar: fd528000
>> dsisr: 40000000
>> current = 0xc0000000f23e0000
>> paca = 0xc00000000046e300
>> pid = 17653, comm = cp
>> enter ? for help
>>
>
> Eeek, this is definitely an intermittent thing. I was trawling older
> results, and it shows up (on PPC only) in 2.6.17-git10, so it's not
> just an -mm thing ;-(
OK, still happens in -mm3, though in a different workload now. I also
get a new panic, that's maybe related but more informative
cpu 0x0: Vector: 700 (Program Check) at [c0000000024938b0]
pc: c0000000000c3218: .free_block+0xe4/0x240
lr: c0000000000c3514: .drain_array+0xf4/0x170
sp: c000000002493b30
msr: 8000000000021032
current = 0xc0000000025457f0
paca = 0xc0000000004f9f00
pid = 14, comm = events/0
kernel BUG in list_del at include/linux/list.h:160!
Plus one with an actual backtrace from PPC64 that looks more like the
i386 ones
SMP NR_CPUS=32 NUMA
Modules linked in:
NIP: C0000000000A311C LR: C0000000000A30D4 CTR: C0000000000A3024
REGS: c0000007725b38d0 TRAP: 0300 Not tainted (2.6.17-mm3-autokern1)
MSR: 8000000000001032 <ME,IR,DR> CR: 28224424 XER: 00000000
DAR: 000000077BCC6180, DSISR: 0000000040000000
TASK = c00000002fc74670[29812] 'cp' THREAD: c0000007725b0000 CPU: 2
GPR00: 0000000000000000 C0000007725B3B50 C00000000063B828 C00000001E303EC0
GPR04: 0000000000000010 0000000000000000 0000000000000000 FFFFFFFFFFFFFFFD
GPR08: 0000000000000001 0000000000000000 000000077BCC6180 0000000000000000
GPR12: 0000000000000000 C00000000051FF80 0000000000000000 0000000000000001
GPR16: 0000000000000000 0000000000000004 0000000000020000 0000000000000000
GPR20: 0000000000000000 0000000000000000 C0000007759F9D00 0000000000000000
GPR24: 0000000000000E42 0000000000000000 000000000000474A C00000001E30F300
GPR28: 0000000000000000 0000000000000000 C000000000537288 C00000001E303E80
NIP [C0000000000A311C] .s_show+0xf8/0x364
LR [C0000000000A30D4] .s_show+0xb0/0x364
Call Trace:
[C0000007725B3B50] [C0000000000A3334] .s_show+0x310/0x364 (unreliable)
[C0000007725B3C20] [C0000000000D5E84] .seq_read+0x2f4/0x450
[C0000007725B3D00] [C0000000000AADF8] .vfs_read+0xe0/0x1b4
[C0000007725B3D90] [C0000000000AAFD4] .sys_read+0x54/0x98
[C0000007725B3E30] [C00000000000871C] syscall_exit+0x0/0x40
Instruction dump:
3b180001 7c004a78 79290020 7c0bfe70 7f5a4a14 7d600278 7c005850 54000ffe
7c094038 2c090000 41820008 ebbe80b0 <e96a0000> 2fab0000 419e0008 7c005a2c
* Re: 2.6.17-mm2
2006-06-27 15:37 ` 2.6.17-mm2 Martin J. Bligh
@ 2006-06-28 10:42 ` Andrew Morton
2006-06-28 10:47 ` 2.6.17-mm2 Andrew Morton
2006-06-28 15:43 ` 2.6.17-mm2 Jeremy Fitzhardinge
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 10:42 UTC (permalink / raw)
To: Martin J. Bligh, Jeremy Fitzhardinge
Cc: linuxppc64-dev, linux-kernel, mbligh, mbligh
On Tue, 27 Jun 2006 08:37:45 -0700
"Martin J. Bligh" <mbligh@mbligh.org> wrote:
> SMP NR_CPUS=32 NUMA
> Modules linked in:
> NIP: C0000000000A311C LR: C0000000000A30D4 CTR: C0000000000A3024
> REGS: c0000007725b38d0 TRAP: 0300 Not tainted (2.6.17-mm3-autokern1)
> MSR: 8000000000001032 <ME,IR,DR> CR: 28224424 XER: 00000000
> DAR: 000000077BCC6180, DSISR: 0000000040000000
> TASK = c00000002fc74670[29812] 'cp' THREAD: c0000007725b0000 CPU: 2
> GPR00: 0000000000000000 C0000007725B3B50 C00000000063B828 C00000001E303EC0
> GPR04: 0000000000000010 0000000000000000 0000000000000000 FFFFFFFFFFFFFFFD
> GPR08: 0000000000000001 0000000000000000 000000077BCC6180 0000000000000000
> GPR12: 0000000000000000 C00000000051FF80 0000000000000000 0000000000000001
> GPR16: 0000000000000000 0000000000000004 0000000000020000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 C0000007759F9D00 0000000000000000
> GPR24: 0000000000000E42 0000000000000000 000000000000474A C00000001E30F300
> GPR28: 0000000000000000 0000000000000000 C000000000537288 C00000001E303E80
> NIP [C0000000000A311C] .s_show+0xf8/0x364
> LR [C0000000000A30D4] .s_show+0xb0/0x364
> Call Trace:
> [C0000007725B3B50] [C0000000000A3334] .s_show+0x310/0x364 (unreliable)
> [C0000007725B3C20] [C0000000000D5E84] .seq_read+0x2f4/0x450
> [C0000007725B3D00] [C0000000000AADF8] .vfs_read+0xe0/0x1b4
> [C0000007725B3D90] [C0000000000AAFD4] .sys_read+0x54/0x98
> [C0000007725B3E30] [C00000000000871C] syscall_exit+0x0/0x40
This is caused by the vsprintf() changes. Right now, if you do
snprintf(buf, 4, "1111111111111");
the memory at `buf' gets [31 31 31 31 00], which is not good.
This'll plug it, but I didn't check very hard whether it still has any
off-by-ones, or if it breaks the intent of Jeremy's patch. I think it's OK.
--- a/lib/vsprintf.c~c
+++ a/lib/vsprintf.c
@@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
int len;
unsigned long long num;
int i, base;
- char *str, *end, c;
+ char *str; /* Where we're writing to */
+ char *end; /* The last byte we can write to */
+ char c;
const char *s;
int flags; /* flags to number() */
@@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
}
str = buf;
- end = buf + size;
+ end = buf + size - 1;
/* Make sure end is always >= buf */
- if (end < buf) {
+ if (end < buf - 1) {
end = ((void *) ~0ull);
- size = end - buf;
+ size = end - buf + 1;
}
for (; *fmt ; ++fmt) {
_
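A quick userspace way to check the contract being violated here. This is only a sketch: it exercises the C library's snprintf() rather than the kernel's vsnprintf(), and the helper name, scratch size and canary value are made up for illustration. It counts how many bytes beyond the stated size get modified; a correct implementation gives 0, while the [31 31 31 31 00] behaviour described above would give 1 for a size of 4.

```c
#include <stdio.h>
#include <string.h>

#define SCRATCH 16	/* illustrative scratch area size */
#define CANARY  0x5a	/* illustrative fill byte */

/*
 * Hypothetical helper, not kernel code: format `text` into the first
 * `size` bytes of a canary-filled scratch area and return the number of
 * bytes at or beyond offset `size` that were modified.
 * Returns -1 if `size` does not fit in the scratch area.
 */
static int bytes_clobbered_past(size_t size, const char *text)
{
	unsigned char scratch[SCRATCH];
	int clobbered = 0;
	size_t i;

	if (size > SCRATCH)
		return -1;

	memset(scratch, CANARY, sizeof(scratch));
	snprintf((char *)scratch, size, "%s", text);

	/* Anything that changed from offset `size` onwards is an overrun. */
	for (i = size; i < SCRATCH; i++)
		if (scratch[i] != CANARY)
			clobbered++;

	return clobbered;
}
```

With a conforming libc, bytes_clobbered_past(4, "1111111111111") should come back as 0.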
* Re: 2.6.17-mm2
2006-06-28 10:42 ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 10:47 ` Andrew Morton
2006-06-28 14:43 ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 15:43 ` 2.6.17-mm2 Jeremy Fitzhardinge
1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 10:47 UTC (permalink / raw)
To: mbligh, jeremy, mbligh, linux-kernel, apw, linuxppc64-dev
On Wed, 28 Jun 2006 03:42:15 -0700
Andrew Morton <akpm@osdl.org> wrote:
> This is caused by the vsprintf() changes. Right now, if you do
>
> snprintf(buf, 4, "1111111111111");
>
> the memory at `buf' gets [31 31 31 31 00], which is not good.
>
> This'll plug it, but I didn't check very hard whether it still has any
> off-by-ones, or if breaks the intent of Jeremy's patch. I think it's OK..
That diff was against an older kernel and doesn't apply. This is against
mainline:
--- a/lib/vsprintf.c~vsnprintf-fix
+++ a/lib/vsprintf.c
@@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
int len;
unsigned long long num;
int i, base;
- char *str, *end, c;
+ char *str; /* Where we're writing to */
+ char *end; /* The last byte we can write to */
+ char c;
const char *s;
int flags; /* flags to number() */
@@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
}
str = buf;
- end = buf + size;
+ end = buf + size - 1;
/* Make sure end is always >= buf */
- if (end < buf) {
+ if (end < buf - 1) {
end = ((void *)-1);
- size = end - buf;
+ size = end - buf + 1;
}
for (; *fmt ; ++fmt) {
@@ -494,7 +496,6 @@ int vsnprintf(char *buf, size_t size, co
/* the trailing null byte doesn't count towards the total */
return str-buf;
}
-
EXPORT_SYMBOL(vsnprintf);
/**
_
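To make the "last byte we can write to" convention in the comment concrete, here is a simplified userspace sketch. It is not the kernel's vsnprintf(): bounded_copy() is a hypothetical helper that only copies a string, and it uses an index in place of the str/end pointers so that a size of zero never forms a pointer before the buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code. Copies `src` into `buf`, never
 * touching more than `size` bytes, always NUL-terminating when size > 0,
 * and returning the length the full string would have needed (excluding
 * the NUL), in the style of snprintf().
 */
static size_t bounded_copy(char *buf, size_t size, const char *src)
{
	size_t pos = 0;		/* plays the role of (str - buf) */

	for (; *src; ++src) {
		/* size - 1 plays the role of (end - buf): last writable byte */
		if (size > 0 && pos <= size - 1)
			buf[pos] = *src;
		++pos;		/* keep counting even once we stop writing */
	}

	/* Terminate inside the buffer; write nothing at all if size is 0. */
	if (size > 0)
		buf[pos <= size - 1 ? pos : size - 1] = '\0';

	return pos;
}
```

For a size of 4 and a 13-character source this writes exactly four bytes, "111" plus the terminator, and returns 13.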
* Re: 2.6.17-mm2
2006-06-28 10:47 ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 14:43 ` Martin J. Bligh
2006-06-28 15:06 ` 2.6.17-mm2 Andy Whitcroft
2006-06-28 19:11 ` 2.6.17-mm2 Andrew Morton
0 siblings, 2 replies; 11+ messages in thread
From: Martin J. Bligh @ 2006-06-28 14:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev
Andrew Morton wrote:
> On Wed, 28 Jun 2006 03:42:15 -0700
> Andrew Morton <akpm@osdl.org> wrote:
>
>
>>This is caused by the vsprintf() changes. Right now, if you do
>>
>> snprintf(buf, 4, "1111111111111");
>>
>>the memory at `buf' gets [31 31 31 31 00], which is not good.
>>
>>This'll plug it, but I didn't check very hard whether it still has any
>>off-by-ones, or if breaks the intent of Jeremy's patch. I think it's OK..
Aha, you're a genius! How the hell did you figure that one out?
Andy / Steve ... any chance one of you could kick this through the
harness? Against -git10 or so, I'd think
Thanks,
M.
> That diff was against an older kernel and doesn't apply. This is against
> mainline:
>
> --- a/lib/vsprintf.c~vsnprintf-fix
> +++ a/lib/vsprintf.c
> @@ -259,7 +259,9 @@ int vsnprintf(char *buf, size_t size, co
> int len;
> unsigned long long num;
> int i, base;
> - char *str, *end, c;
> + char *str; /* Where we're writing to */
> + char *end; /* The last byte we can write to */
> + char c;
> const char *s;
>
> int flags; /* flags to number() */
> @@ -283,12 +285,12 @@ int vsnprintf(char *buf, size_t size, co
> }
>
> str = buf;
> - end = buf + size;
> + end = buf + size - 1;
>
> /* Make sure end is always >= buf */
> - if (end < buf) {
> + if (end < buf - 1) {
> end = ((void *)-1);
> - size = end - buf;
> + size = end - buf + 1;
> }
>
> for (; *fmt ; ++fmt) {
> @@ -494,7 +496,6 @@ int vsnprintf(char *buf, size_t size, co
> /* the trailing null byte doesn't count towards the total */
> return str-buf;
> }
> -
> EXPORT_SYMBOL(vsnprintf);
>
> /**
> _
>
* Re: 2.6.17-mm2
2006-06-28 14:43 ` 2.6.17-mm2 Martin J. Bligh
@ 2006-06-28 15:06 ` Andy Whitcroft
2006-06-28 19:11 ` 2.6.17-mm2 Andrew Morton
1 sibling, 0 replies; 11+ messages in thread
From: Andy Whitcroft @ 2006-06-28 15:06 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andrew Morton, jeremy, drfickle, linux-kernel, mbligh,
linuxppc64-dev
Martin J. Bligh wrote:
> Andrew Morton wrote:
>
>> On Wed, 28 Jun 2006 03:42:15 -0700
>> Andrew Morton <akpm@osdl.org> wrote:
>>
>>
>>> This is caused by the vsprintf() changes. Right now, if you do
>>>
>>> snprintf(buf, 4, "1111111111111");
>>>
>>> the memory at `buf' gets [31 31 31 31 00], which is not good.
>>>
>>> This'll plug it, but I didn't check very hard whether it still has any
>>> off-by-ones, or if breaks the intent of Jeremy's patch. I think it's
>>> OK..
>
>
> Aha, you're a genius! How the hell did you figure that one out?
>
> Andy / Steve ... any chance one of you could kick this through the
> harness? Against -git10 or so, I'd think
>
> Thanks,
Suitably kicked ... against 2.6.17-git10.
-apw
* Re: 2.6.17-mm2
2006-06-28 10:42 ` 2.6.17-mm2 Andrew Morton
2006-06-28 10:47 ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 15:43 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-28 15:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linuxppc64-dev, linux-kernel, mbligh, Martin J. Bligh
Andrew Morton wrote:
> This is caused by the vsprintf() changes. Right now, if you do
>
> snprintf(buf, 4, "1111111111111");
>
> the memory at `buf' gets [31 31 31 31 00], which is not good.
>
> This'll plug it, but I didn't check very hard whether it still has any
> off-by-ones, or if breaks the intent of Jeremy's patch. I think it's OK..
>
Damn. This patch doesn't look right; the intent is that 'end' point to
just beyond the formatted string. I'm pretty sure I tested this, since
it's the most obvious test. Clearly not enough. I'll look into it.
J
* Re: 2.6.17-mm2
2006-06-28 14:43 ` 2.6.17-mm2 Martin J. Bligh
2006-06-28 15:06 ` 2.6.17-mm2 Andy Whitcroft
@ 2006-06-28 19:11 ` Andrew Morton
2006-06-28 19:22 ` 2.6.17-mm2 Jeremy Fitzhardinge
2006-06-28 19:36 ` 2.6.17-mm2 Martin Bligh
1 sibling, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 19:11 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev
On Wed, 28 Jun 2006 07:43:14 -0700
"Martin J. Bligh" <mbligh@google.com> wrote:
> Andrew Morton wrote:
> > On Wed, 28 Jun 2006 03:42:15 -0700
> > Andrew Morton <akpm@osdl.org> wrote:
> >
> >
> > >>This is caused by the vsprintf() changes. Right now, if you do
> >>
> >> snprintf(buf, 4, "1111111111111");
> >>
> >>the memory at `buf' gets [31 31 31 31 00], which is not good.
> >>
> >>This'll plug it, but I didn't check very hard whether it still has any
> >>off-by-ones, or if breaks the intent of Jeremy's patch. I think it's OK..
>
> Aha, you're a genius!
That's not what my kids say.
> How the hell did you figure that one out?
Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
tight loop. With that happening, a little two-way wasn't able to make
it through `dbench 4' without soiling the upholstery. Then bisection-searching.
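If a test harness can't shell out, roughly the same read load can be generated from C. A minimal sketch only: drain_file_repeatedly() is a hypothetical name, the chunk size is arbitrary, and the caller needs read permission on whatever path it passes (e.g. /proc/slabinfo).

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, read it to EOF and discard the data,
 * `iterations` times over. Returns the total number of bytes read, or -1
 * on the first open() or read() failure.
 */
static long drain_file_repeatedly(const char *path, int iterations)
{
	char chunk[4096];
	long total = 0;
	int i;

	for (i = 0; i < iterations; i++) {
		ssize_t n;
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			return -1;

		/* Equivalent of `cat path > /dev/null`: read and throw away. */
		while ((n = read(fd, chunk, sizeof(chunk))) > 0)
			total += n;

		close(fd);
		if (n < 0)
			return -1;
	}

	return total;
}
```

Run it with a large iteration count alongside the dbench workload to approximate the tight loop.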
* Re: 2.6.17-mm2
2006-06-28 19:11 ` 2.6.17-mm2 Andrew Morton
@ 2006-06-28 19:22 ` Jeremy Fitzhardinge
2006-06-28 19:49 ` 2.6.17-mm2 Andrew Morton
2006-06-28 19:36 ` 2.6.17-mm2 Martin Bligh
1 sibling, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2006-06-28 19:22 UTC (permalink / raw)
To: Andrew Morton
Cc: drfickle, linux-kernel, mbligh, Martin J. Bligh, linuxppc64-dev
Andrew Morton wrote:
> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> tight loop. With that happening, a little two-way wasn't able to make
> it through `dbench 4' without soiling the upholstery. Then bisection-searching.
>
It's surprising it was so subtle. I'd been running with that code for a
month or so without a peep of problem...
J
* Re: 2.6.17-mm2
2006-06-28 19:11 ` 2.6.17-mm2 Andrew Morton
2006-06-28 19:22 ` 2.6.17-mm2 Jeremy Fitzhardinge
@ 2006-06-28 19:36 ` Martin Bligh
2006-06-29 0:17 ` 2.6.17-mm2 Martin J. Bligh
1 sibling, 1 reply; 11+ messages in thread
From: Martin Bligh @ 2006-06-28 19:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: jeremy, drfickle, linux-kernel, mbligh, linuxppc64-dev
>>How the hell did you figure that one out?
>
> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> tight loop. With that happening, a little two-way wasn't able to make
> it through `dbench 4' without soiling the upholstery. Then bisection-searching.
Aha. We probably trigger it because the automated test harness dumps a
bunch of crap out of /proc before and after running dbench then ;-)
Thanks!
M.
* Re: 2.6.17-mm2
2006-06-28 19:22 ` 2.6.17-mm2 Jeremy Fitzhardinge
@ 2006-06-28 19:49 ` Andrew Morton
0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2006-06-28 19:49 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: drfickle, linux-kernel, mbligh, mbligh, linuxppc64-dev
On Wed, 28 Jun 2006 12:22:02 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Andrew Morton wrote:
> > Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
> > tight loop. With that happening, a little two-way wasn't able to make
> > it through `dbench 4' without soiling the upholstery. Then bisection-searching.
> >
> It's surprising it was so subtle. I'd been running with that code for a
> month or so without a peep of problem...
>
It'll only bite if someone does snprintf() into a too-short buffer. That's
rare (it's usually a bug). But it looks like the seq_file() code does it
when someone is trying to generate more than PAGE_SIZE's worth of data.
Like /proc/slabinfo.
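Roughly, the pattern looks like this. It is a userspace model under assumptions, not the actual fs/seq_file.c code: struct seq_model and seq_model_printf() are made-up names. Each record is formatted into whatever space is left in a fixed buffer, and a return length that doesn't fit is how overflow gets detected, so the formatter is routinely handed a too-short buffer near the end of the page.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a seq_file-style output buffer. */
struct seq_model {
	char *buf;
	size_t size;	/* capacity of buf */
	size_t count;	/* bytes used so far */
};

/*
 * Append formatted text. Returns 0 if it fit, or -1 if the buffer
 * overflowed, in which case count is set to size so the caller can see
 * the buffer is full and retry with a bigger one.
 */
static int seq_model_printf(struct seq_model *m, const char *fmt, ...)
{
	va_list args;
	int len;

	if (m->count < m->size) {
		va_start(args, fmt);
		/* This call relies on vsnprintf() honouring the size limit. */
		len = vsnprintf(m->buf + m->count, m->size - m->count,
				fmt, args);
		va_end(args);
		if (len >= 0 && m->count + (size_t)len < m->size) {
			m->count += (size_t)len;
			return 0;
		}
	}

	m->count = m->size;	/* mark the buffer as full */
	return -1;
}
```

If vsnprintf() writes even one byte past the limit it was given, the overflowing call scribbles on whatever sits after the buffer, which would fit the slab corruption seen in the traces.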
* Re: 2.6.17-mm2
2006-06-28 19:36 ` 2.6.17-mm2 Martin Bligh
@ 2006-06-29 0:17 ` Martin J. Bligh
0 siblings, 0 replies; 11+ messages in thread
From: Martin J. Bligh @ 2006-06-29 0:17 UTC (permalink / raw)
To: Martin Bligh
Cc: Andrew Morton, jeremy, drfickle, linux-kernel, mbligh,
linuxppc64-dev
Martin Bligh wrote:
>>> How the hell did you figure that one out?
>>
>>
>> Found a way to reproduce it - do `cat /proc/slabinfo > /dev/null' in a
>> tight loop. With that happening, a little two-way wasn't able to make
>> it through `dbench 4' without soiling the upholstery. Then
>> bisection-searching.
>
>
> Aha. we probably trigger it because the automated test harness dumps a
> bunch of crap out of /proc before and after running dbench then ;-)
OK, your patch does seem to fix it for the automated tests. Not 100%
reliable, since it was a little intermittent before, but it looks
good.
Thanks to both Andrew and Andy.
M.