* Re: Oops in pdflush
2004-02-20 13:34 Oops in pdflush Andreas Schwab
@ 2004-02-20 14:18 ` Keith Owens
2004-02-20 14:52 ` Andreas Schwab
` (26 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: Keith Owens @ 2004-02-20 14:18 UTC (permalink / raw)
To: linux-ia64
On Fri, 20 Feb 2004 14:34:03 +0100,
Andreas Schwab <schwab@suse.de> wrote:
>This happens regularly on all our Intel Tiger SMP systems, haven't seen
>it on UP systems, and neither on any HP system we have. These oopses are
>all from different systems running kernels 2.6.2 and 2.6.3 in various
>incarnations. They are all using e1000 and mptscsih, unlike the HP
>systems, so it might be a bug in either of those drivers.
>
>pdflush[5742]: Oops 11012296146944 [1]
>
>Pid: 5742, CPU 2, comm: pdflush
>psr : 0000121008026018 ifs : 800000000000040b ip : [<a000000100485d01>] Not tainted
>ip is at ip_finish_output2+0x41/0x560
>unat: 0000000000000000 pfs : 0000000000000797 rsc : 0000000000000003
>rnat: e000000007ac7890 bsps: 0000000000001310 pr : 82aa6aa6a555969b
>ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
>csd : 0000000000000000 ssd : 0000000000000000
>b0 : a000000100451900 b6 : a000000100282bc0 b7 : a000000100485cc0
>f6 : 000000000000000000000 f7 : 1003e000000000011d541
>f8 : 1003e000000000016fa06 f9 : 1003e0000000000b38153
>f10 : 1003e000000000023aa82 f11 : 1003e0000000000000001
>r1 : a000000100988130 r2 : 000000005dde178a r3 : 0000000000100000
>r8 : 0000000000000001 r9 : 00000000000006a8 r10 : 840d3f83e0420336
>r11 : 840d3f83e04202b6 r12 : e000000007286830 r13 : e000000007280000
>r14 : 0000000000000001 r15 : a0000001004af498 r16 : a0000001004af530
>r17 : 000000005e01380a r18 : 0000000000232080 r19 : a0000001004af528
>r20 : a00000010084b2c0 r21 : 0000000000100000 r22 : 0000000000100000
>r23 : a0000001007e4ca0 r24 : a000000100485cc0 r25 : e000000007286840
>r26 : 00000000ffffffff r27 : e00000003feb66d8 r28 : 0000000000000000
>r29 : e00000003feb66d4 r30 : 0000000000100000 r31 : e00000003feb66d0
>
>Call Trace:
> [<a0000001000169e0>] show_stack+0x80/0xa0
> sp=e000000007286a38 bsp=e000000007286a18
> [<a000000100039270>] die+0x170/0x200
> sp=e000000007286c08 bsp=e0000000072869d8
> [<a000000100057760>] ia64_do_page_fault+0x720/0xa60
> sp=e000000007286c08 bsp=e000000007286970
> [<a00000010000d560>] ia64_leave_kernel+0x0/0x260
> sp=e000000007286c98 bsp=e000000007286970
> <0>Kernel panic: Aiee, killing interrupt handler!
>In interrupt handler - not syncing
You should be getting a better backtrace than that. ia64_leave_kernel
is the start of the interrupt handler for the oops; the preceding trace
points are the interesting ones, but they are missing.
Which compiler are you using? It could be a compiler fault, not
generating correct unwind data.
Can you try running with the kdb patch to see if that gives a better
backtrace for the oops?
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-02-20 14:52 UTC (permalink / raw)
To: linux-ia64
Keith Owens <kaos@sgi.com> writes:
> You should be getting a better backtrace than that. ia64_leave_kernel
> is the start of the interrupt handler for the oops; the preceding trace
> points are the interesting ones, but they are missing.
>
> Which compiler are you using?
gcc 3.3.3
> It could be a compiler fault, not generating correct unwind data.
The unwind check says this:
ERROR: ia64_monarch_init_handler: 186 slots, total region length = 0
1 error detected in 8160 functions.
> Can you try running with the kdb patch to see if that gives a better
> backtrace for the oops?
I'll try.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: David Mosberger @ 2004-02-20 16:41 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 20 Feb 2004 14:34:03 +0100, Andreas Schwab <schwab@suse.de> said:
Andreas> This happens regularly on all our Intel Tiger SMP systems,
Andreas> haven't seen it on UP systems, and neither on any HP system
Andreas> we have. These oopses are all from different systems
Andreas> running kernels 2.6.2 and 2.6.3 in various incarnations.
As an additional data-point: we have a tiger here, too, and it's
running 2.6.2-rc2 just fine (it's a build-server so we try not to
reboot it too often). Are you saying that 2.6.1 was OK and you
started to see the problems only with 2.6.2?
--david
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-02-20 17:11 UTC (permalink / raw)
To: linux-ia64
David Mosberger <davidm@napali.hpl.hp.com> writes:
>>>>>> On Fri, 20 Feb 2004 14:34:03 +0100, Andreas Schwab <schwab@suse.de> said:
>
> Andreas> This happens regularly on all our Intel Tiger SMP systems,
> Andreas> haven't seen it on UP systems, and neither on any HP system
> Andreas> we have. These oopses are all from different systems
> Andreas> running kernels 2.6.2 and 2.6.3 in various incarnations.
>
> As an additional data-point: we have a tiger here, too, and it's
> running 2.6.2-rc2 just fine (it's a build-server so we try not to
> reboot it too often). Are you saying that 2.6.1 was OK and you
> started to see the problems only with 2.6.2?
I can't tell when it started, but I think 2.6.1 already had similar
problems.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: David Mosberger @ 2004-02-20 23:09 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 20 Feb 2004 18:11:36 +0100, Andreas Schwab <schwab@suse.de> said:
Andreas> David Mosberger <davidm@napali.hpl.hp.com> writes:
>>>>>>> On Fri, 20 Feb 2004 14:34:03 +0100, Andreas Schwab <schwab@suse.de> said:
Andreas> This happens regularly on all our Intel Tiger SMP systems,
Andreas> haven't seen it on UP systems, and neither on any HP system
Andreas> we have. These oopses are all from different systems
Andreas> running kernels 2.6.2 and 2.6.3 in various incarnations.
>> As an additional data-point: we have a tiger here, too, and it's
>> running 2.6.2-rc2 just fine (it's a build-server so we try not to
>> reboot it too often). Are you saying that 2.6.1 was OK and you
>> started to see the problems only with 2.6.2?
Andreas> I can't tell when it started, but I think 2.6.1 already had similar
Andreas> problems.
That's rather strange. I'm sure the Intel folks test on Tiger all the
time and Andrew also seems to test on his Tiger quite frequently.
Is this with a particular workload?
In any case, it's not good that the stack-trace is truncated.
gcc-3.3.3 should be good enough, so I'm not sure off hand what's going
wrong. If you could enable unwind-debugging (set UNW_DEBUG to 5 in
arch/ia64/kernel/unwind.c) and capture the resulting output during a
crash, it might get us further.
--david
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-02-22 13:58 UTC (permalink / raw)
To: linux-ia64
Keith Owens <kaos@sgi.com> writes:
> Can you try running with the kdb patch to see if that gives a better
> backtrace for the oops?
kdb couldn't do any better.
[2]kdb> bt
Stack traceback for pid 21693
0xe00000002faf0000 21693 1 1 2 R 0xe00000002faf04b0 *pdflush
0xa0000001003ff290 kdba_main_loop+0x150
args (0x5, 0x5, 0x20000000030, 0x4, 0xe00000002faf6640)
kernel 0xa0000001003ff140 0xa0000001003ff2c0
0xa000000100283e60 kdb+0x820
args (0x5, 0x20000000030, 0xe00000002faf6640, 0xa00000010082bbd0, 0xa000000100887a84)
kernel 0xa000000100283640 0xa000000100284bc0
0xa0000001000393c0 die+0x1e0
args (0xe00000002faf64b0, 0xe00000002faf6640, 0x20000000030, 0xa00000010071fa80, 0xa000000100039570)
kernel 0xa0000001000391e0 0xa000000100039400
0xa000000100039570 ia64_fault+0x110
args (0x18, 0x20000000030, 0xe000000004c86b60, 0x3, 0xe00000002faf6640)
kernel 0xa000000100039460 0xa00000010003a4e0
0xa00000010000d640 ia64_leave_kernel
args (0x18, 0x20000000030, 0xe000000004c86b60, 0x3, 0xe00000002faf6640)
kernel 0xa00000010000d640 0xa00000010000d8a0
0xe00000002faf5f00 - No name. May be an area that has no unwind data
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-22 14:08 UTC (permalink / raw)
To: linux-ia64
On Sun, 22 Feb 2004 14:58:34 +0100,
Andreas Schwab <schwab@suse.de> wrote:
>[2]kdb> bt
>Stack traceback for pid 21693
>0xe00000002faf0000 21693 1 1 2 R 0xe00000002faf04b0 *pdflush
>0xa0000001003ff290 kdba_main_loop+0x150
> args (0x5, 0x5, 0x20000000030, 0x4, 0xe00000002faf6640)
> kernel 0xa0000001003ff140 0xa0000001003ff2c0
>0xa000000100283e60 kdb+0x820
> args (0x5, 0x20000000030, 0xe00000002faf6640, 0xa00000010082bbd0, 0xa000000100887a84)
> kernel 0xa000000100283640 0xa000000100284bc0
>0xa0000001000393c0 die+0x1e0
> args (0xe00000002faf64b0, 0xe00000002faf6640, 0x20000000030, 0xa00000010071fa80, 0xa000000100039570)
> kernel 0xa0000001000391e0 0xa000000100039400
>0xa000000100039570 ia64_fault+0x110
> args (0x18, 0x20000000030, 0xe000000004c86b60, 0x3, 0xe00000002faf6640)
> kernel 0xa000000100039460 0xa00000010003a4e0
>0xa00000010000d640 ia64_leave_kernel
> args (0x18, 0x20000000030, 0xe000000004c86b60, 0x3, 0xe00000002faf6640)
> kernel 0xa00000010000d640 0xa00000010000d8a0
>0xe00000002faf5f00 - No name. May be an area that has no unwind data
Which makes it a general unwind problem. Apply this patch to turn on
unwind debugging when in kdb.
Index: 25.2/arch/ia64/kernel/unwind.c
--- 25.2/arch/ia64/kernel/unwind.c Wed, 11 Feb 2004 11:17:55 +1100 kaos (linux-2.4/r/c/42_unwind.c 1.1.2.1.1.2.3.1.1.1.1.3.1.1.1.2 644)
+++ 25.2(w)/arch/ia64/kernel/unwind.c Mon, 23 Feb 2004 01:05:02 +1100 kaos (linux-2.4/r/c/42_unwind.c 1.1.2.1.1.2.3.1.1.1.1.3.1.1.1.2 644)
@@ -56,11 +56,12 @@
#define UNW_STATS 0 /* WARNING: this disabled interrupts for long time-spans!! */
+#define UNW_DEBUG 6
#ifdef UNW_DEBUG
static unsigned int unw_debug_level = UNW_DEBUG;
# ifdef CONFIG_KDB
# include <linux/kdb.h>
-# define UNW_DEBUG_ON(n) (unw_debug_level >= n && !KDB_IS_RUNNING())
+# define UNW_DEBUG_ON(n) (unw_debug_level >= n && KDB_IS_RUNNING())
# define UNW_DPRINT(n, ...) if (UNW_DEBUG_ON(n)) kdb_printf(__VA_ARGS__)
# else /* !CONFIG_KDB */
# define UNW_DEBUG_ON(n) unw_debug_level >= n
When pdflush drops into kdb, type these commands. I assume that you
can capture the output on a serial console.
set LINES 2000
set BTSP 1
bt
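The change to UNW_DEBUG_ON in the patch above flips when unwind tracing is emitted: before the patch only while kdb is not running, after it only while kdb is running. A minimal user-space sketch of that gating follows. It assumes a hypothetical kdb_running flag in place of KDB_IS_RUNNING() and plain functions in place of the macros, so it only illustrates the logic and is not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real kdb interface. */
unsigned int unw_debug_level = 6;   /* the UNW_DEBUG value set by the patch */
bool kdb_running = false;           /* stands in for KDB_IS_RUNNING() */

/* Gating before the patch: trace only while kdb is NOT running. */
bool debug_on_before(unsigned int n)
{
    return unw_debug_level >= n && !kdb_running;
}

/* Gating after the patch: trace only while kdb IS running. */
bool debug_on_after(unsigned int n)
{
    return unw_debug_level >= n && kdb_running;
}
```

With kdb_running set to true, debug_on_before(5) is false and debug_on_after(5) is true, which is why the extra unwind output would show up only for the bt issued from the kdb prompt.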
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-02-22 16:52 UTC (permalink / raw)
To: linux-ia64
Keith Owens <kaos@sgi.com> writes:
> +#define UNW_DEBUG 6
> #ifdef UNW_DEBUG
> static unsigned int unw_debug_level = UNW_DEBUG;
> # ifdef CONFIG_KDB
> # include <linux/kdb.h>
> -# define UNW_DEBUG_ON(n) (unw_debug_level >= n && !KDB_IS_RUNNING())
> +# define UNW_DEBUG_ON(n) (unw_debug_level >= n && KDB_IS_RUNNING())
> # define UNW_DPRINT(n, ...) if (UNW_DEBUG_ON(n)) kdb_printf(__VA_ARGS__)
> # else /* !CONFIG_KDB */
> # define UNW_DEBUG_ON(n) unw_debug_level >= n
This does not apply to the current sources, even with the kdb patch applied.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: Grant Grundler @ 2004-02-24 1:54 UTC (permalink / raw)
To: linux-ia64
On Fri, Feb 20, 2004 at 02:34:03PM +0100, Andreas Schwab wrote:
> This happens regularly on all our Intel Tiger SMP systems, haven't seen
> it on UP systems, and neither on any HP system we have. These oopses are
> all from different systems running kernels 2.6.2 and 2.6.3 in various
> incarnations. They are all using e1000 and mptscsih, unlike the HP
> systems, so it might be a bug in either of those drivers.
I have been running e1000 (and tg3) drivers locally quite a bit
over the past couple of months.
rx2000 uses e1000 for built-in GigE NIC.
rx2600, zx6000, and a few other machines use LSI 53c1030 chip
(MPT/Fusion driver) on the motherboard. My rx2600 seems to be
happy with 2.6.3-rc2 though I've not run any disk IO stress
tests on it lately.
hth,
grant
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-02-27 10:16 UTC (permalink / raw)
To: linux-ia64
David Mosberger <davidm@napali.hpl.hp.com> writes:
> In any case, it's not good that the stack-trace is truncated.
> gcc-3.3.3 should be good enough, so I'm not sure off hand what's going
> wrong. If you could enable unwind-debugging (set UNW_DEBUG to 5 in
> arch/ia64/kernel/unwind.c) and capture the resulting output during a
> crash, it might get us further.
Here's what I get:
pdflush[18140]: Oops 11012296146944 [1]
Pid: 18140, CPU 1, comm: pdflush
psr : 0000121008026018 ifs : 8000000000000590 ip : [<a00000010046e0d1>] Not tainted
ip is at nf_iterate+0x111/0x240
unat: 0000000000000000 pfs : 0000000000000590 rsc : 0000000000000003
rnat: e00000003ccc4800 bsps: 0000000000000000 pr : 82aa6aa6a555a59b
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000010046e0f0 b6 : a0000001002968e0 b7 : a000000100260540
f6 : 000000000000000000000 f7 : 1003e000000000011d541
f8 : 1003e0000000000680264 f9 : 1003e00000000032c92ae
f10 : 1003e000000000023aa82 f11 : 1003e0000000000000001
r1 : a000000100a17200 r2 : ffffffffffefffff r3 : 0000000000100000
r8 : 0000000000000001 r9 : 0000000000000000 r10 : a000000100a17228
r11 : 0000000000000000 r12 : e0000000110e6790 r13 : e0000000110e0000
r14 : 0000000000000001 r15 : a000000100a17200 r16 : a000000100a17210
r17 : e00000003feb5400 r18 : a000000100a17200 r19 : 0000000000000000
r20 : a000000100a17200 r21 : 0000000000100000 r22 : 0000000000100000
r23 : a000000100885220 r24 : e0000000110e6750 r25 : e00000003feb541c
r26 : 00000000ffffffff r27 : e00000003feb5418 r28 : 0000000000000000
r29 : e00000003feb5414 r30 : 0000000000100000 r31 : e00000003feb5410
unwind.init_frame_info:
task 0xe0000000110e0000
rbs = [0xe0000000110e0ef0-0xe0000000110e6ac8)
stk = [0xe0000000110e6ac8-0xe0000000110e8000)
pr 0x82aa6aa6a55596a7
sw 0xe0000000110e6160
sp 0xe0000000110e6ac8
unwind.unw_init_frame_info:
bsp 0xe0000000110e6aa8
sol 0x4
ip 0xa000000100016ac0
unwind.build_script: ip 0xa000000100016ac0
unwind.build_script: state record for func 0xa000000100016a40, t=24:
ar.pfs <- r34 0
psp <- psp+0x1d0 1
rp <- r33 4
Call Trace:
[<a000000100016ac0>] show_stack+0x80/0xa0
sp=e0000000110e6ac8 bsp=e0000000110e6aa8
unwind.build_script: ip 0xa000000100039350
unwind.build_script: state record for func 0xa0000001000391e0, t=69:
ar.pfs <- r37 0
rp <- r36 4
[<a000000100039350>] die+0x170/0x220
sp=e0000000110e6c98 bsp=e0000000110e6a70
unwind.build_script: ip 0xa000000100058e60
unwind.build_script: state record for func 0xa000000100058740, t=342:
ar.pfs <- r43 0
psp <- psp+0x90 1
rp <- r42 10
[<a000000100058e60>] ia64_do_page_fault+0x720/0xa60
sp=e0000000110e6c98 bsp=e0000000110e6a08
unwind.build_script: ip 0xa00000010000d640
unwind.desc_abi: interrupt frame
unwind.build_script: state record for func 0xa00000010000d640, t=0:
ar.pfs <- [sp+0x60] -1
psp <- psp+0x1d0 -1
rp <- [sp+0x58] -1
ar.unat <- [sp+0x68] -1
pr <- [sp+0x90] -1
ar.fpsr <- [sp+0xc0] -1
[<a00000010000d640>] ia64_leave_kernel+0x0/0x260
sp=e0000000110e6d28 bsp=e0000000110e6a08
unwind.unw_unwind: reached user-space (ip=0x148e)
unwind.init_frame_info:
task 0xe0000000110e0000
rbs = [0xe0000000110e0ef0-0xe0000000110e6b88)
stk = [0xe0000000110e6b88-0xe0000000110e8000)
pr 0x82aa6955a69aa99b
sw 0xe0000000110e6300
sp 0xe0000000110e6b88
unwind.init_frame_info:
task 0xe0000000026f8000
rbs = [0xe0000000026f8ef0-0xe0000000026f93d8)
stk = [0xe0000000026ffbf0-0xe000000002700000)
pr 0x90000050a655955b
sw 0xe0000000026ff9f0
sp 0xe0000000026ffbf0
unwind.init_frame_info:
task 0xe000000021e10000
rbs = [0xe000000021e10ef0-0xe000000021e11a10)
stk = [0xe000000021e173a0-0xe000000021e18000)
pr 0x900155566655955b
sw 0xe000000021e171a0
sp 0xe000000021e173a0
unwind.init_frame_info:
task 0xe000000003b70000
rbs = [0xe000000003b70ef0-0xe000000003b71948)
stk = [0xe000000003b77810-0xe000000003b78000)
pr 0x90000050a655959b
sw 0xe000000003b77610
sp 0xe000000003b77810
unwind.unw_init_frame_info:
bsp 0xe0000000026f9370
sol 0xd
ip 0xa0000001003ff290
unwind.unw_init_frame_info:
bsp 0xe000000003b718e0
sol 0xd
ip 0xa0000001003ff290
unwind.unw_init_frame_info:
bsp 0xe000000021e119a0
sol 0xd
ip 0xa0000001003ff290
unwind.build_script: ip 0xa0000001003ff290
unwind.build_script: ip 0xa0000001003ff290
unwind.build_script: ip 0xa0000001003ff290
unwind.build_script: state record for func 0xa0000001003ff140, t=63:
ar.pfs <- r43 0
psp <- psp+0x20 1
rp <- r42 7
unwind.build_script: state record for func 0xa0000001003ff140, t=63:
ar.pfs <- r43 0
psp <- psp+0x20 1
rp <- r42 7
unwind.build_script: state record for func 0xa0000001003ff140, t=63:
ar.pfs <- r43 0
psp <- psp+0x20 1
rp <- r42 7
unwind.unw_init_frame_info:
bsp 0xe0000000110e6b20
sol 0xd
ip 0xa0000001003ff290
unwind.build_script: ip 0xa0000001003ff290
unwind.build_script: state record for func 0xa0000001003ff140, t=63:
ar.pfs <- r43 0
psp <- psp+0x20 1
rp <- r42 7
Entering kdb (current=0xe0000000110e0000, pid 18140) on processor 1 Oops: <NULL>
due to oops @ 0xa00000010046e0d1
psr: 0x0000121008026018 ifs: 0x8000000000000590 ip: 0xa00000010046e0d0
unat: 0x0000000000000000 pfs: 0x0000000000000590 rsc: 0x0000000000000003
rnat: 0xe00000003ccc4800 bsps: 0x0000000000000000 pr: 0x82aa6aa6a555a59b
ldrs: 0x0000000000000000 ccv: 0x0000000000000000 fpsr: 0x0009804c0270033f
b0: 0xa00000010046e0f0 b6: 0xa0000001002968e0 b7: 0xa000000100260540
r1: 0xa000000100a17200 r2: 0xffffffffffefffff r3: 0x0000000000100000
r8: 0x0000000000000001 r9: 0x0000000000000000 r10: 0xa000000100a17228
r11: 0x0000000000000000 r12: 0xe0000000110e6790 r13: 0xe0000000110e0000
r14: 0x0000000000000001 r15: 0xa000000100a17200 r16: 0xa000000100a17210
r17: 0xe00000003feb5400 r18: 0xa000000100a17200 r19: 0x0000000000000000
r20: 0xa000000100a17200 r21: 0x0000000000100000 r22: 0x0000000000100000
r23: 0xa000000100885220 r24: 0xe0000000110e6750 r25: 0xe00000003feb541c
r26: 0x00000000ffffffff r27: 0xe00000003feb5418 r28: 0x0000000000000000
r29: 0xe00000003feb5414 r30: 0x0000000000100000 r31: 0xe00000003feb5410
&regs = e0000000110e65d0
Andreas.
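The ip and func addresses in the unwind.build_script lines above are related by simple arithmetic, which can help when reading such a trace. The sketch below assumes the usual IA-64 convention of 16-byte bundles holding three instruction slots each, with the low four bits of ip giving the slot within the bundle. The function name unwind_slot is made up for this illustration and is not taken from unwind.c.

```c
#include <stdint.h>

/*
 * Assumption: the unwinder measures a position inside a function as a
 * slot count from the function start. Each 16-byte bundle contributes
 * three slots, and the low four bits of ip select the slot in the bundle.
 */
uint64_t unwind_slot(uint64_t ip, uint64_t func_start)
{
    uint64_t bundle_offset = (ip & ~UINT64_C(0xf)) - func_start;

    return 3 * (bundle_offset / 16) + (ip & UINT64_C(0xf));
}
```

For the show_stack entry, ip 0xa000000100016ac0 and func 0xa000000100016a40 are 0x80 bytes apart, which is 8 bundles or 24 slots. The same calculation gives 69 for die, 342 for ia64_do_page_fault and 63 for the frames at 0xa0000001003ff290.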
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-27 13:58 UTC (permalink / raw)
To: linux-ia64
On Fri, 27 Feb 2004 11:16:03 +0100,
Andreas Schwab <schwab@suse.de> wrote:
>pdflush[18140]: Oops 11012296146944 [1]
>
>Pid: 18140, CPU 1, comm: pdflush
>psr : 0000121008026018 ifs : 8000000000000590 ip : [<a00000010046e0d1>] Not tainted
>ip is at nf_iterate+0x111/0x240
>unwind.init_frame_info:
> task 0xe0000000110e0000
> rbs = [0xe0000000110e0ef0-0xe0000000110e6ac8)
> stk = [0xe0000000110e6ac8-0xe0000000110e8000)
> pr 0x82aa6aa6a55596a7
> sw 0xe0000000110e6160
> sp 0xe0000000110e6ac8
Ouch. rbs and stack have collided, kernel stack overflow. rbs shows
a normal start, then it loops with the same data over and over again
0xe0000000110e0ef0 3d0bf108 ....=.ñ.
0xe0000000110e0ef8 3fdad668 ....?ÚÖh
0xe0000000110e0f00 3d376460 ....=7d`
0xe0000000110e0f08 3c72c008 ....<rÀ.
0xe0000000110e0f10 3d376660 ....=7f`
0xe0000000110e0f18 00000000 ........
0xe0000000110e0f20 3fd1ebf8 ....?Ñëø
0xe0000000110e0f28 8000000000000001 ........
0xe0000000110e0f30 3d376e90 ....=7n.
0xe0000000110e0f38 3d376a60 ....=7j`
0xe0000000110e0f40 3d376e80 ....=7n.
0xe0000000110e0f48 3d376ea8 ....=7n¨
0xe0000000110e0f50 00000001 ........
0xe0000000110e0f58 3d376e88 ....=7n.
0xe0000000110e0f60 3d375810 ....=7X.
0xe0000000110e0f68 3c91e590 ....<.å.
0xe0000000110e0f70 3c8d7060 ....<.p`
0xe0000000110e0f78 00000206 ........
0xe0000000110e0f80 3cb02000 ....<° .
0xe0000000110e0f88 3d0bf108 ....=.ñ.
0xe0000000110e0f90 00001491 ........
0xe0000000110e0f98 3c8d8b50 ....<..P
0xe0000000110e0fa0 00000998 ........
0xe0000000110e0fa8 80000000afb5952b ....¯µ.+
0xe0000000110e0fb0 3ff42000 ....?ô .
0xe0000000110e0fb8 00000001 ........
0xe0000000110e0fc0 3fdacd90 ....?ÚÍ.
0xe0000000110e0fc8 a00000010082d898 num_physpages
0xe0000000110e0fd0 a0000001000085a0 _start+0x280
0xe0000000110e0fd8 00000998 ........
0xe0000000110e0fe0 a000000100a17200 ....¡r.
0xe0000000110e0fe8 a00000010068ce90 start_kernel+0x530
0xe0000000110e0ff0 00000611 ........
0xe0000000110e0ff8 00000000 ........
0xe0000000110e1000 00000611 ........
0xe0000000110e1008 a000000100680990 __kstrtab_csum_partial_copy_nocheck+0xbc80
0xe0000000110e1010 00000000 ........
0xe0000000110e1018 00000000 ........
0xe0000000110e1020 e0000000046e0000 à....n..
0xe0000000110e1028 a000000100009090 rest_init+0x30
0xe0000000110e1030 00000186 ........
0xe0000000110e1038 a000000100a17200 ....¡r.
0xe0000000110e1040 a0000001006de230 __initcall_pdflush_init
0xe0000000110e1048 00000000 ........
0xe0000000110e1050 a000000100818660 _GLOBAL_OFFSET_TABLE_+0x1460
0xe0000000110e1058 a000000100818668 _GLOBAL_OFFSET_TABLE_+0x1468
0xe0000000110e1060 a000000100818678 _GLOBAL_OFFSET_TABLE_+0x1478
0xe0000000110e1068 a000000100818670 _GLOBAL_OFFSET_TABLE_+0x1470
0xe0000000110e1070 a0000001006d38f0 initcall_debug
0xe0000000110e1078 a0000001006de4f8 __con_initcall_start
0xe0000000110e1080 a000000100015480 kernel_thread+0x100
0xe0000000110e1088 00000389 ........
0xe0000000110e1090 a000000100a17200 ....¡r.
0xe0000000110e1098 a000000100009600 init+0x460
0xe0000000110e10a0 0000058e ........
0xe0000000110e10a8 00000000 ........
0xe0000000110e10b0 a0000001006a6bd0 pdflush_init+0x30
0xe0000000110e10b8 00000183 ........
0xe0000000110e10c0 00000000 ........
0xe0000000110e10c8 a000000100682540 __kstrtab_csum_partial_copy_nocheck+0xd830
0xe0000000110e10d0 00000000 ........
0xe0000000110e10d8 00000000 ........
0xe0000000110e10e0 e00000003c738000 à...<s..
0xe0000000110e10e8 a0000001000eb790 start_one_pdflush_thread+0x30
0xe0000000110e10f0 00000186 ........
0xe0000000110e10f8 a000000100a17200 ....¡r.
0xe0000000110e1100 a00000010072f670 pdflush_list
0xe0000000110e1108 e00000003c65fe18 à...<eþ.
0xe0000000110e1110 a00000010082d858 pdflush_lock
0xe0000000110e1118 e00000003c65fe08 à...<eþ.
0xe0000000110e1120 e00000003c65fe20 à...<eþ
0xe0000000110e1128 a00000010082b8a0 jiffies
0xe0000000110e1130 e00000003c65fe28 à...<eþ(
0xe0000000110e1138 00004000 ......@.
0xe0000000110e1140 00000001 ........
0xe0000000110e1148 a00000010082d860 last_empty_jifs
0xe0000000110e1150 e00000003c65fe10 à...<eþ.
0xe0000000110e1158 a00000010081cff8 _GLOBAL_OFFSET_TABLE_+0x5df8
0xe0000000110e1160 a00000010082ba80 nr_pdflush_threads
0xe0000000110e1168 00000400 ........
0xe0000000110e1170 a00000010081d000 _GLOBAL_OFFSET_TABLE_+0x5e00
0xe0000000110e1178 a00000010072f678 pdflush_list+0x8
0xe0000000110e1180 a000000100015480 kernel_thread+0x100
0xe0000000110e1188 00000389 ........
0xe0000000110e1190 a000000100a17200 ....¡r.
0xe0000000110e1198 a0000001000ebb40 pdflush+0x380
0xe0000000110e11a0 00000994 ........
0xe0000000110e11a8 e00000003c65fcd8 à...<eüØ
0xe0000000110e11b0 a000000100682540 __kstrtab_csum_partial_copy_nocheck+0xd830
Repeat from __kstrtab_csum_partial_copy_nocheck+0xd830 until the stack
overflows. That certainly explains why the backtrace is failing. Now
the real question is why did the code loop?
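The collision can be read directly off the rbs and stk ranges printed by init_frame_info. A small sketch of that check follows, under the reading given above that the register backing store grows upwards from the bottom of the task's stack region while the memory stack grows downwards from the top. The struct and function names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* One half-open address range [lo, hi), as printed by init_frame_info. */
struct range {
    uint64_t lo;
    uint64_t hi;
};

/* Bytes left between the top of the register backing store and the
 * bottom of the memory stack; 0 if the two regions touch or overlap. */
uint64_t stack_gap(struct range rbs, struct range stk)
{
    return stk.lo > rbs.hi ? stk.lo - rbs.hi : 0;
}

/* True when no free space remains between the two stacks. */
bool stacks_collided(struct range rbs, struct range stk)
{
    return stack_gap(rbs, stk) == 0;
}
```

For the oopsing pdflush task, rbs ends at 0xe0000000110e6ac8 and stk starts at the same address, so the gap is zero and stacks_collided() returns true.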
* Re: Oops in pdflush
From: David Mosberger @ 2004-02-28 6:52 UTC (permalink / raw)
To: linux-ia64
>>>>> On Sat, 28 Feb 2004 00:58:20 +1100, Keith Owens <kaos@sgi.com> said:
Keith> On Fri, 27 Feb 2004 11:16:03 +0100,
Keith> Andreas Schwab <schwab@suse.de> wrote:
>> pdflush[18140]: Oops 11012296146944 [1]
>> Pid: 18140, CPU 1, comm: pdflush
>> psr : 0000121008026018 ifs : 8000000000000590 ip : [<a00000010046e0d1>] Not tainted
>> ip is at nf_iterate+0x111/0x240
>> unwind.init_frame_info:
>> task 0xe0000000110e0000
>> rbs = [0xe0000000110e0ef0-0xe0000000110e6ac8)
>> stk = [0xe0000000110e6ac8-0xe0000000110e8000)
>> pr 0x82aa6aa6a55596a7
>> sw 0xe0000000110e6160
>> sp 0xe0000000110e6ac8
Keith> Ouch. rbs and stack have collided, kernel stack overflow. rbs shows
Keith> a normal start, then it loops with the same data over and over again
So if I'm reading this right, we get a case that looks like unbounded
recursion:
pdflush -> start_one_pdflush_thread -> kernel_thread -> pdflush ...
Except, I don't think this is real recursion. Instead, we effectively
get a (potentially unbounded) sequence of one kernel thread creating
another thread. Each new kernel thread gets nested one deeper,
eventually leading to a stack overflow...
Argh, this wasn't supposed to happen! It's not entirely trivial to
fix. Obviously we could try to modify copy_thread() so it resets the
stack to the top, but in doing so, we still must preserve the stack
frame of kernel_thread(). That wouldn't be a problem---if only we
knew how big that frame was! (Well, OK, then there would also be RNaT
slots to worry about, but that could be handled by ensuring that the
new and old stacks are congruent in that regard).
Hmmh, I think perhaps the right way to fix this is to use a separate
continuation function, which will then take care of doing the
child-specific actions. Let me see if I can come up with something.
Oh, well, now I'm finding that this is of course exactly how Linus
changed the x86 code some 19 months ago (for other reasons though, it
seems):
http://linux.bkbits.net:8080/linux-2.5/diffs/arch/i386/kernel/process.c@1.19.1.11
Say, Andreas, did you by chance have 3 disk drives in your Tiger?
Does it boot fine if you remove one or two of the disks?
--david
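The nesting effect described above can be illustrated with a toy model. It compares a scheme in which each child inherits a copy of everything on its parent's stack with one in which the child gets only a fixed-size saved state. All sizes and names below are made-up illustration values, not the real IA-64 ones, and the model ignores the register backing store entirely.

```c
#include <stddef.h>

/* Hypothetical sizes chosen only to make the arithmetic easy to follow. */
#define STACK_REGION  32768u  /* assumed size of one task's stack region */
#define SAVED_STATE    1024u  /* stands in for pt_regs + switch_stack    */
#define CALLER_FRAMES   512u  /* frames a thread builds before it forks  */

/* Inheriting scheme: the child starts with a copy of everything the
 * parent had on its stack at fork time, including inherited frames. */
size_t child_usage_old(size_t parent_usage)
{
    return parent_usage + CALLER_FRAMES;
}

/* Fixed-copy scheme: the child starts with only the saved state. */
size_t child_usage_new(size_t parent_usage)
{
    (void) parent_usage;
    return SAVED_STATE;
}

/* Count parent -> child generations until usage exceeds the region.
 * Returns max_generations if the limit is not exceeded earlier. */
unsigned generations_until_overflow(size_t (*child_usage)(size_t),
                                    unsigned max_generations)
{
    size_t usage = SAVED_STATE;
    unsigned gen;

    for (gen = 0; gen < max_generations; gen++) {
        usage = child_usage(usage);
        if (usage > STACK_REGION)
            return gen + 1;
    }
    return max_generations;
}
```

With these numbers the inheriting scheme overflows after 63 generations, while the fixed-copy scheme never grows, so a long enough chain of kernel threads creating kernel threads is only a problem in the first case.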
* Re: Oops in pdflush
From: David Mosberger @ 2004-02-28 9:39 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 27 Feb 2004 22:52:46 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:
David> Hmmh, I think perhaps the right way to fix this is to use a separate
David> continuation function, which will then take care of doing the
David> child-specific actions. Let me see if I can come up with something.
OK, how about the attached patch? Does it fix the problem for you,
Andreas?
--david
===== arch/ia64/kernel/head.S 1.16 vs edited =====
--- 1.16/arch/ia64/kernel/head.S Wed Dec 10 17:28:59 2003
+++ edited/arch/ia64/kernel/head.S Sat Feb 28 00:40:31 2004
@@ -816,6 +816,19 @@
br.ret.sptk.many rp
END(ia64_delay_loop)
+GLOBAL_ENTRY(ia64_invoke_kernel_thread_helper)
+ .prologue
+ .save rp, r0 // this is the end of the call-chain
+ .body
+ alloc r2 = ar.pfs, 0, 0, 2, 0
+ mov out0 = r9
+ mov out1 = r11;;
+ br.call.sptk.many rp = kernel_thread_helper;;
+ mov out0 = r8
+ br.call.sptk.many rp = sys_exit;;
+1: br.sptk.few 1b // not reached
+END(ia64_invoke_kernel_thread_helper)
+
#ifdef CONFIG_IA64_BRL_EMU
/*
===== arch/ia64/kernel/process.c 1.51 vs edited =====
--- 1.51/arch/ia64/kernel/process.c Thu Jan 8 17:52:52 2004
+++ edited/arch/ia64/kernel/process.c Sat Feb 28 01:19:59 2004
@@ -259,10 +259,12 @@
*
* We get here through the following call chain:
*
- * <clone syscall>
- * sys_clone
- * do_fork
- * copy_thread
+ * from user-level: from kernel:
+ *
+ * <clone syscall> <some kernel call frames>
+ * sys_clone :
+ * do_fork do_fork
+ * copy_thread copy_thread
*
* This means that the stack layout is as follows:
*
@@ -276,9 +278,6 @@
* | | <-- sp (lowest addr)
* +---------------------+
*
- * Note: if we get called through kernel_thread() then the memory above "(highest addr)"
- * is valid kernel stack memory that needs to be copied as well.
- *
* Observe that we copy the unat values that are in pt_regs and switch_stack. Spilling an
* integer to address X causes bit N in ar.unat to be set to the NaT bit of the register,
* with N=(X & 0x1ff)/8. Thus, copying the unat value preserves the NaT bits ONLY if the
@@ -291,9 +290,9 @@
unsigned long user_stack_base, unsigned long user_stack_size,
struct task_struct *p, struct pt_regs *regs)
{
- unsigned long rbs, child_rbs, rbs_size, stack_offset, stack_top, stack_used;
- struct switch_stack *child_stack, *stack;
extern char ia64_ret_from_clone, ia32_ret_from_clone;
+ struct switch_stack *child_stack, *stack;
+ unsigned long rbs, child_rbs, rbs_size;
struct pt_regs *child_ptregs;
int retval = 0;
@@ -306,16 +305,13 @@
return 0;
#endif
- stack_top = (unsigned long) current + IA64_STK_OFFSET;
stack = ((struct switch_stack *) regs) - 1;
- stack_used = stack_top - (unsigned long) stack;
- stack_offset = IA64_STK_OFFSET - stack_used;
- child_stack = (struct switch_stack *) ((unsigned long) p + stack_offset);
- child_ptregs = (struct pt_regs *) (child_stack + 1);
+ child_ptregs = (struct pt_regs *) ((unsigned long) p + IA64_STK_OFFSET) - 1;
+ child_stack = (struct switch_stack *) child_ptregs - 1;
/* copy parent's switch_stack & pt_regs to child: */
- memcpy(child_stack, stack, stack_used);
+ memcpy(child_stack, stack, sizeof(*child_ptregs) + sizeof(*child_stack));
rbs = (unsigned long) current + IA64_RBS_OFFSET;
child_rbs = (unsigned long) p + IA64_RBS_OFFSET;
@@ -324,7 +320,7 @@
/* copy the parent's register backing store to the child: */
memcpy((void *) child_rbs, (void *) rbs, rbs_size);
- if (user_mode(child_ptregs)) {
+ if (likely(user_mode(child_ptregs))) {
if ((clone_flags & CLONE_SETTLS) && !IS_IA32_PROCESS(regs))
child_ptregs->r13 = regs->r16; /* see sys_clone2() in entry.S */
if (user_stack_base) {
@@ -341,14 +337,14 @@
* been taken care of by the caller of sys_clone()
* already.
*/
- child_ptregs->r12 = (unsigned long) (child_ptregs + 1); /* kernel sp */
+ child_ptregs->r12 = (unsigned long) child_ptregs - 16; /* kernel sp */
child_ptregs->r13 = (unsigned long) p; /* set `current' pointer */
}
+ child_stack->ar_bspstore = child_rbs + rbs_size;
if (IS_IA32_PROCESS(regs))
child_stack->b0 = (unsigned long) &ia32_ret_from_clone;
else
child_stack->b0 = (unsigned long) &ia64_ret_from_clone;
- child_stack->ar_bspstore = child_rbs + rbs_size;
/* copy parts of thread_struct: */
p->thread.ksp = (unsigned long) child_stack - 16;
@@ -358,8 +354,8 @@
* therefore we must specify them explicitly here and not include them in
* IA64_PSR_BITS_TO_CLEAR.
*/
- child_ptregs->cr_ipsr = ((child_ptregs->cr_ipsr | IA64_PSR_BITS_TO_SET)
- & ~(IA64_PSR_BITS_TO_CLEAR | IA64_PSR_PP | IA64_PSR_UP));
+ child_ptregs->cr_ipsr = ((child_ptregs->cr_ipsr | IA64_PSR_BITS_TO_SET)
+ & ~(IA64_PSR_BITS_TO_CLEAR | IA64_PSR_PP | IA64_PSR_UP));
/*
* NOTE: The calling convention considers all floating point
@@ -578,27 +574,43 @@
pid_t
kernel_thread (int (*fn)(void *), void *arg, unsigned long flags)
{
- struct task_struct *parent = current;
- int result;
- pid_t tid;
+ extern void ia64_invoke_kernel_thread_helper (void);
+ unsigned long *helper_fptr = (unsigned long *) &ia64_invoke_kernel_thread_helper;
+ struct {
+ struct switch_stack sw;
+ struct pt_regs pt;
+ } regs;
+
+ memset(&regs, 0, sizeof(regs));
+ regs.pt.cr_iip = helper_fptr[0]; /* set entry point (IP) */
+ regs.pt.r1 = helper_fptr[1]; /* set GP */
+ regs.pt.r9 = (unsigned long) fn; /* 1st argument */
+ regs.pt.r11 = (unsigned long) arg; /* 2nd argument */
+ /* Preserve PSR bits, except for bits 32-34 and 37-45, which we can't read. */
+ regs.pt.cr_ipsr = ia64_getreg(_IA64_REG_PSR) | IA64_PSR_BN;
+ regs.pt.cr_ifs = 1UL << 63; /* mark as valid, empty frame */
+ regs.sw.ar_fpsr = regs.pt.ar_fpsr = ia64_getreg(_IA64_REG_AR_FPSR);
+ regs.sw.ar_bspstore = (unsigned long) current + IA64_RBS_OFFSET;
+
+ return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs.pt, 0, NULL, NULL);
+}
+EXPORT_SYMBOL(kernel_thread);
- tid = clone(flags | CLONE_VM | CLONE_UNTRACED, 0);
- if (parent != current) {
+/* This gets called from kernel_thread() via ia64_invoke_kernel_thread_helper(). */
+int
+kernel_thread_helper (int (*fn)(void *), void *arg)
+{
#ifdef CONFIG_IA32_SUPPORT
- if (IS_IA32_PROCESS(ia64_task_regs(current))) {
- /* A kernel thread is always a 64-bit process. */
- current->thread.map_base = DEFAULT_MAP_BASE;
- current->thread.task_size = DEFAULT_TASK_SIZE;
- ia64_set_kr(IA64_KR_IO_BASE, current->thread.old_iob);
- ia64_set_kr(IA64_KR_TSSD, current->thread.old_k1);
- }
-#endif
- result = (*fn)(arg);
- _exit(result);
+ if (IS_IA32_PROCESS(ia64_task_regs(current))) {
+ /* A kernel thread is always a 64-bit process. */
+ current->thread.map_base = DEFAULT_MAP_BASE;
+ current->thread.task_size = DEFAULT_TASK_SIZE;
+ ia64_set_kr(IA64_KR_IO_BASE, current->thread.old_iob);
+ ia64_set_kr(IA64_KR_TSSD, current->thread.old_k1);
}
- return tid;
+#endif
+ return (*fn)(arg);
}
-EXPORT_SYMBOL(kernel_thread);
/*
* Flush thread state. This is called when a thread does an execve().
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-28 9:45 UTC (permalink / raw)
To: linux-ia64
On Fri, 27 Feb 2004 22:52:46 -0800,
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Sat, 28 Feb 2004 00:58:20 +1100, Keith Owens <kaos@sgi.com> said:
> Keith> Ouch. rbs and stack have collided, kernel stack overflow. rbs shows
> Keith> a normal start, then it loops with the same data over and over again
>
>So if I'm reading this right, we get a case that looks like unbounded
>recursion:
>
> pdflush -> start_one_pdflush_thread -> kernel_thread -> pdflush ...
>
>Except, I don't think this is real recursion. Instead, we effectively
>get a (potentially unbounded) sequence of one kernel thread creating
>another thread. Each new kernel thread gets nested one deeper,
>eventually leading to a stack overflow...
>
>Hmmh, I think perhaps the right way to fix this is to use a separate
>continuation function, which will then take care of doing the
>child-specific actions. Let me see if I can come up with something.
Separate the pdflush thread creation and move it to a single master
thread. This restricts the maximum stack depth already in use when
starting a worker pdflush thread.
--- 2.6.3-pristine/mm/pdflush.c Thu Dec 18 14:00:02 2003
+++ 2.6.3-pdflush/mm/pdflush.c Sat Feb 28 20:42:04 2004
@@ -5,6 +5,9 @@
*
* 09Apr2002 akpm@zip.com.au
* Initial version
+ * 28Feb2004 kaos@sgi.com
+ * Move worker thread creation to a master thread to avoid chewing
+ * up stack space with nested calls to kernel_thread.
*/
#include <linux/sched.h>
@@ -18,6 +21,7 @@
#include <linux/fs.h> // Needed by writeback.h
#include <linux/writeback.h> // Prototypes pdflush_operation()
+#include <asm/semaphore.h>
/*
* Minimum and maximum number of pdflush instances
@@ -58,6 +62,11 @@ int nr_pdflush_threads = 0;
static unsigned long last_empty_jifs;
/*
+ * up() this to start a new pdflush thread.
+ */
+static struct semaphore new_pdflush;
+
+/*
* The pdflush thread.
*
* Thread pool management algorithm:
@@ -207,13 +216,31 @@ int pdflush_operation(void (*fn)(unsigne
static void start_one_pdflush_thread(void)
{
- kernel_thread(pdflush, NULL, CLONE_KERNEL);
+ up(&new_pdflush);
+}
+
+/*
+ * Create all pdflush worker threads from a single master thread. Creating
+ * worker threads from inside worker threads chews up kernel stack space and
+ * eventually overflows the kernel stack.
+ */
+static int pdflush_master(void *dummy)
+{
+ daemonize("pdflush_master");
+ while (1) {
+ if (down_interruptible(&new_pdflush))
+ continue;
+ kernel_thread(pdflush, NULL, CLONE_KERNEL);
+ }
+ return 0;
}
static int __init pdflush_init(void)
{
int i;
+ kernel_thread(pdflush_master, NULL, CLONE_KERNEL);
+
for (i = 0; i < MIN_PDFLUSH_THREADS; i++)
start_one_pdflush_thread();
return 0;
===========================
This is what the ia64 stack for a pdflush worker thread looks like now.
It has used 560 bytes of stack from creation to sleep.
0xe00000003db08000 16 1 0 2 S 0xe00000003db08570 pdflush
0xa0000001000830f0 schedule+0xf30
0xa0000001000dab70 __pdflush+0x230
0xa0000001000daf00 pdflush+0x20
0xa000000100016bc0 kernel_thread+0x100
0xa0000001000db250 pdflush_master+0xb0
0xa000000100016bc0 kernel_thread+0x100
0xa000000100650670 pdflush_init+0x30
0xa000000100641100 do_initcalls+0xc0
0xa000000100009300 init+0xe0
0xa000000100016bc0 kernel_thread+0x100
0xa000000100009090 rest_init+0x30
0xa000000100640f80 start_kernel+0x460
0xa0000001000085a0 _start+0x280
It would be nice if kernel_thread reset the stack every time it was
called, but that requires arch specific helper code. Until that is
available for every arch, avoid recursive calls to kernel_thread.
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-28 10:00 UTC (permalink / raw)
To: linux-ia64
On Sat, 28 Feb 2004 20:45:38 +1100,
Keith Owens <kaos@sgi.com> wrote:
>This is what the ia64 stack for a pdflush worker thread looks like now.
>It has used 560 bytes of stack from creation to sleep.
>
>0xe00000003db08000 16 1 0 2 S 0xe00000003db08570 pdflush
>0xa0000001000830f0 schedule+0xf30
>0xa0000001000dab70 __pdflush+0x230
>0xa0000001000daf00 pdflush+0x20
>0xa000000100016bc0 kernel_thread+0x100
>0xa0000001000db250 pdflush_master+0xb0
>0xa000000100016bc0 kernel_thread+0x100
>0xa000000100650670 pdflush_init+0x30
>0xa000000100641100 do_initcalls+0xc0
>0xa000000100009300 init+0xe0
>0xa000000100016bc0 kernel_thread+0x100
>0xa000000100009090 rest_init+0x30
>0xa000000100640f80 start_kernel+0x460
>0xa0000001000085a0 _start+0x280
Without DavidM's patch to add ia64_invoke_kernel_thread_helper, pdflush
starts with 560 bytes of stack and 744 bytes of rbs. With
ia64_invoke_kernel_thread_helper, that reduces to 554 bytes of stack
and 272 bytes of rbs. Backtrace with ia64_invoke_kernel_thread_helper.
0xa000000100083290 schedule+0xf30
0xa0000001000dad10 __pdflush+0x230
0xa0000001000db0a0 pdflush+0x20
0xa000000100016d70 kernel_thread_helper+0xd0
0xa000000100009040 ia64_invoke_kernel_thread_helper+0x20
We need both ia64_invoke_kernel_thread_helper and my patch to pdflush.
Until all architectures have a kernel_thread helper, nested calls to
kernel_thread will chew up stack.
* Re: Oops in pdflush
From: David Mosberger @ 2004-02-28 10:20 UTC (permalink / raw)
To: linux-ia64
>>>>> On Sat, 28 Feb 2004 21:00:54 +1100, Keith Owens <kaos@sgi.com> said:
Keith> Without DavidM's patch to add
Keith> ia64_invoke_kernel_thread_helper, pdflush starts with 560
Keith> bytes of stack and 744 bytes of rbs. With
Keith> ia64_invoke_kernel_thread_helper, that reduces to 554 bytes
Keith> of stack and 272 bytes of rbs. Backtrace with
Keith> ia64_invoke_kernel_thread_helper.
Cool. Thanks for reporting that and for providing the stack-dump (thanks
to Andreas, too, of course).
Keith> We need both ia64_invoke_kernel_thread_helper and my patch to pdflush.
Keith> Until all architectures have a kernel_thread helper, nested calls to
Keith> kernel_thread will chew up stack.
Yes, I think so, too. Andrew?
--david
* Re: Oops in pdflush
From: Andrew Morton @ 2004-02-28 10:23 UTC (permalink / raw)
To: linux-ia64
Keith Owens <kaos@sgi.com> wrote:
>
> >So if I'm reading this right, we get a case that looks like unbounded
> >recursion:
> >
> > pdflush -> start_one_pdflush_thread -> kernel_thread -> pdflush ...
> >
Yes. Ow. Thanks.
> >Except, I don't think this is real recursion. Instead, we effectively
> >get a (potentially unbounded) sequence of one kernel thread creating
> >another thread. Each new kernel thread gets nested one deeper,
> >eventually leading to a stack overflow...
> >
> >Hmmh, I think perhaps the right way to fix this is to use a separate
> >continuation function, which will then take care of doing the
> >child-specific actions. Let me see if I can come up with something.
>
> Separate the pdflush thread creation and move it to a single master
> thread. This restricts the maximum stack depth already in use when
> starting a worker pdflush thread.
We should use the new kthread infrastructure rather than open-coding it.
It delegates thread startup to keventd and should thus avoid the stack
windup.
I'll take a look at this over the weekend unless someone else tells me
they're doing it.
* Re: Oops in pdflush
From: Andrew Morton @ 2004-02-28 12:00 UTC (permalink / raw)
To: linux-ia64
Andrew Morton <akpm@osdl.org> wrote:
>
> Keith Owens <kaos@sgi.com> wrote:
> >
> > >So if I'm reading this right, we get a case that looks like unbounded
> > >recursion:
> > >
> > > pdflush -> start_one_pdflush_thread -> kernel_thread -> pdflush ...
> > >
>
> Yes. Ow. Thanks.
Having just looked at the code, I don't understand the problem.
If kernel thread A starts kernel thread B and kernel thread B starts
kernel thread C and so on, how does that cause stack windup?
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-28 14:47 UTC (permalink / raw)
To: linux-ia64
On Sat, 28 Feb 2004 04:00:34 -0800,
Andrew Morton <akpm@osdl.org> wrote:
>Having just looked at the code, I don't understand the problem.
>
>If kernel thread A starts kernel thread B and kernel thread B starts
>kernel thread C and so on, how does that cause stack windup?
Backtrace from a pdflush task on standard 2.6.3 ia64.
schedule+0xf30
__pdflush+0x230
pdflush+0x20
kernel_thread+0x100
pdflush_init+0x30
do_initcalls+0xc0
init+0xe0
kernel_thread+0x100
rest_init+0x30
start_kernel+0x460
_start+0x280
Each use of kernel_thread results in a cloned stack which inherits the
stack usage from the previous thread. Some architectures have a
kernel_thread helper which resets the stack, others do not. Without a
helper, each call whittles away at the stack.
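To make the windup concrete, here is a rough model of it in C. This is only a sketch: the function names and the per-level cost are made up for illustration, and it ignores that on ia64 the memory stack and the rbs share one region and grow towards each other. It assumes every nested kernel_thread() leaves a fixed number of bytes behind on the cloned stack.

```c
#include <assert.h>

/*
 * Bytes of stack already consumed when thread generation `gen` starts
 * running.  `start_cost` is a hypothetical figure for what one nested
 * kernel_thread() call leaves behind on the cloned stack.
 */
unsigned long inherited_stack(unsigned int gen, unsigned long start_cost,
                              int has_helper)
{
    if (has_helper)
        return 0;       /* a helper starts the child on a fresh stack */
    return (unsigned long) gen * start_cost;
}

/*
 * First generation that no longer has `needed` bytes of stack left,
 * assuming no helper resets the stack.
 */
unsigned int first_overflowing_gen(unsigned long stack_size,
                                   unsigned long start_cost,
                                   unsigned long needed)
{
    unsigned int gen = 0;

    assert(start_cost > 0 && needed <= stack_size);
    while (inherited_stack(gen, start_cost, 0) + needed <= stack_size)
        gen++;
    return gen;
}
```

With a 32 KiB stack, 512 bytes left behind per level and 1 KiB needed by the thread itself, first_overflowing_gen(32768, 512, 1024) gives 63. With a helper the inherited usage stays at zero however deep the chain gets.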
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-02-28 14:55 UTC (permalink / raw)
To: linux-ia64
David Mosberger <davidm@napali.hpl.hp.com> writes:
> Say, Andreas, did you by chance have 3 disk drives in your Tiger?
No, only one.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: David Mosberger @ 2004-02-28 18:26 UTC (permalink / raw)
To: linux-ia64
>>>>> On Sat, 28 Feb 2004 15:55:37 +0100, Andreas Schwab <schwab@suse.de> said:
Andreas> David Mosberger <davidm@napali.hpl.hp.com> writes:
>> Say, Andreas, did you by chance have 3 disk drives in your Tiger?
Andreas> No, only one.
Hmmh, kind of throws that theory out of the water...
I still don't understand why the problem triggered only on certain
machines.
--david
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-28 23:59 UTC (permalink / raw)
To: linux-ia64
On Sat, 28 Feb 2004 10:26:18 -0800,
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Sat, 28 Feb 2004 15:55:37 +0100, Andreas Schwab <schwab@suse.de> said:
>
> Andreas> David Mosberger <davidm@napali.hpl.hp.com> writes:
>
> >> Say, Andreas, did you by chance have 3 disk drives in your Tiger?
>
> Andreas> No, only one.
>
>Hmmh, kind of throws that theory out of the water...
>
>I still don't understand why the problem triggered only on certain
>machines.
pdflush threads are per filesystem, independent of the number of
physical disks. When you have concurrent heavy I/O load against more
than MIN_PDFLUSH_THREADS filesystems, pdflush will detect that all its
worker tasks are in use and will fork new tasks.
* Re: Oops in pdflush
From: Keith Owens @ 2004-02-29 3:44 UTC (permalink / raw)
To: linux-ia64
On Sat, 28 Feb 2004 02:23:23 -0800,
Andrew Morton <akpm@osdl.org> wrote:
>We should use the new kthread infrastructure rather than open-coding it.
>It delegates thread startup to keventd and should thus avoid the stack
>windup.
Convert pdflush to kthread to avoid stack windup.
Index: 4-rc1.1/mm/pdflush.c
--- 4-rc1.1/mm/pdflush.c Thu, 18 Dec 2003 16:46:13 +1100 kaos (linux-2.6/20_pdflush.c 1.1 644)
+++ 4-rc1.1(w)/mm/pdflush.c Sun, 29 Feb 2004 14:28:02 +1100 kaos (linux-2.6/20_pdflush.c 1.1 644)
@@ -5,6 +5,9 @@
*
* 09Apr2002 akpm@zip.com.au
* Initial version
+ * 29Feb2004 kaos@sgi.com
+ * Move worker thread creation to kthread to avoid chewing
+ * up stack space with nested calls to kernel_thread.
*/
#include <linux/sched.h>
@@ -17,6 +20,7 @@
#include <linux/suspend.h>
#include <linux/fs.h> // Needed by writeback.h
#include <linux/writeback.h> // Prototypes pdflush_operation()
+#include <linux/kthread.h>
/*
@@ -207,7 +211,7 @@ int pdflush_operation(void (*fn)(unsigne
static void start_one_pdflush_thread(void)
{
- kernel_thread(pdflush, NULL, CLONE_KERNEL);
+ kthread_run(pdflush, NULL, "pdflush");
}
static int __init pdflush_init(void)
Rusty, does pdflush() still need to call daemonize() or does kthread
make that redundant?
pdflush backtrace using kthread, without ia64 kernel_thread_helper.
This has got worse. 2.6.3 using kernel_thread used 560 bytes of stack
and 744 bytes of rbs, 2.6.4-rc1 with kthread uses 1120 stack and 1312
rbs.
See http://marc.theaimsgroup.com/?l=linux-ia64&m=107796262112591&w=2
for 2.6.3 backtraces.
0xa000000100083650 schedule+0xf30
sp 0xe00000003dabfba0 bsp 0xe00000003dab9440 cfm 0x0000000000000f26
0xa0000001000dc730 __pdflush+0x230
0xa0000001000dcac0 pdflush+0x20
0xa0000001000c0720 kthread+0x200
0xa000000100016bc0 kernel_thread+0x100
0xa0000001000c0770 keventd_create_kthread+0x30
0xa0000001000b7b10 worker_thread+0x450
0xa0000001000c0720 kthread+0x200
0xa000000100016bc0 kernel_thread+0x100
0xa0000001000c0770 keventd_create_kthread+0x30
0xa0000001000c0a00 kthread_create+0x200
0xa0000001000b80c0 create_workqueue_thread+0x100
0xa0000001000b82e0 create_workqueue+0x1a0
0xa0000001000b89a0 init_workqueues+0x20
0xa000000100645290 do_basic_setup+0x50
0xa000000100009300 init+0xe0
0xa000000100016bc0 kernel_thread+0x100
0xa000000100009090 rest_init+0x30
0xa000000100644fc0 start_kernel+0x4a0
0xa0000001000085a0 _start+0x280
sp 0xe00000003dabfe30 bsp 0xe00000003dab8f20 cfm 0x0000000000000794
pdflush backtrace using kthread, with ia64 kernel_thread_helper. This
has also got worse. 2.6.3 using kernel_thread and helper used 554
bytes of stack and 272 bytes of rbs, 2.6.4-rc1 with kthread and helper
uses 640 stack and 376 rbs.
0xa0000001000837f0 schedule+0xf30
sp 0xe00000003dabfd80 bsp 0xe00000003dab9098 cfm 0x0000000000000f26
0xa0000001000dc8d0 __pdflush+0x230
0xa0000001000dcc60 pdflush+0x20
0xa0000001000c08c0 kthread+0x200
0xa000000100016d70 kernel_thread_helper+0xd0
0xa000000100009040 ia64_invoke_kernel_thread_helper+0x20
sp 0xe00000003dabfe30 bsp 0xe00000003dab8f20 cfm 0x0000000000000002
This says two things - every architecture should have a kernel_thread
helper and using kthread adds some stack overhead, even with a helper.
The latter is acceptable if it makes the code more reliable.
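The totals quoted above can be reproduced from the sp and bsp values in the two backtraces. A minimal sketch, assuming the memory stack grows down from 0xe00000003dac0000 (that top-of-stack address is inferred from the reported totals, it is not printed in the dumps) and the rbs grows up from the bsp of the outermost frame, 0xe00000003dab8f20. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Memory stack usage: the stack grows down, so usage is the distance
 * from the (assumed) top of the stack area down to the current sp.
 */
uint64_t stack_bytes_used(uint64_t stack_top, uint64_t sp)
{
    assert(sp <= stack_top);
    return stack_top - sp;
}

/*
 * Register backing store usage: the rbs grows up, so usage is the
 * distance from its starting bsp up to the current bsp.
 */
uint64_t rbs_bytes_used(uint64_t rbs_base, uint64_t bsp)
{
    assert(bsp >= rbs_base);
    return bsp - rbs_base;
}
```

Feeding in the schedule+0xf30 frame of the first backtrace (sp 0xe00000003dabfba0, bsp 0xe00000003dab9440) gives 1120 and 1312 bytes. The same frame in the second backtrace (sp 0xe00000003dabfd80, bsp 0xe00000003dab9098) gives 640 and 376 bytes.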
* Re: Oops in pdflush
From: Andrew Morton @ 2004-02-29 5:27 UTC (permalink / raw)
To: linux-ia64
Keith Owens <kaos@sgi.com> wrote:
>
> On Sat, 28 Feb 2004 02:23:23 -0800,
> Andrew Morton <akpm@osdl.org> wrote:
> >We should use the new kthread infrastructure rather than open-coding it.
> >It delegates thread startup to keventd and should thus avoid the stack
> >windup.
>
> Convert pdflush to kthread to avoid stack windup.
Thanks Keith. Tricky patch ;)
> Rusty, does pdflush() still need to call daemonize() or does kthread
> make that redundant?
It's redundant - threads launched by kthread have a genuine kernel thread
as a parent and hence do not need to perform all that "disassociate me from
my userspace parent" stuff.
* Re: Oops in pdflush
From: Andreas Schwab @ 2004-03-01 10:34 UTC (permalink / raw)
To: linux-ia64
David Mosberger <davidm@napali.hpl.hp.com> writes:
>>>>>> On Fri, 27 Feb 2004 22:52:46 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:
>
> David> Hmmh, I think perhaps the right way to fix this is to use a separate
> David> continuation function, which will then take care of doing the
> David> child-specific actions. Let me see if I can come up with something.
>
> OK, how about the attached patch? Does it fix the problem for you,
> Andreas?
The system has been under moderate load for a couple of days without any
problems, so it seems to be fixed.
Thanks, Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Oops in pdflush
From: David Mosberger @ 2004-03-01 19:46 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 01 Mar 2004 11:34:33 +0100, Andreas Schwab <schwab@suse.de> said:
Andreas> The system has been under moderate load for a couple of
Andreas> days without any problems, so it seems to be fixed.
Cool!
--david
* Re: Oops in pdflush
From: D.N.Jagannathan @ 2006-09-06 13:39 UTC (permalink / raw)
To: linux-ia64
Can anybody explain what the pdflush process running in Linux is, and
why it is important?
* RE: Oops in pdflush
From: Chen, Kenneth W @ 2006-09-06 17:44 UTC (permalink / raw)
To: linux-ia64
D.N.Jagannathan wrote on Wednesday, September 06, 2006 6:27 AM
> Can anybody explain what the pdflush process running in Linux is, and
> why it is important?
pdflush is a kernel thread that periodically flushes dirty pages in the
page cache in order to ensure that in-memory data is consistent with
disk storage.
- Ken