* <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2)
2001-12-23 22:53 <=2.4.17 deadlock (RedHat 7.2, SMP) Frank van Maarseveen
@ 2001-12-24 19:48 ` Frank van Maarseveen
0 siblings, 0 replies; 5+ messages in thread
From: Frank van Maarseveen @ 2001-12-24 19:48 UTC (permalink / raw)
To: linux-kernel
Below a second case, again it occured right at the moment a PPP
connection was terminated. For some unknown reason it seems to occur
frequently enough now for doing some experimenting. Tell me if more info
is needed. This is the same kernel as before. When the problem first
emerged some weeks ago after installing RedHat 7.2 I switched to kgcc
(gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) instead
of gcc 2.96 but it didn't make a difference.
Following two alt-sysrq-p have been decoded from a JPEG image I took from
the screen. Even alt-sysrq-l leaves most processes intact according to
alt-sysrq-t so to me it looks like a genuine deadlock:
Pid: 4465, comm: hotplug
EIP: 0010:[<c03fe0ba>] CPU: 0 EFLAGS 00000286 not tainted
Call Trace: c0140134 c013eba2 c013eab2 c013ee23 c01076cb
Pid: 2930, comm: pppd
EIP: 0010:[<c0403068>] CPU: 1 EFLAGS 00000286 not tainted
Call Trace: c016d320 c028ee04 c0159e01 c0156e68 c013f063 c0150bc6 c028eac8 c01076cb
Now ksymoops at work:
ksymoops 2.4.1 on i686 2.4.17-b65. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.17-b65/ (default)
-m /boot/System.map-2.4.17-b65 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
EIP: 0010:[<c03fe0ba>] CPU: 0 EFLAGS 00000286 not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Call Trace: c0140134 c013eba2 c013eab2 c013ee23 c01076cb
EIP: 0010:[<c0403068>] CPU: 1 EFLAGS 00000286 not tainted
Call Trace: c016d320 c028ee04 c0159e01 c0156e68 c013f063 c0150bc6 c028eac8 c01076cb
Warning (Oops_read): Code line not seen, dumping what data is available
>>EIP; c03fe0ba <stext_lock+1c82/e9ac> <=====
Trace; c0140134 <chrdev_open+28/124>
Trace; c013eba2 <dentry_open+e6/194>
Trace; c013eab2 <filp_open+52/5c>
Trace; c013ee23 <sys_open+4b/140>
Trace; c01076cb <system_call+33/38>
>>EIP; c0403068 <stext_lock+6c30/e9ac> <=====
Trace; c016d320 <ext3_delete_inode+0/270>
Trace; c028ee04 <ppp_ioctl+33c/d84>
Trace; c0159e01 <iput+2c5/2f0>
Trace; c0156e68 <d_delete+dc/20c>
Trace; c013f063 <filp_close+133/140>
Trace; c0150bc6 <sys_ioctl+24a/2e4>
Trace; c028eac8 <ppp_ioctl+0/d84>
Trace; c01076cb <system_call+33/38>
3 warnings issued. Results may not be reliable.
--
Frank
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2)
@ 2001-12-24 20:17 Manfred Spraul
2001-12-24 20:56 ` Frank van Maarseveen
0 siblings, 1 reply; 5+ messages in thread
From: Manfred Spraul @ 2001-12-24 20:17 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: linux-kernel
Frank,
could you open your vmlinux file with gdb, and figure out where the 2
stext_lock references lead to?
stext_lock+1c82 just means that it spins trying to acquire a spinlock,
but which one?
IIRC
gdb vmlinux
$x/30i 0xc03fe0a0
$x/30i 0xc0403050
should be enough.
>>EIP; c03fe0ba <stext_lock+1c82/e9ac> <=====
Trace; c0140134 <chrdev_open+28/124>
Trace; c013eba2 <dentry_open+e6/194>
Trace; c013eab2 <filp_open+52/5c>
Trace; c013ee23 <sys_open+4b/140>
Trace; c01076cb <system_call+33/38>
Probably this cpu is spinning in get_chrfops(), trying to acquire the big kernel
lock.
>>EIP; c0403068 <stext_lock+6c30/e9ac> <=====
Trace; c016d320 <ext3_delete_inode+0/270>
Trace; c028ee04 <ppp_ioctl+33c/d84>
Trace; c0159e01 <iput+2c5/2f0>
Trace; c0156e68 <d_delete+dc/20c>
Trace; c013f063 <filp_close+133/140>
Trace; c0150bc6 <sys_ioctl+24a/2e4>
Trace; c028eac8 <ppp_ioctl+0/d84>
Trace; c01076cb <system_call+33/38>
I think the call chain is
system_call
-> calls sys_ioctl.
acquires the big kernel lock.
puts ppp_ioctl on the stack (+0: it's a function pointer, not a return value.)
-> calls ppp_ioctl
and that one locks up.
The references to close/iput/delete_inode are just stale stack values from a previous
syscall.
Please check where stext_lock+6c30 leads: I think the deadlock is somewhere within
ppp_ioctl, and then the system locks up because the big kernel lock is blocked.
--
Manfred
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2)
2001-12-24 20:17 <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2) Manfred Spraul
@ 2001-12-24 20:56 ` Frank van Maarseveen
2001-12-24 21:22 ` Manfred Spraul
[not found] ` <005c01c18cc3$305283f0$010411ac@local>
0 siblings, 2 replies; 5+ messages in thread
From: Frank van Maarseveen @ 2001-12-24 20:56 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel
On Mon, Dec 24, 2001 at 09:17:00PM +0100, Manfred Spraul wrote:
> Frank,
> could you open your vmlinux file with gdb, and figure out where the 2
> stext_lock references lead to?
> stext_lock+1c82 just means that it spins trying to acquire a spinlock,
> but which one?
> IIRC
> gdb vmlinux
> $x/30i 0xc03fe0a0
> $x/30i 0xc0403050
> should be enough.
(gdb) x/30i 0xc03fe0a0
0xc03fe0a0 <stext_lock+7272>: lea 0xc04e79c0,%eax
0xc03fe0a6 <stext_lock+7278>: call 0xc01062f4 <__read_lock_failed>
0xc03fe0ab <stext_lock+7283>: pop %eax
0xc03fe0ac <stext_lock+7284>: jmp 0xc013fd3e <get_chrfops+78>
0xc03fe0b1 <stext_lock+7289>: cmpb $0x0,0xc05dd400
0xc03fe0b8 <stext_lock+7296>: repz nop
0xc03fe0ba <stext_lock+7298>: jle 0xc03fe0b1 <stext_lock+7289>
0xc03fe0bc <stext_lock+7300>: jmp 0xc013fde0 <get_chrfops+240>
0xc03fe0c1 <stext_lock+7305>: push %eax
0xc03fe0c2 <stext_lock+7306>: lea 0xc04e79c0,%eax
0xc03fe0c8 <stext_lock+7312>: call 0xc01062f4 <__read_lock_failed>
0xc03fe0cd <stext_lock+7317>: pop %eax
0xc03fe0ce <stext_lock+7318>: jmp 0xc013feee <get_chrfops+510>
0xc03fe0d3 <stext_lock+7323>: push %eax
0xc03fe0d4 <stext_lock+7324>: lea 0xc04e79c0,%eax
0xc03fe0da <stext_lock+7330>: call 0xc01062d4 <__write_lock_failed>
0xc03fe0df <stext_lock+7335>: pop %eax
0xc03fe0e0 <stext_lock+7336>: jmp 0xc013ff78 <register_chrdev+68>
0xc03fe0e5 <stext_lock+7341>: push %eax
0xc03fe0e6 <stext_lock+7342>: lea 0xc04e79c0,%eax
0xc03fe0ec <stext_lock+7348>: call 0xc01062d4 <__write_lock_failed>
0xc03fe0f1 <stext_lock+7353>: pop %eax
0xc03fe0f2 <stext_lock+7354>: jmp 0xc013fff3 <register_chrdev+191>
0xc03fe0f7 <stext_lock+7359>: push %eax
0xc03fe0f8 <stext_lock+7360>: lea 0xc04e79c0,%eax
0xc03fe0fe <stext_lock+7366>: call 0xc01062d4 <__write_lock_failed>
0xc03fe103 <stext_lock+7371>: pop %eax
0xc03fe104 <stext_lock+7372>: jmp 0xc01400a3 <unregister_chrdev+79>
0xc03fe109 <stext_lock+7377>: cmpb $0x0,0xc05dd400
0xc03fe110 <stext_lock+7384>: repz nop
(gdb) x/30i 0xc0403050
0xc0403050 <stext_lock+27672>: nop
0xc0403051 <stext_lock+27673>: jle 0xc0403048 <stext_lock+27664>
0xc0403053 <stext_lock+27675>: jmp 0xc0292971 <find_compressor+53>
0xc0403058 <stext_lock+27680>: cmpb $0x0,0xc057d540
0xc040305f <stext_lock+27687>: repz nop
0xc0403061 <stext_lock+27689>: jle 0xc0403058 <stext_lock+27680>
0xc0403063 <stext_lock+27691>: jmp 0xc0292ac0 <ppp_create_interface+72>
0xc0403068 <stext_lock+27696>: cmpb $0x0,0xc057d540
0xc040306f <stext_lock+27703>: repz nop
0xc0403071 <stext_lock+27705>: jle 0xc0403068 <stext_lock+27696>
0xc0403073 <stext_lock+27707>: jmp 0xc0292df0 <ppp_destroy_interface+68>
0xc0403078 <stext_lock+27712>: cmpb $0x0,(%ebx)
0xc040307b <stext_lock+27715>: repz nop
0xc040307d <stext_lock+27717>: jle 0xc0403078 <stext_lock+27712>
0xc040307f <stext_lock+27719>: jmp 0xc0292e60 <ppp_destroy_interface+180>
0xc0403084 <stext_lock+27724>: cmpb $0x0,(%ebx)
0xc0403087 <stext_lock+27727>: repz nop
0xc0403089 <stext_lock+27729>: jle 0xc0403084 <stext_lock+27724>
0xc040308b <stext_lock+27731>: jmp 0xc0292eb0 <ppp_destroy_interface+260>
0xc0403090 <stext_lock+27736>: cmpb $0x0,(%ebx)
0xc0403093 <stext_lock+27739>: repz nop
0xc0403095 <stext_lock+27741>: jle 0xc0403090 <stext_lock+27736>
0xc0403097 <stext_lock+27743>: jmp 0xc0292f40 <ppp_destroy_interface+404>
0xc040309c <stext_lock+27748>: cmpb $0x0,(%ebx)
0xc040309f <stext_lock+27751>: repz nop
0xc04030a1 <stext_lock+27753>: jle 0xc040309c <stext_lock+27748>
0xc04030a3 <stext_lock+27755>: jmp 0xc0293014 <ppp_destroy_interface+616>
0xc04030a8 <stext_lock+27760>: push %eax
0xc04030a9 <stext_lock+27761>: push %ecx
0xc04030aa <stext_lock+27762>: push %edx
(gdb) p stext_lock
$1 = {<text variable, no debug info>} 0xc03fc438 <stext_lock>
(gdb) p stext_lock+0x1c82
$1 = (<text variable, no debug info> *) 0xc03fe0ba <stext_lock+7298>
(gdb) p stext_lock+0x6c30
$2 = (<text variable, no debug info> *) 0xc0403068 <stext_lock+27696>
--
Frank
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2)
2001-12-24 20:56 ` Frank van Maarseveen
@ 2001-12-24 21:22 ` Manfred Spraul
[not found] ` <005c01c18cc3$305283f0$010411ac@local>
1 sibling, 0 replies; 5+ messages in thread
From: Manfred Spraul @ 2001-12-24 21:22 UTC (permalink / raw)
To: Frank van Maarseveen; +Cc: linux-kernel
From: "Frank van Maarseveen" <F.vanMaarseveen@inter.NL.net>
>
> 0xc03fe0b1 <stext_lock+7289>: cmpb $0x0,0xc05dd400
> 0xc03fe0b8 <stext_lock+7296>: repz nop
> 0xc03fe0ba <stext_lock+7298>: jle 0xc03fe0b1 <stext_lock+7289>
> 0xc03fe0bc <stext_lock+7300>: jmp 0xc013fde0 <get_chrfops+240>
I assume 0xd05dd400 is kernel_flag.
> 0xc0403068 <stext_lock+27696>: cmpb $0x0,0xc057d540
> 0xc040306f <stext_lock+27703>: repz nop
> 0xc0403071 <stext_lock+27705>: jle 0xc0403068 <stext_lock+27696>
> 0xc0403073 <stext_lock+27707>: jmp 0xc0292df0 <ppp_destroy_interface+68>
And 0xc057d540 is all_ppp_lock.
The only obvious abuse is that both ppp_destroy_interface and ppp_create_interface
call rtnl_lock (that's a semaphore) with the spinlock acquired.
--
Manfred
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2)
[not found] ` <005c01c18cc3$305283f0$010411ac@local>
@ 2001-12-25 12:42 ` Frank van Maarseveen
0 siblings, 0 replies; 5+ messages in thread
From: Frank van Maarseveen @ 2001-12-25 12:42 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel
On Mon, Dec 24, 2001 at 10:36:11PM +0100, Manfred Spraul wrote:
> If you are brave, you can try the attached patch.
Your assuptions about the address of kernel_flag and
all_ppp_lock were correct (except for an obvious typo).
The patch seems to work, no deadlocks so far but I'll keep
an eye on it.
Thanks,
--
Frank
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2001-12-25 12:42 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-12-24 20:17 <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2) Manfred Spraul
2001-12-24 20:56 ` Frank van Maarseveen
2001-12-24 21:22 ` Manfred Spraul
[not found] ` <005c01c18cc3$305283f0$010411ac@local>
2001-12-25 12:42 ` Frank van Maarseveen
-- strict thread matches above, loose matches on Subject: below --
2001-12-23 22:53 <=2.4.17 deadlock (RedHat 7.2, SMP) Frank van Maarseveen
2001-12-24 19:48 ` <=2.4.17 deadlock (RedHat 7.2, SMP, ext3 related?) (2) Frank van Maarseveen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox