public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [2.5.69] rtnl-deadlock with usermodehelper and keventd
@ 2003-05-15 13:14 Martin Diehl
  2003-05-15 20:12 ` Jean Tourrilhes
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Diehl @ 2003-05-15 13:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jean Tourrilhes, DevilKin


Hi,

seems we may run into mutual deadlock in the unregister_netdev() path with 
CONFIG_HOTPLUG=y. I managed to reproduce an irda-user report leading to 
the following description:

* killing irattach (userland daemon comparable to pppd) starts closing the
  irda tty-ldisc

* there we call unregister_netdev() on behalf of the (already closed) 
  irda0 network device.

* unregister_netdev() takes rtnl_lock

* further down in unregister_netdevice() with CONFIG_HOTPLUG the network 
  layers wants to call userland hotplug stuff

* the request to fork the usermodehelper gets queued for the event/0 
  workqueue (aka keventd) and we are blocking with rtnl still acquired for 
  completion.

* at this moment for some reason keventd has a linkwatch_event() 
  apparently already scheduled before the usermode helper. So we run into
  linkwatch_event() with tries to get rtnl_lock.

-> mutual deadlock: keventd waiting for rtnl_lock which is still hold by 
unregister_netdev blocking for completion of work scheduled for keventd.

I can reproduce this with 2.5.69 with CONFIG_HOTPLUG enabled, no matter 
what /proc/sys/kernel/hotplug is, even /bin/true is sufficient. I've no 
idea why I get this with irda0 but not with eth0 for example.
FWIW kernel is SMP running on UP without preempt.

As I don't see how the irda stuff could cause unregister_netdev() to 
schedule the hotplug stuff with some linkwatch_event already scheduled 
I've no idea what the real problem and fix might be.

Below a commented calltrace catched right when it hangs as described.

Thanks
Martin

-----------------------------

> May 14 13:14:17 laptop kernel: events/0      D C12FDF04 412092     3      1             4     2 (L-TLB)
> May 14 13:14:17 laptop kernel: Call Trace:
> May 14 13:14:17 laptop kernel:  [__down+150/256] __down+0x96/0x100
> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [__down_failed+8/12] __down_failed+0x8/0xc
> May 14 13:14:17 laptop kernel:  [.text.lock.rtnetlink+5/54] .text.lock.rtnetlink+0x5/0x36
> May 14 13:14:17 laptop kernel:  [linkwatch_event+29/48] linkwatch_event+0x1d/0x30
> May 14 13:14:17 laptop kernel:  [worker_thread+511/736] worker_thread+0x1ff/0x2e0
> May 14 13:14:17 laptop kernel:  [linkwatch_event+0/48] linkwatch_event+0x0/0x30
> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [ret_from_fork+6/20] ret_from_fork+0x6/0x14
> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [worker_thread+0/736] worker_thread+0x0/0x2e0
> May 14 13:14:17 laptop kernel:  [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

This is the keventd-thread. It has some work scheduled for the network 
layer, namely linkwatch_event(). This is currently blocking to get the 
rtnl_lock semaphore.


> May 14 13:14:17 laptop kernel: irattach      D 00000000 4283667124   400      1           537   396 (NOTLB)
> May 14 13:14:17 laptop kernel: Call Trace:
> May 14 13:14:17 laptop kernel:  [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> May 14 13:14:17 laptop kernel:  [wait_for_completion+153/224] wait_for_completion+0x99/0xe0

(5)

> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [queue_work+132/160] queue_work+0x84/0xa0

(4)

> May 14 13:14:17 laptop kernel:  [call_usermodehelper+257/272] call_usermodehelper+0x101/0x110
> May 14 13:14:17 laptop kernel:  [__call_usermodehelper+0/112] __call_usermodehelper+0x0/0x70
> May 14 13:14:17 laptop kernel:  [vsprintf+39/48] vsprintf+0x27/0x30
> May 14 13:14:17 laptop kernel:  [sprintf+31/48] sprintf+0x1f/0x30
> May 14 13:14:17 laptop kernel:  [net_run_sbin_hotplug+174/195] net_run_sbin_hotplug+0xae/0xc3

(3)

> May 14 13:14:17 laptop kernel:  [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> May 14 13:14:17 laptop kernel:  [pfifo_fast_reset+158/160] pfifo_fast_reset+0x9e/0xa0
> May 14 13:14:17 laptop kernel:  [qdisc_destroy+158/160] qdisc_destroy+0x9e/0xa0
> May 14 13:14:17 laptop kernel:  [unregister_netdevice+211/608] unregister_netdevice+0xd3/0x260
> May 14 13:14:17 laptop kernel:  [_end+282800068/1070304612] sirdev_dtor+0x0/0x20 [sir_dev]

(2)

> May 14 13:14:17 laptop kernel:  [unregister_netdev+24/48] unregister_netdev+0x18/0x30

(1)

> May 14 13:14:17 laptop kernel:  [_end+282800429/1070304612] sirdev_put_instance+0x149/0x1ad [sir_dev]
> May 14 13:14:17 laptop kernel:  [_end+282804705/1070304612] __func__.9+0x0/0x14 [sir_dev]
> May 14 13:14:17 laptop kernel:  [_end+282131315/1070304612] irtty_close+0x4f/0x120 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [tty_set_ldisc+1091/1200] tty_set_ldisc+0x443/0x4b0
> May 14 13:14:17 laptop kernel:  [uart_wait_until_sent+144/224] uart_wait_until_sent+0x90/0xe0
> May 14 13:14:17 laptop kernel:  [tty_wait_until_sent+243/272] tty_wait_until_sent+0xf3/0x110
> May 14 13:14:17 laptop kernel:  [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel:  [sock_destroy_inode+27/32] sock_destroy_inode+0x1b/0x20
> May 14 13:14:17 laptop kernel:  [_end+282132178/1070304612] +0x15a/0x16c [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282130740/1070304612] irtty_open+0x0/0x1f0 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282131236/1070304612] irtty_close+0x0/0x120 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282130132/1070304612] irtty_ioctl+0x0/0x260 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282129076/1070304612] irtty_receive_buf+0x0/0xc0 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282129268/1070304612] irtty_receive_room+0x0/0x30 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282129316/1070304612] irtty_write_wakeup+0x0/0x40 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [_end+282134820/1070304612] +0x0/0xe0 [irtty_sir]
> May 14 13:14:17 laptop kernel:  [sys_ioctl+256/656] sys_ioctl+0x100/0x290
> May 14 13:14:17 laptop kernel:  [syscall_call+7/11] syscall_call+0x7/0xb

Ok, nice trace btw: The last printk from sir_dev was at (1) before we 
called unregister_netdev() - which in turn acquired rtnl_lock (2). Due to 
the disappearing irda0 device (and CONFIG_HOTPLUG=y) the network layer 
decided to call the hotplug stuff (3). For this to fork the usermode 
helper, it scheduled some work for keventd (4). Finally we are blocking 
for completion until keventd finishes wait4 usermodehelper (5).

Unfortunately we are blocking for completion with rtnl still locked and 
keventd apparently having the linkwatch_event() scheduled before the 
usermodehelper -> mutual deadlock between irattach and keventd!


^ permalink raw reply	[flat|nested] 13+ messages in thread
[parent not found: <PAO-EX01Cv3uS7sBdxk00001183@pao-ex01.pao.digeo.com>]

end of thread, other threads:[~2003-05-23 23:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-05-15 13:14 [2.5.69] rtnl-deadlock with usermodehelper and keventd Martin Diehl
2003-05-15 20:12 ` Jean Tourrilhes
2003-05-15 20:19   ` Greg KH
2003-05-15 20:25     ` Jean Tourrilhes
     [not found] <PAO-EX01Cv3uS7sBdxk00001183@pao-ex01.pao.digeo.com>
2003-05-16  0:53 ` David S. Miller
2003-05-16  1:12   ` [2.5.69] rtnl-deadlock with usermodehelper and keventd Andrew Morton
2003-05-16  1:27     ` Jean Tourrilhes
2003-05-23  7:06     ` Martin Diehl
2003-05-23  6:59       ` David S. Miller
2003-05-23  9:38         ` Martin Diehl
2003-05-23  9:43           ` David S. Miller
2003-05-23 14:42             ` Stian Jordet
2003-05-23 16:46             ` Jean Tourrilhes
2003-05-23 23:25             ` Martin Diehl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox