* [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Martin Diehl @ 2003-05-15 13:14 UTC
To: linux-kernel; +Cc: Jean Tourrilhes, DevilKin

Hi,

it seems we can run into a mutual deadlock in the unregister_netdev() path
with CONFIG_HOTPLUG=y. I managed to reproduce an irda-user report, leading
to the following description:

* killing irattach (userland daemon comparable to pppd) starts closing the
  irda tty-ldisc

* there we call unregister_netdev() on behalf of the (already closed)
  irda0 network device.

* unregister_netdev() takes rtnl_lock

* further down in unregister_netdevice(), with CONFIG_HOTPLUG, the network
  layer wants to call the userland hotplug stuff

* the request to fork the usermodehelper gets queued for the events/0
  workqueue (aka keventd) and we block waiting for completion with rtnl
  still held.

* at this moment, for some reason, keventd has a linkwatch_event()
  apparently already scheduled before the usermode helper. So we run into
  linkwatch_event(), which tries to get rtnl_lock.

-> mutual deadlock: keventd is waiting for rtnl_lock, which is still held by
   unregister_netdev, which in turn blocks waiting for completion of work
   scheduled for keventd.

I can reproduce this with 2.5.69 with CONFIG_HOTPLUG enabled, no matter
what /proc/sys/kernel/hotplug is; even /bin/true is sufficient. I've no
idea why I get this with irda0 but not with eth0, for example.
FWIW the kernel is SMP running on UP without preempt.

As I don't see how the irda stuff could cause unregister_netdev() to
schedule the hotplug stuff with some linkwatch_event already scheduled,
I've no idea what the real problem and fix might be.

Below is a commented calltrace caught right when it hangs as described.
Thanks
Martin

-----------------------------

> May 14 13:14:17 laptop kernel: events/0 D C12FDF04 412092 3 1 4 2 (L-TLB)
> May 14 13:14:17 laptop kernel: Call Trace:
> May 14 13:14:17 laptop kernel: [__down+150/256] __down+0x96/0x100
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [__down_failed+8/12] __down_failed+0x8/0xc
> May 14 13:14:17 laptop kernel: [.text.lock.rtnetlink+5/54] .text.lock.rtnetlink+0x5/0x36
> May 14 13:14:17 laptop kernel: [linkwatch_event+29/48] linkwatch_event+0x1d/0x30
> May 14 13:14:17 laptop kernel: [worker_thread+511/736] worker_thread+0x1ff/0x2e0
> May 14 13:14:17 laptop kernel: [linkwatch_event+0/48] linkwatch_event+0x0/0x30
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [worker_thread+0/736] worker_thread+0x0/0x2e0
> May 14 13:14:17 laptop kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

This is the keventd thread. It has some work scheduled for the network
layer, namely linkwatch_event(), which is currently blocked trying to get
the rtnl_lock semaphore.

> May 14 13:14:17 laptop kernel: irattach D 00000000 4283667124 400 1 537 396 (NOTLB)
> May 14 13:14:17 laptop kernel: Call Trace:
> May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> May 14 13:14:17 laptop kernel: [wait_for_completion+153/224] wait_for_completion+0x99/0xe0 (5)
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [queue_work+132/160] queue_work+0x84/0xa0 (4)
> May 14 13:14:17 laptop kernel: [call_usermodehelper+257/272] call_usermodehelper+0x101/0x110
> May 14 13:14:17 laptop kernel: [__call_usermodehelper+0/112] __call_usermodehelper+0x0/0x70
> May 14 13:14:17 laptop kernel: [vsprintf+39/48] vsprintf+0x27/0x30
> May 14 13:14:17 laptop kernel: [sprintf+31/48] sprintf+0x1f/0x30
> May 14 13:14:17 laptop kernel: [net_run_sbin_hotplug+174/195] net_run_sbin_hotplug+0xae/0xc3 (3)
> May 14 13:14:17 laptop kernel: [try_to_wake_up+296/464] try_to_wake_up+0x128/0x1d0
> May 14 13:14:17 laptop kernel: [pfifo_fast_reset+158/160] pfifo_fast_reset+0x9e/0xa0
> May 14 13:14:17 laptop kernel: [qdisc_destroy+158/160] qdisc_destroy+0x9e/0xa0
> May 14 13:14:17 laptop kernel: [unregister_netdevice+211/608] unregister_netdevice+0xd3/0x260
> May 14 13:14:17 laptop kernel: [_end+282800068/1070304612] sirdev_dtor+0x0/0x20 [sir_dev] (2)
> May 14 13:14:17 laptop kernel: [unregister_netdev+24/48] unregister_netdev+0x18/0x30 (1)
> May 14 13:14:17 laptop kernel: [_end+282800429/1070304612] sirdev_put_instance+0x149/0x1ad [sir_dev]
> May 14 13:14:17 laptop kernel: [_end+282804705/1070304612] __func__.9+0x0/0x14 [sir_dev]
> May 14 13:14:17 laptop kernel: [_end+282131315/1070304612] irtty_close+0x4f/0x120 [irtty_sir]
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [tty_set_ldisc+1091/1200] tty_set_ldisc+0x443/0x4b0
> May 14 13:14:17 laptop kernel: [uart_wait_until_sent+144/224] uart_wait_until_sent+0x90/0xe0
> May 14 13:14:17 laptop kernel: [tty_wait_until_sent+243/272] tty_wait_until_sent+0xf3/0x110
> May 14 13:14:17 laptop kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
> May 14 13:14:17 laptop kernel: [sock_destroy_inode+27/32] sock_destroy_inode+0x1b/0x20
> May 14 13:14:17 laptop kernel: [_end+282132178/1070304612] +0x15a/0x16c [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282130740/1070304612] irtty_open+0x0/0x1f0 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282131236/1070304612] irtty_close+0x0/0x120 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282130132/1070304612] irtty_ioctl+0x0/0x260 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282129076/1070304612] irtty_receive_buf+0x0/0xc0 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282129268/1070304612] irtty_receive_room+0x0/0x30 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282129316/1070304612] irtty_write_wakeup+0x0/0x40 [irtty_sir]
> May 14 13:14:17 laptop kernel: [_end+282134820/1070304612] +0x0/0xe0 [irtty_sir]
> May 14 13:14:17 laptop kernel: [sys_ioctl+256/656] sys_ioctl+0x100/0x290
> May 14 13:14:17 laptop kernel: [syscall_call+7/11] syscall_call+0x7/0xb

Ok, nice trace btw: The last printk from sir_dev was at (1) before we
called unregister_netdev(), which in turn acquired rtnl_lock (2). Due to
the disappearing irda0 device (and CONFIG_HOTPLUG=y) the network layer
decided to call the hotplug stuff (3). To fork the usermode helper, it
scheduled some work for keventd (4). Finally we block waiting for
completion until keventd finishes wait4 on the usermodehelper (5).

Unfortunately we are waiting for completion with rtnl still locked, and
keventd apparently has the linkwatch_event() scheduled before the
usermodehelper -> mutual deadlock between irattach and keventd!
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Jean Tourrilhes @ 2003-05-15 20:12 UTC
To: Martin Diehl; +Cc: Jeff Garzik, Linux kernel mailing list

	Greg,

	This is a HotPlug problem, so would you mind forwarding this
to the relevant person and help Martin ?
	Thanks in advance...

	Jean
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Greg KH @ 2003-05-15 20:19 UTC
To: jt; +Cc: Martin Diehl, Jeff Garzik, Linux kernel mailing list

On Thu, May 15, 2003 at 01:12:55PM -0700, jt@bougret.hpl.hp.com wrote:
> 	Greg,
>
> 	This is a HotPlug problem, so would you mind forwarding this
> to the relevant person and help Martin ?

But it's a networking subsystem hotplug problem, right? That's way out
of my league.

I do agree it looks like a real problem, Martin did a great job in
tracking this down.

thanks,

greg k-h
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Jean Tourrilhes @ 2003-05-15 20:25 UTC
To: Greg KH; +Cc: Martin Diehl, Jeff Garzik, Linux kernel mailing list

On Thu, May 15, 2003 at 01:19:36PM -0700, Greg KH wrote:
> On Thu, May 15, 2003 at 01:12:55PM -0700, jt@bougret.hpl.hp.com wrote:
> > 	Greg,
> >
> > 	This is a HotPlug problem, so would you mind forwarding this
> > to the relevant person and help Martin ?
>
> But it's a networking subsystem hotplug problem, right? That's way out
> of my league.

	That's why I say "forwarding", I know that we are all humans
after all ;-).

> I do agree it looks like a real problem, Martin did a great job in
> tracking this down.

	Yes, I'm glad he is back, that way I can dedicate a bit more
time to pending wireless stuff ;-)

> thanks,
>
> greg k-h

	Jean
* (no subject)
  [not found] <PAO-EX01Cv3uS7sBdxk00001183@pao-ex01.pao.digeo.com>
From: David S. Miller @ 2003-05-16 0:53 UTC
To: akpm; +Cc: lists, linux-kernel, jt

Way too invasive, and this adds bugs to the ipmr.c code.

I'd much rather see /sbin/hotplug be able to handle things
asynchronously.
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Andrew Morton @ 2003-05-16 1:12 UTC
To: David S. Miller; +Cc: lists, linux-kernel, jt

"David S. Miller" <davem@redhat.com> wrote:
>
> Way too invasive, and this adds bugs to the ipmr.c code.

Thought you'd say that.

My mail client munched the context. For reference:

> Martin Diehl <lists@mdiehl.de> wrote:
> >
> > [ unregister_netdevice calls call_usermodehelper which waits for keventd to
> >   pick up the subprocess_info, but keventd is blocked on rtnl_lock, which
> >   unregister_netdev took ]
> >
>
> The nice way to fix this is to change unregister_netdev so it runs
> net_run_sbin_hotplug() outside rtnl_lock.
>
> Problem is, it's hard. I just knocked up the below patch, and it still
> needs work. Mainly because of the tremendously deep codepaths which call
> unregister_netdevice(), knowing that they are under rtnl_lock().
>
> It would be nice to clean all that up, and this patch actually contains
> good cleanups and a couple of bugfixes. But I don't think it's going to
> get there.
>
>
> The other way to fix it is to make call_usermodehelper() more async. That
> means kmallocing the sub_info, the work struct and the string arrays and
> all the strings. This is more general, and will probably fix other
> keventd-related deadlocks. But it is unattractive.
>
>
> Or we could change linkwatch to not take rtnl_lock() by some means. That's
> even less general.
>
>
> David, any comments? You think the below approach should be pursued?

"David S. Miller" <davem@redhat.com> wrote:
>
> I'd much rather see /sbin/hotplug be able to handle things
> asynchronously.

Yeah, I'm inclined to agree. I'll take a look at it.

Meanwhile please take a look at the leftover cleanups. It fixes a bug in
drivers/net/hamradio/dmascc.c too.


 25-akpm/drivers/net/hamradio/dmascc.c  |    4 +---
 25-akpm/drivers/net/irda/ali-ircc.c    |    7 ++-----
 25-akpm/drivers/net/irda/donauboe.c    |    7 +------
 25-akpm/drivers/net/irda/irda-usb.c    |   10 ++++------
 25-akpm/drivers/net/irda/irport.c      |    7 ++-----
 25-akpm/drivers/net/irda/irtty.c       |    7 ++-----
 25-akpm/drivers/net/irda/nsc-ircc.c    |    7 ++-----
 25-akpm/drivers/net/irda/sa1100_ir.c   |    7 ++-----
 25-akpm/drivers/net/irda/toshoboe.c    |    8 ++------
 25-akpm/drivers/net/irda/w83977af_ir.c |    7 ++-----
 10 files changed, 20 insertions(+), 51 deletions(-)

diff -puN drivers/net/hamradio/dmascc.c~unregister_netdev-cleanup drivers/net/hamradio/dmascc.c
--- 25/drivers/net/hamradio/dmascc.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/hamradio/dmascc.c	Thu May 15 18:05:29 2003
@@ -325,9 +325,7 @@ void cleanup_module(void) {
 	/* Unregister devices */
 	for (i = 0; i < 2; i++) {
 		if (info->dev[i].name)
-			rtnl_lock();
-			unregister_netdevice(&info->dev[i]);
-			rtnl_unlock();
+			unregister_netdev(&info->dev[i]);
 	}

 	/* Reset board */
diff -puN drivers/net/irda/ali-ircc.c~unregister_netdev-cleanup drivers/net/irda/ali-ircc.c
--- 25/drivers/net/irda/ali-ircc.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/ali-ircc.c	Thu May 15 18:05:29 2003
@@ -390,11 +390,8 @@ static int __exit ali_ircc_close(struct
 	iobase = self->io.fir_base;

 	/* Remove netdevice */
-	if (self->netdev) {
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);

 	/* Release the PORT that this driver is using */
 	IRDA_DEBUG(4, "%s(), Releasing Region %03x\n", __FUNCTION__, self->io.fir_base);
diff -puN drivers/net/irda/donauboe.c~unregister_netdev-cleanup drivers/net/irda/donauboe.c
--- 25/drivers/net/irda/donauboe.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/donauboe.c	Thu May 15 18:05:29 2003
@@ -1577,12 +1577,7 @@ toshoboe_close (struct pci_dev *pci_dev)
 	}

 	if (self->netdev)
-	{
-		/* Remove netdevice */
-		rtnl_lock ();
-		unregister_netdevice (self->netdev);
-		rtnl_unlock ();
-	}
+		unregister_netdev(self->netdev);

 	kfree (self->ringbuf);
 	self->ringbuf = NULL;
diff -puN drivers/net/irda/irda-usb.c~unregister_netdev-cleanup drivers/net/irda/irda-usb.c
--- 25/drivers/net/irda/irda-usb.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/irda-usb.c	Thu May 15 18:05:29 2003
@@ -1231,12 +1231,10 @@ static inline int irda_usb_close(struct
 	ASSERT(self != NULL, return -1;);

 	/* Remove netdevice */
-	if (self->netdev) {
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		self->netdev = NULL;
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);
+	self->netdev = NULL;
+
 	/* Remove the speed buffer */
 	if (self->speed_buff != NULL) {
 		kfree(self->speed_buff);
diff -puN drivers/net/irda/irport.c~unregister_netdev-cleanup drivers/net/irda/irport.c
--- 25/drivers/net/irda/irport.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/irport.c	Thu May 15 18:05:29 2003
@@ -256,11 +256,8 @@ int irport_close(struct irport_cb *self)
 	self->dongle = NULL;

 	/* Remove netdevice */
-	if (self->netdev) {
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);

 	/* Release the IO-port that this driver is using */
 	IRDA_DEBUG(0 , "%s(), Releasing Region %03x\n",
diff -puN drivers/net/irda/irtty.c~unregister_netdev-cleanup drivers/net/irda/irtty.c
--- 25/drivers/net/irda/irtty.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/irtty.c	Thu May 15 18:05:29 2003
@@ -282,11 +282,8 @@ static void irtty_close(struct tty_struc
 	self->dongle = NULL;

 	/* Remove netdevice */
-	if (self->netdev) {
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);

 	self = hashbin_remove(irtty, (int) self, NULL);

diff -puN drivers/net/irda/nsc-ircc.c~unregister_netdev-cleanup drivers/net/irda/nsc-ircc.c
--- 25/drivers/net/irda/nsc-ircc.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/nsc-ircc.c	Thu May 15 18:05:29 2003
@@ -391,11 +391,8 @@ static int __exit nsc_ircc_close(struct
 	iobase = self->io.fir_base;

 	/* Remove netdevice */
-	if (self->netdev) {
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);

 	/* Release the PORT that this driver is using */
 	IRDA_DEBUG(4, "%s(), Releasing Region %03x\n",
diff -puN drivers/net/irda/sa1100_ir.c~unregister_netdev-cleanup drivers/net/irda/sa1100_ir.c
--- 25/drivers/net/irda/sa1100_ir.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/sa1100_ir.c	Thu May 15 18:05:29 2003
@@ -1122,11 +1122,8 @@ static void __exit sa1100_irda_exit(void
 {
 	struct net_device *dev = dev_get_drvdata(&sa1100ir_device.dev);

-	if (dev) {
-		rtnl_lock();
-		unregister_netdevice(dev);
-		rtnl_unlock();
-	}
+	if (dev)
+		unregister_netdev(dev);

 	sys_device_unregister(&sa1100ir_device);
 	driver_unregister(&sa1100ir_driver);
diff -puN drivers/net/irda/toshoboe.c~unregister_netdev-cleanup drivers/net/irda/toshoboe.c
--- 25/drivers/net/irda/toshoboe.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/toshoboe.c	Thu May 15 18:05:29 2003
@@ -679,12 +679,8 @@ toshoboe_remove (struct pci_dev *pci_dev
 		self->recv_bufs[i] = NULL;
 	}

-	if (self->netdev) {
-		/* Remove netdevice */
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);

 	kfree (self->taskfilebuf);
 	self->taskfilebuf = NULL;
diff -puN drivers/net/irda/w83977af_ir.c~unregister_netdev-cleanup drivers/net/irda/w83977af_ir.c
--- 25/drivers/net/irda/w83977af_ir.c~unregister_netdev-cleanup	Thu May 15 18:05:29 2003
+++ 25-akpm/drivers/net/irda/w83977af_ir.c	Thu May 15 18:05:29 2003
@@ -299,11 +299,8 @@ static int w83977af_close(struct w83977a
 #endif /* CONFIG_USE_W977_PNP */

 	/* Remove netdevice */
-	if (self->netdev) {
-		rtnl_lock();
-		unregister_netdevice(self->netdev);
-		rtnl_unlock();
-	}
+	if (self->netdev)
+		unregister_netdev(self->netdev);

 	/* Release the PORT that this driver is using */
 	IRDA_DEBUG(0 , "%s(), Releasing Region %03x\n",

_
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Jean Tourrilhes @ 2003-05-16 1:27 UTC
To: Andrew Morton; +Cc: David S. Miller, linux-kernel

On Thu, May 15, 2003 at 06:12:11PM -0700, Andrew Morton wrote:
>
> Meanwhile please take a look at the leftover cleanups. It fixes a bug in
> drivers/net/hamradio/dmascc.c too.
>
>
>  25-akpm/drivers/net/hamradio/dmascc.c  |    4 +---
>  25-akpm/drivers/net/irda/ali-ircc.c    |    7 ++-----
>  25-akpm/drivers/net/irda/donauboe.c    |    7 +------
>  25-akpm/drivers/net/irda/irda-usb.c    |   10 ++++------
>  25-akpm/drivers/net/irda/irport.c      |    7 ++-----
>  25-akpm/drivers/net/irda/irtty.c       |    7 ++-----
>  25-akpm/drivers/net/irda/nsc-ircc.c    |    7 ++-----
>  25-akpm/drivers/net/irda/sa1100_ir.c   |    7 ++-----
>  25-akpm/drivers/net/irda/toshoboe.c    |    8 ++------
>  25-akpm/drivers/net/irda/w83977af_ir.c |    7 ++-----
>  10 files changed, 20 insertions(+), 51 deletions(-)

	IrDA part of the patch applied and compiled fine here (I was
worried about header issues). And this doesn't conflict with the patches I
sent to Jeff ;-)
	I'll add that to my patch queue, just in case ;-)
	Thanks...

	Jean
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Martin Diehl @ 2003-05-23 7:06 UTC
To: Andrew Morton; +Cc: David S. Miller, Greg KH, linux-kernel, Jean Tourrilhes

On Thu, 15 May 2003, Andrew Morton wrote:

> > > [ unregister_netdevice calls call_usermodehelper which waits for keventd to
> > >   pick up the subprocess_info, but keventd is blocked on rtnl_lock, which
> > >   unregister_netdev took ]
>
> "David S. Miller" <davem@redhat.com> wrote:
> >
> > I'd much rather see /sbin/hotplug be able to handle things
> > asynchronously.
>
> Yeah, I'm inclined to agree. I'll take a look at it.

Asking just because there was another user hitting this deadlock: it seems
with linux-irda we have a very good test case for reproducing this issue.
So I'd be happy to test patches if this might help.

I've also looked into the code to see if I could do something myself.
Personally, I also think the best way would be to modify the kernel
hotplug part so we can call it asynchronously under rtnl-lock.
Unfortunately, given that there are already several layers (schedule_work,
2 times kernel_thread and execve) stacked on top of each other, I'm pretty
much lost as to how to fix it without breaking other stuff.

I was also thinking about making net_run_sbin_hotplug asynchronous itself,
but I'm unsure how this might interact with call_usermodehelper(), namely
wrt. the wait=0 parameter.

I assume we all agree this issue isn't easy to resolve. May I suggest
adding it to the must-fix-before-2.6 list so it doesn't get lost? As
people tend to run with CONFIG_HOTPLUG=y, there would be a lot of trouble
with 2.6 otherwise.

Thanks.
Martin
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: David S. Miller @ 2003-05-23 6:59 UTC
To: lists; +Cc: akpm, greg, linux-kernel, jt

   From: Martin Diehl <lists@mdiehl.de>
   Date: Fri, 23 May 2003 09:06:10 +0200 (CEST)

   Asking just because there was another user hitting this deadlock:

It's fixed in current 2.5.x sources, wake up :-)
* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Martin Diehl @ 2003-05-23 9:38 UTC
To: David S. Miller; +Cc: akpm, Greg KH, linux-kernel, Jean Tourrilhes

On Thu, 22 May 2003, David S. Miller wrote:

>    Asking just because there was another user hitting this deadlock:
>
> It's fixed in current 2.5.x sources, wake up :-)

Oops, sorry for the noise, I hadn't noticed this yet.

But nope, unfortunately it's still hanging! I've just tested with
2.5.69-bk15. It runs into the same deadlock due to sleeping with rtnl
held. This time, however, it seems to be triggered from the sysfs side!

Thanks anyway!
Martin

-------------------------

May 23 11:07:31 srv kernel: events/0 D C02B05DC 4294946908 4 1 5 3 (L-TLB)
May 23 11:07:31 srv kernel: Call Trace:
May 23 11:07:31 srv kernel: [__down+197/368] __down+0xc5/0x170
May 23 11:07:31 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:31 srv kernel: [__down_failed+8/12] __down_failed+0x8/0xc
May 23 11:07:31 srv kernel: [.text.lock.rtnetlink+5/94] .text.lock.rtnetlink+0x5/0x5e
May 23 11:07:31 srv kernel: [linkwatch_event+33/48] linkwatch_event+0x21/0x30
May 23 11:07:32 srv kernel: [worker_thread+478/752] worker_thread+0x1de/0x2f0
May 23 11:07:32 srv kernel: [linkwatch_event+0/48] linkwatch_event+0x0/0x30
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [ret_from_fork+6/32] ret_from_fork+0x6/0x20
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [worker_thread+0/752] worker_thread+0x0/0x2f0
May 23 11:07:32 srv kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

May 23 11:07:32 srv kernel: irattach D 00000000 19710128 2109 1 2104 (NOTLB)
May 23 11:07:32 srv kernel: Call Trace:
May 23 11:07:32 srv kernel: [wait_for_completion+220/352] wait_for_completion+0xdc/0x160
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [__wake_up+83/144] __wake_up+0x53/0x90
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [call_usermodehelper+290/301] call_usermodehelper+0x122/0x12d
May 23 11:07:32 srv kernel: [__call_usermodehelper+0/96] __call_usermodehelper+0x0/0x60
May 23 11:07:32 srv kernel: [__call_usermodehelper+0/96] __call_usermodehelper+0x0/0x60
May 23 11:07:32 srv kernel: [sprintf+18/32] sprintf+0x12/0x20
May 23 11:07:32 srv kernel: [kset_hotplug+419/464] kset_hotplug+0x1a3/0x1d0
May 23 11:07:32 srv kernel: [kobject_del+75/96] kobject_del+0x4b/0x60
May 23 11:07:32 srv kernel: [class_device_del+166/192] class_device_del+0xa6/0xc0
May 23 11:07:32 srv kernel: [class_device_unregister+11/32] class_device_unregister+0xb/0x20
May 23 11:07:32 srv kernel: [unregister_netdevice+356/496] unregister_netdevice+0x164/0x1f0
May 23 11:07:32 srv kernel: [unregister_netdev+16/48] unregister_netdev+0x10/0x30
May 23 11:07:32 srv kernel: [_end+206658744/1070163436] +0x128/0x75c [sir_dev]
May 23 11:07:32 srv kernel: [_end+206653994/1070163436] sirdev_put_instance+0xfe/0x110 [sir_dev]
May 23 11:07:32 srv kernel: [tty_wait_until_sent+235/256] tty_wait_until_sent+0xeb/0x100
May 23 11:07:32 srv kernel: [_end+206513126/1070163436] irtty_close+0x3a/0x141 [irtty_sir]
May 23 11:07:32 srv kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
May 23 11:07:32 srv kernel: [tty_set_ldisc+205/464] tty_set_ldisc+0xcd/0x1d0
May 23 11:07:32 srv kernel: [serial8250_tx_empty+60/128] serial8250_tx_empty+0x3c/0x80
May 23 11:07:32 srv kernel: [uart_wait_until_sent+150/224] uart_wait_until_sent+0x96/0xe0
May 23 11:07:32 srv kernel: [_end+206514243/1070163436] +0x274/0x31d [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206512540/1070163436] irtty_open+0x0/0x210 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206513068/1070163436] irtty_close+0x0/0x141 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206511964/1070163436] irtty_ioctl+0x0/0x240 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206510924/1070163436] irtty_receive_buf+0x0/0xb0 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206511100/1070163436] irtty_receive_room+0x0/0x30 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206511148/1070163436] irtty_write_wakeup+0x0/0x40 [irtty_sir]
May 23 11:07:32 srv kernel: [_end+206517164/1070163436] +0x0/0x100 [irtty_sir]
May 23 11:07:32 srv kernel: [dput+28/608] dput+0x1c/0x260
May 23 11:07:32 srv kernel: [tty_ioctl+888/1152] tty_ioctl+0x378/0x480
May 23 11:07:32 srv kernel: [sys_ioctl+646/744] sys_ioctl+0x286/0x2e8
May 23 11:07:32 srv kernel: [sys_fcntl64+89/112] sys_fcntl64+0x59/0x70
May 23 11:07:32 srv kernel: [sys_fcntl64+101/112] sys_fcntl64+0x65/0x70
May 23 11:07:32 srv kernel: [syscall_call+7/11] syscall_call+0x7/0xb

* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: David S. Miller @ 2003-05-23 9:43 UTC
To: lists; +Cc: akpm, greg, linux-kernel, jt, shemminger

   From: Martin Diehl <lists@mdiehl.de>
   Date: Fri, 23 May 2003 11:38:38 +0200 (CEST)

   On Thu, 22 May 2003, David S. Miller wrote:

   >    Asking just because there was another user hitting this deadlock:
   >
   > It's fixed in current 2.5.x sources, wake up :-)

   Oops, sorry for the noise, I hadn't noticed this yet.

   But nope, unfortunately it's still hanging! I've just tested with
   2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
   held. This time however it seems it's triggered from the sysfs side!

Stephen, you need to do the device class stuff outside of the RTNL
lock please.

At least I didn't add this bug :-)

This should fix it.

--- net/core/dev.c.~1~	Fri May 23 02:42:37 2003
+++ net/core/dev.c	Fri May 23 02:43:20 2003
@@ -2754,6 +2754,8 @@
 		dev->next = NULL;
 
+		netdev_unregister_sysfs(dev);
+
 		netdev_wait_allrefs(dev);
 
 		BUG_ON(atomic_read(&dev->refcnt));
@@ -2841,8 +2843,6 @@
 	BUG_TRAP(!dev->master);
 
 	free_divert_blk(dev);
-
-	netdev_unregister_sysfs(dev);
 
 	spin_lock(&unregister_todo_lock);
 	dev->next = unregister_todo;
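
[Editor's sketch, not part of the original message.] The ordering problem that the patch addresses can be modelled in a few lines of userspace C. This is only a toy model under stated assumptions: the enum values and helper_completes() are hypothetical names chosen to mirror the functions in the traces, the single events/0 worker is assumed to run queued items strictly in order, and no kernel API is used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work items, named after the functions seen in the traces. */
enum work { WORK_LINKWATCH, WORK_USERMODEHELPER };

/*
 * Toy model of the single events/0 worker: items run strictly in queue
 * order. Returns true if the caller's wait for WORK_USERMODEHELPER can
 * finish, false if the worker gets stuck (or the helper was never queued).
 *
 * caller_holds_rtnl: whether the caller still holds the lock while it
 * sleeps waiting for the helper to complete.
 */
static bool helper_completes(const enum work *queue, size_t n,
                             bool caller_holds_rtnl)
{
    for (size_t i = 0; i < n; i++) {
        /* linkwatch needs rtnl; if the sleeping caller owns it, the
         * worker blocks here and nothing queued behind it ever runs. */
        if (queue[i] == WORK_LINKWATCH && caller_holds_rtnl)
            return false;
        /* The helper ran: its completion fires and the caller wakes up. */
        if (queue[i] == WORK_USERMODEHELPER)
            return true;
    }
    return false;
}
```

With the queue { WORK_LINKWATCH, WORK_USERMODEHELPER }, the model gets stuck when caller_holds_rtnl is true and finishes when it is false, which is the effect of moving the sysfs unregistration out from under the lock. With only WORK_USERMODEHELPER queued it finishes either way, which may be why the hang only shows up when a linkwatch event happens to be pending.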

* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Stian Jordet @ 2003-05-23 14:42 UTC
To: David S. Miller; +Cc: lists, akpm, greg, linux-kernel, jt, shemminger

On Fri, 23.05.2003 at 11:43, David S. Miller wrote:
>    From: Martin Diehl <lists@mdiehl.de>
>    Date: Fri, 23 May 2003 11:38:38 +0200 (CEST)
>
>    On Thu, 22 May 2003, David S. Miller wrote:
>
>    >    Asking just because there was another user hitting this deadlock:
>    >
>    > It's fixed in current 2.5.x sources, wake up :-)
>
>    Oops, sorry for the noise, I hadn't noticed this yet.
>
>    But nope, unfortunately it's still hanging! I've just tested with
>    2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
>    held. This time however it seems it's triggered from the sysfs side!
>
> Stephen, you need to do the device class stuff outside of the RTNL
> lock please.
>
> At least I didn't add this bug :-)
>
> This should fix it.

And so it did :-) Thanks.

Best regards,
Stian

* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Jean Tourrilhes @ 2003-05-23 16:46 UTC
To: David S. Miller; +Cc: lists, akpm, greg, linux-kernel, jt, shemminger

On Fri, May 23, 2003 at 02:43:08AM -0700, David S. Miller wrote:
>    From: Martin Diehl <lists@mdiehl.de>
>    Date: Fri, 23 May 2003 11:38:38 +0200 (CEST)
>
>    On Thu, 22 May 2003, David S. Miller wrote:
>
>    >    Asking just because there was another user hitting this deadlock:
>    >
>    > It's fixed in current 2.5.x sources, wake up :-)
>
>    Oops, sorry for the noise, I hadn't noticed this yet.
>
>    But nope, unfortunately it's still hanging! I've just tested with
>    2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
>    held. This time however it seems it's triggered from the sysfs side!
>
> Stephen, you need to do the device class stuff outside of the RTNL
> lock please.
>
> At least I didn't add this bug :-)
>
> This should fix it.

Thanks Dave, we are very much obliged!

Jean

* Re: [2.5.69] rtnl-deadlock with usermodehelper and keventd
From: Martin Diehl @ 2003-05-23 23:25 UTC
To: David S. Miller; +Cc: akpm, Greg KH, linux-kernel, Jean Tourrilhes, shemminger

On Fri, 23 May 2003, David S. Miller wrote:

>    But nope, unfortunately it's still hanging! I've just tested with
>    2.5.69-bk15. Running into the same deadlock due to sleeping with rtnl
>    held. This time however it seems it's triggered from the sysfs side!
>
> Stephen, you need to do the device class stuff outside of the RTNL
> lock please.
>
> At least I didn't add this bug :-)
>
> This should fix it.

Well, back online now pretty late ;-)

Yes, as was already reported, I can also confirm from testing that the
deadlock is gone now. Thanks for resolving this issue!

Just a minor question before the thread gets closed:

Don't we have the same problem in the register path? register_netdevice
runs under rtnl and calls netdev_register_sysfs. I've never seen a
deadlock there, but I'd expect this to sleep for hotplug usermode
completion as well.

Maybe this is just what you meant by your comment above ;-)

Martin

Thread overview: 13+ messages
2003-05-15 13:14 [2.5.69] rtnl-deadlock with usermodehelper and keventd Martin Diehl
2003-05-15 20:12 ` Jean Tourrilhes
2003-05-15 20:19 ` Greg KH
2003-05-15 20:25 ` Jean Tourrilhes
[not found] <PAO-EX01Cv3uS7sBdxk00001183@pao-ex01.pao.digeo.com>
2003-05-16 0:53 ` David S. Miller
2003-05-16 1:12 ` [2.5.69] rtnl-deadlock with usermodehelper and keventd Andrew Morton
2003-05-16 1:27 ` Jean Tourrilhes
2003-05-23 7:06 ` Martin Diehl
2003-05-23 6:59 ` David S. Miller
2003-05-23 9:38 ` Martin Diehl
2003-05-23 9:43 ` David S. Miller
2003-05-23 14:42 ` Stian Jordet
2003-05-23 16:46 ` Jean Tourrilhes
2003-05-23 23:25 ` Martin Diehl