* [patch] work around/fix deadlock in the bcm43xx driver by making netlink irq safe
From: Arjan van de Ven @ 2006-06-30 14:24 UTC
To: netdev
Hi,
there is a complex deadlock in the bcm43xx driver that apparently can
only be solved by rewriting the softmac layer, or by the patch below,
which makes the netlink lock irq safe. (Details about the deadlock are
available but not really relevant to this discussion.)
Please consider this patch for 2.6.18.
Signed-off-by: Arjan van de Ven <arjan@Linux.intel.com>
---
net/netlink/af_netlink.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Index: linux-2.6.17-mm4/net/netlink/af_netlink.c
===================================================================
--- linux-2.6.17-mm4.orig/net/netlink/af_netlink.c
+++ linux-2.6.17-mm4/net/netlink/af_netlink.c
@@ -157,7 +157,7 @@ static void netlink_sock_destruct(struct
 static void netlink_table_grab(void)
 {
-	write_lock_bh(&nl_table_lock);
+	write_lock_irq(&nl_table_lock);
 	if (atomic_read(&nl_table_users)) {
 		DECLARE_WAITQUEUE(wait, current);
@@ -167,9 +167,9 @@ static void netlink_table_grab(void)
 			set_current_state(TASK_UNINTERRUPTIBLE);
 			if (atomic_read(&nl_table_users) == 0)
 				break;
-			write_unlock_bh(&nl_table_lock);
+			write_unlock_irq(&nl_table_lock);
 			schedule();
-			write_lock_bh(&nl_table_lock);
+			write_lock_irq(&nl_table_lock);
 		}
 		__set_current_state(TASK_RUNNING);
@@ -179,7 +179,7 @@ static void netlink_table_grab(void)
 static __inline__ void netlink_table_ungrab(void)
 {
-	write_unlock_bh(&nl_table_lock);
+	write_unlock_irq(&nl_table_lock);
 	wake_up(&nl_table_wait);
 }
* Re: [patch] work around/fix deadlock in the bcm43xx driver by making netlink irq safe
From: Arjan van de Ven @ 2006-06-30 14:45 UTC
To: Joseph Jezak; +Cc: netdev
Joseph Jezak wrote:
> Can you provide the details to the list? I'll look into getting
> SoftMAC fixed if you do.
>
sure
the basic issue is that bcm43xx does its rx processing in a softirq, and
holds the bcm->irq_lock during that time. The rx processing calls into the
softmac layer, which in turn calls into netlink.
With this you can get a deadlock that looks like this:
cpu 0: user context                        | cpu1: softirq context
netlink_table_grab takes nl_table_lock as  | take bcm->irq_lock in
write_lock_bh, but leaves irqs enabled     | bcm43xx_interrupt_tasklet()
                                           | which then in a few steps
                                           | leads to a call to
                                           | bcm43xx_rx
hardirq comes in and the isr tries to take | in bcm43xx_rx, call
bcm->irq_lock but has to wait on cpu 1     | ieee80211_rx_mgt which
                                           | leads to a call to
                                           | wireless_send_event which
                                           | tries to take nl_table_lock
                                           | for read but has to wait
                                           | for cpu0
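[Editorial sketch] The cycle in the table can be modelled as a tiny wait-for graph in userspace C. This is not kernel code: the `context` struct, the lock enum, and `scenario_deadlocks()` are hypothetical names invented for illustration, and the model assumes each CPU holds at most one lock and waits on at most one. It only shows why the two CPUs end up waiting on each other, and why keeping the hardirq from running on cpu 0 while nl_table_lock is held (which is what write_lock_irq does) breaks the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two locks involved in the report. */
enum lock_id { NO_LOCK = -1, NL_TABLE_LOCK = 0, BCM_IRQ_LOCK = 1 };

/* One CPU: the lock it currently holds and the lock it is spinning on. */
struct context {
    enum lock_id holds;
    enum lock_id wants;
};

/* Return the index of the context holding `lock`, or -1 if it is free. */
static int holder_of(const struct context *ctx, size_t n, enum lock_id lock)
{
    for (size_t i = 0; i < n; i++)
        if (ctx[i].holds == lock)
            return (int)i;
    return -1;
}

/* Follow "wants -> holder" edges; a path that returns to its start is a deadlock. */
bool has_deadlock(const struct context *ctx, size_t n)
{
    assert(ctx != NULL);
    for (size_t start = 0; start < n; start++) {
        size_t cur = start;
        for (size_t steps = 0; steps < n; steps++) {
            if (ctx[cur].wants == NO_LOCK)
                break;                      /* not waiting on anything */
            int next = holder_of(ctx, n, ctx[cur].wants);
            if (next < 0)
                break;                      /* wanted lock is free */
            if ((size_t)next == start)
                return true;                /* cycle found */
            cur = (size_t)next;
        }
    }
    return false;
}

/*
 * The two-CPU scenario from the table. With the _bh variant, a hardirq can
 * arrive on cpu 0 while it holds nl_table_lock, so cpu 0 ends up spinning on
 * the driver lock. With the _irq variant (modelled by the flag), the hardirq
 * cannot run there, so cpu 0 waits on nothing and eventually unlocks.
 */
bool scenario_deadlocks(bool nl_lock_disables_irqs)
{
    struct context cpus[2] = {
        { NL_TABLE_LOCK, nl_lock_disables_irqs ? NO_LOCK : BCM_IRQ_LOCK },
        { BCM_IRQ_LOCK, NL_TABLE_LOCK },
    };
    return has_deadlock(cpus, 2);
}
```

Calling `scenario_deadlocks(false)` models the unpatched _bh locking, and `scenario_deadlocks(true)` models the patched _irq locking.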
according to Michael Buesch, the softmac layer should queue the packet
internally for another softirq, similar to what DeviceScape does, so that
the rx softirq can just drop all packets quickly and drop its locks.
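[Editorial sketch] The queue-and-defer idea described above can be illustrated in plain userspace C. Everything here is hypothetical: the `driver` struct, `rx_path()`, and `deferred_path()` are invented names, the boolean flag merely stands in for bcm->irq_lock, and none of this is softmac or DeviceScape code. The point is only the shape of the fix: the rx path records events while the driver lock is held, and a later context delivers them after the lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EVENT_QUEUE_CAP 16

/* Fixed-size ring buffer of pending event codes. */
struct event_queue {
    int events[EVENT_QUEUE_CAP];
    size_t head;
    size_t count;
};

/* Hypothetical driver state; irq_lock_held stands in for bcm->irq_lock. */
struct driver {
    bool irq_lock_held;
    struct event_queue pending;
    int delivered;
    int delivered_under_lock;
};

/* Append an event; returns false if the queue is full. */
bool event_queue_push(struct event_queue *q, int event)
{
    assert(q != NULL);
    if (q->count == EVENT_QUEUE_CAP)
        return false;
    q->events[(q->head + q->count) % EVENT_QUEUE_CAP] = event;
    q->count++;
    return true;
}

/* Remove the oldest event; returns false if the queue is empty. */
bool event_queue_pop(struct event_queue *q, int *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return false;
    *out = q->events[q->head];
    q->head = (q->head + 1) % EVENT_QUEUE_CAP;
    q->count--;
    return true;
}

/* rx path: only queue events while the driver lock is held, then drop it. */
void rx_path(struct driver *d, const int *frames, size_t n)
{
    d->irq_lock_held = true;
    for (size_t i = 0; i < n; i++)
        event_queue_push(&d->pending, frames[i]);
    d->irq_lock_held = false;
}

/* Deferred path: deliver queued events with the driver lock not held. */
void deferred_path(struct driver *d)
{
    int ev;
    while (event_queue_pop(&d->pending, &ev)) {
        if (d->irq_lock_held)
            d->delivered_under_lock++;   /* would indicate the bug pattern */
        d->delivered++;
    }
}
```

In this model the netlink call would happen inside `deferred_path()`, so it can never run while the driver lock is held.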
* Re: [patch] work around/fix deadlock in the bcm43xx driver by making netlink irq safe
From: Larry Finger @ 2006-07-08 17:59 UTC
To: Arjan van de Ven; +Cc: Joseph Jezak, netdev
Arjan van de Ven wrote:
> Joseph Jezak wrote:
>> Can you provide the details to the list? I'll look into getting
>> SoftMAC fixed if you do.
>>
>
> sure
> the basic issue is that bcm43xx does its rx processing in a softirq,
> and holds the bcm->irq_lock during that time. The rx processing calls
> into the softmac layer, which in turn calls into netlink.
>
> With this you can get a deadlock that looks like this:
> cpu 0: user context                        | cpu1: softirq context
> netlink_table_grab takes nl_table_lock as  | take bcm->irq_lock in
> write_lock_bh, but leaves irqs enabled     | bcm43xx_interrupt_tasklet()
>                                            | which then in a few steps
>                                            | leads to a call to
>                                            | bcm43xx_rx
> hardirq comes in and the isr tries to take | in bcm43xx_rx, call
> bcm->irq_lock but has to wait on cpu 1     | ieee80211_rx_mgt which
>                                            | leads to a call to
>                                            | wireless_send_event which
>                                            | tries to take nl_table_lock
>                                            | for read but has to wait
>                                            | for cpu0
>
> according to Michael Buesch, the softmac layer should queue the packet
> internally for another softirq, similar to what DeviceScape does, so
> that the rx softirq can just drop all packets quickly and drop its locks.
I think the deadlock dump shown below is related; however, since I have a uniprocessor system and
the deadlock is not exactly the same, I'll include it here. This is using v2.6.18-rc1 from Linus's tree.
kernel: -> (af_callback_keys + sk->sk_family#2){-.-?} ops: 431 {
kernel: initial-use at:
kernel: [<c0135d48>] lock_acquire+0x68/0x90
kernel: [<c030c195>] _read_lock+0x45/0x60
kernel: [<c02a786f>] sock_def_readable+0x1f/0x90
kernel: [<c02bf072>] netlink_broadcast+0x282/0x320
kernel: [<c01ef236>] kobject_uevent+0x366/0x4c0
kernel: [<c01eed08>] kobject_register+0x48/0x60
kernel: [<c013ce89>] sys_init_module+0x1439/0x1870
kernel: [<c01031cd>] sysenter_past_esp+0x56/0x8d
kernel: hardirq-on-W at:
kernel: [<c0135d48>] lock_acquire+0x68/0x90
kernel: [<c030c37a>] _write_lock_bh+0x4a/0x60
kernel: [<c02beba3>] netlink_release+0xe3/0x330
kernel: [<c02a439d>] sock_release+0x1d/0xf0
kernel: [<c02a44a7>] sock_close+0x37/0x60
kernel: [<c0163688>] __fput+0xd8/0x210
kernel: [<c01637d8>] fput+0x18/0x20
kernel: [<c0160574>] filp_close+0x54/0x80
kernel: [<c011a5df>] put_files_struct+0x7f/0xd0
kernel: [<c011b6cc>] do_exit+0x12c/0x9a0
kernel: [<c011bf7d>] do_group_exit+0x3d/0xa0
kernel: [<c011bff5>] sys_exit_group+0x15/0x20
kernel: [<c01031cd>] sysenter_past_esp+0x56/0x8d
kernel: in-softirq-R at:
kernel: [<c0135d48>] lock_acquire+0x68/0x90
kernel: [<c030c195>] _read_lock+0x45/0x60
kernel: [<c02a786f>] sock_def_readable+0x1f/0x90
kernel: [<c02bf072>] netlink_broadcast+0x282/0x320
kernel: [<c02bb6e4>] wireless_send_event+0x244/0x3b0
kernel: [<e4a2c586>] ieee80211softmac_call_events_locked+0x86/0x140 [ieee80211softmac]
kernel: [<e4a2c674>] ieee80211softmac_call_events+0x34/0x6f [ieee80211softmac]
kernel: [<e4a28faf>] ieee80211softmac_auth_resp+0x19f/0x620 [ieee80211softmac]
kernel: [<e4a1e413>] ieee80211_rx_mgt+0x543/0x810 [ieee80211]
kernel: [<e4a7ea2b>] bcm43xx_rx+0x34b/0x980 [bcm43xx]
kernel: [<e4a820bc>] bcm43xx_dma_rx+0x23c/0x550 [bcm43xx]
kernel: [<e4a6751e>] bcm43xx_interrupt_tasklet+0x38e/0x970 [bcm43xx]
kernel: [<c011e4de>] tasklet_action+0x4e/0x90
kernel: [<c011ecc2>] __do_softirq+0x62/0xe0
kernel: [<c01055cb>] do_softirq+0x9b/0xf0
kernel: softirq-on-R at:
kernel: [<c0135d48>] lock_acquire+0x68/0x90
kernel: [<c030c195>] _read_lock+0x45/0x60
kernel: [<c02a786f>] sock_def_readable+0x1f/0x90
kernel: [<c02bf072>] netlink_broadcast+0x282/0x320
kernel: [<c01ef236>] kobject_uevent+0x366/0x4c0
kernel: [<c01eed08>] kobject_register+0x48/0x60
kernel: [<c013ce89>] sys_init_module+0x1439/0x1870
kernel: [<c01031cd>] sysenter_past_esp+0x56/0x8d
kernel: hardirq-on-R at:
kernel: [<c0135d48>] lock_acquire+0x68/0x90
kernel: [<c030c195>] _read_lock+0x45/0x60
kernel: [<c02a786f>] sock_def_readable+0x1f/0x90
kernel: [<c02bf072>] netlink_broadcast+0x282/0x320
kernel: [<c01ef236>] kobject_uevent+0x366/0x4c0
kernel: [<c01eed08>] kobject_register+0x48/0x60
kernel: [<c013ce89>] sys_init_module+0x1439/0x1870
kernel: [<c01031cd>] sysenter_past_esp+0x56/0x8d
kernel: }
kernel: ... key at: [<c05bad60>] af_callback_keys+0x80/0x100
kernel:
kernel: stack backtrace:
kernel: [<c0103d1d>] show_trace_log_lvl+0x13d/0x160
kernel: [<c010525b>] show_trace+0x1b/0x20
kernel: [<c0105286>] dump_stack+0x26/0x30
kernel: [<c0133f7d>] check_usage+0x26d/0x280
kernel: [<c013536f>] __lock_acquire+0x77f/0xdd0
kernel: [<c0135d48>] lock_acquire+0x68/0x90
kernel: [<c030c195>] _read_lock+0x45/0x60
kernel: [<c02a786f>] sock_def_readable+0x1f/0x90
kernel: [<c02bf072>] netlink_broadcast+0x282/0x320
kernel: [<c02bb6e4>] wireless_send_event+0x244/0x3b0
kernel: [<e4a2c586>] ieee80211softmac_call_events_locked+0x86/0x140 [ieee80211softmac]
kernel: [<e4a2c674>] ieee80211softmac_call_events+0x34/0x6f [ieee80211softmac]
kernel: [<e4a28faf>] ieee80211softmac_auth_resp+0x19f/0x620 [ieee80211softmac]
kernel: [<e4a1e413>] ieee80211_rx_mgt+0x543/0x810 [ieee80211]
kernel: [<e4a7ea2b>] bcm43xx_rx+0x34b/0x980 [bcm43xx]
kernel: [<e4a820bc>] bcm43xx_dma_rx+0x23c/0x550 [bcm43xx]
kernel: [<e4a6751e>] bcm43xx_interrupt_tasklet+0x38e/0x970 [bcm43xx]
kernel: [<c011e4de>] tasklet_action+0x4e/0x90
kernel: [<c011ecc2>] __do_softirq+0x62/0xe0
kernel: [<c01055cb>] do_softirq+0x9b/0xf0
kernel: [<c01056d1>] do_IRQ+0xb1/0x110
kernel: [<c0103439>] common_interrupt+0x25/0x2c
kernel: [<c015e01e>] kmem_cache_free+0x6e/0xa0
kernel: [<c019631d>] proc_destroy_inode+0x1d/0x20
kernel: [<c017d7eb>] destroy_inode+0x2b/0x60
kernel: [<c017e753>] generic_delete_inode+0xb3/0x100
kernel: [<c017d8fd>] iput+0x6d/0x80
kernel: [<c017b79b>] dentry_iput+0x7b/0xd0
kernel: [<c017bee4>] dput+0x84/0x190
kernel: [<c0172194>] path_release+0x14/0x30
kernel: [<c017295a>] __link_path_walk+0x3ea/0xef0
kernel: [<c01734b4>] link_path_walk+0x54/0xf0
kernel: [<c017394e>] do_path_lookup+0xae/0x260
kernel: [<c017403a>] __path_lookup_intent_open+0x4a/0x90
kernel: [<c017410a>] path_lookup_open+0x2a/0x30
kernel: [<c01743a7>] open_namei+0x77/0x6d0
kernel: [<c0161898>] do_filp_open+0x38/0x60
kernel: [<c016190b>] do_sys_open+0x4b/0x100
kernel: [<c0161a17>] sys_open+0x27/0x30
kernel: [<c01031cd>] sysenter_past_esp+0x56/0x8d
kernel: [<b7fb9410>] 0xb7fb9410
kernel: SoftMAC: sent association request!
kernel: SoftMAC: associated!
kernel: SoftMAC: Scanning finished
So far, this situation has only occurred during the initial association/authorization steps during
bootup.
Larry
* Re: [patch] work around/fix deadlock in the bcm43xx driver by making netlink irq safe
From: Michael Buesch @ 2006-07-08 18:32 UTC
To: Larry Finger, Jiri Benc; +Cc: Arjan van de Ven, Joseph Jezak, netdev
On Saturday 08 July 2006 19:59, you wrote:
> kernel: stack backtrace:
> kernel: [<c0103d1d>] show_trace_log_lvl+0x13d/0x160
> kernel: [<c010525b>] show_trace+0x1b/0x20
> kernel: [<c0105286>] dump_stack+0x26/0x30
> kernel: [<c0133f7d>] check_usage+0x26d/0x280
> kernel: [<c013536f>] __lock_acquire+0x77f/0xdd0
> kernel: [<c0135d48>] lock_acquire+0x68/0x90
> kernel: [<c030c195>] _read_lock+0x45/0x60
> kernel: [<c02a786f>] sock_def_readable+0x1f/0x90
> kernel: [<c02bf072>] netlink_broadcast+0x282/0x320
> kernel: [<c02bb6e4>] wireless_send_event+0x244/0x3b0
This is another fscking deadlock. But it should be fixed by
the suggested workaround as well.
So I see this problem solved for now, too.
> kernel: [<e4a2c586>] ieee80211softmac_call_events_locked+0x86/0x140 [ieee80211softmac]
> kernel: [<e4a2c674>] ieee80211softmac_call_events+0x34/0x6f [ieee80211softmac]
> kernel: [<e4a28faf>] ieee80211softmac_auth_resp+0x19f/0x620 [ieee80211softmac]
> kernel: [<e4a1e413>] ieee80211_rx_mgt+0x543/0x810 [ieee80211]
> kernel: [<e4a7ea2b>] bcm43xx_rx+0x34b/0x980 [bcm43xx]
> kernel: [<e4a820bc>] bcm43xx_dma_rx+0x23c/0x550 [bcm43xx]
> kernel: [<e4a6751e>] bcm43xx_interrupt_tasklet+0x38e/0x970 [bcm43xx]
> kernel: [<c011e4de>] tasklet_action+0x4e/0x90
> kernel: [<c011ecc2>] __do_softirq+0x62/0xe0
> kernel: [<c01055cb>] do_softirq+0x9b/0xf0
> kernel: [<c01056d1>] do_IRQ+0xb1/0x110
> kernel: [<c0103439>] common_interrupt+0x25/0x2c
> kernel: [<c015e01e>] kmem_cache_free+0x6e/0xa0
> kernel: [<c019631d>] proc_destroy_inode+0x1d/0x20
> kernel: [<c017d7eb>] destroy_inode+0x2b/0x60
> kernel: [<c017e753>] generic_delete_inode+0xb3/0x100
> kernel: [<c017d8fd>] iput+0x6d/0x80
> kernel: [<c017b79b>] dentry_iput+0x7b/0xd0
> kernel: [<c017bee4>] dput+0x84/0x190
> kernel: [<c0172194>] path_release+0x14/0x30
> kernel: [<c017295a>] __link_path_walk+0x3ea/0xef0
> kernel: [<c01734b4>] link_path_walk+0x54/0xf0
> kernel: [<c017394e>] do_path_lookup+0xae/0x260
> kernel: [<c017403a>] __path_lookup_intent_open+0x4a/0x90
> kernel: [<c017410a>] path_lookup_open+0x2a/0x30
> kernel: [<c01743a7>] open_namei+0x77/0x6d0
> kernel: [<c0161898>] do_filp_open+0x38/0x60
> kernel: [<c016190b>] do_sys_open+0x4b/0x100
> kernel: [<c0161a17>] sys_open+0x27/0x30
> kernel: [<c01031cd>] sysenter_past_esp+0x56/0x8d
> kernel: [<b7fb9410>] 0xb7fb9410
> kernel: SoftMAC: sent association request!
> kernel: SoftMAC: associated!
> kernel: SoftMAC: Scanning finished
>
> So far, this situation has only occurred during the initial association/authorization steps during
> bootup.
BTW:
Jiri, as you can see, various deadlocks are possible when calling
directly from a driver tasklet into the 802.11 stack, because by
the nature of 802.11 we must call back into the driver
at some places.
So, I would like to get rid of the non-_irqsafe functions
in devicescape. The _irqsafe functions could be stripped of the
suffix, and the unsafe functions should be strictly internal to
the stack. I don't see valid usages for them outside of the stack.
--
Greetings Michael.