* netlink locking warnings in 2.6.21-rc7-mm1
@ 2007-04-24 19:42 Andrew Morton
2007-04-24 21:20 ` David Miller
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-04-24 19:42 UTC (permalink / raw)
To: netdev
http://test.kernel.org/abat/84786/debug/console.log is saying
Starting udevd BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock()
Call Trace:
[<ffffffff80244b71>] debug_mutex_unlock+0x161/0x170
[<ffffffff804b759c>] __mutex_unlock_slowpath+0x5c/0x160
[<ffffffff80467902>] netlink_dump+0x82/0x1e0
[<ffffffff8046a042>] netlink_dump_start+0x142/0x180
[<ffffffff80461be0>] rtnl_dump_ifinfo+0x0/0x90
[<ffffffff80461be0>] rtnl_dump_ifinfo+0x0/0x90
[<ffffffff80461e26>] rtnetlink_rcv_msg+0xe6/0x240
[<ffffffff80461d40>] rtnetlink_rcv_msg+0x0/0x240
[<ffffffff80468bf9>] netlink_run_queue+0xb9/0x140
[<ffffffff80461cc4>] rtnetlink_rcv+0x34/0x60
[<ffffffff804690b2>] netlink_data_ready+0x12/0x50
[<ffffffff80467bbb>] netlink_sendskb+0x2b/0x50
[<ffffffff80468fc1>] netlink_sendmsg+0x221/0x300
[<ffffffff8044d1cb>] sock_sendmsg+0xcb/0x100
[<ffffffff8023f420>] autoremove_wake_function+0x0/0x30
[<ffffffff80262d62>] __handle_mm_fault+0x1d2/0x8b0
[<ffffffff8044c17e>] move_addr_to_kernel+0x2e/0x40
[<ffffffff8044d5f6>] sys_sendto+0x146/0x1b0
[<ffffffff8044dd6d>] move_addr_to_user+0x5d/0x70
[<ffffffff8044e34b>] sys_getsockname+0xcb/0xe0
[<ffffffff80209b5e>] system_call+0x7e/0x83
which is
static int netlink_dump(struct sock *sk)
{
...
len = cb->dump(skb, cb);
if (len > 0) {
--> mutex_unlock(nlk->cb_mutex);
skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, len);
return 0;
}
and
void debug_mutex_unlock(struct mutex *lock)
{
if (unlikely(!debug_locks))
return;
--> DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
DEBUG_LOCKS_WARN_ON(lock->magic != lock);
so it's complaining that cb_mutex is being release by a thread other than
the one which acquired it. I'm unable to reproduce it with their config,
naturally.
Can anyone see any conceivable way in which this can happen? There's some
moderately tricky-looking rewriting of the ->cb_mutex pointer happening in
there. If that were to happen concurrently then this might happen?
otoh, we're seeing several fairly unrelated whacko things coming out of the
lock debugging code in that kernel and I'm wondering if there's some common
bug which is causing false positives.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-24 19:42 netlink locking warnings in 2.6.21-rc7-mm1 Andrew Morton @ 2007-04-24 21:20 ` David Miller 2007-04-24 21:26 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: David Miller @ 2007-04-24 21:20 UTC (permalink / raw) To: akpm; +Cc: netdev, kaber From: Andrew Morton <akpm@linux-foundation.org> Date: Tue, 24 Apr 2007 12:42:50 -0700 > void debug_mutex_unlock(struct mutex *lock) > { > if (unlikely(!debug_locks)) > return; > > --> DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info()); > DEBUG_LOCKS_WARN_ON(lock->magic != lock); > > so it's complaining that cb_mutex is being release by a thread other than > the one which acquired it. I'm unable to reproduce it with their config, > naturally. Is it illegal to sleep with a mutex held? But I'm not so sure that is what is happening here. net/core/rtnetlink.c does: err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL); here dumpit will be rtnl_dump_ifinfo. Anyways, netlink_dump_start() will go: mutex_lock(nlk->cb_mutex); if (nlk->cb || sock_flag(sk, SOCK_DEAD)) { mutex_unlock(nlk->cb_mutex); netlink_destroy_callback(cb); sock_put(sk); return -EBUSY; } nlk->cb = cb; mutex_unlock(nlk->cb_mutex); Nothing there sleeps. Then it does netlink_dump(): mutex_lock(nlk->cb_mutex); cb = nlk->cb; if (cb == NULL) { err = -EINVAL; goto errout_skb; } len = cb->dump(skb, cb); if (len > 0) { mutex_unlock(nlk->cb_mutex); skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk, len); return 0; } nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); if (!nlh) goto errout_skb; memcpy(nlmsg_data(nlh), &len, sizeof(len)); skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk, skb->len); if (cb->done) cb->done(cb); nlk->cb = NULL; mutex_unlock(nlk->cb_mutex); This invokes rtnl_dump_ifinfo() via cb->dump() which just fills data into the packet. There are some wakeups and other bits there, but nothing that should mess with the nlk->cb_mutex or sleep. I think I see what might be the problem, nlk->cb_mutex is set to "rtnl_mutex" and this is used for other purposes in various code paths here, maybe there is a double mutex_unlock() or similar due to that? Patrick? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-24 21:20 ` David Miller @ 2007-04-24 21:26 ` Andrew Morton 2007-04-25 1:06 ` Herbert Xu 2007-04-25 13:14 ` Patrick McHardy 2 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2007-04-24 21:26 UTC (permalink / raw) To: David Miller; +Cc: netdev, kaber On Tue, 24 Apr 2007 14:20:08 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > From: Andrew Morton <akpm@linux-foundation.org> > Date: Tue, 24 Apr 2007 12:42:50 -0700 > > > void debug_mutex_unlock(struct mutex *lock) > > { > > if (unlikely(!debug_locks)) > > return; > > > > --> DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info()); > > DEBUG_LOCKS_WARN_ON(lock->magic != lock); > > > > so it's complaining that cb_mutex is being release by a thread other than > > the one which acquired it. I'm unable to reproduce it with their config, > > naturally. > > Is it illegal to sleep with a mutex held? Nope. Otherwise we'd use spinlocks everywhere ;) > I think I see what might be the problem, nlk->cb_mutex is set > to "rtnl_mutex" and this is used for other purposes in various > code paths here, maybe there is a double mutex_unlock() or > similar due to that? ooh, this might explain my mysterious ASSERT_RTNL failures, perhaps. Am ready to test a patch. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-24 21:20 ` David Miller 2007-04-24 21:26 ` Andrew Morton @ 2007-04-25 1:06 ` Herbert Xu 2007-04-25 13:17 ` Patrick McHardy 2007-04-25 13:14 ` Patrick McHardy 2 siblings, 1 reply; 10+ messages in thread From: Herbert Xu @ 2007-04-25 1:06 UTC (permalink / raw) To: David Miller; +Cc: akpm, netdev, kaber David Miller <davem@davemloft.net> wrote: > > Is it illegal to sleep with a mutex held? Shouldn't be. > I think I see what might be the problem, nlk->cb_mutex is set > to "rtnl_mutex" and this is used for other purposes in various > code paths here, maybe there is a double mutex_unlock() or > similar due to that? Indeed, the RTNL is held during the processing of all RTNETLINK messages so we'd be trying to lock it recursively here which is not allowed. Actually I'm not quite sure what the benefit is for allowing an override CB mutex. Since we still have to take it and we always allocate memory for a mutex anyway this would seem to be strictly worse than just using our own mutex. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-25 1:06 ` Herbert Xu @ 2007-04-25 13:17 ` Patrick McHardy 2007-04-25 19:44 ` Andrew Morton 0 siblings, 1 reply; 10+ messages in thread From: Patrick McHardy @ 2007-04-25 13:17 UTC (permalink / raw) To: Herbert Xu; +Cc: David Miller, akpm, netdev Herbert Xu wrote: > David Miller <davem@davemloft.net> wrote: > >>I think I see what might be the problem, nlk->cb_mutex is set >>to "rtnl_mutex" and this is used for other purposes in various >>code paths here, maybe there is a double mutex_unlock() or >>similar due to that? > > > Indeed, the RTNL is held during the processing of all RTNETLINK > messages so we'd be trying to lock it recursively here which is > not allowed. No, it is released before calling netlink_dump_start(). > Actually I'm not quite sure what the benefit is for allowing an > override CB mutex. Since we still have to take it and we always > allocate memory for a mutex anyway this would seem to be strictly > worse than just using our own mutex. The idea was that netlink families that don't want to consistently hold the same mutex used for queue processing during the entire dump operation can still have per-socket mutexes just to protect the callback data and have concurrent dump continuations. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-25 13:17 ` Patrick McHardy @ 2007-04-25 19:44 ` Andrew Morton 2007-04-25 20:51 ` Patrick McHardy 0 siblings, 1 reply; 10+ messages in thread From: Andrew Morton @ 2007-04-25 19:44 UTC (permalink / raw) To: Patrick McHardy; +Cc: Herbert Xu, David Miller, netdev I just retested bare net-2.6.22, pulled 30 minutes ago. I got just one warning: PM: Removing info for No Bus:0000:06:0b.0 eth0: no IPv6 routers present ipw2200: Radio Frequency Kill Switch is On: Kill switch must be turned off for wireless networking to work. PM: Adding info for No Bus:eth1 ipw2200: Detected geography ZZA (11 802.11bg channels, 13 802.11a channels) ipw2200: Failed to send WEP_KEY: Aborted due to RF kill switch. ipw2200: Failed to send WEP_KEY: Command timed out. ipw2200: Failed to send WEP_KEY: Command timed out. BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock() [<c012d18a>] debug_mutex_unlock+0x5a/0x134 [<c02d67e2>] __mutex_unlock_slowpath+0x9d/0xcf [<f8c3618b>] ipw_wx_set_encode+0x0/0x82 [ipw2200] [<c028b92c>] rtnl_unlock+0xa/0x29 [<c0286651>] dev_ioctl+0x3d0/0x402 [<c014b078>] __handle_mm_fault+0x7c6/0x7e8 [<c01a649b>] selinux_file_alloc_security+0x1f/0x40 [<c027b943>] sock_ioctl+0x0/0x1be [<c0162925>] do_ioctl+0x19/0x4d [<c0162b58>] vfs_ioctl+0x1ff/0x216 [<c0162bbb>] sys_ioctl+0x4c/0x65 [<c0103b0c>] syscall_call+0x7/0xb [<c02d0000>] unix_dgram_sendmsg+0x76/0x400 ======================= It's 100% reproducible here, using http://userweb.kernel.org/~akpm/config-sony.txt The weird ASSERT_RTNL warnings aren't there, so something else in -mm (prior to git-net.patch in the series file) would appear to be interacting with net changes. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-25 19:44 ` Andrew Morton @ 2007-04-25 20:51 ` Patrick McHardy 2007-04-25 21:01 ` David Miller 2007-04-25 23:48 ` Andrew Morton 0 siblings, 2 replies; 10+ messages in thread From: Patrick McHardy @ 2007-04-25 20:51 UTC (permalink / raw) To: Andrew Morton; +Cc: Herbert Xu, David Miller, netdev [-- Attachment #1: Type: text/plain, Size: 1161 bytes --] Andrew Morton wrote: > I just retested bare net-2.6.22, pulled 30 minutes ago. I got just one > warning: > > BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock() > [<c012d18a>] debug_mutex_unlock+0x5a/0x134 > [<c02d67e2>] __mutex_unlock_slowpath+0x9d/0xcf > [<f8c3618b>] ipw_wx_set_encode+0x0/0x82 [ipw2200] > [<c028b92c>] rtnl_unlock+0xa/0x29 > [<c0286651>] dev_ioctl+0x3d0/0x402 > [<c014b078>] __handle_mm_fault+0x7c6/0x7e8 > [<c01a649b>] selinux_file_alloc_security+0x1f/0x40 > [<c027b943>] sock_ioctl+0x0/0x1be > [<c0162925>] do_ioctl+0x19/0x4d > [<c0162b58>] vfs_ioctl+0x1ff/0x216 > [<c0162bbb>] sys_ioctl+0x4c/0x65 > [<c0103b0c>] syscall_call+0x7/0xb > [<c02d0000>] unix_dgram_sendmsg+0x76/0x400 > ======================= > > It's 100% reproducible here, using > http://userweb.kernel.org/~akpm/config-sony.txt > > > The weird ASSERT_RTNL warnings aren't there, so something else in -mm > (prior to git-net.patch in the series file) would appear to be interacting > with net changes. I think I found the problem, the rtnl_mutex was reinitialized on every rtnetlink socket creation. This is most likely responsible for both warnings. [-- Attachment #2: x --] [-- Type: text/plain, Size: 1235 bytes --] [NETLINK]: don't reinitialize callback mutex Don't reinitialize the callback mutex the netlink_kernel_create caller handed in, it is supposed to already be initialized and could already be held by someone. Signed-off-by: Patrick McHardy <kaber@trash.net> --- commit 9cc4e9c2d8b022c10ded98610a3cd76a8b89cf49 tree e53f10a158858e20ef2e9922cabc5bf43980708d parent 7255fbb088e3f1b8be97472a38f645a8da595fe2 author Patrick McHardy <kaber@trash.net> Wed, 25 Apr 2007 22:47:20 +0200 committer Patrick McHardy <kaber@trash.net> Wed, 25 Apr 2007 22:47:20 +0200 net/netlink/af_netlink.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index ec16c9b..64d4b27 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -388,8 +388,12 @@ static int __netlink_create(struct socket *sock, struct mutex *cb_mutex, sock_init_data(sock, sk); nlk = nlk_sk(sk); - nlk->cb_mutex = cb_mutex ? : &nlk->cb_def_mutex; - mutex_init(nlk->cb_mutex); + if (cb_mutex) + nlk->cb_mutex = cb_mutex; + else { + nlk->cb_mutex = &nlk->cb_def_mutex; + mutex_init(nlk->cb_mutex); + } init_waitqueue_head(&nlk->wait); sk->sk_destruct = netlink_sock_destruct; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-25 20:51 ` Patrick McHardy @ 2007-04-25 21:01 ` David Miller 2007-04-25 23:48 ` Andrew Morton 1 sibling, 0 replies; 10+ messages in thread From: David Miller @ 2007-04-25 21:01 UTC (permalink / raw) To: kaber; +Cc: akpm, herbert, netdev From: Patrick McHardy <kaber@trash.net> Date: Wed, 25 Apr 2007 22:51:43 +0200 > [NETLINK]: don't reinitialize callback mutex > > Don't reinitialize the callback mutex the netlink_kernel_create caller > handed in, it is supposed to already be initialized and could already > be held by someone. > > Signed-off-by: Patrick McHardy <kaber@trash.net> Applied, thanks a lot for tracking this down Patrick. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-25 20:51 ` Patrick McHardy 2007-04-25 21:01 ` David Miller @ 2007-04-25 23:48 ` Andrew Morton 1 sibling, 0 replies; 10+ messages in thread From: Andrew Morton @ 2007-04-25 23:48 UTC (permalink / raw) To: Patrick McHardy; +Cc: Herbert Xu, David Miller, netdev On Wed, 25 Apr 2007 22:51:43 +0200 Patrick McHardy <kaber@trash.net> wrote: > I think I found the problem, the rtnl_mutex was reinitialized on every > rtnetlink socket creation. This is most likely responsible for both > warnings. Yup, no warnings any more, thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1 2007-04-24 21:20 ` David Miller 2007-04-24 21:26 ` Andrew Morton 2007-04-25 1:06 ` Herbert Xu @ 2007-04-25 13:14 ` Patrick McHardy 2 siblings, 0 replies; 10+ messages in thread From: Patrick McHardy @ 2007-04-25 13:14 UTC (permalink / raw) To: David Miller; +Cc: akpm, netdev David Miller wrote: > I think I see what might be the problem, nlk->cb_mutex is set > to "rtnl_mutex" and this is used for other purposes in various > code paths here, maybe there is a double mutex_unlock() or > similar due to that? Nothing in the callbacks should be touching the rtnl, that would have been broken before since we already used to hold it during the first invocation of the dump callback, the only difference is that we now hold it during the entire dump operation. The cb_mutex is only set on socket creation, so there's also nothing that should be rewriting it. I can't see whats wrong here. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-04-25 23:48 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-04-24 19:42 netlink locking warnings in 2.6.21-rc7-mm1 Andrew Morton 2007-04-24 21:20 ` David Miller 2007-04-24 21:26 ` Andrew Morton 2007-04-25 1:06 ` Herbert Xu 2007-04-25 13:17 ` Patrick McHardy 2007-04-25 19:44 ` Andrew Morton 2007-04-25 20:51 ` Patrick McHardy 2007-04-25 21:01 ` David Miller 2007-04-25 23:48 ` Andrew Morton 2007-04-25 13:14 ` Patrick McHardy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).