* netlink locking warnings in 2.6.21-rc7-mm1
@ 2007-04-24 19:42 Andrew Morton
2007-04-24 21:20 ` David Miller
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-04-24 19:42 UTC (permalink / raw)
To: netdev
http://test.kernel.org/abat/84786/debug/console.log is saying
Starting udevd BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock()
Call Trace:
[<ffffffff80244b71>] debug_mutex_unlock+0x161/0x170
[<ffffffff804b759c>] __mutex_unlock_slowpath+0x5c/0x160
[<ffffffff80467902>] netlink_dump+0x82/0x1e0
[<ffffffff8046a042>] netlink_dump_start+0x142/0x180
[<ffffffff80461be0>] rtnl_dump_ifinfo+0x0/0x90
[<ffffffff80461be0>] rtnl_dump_ifinfo+0x0/0x90
[<ffffffff80461e26>] rtnetlink_rcv_msg+0xe6/0x240
[<ffffffff80461d40>] rtnetlink_rcv_msg+0x0/0x240
[<ffffffff80468bf9>] netlink_run_queue+0xb9/0x140
[<ffffffff80461cc4>] rtnetlink_rcv+0x34/0x60
[<ffffffff804690b2>] netlink_data_ready+0x12/0x50
[<ffffffff80467bbb>] netlink_sendskb+0x2b/0x50
[<ffffffff80468fc1>] netlink_sendmsg+0x221/0x300
[<ffffffff8044d1cb>] sock_sendmsg+0xcb/0x100
[<ffffffff8023f420>] autoremove_wake_function+0x0/0x30
[<ffffffff80262d62>] __handle_mm_fault+0x1d2/0x8b0
[<ffffffff8044c17e>] move_addr_to_kernel+0x2e/0x40
[<ffffffff8044d5f6>] sys_sendto+0x146/0x1b0
[<ffffffff8044dd6d>] move_addr_to_user+0x5d/0x70
[<ffffffff8044e34b>] sys_getsockname+0xcb/0xe0
[<ffffffff80209b5e>] system_call+0x7e/0x83
which is
static int netlink_dump(struct sock *sk)
{
...
len = cb->dump(skb, cb);
if (len > 0) {
--> mutex_unlock(nlk->cb_mutex);
skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, len);
return 0;
}
and
void debug_mutex_unlock(struct mutex *lock)
{
if (unlikely(!debug_locks))
return;
--> DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
DEBUG_LOCKS_WARN_ON(lock->magic != lock);
so it's complaining that cb_mutex is being release by a thread other than
the one which acquired it. I'm unable to reproduce it with their config,
naturally.
Can anyone see any conceivable way in which this can happen? There's some
moderately tricky-looking rewriting of the ->cb_mutex pointer happening in
there. If that were to happen concurrently then this might happen?
otoh, we're seeing several fairly unrelated whacko things coming out of the
lock debugging code in that kernel and I'm wondering if there's some common
bug which is causing false positives.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-24 19:42 netlink locking warnings in 2.6.21-rc7-mm1 Andrew Morton
@ 2007-04-24 21:20 ` David Miller
2007-04-24 21:26 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: David Miller @ 2007-04-24 21:20 UTC (permalink / raw)
To: akpm; +Cc: netdev, kaber
From: Andrew Morton <akpm@linux-foundation.org>
Date: Tue, 24 Apr 2007 12:42:50 -0700
> void debug_mutex_unlock(struct mutex *lock)
> {
> if (unlikely(!debug_locks))
> return;
>
> --> DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
> DEBUG_LOCKS_WARN_ON(lock->magic != lock);
>
> so it's complaining that cb_mutex is being release by a thread other than
> the one which acquired it. I'm unable to reproduce it with their config,
> naturally.
Is it illegal to sleep with a mutex held?
But I'm not so sure that is what is happening here.
net/core/rtnetlink.c does:
err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
here dumpit will be rtnl_dump_ifinfo.
Anyways, netlink_dump_start() will go:
mutex_lock(nlk->cb_mutex);
if (nlk->cb || sock_flag(sk, SOCK_DEAD)) {
mutex_unlock(nlk->cb_mutex);
netlink_destroy_callback(cb);
sock_put(sk);
return -EBUSY;
}
nlk->cb = cb;
mutex_unlock(nlk->cb_mutex);
Nothing there sleeps. Then it does netlink_dump():
mutex_lock(nlk->cb_mutex);
cb = nlk->cb;
if (cb == NULL) {
err = -EINVAL;
goto errout_skb;
}
len = cb->dump(skb, cb);
if (len > 0) {
mutex_unlock(nlk->cb_mutex);
skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, len);
return 0;
}
nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
if (!nlh)
goto errout_skb;
memcpy(nlmsg_data(nlh), &len, sizeof(len));
skb_queue_tail(&sk->sk_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
if (cb->done)
cb->done(cb);
nlk->cb = NULL;
mutex_unlock(nlk->cb_mutex);
This invokes rtnl_dump_ifinfo() via cb->dump() which just fills
data into the packet.
There are some wakeups and other bits there, but nothing that
should mess with the nlk->cb_mutex or sleep.
I think I see what might be the problem, nlk->cb_mutex is set
to "rtnl_mutex" and this is used for other purposes in various
code paths here, maybe there is a double mutex_unlock() or
similar due to that?
Patrick?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-24 21:20 ` David Miller
@ 2007-04-24 21:26 ` Andrew Morton
2007-04-25 1:06 ` Herbert Xu
2007-04-25 13:14 ` Patrick McHardy
2 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2007-04-24 21:26 UTC (permalink / raw)
To: David Miller; +Cc: netdev, kaber
On Tue, 24 Apr 2007 14:20:08 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Tue, 24 Apr 2007 12:42:50 -0700
>
> > void debug_mutex_unlock(struct mutex *lock)
> > {
> > if (unlikely(!debug_locks))
> > return;
> >
> > --> DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
> > DEBUG_LOCKS_WARN_ON(lock->magic != lock);
> >
> > so it's complaining that cb_mutex is being release by a thread other than
> > the one which acquired it. I'm unable to reproduce it with their config,
> > naturally.
>
> Is it illegal to sleep with a mutex held?
Nope. Otherwise we'd use spinlocks everywhere ;)
> I think I see what might be the problem, nlk->cb_mutex is set
> to "rtnl_mutex" and this is used for other purposes in various
> code paths here, maybe there is a double mutex_unlock() or
> similar due to that?
ooh, this might explain my mysterious ASSERT_RTNL failures, perhaps.
Am ready to test a patch.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-24 21:20 ` David Miller
2007-04-24 21:26 ` Andrew Morton
@ 2007-04-25 1:06 ` Herbert Xu
2007-04-25 13:17 ` Patrick McHardy
2007-04-25 13:14 ` Patrick McHardy
2 siblings, 1 reply; 10+ messages in thread
From: Herbert Xu @ 2007-04-25 1:06 UTC (permalink / raw)
To: David Miller; +Cc: akpm, netdev, kaber
David Miller <davem@davemloft.net> wrote:
>
> Is it illegal to sleep with a mutex held?
Shouldn't be.
> I think I see what might be the problem, nlk->cb_mutex is set
> to "rtnl_mutex" and this is used for other purposes in various
> code paths here, maybe there is a double mutex_unlock() or
> similar due to that?
Indeed, the RTNL is held during the processing of all RTNETLINK
messages so we'd be trying to lock it recursively here which is
not allowed.
Actually I'm not quite sure what the benefit is for allowing an
override CB mutex. Since we still have to take it and we always
allocate memory for a mutex anyway this would seem to be strictly
worse than just using our own mutex.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-24 21:20 ` David Miller
2007-04-24 21:26 ` Andrew Morton
2007-04-25 1:06 ` Herbert Xu
@ 2007-04-25 13:14 ` Patrick McHardy
2 siblings, 0 replies; 10+ messages in thread
From: Patrick McHardy @ 2007-04-25 13:14 UTC (permalink / raw)
To: David Miller; +Cc: akpm, netdev
David Miller wrote:
> I think I see what might be the problem, nlk->cb_mutex is set
> to "rtnl_mutex" and this is used for other purposes in various
> code paths here, maybe there is a double mutex_unlock() or
> similar due to that?
Nothing in the callbacks should be touching the rtnl, that would
have been broken before since we already used to hold it during
the first invocation of the dump callback, the only difference
is that we now hold it during the entire dump operation. The
cb_mutex is only set on socket creation, so there's also nothing
that should be rewriting it.
I can't see whats wrong here.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-25 1:06 ` Herbert Xu
@ 2007-04-25 13:17 ` Patrick McHardy
2007-04-25 19:44 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2007-04-25 13:17 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, akpm, netdev
Herbert Xu wrote:
> David Miller <davem@davemloft.net> wrote:
>
>>I think I see what might be the problem, nlk->cb_mutex is set
>>to "rtnl_mutex" and this is used for other purposes in various
>>code paths here, maybe there is a double mutex_unlock() or
>>similar due to that?
>
>
> Indeed, the RTNL is held during the processing of all RTNETLINK
> messages so we'd be trying to lock it recursively here which is
> not allowed.
No, it is released before calling netlink_dump_start().
> Actually I'm not quite sure what the benefit is for allowing an
> override CB mutex. Since we still have to take it and we always
> allocate memory for a mutex anyway this would seem to be strictly
> worse than just using our own mutex.
The idea was that netlink families that don't want to consistently
hold the same mutex used for queue processing during the entire
dump operation can still have per-socket mutexes just to protect
the callback data and have concurrent dump continuations.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-25 13:17 ` Patrick McHardy
@ 2007-04-25 19:44 ` Andrew Morton
2007-04-25 20:51 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-04-25 19:44 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Herbert Xu, David Miller, netdev
I just retested bare net-2.6.22, pulled 30 minutes ago. I got just one
warning:
PM: Removing info for No Bus:0000:06:0b.0
eth0: no IPv6 routers present
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
PM: Adding info for No Bus:eth1
ipw2200: Detected geography ZZA (11 802.11bg channels, 13 802.11a channels)
ipw2200: Failed to send WEP_KEY: Aborted due to RF kill switch.
ipw2200: Failed to send WEP_KEY: Command timed out.
ipw2200: Failed to send WEP_KEY: Command timed out.
BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock()
[<c012d18a>] debug_mutex_unlock+0x5a/0x134
[<c02d67e2>] __mutex_unlock_slowpath+0x9d/0xcf
[<f8c3618b>] ipw_wx_set_encode+0x0/0x82 [ipw2200]
[<c028b92c>] rtnl_unlock+0xa/0x29
[<c0286651>] dev_ioctl+0x3d0/0x402
[<c014b078>] __handle_mm_fault+0x7c6/0x7e8
[<c01a649b>] selinux_file_alloc_security+0x1f/0x40
[<c027b943>] sock_ioctl+0x0/0x1be
[<c0162925>] do_ioctl+0x19/0x4d
[<c0162b58>] vfs_ioctl+0x1ff/0x216
[<c0162bbb>] sys_ioctl+0x4c/0x65
[<c0103b0c>] syscall_call+0x7/0xb
[<c02d0000>] unix_dgram_sendmsg+0x76/0x400
=======================
It's 100% reproducible here, using
http://userweb.kernel.org/~akpm/config-sony.txt
The weird ASSERT_RTNL warnings aren't there, so something else in -mm
(prior to git-net.patch in the series file) would appear to be interacting
with net changes.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-25 19:44 ` Andrew Morton
@ 2007-04-25 20:51 ` Patrick McHardy
2007-04-25 21:01 ` David Miller
2007-04-25 23:48 ` Andrew Morton
0 siblings, 2 replies; 10+ messages in thread
From: Patrick McHardy @ 2007-04-25 20:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: Herbert Xu, David Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]
Andrew Morton wrote:
> I just retested bare net-2.6.22, pulled 30 minutes ago. I got just one
> warning:
>
> BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock()
> [<c012d18a>] debug_mutex_unlock+0x5a/0x134
> [<c02d67e2>] __mutex_unlock_slowpath+0x9d/0xcf
> [<f8c3618b>] ipw_wx_set_encode+0x0/0x82 [ipw2200]
> [<c028b92c>] rtnl_unlock+0xa/0x29
> [<c0286651>] dev_ioctl+0x3d0/0x402
> [<c014b078>] __handle_mm_fault+0x7c6/0x7e8
> [<c01a649b>] selinux_file_alloc_security+0x1f/0x40
> [<c027b943>] sock_ioctl+0x0/0x1be
> [<c0162925>] do_ioctl+0x19/0x4d
> [<c0162b58>] vfs_ioctl+0x1ff/0x216
> [<c0162bbb>] sys_ioctl+0x4c/0x65
> [<c0103b0c>] syscall_call+0x7/0xb
> [<c02d0000>] unix_dgram_sendmsg+0x76/0x400
> =======================
>
> It's 100% reproducible here, using
> http://userweb.kernel.org/~akpm/config-sony.txt
>
>
> The weird ASSERT_RTNL warnings aren't there, so something else in -mm
> (prior to git-net.patch in the series file) would appear to be interacting
> with net changes.
I think I found the problem, the rtnl_mutex was reinitialized on every
rtnetlink socket creation. This is most likely responsible for both
warnings.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1235 bytes --]
[NETLINK]: don't reinitialize callback mutex
Don't reinitialize the callback mutex the netlink_kernel_create caller
handed in, it is supposed to already be initialized and could already
be held by someone.
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit 9cc4e9c2d8b022c10ded98610a3cd76a8b89cf49
tree e53f10a158858e20ef2e9922cabc5bf43980708d
parent 7255fbb088e3f1b8be97472a38f645a8da595fe2
author Patrick McHardy <kaber@trash.net> Wed, 25 Apr 2007 22:47:20 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 25 Apr 2007 22:47:20 +0200
net/netlink/af_netlink.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec16c9b..64d4b27 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -388,8 +388,12 @@ static int __netlink_create(struct socket *sock, struct mutex *cb_mutex,
sock_init_data(sock, sk);
nlk = nlk_sk(sk);
- nlk->cb_mutex = cb_mutex ? : &nlk->cb_def_mutex;
- mutex_init(nlk->cb_mutex);
+ if (cb_mutex)
+ nlk->cb_mutex = cb_mutex;
+ else {
+ nlk->cb_mutex = &nlk->cb_def_mutex;
+ mutex_init(nlk->cb_mutex);
+ }
init_waitqueue_head(&nlk->wait);
sk->sk_destruct = netlink_sock_destruct;
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-25 20:51 ` Patrick McHardy
@ 2007-04-25 21:01 ` David Miller
2007-04-25 23:48 ` Andrew Morton
1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2007-04-25 21:01 UTC (permalink / raw)
To: kaber; +Cc: akpm, herbert, netdev
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 25 Apr 2007 22:51:43 +0200
> [NETLINK]: don't reinitialize callback mutex
>
> Don't reinitialize the callback mutex the netlink_kernel_create caller
> handed in, it is supposed to already be initialized and could already
> be held by someone.
>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
Applied, thanks a lot for tracking this down Patrick.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: netlink locking warnings in 2.6.21-rc7-mm1
2007-04-25 20:51 ` Patrick McHardy
2007-04-25 21:01 ` David Miller
@ 2007-04-25 23:48 ` Andrew Morton
1 sibling, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2007-04-25 23:48 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Herbert Xu, David Miller, netdev
On Wed, 25 Apr 2007 22:51:43 +0200 Patrick McHardy <kaber@trash.net> wrote:
> I think I found the problem, the rtnl_mutex was reinitialized on every
> rtnetlink socket creation. This is most likely responsible for both
> warnings.
Yup, no warnings any more, thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-04-25 23:48 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-24 19:42 netlink locking warnings in 2.6.21-rc7-mm1 Andrew Morton
2007-04-24 21:20 ` David Miller
2007-04-24 21:26 ` Andrew Morton
2007-04-25 1:06 ` Herbert Xu
2007-04-25 13:17 ` Patrick McHardy
2007-04-25 19:44 ` Andrew Morton
2007-04-25 20:51 ` Patrick McHardy
2007-04-25 21:01 ` David Miller
2007-04-25 23:48 ` Andrew Morton
2007-04-25 13:14 ` Patrick McHardy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).