* netlink circular locking dependency
@ 2008-06-14 12:35 Marcel Holtmann
2008-06-16 21:34 ` Jarek Poplawski
0 siblings, 1 reply; 19+ messages in thread
From: Marcel Holtmann @ 2008-06-14 12:35 UTC (permalink / raw)
To: netdev; +Cc: Ingo Molnar
Hi guys,
so I have been working on integrating netlink support into the Bluetooth
stack. However this is running on a 2.6.26-rc2 kernel on a Quad G5 with
no netlink support for Bluetooth in the kernel, but with the userspace
part within hcid. That of course will fail to load, but that is fine and
expected if you run the new daemon on old kernels. I can upgrade that
machine to the latest vanilla kernel if needed, but just wanna quickly
check if anybody has seen this before:
NET: Registered protocol family 17
Bluetooth: L2CAP ver 2.9
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
hci_cmd_task: hci0 command tx timeout
hci_cmd_task: hci0 command tx timeout
hci_cmd_task: hci0 command tx timeout
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.26-rc2 #5
-------------------------------------------------------
hcid/4136 is trying to acquire lock:
(genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
but task is already holding lock:
(nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (nlk->cb_mutex){--..}:
[<c000000000077758>] .__lock_acquire+0xd14/0xf88
[<c000000000077a9c>] .lock_acquire+0xd0/0x11c
[<c000000000314068>] .mutex_lock_nested+0x158/0x3f4
[<c0000000002aa9fc>] .netlink_dump_start+0x150/0x228
[<c0000000002accc4>] .genl_rcv_msg+0xe4/0x1f8
[<c0000000002a902c>] .netlink_rcv_skb+0x80/0x118
[<c0000000002ad0f0>] .genl_rcv+0x4c/0x7c
[<c0000000002a8ca4>] .netlink_unicast+0x2f8/0x3d8
[<c0000000002a9920>] .netlink_sendmsg+0x2dc/0x320
[<c00000000027e9b0>] .sock_sendmsg+0xf0/0x134
[<c00000000027ec10>] .sys_sendmsg+0x21c/0x2b4
[<c0000000002a2f68>] .compat_sys_sendmsg+0x20/0x38
[<c0000000002a4200>] .compat_sys_socketcall+0x1ec/0x220
[<c0000000000076b0>] syscall_exit+0x0/0x40
-> #0 (genl_mutex){--..}:
[<c000000000077650>] .__lock_acquire+0xc0c/0xf88
[<c000000000077a9c>] .lock_acquire+0xd0/0x11c
[<c000000000314068>] .mutex_lock_nested+0x158/0x3f4
[<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
[<c0000000002a76a0>] .netlink_dump+0x8c/0x27c
[<c0000000002a9b58>] .netlink_recvmsg+0x1f4/0x378
[<c00000000027e878>] .sock_recvmsg+0x100/0x148
[<c00000000027fedc>] .sys_recvmsg+0x180/0x31c
[<c0000000002a2f30>] .compat_sys_recvmsg+0x20/0x38
[<c0000000002a420c>] .compat_sys_socketcall+0x1f8/0x220
[<c0000000000076b0>] syscall_exit+0x0/0x40
other info that might help us debug this:
1 lock held by hcid/4136:
#0: (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
stack backtrace:
Call Trace:
[c00000003bf37190] [c00000000000f7a0] .show_stack+0x84/0x1b8 (unreliable)
[c00000003bf37250] [c00000000000f8f4] .dump_stack+0x20/0x34
[c00000003bf372d0] [c000000000074b54] .print_circular_bug_tail+0x88/0xac
[c00000003bf373a0] [c000000000077650] .__lock_acquire+0xc0c/0xf88
[c00000003bf37490] [c000000000077a9c] .lock_acquire+0xd0/0x11c
[c00000003bf37550] [c000000000314068] .mutex_lock_nested+0x158/0x3f4
[c00000003bf37640] [c0000000002ace4c] .ctrl_dumpfamily+0x74/0x174
[c00000003bf37700] [c0000000002a76a0] .netlink_dump+0x8c/0x27c
[c00000003bf377c0] [c0000000002a9b58] .netlink_recvmsg+0x1f4/0x378
[c00000003bf378d0] [c00000000027e878] .sock_recvmsg+0x100/0x148
[c00000003bf37af0] [c00000000027fedc] .sys_recvmsg+0x180/0x31c
[c00000003bf37d00] [c0000000002a2f30] .compat_sys_recvmsg+0x20/0x38
[c00000003bf37d80] [c0000000002a420c] .compat_sys_socketcall+0x1f8/0x220
[c00000003bf37e30] [c0000000000076b0] syscall_exit+0x0/0x40
Bluetooth: BNEP (Ethernet Emulation) ver 1.2
Bluetooth: BNEP filters: protocol multicast
The system is running Ubuntu Hardy and using stock libnl for generic
netlink access.
Regards
Marcel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-14 12:35 netlink circular locking dependency Marcel Holtmann
@ 2008-06-16 21:34 ` Jarek Poplawski
2008-06-16 21:48 ` Patrick McHardy
0 siblings, 1 reply; 19+ messages in thread
From: Jarek Poplawski @ 2008-06-16 21:34 UTC (permalink / raw)
To: Marcel Holtmann; +Cc: netdev, Ingo Molnar, Thomas Graf
Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
...
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.26-rc2 #5
> -------------------------------------------------------
> hcid/4136 is trying to acquire lock:
> (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
>
> but task is already holding lock:
> (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
>
> which lock already depends on the new lock.
...
Hi,
IMHO it looks like a real lockup threat. Probably it needs something
better, but for now here is my simplistic patch proposal for testing.
Thanks,
Jarek P.
---
net/netlink/genetlink.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f5aa23c..bf939ba 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -32,6 +32,11 @@ static inline void genl_unlock(void)
mutex_unlock(&genl_mutex);
}
+static inline int genl_trylock(void)
+{
+ return mutex_trylock(&genl_mutex);
+}
+
#define GENL_FAM_TAB_SIZE 16
#define GENL_FAM_TAB_MASK (GENL_FAM_TAB_SIZE - 1)
@@ -603,8 +608,9 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
int chains_to_skip = cb->args[0];
int fams_to_skip = cb->args[1];
- if (chains_to_skip != 0)
- genl_lock();
+ /* genl_lock vs. cb_mutex inversion threatens here */
+ if (chains_to_skip != 0 && !genl_trylock())
+ return skb->len;
for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
if (i < chains_to_skip)
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-16 21:34 ` Jarek Poplawski
@ 2008-06-16 21:48 ` Patrick McHardy
2008-06-17 1:45 ` Marcel Holtmann
2008-06-17 8:49 ` Jarek Poplawski
0 siblings, 2 replies; 19+ messages in thread
From: Patrick McHardy @ 2008-06-16 21:48 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]
Jarek Poplawski wrote:
> Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
> ...
>
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.26-rc2 #5
>> -------------------------------------------------------
>> hcid/4136 is trying to acquire lock:
>> (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
>>
>> but task is already holding lock:
>> (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
>>
>> which lock already depends on the new lock.
>>
> ...
>
> Hi,
>
> IMHO it looks like a real lockup threat. Probably it needs something
> better, but for now here is my simplistic patch proposal for testing.
>
So we have:
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
second time
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : take genl_mutex
which is a real bug.
It seems the best fix is to use genl_mutex for the netlink cb_mutex,
drop genl_mutex before calling netlink_dump_start and don't take it
in ctrl_dumpfamily, relying completely on af_netlink.c for dump
locking. Unfortunately this creates a race since the ops passed to
netlink_dump_start are also protect by the mutex, so this patch
is just for testing whether it fixes the warning.
On second though - that race seems to be present already since
the ops can be unregistered and the module unloaded while a dump
is in progress.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1378 bytes --]
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f5aa23c..3e1191c 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -444,8 +444,11 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (ops->dumpit == NULL)
return -EOPNOTSUPP;
- return netlink_dump_start(genl_sock, skb, nlh,
- ops->dumpit, ops->done);
+ genl_unlock();
+ err = netlink_dump_start(genl_sock, skb, nlh,
+ ops->dumpit, ops->done);
+ genl_lock();
+ return err;
}
if (ops->doit == NULL)
@@ -603,9 +606,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
int chains_to_skip = cb->args[0];
int fams_to_skip = cb->args[1];
- if (chains_to_skip != 0)
- genl_lock();
-
for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
if (i < chains_to_skip)
continue;
@@ -623,9 +623,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
}
errout:
- if (chains_to_skip != 0)
- genl_unlock();
-
cb->args[0] = i;
cb->args[1] = n;
@@ -770,7 +767,7 @@ static int __init genl_init(void)
/* we'll bump the group number right afterwards */
genl_sock = netlink_kernel_create(&init_net, NETLINK_GENERIC, 0,
- genl_rcv, NULL, THIS_MODULE);
+ genl_rcv, &genl_mutex, THIS_MODULE);
if (genl_sock == NULL)
panic("GENL: Cannot initialize generic netlink\n");
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-16 21:48 ` Patrick McHardy
@ 2008-06-17 1:45 ` Marcel Holtmann
2008-06-17 12:50 ` Patrick McHardy
2008-06-17 8:49 ` Jarek Poplawski
1 sibling, 1 reply; 19+ messages in thread
From: Marcel Holtmann @ 2008-06-17 1:45 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Jarek Poplawski, netdev, Ingo Molnar, Thomas Graf
Hi Patrick,
> >> =======================================================
> >> [ INFO: possible circular locking dependency detected ]
> >> 2.6.26-rc2 #5
> >> -------------------------------------------------------
> >> hcid/4136 is trying to acquire lock:
> >> (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
> >>
> >> but task is already holding lock:
> >> (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
> >>
> >> which lock already depends on the new lock.
> >>
> > ...
> >
> > Hi,
> >
> > IMHO it looks like a real lockup threat. Probably it needs something
> > better, but for now here is my simplistic patch proposal for testing.
> >
> So we have:
>
> genl_rcv() : take genl_mutex
> genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
> netlink_dump_start(),
> netlink_dump() : take nlk->cb_mutex
> ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
> second time
>
> netlink_rcv() : call netlink_dump
> netlink_dump : take nlk->cb_mutex
> ctrl_dumpfamily() : take genl_mutex
>
> which is a real bug.
>
> It seems the best fix is to use genl_mutex for the netlink cb_mutex,
> drop genl_mutex before calling netlink_dump_start and don't take it
> in ctrl_dumpfamily, relying completely on af_netlink.c for dump
> locking. Unfortunately this creates a race since the ops passed to
> netlink_dump_start are also protect by the mutex, so this patch
> is just for testing whether it fixes the warning.
I updated my test kernel to 2.6.26-rc6 and then applied your patch and
the lockdep warning goes away.
Regards
Marcel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-16 21:48 ` Patrick McHardy
2008-06-17 1:45 ` Marcel Holtmann
@ 2008-06-17 8:49 ` Jarek Poplawski
1 sibling, 0 replies; 19+ messages in thread
From: Jarek Poplawski @ 2008-06-17 8:49 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
On Mon, Jun 16, 2008 at 11:48:01PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
>> ...
>>
>>> =======================================================
>>> [ INFO: possible circular locking dependency detected ]
>>> 2.6.26-rc2 #5
>>> -------------------------------------------------------
>>> hcid/4136 is trying to acquire lock:
>>> (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
>>>
>>> but task is already holding lock:
>>> (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
>>>
>>> which lock already depends on the new lock.
>>>
>> ...
>>
>> Hi,
>>
>> IMHO it looks like a real lockup threat. Probably it needs something
>> better, but for now here is my simplistic patch proposal for testing.
>>
> So we have:
>
> genl_rcv() : take genl_mutex
> genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
> netlink_dump_start(),
> netlink_dump() : take nlk->cb_mutex
> ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
> second time
>
> netlink_rcv() : call netlink_dump
> netlink_dump : take nlk->cb_mutex
> ctrl_dumpfamily() : take genl_mutex
>
> which is a real bug.
Right. Probably there is also another variant:
#1
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start() : mutex_unlock nlk->cb_mutex
#2
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : 1st run without genl_mutex
#1
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : > 1st run: take genl_mutex 2nd time?!
>
> It seems the best fix is to use genl_mutex for the netlink cb_mutex,
> drop genl_mutex before calling netlink_dump_start and don't take it
> in ctrl_dumpfamily, relying completely on af_netlink.c for dump
> locking. Unfortunately this creates a race since the ops passed to
> netlink_dump_start are also protect by the mutex, so this patch
> is just for testing whether it fixes the warning.
>
> On second though - that race seems to be present already since
> the ops can be unregistered and the module unloaded while a dump
> is in progress.
Yes, very interesting. I guess there will be some followup...
Regards,
Jarek P.
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index f5aa23c..3e1191c 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -444,8 +444,11 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> if (ops->dumpit == NULL)
> return -EOPNOTSUPP;
>
> - return netlink_dump_start(genl_sock, skb, nlh,
> - ops->dumpit, ops->done);
> + genl_unlock();
> + err = netlink_dump_start(genl_sock, skb, nlh,
> + ops->dumpit, ops->done);
> + genl_lock();
> + return err;
> }
>
> if (ops->doit == NULL)
> @@ -603,9 +606,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
> int chains_to_skip = cb->args[0];
> int fams_to_skip = cb->args[1];
>
> - if (chains_to_skip != 0)
> - genl_lock();
> -
> for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
> if (i < chains_to_skip)
> continue;
> @@ -623,9 +623,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
> }
>
> errout:
> - if (chains_to_skip != 0)
> - genl_unlock();
> -
> cb->args[0] = i;
> cb->args[1] = n;
>
> @@ -770,7 +767,7 @@ static int __init genl_init(void)
>
> /* we'll bump the group number right afterwards */
> genl_sock = netlink_kernel_create(&init_net, NETLINK_GENERIC, 0,
> - genl_rcv, NULL, THIS_MODULE);
> + genl_rcv, &genl_mutex, THIS_MODULE);
> if (genl_sock == NULL)
> panic("GENL: Cannot initialize generic netlink\n");
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 1:45 ` Marcel Holtmann
@ 2008-06-17 12:50 ` Patrick McHardy
2008-06-17 13:09 ` Jarek Poplawski
0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-06-17 12:50 UTC (permalink / raw)
To: Marcel Holtmann; +Cc: Jarek Poplawski, netdev, Ingo Molnar, Thomas Graf
Marcel Holtmann wrote:
> Hi Patrick,
>
>> So we have:
>>
>> genl_rcv() : take genl_mutex
>> genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
>> netlink_dump_start(),
>> netlink_dump() : take nlk->cb_mutex
>> ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
>> second time
>>
>> netlink_rcv() : call netlink_dump
>> netlink_dump : take nlk->cb_mutex
>> ctrl_dumpfamily() : take genl_mutex
>>
>> which is a real bug.
>>
>> It seems the best fix is to use genl_mutex for the netlink cb_mutex,
>> drop genl_mutex before calling netlink_dump_start and don't take it
>> in ctrl_dumpfamily, relying completely on af_netlink.c for dump
>> locking. Unfortunately this creates a race since the ops passed to
>> netlink_dump_start are also protect by the mutex, so this patch
>> is just for testing whether it fixes the warning.
>
> I updated my test kernel to 2.6.26-rc6 and then applied your patch and
> the lockdep warning goes away.
Thanks for testing. Unfortunately the module unload races look
more complicated to fix and I'm busy with other things, so it
would great if someone else could fix this.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:09 ` Jarek Poplawski
@ 2008-06-17 13:07 ` Patrick McHardy
2008-06-17 13:24 ` Jarek Poplawski
2008-06-17 13:08 ` Thomas Graf
1 sibling, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-06-17 13:07 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
Jarek Poplawski wrote:
> On Tue, Jun 17, 2008 at 02:50:19PM +0200, Patrick McHardy wrote:
> ...
>> Thanks for testing. Unfortunately the module unload races look
>> more complicated to fix and I'm busy with other things, so it
>> would great if someone else could fix this.
>
> Patrick, I wonder if simply adding an additional mutex e.g.
> genl_lock_table() around all the rest (after your patch) genl_locks
> could be enough until some major rework. This should prevent any
> new races and there are no lockups, I guess?
Not sure I understand you correctly, where exactly would
this mutex be taken?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:09 ` Jarek Poplawski
2008-06-17 13:07 ` Patrick McHardy
@ 2008-06-17 13:08 ` Thomas Graf
2008-06-17 13:19 ` Patrick McHardy
1 sibling, 1 reply; 19+ messages in thread
From: Thomas Graf @ 2008-06-17 13:08 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Patrick McHardy, Marcel Holtmann, netdev, Ingo Molnar
* Jarek Poplawski <jarkao2@gmail.com> 2008-06-17 13:09
> On Tue, Jun 17, 2008 at 02:50:19PM +0200, Patrick McHardy wrote:
> ...
> > Thanks for testing. Unfortunately the module unload races look
> > more complicated to fix and I'm busy with other things, so it
> > would great if someone else could fix this.
>
> Patrick, I wonder if simply adding an additional mutex e.g.
> genl_lock_table() around all the rest (after your patch) genl_locks
> could be enough until some major rework. This should prevent any
> new races and there are no lockups, I guess?
It would be wise to redo the locking alltogether while we're fixing
this. Decouple message serialization from register/unregister
operations would be a nice thing f.e.
I'll have a look if none beats me to it.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 12:50 ` Patrick McHardy
@ 2008-06-17 13:09 ` Jarek Poplawski
2008-06-17 13:07 ` Patrick McHardy
2008-06-17 13:08 ` Thomas Graf
0 siblings, 2 replies; 19+ messages in thread
From: Jarek Poplawski @ 2008-06-17 13:09 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
On Tue, Jun 17, 2008 at 02:50:19PM +0200, Patrick McHardy wrote:
...
> Thanks for testing. Unfortunately the module unload races look
> more complicated to fix and I'm busy with other things, so it
> would great if someone else could fix this.
Patrick, I wonder if simply adding an additional mutex e.g.
genl_lock_table() around all the rest (after your patch) genl_locks
could be enough until some major rework. This should prevent any
new races and there are no lockups, I guess?
Jarek P.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:08 ` Thomas Graf
@ 2008-06-17 13:19 ` Patrick McHardy
0 siblings, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2008-06-17 13:19 UTC (permalink / raw)
To: Thomas Graf; +Cc: Jarek Poplawski, Marcel Holtmann, netdev, Ingo Molnar
Thomas Graf wrote:
> * Jarek Poplawski <jarkao2@gmail.com> 2008-06-17 13:09
>> On Tue, Jun 17, 2008 at 02:50:19PM +0200, Patrick McHardy wrote:
>> ...
>>> Thanks for testing. Unfortunately the module unload races look
>>> more complicated to fix and I'm busy with other things, so it
>>> would great if someone else could fix this.
>> Patrick, I wonder if simply adding an additional mutex e.g.
>> genl_lock_table() around all the rest (after your patch) genl_locks
>> could be enough until some major rework. This should prevent any
>> new races and there are no lockups, I guess?
>
> It would be wise to redo the locking alltogether while we're fixing
> this. Decouple message serialization from register/unregister
> operations would be a nice thing f.e.
Agreed.
> I'll have a look if none beats me to it.
While whoever is at this, there are a few similar races I
wanted to fix for a long time, but never got to it.
- the rtnl_register/rtnl_unregister function don't perform any
locking, so they might get called while unregistration is
in progress
- the individual ops are registered one at a time, which could
theoretically lead to some strange behaviour where a subset of
the ops can already be used, but others return EOPNOTSUPP
- netlink_dump_start usage in rtnetlink has the exact same
race as genetlink with my patch
For 1. and 2., it seems the best way would be to register the
entire set of ops (encapsulated in a struct) at once and hold
the rtnl_mutex during that. For 3., the fix would probably
identical to the genetlink fix.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:07 ` Patrick McHardy
@ 2008-06-17 13:24 ` Jarek Poplawski
2008-06-17 13:27 ` Patrick McHardy
0 siblings, 1 reply; 19+ messages in thread
From: Jarek Poplawski @ 2008-06-17 13:24 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
On Tue, Jun 17, 2008 at 03:07:44PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> On Tue, Jun 17, 2008 at 02:50:19PM +0200, Patrick McHardy wrote:
>> ...
>>> Thanks for testing. Unfortunately the module unload races look
>>> more complicated to fix and I'm busy with other things, so it
>>> would great if someone else could fix this.
>>
>> Patrick, I wonder if simply adding an additional mutex e.g.
>> genl_lock_table() around all the rest (after your patch) genl_locks
>> could be enough until some major rework. This should prevent any
>> new races and there are no lockups, I guess?
>
> Not sure I understand you correctly, where exactly would
> this mutex be taken?
Around (before) each genl_lock(), so any change would need these two
locks. genl_lock() alone would work like read lock (plus cb change).
But, I can miss something... Of course, it's meant as a temporary
solution, until Thomas does it right.
Jarek P.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:24 ` Jarek Poplawski
@ 2008-06-17 13:27 ` Patrick McHardy
2008-06-17 13:43 ` Jarek Poplawski
0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-06-17 13:27 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
Jarek Poplawski wrote:
> On Tue, Jun 17, 2008 at 03:07:44PM +0200, Patrick McHardy wrote:
>> Jarek Poplawski wrote:
>>> On Tue, Jun 17, 2008 at 02:50:19PM +0200, Patrick McHardy wrote:
>>> ...
>>>> Thanks for testing. Unfortunately the module unload races look
>>>> more complicated to fix and I'm busy with other things, so it
>>>> would great if someone else could fix this.
>>> Patrick, I wonder if simply adding an additional mutex e.g.
>>> genl_lock_table() around all the rest (after your patch) genl_locks
>>> could be enough until some major rework. This should prevent any
>>> new races and there are no lockups, I guess?
>> Not sure I understand you correctly, where exactly would
>> this mutex be taken?
>
> Around (before) each genl_lock(), so any change would need these two
> locks. genl_lock() alone would work like read lock (plus cb change).
> But, I can miss something... Of course, it's meant as a temporary
> solution, until Thomas does it right.
I guess that might work. But simply using module references to
prevent the module from going away while the mutex is not held
doesn't appear to be more complicated.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:27 ` Patrick McHardy
@ 2008-06-17 13:43 ` Jarek Poplawski
2008-06-18 4:30 ` David Miller
0 siblings, 1 reply; 19+ messages in thread
From: Jarek Poplawski @ 2008-06-17 13:43 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Marcel Holtmann, netdev, Ingo Molnar, Thomas Graf
On Tue, Jun 17, 2008 at 03:27:18PM +0200, Patrick McHardy wrote:
...
> I guess that might work. But simply using module references to
> prevent the module from going away while the mutex is not held
> doesn't appear to be more complicated.
O.K.
Jarek P.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-17 13:43 ` Jarek Poplawski
@ 2008-06-18 4:30 ` David Miller
2008-06-18 6:15 ` Jarek Poplawski
2008-06-18 8:52 ` Patrick McHardy
0 siblings, 2 replies; 19+ messages in thread
From: David Miller @ 2008-06-18 4:30 UTC (permalink / raw)
To: jarkao2; +Cc: kaber, marcel, netdev, mingo, tgraf
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 17 Jun 2008 13:43:03 +0000
> On Tue, Jun 17, 2008 at 03:27:18PM +0200, Patrick McHardy wrote:
> ...
> > I guess that might work. But simply using module references to
> > prevent the module from going away while the mutex is not held
> > doesn't appear to be more complicated.
>
> O.K.
In the meantime should I apply Patrick's current patch
from this thread?
If so I'd like a proper commit message and signoff, thanks!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-18 4:30 ` David Miller
@ 2008-06-18 6:15 ` Jarek Poplawski
2008-06-18 8:52 ` Patrick McHardy
1 sibling, 0 replies; 19+ messages in thread
From: Jarek Poplawski @ 2008-06-18 6:15 UTC (permalink / raw)
To: David Miller; +Cc: kaber, marcel, netdev, mingo, tgraf
On Tue, Jun 17, 2008 at 09:30:12PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Tue, 17 Jun 2008 13:43:03 +0000
>
> > On Tue, Jun 17, 2008 at 03:27:18PM +0200, Patrick McHardy wrote:
> > ...
> > > I guess that might work. But simply using module references to
> > > prevent the module from going away while the mutex is not held
> > > doesn't appear to be more complicated.
> >
> > O.K.
>
> In the meantime should I apply Patrick's current patch
> from this thread?
>
> If so I'd like a proper commit message and signoff, thanks!
I hope this is a question "To:" Patrick... (I'm OK with any
version of this patch as a temporary solution.)
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-18 4:30 ` David Miller
2008-06-18 6:15 ` Jarek Poplawski
@ 2008-06-18 8:52 ` Patrick McHardy
2008-06-18 9:08 ` David Miller
1 sibling, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-06-18 8:52 UTC (permalink / raw)
To: David Miller; +Cc: jarkao2, marcel, netdev, mingo, tgraf
[-- Attachment #1: Type: text/plain, Size: 280 bytes --]
David Miller wrote:
> In the meantime should I apply Patrick's current patch
> from this thread?
I think that makes sense, the module unload race is present anyway.
>
> If so I'd like a proper commit message and signoff, thanks!
Attached the patch with a proper changelog.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 2251 bytes --]
netlink: genl: fix circular locking
genetlink has a circular locking dependency when dumping the registered
families:
- dump start:
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
second time
- dump continuance:
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : take genl_mutex
Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f5aa23c..3e1191c 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -444,8 +444,11 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (ops->dumpit == NULL)
return -EOPNOTSUPP;
- return netlink_dump_start(genl_sock, skb, nlh,
- ops->dumpit, ops->done);
+ genl_unlock();
+ err = netlink_dump_start(genl_sock, skb, nlh,
+ ops->dumpit, ops->done);
+ genl_lock();
+ return err;
}
if (ops->doit == NULL)
@@ -603,9 +606,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
int chains_to_skip = cb->args[0];
int fams_to_skip = cb->args[1];
- if (chains_to_skip != 0)
- genl_lock();
-
for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
if (i < chains_to_skip)
continue;
@@ -623,9 +623,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
}
errout:
- if (chains_to_skip != 0)
- genl_unlock();
-
cb->args[0] = i;
cb->args[1] = n;
@@ -770,7 +767,7 @@ static int __init genl_init(void)
/* we'll bump the group number right afterwards */
genl_sock = netlink_kernel_create(&init_net, NETLINK_GENERIC, 0,
- genl_rcv, NULL, THIS_MODULE);
+ genl_rcv, &genl_mutex, THIS_MODULE);
if (genl_sock == NULL)
panic("GENL: Cannot initialize generic netlink\n");
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-18 8:52 ` Patrick McHardy
@ 2008-06-18 9:08 ` David Miller
2008-06-18 11:38 ` Marcel Holtmann
0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2008-06-18 9:08 UTC (permalink / raw)
To: kaber; +Cc: jarkao2, marcel, netdev, mingo, tgraf
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 18 Jun 2008 10:52:30 +0200
> David Miller wrote:
> > In the meantime should I apply Patrick's current patch
> > from this thread?
>
> I think that makes sense, the module unload race is present anyway.
> >
> > If so I'd like a proper commit message and signoff, thanks!
>
> Attached the patch with a proper changelog.
Applied and pushed out to net-2.6, thanks a lot Patrick.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-18 9:08 ` David Miller
@ 2008-06-18 11:38 ` Marcel Holtmann
2008-06-18 11:42 ` Patrick McHardy
0 siblings, 1 reply; 19+ messages in thread
From: Marcel Holtmann @ 2008-06-18 11:38 UTC (permalink / raw)
To: David Miller; +Cc: kaber, jarkao2, netdev, mingo, tgraf
Hi David,
> > > In the meantime should I apply Patrick's current patch
> > > from this thread?
> >
> > I think that makes sense, the module unload race is present anyway.
> > >
> > > If so I'd like a proper commit message and signoff, thanks!
> >
> > Attached the patch with a proper changelog.
>
> Applied and pushed out to net-2.6, thanks a lot Patrick.
you could have added a Tested-by: Marcel Holtmann <marcel@holtmann.org>
Regards
Marcel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: netlink circular locking dependency
2008-06-18 11:38 ` Marcel Holtmann
@ 2008-06-18 11:42 ` Patrick McHardy
0 siblings, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2008-06-18 11:42 UTC (permalink / raw)
To: Marcel Holtmann; +Cc: David Miller, jarkao2, netdev, mingo, tgraf
Marcel Holtmann wrote:
> Hi David,
>
>>>> In the meantime should I apply Patrick's current patch
>>>> from this thread?
>>> I think that makes sense, the module unload race is present anyway.
>>>> If so I'd like a proper commit message and signoff, thanks!
>>> Attached the patch with a proper changelog.
>> Applied and pushed out to net-2.6, thanks a lot Patrick.
>
> you could have added a Tested-by: Marcel Holtmann <marcel@holtmann.org>
My fault, I wanted to but forgot due to lack of coffee :)
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-06-18 11:42 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-14 12:35 netlink circular locking dependency Marcel Holtmann
2008-06-16 21:34 ` Jarek Poplawski
2008-06-16 21:48 ` Patrick McHardy
2008-06-17 1:45 ` Marcel Holtmann
2008-06-17 12:50 ` Patrick McHardy
2008-06-17 13:09 ` Jarek Poplawski
2008-06-17 13:07 ` Patrick McHardy
2008-06-17 13:24 ` Jarek Poplawski
2008-06-17 13:27 ` Patrick McHardy
2008-06-17 13:43 ` Jarek Poplawski
2008-06-18 4:30 ` David Miller
2008-06-18 6:15 ` Jarek Poplawski
2008-06-18 8:52 ` Patrick McHardy
2008-06-18 9:08 ` David Miller
2008-06-18 11:38 ` Marcel Holtmann
2008-06-18 11:42 ` Patrick McHardy
2008-06-17 13:08 ` Thomas Graf
2008-06-17 13:19 ` Patrick McHardy
2008-06-17 8:49 ` Jarek Poplawski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).