From: Jarek Poplawski <jarkao2@gmail.com>
To: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>,
netdev@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Thomas Graf <tgraf@suug.ch>
Subject: Re: netlink circular locking dependency
Date: Tue, 17 Jun 2008 08:49:36 +0000
Message-ID: <20080617084936.GA4202@ff.dom.local>
In-Reply-To: <4856DF91.30606@trash.net>
On Mon, Jun 16, 2008 at 11:48:01PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
>> ...
>>
>>> =======================================================
>>> [ INFO: possible circular locking dependency detected ]
>>> 2.6.26-rc2 #5
>>> -------------------------------------------------------
>>> hcid/4136 is trying to acquire lock:
>>> (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
>>>
>>> but task is already holding lock:
>>> (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
>>>
>>> which lock already depends on the new lock.
>>>
>> ...
>>
>> Hi,
>>
>> IMHO it looks like a real lockup threat. Probably it needs something
>> better, but for now here is my simplistic patch proposal for testing.
>>
> So we have:
>
> genl_rcv() : take genl_mutex
> genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
> netlink_dump_start(),
> netlink_dump() : take nlk->cb_mutex
> ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
> second time
>
> netlink_rcv() : call netlink_dump
> netlink_dump : take nlk->cb_mutex
> ctrl_dumpfamily() : take genl_mutex
>
> which is a real bug.
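The two quoted paths take the same pair of mutexes in opposite orders. That is the dependency cycle the lockdep report above complains about. As a rough illustration only, the check can be modeled in userspace as a lock-order graph searched depth-first. This is a simplified sketch, not lockdep's real algorithm, and the enum values and helpers (`record_order()`, `replay_genl_paths()`) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes for this illustration (not kernel identifiers). */
enum lock_class { GENL_MUTEX, CB_MUTEX, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	[GENL_MUTEX] = "genl_mutex",
	[CB_MUTEX] = "nlk->cb_mutex",
};

/* taken_after[a][b]: some task acquired b while already holding a. */
static bool taken_after[NR_CLASSES][NR_CLASSES];

/* Depth-first search over recorded orderings: is 'to' reachable from 'from'? */
static bool reachable(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int c = 0; c < NR_CLASSES; c++)
		if (taken_after[from][c] && !seen[c] && reachable(c, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'wanted' while holding 'held'". If 'held' is already
 * reachable from 'wanted', the new edge would close a cycle: report it
 * and return false instead of recording it.
 */
static bool record_order(int held, int wanted)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(wanted >= 0 && wanted < NR_CLASSES);
	if (reachable(wanted, held, seen)) {
		printf("possible circular locking dependency: %s -> %s\n",
		       class_name[held], class_name[wanted]);
		return false;
	}
	taken_after[held][wanted] = true;
	return true;
}

/* Replays the two quoted call paths; returns how many orderings were flagged. */
static int replay_genl_paths(void)
{
	int flagged = 0;

	/* genl_rcv() -> genl_rcv_msg() -> netlink_dump(): genl_mutex, then cb_mutex */
	if (!record_order(GENL_MUTEX, CB_MUTEX))
		flagged++;
	/* netlink_rcv() -> netlink_dump() -> ctrl_dumpfamily(): cb_mutex, then genl_mutex */
	if (!record_order(CB_MUTEX, GENL_MUTEX))
		flagged++;
	return flagged;
}
```

Replaying the genl_rcv() order first and the netlink_rcv() order second flags exactly one edge, cb_mutex -> genl_mutex, which matches the "already holding nlk->cb_mutex, trying to acquire genl_mutex" report.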
Right. Probably there is also another variant:
#1
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start() : set nlk->cb, then mutex_unlock nlk->cb_mutex
#2
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : 1st run without genl_mutex
#1
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : later run (chains_to_skip != 0): take genl_mutex a 2nd time?!
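The variant above is not an AB-BA inversion but a recursive acquisition: task #1 already owns genl_mutex when its own later ctrl_dumpfamily() run asks for it again. A kernel mutex would simply hang there. The hypothetical userspace sketch below swaps in a POSIX error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so the same sequence returns EDEADLK instead of blocking; model_genl_mutex and the model_* functions are made-up stand-ins, not genetlink code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for genl_mutex; error-checking so a re-lock fails instead of hanging. */
static pthread_mutex_t model_genl_mutex;

static void model_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&model_genl_mutex, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
	(void)rc;
}

/* Models the pre-patch ctrl_dumpfamily(): it locks only on later runs. */
static int model_dumpfamily(int chains_to_skip)
{
	if (chains_to_skip != 0) {
		int err = pthread_mutex_lock(&model_genl_mutex);

		if (err)
			return err;	/* EDEADLK: this thread already owns it */
	}
	/* ... fill the reply skb ... */
	if (chains_to_skip != 0)
		pthread_mutex_unlock(&model_genl_mutex);
	return 0;
}

/* Task #1: genl_rcv() still holds genl_mutex when its later dump run starts. */
static int model_task1_later_run(void)
{
	int err;

	pthread_mutex_lock(&model_genl_mutex);	/* genl_rcv(): genl_lock() */
	err = model_dumpfamily(1);		/* first run was done by task #2 */
	pthread_mutex_unlock(&model_genl_mutex);
	return err;
}
```

A first run (chains_to_skip == 0) never touches the mutex, which is why task #2 gets through and leaves the second acquisition to task #1.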
>
> It seems the best fix is to use genl_mutex as the netlink cb_mutex,
> drop genl_mutex before calling netlink_dump_start and not take it
> in ctrl_dumpfamily, relying completely on af_netlink.c for dump
> locking. Unfortunately this creates a race since the ops passed to
> netlink_dump_start are also protected by the mutex, so this patch
> is just for testing whether it fixes the warning.
>
> On second thought, that race seems to be present already, since
> the ops can be unregistered and the module unloaded while a dump
> is in progress.
Yes, very interesting. I guess there will be some followup...
Regards,
Jarek P.
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index f5aa23c..3e1191c 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -444,8 +444,11 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  		if (ops->dumpit == NULL)
>  			return -EOPNOTSUPP;
>  
> -		return netlink_dump_start(genl_sock, skb, nlh,
> -					  ops->dumpit, ops->done);
> +		genl_unlock();
> +		err = netlink_dump_start(genl_sock, skb, nlh,
> +					 ops->dumpit, ops->done);
> +		genl_lock();
> +		return err;
>  	}
>  
>  	if (ops->doit == NULL)
> @@ -603,9 +606,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
>  	int chains_to_skip = cb->args[0];
>  	int fams_to_skip = cb->args[1];
>  
> -	if (chains_to_skip != 0)
> -		genl_lock();
> -
>  	for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
>  		if (i < chains_to_skip)
>  			continue;
> @@ -623,9 +623,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
>  	}
>  
>  errout:
> -	if (chains_to_skip != 0)
> -		genl_unlock();
> -
>  	cb->args[0] = i;
>  	cb->args[1] = n;
>  
> @@ -770,7 +767,7 @@ static int __init genl_init(void)
>  
>  	/* we'll bump the group number right afterwards */
>  	genl_sock = netlink_kernel_create(&init_net, NETLINK_GENERIC, 0,
> -					  genl_rcv, NULL, THIS_MODULE);
> +					  genl_rcv, &genl_mutex, THIS_MODULE);
>  	if (genl_sock == NULL)
>  		panic("GENL: Cannot initialize generic netlink\n");
>
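The flow of the quoted patch can be modeled in userspace with a single error-checking pthread mutex playing both roles: genl_mutex and the cb_mutex handed to netlink_kernel_create(). This is a hypothetical sketch. The model_* names and dumps_done are made up, and kernel mutexes do not return EDEADLK. It shows why the genl_unlock()/genl_lock() pair around netlink_dump_start() is needed once the two locks are the same: with one lock there is no ordering left to invert, but entering the dump while still holding it would re-acquire it. It does not model the ops lifetime race mentioned in the quoted text (ops are looked up under the mutex but ops->dumpit runs after it has been dropped).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* One error-checking mutex playing both genl_mutex and the socket's cb_mutex. */
static pthread_mutex_t shared_mutex;
static int dumps_done;		/* counts completed model dump passes */

static void shared_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&shared_mutex, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
	(void)rc;
}

/* Models ctrl_dumpfamily() after the patch: it takes no lock of its own. */
static int model_dumpit(void)
{
	dumps_done++;
	return 0;
}

/* Loosely models netlink_dump(): the dump callback runs under cb_mutex. */
static int model_netlink_dump(pthread_mutex_t *cb_mutex)
{
	int err = pthread_mutex_lock(cb_mutex);

	if (err)
		return err;	/* EDEADLK if this thread already holds it */
	err = model_dumpit();
	pthread_mutex_unlock(cb_mutex);
	return err;
}

/* Models genl_rcv() plus the patched dump branch of genl_rcv_msg(). */
static int model_genl_rcv_msg(void)
{
	int err;

	pthread_mutex_lock(&shared_mutex);	/* genl_rcv(): genl_lock() */
	/* ... family and ops are looked up here, under the mutex ... */
	pthread_mutex_unlock(&shared_mutex);	/* patch: genl_unlock() */
	err = model_netlink_dump(&shared_mutex);	/* cb_mutex == genl_mutex */
	pthread_mutex_lock(&shared_mutex);	/* patch: genl_lock() */
	pthread_mutex_unlock(&shared_mutex);	/* genl_rcv(): genl_unlock() */
	return err;
}

/* Same flow without the patch's unlock/relock pair around the dump. */
static int model_without_unlock(void)
{
	int err;

	pthread_mutex_lock(&shared_mutex);
	err = model_netlink_dump(&shared_mutex);	/* re-locks the held mutex */
	pthread_mutex_unlock(&shared_mutex);
	return err;
}
```

With the unlock in place the dump completes; without it, the shared mutex is requested twice by the same thread.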