From: Jarek Poplawski <jarkao2@gmail.com>
To: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>,
netdev@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Thomas Graf <tgraf@suug.ch>
Subject: Re: netlink circular locking dependency
Date: Tue, 17 Jun 2008 08:49:36 +0000
Message-ID: <20080617084936.GA4202@ff.dom.local>
In-Reply-To: <4856DF91.30606@trash.net>
On Mon, Jun 16, 2008 at 11:48:01PM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
>> ...
>>
>>> =======================================================
>>> [ INFO: possible circular locking dependency detected ]
>>> 2.6.26-rc2 #5
>>> -------------------------------------------------------
>>> hcid/4136 is trying to acquire lock:
>>> (genl_mutex){--..}, at: [<c0000000002ace4c>] .ctrl_dumpfamily+0x74/0x174
>>>
>>> but task is already holding lock:
>>> (nlk->cb_mutex){--..}, at: [<c0000000002a766c>] .netlink_dump+0x58/0x27c
>>>
>>> which lock already depends on the new lock.
>>>
>> ...
>>
>> Hi,
>>
>> IMHO it looks like a real lockup threat. Probably it needs something
>> better, but for now here is my simplistic patch proposal for testing.
>>
> So we have:
>
> genl_rcv() : take genl_mutex
> genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
> netlink_dump_start(),
> netlink_dump() : take nlk->cb_mutex
> ctrl_dumpfamily() : try to detect this case and not take genl_mutex a
> second time
>
> netlink_rcv() : call netlink_dump
> netlink_dump : take nlk->cb_mutex
> ctrl_dumpfamily() : take genl_mutex
>
> which is a real bug.
Right. Probably there is also another variant:
#1
genl_rcv() : take genl_mutex
genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex
netlink_dump_start() : mutex_unlock nlk->cb_mutex
#2
netlink_rcv() : call netlink_dump
netlink_dump : take nlk->cb_mutex
ctrl_dumpfamily() : 1st run without genl_mutex
#1
netlink_dump() : take nlk->cb_mutex
ctrl_dumpfamily() : > 1st run: take genl_mutex 2nd time?!
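
Both sequences above come down to the same inversion lockdep complained
about: one path takes nlk->cb_mutex while holding genl_mutex, the other
takes genl_mutex while holding nlk->cb_mutex. A minimal userspace sketch
of that check follows. It is a toy model only, not the kernel's lockdep;
the lock_graph type and the graph_*() and replay_thread_paths() helpers
are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of lock-order tracking; NOT the kernel's lockdep. */
enum { GENL_MUTEX, CB_MUTEX, NR_LOCKS };

struct lock_graph {
	/* edge[a][b] is true if lock b was taken while lock a was held */
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Is there a chain of recorded dependencies leading from 'from' to 'to'? */
static bool graph_reaches(const struct lock_graph *g, int from, int to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    graph_reaches(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "wanted is taken while held is held".
 * Returns false, and records nothing, if 'wanted' already leads back
 * to 'held', i.e. if the new dependency would close a cycle.
 */
static bool graph_acquire(struct lock_graph *g, int held, int wanted)
{
	bool seen[NR_LOCKS] = { false };

	if (graph_reaches(g, wanted, held, seen))
		return false;
	g->edge[held][wanted] = true;
	return true;
}

/* Replay the two paths from this thread against the toy graph. */
static void replay_thread_paths(void)
{
	struct lock_graph g;
	bool ok;

	graph_init(&g);

	/* genl_rcv() -> netlink_dump_start(): cb_mutex under genl_mutex */
	ok = graph_acquire(&g, GENL_MUTEX, CB_MUTEX);
	assert(ok);

	/* netlink_dump() -> ctrl_dumpfamily(): genl_mutex under cb_mutex */
	ok = graph_acquire(&g, CB_MUTEX, GENL_MUTEX);
	assert(!ok);	/* circular dependency */
}
```

The first dependency is accepted; the second is rejected because
cb_mutex is already known to be taken under genl_mutex.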
>
> It seems the best fix is to use genl_mutex for the netlink cb_mutex,
> drop genl_mutex before calling netlink_dump_start and don't take it
> in ctrl_dumpfamily, relying completely on af_netlink.c for dump
> locking. Unfortunately this creates a race since the ops passed to
> netlink_dump_start are also protected by the mutex, so this patch
> is just for testing whether it fixes the warning.
>
> On second thought - that race seems to be present already since
> the ops can be unregistered and the module unloaded while a dump
> is in progress.
Yes, very interesting. I guess there will be some followup...
Regards,
Jarek P.
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index f5aa23c..3e1191c 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -444,8 +444,11 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> if (ops->dumpit == NULL)
> return -EOPNOTSUPP;
>
> - return netlink_dump_start(genl_sock, skb, nlh,
> - ops->dumpit, ops->done);
> + genl_unlock();
> + err = netlink_dump_start(genl_sock, skb, nlh,
> + ops->dumpit, ops->done);
> + genl_lock();
> + return err;
> }
>
> if (ops->doit == NULL)
> @@ -603,9 +606,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
> int chains_to_skip = cb->args[0];
> int fams_to_skip = cb->args[1];
>
> - if (chains_to_skip != 0)
> - genl_lock();
> -
> for (i = 0; i < GENL_FAM_TAB_SIZE; i++) {
> if (i < chains_to_skip)
> continue;
> @@ -623,9 +623,6 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
> }
>
> errout:
> - if (chains_to_skip != 0)
> - genl_unlock();
> -
> cb->args[0] = i;
> cb->args[1] = n;
>
> @@ -770,7 +767,7 @@ static int __init genl_init(void)
>
> /* we'll bump the group number right afterwards */
> genl_sock = netlink_kernel_create(&init_net, NETLINK_GENERIC, 0,
> - genl_rcv, NULL, THIS_MODULE);
> + genl_rcv, &genl_mutex, THIS_MODULE);
> if (genl_sock == NULL)
> panic("GENL: Cannot initialize generic netlink\n");
>
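
For illustration, the locking shape of the test patch above can be
modelled in userspace. This is only a sketch under stated assumptions:
toy_mutex is a hypothetical non-recursive lock that reports a second
lock attempt instead of blocking, and every toy_*() function is a
made-up stand-in for the kernel function named in its comment, not the
real code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy non-recursive mutex: instead of blocking forever on a second
 * lock attempt, it reports the would-be deadlock to the caller.
 */
struct toy_mutex {
	bool held;
};

static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;	/* a real mutex would block here */
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

static struct toy_mutex genl_mutex;
/* The patch passes &genl_mutex to netlink_kernel_create() as cb_mutex. */
static struct toy_mutex *cb_mutex = &genl_mutex;

/* Stand-in for ctrl_dumpfamily() after the patch: takes no lock itself. */
static int toy_dumpit(void)
{
	/* The dump still runs under genl_mutex, now held as cb_mutex. */
	return genl_mutex.held ? 0 : -EINVAL;
}

/* Stand-in for netlink_dump_start() + netlink_dump(). */
static int toy_dump_start(int (*dumpit)(void))
{
	int err;

	if (!toy_lock(cb_mutex))
		return -EDEADLK;
	err = dumpit();
	toy_unlock(cb_mutex);
	return err;
}

/* Stand-in for the dump branch of genl_rcv_msg(); genl_mutex is held. */
static int toy_rcv_dump(bool drop_genl_mutex)
{
	int err;

	if (drop_genl_mutex)
		toy_unlock(&genl_mutex);
	err = toy_dump_start(toy_dumpit);
	if (drop_genl_mutex && !toy_lock(&genl_mutex))
		return -EDEADLK;
	return err;
}

static void toy_replay(void)
{
	toy_lock(&genl_mutex);				/* genl_rcv() */
	assert(genl_mutex.held);

	assert(toy_rcv_dump(true) == 0);		/* as in the patch */
	assert(genl_mutex.held);			/* re-taken on return */

	assert(toy_rcv_dump(false) == -EDEADLK);	/* lock not dropped */
	toy_unlock(&genl_mutex);
}
```

With cb_mutex aliased to genl_mutex, the dump path only ever takes one
lock, but the caller has to drop it first; the model does not cover the
ops unregistration race mentioned above.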