netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick McHardy <kaber@trash.net>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	netfilter-devel@vger.kernel.org,
	Stephen Hemminger <shemminger@vyatta.com>
Subject: Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
Date: Mon, 24 Jan 2011 13:46:37 +0100	[thread overview]
Message-ID: <4D3D74AD.5080300@trash.net> (raw)
In-Reply-To: <4D3D691F.3050403@netfilter.org>

On 24.01.2011 12:57, Pablo Neira Ayuso wrote:
> On 24/01/11 07:41, Eric Dumazet wrote:
>> Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
>>> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
>>> to protect the dump of the conntrack table according to reports from
>>> Stephen and acknowledgments on the issue from Eric.
>>
>> When a commit is referred, always give its title, and you can use a
>> short id (12 digits are OK)
> 
> I'm using the id that git log --oneline shows, is it OK?
> 
>> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
>> destroy)
>>
>>> However, Stephen removed the refcount bump in that patch that allows
>>> to keep a reference to the current ct object we are interating over.
>>> That code avoids race conditions between ct object destruction and
>>> the iteration itself. This patch reintroduces these lines since the
>>> ct object may vanish between two recvmgs() invocations.
>>>
>>> This patch fixes ocasional crashes while dumping the conntrack table
>>> intensively.
>>>
>>> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
>>
>> Thats not true, Stephen was not in CC on your mail
> 
> I'm sorry, I forgot to add --cc to stgit.
> 
>> I add him right now.
>>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>> ---
>>>  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
>>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>>> index 93297aa..519e245 100644
>>> --- a/net/netfilter/nf_conntrack_netlink.c
>>> +++ b/net/netfilter/nf_conntrack_netlink.c
>>> @@ -654,14 +654,16 @@ restart:
>>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>>  				continue;
>>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>>> +			if (!atomic_inc_not_zero(&ct->ct_general.use))
>>> +				continue;
>>>  			/* Dump entries of a given L3 protocol number.
>>>  			 * If it is not specified, ie. l3proto == 0,
>>>  			 * then dump everything. */
>>>  			if (l3proto && nf_ct_l3num(ct) != l3proto)
>>> -				continue;
>>> +				goto releasect;
>>>  			if (cb->args[1]) {
>>>  				if (ct != last)
>>> -					continue;
>>> +					goto releasect;
>>>  				cb->args[1] = 0;
>>>  			}
>>>  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
>>> @@ -679,6 +681,8 @@ restart:
>>>  				if (acct)
>>>  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
>>>  			}
>>> +releasect:
>>> +		nf_ct_put(ct);
>>>  		}
>>>  		if (cb->args[1]) {
>>>  			cb->args[1] = 0;
>>>
>>
>>
>> Hmm...
>>
>> Only the RCU lookup should need this extra check (atomic_inc_not_zero),
>> not a writer. (And a dumper is a "writer" since it holds nf_conntrack
>> spinlock)
>>
>> In Stephen commit, we switched from RCU lookup to traditional one
>> (spinlock protected), so, we should not need to touch individual objects
>> refcount ?
>>
>> I feel the right fix is not to increment refcount then decrement it.
>>
>> Only to _test_ it being zero, and not dumping the ct, eventually ?
>>
>>
>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>> index 93297aa..a977cc7 100644
>> --- a/net/netfilter/nf_conntrack_netlink.c
>> +++ b/net/netfilter/nf_conntrack_netlink.c
>> @@ -654,6 +654,8 @@ restart:
>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>  				continue;
>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>> +			if (!atomic_read(&ct->ct_general.use))
>> +				continue;
>>  			/* Dump entries of a given L3 protocol number.
>>  			 * If it is not specified, ie. l3proto == 0,
>>  			 * then dump everything. */
> 
> No, that won't work.
> 
> Sorry, I didn't explain the problem appropriately. We still do a
> nf_ct_put() for last ct. In that patch, you forgot to add the refcount
> bump that we need to keep the reference to the object.
> 
> The following patch attached is smaller, it also fixes the problem and
> we don't have to increase the refcount for each object. Now it looks
> similar to how it was before the RCU patches.

This looks correct to me, we need to keep a reference for continuations.
Eric, do you agree with this patch?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2011-01-24 12:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-23 23:16 [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy Pablo Neira Ayuso
2011-01-24  6:41 ` Eric Dumazet
2011-01-24  7:04   ` Eric Dumazet
2011-01-24 11:57   ` Pablo Neira Ayuso
2011-01-24 12:46     ` Patrick McHardy [this message]
2011-01-24 12:54       ` Eric Dumazet
2011-01-24 13:06         ` Pablo Neira Ayuso
2011-01-24 13:12           ` Eric Dumazet
2011-01-24 13:25             ` Pablo Neira Ayuso
2011-01-24 13:37               ` Patrick McHardy
2011-01-24 14:06                 ` Eric Dumazet
2011-01-24 17:53                   ` Patrick McHardy
2011-01-24 18:01                     ` Eric Dumazet
2011-01-24 18:16                       ` Patrick McHardy
2011-02-20 20:48 ` Stephen Hemminger
2011-02-20 20:51   ` [PATCH 1/2] netfilter: fix race in conntrack " Stephen Hemminger
2011-02-20 20:51   ` [PATCH 2/2] netfilter: ctnetlink: fix missing refcount increment during dumps Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D3D74AD.5080300@trash.net \
    --to=kaber@trash.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).