netfilter.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick McHardy <kaber@trash.net>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Joerg Marx <joerg.marx@secunet.com>,
	Netfilter Development Mailinglist
	<netfilter-devel@vger.kernel.org>,
	Mail List - Netfilter <netfilter@vger.kernel.org>
Subject: Re: [PATCH] nf_conntrack_core.c: fix for dead connection after flushing conntrack cache
Date: Mon, 10 May 2010 17:23:22 +0200	[thread overview]
Message-ID: <4BE824EA.6000303@trash.net> (raw)
In-Reply-To: <4BE73DE3.6080304@netfilter.org>

Pablo Neira Ayuso wrote:
>> we encountered a weird problem of a 'stalled' connection when using
>> 'conntrack -F' on a box with heavy network load. 'conntrack -L' gave us
>> sometimes a [UNREPLIED] entry for the traffic in question, but no traffic
>> flow, no matching of packets in or out, only the timer went down from 600
>> to 0 (it was ESP traffic - the default generic timeout = 600 seconds).
>> After the entry vanished by timeout (or doing a 'conntrack -F' once more),
>> all worked normal again.
>>
>> The reason we finally found, is a race window in 'nf_conntrack_confirm'
>> when calling '__nf_conntrack_confirm':
>>
>> In 'nf_conntrack_confirm' is checked (without holding a lock), if the entry
>> to be confirmed is possibly dying: !nf_ct_is_dying(ct).
>> If not, then __nf_conntrack_confirm will do some sanity checking, grab a
>> spin_lock_bh and insert the 'ct' into the lookup cache.
>>
>> Now consider the following scenario:
>>
>> 1. a connection has seen the first packet already -> state is UNREPLIED
>> 2. now the answer is to be sent, conntrack wants to confirm the connection
>> 3. the !nf_ct_is_dying(ct) check is passed, __nf_conntrack_confirm is just
>> started
>> 4. in a user context a 'conntrack -F' command is running right now e.g. on
>> another CPU
>> 5. this will flag all unconfirmed connections as 'dying' in
>> get_next_corpse(...), including the entry going to be confirmed!
>> 6. now the already 'dying' entry is included into the hash cache in
>> __nf_conntrack_confirm - BOOM!

Just for the record: the conntrack will be confirmed once the first
packet passed through all the hooks, so the incorrect hash insertion
will happen on the first packet, but the race condition you describe
is correct.

>> After this step the connection in question is dead, because no packets are
>> forwarded until the entry is purged from hash cache. This was a big blocker
>> for us, because each dead IPsec tunnel is a dead branch network for 10
>> minutes...
>>
>> For every packet from now on 'nf_conntrack_find_get' will ignore the entry,
>> because it is dying and because __nf_conntrack_confirm finds the hash in
>> the cache already, it will NF_DROP the packet.
>>
>> The key for finding this was 'NF_CT_STAT_INC(net, insert_failed)' in
>> __nf_conntrack_confirm.
>>
>> The suggested solution is to check for '!nf_ct_is_dying(ct)' again, _after_
>> the spin_lock_bh is grabbed in __nf_conntrack_confirm. So it is clear, that
>> no other softirq or user context can set that 'evil' dying flag ;-)
>> The return value in this case should be NF_ACCEPT, so we loose no packets
>> then, this is also important for us.

I think this should be fine since the race you describe only affects
unconfirmed conntracks, but it took me a while to realize that all
the other spots where the DYING bit is set are fine without holding
the conntrack lock.

Could you please add a comment to the check in __nf_conntrack_confirm()
stating that the dying check is supposed to prevent races against
nf_ct_get_next_corpse()? The semantic of the DYING bit is unfortunately
a bit overloaded.

Also, since the condition unconfirmed + dying in nf_conntrack_confirm()
is highly unlikely, I'd suggest to remove the dying check there and only
perform it in __nf_conntrack_confirm().

>> diff --git a/net/netfilter/nf_conntrack_core.c
>> b/net/netfilter/nf_conntrack_core.c
>> index 1374179..e2c8bfe 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -413,6 +413,11 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>>
>>  	spin_lock_bh(&nf_conntrack_lock);
>>
>> +	if (unlikely(nf_ct_is_dying(ct))) {
>> +		spin_unlock_bh(&nf_conntrack_lock);
>> +		return NF_ACCEPT;
>> +	}
>> +
>>  	/* See if there's one in the list already, including reverse:
>>  	   NAT could have grabbed it without realizing, since we're
>>  	   not in the hash.  If there is, we lost race. */
>> -- 1.5.6.5
> 
> I think this patch is fine. Patrick, would you apply it and pass it to
> -stable, please? Thanks!
> 
> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
> 


  reply	other threads:[~2010-05-10 15:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-07  8:45 [PATCH] nf_conntrack_core.c: fix for dead connection after flushing conntrack cache Joerg Marx
2010-05-09 22:57 ` Pablo Neira Ayuso
2010-05-10 15:23   ` Patrick McHardy [this message]
2010-05-10 17:08     ` Joerg Marx
2010-05-14 11:05       ` Patrick McHardy
2010-05-17  8:48         ` Joerg Marx
2010-05-20 13:56           ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BE824EA.6000303@trash.net \
    --to=kaber@trash.net \
    --cc=joerg.marx@secunet.com \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=netfilter@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).