Re: Netfilter Connection Tracking Race Condition in Kernel 2.4.x

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Patrick McHardy <kaber@trash.net>
To: Bob Halley <Bob.Halley@nominum.com>
Cc: netfilter-devel@lists.netfilter.org
Subject: Re: Netfilter Connection Tracking Race Condition in Kernel 2.4.x
Date: Tue, 25 Jul 2006 03:07:24 +0200	[thread overview]
Message-ID: <44C56ECC.3020107@trash.net> (raw)
In-Reply-To: <44C56653.10901@nominum.com>

Bob Halley wrote:
> This is bugzilla 495, resent to netfilter-devel by request.

Thanks.

> Background
> 
> Our application uses ip_queue in prerouting to divert DNS UDP packets
> to a userland daemon which inspects them and then issues a NF_ACCEPT
> or NF_DROP verdict back to the kernel.
> 
> We found that if several packets with the same conntrack tuple,
> i.e. the same src addr, src port, dst addr, and dst port, arrive very
> close together, then only the first one accepted by our software
> actually makes it back out to the wire; the others are silently
> dropped.
> 
> 
> Analysis
> 
> We instrumented the kernel to find out where the drop was occurring.
> The code doing the dropping was ip_refrag() in
> net/ipv4/netfilter/ip_conntrack_standalone.c, specifically:
> 
>       /* We've seen it coming out the other side: confirm */
>       if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
>               return NF_DROP;
> 
> The dropping is caused by a race between the first packet of a given
> tuple making it to confirmed state, and the arrival of another packet
> with the same tuple.  If a second packet arrives before the first is
> confirmed, it is assigned a new connection tracking context instead of
> joining that of the first unconfirmed packet.  When the second packet
> is finally handled by ip_refrag(), the call to ip_confim() finds that
> there is already a confirmed entry in the table, and returns NF_DROP.
> From the comments in __ip_contrack_confirm(), we infer that this is to
> deal with duplicated datagrams and some REJECT case, but it's the
> wrong thing in this case because the subsequent packets are neither
> duplicates nor REJECTs.
> 
> We were using RHEL 3 kernel 2.4.21-40 initially.  We looked at later
> 2.4.x kernels and found some promising looking changes, namely the
> addition of an unconfirmed list, in more recent 2.4.x kernels.  We built
> a 2.4.32 kernel and tested it, but the problem remained.  We looked into
> the nature of the unconfirmed list and discovered that it was solving a
> different problem, but could be a useful starting point for a fix.
> 
> 
> Fix
> 
> We decided to eliminate the race by having subsequent packets with the
> same conntrack tuple join the conntrack context of the first packet
> instead of creating a new conntrack context for each of them.  Here's
> the patch:
> 
> --- linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c.orig  
> 2005-04-03 18:42:20.000000000 -0700
> +++ linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c    2006-07-24
> 13:23:25.000000000 -0700
> @@ -777,6 +777,14 @@
>    /* look for tuple match */
>    h = ip_conntrack_find_get(&tuple, NULL);
>    if (!h) {
> +        READ_LOCK(&ip_conntrack_lock);
> +        h = LIST_FIND(&unconfirmed, conntrack_tuple_cmp,
> +                  struct ip_conntrack_tuple_hash *, &tuple, NULL);
> +        if (h)
> +            atomic_inc(&h->ctrack->ct_general.use);
> +        READ_UNLOCK(&ip_conntrack_lock);
> +    }
> +    if (!h) {
>        h = init_conntrack(&tuple, proto, skb);
>        if (!h)
>            return NULL;
> 
> This patch reliably ends the race, and we no longer have mysteriously
> disappearing packets.  Not being netfilter experts, we're not certain
> that this patch has no other side effects, and would appreciate any
> advice or alternative fixes that people who know more than we do have
> to offer.

The patch itself still contains a race, when a conntrack is confirmed
between the hash lookup and the unconfirmed lookup, it will still
create a new conntrack entry. About the general concept:

Using the unconfirmed lists looks like the only way to do this
besides changing the conntrack of the packet after noticing a
clash, but this would get tricky as the packets could already
cause state transitions or need sequence number adjustments.
Walking the unconfirmed list (which should usually only be very
small, but can grow large with queueing) for every new connection
sounds like a too large performance impact, so at least we need to
use a hash instead. Unfortunately its hard to specify a reasonable
size since it purely depends on userspace. The two other choices I
see are:

- confirm entries manually (using a target) or automatically
  before queueing packets. Not a good choice IMO since dropped
  connections will stay in the hash table until timeing out.

- change conntrack to always put connections in the hash immediately
  and remove them again if the connection is dropped before beeing
  confirmed.

The last option looks like the best to me for 2.6 (but could be
tricky to implement, haven't looked at that code in a while).
For 2.4 I think all these possibilities are too intrusive, you
probably need to maintain this patch yourself.

Any better suggestions?

next prev parent reply	other threads:[~2006-07-25  1:07 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-25  0:31 Netfilter Connection Tracking Race Condition in Kernel 2.4.x Bob Halley
2006-07-25  1:07 ` Patrick McHardy [this message]
2006-07-26  0:54   ` Phil Oester
2006-07-26  3:56     ` Patrick McHardy
2006-07-26  4:49       ` Yasuyuki KOZAKAI
2006-07-28 13:16   ` [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation Yasuyuki KOZAKAI
2006-07-31 11:15     ` Pablo Neira Ayuso
2006-08-04 14:43       ` Amin Azez
2006-08-08 10:19         ` Patrick McHardy
  -- strict thread matches above, loose matches on Subject: below --
2006-07-25 13:18 Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44C56ECC.3020107@trash.net \
    --to=kaber@trash.net \
    --cc=Bob.Halley@nominum.com \
    --cc=netfilter-devel@lists.netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.