netfilter-devel.vger.kernel.org archive mirror
From: Patrick McHardy <kaber@trash.net>
To: Tobias Klausmann <klausman@schwarzvogel.de>
Cc: netdev@vger.kernel.org,
	Netfilter Development Mailinglist
	<netfilter-devel@vger.kernel.org>
Subject: Re: Possible race condition in conntracking
Date: Tue, 27 Jan 2009 10:20:33 +0100
Message-ID: <497ED1E1.40304@trash.net>
In-Reply-To: <20090127075744.GA19875@eric.schwarzvogel.de>

[CCed netfilter-devel]

Tobias Klausmann wrote:
> Hi!
> 
> I'm resending this to netdev (sent it to linux-net yesterday)
> because I was told all the cool and relevant kids hang out here
> rather than there.
> 
> It seems I've stumbled across a bug in the way Netfilter handles
> packets. I have only been able to reproduce this with UDP, but it
> might also affect other IP protocols. This first bit me when
> updating from glibc 2.7 to 2.9.
> 
> Suppose a program calls getaddrinfo() to find the address of a
> given hostname. Usually, the glibc resolver asks the name server
> for both the A and AAAA records, gets two answers (addresses or
> NXDOMAIN) and happily continues on. What is new with glibc 2.9 is
> that it doesn't serialize the two requests in the same way as 2.7
> did. The older version will ask for the A record, wait for the
> answer, ask for the AAAA record, then wait for that answer. The
> newer lib will fire off both requests in quick succession (usually 5-20
> microseconds apart on the systems I tested with). Not only that,
> it also uses the same socket fd (and thus source port) for both
> requests.
> 
> Now if those packets traverse a Netfilter firewall, in the
> glibc-2.7 case, they will create two conntrack entries, allowing
> the answers back[0] and everything is peachy. In the glibc-2.9
> case, sometimes, the second packet gets lost[1]. After
> eliminating other causes (buggy checksum offloading, packet loss,
> busy firewall and/or DNS server and a host of others), I'm sure
> it's lost inside the firewall's Netfilter code. 
> 
> Using counting-only rules and building a dedicated setup with a
> minimal Netfilter rule set, we could watch the counters, finding
> two interesting facts for the failing case:
> 
> - The count in the NAT pre/postrouting chains is higher than for
>   the case where the requests work. This points to the second
>   packet being counted although it's part of the same connection
>   as the first.
>   
> - All other counters increase, up to and including
>   mangle/POSTROUTING. 
> 
> In essence, if you have N tries and one of them fails, you have
> 2N packets counted everywhere except the NAT chains, where it's
> N+1.
> 
> Since neither QoS nor tunneling is involved, the second packet
> appears to be dropped by Netfilter or the NIC's code. Since we see
> this behaviour on varying hardware, I'm rather sure it's the
> former.
> 
> The working hypothesis of what happens is this:
> 
> - The first packet enters Netfilter code, triggering a check if a
>   conntrack entry is relevant for it. Since there is no entry,
>   the packet creates a new conntrack that isn't yet in the global
>   hash of conntrack entries. Since the chains could modify the
>   packet's relevant info, the entry cannot be added to the hash
>   then and there (aka unconfirmed conntrack).
> 
> - The second packet enters Netfilter code. Again, no conntrack
>   entry is relevant since the first packet has not gotten to the
>   point where its conntrack would have been added to the global
>   hash, so the second packet gets an unconfirmed conntrack, too.
> 
> - The first packet reaches the point where the conntrack entry is
>   added to the global hash.
> 
> - The second packet reaches the same point but since it has the
>   same src/sport-dst/dport-proto tuple, its conntrack causes a
>   clash with the existing entry and both (packet and entry) are
>   discarded.

That sounds plausible, but we only discard the new conntrack
entry on clashes. The packet should be fine, unless you drop
INVALID packets in your ruleset.
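
The interleaving in the quoted hypothesis can be replayed with a small
toy model. This is only a sketch: ToyConntrackTable, run() and the
addresses and ports in TUP are made up for illustration and are not
kernel code. It models nothing beyond "look up on entry, insert on
confirmation" and says nothing about what happens to the packet itself.

```python
class ToyConntrackTable:
    """Toy stand-in for the global conntrack hash (hypothetical, not kernel code)."""

    def __init__(self):
        self.confirmed = set()   # tuples that already have a confirmed entry
        self.insert_failed = 0   # clashes seen at confirmation time

    def enter(self, tup):
        """Packet enters: True if it needs a new, unconfirmed entry."""
        return tup not in self.confirmed

    def confirm(self, tup, is_new):
        """Packet reaches the confirmation point; returns False on a clash."""
        if not is_new:
            return True          # packet matched an existing confirmed entry
        if tup in self.confirmed:
            self.insert_failed += 1
            return False         # the new, unconfirmed entry is discarded
        self.confirmed.add(tup)
        return True


def run(events):
    """Replay (kind, packet_id, tuple) events in order; return insert_failed."""
    table = ToyConntrackTable()
    pending = {}                 # packet_id -> whether it carries a new entry
    for kind, pkt, tup in events:
        if kind == "enter":
            pending[pkt] = table.enter(tup)
        else:
            table.confirm(tup, pending.pop(pkt))
    return table.insert_failed


# Made-up 5-tuple: src, sport, dst, dport, proto.
TUP = ("10.0.0.2", 40000, "10.0.0.53", 53, "udp")

# Second packet enters only after the first has been confirmed.
serialized = [("enter", 1, TUP), ("confirm", 1, TUP),
              ("enter", 2, TUP), ("confirm", 2, TUP)]

# Both packets enter before either one is confirmed.
interleaved = [("enter", 1, TUP), ("enter", 2, TUP),
               ("confirm", 1, TUP), ("confirm", 2, TUP)]

print(run(serialized))   # 0
print(run(interleaved))  # 1
```

Only the interleaved order produces a clash in this model, which is
consistent with the insert_failed counter moving in the failing case.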

> Since the timing is very critical on this, it only happens if an
> application (such as the glibc resolver of 2.9) fires two packets
> rapidly *and* those have the same 5-tuple *and* they are
> processed in parallel (e.g. on a multicore machine). 
> 
> Another observation is that this happens much less often with
> some kernels. While on one it can be triggered in about 50% of
> the cases, on another you can go for 20k rounds of two packets
> before the bug is triggered. Note, however, that the
> probabilities vary wildly: I've seen the program break on the
> first 100 packets a dozen times in a row and later not breaking
> for 50k tries in a row on the same kernel.
> 
> Since glibc 2.7 is using different ports and waiting for answers,
> it doesn't trigger this race. I guess there are very few
> applications where normal operations allow for a quickfire of the
> first two UDP packets in this manner. As a result, this has gone
> unnoticed for quite a while - and even if it happens, it may look
> like a fluke.
> 
> When looking at the conntrack stats, we also see that
> insert_failed in /proc/net/stat/nf_conntrack does indeed increase
> when the routing of the second packet fails.
> 
> The kernels used on the firewall (all vanilla versions):
> 2.6.25.16 
> 2.4.19pre1
> 2.6.28.1
> 
> All of them show this behaviour. On the clients, we only have
> 2.6-series kernels, but I doubt they influence this scenario
> (much).
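
For watching insert_failed between test runs, something like the
following may help. It is a sketch that assumes the file has one header
line of column names followed by one row of hexadecimal counters per
CPU; please check that against the kernel you run. The function name
and the SAMPLE text are made up for illustration.

```python
def total_insert_failed(stat_text):
    """Sum the insert_failed column over all per-CPU rows.

    Assumes a header line with column names and hexadecimal values below it.
    """
    lines = [line for line in stat_text.splitlines() if line.strip()]
    column = lines[0].split().index("insert_failed")
    return sum(int(row.split()[column], 16) for row in lines[1:])


# Made-up sample in the assumed layout (two CPUs), not real measurements.
SAMPLE = """\
entries  searched found new invalid ignore delete delete_list insert insert_failed drop early_drop
0000000a 00000000 00000000 00000010 00000000 00000000 00000000 00000000 00000010 00000002 00000000 00000000
0000000a 00000000 00000000 00000004 00000000 00000000 00000000 00000000 00000004 0000000b 00000000 00000000
"""

print(total_insert_failed(SAMPLE))  # 13 (0x2 + 0xb)

# On the firewall itself one would read the real file instead:
#   with open("/proc/net/stat/nf_conntrack") as f:
#       print(total_insert_failed(f.read()))
```

Taking the value before and after a batch of lookups and comparing the
difference with the number of failed lookups should show whether the
two really move together.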

Try tracing the packet using the TRACE target. That should show
whether it really disappears within netfilter and where.
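
To have something to trace without involving glibc, the traffic pattern
from the report (two UDP datagrams sent back to back from the same
socket, hence the same source port) can be approximated as below. This
is a sketch only: send_pair() and the payloads are placeholders, not
real DNS queries, and the address in the usage comment is made up.

```python
import socket


def send_pair(dest, payloads=(b"query-1", b"query-2")):
    """Send two datagrams back to back from one UDP socket.

    Returns the local source port shared by both datagrams.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect(dest)            # fixes the destination and local port
        for payload in payloads:
            sock.send(payload)        # no waiting between the two sends
        return sock.getsockname()[1]


# Example (hypothetical server address behind the firewall):
#   send_pair(("192.0.2.53", 53))
```

Running this in a loop from a client behind the firewall should produce
the same 5-tuple for both packets of each pair, which is the condition
the report says is needed to hit the race.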
