* Re: Possible race condition in conntracking
From: Patrick McHardy @ 2009-01-27 9:20 UTC
To: Tobias Klausmann; +Cc: netdev, Netfilter Development Mailinglist
[CCed netfilter-devel]
Tobias Klausmann wrote:
> Hi!
>
> I'm resending this to netdev (sent it to linux-net yesterday)
> because I was told all the cool and relevant kids hang out here
> rather than there.
>
> It seems I've stumbled across a bug in the way Netfilter handles
> packets. I have only been able to reproduce this with UDP, but it
> might also affect other IP protocols. This first bit me when
> updating from glibc 2.7 to 2.9.
>
> Suppose a program calls getaddrinfo() to find the address of a
> given hostname. Usually, the glibc resolver asks the name server
> for both the A and AAAA records, gets two answers (addresses or
> NXDOMAIN) and happily continues on. What is new with glibc 2.9 is
> that it doesn't serialize the two requests in the same way as 2.7
> did. The older version will ask for the A record, wait for the
> answer, ask for the AAAA record, then wait for that answer. The
> newer lib will fire off both requests in quick succession (usually
> 5-20 microseconds apart on the systems I tested with). Not only that,
> it also uses the same socket fd (and thus source port) for both
> requests.
>
> Now if those packets traverse a Netfilter firewall, in the
> glibc-2.7 case, they will create two conntrack entries, allowing
> the answers back[0] and everything is peachy. In the glibc-2.9
> case, sometimes, the second packet gets lost[1]. After
> eliminating other causes (buggy checksum offloading, packetloss,
> busy firewall and/or DNS server and a host of others), I'm sure
> it's lost inside the firewall's Netfilter code.
>
> Using counting-only rules in a dedicated setup with a minimal
> Netfilter rule set, we watched the counters and found two
> interesting facts about the failing case:
>
> - The count in the NAT pre/postrouting chains is higher than for
> the case where the requests work. This points to the second
> packet traversing the NAT chains (which normally only see the
> first packet of a connection) although it's part of the same
> connection as the first.
>
> - All other counters increase, up to and including
> mangle/POSTROUTING.
>
> In essence, if you have N tries and one of them fails, you have
> 2N packets counted everywhere except the NAT chains, where it's
> N+1.
>
> Since neither QoS nor tunneling is involved, the second packet
> appears to be dropped by Netfilter or the NIC's code. Since we see
> this behaviour on varying hardware, I'm rather sure it's the
> former.
>
> The working hypothesis of what happens is this:
>
> - The first packet enters Netfilter code, triggering a check if a
> conntrack entry is relevant for it. Since there is no entry,
> the packet creates a new conntrack that isn't yet in the global
> hash of conntrack entries. Since the chains could modify the
> packet's relevant info, the entry cannot be added to the hash
> then and there (aka unconfirmed conntrack).
>
> - The second packet enters Netfilter code. Again, no conntrack
> entry is relevant since the first packet has not gotten to the
> point where its conntrack would have been added to the global
> hash, so the second packet gets an unconfirmed conntrack, too.
>
> - The first packet reaches the point where the conntrack entry is
> added to the global hash.
>
> - The second packet reaches the same point but since it has the
> same src/sport-dst/dport-proto tuple, its conntrack causes a
> clash with the existing entry and both (packet and entry) are
> discarded.
That sounds plausible, but we only discard the new conntrack
entry on clashes. The packet should be fine, unless you drop
INVALID packets in your ruleset.
> Since the timing is very tight, it only happens if an
> application (such as the glibc resolver of 2.9) fires two packets
> rapidly *and* those have the same 5-tuple *and* they are
> processed in parallel (e.g. on a multicore machine).
>
> Another observation is that this happens much less often with
> some kernels. While on one it can be triggered in about 50% of
> the cases, on another you can go for 20k rounds of two packets
> before the bug is triggered. Note, however, that the
> probabilities vary wildly: I've seen the program break on the
> first 100 packets a dozen times in a row and later not breaking
> for 50k tries in a row on the same kernel.
>
> Since glibc 2.7 is using different ports and waiting for answers,
> it doesn't trigger this race. I guess there are very few
> applications whose normal operation fires off the first two UDP
> packets of a connection this quickly. As a result, this has gone
> unnoticed for quite a while - and even if it happens, it may look
> like a fluke.
>
> When looking at the conntrack stats, we also see that
> insert_failed in /proc/net/stat/nf_conntrack does indeed increase
> when the routing of the second packet fails.
>
> The kernels used on the firewall (all vanilla versions):
> 2.6.25.16
> 2.4.19pre1
> 2.6.28.1
>
> All of them show this behaviour. On the clients, we only have
> 2.6-series kernels, but I doubt they influence this scenario
> (much).
Try tracing the packet using the TRACE target. That should show
whether it really disappears within netfilter and where.
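
[For illustration, a minimal sketch of the kind of caller described in
the quoted report: with AF_UNSPEC hints, the glibc 2.9 resolver sends
the A and AAAA queries back-to-back from a single socket (and hence
with one 5-tuple), whereas glibc 2.7 serializes them on separate
ports. The hostname is a placeholder; running something like this in
a loop against a resolver behind the firewall exercises the race.]

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

int main(void)
{
    struct addrinfo hints, *res = NULL, *ai;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* ask for both A and AAAA */
    hints.ai_socktype = SOCK_DGRAM;

    /* With glibc 2.9 the two queries leave microseconds apart; if the
     * second reply is lost, this call stalls until the resolver times
     * out and retries. */
    int err = getaddrinfo("host.example.net", NULL, &hints, &res);
    if (err) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return 1;
    }

    for (ai = res; ai; ai = ai->ai_next)
        printf("got a result, family %d\n", ai->ai_family);

    freeaddrinfo(res);
    return 0;
}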
* Re: Possible race condition in conntracking
From: Tobias Klausmann @ 2009-01-27 13:06 UTC
To: netdev; +Cc: Netfilter Development Mailinglist
Hi!
(I've now subscribed to netdev@, so no more CCs to me are necessary).
On Tue, 27 Jan 2009, Patrick McHardy wrote:
> That sounds plausible, but we only discard the new conntrack
> entry on clashes. The packet should be fine, unless you drop
> INVALID packets in your ruleset.
The ruleset currently does not contain any rules regarding
INVALID. Consequently, we opted for the TRACE approach.
> Try tracing the packet using the TRACE target. That should show
> whether it really disappears within netfilter and where.
I've removed the irrelevant fields (TTL, PREC, etc.) and the syslog
timing info from the trace, after making sure nothing funky was
going on there.
Apart from the ID field, I ended up with two identical traces.
So, as far as rule-matching is concerned, the two packets are
handled identically. Whatever happens after this:
Jan 27 11:00:39 fw2 kernel: TRACE: nat:POSTROUTING:policy:3 IN=
OUT=eth2.188 SRC=194.97.7.116 DST=194.97.3.83 LEN=66 TOS=0x00
PREC=0x00 TTL=63 ID=46964 DF PROTO=UDP SPT=53452 DPT=53 LEN=46
is making this very packet go away. The policy of nat/POSTROUTING
is ACCEPT.
Presuming this:
http://xkr47.outerspace.dyndns.org/netfilter/packet_flow/packet_flow9.png
is accurate, I'm not sure what could drop the packet. We're not
using QoS or tunneling on the packet filter in question. This
happens on two different machines (the machines are of the same
type, but they have different NICs), so I doubt it's a hardware
or driver issue.
--
printk("Cool stuff's happening!\n")
linux-2.4.3/fs/jffs/intrep.c
* Re: Possible race condition in conntracking
From: Patrick McHardy @ 2009-01-27 13:14 UTC
To: Tobias Klausmann; +Cc: netdev, Netfilter Development Mailinglist
Tobias Klausmann wrote:
> Hi!
>
> (I've now subscribed to netdev@, so no more CCs to me are necessary).
>
> On Tue, 27 Jan 2009, Patrick McHardy wrote:
>> That sounds plausible, but we only discard the new conntrack
>> entry on clashes. The packet should be fine, unless you drop
>> INVALID packets in your ruleset.
>
> The ruleset currently does not contain any rules regarding
> INVALID. Consequently, we opted for the TRACE approach.
>
>> Try tracing the packet using the TRACE target. That should show
>> whether it really disappears within netfilter and where.
>
> I've removed the irrelevant fields (TTL, PREC, etc.) and the syslog
> timing info from the trace, after making sure nothing funky was
> going on there.
>
> Apart from the ID field, I ended up with two identical traces.
>
> So, as far as rule-matching is concerned, the two packets are
> handled identically. Whatever happens after this:
>
> Jan 27 11:00:39 fw2 kernel: TRACE: nat:POSTROUTING:policy:3 IN=
> OUT=eth2.188 SRC=194.97.7.116 DST=194.97.3.83 LEN=66 TOS=0x00
> PREC=0x00 TTL=63 ID=46964 DF PROTO=UDP SPT=53452 DPT=53 LEN=46
>
> is making this very packet go away. The policy of nat/POSTROUTING
> is ACCEPT.
This just means it passed through the last table/chain. The
only one following is conntrack confirmation.
Damn it :) I just noticed, we do indeed drop packets from
duplicate new connections in conntrack confirmation.
You should see the insert_failed conntrack counter show this
(/proc/net/stat/nf_conntrack).
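
[A toy, user-space model of the behaviour described here: conntrack
confirmation only inserts the new entry into the hash if no entry with
the same tuple is already there; on a clash the new entry is dropped
together with the packet carrying it, and insert_failed is bumped.
This is a schematic illustration, not the actual nf_conntrack_core.c
code; the tuple values are taken from the trace above.]

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct tuple { char src[32]; char dst[32]; int sport, dport, proto; };

#define HASH_SLOTS 16
static struct tuple hash_table[HASH_SLOTS];
static bool used[HASH_SLOTS];
static unsigned long insert_failed;

static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto && !strcmp(a->src, b->src) &&
           !strcmp(a->dst, b->dst);
}

/* Returns true if the packet passes (entry confirmed), false if it is
 * dropped because an identical tuple won the race. */
static bool confirm(const struct tuple *t)
{
    for (int i = 0; i < HASH_SLOTS; i++)
        if (used[i] && tuple_eq(&hash_table[i], t)) {
            insert_failed++;   /* the counter in /proc/net/stat/nf_conntrack */
            return false;      /* NF_DROP in the real confirmation path */
        }
    for (int i = 0; i < HASH_SLOTS; i++)
        if (!used[i]) {
            hash_table[i] = *t;
            used[i] = true;
            return true;       /* NF_ACCEPT */
        }
    return false;
}

int main(void)
{
    /* Two packets with the same 5-tuple, each having created its own
     * unconfirmed conntrack: the first confirms, the second is dropped. */
    struct tuple t = { "194.97.7.116", "194.97.3.83", 53452, 53, 17 };
    printf("first packet:  %s\n", confirm(&t) ? "accepted" : "dropped");
    printf("second packet: %s\n", confirm(&t) ? "accepted" : "dropped");
    printf("insert_failed = %lu\n", insert_failed);
    return 0;
}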
* Re: Possible race condition in conntracking
From: Tobias Klausmann @ 2009-01-27 13:28 UTC
To: netdev; +Cc: Netfilter Development Mailinglist
Hi!
On Tue, 27 Jan 2009, Patrick McHardy wrote:
> Tobias Klausmann wrote:
>> So, as far as rule-matching is concerned, the two packets are
>> handled identically. Whatever happens after this:
>> Jan 27 11:00:39 fw2 kernel: TRACE: nat:POSTROUTING:policy:3 IN=
>> OUT=eth2.188 SRC=194.97.7.116 DST=194.97.3.83 LEN=66 TOS=0x00
>> PREC=0x00 TTL=63 ID=46964 DF PROTO=UDP SPT=53452 DPT=53 LEN=46
>>
>> is making this very packet go away. The policy of nat/POSTROUTING
>> is ACCEPT.
>
> This just means it passed through the last table/chain. The
> only one following is conntrack confirmation.
>
> Damn it :) I just noticed, we do indeed drop packets from
> duplicate new connections in conntrack confirmation.
So the question remains what to do instead and how to do it. That
probably is deep Netfilter mojo, so I could only speculate wildly.
> You should see the insert_failed conntrack counter show this
> (/proc/net/stat/nf_conntrack).
We do, as I said in my first mail. Near as I can tell,
nf_conntrack_confirm() is the only function that ever increases
that counter, so it's definitely dropped there. As to how one
could handle it differently, I have to defer to people with more
Netfilter expertise. No point in "fixing" this by breaking other
stuff.
Regards,
Tobias
--
printk("Cool stuff's happening!\n")
linux-2.4.3/fs/jffs/intrep.c
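
[For watching that counter: /proc/net/stat/nf_conntrack has a header
line naming the fields and one line of hexadecimal values per CPU
below it. A small helper along these lines sums the insert_failed
column across CPUs; compare its output before and after a failing
lookup.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/net/stat/nf_conntrack", "r");
    if (!f) {
        perror("/proc/net/stat/nf_conntrack");
        return 1;
    }

    char header[1024], line[1024];
    if (!fgets(header, sizeof(header), f)) {
        fclose(f);
        return 1;
    }

    /* Find which column is insert_failed. */
    int col = -1, i = 0;
    for (char *tok = strtok(header, " \t\n"); tok;
         tok = strtok(NULL, " \t\n"), i++)
        if (!strcmp(tok, "insert_failed"))
            col = i;
    if (col < 0) {
        fprintf(stderr, "no insert_failed column found\n");
        fclose(f);
        return 1;
    }

    unsigned long long total = 0;
    while (fgets(line, sizeof(line), f)) {          /* one line per CPU */
        i = 0;
        for (char *tok = strtok(line, " \t\n"); tok;
             tok = strtok(NULL, " \t\n"), i++)
            if (i == col)
                total += strtoull(tok, NULL, 16);   /* values are hex */
    }
    fclose(f);

    printf("insert_failed (all CPUs): %llu\n", total);
    return 0;
}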
* Re: Possible race condition in conntracking
From: Patrick McHardy @ 2009-01-27 13:48 UTC
To: Tobias Klausmann; +Cc: netdev, Netfilter Development Mailinglist
Tobias Klausmann wrote:
> So the question remains what to do instead and how to do it. That
> probably is deep Netfilter mojo, so I could only speculate wildly.
>
>> You should see the insert_failed conntrack counter show this
>> (/proc/net/stat/nf_conntrack).
>
> We do, as I said in my first mail. Near as I can tell,
> nf_conntrack_confirm() is the only function that ever increases
> that counter, so it's definitely dropped there. As to how one
> could handle it differently, I have to defer to people with more
> Netfilter expertise. No point in "fixing" this by breaking other
> stuff.
Fixing this requires some rather intrusive changes. We need
to perform a lookup on the unconfirmed list when a conntrack
is not found in the hash and use the one we find there, if any.
The entries on that list are not reference counted and there
are a lot of assumptions in the code that an unconfirmed conntrack
is exclusively associated with a single packet. This needs to
be audited and fixed, but it looks quite hard.
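
[A schematic, user-space sketch of the approach outlined here, in the
same toy style as above and explicitly not kernel code: when the hash
lookup misses, the tuple is also looked up on the unconfirmed list,
and an entry found there is reused instead of allocating a second
conntrack for the same connection. Only the control flow is shown;
the reference-counting and one-packet-per-unconfirmed-conntrack
problems mentioned above are not modelled.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct tuple { char src[32]; char dst[32]; int sport, dport, proto; };

struct conntrack {
    struct tuple tuple;
    struct conntrack *next;        /* links entries on one of the lists */
};

static struct conntrack *hash_list;        /* confirmed entries */
static struct conntrack *unconfirmed;      /* created, not yet confirmed */

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto &&
           !strcmp(a->src, b->src) && !strcmp(a->dst, b->dst);
}

static struct conntrack *find(struct conntrack *head, const struct tuple *t)
{
    for (struct conntrack *ct = head; ct; ct = ct->next)
        if (tuple_eq(&ct->tuple, t))
            return ct;
    return NULL;
}

/* Pick the conntrack for an incoming packet. */
static struct conntrack *resolve(const struct tuple *t)
{
    struct conntrack *ct = find(hash_list, t);
    if (ct)
        return ct;                 /* already-confirmed connection */

    ct = find(unconfirmed, t);     /* the proposed additional lookup */
    if (ct)
        return ct;                 /* reuse the racing packet's entry */

    ct = calloc(1, sizeof(*ct));   /* genuinely new connection */
    if (!ct)
        return NULL;
    ct->tuple = *t;
    ct->next = unconfirmed;
    unconfirmed = ct;
    return ct;
}

int main(void)
{
    struct tuple t = { "194.97.7.116", "194.97.3.83", 53452, 53, 17 };

    /* Two racing packets with the same tuple now share one entry, so
     * the later confirmation step has nothing to clash with. */
    struct conntrack *a = resolve(&t);
    struct conntrack *b = resolve(&t);
    printf("same conntrack: %s\n", a && a == b ? "yes" : "no");
    return 0;
}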