* recent in_range fix uncovered another bug?
@ 2003-09-10 11:05 Karlis Peisenieks
2003-09-11 5:45 ` [netfilter-core] " Rusty Russell
0 siblings, 1 reply; 5+ messages in thread
From: Karlis Peisenieks @ 2003-09-10 11:05 UTC (permalink / raw)
To: coreteam; +Cc: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 2325 bytes --]
Hello!
I came across "lockup" in conntrack code using 2.4 kernel with bridge
netfilter (which is standart in 2.6 kernel) patch applied. Lockup can be
reproduced by pinging some host over bridge interface if the host is
unreachable in the beginning (e.g. ARP does not resolve) and then
becomes reachable (ARP resolves).
Here is what happens:
- echo request is conntrack-ed, new conntrack is created
- bridge netfilter code "sabotages" in post-routing hook, so it can
process the rest of hooks when actual outgoing interface is determined
- neigbour code takes over packet and holds it until ARP is resolved,
conntrack for packet is "hanging in the air"
- above repeats for _every_ echo request packet created
At this point ARP gets resolved
- neighbour code "flushes" its queue, sending packets one by one to
bridge interface
- bridge code determines outgoing interface and calls IP post-routing
hook
- nat code creates null-binding
- post-routing hook confirms conntrack, it gets linked in lists
- packet is queued for hw interface
- neighbour code sends second packet
- all goes as above up to creating of null-binding
- in ip_nat_setup_info loop that is calling get_unique_tuple loops
forever because:
- find_appropriate_src finds existing src-mainp (which is not doing
anything anyway) and decides to use it
- ip_conntrack_alter_reply fails because it finds conntrack looking
exactly the same as this (remember - that conntrack got confirmed when
sending first packet) and thinks that get_unique_tuple will return
better unique tuple.
So problem here are 2 equal conntrack-s, that should normaly be one, but
as confirmation of first got delayed (because of bridge interface
standing in the middle), second one got created.
It did not happen before in_range logic fix because every time
completely new manip was created (with different icmp id) which made
those conntracks different.
I hope the problem is more or less clear :). What should be fixed here?
I doubt it is bridge code (although this "sabotage" is evil - at least
in how it makes code hard to understand), because there can be other
cases when packet can be delayed (e.g. with ip_queue?) without
conntrack confirmation.
Quick and dirty fix to avoid lockup was to avoid ip_nat_setup_info
looping forever. Patch attached.
Karlis
[-- Attachment #2: nat.patch --]
[-- Type: text/plain, Size: 615 bytes --]
diff -u -r1.3.4.8 ip_nat_core.c
--- ip_nat_core.c 2 Sep 2003 10:21:18 -0000 1.3.4.8
+++ ip_nat_core.c 10 Sep 2003 09:22:50 -0000
@@ -516,6 +516,7 @@
struct ip_conntrack_tuple new_tuple, inv_tuple, reply;
struct ip_conntrack_tuple orig_tp;
struct ip_nat_info *info = &conntrack->nat.info;
+ int i;
MUST_BE_WRITE_LOCKED(&ip_nat_lock);
IP_NF_ASSERT(hooknum == NF_IP_PRE_ROUTING
@@ -557,7 +558,9 @@
}
#endif
+ i = 0;
do {
+ if (i++ == 3) return NF_DROP;
if (!get_unique_tuple(&new_tuple, &orig_tp, mr, conntrack,
hooknum)) {
DEBUGP("ip_nat_setup_info: Can't get unique for %p.\n",
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [netfilter-core] recent in_range fix uncovered another bug?
2003-09-10 11:05 recent in_range fix uncovered another bug? Karlis Peisenieks
@ 2003-09-11 5:45 ` Rusty Russell
2003-09-11 7:09 ` Karlis Peisenieks
0 siblings, 1 reply; 5+ messages in thread
From: Rusty Russell @ 2003-09-11 5:45 UTC (permalink / raw)
To: Karlis Peisenieks; +Cc: netfilter-devel
In message <20030910110555.GA30240@mt.lv> you write:
> Hello!
>
> I came across "lockup" in conntrack code using 2.4 kernel with bridge
> netfilter (which is standart in 2.6 kernel) patch applied. Lockup can be
> reproduced by pinging some host over bridge interface if the host is
> unreachable in the beginning (e.g. ARP does not resolve) and then
> becomes reachable (ARP resolves).
Hmm, I've had a patch around for this code for a while now. The code
is simply wrong. Does this work for you?
Thanks for your report!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Name: ip_conntrack_alter_reply thinko fix
Author: Rusty Russell
Status: Trivial
D: ip_conntrack_alter_reply checks that the reply isn't already taken,
D: but there's little point, since there's *still* a race after it is
D: called (which we handle at confirm time anyway).
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .18239-linux-2.6.0-test5-bk1/include/linux/netfilter_ipv4/ip_conntrack.h .18239-linux-2.6.0-test5-bk1.updated/include/linux/netfilter_ipv4/ip_conntrack.h
--- .18239-linux-2.6.0-test5-bk1/include/linux/netfilter_ipv4/ip_conntrack.h 2003-05-27 15:02:21.000000000 +1000
+++ .18239-linux-2.6.0-test5-bk1.updated/include/linux/netfilter_ipv4/ip_conntrack.h 2003-09-11 15:38:49.000000000 +1000
@@ -211,9 +211,8 @@ struct ip_conntrack
/* get master conntrack via master expectation */
#define master_ct(conntr) (conntr->master ? conntr->master->expectant : NULL)
-/* Alter reply tuple (maybe alter helper). If it's already taken,
- return 0 and don't do alteration. */
-extern int
+/* Alter reply tuple (maybe alter helper). */
+extern void
ip_conntrack_alter_reply(struct ip_conntrack *conntrack,
const struct ip_conntrack_tuple *newreply);
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .18239-linux-2.6.0-test5-bk1/net/ipv4/netfilter/ip_conntrack_core.c .18239-linux-2.6.0-test5-bk1.updated/net/ipv4/netfilter/ip_conntrack_core.c
--- .18239-linux-2.6.0-test5-bk1/net/ipv4/netfilter/ip_conntrack_core.c 2003-09-09 10:35:07.000000000 +1000
+++ .18239-linux-2.6.0-test5-bk1.updated/net/ipv4/netfilter/ip_conntrack_core.c 2003-09-11 15:39:14.000000000 +1000
@@ -1081,16 +1081,12 @@ int ip_conntrack_change_expect(struct ip
return ret;
}
-/* Alter reply tuple (maybe alter helper). If it's already taken,
- return 0 and don't do alteration. */
-int ip_conntrack_alter_reply(struct ip_conntrack *conntrack,
- const struct ip_conntrack_tuple *newreply)
+/* Alter reply tuple (maybe alter helper). This is for NAT, and is
+ implicitly racy: see __ip_conntrack_confirm */
+void ip_conntrack_alter_reply(struct ip_conntrack *conntrack,
+ const struct ip_conntrack_tuple *newreply)
{
WRITE_LOCK(&ip_conntrack_lock);
- if (__ip_conntrack_find(newreply, conntrack)) {
- WRITE_UNLOCK(&ip_conntrack_lock);
- return 0;
- }
/* Should be unconfirmed, so not in hash table yet */
IP_NF_ASSERT(!is_confirmed(conntrack));
@@ -1103,8 +1099,6 @@ int ip_conntrack_alter_reply(struct ip_c
struct ip_conntrack_helper *,
newreply);
WRITE_UNLOCK(&ip_conntrack_lock);
-
- return 1;
}
int ip_conntrack_helper_register(struct ip_conntrack_helper *me)
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .18239-linux-2.6.0-test5-bk1/net/ipv4/netfilter/ip_nat_core.c .18239-linux-2.6.0-test5-bk1.updated/net/ipv4/netfilter/ip_nat_core.c
--- .18239-linux-2.6.0-test5-bk1/net/ipv4/netfilter/ip_nat_core.c 2003-09-09 10:35:07.000000000 +1000
+++ .18239-linux-2.6.0-test5-bk1.updated/net/ipv4/netfilter/ip_nat_core.c 2003-09-11 15:38:49.000000000 +1000
@@ -557,38 +557,25 @@ ip_nat_setup_info(struct ip_conntrack *c
}
#endif
- do {
- if (!get_unique_tuple(&new_tuple, &orig_tp, mr, conntrack,
- hooknum)) {
- DEBUGP("ip_nat_setup_info: Can't get unique for %p.\n",
- conntrack);
- return NF_DROP;
- }
-
-#if 0
- DEBUGP("Hook %u (%s) %p\n", hooknum,
- HOOK2MANIP(hooknum)==IP_NAT_MANIP_SRC ? "SRC" : "DST",
+ if (!get_unique_tuple(&new_tuple, &orig_tp, mr, conntrack, hooknum)) {
+ DEBUGP("ip_nat_setup_info: Can't get unique for %p.\n",
conntrack);
- DEBUGP("Original: ");
- DUMP_TUPLE(&orig_tp);
- DEBUGP("New: ");
- DUMP_TUPLE(&new_tuple);
-#endif
+ return NF_DROP;
+ }
- /* We now have two tuples (SRCIP/SRCPT/DSTIP/DSTPT):
- the original (A/B/C/D') and the mangled one (E/F/G/H').
+ /* We now have two tuples (SRCIP/SRCPT/DSTIP/DSTPT):
+ the original (A/B/C/D') and the mangled one (E/F/G/H').
- We're only allowed to work with the SRC per-proto
- part, so we create inverses of both to start, then
- derive the other fields we need. */
+ We're only allowed to work with the SRC per-proto
+ part, so we create inverses of both to start, then
+ derive the other fields we need. */
- /* Reply connection: simply invert the new tuple
- (G/H/E/F') */
- invert_tuplepr(&reply, &new_tuple);
+ /* Reply connection: simply invert the new tuple
+ (G/H/E/F') */
+ invert_tuplepr(&reply, &new_tuple);
- /* Alter conntrack table so it recognizes replies.
- If fail this race (reply tuple now used), repeat. */
- } while (!ip_conntrack_alter_reply(conntrack, &reply));
+ /* Alter conntrack table so it recognizes replies. */
+ ip_conntrack_alter_reply(conntrack, &reply);
/* FIXME: We can simply used existing conntrack reply tuple
here --RR */
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [netfilter-core] recent in_range fix uncovered another bug?
2003-09-11 5:45 ` [netfilter-core] " Rusty Russell
@ 2003-09-11 7:09 ` Karlis Peisenieks
2003-09-11 7:15 ` Karlis Peisenieks
0 siblings, 1 reply; 5+ messages in thread
From: Karlis Peisenieks @ 2003-09-11 7:09 UTC (permalink / raw)
To: Rusty Russell; +Cc: netfilter-devel
On Thu, Sep 11, 2003 at 03:45:53PM +1000, Rusty Russell wrote:
> In message <20030910110555.GA30240@mt.lv> you write:
> > Hello!
> >
> > I came across "lockup" in conntrack code using 2.4 kernel with bridge
> > netfilter (which is standart in 2.6 kernel) patch applied. Lockup can be
> > reproduced by pinging some host over bridge interface if the host is
> > unreachable in the beginning (e.g. ARP does not resolve) and then
> > becomes reachable (ARP resolves).
>
> Hmm, I've had a patch around for this code for a while now. The code
> is simply wrong. Does this work for you?
This patch solves "lockup" problem, but what about tons of equal
conntracks?
I do not understand all logic beyond this, but perhaps tuples should get
linked in hashtable as they are created in prerouting/localout/whenever
and confirm would only increase refcount (and set appropriate bit in
status)? This way kfree_skb of packet without confirmation would throw
conntrack out anyway, but while packets are sitting somewhere in the
stack, all "retransmissions" would get "accounted" for correct
conntrack.
Think about SYN retransmission - connection can be long over, but the
duplicate (created because first syn was delayed) is still sitting
around.
Karlis
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [netfilter-core] recent in_range fix uncovered another bug?
2003-09-11 7:09 ` Karlis Peisenieks
@ 2003-09-11 7:15 ` Karlis Peisenieks
2003-09-11 7:50 ` Rusty Russell
0 siblings, 1 reply; 5+ messages in thread
From: Karlis Peisenieks @ 2003-09-11 7:15 UTC (permalink / raw)
To: Rusty Russell; +Cc: netfilter-devel
On Thu, Sep 11, 2003 at 10:09:56AM +0300, Karlis Peisenieks wrote:
> This patch solves "lockup" problem, but what about tons of equal
> conntracks?
Sorry, I overlooked how confirm works, now it is clear that two equal
conntracks can not be on lists. The only problem stays that potentially
innocent packets (e.g. in case of icmp echo request) get dropped.
Is it correct?
Karlis
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [netfilter-core] recent in_range fix uncovered another bug?
2003-09-11 7:15 ` Karlis Peisenieks
@ 2003-09-11 7:50 ` Rusty Russell
0 siblings, 0 replies; 5+ messages in thread
From: Rusty Russell @ 2003-09-11 7:50 UTC (permalink / raw)
To: Karlis Peisenieks; +Cc: netfilter-devel
In message <20030911071549.GA1045@mt.lv> you write:
> On Thu, Sep 11, 2003 at 10:09:56AM +0300, Karlis Peisenieks wrote:
>
> > This patch solves "lockup" problem, but what about tons of equal
> > conntracks?
>
> Sorry, I overlooked how confirm works, now it is clear that two equal
> conntracks can not be on lists. The only problem stays that potentially
> innocent packets (e.g. in case of icmp echo request) get dropped.
>
> Is it correct?
Yes, sometimes packets get lost. That's actually quite a nice
solution to the race conditon 8)
Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2003-09-11 7:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-09-10 11:05 recent in_range fix uncovered another bug? Karlis Peisenieks
2003-09-11 5:45 ` [netfilter-core] " Rusty Russell
2003-09-11 7:09 ` Karlis Peisenieks
2003-09-11 7:15 ` Karlis Peisenieks
2003-09-11 7:50 ` Rusty Russell
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.