* NAT && TIME_WAIT TCP connections
@ 2003-09-29 12:20 Kovacs Krisztian
2003-10-02 20:05 ` Harald Welte
0 siblings, 1 reply; 2+ messages in thread
From: Kovacs Krisztian @ 2003-09-29 12:20 UTC
To: Netfilter Devel
[-- Attachment #1: Type: text/plain, Size: 1680 bytes --]
Hi,
I have a proposal regarding connection tracking and NAT. Imagine a
scenario where you must SNAT TCP traffic not only to a specific IP range
but also to a specific port range. The extreme case is, of course, a
single IP with a single port, for example:
  iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -p tcp -j SNAT \
      --to-source 0.2.0.1:1234
However, such a setup caps the number of connections to which NAT can be
applied, and you must wait for existing connections to be deleted
(timeout, etc.) before another connection can be created. In the case of
TCP, when the SNAT range is a scarce resource, IP:port pairs could be
reused when the connection holding them is already in a 'half-dead' state
(for example, TCP's TIME_WAIT). The theory of operation is the following:
a protocol helper marks the conntrack entry MAY_BE_DELETED (IPS_MAY_DELETE
in the attached patch) if it thinks the entry is in a state where new
packets cannot be received. Then, if ip_nat_setup_info(), while trying to
allocate a new IP/port pair from the given range, finds a clashing
conntrack entry that has this flag, it deletes the old entry so the
allocation can succeed.
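
The allocation rule described above can be illustrated with a small
userspace sketch. This is not kernel code and not part of the patch: the
toy_* names, the fixed-size slot array, and the one-slot-per-port layout
are all assumptions made for illustration. The only idea taken from the
message is that the "is this port taken?" check evicts an entry flagged
as deletable instead of reporting a clash.

```c
/* Userspace illustration only; every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 8

struct toy_entry {
    bool in_use;      /* a tracked connection currently holds this port */
    bool may_delete;  /* set by the protocol helper, e.g. in TIME_WAIT */
};

struct toy_pool {
    uint16_t first_port;                     /* first port of the SNAT range */
    size_t nports;                           /* number of ports in the range */
    struct toy_entry slot[TOY_MAX_PORTS];    /* slot i tracks first_port + i */
};

/* Returns true if the port is unavailable. As a side effect, an entry
 * flagged may_delete is dropped and the port is reported as free. */
static bool toy_port_taken(struct toy_pool *pool, size_t i)
{
    assert(i < pool->nports);
    struct toy_entry *e = &pool->slot[i];

    if (e->in_use && e->may_delete) {
        e->in_use = false;
        e->may_delete = false;
    }
    return e->in_use;
}

/* Returns the allocated port, or -1 if the whole range is in use. */
static int toy_alloc_port(struct toy_pool *pool)
{
    for (size_t i = 0; i < pool->nports; i++) {
        if (!toy_port_taken(pool, i)) {
            pool->slot[i].in_use = true;
            return pool->first_port + (int)i;
        }
    }
    return -1;
}

/* What the protocol helper would do once no more packets are expected. */
static void toy_mark_deletable(struct toy_pool *pool, int port)
{
    assert(port >= pool->first_port);
    size_t i = (size_t)(port - pool->first_port);
    assert(i < pool->nports);
    pool->slot[i].may_delete = true;
}
```

With a range of exactly one port (as in the iptables example), a second
toy_alloc_port() call fails until toy_mark_deletable() has been called
for that port, after which the next allocation succeeds again.
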
While the example above may look a bit extreme, such problems occur
much more often when using the TPROXY patch and a transparent SQUID proxy.
The attached patch helped a lot in these cases (and after modifying
ip_conntrack_proto_tcp.c accordingly, to mark TPROXY-ed TCP connections
'deletable' when they reach the TIME_WAIT state).
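
The ip_conntrack_proto_tcp.c change mentioned here is not included in the
attached diff. The following is only a guess at its shape, written as a
self-contained userspace sketch: the toy_* types, the tproxied field, and
the toy_set_state() hook are hypothetical, and only the bit number (5)
and the TIME_WAIT trigger come from this message and its patch.

```c
/* Userspace illustration only; every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_tcp_state {
    TOY_TCP_ESTABLISHED,
    TOY_TCP_FIN_WAIT,
    TOY_TCP_TIME_WAIT,
    TOY_TCP_CLOSE
};

/* Same bit number as IPS_MAY_DELETE_BIT in the attached patch. */
#define TOY_MAY_DELETE_BIT 5
#define TOY_MAY_DELETE     (1UL << TOY_MAY_DELETE_BIT)

struct toy_conn {
    enum toy_tcp_state state;
    bool tproxied;           /* assumption: some way to tell TPROXY-ed flows */
    unsigned long status;    /* bit flags, like the conntrack status word */
};

/* Record a state transition; flag TPROXY-ed connections as deletable
 * once they reach TIME_WAIT. */
static void toy_set_state(struct toy_conn *c, enum toy_tcp_state new_state)
{
    assert(c != NULL);
    c->state = new_state;
    if (new_state == TOY_TCP_TIME_WAIT && c->tproxied)
        c->status |= TOY_MAY_DELETE;
}
```

The sketch never clears the bit again if the state changes later; whether
the real modification does so is not stated in the message.
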
Any comments? (I don't like the idea of deleting conntrack entries in
ip_nat_setup_info(), however, I don't have a better idea.)
--
Regards,
Krisztian KOVACS
[-- Attachment #2: nat-delete-conntrack.diff --]
[-- Type: text/plain, Size: 3689 bytes --]
diff -urN linux-2.4.22-orig/include/linux/netfilter_ipv4/ip_conntrack.h linux-2.4.22/include/linux/netfilter_ipv4/ip_conntrack.h
--- linux-2.4.22-orig/include/linux/netfilter_ipv4/ip_conntrack.h Fri Jun 13 16:51:38 2003
+++ linux-2.4.22/include/linux/netfilter_ipv4/ip_conntrack.h Mon Sep 29 11:43:55 2003
@@ -46,6 +46,10 @@
/* Connection is confirmed: originating packet has left box */
IPS_CONFIRMED_BIT = 3,
IPS_CONFIRMED = (1 << IPS_CONFIRMED_BIT),
+
+ /* May delete conntrack if its tuple is needed for NAT */
+ IPS_MAY_DELETE_BIT = 5,
+ IPS_MAY_DELETE = (1 << IPS_MAY_DELETE_BIT),
};
#include <linux/netfilter_ipv4/ip_conntrack_tcp.h>
@@ -219,7 +223,7 @@
/* Is this tuple taken? (ignoring any belonging to the given
conntrack). */
-extern int
+extern struct ip_conntrack_tuple_hash *
ip_conntrack_tuple_taken(const struct ip_conntrack_tuple *tuple,
const struct ip_conntrack *ignored_conntrack);
diff -urN linux-2.4.22-orig/net/ipv4/netfilter/ip_conntrack_core.c linux-2.4.22/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.4.22-orig/net/ipv4/netfilter/ip_conntrack_core.c Mon Aug 25 13:44:44 2003
+++ linux-2.4.22/net/ipv4/netfilter/ip_conntrack_core.c Mon Sep 29 11:43:00 2003
@@ -479,7 +479,7 @@
/* Returns true if a connection correspondings to the tuple (required
for NAT). */
-int
+struct ip_conntrack_tuple_hash *
ip_conntrack_tuple_taken(const struct ip_conntrack_tuple *tuple,
const struct ip_conntrack *ignored_conntrack)
{
@@ -489,7 +489,7 @@
h = __ip_conntrack_find(tuple, ignored_conntrack);
READ_UNLOCK(&ip_conntrack_lock);
- return h != NULL;
+ return h;
}
/* Returns conntrack if it dealt with ICMP, and filled in skb fields */
diff -urN linux-2.4.22-orig/net/ipv4/netfilter/ip_nat_core.c linux-2.4.22/net/ipv4/netfilter/ip_nat_core.c
--- linux-2.4.22-orig/net/ipv4/netfilter/ip_nat_core.c Mon Aug 25 13:44:44 2003
+++ linux-2.4.22/net/ipv4/netfilter/ip_nat_core.c Mon Sep 29 11:53:53 2003
@@ -92,6 +92,35 @@
WRITE_UNLOCK(&ip_nat_lock);
}
+static void __ip_nat_cleanup_conntrack(struct ip_conntrack *conn)
+{
+ struct ip_nat_info *info = &conn->nat.info;
+
+ if (!info->initialized)
+ return;
+
+ IP_NF_ASSERT(info->bysource.conntrack);
+ IP_NF_ASSERT(info->byipsproto.conntrack);
+
+ MUST_BE_WRITE_LOCKED(&ip_nat_lock);
+
+ LIST_DELETE(&bysource[hash_by_src(&conn->tuplehash[IP_CT_DIR_ORIGINAL]
+ .tuple.src,
+ conn->tuplehash[IP_CT_DIR_ORIGINAL]
+ .tuple.dst.protonum)],
+ &info->bysource);
+
+ LIST_DELETE(&byipsproto
+ [hash_by_ipsproto(conn->tuplehash[IP_CT_DIR_REPLY]
+ .tuple.src.ip,
+ conn->tuplehash[IP_CT_DIR_REPLY]
+ .tuple.dst.ip,
+ conn->tuplehash[IP_CT_DIR_REPLY]
+ .tuple.dst.protonum)],
+ &info->byipsproto);
+}
+
+
/* We do checksum mangling, so if they were wrong before they're still
* wrong. Also works for incomplete packets (eg. ICMP dest
* unreachables.) */
@@ -131,9 +160,21 @@
We could keep a separate hash if this proves too slow. */
struct ip_conntrack_tuple reply;
+ struct ip_conntrack_tuple_hash *h;
invert_tuplepr(&reply, tuple);
- return ip_conntrack_tuple_taken(&reply, ignored_conntrack);
+ h = ip_conntrack_tuple_taken(&reply, ignored_conntrack);
+
+ if ((h != NULL) && test_bit(IPS_MAY_DELETE_BIT, &h->ctrack->status)) {
+ DEBUGP(KERN_DEBUG "Deleting old conntrack entry for NAT\n");
+ __ip_nat_cleanup_conntrack(h->ctrack);
+ h->ctrack->nat.info.initialized = 0;
+ if (del_timer(&h->ctrack->timeout))
+ h->ctrack->timeout.function((unsigned long)h->ctrack);
+ h = NULL;
+ }
+
+ return h != NULL;
}
/* Does tuple + the source manip come within the range mr */
* Re: NAT && TIME_WAIT TCP connections
2003-09-29 12:20 NAT && TIME_WAIT TCP connections Kovacs Krisztian
@ 2003-10-02 20:05 ` Harald Welte
0 siblings, 0 replies; 2+ messages in thread
From: Harald Welte @ 2003-10-02 20:05 UTC
To: Kovacs Krisztian; +Cc: Netfilter Devel
[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]
On Mon, Sep 29, 2003 at 02:20:43PM +0200, Kovacs Krisztian wrote:
> While the example above may look a bit extreme, such problems occur
> much more often when using the TPROXY patch and a transparent SQUID proxy.
> The attached patch helped a lot in these cases (and after modifying
> ip_conntrack_proto_tcp.c accordingly, to mark TPROXY-ed TCP connections
> 'deletable' when they reach the TIME_WAIT state).
>
> Any comments? (I don't like the idea of deleting conntrack entries in
> ip_nat_setup_info(), however, I don't have a better idea.)
It's questionable whether we should optimize for minimization of tuple
allocations at the cost of run time performance :(
I'm really not sure if it's worth the effort.
> Regards,
> Krisztian KOVACS
--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]