Re: TCP connection tracking timeout


* Re: TCP connection tracking timeout
From: Patrick McHardy @ 2008-07-29  4:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Netfilter Developer Mailing List

[List address fixed - I assume netfilter-devel@lists.debian.org doesn't 
exist :)]

Herbert Xu wrote:
> Hi:
>
> I've recently started keeping an eye on the number of connections
> in my router's conntrack table.  It was sad to see so many TCP
> connections that have died long ago still lingering in it.  We all
> know that wandering ghosts are bad :)
>
> Here's my proposal to lay them to rest once and for all.  The
> obvious solution is to reduce the timeout.  However, that runs
> afoul of idle connections.  So the key is how do we tell an
> idle connection apart from a dead one.
>
> Actually it isn't too hard.  The most common reason for a connection
> to die without sending FIN/RST is a retransmission timeout.  For
> example in Linux we can enter FIN_WAIT_1 without even transmitting
> the actual FIN because of outstanding data before it.  So if we
> tracked whether each connection has unacknowledged data then we
> will be able to easily distinguish them.  In other words, we can
> drastically lower the timeout on a connection with data outstanding.
>
> The only trouble now is to find a sucker^H^H^H^H^H^Hvolunteer
> to implement this :)
>   


That sounds like a pretty neat idea. I'm testing a patch now, I'll
send it over in a few minutes if it survives :)



* Re: TCP connection tracking timeout
From: Patrick McHardy @ 2008-07-29  5:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Netfilter Developer Mailing List

[-- Attachment #1: Type: text/plain, Size: 1563 bytes --]

Patrick McHardy wrote:
> [List address fixed - I assume netfilter-devel@lists.debian.org 
> doesn't exist :)]
>
> Herbert Xu wrote:
>> Hi:
>>
>> I've recently started keeping an eye on the number of connections
>> in my router's conntrack table.  It was sad to see so many TCP
>> connections that have died long ago still lingering in it.  We all
>> know that wandering ghosts are bad :)
>>
>> Here's my proposal to lay them to rest once and for all.  The
>> obvious solution is to reduce the timeout.  However, that runs
>> afoul of idle connections.  So the key is how do we tell an
>> idle connection apart from a dead one.
>>
>> Actually it isn't too hard.  The most common reason for a connection
>> to die without sending FIN/RST is a retransmission timeout.  For
>> example in Linux we can enter FIN_WAIT_1 without even transmitting
>> the actual FIN because of outstanding data before it.  So if we
>> tracked whether each connection has unacknowledged data then we
>> will be able to easily distinguish them.  In other words, we can
>> drastically lower the timeout on a connection with data outstanding.
>>
>> The only trouble now is to find a sucker^H^H^H^H^H^Hvolunteer
>> to implement this :)
>>   
>
>
> That sounds like a pretty neat idea. I'm testing a patch now, I'll
> send it over in a few minutes if it survives :)

This seems to work. I'm wondering however if this will really help.
We already track retransmissions and decrease the timeout on the
3rd retransmission, so this should only help if both the sender and
the receiver went down.



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 3254 bytes --]

diff --git a/include/linux/netfilter/nf_conntrack_tcp.h b/include/linux/netfilter/nf_conntrack_tcp.h
index 22ce299..a049df4 100644
--- a/include/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/linux/netfilter/nf_conntrack_tcp.h
@@ -30,6 +30,9 @@ enum tcp_conntrack {
 /* Be liberal in window checking */
 #define IP_CT_TCP_FLAG_BE_LIBERAL		0x08
 
+/* Has unacknowledged data */
+#define IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED	0x10
+
 struct nf_ct_tcp_flags {
 	u_int8_t flags;
 	u_int8_t mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 420a10d..6f61261 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -67,7 +67,8 @@ static const char *const tcp_conntrack_names[] = {
 /* RFC1122 says the R2 limit should be at least 100 seconds.
    Linux uses 15 packets as limit, which corresponds
    to ~13-30min depending on RTO. */
-static unsigned int nf_ct_tcp_timeout_max_retrans __read_mostly =   5 MINS;
+static unsigned int nf_ct_tcp_timeout_max_retrans __read_mostly    =   5 MINS;
+static unsigned int nf_ct_tcp_timeout_unacknowledged __read_mostly =   5 MINS;
 
 static unsigned int tcp_timeouts[TCP_CONNTRACK_MAX] __read_mostly = {
 	[TCP_CONNTRACK_SYN_SENT]	= 2 MINS,
@@ -625,8 +626,10 @@ static bool tcp_in_window(const struct nf_conn *ct,
 		swin = win + (sack - ack);
 		if (sender->td_maxwin < swin)
 			sender->td_maxwin = swin;
-		if (after(end, sender->td_end))
+		if (after(end, sender->td_end)) {
 			sender->td_end = end;
+			sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
+		}
 		/*
 		 * Update receiver data.
 		 */
@@ -637,6 +640,8 @@ static bool tcp_in_window(const struct nf_conn *ct,
 			if (win == 0)
 				receiver->td_maxend++;
 		}
+		if (ack == receiver->td_end)
+			receiver->flags &= ~IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
 
 		/*
 		 * Check retransmissions.
@@ -951,9 +956,16 @@ static int tcp_packet(struct nf_conn *ct,
 	if (old_state != new_state
 	    && new_state == TCP_CONNTRACK_FIN_WAIT)
 		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
-	timeout = ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans
-		  && tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans
-		  ? nf_ct_tcp_timeout_max_retrans : tcp_timeouts[new_state];
+
+	if (ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans &&
+	    tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans)
+		timeout = nf_ct_tcp_timeout_max_retrans;
+	else if ((ct->proto.tcp.seen[0].flags | ct->proto.tcp.seen[1].flags) &
+		 IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED &&
+		 tcp_timeouts[new_state] > nf_ct_tcp_timeout_unacknowledged)
+		timeout = nf_ct_tcp_timeout_unacknowledged;
+	else
+		timeout = tcp_timeouts[new_state];
 	write_unlock_bh(&tcp_lock);
 
 	nf_conntrack_event_cache(IPCT_PROTOINFO_VOLATILE, skb);
@@ -1236,6 +1248,13 @@ static struct ctl_table tcp_sysctl_table[] = {
 		.proc_handler	= &proc_dointvec_jiffies,
 	},
 	{
+		.procname	= "nf_conntrack_tcp_timeout_unacknowledged",
+		.data		= &nf_ct_tcp_timeout_unacknowledged,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_jiffies,
+	},
+	{
 		.ctl_name	= NET_NF_CONNTRACK_TCP_LOOSE,
 		.procname	= "nf_conntrack_tcp_loose",
 		.data		= &nf_ct_tcp_loose,


* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-07-29  6:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Developer Mailing List

On Tue, Jul 29, 2008 at 07:00:46AM +0200, Patrick McHardy wrote:
> Patrick McHardy wrote:
> >[List address fixed - I assume netfilter-devel@lists.debian.org 
> >doesn't exist :)]

Heh my fingers were doing the thinking :)

> >That sounds like a pretty neat idea. I'm testing a patch now, I'll
> >send it over in a few minutes if it survives :)

Awesome! I'll try this out on my router.

> This seems to work. I'm wondering however if this will really help.
> We already track retransmissions and decrease the timeout on the
> 3rd retransmission, so this should only help if both the sender and
> the receiver went down.

The problem with requiring a 3rd retransmit is that once a socket
is orphaned (closed) on Linux, we fast-track the retransmit timeout
process.  In particular, if it's already maxed out the RTO prior
to closing, we'll kill it straight away.

This is in violation of the RFCs but it's an important optimisation.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: TCP connection tracking timeout
From: Patrick McHardy @ 2008-07-29  6:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Netfilter Developer Mailing List, Jozsef Kadlecsik

Herbert Xu wrote:
> On Tue, Jul 29, 2008 at 07:00:46AM +0200, Patrick McHardy wrote:
>>> That sounds like a pretty neat idea. I'm testing a patch now, I'll
>>> send it over in a few minutes if it survives :)
>>>       
>
> Awesome! I'll try this out on my router.
>
>   
>> This seems to work. I'm wondering however if this will really help.
>> We already track retransmissions and decrease the timeout on the
>> 3rd retransmission, so this should only help if both the sender and
>> the receiver went down.
>>     
>
> The problem with requiring a 3rd retransmit is that once a socket
> is orphaned (closed) on Linux, we fast-track the retransmit timeout
> process.  In particular, if it's already maxed out the RTO prior
> to closing, we'll kill it straight away.
>
> This is in violation of the RFCs but it's an important optimisation.

Thanks for this explanation. Unless Jozsef sees something wrong with this
patch, I'll queue it with a proper changelog. It's small enough, so perhaps
we can even put this in 2.6.27.




* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-07-29  9:07 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Developer Mailing List, Jozsef Kadlecsik

On Tue, Jul 29, 2008 at 08:13:05AM +0200, Patrick McHardy wrote:
> 
> Thanks for this explanation. Unless Jozsef sees something wrong with this
> patch, I'll queue it with a proper changelog. It's small enough, so perhaps
> we can even put this in 2.6.27.

It killed 95% of the ghosts (if you can do such a thing :)

I've tried to determine the cause of the remaining spirits:

1) Zero-window receivers.  This is effectively the same as not acking,
without actually withholding the ack.  So we should treat zero windows
in the same way as an unacknowledged packet (with each probe/zero-window
extending its lifetime).

2) We've got a single state for each connection.  This doesn't
work because each TCP connection really consists of two disparate
streams.  Their states need to be tracked separately.

Otherwise we can for example enter TIME_WAIT when only one direction
has been shut down.  This then causes the tracked connection to be
pruned quickly, after which the other direction may reinstall
it in the ESTABLISHED state with a pure ack and it could remain
forever should that direction then die.

Thanks,

* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-07-29  9:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Developer Mailing List, Jozsef Kadlecsik

On Tue, Jul 29, 2008 at 05:07:37PM +0800, Herbert Xu wrote:
>
> 1) Zero-window receivers.  This is effectively the same as not acking,
> without actually withholding the ack.  So we should treat zero windows
> in the same way as an unacknowledged packet (with each probe/zero-window
> extending its lifetime).

Oh and this should probably only be done after we detect the
first zero probe heading the other way and coming back with
a second zero window (just in case the zero window happens to
match the completion of data on the sending side).

Cheers,

* Re: TCP connection tracking timeout
From: Jozsef Kadlecsik @ 2008-07-29 12:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Netfilter Developer Mailing List

Hi,

On Tue, 29 Jul 2008, Herbert Xu wrote:

> On Tue, Jul 29, 2008 at 08:13:05AM +0200, Patrick McHardy wrote:
> > 
> > Thanks for this explanation. Unless Jozsef sees something wrong with this
> > patch, I'll queue it with a proper changelog. It's small enough, so perhaps
> > we can even put this in 2.6.27.
> 
> It killed 95% of the ghosts (if you can do such a thing :)
> 
> I've tried to determine the cause of the remaining spirits:
> 
> 1) Zero-window receivers.  This is effectively the same as not acking,
> without actually withholding the ack.  So we should treat zero windows
> in the same way as an unacknowledged packet (with each probe/zero-window
> extending its lifetime).
> 
> 2) We've got a single state for each connection.  This doesn't
> work because each TCP connection really consists of two disparate
> streams.  Their states need to be tracked separately.
> 
> Otherwise we can for example enter TIME_WAIT when only one direction
> has been shut down.  This then causes the tracked connection to be
> pruned quickly, after which the other direction may reinstall
> it in the ESTABLISHED state with a pure ack and it could remain
> forever should that direction then die.

We always approximate the state of the sender from the packets it sends. As 
packets can be lost in transit (i.e. we see a packet, but it is lost after 
that), we cannot always know the actual state of *both* parties. Therefore 
we do not even attempt to track both states.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-07-29 13:34 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Patrick McHardy, Netfilter Developer Mailing List

On Tue, Jul 29, 2008 at 02:30:14PM +0200, Jozsef Kadlecsik wrote:
>
> We always approximate the state of the sender from the packets it sends. As 
> packets can be lost in transit (i.e. we see a packet, but it is lost after 
> that), we cannot always know the actual state of *both* parties. Therefore 
> we do not even attempt to track both states.

No I'm not suggesting that your state must be equal to that of
each side, what I'm saying is that the TCP state machine is
fundamentally divided into two directions.  If you're going
to track state at all you need to track both directions separately.
Otherwise the state transitions are simply bogus.

For example, TIME_WAIT is a state that only makes sense if
you look at a given direction.  The other direction may well
still be ESTABLISHED.  As it is netfilter will lower the timeout
when only a single direction has been shut down, thus causing
the connection to be prematurely killed.

Cheers,

* Re: TCP connection tracking timeout
From: Jozsef Kadlecsik @ 2008-07-29 20:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Netfilter Developer Mailing List

On Tue, 29 Jul 2008, Herbert Xu wrote:

> On Tue, Jul 29, 2008 at 02:30:14PM +0200, Jozsef Kadlecsik wrote:
> >
> > We always approximate the state of the sender from the packets it sends. As 
> > packets can be lost in transit (i.e. we see a packet, but it is lost after 
> > that), we cannot always know the actual state of *both* parties. Therefore 
> > we do not even attempt to track both states.
> 
> No I'm not suggesting that your state must be equal to that of each 
> side, what I'm saying is that the TCP state machine is fundamentally 
> divided into two directions. If you're going to track state at all you 
> need to track both directions separately. Otherwise the state 
> transitions are simply bogus.

Completely true - but the states in netfilter do not correspond one-to-one 
to the real TCP states. From nf_conntrack_proto_tcp.c:

 * The meaning of the states are:
 *
 * NONE:        initial state
 * SYN_SENT:    SYN-only packet seen
 * SYN_RECV:    SYN-ACK packet seen
 * ESTABLISHED: ACK packet seen
 * FIN_WAIT:    FIN packet seen
 * CLOSE_WAIT:  ACK seen (after FIN)
 * LAST_ACK:    FIN seen (after FIN)
 * TIME_WAIT:   last ACK seen
 * CLOSE:       closed connection

We try to get a general view of the connection. It is not rocket science 
what we are doing.

> For example, TIME_WAIT is a state that only makes sense if
> you look at a given direction.  The other direction may well
> still be ESTABLISHED.  As it is netfilter will lower the timeout
> when only a single direction has been shut down, thus causing
> the connection to be prematurely killed.

Hm, I might be completely out of date: how can one direction be in the 
TIME_WAIT state while the other is ESTABLISHED? If one side is in the 
TIME_WAIT state, the other one cannot be in the ESTABLISHED state - at 
least according to RFC793 and RFC1122. What am I missing here?

Best regards,
Jozsef

* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-07-30  1:35 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Patrick McHardy, Netfilter Developer Mailing List

On Tue, Jul 29, 2008 at 10:34:51PM +0200, Jozsef Kadlecsik wrote:
>
> > For example, TIME_WAIT is a state that only makes sense if
> > you look at a given direction.  The other direction may well
> > still be ESTABLISHED.  As it is netfilter will lower the timeout
> > when only a single direction has been shut down, thus causing
> > the connection to be prematurely killed.
> 
> Hm, I might be completely out of date: how can one direction be in the 
> TIME_WAIT state while the other is ESTABLISHED? If one side is in the 
> TIME_WAIT state, the other one cannot be in the ESTABLISHED state - at 
> least according to RFC793 and RFC1122. What am I missing here?

OK, that was a stupid example.  All I'm trying to say is that the
state transition as it stands is bogus.  Here is why:

If you see a FIN in one direction but no ack in the other,
you go from sES to sFW.  If that isn't acked and a FIN retransmit
occurs you go from sFW to sLA.  Now if the other direction is still
transmitting and you get an ACK in the direction being shut down,
it goes from sLA to sTW.  However, in reality the side that's still
transmitting may still be in ESTABLISHED because it never received
those FINs.

Cheers,

* Re: TCP connection tracking timeout
From: Jozsef Kadlecsik @ 2008-07-30 21:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Netfilter Developer Mailing List

Hi,

On Wed, 30 Jul 2008, Herbert Xu wrote:

> All I'm trying to say is that the state transition as it stands is bogus.
> Here is why:
> 
> If you see a FIN in one direction but no ack in the other,
> you go from sES to sFW.  If that isn't acked and a FIN retransmit
> occurs you go from sFW to sLA.  Now if the other direction is still
> transmitting and you get an ACK in the direction being shut down,
> it goes from sLA to sTW.  However, in reality the side that's still
> transmitting may still be in ESTABLISHED because it never received
> those FINs.

Yes, absolutely right. From the comments of the state transition table:

 *      sFW -> sLA      FIN seen in both directions, waiting for
 *                      the last ACK.
 *                      Might be a retransmitted FIN as well...

As we already handle a lot of exceptions, what do you think about the 
following (untested) patch?

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 6f61261..a9b3b8f 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -911,6 +911,15 @@ static int tcp_packet(struct nf_conn *ct,
 			nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
 				  "nf_ct_tcp: invalid state ");
 		return -NF_ACCEPT;
+	case TCP_CONNTRACK_LAST_ACK:
+	case TCP_CONNTRACK_CLOSE_WAIT:
+		/* Check and compensate retransmitted FIN or
+		 * reordered ACK packets */
+		if (old_state == TCP_CONNTRACK_FIN_WAIT
+		    && (ct->proto.tcp.seen[dir].flags
+		        & IP_CT_TCP_FLAG_CLOSE_INIT))
+		        new_state = TCP_CONNTRACK_FIN_WAIT;
+		break;
 	case TCP_CONNTRACK_CLOSE:
 		if (index == TCP_RST_SET
 		    && ((test_bit(IPS_SEEN_REPLY_BIT, &ct->status)


Best regards,
Jozsef

* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-07-31 12:19 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Patrick McHardy, Netfilter Developer Mailing List

On Wed, Jul 30, 2008 at 11:18:42PM +0200, Jozsef Kadlecsik wrote:
>
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index 6f61261..a9b3b8f 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -911,6 +911,15 @@ static int tcp_packet(struct nf_conn *ct,
>  			nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
>  				  "nf_ct_tcp: invalid state ");
>  		return -NF_ACCEPT;
> +	case TCP_CONNTRACK_LAST_ACK:
> +	case TCP_CONNTRACK_CLOSE_WAIT:
> +		/* Check and compensate retransmitted FIN or
> +		 * reordered ACK packets */
> +		if (old_state == TCP_CONNTRACK_FIN_WAIT
> +		    && (ct->proto.tcp.seen[dir].flags
> +		        & IP_CT_TCP_FLAG_CLOSE_INIT))
> +		        new_state = TCP_CONNTRACK_FIN_WAIT;

This is slightly better but it is still fundamentally broken.
Do we really need to establish a committee just to add a new
state? :)

We can squeeze out the requisite number of bits (4) from the
existing state variable.

Cheers,

* Re: TCP connection tracking timeout
From: Jozsef Kadlecsik @ 2008-07-31 19:26 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick McHardy, Netfilter Developer Mailing List

On Thu, 31 Jul 2008, Herbert Xu wrote:

[...]
> This is slightly better but it is still fundamentally broken.
> Do we really need to establish a committee just to add a new
> state? :)
> 
> We can squeeze out the requisite number of bits (4) from the
> existing state variable.

So what do you suggest ;-)?

Best regards,
Jozsef

* Re: TCP connection tracking timeout
From: Herbert Xu @ 2008-08-01 11:58 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Patrick McHardy, Netfilter Developer Mailing List

On Thu, Jul 31, 2008 at 09:26:40PM +0200, Jozsef Kadlecsik wrote:
>
> So what do you suggest ;-)?

In a nutshell, let's divide the existing u8 state into two
separate states, each 4 bits wide.  They should be used to track
the two directions separately using the same logic that we currently
have.

The timeout should then be computed as the maximum of the timeouts
of the two states.  However, the overall timeout should be the minimum of
this and that given by the retransmit logic (this is because the logic
for retransmits is to detect the death of the connection at the
sender, in which case both directions evaporate).

Cheers,

* Re: TCP connection tracking timeout
From: Jozsef Kadlecsik @ 2008-07-29 12:20 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Herbert Xu, Netfilter Developer Mailing List

Hi,

On Tue, 29 Jul 2008, Patrick McHardy wrote:

> Herbert Xu wrote:
> > On Tue, Jul 29, 2008 at 07:00:46AM +0200, Patrick McHardy wrote:
> > > > That sounds like a pretty neat idea. I'm testing a patch now, I'll
> > > > send it over in a few minutes if it survives :)
> >   
> > > This seems to work. I'm wondering however if this will really help.
> > > We already track retransmissions and decrease the timeout on the
> > > 3rd retransmission, so this should only help if both the sender and
> > > the receiver went down.
> > 
> > The problem with requiring a 3rd retransmit is that once a socket
> > is orphaned (closed) on Linux, we fast-track the retransmit timeout
> > process.  In particular, if it's already maxed out the RTO prior
> > to closing, we'll kill it straight away.
> > 
> > This is in violation of the RFCs but it's an important optimisation.
> 
> Thanks for this explanation. Unless Jozsef sees something wrong with this
> patch, I'll queue it with a proper changelog. It's small enough, so perhaps
> we can even put this in 2.6.27.

Seems to be fine - as we cannot assume everything runs Linux (;-), the 
only risk is that we may kill the conntrack entry too early and then it'll 
mark packets as INVALID unnecessarily. But a 5 min timeout when there is 
unacknowledged data should be enough to avoid that.

Best regards,
Jozsef

* Re: TCP connection tracking timeout
From: Patrick McHardy @ 2008-07-30 10:07 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Herbert Xu, Netfilter Developer Mailing List

Jozsef Kadlecsik wrote:
> Hi,
> 
> On Tue, 29 Jul 2008, Patrick McHardy wrote:
> 
>> Herbert Xu wrote:
>>> On Tue, Jul 29, 2008 at 07:00:46AM +0200, Patrick McHardy wrote:
>>>>> That sounds like a pretty neat idea. I'm testing a patch now, I'll
>>>>> send it over in a few minutes if it survives :)
>>>   
>>>> This seems to work. I'm wondering however if this will really help.
>>>> We already track retransmissions and decrease the timeout on the
>>>> 3rd retransmission, so this should only help if both the sender and
>>>> the receiver went down.
>>> The problem with requiring a 3rd retransmit is that once a socket
>>> is orphaned (closed) on Linux, we fast-track the retransmit timeout
>>> process.  In particular, if it's already maxed out the RTO prior
>>> to closing, we'll kill it straight away.
>>>
>>> This is in violation of the RFCs but it's an important optimisation.
>> Thanks for this explanation. Unless Jozsef sees something wrong with this
>> patch, I'll queue it with a proper changelog. It's small enough, so perhaps
>> we can even put this in 2.6.27.
> 
> Seems to be fine - as we cannot assume everything runs Linux (;-), the 
> only risk is that we may kill the conntrack entry too early and then it'll 
> mark packets as INVALID unnecessarily. But a 5 min timeout when there is 
> unacknowledged data should be enough to avoid that.

Thanks, I've queued the patch.
