Linux Netfilter discussions
* conntrack and ESTABLISHED / UNREPLIED connections
From: Robert L Mathews @ 2008-07-03 20:08 UTC
  To: netfilter

I've been having a problem with /proc/net/ip_conntrack showing many 
connections in a state like this for up to five days:

tcp      6 426339 ESTABLISHED src=64.62.209.98 dst=96.221.109.137 
sport=443 dport=50465 packets=2 bytes=178 [UNREPLIED] src=96.221.109.137 
dst=64.62.209.98 sport=50465 dport=443 packets=0 bytes=0 mark=0 
secmark=0 use=1

netstat doesn't show these as established connections.

This is a problem because I'm also using connlimit on the server, and 
these phantom connections build up until the connlimit rule thinks a 
limit has been exceeded and the client is blocked.

I've been able to capture a tcpdump of this from both ends, and the way 
the connection is closing appears to be odd (although I'm no TCP 
expert). Here's the close of the connection from the client's perspective:

11:23:30.108118 IP 192.168.1.7.50465 > 64.62.209.98.443: F 6111:6111(0) 
ack 28907 win 65535 <nop,nop,timestamp 393902224 13017587>
11:23:30.139599 IP 64.62.209.98.443 > 192.168.1.7.50465: P 
28907:28944(37) ack 6111 win 148 <nop,nop,timestamp 13018053 393902224>
11:23:30.139624 IP 192.168.1.7.50465 > 64.62.209.98.443: R 
3460428831:3460428831(0) win 0

As you can see, it sends a FIN, then a RST, then it thinks it's done. 
Here's the close of the connection from the server's perspective:

11:23:30.178131 IP 64.62.209.98.443 > 96.221.109.137.50465: F 
28944:28944(0) ack 6111 win 148 <nop,nop,timestamp 13018053 393902224>
11:23:30.178471 IP 96.221.109.137.50465 > 64.62.209.98.443: F 
6111:6111(0) ack 28907 win 65535 <nop,nop,timestamp 393902224 13017587>
11:23:30.178486 IP 64.62.209.98.443 > 96.221.109.137.50465: . ack 6112 
win 148 <nop,nop,timestamp 13018053 393902224>
11:23:30.209702 IP 96.221.109.137.50465 > 64.62.209.98.443: R 
3460428831:3460428831(0) win 0
11:23:30.456820 IP 64.62.209.98.443 > 96.221.109.137.50465: P 
28907:28944(37) ack 6112 win 148 <nop,nop,timestamp 13018123 393902224>
11:23:31.016813 IP 64.62.209.98.443 > 96.221.109.137.50465: P 
28907:28944(37) ack 6112 win 148 <nop,nop,timestamp 13018263 393902224>

(The last lines are then repeated several more times over several 
minutes.)

What seems to be happening is that the server is sending a FIN, then 
expecting an ack of that, but instead it receives a RST. This results in 
a closed connection according to netstat, but conntrack thinks it's 
still ESTABLISHED until it times out five days later.
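
For reference, the "five days" figure can be read off the entry itself: the third field of a /proc/net/ip_conntrack line is the remaining lifetime in seconds. The sketch below pulls that field out of the entry shown earlier and converts it. The helper name is made up for illustration, and it assumes the stock established-TCP timeout of 432000 seconds (5 days) is in effect.

```shell
# The conntrack entry from this message, joined onto a single line.
entry='tcp      6 426339 ESTABLISHED src=64.62.209.98 dst=96.221.109.137 sport=443 dport=50465 packets=2 bytes=178 [UNREPLIED] src=96.221.109.137 dst=64.62.209.98 sport=50465 dport=443 packets=0 bytes=0 mark=0 secmark=0 use=1'

# Hypothetical helper: print the third field (remaining lifetime in
# seconds) of each conntrack entry read from stdin.
remaining_seconds() {
    awk '{ print $3 }'
}

secs=$(printf '%s\n' "$entry" | remaining_seconds)

# Whole days and leftover hours until the entry would expire.
echo "$secs seconds = $((secs / 86400)) days $((secs % 86400 / 3600)) hours"
```

For this entry that works out to a little under five days left, which fits an entry created roughly an hour and a half earlier with a 432000-second timeout.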

Am I understanding correctly? How can I avoid connlimit thinking that 
these connections are still established for days?

More details: the client is Mac OS X 10.4.11; the server is Debian Linux 
running a stock Debian kernel 2.6.24. This is a connection from Firefox 
2.0.0.14 on the client to Apache 2.2.3 (with a short 2 second keepalive 
timeout) on the server.

The full tcpdumps of the entire session are available at:

  http://tigertech.net/20080703.tcpdump.client.txt
  http://tigertech.net/20080703.tcpdump.server.txt

Thanks for any advice!

-- 
Robert L Mathews, Tiger Technologies

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Robert L Mathews @ 2008-07-07 23:04 UTC
  To: netfilter

Forgive me for replying to my own message, but I didn't get any replies 
from the list, and I've found out more info.

Summary of the previous problem: On our Web servers, conntrack shows 
many ESTABLISHED / UNREPLIED connections, like this:

tcp      6 426339 ESTABLISHED src=64.62.209.98 dst=96.221.109.137 
sport=443 dport=50465 packets=2 bytes=178 [UNREPLIED] src=96.221.109.137 
dst=64.62.209.98 sport=50465 dport=443 packets=0 bytes=0 mark=0 
secmark=0 use=1

These are not real connections (they don't appear in netstat), and they 
linger for five days before they go away. This can cause serious 
problems when using connlimit, because it incorrectly counts these 
phantom connections as valid.

After much debugging, I now see exactly how these "connections" are 
getting created. What's happening is this (I think):

A client makes an HTTP connection and sends a request. The client 
decides to close that connection, sending a FIN packet. However, just 
before the server receives the FIN packet, it sent a data packet with 
the ack bit set. When that data packet arrives at the client, the client 
considers the connection to be closed, so it replies with a RST.

On the server, conntrack sees that RST and marks the connection as state 
CLOSEd, then DESTROYed. But there's a problem -- the server is still 
repeatedly trying to resend that final data packet because it was never 
ACKed.

So conntrack sees the resent outgoing packet with the ack bit set, but 
it doesn't know about an established connection (that connection was 
destroyed by the RST). This makes conntrack create a new outgoing 
ESTABLISHED "connection" that doesn't really exist, but which lingers 
for 5 days. This appears to happen because the TCP state transition 
table for the original direction of nf_conntrack_proto_tcp.c assumes 
that sNO -> sES is automatically a valid, established connection.

This is a problem. On each of my Web servers, there are thousands of 
such phantom "outgoing source port 80/443" connections being tracked by 
netfilter, and it causes incorrect matches of my connlimit rule.

Is this a known issue? How do other people work around it? Assuming this 
is normal behavior, what I really want is for conntrack to ignore any 
bogus outgoing packets that appear to be "from" port 80 or 443 on the 
server -- my servers never open new outgoing connections with those 
source ports -- but I can't figure out how to do that, despite much 
playing around with the "raw" table.

Something like this sounds like it should work:

  iptables -t raw -I OUTPUT -p tcp --sport 80 -j NOTRACK

But in practice that breaks the normal conntrack reply tracking for an 
incoming connection.

Any suggestions? Thanks for your time!

-- 
Robert L Mathews, Tiger Technologies

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Jozsef Kadlecsik @ 2008-07-08  9:42 UTC
  To: Robert L Mathews; +Cc: netfilter

On Mon, 7 Jul 2008, Robert L Mathews wrote:

> Summary of the previous problem: On our Web servers, conntrack shows 
> many ESTABLISHED / UNREPLIED connections, like this:
> 
> tcp      6 426339 ESTABLISHED src=64.62.209.98 dst=96.221.109.137 sport=443
> dport=50465 packets=2 bytes=178 [UNREPLIED] src=96.221.109.137
> dst=64.62.209.98 sport=50465 dport=443 packets=0 bytes=0 mark=0 secmark=0
> use=1
> 
> These are not real connections (they don't appear in netstat), and they linger
> for five days before they go away. This can cause serious problems when using
> connlimit, because it incorrectly counts these phantom connections as valid.
> 
> After much debugging, I now see exactly how these "connections" are getting
> created. What's happening is this (I think):
> 
> A client makes an HTTP connection and sends a request. The client decides to
> close that connection, sending a FIN packet. However, just before the server
> receives the FIN packet, it sent a data packet with the ack bit set. When that
> data packet arrives at the client, the client considers the connection to be
> closed, so it replies with a RST.

If the data packet sent by the server is valid, the client should send an 
ACK and not a RST packet. If the data packet is invalid, the client should 
send an ACK again and not a RST. 
 
> On the server, conntrack sees that RST and marks the connection as state
> CLOSEd, then DESTROYed. But there's a problem -- the server is still
> repeatedly trying to resend that final data packet because it was never ACKed.

If the server receives the RST packet (and it's valid) it should never 
ever send unsent data but destroy the connection.

Something is definitely wrong at the client, the server or in between.
 
> So conntrack sees the resent outgoing packet with the ack bit set, but it
> doesn't know about an established connection (that connection was destroyed by
> the RST). This makes conntrack create a new outgoing ESTABLISHED "connection"
> that doesn't really exist, but which lingers for 5 days. This appears to
> happen because the TCP state transition table for the original direction of
> nf_conntrack_proto_tcp.c assumes that sNO -> sES is automatically a valid,
> established connection.
> 
> This is a problem. On each of my Web servers, there are thousands of such
> phantom "outgoing source port 80/443" connections being tracked by netfilter,
> and it causes incorrect matches of my connlimit rule.

Disable picking up connections and you get rid of those stale connections:

# echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
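
To confirm which mode a given server is in before and after that change, something along these lines might help. It is only a sketch: the function name is invented here, and the optional path argument exists purely so the check can be tried against a scratch file.

```shell
# Report whether conntrack would pick up mid-stream TCP connections.
# $1 optionally overrides the sysctl path (useful for a dry run).
tcp_loose_status() {
    path=${1:-/proc/sys/net/netfilter/nf_conntrack_tcp_loose}
    if [ ! -r "$path" ]; then
        # Module not loaded, or a kernel without this sysctl.
        echo "unknown"
        return 1
    fi
    if [ "$(cat "$path")" = "0" ]; then
        echo "strict"
    else
        echo "loose"
    fi
}

# To keep the setting across reboots, the matching sysctl.conf line is:
#   net.netfilter.nf_conntrack_tcp_loose = 0
```

Running `tcp_loose_status` with no argument reads the live value; "strict" means pickup is disabled.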

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Robert L Mathews @ 2008-07-08 17:38 UTC
  To: netfilter

Jozsef Kadlecsik wrote:

> If the data packet sent by the server is valid, the client should send an 
> ACK and not a RST packet. If the data packet is invalid, the client should 
> send an ACK again and not a RST. 

Actually, unless I'm misunderstanding, RFC1122 section 4.2.2.13 seems to 
allow what the client is doing:

   A host MAY implement a "half-duplex" TCP close sequence, so
   that an application that has called CLOSE cannot continue to
   read data from the connection.  If such a host issues a
   CLOSE call while received data is still pending in TCP, or
   if new data is received after CLOSE is called, its TCP
   SHOULD send a RST to show that data was lost.

So in this case, the client app closed the connection even though there 
was data from the server that hadn't been delivered, and the client's 
TCP stack replied with a RST as described above.


> If the server receives the RST packet (and it's valid) it should never 
> ever send unsent data but destroy the connection.

That's what I would have thought, but packet dumps are showing 
otherwise. On a standard Debian 2.6.24 kernel it keeps retrying the 
un-acked packet for more than a minute despite the RST, as shown at the 
end of:

  http://tigertech.net/20080703.tcpdump.server.txt

Are you sure that a normal kernel doesn't do this? Maybe it's a Debian 
bug? If you have access to a fairly busy Web server, you can see if 
yours is doing it with:

  egrep 'ESTABLISHED.+port=(80|443).+UNREPLIED' /proc/net/ip_conntrack

Under normal circumstances, there should be none of these, but all our 
Web servers have thousands (from many hundreds of different client IPs).
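
A per-client breakdown makes it easier to see which addresses are close to tripping connlimit. The following is a rough sketch rather than a polished tool: the function name is made up, and the pattern is a slightly tighter variant of the egrep above that only matches entries whose original-direction source port is exactly 80 or 443.

```shell
# Count phantom entries per client address.
# Reads conntrack-format lines on stdin, prints "count address" lines,
# busiest client first.
phantoms_per_client() {
    grep -E 'ESTABLISHED.+ sport=(80|443) .+UNREPLIED' |
        awk '{
            # In these entries the first dst= field is the remote client.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^dst=/) {
                    sub(/^dst=/, "", $i)
                    count[$i]++
                    break
                }
        }
        END { for (ip in count) print count[ip], ip }' |
        sort -rn
}

# Usage on the server (path as used in this thread):
#   phantoms_per_client < /proc/net/ip_conntrack | head
```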


> Disable picking up connections and you get rid of those stale connections:
> 
> # echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

Thanks! That's exactly what I was looking for, and I really appreciate it.

(I also had the idea of changing nf_conntrack_tcp_timeout_close to be 
120, like nf_conntrack_tcp_timeout_fin_wait, which would hopefully cause 
conntrack to remember the first closed connection for long enough that 
it doesn't think the retransmitted outbound packet is new.)

-- 
Robert L Mathews, Tiger Technologies

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Jozsef Kadlecsik @ 2008-07-09  6:54 UTC
  To: Robert L Mathews; +Cc: netfilter

On Tue, 8 Jul 2008, Robert L Mathews wrote:

> Jozsef Kadlecsik wrote:
> 
> > If the data packet sent by the server is valid, the client should send an
> > ACK and not a RST packet. If the data packet is invalid, the client should
> > send an ACK again and not a RST. 
> 
> Actually, unless I'm misunderstanding, RFC1122 section 4.2.2.13 seems to allow
> what the client is doing:
> 
>   A host MAY implement a "half-duplex" TCP close sequence, so
>   that an application that has called CLOSE cannot continue to
>   read data from the connection.  If such a host issues a
>   CLOSE call while received data is still pending in TCP, or
>   if new data is received after CLOSE is called, its TCP
>   SHOULD send a RST to show that data was lost.
> 
> So in this case, the client app closed the connection even though there was
> data from the server that hadn't been delivered, and the client's TCP stack
> replied with a RST as described above.

Yes, that's correct and I was wrong above.
  
> > If the server receives the RST packet (and it's valid) it should never ever
> > send unsent data but destroy the connection.
> 
> That's what I would have thought, but packet dumps are showing otherwise. On a
> standard Debian 2.6.24 kernel it keeps retrying the un-acked packet for more
> than a minute despite the RST, as shown at the end of:
> 
>  http://tigertech.net/20080703.tcpdump.server.txt

Is the RST segment valid? Could you create a dump file using the '-S' flag 
of tcpdump so that not relative but absolute sequence numbers are printed?
 
Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Robert L Mathews @ 2008-07-09 16:22 UTC
  To: netfilter

Jozsef Kadlecsik wrote:

> Is the RST segment valid? Could you create a dump file using the '-S' flag 
> of tcpdump so that not relative but absolute sequence numbers are printed?

Sure thing; here's another one with -S:

  http://tigertech.net/20080709.tcpdump.server.txt

-- 
Robert L Mathews, Tiger Technologies

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Jozsef Kadlecsik @ 2008-07-10 12:44 UTC
  To: Robert L Mathews; +Cc: netfilter

On Wed, 9 Jul 2008, Robert L Mathews wrote:

> > Is the RST segment valid? Could you create a dump file using the '-S' flag
> > of tcpdump so that not relative but absolute sequence numbers are printed?
> 
> Sure thing; here's another one with -S:
> 
>  http://tigertech.net/20080709.tcpdump.server.txt

The end of the TCP session is as follows:

08:55:16.499702 IP 64.62.209.98.443 > 96.221.109.137.49553: P 
716440962:716440999(37) ack 2354211888 win 71 
<nop,nop,timestamp 140394633 528033351>

Server sends data still pending to the client.

08:55:16.499799 IP 64.62.209.98.443 > 96.221.109.137.49553: F 
716440999:716440999(0) ack 2354211888 win 71 
<nop,nop,timestamp 140394633 528033351>

Then the server sends a connection termination request.

08:55:16.500008 IP 96.221.109.137.49553 > 64.62.209.98.443: F 
2354211888:2354211888(0) ack 716440962 win 65535 
<nop,nop,timestamp 528033351 140394349>

Client did not receive the last two packets from the server but sends
a connection termination request too.

08:55:16.500037 IP 64.62.209.98.443 > 96.221.109.137.49553: . ack 
2354211889 win 71 <nop,nop,timestamp 140394633 528033351>

Server ACKs that it received the FIN packet from the client.

08:55:16.529487 IP 96.221.109.137.49553 > 64.62.209.98.443: R 
2354211888:2354211888(0) win 0

Client sends RST, which is out of (before) the window (left edge is at 
2354211889), thus ignored by the server.

08:55:16.740815 IP 64.62.209.98.443 > 96.221.109.137.49553: P 
716440962:716440999(37) ack 2354211889 win 71 
<nop,nop,timestamp 140394693 528033351>

Server tries to send the data still pending.

You wrote that the client runs Mac OS X 10.4.11. I don't really know what's 
wrong with it, but it looks like a client-related issue, or an ISP between 
the client and server that generates fake RST packets to tear 
down connections.
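
The out-of-window check applied to the RST in the trace above can be sketched as follows. This is an illustration of the usual RFC 793 style acceptability test for a zero-length segment, not code from any TCP stack. The function name is hypothetical, and the window value is the unscaled "win 71" from the dump. The real window depends on the window scale factor, which is not visible in this excerpt, but a sequence number one below the left edge fails the test for any window smaller than 2^32.

```shell
# Succeeds when sequence number $1 lies inside the receive window that
# starts at $2 (left edge) and spans $3 bytes. The subtraction is
# masked to 32 bits so that sequence-number wraparound is handled.
seq_in_window() {
    offset=$(( ($1 - $2) & 0xFFFFFFFF ))
    [ "$offset" -lt "$3" ]
}

# RST seq 2354211888 against left edge 2354211889 (the server had
# already ACKed the client's FIN).
if seq_in_window 2354211888 2354211889 71; then
    echo "RST inside window: accepted"
else
    echo "RST before left edge: ignored"
fi
```

With the numbers from the dump the second branch is taken, which matches the server continuing to retransmit its pending data.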

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Robert L Mathews @ 2008-07-11  0:45 UTC
  To: netfilter

Jozsef Kadlecsik wrote:

> Client sends RST, which is out of (before) the window (left edge is at 
> 2354211889), thus ignored by the server.

That makes sense, but it seems like conntrack processes the RST and 
marks the original connection as closed, then treats the server resends 
as new outgoing connections, which doesn't seem right.

In other words, if the server's TCP stack is ignoring the RST, shouldn't 
conntrack ignore it, too?

It apparently doesn't -- I used the "conntrack -E" tool to show a log of 
these connections, and it definitely shows it handling the RST, then 
detecting a new connection in the other direction from the retried 
outgoing packets.

Here's an example of one:

  http://www.tigertech.net/20080710.txt

This includes the tcpdump, plus (at the end) the output of the 
"conntrack" tool used in both directions, showing how it incorrectly 
detected a new connection in the outbound direction. Unfortunately the 
conntrack output doesn't show timestamps, but I was watching it, and the 
spurious outbound "connection" was detected during the retries, within 
maybe 30 seconds after the incoming connection was DESTROYed.


> You wrote that the client runs Mac OS X 10.4.11. I don't really know what's 
> wrong with it, but it looks like a client-related issue, or an ISP between 
> the client and server that generates fake RST packets to tear 
> down connections.

Whatever it is, it's unfortunately quite common -- as I said, any of our 
reasonably-busy Web servers show thousands of such phantom connections 
from hundreds of unique IP addresses.

-- 
Robert L Mathews, Tiger Technologies

* Re: conntrack and ESTABLISHED / UNREPLIED connections
From: Jozsef Kadlecsik @ 2008-07-14  8:55 UTC
  To: Robert L Mathews; +Cc: netfilter

On Thu, 10 Jul 2008, Robert L Mathews wrote:

> Jozsef Kadlecsik wrote:
> 
> > Client sends RST, which is out of (before) the window (left edge is at
> > 2354211889), thus ignored by the server.
> 
> That makes sense, but it seems like conntrack processes the RST and marks the
> original connection as closed, then treats the server resends as new outgoing
> connections, which doesn't seem right.
> 
> In other words, if the server's TCP stack is ignoring the RST, shouldn't
> conntrack ignore it, too?

In theory, yes. But conntrack does not have the same information as the end 
nodes; e.g. packets which were seen and taken into account by conntrack 
may get lost in transit. The window sizes in netfilter can easily be 
slightly wider than the real window sizes at the sender and the receiver. 
This is the reason why conntrack may think the RST segment is valid and 
therefore destroys the conntrack entry.
 
> Whatever it is, it's unfortunately quite common -- as I said, any of our
> reasonably-busy Web servers show thousands of such phantom connections from
> hundreds of unique IP addresses.

Still, it's quite strange. Why are such RST packets generated at all?

"Late" packets (i.e. packets which are received after a connection is 
destroyed in conntrack) create "phantom" connections if 
nf_conntrack_tcp_loose is enabled (which is the default). We have to 
balance on a double-edged sword: if we keep the connection in memory too long 
after we have seen a complete termination (FIN-FIN-ACK or RST), then we 
waste memory and conntrack is susceptible to DoS attacks; if we destroy 
the connections too fast, the late packets create phantom connections or 
false alarms.
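
The pickup behaviour discussed in this thread can be pictured with a toy model. It is only a sketch and not kernel code: the function name and the flag labels are invented for illustration, and it assumes the setting involved is nf_conntrack_tcp_loose, as suggested earlier in the thread.

```shell
# Toy model: what conntrack does with a TCP packet that matches no
# existing entry.
# $1 = packet flags ("SYN" or "ACK"), $2 = value of nf_conntrack_tcp_loose.
new_packet_verdict() {
    case "$1" in
        SYN)
            # A genuine connection attempt always starts a new entry.
            echo "new entry: SYN_SENT"
            ;;
        ACK)
            if [ "$2" = "0" ]; then
                echo "no entry: packet is INVALID"
            else
                echo "new entry: ESTABLISHED (picked up mid-stream)"
            fi
            ;;
        *)
            echo "not modelled"
            ;;
    esac
}

# The server's retransmitted data packet carries ACK but not SYN:
new_packet_verdict ACK 1
new_packet_verdict ACK 0
```

With loose pickup enabled the late retransmission becomes the ESTABLISHED / UNREPLIED entry seen in /proc/net/ip_conntrack; with it disabled no entry is created, and the packet can be dropped by a rule that matches the INVALID state.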

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
