netfilter.vger.kernel.org archive mirror
* conntrack and RSTs received during CLOSE_WAIT
@ 2009-05-15 22:10 Robert L Mathews
  2009-05-16 21:57 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Robert L Mathews @ 2009-05-15 22:10 UTC (permalink / raw)
  To: netfilter

I'm using Linux kernel 2.6.26 with conntrack/connlimit to prevent people 
from DOSing our Web servers by opening up too many simultaneous 
connections from one IP address. This is mostly for protection against 
unintentional DOSes from broken proxy servers that try to open up 
literally hundreds of simultaneous connections; we DROP their syn 
packets if they already have 40 connections open.

This is generally working well (and thanks to folks on this list for the 
hard work that makes this possible).

However: Some clients send evil TCP RSTs that confuse conntrack and 
break connlimit in a way that I'll detail below. First, here's a sample 
recreation:

  client > server [SYN] Seq=0 Len=0
  server > client [SYN,ACK] Seq=0 Ack=1 Len=0
  client > server [ACK] Seq=1 Ack=1 Len=0
  client > server [PSH,ACK] Seq=1 Ack=1 Len=420 (HTTP GET request)
  server > client [ACK] Seq=1 Ack=421 Len=0
  server > client [ACK] Seq=1 Ack=421 Len=1448    (HTTP response)
  server > client [ACK] Seq=1449 Ack=421 Len=1448 (more HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (more HTTP response)
  client > server [FIN,ACK] Seq=421 Ack=1449 Len=0
  server > client [ACK] Seq=4345 Ack=422 Len=1448 (more HTTP response)
  server > client [ACK] Seq=5793 Ack=422 Len=1448 (more HTTP response)
  client > server [RST] Seq=421 Len=0
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)

Everything up to and including the "RST" takes place in under a tenth of 
a second. The remaining ten retransmits take place over 5 minutes.

As soon as the client received the first packet of the HTTP response, it 
decided to close the connection. This appears to be due to a SonicWall 
firewall on the client end, which examines the Content-Type of the HTTP 
reply and immediately shuts down the connection if it's a "forbidden" 
type. This is apparently common.

From the server's TCP stack point of view, this connection enters the 
CLOSE_WAIT state when the FIN is received. The stack then waits for 
Apache to close() the socket. However, Apache doesn't close the socket 
for five minutes. That's because it's blocked waiting for a socket write 
to complete, and it doesn't notice the end-of-input on the socket until 
the write times out. (Yes, according to netstat, the connection remains 
in CLOSE_WAIT even after the RST packet, which surprised me, but that's 
how Linux works, apparently.)

If the client opens up hundreds of these connections within five 
minutes, it can use up hundreds of Apache process slots. I want 
connlimit to prevent that, and it looks like it should, because 
conntrack should be tracking the CLOSE_WAIT connections just like any 
other connections. To make sure it tracks them long enough, I've set 
ip_conntrack_tcp_timeout_close_wait to 5 minutes.

However, the RST packet screws things up. As I said, the kernel ignores 
the RST packet and leaves the connection in CLOSE_WAIT. But when 
conntrack sees the RST packet, it marks the connection CLOSEd, and then 
forgets about it 10 seconds later.

What happens next depends on whether nf_conntrack_tcp_loose is set. If 
it's set to 1, the server's retransmitted packets cause a new, "fake" 
connection to be ESTABLISHED in conntrack, which lingers for five 
days(!). We originally had it set that way, but a couple of legitimate 
customers were complaining about still being blocked from our servers 
for five days after they'd actually closed all their connections.

So we set nf_conntrack_tcp_loose to 0. That solved the "blocked for five 
days" problem.... but now the CLOSE_WAIT connections quickly go to CLOSE 
in conntrack when the RST arrives and are totally forgotten ten seconds 
later. A rogue client can quickly get 40 connections into the CLOSE_WAIT 
state, then wait ten seconds and open 40 more, etc., occupying up to 
1200 Apache process slots within five minutes.
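
The 1200 figure follows from simple arithmetic. A minimal sketch in Python, using only the values quoted above (a connlimit of 40, the 10-second conntrack CLOSE timeout, and the five minutes Apache holds the slot); the function and variable names are illustrative and not part of any tool:

```python
# Back-of-the-envelope check of the figure above. All values come from the
# message; the names are illustrative only.
CONNLIMIT = 40              # simultaneous connections allowed per IP
CLOSE_TIMEOUT_SECS = 10     # conntrack forgets a CLOSE entry after this long
APACHE_HOLD_SECS = 5 * 60   # Apache keeps the slot until its write times out


def max_slots_held(limit, forget_secs, hold_secs):
    """Slots one client can occupy if it opens `limit` new connections
    each time conntrack has forgotten the previous batch."""
    batches = hold_secs // forget_secs
    return limit * batches


print(max_slots_held(CONNLIMIT, CLOSE_TIMEOUT_SECS, APACHE_HOLD_SECS))
```

With these inputs there are 30 ten-second batches in five minutes, so 40 connections per batch gives 1200 slots.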

What we really want is for conntrack to match what the kernel does: to 
ignore the RST packet for CLOSE_WAIT connections, leaving the connection 
to remain in the conntrack CLOSE_WAIT state until 
ip_conntrack_tcp_timeout_close_wait expires. That looks easy to do with 
a change to nf_conntrack_proto_tcp.c:

-/*rst*/    { sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sIV },
+/*rst*/    { sIV, sCL, sCL, sCL, sCL, sCW, sCL, sCL, sCL, sIV },

... but I'd rather not maintain a custom compiled kernel just for that.
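
To make the effect of that one-cell change easier to see, here is a toy Python model of the single table row shown above. It is only a sketch: the column order (sNO through sLI) is an assumption based on the 2.6.26-era source, the names STATES, RST_ROW_STOCK and RST_ROW_PATCHED are made up for illustration, and the real table also has a direction dimension that is ignored here.

```python
# Toy model of one row of conntrack's TCP state table. The column order is
# an assumption based on the 2.6.26-era nf_conntrack_proto_tcp.c; it is not
# taken from the message itself.
STATES = ["sNO", "sSS", "sSR", "sES", "sFW", "sCW", "sLA", "sTW", "sCL", "sLI"]

# The "rst" row before and after the proposed change (copied from the diff).
RST_ROW_STOCK = ["sIV", "sCL", "sCL", "sCL", "sCL", "sCL", "sCL", "sCL", "sCL", "sIV"]
RST_ROW_PATCHED = ["sIV", "sCL", "sCL", "sCL", "sCL", "sCW", "sCL", "sCL", "sCL", "sIV"]


def next_state(row, current):
    """Look up the state conntrack moves to when an RST arrives."""
    return row[STATES.index(current)]


print(next_state(RST_ROW_STOCK, "sCW"))    # stock table: CLOSE_WAIT -> CLOSE
print(next_state(RST_ROW_PATCHED, "sCW"))  # patched table: stays in CLOSE_WAIT
```

Under that assumed column order, the only cell that differs is the one for CLOSE_WAIT; every other state still moves to CLOSE (or stays invalid) on RST.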

So I've considered other solutions:

1. Set nf_conntrack_tcp_loose to 1, but change 
ip_conntrack_tcp_timeout_established to 1 hour (instead of 5 days). This 
would make sure that people aren't blocked for more than an hour after 
they close all their connections. However, that's still not ideal -- and 
it would also allow someone to intentionally bypass connlimit by opening 
40 connections, then leaving them idle for an hour, then opening 40 
more, and so on.

2. Set nf_conntrack_tcp_loose to 0, and change 
nf_conntrack_tcp_timeout_close to 5 minutes (instead of 10 seconds). 
This would only block people for the 5 minutes that they're still taking 
up an Apache process slot, but would also block anyone who sends 40 TCP 
RSTs within 5 minutes for any reason. You wouldn't think that this would 
be a problem, but RSTs actually seem quite common on a busy Web server 
with a fairly low HTTP keepalive value.

Does anyone have any other suggestions about how to make conntrack 
remember these connections during (and only during) the five-minute 
period netstat shows them as CLOSE_WAIT?

-- 
Robert L Mathews, Tiger Technologies     http://www.tigertech.net/


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-15 22:10 conntrack and RSTs received during CLOSE_WAIT Robert L Mathews
@ 2009-05-16 21:57 ` Jozsef Kadlecsik
  2009-05-17  3:09   ` Robert L Mathews
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-16 21:57 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: netfilter

Hi,

On Fri, 15 May 2009, Robert L Mathews wrote:

> I'm using Linux kernel 2.6.26 with conntrack/connlimit to prevent people from
> DOSing our Web servers by opening up too many simultaneous connections from
> one IP address. This is mostly for protection against unintentional DOSes from
> broken proxy servers that try to open up literally hundreds of simultaneous
> connections; we DROP their syn packets if they already have 40 connections
> open.
> 
> This is generally working well (and thanks to folks on this list for the hard
> work that makes this possible).
> 
> However: Some clients send evil TCP RSTs that confuse conntrack and break
> connlimit in a way that I'll detail below. First, here's a sample recreation:
> 
>  client > server [SYN] Seq=0 Len=0
>  server > client [SYN,ACK] Seq=0 Ack=1 Len=0
>  client > server [ACK] Seq=1 Ack=1 Len=0
>  client > server [PSH,ACK] Seq=1 Ack=1 Len=420 (HTTP GET request)
>  server > client [ACK] Seq=1 Ack=421 Len=0
>  server > client [ACK] Seq=1 Ack=421 Len=1448    (HTTP response)
>  server > client [ACK] Seq=1449 Ack=421 Len=1448 (more HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (more HTTP response)
>  client > server [FIN,ACK] Seq=421 Ack=1449 Len=0
>  server > client [ACK] Seq=4345 Ack=422 Len=1448 (more HTTP response)
>  server > client [ACK] Seq=5793 Ack=422 Len=1448 (more HTTP response)
>  client > server [RST] Seq=421 Len=0
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
>  server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
> 
> Everything up to and including the "RST" takes place in under a tenth of a
> second. The remaining ten retransmits take place over 5 minutes.

The TCP session seems to be totally broken. After the client sends

client > server [FIN,ACK] Seq=421 Ack=1449 Len=0

it should send the RST packet with Seq=422 and not Seq=421. The RST 
segment won't be accepted by the server.
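
The reasoning can be sketched in a few lines of Python. This is a simplified model of the classic receive-window check, not the Linux implementation; the function names and the 5840-byte window are hypothetical, and the sequence numbers are the relative ones from the trace.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def next_expected(seq, payload_len, syn=False, fin=False):
    """SYN and FIN each consume one sequence number, like a byte of data."""
    return (seq + payload_len + int(syn) + int(fin)) % MOD


def in_window(seq, rcv_nxt, rcv_wnd):
    """Simplified check: rcv_nxt <= seq < rcv_nxt + rcv_wnd (mod 2**32)."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


# The client's FIN carried Seq=421 and no data, so the server now expects 422.
rcv_nxt = next_expected(421, 0, fin=True)
RCV_WND = 5840  # hypothetical receive window, for illustration only

print(rcv_nxt)                           # what the server expects next
print(in_window(421, rcv_nxt, RCV_WND))  # the RST that was actually sent
print(in_window(422, rcv_nxt, RCV_WND))  # the RST that should have been sent
```

In this model the RST with Seq=421 falls just below the left edge of the window, which is why the receiving stack discards it.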

And I don't get the server either: after sending Ack=422 it can't send 
Ack=421. Or there is an active device between the firewall and the server 
which reorders the packets.

Is it a real TCP session recording or a mistyped one?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-16 21:57 ` Jozsef Kadlecsik
@ 2009-05-17  3:09   ` Robert L Mathews
  2009-05-20  5:16     ` Robert L Mathews
  0 siblings, 1 reply; 22+ messages in thread
From: Robert L Mathews @ 2009-05-17  3:09 UTC (permalink / raw)
  To: netfilter

Jozsef Kadlecsik wrote:

> The TCP session seems to be totally broken. After the client sends
> 
> client > server [FIN,ACK] Seq=421 Ack=1449 Len=0
> 
> it should send the RST packet with Seq=422 and not Seq=421. The RST 
> segment won't be accepted by the server.

Okay. The client is definitely sending exactly that (I'm pretty sure 
it's a SonicWall firewall). That explains why the connection stays in 
the CLOSE_WAIT state according to netstat.

So the problem can be described as:

Some buggy clients send an out-of-sequence RST. When that happens, 
conntrack forgets about the connection ten seconds later, even though 
the TCP stack doesn't.

If nf_conntrack_tcp_loose is set to 0, this gives clients a trivial way 
to bypass connlimit, because the client then has open connections that 
aren't counted.

If nf_conntrack_tcp_loose is set to 1, subsequent packets sent more than 
ten seconds later will result in conntrack seeing a new ESTABLISHED 
connection. Unfortunately, if the subsequent packets were merely TCP 
retransmits (which is likely), the "new connection" will not really 
exist. Connlimit counts a nonexistent connection as being open for five 
days until it times out.

Both of these outcomes are obviously undesirable. Any suggestions how to 
avoid this, or to minimize the impact?


> And I don't get the server either: after sending Ack=422 it can't send 
> Ack=421.
> 
> Is it a real TCP session recording or a mistyped one?

You're right; that was a typo on my part, for which I apologize. I had 
to retype it from Wireshark, and I copied the wrong line. The ten 
retransmitted packets at the end do indeed send Ack=422, just as you say 
they should.

(However, the client problem is not a typo. The client definitely did 
send Seq=421 in the RST, which explains why netstat shows the connection 
remaining in CLOSE_WAIT and why the server continues to retransmit packets.)

-- 
Robert L Mathews, Tiger Technologies     http://www.tigertech.net/


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-17  3:09   ` Robert L Mathews
@ 2009-05-20  5:16     ` Robert L Mathews
  2009-05-20  7:19       ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Robert L Mathews @ 2009-05-20  5:16 UTC (permalink / raw)
  To: netfilter

I haven't received any more followups on this, so I'll try to hack some 
solution around the problem.

But I will mention that I'm surprised that this didn't generate more 
discussion. Unless I'm confused (which is possible), sending 
out-of-sequence RST packets appears to be a trivial way to bypass connlimit.

It seems that all an attacker needs to do is send invalid RST packets 
with a sequence number one less than the last ACK received from the 
server. Then conntrack will forget about the connection, allowing the 
attacker to open as many connections as desired, regardless of connlimit 
limits.

I wrote a little perl script that I can leave running in the background 
on the client to send the necessary RST packets. In my testing, it does 
allow me to bypass connlimit restrictions on a server:

  http://www.tigertech.net/patches/rawip.pl

This seems to make connlimit less useful than I'd previously believed. 
Am I just misunderstanding something?

-- 
Robert L Mathews, Tiger Technologies     http://www.tigertech.net/


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20  5:16     ` Robert L Mathews
@ 2009-05-20  7:19       ` Jozsef Kadlecsik
  2009-05-20  7:31         ` Philip Craig
  2009-05-20 20:24         ` Robert L Mathews
  0 siblings, 2 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-20  7:19 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: netfilter

On Tue, 19 May 2009, Robert L Mathews wrote:

> But I will mention that I'm surprised that this didn't generate more
> discussion. Unless I'm confused (which is possible), sending out-of-sequence
> RST packets appears to be a trivial way to bypass connlimit.
>
> It seems that all an attacker needs to do is send invalid RST packets with a
> sequence number one less than the last ACK received from the server. Then
> conntrack will forget about the connection, allowing the attacker to open as
> many connections as desired, regardless of connlimit limits.

Without TCP window tracking in conntrack, *any* RST segment (with proper 
src/dst ip/port, of course) would destroy the conntrack entry. With window 
tracking enabled (the default) we can maintain a window of the sequence 
numbers which are accepted and processed by conntrack. Due to the fact 
that the firewall sits in the middle and packets which have been seen by 
the firewall may get lost or even reordered in transit to the destination, 
it is impossible to calculate the *exact* window sizes of the two end 
points. Therefore the window in conntrack is wider and conntrack may process 
packets which otherwise are outside of the window of the receiver.
 
> I wrote a little perl script that I can leave running in the background on the
> client to send the necessary RST packets. In my testing, it does allow me to
> bypass connlimit restrictions on a server:
> 
>  http://www.tigertech.net/patches/rawip.pl
> 
> This seems to make connlimit less useful than I'd previously believed. Am I
> just misunderstanding something?

No, you are correct. If you want to eliminate the possibility to bypass 
connlimit with properly crafted RST segments, probably you should use the 
recent match and count the created NEW connections.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20  7:19       ` Jozsef Kadlecsik
@ 2009-05-20  7:31         ` Philip Craig
  2009-05-20  7:42           ` Jozsef Kadlecsik
  2009-05-20 20:24         ` Robert L Mathews
  1 sibling, 1 reply; 22+ messages in thread
From: Philip Craig @ 2009-05-20  7:31 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Robert L Mathews, netfilter

Jozsef Kadlecsik wrote:
> Without TCP window tracking in conntrack, *any* RST segment (with proper 
> src/dst ip/port, of course) would destroy the conntrack entry. With window 
> tracking enabled (the default) we can maintain a window of the sequence 
> numbers which are accepted and processed by conntrack. Due to the fact 
> that the firewall sits in the middle and packets which have been seen by 
> the firewall may get lost or even reordered in transit to the destination, 
> it is impossible to calculate the *exact* window sizes of the two end 
> points. Therefore the window in conntrack is wider and conntrack may process 
> packets which otherwise are outside of the window of the receiver.

Is this the same reason why the window tracking accepts pure acks
without checking the sequence?



* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20  7:31         ` Philip Craig
@ 2009-05-20  7:42           ` Jozsef Kadlecsik
  2009-05-20  8:06             ` Philip Craig
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-20  7:42 UTC (permalink / raw)
  To: Philip Craig; +Cc: Robert L Mathews, netfilter

On Wed, 20 May 2009, Philip Craig wrote:

> Jozsef Kadlecsik wrote:
> > Without TCP window tracking in conntrack, *any* RST segment (with proper 
> > src/dst ip/port, of course) would destroy the conntrack entry. With window 
> > tracking enabled (the default) we can maintain a window of the sequence 
> > numbers which are accepted and processed by conntrack. Due to the fact 
> > that the firewall sits in the middle and packets which have been seen by 
> > the firewall may get lost or even reordered in transit to the destination, 
> > it is impossible to calculate the *exact* window sizes of the two end 
> > points. Therefore the window in conntrack is wider and conntrack may process 
> > packets which otherwise are outside of the window of the receiver.
> 
> Is this the same reason why the window tracking accepts pure acks
> without checking the sequence?

You mean, when the ack flag is not set in the packet, we handle it as if it 
was set and had a proper ack field? What else could be done? :-)

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20  7:42           ` Jozsef Kadlecsik
@ 2009-05-20  8:06             ` Philip Craig
  2009-05-20  8:43               ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Philip Craig @ 2009-05-20  8:06 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Robert L Mathews, netfilter

Jozsef Kadlecsik wrote:
> You mean, when the ack flag is not set in the packet, we handle it as if it 
> was set and had a proper ack field? What else could be done? :-)

No, I mean when the ack flag is set, but there is no data,
as handled by this code:

         if (seq == end
            && (!tcph->rst
                || (seq == 0 && state->state == TCP_CONNTRACK_SYN_SENT)))
                /*
                 * Packets contains no data: we assume it is valid
                 * and check the ack value only.
                 * However RST segments are always validated by their
                 * SEQ number, except when seq == 0 (reset sent answering
                 * SYN.
                 */
                seq = end = sender->td_end;


We've encountered this in practice where a 'tcp accelerator' was
creating a new tcp connection with all the same port numbers, but
a different sequence number, and the tcp conntrack was accepting
a pure ack packet as part of the old connection, even though the
sequence number was wrong.  This setup won't work no matter what
tcp conntrack does of course, but it did complicate working out
what was going on.


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20  8:06             ` Philip Craig
@ 2009-05-20  8:43               ` Jozsef Kadlecsik
  0 siblings, 0 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-20  8:43 UTC (permalink / raw)
  To: Philip Craig; +Cc: Robert L Mathews, netfilter

On Wed, 20 May 2009, Philip Craig wrote:

> Jozsef Kadlecsik wrote:
> > You mean, when the ack flag is not set in the packet, we handle it as if it 
> > was set and had a proper ack field? What else could be done? :-)
> 
> No, I mean when the ack flag is set, but there is no data,
> as handled by this code:
> 
>          if (seq == end
>             && (!tcph->rst
>                 || (seq == 0 && state->state == TCP_CONNTRACK_SYN_SENT)))
>                 /*
>                  * Packets contains no data: we assume it is valid
>                  * and check the ack value only.
>                  * However RST segments are always validated by their
>                  * SEQ number, except when seq == 0 (reset sent answering
>                  * SYN.
>                  */
>                 seq = end = sender->td_end;
> 
> We've encountered this in practice where a 'tcp accelerator' was
> creating a new tcp connection with all the same port numbers, but
> a different sequence number, and the tcp conntrack was accepting
> a pure ack packet as part of the old connection, even though the
> sequence number was wrong.  This setup won't work no matter what
> tcp conntrack does of course, but it did complicate working out
> what was going on.

I see. The rationale behind not checking the sequence number in this case 
is that there's no data in the packet. If the packet is out of the window 
of the receiver, it'll answer with an ack with the proper seq, ack values.

But it can be argued that conntrack should still check the sequence number 
of dataless packets too :-).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20  7:19       ` Jozsef Kadlecsik
  2009-05-20  7:31         ` Philip Craig
@ 2009-05-20 20:24         ` Robert L Mathews
  2009-05-20 21:40           ` Jozsef Kadlecsik
  1 sibling, 1 reply; 22+ messages in thread
From: Robert L Mathews @ 2009-05-20 20:24 UTC (permalink / raw)
  To: netfilter

Jozsef Kadlecsik wrote:

> No, you are correct.

Hmmm, okay. I must say I'm a little surprised by that. I've seen plenty 
of people using connlimit and connbytes (for example) to protect against 
all kinds of things, and I don't think it's widely known that it's 
trivial for an attacker to bypass those restrictions.

Anyway, though:

> If you want to eliminate the possibility to bypass 
> connlimit with properly crafted RST segments, probably you should use the 
> recent match and count the created NEW connections.

My goal with connlimit is to limit simultaneous connections so that it 
prevents a single client from using up all the Apache process slots.

However, I don't want to limit how many connections they can open in a 
period of time.

For example, it's perfectly fine for someone to open, say, 500 
connections per minute, as long as they don't open more than 40 at a 
time. But I do need to block the 41st simultaneous connection even from 
people who open up connections very slowly, such as someone who opens up 
just five connections per hour and never closes them.

Is that something the "recent" feature can help with? I'm not seeing how 
that's possible, but perhaps I'm missing something.
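
The difference between the two kinds of limit can be sketched in Python. This is only an abstract model: RateCounter stands in for any "count NEW connections in a recent time window" approach and ConcurrencyCounter for a "count currently open connections" approach. Both class names, the 60-second window, and the two sample clients are assumptions made for illustration, not the behaviour of any particular iptables match.

```python
from collections import deque


class RateCounter:
    """Counts NEW connections seen within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.times = deque()

    def on_new(self, now):
        self.times.append(now)

    def count(self, now):
        # Drop entries that have aged out of the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        return len(self.times)


class ConcurrencyCounter:
    """Counts connections that are currently open."""

    def __init__(self):
        self.open = 0

    def on_new(self, now):
        self.open += 1

    def on_close(self, now):
        self.open -= 1

    def count(self, now):
        return self.open


def slow_client():
    """Opens one connection every 12 minutes (five per hour), never closes."""
    rate, conc = RateCounter(60), ConcurrencyCounter()
    for i in range(45):
        t = i * 720
        rate.on_new(t)
        conc.on_new(t)
    return rate.count(t), conc.count(t)


def fast_client():
    """Opens 500 connections in under a minute, closing each one at once."""
    rate, conc = RateCounter(60), ConcurrencyCounter()
    peak = 0
    for i in range(500):
        t = i * 0.1
        rate.on_new(t)
        conc.on_new(t)
        peak = max(peak, conc.count(t))
        conc.on_close(t)
    return rate.count(t), peak


print(slow_client())  # (connections in the last minute, connections open)
print(fast_client())  # (connections in the last minute, peak open at once)
```

In this model the slow client never looks busy to a rate counter yet ends up with 45 connections open, while the fast client trips any reasonable rate limit despite never holding more than one connection.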

Thanks again for the help!

-- 
Robert L Mathews, Tiger Technologies    http://www.tigertech.net/


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20 20:24         ` Robert L Mathews
@ 2009-05-20 21:40           ` Jozsef Kadlecsik
  2009-05-21  8:17             ` Anatoly Muliarski
  2009-05-21 15:31             ` Jozsef Kadlecsik
  0 siblings, 2 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-20 21:40 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: netfilter, netfilter-devel

[Cc-ing netfilter-devel]

On Wed, 20 May 2009, Robert L Mathews wrote:

> > It seems that all an attacker needs to do is send invalid RST packets 
> > with a sequence number one less than the last ACK received from the 
> > server. Then conntrack will forget about the connection, allowing the 
> > attacker to open as many connections as desired, regardless of 
> > connlimit limits.
> >
> > I wrote a little perl script that I can leave running in the 
> > background on the client to send the necessary RST packets. In my 
> > testing, it does allow me to bypass connlimit restrictions on a 
> > server:
> > 
> >  http://www.tigertech.net/patches/rawip.pl
> > 
> > This seems to make connlimit less useful than I'd previously believed. 
> > Am I just misunderstanding something?
> 
> > No, you are correct.
> 
> Hmmm, okay. I must say I'm a little surprised by that. I've seen plenty 
> of people using connlimit and connbytes (for example) to protect against 
> all kinds of things, and I don't think it's widely known that it's 
> trivial for an attacker to bypass those restrictions.

I think that's because it is *not* widely known. The credit is yours for 
discovering how to bypass connlimit/connbytes.
 
> > If you want to eliminate the possibility to bypass connlimit with 
> > properly crafted RST segments, probably you should use the recent 
> > match and count the created NEW connections.
> 
> My goal with connlimit is to limit simultaneous connections so that it
> prevents a single client from using up all the Apache process slots.
> 
> However, I don't want to limit how many connections they can open in a 
> period of time.
> 
> For example, it's perfectly fine for someone to open, say, 500 
> connections per minute, as long as they don't open more than 40 at a 
> time. But I do need to block the 41st simultaneous connection even from 
> people who open up connections very slowly, such as someone who opens up 
> just five connections per hour and never closes them.
> 
> Is that something the "recent" feature can help with? I'm not seeing how
> that's possible, but perhaps I'm missing something.

No, that's not possible with "recent".

Because connlimit/connbytes rely on conntrack, the latter should be 
"fixed". However I do not see any way to make it resistant against such 
attacks: if we shrink the window (by which algorithm?) we may block valid 
RST segments and thus cause connections to hang instead of terminating.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20 21:40           ` Jozsef Kadlecsik
@ 2009-05-21  8:17             ` Anatoly Muliarski
  2009-05-21  9:11               ` Jozsef Kadlecsik
  2009-05-21 15:31             ` Jozsef Kadlecsik
  1 sibling, 1 reply; 22+ messages in thread
From: Anatoly Muliarski @ 2009-05-21  8:17 UTC (permalink / raw)
  To: netfilter

I would like to add a few words.
Obviously the problem is in the conntrack code.
IMHO, to solve this issue the code should track the TCP sequence number
and check its correctness when an RST packet is received, before
deciding whether to remove the conntrack entry.

2009/5/21, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>:
>
> Because connlimit/connbytes rely on conntrack, the latter should be
> "fixed". However I do not see any way to make it resistant against such
> attacks: if we shrink the window (by which algorithm?) we may block valid
> RST segments and thus cause connections to hang instead of terminating.
>
> Best regards,
> Jozsef


-- 
Best regards
Anatoly Muliarski


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-21  8:17             ` Anatoly Muliarski
@ 2009-05-21  9:11               ` Jozsef Kadlecsik
  2009-05-21 18:07                 ` Robert L Mathews
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-21  9:11 UTC (permalink / raw)
  To: Anatoly Muliarski; +Cc: netfilter

On Thu, 21 May 2009, Anatoly Muliarski wrote:

> > Because connlimit/connbytes rely on conntrack, the latter should be
> > "fixed". However I do not see any way to make it resistant against such
> > attacks: if we shrink the window (by which algorithm?) we may block valid
> > RST segments and thus cause connections to hang instead of terminating.
>
> I would like to add a few words.
> Obviously the problem is in the conntrack code.
> IMHO, to solve this issue the code should track the TCP sequence number
> and check its correctness when an RST packet is received, before
> deciding whether to remove the conntrack entry.

The TCP sequence numbers *are* tracked and checked - but with the limit of 
a node being in the middle of the two communicating endpoints. That limit 
is physical and cannot be discarded.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-20 21:40           ` Jozsef Kadlecsik
  2009-05-21  8:17             ` Anatoly Muliarski
@ 2009-05-21 15:31             ` Jozsef Kadlecsik
  2009-05-21 18:45               ` Robert L Mathews
  1 sibling, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-21 15:31 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: netfilter, netfilter-devel

On Wed, 20 May 2009, Jozsef Kadlecsik wrote:

> Because connlimit/connbytes rely on conntrack, the latter should be 
> "fixed". However I do not see any way to make it resistant against such 
> attacks: if we shrink the window (by which algorithm?) we may block valid 
> RST segments and thus cause connections to hang instead of terminating.

OK, here is a patch. Could you test it with your script and in your 
environment?

The patch below introduces a new flag for TCP conntrack to mark that an 
RST segment was seen. If retransmitted packets are detected from the 
other direction after the RST segment was seen, the timeout of the 
conntrack entry is linearly increased up to a hardcoded value. Thus we 
can both catch the retransmitted packets and preserve the effectiveness 
of connlimit/connbytes.
---
diff --git a/include/linux/netfilter/nf_conntrack_tcp.h b/include/linux/netfilter/nf_conntrack_tcp.h
index 3066789..465d346 100644
--- a/include/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/linux/netfilter/nf_conntrack_tcp.h
@@ -35,6 +35,9 @@ enum tcp_conntrack {
 /* Has unacknowledged data */
 #define IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED	0x10
 
+/* Has seen RST */
+#define IP_CT_TCP_FLAG_RST_SEEN			0x20
+
 struct nf_ct_tcp_flags {
 	__u8 flags;
 	__u8 mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index b5ccf2b..fca7caa 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -84,6 +84,10 @@ static unsigned int tcp_timeouts[TCP_CONNTRACK_MAX] __read_mostly = {
 	[TCP_CONNTRACK_CLOSE]		= 10 SECS,
 };
 
+/* Max timeout when retransmitted packets detected after RST was seen
+   from the other direction */
+#define TCP_TIMEOUT_RETRANS_AFTER_RST		2 MINS
+
 #define sNO TCP_CONNTRACK_NONE
 #define sSS TCP_CONNTRACK_SYN_SENT
 #define sSR TCP_CONNTRACK_SYN_RECV
@@ -918,6 +922,8 @@ static int tcp_packet(struct nf_conn *ct,
 				  "nf_ct_tcp: invalid state ");
 		return -NF_ACCEPT;
 	case TCP_CONNTRACK_CLOSE:
+		if (index == TCP_RST_SET)
+			ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_RST_SEEN;
 		if (index == TCP_RST_SET
 		    && ((test_bit(IPS_SEEN_REPLY_BIT, &ct->status)
 			 && ct->proto.tcp.last_index == TCP_SYN_SET)
@@ -963,7 +969,12 @@ static int tcp_packet(struct nf_conn *ct,
 	    && new_state == TCP_CONNTRACK_FIN_WAIT)
 		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
 
-	if (ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans &&
+	if (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_RST_SEEN
+	    && ct->proto.tcp.retrans > 1)
+	    	timeout = min_t(unsigned int,
+	    			tcp_timeouts[sCL] * ct->proto.tcp.retrans,
+	    			TCP_TIMEOUT_RETRANS_AFTER_RST);
+	else if (ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans &&
 	    tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans)
 		timeout = nf_ct_tcp_timeout_max_retrans;
 	else if ((ct->proto.tcp.seen[0].flags | ct->proto.tcp.seen[1].flags) &
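
The timeout rule in the hunk above can be sketched in userspace as follows.
This is only an illustration of the arithmetic, not kernel code: the helper
name timeout_after_rst() is hypothetical, and it works in plain seconds,
whereas the kernel's SECS and MINS macros expand to jiffies.

```c
/* Values taken from the patch: the CLOSE timeout and the hardcoded cap,
 * expressed here in seconds instead of jiffies. */
#define TIMEOUT_CLOSE_SECS             10u
#define TIMEOUT_RETRANS_AFTER_RST_SECS 120u

/* Hypothetical helper: timeout chosen once `retrans` retransmissions
 * have been counted after a RST was seen from the other direction.
 * The patch only takes this branch when retrans > 1. */
static unsigned int timeout_after_rst(unsigned int retrans)
{
	unsigned int t = TIMEOUT_CLOSE_SECS * retrans;

	/* Linear growth, clamped to the cap (min_t() in the patch). */
	return t < TIMEOUT_RETRANS_AFTER_RST_SECS
		? t : TIMEOUT_RETRANS_AFTER_RST_SECS;
}
```

Under these assumptions, 2 retransmissions give 20 seconds, 6 give 60
seconds, and anything from 12 upwards stays at the 120 second cap.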


Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-21  9:11               ` Jozsef Kadlecsik
@ 2009-05-21 18:07                 ` Robert L Mathews
  0 siblings, 0 replies; 22+ messages in thread
From: Robert L Mathews @ 2009-05-21 18:07 UTC (permalink / raw)
  To: netfilter

Jozsef Kadlecsik wrote:

> The TCP sequence numbers *are* tracked and checked - but with the limit of 
> a node being in the middle of the two communicating endpoints. That limit 
> is physical and cannot be discarded.

That said, would it be safe to say that if conntrack has already seen a
certain sequence number in one direction, then a RST packet whose sequence
number is lower than that number + 1 is automatically invalid?

To take my original example, if the client sends a legitimate packet 
with sequence number 421, then sends a following RST packet that also 
has sequence number 421, the RST packet *must* be bogus, and will 
(hopefully) be ignored by the server.

I realize that in practice, when our node-in-the-middle forwards the two 
seq=421 packets to the server, the first ("real") packet could be lost. 
Or perhaps the two packets arrive at the server in the other order. 
Either case will cause the server to see the bogus RST packet before it 
sees the "real" packet, and the server will "incorrectly" accept the 
bogus RST. And I don't see a way for conntrack to detect whether this 
happened.

However, it seems much more likely that the server will "correctly" 
reject the RST. Perhaps conntrack should assume that, instead? It would 
be correct more often, and it's also less harmful if we're wrong: when 
we receive data that violates the TCP standard, it seems better for 
conntrack to decide that a connection is still open when it's not, than 
to decide that a connection is closed when it isn't.

-- 
Robert L Mathews, Tiger Technologies    http://www.tigertech.net/


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-21 15:31             ` Jozsef Kadlecsik
@ 2009-05-21 18:45               ` Robert L Mathews
  2009-05-22  4:32                 ` Anatoly Muliarski
  2009-05-22  7:42                 ` Jozsef Kadlecsik
  0 siblings, 2 replies; 22+ messages in thread
From: Robert L Mathews @ 2009-05-21 18:45 UTC (permalink / raw)
  To: netfilter; +Cc: netfilter-devel

Jozsef Kadlecsik wrote:
> On Wed, 20 May 2009, Jozsef Kadlecsik wrote:
> 
>> Because connlimit/connbytes rely on conntrack, the latter should be
>> "fixed". However I do not see any way to make it resistant against such
>> attacks: if we shrink the window (by which algorithm?) we may block valid
>> RST segments and thus cause connections to hang instead of terminating.
> 
> OK, here is a patch. Could you test it with your script and in your 
> environment?
> 
> The patch below introduces a new flag for TCP conntrack to mark that a RST
> segment was seen. If retransmitted packets are detected from the other
> direction after the RST segment was seen, the timeout of the conntrack
> entry is linearly increased up to a hardcoded value. Thus we can both
> catch the retransmitted packets and preserve the effectiveness of
> connlimit/connbytes.

Perhaps I'm misunderstanding, but I don't think this will fix it.

Although the original example I gave involved TCP retransmits that 
"reanimated" the connection, the problem is unfortunately not limited to 
that case, and there don't need to be any TCP retransmits involved. (The 
subject of this thread is now a little misleading and overly-specific, 
unfortunately.)

Here's the opening of a telnet connection that I injected a bogus RST 
packet into (the packet has sequence 522209353 instead of the correct 
522209354):

client.52665 > server.23: S 522209223:522209223(0) win 5840 <mss 1460,sackOK,nop,wscale 7>
server.23 > client.52665: S 3233007698:3233007698(0) ack 522209224 win 5792 <mss 1460,sackOK,nop,wscale 6>
client.52665 > server.23: . ack 3233007699 win 46 <nop,nop>
server.23 > client.52665: P 3233007699:3233007711(12) ack 522209224 win 91 <nop,nop>
client.52665 > server.23: . ack 3233007711 win 46 <nop,nop>
client.52665 > server.23: P 522209224:522209236(12) ack 3233007711 win 46 <nop,nop>
server.23 > client.52665: . ack 522209236 win 91 <nop,nop>
server.23 > client.52665: P 3233007711:3233007735(24) ack 522209236 win 91 <nop,nop>
client.52665 > server.23: . ack 3233007735 win 46 <nop,nop>
client.52665 > server.23: P 522209236:522209327(91) ack 3233007735 win 46 <nop,nop>
server.23 > client.52665: P 3233007735:3233007750(15) ack 522209327 win 91 <nop,nop>
client.52665 > server.23: P 522209327:522209351(24) ack 3233007750 win 46 <nop,nop>
server.23 > client.52665: P 3233007750:3233007753(3) ack 522209351 win 91 <nop,nop>
client.52665 > server.23: P 522209351:522209354(3) ack 3233007753 win 46 <nop,nop>
server.23 > client.52665: P 3233007753:3233007947(194) ack 522209354 win 91 <nop,nop>
client.52665 > server.23: R 522209353:522209353(0) win 65535
client.52665 > server.23: . ack 3233007947 win 54 <nop,nop>

At this point, the telnet connection with the server is fully 
established and waiting for me to type something (because the server 
ignored the bogus RST). But conntrack incorrectly considers it CLOSEd 
(because conntrack didn't ignore the RST). No retransmits were involved, 
though.
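
As a rough illustration of why the server ignores that RST, here is a
sketch of the classic receive-window acceptance test. It is an assumption
about how an endpoint might decide, not the code of any particular stack;
the helper names are hypothetical. The numbers come from the trace: the
server has acked 522209354, and its advertised window is taken as
91 << 6 = 5824 bytes, assuming tcpdump printed the unscaled window field
and the server's wscale of 6 applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a < b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: would a receiver that expects rcv_nxt and
 * advertises rcv_wnd bytes accept a RST carrying sequence number seq?
 * Assumes rcv_wnd > 0 and the plain in-window test
 * rcv_nxt <= seq < rcv_nxt + rcv_wnd. */
static bool rst_in_receive_window(uint32_t seq, uint32_t rcv_nxt,
				  uint32_t rcv_wnd)
{
	return !seq_before(seq, rcv_nxt) &&
	       seq_before(seq, rcv_nxt + rcv_wnd);
}
```

With rcv_nxt = 522209354 and rcv_wnd = 5824, the injected RST at
522209353 falls just below the window and is rejected, while a RST at
522209354 would be accepted.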

Even if nf_conntrack_tcp_loose is true, and conntrack treats subsequent 
packets as a new connection when I type something in telnet, I can make 
it forget about the connection again by sending another bogus RST. If I 
send a bogus RST after every legitimate packet, conntrack will almost 
always think the open connection is actually closed.

Since no retransmits are necessary, I don't think a solution that looks 
for retransmits will help, unfortunately.

-- 
Robert L Mathews, Tiger Technologies    http://www.tigertech.net/


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-21 18:45               ` Robert L Mathews
@ 2009-05-22  4:32                 ` Anatoly Muliarski
  2009-05-22  7:21                   ` Jozsef Kadlecsik
  2009-05-22  7:42                 ` Jozsef Kadlecsik
  1 sibling, 1 reply; 22+ messages in thread
From: Anatoly Muliarski @ 2009-05-22  4:32 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: netfilter

2009/5/21 Robert L Mathews <lists@tigertech.com>:
>
> Perhaps I'm misunderstanding, but I don't think this will fix it.
>
> Although the original example I gave involved TCP retransmits that
> "reanimated" the connection, the problem is unfortunately not limited to
> that case, and there don't need to be any TCP retransmits involved. (The
> subject of this thread is now a little misleading and overly-specific,
> unfortunately.)
>
> Here's the opening of a telnet connection that I injected a bogus RST packet
> into (the packet has sequence 522209353 instead of the correct 522209354):
>
> client.52665 > server.23: S 522209223:522209223(0) win 5840 <mss 1460,sackOK,nop,wscale 7>
> server.23 > client.52665: S 3233007698:3233007698(0) ack 522209224 win 5792 <mss 1460,sackOK,nop,wscale 6>
> client.52665 > server.23: . ack 3233007699 win 46 <nop,nop>
> server.23 > client.52665: P 3233007699:3233007711(12) ack 522209224 win 91 <nop,nop>
> client.52665 > server.23: . ack 3233007711 win 46 <nop,nop>
> client.52665 > server.23: P 522209224:522209236(12) ack 3233007711 win 46 <nop,nop>
> server.23 > client.52665: . ack 522209236 win 91 <nop,nop>
> server.23 > client.52665: P 3233007711:3233007735(24) ack 522209236 win 91 <nop,nop>
> client.52665 > server.23: . ack 3233007735 win 46 <nop,nop>
> client.52665 > server.23: P 522209236:522209327(91) ack 3233007735 win 46 <nop,nop>
> server.23 > client.52665: P 3233007735:3233007750(15) ack 522209327 win 91 <nop,nop>
> client.52665 > server.23: P 522209327:522209351(24) ack 3233007750 win 46 <nop,nop>
> server.23 > client.52665: P 3233007750:3233007753(3) ack 522209351 win 91 <nop,nop>
> client.52665 > server.23: P 522209351:522209354(3) ack 3233007753 win 46 <nop,nop>
> server.23 > client.52665: P 3233007753:3233007947(194) ack 522209354 win 91 <nop,nop>
> client.52665 > server.23: R 522209353:522209353(0) win 65535
> client.52665 > server.23: . ack 3233007947 win 54 <nop,nop>
>
> At this point, the telnet connection with the server is fully established
> and waiting for me to type something (because the server ignored the bogus
> RST). But conntrack incorrectly considers it CLOSEd (because conntrack
> didn't ignore the RST). No retransmits were involved, though.
>
> Even if nf_conntrack_tcp_loose is true, and conntrack treats subsequent
> packets as a new connection when I type something in telnet, I can make it
> forget about the connection again by sending another bogus RST. If I send a
> bogus RST after every legitimate packet, conntrack will almost always think
> the open connection is actually closed.
>
> Since no retransmits are necessary, I don't think a solution that looks for
> retransmits will help, unfortunately.
>
> --
> Robert L Mathews, Tiger Technologies    http://www.tigertech.net/

The problem is that a RST packet with a wrong sequence number
must be treated as INVALID.
IMHO, the conntrack code should be changed accordingly.

I wrote a fix (sorry, I'm not experienced with the kernel TCP stack;
correct me if I'm wrong somewhere).

*** nf_conntrack_proto_tcp.c    2009-05-22 07:22:43.000000000 +0300
--- /usr/src/linux/1/nf_conntrack_proto_tcp.c   2008-09-08 13:20:51.000000000 +0300
*************** static int tcp_packet(struct nf_conn *ct
*** 969,979 ****
                   problem case, so we can delete the conntrack
                   immediately.  --RR */
                if (th->rst) {
!                 if (ntohl(th->seq) < ct->proto.tcp.last_seq) {
!                   ct->proto.tcp.state = old_state;
!                   return -NF_DROP;
!                 }
!                       else if (del_timer(&ct->timeout))
                                ct->timeout.function((unsigned long)ct);
                        return NF_ACCEPT;
                }
--- 969,975 ----
                   problem case, so we can delete the conntrack
                   immediately.  --RR */
                if (th->rst) {
!                       if (del_timer(&ct->timeout))
                                ct->timeout.function((unsigned long)ct);
                        return NF_ACCEPT;
                }

-- 
Best regards
Anatoly Muliarski


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-22  4:32                 ` Anatoly Muliarski
@ 2009-05-22  7:21                   ` Jozsef Kadlecsik
  2009-05-22  8:26                     ` Anatoly Muliarski
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-22  7:21 UTC (permalink / raw)
  To: Anatoly Muliarski; +Cc: Robert L Mathews, netfilter

On Fri, 22 May 2009, Anatoly Muliarski wrote:

> The problem is that a RST packet with a wrong sequence number
> must be treated as INVALID.
> IMHO, the conntrack code should be changed accordingly.
> 
> I wrote a fix (sorry, I'm not experienced with the kernel TCP stack;
> correct me if I'm wrong somewhere).
> 
> *** nf_conntrack_proto_tcp.c    2009-05-22 07:22:43.000000000 +0300
> --- /usr/src/linux/1/nf_conntrack_proto_tcp.c   2008-09-08 13:20:51.000000000 +0300
> *************** static int tcp_packet(struct nf_conn *ct
> *** 969,979 ****
>                    problem case, so we can delete the conntrack
>                    immediately.  --RR */
>                 if (th->rst) {
> !                 if (ntohl(th->seq) < ct->proto.tcp.last_seq) {
> !                   ct->proto.tcp.state = old_state;
> !                   return -NF_DROP;
> !                 }
> !                       else if (del_timer(&ct->timeout))
>                                 ct->timeout.function((unsigned long)ct);
>                         return NF_ACCEPT;
>                 }
> --- 969,975 ----
>                    problem case, so we can delete the conntrack
>                    immediately.  --RR */
>                 if (th->rst) {
> !                       if (del_timer(&ct->timeout))
>                                 ct->timeout.function((unsigned long)ct);
>                         return NF_ACCEPT;
>                 }

That won't work: the packet whose sequence number was recorded in last_seq
may get lost in transit to the destination, and we may then receive a *valid*
RST with a sequence number less than the one in last_seq.

This is the main problem: we can never be sure that the packets seen
by the firewall really reach the destination, or that their order is preserved.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-21 18:45               ` Robert L Mathews
  2009-05-22  4:32                 ` Anatoly Muliarski
@ 2009-05-22  7:42                 ` Jozsef Kadlecsik
  1 sibling, 0 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-22  7:42 UTC (permalink / raw)
  To: Robert L Mathews; +Cc: netfilter, netfilter-devel

On Thu, 21 May 2009, Robert L Mathews wrote:

> Jozsef Kadlecsik wrote:
> > On Wed, 20 May 2009, Jozsef Kadlecsik wrote:
> > 
> > > Because connlimit/connbytes rely on conntrack, the latter should be
> > > "fixed". However I do not see any way to make it resistant against such
> > > attacks: if we shrink the window (by which algorithm?) we may block valid
> > > RST segments and thus cause connections to hang instead of terminating.
> > 
> > OK, here is a patch. Could you test it with your script and in your
> > environment?
> > 
> > The patch below introduces a new flag for TCP conntrack to mark that a RST
> > segment was seen. If retransmitted packets are detected from the other
> > direction after the RST segment was seen, the timeout of the conntrack entry
> > is linearly increased up to a hardcoded value. Thus we can both catch the
> > retransmitted packets and preserve the effectiveness of connlimit/connbytes.
> 
> Perhaps I'm misunderstanding, but I don't think this will fix it.
> 
> Although the original example I gave involved TCP retransmits that
> "reanimated" the connection, the problem is unfortunately not limited to that
> case, and there don't need to be any TCP retransmits involved. (The subject of
> this thread is now a little misleading and overly-specific, unfortunately.)

Yes, I was concentrating on the original example.
 
> Here's the opening of a telnet connection that I injected a bogus RST packet
> into (the packet has sequence 522209353 instead of the correct 522209354):
> 
> client.52665 > server.23: S 522209223:522209223(0) win 5840 <mss 1460,sackOK,nop,wscale 7>
> server.23 > client.52665: S 3233007698:3233007698(0) ack 522209224 win 5792 <mss 1460,sackOK,nop,wscale 6>
> client.52665 > server.23: . ack 3233007699 win 46 <nop,nop>
> server.23 > client.52665: P 3233007699:3233007711(12) ack 522209224 win 91 <nop,nop>
> client.52665 > server.23: . ack 3233007711 win 46 <nop,nop>
> client.52665 > server.23: P 522209224:522209236(12) ack 3233007711 win 46 <nop,nop>
> server.23 > client.52665: . ack 522209236 win 91 <nop,nop>
> server.23 > client.52665: P 3233007711:3233007735(24) ack 522209236 win 91 <nop,nop>
> client.52665 > server.23: . ack 3233007735 win 46 <nop,nop>
> client.52665 > server.23: P 522209236:522209327(91) ack 3233007735 win 46 <nop,nop>
> server.23 > client.52665: P 3233007735:3233007750(15) ack 522209327 win 91 <nop,nop>
> client.52665 > server.23: P 522209327:522209351(24) ack 3233007750 win 46 <nop,nop>
> server.23 > client.52665: P 3233007750:3233007753(3) ack 522209351 win 91 <nop,nop>
> client.52665 > server.23: P 522209351:522209354(3) ack 3233007753 win 46 <nop,nop>
> server.23 > client.52665: P 3233007753:3233007947(194) ack 522209354 win 91 <nop,nop>
> client.52665 > server.23: R 522209353:522209353(0) win 65535
> client.52665 > server.23: . ack 3233007947 win 54 <nop,nop>
> 
> At this point, the telnet connection with the server is fully established and
> waiting for me to type something (because the server ignored the bogus RST).
> But conntrack incorrectly considers it CLOSEd (because conntrack didn't ignore
> the RST). No retransmits were involved, though.
> 
> Even if nf_conntrack_tcp_loose is true, and conntrack treats subsequent
> packets as a new connection when I type something in telnet, I can make it
> forget about the connection again by sending another bogus RST. If I send a
> bogus RST after every legitimate packet, conntrack will almost always think
> the open connection is actually closed.

But if nf_conntrack_tcp_loose is disabled, then the conntrack entry is
destroyed and the connection won't be "picked up" again when the normal
packets arrive after the bogus RST.

I do not think that it's a problem that an attacker can destroy conntrack 
entries by forged RST segments: it cannot be prevented. However the 
attacker *can* open up a new connection again and thus overcome the 
limitation of connlimit.
 
Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-22  7:21                   ` Jozsef Kadlecsik
@ 2009-05-22  8:26                     ` Anatoly Muliarski
  2009-05-22  8:54                       ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Anatoly Muliarski @ 2009-05-22  8:26 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netfilter

2009/5/22, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>:
> On Fri, 22 May 2009, Anatoly Muliarski wrote:
>
> > The problem is that a RST packet with a wrong sequence number
> > must be treated as INVALID.
> > IMHO, the conntrack code should be changed accordingly.
> >
> > I wrote a fix (sorry, I'm not experienced with the kernel TCP stack;
> > correct me if I'm wrong somewhere).
> >
> > *** nf_conntrack_proto_tcp.c    2009-05-22 07:22:43.000000000 +0300
> > --- /usr/src/linux/1/nf_conntrack_proto_tcp.c   2008-09-08 13:20:51.000000000 +0300
> > *************** static int tcp_packet(struct nf_conn *ct
> > *** 969,979 ****
> >                    problem case, so we can delete the conntrack
> >                    immediately.  --RR */
> >                 if (th->rst) {
> > !                 if (ntohl(th->seq) < ct->proto.tcp.last_seq) {
> > !                   ct->proto.tcp.state = old_state;
> > !                   return -NF_DROP;
> > !                 }
> > !                       else if (del_timer(&ct->timeout))
> >                                 ct->timeout.function((unsigned long)ct);
> >                         return NF_ACCEPT;
> >                 }
> > --- 969,975 ----
> >                    problem case, so we can delete the conntrack
> >                    immediately.  --RR */
> >                 if (th->rst) {
> > !                       if (del_timer(&ct->timeout))
> >                                 ct->timeout.function((unsigned long)ct);
> >                         return NF_ACCEPT;
> >                 }
>
> That won't work: the packet whose sequence number was recorded in last_seq
> may get lost in transit to the destination, and we may then receive a *valid*
> RST with a sequence number less than the one in last_seq.
>
> This is the main problem: we can never be sure that the packets seen
> by the firewall really reach the destination, or that their order is preserved.
>
> Best regards,
> Jozsef

OK.
We could save the LAST sequence number as the current one.
So we keep the connection, mark the current RST as invalid, and
react correctly to the following ones. Unfortunately this does not
solve the main problem: we are unable to know whether the received
sequence number is valid or not. As a vague idea: we could track the ack
number from the other direction and so keep the last delivered sequence
number. What do you think about it?

I added a line to my patch.


*** nf_conntrack_proto_tcp.c    2009-05-22 11:00:29.000000000 +0300
--- /usr/src/linux/1/nf_conntrack_proto_tcp.c   2008-09-08 13:20:51.000000000 +0300
*************** static int tcp_packet(struct nf_conn *ct
*** 969,980 ****
                   problem case, so we can delete the conntrack
                   immediately.  --RR */
                if (th->rst) {
!                 if (ntohl(th->seq) < ct->proto.tcp.last_seq) {
!                   ct->proto.tcp.state = old_state;
!                   ct->proto.tcp.last_seq = ntohl(th->seq);
!                   return -NF_DROP;
!                 }
!                       else if (del_timer(&ct->timeout))
                                ct->timeout.function((unsigned long)ct);
                        return NF_ACCEPT;
                }
--- 969,975 ----
                   problem case, so we can delete the conntrack
                   immediately.  --RR */
                if (th->rst) {
!                       if (del_timer(&ct->timeout))
                                ct->timeout.function((unsigned long)ct);
                        return NF_ACCEPT;
                }



-- 
Best regards
Anatoly Muliarski


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-22  8:26                     ` Anatoly Muliarski
@ 2009-05-22  8:54                       ` Jozsef Kadlecsik
  2009-05-22 11:27                         ` Jozsef Kadlecsik
  0 siblings, 1 reply; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-22  8:54 UTC (permalink / raw)
  To: Anatoly Muliarski; +Cc: netfilter, netfilter-devel

On Fri, 22 May 2009, Anatoly Muliarski wrote:

> > This is the main problem: we can never be sure that the packets seen
> > by the firewall really reach the destination, or that their order is preserved.

> We could save the LAST sequence number as the current one.
> So we keep the connection, mark the current RST as invalid, and
> react correctly to the following ones. Unfortunately this does not
> solve the main problem: we are unable to know whether the received
> sequence number is valid or not. As a vague idea: we could track the ack
> number from the other direction and so keep the last delivered sequence
> number. What do you think about it?

Relying on the last ACK received from the other direction looks promising.
We record the last (highest) ACK sent by each endpoint, which guarantees
that the data it acks was indeed received. And we accept a RST segment
only if it's in the window we calculate (wider than the destination's) AND
equal to or higher than the saved last ACK from the other direction.

The only downside is that new fields must be added to struct ip_ct_tcp.
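
The rule described above can be sketched in userspace like this. It is
only an illustration under stated assumptions: struct dir_state and the
helpers record_ack() and rst_plausible() are hypothetical names, not the
fields or functions of struct ip_ct_tcp, and the window check is reduced
to a boolean that the caller supplies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a < b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical per-direction state; names are illustrative only. */
struct dir_state {
	uint32_t max_ack;	/* highest ACK number this endpoint sent */
	bool	 max_ack_set;	/* false until the first ACK is recorded */
};

/* Record an ACK number sent by this endpoint, keeping the highest. */
static void record_ack(struct dir_state *s, uint32_t ack)
{
	if (!s->max_ack_set || seq_before(s->max_ack, ack)) {
		s->max_ack = ack;
		s->max_ack_set = true;
	}
}

/* A RST from the peer is plausible only if it passed the tracked
 * window check AND its sequence number is not below what the
 * receiving endpoint has already acknowledged. */
static bool rst_plausible(const struct dir_state *receiver,
			  uint32_t rst_seq, bool in_tracked_window)
{
	if (!in_tracked_window)
		return false;
	if (receiver->max_ack_set && seq_before(rst_seq, receiver->max_ack))
		return false;
	return true;
}
```

Fed with the server's ACKs from the telnet trace (highest one 522209354),
this sketch rejects the injected RST at 522209353 and would still accept
a RST at 522209354.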

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: conntrack and RSTs received during CLOSE_WAIT
  2009-05-22  8:54                       ` Jozsef Kadlecsik
@ 2009-05-22 11:27                         ` Jozsef Kadlecsik
  0 siblings, 0 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2009-05-22 11:27 UTC (permalink / raw)
  To: Robert L Mathews, Anatoly Muliarski; +Cc: netfilter, netfilter-devel

On Fri, 22 May 2009, Jozsef Kadlecsik wrote:

> On Fri, 22 May 2009, Anatoly Muliarski wrote:
> 
> > > This is the main problem: we can never be sure that the packets seen
> > > by the firewall really reach the destination, or that their order is preserved.
> 
> > We could save the LAST sequence number as the current one.
> > So we keep the connection, mark the current RST as invalid, and
> > react correctly to the following ones. Unfortunately this does not
> > solve the main problem: we are unable to know whether the received
> > sequence number is valid or not. As a vague idea: we could track the ack
> > number from the other direction and so keep the last delivered sequence
> > number. What do you think about it?
> 
> Relying on the last ACK received from the other direction looks promising.
> We record the last (highest) ACK sent by each endpoint, which guarantees
> that the data it acks was indeed received. And we accept a RST segment
> only if it's in the window we calculate (wider than the destination's) AND
> equal to or higher than the saved last ACK from the other direction.
> 
> The only downside is that new fields must be added to struct ip_ct_tcp.

So here is the patch which adds checking of RST segments against the highest
ack we have seen from the other direction. I tested it with your script and
conntrack could resist receiving bogus RST segments.

diff --git a/include/linux/netfilter/nf_conntrack_tcp.h b/include/linux/netfilter/nf_conntrack_tcp.h
index 3066789..b2f384d 100644
--- a/include/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/linux/netfilter/nf_conntrack_tcp.h
@@ -35,6 +35,9 @@ enum tcp_conntrack {
 /* Has unacknowledged data */
 #define IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED	0x10
 
+/* The field td_maxack has been set */
+#define IP_CT_TCP_FLAG_MAXACK_SET		0x20
+
 struct nf_ct_tcp_flags {
 	__u8 flags;
 	__u8 mask;
@@ -46,6 +49,7 @@ struct ip_ct_tcp_state {
 	u_int32_t	td_end;		/* max of seq + len */
 	u_int32_t	td_maxend;	/* max of ack + max(win, 1) */
 	u_int32_t	td_maxwin;	/* max(win) */
+	u_int32_t	td_maxack;	/* max of ack */
 	u_int8_t	td_scale;	/* window scale factor */
 	u_int8_t	flags;		/* per direction options */
 };
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index b5ccf2b..97a6e93 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -634,6 +634,14 @@ static bool tcp_in_window(const struct nf_conn *ct,
 			sender->td_end = end;
 			sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
 		}
+		if (tcph->ack) {
+			if (!(sender->flags & IP_CT_TCP_FLAG_MAXACK_SET)) {
+				sender->td_maxack = ack;
+				sender->flags |= IP_CT_TCP_FLAG_MAXACK_SET;
+			} else if (after(ack, sender->td_maxack))
+				sender->td_maxack = ack;
+		}
+
 		/*
 		 * Update receiver data.
 		 */
@@ -919,6 +927,16 @@ static int tcp_packet(struct nf_conn *ct,
 		return -NF_ACCEPT;
 	case TCP_CONNTRACK_CLOSE:
 		if (index == TCP_RST_SET
+		    && (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET)
+		    && before(ntohl(th->seq), ct->proto.tcp.seen[!dir].td_maxack)) {
+			/* Invalid RST  */
+			write_unlock_bh(&tcp_lock);
+			if (LOG_INVALID(net, IPPROTO_TCP))
+				nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
+					  "nf_ct_tcp: invalid RST ");
+			return -NF_ACCEPT;
+		}
+		if (index == TCP_RST_SET
 		    && ((test_bit(IPS_SEEN_REPLY_BIT, &ct->status)
 			 && ct->proto.tcp.last_index == TCP_SYN_SET)
 			|| (!test_bit(IPS_ASSURED_BIT, &ct->status)
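
A note on the comparisons: the patch above uses before() and after()
rather than plain < and >, which matters because TCP sequence numbers wrap
around at 2^32. The sketch below restates that arithmetic with userspace
types so it stands alone; seq_before(), seq_after() and naive_less() are
names chosen for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed-difference comparison: seq1 is "before" seq2 when the
 * 32-bit difference, read as a signed value, is negative. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return seq_before(seq2, seq1);
}

/* Plain unsigned comparison, shown only for contrast. */
static bool naive_less(uint32_t seq1, uint32_t seq2)
{
	return seq1 < seq2;
}
```

For example, if the highest ack seen is 0xFFFFFFF0 and a RST arrives with
sequence number 0x00000010 (32 bytes later, past the wrap), naive_less()
would call the RST older, while seq_before() correctly does not.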

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


end of thread, other threads:[~2009-05-22 11:27 UTC | newest]

Thread overview: 22+ messages
2009-05-15 22:10 conntrack and RSTs received during CLOSE_WAIT Robert L Mathews
2009-05-16 21:57 ` Jozsef Kadlecsik
2009-05-17  3:09   ` Robert L Mathews
2009-05-20  5:16     ` Robert L Mathews
2009-05-20  7:19       ` Jozsef Kadlecsik
2009-05-20  7:31         ` Philip Craig
2009-05-20  7:42           ` Jozsef Kadlecsik
2009-05-20  8:06             ` Philip Craig
2009-05-20  8:43               ` Jozsef Kadlecsik
2009-05-20 20:24         ` Robert L Mathews
2009-05-20 21:40           ` Jozsef Kadlecsik
2009-05-21  8:17             ` Anatoly Muliarski
2009-05-21  9:11               ` Jozsef Kadlecsik
2009-05-21 18:07                 ` Robert L Mathews
2009-05-21 15:31             ` Jozsef Kadlecsik
2009-05-21 18:45               ` Robert L Mathews
2009-05-22  4:32                 ` Anatoly Muliarski
2009-05-22  7:21                   ` Jozsef Kadlecsik
2009-05-22  8:26                     ` Anatoly Muliarski
2009-05-22  8:54                       ` Jozsef Kadlecsik
2009-05-22 11:27                         ` Jozsef Kadlecsik
2009-05-22  7:42                 ` Jozsef Kadlecsik
