* [PATCH] TCP window tracking over-window handling
@ 2005-01-28 23:43 Phil Oester
2005-02-02 9:46 ` Jozsef Kadlecsik
0 siblings, 1 reply; 7+ messages in thread
From: Phil Oester @ 2005-01-28 23:43 UTC (permalink / raw)
To: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1625 bytes --]
This was a tough one to track down, and also tough to explain...
In cases of packet loss, some broken TCP stacks will send data over the
window of the receiver. Then, on the next packet they will resend the
missing packet, then move on.
The current window tracking code 'ignores' this window overage by resetting
the end of the packet to the theoretical maximum (corrects the math of the
stupid sender).
Unfortunately, the receiver does accept this over-the-window packet, and does
increment its ack seq. So on the next ack after receiving the missing
packet, the receiver acks with the ack seq it knows about, which is higher
than that which the window tracking code thinks it has seen. Consequently,
it complains that the ACK is over the upper bound, and does not adjust
the window.
After this, communications cease, as each new packet from the sender is
dropped by the window tracking code -- it thinks it is over the window
of the receiver.
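To make the mismatch concrete, here is a minimal C sketch (not the kernel code) that replays the two steps described above. seq_before() and seq_after() mimic the wraparound-safe before()/after() comparisons used by the tracking code; tracked_end() and ack_within_upper_bound() are hypothetical helper names introduced only for this illustration, and all other checks are left out.

```c
#include <stdint.h>

/* Wraparound-safe sequence comparisons, in the style of before()/after(). */
static int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int seq_after(uint32_t a, uint32_t b)  { return seq_before(b, a); }

/*
 * Hypothetical helper: the segment end the tracker records for data
 * [seq, end). With trim != 0 it mimics the current behaviour of cutting
 * the end back to td_maxend when the segment straddles the right edge.
 */
static uint32_t tracked_end(uint32_t seq, uint32_t end,
                            uint32_t td_maxend, int trim)
{
    if (trim && seq_after(end, td_maxend) && seq_before(seq, td_maxend))
        end = td_maxend;
    return end;
}

/*
 * Hypothetical helper for the upper ACK bound: an ACK is acceptable only
 * if it does not go beyond the highest data end recorded for the peer.
 */
static int ack_within_upper_bound(uint32_t ack, uint32_t peer_td_end)
{
    return seq_before(ack, peer_td_end + 1);
}
```

With the values from the dump below (maxend 3692914622, over-window segment 3692913595:3692914975, later ACK 3692914975), tracked_end(..., 1) gives 3692914622, so ack_within_upper_bound(3692914975, 3692914622) returns 0 and the ACK is reported as over the upper bound. Had the full end 3692914975 been recorded, the same check would return 1.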
So, we have seen situations where large ftp transfers stop midstream, but
smaller transfers complete fine (they got lucky). I've attached a tcpdump
log at the bottom of this message with some annotation and conntrack
debugging inserted in appropriate locations for the interested.
The solution IMO is to stop correcting the math of the sender -- there is no
benefit to doing so. The attached patch removes that correction (and also
clarifies the language in some debugging code).
Patrick: this is incremental to the retransmission handling patch you have
already queued up. Like that patch, I think this should go into 2.6.11.
Phil
Signed-off-by: Phil Oester <kernel@linuxace.com>
[-- Attachment #2: patch-overwindow --]
[-- Type: text/plain, Size: 1285 bytes --]
diff -ru linux-orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c linux-testdellfw/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-01-28 17:48:10.620973992 -0500
+++ linux-testdellfw/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-01-28 17:54:02.799434728 -0500
@@ -622,7 +622,6 @@
/* Ignore data over the right edge of the receiver's window. */
if (after(end, sender->td_maxend) &&
before(seq, sender->td_maxend)) {
- end = sender->td_maxend;
if (*index == TCP_FIN_SET)
*index = TCP_ACK_SET;
}
@@ -691,9 +690,9 @@
after(seq, sender->td_end - receiver->td_maxwin - 1) ?
before(sack, receiver->td_end + 1) ?
after(ack, receiver->td_end - MAXACKWINDOW(sender)) ? "BUG"
- : "ACK is under the lower bound (possibly overly delayed ACK)"
- : "ACK is over the upper bound (ACKed data has never seen yet)"
- : "SEQ is under the lower bound (retransmitted already ACKed data)"
+ : "ACK is under the lower bound (possible overly delayed ACK)"
+ : "ACK is over the upper bound (ACKed data not seen yet)"
+ : "SEQ is under the lower bound (already ACKed data retransmitted)"
: "SEQ is over the upper bound (over the window of the receiver)");
res = ip_ct_tcp_be_liberal && !tcph->rst;
[-- Attachment #3: dump --]
[-- Type: text/plain, Size: 6032 bytes --]
00:29:51.936687 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692880475:3692881855(1380) ack 4116033054 win 8280
00:29:51.936767 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692865295 win 32767
----> packet lost here
00:29:51.977254 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692865295:3692866675(1380) ack 4116033054 win 8280
00:29:51.977340 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.014823 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692883235:3692884615(1380) ack 4116033054 win 8280
00:29:52.014937 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.055532 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692884615:3692885995(1380) ack 4116033054 win 8280
00:29:52.055613 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.055676 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692885995:3692887375(1380) ack 4116033054 win 8280
00:29:52.055758 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.055960 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692887375:3692888755(1380) ack 4116033054 win 8280
00:29:52.056036 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.056248 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692888755:3692890135(1380) ack 4116033054 win 8280
00:29:52.056324 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.093099 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692890135:3692891515(1380) ack 4116033054 win 8280
00:29:52.093178 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.134237 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692891515:3692892895(1380) ack 4116033054 win 8280
00:29:52.134316 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.134525 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692892895:3692894275(1380) ack 4116033054 win 8280
00:29:52.134604 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.171234 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692894275:3692895655(1380) ack 4116033054 win 8280
00:29:52.171314 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.212371 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692895655:3692897035(1380) ack 4116033054 win 8280
00:29:52.212450 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.212800 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692897035:3692898415(1380) ack 4116033054 win 8280
00:29:52.212876 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.249368 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692898415:3692899795(1380) ack 4116033054 win 8280
00:29:52.249448 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.290649 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692899795:3692901175(1380) ack 4116033054 win 8280
00:29:52.290736 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.291077 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692901175:3692902555(1380) ack 4116033054 win 8280
00:29:52.291158 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.327645 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692902555:3692903935(1380) ack 4116033054 win 8280
00:29:52.327729 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.368927 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692903935:3692905315(1380) ack 4116033054 win 8280
00:29:52.369020 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.369214 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692905315:3692906695(1380) ack 4116033054 win 8280
00:29:52.369293 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.405923 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692906695:3692908075(1380) ack 4116033054 win 8280
00:29:52.406004 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.447204 IP x.x.113.7.2877 > x.x.11.14.32773: P 3692908075:3692909455(1380) ack 4116033054 win 8280
00:29:52.447286 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.447633 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692909455:3692910835(1380) ack 4116033054 win 8280
00:29:52.447712 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.484343 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692910835:3692912215(1380) ack 4116033054 win 8280
00:29:52.484424 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
00:29:52.525338 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692912215:3692913595(1380) ack 4116033054 win 8280
00:29:52.525435 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
----> sender sends data over the window here. tracking code adjusts end to maxend
00:29:52.525910 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692913595:3692914975(1380) ack 4116033054 win 8280
00:29:52.656511 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692881855 win 32767
----> sender resends the missing packet:
00:29:53.333968 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692881855:3692883235(1380) ack 4116033054 win 8280
----> receiver acks up to the full range it has received, which now does not match tracking code:
00:29:53.811731 IP x.x.11.14.32773 > x.x.113.7.2877: . ack 3692914975 win 32767
----> kernel complains, and we see it believes the max data seen is 3692914622
Jan 28 00:29:53 fw02 kernel: ip_ct_tcp: ACK is over the upper bound (ACKed data has never seen yet) IN= OUT= SRC=x.x.11.14 DST=x.x.113.7 LEN=40 TOS=0x08 PREC=0x00 TTL=64 ID=48752 DF PROTO=TCP SPT=32773 DPT=2877 SEQ=4116033054 ACK=3692914975 WINDOW=32767 RES=0x00 ACK URGP=0
Jan 28 00:29:53 fw02 kernel: tcp_in_window: res=0 sender end=4116033054 maxend=4116041334 maxwin=32767 receiver end=3692914622 maxend=3692914622 maxwin=8280
----> all is lost now...sender cannot send any packets which appear legal to tracking code:
00:29:53.889995 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692914975:3692916355(1380) ack 4116033054 win 8280
00:29:54.372055 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692916355:3692917735(1380) ack 4116033054 win 8280
00:29:56.286994 IP x.x.113.7.2877 > x.x.11.14.32773: . 3692914975:3692916355(1380) ack 4116033054 win 8280
* Re: [PATCH] TCP window tracking over-window handling
2005-01-28 23:43 [PATCH] TCP window tracking over-window handling Phil Oester
@ 2005-02-02 9:46 ` Jozsef Kadlecsik
2005-02-02 16:00 ` Phil Oester
0 siblings, 1 reply; 7+ messages in thread
From: Jozsef Kadlecsik @ 2005-02-02 9:46 UTC (permalink / raw)
To: Phil Oester, Patrick McHardy; +Cc: netfilter-devel
Hi Phil,
On Fri, 28 Jan 2005, Phil Oester wrote:
> This was a tough one to track down, and also tough to explain...
>
> In cases of packet loss, some broken TCP stacks will send data over the
> window of the receiver. Then, on the next packet they will resend the
> missing packet, then move on.
>
> The current window tracking code 'ignores' this window overage by resetting
> the end of the packet to the theoretical maximum (corrects the math of the
> stupid sender).
>
> Unfortunately, the receiver does accept this over-the-window packet, and does
> increment its ack seq. So on the next ack after receiving the missing
> packet, the receiver acks with the ack seq it knows about, which is higher
> than that which the window tracking code thinks it has seen. Consequently,
> it complains that the ACK is over the upper bound, and does not adjust
> the window.
That implies then that the receiver is broken as well, by accepting and
ack-ing out of window segments. But it is true, we anticipate the
behaviour of the receiver here, which we shouldn't.
> After this, communications cease, as each new packet from the sender is
> dropped by the window tracking code -- it thinks it is over the window
> of the receiver.
>
> So, we have seen situations where large ftp transfers stop midstream, but
> smaller transfers complete fine (they got lucky). I've attached a tcpdump
> log at the bottom of this message with some annotation and conntrack
> debugging inserted in appropriate locations for the interested.
Interesting indeed!
> The solution IMO is to stop correcting the math of the sender -- there is no
> benefit to doing so. The attached patch removes that correction (and also
> clarifies the language in some debugging code).
The current code follows closely TCP/IP Illustrated vol 2, p. 954: Trim
Segment so Data is Within Window.
Do you know the OS of the communicating parties? Weren't window scaling or
SACK negotiated?
With your proposed patch, we'd actually drop the oow segment. Could you
check that it won't cause problems (besides more logging generated :-)?
Best regards,
Jozsef
-
E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary
* Re: [PATCH] TCP window tracking over-window handling
2005-02-02 9:46 ` Jozsef Kadlecsik
@ 2005-02-02 16:00 ` Phil Oester
2005-02-02 20:44 ` Jozsef Kadlecsik
0 siblings, 1 reply; 7+ messages in thread
From: Phil Oester @ 2005-02-02 16:00 UTC (permalink / raw)
To: Jozsef Kadlecsik; +Cc: netfilter-devel, Patrick McHardy
On Wed, Feb 02, 2005 at 10:46:01AM +0100, Jozsef Kadlecsik wrote:
> That implies then that the receiver is broken as well, by accepting and
> ack-ing out of window segments. But it is true, we anticipate the
> behaviour of the receiver here, which we shouldn't.
The receiver was a linux 2.6.10 box, so not an uncommon OS ;-) The original
problem was noted by clients on Win2K and WinXP against the same FTP server.
> The current code follows closely TCP/IP Illustrated vol 2, p. 954: Trim
> Segment so Data is Within Window.
>
> Do you know the OS of the communicating parties? Weren't window scaling or
> SACK negotiated?
The FTP server was an NT 4.0 box running IIS 3.0 ftp service. As noted
above, receiver was the Linux firewall itself. There was no window
scaling or SACK involved.
> With your proposed patch, we'd actually drop the oow segment. Could you
> check that it won't cause problems (besides more logging generated :-)?
I agree -- the oow segment is dropped, but this at least doesn't break
the communication. Without this patch, I cannot complete a large (5 MB)
download from this server. With this patch, it never fails.
Reviewing the ipfilter code, it doesn't seem the author included this check.
So what was the rationale for including it in the netfilter version? I
can't think of what it is protecting us from.
Phil
* Re: [PATCH] TCP window tracking over-window handling
2005-02-02 16:00 ` Phil Oester
@ 2005-02-02 20:44 ` Jozsef Kadlecsik
2005-02-02 22:35 ` Phil Oester
2005-02-07 10:32 ` Jozsef Kadlecsik
0 siblings, 2 replies; 7+ messages in thread
From: Jozsef Kadlecsik @ 2005-02-02 20:44 UTC (permalink / raw)
To: Phil Oester; +Cc: netfilter-devel, Patrick McHardy
On Wed, 2 Feb 2005, Phil Oester wrote:
> On Wed, Feb 02, 2005 at 10:46:01AM +0100, Jozsef Kadlecsik wrote:
> > That implies then that the receiver is broken as well, by accepting and
> > ack-ing out of window segments. But it is true, we anticipate the
> > behaviour of the receiver here, which we shouldn't.
>
> The receiver was a linux 2.6.10 box, so not an uncommon OS ;-) The original
> problem was noted by clients on Win2K and WinXP against the same FTP server.
Then I'd like all the more to understand how it can happen.
> > The current code follows closely TCP/IP Illustrated vol 2, p. 954: Trim
> > Segment so Data is Within Window.
> >
> > Do you know the OS of the communicating parties? Weren't window scaling or
> > SACK negotiated?
>
> The FTP server was an NT 4.0 box running IIS 3.0 ftp service. As noted
> above, receiver was the Linux firewall itself. There was no window
> scaling or SACK involved.
Then it can't be due to some window scaling/SACK handling bug, at least.
> > With your proposed patch, we'd actually drop the oow segment. Could you
> > check that it won't cause problems (besides more logging generated :-)?
>
> I agree -- the oow segment is dropped, but this at least doesn't break
> the communication. Without this patch, I cannot complete a large (5 MB)
> download from this server. With this patch, it never fails.
>
> Reviewing the ipfilter code, it doesn't seem the author included this check.
> So what was the rationale for including it in the netfilter version? I
> can't think of what it is protecting us from.
I'm digging through my mail folders to find the reports which triggered adding
that code segment.
Best regards,
Jozsef
* Re: [PATCH] TCP window tracking over-window handling
2005-02-02 20:44 ` Jozsef Kadlecsik
@ 2005-02-02 22:35 ` Phil Oester
2005-02-07 10:32 ` Jozsef Kadlecsik
1 sibling, 0 replies; 7+ messages in thread
From: Phil Oester @ 2005-02-02 22:35 UTC (permalink / raw)
To: Jozsef Kadlecsik; +Cc: netfilter-devel, Patrick McHardy
On Wed, Feb 02, 2005 at 09:44:51PM +0100, Jozsef Kadlecsik wrote:
> Then it can't be due to some window scaling/SACK handling bug, at least.
I don't think so either, but you can correct me if I'm wrong given the below
handshake:
x.x.x.14.32782 > x.x.x.7.21: S 855161881:855161881(0) win 5840 <mss 1460,sackOK,timestamp 430761638 0,nop,wscale 2>
x.x.x.7.21 > x.x.x.14.32782: S 3715392258:3715392258(0) ack 855161882 win 8280 <mss 1380>
x.x.x.14.32782 > x.x.x.7.21: . ack 1 win 5840
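As a reading aid for the handshake above, here is a small C sketch of how the options in the two SYN segments decide what is in effect. It assumes the usual rules: window scaling applies only if both SYNs carry the wscale option, and a host may send SACK blocks only if its peer's SYN carried SACK-permitted. The struct and function names are hypothetical and are not taken from the conntrack code.

```c
#include <stdbool.h>

/* Hypothetical summary of the TCP options seen in one SYN or SYN/ACK. */
struct syn_opts {
    bool sack_ok;      /* SACK-permitted option present */
    bool has_wscale;   /* window scale option present */
    unsigned wscale;   /* shift count, meaningful only if has_wscale */
};

/* Window scaling is in effect only when both sides offered it. */
static bool wscale_in_effect(const struct syn_opts *syn,
                             const struct syn_opts *synack)
{
    return syn->has_wscale && synack->has_wscale;
}

/* A host may send SACK blocks only if the peer's SYN allowed it. */
static bool may_send_sack(const struct syn_opts *peer_syn)
{
    return peer_syn->sack_ok;
}
```

For this handshake the SYN offers sackOK and wscale 2 while the SYN/ACK carries only mss 1380, so wscale_in_effect() is false and the client (the side sending the ACKs in the dump) may not send SACK blocks.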
> I'm digging through my mail folders to find the reports which triggered adding
> that code segment.
Great, thanks.
Phil
* Re: [PATCH] TCP window tracking over-window handling
2005-02-02 20:44 ` Jozsef Kadlecsik
2005-02-02 22:35 ` Phil Oester
@ 2005-02-07 10:32 ` Jozsef Kadlecsik
2005-02-07 16:25 ` Phil Oester
1 sibling, 1 reply; 7+ messages in thread
From: Jozsef Kadlecsik @ 2005-02-07 10:32 UTC (permalink / raw)
To: Phil Oester; +Cc: netfilter-devel, Patrick McHardy
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1715 bytes --]
Hi,
On Wed, 2 Feb 2005, Jozsef Kadlecsik wrote:
> > The receiver was a linux 2.6.10 box, so not an uncommon OS ;-) The original
> > problem was noted by clients on Win2K and WinXP against the same FTP server.
>
> Then I'd like all the more to understand how it can happen.
Re-reading the article on which the code is based, the relevant RFCs and
the TCP code in the Linux kernel, it turns out that the article (and thus the
present window tracking code) is simply too strict in requiring that
packets fit completely into the window.
Actually the real governing rule is that packets must intersect the
window: a segment may start before the left edge or end after the right edge.
Moreover, the receivers may keep the segments over the window for later
processing, and your recording just proves it does happen.
So we can either follow the article and drop the assumption about
receivers trimming the segments over the window, or adjust the code to
meet RFC793 and real-life traffic patterns. I believe the second
approach would be preferable because then conntrack wouldn't drop
legitimate packets and there would be fewer false alarms.
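The difference between the two rules can be sketched in a few lines of C. This is an illustration only, built from bounds I and II as they are stated in the header comment of the RFC-style patch attached to this message; struct win_state and both function names are hypothetical simplifications, and the ACK checks (III and IV) are left out.

```c
#include <stdint.h>

/* Wraparound-safe sequence comparisons, in the style of before()/after(). */
static int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int seq_after(uint32_t a, uint32_t b)  { return (int32_t)(b - a) < 0; }

/* Hypothetical, simplified per-direction state. */
struct win_state {
    uint32_t td_end;     /* max(seq + len) seen in sent packets */
    uint32_t td_maxend;  /* max(sack + max(win, 1)) seen in replies */
    uint32_t td_maxwin;  /* largest window seen in sent packets */
};

/* Strict rule: the segment [seq, end) must lie completely in the window. */
static int data_fits_window(uint32_t seq, uint32_t end,
                            const struct win_state *sender,
                            const struct win_state *receiver)
{
    return seq_before(end, sender->td_maxend + 1) &&
           seq_after(seq, sender->td_end - receiver->td_maxwin - 1);
}

/* Relaxed rule: the segment only has to intersect the window. */
static int data_intersects_window(uint32_t seq, uint32_t end,
                                  const struct win_state *sender,
                                  const struct win_state *receiver)
{
    return seq_before(seq, sender->td_maxend + 1) &&
           seq_after(end, sender->td_end - receiver->td_maxwin - 1);
}
```

Taking Phil's over-window segment 3692913595:3692914975 with sender td_end 3692913595, td_maxend 3692914622 and receiver td_maxwin 32767, data_fits_window() returns 0 while data_intersects_window() returns 1. A segment that starts beyond td_maxend is rejected by both.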
The first attached patch (your version with some modifications to
complete it) implements the first variation.
The second one aims to implement the more RFC-compatible window tracking
code. It has been lightly tested using the first window tracking tests in
nfsim. I'm working on writing more tests to cover as many cases as
possible.
Best regards,
Jozsef
[-- Attachment #2: tcp-win-phil.patch --]
[-- Type: TEXT/PLAIN, Size: 3717 bytes --]
diff -urN --exclude-from=/usr/src/diff.exclude linux-2.6.9-tcp-win-retrans/net/ipv4/netfilter/ip_conntrack_proto_tcp.c linux-2.6.9-tcp-win-phil/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-2.6.9-tcp-win-retrans/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-02-04 16:13:42.000000000 +0100
+++ linux-2.6.9-tcp-win-phil/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-02-07 10:15:05.000000000 +0100
@@ -500,7 +500,7 @@
static int tcp_in_window(struct ip_ct_tcp *state,
enum ip_conntrack_dir dir,
- unsigned int *index,
+ unsigned int index,
const struct sk_buff *skb,
struct iphdr *iph,
struct tcphdr *tcph)
@@ -606,12 +606,10 @@
seq = end = sender->td_end;
DEBUGP("tcp_in_window: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu "
- "seq=%u ack=%u sack =%u win=%u end=%u trim=%u\n",
+ "seq=%u ack=%u sack =%u win=%u end=%u\n",
NIPQUAD(iph->saddr), ntohs(tcph->source),
NIPQUAD(iph->daddr), ntohs(tcph->dest),
- seq, ack, sack, win, end,
- after(end, sender->td_maxend) && before(seq, sender->td_maxend)
- ? sender->td_maxend : end);
+ seq, ack, sack, win, end);
DEBUGP("tcp_in_window: sender end=%u maxend=%u maxwin=%u scale=%i "
"receiver end=%u maxend=%u maxwin=%u scale=%i\n",
sender->td_end, sender->td_maxend, sender->td_maxwin,
@@ -619,18 +617,9 @@
receiver->td_end, receiver->td_maxend, receiver->td_maxwin,
receiver->td_scale);
- /* Ignore data over the right edge of the receiver's window. */
- if (after(end, sender->td_maxend) &&
- before(seq, sender->td_maxend)) {
- end = sender->td_maxend;
- if (*index == TCP_FIN_SET)
- *index = TCP_ACK_SET;
- }
DEBUGP("tcp_in_window: I=%i II=%i III=%i IV=%i\n",
- before(end, sender->td_maxend + 1)
- || before(seq, sender->td_maxend + 1),
- after(seq, sender->td_end - receiver->td_maxwin - 1)
- || after(end, sender->td_end - receiver->td_maxwin - 1),
+ before(end, sender->td_maxend + 1),
+ after(seq, sender->td_end - receiver->td_maxwin - 1),
before(sack, receiver->td_end + 1),
after(ack, receiver->td_end - MAXACKWINDOW(sender)));
@@ -662,7 +651,7 @@
/*
* Check retransmissions.
*/
- if (*index == TCP_ACK_SET) {
+ if (index == TCP_ACK_SET) {
if (state->last_dir == dir
&& state->last_seq == seq
&& state->last_ack == ack
@@ -691,9 +680,9 @@
after(seq, sender->td_end - receiver->td_maxwin - 1) ?
before(sack, receiver->td_end + 1) ?
after(ack, receiver->td_end - MAXACKWINDOW(sender)) ? "BUG"
- : "ACK is under the lower bound (possibly overly delayed ACK)"
- : "ACK is over the upper bound (ACKed data has never seen yet)"
- : "SEQ is under the lower bound (retransmitted already ACKed data)"
+ : "ACK is under the lower bound (possible overly delayed ACK)"
+ : "ACK is over the upper bound (ACKed data not seen yet)"
+ : "SEQ is under the lower bound (already ACKed data retransmitted)"
: "SEQ is over the upper bound (over the window of the receiver)");
res = ip_ct_tcp_be_liberal && !tcph->rst;
@@ -926,14 +915,12 @@
break;
}
- if (!tcp_in_window(&conntrack->proto.tcp, dir, &index,
+ if (!tcp_in_window(&conntrack->proto.tcp, dir, index,
skb, iph, th)) {
WRITE_UNLOCK(&tcp_lock);
return -NF_ACCEPT;
}
/* From now on we have got in-window packets */
-
- /* If FIN was trimmed off, we don't change state. */
conntrack->proto.tcp.last_index = index;
new_state = tcp_conntracks[dir][index][old_state];
[-- Attachment #3: tcp-win-rfc.patch --]
[-- Type: TEXT/PLAIN, Size: 5957 bytes --]
diff -urN --exclude-from=/usr/src/diff.exclude linux-2.6.9-tcp-win-retrans/net/ipv4/netfilter/ip_conntrack_proto_tcp.c linux-2.6.9-tcp-win-win/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-2.6.9-tcp-win-retrans/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-02-04 16:13:42.000000000 +0100
+++ linux-2.6.9-tcp-win-win/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-02-07 10:16:08.000000000 +0100
@@ -352,17 +352,21 @@
http://www.nluug.nl/events/sane2000/papers.html
http://www.iae.nl/users/guido/papers/tcp_filtering.ps.gz
- The boundaries and the conditions are slightly changed:
+ The boundaries and the conditions are changed to meet RFC793:
+ the packet must intersect the window (i.e. segments may be
+ after the right or before the left edge).
td_maxend = max(sack + max(win,1)) seen in reply packets
td_maxwin = max(max(win, 1)) + (sack - ack) seen in sent packets
+ td_maxwin += seq + len - sender.td_maxend
+ if seq + len > sender.td_maxend
td_end = max(seq + len) seen in sent packets
- I. Upper bound for valid data: seq + len <= sender.td_maxend
- II. Lower bound for valid data: seq >= sender.td_end - receiver.td_maxwin
+ I. Upper bound for valid data: seq <= sender.td_maxend
+ II. Lower bound for valid data: seq + len >= sender.td_end - receiver.td_maxwin
III. Upper bound for valid ack: sack <= receiver.td_end
IV. Lower bound for valid ack: ack >= receiver.td_end - MAXACKWINDOW
-
+
where sack is the highest right edge of sack block found in the packet.
The upper bound limit for a valid ack is not ignored -
@@ -500,7 +504,7 @@
static int tcp_in_window(struct ip_ct_tcp *state,
enum ip_conntrack_dir dir,
- unsigned int *index,
+ unsigned int index,
const struct sk_buff *skb,
struct iphdr *iph,
struct tcphdr *tcph)
@@ -606,12 +610,10 @@
seq = end = sender->td_end;
DEBUGP("tcp_in_window: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu "
- "seq=%u ack=%u sack =%u win=%u end=%u trim=%u\n",
+ "seq=%u ack=%u sack =%u win=%u end=%u\n",
NIPQUAD(iph->saddr), ntohs(tcph->source),
NIPQUAD(iph->daddr), ntohs(tcph->dest),
- seq, ack, sack, win, end,
- after(end, sender->td_maxend) && before(seq, sender->td_maxend)
- ? sender->td_maxend : end);
+ seq, ack, sack, win, end);
DEBUGP("tcp_in_window: sender end=%u maxend=%u maxwin=%u scale=%i "
"receiver end=%u maxend=%u maxwin=%u scale=%i\n",
sender->td_end, sender->td_maxend, sender->td_maxwin,
@@ -619,24 +621,15 @@
receiver->td_end, receiver->td_maxend, receiver->td_maxwin,
receiver->td_scale);
- /* Ignore data over the right edge of the receiver's window. */
- if (after(end, sender->td_maxend) &&
- before(seq, sender->td_maxend)) {
- end = sender->td_maxend;
- if (*index == TCP_FIN_SET)
- *index = TCP_ACK_SET;
- }
DEBUGP("tcp_in_window: I=%i II=%i III=%i IV=%i\n",
- before(end, sender->td_maxend + 1)
- || before(seq, sender->td_maxend + 1),
- after(seq, sender->td_end - receiver->td_maxwin - 1)
- || after(end, sender->td_end - receiver->td_maxwin - 1),
+ before(seq, sender->td_maxend + 1),
+ after(end, sender->td_end - receiver->td_maxwin),
before(sack, receiver->td_end + 1),
after(ack, receiver->td_end - MAXACKWINDOW(sender)));
if (sender->loose || receiver->loose ||
- (before(end, sender->td_maxend + 1) &&
- after(seq, sender->td_end - receiver->td_maxwin - 1) &&
+ (before(seq, sender->td_maxend + 1) &&
+ after(end, sender->td_end - receiver->td_maxwin) &&
before(sack, receiver->td_end + 1) &&
after(ack, receiver->td_end - MAXACKWINDOW(sender)))) {
/*
@@ -653,6 +646,11 @@
sender->td_maxwin = swin;
if (after(end, sender->td_end))
sender->td_end = end;
+ /*
+ * Update receiver data.
+ */
+ if (after(end, sender->td_maxend))
+ receiver->td_maxwin += end - sender->td_maxend;
if (after(sack + win, receiver->td_maxend - 1)) {
receiver->td_maxend = sack + win;
if (win == 0)
@@ -662,7 +660,7 @@
/*
* Check retransmissions.
*/
- if (*index == TCP_ACK_SET) {
+ if (index == TCP_ACK_SET) {
if (state->last_dir == dir
&& state->last_seq == seq
&& state->last_ack == ack
@@ -687,13 +685,13 @@
if (LOG_INVALID(IPPROTO_TCP))
nf_log_packet(PF_INET, 0, skb, NULL, NULL,
"ip_ct_tcp: %s ",
- before(end, sender->td_maxend + 1) ?
- after(seq, sender->td_end - receiver->td_maxwin - 1) ?
+ before(seq, sender->td_maxend + 1) ?
+ after(end, sender->td_end - receiver->td_maxwin - 1) ?
before(sack, receiver->td_end + 1) ?
after(ack, receiver->td_end - MAXACKWINDOW(sender)) ? "BUG"
- : "ACK is under the lower bound (possibly overly delayed ACK)"
- : "ACK is over the upper bound (ACKed data has never seen yet)"
- : "SEQ is under the lower bound (retransmitted already ACKed data)"
+ : "ACK is under the lower bound (possible overly delayed ACK)"
+ : "ACK is over the upper bound (ACKed data not seen yet)"
+ : "SEQ is under the lower bound (already ACKed data retransmitted)"
: "SEQ is over the upper bound (over the window of the receiver)");
res = ip_ct_tcp_be_liberal && !tcph->rst;
@@ -926,14 +924,12 @@
break;
}
- if (!tcp_in_window(&conntrack->proto.tcp, dir, &index,
+ if (!tcp_in_window(&conntrack->proto.tcp, dir, index,
skb, iph, th)) {
WRITE_UNLOCK(&tcp_lock);
return -NF_ACCEPT;
}
/* From now on we have got in-window packets */
-
- /* If FIN was trimmed off, we don't change state. */
conntrack->proto.tcp.last_index = index;
new_state = tcp_conntracks[dir][index][old_state];
* Re: [PATCH] TCP window tracking over-window handling
2005-02-07 10:32 ` Jozsef Kadlecsik
@ 2005-02-07 16:25 ` Phil Oester
0 siblings, 0 replies; 7+ messages in thread
From: Phil Oester @ 2005-02-07 16:25 UTC (permalink / raw)
To: Jozsef Kadlecsik; +Cc: netfilter-devel, Patrick McHardy
On Mon, Feb 07, 2005 at 11:32:27AM +0100, Jozsef Kadlecsik wrote:
> Actually the real governing rule is that packets must intersect the
> window: a segment may start before the left edge or end after the right edge.
> Moreover, the receivers may keep the segments over the window for later
> processing, and your recording just proves it does happen.
>
> So we can either follow the article and drop the assumption about
> receivers trimming the segments over the window, or adjust the code to
> meet RFC793 and real-life traffic patterns. I believe the second
> approach would be preferable because then conntrack wouldn't drop
> legitimate packets and there would be fewer false alarms.
>
> The first attached patch (your version with some modifications to
> complete it) implements the first variation.
>
> The second one aims to implement the more RFC-compatible window tracking
> code. It has been lightly tested using the first window tracking tests in
> nfsim. I'm working on writing more tests to cover as many cases as
> possible.
Both look good, but would it be best to merge the less intrusive alternative
#1 for 2.6.11, then update to alternative #2 early in 2.6.12 so it can receive
more testing?
Phil