[PATCH] for some issues in tcp window tracking patch

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] for some issues in tcp window tracking patch
@ 2004-05-16 23:48 Pablo Neira
  2004-05-16 23:51 ` Pablo Neira
  2004-05-17 13:00 ` Jozsef Kadlecsik
  0 siblings, 2 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-16 23:48 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 3589 bytes --]

Hi Jozsef,

I'm having a look at your tcp-window-tracking patch, I made some 
modifications. Attached to this email a patch which applies to an 
already applied patch (I know, a bit tricky), but I just wanted to show 
you clearly my modifications. After you've reviewed this email, I have 
no problem to send a patch which applies to the CVS with the stuff that 
you considered that it's ok.

Well, let me comment you a bit the patch.

a) Use flags defined in tcp.h instead of bsd flags: I found that these 
flags are already defined in linux with different names in tcp.h, so we 
could use linux flags instead.

-#define        TH_FIN  0x01
-#define        TH_SYN  0x02
...

----- snipped from tcp.h ----
#define TCPCB_FLAG_FIN 0x01
#define TCPCB_FLAG_SYN 0x02
...

b) check that tcp is not malformed (doff has a faked value): I trust the 
value of skb->len, so we could use it to check that the doff is correct 
instead of that skb_copy_bits and that buff var, couldn't we?

+       if ((skb->len - iph->ihl*4 - sizeof(struct tcphdr) == tcph->doff*4)
+            && tcph->doff*4 < sizeof(struct tcphdr)) {

I added this to the unclean function.

c) I replaced this:

-       } else if ((old_state == TCP_CONNTRACK_SYN_RECV
-                   || old_state == TCP_CONNTRACK_ESTABLISHED)
-                  && new_state == TCP_CONNTRACK_ESTABLISHED) {
-               /* Set ASSURED if we see see valid ack in ESTABLISHED 
after SYN_RECV
-                  or a valid answer for a picked up connection. */
-                       set_bit(IPS_ASSURED_BIT, &conntrack->status);

for this:

+       case TCP_CONNTRACK_ESTABLISHED:
+               if ((old_state == TCP_CONNTRACK_SYN_RECV
+                    || old_state == TCP_CONNTRACK_ESTABLISHED)
+                    && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {
+                       /* Set ASSURED if we see see valid ack in 
ESTABLISHED
+                        * after SYN_RECV or a valid answer for a picked
+                        * up connection. */
+                       set_bit(IPS_ASSURED_BIT, &conntrack->status);
+               }

actually I also added that test_bit, so the bit assured is set once.

d) I think that we don't need this anymore, this is part of the old 
tracking:

-       if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
-               /* If only reply is a RST, we can consider ourselves not 
to           
-                  have an established connection: this is a fairly common
-                  problem case, so we can delete the 
conntrack                       
-                  immediately.  --RR */
-               if (tcph->rst) 
{                                                      
-                       WRITE_UNLOCK(&tcp_lock);
-                       if 
(del_timer(&conntrack->timeout))                           
-                               conntrack->timeout.function((unsigned 
long)conntrack);
-                       return 
NF_ACCEPT;                                             
-               }

because this should be always check in TCP_CONNTRACK_CLOSE which is, if 
i'm not wrong, the state that we reach when a rst is received.

e) I defined an inline __u32 segment_seq_plus_len instead a macro, it 
gives me more flexibility.

f) Do you think that it's interesting adding fine grain locking in 
tcp-window-tracking?
http://lists.netfilter.org/pipermail/netfilter-devel/2004-May/015166.html

I'm still understanding the current secuence tracking, I have some ideas 
but I want to think a bit more about them, I'll let you know.

regards,
Pablo

[-- Attachment #2: tcp-win.patch --]
[-- Type: text/x-patch, Size: 11179 bytes --]

--- net/ipv4/netfilter/ip_conntrack_proto_tcp.c~	2004-05-17 00:35:52.000000000 +0200
+++ net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2004-05-17 00:33:54.000000000 +0200
@@ -368,9 +368,12 @@
    we doesn't have to deal with fragments. 
 */
 
-#define SEGMENT_SEQ_PLUS_LEN(seq, len, iph, tcph)	\
-	(seq + len - (iph->ihl + tcph->doff)*4		\
-	 + (tcph->syn ? 1 : 0) + (tcph->fin ? 1 : 0))
+static inline __u32 segment_seq_plus_len(__u32 seq, size_t len, struct iphdr *iph, 
+				 struct tcphdr *tcph)
+{
+	return (seq + len - (iph->ihl + tcph->doff)*4 \
+		+ (tcph->syn ? 1 : 0) + (tcph->fin ? 1 : 0));
+}
 
 /* Fixme: what about big packets? */
 #define MAXACKWINCONST			66000
@@ -440,7 +443,7 @@
 	seq = ntohl(tcph->seq);
 	ack = ntohl(tcph->ack_seq);
 	win = ntohs(tcph->window);
-	end = SEGMENT_SEQ_PLUS_LEN(seq, skb->len, iph, tcph);
+	end = segment_seq_plus_len(seq, skb->len, iph, tcph);
 	
 	DEBUGP("tcp_in_window: START\n");
 	DEBUGP("tcp_in_window: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu seq=%u ack=%u win=%u end=%u\n",
@@ -622,7 +625,7 @@
 	struct ip_ct_tcp_state *receiver = &conntrack->proto.tcp.seen[!dir];
 #endif
 
-	end = SEGMENT_SEQ_PLUS_LEN(ntohl(tcph->seq), skb->len, iph, tcph);
+	end = segment_seq_plus_len(ntohl(tcph->seq), skb->len, iph, tcph);
 	
 	WRITE_LOCK(&tcp_lock);
 	/*
@@ -641,41 +644,34 @@
 EXPORT_SYMBOL(ip_conntrack_tcp_update);
 #endif
 
-#define	TH_FIN	0x01
-#define	TH_SYN	0x02
-#define	TH_RST	0x04
-#define	TH_PUSH	0x08
-#define	TH_ACK	0x10
-#define	TH_URG	0x20
-#define	TH_ECE	0x40
-#define	TH_CWR	0x80
-
 /* table of valid flag combinations - ECE and CWR are always valid */
-static u8 tcp_valid_flags[(TH_FIN|TH_SYN|TH_RST|TH_PUSH|TH_ACK|TH_URG) + 1] =
+static u8 tcp_valid_flags[(TCPCB_FLAG_FIN|TCPCB_FLAG_SYN|TCPCB_FLAG_RST|TCPCB_FLAG_PSH|TCPCB_FLAG_ACK|TCPCB_FLAG_URG) + 1] =
 {
-	[TH_SYN]			= 1,
-	[TH_SYN|TH_ACK]			= 1,
-	[TH_RST]			= 1,
-	[TH_RST|TH_ACK]			= 1,
-	[TH_RST|TH_ACK|TH_PUSH]		= 1,
-	[TH_FIN|TH_ACK]			= 1,
-	[TH_ACK]			= 1,
-	[TH_ACK|TH_PUSH]		= 1,
-	[TH_ACK|TH_URG]			= 1,
-	[TH_ACK|TH_URG|TH_PUSH]		= 1,
-	[TH_FIN|TH_ACK|TH_PUSH]		= 1,
-	[TH_FIN|TH_ACK|TH_URG]		= 1,
-	[TH_FIN|TH_ACK|TH_URG|TH_PUSH]	= 1,
+	[TCPCB_FLAG_SYN]				= 1,
+	[TCPCB_FLAG_SYN|TCPCB_FLAG_ACK]			= 1,
+	[TCPCB_FLAG_RST]				= 1,
+	[TCPCB_FLAG_RST|TCPCB_FLAG_ACK]			= 1,
+	[TCPCB_FLAG_RST|TCPCB_FLAG_ACK|TCPCB_FLAG_PSH]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK]			= 1,
+	[TCPCB_FLAG_ACK]				= 1,
+	[TCPCB_FLAG_ACK|TCPCB_FLAG_PSH]			= 1,
+	[TCPCB_FLAG_ACK|TCPCB_FLAG_URG]			= 1,
+	[TCPCB_FLAG_ACK|TCPCB_FLAG_URG|TCPCB_FLAG_PSH]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK|TCPCB_FLAG_PSH]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK|TCPCB_FLAG_URG]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK|TCPCB_FLAG_URG|TCPCB_FLAG_PSH]	= 1,
 };
 
-/* Protect conntrack agaist broken packets. Code taken from ipt_unclean.c.  */
+/* Protect conntrack agaist broken packets. Modified version from ipt_unclean.c.  */
 static int unclean(const struct sk_buff *skb, struct iphdr *iph, struct tcphdr *tcph)
 {
 	unsigned int tcplen = skb->len - iph->ihl * 4;
 	u_int8_t tcpflags;
 
+
 	/* Not whole TCP header or malformed packet */
-	if (tcph->doff*4 < sizeof(struct tcphdr) || tcplen < tcph->doff*4) {
+	if ((skb->len - iph->ihl*4 - sizeof(struct tcphdr) == tcph->doff*4)
+	     && tcph->doff*4 < sizeof(struct tcphdr)) {
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
 			nf_log_packet(PF_INET, 0, skb, NULL, NULL, 
 				  "ip_conntrack_tcp: INVALID: truncated/malformed packet ");
@@ -696,7 +692,7 @@
 	}
 
 	/* Check TCP flags. */
-	tcpflags = (((u_int8_t *)tcph)[13] & ~(TH_ECE|TH_CWR));
+	tcpflags = (((u_int8_t *)tcph)[13] & ~(TCPCB_FLAG_ECE|TCPCB_FLAG_CWR));
 	if (!tcp_valid_flags[tcpflags]) {
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
 			nf_log_packet(PF_INET, 0, skb, NULL, NULL, 
@@ -715,29 +711,22 @@
 	enum tcp_conntrack new_state, old_state;
 	enum ip_conntrack_dir dir;
 	struct iphdr *iph = skb->nh.iph;
-	unsigned char buff[15 * 4];
-	struct tcphdr *tcph = (struct tcphdr *)buff;
+	struct tcphdr tcph;
 	unsigned long timeout;
 	unsigned int index;
 	
 	/* Smaller that minimal TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4, buff, sizeof(*tcph)) != 0)
-		return -NF_ACCEPT;
+	if (skb_copy_bits(skb, iph->ihl * 4, &tcph, sizeof(tcph)) != 0)
+		return -NF_DROP;
 
-	/* Smaller than actual TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4 + sizeof(*tcph), 
-			  buff + sizeof(*tcph), 
-			  tcph->doff * 4 - sizeof(*tcph)) != 0)
-		return -NF_ACCEPT;
-	
 	/* Do not handle unclean packets, which could cause false alarms */
-	if (unclean(skb, iph, tcph))
-		return -NF_ACCEPT;
+	if (unclean(skb, iph, &tcph))
+		return -NF_DROP;
 	
-	WRITE_LOCK(&tcp_lock);
+ 	WRITE_LOCK(&tcp_lock);
 	old_state = conntrack->proto.tcp.state;
 	dir = CTINFO2DIR(ctinfo);
-	index = get_conntrack_index(tcph);
+	index = get_conntrack_index(&tcph);
 	new_state = tcp_conntracks[dir][index][old_state];
 
 	switch (new_state) {
@@ -746,7 +735,7 @@
 		if (index == TCP_SYNACK_SET
 		    && conntrack->proto.tcp.stored_seq == TCP_SYN_SET
 		    && conntrack->proto.tcp.last_dir != dir
-		    && after(ntohl(tcph->ack_seq), conntrack->proto.tcp.last_seq)) {
+		    && after(ntohl(tcph.ack_seq), conntrack->proto.tcp.last_seq)) {
 			/* This SYN/ACK acknowledges a SYN that we earlier ignored 
 			 * as invalid. This means that the client and the server 
 			 * are both in sync, while the firewall is not. We kill 
@@ -764,7 +753,7 @@
 		}
 		conntrack->proto.tcp.stored_seq = index;
 		conntrack->proto.tcp.last_dir = dir;
-		conntrack->proto.tcp.last_seq = ntohl(tcph->seq);
+		conntrack->proto.tcp.last_seq = ntohl(tcph.seq);
 		
 		WRITE_UNLOCK(&tcp_lock);
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
@@ -773,10 +762,10 @@
 		return NF_ACCEPT;
 	case TCP_CONNTRACK_MAX:
 		/* Invalid packet */
-		DEBUGP("ip_conntrack_tcp: Invalid dir=%i index=%u conntrack=%u\n",
+ 		DEBUGP("ip_conntrack_tcp: Invalid dir=%i index=%u conntrack=%u\n",
 		       dir, get_conntrack_index(tcph),
 		       old_state);
-		WRITE_UNLOCK(&tcp_lock);
+ 		WRITE_UNLOCK(&tcp_lock);
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
 			nf_log_packet(PF_INET, 0, skb, NULL, NULL, 
 				  "ip_conntrack_tcp: INVALID: invalid state ");
@@ -804,23 +793,32 @@
 			return NF_ACCEPT;
 		}
 		/* Just fall trough */
+	case TCP_CONNTRACK_ESTABLISHED:
+		if ((old_state == TCP_CONNTRACK_SYN_RECV
+		     || old_state == TCP_CONNTRACK_ESTABLISHED)
+		     && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {
+			/* Set ASSURED if we see see valid ack in ESTABLISHED 
+			 * after SYN_RECV or a valid answer for a picked 
+			 * up connection. */
+			set_bit(IPS_ASSURED_BIT, &conntrack->status);
+		}
 	default:
 		/* Keep compilers happy. */
 		break;
-	}
-
+ 	}
+ 
 	conntrack->proto.tcp.stored_seq = index;
-	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, tcph)) {
+	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, &tcph)) {
 		/* Invalid packet */
-		WRITE_UNLOCK(&tcp_lock);
+ 		WRITE_UNLOCK(&tcp_lock);
 		return -NF_ACCEPT;
 	}
 	/* If FIN was trimmed off, don't change state. */
 	new_state = tcp_conntracks[dir][conntrack->proto.tcp.stored_seq][old_state];
-
+ 
 	DEBUGP("tcp_conntracks: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu syn=%i ack=%i fin=%i rst=%i old=%i new=%i\n",
-		NIPQUAD(iph->saddr), ntohs(tcph->source), NIPQUAD(iph->daddr), ntohs(tcph->dest),
-		(tcph->syn ? 1 : 0), (tcph->ack ? 1 : 0), (tcph->fin ? 1 : 0), (tcph->rst ? 1 : 0),
+		NIPQUAD(iph->saddr), ntohs(tcph.source), NIPQUAD(iph->daddr), ntohs(tcph.dest),
+		(tcph.syn ? 1 : 0), (tcph.ack ? 1 : 0), (tcph.fin ? 1 : 0), (tcph.rst ? 1 : 0),
 		old_state, new_state);
 
 	conntrack->proto.tcp.state = new_state;
@@ -828,28 +826,10 @@
 		  && *tcp_timeouts[new_state] > ip_ct_tcp_timeout_max_retrans ?
 		  ip_ct_tcp_timeout_max_retrans : *tcp_timeouts[new_state];
 
-	if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
-		/* If only reply is a RST, we can consider ourselves not to
-		   have an established connection: this is a fairly common
-		   problem case, so we can delete the conntrack
-		   immediately.  --RR */
-		if (tcph->rst) {
-			WRITE_UNLOCK(&tcp_lock);
-			if (del_timer(&conntrack->timeout))
-				conntrack->timeout.function((unsigned long)conntrack);
-			return NF_ACCEPT;
-		}
-	} else if ((old_state == TCP_CONNTRACK_SYN_RECV
-		    || old_state == TCP_CONNTRACK_ESTABLISHED)
-		   && new_state == TCP_CONNTRACK_ESTABLISHED) {
-		/* Set ASSURED if we see see valid ack in ESTABLISHED after SYN_RECV 
-		   or a valid answer for a picked up connection. */
-			set_bit(IPS_ASSURED_BIT, &conntrack->status);
-	}
 	WRITE_UNLOCK(&tcp_lock);
 	ip_ct_refresh(conntrack, timeout);
-
-	return NF_ACCEPT;
+ 
+ 	return NF_ACCEPT;
 }
 
 /* Called when a new connection for this protocol found. */
@@ -857,30 +837,23 @@
 {
 	enum tcp_conntrack new_state;
 	struct iphdr *iph = skb->nh.iph;
-	unsigned char buff[15 * 4];
-	struct tcphdr *tcph = (struct tcphdr *)buff;
+	struct tcphdr tcph;
 #ifdef DEBUGP_VARS
 	struct ip_ct_tcp_state *sender = &conntrack->proto.tcp.seen[0];
 	struct ip_ct_tcp_state *receiver = &conntrack->proto.tcp.seen[1];
 #endif
 
 	/* Smaller that minimal TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4, buff, sizeof(*tcph)) != 0)
+	if (skb_copy_bits(skb, iph->ihl * 4, &tcph, sizeof(tcph)) != 0)
 		return 0;
 
-	/* Smaller than actual TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4 + sizeof(*tcph), 
-			  buff + sizeof(*tcph), 
-			  tcph->doff * 4 - sizeof(*tcph)) != 0)
-		return 0;
-	
 	/* Skip unclean packets */
-	if (unclean(skb, iph, tcph))
+	if (unclean(skb, iph, &tcph))
 		return 0;
 
 	/* Don't need lock here: this conntrack not in circulation yet */
 	new_state
-		= tcp_conntracks[0][get_conntrack_index(tcph)]
+		= tcp_conntracks[0][get_conntrack_index(&tcph)]
 		[TCP_CONNTRACK_NONE];
 
 	/* Invalid: delete conntrack */
@@ -892,14 +865,14 @@
 	if (new_state == TCP_CONNTRACK_SYN_SENT) {
 		/* SYN packet */
 		conntrack->proto.tcp.seen[0].td_end =
-			SEGMENT_SEQ_PLUS_LEN(ntohl(tcph->seq), skb->len, iph, tcph);
-		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph->window);
+			 segment_seq_plus_len(ntohl(tcph.seq), skb->len, iph, &tcph);
+		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph.window);
 		if (conntrack->proto.tcp.seen[0].td_maxwin == 0)
 			conntrack->proto.tcp.seen[0].td_maxwin = 1;
 		conntrack->proto.tcp.seen[0].td_maxend =
 			conntrack->proto.tcp.seen[0].td_end;
 
-		tcp_options(tcph, &conntrack->proto.tcp.seen[0]);
+		tcp_options(&tcph, &conntrack->proto.tcp.seen[0]);
 		conntrack->proto.tcp.seen[1].flags = 0;
 		conntrack->proto.tcp.seen[0].loose = 
 		conntrack->proto.tcp.seen[1].loose = 0;
@@ -913,8 +886,8 @@
 		 * Let's try to use the data from the packet.
 		 */
 		conntrack->proto.tcp.seen[0].td_end =
-			SEGMENT_SEQ_PLUS_LEN(ntohl(tcph->seq), skb->len, iph, tcph);
-		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph->window);
+			 segment_seq_plus_len(ntohl(tcph.seq), skb->len, iph, &tcph);
+		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph.window);
 		if (conntrack->proto.tcp.seen[0].td_maxwin == 0)
 			conntrack->proto.tcp.seen[0].td_maxwin = 1;
 		conntrack->proto.tcp.seen[0].td_maxend =

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-16 23:48 [PATCH] for some issues in tcp window tracking patch Pablo Neira
@ 2004-05-16 23:51 ` Pablo Neira
  2004-05-17 13:00 ` Jozsef Kadlecsik
  1 sibling, 0 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-16 23:51 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 67 bytes --]

Hi again,

please wrong patch, use this one instead.

Sorry,
Pablo

[-- Attachment #2: tcp-win.patch --]
[-- Type: text/x-patch, Size: 11626 bytes --]

--- net/ipv4/netfilter/ip_conntrack_proto_tcp.c~	2004-05-17 00:35:52.000000000 +0200
+++ net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2004-05-17 00:58:42.000000000 +0200
@@ -368,9 +368,12 @@
    we doesn't have to deal with fragments. 
 */
 
-#define SEGMENT_SEQ_PLUS_LEN(seq, len, iph, tcph)	\
-	(seq + len - (iph->ihl + tcph->doff)*4		\
-	 + (tcph->syn ? 1 : 0) + (tcph->fin ? 1 : 0))
+static inline __u32 segment_seq_plus_len(__u32 seq, size_t len, struct iphdr *iph, 
+				 struct tcphdr *tcph)
+{
+	return (seq + len - (iph->ihl + tcph->doff)*4 \
+		+ (tcph->syn ? 1 : 0) + (tcph->fin ? 1 : 0));
+}
 
 /* Fixme: what about big packets? */
 #define MAXACKWINCONST			66000
@@ -440,7 +443,7 @@
 	seq = ntohl(tcph->seq);
 	ack = ntohl(tcph->ack_seq);
 	win = ntohs(tcph->window);
-	end = SEGMENT_SEQ_PLUS_LEN(seq, skb->len, iph, tcph);
+	end = segment_seq_plus_len(seq, skb->len, iph, tcph);
 	
 	DEBUGP("tcp_in_window: START\n");
 	DEBUGP("tcp_in_window: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu seq=%u ack=%u win=%u end=%u\n",
@@ -622,7 +625,7 @@
 	struct ip_ct_tcp_state *receiver = &conntrack->proto.tcp.seen[!dir];
 #endif
 
-	end = SEGMENT_SEQ_PLUS_LEN(ntohl(tcph->seq), skb->len, iph, tcph);
+	end = segment_seq_plus_len(ntohl(tcph->seq), skb->len, iph, tcph);
 	
 	WRITE_LOCK(&tcp_lock);
 	/*
@@ -641,41 +644,34 @@
 EXPORT_SYMBOL(ip_conntrack_tcp_update);
 #endif
 
-#define	TH_FIN	0x01
-#define	TH_SYN	0x02
-#define	TH_RST	0x04
-#define	TH_PUSH	0x08
-#define	TH_ACK	0x10
-#define	TH_URG	0x20
-#define	TH_ECE	0x40
-#define	TH_CWR	0x80
-
 /* table of valid flag combinations - ECE and CWR are always valid */
-static u8 tcp_valid_flags[(TH_FIN|TH_SYN|TH_RST|TH_PUSH|TH_ACK|TH_URG) + 1] =
+static u8 tcp_valid_flags[(TCPCB_FLAG_FIN|TCPCB_FLAG_SYN|TCPCB_FLAG_RST|TCPCB_FLAG_PSH|TCPCB_FLAG_ACK|TCPCB_FLAG_URG) + 1] =
 {
-	[TH_SYN]			= 1,
-	[TH_SYN|TH_ACK]			= 1,
-	[TH_RST]			= 1,
-	[TH_RST|TH_ACK]			= 1,
-	[TH_RST|TH_ACK|TH_PUSH]		= 1,
-	[TH_FIN|TH_ACK]			= 1,
-	[TH_ACK]			= 1,
-	[TH_ACK|TH_PUSH]		= 1,
-	[TH_ACK|TH_URG]			= 1,
-	[TH_ACK|TH_URG|TH_PUSH]		= 1,
-	[TH_FIN|TH_ACK|TH_PUSH]		= 1,
-	[TH_FIN|TH_ACK|TH_URG]		= 1,
-	[TH_FIN|TH_ACK|TH_URG|TH_PUSH]	= 1,
+	[TCPCB_FLAG_SYN]				= 1,
+	[TCPCB_FLAG_SYN|TCPCB_FLAG_ACK]			= 1,
+	[TCPCB_FLAG_RST]				= 1,
+	[TCPCB_FLAG_RST|TCPCB_FLAG_ACK]			= 1,
+	[TCPCB_FLAG_RST|TCPCB_FLAG_ACK|TCPCB_FLAG_PSH]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK]			= 1,
+	[TCPCB_FLAG_ACK]				= 1,
+	[TCPCB_FLAG_ACK|TCPCB_FLAG_PSH]			= 1,
+	[TCPCB_FLAG_ACK|TCPCB_FLAG_URG]			= 1,
+	[TCPCB_FLAG_ACK|TCPCB_FLAG_URG|TCPCB_FLAG_PSH]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK|TCPCB_FLAG_PSH]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK|TCPCB_FLAG_URG]	= 1,
+	[TCPCB_FLAG_FIN|TCPCB_FLAG_ACK|TCPCB_FLAG_URG|TCPCB_FLAG_PSH]	= 1,
 };
 
-/* Protect conntrack agaist broken packets. Code taken from ipt_unclean.c.  */
+/* Protect conntrack agaist broken packets. Modified version from ipt_unclean.c.  */
 static int unclean(const struct sk_buff *skb, struct iphdr *iph, struct tcphdr *tcph)
 {
 	unsigned int tcplen = skb->len - iph->ihl * 4;
 	u_int8_t tcpflags;
 
+
 	/* Not whole TCP header or malformed packet */
-	if (tcph->doff*4 < sizeof(struct tcphdr) || tcplen < tcph->doff*4) {
+	if ((skb->len - iph->ihl*4 - sizeof(struct tcphdr) == tcph->doff*4)
+	     && tcph->doff*4 < sizeof(struct tcphdr)) {
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
 			nf_log_packet(PF_INET, 0, skb, NULL, NULL, 
 				  "ip_conntrack_tcp: INVALID: truncated/malformed packet ");
@@ -696,7 +692,7 @@
 	}
 
 	/* Check TCP flags. */
-	tcpflags = (((u_int8_t *)tcph)[13] & ~(TH_ECE|TH_CWR));
+	tcpflags = (((u_int8_t *)tcph)[13] & ~(TCPCB_FLAG_ECE|TCPCB_FLAG_CWR));
 	if (!tcp_valid_flags[tcpflags]) {
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
 			nf_log_packet(PF_INET, 0, skb, NULL, NULL, 
@@ -715,29 +711,22 @@
 	enum tcp_conntrack new_state, old_state;
 	enum ip_conntrack_dir dir;
 	struct iphdr *iph = skb->nh.iph;
-	unsigned char buff[15 * 4];
-	struct tcphdr *tcph = (struct tcphdr *)buff;
+	struct tcphdr tcph;
 	unsigned long timeout;
 	unsigned int index;
 	
 	/* Smaller that minimal TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4, buff, sizeof(*tcph)) != 0)
-		return -NF_ACCEPT;
+	if (skb_copy_bits(skb, iph->ihl * 4, &tcph, sizeof(tcph)) != 0)
+		return -NF_DROP;
 
-	/* Smaller than actual TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4 + sizeof(*tcph), 
-			  buff + sizeof(*tcph), 
-			  tcph->doff * 4 - sizeof(*tcph)) != 0)
-		return -NF_ACCEPT;
-	
 	/* Do not handle unclean packets, which could cause false alarms */
-	if (unclean(skb, iph, tcph))
-		return -NF_ACCEPT;
+	if (unclean(skb, iph, &tcph))
+		return -NF_DROP;
 	
-	WRITE_LOCK(&tcp_lock);
+ 	WRITE_LOCK(&tcp_lock);
 	old_state = conntrack->proto.tcp.state;
 	dir = CTINFO2DIR(ctinfo);
-	index = get_conntrack_index(tcph);
+	index = get_conntrack_index(&tcph);
 	new_state = tcp_conntracks[dir][index][old_state];
 
 	switch (new_state) {
@@ -746,7 +735,7 @@
 		if (index == TCP_SYNACK_SET
 		    && conntrack->proto.tcp.stored_seq == TCP_SYN_SET
 		    && conntrack->proto.tcp.last_dir != dir
-		    && after(ntohl(tcph->ack_seq), conntrack->proto.tcp.last_seq)) {
+		    && after(ntohl(tcph.ack_seq), conntrack->proto.tcp.last_seq)) {
 			/* This SYN/ACK acknowledges a SYN that we earlier ignored 
 			 * as invalid. This means that the client and the server 
 			 * are both in sync, while the firewall is not. We kill 
@@ -764,7 +753,7 @@
 		}
 		conntrack->proto.tcp.stored_seq = index;
 		conntrack->proto.tcp.last_dir = dir;
-		conntrack->proto.tcp.last_seq = ntohl(tcph->seq);
+		conntrack->proto.tcp.last_seq = ntohl(tcph.seq);
 		
 		WRITE_UNLOCK(&tcp_lock);
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
@@ -773,10 +762,10 @@
 		return NF_ACCEPT;
 	case TCP_CONNTRACK_MAX:
 		/* Invalid packet */
-		DEBUGP("ip_conntrack_tcp: Invalid dir=%i index=%u conntrack=%u\n",
+ 		DEBUGP("ip_conntrack_tcp: Invalid dir=%i index=%u conntrack=%u\n",
 		       dir, get_conntrack_index(tcph),
 		       old_state);
-		WRITE_UNLOCK(&tcp_lock);
+ 		WRITE_UNLOCK(&tcp_lock);
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
 			nf_log_packet(PF_INET, 0, skb, NULL, NULL, 
 				  "ip_conntrack_tcp: INVALID: invalid state ");
@@ -795,7 +784,7 @@
 		if (index == TCP_RST_SET
 		    && test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)
 		    && conntrack->proto.tcp.stored_seq <= TCP_SYNACK_SET
-		    && after(ntohl(tcph->ack), conntrack->proto.tcp.last_seq)) {
+		    && after(ntohl(tcph.ack), conntrack->proto.tcp.last_seq)) {
 			/* Ignore RST closing down invalid SYN we had let trough. */ 
 		    	WRITE_UNLOCK(&tcp_lock);
 			if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
@@ -804,23 +793,32 @@
 			return NF_ACCEPT;
 		}
 		/* Just fall trough */
+	case TCP_CONNTRACK_ESTABLISHED:
+		if ((old_state == TCP_CONNTRACK_SYN_RECV
+		     || old_state == TCP_CONNTRACK_ESTABLISHED)
+		     && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {
+			/* Set ASSURED if we see see valid ack in ESTABLISHED 
+			 * after SYN_RECV or a valid answer for a picked 
+			 * up connection. */
+			set_bit(IPS_ASSURED_BIT, &conntrack->status);
+		}
 	default:
 		/* Keep compilers happy. */
 		break;
-	}
-
+ 	}
+ 
 	conntrack->proto.tcp.stored_seq = index;
-	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, tcph)) {
+	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, &tcph)) {
 		/* Invalid packet */
-		WRITE_UNLOCK(&tcp_lock);
+ 		WRITE_UNLOCK(&tcp_lock);
 		return -NF_ACCEPT;
 	}
 	/* If FIN was trimmed off, don't change state. */
 	new_state = tcp_conntracks[dir][conntrack->proto.tcp.stored_seq][old_state];
-
+ 
 	DEBUGP("tcp_conntracks: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu syn=%i ack=%i fin=%i rst=%i old=%i new=%i\n",
-		NIPQUAD(iph->saddr), ntohs(tcph->source), NIPQUAD(iph->daddr), ntohs(tcph->dest),
-		(tcph->syn ? 1 : 0), (tcph->ack ? 1 : 0), (tcph->fin ? 1 : 0), (tcph->rst ? 1 : 0),
+		NIPQUAD(iph->saddr), ntohs(tcph.source), NIPQUAD(iph->daddr), ntohs(tcph.dest),
+		(tcph.syn ? 1 : 0), (tcph.ack ? 1 : 0), (tcph.fin ? 1 : 0), (tcph.rst ? 1 : 0),
 		old_state, new_state);
 
 	conntrack->proto.tcp.state = new_state;
@@ -828,28 +826,10 @@
 		  && *tcp_timeouts[new_state] > ip_ct_tcp_timeout_max_retrans ?
 		  ip_ct_tcp_timeout_max_retrans : *tcp_timeouts[new_state];
 
-	if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
-		/* If only reply is a RST, we can consider ourselves not to
-		   have an established connection: this is a fairly common
-		   problem case, so we can delete the conntrack
-		   immediately.  --RR */
-		if (tcph->rst) {
-			WRITE_UNLOCK(&tcp_lock);
-			if (del_timer(&conntrack->timeout))
-				conntrack->timeout.function((unsigned long)conntrack);
-			return NF_ACCEPT;
-		}
-	} else if ((old_state == TCP_CONNTRACK_SYN_RECV
-		    || old_state == TCP_CONNTRACK_ESTABLISHED)
-		   && new_state == TCP_CONNTRACK_ESTABLISHED) {
-		/* Set ASSURED if we see see valid ack in ESTABLISHED after SYN_RECV 
-		   or a valid answer for a picked up connection. */
-			set_bit(IPS_ASSURED_BIT, &conntrack->status);
-	}
 	WRITE_UNLOCK(&tcp_lock);
 	ip_ct_refresh(conntrack, timeout);
-
-	return NF_ACCEPT;
+ 
+ 	return NF_ACCEPT;
 }
 
 /* Called when a new connection for this protocol found. */
@@ -857,30 +837,23 @@
 {
 	enum tcp_conntrack new_state;
 	struct iphdr *iph = skb->nh.iph;
-	unsigned char buff[15 * 4];
-	struct tcphdr *tcph = (struct tcphdr *)buff;
+	struct tcphdr tcph;
 #ifdef DEBUGP_VARS
 	struct ip_ct_tcp_state *sender = &conntrack->proto.tcp.seen[0];
 	struct ip_ct_tcp_state *receiver = &conntrack->proto.tcp.seen[1];
 #endif
 
 	/* Smaller that minimal TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4, buff, sizeof(*tcph)) != 0)
+	if (skb_copy_bits(skb, iph->ihl * 4, &tcph, sizeof(tcph)) != 0)
 		return 0;
 
-	/* Smaller than actual TCP header? */
-	if (skb_copy_bits(skb, iph->ihl * 4 + sizeof(*tcph), 
-			  buff + sizeof(*tcph), 
-			  tcph->doff * 4 - sizeof(*tcph)) != 0)
-		return 0;
-	
 	/* Skip unclean packets */
-	if (unclean(skb, iph, tcph))
+	if (unclean(skb, iph, &tcph))
 		return 0;
 
 	/* Don't need lock here: this conntrack not in circulation yet */
 	new_state
-		= tcp_conntracks[0][get_conntrack_index(tcph)]
+		= tcp_conntracks[0][get_conntrack_index(&tcph)]
 		[TCP_CONNTRACK_NONE];
 
 	/* Invalid: delete conntrack */
@@ -892,14 +865,14 @@
 	if (new_state == TCP_CONNTRACK_SYN_SENT) {
 		/* SYN packet */
 		conntrack->proto.tcp.seen[0].td_end =
-			SEGMENT_SEQ_PLUS_LEN(ntohl(tcph->seq), skb->len, iph, tcph);
-		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph->window);
+			 segment_seq_plus_len(ntohl(tcph.seq), skb->len, iph, &tcph);
+		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph.window);
 		if (conntrack->proto.tcp.seen[0].td_maxwin == 0)
 			conntrack->proto.tcp.seen[0].td_maxwin = 1;
 		conntrack->proto.tcp.seen[0].td_maxend =
 			conntrack->proto.tcp.seen[0].td_end;
 
-		tcp_options(tcph, &conntrack->proto.tcp.seen[0]);
+		tcp_options(&tcph, &conntrack->proto.tcp.seen[0]);
 		conntrack->proto.tcp.seen[1].flags = 0;
 		conntrack->proto.tcp.seen[0].loose = 
 		conntrack->proto.tcp.seen[1].loose = 0;
@@ -913,8 +886,8 @@
 		 * Let's try to use the data from the packet.
 		 */
 		conntrack->proto.tcp.seen[0].td_end =
-			SEGMENT_SEQ_PLUS_LEN(ntohl(tcph->seq), skb->len, iph, tcph);
-		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph->window);
+			 segment_seq_plus_len(ntohl(tcph.seq), skb->len, iph, &tcph);
+		conntrack->proto.tcp.seen[0].td_maxwin = ntohs(tcph.window);
 		if (conntrack->proto.tcp.seen[0].td_maxwin == 0)
 			conntrack->proto.tcp.seen[0].td_maxwin = 1;
 		conntrack->proto.tcp.seen[0].td_maxend =

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-16 23:48 [PATCH] for some issues in tcp window tracking patch Pablo Neira
  2004-05-16 23:51 ` Pablo Neira
@ 2004-05-17 13:00 ` Jozsef Kadlecsik
  2004-05-17 21:41   ` Patrick McHardy
  2004-05-18 21:10   ` Pablo Neira
  1 sibling, 2 replies; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-05-17 13:00 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

Hi Pablo,

On Mon, 17 May 2004, Pablo Neira wrote:

> I'm having a look at your tcp-window-tracking patch, I made some
> modifications.

All efforts to sanitize and improve the patch is greatly appreciated!

> a) Use flags defined in tcp.h instead of bsd flags: I found that these
> flags are already defined in linux with different names in tcp.h, so we
> could use linux flags instead.

Fine, better to use the flags already defined in tcp.h.

> b) check that tcp is not malformed (doff has a faked value): I trust the
> value of skb->len, so we could use it to check that the doff is correct
> instead of that skb_copy_bits and that buff var, couldn't we?

No, I think we cannot avoid calling skb_copy_bits two times: first for the
minimal TCP header and secondly for the whole TCP header. In your patch
you trim the options off, by which we'd loose window scaling support.

> c) I replaced this:
[...]
> d) I think that we don't need this anymore, this is part of the old
> tracking:

No, we cannot do that: out of window TCP packets could result status
changes or to drop connections.

> e) I defined an inline __u32 segment_seq_plus_len instead a macro, it
> gives me more flexibility.

Yes, that's fine.

> f) Do you think that it's interesting adding fine grain locking in
> tcp-window-tracking?
> http://lists.netfilter.org/pipermail/netfilter-devel/2004-May/015166.html

First I was not quite convinced, but yes, I think it'd worth. However I
believe we should introduce a generic per data locking in ip_conntrack
instead of just a per TCP data locking. Thus there were no need the extra
per protocol-helper locks either.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-17 13:00 ` Jozsef Kadlecsik
@ 2004-05-17 21:41   ` Patrick McHardy
  2004-05-18  9:57     ` Pablo Neira
  2004-05-18 21:10   ` Pablo Neira
  1 sibling, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2004-05-17 21:41 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Pablo Neira, Netfilter Development Mailinglist

Jozsef Kadlecsik wrote:

> On Mon, 17 May 2004, Pablo Neira wrote:
> 
>>a) Use flags defined in tcp.h instead of bsd flags: I found that these
>>flags are already defined in linux with different names in tcp.h, so we
>>could use linux flags instead.
> 
> 
> Fine, better to use the flags already defined in tcp.h.
> 

I don't think we should use those, they are for use in the tcp control
buffer, which is also expressed in their name (TCPCB_FLAG_*).

Regards
Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-17 21:41   ` Patrick McHardy
@ 2004-05-18  9:57     ` Pablo Neira
  0 siblings, 0 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-18  9:57 UTC (permalink / raw)
  To: Patrick McHardy, Netfilter Development Mailinglist,
	Jozsef Kadlecsik

Hi Patrick,

Patrick McHardy wrote:

> Jozsef Kadlecsik wrote:
>
>> On Mon, 17 May 2004, Pablo Neira wrote:
>>
>>> a) Use flags defined in tcp.h instead of bsd flags: I found that these
>>> flags are already defined in linux with different names in tcp.h, so we
>>> could use linux flags instead.
>>
>>
>>
>> Fine, better to use the flags already defined in tcp.h.
>>
>
> I don't think we should use those, they are for use in the tcp control
> buffer, which is also expressed in their name (TCPCB_FLAG_*).


I think that we should, we are "redefining" the size of a wheel :-), and 
also this comment:

------- snipped from tcp.h ------

1130         /* NOTE: These must match up to the flags byte in a
1131          *       real TCP header.
1132          */

--------- end --------

regards,
Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-17 13:00 ` Jozsef Kadlecsik
  2004-05-17 21:41   ` Patrick McHardy
@ 2004-05-18 21:10   ` Pablo Neira
  2004-05-19  8:51     ` Jozsef Kadlecsik
  1 sibling, 1 reply; 28+ messages in thread
From: Pablo Neira @ 2004-05-18 21:10 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Netfilter Development Mailinglist,
	Patrick McHardy

Hi Jozsef,

Patrick, would be better if we use these flags instead of TCPCB_*?

http://lxr.linux.no/source/include/linux/tcp.h#L105

Now some questions for Jozsef...

Jozsef Kadlecsik wrote:

>>b) check that tcp is not malformed (doff has a faked value): I trust the
>>value of skb->len, so we could use it to check that the doff is correct
>>instead of that skb_copy_bits and that buff var, couldn't we?
>>    
>>
>
>No, I think we cannot avoid calling skb_copy_bits two times: first for the
>minimal TCP header and secondly for the whole TCP header. In your patch
>you trim the options off, by which we'd loose window scaling support.
>  
>

I missed that, so couldn't we call skb_copy_bits just once?, I mean:

        if (skb_copy_bits(skb, iph->ihl * 4, buff, tcph->doff * 4) != 0)
                return -NF_ACCEPT;

>>c) I replaced this:
>>    
>>
>[...]
>  
>
>>d) I think that we don't need this anymore, this is part of the old
>>tracking:
>>    
>>
>
>No, we cannot do that: out of window TCP packets could result status
>changes or to drop connections.
>  
>

I also missed that! I didn't see that if we saw a FIN packet the state 
could change, anyway I don't understand this artefact in tcp_in_window.

        /* Ignore data over the right edge of the receiver's window. */
        if (after(end, sender->td_maxend) &&
            before(seq, sender->td_maxend)) {
                end = sender->td_maxend;
                if (state->stored_seq == TCP_FIN_SET)
                        state->stored_seq = TCP_ACK_SET;
        }

why if we saw a FIN packet, we modify stored_seq to TCP_ACK_SET? I got 
confused while having a look at the state table, if old state was 
Established and we saw a FIN packet which complies with that condition, 
we will keep in state Established, so we are ignoring that FIN packet.

new_state = tcp_conntracks[dir][conntrack->tcp.stored_seq][old_state]

mmm, is that a trick to handle FIN+ACK? please, give a clue :-)

>>f) Do you think that it's interesting adding fine grain locking in
>>tcp-window-tracking?
>>http://lists.netfilter.org/pipermail/netfilter-devel/2004-May/015166.html
>>    
>>
>
>First I was not quite convinced, but yes, I think it'd worth. However I
>believe we should introduce a generic per data locking in ip_conntrack
>instead of just a per TCP data locking. Thus there were no need the extra
>per protocol-helper locks either.
>  
>
yes, that would be also ok, but some other helpers like udp don't need 
it, so is it worthy adding that field to the conntrack structure?

Another observation:

if (sender->loose)
    sender->loose--;

I think that we should decrease sender->loose inside this if condition:

        if (sender->loose || receiver->loose ||
            (before(end, sender->td_maxend + 1) &&
             after(seq, sender->td_end - receiver->td_maxwin - 1) &&
             before(ack, receiver->td_end + 1) &&
             after(ack, receiver->td_end - MAXACKWINDOW(sender))))

because initially sender->loose is 3 (or whatever was set via sysctl), 
we check that sender->loose is non-zero and decrement, so that if 
condition above would be true only for first two packet, third packet 
won't get in. I mean, if we set ip_ct_tcp_loose to 3, actually we will 
only consider first 2 packets, and we'll check if third packet is inside 
those 4 defined boundaries.

regards,
Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-18 21:10   ` Pablo Neira
@ 2004-05-19  8:51     ` Jozsef Kadlecsik
  2004-05-19 10:14       ` Pablo Neira
  0 siblings, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-05-19  8:51 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Hi Pablo,

On Tue, 18 May 2004, Pablo Neira wrote:

> >>b) check that tcp is not malformed (doff has a faked value): I trust the
> >>value of skb->len, so we could use it to check that the doff is correct
> >>instead of that skb_copy_bits and that buff var, couldn't we?
> >
> >No, I think we cannot avoid calling skb_copy_bits two times: first for the
> >minimal TCP header and secondly for the whole TCP header. In your patch
> >you trim the options off, by which we'd loose window scaling support.
>
> I missed that, so couldn't we call skb_copy_bits just once?, I mean:
>
>         if (skb_copy_bits(skb, iph->ihl * 4, buff, tcph->doff * 4) != 0)
>                 return -NF_ACCEPT;

No, because first we have to copy to get access to tcph->doff. But it's
not bad at all! It's not a bottleneck or whatsoever.

> >>c) I replaced this:
> >>d) I think that we don't need this anymore, this is part of the old
> >>tracking:
> >
> >No, we cannot do that: out of window TCP packets could result status
> >changes or to drop connections.
>
> I also missed that! I didn't see that if we saw a FIN packet the state
> could change, anyway I don't understand this artefact in tcp_in_window.
>
>         /* Ignore data over the right edge of the receiver's window. */
>         if (after(end, sender->td_maxend) &&
>             before(seq, sender->td_maxend)) {
>                 end = sender->td_maxend;
>                 if (state->stored_seq == TCP_FIN_SET)
>                         state->stored_seq = TCP_ACK_SET;
>         }
>
> why if we saw a FIN packet, we modify stored_seq to TCP_ACK_SET? I got
> confused while having a look at the state table, if old state was
> Established and we saw a FIN packet which complies with that condition,
> we will keep in state Established, so we are ignoring that FIN packet.

Exactly. As the comment itself says as well, we ignore data (including
FIN!) over the right edge of the receiver's window. That's how the
receiver will deal with the packet.

> >>f) Do you think that it's interesting adding fine grain locking in
> >>tcp-window-tracking?
> >>http://lists.netfilter.org/pipermail/netfilter-devel/2004-May/015166.html
> >
> >First I was not quite convinced, but yes, I think it'd worth. However I
> >believe we should introduce a generic per data locking in ip_conntrack
> >instead of just a per TCP data locking. Thus there were no need the extra
> >per protocol-helper locks either.
> >
> yes, that would be also ok, but some other helpers like udp don't need
> it, so is it worthy adding that field to the conntrack structure?

After thinking on it again, you're right, but I meant the
application protocol-helper locks. If there is a TCP data lock, then there
is no need for (TCP based) application protocol-helper locks.

> Another observation:
>
> if (sender->loose)
>     sender->loose--;
>
> I think that we should decrease sender->loose inside this if condition:
>
>         if (sender->loose || receiver->loose ||
>             (before(end, sender->td_maxend + 1) &&
>              after(seq, sender->td_end - receiver->td_maxwin - 1) &&
>              before(ack, receiver->td_end + 1) &&
>              after(ack, receiver->td_end - MAXACKWINDOW(sender))))
>
> because initially sender->loose is 3 (or whatever was set via sysctl),
> we check that sender->loose is non-zero and decrement, so that if
> condition above would be true only for first two packet, third packet
> won't get in. I mean, if we set ip_ct_tcp_loose to 3, actually we will
> only consider first 2 packets, and we'll check if third packet is inside
> those 4 defined boundaries.

Yes, you're right, that decrementing is mis-placed. I'll work on the new
patch (both for 2.4/2.6).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-19  8:51     ` Jozsef Kadlecsik
@ 2004-05-19 10:14       ` Pablo Neira
  2004-05-20 11:27         ` Jozsef Kadlecsik
  0 siblings, 1 reply; 28+ messages in thread
From: Pablo Neira @ 2004-05-19 10:14 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Patrick McHardy,
	Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 2009 bytes --]

Hi Jozsef,

Jozsef Kadlecsik wrote:

>>>>f) Do you think that it's interesting adding fine grain locking in
>>>>tcp-window-tracking?
>>>>http://lists.netfilter.org/pipermail/netfilter-devel/2004-May/015166.html
>>>>        
>>>>
>>>First I was not quite convinced, but yes, I think it'd worth. However I
>>>believe we should introduce a generic per data locking in ip_conntrack
>>>instead of just a per TCP data locking. Thus there were no need the extra
>>>per protocol-helper locks either.
>>>
>>>      
>>>
>>yes, that would be also ok, but some other helpers like udp don't need
>>it, so is it worthy adding that field to the conntrack structure?
>>    
>>
>
>After thinking on it again, you're right, but I meant the
>application protocol-helper locks. If there is a TCP data lock, then there
>is no need for (TCP based) application protocol-helper locks.
>  
>

so you mean that we could use a single lock to protect both protocol and 
application helper private information, don't you? but in the case of 
the ftp helper the lock is protecting that global buffer, so your 
assumption would be wrong, wouldn't it?

I know that connection tracking shouldn't drop packets (in theory), but 
I think that it would be ok if we can add set an option via sysctl with 
the default return value. I mean, if we saw an unclean packet, return 
NF_DROP, just let the administrator choose how aggresive his tcp 
tracking is.

More feedback, I restructured the skeleton of the tcp_packet function, 
attached a patch. I moved up the call to the tcp_in_window function, so 
at first we check if a packets complies with Guido's four boundaries. 
Then we check if we saw a FIN packet which is out the boundaries so we 
came to the old state. And we get in the switch... I added both 
checkings which were performed after tcp_in_window (rst and set 
connection as assured) in the switch. Can you see any problem on this 
approach? if it's ok, I think it could sanitize the current structure of 
tcp_packet.

regards,
Pablo

[-- Attachment #2: pablo.patch --]
[-- Type: text/x-patch, Size: 6075 bytes --]

--- net/ipv4/netfilter/ip_conntrack_proto_tcp.c~	2004-05-18 11:51:02.000000000 +0200
+++ net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2004-05-19 11:42:20.000000000 +0200
@@ -518,9 +518,6 @@
 		 */
 		seq = end = sender->td_end;
 		
-	if (sender->loose)
-		sender->loose--;
-
 	DEBUGP("tcp_in_window: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu seq=%u ack=%u win=%u end=%u trim=%u\n",
 		NIPQUAD(iph->saddr), ntohs(tcph->source), NIPQUAD(iph->daddr), ntohs(tcph->dest),
 		seq, ack, win, end, 
@@ -549,6 +546,8 @@
 	     after(seq, sender->td_end - receiver->td_maxwin - 1) &&
 	     before(ack, receiver->td_end + 1) &&
 	     after(ack, receiver->td_end - MAXACKWINDOW(sender)))) {
+
+		sender->loose--;
 	    	/*
 		 * Take into account window scaling (RFC 1323).
 		 */
@@ -719,6 +718,7 @@
 	struct tcphdr *tcph = (struct tcphdr *)buff;
 	unsigned long timeout;
 	unsigned int index;
+	enum ip_conntrack_dir last_dir = conntrack->proto.tcp.last_dir;
 	
 	/* Smaller that minimal TCP header? */
 	if (skb_copy_bits(skb, iph->ihl * 4, buff, sizeof(*tcph)) != 0)
@@ -734,19 +734,34 @@
 	if (unclean(skb, iph, tcph))
 		return -NF_ACCEPT;
 	
-	WRITE_LOCK(&tcp_lock);
-	old_state = conntrack->proto.tcp.state;
 	dir = CTINFO2DIR(ctinfo);
 	index = get_conntrack_index(tcph);
-	new_state = tcp_conntracks[dir][index][old_state];
+	WRITE_LOCK(&tcp_lock);
+	old_state = conntrack->proto.tcp.state;
+	conntrack->proto.tcp.stored_seq = index;
+
+	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, tcph)) {
+		/* Invalid packet */
+		WRITE_UNLOCK(&tcp_lock);
+		return -NF_ACCEPT;
+	}
+
+	/* If FIN was trimmed off, don't change state. */
+	if (index != conntrack->proto.tcp.stored_seq) 
+		new_state = 
+		     tcp_conntracks[dir][conntrack->proto.tcp.stored_seq][old_state];
+	else 
+		new_state = tcp_conntracks[dir][index][old_state];
+
+	conntrack->proto.tcp.state = new_state;
 
 	switch (new_state) {
 	case TCP_CONNTRACK_IGNORE:
 		/* Either SYN in ORIGINAL, or SYN/ACK in REPLY direction. */
 		if (index == TCP_SYNACK_SET
-		    && conntrack->proto.tcp.stored_seq == TCP_SYN_SET
-		    && conntrack->proto.tcp.last_dir != dir
-		    && after(ntohl(tcph->ack_seq), conntrack->proto.tcp.last_seq)) {
+		    && (old_state == TCP_CONNTRACK_SYN_SENT 
+			    || old_state == TCP_CONNTRACK_SYN_RECV)
+		    && last_dir != dir) {
 			/* This SYN/ACK acknowledges a SYN that we earlier ignored 
 			 * as invalid. This means that the client and the server 
 			 * are both in sync, while the firewall is not. We kill 
@@ -762,9 +777,6 @@
 		    		conntrack->timeout.function((unsigned long)conntrack);
 		    	return -NF_ACCEPT;
 		}
-		conntrack->proto.tcp.stored_seq = index;
-		conntrack->proto.tcp.last_dir = dir;
-		conntrack->proto.tcp.last_seq = ntohl(tcph->seq);
 		
 		WRITE_UNLOCK(&tcp_lock);
 		if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
@@ -794,8 +806,8 @@
 	case TCP_CONNTRACK_CLOSE:
 		if (index == TCP_RST_SET
 		    && test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)
-		    && conntrack->proto.tcp.stored_seq <= TCP_SYNACK_SET
-		    && after(ntohl(tcph->ack), conntrack->proto.tcp.last_seq)) {
+		    && (old_state == TCP_CONNTRACK_SYN_SENT
+			    || old_state == TCP_CONNTRACK_SYN_RECV)) {
 			/* Ignore RST closing down invalid SYN we had let trough. */ 
 		    	WRITE_UNLOCK(&tcp_lock);
 			if (NET_RATELIMIT(ip_ct_tcp_log_invalid))
@@ -803,49 +815,41 @@
 					  "ip_conntrack_tcp: INVALID: invalid RST (ignored) ");
 			return NF_ACCEPT;
 		}
+
+		if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
+		/* If only reply is a RST, we can consider ourselves not to
+		   have an established connection: this is a fairly common
+		   problem case, so we can delete the conntrack
+		   immediately.  --RR */
+			if (tcph->rst) {
+				WRITE_UNLOCK(&tcp_lock);
+				if (del_timer(&conntrack->timeout))
+					conntrack->timeout.function((unsigned long)conntrack);
+				return NF_ACCEPT;
+			}
+		}
 		/* Just fall trough */
+	case TCP_CONNTRACK_ESTABLISHED:
+		if ((old_state == TCP_CONNTRACK_SYN_RECV
+		    || old_state == TCP_CONNTRACK_ESTABLISHED)
+		    && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) 
+		/* Set ASSURED if we see see valid ack in ESTABLISHED after SYN_RECV 
+		   or a valid answer for a picked up connection. */
+			set_bit(IPS_ASSURED_BIT, &conntrack->status);
 	default:
 		/* Keep compilers happy. */
 		break;
 	}
 
-	conntrack->proto.tcp.stored_seq = index;
-	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, tcph)) {
-		/* Invalid packet */
-		WRITE_UNLOCK(&tcp_lock);
-		return -NF_ACCEPT;
-	}
-	/* If FIN was trimmed off, don't change state. */
-	new_state = tcp_conntracks[dir][conntrack->proto.tcp.stored_seq][old_state];
-
 	DEBUGP("tcp_conntracks: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu syn=%i ack=%i fin=%i rst=%i old=%i new=%i\n",
 		NIPQUAD(iph->saddr), ntohs(tcph->source), NIPQUAD(iph->daddr), ntohs(tcph->dest),
 		(tcph->syn ? 1 : 0), (tcph->ack ? 1 : 0), (tcph->fin ? 1 : 0), (tcph->rst ? 1 : 0),
 		old_state, new_state);
 
-	conntrack->proto.tcp.state = new_state;
 	timeout = conntrack->proto.tcp.retrans >= ip_ct_tcp_max_retrans
 		  && *tcp_timeouts[new_state] > ip_ct_tcp_timeout_max_retrans ?
 		  ip_ct_tcp_timeout_max_retrans : *tcp_timeouts[new_state];
 
-	if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
-		/* If only reply is a RST, we can consider ourselves not to
-		   have an established connection: this is a fairly common
-		   problem case, so we can delete the conntrack
-		   immediately.  --RR */
-		if (tcph->rst) {
-			WRITE_UNLOCK(&tcp_lock);
-			if (del_timer(&conntrack->timeout))
-				conntrack->timeout.function((unsigned long)conntrack);
-			return NF_ACCEPT;
-		}
-	} else if ((old_state == TCP_CONNTRACK_SYN_RECV
-		    || old_state == TCP_CONNTRACK_ESTABLISHED)
-		   && new_state == TCP_CONNTRACK_ESTABLISHED) {
-		/* Set ASSURED if we see see valid ack in ESTABLISHED after SYN_RECV 
-		   or a valid answer for a picked up connection. */
-			set_bit(IPS_ASSURED_BIT, &conntrack->status);
-	}
 	WRITE_UNLOCK(&tcp_lock);
 	ip_ct_refresh(conntrack, timeout);
 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-19 10:14       ` Pablo Neira
@ 2004-05-20 11:27         ` Jozsef Kadlecsik
  2004-05-20 23:18           ` Pablo Neira
  0 siblings, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-05-20 11:27 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Patrick McHardy, Netfilter Development Mailinglist

On Wed, 19 May 2004, Pablo Neira wrote:

> >After thinking on it again, you're right, but I meant the
> >application protocol-helper locks. If there is a TCP data lock, then there
> >is no need for (TCP based) application protocol-helper locks.
>
> so you mean that we could use a single lock to protect both protocol and
> application helper private information, don't you? but in the case of
> the ftp helper the lock is protecting that global buffer, so your
> assumption would be wrong, wouldn't it?

I was wrong again. Those global buffers cause a lot of headache. :-(

> I know that connection tracking shouldn't drop packets (in theory), but
> I think that it would be ok if we can add set an option via sysctl with
> the default return value. I mean, if we saw an unclean packet, return
> NF_DROP, just let the administrator choose how aggresive his tcp
> tracking is.

It's funny: (practically) connection tracking itself does not drop
packets, it leaves the administrator to do the dirty job by setting up
explicit rules to drop INVALID packets (or implicitly, by letting trough
only valid packets) :-)

> More feedback, I restructured the skeleton of the tcp_packet function,
> attached a patch. I moved up the call to the tcp_in_window function, so
> at first we check if a packets complies with Guido's four boundaries.

Some tests are done before checking the window boundaries from good
reasons: checking special cases when packets are possibly (mostly!) out of
window. You cannot move them behind the boundary checkings.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-20 11:27         ` Jozsef Kadlecsik
@ 2004-05-20 23:18           ` Pablo Neira
  2004-05-21  1:40             ` Henrik Nordstrom
                               ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-20 23:18 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Netfilter Development Mailinglist

Hi Jozsef,

Jozsef Kadlecsik wrote:

>It's funny: (practically) connection tracking itself does not drop
>packets, it leaves the administrator to do the dirty job by setting up
>explicit rules to drop INVALID packets (or implicitly, by letting trough
>only valid packets) :-)
>  
>

sorry! I missed that... I should know better iptables !:-)

>
>Some tests are done before checking the window boundaries from good
>reasons: checking special cases when packets are possibly (mostly!) out of
>window. You cannot move them behind the boundary checkings.
>  
>

ok, I also missed that... more issues:

a) Sometimes we destroy the conntrack calling 
conntrack->timeout.function((unsigned long)conntrack) but we are 
returning -NF_ACCEPT, so we are calling implicitely nf_conntrack_put 
from conntrack core, could we have any unexpected problem?

b) We don't need to set assured bit all the time, I think that I told 
you, well if so, sorry for repeating me.

         } else if ((old_state == TCP_CONNTRACK_SYN_RECV
                    || old_state == TCP_CONNTRACK_ESTABLISHED)
-                   && new_state == TCP_CONNTRACK_ESTABLISHED)
+                 && new_state == TCP_CONNTRACK_ESTABLISHED
+                 && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {

c) In the TCP_CONNTRACK_SYN_SENT case, we saw a SYN packet while staying 
in state TIME_WAIT, so there's a reopened connection. Why do we destroy 
the conntrack and don't recycle /reuse this conntrack? I mean, 
reinitilize all the fields and let it continue its travel through the 
tcp tracking of instead of returning NF_REPEAT.

d) We could move up WRITE_UNLOCK(&tcp_lock) just before timeout is updated.

e) I'm not sure if it's worthy optimizing this case. In tcp_new, we are 
checking if a packet is clean as well as size of tcp header, then in 
tcp_packet we will do the same, so we are doing twice the same checking. 
If we consider that general case is that packets are clean, will be this 
double check worthy?

thanks Jozsef!,
insidious Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-20 23:18           ` Pablo Neira
@ 2004-05-21  1:40             ` Henrik Nordstrom
  2004-05-21  2:23               ` Patrick McHardy
  2004-05-21  9:33               ` Pablo Neira
  2004-05-21  8:11             ` Willy Tarreau
                               ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Henrik Nordstrom @ 2004-05-21  1:40 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Jozsef Kadlecsik, Netfilter Development Mailinglist

On Fri, 21 May 2004, Pablo Neira wrote:

> b) We don't need to set assured bit all the time, I think that I told 
> you, well if so, sorry for repeating me.

>          } else if ((old_state == TCP_CONNTRACK_SYN_RECV
>                     || old_state == TCP_CONNTRACK_ESTABLISHED)
> -                   && new_state == TCP_CONNTRACK_ESTABLISHED)
> +                 && new_state == TCP_CONNTRACK_ESTABLISHED
> +                 && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {

Are you sure this really is an optimization in the current code? Is 
test_bit really faster than set_bit when the bit is already set?

But still I second this change. If ctnetlink is used then there is 
additional actions taken when the bit is set and this should only be done 
once.

Also, is it really the case that this operation needs to be atomic?

> c) In the TCP_CONNTRACK_SYN_SENT case, we saw a SYN packet while staying 
> in state TIME_WAIT, so there's a reopened connection. Why do we destroy 
> the conntrack and don't recycle /reuse this conntrack? I mean, 
> reinitilize all the fields and let it continue its travel through the 
> tcp tracking of instead of returning NF_REPEAT.

Simplicity?

It is not a very common operation.

> e) I'm not sure if it's worthy optimizing this case. In tcp_new, we are 
> checking if a packet is clean as well as size of tcp header, then in 
> tcp_packet we will do the same, so we are doing twice the same checking. 
> If we consider that general case is that packets are clean, will be this 
> double check worthy?

The old conntrack code did not verify anything in tcp_new..  but it is 
wise to check that there is a tcp header before trying to access field 
members in the header and with tcp_new being called before tcp_packet on 
new connections but only tcp_packet on already known connections...

But there is one obvious way to optimize this.. move the check for a valid 
TCP header to the pkt_to_tuple function. But I think ICMP error processing 
may complicate this a bit so maybe it is not so easy.

Regards
Henrik

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21  1:40             ` Henrik Nordstrom
@ 2004-05-21  2:23               ` Patrick McHardy
  2004-05-21  9:58                 ` Henrik Nordstrom
  2004-05-21  9:33               ` Pablo Neira
  1 sibling, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2004-05-21  2:23 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: Pablo Neira, Jozsef Kadlecsik, Netfilter Development Mailinglist

Henrik Nordstrom wrote:
> On Fri, 21 May 2004, Pablo Neira wrote:
> 
> 
>>b) We don't need to set assured bit all the time, I think that I told 
>>you, well if so, sorry for repeating me.
> 
> 
>>         } else if ((old_state == TCP_CONNTRACK_SYN_RECV
>>                    || old_state == TCP_CONNTRACK_ESTABLISHED)
>>-                   && new_state == TCP_CONNTRACK_ESTABLISHED)
>>+                 && new_state == TCP_CONNTRACK_ESTABLISHED
>>+                 && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {
> 
> 
> Are you sure this really is an optimization in the current code? Is 
> test_bit really faster than set_bit when the bit is already set?

It should be, on x86 set_bit locks the bus:

static __inline__ void set_bit(int nr, volatile unsigned long * addr)
{
         __asm__ __volatile__( LOCK_PREFIX
                 "btsl %1,%0"
                 :"=m" (ADDR)
                 :"Ir" (nr));
}

but test_bit doesn't:

static __inline__ int variable_test_bit(int nr, const volatile unsigned 
long * addr)
{
         int oldbit;
 

         __asm__ __volatile__(
                 "btl %2,%1\n\tsbbl %0,%0"
                 :"=r" (oldbit)
                 :"m" (ADDR),"Ir" (nr));
         return oldbit;
}

Regards
Patrick

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21  2:23               ` Patrick McHardy
@ 2004-05-21  9:58                 ` Henrik Nordstrom
  2004-05-21 16:07                   ` Patrick McHardy
  0 siblings, 1 reply; 28+ messages in thread
From: Henrik Nordstrom @ 2004-05-21  9:58 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Pablo Neira, Jozsef Kadlecsik, Netfilter Development Mailinglist

On Fri, 21 May 2004, Patrick McHardy wrote:

> It should be, on x86 set_bit locks the bus:

Which was my second question.. does this set_bit really need to be atomic 
with a memory barrier?

Regards
Henrik

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21  9:58                 ` Henrik Nordstrom
@ 2004-05-21 16:07                   ` Patrick McHardy
  2004-05-21 22:56                     ` Henrik Nordstrom
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2004-05-21 16:07 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: Pablo Neira, Jozsef Kadlecsik, Netfilter Development Mailinglist

Henrik Nordstrom wrote:
> On Fri, 21 May 2004, Patrick McHardy wrote:
> 
> 
>>It should be, on x86 set_bit locks the bus:
> 
> 
> Which was my second question.. does this set_bit really need to be atomic 
> with a memory barrier?

I missed that question. It's probably not neccessary, we will get a
small race when zapping conntracks, but according to this comment in
unreplied() we already have one anyway.

/* There's a small race here where we may free a just-assured
    connection.  Too bad: we're in trouble anyway. */

Not sure if it's still correct, though.

Regards
Patrick

PS: As it turns out, I won't be gone over the weekend like I wrote.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21 16:07                   ` Patrick McHardy
@ 2004-05-21 22:56                     ` Henrik Nordstrom
  2004-05-22 13:04                       ` Stephane Ouellette
  2004-05-22 14:31                       ` Patrick McHardy
  0 siblings, 2 replies; 28+ messages in thread
From: Henrik Nordstrom @ 2004-05-21 22:56 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Pablo Neira, Jozsef Kadlecsik, Netfilter Development Mailinglist

On Fri, 21 May 2004, Patrick McHardy wrote:

> I missed that question. It's probably not neccessary, we will get a
> small race when zapping conntracks, but according to this comment in
> unreplied() we already have one anyway.

What I thought.. a __set_bit should be sufficient here, right?

Is there any good description somewhere on how these memory barriers 
works and when they are needed, or more precisely what is the difference 
between set_bit and __set_bit in terms of when/how/why each should be 
used?

But in this case there is many races.. a test_bit followed  by a set_bit 
is not atomic unles governed by a surrounding lock independent of if 
there is memory barriers of not around the operations, and the flushing of 
unconfirmed sessions is inherently racy anyway without strict locking. If 
there is strict locking then memory barriers in the set_bit should be 
completely irrelevant. Right?

Regards
Henrik

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21 22:56                     ` Henrik Nordstrom
@ 2004-05-22 13:04                       ` Stephane Ouellette
  2004-05-22 14:31                       ` Patrick McHardy
  1 sibling, 0 replies; 28+ messages in thread
From: Stephane Ouellette @ 2004-05-22 13:04 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: Netfilter Development Mailinglist

Henrik Nordstrom wrote:

>On Fri, 21 May 2004, Patrick McHardy wrote:
>
>  
>
>>I missed that question. It's probably not neccessary, we will get a
>>small race when zapping conntracks, but according to this comment in
>>unreplied() we already have one anyway.
>>    
>>
>
>What I thought.. a __set_bit should be sufficient here, right?
>
>Is there any good description somewhere on how these memory barriers 
>works and when they are needed, or more precisely what is the difference 
>between set_bit and __set_bit in terms of when/how/why each should be 
>used?
>  
>

Henrick,

    some CPUs can reorder memory accesses in order to optimize bus 
usage.  Compilers can reorder instructions to optimize cache and 
register usage. Memory barriers are used to avoid races when the order 
of execution is important.

    For example, suppose you create a proc file with create_proc_entry() :

myprocfile = create_proc_entry( . . . );
myprocfile->fops = my_driver_proc_fops;
myprocfile->data = &my_data_structure;
.
.
.


    Many people don't check that "data" is not NULL in the file 
operation procedures and that's OK as long as you can make sure that the 
"data" pointer is set BEFORE you set the "fops" pointer...  So, here's 
our example reworked :

myprocfile = create_proc_entry( . . . );
myprocfile->data = &my_data_structure;
myprocfile->fops = my_driver_proc_fops;
.
.
.


    This is much better but still imperfect.  GCC could reorder the 
instructions to better use the cache and/or processor registers.  We 
need to introduce a write memory barrier to GARANTEE that the "data" 
pointer is set BEFORE the "fops" pointer :

myprocfile = create_proc_entry( . . . );
myprocfile->data = &my_data_structure;
wmb();
myprocfile->fops = my_driver_proc_fops;
.
.
.

   In this case, a write memory barriers forbids write accesses to cross 
the barrier in either direction.  That's our garantee that "data" is 
ALWAYS valid inside the file operation procedures...

Regards,

Stephane Ouellette

>But in this case there is many races.. a test_bit followed  by a set_bit 
>is not atomic unles governed by a surrounding lock independent of if 
>there is memory barriers of not around the operations, and the flushing of 
>unconfirmed sessions is inherently racy anyway without strict locking. If 
>there is strict locking then memory barriers in the set_bit should be 
>completely irrelevant. Right?
>
>Regards
>Henrik
>

   

>  
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21 22:56                     ` Henrik Nordstrom
  2004-05-22 13:04                       ` Stephane Ouellette
@ 2004-05-22 14:31                       ` Patrick McHardy
  1 sibling, 0 replies; 28+ messages in thread
From: Patrick McHardy @ 2004-05-22 14:31 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: Pablo Neira, Jozsef Kadlecsik, Netfilter Development Mailinglist

Henrik Nordstrom wrote:
> On Fri, 21 May 2004, Patrick McHardy wrote:
> 
> 
>>I missed that question. It's probably not neccessary, we will get a
>>small race when zapping conntracks, but according to this comment in
>>unreplied() we already have one anyway.
> 
> 
> What I thought.. a __set_bit should be sufficient here, right?

I think so.

> 
> Is there any good description somewhere on how these memory barriers 
> works and when they are needed, or more precisely what is the difference 
> between set_bit and __set_bit in terms of when/how/why each should be 
> used?

Judging from the comments, the difference is that set_bit guarantees
multiple set_bits will be seen by all processors in the right order.
atomic is not the right word, since a single load or store is
indivisible on x86 anyway. The reason why one would use __set_bit
instead of just ORing the bits together is that it can operate on
almost arbitrarily large bitmaps.

> 
> But in this case there is many races.. a test_bit followed  by a set_bit 
> is not atomic unles governed by a surrounding lock independent of if 
> there is memory barriers of not around the operations, and the flushing of 
> unconfirmed sessions is inherently racy anyway without strict locking. If 
> there is strict locking then memory barriers in the set_bit should be 
> completely irrelevant. Right?

Yes.

Regards
Patrick

> 
> Regards
> Henrik
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21  1:40             ` Henrik Nordstrom
  2004-05-21  2:23               ` Patrick McHardy
@ 2004-05-21  9:33               ` Pablo Neira
  1 sibling, 0 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-21  9:33 UTC (permalink / raw)
  To: Henrik Nordstrom, Netfilter Development Mailinglist

Hi Henrik,

Henrik Nordstrom wrote:

>>e) I'm not sure if it's worthy optimizing this case. In tcp_new, we are 
>>checking if a packet is clean as well as size of tcp header, then in 
>>tcp_packet we will do the same, so we are doing twice the same checking. 
>>If we consider that general case is that packets are clean, will be this 
>>double check worthy?
>>    
>>
>
>But there is one obvious way to optimize this.. move the check for a valid 
>TCP header to the pkt_to_tuple function. 
>

We can't do that because we are not only checking the tcp header but 
also using the information for window scaling.

regards,
Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-20 23:18           ` Pablo Neira
  2004-05-21  1:40             ` Henrik Nordstrom
@ 2004-05-21  8:11             ` Willy Tarreau
  2004-05-21 10:06               ` Pablo Neira
  2004-05-21 10:14             ` Pablo Neira
  2004-05-21 12:13             ` Jozsef Kadlecsik
  3 siblings, 1 reply; 28+ messages in thread
From: Willy Tarreau @ 2004-05-21  8:11 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Jozsef Kadlecsik, Netfilter Development Mailinglist

On Fri, May 21, 2004 at 01:18:15AM +0200, Pablo Neira wrote:

> c) In the TCP_CONNTRACK_SYN_SENT case, we saw a SYN packet while staying 
> in state TIME_WAIT, so there's a reopened connection. Why do we destroy 
> the conntrack and don't recycle /reuse this conntrack? I mean, 
> reinitilize all the fields and let it continue its travel through the 
> tcp tracking of instead of returning NF_REPEAT.

I seem to recall that it was the only way to mark it NEW again. Otherwise,
it would still match the ESTABLISHED state and not necessarily be logged
or whatever, depending on the rules.

> e) I'm not sure if it's worthy optimizing this case. In tcp_new, we are 
> checking if a packet is clean as well as size of tcp header, then in 
> tcp_packet we will do the same, so we are doing twice the same checking. 
> If we consider that general case is that packets are clean, will be this 
> double check worthy?

I remember that we already discussed this once with Jozsef, and it ended up
that we didn't want to create a new session on invalid packets, so the check
in tcp_new was necessary, and that of course we wanted to do the check in
tcp_packet for all subsequent packets. One alternative would be to call
tcp_unclean() from tcp_packet() only if the packet has not passed through
tcp_new previously. Something based on the conntrack state might be OK IMHO.

Regards,
Willy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21  8:11             ` Willy Tarreau
@ 2004-05-21 10:06               ` Pablo Neira
  0 siblings, 0 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-21 10:06 UTC (permalink / raw)
  To: Willy Tarreau, Netfilter Development Mailinglist,
	Jozsef Kadlecsik

salut Willy,

Willy Tarreau wrote:

>On Fri, May 21, 2004 at 01:18:15AM +0200, Pablo Neira wrote:
> 
>  
>
>>c) In the TCP_CONNTRACK_SYN_SENT case, we saw a SYN packet while staying 
>>in state TIME_WAIT, so there's a reopened connection. Why do we destroy 
>>the conntrack and don't recycle /reuse this conntrack? I mean, 
>>reinitilize all the fields and let it continue its travel through the 
>>tcp tracking of instead of returning NF_REPEAT.
>>    
>>
>
>I seem to recall that it was the only way to mark it NEW again. Otherwise,
>it would still match the ESTABLISHED state and not necessarily be logged
>or whatever, depending on the rules.
>  
>

oh, that makes me understand the way it's done thanks :-).

>>e) I'm not sure if it's worthy optimizing this case. In tcp_new, we are 
>>checking if a packet is clean as well as size of tcp header, then in 
>>tcp_packet we will do the same, so we are doing twice the same checking. 
>>If we consider that general case is that packets are clean, will be this 
>>double check worthy?
>>    
>>
>
>I remember that we already discussed this once with Jozsef, and it ended up
>that we didn't want to create a new session on invalid packets, so the check
>in tcp_new was necessary, and that of course we wanted to do the check in
>tcp_packet for all subsequent packets. One alternative would be to call
>tcp_unclean() from tcp_packet() only if the packet has not passed through
>tcp_new previously. Something based on the conntrack state might be OK IMHO.
>  
>

yes, we could check ctinfo to see if it's a new conntrack. I started 
thinking that optimizing this case is not so worthy since it's done only 
once when we create a conntrack for a packet...

regards,
Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-20 23:18           ` Pablo Neira
  2004-05-21  1:40             ` Henrik Nordstrom
  2004-05-21  8:11             ` Willy Tarreau
@ 2004-05-21 10:14             ` Pablo Neira
  2004-05-21 12:13             ` Jozsef Kadlecsik
  3 siblings, 0 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-21 10:14 UTC (permalink / raw)
  To: Pablo Neira, Jozsef Kadlecsik, Netfilter Development Mailinglist

Pablo Neira wrote:

> d) We could move up WRITE_UNLOCK(&tcp_lock) just before timeout is 
> updated.


oh, I think this comment is not so clear, I meant:

        timeout = conntrack->proto.tcp.retrans >= ip_ct_tcp_max_retrans
                  && *tcp_timeouts[new_state] > 
ip_ct_tcp_timeout_max_retrans ?
                  ip_ct_tcp_timeout_max_retrans : *tcp_timeouts[new_state]
+     WRITE_UNLOCK(&tcp_lock);

regards,
Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-20 23:18           ` Pablo Neira
                               ` (2 preceding siblings ...)
  2004-05-21 10:14             ` Pablo Neira
@ 2004-05-21 12:13             ` Jozsef Kadlecsik
  2004-05-21 23:01               ` Pablo Neira
  3 siblings, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-05-21 12:13 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

Hi Pablo,

On Fri, 21 May 2004, Pablo Neira wrote:

> a) Sometimes we destroy the conntrack calling
> conntrack->timeout.function((unsigned long)conntrack) but we are
> returning -NF_ACCEPT, so we are calling implicitely nf_conntrack_put
> from conntrack core, could we have any unexpected problem?

Good catch, this is one of the rare cases where conntrack must explicitly
drop the packet, so the return value should be -NF_DROP. (-NF_ACCEPT does
not guarantee that the packet will truly be dropped, as it was discussed
in the previous mail.)

> b) We don't need to set assured bit all the time, I think that I told
> you, well if so, sorry for repeating me.
>
>          } else if ((old_state == TCP_CONNTRACK_SYN_RECV
>                     || old_state == TCP_CONNTRACK_ESTABLISHED)
> -                   && new_state == TCP_CONNTRACK_ESTABLISHED)
> +                 && new_state == TCP_CONNTRACK_ESTABLISHED
> +                 && !test_bit(IPS_ASSURED_BIT, &conntrack->status)) {

Yes, I'll add the test but as first, because that's the common case.

> c) In the TCP_CONNTRACK_SYN_SENT case, we saw a SYN packet while staying
> in state TIME_WAIT, so there's a reopened connection. Why do we destroy
> the conntrack and don't recycle /reuse this conntrack? I mean,
> reinitilize all the fields and let it continue its travel through the
> tcp tracking of instead of returning NF_REPEAT.

To complete Willy's answer the reason is that there are a lot of
informations attached to a conntrack entry (expectations, NAT). There is
no functionality to reset all those info, but to destroy the entry and let
the system create a new one, with high probability from the slab cache.

> d) We could move up WRITE_UNLOCK(&tcp_lock) just before timeout is updated.

Yes, I'll modify the code accordingly.

> e) I'm not sure if it's worthy optimizing this case. In tcp_new, we are
> checking if a packet is clean as well as size of tcp header, then in
> tcp_packet we will do the same, so we are doing twice the same checking.
> If we consider that general case is that packets are clean, will be this
> double check worthy?

That double-checking happens only for connection-initiating packets.
If we'd add a check of a flag wether the cleaness checking had been
done or not, then the normal packets would suffer from it. I think the
current solution is the optimal one.

> thanks Jozsef!,
> insidious Pablo

Actually, I'm really glad that there are guys who goes trough the code,
work hard to understand and try to improve, optimize it. We have been
planning to submit the code for kernel inclusion for a long time, but
it cannot be done lightheartedly, without a wider (known) test base and
code scrutinizing by more people.

Compared to the cvs version, the following modifications has been added to
my tree:

- tcph->ack should be tcph->ack_seq in tcp_packet at TCP_CONNTRACK_CLOSE
  case
- inlined segment_seq_plus_len
- explicit drop of SYN/ACK when the state machine is out of sync
- releasing lock moved up, after setting timeout parameter in tcp_packet
- IPS_ASSURED_BIT is checked before set in tcp_packet
- SACK support added

I'm waiting for the result of the validation of the SACK code by the guys
from Balabit, then a new patch will be released.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21 12:13             ` Jozsef Kadlecsik
@ 2004-05-21 23:01               ` Pablo Neira
  2004-05-22 12:54                 ` Jozsef Kadlecsik
  2004-06-01  9:48                 ` Jozsef Kadlecsik
  0 siblings, 2 replies; 28+ messages in thread
From: Pablo Neira @ 2004-05-21 23:01 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Netfilter Development Mailinglist

Hi Jozsef,

Jozsef Kadlecsik wrote:

>>a) Sometimes we destroy the conntrack calling
>>conntrack->timeout.function((unsigned long)conntrack) but we are
>>returning -NF_ACCEPT, so we are calling implicitely nf_conntrack_put
>>from conntrack core, could we have any unexpected problem?
>>    
>>
>
>Good catch, this is one of the rare cases where conntrack must explicitly
>drop the packet, so the return value should be -NF_DROP. (-NF_ACCEPT does
>not guarantee that the packet will truly be dropped, as it was discussed
>in the previous mail.)
>  
>

still one question, some tracing:

conntrack->timeout.function is death_by_timeout -> which calls 
ip_conntrack_put -> which calls nf_conntrack_put  which decrements 
refcount and if zero, we release the conntrack.

Now we return -NF_whatever, from conntrack core we call 
nf_conntrack_put, so same thing, we reduce refcount and if zero, we 
release the conntrack.

I don't know if I'm missing anything but we are decrementing twice 
conntrack refcount, at worst we could try to release an already released 
conntrack. Am I missing anything?
 

>Actually, I'm really glad that there are guys who goes trough the code,
>work hard to understand and try to improve, optimize it.
>

I do also appreciate that some people like you consider that my efforts 
are not a waste of time :-)

> We have been
>planning to submit the code for kernel inclusion for a long time, but
>it cannot be done lightheartedly, without a wider (known) test base and
>code scrutinizing by more people.
>  
>

I think that this could be an interesting new feature, as well as those 
patches to make fine grain locking in conntrack which still need some 
spins and Patrick's ipsec... well, there are interesting stuff in pom-ng.

>Compared to the cvs version, the following modifications has been added to
>my tree:
>
>- tcph->ack should be tcph->ack_seq in tcp_packet at TCP_CONNTRACK_CLOSE
>  case
>- inlined segment_seq_plus_len
>- explicit drop of SYN/ACK when the state machine is out of sync
>- releasing lock moved up, after setting timeout parameter in tcp_packet
>- IPS_ASSURED_BIT is checked before set in tcp_packet
>- SACK support added
>  
>

two issues missing:
- sender->loose decrements misplaced
- use of defined flags in tcp.h

and pending that you give your ok:
- fine grain locking for private tcp helper info

regards,
Pablo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21 23:01               ` Pablo Neira
@ 2004-05-22 12:54                 ` Jozsef Kadlecsik
  2004-06-01  9:48                 ` Jozsef Kadlecsik
  1 sibling, 0 replies; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-05-22 12:54 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

On Sat, 22 May 2004, Pablo Neira wrote:

> conntrack->timeout.function is death_by_timeout -> which calls
> ip_conntrack_put -> which calls nf_conntrack_put  which decrements
> refcount and if zero, we release the conntrack.
>
> Now we return -NF_whatever, from conntrack core we call
> nf_conntrack_put, so same thing, we reduce refcount and if zero, we
> release the conntrack.
>
> I don't know if I'm missing anything but we are decrementing twice
> conntrack refcount, at worst we could try to release an already released
> conntrack. Am I missing anything?

No, that's fine. The conntrack entry goes away when all references are
released and it springs into existence with refcount 1 (conntrack itself).
Then when we attach the conntrack entry to a packet, the refcount is
incremented so that it can't disappear under us. We just decrement
those values here to destroy conntrack.

> >- tcph->ack should be tcph->ack_seq in tcp_packet at TCP_CONNTRACK_CLOSE
> >  case
> >- inlined segment_seq_plus_len
> >- explicit drop of SYN/ACK when the state machine is out of sync
> >- releasing lock moved up, after setting timeout parameter in tcp_packet
> >- IPS_ASSURED_BIT is checked before set in tcp_packet
> >- SACK support added
>
> two issues missing:
> - sender->loose decrements misplaced

I missed to add that to the 'done' list above.

> - use of defined flags in tcp.h
>
> and pending that you give your ok:
> - fine grain locking for private tcp helper info

Yep, those are not included yet - SACK was more important to add.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-05-21 23:01               ` Pablo Neira
  2004-05-22 12:54                 ` Jozsef Kadlecsik
@ 2004-06-01  9:48                 ` Jozsef Kadlecsik
  2004-06-01 12:26                   ` Pablo Neira
  1 sibling, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-06-01  9:48 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

Hi Pablo,

As far as I remember there are two issues not settled down yet:

On Sat, 22 May 2004, Pablo Neira wrote:

> - use of defined flags in tcp.h

As Patrick reminded us, we cannot use the TCPCB_FLAG_* flags. And I'm
reluctant to use the enumerated values from include/linux/tcp.h because
then I should have to add shifts to the window tracking code - just to
eliminate the locally defined flags. What'd we gain by complicating the
code?

> and pending that you give your ok:
> - fine grain locking for private tcp helper info

First I added your code to the patch then removed it, sorry: it's overkill
to add per entry locking. In the (near) future we'll have per bucket
locking and then we can rely on that instead of a global tcp_lock.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-06-01  9:48                 ` Jozsef Kadlecsik
@ 2004-06-01 12:26                   ` Pablo Neira
  2004-06-02  5:13                     ` Willy Tarreau
  2004-06-08  9:55                     ` Jozsef Kadlecsik
  0 siblings, 2 replies; 28+ messages in thread
From: Pablo Neira @ 2004-06-01 12:26 UTC (permalink / raw)
  To: Jozsef Kadlecsik, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

Hi Jozsef,

Jozsef Kadlecsik wrote:

>>- use of defined flags in tcp.h
>>    
>>
>
>As Patrick reminded us, we cannot use the TCPCB_FLAG_* flags. And I'm
>reluctant to use the enumerated values from include/linux/tcp.h because
>then I should have to add shifts to the window tracking code - just to
>eliminate the locally defined flags. What'd we gain by complicating the
>code?
>  
>

nothing, I agree with you and Patrick.

>>and pending that you give your ok:
>>- fine grain locking for private tcp helper info
>>    
>>
>
>First I added your code to the patch then removed it, sorry: it's overkill
>to add per entry locking. In the (near) future we'll have per bucket
>locking and then we can rely on that instead of a global tcp_lock.
>  
>

I agree, it doesn't make much sense adding that now, but it will after 
pushing forward your patches to improve locking stuff in conntrack.

Jozsef, I attached a patch to this email to clean up the way we handle 
trimmed off FIN packets, let me know if it's worth it. Still one 
question, should we update the timeout in this case?

regards,
Pablo

[-- Attachment #2: tcp-win.patch --]
[-- Type: text/x-patch, Size: 2430 bytes --]

--- net/ipv4/netfilter/ip_conntrack_proto_tcp.c~	2004-06-01 13:33:39.000000000 +0200
+++ net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2004-06-01 14:18:43.000000000 +0200
@@ -485,7 +485,8 @@
                          enum ip_conntrack_dir dir,
                          const struct sk_buff *skb,
                          struct iphdr *iph,
-                         struct tcphdr *tcph)
+                         struct tcphdr *tcph,
+                         int *commit)
 {
 	struct ip_ct_tcp_state *sender = &state->seen[dir];
 	struct ip_ct_tcp_state *receiver = &state->seen[!dir];
@@ -592,7 +593,7 @@
 	    before(seq, sender->td_maxend)) {
 		end = sender->td_maxend;
 		if (state->stored_seq == TCP_FIN_SET)
-			state->stored_seq = TCP_ACK_SET;
+			*commit = 0;
 	}
 	DEBUGP("tcp_in_window: I=%i II=%i III=%i IV=%i\n",
 		before(end, sender->td_maxend + 1) 
@@ -784,6 +785,7 @@
 	struct tcphdr *tcph = (struct tcphdr *)buff;
 	unsigned long timeout;
 	unsigned int index, old_index;
+	int commit = 1;
 	
 	/* Smaller that minimal TCP header? */
 	if (skb_copy_bits(skb, iph->ihl * 4, buff, sizeof(*tcph)) != 0)
@@ -876,7 +878,7 @@
 
 	old_index =  conntrack->proto.tcp.stored_seq;
 	conntrack->proto.tcp.stored_seq = index;
-	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, tcph)) {
+	if (!tcp_in_window(&conntrack->proto.tcp, dir, skb, iph, tcph, &commit)) {
 		/* Invalid packet, restore previous state */
 		conntrack->proto.tcp.stored_seq = old_index;
 		WRITE_UNLOCK(&tcp_lock);
@@ -884,15 +886,17 @@
 	}
 	/* From now on we have got in-window packets */
 	
-	/* If FIN was trimmed off, don't change state. */
-	new_state = tcp_conntracks[dir][conntrack->proto.tcp.stored_seq][old_state];
-
 	DEBUGP("tcp_conntracks: src=%u.%u.%u.%u:%hu dst=%u.%u.%u.%u:%hu syn=%i ack=%i fin=%i rst=%i old=%i new=%i\n",
 		NIPQUAD(iph->saddr), ntohs(tcph->source), NIPQUAD(iph->daddr), ntohs(tcph->dest),
 		(tcph->syn ? 1 : 0), (tcph->ack ? 1 : 0), (tcph->fin ? 1 : 0), (tcph->rst ? 1 : 0),
 		old_state, new_state);
 
-	conntrack->proto.tcp.state = new_state;
+	/* If FIN was trimmed off, don't change state. */
+	if (commit)
+		conntrack->proto.tcp.state = new_state;
+	else 
+		conntrack->proto.tcp.stored_seq = old_index;
+
 	timeout = conntrack->proto.tcp.retrans >= ip_ct_tcp_max_retrans
 		  && *tcp_timeouts[new_state] > ip_ct_tcp_timeout_max_retrans ?
 		  ip_ct_tcp_timeout_max_retrans : *tcp_timeouts[new_state];

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-06-01 12:26                   ` Pablo Neira
@ 2004-06-02  5:13                     ` Willy Tarreau
  2004-06-08  9:55                     ` Jozsef Kadlecsik
  1 sibling, 0 replies; 28+ messages in thread
From: Willy Tarreau @ 2004-06-02  5:13 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Jozsef Kadlecsik, Netfilter Development Mailinglist

Hi Pablo,

On Tue, Jun 01, 2004 at 02:26:51PM +0200, Pablo Neira wrote:
> Jozsef, I attached a patch to this email to clean up the way we handle 
> trimmed off FIN packets, let me know if it's worth it. Still one 
> question, should we update the timeout in this case?

I like your cleanup with the notion of "commit", it's much cleaner than
changing things then repairing them afterwards. You're right IMHO, if we
ignore a packet, we should not update the timeout.

BTW, there's something else we could change. There are several fields
that we have re-used and abused to store information during evolution.
For example, 'stored_seq' is used to maintain the combination of flags seen
last ! Not very obvious from the name, and we could change it to something
more consistent with other names, such as 'last_index' or 'last_flags'.
Perhaps we should audit all these variables again and propose to change
a few names to something more explicit ? And we could use unions when
some field are shared for several purpose. Another example, stored_seq
is between 0 and 4 or 5 (don't remember, sorry), and last_dir is only
0 or 1, state is <13... Well, 8 bits are enough to store these 3 fields
together in an efficient way (we would have to adjust some #defines so
that we avoid bit shifts). The same is true for 'loose' and 'flags' in
struct ip_ct_tcp_state. With proper arrangement and due to alignment
constraints we could save 4 or 8 bytes. Well, perhaps I'm going too far
again :-)

Regards,
willy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] for some issues in tcp window tracking patch
  2004-06-01 12:26                   ` Pablo Neira
  2004-06-02  5:13                     ` Willy Tarreau
@ 2004-06-08  9:55                     ` Jozsef Kadlecsik
  1 sibling, 0 replies; 28+ messages in thread
From: Jozsef Kadlecsik @ 2004-06-08  9:55 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

Hi Pablo,

On Tue, 1 Jun 2004, Pablo Neira wrote:

> Jozsef, I attached a patch to this email to clean up the way we handle
> trimmed off FIN packets, let me know if it's worth it.

Based on your patch I cleaned up handling the index and renamed a
structure element to reflect for what we really use it.

> Still one question, should we update the timeout in this case?

Yes, because we just ignore the segment part (and the FIN flag) over the
right edge of the window of the receiver, but the other parts of the
segment are taken into account.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2004-06-08  9:55 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-16 23:48 [PATCH] for some issues in tcp window tracking patch Pablo Neira
2004-05-16 23:51 ` Pablo Neira
2004-05-17 13:00 ` Jozsef Kadlecsik
2004-05-17 21:41   ` Patrick McHardy
2004-05-18  9:57     ` Pablo Neira
2004-05-18 21:10   ` Pablo Neira
2004-05-19  8:51     ` Jozsef Kadlecsik
2004-05-19 10:14       ` Pablo Neira
2004-05-20 11:27         ` Jozsef Kadlecsik
2004-05-20 23:18           ` Pablo Neira
2004-05-21  1:40             ` Henrik Nordstrom
2004-05-21  2:23               ` Patrick McHardy
2004-05-21  9:58                 ` Henrik Nordstrom
2004-05-21 16:07                   ` Patrick McHardy
2004-05-21 22:56                     ` Henrik Nordstrom
2004-05-22 13:04                       ` Stephane Ouellette
2004-05-22 14:31                       ` Patrick McHardy
2004-05-21  9:33               ` Pablo Neira
2004-05-21  8:11             ` Willy Tarreau
2004-05-21 10:06               ` Pablo Neira
2004-05-21 10:14             ` Pablo Neira
2004-05-21 12:13             ` Jozsef Kadlecsik
2004-05-21 23:01               ` Pablo Neira
2004-05-22 12:54                 ` Jozsef Kadlecsik
2004-06-01  9:48                 ` Jozsef Kadlecsik
2004-06-01 12:26                   ` Pablo Neira
2004-06-02  5:13                     ` Willy Tarreau
2004-06-08  9:55                     ` Jozsef Kadlecsik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.