netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] Per route TCP options
@ 2009-10-20 15:22 Gilad Ben-Yossef
  2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
time stamp and window scale into per route entry feature options,
laying the ground to future removal of the relevant global sysctls.

You really only want to disable SACK, DSACK, time stamp or window
scale if you've got a piece of broken networking equipment somewhere 
as a stop gap until you can bring a big enough hammer to deal with
the broken network equipment. It doesn't make sense to "punish" the
entire connections going through the machine to destinations not 
related to the broken equipment.

This is doubly true when you're dealing with network containers
used to isolate several virtual domains.

Per route options implemented in free bits in the features route
entry property, which in some cases were reserved by name for these
options, so this does not inflate any structure and I expect that
when the apropriate global sysctls will be removed the overall code
base will be smaller.

Tested on x86 using Qemu/KVM.  

Will send the matching patch to iproute2 if/when this is ACKed or
if someone wants to test this.

Patchset based on original work by Ori Finkelman and Yoni Amit 
from ComSleep Ltd.

Gilad Ben-Yossef (8):
  Only parse time stamp TCP option in time wait sock
  Allow tcp_parse_options to consult dst entry
  Infrastructure for querying route entry features
  Add the no SACK route option feature
  Allow disabling TCP timestamp options per route
  Allow to turn off TCP window scale opt per route
  Allow disabling of DSACK TCP option per route
  Document future removal of sysctl_tcp_* options

 Documentation/feature-removal-schedule.txt |   12 ++++++++++++
 include/linux/rtnetlink.h                  |    6 ++++--
 include/net/dst.h                          |    8 +++++++-
 include/net/tcp.h                          |    3 ++-
 net/ipv4/syncookies.c                      |   27 ++++++++++++++-------------
 net/ipv4/tcp_input.c                       |   26 ++++++++++++++++++--------
 net/ipv4/tcp_ipv4.c                        |   19 ++++++++++---------
 net/ipv4/tcp_minisocks.c                   |    8 +++++---
 net/ipv4/tcp_output.c                      |   18 +++++++++++++-----
 net/ipv6/syncookies.c                      |   28 +++++++++++++++-------------
 net/ipv6/tcp_ipv6.c                        |    3 ++-
 11 files changed, 102 insertions(+), 56 deletions(-)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH RFC] Only parse time stamp TCP option in time wait sock
  2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef
@ 2009-10-20 15:22 ` Gilad Ben-Yossef
  2009-10-20 15:22   ` [PATCH RFC] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef
  2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef, Yony Amit

A time wait socket is established - we already know if time stamp
option is called for or not.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 net/ipv4/tcp_minisocks.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 624c3c9..c49a550 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -100,9 +100,8 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	struct tcp_options_received tmp_opt;
 	int paws_reject = 0;
 
-	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
-		tcp_parse_options(skb, &tmp_opt, 0);
+		tcp_parse_options(skb, &tmp_opt, 1);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent	= tcptw->tw_ts_recent;
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Allow tcp_parse_options to consult dst entry
  2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
@ 2009-10-20 15:22   ` Gilad Ben-Yossef
  2009-10-20 15:22     ` [PATCH RFC] Add dst_feature to query route entry features Gilad Ben-Yossef
  0 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

We need tcp_parse_options to be aware of dst_entry to 
take into account per dst_entry TCP options settings

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>

---
 include/net/tcp.h        |    3 ++-
 net/ipv4/syncookies.c    |   27 ++++++++++++++-------------
 net/ipv4/tcp_input.c     |    9 ++++++---
 net/ipv4/tcp_ipv4.c      |   19 ++++++++++---------
 net/ipv4/tcp_minisocks.c |    7 +++++--
 net/ipv6/syncookies.c    |   28 +++++++++++++++-------------
 net/ipv6/tcp_ipv6.c      |    3 ++-
 7 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..740d09b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -409,7 +409,8 @@ extern int			tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
 
 extern void			tcp_parse_options(struct sk_buff *skb,
 						  struct tcp_options_received *opt_rx,
-						  int estab);
+						  int estab,
+						  struct dst_entry *dst);
 
 extern u8			*tcp_parse_md5sig_option(struct tcphdr *th);
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..4990dd4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
 
-	/* check for timestamp cookie support */
-	memset(&tcp_opt, 0, sizeof(tcp_opt));
-	tcp_parse_options(skb, &tcp_opt, 0);
-
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
-
 	ret = NULL;
 	req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
 	if (!req)
@@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	ireq->loc_addr		= ip_hdr(skb)->daddr;
 	ireq->rmt_addr		= ip_hdr(skb)->saddr;
 	ireq->ecn_ok		= 0;
-	ireq->snd_wscale	= tcp_opt.snd_wscale;
-	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
-	ireq->sack_ok		= tcp_opt.sack_ok;
-	ireq->wscale_ok		= tcp_opt.wscale_ok;
-	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
-	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
 
 	/* We throwed the options of the initial SYN away, so we hope
 	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
@@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+	/* check for timestamp cookie support */
+	memset(&tcp_opt, 0, sizeof(tcp_opt));
+	tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst);
+
+	if (tcp_opt.saw_tstamp)
+		cookie_check_timestamp(&tcp_opt);
+
+	ireq->snd_wscale        = tcp_opt.snd_wscale;
+	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
+	ireq->sack_ok           = tcp_opt.sack_ok;
+	ireq->wscale_ok         = tcp_opt.wscale_ok;
+	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
+	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
 	/* Try to redo what tcp_v4_send_synack did. */
 	req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..d502f49 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,12 +3698,14 @@ old_ack:
  * the fast version below fails.
  */
 void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
-		       int estab)
+		       int estab,  struct dst_entry *dst)
 {
 	unsigned char *ptr;
 	struct tcphdr *th = tcp_hdr(skb);
 	int length = (th->doff * 4) - sizeof(struct tcphdr);
 
+	BUG_ON(!estab && !dst);
+
 	ptr = (unsigned char *)(th + 1);
 	opt_rx->saw_tstamp = 0;
 
@@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th,
 		if (tcp_parse_aligned_timestamp(tp, th))
 			return 1;
 	}
-	tcp_parse_options(skb, &tp->rx_opt, 1);
+	tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
 	return 1;
 }
 
@@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int saved_clamp = tp->rx_opt.mss_clamp;
+	struct dst_entry *dst = __sk_dst_get(sk);
 
-	tcp_parse_options(skb, &tp->rx_opt, 0);
+	tcp_parse_options(skb, &tp->rx_opt, 0, dst);
 
 	if (th->ack) {
 		/* rfc793:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..1cb0ec4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
 #endif
 
+	ireq = inet_rsk(req);
+	ireq->loc_addr = daddr;
+	ireq->rmt_addr = saddr;
+	ireq->no_srccheck = inet_sk(sk)->transparent;
+	ireq->opt = tcp_v4_save_options(sk, skb);
+
+	dst = inet_csk_route_req(sk, req);
 	tcp_clear_options(&tmp_opt);
 	tmp_opt.mss_clamp = 536;
 	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
 
-	tcp_parse_options(skb, &tmp_opt, 0);
+	tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 	if (want_cookie && !tmp_opt.saw_tstamp)
 		tcp_clear_options(&tmp_opt);
@@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 
 	tcp_openreq_init(req, &tmp_opt, skb);
 
-	ireq = inet_rsk(req);
-	ireq->loc_addr = daddr;
-	ireq->rmt_addr = saddr;
-	ireq->no_srccheck = inet_sk(sk)->transparent;
-	ireq->opt = tcp_v4_save_options(sk, skb);
-
 	if (security_inet_conn_request(sk, skb, req))
-		goto drop_and_free;
+		goto drop_and_release;
 
 	if (!want_cookie)
 		TCP_ECN_create_request(req, tcp_hdr(skb));
@@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		 */
 		if (tmp_opt.saw_tstamp &&
 		    tcp_death_row.sysctl_tw_recycle &&
-		    (dst = inet_csk_route_req(sk, req)) != NULL &&
+		    dst != NULL &&
 		    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
 		    peer->v4daddr == saddr) {
 			if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index c49a550..70ff955 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -101,7 +101,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	int paws_reject = 0;
 
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
-		tcp_parse_options(skb, &tmp_opt, 1);
+		tcp_parse_options(skb, &tmp_opt, 1, NULL);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent	= tcptw->tw_ts_recent;
@@ -499,10 +499,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	int paws_reject = 0;
 	struct tcp_options_received tmp_opt;
 	struct sock *child;
+	struct dst_entry *dst = inet_csk_route_req(sk, req);
 
 	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(struct tcphdr)>>2)) {
-		tcp_parse_options(skb, &tmp_opt, 0);
+		tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent = req->ts_recent;
@@ -515,6 +516,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+	dst_release(dst);
+
 	/* Check for pure retransmitted SYN. */
 	if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
 	    flg == TCP_FLAG_SYN &&
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 6b6ae91..6ece408 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -184,13 +184,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
 
-	/* check for timestamp cookie support */
-	memset(&tcp_opt, 0, sizeof(tcp_opt));
-	tcp_parse_options(skb, &tcp_opt, 0);
-
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
-
 	ret = NULL;
 	req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
 	if (!req)
@@ -224,12 +217,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	req->expires = 0UL;
 	req->retrans = 0;
 	ireq->ecn_ok		= 0;
-	ireq->snd_wscale	= tcp_opt.snd_wscale;
-	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
-	ireq->sack_ok		= tcp_opt.sack_ok;
-	ireq->wscale_ok		= tcp_opt.wscale_ok;
-	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
-	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
 	treq->rcv_isn = ntohl(th->seq) - 1;
 	treq->snt_isn = cookie;
 
@@ -264,6 +251,21 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 			goto out_free;
 	}
 
+	/* check for timestamp cookie support */
+	memset(&tcp_opt, 0, sizeof(tcp_opt));
+	tcp_parse_options(skb, &tcp_opt, 0, dst);
+
+	if (tcp_opt.saw_tstamp)
+		cookie_check_timestamp(&tcp_opt);
+
+	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
+	ireq->snd_wscale        = tcp_opt.snd_wscale;
+	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
+	ireq->sack_ok           = tcp_opt.sack_ok;
+	ireq->wscale_ok         = tcp_opt.wscale_ok;
+	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
+
 	req->window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW);
 	tcp_select_initial_window(tcp_full_space(sk), req->mss,
 				  &req->rcv_wnd, &req->window_clamp,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 21d100b..2eebab5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1165,6 +1165,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct request_sock *req = NULL;
 	__u32 isn = TCP_SKB_CB(skb)->when;
+	struct dst_entry *dst = __sk_dst_get(sk);
 #ifdef CONFIG_SYN_COOKIES
 	int want_cookie = 0;
 #else
@@ -1203,7 +1204,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	tmp_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
 	tmp_opt.user_mss = tp->rx_opt.user_mss;
 
-	tcp_parse_options(skb, &tmp_opt, 0);
+	tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 	if (want_cookie && !tmp_opt.saw_tstamp)
 		tcp_clear_options(&tmp_opt);
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Add dst_feature to query route entry features
  2009-10-20 15:22   ` [PATCH RFC] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef
@ 2009-10-20 15:22     ` Gilad Ben-Yossef
  2009-10-20 15:22       ` [PATCH RFC] Add the no SACK route option feature Gilad Ben-Yossef
  0 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Adding an accessor to existing  dst_entry feautres field and
refactor the only supported feature (allfrag) to use it.


Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>

---
 include/net/dst.h |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 5a900dd..b562be3 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -111,6 +111,12 @@ dst_metric(const struct dst_entry *dst, int metric)
 	return dst->metrics[metric-1];
 }
 
+static inline u32
+dst_feature(const struct dst_entry *dst, u32 feature)
+{
+	return dst_metric(dst, RTAX_FEATURES) & feature;
+}
+
 static inline u32 dst_mtu(const struct dst_entry *dst)
 {
 	u32 mtu = dst_metric(dst, RTAX_MTU);
@@ -136,7 +142,7 @@ static inline void set_dst_metric_rtt(struct dst_entry *dst, int metric,
 static inline u32
 dst_allfrag(const struct dst_entry *dst)
 {
-	int ret = dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_ALLFRAG;
+	int ret = dst_feature(dst,  RTAX_FEATURE_ALLFRAG);
 	/* Yes, _exactly_. This is paranoia. */
 	barrier();
 	return ret;
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Add the no SACK route option feature
  2009-10-20 15:22     ` [PATCH RFC] Add dst_feature to query route entry features Gilad Ben-Yossef
@ 2009-10-20 15:22       ` Gilad Ben-Yossef
  2009-10-20 15:22         ` [PATCH RFC] Allow disabling TCP timestamp options per route Gilad Ben-Yossef
  0 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Implement querying and acting upon the no sack bit in the features
field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>

---
 include/linux/rtnetlink.h |    2 +-
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index adf2068..9c802a6 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -377,7 +377,7 @@ enum
 #define RTAX_MAX (__RTAX_MAX - 1)
 
 #define RTAX_FEATURE_ECN	0x00000001
-#define RTAX_FEATURE_SACK	0x00000002
+#define RTAX_FEATURE_NO_SACK	0x00000002
 #define RTAX_FEATURE_TIMESTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d502f49..b14f780 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3763,7 +3763,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 				break;
 			case TCPOPT_SACK_PERM:
 				if (opsize == TCPOLEN_SACK_PERM && th->syn &&
-				    !estab && sysctl_tcp_sack) {
+				    !estab && sysctl_tcp_sack &&
+				    !dst_feature(dst, RTAX_FEATURE_NO_SACK)) {
 					opt_rx->sack_ok = 1;
 					tcp_sack_reset(opt_rx);
 				}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fcd278a..64db8dd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -464,6 +464,7 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 				struct tcp_md5sig_key **md5) {
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned size = 0;
+	struct dst_entry *dst = __sk_dst_get(sk);
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tp->af_specific->md5_lookup(sk, sk);
@@ -498,7 +499,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		opts->options |= OPTION_WSCALE;
 		size += TCPOLEN_WSCALE_ALIGNED;
 	}
-	if (likely(sysctl_tcp_sack)) {
+	if (likely(sysctl_tcp_sack &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_SACK))) {
 		opts->options |= OPTION_SACK_ADVERTISE;
 		if (unlikely(!(OPTION_TS & opts->options)))
 			size += TCPOLEN_SACKPERM_ALIGNED;
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Allow disabling TCP timestamp options per route
  2009-10-20 15:22       ` [PATCH RFC] Add the no SACK route option feature Gilad Ben-Yossef
@ 2009-10-20 15:22         ` Gilad Ben-Yossef
  2009-10-20 15:22           ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef
  0 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Implement querying and acting upon the no timestamp bit in the feature 
field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>

---
 include/linux/rtnetlink.h |    2 +-
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    8 ++++++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 9c802a6..2ab8c75 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -378,7 +378,7 @@ enum
 
 #define RTAX_FEATURE_ECN	0x00000001
 #define RTAX_FEATURE_NO_SACK	0x00000002
-#define RTAX_FEATURE_TIMESTAMP	0x00000004
+#define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 
 struct rta_session
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b14f780..d2f9742 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3755,7 +3755,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 			case TCPOPT_TIMESTAMP:
 				if ((opsize == TCPOLEN_TIMESTAMP) &&
 				    ((estab && opt_rx->tstamp_ok) ||
-				     (!estab && sysctl_tcp_timestamps))) {
+				     (!estab && sysctl_tcp_timestamps &&
+				      !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP)))) {
 					opt_rx->saw_tstamp = 1;
 					opt_rx->rcv_tsval = get_unaligned_be32(ptr);
 					opt_rx->rcv_tsecr = get_unaligned_be32(ptr + 4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 64db8dd..8f30c18 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -488,7 +488,9 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	opts->mss = tcp_advertise_mss(sk);
 	size += TCPOLEN_MSS_ALIGNED;
 
-	if (likely(sysctl_tcp_timestamps && *md5 == NULL)) {
+	if (likely(sysctl_tcp_timestamps &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) &&
+		   *md5 == NULL)) {
 		opts->options |= OPTION_TS;
 		opts->tsval = TCP_SKB_CB(skb)->when;
 		opts->tsecr = tp->rx_opt.ts_recent;
@@ -2317,7 +2319,9 @@ static void tcp_connect_init(struct sock *sk)
 	 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
 	 */
 	tp->tcp_header_len = sizeof(struct tcphdr) +
-		(sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
+		(sysctl_tcp_timestamps &&
+		(!dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) ?
+		  TCPOLEN_TSTAMP_ALIGNED : 0));
 
 #ifdef CONFIG_TCP_MD5SIG
 	if (tp->af_specific->md5_lookup(sk, sk) != NULL)
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Allow to turn off TCP window scale opt per route
  2009-10-20 15:22         ` [PATCH RFC] Allow disabling TCP timestamp options per route Gilad Ben-Yossef
@ 2009-10-20 15:22           ` Gilad Ben-Yossef
  2009-10-20 15:22             ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef
  2009-10-21  1:40             ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger
  0 siblings, 2 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Add and use no window scale bit in the features field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    6 ++++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2ab8c75..6784b34 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -380,6 +380,7 @@ enum
 #define RTAX_FEATURE_NO_SACK	0x00000002
 #define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
+#define RTAX_FEATURE_NO_WSCALE	0x00000010
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d2f9742..4f5e914 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3739,7 +3739,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 				break;
 			case TCPOPT_WINDOW:
 				if (opsize == TCPOLEN_WINDOW && th->syn &&
-				    !estab && sysctl_tcp_window_scaling) {
+				    !estab && sysctl_tcp_window_scaling &&
+				    !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)) {
 					__u8 snd_wscale = *(__u8 *)ptr;
 					opt_rx->wscale_ok = 1;
 					if (snd_wscale > 14) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8f30c18..ff60a21 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -496,7 +496,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		opts->tsecr = tp->rx_opt.ts_recent;
 		size += TCPOLEN_TSTAMP_ALIGNED;
 	}
-	if (likely(sysctl_tcp_window_scaling)) {
+	if (likely(sysctl_tcp_window_scaling &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_WSCALE))) {
 		opts->ws = tp->rx_opt.rcv_wscale;
 		opts->options |= OPTION_WSCALE;
 		size += TCPOLEN_WSCALE_ALIGNED;
@@ -2347,7 +2348,8 @@ static void tcp_connect_init(struct sock *sk)
 				  tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
 				  &tp->rcv_wnd,
 				  &tp->window_clamp,
-				  sysctl_tcp_window_scaling,
+				  (sysctl_tcp_window_scaling &&
+				   !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)),
 				  &rcv_wscale);
 
 	tp->rx_opt.rcv_wscale = rcv_wscale;
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Allow disabling of DSACK TCP option per route
  2009-10-20 15:22           ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef
@ 2009-10-20 15:22             ` Gilad Ben-Yossef
  2009-10-20 15:22               ` [PATCH RFC] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef
  2009-10-21  1:40             ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger
  1 sibling, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Add and use no DSCAK bit in the features field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>

---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6784b34..e78b60c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -381,6 +381,7 @@ enum
 #define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 #define RTAX_FEATURE_NO_WSCALE	0x00000010
+#define RTAX_FEATURE_NO_DSACK	0x00000020
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f5e914..4262da5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4080,8 +4080,10 @@ static inline int tcp_sack_extend(struct tcp_sack_block *sp, u32 seq,
 static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 
-	if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+	if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+	    !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
 		int mib_idx;
 
 		if (before(seq, tp->rcv_nxt))
@@ -4110,13 +4112,15 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
 static void tcp_send_dupack(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 
 	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
 	    before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
 		tcp_enter_quickack_mode(sk);
 
-		if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+		if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+		    !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
 			u32 end_seq = TCP_SKB_CB(skb)->end_seq;
 
 			if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RFC] Document future removal of sysctl_tcp_* options
  2009-10-20 15:22             ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef
@ 2009-10-20 15:22               ` Gilad Ben-Yossef
  0 siblings, 0 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

No need for global kill switches if we have per route entry controls.
Wait a year before removing in case someone is using this.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>

---
 Documentation/feature-removal-schedule.txt |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..60db855 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,18 @@ be removed from this file.
 
 ---------------------------
 
+What:	sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling,
+	sysctl_tcp_dsack
+When:	October 2010
+
+Why:	These options can now be set on a per route basis via the
+	RTAX_FEATURE_NO_SACK, RTAX_FEATURE_NO_TSTAMP, RTAX_FEATURE_NO_WSCALE,
+	and RTAX_FEATURE_NO_DSACK route feature options.
+
+Who:	Gilad Ben-Yossef <gilad@codefidence.com>
+
+---------------------------
+
 What:	PRISM54
 When:	2.6.34
 
-- 
1.5.6.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef
  2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
@ 2009-10-20 15:44 ` Eric Dumazet
  2009-10-20 16:11   ` Gilad Ben-Yossef
  2009-10-20 16:26 ` Rick Jones
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2009-10-20 15:44 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: netdev, ori

Gilad Ben-Yossef a écrit :
> Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
> time stamp and window scale into per route entry feature options,
> laying the ground to future removal of the relevant global sysctls.
> 
> You really only want to disable SACK, DSACK, time stamp or window
> scale if you've got a piece of broken networking equipment somewhere 
> as a stop gap until you can bring a big enough hammer to deal with
> the broken network equipment. It doesn't make sense to "punish" the
> entire connections going through the machine to destinations not 
> related to the broken equipment.
> 
> This is doubly true when you're dealing with network containers
> used to isolate several virtual domains.
> 
> Per route options implemented in free bits in the features route
> entry property, which in some cases were reserved by name for these
> options, so this does not inflate any structure and I expect that
> when the apropriate global sysctls will be removed the overall code
> base will be smaller.
> 
> Tested on x86 using Qemu/KVM.  
> 
> Will send the matching patch to iproute2 if/when this is ACKed or
> if someone wants to test this.
> 
> Patchset based on original work by Ori Finkelman and Yoni Amit 
> from ComSleep Ltd.
> 
> Gilad Ben-Yossef (8):
>   Only parse time stamp TCP option in time wait sock
>   Allow tcp_parse_options to consult dst entry
>   Infrastructure for querying route entry features
>   Add the no SACK route option feature
>   Allow disabling TCP timestamp options per route
>   Allow to turn off TCP window scale opt per route
>   Allow disabling of DSACK TCP option per route
>   Document future removal of sysctl_tcp_* options
> 

Interesting... But you should give numbers to your patches so that we know their order

You could also do the ECN part for consistency (ie RTAX_FEATURE_ECN -> RTAX_FEATURE_NO_ECN)

And please post iproute2 patches as well :)

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet
@ 2009-10-20 16:11   ` Gilad Ben-Yossef
  2009-10-20 18:53     ` Ilpo Järvinen
  0 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 16:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, ori

Eric Dumazet wrote:

> Gilad Ben-Yossef a écrit :
>   
>> Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
>> time stamp and window scale into per route entry feature options,
>> laying the ground to future removal of the relevant global sysctls.
>>
>> ...
> Interesting... But you should give numbers to your patches so that we know their order
>   
Will do.
> You could also do the ECN part for consistency (ie RTAX_FEATURE_ECN -> RTAX_FEATURE_NO_ECN)
>   
That and  sysctl_tcp_mtu_probing are on my todo list, assuming the 
general direction is accepted, of course.

Specifically, I couldn't understand why sysctl_tcp_ecn is documented to 
be a boolean value, but is initialized to 2 and queried with if 
(sysctl_tcp_ecn == 1) so I decided to let it be until I figure it out... 
;-)
> And please post iproute2 patches as well :)
>   
Will do. Note that the patch I have still has an ugly bit that needs 
addressing - it sets the features by name, but right now only displays 
numeric values to the, for "show". This of course will be fixed, but it 
does work for testing.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef
  2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
  2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet
@ 2009-10-20 16:26 ` Rick Jones
  2009-10-21  8:04   ` Gilad Ben-Yossef
  2009-10-21  0:36 ` David Miller
  2009-10-21  2:13 ` Bill Fink
  4 siblings, 1 reply; 23+ messages in thread
From: Rick Jones @ 2009-10-20 16:26 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: netdev, ori

Gilad Ben-Yossef wrote:
> Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
> time stamp and window scale into per route entry feature options,
> laying the ground to future removal of the relevant global sysctls.
> 
> You really only want to disable SACK, DSACK, time stamp or window
> scale if you've got a piece of broken networking equipment somewhere 
> as a stop gap until you can bring a big enough hammer to deal with
> the broken network equipment. It doesn't make sense to "punish" the
> entire connections going through the machine to destinations not 
> related to the broken equipment.

Is it really only the case that those options get disabled for broken networking 
equipment?  Does this presage making all TCP options per-route only?

rick jones

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 16:11   ` Gilad Ben-Yossef
@ 2009-10-20 18:53     ` Ilpo Järvinen
  2009-10-21  8:15       ` Gilad Ben-Yossef
  0 siblings, 1 reply; 23+ messages in thread
From: Ilpo Järvinen @ 2009-10-20 18:53 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Eric Dumazet, Netdev, ori

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1052 bytes --]

On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote:

> Eric Dumazet wrote:
> 
> > Gilad Ben-Yossef a écrit :
> >   
> > > Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
> > > time stamp and window scale into per route entry feature options,
> > > laying the ground to future removal of the relevant global sysctls.
> > >
> > > ...
> > Interesting... But you should give numbers to your patches so that we know
> > their order
> >   
> Will do.
> > You could also do the ECN part for consistency (ie RTAX_FEATURE_ECN ->
> > RTAX_FEATURE_NO_ECN)
> >   
> That and  sysctl_tcp_mtu_probing are on my todo list, assuming the general
> direction is accepted, of course.
> 
> Specifically, I couldn't understand why sysctl_tcp_ecn is documented to be a
> boolean value, but is initialized to 2 and queried with if (sysctl_tcp_ecn ==
> 1) so I decided to let it be until I figure it out... ;-)

Ah, I didn't notice that "- BOOLEAN" there so I modified only the 
description (some blindness to caps perhaps :-)), did you perhaps read
it fully?

-- 
 i.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef
                   ` (2 preceding siblings ...)
  2009-10-20 16:26 ` Rick Jones
@ 2009-10-21  0:36 ` David Miller
  2009-10-21  8:10   ` Gilad Ben-Yossef
  2009-10-21  2:13 ` Bill Fink
  4 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2009-10-21  0:36 UTC (permalink / raw)
  To: gilad; +Cc: netdev, ori


When sending more than one patch in a patch set, you must
number your patches so that people know in what order your
pathes should be applied.

Mailing lists and other entities reorder list postings
quite severely, so it's not "obvious" what order your
postings matter just based upon arrival order.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Allow to turn off TCP window scale opt per route
  2009-10-20 15:22           ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef
  2009-10-20 15:22             ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef
@ 2009-10-21  1:40             ` Stephen Hemminger
  2009-10-21  8:18               ` Gilad Ben-Yossef
  1 sibling, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2009-10-21  1:40 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: netdev, ori, Gilad Ben-Yossef

On Tue, 20 Oct 2009 17:22:39 +0200
Gilad Ben-Yossef <gilad@codefidence.com> wrote:

> Add and use no window scale bit in the features field.
> 
> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
> Sigend-off-by: Yony Amit <yony@comsleep.com>

The same effect can by just using window limit on route


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef
                   ` (3 preceding siblings ...)
  2009-10-21  0:36 ` David Miller
@ 2009-10-21  2:13 ` Bill Fink
  2009-10-21  8:27   ` Gilad Ben-Yossef
  4 siblings, 1 reply; 23+ messages in thread
From: Bill Fink @ 2009-10-21  2:13 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: netdev, ori

On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote:

> Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
> time stamp and window scale into per route entry feature options,
> laying the ground to future removal of the relevant global sysctls.
> 
> You really only want to disable SACK, DSACK, time stamp or window
> scale if you've got a piece of broken networking equipment somewhere 
> as a stop gap until you can bring a big enough hammer to deal with
> the broken network equipment. It doesn't make sense to "punish" the
> entire connections going through the machine to destinations not 
> related to the broken equipment.

For certain test situations, it is sometimes desirable to globally
disable TCP timestamps.  Although I have not personally wanted to
globally disable the other mentioned features, I can imagine test
scenarios where it could be useful.  Admittedly it could also be
accomplished with per route features, just not as conveniently,
especially if there are a large number of interfaces and/or routes.

						-Bill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 16:26 ` Rick Jones
@ 2009-10-21  8:04   ` Gilad Ben-Yossef
  2009-10-21  8:21     ` Florian Westphal
  0 siblings, 1 reply; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21  8:04 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, ori

Rick Jones wrote:

> Gilad Ben-Yossef wrote:
>> Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
>> time stamp and window scale into per route entry feature options,
>> laying the ground to future removal of the relevant global sysctls.
>>
>> You really only want to disable SACK, DSACK, time stamp or window
>> scale if you've got a piece of broken networking equipment somewhere 
>> as a stop gap until you can bring a big enough hammer to deal with
>> the broken network equipment. It doesn't make sense to "punish" the
>> entire connections going through the machine to destinations not 
>> related to the broken equipment.
>
> Is it really only the case that those options get disabled for broken 
> networking equipment?  Does this presage making all TCP options 
> per-route only?
Well, I assume it might be the case that there are situations where you 
are trying to communicate over some exotic link  where the networking 
equipment is not broken as such, but the unusual properties of the link 
makes one of the features not desirable. I can't think of such a 
situation right now off the top of my head, but maybe they exist.

The point is that even then you are more then likely to wish to turn off 
these options to specific destination and routes (that go over said 
exotic link) and keep using them over others - e.g. timestamp OK for 
local LAN, but for default route that goes over exotic TCP/IP over 
carrier penguins turn it off.

To sum it up, I think making these options per route is a win no matter 
the situation. The question I am less certain about if it is also 
desirable to have a global kill switch in addition to the per route 
options. My gut feeling is that this is not needed once you have a per 
route option.

Cheers,
Gilad



-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-21  0:36 ` David Miller
@ 2009-10-21  8:10   ` Gilad Ben-Yossef
  0 siblings, 0 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21  8:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ori

David Miller wrote:

> When sending more than one patch in a patch set, you must
> number your patches so that people know in what order your
> pathes should be applied.
>
> Mailing lists and other entities reorder list postings
> quite severely, so it's not "obvious" what order your
> postings matter just based upon arrival order.
>
>   
Yes, of course. How silly of me not to notice. My apologies.

git and I are currently have a bit of love hate thing going on...

I will re-send the patch set in a more favorable form.

Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-20 18:53     ` Ilpo Järvinen
@ 2009-10-21  8:15       ` Gilad Ben-Yossef
  0 siblings, 0 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21  8:15 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Eric Dumazet, Netdev, ori

Hi Ilpo,


Ilpo Järvinen wrote:

>
>> Specifically, I couldn't understand why sysctl_tcp_ecn is documented to be a
>> boolean value, but is initialized to 2 and queried with if (sysctl_tcp_ecn ==
>> 1) so I decided to let it be until I figure it out... ;-)
>>     
>
> Ah, I didn't notice that "- BOOLEAN" there so I modified only the 
> description (some blindness to caps perhaps :-)), did you perhaps read
> it fully?
>
>   
Yes, I did finally. What threw me off was that that 
TCP_ECN_create_request() were the value if checked as a boolean was 
inlined in the include file. I've missed that. Thanks for the tip.

At any rate, it means that "noecn" is not a good route option feature, 
since it is not boolean. I plan to handle those in the next patch set, 
if the current one is accepted.

Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Allow to turn off TCP window scale opt per route
  2009-10-21  1:40             ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger
@ 2009-10-21  8:18               ` Gilad Ben-Yossef
  0 siblings, 0 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21  8:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, ori

Stephen Hemminger wrote:

> On Tue, 20 Oct 2009 17:22:39 +0200
> Gilad Ben-Yossef <gilad@codefidence.com> wrote:
>
>   
>> Add and use no window scale bit in the features field.
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
>> Sigend-off-by: Yony Amit <yony@comsleep.com>
>>     
>
> The same effect can by just using window limit on route
>   
Actually, not exactly. There is a subtle but important difference 
between "I support window scaling, but choose a scale of 0", which is 
what your suggestion will imply and "I don't support window scaling at 
all", which is what the suggested feature field does.

To make things more complicated, until the previous patch I sent, when 
Linux would choose a window scale of zero for any reason, it would also 
wrongfully report it does not support window scaling at all to the other 
side, with some subtle bug caused by it, but this is no longer the 
behavior and I believe rightly so.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-21  8:04   ` Gilad Ben-Yossef
@ 2009-10-21  8:21     ` Florian Westphal
  2009-10-21  8:40       ` Gilad Ben-Yossef
  0 siblings, 1 reply; 23+ messages in thread
From: Florian Westphal @ 2009-10-21  8:21 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Rick Jones, netdev, ori

Gilad Ben-Yossef <gilad@codefidence.com> wrote:
> The point is that even then you are more then likely to wish to turn
> off these options to specific destination and routes (that go over
> said exotic link) and keep using them over others - e.g. timestamp
> OK for local LAN, but for default route that goes over exotic TCP/IP
> over carrier penguins turn it off.

If you need a bandaid solution, its is possible to replace tcp
options with NOOPs using netfilters TCPOPTSTRIP target.

There is also an ECN target to work around ECN blackholes.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-21  2:13 ` Bill Fink
@ 2009-10-21  8:27   ` Gilad Ben-Yossef
  0 siblings, 0 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21  8:27 UTC (permalink / raw)
  To: Bill Fink; +Cc: netdev, ori

Hi,


Bill Fink wrote:

> On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote:
>
>   
>> Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
>> time stamp and window scale into per route entry feature options,
>> laying the ground to future removal of the relevant global sysctls.
>>
>> You really only want to disable SACK, DSACK, time stamp or window
>> scale if you've got a piece of broken networking equipment somewhere 
>> as a stop gap until you can bring a big enough hammer to deal with
>> the broken network equipment. It doesn't make sense to "punish" the
>> entire connections going through the machine to destinations not 
>> related to the broken equipment.
>>     
>
> For certain test situations, it is sometimes desirable to globally
> disable TCP timestamps.  Although I have not personally wanted to
> globally disable the other mentioned features, I can imagine test
> scenarios where it could be useful.  Admittedly it could also be
> accomplished with per route features, just not as conveniently,
> especially if there are a large number of interfaces and/or routes.
>
>   
I don't feel that strongly about it either way, just trying to hearken to
Linus "Linux is bloated" message by not fostering duplicate ways to do
the same thing... ;-)

If the consensus is to adopt the route features but leave the global
kill switches, I certainly have no problem with that.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RFC] Per route TCP options
  2009-10-21  8:21     ` Florian Westphal
@ 2009-10-21  8:40       ` Gilad Ben-Yossef
  0 siblings, 0 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21  8:40 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Rick Jones, netdev, ori

Hi Florian,


Florian Westphal wrote:

> Gilad Ben-Yossef <gilad@codefidence.com> wrote:
>   
>> The point is that even then you are more then likely to wish to turn
>> off these options to specific destination and routes (that go over
>> said exotic link) and keep using them over others - e.g. timestamp
>> OK for local LAN, but for default route that goes over exotic TCP/IP
>> over carrier penguins turn it off.
>>     
>
> If you need a bandaid solution, its is possible to replace tcp
> options with NOOPs using netfilters TCPOPTSTRIP target.
>
> There is also an ECN target to work around ECN blackholes.
>   
Thanks for the tip. It is appreciated.

The band aid solution that Ori and company found was
simply to patch the local copy of the kernel used but being that sitting
on a bunch of "private" patches seems like a lose-lose situation (there
is a term you don't hear much when talking to MBAs :-) I'm now trying
to get it mainlined for them.


Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-10-21  8:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef
2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
2009-10-20 15:22   ` [PATCH RFC] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef
2009-10-20 15:22     ` [PATCH RFC] Add dst_feature to query route entry features Gilad Ben-Yossef
2009-10-20 15:22       ` [PATCH RFC] Add the no SACK route option feature Gilad Ben-Yossef
2009-10-20 15:22         ` [PATCH RFC] Allow disabling TCP timestamp options per route Gilad Ben-Yossef
2009-10-20 15:22           ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef
2009-10-20 15:22             ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef
2009-10-20 15:22               ` [PATCH RFC] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef
2009-10-21  1:40             ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger
2009-10-21  8:18               ` Gilad Ben-Yossef
2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet
2009-10-20 16:11   ` Gilad Ben-Yossef
2009-10-20 18:53     ` Ilpo Järvinen
2009-10-21  8:15       ` Gilad Ben-Yossef
2009-10-20 16:26 ` Rick Jones
2009-10-21  8:04   ` Gilad Ben-Yossef
2009-10-21  8:21     ` Florian Westphal
2009-10-21  8:40       ` Gilad Ben-Yossef
2009-10-21  0:36 ` David Miller
2009-10-21  8:10   ` Gilad Ben-Yossef
2009-10-21  2:13 ` Bill Fink
2009-10-21  8:27   ` Gilad Ben-Yossef

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).