* [PATCH RFC] Per route TCP options
@ 2009-10-20 15:22 Gilad Ben-Yossef
2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
` (4 more replies)
0 siblings, 5 replies; 23+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
Turn the global sysctls allowing disabling of TCP SACK, DSCAK,
time stamp and window scale into per route entry feature options,
laying the ground to future removal of the relevant global sysctls.
You really only want to disable SACK, DSACK, time stamp or window
scale if you've got a piece of broken networking equipment somewhere
as a stop gap until you can bring a big enough hammer to deal with
the broken network equipment. It doesn't make sense to "punish" the
entire connections going through the machine to destinations not
related to the broken equipment.
This is doubly true when you're dealing with network containers
used to isolate several virtual domains.
Per route options implemented in free bits in the features route
entry property, which in some cases were reserved by name for these
options, so this does not inflate any structure and I expect that
when the apropriate global sysctls will be removed the overall code
base will be smaller.
Tested on x86 using Qemu/KVM.
Will send the matching patch to iproute2 if/when this is ACKed or
if someone wants to test this.
Patchset based on original work by Ori Finkelman and Yoni Amit
from ComSleep Ltd.
Gilad Ben-Yossef (8):
Only parse time stamp TCP option in time wait sock
Allow tcp_parse_options to consult dst entry
Infrastructure for querying route entry features
Add the no SACK route option feature
Allow disabling TCP timestamp options per route
Allow to turn off TCP window scale opt per route
Allow disabling of DSACK TCP option per route
Document future removal of sysctl_tcp_* options
Documentation/feature-removal-schedule.txt | 12 ++++++++++++
include/linux/rtnetlink.h | 6 ++++--
include/net/dst.h | 8 +++++++-
include/net/tcp.h | 3 ++-
net/ipv4/syncookies.c | 27 ++++++++++++++-------------
net/ipv4/tcp_input.c | 26 ++++++++++++++++++--------
net/ipv4/tcp_ipv4.c | 19 ++++++++++---------
net/ipv4/tcp_minisocks.c | 8 +++++---
net/ipv4/tcp_output.c | 18 +++++++++++++-----
net/ipv6/syncookies.c | 28 +++++++++++++++-------------
net/ipv6/tcp_ipv6.c | 3 ++-
11 files changed, 102 insertions(+), 56 deletions(-)
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH RFC] Only parse time stamp TCP option in time wait sock 2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef 2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet ` (3 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef, Yony Amit A time wait socket is established - we already know if time stamp option is called for or not. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Signed-off-by: Ori Finkelman <ori@comsleep.com> Signed-off-by: Yony Amit <yony@comsleep.com> --- net/ipv4/tcp_minisocks.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 624c3c9..c49a550 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -100,9 +100,8 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, struct tcp_options_received tmp_opt; int paws_reject = 0; - tmp_opt.saw_tstamp = 0; if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) { - tcp_parse_options(skb, &tmp_opt, 0); + tcp_parse_options(skb, &tmp_opt, 1); if (tmp_opt.saw_tstamp) { tmp_opt.ts_recent = tcptw->tw_ts_recent; -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Allow tcp_parse_options to consult dst entry 2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Add dst_feature to query route entry features Gilad Ben-Yossef 0 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef We need tcp_parse_options to be aware of dst_entry to take into account per dst_entry TCP options settings Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> --- include/net/tcp.h | 3 ++- net/ipv4/syncookies.c | 27 ++++++++++++++------------- net/ipv4/tcp_input.c | 9 ++++++--- net/ipv4/tcp_ipv4.c | 19 ++++++++++--------- net/ipv4/tcp_minisocks.c | 7 +++++-- net/ipv6/syncookies.c | 28 +++++++++++++++------------- net/ipv6/tcp_ipv6.c | 3 ++- 7 files changed, 54 insertions(+), 42 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 03a49c7..740d09b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -409,7 +409,8 @@ extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, extern void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx, - int estab); + int estab, + struct dst_entry *dst); extern u8 *tcp_parse_md5sig_option(struct tcphdr *th); diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index a6e0e07..4990dd4 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV); - /* check for timestamp cookie support */ - memset(&tcp_opt, 0, sizeof(tcp_opt)); - tcp_parse_options(skb, &tcp_opt, 0); - - if (tcp_opt.saw_tstamp) - cookie_check_timestamp(&tcp_opt); - ret = NULL; req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */ if (!req) @@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, ireq->loc_addr = ip_hdr(skb)->daddr; ireq->rmt_addr = ip_hdr(skb)->saddr; ireq->ecn_ok = 0; - ireq->snd_wscale = tcp_opt.snd_wscale; - ireq->rcv_wscale = tcp_opt.rcv_wscale; - ireq->sack_ok = tcp_opt.sack_ok; - ireq->wscale_ok = tcp_opt.wscale_ok; - ireq->tstamp_ok = tcp_opt.saw_tstamp; - req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0; /* We throwed the options of the initial SYN away, so we hope * the ACK carries the same options again (see RFC1122 4.2.3.8) @@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, } } + /* check for timestamp cookie support */ + memset(&tcp_opt, 0, sizeof(tcp_opt)); + tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst); + + if (tcp_opt.saw_tstamp) + cookie_check_timestamp(&tcp_opt); + + ireq->snd_wscale = tcp_opt.snd_wscale; + ireq->rcv_wscale = tcp_opt.rcv_wscale; + ireq->sack_ok = tcp_opt.sack_ok; + ireq->wscale_ok = tcp_opt.wscale_ok; + ireq->tstamp_ok = tcp_opt.saw_tstamp; + req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0; + /* Try to redo what tcp_v4_send_synack did. */ req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d86784b..d502f49 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3698,12 +3698,14 @@ old_ack: * the fast version below fails. */ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx, - int estab) + int estab, struct dst_entry *dst) { unsigned char *ptr; struct tcphdr *th = tcp_hdr(skb); int length = (th->doff * 4) - sizeof(struct tcphdr); + BUG_ON(!estab && !dst); + ptr = (unsigned char *)(th + 1); opt_rx->saw_tstamp = 0; @@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th, if (tcp_parse_aligned_timestamp(tp, th)) return 1; } - tcp_parse_options(skb, &tp->rx_opt, 1); + tcp_parse_options(skb, &tp->rx_opt, 1, NULL); return 1; } @@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk); int saved_clamp = tp->rx_opt.mss_clamp; + struct dst_entry *dst = __sk_dst_get(sk); - tcp_parse_options(skb, &tp->rx_opt, 0); + tcp_parse_options(skb, &tp->rx_opt, 0, dst); if (th->ack) { /* rfc793: diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 7cda24b..1cb0ec4 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops; #endif + ireq = inet_rsk(req); + ireq->loc_addr = daddr; + ireq->rmt_addr = saddr; + ireq->no_srccheck = inet_sk(sk)->transparent; + ireq->opt = tcp_v4_save_options(sk, skb); + + dst = inet_csk_route_req(sk, req); tcp_clear_options(&tmp_opt); tmp_opt.mss_clamp = 536; tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss; - tcp_parse_options(skb, &tmp_opt, 0); + tcp_parse_options(skb, &tmp_opt, 0, dst); if (want_cookie && !tmp_opt.saw_tstamp) tcp_clear_options(&tmp_opt); @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) tcp_openreq_init(req, &tmp_opt, skb); - ireq = inet_rsk(req); - ireq->loc_addr = daddr; - ireq->rmt_addr = saddr; - ireq->no_srccheck = inet_sk(sk)->transparent; - ireq->opt = tcp_v4_save_options(sk, skb); - if (security_inet_conn_request(sk, skb, req)) - goto drop_and_free; + goto drop_and_release; if (!want_cookie) TCP_ECN_create_request(req, tcp_hdr(skb)); @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) */ if (tmp_opt.saw_tstamp && tcp_death_row.sysctl_tw_recycle && - (dst = inet_csk_route_req(sk, req)) != NULL && + dst != NULL && (peer = rt_get_peer((struct rtable *)dst)) != NULL && peer->v4daddr == saddr) { if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL && diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index c49a550..70ff955 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -101,7 +101,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, int paws_reject = 0; if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) { - tcp_parse_options(skb, &tmp_opt, 1); + tcp_parse_options(skb, &tmp_opt, 1, NULL); if (tmp_opt.saw_tstamp) { tmp_opt.ts_recent = tcptw->tw_ts_recent; @@ -499,10 +499,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, int paws_reject = 0; struct tcp_options_received tmp_opt; struct sock *child; + struct dst_entry *dst = inet_csk_route_req(sk, req); tmp_opt.saw_tstamp = 0; if (th->doff > (sizeof(struct tcphdr)>>2)) { - tcp_parse_options(skb, &tmp_opt, 0); + tcp_parse_options(skb, &tmp_opt, 0, dst); if (tmp_opt.saw_tstamp) { tmp_opt.ts_recent = req->ts_recent; @@ -515,6 +516,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, } } + dst_release(dst); + /* Check for pure retransmitted SYN. */ if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn && flg == TCP_FLAG_SYN && diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index 6b6ae91..6ece408 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -184,13 +184,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV); - /* check for timestamp cookie support */ - memset(&tcp_opt, 0, sizeof(tcp_opt)); - tcp_parse_options(skb, &tcp_opt, 0); - - if (tcp_opt.saw_tstamp) - cookie_check_timestamp(&tcp_opt); - ret = NULL; req = inet6_reqsk_alloc(&tcp6_request_sock_ops); if (!req) @@ -224,12 +217,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) req->expires = 0UL; req->retrans = 0; ireq->ecn_ok = 0; - ireq->snd_wscale = tcp_opt.snd_wscale; - ireq->rcv_wscale = tcp_opt.rcv_wscale; - ireq->sack_ok = tcp_opt.sack_ok; - ireq->wscale_ok = tcp_opt.wscale_ok; - ireq->tstamp_ok = tcp_opt.saw_tstamp; - req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0; treq->rcv_isn = ntohl(th->seq) - 1; treq->snt_isn = cookie; @@ -264,6 +251,21 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) goto out_free; } + /* check for timestamp cookie support */ + memset(&tcp_opt, 0, sizeof(tcp_opt)); + tcp_parse_options(skb, &tcp_opt, 0, dst); + + if (tcp_opt.saw_tstamp) + cookie_check_timestamp(&tcp_opt); + + req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0; + + ireq->snd_wscale = tcp_opt.snd_wscale; + ireq->rcv_wscale = tcp_opt.rcv_wscale; + ireq->sack_ok = tcp_opt.sack_ok; + ireq->wscale_ok = tcp_opt.wscale_ok; + ireq->tstamp_ok = tcp_opt.saw_tstamp; + req->window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW); tcp_select_initial_window(tcp_full_space(sk), req->mss, &req->rcv_wnd, &req->window_clamp, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 21d100b..2eebab5 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1165,6 +1165,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) struct tcp_sock *tp = tcp_sk(sk); struct request_sock *req = NULL; __u32 isn = TCP_SKB_CB(skb)->when; + struct dst_entry *dst = __sk_dst_get(sk); #ifdef CONFIG_SYN_COOKIES int want_cookie = 0; #else @@ -1203,7 +1204,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) tmp_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr); tmp_opt.user_mss = tp->rx_opt.user_mss; - tcp_parse_options(skb, &tmp_opt, 0); + tcp_parse_options(skb, &tmp_opt, 0, dst); if (want_cookie && !tmp_opt.saw_tstamp) tcp_clear_options(&tmp_opt); -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Add dst_feature to query route entry features 2009-10-20 15:22 ` [PATCH RFC] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Add the no SACK route option feature Gilad Ben-Yossef 0 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef Adding an accessor to existing dst_entry feautres field and refactor the only supported feature (allfrag) to use it. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> --- include/net/dst.h | 8 +++++++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/include/net/dst.h b/include/net/dst.h index 5a900dd..b562be3 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -111,6 +111,12 @@ dst_metric(const struct dst_entry *dst, int metric) return dst->metrics[metric-1]; } +static inline u32 +dst_feature(const struct dst_entry *dst, u32 feature) +{ + return dst_metric(dst, RTAX_FEATURES) & feature; +} + static inline u32 dst_mtu(const struct dst_entry *dst) { u32 mtu = dst_metric(dst, RTAX_MTU); @@ -136,7 +142,7 @@ static inline void set_dst_metric_rtt(struct dst_entry *dst, int metric, static inline u32 dst_allfrag(const struct dst_entry *dst) { - int ret = dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_ALLFRAG; + int ret = dst_feature(dst, RTAX_FEATURE_ALLFRAG); /* Yes, _exactly_. This is paranoia. */ barrier(); return ret; -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Add the no SACK route option feature 2009-10-20 15:22 ` [PATCH RFC] Add dst_feature to query route entry features Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow disabling TCP timestamp options per route Gilad Ben-Yossef 0 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef Implement querying and acting upon the no sack bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> --- include/linux/rtnetlink.h | 2 +- net/ipv4/tcp_input.c | 3 ++- net/ipv4/tcp_output.c | 4 +++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index adf2068..9c802a6 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -377,7 +377,7 @@ enum #define RTAX_MAX (__RTAX_MAX - 1) #define RTAX_FEATURE_ECN 0x00000001 -#define RTAX_FEATURE_SACK 0x00000002 +#define RTAX_FEATURE_NO_SACK 0x00000002 #define RTAX_FEATURE_TIMESTAMP 0x00000004 #define RTAX_FEATURE_ALLFRAG 0x00000008 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d502f49..b14f780 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3763,7 +3763,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx, break; case TCPOPT_SACK_PERM: if (opsize == TCPOLEN_SACK_PERM && th->syn && - !estab && sysctl_tcp_sack) { + !estab && sysctl_tcp_sack && + !dst_feature(dst, RTAX_FEATURE_NO_SACK)) { opt_rx->sack_ok = 1; tcp_sack_reset(opt_rx); } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index fcd278a..64db8dd 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -464,6 +464,7 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb, struct tcp_md5sig_key **md5) { struct tcp_sock *tp = tcp_sk(sk); unsigned size = 0; + struct dst_entry *dst = __sk_dst_get(sk); #ifdef CONFIG_TCP_MD5SIG *md5 = tp->af_specific->md5_lookup(sk, sk); @@ -498,7 +499,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb, opts->options |= OPTION_WSCALE; size += TCPOLEN_WSCALE_ALIGNED; } - if (likely(sysctl_tcp_sack)) { + if (likely(sysctl_tcp_sack && + !dst_feature(dst, RTAX_FEATURE_NO_SACK))) { opts->options |= OPTION_SACK_ADVERTISE; if (unlikely(!(OPTION_TS & opts->options))) size += TCPOLEN_SACKPERM_ALIGNED; -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Allow disabling TCP timestamp options per route 2009-10-20 15:22 ` [PATCH RFC] Add the no SACK route option feature Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef 0 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef Implement querying and acting upon the no timestamp bit in the feature field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> --- include/linux/rtnetlink.h | 2 +- net/ipv4/tcp_input.c | 3 ++- net/ipv4/tcp_output.c | 8 ++++++-- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 9c802a6..2ab8c75 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -378,7 +378,7 @@ enum #define RTAX_FEATURE_ECN 0x00000001 #define RTAX_FEATURE_NO_SACK 0x00000002 -#define RTAX_FEATURE_TIMESTAMP 0x00000004 +#define RTAX_FEATURE_NO_TSTAMP 0x00000004 #define RTAX_FEATURE_ALLFRAG 0x00000008 struct rta_session diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b14f780..d2f9742 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3755,7 +3755,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx, case TCPOPT_TIMESTAMP: if ((opsize == TCPOLEN_TIMESTAMP) && ((estab && opt_rx->tstamp_ok) || - (!estab && sysctl_tcp_timestamps))) { + (!estab && sysctl_tcp_timestamps && + !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP)))) { opt_rx->saw_tstamp = 1; opt_rx->rcv_tsval = get_unaligned_be32(ptr); opt_rx->rcv_tsecr = get_unaligned_be32(ptr + 4); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 64db8dd..8f30c18 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -488,7 +488,9 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb, opts->mss = tcp_advertise_mss(sk); size += TCPOLEN_MSS_ALIGNED; - if (likely(sysctl_tcp_timestamps && *md5 == NULL)) { + if (likely(sysctl_tcp_timestamps && + !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) && + *md5 == NULL)) { opts->options |= OPTION_TS; opts->tsval = TCP_SKB_CB(skb)->when; opts->tsecr = tp->rx_opt.ts_recent; @@ -2317,7 +2319,9 @@ static void tcp_connect_init(struct sock *sk) * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT. */ tp->tcp_header_len = sizeof(struct tcphdr) + - (sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0); + (sysctl_tcp_timestamps && + (!dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) ? + TCPOLEN_TSTAMP_ALIGNED : 0)); #ifdef CONFIG_TCP_MD5SIG if (tp->af_specific->md5_lookup(sk, sk) != NULL) -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Allow to turn off TCP window scale opt per route 2009-10-20 15:22 ` [PATCH RFC] Allow disabling TCP timestamp options per route Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef 2009-10-21 1:40 ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger 0 siblings, 2 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef Add and use no window scale bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> --- include/linux/rtnetlink.h | 1 + net/ipv4/tcp_input.c | 3 ++- net/ipv4/tcp_output.c | 6 ++++-- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 2ab8c75..6784b34 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -380,6 +380,7 @@ enum #define RTAX_FEATURE_NO_SACK 0x00000002 #define RTAX_FEATURE_NO_TSTAMP 0x00000004 #define RTAX_FEATURE_ALLFRAG 0x00000008 +#define RTAX_FEATURE_NO_WSCALE 0x00000010 struct rta_session { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d2f9742..4f5e914 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3739,7 +3739,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx, break; case TCPOPT_WINDOW: if (opsize == TCPOLEN_WINDOW && th->syn && - !estab && sysctl_tcp_window_scaling) { + !estab && sysctl_tcp_window_scaling && + !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)) { __u8 snd_wscale = *(__u8 *)ptr; opt_rx->wscale_ok = 1; if (snd_wscale > 14) { diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 8f30c18..ff60a21 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -496,7 +496,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb, opts->tsecr = tp->rx_opt.ts_recent; size += TCPOLEN_TSTAMP_ALIGNED; } - if (likely(sysctl_tcp_window_scaling)) { + if (likely(sysctl_tcp_window_scaling && + !dst_feature(dst, RTAX_FEATURE_NO_WSCALE))) { opts->ws = tp->rx_opt.rcv_wscale; opts->options |= OPTION_WSCALE; size += TCPOLEN_WSCALE_ALIGNED; @@ -2347,7 +2348,8 @@ static void tcp_connect_init(struct sock *sk) tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0), &tp->rcv_wnd, &tp->window_clamp, - sysctl_tcp_window_scaling, + (sysctl_tcp_window_scaling && + !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)), &rcv_wscale); tp->rx_opt.rcv_wscale = rcv_wscale; -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Allow disabling of DSACK TCP option per route 2009-10-20 15:22 ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef 2009-10-21 1:40 ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger 1 sibling, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef Add and use no DSCAK bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> --- include/linux/rtnetlink.h | 1 + net/ipv4/tcp_input.c | 8 ++++++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 6784b34..e78b60c 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -381,6 +381,7 @@ enum #define RTAX_FEATURE_NO_TSTAMP 0x00000004 #define RTAX_FEATURE_ALLFRAG 0x00000008 #define RTAX_FEATURE_NO_WSCALE 0x00000010 +#define RTAX_FEATURE_NO_DSACK 0x00000020 struct rta_session { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4f5e914..4262da5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4080,8 +4080,10 @@ static inline int tcp_sack_extend(struct tcp_sack_block *sp, u32 seq, static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq) { struct tcp_sock *tp = tcp_sk(sk); + struct dst_entry *dst = __sk_dst_get(sk); - if (tcp_is_sack(tp) && sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && sysctl_tcp_dsack && + !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) { int mib_idx; if (before(seq, tp->rcv_nxt)) @@ -4110,13 +4112,15 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq) static void tcp_send_dupack(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); + struct dst_entry *dst = __sk_dst_get(sk); if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq && before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST); tcp_enter_quickack_mode(sk); - if (tcp_is_sack(tp) && sysctl_tcp_dsack) { + if (tcp_is_sack(tp) && sysctl_tcp_dsack && + !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) { u32 end_seq = TCP_SKB_CB(skb)->end_seq; if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH RFC] Document future removal of sysctl_tcp_* options 2009-10-20 15:22 ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef @ 2009-10-20 15:22 ` Gilad Ben-Yossef 0 siblings, 0 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 15:22 UTC (permalink / raw) To: netdev; +Cc: ori, Gilad Ben-Yossef No need for global kill switches if we have per route entry controls. Wait a year before removing in case someone is using this. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> --- Documentation/feature-removal-schedule.txt | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 89a47b5..60db855 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -6,6 +6,18 @@ be removed from this file. --------------------------- +What: sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling, + sysctl_tcp_dsack +When: October 2010 + +Why: These options can now be set on a per route basis via the + RTAX_FEATURE_NO_SACK, RTAX_FEATURE_NO_TSTAMP, RTAX_FEATURE_NO_WSCALE, + and RTAX_FEATURE_NO_DSACK route feature options. + +Who: Gilad Ben-Yossef <gilad@codefidence.com> + +--------------------------- + What: PRISM54 When: 2.6.34 -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Allow to turn off TCP window scale opt per route 2009-10-20 15:22 ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef @ 2009-10-21 1:40 ` Stephen Hemminger 2009-10-21 8:18 ` Gilad Ben-Yossef 1 sibling, 1 reply; 23+ messages in thread From: Stephen Hemminger @ 2009-10-21 1:40 UTC (permalink / raw) To: Gilad Ben-Yossef; +Cc: netdev, ori, Gilad Ben-Yossef On Tue, 20 Oct 2009 17:22:39 +0200 Gilad Ben-Yossef <gilad@codefidence.com> wrote: > Add and use no window scale bit in the features field. > > Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> > Sigend-off-by: Ori Finkelman <ori@comsleep.com> > Sigend-off-by: Yony Amit <yony@comsleep.com> The same effect can by just using window limit on route ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Allow to turn off TCP window scale opt per route 2009-10-21 1:40 ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger @ 2009-10-21 8:18 ` Gilad Ben-Yossef 0 siblings, 0 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-21 8:18 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev, ori Stephen Hemminger wrote: > On Tue, 20 Oct 2009 17:22:39 +0200 > Gilad Ben-Yossef <gilad@codefidence.com> wrote: > > >> Add and use no window scale bit in the features field. >> >> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> >> Sigend-off-by: Ori Finkelman <ori@comsleep.com> >> Sigend-off-by: Yony Amit <yony@comsleep.com> >> > > The same effect can by just using window limit on route > Actually, not exactly. There is a subtle but important difference between "I support window scaling, but choose a scale of 0", which is what your suggestion will imply and "I don't support window scaling at all", which is what the suggested feature field does. To make things more complicated, until the previous patch I sent, when Linux would choose a window scale of zero for any reason, it would also wrongfully report it does not support window scaling at all to the other side, with some subtle bug caused by it, but this is no longer the behavior and I believe rightly so. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef @ 2009-10-20 15:44 ` Eric Dumazet 2009-10-20 16:11 ` Gilad Ben-Yossef 2009-10-20 16:26 ` Rick Jones ` (2 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Eric Dumazet @ 2009-10-20 15:44 UTC (permalink / raw) To: Gilad Ben-Yossef; +Cc: netdev, ori Gilad Ben-Yossef a écrit : > Turn the global sysctls allowing disabling of TCP SACK, DSCAK, > time stamp and window scale into per route entry feature options, > laying the ground to future removal of the relevant global sysctls. > > You really only want to disable SACK, DSACK, time stamp or window > scale if you've got a piece of broken networking equipment somewhere > as a stop gap until you can bring a big enough hammer to deal with > the broken network equipment. It doesn't make sense to "punish" the > entire connections going through the machine to destinations not > related to the broken equipment. > > This is doubly true when you're dealing with network containers > used to isolate several virtual domains. > > Per route options implemented in free bits in the features route > entry property, which in some cases were reserved by name for these > options, so this does not inflate any structure and I expect that > when the apropriate global sysctls will be removed the overall code > base will be smaller. > > Tested on x86 using Qemu/KVM. > > Will send the matching patch to iproute2 if/when this is ACKed or > if someone wants to test this. > > Patchset based on original work by Ori Finkelman and Yoni Amit > from ComSleep Ltd. > > Gilad Ben-Yossef (8): > Only parse time stamp TCP option in time wait sock > Allow tcp_parse_options to consult dst entry > Infrastructure for querying route entry features > Add the no SACK route option feature > Allow disabling TCP timestamp options per route > Allow to turn off TCP window scale opt per route > Allow disabling of DSACK TCP option per route > Document future removal of sysctl_tcp_* options > Interesting... But you should give numbers to your patches so that we know their order You could also do the ECN part for consistency (ie RTAX_FEATURE_ECN -> RTAX_FEATURE_NO_ECN) And please post iproute2 patches as well :) Thanks ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet @ 2009-10-20 16:11 ` Gilad Ben-Yossef 2009-10-20 18:53 ` Ilpo Järvinen 0 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-20 16:11 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, ori Eric Dumazet wrote: > Gilad Ben-Yossef a écrit : > >> Turn the global sysctls allowing disabling of TCP SACK, DSCAK, >> time stamp and window scale into per route entry feature options, >> laying the ground to future removal of the relevant global sysctls. >> >> ... > Interesting... But you should give numbers to your patches so that we know their order > Will do. > You could also do the ECN part for consistency (ie RTAX_FEATURE_ECN -> RTAX_FEATURE_NO_ECN) > That and sysctl_tcp_mtu_probing are on my todo list, assuming the general direction is accepted, of course. Specifically, I couldn't understand why sysctl_tcp_ecn is documented to be a boolean value, but is initialized to 2 and queried with if (sysctl_tcp_ecn == 1) so I decided to let it be until I figure it out... ;-) > And please post iproute2 patches as well :) > Will do. Note that the patch I have still has an ugly bit that needs addressing - it sets the features by name, but right now only displays numeric values to the, for "show". This of course will be fixed, but it does work for testing. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 16:11 ` Gilad Ben-Yossef @ 2009-10-20 18:53 ` Ilpo Järvinen 2009-10-21 8:15 ` Gilad Ben-Yossef 0 siblings, 1 reply; 23+ messages in thread From: Ilpo Järvinen @ 2009-10-20 18:53 UTC (permalink / raw) To: Gilad Ben-Yossef; +Cc: Eric Dumazet, Netdev, ori [-- Attachment #1: Type: TEXT/PLAIN, Size: 1052 bytes --] On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote: > Eric Dumazet wrote: > > > Gilad Ben-Yossef a écrit : > > > > > Turn the global sysctls allowing disabling of TCP SACK, DSCAK, > > > time stamp and window scale into per route entry feature options, > > > laying the ground to future removal of the relevant global sysctls. > > > > > > ... > > Interesting... But you should give numbers to your patches so that we know > > their order > > > Will do. > > You could also do the ECN part for consistency (ie RTAX_FEATURE_ECN -> > > RTAX_FEATURE_NO_ECN) > > > That and sysctl_tcp_mtu_probing are on my todo list, assuming the general > direction is accepted, of course. > > Specifically, I couldn't understand why sysctl_tcp_ecn is documented to be a > boolean value, but is initialized to 2 and queried with if (sysctl_tcp_ecn == > 1) so I decided to let it be until I figure it out... ;-) Ah, I didn't notice that "- BOOLEAN" there so I modified only the description (some blindness to caps perhaps :-)), did you perhaps read it fully? -- i. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 18:53 ` Ilpo Järvinen @ 2009-10-21 8:15 ` Gilad Ben-Yossef 0 siblings, 0 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-21 8:15 UTC (permalink / raw) To: Ilpo Järvinen; +Cc: Eric Dumazet, Netdev, ori Hi Ilpo, Ilpo Järvinen wrote: > >> Specifically, I couldn't understand why sysctl_tcp_ecn is documented to be a >> boolean value, but is initialized to 2 and queried with if (sysctl_tcp_ecn == >> 1) so I decided to let it be until I figure it out... ;-) >> > > Ah, I didn't notice that "- BOOLEAN" there so I modified only the > description (some blindness to caps perhaps :-)), did you perhaps read > it fully? > > Yes, I did finally. What threw me off was that that TCP_ECN_create_request() were the value if checked as a boolean was inlined in the include file. I've missed that. Thanks for the tip. At any rate, it means that "noecn" is not a good route option feature, since it is not boolean. I plan to handle those in the next patch set, if the current one is accepted. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef 2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet @ 2009-10-20 16:26 ` Rick Jones 2009-10-21 8:04 ` Gilad Ben-Yossef 2009-10-21 0:36 ` David Miller 2009-10-21 2:13 ` Bill Fink 4 siblings, 1 reply; 23+ messages in thread From: Rick Jones @ 2009-10-20 16:26 UTC (permalink / raw) To: Gilad Ben-Yossef; +Cc: netdev, ori Gilad Ben-Yossef wrote: > Turn the global sysctls allowing disabling of TCP SACK, DSCAK, > time stamp and window scale into per route entry feature options, > laying the ground to future removal of the relevant global sysctls. > > You really only want to disable SACK, DSACK, time stamp or window > scale if you've got a piece of broken networking equipment somewhere > as a stop gap until you can bring a big enough hammer to deal with > the broken network equipment. It doesn't make sense to "punish" the > entire connections going through the machine to destinations not > related to the broken equipment. Is it really only the case that those options get disabled for broken networking equipment? Does this presage making all TCP options per-route only? rick jones ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 16:26 ` Rick Jones @ 2009-10-21 8:04 ` Gilad Ben-Yossef 2009-10-21 8:21 ` Florian Westphal 0 siblings, 1 reply; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-21 8:04 UTC (permalink / raw) To: Rick Jones; +Cc: netdev, ori Rick Jones wrote: > Gilad Ben-Yossef wrote: >> Turn the global sysctls allowing disabling of TCP SACK, DSCAK, >> time stamp and window scale into per route entry feature options, >> laying the ground to future removal of the relevant global sysctls. >> >> You really only want to disable SACK, DSACK, time stamp or window >> scale if you've got a piece of broken networking equipment somewhere >> as a stop gap until you can bring a big enough hammer to deal with >> the broken network equipment. It doesn't make sense to "punish" the >> entire connections going through the machine to destinations not >> related to the broken equipment. > > Is it really only the case that those options get disabled for broken > networking equipment? Does this presage making all TCP options > per-route only? Well, I assume it might be the case that there are situations where you are trying to communicate over some exotic link where the networking equipment is not broken as such, but the unusual properties of the link makes one of the features not desirable. I can't think of such a situation right now off the top of my head, but maybe they exist. The point is that even then you are more then likely to wish to turn off these options to specific destination and routes (that go over said exotic link) and keep using them over others - e.g. timestamp OK for local LAN, but for default route that goes over exotic TCP/IP over carrier penguins turn it off. To sum it up, I think making these options per route is a win no matter the situation. The question I am less certain about if it is also desirable to have a global kill switch in addition to the per route options. My gut feeling is that this is not needed once you have a per route option. Cheers, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-21 8:04 ` Gilad Ben-Yossef @ 2009-10-21 8:21 ` Florian Westphal 2009-10-21 8:40 ` Gilad Ben-Yossef 0 siblings, 1 reply; 23+ messages in thread From: Florian Westphal @ 2009-10-21 8:21 UTC (permalink / raw) To: Gilad Ben-Yossef; +Cc: Rick Jones, netdev, ori Gilad Ben-Yossef <gilad@codefidence.com> wrote: > The point is that even then you are more then likely to wish to turn > off these options to specific destination and routes (that go over > said exotic link) and keep using them over others - e.g. timestamp > OK for local LAN, but for default route that goes over exotic TCP/IP > over carrier penguins turn it off. If you need a bandaid solution, its is possible to replace tcp options with NOOPs using netfilters TCPOPTSTRIP target. There is also an ECN target to work around ECN blackholes. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-21 8:21 ` Florian Westphal @ 2009-10-21 8:40 ` Gilad Ben-Yossef 0 siblings, 0 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-21 8:40 UTC (permalink / raw) To: Florian Westphal; +Cc: Rick Jones, netdev, ori Hi Florian, Florian Westphal wrote: > Gilad Ben-Yossef <gilad@codefidence.com> wrote: > >> The point is that even then you are more then likely to wish to turn >> off these options to specific destination and routes (that go over >> said exotic link) and keep using them over others - e.g. timestamp >> OK for local LAN, but for default route that goes over exotic TCP/IP >> over carrier penguins turn it off. >> > > If you need a bandaid solution, its is possible to replace tcp > options with NOOPs using netfilters TCPOPTSTRIP target. > > There is also an ECN target to work around ECN blackholes. > Thanks for the tip. It is appreciated. The band aid solution that Ori and company found was simply to patch the local copy of the kernel used but being that sitting on a bunch of "private" patches seems like a lose-lose situation (there is a term you don't hear much when talking to MBAs :-) I'm now trying to get it mainlined for them. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef ` (2 preceding siblings ...) 2009-10-20 16:26 ` Rick Jones @ 2009-10-21 0:36 ` David Miller 2009-10-21 8:10 ` Gilad Ben-Yossef 2009-10-21 2:13 ` Bill Fink 4 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2009-10-21 0:36 UTC (permalink / raw) To: gilad; +Cc: netdev, ori When sending more than one patch in a patch set, you must number your patches so that people know in what order your pathes should be applied. Mailing lists and other entities reorder list postings quite severely, so it's not "obvious" what order your postings matter just based upon arrival order. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-21 0:36 ` David Miller @ 2009-10-21 8:10 ` Gilad Ben-Yossef 0 siblings, 0 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-21 8:10 UTC (permalink / raw) To: David Miller; +Cc: netdev, ori David Miller wrote: > When sending more than one patch in a patch set, you must > number your patches so that people know in what order your > pathes should be applied. > > Mailing lists and other entities reorder list postings > quite severely, so it's not "obvious" what order your > postings matter just based upon arrival order. > > Yes, of course. How silly of me not to notice. My apologies. git and I are currently have a bit of love hate thing going on... I will re-send the patch set in a more favorable form. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef ` (3 preceding siblings ...) 2009-10-21 0:36 ` David Miller @ 2009-10-21 2:13 ` Bill Fink 2009-10-21 8:27 ` Gilad Ben-Yossef 4 siblings, 1 reply; 23+ messages in thread From: Bill Fink @ 2009-10-21 2:13 UTC (permalink / raw) To: Gilad Ben-Yossef; +Cc: netdev, ori On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote: > Turn the global sysctls allowing disabling of TCP SACK, DSCAK, > time stamp and window scale into per route entry feature options, > laying the ground to future removal of the relevant global sysctls. > > You really only want to disable SACK, DSACK, time stamp or window > scale if you've got a piece of broken networking equipment somewhere > as a stop gap until you can bring a big enough hammer to deal with > the broken network equipment. It doesn't make sense to "punish" the > entire connections going through the machine to destinations not > related to the broken equipment. For certain test situations, it is sometimes desirable to globally disable TCP timestamps. Although I have not personally wanted to globally disable the other mentioned features, I can imagine test scenarios where it could be useful. Admittedly it could also be accomplished with per route features, just not as conveniently, especially if there are a large number of interfaces and/or routes. -Bill ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH RFC] Per route TCP options 2009-10-21 2:13 ` Bill Fink @ 2009-10-21 8:27 ` Gilad Ben-Yossef 0 siblings, 0 replies; 23+ messages in thread From: Gilad Ben-Yossef @ 2009-10-21 8:27 UTC (permalink / raw) To: Bill Fink; +Cc: netdev, ori Hi, Bill Fink wrote: > On Tue, 20 Oct 2009, Gilad Ben-Yossef wrote: > > >> Turn the global sysctls allowing disabling of TCP SACK, DSCAK, >> time stamp and window scale into per route entry feature options, >> laying the ground to future removal of the relevant global sysctls. >> >> You really only want to disable SACK, DSACK, time stamp or window >> scale if you've got a piece of broken networking equipment somewhere >> as a stop gap until you can bring a big enough hammer to deal with >> the broken network equipment. It doesn't make sense to "punish" the >> entire connections going through the machine to destinations not >> related to the broken equipment. >> > > For certain test situations, it is sometimes desirable to globally > disable TCP timestamps. Although I have not personally wanted to > globally disable the other mentioned features, I can imagine test > scenarios where it could be useful. Admittedly it could also be > accomplished with per route features, just not as conveniently, > especially if there are a large number of interfaces and/or routes. > > I don't feel that strongly about it either way, just trying to hearken to Linus "Linux is bloated" message by not fostering duplicate ways to do the same thing... ;-) If the consensus is to adopt the route features but leave the global kill switches, I certainly have no problem with that. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker & CTO Codefidence Ltd. Web: http://codefidence.com Cell: +972-52-8260388 Skype: gilad_codefidence Tel: +972-8-9316883 ext. 201 Fax: +972-8-9316884 Email: gilad@codefidence.com Check out our Open Source technology and training blog - http://tuxology.net "Sorry cannot parse this, its too long to be true :)" -- Eric Dumazet on netdev mailing list ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2009-10-21 8:40 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-10-20 15:22 [PATCH RFC] Per route TCP options Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Add dst_feature to query route entry features Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Add the no SACK route option feature Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow disabling TCP timestamp options per route Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow to turn off TCP window scale opt " Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Allow disabling of DSACK TCP option " Gilad Ben-Yossef 2009-10-20 15:22 ` [PATCH RFC] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef 2009-10-21 1:40 ` [PATCH RFC] Allow to turn off TCP window scale opt per route Stephen Hemminger 2009-10-21 8:18 ` Gilad Ben-Yossef 2009-10-20 15:44 ` [PATCH RFC] Per route TCP options Eric Dumazet 2009-10-20 16:11 ` Gilad Ben-Yossef 2009-10-20 18:53 ` Ilpo Järvinen 2009-10-21 8:15 ` Gilad Ben-Yossef 2009-10-20 16:26 ` Rick Jones 2009-10-21 8:04 ` Gilad Ben-Yossef 2009-10-21 8:21 ` Florian Westphal 2009-10-21 8:40 ` Gilad Ben-Yossef 2009-10-21 0:36 ` David Miller 2009-10-21 8:10 ` Gilad Ben-Yossef 2009-10-21 2:13 ` Bill Fink 2009-10-21 8:27 ` Gilad Ben-Yossef
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).