* [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
@ 2009-10-21 8:56 ` Gilad Ben-Yossef
2009-10-21 9:49 ` William Allen Simpson
2009-10-21 8:56 ` [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef
` (6 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:56 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef, Yony Amit
A time wait socket is established - we already know if time stamp
option is called for or not.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
---
net/ipv4/tcp_minisocks.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 624c3c9..c49a550 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -100,9 +100,8 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
struct tcp_options_received tmp_opt;
int paws_reject = 0;
- tmp_opt.saw_tstamp = 0;
if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
- tcp_parse_options(skb, &tmp_opt, 0);
+ tcp_parse_options(skb, &tmp_opt, 1);
if (tmp_opt.saw_tstamp) {
tmp_opt.ts_recent = tcptw->tw_ts_recent;
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
2009-10-21 8:56 ` [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
@ 2009-10-21 9:49 ` William Allen Simpson
2009-10-21 10:07 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: William Allen Simpson @ 2009-10-21 9:49 UTC (permalink / raw)
To: netdev
Gilad Ben-Yossef wrote:
> A time wait socket is established - we already know if time stamp
> option is called for or not.
>
Not so sure about this. A timewait sock isn't actually established,
and new/changed options could appear. There's all sorts of edge cases.
There's also some current work to note:
http://tools.ietf.org/html/draft-ietf-tcpm-1323bis
http://tools.ietf.org/html/draft-gont-tcpm-tcp-timestamps
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
2009-10-21 9:49 ` William Allen Simpson
@ 2009-10-21 10:07 ` Gilad Ben-Yossef
2009-10-21 18:59 ` William Allen Simpson
0 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 10:07 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev
William Allen Simpson wrote:
> Gilad Ben-Yossef wrote:
>> A time wait socket is established - we already know if time stamp
>> option is called for or not.
>>
> Not so sure about this. A timewait sock isn't actually established,
> and new/changed options could appear. There's all sorts of edge cases.
If you examine the specific context where tcp_parse_options is being
called here,
the only TCP option which is of interest is the time stamp option, and
this code path
is only being taken when we already know that the original socket had
used the time stamp option.
So while I agree that in general you are right, I do believe that in the
specific context
of this patch we should call tcp_parse_options with the established flag
on and let it
know we are expecting to see a time stamp option, which is what I was
referring to.
>
> There's also some current work to note:
>
> http://tools.ietf.org/html/draft-ietf-tcpm-1323bis
>
> http://tools.ietf.org/html/draft-gont-tcpm-tcp-timestamps
Very interesting, thank you.
As I noted above, my comment about
TIME WAIT sockets being "established" should really only be considered
in the context of the specific call to tcp_parse_options() and the
"established"
parameter of that function.
Thanks,
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
2009-10-21 10:07 ` Gilad Ben-Yossef
@ 2009-10-21 18:59 ` William Allen Simpson
2009-10-25 8:41 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: William Allen Simpson @ 2009-10-21 18:59 UTC (permalink / raw)
To: netdev
Gilad Ben-Yossef wrote:
> William Allen Simpson wrote:
>
>> Gilad Ben-Yossef wrote:
>>> A time wait socket is established - we already know if time stamp
>>> option is called for or not.
>>>
>> Not so sure about this. A timewait sock isn't actually established,
>> and new/changed options could appear. There's all sorts of edge cases.
> If you examine the specific context where tcp_parse_options is being
> called here,
> the only TCP option which is of interest is the time stamp option, and
> this code path
> is only being taken when we already know that the original socket had
> used the time stamp option.
>
> So while I agree that in general you are right, I do believe that in the
> specific context
> of this patch we should call tcp_parse_options with the established flag
> on and let it
> know we are expecting to see a time stamp option, which is what I was
> referring to.
>
No, a major reason for time-wait is rebooted systems. We don't "know"
anything about them, and they certainly don't know anything about us.
As I mentioned, this is about edge cases.
>>
>> There's also some current work to note:
>>
>> http://tools.ietf.org/html/draft-ietf-tcpm-1323bis
>>
>> http://tools.ietf.org/html/draft-gont-tcpm-tcp-timestamps
>
> Very interesting, thank you.
>
> As I noted above, my comment about
> TIME WAIT sockets being "established" should really only be considered
> in the context of the specific call to tcp_parse_options() and the
> "established"
> parameter of that function.
>
My suggestion, as this patch is not essential to the other patches in the
series, is to separate it. As I'm relatively new to this list, I don't
know the best practice. But I'd like to support the others and delay
this for further consideration.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
2009-10-21 18:59 ` William Allen Simpson
@ 2009-10-25 8:41 ` Gilad Ben-Yossef
0 siblings, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-25 8:41 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev
Hello William,
William Allen Simpson wrote:
>> If you examine the specific context where tcp_parse_options is being
>> called here,
>> the only TCP option which is of interest is the time stamp option,
>> and this code path
>> is only being taken when we already know that the original socket had
>> used the time stamp option.
>>
>> So while I agree that in general you are right, I do believe that in
>> the specific context
>> of this patch we should call tcp_parse_options with the established
>> flag on and let it
>> know we are expecting to see a time stamp option, which is what I was
>> referring to.
>>
> No, a major reason for time-wait is rebooted systems. We don't "know"
> anything about them, and they certainly don't know anything about us.
>
> As I mentioned, this is about edge cases.
I just read thoroughly the code in question again -
We use tcp_parse_option to check if there is a time stamp option in the
packet and if so, get the time stamp from it. We do this only when the
time wait minisocket has information of time stamp from the original
connection. We don't use any other TCP option or other inoformation from
the options read via that call.
The above statements are true both for the original code and my patch.
If there is any corner case with my code it is true for the original
code as well.
>
> My suggestion, as this patch is not essential to the other patches in the
> series, is to separate it. As I'm relatively new to this list, I don't
> know the best practice. But I'd like to support the others and delay
> this for further consideration.
I have no objection to separate or drop it altogether if there is a
specific technical reason why you think the code is wrong. It certainly
is possible I've done some done mistake. In that case, I would love
nothing more to hearing what it is and hopefully fixing it.
But "Maybe there are edge cases we didn't think about" is not specific
enough to work upon :-)
Thanks for all the feedback,
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
2009-10-21 8:56 ` [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
@ 2009-10-21 8:56 ` Gilad Ben-Yossef
2009-10-21 13:03 ` Ilpo Järvinen
2009-10-21 8:56 ` [PATCH v2 3/8] Add dst_feature to query route entry features Gilad Ben-Yossef
` (5 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:56 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
We need tcp_parse_options to be aware of dst_entry to
take into account per dst_entry TCP options settings
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
include/net/tcp.h | 3 ++-
net/ipv4/syncookies.c | 27 ++++++++++++++-------------
net/ipv4/tcp_input.c | 9 ++++++---
net/ipv4/tcp_ipv4.c | 19 ++++++++++---------
net/ipv4/tcp_minisocks.c | 7 +++++--
net/ipv6/syncookies.c | 28 +++++++++++++++-------------
net/ipv6/tcp_ipv6.c | 3 ++-
7 files changed, 54 insertions(+), 42 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..740d09b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -409,7 +409,8 @@ extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
extern void tcp_parse_options(struct sk_buff *skb,
struct tcp_options_received *opt_rx,
- int estab);
+ int estab,
+ struct dst_entry *dst);
extern u8 *tcp_parse_md5sig_option(struct tcphdr *th);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..4990dd4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
- /* check for timestamp cookie support */
- memset(&tcp_opt, 0, sizeof(tcp_opt));
- tcp_parse_options(skb, &tcp_opt, 0);
-
- if (tcp_opt.saw_tstamp)
- cookie_check_timestamp(&tcp_opt);
-
ret = NULL;
req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
if (!req)
@@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
ireq->loc_addr = ip_hdr(skb)->daddr;
ireq->rmt_addr = ip_hdr(skb)->saddr;
ireq->ecn_ok = 0;
- ireq->snd_wscale = tcp_opt.snd_wscale;
- ireq->rcv_wscale = tcp_opt.rcv_wscale;
- ireq->sack_ok = tcp_opt.sack_ok;
- ireq->wscale_ok = tcp_opt.wscale_ok;
- ireq->tstamp_ok = tcp_opt.saw_tstamp;
- req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
/* We throwed the options of the initial SYN away, so we hope
* the ACK carries the same options again (see RFC1122 4.2.3.8)
@@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
}
}
+ /* check for timestamp cookie support */
+ memset(&tcp_opt, 0, sizeof(tcp_opt));
+ tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst);
+
+ if (tcp_opt.saw_tstamp)
+ cookie_check_timestamp(&tcp_opt);
+
+ ireq->snd_wscale = tcp_opt.snd_wscale;
+ ireq->rcv_wscale = tcp_opt.rcv_wscale;
+ ireq->sack_ok = tcp_opt.sack_ok;
+ ireq->wscale_ok = tcp_opt.wscale_ok;
+ ireq->tstamp_ok = tcp_opt.saw_tstamp;
+ req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
/* Try to redo what tcp_v4_send_synack did. */
req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..d502f49 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,12 +3698,14 @@ old_ack:
* the fast version below fails.
*/
void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
- int estab)
+ int estab, struct dst_entry *dst)
{
unsigned char *ptr;
struct tcphdr *th = tcp_hdr(skb);
int length = (th->doff * 4) - sizeof(struct tcphdr);
+ BUG_ON(!estab && !dst);
+
ptr = (unsigned char *)(th + 1);
opt_rx->saw_tstamp = 0;
@@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th,
if (tcp_parse_aligned_timestamp(tp, th))
return 1;
}
- tcp_parse_options(skb, &tp->rx_opt, 1);
+ tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
return 1;
}
@@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
int saved_clamp = tp->rx_opt.mss_clamp;
+ struct dst_entry *dst = __sk_dst_get(sk);
- tcp_parse_options(skb, &tp->rx_opt, 0);
+ tcp_parse_options(skb, &tp->rx_opt, 0, dst);
if (th->ack) {
/* rfc793:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..1cb0ec4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
#endif
+ ireq = inet_rsk(req);
+ ireq->loc_addr = daddr;
+ ireq->rmt_addr = saddr;
+ ireq->no_srccheck = inet_sk(sk)->transparent;
+ ireq->opt = tcp_v4_save_options(sk, skb);
+
+ dst = inet_csk_route_req(sk, req);
tcp_clear_options(&tmp_opt);
tmp_opt.mss_clamp = 536;
tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
- tcp_parse_options(skb, &tmp_opt, 0);
+ tcp_parse_options(skb, &tmp_opt, 0, dst);
if (want_cookie && !tmp_opt.saw_tstamp)
tcp_clear_options(&tmp_opt);
@@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
tcp_openreq_init(req, &tmp_opt, skb);
- ireq = inet_rsk(req);
- ireq->loc_addr = daddr;
- ireq->rmt_addr = saddr;
- ireq->no_srccheck = inet_sk(sk)->transparent;
- ireq->opt = tcp_v4_save_options(sk, skb);
-
if (security_inet_conn_request(sk, skb, req))
- goto drop_and_free;
+ goto drop_and_release;
if (!want_cookie)
TCP_ECN_create_request(req, tcp_hdr(skb));
@@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
*/
if (tmp_opt.saw_tstamp &&
tcp_death_row.sysctl_tw_recycle &&
- (dst = inet_csk_route_req(sk, req)) != NULL &&
+ dst != NULL &&
(peer = rt_get_peer((struct rtable *)dst)) != NULL &&
peer->v4daddr == saddr) {
if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index c49a550..70ff955 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -101,7 +101,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
int paws_reject = 0;
if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
- tcp_parse_options(skb, &tmp_opt, 1);
+ tcp_parse_options(skb, &tmp_opt, 1, NULL);
if (tmp_opt.saw_tstamp) {
tmp_opt.ts_recent = tcptw->tw_ts_recent;
@@ -499,10 +499,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
int paws_reject = 0;
struct tcp_options_received tmp_opt;
struct sock *child;
+ struct dst_entry *dst = inet_csk_route_req(sk, req);
tmp_opt.saw_tstamp = 0;
if (th->doff > (sizeof(struct tcphdr)>>2)) {
- tcp_parse_options(skb, &tmp_opt, 0);
+ tcp_parse_options(skb, &tmp_opt, 0, dst);
if (tmp_opt.saw_tstamp) {
tmp_opt.ts_recent = req->ts_recent;
@@ -515,6 +516,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
}
}
+ dst_release(dst);
+
/* Check for pure retransmitted SYN. */
if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
flg == TCP_FLAG_SYN &&
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 6b6ae91..6ece408 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -184,13 +184,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
- /* check for timestamp cookie support */
- memset(&tcp_opt, 0, sizeof(tcp_opt));
- tcp_parse_options(skb, &tcp_opt, 0);
-
- if (tcp_opt.saw_tstamp)
- cookie_check_timestamp(&tcp_opt);
-
ret = NULL;
req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
if (!req)
@@ -224,12 +217,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
req->expires = 0UL;
req->retrans = 0;
ireq->ecn_ok = 0;
- ireq->snd_wscale = tcp_opt.snd_wscale;
- ireq->rcv_wscale = tcp_opt.rcv_wscale;
- ireq->sack_ok = tcp_opt.sack_ok;
- ireq->wscale_ok = tcp_opt.wscale_ok;
- ireq->tstamp_ok = tcp_opt.saw_tstamp;
- req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
treq->rcv_isn = ntohl(th->seq) - 1;
treq->snt_isn = cookie;
@@ -264,6 +251,21 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
goto out_free;
}
+ /* check for timestamp cookie support */
+ memset(&tcp_opt, 0, sizeof(tcp_opt));
+ tcp_parse_options(skb, &tcp_opt, 0, dst);
+
+ if (tcp_opt.saw_tstamp)
+ cookie_check_timestamp(&tcp_opt);
+
+ req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
+ ireq->snd_wscale = tcp_opt.snd_wscale;
+ ireq->rcv_wscale = tcp_opt.rcv_wscale;
+ ireq->sack_ok = tcp_opt.sack_ok;
+ ireq->wscale_ok = tcp_opt.wscale_ok;
+ ireq->tstamp_ok = tcp_opt.saw_tstamp;
+
req->window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW);
tcp_select_initial_window(tcp_full_space(sk), req->mss,
&req->rcv_wnd, &req->window_clamp,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 21d100b..2eebab5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1165,6 +1165,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
struct tcp_sock *tp = tcp_sk(sk);
struct request_sock *req = NULL;
__u32 isn = TCP_SKB_CB(skb)->when;
+ struct dst_entry *dst = __sk_dst_get(sk);
#ifdef CONFIG_SYN_COOKIES
int want_cookie = 0;
#else
@@ -1203,7 +1204,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
tmp_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
tmp_opt.user_mss = tp->rx_opt.user_mss;
- tcp_parse_options(skb, &tmp_opt, 0);
+ tcp_parse_options(skb, &tmp_opt, 0, dst);
if (want_cookie && !tmp_opt.saw_tstamp)
tcp_clear_options(&tmp_opt);
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
2009-10-21 8:56 ` [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef
@ 2009-10-21 13:03 ` Ilpo Järvinen
2009-10-21 14:07 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: Ilpo Järvinen @ 2009-10-21 13:03 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: Netdev, ori
On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
> We need tcp_parse_options to be aware of dst_entry to
> take into account per dst_entry TCP options settings
>
> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
> Sigend-off-by: Yony Amit <yony@comsleep.com>
>
> ---
> include/net/tcp.h | 3 ++-
> net/ipv4/syncookies.c | 27 ++++++++++++++-------------
> net/ipv4/tcp_input.c | 9 ++++++---
> net/ipv4/tcp_ipv4.c | 19 ++++++++++---------
> net/ipv4/tcp_minisocks.c | 7 +++++--
> net/ipv6/syncookies.c | 28 +++++++++++++++-------------
> net/ipv6/tcp_ipv6.c | 3 ++-
> 7 files changed, 54 insertions(+), 42 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 03a49c7..740d09b 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -409,7 +409,8 @@ extern int tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
>
> extern void tcp_parse_options(struct sk_buff *skb,
> struct tcp_options_received *opt_rx,
> - int estab);
> + int estab,
> + struct dst_entry *dst);
>
> extern u8 *tcp_parse_md5sig_option(struct tcphdr *th);
>
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index a6e0e07..4990dd4 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>
> NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
>
> - /* check for timestamp cookie support */
> - memset(&tcp_opt, 0, sizeof(tcp_opt));
> - tcp_parse_options(skb, &tcp_opt, 0);
> -
> - if (tcp_opt.saw_tstamp)
> - cookie_check_timestamp(&tcp_opt);
> -
> ret = NULL;
> req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
> if (!req)
> @@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
> ireq->loc_addr = ip_hdr(skb)->daddr;
> ireq->rmt_addr = ip_hdr(skb)->saddr;
> ireq->ecn_ok = 0;
> - ireq->snd_wscale = tcp_opt.snd_wscale;
> - ireq->rcv_wscale = tcp_opt.rcv_wscale;
> - ireq->sack_ok = tcp_opt.sack_ok;
> - ireq->wscale_ok = tcp_opt.wscale_ok;
> - ireq->tstamp_ok = tcp_opt.saw_tstamp;
> - req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
>
> /* We throwed the options of the initial SYN away, so we hope
> * the ACK carries the same options again (see RFC1122 4.2.3.8)
> @@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
> }
> }
>
> + /* check for timestamp cookie support */
> + memset(&tcp_opt, 0, sizeof(tcp_opt));
> + tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst);
> +
> + if (tcp_opt.saw_tstamp)
> + cookie_check_timestamp(&tcp_opt);
> +
> + ireq->snd_wscale = tcp_opt.snd_wscale;
> + ireq->rcv_wscale = tcp_opt.rcv_wscale;
> + ireq->sack_ok = tcp_opt.sack_ok;
> + ireq->wscale_ok = tcp_opt.wscale_ok;
> + ireq->tstamp_ok = tcp_opt.saw_tstamp;
> + req->ts_recent = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
> +
> /* Try to redo what tcp_v4_send_synack did. */
> req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW);
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index d86784b..d502f49 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3698,12 +3698,14 @@ old_ack:
> * the fast version below fails.
> */
> void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
> - int estab)
> + int estab, struct dst_entry *dst)
> {
> unsigned char *ptr;
> struct tcphdr *th = tcp_hdr(skb);
> int length = (th->doff * 4) - sizeof(struct tcphdr);
>
> + BUG_ON(!estab && !dst);
> +
> ptr = (unsigned char *)(th + 1);
> opt_rx->saw_tstamp = 0;
>
> @@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th,
> if (tcp_parse_aligned_timestamp(tp, th))
> return 1;
> }
> - tcp_parse_options(skb, &tp->rx_opt, 1);
> + tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
> return 1;
> }
>
> @@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
> struct tcp_sock *tp = tcp_sk(sk);
> struct inet_connection_sock *icsk = inet_csk(sk);
> int saved_clamp = tp->rx_opt.mss_clamp;
> + struct dst_entry *dst = __sk_dst_get(sk);
>
> - tcp_parse_options(skb, &tp->rx_opt, 0);
> + tcp_parse_options(skb, &tp->rx_opt, 0, dst);
>
> if (th->ack) {
> /* rfc793:
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7cda24b..1cb0ec4 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
> #endif
>
> + ireq = inet_rsk(req);
> + ireq->loc_addr = daddr;
> + ireq->rmt_addr = saddr;
> + ireq->no_srccheck = inet_sk(sk)->transparent;
> + ireq->opt = tcp_v4_save_options(sk, skb);
> +
> + dst = inet_csk_route_req(sk, req);
> tcp_clear_options(&tmp_opt);
> tmp_opt.mss_clamp = 536;
> tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
>
> - tcp_parse_options(skb, &tmp_opt, 0);
> + tcp_parse_options(skb, &tmp_opt, 0, dst);
>
> if (want_cookie && !tmp_opt.saw_tstamp)
> tcp_clear_options(&tmp_opt);
> @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>
> tcp_openreq_init(req, &tmp_opt, skb);
>
> - ireq = inet_rsk(req);
> - ireq->loc_addr = daddr;
> - ireq->rmt_addr = saddr;
> - ireq->no_srccheck = inet_sk(sk)->transparent;
> - ireq->opt = tcp_v4_save_options(sk, skb);
> -
> if (security_inet_conn_request(sk, skb, req))
> - goto drop_and_free;
> + goto drop_and_release;
>
> if (!want_cookie)
> TCP_ECN_create_request(req, tcp_hdr(skb));
> @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> */
> if (tmp_opt.saw_tstamp &&
> tcp_death_row.sysctl_tw_recycle &&
> - (dst = inet_csk_route_req(sk, req)) != NULL &&
> + dst != NULL &&
Why you need this NULL check this here while you trap it with BUG_ON
elsewhere? Does your patch perhaps create a remote DoS opportunity?
--
i.
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
2009-10-21 13:03 ` Ilpo Järvinen
@ 2009-10-21 14:07 ` Gilad Ben-Yossef
2009-10-22 9:41 ` Ilpo Järvinen
0 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 14:07 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Netdev, ori
Hi Ilpo,
Thanks for the feedback :-)
Ilpo Järvinen wrote:
> On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
>
>
>> We need tcp_parse_options to be aware of dst_entry to
>> take into account per dst_entry TCP options settings
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
>> Sigend-off-by: Yony Amit <yony@comsleep.com>
>>
>> ---
>> include/net/tcp.h | 3 ++-
>> net/ipv4/syncookies.c | 27 ++++++++++++++-------------
>> net/ipv4/tcp_input.c | 9 ++++++---
>> net/ipv4/tcp_ipv4.c | 19 ++++++++++---------
>> net/ipv4/tcp_minisocks.c | 7 +++++--
>> net/ipv6/syncookies.c | 28 +++++++++++++++-------------
>> net/ipv6/tcp_ipv6.c | 3 ++-
>> 7 files changed, 54 insertions(+), 42 deletions(-)
>>
>>
>>
<snip>
>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>> index 7cda24b..1cb0ec4 100644
>> --- a/net/ipv4/tcp_ipv4.c
>> +++ b/net/ipv4/tcp_ipv4.c
>> @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>> tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
>> #endif
>>
>> + ireq = inet_rsk(req);
>> + ireq->loc_addr = daddr;
>> + ireq->rmt_addr = saddr;
>> + ireq->no_srccheck = inet_sk(sk)->transparent;
>> + ireq->opt = tcp_v4_save_options(sk, skb);
>> +
>> + dst = inet_csk_route_req(sk, req);
>> tcp_clear_options(&tmp_opt);
>> tmp_opt.mss_clamp = 536;
>> tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
>>
>> - tcp_parse_options(skb, &tmp_opt, 0);
>> + tcp_parse_options(skb, &tmp_opt, 0, dst);
>>
>> if (want_cookie && !tmp_opt.saw_tstamp)
>> tcp_clear_options(&tmp_opt);
>> @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>>
>> tcp_openreq_init(req, &tmp_opt, skb);
>>
>> - ireq = inet_rsk(req);
>> - ireq->loc_addr = daddr;
>> - ireq->rmt_addr = saddr;
>> - ireq->no_srccheck = inet_sk(sk)->transparent;
>> - ireq->opt = tcp_v4_save_options(sk, skb);
>> -
>> if (security_inet_conn_request(sk, skb, req))
>> - goto drop_and_free;
>> + goto drop_and_release;
>>
>> if (!want_cookie)
>> TCP_ECN_create_request(req, tcp_hdr(skb));
>> @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>> */
>> if (tmp_opt.saw_tstamp &&
>> tcp_death_row.sysctl_tw_recycle &&
>> - (dst = inet_csk_route_req(sk, req)) != NULL &&
>> + dst != NULL &&
>>
>
> Why you need this NULL check this here while you trap it with BUG_ON
> elsewhere? Does your patch perhaps create a remote DoS opportunity?
>
>
>
Indeed, I believe you are right. Good catch.
What about this (I know the patch gets eaten by Thunderbird, sorry about
that. This is just for explaining what I want to do):
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1cb0ec4..1d611e3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1263,6 +1263,9 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
ireq->opt = tcp_v4_save_options(sk, skb);
dst = inet_csk_route_req(sk, req);
+ if(!dst)
+ goto drop_and_free;
+
tcp_clear_options(&tmp_opt);
tmp_opt.mss_clamp = 536;
tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
@@ -1302,7 +1305,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
*/
if (tmp_opt.saw_tstamp &&
tcp_death_row.sysctl_tw_recycle &&
- dst != NULL &&
(peer = rt_get_peer((struct rtable *)dst)) != NULL &&
peer->v4daddr == saddr) {
if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
My rational is that since if the connection is formed we will need to
send a syn/ack ( call to __tcp_v4_send_synack a couple of lines below)
and since we can't do that if we don't have a route, this makes sense.
If this sounds sane, I'll re-spin the patch with this as a fix.
Thanks a bunch!
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
2009-10-21 14:07 ` Gilad Ben-Yossef
@ 2009-10-22 9:41 ` Ilpo Järvinen
0 siblings, 0 replies; 34+ messages in thread
From: Ilpo Järvinen @ 2009-10-22 9:41 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: Netdev, ori
[-- Attachment #1: Type: TEXT/PLAIN, Size: 4535 bytes --]
On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
> Hi Ilpo,
>
>
> Thanks for the feedback :-)
>
>
> Ilpo Järvinen wrote:
>
> > On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
> >
> >
> > > We need tcp_parse_options to be aware of dst_entry to take into account
> > > per dst_entry TCP options settings
> > >
> > > Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> > > Sigend-off-by: Ori Finkelman <ori@comsleep.com>
> > > Sigend-off-by: Yony Amit <yony@comsleep.com>
> > >
> > > ---
> > > include/net/tcp.h | 3 ++-
> > > net/ipv4/syncookies.c | 27 ++++++++++++++-------------
> > > net/ipv4/tcp_input.c | 9 ++++++---
> > > net/ipv4/tcp_ipv4.c | 19 ++++++++++---------
> > > net/ipv4/tcp_minisocks.c | 7 +++++--
> > > net/ipv6/syncookies.c | 28 +++++++++++++++-------------
> > > net/ipv6/tcp_ipv6.c | 3 ++-
> > > 7 files changed, 54 insertions(+), 42 deletions(-)
> > >
> > >
> > >
> <snip>
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index 7cda24b..1cb0ec4 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> >> @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct
> sk_buff *skb)
> > > tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
> > > #endif
> > >
> > > + ireq = inet_rsk(req);
> > > + ireq->loc_addr = daddr;
> > > + ireq->rmt_addr = saddr;
> > > + ireq->no_srccheck = inet_sk(sk)->transparent;
> > > + ireq->opt = tcp_v4_save_options(sk, skb);
> > > +
> > > + dst = inet_csk_route_req(sk, req);
> > > tcp_clear_options(&tmp_opt);
> > > tmp_opt.mss_clamp = 536;
> > > tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
> > >
> > > - tcp_parse_options(skb, &tmp_opt, 0);
> > > + tcp_parse_options(skb, &tmp_opt, 0, dst);
> > >
> > > if (want_cookie && !tmp_opt.saw_tstamp)
> > > tcp_clear_options(&tmp_opt);
> >> @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct
> sk_buff *skb)
> > >
> > > tcp_openreq_init(req, &tmp_opt, skb);
> > >
> > > - ireq = inet_rsk(req);
> > > - ireq->loc_addr = daddr;
> > > - ireq->rmt_addr = saddr;
> > > - ireq->no_srccheck = inet_sk(sk)->transparent;
> > > - ireq->opt = tcp_v4_save_options(sk, skb);
> > > -
> > > if (security_inet_conn_request(sk, skb, req))
> > > - goto drop_and_free;
> > > + goto drop_and_release;
> > >
> > > if (!want_cookie)
> > > TCP_ECN_create_request(req, tcp_hdr(skb));
> >> @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct
> sk_buff *skb)
> > > */
> > > if (tmp_opt.saw_tstamp &&
> > > tcp_death_row.sysctl_tw_recycle &&
> > > - (dst = inet_csk_route_req(sk, req)) != NULL &&
> > > + dst != NULL &&
> > >
> >
> > Why you need this NULL check this here while you trap it with BUG_ON
> > elsewhere? Does your patch perhaps create a remote DoS opportunity?
> >
> >
> >
> Indeed, I believe you are right. Good catch.
>
> What about this (I know the patch gets eaten by Thunderbird, sorry about that.
> This is just for explaining what I want to do):
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>
> index 1cb0ec4..1d611e3 100644
>
> --- a/net/ipv4/tcp_ipv4.c
>
> +++ b/net/ipv4/tcp_ipv4.c
>
> @@ -1263,6 +1263,9 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff
> *skb)
>
> ireq->opt = tcp_v4_save_options(sk, skb);
>
>
>
> dst = inet_csk_route_req(sk, req);
>
> + if(!dst)
>
> + goto drop_and_free;
>
> +
>
> tcp_clear_options(&tmp_opt);
>
> tmp_opt.mss_clamp = 536;
>
> tmp_opt.user_mss = tcp_sk(sk)->rx_opt.user_mss;
>
> @@ -1302,7 +1305,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff
> *skb)
>
> */
>
> if (tmp_opt.saw_tstamp &&
>
> tcp_death_row.sysctl_tw_recycle &&
>
> - dst != NULL &&
>
> (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
>
> peer->v4daddr == saddr) {
>
> if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL
> &&
>
>
>
> My rational is that since if the connection is formed we will need to send a
> syn/ack ( call to __tcp_v4_send_synack a couple of lines below) and since we
> can't do that if we don't have a route, this makes sense.
>
> If this sounds sane, I'll re-spin the patch with this as a fix.
I'd just guard the relevant places with dst && ...? ...But I didn't go
through that far to find out how many one would then need.
--
i.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 3/8] Add dst_feature to query route entry features
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
2009-10-21 8:56 ` [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock Gilad Ben-Yossef
2009-10-21 8:56 ` [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry Gilad Ben-Yossef
@ 2009-10-21 8:56 ` Gilad Ben-Yossef
2009-10-21 8:56 ` [PATCH v2 4/8] Add the no SACK route option feature Gilad Ben-Yossef
` (4 subsequent siblings)
7 siblings, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:56 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
Adding an accessor to existing dst_entry feautres field and
refactor the only supported feature (allfrag) to use it.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
include/net/dst.h | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h
index 5a900dd..b562be3 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -111,6 +111,12 @@ dst_metric(const struct dst_entry *dst, int metric)
return dst->metrics[metric-1];
}
+static inline u32
+dst_feature(const struct dst_entry *dst, u32 feature)
+{
+ return dst_metric(dst, RTAX_FEATURES) & feature;
+}
+
static inline u32 dst_mtu(const struct dst_entry *dst)
{
u32 mtu = dst_metric(dst, RTAX_MTU);
@@ -136,7 +142,7 @@ static inline void set_dst_metric_rtt(struct dst_entry *dst, int metric,
static inline u32
dst_allfrag(const struct dst_entry *dst)
{
- int ret = dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_ALLFRAG;
+ int ret = dst_feature(dst, RTAX_FEATURE_ALLFRAG);
/* Yes, _exactly_. This is paranoia. */
barrier();
return ret;
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* [PATCH v2 4/8] Add the no SACK route option feature
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
` (2 preceding siblings ...)
2009-10-21 8:56 ` [PATCH v2 3/8] Add dst_feature to query route entry features Gilad Ben-Yossef
@ 2009-10-21 8:56 ` Gilad Ben-Yossef
2009-10-21 19:22 ` William Allen Simpson
2009-10-21 8:56 ` [PATCH v2 5/8] Allow disabling TCP timestamp options per route Gilad Ben-Yossef
` (3 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:56 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
Implement querying and acting upon the no sack bit in the features
field.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
include/linux/rtnetlink.h | 2 +-
net/ipv4/tcp_input.c | 3 ++-
net/ipv4/tcp_output.c | 4 +++-
3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index adf2068..9c802a6 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -377,7 +377,7 @@ enum
#define RTAX_MAX (__RTAX_MAX - 1)
#define RTAX_FEATURE_ECN 0x00000001
-#define RTAX_FEATURE_SACK 0x00000002
+#define RTAX_FEATURE_NO_SACK 0x00000002
#define RTAX_FEATURE_TIMESTAMP 0x00000004
#define RTAX_FEATURE_ALLFRAG 0x00000008
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d502f49..b14f780 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3763,7 +3763,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
break;
case TCPOPT_SACK_PERM:
if (opsize == TCPOLEN_SACK_PERM && th->syn &&
- !estab && sysctl_tcp_sack) {
+ !estab && sysctl_tcp_sack &&
+ !dst_feature(dst, RTAX_FEATURE_NO_SACK)) {
opt_rx->sack_ok = 1;
tcp_sack_reset(opt_rx);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fcd278a..64db8dd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -464,6 +464,7 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
struct tcp_md5sig_key **md5) {
struct tcp_sock *tp = tcp_sk(sk);
unsigned size = 0;
+ struct dst_entry *dst = __sk_dst_get(sk);
#ifdef CONFIG_TCP_MD5SIG
*md5 = tp->af_specific->md5_lookup(sk, sk);
@@ -498,7 +499,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
opts->options |= OPTION_WSCALE;
size += TCPOLEN_WSCALE_ALIGNED;
}
- if (likely(sysctl_tcp_sack)) {
+ if (likely(sysctl_tcp_sack &&
+ !dst_feature(dst, RTAX_FEATURE_NO_SACK))) {
opts->options |= OPTION_SACK_ADVERTISE;
if (unlikely(!(OPTION_TS & opts->options)))
size += TCPOLEN_SACKPERM_ALIGNED;
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 4/8] Add the no SACK route option feature
2009-10-21 8:56 ` [PATCH v2 4/8] Add the no SACK route option feature Gilad Ben-Yossef
@ 2009-10-21 19:22 ` William Allen Simpson
2009-10-25 8:44 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: William Allen Simpson @ 2009-10-21 19:22 UTC (permalink / raw)
To: netdev
Gilad Ben-Yossef wrote:
> Implement querying and acting upon the no sack bit in the features
> field.
>
> #define RTAX_FEATURE_ECN 0x00000001
> -#define RTAX_FEATURE_SACK 0x00000002
> +#define RTAX_FEATURE_NO_SACK 0x00000002
> #define RTAX_FEATURE_TIMESTAMP 0x00000004
> #define RTAX_FEATURE_ALLFRAG 0x00000008
>
I just realized that unlike NO_DSACK, this change assumes removing the
sysctl and defaulting on. I'm opposed to removing this sysctl, so I'm
opposed to this change.
I'd prefer the ability to both turn on for global default off, and
turn off for global default on. Shouldn't that be 2 different bits?
Or should this be a toggle? How do other systems handle it?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 4/8] Add the no SACK route option feature
2009-10-21 19:22 ` William Allen Simpson
@ 2009-10-25 8:44 ` Gilad Ben-Yossef
0 siblings, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-25 8:44 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev
William Allen Simpson wrote:
> Gilad Ben-Yossef wrote:
>> Implement querying and acting upon the no sack bit in the features
>> field.
>>
>> #define RTAX_FEATURE_ECN 0x00000001
>> -#define RTAX_FEATURE_SACK 0x00000002
>> +#define RTAX_FEATURE_NO_SACK 0x00000002
>> #define RTAX_FEATURE_TIMESTAMP 0x00000004
>> #define RTAX_FEATURE_ALLFRAG 0x00000008
>>
> I just realized that unlike NO_DSACK, this change assumes removing the
> sysctl and defaulting on.
Once again, no it does not do no such thing. The sysctl and semantics
stays here just the same.
The RTAX_FEATURE_SACK is not sued AFAIK in current code at all.
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 5/8] Allow disabling TCP timestamp options per route
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
` (3 preceding siblings ...)
2009-10-21 8:56 ` [PATCH v2 4/8] Add the no SACK route option feature Gilad Ben-Yossef
@ 2009-10-21 8:56 ` Gilad Ben-Yossef
2009-10-21 19:22 ` William Allen Simpson
2009-10-21 8:56 ` [PATCH v2 6/8] Allow to turn off TCP window scale opt " Gilad Ben-Yossef
` (2 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:56 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
Implement querying and acting upon the no timestamp bit in the feature
field.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
include/linux/rtnetlink.h | 2 +-
net/ipv4/tcp_input.c | 3 ++-
net/ipv4/tcp_output.c | 8 ++++++--
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 9c802a6..2ab8c75 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -378,7 +378,7 @@ enum
#define RTAX_FEATURE_ECN 0x00000001
#define RTAX_FEATURE_NO_SACK 0x00000002
-#define RTAX_FEATURE_TIMESTAMP 0x00000004
+#define RTAX_FEATURE_NO_TSTAMP 0x00000004
#define RTAX_FEATURE_ALLFRAG 0x00000008
struct rta_session
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b14f780..d2f9742 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3755,7 +3755,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
case TCPOPT_TIMESTAMP:
if ((opsize == TCPOLEN_TIMESTAMP) &&
((estab && opt_rx->tstamp_ok) ||
- (!estab && sysctl_tcp_timestamps))) {
+ (!estab && sysctl_tcp_timestamps &&
+ !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP)))) {
opt_rx->saw_tstamp = 1;
opt_rx->rcv_tsval = get_unaligned_be32(ptr);
opt_rx->rcv_tsecr = get_unaligned_be32(ptr + 4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 64db8dd..8f30c18 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -488,7 +488,9 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
opts->mss = tcp_advertise_mss(sk);
size += TCPOLEN_MSS_ALIGNED;
- if (likely(sysctl_tcp_timestamps && *md5 == NULL)) {
+ if (likely(sysctl_tcp_timestamps &&
+ !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) &&
+ *md5 == NULL)) {
opts->options |= OPTION_TS;
opts->tsval = TCP_SKB_CB(skb)->when;
opts->tsecr = tp->rx_opt.ts_recent;
@@ -2317,7 +2319,9 @@ static void tcp_connect_init(struct sock *sk)
* See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
*/
tp->tcp_header_len = sizeof(struct tcphdr) +
- (sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
+ (sysctl_tcp_timestamps &&
+ (!dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) ?
+ TCPOLEN_TSTAMP_ALIGNED : 0));
#ifdef CONFIG_TCP_MD5SIG
if (tp->af_specific->md5_lookup(sk, sk) != NULL)
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 5/8] Allow disabling TCP timestamp options per route
2009-10-21 8:56 ` [PATCH v2 5/8] Allow disabling TCP timestamp options per route Gilad Ben-Yossef
@ 2009-10-21 19:22 ` William Allen Simpson
2009-10-25 8:43 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: William Allen Simpson @ 2009-10-21 19:22 UTC (permalink / raw)
To: netdev
Gilad Ben-Yossef wrote:
> Implement querying and acting upon the no timestamp bit in the feature
> field.
>
> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
> Sigend-off-by: Yony Amit <yony@comsleep.com>
>
> ---
> include/linux/rtnetlink.h | 2 +-
> net/ipv4/tcp_input.c | 3 ++-
> net/ipv4/tcp_output.c | 8 ++++++--
> 3 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index 9c802a6..2ab8c75 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -378,7 +378,7 @@ enum
>
> #define RTAX_FEATURE_ECN 0x00000001
> #define RTAX_FEATURE_NO_SACK 0x00000002
> -#define RTAX_FEATURE_TIMESTAMP 0x00000004
> +#define RTAX_FEATURE_NO_TSTAMP 0x00000004
> #define RTAX_FEATURE_ALLFRAG 0x00000008
>
I just realized that unlike NO_WSCALE, this change assumes removing the
sysctl and defaulting on. I'm opposed to removing this sysctl, so I'm
opposed to this change.
I'd prefer the ability to both turn on for global default off, and
turn off for global default on. Shouldn't that be 2 different bits?
Or should this be a toggle? How do other systems handle it?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 5/8] Allow disabling TCP timestamp options per route
2009-10-21 19:22 ` William Allen Simpson
@ 2009-10-25 8:43 ` Gilad Ben-Yossef
0 siblings, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-25 8:43 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev
Hi,
William Allen Simpson wrote:
>
>> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
>> index 9c802a6..2ab8c75 100644
>> --- a/include/linux/rtnetlink.h
>> +++ b/include/linux/rtnetlink.h
>> @@ -378,7 +378,7 @@ enum
>>
>> #define RTAX_FEATURE_ECN 0x00000001
>> #define RTAX_FEATURE_NO_SACK 0x00000002
>> -#define RTAX_FEATURE_TIMESTAMP 0x00000004
>> +#define RTAX_FEATURE_NO_TSTAMP 0x00000004
>> #define RTAX_FEATURE_ALLFRAG 0x00000008
>>
> I just realized that unlike NO_WSCALE, this change assumes removing the
> sysctl and defaulting on.
No, it doesn't. This patch does not change the sysctl behavior in any
way. The RTAX_FEATURE_TIMESTAMP define is a currently a dead code not
used anywhere AFAIK.
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 6/8] Allow to turn off TCP window scale opt per route
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
` (4 preceding siblings ...)
2009-10-21 8:56 ` [PATCH v2 5/8] Allow disabling TCP timestamp options per route Gilad Ben-Yossef
@ 2009-10-21 8:56 ` Gilad Ben-Yossef
2009-10-21 8:57 ` [PATCH v2 7/8] Allow disabling of DSACK TCP option " Gilad Ben-Yossef
2009-10-21 8:57 ` [PATCH v2 8/8] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef
7 siblings, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:56 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
Add and use no window scale bit in the features field.
Note that this is not the same as setting a window scale of 0
as would happen with window limit on route.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
include/linux/rtnetlink.h | 1 +
net/ipv4/tcp_input.c | 3 ++-
net/ipv4/tcp_output.c | 6 ++++--
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2ab8c75..6784b34 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -380,6 +380,7 @@ enum
#define RTAX_FEATURE_NO_SACK 0x00000002
#define RTAX_FEATURE_NO_TSTAMP 0x00000004
#define RTAX_FEATURE_ALLFRAG 0x00000008
+#define RTAX_FEATURE_NO_WSCALE 0x00000010
struct rta_session
{
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d2f9742..4f5e914 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3739,7 +3739,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
break;
case TCPOPT_WINDOW:
if (opsize == TCPOLEN_WINDOW && th->syn &&
- !estab && sysctl_tcp_window_scaling) {
+ !estab && sysctl_tcp_window_scaling &&
+ !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)) {
__u8 snd_wscale = *(__u8 *)ptr;
opt_rx->wscale_ok = 1;
if (snd_wscale > 14) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8f30c18..ff60a21 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -496,7 +496,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
opts->tsecr = tp->rx_opt.ts_recent;
size += TCPOLEN_TSTAMP_ALIGNED;
}
- if (likely(sysctl_tcp_window_scaling)) {
+ if (likely(sysctl_tcp_window_scaling &&
+ !dst_feature(dst, RTAX_FEATURE_NO_WSCALE))) {
opts->ws = tp->rx_opt.rcv_wscale;
opts->options |= OPTION_WSCALE;
size += TCPOLEN_WSCALE_ALIGNED;
@@ -2347,7 +2348,8 @@ static void tcp_connect_init(struct sock *sk)
tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
&tp->rcv_wnd,
&tp->window_clamp,
- sysctl_tcp_window_scaling,
+ (sysctl_tcp_window_scaling &&
+ !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)),
&rcv_wscale);
tp->rx_opt.rcv_wscale = rcv_wscale;
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* [PATCH v2 7/8] Allow disabling of DSACK TCP option per route
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
` (5 preceding siblings ...)
2009-10-21 8:56 ` [PATCH v2 6/8] Allow to turn off TCP window scale opt " Gilad Ben-Yossef
@ 2009-10-21 8:57 ` Gilad Ben-Yossef
2009-10-21 8:57 ` [PATCH v2 8/8] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef
7 siblings, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:57 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
Add and use no DSCAK bit in the features field.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: Ori Finkelman <ori@comsleep.com>
Sigend-off-by: Yony Amit <yony@comsleep.com>
---
include/linux/rtnetlink.h | 1 +
net/ipv4/tcp_input.c | 8 ++++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6784b34..e78b60c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -381,6 +381,7 @@ enum
#define RTAX_FEATURE_NO_TSTAMP 0x00000004
#define RTAX_FEATURE_ALLFRAG 0x00000008
#define RTAX_FEATURE_NO_WSCALE 0x00000010
+#define RTAX_FEATURE_NO_DSACK 0x00000020
struct rta_session
{
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f5e914..4262da5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4080,8 +4080,10 @@ static inline int tcp_sack_extend(struct tcp_sack_block *sp, u32 seq,
static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq)
{
struct tcp_sock *tp = tcp_sk(sk);
+ struct dst_entry *dst = __sk_dst_get(sk);
- if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+ if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+ !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
int mib_idx;
if (before(seq, tp->rcv_nxt))
@@ -4110,13 +4112,15 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
static void tcp_send_dupack(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
+ struct dst_entry *dst = __sk_dst_get(sk);
if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
tcp_enter_quickack_mode(sk);
- if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+ if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+ !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
u32 end_seq = TCP_SKB_CB(skb)->end_seq;
if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-21 8:56 [PATCH v2 0/8] Per route TCP options Gilad Ben-Yossef
` (6 preceding siblings ...)
2009-10-21 8:57 ` [PATCH v2 7/8] Allow disabling of DSACK TCP option " Gilad Ben-Yossef
@ 2009-10-21 8:57 ` Gilad Ben-Yossef
2009-10-21 9:40 ` William Allen Simpson
7 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 8:57 UTC (permalink / raw)
To: netdev; +Cc: ori, Gilad Ben-Yossef
No need for global kill switches if we have per route entry controls.
Wait a year before removing in case someone is using this.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
---
Documentation/feature-removal-schedule.txt | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..60db855 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,18 @@ be removed from this file.
---------------------------
+What: sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling,
+ sysctl_tcp_dsack
+When: October 2010
+
+Why: These options can now be set on a per route basis via the
+ RTAX_FEATURE_NO_SACK, RTAX_FEATURE_NO_TSTAMP, RTAX_FEATURE_NO_WSCALE,
+ and RTAX_FEATURE_NO_DSACK route feature options.
+
+Who: Gilad Ben-Yossef <gilad@codefidence.com>
+
+---------------------------
+
What: PRISM54
When: 2.6.34
--
1.5.6.3
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-21 8:57 ` [PATCH v2 8/8] Document future removal of sysctl_tcp_* options Gilad Ben-Yossef
@ 2009-10-21 9:40 ` William Allen Simpson
2009-10-21 10:23 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: William Allen Simpson @ 2009-10-21 9:40 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: netdev, ori
Gilad Ben-Yossef wrote:
> +What: sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling,
> + sysctl_tcp_dsack
Opposed to removing, as this is a common configuration for *BSD. It helps
operators to have similar utility across systems.
net.inet.tcp.sack.enable
net.inet.tcp.timestamps
net.inet.tcp.win_scale
Support removing sysctl_tcp_dsack, as this appears to be Linux-only.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-21 9:40 ` William Allen Simpson
@ 2009-10-21 10:23 ` Gilad Ben-Yossef
2009-10-21 19:30 ` William Allen Simpson
0 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-21 10:23 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev, ori
William Allen Simpson wrote:
> Gilad Ben-Yossef wrote:
>> +What: sysctl_tcp_sack, sysctl_tcp_timestamps,
>> sysctl_tcp_window_scaling,
>> + sysctl_tcp_dsack
>
> Opposed to removing, as this is a common configuration for *BSD. It
> helps
> operators to have similar utility across systems.
>
> net.inet.tcp.sack.enable
>
> net.inet.tcp.timestamps
>
> net.inet.tcp.win_scale
>
> Support removing sysctl_tcp_dsack, as this appears to be Linux-only.
I have no issue with leaving those, if everyone thinks we're better off.
BTW, while we're talking about OS envy, I do believe that Windows do let
you specify on a per route basis. Not that this is really a good ground for
technical decision, but still... :-)
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-21 10:23 ` Gilad Ben-Yossef
@ 2009-10-21 19:30 ` William Allen Simpson
2009-10-22 4:32 ` Bill Fink
2009-10-25 8:45 ` Gilad Ben-Yossef
0 siblings, 2 replies; 34+ messages in thread
From: William Allen Simpson @ 2009-10-21 19:30 UTC (permalink / raw)
To: netdev
Gilad Ben-Yossef wrote:
> I have no issue with leaving those, if everyone thinks we're better off.
>
> BTW, while we're talking about OS envy, I do believe that Windows do let
> you specify on a per route basis. Not that this is really a good ground for
> technical decision, but still... :-)
>
I'm not concerned with "envy", I'm concerned with training operators, and
consistency across platforms.
I'm in favor of per route configuration, it seems reasonably clean, as
long as it's done consistently with other systems. I don't permit Windows
systems to be used here (except under controlled security circumstances), so
I'm not familiar with their configuration. However, doing things similarly
across platforms will ease documentation and training.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-21 19:30 ` William Allen Simpson
@ 2009-10-22 4:32 ` Bill Fink
2009-10-22 4:57 ` Eric Dumazet
2009-10-25 8:45 ` Gilad Ben-Yossef
1 sibling, 1 reply; 34+ messages in thread
From: Bill Fink @ 2009-10-22 4:32 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev
On Wed, 21 Oct 2009, William Allen Simpson wrote:
> Gilad Ben-Yossef wrote:
> > I have no issue with leaving those, if everyone thinks we're better off.
> >
> > BTW, while we're talking about OS envy, I do believe that Windows do let
> > you specify on a per route basis. Not that this is really a good ground for
> > technical decision, but still... :-)
> >
> I'm not concerned with "envy", I'm concerned with training operators, and
> consistency across platforms.
>
> I'm in favor of per route configuration, it seems reasonably clean, as
> long as it's done consistently with other systems. I don't permit Windows
> systems to be used here (except under controlled security circumstances), so
> I'm not familiar with their configuration. However, doing things similarly
> across platforms will ease documentation and training.
And as mentioned previously, the global options can be quite useful
in certain test scenarios. I also agree the per route settings are
a very useful addition. I think the global and per route settings
are complementary and shouldn't be thought of as in conflict with
one another.
-Bill
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-22 4:32 ` Bill Fink
@ 2009-10-22 4:57 ` Eric Dumazet
2009-10-22 10:53 ` William Allen Simpson
2009-10-25 9:09 ` Gilad Ben-Yossef
0 siblings, 2 replies; 34+ messages in thread
From: Eric Dumazet @ 2009-10-22 4:57 UTC (permalink / raw)
To: Bill Fink; +Cc: William Allen Simpson, netdev
Bill Fink a écrit :
> And as mentioned previously, the global options can be quite useful
> in certain test scenarios. I also agree the per route settings are
> a very useful addition. I think the global and per route settings
> are complementary and shouldn't be thought of as in conflict with
> one another.
>
Absolutely, global setting is a must when an admin wants a quick path.
The more flexible would be to have two bits per route, plus
2 bits on the global configuration.
global conf:
00 : timestamps OFF, unless a route setting is not 00
01 : timestamps ON, unless a route setting is not 00
10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
11 : Force timestamps ON, ignore route settings
Route settings (used *only* if global setting is 0Y)
00 : global conf is used
01 : Force timestamps being OFF for this route
10 : Force timestamps being ON for this route
11 : complement global conf
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-22 4:57 ` Eric Dumazet
@ 2009-10-22 10:53 ` William Allen Simpson
2009-10-25 9:09 ` Gilad Ben-Yossef
1 sibling, 0 replies; 34+ messages in thread
From: William Allen Simpson @ 2009-10-22 10:53 UTC (permalink / raw)
To: netdev
Eric Dumazet wrote:
> Absolutely, global setting is a must when an admin wants a quick path.
>
> The more flexible would be to have two bits per route, plus
> 2 bits on the global configuration.
>
> global conf:
> 00 : timestamps OFF, unless a route setting is not 00
> 01 : timestamps ON, unless a route setting is not 00
> 10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
> 11 : Force timestamps ON, ignore route settings
>
> Route settings (used *only* if global setting is 0Y)
> 00 : global conf is used
> 01 : Force timestamps being OFF for this route
> 10 : Force timestamps being ON for this route
> 11 : complement global conf
>
Nice! Seems to have all the bases covered. For consistency, I'd swap the
latter values (although I doubt complement will have much use):
00 : global conf is used
01 : complement global conf
10 : Timestamps OFF for this route
11 : Timestamps ON for this route
And the documentation should make it clear that global 10 and 11 override
per route 10 and 11.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-22 4:57 ` Eric Dumazet
2009-10-22 10:53 ` William Allen Simpson
@ 2009-10-25 9:09 ` Gilad Ben-Yossef
2009-10-26 0:21 ` Bill Fink
1 sibling, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-25 9:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Bill Fink, William Allen Simpson, netdev
Hi,
Eric Dumazet wrote:
> Bill Fink a écrit :
>
>
>> And as mentioned previously, the global options can be quite useful
>> in certain test scenarios. I also agree the per route settings are
>> a very useful addition. I think the global and per route settings
>> are complementary and shouldn't be thought of as in conflict with
>> one another.
>>
> Absolutely, global setting is a must when an admin wants a quick path.
>
> The more flexible would be to have two bits per route, plus
> 2 bits on the global configuration.
>
> global conf:
> 00 : timestamps OFF, unless a route setting is not 00
> 01 : timestamps ON, unless a route setting is not 00
> 10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
> 11 : Force timestamps ON, ignore route settings
>
> Route settings (used *only* if global setting is 0Y)
> 00 : global conf is used
> 01 : Force timestamps being OFF for this route
> 10 : Force timestamps being ON for this route
> 11 : complement global conf
> --
>
Hey, I have no issue to re-spin the patch with this suggestion, if you
truly think this is valuable, but would you please consider the
nightmare of having to just explain this to someone?
It sounds to me way too complicated for what it does.
I still think having a global kill switch and per route options better
(basically use the exiting patch but not retire the global kill
switch|), but if you must Hgow about we leave the global sysctl as they
are and just have a two bit route option:
0 Use global default
1 Off
2 On
It's kind of funny, because this is what the original patch from
Comsleep does and I thought it needlessly complicates things.
So, what do you say - which will it be?
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-25 9:09 ` Gilad Ben-Yossef
@ 2009-10-26 0:21 ` Bill Fink
2009-10-26 5:03 ` Eric Dumazet
0 siblings, 1 reply; 34+ messages in thread
From: Bill Fink @ 2009-10-26 0:21 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: Eric Dumazet, William Allen Simpson, netdev
On Sun, 25 Oct 2009, Gilad Ben-Yossef wrote:
> Eric Dumazet wrote:
>
> > Bill Fink a écrit :
> >
> >
> >> And as mentioned previously, the global options can be quite useful
> >> in certain test scenarios. I also agree the per route settings are
> >> a very useful addition. I think the global and per route settings
> >> are complementary and shouldn't be thought of as in conflict with
> >> one another.
> >>
> > Absolutely, global setting is a must when an admin wants a quick path.
> >
> > The more flexible would be to have two bits per route, plus
> > 2 bits on the global configuration.
> >
> > global conf:
> > 00 : timestamps OFF, unless a route setting is not 00
> > 01 : timestamps ON, unless a route setting is not 00
> > 10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
> > 11 : Force timestamps ON, ignore route settings
> >
> > Route settings (used *only* if global setting is 0Y)
> > 00 : global conf is used
> > 01 : Force timestamps being OFF for this route
> > 10 : Force timestamps being ON for this route
> > 11 : complement global conf
>
> Hey, I have no issue to re-spin the patch with this suggestion, if you
> truly think this is valuable, but would you please consider the
> nightmare of having to just explain this to someone?
>
> It sounds to me way too complicated for what it does.
>
> I still think having a global kill switch and per route options better
> (basically use the exiting patch but not retire the global kill
> switch|), but if you must Hgow about we leave the global sysctl as they
> are and just have a two bit route option:
>
> 0 Use global default
> 1 Off
> 2 On
>
> It's kind of funny, because this is what the original patch from
> Comsleep does and I thought it needlessly complicates things.
>
> So, what do you say - which will it be?
I personally feel the 2-bit settings are overkill. What i think
makes the most sense is for the global options to act as they always
have in the absence of any route specific settings, and for any
route specific settings to override the related global settings.
This is both simple and maintains backward compatibility.
-Bill
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-26 0:21 ` Bill Fink
@ 2009-10-26 5:03 ` Eric Dumazet
2009-10-26 8:05 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2009-10-26 5:03 UTC (permalink / raw)
To: Bill Fink; +Cc: Gilad Ben-Yossef, William Allen Simpson, netdev
Bill Fink a écrit :
> On Sun, 25 Oct 2009, Gilad Ben-Yossef wrote:
>
>> Eric Dumazet wrote:
>>
>>> Bill Fink a écrit :
>>>
>>>
>>>> And as mentioned previously, the global options can be quite useful
>>>> in certain test scenarios. I also agree the per route settings are
>>>> a very useful addition. I think the global and per route settings
>>>> are complementary and shouldn't be thought of as in conflict with
>>>> one another.
>>>>
>>> Absolutely, global setting is a must when an admin wants a quick path.
>>>
>>> The more flexible would be to have two bits per route, plus
>>> 2 bits on the global configuration.
>>>
>>> global conf:
>>> 00 : timestamps OFF, unless a route setting is not 00
>>> 01 : timestamps ON, unless a route setting is not 00
>>> 10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
>>> 11 : Force timestamps ON, ignore route settings
>>>
>>> Route settings (used *only* if global setting is 0Y)
>>> 00 : global conf is used
>>> 01 : Force timestamps being OFF for this route
>>> 10 : Force timestamps being ON for this route
>>> 11 : complement global conf
>> Hey, I have no issue to re-spin the patch with this suggestion, if you
>> truly think this is valuable, but would you please consider the
>> nightmare of having to just explain this to someone?
>>
>> It sounds to me way too complicated for what it does.
>>
>> I still think having a global kill switch and per route options better
>> (basically use the exiting patch but not retire the global kill
>> switch|), but if you must Hgow about we leave the global sysctl as they
>> are and just have a two bit route option:
>>
>> 0 Use global default
>> 1 Off
>> 2 On
>>
>> It's kind of funny, because this is what the original patch from
>> Comsleep does and I thought it needlessly complicates things.
>>
>> So, what do you say - which will it be?
>
> I personally feel the 2-bit settings are overkill. What i think
> makes the most sense is for the global options to act as they always
> have in the absence of any route specific settings, and for any
> route specific settings to override the related global settings.
> This is both simple and maintains backward compatibility.
Backward compatibility is important, very important, if not the most
important thing. Then usability comes.
I know some busy servers where adding/changing a single route makes them
go crazy (because of ip route flush cache)
So if a route is overriding a global conf, and the admin wants to make an
emergency change during peak hours, he should do it by a global setting,
or he wont use at all this new stuff, and stay conservative.
Alternative would be to not trigger the flush of cache when changing
features flags.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-26 5:03 ` Eric Dumazet
@ 2009-10-26 8:05 ` Gilad Ben-Yossef
2009-10-26 15:08 ` Bill Fink
0 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-26 8:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Bill Fink, William Allen Simpson, netdev, Ilpo Järvinen
Eric Dumazet wrote:
> Bill Fink a écrit :
>
>> On Sun, 25 Oct 2009, Gilad Ben-Yossef wrote:
>>
>>
>>> Eric Dumazet wrote:
>>>
>>>
>>> I still think having a global kill switch and per route options better
>>> (basically use the exiting patch but not retire the global kill
>>> switch|), but if you must Hgow about we leave the global sysctl as they
>>> are and just have a two bit route option:
>>>
>>> 0 Use global default
>>> 1 Off
>>> 2 On
>>>
>>> It's kind of funny, because this is what the original patch from
>>> Comsleep does and I thought it needlessly complicates things.
>>>
>>> So, what do you say - which will it be?
>>>
>> I personally feel the 2-bit settings are overkill. What i think
>> makes the most sense is for the global options to act as they always
>> have in the absence of any route specific settings, and for any
>> route specific settings to override the related global settings.
>> This is both simple and maintains backward compatibility.
>>
>
> Backward compatibility is important, very important, if not the most
> important thing. Then usability comes.
>
I tend to agree.
> I know some busy servers where adding/changing a single route makes them
> go crazy (because of ip route flush cache)
>
> So if a route is overriding a global conf, and the admin wants to make an
> emergency change during peak hours, he should do it by a global setting,
> or he wont use at all this new stuff, and stay conservative.
>
> Alternative would be to not trigger the flush of cache when changing
> features flags.
>
>
OK. It really sounds like we should go with my first suggestion: global
sysctl based kill switches, just as we have now and in addition, the
ability to kill TCP options per route. The TCP option will be used if
and only if both kill switches (global and per route) are not set.
What we achieve is:
1. Global kill switches work exactly as they do now, whether you use the
new per route options or not, so backwards compatible.
2. In addition, if the global kill switch is not in effect, you can also
kill the options on a per route basis.
I'm going to send third version of the patch to this effect, minus the
new remote DoS possibility that Ilpo pointed out and leaving the global
sysctl kill switches be.
If you like it, please ACK ;-)
Thanks,
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Linux is Ir. Ir, of course, is a form of hypereviscerated Reiyk."
-- Marc Volovic, linux-il, 14 Dec 2000
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-26 8:05 ` Gilad Ben-Yossef
@ 2009-10-26 15:08 ` Bill Fink
2009-10-26 15:51 ` Gilad Ben-Yossef
0 siblings, 1 reply; 34+ messages in thread
From: Bill Fink @ 2009-10-26 15:08 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: Eric Dumazet, William Allen Simpson, netdev, Ilpo Järvinen
On Mon, 26 Oct 2009, Gilad Ben-Yossef wrote:
> Eric Dumazet wrote:
>
> > Bill Fink a écrit :
> >
> >> On Sun, 25 Oct 2009, Gilad Ben-Yossef wrote:
> >>
> >>
> >>> Eric Dumazet wrote:
> >>>
> >>>
> >>> I still think having a global kill switch and per route options better
> >>> (basically use the exiting patch but not retire the global kill
> >>> switch|), but if you must Hgow about we leave the global sysctl as they
> >>> are and just have a two bit route option:
> >>>
> >>> 0 Use global default
> >>> 1 Off
> >>> 2 On
> >>>
> >>> It's kind of funny, because this is what the original patch from
> >>> Comsleep does and I thought it needlessly complicates things.
> >>>
> >>> So, what do you say - which will it be?
> >>>
> >> I personally feel the 2-bit settings are overkill. What i think
> >> makes the most sense is for the global options to act as they always
> >> have in the absence of any route specific settings, and for any
> >> route specific settings to override the related global settings.
> >> This is both simple and maintains backward compatibility.
> >>
> >
> > Backward compatibility is important, very important, if not the most
> > important thing. Then usability comes.
> >
> I tend to agree.
> > I know some busy servers where adding/changing a single route makes them
> > go crazy (because of ip route flush cache)
> >
> > So if a route is overriding a global conf, and the admin wants to make an
> > emergency change during peak hours, he should do it by a global setting,
> > or he wont use at all this new stuff, and stay conservative.
> >
> > Alternative would be to not trigger the flush of cache when changing
> > features flags.
> >
> >
> OK. It really sounds like we should go with my first suggestion: global
> sysctl based kill switches, just as we have now and in addition, the
> ability to kill TCP options per route. The TCP option will be used if
> and only if both kill switches (global and per route) are not set.
This wording is confusing. The global kill switch not being set
really means that the sysctl is set. And this assumes the per-route
default is not set. Correct?
> What we achieve is:
>
> 1. Global kill switches work exactly as they do now, whether you use the
> new per route options or not, so backwards compatible.
>
> 2. In addition, if the global kill switch is not in effect, you can also
> kill the options on a per route basis.
>
> I'm going to send third version of the patch to this effect, minus the
> new remote DoS possibility that Ilpo pointed out and leaving the global
> sysctl kill switches be.
>
> If you like it, please ACK ;-)
IIUC this doesn't seem right to me. I believe the global setting
should be a default and the route specific an override. Your scheme
would mean that if I set a global option to disable timestamps, then
I couldn't enable timestamps on specific routes using the per route
setting.
And it also doesn't seem to address Eric's scenario. If I understand
his concern correctly, what seems to be needed is a third global
reset value (not calling it a setting since the actual global setting
wouldn't be changed), which would reset any per-route override settings
to the global default setting.
-Bill
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-26 15:08 ` Bill Fink
@ 2009-10-26 15:51 ` Gilad Ben-Yossef
2009-10-27 5:09 ` Bill Fink
0 siblings, 1 reply; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-26 15:51 UTC (permalink / raw)
To: Bill Fink; +Cc: Eric Dumazet, William Allen Simpson, netdev, Ilpo Järvinen
Bill Fink wrote:
>
>> OK. It really sounds like we should go with my first suggestion: global
>> sysctl based kill switches, just as we have now and in addition, the
>> ability to kill TCP options per route. The TCP option will be used if
>> and only if both kill switches (global and per route) are not set.
>>
>
> This wording is confusing. The global kill switch not being set
> really means that the sysctl is set. And this assumes the per-route
> default is not set. Correct?
>
Now it is my turn to get confused, because I didn't understand your
question :-)
What I suggest is to leave the sysctl exactly as they are now:
- You leave them be (value of 1), the respective TCP option is
supported. This is the default.
- You turn them off (write 0), the respective TCP option is not supported.
What I suggest to *add* is the following ability:
- If you have the TCP option support turned on (default, value of one),
you can turn support for the option for a specific route using a ip
route option.
Hope that made it clearer.
>
>> What we achieve is:
>>
>> 1. Global kill switches work exactly as they do now, whether you use the
>> new per route options or not, so backwards compatible.
>>
>> 2. In addition, if the global kill switch is not in effect, you can also
>> kill the options on a per route basis.
>>
>> I'm going to send third version of the patch to this effect, minus the
>> new remote DoS possibility that Ilpo pointed out and leaving the global
>> sysctl kill switches be.
>>
>> If you like it, please ACK ;-)
>>
>
> IIUC this doesn't seem right to me. I believe the global setting
> should be a default and the route specific an override. Your scheme
> would mean that if I set a global option to disable timestamps, then
> I couldn't enable timestamps on specific routes using the per route
> setting.
>
Yes. You understand my intention perfectly.
Let me try to explain why I believe this is the correct behavior to
implement:
1. This is the closest thing to what we have now. Today you write 0 to
the sysctl and that TCP option is turned off globally. Period. My
suggestion leaves this behavior as is now regardless if you've used per
route settings. The other way make a very subtle change in the meaning
of writing 0 to the sysctl.
I believe very subtle changes to meaning of long established interfaces
is bad way to go. It's better to change interfaces on users, but it is
even worse to maek something that they have long used do something just
slightly different.
2. If the per route options needs to be "default, of or off" instead of
"on or off", we'd need to move from 1 bit to store the option to, well
2s bit in theory, but probably 32 bits in practice, since we can't use
RTAX_FEATURES any longer.
Yes, we can invent RTAX_FEATURES_TWO_BITS or some such, but I'd say that
is ugly :-)
3. I believe that the scenario of needing to set the support of a TCP
option globally off and just turn it on for a specific route is not very
likely to be needed and losing it is a small price to pay for 1 + 2.
> And it also doesn't seem to address Eric's scenario. If I understand
> his concern correctly, what seems to be needed is a third global
> reset value (not calling it a setting since the actual global setting
> wouldn't be changed), which would reset any per-route override settings
> to the global default setting.
>
>
Well, I do not believe this is what Eric meant (Eric?) but if it is then
I fail to see why
to require from the per route TCP options switches what is not required
of any other
route specific option already existing, since AFAIK we don't have a
"reset to default values" to the other options already supported.
Having said all that, I have no issue with re-spinning the patch with
your suggestion.
I don't feel all that much which is the correct way- I just want to get
as much feedback as possible
since I'm suggesting to add a new user interface options and we all know
it is very hard to back peddle
on those, so I'm trying to make sure to get enough feedback to do it
right the firs time.
So any feedback from anyone regarding favorite interface? it seems each
person fancy a different one :-)
Thanks!
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"The biggest risk you can take it is to take no risk."
-- Mark Zuckerberg and probably others
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-26 15:51 ` Gilad Ben-Yossef
@ 2009-10-27 5:09 ` Bill Fink
0 siblings, 0 replies; 34+ messages in thread
From: Bill Fink @ 2009-10-27 5:09 UTC (permalink / raw)
To: Gilad Ben-Yossef
Cc: Eric Dumazet, William Allen Simpson, netdev, Ilpo Järvinen
On Mon, 26 Oct 2009, Gilad Ben-Yossef wrote:
> Bill Fink wrote:
>
> >> OK. It really sounds like we should go with my first suggestion: global
> >> sysctl based kill switches, just as we have now and in addition, the
> >> ability to kill TCP options per route. The TCP option will be used if
> >> and only if both kill switches (global and per route) are not set.
> >>
> >
> > This wording is confusing. The global kill switch not being set
> > really means that the sysctl is set. And this assumes the per-route
> > default is not set. Correct?
> >
> Now it is my turn to get confused, because I didn't understand your
> question :-)
My question was just if I understood you correctly, which apparently
I did.
> What I suggest is to leave the sysctl exactly as they are now:
>
> - You leave them be (value of 1), the respective TCP option is
> supported. This is the default.
> - You turn them off (write 0), the respective TCP option is not supported.
>
> What I suggest to *add* is the following ability:
>
> - If you have the TCP option support turned on (default, value of one),
> you can turn support for the option for a specific route using a ip
> route option.
Should be "turn" -> "turn off" above.
> Hope that made it clearer.
Yes.
> >> What we achieve is:
> >>
> >> 1. Global kill switches work exactly as they do now, whether you use the
> >> new per route options or not, so backwards compatible.
> >>
> >> 2. In addition, if the global kill switch is not in effect, you can also
> >> kill the options on a per route basis.
> >>
> >> I'm going to send third version of the patch to this effect, minus the
> >> new remote DoS possibility that Ilpo pointed out and leaving the global
> >> sysctl kill switches be.
> >>
> >> If you like it, please ACK ;-)
> >>
> >
> > IIUC this doesn't seem right to me. I believe the global setting
> > should be a default and the route specific an override. Your scheme
> > would mean that if I set a global option to disable timestamps, then
> > I couldn't enable timestamps on specific routes using the per route
> > setting.
> >
> Yes. You understand my intention perfectly.
>
> Let me try to explain why I believe this is the correct behavior to
> implement:
>
> 1. This is the closest thing to what we have now. Today you write 0 to
> the sysctl and that TCP option is turned off globally. Period. My
> suggestion leaves this behavior as is now regardless if you've used per
> route settings. The other way make a very subtle change in the meaning
> of writing 0 to the sysctl.
Continuing to support existing behavior is the most important
consideration for me, so your intention for the per route settings
doesn't cause me any major grief. I initially thought the per route
setting as an override to a global default setting was more intuitive
than your method of it being a disable switch only, but my thinking
has since evolved (see below).
> I believe very subtle changes to meaning of long established interfaces
> is bad way to go. It's better to change interfaces on users, but it is
> even worse to maek something that they have long used do something just
> slightly different.
I didn't see how my suggestion changed existing behavior. If you
didn't use the per route settings everything remained as it was
(or so I initially thought). And if you were using a new feature
then you needed to be aware of its semantics.
> 2. If the per route options needs to be "default, of or off" instead of
> "on or off", we'd need to move from 1 bit to store the option to, well
> 2s bit in theory, but probably 32 bits in practice, since we can't use
> RTAX_FEATURES any longer.
>
> Yes, we can invent RTAX_FEATURES_TWO_BITS or some such, but I'd say that
> is ugly :-)
I only intended a single bit for the global default and per route
settings. The global default would just be enabled (1) or disabled(0).
The per route setting would be inherited from the global default setting,
and also would be just enabled(1) or disabled(0).
Explicitly setting a per route setting to enabled(1) or disabled(0)
would override the initial default setting.
However I didn't fully work through all the ramifications of this
idea, and I now realize this would entail some changes to the existing
behavior of the global setting, which I agree is unacceptable.
I therefore now believe your latest proposal is a better approach
for maintaining backward compatibility and avoiding any nasty
unexpected surprises to existing working configurations.
> 3. I believe that the scenario of needing to set the support of a TCP
> option globally off and just turn it on for a specific route is not very
> likely to be needed and losing it is a small price to pay for 1 + 2.
I agree this is probably an uncommon scenario.
> > And it also doesn't seem to address Eric's scenario. If I understand
> > his concern correctly, what seems to be needed is a third global
> > reset value (not calling it a setting since the actual global setting
> > wouldn't be changed), which would reset any per-route override settings
> > to the global default setting.
> >
> >
> Well, I do not believe this is what Eric meant (Eric?) but if it is then
> I fail to see why
> to require from the per route TCP options switches what is not required
> of any other
> route specific option already existing, since AFAIK we don't have a
> "reset to default values" to the other options already supported.
I won't try and speak for Eric anymore.
> Having said all that, I have no issue with re-spinning the patch with
> your suggestion.
> I don't feel all that much which is the correct way- I just want to get
> as much feedback as possible
> since I'm suggesting to add a new user interface options and we all know
> it is very hard to back peddle
> on those, so I'm trying to make sure to get enough feedback to do it
> right the firs time.
>
> So any feedback from anyone regarding favorite interface? it seems each
> person fancy a different one :-)
I haven't checked the details of your patch, but I am now fine with
the concepts of the patch as you most recently presented them.
-Bill
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
2009-10-21 19:30 ` William Allen Simpson
2009-10-22 4:32 ` Bill Fink
@ 2009-10-25 8:45 ` Gilad Ben-Yossef
1 sibling, 0 replies; 34+ messages in thread
From: Gilad Ben-Yossef @ 2009-10-25 8:45 UTC (permalink / raw)
To: William Allen Simpson; +Cc: netdev
William Allen Simpson wrote:
> Gilad Ben-Yossef wrote:
>> I have no issue with leaving those, if everyone thinks we're better off.
>>
>> BTW, while we're talking about OS envy, I do believe that Windows do let
>> you specify on a per route basis. Not that this is really a good
>> ground for
>> technical decision, but still... :-)
>>
> I'm not concerned with "envy", I'm concerned with training operators, and
> consistency across platforms.
>
> I'm in favor of per route configuration, it seems reasonably clean, as
> long as it's done consistently with other systems. I don't permit
> Windows
> systems to be used here (except under controlled security
> circumstances), so
> I'm not familiar with their configuration. However, doing things
> similarly
> across platforms will ease documentation and training.
>
I have no objection leaving the current sysctl as is. Personally, I
think it's useless when you have route level options but I don't care
all that much.
The powers that be are more then welcome to ACK the first 7 patches and
NACK the documentation of the sysctl as deprecated. :-)
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Sorry cannot parse this, its too long to be true :)"
-- Eric Dumazet on netdev mailing list
^ permalink raw reply [flat|nested] 34+ messages in thread