* [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
@ 2009-09-29 15:05 Gilad Ben-Yossef
2009-09-29 17:19 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Gilad Ben-Yossef @ 2009-09-29 15:05 UTC (permalink / raw)
To: netdev; +Cc: Ori Finkalman
From: Ori Finkalman <ori@comsleep.com>
Acknowledge TCP window scale support by inserting the proper option in
the SYN/ACK header even if our window scale is zero.
This fixes the following observed behavior:
1. Client sends a SYN with the TCP window scaling option and a non-zero
window scale value to a Linux box.
2. Linux box notes large receive window from client.
3. Linux decides on a zero value of window scale for its part.
4. Because the check is made against the chosen window scale value,
Linux does not send the window scale TCP option header in the SYN/ACK
at all.
Result:
Client box thinks TCP window scaling is not supported, since the SYN/ACK
had no TCP window scale option, while Linux thinks that TCP window
scaling is supported (and that the scale might be non-zero), since the
SYN had the TCP window scale option. The client and server end up with
mismatched ideas about window sizes.
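The mismatch can be made concrete with a small sketch. This is not kernel code: `effective_window()`, `window_overestimate()` and the sample values are assumptions chosen for illustration. It only shows that the same 16-bit window field means different byte counts depending on the shift count each side believes is in effect.

```c
#include <assert.h>
#include <stdint.h>

/* Largest shift count allowed for the window scale option (RFC 1323). */
#define MAX_WSCALE 14

/*
 * Hypothetical helper, not a kernel function: the window in bytes that the
 * receiver of a segment derives from the 16-bit header field, given the
 * shift count it believes the sender is using.
 */
static uint32_t effective_window(uint16_t field, unsigned int shift)
{
	assert(shift <= MAX_WSCALE);
	return (uint32_t)field << shift;
}

/*
 * Bytes by which the server overestimates the client's receive window when
 * the client has fallen back to "no scaling" (shift 0) while the server
 * still applies the shift count taken from the client's SYN.
 */
static uint32_t window_overestimate(uint16_t field, unsigned int syn_shift)
{
	return effective_window(field, syn_shift) - effective_window(field, 0);
}
```

With a window field of 8192 and a SYN shift count of 2, the server would assume 32768 bytes where the client meant 8192.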
Please comment and/or apply.
---
Bug reported and patch written by Ori Finkalman from Comsleep Ltd. I'm
just helping mainline it.
The behavior was observed with a Windows box as the client and the
latest Debian kernel, but to the best of my understanding this can
happen with the latest kernel versions and other client OSes (probably
also Linux) as well.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Index: net/ipv4/tcp_output.c
===================================================================
--- net/ipv4/tcp_output.c (revision 46)
+++ net/ipv4/tcp_output.c (revision 210)
@@ -353,6 +353,7 @@ static void tcp_init_nondata_skb(struct
 #define OPTION_SACK_ADVERTISE	(1 << 0)
 #define OPTION_TS		(1 << 1)
 #define OPTION_MD5		(1 << 2)
+#define OPTION_WSCALE		(1 << 3)

 struct tcp_out_options {
 	u8 options;		/* bit field of OPTION_* */
@@ -417,7 +418,7 @@ static void tcp_options_write(__be32 *pt
 			       TCPOLEN_SACK_PERM);
 	}

-	if (unlikely(opts->ws)) {
+	if (unlikely(OPTION_WSCALE & opts->options)) {
 		*ptr++ = htonl((TCPOPT_NOP << 24) |
 			       (TCPOPT_WINDOW << 16) |
 			       (TCPOLEN_WINDOW << 8) |
@@ -530,8 +531,8 @@ static unsigned tcp_synack_options(struc

 	if (likely(ireq->wscale_ok)) {
 		opts->ws = ireq->rcv_wscale;
-		if(likely(opts->ws))
-			size += TCPOLEN_WSCALE_ALIGNED;
+		opts->options |= OPTION_WSCALE;
+		size += TCPOLEN_WSCALE_ALIGNED;
 	}
 	if (likely(doing_ts)) {
 		opts->options |= OPTION_TS;
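For readers without the kernel tree at hand, the effect of keying the write on a flag rather than on the shift count can be sketched in user space. This is a simplified stand-in, not the kernel function: `struct out_opts` and `write_wscale()` are hypothetical names, and the option word layout (NOP, kind 3, length 3, shift count) is assumed from the TCP option format.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Option kind/length values for the TCP window scale option. */
#define TCPOPT_NOP	1
#define TCPOPT_WINDOW	3
#define TCPOLEN_WINDOW	3

#define OPTION_WSCALE	(1 << 3)

/* Simplified stand-in for the kernel's struct tcp_out_options. */
struct out_opts {
	uint8_t options;	/* bit field of OPTION_* */
	uint8_t ws;		/* window scale shift count, 0 is valid */
};

/*
 * Write the window scale option if the flag is set and return the number
 * of 32-bit words written (0 or 1). The shift count itself is never used
 * to decide whether the option is present.
 */
static int write_wscale(uint32_t *ptr, const struct out_opts *opts)
{
	assert(opts->ws <= 14);
	if (!(opts->options & OPTION_WSCALE))
		return 0;

	*ptr = htonl((TCPOPT_NOP << 24) |
		     (TCPOPT_WINDOW << 16) |
		     (TCPOLEN_WINDOW << 8) |
		     opts->ws);
	return 1;
}
```

With the flag set and `ws == 0`, the option is still emitted with a zero shift count, which is the behavior the patch description asks for.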
--
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.
Web: http://codefidence.com
Cell: +972-52-8260388
Skype: gilad_codefidence
Tel: +972-8-9316883 ext. 201
Fax: +972-8-9316884
Email: gilad@codefidence.com
Check out our Open Source technology and training blog - http://tuxology.net
"Now the world has gone to bed
Darkness won't engulf my head
I can see by infra-red
How I hate the night."
* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Eric Dumazet @ 2009-09-29 17:19 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: netdev, Ori Finkalman
This does not seem the most logical place to put this logic...
How about this instead?
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5200aab..b78c084 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -216,6 +216,11 @@ void tcp_select_initial_window(int __space, __u32 mss,
 			space >>= 1;
 			(*rcv_wscale)++;
 		}
+		/*
+		 * Set a minimum wscale of 1
+		 */
+		if (*rcv_wscale == 0)
+			*rcv_wscale = 1;
 	}

 	/* Set initial window to value enough for senders,
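The effect of the proposed floor can be tried out in isolation. The sketch below is a simplified paraphrase, not the kernel function: only the loop body is visible in the hunk, so the loop condition (halve the space while it exceeds 65535, up to a shift count of 14) is an assumption, and `pick_rcv_wscale()` and `min_one` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Largest shift count allowed for the window scale option (RFC 1323). */
#define MAX_WSCALE 14

/*
 * Choose a receive window shift count for a buffer of `space` bytes.
 * When `min_one` is non-zero, apply the proposed floor of 1.
 */
static unsigned int pick_rcv_wscale(uint32_t space, int min_one)
{
	unsigned int wscale = 0;

	/* Assumed loop condition; only the body appears in the hunk. */
	while (space > 65535 && wscale < MAX_WSCALE) {
		space >>= 1;
		wscale++;
	}
	if (min_one && wscale == 0)
		wscale = 1;	/* the proposed minimum */

	assert(wscale <= MAX_WSCALE);
	return wscale;
}
```

For any space that fits in 16 bits the unmodified loop yields 0, which is exactly the case in which the option was being dropped; with the floor the result becomes 1 and the existing non-zero test passes.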
* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-09-30 6:28 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Ori Finkalman
Hi,
[ Resending reply due to the sorry state of the Android Gmail client.
My apologies if you got it twice. ]
Thank you for the patch review. The suggested replacement patch is
certainly shorter, code-wise, which is an advantage.
I can't help but feel, though, that it is less readable: a window scale
of zero is a perfectly legitimate value. Adding special logic to rule it
out just because we chose to overload this setting for something else
(whether window scaling is supported or not) seems like an invitation
for someone to get it wrong again down the line, in my opinion.
Also note that our original fix is in line with how other TCP options
are handled, e.g. TCP timestamp.
Does anyone else want to chime in on that?
PS. I also managed to get the spelling of the patch author's name wrong.
It is Ori Finkelman and not as written.
Thanks!
Gilad
* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Eric Dumazet @ 2009-09-30 7:16 UTC (permalink / raw)
To: Gilad Ben-Yossef; +Cc: netdev, Ori Finkalman
As a matter of fact, I did not test your patch.
My reaction was driven by the following:
Your version slows down the tcp_options_write() function, once per tx packet.
tcp_options_write() should not change socket state, while
tcp_select_initial_window() is the exact place where we are supposed to
compute wscale.
Also, how is the tcp_syn_options() case (for outgoing connections) handled?
	if (likely(sysctl_tcp_window_scaling)) {
		opts->ws = tp->rx_opt.rcv_wscale;
		if (likely(opts->ws))
			size += TCPOLEN_WSCALE_ALIGNED;
	}
Don't you need to patch it as well?
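The fragment above tests `opts->ws` in the same way the SYN/ACK path did. A minimal sketch of how that branch might look with an explicit flag follows, assuming the `OPTION_WSCALE` bit from the original patch. `syn_wscale_option()`, `struct out_opts` and the boolean parameter are hypothetical stand-ins for illustration; this is neither kernel code nor the revised patch.

```c
#include <assert.h>
#include <stdint.h>

#define OPTION_WSCALE		(1 << 3)
#define TCPOLEN_WSCALE_ALIGNED	4	/* option padded to a 32-bit word */

/* Simplified stand-in for the kernel's struct tcp_out_options. */
struct out_opts {
	uint8_t options;	/* bit field of OPTION_* */
	uint8_t ws;		/* window scale shift count, 0 is valid */
};

/*
 * Hypothetical analogue of the quoted tcp_syn_options() fragment.
 * Returns the number of option bytes the window scale option adds.
 */
static unsigned int syn_wscale_option(struct out_opts *opts,
				      int window_scaling_enabled,
				      uint8_t rcv_wscale)
{
	assert(rcv_wscale <= 14);
	if (!window_scaling_enabled)
		return 0;

	opts->ws = rcv_wscale;
	opts->options |= OPTION_WSCALE;	/* set even when rcv_wscale == 0 */
	return TCPOLEN_WSCALE_ALIGNED;
}
```

The size is accounted for whenever scaling is enabled, so the space reserved for options and the option actually written cannot disagree when the shift count is zero.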
* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Ilpo Järvinen @ 2009-09-30 11:42 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Gilad Ben-Yossef, Netdev, Ori Finkalman
On Wed, 30 Sep 2009, Eric Dumazet wrote:
> As a matter of fact, I did not test your patch.
>
> My reaction was driven by the following:
>
> Your version slows down the tcp_options_write() function, once per tx packet.
Are you serious that anding would cost that much? :-/
> tcp_options_write() should not change socket state,
I fail to see how his patch changes socket state in any way, anywhere?
> while
> tcp_select_initial_window() is the exact place where we are supposed to
> compute wscale.
And it calculated a result of 0, which is perfectly valid. The problem
is that tcp_options_write() takes 0 as an indication of no window
scaling, instead of the correct interpretation of a zero shift count,
which makes a huge difference for the opposite-direction traffic, as
these guys have noted. Not that I find your approach that bad either, as
we only lose 1 bit of accuracy for the window, which is rather
insignificant since 1-byte window increments do not really make much
sense anyway (and we have to specifically code against doing them
anyway, so the effective granularity is much coarser).
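The 1-bit loss mentioned above can be illustrated with a short sketch. `advertised_window()` is a hypothetical helper, not a kernel function; it assumes only that the low bits of the window are dropped when it is shifted into the header field.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Window that can actually be advertised for a desired window `win`
 * with shift count `ws`: the low `ws` bits are lost on the wire.
 * Hypothetical helper for illustration only.
 */
static uint32_t advertised_window(uint32_t win, unsigned int ws)
{
	assert(ws <= 14);
	return (win >> ws) << ws;
}
```

With a shift count of 1 only even window sizes can be expressed, so a desired window of 65535 bytes is advertised as 65534.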
> Also, how is the tcp_syn_options() case (for outgoing connections) handled?
>
> 	if (likely(sysctl_tcp_window_scaling)) {
> 		opts->ws = tp->rx_opt.rcv_wscale;
> 		if (likely(opts->ws))
> 			size += TCPOLEN_WSCALE_ALIGNED;
> 	}
>
> Don't you need to patch it as well?
One certainly should change that too if that patch is the way to go
forward.
--
i.
* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Eric Dumazet @ 2009-09-30 13:06 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Gilad Ben-Yossef, Netdev, Ori Finkalman
Ilpo Järvinen wrote:
>> As a matter of fact, I did not test your patch.
>>
>> My reaction was driven by the following:
>>
>> Your version slows down the tcp_options_write() function, once per tx packet.
>
> Are you serious that anding would cost that much? :-/
Not really :)
>
>> tcp_options_write() should not change socket state,
>
> I fail to see how his patch changes socket state in any way, anywhere?
Me too, now you say it :)
>
>> while
>> tcp_select_initial_window() is the exact place where we are supposed to
>> compute wscale.
>
> And it calculated a result of 0, which is perfectly valid. The problem
> is that tcp_options_write() takes 0 as an indication of no window
> scaling, instead of the correct interpretation of a zero shift count,
> which makes a huge difference for the opposite-direction traffic, as
> these guys have noted. Not that I find your approach that bad either, as
> we only lose 1 bit of accuracy for the window, which is rather
> insignificant since 1-byte window increments do not really make much
> sense anyway (and we have to specifically code against doing them
> anyway, so the effective granularity is much coarser).
Yes, wscale 0 is RFC valid, but are we sure some equipment won't play funny games
with such a value? At least sending "wscale 1-14" must be working...
My quick & dirty patch was only for discussion; I have no strong opinion on it,
only that it was one place to patch instead of two/three/four, I don't know yet.
So please, Gilad & Ori, send us a new patch :)
* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01 9:39 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ilpo Järvinen, Netdev, Ori Finkalman
Eric Dumazet wrote:
>
>>>
>>> Your version slows down the tcp_options_write() function, once per tx packet.
>>>
>> Are you serious that anding would cost that much? :-/
>>
>
> Not really :)
>
LOL I was trying very hard to understand why you thought this was such
an issue. My head was flying into all sorts of weird directions like
cache effects and the like... ;-)
<snip>
> Yes, wscale 0 is RFC valid, but are we sure some equipment won't play funny games
> with such a value? At least sending "wscale 1-14" must be working...
>
Well, there at least used to be routers that would actually zero the
WS value in transit while leaving the option set, but this is another
issue of course.
Anyway, I know Vista at least does set the window scale TCP option by
default. One assumes it occasionally sends a zero scale value. Not that
Vista is such a good benchmark to compare Linux to, but I tend to
believe the issue would have popped up if it were common enough.
I can craft a patch to introduce a route table option to set TCP window
scale minimum and maximum sizes, similar to the window size route option,
if there is a need for that. Personally, I think it is just overkill.
>
> My quick & dirty patch was only for discussion; I have no strong opinion on it,
> only that it was one place to patch instead of two/three/four, I don't know yet.
>
> So please Gilad & Ori send us a new patch :)
>
>
Revised patch follows in next email.
Gilad