netdev.vger.kernel.org archive mirror
From: Gilad Ben-Yossef <gilad@codefidence.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, Ori Finkalman <ori@comsleep.com>
Subject: Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
Date: Wed, 30 Sep 2009 08:28:12 +0200
Message-ID: <4AC2FA7C.6030901@codefidence.com>
In-Reply-To: <4AC241BA.8040608@gmail.com>

Hi,


[ Resending this reply due to the sorry state of the Android Gmail client.
My apologies if you got it twice. ]


Eric Dumazet wrote:

> Gilad Ben-Yossef wrote:
>   
>> From: Ori Finkalman <ori@comsleep.com>
>>
>>
>> Acknowledge TCP window scale support by inserting the proper option in
>> the SYN/ACK header even if our window scale is zero.
>>
>>
>> This fixes the following observed behavior:
>>
>>
>> 1. Client sends a SYN with TCP window scaling option and non zero window
>> scale value to a Linux box.
>>
>> 2. Linux box notes large receive window from client.
>>
>> 3. Linux decides on a zero value of window scale for its part.
>>
>> 4. Due to a comparison against the requested window scale value, Linux
>> does not send the window scale TCP option header on the SYN/ACK at all.
>>
>>
>> Result:
>>
>>
>> Client box thinks TCP window scaling is not supported, since SYN/ACK had
>> no TCP window scale option, while Linux thinks that TCP window scaling is
>> supported (and scale might be non zero), since SYN had a TCP window scale
>> option, and we have a mismatched idea between the client and server
>> regarding window sizes.
>>
>>
>> Please comment and/or apply.
>> ...
>>
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>> Signed-off-by: Ori Finkelman <ori@comsleep.com>
>>
>>
>> Index: net/ipv4/tcp_output.c
>> ===================================================================
>> --- net/ipv4/tcp_output.c    (revision 46)
>> +++ net/ipv4/tcp_output.c    (revision 210)
>> @@ -353,6 +353,7 @@ static void tcp_init_nondata_skb(struct
>> #define OPTION_SACK_ADVERTISE    (1 << 0)
>> #define OPTION_TS        (1 << 1)
>> #define OPTION_MD5        (1 << 2)
>> +#define OPTION_WSCALE        (1 << 3)
>>
>> struct tcp_out_options {
>>     u8 options;        /* bit field of OPTION_* */
>> @@ -417,7 +418,7 @@ static void tcp_options_write(__be32 *pt
>>                    TCPOLEN_SACK_PERM);
>>     }
>>
>> -    if (unlikely(opts->ws)) {
>> +    if (unlikely(OPTION_WSCALE & opts->options)) {
>>         *ptr++ = htonl((TCPOPT_NOP << 24) |
>>                    (TCPOPT_WINDOW << 16) |
>>                    (TCPOLEN_WINDOW << 8) |
>> @@ -530,8 +531,8 @@ static unsigned tcp_synack_options(struc
>>
>>     if (likely(ireq->wscale_ok)) {
>>         opts->ws = ireq->rcv_wscale;
>> -        if(likely(opts->ws))
>> -            size += TCPOLEN_WSCALE_ALIGNED;
>> +        opts->options |= OPTION_WSCALE;
>> +        size += TCPOLEN_WSCALE_ALIGNED;
>>     }
>>     if (likely(doing_ts)) {
>>         opts->options |= OPTION_TS;
>>
>>
>>
>>     
>
> This does not seem the most logical place to put this logic...
>
> How about this instead ?
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5200aab..b78c084 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -216,6 +216,11 @@ void tcp_select_initial_window(int __space, __u32 mss,
>  			space >>= 1;
>  			(*rcv_wscale)++;
>  		}
> +		/*
> +		 * Set a minimum wscale of 1
> +		 */
> +		if (*rcv_wscale == 0)
> +			*rcv_wscale = 1;
>         }
>
>         /* Set initial window to value enough for senders,
>
>   

Thank you for the patch review. The suggested replacement patch is
certainly shorter, code-wise, which is an advantage.

I can't help but feel, though, that it is less readable - a window scale
of zero is a perfectly legitimate value. Adding special logic to rule it
out just because we chose to overload this field to also mean something
else (whether window scaling is supported or not) seems like an invitation
for someone to get it wrong again down the line, in my opinion.

Also note that the fix we suggested is in line with how other TCP options
are handled, e.g. TCP timestamp, which is tracked by its own OPTION_TS flag.
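
To make the overloading point concrete, here is a minimal user-space
sketch. It is not the kernel code: struct out_options and the three
functions are made-up names for illustration, and only the option
constants mirror the ones that appear in the patch. It contrasts
inferring "option present" from the shift value with tracking presence
in a separate flag bit.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP option kind/length values as used in tcp_output.c */
#define TCPOPT_NOP             1
#define TCPOPT_WINDOW          3
#define TCPOLEN_WINDOW         3
#define TCPOLEN_WSCALE_ALIGNED 4
#define OPTION_WSCALE          (1 << 3)

/* Hypothetical, simplified stand-in for struct tcp_out_options. */
struct out_options {
	uint8_t options;	/* bit field of OPTION_* */
	uint8_t ws;		/* window scale shift; 0 is a valid value */
};

/* Hypothetical helper: record the SYN/ACK window scale decision. */
void set_synack_wscale(struct out_options *opts, int wscale_ok,
		       uint8_t rcv_wscale)
{
	if (wscale_ok) {
		opts->ws = rcv_wscale;
		opts->options |= OPTION_WSCALE;	/* presence kept apart from value */
	}
}

/* Emit NOP + window scale option (4 bytes, already aligned). */
static size_t put_wscale(uint8_t *buf, uint8_t shift)
{
	buf[0] = TCPOPT_NOP;
	buf[1] = TCPOPT_WINDOW;
	buf[2] = TCPOLEN_WINDOW;
	buf[3] = shift;
	return TCPOLEN_WSCALE_ALIGNED;
}

/* Old behaviour: a shift of zero is mistaken for "no option". */
size_t write_wscale_by_value(const struct out_options *opts, uint8_t *buf)
{
	return opts->ws ? put_wscale(buf, opts->ws) : 0;
}

/* Proposed behaviour: the flag decides, so a zero shift is still sent. */
size_t write_wscale_by_flag(const struct out_options *opts, uint8_t *buf)
{
	return (opts->options & OPTION_WSCALE) ? put_wscale(buf, opts->ws) : 0;
}
```

With window scaling negotiated and a shift of zero, the by-value writer
emits nothing, while the by-flag writer emits the four bytes 01 03 03 00,
which is what the peer needs to see in order to keep scaling enabled.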

Does anyone else want to chime in on that?

PS. I also managed to get the spelling of the patch author's name wrong.
It is Ori Finkelman and not as written.

Thanks!
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Now the world has gone to bed
	 Darkness won't engulf my head
	 I can see by infra-red
	 How I hate the night."



Thread overview: 7+ messages
2009-09-29 15:05 [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero Gilad Ben-Yossef
2009-09-29 17:19 ` Eric Dumazet
2009-09-30  6:28   ` Gilad Ben-Yossef [this message]
2009-09-30  7:16     ` Eric Dumazet
2009-09-30 11:42       ` Ilpo Järvinen
2009-09-30 13:06         ` Eric Dumazet
2009-10-01  9:39           ` Gilad Ben-Yossef
