netdev.vger.kernel.org archive mirror
From: Fan Du <fengyuleidian0615@gmail.com>
To: John Heffner <johnwheffner@gmail.com>
Cc: Fan Du <fan.du@intel.com>, David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next 2/3] ipv4: Use binary search to choose tcp PMTU probe_size
Date: Mon, 16 Feb 2015 13:27:00 +0800
Message-ID: <54E17FA4.1000104@gmail.com>
In-Reply-To: <CABrhC0mPhY1u5rb=KsaF96fLqw_QYLUGm_D9_Yhn655JxLN1Xw@mail.gmail.com>

On 2015-02-14 01:52, John Heffner wrote:
> On Fri, Feb 13, 2015 at 3:16 AM, Fan Du <fan.du@intel.com> wrote:
>> Currently probe_size is chosen by doubling mss_cache.
>> With the initial mss base of 512 bytes, the converged
>> probe_size will only be 1024 bytes, which leaves a
>> big gap between 1024 and the common mtu of 1500
>> bytes.
>>
>> Use binary search to choose probe_size at a finer
>> granularity, so that an optimal mss is found and
>> performance is boosted to its maximum.
>>
>> Test env:
>> Docker instance with vxlan encapsulation (82599EB)
>> iperf -c 10.0.0.24  -t 60
>>
>> before this patch:
>> 1.26 Gbits/sec
>>
>> After this patch: increase 26%
>> 1.59 Gbits/sec
>>
>> Signed-off-by: Fan Du <fan.du@intel.com>
>
> Thanks for looking into making mtu probing better.  Improving the
> search strategy is commendable.  One high level comment though is that
> there's some cost associated with probing and diminishing returns the
> smaller the interval (search_high - search_low), so there should be
> some threshold below which further probing is deemed no longer useful.
>
> Aside from that, some things in this patch don't look right to me.
> Comments inline below.
>
>
>> ---
>>   include/net/inet_connection_sock.h |    3 +++
>>   net/ipv4/tcp_input.c               |    5 ++++-
>>   net/ipv4/tcp_output.c              |   12 +++++++++---
>>   net/ipv4/tcp_timer.c               |    2 +-
>>   4 files changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>> index 5976bde..3d0932e 100644
>> --- a/include/net/inet_connection_sock.h
>> +++ b/include/net/inet_connection_sock.h
>> @@ -124,6 +124,9 @@ struct inet_connection_sock {
>>                  int               search_high;
>>                  int               search_low;
>>
>> +               int               search_high_sav;
>> +               int               search_low_sav;
>> +
>>                  /* Information on the current probe. */
>>                  int               probe_size;
>>          } icsk_mtup;
>
>
> What are these for?  They're assigned but not used.

They are used by the probe timer to restore the original search range.
See patch 3/3.

>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 8fdd27b..20b28e9 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -2613,7 +2613,10 @@ static void tcp_mtup_probe_success(struct sock *sk)
>>          tp->snd_cwnd_stamp = tcp_time_stamp;
>>          tp->snd_ssthresh = tcp_current_ssthresh(sk);
>>
>> -       icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size;
>> +       if (icsk->icsk_mtup.search_low == icsk->icsk_mtup.probe_size)
>> +               icsk->icsk_mtup.search_low = icsk->icsk_mtup.search_high;
>> +       else
>> +               icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size;
>>          icsk->icsk_mtup.probe_size = 0;
>>          tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
>>   }
>
> It would be cleaner to handle this in tcp_mtu_probe, in deciding
> whether to issue a probe, than to change the semantics of search_high
> and search_low.  Issuing a probe where probe_size == search_low seems
> like the wrong thing to do.
That was my original thought. My second thought was that every byte in a datacenter
costs money, so why not get optimal performance by using every byte that is
available?

Anyway, a sysctl threshold will also do the job; I will incorporate this in the next version.
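
To make the threshold idea concrete, here is a minimal, self-contained C sketch of a binary search over the MTU range that stops once the interval becomes too small to be worth another probe. It is not the patch itself: struct mtu_search, PROBE_THRESHOLD, next_probe_mtu(), probe_result() and converge() are hypothetical names, the 8-byte threshold is an arbitrary assumption, and probe success is simulated by comparing against a known path MTU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the search_low/search_high pair in icsk_mtup. */
struct mtu_search {
	int low;	/* largest MTU assumed to work */
	int high;	/* largest MTU still considered possible */
};

/* Assumed value: stop probing once the interval is narrower than this. */
#define PROBE_THRESHOLD 8

/* Return the next MTU to probe, or 0 when further probing is not useful. */
int next_probe_mtu(const struct mtu_search *s)
{
	assert(s->low <= s->high);
	if (s->high - s->low < PROBE_THRESHOLD)
		return 0;
	return s->low + (s->high - s->low) / 2;
}

/* Narrow the interval according to the outcome of one probe. */
void probe_result(struct mtu_search *s, int probe_mtu, bool success)
{
	if (success)
		s->low = probe_mtu;
	else
		s->high = probe_mtu - 1;
}

/*
 * Simulate probing a path whose real MTU is path_mtu.
 * Returns the number of probes sent before the search stops.
 */
int converge(struct mtu_search *s, int path_mtu)
{
	int probes = 0;
	int mtu;

	while ((mtu = next_probe_mtu(s)) != 0) {
		probe_result(s, mtu, mtu <= path_mtu);
		probes++;
	}
	return probes;
}
```

For example, starting from low = 552 and high = 1500 on a path whose real MTU is 1450 (an assumed figure for a 1500-byte link with 50 bytes of vxlan overhead), converge() stops with low = 1446, well above the 1024 bytes that doubling converges to. A larger threshold trades a few bytes of MTU for fewer probes.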

>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index a2a796c..0a60deb 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1349,10 +1349,13 @@ void tcp_mtup_init(struct sock *sk)
>>          struct inet_connection_sock *icsk = inet_csk(sk);
>>          struct net *net = sock_net(sk);
>>
>> -       icsk->icsk_mtup.enabled = net->ipv4.sysctl_tcp_mtu_probing > 1;
>> +       icsk->icsk_mtup.enabled = net->ipv4.sysctl_tcp_mtu_probing;
>>          icsk->icsk_mtup.search_high = tp->rx_opt.mss_clamp + sizeof(struct tcphdr) +
>>                                 icsk->icsk_af_ops->net_header_len;
>>          icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, net->ipv4.sysctl_tcp_base_mss);
>> +
>> +       icsk->icsk_mtup.search_high_sav = icsk->icsk_mtup.search_high;
>> +       icsk->icsk_mtup.search_low_sav = icsk->icsk_mtup.search_low;
>>          icsk->icsk_mtup.probe_size = 0;
>>   }
>>   EXPORT_SYMBOL(tcp_mtup_init);
>
> You're changing the meaning of sysctl_tcp_mtu_probing.  I don't think
> that's what you want.  From Documentation/networking/ip-sysctl.txt:
>
> tcp_mtu_probing - INTEGER
> Controls TCP Packetization-Layer Path MTU Discovery.  Takes three
> values:
>   0 - Disabled
>   1 - Disabled by default, enabled when an ICMP black hole detected
>   2 - Always enabled, use initial MSS of tcp_base_mss.
Yes, I will honor the original enable scheme.
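
As a rough illustration of the three documented values, the following C sketch models when probing ends up enabled. The function names are hypothetical, and the black hole path is an assumption about how mode 1 switches probing on later; only the "> 1" test at init time is taken from the code quoted above.

```c
#include <stdbool.h>

/* Values of tcp_mtu_probing as listed in ip-sysctl.txt. */
enum mtu_probing_mode {
	MTU_PROBING_OFF = 0,		/* disabled */
	MTU_PROBING_ON_BLACKHOLE = 1,	/* enabled once a black hole is detected */
	MTU_PROBING_ALWAYS = 2		/* always enabled */
};

/* At socket init only mode 2 enables probing (the original "> 1" test). */
bool mtup_enabled_at_init(int mode)
{
	return mode > MTU_PROBING_ON_BLACKHOLE;
}

/*
 * Assumed behaviour after a suspected black hole: any non-zero mode
 * turns probing on if it was not on already.
 */
bool mtup_enabled_after_blackhole(int mode, bool enabled)
{
	return enabled || mode != MTU_PROBING_OFF;
}
```

Assigning the sysctl value directly to the enabled flag, as the first version of the patch does, would make mode 1 behave like mode 2 at init time, which is the change in meaning pointed out above.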

>
>> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
>> index 0732b78..9d1cfe0 100644
>> --- a/net/ipv4/tcp_timer.c
>> +++ b/net/ipv4/tcp_timer.c
>> @@ -113,7 +113,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
>>                          struct tcp_sock *tp = tcp_sk(sk);
>>                          int mss;
>>
>> -                       mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
>> +                       mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low);
>>                          mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
>>                          mss = max(mss, 68 - tp->tcp_header_len);
>>                          icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
>
> Why did you change this?  I think this breaks black hole detection.
Hmm, I misunderstood this part.
For pure black hole detection, it makes more sense to lower the current TCP mss
rather than search_low, since packets at the current mss are still being lost.

-                       mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
+                       /* try mss smaller than current mss */
+                       mss = tcp_current_mss(sk) >> 1;

A black hole looks more like an administrative misconfiguration on an intermediate node
than a stack issue, so why keep shrinking the mss just to get packets through with poor performance?
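
The halving and clamping arithmetic discussed here can be sketched in plain C as follows. This is only an illustration: shrink_mss() is a hypothetical helper that takes plain integers instead of a struct sock, and it assumes the min/max clamps from the quoted tcp_mtu_probing() code are kept as they are.

```c
/*
 * Halve an mss for black hole recovery, then clamp it:
 *  - never above base_mss (the tcp_base_mss sysctl),
 *  - never below 68 - tcp_header_len, as in the quoted code.
 */
int shrink_mss(int mss, int base_mss, int tcp_header_len)
{
	int floor_mss = 68 - tcp_header_len;
	int half = mss >> 1;

	if (half > base_mss)
		half = base_mss;
	if (half < floor_mss)
		half = floor_mss;
	return half;
}
```

With base_mss = 512 and a 20-byte TCP header, an mss of 1460 shrinks to 512, 512 shrinks to 256, and anything below 96 bottoms out at 48. The two variants in this thread differ only in what is passed as mss: the mss derived from search_low, or the current mss of the connection.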


> Thanks,
>    -John
>


Thread overview: 24+ messages
2015-02-13  8:16 [PATCH net-next 0/3] Small fix for TCP PMTU Fan Du
2015-02-13  8:16 ` [PATCH net-next 1/3] ipv4: Raise tcp PMTU probe mss base size Fan Du
2015-02-13  9:49   ` yzhu1
2015-02-16  5:15     ` Fan Du
2015-02-13  8:16 ` [PATCH net-next 2/3] ipv4: Use binary search to choose tcp PMTU probe_size Fan Du
2015-02-13 17:52   ` John Heffner
2015-02-16  5:27     ` Fan Du [this message]
2015-02-16 23:59       ` John Heffner
2015-02-13  8:16 ` [PATCH net-next 3/3] ipv4: Create probe timer for tcp PMTU as per RFC4821 Fan Du
2015-02-13  9:59   ` Ying Xue
2015-02-16  5:28     ` Fan Du
2015-02-13 12:31   ` Eric Dumazet
2015-02-16  5:38     ` Fan Du
2015-02-16 12:19       ` Eric Dumazet
2015-02-26  3:49 ` [PATCHv2 net-next 0/4] Small fix for TCP PMTU Fan Du
2015-02-26  3:49   ` [PATCHv2 net-next 1/4] ipv4: Raise tcp PMTU probe mss base size Fan Du
2015-02-26  3:49   ` [PATCHv2 net-next 2/4] ipv4: Use binary search to choose tcp PMTU probe_size Fan Du
2015-02-27 22:17     ` David Miller
2015-02-26  3:49   ` [PATCHv2 net-next 3/4] ipv4: shrink current mss for tcp PMTU blackhole detection Fan Du
2015-02-26  3:49   ` [PATCHv2 net-next 4/4] ipv4: Create probe timer for tcp PMTU as per RFC4821 Fan Du
2015-02-26  4:19     ` Eric Dumazet
2015-02-26  6:24       ` Fan Du
2015-02-26 13:40   ` [PATCHv2 net-next 0/4] Small fix for TCP PMTU David Laight
2015-02-27  5:37     ` Fan Du
