Subject: Re: [PATCH net-next 0/6] tcp: remove non GSO code
From: Eric Dumazet
To: Oleksandr Natalenko, Eric Dumazet
Cc: "David S. Miller", netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh
Date: Tue, 20 Feb 2018 10:57:42 -0800

On Tue, 2018-02-20 at 07:39 -0800, Eric Dumazet wrote:
> On Tue, 2018-02-20 at 10:32 +0100, Oleksandr Natalenko wrote:
> > Hi.
> >
> > 19.02.2018 20:56, Eric Dumazet wrote:
> > > Switching TCP to GSO mode, relying on the core networking layers
> > > to perform eventual adaptation for dumb devices, was overdue.
> > >
> > > 1) Most TCP developments are done with TSO in mind.
> > > 2) Fewer high-resolution timers need to be armed for TCP pacing.
> > > 3) GSO can benefit from the xmit_more hint.
> > > 4) Receiver GRO is more effective (as if TSO was used for real on
> > >    the sender) -> fewer ACK packets and less overhead.
> > > 5) Write queues have less overhead (one skb holds about 64KB of
> > >    payload).
> > > 6) SACK coalescing just works (no payload in skb->head).
> > > 7) The rtx rb-tree contains fewer packets, so SACK processing is
> > >    cheaper.
> > > 8) Removal of legacy code, hence fewer maintenance hassles.
> > >
> > > Note that I have left the sendpage/zerocopy paths alone, but they
> > > probably can benefit from the same strategy.
> > >
> > > Thanks to Oleksandr Natalenko for reporting a performance issue for
> > > BBR/fq_codel, which was the main reason I worked on this patch
> > > series.
> >
> > Thanks for dealing with this so quickly.
> >
> > Does this mean that the option to optimise internal TCP pacing is
> > still an open question?
>
> It is not that an optimization is needed; rather, we have to take into
> account that high-resolution timers can have latencies of ~2 usec or
> more.
>
> When sending 64KB TSO packets, an extra 2 usec after every ~54 usec
> (at 10Gbit) has no big impact, since TCP computes a slightly inflated
> pacing rate anyway.
>
> But when sending one MSS-sized packet every usec, this definitely can
> show up as a big slowdown.
>
> But the answer is yes, I will take a look at this timer drift.

Actually, timer drift is not horrible (at least on my lab hosts).

But BBR has a pessimistic way to sense the burst size, since that sizing
is tied to TSO/GSO being there.

The following patch helps a lot.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 */
 	segs = max_t(u32, bytes / mss_now, min_tso_segs);
 
-	return min_t(u32, segs, sk->sk_gso_max_segs);
+	return segs;
 }
 EXPORT_SYMBOL(tcp_tso_autosize);
 
@@ -1742,9 +1742,10 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
 
-	return tso_segs ? :
-		tcp_tso_autosize(sk, mss_now,
-				 sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	if (!tso_segs)
+		tso_segs = tcp_tso_autosize(sk, mss_now,
+				sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	return min_t(u32, tso_segs, sk->sk_gso_max_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */
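
To make the effect of the diff concrete, here is a rough userspace sketch
of the post-patch logic. It is only an illustration: the pacing-rate math
is simplified, and the pacing rate, MSS, min_tso_segs and gso_max_segs
values in main() are assumed example numbers, not kernel defaults. The
point it shows is that tcp_tso_autosize() no longer clamps its result,
while tcp_tso_segs() clamps the final value, whether it came from the
congestion control's tso_segs_goal() (as with BBR) or from autosizing, to
the device limit sk->sk_gso_max_segs.

/* Rough userspace model of the post-patch helpers above.  The structure
 * mirrors tcp_tso_autosize()/tcp_tso_segs(), but the pacing math is
 * simplified and every numeric value in main() is an assumed example.
 */
#include <stdio.h>

static unsigned int min_u32(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_u32(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Post-patch tcp_tso_autosize(): size the burst to roughly 1 ms worth of
 * the pacing rate (the kernel derives this via sk->sk_pacing_shift and
 * also bounds the byte count by the GSO max size; omitted here), with a
 * floor of min_tso_segs and, after the patch, no upper clamp.
 */
static unsigned int model_tso_autosize(unsigned long long pacing_rate_bps,
				       unsigned int mss_now,
				       unsigned int min_tso_segs)
{
	unsigned long long bytes = pacing_rate_bps / 8 / 1000; /* ~1 ms of payload */

	return max_u32((unsigned int)(bytes / mss_now), min_tso_segs);
}

/* Post-patch tcp_tso_segs(): prefer the congestion control's own goal
 * (BBR provides one), fall back to autosizing, then clamp the final
 * value to the device limit (sk->sk_gso_max_segs in the kernel).
 */
static unsigned int model_tso_segs(unsigned int ca_goal,
				   unsigned long long pacing_rate_bps,
				   unsigned int mss_now,
				   unsigned int min_tso_segs,
				   unsigned int gso_max_segs)
{
	unsigned int tso_segs = ca_goal;

	if (!tso_segs)
		tso_segs = model_tso_autosize(pacing_rate_bps, mss_now,
					      min_tso_segs);
	return min_u32(tso_segs, gso_max_segs);
}

int main(void)
{
	/* Assumed example values: 10 Gbit/s pacing rate, 1448-byte MSS,
	 * min_tso_segs = 2 and a hypothetical device gso_max_segs of 64.
	 */
	printf("no CA goal (cubic-like): %u segs\n",
	       model_tso_segs(0, 10000000000ULL, 1448, 2, 64));
	printf("CA goal of 128 (BBR-like): %u segs\n",
	       model_tso_segs(128, 10000000000ULL, 1448, 2, 64));
	return 0;
}

With the clamp applied in tcp_tso_segs(), a congestion control's burst
goal can no longer exceed what the device can actually emit in one
GSO/TSO packet.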
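
As an aside, the "~54 usec" figure quoted earlier is simply the wire time
of a full-size TSO burst at 10 Gbit/s (roughly, ignoring per-segment
header overhead):

    64 KB * 8 bits/byte = 524288 bits
    524288 bits / 10 Gbit/s ~= 52 usec per burst

so a ~2 usec hrtimer latency is noise at that burst size, but it becomes
dominant once packets are paced out one MSS at a time, every microsecond.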