From: Jesper Dangaard Brouer
To: Or Gerlitz
Cc: Edward Cree, Saeed Mahameed, "netdev@vger.kernel.org", brouer@redhat.com
Subject: Re: [net-next PATCH] net: ipv4: fix listify ip_rcv_finish in case of forwarding
Date: Fri, 13 Jul 2018 13:08:40 +0200
Message-ID: <20180713130840.1b6b78ea@redhat.com>
References: <153132125549.13161.16380200872856218805.stgit@firesoul> <7c5605ed2fe9505b982fde312d8416bd7fbbe6af.camel@mellanox.com> <20180711220649.266b071a@redhat.com>

On Thu, 12 Jul 2018 23:10:28 +0300, Or Gerlitz wrote:

> On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer wrote:
>
> > Well, I would prefer you to implement those. I just did a quick
> > implementation (it's trivially easy) so I have something to benchmark
> > with. The performance boost is quite impressive!
>
> sounds good, but wait
>
> > One reason I didn't "just" send a patch is that Edward so far only
> > implemented netif_receive_skb_list() and not napi_gro_receive_list().
>
> sfc doesn't support GRO?! doesn't make sense.. Edward?
>
> > And your driver uses napi_gro_receive(). This sort-of disables GRO
> > for your driver, which is not a choice I can make. Interestingly, I
> > get around the same netperf TCP_STREAM performance.
>
> Same TCP performance

I said around the same... I'll redo the benchmarks and verify (did it,
see below).

> with GRO and no rx-batching
>
> or
>
> without GRO and yes rx-batching

Yes, obviously without GRO and with rx-batching.

> is by far not an intuitive result to me, unless both these techniques
> mostly serve to eliminate lots of instruction cache misses and the
> TCP stack is so much optimized that, if the code is in the cache,
> going through it once with a 64K byte GRO-ed packet is like going
> through it ~40 (64K/1500) times with non-GRO-ed packets.

Actually, the GRO code path is rather expensive and uses a lot of
indirect calls.  If you have a UDP workload, disabling GRO will give
you a 10-15% performance boost.  Edward's changes are basically a
generalized version of GRO, up to the IP layer (ip_rcv).  So, for me
it makes perfect sense.

> What's the baseline (with GRO and no rx-batching) number on your setup?

Okay, redoing the benchmarks...

I implemented a code hack so I can control at runtime whether the mlx5
driver uses napi_gro_receive() or netif_receive_skb_list() (abusing a
netdev ethtool-controlled feature flag that is not in use).

To get a quick test going with feedback every 3 sec I use:

 $ netperf -t TCP_STREAM -H 198.18.1.1 -D3 -l 60000 -T 4,4

Default: using napi_gro_receive() with GRO enabled:
 Interim result: 25995.28 10^6bits/s over 3.000 seconds

Disable GRO but still use napi_gro_receive():
 Interim result: 21980.45 10^6bits/s over 3.001 seconds

Make driver use netif_receive_skb_list():
 Interim result: 25490.67 10^6bits/s over 3.002 seconds

As you can see, using netif_receive_skb_list() gives a huge performance
boost over disabled-GRO, and it comes very close to the performance of
enabled-GRO, which is rather impressive! :-)

Notice, even more impressively: these tests are without
CONFIG_RETPOLINE.  We primarily merged netif_receive_skb_list() due to
the overhead of RETPOLINEs, but we see a benefit even when not using
RETPOLINEs.
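To make the netif_receive_skb_list() test case concrete, the
driver-side pattern looks roughly like the sketch below.  This is a
minimal sketch, not the actual mlx5 hack: example_napi_poll() and
example_get_next_rx_skb() are made-up placeholders for the driver's
NAPI poll loop and RX descriptor handling, and the rx_list_batching
bool stands in for the unused ethtool feature bit I abuse as a
runtime toggle.

/*
 * Sketch: batch the SKBs from one NAPI poll cycle on a list and hand
 * the whole list to netif_receive_skb_list(), instead of calling
 * napi_gro_receive() once per packet.
 */
#include <linux/list.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical driver helper: pull the next completed RX skb */
static struct sk_buff *example_get_next_rx_skb(struct napi_struct *napi);

static int example_napi_poll(struct napi_struct *napi, int budget)
{
	bool rx_list_batching = true;	/* stand-in for the feature-flag toggle */
	LIST_HEAD(rx_list);
	struct sk_buff *skb;
	int work_done = 0;

	while (work_done < budget) {
		skb = example_get_next_rx_skb(napi);
		if (!skb)
			break;
		work_done++;

		if (rx_list_batching)
			list_add_tail(&skb->list, &rx_list);	/* defer delivery */
		else
			napi_gro_receive(napi, skb);		/* per-packet delivery */
	}

	/* Deliver the whole batch to the stack in a single call */
	if (!list_empty(&rx_list))
		netif_receive_skb_list(&rx_list);

	if (work_done < budget)
		napi_complete_done(napi, work_done);

	return work_done;
}

The point of the batching is that the cost of entering the stack (and,
with RETPOLINE, the indirect-call overhead) is paid once per list
instead of once per packet.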
> > I assume we can get even better perf if we "listify" napi_gro_receive.
>
> yeah, that would be very interesting to get there

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer