From: Jesper Dangaard Brouer
To: Or Gerlitz
Cc: Edward Cree, Saeed Mahameed, "netdev@vger.kernel.org", brouer@redhat.com
Subject: Re: [net-next PATCH] net: ipv4: fix listify ip_rcv_finish in case of forwarding
Date: Fri, 13 Jul 2018 13:08:40 +0200
Message-ID: <20180713130840.1b6b78ea@redhat.com>
References: <153132125549.13161.16380200872856218805.stgit@firesoul> <7c5605ed2fe9505b982fde312d8416bd7fbbe6af.camel@mellanox.com> <20180711220649.266b071a@redhat.com>

On Thu, 12 Jul 2018 23:10:28 +0300, Or Gerlitz wrote:

> On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer wrote:
>
> > Well, I would prefer you to implement those. I just did a quick
> > implementation (it's trivially easy) so I have something to benchmark
> > with. The performance boost is quite impressive!
>
> sounds good, but wait
>
> > One reason I didn't "just" send a patch is that Edward so far only
> > implemented netif_receive_skb_list() and not napi_gro_receive_list().
>
> sfc doesn't support GRO?! doesn't make sense.. Edward?
>
> > And your driver uses napi_gro_receive(). This sort-of disables GRO
> > for your driver, which is not a choice I can make. Interestingly, I
> > get around the same netperf TCP_STREAM performance.
>
> Same TCP performance

I said around the same... I'll redo the benchmarks and verify (did it,
see below).

> with GRO and no rx-batching
>
> or
>
> without GRO and yes rx-batching

Yes, obviously without GRO and with rx-batching.

> is by far not an intuitive result to me, unless both these techniques
> mostly serve to eliminate lots of instruction cache misses and the
> TCP stack is so much optimized that, if the code is in the cache,
> going through it once with a 64K byte GRO-ed packet is like going
> through it ~40 (64K/1500) times with non-GRO-ed packets.

Actually, the GRO code path is rather expensive and uses a lot of
indirect calls.  If you have a UDP workload, disabling GRO will give
you a 10-15% performance boost.  Edward's changes are basically a
generalized version of GRO, up to the IP layer (ip_rcv).  So, for me
it makes perfect sense.

> What's the baseline (with GRO and no rx-batching) number on your setup?

Okay, redoing the benchmarks...

I implemented a code hack so I can control at runtime whether the mlx5
driver uses napi_gro_receive() or netif_receive_skb_list() (abusing a
netdev ethtool-controlled feature flag that is not in use).

To get a quick test going with feedback every 3 sec I use:

 $ netperf -t TCP_STREAM -H 198.18.1.1 -D3 -l 60000 -T 4,4

Default: using napi_gro_receive() with GRO enabled:
 Interim result: 25995.28 10^6bits/s over 3.000 seconds

Disable GRO but still use napi_gro_receive():
 Interim result: 21980.45 10^6bits/s over 3.001 seconds

Make driver use netif_receive_skb_list():
 Interim result: 25490.67 10^6bits/s over 3.002 seconds

As you can see, using netif_receive_skb_list() gives a huge performance
boost over disabled-GRO, and it comes very close to the performance of
enabled-GRO, which is rather impressive! :-)

Notice, even more impressively: these tests are without
CONFIG_RETPOLINE.  We primarily merged netif_receive_skb_list() due to
the overhead of RETPOLINEs, but we see a benefit even when not using
RETPOLINEs.
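To make the netif_receive_skb_list() test case concrete, the
driver-side pattern looks roughly like the sketch below.  This is a
minimal sketch, not the actual mlx5 hack: example_napi_poll() and
example_get_next_rx_skb() are made-up placeholders for the driver's
NAPI poll loop and RX descriptor handling, and the rx_list_batching
bool stands in for the unused ethtool feature bit I abuse as a
runtime toggle.

/*
 * Sketch: batch the SKBs from one NAPI poll cycle on a list and hand
 * the whole list to netif_receive_skb_list(), instead of calling
 * napi_gro_receive() once per packet.
 */
#include <linux/list.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical driver helper: pull the next completed RX skb */
static struct sk_buff *example_get_next_rx_skb(struct napi_struct *napi);

static int example_napi_poll(struct napi_struct *napi, int budget)
{
	bool rx_list_batching = true;	/* stand-in for the feature-flag toggle */
	LIST_HEAD(rx_list);
	struct sk_buff *skb;
	int work_done = 0;

	while (work_done < budget) {
		skb = example_get_next_rx_skb(napi);
		if (!skb)
			break;
		work_done++;

		if (rx_list_batching)
			list_add_tail(&skb->list, &rx_list);	/* defer delivery */
		else
			napi_gro_receive(napi, skb);		/* per-packet delivery */
	}

	/* Deliver the whole batch to the stack in a single call */
	if (!list_empty(&rx_list))
		netif_receive_skb_list(&rx_list);

	if (work_done < budget)
		napi_complete_done(napi, work_done);

	return work_done;
}

The point of the batching is that the cost of entering the stack (and,
with RETPOLINE, the indirect-call overhead) is paid once per list
instead of once per packet.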
> > I assume we can get even better perf if we "listify" napi_gro_receive.
>
> yeah, that would be very interesting to get there

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer