From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexei Starovoitov
Subject: Re: [bpf-next V3 PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
Date: Tue, 15 May 2018 08:13:15 -0700
Message-ID: <20180515151314.tpflos2pxlvfc4dg@ast-mbp>
References: <152638638695.9477.13781600009169577949.stgit@firesoul>
 <152638643041.9477.5642795713014271240.stgit@firesoul>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@vger.kernel.org, Daniel Borkmann, Christoph Hellwig,
 Björn Töpel, Magnus Karlsson, makita.toshiaki@lab.ntt.co.jp
To: Jesper Dangaard Brouer
Return-path:
Received: from mail-pl0-f67.google.com ([209.85.160.67]:40184 "EHLO
 mail-pl0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752021AbeEOPNU (ORCPT ); Tue, 15 May 2018 11:13:20 -0400
Received: by mail-pl0-f67.google.com with SMTP id t12-v6so232703plo.7
 for ; Tue, 15 May 2018 08:13:20 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <152638643041.9477.5642795713014271240.stgit@firesoul>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, May 15, 2018 at 02:13:50PM +0200, Jesper Dangaard Brouer wrote:
> This patch changes the API for ndo_xdp_xmit to support bulking
> xdp_frames.
>
> When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge
> slowdown. Most of the slowdown is caused by DMA API indirect function
> calls, but also by the net_device->ndo_xdp_xmit() call.
>
> Benchmarking this patch with CONFIG_RETPOLINE, using xdp_redirect_map
> with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
> improved performance:
>  for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
>  for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
>
> With frames available as a bulk inside the driver's ndo_xdp_xmit call,
> further optimizations are possible, like bulk DMA-mapping for TX.
>
> Testing without CONFIG_RETPOLINE shows the same performance for
> physical NIC drivers.
>
> The virtual NIC driver tun sees a huge performance boost, as it can
> avoid per-frame producer locking and instead amortize the locking
> cost over the bulk.
>
> V2: Fix compile errors reported by kbuild test robot
>
> Signed-off-by: Jesper Dangaard Brouer
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  26 +++++++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   2 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  21 ++++++--
>  drivers/net/tun.c                             |  37 +++++++++-----
>  drivers/net/virtio_net.c                      |  66 +++++++++++++++++++------
>  include/linux/netdevice.h                     |  14 +++--
>  include/net/page_pool.h                       |   5 +-
>  include/net/xdp.h                             |   1
>  include/trace/events/xdp.h                    |  10 ++--
>  kernel/bpf/devmap.c                           |  33 ++++++++-----
>  net/core/filter.c                             |   4 +-
>  net/core/xdp.c                                |  20 ++++++--
>  samples/bpf/xdp_monitor_kern.c                |  10 ++++
>  samples/bpf/xdp_monitor_user.c                |  35 +++++++++++--
>  14 files changed, 206 insertions(+), 78 deletions(-)

This patch has to be split into at least five patches:
- bpf and net core changes
- Intel driver changes
- tun/virtio changes
- addition of tracepoints
- additions to samples

Putting changes from so many different areas into one patch makes it
harder to review, bisect, ack, and test, and makes merge conflicts
harder to resolve.

Same issue with 3/4 as well. Please split it into two (core and samples).
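The changelog's argument that tun can amortize its producer locking over a bulk can be made concrete with a small user-space sketch. It does not reproduce tun.c, ptr_ring, or the kernel's ndo_xdp_xmit semantics: `struct tx_ring`, `struct xdp_frame_stub`, `xmit_one()` and `xmit_bulk()` are hypothetical names, and a pthread mutex stands in for the producer lock. The per-frame path takes the lock once per frame, while the bulk path takes it once per array of frames and reports how many it queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: NOT the tun.c / ptr_ring implementation. */
#define RING_SIZE 256

struct xdp_frame_stub { int id; };      /* stand-in for struct xdp_frame */

struct tx_ring {
    pthread_mutex_t lock;               /* models the producer lock */
    struct xdp_frame_stub *slots[RING_SIZE];
    size_t head;                        /* index of oldest queued frame */
    size_t count;                       /* number of queued frames */
    unsigned long lock_acquisitions;    /* instrumentation only */
};

static void ring_init(struct tx_ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->head = 0;
    r->count = 0;
    r->lock_acquisitions = 0;
}

/* Caller must hold r->lock. Returns 0 on success, -1 if the ring is full. */
static int ring_produce_locked(struct tx_ring *r, struct xdp_frame_stub *f)
{
    if (r->count == RING_SIZE)
        return -1;
    r->slots[(r->head + r->count) % RING_SIZE] = f;
    r->count++;
    return 0;
}

/* Per-frame style: one call, and one lock round-trip, per frame. */
static int xmit_one(struct tx_ring *r, struct xdp_frame_stub *f)
{
    int err;

    pthread_mutex_lock(&r->lock);
    r->lock_acquisitions++;
    err = ring_produce_locked(r, f);
    pthread_mutex_unlock(&r->lock);
    return err;
}

/* Bulk style: lock once for n frames; returns how many were queued.
 * In this sketch, frames that did not fit are left to the caller. */
static int xmit_bulk(struct tx_ring *r, struct xdp_frame_stub **frames, int n)
{
    int sent = 0;

    pthread_mutex_lock(&r->lock);
    r->lock_acquisitions++;
    for (int i = 0; i < n; i++) {
        if (ring_produce_locked(r, frames[i]) < 0)
            break;                      /* ring full: stop queueing */
        sent++;
    }
    pthread_mutex_unlock(&r->lock);
    return sent;
}
```

For a batch of 16 frames, calling `xmit_one()` in a loop takes the lock 16 times, while a single `xmit_bulk()` call takes it once; the work done under the lock is the same. The same reasoning applies to the indirect ndo_xdp_xmit() call that the changelog identifies as costly under CONFIG_RETPOLINE: one call per bulk instead of one per frame.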