From: Jesper Dangaard Brouer
Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support (AF_XDP or AF_CHANNEL?)
Date: Thu, 16 Nov 2017 09:00:23 +0100
Message-ID: <20171116090023.27860207@redhat.com>
References: <20171031124145.9667-1-bjorn.topel@gmail.com> <20171114181922.6e2c0d9f@redhat.com>
To: Björn Töpel
Cc: "Karlsson, Magnus", "Duyck, Alexander H", Alexander Duyck,
    John Fastabend, Alexei Starovoitov, michael.lundkvist@ericsson.com,
    ravineet.singh@ericsson.com, Daniel Borkmann, Netdev,
    Willem de Bruijn, Tushar Dave, eric.dumazet@gmail.com, Björn Töpel,
    jesse.brandeburg@intel.com, anjali.singhai@intel.com,
    rami.rosen@intel.com, jeffrey.b.shaw@intel.com,
    ferruh.yigit@intel.com, qi.z.zhang@intel.com, davem@davemloft.net,
    brouer@redhat.com
Sender: netdev-owner@vger.kernel.org

On Tue, 14 Nov 2017 20:01:01 +0100
Björn Töpel wrote:

> 2017-11-14 18:19 GMT+01:00 Jesper Dangaard Brouer :
> >
> > On Mon, 13 Nov 2017 22:07:47 +0900
> > Björn Töpel wrote:
> >
> >> I'll summarize the major points, that we'll address in the next RFC
> >> below.
> >>
> >> * Instead of extending AF_PACKET with yet another version, introduce a
> >>   new address/packet family. As for naming had some name suggestions:
> >>   AF_CAPTURE, AF_CHANNEL, AF_XDP and AF_ZEROCOPY. We'll go for
> >>   AF_ZEROCOPY, unless there're no strong opinions against it.
> >
> > I mostly like AF_CHANNEL and AF_XDP. I do know XDP is/have-evolved-into
> > a kernel-side facility, that moves XDP-frames/packets _inside_ the
> > kernel.
> >
> > *BUT* I've always imagined, that we would create a "channel" to
> > userspace. By using XDP_REDIRECT to choose what frames get redirected
> > into which userspace "channel" (new channel-map type). Userspace
> > pre-allocate and register memory/pages exactly like this patchset.
> >
> > [Step-1]: (non-ZC) XDP_REDIRECT need to copy frame-data into userspace
> > memory pages. And update your packet_array etc. (Use map-flush to get
> > RX bulking).
> >
> > [Step 2]: (ZC) Userspace call driver NDO to register pages. The
> > XDP_REDIRECT action happens in driver, and can have knowledge about
> > RX-ring. It can know if this RX-ring is Zero-Copy enabled and can skip
> > the copy-step.
> >
>
> Jesper, I *really* like this approach -- especially the fact that the
> existing XDP path in the drivers can be reused. I'll spend some time
> dissecting the details of your suggestion.

I'm very happy that you like this approach :-)

> >> * No explicit zerocopy enablement. Use the zeropcopy path if
> >>   supported, if not -- fallback to the skb path, for netdevs that
> >>   don't support the required ndos.
> >
> > When driver does not support NDO in above model. I think, that there
> > will still be a significant performance boost for the non-ZC variant.
> > Even-though we need a copy-operation, because there are no memory
> > allocations. As userspace have preallocated and registered pages with
> > the kernel (and mem-limits are implicit via mem-size reg by userspace).
> >
>
> Yup, and we're not paying for the whole skb creation, given that we
> execute from XDP_DRV and not XDP_SKB.

Yes, exactly. Avoiding the SKB allocation for non-ZC mode will be a
significant saving. As your benchmarks showed, the AF_PACKET-V4 approach
for non-ZC mode does not give you/us any real performance improvement.
This approach would.

> >> * Do not introduce a new XDP action XDP_PASS_TO_KERNEL, instead use
> >>   XDP redirect map call with ingress flag.
> >
> > In above model, XDP_REDIRECT is used for filtering into a userspace
> > "channel". If ZC gets enabled on a RX-ring queue, then XDP_PASS have
> > to do a copy (RX-ring knowledge is avail), like you describe with
> > XDP_PASS_TO_KERNEL.
> >
>
> Again, this fits nicely in.
>
> >> * Extend the XDP redirect to support explicit allocator/destructor
> >>   functions. Right now, XDP redirect assumes that the page allocator
> >>   was used, and the XDP redirect cleanup path is decreasing the page
> >>   count of the XDP buffer. This assumption breaks for the zerocopy
> >>   case.
> >
> > Yes, please. If XDP_REDIRECT get call a destructor call-back, then we
> > can allow XDP_REDIRECT out another net_device, even-when ZC is enabled
> > on a RX-ring queue.

I will (of course) be eager to test and benchmark this approach, as I
have high hopes of a performance boost even for non-ZC.

I know an AF_XDP approach is a lot of work, but I would like to offer to
help out in any way I can.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
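
For readers following the thread, the [Step-1]/[Step 2] split discussed
above can be illustrated with a small userspace model. This is only a
sketch under stated assumptions, not kernel code and not the patchset's
API: struct channel, struct rx_frame, channel_redirect(), FRAME_SIZE and
NUM_FRAMES are all hypothetical names made up for illustration. It shows
the one decision point the mail describes: a frame redirected into a
userspace "channel" is either copied into memory that userspace
registered up front (non-ZC, no allocation), or handed over as-is when
the RX ring is already backed by that memory (ZC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 2048 /* hypothetical fixed frame size */
#define NUM_FRAMES 4    /* hypothetical number of registered frames */

/* Memory that userspace preallocated and registered (hypothetical layout). */
struct channel {
    uint8_t frames[NUM_FRAMES][FRAME_SIZE];
    size_t  len[NUM_FRAMES]; /* valid bytes per slot */
    size_t  next;            /* next free slot for the copy path */
    size_t  copies;          /* number of frames that needed a copy */
};

/* A received frame as the driver-side hook would see it (hypothetical). */
struct rx_frame {
    const uint8_t *data;
    size_t len;
    bool   zc_ring; /* RX ring is backed by the channel's own frames */
    size_t zc_slot; /* slot that already holds the data when zc_ring */
};

/*
 * Model of redirecting one frame into a userspace channel.
 * Returns the slot handed to userspace, or -1 if the frame is dropped.
 * A real ring would be either ZC or non-ZC, so the two paths below are
 * not expected to share slots.
 */
int channel_redirect(struct channel *ch, const struct rx_frame *f)
{
    if (f->len > FRAME_SIZE)
        return -1; /* does not fit in a registered frame */

    if (f->zc_ring) {
        /* Step 2 (ZC): data already sits in registered memory, no copy. */
        assert(f->zc_slot < NUM_FRAMES);
        ch->len[f->zc_slot] = f->len;
        return (int)f->zc_slot;
    }

    /* Step 1 (non-ZC): copy into preallocated memory, no allocation. */
    if (ch->next == NUM_FRAMES)
        return -1; /* channel full: the registered size is the limit */

    size_t slot = ch->next++;
    memcpy(ch->frames[slot], f->data, f->len);
    ch->len[slot] = f->len;
    ch->copies++;
    return (int)slot;
}
```

In this model the non-ZC path costs one memcpy per frame but never
allocates, which is the saving argued for above; the size of the
registered area acts as the implicit memory limit.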