Date: Tue, 16 Apr 2019 16:48:34 +0200
From: Jesper Dangaard Brouer
To: "Jonathan Lemon"
Cc: Björn Töpel, ilias.apalodimas@linaro.org, toke@redhat.com,
    magnus.karlsson@intel.com, maciej.fijalkowski@intel.com,
    "Jason Wang", "Alexei Starovoitov", "Daniel Borkmann",
    "Jakub Kicinski", "John Fastabend", "David Miller",
    "Andy Gospodarek", netdev@vger.kernel.org, bpf@vger.kernel.org,
    "Thomas Graf", "Thomas Monjalon", brouer@redhat.com
Subject: Re: Per-queue XDP programs, thoughts
Message-ID: <20190416164834.2ce7e8ba@carbon>

On Mon, 15 Apr 2019 10:58:07 -0700, "Jonathan Lemon" wrote:

> On 15 Apr 2019, at 9:32, Jesper Dangaard Brouer wrote:
>
> > On Mon, 15 Apr 2019 13:59:03 +0200 Björn Töpel wrote:
> >
> >> Hi,
> >>
> >> As you probably can derive from the amount of time this is taking,
> >> I'm not really satisfied with the design of per-queue XDP programs.
> >> (That, plus I'm a terribly slow hacker... ;-)) I'll try to expand
> >> my thinking in this mail!
> >>
> >> Beware, it's kind of a long post, and it's all over the place.
> >
> > Cc'ing all the XDP-maintainers (and netdev).
> >
> >> There are a number of ways of setting up flows in the kernel, e.g.
> >>
> >> * Connecting/accepting a TCP socket (in-band)
> >> * Using tc-flower (out-of-band)
> >> * ethtool (out-of-band)
> >> * ...
> >>
> >> The first acts on sockets, the second on netdevs. Then there's
> >> ethtool to configure RSS, and the RSS-on-steroids rxhash/ntuple
> >> that can steer to queues. Most users care about sockets and
> >> netdevices. Queues are more of an implementation detail of Rx,
> >> or for QoS on the Tx side.
> >
> > Let me first acknowledge that the current Linux tools to administer
> > HW filters are lacking (well, they suck). We know the hardware is
> > capable, as DPDK has a full API for this called rte_flow [1]. If
> > nothing else, you/we can use the DPDK API to create a program that
> > configures the hardware; examples here [2].
> >
> > [1] https://doc.dpdk.org/guides/prog_guide/rte_flow.html
> > [2] https://doc.dpdk.org/guides/howto/rte_flow.html
> >
> >> XDP is something that we can attach to a netdevice. Again, very
> >> natural from a user perspective. As for XDP sockets, the current
> >> mechanism is that we attach to an existing netdevice queue. Ideally
> >> what we'd like is to *remove* the queue concept. A better approach
> >> would be creating the socket and setting it up -- but not binding
> >> it to a queue. Instead, just bind it to a netdevice (or, crazier,
> >> just create a socket without a netdevice).
> >
> > Let me just remind everybody that the AF_XDP performance gains come
> > from binding the resource, which allows for lock-free semantics, as
> > explained here [3].
> >
> > [3] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP#where-does-af_xdp-performance-come-from
> >
> >> The socket is an endpoint, where I'd like data to end up (or get
> >> sent from). If the kernel can attach the socket to a hardware
> >> queue, there's zero-copy; if not, copy-mode. Ditto for Tx.
> >
> > Well, XDP programs per RXQ are just a building block to achieve this.
> >
> > As Van Jacobson explains [4], sockets or applications "register" a
> > "transport signature", and get back a "channel". In our case, the
> > netdev-global XDP program is our way to register/program these
> > transport signatures and redirect (e.g. into the AF_XDP socket).
> > This requires some work in software to parse and match transport
> > signatures to sockets. The XDP programs per RXQ are a way to get
> > the hardware to perform this filtering for us.
> >
> > [4] http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
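To make the software side of this concrete: below is a rough, untested
sketch of what such a netdev-global "transport signature" program could
look like. It parses the UDP/IPv4 5-tuple, looks it up in a hash map
that a control-plane program would populate, and redirects matches into
the AF_XDP socket registered for the receiving queue via an XSKMAP.
The map names, key layout and sizes are made up for illustration, and
it skips IP options and IPv6 for brevity.

/* SPDX-License-Identifier: GPL-2.0 -- untested illustration only */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct flow_key {			/* made-up key layout */
	__u32 saddr, daddr;
	__u16 sport, dport;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, struct flow_key);
	__type(value, __u32);		/* presence == "this flow is ours" */
} flow_sigs SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);	/* one slot per RX queue */
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_transport_sig(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth  = data;
	struct iphdr  *iph  = (void *)(eth + 1);
	struct udphdr *udph = (void *)(iph + 1);	/* ignores IP options */
	struct flow_key key = {};

	/* One bounds check covering the deepest (constant) offset we read */
	if ((void *)(udph + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP) || iph->protocol != IPPROTO_UDP)
		return XDP_PASS;

	key.saddr = iph->saddr;
	key.daddr = iph->daddr;
	key.sport = udph->source;
	key.dport = udph->dest;

	if (bpf_map_lookup_elem(&flow_sigs, &key))
		/* Matched a registered transport signature: hand the frame
		 * to the xsk bound to this RX queue (if any). */
		return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

In the VJ model the control plane would insert the 5-tuple into
flow_sigs (and the socket fd into xsks_map) around "accept()" time; a
per-queue program sitting behind a HW filter could simply skip the
software hash lookup.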
> >> Does a user (control plane) want/need to care about queues? Just
> >> create a flow to a socket (out-of-band or in-band) or to a
> >> netdevice (out-of-band).
>
> > A userspace "control-plane" program could hide the setup and use
> > whatever optimizations the system/hardware can provide. VJ [4] e.g.
> > suggests that the "listen" socket first registers the transport
> > signature (with the driver) on "accept()". If the HW supports the
> > DPDK rte_flow API, we can register a 5-tuple (or create TC-HW rules)
> > and load our "transport-signature" XDP prog on the queue number we
> > choose. If not, then our netdev-global XDP prog needs a hash-table
> > with the 5-tuple and has to do the 5-tuple parsing itself.
> >
> > Creating netdevices via HW filters into queues is an interesting
> > idea. DPDK has an example here [5] of how to send packets per flow
> > (via ethtool filter setup, even!) to queues that end up in SR-IOV
> > devices.
> >
> > [5] https://doc.dpdk.org/guides/howto/flow_bifurcation.html
>
> >> Do we envision any other uses for per-queue XDP other than AF_XDP?
> >> If not, it would make *more* sense to attach the XDP program to the
> >> socket (e.g. if the endpoint would like to use kernel data
> >> structures via XDP).
>
> > As demonstrated in [5] you can use (ethtool) hardware filters to
> > redirect packets into VFs (Virtual Functions).
> >
> > I also want us to extend XDP to allow for redirect from a PF
> > (Physical Function) into a VF (Virtual Function). First the
> > netdev-global XDP-prog needs to support this (maybe extend
> > xdp_rxq_info with PF + VF info). Next, configure a HW filter to a
> > queue# and load an XDP prog on that queue# that only "redirects" to
> > a single VF. Now, if the driver+HW supports it, it can "eliminate"
> > the per-queue XDP-prog and do everything in HW.
>
> One thing I'd like to see is have RSS distribute incoming traffic
> across a set of queues. The application would open a set of xsk's
> which are bound to those queues.

Yes. (Some) NIC hardware does support having RSS distribute incoming
traffic across a (sub)set of queues. As you can see in [5], they have
an example of this:

  testpmd> flow isolate 0 true
  testpmd> flow create 0 ingress pattern eth / ipv4 / udp / vxlan vni is 42 / end \
             actions rss queues 0 1 2 3 end / end
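The application side you describe could then look roughly like the
sketch below: one AF_XDP socket per queue in the RSS set. This is
untested and leaves out all of the UMEM registration and ring setup
that a real program needs before bind() will succeed; it only shows
where the queue binding (and thus the lock-free single-producer/
single-consumer pairing) happens. The ifname "eth0" and the queue
count are made up to match the testpmd example above.

/* Untested sketch: one AF_XDP socket per RX queue in the RSS set. */
#include <stdio.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#define NUM_QUEUES 4			/* matches "rss queues 0 1 2 3" above */

int main(void)
{
	unsigned int ifindex = if_nametoindex("eth0");	/* example ifname */
	int fds[NUM_QUEUES];

	for (int q = 0; q < NUM_QUEUES; q++) {
		struct sockaddr_xdp sxdp = {
			.sxdp_family   = AF_XDP,
			.sxdp_ifindex  = ifindex,
			.sxdp_queue_id = q,	   /* one socket per queue */
			.sxdp_flags    = XDP_COPY, /* or XDP_ZEROCOPY if supported */
		};

		fds[q] = socket(AF_XDP, SOCK_RAW, 0);
		if (fds[q] < 0) {
			perror("socket(AF_XDP)");
			return 1;
		}

		/* ... XDP_UMEM_REG + ring setsockopt()s + mmap() go here ... */

		if (bind(fds[q], (struct sockaddr *)&sxdp, sizeof(sxdp)) < 0) {
			perror("bind");
			return 1;
		}
		/* Insert fds[q] into the XSKMAP at index q, so redirects
		 * from RX queue q land in this socket. */
	}
	return 0;
}

Each socket then only ever sees frames from its own queue, which is
exactly where the AF_XDP performance comes from [3].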
> I'm not seeing how a transport signature would achieve this. The
> current tooling seems to treat the queue as the basic building block,
> which seems generally appropriate.

After creating the N queues that your RSS-hash distributes over, I
imagine that you load your per-queue XDP program on each of these N
queues. I don't necessarily see a need for the kernel API to expose to
userspace an API/facility for loading an XDP-prog on N queues in one go
(you can just iterate over them).

> Whittling things down (receiving packets only for a specific flow)
> could be achieved by creating a queue which only contains those
> packets which matched via some form of classification (or perhaps
> were steered to a VF device), aka [5] above. Exposing multiple queues
> allows load distribution for those apps which care about it.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer