netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Ciara Loftus <ciara.loftus@intel.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: brouer@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	davem@davemloft.net, kuba@kernel.org, hawk@kernel.org,
	john.fastabend@gmail.com, toke@redhat.com, bjorn@kernel.org,
	magnus.karlsson@intel.com, jonathan.lemon@gmail.com,
	maciej.fijalkowski@intel.com
Subject: Re: [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx
Date: Tue, 16 Nov 2021 10:43:25 +0100	[thread overview]
Message-ID: <5a121fc4-fb6c-c70b-d674-9bf13c325b64@redhat.com> (raw)
In-Reply-To: <20211116073742.7941-1-ciara.loftus@intel.com>



On 16/11/2021 08.37, Ciara Loftus wrote:
> The common case for AF_XDP sockets (xsks) is creating a single xsk on a queue for sending and
> receiving frames as this is analogous to HW packet steering through RSS and other classification
> methods in the NIC. AF_XDP uses the xdp redirect infrastructure to direct packets to the socket. It
> was designed for the much more complicated case of DEVMAP xdp_redirects which directs traffic to
> another netdev and thus potentially another driver. In the xsk redirect case, by skipping the
> unnecessary parts of this common code we can significantly improve performance and pave the way
> for batching in the driver. This RFC proposes one such way to simplify the infrastructure which
> yields a 27% increase in throughput and a decrease in cycles per packet of 24 cycles [1]. The goal
> of this RFC is to start a discussion on how best to simplify the single-socket datapath while
> providing one method as an example.
> 
> Current approach:
> 1. XSK pointer: an xsk is created and a handle to the xsk is stored in the XSKMAP.
> 2. XDP program: bpf_redirect_map helper triggers the XSKMAP lookup which stores the result (handle
> to the xsk) and the map type (XSKMAP) in the percpu bpf_redirect_info struct. The XDP_REDIRECT
> action is returned.
> 3. XDP_REDIRECT handling called by the driver: the map type (XSKMAP) is read from the
> bpf_redirect_info which selects the xsk_map_redirect path. The xsk pointer is retrieved from the
> bpf_redirect_info and the XDP descriptor is pushed to the xsk's Rx ring. The socket is added to a
> list for flushing later.
> 4. xdp_do_flush: iterate through the lists of all maps that can be used for redirect (CPUMAP,
> DEVMAP and XSKMAP). When XSKMAP is flushed, go through all xsks that had any traffic redirected to
> them and bump the Rx ring head pointer(s).
> 
> For the end goal of submitting the descriptor to the Rx ring and bumping the head pointer of that
> ring, only some of these steps are needed. The rest is overhead. The bpf_redirect_map
> infrastructure is needed for all other redirect operations, but is not necessary when redirecting
> to a single AF_XDP socket. And similarly, flushing the list for every map type in step 4 is not
> necessary when only one socket needs to be flushed.
> 
> Proposed approach:
> 1. XSK pointer: an xsk is created and a handle to the xsk is stored both in the XSKMAP and also the
> netdev_rx_queue struct.
> 2. XDP program: new bpf_redirect_xsk helper returns XDP_REDIRECT_XSK.
> 3. XDP_REDIRECT_XSK handling called by the driver: the xsk pointer is retrieved from the
> netdev_rx_queue struct and the XDP descriptor is pushed to the xsk's Rx ring.
> 4. xsk_flush: fetch the handle from the netdev_rx_queue and flush the xsk.
> 
> This fast path is triggered on XDP_REDIRECT_XSK if:
>    (i) AF_XDP socket SW Rx ring configured
>   (ii) Exactly one xsk attached to the queue
> If any of these conditions are not met, fall back to the same behavior as the original approach:
> xdp_redirect_map. This is handled under-the-hood in the new bpf_xdp_redirect_xsk helper so the user
> does not need to be aware of these conditions.
> 
> Batching:
> With this new approach it is possible to optimize the driver by submitting a batch of descriptors
> to the Rx ring in Step 3 of the new approach by simply verifying that the action returned from
> every program run of each packet in a batch equals XDP_REDIRECT_XSK. That's because with this
> action we know the socket to redirect to will be the same for each packet in the batch. This is
> not possible with XDP_REDIRECT because the xsk pointer is stored in the bpf_redirect_info and not
> guaranteed to be the same for every packet in a batch.
> 
> [1] Performance:
> The benchmarks were performed on VM running a 2.4GHz Ice Lake host with an i40e device passed
> through. The xdpsock app was run on a single core with busy polling and configured in 'rxonly' mode.
> ./xdpsock -i <iface> -r -B
> The improvement in throughput when using the new bpf helper and XDP action was measured at ~13% for
> scalar processing, with reduction in cycles per packet of ~13. A further ~14% improvement in
> throughput and reduction of ~11 cycles per packet was measured when the batched i40e driver path
> was used, for a total improvement of ~27% in throughput and reduction of ~24 cycles per packet.
> 
> Other approaches considered:
> Two other approaches were considered. The advantage of both being that neither involved introducing
> a new XDP action. The first alternative approach considered was to create a new map type
> BPF_MAP_TYPE_XSKMAP_DIRECT. When the XDP_REDIRECT action was returned, this map type could be
> checked and used as an indicator to skip the map lookup and use the netdev_rx_queue xsk instead.
> The second approach considered was similar and involved using a new bpf_redirect_info flag which
> could be used in a similar fashion.
> While both approaches yielded a performance improvement they were measured at about half of what
> was measured for the approach outlined in this RFC. It seems using bpf_redirect_info is too
> expensive.

I think it was Bjørn that discovered that accessing the per CPU 
bpf_redirect_info struct have an overhead of approx 2 ns (times 2.4GHz 
~4.8 cycles).  Your reduction in cycles per packet was ~13, where ~4.8 
seem to be large.

The code access this_cpu_ptr(&bpf_redirect_info) two times.
One time in the BPF-helper redirect call and second in xdp_do_redirect.
(Hint xdp_redirect_map end-up calling __bpf_xdp_redirect_map)

Thus, it seems (as you say), the bpf_redirect_info approach is too 
expensive.  May be should look at storing bpf_redirect_info in a place 
that doesn't requires the this_cpu_ptr() lookup... or cache the lookup 
per NAPI cycle.
Have you tried this?


> Also, centralised processing of XDP actions was investigated. This would involve porting all drivers
> to a common interface for handling XDP actions which would greatly simplify the work involved in
> adding support for new XDP actions such as XDP_REDIRECT_XSK. However it was deemed at this point to
> be more complex than adding support for the new action to every driver. Should this series be
> considered worth pursuing for a proper patch set, the intention would be to update each driver
> individually.

I'm fine with adding a new helper, but I don't like introducing a new 
XDP_REDIRECT_XSK action, which requires updating ALL the drivers.

With XDP_REDIRECT infra we beleived we didn't need to add more 
XDP-action code to drivers, as we multiplex/add new features by 
extending the bpf_redirect_info.
In this extreme performance case, it seems the this_cpu_ptr "lookup" of 
bpf_redirect_info is the performance issue itself.

Could you experiement with different approaches that modify 
xdp_do_redirect() to handle if new helper bpf_redirect_xsk was called, 
prior to this_cpu_ptr() call.
(Thus, avoiding to introduce a new XDP-action).

> Thank you to Magnus Karlsson and Maciej Fijalkowski for several suggestions and insight provided.
> 
> TODO:
> * Add selftest(s)
> * Add support for all copy and zero copy drivers
> * Libxdp support
> 
> The series applies on commit e5043894b21f ("bpftool: Use libbpf_get_error() to check error")
> 
> Thanks,
> Ciara
> 
> Ciara Loftus (8):
>    xsk: add struct xdp_sock to netdev_rx_queue
>    bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action
>    xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush
>    i40e: handle the XDP_REDIRECT_XSK action
>    xsk: implement a batched version of xsk_rcv
>    i40e: isolate descriptor processing in separate function
>    i40e: introduce batched XDP rx descriptor processing
>    libbpf: use bpf_redirect_xsk in the default program
> 
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  13 +-
>   .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 285 +++++++++++++++---
>   include/linux/netdevice.h                     |   2 +
>   include/net/xdp_sock_drv.h                    |  49 +++
>   include/net/xsk_buff_pool.h                   |  22 ++
>   include/uapi/linux/bpf.h                      |  13 +
>   kernel/bpf/verifier.c                         |   7 +-
>   net/core/dev.c                                |  14 +
>   net/core/filter.c                             |  26 ++
>   net/xdp/xsk.c                                 |  69 ++++-
>   net/xdp/xsk_queue.h                           |  31 ++
>   tools/include/uapi/linux/bpf.h                |  13 +
>   tools/lib/bpf/xsk.c                           |  50 ++-
>   14 files changed, 551 insertions(+), 44 deletions(-)
> 


  parent reply	other threads:[~2021-11-16  9:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16  7:37 [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 1/8] xsk: add struct xdp_sock to netdev_rx_queue Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 2/8] bpf: add bpf_redirect_xsk helper and XDP_REDIRECT_XSK action Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 3/8] xsk: handle XDP_REDIRECT_XSK and expose xsk_rcv/flush Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 4/8] i40e: handle the XDP_REDIRECT_XSK action Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 5/8] xsk: implement a batched version of xsk_rcv Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 6/8] i40e: isolate descriptor processing in separate function Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 7/8] i40e: introduce batched XDP rx descriptor processing Ciara Loftus
2021-11-16  7:37 ` [RFC PATCH bpf-next 8/8] libbpf: use bpf_redirect_xsk in the default program Ciara Loftus
2021-11-16  9:43 ` Jesper Dangaard Brouer [this message]
2021-11-16 16:29   ` [RFC PATCH bpf-next 0/8] XDP_REDIRECT_XSK and Batched AF_XDP Rx Loftus, Ciara
2021-11-17 14:24     ` Toke Høiland-Jørgensen
2021-11-19 15:48       ` Loftus, Ciara
2021-11-22 14:38         ` Toke Høiland-Jørgensen
2021-11-18  2:54 ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a121fc4-fb6c-c70b-d674-9bf13c325b64@redhat.com \
    --to=jbrouer@redhat.com \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=ciara.loftus@intel.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kuba@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).