public inbox for bpf@vger.kernel.org
From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: "bjorn@kernel.org" <bjorn@kernel.org>,
	"maciej.fijalkowski@intel.com" <maciej.fijalkowski@intel.com>,
	"magnus.karlsson@intel.com" <magnus.karlsson@intel.com>
Cc: "daniel@iogearbox.net" <daniel@iogearbox.net>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ
Date: Tue, 23 Aug 2022 09:49:43 +0000	[thread overview]
Message-ID: <93b8740b39267bc550a8f6e0077fb4772535c65e.camel@nvidia.com> (raw)
In-Reply-To: <f1eea2e9ca337e0c4e072bdd94a89859a4539c09.camel@nvidia.com>

Anyone from Intel? Maciej, Björn, Magnus?

Does anyone else have anything to say? IMO, calling XDP for the same
packet multiple times is a bug; we should agree on some sane solution.
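To make the bug concrete, here is a minimal user-space simulation (all names are hypothetical, this is not kernel code) of why re-running the XDP program for a retried frame corrupts program state: a program that counts packets sees the same frame twice.

```c
/* Minimal user-space simulation (hypothetical names) of the repeated-run
 * problem: the "XDP program" updates its state before the redirect, so a
 * retried frame is counted twice. */

enum verdict { XDP_REDIRECT_OK, XDP_REDIRECT_FULL };

static int pkt_count;   /* state the simulated XDP program tracks */
static int ring_free;   /* free slots in the simulated XSK Rx ring */

/* Stand-in for one run of the XDP program: it counts every packet it sees,
 * then tries to place it on the ring. */
static enum verdict run_xdp_once(void)
{
	pkt_count++;                    /* state update precedes the redirect */
	if (ring_free == 0)
		return XDP_REDIRECT_FULL;   /* the -ENOBUFS case */
	ring_free--;
	return XDP_REDIRECT_OK;
}

/* Driver keeps the frame on ring full and re-runs the program later:
 * one wire packet, two state updates. Returns the final count. */
static int deliver_with_retry(void)
{
	if (run_xdp_once() == XDP_REDIRECT_FULL) {
		ring_free = 1;      /* user space drained the ring meanwhile */
		run_xdp_once();     /* second run for the same frame */
	}
	return pkt_count;
}
```

With the ring initially full, one delivered packet ends up counted twice, which is exactly the confusion described above.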

On Thu, 2022-08-18 at 14:26 +0300, Maxim Mikityanskiy wrote:
> Hi Maciej,
> 
> On Wed, 2022-04-13 at 17:30 +0200, Maciej Fijalkowski wrote:
> > v2:
> > - add likely for internal redirect return codes in ice and ixgbe
> >   (Jesper)
> > - do not drop the buffer that head pointed to at full XSK RQ (Maxim)
> 
> I found an issue with this approach. If you don't drop that packet,
> you'll need to run the XDP program for the same packet again. If the
> XDP program is anything more complex than "redirect-everything-to-XSK",
> it will get confused if, for example, it tracks any state or counts
> anything.
> 
> We can't check whether there is enough space in the XSK Rx ring before
> running the XDP program, as we don't know in advance which XSK socket
> will be selected.
> 
> We can't store bpf_redirect_info across NAPI calls to avoid running
> the XDP program a second time, as bpf_redirect_info is protected by
> RCU and by the assumption that the whole XDP_REDIRECT operation
> happens within one NAPI cycle.
> 
> I see the following options:
> 
> 1. Drop the packet when an overflow happens. The problem is that it
> will happen systematically, but it's still better than the old
> behavior (drop multiple packets when an overflow happens and hog the
> CPU).
> 
> 2. Make this feature opt-in. If the application opts in, it
> guarantees that its XDP program handles duplicate packets properly.
> 
> 3. Require the XDP program to indicate that it supports being called
> multiple times for the same packet, and pass a flag to it on a
> repeated run. Drop the packet in the driver if the XDP program
> doesn't indicate this support. The indication can be done similarly
> to how it's done for XDP multi-buffer.
> 
> Thoughts? Other options?
> 
> Thanks,
> Max
> 
> > - terminate Rx loop only when need_wakeup feature is enabled (Maxim)
> > - reword from 'stop softirq processing' to 'stop NAPI Rx processing'
> > - s/ENXIO/EINVAL in mlx5 and stmmac's ndo_xsk_wakeup to keep it
> >   consistent with Intel's drivers (Maxim)
> > - include Jesper's Acks
> > 
> > Hi!
> > 
> > This is a revival of Björn's idea [0] to break the NAPI loop when
> > the XSK Rx queue gets full, which in turn makes it impossible to
> > keep successfully producing descriptors to the XSK Rx ring. By
> > breaking out of the driver side immediately, we give user space the
> > opportunity to consume descriptors from the XSK Rx ring and
> > therefore provide room in the ring so that HW Rx -> XSK Rx
> > redirection can be done.
> > 
> > Maxim asked and Jesper agreed on simplifying Björn's original API
> > used for detecting the event of interest, so let's simply check for
> > -ENOBUFS within Intel's ZC drivers after an attempt to redirect a
> > buffer to XSK Rx. No real need for a redirect API extension.
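The core change can be sketched as follows. This is a self-contained user-space simulation (helper names are stand-ins, not the real driver functions): the Rx loop stops as soon as the redirect reports -ENOBUFS instead of burning the rest of the NAPI budget on frames that cannot be delivered.

```c
/* User-space sketch of the series' core idea: break the Rx loop on
 * -ENOBUFS (XSK Rx ring full) so user space gets CPU time to consume
 * descriptors. sim_xdp_do_redirect() stands in for xdp_do_redirect(). */
#include <errno.h>

static int xsk_ring_free;   /* free slots in the simulated XSK Rx ring */

static int sim_xdp_do_redirect(void)
{
	if (xsk_ring_free == 0)
		return -ENOBUFS;    /* ring full, nothing can be produced */
	xsk_ring_free--;
	return 0;
}

/* Returns how many frames were processed; breaking early leaves the
 * remaining budget (and the CPU) to the user-space consumer. */
static int napi_rx_loop(int budget)
{
	int done = 0;

	while (done < budget) {
		if (sim_xdp_do_redirect() == -ENOBUFS)
			break;      /* give user space room to consume */
		done++;
	}
	return done;
}
```

Without the break, the loop would keep running the XDP program and failing the redirect for the rest of the budget, which is the CPU-hogging behavior the cover letter describes.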
> > 
> > One might ask why it is still relevant even after having proper busy
> > poll support in place - here is the justification.
> > 
> > For xdpsock that was:
> > - run for l2fwd scenario,
> > - app/driver processing took place on the same core in busy poll
> >   with 2048 budget,
> > - HW ring sizes Tx 256, Rx 2048,
> > 
> > this work improved throughput by 78% and reduced the Rx queue full
> > statistic bump by 99%.
> > 
> > For testing ice, make sure that you have [1] present on your side.
> > 
> > This set, besides the work described above, also carries
> > improvements around return codes in various XSK paths and, lastly, a
> > minor optimization for xskq_cons_has_entries(), a helper that might
> > be used when XSK Rx batching makes it to the kernel.
> > 
> > Link to v1 and discussion around it is at [2].
> > 
> > Thanks!
> > MF
> > 
> > [0]:
> > https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
> > [1]:
> > https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/
> > [2]:
> > https://lore.kernel.org/bpf/5864171b-1e08-1b51-026e-5f09b8945076@nvidia.com/T/
> > 
> > Björn Töpel (1):
> >   xsk: improve xdp_do_redirect() error codes
> > 
> > Maciej Fijalkowski (13):
> >   xsk: diversify return codes in xsk_rcv_check()
> >   ice: xsk: decorate ICE_XDP_REDIR with likely()
> >   ixgbe: xsk: decorate IXGBE_XDP_REDIR with likely()
> >   ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> >   i40e: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> >   ixgbe: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> >   ice: xsk: diversify return values from xsk_wakeup call paths
> >   i40e: xsk: diversify return values from xsk_wakeup call paths
> >   ixgbe: xsk: diversify return values from xsk_wakeup call paths
> >   mlx5: xsk: diversify return values from xsk_wakeup call paths
> >   stmmac: xsk: diversify return values from xsk_wakeup call paths
> >   ice: xsk: avoid refilling single Rx descriptors
> >   xsk: drop ternary operator from xskq_cons_has_entries
> > 
> >  .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
> >  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 38 ++++++++-----
> >  drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
> >  drivers/net/ethernet/intel/ice/ice_xsk.c      | 53 ++++++++++++-------
> >  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 52 ++++++++++--------
> >  .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  2 +-
> >  .../net/ethernet/stmicro/stmmac/stmmac_main.c |  4 +-
> >  net/xdp/xsk.c                                 |  4 +-
> >  net/xdp/xsk_queue.h                           |  4 +-
> >  10 files changed, 99 insertions(+), 61 deletions(-)
> > 
> 



Thread overview: 22+ messages
2022-04-13 15:30 [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 01/14] xsk: improve xdp_do_redirect() error codes Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 02/14] xsk: diversify return codes in xsk_rcv_check() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 03/14] ice: xsk: decorate ICE_XDP_REDIR with likely() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 04/14] ixgbe: xsk: decorate IXGBE_XDP_REDIR " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 05/14] ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 06/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 07/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 08/14] ice: xsk: diversify return values from xsk_wakeup call paths Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 09/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 10/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 11/14] mlx5: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 12/14] stmmac: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 13/14] ice: xsk: avoid refilling single Rx descriptors Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 14/14] xsk: drop ternary operator from xskq_cons_has_entries Maciej Fijalkowski
2022-04-15 19:20 ` [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ patchwork-bot+netdevbpf
2022-08-19  8:35 ` Maxim Mikityanskiy
     [not found] ` <f1eea2e9ca337e0c4e072bdd94a89859a4539c09.camel@nvidia.com>
2022-08-23  9:49   ` Maxim Mikityanskiy [this message]
2022-08-23 11:24     ` Maciej Fijalkowski
2022-08-23 13:46       ` Maxim Mikityanskiy
2022-08-24  5:18         ` John Fastabend
2022-08-25 14:42           ` Maxim Mikityanskiy
