From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: "bjorn@kernel.org" <bjorn@kernel.org>,
"maciej.fijalkowski@intel.com" <maciej.fijalkowski@intel.com>,
"magnus.karlsson@intel.com" <magnus.karlsson@intel.com>
Cc: "daniel@iogearbox.net" <daniel@iogearbox.net>,
Saeed Mahameed <saeedm@nvidia.com>,
"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
"ast@kernel.org" <ast@kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"kuba@kernel.org" <kuba@kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ
Date: Tue, 23 Aug 2022 09:49:43 +0000 [thread overview]
Message-ID: <93b8740b39267bc550a8f6e0077fb4772535c65e.camel@nvidia.com> (raw)
In-Reply-To: <f1eea2e9ca337e0c4e072bdd94a89859a4539c09.camel@nvidia.com>
Anyone from Intel? Maciej, Björn, Magnus?
Does anyone else have anything to say? IMO, calling XDP for the same
packet multiple times is a bug; we should agree on some sane solution.
On Thu, 2022-08-18 at 14:26 +0300, Maxim Mikityanskiy wrote:
> Hi Maciej,
>
> On Wed, 2022-04-13 at 17:30 +0200, Maciej Fijalkowski wrote:
> > v2:
> > - add likely for internal redirect return codes in ice and ixgbe
> > (Jesper)
> > - do not drop the buffer that head pointed to at full XSK RQ
> > (Maxim)
>
> I found an issue with this approach. If you don't drop that packet,
> you'll need to run the XDP program for the same packet again. If the
> XDP program is anything more complex than "redirect-everything-to-XSK",
> it will get confused, for example, if it tracks any state or counts
> anything.
>
> We can't check whether there is enough space in the XSK RX ring before
> running the XDP program, as we don't know in advance which XSK socket
> will be selected.
>
> We can't store bpf_redirect_info across NAPI calls to avoid running the
> XDP program a second time, as bpf_redirect_info is protected by RCU and
> by the assumption that the whole XDP_REDIRECT operation happens within
> one NAPI cycle.
>
> I see the following options:
>
> 1. Drop the packet when an overflow happens. The problem is that it
> will happen systematically, but it's still better than the old behavior
> (drop multiple packets when an overflow happens and hog the CPU).
>
> 2. Make this feature opt-in. If the application opts in, it guarantees
> to take measures to handle duplicate packets in XDP properly.
>
> 3. Require the XDP program to indicate it supports being called
> multiple times for the same packet and pass a flag to it if it's a
> repeated run. Drop the packet in the driver if the XDP program doesn't
> indicate this support. The indication can be done similarly to how it's
> implemented for XDP multi buffer.
>
> Thoughts? Other options?
>
> Thanks,
> Max
>
> > - terminate Rx loop only when need_wakeup feature is enabled
> >   (Maxim)
> > - reword from 'stop softirq processing' to 'stop NAPI Rx
> >   processing'
> > - s/ENXIO/EINVAL in mlx5 and stmmac's ndo_xsk_wakeup to keep it
> >   consistent with Intel's drivers (Maxim)
> > - include Jesper's Acks
> >
> > Hi!
> >
> > This is a revival of Bjorn's idea [0] to break the NAPI loop when the
> > XSK Rx queue gets full, which in turn makes it impossible to keep on
> > successfully producing descriptors to the XSK Rx ring. By breaking
> > out of the driver side immediately, we give user space the
> > opportunity to consume descriptors from the XSK Rx ring and therefore
> > provide room in the ring so that HW Rx -> XSK Rx redirection can be
> > done.
> >
> > Maxim asked and Jesper agreed on simplifying Bjorn's original API
> > used for detecting the event of interest, so let's just simply check
> > for -ENOBUFS within Intel's ZC drivers after an attempt to redirect a
> > buffer to XSK Rx. No real need for redirect API extension.
> >
> > One might ask why it is still relevant even after having proper busy
> > poll support in place - here is the justification.
> >
> > For xdpsock that was:
> > - run for the l2fwd scenario,
> > - app/driver processing took place on the same core in busy poll
> >   with a 2048 budget,
> > - HW ring sizes: Tx 256, Rx 2048,
> >
> > this work improved throughput by 78% and reduced the Rx queue full
> > statistic bump by 99%.
> >
> > For testing ice, make sure that you have [1] present on your side.
> >
> > This set, besides the work described above, also carries improvements
> > around return codes in various XSK paths and, lastly, a minor
> > optimization for xskq_cons_has_entries(), a helper that might be used
> > when XSK Rx batching makes it into the kernel.
> >
> > Link to v1 and discussion around it is at [2].
> >
> > Thanks!
> > MF
> >
> > [0]:
> > https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
> > [1]:
> > https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/
> > [2]:
> > https://lore.kernel.org/bpf/5864171b-1e08-1b51-026e-5f09b8945076@nvidia.com/T/
> >
> > Björn Töpel (1):
> > xsk: improve xdp_do_redirect() error codes
> >
> > Maciej Fijalkowski (13):
> > xsk: diversify return codes in xsk_rcv_check()
> > ice: xsk: decorate ICE_XDP_REDIR with likely()
> > ixgbe: xsk: decorate IXGBE_XDP_REDIR with likely()
> > ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> > i40e: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> > ixgbe: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> > ice: xsk: diversify return values from xsk_wakeup call paths
> > i40e: xsk: diversify return values from xsk_wakeup call paths
> > ixgbe: xsk: diversify return values from xsk_wakeup call paths
> > mlx5: xsk: diversify return values from xsk_wakeup call paths
> > stmmac: xsk: diversify return values from xsk_wakeup call paths
> > ice: xsk: avoid refilling single Rx descriptors
> > xsk: drop ternary operator from xskq_cons_has_entries
> >
> > .../ethernet/intel/i40e/i40e_txrx_common.h | 1 +
> > drivers/net/ethernet/intel/i40e/i40e_xsk.c | 38 ++++++++-----
> > drivers/net/ethernet/intel/ice/ice_txrx.h | 1 +
> > drivers/net/ethernet/intel/ice/ice_xsk.c | 53 ++++++++++++-------
> > .../ethernet/intel/ixgbe/ixgbe_txrx_common.h | 1 +
> > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 52 ++++++++++--------
> > .../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 2 +-
> > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +-
> > net/xdp/xsk.c | 4 +-
> > net/xdp/xsk_queue.h | 4 +-
> > 10 files changed, 99 insertions(+), 61 deletions(-)
> >
>
Thread overview: 22+ messages
2022-04-13 15:30 [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 01/14] xsk: improve xdp_do_redirect() error codes Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 02/14] xsk: diversify return codes in xsk_rcv_check() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 03/14] ice: xsk: decorate ICE_XDP_REDIR with likely() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 04/14] ixgbe: xsk: decorate IXGBE_XDP_REDIR " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 05/14] ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 06/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 07/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 08/14] ice: xsk: diversify return values from xsk_wakeup call paths Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 09/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 10/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 11/14] mlx5: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 12/14] stmmac: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 13/14] ice: xsk: avoid refilling single Rx descriptors Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 14/14] xsk: drop ternary operator from xskq_cons_has_entries Maciej Fijalkowski
2022-04-15 19:20 ` [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ patchwork-bot+netdevbpf
2022-08-19 8:35 ` Maxim Mikityanskiy
[not found] ` <f1eea2e9ca337e0c4e072bdd94a89859a4539c09.camel@nvidia.com>
2022-08-23 9:49 ` Maxim Mikityanskiy [this message]
2022-08-23 11:24 ` Maciej Fijalkowski
2022-08-23 13:46 ` Maxim Mikityanskiy
2022-08-24 5:18 ` John Fastabend
2022-08-25 14:42 ` Maxim Mikityanskiy