From: Jakub Kicinski <kuba@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Michael Chan <michael.chan@broadcom.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Boqun Feng <boqun.feng@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Eric Dumazet <edumazet@google.com>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Paolo Abeni <pabeni@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.
Date: Mon, 18 Dec 2023 16:41:42 -0800 [thread overview]
Message-ID: <20231218164142.0b10e29d@kernel.org> (raw)
In-Reply-To: <20231218172331.fp-nC_Nr@linutronix.de>
On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote:
> On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:
> > > The proposed way out is to introduce explicit per-CPU locks for
> > > resources which are protected by local_bh_disable() and use those only
> > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.
> >
> > As I said at LPC, complicating drivers with odd locking constructs
> > is a no go for me.
>
> I misunderstood it then as I assumed you wanted to ease the work while I
> was done which every driver after (hopefully) understanding what is
> possible/ needed and what not. We do speak here about 15++?
My main concern is that it takes the complexity of writing network
device drivers to a next level. It's already hard enough to implement
XDP correctly. "local lock" and "guard"? Too complicated :(
Or "unmaintainable" as in "too much maintainer's time will be spent
reviewing code that gets this wrong".
> Now. The pattern is usually
> | act = bpf_prog_run_xdp(xdp_prog, &xdp);
> | switch (act) {
> | case XDP_REDIRECT:
> | ret = xdp_do_redirect(netdev, &xdp, xdp_prog)))
> | if (ret)
> | goto XDP_ABORTED;
> | xdp_redir++ or so;
>
> so we might be able to turn this into something that covers both and
> returns either XDP_REDIRECT or XDP_ABORTED. So this could be merged
> into
>
> | u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct
> | bpf_prog *prog, struct xdp_buff *xdp)
> | {
> | u32 act;
> | int ret;
> |
> | act = bpf_prog_run_xdp(prog, xdp);
> | if (act == XDP_REDIRECT) {
> | ret = xdp_do_redirect(netdev, xdp, prog);
> | if (ret < 0)
> | act = XDP_ABORTED;
> | }
> | return act;
> | }
If we could fold the DROP case into this -- even better!
> so the lock can be put inside the function and all drivers use this
> function.
>
> From looking through drivers/net/ethernet/, this should work for most
> drivers:
> - amazon/ena
> - aquantia/atlantic
> - engleder/tsnep
> - freescale/enetc
> - freescale/fec
> - intel/igb
> - intel/igc
> - marvell/mvneta
> - marvell/mvpp2
> - marvell/octeontx2
> - mediatek/mtk
> - mellanox/mlx5
> - microchip/lan966x
> - microsoft/mana
> - netronome/nfp (two call paths with no support XDP_REDIRECT)
> - sfc/rx
> - sfc/siena (that offset pointer can be moved)
> - socionext/netsec
> - stmicro/stmmac
>
> A few do something custom/ additionally between bpf_prog_run_xdp() and
> xdp_do_redirect():
>
> - broadcom/bnxt
> calculates length, offset, data pointer. DMA unmaps + memory
> allocations before redirect.
Just looked at this one. The recalculation is probably for the PASS /
TX cases, REDIRECT / DROP shouldn't care. The DMA unmap looks like
a bug (hi, Michael!)
> - freescale/dpaa2
> - freescale/dpaa
> sets xdp.data_hard_start + frame_sz, unmaps DMA.
>
> - fungible/funeth
> conditional redirect.
>
> - google/gve
> Allocates a new packet for redirect.
>
> - intel/ixgbe
> - intel/i40e
> - intel/ice
> Failure in the ZC case is different from XDP_ABORTED, depends on the
> error from xdp_do_redirect())
>
> - mellanox/mlx4/
> calculates page_offset.
>
> - qlogic/qede
> DMA unmap and buffer alloc.
>
> - ti/cpsw_priv
> recalculates length (pointer).
>
> and a few more don't support XDP_REDIRECT:
>
> - cavium/thunder
> does not support XDP_REDIRECT, calculates length, offset.
>
> - intel/ixgbevf
> does not support XDP_REDIRECT
>
> I don't understand why some driver need to recalculate data_hard_start,
> length and so on and others don't. This might be only needed for the
> XDP_TX case or not needed…
> Also I'm not sure about the dma unmaps and skb allocations. The new skb
> allocation can be probably handled before running the bpf prog but then
> in the XDP_PASS case it is a waste…
> And the DMA unmaps. Only a few seem to need it. Maybe it can be done
> before running the BPF program. After all the bpf may look into the skb.
>
>
> If that is no go, then the only thing that comes to mind is (as you
> mentioned on LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
> in xdp_do_redirect(). This would require that every driver invokes
> xdp_do_redirect() even not if it is not supporting it (by setting netdev
> to NULL or so).
To make progress on other parts of the stack we could also take
the local lock around all of napi->poll() for now..
next prev parent reply other threads:[~2023-12-19 0:41 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-15 17:07 [PATCH net-next 00/24] locking: Introduce nested-BH locking Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 01/24] locking/local_lock: Introduce guard definition for local_lock Sebastian Andrzej Siewior
2023-12-18 8:16 ` Paolo Abeni
2024-01-11 16:19 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 02/24] locking/local_lock: Add local nested BH locking infrastructure Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 03/24] net: Use __napi_alloc_frag_align() instead of open coding it Sebastian Andrzej Siewior
2023-12-18 7:48 ` Paolo Abeni
2024-01-12 9:01 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 04/24] net: Use nested-BH locking for napi_alloc_cache Sebastian Andrzej Siewior
2023-12-16 4:43 ` kernel test robot
2024-01-12 10:58 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 05/24] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 06/24] net/ipv4: Use nested-BH locking for ipv4_tcp_sk Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 07/24] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 08/24] net: softnet_data: Make xmit.recursion per task Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 09/24] dev: Use the RPS lock for softnet_data::input_pkt_queue on PREEMPT_RT Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 10/24] dev: Use nested-BH locking for softnet_data.process_queue Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 11/24] lwt: Don't disable migration prio invoking BPF Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 12/24] seg6: Use nested-BH locking for seg6_bpf_srh_states Sebastian Andrzej Siewior
2023-12-16 3:39 ` kernel test robot
2023-12-18 8:33 ` Paolo Abeni
2024-01-12 11:23 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 13/24] net: Use nested-BH locking for bpf_scratchpad Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 14/24] net: Add a lock which held during the redirect process Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect Sebastian Andrzej Siewior
2023-12-16 4:12 ` kernel test robot
2023-12-20 0:25 ` Alexei Starovoitov
2024-01-04 19:29 ` Toke Høiland-Jørgensen
2024-01-12 17:41 ` Sebastian Andrzej Siewior
2024-01-17 16:37 ` Toke Høiland-Jørgensen
2024-01-18 2:04 ` Jakub Kicinski
2024-01-18 8:27 ` Sebastian Andrzej Siewior
2024-01-18 16:38 ` Jakub Kicinski
2024-01-18 16:50 ` Sebastian Andrzej Siewior
2024-01-18 11:51 ` Toke Høiland-Jørgensen
2024-01-18 16:37 ` Jakub Kicinski
2024-01-20 14:41 ` Toke Høiland-Jørgensen
2024-01-18 7:35 ` Sebastian Andrzej Siewior
2024-01-18 11:58 ` Toke Høiland-Jørgensen
2023-12-15 17:07 ` [PATCH net-next 16/24] net: netkit, veth, tun, virt*: " Sebastian Andrzej Siewior
2023-12-16 19:28 ` kernel test robot
2023-12-18 8:52 ` Daniel Borkmann
2024-01-12 15:37 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 17/24] net: amazon, aquanti, broadcom, cavium, engleder: " Sebastian Andrzej Siewior
2023-12-16 22:09 ` Kiyanovski, Arthur
2024-01-12 17:53 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 18/24] net: Freescale: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 19/24] net: fungible, gve, mtk, microchip, mana: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 20/24] net: intel: " Sebastian Andrzej Siewior
2023-12-16 4:53 ` kernel test robot
2023-12-19 0:01 ` Nathan Chancellor
2023-12-19 16:55 ` Nick Desaulniers
2023-12-15 17:07 ` [PATCH net-next 21/24] net: marvell: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 22/24] net: mellanox, nfp, sfc: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 23/24] net: qlogic, socionext, stmmac, cpsw: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 24/24] net: bpf: Add lockdep assert for the redirect process Sebastian Andrzej Siewior
2023-12-15 22:50 ` [PATCH net-next 00/24] locking: Introduce nested-BH locking Jakub Kicinski
2023-12-18 17:23 ` Sebastian Andrzej Siewior
2023-12-19 0:41 ` Jakub Kicinski [this message]
2023-12-21 20:46 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231218164142.0b10e29d@kernel.org \
--to=kuba@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=michael.chan@broadcom.com \
--cc=mingo@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).