From: Jakub Kicinski <kuba@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Michael Chan <michael.chan@broadcom.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Boqun Feng <boqun.feng@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Eric Dumazet <edumazet@google.com>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Paolo Abeni <pabeni@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.
Date: Mon, 18 Dec 2023 16:41:42 -0800 [thread overview]
Message-ID: <20231218164142.0b10e29d@kernel.org> (raw)
In-Reply-To: <20231218172331.fp-nC_Nr@linutronix.de>
On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote:
> On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:
> > > The proposed way out is to introduce explicit per-CPU locks for
> > > resources which are protected by local_bh_disable() and use those only
> > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.
> >
> > As I said at LPC, complicating drivers with odd locking constructs
> > is a no go for me.
>
> I misunderstood it then as I assumed you wanted to ease the work while I
> was done which every driver after (hopefully) understanding what is
> possible/ needed and what not. We do speak here about 15++?
My main concern is that it takes the complexity of writing network
device drivers to a next level. It's already hard enough to implement
XDP correctly. "local lock" and "guard"? Too complicated :(
Or "unmaintainable" as in "too much maintainer's time will be spent
reviewing code that gets this wrong".
> Now. The pattern is usually
> | act = bpf_prog_run_xdp(xdp_prog, &xdp);
> | switch (act) {
> | case XDP_REDIRECT:
> | ret = xdp_do_redirect(netdev, &xdp, xdp_prog)))
> | if (ret)
> | goto XDP_ABORTED;
> | xdp_redir++ or so;
>
> so we might be able to turn this into something that covers both and
> returns either XDP_REDIRECT or XDP_ABORTED. So this could be merged
> into
>
> | u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct
> | bpf_prog *prog, struct xdp_buff *xdp)
> | {
> | u32 act;
> | int ret;
> |
> | act = bpf_prog_run_xdp(prog, xdp);
> | if (act == XDP_REDIRECT) {
> | ret = xdp_do_redirect(netdev, xdp, prog);
> | if (ret < 0)
> | act = XDP_ABORTED;
> | }
> | return act;
> | }
If we could fold the DROP case into this -- even better!
> so the lock can be put inside the function and all drivers use this
> function.
>
> From looking through drivers/net/ethernet/, this should work for most
> drivers:
> - amazon/ena
> - aquantia/atlantic
> - engleder/tsnep
> - freescale/enetc
> - freescale/fec
> - intel/igb
> - intel/igc
> - marvell/mvneta
> - marvell/mvpp2
> - marvell/octeontx2
> - mediatek/mtk
> - mellanox/mlx5
> - microchip/lan966x
> - microsoft/mana
> - netronome/nfp (two call paths with no support XDP_REDIRECT)
> - sfc/rx
> - sfc/siena (that offset pointer can be moved)
> - socionext/netsec
> - stmicro/stmmac
>
> A few do something custom/ additionally between bpf_prog_run_xdp() and
> xdp_do_redirect():
>
> - broadcom/bnxt
> calculates length, offset, data pointer. DMA unmaps + memory
> allocations before redirect.
Just looked at this one. The recalculation is probably for the PASS /
TX cases, REDIRECT / DROP shouldn't care. The DMA unmap looks like
a bug (hi, Michael!)
> - freescale/dpaa2
> - freescale/dpaa
> sets xdp.data_hard_start + frame_sz, unmaps DMA.
>
> - fungible/funeth
> conditional redirect.
>
> - google/gve
> Allocates a new packet for redirect.
>
> - intel/ixgbe
> - intel/i40e
> - intel/ice
> Failure in the ZC case is different from XDP_ABORTED, depends on the
> error from xdp_do_redirect())
>
> - mellanox/mlx4/
> calculates page_offset.
>
> - qlogic/qede
> DMA unmap and buffer alloc.
>
> - ti/cpsw_priv
> recalculates length (pointer).
>
> and a few more don't support XDP_REDIRECT:
>
> - cavium/thunder
> does not support XDP_REDIRECT, calculates length, offset.
>
> - intel/ixgbevf
> does not support XDP_REDIRECT
>
> I don't understand why some driver need to recalculate data_hard_start,
> length and so on and others don't. This might be only needed for the
> XDP_TX case or not needed…
> Also I'm not sure about the dma unmaps and skb allocations. The new skb
> allocation can be probably handled before running the bpf prog but then
> in the XDP_PASS case it is a waste…
> And the DMA unmaps. Only a few seem to need it. Maybe it can be done
> before running the BPF program. After all the bpf may look into the skb.
>
>
> If that is no go, then the only thing that comes to mind is (as you
> mentioned on LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
> in xdp_do_redirect(). This would require that every driver invokes
> xdp_do_redirect() even not if it is not supporting it (by setting netdev
> to NULL or so).
To make progress on other parts of the stack we could also take
the local lock around all of napi->poll() for now..
next prev parent reply other threads:[~2023-12-19 0:41 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-15 17:07 [PATCH net-next 00/24] locking: Introduce nested-BH locking Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 01/24] locking/local_lock: Introduce guard definition for local_lock Sebastian Andrzej Siewior
2023-12-18 8:16 ` Paolo Abeni
2024-01-11 16:19 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 02/24] locking/local_lock: Add local nested BH locking infrastructure Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 03/24] net: Use __napi_alloc_frag_align() instead of open coding it Sebastian Andrzej Siewior
2023-12-18 7:48 ` Paolo Abeni
2024-01-12 9:01 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 04/24] net: Use nested-BH locking for napi_alloc_cache Sebastian Andrzej Siewior
2023-12-16 4:43 ` kernel test robot
2024-01-12 10:58 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 05/24] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 06/24] net/ipv4: Use nested-BH locking for ipv4_tcp_sk Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 07/24] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 08/24] net: softnet_data: Make xmit.recursion per task Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 09/24] dev: Use the RPS lock for softnet_data::input_pkt_queue on PREEMPT_RT Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 10/24] dev: Use nested-BH locking for softnet_data.process_queue Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 11/24] lwt: Don't disable migration prio invoking BPF Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 12/24] seg6: Use nested-BH locking for seg6_bpf_srh_states Sebastian Andrzej Siewior
2023-12-16 3:39 ` kernel test robot
2023-12-18 8:33 ` Paolo Abeni
2024-01-12 11:23 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 13/24] net: Use nested-BH locking for bpf_scratchpad Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 14/24] net: Add a lock which held during the redirect process Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect Sebastian Andrzej Siewior
2023-12-16 4:12 ` kernel test robot
2023-12-20 0:25 ` Alexei Starovoitov
2024-01-04 19:29 ` Toke Høiland-Jørgensen
2024-01-12 17:41 ` Sebastian Andrzej Siewior
2024-01-17 16:37 ` Toke Høiland-Jørgensen
2024-01-18 2:04 ` Jakub Kicinski
2024-01-18 8:27 ` Sebastian Andrzej Siewior
2024-01-18 16:38 ` Jakub Kicinski
2024-01-18 16:50 ` Sebastian Andrzej Siewior
2024-01-18 11:51 ` Toke Høiland-Jørgensen
2024-01-18 16:37 ` Jakub Kicinski
2024-01-20 14:41 ` Toke Høiland-Jørgensen
2024-01-18 7:35 ` Sebastian Andrzej Siewior
2024-01-18 11:58 ` Toke Høiland-Jørgensen
2023-12-15 17:07 ` [PATCH net-next 16/24] net: netkit, veth, tun, virt*: " Sebastian Andrzej Siewior
2023-12-16 19:28 ` kernel test robot
2023-12-18 8:52 ` Daniel Borkmann
2024-01-12 15:37 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 17/24] net: amazon, aquanti, broadcom, cavium, engleder: " Sebastian Andrzej Siewior
2023-12-16 22:09 ` Kiyanovski, Arthur
2024-01-12 17:53 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 18/24] net: Freescale: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 19/24] net: fungible, gve, mtk, microchip, mana: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 20/24] net: intel: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [Intel-wired-lan] " Sebastian Andrzej Siewior
2023-12-16 4:53 ` kernel test robot
2023-12-16 4:53 ` [Intel-wired-lan] " kernel test robot
2023-12-19 0:01 ` Nathan Chancellor
2023-12-19 0:01 ` [Intel-wired-lan] " Nathan Chancellor
2023-12-19 16:55 ` Nick Desaulniers
2023-12-19 16:55 ` [Intel-wired-lan] " Nick Desaulniers
2023-12-15 17:07 ` [PATCH net-next 21/24] net: marvell: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 22/24] net: mellanox, nfp, sfc: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 23/24] net: qlogic, socionext, stmmac, cpsw: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 24/24] net: bpf: Add lockdep assert for the redirect process Sebastian Andrzej Siewior
2023-12-15 22:50 ` [PATCH net-next 00/24] locking: Introduce nested-BH locking Jakub Kicinski
2023-12-18 17:23 ` Sebastian Andrzej Siewior
2023-12-19 0:41 ` Jakub Kicinski [this message]
2023-12-21 20:46 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231218164142.0b10e29d@kernel.org \
--to=kuba@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=michael.chan@broadcom.com \
--cc=mingo@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.