linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Michael Chan <michael.chan@broadcom.com>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Boqun Feng <boqun.feng@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <edumazet@google.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Paolo Abeni <pabeni@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.
Date: Mon, 18 Dec 2023 16:41:42 -0800	[thread overview]
Message-ID: <20231218164142.0b10e29d@kernel.org> (raw)
In-Reply-To: <20231218172331.fp-nC_Nr@linutronix.de>

On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote:
> On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:  
> > > The proposed way out is to introduce explicit per-CPU locks for
> > > resources which are protected by local_bh_disable() and use those only
> > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.  
> > 
> > As I said at LPC, complicating drivers with odd locking constructs
> > is a no go for me.  
> 
> I misunderstood it then as I assumed you wanted to ease the work while I
> was done which every driver after (hopefully) understanding what is
> possible/ needed and what not. We do speak here about 15++?

My main concern is that it takes the complexity of writing network
device drivers to a next level. It's already hard enough to implement
XDP correctly. "local lock" and "guard"? Too complicated :(
Or "unmaintainable" as in "too much maintainer's time will be spent
reviewing code that gets this wrong".

> Now. The pattern is usually
> |	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> |	switch (act) {
> |	case XDP_REDIRECT:
> |		ret = xdp_do_redirect(netdev, &xdp, xdp_prog)))
> |		if (ret)
> |			goto XDP_ABORTED;
> |		xdp_redir++ or so;
> 
> so we might be able to turn this into something that covers both and
> returns either XDP_REDIRECT or XDP_ABORTED. So this could be merged
> into
> 
> | u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct
> | 				  bpf_prog *prog, struct xdp_buff *xdp)
> | {
> | 	u32 act;
> | 	int ret;
> | 
> | 	act = bpf_prog_run_xdp(prog, xdp);
> | 	if (act == XDP_REDIRECT) {
> | 		ret = xdp_do_redirect(netdev, xdp, prog);
> | 		if (ret < 0)
> | 			act = XDP_ABORTED;
> | 	}
> | 	return act;
> | }

If we could fold the DROP case into this -- even better!

> so the lock can be put inside the function and all drivers use this
> function.
> 
> From looking through drivers/net/ethernet/, this should work for most
> drivers:
> - amazon/ena
> - aquantia/atlantic
> - engleder/tsnep
> - freescale/enetc
> - freescale/fec
> - intel/igb
> - intel/igc
> - marvell/mvneta
> - marvell/mvpp2
> - marvell/octeontx2
> - mediatek/mtk
> - mellanox/mlx5
> - microchip/lan966x
> - microsoft/mana
> - netronome/nfp (two call paths with no support XDP_REDIRECT)
> - sfc/rx
> - sfc/siena (that offset pointer can be moved)
> - socionext/netsec
> - stmicro/stmmac
> 
> A few do something custom/ additionally between bpf_prog_run_xdp() and
>   xdp_do_redirect():
> 
> - broadcom/bnxt
>   calculates length, offset, data pointer. DMA unmaps + memory
>   allocations before redirect.

Just looked at this one. The recalculation is probably for the PASS /
TX cases, REDIRECT / DROP shouldn't care. The DMA unmap looks like 
a bug (hi, Michael!)

> - freescale/dpaa2
> - freescale/dpaa
>   sets xdp.data_hard_start + frame_sz, unmaps DMA.
> 
> - fungible/funeth
>   conditional redirect.
> 
> - google/gve
>   Allocates a new packet for redirect.
> 
> - intel/ixgbe
> - intel/i40e
> - intel/ice
>   Failure in the ZC case is different from XDP_ABORTED, depends on the
>   error from xdp_do_redirect())
> 
> - mellanox/mlx4/
>   calculates page_offset.
> 
> - qlogic/qede
>   DMA unmap and buffer alloc.
> 
> - ti/cpsw_priv
>   recalculates length (pointer).
> 
> and a few more don't support XDP_REDIRECT:
> 
> - cavium/thunder
>   does not support XDP_REDIRECT, calculates length, offset.
> 
> - intel/ixgbevf
>   does not support XDP_REDIRECT
> 
> I don't understand why some driver need to recalculate data_hard_start,
> length and so on and others don't. This might be only needed for the
> XDP_TX case or not needed…
> Also I'm not sure about the dma unmaps and skb allocations. The new skb
> allocation can be probably handled before running the bpf prog but then
> in the XDP_PASS case it is a waste…
> And the DMA unmaps. Only a few seem to need it. Maybe it can be done
> before running the BPF program. After all the bpf may look into the skb.
> 
> 
> If that is no go, then the only thing that comes to mind is (as you
> mentioned on LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
> in xdp_do_redirect(). This would require that every driver invokes
> xdp_do_redirect() even not if it is not supporting it (by setting netdev
> to NULL or so).

To make progress on other parts of the stack we could also take 
the local lock around all of napi->poll() for now..

  reply	other threads:[~2023-12-19  0:41 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-15 17:07 [PATCH net-next 00/24] locking: Introduce nested-BH locking Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 01/24] locking/local_lock: Introduce guard definition for local_lock Sebastian Andrzej Siewior
2023-12-18  8:16   ` Paolo Abeni
2024-01-11 16:19     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 02/24] locking/local_lock: Add local nested BH locking infrastructure Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 03/24] net: Use __napi_alloc_frag_align() instead of open coding it Sebastian Andrzej Siewior
2023-12-18  7:48   ` Paolo Abeni
2024-01-12  9:01     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 04/24] net: Use nested-BH locking for napi_alloc_cache Sebastian Andrzej Siewior
2023-12-16  4:43   ` kernel test robot
2024-01-12 10:58     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 05/24] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 06/24] net/ipv4: Use nested-BH locking for ipv4_tcp_sk Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 07/24] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 08/24] net: softnet_data: Make xmit.recursion per task Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 09/24] dev: Use the RPS lock for softnet_data::input_pkt_queue on PREEMPT_RT Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 10/24] dev: Use nested-BH locking for softnet_data.process_queue Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 11/24] lwt: Don't disable migration prio invoking BPF Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 12/24] seg6: Use nested-BH locking for seg6_bpf_srh_states Sebastian Andrzej Siewior
2023-12-16  3:39   ` kernel test robot
2023-12-18  8:33   ` Paolo Abeni
2024-01-12 11:23     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 13/24] net: Use nested-BH locking for bpf_scratchpad Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 14/24] net: Add a lock which held during the redirect process Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect Sebastian Andrzej Siewior
2023-12-16  4:12   ` kernel test robot
2023-12-20  0:25   ` Alexei Starovoitov
2024-01-04 19:29     ` Toke Høiland-Jørgensen
2024-01-12 17:41       ` Sebastian Andrzej Siewior
2024-01-17 16:37         ` Toke Høiland-Jørgensen
2024-01-18  2:04           ` Jakub Kicinski
2024-01-18  8:27             ` Sebastian Andrzej Siewior
2024-01-18 16:38               ` Jakub Kicinski
2024-01-18 16:50                 ` Sebastian Andrzej Siewior
2024-01-18 11:51             ` Toke Høiland-Jørgensen
2024-01-18 16:37               ` Jakub Kicinski
2024-01-20 14:41                 ` Toke Høiland-Jørgensen
2024-01-18  7:35           ` Sebastian Andrzej Siewior
2024-01-18 11:58             ` Toke Høiland-Jørgensen
2023-12-15 17:07 ` [PATCH net-next 16/24] net: netkit, veth, tun, virt*: " Sebastian Andrzej Siewior
2023-12-16 19:28   ` kernel test robot
2023-12-18  8:52   ` Daniel Borkmann
2024-01-12 15:37     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 17/24] net: amazon, aquanti, broadcom, cavium, engleder: " Sebastian Andrzej Siewior
2023-12-16 22:09   ` Kiyanovski, Arthur
2024-01-12 17:53     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 18/24] net: Freescale: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 19/24] net: fungible, gve, mtk, microchip, mana: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 20/24] net: intel: " Sebastian Andrzej Siewior
2023-12-16  4:53   ` kernel test robot
2023-12-19  0:01     ` Nathan Chancellor
2023-12-19 16:55       ` Nick Desaulniers
2023-12-15 17:07 ` [PATCH net-next 21/24] net: marvell: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 22/24] net: mellanox, nfp, sfc: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 23/24] net: qlogic, socionext, stmmac, cpsw: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 24/24] net: bpf: Add lockdep assert for the redirect process Sebastian Andrzej Siewior
2023-12-15 22:50 ` [PATCH net-next 00/24] locking: Introduce nested-BH locking Jakub Kicinski
2023-12-18 17:23   ` Sebastian Andrzej Siewior
2023-12-19  0:41     ` Jakub Kicinski [this message]
2023-12-21 20:46       ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231218164142.0b10e29d@kernel.org \
    --to=kuba@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=michael.chan@broadcom.com \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).