From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: "Björn Töpel" <bjorn@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Hao Luo" <haoluo@google.com>, "Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Song Liu" <song@kernel.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: Re: [PATCH RFC net-next 1/2] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
Date: Wed, 14 Feb 2024 17:13:03 +0100
Message-ID: <87il2rdnxs.fsf@toke.dk>
In-Reply-To: <20240213145923.2552753-2-bigeasy@linutronix.de>

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protections reasons.
> Without locking in local_bh_disable() on PREEMPT_RT this data structure
> requires explicit locking to avoid corruption if preemption occurs.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every
> NAPI-callback runs in a thread. If each thread has its own data
> structure then locking can be avoided and data corruption is also avoided.
>
> Create a struct bpf_xdp_storage which contains struct bpf_redirect_info.
> Define the variable on stack, use xdp_storage_set() to set a pointer to
> it in task_struct of the current task. Use the __free() annotation to
> automatically reset the pointer once function returns. Use a pointer which can
> be used by the __free() annotation to avoid invoking the callback the pointer
> is NULL. This helps the compiler to optimize the code.
> The xdp_storage_set() can nest. For instance local_bh_enable() in
> bpf_test_run_xdp_live() may run NET_RX_SOFTIRQ/ net_rx_action() which
> also uses xdp_storage_set(). Therefore only the first invocations
> updates the per-task pointer.
> Use xdp_storage_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> This is only done on PREEMPT_RT. The !PREEMPT_RT builds keep using the
> per-CPU variable instead. This should also work for !PREEMPT_RT but
> isn't needed.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
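
For anyone following along, the set/clear/nesting behaviour described in the
commit message can be approximated in userspace C. This is only a sketch under
stated assumptions, not the code from the patch: a thread-local pointer stands
in for the pointer in task_struct, the compiler's cleanup attribute stands in
for the kernel's __free() annotation, struct bpf_redirect_info is reduced to a
single illustrative field, and the function bodies (plus the outer()/inner()
callers) are guesses at the described semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structs have many more fields. */
struct bpf_redirect_info { int tgt_index; };
struct bpf_xdp_storage { struct bpf_redirect_info ri; };

/* Assumption: a thread-local pointer replaces the task_struct member. */
static _Thread_local struct bpf_xdp_storage *current_xdp_storage;

/* Only the first (outermost) caller installs its storage; nested calls get NULL. */
static struct bpf_xdp_storage *xdp_storage_set(struct bpf_xdp_storage *store)
{
	if (current_xdp_storage)
		return NULL;
	current_xdp_storage = store;
	return store;
}

/* Cleanup handler: receives the address of the annotated variable. */
static void xdp_storage_clear(struct bpf_xdp_storage **p)
{
	if (*p)
		current_xdp_storage = NULL;
}

/* Userspace approximation of __free(xdp_storage_clear). */
#define FREE_XDP_STORAGE __attribute__((cleanup(xdp_storage_clear)))

static struct bpf_redirect_info *xdp_storage_get_ri(void)
{
	assert(current_xdp_storage != NULL);
	return &current_xdp_storage->ri;
}

/* Hypothetical nested user, e.g. a softirq run from within outer(). */
static int inner(void)
{
	struct bpf_xdp_storage *store FREE_XDP_STORAGE = NULL;
	struct bpf_xdp_storage stack_store = { { 0 } };

	store = xdp_storage_set(&stack_store);	/* nested: returns NULL */
	return xdp_storage_get_ri()->tgt_index;	/* sees the outer storage */
}

/* Hypothetical outermost user, e.g. a NAPI callback. */
static int outer(void)
{
	struct bpf_xdp_storage *store FREE_XDP_STORAGE = NULL;
	struct bpf_xdp_storage stack_store = { { 0 } };

	store = xdp_storage_set(&stack_store);
	xdp_storage_get_ri()->tgt_index = 42;
	return inner();
}
```

In this sketch outer() returns 42 because inner() reads the storage installed
by outer(), and the thread-local pointer is NULL again once outer() returns,
since only the variable that actually installed the storage clears it.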

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index de362d5f26559..c3f7d2a6b6134 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3988,11 +3988,15 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>  		   struct net_device *orig_dev, bool *another)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> +	struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
>  	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
> +	struct bpf_xdp_storage __xdp_store;
>  	int sch_ret;
>  
>  	if (!entry)
>  		return skb;
> +
> +	xdp_store = xdp_storage_set(&__xdp_store);
>  	if (*pt_prev) {
>  		*ret = deliver_skb(skb, *pt_prev, orig_dev);
>  		*pt_prev = NULL;
> @@ -4044,12 +4048,16 @@ static __always_inline struct sk_buff *
>  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>  {
>  	struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
> +	struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
>  	enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS;
> +	struct bpf_xdp_storage __xdp_store;
>  	int sch_ret;
>  
>  	if (!entry)
>  		return skb;
>  
> +	xdp_store = xdp_storage_set(&__xdp_store);
> +
>  	/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
>  	 * already set by the caller.
>  	 */


These, and the LWT code, don't actually have anything to do with XDP,
which indicates that the 'xdp_storage' name is misleading. Maybe
'bpf_net_context' or something along those lines? Or maybe we could just
move the flush lists into bpf_redirect_info itself and keep that as
the top-level name?

-Toke

