All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Boqun Feng <boqun.feng@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <edumazet@google.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Stanislav Fomichev <sdf@google.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH net-next 14/15 v2] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
Date: Tue, 14 May 2024 13:54:43 +0200	[thread overview]
Message-ID: <87le4cd2ws.fsf@toke.dk> (raw)
In-Reply-To: <20240510162121.f-tvqcyf@linutronix.de>

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
>   packet and makes decisions. While doing that, the per-CPU variable
>   bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
>   and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protections reasons.
> Without locking in local_bh_disable() on PREEMPT_RT this data structure
> requires explicit locking.
>
> PREEMPT_RT has forced-threaded interrupts enabled and every
> NAPI-callback runs in a thread. If each thread has its own data
> structure then locking can be avoided.
>
> Create a struct bpf_net_context which contains struct bpf_redirect_info.
> Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
> it. Use the __free() annotation to automatically reset the pointer once
> function returns.
> The bpf_net_ctx_set() may nest. For instance a function can be used from
> within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
> NET_TX_SOFTIRQ which does not. Therefore only the first invocations
> updates the pointer.
> Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> On PREEMPT_RT the pointer to bpf_net_context is saved task's
> task_struct. On non-PREEMPT_RT builds the pointer saved in a per-CPU
> variable (which is always NODE-local memory). Using always the
> bpf_net_context approach has the advantage that there is almost zero
> differences between PREEMPT_RT and non-PREEMPT_RT builds.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Eduard Zingerman <eddyz87@gmail.com>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Song Liu <song@kernel.org>
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Yonghong Song <yonghong.song@linux.dev>
> Cc: bpf@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  include/linux/filter.h | 42 ++++++++++++++++++++++++++++++++-----
>  include/linux/sched.h  |  3 +++
>  kernel/bpf/cpumap.c    |  3 +++
>  kernel/fork.c          |  1 +
>  net/bpf/test_run.c     | 11 +++++++++-
>  net/core/dev.c         | 19 ++++++++++++++++-
>  net/core/filter.c      | 47 +++++++++++++++++++-----------------------
>  net/core/lwt_bpf.c     |  3 +++
>  8 files changed, 96 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index d5fea03cb6e61..6db5a68db6ee1 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -744,7 +744,39 @@ struct bpf_redirect_info {
>  	struct bpf_nh_params nh;
>  };
>  
> -DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
> +struct bpf_net_context {
> +	struct bpf_redirect_info ri;
> +};
> +
> +static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx)
> +{
> +	struct task_struct *tsk = current;
> +
> +	if (tsk->bpf_net_context != NULL)
> +		return NULL;
> +	tsk->bpf_net_context = bpf_net_ctx;
> +	return bpf_net_ctx;
> +}
> +
> +static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx)
> +{
> +	if (bpf_net_ctx)
> +		current->bpf_net_context = NULL;
> +}
> +
> +static inline struct bpf_net_context *bpf_net_ctx_get(void)
> +{
> +	return current->bpf_net_context;
> +}
> +
> +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
> +{
> +	struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
> +
> +	return &bpf_net_ctx->ri;
> +}
> +
> +DEFINE_FREE(bpf_net_ctx_clear, struct bpf_net_context *, bpf_net_ctx_clear(_T));
>  
>  /* flags for bpf_redirect_info kern_flags */
>  #define BPF_RI_F_RF_NO_DIRECT	BIT(0)	/* no napi_direct on return_frame */
> @@ -1021,21 +1053,21 @@ void bpf_clear_redirect_map(struct bpf_map *map);
>  
>  static inline bool xdp_return_frame_no_direct(void)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>  
>  	return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT;
>  }
>  
>  static inline void xdp_set_return_frame_no_direct(void)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>  
>  	ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT;
>  }
>  
>  static inline void xdp_clear_return_frame_no_direct(void)
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>  
>  	ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT;
>  }
> @@ -1591,7 +1623,7 @@ static __always_inline long __bpf_xdp_redirect_map(struct bpf_map *map, u64 inde
>  						   u64 flags, const u64 flag_mask,
>  						   void *lookup_elem(struct bpf_map *map, u32 key))
>  {
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
>  	const u64 action_mask = XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX;
>  
>  	/* Lower bits of the flags are used as return code on lookup failure */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6779d3b8f2578..cc9be45de6606 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -53,6 +53,7 @@ struct bio_list;
>  struct blk_plug;
>  struct bpf_local_storage;
>  struct bpf_run_ctx;
> +struct bpf_net_context;
>  struct capture_control;
>  struct cfs_rq;
>  struct fs_struct;
> @@ -1504,6 +1505,8 @@ struct task_struct {
>  	/* Used for BPF run context */
>  	struct bpf_run_ctx		*bpf_ctx;
>  #endif
> +	/* Used by BPF for per-TASK xdp storage */
> +	struct bpf_net_context		*bpf_net_context;

Okay, so if we are going the route of always putting this in 'current',
why not just embed the whole struct bpf_net_context inside task_struct,
instead of mucking about with the stack-allocated structures and
setting/clearing of pointers?

-Toke


  parent reply	other threads:[~2024-05-14 11:54 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-03 18:25 [PATCH v2 net-next 00/15] locking: Introduce nested-BH locking Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 01/15] locking/local_lock: Introduce guard definition for local_lock Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 02/15] locking/local_lock: Add local nested BH locking infrastructure Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 03/15] net: Use __napi_alloc_frag_align() instead of open coding it Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 04/15] net: Use nested-BH locking for napi_alloc_cache Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 05/15] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 06/15] net/ipv4: Use nested-BH locking for ipv4_tcp_sk Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 07/15] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 08/15] net: softnet_data: Make xmit.recursion per task Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 09/15] dev: Remove PREEMPT_RT ifdefs from backlog_lock.*() Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 10/15] dev: Use nested-BH locking for softnet_data.process_queue Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 11/15] lwt: Don't disable migration prio invoking BPF Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 12/15] seg6: Use nested-BH locking for seg6_bpf_srh_states Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 13/15] net: Use nested-BH locking for bpf_scratchpad Sebastian Andrzej Siewior
2024-05-03 18:25 ` [PATCH net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT Sebastian Andrzej Siewior
2024-05-06 19:41   ` Toke Høiland-Jørgensen
2024-05-06 23:09     ` Alexei Starovoitov
2024-05-07 12:36       ` Sebastian Andrzej Siewior
2024-05-07 13:27         ` Jesper Dangaard Brouer
2024-05-10 16:21           ` [PATCH net-next 14/15 v2] " Sebastian Andrzej Siewior
2024-05-10 16:22             ` Sebastian Andrzej Siewior
2024-05-14  5:07               ` Jesper Dangaard Brouer
2024-05-14  5:43                 ` Sebastian Andrzej Siewior
2024-05-14 12:20                   ` Jesper Dangaard Brouer
2024-05-17 16:15                     ` Sebastian Andrzej Siewior
2024-05-22  7:09                       ` Jesper Dangaard Brouer
2024-05-24  7:54                         ` Sebastian Andrzej Siewior
2024-05-24 13:59                         ` Sebastian Andrzej Siewior
2024-05-14 11:54             ` Toke Høiland-Jørgensen [this message]
2024-05-15 13:43               ` Sebastian Andrzej Siewior
2024-05-21  1:52                 ` Alexei Starovoitov
2024-05-07 10:57     ` [PATCH net-next 14/15] " Sebastian Andrzej Siewior
2024-05-07 13:50       ` Toke Høiland-Jørgensen
2024-05-03 18:25 ` [PATCH net-next 15/15] net: Move per-CPU flush-lists to bpf_net_context " Sebastian Andrzej Siewior
2024-05-06  8:43 ` [PATCH v2 net-next 00/15] locking: Introduce nested-BH locking Paolo Abeni
2024-05-06  9:38   ` Sebastian Andrzej Siewior
2024-05-06 14:12     ` Paolo Abeni
2024-05-06 14:43       ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87le4cd2ws.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=martin.lau@linux.dev \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.