From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
bpf@vger.kernel.org, netdev@vger.kernel.org
Cc: "Björn Töpel" <bjorn@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Alexei Starovoitov" <ast@kernel.org>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Eric Dumazet" <edumazet@google.com>,
"Hao Luo" <haoluo@google.com>, "Jakub Kicinski" <kuba@kernel.org>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Jiri Olsa" <jolsa@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Jonathan Lemon" <jonathan.lemon@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
"Magnus Karlsson" <magnus.karlsson@intel.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Paolo Abeni" <pabeni@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Song Liu" <song@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Yonghong Song" <yonghong.song@linux.dev>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: Re: [PATCH RFC net-next 1/2] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
Date: Wed, 14 Feb 2024 17:13:03 +0100 [thread overview]
Message-ID: <87il2rdnxs.fsf@toke.dk> (raw)
In-Reply-To: <20240213145923.2552753-2-bigeasy@linutronix.de>
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> The XDP redirect process is two staged:
> - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
> packet and makes decisions. While doing that, the per-CPU variable
> bpf_redirect_info is used.
>
> - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
> and it may also access other per-CPU variables like xskmap_flush_list.
>
> At the very end of the NAPI callback, xdp_do_flush() is invoked which
> does not access bpf_redirect_info but will touch the individual per-CPU
> lists.
>
> The per-CPU variables are only used in the NAPI callback, hence disabling
> bottom halves is the only protection mechanism. Users from preemptible
> context (like cpu_map_kthread_run()) explicitly disable bottom halves
> for protection reasons.
> On PREEMPT_RT, local_bh_disable() does not act as a lock, so this data
> structure requires explicit locking to avoid corruption if preemption
> occurs.
>
> PREEMPT_RT uses forced-threaded interrupts, so every NAPI callback runs
> in a thread. If each thread has its own data structure, then both
> locking and data corruption can be avoided.
>
> Create a struct bpf_xdp_storage which contains struct bpf_redirect_info.
> Define the variable on the stack, and use xdp_storage_set() to set a
> pointer to it in the task_struct of the current task. Use the __free()
> annotation to automatically reset the pointer once the function returns.
> Use a pointer which can be used by the __free() annotation to avoid
> invoking the callback if the pointer is NULL. This helps the compiler to
> optimize the code.
> xdp_storage_set() can nest. For instance, local_bh_enable() in
> bpf_test_run_xdp_live() may run NET_RX_SOFTIRQ/net_rx_action() which
> also uses xdp_storage_set(). Therefore only the first invocation
> updates the per-task pointer.
> Use xdp_storage_get_ri() as a wrapper to retrieve the current struct
> bpf_redirect_info.
>
> This is only done on PREEMPT_RT; !PREEMPT_RT builds keep using the
> per-CPU variable instead. The approach would also work for !PREEMPT_RT,
> but isn't needed there.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[...]
> diff --git a/net/core/dev.c b/net/core/dev.c
> index de362d5f26559..c3f7d2a6b6134 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3988,11 +3988,15 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> struct net_device *orig_dev, bool *another)
> {
> struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> + struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
> enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS;
> + struct bpf_xdp_storage __xdp_store;
> int sch_ret;
>
> if (!entry)
> return skb;
> +
> + xdp_store = xdp_storage_set(&__xdp_store);
> if (*pt_prev) {
> *ret = deliver_skb(skb, *pt_prev, orig_dev);
> *pt_prev = NULL;
> @@ -4044,12 +4048,16 @@ static __always_inline struct sk_buff *
> sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> {
> struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
> + struct bpf_xdp_storage *xdp_store __free(xdp_storage_clear) = NULL;
> enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS;
> + struct bpf_xdp_storage __xdp_store;
> int sch_ret;
>
> if (!entry)
> return skb;
>
> + xdp_store = xdp_storage_set(&__xdp_store);
> +
> /* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
> * already set by the caller.
> */
These, and the LWT code, don't actually have anything to do with XDP,
which indicates that the 'xdp_storage' name is misleading. Maybe
'bpf_net_context' or something along those lines? Or maybe we could just
move the flush lists into bpf_redirect_info itself and keep that as the
top-level name?
-Toke