BPF List
 help / color / mirror / Atom feed
* [PATCH RFC bpf-next] bpf: defer bpf_link dealloc to after RCU grace period
@ 2024-03-25 21:51 Andrii Nakryiko
  2024-03-26  2:16 ` Alexei Starovoitov
  0 siblings, 1 reply; 4+ messages in thread
From: Andrii Nakryiko @ 2024-03-25 21:51 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau
  Cc: andrii, kernel-team, syzbot+981935d9485a560bfbcb,
	syzbot+2cb5a6c573e98db598cc, syzbot+62d8b26793e8a2bd0516

BPF link for some program types is passed as a "context" which can be
used by those BPF programs to look up additional information. E.g., for
BPF raw tracepoints, link is used to fetch BPF cookie value, similarly
for BPF multi-kprobes and multi-uprobes.

Because of this runtime dependency, when bpf_link refcnt drops to zero
that could be still active BPF programs running accessing link data
(cookie, program pointer, etc).

This patch accommodates this by delaying freeing memory to after RCU GP,
which will fix BPF raw tp, multi-kprobe, and non-sleepable multi-uprobe.

Perhaps a better approach would be to have a per-link flag specifying
desired behavior: no delay, RCU delay, or task_trace RCU delay? So
sending this as an RFC fix to discuss desired final solution.

Fixes: d4dfc5700e86 ("bpf: pass whole link instead of prog when triggering raw tracepoint")
Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h  |  8 +++++++-
 kernel/bpf/syscall.c | 12 ++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 62762390c93d..d73a8978c800 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1573,7 +1573,13 @@ struct bpf_link {
 	enum bpf_link_type type;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
-	struct work_struct work;
+	/* rcu is used before freeing, work can be used to schedule that
+	 * RCU-based freeing before that, so they never overlap
+	 */
+	union {
+		struct rcu_head rcu;
+		struct work_struct work;
+	};
 };
 
 struct bpf_link_ops {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e44c276e8617..af1591af10bb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3024,6 +3024,14 @@ void bpf_link_inc(struct bpf_link *link)
 	atomic64_inc(&link->refcnt);
 }
 
+static void bpf_link_dealloc_deferred(struct rcu_head *rcu)
+{
+	struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+	/* free bpf_link and its containing memory */
+	link->ops->dealloc(link);
+}
+
 /* bpf_link_free is guaranteed to be called from process context */
 static void bpf_link_free(struct bpf_link *link)
 {
@@ -3033,8 +3041,8 @@ static void bpf_link_free(struct bpf_link *link)
 		link->ops->release(link);
 		bpf_prog_put(link->prog);
 	}
-	/* free bpf_link and its containing memory */
-	link->ops->dealloc(link);
+	/* schedule BPF link deallocation after RCU grace period */
+	call_rcu(&link->rcu, bpf_link_dealloc_deferred);
 }
 
 static void bpf_link_put_deferred(struct work_struct *work)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-03-26  3:15 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-25 21:51 [PATCH RFC bpf-next] bpf: defer bpf_link dealloc to after RCU grace period Andrii Nakryiko
2024-03-26  2:16 ` Alexei Starovoitov
2024-03-26  2:59   ` Andrii Nakryiko
2024-03-26  3:15     ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox