* [PATCH bpf 1/2] bpf: put uprobe link's path and task in release callback

From: Andrii Nakryiko @ 2024-03-28 5:24 UTC
To: bpf, ast, daniel, martin.lau; +Cc: jolsa, Andrii Nakryiko

There is no need to delay putting either path or task to the
deallocation step. It can be done right after bpf_uprobe_unregister.
Between release and dealloc, there could still be some running BPF
programs, but they don't access either task or path, only data in
link->uprobes, so it is safe to do.

On the other hand, doing path_put() in the dealloc callback makes this
dealloc sleepable, because path_put() itself might sleep. That is
problematic due to the need to call uprobe's dealloc through call_rcu(),
which is what is done in the next bug fix patch. So solve the problem by
releasing these resources early.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/trace/bpf_trace.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0a5c4efc73c3..0b73fe5f7206 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3157,6 +3157,9 @@ static void bpf_uprobe_multi_link_release(struct bpf_link *link)
 	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
 	bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes,
 			      umulti_link->cnt);
+	if (umulti_link->task)
+		put_task_struct(umulti_link->task);
+	path_put(&umulti_link->path);
 }
 
 static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
@@ -3164,9 +3167,6 @@ static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link)
 	struct bpf_uprobe_multi_link *umulti_link;
 
 	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
-	if (umulti_link->task)
-		put_task_struct(umulti_link->task);
-	path_put(&umulti_link->path);
 	kvfree(umulti_link->uprobes);
 	kfree(umulti_link);
 }
-- 
2.43.0
* [PATCH bpf 2/2] bpf: support deferring bpf_link dealloc to after RCU grace period

From: Andrii Nakryiko @ 2024-03-28 5:24 UTC
To: bpf, ast, daniel, martin.lau
Cc: jolsa, Andrii Nakryiko, syzbot+981935d9485a560bfbcb, syzbot+2cb5a6c573e98db598cc, syzbot+62d8b26793e8a2bd0516

BPF link for some program types is passed as a "context" which can be
used by those BPF programs to look up additional information. E.g., for
multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.

Because of this runtime dependency, when bpf_link refcnt drops to zero
there could still be active BPF programs accessing link data.

This patch adds generic support to defer bpf_link dealloc callback to
after RCU GP, if requested. This is done by exposing two different
deallocation callbacks, one synchronous and one deferred. If deferred
one is provided, bpf_link_free() will schedule dealloc_deferred()
callback to happen after RCU GP.

BPF is using two flavors of RCU: "classic" non-sleepable one and RCU
tasks trace one. The latter is used when sleepable BPF programs are
used. bpf_link_free() accommodates that by checking underlying BPF
program's sleepable flag, and goes either through normal RCU GP only for
non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP
(taking into account rcu_trace_implies_rcu_gp() optimization), if BPF
program is sleepable.

We use this for multi-kprobe and multi-uprobe links, which dereference
link during program run. We also preventively switch raw_tp link to use
deferred dealloc callback, as upcoming changes in bpf-next tree expose
raw_tp link data (specifically, cookie value) to BPF program at runtime
as well.

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: syzbot+981935d9485a560bfbcb@syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc@syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h      | 16 +++++++++++++++-
 kernel/bpf/syscall.c     | 35 ++++++++++++++++++++++++++++++++---
 kernel/trace/bpf_trace.c |  4 ++--
 3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4f20f62f9d63..890e152d553e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1574,12 +1574,26 @@ struct bpf_link {
 	enum bpf_link_type type;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
-	struct work_struct work;
+	/* rcu is used before freeing, work can be used to schedule that
+	 * RCU-based freeing before that, so they never overlap
+	 */
+	union {
+		struct rcu_head rcu;
+		struct work_struct work;
+	};
 };
 
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
+	/* deallocate link resources callback, called without RCU grace period
+	 * waiting
+	 */
 	void (*dealloc)(struct bpf_link *link);
+	/* deallocate link resources callback, called after RCU grace period;
+	 * if underlying BPF program is sleepable we go through tasks trace
+	 * RCU GP and then "classic" RCU GP
+	 */
+	void (*dealloc_deferred)(struct bpf_link *link);
 	int (*detach)(struct bpf_link *link);
 	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
 			   struct bpf_prog *old_prog);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ae2ff73bde7e..c287925471f6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3024,17 +3024,46 @@ void bpf_link_inc(struct bpf_link *link)
 	atomic64_inc(&link->refcnt);
 }
 
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
+{
+	struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+	/* free bpf_link and its containing memory */
+	link->ops->dealloc_deferred(link);
+}
+
+static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
+{
+	if (rcu_trace_implies_rcu_gp())
+		bpf_link_defer_dealloc_rcu_gp(rcu);
+	else
+		call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
+}
+
 /* bpf_link_free is guaranteed to be called from process context */
 static void bpf_link_free(struct bpf_link *link)
 {
+	bool sleepable = false;
+
 	bpf_link_free_id(link->id);
 	if (link->prog) {
+		sleepable = link->prog->sleepable;
 		/* detach BPF program, clean up used resources */
 		link->ops->release(link);
 		bpf_prog_put(link->prog);
 	}
-	/* free bpf_link and its containing memory */
-	link->ops->dealloc(link);
+	if (link->ops->dealloc_deferred) {
+		/* schedule BPF link deallocation; if underlying BPF program
+		 * is sleepable, we need to first wait for RCU tasks trace
+		 * sync, then go through "classic" RCU grace period
+		 */
+		if (sleepable)
+			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+		else
+			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
+	}
+	if (link->ops->dealloc)
+		link->ops->dealloc(link);
 }
 
 static void bpf_link_put_deferred(struct work_struct *work)
@@ -3544,7 +3573,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
 
 static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.release = bpf_raw_tp_link_release,
-	.dealloc = bpf_raw_tp_link_dealloc,
+	.dealloc_deferred = bpf_raw_tp_link_dealloc,
 	.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
 	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0b73fe5f7206..9dc605f08a23 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2728,7 +2728,7 @@ static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
 
 static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
 	.release = bpf_kprobe_multi_link_release,
-	.dealloc = bpf_kprobe_multi_link_dealloc,
+	.dealloc_deferred = bpf_kprobe_multi_link_dealloc,
 	.fill_link_info = bpf_kprobe_multi_link_fill_link_info,
 };
 
@@ -3242,7 +3242,7 @@ static int bpf_uprobe_multi_link_fill_link_info(const struct bpf_link *link,
 
 static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
 	.release = bpf_uprobe_multi_link_release,
-	.dealloc = bpf_uprobe_multi_link_dealloc,
+	.dealloc_deferred = bpf_uprobe_multi_link_dealloc,
 	.fill_link_info = bpf_uprobe_multi_link_fill_link_info,
 };
-- 
2.43.0
* Re: [PATCH bpf 2/2] bpf: support deferring bpf_link dealloc to after RCU grace period

From: Jiri Olsa @ 2024-03-28 10:10 UTC
To: Andrii Nakryiko
Cc: bpf, ast, daniel, martin.lau, syzbot+981935d9485a560bfbcb, syzbot+2cb5a6c573e98db598cc, syzbot+62d8b26793e8a2bd0516

On Wed, Mar 27, 2024 at 10:24:26PM -0700, Andrii Nakryiko wrote:
> BPF link for some program types is passed as a "context" which can be
> used by those BPF programs to look up additional information. E.g., for
> multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.
> 
> Because of this runtime dependency, when bpf_link refcnt drops to zero
> there could still be active BPF programs accessing link data.
> 
> This patch adds generic support to defer bpf_link dealloc callback to
> after RCU GP, if requested. This is done by exposing two different
> deallocation callbacks, one synchronous and one deferred. If deferred
> one is provided, bpf_link_free() will schedule dealloc_deferred()
> callback to happen after RCU GP.
> 
> BPF is using two flavors of RCU: "classic" non-sleepable one and RCU
> tasks trace one. The latter is used when sleepable BPF programs are
> used. bpf_link_free() accommodates that by checking underlying BPF
> program's sleepable flag, and goes either through normal RCU GP only for
> non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP
> (taking into account rcu_trace_implies_rcu_gp() optimization), if BPF
> program is sleepable.
> 
> We use this for multi-kprobe and multi-uprobe links, which dereference
> link during program run. We also preventively switch raw_tp link to use
> deferred dealloc callback, as upcoming changes in bpf-next tree expose
> raw_tp link data (specifically, cookie value) to BPF program at runtime
> as well.

nice catch.. I thought there'd be more link types accessing link data
at runtime.. but looks like it's just [ku]probe_multi

Acked-by: Jiri Olsa <jolsa@kernel.org>

jirka

> [...]
* Re: [PATCH bpf 1/2] bpf: put uprobe link's path and task in release callback

From: patchwork-bot+netdevbpf @ 2024-03-29 2:00 UTC
To: Andrii Nakryiko
Cc: bpf, ast, daniel, martin.lau, jolsa

Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed, 27 Mar 2024 22:24:25 -0700 you wrote:
> There is no need to delay putting either path or task to the
> deallocation step. It can be done right after bpf_uprobe_unregister.
> Between release and dealloc, there could still be some running BPF
> programs, but they don't access either task or path, only data in
> link->uprobes, so it is safe to do.
> 
> On the other hand, doing path_put() in the dealloc callback makes this
> dealloc sleepable, because path_put() itself might sleep. That is
> problematic due to the need to call uprobe's dealloc through call_rcu(),
> which is what is done in the next bug fix patch. So solve the problem by
> releasing these resources early.
> 
> [...]

Here is the summary with links:
  - [bpf,1/2] bpf: put uprobe link's path and task in release callback
    https://git.kernel.org/bpf/bpf/c/e9c856cabefb
  - [bpf,2/2] bpf: support deferring bpf_link dealloc to after RCU grace period
    https://git.kernel.org/bpf/bpf/c/1a80dbcb2dba

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html