From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Andrii Nakryiko <andrii@kernel.org>,
Jordan Rife <jrife@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Sasha Levin <sashal@kernel.org>,
ast@kernel.org, daniel@iogearbox.net, bpf@vger.kernel.org
Subject: [PATCH AUTOSEL 6.6 11/24] bpf: put bpf_link's program when link is safe to be deallocated
Date: Wed, 4 Dec 2024 10:49:31 -0500 [thread overview]
Message-ID: <20241204155003.2213733-11-sashal@kernel.org> (raw)
In-Reply-To: <20241204155003.2213733-1-sashal@kernel.org>
From: Andrii Nakryiko <andrii@kernel.org>
[ Upstream commit f44ec8733a8469143fde1984b5e6931b2e2f6f3f ]
In general, BPF link's underlying BPF program should be considered to be
reachable through attach hook -> link -> prog chain, and, pessimistically,
we have to assume that as long as link's memory is not safe to free,
attach hook's code might hold a pointer to BPF program and use it.
As such, it's not (generally) correct to put link's program early before
waiting for RCU GPs to go through. More eager bpf_prog_put() that we
currently do is mostly correct due to BPF program's release code doing
similar RCU GP waiting, but as will be shown in the following patches,
BPF program can be non-sleepable (and, thus, reliant on only "classic"
RCU GP), while BPF link's attach hook can have sleepable semantics and
needs to be protected by RCU Tasks Trace, and for such cases BPF link
has to go through RCU Tasks Trace + "classic" RCU GPs before being
deallocated. And so, if we put BPF program early, we might free BPF
program before we free BPF link, leading to use-after-free situation.
So, this patch defers bpf_prog_put() until we are ready to perform
bpf_link's deallocation. At worst, this delays BPF program freeing by
one extra RCU GP, but that seems completely acceptable. Alternatively,
we'd need more elaborate ways to determine BPF hook, BPF link, and BPF
program lifetimes, and how they relate to each other, which seems like
an unnecessary complication.
Note, for most BPF links we still will perform eager bpf_prog_put() and
link dealloc, so for those BPF links there are no observable changes
whatsoever. Only BPF links that use deferred dealloc might notice
slightly delayed freeing of BPF programs.
Also, to reduce code and logic duplication, extract program put + link
dealloc logic into bpf_link_dealloc() helper.
Link: https://lore.kernel.org/20241101181754.782341-1-andrii@kernel.org
Tested-by: Jordan Rife <jrife@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/syscall.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 252aed82d45ea..ba38c08a9a059 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2870,12 +2870,24 @@ void bpf_link_inc(struct bpf_link *link)
atomic64_inc(&link->refcnt);
}
+static void bpf_link_dealloc(struct bpf_link *link)
+{
+ /* now that we know that bpf_link itself can't be reached, put underlying BPF program */
+ if (link->prog)
+ bpf_prog_put(link->prog);
+
+ /* free bpf_link and its containing memory */
+ if (link->ops->dealloc_deferred)
+ link->ops->dealloc_deferred(link);
+ else
+ link->ops->dealloc(link);
+}
+
static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
{
struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
- /* free bpf_link and its containing memory */
- link->ops->dealloc_deferred(link);
+ bpf_link_dealloc(link);
}
static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
@@ -2897,7 +2909,6 @@ static void bpf_link_free(struct bpf_link *link)
sleepable = link->prog->aux->sleepable;
/* detach BPF program, clean up used resources */
ops->release(link);
- bpf_prog_put(link->prog);
}
if (ops->dealloc_deferred) {
/* schedule BPF link deallocation; if underlying BPF program
@@ -2908,8 +2919,9 @@ static void bpf_link_free(struct bpf_link *link)
call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
else
call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
- } else if (ops->dealloc)
- ops->dealloc(link);
+ } else if (ops->dealloc) {
+ bpf_link_dealloc(link);
+ }
}
static void bpf_link_put_deferred(struct work_struct *work)
--
2.43.0
next prev parent reply other threads:[~2024-12-04 17:01 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-04 15:49 [PATCH AUTOSEL 6.6 01/24] pinctrl: freescale: fix COMPILE_TEST error with PINCTRL_IMX_SCU Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 02/24] tracing/ftrace: disable preemption in syscall probe Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 03/24] tracing: Use atomic64_inc_return() in trace_clock_counter() Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 04/24] tools/rtla: fix collision with glibc sched_attr/sched_set_attr Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 05/24] rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 06/24] scsi: hisi_sas: Add cond_resched() for no forced preemption model Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 07/24] rtla/utils: Add idle state disabling via libcpupower Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 08/24] pinmux: Use sequential access to access desc->pinmux data Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 09/24] scsi: ufs: core: Make DMA mask configuration more flexible Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 10/24] mfd: axp20x: Allow multiple regulators Sasha Levin
2024-12-04 15:49 ` Sasha Levin [this message]
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 12/24] scsi: lpfc: Call lpfc_sli4_queue_unset() in restart and rmmod paths Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 13/24] clk: qcom: rcg2: add clk_rcg2_shared_floor_ops Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 14/24] clk: qcom: rpmh: add support for SAR2130P Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 15/24] clk: qcom: tcsrcc-sm8550: add SAR2130P support Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 16/24] leds: class: Protect brightness_show() with led_cdev->led_access mutex Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 17/24] scsi: st: Don't modify unknown block number in MTIOCGET Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 18/24] scsi: st: Add MTIOCGET and MTLOAD to ioctls allowed after device reset Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 19/24] pinctrl: qcom-pmic-gpio: add support for PM8937 Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 20/24] pinctrl: qcom: spmi-mpp: Add PM8937 compatible Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 21/24] thermal/drivers/qcom/tsens-v1: Add support for MSM8937 tsens Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 22/24] nvdimm: rectify the illogical code within nd_dax_probe() Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 23/24] smb: client: memcpy() with surrounding object base address Sasha Levin
2024-12-04 15:49 ` [PATCH AUTOSEL 6.6 24/24] verification/dot2: Improve dot parser robustness Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241204155003.2213733-11-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jrife@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox