* [PATCH bpf-next v1 1/3] bpf: Enforce RCU protection for KF_RCU_PROTECTED
2025-09-15 2:47 [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU Kumar Kartikeya Dwivedi
@ 2025-09-15 2:47 ` Kumar Kartikeya Dwivedi
2025-09-15 2:47 ` [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag Kumar Kartikeya Dwivedi
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-09-15 2:47 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, Andrea Righi, kkd,
kernel-team
Currently, KF_RCU_PROTECTED only applies to iterator APIs and that too
in a convoluted fashion: the presence of this flag on the kfunc is used
to set MEM_RCU in iterator type, and the lack of RCU protection results
in an error only later, once next() or destroy() methods are invoked on
the iterator. While there is no bug, this is certainly a bit
unintuitive, and makes the enforcement of the flag iterator specific.
In the interest of making this flag useful for other upcoming kfuncs,
e.g. scx_bpf_cpu_curr() [0][1], add enforcement for invoking the kfunc
in an RCU critical section in general.
This would also mean that iterator APIs using KF_RCU_PROTECTED will
error out earlier, instead of throwing an error for lack of RCU CS
protection when next() or destroy() methods are invoked.
[0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
[1]: https://lore.kernel.org/all/20250909195709.92669-1-arighi@nvidia.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
Documentation/bpf/kfuncs.rst | 13 ++++++++++++-
kernel/bpf/verifier.c | 5 +++++
.../testing/selftests/bpf/progs/cgroup_read_xattr.c | 2 +-
.../selftests/bpf/progs/iters_task_failure.c | 4 ++--
4 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ae468b781d31..18ba1f7c26b3 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -335,9 +335,20 @@ consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
also be KF_RET_NULL.
+2.4.8 KF_RCU_PROTECTED flag
+---------------------------
+
+The KF_RCU_PROTECTED flag is used to indicate that the kfunc must be invoked in
+an RCU critical section. This is assumed by default in non-sleepable programs,
+and must be explicitly ensured by calling ``bpf_rcu_read_lock`` for sleepable
+ones. The flag is distinct from the ``KF_RCU`` flag, which only ensures that its
+arguments are at least RCU protected pointers. This may transitively imply that
+RCU protection is ensured, but it does not work in cases of kfuncs which require
+RCU protection but do not take RCU protected arguments.
+
.. _KF_deprecated_flag:
-2.4.8 KF_DEPRECATED flag
+2.4.9 KF_DEPRECATED flag
------------------------
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 17fe623400a5..aa7c82ab50b9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13916,6 +13916,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return -EACCES;
}
+ if (is_kfunc_rcu_protected(&meta) && !in_rcu_cs(env)) {
+ verbose(env, "kernel func %s requires RCU critical section protection\n", func_name);
+ return -EACCES;
+ }
+
/* In case of release function, we get register number of refcounted
* PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now.
*/
diff --git a/tools/testing/selftests/bpf/progs/cgroup_read_xattr.c b/tools/testing/selftests/bpf/progs/cgroup_read_xattr.c
index 092db1d0435e..88e13e17ec9e 100644
--- a/tools/testing/selftests/bpf/progs/cgroup_read_xattr.c
+++ b/tools/testing/selftests/bpf/progs/cgroup_read_xattr.c
@@ -73,7 +73,7 @@ int BPF_PROG(use_css_iter_non_sleepable)
}
SEC("lsm.s/socket_connect")
-__failure __msg("expected an RCU CS")
+__failure __msg("kernel func bpf_iter_css_new requires RCU critical section protection")
int BPF_PROG(use_css_iter_sleepable_missing_rcu_lock)
{
u64 cgrp_id = bpf_get_current_cgroup_id();
diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c
index 6b1588d70652..fe3663dedbe1 100644
--- a/tools/testing/selftests/bpf/progs/iters_task_failure.c
+++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c
@@ -15,7 +15,7 @@ void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
-__failure __msg("expected an RCU CS when using bpf_iter_task_next")
+__failure __msg("kernel func bpf_iter_task_new requires RCU critical section protection")
int BPF_PROG(iter_tasks_without_lock)
{
struct task_struct *pos;
@@ -27,7 +27,7 @@ int BPF_PROG(iter_tasks_without_lock)
}
SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
-__failure __msg("expected an RCU CS when using bpf_iter_css_next")
+__failure __msg("kernel func bpf_iter_css_new requires RCU critical section protection")
int BPF_PROG(iter_css_without_lock)
{
u64 cg_id = bpf_get_current_cgroup_id();
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag
2025-09-15 2:47 [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU Kumar Kartikeya Dwivedi
2025-09-15 2:47 ` [PATCH bpf-next v1 1/3] bpf: Enforce RCU protection for KF_RCU_PROTECTED Kumar Kartikeya Dwivedi
@ 2025-09-15 2:47 ` Kumar Kartikeya Dwivedi
2025-09-15 6:14 ` Andrea Righi
2025-09-15 2:47 ` [PATCH bpf-next v1 3/3] selftests/bpf: Add tests for KF_RET_RCU Kumar Kartikeya Dwivedi
2025-09-15 12:20 ` [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU Andrea Righi
3 siblings, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-09-15 2:47 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, Andrea Righi, kkd,
kernel-team
Add a kfunc annotation 'KF_RET_RCU' to signal that the return type must
be marked MEM_RCU, to return objects that are RCU protected. Naturally,
this must imply that the kfunc is invoked in an RCU critical section,
and thus the presence of this flag implies the presence of the
KF_RCU_PROTECTED flag. Upcoming kfunc scx_bpf_cpu_curr() [0] will be
made to make use of this flag.
[0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
Documentation/bpf/kfuncs.rst | 13 +++++++++++--
include/linux/btf.h | 1 +
kernel/bpf/verifier.c | 7 +++++++
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 18ba1f7c26b3..7d1b7009338b 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -346,10 +346,19 @@ arguments are at least RCU protected pointers. This may transitively imply that
RCU protection is ensured, but it does not work in cases of kfuncs which require
RCU protection but do not take RCU protected arguments.
+2.4.9 KF_RET_RCU flag
+---------------------
+
+The KF_RET_RCU flag is used for kfuncs which return pointers to RCU protected
+objects. Since this only works when the invocation of the kfunc is made in an
+active RCU critical section, the usage of this flag implies ``KF_RCU_PROTECTED``
+flag automatically. This flag may be combined with other return value modifiers,
+such as ``KF_RET_NULL``.
+
.. _KF_deprecated_flag:
-2.4.9 KF_DEPRECATED flag
-------------------------
+2.4.10 KF_DEPRECATED flag
+-------------------------
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
changed or removed in a subsequent kernel release. A kfunc that is
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 9eda6b113f9b..97205b8a938c 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -79,6 +79,7 @@
#define KF_ARENA_RET (1 << 13) /* kfunc returns an arena pointer */
#define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
#define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
+#define KF_RET_RCU ((1 << 16) | KF_RCU_PROTECTED) /* kfunc returns an RCU protected pointer, implies KF_RCU_PROTECTED */
/*
* Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index aa7c82ab50b9..f1cc602ed556 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12342,6 +12342,11 @@ static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
return meta->kfunc_flags & KF_RET_NULL;
}
+static bool is_kfunc_ret_rcu(struct bpf_kfunc_call_arg_meta *meta)
+{
+ return meta->kfunc_flags & KF_RET_RCU;
+}
+
static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
{
return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock];
@@ -14042,6 +14047,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
regs[BPF_REG_0].type |= PTR_UNTRUSTED;
+ else if (is_kfunc_ret_rcu(&meta))
+ regs[BPF_REG_0].type |= MEM_RCU;
if (is_iter_next_kfunc(&meta)) {
struct bpf_reg_state *cur_iter;
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag
2025-09-15 2:47 ` [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag Kumar Kartikeya Dwivedi
@ 2025-09-15 6:14 ` Andrea Righi
2025-09-15 7:13 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2025-09-15 6:14 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, kkd, kernel-team
Hi Kumar,
thanks for looking at this! Comment below.
On Mon, Sep 15, 2025 at 02:47:30AM +0000, Kumar Kartikeya Dwivedi wrote:
> Add a kfunc annotation 'KF_RET_RCU' to signal that the return type must
> be marked MEM_RCU, to return objects that are RCU protected. Naturally,
> this must imply that the kfunc is invoked in an RCU critical section,
> and thus the presence of this flag implies the presence of the
> KF_RCU_PROTECTED flag. Upcoming kfunc scx_bpf_cpu_curr() [0] will be
> made to make use of this flag.
I'm not sure we actually need two separate annotations, I can't think of a
case where KF_RCU_PROTECTED would be useful without also having KF_RET_RCU.
What I mean is: if the kfunc is only meant to be called inside an RCU
critical section, but doesn't return an RCU-protected pointer, then we can
simply add rcu_read_lock/unlock() internally in the kfunc. And for kfuncs
that take RCU-protected arguments we already have KF_RCU.
So it seems sufficient to have a single annotation that implements the
semantic "this kfunc returns an RCU-protected pointer".
What do you think?
Thanks,
-Andrea
>
> [0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> Documentation/bpf/kfuncs.rst | 13 +++++++++++--
> include/linux/btf.h | 1 +
> kernel/bpf/verifier.c | 7 +++++++
> 3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 18ba1f7c26b3..7d1b7009338b 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -346,10 +346,19 @@ arguments are at least RCU protected pointers. This may transitively imply that
> RCU protection is ensured, but it does not work in cases of kfuncs which require
> RCU protection but do not take RCU protected arguments.
>
> +2.4.9 KF_RET_RCU flag
> +---------------------
> +
> +The KF_RET_RCU flag is used for kfuncs which return pointers to RCU protected
> +objects. Since this only works when the invocation of the kfunc is made in an
> +active RCU critical section, the usage of this flag implies ``KF_RCU_PROTECTED``
> +flag automatically. This flag may be combined with other return value modifiers,
> +such as ``KF_RET_NULL``.
> +
> .. _KF_deprecated_flag:
>
> -2.4.9 KF_DEPRECATED flag
> -------------------------
> +2.4.10 KF_DEPRECATED flag
> +-------------------------
>
> The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
> changed or removed in a subsequent kernel release. A kfunc that is
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 9eda6b113f9b..97205b8a938c 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -79,6 +79,7 @@
> #define KF_ARENA_RET (1 << 13) /* kfunc returns an arena pointer */
> #define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
> #define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
> +#define KF_RET_RCU ((1 << 16) | KF_RCU_PROTECTED) /* kfunc returns an RCU protected pointer, implies KF_RCU_PROTECTED */
>
> /*
> * Tag marking a kernel function as a kfunc. This is meant to minimize the
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index aa7c82ab50b9..f1cc602ed556 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12342,6 +12342,11 @@ static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> return meta->kfunc_flags & KF_RET_NULL;
> }
>
> +static bool is_kfunc_ret_rcu(struct bpf_kfunc_call_arg_meta *meta)
> +{
> + return meta->kfunc_flags & KF_RET_RCU;
> +}
> +
> static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
> {
> return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock];
> @@ -14042,6 +14047,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>
> if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
> regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> + else if (is_kfunc_ret_rcu(&meta))
> + regs[BPF_REG_0].type |= MEM_RCU;
>
> if (is_iter_next_kfunc(&meta)) {
> struct bpf_reg_state *cur_iter;
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag
2025-09-15 6:14 ` Andrea Righi
@ 2025-09-15 7:13 ` Kumar Kartikeya Dwivedi
2025-09-15 8:02 ` Andrea Righi
0 siblings, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-09-15 7:13 UTC (permalink / raw)
To: Andrea Righi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, kkd, kernel-team
On Mon, 15 Sept 2025 at 08:14, Andrea Righi <arighi@nvidia.com> wrote:
>
> Hi Kumar,
>
> thanks for looking at this! Comment below.
>
> On Mon, Sep 15, 2025 at 02:47:30AM +0000, Kumar Kartikeya Dwivedi wrote:
> > Add a kfunc annotation 'KF_RET_RCU' to signal that the return type must
> > be marked MEM_RCU, to return objects that are RCU protected. Naturally,
> > this must imply that the kfunc is invoked in an RCU critical section,
> > and thus the presence of this flag implies the presence of the
> > KF_RCU_PROTECTED flag. Upcoming kfunc scx_bpf_cpu_curr() [0] will be
> > made to make use of this flag.
>
> I'm not sure we actually need two separate annotations, I can't think of a
> case where KF_RCU_PROTECTED would be useful without also having KF_RET_RCU.
>
> What I mean is: if the kfunc is only meant to be called inside an RCU
> critical section, but doesn't return an RCU-protected pointer, then we can
> simply add rcu_read_lock/unlock() internally in the kfunc. And for kfuncs
> that take RCU-protected arguments we already have KF_RCU.
>
> So it seems sufficient to have a single annotation that implements the
> semantic "this kfunc returns an RCU-protected pointer".
Yeah, that seems reasonable in general, but we already have iterator
APIs (bpf_iter_task_{new,next,destroy}()) that collectively require
RCU CS to be open throughout the three calls. That's why I just
cleaned up the internal logic for KF_RCU_PROTECTED and made KF_RET_RCU
as what you're suggesting (i.e., fold KF_RCU_PROTECTED into it), which
I assume will be most useful for the majority of kfuncs that are not
iterators.
>
> What do you think?
>
> Thanks,
> -Andrea
>
> >
> > [0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > Documentation/bpf/kfuncs.rst | 13 +++++++++++--
> > include/linux/btf.h | 1 +
> > kernel/bpf/verifier.c | 7 +++++++
> > 3 files changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 18ba1f7c26b3..7d1b7009338b 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -346,10 +346,19 @@ arguments are at least RCU protected pointers. This may transitively imply that
> > RCU protection is ensured, but it does not work in cases of kfuncs which require
> > RCU protection but do not take RCU protected arguments.
> >
> > +2.4.9 KF_RET_RCU flag
> > +---------------------
> > +
> > +The KF_RET_RCU flag is used for kfuncs which return pointers to RCU protected
> > +objects. Since this only works when the invocation of the kfunc is made in an
> > +active RCU critical section, the usage of this flag implies ``KF_RCU_PROTECTED``
> > +flag automatically. This flag may be combined with other return value modifiers,
> > +such as ``KF_RET_NULL``.
> > +
> > .. _KF_deprecated_flag:
> >
> > -2.4.9 KF_DEPRECATED flag
> > -------------------------
> > +2.4.10 KF_DEPRECATED flag
> > +-------------------------
> >
> > The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
> > changed or removed in a subsequent kernel release. A kfunc that is
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 9eda6b113f9b..97205b8a938c 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -79,6 +79,7 @@
> > #define KF_ARENA_RET (1 << 13) /* kfunc returns an arena pointer */
> > #define KF_ARENA_ARG1 (1 << 14) /* kfunc takes an arena pointer as its first argument */
> > #define KF_ARENA_ARG2 (1 << 15) /* kfunc takes an arena pointer as its second argument */
> > +#define KF_RET_RCU ((1 << 16) | KF_RCU_PROTECTED) /* kfunc returns an RCU protected pointer, implies KF_RCU_PROTECTED */
> >
> > /*
> > * Tag marking a kernel function as a kfunc. This is meant to minimize the
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index aa7c82ab50b9..f1cc602ed556 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12342,6 +12342,11 @@ static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> > return meta->kfunc_flags & KF_RET_NULL;
> > }
> >
> > +static bool is_kfunc_ret_rcu(struct bpf_kfunc_call_arg_meta *meta)
> > +{
> > + return meta->kfunc_flags & KF_RET_RCU;
> > +}
> > +
> > static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
> > {
> > return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock];
> > @@ -14042,6 +14047,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >
> > if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
> > regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> > + else if (is_kfunc_ret_rcu(&meta))
> > + regs[BPF_REG_0].type |= MEM_RCU;
> >
> > if (is_iter_next_kfunc(&meta)) {
> > struct bpf_reg_state *cur_iter;
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag
2025-09-15 7:13 ` Kumar Kartikeya Dwivedi
@ 2025-09-15 8:02 ` Andrea Righi
2025-09-15 17:55 ` Alexei Starovoitov
0 siblings, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2025-09-15 8:02 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, kkd, kernel-team
On Mon, Sep 15, 2025 at 09:13:15AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Mon, 15 Sept 2025 at 08:14, Andrea Righi <arighi@nvidia.com> wrote:
> >
> > Hi Kumar,
> >
> > thanks for looking at this! Comment below.
> >
> > On Mon, Sep 15, 2025 at 02:47:30AM +0000, Kumar Kartikeya Dwivedi wrote:
> > > Add a kfunc annotation 'KF_RET_RCU' to signal that the return type must
> > > be marked MEM_RCU, to return objects that are RCU protected. Naturally,
> > > this must imply that the kfunc is invoked in an RCU critical section,
> > > and thus the presence of this flag implies the presence of the
> > > KF_RCU_PROTECTED flag. Upcoming kfunc scx_bpf_cpu_curr() [0] will be
> > > made to make use of this flag.
> >
> > I'm not sure we actually need two separate annotations, I can't think of a
> > case where KF_RCU_PROTECTED would be useful without also having KF_RET_RCU.
> >
> > What I mean is: if the kfunc is only meant to be called inside an RCU
> > critical section, but doesn't return an RCU-protected pointer, then we can
> > simply add rcu_read_lock/unlock() internally in the kfunc. And for kfuncs
> > that take RCU-protected arguments we already have KF_RCU.
> >
> > So it seems sufficient to have a single annotation that implements the
> > semantic "this kfunc returns an RCU-protected pointer".
>
> Yeah, that seems reasonable in general, but we already have iterator
> APIs (bpf_iter_task_{new,next,destroy}()) that collectively require
> RCU CS to be open throughout the three calls. That's why I just
> cleaned up the internal logic for KF_RCU_PROTECTED and made KF_RET_RCU
> as what you're suggesting (i.e., fold KF_RCU_PROTECTED into it), which
> I assume will be most useful for the majority of kfuncs that are not
> iterators.
Right, my suggestion was to fold the KF_RET_RCU semantics into
KF_RCU_PROTECTED, even if the kfunc doesn't return anything (assuming it's
possible). Then annotate both iterators and kfuncs that require RCU as
KF_RCU_PROTECTED. This should handle both, right?
In any case, I don't have a strong opinion on that and I'm also happy with
KF_RCU_RET and KF_RCU_PROTECTED if it gives more flexibility.
Thanks,
-Andrea
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag
2025-09-15 8:02 ` Andrea Righi
@ 2025-09-15 17:55 ` Alexei Starovoitov
2025-09-17 2:18 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2025-09-15 17:55 UTC (permalink / raw)
To: Andrea Righi
Cc: Kumar Kartikeya Dwivedi, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Tejun Heo,
kkd, Kernel Team
On Mon, Sep 15, 2025 at 1:03 AM Andrea Righi <arighi@nvidia.com> wrote:
>
> On Mon, Sep 15, 2025 at 09:13:15AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Mon, 15 Sept 2025 at 08:14, Andrea Righi <arighi@nvidia.com> wrote:
> > >
> > > Hi Kumar,
> > >
> > > thanks for looking at this! Comment below.
> > >
> > > On Mon, Sep 15, 2025 at 02:47:30AM +0000, Kumar Kartikeya Dwivedi wrote:
> > > > Add a kfunc annotation 'KF_RET_RCU' to signal that the return type must
> > > > be marked MEM_RCU, to return objects that are RCU protected. Naturally,
> > > > this must imply that the kfunc is invoked in an RCU critical section,
> > > > and thus the presence of this flag implies the presence of the
> > > > KF_RCU_PROTECTED flag. Upcoming kfunc scx_bpf_cpu_curr() [0] will be
> > > > made to make use of this flag.
> > >
> > > I'm not sure we actually need two separate annotations, I can't think of a
> > > case where KF_RCU_PROTECTED would be useful without also having KF_RET_RCU.
> > >
> > > What I mean is: if the kfunc is only meant to be called inside an RCU
> > > critical section, but doesn't return an RCU-protected pointer, then we can
> > > simply add rcu_read_lock/unlock() internally in the kfunc. And for kfuncs
> > > that take RCU-protected arguments we already have KF_RCU.
> > >
> > > So it seems sufficient to have a single annotation that implements the
> > > semantic "this kfunc returns an RCU-protected pointer".
> >
> > Yeah, that seems reasonable in general, but we already have iterator
> > APIs (bpf_iter_task_{new,next,destroy}()) that collectively require
> > RCU CS to be open throughout the three calls. That's why I just
> > cleaned up the internal logic for KF_RCU_PROTECTED and made KF_RET_RCU
> > as what you're suggesting (i.e., fold KF_RCU_PROTECTED into it), which
> > I assume will be most useful for the majority of kfuncs that are not
> > iterators.
>
> Right, my suggestion was to fold the KF_RET_RCU semantics into
> KF_RCU_PROTECTED, even if the kfunc doesn't return anything (assuming it's
> possible). Then annotate both iterators and kfuncs that require RCU as
> KF_RCU_PROTECTED. This should handle both, right?
+1 to this suggestion.
I think we can extend KF_RCU_PROTECTED semantics to mean
that the returned pointer is only usable in RCU CS.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag
2025-09-15 17:55 ` Alexei Starovoitov
@ 2025-09-17 2:18 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-09-17 2:18 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrea Righi, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Tejun Heo,
kkd, Kernel Team
On Mon, 15 Sept 2025 at 19:55, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Sep 15, 2025 at 1:03 AM Andrea Righi <arighi@nvidia.com> wrote:
> >
> > On Mon, Sep 15, 2025 at 09:13:15AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Mon, 15 Sept 2025 at 08:14, Andrea Righi <arighi@nvidia.com> wrote:
> > > >
> > > > Hi Kumar,
> > > >
> > > > thanks for looking at this! Comment below.
> > > >
> > > > On Mon, Sep 15, 2025 at 02:47:30AM +0000, Kumar Kartikeya Dwivedi wrote:
> > > > > Add a kfunc annotation 'KF_RET_RCU' to signal that the return type must
> > > > > be marked MEM_RCU, to return objects that are RCU protected. Naturally,
> > > > > this must imply that the kfunc is invoked in an RCU critical section,
> > > > > and thus the presence of this flag implies the presence of the
> > > > > KF_RCU_PROTECTED flag. Upcoming kfunc scx_bpf_cpu_curr() [0] will be
> > > > > made to make use of this flag.
> > > >
> > > > I'm not sure we actually need two separate annotations, I can't think of a
> > > > case where KF_RCU_PROTECTED would be useful without also having KF_RET_RCU.
> > > >
> > > > What I mean is: if the kfunc is only meant to be called inside an RCU
> > > > critical section, but doesn't return an RCU-protected pointer, then we can
> > > > simply add rcu_read_lock/unlock() internally in the kfunc. And for kfuncs
> > > > that take RCU-protected arguments we already have KF_RCU.
> > > >
> > > > So it seems sufficient to have a single annotation that implements the
> > > > semantic "this kfunc returns an RCU-protected pointer".
> > >
> > > Yeah, that seems reasonable in general, but we already have iterator
> > > APIs (bpf_iter_task_{new,next,destroy}()) that collectively require
> > > RCU CS to be open throughout the three calls. That's why I just
> > > cleaned up the internal logic for KF_RCU_PROTECTED and made KF_RET_RCU
> > > as what you're suggesting (i.e., fold KF_RCU_PROTECTED into it), which
> > > I assume will be most useful for the majority of kfuncs that are not
> > > iterators.
> >
> > Right, my suggestion was to fold the KF_RET_RCU semantics into
> > KF_RCU_PROTECTED, even if the kfunc doesn't return anything (assuming it's
> > possible). Then annotate both iterators and kfuncs that require RCU as
> > KF_RCU_PROTECTED. This should handle both, right?
>
> +1 to this suggestion.
> I think we can extend KF_RCU_PROTECTED semantics to mean
> that the returned pointer is only usable in RCU CS.
Ok, I'll drop KF_RET_RCU and use KF_RCU_PROTECTED to imply it in v2.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH bpf-next v1 3/3] selftests/bpf: Add tests for KF_RET_RCU
2025-09-15 2:47 [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU Kumar Kartikeya Dwivedi
2025-09-15 2:47 ` [PATCH bpf-next v1 1/3] bpf: Enforce RCU protection for KF_RCU_PROTECTED Kumar Kartikeya Dwivedi
2025-09-15 2:47 ` [PATCH bpf-next v1 2/3] bpf: Add support for KF_RET_RCU flag Kumar Kartikeya Dwivedi
@ 2025-09-15 2:47 ` Kumar Kartikeya Dwivedi
2025-09-15 12:20 ` [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU Andrea Righi
3 siblings, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-09-15 2:47 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, Andrea Righi, kkd,
kernel-team
Add a couple of test cases to ensure RCU protection is kicked in
automatically, and the return type is as expected.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
.../selftests/bpf/progs/iters_testmod.c | 23 +++++++++++++++++++
.../selftests/bpf/test_kmods/bpf_testmod.c | 6 +++++
.../bpf/test_kmods/bpf_testmod_kfunc.h | 1 +
3 files changed, 30 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/iters_testmod.c b/tools/testing/selftests/bpf/progs/iters_testmod.c
index 9e4b45201e69..ab4a519e8004 100644
--- a/tools/testing/selftests/bpf/progs/iters_testmod.c
+++ b/tools/testing/selftests/bpf/progs/iters_testmod.c
@@ -123,3 +123,26 @@ int iter_next_ptr_mem_not_trusted(const void *ctx)
bpf_iter_num_destroy(&num_it);
return 0;
}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("kernel func bpf_kfunc_ret_rcu_test requires RCU critical section protection")
+int iter_ret_rcu_test_protected(const void *ctx)
+{
+ struct task_struct *p;
+
+ p = bpf_kfunc_ret_rcu_test();
+ return p->pid;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("R1 type=rcu_ptr_or_null_ expected=")
+int iter_ret_rcu_test_type(const void *ctx)
+{
+ struct task_struct *p;
+
+ bpf_rcu_read_lock();
+ p = bpf_kfunc_ret_rcu_test();
+ bpf_this_cpu_ptr(p);
+ bpf_rcu_read_unlock();
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index 2beb9b2fcbd8..fac0e08fa803 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -218,6 +218,11 @@ __bpf_kfunc void bpf_kfunc_rcu_task_test(struct task_struct *ptr)
{
}
+__bpf_kfunc struct task_struct *bpf_kfunc_ret_rcu_test(void)
+{
+ return NULL;
+}
+
__bpf_kfunc struct bpf_testmod_ctx *
bpf_testmod_ctx_create(int *err)
{
@@ -623,6 +628,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_trusted_vma_test, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_kfunc_trusted_task_test, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_kfunc_trusted_num_test, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_kfunc_rcu_task_test, KF_RCU)
+BTF_ID_FLAGS(func, bpf_kfunc_ret_rcu_test, KF_RET_NULL | KF_RET_RCU)
BTF_ID_FLAGS(func, bpf_testmod_ctx_create, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_testmod_ctx_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_testmod_ops3_call_test_1)
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
index 286e7faa4754..3281f1e446cc 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_kfunc.h
@@ -158,6 +158,7 @@ void bpf_kfunc_trusted_vma_test(struct vm_area_struct *ptr) __ksym;
void bpf_kfunc_trusted_task_test(struct task_struct *ptr) __ksym;
void bpf_kfunc_trusted_num_test(int *ptr) __ksym;
void bpf_kfunc_rcu_task_test(struct task_struct *ptr) __ksym;
+struct task_struct *bpf_kfunc_ret_rcu_test(void) __ksym;
int bpf_kfunc_multi_st_ops_test_1(struct st_ops_args *args, u32 id) __ksym;
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU
2025-09-15 2:47 [PATCH bpf-next v1 0/3] Update KF_RCU_PROTECTED, add KF_RET_RCU Kumar Kartikeya Dwivedi
` (2 preceding siblings ...)
2025-09-15 2:47 ` [PATCH bpf-next v1 3/3] selftests/bpf: Add tests for KF_RET_RCU Kumar Kartikeya Dwivedi
@ 2025-09-15 12:20 ` Andrea Righi
3 siblings, 0 replies; 10+ messages in thread
From: Andrea Righi @ 2025-09-15 12:20 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Tejun Heo, kkd, kernel-team
On Mon, Sep 15, 2025 at 02:47:28AM +0000, Kumar Kartikeya Dwivedi wrote:
> Currently, KF_RCU_PROTECTED only applies to iterator APIs and that too
> in a convoluted fashion: the presence of this flag on the kfunc is used
> to set MEM_RCU in iterator type, and the lack of RCU protection results
> in an error only later, once next() or destroy() methods are invoked on
> the iterator. While there is no bug, this is certainly a bit unintuitive,
> and makes the enforcement of the flag iterator specific.
>
> In the interest of making this flag useful for other upcoming kfuncs,
> e.g. scx_bpf_cpu_curr() [0][1], add enforcement for invoking the kfunc
> in an RCU critical section in general.
>
> In addition to this, the aforementioned kfunc also needs to return an
> RCU protected pointer, which currently has no generic kfunc flag or
> annotation. Add such a flag as well while we are at it.
>
> [0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
> [1]: https://lore.kernel.org/all/20250909195709.92669-1-arighi@nvidia.com
In the meantime, everything looks good from a sched_ext perspective. With
this applied we can implement the correct scx_bpf_cpu_curr() behavior.
Tested-by: Andrea Righi <arighi@nvidia.com>
Thanks,
-Andrea
^ permalink raw reply [flat|nested] 10+ messages in thread