bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/2] Remove use of current->cgns in bpf_cgroup_from_id
@ 2025-08-11 19:58 Kumar Kartikeya Dwivedi
  2025-08-11 19:59 ` [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace Kumar Kartikeya Dwivedi
  2025-08-11 19:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 7+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-11 19:58 UTC (permalink / raw)
  To: bpf, tj
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Dan Schatzberg, kkd,
	kernel-team

bpf_cgroup_from_id currently ends up doing a check on whether the cgroup
being looked up is a descendant of the root cgroup of the current task's
cgroup namespace. This leads to unreliable results since this kfunc can
be invoked from any arbitrary context, for any arbitrary value of
current. Fix this by removing namespace-awarness in the kfunc, and
include a test that detects such a case and fails without the fix.

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20250811175045.1055202-1-memxor@gmail.com

 * Add Ack from Tejun.
 * Fix selftest to perform namespace migration and cgroup setup in a
   child process to avoid changing test_progs namespace.

Kumar Kartikeya Dwivedi (2):
  bpf: Do not limit bpf_cgroup_from_id to current's namespace
  selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root
    cgns

 include/linux/cgroup.h                        |  2 +-
 kernel/bpf/cgroup_iter.c                      |  2 +-
 kernel/bpf/helpers.c                          |  2 +-
 kernel/cgroup/cgroup.c                        |  7 +-
 .../selftests/bpf/prog_tests/cgrp_kfunc.c     | 76 +++++++++++++++++++
 .../selftests/bpf/progs/cgrp_kfunc_success.c  | 12 +++
 6 files changed, 97 insertions(+), 4 deletions(-)


base-commit: fa479132845e94b60068fad01c2a9979b3efe2dc
-- 
2.47.3


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace
  2025-08-11 19:58 [PATCH bpf-next v2 0/2] Remove use of current->cgns in bpf_cgroup_from_id Kumar Kartikeya Dwivedi
@ 2025-08-11 19:59 ` Kumar Kartikeya Dwivedi
  2025-08-12 23:20   ` Andrii Nakryiko
  2025-08-11 19:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 7+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-11 19:59 UTC (permalink / raw)
  To: bpf, tj
  Cc: Dan Schatzberg, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, kkd,
	kernel-team

The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the
cgroup corresponding to a given cgroup ID. This helper can be called in
a lot of contexts where the current thread can be random. A recent
example was its use in sched_ext's ops.tick(), to obtain the root cgroup
pointer. Since the current task can be whatever random user space task
preempted by the timer tick, this makes the behavior of the helper
unreliable.

Resolve this by refactoring cgroup_get_from_id to take a parameter to
elide the cgroup_is_descendant check when root_cgns parameter is set to
true.

There is no compatibility breakage here, since changing the namespace
against which the lookup is being done to the root cgroup namespace only
permits a wider set of lookups to succeed now. The cgroup IDs across
namespaces are globally unique, and thus don't need to be retranslated.

Reported-by: Dan Schatzberg <dschatzberg@meta.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/cgroup.h   | 2 +-
 kernel/bpf/cgroup_iter.c | 2 +-
 kernel/bpf/helpers.c     | 2 +-
 kernel/cgroup/cgroup.c   | 7 ++++++-
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b18fb5fcb38e..da757a496fbe 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -650,7 +650,7 @@ static inline void cgroup_kthread_ready(void)
 }
 
 void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
-struct cgroup *cgroup_get_from_id(u64 id);
+struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns);
 #else /* !CONFIG_CGROUPS */
 
 struct cgroup_subsys_state;
diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
index f04a468cf6a7..49234d035583 100644
--- a/kernel/bpf/cgroup_iter.c
+++ b/kernel/bpf/cgroup_iter.c
@@ -212,7 +212,7 @@ static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
 	if (fd)
 		cgrp = cgroup_v1v2_get_from_fd(fd);
 	else if (id)
-		cgrp = cgroup_get_from_id(id);
+		cgrp = cgroup_get_from_id(id, false);
 	else /* walk the entire hierarchy by default. */
 		cgrp = cgroup_get_from_path("/");
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6b4877e85a68..12466103917f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2537,7 +2537,7 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
 {
 	struct cgroup *cgrp;
 
-	cgrp = cgroup_get_from_id(cgid);
+	cgrp = cgroup_get_from_id(cgid, true);
 	if (IS_ERR(cgrp))
 		return NULL;
 	return cgrp;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 312c6a8b55bb..b490e1e0d2c4 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6345,10 +6345,11 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 /*
  * cgroup_get_from_id : get the cgroup associated with cgroup id
  * @id: cgroup id
+ * @root_cgns: Select root cgroup namespace instead of current's.
  * On success return the cgrp or ERR_PTR on failure
  * Only cgroups within current task's cgroup NS are valid.
  */
-struct cgroup *cgroup_get_from_id(u64 id)
+struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns)
 {
 	struct kernfs_node *kn;
 	struct cgroup *cgrp, *root_cgrp;
@@ -6374,6 +6375,10 @@ struct cgroup *cgroup_get_from_id(u64 id)
 	if (!cgrp)
 		return ERR_PTR(-ENOENT);
 
+	/* We don't need to namespace this operation against current. */
+	if (root_cgns)
+		return cgrp;
+
 	root_cgrp = current_cgns_cgroup_dfl();
 	if (!cgroup_is_descendant(cgrp, root_cgrp)) {
 		cgroup_put(cgrp);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns
  2025-08-11 19:58 [PATCH bpf-next v2 0/2] Remove use of current->cgns in bpf_cgroup_from_id Kumar Kartikeya Dwivedi
  2025-08-11 19:59 ` [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace Kumar Kartikeya Dwivedi
@ 2025-08-11 19:59 ` Kumar Kartikeya Dwivedi
  2025-08-11 22:51   ` Eduard Zingerman
  1 sibling, 1 reply; 7+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-11 19:59 UTC (permalink / raw)
  To: bpf, tj
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Dan Schatzberg, kkd,
	kernel-team

Make sure that we only switch the cgroup namespace and enter a new
cgroup in a child process separate from test_progs, to not mess up the
environment for subsequent tests.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/cgrp_kfunc.c     | 76 +++++++++++++++++++
 .../selftests/bpf/progs/cgrp_kfunc_success.c  | 12 +++
 2 files changed, 88 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
index adda85f97058..e75a29728f9c 100644
--- a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
@@ -4,6 +4,7 @@
 #define _GNU_SOURCE
 #include <cgroup_helpers.h>
 #include <test_progs.h>
+#include <sched.h>
 
 #include "cgrp_kfunc_failure.skel.h"
 #include "cgrp_kfunc_success.skel.h"
@@ -87,6 +88,78 @@ static const char * const success_tests[] = {
 	"test_cgrp_from_id",
 };
 
+static void test_cgrp_from_id_ns(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct cgrp_kfunc_success *skel;
+	struct bpf_program *prog;
+	int fd, pid, pipe_fd[2];
+
+	skel = open_load_cgrp_kfunc_skel();
+	if (!ASSERT_OK_PTR(skel, "open_load_skel"))
+		return;
+
+	if (!ASSERT_OK(skel->bss->err, "pre_mkdir_err"))
+		goto cleanup;
+
+	prog = bpf_object__find_program_by_name(skel->obj, "test_cgrp_from_id_ns");
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	if (!ASSERT_OK(pipe(pipe_fd), "pipe"))
+		goto cleanup;
+
+	pid = fork();
+	if (!ASSERT_GE(pid, 0, "fork result"))
+		goto pipe_cleanup;
+
+	if (pid == 0) {
+		int ret = 1;
+
+		close(pipe_fd[0]);
+		fd = create_and_get_cgroup("cgrp_from_id_ns");
+		if (!ASSERT_GE(fd, 0, "cgrp_fd"))
+			_exit(1);
+
+		if (!ASSERT_OK(join_cgroup("cgrp_from_id_ns"), "join cgrp"))
+			goto fail;
+
+		if (!ASSERT_OK(unshare(CLONE_NEWCGROUP), "unshare cgns"))
+			goto fail;
+
+		ret = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+		if (!ASSERT_OK(ret, "test run ret"))
+			goto fail;
+
+		remove_cgroup("cgrp_from_id_ns");
+
+		if (!ASSERT_OK(opts.retval, "test run retval"))
+			_exit(1);
+		ret = 0;
+		close(fd);
+		if (!ASSERT_EQ(write(pipe_fd[1], &ret, sizeof(ret)), sizeof(ret), "write pipe"))
+			_exit(1);
+
+		_exit(0);
+fail:
+		remove_cgroup("cgrp_from_id_ns");
+		_exit(1);
+	} else {
+		int res;
+
+		close(pipe_fd[1]);
+		if (!ASSERT_EQ(read(pipe_fd[0], &res, sizeof(res)), sizeof(res), "read res"))
+			goto pipe_cleanup;
+		if (!ASSERT_OK(res, "result from run"))
+			goto pipe_cleanup;
+	}
+
+pipe_cleanup:
+	close(pipe_fd[1]);
+cleanup:
+	cgrp_kfunc_success__destroy(skel);
+}
+
 void test_cgrp_kfunc(void)
 {
 	int i, err;
@@ -102,6 +175,9 @@ void test_cgrp_kfunc(void)
 		run_success_test(success_tests[i]);
 	}
 
+	if (test__start_subtest("test_cgrp_from_id_ns"))
+		test_cgrp_from_id_ns();
+
 	RUN_TESTS(cgrp_kfunc_failure);
 
 cleanup:
diff --git a/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c b/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
index 5354455a01be..02d8f160ca0e 100644
--- a/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
+++ b/tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
@@ -221,3 +221,15 @@ int BPF_PROG(test_cgrp_from_id, struct cgroup *cgrp, const char *path)
 
 	return 0;
 }
+
+SEC("syscall")
+int test_cgrp_from_id_ns(void *ctx)
+{
+	struct cgroup *cg;
+
+	cg = bpf_cgroup_from_id(1);
+	if (!cg)
+		return 42;
+	bpf_cgroup_release(cg);
+	return 0;
+}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns
  2025-08-11 19:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns Kumar Kartikeya Dwivedi
@ 2025-08-11 22:51   ` Eduard Zingerman
  2025-08-11 22:57     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 7+ messages in thread
From: Eduard Zingerman @ 2025-08-11 22:51 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, tj
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dan Schatzberg, kkd, kernel-team

On Mon, 2025-08-11 at 12:59 -0700, Kumar Kartikeya Dwivedi wrote:

[...]

> +static void test_cgrp_from_id_ns(void)
> +{
> +	LIBBPF_OPTS(bpf_test_run_opts, opts);
> +	struct cgrp_kfunc_success *skel;
> +	struct bpf_program *prog;
> +	int fd, pid, pipe_fd[2];
> +
> +	skel = open_load_cgrp_kfunc_skel();
> +	if (!ASSERT_OK_PTR(skel, "open_load_skel"))
> +		return;
> +
> +	if (!ASSERT_OK(skel->bss->err, "pre_mkdir_err"))
> +		goto cleanup;
> +
> +	prog = bpf_object__find_program_by_name(skel->obj, "test_cgrp_from_id_ns");

Nit: skel->test_cgrp_from_id_ns ?

> +	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +		goto cleanup;
> +
> +	if (!ASSERT_OK(pipe(pipe_fd), "pipe"))
> +		goto cleanup;
> +
> +	pid = fork();
> +	if (!ASSERT_GE(pid, 0, "fork result"))
> +		goto pipe_cleanup;
> +
> +	if (pid == 0) {
> +		int ret = 1;
> +
> +		close(pipe_fd[0]);
> +		fd = create_and_get_cgroup("cgrp_from_id_ns");
> +		if (!ASSERT_GE(fd, 0, "cgrp_fd"))
> +			_exit(1);
> +
> +		if (!ASSERT_OK(join_cgroup("cgrp_from_id_ns"), "join cgrp"))
> +			goto fail;
> +
> +		if (!ASSERT_OK(unshare(CLONE_NEWCGROUP), "unshare cgns"))
> +			goto fail;
> +
> +		ret = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
> +		if (!ASSERT_OK(ret, "test run ret"))
> +			goto fail;
> +
> +		remove_cgroup("cgrp_from_id_ns");

If this test is executed in -vvv mode, the following is printed:

  (cgroup_helpers.c:412: errno: Device or resource busy) rmdiring cgroup cgrp_from_id_ns ...

And cgroup is still in place after exit.  As far as I understand,
child process needs to change cgroup again or remove_cgroup needs to
be called in the parent process.

> +
> +		if (!ASSERT_OK(opts.retval, "test run retval"))
> +			_exit(1);

Nit: why not 'exit'? '_exit' does not flush file descriptors.

> +		ret = 0;
> +		close(fd);
> +		if (!ASSERT_EQ(write(pipe_fd[1], &ret, sizeof(ret)), sizeof(ret), "write pipe"))
> +			_exit(1);
> +
> +		_exit(0);
> +fail:
> +		remove_cgroup("cgrp_from_id_ns");
> +		_exit(1);
> +	} else {
> +		int res;
> +
> +		close(pipe_fd[1]);
> +		if (!ASSERT_EQ(read(pipe_fd[0], &res, sizeof(res)), sizeof(res), "read res"))
> +			goto pipe_cleanup;
> +		if (!ASSERT_OK(res, "result from run"))
> +			goto pipe_cleanup;
> +	}
> +
> +pipe_cleanup:
> +	close(pipe_fd[1]);

Nit: should this be pipe_fd[0]?
     in case of a fork() failure, should this be both?

> +cleanup:
> +	cgrp_kfunc_success__destroy(skel);
> +}
> +
>  void test_cgrp_kfunc(void)
>  {
>  	int i, err;

[...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns
  2025-08-11 22:51   ` Eduard Zingerman
@ 2025-08-11 22:57     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 7+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-11 22:57 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, tj, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Dan Schatzberg, kkd, kernel-team

On Tue, 12 Aug 2025 at 00:51, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2025-08-11 at 12:59 -0700, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > +static void test_cgrp_from_id_ns(void)
> > +{
> > +     LIBBPF_OPTS(bpf_test_run_opts, opts);
> > +     struct cgrp_kfunc_success *skel;
> > +     struct bpf_program *prog;
> > +     int fd, pid, pipe_fd[2];
> > +
> > +     skel = open_load_cgrp_kfunc_skel();
> > +     if (!ASSERT_OK_PTR(skel, "open_load_skel"))
> > +             return;
> > +
> > +     if (!ASSERT_OK(skel->bss->err, "pre_mkdir_err"))
> > +             goto cleanup;
> > +
> > +     prog = bpf_object__find_program_by_name(skel->obj, "test_cgrp_from_id_ns");
>
> Nit: skel->test_cgrp_from_id_ns ?

Fixed.

>
> > +     if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> > +             goto cleanup;
> > +
> > +     if (!ASSERT_OK(pipe(pipe_fd), "pipe"))
> > +             goto cleanup;
> > +
> > +     pid = fork();
> > +     if (!ASSERT_GE(pid, 0, "fork result"))
> > +             goto pipe_cleanup;
> > +
> > +     if (pid == 0) {
> > +             int ret = 1;
> > +
> > +             close(pipe_fd[0]);
> > +             fd = create_and_get_cgroup("cgrp_from_id_ns");
> > +             if (!ASSERT_GE(fd, 0, "cgrp_fd"))
> > +                     _exit(1);
> > +
> > +             if (!ASSERT_OK(join_cgroup("cgrp_from_id_ns"), "join cgrp"))
> > +                     goto fail;
> > +
> > +             if (!ASSERT_OK(unshare(CLONE_NEWCGROUP), "unshare cgns"))
> > +                     goto fail;
> > +
> > +             ret = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
> > +             if (!ASSERT_OK(ret, "test run ret"))
> > +                     goto fail;
> > +
> > +             remove_cgroup("cgrp_from_id_ns");
>
> If this test is executed in -vvv mode, the following is printed:
>
>   (cgroup_helpers.c:412: errno: Device or resource busy) rmdiring cgroup cgrp_from_id_ns ...
>
> And cgroup is still in place after exit.  As far as I understand,
> child process needs to change cgroup again or remove_cgroup needs to
> be called in the parent process.

It is probably because the cgroup fd is still open, I will check and fix.

>
> > +
> > +             if (!ASSERT_OK(opts.retval, "test run retval"))
> > +                     _exit(1);
>
> Nit: why not 'exit'? '_exit' does not flush file descriptors.

Fixed.

>
> > +             ret = 0;
> > +             close(fd);
> > +             if (!ASSERT_EQ(write(pipe_fd[1], &ret, sizeof(ret)), sizeof(ret), "write pipe"))
> > +                     _exit(1);
> > +
> > +             _exit(0);
> > +fail:
> > +             remove_cgroup("cgrp_from_id_ns");
> > +             _exit(1);
> > +     } else {
> > +             int res;
> > +
> > +             close(pipe_fd[1]);
> > +             if (!ASSERT_EQ(read(pipe_fd[0], &res, sizeof(res)), sizeof(res), "read res"))
> > +                     goto pipe_cleanup;
> > +             if (!ASSERT_OK(res, "result from run"))
> > +                     goto pipe_cleanup;
> > +     }
> > +
> > +pipe_cleanup:
> > +     close(pipe_fd[1]);
>
> Nit: should this be pipe_fd[0]?
>      in case of a fork() failure, should this be both?

Yeah, fixed.

>
> > +cleanup:
> > +     cgrp_kfunc_success__destroy(skel);
> > +}
> > +
> >  void test_cgrp_kfunc(void)
> >  {
> >       int i, err;
>
> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace
  2025-08-11 19:59 ` [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace Kumar Kartikeya Dwivedi
@ 2025-08-12 23:20   ` Andrii Nakryiko
  2025-08-12 23:36     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 7+ messages in thread
From: Andrii Nakryiko @ 2025-08-12 23:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, tj, Dan Schatzberg, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, kkd,
	kernel-team

On Mon, Aug 11, 2025 at 12:59 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the
> cgroup corresponding to a given cgroup ID. This helper can be called in
> a lot of contexts where the current thread can be random. A recent
> example was its use in sched_ext's ops.tick(), to obtain the root cgroup
> pointer. Since the current task can be whatever random user space task
> preempted by the timer tick, this makes the behavior of the helper
> unreliable.
>
> Resolve this by refactoring cgroup_get_from_id to take a parameter to
> elide the cgroup_is_descendant check when root_cgns parameter is set to
> true.
>
> There is no compatibility breakage here, since changing the namespace
> against which the lookup is being done to the root cgroup namespace only
> permits a wider set of lookups to succeed now. The cgroup IDs across
> namespaces are globally unique, and thus don't need to be retranslated.
>
> Reported-by: Dan Schatzberg <dschatzberg@meta.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/cgroup.h   | 2 +-
>  kernel/bpf/cgroup_iter.c | 2 +-
>  kernel/bpf/helpers.c     | 2 +-
>  kernel/cgroup/cgroup.c   | 7 ++++++-

hmm... I see mm/memcontrol.c and block/blk-cgroup-fc-appid.c using
this function as well, but you didn't update them, so under some
kernel config this will break?

but I'm also wondering if it wouldn't be cleaner to keep
cgroup_get_from_id() as is, and add __cgroup_get_from_id(u64 id) that
would not perform a cgroup_is_descendant() check. BPF helpers that
don't need this root descendancy would call __cgroup_get_from_id()
directly, while cgroup_get_from_id() would use __cgroup_get_from_id()
first, and then (if successful) would perform addition root_cgrp
check?


>  4 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index b18fb5fcb38e..da757a496fbe 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -650,7 +650,7 @@ static inline void cgroup_kthread_ready(void)
>  }
>
>  void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
> -struct cgroup *cgroup_get_from_id(u64 id);
> +struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns);
>  #else /* !CONFIG_CGROUPS */
>
>  struct cgroup_subsys_state;
> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> index f04a468cf6a7..49234d035583 100644
> --- a/kernel/bpf/cgroup_iter.c
> +++ b/kernel/bpf/cgroup_iter.c
> @@ -212,7 +212,7 @@ static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
>         if (fd)
>                 cgrp = cgroup_v1v2_get_from_fd(fd);
>         else if (id)
> -               cgrp = cgroup_get_from_id(id);
> +               cgrp = cgroup_get_from_id(id, false);
>         else /* walk the entire hierarchy by default. */
>                 cgrp = cgroup_get_from_path("/");
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 6b4877e85a68..12466103917f 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2537,7 +2537,7 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
>  {
>         struct cgroup *cgrp;
>
> -       cgrp = cgroup_get_from_id(cgid);
> +       cgrp = cgroup_get_from_id(cgid, true);
>         if (IS_ERR(cgrp))
>                 return NULL;
>         return cgrp;
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 312c6a8b55bb..b490e1e0d2c4 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6345,10 +6345,11 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
>  /*
>   * cgroup_get_from_id : get the cgroup associated with cgroup id
>   * @id: cgroup id
> + * @root_cgns: Select root cgroup namespace instead of current's.
>   * On success return the cgrp or ERR_PTR on failure
>   * Only cgroups within current task's cgroup NS are valid.
>   */
> -struct cgroup *cgroup_get_from_id(u64 id)
> +struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns)
>  {
>         struct kernfs_node *kn;
>         struct cgroup *cgrp, *root_cgrp;
> @@ -6374,6 +6375,10 @@ struct cgroup *cgroup_get_from_id(u64 id)
>         if (!cgrp)
>                 return ERR_PTR(-ENOENT);
>
> +       /* We don't need to namespace this operation against current. */
> +       if (root_cgns)
> +               return cgrp;
> +
>         root_cgrp = current_cgns_cgroup_dfl();
>         if (!cgroup_is_descendant(cgrp, root_cgrp)) {
>                 cgroup_put(cgrp);
> --
> 2.47.3
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace
  2025-08-12 23:20   ` Andrii Nakryiko
@ 2025-08-12 23:36     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 7+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-12 23:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, tj, Dan Schatzberg, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, kkd,
	kernel-team

On Wed, 13 Aug 2025 at 01:20, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Aug 11, 2025 at 12:59 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the
> > cgroup corresponding to a given cgroup ID. This helper can be called in
> > a lot of contexts where the current thread can be random. A recent
> > example was its use in sched_ext's ops.tick(), to obtain the root cgroup
> > pointer. Since the current task can be whatever random user space task
> > preempted by the timer tick, this makes the behavior of the helper
> > unreliable.
> >
> > Resolve this by refactoring cgroup_get_from_id to take a parameter to
> > elide the cgroup_is_descendant check when root_cgns parameter is set to
> > true.
> >
> > There is no compatibility breakage here, since changing the namespace
> > against which the lookup is being done to the root cgroup namespace only
> > permits a wider set of lookups to succeed now. The cgroup IDs across
> > namespaces are globally unique, and thus don't need to be retranslated.
> >
> > Reported-by: Dan Schatzberg <dschatzberg@meta.com>
> > Acked-by: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/cgroup.h   | 2 +-
> >  kernel/bpf/cgroup_iter.c | 2 +-
> >  kernel/bpf/helpers.c     | 2 +-
> >  kernel/cgroup/cgroup.c   | 7 ++++++-
>
> hmm... I see mm/memcontrol.c and block/blk-cgroup-fc-appid.c using
> this function as well, but you didn't update them, so under some
> kernel config this will break?

Yeah, addressed in upcoming v3, I also got a kernel test robot error
privately about it.

>
> but I'm also wondering if it wouldn't be cleaner to keep
> cgroup_get_from_id() as is, and add __cgroup_get_from_id(u64 id) that
> would not perform a cgroup_is_descendant() check. BPF helpers that
> don't need this root descendancy would call __cgroup_get_from_id()
> directly, while cgroup_get_from_id() would use __cgroup_get_from_id()
> first, and then (if successful) would perform addition root_cgrp
> check?

Yeah, I can refactor as such.

>
>
> >  4 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> > index b18fb5fcb38e..da757a496fbe 100644
> > --- a/include/linux/cgroup.h
> > +++ b/include/linux/cgroup.h
> > @@ -650,7 +650,7 @@ static inline void cgroup_kthread_ready(void)
> >  }
> >
> >  void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
> > -struct cgroup *cgroup_get_from_id(u64 id);
> > +struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns);
> >  #else /* !CONFIG_CGROUPS */
> >
> >  struct cgroup_subsys_state;
> > diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> > index f04a468cf6a7..49234d035583 100644
> > --- a/kernel/bpf/cgroup_iter.c
> > +++ b/kernel/bpf/cgroup_iter.c
> > @@ -212,7 +212,7 @@ static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
> >         if (fd)
> >                 cgrp = cgroup_v1v2_get_from_fd(fd);
> >         else if (id)
> > -               cgrp = cgroup_get_from_id(id);
> > +               cgrp = cgroup_get_from_id(id, false);
> >         else /* walk the entire hierarchy by default. */
> >                 cgrp = cgroup_get_from_path("/");
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 6b4877e85a68..12466103917f 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2537,7 +2537,7 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
> >  {
> >         struct cgroup *cgrp;
> >
> > -       cgrp = cgroup_get_from_id(cgid);
> > +       cgrp = cgroup_get_from_id(cgid, true);
> >         if (IS_ERR(cgrp))
> >                 return NULL;
> >         return cgrp;
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index 312c6a8b55bb..b490e1e0d2c4 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -6345,10 +6345,11 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
> >  /*
> >   * cgroup_get_from_id : get the cgroup associated with cgroup id
> >   * @id: cgroup id
> > + * @root_cgns: Select root cgroup namespace instead of current's.
> >   * On success return the cgrp or ERR_PTR on failure
> >   * Only cgroups within current task's cgroup NS are valid.
> >   */
> > -struct cgroup *cgroup_get_from_id(u64 id)
> > +struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns)
> >  {
> >         struct kernfs_node *kn;
> >         struct cgroup *cgrp, *root_cgrp;
> > @@ -6374,6 +6375,10 @@ struct cgroup *cgroup_get_from_id(u64 id)
> >         if (!cgrp)
> >                 return ERR_PTR(-ENOENT);
> >
> > +       /* We don't need to namespace this operation against current. */
> > +       if (root_cgns)
> > +               return cgrp;
> > +
> >         root_cgrp = current_cgns_cgroup_dfl();
> >         if (!cgroup_is_descendant(cgrp, root_cgrp)) {
> >                 cgroup_put(cgrp);
> > --
> > 2.47.3
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-12 23:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-11 19:58 [PATCH bpf-next v2 0/2] Remove use of current->cgns in bpf_cgroup_from_id Kumar Kartikeya Dwivedi
2025-08-11 19:59 ` [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace Kumar Kartikeya Dwivedi
2025-08-12 23:20   ` Andrii Nakryiko
2025-08-12 23:36     ` Kumar Kartikeya Dwivedi
2025-08-11 19:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns Kumar Kartikeya Dwivedi
2025-08-11 22:51   ` Eduard Zingerman
2025-08-11 22:57     ` Kumar Kartikeya Dwivedi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).