* [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
@ 2023-09-03 14:27 Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
` (5 more replies)
0 siblings, 6 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-03 14:27 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed
Cc: cgroups, bpf, Yafang Shao
Currently, the cgroup_array map serves as a critical component for the
bpf_current_under_cgroup() and bpf_skb_under_cgroup() helpers, which
determine whether a task or a socket buffer (skb) resides within a
specific cgroup. However, only cgroup2 file descriptors can be stored in
the cgroup_array map, because cgroup_get_from_fd() exclusively supports
cgroup2 file descriptors. This limitation can be lifted by switching to
cgroup_v1v2_get_from_fd(), which accepts both cgroup1 and cgroup2 file
descriptors.
Note that it is safe to store a cgroup1 pointer for use by both
bpf_current_under_cgroup() and bpf_skb_under_cgroup(); without further
changes, checking a cgroup1 pointer there simply returns "false". To make
the task check actually work under a cgroup1 hierarchy, a minor
modification to task_under_cgroup_hierarchy() adds cgroup1 support.
In our specific use case, we intend to use bpf_current_under_cgroup() to
audit whether the current task resides within specific containers.
Subsequently, we can use this information to create distinct ACLs within
our LSM BPF programs, enabling us to control specific operations performed
by these tasks.
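For illustration, the check in the LSM program then boils down to something
like the following sketch (the "lsm/bpf" hook, map layout, and index mirror
the selftest added in patch 5 and are only examples):

	struct {
		__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
		__uint(key_size, sizeof(u32));
		__uint(value_size, sizeof(u32));
		__uint(max_entries, 1);
	} cgrp_map SEC(".maps");

	SEC("lsm/bpf")
	int BPF_PROG(lsm_run, int cmd, union bpf_attr *attr, unsigned int size)
	{
		/* Deny the operation when the current task runs under the
		 * cgroup whose fd is stored at index 0 of the map.
		 */
		if (bpf_current_task_under_cgroup(&cgrp_map, 0))
			return -EPERM;
		return 0;
	}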
Considering the widespread use of cgroup1 in container environments,
coupled with the considerable time it will take to transition to cgroup2,
implementing this change will significantly enhance the utility of BPF
in container scenarios. This is especially noteworthy because the necessary
adjustments can be made with minimal alterations to both the cgroup
subsystem and the BPF subsystem.
Yafang Shao (5):
cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
bpf: Enable cgroup_array map on cgroup1
selftests/bpf: Fix issues in setup_classid_environment()
selftests/bpf: Add new cgroup helper open_classid()
selftests/bpf: Add selftests for current_under_cgroupv1v2
 include/linux/cgroup.h                        | 24 ++++++-
 kernel/bpf/arraymap.c                         |  2 +-
 tools/testing/selftests/bpf/cgroup_helpers.c  | 34 ++++++++--
 tools/testing/selftests/bpf/cgroup_helpers.h  |  1 +
 .../bpf/prog_tests/current_under_cgroupv1v2.c | 76 ++++++++++++++++++++++
 .../bpf/progs/test_current_under_cgroupv1v2.c | 31 +++++++++
 6 files changed, 160 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/current_under_cgroupv1v2.c
create mode 100644 tools/testing/selftests/bpf/progs/test_current_under_cgroupv1v2.c
--
1.8.3.1
* [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
@ 2023-09-03 14:27 ` Yafang Shao
2023-09-06 19:53 ` Alexei Starovoitov
` (2 more replies)
2023-09-03 14:27 ` [RFC PATCH bpf-next 2/5] bpf: Enable cgroup_array map " Yafang Shao
` (4 subsequent siblings)
5 siblings, 3 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-03 14:27 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed
Cc: cgroups, bpf, Yafang Shao
Currently, task_under_cgroup_hierarchy() only applies to the default
(cgroup2) hierarchy. Nevertheless, given the continued prevalence of
cgroup1, it's worthwhile to make a minor adjustment that extends it to
cgroup1 as well.
Once this modification is implemented, we will have the ability to
effortlessly verify a task's cgroup membership within BPF programs. For
instance, we can easily check if a task belongs to a cgroup1 directory,
such as /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/ or
/sys/fs/cgroup/cpu,cpuacct/kubepods/besteffort/.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/linux/cgroup.h | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b307013..5414a2c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -543,15 +543,33 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp,
  * @ancestor: possible ancestor of @task's cgroup
  *
  * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor.
- * It follows all the same rules as cgroup_is_descendant, and only applies
- * to the default hierarchy.
+ * It follows all the same rules as cgroup_is_descendant.
  */
 static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
 					       struct cgroup *ancestor)
 {
 	struct css_set *cset = task_css_set(task);
+	struct cgroup *cgrp;
+	bool ret = false;
+	int ssid;
+
+	if (ancestor->root == &cgrp_dfl_root)
+		return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
+
+	for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
+		if (!ancestor->subsys[ssid])
+			continue;
 
-	return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
+		cgrp = task_css(task, ssid)->cgroup;
+		if (!cgrp)
+			continue;
+
+		if (!cgroup_is_descendant(cgrp, ancestor))
+			return false;
+		if (!ret)
+			ret = true;
+	}
+	return ret;
 }
 
 /* no synchronization, the result can only be used as a hint */
--
1.8.3.1
* [RFC PATCH bpf-next 2/5] bpf: Enable cgroup_array map on cgroup1
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
@ 2023-09-03 14:27 ` Yafang Shao
2023-09-06 19:54 ` Alexei Starovoitov
2023-09-03 14:27 ` [RFC PATCH bpf-next 3/5] selftests/bpf: Fix issues in setup_classid_environment() Yafang Shao
` (3 subsequent siblings)
5 siblings, 1 reply; 27+ messages in thread
From: Yafang Shao @ 2023-09-03 14:27 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed
Cc: cgroups, bpf, Yafang Shao
The cgroup_array map currently has support exclusively for cgroup2, owing
to the fact that cgroup_get_from_fd() is only valid for cgroup2 file
descriptors. However, an alternative approach is available where we can use
cgroup_v1v2_get_from_fd() for both cgroup1 and cgroup2 file descriptors.
The corresponding cgroup pointer extracted from the cgroup file descriptor
will be utilized by functions like bpf_current_task_under_cgroup() or
bpf_skb_under_cgroup() to determine whether a task or socket buffer (skb)
is associated with a specific cgroup. In a previous commit, we successfully
enabled bpf_current_task_under_cgroup(), ensuring the safety of storing a
cgroup1 pointer within the cgroup_array map.
Regarding bpf_skb_under_cgroup(), it is currently restricted to cgroup2
functionality only. Nevertheless, it remains safe to verify a cgroup1
pointer within this context as well, with the understanding that it will
return a "false" result in such cases.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/bpf/arraymap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89..30ea57c 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1291,7 +1291,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
 				     struct file *map_file /* not used */,
 				     int fd)
 {
-	return cgroup_get_from_fd(fd);
+	return cgroup_v1v2_get_from_fd(fd);
 }
 
 static void cgroup_fd_array_put_ptr(void *ptr)
--
1.8.3.1
* [RFC PATCH bpf-next 3/5] selftests/bpf: Fix issues in setup_classid_environment()
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 2/5] bpf: Enable cgroup_array map " Yafang Shao
@ 2023-09-03 14:27 ` Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add new cgroup helper open_classid() Yafang Shao
` (2 subsequent siblings)
5 siblings, 0 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-03 14:27 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed
Cc: cgroups, bpf, Yafang Shao
If the net_cls subsystem is already mounted, attempting to mount it again
in setup_classid_environment() will fail with EBUSY. Despite this, the
tmpfs will already have been successfully mounted at /sys/fs/cgroup.
Consequently, the /sys/fs/cgroup/net_cls directory will exist but be
empty, causing subsequent setup operations to fail.
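This can be the case, for example, on hosts where the hierarchy was already
mounted before the test runs, e.g.:

	$ mount | grep net_cls
	net_cls on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)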
Here's an error log excerpt illustrating the issue when net_cls has already
been mounted at /sys/fs/cgroup/net_cls prior to running
setup_classid_environment():
- Before this change
$ tools/testing/selftests/bpf/test_progs --name=cgroup_v1v2
test_cgroup_v1v2:PASS:server_fd 0 nsec
test_cgroup_v1v2:PASS:client_fd 0 nsec
test_cgroup_v1v2:PASS:cgroup_fd 0 nsec
test_cgroup_v1v2:PASS:server_fd 0 nsec
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
test_cgroup_v1v2:PASS:cgroup-v2-only 0 nsec
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup.procs
(cgroup_helpers.c:540: errno: No such file or directory) Opening cgroup classid: /sys/fs/cgroup/net_cls/cgroup-test-work-dir/net_cls.classid
run_test:PASS:skel_open 0 nsec
run_test:PASS:prog_attach 0 nsec
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup-test-work-dir/cgroup.procs
run_test:FAIL:join_classid unexpected error: 1 (errno 2)
test_cgroup_v1v2:FAIL:cgroup-v1v2 unexpected error: -1 (errno 2)
(cgroup_helpers.c:248: errno: No such file or directory) Opening Cgroup Procs: /sys/fs/cgroup/net_cls/cgroup.procs
#44 cgroup_v1v2:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
- After this change
$ tools/testing/selftests/bpf/test_progs --name=cgroup_v1v2
#44 cgroup_v1v2:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index 2caee84..f68fbc6 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -499,10 +499,20 @@ int setup_classid_environment(void)
 		return 1;
 	}
 
-	if (mount("net_cls", NETCLS_MOUNT_PATH, "cgroup", 0, "net_cls") &&
-	    errno != EBUSY) {
-		log_err("mount cgroup net_cls");
-		return 1;
+	if (mount("net_cls", NETCLS_MOUNT_PATH, "cgroup", 0, "net_cls")) {
+		if (errno != EBUSY) {
+			log_err("mount cgroup net_cls");
+			return 1;
+		}
+
+		if (rmdir(NETCLS_MOUNT_PATH)) {
+			log_err("rmdir cgroup net_cls");
+			return 1;
+		}
+		if (umount(CGROUP_MOUNT_DFLT)) {
+			log_err("umount cgroup base");
+			return 1;
+		}
 	}
 
 	cleanup_classid_environment();
--
1.8.3.1
* [RFC PATCH bpf-next 4/5] selftests/bpf: Add new cgroup helper open_classid()
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
` (2 preceding siblings ...)
2023-09-03 14:27 ` [RFC PATCH bpf-next 3/5] selftests/bpf: Fix issues in setup_classid_environment() Yafang Shao
@ 2023-09-03 14:27 ` Yafang Shao
2023-09-03 14:28 ` [RFC PATCH bpf-next 5/5] selftests/bpf: Add selftests for current_under_cgroupv1v2 Yafang Shao
2023-09-07 14:41 ` [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Michal Koutný
5 siblings, 0 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-03 14:27 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed
Cc: cgroups, bpf, Yafang Shao
Add a new cgroup helper open_classid() to get the net_cls cgroup fd.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 16 ++++++++++++++++
tools/testing/selftests/bpf/cgroup_helpers.h | 1 +
2 files changed, 17 insertions(+)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index f68fbc6..2631efe 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -578,6 +578,22 @@ int join_classid(void)
 }
 
 /**
+ * open_classid() - Open a cgroupv1 net_cls classid
+ *
+ * This function expects the cgroup work dir to be already created, as we
+ * open it here.
+ *
+ * On success, it returns the file descriptor. On failure it returns -1.
+ */
+int open_classid(void)
+{
+	char cgroup_workdir[PATH_MAX + 1];
+
+	format_classid_path(cgroup_workdir);
+	return open(cgroup_workdir, O_RDONLY);
+}
+
+/**
  * cleanup_classid_environment() - Cleanup the cgroupv1 net_cls environment
  *
  * At call time, it moves the calling process to the root cgroup, and then
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.h b/tools/testing/selftests/bpf/cgroup_helpers.h
index 5c2cb9c..ebc0513 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.h
+++ b/tools/testing/selftests/bpf/cgroup_helpers.h
@@ -31,6 +31,7 @@ int write_cgroup_file_parent(const char *relative_path, const char *file,
 /* cgroupv1 related */
 int set_classid(unsigned int id);
 int join_classid(void);
+int open_classid(void);
 int setup_classid_environment(void);
 void cleanup_classid_environment(void);
--
1.8.3.1
* [RFC PATCH bpf-next 5/5] selftests/bpf: Add selftests for current_under_cgroupv1v2
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
` (3 preceding siblings ...)
2023-09-03 14:27 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add new cgroup helper open_classid() Yafang Shao
@ 2023-09-03 14:28 ` Yafang Shao
2023-09-07 14:41 ` [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Michal Koutný
5 siblings, 0 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-03 14:28 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed
Cc: cgroups, bpf, Yafang Shao
Add selftests for bpf_current_task_under_cgroup() on both cgroup1 and
cgroup2. The results are as follows:
$ tools/testing/selftests/bpf/test_progs --name=current_under_cgroupv1v2
#62/1 current_under_cgroupv1v2/test_current_under_cgroup2:OK
#62/2 current_under_cgroupv1v2/test_current_under_cgroup1:OK
#62 current_under_cgroupv1v2:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
.../bpf/prog_tests/current_under_cgroupv1v2.c | 76 ++++++++++++++++++++++
.../bpf/progs/test_current_under_cgroupv1v2.c | 31 +++++++++
2 files changed, 107 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/current_under_cgroupv1v2.c
create mode 100644 tools/testing/selftests/bpf/progs/test_current_under_cgroupv1v2.c
diff --git a/tools/testing/selftests/bpf/prog_tests/current_under_cgroupv1v2.c b/tools/testing/selftests/bpf/prog_tests/current_under_cgroupv1v2.c
new file mode 100644
index 0000000..62efca3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/current_under_cgroupv1v2.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "cgroup_helpers.h"
+#include "test_current_under_cgroupv1v2.skel.h"
+
+#define CGROUP2_DIR "/current_under_cgroup2"
+
+static void attach_progs(int cgrp_fd)
+{
+	struct test_current_under_cgroupv1v2 *skel;
+	int cgrp_map_fd, ret, idx = 0;
+
+	skel = test_current_under_cgroupv1v2__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_current_under_cgroupv1v2__open"))
+		return;
+
+	cgrp_map_fd = bpf_map__fd(skel->maps.cgrp_map);
+	ret = bpf_map_update_elem(cgrp_map_fd, &idx, &cgrp_fd, BPF_ANY);
+	if (!ASSERT_OK(ret, "update_cgrp_map"))
+		goto cleanup;
+
+	/* Attach LSM prog first */
+	skel->links.lsm_run = bpf_program__attach_lsm(skel->progs.lsm_run);
+	if (!ASSERT_OK_PTR(skel->links.lsm_run, "lsm_attach"))
+		goto cleanup;
+
+	/* LSM prog will be triggered when attaching fentry */
+	skel->links.fentry_run = bpf_program__attach_trace(skel->progs.fentry_run);
+	ASSERT_NULL(skel->links.fentry_run, "fentry_attach");
+
+cleanup:
+	test_current_under_cgroupv1v2__destroy(skel);
+}
+
+static void current_under_cgroup1(void)
+{
+	int cgrp_fd, ret;
+
+	/* Setup cgroup1 hierarchy */
+	ret = setup_classid_environment();
+	if (!ASSERT_OK(ret, "setup_classid_environment"))
+		return;
+
+	ret = join_classid();
+	if (!ASSERT_OK(ret, "join_cgroup1"))
+		goto cleanup;
+
+	cgrp_fd = open_classid();
+	attach_progs(cgrp_fd);
+	close(cgrp_fd);
+
+cleanup:
+	/* Cleanup cgroup1 hierarchy */
+	cleanup_classid_environment();
+}
+
+static void current_under_cgroup2(void)
+{
+	int cgrp_fd;
+
+	cgrp_fd = test__join_cgroup(CGROUP2_DIR);
+	if (!ASSERT_GE(cgrp_fd, 0, "cgroup_join_cgroup2"))
+		return;
+
+	attach_progs(cgrp_fd);
+	close(cgrp_fd);
+}
+
+void test_current_under_cgroupv1v2(void)
+{
+	if (test__start_subtest("test_current_under_cgroup2"))
+		current_under_cgroup2();
+	if (test__start_subtest("test_current_under_cgroup1"))
+		current_under_cgroup1();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_current_under_cgroupv1v2.c b/tools/testing/selftests/bpf/progs/test_current_under_cgroupv1v2.c
new file mode 100644
index 0000000..9f0af0b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_current_under_cgroupv1v2.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
+	__uint(key_size, sizeof(u32));
+	__uint(value_size, sizeof(u32));
+	__uint(max_entries, 1);
+} cgrp_map SEC(".maps");
+
+SEC("lsm/bpf")
+int BPF_PROG(lsm_run, int cmd, union bpf_attr *attr, unsigned int size)
+{
+	if (cmd != BPF_LINK_CREATE)
+		return 0;
+
+	if (bpf_current_task_under_cgroup(&cgrp_map, 0 /* map index */))
+		return -1;
+	return 0;
+}
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(fentry_run)
+{
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
1.8.3.1
* Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
@ 2023-09-06 19:53 ` Alexei Starovoitov
2023-09-06 20:13 ` Tejun Heo
2023-09-18 14:45 ` Michal Koutný
2 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-09-06 19:53 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote:
> Currently, task_under_cgroup_hierarchy() only applies to the default
> (cgroup2) hierarchy. Nevertheless, given the continued prevalence of
> cgroup1, it's worthwhile to make a minor adjustment that extends it to
> cgroup1 as well.
> Once this modification is implemented, we will have the ability to
> effortlessly verify a task's cgroup membership within BPF programs. For
> instance, we can easily check if a task belongs to a cgroup1 directory,
> such as /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/ or
> /sys/fs/cgroup/cpu,cpuacct/kubepods/besteffort/.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> include/linux/cgroup.h | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index b307013..5414a2c 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -543,15 +543,33 @@ static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp,
> * @ancestor: possible ancestor of @task's cgroup
> *
> * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor.
> - * It follows all the same rules as cgroup_is_descendant, and only applies
> - * to the default hierarchy.
> + * It follows all the same rules as cgroup_is_descendant.
> */
> static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
> struct cgroup *ancestor)
> {
> struct css_set *cset = task_css_set(task);
> + struct cgroup *cgrp;
> + bool ret = false;
> + int ssid;
> +
> + if (ancestor->root == &cgrp_dfl_root)
> + return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> +
> + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
> + if (!ancestor->subsys[ssid])
> + continue;
This looks wrong. I believe cgroup_mutex should be held to iterate.
Tejun ?
>
> - return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> + cgrp = task_css(task, ssid)->cgroup;
> + if (!cgrp)
> + continue;
> +
> + if (!cgroup_is_descendant(cgrp, ancestor))
> + return false;
> + if (!ret)
> + ret = true;
> + }
> + return ret;
> }
>
> /* no synchronization, the result can only be used as a hint */
> --
> 1.8.3.1
>
* Re: [RFC PATCH bpf-next 2/5] bpf: Enable cgroup_array map on cgroup1
2023-09-03 14:27 ` [RFC PATCH bpf-next 2/5] bpf: Enable cgroup_array map " Yafang Shao
@ 2023-09-06 19:54 ` Alexei Starovoitov
0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-09-06 19:54 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Sun, Sep 03, 2023 at 02:27:57PM +0000, Yafang Shao wrote:
> The cgroup_array map currently has support exclusively for cgroup2, owing
> to the fact that cgroup_get_from_fd() is only valid for cgroup2 file
> descriptors. However, an alternative approach is available where we can use
> cgroup_v1v2_get_from_fd() for both cgroup1 and cgroup2 file descriptors.
>
> The corresponding cgroup pointer extracted from the cgroup file descriptor
> will be utilized by functions like bpf_current_task_under_cgroup() or
> bpf_skb_under_cgroup() to determine whether a task or socket buffer (skb)
> is associated with a specific cgroup. In a previous commit, we successfully
> enabled bpf_current_task_under_cgroup(), ensuring the safety of storing a
> cgroup1 pointer within the cgroup_array map.
>
> Regarding bpf_skb_under_cgroup(), it is currently restricted to cgroup2
> functionality only. Nevertheless, it remains safe to verify a cgroup1
> pointer within this context as well, with the understanding that it will
> return a "false" result in such cases.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> kernel/bpf/arraymap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 2058e89..30ea57c 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -1291,7 +1291,7 @@ static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
> struct file *map_file /* not used */,
> int fd)
> {
> - return cgroup_get_from_fd(fd);
> + return cgroup_v1v2_get_from_fd(fd);
This part looks ok.
* Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
2023-09-06 19:53 ` Alexei Starovoitov
@ 2023-09-06 20:13 ` Tejun Heo
2023-09-07 3:05 ` Yafang Shao
2023-09-18 14:45 ` Michal Koutný
2 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2023-09-06 20:13 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
yosryahmed, cgroups, bpf
Hello,
On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote:
> static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
> struct cgroup *ancestor)
> {
> struct css_set *cset = task_css_set(task);
> + struct cgroup *cgrp;
> + bool ret = false;
> + int ssid;
> +
> + if (ancestor->root == &cgrp_dfl_root)
> + return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> +
> + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
> + if (!ancestor->subsys[ssid])
> + continue;
>
> - return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> + cgrp = task_css(task, ssid)->cgroup;
> + if (!cgrp)
> + continue;
> +
> + if (!cgroup_is_descendant(cgrp, ancestor))
> + return false;
> + if (!ret)
> + ret = true;
> + }
> + return ret;
I feel ambivalent about adding support for this in cgroup1, especially given
that this can only work for the fd-based interface, which is worse than the
ID-based ones. Even if we're doing this, the above is definitely not what we
want to do as it won't work for controller-less hierarchies like the one
that systemd used to use. You'd have to lock css_set_lock and walk the
cgrp_cset_links.
Thanks.
--
tejun
* Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-06 20:13 ` Tejun Heo
@ 2023-09-07 3:05 ` Yafang Shao
2023-09-11 20:27 ` Tejun Heo
0 siblings, 1 reply; 27+ messages in thread
From: Yafang Shao @ 2023-09-07 3:05 UTC (permalink / raw)
To: Tejun Heo
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Thu, Sep 7, 2023 at 4:13 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao wrote:
> > static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
> > struct cgroup *ancestor)
> > {
> > struct css_set *cset = task_css_set(task);
> > + struct cgroup *cgrp;
> > + bool ret = false;
> > + int ssid;
> > +
> > + if (ancestor->root == &cgrp_dfl_root)
> > + return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> > +
> > + for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
> > + if (!ancestor->subsys[ssid])
> > + continue;
> >
> > - return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> > + cgrp = task_css(task, ssid)->cgroup;
> > + if (!cgrp)
> > + continue;
> > +
> > + if (!cgroup_is_descendant(cgrp, ancestor))
> > + return false;
> > + if (!ret)
> > + ret = true;
> > + }
> > + return ret;
>
> I feel ambivalent about adding support for this in cgroup1, especially given
> that this can only work for the fd-based interface, which is worse than the
> ID-based ones.
The fd-based cgroup interface plays a crucial role in BPF programs,
particularly in components such as cgroup_iter, bpf_cgrp_storage, and
cgroup_array maps, as well as in the attachment and detachment of
cgroups.
However, it's important to note that as far as my knowledge goes,
bpf_cgrp_storage, cgroup_array, and the attachment/detachment of
cgroups are exclusively compatible with the cgroup fd-based interface.
Unfortunately, all these functionalities are limited to cgroup2, which
poses challenges in containerized environments.
In our pursuit of enabling seamless BPF integration within our
Kubernetes environment, we've been exploring the possibility of
transitioning from cgroup1 to cgroup2. This transition, while
desirable for its future-forward nature, presents complexities due to
the need for numerous applications to adapt.
We acknowledge that cgroup2 represents the future, but we also
understand that such transitions require time and effort.
Consequently, we are considering an alternative approach. Rather than
migrating to cgroup2, we are contemplating modifications to the BPF
kernel code to ensure compatibility with cgroup1. Moreover, it appears
that these modifications may entail only minor adjustments, making
this option more palatable.
> Even if we're doing this, the above is definitely not what we
> want to do as it won't work for controller-less hierarchies like the one
> that systemd used to use.
Right. It can't work for /sys/fs/cgroup/systemd/.
> You'd have to lock css_set_lock and walk the
> cgrp_cset_links.
That seems better. Will investigate it. Thanks for your suggestion.
--
Regards
Yafang
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
` (4 preceding siblings ...)
2023-09-03 14:28 ` [RFC PATCH bpf-next 5/5] selftests/bpf: Add selftests for current_under_cgroupv1v2 Yafang Shao
@ 2023-09-07 14:41 ` Michal Koutný
2023-09-08 2:53 ` Yafang Shao
5 siblings, 1 reply; 27+ messages in thread
From: Michal Koutný @ 2023-09-07 14:41 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed, cgroups, bpf
Hello Yafang.
On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> In our specific use case, we intend to use bpf_current_under_cgroup() to
> audit whether the current task resides within specific containers.
I wonder -- how does this work in practice?
If it's systemd hybrid setup, you can get the information from the
unified hierarchy which represents the container membership.
If it's a setup without the unified hierarchy, you have to pick one
hierarchy as a representation of the membership. Which one will it be?
> Subsequently, we can use this information to create distinct ACLs within
> our LSM BPF programs, enabling us to control specific operations performed
> by these tasks.
If one was serious about container-based ACLs, it'd be best to have a
dedicated and maintained hierarchy for this (I mean a named v1
hierarchy). But your implementation omits this, so this hints to me that
this scenario may already be better covered with querying the unified
hierarchy.
> Considering the widespread use of cgroup1 in container environments,
> coupled with the considerable time it will take to transition to cgroup2,
> implementing this change will significantly enhance the utility of BPF
> in container scenarios.
If a change like this is not accepted, will it make the transition
period shorter? (As written above, the unified hierarchy seems a better
fit for your use case.)
Thanks,
Michal
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-07 14:41 ` [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Michal Koutný
@ 2023-09-08 2:53 ` Yafang Shao
2023-09-08 18:09 ` Alexei Starovoitov
0 siblings, 1 reply; 27+ messages in thread
From: Yafang Shao @ 2023-09-08 2:53 UTC (permalink / raw)
To: Michal Koutný
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello Yafang.
>
> On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > In our specific use case, we intend to use bpf_current_under_cgroup() to
> > audit whether the current task resides within specific containers.
>
> I wonder -- how does this work in practice?
In our practice, the cgroup_array map serves as a shared map utilized
by both our LSM programs and the target pods, as follows:

     ----------------
     |  target pod  |
     ----------------
            |
            |
            V                       ----------------
 /sys/fs/bpf/cgroup_array  <------  |  LSM progs   |
                                    ----------------

Within the target pods, we employ a script to update its cgroup file
descriptor into the cgroup_array, for instance:

	cgrp_fd = open("/sys/fs/cgroup/cpu");
	cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
	bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, 0);

Next, we will validate the contents of the cgroup_array within our LSM
programs, as follows:

	if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
		return -1;
Within our Kubernetes deployment system, we will inject this script
into the target pods only if specific annotations, such as
"bpf_audit," are present. Consequently, users do not need to manually
modify their code; this process will be handled automatically.
Within our Kubernetes environment, there is only a single instance of
these target pods on each host. Consequently, we can conveniently
utilize the array index as the application ID. However, in scenarios
where you have multiple instances running on a single host, you will
need to manage the mapping of instances to array indexes
independently. For cases with multiple instances, a cgroup_hash may be
a more suitable approach, although that is a separate discussion
altogether.
>
> If it's systemd hybrid setup, you can get the information from the
> unified hierarchy which represents the container membership.
>
> If it's a setup without the unified hierarchy, you have to pick one
> hierarchy as a representation of the membership. Which one will it be?
We utilize the CPU subsystem, and all of our pods have this cgroup
subsystem enabled.
>
> > Subsequently, we can use this information to create distinct ACLs within
> > our LSM BPF programs, enabling us to control specific operations performed
> > by these tasks.
>
> If one was serious about container-based ACLs, it'd be best to have a
> dedicated and maintained hierarchy for this (I mean a named v1
> hierarchy). But your implementation omits this, so this hints to me that
> this scenario may already be better covered with querying the unified
> hierarchy.
>
> > Considering the widespread use of cgroup1 in container environments,
> > coupled with the considerable time it will take to transition to cgroup2,
> > implementing this change will significantly enhance the utility of BPF
> > in container scenarios.
>
> If a change like this is not accepted, will it make the transition
> period shorter? (As written above, the unified hierarchy seems a better
> fit for your use case.)
If that change is not accepted by upstream, we will need to
independently manage and maintain it within our local kernel :(
--
Regards
Yafang
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-08 2:53 ` Yafang Shao
@ 2023-09-08 18:09 ` Alexei Starovoitov
2023-09-10 3:17 ` Yafang Shao
0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2023-09-08 18:09 UTC (permalink / raw)
To: Yafang Shao
Cc: Michal Koutný, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Tejun Heo, Zefan Li, Johannes Weiner, Yosry Ahmed,
open list:CONTROL GROUP (CGROUP), bpf
On Thu, Sep 7, 2023 at 7:54 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > Hello Yafang.
> >
> > On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > > In our specific use case, we intend to use bpf_current_under_cgroup() to
> > > audit whether the current task resides within specific containers.
> >
> > I wonder -- how does this work in practice?
>
> In our practice, the cgroup_array map serves as a shared map utilized
> by both our LSM programs and the target pods. as follows,
>
> ----------------
> | target pod |
> ----------------
> |
> |
> V ----------------
> /sys/fs/bpf/cgroup_array <--- | LSM progs|
> ----------------
>
> Within the target pods, we employ a script to update its cgroup file
> descriptor into the cgroup_array, for instance:
>
> cgrp_fd = open("/sys/fs/cgroup/cpu");
> cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
> bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, 0);
>
> Next, we will validate the contents of the cgroup_array within our LSM
> programs, as follows:
>
> if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
> return -1;
>
> Within our Kubernetes deployment system, we will inject this script
> into the target pods only if specific annotations, such as
> "bpf_audit," are present. Consequently, users do not need to manually
> modify their code; this process will be handled automatically.
>
> Within our Kubernetes environment, there is only a single instance of
> these target pods on each host. Consequently, we can conveniently
> utilize the array index as the application ID. However, in scenarios
> where you have multiple instances running on a single host, you will
> need to manage the mapping of instances to array indexes
> independently. For cases with multiple instances, a cgroup_hash may be
> a more suitable approach, although that is a separate discussion
> altogether.
Is there a reason you cannot use bpf_get_current_cgroup_id()
to associate task with cgroup in your lsm prog?
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-08 18:09 ` Alexei Starovoitov
@ 2023-09-10 3:17 ` Yafang Shao
2023-09-11 19:53 ` Alexei Starovoitov
2023-09-11 20:24 ` Tejun Heo
0 siblings, 2 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-10 3:17 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Michal Koutný, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Tejun Heo, Zefan Li, Johannes Weiner, Yosry Ahmed,
open list:CONTROL GROUP (CGROUP), bpf
On Sat, Sep 9, 2023 at 2:09 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Sep 7, 2023 at 7:54 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@suse.com> wrote:
> > >
> > > Hello Yafang.
> > >
> > > On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > In our specific use case, we intend to use bpf_current_under_cgroup() to
> > > > audit whether the current task resides within specific containers.
> > >
> > > I wonder -- how does this work in practice?
> >
> > In our practice, the cgroup_array map serves as a shared map utilized
> > by both our LSM programs and the target pods. as follows,
> >
> > ----------------
> > | target pod |
> > ----------------
> > |
> > |
> > V ----------------
> > /sys/fs/bpf/cgroup_array <--- | LSM progs|
> > ----------------
> >
> > Within the target pods, we employ a script to update its cgroup file
> > descriptor into the cgroup_array, for instance:
> >
> > cgrp_fd = open("/sys/fs/cgroup/cpu");
> > cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
> > bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, 0);
> >
> > Next, we will validate the contents of the cgroup_array within our LSM
> > programs, as follows:
> >
> > if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
> > return -1;
> >
> > Within our Kubernetes deployment system, we will inject this script
> > into the target pods only if specific annotations, such as
> > "bpf_audit," are present. Consequently, users do not need to manually
> > modify their code; this process will be handled automatically.
> >
> > Within our Kubernetes environment, there is only a single instance of
> > these target pods on each host. Consequently, we can conveniently
> > utilize the array index as the application ID. However, in scenarios
> > where you have multiple instances running on a single host, you will
> > need to manage the mapping of instances to array indexes
> > independently. For cases with multiple instances, a cgroup_hash may be
> > a more suitable approach, although that is a separate discussion
> > altogether.
>
> Is there a reason you cannot use bpf_get_current_cgroup_id()
> to associate task with cgroup in your lsm prog?
Using cgroup_id as the key serves as a temporary workaround;
nevertheless, employing bpf_get_current_cgroup_id() is impractical due
to its exclusive support for cgroup2.
To acquire the cgroup_id, we can resort to open coding, as exemplified below:
	task = bpf_get_current_task_btf();
	cgroups = task->cgroups;
	cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
	key = cgroup->kn->id;
Nonetheless, creating an open-coded version of
bpf_get_current_ancestor_cgroup_id() is unfeasible since the BPF
verifier prohibits access to "cgrp->ancestors[ancestor_level]."
--
Regards
Yafang
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-10 3:17 ` Yafang Shao
@ 2023-09-11 19:53 ` Alexei Starovoitov
2023-09-11 20:24 ` Tejun Heo
1 sibling, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2023-09-11 19:53 UTC (permalink / raw)
To: Yafang Shao
Cc: Michal Koutný, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Tejun Heo, Zefan Li, Johannes Weiner, Yosry Ahmed,
open list:CONTROL GROUP (CGROUP), bpf
On Sat, Sep 9, 2023 at 8:18 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Sep 9, 2023 at 2:09 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Sep 7, 2023 at 7:54 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@suse.com> wrote:
> > > >
> > > > Hello Yafang.
> > > >
> > > > On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > In our specific use case, we intend to use bpf_current_under_cgroup() to
> > > > > audit whether the current task resides within specific containers.
> > > >
> > > > I wonder -- how does this work in practice?
> > >
> > > In our practice, the cgroup_array map serves as a shared map utilized
> > > by both our LSM programs and the target pods. as follows,
> > >
> > > ----------------
> > > | target pod |
> > > ----------------
> > > |
> > > |
> > > V ----------------
> > > /sys/fs/bpf/cgroup_array <--- | LSM progs|
> > > ----------------
> > >
> > > Within the target pods, we employ a script to update its cgroup file
> > > descriptor into the cgroup_array, for instance:
> > >
> > > cgrp_fd = open("/sys/fs/cgroup/cpu");
> > > cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
> > > bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, 0);
> > >
> > > Next, we will validate the contents of the cgroup_array within our LSM
> > > programs, as follows:
> > >
> > > if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
> > > return -1;
> > >
> > > Within our Kubernetes deployment system, we will inject this script
> > > into the target pods only if specific annotations, such as
> > > "bpf_audit," are present. Consequently, users do not need to manually
> > > modify their code; this process will be handled automatically.
> > >
> > > Within our Kubernetes environment, there is only a single instance of
> > > these target pods on each host. Consequently, we can conveniently
> > > utilize the array index as the application ID. However, in scenarios
> > > where you have multiple instances running on a single host, you will
> > > need to manage the mapping of instances to array indexes
> > > independently. For cases with multiple instances, a cgroup_hash may be
> > > a more suitable approach, although that is a separate discussion
> > > altogether.
> >
> > Is there a reason you cannot use bpf_get_current_cgroup_id()
> > to associate task with cgroup in your lsm prog?
>
> Using cgroup_id as the key serves as a temporary workaround;
> nevertheless, employing bpf_get_current_cgroup_id() is impractical due
> to its exclusive support for cgroup2.
>
> To acquire the cgroup_id, we can resort to open coding, as exemplified below:
>
> task = bpf_get_current_task_btf();
> cgroups = task->cgroups;
> cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
> key = cgroup->kn->id;
>
> Nonetheless, creating an open-coded version of
> bpf_get_current_ancestor_cgroup_id() is unfeasible since the BPF
> verifier prohibits access to "cgrp->ancestors[ancestor_level]."
Both helpers can be extended to support v1 or not?
I mean can a task be part of v1 and v2 hierarchy at the same time?
If not then bpf_get_current_cgroup_id() can fall back to what you're
describing above and return cgroup_id.
Same would apply to bpf_get_current_ancestor_cgroup_id.
If not, two new kfuncs for v1 could be another option.
prog_array for cgroups is an old design. We can and should do
more flexible interface nowadays.
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-10 3:17 ` Yafang Shao
2023-09-11 19:53 ` Alexei Starovoitov
@ 2023-09-11 20:24 ` Tejun Heo
2023-09-12 3:30 ` Yafang Shao
1 sibling, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2023-09-11 20:24 UTC (permalink / raw)
To: Yafang Shao
Cc: Alexei Starovoitov, Michal Koutný, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
On Sun, Sep 10, 2023 at 11:17:48AM +0800, Yafang Shao wrote:
> To acquire the cgroup_id, we can resort to open coding, as exemplified below:
>
> task = bpf_get_current_task_btf();
> cgroups = task->cgroups;
> cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
> key = cgroup->kn->id;
You can't hardcode it to a specific controller tree like that. You either
stick with the fd-based interface or need to also add something to identify
the specific cgroup1 tree.
Thanks.
--
tejun
* Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-07 3:05 ` Yafang Shao
@ 2023-09-11 20:27 ` Tejun Heo
0 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2023-09-11 20:27 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Thu, Sep 07, 2023 at 11:05:07AM +0800, Yafang Shao wrote:
> The fd-based cgroup interface plays a crucial role in BPF programs,
> particularly in components such as cgroup_iter, bpf_cgrp_storage, and
> cgroup_array maps, as well as in the attachment and detachment of
> cgroups.
Yeah, I know they're used. It's just that they are inferior identifiers from
cgroup's POV as they are ephemeral, can't easily be transferred across
process boundaries or persisted beyond the lifetime of the fd-owning process
or the cgroups which are being pointed to.
Thanks.
--
tejun
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-11 20:24 ` Tejun Heo
@ 2023-09-12 3:30 ` Yafang Shao
2023-09-15 17:01 ` Michal Koutný
2023-09-15 18:57 ` Hao Luo
0 siblings, 2 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-12 3:30 UTC (permalink / raw)
To: Tejun Heo
Cc: Alexei Starovoitov, Michal Koutný, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
On Tue, Sep 12, 2023 at 4:24 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Sun, Sep 10, 2023 at 11:17:48AM +0800, Yafang Shao wrote:
> > To acquire the cgroup_id, we can resort to open coding, as exemplified below:
> >
> > task = bpf_get_current_task_btf();
> > cgroups = task->cgroups;
> > cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
> > key = cgroup->kn->id;
>
> You can't hardcode it to a specific controller tree like that. You either
> stick with the fd-based interface or need to also add something to identify
> the specific cgroup1 tree.
As pointed out by Alexei, I think we can introduce some
cgroup_id-based kfuncs which can work for both cgroup1 and cgroup2.
Something as follows (untested),
__bpf_kfunc u64 bpf_current_cgroup_id_from_subsys(int subsys)
{
	struct cgroup *cgroup;

	cgroup = task_cgroup(current, subsys);
	if (!cgroup)
		return 0;
	return cgroup_id(cgroup);
}

__bpf_kfunc struct cgroup *bpf_cgroup_from_subsys_id(u64 cgid, int subsys)
{
	struct cgroup_subsys_state *css = init_css_set.subsys[subsys];
	struct cgroup *subsys_root = css->cgroup;

	// We should introduce a new helper cgroup_get_from_subsys_id()
	// in the cgroup subsystem.
	return cgroup_get_from_subsys_id(subsys_root, cgid);
}
And change task_under_cgroup_hierarchy() as follows,

 static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
 					       struct cgroup *ancestor)
 {
 	struct css_set *cset = task_css_set(task);
-
-	return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
+	struct cgroup *cgrp;
+	bool ret = false;
+
+	if (ancestor->root == &cgrp_dfl_root)
+		return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
+
+	cgroup_lock();
+	spin_lock_irq(&css_set_lock);
+	cgrp = task_cgroup_from_root(task, ancestor->root);
+	if (cgrp && cgroup_is_descendant(cgrp, ancestor))
+		ret = true;
+	spin_unlock_irq(&css_set_lock);
+	cgroup_unlock();
+	return ret;
 }
With the above changes, I think it can meet most use cases with BPF on cgroup1.
What do you think ?
--
Regards
Yafang
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-12 3:30 ` Yafang Shao
@ 2023-09-15 17:01 ` Michal Koutný
2023-09-15 17:31 ` Tejun Heo
2023-09-17 7:19 ` Yafang Shao
2023-09-15 18:57 ` Hao Luo
1 sibling, 2 replies; 27+ messages in thread
From: Michal Koutný @ 2023-09-15 17:01 UTC (permalink / raw)
To: Yafang Shao
Cc: Tejun Heo, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
Hello.
On Tue, Sep 12, 2023 at 11:30:32AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> With the above changes, I think it can meet most use cases with BPF on cgroup1.
> What do you think ?
I think the presented use case of LSM hooks is better served by the
default hierarchy (see also [1]).
Relying on a chosen subsys v1 hierarchy is not systematic. And extending
ancestry checking on named v1 hierarchies seems backwards given
the existence of the default hierarchy.
Michal
[1] https://docs.kernel.org/admin-guide/cgroup-v2.html#delegation-containment
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-15 17:01 ` Michal Koutný
@ 2023-09-15 17:31 ` Tejun Heo
2023-09-17 7:28 ` Yafang Shao
2023-09-17 7:19 ` Yafang Shao
1 sibling, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2023-09-15 17:31 UTC (permalink / raw)
To: Michal Koutný
Cc: Yafang Shao, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
On Fri, Sep 15, 2023 at 07:01:28PM +0200, Michal Koutný wrote:
> Hello.
>
> On Tue, Sep 12, 2023 at 11:30:32AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> > With the above changes, I think it can meet most use cases with BPF on cgroup1.
> > What do you think ?
>
> I think the presented use case of LSM hooks is better served by the
> default hierarchy (see also [1]).
> Relying on a chosen subsys v1 hierarchy is not systematic. And extending
> ancestry checking on named v1 hierarchies seems backwards given
> the existence of the default hierarchy.
Yeah, identifying cgroup1 hierarchies by subsys leaves out a pretty good
chunk of use cases - e.g. systemd used to use a named hierarchy for primary
process organization on cgroup1.
Also, you don't have to switch to cgroup2 wholesale. You can just build the
same hierarchy in cgroup2 for process organization and combine that with any
cgroup1 hierarchies.
Thanks.
--
tejun
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-12 3:30 ` Yafang Shao
2023-09-15 17:01 ` Michal Koutný
@ 2023-09-15 18:57 ` Hao Luo
2023-09-17 7:30 ` Yafang Shao
1 sibling, 1 reply; 27+ messages in thread
From: Hao Luo @ 2023-09-15 18:57 UTC (permalink / raw)
To: Yafang Shao
Cc: Tejun Heo, Alexei Starovoitov, Michal Koutný,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Jiri Olsa, Zefan Li,
Johannes Weiner, Yosry Ahmed, open list:CONTROL GROUP (CGROUP),
bpf
On Mon, Sep 11, 2023 at 8:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Sep 12, 2023 at 4:24 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Sun, Sep 10, 2023 at 11:17:48AM +0800, Yafang Shao wrote:
> > > To acquire the cgroup_id, we can resort to open coding, as exemplified below:
> > >
> > > task = bpf_get_current_task_btf();
> > > cgroups = task->cgroups;
> > > cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
> > > key = cgroup->kn->id;
> >
> > You can't hardcode it to a specific controller tree like that. You either
> > stick with the fd-based interface or need to also add something to identify
> > the specific cgroup1 tree.
>
> As pointed out by Alexei, I think we can introduce some
> cgroup_id-based kfuncs which can work for both cgroup1 and cgroup2.
>
> Something as follows (untested),
>
> __bpf_kfunc u64 bpf_current_cgroup_id_from_subsys(int subsys)
> {
> struct cgroup *cgroup;
>
> cgroup = task_cgroup(current, subsys);
> if (!cgroup)
> return 0;
> return cgroup_id(cgroup);
> }
>
Can we also support checking arbitrary tasks, instead of just current?
I find myself often needing to find the cgroup only given a
task_struct. For example, when attaching to context switch, I want to
know whether the next task is under a cgroup. Having such a kfunc
would be very useful. It can also be used in task_iter programs.
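A minimal sketch of what that could look like, extending Yafang's
current-task variant above to take an explicit task (untested; the kfunc
name is illustrative):

__bpf_kfunc u64 bpf_task_cgroup_id_from_subsys(struct task_struct *task, int subsys)
{
	struct cgroup *cgroup;
	u64 id = 0;

	/* task_cgroup() needs RCU protection when @task is not current */
	rcu_read_lock();
	cgroup = task_cgroup(task, subsys);
	if (cgroup)
		id = cgroup_id(cgroup);
	rcu_read_unlock();
	return id;
}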
Hao
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-15 17:01 ` Michal Koutný
2023-09-15 17:31 ` Tejun Heo
@ 2023-09-17 7:19 ` Yafang Shao
2023-09-18 14:44 ` Michal Koutný
1 sibling, 1 reply; 27+ messages in thread
From: Yafang Shao @ 2023-09-17 7:19 UTC (permalink / raw)
To: Michal Koutný
Cc: Tejun Heo, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
On Sat, Sep 16, 2023 at 1:01 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Tue, Sep 12, 2023 at 11:30:32AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> > With the above changes, I think it can meet most use cases with BPF on cgroup1.
> > What do you think ?
>
> I think the presented use case of LSM hooks is better served by the
> default hierarchy (see also [1]).
Hi Michal,
The crucial issue at hand is not whether the LSM hooks are better
suited for the cgroup default hierarchy. What truly matters is the
effort and time required to migrate all cgroup1-based applications to
cgroup2-based ones. While transitioning a single component from
cgroup1-based to cgroup2-based is a straightforward task, the
complexity arises when multiple interdependent components in a
production environment necessitate this transition. In such cases, the
work becomes significantly challenging.
> Relying on a chosen subsys v1 hierarchy is not systematic. And extending
> ancestry checking on named v1 hierarchies seems backwards given
> the existence of the default hierarchy.
The cgroup becomes active only when it has one or more of its
controllers enabled. In a production environment, a task is invariably
governed by at least one cgroup controller. Even in hybrid cgroup
mode, a task is subject to either a cgroup1 controller or a cgroup2
controller. Our objective is to enhance BPF support for
controller-based scenarios, eliminating the need to concern ourselves
with hierarchies, whether they involve cgroup1 or cgroup2. This change
seems quite reasonable, in my opinion.
>
>
> Michal
>
> [1] https://docs.kernel.org/admin-guide/cgroup-v2.html#delegation-containment
--
Regards
Yafang
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-15 17:31 ` Tejun Heo
@ 2023-09-17 7:28 ` Yafang Shao
0 siblings, 0 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-17 7:28 UTC (permalink / raw)
To: Tejun Heo
Cc: Michal Koutný, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
On Sat, Sep 16, 2023 at 1:31 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, Sep 15, 2023 at 07:01:28PM +0200, Michal Koutný wrote:
> > Hello.
> >
> > On Tue, Sep 12, 2023 at 11:30:32AM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> > > With the above changes, I think it can meet most use cases with BPF on cgroup1.
> > > What do you think?
> >
> > I think the presented use case of LSM hooks is better served by the
> > default hierarchy (see also [1]).
> > Relying on a chosen subsys v1 hierarchy is not systematic. And extending
> > ancestry checking on named v1 hierarchies seems backwards given
> > the existence of the default hierarchy.
>
> Yeah, identifying cgroup1 hierarchies by subsys leaves out a pretty good
> chunk of use cases - e.g. systemd used to use a named hierarchy for primary
> process organization on cgroup1.
Systemd-managed tasks invariably have one or more cgroup controllers
enabled, as exemplified by entries like
"/sys/fs/cgroup/cpu/{system.slice, user.slice, XXX.service}".
Consequently, the presence of a cgroup controller can be employed as
an indicator to identify a systemd-managed task.
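For example, such a check can already be open coded in a tracing or LSM
prog via direct BTF pointer access. A minimal sketch (untested;
EXPECTED_CGRP_ID is a hypothetical placeholder for the kernfs id of the
allowed container's cpu cgroup, resolved by user space beforehand):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Hypothetical id of the allowed container's cpu cgroup. */
#define EXPECTED_CGRP_ID 1234

SEC("lsm/bprm_check_security")
int BPF_PROG(on_exec, struct linux_binprm *bprm)
{
	struct task_struct *task = bpf_get_current_task_btf();
	u64 id;

	/* Walk to the cgroup of the cpu controller; this assumes the
	 * cpu controller is mounted on a cgroup1 hierarchy, as in the
	 * systemd layout above.
	 */
	id = task->cgroups->subsys[cpu_cgrp_id]->cgroup->kn->id;
	if (id != EXPECTED_CGRP_ID)
		return -1;	/* -EPERM: deny the exec */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

The hardcoded cpu_cgrp_id is exactly the limitation discussed above,
which is what the proposed kfuncs would avoid.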
>
> Also, you don't have to switch to cgroup2 wholesale. You can just build the
> same hierarchy in cgroup2 for process organization and combine that with any
> cgroup1 hierarchies.
The challenge lies in adapting a multitude of applications to such a
setup; furthermore, not all of these applications are under our direct
management, which makes it a formidable task.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-15 18:57 ` Hao Luo
@ 2023-09-17 7:30 ` Yafang Shao
0 siblings, 0 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-17 7:30 UTC (permalink / raw)
To: Hao Luo
Cc: Tejun Heo, Alexei Starovoitov, Michal Koutný,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Jiri Olsa, Zefan Li,
Johannes Weiner, Yosry Ahmed, open list:CONTROL GROUP (CGROUP),
bpf
On Sat, Sep 16, 2023 at 2:57 AM Hao Luo <haoluo@google.com> wrote:
>
> On Mon, Sep 11, 2023 at 8:31 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Sep 12, 2023 at 4:24 AM Tejun Heo <tj@kernel.org> wrote:
> > >
> > > On Sun, Sep 10, 2023 at 11:17:48AM +0800, Yafang Shao wrote:
> > > > To acquire the cgroup_id, we can resort to open coding, as exemplified below:
> > > >
> > > > task = bpf_get_current_task_btf();
> > > > cgroups = task->cgroups;
> > > > cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
> > > > key = cgroup->kn->id;
> > >
> > > You can't hardcode it to a specific controller tree like that. You either
> > > stick with the fd-based interface or also need to add something to
> > > identify the specific cgroup1 tree.
> >
> > As pointed out by Alexei, I think we can introduce some
> > cgroup_id-based kfuncs which can work for both cgroup1 and cgroup2.
> >
> > Something as follows (untested),
> >
> > __bpf_kfunc u64 bpf_current_cgroup_id_from_subsys(int subsys)
> > {
> > 	struct cgroup *cgroup;
> >
> > 	cgroup = task_cgroup(current, subsys);
> > 	if (!cgroup)
> > 		return 0;
> > 	return cgroup_id(cgroup);
> > }
> >
>
> Can we also support checking arbitrary tasks, instead of just current?
> I often find myself needing to find the cgroup given only a
> task_struct. For example, when attaching to context switch, I want to
> know whether the next task is under a cgroup. Having such a kfunc
> would be very useful. It could also be used in task_iter programs.
Agree. Will do it.
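Something like the following (untested; the name
bpf_task_cgroup_id_from_subsys and its exact annotations are
illustrative only, and the kfunc would still need to be registered with
the verifier):

__bpf_kfunc u64 bpf_task_cgroup_id_from_subsys(struct task_struct *task,
					       int subsys)
{
	struct cgroup *cgroup;
	u64 id = 0;

	if (subsys < 0 || subsys >= CGROUP_SUBSYS_COUNT)
		return 0;

	/* Unlike the current-only variant, an arbitrary task may be
	 * passed in, so take the RCU read lock around task_cgroup().
	 */
	rcu_read_lock();
	cgroup = task_cgroup(task, subsys);
	if (cgroup)
		id = cgroup_id(cgroup);
	rcu_read_unlock();
	return id;
}

A sched_switch or task_iter prog could then call it on the next task,
e.g. bpf_task_cgroup_id_from_subsys(next, cpu_cgrp_id).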
--
Regards
Yafang
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1
2023-09-17 7:19 ` Yafang Shao
@ 2023-09-18 14:44 ` Michal Koutný
0 siblings, 0 replies; 27+ messages in thread
From: Michal Koutný @ 2023-09-18 14:44 UTC (permalink / raw)
To: Yafang Shao
Cc: Tejun Heo, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Zefan Li, Johannes Weiner,
Yosry Ahmed, open list:CONTROL GROUP (CGROUP), bpf
On Sun, Sep 17, 2023 at 03:19:06PM +0800, Yafang Shao <laoar.shao@gmail.com> wrote:
> The crucial issue at hand is not whether the LSM hooks are better
> suited for the cgroup default hierarchy. What truly matters is the
> effort and time required to migrate all cgroup1-based applications to
> cgroup2-based ones. While transitioning a single component from
> cgroup1-based to cgroup2-based is a straightforward task, the
> complexity arises when multiple interdependent components in a
> production environment necessitate this transition. In such cases, the
> work becomes significantly challenging.
systemd's hybrid mode is the approach that helps such combined
environments. (I understand it's not guaranteed with all container
runtimes, but FYI.) On v1-only deployments, BPF predicates currently
can't be used at all.
The transition is temporary, but the accompanying complexity in the
code would have to be kept around much longer.
> Our objective is to enhance BPF support for controller-based
> scenarios, eliminating the need to concern ourselves with hierarchies,
> whether they involve cgroup1 or cgroup2.
I'm posting some notes on this to the 1st patch.
Regards,
Michal
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
2023-09-06 19:53 ` Alexei Starovoitov
2023-09-06 20:13 ` Tejun Heo
@ 2023-09-18 14:45 ` Michal Koutný
2023-09-19 5:42 ` Yafang Shao
2 siblings, 1 reply; 27+ messages in thread
From: Michal Koutný @ 2023-09-18 14:45 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
>  static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
>  					       struct cgroup *ancestor)
>  {
>  	struct css_set *cset = task_css_set(task);
> +	struct cgroup *cgrp;
> +	bool ret = false;
> +	int ssid;
> +
> +	if (ancestor->root == &cgrp_dfl_root)
> +		return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> +
> +	for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
This loop would be better as an iteration over cset->cgrp_links, to handle
any v1 hierarchy (under css_set_lock :-/).
> +		if (!ancestor->subsys[ssid])
> +			continue;
>
> -	return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> +		cgrp = task_css(task, ssid)->cgroup;
Does this pass on a lockdep-enabled kernel?
See the conditions in task_css_set_check(); it seems at least the RCU read
lock would be needed (if not going through the cgrp_links mentioned above).
HTH,
Michal
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
2023-09-18 14:45 ` Michal Koutný
@ 2023-09-19 5:42 ` Yafang Shao
0 siblings, 0 replies; 27+ messages in thread
From: Yafang Shao @ 2023-09-19 5:42 UTC (permalink / raw)
To: Michal Koutný
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
yosryahmed, cgroups, bpf
On Mon, Sep 18, 2023 at 10:45 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Sun, Sep 03, 2023 at 02:27:56PM +0000, Yafang Shao <laoar.shao@gmail.com> wrote:
> >  static inline bool task_under_cgroup_hierarchy(struct task_struct *task,
> >  					       struct cgroup *ancestor)
> >  {
> >  	struct css_set *cset = task_css_set(task);
> > +	struct cgroup *cgrp;
> > +	bool ret = false;
> > +	int ssid;
> > +
> > +	if (ancestor->root == &cgrp_dfl_root)
> > +		return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> > +
> > +	for (ssid = 0; ssid < CGROUP_SUBSYS_COUNT; ssid++) {
>
> This loop would be better as an iteration over cset->cgrp_links, to handle
> any v1 hierarchy (under css_set_lock :-/).
Agree. That is better.
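A rough sketch of that version (untested; it would have to live in
kernel/cgroup/ rather than in the cgroup.h inline helper, since struct
cgrp_cset_link is defined in cgroup-internal.h):

	struct css_set *cset = task_css_set(task);
	struct cgrp_cset_link *link;
	bool ret = false;

	if (ancestor->root == &cgrp_dfl_root)
		return cgroup_is_descendant(cset->dfl_cgrp, ancestor);

	spin_lock_irq(&css_set_lock);
	list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
		/* Only the cgroup on the ancestor's hierarchy matters. */
		if (link->cgrp->root != ancestor->root)
			continue;
		ret = cgroup_is_descendant(link->cgrp, ancestor);
		break;
	}
	spin_unlock_irq(&css_set_lock);
	return ret;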
>
> > +		if (!ancestor->subsys[ssid])
> > +			continue;
> >
> > -	return cgroup_is_descendant(cset->dfl_cgrp, ancestor);
> > +		cgrp = task_css(task, ssid)->cgroup;
>
> Does this pass on a lockdep-enabled kernel?
Yes, it passes with lockdep enabled.
>
> See the conditions in task_css_set_check(); it seems at least the RCU read
> lock would be needed (if not going through the cgrp_links mentioned above).
All of its call sites are already under RCU protection, so we don't
need to take the RCU read lock explicitly here.
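If we want to make that requirement explicit, the helper could assert
it (untested sketch):

	WARN_ON_ONCE(!rcu_read_lock_held());

or simply rely on the lockdep conditions already encoded in
task_css_set_check().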
--
Regards
Yafang
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2023-09-19 5:43 UTC | newest]
Thread overview: 27+ messages
-- links below jump to the message on this page --
2023-09-03 14:27 [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 1/5] cgroup: Enable task_under_cgroup_hierarchy() " Yafang Shao
2023-09-06 19:53 ` Alexei Starovoitov
2023-09-06 20:13 ` Tejun Heo
2023-09-07 3:05 ` Yafang Shao
2023-09-11 20:27 ` Tejun Heo
2023-09-18 14:45 ` Michal Koutný
2023-09-19 5:42 ` Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 2/5] bpf: Enable cgroup_array map " Yafang Shao
2023-09-06 19:54 ` Alexei Starovoitov
2023-09-03 14:27 ` [RFC PATCH bpf-next 3/5] selftests/bpf: Fix issues in setup_classid_environment() Yafang Shao
2023-09-03 14:27 ` [RFC PATCH bpf-next 4/5] selftests/bpf: Add new cgroup helper open_classid() Yafang Shao
2023-09-03 14:28 ` [RFC PATCH bpf-next 5/5] selftests/bpf: Add selftests for current_under_cgroupv1v2 Yafang Shao
2023-09-07 14:41 ` [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1 Michal Koutný
2023-09-08 2:53 ` Yafang Shao
2023-09-08 18:09 ` Alexei Starovoitov
2023-09-10 3:17 ` Yafang Shao
2023-09-11 19:53 ` Alexei Starovoitov
2023-09-11 20:24 ` Tejun Heo
2023-09-12 3:30 ` Yafang Shao
2023-09-15 17:01 ` Michal Koutný
2023-09-15 17:31 ` Tejun Heo
2023-09-17 7:28 ` Yafang Shao
2023-09-17 7:19 ` Yafang Shao
2023-09-18 14:44 ` Michal Koutný
2023-09-15 18:57 ` Hao Luo
2023-09-17 7:30 ` Yafang Shao