* [PATCH bpf-next 0/4] bpf: Add bpf_iter_cpumask
@ 2023-12-22 11:30 Yafang Shao
2023-12-22 11:30 ` [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system Yafang Shao
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Yafang Shao @ 2023-12-22 11:30 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes
Cc: bpf, cgroups, Yafang Shao
Three new kfuncs, namely bpf_iter_cpumask_{new,next,destroy}, have been
added for the new bpf_iter_cpumask functionality. These kfuncs enable the
iteration of percpu data, such as runqueues, psi_group_cpu, and more.
Additionally, a new kfunc, bpf_cpumask_set_from_pid, has been introduced to
specify the cpumask for iteration. This function retrieves the cpumask from
a specific task, facilitating the iteration of percpu data associated with
these CPUs.
In our specific use case, we leverage the cgroup iterator to traverse
percpu data, subsequently exposing it to userspace through a seq file.
Refer to the test cases in patch #4 for further context and examples.
Moreover, this patchset incorporates a change in the cgroup subsystem,
ensuring consistent access to PSI for all cgroups via the struct cgroup.
Changes:
- bpf: Add new bpf helper bpf_for_each_cpu
https://lwn.net/ml/bpf/20230801142912.55078-1-laoar.shao@gmail.com/
Yafang Shao (4):
cgroup, psi: Init PSI of root cgroup to psi_system
bpf: Add bpf_iter_cpumask kfuncs
bpf: Add new kfunc bpf_cpumask_set_from_pid
selftests/bpf: Add selftests for cpumask iter
include/linux/psi.h | 2 +-
kernel/bpf/cpumask.c | 65 ++++++++++
kernel/cgroup/cgroup.c | 5 +-
.../selftests/bpf/prog_tests/cpumask_iter.c | 132 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/cpumask_common.h | 4 +
.../selftests/bpf/progs/test_cpumask_iter.c | 50 ++++++++
6 files changed, 256 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/cpumask_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/test_cpumask_iter.c
--
1.8.3.1
* [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
2023-12-22 11:30 [PATCH bpf-next 0/4] bpf: Add bpf_iter_cpumask Yafang Shao
@ 2023-12-22 11:30 ` Yafang Shao
2023-12-22 17:47 ` Tejun Heo
` (2 more replies)
2023-12-22 11:31 ` [PATCH bpf-next 2/4] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
` (2 subsequent siblings)
3 siblings, 3 replies; 13+ messages in thread
From: Yafang Shao @ 2023-12-22 11:30 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes
Cc: bpf, cgroups, Yafang Shao
By initializing the root cgroup's psi field to psi_system, we can
consistently obtain the psi information for all cgroups from the struct
cgroup.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/linux/psi.h | 2 +-
kernel/cgroup/cgroup.c | 5 ++++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/psi.h b/include/linux/psi.h
index e074587..8f2db51 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -34,7 +34,7 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
#ifdef CONFIG_CGROUPS
static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
{
- return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
+ return cgrp->psi;
}
int psi_cgroup_alloc(struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8f3cef1..07c7747 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -164,7 +164,10 @@ struct cgroup_subsys *cgroup_subsys[] = {
static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);
/* the default hierarchy */
-struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
+struct cgroup_root cgrp_dfl_root = {
+ .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu,
+ .cgrp.psi = &psi_system,
+};
EXPORT_SYMBOL_GPL(cgrp_dfl_root);
/*
--
1.8.3.1
* [PATCH bpf-next 2/4] bpf: Add bpf_iter_cpumask kfuncs
2023-12-22 11:30 [PATCH bpf-next 0/4] bpf: Add bpf_iter_cpumask Yafang Shao
2023-12-22 11:30 ` [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system Yafang Shao
@ 2023-12-22 11:31 ` Yafang Shao
2024-01-02 22:13 ` Andrii Nakryiko
2023-12-22 11:31 ` [PATCH bpf-next 3/4] bpf: Add new kfunc bpf_cpumask_set_from_pid Yafang Shao
2023-12-22 11:31 ` [PATCH bpf-next 4/4] selftests/bpf: Add selftests for cpumask iter Yafang Shao
3 siblings, 1 reply; 13+ messages in thread
From: Yafang Shao @ 2023-12-22 11:31 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes
Cc: bpf, cgroups, Yafang Shao
Add three new kfuncs for bpf_iter_cpumask.
- bpf_iter_cpumask_new
- bpf_iter_cpumask_next
- bpf_iter_cpumask_destroy
These new kfuncs facilitate the iteration of percpu data, such as
runqueues, psi_group_cpu, and more.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/bpf/cpumask.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index 2e73533..4ae07a4 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -422,6 +422,51 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
return cpumask_weight(cpumask);
}
+struct bpf_iter_cpumask {
+ __u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_cpumask_kern {
+ struct cpumask *mask;
+ int *cpu;
+} __aligned(8);
+
+__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, struct cpumask *mask)
+{
+ struct bpf_iter_cpumask_kern *kit = (void *)it;
+
+ kit->cpu = bpf_mem_alloc(&bpf_global_ma, sizeof(*kit->cpu));
+ if (!kit->cpu)
+ return -ENOMEM;
+
+ kit->mask = mask;
+ *kit->cpu = -1;
+ return 0;
+}
+
+__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
+{
+ struct bpf_iter_cpumask_kern *kit = (void *)it;
+ struct cpumask *mask = kit->mask;
+ int cpu;
+
+ cpu = cpumask_next(*kit->cpu, mask);
+ if (cpu >= nr_cpu_ids)
+ return NULL;
+
+ *kit->cpu = cpu;
+ return kit->cpu;
+}
+
+__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
+{
+ struct bpf_iter_cpumask_kern *kit = (void *)it;
+
+ if (!kit->cpu)
+ return;
+ bpf_mem_free(&bpf_global_ma, kit->cpu);
+}
+
__bpf_kfunc_end_defs();
BTF_SET8_START(cpumask_kfunc_btf_ids)
@@ -450,6 +495,9 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL | KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
BTF_SET8_END(cpumask_kfunc_btf_ids)
static const struct btf_kfunc_id_set cpumask_kfunc_set = {
--
1.8.3.1
* [PATCH bpf-next 3/4] bpf: Add new kfunc bpf_cpumask_set_from_pid
2023-12-22 11:30 [PATCH bpf-next 0/4] bpf: Add bpf_iter_cpumask Yafang Shao
2023-12-22 11:30 ` [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system Yafang Shao
2023-12-22 11:31 ` [PATCH bpf-next 2/4] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
@ 2023-12-22 11:31 ` Yafang Shao
2023-12-22 17:51 ` Tejun Heo
2023-12-22 11:31 ` [PATCH bpf-next 4/4] selftests/bpf: Add selftests for cpumask iter Yafang Shao
3 siblings, 1 reply; 13+ messages in thread
From: Yafang Shao @ 2023-12-22 11:31 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes
Cc: bpf, cgroups, Yafang Shao
Introduce a new kfunc, bpf_cpumask_set_from_pid, which retrieves the
cpumask associated with a specific PID. It is particularly useful in
container environments; for instance, it allows extracting the cpuset
of a container from the container's init task.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/bpf/cpumask.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index 4ae07a4..5755bb6 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -467,6 +467,22 @@ __bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
bpf_mem_free(&bpf_global_ma, kit->cpu);
}
+__bpf_kfunc bool bpf_cpumask_set_from_pid(struct cpumask *cpumask, u32 pid)
+{
+ struct task_struct *task;
+
+ if (!cpumask)
+ return false;
+
+ task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+ if (!task)
+ return false;
+
+ cpumask_copy(cpumask, task->cpus_ptr);
+ put_task_struct(task);
+ return true;
+}
+
__bpf_kfunc_end_defs();
BTF_SET8_START(cpumask_kfunc_btf_ids)
@@ -498,6 +514,7 @@ __bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL | KF_RCU)
BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_cpumask_set_from_pid, KF_RCU)
BTF_SET8_END(cpumask_kfunc_btf_ids)
static const struct btf_kfunc_id_set cpumask_kfunc_set = {
--
1.8.3.1
* [PATCH bpf-next 4/4] selftests/bpf: Add selftests for cpumask iter
2023-12-22 11:30 [PATCH bpf-next 0/4] bpf: Add bpf_iter_cpumask Yafang Shao
` (2 preceding siblings ...)
2023-12-22 11:31 ` [PATCH bpf-next 3/4] bpf: Add new kfunc bpf_cpumask_set_from_pid Yafang Shao
@ 2023-12-22 11:31 ` Yafang Shao
3 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2023-12-22 11:31 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes
Cc: bpf, cgroups, Yafang Shao
Within the BPF program, we leverage the cgroup iterator to iterate through
percpu runqueue data, specifically the 'nr_running' metric. Subsequently,
we expose this data to userspace by means of a seq file.
The CPU affinity for the cpumask is determined by the PID of a task:
- PID of the init task (PID 1)
We typically don't set CPU affinity for the init task, so we can iterate
across all possible CPUs. However, if you have set CPU affinity for the
init task, you should set the cpumask of your current task to full-F and
then iterate through all possible CPUs using the current task.
- PID of a task with defined CPU affinity
The aim here is to iterate through a specific cpumask. This scenario
aligns with tasks residing within a cpuset cgroup.
- Invalid PID (e.g., PID -1)
No cpumask is available in this case.
The results are as follows:
#62/1 cpumask_iter/init_pid:OK
#62/2 cpumask_iter/invalid_pid:OK
#62/3 cpumask_iter/self_pid_one_cpu:OK
#62/4 cpumask_iter/self_pid_multi_cpus:OK
#62 cpumask_iter:OK
Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
.../selftests/bpf/prog_tests/cpumask_iter.c | 132 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/cpumask_common.h | 4 +
.../selftests/bpf/progs/test_cpumask_iter.c | 50 ++++++++
3 files changed, 186 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/cpumask_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/test_cpumask_iter.c
diff --git a/tools/testing/selftests/bpf/prog_tests/cpumask_iter.c b/tools/testing/selftests/bpf/prog_tests/cpumask_iter.c
new file mode 100644
index 0000000..40556cf
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cpumask_iter.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <test_progs.h>
+#include "cgroup_helpers.h"
+#include "test_cpumask_iter.skel.h"
+
+static void verify_percpu_data(struct bpf_link *link, int nr_cpu_exp, int nr_running_exp)
+{
+ int iter_fd, len, item, nr_running, nr_cpus;
+ static char buf[128];
+ size_t left;
+ char *p;
+
+ iter_fd = bpf_iter_create(bpf_link__fd(link));
+ if (!ASSERT_GE(iter_fd, 0, "iter_fd"))
+ return;
+
+ memset(buf, 0, sizeof(buf));
+ left = ARRAY_SIZE(buf);
+ p = buf;
+ while ((len = read(iter_fd, p, left)) > 0) {
+ p += len;
+ left -= len;
+ }
+
+ item = sscanf(buf, "nr_running %u nr_cpus %u\n", &nr_running, &nr_cpus);
+ if (nr_cpu_exp == -1) {
+ ASSERT_EQ(item, -1, "seq_format");
+ goto out;
+ }
+
+ ASSERT_EQ(item, 2, "seq_format");
+ ASSERT_GE(nr_running, nr_running_exp, "nr_running");
+ ASSERT_EQ(nr_cpus, nr_cpu_exp, "nr_cpus");
+
+ /* read() after iter finishes should be ok. */
+ if (len == 0)
+ ASSERT_OK(read(iter_fd, buf, sizeof(buf)), "second_read");
+
+out:
+ close(iter_fd);
+}
+
+void test_cpumask_iter(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+ int nr_possible, cgrp_fd, pid, err, cnt, i;
+ struct test_cpumask_iter *skel = NULL;
+ union bpf_iter_link_info linfo;
+ int cpu_ids[] = {1, 3, 4, 5};
+ struct bpf_link *link;
+ cpu_set_t set;
+
+ skel = test_cpumask_iter__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "test_cpumask_iter__open_and_load"))
+ return;
+
+ if (setup_cgroup_environment())
+ goto destroy;
+
+ /* Utilize the cgroup iter */
+ cgrp_fd = get_root_cgroup();
+ if (!ASSERT_GE(cgrp_fd, 0, "create cgrp"))
+ goto cleanup;
+
+ memset(&linfo, 0, sizeof(linfo));
+ linfo.cgroup.cgroup_fd = cgrp_fd;
+ linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
+ opts.link_info = &linfo;
+ opts.link_info_len = sizeof(linfo);
+
+ link = bpf_program__attach_iter(skel->progs.cpu_cgroup, &opts);
+ if (!ASSERT_OK_PTR(link, "attach_iter"))
+ goto close_fd;
+
+ skel->bss->target_pid = 1;
+ /* In case the init task has CPU affinity set */
+ err = sched_getaffinity(1, sizeof(set), &set);
+ if (!ASSERT_OK(err, "getaffinity"))
+ goto close_fd;
+
+ cnt = CPU_COUNT(&set);
+ nr_possible = bpf_num_possible_cpus();
+ if (test__start_subtest("init_pid"))
+ /* The current task is running. */
+ verify_percpu_data(link, cnt, cnt == nr_possible ? 1 : 0);
+
+ skel->bss->target_pid = -1;
+ if (test__start_subtest("invalid_pid"))
+ verify_percpu_data(link, -1, -1);
+
+ pid = getpid();
+ skel->bss->target_pid = pid;
+ CPU_ZERO(&set);
+ CPU_SET(0, &set);
+ err = sched_setaffinity(pid, sizeof(set), &set);
+ if (!ASSERT_OK(err, "setaffinity"))
+ goto free_link;
+
+ if (test__start_subtest("self_pid_one_cpu"))
+ verify_percpu_data(link, 1, 1);
+
+ /* Assume there are at least 8 CPUs on the testbed */
+ if (nr_possible < 8)
+ goto free_link;
+
+ CPU_ZERO(&set);
+ /* Set the CPU affinity: 1,3-5 */
+ for (i = 0; i < ARRAY_SIZE(cpu_ids); i++)
+ CPU_SET(cpu_ids[i], &set);
+ err = sched_setaffinity(pid, sizeof(set), &set);
+ if (!ASSERT_OK(err, "setaffinity"))
+ goto free_link;
+
+ if (test__start_subtest("self_pid_multi_cpus"))
+ verify_percpu_data(link, ARRAY_SIZE(cpu_ids), 1);
+
+free_link:
+ bpf_link__destroy(link);
+close_fd:
+ close(cgrp_fd);
+cleanup:
+ cleanup_cgroup_environment();
+destroy:
+ test_cpumask_iter__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/cpumask_common.h b/tools/testing/selftests/bpf/progs/cpumask_common.h
index 0cd4aeb..5ebb136 100644
--- a/tools/testing/selftests/bpf/progs/cpumask_common.h
+++ b/tools/testing/selftests/bpf/progs/cpumask_common.h
@@ -55,6 +55,10 @@ void bpf_cpumask_xor(struct bpf_cpumask *cpumask,
u32 bpf_cpumask_any_distribute(const struct cpumask *src) __ksym;
u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1, const struct cpumask *src2) __ksym;
u32 bpf_cpumask_weight(const struct cpumask *cpumask) __ksym;
+u32 bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, struct cpumask *mask) __ksym;
+u32 *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it) __ksym;
+void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it) __ksym;
+bool bpf_cpumask_set_from_pid(struct cpumask *cpumask, u32 pid) __ksym;
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;
diff --git a/tools/testing/selftests/bpf/progs/test_cpumask_iter.c b/tools/testing/selftests/bpf/progs/test_cpumask_iter.c
new file mode 100644
index 0000000..d0cdb92
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_cpumask_iter.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "cpumask_common.h"
+
+extern const struct rq runqueues __ksym __weak;
+
+int target_pid;
+
+SEC("iter/cgroup")
+int BPF_PROG(cpu_cgroup, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+ u32 *cpu, nr_running = 0, nr_cpus = 0;
+ struct bpf_cpumask *mask;
+ struct rq *rq;
+ int ret;
+
+ /* epilogue */
+ if (cgrp == NULL)
+ return 0;
+
+ mask = bpf_cpumask_create();
+ if (!mask)
+ return 1;
+
+ ret = bpf_cpumask_set_from_pid(&mask->cpumask, target_pid);
+ if (!ret) {
+ bpf_cpumask_release(mask);
+ return 1;
+ }
+
+ bpf_for_each(cpumask, cpu, &mask->cpumask) {
+ rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, *cpu);
+ if (!rq)
+ continue;
+
+ nr_running += rq->nr_running;
+ nr_cpus += 1;
+ }
+ BPF_SEQ_PRINTF(meta->seq, "nr_running %u nr_cpus %u\n", nr_running, nr_cpus);
+
+ bpf_cpumask_release(mask);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
1.8.3.1
* Re: [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
2023-12-22 11:30 ` [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system Yafang Shao
@ 2023-12-22 17:47 ` Tejun Heo
2023-12-24 3:14 ` Yafang Shao
2023-12-22 23:49 ` kernel test robot
2023-12-23 7:26 ` kernel test robot
2 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2023-12-22 17:47 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
bpf, cgroups
Hello,
On Fri, Dec 22, 2023 at 11:30:59AM +0000, Yafang Shao wrote:
> By initializing the root cgroup's psi field to psi_system, we can
> consistently obtain the psi information for all cgroups from the struct
> cgroup.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> include/linux/psi.h | 2 +-
> kernel/cgroup/cgroup.c | 5 ++++-
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/psi.h b/include/linux/psi.h
> index e074587..8f2db51 100644
> --- a/include/linux/psi.h
> +++ b/include/linux/psi.h
> @@ -34,7 +34,7 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
> #ifdef CONFIG_CGROUPS
> static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
> {
> - return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
> + return cgrp->psi;
> }
How have you tested this change? Looking at the code there are other
references to psi_system, e.g. to show it under /proc/pressure/* and to
exempt it from CPU FULL accounting. I don't see how the above change would
be sufficient.
Thanks.
--
tejun
* Re: [PATCH bpf-next 3/4] bpf: Add new kfunc bpf_cpumask_set_from_pid
2023-12-22 11:31 ` [PATCH bpf-next 3/4] bpf: Add new kfunc bpf_cpumask_set_from_pid Yafang Shao
@ 2023-12-22 17:51 ` Tejun Heo
2023-12-24 3:05 ` Yafang Shao
0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2023-12-22 17:51 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
bpf, cgroups
Hello,
On Fri, Dec 22, 2023 at 11:31:01AM +0000, Yafang Shao wrote:
> Introduce a new kfunc, bpf_cpumask_set_from_pid, which retrieves the
> cpumask associated with a specific PID. It is particularly useful in
> container environments; for instance, it allows extracting the cpuset
> of a container from the container's init task.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
...
> +__bpf_kfunc bool bpf_cpumask_set_from_pid(struct cpumask *cpumask, u32 pid)
> +{
> + struct task_struct *task;
> +
> + if (!cpumask)
> + return false;
> +
> + task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> + if (!task)
> + return false;
> +
> + cpumask_copy(cpumask, task->cpus_ptr);
> + put_task_struct(task);
> + return true;
> +}
This seems awfully specific. Why is this necessary? Shouldn't the BPF prog
get the task and bpf_cpumask_copy() its ->cpus_ptr instead?
Thanks.
--
tejun
* Re: [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
2023-12-22 11:30 ` [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system Yafang Shao
2023-12-22 17:47 ` Tejun Heo
@ 2023-12-22 23:49 ` kernel test robot
2023-12-23 7:26 ` kernel test robot
2 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2023-12-22 23:49 UTC (permalink / raw)
To: Yafang Shao, ast, daniel, john.fastabend, andrii, martin.lau,
song, yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes
Cc: oe-kbuild-all, bpf, cgroups, Yafang Shao
Hi Yafang,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Yafang-Shao/cgroup-psi-Init-PSI-of-root-cgroup-to-psi_system/20231222-193221
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20231222113102.4148-2-laoar.shao%40gmail.com
patch subject: [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
config: s390-randconfig-r081-20231223 (https://download.01.org/0day-ci/archive/20231223/202312230748.92S9ML64-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231223/202312230748.92S9ML64-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312230748.92S9ML64-lkp@intel.com/
All errors (new ones prefixed by >>):
>> kernel/cgroup/cgroup.c:169:22: error: 'psi_system' undeclared here (not in a function)
169 | .cgrp.psi = &psi_system,
| ^~~~~~~~~~
vim +/psi_system +169 kernel/cgroup/cgroup.c
165
166 /* the default hierarchy */
167 struct cgroup_root cgrp_dfl_root = {
168 .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu,
> 169 .cgrp.psi = &psi_system,
170 };
171 EXPORT_SYMBOL_GPL(cgrp_dfl_root);
172
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
2023-12-22 11:30 ` [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system Yafang Shao
2023-12-22 17:47 ` Tejun Heo
2023-12-22 23:49 ` kernel test robot
@ 2023-12-23 7:26 ` kernel test robot
2 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2023-12-23 7:26 UTC (permalink / raw)
To: Yafang Shao, ast, daniel, john.fastabend, andrii, martin.lau,
song, yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x,
hannes
Cc: llvm, oe-kbuild-all, bpf, cgroups, Yafang Shao
Hi Yafang,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Yafang-Shao/cgroup-psi-Init-PSI-of-root-cgroup-to-psi_system/20231222-193221
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20231222113102.4148-2-laoar.shao%40gmail.com
patch subject: [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
config: arm-defconfig (https://download.01.org/0day-ci/archive/20231223/202312231522.VWy0LXXY-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231223/202312231522.VWy0LXXY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312231522.VWy0LXXY-lkp@intel.com/
All errors (new ones prefixed by >>):
>> kernel/cgroup/cgroup.c:169:15: error: use of undeclared identifier 'psi_system'
.cgrp.psi = &psi_system,
^
1 error generated.
vim +/psi_system +169 kernel/cgroup/cgroup.c
165
166 /* the default hierarchy */
167 struct cgroup_root cgrp_dfl_root = {
168 .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu,
> 169 .cgrp.psi = &psi_system,
170 };
171 EXPORT_SYMBOL_GPL(cgrp_dfl_root);
172
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH bpf-next 3/4] bpf: Add new kfunc bpf_cpumask_set_from_pid
2023-12-22 17:51 ` Tejun Heo
@ 2023-12-24 3:05 ` Yafang Shao
0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2023-12-24 3:05 UTC (permalink / raw)
To: Tejun Heo
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
bpf, cgroups
On Sat, Dec 23, 2023 at 1:51 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Fri, Dec 22, 2023 at 11:31:01AM +0000, Yafang Shao wrote:
> > Introduce a new kfunc, bpf_cpumask_set_from_pid, which retrieves the
> > cpumask associated with a specific PID. It is particularly useful in
> > container environments; for instance, it allows extracting the cpuset
> > of a container from the container's init task.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ...
> > +__bpf_kfunc bool bpf_cpumask_set_from_pid(struct cpumask *cpumask, u32 pid)
> > +{
> > + struct task_struct *task;
> > +
> > + if (!cpumask)
> > + return false;
> > +
> > + task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
> > + if (!task)
> > + return false;
> > +
> > + cpumask_copy(cpumask, task->cpus_ptr);
> > + put_task_struct(task);
> > + return true;
> > +}
>
> This seems awfully specific. Why is this necessary? Shouldn't the BPF prog
> get the task and bpf_cpumask_copy() its ->cpus_ptr instead?
Good point. I missed the bpf_cpumask_copy(). Will use it instead.
--
Regards
Yafang
* Re: [PATCH bpf-next 1/4] cgroup, psi: Init PSI of root cgroup to psi_system
2023-12-22 17:47 ` Tejun Heo
@ 2023-12-24 3:14 ` Yafang Shao
0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2023-12-24 3:14 UTC (permalink / raw)
To: Tejun Heo
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, lizefan.x, hannes,
bpf, cgroups
On Sat, Dec 23, 2023 at 1:47 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Fri, Dec 22, 2023 at 11:30:59AM +0000, Yafang Shao wrote:
> > By initializing the root cgroup's psi field to psi_system, we can
> > consistently obtain the psi information for all cgroups from the struct
> > cgroup.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> > include/linux/psi.h | 2 +-
> > kernel/cgroup/cgroup.c | 5 ++++-
> > 2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/psi.h b/include/linux/psi.h
> > index e074587..8f2db51 100644
> > --- a/include/linux/psi.h
> > +++ b/include/linux/psi.h
> > @@ -34,7 +34,7 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
> > #ifdef CONFIG_CGROUPS
> > static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
> > {
> > - return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi;
> > + return cgrp->psi;
> > }
>
> How have you tested this change? Looking at the code there are other
After making the change, I only validated the functionality of
root_cgrp->psi to ensure it works with this series, similar to the
selftests in the previous version [0]. However, building those selftests
requires clang-14+, so I refrained from including them in this version.

Regarding /proc/pressure/, I haven't yet verified whether the
adjustments are complete. I will analyze the potential impact on
/proc/pressure/* in the next phase.
[0]. https://lore.kernel.org/bpf/20230801142912.55078-4-laoar.shao@gmail.com/
> references to psi_system, e.g. to show it under /proc/pressure/* and to
> exempt it from CPU FULL accounting. I don't see how the above change would
> be sufficient.
Thanks for your suggestion.
--
Regards
Yafang
* Re: [PATCH bpf-next 2/4] bpf: Add bpf_iter_cpumask kfuncs
2023-12-22 11:31 ` [PATCH bpf-next 2/4] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
@ 2024-01-02 22:13 ` Andrii Nakryiko
2024-01-04 2:30 ` Yafang Shao
0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2024-01-02 22:13 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
bpf, cgroups
On Fri, Dec 22, 2023 at 3:31 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Add three new kfuncs for bpf_iter_cpumask.
> - bpf_iter_cpumask_new
> - bpf_iter_cpumask_next
> - bpf_iter_cpumask_destroy
>
> These new kfuncs facilitate the iteration of percpu data, such as
> runqueues, psi_group_cpu, and more.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> kernel/bpf/cpumask.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 48 insertions(+)
>
> diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> index 2e73533..4ae07a4 100644
> --- a/kernel/bpf/cpumask.c
> +++ b/kernel/bpf/cpumask.c
> @@ -422,6 +422,51 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> return cpumask_weight(cpumask);
> }
>
> +struct bpf_iter_cpumask {
> + __u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_cpumask_kern {
> + struct cpumask *mask;
> + int *cpu;
> +} __aligned(8);
> +
> +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, struct cpumask *mask)
> +{
> + struct bpf_iter_cpumask_kern *kit = (void *)it;
> +
> + kit->cpu = bpf_mem_alloc(&bpf_global_ma, sizeof(*kit->cpu));
why dynamic memory allocation of 4 bytes?... just have `int cpu;`
field in bpf_iter_cpumask_kern?
> + if (!kit->cpu)
> + return -ENOMEM;
> +
> + kit->mask = mask;
> + *kit->cpu = -1;
> + return 0;
> +}
> +
> +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> +{
> + struct bpf_iter_cpumask_kern *kit = (void *)it;
> + struct cpumask *mask = kit->mask;
> + int cpu;
> +
> + cpu = cpumask_next(*kit->cpu, mask);
> + if (cpu >= nr_cpu_ids)
> + return NULL;
> +
> + *kit->cpu = cpu;
> + return kit->cpu;
> +}
> +
> +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
> +{
> + struct bpf_iter_cpumask_kern *kit = (void *)it;
> +
> + if (!kit->cpu)
> + return;
> + bpf_mem_free(&bpf_global_ma, kit->cpu);
> +}
> +
> __bpf_kfunc_end_defs();
>
> BTF_SET8_START(cpumask_kfunc_btf_ids)
> @@ -450,6 +495,9 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
> BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
> BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
> +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
> +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL | KF_RCU)
> +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
> BTF_SET8_END(cpumask_kfunc_btf_ids)
>
> static const struct btf_kfunc_id_set cpumask_kfunc_set = {
> --
> 1.8.3.1
>
* Re: [PATCH bpf-next 2/4] bpf: Add bpf_iter_cpumask kfuncs
2024-01-02 22:13 ` Andrii Nakryiko
@ 2024-01-04 2:30 ` Yafang Shao
0 siblings, 0 replies; 13+ messages in thread
From: Yafang Shao @ 2024-01-04 2:30 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, lizefan.x, hannes,
bpf, cgroups
On Wed, Jan 3, 2024 at 6:13 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Dec 22, 2023 at 3:31 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Add three new kfuncs for bpf_iter_cpumask.
> > - bpf_iter_cpumask_new
> > - bpf_iter_cpumask_next
> > - bpf_iter_cpumask_destroy
> >
> > These new kfuncs facilitate the iteration of percpu data, such as
> > runqueues, psi_group_cpu, and more.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> > kernel/bpf/cpumask.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 48 insertions(+)
> >
> > diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> > index 2e73533..4ae07a4 100644
> > --- a/kernel/bpf/cpumask.c
> > +++ b/kernel/bpf/cpumask.c
> > @@ -422,6 +422,51 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> > return cpumask_weight(cpumask);
> > }
> >
> > +struct bpf_iter_cpumask {
> > + __u64 __opaque[2];
> > +} __aligned(8);
> > +
> > +struct bpf_iter_cpumask_kern {
> > + struct cpumask *mask;
> > + int *cpu;
> > +} __aligned(8);
> > +
> > +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, struct cpumask *mask)
> > +{
> > + struct bpf_iter_cpumask_kern *kit = (void *)it;
> > +
> > + kit->cpu = bpf_mem_alloc(&bpf_global_ma, sizeof(*kit->cpu));
>
> why dynamic memory allocation of 4 bytes?... just have `int cpu;`
> field in bpf_iter_cpumask_kern?
Will do it. Thanks for your suggestion.
--
Regards
Yafang