BPF List
* [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask
@ 2024-02-06  8:14 Yafang Shao
  2024-02-06  8:14 ` [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Yafang Shao @ 2024-02-06  8:14 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void
  Cc: bpf, Yafang Shao

Three new kfuncs, namely bpf_iter_cpumask_{new,next,destroy}, have been
added for the new bpf_iter_cpumask functionality. These kfuncs enable the
iteration of percpu data, such as runqueues, system_group_pcpu, and more.

In our specific use case, we leverage the cgroup iterator to traverse
percpu data, subsequently exposing it to userspace through a seq file.
Refer to the example in patch #2 for usage.

Changes:
- v5 -> v6:
  - Various improvements on the comments (Andrii)
  - Use a static function instead, as Kumar's patch[0] has been merged
    (Andrii, Yonghong)

[0]. https://lore.kernel.org/bpf/170719262630.31872.2248639771567354367.git-patchwork-notify@kernel.org

- v4 -> v5:
  - Use cpumask_size() instead of sizeof(struct cpumask) (David)
  - Refactor the selftests as suggested by David
  - Improve the doc as suggested by David
- v3 -> v4:
  - Use a copy of cpumask to avoid potential use-after-free (Yonghong)
  - Various code improvement in selftests (Yonghong)
- v2 -> v3:
  - Define KF_RCU_PROTECTED for bpf_iter_cpumask_new (Alexei)
  - Code improvement in selftests
  - Fix build error in selftest due to CONFIG_PSI=n
    reported by kernel test robot <lkp@intel.com>
- v1 -> v2: 
  - Avoid changing cgroup subsystem (Tejun)
  - Remove bpf_cpumask_set_from_pid(), and use bpf_cpumask_copy()
    instead (Tejun)
  - Use `int cpu;` field in bpf_iter_cpumask_kern (Andrii)

Yafang Shao (5):
  bpf: Add bpf_iter_cpumask kfuncs
  bpf, docs: Add document for cpumask iter
  selftests/bpf: Fix error checking for cpumask_success__load()
  selftests/bpf: Mark cpumask kfunc declarations as __weak
  selftests/bpf: Add selftests for cpumask iter

 Documentation/bpf/cpumasks.rst                |  60 +++++++
 kernel/bpf/cpumask.c                          |  79 +++++++++
 tools/testing/selftests/bpf/config            |   1 +
 .../selftests/bpf/prog_tests/cpumask.c        | 158 +++++++++++++++++-
 .../selftests/bpf/progs/cpumask_common.h      |  60 +++----
 .../bpf/progs/cpumask_iter_failure.c          |  99 +++++++++++
 .../bpf/progs/cpumask_iter_success.c          | 126 ++++++++++++++
 7 files changed, 552 insertions(+), 31 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/cpumask_iter_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/cpumask_iter_success.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs
  2024-02-06  8:14 [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Yafang Shao
@ 2024-02-06  8:14 ` Yafang Shao
  2024-02-07  1:06   ` Alexei Starovoitov
  2024-02-06  8:14 ` [PATCH v6 bpf-next 2/5] bpf, docs: Add document for cpumask iter Yafang Shao
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2024-02-06  8:14 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void
  Cc: bpf, Yafang Shao

Add three new kfuncs for bpf_iter_cpumask.
- bpf_iter_cpumask_new
  KF_RCU is set because the cpumask must be an RCU-trusted pointer,
  such as task->cpus_ptr.
- bpf_iter_cpumask_next
- bpf_iter_cpumask_destroy

These new kfuncs facilitate the iteration of percpu data, such as
runqueues, psi_cgroup_cpu, and more.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/cpumask.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index dad0fb1c8e87..ed6078cfa40e 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -422,6 +422,82 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
 	return cpumask_weight(cpumask);
 }
 
+struct bpf_iter_cpumask {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_cpumask_kern {
+	struct cpumask *mask;
+	int cpu;
+} __aligned(8);
+
+/**
+ * bpf_iter_cpumask_new() - Initialize a new CPU mask iterator for a given CPU mask
+ * @it: The new bpf_iter_cpumask to be created.
+ * @mask: The cpumask to be iterated over.
+ *
+ * This function initializes a new bpf_iter_cpumask structure for iterating over
+ * the specified CPU mask. It assigns the provided cpumask to the newly created
+ * bpf_iter_cpumask @it for subsequent iteration operations.
+ *
+ * Return: 0 on success, or -ENOMEM on failure.
+ */
+__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
+{
+	struct bpf_iter_cpumask_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
+		     __alignof__(struct bpf_iter_cpumask));
+
+	kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
+	if (!kit->mask)
+		return -ENOMEM;
+
+	cpumask_copy(kit->mask, mask);
+	kit->cpu = -1;
+	return 0;
+}
+
+/**
+ * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
+ * @it: The bpf_iter_cpumask
+ *
+ * This function returns a pointer to a number representing the ID of the
+ * next CPU in the cpumask. It allows sequential access to CPUs within the
+ * cpumask. If there are no further CPUs available, it returns NULL.
+ */
+__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
+{
+	struct bpf_iter_cpumask_kern *kit = (void *)it;
+	const struct cpumask *mask = kit->mask;
+	int cpu;
+
+	if (!mask)
+		return NULL;
+	cpu = cpumask_next(kit->cpu, mask);
+	if (cpu >= nr_cpu_ids)
+		return NULL;
+
+	kit->cpu = cpu;
+	return &kit->cpu;
+}
+
+/**
+ * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
+ * @it: The bpf_iter_cpumask to be destroyed.
+ *
+ * Destroy the resource associated with the bpf_iter_cpumask.
+ */
+__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
+{
+	struct bpf_iter_cpumask_kern *kit = (void *)it;
+
+	if (!kit->mask)
+		return;
+	bpf_mem_free(&bpf_global_ma, kit->mask);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(cpumask_kfunc_btf_ids)
@@ -450,6 +526,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
 BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
 BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
 BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(cpumask_kfunc_btf_ids)
 
 static const struct btf_kfunc_id_set cpumask_kfunc_set = {
-- 
2.39.1



* [PATCH v6 bpf-next 2/5] bpf, docs: Add document for cpumask iter
  2024-02-06  8:14 [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Yafang Shao
  2024-02-06  8:14 ` [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
@ 2024-02-06  8:14 ` Yafang Shao
  2024-02-06  8:14 ` [PATCH v6 bpf-next 3/5] selftests/bpf: Fix error checking for cpumask_success__load() Yafang Shao
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2024-02-06  8:14 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void
  Cc: bpf, Yafang Shao

Add documentation for the newly added cpumask iterator kfuncs.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 Documentation/bpf/cpumasks.rst | 60 ++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/Documentation/bpf/cpumasks.rst b/Documentation/bpf/cpumasks.rst
index b5d47a04da5d..5cedd719874c 100644
--- a/Documentation/bpf/cpumasks.rst
+++ b/Documentation/bpf/cpumasks.rst
@@ -372,6 +372,66 @@ used.
 .. _tools/testing/selftests/bpf/progs/cpumask_success.c:
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/cpumask_success.c
 
+3.3 cpumask iterator
+--------------------
+
+The cpumask iterator facilitates the iteration of per-CPU data, including
+runqueues, system_group_pcpu, and other such structures. To leverage the cpumask
+iterator, one can employ the bpf_for_each() macro.
+
+Here's an example illustrating how to determine the number of running tasks on
+each CPU.
+
+.. code-block:: c
+
+        /**
+         * Here's an example demonstrating the functionality of the cpumask iterator.
+         * We retrieve the cpumask associated with the specified pid, iterate through
+         * its elements, and ultimately expose per-CPU data to userspace through a
+         * seq file.
+         */
+        const struct rq runqueues __ksym __weak;
+        u32 target_pid;
+
+        SEC("iter/cgroup")
+        int BPF_PROG(cpu_cgroup, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+        {
+                u32 nr_running = 0, nr_cpus = 0, nr_null_rq = 0;
+                struct task_struct *p;
+                struct rq *rq;
+                int *cpu;
+
+                /* epilogue */
+                if (cgrp == NULL)
+                        return 0;
+
+                p = bpf_task_from_pid(target_pid);
+                if (!p)
+                        return 1;
+
+                BPF_SEQ_PRINTF(meta->seq, "%4s %s\n", "CPU", "nr_running");
+                bpf_for_each(cpumask, cpu, p->cpus_ptr) {
+                        rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, *cpu);
+                        if (!rq) {
+                                nr_null_rq += 1;
+                                continue;
+                        }
+                        nr_cpus += 1;
+
+                        if (!rq->nr_running)
+                                continue;
+
+                        nr_running += rq->nr_running;
+                        BPF_SEQ_PRINTF(meta->seq, "%4u %u\n", *cpu, rq->nr_running);
+                }
+                BPF_SEQ_PRINTF(meta->seq, "Summary: nr_cpus %u, nr_running %u, nr_null_rq %u\n",
+                               nr_cpus, nr_running, nr_null_rq);
+
+                bpf_task_release(p);
+                return 0;
+        }
+
+----
 
 4. Adding BPF cpumask kfuncs
 ============================
-- 
2.39.1



* [PATCH v6 bpf-next 3/5] selftests/bpf: Fix error checking for cpumask_success__load()
  2024-02-06  8:14 [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Yafang Shao
  2024-02-06  8:14 ` [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
  2024-02-06  8:14 ` [PATCH v6 bpf-next 2/5] bpf, docs: Add document for cpumask iter Yafang Shao
@ 2024-02-06  8:14 ` Yafang Shao
  2024-02-06  8:14 ` [PATCH v6 bpf-next 4/5] selftests/bpf: Mark cpumask kfunc declarations as __weak Yafang Shao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2024-02-06  8:14 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void
  Cc: bpf, Yafang Shao

Verify the return value of cpumask_success__load(). The previous code
re-checked the skeleton pointer, which is always non-NULL at that point,
so load failures went undetected.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: David Vernet <void@manifault.com>
---
 tools/testing/selftests/bpf/prog_tests/cpumask.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/cpumask.c b/tools/testing/selftests/bpf/prog_tests/cpumask.c
index c2e886399e3c..ecf89df78109 100644
--- a/tools/testing/selftests/bpf/prog_tests/cpumask.c
+++ b/tools/testing/selftests/bpf/prog_tests/cpumask.c
@@ -27,7 +27,7 @@ static void verify_success(const char *prog_name)
 	struct bpf_program *prog;
 	struct bpf_link *link = NULL;
 	pid_t child_pid;
-	int status;
+	int status, err;
 
 	skel = cpumask_success__open();
 	if (!ASSERT_OK_PTR(skel, "cpumask_success__open"))
@@ -36,8 +36,8 @@ static void verify_success(const char *prog_name)
 	skel->bss->pid = getpid();
 	skel->bss->nr_cpus = libbpf_num_possible_cpus();
 
-	cpumask_success__load(skel);
-	if (!ASSERT_OK_PTR(skel, "cpumask_success__load"))
+	err = cpumask_success__load(skel);
+	if (!ASSERT_OK(err, "cpumask_success__load"))
 		goto cleanup;
 
 	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
-- 
2.39.1



* [PATCH v6 bpf-next 4/5] selftests/bpf: Mark cpumask kfunc declarations as __weak
  2024-02-06  8:14 [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Yafang Shao
                   ` (2 preceding siblings ...)
  2024-02-06  8:14 ` [PATCH v6 bpf-next 3/5] selftests/bpf: Fix error checking for cpumask_success__load() Yafang Shao
@ 2024-02-06  8:14 ` Yafang Shao
  2024-02-06 15:48   ` Daniel Xu
  2024-02-06  8:14 ` [PATCH v6 bpf-next 5/5] selftests/bpf: Add selftests for cpumask iter Yafang Shao
  2024-02-08  0:18 ` [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Andrii Nakryiko
  5 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2024-02-06  8:14 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void
  Cc: bpf, Yafang Shao, Andrii Nakryiko, Daniel Xu

After the series "Annotate kfuncs in .BTF_ids section"[0], kfunc
declarations can be generated by bpftool. Mark the existing cpumask kfunc
declarations __weak so they don't conflict with the definitions that will
eventually come from vmlinux.h.

[0]. https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/progs/cpumask_common.h      | 57 ++++++++++---------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/cpumask_common.h b/tools/testing/selftests/bpf/progs/cpumask_common.h
index 0cd4aebb97cf..c705d8112a35 100644
--- a/tools/testing/selftests/bpf/progs/cpumask_common.h
+++ b/tools/testing/selftests/bpf/progs/cpumask_common.h
@@ -23,41 +23,42 @@ struct array_map {
 	__uint(max_entries, 1);
 } __cpumask_map SEC(".maps");
 
-struct bpf_cpumask *bpf_cpumask_create(void) __ksym;
-void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym;
-struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask) __ksym;
-u32 bpf_cpumask_first(const struct cpumask *cpumask) __ksym;
-u32 bpf_cpumask_first_zero(const struct cpumask *cpumask) __ksym;
+struct bpf_cpumask *bpf_cpumask_create(void) __ksym __weak;
+void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym __weak;
+struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask) __ksym __weak;
+u32 bpf_cpumask_first(const struct cpumask *cpumask) __ksym __weak;
+u32 bpf_cpumask_first_zero(const struct cpumask *cpumask) __ksym __weak;
 u32 bpf_cpumask_first_and(const struct cpumask *src1,
-			  const struct cpumask *src2) __ksym;
-void bpf_cpumask_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym;
-void bpf_cpumask_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym;
-bool bpf_cpumask_test_cpu(u32 cpu, const struct cpumask *cpumask) __ksym;
-bool bpf_cpumask_test_and_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym;
-bool bpf_cpumask_test_and_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym;
-void bpf_cpumask_setall(struct bpf_cpumask *cpumask) __ksym;
-void bpf_cpumask_clear(struct bpf_cpumask *cpumask) __ksym;
+			  const struct cpumask *src2) __ksym __weak;
+void bpf_cpumask_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym __weak;
+void bpf_cpumask_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym __weak;
+bool bpf_cpumask_test_cpu(u32 cpu, const struct cpumask *cpumask) __ksym __weak;
+bool bpf_cpumask_test_and_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym __weak;
+bool bpf_cpumask_test_and_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) __ksym __weak;
+void bpf_cpumask_setall(struct bpf_cpumask *cpumask) __ksym __weak;
+void bpf_cpumask_clear(struct bpf_cpumask *cpumask) __ksym __weak;
 bool bpf_cpumask_and(struct bpf_cpumask *cpumask,
 		     const struct cpumask *src1,
-		     const struct cpumask *src2) __ksym;
+		     const struct cpumask *src2) __ksym __weak;
 void bpf_cpumask_or(struct bpf_cpumask *cpumask,
 		    const struct cpumask *src1,
-		    const struct cpumask *src2) __ksym;
+		    const struct cpumask *src2) __ksym __weak;
 void bpf_cpumask_xor(struct bpf_cpumask *cpumask,
 		     const struct cpumask *src1,
-		     const struct cpumask *src2) __ksym;
-bool bpf_cpumask_equal(const struct cpumask *src1, const struct cpumask *src2) __ksym;
-bool bpf_cpumask_intersects(const struct cpumask *src1, const struct cpumask *src2) __ksym;
-bool bpf_cpumask_subset(const struct cpumask *src1, const struct cpumask *src2) __ksym;
-bool bpf_cpumask_empty(const struct cpumask *cpumask) __ksym;
-bool bpf_cpumask_full(const struct cpumask *cpumask) __ksym;
-void bpf_cpumask_copy(struct bpf_cpumask *dst, const struct cpumask *src) __ksym;
-u32 bpf_cpumask_any_distribute(const struct cpumask *src) __ksym;
-u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1, const struct cpumask *src2) __ksym;
-u32 bpf_cpumask_weight(const struct cpumask *cpumask) __ksym;
-
-void bpf_rcu_read_lock(void) __ksym;
-void bpf_rcu_read_unlock(void) __ksym;
+		     const struct cpumask *src2) __ksym __weak;
+bool bpf_cpumask_equal(const struct cpumask *src1, const struct cpumask *src2) __ksym __weak;
+bool bpf_cpumask_intersects(const struct cpumask *src1, const struct cpumask *src2) __ksym __weak;
+bool bpf_cpumask_subset(const struct cpumask *src1, const struct cpumask *src2) __ksym __weak;
+bool bpf_cpumask_empty(const struct cpumask *cpumask) __ksym __weak;
+bool bpf_cpumask_full(const struct cpumask *cpumask) __ksym __weak;
+void bpf_cpumask_copy(struct bpf_cpumask *dst, const struct cpumask *src) __ksym __weak;
+u32 bpf_cpumask_any_distribute(const struct cpumask *src) __ksym __weak;
+u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1,
+				   const struct cpumask *src2) __ksym __weak;
+u32 bpf_cpumask_weight(const struct cpumask *cpumask) __ksym __weak;
+
+void bpf_rcu_read_lock(void) __ksym __weak;
+void bpf_rcu_read_unlock(void) __ksym __weak;
 
 static inline const struct cpumask *cast(struct bpf_cpumask *cpumask)
 {
-- 
2.39.1



* [PATCH v6 bpf-next 5/5] selftests/bpf: Add selftests for cpumask iter
  2024-02-06  8:14 [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Yafang Shao
                   ` (3 preceding siblings ...)
  2024-02-06  8:14 ` [PATCH v6 bpf-next 4/5] selftests/bpf: Mark cpumask kfunc declarations as __weak Yafang Shao
@ 2024-02-06  8:14 ` Yafang Shao
  2024-02-08  0:18 ` [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Andrii Nakryiko
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2024-02-06  8:14 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void
  Cc: bpf, Yafang Shao

Add selftests for the newly added cpumask iter.
- cpumask_iter_success
  - The iteration visits the expected number of CPUs in the cpumask
  - Per-CPU data extracted from the percpu structs matches expectations
  - It works in both non-sleepable and sleepable progs
  - The RCU lock is only required by bpf_iter_cpumask_new()
  - It is fine to never call bpf_iter_cpumask_next()

- cpumask_iter_failure
  - The RCU lock is required in sleepable progs
  - The cpumask to be iterated over can't be NULL
  - bpf_iter_cpumask_destroy() is required after calling
    bpf_iter_cpumask_new()
  - bpf_iter_cpumask_destroy() can only destroy an initialized iter
  - bpf_iter_cpumask_next() must use an initialized iter

The test results are as follows:

  #64/37   cpumask/test_cpumask_iter:OK
  #64/38   cpumask/test_cpumask_iter_sleepable:OK
  #64/39   cpumask/test_cpumask_iter_sleepable:OK
  #64/40   cpumask/test_cpumask_iter_next_no_rcu:OK
  #64/41   cpumask/test_cpumask_iter_no_next:OK
  #64/42   cpumask/test_cpumask_iter:OK
  #64/43   cpumask/test_cpumask_iter_no_rcu:OK
  #64/44   cpumask/test_cpumask_iter_no_destroy:OK
  #64/45   cpumask/test_cpumask_iter_null_pointer:OK
  #64/46   cpumask/test_cpumask_iter_next_uninit:OK
  #64/47   cpumask/test_cpumask_iter_destroy_uninit:OK
  #64      cpumask:OK

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 tools/testing/selftests/bpf/config            |   1 +
 .../selftests/bpf/prog_tests/cpumask.c        | 152 ++++++++++++++++++
 .../selftests/bpf/progs/cpumask_common.h      |   3 +
 .../bpf/progs/cpumask_iter_failure.c          |  99 ++++++++++++
 .../bpf/progs/cpumask_iter_success.c          | 126 +++++++++++++++
 5 files changed, 381 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/cpumask_iter_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/cpumask_iter_success.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 01f241ea2c67..dd4b0935e35f 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -78,6 +78,7 @@ CONFIG_NF_CONNTRACK_MARK=y
 CONFIG_NF_DEFRAG_IPV4=y
 CONFIG_NF_DEFRAG_IPV6=y
 CONFIG_NF_NAT=y
+CONFIG_PSI=y
 CONFIG_RC_CORE=y
 CONFIG_SECURITY=y
 CONFIG_SECURITYFS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/cpumask.c b/tools/testing/selftests/bpf/prog_tests/cpumask.c
index ecf89df78109..78423745689c 100644
--- a/tools/testing/selftests/bpf/prog_tests/cpumask.c
+++ b/tools/testing/selftests/bpf/prog_tests/cpumask.c
@@ -1,9 +1,15 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
 
+#define _GNU_SOURCE
+#include <sched.h>
+
 #include <test_progs.h>
 #include "cpumask_failure.skel.h"
 #include "cpumask_success.skel.h"
+#include "cpumask_iter_success.skel.h"
+#include "cpumask_iter_failure.skel.h"
+#include "cgroup_helpers.h"
 
 static const char * const cpumask_success_testcases[] = {
 	"test_alloc_free_cpumask",
@@ -61,6 +67,142 @@ static void verify_success(const char *prog_name)
 	cpumask_success__destroy(skel);
 }
 
+static const char * const cpumask_iter_success_testcases[] = {
+	"test_cpumask_iter",
+	"test_cpumask_iter_sleepable",
+};
+
+static int read_percpu_data(struct bpf_link *link, int nr_cpu_exp, int nr_running_exp)
+{
+	int iter_fd, len, item, nr_running, psi_running, nr_cpus, err = -1;
+	char buf[128];
+	size_t left;
+	char *p;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "iter_fd"))
+		return -1;
+
+	memset(buf, 0, sizeof(buf));
+	left = ARRAY_SIZE(buf);
+	p = buf;
+	while ((len = read(iter_fd, p, left)) > 0) {
+		p += len;
+		left -= len;
+	}
+
+	item = sscanf(buf, "nr_running %u nr_cpus %u psi_running %u\n",
+		      &nr_running, &nr_cpus, &psi_running);
+	if (!ASSERT_EQ(item, 3, "seq_format"))
+		goto out;
+	if (!ASSERT_EQ(nr_cpus, nr_cpu_exp, "nr_cpus"))
+		goto out;
+	if (!ASSERT_GE(nr_running, nr_running_exp, "nr_running"))
+		goto out;
+	if (!ASSERT_GE(psi_running, nr_running_exp, "psi_running"))
+		goto out;
+
+	err = 0;
+out:
+	close(iter_fd);
+	return err;
+}
+
+static void verify_iter_success(const char *prog_name)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	int cgrp_fd, nr_cpus, err, i, chosen = 0;
+	struct cpumask_iter_success *skel;
+	union bpf_iter_link_info linfo;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+	cpu_set_t set;
+
+	if (setup_cgroup_environment())
+		return;
+
+	/* Utilize the cgroup iter */
+	cgrp_fd = get_root_cgroup();
+	if (!ASSERT_GE(cgrp_fd, 0, "create_cgrp"))
+		goto cleanup;
+
+	skel = cpumask_iter_success__open();
+	if (!ASSERT_OK_PTR(skel, "cpumask_iter_success__open"))
+		goto close_fd;
+
+	skel->bss->pid = getpid();
+	nr_cpus = libbpf_num_possible_cpus();
+
+	err = cpumask_iter_success__load(skel);
+	if (!ASSERT_OK(err, "cpumask_iter_success__load"))
+		goto destroy;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto destroy;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.cgroup.cgroup_fd = cgrp_fd;
+	linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	link = bpf_program__attach_iter(prog, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach"))
+		goto destroy;
+
+	/* Case 1): Enable all possible CPUs */
+	CPU_ZERO(&set);
+	for (i = 0; i < nr_cpus; i++)
+		CPU_SET(i, &set);
+	err = sched_setaffinity(skel->bss->pid, sizeof(set), &set);
+	if (!ASSERT_OK(err, "setaffinity_all_cpus"))
+		goto free_link;
+	err = read_percpu_data(link, nr_cpus, 1);
+	if (!ASSERT_OK(err, "read_percpu_data"))
+		goto free_link;
+	if (!ASSERT_OK(skel->bss->err, "null_rq"))
+		goto free_link;
+
+	/* Case 2): CPU0 only */
+	CPU_ZERO(&set);
+	CPU_SET(0, &set);
+	err = sched_setaffinity(skel->bss->pid, sizeof(set), &set);
+	if (!ASSERT_OK(err, "setaffinity_cpu0"))
+		goto free_link;
+	err = read_percpu_data(link, 1, 1);
+	if (!ASSERT_OK(err, "read_percpu_data"))
+		goto free_link;
+	if (!ASSERT_OK(skel->bss->err, "null_rq_psi"))
+		goto free_link;
+
+	/* Case 3): Partial CPUs */
+	CPU_ZERO(&set);
+	for (i = 0; i < nr_cpus; i++) {
+		if (i < 4 && i & 0x1)
+			continue;
+		if (i > 8 && i & 0x2)
+			continue;
+		CPU_SET(i, &set);
+		chosen++;
+	}
+	err = sched_setaffinity(skel->bss->pid, sizeof(set), &set);
+	if (!ASSERT_OK(err, "setaffinity_partial_cpus"))
+		goto free_link;
+	err = read_percpu_data(link, chosen, 1);
+	if (!ASSERT_OK(err, "read_percpu_data"))
+		goto free_link;
+	ASSERT_OK(skel->bss->err, "null_rq_psi");
+
+free_link:
+	bpf_link__destroy(link);
+destroy:
+	cpumask_iter_success__destroy(skel);
+close_fd:
+	close(cgrp_fd);
+cleanup:
+	cleanup_cgroup_environment();
+}
+
 void test_cpumask(void)
 {
 	int i;
@@ -74,4 +216,14 @@ void test_cpumask(void)
 
 	RUN_TESTS(cpumask_success);
 	RUN_TESTS(cpumask_failure);
+
+	for (i = 0; i < ARRAY_SIZE(cpumask_iter_success_testcases); i++) {
+		if (!test__start_subtest(cpumask_iter_success_testcases[i]))
+			continue;
+
+		verify_iter_success(cpumask_iter_success_testcases[i]);
+	}
+
+	RUN_TESTS(cpumask_iter_success);
+	RUN_TESTS(cpumask_iter_failure);
 }
diff --git a/tools/testing/selftests/bpf/progs/cpumask_common.h b/tools/testing/selftests/bpf/progs/cpumask_common.h
index c705d8112a35..fb65943ef130 100644
--- a/tools/testing/selftests/bpf/progs/cpumask_common.h
+++ b/tools/testing/selftests/bpf/progs/cpumask_common.h
@@ -56,6 +56,9 @@ u32 bpf_cpumask_any_distribute(const struct cpumask *src) __ksym __weak;
 u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1,
 				   const struct cpumask *src2) __ksym __weak;
 u32 bpf_cpumask_weight(const struct cpumask *cpumask) __ksym __weak;
+int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask) __ksym __weak;
+int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it) __ksym __weak;
+void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it) __ksym __weak;
 
 void bpf_rcu_read_lock(void) __ksym __weak;
 void bpf_rcu_read_unlock(void) __ksym __weak;
diff --git a/tools/testing/selftests/bpf/progs/cpumask_iter_failure.c b/tools/testing/selftests/bpf/progs/cpumask_iter_failure.c
new file mode 100644
index 000000000000..3d304cee0a72
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cpumask_iter_failure.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "bpf_misc.h"
+#include "task_kfunc_common.h"
+#include "cpumask_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("iter.s/cgroup")
+__failure __msg("R2 must be a rcu pointer")
+int BPF_PROG(test_cpumask_iter_no_rcu, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct task_struct *p;
+	int *cpu;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	bpf_for_each(cpumask, cpu, p->cpus_ptr) {
+	}
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__failure __msg("Possibly NULL pointer passed to trusted arg1")
+int BPF_PROG(test_cpumask_iter_null_pointer, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct cpumask *mask = NULL;
+	int *cpu;
+
+	bpf_for_each(cpumask, cpu, mask) {
+	}
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+__failure __msg("Unreleased reference id=3 alloc_insn=10")
+int BPF_PROG(test_cpumask_iter_no_destroy, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_cpumask it;
+	struct task_struct *p;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	bpf_rcu_read_lock();
+	bpf_iter_cpumask_new(&it, p->cpus_ptr);
+	bpf_rcu_read_unlock();
+
+	bpf_iter_cpumask_next(&it);
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__failure __msg("expected an initialized iter_cpumask as arg #1")
+int BPF_PROG(test_cpumask_iter_next_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_cpumask *it = NULL;
+
+	bpf_iter_cpumask_next(it);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__failure __msg("expected an initialized iter_cpumask as arg #1")
+int BPF_PROG(test_cpumask_iter_next_uninit2, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_cpumask it = {};
+
+	bpf_iter_cpumask_next(&it);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__failure __msg("expected an initialized iter_cpumask as arg #1")
+int BPF_PROG(test_cpumask_iter_destroy_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_cpumask_kern it = {.cpu = -1};
+	struct bpf_cpumask *mask;
+
+	mask = bpf_cpumask_create();
+	if (!mask)
+		return 1;
+
+	bpf_cpumask_setall(mask);
+	it.mask = &mask->cpumask;
+	bpf_iter_cpumask_destroy((struct bpf_iter_cpumask *)&it);
+	bpf_cpumask_release(mask);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/cpumask_iter_success.c b/tools/testing/selftests/bpf/progs/cpumask_iter_success.c
new file mode 100644
index 000000000000..a71db42cc38a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cpumask_iter_success.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "task_kfunc_common.h"
+#include "cpumask_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+extern const struct psi_group_cpu system_group_pcpu __ksym __weak;
+extern const struct rq runqueues __ksym __weak;
+
+int pid;
+
+static void read_percpu_data(struct bpf_iter_meta *meta, struct cgroup *cgrp, const cpumask_t *mask)
+{
+	u32 nr_running = 0, psi_nr_running = 0, nr_cpus = 0;
+	struct psi_group_cpu *groupc;
+	struct rq *rq;
+	int *cpu;
+
+	bpf_for_each(cpumask, cpu, mask) {
+		rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, *cpu);
+		if (!rq) {
+			err += 1;
+			continue;
+		}
+		nr_running += rq->nr_running;
+		nr_cpus += 1;
+
+		groupc = (struct psi_group_cpu *)bpf_per_cpu_ptr(&system_group_pcpu, *cpu);
+		if (!groupc) {
+			err += 1;
+			continue;
+		}
+		psi_nr_running += groupc->tasks[NR_RUNNING];
+	}
+	BPF_SEQ_PRINTF(meta->seq, "nr_running %u nr_cpus %u psi_running %u\n",
+		       nr_running, nr_cpus, psi_nr_running);
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(test_cpumask_iter_sleepable, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct task_struct *p;
+
+	/* epilogue */
+	if (!cgrp)
+		return 0;
+
+	bpf_rcu_read_lock();
+	p = bpf_task_from_pid(pid);
+	if (!p) {
+		bpf_rcu_read_unlock();
+		return 1;
+	}
+
+	read_percpu_data(meta, cgrp, p->cpus_ptr);
+	bpf_task_release(p);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("iter/cgroup")
+int BPF_PROG(test_cpumask_iter, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct task_struct *p;
+
+	/* epilogue */
+	if (!cgrp)
+		return 0;
+
+	p = bpf_task_from_pid(pid);
+	if (!p)
+		return 1;
+
+	read_percpu_data(meta, cgrp, p->cpus_ptr);
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(test_cpumask_iter_next_no_rcu, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_cpumask it;
+	struct task_struct *p;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	/* RCU is only required by bpf_iter_cpumask_new(). */
+	bpf_rcu_read_lock();
+	bpf_iter_cpumask_new(&it, p->cpus_ptr);
+	bpf_rcu_read_unlock();
+
+	bpf_iter_cpumask_next(&it);
+	bpf_iter_cpumask_destroy(&it);
+
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(test_cpumask_iter_no_next, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_cpumask it;
+	struct task_struct *p;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	bpf_rcu_read_lock();
+	bpf_iter_cpumask_new(&it, p->cpus_ptr);
+	bpf_rcu_read_unlock();
+
+	/* It is fine without calling bpf_iter_cpumask_next(). */
+
+	bpf_iter_cpumask_destroy(&it);
+	bpf_task_release(p);
+	return 0;
+}
-- 
2.39.1
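For context on the selftests above, `bpf_for_each(cpumask, cpu, mask)` is a convenience macro from the selftest headers; roughly (pseudocode, not the exact macro definition) it expands to the new/next/destroy triple:

```
/* Rough expansion of bpf_for_each(cpumask, cpu, mask) -- pseudocode only;
 * the real macro lives in the selftest headers and uses a cleanup attribute
 * so destroy runs on every exit path.
 */
struct bpf_iter_cpumask ___it;
int *cpu;

bpf_iter_cpumask_new(&___it, mask);          /* init: takes the cpumask    */
while ((cpu = bpf_iter_cpumask_next(&___it))) {
        /* ... loop body uses *cpu ... */    /* NULL return ends the loop  */
}
bpf_iter_cpumask_destroy(&___it);            /* releases iterator state    */
```

This is the standard open-coded-iterator shape the verifier expects: new() must run before next(), and destroy() must run on an initialized iterator, which is exactly what the failure-mode selftests above probe.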


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 bpf-next 4/5] selftests/bpf: Mark cpumask kfunc declarations as __weak
  2024-02-06  8:14 ` [PATCH v6 bpf-next 4/5] selftests/bpf: Mark cpumask kfunc declarations as __weak Yafang Shao
@ 2024-02-06 15:48   ` Daniel Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Xu @ 2024-02-06 15:48 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void, bpf,
	Andrii Nakryiko

On Tue, Feb 06, 2024 at 04:14:15PM +0800, Yafang Shao wrote:
> After the series "Annotate kfuncs in .BTF_ids section"[0], kfuncs can be
> generated from bpftool. Let's mark the existing cpumask kfunc declarations
> __weak so they don't conflict with definitions that will eventually come
> from vmlinux.h.
> 
> [0]. https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz
> 
> Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  .../selftests/bpf/progs/cpumask_common.h      | 57 ++++++++++---------
>  1 file changed, 29 insertions(+), 28 deletions(-)

Thanks!

Acked-by: Daniel Xu <dxu@dxuuu.xyz>

[..]


* Re: [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs
  2024-02-06  8:14 ` [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs Yafang Shao
@ 2024-02-07  1:06   ` Alexei Starovoitov
  2024-02-07  3:24     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2024-02-07  1:06 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Tejun Heo, David Vernet, bpf

On Tue, Feb 6, 2024 at 12:14 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Add three new kfuncs for bpf_iter_cpumask.
> - bpf_iter_cpumask_new
>   KF_RCU is defined because the cpumask must be an RCU-trusted pointer
>   such as task->cpus_ptr.
> - bpf_iter_cpumask_next
> - bpf_iter_cpumask_destroy
>
> These new kfuncs facilitate the iteration of percpu data, such as
> runqueues, psi_cgroup_cpu, and more.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/bpf/cpumask.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 79 insertions(+)
>
> diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> index dad0fb1c8e87..ed6078cfa40e 100644
> --- a/kernel/bpf/cpumask.c
> +++ b/kernel/bpf/cpumask.c
> @@ -422,6 +422,82 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
>         return cpumask_weight(cpumask);
>  }
>
> +struct bpf_iter_cpumask {
> +       __u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_cpumask_kern {
> +       struct cpumask *mask;
> +       int cpu;
> +} __aligned(8);
> +
> +/**
> + * bpf_iter_cpumask_new() - Initialize a new CPU mask iterator for a given CPU mask
> + * @it: The new bpf_iter_cpumask to be created.
> + * @mask: The cpumask to be iterated over.
> + *
> + * This function initializes a new bpf_iter_cpumask structure for iterating over
> + * the specified CPU mask. It assigns the provided cpumask to the newly created
> + * bpf_iter_cpumask @it for subsequent iteration operations.
> + *
> + * On success, 0 is returned. On failure, a negative error code is returned.
> + */
> +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> +{
> +       struct bpf_iter_cpumask_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> +                    __alignof__(struct bpf_iter_cpumask));
> +
> +       kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
> +       if (!kit->mask)
> +               return -ENOMEM;
> +
> +       cpumask_copy(kit->mask, mask);

Since it's mem_alloc plus memcpy, how about we make it more
generic? Instead of making it cpumask-specific, let's pass an arbitrary
"void *unsafe_addr, u32 size",
allocate that much, and probe_read_kernel() into the buffer?


> +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> +{
> +       struct bpf_iter_cpumask_kern *kit = (void *)it;
> +       const struct cpumask *mask = kit->mask;
> +       int cpu;
> +
> +       if (!mask)
> +               return NULL;
> +       cpu = cpumask_next(kit->cpu, mask);

Instead of cpumask_next(), call find_next_bit().

> +       if (cpu >= nr_cpu_ids)
> +               return NULL;

Instead of nr_cpu_ids, we can check against the size in bits of the copied bit array.

> BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)

KF_RCU is also not needed.
Such an iterator will be callable from anywhere and on any address.

wdyt?


* Re: [PATCH v6 bpf-next 1/5] bpf: Add bpf_iter_cpumask kfuncs
  2024-02-07  1:06   ` Alexei Starovoitov
@ 2024-02-07  3:24     ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2024-02-07  3:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eddy Z, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Tejun Heo, David Vernet, bpf

On Wed, Feb 7, 2024 at 9:06 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 6, 2024 at 12:14 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Add three new kfuncs for bpf_iter_cpumask.
> > - bpf_iter_cpumask_new
> >   KF_RCU is defined because the cpumask must be an RCU-trusted pointer
> >   such as task->cpus_ptr.
> > - bpf_iter_cpumask_next
> > - bpf_iter_cpumask_destroy
> >
> > These new kfuncs facilitate the iteration of percpu data, such as
> > runqueues, psi_cgroup_cpu, and more.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  kernel/bpf/cpumask.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 79 insertions(+)
> >
> > diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> > index dad0fb1c8e87..ed6078cfa40e 100644
> > --- a/kernel/bpf/cpumask.c
> > +++ b/kernel/bpf/cpumask.c
> > @@ -422,6 +422,82 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> >         return cpumask_weight(cpumask);
> >  }
> >
> > +struct bpf_iter_cpumask {
> > +       __u64 __opaque[2];
> > +} __aligned(8);
> > +
> > +struct bpf_iter_cpumask_kern {
> > +       struct cpumask *mask;
> > +       int cpu;
> > +} __aligned(8);
> > +
> > +/**
> > + * bpf_iter_cpumask_new() - Initialize a new CPU mask iterator for a given CPU mask
> > + * @it: The new bpf_iter_cpumask to be created.
> > + * @mask: The cpumask to be iterated over.
> > + *
> > + * This function initializes a new bpf_iter_cpumask structure for iterating over
> > + * the specified CPU mask. It assigns the provided cpumask to the newly created
> > + * bpf_iter_cpumask @it for subsequent iteration operations.
> > + *
> > + * On success, 0 is returned. On failure, a negative error code is returned.
> > + */
> > +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> > +{
> > +       struct bpf_iter_cpumask_kern *kit = (void *)it;
> > +
> > +       BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> > +                    __alignof__(struct bpf_iter_cpumask));
> > +
> > +       kit->mask = bpf_mem_alloc(&bpf_global_ma, cpumask_size());
> > +       if (!kit->mask)
> > +               return -ENOMEM;
> > +
> > +       cpumask_copy(kit->mask, mask);
>
> Since it's mem_alloc plus memcpy, how about we make it more
> generic? Instead of making it cpumask-specific, let's pass an arbitrary
> "void *unsafe_addr, u32 size",
> allocate that much, and probe_read_kernel() into the buffer?
>
>
> > +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> > +{
> > +       struct bpf_iter_cpumask_kern *kit = (void *)it;
> > +       const struct cpumask *mask = kit->mask;
> > +       int cpu;
> > +
> > +       if (!mask)
> > +               return NULL;
> > +       cpu = cpumask_next(kit->cpu, mask);
>
> Instead of cpumask_next(), call find_next_bit().
>
> > +       if (cpu >= nr_cpu_ids)
> > +               return NULL;
>
> Instead of nr_cpu_ids, we can check against the size in bits of the copied bit array.
>
> > BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
>
> KF_RCU is also not needed.
> Such an iterator will be callable from anywhere and on any address.
>
> wdyt?

Good suggestion. A more generic bitmap iterator is better. I will look into it.

-- 
Regards
Yafang


* Re: [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask
  2024-02-06  8:14 [PATCH v6 bpf-next 0/5] bpf: Add bpf_iter_cpumask Yafang Shao
                   ` (4 preceding siblings ...)
  2024-02-06  8:14 ` [PATCH v6 bpf-next 5/5] selftests/bpf: Add selftests for cpumask iter Yafang Shao
@ 2024-02-08  0:18 ` Andrii Nakryiko
  5 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-02-08  0:18 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, tj, void, bpf

On Tue, Feb 6, 2024 at 12:14 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Three new kfuncs, namely bpf_iter_cpumask_{new,next,destroy}, have been
> added for the new bpf_iter_cpumask functionality. These kfuncs enable the
> iteration of percpu data, such as runqueues, system_group_pcpu, and more.
>
> In our specific use case, we leverage the cgroup iterator to traverse
> percpu data, subsequently exposing it to userspace through a seq file.
> Refer to the example in patch #2 for usage.
>
> Changes:
> - v5 -> v6:
>   - Various improvements on the comments (Andrii)
>   - Use a static function instead, as Kumar's patch[0] has been merged.
>     (Andrii, Yonghong)
>
> [0]. https://lore.kernel.org/bpf/170719262630.31872.2248639771567354367.git-patchwork-notify@kernel.org
>
> - v4 -> v5:
>   - Use cpumask_size() instead of sizeof(struct cpumask) (David)
>   - Refactor the selftests as suggested by David
>   - Improve the doc as suggested by David
> - v3 -> v4:
>   - Use a copy of cpumask to avoid potential use-after-free (Yonghong)
>   - Various code improvement in selftests (Yonghong)
> - v2 -> v3:
>   - Define KF_RCU_PROTECTED for bpf_iter_cpumask_new (Alexei)
>   - Code improvement in selftests
>   - Fix build error in selftest due to CONFIG_PSI=n
>     reported by kernel test robot <lkp@intel.com>
> - v1 -> v2:
>   - Avoid changing cgroup subsystem (Tejun)
>   - Remove bpf_cpumask_set_from_pid(), and use bpf_cpumask_copy()
>     instead (Tejun)
>   - Use `int cpu;` field in bpf_iter_cpumask_kern (Andrii)
>
> Yafang Shao (5):
>   bpf: Add bpf_iter_cpumask kfuncs
>   bpf, docs: Add document for cpumask iter
>   selftests/bpf: Fix error checking for cpumask_success__load()
>   selftests/bpf: Mark cpumask kfunc declarations as __weak

I've applied patches 3 and 4, as they are general cleanups and
improvements independent of the actual iterator implementation.
Thanks!

>   selftests/bpf: Add selftests for cpumask iter
>
>  Documentation/bpf/cpumasks.rst                |  60 +++++++
>  kernel/bpf/cpumask.c                          |  79 +++++++++
>  tools/testing/selftests/bpf/config            |   1 +
>  .../selftests/bpf/prog_tests/cpumask.c        | 158 +++++++++++++++++-
>  .../selftests/bpf/progs/cpumask_common.h      |  60 +++----
>  .../bpf/progs/cpumask_iter_failure.c          |  99 +++++++++++
>  .../bpf/progs/cpumask_iter_success.c          | 126 ++++++++++++++
>  7 files changed, 552 insertions(+), 31 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/cpumask_iter_failure.c
>  create mode 100644 tools/testing/selftests/bpf/progs/cpumask_iter_success.c
>
> --
> 2.39.1
>
