* [PATCH v7 bpf-next 1/5] bpf: Don't explicitly emit BTF for struct btf_iter_num
2023-10-13 20:44 [PATCH v7 bpf-next 0/5] Open-coded task_vma iter Dave Marchevsky
@ 2023-10-13 20:44 ` Dave Marchevsky
2023-10-13 20:44 ` [PATCH v7 bpf-next 2/5] selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c Dave Marchevsky
From: Dave Marchevsky @ 2023-10-13 20:44 UTC
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Kernel Team, Dave Marchevsky, Yonghong Song
Commit 6018e1f407cc ("bpf: implement numbers iterator") added the
BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num
doesn't exist, so only a forward declaration is emitted in BTF:
FWD 'btf_iter_num' fwd_kind=struct
That commit was probably hoping to ensure that struct bpf_iter_num is
emitted in vmlinux BTF. A previous version of this patch changed the
line to emit the correct type, but Yonghong confirmed that it would
definitely be emitted regardless in [0], so this patch simply removes
the line.
This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't
causing any issues that I noticed, aside from mild confusion when I
looked through the code.
[0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/bpf_iter.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 96856f130cbf..833faa04461b 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -793,8 +793,6 @@ __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter_num));
BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter_num));
- BTF_TYPE_EMIT(struct btf_iter_num);
-
/* start == end is legit, it's an empty range and we'll just get NULL
* on first (and any subsequent) bpf_iter_num_next() call
*/
--
2.34.1
* [PATCH v7 bpf-next 2/5] selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
From: Dave Marchevsky @ 2023-10-13 20:44 UTC
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Kernel Team, Dave Marchevsky
Further patches in this series will add a kernel struct bpf_iter_task_vma,
whose name collides with the selftest prog's struct bpf_iter_task_vma.
Rename the selftest to avoid the collision.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
.../selftests/bpf/prog_tests/bpf_iter.c | 26 +++++++++----------
...f_iter_task_vma.c => bpf_iter_task_vmas.c} | 0
2 files changed, 13 insertions(+), 13 deletions(-)
rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 1f02168103dd..41aba139b20b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -10,7 +10,7 @@
#include "bpf_iter_task.skel.h"
#include "bpf_iter_task_stack.skel.h"
#include "bpf_iter_task_file.skel.h"
-#include "bpf_iter_task_vma.skel.h"
+#include "bpf_iter_task_vmas.skel.h"
#include "bpf_iter_task_btf.skel.h"
#include "bpf_iter_tcp4.skel.h"
#include "bpf_iter_tcp6.skel.h"
@@ -1399,19 +1399,19 @@ static void str_strip_first_line(char *str)
static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
{
int err, iter_fd = -1, proc_maps_fd = -1;
- struct bpf_iter_task_vma *skel;
+ struct bpf_iter_task_vmas *skel;
int len, read_size = 4;
char maps_path[64];
- skel = bpf_iter_task_vma__open();
- if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open"))
+ skel = bpf_iter_task_vmas__open();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vmas__open"))
return;
skel->bss->pid = getpid();
skel->bss->one_task = opts ? 1 : 0;
- err = bpf_iter_task_vma__load(skel);
- if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
+ err = bpf_iter_task_vmas__load(skel);
+ if (!ASSERT_OK(err, "bpf_iter_task_vmas__load"))
goto out;
skel->links.proc_maps = bpf_program__attach_iter(
@@ -1462,25 +1462,25 @@ static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
out:
close(proc_maps_fd);
close(iter_fd);
- bpf_iter_task_vma__destroy(skel);
+ bpf_iter_task_vmas__destroy(skel);
}
static void test_task_vma_dead_task(void)
{
- struct bpf_iter_task_vma *skel;
+ struct bpf_iter_task_vmas *skel;
int wstatus, child_pid = -1;
time_t start_tm, cur_tm;
int err, iter_fd = -1;
int wait_sec = 3;
- skel = bpf_iter_task_vma__open();
- if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open"))
+ skel = bpf_iter_task_vmas__open();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vmas__open"))
return;
skel->bss->pid = getpid();
- err = bpf_iter_task_vma__load(skel);
- if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
+ err = bpf_iter_task_vmas__load(skel);
+ if (!ASSERT_OK(err, "bpf_iter_task_vmas__load"))
goto out;
skel->links.proc_maps = bpf_program__attach_iter(
@@ -1533,7 +1533,7 @@ static void test_task_vma_dead_task(void)
out:
waitpid(child_pid, &wstatus, 0);
close(iter_fd);
- bpf_iter_task_vma__destroy(skel);
+ bpf_iter_task_vmas__destroy(skel);
}
void test_bpf_sockmap_map_iter_fd(void)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
rename to tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c
--
2.34.1
* [PATCH v7 bpf-next 3/5] bpf: Introduce task_vma open-coded iterator kfuncs
From: Dave Marchevsky @ 2023-10-13 20:44 UTC
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Kernel Team, Dave Marchevsky,
Nathan Slingerland
This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task_vma in open-coded
iterator style. BPF programs can use these kfuncs directly or through
bpf_for_each macro for natural-looking iteration of all task vmas.
The implementation borrows heavily from the bpf_find_vma helper's locking
logic, differing only in that these kfuncs hold the mmap_read lock for the
entire iteration, while the helper takes and drops it around a single
callback invocation on at most one vma. Aside from locking, struct
vma_iterator and vma_next do all the heavy lifting.
A pointer to an inner data struct, struct bpf_iter_task_vma_kern_data, is
the only field in struct bpf_iter_task_vma_kern. This is because the inner
data struct contains a struct vma_iterator (not a pointer), whose size is
likely to change under us. If bpf_iter_task_vma_kern contained the
vma_iterator directly, any such change would also change the size of the
opaque bpf_iter_task_vma struct. So it's better to allocate the inner data
with the BPF allocator, and since that allocation must succeed anyway, we
might as well put all iterator state there, thereby freezing the size of
struct bpf_iter_task_vma.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Nathan Slingerland <slinger@meta.com>
---
kernel/bpf/helpers.c | 3 ++
kernel/bpf/task_iter.c | 91 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 94 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d2840dd5b00d..62a53ebfedf9 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2552,6 +2552,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW | KF_RCU)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 7473068ed313..fef17628341f 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -7,7 +7,9 @@
#include <linux/fs.h>
#include <linux/fdtable.h>
#include <linux/filter.h>
+#include <linux/bpf_mem_alloc.h>
#include <linux/btf_ids.h>
+#include <linux/mm_types.h>
#include "mmap_unlock_work.h"
static const char * const iter_task_type_names[] = {
@@ -803,6 +805,95 @@ const struct bpf_func_proto bpf_find_vma_proto = {
.arg5_type = ARG_ANYTHING,
};
+struct bpf_iter_task_vma_kern_data {
+ struct task_struct *task;
+ struct mm_struct *mm;
+ struct mmap_unlock_irq_work *work;
+ struct vma_iterator vmi;
+};
+
+struct bpf_iter_task_vma {
+ /* opaque iterator state; having __u64 here allows to preserve correct
+ * alignment requirements in vmlinux.h, generated from BTF
+ */
+ __u64 __opaque[1];
+} __attribute__((aligned(8)));
+
+/* Non-opaque version of bpf_iter_task_vma */
+struct bpf_iter_task_vma_kern {
+ struct bpf_iter_task_vma_kern_data *data;
+} __attribute__((aligned(8)));
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+ "Global functions as their definitions will be in vmlinux BTF");
+
+__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
+ struct task_struct *task, u64 addr)
+{
+ struct bpf_iter_task_vma_kern *kit = (void *)it;
+ bool irq_work_busy = false;
+ int err;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
+
+ /* is_iter_reg_valid_uninit guarantees that kit hasn't been initialized
+ * before, so non-NULL kit->data doesn't point to previously
+ * bpf_mem_alloc'd bpf_iter_task_vma_kern_data
+ */
+ kit->data = bpf_mem_alloc(&bpf_global_ma, sizeof(struct bpf_iter_task_vma_kern_data));
+ if (!kit->data)
+ return -ENOMEM;
+
+ kit->data->task = get_task_struct(task);
+ kit->data->mm = task->mm;
+ if (!kit->data->mm) {
+ err = -ENOENT;
+ goto err_cleanup_iter;
+ }
+
+ /* kit->data->work == NULL is valid after bpf_mmap_unlock_get_irq_work */
+ irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work);
+ if (irq_work_busy || !mmap_read_trylock(kit->data->mm)) {
+ err = -EBUSY;
+ goto err_cleanup_iter;
+ }
+
+ vma_iter_init(&kit->data->vmi, kit->data->mm, addr);
+ return 0;
+
+err_cleanup_iter:
+ if (kit->data->task)
+ put_task_struct(kit->data->task);
+ bpf_mem_free(&bpf_global_ma, kit->data);
+ /* NULL kit->data signals failed bpf_iter_task_vma initialization */
+ kit->data = NULL;
+ return err;
+}
+
+__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
+{
+ struct bpf_iter_task_vma_kern *kit = (void *)it;
+
+ if (!kit->data) /* bpf_iter_task_vma_new failed */
+ return NULL;
+ return vma_next(&kit->data->vmi);
+}
+
+__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
+{
+ struct bpf_iter_task_vma_kern *kit = (void *)it;
+
+ if (kit->data) {
+ bpf_mmap_unlock_mm(kit->data->work, kit->data->mm);
+ put_task_struct(kit->data->task);
+ bpf_mem_free(&bpf_global_ma, kit->data);
+ }
+}
+
+__diag_pop();
+
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
static void do_mmap_read_unlock(struct irq_work *entry)
--
2.34.1
* [PATCH v7 bpf-next 4/5] selftests/bpf: Add tests for open-coded task_vma iter
From: Dave Marchevsky @ 2023-10-13 20:44 UTC
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Kernel Team, Dave Marchevsky
The open-coded task_vma iter added earlier in this series allows for
natural iteration over a task's vmas using existing open-coded iter
infrastructure, specifically bpf_for_each.
This patch adds a test demonstrating this pattern and validating
correctness. The vma->vm_start and vma->vm_end addresses of the first
1000 vmas are recorded and compared to /proc/PID/maps output. As
expected, both see the same vmas and addresses - with the exception of
the [vsyscall] vma - which is explained in a comment in the prog_tests
program.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
.../testing/selftests/bpf/bpf_experimental.h | 8 +++
.../testing/selftests/bpf/prog_tests/iters.c | 58 +++++++++++++++++++
.../selftests/bpf/progs/iters_task_vma.c | 43 ++++++++++++++
3 files changed, 109 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 9aa29564bd74..2c8cb3f61529 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -159,6 +159,14 @@ extern void *bpf_percpu_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
*/
extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym;
+struct bpf_iter_task_vma;
+
+extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
+ struct task_struct *task,
+ unsigned long addr) __ksym;
+extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __ksym;
+extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym;
+
/* Convenience macro to wrap over bpf_obj_drop_impl */
#define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL)
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 10804ae5ae97..b696873c5455 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -8,6 +8,7 @@
#include "iters_looping.skel.h"
#include "iters_num.skel.h"
#include "iters_testmod_seq.skel.h"
+#include "iters_task_vma.skel.h"
static void subtest_num_iters(void)
{
@@ -90,6 +91,61 @@ static void subtest_testmod_seq_iters(void)
iters_testmod_seq__destroy(skel);
}
+static void subtest_task_vma_iters(void)
+{
+ unsigned long start, end, bpf_iter_start, bpf_iter_end;
+ struct iters_task_vma *skel;
+ char rest_of_line[1000];
+ unsigned int seen;
+ FILE *f = NULL;
+ int err;
+
+ skel = iters_task_vma__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+ return;
+
+ skel->bss->target_pid = getpid();
+
+ err = iters_task_vma__attach(skel);
+ if (!ASSERT_OK(err, "skel_attach"))
+ goto cleanup;
+
+ getpgid(skel->bss->target_pid);
+ iters_task_vma__detach(skel);
+
+ if (!ASSERT_GT(skel->bss->vmas_seen, 0, "vmas_seen_gt_zero"))
+ goto cleanup;
+
+ f = fopen("/proc/self/maps", "r");
+ if (!ASSERT_OK_PTR(f, "proc_maps_fopen"))
+ goto cleanup;
+
+ seen = 0;
+ while (fscanf(f, "%lx-%lx %[^\n]\n", &start, &end, rest_of_line) == 3) {
+ /* [vsyscall] vma isn't _really_ part of task->mm vmas.
+ * /proc/PID/maps returns it when out of vmas - see get_gate_vma
+ * calls in fs/proc/task_mmu.c
+ */
+ if (strstr(rest_of_line, "[vsyscall]"))
+ continue;
+
+ bpf_iter_start = skel->bss->vm_ranges[seen].vm_start;
+ bpf_iter_end = skel->bss->vm_ranges[seen].vm_end;
+
+ ASSERT_EQ(bpf_iter_start, start, "vma->vm_start match");
+ ASSERT_EQ(bpf_iter_end, end, "vma->vm_end match");
+ seen++;
+ }
+
+ if (!ASSERT_EQ(skel->bss->vmas_seen, seen, "vmas_seen_eq"))
+ goto cleanup;
+
+cleanup:
+ if (f)
+ fclose(f);
+ iters_task_vma__destroy(skel);
+}
+
void test_iters(void)
{
RUN_TESTS(iters_state_safety);
@@ -103,4 +159,6 @@ void test_iters(void)
subtest_num_iters();
if (test__start_subtest("testmod_seq"))
subtest_testmod_seq_iters();
+ if (test__start_subtest("task_vma"))
+ subtest_task_vma_iters();
}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma.c b/tools/testing/selftests/bpf/progs/iters_task_vma.c
new file mode 100644
index 000000000000..44edecfdfaee
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_vma.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include "bpf_experimental.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+pid_t target_pid = 0;
+unsigned int vmas_seen = 0;
+
+struct {
+ __u64 vm_start;
+ __u64 vm_end;
+} vm_ranges[1000];
+
+SEC("raw_tp/sys_enter")
+int iter_task_vma_for_each(const void *ctx)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+ struct vm_area_struct *vma;
+ unsigned int seen = 0;
+
+ if (task->pid != target_pid)
+ return 0;
+
+ if (vmas_seen)
+ return 0;
+
+ bpf_for_each(task_vma, vma, task, 0) {
+ if (seen >= 1000)
+ break;
+
+ vm_ranges[seen].vm_start = vma->vm_start;
+ vm_ranges[seen].vm_end = vma->vm_end;
+ seen++;
+ }
+
+ vmas_seen = seen;
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
* [PATCH v7 bpf-next 5/5] bpf: Add BPF_KFUNC_{START,END}_DEFS macros
From: Dave Marchevsky @ 2023-10-13 20:44 UTC
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Kernel Team, Dave Marchevsky
BPF kfuncs are meant to be called from BPF programs. Accordingly, most
kfuncs are not called from anywhere in the kernel, which the
-Wmissing-prototypes warning is unhappy about. We've peppered
__diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are
defined in the codebase to suppress this warning.
This patch adds two macros meant to bracket one or more kfunc definitions.
BPF_KFUNC_START_DEFS currently expands to __diag_push + the
aforementioned __diag_ignore_all, and BPF_KFUNC_END_DEFS to the
corresponding __diag_pop. All existing kfunc definitions which use these
__diag calls to suppress -Wmissing-prototypes are migrated to use the
newly-introduced macros.
In the future we might need to ignore different warnings or do other
kfunc-specific things. This nonfunctional change will make it easier to
make such modifications for all kfunc defs.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
---
include/linux/btf.h | 7 +++++++
kernel/bpf/bpf_iter.c | 6 ++----
kernel/bpf/cpumask.c | 6 ++----
kernel/bpf/helpers.c | 6 ++----
kernel/bpf/map_iter.c | 6 ++----
kernel/bpf/task_iter.c | 6 ++----
kernel/trace/bpf_trace.c | 6 ++----
net/bpf/test_run.c | 7 +++----
net/core/filter.c | 13 ++++---------
net/core/xdp.c | 6 ++----
net/ipv4/fou_bpf.c | 6 ++----
net/netfilter/nf_conntrack_bpf.c | 6 ++----
net/netfilter/nf_nat_bpf.c | 6 ++----
net/xfrm/xfrm_interface_bpf.c | 6 ++----
14 files changed, 36 insertions(+), 57 deletions(-)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 928113a80a95..249d3d2351ad 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -83,6 +83,13 @@
*/
#define __bpf_kfunc __used noinline
+#define BPF_KFUNC_START_DEFS \
+ __diag_push(); \
+ __diag_ignore_all("-Wmissing-prototypes", \
+ "Global kfuncs as their definitions will be in BTF");
+
+#define BPF_KFUNC_END_DEFS __diag_pop();
+
/*
* Return the name of the passed struct, if exists, or halt the build if for
* example the structure gets renamed. In this way, developers have to revisit
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 833faa04461b..da165abe8490 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -782,9 +782,7 @@ struct bpf_iter_num_kern {
int end; /* final value, exclusive */
} __aligned(8);
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
__bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
{
@@ -843,4 +841,4 @@ __bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it)
s->cur = s->end = 0;
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
index 6983af8e093c..34e9f2cfa211 100644
--- a/kernel/bpf/cpumask.c
+++ b/kernel/bpf/cpumask.c
@@ -34,9 +34,7 @@ static bool cpu_valid(u32 cpu)
return cpu < nr_cpu_ids;
}
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global kfuncs as their definitions will be in BTF");
+BPF_KFUNC_START_DEFS
/**
* bpf_cpumask_create() - Create a mutable BPF cpumask.
@@ -407,7 +405,7 @@ __bpf_kfunc u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1,
return cpumask_any_and_distribute(src1, src2);
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
BTF_SET8_START(cpumask_kfunc_btf_ids)
BTF_ID_FLAGS(func, bpf_cpumask_create, KF_ACQUIRE | KF_RET_NULL)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 62a53ebfedf9..0d86b7939f0b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1888,9 +1888,7 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
}
}
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
__bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
{
@@ -2496,7 +2494,7 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n");
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
BTF_SET8_START(generic_btf_ids)
#ifdef CONFIG_KEXEC_CORE
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index 6fc9dae9edc8..eb8848fdae87 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -193,9 +193,7 @@ static int __init bpf_map_iter_init(void)
late_initcall(bpf_map_iter_init);
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
__bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map)
{
@@ -213,7 +211,7 @@ __bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map)
return ret;
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
BTF_SET8_START(bpf_map_iter_kfunc_ids)
BTF_ID_FLAGS(func, bpf_map_sum_elem_count, KF_TRUSTED_ARGS)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index fef17628341f..eb026242bd30 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -824,9 +824,7 @@ struct bpf_iter_task_vma_kern {
struct bpf_iter_task_vma_kern_data *data;
} __attribute__((aligned(8)));
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
struct task_struct *task, u64 addr)
@@ -892,7 +890,7 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
}
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index df697c74d519..53d6c9dfe39d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1252,9 +1252,7 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = {
};
#ifdef CONFIG_KEYS
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "kfuncs which will be used in BPF programs");
+BPF_KFUNC_START_DEFS
/**
* bpf_lookup_user_key - lookup a key by its serial
@@ -1404,7 +1402,7 @@ __bpf_kfunc int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr,
}
#endif /* CONFIG_SYSTEM_DATA_VERIFICATION */
-__diag_pop();
+BPF_KFUNC_END_DEFS
BTF_SET8_START(key_sig_kfunc_set)
BTF_ID_FLAGS(func, bpf_lookup_user_key, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0841f8d82419..74b9776560b7 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -503,9 +503,8 @@ static int bpf_test_finish(const union bpf_attr *kattr,
* architecture dependent calling conventions. 7+ can be supported in the
* future.
*/
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
+
__bpf_kfunc int bpf_fentry_test1(int a)
{
return a + 1;
@@ -605,7 +604,7 @@ __bpf_kfunc void bpf_kfunc_call_memb_release(struct prog_test_member *p)
{
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
BTF_SET8_START(bpf_test_modify_return_ids)
BTF_ID_FLAGS(func, bpf_modify_return_test)
diff --git a/net/core/filter.c b/net/core/filter.c
index cc2e4babc85f..f90e5c3dc9a1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11751,9 +11751,7 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
return func;
}
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
__bpf_kfunc int bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags,
struct bpf_dynptr_kern *ptr__uninit)
{
@@ -11800,7 +11798,7 @@ __bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
return 0;
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags,
struct bpf_dynptr_kern *ptr__uninit)
@@ -11863,10 +11861,7 @@ static int __init bpf_kfunc_init(void)
}
late_initcall(bpf_kfunc_init);
-/* Disables missing prototype warnings */
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
*
@@ -11900,7 +11895,7 @@ __bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
}
-__diag_pop()
+BPF_KFUNC_END_DEFS
BTF_SET8_START(bpf_sk_iter_kfunc_ids)
BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index df4789ab512d..aed01f0a3acc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -696,9 +696,7 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
return nxdpf;
}
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in vmlinux BTF");
+BPF_KFUNC_START_DEFS
/**
* bpf_xdp_metadata_rx_timestamp - Read XDP frame RX timestamp.
@@ -738,7 +736,7 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
return -EOPNOTSUPP;
}
-__diag_pop();
+BPF_KFUNC_END_DEFS
BTF_SET8_START(xdp_metadata_kfunc_ids)
#define XDP_METADATA_KFUNC(_, __, name, ___) BTF_ID_FLAGS(func, name, KF_TRUSTED_ARGS)
diff --git a/net/ipv4/fou_bpf.c b/net/ipv4/fou_bpf.c
index 3760a14b6b57..9ba0030215b3 100644
--- a/net/ipv4/fou_bpf.c
+++ b/net/ipv4/fou_bpf.c
@@ -22,9 +22,7 @@ enum bpf_fou_encap_type {
FOU_BPF_ENCAP_GUE,
};
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in BTF");
+BPF_KFUNC_START_DEFS
/* bpf_skb_set_fou_encap - Set FOU encap parameters
*
@@ -100,7 +98,7 @@ __bpf_kfunc int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx,
return 0;
}
-__diag_pop()
+BPF_KFUNC_END_DEFS
BTF_SET8_START(fou_kfunc_set)
BTF_ID_FLAGS(func, bpf_skb_set_fou_encap)
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index b21799d468d2..a8f37b9c30ec 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -230,9 +230,7 @@ static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
return 0;
}
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in nf_conntrack BTF");
+BPF_KFUNC_START_DEFS
/* bpf_xdp_ct_alloc - Allocate a new CT entry
*
@@ -467,7 +465,7 @@ __bpf_kfunc int bpf_ct_change_status(struct nf_conn *nfct, u32 status)
return nf_ct_change_status_common(nfct, status);
}
-__diag_pop()
+BPF_KFUNC_END_DEFS
BTF_SET8_START(nf_ct_kfunc_set)
BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c
index 141ee7783223..df04343839ee 100644
--- a/net/netfilter/nf_nat_bpf.c
+++ b/net/netfilter/nf_nat_bpf.c
@@ -12,9 +12,7 @@
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_nat.h>
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in nf_nat BTF");
+BPF_KFUNC_START_DEFS
/* bpf_ct_set_nat_info - Set source or destination nat address
*
@@ -54,7 +52,7 @@ __bpf_kfunc int bpf_ct_set_nat_info(struct nf_conn___init *nfct,
return nf_nat_setup_info(ct, &range, manip) == NF_DROP ? -ENOMEM : 0;
}
-__diag_pop()
+BPF_KFUNC_END_DEFS
BTF_SET8_START(nf_nat_kfunc_set)
BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS)
diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
index d74f3fd20f2b..8321779d515d 100644
--- a/net/xfrm/xfrm_interface_bpf.c
+++ b/net/xfrm/xfrm_interface_bpf.c
@@ -27,9 +27,7 @@ struct bpf_xfrm_info {
int link;
};
-__diag_push();
-__diag_ignore_all("-Wmissing-prototypes",
- "Global functions as their definitions will be in xfrm_interface BTF");
+BPF_KFUNC_START_DEFS
/* bpf_skb_get_xfrm_info - Get XFRM metadata
*
@@ -93,7 +91,7 @@ __bpf_kfunc int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx, const struct bp
return 0;
}
-__diag_pop()
+BPF_KFUNC_END_DEFS
BTF_SET8_START(xfrm_ifc_kfunc_set)
BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info)
--
2.34.1
* Re: [PATCH v7 bpf-next 0/5] Open-coded task_vma iter
From: David Marchevsky @ 2023-10-13 20:48 UTC
To: Dave Marchevsky, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Kernel Team
On 10/13/23 4:44 PM, Dave Marchevsky wrote:
> At Meta we have a profiling daemon which periodically collects
> information on many hosts. This collection usually involves grabbing
> stacks (user and kernel) using perf_event BPF progs and later symbolicating
> them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
> remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
> those cases we must fall back to digging around in /proc/PID/maps to map
> virtual address to (binary, offset). The /proc/PID/maps digging does not
> occur synchronously with stack collection, so the process might already
> be gone, in which case it won't have /proc/PID/maps and we will fail to
> symbolicate.
>
> This 'exited process problem' doesn't occur very often as
> most of the prod services we care to profile are long-lived daemons, but
> there are enough usecases to warrant a workaround: a BPF program which
> can be optionally loaded at data collection time and essentially walks
> /proc/PID/maps. Currently this is done by walking the vma list:
>
> struct vm_area_struct* mmap = BPF_CORE_READ(mm, mmap);
> mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */
>
> Since commit 763ecb035029 ("mm: remove the vma linked list") there's no
> longer a vma linked list to walk. Walking the vma maple tree is not as
> simple as hopping struct vm_area_struct->vm_next. Luckily,
> commit f39af05949a4 ("mm: add VMA iterator"), another commit in that series,
> added struct vma_iterator and for_each_vma macro for easy vma iteration. If
> similar functionality was exposed to BPF programs, it would be perfect for our
> usecase.
>
> This series adds such functionality, specifically a BPF equivalent of
> for_each_vma using the open-coded iterator style.
>
> Notes:
> * This approach was chosen after discussion on a previous series [0] which
> attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to
> bpf_find_vma.
> * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not
> drop the vma read lock between iterations. See Alexei's response in [0].
> * The [vsyscall] page isn't really part of task->mm's vmas, but
> /proc/PID/maps returns information about it anyways. The vma iter added
> here does not do the same. See comment on selftest in patch 3.
> * bpf_iter_task_vma allocates a _data struct which contains - among other
> things - struct vma_iterator, using BPF allocator and keeps a pointer to
> the bpf_iter_task_vma_data. This is done in order to prevent changes to
> struct ma_state - which is wrapped by struct vma_iterator - from
> necessitating changes to uapi struct bpf_iter_task_vma.
>
> Changelog:
>
> v6 -> v7: https://lore.kernel.org/bpf/20231010185944.3888849-1-davemarchevsky@fb.com/
>
> Patch numbers correspond to their position in v6
>
> Patch 2 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
> * Add Andrii ack
> Patch 3 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> * Add Andrii ack
> * Add missing __diag_ignore_all for -Wmissing-prototypes (Song)
> Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
> * Remove two unnecessary header includes (Andrii)
> * Remove extraneous !vmas_seen check (Andrii)
Whoops, forgot to add another bullet here:
* Initialize FILE *f = NULL to address llvm-16 CI warnings (Andrii)
> New Patch ("bpf: Add BPF_KFUNC_{START,END}_defs macros")
> * After talking to Andrii, this is an attempt to clean up __diag_ignore_all
> spam everywhere kfuncs are defined. If nontrivial changes are needed,
> let's apply the other 4 and I'll respin as a standalone patch.
>
> v5 -> v6: https://lore.kernel.org/bpf/20231010175637.3405682-1-davemarchevsky@fb.com/
>
> Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
> * Remove extraneous blank line. I did this manually to the .patch file
> for v5, which caused BPF CI to complain about failing to apply the
> series
>
> v4 -> v5: https://lore.kernel.org/bpf/20231002195341.2940874-1-davemarchevsky@fb.com/
>
> Patch numbers correspond to their position in v4
>
> New Patch ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
> * Patch 2's renaming of this selftest, and associated changes in the
> userspace runner, are split out into this separate commit (Andrii)
>
> Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> * Remove bpf_iter_task_vma kfuncs from libbpf's bpf_helpers.h, they'll be
> added to selftests' bpf_experimental.h in selftests patch below (Andrii)
> * Split bpf_iter_task_vma.c renaming into separate commit (Andrii)
>
> Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
> * Add bpf_iter_task_vma kfuncs to bpf_experimental.h (Andrii)
> * Remove '?' from prog SEC, open_and_load the skel in one operation (Andrii)
> * Ensure that fclose() always happens in test runner (Andrii)
> * Use global var w/ 1000 (vm_start, vm_end) structs instead of two
> MAP_TYPE_ARRAY's w/ 1k u64s each (Andrii)
>
>
> v3 -> v4: https://lore.kernel.org/bpf/20230822050558.2937659-1-davemarchevsky@fb.com/
>
> Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
> * Add Andrii ack
> Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> * Mark bpf_iter_task_vma_new args KF_RCU and remove now-unnecessary !task
> check (Yonghong)
> * Although KF_RCU is a function-level flag, in reality it only applies to
> the task_struct *task parameter, as the other two params are a scalar int
> and a specially-handled KF_ARG_PTR_TO_ITER
> * Remove struct bpf_iter_task_vma definition from uapi headers, define in
> kernel/bpf/task_iter.c instead (Andrii)
> Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
> * Use a local var when looping over vmas to track map idx. Update vmas_seen
> global after done iterating. Don't start iterating or update vmas_seen if
> vmas_seen global is nonzero. (Andrii)
> * Move getpgid() call to correct spot - above skel detach. (Andrii)
>
> v2 -> v3: https://lore.kernel.org/bpf/20230821173415.1970776-1-davemarchevsky@fb.com/
>
> Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
> * Add Yonghong ack
>
> Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> * UAPI bpf header and tools/ version should match
> * Add bpf_iter_task_vma_kern_data which bpf_iter_task_vma_kern points to,
> bpf_mem_alloc/free it instead of just vma_iterator. (Alexei)
> * Inner data ptr == NULL implies initialization failed
>
>
> v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/
> * Patch 1
> * Now removes the unnecessary BTF_TYPE_EMIT instead of changing the
> type (Yonghong)
> * Patch 2
> * Don't do unnecessary BTF_TYPE_EMIT (Yonghong)
> * Bump task refcount to prevent ->mm reuse (Yonghong)
> * Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free
> via BPF mem allocator (Yonghong, Stanislav)
> * Patch 3
>
> [0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/
>
> Dave Marchevsky (5):
> bpf: Don't explicitly emit BTF for struct btf_iter_num
> selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
> bpf: Introduce task_vma open-coded iterator kfuncs
> selftests/bpf: Add tests for open-coded task_vma iter
> bpf: Add BPF_KFUNC_{START,END}_defs macros
>
> include/linux/btf.h | 7 ++
> kernel/bpf/bpf_iter.c | 8 +-
> kernel/bpf/cpumask.c | 6 +-
> kernel/bpf/helpers.c | 9 +-
> kernel/bpf/map_iter.c | 6 +-
> kernel/bpf/task_iter.c | 89 +++++++++++++++++++
> kernel/trace/bpf_trace.c | 6 +-
> net/bpf/test_run.c | 7 +-
> net/core/filter.c | 13 +--
> net/core/xdp.c | 6 +-
> net/ipv4/fou_bpf.c | 6 +-
> net/netfilter/nf_conntrack_bpf.c | 6 +-
> net/netfilter/nf_nat_bpf.c | 6 +-
> net/xfrm/xfrm_interface_bpf.c | 6 +-
> .../testing/selftests/bpf/bpf_experimental.h | 8 ++
> .../selftests/bpf/prog_tests/bpf_iter.c | 26 +++---
> .../testing/selftests/bpf/prog_tests/iters.c | 58 ++++++++++++
> ...f_iter_task_vma.c => bpf_iter_task_vmas.c} | 0
> .../selftests/bpf/progs/iters_task_vma.c | 43 +++++++++
> 19 files changed, 248 insertions(+), 68 deletions(-)
> rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c
>
* Re: [PATCH v7 bpf-next 0/5] Open-coded task_vma iter
2023-10-13 20:48 ` [PATCH v7 bpf-next 0/5] Open-coded task_vma iter David Marchevsky
@ 2023-10-13 22:56 ` Andrii Nakryiko
0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2023-10-13 22:56 UTC (permalink / raw)
To: David Marchevsky
Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Kernel Team
On Fri, Oct 13, 2023 at 1:48 PM David Marchevsky
<david.marchevsky@linux.dev> wrote:
>
> On 10/13/23 4:44 PM, Dave Marchevsky wrote:
> > At Meta we have a profiling daemon which periodically collects
> > information on many hosts. This collection usually involves grabbing
> > stacks (user and kernel) using perf_event BPF progs and later symbolicating
> > them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
> > remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
> > those cases we must fall back to digging around in /proc/PID/maps to map
> > virtual address to (binary, offset). The /proc/PID/maps digging does not
> > occur synchronously with stack collection, so the process might already
> > be gone, in which case it won't have /proc/PID/maps and we will fail to
> > symbolicate.
> >
> > This 'exited process problem' doesn't occur very often as
> > most of the prod services we care to profile are long-lived daemons, but
> > there are enough usecases to warrant a workaround: a BPF program which
> > can be optionally loaded at data collection time and essentially walks
> > /proc/PID/maps. Currently this is done by walking the vma list:
> >
> > struct vm_area_struct* mmap = BPF_CORE_READ(mm, mmap);
> > mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */
> >
> > Since commit 763ecb035029 ("mm: remove the vma linked list") there's no
> > longer a vma linked list to walk. Walking the vma maple tree is not as
> > simple as hopping struct vm_area_struct->vm_next. Luckily,
> > commit f39af05949a4 ("mm: add VMA iterator"), another commit in that series,
> > added struct vma_iterator and for_each_vma macro for easy vma iteration. If
> > similar functionality was exposed to BPF programs, it would be perfect for our
> > usecase.
> >
> > This series adds such functionality, specifically a BPF equivalent of
> > for_each_vma using the open-coded iterator style.
> >
> > Notes:
> > * This approach was chosen after discussion on a previous series [0] which
> > attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to
> > bpf_find_vma.
> > * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not
> > drop the vma read lock between iterations. See Alexei's response in [0].
> > * The [vsyscall] page isn't really part of task->mm's vmas, but
> > /proc/PID/maps returns information about it anyways. The vma iter added
> > here does not do the same. See comment on selftest in patch 3.
> > * bpf_iter_task_vma allocates a _data struct which contains - among other
> > things - struct vma_iterator, using BPF allocator and keeps a pointer to
> > the bpf_iter_task_vma_data. This is done in order to prevent changes to
> > struct ma_state - which is wrapped by struct vma_iterator - from
> > necessitating changes to uapi struct bpf_iter_task_vma.
> >
> > Changelog:
> >
> > v6 -> v7: https://lore.kernel.org/bpf/20231010185944.3888849-1-davemarchevsky@fb.com/
> >
> > Patch numbers correspond to their position in v6
> >
> > Patch 2 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
> > * Add Andrii ack
> > Patch 3 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> > * Add Andrii ack
> > * Add missing __diag_ignore_all for -Wmissing-prototypes (Song)
> > Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
> > * Remove two unnecessary header includes (Andrii)
> > * Remove extraneous !vmas_seen check (Andrii)
>
> Whoops, forgot to add another bullet here:
> * Initialize FILE *f = NULL to address llvm-16 CI warnings (Andrii)
>
> > New Patch ("bpf: Add BPF_KFUNC_{START,END}_defs macros")
> > * After talking to Andrii, this is an attempt to clean up __diag_ignore_all
> > spam everywhere kfuncs are defined. If nontrivial changes are needed,
> > let's apply the other 4 and I'll respin as a standalone patch.
> >
> > v5 -> v6: https://lore.kernel.org/bpf/20231010175637.3405682-1-davemarchevsky@fb.com/
> >
> > Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
> > * Remove extraneous blank line. I did this manually to the .patch file
> > for v5, which caused BPF CI to complain about failing to apply the
> > series
> >
> > v4 -> v5: https://lore.kernel.org/bpf/20231002195341.2940874-1-davemarchevsky@fb.com/
> >
> > Patch numbers correspond to their position in v4
> >
> > New Patch ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
> > * Patch 2's renaming of this selftest, and associated changes in the
> > userspace runner, are split out into this separate commit (Andrii)
> >
> > Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> > * Remove bpf_iter_task_vma kfuncs from libbpf's bpf_helpers.h, they'll be
> > added to selftests' bpf_experimental.h in selftests patch below (Andrii)
> > * Split bpf_iter_task_vma.c renaming into separate commit (Andrii)
> >
> > Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
> > * Add bpf_iter_task_vma kfuncs to bpf_experimental.h (Andrii)
> > * Remove '?' from prog SEC, open_and_load the skel in one operation (Andrii)
> > * Ensure that fclose() always happens in test runner (Andrii)
> > * Use global var w/ 1000 (vm_start, vm_end) structs instead of two
> > MAP_TYPE_ARRAY's w/ 1k u64s each (Andrii)
> >
> >
> > v3 -> v4: https://lore.kernel.org/bpf/20230822050558.2937659-1-davemarchevsky@fb.com/
> >
> > Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
> > * Add Andrii ack
> > Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> > * Mark bpf_iter_task_vma_new args KF_RCU and remove now-unnecessary !task
> > check (Yonghong)
> > * Although KF_RCU is a function-level flag, in reality it only applies to
> > the task_struct *task parameter, as the other two params are a scalar int
> > and a specially-handled KF_ARG_PTR_TO_ITER
> > * Remove struct bpf_iter_task_vma definition from uapi headers, define in
> > kernel/bpf/task_iter.c instead (Andrii)
> > Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
> > * Use a local var when looping over vmas to track map idx. Update vmas_seen
> > global after done iterating. Don't start iterating or update vmas_seen if
> > vmas_seen global is nonzero. (Andrii)
> > * Move getpgid() call to correct spot - above skel detach. (Andrii)
> >
> > v2 -> v3: https://lore.kernel.org/bpf/20230821173415.1970776-1-davemarchevsky@fb.com/
> >
> > Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
> > * Add Yonghong ack
> >
> > Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
> > * UAPI bpf header and tools/ version should match
> > * Add bpf_iter_task_vma_kern_data which bpf_iter_task_vma_kern points to,
> > bpf_mem_alloc/free it instead of just vma_iterator. (Alexei)
> > * Inner data ptr == NULL implies initialization failed
> >
> >
> > v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/
> > * Patch 1
> > * Now removes the unnecessary BTF_TYPE_EMIT instead of changing the
> > type (Yonghong)
> > * Patch 2
> > * Don't do unnecessary BTF_TYPE_EMIT (Yonghong)
> > * Bump task refcount to prevent ->mm reuse (Yonghong)
> > * Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free
> > via BPF mem allocator (Yonghong, Stanislav)
> > * Patch 3
> >
> > [0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/
> >
> > Dave Marchevsky (5):
> > bpf: Don't explicitly emit BTF for struct btf_iter_num
> > selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
> > bpf: Introduce task_vma open-coded iterator kfuncs
> > selftests/bpf: Add tests for open-coded task_vma iter
> > bpf: Add BPF_KFUNC_{START,END}_defs macros
I dropped patch #5 from the series and applied the rest, thanks.
For BPF_KFUNC_{START,END}_DEFS, I like the clean up, but I feel like
we might want to follow similar function-call like syntax as with
__diag_push() and make sure we use semicolon to terminate their
"invocations". If you agree, please adjust and submit separately, I'd
be curious to see what others think.
Also, keep in mind that __diag_ignore_all("-Wmissing-prototypes") that
we use right now might not be enough. We started getting more and more
reports with different warnings recently, so we probably would need to
ignore more warnings for kfuncs:
kernel/bpf/helpers.c:1906:19: warning: no previous declaration for
'bpf_percpu_obj_new_impl' [-Wmissing-declarations]
kernel/bpf/helpers.c:1942:18: warning: no previous declaration for
'bpf_percpu_obj_drop_impl' [-Wmissing-declarations]
kernel/bpf/helpers.c:2477:18: warning: no previous declaration for
'bpf_throw' [-Wmissing-declarations]
Which just means that this refactoring is even more important and timely :)
* Re: [PATCH v7 bpf-next 0/5] Open-coded task_vma iter
2023-10-13 20:44 [PATCH v7 bpf-next 0/5] Open-coded task_vma iter Dave Marchevsky
` (5 preceding siblings ...)
2023-10-13 20:48 ` [PATCH v7 bpf-next 0/5] Open-coded task_vma iter David Marchevsky
@ 2023-10-13 23:00 ` patchwork-bot+netdevbpf
6 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-10-13 23:00 UTC (permalink / raw)
To: Dave Marchevsky; +Cc: bpf, ast, daniel, andrii, martin.lau, kernel-team
Hello:
This series was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:
On Fri, 13 Oct 2023 13:44:21 -0700 you wrote:
> At Meta we have a profiling daemon which periodically collects
> information on many hosts. This collection usually involves grabbing
> stacks (user and kernel) using perf_event BPF progs and later symbolicating
> them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
> remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
> those cases we must fall back to digging around in /proc/PID/maps to map
> virtual address to (binary, offset). The /proc/PID/maps digging does not
> occur synchronously with stack collection, so the process might already
> be gone, in which case it won't have /proc/PID/maps and we will fail to
> symbolicate.
>
> [...]
Here is the summary with links:
- [v7,bpf-next,1/5] bpf: Don't explicitly emit BTF for struct btf_iter_num
https://git.kernel.org/bpf/bpf-next/c/f10ca5da5bd7
- [v7,bpf-next,2/5] selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
https://git.kernel.org/bpf/bpf-next/c/45b38941c81f
- [v7,bpf-next,3/5] bpf: Introduce task_vma open-coded iterator kfuncs
https://git.kernel.org/bpf/bpf-next/c/4ac454682158
- [v7,bpf-next,4/5] selftests/bpf: Add tests for open-coded task_vma iter
https://git.kernel.org/bpf/bpf-next/c/e0e1a7a5fc37
- [v7,bpf-next,5/5] bpf: Add BPF_KFUNC_{START,END}_defs macros
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html