public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/3] Open-coded task_vma iter
@ 2023-08-10 18:35 Dave Marchevsky
  2023-08-10 18:35 ` [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num Dave Marchevsky
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Dave Marchevsky @ 2023-08-10 18:35 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

At Meta we have a profiling daemon which periodically collects
information on many hosts. This collection usually involves grabbing
stacks (user and kernel) using perf_event BPF progs and later symbolicating
them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
those cases we must fall back to digging around in /proc/PID/maps to map
virtual address to (binary, offset). The /proc/PID/maps digging does not
occur synchronously with stack collection, so the process might already
be gone, in which case it won't have /proc/PID/maps and we will fail to
symbolicate.

This 'exited process problem' doesn't occur very often as
most of the prod services we care to profile are long-lived daemons, but
there are enough usecases to warrant a workaround: a BPF program which
can be optionally loaded at data collection time and essentially walks
/proc/PID/maps. Currently this is done by walking the vma list:

  struct vm_area_struct* mmap = BPF_CORE_READ(mm, mmap);
  mmap_next = BPF_CORE_READ(rmap, vm_next); /* in a loop */

Since commit 763ecb035029 ("mm: remove the vma linked list") there's no
longer a vma linked list to walk. Walking the vma maple tree is not as
simple as hopping struct vm_area_struct->vm_next. Luckily,
commit f39af05949a4 ("mm: add VMA iterator"), another commit in that series,
added struct vma_iterator and for_each_vma macro for easy vma iteration. If
similar functionality was exposed to BPF programs, it would be perfect for our
usecase.

This series adds such functionality, specifically a BPF equivalent of
for_each_vma using the open-coded iterator style.

Notes:
  * This approach was chosen after discussion on a previous series [0] which
    attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to
    bpf_find_vma.
  * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not
    drop the vma read lock between iterations. See Alexei's response in [0].
  * The [vsyscall] page isn't really part of task->mm's vmas, but
    /proc/PID/maps returns information about it anyways. The vma iter added
    here does not do the same. See comment on selftest in patch 3.
  * The struct vma_iterator wrapped by struct bpf_iter_task_vma itself wraps
    struct ma_state. Because we need the entire struct, not a ptr, changes to
    either struct vma_iterator or struct ma_state will necessitate changing the
    opaque struct bpf_iter_task_vma to account for the new size. This feels a
    bit brittle. We could instead use bpf_mem_alloc to allocate a struct
    vma_iterator in bpf_iter_task_vma_new and have struct bpf_iter_task_vma
    point to that, but that's not quite equivalent as BPF progs will usually
    use the stack for this struct via bpf_for_each. Went with the simpler route
    for now.

Patch summary:
  * Patch 1 is a tiny fix I ran into while implementing the vma iter in this
    series. It can be applied independently.
  * Patch 2 is the meat of the implementation
  * Patch 3 adds tests for the new functionality
    * Existing iter tests exercise failure cases (e.g. prog that doesn't call
      _destroy()). I didn't replicate them in this series, but am happy to add
      them in v2 if folks feel that it would be worthwhile.

  [0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/


Dave Marchevsky (3):
  bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num
  bpf: Introduce task_vma open-coded iterator kfuncs
  selftests/bpf: Add tests for open-coded task_vma iter

 include/uapi/linux/bpf.h                      |  5 ++
 kernel/bpf/bpf_iter.c                         |  2 +-
 kernel/bpf/helpers.c                          |  3 +
 kernel/bpf/task_iter.c                        | 54 ++++++++++++++
 tools/include/uapi/linux/bpf.h                |  5 ++
 tools/lib/bpf/bpf_helpers.h                   |  8 +++
 .../selftests/bpf/prog_tests/bpf_iter.c       | 26 +++----
 .../testing/selftests/bpf/prog_tests/iters.c  | 71 +++++++++++++++++++
 ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
 .../selftests/bpf/progs/iters_task_vma.c      | 56 +++++++++++++++
 10 files changed, 216 insertions(+), 14 deletions(-)
 rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num
  2023-08-10 18:35 [PATCH bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
@ 2023-08-10 18:35 ` Dave Marchevsky
  2023-08-11  7:19   ` Yonghong Song
  2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
  2023-08-10 18:35 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter Dave Marchevsky
  2 siblings, 1 reply; 10+ messages in thread
From: Dave Marchevsky @ 2023-08-10 18:35 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

Commit 6018e1f407cc ("bpf: implement numbers iterator") added the
BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num
doesn't exist, so only a forward declaration is emitted in BTF:

  FWD 'btf_iter_num' fwd_kind=struct

Since that commit was probably hoping to ensure that struct bpf_iter_num
is emitted in vmlinux BTF, this patch changes it to the correct type.

This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't
causing any issues that I noticed, aside from mild confusion when I
looked through the code.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/bpf_iter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 96856f130cbf..20ef64445754 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -793,7 +793,7 @@ __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
 	BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter_num));
 	BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter_num));
 
-	BTF_TYPE_EMIT(struct btf_iter_num);
+	BTF_TYPE_EMIT(struct bpf_iter_num);
 
 	/* start == end is legit, it's an empty range and we'll just get NULL
 	 * on first (and any subsequent) bpf_iter_num_next() call
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
  2023-08-10 18:35 [PATCH bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
  2023-08-10 18:35 ` [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num Dave Marchevsky
@ 2023-08-10 18:35 ` Dave Marchevsky
  2023-08-10 21:57   ` Stanislav Fomichev
                     ` (2 more replies)
  2023-08-10 18:35 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter Dave Marchevsky
  2 siblings, 3 replies; 10+ messages in thread
From: Dave Marchevsky @ 2023-08-10 18:35 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky,
	Nathan Slingerland

This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task_vma in open-coded
iterator style. BPF programs can use these kfuncs directly or through
bpf_for_each macro for natural-looking iteration of all task vmas.

The implementation borrows heavily from bpf_find_vma helper's locking -
differing only in that it holds the mmap_read lock for all iterations
while the helper only executes its provided callback on a maximum of 1
vma. Aside from locking, struct vma_iterator and vma_next do all the
heavy lifting.

The newly-added struct bpf_iter_task_vma has a name collision with a
selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
file is renamed in order to avoid the collision.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Nathan Slingerland <slinger@meta.com>
---
 include/uapi/linux/bpf.h                      |  5 ++
 kernel/bpf/helpers.c                          |  3 +
 kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
 tools/include/uapi/linux/bpf.h                |  5 ++
 tools/lib/bpf/bpf_helpers.h                   |  8 +++
 .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
 ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
 7 files changed, 90 insertions(+), 13 deletions(-)
 rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d21deb46f49f..c4a65968f9f5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7291,4 +7291,9 @@ struct bpf_iter_num {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_task_vma {
+	__u64 __opaque[9]; /* See bpf_iter_num comment above */
+	char __opaque_c[3];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index eb91cae0612a..7a06dea749f1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2482,6 +2482,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c4ab9d6cdbe9..76be9998a65a 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -8,6 +8,7 @@
 #include <linux/fdtable.h>
 #include <linux/filter.h>
 #include <linux/btf_ids.h>
+#include <linux/mm_types.h>
 #include "mmap_unlock_work.h"
 
 static const char * const iter_task_type_names[] = {
@@ -823,6 +824,61 @@ const struct bpf_func_proto bpf_find_vma_proto = {
 	.arg5_type	= ARG_ANYTHING,
 };
 
+struct bpf_iter_task_vma_kern {
+	struct mm_struct *mm;
+	struct mmap_unlock_irq_work *work;
+	struct vma_iterator vmi;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
+				      struct task_struct *task, u64 addr)
+{
+	struct bpf_iter_task_vma_kern *i = (void *)it;
+	bool irq_work_busy = false;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
+
+	BTF_TYPE_EMIT(struct bpf_iter_task_vma);
+
+	/* NULL i->mm signals failed bpf_iter_task_vma initialization.
+	 * i->work == NULL is valid.
+	 */
+	i->mm = NULL;
+	if (!task)
+		return -ENOENT;
+
+	i->mm = task->mm;
+	if (!i->mm)
+		return -ENOENT;
+
+	irq_work_busy = bpf_mmap_unlock_get_irq_work(&i->work);
+	if (irq_work_busy || !mmap_read_trylock(i->mm)) {
+		i->mm = NULL;
+		return -EBUSY;
+	}
+
+	vma_iter_init(&i->vmi, i->mm, addr);
+	return 0;
+}
+
+__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
+{
+	struct bpf_iter_task_vma_kern *i = (void *)it;
+
+	if (!i->mm) /* bpf_iter_task_vma_new failed */
+		return NULL;
+	return vma_next(&i->vmi);
+}
+
+__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
+{
+	struct bpf_iter_task_vma_kern *i = (void *)it;
+
+	if (i->mm)
+		bpf_mmap_unlock_mm(i->work, i->mm);
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d21deb46f49f..c4a65968f9f5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7291,4 +7291,9 @@ struct bpf_iter_num {
 	__u64 __opaque[1];
 } __attribute__((aligned(8)));
 
+struct bpf_iter_task_vma {
+	__u64 __opaque[9]; /* See bpf_iter_num comment above */
+	char __opaque_c[3];
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index bbab9ad9dc5a..d885ffee4d88 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -302,6 +302,14 @@ extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak
 extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
 extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
 
+struct bpf_iter_task_vma;
+
+extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
+				 struct task_struct *task,
+				 unsigned long addr) __weak __ksym;
+extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __weak __ksym;
+extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __weak __ksym;
+
 #ifndef bpf_for_each
 /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
  * using BPF open-coded iterators without having to write mundane explicit
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 1f02168103dd..41aba139b20b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -10,7 +10,7 @@
 #include "bpf_iter_task.skel.h"
 #include "bpf_iter_task_stack.skel.h"
 #include "bpf_iter_task_file.skel.h"
-#include "bpf_iter_task_vma.skel.h"
+#include "bpf_iter_task_vmas.skel.h"
 #include "bpf_iter_task_btf.skel.h"
 #include "bpf_iter_tcp4.skel.h"
 #include "bpf_iter_tcp6.skel.h"
@@ -1399,19 +1399,19 @@ static void str_strip_first_line(char *str)
 static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
 {
 	int err, iter_fd = -1, proc_maps_fd = -1;
-	struct bpf_iter_task_vma *skel;
+	struct bpf_iter_task_vmas *skel;
 	int len, read_size = 4;
 	char maps_path[64];
 
-	skel = bpf_iter_task_vma__open();
-	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open"))
+	skel = bpf_iter_task_vmas__open();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vmas__open"))
 		return;
 
 	skel->bss->pid = getpid();
 	skel->bss->one_task = opts ? 1 : 0;
 
-	err = bpf_iter_task_vma__load(skel);
-	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
+	err = bpf_iter_task_vmas__load(skel);
+	if (!ASSERT_OK(err, "bpf_iter_task_vmas__load"))
 		goto out;
 
 	skel->links.proc_maps = bpf_program__attach_iter(
@@ -1462,25 +1462,25 @@ static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
 out:
 	close(proc_maps_fd);
 	close(iter_fd);
-	bpf_iter_task_vma__destroy(skel);
+	bpf_iter_task_vmas__destroy(skel);
 }
 
 static void test_task_vma_dead_task(void)
 {
-	struct bpf_iter_task_vma *skel;
+	struct bpf_iter_task_vmas *skel;
 	int wstatus, child_pid = -1;
 	time_t start_tm, cur_tm;
 	int err, iter_fd = -1;
 	int wait_sec = 3;
 
-	skel = bpf_iter_task_vma__open();
-	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open"))
+	skel = bpf_iter_task_vmas__open();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vmas__open"))
 		return;
 
 	skel->bss->pid = getpid();
 
-	err = bpf_iter_task_vma__load(skel);
-	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
+	err = bpf_iter_task_vmas__load(skel);
+	if (!ASSERT_OK(err, "bpf_iter_task_vmas__load"))
 		goto out;
 
 	skel->links.proc_maps = bpf_program__attach_iter(
@@ -1533,7 +1533,7 @@ static void test_task_vma_dead_task(void)
 out:
 	waitpid(child_pid, &wstatus, 0);
 	close(iter_fd);
-	bpf_iter_task_vma__destroy(skel);
+	bpf_iter_task_vmas__destroy(skel);
 }
 
 void test_bpf_sockmap_map_iter_fd(void)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
rename to tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter
  2023-08-10 18:35 [PATCH bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
  2023-08-10 18:35 ` [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num Dave Marchevsky
  2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
@ 2023-08-10 18:35 ` Dave Marchevsky
  2 siblings, 0 replies; 10+ messages in thread
From: Dave Marchevsky @ 2023-08-10 18:35 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

The open-coded task_vma iter added earlier in this series allows for
natural iteration over a task's vmas using existing open-coded iter
infrastructure, specifically bpf_for_each.

This patch adds a test demonstrating this pattern and validating
correctness. The vma->vm_start and vma->vm_end addresses of the first
1000 vmas are recorded and compared to /proc/PID/maps output. As
expected, both see the same vmas and addresses - with the exception of
the [vsyscall] vma - which is explained in a comment in the prog_tests
program.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../testing/selftests/bpf/prog_tests/iters.c  | 71 +++++++++++++++++++
 .../selftests/bpf/progs/iters_task_vma.c      | 56 +++++++++++++++
 2 files changed, 127 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c

diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 10804ae5ae97..f91f4a49066a 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -8,6 +8,7 @@
 #include "iters_looping.skel.h"
 #include "iters_num.skel.h"
 #include "iters_testmod_seq.skel.h"
+#include "iters_task_vma.skel.h"
 
 static void subtest_num_iters(void)
 {
@@ -90,6 +91,74 @@ static void subtest_testmod_seq_iters(void)
 	iters_testmod_seq__destroy(skel);
 }
 
+static void subtest_task_vma_iters(void)
+{
+	unsigned long start, end, bpf_iter_start, bpf_iter_end;
+	struct iters_task_vma *skel;
+	char rest_of_line[1000];
+	unsigned int seen;
+	int err;
+	FILE *f;
+
+	skel = iters_task_vma__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.iter_task_vma_for_each, true);
+
+	err = iters_task_vma__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	skel->bss->target_pid = getpid();
+
+	err = iters_task_vma__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	iters_task_vma__detach(skel);
+	getpgid(skel->bss->target_pid);
+
+	if (!ASSERT_GT(skel->bss->vmas_seen, 0, "vmas_seen_gt_zero"))
+		goto cleanup;
+
+	f = fopen("/proc/self/maps", "r");
+	if (!ASSERT_OK_PTR(f, "proc_maps_fopen"))
+		goto cleanup;
+
+	seen = 0;
+	while (fscanf(f, "%lx-%lx %[^\n]\n", &start, &end, rest_of_line) == 3) {
+		/* [vsyscall] vma isn't _really_ part of task->mm vmas.
+		 * /proc/PID/maps returns it when out of vmas - see get_gate_vma
+		 * calls in fs/proc/task_mmu.c
+		 */
+		if (strstr(rest_of_line, "[vsyscall]"))
+			continue;
+
+		err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.vm_start),
+					  &seen, &bpf_iter_start);
+		if (!ASSERT_OK(err, "vm_start map_lookup_elem"))
+			goto cleanup;
+
+		err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.vm_end),
+					  &seen, &bpf_iter_end);
+		if (!ASSERT_OK(err, "vm_end map_lookup_elem"))
+			goto cleanup;
+
+		ASSERT_EQ(bpf_iter_start, start, "vma->vm_start match");
+		ASSERT_EQ(bpf_iter_end, end, "vma->vm_end match");
+		seen++;
+	}
+
+	fclose(f);
+
+	if (!ASSERT_EQ(skel->bss->vmas_seen, seen, "vmas_seen_eq"))
+		goto cleanup;
+
+cleanup:
+	iters_task_vma__destroy(skel);
+}
+
 void test_iters(void)
 {
 	RUN_TESTS(iters_state_safety);
@@ -103,4 +172,6 @@ void test_iters(void)
 		subtest_num_iters();
 	if (test__start_subtest("testmod_seq"))
 		subtest_testmod_seq_iters();
+	if (test__start_subtest("task_vma"))
+		subtest_task_vma_iters();
 }
diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma.c b/tools/testing/selftests/bpf/progs/iters_task_vma.c
new file mode 100644
index 000000000000..b961d0a12223
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_vma.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+
+#include <limits.h>
+#include <linux/errno.h>
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+pid_t target_pid = 0;
+unsigned int vmas_seen = 0;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1000);
+	__type(key, int);
+	__type(value, unsigned long);
+} vm_start SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1000);
+	__type(key, int);
+	__type(value, unsigned long);
+} vm_end SEC(".maps");
+
+SEC("?raw_tp/sys_enter")
+int iter_task_vma_for_each(const void *ctx)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct vm_area_struct *vma;
+	unsigned long *start, *end;
+
+	if (task->pid != target_pid)
+		return 0;
+
+	bpf_for_each(task_vma, vma, task, 0) {
+		if (vmas_seen >= 1000)
+			break;
+
+		start = bpf_map_lookup_elem(&vm_start, &vmas_seen);
+		if (!start)
+			break;
+		*start = vma->vm_start;
+
+		end = bpf_map_lookup_elem(&vm_end, &vmas_seen);
+		if (!end)
+			break;
+		*end = vma->vm_end;
+
+		vmas_seen++;
+	}
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
  2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
@ 2023-08-10 21:57   ` Stanislav Fomichev
  2023-08-11 14:57     ` David Marchevsky
  2023-08-11 16:22   ` Yonghong Song
  2023-08-11 16:41   ` Yonghong Song
  2 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2023-08-10 21:57 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Nathan Slingerland

On 08/10, Dave Marchevsky wrote:
> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task_vma in open-coded
> iterator style. BPF programs can use these kfuncs directly or through
> bpf_for_each macro for natural-looking iteration of all task vmas.
> 
> The implementation borrows heavily from bpf_find_vma helper's locking -
> differing only in that it holds the mmap_read lock for all iterations
> while the helper only executes its provided callback on a maximum of 1
> vma. Aside from locking, struct vma_iterator and vma_next do all the
> heavy lifting.
> 
> The newly-added struct bpf_iter_task_vma has a name collision with a
> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
> file is renamed in order to avoid the collision.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Nathan Slingerland <slinger@meta.com>
> ---
>  include/uapi/linux/bpf.h                      |  5 ++
>  kernel/bpf/helpers.c                          |  3 +
>  kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
>  tools/include/uapi/linux/bpf.h                |  5 ++
>  tools/lib/bpf/bpf_helpers.h                   |  8 +++
>  .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
>  ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
>  7 files changed, 90 insertions(+), 13 deletions(-)
>  rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d21deb46f49f..c4a65968f9f5 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7291,4 +7291,9 @@ struct bpf_iter_num {
>  	__u64 __opaque[1];
>  } __attribute__((aligned(8)));
>  
> +struct bpf_iter_task_vma {

[..]

> +	__u64 __opaque[9]; /* See bpf_iter_num comment above */
> +	char __opaque_c[3];

Everything in the series makes sense, but this part is a big confusing
when reading without too much context. If you're gonna do a respin, maybe:

- __opaque_c[8*9+3] (or whatever the size is)? any reason for separate
  __u64 + char?
- maybe worth adding something like /* Opaque representation of
  bpf_iter_task_vma_kern; see bpf_iter_num comment above */.
  that bpf_iter_task_vma<>bpf_iter_task_vma_kern wasn't super apparent
  until I got to the BUG_ON part

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num
  2023-08-10 18:35 ` [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num Dave Marchevsky
@ 2023-08-11  7:19   ` Yonghong Song
  0 siblings, 0 replies; 10+ messages in thread
From: Yonghong Song @ 2023-08-11  7:19 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/10/23 11:35 AM, Dave Marchevsky wrote:
> Commit 6018e1f407cc ("bpf: implement numbers iterator") added the
> BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num
> doesn't exist, so only a forward declaration is emitted in BTF:
> 
>    FWD 'btf_iter_num' fwd_kind=struct
> 
> Since that commit was probably hoping to ensure that struct bpf_iter_num
> is emitted in vmlinux BTF, this patch changes it to the correct type.
> 
> This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't
> causing any issues that I noticed, aside from mild confusion when I
> looked through the code.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   kernel/bpf/bpf_iter.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
> index 96856f130cbf..20ef64445754 100644
> --- a/kernel/bpf/bpf_iter.c
> +++ b/kernel/bpf/bpf_iter.c
> @@ -793,7 +793,7 @@ __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end)
>   	BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter_num));
>   	BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter_num));
>   
> -	BTF_TYPE_EMIT(struct btf_iter_num);
> +	BTF_TYPE_EMIT(struct bpf_iter_num);

I think this can be removed instead.

In kernel/bpf/bpf_iter.c, we have
__bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int 
end)
__bpf_kfunc int *bpf_iter_num_next(struct bpf_iter_num* it)
__bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it)

This will ensure that bpf_iter_num btf type will be generated by
the compiler.

>   
>   	/* start == end is legit, it's an empty range and we'll just get NULL
>   	 * on first (and any subsequent) bpf_iter_num_next() call

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
  2023-08-10 21:57   ` Stanislav Fomichev
@ 2023-08-11 14:57     ` David Marchevsky
  2023-08-11 17:03       ` Stanislav Fomichev
  0 siblings, 1 reply; 10+ messages in thread
From: David Marchevsky @ 2023-08-11 14:57 UTC (permalink / raw)
  To: Stanislav Fomichev, Dave Marchevsky
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Nathan Slingerland

On 8/10/23 5:57 PM, Stanislav Fomichev wrote:
> On 08/10, Dave Marchevsky wrote:
>> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_task_vma in open-coded
>> iterator style. BPF programs can use these kfuncs directly or through
>> bpf_for_each macro for natural-looking iteration of all task vmas.
>>
>> The implementation borrows heavily from bpf_find_vma helper's locking -
>> differing only in that it holds the mmap_read lock for all iterations
>> while the helper only executes its provided callback on a maximum of 1
>> vma. Aside from locking, struct vma_iterator and vma_next do all the
>> heavy lifting.
>>
>> The newly-added struct bpf_iter_task_vma has a name collision with a
>> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
>> file is renamed in order to avoid the collision.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> Cc: Nathan Slingerland <slinger@meta.com>
>> ---
>>  include/uapi/linux/bpf.h                      |  5 ++
>>  kernel/bpf/helpers.c                          |  3 +
>>  kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
>>  tools/include/uapi/linux/bpf.h                |  5 ++
>>  tools/lib/bpf/bpf_helpers.h                   |  8 +++
>>  .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
>>  ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
>>  7 files changed, 90 insertions(+), 13 deletions(-)
>>  rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index d21deb46f49f..c4a65968f9f5 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7291,4 +7291,9 @@ struct bpf_iter_num {
>>  	__u64 __opaque[1];
>>  } __attribute__((aligned(8)));
>>  
>> +struct bpf_iter_task_vma {
> 
> [..]
> 
>> +	__u64 __opaque[9]; /* See bpf_iter_num comment above */
>> +	char __opaque_c[3];
> 
> Everything in the series makes sense, but this part is a big confusing
> when reading without too much context. If you're gonna do a respin, maybe:
> 
> - __opaque_c[8*9+3] (or whatever the size is)? any reason for separate
>   __u64 + char?

IIUC this is because BTF generation doesn't pick up __attribute__((aligned(8))),
so if a vmlinux.h is generated via 'bpftool btf dump file vmlinux format c' and
this struct only contains chars, it won't have the correct alignment.

I'm not sure if the bitfield approach taken by bpf_{list,rb}_node similar has
the same effect. Some quick googling indicates that if it does, it's probably
not in the C standard.

But yeah, I agree that it's ugly. While we're on the topic, WDYT about my
comment in the cover letter about this struct (copied here for convenience):

  * The struct vma_iterator wrapped by struct bpf_iter_task_vma itself wraps
    struct ma_state. Because we need the entire struct, not a ptr, changes to
    either struct vma_iterator or struct ma_state will necessitate changing the
    opaque struct bpf_iter_task_vma to account for the new size. This feels a
    bit brittle. We could instead use bpf_mem_alloc to allocate a struct
    vma_iterator in bpf_iter_task_vma_new and have struct bpf_iter_task_vma
    point to that, but that's not quite equivalent as BPF progs will usually
    use the stack for this struct via bpf_for_each. Went with the simpler route
    for now.

> - maybe worth adding something like /* Opaque representation of
>   bpf_iter_task_vma_kern; see bpf_iter_num comment above */.
>   that bpf_iter_task_vma<>bpf_iter_task_vma_kern wasn't super apparent
>   until I got to the BUG_ON part

It feels weird to refer to the non-UAPI _kern struct in uapi header. Maybe
better to add a comment to the _kern struct referring to this one? I don't
feel strongly either way, though.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
  2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
  2023-08-10 21:57   ` Stanislav Fomichev
@ 2023-08-11 16:22   ` Yonghong Song
  2023-08-11 16:41   ` Yonghong Song
  2 siblings, 0 replies; 10+ messages in thread
From: Yonghong Song @ 2023-08-11 16:22 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Nathan Slingerland



On 8/10/23 11:35 AM, Dave Marchevsky wrote:
> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task_vma in open-coded
> iterator style. BPF programs can use these kfuncs directly or through
> bpf_for_each macro for natural-looking iteration of all task vmas.
> 
> The implementation borrows heavily from bpf_find_vma helper's locking -
> differing only in that it holds the mmap_read lock for all iterations
> while the helper only executes its provided callback on a maximum of 1
> vma. Aside from locking, struct vma_iterator and vma_next do all the
> heavy lifting.
> 
> The newly-added struct bpf_iter_task_vma has a name collision with a
> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
> file is renamed in order to avoid the collision.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Nathan Slingerland <slinger@meta.com>
> ---
>   include/uapi/linux/bpf.h                      |  5 ++
>   kernel/bpf/helpers.c                          |  3 +
>   kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
>   tools/include/uapi/linux/bpf.h                |  5 ++
>   tools/lib/bpf/bpf_helpers.h                   |  8 +++
>   .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
>   ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
>   7 files changed, 90 insertions(+), 13 deletions(-)
>   rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d21deb46f49f..c4a65968f9f5 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7291,4 +7291,9 @@ struct bpf_iter_num {
>   	__u64 __opaque[1];
>   } __attribute__((aligned(8)));
>   
> +struct bpf_iter_task_vma {
> +	__u64 __opaque[9]; /* See bpf_iter_num comment above */
> +	char __opaque_c[3];
> +} __attribute__((aligned(8)));

I do see we have issues with this struct. See below.

> +
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index eb91cae0612a..7a06dea749f1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2482,6 +2482,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>   BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index c4ab9d6cdbe9..76be9998a65a 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -8,6 +8,7 @@
>   #include <linux/fdtable.h>
>   #include <linux/filter.h>
>   #include <linux/btf_ids.h>
> +#include <linux/mm_types.h>
>   #include "mmap_unlock_work.h"
>   
>   static const char * const iter_task_type_names[] = {
> @@ -823,6 +824,61 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>   	.arg5_type	= ARG_ANYTHING,
>   };
>   
> +struct bpf_iter_task_vma_kern {
> +	struct mm_struct *mm;
> +	struct mmap_unlock_irq_work *work;
> +	struct vma_iterator vmi;
> +} __attribute__((aligned(8)));

Let us say in 6.5, There is an app developed with 6.5 and
everything works fine.

And in 6.6, vma_iterator size changed, either less or more than
the size in 6.5, then how you fix this issue? You need to update
uapi header bpf_iter_task_vma? Update the header file?
If the vma_iterator size is grew from 6.6, then the app won't work
any more.

So I suggest we do allocation for vma_iterator in bpf_iter_task_vma_new
to avoid this uapi issue.

> +
> +__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
> +				      struct task_struct *task, u64 addr)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;

i => kit?

> +	bool irq_work_busy = false;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
> +
> +	BTF_TYPE_EMIT(struct bpf_iter_task_vma);

This is not needed.

> +
> +	/* NULL i->mm signals failed bpf_iter_task_vma initialization.
> +	 * i->work == NULL is valid.
> +	 */
> +	i->mm = NULL;
> +	if (!task)
> +		return -ENOENT;
> +
> +	i->mm = task->mm;
> +	if (!i->mm)
> +		return -ENOENT;
> +
> +	irq_work_busy = bpf_mmap_unlock_get_irq_work(&i->work);
> +	if (irq_work_busy || !mmap_read_trylock(i->mm)) {
> +		i->mm = NULL;
> +		return -EBUSY;
> +	}
> +
> +	vma_iter_init(&i->vmi, i->mm, addr);
> +	return 0;
> +}
> +
> +__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;
> +
> +	if (!i->mm) /* bpf_iter_task_vma_new failed */
> +		return NULL;
> +	return vma_next(&i->vmi);
> +}
> +
> +__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;
> +
> +	if (i->mm)
> +		bpf_mmap_unlock_mm(i->work, i->mm);
> +}
> +
>   DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>   
[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
  2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
  2023-08-10 21:57   ` Stanislav Fomichev
  2023-08-11 16:22   ` Yonghong Song
@ 2023-08-11 16:41   ` Yonghong Song
  2 siblings, 0 replies; 10+ messages in thread
From: Yonghong Song @ 2023-08-11 16:41 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Nathan Slingerland



On 8/10/23 11:35 AM, Dave Marchevsky wrote:
> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task_vma in open-coded
> iterator style. BPF programs can use these kfuncs directly or through
> bpf_for_each macro for natural-looking iteration of all task vmas.
> 
> The implementation borrows heavily from bpf_find_vma helper's locking -
> differing only in that it holds the mmap_read lock for all iterations
> while the helper only executes its provided callback on a maximum of 1
> vma. Aside from locking, struct vma_iterator and vma_next do all the
> heavy lifting.
> 
> The newly-added struct bpf_iter_task_vma has a name collision with a
> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
> file is renamed in order to avoid the collision.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Nathan Slingerland <slinger@meta.com>
> ---
>   include/uapi/linux/bpf.h                      |  5 ++
>   kernel/bpf/helpers.c                          |  3 +
>   kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
>   tools/include/uapi/linux/bpf.h                |  5 ++
>   tools/lib/bpf/bpf_helpers.h                   |  8 +++
>   .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
>   ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
>   7 files changed, 90 insertions(+), 13 deletions(-)
>   rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d21deb46f49f..c4a65968f9f5 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7291,4 +7291,9 @@ struct bpf_iter_num {
>   	__u64 __opaque[1];
>   } __attribute__((aligned(8)));
>   
> +struct bpf_iter_task_vma {
> +	__u64 __opaque[9]; /* See bpf_iter_num comment above */
> +	char __opaque_c[3];
> +} __attribute__((aligned(8)));
> +
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index eb91cae0612a..7a06dea749f1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2482,6 +2482,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>   BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index c4ab9d6cdbe9..76be9998a65a 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -8,6 +8,7 @@
>   #include <linux/fdtable.h>
>   #include <linux/filter.h>
>   #include <linux/btf_ids.h>
> +#include <linux/mm_types.h>
>   #include "mmap_unlock_work.h"
>   
>   static const char * const iter_task_type_names[] = {
> @@ -823,6 +824,61 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>   	.arg5_type	= ARG_ANYTHING,
>   };
>   
> +struct bpf_iter_task_vma_kern {
> +	struct mm_struct *mm;
> +	struct mmap_unlock_irq_work *work;
> +	struct vma_iterator vmi;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
> +				      struct task_struct *task, u64 addr)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;
> +	bool irq_work_busy = false;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
> +
> +	BTF_TYPE_EMIT(struct bpf_iter_task_vma);
> +
> +	/* NULL i->mm signals failed bpf_iter_task_vma initialization.
> +	 * i->work == NULL is valid.
> +	 */
> +	i->mm = NULL;
> +	if (!task)
> +		return -ENOENT;
> +
> +	i->mm = task->mm;
> +	if (!i->mm)
> +		return -ENOENT;

We might have an issue here as well if task is in __put_task_struct() 
stage. It is possible that we did i->mm from task->mm and then
task is freed and 'mm' is reused by somebody self.

To prevent such cases, I suggest we try to take a reference
of 'task' first. If we can get a reference then task is valid
and task->mm will not be freed and we will be fine.

> +
> +	irq_work_busy = bpf_mmap_unlock_get_irq_work(&i->work);
> +	if (irq_work_busy || !mmap_read_trylock(i->mm)) {
> +		i->mm = NULL;
> +		return -EBUSY;
> +	}
> +
> +	vma_iter_init(&i->vmi, i->mm, addr);
> +	return 0;
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
  2023-08-11 14:57     ` David Marchevsky
@ 2023-08-11 17:03       ` Stanislav Fomichev
  0 siblings, 0 replies; 10+ messages in thread
From: Stanislav Fomichev @ 2023-08-11 17:03 UTC (permalink / raw)
  To: David Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team,
	Nathan Slingerland

On 08/11, David Marchevsky wrote:
> On 8/10/23 5:57 PM, Stanislav Fomichev wrote:
> > On 08/10, Dave Marchevsky wrote:
> >> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_task_vma in open-coded
> >> iterator style. BPF programs can use these kfuncs directly or through
> >> bpf_for_each macro for natural-looking iteration of all task vmas.
> >>
> >> The implementation borrows heavily from bpf_find_vma helper's locking -
> >> differing only in that it holds the mmap_read lock for all iterations
> >> while the helper only executes its provided callback on a maximum of 1
> >> vma. Aside from locking, struct vma_iterator and vma_next do all the
> >> heavy lifting.
> >>
> >> The newly-added struct bpf_iter_task_vma has a name collision with a
> >> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
> >> file is renamed in order to avoid the collision.
> >>
> >> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> >> Cc: Nathan Slingerland <slinger@meta.com>
> >> ---
> >>  include/uapi/linux/bpf.h                      |  5 ++
> >>  kernel/bpf/helpers.c                          |  3 +
> >>  kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
> >>  tools/include/uapi/linux/bpf.h                |  5 ++
> >>  tools/lib/bpf/bpf_helpers.h                   |  8 +++
> >>  .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
> >>  ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
> >>  7 files changed, 90 insertions(+), 13 deletions(-)
> >>  rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index d21deb46f49f..c4a65968f9f5 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7291,4 +7291,9 @@ struct bpf_iter_num {
> >>  	__u64 __opaque[1];
> >>  } __attribute__((aligned(8)));
> >>  
> >> +struct bpf_iter_task_vma {
> > 
> > [..]
> > 
> >> +	__u64 __opaque[9]; /* See bpf_iter_num comment above */
> >> +	char __opaque_c[3];
> > 
> > Everything in the series makes sense, but this part is a big confusing
> > when reading without too much context. If you're gonna do a respin, maybe:
> > 
> > - __opaque_c[8*9+3] (or whatever the size is)? any reason for separate
> >   __u64 + char?
> 
> IIUC this is because BTF generation doesn't pick up __attribute__((aligned(8))),
> so if a vmlinux.h is generated via 'bpftool btf dump file vmlinux format c' and
> this struct only contains chars, it won't have the correct alignment.
> 
> I'm not sure if the bitfield approach taken by bpf_{list,rb}_node similar has
> the same effect. Some quick googling indicates that if it does, it's probably
> not in the C standard.

Ugh, the alignment, right..

> But yeah, I agree that it's ugly. While we're on the topic, WDYT about my
> comment in the cover letter about this struct (copied here for convenience):
> 
>   * The struct vma_iterator wrapped by struct bpf_iter_task_vma itself wraps
>     struct ma_state. Because we need the entire struct, not a ptr, changes to
>     either struct vma_iterator or struct ma_state will necessitate changing the
>     opaque struct bpf_iter_task_vma to account for the new size. This feels a
>     bit brittle. We could instead use bpf_mem_alloc to allocate a struct
>     vma_iterator in bpf_iter_task_vma_new and have struct bpf_iter_task_vma
>     point to that, but that's not quite equivalent as BPF progs will usually
>     use the stack for this struct via bpf_for_each. Went with the simpler route
>     for now.

LGTM! (assuming you'll keep non-pointer; looking at that other thread
where Yonghong suggests to go with the ptr...)

> > - maybe worth adding something like /* Opaque representation of
> >   bpf_iter_task_vma_kern; see bpf_iter_num comment above */.
> >   that bpf_iter_task_vma<>bpf_iter_task_vma_kern wasn't super apparent
> >   until I got to the BUG_ON part
> 
> It feels weird to refer to the non-UAPI _kern struct in uapi header. Maybe
> better to add a comment to the _kern struct referring to this one? I don't
> feel strongly either way, though.

Yeah, good point, let's keep as is.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-08-11 17:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-10 18:35 [PATCH bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
2023-08-10 18:35 ` [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num Dave Marchevsky
2023-08-11  7:19   ` Yonghong Song
2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
2023-08-10 21:57   ` Stanislav Fomichev
2023-08-11 14:57     ` David Marchevsky
2023-08-11 17:03       ` Stanislav Fomichev
2023-08-11 16:22   ` Yonghong Song
2023-08-11 16:41   ` Yonghong Song
2023-08-10 18:35 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter Dave Marchevsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox