* [PATCH v2 bpf-next 0/3] Open-coded task_vma iter
@ 2023-08-21 17:34 Dave Marchevsky
From: Dave Marchevsky @ 2023-08-21 17:34 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, sdf,
	Dave Marchevsky

At Meta we have a profiling daemon which periodically collects
information on many hosts. This collection usually involves grabbing
stacks (user and kernel) using perf_event BPF progs and later symbolicating
them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
those cases we must fall back to digging around in /proc/PID/maps to map
virtual address to (binary, offset). The /proc/PID/maps digging does not
occur synchronously with stack collection, so the process might already
be gone, in which case it won't have /proc/PID/maps and we will fail to
symbolicate.
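
As a rough illustration of the fallback described above, the sketch below
parses one /proc/PID/maps line and translates a virtual address into a file
offset. The type and function names (maps_entry, parse_maps_line,
addr_to_file_offset) are hypothetical and are not taken from the profiling
daemon; paths containing spaces are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/PID/maps entry (hypothetical helper type). */
struct maps_entry {
	unsigned long long start, end;	/* mapping covers [start, end) */
	unsigned long long pgoff;	/* offset of the mapping in the file */
	char path[256];			/* empty for anonymous mappings */
};

/* Parse a single maps line. Returns 0 on success, -1 on malformed input. */
static int parse_maps_line(const char *line, struct maps_entry *e)
{
	char perms[5];
	unsigned int maj, min;
	unsigned long long inode;
	int n;

	assert(line && e);
	e->path[0] = '\0';
	n = sscanf(line, "%llx-%llx %4s %llx %x:%x %llu %255s",
		   &e->start, &e->end, perms, &e->pgoff,
		   &maj, &min, &inode, e->path);
	/* Anonymous mappings have no path, so 7 converted fields is valid. */
	return n >= 7 ? 0 : -1;
}

/* Translate addr to a file offset if it lies inside the mapping. */
static int addr_to_file_offset(const struct maps_entry *e,
			       unsigned long long addr,
			       unsigned long long *off)
{
	if (addr < e->start || addr >= e->end)
		return -1;
	*off = addr - e->start + e->pgoff;
	return 0;
}
```

The (path, offset) pair is what gets handed to symbolication. If the process
has exited there is no maps file left to feed into a parser like this one,
which is the failure mode the rest of this letter is about.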

This 'exited process problem' doesn't occur very often, as most of the prod
services we care to profile are long-lived daemons, but there are enough
use cases to warrant a workaround: a BPF program which can be optionally
loaded at data collection time and essentially walks /proc/PID/maps.
Currently this is done by walking the vma list:

  struct vm_area_struct *mmap = BPF_CORE_READ(mm, mmap);
  mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */

Since commit 763ecb035029 ("mm: remove the vma linked list") there's no
longer a vma linked list to walk. Walking the vma maple tree is not as
simple as hopping struct vm_area_struct->vm_next. Luckily,
commit f39af05949a4 ("mm: add VMA iterator"), another commit in that series,
added struct vma_iterator and for_each_vma macro for easy vma iteration. If
similar functionality were exposed to BPF programs, it would be perfect for our
use case.

This series adds such functionality, specifically a BPF equivalent of
for_each_vma using the open-coded iterator style.
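
For readers unfamiliar with the open-coded style, here is a minimal
user-space sketch of the new/next/destroy shape, iterating over a sorted
array instead of a real mm. Everything named mock_* is a hypothetical
stand-in; the actual kfuncs and their signatures are defined in patch 2, not
here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct vm_area_struct; only the address range matters. */
struct mock_vma {
	unsigned long long start, end;	/* covers [start, end) */
};

/* Stand-in iterator state; the caller owns the storage. */
struct mock_vma_iter {
	const struct mock_vma *vmas;
	size_t nr, pos;
};

/* new: bind to a sorted array and seek to the first vma ending above addr. */
static int mock_vma_iter_new(struct mock_vma_iter *it,
			     const struct mock_vma *vmas, size_t nr,
			     unsigned long long addr)
{
	if (!it || (!vmas && nr))
		return -1;
	it->vmas = vmas;
	it->nr = nr;
	it->pos = 0;
	while (it->pos < nr && vmas[it->pos].end <= addr)
		it->pos++;
	return 0;
}

/* next: return the current vma and advance, or NULL once exhausted. */
static const struct mock_vma *mock_vma_iter_next(struct mock_vma_iter *it)
{
	assert(it);
	if (it->pos >= it->nr)
		return NULL;
	return &it->vmas[it->pos++];
}

/* destroy: release whatever new acquired (nothing in this mock). */
static void mock_vma_iter_destroy(struct mock_vma_iter *it)
{
	it->vmas = NULL;
	it->nr = it->pos = 0;
}

/* The loop is written out by the caller rather than hidden in a callback. */
static size_t count_vmas_from(const struct mock_vma *vmas, size_t nr,
			      unsigned long long addr)
{
	struct mock_vma_iter it;
	size_t seen = 0;

	if (mock_vma_iter_new(&it, vmas, nr, addr))
		return 0;
	while (mock_vma_iter_next(&it))
		seen++;
	mock_vma_iter_destroy(&it);
	return seen;
}
```

The point of the style is that the loop body lives in the calling program,
so it can break out early or keep state in local variables, as long as every
successful new is paired with a destroy.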

Notes:
  * This approach was chosen after discussion on a previous series [0] which
    attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to
    bpf_find_vma.
  * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not
    drop the vma read lock between iterations. See Alexei's response in [0].
  * The [vsyscall] page isn't really part of task->mm's vmas, but
    /proc/PID/maps returns information about it anyways. The vma iter added
    here does not do the same. See comment on selftest in patch 3.
  * bpf_iter_task_vma allocates a struct vma_iterator using the BPF mem
    allocator and keeps only a pointer to it. This is done in order to
    prevent changes to struct ma_state - which is wrapped by struct
    vma_iterator - from necessitating changes to uapi struct
    bpf_iter_task_vma.
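
The last note is easier to see with a small layout sketch: a fixed-size
public handle stores nothing but a pointer to separately allocated private
state, so the private state can change size freely. This is a user-space
analogy only. All mock_* names are hypothetical, calloc/free stand in for
the BPF mem allocator, and the real structs are in patch 2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Public, fixed-size handle; stands in for the uapi struct. */
struct mock_iter {
	unsigned long long __opaque[1];
} __attribute__((aligned(8)));

/* Private state; may grow or shrink without touching struct mock_iter. */
struct mock_iter_state {
	unsigned long long cursor;
	unsigned long long scratch[4];	/* imagine this changing over time */
};

/* The handle only ever needs room for one pointer. */
_Static_assert(sizeof(struct mock_iter_state *) <= sizeof(struct mock_iter),
	       "handle must be able to hold a pointer");

/* Store the state pointer inside the opaque handle. */
static void set_state(struct mock_iter *it, struct mock_iter_state *s)
{
	memset(it->__opaque, 0, sizeof(it->__opaque));
	memcpy(it->__opaque, &s, sizeof(s));
}

/* Load the state pointer back out of the opaque handle. */
static struct mock_iter_state *get_state(const struct mock_iter *it)
{
	struct mock_iter_state *s;

	memcpy(&s, it->__opaque, sizeof(s));
	return s;
}

/* Allocate private state and attach it to the handle. */
static int mock_iter_new(struct mock_iter *it, unsigned long long start)
{
	struct mock_iter_state *s = calloc(1, sizeof(*s));

	set_state(it, s);
	if (!s)
		return -1;
	s->cursor = start;
	return 0;
}

/* Read a field through the indirection. */
static unsigned long long mock_iter_cursor(const struct mock_iter *it)
{
	return get_state(it)->cursor;
}

/* Free private state; the handle is left holding NULL. */
static void mock_iter_destroy(struct mock_iter *it)
{
	free(get_state(it));
	set_state(it, NULL);
}
```

The trade-off is one allocation per iterator in exchange for a handle whose
size does not depend on the private state.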

Changelog:

v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/
  * Patch 1
    * Now removes the unnecessary BTF_TYPE_EMIT instead of changing the
      type (Yonghong)
  * Patch 2
    * Don't do unnecessary BTF_TYPE_EMIT (Yonghong)
    * Bump task refcount to prevent ->mm reuse (Yonghong)
    * Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free
      via BPF mem allocator (Yonghong, Stanislav)
  * Patch 3

Patch summary:
  * Patch 1 is a tiny fix I ran into while implementing the vma iter in this
    series. It can be applied independently.
  * Patch 2 is the meat of the implementation
  * Patch 3 adds tests for the new functionality
    * Existing iter tests exercise failure cases (e.g. prog that doesn't call
      _destroy()). I didn't replicate them in this series, but am happy to add
      them in a later revision if folks feel that it would be worthwhile.

  [0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/

Dave Marchevsky (3):
  bpf: Don't explicitly emit BTF for struct btf_iter_num
  bpf: Introduce task_vma open-coded iterator kfuncs
  selftests/bpf: Add tests for open-coded task_vma iter

 include/uapi/linux/bpf.h                      |  4 +
 kernel/bpf/bpf_iter.c                         |  2 -
 kernel/bpf/helpers.c                          |  3 +
 kernel/bpf/task_iter.c                        | 79 +++++++++++++++++++
 tools/include/uapi/linux/bpf.h                |  5 ++
 tools/lib/bpf/bpf_helpers.h                   |  8 ++
 .../selftests/bpf/prog_tests/bpf_iter.c       | 26 +++---
 .../testing/selftests/bpf/prog_tests/iters.c  | 71 +++++++++++++++++
 ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
 .../selftests/bpf/progs/iters_task_vma.c      | 56 +++++++++++++
 10 files changed, 239 insertions(+), 15 deletions(-)
 rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c

-- 
2.34.1

Thread overview: 7+ messages
2023-08-21 17:34 [PATCH v2 bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
2023-08-21 17:34 ` [PATCH v2 bpf-next 1/3] bpf: Don't explicitly emit BTF for struct btf_iter_num Dave Marchevsky
2023-08-22  1:21   ` Yonghong Song
2023-08-21 17:34 ` [PATCH v2 bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
2023-08-21 20:32   ` David Marchevsky
2023-08-21 22:24   ` Alexei Starovoitov
2023-08-21 17:34 ` [PATCH v2 bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter Dave Marchevsky
