All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Dave Marchevsky <davemarchevsky@fb.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Nathan Slingerland <slinger@meta.com>
Subject: Re: [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
Date: Fri, 11 Aug 2023 09:22:15 -0700	[thread overview]
Message-ID: <2ade9ea7-e4f5-9fc8-397c-006d37565360@linux.dev> (raw)
In-Reply-To: <20230810183513.684836-3-davemarchevsky@fb.com>



On 8/10/23 11:35 AM, Dave Marchevsky wrote:
> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task_vma in open-coded
> iterator style. BPF programs can use these kfuncs directly or through
> bpf_for_each macro for natural-looking iteration of all task vmas.
> 
> The implementation borrows heavily from bpf_find_vma helper's locking -
> differing only in that it holds the mmap_read lock for all iterations
> while the helper only executes its provided callback on a maximum of 1
> vma. Aside from locking, struct vma_iterator and vma_next do all the
> heavy lifting.
> 
> The newly-added struct bpf_iter_task_vma has a name collision with a
> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
> file is renamed in order to avoid the collision.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Cc: Nathan Slingerland <slinger@meta.com>
> ---
>   include/uapi/linux/bpf.h                      |  5 ++
>   kernel/bpf/helpers.c                          |  3 +
>   kernel/bpf/task_iter.c                        | 56 +++++++++++++++++++
>   tools/include/uapi/linux/bpf.h                |  5 ++
>   tools/lib/bpf/bpf_helpers.h                   |  8 +++
>   .../selftests/bpf/prog_tests/bpf_iter.c       | 26 ++++-----
>   ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
>   7 files changed, 90 insertions(+), 13 deletions(-)
>   rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d21deb46f49f..c4a65968f9f5 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7291,4 +7291,9 @@ struct bpf_iter_num {
>   	__u64 __opaque[1];
>   } __attribute__((aligned(8)));
>   
> +struct bpf_iter_task_vma {
> +	__u64 __opaque[9]; /* See bpf_iter_num comment above */
> +	char __opaque_c[3];
> +} __attribute__((aligned(8)));

I do see we have issues with this struct. See below.

> +
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index eb91cae0612a..7a06dea749f1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2482,6 +2482,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>   BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>   BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY)
>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index c4ab9d6cdbe9..76be9998a65a 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -8,6 +8,7 @@
>   #include <linux/fdtable.h>
>   #include <linux/filter.h>
>   #include <linux/btf_ids.h>
> +#include <linux/mm_types.h>
>   #include "mmap_unlock_work.h"
>   
>   static const char * const iter_task_type_names[] = {
> @@ -823,6 +824,61 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>   	.arg5_type	= ARG_ANYTHING,
>   };
>   
> +struct bpf_iter_task_vma_kern {
> +	struct mm_struct *mm;
> +	struct mmap_unlock_irq_work *work;
> +	struct vma_iterator vmi;
> +} __attribute__((aligned(8)));

Let us say in 6.5, There is an app developed with 6.5 and
everything works fine.

And in 6.6, vma_iterator size changed, either less or more than
the size in 6.5, then how you fix this issue? You need to update
uapi header bpf_iter_task_vma? Update the header file?
If the vma_iterator size is grew from 6.6, then the app won't work
any more.

So I suggest we do allocation for vma_iterator in bpf_iter_task_vma_new
to avoid this uapi issue.

> +
> +__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
> +				      struct task_struct *task, u64 addr)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;

i => kit?

> +	bool irq_work_busy = false;
> +
> +	BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma));
> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma));
> +
> +	BTF_TYPE_EMIT(struct bpf_iter_task_vma);

This is not needed.

> +
> +	/* NULL i->mm signals failed bpf_iter_task_vma initialization.
> +	 * i->work == NULL is valid.
> +	 */
> +	i->mm = NULL;
> +	if (!task)
> +		return -ENOENT;
> +
> +	i->mm = task->mm;
> +	if (!i->mm)
> +		return -ENOENT;
> +
> +	irq_work_busy = bpf_mmap_unlock_get_irq_work(&i->work);
> +	if (irq_work_busy || !mmap_read_trylock(i->mm)) {
> +		i->mm = NULL;
> +		return -EBUSY;
> +	}
> +
> +	vma_iter_init(&i->vmi, i->mm, addr);
> +	return 0;
> +}
> +
> +__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;
> +
> +	if (!i->mm) /* bpf_iter_task_vma_new failed */
> +		return NULL;
> +	return vma_next(&i->vmi);
> +}
> +
> +__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it)
> +{
> +	struct bpf_iter_task_vma_kern *i = (void *)it;
> +
> +	if (i->mm)
> +		bpf_mmap_unlock_mm(i->work, i->mm);
> +}
> +
>   DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>   
[...]

  parent reply	other threads:[~2023-08-11 16:22 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10 18:35 [PATCH bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
2023-08-10 18:35 ` [PATCH bpf-next 1/3] bpf: Explicitly emit BTF for struct bpf_iter_num, not btf_iter_num Dave Marchevsky
2023-08-11  7:19   ` Yonghong Song
2023-08-10 18:35 ` [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
2023-08-10 21:57   ` Stanislav Fomichev
2023-08-11 14:57     ` David Marchevsky
2023-08-11 17:03       ` Stanislav Fomichev
2023-08-11 16:22   ` Yonghong Song [this message]
2023-08-11 16:41   ` Yonghong Song
2023-08-10 18:35 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter Dave Marchevsky
  -- strict thread matches above, loose matches on Subject: below --
2023-08-11  6:15 [PATCH bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs kernel test robot
2023-08-11 23:08 kernel test robot
2023-08-29  2:23 ` Liu, Yujie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ade9ea7-e4f5-9fc8-397c-006d37565360@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=slinger@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.