public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Marchevsky <david.marchevsky@linux.dev>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Stanislav Fomichev <sdf@google.com>,
	Nathan Slingerland <slinger@meta.com>
Subject: Re: [PATCH v3 bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs
Date: Tue, 22 Aug 2023 17:11:12 -0700	[thread overview]
Message-ID: <66c62a37-cb64-f59f-9faf-bd9d5c425b99@linux.dev> (raw)
In-Reply-To: <CAADnVQKffM8=3b_hD0EDM9r-rEwipKGmA2rSz=G_FOXaBUp+6g@mail.gmail.com>



On 8/22/23 3:36 PM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2023 at 1:14 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/22/23 12:19 PM, David Marchevsky wrote:
>>> On 8/22/23 1:42 PM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 8/21/23 10:05 PM, Dave Marchevsky wrote:
>>>>> This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
>>>>> creation and manipulation of struct bpf_iter_task_vma in open-coded
>>>>> iterator style. BPF programs can use these kfuncs directly or through
>>>>> bpf_for_each macro for natural-looking iteration of all task vmas.
>>>>>
>>>>> The implementation borrows heavily from bpf_find_vma helper's locking -
>>>>> differing only in that it holds the mmap_read lock for all iterations
>>>>> while the helper only executes its provided callback on a maximum of 1
>>>>> vma. Aside from locking, struct vma_iterator and vma_next do all the
>>>>> heavy lifting.
>>>>>
>>>>> The newly-added struct bpf_iter_task_vma has a name collision with a
>>>>> selftest for the seq_file task_vma iter's bpf skel, so the selftests/bpf/progs
>>>>> file is renamed in order to avoid the collision.
>>>>>
>>>>> A pointer to an inner data struct, struct bpf_iter_task_vma_kern_data, is the
>>>>> only field in struct bpf_iter_task_vma. This is because the inner data
>>>>> struct contains a struct vma_iterator (not ptr), whose size is likely to
>>>>> change under us. If bpf_iter_task_vma_kern contained vma_iterator directly
>>>>> such a change would require change in opaque bpf_iter_task_vma struct's
>>>>> size. So better to allocate vma_iterator using BPF allocator, and since
>>>>> that alloc must already succeed, might as well allocate all iter fields,
>>>>> thereby freezing struct bpf_iter_task_vma size.
>>>>>
>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>> Cc: Nathan Slingerland <slinger@meta.com>
>>>>> ---
>>>>>     include/uapi/linux/bpf.h                      |  4 +
>>>>>     kernel/bpf/helpers.c                          |  3 +
>>>>>     kernel/bpf/task_iter.c                        | 84 +++++++++++++++++++
>>>>>     tools/include/uapi/linux/bpf.h                |  4 +
>>>>>     tools/lib/bpf/bpf_helpers.h                   |  8 ++
>>>>>     .../selftests/bpf/prog_tests/bpf_iter.c       | 26 +++---
>>>>>     ...f_iter_task_vma.c => bpf_iter_task_vmas.c} |  0
>>>>>     7 files changed, 116 insertions(+), 13 deletions(-)
>>>>>     rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%)
>>>>>
>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>> index 8790b3962e4b..49fc1989a548 100644
>>>>> --- a/include/uapi/linux/bpf.h
>>>>> +++ b/include/uapi/linux/bpf.h
>>>>> @@ -7311,4 +7311,8 @@ struct bpf_iter_num {
>>>>>         __u64 __opaque[1];
>>>>>     } __attribute__((aligned(8)));
>>>>>     +struct bpf_iter_task_vma {
>>>>> +    __u64 __opaque[1]; /* See bpf_iter_num comment above */
>>>>> +} __attribute__((aligned(8)));
>>>>
>>>> In the future, we might have bpf_iter_cgroup, bpf_iter_task, bpf_iter_cgroup_task, etc. They may all use the same struct
>>>> like
>>>>     struct bpf_iter_<...> {
>>>>       __u64 __opaque[1];
>>>>     } __attribute__((aligned(8)));
>>>>
>>>> Maybe we want a generic one instead of having lots of
>>>> structs with the same underline definition? For example,
>>>>     struct bpf_iter_generic
>>>> ?
>>>>
>>>
>>> The bpf_for_each macro assumes a consistent naming scheme for opaque iter struct
>>> and associated kfuncs. Having a 'bpf_iter_generic' shared amongst multiple types
>>> of iters would break the scheme. We could:
>>>
>>>     * Add bpf_for_each_generic that only uses bpf_iter_generic
>>>       * This exposes implementation details in an ugly way, though.
>>>     * Do some macro magic to pick bpf_iter_generic for some types of iters, and
>>>       use consistent naming pattern for others.
>>>       * I'm not sure how to do this with preprocessor
>>>     * Migrate all opaque iter structs to only contain pointer to bpf_mem_alloc'd
>>>       data struct, and use bpf_iter_generic for all of them
>>>       * Probably need to see more iter implementation / usage before making such
>>>         a change
>>>     * Do 'typedef __u64 __aligned(8) bpf_iter_<...>
>>>       * BTF_KIND_TYPEDEF intead of BTF_KIND_STRUCT might throw off some verifier
>>>         logic. Could do similar typedef w/ struct to try to work around
>>>         it.
>>>
>>> Let me know what you think. Personally I considered doing typedef while
>>> implementing this, so that's the alternative I'd choose.
>>
>> Okay, since we have naming convention restriction, typedef probably the
>> best option, something like
>>     typedef struct bpf_iter_num bpf_iter_task_vma
>> ?
>>
>> Verifier might need to be changed if verifier strips all modifiers
>> (including tyypedef) to find the struct name.
> 
> I don't quite see how typedef helps here.
> Say we do:
> struct bpf_iter_task_vma {
>       __u64 __opaque[1];
> } __attribute__((aligned(8)));
> 
> as Dave is proposing.
> Then tomorrow we add another bpf_iter_foo that is exactly the same opaque[1].
> And we will have bpf_iter_num, bpf_iter_task_vma, bpf_iter_foo structs
> with the same layout. So what? Eye sore?
> In case we need to extend task_vma from 1 to 2 it will be easier to do
> when all of them are separate structs.
> 
> And typedef has unknown verification implications.

This is true. Some investigation is needed.

> 
> Either way we need to find a way to move these structs from uapi/bpf.h
> along with bpf_rb_root and friends to some "obviously unstable" header.

If we move this out uapi/bpf.h to an unstable header, we will have much
more flexibility for future change. Original idea (v1) to allocate the
whole struct on the stack would be okay even if kernel changes
struct vma_iterator size, whether bigger or smaller.

  parent reply	other threads:[~2023-08-23  0:11 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-22  5:05 [PATCH v3 bpf-next 0/3] Open-coded task_vma iter Dave Marchevsky
2023-08-22  5:05 ` [PATCH v3 bpf-next 1/3] bpf: Don't explicitly emit BTF for struct btf_iter_num Dave Marchevsky
2023-08-22 23:37   ` Andrii Nakryiko
2023-08-22  5:05 ` [PATCH v3 bpf-next 2/3] bpf: Introduce task_vma open-coded iterator kfuncs Dave Marchevsky
2023-08-22 17:42   ` Yonghong Song
2023-08-22 19:19     ` David Marchevsky
2023-08-22 20:14       ` Yonghong Song
2023-08-22 22:36         ` Alexei Starovoitov
2023-08-22 23:57           ` Andrii Nakryiko
2023-08-23  0:11           ` Yonghong Song [this message]
2023-08-23  0:04       ` Andrii Nakryiko
2023-08-23  5:42         ` David Marchevsky
2023-08-23 14:57           ` Alexei Starovoitov
2023-08-23 16:55             ` Andrii Nakryiko
2023-08-22 23:52   ` Andrii Nakryiko
2023-08-23  7:26     ` David Marchevsky
2023-08-23 15:03       ` Alexei Starovoitov
2023-08-23 17:14         ` Andrii Nakryiko
2023-08-23 17:53           ` Alexei Starovoitov
2023-08-23 18:13             ` Andrii Nakryiko
2023-08-23 17:07       ` Andrii Nakryiko
2023-08-23 17:26         ` Alexei Starovoitov
2023-08-23 17:43           ` Andrii Nakryiko
2023-08-22  5:05 ` [PATCH v3 bpf-next 3/3] selftests/bpf: Add tests for open-coded task_vma iter Dave Marchevsky
2023-08-23  0:13   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66c62a37-cb64-f59f-9faf-bd9d5c425b99@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=david.marchevsky@linux.dev \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=sdf@google.com \
    --cc=slinger@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox