BPF List
 help / color / mirror / Atom feed
From: Kui-Feng Lee <sinquersw@gmail.com>
To: Martin KaFai Lau <martin.lau@linux.dev>, thinker.li@gmail.com
Cc: kuifeng@meta.com, bpf@vger.kernel.org, ast@kernel.org,
	song@kernel.org, kernel-team@meta.com, andrii@kernel.org
Subject: Re: [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines.
Date: Fri, 23 Feb 2024 10:29:18 -0800	[thread overview]
Message-ID: <c938c3b1-8cce-4563-930d-7e8150365117@gmail.com> (raw)
In-Reply-To: <33c2317c-fde0-4503-991b-314f20d9e7f7@gmail.com>




On 2/23/24 09:36, Kui-Feng Lee wrote:
> 
> 
> On 2/22/24 21:25, Martin KaFai Lau wrote:
>> On 2/22/24 7:01 PM, Kui-Feng Lee wrote:
>>>
>>>
>>>
>>> On 2/22/24 18:16, Martin KaFai Lau wrote:
>>>> On 2/22/24 5:35 PM, Kui-Feng Lee wrote:
>>>>>
>>>>>
>>>>> On 2/22/24 16:33, Martin KaFai Lau wrote:
>>>>>> On 2/21/24 2:59 PM, thinker.li@gmail.com wrote:
>>>>>>> @@ -531,10 +567,10 @@ static long 
>>>>>>> bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>>>>>>>       const struct btf_type *module_type;
>>>>>>>       const struct btf_member *member;
>>>>>>>       const struct btf_type *t = st_ops_desc->type;
>>>>>>> +    void *image = NULL, *image_end = NULL;
>>>>>>>       struct bpf_tramp_links *tlinks;
>>>>>>>       void *udata, *kdata;
>>>>>>>       int prog_fd, err;
>>>>>>> -    void *image, *image_end;
>>>>>>>       u32 i;
>>>>>>>       if (flags)
>>>>>>> @@ -573,15 +609,14 @@ static long 
>>>>>>> bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>>>>>>>       udata = &uvalue->data;
>>>>>>>       kdata = &kvalue->data;
>>>>>>> -    image = st_map->image;
>>>>>>> -    image_end = st_map->image + PAGE_SIZE;
>>>>>>>       module_type = btf_type_by_id(btf_vmlinux, 
>>>>>>> st_ops_ids[IDX_MODULE_ID]);
>>>>>>>       for_each_member(i, t, member) {
>>>>>>>           const struct btf_type *mtype, *ptype;
>>>>>>>           struct bpf_prog *prog;
>>>>>>>           struct bpf_tramp_link *link;
>>>>>>> -        u32 moff;
>>>>>>> +        u32 moff, tflags;
>>>>>>> +        int tsize;
>>>>>>>           moff = __btf_member_bit_offset(t, member) / 8;
>>>>>>>           ptype = btf_type_resolve_ptr(st_map->btf, member->type, 
>>>>>>> NULL);
>>>>>>> @@ -653,10 +688,38 @@ static long 
>>>>>>> bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>>>>>>>                     &bpf_struct_ops_link_lops, prog);
>>>>>>>           st_map->links[i] = &link->link;
>>>>>>> -        err = bpf_struct_ops_prepare_trampoline(tlinks, link,
>>>>>>> -                            &st_ops->func_models[i],
>>>>>>> -                            *(void **)(st_ops->cfi_stubs + moff),
>>>>>>> -                            image, image_end);
>>>>>>> +        tflags = BPF_TRAMP_F_INDIRECT;
>>>>>>> +        if (st_ops->func_models[i].ret_size > 0)
>>>>>>> +            tflags |= BPF_TRAMP_F_RET_FENTRY_RET;
>>>>>>> +
>>>>>>> +        /* Compute the size of the trampoline */
>>>>>>> +        tlinks[BPF_TRAMP_FENTRY].links[0] = link;
>>>>>>> +        tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
>>>>>>> +        tsize = arch_bpf_trampoline_size(&st_ops->func_models[i],
>>>>>>> +                         tflags, tlinks, NULL);
>>>>>>> +        if (tsize < 0) {
>>>>>>> +            err = tsize;
>>>>>>> +            goto reset_unlock;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        /* Allocate pages */
>>>>>>> +        if (tsize > (unsigned long)image_end - (unsigned 
>>>>>>> long)image) {
>>>>>>> +            if (tsize > PAGE_SIZE) {
>>>>>>> +                err = -E2BIG;
>>>>>>> +                goto reset_unlock;
>>>>>>> +            }
>>>>>>> +            image = bpf_struct_ops_map_inc_image(st_map);
>>>>>>> +            if (IS_ERR(image)) {
>>>>>>> +                err = PTR_ERR(image);
>>>>>>> +                goto reset_unlock;
>>>>>>> +            }
>>>>>>> +            image_end = image + PAGE_SIZE;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        err = arch_prepare_bpf_trampoline(NULL, image, image_end,
>>>>>>> +                          &st_ops->func_models[i],
>>>>>>> +                          tflags, tlinks,
>>>>>>> +                          *(void **)(st_ops->cfi_stubs + moff));
>>>>>>
>>>>>> I don't prefer to copy the BPF_TRAMP_F_* setting on tflags, 
>>>>>> tlinks, and the arch_*_trampoline_*() logic from 
>>>>>> bpf_struct_ops_prepare_trampoline() which is used by the 
>>>>>> bpf_dummy_ops for testing also. Considering struct_ops supports 
>>>>>> kernel module now, in the future, it is better to move 
>>>>>> bpf_dummy_ops out to the bpf_testmod somehow and avoid its 
>>>>>> bpf_struct_ops_prepare_trampoline() usage. For now, it is still 
>>>>>> better to keep bpf_struct_ops_prepare_trampoline() to be reusable 
>>>>>> by both.
>>>>>>
>>>>>> Have you thought about the earlier suggestion in v1 to do 
>>>>>> arch_alloc_bpf_trampoline() in bpf_struct_ops_prepare_trampoline() 
>>>>>> instead of copying codes from bpf_struct_ops_prepare_trampoline() 
>>>>>> to bpf_struct_ops_map_update_elem()?
>>>>>>
>>>>>> Something like this (untested code):
>>>>>>
>>>>>> void *bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links 
>>>>>> *tlinks,
>>>>>>                      struct bpf_tramp_link *link,
>>>>>>                      const struct btf_func_model *model,
>>>>>>                      void *stub_func, void *image,
>>>>>>                      u32 *image_off,
>>>>>>                      bool allow_alloc)
>>>>
>>>> To be a little more specific, the changes in 
>>>> bpf_struct_ops_map_update_elem()
>>>> could be mostly like this (untested):
>>>>
>>>>          ret_image = bpf_struct_ops_prepare_trampoline(tlinks, link,
>>>>                                    &st_ops->func_models[i],
>>>>                                    *(void **)(st_ops->cfi_stubs + 
>>>> moff),
>>>>                                    image, &image_off,
>>>>                                    st_map->image_pages_cnt < 
>>>> MAX_TRAMP_IMAGE_PAGES);
>>>>          if (IS_ERR(ret_image))
>>>>              goto reset_unlock;
>>>>
>>>>          if (image != ret_image) {
>>>>              image = ret_image;
>>>>              st_map->image_pages[st_map->image_pages_cnt++] = image;
>>>>          }
>>>>
>>>
>>> What I don't like is the memory management code was in two named
>>> functions, bpf_struct_ops_map_free_image() and
>>> bpf_struct_ops_map_inc_image().
>>
>> bpf_struct_ops_map_inc_image() is not needed.
>>
>>> Now, it falls apart.  Allocate in one place, keep accounting in another
>>> place, and free yet at the 3rd place.
>>>
>>>>>
>>>>>
>>>>> How about pass a struct bpf_struct_ops_map to
>>>>> bpf_struct_ops_prepare_trampoline(). If the pointer of struct
>>>>> bpf_struct_ops_map is not NULL, try to allocate new pages for the map?
>>>>>
>>>>> For example,
>>>>>
>>>>> static int
>>>>> _bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
>>>>>
>>>>>                                     struct bpf_tramp_link *link,
>>>>>
>>>>>                                     const struct btf_func_model 
>>>>> *model,
>>>>>
>>>>>                                     void *stub_func, void *image,
>>>>>                                     void *image_end,
>>>>>                                     struct bpf_struct_ops_map *st_map)
>>>>> {
>>>>>
>>>>> ...
>>>>>
>>>>>      if (!image || size > PAGE_SIZE - *image_off) {
>>>>>          if (!st_map)
>>>>
>>>> Why only limit to st_map != NULL?
>>>>
>>>> arch_alloc_bpf_trampoline() is also called in bpf_dummy_ops.
>>>> If bpf_struct_ops_prepare_trampoline() can do the alloc, it may as 
>>>> well simplify
>>>> bpf_dummy_ops and just use bpf_struct_ops_prepare_trampoline() to 
>>>> alloc.
>>>
>>>
>>> Yes, it can save a few lines from bpf_dummy_ops. But, bpf_dummy_ops
>>> still need to free the memory. And, it doesn't pair alloc and free in
>>> the same function. Usually, paring alloc and free in the same function,
>>> nearby, or the same module is easier to understand.
>>
>> It is not only about saving a few lines. It just does not make sense to
>> add all this complexity and another 
>> "_"bpf_struct_ops_prepare_trampoline()
>> variant to make it conform to the alloc/free pair idea. To be clear, I 
>> don't
> 
> To be clear, we are not talking computation or memory complexity here.
> I consider the complexity in another way. When I look at the code of
> bpf_dummy_ops, and see it free the memory at the very end of a function.
> I have to guess who allocate the memory by looking around without a
> clear sign or hint if we move the allocation to
> bpf_struct_ops_prepare_trampoline(). It is a source of complexity.
> Very often, a duplication is much more simple and easy to understand.
> Especially, when the duplication is in a very well know/recognized
> pattern. Here will create a unusual way to replace a well recognized one
> to simplify the code.
> 
> My reason of duplicating the code from
> bpf_struct_ops_prepare_trampoline() was we don't need
> bpf_struct_ops_prepare_trampoline() in future if we were going to move
> bpf_dummy_ops out. But, just like you said, we still have bpf_dummy_ops
> now, so it is a good trade of to move memory allocation into
> bpf_struct_ops_prepare_trampoline() to avoid the duplication the code
> about flags and tlinks. But, the trade off we are talking here goes to
> an opposite way.
> 
> By the way, I am not insisting on these tiny details. I am just trying
> to explain what I don't like here.

One thing I forgot to mention is that bpf_dummy_ops has to call
bpf_jit_uncharge_modmem(PAGE_SIZE) as well. The other option is to move
bpf_jit_charge_modmem() out of bpf_struct_ops_prepare_trampoline(),
meaning bpf_struct_ops_map_update_elem() should handle the case that the
allocation in bpf_struct_ops_prepare_trampoline() successes, but
bpf_jit_charge_modmem() fails.


> 
>> see alloc/free pair is a must have in all cases. There are many 
>> situations that
>> non-alloc named function calls multiple kmalloc() in different places and
>> one xyz_free() releases everything. Even alloc/free is really preferred,
>> there has to be a better way or else need to make a trade off.
>>
>> I suggested the high level ideal on making
>> bpf_struct_ops_prepare_trampoline() to allocate page. It can sure add a
>> bpf_struct_ops_free_trampoline() if you see fit to make it match with
>> bpf_struct_ops_prepare_trampoline() as alloc/free pair, for example,
>>
>> void bpf_struct_ops_free_trampoline(void *image)
>> {
>>      bpf_jit_uncharge_modmem(PAGE_SIZE);
>>      arch_free_bpf_trampoline(image, PAGE_SIZE);
>> }
>>
>> and make bpf_struct_ops_map_free_image() to use
>> bpf_struct_ops_free_trampoline()
>>
>> static void bpf_struct_ops_map_free_image(struct bpf_struct_ops_map 
>> *st_map)
>> {
>>      int i;
>>
>>      for (i = 0; i < st_map->image_pages_cnt; i++) {
>>          bpf_struct_ops_free_trampoline(st_map->image_pages[i]);
>>          st_map->image_pages[i] = NULL;
>>      }
>>      st_map->image_pages_cnt = 0;
>> }
>>
>> Then it should work like alloc/free pair.

  reply	other threads:[~2024-02-23 18:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-21 22:59 [PATCH bpf-next v2 0/3] Allow struct_ops maps with a large number of programs thinker.li
2024-02-21 22:59 ` [PATCH bpf-next v2 1/3] bpf, net: validate struct_ops when updating value thinker.li
2024-02-21 22:59 ` [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines thinker.li
2024-02-23  0:33   ` Martin KaFai Lau
2024-02-23  1:35     ` Kui-Feng Lee
2024-02-23  2:16       ` Martin KaFai Lau
2024-02-23  3:01         ` Kui-Feng Lee
2024-02-23  5:25           ` Martin KaFai Lau
2024-02-23 17:36             ` Kui-Feng Lee
2024-02-23 18:29               ` Kui-Feng Lee [this message]
2024-02-23 18:42                 ` Martin KaFai Lau
2024-02-23 19:05                   ` Kui-Feng Lee
2024-02-23 19:15                     ` Martin KaFai Lau
2024-02-23 22:06                       ` Kui-Feng Lee
2024-02-24  3:20                         ` Martin KaFai Lau
2024-02-23 18:32               ` Martin KaFai Lau
2024-02-21 22:59 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test struct_ops maps with a large number of program links thinker.li
2024-02-21 23:02 ` [PATCH bpf-next v2 0/3] Allow struct_ops maps with a large number of programs Kui-Feng Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c938c3b1-8cce-4563-930d-7e8150365117@gmail.com \
    --to=sinquersw@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@meta.com \
    --cc=kuifeng@meta.com \
    --cc=martin.lau@linux.dev \
    --cc=song@kernel.org \
    --cc=thinker.li@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox