From: Kui-Feng Lee <sinquersw@gmail.com>
To: Martin KaFai Lau <martin.lau@linux.dev>, thinker.li@gmail.com
Cc: kuifeng@meta.com, bpf@vger.kernel.org, ast@kernel.org,
song@kernel.org, kernel-team@meta.com, andrii@kernel.org
Subject: Re: [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines.
Date: Fri, 23 Feb 2024 09:36:40 -0800 [thread overview]
Message-ID: <33c2317c-fde0-4503-991b-314f20d9e7f7@gmail.com> (raw)
In-Reply-To: <b8bac273-27c7-485a-8e45-8825251d6d5a@linux.dev>
On 2/22/24 21:25, Martin KaFai Lau wrote:
> On 2/22/24 7:01 PM, Kui-Feng Lee wrote:
>>
>>
>>
>> On 2/22/24 18:16, Martin KaFai Lau wrote:
>>> On 2/22/24 5:35 PM, Kui-Feng Lee wrote:
>>>>
>>>>
>>>> On 2/22/24 16:33, Martin KaFai Lau wrote:
>>>>> On 2/21/24 2:59 PM, thinker.li@gmail.com wrote:
>>>>>> @@ -531,10 +567,10 @@ static long
>>>>>> bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>>>>>> const struct btf_type *module_type;
>>>>>> const struct btf_member *member;
>>>>>> const struct btf_type *t = st_ops_desc->type;
>>>>>> + void *image = NULL, *image_end = NULL;
>>>>>> struct bpf_tramp_links *tlinks;
>>>>>> void *udata, *kdata;
>>>>>> int prog_fd, err;
>>>>>> - void *image, *image_end;
>>>>>> u32 i;
>>>>>> if (flags)
>>>>>> @@ -573,15 +609,14 @@ static long
>>>>>> bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>>>>>> udata = &uvalue->data;
>>>>>> kdata = &kvalue->data;
>>>>>> - image = st_map->image;
>>>>>> - image_end = st_map->image + PAGE_SIZE;
>>>>>> module_type = btf_type_by_id(btf_vmlinux,
>>>>>> st_ops_ids[IDX_MODULE_ID]);
>>>>>> for_each_member(i, t, member) {
>>>>>> const struct btf_type *mtype, *ptype;
>>>>>> struct bpf_prog *prog;
>>>>>> struct bpf_tramp_link *link;
>>>>>> - u32 moff;
>>>>>> + u32 moff, tflags;
>>>>>> + int tsize;
>>>>>> moff = __btf_member_bit_offset(t, member) / 8;
>>>>>> ptype = btf_type_resolve_ptr(st_map->btf, member->type,
>>>>>> NULL);
>>>>>> @@ -653,10 +688,38 @@ static long
>>>>>> bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
>>>>>> &bpf_struct_ops_link_lops, prog);
>>>>>> st_map->links[i] = &link->link;
>>>>>> - err = bpf_struct_ops_prepare_trampoline(tlinks, link,
>>>>>> - &st_ops->func_models[i],
>>>>>> - *(void **)(st_ops->cfi_stubs + moff),
>>>>>> - image, image_end);
>>>>>> + tflags = BPF_TRAMP_F_INDIRECT;
>>>>>> + if (st_ops->func_models[i].ret_size > 0)
>>>>>> + tflags |= BPF_TRAMP_F_RET_FENTRY_RET;
>>>>>> +
>>>>>> + /* Compute the size of the trampoline */
>>>>>> + tlinks[BPF_TRAMP_FENTRY].links[0] = link;
>>>>>> + tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
>>>>>> + tsize = arch_bpf_trampoline_size(&st_ops->func_models[i],
>>>>>> + tflags, tlinks, NULL);
>>>>>> + if (tsize < 0) {
>>>>>> + err = tsize;
>>>>>> + goto reset_unlock;
>>>>>> + }
>>>>>> +
>>>>>> + /* Allocate pages */
>>>>>> + if (tsize > (unsigned long)image_end - (unsigned
>>>>>> long)image) {
>>>>>> + if (tsize > PAGE_SIZE) {
>>>>>> + err = -E2BIG;
>>>>>> + goto reset_unlock;
>>>>>> + }
>>>>>> + image = bpf_struct_ops_map_inc_image(st_map);
>>>>>> + if (IS_ERR(image)) {
>>>>>> + err = PTR_ERR(image);
>>>>>> + goto reset_unlock;
>>>>>> + }
>>>>>> + image_end = image + PAGE_SIZE;
>>>>>> + }
>>>>>> +
>>>>>> + err = arch_prepare_bpf_trampoline(NULL, image, image_end,
>>>>>> + &st_ops->func_models[i],
>>>>>> + tflags, tlinks,
>>>>>> + *(void **)(st_ops->cfi_stubs + moff));
>>>>>
>>>>> I don't prefer to copy the BPF_TRAMP_F_* setting on tflags, tlinks,
>>>>> and the arch_*_trampoline_*() logic from
>>>>> bpf_struct_ops_prepare_trampoline() which is used by the
>>>>> bpf_dummy_ops for testing also. Considering struct_ops supports
>>>>> kernel module now, in the future, it is better to move
>>>>> bpf_dummy_ops out to the bpf_testmod somehow and avoid its
>>>>> bpf_struct_ops_prepare_trampoline() usage. For now, it is still
>>>>> better to keep bpf_struct_ops_prepare_trampoline() to be reusable
>>>>> by both.
>>>>>
>>>>> Have you thought about the earlier suggestion in v1 to do
>>>>> arch_alloc_bpf_trampoline() in bpf_struct_ops_prepare_trampoline()
>>>>> instead of copying codes from bpf_struct_ops_prepare_trampoline()
>>>>> to bpf_struct_ops_map_update_elem()?
>>>>>
>>>>> Something like this (untested code):
>>>>>
>>>>> void *bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links
>>>>> *tlinks,
>>>>> struct bpf_tramp_link *link,
>>>>> const struct btf_func_model *model,
>>>>> void *stub_func, void *image,
>>>>> u32 *image_off,
>>>>> bool allow_alloc)
>>>
>>> To be a little more specific, the changes in
>>> bpf_struct_ops_map_update_elem()
>>> could be mostly like this (untested):
>>>
>>> ret_image = bpf_struct_ops_prepare_trampoline(tlinks, link,
>>> &st_ops->func_models[i],
>>> *(void **)(st_ops->cfi_stubs + moff),
>>> image, &image_off,
>>> st_map->image_pages_cnt <
>>> MAX_TRAMP_IMAGE_PAGES);
>>> if (IS_ERR(ret_image))
>>> goto reset_unlock;
>>>
>>> if (image != ret_image) {
>>> image = ret_image;
>>> st_map->image_pages[st_map->image_pages_cnt++] = image;
>>> }
>>>
>>
>> What I don't like is the memory management code was in two named
>> functions, bpf_struct_ops_map_free_image() and
>> bpf_struct_ops_map_inc_image().
>
> bpf_struct_ops_map_inc_image() is not needed.
>
>> Now, it falls apart. Allocate in one place, keep accounting in another
>> place, and free yet at the 3rd place.
>>
>>>>
>>>>
>>>> How about pass a struct bpf_struct_ops_map to
>>>> bpf_struct_ops_prepare_trampoline(). If the pointer of struct
>>>> bpf_struct_ops_map is not NULL, try to allocate new pages for the map?
>>>>
>>>> For example,
>>>>
>>>> static int
>>>> _bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
>>>>
>>>> struct bpf_tramp_link *link,
>>>>
>>>> const struct btf_func_model *model,
>>>>
>>>> void *stub_func, void *image,
>>>> void *image_end,
>>>> struct bpf_struct_ops_map *st_map)
>>>> {
>>>>
>>>> ...
>>>>
>>>> if (!image || size > PAGE_SIZE - *image_off) {
>>>> if (!st_map)
>>>
>>> Why only limit to st_map != NULL?
>>>
>>> arch_alloc_bpf_trampoline() is also called in bpf_dummy_ops.
>>> If bpf_struct_ops_prepare_trampoline() can do the alloc, it may as
>>> well simplify
>>> bpf_dummy_ops and just use bpf_struct_ops_prepare_trampoline() to alloc.
>>
>>
>> Yes, it can save a few lines from bpf_dummy_ops. But, bpf_dummy_ops
>> still need to free the memory. And, it doesn't pair alloc and free in
>> the same function. Usually, paring alloc and free in the same function,
>> nearby, or the same module is easier to understand.
>
> It is not only about saving a few lines. It just does not make sense to
> add all this complexity and another "_"bpf_struct_ops_prepare_trampoline()
> variant to make it conform to the alloc/free pair idea. To be clear, I
> don't
To be clear, we are not talking computation or memory complexity here.
I consider the complexity in another way. When I look at the code of
bpf_dummy_ops, and see it free the memory at the very end of a function.
I have to guess who allocate the memory by looking around without a
clear sign or hint if we move the allocation to
bpf_struct_ops_prepare_trampoline(). It is a source of complexity.
Very often, a duplication is much more simple and easy to understand.
Especially, when the duplication is in a very well know/recognized
pattern. Here will create a unusual way to replace a well recognized one
to simplify the code.
My reason of duplicating the code from
bpf_struct_ops_prepare_trampoline() was we don't need
bpf_struct_ops_prepare_trampoline() in future if we were going to move
bpf_dummy_ops out. But, just like you said, we still have bpf_dummy_ops
now, so it is a good trade of to move memory allocation into
bpf_struct_ops_prepare_trampoline() to avoid the duplication the code
about flags and tlinks. But, the trade off we are talking here goes to
an opposite way.
By the way, I am not insisting on these tiny details. I am just trying
to explain what I don't like here.
> see alloc/free pair is a must have in all cases. There are many
> situations that
> non-alloc named function calls multiple kmalloc() in different places and
> one xyz_free() releases everything. Even alloc/free is really preferred,
> there has to be a better way or else need to make a trade off.
>
> I suggested the high level ideal on making
> bpf_struct_ops_prepare_trampoline() to allocate page. It can sure add a
> bpf_struct_ops_free_trampoline() if you see fit to make it match with
> bpf_struct_ops_prepare_trampoline() as alloc/free pair, for example,
>
> void bpf_struct_ops_free_trampoline(void *image)
> {
> bpf_jit_uncharge_modmem(PAGE_SIZE);
> arch_free_bpf_trampoline(image, PAGE_SIZE);
> }
>
> and make bpf_struct_ops_map_free_image() to use
> bpf_struct_ops_free_trampoline()
>
> static void bpf_struct_ops_map_free_image(struct bpf_struct_ops_map
> *st_map)
> {
> int i;
>
> for (i = 0; i < st_map->image_pages_cnt; i++) {
> bpf_struct_ops_free_trampoline(st_map->image_pages[i]);
> st_map->image_pages[i] = NULL;
> }
> st_map->image_pages_cnt = 0;
> }
>
> Then it should work like alloc/free pair.
next prev parent reply other threads:[~2024-02-23 17:36 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-21 22:59 [PATCH bpf-next v2 0/3] Allow struct_ops maps with a large number of programs thinker.li
2024-02-21 22:59 ` [PATCH bpf-next v2 1/3] bpf, net: validate struct_ops when updating value thinker.li
2024-02-21 22:59 ` [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines thinker.li
2024-02-23 0:33 ` Martin KaFai Lau
2024-02-23 1:35 ` Kui-Feng Lee
2024-02-23 2:16 ` Martin KaFai Lau
2024-02-23 3:01 ` Kui-Feng Lee
2024-02-23 5:25 ` Martin KaFai Lau
2024-02-23 17:36 ` Kui-Feng Lee [this message]
2024-02-23 18:29 ` Kui-Feng Lee
2024-02-23 18:42 ` Martin KaFai Lau
2024-02-23 19:05 ` Kui-Feng Lee
2024-02-23 19:15 ` Martin KaFai Lau
2024-02-23 22:06 ` Kui-Feng Lee
2024-02-24 3:20 ` Martin KaFai Lau
2024-02-23 18:32 ` Martin KaFai Lau
2024-02-21 22:59 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test struct_ops maps with a large number of program links thinker.li
2024-02-21 23:02 ` [PATCH bpf-next v2 0/3] Allow struct_ops maps with a large number of programs Kui-Feng Lee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=33c2317c-fde0-4503-991b-314f20d9e7f7@gmail.com \
--to=sinquersw@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=kernel-team@meta.com \
--cc=kuifeng@meta.com \
--cc=martin.lau@linux.dev \
--cc=song@kernel.org \
--cc=thinker.li@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox