From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kui-Feng Lee <sinquersw@gmail.com>, thinker.li@gmail.com
Cc: kuifeng@meta.com, bpf@vger.kernel.org, ast@kernel.org,
song@kernel.org, kernel-team@meta.com, andrii@kernel.org
Subject: Re: [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines.
Date: Fri, 23 Feb 2024 19:20:30 -0800 [thread overview]
Message-ID: <e8fb3349-c27a-46f7-877d-180df9289108@linux.dev> (raw)
In-Reply-To: <da7288a2-0e3e-4f46-8d09-450a4bc3978b@gmail.com>
On 2/23/24 2:06 PM, Kui-Feng Lee wrote:
>
>
> On 2/23/24 11:15, Martin KaFai Lau wrote:
>> On 2/23/24 11:05 AM, Kui-Feng Lee wrote:
>>>
>>>
>>>
>>> On 2/23/24 10:42, Martin KaFai Lau wrote:
>>>> On 2/23/24 10:29 AM, Kui-Feng Lee wrote:
>>>>> One thing I forgot to mention is that bpf_dummy_ops has to call
>>>>> bpf_jit_uncharge_modmem(PAGE_SIZE) as well. The other option is to move
>>>>> bpf_jit_charge_modmem() out of bpf_struct_ops_prepare_trampoline(),
>>>>> meaning bpf_struct_ops_map_update_elem() should handle the case that the
>>>>> allocation in bpf_struct_ops_prepare_trampoline() successes, but
>>>>> bpf_jit_charge_modmem() fails.
>>>>
>>>> Keep the charge/uncharge in bpf_struct_ops_prepare_trampoline().
>>>>
>>>> It is fine to have bpf_dummy_ops charge and then uncharge a PAGE_SIZE. There
>>>> is no need to optimize for bpf_dummy_ops. Use
>>>> bpf_struct_ops_free_trampoline() in bpf_dummy_ops to uncharge and free.
>>>
>>>
>>> Then, I don't get the point here.
>>> I agree with moving the allocation into
>>> bpf_struct_ops_prepare_trampoline() to avoid duplication of the code
>>> about flags and tlinks. It really simplifies the code with the fact
>>> that bpf_dummy_ops is still there. So, I tried to pass a st_map to
>>> bpf_struct_ops_prepare_trampoline() to keep page managements code
>>> together. But, you said to simplify the code of bpf_dummy_ops by
>>> allocating pages in bpf_struct_ops_prepare_trampoline(), do bookkeeping
>>> in bpf_struct_ops_map_update_elem(), so bpf_dummy_ops doesn't have to
>>
>> I don't think I ever mentioned to do book keeping in
>> bpf_struct_ops_map_update_elem(). Have you looked at my earlier code in
>> bpf_struct_ops_prepare_trampoline() which also does the memory charging also?
>
> The bookkeeping that I am saying is about maintaining image_pages and
> image_pages_cnt.
image_pages and image_pages_cnt are in the st_map (struct_ops map) itself.
Why the bpf_struct_ops_map_update_elem() cannot keep track of the pages itself
and need "_"bpf_struct_ops_prepare_trampoline() to handle it? It is not
like refactoring the image_pages[_cnt] handling to
_bpf_struct_ops_prepare_trampoline() and it can be reused by another caller.
bpf_struct_ops_map_update_elem() is the only one needing to keep track of
its own image_pages and image_pages_cnt.
>
>>
>>> allocate memory. But, we have to move a bpf_jit_uncharge_modmem() to
>>> bpf_dummy_ops. For me, this trade-off that include removing an
>>> allocation and adding a bpf_jit_uncharge_modmem() make no sense.
>>
>> Which part make no sense? Having bpf_dummy_ops charge/uncharge memory also?
>
> Simplifying bpf_dummy_ops by removing the duty of allocation but adding
> the duty of uncharge memory doesn't make sense to me in terms of
> simplification. Although the lines of code would be similar, it actually
> makes it more complicated than before.
The bpf_dummy_ops does not need to call uncharge. It does not need
to know the page is charged or not. It uses bpf_struct_ops_prepare_trampoline()
to get a prepared trampoline page and bpf_struct_ops_free_trampoline() which
does the uncharge+free. It is the alloc/free concept that you have been
proposing which fits well here.
The bpf_dummy_ops is doing arch_alloc_bpf_trampoline and
arch_free_bpf_trampoline.
The arch_alloc_bpf_trampoline will be gone (-1).
The arch_free_bpf_trampoline will be replaced (+0) by
bpf_struct_ops_free_trampoline.
Untested code:
diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c
index 02de71719aed..a3bb89eb037f 100644
--- a/net/bpf/bpf_dummy_struct_ops.c
+++ b/net/bpf/bpf_dummy_struct_ops.c
@@ -89,8 +89,8 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
struct bpf_dummy_ops_test_args *args;
struct bpf_tramp_links *tlinks;
struct bpf_tramp_link *link = NULL;
+ unsigned int op_idx, image_off = 0;
void *image = NULL;
- unsigned int op_idx;
int prog_ret;
s32 type_id;
int err;
@@ -114,12 +114,6 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
goto out;
}
- image = arch_alloc_bpf_trampoline(PAGE_SIZE);
- if (!image) {
- err = -ENOMEM;
- goto out;
- }
-
link = kzalloc(sizeof(*link), GFP_USER);
if (!link) {
err = -ENOMEM;
@@ -130,12 +124,14 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_link_lops, prog);
op_idx = prog->expected_attach_type;
- err = bpf_struct_ops_prepare_trampoline(tlinks, link,
- &st_ops->func_models[op_idx],
- &dummy_ops_test_ret_function,
- image, image + PAGE_SIZE);
- if (err < 0)
+ image = bpf_struct_ops_prepare_trampoline(tlinks, link,
+ &st_ops->func_models[op_idx],
+ &dummy_ops_test_ret_function,
+ NULL, &image_off, true);
+ if (IS_ERR(image)) {
+ err = PTR_ERR(image);
goto out;
+ }
arch_protect_bpf_trampoline(image, PAGE_SIZE);
prog_ret = dummy_ops_call_op(image, args);
@@ -147,7 +143,8 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
err = -EFAULT;
out:
kfree(args);
- arch_free_bpf_trampoline(image, PAGE_SIZE);
+ if (!IS_ERR_OR_NULL(image))
+ bpf_struct_ops_free_trampoline(image);
if (link)
bpf_link_put(&link->link);
kfree(tlinks);
>>>>>>> void bpf_struct_ops_free_trampoline(void *image)
>>>>>>> {
>>>>>>> bpf_jit_uncharge_modmem(PAGE_SIZE);
>>>>>>> arch_free_bpf_trampoline(image, PAGE_SIZE);
>>>>>>> }
^^^^^^^^^^^^^^^
To be clear, this bpf_struct_ops_free_trampoline() function that
does the uncharge+free.
~~~~~~~~~~~~~~~~
Lets summarize a bit of the thread:
. I think we agree duplicating codes from
bpf_struct_ops_prepare_trampoline() is not good.
. bpf_struct_ops_prepare_trampoline() can allocate page.
. The first concern was there is no alloc/free pairing.
Then adding bpf_struct_ops_free_trampoline was suggested
to do the uncharge+free together,
while bpf_struct_ops_prepare_trampoline does the charge+alloc
. The second concern was bpf_struct_ops_prepare_trampoline()
is not obvious that a free(page) needs to be done. It is
more like a naming perception since returning a page is already
a good signal that needs to be freed. However, it is open for
a function name change if it can make the "alloc" part more obvious.
. Another concern was bpf_dummy_ops now needs to remember it
needs to uncharge. bpf_struct_ops_free_trampoline() should
address it also since it does uncharge+free.
. Another point brought up was having bpf_struct_ops_prepare_trampoline()
store the allocated page in st_map->image_pages instead of
map_update doing it itself which was covered in the first part of
this email.
I hope the code pieces have addressed the points brought up above.
I have combined these code pieces in a patch
which is only compiler tested. Hope it will make things clearer (first patch):
https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=struct_ops.images
next prev parent reply other threads:[~2024-02-24 3:20 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-21 22:59 [PATCH bpf-next v2 0/3] Allow struct_ops maps with a large number of programs thinker.li
2024-02-21 22:59 ` [PATCH bpf-next v2 1/3] bpf, net: validate struct_ops when updating value thinker.li
2024-02-21 22:59 ` [PATCH bpf-next v2 2/3] bpf: struct_ops supports more than one page for trampolines thinker.li
2024-02-23 0:33 ` Martin KaFai Lau
2024-02-23 1:35 ` Kui-Feng Lee
2024-02-23 2:16 ` Martin KaFai Lau
2024-02-23 3:01 ` Kui-Feng Lee
2024-02-23 5:25 ` Martin KaFai Lau
2024-02-23 17:36 ` Kui-Feng Lee
2024-02-23 18:29 ` Kui-Feng Lee
2024-02-23 18:42 ` Martin KaFai Lau
2024-02-23 19:05 ` Kui-Feng Lee
2024-02-23 19:15 ` Martin KaFai Lau
2024-02-23 22:06 ` Kui-Feng Lee
2024-02-24 3:20 ` Martin KaFai Lau [this message]
2024-02-23 18:32 ` Martin KaFai Lau
2024-02-21 22:59 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test struct_ops maps with a large number of program links thinker.li
2024-02-21 23:02 ` [PATCH bpf-next v2 0/3] Allow struct_ops maps with a large number of programs Kui-Feng Lee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e8fb3349-c27a-46f7-877d-180df9289108@linux.dev \
--to=martin.lau@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=kernel-team@meta.com \
--cc=kuifeng@meta.com \
--cc=sinquersw@gmail.com \
--cc=song@kernel.org \
--cc=thinker.li@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox