Re: [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes

BPF List
 help / color / mirror / Atom feed

From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes
Date: Tue, 22 Oct 2024 14:43:22 -0700	[thread overview]
Message-ID: <489b0524-49bc-4df4-8744-1badd40824be@linux.dev> (raw)
In-Reply-To: <CAADnVQL3+o7xV2LQcO-AArBmSEV=CQ7TQsuzBfTUnc_g+MhoMw@mail.gmail.com>


On 10/22/24 1:41 PM, Alexei Starovoitov wrote:
> On Tue, Oct 22, 2024 at 1:13 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 10/21/24 8:43 PM, Alexei Starovoitov wrote:
>>> On Mon, Oct 21, 2024 at 8:21 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>>>            for (int i = 0; i < env->subprog_cnt; i++) {
>>>>>> -               if (!i || si[i].is_async_cb) {
>>>>>> -                       ret = check_max_stack_depth_subprog(env, i);
>>>>>> +               check_subprog = !i || (check_priv_stack ? si[i].is_cb : si[i].is_async_cb);
>>>>> why?
>>>>> This looks very suspicious.
>>>> This is to simplify jit. For example,
>>>>       main_prog   <=== main_prog_priv_stack_ptr
>>>>         subprog1  <=== there is a helper which has a callback_fn
>>>>                   <=== for example bpf_for_each_map_elem
>>>>
>>>>           callback_fn
>>>>             subprog2
>>>>
>>>> In callback_fn, we cannot simplify do
>>>>       r9 += stack_size_for_callback_fn
>>>> since r9 may have been clobbered between subprog1 and callback_fn.
>>>> That is why currently I allocate private_stack separately for callback_fn.
>>>>
>>>> Alternatively we could do
>>>>       callback_fn_priv_stack_ptr = main_prog_priv_stack_ptr + off
>>>> where off equals to (stack size tree main_prog+subprog1).
>>>> I can do this approach too with a little more information in prog->aux.
>>>> WDYT?
>>> I see. I think we're overcomplicating the verifier just to
>>> be able to do 'r9 += stack' in the subprog.
>>> The cases of async vs sync and directly vs kfunc/helper
>>> (and soon with inlining of kfuncs) are getting too hard
>>> to reason about.
>>>
>>> I think we need to go back to the earlier approach
>>> where every subprog had its own private stack and was
>>> setting up r9 = my_priv_stack in the prologue.
>>>
>>> I suspect it's possible to construct a convoluted subprog
>>> that calls itself a limited amount of time and the verifier allows that.
>>> I feel it will be easier to detect just that condition
>>> in the verifier and fallback to the normal stack.
>> I tried a simple bpf prog below.
>>
>> $ cat private_stack_subprog_recur.c
>> // SPDX-License-Identifier: GPL-2.0
>>
>> #include <vmlinux.h>
>> #include <bpf/bpf_helpers.h>
>> #include <bpf/bpf_tracing.h>
>> #include "../bpf_testmod/bpf_testmod.h"
>>
>> char _license[] SEC("license") = "GPL";
>>
>> #if defined(__TARGET_ARCH_x86)
>> bool skip __attribute((__section__(".data"))) = false;
>> #else
>> bool skip = true;
>> #endif
>>
>> int i;
>>
>> __noinline static void subprog1(int level)
>> {
>>           if (level > 0) {
>>                   subprog1(level >> 1);
>>                   i++;
>>           }
>> }
>>
>> SEC("kprobe")
>> int prog1(void)
>> {
>>           subprog1(1);
>>           return 0;
>> }
>>
>> In the above prog, we have a recursion of subprog1. The
>> callchain is:
>>      prog -> subprog1 -> subprog1
>>
>> The insn-level verification is successful since argument
>> of subprog1() has precise value.
>>
>> But eventually, verification failed with the following message:
>>     the call stack of 8 frames is too deep !
>>
>> The error message is
>>                   if (frame >= MAX_CALL_FRAMES) {
>>                           verbose(env, "the call stack of %d frames is too deep !\n",
>>                                   frame);
>>                           return -E2BIG;
>>                   }
>> in function check_max_stack_depth_subprog().
>> Basically in function check_max_stack_depth_subprog(), tracing subprog
>> call is done only based on call insn. All conditionals are ignored.
>> In the above example, check_max_stack_depth_subprog() will have the
>> call graph like
>>       prog -> subprog1 -> subprog1 -> subprog1 -> subprog1 -> ...
>> and eventually hit the error.
>>
>> Basically with check_max_stack_depth_subprog() self recursion is not
>> possible for a bpf prog.
>>
>> This limitation is back to year 2017.
>>     commit 70a87ffea8ac  bpf: fix maximum stack depth tracking logic
>>
>> So I assume people really do not write progs with self recursion inside
>> the main prog (including subprogs).
> Thanks for checking this part.
>
> What about sync and async callbacks? Can they recurse?

For sync, there will be no recurses between subprogs.
This is due to the following func.

static int check_max_stack_depth(struct bpf_verifier_env *env)
{
         struct bpf_subprog_info *si = env->subprog_info;
         int ret;
         
         for (int i = 0; i < env->subprog_cnt; i++) {
                 if (!i || si[i].is_async_cb) {
                         ret = check_max_stack_depth_subprog(env, i);
                         if (ret < 0)
                                 return ret;
                 }
                 continue;
         }
         return 0;
}

subprog root only starts from the main prog or async_cb.
So regular sync callback will is treated similar
to other direct-call subprog.

>
> Since progs are preemptible is the following possible:
>
> __noinline static void subprog(void)
> {
>    /* delay */
> }
>
> static int timer_cb(void *map, int *key, void *val)
> {
>    subprog();
> }
>
> SEC("tc")
> int prog1(void)
> {
>      bpf_timer_set_callback(  &timer_cb);
>      subprog();
>      return 0;
> }
>
> timers use softirq.
> I'm not sure whether it's the same stack or not.
> So it may be borderline ok-ish for other reasons,
> but the question remains. Will subprog recurse this way?

But for async cb, as you mentioned it is possible that
prog1->subprog could be called in process context
and the callback timer_cb->subprog could be called in
nested way on top of prog1->subprog.

To handle such cases, I guess I can refactor the code
to record maximum stack_tree_depth in subprog info and
do the checking after the subprog 0 and all async
progs are processed.

To handle a subprog may be used in more than one
subtree (subprog 0 tree or async tree), I need to
add a 'visited' field to bpf_subprog_info.
I think this should work.

next prev parent reply	other threads:[~2024-10-22 21:43 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-20 19:13 [PATCH bpf-next v6 0/9] bpf: Support private stack for bpf progs Yonghong Song
2024-10-20 19:13 ` [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes Yonghong Song
2024-10-22  1:18   ` Alexei Starovoitov
2024-10-22  3:21     ` Yonghong Song
2024-10-22  3:43       ` Alexei Starovoitov
2024-10-22  4:08         ` Yonghong Song
2024-10-22 20:13         ` Yonghong Song
2024-10-22 20:41           ` Alexei Starovoitov
2024-10-22 21:29             ` Kumar Kartikeya Dwivedi
2024-10-22 21:36               ` Kumar Kartikeya Dwivedi
2024-10-22 21:43             ` Yonghong Song [this message]
2024-10-22 21:57               ` Alexei Starovoitov
2024-10-22 22:41                 ` Yonghong Song
2024-10-22 22:59                   ` Alexei Starovoitov
2024-10-22 23:53                     ` Yonghong Song
2024-10-20 19:13 ` [PATCH bpf-next v6 2/9] bpf: Rename bpf_struct_ops_arg_info to bpf_struct_ops_func_info Yonghong Song
2024-10-20 19:13 ` [PATCH bpf-next v6 3/9] bpf: Support private stack for struct ops programs Yonghong Song
2024-10-22  1:34   ` Alexei Starovoitov
2024-10-22  2:59     ` Yonghong Song
2024-10-22 17:26     ` Martin KaFai Lau
2024-10-22 20:19       ` Alexei Starovoitov
2024-10-23 21:00         ` Tejun Heo
2024-10-23 23:07           ` Alexei Starovoitov
2024-10-24  0:56             ` Tejun Heo
2024-10-20 19:14 ` [PATCH bpf-next v6 4/9] bpf: Mark each subprog with proper private stack modes Yonghong Song
2024-10-20 22:01   ` Jiri Olsa
2024-10-21  4:22     ` Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 5/9] bpf, x86: Refactor func emit_prologue Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 6/9] bpf, x86: Create a helper for certain "reg <op>= imm" operations Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 7/9] bpf, x86: Add jit support for private stack Yonghong Song
2024-10-20 19:14 ` [PATCH bpf-next v6 8/9] selftests/bpf: Add tracing prog private stack tests Yonghong Song
2024-10-20 21:59   ` Jiri Olsa
2024-10-21  4:32     ` Yonghong Song
2024-10-21 10:40       ` Jiri Olsa
2024-10-21 16:19         ` Yonghong Song
2024-10-21 21:13           ` Jiri Olsa
2024-10-20 19:14 ` [PATCH bpf-next v6 9/9] selftests/bpf: Add struct_ops " Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=489b0524-49bc-4df4-8744-1badd40824be@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox