public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v9 04/10] bpf: Check potential private stack recursion for progs with async callback
Date: Tue, 5 Nov 2024 13:26:14 -0800	[thread overview]
Message-ID: <c00685dc-c51b-4058-8373-93b01443143d@linux.dev> (raw)
In-Reply-To: <CAADnVQJzV_eRaNMzYP5Fj-FsSNx7-1-f0yXjtXSpeOqr9tBVAg@mail.gmail.com>




On 11/5/24 12:26 PM, Alexei Starovoitov wrote:
> On Mon, Nov 4, 2024 at 7:37 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 11/4/24 6:51 PM, Alexei Starovoitov wrote:
>>> On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> In previous patch, tracing progs are enabled for private stack since
>>>> recursion checking ensures there exists no nested same bpf prog run on
>>>> the same cpu.
>>>>
>>>> But it is still possible for nested bpf subprog run on the same cpu
>>>> if the same subprog is called in both main prog and async callback,
>>>> or in different async callbacks. For example,
>>>>     main_prog
>>>>      bpf_timer_set_callback(timer, timer_cb);
>>>>      call sub1
>>>>     sub1
>>>>      ...
>>>>     time_cb
>>>>      call sub1
>>>>
>>>> In the above case, nested subprog run for sub1 is possible with one in
>>>> process context and the other in softirq context. If this is the case,
>>>> the verifier will disable private stack for this bpf prog.
>>>>
>>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>>>> ---
>>>>    include/linux/bpf_verifier.h |  2 ++
>>>>    kernel/bpf/verifier.c        | 42 +++++++++++++++++++++++++++++++-----
>>>>    2 files changed, 39 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>>> index 0622c11a7e19..e921589abc72 100644
>>>> --- a/include/linux/bpf_verifier.h
>>>> +++ b/include/linux/bpf_verifier.h
>>>> @@ -669,6 +669,8 @@ struct bpf_subprog_info {
>>>>           /* true if bpf_fastcall stack region is used by functions that can't be inlined */
>>>>           bool keep_fastcall_stack: 1;
>>>>           bool use_priv_stack: 1;
>>>> +       bool visited_with_priv_stack_accum: 1;
>>>> +       bool visited_with_priv_stack: 1;
>>>>
>>>>           u8 arg_cnt;
>>>>           struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 406195c433ea..e01b3f0fd314 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -6118,8 +6118,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>>>                                           idx, subprog_depth);
>>>>                                   return -EACCES;
>>>>                           }
>>>> -                       if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE)
>>>> +                       if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) {
>>>>                                   subprog[idx].use_priv_stack = true;
>>>> +                               subprog[idx].visited_with_priv_stack = true;
>>>> +                       }
>>>> +               } else {
>>>> +                       subprog[idx].visited_with_priv_stack = true;
>>> See suggestion for patch 3.
>>> It's cleaner to rewrite with a single visited_with_priv_stack = true; statement.
>> Ack.
>>
>>>>                   }
>>>>           }
>>>>    continue_func:
>>>> @@ -6220,10 +6224,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>>>    static int check_max_stack_depth(struct bpf_verifier_env *env)
>>>>    {
>>>>           struct bpf_subprog_info *si = env->subprog_info;
>>>> +       enum priv_stack_mode orig_priv_stack_supported;
>>>>           enum priv_stack_mode priv_stack_supported;
>>>>           int ret, subtree_depth = 0, depth_frame;
>>>>
>>>>           priv_stack_supported = bpf_enable_priv_stack(env->prog);
>>>> +       orig_priv_stack_supported = priv_stack_supported;
>>>>
>>>>           if (priv_stack_supported != NO_PRIV_STACK) {
>>>>                   for (int i = 0; i < env->subprog_cnt; i++) {
>>>> @@ -6240,13 +6246,39 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
>>>>                                                               priv_stack_supported);
>>>>                           if (ret < 0)
>>>>                                   return ret;
>>>> +
>>>> +                       if (priv_stack_supported != NO_PRIV_STACK) {
>>>> +                               for (int j = 0; j < env->subprog_cnt; j++) {
>>>> +                                       if (si[j].visited_with_priv_stack_accum &&
>>>> +                                           si[j].visited_with_priv_stack) {
>>>> +                                               /* si[j] is visited by both main/async subprog
>>>> +                                                * and another async subprog.
>>>> +                                                */
>>>> +                                               priv_stack_supported = NO_PRIV_STACK;
>>>> +                                               break;
>>>> +                                       }
>>>> +                                       if (!si[j].visited_with_priv_stack_accum)
>>>> +                                               si[j].visited_with_priv_stack_accum =
>>>> +                                                       si[j].visited_with_priv_stack;
>>>> +                               }
>>>> +                       }
>>>> +                       if (priv_stack_supported != NO_PRIV_STACK) {
>>>> +                               for (int j = 0; j < env->subprog_cnt; j++)
>>>> +                                       si[j].visited_with_priv_stack = false;
>>>> +                       }
>>> I cannot understand what this algorithm is doing.
>>> What is the meaning of visited_with_priv_stack_accum ?
>> The following is an example to show how the algorithm works.
>> Let us say we have prog like
>>      main_prog0  si[0]
>>        sub1      si[1]
>>        sub2      si[2]
>>      async1      si[3]
>>        sub4      si[4]
>>        sub2      si[2]
>>      async2      si[5]
>>        sub4      si[4]
>>        sub5      si[6]
>>
>>
>> Total 9 subprograms.
>>
>> after iteration 1 (main_prog0)
>>      visited_with_priv_stack_accum: si[i] = false for i = 0 ... 9
>>      visited_with_priv_stack: si[0] = si[1] = si[2] = true, others false
>>
>>      for all i, visited_with_priv_stack_accum[i] and visited_with_priv_stack[i]
>>      is false, so main_prog0 can use priv stack.
>>
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack cleared with false.
>>
>> after iteration 2 (async1)
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack: si[2] = si[3] = si[4] = true, others false
>>
>>      Here, si[2] appears in both visited_with_priv_stack_accum and
>>      visited_with_priv_stack, so async1 cannot have priv stack.
>>
>>      In my algorithm, I flipped the whole thing to no_priv_stack, which is
>>      too conservative. We should just skip async1 and continues.
>>
>>      Let us say, we say async1 not having priv stack while main_prog0 has.
>>
>>      /* the same as end of iteration 1 */
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack cleared with false.
>>
>> after iteration 3 (async2)
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack: si[4] = si[5] = si[6] = true;
>>
>>      there are no conflict, so async2 can use private stack.
>>
>>
>> If we only have one bit in bpf_subprog_info, for a async tree,
>> if marking a subprog to be true and later we found there is a conflict in
>> async tree and we need make the whole async subprogs not eligible for priv stack,
>> then it will be hard to undo previous markings.
>>
>> So visited_with_priv_stack_accum is to accumulate "true" results from
>> main_prog/async's.
> I see. I think it works, but feels complicated.
> It feels it should be possible to do without extra flags. Like
> check_max_stack_depth_subprog() will know whether it was called
> to verify async_cb or not.
> So it's just a matter of adding single 'if' to it:
> if (subprog[idx].use_priv_stack && checking_async_cb)
>     /* reset to false due to potential recursion */
>     subprog[idx].use_priv_stack = false;
>
> check_max_stack_depth() starts with i==0,
> so reachable and eligible subprogs will be marked with use_priv_stack.
> Then check_max_stack_depth_subprog() will be called again
> to verify async. If it sees the mark it's a bad case.
> what am I missing?

First I think we still want to mark some subprogs in async tree
to use private stack, right? If this is the case, then let us see
the following examle:

main_prog:
    sub1: use_priv_stack = true
    sub2" use_priv_stack = true

async: /* calling sub1 twice */
    sub1
      <=== we do
             if (subprog[idx].use_priv_stack && checking_async_cb)
                 subprog[idx].use_priv_stack = false;
    sub1
      <=== here we have subprog[idx].use_priv_stack = false;
           we could mark use_priv_stack = true again here
           since logic didn't keep track of sub1 has been
           visited before.

To solve the above issue, we need one visited bit in bpf_subprog_info.
After finishing async tree, if for any subprog,
   visited_bit && subprog[idx].use_priv_stack
is true, we can mark subprog[idx].use_priv_stack = false

So one visited bit is enough.

More complicated case is two asyncs. For example:

main_prog:
   sub1
   sub2

async1:
   sub3

async2:
   sub3

If async1/sub3 and async2/sub3 can be nested, then we will
need two visited bits as I have above.
If async1/sub3 and async2/sub3 cannot be nested, then one
visited bit should be enough, since we can traverse
async1/async2 with 'visited' marking and then compare against
main prog.

So the question would be:
   1. Is it possible that two async call backs may nest with
      each other? I actually do not know the answer.
   2. Do we want to allow subprogs in async tree to use private
      stacks?


  reply	other threads:[~2024-11-05 21:26 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-04 19:34 [PATCH bpf-next v9 00/10] bpf: Support private stack for bpf progs Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 01/10] bpf: Check stack depth limit after visiting all subprogs Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 02/10] bpf: Return false for bpf_prog_check_recur() default case Yonghong Song
2024-11-05  1:21   ` Alexei Starovoitov
2024-11-05  1:35     ` Yonghong Song
2024-11-05  1:55       ` Alexei Starovoitov
2024-11-05  2:53         ` Yonghong Song
2024-11-05  3:50           ` Yonghong Song
2024-11-05  4:28             ` Alexei Starovoitov
2024-11-05  6:02               ` Yonghong Song
2024-11-05 15:50                 ` Alexei Starovoitov
2024-11-05 16:33                   ` Yonghong Song
2024-11-05 16:38                     ` Alexei Starovoitov
2024-11-05 16:48                       ` Yonghong Song
2024-11-05 17:47                         ` Alexei Starovoitov
2024-11-04 19:35 ` [PATCH bpf-next v9 03/10] bpf: Allow private stack to have each subprog having stack size of 512 bytes Yonghong Song
2024-11-05  2:47   ` Alexei Starovoitov
2024-11-05  3:09     ` Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 04/10] bpf: Check potential private stack recursion for progs with async callback Yonghong Song
2024-11-05  2:51   ` Alexei Starovoitov
2024-11-05  3:37     ` Yonghong Song
2024-11-05 20:26       ` Alexei Starovoitov
2024-11-05 21:26         ` Yonghong Song [this message]
2024-11-05 21:52           ` Alexei Starovoitov
2024-11-06  0:19             ` Yonghong Song
2024-11-06  1:07               ` Alexei Starovoitov
2024-11-06  2:33                 ` Yonghong Song
2024-11-06  6:55                 ` Yonghong Song
2024-11-06 15:26                   ` Alexei Starovoitov
2024-11-06 15:44                     ` Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 05/10] bpf: Allocate private stack for eligible main prog or subprogs Yonghong Song
2024-11-05  1:38   ` Alexei Starovoitov
2024-11-05  3:07     ` Yonghong Song
2024-11-05  3:44       ` Yonghong Song
2024-11-05  5:19         ` Alexei Starovoitov
2024-11-05  6:05           ` Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 06/10] bpf, x86: Avoid repeated usage of bpf_prog->aux->stack_depth Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 07/10] bpf, x86: Support private stack in jit Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 08/10] selftests/bpf: Add tracing prog private stack tests Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 09/10] bpf: Support private stack for struct_ops progs Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 10/10] selftests/bpf: Add struct_ops prog private stack tests Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c00685dc-c51b-4058-8373-93b01443143d@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox