All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v9 04/10] bpf: Check potential private stack recursion for progs with async callback
Date: Tue, 5 Nov 2024 13:26:14 -0800	[thread overview]
Message-ID: <c00685dc-c51b-4058-8373-93b01443143d@linux.dev> (raw)
In-Reply-To: <CAADnVQJzV_eRaNMzYP5Fj-FsSNx7-1-f0yXjtXSpeOqr9tBVAg@mail.gmail.com>




On 11/5/24 12:26 PM, Alexei Starovoitov wrote:
> On Mon, Nov 4, 2024 at 7:37 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 11/4/24 6:51 PM, Alexei Starovoitov wrote:
>>> On Mon, Nov 4, 2024 at 11:38 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> In previous patch, tracing progs are enabled for private stack since
>>>> recursion checking ensures there exists no nested same bpf prog run on
>>>> the same cpu.
>>>>
>>>> But it is still possible for nested bpf subprog run on the same cpu
>>>> if the same subprog is called in both main prog and async callback,
>>>> or in different async callbacks. For example,
>>>>     main_prog
>>>>      bpf_timer_set_callback(timer, timer_cb);
>>>>      call sub1
>>>>     sub1
>>>>      ...
>>>>     time_cb
>>>>      call sub1
>>>>
>>>> In the above case, nested subprog run for sub1 is possible with one in
>>>> process context and the other in softirq context. If this is the case,
>>>> the verifier will disable private stack for this bpf prog.
>>>>
>>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>>>> ---
>>>>    include/linux/bpf_verifier.h |  2 ++
>>>>    kernel/bpf/verifier.c        | 42 +++++++++++++++++++++++++++++++-----
>>>>    2 files changed, 39 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>>> index 0622c11a7e19..e921589abc72 100644
>>>> --- a/include/linux/bpf_verifier.h
>>>> +++ b/include/linux/bpf_verifier.h
>>>> @@ -669,6 +669,8 @@ struct bpf_subprog_info {
>>>>           /* true if bpf_fastcall stack region is used by functions that can't be inlined */
>>>>           bool keep_fastcall_stack: 1;
>>>>           bool use_priv_stack: 1;
>>>> +       bool visited_with_priv_stack_accum: 1;
>>>> +       bool visited_with_priv_stack: 1;
>>>>
>>>>           u8 arg_cnt;
>>>>           struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 406195c433ea..e01b3f0fd314 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -6118,8 +6118,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>>>                                           idx, subprog_depth);
>>>>                                   return -EACCES;
>>>>                           }
>>>> -                       if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE)
>>>> +                       if (subprog_depth >= BPF_PRIV_STACK_MIN_SIZE) {
>>>>                                   subprog[idx].use_priv_stack = true;
>>>> +                               subprog[idx].visited_with_priv_stack = true;
>>>> +                       }
>>>> +               } else {
>>>> +                       subprog[idx].visited_with_priv_stack = true;
>>> See suggestion for patch 3.
>>> It's cleaner to rewrite with a single visited_with_priv_stack = true; statement.
>> Ack.
>>
>>>>                   }
>>>>           }
>>>>    continue_func:
>>>> @@ -6220,10 +6224,12 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
>>>>    static int check_max_stack_depth(struct bpf_verifier_env *env)
>>>>    {
>>>>           struct bpf_subprog_info *si = env->subprog_info;
>>>> +       enum priv_stack_mode orig_priv_stack_supported;
>>>>           enum priv_stack_mode priv_stack_supported;
>>>>           int ret, subtree_depth = 0, depth_frame;
>>>>
>>>>           priv_stack_supported = bpf_enable_priv_stack(env->prog);
>>>> +       orig_priv_stack_supported = priv_stack_supported;
>>>>
>>>>           if (priv_stack_supported != NO_PRIV_STACK) {
>>>>                   for (int i = 0; i < env->subprog_cnt; i++) {
>>>> @@ -6240,13 +6246,39 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
>>>>                                                               priv_stack_supported);
>>>>                           if (ret < 0)
>>>>                                   return ret;
>>>> +
>>>> +                       if (priv_stack_supported != NO_PRIV_STACK) {
>>>> +                               for (int j = 0; j < env->subprog_cnt; j++) {
>>>> +                                       if (si[j].visited_with_priv_stack_accum &&
>>>> +                                           si[j].visited_with_priv_stack) {
>>>> +                                               /* si[j] is visited by both main/async subprog
>>>> +                                                * and another async subprog.
>>>> +                                                */
>>>> +                                               priv_stack_supported = NO_PRIV_STACK;
>>>> +                                               break;
>>>> +                                       }
>>>> +                                       if (!si[j].visited_with_priv_stack_accum)
>>>> +                                               si[j].visited_with_priv_stack_accum =
>>>> +                                                       si[j].visited_with_priv_stack;
>>>> +                               }
>>>> +                       }
>>>> +                       if (priv_stack_supported != NO_PRIV_STACK) {
>>>> +                               for (int j = 0; j < env->subprog_cnt; j++)
>>>> +                                       si[j].visited_with_priv_stack = false;
>>>> +                       }
>>> I cannot understand what this algorithm is doing.
>>> What is the meaning of visited_with_priv_stack_accum ?
>> The following is an example to show how the algorithm works.
>> Let us say we have prog like
>>      main_prog0  si[0]
>>        sub1      si[1]
>>        sub2      si[2]
>>      async1      si[3]
>>        sub4      si[4]
>>        sub2      si[2]
>>      async2      si[5]
>>        sub4      si[4]
>>        sub5      si[6]
>>
>>
>> Total 9 subprograms.
>>
>> after iteration 1 (main_prog0)
>>      visited_with_priv_stack_accum: si[i] = false for i = 0 ... 9
>>      visited_with_priv_stack: si[0] = si[1] = si[2] = true, others false
>>
>>      for all i, visited_with_priv_stack_accum[i] and visited_with_priv_stack[i]
>>      is false, so main_prog0 can use priv stack.
>>
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack cleared with false.
>>
>> after iteration 2 (async1)
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack: si[2] = si[3] = si[4] = true, others false
>>
>>      Here, si[2] appears in both visited_with_priv_stack_accum and
>>      visited_with_priv_stack, so async1 cannot have priv stack.
>>
>>      In my algorithm, I flipped the whole thing to no_priv_stack, which is
>>      too conservative. We should just skip async1 and continues.
>>
>>      Let us say, we say async1 not having priv stack while main_prog0 has.
>>
>>      /* the same as end of iteration 1 */
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack cleared with false.
>>
>> after iteration 3 (async2)
>>      visited_with_priv_stack_accum: si[0] = si[1] = si[2] = true; others false
>>      visited_with_priv_stack: si[4] = si[5] = si[6] = true;
>>
>>      there are no conflict, so async2 can use private stack.
>>
>>
>> If we only have one bit in bpf_subprog_info, for a async tree,
>> if marking a subprog to be true and later we found there is a conflict in
>> async tree and we need make the whole async subprogs not eligible for priv stack,
>> then it will be hard to undo previous markings.
>>
>> So visited_with_priv_stack_accum is to accumulate "true" results from
>> main_prog/async's.
> I see. I think it works, but feels complicated.
> It feels it should be possible to do without extra flags. Like
> check_max_stack_depth_subprog() will know whether it was called
> to verify async_cb or not.
> So it's just a matter of adding single 'if' to it:
> if (subprog[idx].use_priv_stack && checking_async_cb)
>     /* reset to false due to potential recursion */
>     subprog[idx].use_priv_stack = false;
>
> check_max_stack_depth() starts with i==0,
> so reachable and eligible subprogs will be marked with use_priv_stack.
> Then check_max_stack_depth_subprog() will be called again
> to verify async. If it sees the mark it's a bad case.
> what am I missing?

First I think we still want to mark some subprogs in async tree
to use private stack, right? If this is the case, then let us see
the following examle:

main_prog:
    sub1: use_priv_stack = true
    sub2" use_priv_stack = true

async: /* calling sub1 twice */
    sub1
      <=== we do
             if (subprog[idx].use_priv_stack && checking_async_cb)
                 subprog[idx].use_priv_stack = false;
    sub1
      <=== here we have subprog[idx].use_priv_stack = false;
           we could mark use_priv_stack = true again here
           since logic didn't keep track of sub1 has been
           visited before.

To solve the above issue, we need one visited bit in bpf_subprog_info.
After finishing async tree, if for any subprog,
   visited_bit && subprog[idx].use_priv_stack
is true, we can mark subprog[idx].use_priv_stack = false

So one visited bit is enough.

More complicated case is two asyncs. For example:

main_prog:
   sub1
   sub2

async1:
   sub3

async2:
   sub3

If async1/sub3 and async2/sub3 can be nested, then we will
need two visited bits as I have above.
If async1/sub3 and async2/sub3 cannot be nested, then one
visited bit should be enough, since we can traverse
async1/async2 with 'visited' marking and then compare against
main prog.

So the question would be:
   1. Is it possible that two async call backs may nest with
      each other? I actually do not know the answer.
   2. Do we want to allow subprogs in async tree to use private
      stacks?


  reply	other threads:[~2024-11-05 21:26 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-04 19:34 [PATCH bpf-next v9 00/10] bpf: Support private stack for bpf progs Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 01/10] bpf: Check stack depth limit after visiting all subprogs Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 02/10] bpf: Return false for bpf_prog_check_recur() default case Yonghong Song
2024-11-05  1:21   ` Alexei Starovoitov
2024-11-05  1:35     ` Yonghong Song
2024-11-05  1:55       ` Alexei Starovoitov
2024-11-05  2:53         ` Yonghong Song
2024-11-05  3:50           ` Yonghong Song
2024-11-05  4:28             ` Alexei Starovoitov
2024-11-05  6:02               ` Yonghong Song
2024-11-05 15:50                 ` Alexei Starovoitov
2024-11-05 16:33                   ` Yonghong Song
2024-11-05 16:38                     ` Alexei Starovoitov
2024-11-05 16:48                       ` Yonghong Song
2024-11-05 17:47                         ` Alexei Starovoitov
2024-11-04 19:35 ` [PATCH bpf-next v9 03/10] bpf: Allow private stack to have each subprog having stack size of 512 bytes Yonghong Song
2024-11-05  2:47   ` Alexei Starovoitov
2024-11-05  3:09     ` Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 04/10] bpf: Check potential private stack recursion for progs with async callback Yonghong Song
2024-11-05  2:51   ` Alexei Starovoitov
2024-11-05  3:37     ` Yonghong Song
2024-11-05 20:26       ` Alexei Starovoitov
2024-11-05 21:26         ` Yonghong Song [this message]
2024-11-05 21:52           ` Alexei Starovoitov
2024-11-06  0:19             ` Yonghong Song
2024-11-06  1:07               ` Alexei Starovoitov
2024-11-06  2:33                 ` Yonghong Song
2024-11-06  6:55                 ` Yonghong Song
2024-11-06 15:26                   ` Alexei Starovoitov
2024-11-06 15:44                     ` Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 05/10] bpf: Allocate private stack for eligible main prog or subprogs Yonghong Song
2024-11-05  1:38   ` Alexei Starovoitov
2024-11-05  3:07     ` Yonghong Song
2024-11-05  3:44       ` Yonghong Song
2024-11-05  5:19         ` Alexei Starovoitov
2024-11-05  6:05           ` Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 06/10] bpf, x86: Avoid repeated usage of bpf_prog->aux->stack_depth Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 07/10] bpf, x86: Support private stack in jit Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 08/10] selftests/bpf: Add tracing prog private stack tests Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 09/10] bpf: Support private stack for struct_ops progs Yonghong Song
2024-11-04 19:35 ` [PATCH bpf-next v9 10/10] selftests/bpf: Add struct_ops prog private stack tests Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c00685dc-c51b-4058-8373-93b01443143d@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.