From: Yonghong Song <yonghong.song@linux.dev>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack
Date: Tue, 1 Oct 2024 23:48:51 -0700
Message-ID: <6366e7d3-f81b-4837-b105-bced5217c95f@linux.dev>
In-Reply-To: <a1686631-3c65-4ed0-bdb6-90fa1f0c6242@linux.dev>


On 10/1/24 11:28 PM, Yonghong Song wrote:
>
> On 10/1/24 7:16 PM, Kumar Kartikeya Dwivedi wrote:
>> On Wed, 2 Oct 2024 at 03:26, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi 
>>> <memxor@gmail.com> wrote:
>>>> Makes sense, though will we have cases where hierarchical scheduling
>>>> attaches the same prog at different points of the hierarchy?
>>> I'm not sure anyone was asking for such a use case.
>> I wondered because why would you then need a limit of 4 (say instead
>> of disallowing it)?
>>
>>>> Then the
>>>> limit of 4 may not be enough (e.g. say with cgroup nested levels > 4).
>>> Well, 4 was the number from TJ.
>>>
>> Ok, then let's assume 4 would be enough.
>>
>>> Anyway the proposed pseudo code:
>>>
>>> __bpf_prog_enter_recur_limited()
>>> {
>>>    cnt = this_cpu_inc_return(*(prog->active));
>>>    if (cnt > 4) {
>>>       inc_miss
>>>       return 0;
>>>    }
>>>   // pass cnt into bpf prog somehow, like %rdx ?
>>>   // or re-read prog->active from prog
>>> }
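
For reference, the counter logic quoted above can be modeled in user space.
This is only a sketch: struct fake_prog, fake_prog_enter() and
fake_prog_exit() are hypothetical names, a plain int stands in for the
per-CPU prog->active counter, and it assumes the exit path always
decrements, even when the enter was refused.

```c
#include <stdbool.h>

#define MAX_NEST 4	/* the limit of 4 quoted in this thread */

/* Hypothetical user-space stand-in for one CPU's view of a prog. */
struct fake_prog {
	int active;	/* models the per-CPU prog->active counter */
	int misses;	/* models the "inc_miss" statistic */
};

/* Returns the nesting level (1..MAX_NEST), or 0 if the run is refused. */
static int fake_prog_enter(struct fake_prog *p)
{
	int cnt = ++p->active;	/* stands in for this_cpu_inc_return() */

	if (cnt > MAX_NEST) {
		p->misses++;
		return 0;
	}
	return cnt;	/* the value that would be handed to the prog, e.g. in %rdx */
}

/* Assumption: every enter, refused or not, is paired with one exit. */
static void fake_prog_exit(struct fake_prog *p)
{
	p->active--;
}
```

The return value doubles as the "first entry or nested entry" signal, so
the prologue would not need to re-read prog->active.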
>>>
>>>
>>> then in the prologue emit:
>>>
>>> push rbp
>>> mov rbp, rsp
>>> if %rdx == 1
>>>     // main prog is called for the first time
>>>     mov rsp, pcpu_priv_stack_top
>
> This sounds good at a high level. I still need to figure out
> the 'if %rdx == 1' part and how to implement it.

Okay, it looks like the trampoline could supply rdx == 1.

>
>>> else
>>>     // 2+nd time main prog is called or 1+ time subprog
>>>    sub rsp, stack_size
>>>    if rsp < pcpu_priv_stack_bottom
>>>      goto exit  // stack is too small, exit
>>> fi
>> I think we need just the second part for subprogs, right?
>> Since rdx is R3 (arg into subprog).
>> I guess that's what you meant in the pseudocode.
>> But otherwise sounds good.
>> The benefit with stack probing is we don't exactly limit to 4 cases.
>>
>> Another option instead of the branch in main prog is to divide in 4
>> slots (as you said before) and choose the slot based on cnt.
>> But then we're stuck with a max limit of 4. Since we're allocating
>> stack size of bpf + extra (which I guess is 8K?). rdx can be used to
>> pass in the priv_stack address of the right slot.
>>
>> So I think the probing version seems better. We can probably pass in
>> rdx = priv_stack and then test and cmov instead for main prog.
>
> Yes, we do not need to limit to 4, checking rsp < pcpu_priv_stack_bottom
> should be okay.
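
A rough user-space model of that prologue decision, to make the check
concrete. It is not JIT code: struct priv_stack and pick_stack_pointer()
are made-up names, addresses are plain integers, and it assumes a nested
entry starts with rsp already on the private stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one CPU's private stack; it grows down. */
struct priv_stack {
	uintptr_t bottom;	/* lowest usable address */
	uintptr_t top;		/* initial stack pointer, top > bottom */
};

/*
 * cnt == 1: first entry of the main prog, switch to the private stack top.
 * Otherwise: nested entry or subprog, carve stack_size below cur_sp.
 * Returns false where the pseudocode says "goto exit".
 */
static bool pick_stack_pointer(const struct priv_stack *ps, int cnt,
			       uintptr_t cur_sp, uintptr_t stack_size,
			       uintptr_t *new_sp)
{
	if (cnt == 1) {
		*new_sp = ps->top;
		return true;
	}
	/* Same as "rsp - stack_size < bottom", written to avoid wraparound. */
	if (cur_sp < ps->bottom || cur_sp - ps->bottom < stack_size)
		return false;
	*new_sp = cur_sp - stack_size;
	return true;
}
```

Nothing in the check depends on the nesting count, which is why the
probing version does not need the limit of 4.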
>
>>
>>> Since stack bottom/top are known at JIT time we can
>>> generate reliable stack overflow checks.
>>> Much better than guard pages and -fstack-protector.
>>> The prog can alloc percpu
>>> (stack size of main prog + subprogs + extra) * 4
>> extra will be 8K, I guess (same as kernel stack size)?
>> Just confirming.
>>
>>> and it likely will be enough.
>>> If not, the stack protection will gently exit the prog
>>> when the stack is too deep.
>> I like this stack probing version, since there's no hard limit on the
>> number of recursions, and it's safe against stack overflow as well.
>>
>>> kfunc won't have such a check, so we need a buffer zone.
>>> Can have a guard page too, but feels like overkill.
>> I was leaning toward saying yes for a guard page, since we'll at least
>> have a hard error instead of random corruption if the kfunc goes
>> beyond the bottom after probing succeeds.
>>
>> But the better way might be doing if rsp < pcpu_priv_stack_bottom +
>> 8K, so we leave max headroom we reserve for kernel stuff (or say add
>> 4K instead, which should be good enough), and then skip execution.
>
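
To put numbers on the sizing and headroom ideas quoted above, here is a
small sketch. The helper names are hypothetical, the sizing formula is the
one quoted earlier ((main prog + subprogs + extra) * 4), and the headroom
is whatever gets reserved for kfuncs (8K or 4K were the figures
mentioned).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NEST_SLOTS 4	/* multiplier from the quoted sizing formula */

/* Per-CPU allocation size: (main prog + subprogs + extra) * 4. */
static size_t priv_stack_alloc_size(size_t main_depth, size_t subprog_depth,
				    size_t extra)
{
	return (main_depth + subprog_depth + extra) * NEST_SLOTS;
}

/*
 * Probe with headroom: accept the new stack pointer only if at least
 * `headroom` bytes remain above the bottom for kfuncs, which do not
 * perform any check of their own.
 */
static bool probe_ok(uintptr_t bottom, uintptr_t sp_after_sub,
		     uintptr_t headroom)
{
	return sp_after_sub >= bottom && sp_after_sub - bottom >= headroom;
}
```

For example, a 512-byte main prog with 1024 bytes of subprog stack and 8K
of extra space would give (512 + 1024 + 8192) * 4 = 38912 bytes per CPU.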

