BPF List
From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	David Vernet <void@manifault.com>,
	lsf-pc <lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] Segmented Stacks for BPF Programs
Date: Wed, 14 Feb 2024 19:07:14 -0800	[thread overview]
Message-ID: <77fbe6f8-b1df-44a2-a177-d3b9faba5482@linux.dev> (raw)
In-Reply-To: <CAADnVQJwN_NvjM2121urjutY3FqtzHxNWyGPWQzyzhCmFmDDzQ@mail.gmail.com>


On 2/14/24 6:20 PM, Alexei Starovoitov wrote:
> On Wed, Feb 14, 2024 at 11:53 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>> For each active kernel thread, the thread stack size is 2*PAGE_SIZE ([1]).
>> Each bpf program has a maximum stack size of 512 bytes to avoid
>> overflowing the thread stack. But nested bpf programs may pose
>> a challenge for avoiding stack overflow.
>>
>> For example, currently we already allow nested bpf
>> programs, especially in tracing, e.g.,
>>     Prog_A
>>       -> Call Helper_B
>>         -> Call Func_C
>>           -> fentry program is called due to Func_C.
>>             -> Call Helper_D and then Func_E
>>               -> fentry due to Func_E
>>                 -> ...
>> If we have too many bpf programs in the chain and each bpf program
>> has a stack size close to 512 bytes, it could overflow the kernel
>> thread stack.
>>
>> Another, more practical, potential use case came from a discussion between
>> Alexei and Tejun. For a complex scheduler like sched-ext,
>> we could have a BPF prog hierarchy like below:
>>                          Prog_1 (at system level)
>>             Prog_Numa_1    Prog_Numa_2 ...  Prog_Numa_4
>>          Prog_LLC_1 Prog_LLC_2 ...
>>        Prog_CPU_1 ...
>>
>> Basically, the top bpf program (Prog_1) will call Prog_Numa_* programs
>> through a kfunc to collect information from programs in each numa node.
>> Each Prog_Numa_* program will call Prog_LLC_* programs to collect
>> information from programs in each LLC domain in that particular
>> numa node, and so on. The same holds for Prog_LLC_* vs. Prog_CPU_*.
>> Now we have four levels of nested bpf programs.
>>
>> The proposed approach is to allocate the stack from the heap for
>> each bpf program. That way, we do not need to worry about
>> kernel stack overflow. Such an approach is called
>> segmented stacks ([2]) in clang/gcc/go etc.
>>
>> Obviously there are some drawbacks to the segmented stack approach:
>>    - some performance degradation, so this approach may not be for everyone.
>>    - for stack backtracking, kernel changes are necessary.
> I suspect segmented stacks the way compilers do them are not suitable
> for bpf progs, since they break backtraces and backtrace is a crucial
> feature that must work even when there are kernel bugs.
> How about we keep call/ret, save/restore of callee saved regs
> in the normal stack, but use a parallel memory (per-cpu or some other)
> for bpf prog needs. What bpf prog thinks of stack will be in that memory
> while the call chain will remain correct.
>  From bpf prog pov the stack is where bpf_reg_r10 points to.
> It doesn't have to be in the kernel stack. Shadow memory will work.

Thanks for the suggestions. Makes sense. This will resolve the backtrace
issue. Let me experiment with this approach.
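
For concreteness, below is a rough sketch of the kind of per-cpu shadow
region I will prototype. All names, sizes and the nesting bound are just
illustrative placeholders (not from any existing patch); the point is only
that call/ret and callee-saved register save/restore stay on the normal
kernel stack while R10 is redirected into this region:

#include <linux/types.h>
#include <linux/percpu.h>

/*
 * Sketch only -- names, sizes and the nesting bound below are
 * illustrative placeholders, not what an actual patch would use.
 * Idea: each CPU owns a region big enough for the deepest allowed
 * nesting of bpf programs.  The JIT-emitted prologue points R10 at
 * the next free frame in this region instead of at the kernel thread
 * stack; call/ret and callee-saved registers still use the normal
 * kernel stack, so backtraces keep working.
 */
#define BPF_SHADOW_FRAME_SIZE	512	/* current per-prog stack limit */
#define BPF_SHADOW_MAX_NEST	8	/* hypothetical nesting bound */

struct bpf_shadow_stack {
	u32 used;			/* number of frames in use */
	u8  mem[BPF_SHADOW_MAX_NEST * BPF_SHADOW_FRAME_SIZE];
};

static DEFINE_PER_CPU(struct bpf_shadow_stack, bpf_shadow_stack);

/*
 * Would be called from the JIT prologue (with migration disabled, as
 * for non-sleepable progs): reserve one frame and return the address
 * R10 should point at.  Frame contents are addressed at negative
 * offsets from R10, as today.
 */
static void *bpf_shadow_frame_push(void)
{
	struct bpf_shadow_stack *ss = this_cpu_ptr(&bpf_shadow_stack);

	if (ss->used >= BPF_SHADOW_MAX_NEST)
		return NULL;		/* too deep: fall back or reject */
	ss->used++;
	return ss->mem + ss->used * BPF_SHADOW_FRAME_SIZE;
}

/* Matching call from the JIT epilogue: release the frame. */
static void bpf_shadow_frame_pop(void)
{
	this_cpu_ptr(&bpf_shadow_stack)->used--;
}

Assuming the prog runs with migration disabled (as non-sleepable progs do),
a per-cpu region indexed by nesting depth should be enough; the push would
be emitted in the JIT prologue and the pop in the epilogue. Details to be
worked out during experiments.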

>
> Let's also call it something other than "segmented stack" to avoid
> confusion.

Indeed, "segmented stack" terminoloty is not appropriate any more since 
the original stack is backtracable. Will think of a different one.


Thread overview: 5+ messages
2024-02-14 19:53 [LSF/MM/BPF TOPIC] Segmented Stacks for BPF Programs Yonghong Song
2024-02-15  2:20 ` Alexei Starovoitov
2024-02-15  3:07   ` Yonghong Song [this message]
2024-02-16  5:03 ` Daniel Xu
2024-02-19 18:56   ` Yonghong Song
