From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	David Vernet <void@manifault.com>,
	lsf-pc <lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] Segmented Stacks for BPF Programs
Date: Wed, 14 Feb 2024 19:07:14 -0800
Message-ID: <77fbe6f8-b1df-44a2-a177-d3b9faba5482@linux.dev>
In-Reply-To: <CAADnVQJwN_NvjM2121urjutY3FqtzHxNWyGPWQzyzhCmFmDDzQ@mail.gmail.com>


On 2/14/24 6:20 PM, Alexei Starovoitov wrote:
> On Wed, Feb 14, 2024 at 11:53 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>> For each active kernel thread, the thread stack size is 2*PAGE_SIZE ([1]).
>> Each bpf program has a maximum stack size of 512 bytes to avoid
>> overflowing the thread stack. But nested bpf programs may still pose
>> a challenge for avoiding stack overflow.
>>
>> For example, we already allow nested bpf programs today,
>> especially in tracing, e.g.,
>>     Prog_A
>>       -> Call Helper_B
>>         -> Call Func_C
>>           -> fentry program is called due to Func_C.
>>             -> Call Helper_D and then Func_E
>>               -> fentry due to Func_E
>>                 -> ...
>> If we have too many bpf programs in the chain and each bpf program
>> uses close to 512 bytes of stack, the chain could overflow the kernel
>> thread stack.
>>
>> Another, more practical potential use case comes from a discussion
>> between Alexei and Tejun. For a complex scheduler like sched-ext,
>> we could have a BPF prog hierarchy like below:
>>                          Prog_1 (at system level)
>>             Prog_Numa_1    Prog_Numa_2 ...  Prog_Numa_4
>>          Prog_LLC_1 Prog_LLC_2 ...
>>        Prog_CPU_1 ...
>>
>> Basically, the top bpf program (Prog_1) will call Prog_Numa_* programs
>> through a kfunc to collect information from programs in each numa node.
>> Each Prog_Numa_* program will call Prog_LLC_* programs to collect
>> information from programs in each llc domain in that particular
>> numa node, etc. The same for Prog_LLC_* vs. Prog_CPU_*.
>> Now we have four levels of nested bpf programs.
>>
>> The proposed approach is to allocate the stack from the heap for
>> each bpf program. That way, we do not need to worry about
>> kernel stack overflow. Such an approach is called
>> segmented stacks ([2]) in clang/gcc/go etc.
>>
>> Obviously there are some drawbacks to the segmented stack approach:
>>    - some performance degradation, so this approach may not be for everyone.
>>    - stack backtracking: kernel changes are necessary.
> I suspect segmented stacks the way compilers do them are not suitable
> for bpf progs, since they break backtraces and backtrace is a crucial
> feature that must work even when there are kernel bugs.
> How about we keep call/ret, save/restore of callee saved regs
> in the normal stack, but use a parallel memory (per-cpu or some other)
> for bpf prog needs. What bpf prog thinks of stack will be in that memory
> while the call chain will remain correct.
> From bpf prog pov the stack is where bpf_reg_r10 points to.
> It doesn't have to be in the kernel stack. Shadow memory will work.

Thanks for the suggestions. Makes sense. This will resolve the backtrace
issue. Let me experiment with this approach.
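
To make the idea concrete, here is a rough user-space C sketch of the
"parallel memory" scheme described above. It is not kernel code and not
an actual implementation: struct shadow_stack, shadow_push(),
shadow_pop(), run_nested() and SHADOW_MAX_DEPTH are all hypothetical
names invented for illustration. Only the 512-byte per-program limit and
the role of r10 come from the thread. Each nested "program" is an
ordinary C call, so the native call chain (and hence the backtrace) is
untouched, while the 512 bytes the program treats as its stack come from
a separate area that r10 points into.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_MAX_STACK    512  /* per-program limit stated in the thread */
#define SHADOW_MAX_DEPTH 8    /* hypothetical nesting limit for this sketch */

/* Hypothetical shadow area; the thread suggests per-cpu or similar memory. */
struct shadow_stack {
	uint8_t mem[SHADOW_MAX_DEPTH * BPF_MAX_STACK];
	size_t  depth;        /* number of frames currently in use */
};

/*
 * Reserve one frame and return the value the program would see in r10.
 * The bpf stack grows down from r10, so the returned pointer is the high
 * end of the frame. Returns NULL when the nesting limit is reached.
 */
static uint8_t *shadow_push(struct shadow_stack *s)
{
	if (s->depth >= SHADOW_MAX_DEPTH)
		return NULL;
	s->depth++;
	return s->mem + s->depth * BPF_MAX_STACK;
}

/* Release the most recently reserved frame. */
static void shadow_pop(struct shadow_stack *s)
{
	assert(s->depth > 0);
	s->depth--;
}

/*
 * Stand-in for a chain of nested programs. Each level is a normal C call,
 * but its scratch space lives in the shadow area rather than on the
 * native stack. Returns the number of levels actually entered.
 */
static int run_nested(struct shadow_stack *s, int levels)
{
	uint8_t *r10;
	int reached;

	if (levels == 0)
		return 0;
	r10 = shadow_push(s);
	if (!r10)
		return 0;                   /* refuse to nest instead of overflowing */
	r10[-1] = (uint8_t)levels;          /* program-local store at [r10 - 1] */
	reached = 1 + run_nested(s, levels - 1);
	assert(r10[-1] == (uint8_t)levels); /* deeper frames did not clobber ours */
	shadow_pop(s);
	return reached;
}
```

With a zero-initialized struct shadow_stack, run_nested(&s, 4) enters
all four levels of the sched-ext style hierarchy, and a request deeper
than SHADOW_MAX_DEPTH stops at the limit. For comparison, assuming
PAGE_SIZE is 4096, a 2*PAGE_SIZE thread stack is 8192 bytes, so sixteen
512-byte frames alone would consume it before counting any other kernel
frames.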

>
> Let's also call it something else than "segmented stack" to avoid
> confusion.

Indeed, the "segmented stack" terminology is not appropriate any more since
the original stack remains backtraceable. Will think of a different one.

