From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Tejun Heo <tj@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
David Vernet <void@manifault.com>,
lsf-pc <lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] Segmented Stacks for BPF Programs
Date: Wed, 14 Feb 2024 19:07:14 -0800 [thread overview]
Message-ID: <77fbe6f8-b1df-44a2-a177-d3b9faba5482@linux.dev> (raw)
In-Reply-To: <CAADnVQJwN_NvjM2121urjutY3FqtzHxNWyGPWQzyzhCmFmDDzQ@mail.gmail.com>
On 2/14/24 6:20 PM, Alexei Starovoitov wrote:
> On Wed, Feb 14, 2024 at 11:53 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>> For each active kernel thread, the thread stack size is 2*PAGE_SIZE ([1]).
>> Each bpf program has a maximum stack size of 512 bytes to avoid
>> overflowing the thread stack. But nested bpf programs may pose
>> a challenge for avoiding stack overflow.
>>
>> For example, currently we already allow nested bpf
>> programs, especially in tracing, e.g.:
>>   Prog_A
>>     -> Call Helper_B
>>       -> Call Func_C
>>         -> fentry program is called due to Func_C
>>           -> Call Helper_D and then Func_E
>>             -> fentry due to Func_E
>>               -> ...
>> If we have too many bpf programs in the chain and each bpf program
>> has a stack size close to 512 bytes, the chain could overflow the
>> kernel thread stack.
>>
>> Another, more practical, potential use case comes from a discussion
>> between Alexei and Tejun. For a complex scheduler like sched-ext,
>> we could have a BPF program hierarchy like below:
>>   Prog_1 (at system level)
>>     Prog_Numa_1  Prog_Numa_2  ...  Prog_Numa_4
>>       Prog_LLC_1  Prog_LLC_2  ...
>>         Prog_CPU_1  ...
>>
>> Basically, the top bpf program (Prog_1) will call Prog_Numa_* programs
>> through a kfunc to collect information from programs in each numa node.
>> Each Prog_Numa_* program will call Prog_LLC_* programs to collect
>> information from programs in each llc domain in that particular
>> numa node, and similarly for Prog_LLC_* vs. Prog_CPU_*.
>> This gives us four levels of nested bpf programs.
>>
>> The proposed approach is to allocate stack from heap for
>> each bpf program. That way, we do not need to worry about
>> kernel stack overflow. Such an approach is called
>> segmented stacks ([2]) in clang/gcc/go etc.
>>
>> Obviously there are some drawbacks to the segmented stack approach:
>> - some performance degradation, so this approach may not be for everyone.
>> - stack backtracking: kernel changes are necessary.
> I suspect segmented stacks the way compilers do them are not suitable
> for bpf progs, since they break backtraces and backtrace is a crucial
> feature that must work even when there are kernel bugs.
> How about we keep call/ret, save/restore of callee saved regs
> in the normal stack, but use a parallel memory (per-cpu or some other)
> for bpf prog needs. What bpf prog thinks of stack will be in that memory
> while the call chain will remain correct.
> From bpf prog pov the stack is where bpf_reg_r10 points to.
> It doesn't have to be in the kernel stack. Shadow memory will work.
Thanks for the suggestions. Makes sense. This will resolve the
backtrace issue. Let me experiment with this approach.
>
> Let's also call it something else than "segmented stack" to avoid
> confusion.
Indeed, "segmented stack" terminology is not appropriate any more since
the original stack remains backtraceable. Will think of a different name.
Thread overview: 5+ messages
2024-02-14 19:53 [LSF/MM/BPF TOPIC] Segmented Stacks for BPF Programs Yonghong Song
2024-02-15 2:20 ` Alexei Starovoitov
2024-02-15 3:07 ` Yonghong Song [this message]
2024-02-16 5:03 ` Daniel Xu
2024-02-19 18:56 ` Yonghong Song