From: Yonghong Song <yonghong.song@linux.dev>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <kernel-team@fb.com>,
Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack
Date: Thu, 3 Oct 2024 13:54:18 -0700
Message-ID: <89c98687-2087-46eb-8341-6ae65d70cb9c@linux.dev>
In-Reply-To: <CAP01T75CB=dEzXaHjJK6GCUrZUEqyzw+dxqHZuZLjCE-UyVH4w@mail.gmail.com>

On 10/3/24 1:47 PM, Kumar Kartikeya Dwivedi wrote:
> On Thu, 3 Oct 2024 at 22:44, Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 10/3/24 10:35 AM, Alexei Starovoitov wrote:
>>> On Thu, Oct 3, 2024 at 6:40 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>> On Thu, 3 Oct 2024 at 08:17, Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>> On 10/1/24 6:26 PM, Alexei Starovoitov wrote:
>>>>>> On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>>>>> Makes sense, though will we have cases where hierarchical scheduling
>>>>>>> attaches the same prog at different points of the hierarchy?
>>>>>> I'm not sure anyone was asking for such a use case.
>>>>>>
>>>>>>> Then the
>>>>>>> limit of 4 may not be enough (e.g. say with cgroup nested levels > 4).
>>>>>> Well, 4 was the number from TJ.
>>>>>>
>>>>>> Anyway the proposed pseudo code:
>>>>>>
>>>>>> __bpf_prog_enter_recur_limited()
>>>>>> {
>>>>>>     cnt = this_cpu_inc_return(*(prog->active));
>>>>>>     if (cnt > 4) {
>>>>>>         inc_miss
>>>>>>         return 0;
>>>>>>     }
>>>>>>     // pass cnt into bpf prog somehow, like %rdx ?
>>>>>>     // or re-read prog->active from prog
>>>>>> }
>>>>>>
>>>>>>
>>>>>> then in the prologue emit:
>>>>>>
>>>>>> push rbp
>>>>>> mov rbp, rsp
>>>>>> if %rdx == 1
>>>>>>     // main prog is called for the first time
>>>>>>     mov rsp, pcpu_priv_stack_top
>>>>>> else
>>>>>>     // main prog is called for the 2nd+ time, or a subprog is called
>>>>>>     sub rsp, stack_size
>>>>>>     if rsp < pcpu_priv_stack_bottom
>>>>>>         goto exit // stack is too small, exit
>>>>>> fi
>>>>> I have tried to implement the above approach (not handling
>>>>> recursion yet). It works okay with nested bpf subprogs like
>>>>>   main prog      // set rsp = pcpu_priv_stack_top
>>>>>     subprog1     // some stack
>>>>>       subprog2   // some stack
>>>>>
>>>>> The pcpu_priv_stack is allocated like
>>>>> priv_stack_ptr = __alloc_percpu_gfp(1024 * 16, 8, GFP_KERNEL);
>>>>>
>>>>> But whenever the prog calls an external function,
>>>>> e.g. a helper in this case, I get a double fault.
>>>>> An example could be
>>>>>   main prog      // set rsp = pcpu_priv_stack_top
>>>>>     subprog1     // some stack
>>>>>       subprog2   // some stack
>>>>>         call bpf_seq_printf
>>>>> (I modified the bpf_iter_ipv6_route.c bpf prog for the above
>>>>> purpose.)
>>>>> I added some printk statements at the beginning of bpf_seq_printf, but
>>>>> nothing was printed, and of course the trap still happens.
>>>>>
>>>>> I tried another example without subprogs, where the main prog calls
>>>>> a helper, and the same double fault (log below) happens there too.
>>>>>
>>>>> The error log looks like
>>>>>
>>>>> [ 54.024955] traps: PANIC: double fault, error_code: 0x0
>>>>> [ 54.024969] Oops: double fault: 0000 [#1] PREEMPT SMP KASAN PTI
>>>>> [ 54.024977] CPU: 3 UID: 0 PID: 1946 Comm: test_progs Tainted: G OE 6.11.0-10577-gf25c172fd840-dirty #968
>>>>> [ 54.024982] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>>>>> [ 54.024983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>>>> [ 54.024986] RIP: 0010:error_entry+0x1e/0x140
>>>>> [ 54.024996] Code: ff ff 90 90 90 90 90 90 90 90 90 90 56 48 8b 74 24 08 48 89 7c 24 08 52 51 50 41 50 41 51 41 52 41 53 53 55 41 54 41 55 41 56 <41> 57 56 31 f6 31 d1
>>>>> [ 54.024999] RSP: 0018:ffffe8ffff580000 EFLAGS: 00010806
>>>>> [ 54.025002] RAX: f3f3f300f1f1f1f1 RBX: fffff91fffeb0044 RCX: ffffffff84201701
>>>>> [ 54.025005] RDX: fffff91fffeb0044 RSI: ffffffff8420128d RDI: ffffe8ffff580178
>>>>> [ 54.025007] RBP: ffffe8ffff580140 R08: 0000000000000000 R09: 0000000000000000
>>>>> [ 54.025009] R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
>>>>> [ 54.025010] R13: 1ffffd1fffeb0014 R14: 0000000000000003 R15: ffffe8ffff580178
>>>>> [ 54.025012] FS: 00007fd076525d00(0000) GS:ffff8881f7180000(0000) knlGS:0000000000000000
>>>>> [ 54.025015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 54.025017] CR2: ffffe8ffff57fff8 CR3: 000000010cd80002 CR4: 0000000000370ef0
>>>>> [ 54.025021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [ 54.025022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [ 54.025024] Call Trace:
>>>>> [ 54.025026] <#DF>
>>>>> [ 54.025028] ? __die_body+0xaf/0xc0
>>>>> [ 54.025032] ? die+0x2f/0x50
>>>>> [ 54.025036] ? exc_double_fault+0x73/0x80
>>>>> [ 54.025040] ? asm_exc_double_fault+0x23/0x30
>>>>> [ 54.025044] ? common_interrupt_return+0xb1/0xcc
>>>>> [ 54.025048] ? asm_exc_page_fault+0xd/0x30
>>>>> [ 54.025051] ? error_entry+0x1e/0x140
>>>>> [ 54.025055] </#DF>
>>>>> [ 54.025056] Modules linked in: bpf_testmod(OE)
>>>>> [ 54.025061] ---[ end trace 0000000000000000 ]---
>>>>>
>>>>> Maybe somebody could give a hint why I get a double fault
>>>>> when calling external functions (outside of bpf programs)
>>>>> with the allocated stack?
>>>>>
>>>> I will help in debugging. Can you share the patch you applied locally
>>>> so I can reproduce?
>>> Looks like the idea needs more thought.
>>>
>>> in_task_stack() won't recognize the private stack,
>>> so it will look like a stack overflow and end in a double fault.
>>>
>>> Do you have CONFIG_VMAP_STACK?
>> Yes, my above test runs fine with CONFIG_VMAP_STACK. Let me guard
>> private stack support with CONFIG_VMAP_STACK for now. Not sure whether
>> distributions enable CONFIG_VMAP_STACK or not.
>>
> I think it is the default on most distributions (Debian, Ubuntu, Fedora, etc.).
Thanks for the confirmation! Great that CONFIG_VMAP_STACK is on by default
for most distros.
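
For reference, the pseudo code quoted above can be modelled in plain
userspace C roughly as below. This is only a sketch under stated
assumptions, not the kernel implementation: struct priv_stack_model,
model_enter(), model_exit() and model_prologue() are hypothetical names
that do not exist in the kernel, the per-cpu counter is replaced by a
plain int, and decrementing the counter on exit is an assumption since
the pseudo code does not show the exit path.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_NEST 4	/* nesting limit quoted in the thread */

/* Hypothetical model of one CPU's view of a prog's private stack. */
struct priv_stack_model {
	int active;		/* stands in for the per-cpu prog->active */
	unsigned long misses;	/* stands in for the "inc_miss" counter */
	uintptr_t bottom;	/* lowest usable address */
	uintptr_t top;		/* initial stack pointer (stack grows down) */
};

/*
 * Models __bpf_prog_enter_recur_limited(): returns the nesting count,
 * or 0 when the limit is exceeded and the prog must be skipped.
 */
static int model_enter(struct priv_stack_model *ps)
{
	int cnt = ++ps->active;

	if (cnt > MAX_NEST) {
		ps->misses++;
		return 0;
	}
	return cnt;
}

/* Assumed exit path: always undo the increment done by model_enter(). */
static void model_exit(struct priv_stack_model *ps)
{
	ps->active--;
}

/*
 * Models the emitted prologue. Returns the new stack pointer, or 0 for
 * the "goto exit" case where the private stack is too small.
 */
static uintptr_t model_prologue(const struct priv_stack_model *ps, int cnt,
				uintptr_t sp, size_t stack_size)
{
	/* First entry of the main prog: switch to the private stack top. */
	if (cnt == 1)
		return ps->top;

	/* Nested entry or subprog: carve a frame out of the private stack. */
	if (sp < ps->bottom || sp - ps->bottom < stack_size)
		return 0;
	return sp - stack_size;
}
```

With a 16KB region, the first entry gets cnt == 1 and the prologue model
returns top; a nested call that needs 512 bytes gets top - 512; a fifth
nested entry returns 0 and bumps the miss counter.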
Thread overview: 54+ messages
2024-09-26 23:45 [PATCH bpf-next v3 0/5] bpf: Support private stack for bpf progs Yonghong Song
2024-09-26 23:45 ` [PATCH bpf-next v3 1/5] bpf: Allow each subprog having stack size of 512 bytes Yonghong Song
2024-09-26 23:45 ` [PATCH bpf-next v3 2/5] bpf: Collect stack depth information Yonghong Song
2024-09-30 14:42 ` Alexei Starovoitov
2024-09-30 16:23 ` Yonghong Song
2024-09-26 23:45 ` [PATCH bpf-next v3 3/5] bpf: Mark each subprog with proper pstack states Yonghong Song
2024-09-30 14:49 ` Alexei Starovoitov
2024-09-30 16:26 ` Yonghong Song
2024-09-26 23:45 ` [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack Yonghong Song
2024-09-27 4:58 ` Leon Hwang
2024-09-27 15:24 ` Yonghong Song
2024-09-29 8:31 ` kernel test robot
2024-09-30 16:29 ` Yonghong Song
2024-09-29 13:02 ` kernel test robot
2024-09-30 16:31 ` Yonghong Song
2024-09-29 13:34 ` kernel test robot
2024-09-30 15:03 ` Alexei Starovoitov
2024-09-30 16:33 ` Yonghong Song
2024-10-01 4:31 ` Kumar Kartikeya Dwivedi
2024-10-01 4:37 ` Kumar Kartikeya Dwivedi
2024-10-01 18:49 ` Alexei Starovoitov
2024-10-01 19:53 ` yet another approach Was: " Alexei Starovoitov
2024-10-01 20:50 ` Kumar Kartikeya Dwivedi
2024-10-01 21:28 ` Alexei Starovoitov
2024-10-02 0:22 ` Kumar Kartikeya Dwivedi
2024-10-02 1:26 ` Alexei Starovoitov
2024-10-02 2:16 ` Kumar Kartikeya Dwivedi
2024-10-02 6:28 ` Yonghong Song
2024-10-02 6:48 ` Yonghong Song
2024-10-03 6:17 ` Yonghong Song
2024-10-03 13:39 ` Kumar Kartikeya Dwivedi
2024-10-03 17:35 ` Alexei Starovoitov
2024-10-03 18:53 ` Yonghong Song
2024-10-03 20:44 ` Yonghong Song
2024-10-03 20:47 ` Kumar Kartikeya Dwivedi
2024-10-03 20:54 ` Yonghong Song [this message]
2024-10-03 22:32 ` Alexei Starovoitov
2024-10-04 5:22 ` Yonghong Song
2024-10-04 19:27 ` Yonghong Song
2024-10-04 19:52 ` Alexei Starovoitov
2024-10-05 2:03 ` Yonghong Song
2024-10-08 22:10 ` Alexei Starovoitov
2024-10-09 2:06 ` Alexei Starovoitov
2024-10-09 6:31 ` Yonghong Song
2024-10-09 14:56 ` Alexei Starovoitov
2024-10-09 15:56 ` Yonghong Song
2024-10-09 16:36 ` Kumar Kartikeya Dwivedi
2024-10-09 16:38 ` Kumar Kartikeya Dwivedi
2024-10-09 17:37 ` Kumar Kartikeya Dwivedi
2024-10-09 6:12 ` Yonghong Song
2024-09-26 23:45 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add private stack tests Yonghong Song
2024-09-30 13:40 ` Jiri Olsa
2024-09-30 15:05 ` Alexei Starovoitov
2024-09-30 16:35 ` Yonghong Song