From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <kernel-team@fb.com>,
Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack
Date: Thu, 3 Oct 2024 11:53:13 -0700
Message-ID: <4dbee577-af8f-4b27-9099-d56956c8e772@linux.dev>
In-Reply-To: <CAADnVQ+UByKkpVSg4tC-hoV7DstEYE11WxJ4nbGj27emZ2PFmA@mail.gmail.com>
On 10/3/24 10:35 AM, Alexei Starovoitov wrote:
> On Thu, Oct 3, 2024 at 6:40 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>> On Thu, 3 Oct 2024 at 08:17, Yonghong Song <yonghong.song@linux.dev> wrote:
>>>
>>> On 10/1/24 6:26 PM, Alexei Starovoitov wrote:
>>>> On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>>> Makes sense, though will we have cases where hierarchical scheduling
>>>>> attaches the same prog at different points of the hierarchy?
>>>> I'm not sure anyone was asking for such a use case.
>>>>
>>>>> Then the
>>>>> limit of 4 may not be enough (e.g. say with cgroup nested levels > 4).
>>>> Well, 4 was the number from TJ.
>>>>
>>>> Anyway the proposed pseudo code:
>>>>
>>>> __bpf_prog_enter_recur_limited()
>>>> {
>>>>    cnt = this_cpu_inc_return(*(prog->active));
>>>>    if (cnt > 4) {
>>>>       inc_miss
>>>>       return 0;
>>>>    }
>>>>    // pass cnt into bpf prog somehow, like %rdx ?
>>>>    // or re-read prog->active from prog
>>>> }
>>>>
>>>>
>>>> then in the prologue emit:
>>>>
>>>> push rbp
>>>> mov rbp, rsp
>>>> if %rdx == 1
>>>>    // main prog is called for the first time
>>>>    mov rsp, pcpu_priv_stack_top
>>>> else
>>>>    // 2+nd time main prog is called or 1+ time subprog
>>>>    sub rsp, stack_size
>>>>    if rsp < pcpu_priv_stack_bottom
>>>>      goto exit // stack is too small, exit
>>>> fi
>>> I have tried to implement the above approach (not handling
>>> recursion yet). It works
>>> okay with nested bpf subprogs like
>>>    main prog // set rsp = pcpu_priv_stack_top
>>>      subprog1 // some stack
>>>        subprog2 // some stack
>>>
>>> The pcpu_priv_stack is allocated like
>>> priv_stack_ptr = __alloc_percpu_gfp(1024 * 16, 8, GFP_KERNEL);
>>>
>>> But whenever the prog calls an external function,
>>> e.g. a helper in this case, I get a double fault.
>>> An example could be
>>>    main prog // set rsp = pcpu_priv_stack_top
>>>      subprog1 // some stack
>>>        subprog2 // some stack
>>>          call bpf_seq_printf
>>> (I modified bpf_iter_ipv6_route.c bpf prog for the above
>>> purpose.)
>>> I added some printk statements at the beginning of bpf_seq_printf, but
>>> nothing was printed and of course the trap still happens.
>>>
>>> I tried another example without a subprog, where the main prog calls
>>> a helper, and the same double fault (log below) happens too.
>>>
>>> The error log looks like
>>>
>>> [ 54.024955] traps: PANIC: double fault, error_code: 0x0
>>> [ 54.024969] Oops: double fault: 0000 [#1] PREEMPT SMP KASAN PTI
>>> [ 54.024977] CPU: 3 UID: 0 PID: 1946 Comm: test_progs Tainted: G OE 6.11.0-10577-gf25c172fd840-dirty #968
>>> [ 54.024982] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>>> [ 54.024983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>> [ 54.024986] RIP: 0010:error_entry+0x1e/0x140
>>> [ 54.024996] Code: ff ff 90 90 90 90 90 90 90 90 90 90 56 48 8b 74 24 08 48 89 7c 24 08 52 51 50 41 50 41 51 41 52 41 53 53 55 41 54 41 55 41 56 <41> 57 56 31 f6 31 d1
>>> [ 54.024999] RSP: 0018:ffffe8ffff580000 EFLAGS: 00010806
>>> [ 54.025002] RAX: f3f3f300f1f1f1f1 RBX: fffff91fffeb0044 RCX: ffffffff84201701
>>> [ 54.025005] RDX: fffff91fffeb0044 RSI: ffffffff8420128d RDI: ffffe8ffff580178
>>> [ 54.025007] RBP: ffffe8ffff580140 R08: 0000000000000000 R09: 0000000000000000
>>> [ 54.025009] R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
>>> [ 54.025010] R13: 1ffffd1fffeb0014 R14: 0000000000000003 R15: ffffe8ffff580178
>>> [ 54.025012] FS: 00007fd076525d00(0000) GS:ffff8881f7180000(0000) knlGS:0000000000000000
>>> [ 54.025015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 54.025017] CR2: ffffe8ffff57fff8 CR3: 000000010cd80002 CR4: 0000000000370ef0
>>> [ 54.025021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 54.025022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ 54.025024] Call Trace:
>>> [ 54.025026] <#DF>
>>> [ 54.025028] ? __die_body+0xaf/0xc0
>>> [ 54.025032] ? die+0x2f/0x50
>>> [ 54.025036] ? exc_double_fault+0x73/0x80
>>> [ 54.025040] ? asm_exc_double_fault+0x23/0x30
>>> [ 54.025044] ? common_interrupt_return+0xb1/0xcc
>>> [ 54.025048] ? asm_exc_page_fault+0xd/0x30
>>> [ 54.025051] ? error_entry+0x1e/0x140
>>> [ 54.025055] </#DF>
>>> [ 54.025056] Modules linked in: bpf_testmod(OE)
>>> [ 54.025061] ---[ end trace 0000000000000000 ]---
>>>
>>> Maybe somebody could give a hint why I got a double fault
>>> when calling external functions (outside of bpf programs)
>>> with allocated stack?
>>>
>> I will help in debugging. Can you share the patch you applied locally
>> so I can reproduce?
> Looks like the idea needs more thought.
>
> in_task_stack() won't recognize the private stack,
> so it will look like stack overflow and double fault.
Thanks. Good point. For a particular helper, if the helper
does nothing, it works fine. As soon as I add a printk,
I get a double fault. Maybe in some cases kernel functions
also check in_task_stack().
>
> do you have CONFIG_VMAP_STACK ?
No. But I can try.
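
As a rough illustration of the in_task_stack() point above, here is a
user-space C sketch of the two range checks discussed in this thread:
a test of whether a stack pointer lies inside a given stack, and the
prologue test that rsp does not drop below pcpu_priv_stack_bottom.
Everything named model_* or MODEL_* is hypothetical and exists only
for this sketch. It is not kernel code, the addresses used with it are
arbitrary, and MODEL_THREAD_SIZE is an assumed value rather than the
real THREAD_SIZE of any configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch only; not kernel definitions. */
#define MODEL_THREAD_SIZE     (16 * 1024UL) /* assumed task stack size */
#define MODEL_PRIV_STACK_SIZE (1024 * 16UL) /* the 1024 * 16 allocation quoted above */

struct model_stack {
	uintptr_t begin; /* lowest address (stack bottom) */
	uintptr_t end;   /* one past the highest address (stack top) */
};

/*
 * Range test in the spirit of in_task_stack(): the stack pointer is
 * accepted only if it lies inside [begin, end) of the given stack.
 */
static bool model_on_stack(uintptr_t sp, const struct model_stack *s)
{
	return sp >= s->begin && sp < s->end;
}

/*
 * Prologue check from the quoted pseudo code: after "sub rsp, stack_size"
 * the new rsp must not be below the private stack bottom. Written as a
 * subtraction from sp so that it cannot wrap around.
 */
static bool model_frame_fits(uintptr_t sp, uintptr_t stack_size,
			     const struct model_stack *priv)
{
	return sp >= priv->begin && sp - priv->begin >= stack_size;
}
```

With a task stack and a private stack placed at two disjoint ranges,
model_on_stack() accepts a pointer near the top of the task stack but
rejects a pointer near the top of the private stack when tested against
the task stack. That is the situation described above: code that only
knows about the task stack sees rsp as out of range.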