On 10/4/24 12:52 PM, Alexei Starovoitov wrote:
> On Fri, Oct 4, 2024 at 12:28 PM Yonghong Song wrote:
>>
>> On 10/3/24 10:22 PM, Yonghong Song wrote:
>>> On 10/3/24 3:32 PM, Alexei Starovoitov wrote:
>>>> On Thu, Oct 3, 2024 at 1:44 PM Yonghong Song
>>>> wrote:
>>>>>> Looks like the idea needs more thought.
>>>>>>
>>>>>> in_task_stack() won't recognize the private stack,
>>>>>> so it will look like stack overflow and double fault.
>>>>>>
>>>>>> do you have CONFIG_VMAP_STACK ?
>>>>> Yes, my above test runs fine with CONFIG_VMAP_STACK. Let me guard
>>>>> private stack support with
>>>>> CONFIG_VMAP_STACK for now. Not sure whether distributions enable
>>>>> CONFIG_VMAP_STACK or not.
>>>> Good! but I'm surprised it makes a difference.
>>> That was only for the test case I tried. Now I tried the whole bpf selftests
>>> with CONFIG_VMAP_STACK on. There are still some failures. Some of them
>>> are due to stack protector. I disabled stack protector and then those stack
>>> protector errors went away. But some other errors show up like below:
>>>
>>> [ 27.186581] kernel tried to execute NX-protected page - exploit
>>> attempt? (uid: 0)
>>> [ 27.187480] BUG: unable to handle page fault for address:
>>> ffff888109572800
>>> [ 27.188299] #PF: supervisor instruction fetch in kernel mode
>>> [ 27.189085] #PF: error_code(0x0011) - permissions violation
>>>
>>> or
>>>
>>> [ 27.736844] BUG: unable to handle page fault for address:
>>> 0000000080000000
>>> [ 27.737759] #PF: supervisor instruction fetch in kernel mode
>>> [ 27.738631] #PF: error_code(0x0010) - not-present page
>>> [ 27.739455] PGD 0 P4D 0
>>> [ 27.739818] Oops: Oops: 0010 [#1] PREEMPT SMP PTI
>>>
>>> ...
>>>
>>> Further investigation is needed.
>>
>> I found one failure case (with stackprotector disabled):
>>
>> [ 20.032611] traps: PANIC: double fault, error_code: 0x0
>> [ 20.032615] Oops: double fault: 0000 [#1] PREEMPT SMP PTI
>> [ 20.032619] CPU: 0 UID: 0 PID: 1959 Comm: test_progs Tainted: G OE 6.11.0-10576-g17baa0096769-dirty #1006
>> [ 20.032623] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> [ 20.032624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>> [ 20.032626] RIP: 0010:error_entry+0x17/0x140
>> [ 20.032633] Code: ff 0f 01 f8 e9 56 fe ff ff 90 90 90 90 90 90 90 90 90 90 56 48 8b 74 24 08 48 89 7c 24 08 52 51 50 41 50 41 51 41 52 49
>> [ 20.032635] RSP: 0018:ffffe8ffff400000 EFLAGS: 00010093
>> [ 20.032637] RAX: ffffe8ffff4000a8 RBX: ffffe8ffff4000a8 RCX: ffffffff82201737
>> [ 20.032639] RDX: 0000000000000000 RSI: ffffffff8220128d RDI: ffffe8ffff4000a8
>> [ 20.032640] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> [ 20.032641] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> [ 20.032642] R13: 0000000000000000 R14: 000000000002ed80 R15: 0000000000000000
>> [ 20.032643] FS: 00007f8a3a2006c0(0000) GS:ffff888237c00000(0000) knlGS:ffff888237c00000
>> [ 20.032645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 20.032646] CR2: ffffe8ffff3ffff8 CR3: 0000000103580002 CR4: 0000000000370ef0
>> [ 20.032649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 20.032650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 20.032651] Call Trace:
>> [ 20.032660] <#DF>
>> [ 20.032664] ? __die_body+0xaf/0xc0
>> [ 20.032667] ? die+0x2f/0x50
>> [ 20.032670] ? exc_double_fault+0xbf/0xd0
>> [ 20.032674] ? asm_exc_double_fault+0x23/0x30
>> [ 20.032678] ? restore_regs_and_return_to_kernel+0x1b/0x1b
>> [ 20.032681] ? asm_exc_page_fault+0xd/0x30
>> [ 20.032684] ? error_entry+0x17/0x140
>> [ 20.032687]
>>
>> The private stack for cpu 0:
>> priv_stack_ptr cpu 0 = [ffffe8ffff434000, ffffe8ffff438000] (total 16KB)
>> That is, the top of the stack is ffffe8ffff438000 and the bottom of the stack is ffffe8ffff434000.
>>
>> During bpf execution, a softirq may happen. At that point,
>> the stack pointer becomes:
>> RSP: 0018:ffffe8ffff400000 (see above)
>> and there is a read/write (most likely a write) to address
>> CR2: ffffe8ffff3ffff8
>> and this may cause a fault.
>> After this fault there are some further accesses and, probably because
>> of the invalid stack, a double fault happens.
>>
>> So the question is: why is RSP reset to ffffe8ffff400000?
> 0x38000 bytes consumed by stack or rounded down?
> That's unlikely.
>
>> I have not figured out which code changed this. Maybe somebody can help?
> As Kumar said earlier pls share the patch. Link to github? or whichever.
>
> Double check that any kind of tail-call logic is not mixed with priv stack.

Here is the reproducer, with two attached files:
  priv_stack.config: the config file used to build the kernel
  0001-bpf-implement-private-stack.patch: the patch to apply on top of bpf-next

The top bpf-next commit in my test:

commit 9502a7de5a61bec3bda841a830560c5d6d40ecac (origin/master, origin/HEAD, master)
Author: Mykyta Yatsenko
Date:   Tue Oct 1 00:15:22 2024 +0100

    selftests/bpf: Emit top frequent code lines in veristat

I am using clang 18 to build the kernel and selftests. The build command lines:

  make LLVM=1 -j
  make -C tools/testing/selftests/bpf LLVM=1 -j

In the qemu VM, in the tools/testing/selftests/bpf directory, run the following script:

$ cat run.sh
cat /proc/sys/net/core/bpf_jit_limit
echo 796917760 > /proc/sys/net/core/bpf_jit_limit
# ./test_progs -n 339/4
./test_progs -t task_local_storage/nodeadlock

With private stack on by default, booting fails in my environment, so private stack is off by default. In the script above, "echo 796917760 > /proc/sys/net/core/bpf_jit_limit" is what turns the private stack on.
The following is the run result:

[root@arch-fb-vm1 bpf]# ./run.sh
796917760
[ 55.570032] bpf_testmod: loading out-of-tree module taints kernel.
[ 55.570841] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
[ 55.822604] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 55.823588] BUG: unable to handle page fault for address: ffff888107094a40
[ 55.824444] #PF: supervisor instruction fetch in kernel mode
[ 55.825120] #PF: error_code(0x0011) - permissions violation
[ 55.825800] PGD 4801067 P4D 4801067 PUD 100c99063 PMD 80000001070000e3
[ 55.826578] Oops: Oops: 0011 [#1] PREEMPT SMP PTI
[ 55.827162] CPU: 0 UID: 0 PID: 1958 Comm: test_progs Tainted: G OE 6.11.0-10576-g8eba407b1d56 #1008
[ 55.828387] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 55.829027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 55.830353] RIP: 0010:0xffff888107094a40
[ 55.830836] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 55.833033] RSP: 0018:ffffe8ffff4376d0 EFLAGS: 00010096
[ 55.833649] RAX: ffff8881070918c0 RBX: ffff888103928540 RCX: 0000000000000000
[ 55.834517] RDX: 0000000000000000 RSI: ffffffff82a35174 RDI: 0000000000000000
[ 55.835369] RBP: ffff888107084a40 R08: 0000000000000014 R09: 0000000000000002
[ 55.836213] R10: ffff888107094a00 R11: 0000000002bed400 R12: ffff888237c2fc00
[ 55.837090] R13: 0000000000000000 R14: ffffffff810e35e6 R15: ffffe8ffff4376d0
[ 55.837968] FS: 00007f255b2006c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
[ 55.838966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.839731] CR2: ffff888107094a40 CR3: 0000000109766003 CR4: 0000000000370ef0
[ 55.840763] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 55.841764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 55.842764] Call Trace:
[ 55.843139] Modules linked in: bpf_testmod(OE)
[ 55.843796] CR2: ffff888107094a40
[ 55.844280] ---[ end trace 0000000000000000 ]---
[ 55.844859] RIP: 0010:0xffff888107094a40
[ 55.845358] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 55.847706] RSP: 0018:ffffe8ffff4376d0 EFLAGS: 00010096
[ 55.848415] RAX: ffff8881070918c0 RBX: ffff888103928540 RCX: 0000000000000000
[ 55.849345] RDX: 0000000000000000 RSI: ffffffff82a35174 RDI: 0000000000000000
[ 55.850245] RBP: ffff888107084a40 R08: 0000000000000014 R09: 0000000000000002
[ 55.851228] R10: ffff888107094a00 R11: 0000000002bed400 R12: ffff888237c2fc00
[ 55.852196] R13: 0000000000000000 R14: ffffffff810e35e6 R15: ffffe8ffff4376d0
[ 55.853147] FS: 00007f255b2006c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
[ 55.854220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.854935] CR2: ffff888107094a40 CR3: 0000000109766003 CR4: 0000000000370ef0
[ 55.855825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 55.856705] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 55.857613] Kernel panic - not syncing: Fatal exception
[ 55.858599] Kernel Offset: disabled
[ 55.859110] ---[ end Kernel panic - not syncing: Fatal exception ]---

There are a few other test failures as well, but I think resolving this issue might help with the other failures too.