loongarch.lists.linux.dev archive mirror
* Re: kernel lockup on bpf selftests module_attach
  2025-08-09  8:15 kernel lockup on bpf selftests module_attach Vincent Li
@ 2025-08-09  3:03 ` Huacai Chen
  2025-08-09  3:48   ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Huacai Chen @ 2025-08-09  3:03 UTC (permalink / raw)
  To: Vincent Li; +Cc: loongarch, Hengqi Chen, Chenghao Duan, Tiezhu Yang

Hi, Vincent,

On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> Hi Folks,
>
> Hengqi mentioned offline that the loongarch kernel locked up when
> running full bpf selftests, so I went ahead and ran make run_tests to
> perform full bpf selftest, I observed lockup too. It appears the
> lockup happens when running module_attach test which includes testing
> on fentry so this could be related to the trampoline patch series. for
> example, if I just run ./test_progs -t module_attach, the kernel
> lockup immediately.
Is this a regression caused by the latest trampoline patches? In other
words, does vanilla 6.16 have this problem?

Huacai

>
> A side note, if I put the module_attach test in
> tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> the module_attach test is not skipped.
>
> Thanks
>
> Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-09  3:03 ` Huacai Chen
@ 2025-08-09  3:48   ` Vincent Li
  2025-08-09  5:03     ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-09  3:48 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, Hengqi Chen, Chenghao Duan, Tiezhu Yang

On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
>
> Hi, Vincent,
>
> On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> >
> > Hi Folks,
> >
> > Hengqi mentioned offline that the loongarch kernel locked up when
> > running full bpf selftests, so I went ahead and ran make run_tests to
> > perform full bpf selftest, I observed lockup too. It appears the
> > lockup happens when running module_attach test which includes testing
> > on fentry so this could be related to the trampoline patch series. for
> > example, if I just run ./test_progs -t module_attach, the kernel
> > lockup immediately.
> Is this a regression caused by the latest trampoline patches? Or in
> another word, Does vanilla 6.16 has this problem?
>

I suspect this is caused by the latest trampoline patches, because the
module_attach test exercises the fentry feature for kernel module
functions; I believe Chenghao and I only tested the fentry feature for
non-module kernel functions. I can try a kernel without the trampoline
patches and will let you know the result.

> Huacai
>
> >
> > A side note, if I put the module_attach test in
> > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > the module_attach test is not skipped.
> >
> > Thanks
> >
> > Vincent


* Re: kernel lockup on bpf selftests module_attach
  2025-08-09  3:48   ` Vincent Li
@ 2025-08-09  5:03     ` Vincent Li
  2025-08-09  6:02       ` Huacai Chen
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-09  5:03 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, Hengqi Chen, Chenghao Duan, Tiezhu Yang

On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> >
> > Hi, Vincent,
> >
> > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > >
> > > Hi Folks,
> > >
> > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > perform full bpf selftest, I observed lockup too. It appears the
> > > lockup happens when running module_attach test which includes testing
> > > on fentry so this could be related to the trampoline patch series. for
> > > example, if I just run ./test_progs -t module_attach, the kernel
> > > lockup immediately.
> > Is this a regression caused by the latest trampoline patches? Or in
> > another word, Does vanilla 6.16 has this problem?
> >
>
> I suspect this is caused by the latest trampoline patches because the
> module_attach is to test the fentry feature for kernel module
> functions, I believe Changhao and I only tested the fentry feature for
> non-module kernel functions. I can try kernel without the trampoline
> patches and will let you know the result.
>

I reverted the trampoline patches from the loongarch-next branch, and
running ./test_progs -t module_attach now simply errors out with the
fentry feature not supported:

[root@fedora bpf]# ./test_progs -t module_attach
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
#205     module_attach:FAIL

All error logs:
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
#205     module_attach:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

I also tested the loongarch-next branch with the trampoline patch series
using a kernel config that avoids the hard lockup, so I could run dmesg
to check the kernel error log; ./test_progs -t module_attach results in
the kernel log below:

[  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
[  419.728620] CPU 70475748 Unable to handle kernel paging request at
virtual address 0000000800000024, era == 90000000041d5854, ra ==
90000000041d5848
[  419.728629] Oops[#1]:
[  419.728632] CPU 70475748 Unable to handle kernel paging request at
virtual address 0000000000000018, era == 9000000005750268, ra ==
9000000004163938
[  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  441.305380] rcu:     5-...0: (29 ticks this GP)
idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
[  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
[  441.305390] Sending NMI from CPU 4 to CPUs 5:
[  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
[  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
time, OOM is now expected behavior.
[  451.305502] rcu: RCU grace-period kthread stack dump:
[  451.305504] task:rcu_preempt     state:R stack:0     pid:15
tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
[  451.305510] Stack : 9000000100467e80 0000000000000402
0000000000000010 90000001003b0680
[  451.305519]         90000000058e0000 0000000000000000
0000000000000040 9000000006c2dfd0
[  451.305526]         900000000578c9b0 0000000000000001
9000000006b21000 0000000000000005
[  451.305533]         00000001000093a8 00000001000093a8
0000000000000000 0000000000000004
[  451.305540]         90000000058f04e0 0000000000000000
0000000000000002 b793724be1dfb2b8
[  451.305547]         00000001000093a9 b793724be1dfb2b8
000000000000003f 9000000006c2dfd0
[  451.305554]         9000000006c30c18 0000000000000005
9000000006b0e000 9000000006b21000
[  451.305560]         9000000100453c98 90000001003aff80
9000000006c31140 900000000578c9b0
[  451.305567]         00000001000093a8 9000000005794d3c
00000000000000b4 0000000000000000
[  451.305574]         90000000024021b8 00000001000093a8
9000000004284f20 000000000a400001
[  451.305581]         ...
[  451.305584] Call Trace:
[  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
[  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
[  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
[  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
[  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
[  451.305614] [<90000000041be704>] kthread+0x144/0x238
[  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
[  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88

[  451.305630] rcu: Stack dump where RCU GP kthread last ran:
[  451.305633] Sending NMI from CPU 4 to CPUs 1:
[  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
[  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
[  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
[  451.306669] Sending NMI from CPU 6 to CPUs 5:
[  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?

So this is related to the trampoline patches for sure, unless I am missing something.

> > Huacai
> >
> > >
> > > A side note, if I put the module_attach test in
> > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > the module_attach test is not skipped.
> > >
> > > Thanks
> > >
> > > Vincent


* Re: kernel lockup on bpf selftests module_attach
  2025-08-09  5:03     ` Vincent Li
@ 2025-08-09  6:02       ` Huacai Chen
  2025-08-09 19:11         ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Huacai Chen @ 2025-08-09  6:02 UTC (permalink / raw)
  To: Vincent Li; +Cc: loongarch, Hengqi Chen, Chenghao Duan, Tiezhu Yang

Hi, Chenghao,

Please take a look.

Huacai

On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > >
> > > Hi, Vincent,
> > >
> > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > >
> > > > Hi Folks,
> > > >
> > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > lockup happens when running module_attach test which includes testing
> > > > on fentry so this could be related to the trampoline patch series. for
> > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > lockup immediately.
> > > Is this a regression caused by the latest trampoline patches? Or in
> > > another word, Does vanilla 6.16 has this problem?
> > >
> >
> > I suspect this is caused by the latest trampoline patches because the
> > module_attach is to test the fentry feature for kernel module
> > functions, I believe Changhao and I only tested the fentry feature for
> > non-module kernel functions. I can try kernel without the trampoline
> > patches and will let you know the result.
> >
>
> I reverted  trampoline patches from loongarch-next branch and run
> ./test_progs -t module_attach simply just errors out with the fentry
> feature not supported
>
> [root@fedora bpf]# ./test_progs -t module_attach
> test_module_attach:PASS:skel_open 0 nsec
> test_module_attach:PASS:set_attach_target 0 nsec
> test_module_attach:PASS:set_attach_target_explicit 0 nsec
> test_module_attach:PASS:skel_load 0 nsec
> libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> #205     module_attach:FAIL
>
> All error logs:
> test_module_attach:PASS:skel_open 0 nsec
> test_module_attach:PASS:set_attach_target 0 nsec
> test_module_attach:PASS:set_attach_target_explicit 0 nsec
> test_module_attach:PASS:skel_load 0 nsec
> libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> #205     module_attach:FAIL
> Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
>
> I also tested loongarch-next branch with the trampoline patch series
> with no lockup kernel config so I can run dmesg to check kernel error
> log,  ./test_progs -t module_attach result in below kernel log:
>
> [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> virtual address 0000000800000024, era == 90000000041d5854, ra ==
> 90000000041d5848
> [  419.728629] Oops[#1]:
> [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> virtual address 0000000000000018, era == 9000000005750268, ra ==
> 9000000004163938
> [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  441.305380] rcu:     5-...0: (29 ticks this GP)
> idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> time, OOM is now expected behavior.
> [  451.305502] rcu: RCU grace-period kthread stack dump:
> [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> [  451.305510] Stack : 9000000100467e80 0000000000000402
> 0000000000000010 90000001003b0680
> [  451.305519]         90000000058e0000 0000000000000000
> 0000000000000040 9000000006c2dfd0
> [  451.305526]         900000000578c9b0 0000000000000001
> 9000000006b21000 0000000000000005
> [  451.305533]         00000001000093a8 00000001000093a8
> 0000000000000000 0000000000000004
> [  451.305540]         90000000058f04e0 0000000000000000
> 0000000000000002 b793724be1dfb2b8
> [  451.305547]         00000001000093a9 b793724be1dfb2b8
> 000000000000003f 9000000006c2dfd0
> [  451.305554]         9000000006c30c18 0000000000000005
> 9000000006b0e000 9000000006b21000
> [  451.305560]         9000000100453c98 90000001003aff80
> 9000000006c31140 900000000578c9b0
> [  451.305567]         00000001000093a8 9000000005794d3c
> 00000000000000b4 0000000000000000
> [  451.305574]         90000000024021b8 00000001000093a8
> 9000000004284f20 000000000a400001
> [  451.305581]         ...
> [  451.305584] Call Trace:
> [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
>
> [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
>
> So related to trampoline patches for sure unless I am missing something.
>
> > > Huacai
> > >
> > > >
> > > > A side note, if I put the module_attach test in
> > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > the module_attach test is not skipped.
> > > >
> > > > Thanks
> > > >
> > > > Vincent


* kernel lockup on bpf selftests module_attach
@ 2025-08-09  8:15 Vincent Li
  2025-08-09  3:03 ` Huacai Chen
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-09  8:15 UTC (permalink / raw)
  To: loongarch; +Cc: Hengqi Chen, Chenghao Duan, Tiezhu Yang, Huacai Chen

Hi Folks,

Hengqi mentioned offline that the loongarch kernel locked up when
running the full bpf selftests, so I went ahead and ran make run_tests
to perform the full bpf selftest, and I observed the lockup too. It
appears the lockup happens when running the module_attach test, which
includes testing fentry, so this could be related to the trampoline
patch series. For example, if I just run ./test_progs -t module_attach,
the kernel locks up immediately.

A side note: if I put the module_attach test in
tools/testing/selftests/bpf/DENYLIST to skip it, the module_attach test
is not skipped.
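On that side note: as far as I can tell, DENYLIST is consumed by the CI
wrapper scripts (vmtest and friends), not by test_progs itself, which
would explain why the entry appears to be ignored when running
test_progs by hand. A sketch, assuming current DENYLIST syntax and a
test_progs build that supports -d:

```shell
# DENYLIST format is one test (or test/subtest) name per line,
# with an optional '#' comment after the name:
echo "module_attach # kernel lockup on loongarch" >> DENYLIST

# When invoking test_progs directly, use its own deny flag instead:
#   ./test_progs -d module_attach
```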

Thanks

Vincent


* Re: kernel lockup on bpf selftests module_attach
  2025-08-09  6:02       ` Huacai Chen
@ 2025-08-09 19:11         ` Vincent Li
  2025-08-10 17:39           ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-09 19:11 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, Hengqi Chen, Chenghao Duan, Tiezhu Yang

On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
>
> Hi, Chenghao,
>
> Please take a look.
>
> Huacai
>
I reverted the tailcall count fix patches and the struct_ops trampoline
patch from the loongarch-next branch, keeping the rest of the trampoline
patches; the module_attach test hit the same issue, so it is definitely
a trampoline patch issue.

> On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > >
> > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > >
> > > > Hi, Vincent,
> > > >
> > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > >
> > > > > Hi Folks,
> > > > >
> > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > lockup happens when running module_attach test which includes testing
> > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > lockup immediately.
> > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > another word, Does vanilla 6.16 has this problem?
> > > >
> > >
> > > I suspect this is caused by the latest trampoline patches because the
> > > module_attach is to test the fentry feature for kernel module
> > > functions, I believe Changhao and I only tested the fentry feature for
> > > non-module kernel functions. I can try kernel without the trampoline
> > > patches and will let you know the result.
> > >
> >
> > I reverted  trampoline patches from loongarch-next branch and run
> > ./test_progs -t module_attach simply just errors out with the fentry
> > feature not supported
> >
> > [root@fedora bpf]# ./test_progs -t module_attach
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > test_module_attach:PASS:skel_load 0 nsec
> > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > #205     module_attach:FAIL
> >
> > All error logs:
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > test_module_attach:PASS:skel_load 0 nsec
> > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > #205     module_attach:FAIL
> > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> >
> > I also tested loongarch-next branch with the trampoline patch series
> > with no lockup kernel config so I can run dmesg to check kernel error
> > log,  ./test_progs -t module_attach result in below kernel log:
> >
> > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > 90000000041d5848
> > [  419.728629] Oops[#1]:
> > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > 9000000004163938
> > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > time, OOM is now expected behavior.
> > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > 0000000000000010 90000001003b0680
> > [  451.305519]         90000000058e0000 0000000000000000
> > 0000000000000040 9000000006c2dfd0
> > [  451.305526]         900000000578c9b0 0000000000000001
> > 9000000006b21000 0000000000000005
> > [  451.305533]         00000001000093a8 00000001000093a8
> > 0000000000000000 0000000000000004
> > [  451.305540]         90000000058f04e0 0000000000000000
> > 0000000000000002 b793724be1dfb2b8
> > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > 000000000000003f 9000000006c2dfd0
> > [  451.305554]         9000000006c30c18 0000000000000005
> > 9000000006b0e000 9000000006b21000
> > [  451.305560]         9000000100453c98 90000001003aff80
> > 9000000006c31140 900000000578c9b0
> > [  451.305567]         00000001000093a8 9000000005794d3c
> > 00000000000000b4 0000000000000000
> > [  451.305574]         90000000024021b8 00000001000093a8
> > 9000000004284f20 000000000a400001
> > [  451.305581]         ...
> > [  451.305584] Call Trace:
> > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> >
> > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> >
> > So related to trampoline patches for sure unless I am missing something.
> >
> > > > Huacai
> > > >
> > > > >
> > > > > A side note, if I put the module_attach test in
> > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > the module_attach test is not skipped.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Vincent


* Re: kernel lockup on bpf selftests module_attach
  2025-08-09 19:11         ` Vincent Li
@ 2025-08-10 17:39           ` Vincent Li
  2025-08-12  8:34             ` Chenghao Duan
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-10 17:39 UTC (permalink / raw)
  To: Huacai Chen; +Cc: loongarch, Hengqi Chen, Chenghao Duan, Tiezhu Yang

Hi Chenghao,

On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> >
> > Hi, Chenghao,
> >
> > Please take a look.
> >
> > Huacai
> >
> I reverted loongson-next branch  tailcall count fix patches, struct
> ops trampoline patch, keep the rest of trampoline patches,
> module_attach test experienced the same issue, so definitely
> trampoline patches issue.
>

I attempted to isolate which test in module_attach triggers the
"Unable to handle kernel paging request..." error; it appears to be
this one in "prog_tests/module_attach.c":

ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");

You can try commenting out the other tests in "prog_tests/module_attach.c"
and rerunning; that might help isolate the issue.


> > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > >
> > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > >
> > > > > Hi, Vincent,
> > > > >
> > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > Hi Folks,
> > > > > >
> > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > lockup happens when running module_attach test which includes testing
> > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > lockup immediately.
> > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > another word, Does vanilla 6.16 has this problem?
> > > > >
> > > >
> > > > I suspect this is caused by the latest trampoline patches because the
> > > > module_attach is to test the fentry feature for kernel module
> > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > non-module kernel functions. I can try kernel without the trampoline
> > > > patches and will let you know the result.
> > > >
> > >
> > > I reverted  trampoline patches from loongarch-next branch and run
> > > ./test_progs -t module_attach simply just errors out with the fentry
> > > feature not supported
> > >
> > > [root@fedora bpf]# ./test_progs -t module_attach
> > > test_module_attach:PASS:skel_open 0 nsec
> > > test_module_attach:PASS:set_attach_target 0 nsec
> > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > test_module_attach:PASS:skel_load 0 nsec
> > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > #205     module_attach:FAIL
> > >
> > > All error logs:
> > > test_module_attach:PASS:skel_open 0 nsec
> > > test_module_attach:PASS:set_attach_target 0 nsec
> > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > test_module_attach:PASS:skel_load 0 nsec
> > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > #205     module_attach:FAIL
> > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > >
> > > I also tested loongarch-next branch with the trampoline patch series
> > > with no lockup kernel config so I can run dmesg to check kernel error
> > > log,  ./test_progs -t module_attach result in below kernel log:
> > >
> > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > 90000000041d5848
> > > [  419.728629] Oops[#1]:
> > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > 9000000004163938
> > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > time, OOM is now expected behavior.
> > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > 0000000000000010 90000001003b0680
> > > [  451.305519]         90000000058e0000 0000000000000000
> > > 0000000000000040 9000000006c2dfd0
> > > [  451.305526]         900000000578c9b0 0000000000000001
> > > 9000000006b21000 0000000000000005
> > > [  451.305533]         00000001000093a8 00000001000093a8
> > > 0000000000000000 0000000000000004
> > > [  451.305540]         90000000058f04e0 0000000000000000
> > > 0000000000000002 b793724be1dfb2b8
> > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > 000000000000003f 9000000006c2dfd0
> > > [  451.305554]         9000000006c30c18 0000000000000005
> > > 9000000006b0e000 9000000006b21000
> > > [  451.305560]         9000000100453c98 90000001003aff80
> > > 9000000006c31140 900000000578c9b0
> > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > 00000000000000b4 0000000000000000
> > > [  451.305574]         90000000024021b8 00000001000093a8
> > > 9000000004284f20 000000000a400001
> > > [  451.305581]         ...
> > > [  451.305584] Call Trace:
> > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > >
> > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > >
> > > So related to trampoline patches for sure unless I am missing something.
> > >
> > > > > Huacai
> > > > >
> > > > > >
> > > > > > A side note, if I put the module_attach test in
> > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > the module_attach test is not skipped.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-10 17:39           ` Vincent Li
@ 2025-08-12  8:34             ` Chenghao Duan
  2025-08-12 13:42               ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Chenghao Duan @ 2025-08-12  8:34 UTC (permalink / raw)
  To: Vincent Li; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> Hi Chenghao,
> 
> On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> >
> > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > >
> > > Hi, Chenghao,
> > >
> > > Please take a look.
> > >
> > > Huacai
> > >
> > I reverted loongson-next branch  tailcall count fix patches, struct
> > ops trampoline patch, keep the rest of trampoline patches,
> > module_attach test experienced the same issue, so definitely
> > trampoline patches issue.
> >
> 
> I attempted to isolate which test in module_attach triggers the
> "Unable to handle kernel paging request..." error, it appears to be
> this one in "prog_tests/module_attach.c"
> 
> ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> 
> you can try to comment out other tests in "prog_tests/module_attach.c"
> and perform the test, it might help isolate the issue.
> 

Hi Vincent,

My test results differ from yours. Could there be other
differences between our setups? I am using the latest code of the
loongarch-next branch.

[root@localhost bpf]# ./test_progs -v -t module_attach
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
WATCHDOG: test case module_attach executes for 10 seconds...
libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
bpf_testmod_test_read() is not modifiable
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
libbpf: failed to load object 'test_module_attach'
libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
test_module_attach:FAIL:skel_load failed to load skeleton
#205     module_attach:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.



Chenghao

> 
> > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > >
> > > > > > Hi, Vincent,
> > > > > >
> > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Folks,
> > > > > > >
> > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > lockup immediately.
> > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > >
> > > > >
> > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > module_attach is to test the fentry feature for kernel module
> > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > patches and will let you know the result.
> > > > >
> > > >
> > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > feature not supported
> > > >
> > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > test_module_attach:PASS:skel_load 0 nsec
> > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > #205     module_attach:FAIL
> > > >
> > > > All error logs:
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > test_module_attach:PASS:skel_load 0 nsec
> > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > #205     module_attach:FAIL
> > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > >
> > > > I also tested loongarch-next branch with the trampoline patch series
> > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > >
> > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > 90000000041d5848
> > > > [  419.728629] Oops[#1]:
> > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > 9000000004163938
> > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > time, OOM is now expected behavior.
> > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > 0000000000000010 90000001003b0680
> > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > 0000000000000040 9000000006c2dfd0
> > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > 9000000006b21000 0000000000000005
> > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > 0000000000000000 0000000000000004
> > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > 0000000000000002 b793724be1dfb2b8
> > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > 000000000000003f 9000000006c2dfd0
> > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > 9000000006b0e000 9000000006b21000
> > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > 9000000006c31140 900000000578c9b0
> > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > 00000000000000b4 0000000000000000
> > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > 9000000004284f20 000000000a400001
> > > > [  451.305581]         ...
> > > > [  451.305584] Call Trace:
> > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > >
> > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > >
> > > > So related to trampoline patches for sure unless I am missing something.
> > > >
> > > > > > Huacai
> > > > > >
> > > > > > >
> > > > > > > A side note, if I put the module_attach test in
> > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > the module_attach test is not skipped.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-12  8:34             ` Chenghao Duan
@ 2025-08-12 13:42               ` Vincent Li
  2025-08-14 12:00                 ` Chenghao Duan
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-12 13:42 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
>
> On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > Hi Chenghao,
> >
> > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > >
> > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > >
> > > > Hi, Chenghao,
> > > >
> > > > Please take a look.
> > > >
> > > > Huacai
> > > >
> > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > ops trampoline patch, keep the rest of trampoline patches,
> > > module_attach test experienced the same issue, so definitely
> > > trampoline patches issue.
> > >
> >
> > I attempted to isolate which test in module_attach triggers the
> > "Unable to handle kernel paging request..." error, it appears to be
> > this one in "prog_tests/module_attach.c"
> >
> > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> >
> > you can try to comment out other tests in "prog_tests/module_attach.c"
> > and perform the test, it might help isolate the issue.
> >
>
> Hi Vincent,
>
> The results I tested are different from yours. Could there be other
> differences between us? I am using the latest code of the loongarch-next
> branch.
>
> [root@localhost bpf]# ./test_progs -v -t module_attach
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_module_attach:PASS:skel_open 0 nsec
> test_module_attach:PASS:set_attach_target 0 nsec
> test_module_attach:PASS:set_attach_target_explicit 0 nsec
> WATCHDOG: test case module_attach executes for 10 seconds...
> libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> bpf_testmod_test_read() is not modifiable
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> -- END PROG LOAD LOG --
> libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> libbpf: failed to load object 'test_module_attach'
> libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> test_module_attach:FAIL:skel_load failed to load skeleton
> #205     module_attach:FAIL
> Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> Successfully unloaded bpf_testmod.ko.
>

I built and ran the most recent loongarch-next kernel too; can you try
my config https://www.bpfire.net/download/loongfire/config.txt? I am
on Fedora. Here are the steps I use to build the kernel, boot it, and
run the test:

1, check branch
[root@fedora linux-loongson]# git branch
* loongarch-next
  master
  no-tailcall
  no-trampoline

2, build kernel and reboot
cp config.txt .config; make clean; make -j6; make modules_install;
make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot

3, after reboot and login, build the bpf selftests, run the
module_attach test, and check the kernel log with dmesg
cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
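To make our runs easier to compare, here is a minimal sketch of a helper
that summarizes a saved test_progs log (the function name and log path are
placeholders, not part of the selftests tree); it only counts the
assertion-level PASS/FAIL lines and echoes the per-test verdict lines:

```shell
# Minimal sketch: summarize a saved test_progs log so two runs can be
# diffed quickly. Capture a log first, e.g.:
#   ./test_progs -t module_attach > /tmp/tp.log 2>&1
summarize_log() {
    # Count assertion-level lines such as
    # "test_module_attach:PASS:skel_open 0 nsec".
    pass=$(grep -c ':PASS:' "$1")
    fail=$(grep -c ':FAIL:' "$1")
    echo "PASS=$pass FAIL=$fail"
    # Echo per-test verdict lines such as "#205     module_attach:FAIL".
    grep -E '^#[0-9]+' "$1"
}
```

This obviously only helps when the kernel survives long enough for the
log to be written out.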


>
>
> Chenghao
>
> >
> > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi, Vincent,
> > > > > > >
> > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Folks,
> > > > > > > >
> > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > lockup immediately.
> > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > >
> > > > > >
> > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > patches and will let you know the result.
> > > > > >
> > > > >
> > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > feature not supported
> > > > >
> > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > #205     module_attach:FAIL
> > > > >
> > > > > All error logs:
> > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > #205     module_attach:FAIL
> > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > >
> > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > >
> > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > 90000000041d5848
> > > > > [  419.728629] Oops[#1]:
> > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > 9000000004163938
> > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > time, OOM is now expected behavior.
> > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > 0000000000000010 90000001003b0680
> > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > 0000000000000040 9000000006c2dfd0
> > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > 9000000006b21000 0000000000000005
> > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > 0000000000000000 0000000000000004
> > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > 0000000000000002 b793724be1dfb2b8
> > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > 000000000000003f 9000000006c2dfd0
> > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > 9000000006b0e000 9000000006b21000
> > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > 9000000006c31140 900000000578c9b0
> > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > 00000000000000b4 0000000000000000
> > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > 9000000004284f20 000000000a400001
> > > > > [  451.305581]         ...
> > > > > [  451.305584] Call Trace:
> > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > >
> > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > >
> > > > > So related to trampoline patches for sure unless I am missing something.
> > > > >
> > > > > > > Huacai
> > > > > > >
> > > > > > > >
> > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > the module_attach test is not skipped.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-12 13:42               ` Vincent Li
@ 2025-08-14 12:00                 ` Chenghao Duan
  2025-08-14 13:42                   ` Vincent Li
  2025-08-21 15:04                   ` Vincent Li
  0 siblings, 2 replies; 18+ messages in thread
From: Chenghao Duan @ 2025-08-14 12:00 UTC (permalink / raw)
  To: Vincent Li; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> >
> > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > Hi Chenghao,
> > >
> > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > >
> > > > > Hi, Chenghao,
> > > > >
> > > > > Please take a look.
> > > > >
> > > > > Huacai
> > > > >
> > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > module_attach test experienced the same issue, so definitely
> > > > trampoline patches issue.
> > > >
> > >
> > > I attempted to isolate which test in module_attach triggers the
> > > "Unable to handle kernel paging request..." error, it appears to be
> > > this one in "prog_tests/module_attach.c"
> > >
> > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > >
> > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > and perform the test, it might help isolate the issue.
> > >
> >
> > Hi Vincent,
> >
> > The results I tested are different from yours. Could there be other
> > differences between us? I am using the latest code of the loongarch-next
> > branch.
> >
> > [root@localhost bpf]# ./test_progs -v -t module_attach
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > WATCHDOG: test case module_attach executes for 10 seconds...
> > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > bpf_testmod_test_read() is not modifiable
> > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > -- END PROG LOAD LOG --
> > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > libbpf: failed to load object 'test_module_attach'
> > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > test_module_attach:FAIL:skel_load failed to load skeleton
> > #205     module_attach:FAIL
> > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > Successfully unloaded bpf_testmod.ko.
> >
> 
> I build and run the most recent loongarch-next kernel too, can you try
> my config https://www.bpfire.net/download/loongfire/config.txt? I am
> on fedora, here are the steps I build, run the kernel, and run the
> test
> 
> 1, check branch
> [root@fedora linux-loongson]# git branch
> * loongarch-next
>   master
>   no-tailcall
>   no-trampoline
> 
> 2, build kernel and reboot
> cp config.txt .config; make clean; make -j6; make modules_install;
> make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> 
> 3, after reboot and login, build bpf selftests, run module_attach
> test, dmesg to check kernel log
> cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> 

Hi Vincent,

I tried the config you provided, but the test results I obtained are as
follows. I also specifically ran modify_return to verify that the patch
takes effect, and the module_attach test returns -EOPNOTSUPP.

[root@localhost bpf]# ./test_progs -v -t modify_return
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
run_test:PASS:skel_load 0 nsec
run_test:PASS:modify_return__attach failed 0 nsec
run_test:PASS:test_run 0 nsec
run_test:PASS:test_run ret 0 nsec
run_test:PASS:modify_return side_effect 0 nsec
run_test:PASS:modify_return fentry_result 0 nsec
run_test:PASS:modify_return fexit_result 0 nsec
run_test:PASS:modify_return fmod_ret_result 0 nsec
run_test:PASS:modify_return fentry_result2 0 nsec
run_test:PASS:modify_return fexit_result2 0 nsec
run_test:PASS:modify_return fmod_ret_result2 0 nsec
run_test:PASS:skel_load 0 nsec
run_test:PASS:modify_return__attach failed 0 nsec
run_test:PASS:test_run 0 nsec
run_test:PASS:test_run ret 0 nsec
run_test:PASS:modify_return side_effect 0 nsec
run_test:PASS:modify_return fentry_result 0 nsec
run_test:PASS:modify_return fexit_result 0 nsec
run_test:PASS:modify_return fmod_ret_result 0 nsec
run_test:PASS:modify_return fentry_result2 0 nsec
run_test:PASS:modify_return fexit_result2 0 nsec
run_test:PASS:modify_return fmod_ret_result2 0 nsec
#200     modify_return:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Successfully unloaded bpf_testmod.ko.
[root@localhost bpf]# ./test_progs -v -t module_attach
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -95
#201     module_attach:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.


Chenghao

> 
> >
> >
> > Chenghao
> >
> > >
> > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > >
> > > > > > > > Hi, Vincent,
> > > > > > > >
> > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Folks,
> > > > > > > > >
> > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > lockup immediately.
> > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > >
> > > > > > >
> > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > patches and will let you know the result.
> > > > > > >
> > > > > >
> > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > feature not supported
> > > > > >
> > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > #205     module_attach:FAIL
> > > > > >
> > > > > > All error logs:
> > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > #205     module_attach:FAIL
> > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > >
> > > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > > >
> > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > 90000000041d5848
> > > > > > [  419.728629] Oops[#1]:
> > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > 9000000004163938
> > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > time, OOM is now expected behavior.
> > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > 0000000000000010 90000001003b0680
> > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > 9000000006b21000 0000000000000005
> > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > 0000000000000000 0000000000000004
> > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > 9000000006b0e000 9000000006b21000
> > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > 9000000006c31140 900000000578c9b0
> > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > 00000000000000b4 0000000000000000
> > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > 9000000004284f20 000000000a400001
> > > > > > [  451.305581]         ...
> > > > > > [  451.305584] Call Trace:
> > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > >
> > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > >
> > > > > > So this is related to the trampoline patches for sure, unless I am missing something.
> > > > > >
> > > > > > > > Huacai
> > > > > > > >
> > > > > > > > >
> > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > the module_attach test is not skipped.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-14 12:00                 ` Chenghao Duan
@ 2025-08-14 13:42                   ` Vincent Li
  2025-08-14 13:47                     ` Vincent Li
  2025-08-21 15:04                   ` Vincent Li
  1 sibling, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-14 13:42 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
>
> On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > >
> > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > Hi Chenghao,
> > > >
> > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > >
> > > > > > Hi, Chenghao,
> > > > > >
> > > > > > Please take a look.
> > > > > >
> > > > > > Huacai
> > > > > >
> > > > > I reverted the loongarch-next branch tailcall count fix patches and the
> > > > > struct ops trampoline patch, kept the rest of the trampoline patches, and
> > > > > the module_attach test experienced the same issue, so it is definitely a
> > > > > trampoline patches issue.
> > > > >
> > > >
> > > > I attempted to isolate which test in module_attach triggers the
> > > > "Unable to handle kernel paging request..." error; it appears to be
> > > > this one in "prog_tests/module_attach.c":
> > > >
> > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > >
> > > > You can try commenting out the other tests in "prog_tests/module_attach.c"
> > > > and performing the test; it might help isolate the issue.
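For what it's worth, that failing trigger can also be reproduced by hand: trigger_module_test_read() just reads the sysfs file exposed by bpf_testmod. A minimal sketch (the sysfs path is an assumption based on the selftest helpers, and the file only exists while bpf_testmod.ko is loaded):

```shell
# Sketch: manually trigger bpf_testmod_test_read(), the function the fentry
# program attaches to. /sys/kernel/bpf_testmod is assumed from the selftest
# helpers; it is only present while bpf_testmod.ko is loaded.
dd if=/sys/kernel/bpf_testmod of=/dev/null bs=256 count=1 2>/dev/null \
    && echo "read triggered" || echo "bpf_testmod not loaded?"
```

If the read locks up the machine the same way, the fentry trampoline is being hit on the module function without the test harness in the picture.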
> > > >
> > >
> > > Hi Vincent,
> > >
> > > The results I tested are different from yours. Could there be other
> > > differences between us? I am using the latest code of the loongarch-next
> > > branch.
> > >
> > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > bpf_testmod.ko is already unloaded.
> > > Loading bpf_testmod.ko...
> > > Successfully loaded bpf_testmod.ko.
> > > test_module_attach:PASS:skel_open 0 nsec
> > > test_module_attach:PASS:set_attach_target 0 nsec
> > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > bpf_testmod_test_read() is not modifiable
> > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > -- END PROG LOAD LOG --
> > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > libbpf: failed to load object 'test_module_attach'
> > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > #205     module_attach:FAIL
> > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > Successfully unloaded bpf_testmod.ko.
> > >
> >
> > I built and ran the most recent loongarch-next kernel too; can you try
> > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > on Fedora. Here are the steps I use to build the kernel, boot it, and run
> > the test:
> >
> > 1, check branch
> > [root@fedora linux-loongson]# git branch
> > * loongarch-next
> >   master
> >   no-tailcall
> >   no-trampoline
> >
> > 2, build kernel and reboot
> > cp config.txt .config; make clean; make -j6; make modules_install;
> > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> >
> > 3, after reboot and login, build bpf selftests, run module_attach
> > test, dmesg to check kernel log
> > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> >
>
> Hi Vincent,
>
> I tried the config you provided, but the test results I obtained are as
> follows. I also specifically tested modify_return to verify the
> effectiveness of the patch, and the module_attach test returns -EOPNOTSUPP.
>
> [root@localhost bpf]# ./test_progs -v -t modify_return
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> run_test:PASS:skel_load 0 nsec
> run_test:PASS:modify_return__attach failed 0 nsec
> run_test:PASS:test_run 0 nsec
> run_test:PASS:test_run ret 0 nsec
> run_test:PASS:modify_return side_effect 0 nsec
> run_test:PASS:modify_return fentry_result 0 nsec
> run_test:PASS:modify_return fexit_result 0 nsec
> run_test:PASS:modify_return fmod_ret_result 0 nsec
> run_test:PASS:modify_return fentry_result2 0 nsec
> run_test:PASS:modify_return fexit_result2 0 nsec
> run_test:PASS:modify_return fmod_ret_result2 0 nsec
> run_test:PASS:skel_load 0 nsec
> run_test:PASS:modify_return__attach failed 0 nsec
> run_test:PASS:test_run 0 nsec
> run_test:PASS:test_run ret 0 nsec
> run_test:PASS:modify_return side_effect 0 nsec
> run_test:PASS:modify_return fentry_result 0 nsec
> run_test:PASS:modify_return fexit_result 0 nsec
> run_test:PASS:modify_return fmod_ret_result 0 nsec
> run_test:PASS:modify_return fentry_result2 0 nsec
> run_test:PASS:modify_return fexit_result2 0 nsec
> run_test:PASS:modify_return fmod_ret_result2 0 nsec
> #200     modify_return:OK
> Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> Successfully unloaded bpf_testmod.ko.
> [root@localhost bpf]# ./test_progs -v -t module_attach
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_module_attach:PASS:skel_open 0 nsec
> test_module_attach:PASS:set_attach_target 0 nsec
> test_module_attach:PASS:set_attach_target_explicit 0 nsec
> test_module_attach:PASS:skel_load 0 nsec
> libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> #201     module_attach:FAIL
> Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> Successfully unloaded bpf_testmod.ko.
>
This is what I got with the addition of -v; it appears you failed at
skel_attach, and maybe your libbpf is outdated and does not support
kprobe_multi? My libbpf is 1.5:

/usr/lib64/libbpf.so.1.5.0

[root@fedora bpf]# ./test_progs -v -t module_attach
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
test_module_attach:PASS:skel_attach 0 nsec
trigger_module_test_read:PASS:testmod_file_open 0 nsec
WATCHDOG: test case module_attach executes for 10 seconds...
WATCHDOG: test case module_attach executes for 120 seconds,
terminating with SIGSEGV
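For anyone comparing environments, a quick way to confirm which libbpf shared library is installed (ldconfig should be present on glibc-based distros such as Fedora; the printed path will vary per system):

```shell
# Sketch: locate the installed libbpf shared library so its version can be
# compared across test machines. Falls back to "not found" when ldconfig is
# unavailable or libbpf is not installed system-wide.
lib=$(ldconfig -p 2>/dev/null | awk '/libbpf\.so\./ {print $NF; exit}')
echo "libbpf: ${lib:-not found}"
```

Note that the selftests usually link the in-tree libbpf statically, so this only matters for tools that load the system copy.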

>
> Chenghao
>
> >
> > >
> > >
> > > Chenghao
> > >
> > > >
> > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > Hi, Vincent,
> > > > > > > > >
> > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Folks,
> > > > > > > > > >
> > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > lockup immediately.
> > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > other words, does vanilla 6.16 have this problem?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I suspect this is caused by the latest trampoline patches because
> > > > > > > > module_attach tests the fentry feature for kernel module functions,
> > > > > > > > and I believe Chenghao and I only tested the fentry feature for
> > > > > > > > non-module kernel functions. I can try a kernel without the trampoline
> > > > > > > > patches and will let you know the result.
> > > > > > > >
> > > > > > >
> > > > > > > I reverted the trampoline patches from the loongarch-next branch, and
> > > > > > > running ./test_progs -t module_attach now simply errors out with the
> > > > > > > fentry feature not supported:
> > > > > > >
> > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > #205     module_attach:FAIL
> > > > > > >
> > > > > > > All error logs:
> > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > #205     module_attach:FAIL
> > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > >
> > > > > > > I also tested the loongarch-next branch with the trampoline patch
> > > > > > > series using a kernel config that does not lock up, so I can run dmesg
> > > > > > > to check the kernel error log; ./test_progs -t module_attach results in
> > > > > > > the kernel log below:
> > > > > > >
> > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > 90000000041d5848
> > > > > > > [  419.728629] Oops[#1]:
> > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > 9000000004163938
> > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > time, OOM is now expected behavior.
> > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > 0000000000000010 90000001003b0680
> > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > 9000000006b21000 0000000000000005
> > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > 0000000000000000 0000000000000004
> > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > 00000000000000b4 0000000000000000
> > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > 9000000004284f20 000000000a400001
> > > > > > > [  451.305581]         ...
> > > > > > > [  451.305584] Call Trace:
> > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > >
> > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > >
> > > > > > > So this is related to the trampoline patches for sure, unless I am missing something.
> > > > > > >
> > > > > > > > > Huacai
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-14 13:42                   ` Vincent Li
@ 2025-08-14 13:47                     ` Vincent Li
  0 siblings, 0 replies; 18+ messages in thread
From: Vincent Li @ 2025-08-14 13:47 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 14, 2025 at 6:42 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> >
> > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > >
> > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > Hi Chenghao,
> > > > >
> > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi, Chenghao,
> > > > > > >
> > > > > > > Please take a look.
> > > > > > >
> > > > > > > Huacai
> > > > > > >
> > > > > > I reverted the loongarch-next branch tailcall count fix patches and the
> > > > > > struct ops trampoline patch, kept the rest of the trampoline patches, and
> > > > > > the module_attach test experienced the same issue, so it is definitely a
> > > > > > trampoline patches issue.
> > > > > >
> > > > >
> > > > > I attempted to isolate which test in module_attach triggers the
> > > > > "Unable to handle kernel paging request..." error; it appears to be
> > > > > this one in "prog_tests/module_attach.c":
> > > > >
> > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > >
> > > > > You can try commenting out the other tests in "prog_tests/module_attach.c"
> > > > > and performing the test; it might help isolate the issue.
> > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > The results I tested are different from yours. Could there be other
> > > > differences between us? I am using the latest code of the loongarch-next
> > > > branch.
> > > >
> > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > bpf_testmod.ko is already unloaded.
> > > > Loading bpf_testmod.ko...
> > > > Successfully loaded bpf_testmod.ko.
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > bpf_testmod_test_read() is not modifiable
> > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > -- END PROG LOAD LOG --
> > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > libbpf: failed to load object 'test_module_attach'
> > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > #205     module_attach:FAIL
> > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > Successfully unloaded bpf_testmod.ko.
> > > >
> > >
> > > I built and ran the most recent loongarch-next kernel too; can you try
> > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > on Fedora. Here are the steps I use to build the kernel, boot it, and run
> > > the test:
> > >
> > > 1, check branch
> > > [root@fedora linux-loongson]# git branch
> > > * loongarch-next
> > >   master
> > >   no-tailcall
> > >   no-trampoline
> > >
> > > 2, build kernel and reboot
> > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > >
> > > 3, after reboot and login, build bpf selftests, run module_attach
> > > test, dmesg to check kernel log
> > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > >
> >
> > Hi Vincent,
> >
> > I tried the config you provided, but the test results I obtained are as
> > follows. I also specifically tested modify_return to verify the
> > effectiveness of the patch, and the module_attach test returns -EOPNOTSUPP.
> >
> > [root@localhost bpf]# ./test_progs -v -t modify_return
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > #200     modify_return:OK
> > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > Successfully unloaded bpf_testmod.ko.
> > [root@localhost bpf]# ./test_progs -v -t module_attach
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > test_module_attach:PASS:skel_load 0 nsec
> > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > #201     module_attach:FAIL
> > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > Successfully unloaded bpf_testmod.ko.
> >
> This is what I got with the addition of -v; it appears you failed at
> skel_attach, and maybe your libbpf is outdated and does not support
> kprobe_multi? My libbpf is 1.5:
>
> /usr/lib64/libbpf.so.1.5.0
>

Also double-check whether you have the options below in your .config; I
recall kprobe_multi requires CONFIG_FPROBE:

[root@fedora linux-loongson]# grep 'KPROBE' .config
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
CONFIG_BPF_KPROBE_OVERRIDE=y
# CONFIG_KPROBE_EVENT_GEN_TEST is not set


[root@fedora linux-loongson]# grep 'FPROBE' .config
CONFIG_FPROBE=y
CONFIG_FPROBE_EVENTS=y

> [root@fedora bpf]# ./test_progs -v -t module_attach
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_module_attach:PASS:skel_open 0 nsec
> test_module_attach:PASS:set_attach_target 0 nsec
> test_module_attach:PASS:set_attach_target_explicit 0 nsec
> test_module_attach:PASS:skel_load 0 nsec
> test_module_attach:PASS:skel_attach 0 nsec
> trigger_module_test_read:PASS:testmod_file_open 0 nsec
> WATCHDOG: test case module_attach executes for 10 seconds...
> WATCHDOG: test case module_attach executes for 120 seconds,
> terminating with SIGSEGV
>
> >
> > Chenghao
> >
> > >
> > > >
> > > >
> > > > Chenghao
> > > >
> > > > >
> > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Vincent,
> > > > > > > > > >
> > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Folks,
> > > > > > > > > > >
> > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > > lockup immediately.
> > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > other words, does vanilla 6.16 have this problem?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I suspect this is caused by the latest trampoline patches because
> > > > > > > > > module_attach tests the fentry feature for kernel module functions,
> > > > > > > > > and I believe Chenghao and I only tested the fentry feature for
> > > > > > > > > non-module kernel functions. I can try a kernel without the trampoline
> > > > > > > > > patches and will let you know the result.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I reverted the trampoline patches from the loongarch-next branch, and
> > > > > > > > running ./test_progs -t module_attach now simply errors out with the
> > > > > > > > fentry feature not supported:
> > > > > > > >
> > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205     module_attach:FAIL
> > > > > > > >
> > > > > > > > All error logs:
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205     module_attach:FAIL
> > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > >
> > > > > > > > I also tested the loongarch-next branch with the trampoline patch
> > > > > > > > series using a kernel config that does not lock up, so I can run dmesg
> > > > > > > > to check the kernel error log; ./test_progs -t module_attach results in
> > > > > > > > the kernel log below:
> > > > > > > >
> > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > 90000000041d5848
> > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > 9000000004163938
> > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > time, OOM is now expected behavior.
> > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > [  451.305581]         ...
> > > > > > > > [  451.305584] Call Trace:
> > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > >
> > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > >
> > > > > > > > So this is related to the trampoline patches for sure, unless I am missing something.
> > > > > > > >
> > > > > > > > > > Huacai
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-14 12:00                 ` Chenghao Duan
  2025-08-14 13:42                   ` Vincent Li
@ 2025-08-21 15:04                   ` Vincent Li
  2025-08-22  3:11                     ` Chenghao Duan
  1 sibling, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-21 15:04 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
>
> On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > >
> > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > Hi Chenghao,
> > > >
> > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > >
> > > > > > Hi, Chenghao,
> > > > > >
> > > > > > Please take a look.
> > > > > >
> > > > > > Huacai
> > > > > >
> > > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > > module_attach test experienced the same issue, so definitely
> > > > > trampoline patches issue.
> > > > >
> > > >
> > > > I attempted to isolate which test in module_attach triggers the
> > > > "Unable to handle kernel paging request..." error, it appears to be
> > > > this one in "prog_tests/module_attach.c"
> > > >
> > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > >
> > > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > > and perform the test, it might help isolate the issue.
> > > >
> > >
> > > Hi Vincent,
> > >
> > > The results I tested are different from yours. Could there be other
> > > differences between us? I am using the latest code of the loongarch-next
> > > branch.
> > >
> > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > bpf_testmod.ko is already unloaded.
> > > Loading bpf_testmod.ko...
> > > Successfully loaded bpf_testmod.ko.
> > > test_module_attach:PASS:skel_open 0 nsec
> > > test_module_attach:PASS:set_attach_target 0 nsec
> > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > bpf_testmod_test_read() is not modifiable
> > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > -- END PROG LOAD LOG --
> > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > libbpf: failed to load object 'test_module_attach'
> > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > #205     module_attach:FAIL
> > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > Successfully unloaded bpf_testmod.ko.
> > >
> >
> > I build and run the most recent loongarch-next kernel too, can you try
> > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > on fedora, here are the steps I build, run the kernel, and run the
> > test
> >
> > 1, check branch
> > [root@fedora linux-loongson]# git branch
> > * loongarch-next
> >   master
> >   no-tailcall
> >   no-trampoline
> >
> > 2, build kernel and reboot
> > cp config.txt .config; make clean; make -j6; make modules_install;
> > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> >
> > 3, after reboot and login, build bpf selftests, run module_attach
> > test, dmesg to check kernel log
> > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> >
>
> Hi Vincent,
>
> I tried to refer to the config you provided, but the test results I
> obtained are as follows. I also specifically tested "modify" to verify
> the effectiveness of the patch, and the test of module_attach returns -EOPNOTSUPP.
>
> [root@localhost bpf]# ./test_progs -v -t modify_return
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> run_test:PASS:skel_load 0 nsec
> run_test:PASS:modify_return__attach failed 0 nsec
> run_test:PASS:test_run 0 nsec
> run_test:PASS:test_run ret 0 nsec
> run_test:PASS:modify_return side_effect 0 nsec
> run_test:PASS:modify_return fentry_result 0 nsec
> run_test:PASS:modify_return fexit_result 0 nsec
> run_test:PASS:modify_return fmod_ret_result 0 nsec
> run_test:PASS:modify_return fentry_result2 0 nsec
> run_test:PASS:modify_return fexit_result2 0 nsec
> run_test:PASS:modify_return fmod_ret_result2 0 nsec
> run_test:PASS:skel_load 0 nsec
> run_test:PASS:modify_return__attach failed 0 nsec
> run_test:PASS:test_run 0 nsec
> run_test:PASS:test_run ret 0 nsec
> run_test:PASS:modify_return side_effect 0 nsec
> run_test:PASS:modify_return fentry_result 0 nsec
> run_test:PASS:modify_return fexit_result 0 nsec
> run_test:PASS:modify_return fmod_ret_result 0 nsec
> run_test:PASS:modify_return fentry_result2 0 nsec
> run_test:PASS:modify_return fexit_result2 0 nsec
> run_test:PASS:modify_return fmod_ret_result2 0 nsec
> #200     modify_return:OK
> Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> Successfully unloaded bpf_testmod.ko.
> [root@localhost bpf]# ./test_progs -v -t module_attach
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_module_attach:PASS:skel_open 0 nsec
> test_module_attach:PASS:set_attach_target 0 nsec
> test_module_attach:PASS:set_attach_target_explicit 0 nsec
> test_module_attach:PASS:skel_load 0 nsec
> libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP

the -EOPNOTSUPP comes from libbpf, but I am not sure whether an error
in the kernel leads to the libbpf error or it originates in libbpf
itself. You can run strace -f -s1024 -o /tmp/module_attach.txt
./test_progs -v -t module_attach. The strace output should include the
bpf syscall, and I think it can tell you whether the -EOPNOTSUPP is
the result of a kernel error or of libbpf; you can share the strace if
you want.
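For what it's worth, here is a minimal sketch of filtering such a capture for failing bpf() syscalls; the sample strace line below is fabricated for illustration, and in practice /tmp/module_attach.txt is the file produced by the strace command above:

```shell
# Hypothetical capture; the real file comes from:
#   strace -f -s1024 -o /tmp/module_attach.txt ./test_progs -v -t module_attach
cat > /tmp/module_attach.txt <<'EOF'
2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, attach_type=BPF_TRACE_KPROBE_MULTI}}, 64) = -1 EOPNOTSUPP (Operation not supported)
2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
EOF

# Keep only bpf() syscalls that returned an error: if the errno shows up
# on a bpf() line, the failure came back from the kernel, not libbpf.
grep 'bpf(' /tmp/module_attach.txt | grep -- '= -1'
```

If the grep prints nothing while libbpf still reports an error, the error was synthesized inside libbpf before any syscall reached the kernel.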


> test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> #201     module_attach:FAIL
> Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> Successfully unloaded bpf_testmod.ko.
>
>
> Chenghao
>
> >
> > >
> > >
> > > Chenghao
> > >
> > > >
> > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > Hi, Vincent,
> > > > > > > > >
> > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Folks,
> > > > > > > > > >
> > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > lockup immediately.
> > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > > patches and will let you know the result.
> > > > > > > >
> > > > > > >
> > > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > > feature not supported
> > > > > > >
> > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > #205     module_attach:FAIL
> > > > > > >
> > > > > > > All error logs:
> > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > #205     module_attach:FAIL
> > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > >
> > > > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > > > >
> > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > 90000000041d5848
> > > > > > > [  419.728629] Oops[#1]:
> > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > 9000000004163938
> > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > time, OOM is now expected behavior.
> > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > 0000000000000010 90000001003b0680
> > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > 9000000006b21000 0000000000000005
> > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > 0000000000000000 0000000000000004
> > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > 00000000000000b4 0000000000000000
> > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > 9000000004284f20 000000000a400001
> > > > > > > [  451.305581]         ...
> > > > > > > [  451.305584] Call Trace:
> > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > >
> > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > >
> > > > > > > So related to trampoline patches for sure unless I am missing something.
> > > > > > >
> > > > > > > > > Huacai
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-21 15:04                   ` Vincent Li
@ 2025-08-22  3:11                     ` Chenghao Duan
  2025-08-22  5:10                       ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Chenghao Duan @ 2025-08-22  3:11 UTC (permalink / raw)
  To: Vincent Li; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> >
> > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > >
> > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > Hi Chenghao,
> > > > >
> > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi, Chenghao,
> > > > > > >
> > > > > > > Please take a look.
> > > > > > >
> > > > > > > Huacai
> > > > > > >
> > > > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > > > module_attach test experienced the same issue, so definitely
> > > > > > trampoline patches issue.
> > > > > >
> > > > >
> > > > > I attempted to isolate which test in module_attach triggers the
> > > > > "Unable to handle kernel paging request..." error, it appears to be
> > > > > this one in "prog_tests/module_attach.c"
> > > > >
> > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > >
> > > > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > > > and perform the test, it might help isolate the issue.
> > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > The results I tested are different from yours. Could there be other
> > > > differences between us? I am using the latest code of the loongarch-next
> > > > branch.
> > > >
> > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > bpf_testmod.ko is already unloaded.
> > > > Loading bpf_testmod.ko...
> > > > Successfully loaded bpf_testmod.ko.
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > bpf_testmod_test_read() is not modifiable
> > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > -- END PROG LOAD LOG --
> > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > libbpf: failed to load object 'test_module_attach'
> > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > #205     module_attach:FAIL
> > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > Successfully unloaded bpf_testmod.ko.
> > > >
> > >
> > > I build and run the most recent loongarch-next kernel too, can you try
> > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > on fedora, here are the steps I build, run the kernel, and run the
> > > test
> > >
> > > 1, check branch
> > > [root@fedora linux-loongson]# git branch
> > > * loongarch-next
> > >   master
> > >   no-tailcall
> > >   no-trampoline
> > >
> > > 2, build kernel and reboot
> > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > >
> > > 3, after reboot and login, build bpf selftests, run module_attach
> > > test, dmesg to check kernel log
> > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > >
> >
> > Hi Vincent,
> >
> > I tried to refer to the config you provided, but the test results I
> > obtained are as follows. I also specifically tested "modify" to verify
> > the effectiveness of the patch, and the test of module_attach returns -EOPNOTSUPP.
> >
> > [root@localhost bpf]# ./test_progs -v -t modify_return
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > #200     modify_return:OK
> > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > Successfully unloaded bpf_testmod.ko.
> > [root@localhost bpf]# ./test_progs -v -t module_attach
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > test_module_attach:PASS:skel_load 0 nsec
> > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> 
> the -EOPNOTSUPP comes from libbpf, but I am not sure whether an error
> in the kernel leads to the libbpf error or it originates in libbpf
> itself. You can run strace -f -s1024 -o /tmp/module_attach.txt
> ./test_progs -v -t module_attach. The strace output should include the
> bpf syscall, and I think it can tell you whether the -EOPNOTSUPP is
> the result of a kernel error or of libbpf; you can share the strace if
> you want.
> 
2037  read(16, "", 8192)                = 0
2037  close(16)                         = 0
2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)
2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
2037  write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
2037  write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64

So the kernel does not support attach_type BPF_TRACE_KPROBE_MULTI.
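That matches what the kernel does when fprobe support is missing: bpf_kprobe_multi_link_attach() is compiled out without CONFIG_FPROBE and the stub returns -EOPNOTSUPP. A quick sanity check against a kernel .config could look like the sketch below; the config fragment is a sample for illustration, so point it at the real .config instead:

```shell
# Sample .config fragment standing in for the real kernel config; on a
# kernel built without CONFIG_FPROBE, BPF_LINK_CREATE with
# BPF_TRACE_KPROBE_MULTI fails with -EOPNOTSUPP.
cat > /tmp/sample.config <<'EOF'
CONFIG_BPF_SYSCALL=y
# CONFIG_FPROBE is not set
EOF

if grep -q '^CONFIG_FPROBE=y' /tmp/sample.config; then
  echo "fprobe enabled: kprobe_multi links should attach"
else
  echo "fprobe missing: kprobe_multi link creation returns -EOPNOTSUPP"
fi
```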

Chenghao


> 
> > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > #201     module_attach:FAIL
> > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > Successfully unloaded bpf_testmod.ko.
> >
> >
> > Chenghao
> >
> > >
> > > >
> > > >
> > > > Chenghao
> > > >
> > > > >
> > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Vincent,
> > > > > > > > > >
> > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Folks,
> > > > > > > > > > >
> > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > > lockup immediately.
> > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > > > patches and will let you know the result.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > > > feature not supported
> > > > > > > >
> > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205     module_attach:FAIL
> > > > > > > >
> > > > > > > > All error logs:
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205     module_attach:FAIL
> > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > >
> > > > > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > > > > >
> > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > 90000000041d5848
> > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > 9000000004163938
> > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > time, OOM is now expected behavior.
> > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > [  451.305581]         ...
> > > > > > > > [  451.305584] Call Trace:
> > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > >
> > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > >
> > > > > > > > So related to trampoline patches for sure unless I am missing something.
> > > > > > > >
> > > > > > > > > > Huacai
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-22  3:11                     ` Chenghao Duan
@ 2025-08-22  5:10                       ` Vincent Li
  2025-08-22  5:22                         ` Vincent Li
  0 siblings, 1 reply; 18+ messages in thread
From: Vincent Li @ 2025-08-22  5:10 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 21, 2025 at 8:11 PM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
>
> On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> > On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > >
> > > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > > >
> > > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > > Hi Chenghao,
> > > > > >
> > > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > >
> > > > > > > > Hi, Chenghao,
> > > > > > > >
> > > > > > > > Please take a look.
> > > > > > > >
> > > > > > > > Huacai
> > > > > > > >
> > > > > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > > > > module_attach test experienced the same issue, so definitely
> > > > > > > trampoline patches issue.
> > > > > > >
> > > > > >
> > > > > > I attempted to isolate which test in module_attach triggers the
> > > > > > "Unable to handle kernel paging request..." error, it appears to be
> > > > > > this one in "prog_tests/module_attach.c"
> > > > > >
> > > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > > >
> > > > > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > > > > and perform the test, it might help isolate the issue.
> > > > > >
> > > > >
> > > > > Hi Vincent,
> > > > >
> > > > > The results I tested are different from yours. Could there be other
> > > > > differences between us? I am using the latest code of the loongarch-next
> > > > > branch.
> > > > >
> > > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > > bpf_testmod.ko is already unloaded.
> > > > > Loading bpf_testmod.ko...
> > > > > Successfully loaded bpf_testmod.ko.
> > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > > bpf_testmod_test_read() is not modifiable
> > > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > > -- END PROG LOAD LOG --
> > > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > > libbpf: failed to load object 'test_module_attach'
> > > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > > #205     module_attach:FAIL
> > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > Successfully unloaded bpf_testmod.ko.
> > > > >
> > > >
> > > > I build and run the most recent loongarch-next kernel too, can you try
> > > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > > on fedora, here are the steps I build, run the kernel, and run the
> > > > test
> > > >
> > > > 1, check branch
> > > > [root@fedora linux-loongson]# git branch
> > > > * loongarch-next
> > > >   master
> > > >   no-tailcall
> > > >   no-trampoline
> > > >
> > > > 2, build kernel and reboot
> > > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > > >
> > > > 3, after reboot and login, build bpf selftests, run module_attach
> > > > test, dmesg to check kernel log
> > > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > > >
> > >
> > > Hi Vincent,
> > >
> > > I tried to refer to the config you provided, but the test results I
> > > obtained are as follows. I also specifically tested "modify" to verify
> > > the effectiveness of the patch, and the test of module_attach returns -EOPNOTSUPP.
> > >
> > > [root@localhost bpf]# ./test_progs -v -t modify_return
> > > bpf_testmod.ko is already unloaded.
> > > Loading bpf_testmod.ko...
> > > Successfully loaded bpf_testmod.ko.
> > > run_test:PASS:skel_load 0 nsec
> > > run_test:PASS:modify_return__attach failed 0 nsec
> > > run_test:PASS:test_run 0 nsec
> > > run_test:PASS:test_run ret 0 nsec
> > > run_test:PASS:modify_return side_effect 0 nsec
> > > run_test:PASS:modify_return fentry_result 0 nsec
> > > run_test:PASS:modify_return fexit_result 0 nsec
> > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > run_test:PASS:skel_load 0 nsec
> > > run_test:PASS:modify_return__attach failed 0 nsec
> > > run_test:PASS:test_run 0 nsec
> > > run_test:PASS:test_run ret 0 nsec
> > > run_test:PASS:modify_return side_effect 0 nsec
> > > run_test:PASS:modify_return fentry_result 0 nsec
> > > run_test:PASS:modify_return fexit_result 0 nsec
> > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > #200     modify_return:OK
> > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > > Successfully unloaded bpf_testmod.ko.
> > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > bpf_testmod.ko is already unloaded.
> > > Loading bpf_testmod.ko...
> > > Successfully loaded bpf_testmod.ko.
> > > test_module_attach:PASS:skel_open 0 nsec
> > > test_module_attach:PASS:set_attach_target 0 nsec
> > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > test_module_attach:PASS:skel_load 0 nsec
> > > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> >
> > the -EOPNOTSUPP comes from libbpf, but I am not sure whether a kernel
> > error leads to the libbpf error or it originates in libbpf itself. You
> > can do strace -f -s1024 -o /tmp/module_attach.txt ./test_progs -v -t
> > module_attach. The strace output should include the bpf syscalls, and
> > I think it can tell you whether the -EOPNOTSUPP is the result of a
> > kernel error or of libbpf; you can share the strace if you want.
> >
> 2037  read(16, "", 8192)                = 0
> 2037  close(16)                         = 0
> 2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)

So the bpf() syscall with cmd BPF_LINK_CREATE returns -1 with errno set
to EOPNOTSUPP? I could not tell before, because I thought the return
value was just '-1'.

> 2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
> 2037  write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
> 2037  write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64
>
> attach_type BPF_TRACE_KPROBE_MULTI is not supported
>

Could you share your kernel config (the .config used to compile the
kernel, or /boot/config-* from the running kernel)? I wonder if you
really have CONFIG_FPROBE enabled, since include/linux/fprobe.h has:

#ifdef CONFIG_FPROBE
int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter);
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
int unregister_fprobe(struct fprobe *fp);
bool fprobe_is_registered(struct fprobe *fp);
int fprobe_count_ips_from_filter(const char *filter, const char *notfilter);
#else
static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
{
        return -EOPNOTSUPP;
}
static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
{
        return -EOPNOTSUPP;
}
static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
{
        return -EOPNOTSUPP;
}
static inline int unregister_fprobe(struct fprobe *fp)
{
        return -EOPNOTSUPP;
}
static inline bool fprobe_is_registered(struct fprobe *fp)
{
        return false;
}
static inline int fprobe_count_ips_from_filter(const char *filter, const char *notfilter)
{
        return -EOPNOTSUPP;
}
#endif

> Chenghao
>
>
> >
> > > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > > #201     module_attach:FAIL
> > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > Successfully unloaded bpf_testmod.ko.
> > >
> > >
> > > Chenghao
> > >
> > > >
> > > > >
> > > > >
> > > > > Chenghao
> > > > >
> > > > > >
> > > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi, Vincent,
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Folks,
> > > > > > > > > > > >
> > > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > > > lockup immediately.
> > > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > > > > patches and will let you know the result.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > > > > feature not supported
> > > > > > > > >
> > > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > #205     module_attach:FAIL
> > > > > > > > >
> > > > > > > > > All error logs:
> > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > > >
> > > > > > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > > > > > >
> > > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > > 90000000041d5848
> > > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > > 9000000004163938
> > > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > > time, OOM is now expected behavior.
> > > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > > [  451.305581]         ...
> > > > > > > > > [  451.305584] Call Trace:
> > > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > > >
> > > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > > >
> > > > > > > > > So related to trampoline patches for sure unless I am missing something.
> > > > > > > > >
> > > > > > > > > > > Huacai
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-22  5:10                       ` Vincent Li
@ 2025-08-22  5:22                         ` Vincent Li
  2025-08-22  5:33                           ` Vincent Li
  2025-08-22  5:36                           ` Chenghao Duan
  0 siblings, 2 replies; 18+ messages in thread
From: Vincent Li @ 2025-08-22  5:22 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 21, 2025 at 10:10 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> On Thu, Aug 21, 2025 at 8:11 PM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> >
> > On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> > > On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > >
> > > > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > > > >
> > > > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > > > Hi Chenghao,
> > > > > > >
> > > > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > Hi, Chenghao,
> > > > > > > > >
> > > > > > > > > Please take a look.
> > > > > > > > >
> > > > > > > > > Huacai
> > > > > > > > >
> > > > > > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > > > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > > > > > module_attach test experienced the same issue, so definitely
> > > > > > > > trampoline patches issue.
> > > > > > > >
> > > > > > >
> > > > > > > I attempted to isolate which test in module_attach triggers the
> > > > > > > "Unable to handle kernel paging request..." error, it appears to be
> > > > > > > this one in "prog_tests/module_attach.c"
> > > > > > >
> > > > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > > > >
> > > > > > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > > > > > and perform the test, it might help isolate the issue.
> > > > > > >
> > > > > >
> > > > > > Hi Vincent,
> > > > > >
> > > > > > The results I tested are different from yours. Could there be other
> > > > > > differences between us? I am using the latest code of the loongarch-next
> > > > > > branch.
> > > > > >
> > > > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > > > bpf_testmod.ko is already unloaded.
> > > > > > Loading bpf_testmod.ko...
> > > > > > Successfully loaded bpf_testmod.ko.
> > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > > > bpf_testmod_test_read() is not modifiable
> > > > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > > > -- END PROG LOAD LOG --
> > > > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > > > libbpf: failed to load object 'test_module_attach'
> > > > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > > > #205     module_attach:FAIL
> > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > Successfully unloaded bpf_testmod.ko.
> > > > > >
> > > > >
> > > > > I build and run the most recent loongarch-next kernel too, can you try
> > > > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > > > on fedora, here are the steps I build, run the kernel, and run the
> > > > > test
> > > > >
> > > > > 1, check branch
> > > > > [root@fedora linux-loongson]# git branch
> > > > > * loongarch-next
> > > > >   master
> > > > >   no-tailcall
> > > > >   no-trampoline
> > > > >
> > > > > 2, build kernel and reboot
> > > > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > > > >
> > > > > 3, after reboot and login, build bpf selftests, run module_attach
> > > > > test, dmesg to check kernel log
> > > > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > I tried to refer to the config you provided, but the test results I
> > > > obtained are as follows. I also specifically tested "modify" to verify
> > > > the effectiveness of the patch, and the test of module_attach returns -EOPNOTSUPP.
> > > >
> > > > [root@localhost bpf]# ./test_progs -v -t modify_return
> > > > bpf_testmod.ko is already unloaded.
> > > > Loading bpf_testmod.ko...
> > > > Successfully loaded bpf_testmod.ko.
> > > > run_test:PASS:skel_load 0 nsec
> > > > run_test:PASS:modify_return__attach failed 0 nsec
> > > > run_test:PASS:test_run 0 nsec
> > > > run_test:PASS:test_run ret 0 nsec
> > > > run_test:PASS:modify_return side_effect 0 nsec
> > > > run_test:PASS:modify_return fentry_result 0 nsec
> > > > run_test:PASS:modify_return fexit_result 0 nsec
> > > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > > run_test:PASS:skel_load 0 nsec
> > > > run_test:PASS:modify_return__attach failed 0 nsec
> > > > run_test:PASS:test_run 0 nsec
> > > > run_test:PASS:test_run ret 0 nsec
> > > > run_test:PASS:modify_return side_effect 0 nsec
> > > > run_test:PASS:modify_return fentry_result 0 nsec
> > > > run_test:PASS:modify_return fexit_result 0 nsec
> > > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > > #200     modify_return:OK
> > > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > > > Successfully unloaded bpf_testmod.ko.
> > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > bpf_testmod.ko is already unloaded.
> > > > Loading bpf_testmod.ko...
> > > > Successfully loaded bpf_testmod.ko.
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > test_module_attach:PASS:skel_load 0 nsec
> > > > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > > > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> > >
> > > the -EOPNOTSUPP comes from libbpf, but I am not sure whether a kernel
> > > error leads to the libbpf error or it originates in libbpf itself. You
> > > can do strace -f -s1024 -o /tmp/module_attach.txt ./test_progs -v -t
> > > module_attach. The strace output should include the bpf syscalls, and
> > > I think it can tell you whether the -EOPNOTSUPP is the result of a
> > > kernel error or of libbpf; you can share the strace if you want.
> > >
> > 2037  read(16, "", 8192)                = 0
> > 2037  close(16)                         = 0
> > 2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)
>
> So the bpf() syscall with cmd BPF_LINK_CREATE returns -1 with errno set
> to EOPNOTSUPP? I could not tell before, because I thought the return
> value was just '-1'.
>
> > 2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
> > 2037  write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
> > 2037  write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64
> >
> > attach_type BPF_TRACE_KPROBE_MULTI is not supported
> >
>
> Could you share your kernel config (the .config used to compile the
> kernel, or /boot/config-* from the running kernel)? I wonder if you
> really have CONFIG_FPROBE enabled, since include/linux/fprobe.h has:
>
> #ifdef CONFIG_FPROBE
> int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter);
> int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
> int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
> int unregister_fprobe(struct fprobe *fp);
> bool fprobe_is_registered(struct fprobe *fp);
> int fprobe_count_ips_from_filter(const char *filter, const char *notfilter);
> #else
> static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
> {
>         return -EOPNOTSUPP;
> }
> static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
> {
>         return -EOPNOTSUPP;
> }
> static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
> {
>         return -EOPNOTSUPP;
> }
> static inline int unregister_fprobe(struct fprobe *fp)
> {
>         return -EOPNOTSUPP;
> }
> static inline bool fprobe_is_registered(struct fprobe *fp)
> {
>         return false;
> }
> static inline int fprobe_count_ips_from_filter(const char *filter, const char *notfilter)
> {
>         return -EOPNOTSUPP;
> }
> #endif
>

and also check CONFIG_BPF_EVENTS, since include/linux/trace_events.h has:

#ifdef CONFIG_BPF_EVENTS
unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
void perf_event_detach_bpf_prog(struct perf_event *event);
int perf_event_query_prog_array(struct perf_event *event, void __user *info);

struct bpf_raw_tp_link;
int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);

struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
                            u32 *fd_type, const char **buf,
                            u64 *probe_offset, u64 *probe_addr,
                            unsigned long *missed);
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
#else
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
{
        return 1;
}

static inline int
perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
{
        return -EOPNOTSUPP;
}

static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }

static inline int
perf_event_query_prog_array(struct perf_event *event, void __user *info)
{
        return -EOPNOTSUPP;
}
struct bpf_raw_tp_link;
static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
{
        return -EOPNOTSUPP;
}
static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
{
        return -EOPNOTSUPP;
}
static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
{
        return NULL;
}
static inline void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
{
}
static inline int bpf_get_perf_event_info(const struct perf_event *event,
                                          u32 *prog_id, u32 *fd_type,
                                          const char **buf, u64 *probe_offset,
                                          u64 *probe_addr, unsigned long *missed)
{
        return -EOPNOTSUPP;
}
static inline int
bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
        return -EOPNOTSUPP;
}
static inline int
bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
        return -EOPNOTSUPP;
}
#endif

> > Chenghao
> >
> >
> > >
> > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > > > #201     module_attach:FAIL
> > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > Successfully unloaded bpf_testmod.ko.
> > > >
> > > >
> > > > Chenghao
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > Chenghao
> > > > > >
> > > > > > >
> > > > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi, Vincent,
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Folks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > > > > lockup immediately.
> > > > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > > > > > patches and will let you know the result.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > > > > > feature not supported
> > > > > > > > > >
> > > > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > >
> > > > > > > > > > All error logs:
> > > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > > > >
> > > > > > > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > > > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > > > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > > > > > > >
> > > > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > > > 90000000041d5848
> > > > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > > > 9000000004163938
> > > > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > > > time, OOM is now expected behavior.
> > > > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > > > [  451.305581]         ...
> > > > > > > > > > [  451.305584] Call Trace:
> > > > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > > > >
> > > > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > > > >
> > > > > > > > > > So related to trampoline patches for sure unless I am missing something.
> > > > > > > > > >
> > > > > > > > > > > > Huacai
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-22  5:22                         ` Vincent Li
@ 2025-08-22  5:33                           ` Vincent Li
  2025-08-22  5:36                           ` Chenghao Duan
  1 sibling, 0 replies; 18+ messages in thread
From: Vincent Li @ 2025-08-22  5:33 UTC (permalink / raw)
  To: Chenghao Duan; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 21, 2025 at 10:22 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
>
> On Thu, Aug 21, 2025 at 10:10 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> >
> > On Thu, Aug 21, 2025 at 8:11 PM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > >
> > > On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> > > > On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > > >
> > > > > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > > > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > > > > >
> > > > > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > > > > Hi Chenghao,
> > > > > > > >
> > > > > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Chenghao,
> > > > > > > > > >
> > > > > > > > > > Please take a look.
> > > > > > > > > >
> > > > > > > > > > Huacai
> > > > > > > > > >
> > > > > > > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > > > > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > > > > > > module_attach test experienced the same issue, so definitely
> > > > > > > > > trampoline patches issue.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I attempted to isolate which test in module_attach triggers the
> > > > > > > > "Unable to handle kernel paging request..." error, it appears to be
> > > > > > > > this one in "prog_tests/module_attach.c"
> > > > > > > >
> > > > > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > > > > >
> > > > > > > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > > > > > > and perform the test, it might help isolate the issue.
> > > > > > > >
> > > > > > >
> > > > > > > Hi Vincent,
> > > > > > >
> > > > > > > The results I tested are different from yours. Could there be other
> > > > > > > differences between us? I am using the latest code of the loongarch-next
> > > > > > > branch.
> > > > > > >
> > > > > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > > > > bpf_testmod.ko is already unloaded.
> > > > > > > Loading bpf_testmod.ko...
> > > > > > > Successfully loaded bpf_testmod.ko.
> > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > > > > bpf_testmod_test_read() is not modifiable
> > > > > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > > > > -- END PROG LOAD LOG --
> > > > > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > > > > libbpf: failed to load object 'test_module_attach'
> > > > > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > > > > #205     module_attach:FAIL
> > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > Successfully unloaded bpf_testmod.ko.
> > > > > > >
> > > > > >
> > > > > > I build and run the most recent loongarch-next kernel too, can you try
> > > > > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > > > > on fedora, here are the steps I build, run the kernel, and run the
> > > > > > test
> > > > > >
> > > > > > 1, check branch
> > > > > > [root@fedora linux-loongson]# git branch
> > > > > > * loongarch-next
> > > > > >   master
> > > > > >   no-tailcall
> > > > > >   no-trampoline
> > > > > >
> > > > > > 2, build kernel and reboot
> > > > > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > > > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > > > > >
> > > > > > 3, after reboot and login, build bpf selftests, run module_attach
> > > > > > test, dmesg to check kernel log
> > > > > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > > > > >
> > > > >
> > > > > Hi Vincent,
> > > > >
> > > > > I tried to refer to the config you provided, but the test results I
> > > > > obtained are as follows. I also specifically tested "modify" to verify
> > > > > the effectiveness of the patch, and the test of module_attach returns -EOPNOTSUPP.
> > > > >
> > > > > [root@localhost bpf]# ./test_progs -v -t modify_return
> > > > > bpf_testmod.ko is already unloaded.
> > > > > Loading bpf_testmod.ko...
> > > > > Successfully loaded bpf_testmod.ko.
> > > > > run_test:PASS:skel_load 0 nsec
> > > > > run_test:PASS:modify_return__attach failed 0 nsec
> > > > > run_test:PASS:test_run 0 nsec
> > > > > run_test:PASS:test_run ret 0 nsec
> > > > > run_test:PASS:modify_return side_effect 0 nsec
> > > > > run_test:PASS:modify_return fentry_result 0 nsec
> > > > > run_test:PASS:modify_return fexit_result 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > > > run_test:PASS:skel_load 0 nsec
> > > > > run_test:PASS:modify_return__attach failed 0 nsec
> > > > > run_test:PASS:test_run 0 nsec
> > > > > run_test:PASS:test_run ret 0 nsec
> > > > > run_test:PASS:modify_return side_effect 0 nsec
> > > > > run_test:PASS:modify_return fentry_result 0 nsec
> > > > > run_test:PASS:modify_return fexit_result 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > > > #200     modify_return:OK
> > > > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > > > > Successfully unloaded bpf_testmod.ko.
> > > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > > bpf_testmod.ko is already unloaded.
> > > > > Loading bpf_testmod.ko...
> > > > > Successfully loaded bpf_testmod.ko.
> > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > > > > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> > > >
> > > > the -EOPNOTSUPP comes from libbpf, but I am not sure whether a
> > > > kernel error leads to the libbpf error or it originates in libbpf
> > > > itself. You can run strace -f -s1024 -o /tmp/module_attach.txt
> > > > ./test_progs -v -t module_attach. The strace output should include
> > > > the bpf() syscall, and I think it can tell you whether the
> > > > -EOPNOTSUPP comes from the kernel or from libbpf; you can share
> > > > the strace if you want.
> > > >
> > > 2037  read(16, "", 8192)                = 0
> > > 2037  close(16)                         = 0
> > > 2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)
> >
> > so the bpf() syscall BPF_LINK_CREATE command returns -1 with errno
> > EOPNOTSUPP exactly? I could not tell before because I thought the
> > return value was just '-1'.
> >
> > > 2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
> > > 2037  write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
> > > 2037  write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64
> > >
> > > not support attach_type BPF_TRACE_KPROBE_MULTI
> > >
> >
> > Could you share your kernel config (the .config used to compile the
> > kernel, or /boot/config-* for the running kernel)? I wonder whether
> > you really have CONFIG_FPROBE enabled, since include/linux/fprobe.h has:
> >
> > #ifdef CONFIG_FPROBE
> > int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter);
> > int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
> > int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
> > int unregister_fprobe(struct fprobe *fp);
> > bool fprobe_is_registered(struct fprobe *fp);
> > int fprobe_count_ips_from_filter(const char *filter, const char *notfilter);
> > #else
> > static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline int unregister_fprobe(struct fprobe *fp)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline bool fprobe_is_registered(struct fprobe *fp)
> > {
> >         return false;
> > }
> > static inline int fprobe_count_ips_from_filter(const char *filter, const char *notfilter)
> > {
> >         return -EOPNOTSUPP;
> > }
> > #endif
> >
>
> and check CONFIG_BPF_EVENTS since linux/trace_events.h has:
>
> #ifdef CONFIG_BPF_EVENTS
> unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
> void perf_event_detach_bpf_prog(struct perf_event *event);
> int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>
> struct bpf_raw_tp_link;
> int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
> int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
>
> struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
> void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
> int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>                             u32 *fd_type, const char **buf,
>                             u64 *probe_offset, u64 *probe_addr,
>                             unsigned long *missed);
> int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
> int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
> #else
> static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
> {
>         return 1;
> }
>
> static inline int
> perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
> {
>         return -EOPNOTSUPP;
> }
>
> static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }
>
> static inline int
> perf_event_query_prog_array(struct perf_event *event, void __user *info)
> {
>         return -EOPNOTSUPP;
> }
> struct bpf_raw_tp_link;
> static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
> {
>         return -EOPNOTSUPP;
> }
> static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
> {
>         return -EOPNOTSUPP;
> }
> static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
> {
>         return NULL;
> }
> static inline void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
> {
> }
> static inline int bpf_get_perf_event_info(const struct perf_event *event,
>                                           u32 *prog_id, u32 *fd_type,
>                                           const char **buf, u64 *probe_offset,
>                                           u64 *probe_addr, unsigned long *missed)
> {
>         return -EOPNOTSUPP;
> }
> static inline int
> bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> {
>         return -EOPNOTSUPP;
> }
> static inline int
> bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> {
>         return -EOPNOTSUPP;
> }
> #endif
>

FYI, here are the places where bpf_kprobe_multi_link_attach() is
declared, defined and called; if CONFIG_FPROBE and CONFIG_BPF_EVENTS
are not defined, bpf_kprobe_multi_link_attach() returns -EOPNOTSUPP.

Cscope tag: bpf_kprobe_multi_link_attach
   #   line  filename / context / line
   1    781  include/linux/trace_events.h <<GLOBAL>>
             int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
   2    826  include/linux/trace_events.h <<bpf_kprobe_multi_link_attach>>
             bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
   3   5606  kernel/bpf/syscall.c <<link_create>>
             ret = bpf_kprobe_multi_link_attach(attr, prog);
   4   2894  kernel/trace/bpf_trace.c <<bpf_kprobe_multi_link_attach>>
             int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
   5   3042  kernel/trace/bpf_trace.c <<bpf_kprobe_multi_link_attach>>
             int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
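
[Editor's note] A quick way to rule out the config side before digging
further is to check that both options are set to y in the kernel config.
Below is a minimal sketch of that check; the inline config fragment is a
stand-in sample, and on a real system you would grep /proc/config.gz or
/boot/config-$(uname -r) instead.

```shell
# Sample config fragment (stand-in for a real kernel config file).
# Both options must be =y for the bpf() BPF_LINK_CREATE command with
# BPF_TRACE_KPROBE_MULTI to reach the real implementation instead of
# the -EOPNOTSUPP stubs quoted above.
config_sample='CONFIG_BPF_EVENTS=y
CONFIG_FPROBE=y'

for opt in CONFIG_FPROBE CONFIG_BPF_EVENTS; do
  if printf '%s\n' "$config_sample" | grep -q "^${opt}=y"; then
    echo "${opt} enabled"
  else
    echo "${opt} missing"
  fi
done
```

Replacing the sample variable with the real config file path turns this
into a usable sanity check before rerunning the selftest.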

> > > Chenghao
> > >
> > >
> > > >
> > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > > > > #201     module_attach:FAIL
> > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > Successfully unloaded bpf_testmod.ko.
> > > > >
> > > > >
> > > > > Chenghao
> > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Chenghao
> > > > > > >
> > > > > > > >
> > > > > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Vincent,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Folks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > > > > > lockup immediately.
> > > > > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > > > > > > functions, I believe Changhao and I only tested the fentry feature for
> > > > > > > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > > > > > > patches and will let you know the result.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > > > > > > feature not supported
> > > > > > > > > > >
> > > > > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > > >
> > > > > > > > > > > All error logs:
> > > > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > > > > >
> > > > > > > > > > > I also tested loongarch-next branch with the trampoline patch series
> > > > > > > > > > > with no lockup kernel config so I can run dmesg to check kernel error
> > > > > > > > > > > log,  ./test_progs -t module_attach result in below kernel log:
> > > > > > > > > > >
> > > > > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > > > > 90000000041d5848
> > > > > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > > > > 9000000004163938
> > > > > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > > > > time, OOM is now expected behavior.
> > > > > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > > > > [  451.305581]         ...
> > > > > > > > > > > [  451.305584] Call Trace:
> > > > > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > > > > >
> > > > > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > > > > >
> > > > > > > > > > > So related to trampoline patches for sure unless I am missing something.
> > > > > > > > > > >
> > > > > > > > > > > > > Huacai
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Vincent

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel lockup on bpf selftests module_attach
  2025-08-22  5:22                         ` Vincent Li
  2025-08-22  5:33                           ` Vincent Li
@ 2025-08-22  5:36                           ` Chenghao Duan
  1 sibling, 0 replies; 18+ messages in thread
From: Chenghao Duan @ 2025-08-22  5:36 UTC (permalink / raw)
  To: Vincent Li; +Cc: Huacai Chen, loongarch, Hengqi Chen, Tiezhu Yang

On Thu, Aug 21, 2025 at 10:22:46PM -0700, Vincent Li wrote:
> On Thu, Aug 21, 2025 at 10:10 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> >
> > On Thu, Aug 21, 2025 at 8:11 PM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > >
> > > On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> > > > On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > > >
> > > > > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > > > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > > > > >
> > > > > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > > > > Hi Chenghao,
> > > > > > > >
> > > > > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Chenghao,
> > > > > > > > > >
> > > > > > > > > > Please take a look.
> > > > > > > > > >
> > > > > > > > > > Huacai
> > > > > > > > > >
> > > > > > > > > I reverted loongson-next branch  tailcall count fix patches, struct
> > > > > > > > > ops trampoline patch, keep the rest of trampoline patches,
> > > > > > > > > module_attach test experienced the same issue, so definitely
> > > > > > > > > trampoline patches issue.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I attempted to isolate which test in module_attach triggers the
> > > > > > > > "Unable to handle kernel paging request..." error, it appears to be
> > > > > > > > this one in "prog_tests/module_attach.c"
> > > > > > > >
> > > > > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > > > > >
> > > > > > > > you can try to comment out other tests in "prog_tests/module_attach.c"
> > > > > > > > and perform the test, it might help isolate the issue.
> > > > > > > >
> > > > > > >
> > > > > > > Hi Vincent,
> > > > > > >
> > > > > > > The results I tested are different from yours. Could there be other
> > > > > > > differences between us? I am using the latest code of the loongarch-next
> > > > > > > branch.
> > > > > > >
> > > > > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > > > > bpf_testmod.ko is already unloaded.
> > > > > > > Loading bpf_testmod.ko...
> > > > > > > Successfully loaded bpf_testmod.ko.
> > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > > > > bpf_testmod_test_read() is not modifiable
> > > > > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > > > > -- END PROG LOAD LOG --
> > > > > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > > > > libbpf: failed to load object 'test_module_attach'
> > > > > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > > > > #205     module_attach:FAIL
> > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > Successfully unloaded bpf_testmod.ko.
> > > > > > >
> > > > > >
> > > > > > I build and run the most recent loongarch-next kernel too, can you try
> > > > > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > > > > on fedora, here are the steps I build, run the kernel, and run the
> > > > > > test
> > > > > >
> > > > > > 1, check branch
> > > > > > [root@fedora linux-loongson]# git branch
> > > > > > * loongarch-next
> > > > > >   master
> > > > > >   no-tailcall
> > > > > >   no-trampoline
> > > > > >
> > > > > > 2, build kernel and reboot
> > > > > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > > > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > > > > >
> > > > > > 3, after reboot and login, build bpf selftests, run module_attach
> > > > > > test, dmesg to check kernel log
> > > > > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > > > > >
> > > > >
> > > > > Hi Vincent,
> > > > >
> > > > > I tried the config you provided, but the test results I obtained
> > > > > are as follows. I also specifically tested modify_return to verify
> > > > > the effectiveness of the patch; the module_attach test returns -EOPNOTSUPP.
> > > > >
> > > > > [root@localhost bpf]# ./test_progs -v -t modify_return
> > > > > bpf_testmod.ko is already unloaded.
> > > > > Loading bpf_testmod.ko...
> > > > > Successfully loaded bpf_testmod.ko.
> > > > > run_test:PASS:skel_load 0 nsec
> > > > > run_test:PASS:modify_return__attach failed 0 nsec
> > > > > run_test:PASS:test_run 0 nsec
> > > > > run_test:PASS:test_run ret 0 nsec
> > > > > run_test:PASS:modify_return side_effect 0 nsec
> > > > > run_test:PASS:modify_return fentry_result 0 nsec
> > > > > run_test:PASS:modify_return fexit_result 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > > > run_test:PASS:skel_load 0 nsec
> > > > > run_test:PASS:modify_return__attach failed 0 nsec
> > > > > run_test:PASS:test_run 0 nsec
> > > > > run_test:PASS:test_run ret 0 nsec
> > > > > run_test:PASS:modify_return side_effect 0 nsec
> > > > > run_test:PASS:modify_return fentry_result 0 nsec
> > > > > run_test:PASS:modify_return fexit_result 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > > > > run_test:PASS:modify_return fentry_result2 0 nsec
> > > > > run_test:PASS:modify_return fexit_result2 0 nsec
> > > > > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > > > > #200     modify_return:OK
> > > > > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > > > > Successfully unloaded bpf_testmod.ko.
> > > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > > bpf_testmod.ko is already unloaded.
> > > > > Loading bpf_testmod.ko...
> > > > > Successfully loaded bpf_testmod.ko.
> > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > > > > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> > > >
> > > > The -EOPNOTSUPP comes from libbpf, but I am not sure whether a kernel
> > > > error leads to the libbpf error or it originates in libbpf itself. You
> > > > can run strace -f -s1024 -o /tmp/module_attach.txt ./test_progs -v -t
> > > > module_attach. The strace output should include the bpf syscall, and it
> > > > should tell you whether the -EOPNOTSUPP is the result of a kernel error
> > > > or of libbpf; you can share the strace output if you want.
> > > >
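For example, once the strace log is saved, the failing bpf(2) calls can be pulled out with a single grep. A sketch (the log path and its contents below are fabricated stand-ins for the real strace output):

```shell
# Fabricated stand-in for the real strace output file.
cat > /tmp/module_attach.txt <<'EOF'
2037  openat(AT_FDCWD, "bpf_testmod.ko", O_RDONLY) = 16
2037  bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACING, ...}, 148) = 61
2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, attach_type=BPF_TRACE_KPROBE_MULTI, ...}}, 64) = -1 EOPNOTSUPP (Operation not supported)
EOF
# Keep only bpf() syscalls that failed (returned -1 with an errno name);
# here that is just the BPF_LINK_CREATE line.
grep -E 'bpf\(.* = -1 [A-Z]+' /tmp/module_attach.txt
```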
> > > 2037  read(16, "", 8192)                = 0
> > > 2037  close(16)                         = 0
> > > 2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)
> >
> > So the bpf syscall cmd BPF_LINK_CREATE returns '-1 EOPNOTSUPP' exactly? I
> > could not tell before because I thought the return value was just '-1'.
> >
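As an aside on the strace notation: "= -1 EOPNOTSUPP" means the raw syscall returned -1 with errno set to EOPNOTSUPP, and libbpf reports the negated errno, which is the -95 in the selftest output. A quick sketch to confirm the mapping (the sample line below is fabricated):

```shell
# strace prints "= -1 ERRNONAME"; libbpf turns that into -errno.
line='bpf(BPF_LINK_CREATE, {...}, 64) = -1 EOPNOTSUPP (Operation not supported)'
# Extract the errno name from the strace line.
name=$(printf '%s\n' "$line" | sed -n 's/.* = -1 \([A-Z]*\).*/\1/p')
# Map it to the negative value libbpf reports (-95 on Linux).
python3 -c "import errno; print('$name ->', -getattr(errno, '$name'))"
```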
> > > 2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
> > > 2037  write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
> > > 2037  write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64
> > >
> > > So attach_type BPF_TRACE_KPROBE_MULTI is not supported.
> > >
> >
> > Could you share your kernel config (the .config used to compile the
> > kernel, or /boot/config-* from the running kernel)? I wonder whether you
> > really have CONFIG_FPROBE enabled, since include/linux/fprobe.h has:
> >
> > #ifdef CONFIG_FPROBE
> > int register_fprobe(struct fprobe *fp, const char *filter, const char
> > *notfilter);
> > int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
> > int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
> > int unregister_fprobe(struct fprobe *fp);
> > bool fprobe_is_registered(struct fprobe *fp);
> > int fprobe_count_ips_from_filter(const char *filter, const char *notfilter);
> > #else
> > static inline int register_fprobe(struct fprobe *fp, const char
> > *filter, const char *notfilter)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline int register_fprobe_ips(struct fprobe *fp, unsigned long
> > *addrs, int num)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline int register_fprobe_syms(struct fprobe *fp, const char
> > **syms, int num)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline int unregister_fprobe(struct fprobe *fp)
> > {
> >         return -EOPNOTSUPP;
> > }
> > static inline bool fprobe_is_registered(struct fprobe *fp)
> > {
> >         return false;
> > }
> > static inline int fprobe_count_ips_from_filter(const char *filter,
> > const char *notfilter)
> > {
> >         return -EOPNOTSUPP;
> > }
> > #endif
> >
> 
> And also check CONFIG_BPF_EVENTS, since include/linux/trace_events.h has:
> 
> #ifdef CONFIG_BPF_EVENTS
> unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> int perf_event_attach_bpf_prog(struct perf_event *event, struct
> bpf_prog *prog, u64 bpf_cookie);
> void perf_event_detach_bpf_prog(struct perf_event *event);
> int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> 
> struct bpf_raw_tp_link;
> int bpf_probe_register(struct bpf_raw_event_map *btp, struct
> bpf_raw_tp_link *link);
> int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct
> bpf_raw_tp_link *link);
> 
> struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
> void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
> int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>                             u32 *fd_type, const char **buf,
>                             u64 *probe_offset, u64 *probe_addr,
>                             unsigned long *missed);
> int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct
> bpf_prog *prog);
> int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct
> bpf_prog *prog);
> #else
> static inline unsigned int trace_call_bpf(struct trace_event_call
> *call, void *ctx)
> {
>         return 1;
> }
> 
> static inline int
> perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog
> *prog, u64 bpf_cookie)
> {
>         return -EOPNOTSUPP;
> }
> 
> static inline void perf_event_detach_bpf_prog(struct perf_event *event) { }
> 
> static inline int
> perf_event_query_prog_array(struct perf_event *event, void __user *info)
> {
>         return -EOPNOTSUPP;
> }
> struct bpf_raw_tp_link;
> static inline int bpf_probe_register(struct bpf_raw_event_map *btp,
> struct bpf_raw_tp_link *link)
> {
>         return -EOPNOTSUPP;
> }
> static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp,
> struct bpf_raw_tp_link *link)
> {
>         return -EOPNOTSUPP;
> }
> static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
> {
>         return NULL;
> }
> static inline void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
> {
> }
> static inline int bpf_get_perf_event_info(const struct perf_event *event,
>                                           u32 *prog_id, u32 *fd_type,
>                                           const char **buf, u64 *probe_offset,
>                                           u64 *probe_addr, unsigned
> long *missed)
> {
>         return -EOPNOTSUPP;
> }
> static inline int
> bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> {
>         return -EOPNOTSUPP;
> }
> static inline int
> bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> {
>         return -EOPNOTSUPP;
> }
> #endif

I checked the config and used the most straightforward method to verify it:
intentionally adding compile errors inside the #ifdef/#else branches to see
which branch is actually compiled.

> 
> > > Chenghao
> > >
> > >
> > > >
> > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > > > > #201     module_attach:FAIL
> > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > Successfully unloaded bpf_testmod.ko.
> > > > >
> > > > >
> > > > > Chenghao
> > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Chenghao
> > > > > > >
> > > > > > > >
> > > > > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Vincent,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Folks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > > > > running full bpf selftests, so I went ahead and ran make run_tests to
> > > > > > > > > > > > > > perform full bpf selftest, I observed lockup too. It appears the
> > > > > > > > > > > > > > lockup happens when running module_attach test which includes testing
> > > > > > > > > > > > > > on fentry so this could be related to the trampoline patch series. for
> > > > > > > > > > > > > > example, if I just run ./test_progs -t module_attach, the kernel
> > > > > > > > > > > > > > lockup immediately.
> > > > > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > > > > another word, Does vanilla 6.16 has this problem?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I suspect this is caused by the latest trampoline patches because the
> > > > > > > > > > > > module_attach is to test the fentry feature for kernel module
> > > > > > > > > > > > functions, I believe Chenghao and I only tested the fentry feature for
> > > > > > > > > > > > non-module kernel functions. I can try kernel without the trampoline
> > > > > > > > > > > > patches and will let you know the result.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I reverted  trampoline patches from loongarch-next branch and run
> > > > > > > > > > > ./test_progs -t module_attach simply just errors out with the fentry
> > > > > > > > > > > feature not supported
> > > > > > > > > > >
> > > > > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > > >
> > > > > > > > > > > All error logs:
> > > > > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > > > > #205     module_attach:FAIL
> > > > > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > > > > >
> > > > > > > > > > > I also tested the loongarch-next branch with the trampoline patch series
> > > > > > > > > > > using a kernel config that avoids the lockup, so I could run dmesg to
> > > > > > > > > > > check the kernel error log; ./test_progs -t module_attach results in the
> > > > > > > > > > > kernel log below:
> > > > > > > > > > >
> > > > > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > > > > 90000000041d5848
> > > > > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > > > > 9000000004163938
> > > > > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > > > > time, OOM is now expected behavior.
> > > > > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > > > > [  451.305581]         ...
> > > > > > > > > > > [  451.305584] Call Trace:
> > > > > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > > > > >
> > > > > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > > > > >
> > > > > > > > > > > So this is definitely related to the trampoline patches, unless I am missing something.
> > > > > > > > > > >
> > > > > > > > > > > > > Huacai
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > A side note, if I put the module_attach test in
> > > > > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the module_attach test,
> > > > > > > > > > > > > > the module_attach test is not skipped.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Vincent

end of thread, other threads:[~2025-08-22  5:36 UTC | newest]

Thread overview: 18+ messages
2025-08-09  8:15 kernel lockup on bpf selftests module_attach Vincent Li
2025-08-09  3:03 ` Huacai Chen
2025-08-09  3:48   ` Vincent Li
2025-08-09  5:03     ` Vincent Li
2025-08-09  6:02       ` Huacai Chen
2025-08-09 19:11         ` Vincent Li
2025-08-10 17:39           ` Vincent Li
2025-08-12  8:34             ` Chenghao Duan
2025-08-12 13:42               ` Vincent Li
2025-08-14 12:00                 ` Chenghao Duan
2025-08-14 13:42                   ` Vincent Li
2025-08-14 13:47                     ` Vincent Li
2025-08-21 15:04                   ` Vincent Li
2025-08-22  3:11                     ` Chenghao Duan
2025-08-22  5:10                       ` Vincent Li
2025-08-22  5:22                         ` Vincent Li
2025-08-22  5:33                           ` Vincent Li
2025-08-22  5:36                           ` Chenghao Duan
