From: Chenghao Duan <duanchenghao@kylinos.cn>
To: Vincent Li <vincent.mc.li@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>,
loongarch@lists.linux.dev, Hengqi Chen <hengqi.chen@gmail.com>,
Tiezhu Yang <yangtiezhu@loongson.cn>
Subject: Re: kernel lockup on bpf selftests module_attach
Date: Fri, 22 Aug 2025 11:11:02 +0800
Message-ID: <20250822031102.GA331509@chenghao-pc>
In-Reply-To: <CAK3+h2zO5bMnGTNb30=ggi8bg1-+gbaP1HfBoatJ4FVMuRgZdw@mail.gmail.com>
On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> >
> > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > >
> > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > Hi Chenghao,
> > > > >
> > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi, Chenghao,
> > > > > > >
> > > > > > > Please take a look.
> > > > > > >
> > > > > > > Huacai
> > > > > > >
> > > > > > I reverted the loongarch-next branch tailcall count fix patches
> > > > > > and the struct ops trampoline patch, kept the rest of the
> > > > > > trampoline patches, and the module_attach test hit the same
> > > > > > issue, so it is definitely a trampoline patches issue.
> > > > > >
> > > > >
> > > > > I attempted to isolate which test in module_attach triggers the
> > > > > "Unable to handle kernel paging request..." error; it appears to be
> > > > > this one in "prog_tests/module_attach.c":
> > > > >
> > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > >
> > > > > You can try commenting out the other tests in "prog_tests/module_attach.c"
> > > > > and rerunning; that might help isolate the issue.
> > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > My test results are different from yours. Could there be other
> > > > differences between our setups? I am using the latest code of the
> > > > loongarch-next branch.
> > > >
> > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > bpf_testmod.ko is already unloaded.
> > > > Loading bpf_testmod.ko...
> > > > Successfully loaded bpf_testmod.ko.
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > bpf_testmod_test_read() is not modifiable
> > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > -- END PROG LOAD LOG --
> > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > libbpf: failed to load object 'test_module_attach'
> > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > #205 module_attach:FAIL
> > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > Successfully unloaded bpf_testmod.ko.
> > > >
> > >
> > > I built and ran the most recent loongarch-next kernel too; can you try
> > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > on Fedora. Here are the steps I use to build the kernel, boot it, and
> > > run the test:
> > >
> > > 1, check branch
> > > [root@fedora linux-loongson]# git branch
> > > * loongarch-next
> > > master
> > > no-tailcall
> > > no-trampoline
> > >
> > > 2, build kernel and reboot
> > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > >
> > > 3, after reboot and login, build bpf selftests, run module_attach
> > > test, dmesg to check kernel log
> > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > >
> >
> > Hi Vincent,
> >
> > I built with the config you provided, but the test results I got are
> > below. I also specifically ran "modify_return" to verify that the patch
> > is effective, while the module_attach test returns -EOPNOTSUPP.
> >
> > [root@localhost bpf]# ./test_progs -v -t modify_return
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > #200 modify_return:OK
> > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > Successfully unloaded bpf_testmod.ko.
> > [root@localhost bpf]# ./test_progs -v -t module_attach
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > test_module_attach:PASS:skel_load 0 nsec
> > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
>
> the -EOPNOTSUPP comes from libbpf, but I am not sure whether a kernel
> error leads to the libbpf error or it is libbpf itself. You can run
> strace -f -s1024 -o /tmp/module_attatch.txt ./test_progs -v -t module_attach.
> The strace log should contain the bpf() syscalls, and I think it can tell
> you whether the -EOPNOTSUPP is the result of a kernel error or of libbpf.
> You can share the strace log if you want.
>
2037 read(16, "", 8192) = 0
2037 close(16) = 0
2037 bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)
2037 write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
2037 write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
2037 write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64
So the kernel here does not support the attach type BPF_TRACE_KPROBE_MULTI.
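This kernel-side origin can be confirmed mechanically from an strace log like the excerpt above. Below is a minimal sketch; the bpf_errors helper and its regex are my own illustration, based only on the log format shown in this thread. The idea is that a failure reported by the bpf() syscall itself (the `= -1 E...` return) means the error originated in the kernel, not in libbpf:

```python
import re

def bpf_errors(strace_log):
    """Return (command, errno) pairs for failed bpf() syscalls in an
    strace log. Lines where bpf() succeeded, or other syscalls, are
    ignored, so any hit here points at a kernel-side rejection."""
    pat = re.compile(r'bpf\((\w+),.*= -1 (E\w+)')
    return [(m.group(1), m.group(2))
            for line in strace_log.splitlines()
            if (m := pat.search(line))]

# Abbreviated sample line in the format strace produced above.
sample = ('2037 bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, ...}}, 64) '
          '= -1 EOPNOTSUPP (Operation not supported)')
print(bpf_errors(sample))  # [('BPF_LINK_CREATE', 'EOPNOTSUPP')]
```

Running it over the whole /tmp/module_attatch.txt capture would list every bpf() command the kernel rejected, separating them from errors synthesized inside libbpf.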
Chenghao
>
> > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > #201 module_attach:FAIL
> > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > Successfully unloaded bpf_testmod.ko.
> >
> >
> > Chenghao
> >
> > >
> > > >
> > > >
> > > > Chenghao
> > > >
> > > > >
> > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Vincent,
> > > > > > > > > >
> > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Folks,
> > > > > > > > > > >
> > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > running the full bpf selftests, so I went ahead and ran make run_tests
> > > > > > > > > > > to perform the full bpf selftest, and I observed the lockup too. It
> > > > > > > > > > > appears the lockup happens when running the module_attach test, which
> > > > > > > > > > > includes testing fentry, so this could be related to the trampoline
> > > > > > > > > > > patch series. For example, if I just run ./test_progs -t module_attach,
> > > > > > > > > > > the kernel locks up immediately.
> > > > > > > > > > Is this a regression caused by the latest trampoline patches? Or in
> > > > > > > > > > other words, does vanilla 6.16 have this problem?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I suspect this is caused by the latest trampoline patches, because
> > > > > > > > > module_attach tests the fentry feature for kernel module functions,
> > > > > > > > > and I believe Chenghao and I only tested the fentry feature for
> > > > > > > > > non-module kernel functions. I can try a kernel without the
> > > > > > > > > trampoline patches and will let you know the result.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I reverted the trampoline patches from the loongarch-next branch, and
> > > > > > > > now ./test_progs -t module_attach simply errors out with the fentry
> > > > > > > > feature not supported:
> > > > > > > >
> > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205 module_attach:FAIL
> > > > > > > >
> > > > > > > > All error logs:
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205 module_attach:FAIL
> > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > >
> > > > > > > > I also tested the loongarch-next branch with the trampoline patch
> > > > > > > > series using a kernel config that does not lock up, so I can run
> > > > > > > > dmesg to check the kernel error log; ./test_progs -t module_attach
> > > > > > > > results in the kernel log below:
> > > > > > > >
> > > > > > > > [ 417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > [ 419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > 90000000041d5848
> > > > > > > > [ 419.728629] Oops[#1]:
> > > > > > > > [ 419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > 9000000004163938
> > > > > > > > [ 441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > [ 441.305380] rcu: 5-...0: (29 ticks this GP)
> > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > [ 441.305386] rcu: (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > [ 441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > [ 451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > [ 451.305500] rcu: Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > time, OOM is now expected behavior.
> > > > > > > > [ 451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > [ 451.305504] task:rcu_preempt state:R stack:0 pid:15
> > > > > > > > tgid:15 ppid:2 task_flags:0x208040 flags:0x00000800
> > > > > > > > [ 451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > [ 451.305519] 90000000058e0000 0000000000000000
> > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > [ 451.305526] 900000000578c9b0 0000000000000001
> > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > [ 451.305533] 00000001000093a8 00000001000093a8
> > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > [ 451.305540] 90000000058f04e0 0000000000000000
> > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > [ 451.305547] 00000001000093a9 b793724be1dfb2b8
> > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > [ 451.305554] 9000000006c30c18 0000000000000005
> > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > [ 451.305560] 9000000100453c98 90000001003aff80
> > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > [ 451.305567] 00000001000093a8 9000000005794d3c
> > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > [ 451.305574] 90000000024021b8 00000001000093a8
> > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > [ 451.305581] ...
> > > > > > > > [ 451.305584] Call Trace:
> > > > > > > > [ 451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > [ 451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > [ 451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > [ 451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > [ 451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > [ 451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > [ 451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > [ 451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > >
> > > > > > > > [ 451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > [ 451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > [ 451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > [ 451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > [ 451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > [ 451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > [ 451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > >
> > > > > > > > So it is related to the trampoline patches for sure, unless I am missing something.
> > > > > > > >
> > > > > > > > > > Huacai
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > A side note: if I put the module_attach test in
> > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip it, the
> > > > > > > > > > > module_attach test is still not skipped.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Vincent