loongarch.lists.linux.dev archive mirror
From: Chenghao Duan <duanchenghao@kylinos.cn>
To: Vincent Li <vincent.mc.li@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>,
	loongarch@lists.linux.dev, Hengqi Chen <hengqi.chen@gmail.com>,
	Tiezhu Yang <yangtiezhu@loongson.cn>
Subject: Re: kernel lockup on bpf selftests module_attach
Date: Fri, 22 Aug 2025 11:11:02 +0800	[thread overview]
Message-ID: <20250822031102.GA331509@chenghao-pc> (raw)
In-Reply-To: <CAK3+h2zO5bMnGTNb30=ggi8bg1-+gbaP1HfBoatJ4FVMuRgZdw@mail.gmail.com>

On Thu, Aug 21, 2025 at 08:04:07AM -0700, Vincent Li wrote:
> On Thu, Aug 14, 2025 at 5:00 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> >
> > On Tue, Aug 12, 2025 at 06:42:08AM -0700, Vincent Li wrote:
> > > On Tue, Aug 12, 2025 at 1:34 AM Chenghao Duan <duanchenghao@kylinos.cn> wrote:
> > > >
> > > > On Sun, Aug 10, 2025 at 10:39:24AM -0700, Vincent Li wrote:
> > > > > Hi Chenghao,
> > > > >
> > > > > On Sat, Aug 9, 2025 at 12:11 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 8, 2025 at 11:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi, Chenghao,
> > > > > > >
> > > > > > > Please take a look.
> > > > > > >
> > > > > > > Huacai
> > > > > > >
> > > > > > I reverted the tailcall count fix patches and the struct_ops
> > > > > > trampoline patch from the loongson-next branch, keeping the rest
> > > > > > of the trampoline patches. The module_attach test experienced the
> > > > > > same issue, so it is definitely an issue in the trampoline patches.
> > > > > >
> > > > >
> > > > > I attempted to isolate which test in module_attach triggers the
> > > > > "Unable to handle kernel paging request..." error; it appears to be
> > > > > this one in "prog_tests/module_attach.c":
> > > > >
> > > > > ASSERT_OK(trigger_module_test_read(READ_SZ), "trigger_read");
> > > > >
> > > > > You can try commenting out the other tests in
> > > > > "prog_tests/module_attach.c" and rerunning; it might help isolate
> > > > > the issue.
> > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > My test results differ from yours. Could there be other differences
> > > > between our setups? I am using the latest code from the
> > > > loongarch-next branch.
> > > >
> > > > [root@localhost bpf]# ./test_progs -v -t module_attach
> > > > bpf_testmod.ko is already unloaded.
> > > > Loading bpf_testmod.ko...
> > > > Successfully loaded bpf_testmod.ko.
> > > > test_module_attach:PASS:skel_open 0 nsec
> > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > WATCHDOG: test case module_attach executes for 10 seconds...
> > > > libbpf: prog 'handle_fmod_ret': BPF program load failed: -EINVAL
> > > > libbpf: prog 'handle_fmod_ret': -- BEGIN PROG LOAD LOG --
> > > > bpf_testmod_test_read() is not modifiable
> > > > processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> > > > -- END PROG LOAD LOG --
> > > > libbpf: prog 'handle_fmod_ret': failed to load: -EINVAL
> > > > libbpf: failed to load object 'test_module_attach'
> > > > libbpf: failed to load BPF skeleton 'test_module_attach': -EINVAL
> > > > test_module_attach:FAIL:skel_load failed to load skeleton
> > > > #205     module_attach:FAIL
> > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > Successfully unloaded bpf_testmod.ko.
> > > >
> > >
> > > I built and ran the most recent loongarch-next kernel too; can you try
> > > my config https://www.bpfire.net/download/loongfire/config.txt? I am
> > > on Fedora. Here are the steps I use to build the kernel, boot it, and
> > > run the test:
> > >
> > > 1, check branch
> > > [root@fedora linux-loongson]# git branch
> > > * loongarch-next
> > >   master
> > >   no-tailcall
> > >   no-trampoline
> > >
> > > 2, build kernel and reboot
> > > cp config.txt .config; make clean; make -j6; make modules_install;
> > > make install; grub2-mkconfig -o /boot/grub2/grub.cfg; reboot
> > >
> > > 3, after reboot and login, build the bpf selftests, run the
> > > module_attach test, and check the kernel log with dmesg
> > > cd tools/testing/selftests/bpf; make -j6; ./test_progs -t module_attach
> > >
> >
> > Hi Vincent,
> >
> > I tried the config you provided, but the test results I obtained are as
> > follows. I also specifically ran the modify_return test to verify the
> > effectiveness of the patch, and the module_attach test returns -EOPNOTSUPP.
> >
> > [root@localhost bpf]# ./test_progs -v -t modify_return
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > run_test:PASS:skel_load 0 nsec
> > run_test:PASS:modify_return__attach failed 0 nsec
> > run_test:PASS:test_run 0 nsec
> > run_test:PASS:test_run ret 0 nsec
> > run_test:PASS:modify_return side_effect 0 nsec
> > run_test:PASS:modify_return fentry_result 0 nsec
> > run_test:PASS:modify_return fexit_result 0 nsec
> > run_test:PASS:modify_return fmod_ret_result 0 nsec
> > run_test:PASS:modify_return fentry_result2 0 nsec
> > run_test:PASS:modify_return fexit_result2 0 nsec
> > run_test:PASS:modify_return fmod_ret_result2 0 nsec
> > #200     modify_return:OK
> > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> > Successfully unloaded bpf_testmod.ko.
> > [root@localhost bpf]# ./test_progs -v -t module_attach
> > bpf_testmod.ko is already unloaded.
> > Loading bpf_testmod.ko...
> > Successfully loaded bpf_testmod.ko.
> > test_module_attach:PASS:skel_open 0 nsec
> > test_module_attach:PASS:set_attach_target 0 nsec
> > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > test_module_attach:PASS:skel_load 0 nsec
> > libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP
> > libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP
> 
> The -EOPNOTSUPP comes from libbpf, but I am not sure whether a kernel
> error leads to the libbpf error or it originates in libbpf itself. You
> can run: strace -f -s1024 -o /tmp/module_attach.txt ./test_progs -v -t
> module_attach. The strace output should contain the bpf() syscalls, and
> I think it can tell you whether the -EOPNOTSUPP is the result of a
> kernel error or of libbpf. You can share the strace output if you want.
> 
2037  read(16, "", 8192)                = 0
2037  close(16)                         = 0
2037  bpf(BPF_LINK_CREATE, {link_create={prog_fd=61, target_fd=0, attach_type=BPF_TRACE_KPROBE_MULTI, flags=0, kprobe_multi={flags=0, cnt=1, syms=NULL, addrs=[0xffff8000035717d0], cookies=NULL}}}, 64) = -1 EOPNOTSUPP (Operation not supported)
2037  write(1, "libbpf: prog 'kprobe_multi': failed to attach: -EOPNOTSUPP\n", 59) = 59
2037  write(1, "libbpf: prog 'kprobe_multi': failed to auto-attach: -EOPNOTSUPP\n", 64) = 64
2037  write(1, "test_module_attach:FAIL:skel_attach skeleton attach failed: -95\n", 64) = 64
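For reference, the bpf() syscalls can be pulled out of the trace like this (file name as in your suggested command; adjust the path as needed):

```shell
# Trace the failing test, following forked children, with long strings
strace -f -s1024 -o /tmp/module_attach.txt ./test_progs -v -t module_attach

# Show only the bpf() syscalls and their return values
grep 'bpf(' /tmp/module_attach.txt
```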

The kernel does not support attach_type BPF_TRACE_KPROBE_MULTI.
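If I understand correctly, BPF_TRACE_KPROBE_MULTI links are implemented on top of fprobe, so a kernel built without CONFIG_FPROBE would explain the -EOPNOTSUPP from BPF_LINK_CREATE. A quick sanity check (the config path assumes the usual Fedora layout; adjust if needed):

```shell
# kprobe_multi links need fprobe; without CONFIG_FPROBE the
# BPF_LINK_CREATE call returns -EOPNOTSUPP
CFG="/boot/config-$(uname -r)"
if grep -q '^CONFIG_FPROBE=y' "$CFG"; then
    echo "CONFIG_FPROBE enabled"
else
    echo "CONFIG_FPROBE not set: kprobe_multi attach will return -EOPNOTSUPP"
fi
```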

Chenghao


> 
> > test_module_attach:FAIL:skel_attach skeleton attach failed: -95
> > #201     module_attach:FAIL
> > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > Successfully unloaded bpf_testmod.ko.
> >
> >
> > Chenghao
> >
> > >
> > > >
> > > >
> > > > Chenghao
> > > >
> > > > >
> > > > > > > On Sat, Aug 9, 2025 at 1:03 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Aug 8, 2025 at 8:48 PM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Aug 8, 2025 at 8:03 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Vincent,
> > > > > > > > > >
> > > > > > > > > > On Sat, Aug 9, 2025 at 12:53 AM Vincent Li <vincent.mc.li@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Folks,
> > > > > > > > > > >
> > > > > > > > > > > Hengqi mentioned offline that the loongarch kernel locked up when
> > > > > > > > > > > running the full bpf selftests, so I went ahead and ran make run_tests
> > > > > > > > > > > to perform the full bpf selftest, and I observed the lockup too. It
> > > > > > > > > > > appears the lockup happens when running the module_attach test, which
> > > > > > > > > > > includes testing fentry, so this could be related to the trampoline
> > > > > > > > > > > patch series. For example, if I just run ./test_progs -t
> > > > > > > > > > > module_attach, the kernel locks up immediately.
> > > > > > > > > > Is this a regression caused by the latest trampoline patches? In
> > > > > > > > > > other words, does vanilla 6.16 have this problem?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I suspect this is caused by the latest trampoline patches because
> > > > > > > > > module_attach tests the fentry feature for kernel module
> > > > > > > > > functions, and I believe Chenghao and I only tested the fentry
> > > > > > > > > feature for non-module kernel functions. I can try a kernel
> > > > > > > > > without the trampoline patches and will let you know the result.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I reverted the trampoline patches from the loongarch-next branch,
> > > > > > > > and running ./test_progs -t module_attach simply errors out with
> > > > > > > > the fentry feature not supported:
> > > > > > > >
> > > > > > > > [root@fedora bpf]# ./test_progs -t module_attach
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205     module_attach:FAIL
> > > > > > > >
> > > > > > > > All error logs:
> > > > > > > > test_module_attach:PASS:skel_open 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target 0 nsec
> > > > > > > > test_module_attach:PASS:set_attach_target_explicit 0 nsec
> > > > > > > > test_module_attach:PASS:skel_load 0 nsec
> > > > > > > > libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
> > > > > > > > libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
> > > > > > > > test_module_attach:FAIL:skel_attach skeleton attach failed: -524
> > > > > > > > #205     module_attach:FAIL
> > > > > > > > Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
> > > > > > > >
> > > > > > > > I also tested the loongarch-next branch with the trampoline patch
> > > > > > > > series using a kernel config that avoids the lockup, so I could
> > > > > > > > run dmesg to check the kernel error log. ./test_progs -t
> > > > > > > > module_attach results in the kernel log below:
> > > > > > > >
> > > > > > > > [  417.429954] bpf_testmod: loading out-of-tree module taints kernel.
> > > > > > > > [  419.728620] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000800000024, era == 90000000041d5854, ra ==
> > > > > > > > 90000000041d5848
> > > > > > > > [  419.728629] Oops[#1]:
> > > > > > > > [  419.728632] CPU 70475748 Unable to handle kernel paging request at
> > > > > > > > virtual address 0000000000000018, era == 9000000005750268, ra ==
> > > > > > > > 9000000004163938
> > > > > > > > [  441.305370] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > > > > [  441.305380] rcu:     5-...0: (29 ticks this GP)
> > > > > > > > idle=eb74/1/0x4000000000000000 softirq=72377/72379 fqs=2599
> > > > > > > > [  441.305386] rcu:     (detected by 4, t=5252 jiffies, g=60333, q=186 ncpus=8)
> > > > > > > > [  441.305390] Sending NMI from CPU 4 to CPUs 5:
> > > > > > > > [  451.305494] rcu: rcu_preempt kthread starved for 2499 jiffies!
> > > > > > > > g60333 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=1
> > > > > > > > [  451.305500] rcu:     Unless rcu_preempt kthread gets sufficient CPU
> > > > > > > > time, OOM is now expected behavior.
> > > > > > > > [  451.305502] rcu: RCU grace-period kthread stack dump:
> > > > > > > > [  451.305504] task:rcu_preempt     state:R stack:0     pid:15
> > > > > > > > tgid:15    ppid:2      task_flags:0x208040 flags:0x00000800
> > > > > > > > [  451.305510] Stack : 9000000100467e80 0000000000000402
> > > > > > > > 0000000000000010 90000001003b0680
> > > > > > > > [  451.305519]         90000000058e0000 0000000000000000
> > > > > > > > 0000000000000040 9000000006c2dfd0
> > > > > > > > [  451.305526]         900000000578c9b0 0000000000000001
> > > > > > > > 9000000006b21000 0000000000000005
> > > > > > > > [  451.305533]         00000001000093a8 00000001000093a8
> > > > > > > > 0000000000000000 0000000000000004
> > > > > > > > [  451.305540]         90000000058f04e0 0000000000000000
> > > > > > > > 0000000000000002 b793724be1dfb2b8
> > > > > > > > [  451.305547]         00000001000093a9 b793724be1dfb2b8
> > > > > > > > 000000000000003f 9000000006c2dfd0
> > > > > > > > [  451.305554]         9000000006c30c18 0000000000000005
> > > > > > > > 9000000006b0e000 9000000006b21000
> > > > > > > > [  451.305560]         9000000100453c98 90000001003aff80
> > > > > > > > 9000000006c31140 900000000578c9b0
> > > > > > > > [  451.305567]         00000001000093a8 9000000005794d3c
> > > > > > > > 00000000000000b4 0000000000000000
> > > > > > > > [  451.305574]         90000000024021b8 00000001000093a8
> > > > > > > > 9000000004284f20 000000000a400001
> > > > > > > > [  451.305581]         ...
> > > > > > > > [  451.305584] Call Trace:
> > > > > > > > [  451.305586] [<900000000578b868>] __schedule+0x410/0x1520
> > > > > > > > [  451.305595] [<900000000578c9ac>] schedule+0x34/0x190
> > > > > > > > [  451.305599] [<9000000005794d38>] schedule_timeout+0x98/0x140
> > > > > > > > [  451.305604] [<9000000004258f40>] rcu_gp_fqs_loop+0x5f8/0x868
> > > > > > > > [  451.305609] [<900000000425d358>] rcu_gp_kthread+0x260/0x2e0
> > > > > > > > [  451.305614] [<90000000041be704>] kthread+0x144/0x238
> > > > > > > > [  451.305619] [<9000000005787b60>] ret_from_kernel_thread+0x28/0xc8
> > > > > > > > [  451.305624] [<90000000041620e4>] ret_from_kernel_thread_asm+0xc/0x88
> > > > > > > >
> > > > > > > > [  451.305630] rcu: Stack dump where RCU GP kthread last ran:
> > > > > > > > [  451.305633] Sending NMI from CPU 4 to CPUs 1:
> > > > > > > > [  451.305636] NMI backtrace for cpu 1 skipped: idling at idle_exit+0x0/0x4
> > > > > > > > [  451.306655] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > > > CPUs/tasks: { 5-...D } 7298 jiffies s: 853 root: 0x20/.
> > > > > > > > [  451.306665] rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > > > [  451.306669] Sending NMI from CPU 6 to CPUs 5:
> > > > > > > > [  451.306672] Unable to send backtrace IPI to CPU5 - perhaps it hung?
> > > > > > > >
> > > > > > > > So it is related to the trampoline patches for sure, unless I am missing something.
> > > > > > > >
> > > > > > > > > > Huacai
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > A side note: if I add module_attach to
> > > > > > > > > > > tools/testing/selftests/bpf/DENYLIST to skip the
> > > > > > > > > > > module_attach test, the test is not skipped.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > Vincent


Thread overview: 18+ messages
2025-08-09  8:15 kernel lockup on bpf selftests module_attach Vincent Li
2025-08-09  3:03 ` Huacai Chen
2025-08-09  3:48   ` Vincent Li
2025-08-09  5:03     ` Vincent Li
2025-08-09  6:02       ` Huacai Chen
2025-08-09 19:11         ` Vincent Li
2025-08-10 17:39           ` Vincent Li
2025-08-12  8:34             ` Chenghao Duan
2025-08-12 13:42               ` Vincent Li
2025-08-14 12:00                 ` Chenghao Duan
2025-08-14 13:42                   ` Vincent Li
2025-08-14 13:47                     ` Vincent Li
2025-08-21 15:04                   ` Vincent Li
2025-08-22  3:11                     ` Chenghao Duan [this message]
2025-08-22  5:10                       ` Vincent Li
2025-08-22  5:22                         ` Vincent Li
2025-08-22  5:33                           ` Vincent Li
2025-08-22  5:36                           ` Chenghao Duan
