* [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
@ 2026-07-01 12:14 Pu Hu
2026-07-01 12:30 ` Pu Hu
2026-07-01 13:43 ` Masami Hiramatsu
0 siblings, 2 replies; 7+ messages in thread
From: Pu Hu @ 2026-07-01 12:14 UTC (permalink / raw)
To: catalin.marinas@arm.com, will@kernel.org, naveen@kernel.org,
davem@davemloft.net, mhiramat@kernel.org,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Pu Hu
From: hupu <hupu@transsion.com>
This series fixes two arm64 kprobes issues observed when running
simpleperf with preemptirq tracepoints and dwarf callchains while a
kprobe is active on a frequently executed kernel function.
The crash happens in the kprobe debug exception path. While a kprobe is
preparing or executing its XOL single-step instruction, perf/trace code
can run in the same window. That code may either take a fault of its own
or hit another kprobe.
Patch 1 makes kprobe_fault_handler() handle a fault in
KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
current kprobe's XOL instruction. Otherwise the fault is left to the
normal fault handling path.
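
The check described for Patch 1 can be pictured with a small userspace model. This is only a sketch of the logic as stated above, not the patch itself: the names `xol_status`, `xol_probe` and `xol_fault_belongs_to_kprobe` are hypothetical, and the real kprobe_fault_handler() works on pt_regs and the per-CPU kprobe control block rather than on plain arguments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the kprobe status values named in the cover letter. */
enum xol_status {
	XOL_HIT_ACTIVE,
	XOL_HIT_SS,
	XOL_HIT_SSDONE,
	XOL_REENTER,
};

/*
 * Hypothetical, simplified view of a probe: the probed address and the
 * address of its out-of-line (XOL) copy of the original instruction.
 */
struct xol_probe {
	uintptr_t addr;
	uintptr_t xol_insn;
};

/*
 * Return true only when the fault was raised by the single-stepped XOL
 * instruction of the current probe. Any other fault taken in the same
 * window (for example perf copying user stack memory) is reported as
 * "not a kprobe fault" so that the normal fault path can deal with it.
 */
bool xol_fault_belongs_to_kprobe(enum xol_status status,
				 const struct xol_probe *cur,
				 uintptr_t fault_pc)
{
	if (cur == NULL)
		return false;
	if (status != XOL_HIT_SS && status != XOL_REENTER)
		return false;
	return fault_pc == cur->xol_insn;
}
```

In this model a fault whose PC is the probed address itself, or any address inside perf code, falls through to the regular handler even though the status is still KPROBE_HIT_SS.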
Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
recoverable one-level reentry. Only a hit while already in
KPROBE_REENTER remains unrecoverable.
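
The reentry rule described for Patch 2 amounts to a small decision table. The sketch below models it in userspace under the assumption that hits in KPROBE_HIT_ACTIVE and KPROBE_HIT_SSDONE were already treated as recoverable before this series; the names `re_status`, `re_action` and `re_on_nested_hit` are hypothetical and do not appear in the kernel.

```c
/* Hypothetical mirror of the kprobe status values named in the cover letter. */
enum re_status {
	RE_HIT_ACTIVE,
	RE_HIT_SS,
	RE_HIT_SSDONE,
	RE_REENTER,
};

enum re_action {
	RE_STEP_NESTED,   /* save the outer probe state, single-step the new probe */
	RE_UNRECOVERABLE, /* already one level deep, nothing left to save into */
};

/*
 * Decide what to do when a second kprobe is hit while the current CPU is
 * still handling a first one. Only one level of nesting is modelled, so a
 * hit in RE_REENTER cannot be recovered.
 */
enum re_action re_on_nested_hit(enum re_status status)
{
	switch (status) {
	case RE_HIT_ACTIVE:
	case RE_HIT_SSDONE:
	case RE_HIT_SS: /* recoverable with Patch 2, per the description above */
		return RE_STEP_NESTED;
	case RE_REENTER:
	default:
		return RE_UNRECOVERABLE;
	}
}
```

The single saved slot is why the model stops at one level: a third probe hit would overwrite the state saved for the first.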
This follows the same logic as the existing x86 fixes:
6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")
Reproducer:
simpleperf record -p <pid> -f 10000 \
-e preemptirq:preempt_disable \
-e preemptirq:preempt_enable \
--duration 9 --call-graph dwarf \
-o /data/local/tmp/perf.data
Before this series, the crash reproduced frequently. With both patches
applied, it was no longer reproduced in our testing.
hupu (2):
arm64: kprobes: Do not handle non-XOL faults as kprobe faults
arm64: kprobes: Allow reentering kprobes while single-stepping
arch/arm64/kernel/probes/kprobes.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
--
2.43.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
2026-07-01 12:14 [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling Pu Hu
@ 2026-07-01 12:30 ` Pu Hu
2026-07-01 13:43 ` Masami Hiramatsu
1 sibling, 0 replies; 7+ messages in thread
From: Pu Hu @ 2026-07-01 12:30 UTC (permalink / raw)
To: catalin.marinas@arm.com, will@kernel.org, naveen@kernel.org,
davem@davemloft.net, mhiramat@kernel.org,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org

Dear Maintainers,

I would like to provide some additional background for this patchset.

We observed a high-probability crash on an Android device running a
6.1.145-based kernel when recording preemptirq tracepoints for a user
space process with dwarf callchains enabled.

The command used to reproduce the issue is:

    simpleperf record -p <PID> -f 10000 \
        -e preemptirq:preempt_disable \
        -e preemptirq:preempt_enable \
        --duration 9 --call-graph dwarf \
        -o /data/local/tmp/perf.data

Here <PID> is the PID of a user space process, for example a foreground
application UI thread or RenderThread.

One important observation is that the crash does not reproduce if
"--call-graph dwarf" is removed.

The crash log shows a data abort on a user virtual address while the PC
is at a probed kernel instruction:

[ 297.177775] Unable to handle kernel paging request at virtual address 0000007ff042e000
[ 297.177792] Mem abort info:
[ 297.177795] ESR = 0x0000000096000007
[ 297.177799] EC = 0x25: DABT (current EL), IL = 32 bits
[ 297.177803] SET = 0, FnV = 0
[ 297.177806] EA = 0, S1PTW = 0
[ 297.177808] FSC = 0x07: level 3 translation fault
[ 297.177811] Data abort info:
[ 297.177814] ISV = 0, ISS = 0x00000007
[ 297.177817] CM = 0, WnR = 0
[ 297.177820] user pgtable: 4k pages, 39-bit VAs, pgdp=000000098c9f2000
[ 297.177825] [0000007ff042e000] pgd=08000009aaaea003, p4d=08000009aaaea003, pud=08000009aaaea003, pmd=08000000abca0003, pte=0000000000000000
[ 297.177835] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 297.178070] Skip md ftrace buffer dump for: 0x2800d70
...
[ 297.178485] CPU: 6 PID: 10214 Comm: id.article.news Tainted: P S W O 6.1.145-android14-11-maybe-dirty-qki-consolidate #1
[ 297.178489] Hardware name: Qualcomm Technologies, Inc. Volcano QRD,x6878 (DT)
[ 297.178491] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 297.178493] pc : folio_wait_bit_common+0x0/0x408
[ 297.178499] lr : perf_output_sample+0x57c/0xacc
[ 297.178502] sp : ffffffc0366c2f90
[ 297.178503] x29: ffffffc0366c2fb0 x28: 0000000000001000 x27: 0000007ff042d5f8
[ 297.178507] x26: 00000000000035e7 x25: 0000000000000000 x24: ffffff892cec3000
[ 297.178510] x23: 0000000000001000 x22: 0000000000009370 x21: ffffffc0366c3140
[ 297.178512] x20: ffffff888aa1a180 x19: ffffffc0366c3020 x18: ffffffe01103b340
[ 297.178515] x17: 00000000ad6b63b6 x16: 00000000ad6b63b6 x15: 0000007ff042d5f8
[ 297.178518] x14: 0000000000000000 x13: 003436737365636f x12: 72705f7070612f6e
[ 297.178520] x11: 69622f6d65747379 x10: 732f0030333d7972 x9 : 616d6972705f6c6f
[ 297.178523] x8 : 6f705f706173755f x7 : 54454b434f535f44 x6 : ffffff892cec39d8
[ 297.178526] x5 : ffffff892cec4000 x4 : 0000000000000008 x3 : 6e6f6973736e6172
[ 297.178528] x2 : 00000000000005b8 x1 : 0000007ff042e000 x0 : ffffff892cec3000
[ 297.178531] Call trace:
[ 297.178532] folio_wait_bit_common+0x0/0x408
[ 297.178535] perf_event_output_forward+0x90/0xdc
[ 297.178537] __perf_event_overflow+0x128/0x1e8
[ 297.178540] perf_swevent_event+0x94/0x1a0
[ 297.178543] perf_tp_event+0x140/0x270
[ 297.178545] perf_trace_run_bpf_submit+0x84/0xe0
[ 297.178547] perf_trace_preemptirq_template+0xe8/0x124
[ 297.178553] trace_preempt_on+0xec/0x150
[ 297.178555] preempt_count_sub+0xa8/0x12c
[ 297.178562] do_debug_exception+0xd0/0x148
[ 297.178568] el1_dbg+0x64/0x80
[ 297.178575] el1h_64_sync_handler+0x3c/0x90
[ 297.178577] el1h_64_sync+0x68/0x6c
[ 297.178579] folio_wait_bit_common+0x0/0x408
[ 297.178582] __get_node_page+0xdc/0x49c
[ 297.178587] f2fs_get_dnode_of_data+0x404/0x950
[ 297.178589] f2fs_map_blocks+0x1e0/0xdf8
[ 297.178591] f2fs_mpage_readpages+0x1f0/0x8d0
[ 297.178594] f2fs_readahead+0x84/0x10c
[ 297.178596] read_pages+0xb8/0x434
[ 297.178603] page_cache_ra_unbounded+0x9c/0x2f0
[ 297.178605] page_cache_ra_order+0x2b0/0x348
[ 297.178608] do_sync_mmap_readahead+0xd0/0x228
[ 297.178612] filemap_fault+0x158/0x46c
[ 297.178615] f2fs_filemap_fault+0x28/0x114
[ 297.178617] handle_mm_fault+0x4f8/0x1468
[ 297.178620] do_page_fault+0x208/0x4b8
[ 297.178622] do_translation_fault+0x38/0x54
[ 297.178624] do_mem_abort+0x58/0x118
[ 297.178626] el0_da+0x48/0xb8
[ 297.178629] el0t_64_sync_handler+0x98/0xb4
[ 297.178632] el0t_64_sync+0x1a4/0x1a8
[ 297.178634] Code: 94000004 a8c17bfd d50323bf d65f03c0 (d4200080)
[ 297.178639] ---[ end trace 0000000000000000 ]---

The instruction d4200080 is the kprobe BRK instruction. The stack also
shows that the fault happens while handling a kprobe debug exception,
and the perf/trace path is entered from that window.

From the fulldump analysis, the issue appears to be related to the arm64
kprobe single-step/reentry handling. While a kprobe is preparing or
executing its XOL single-step instruction, perf/trace code may run in
the same window. With dwarf callchains enabled, this path may also
access user memory and take a data abort. In addition, another kprobe
may be hit while the first kprobe is still in KPROBE_HIT_SS state.

This matches the type of issue that was fixed on x86 by the following
commits:

6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")

This patchset applies the same idea to arm64:

- Patch 1 makes the arm64 kprobe fault handler handle a fault in
  KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC is the current
  kprobe's XOL instruction. Otherwise, the fault is left to the normal
  fault handling path.

- Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
  recoverable one-level reentry. The unrecoverable case remains a hit
  while already in KPROBE_REENTER.

With both patches applied, we have kept the same stress test running for
three days and the crash is no longer reproduced.

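
The ESR value and the trapping instruction word from the crash log above can be cross-checked by hand. The following is a minimal sketch that extracts the fields using the architectural bit positions (EC in ESR[31:26], IL in ESR[25], and for data aborts WnR in ISS[6] and the fault status code in ISS[5:0]) and decodes the A64 BRK encoding. The helper names `esr_decode`, `insn_is_brk` and `brk_imm16` are made up for this example and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of interest from ESR_ELx for a data abort. */
struct esr_fields {
	unsigned int ec;  /* exception class, ESR[31:26] */
	unsigned int il;  /* instruction length bit, ESR[25] */
	unsigned int wnr; /* write-not-read, ISS[6] */
	unsigned int fsc; /* fault status code, ISS[5:0] */
};

/* Split a raw ESR value into the fields printed in "Mem abort info". */
struct esr_fields esr_decode(uint64_t esr)
{
	struct esr_fields f;

	f.ec = (unsigned int)((esr >> 26) & 0x3f);
	f.il = (unsigned int)((esr >> 25) & 0x1);
	f.wnr = (unsigned int)((esr >> 6) & 0x1);
	f.fsc = (unsigned int)(esr & 0x3f);
	return f;
}

/* A64 BRK is encoded as 0xd4200000 | (imm16 << 5). */
bool insn_is_brk(uint32_t insn)
{
	return (insn & 0xffe0001fu) == 0xd4200000u;
}

/* Extract the 16-bit immediate carried by a BRK instruction. */
unsigned int brk_imm16(uint32_t insn)
{
	return (insn >> 5) & 0xffffu;
}
```

For ESR = 0x96000007 this yields EC = 0x25, IL = 1, WnR = 0 and FSC = 0x07, which agrees with the decoded lines in the log (a read that took a level 3 translation fault at the current EL). The word d4200080 decodes as a BRK with immediate 0x4, which, as far as the arm64 headers are concerned, should correspond to the kprobe breakpoint immediate (KPROBES_BRK_IMM).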
I still have the full dmesg and fulldump from the crash device. Please
let me know if any additional information would be useful.

Thanks,
hupu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
2026-07-01 12:14 [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling Pu Hu
2026-07-01 12:30 ` Pu Hu
@ 2026-07-01 13:43 ` Masami Hiramatsu
2026-07-01 13:56 ` Pu Hu
1 sibling, 1 reply; 7+ messages in thread
From: Masami Hiramatsu @ 2026-07-01 13:43 UTC (permalink / raw)
To: Pu Hu
Cc: Hongyan Xia, Jiazi Li, catalin.marinas@arm.com, naveen@kernel.org,
linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
will@kernel.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org

On Wed, 1 Jul 2026 12:14:54 +0000
Pu Hu <hupu@transsion.com> wrote:

> From: hupu <hupu@transsion.com>
>
> This series fixes two arm64 kprobes issues observed when running
> simpleperf with preemptirq tracepoints and dwarf callchains while a
> kprobe is active on a frequently executed kernel function.
>
> The crash happens in the kprobe debug exception path. While a kprobe is
> preparing or executing its XOL single-step instruction, perf/trace code
> can run in the same window. That code may either take a fault of its own
> or hit another kprobe.
>
> Patch 1 makes kprobe_fault_handler() handle a fault in
> KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
> current kprobe's XOL instruction. Otherwise the fault is left to the
> normal fault handling path.
>
> Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
> recoverable one-level reentry. Only a hit while already in
> KPROBE_REENTER remains unrecoverable.
>
> This follows the same logic as the existing x86 fixes:
> 6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
> 6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")

Good catch!!
The series looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

But it should be reviewed by arm64 maintainers too.

BTW, if you are "Pu Hu", the Signed-off-by tag should be
"Pu Hu <...>" instead of "hupu <...>".

Thank you,

>
> Reproducer:
>
> simpleperf record -p <pid> -f 10000 \
> -e preemptirq:preempt_disable \
> -e preemptirq:preempt_enable \
> --duration 9 --call-graph dwarf \
> -o /data/local/tmp/perf.data
>
> Before this series, the crash reproduced frequently. With both patches
> applied, it was no longer reproduced in our testing.
>
> hupu (2):
> arm64: kprobes: Do not handle non-XOL faults as kprobe faults
> arm64: kprobes: Allow reentering kprobes while single-stepping
>
> arch/arm64/kernel/probes/kprobes.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> --
> 2.43.0
>
>

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
2026-07-01 13:43 ` Masami Hiramatsu
@ 2026-07-01 13:56 ` Pu Hu
2026-07-02 10:07 ` Pu Hu
0 siblings, 1 reply; 7+ messages in thread
From: Pu Hu @ 2026-07-01 13:56 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Hongyan Xia, Jiazi Li, catalin.marinas@arm.com, naveen@kernel.org,
linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
will@kernel.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org

On 7/1/2026 9:43 PM, Masami Hiramatsu wrote:
> On Wed, 1 Jul 2026 12:14:54 +0000
> Pu Hu <hupu@transsion.com> wrote:
>
>> From: hupu <hupu@transsion.com>
>>
>> This series fixes two arm64 kprobes issues observed when running
>> simpleperf with preemptirq tracepoints and dwarf callchains while a
>> kprobe is active on a frequently executed kernel function.
>>
>> The crash happens in the kprobe debug exception path. While a kprobe is
>> preparing or executing its XOL single-step instruction, perf/trace code
>> can run in the same window. That code may either take a fault of its own
>> or hit another kprobe.
>>
>> Patch 1 makes kprobe_fault_handler() handle a fault in
>> KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
>> current kprobe's XOL instruction. Otherwise the fault is left to the
>> normal fault handling path.
>>
>> Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
>> recoverable one-level reentry. Only a hit while already in
>> KPROBE_REENTER remains unrecoverable.
>>
>> This follows the same logic as the existing x86 fixes:
>> 6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
>> 6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")
>
> Good catch!!
> The series looks good to me.
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> But it should be reviewed by arm64 maintainers too.
>
> BTW, if you are "Pu Hu", the Signed-off-by tag should be
> "Pu Hu <...>" instead of "hupu <...>".
>

Hi Masami,

Thank you for your reply and Acked-by.

Yes, thanks for pointing this out. I will fix the author name and the
Signed-off-by tags to use a consistent name in the next version of the
patchset.

Thanks,
hupu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
2026-07-01 13:56 ` Pu Hu
@ 2026-07-02 10:07 ` Pu Hu
2026-07-02 10:09 ` Pu Hu
0 siblings, 1 reply; 7+ messages in thread
From: Pu Hu @ 2026-07-02 10:07 UTC (permalink / raw)
To: Masami Hiramatsu (Google), catalin.marinas@arm.com, will@kernel.org,
naveen@kernel.org, davem@davemloft.net,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org

On 7/1/2026 9:56 PM, Pu Hu wrote:
> On 7/1/2026 9:43 PM, Masami Hiramatsu wrote:
>> On Wed, 1 Jul 2026 12:14:54 +0000
>> Pu Hu <hupu@transsion.com> wrote:
>>
>>> From: hupu <hupu@transsion.com>
>>>
>>> This series fixes two arm64 kprobes issues observed when running
>>> simpleperf with preemptirq tracepoints and dwarf callchains while a
>>> kprobe is active on a frequently executed kernel function.
>>>
>>> The crash happens in the kprobe debug exception path. While a kprobe is
>>> preparing or executing its XOL single-step instruction, perf/trace code
>>> can run in the same window. That code may either take a fault of its own
>>> or hit another kprobe.
>>>
>>> Patch 1 makes kprobe_fault_handler() handle a fault in
>>> KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
>>> current kprobe's XOL instruction. Otherwise the fault is left to the
>>> normal fault handling path.
>>>
>>> Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
>>> recoverable one-level reentry. Only a hit while already in
>>> KPROBE_REENTER remains unrecoverable.
>>>
>>> This follows the same logic as the existing x86 fixes:
>>> 6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
>>> 6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on
>>> single-stepping")
>>
>> Good catch!!
>> The series looks good to me.
>>
>> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>>
>> But it should be reviewed by arm64 maintainers too.
>>
>> BTW, if you are "Pu Hu", the Signed-off-by tag should be
>> "Pu Hu <...>" instead of "hupu <...>".
>>
>
> Hi Masami,
>
> Thank you for your reply and Acked-by.
>
> Yes, thanks for pointing this out. I will fix the author name and the
> Signed-off-by tags to use a consistent name in the next version of the
> patchset.
>
> Thanks,
> hupu
>

Hi maintainers,

I have reproduced the same issue on the latest mainline kernel available
today. The commit I tested is 665159e24674.

Below are the steps I used to reproduce the issue. I hope this can help
with further debugging. The complete test case used in these steps will
be provided in a follow-up email.

Reproduction steps:

1. Build the test case

Please use the test case that I will send in the next email.

Depending on your local environment, the following variables in the
Makefile may need to be adjusted:

    CROSS_COMPILE ?= aarch64-dumpstack-linux-gnu-
    KERN_DIR ?= $(PWD)/../../output/build-mainline
    DEST_PATH ?= $(PWD)/../../output

Then run:

    make all

This builds the userspace test program:

    fault_stress

and the kprobe module:

    kp_folio.ko

2. Boot QEMU

To increase memory pressure, I used only two CPUs and 512 MB of memory
in the QEMU guest:

    SMP="-smp 2"
    qemu-system-aarch64 -m 512 -cpu cortex-a53 \
        -M virt,gic-version=3,its=on,iommu=smmuv3 \
        -nographic $SMP -kernel $KERNEL_IMAGE \
        -append "nokaslr noinitrd sched_debug root=/dev/vda rootfstype=ext4 rw crashkernel=256M loglevel=8" \
        -drive if=none,file=$ROOTFS_IMAGE,id=hd0,format=raw \
        -device virtio-blk-device,drive=hd0 \
        --fsdev local,id=kmod_dev,path=./output/,security_model=none \
        -device virtio-9p-pci,fsdev=kmod_dev,mount_tag=kmod_mount \
        -net nic -net tap,ifname=tap0,script=no,downscript=no \
        $GDB_DEBUG

3. Run the test in the guest

After the guest has booted, run the following commands.

Allow kernel symbols to be shown:

    echo 0 > /proc/sys/kernel/kptr_restrict

Load the kprobe module:

    insmod kp_folio.ko

Start the fault stress program:

    ./fault_stress &

Start stress-ng to add memory pressure:

    ./stress-ng --vm 2 --vm-bytes 70% --page-in &

Run perf against the fault_stress process. In the command below, 171 is
the PID of fault_stress in my test environment:

    ./perf record -p 171 -c 1 \
        -e preemptirq:preempt_disable \
        -e preemptirq:preempt_enable \
        --call-graph dwarf \
        -o /tmp/perf.data \
        -- sleep 5

With the steps above, I can reproduce the crash reliably in my local
QEMU setup. After applying my previously submitted fix, I can no longer
reproduce the issue with the same test.

The crash log is shown below:

[ 173.383321] kp_folio: hit=1564 comm=fault_stress tgid=171 tid=173
[ 173.402940] kp_folio: hit=1565 comm=fault_stress tgid=171 tid=179
[ 173.528342] kp_folio: hit=1566 comm=fault_stress tgid=171 tid=175
[ 173.846895] kp_folio: hit=1567 comm=fault_stress tgid=171 tid=172
[ 174.223031] kp_folio: hit=1568 comm=fault_stress tgid=171 tid=179
[ 174.224419] kp_folio: hit=1569 comm=fault_stress tgid=171 tid=174
[ 174.928471] kp_folio: hit=1570 comm=fault_stress tgid=171 tid=175
[ 174.930916] Unable to handle kernel paging request at virtual address 0000ffffa3592000
[ 174.931068] Mem abort info:
[ 174.931116] ESR = 0x0000000096000007
[ 174.931180] EC = 0x25: DABT (current EL), IL = 32 bits
[ 174.931240] SET = 0, FnV = 0
[ 174.931368] EA = 0, S1PTW = 0
[ 174.931430] FSC = 0x07: level 3 translation fault
[ 174.931490] Data abort info:
[ 174.931540] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 174.931593] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 174.931669] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 174.931762] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000049bf8000
[ 174.931829] [0000ffffa3592000] pgd=0800000049a99403, p4d=0800000049a99403, pud=0800000049ac0403, pmd=0800000049bed403, pte=00000000000047c0
[ 174.932328] Internal error: Oops: 0000000096000007 [#1] SMP
[ 174.939042] Modules linked in: kp_folio(O)
[ 174.942114] CPU: 1 UID: 0 PID: 175 Comm: fault_stress Tainted: G O 7.2.0-rc1-00010-g7679152d724a-dirty #2 PREEMPT
[ 174.945427] Tainted: [O]=OOT_MODULE
[ 174.946006] Hardware name: linux,dummy-virt (DT)
[ 174.947011] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 174.948582] pc : folio_wait_bit_common+0x0/0x320
[ 174.949626] lr : perf_output_sample+0x708/0x968
[ 174.950041] sp : ffff800084b13540
[ 174.950511] x29: ffff800084b13570 x28: ffff000006704260 x27: 0000ffffa3591d08
[ 174.953274] x26: ffff000009a19a80 x25: 0000000000000000 x24: ffff800084b13780
[ 174.953601] x23: 0000000000000ee8 x22: 000000000000b5ef x21: 0000000000001000
[ 174.954003] x20: 0000000000000ee8 x19: ffff800084b135e0 x18: 000000000000000a
[ 174.954262] x17: ffff8000803d1af4 x16: ffff80008036d01c x15: 0000ffffa3591d08
[ 174.954549] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 174.954863] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 174.955315] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000069ce2c8
[ 174.955592] x5 : ffff0000069ceee8 x4 : 0000000000000008 x3 : 0000000000000000
[ 174.956083] x2 : 0000000000000be0 x1 : 0000ffffa3592000 x0 : ffff0000069ce000
[ 174.956838] Call trace:
[ 174.958282] folio_wait_bit_common+0x0/0x320 (P)
[ 174.958618] perf_event_output_forward+0xc0/0x1a8
[ 174.958811] __perf_event_overflow+0x108/0x518
[ 174.959066] perf_swevent_event+0x238/0x260
[ 174.959295] perf_tp_event+0x34c/0x6a0
[ 174.959667] perf_trace_run_bpf_submit+0x8c/0xd0
[ 174.962331] perf_trace_preemptirq_template+0xc4/0x130
[ 174.962644] trace_preempt_on+0x114/0x1e8
[ 174.963019] preempt_count_sub+0x78/0xe0
[ 174.963402] el1_brk64+0x40/0x60
[ 174.963617] el1h_64_sync_handler+0x68/0xb0
[ 174.963817] el1h_64_sync+0x6c/0x70
[ 174.964239] 0xffff8000846c5000 (P)
[ 174.964938] __do_fault+0x44/0x288
[ 174.965452] __handle_mm_fault+0xaf8/0x1a40
[ 174.965815] handle_mm_fault+0xb4/0x420
[ 174.966527] do_page_fault+0x140/0x7b0
[ 174.967398] do_translation_fault+0x4c/0x70
[ 174.968057] do_mem_abort+0x48/0xa0
[ 174.969705] el0_da+0x64/0x290
[ 174.969984] el0t_64_sync_handler+0xd0/0xe8
[ 174.970324] el0t_64_sync+0x198/0x1a0
[ 174.970713] Code: d50323bf d65f03c0 12800140 17fffffc (d4200080)
[ 174.971338] kp_folio: hit=1571 comm=fault_stress tgid=171 tid=174
[ 174.972266] ---[ end trace 0000000000000000 ]---

I will send the complete test case in a follow-up email.

Thanks,
hupu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
2026-07-02 10:07 ` Pu Hu
@ 2026-07-02 10:09 ` Pu Hu
2026-07-04 14:47 ` Masami Hiramatsu
0 siblings, 1 reply; 7+ messages in thread
From: Pu Hu @ 2026-07-02 10:09 UTC (permalink / raw)
To: Masami Hiramatsu (Google), catalin.marinas@arm.com, will@kernel.org,
naveen@kernel.org, davem@davemloft.net,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org

On 7/2/2026 6:07 PM, hupu wrote:
> On 7/1/2026 9:56 PM, Pu Hu wrote:
>> On 7/1/2026 9:43 PM, Masami Hiramatsu wrote:
>>> On Wed, 1 Jul 2026 12:14:54 +0000
>>> Pu Hu <hupu@transsion.com> wrote:
>>>
>>>> From: hupu <hupu@transsion.com>
>>>>
...
...>
> I will send the complete test case in a follow-up email.
>
> Thanks,
> hupu
>

Hi maintainers,

As mentioned in my previous email, below is the complete test case I
used to reproduce the arm64 kprobe crash on mainline.

It contains:

- a small kprobe module that probes folio_wait_bit_common()
- a userspace program that repeatedly triggers file-backed page faults
- a Makefile to build both parts

Depending on the local build environment, the following variables in the
Makefile may need to be adjusted:

CROSS_COMPILE
KERN_DIR
DEST_PATH

Thanks,
Pu Hu

---

diff --git a/misc/kprobe/Makefile b/misc/kprobe/Makefile
new file mode 100755
index 0000000..14c00c0
--- /dev/null
+++ b/misc/kprobe/Makefile
@@ -0,0 +1,36 @@
+PWD := $(shell pwd)
+ARCH ?= arm64
+CROSS_COMPILE ?= aarch64-dumpstack-linux-gnu-
+KERN_DIR ?= $(PWD)/../../output/build-mainline
+DEST_PATH ?= $(PWD)/../../output
+Q := @
+
+UNIT_TEST := fault_stress
+UNIT_TEST_SRC := fault_stress.c
+
+KP_MOD := kp_folio
+obj-m := $(KP_MOD).o
+
+USER_CFLAGS := -static -g -O0 -fno-omit-frame-pointer -fasynchronous-unwind-tables
+USER_LIBS := -lm -lpthread
+EXTRA_CFLAGS += -I$(KERN_DIR)
+
+.PHONY: all modules user clean
+
+all: modules user install
+
+modules:
+	$(Q)$(MAKE) -C $(KERN_DIR) M=$(PWD) EXTRA_CFLAGS="$(EXTRA_CFLAGS)" ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) modules
+
+user:
+	$(Q)$(CROSS_COMPILE)gcc $(USER_CFLAGS) $(UNIT_TEST_SRC) -o $(UNIT_TEST) $(USER_LIBS)
+
+install:
+	$(Q)mkdir -p $(DEST_PATH)
+	$(Q)cp -f *.ko $(DEST_PATH)/
+	$(Q)cp -f $(UNIT_TEST) $(DEST_PATH)/
+
+clean:
+	$(Q)$(MAKE) -C $(KERN_DIR) M=$(PWD) clean
+	$(Q)rm -f $(UNIT_TEST)
+	$(Q)rm -f $(DEST_PATH)/$(UNIT_TEST) $(DEST_PATH)/*.ko
diff --git a/misc/kprobe/fault_stress.c b/misc/kprobe/fault_stress.c
new file mode 100755
index 0000000..10150ff
--- /dev/null
+++ b/misc/kprobe/fault_stress.c
@@ -0,0 +1,96 @@
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#define FILE_SIZE (256UL * 1024 * 1024)
+#define NR_THREADS 8
+
+static void deep_call(int n)
+{
+	volatile char buf[4096];
+
+	memset((void *)buf, n, sizeof(buf));
+
+	if (n > 0)
+		deep_call(n - 1);
+	else
+		sched_yield();
+}
+
+static void *worker(void *arg)
+{
+	const char *path = arg;
+	int fd;
+	char *map;
+	unsigned long i;
+	volatile unsigned long sum = 0;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		perror("open");
+		return NULL;
+	}
+
+	map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (map == MAP_FAILED) {
+		perror("mmap");
+		close(fd);
+		return NULL;
+	}
+
+	for (;;) {
+		/*
+		 * Drop the pages backing this mapping from the current process.
+		 * Subsequent accesses are more likely to trigger file-backed
+		 * page faults again.
+		 */
+		madvise(map, FILE_SIZE, MADV_DONTNEED);
+
+		for (i = 0; i < FILE_SIZE; i += 4096 * 17) {
+			sum += map[i];
+			deep_call(64);
+		}
+	}
+
+	munmap(map, FILE_SIZE);
+	close(fd);
+	return NULL;
+}
+
+int main(void)
+{
+	pthread_t th[NR_THREADS];
+	const char *path = "/tmp/fault_stress_file";
+	int fd;
+	int i;
+
+	fd = open(path, O_CREAT | O_RDWR, 0644);
+	if (fd < 0) {
+		perror("open file");
+		return 1;
+	}
+
+	if (ftruncate(fd, FILE_SIZE) < 0) {
+		perror("ftruncate");
+		return 1;
+	}
+
+	close(fd);
+
+	for (i = 0; i < NR_THREADS; i++)
+		pthread_create(&th[i], NULL, worker, (void *)path);
+
+	for (i = 0; i < NR_THREADS; i++)
+		pthread_join(th[i], NULL);
+
+	return 0;
+}
+
diff --git a/misc/kprobe/kp_folio.c b/misc/kprobe/kp_folio.c
new file mode 100755
index 0000000..c8f3e1d
--- /dev/null
+++ b/misc/kprobe/kp_folio.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/sched.h>
+#include <linux/atomic.h>
+#include <linux/ratelimit.h>
+
+static atomic64_t kp_hit_count = ATOMIC64_INIT(0);
+
+static int folio_wait_bit_common_handler(
+	struct kprobe *p,
+	struct pt_regs *regs)
+{
+	unsigned long hit;
+
+	hit = atomic64_inc_return(&kp_hit_count);
+
+	pr_info("kp_folio: hit=%lu comm=%s tgid=%d tid=%d\n",
+		hit, current->comm, current->tgid, current->pid);
+
+	return 0;
+}
+
+static struct kprobe kp_folio_wait_bit_common = {
+	.symbol_name = "folio_wait_bit_common",
+	.pre_handler = folio_wait_bit_common_handler,
+};
+
+static int __init kp_folio_init(void)
+{
+	int ret;
+
+	ret = register_kprobe(&kp_folio_wait_bit_common);
+	if (ret < 0) {
+		pr_err("kp_folio: register_kprobe failed, ret=%d\n", ret);
+		return ret;
+	}
+
+	pr_info("kp_folio: kprobe registered at %pS, addr=%px\n",
+		kp_folio_wait_bit_common.addr,
+		kp_folio_wait_bit_common.addr);
+
+	return 0;
+}
+
+static void __exit kp_folio_exit(void)
+{
+	unregister_kprobe(&kp_folio_wait_bit_common);
+
+	pr_info("kp_folio: kprobe unregistered, total hits=%lld\n",
+		atomic64_read(&kp_hit_count));
+}
+
+module_init(kp_folio_init);
+module_exit(kp_folio_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("hupu <hupu@transsion.com>");
+MODULE_DESCRIPTION("simple kprobe reproducer for folio_wait_bit_common");
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
2026-07-02 10:09 ` Pu Hu
@ 2026-07-04 14:47 ` Masami Hiramatsu
0 siblings, 0 replies; 7+ messages in thread
From: Masami Hiramatsu @ 2026-07-04 14:47 UTC (permalink / raw)
To: Pu Hu
Cc: Hongyan Xia, Jiazi Li, catalin.marinas@arm.com, naveen@kernel.org,
linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
will@kernel.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org

Hi Pu Hu,

Can you update this by rebasing on top of arm64 tree
(git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git)
and update your signed-off-by with your name.

Thank you,

On Thu, 2 Jul 2026 10:09:51 +0000
Pu Hu <hupu@transsion.com> wrote:

> On 7/2/2026 6:07 PM, hupu wrote:
> > On 7/1/2026 9:56 PM, Pu Hu wrote:
> >> On 7/1/2026 9:43 PM, Masami Hiramatsu wrote:
> >>> On Wed, 1 Jul 2026 12:14:54 +0000
> >>> Pu Hu <hupu@transsion.com> wrote:
> >>>
> >>>> From: hupu <hupu@transsion.com>
> >>>>
> ...
> ...>
> > I will send the complete test case in a follow-up email.
> >
> > Thanks,
> > hupu
> >
>
> Hi maintainers,
>
> As mentioned in my previous email, below is the complete test case I
> used to reproduce the arm64 kprobe crash on mainline.
>
> It contains:
>
> - a small kprobe module that probes folio_wait_bit_common()
> - a userspace program that repeatedly triggers file-backed page faults
> - a Makefile to build both parts
>
> Depending on the local build environment, the following variables in the
> Makefile may need to be adjusted:
>
> CROSS_COMPILE
> KERN_DIR
> DEST_PATH
>
> Thanks,
> Pu Hu
>
> ---
>
>
> diff --git a/misc/kprobe/Makefile b/misc/kprobe/Makefile
> new file mode 100755
> index 0000000..14c00c0
> --- /dev/null
> +++ b/misc/kprobe/Makefile
> @@ -0,0 +1,36 @@
> +PWD := $(shell pwd)
> +ARCH ?= arm64
> +CROSS_COMPILE ?= aarch64-dumpstack-linux-gnu-
> +KERN_DIR ?= $(PWD)/../../output/build-mainline
> +DEST_PATH ?= $(PWD)/../../output
> +Q := @
> +
> +UNIT_TEST := fault_stress
> +UNIT_TEST_SRC := fault_stress.c
> +
> +KP_MOD := kp_folio
> +obj-m := $(KP_MOD).o
> +
> +USER_CFLAGS := -static -g -O0 -fno-omit-frame-pointer
> -fasynchronous-unwind-tables
> +USER_LIBS := -lm -lpthread
> +EXTRA_CFLAGS += -I$(KERN_DIR)
> +
> +.PHONY: all modules user clean
> +
> +all: modules user install
> +
> +modules:
> +	$(Q)$(MAKE) -C $(KERN_DIR) M=$(PWD)
> EXTRA_CFLAGS="$(EXTRA_CFLAGS)" ARCH=$(ARCH)
> CROSS_COMPILE=$(CROSS_COMPILE) modules
> +
> +user:
> +	$(Q)$(CROSS_COMPILE)gcc $(USER_CFLAGS) $(UNIT_TEST_SRC) -o
> $(UNIT_TEST) $(USER_LIBS)
> +
> +install:
> +	$(Q)mkdir -p $(DEST_PATH)
> +	$(Q)cp -f *.ko $(DEST_PATH)/
> +	$(Q)cp -f $(UNIT_TEST) $(DEST_PATH)/
> +
> +clean:
> +	$(Q)$(MAKE) -C $(KERN_DIR) M=$(PWD) clean
> +	$(Q)rm -f $(UNIT_TEST)
> +	$(Q)rm -f $(DEST_PATH)/$(UNIT_TEST) $(DEST_PATH)/*.ko
> diff --git a/misc/kprobe/fault_stress.c b/misc/kprobe/fault_stress.c
> new file mode 100755
> index 0000000..10150ff
> --- /dev/null
> +++ b/misc/kprobe/fault_stress.c
> @@ -0,0 +1,96 @@
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +#define FILE_SIZE (256UL * 1024 * 1024)
> +#define NR_THREADS 8
> +
> +static void deep_call(int n)
> +{
> +	volatile char buf[4096];
> +
> +	memset((void *)buf, n, sizeof(buf));
> +
> +	if (n > 0)
> +		deep_call(n - 1);
> +	else
> +		sched_yield();
> +}
> +
> +static void *worker(void *arg)
> +{
> +	const char *path = arg;
> +	int fd;
> +	char *map;
> +	unsigned long i;
> +	volatile unsigned long sum = 0;
> +
> +	fd = open(path, O_RDONLY);
> +	if (fd < 0) {
> +		perror("open");
> +		return NULL;
> +	}
> +
> +	map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
> +	if (map == MAP_FAILED) {
> +		perror("mmap");
> +		close(fd);
> +		return NULL;
> +	}
> +
> +	for (;;) {
> +		/*
> +		 * Drop the pages backing this mapping from the current
> process.
> +		 * Subsequent accesses are more likely to trigger
> file-backed
> +		 * page faults again.
> +		 */
> +		madvise(map, FILE_SIZE, MADV_DONTNEED);
> +
> +		for (i = 0; i < FILE_SIZE; i += 4096 * 17) {
> +			sum += map[i];
> +			deep_call(64);
> +		}
> +	}
> +
> +	munmap(map, FILE_SIZE);
> +	close(fd);
> +	return NULL;
> +}
> +
> +int main(void)
> +{
> +	pthread_t th[NR_THREADS];
> +	const char *path = "/tmp/fault_stress_file";
> +	int fd;
> +	int i;
> +
> +	fd = open(path, O_CREAT | O_RDWR, 0644);
> +	if (fd < 0) {
> +		perror("open file");
> +		return 1;
> +	}
> +
> +	if (ftruncate(fd, FILE_SIZE) < 0) {
> +		perror("ftruncate");
> +		return 1;
> +	}
> +
> +	close(fd);
> +
> +	for (i = 0; i < NR_THREADS; i++)
> +		pthread_create(&th[i], NULL, worker, (void *)path);
> +
> +	for (i = 0; i < NR_THREADS; i++)
> +		pthread_join(th[i], NULL);
> +
> +	return 0;
> +}
> +
> diff --git a/misc/kprobe/kp_folio.c b/misc/kprobe/kp_folio.c
> new file mode 100755
> index 0000000..c8f3e1d
> --- /dev/null
> +++ b/misc/kprobe/kp_folio.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/kprobes.h>
> +#include <linux/sched.h>
> +#include <linux/atomic.h>
> +#include <linux/ratelimit.h>
> +
> +static atomic64_t kp_hit_count = ATOMIC64_INIT(0);
> +
> +static int folio_wait_bit_common_handler(
> +	struct kprobe *p,
> +	struct pt_regs *regs)
> +{
> +	unsigned long hit;
> +
> +	hit = atomic64_inc_return(&kp_hit_count);
> +
> +	pr_info("kp_folio: hit=%lu comm=%s tgid=%d tid=%d\n",
> +		hit, current->comm, current->tgid, current->pid);
> +
> +	return 0;
> +}
> +
> +static struct kprobe kp_folio_wait_bit_common = {
> +	.symbol_name = "folio_wait_bit_common",
> +	.pre_handler = folio_wait_bit_common_handler,
> +};
> +
> +static int __init kp_folio_init(void)
> +{
> +	int ret;
> +
> +	ret = register_kprobe(&kp_folio_wait_bit_common);
> +	if (ret < 0) {
> +		pr_err("kp_folio: register_kprobe failed, ret=%d\n", ret);
> +		return ret;
> +	}
> +
> +	pr_info("kp_folio: kprobe registered at %pS, addr=%px\n",
> +		kp_folio_wait_bit_common.addr,
> +		kp_folio_wait_bit_common.addr);
> +
> +	return 0;
> +}
> +
> +static void __exit kp_folio_exit(void)
> +{
> +	unregister_kprobe(&kp_folio_wait_bit_common);
> +
> +	pr_info("kp_folio: kprobe unregistered, total hits=%lld\n",
> +		atomic64_read(&kp_hit_count));
> +}
> +
> +module_init(kp_folio_init);
> +module_exit(kp_folio_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("hupu <hupu@transsion.com>");
> +MODULE_DESCRIPTION("simple kprobe reproducer for folio_wait_bit_common");
>
>

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread