* [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Pu Hu @ 2026-07-01 12:14 UTC
To: catalin.marinas@arm.com, will@kernel.org, naveen@kernel.org,
davem@davemloft.net, mhiramat@kernel.org,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Pu Hu
From: hupu <hupu@transsion.com>
This series fixes two arm64 kprobes issues observed when running
simpleperf with preemptirq tracepoints and dwarf callchains while a
kprobe is active on a frequently executed kernel function.
The crash happens in the kprobe debug exception path. While a kprobe is
preparing or executing its XOL single-step instruction, perf/trace code
can run in the same window. That code may either take a fault of its own
or hit another kprobe.
Patch 1 makes kprobe_fault_handler() handle a fault in
KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
current kprobe's XOL instruction. Otherwise the fault is left to the
normal fault handling path.
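The check described for patch 1 can be modelled in a few lines of userspace C. This is only a sketch of the decision, not the patch itself: the enum values, `struct probe_ctx` and `fault_belongs_to_kprobe()` are hypothetical names invented for illustration, and the real handler works on `struct pt_regs` and the per-CPU kprobe control block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's kprobe status values. */
enum kp_status {
	KP_HIT_ACTIVE,
	KP_HIT_SS,
	KP_REENTER,
	KP_HIT_SSDONE,
};

/* Hypothetical, simplified view of the state the fault handler consults. */
struct probe_ctx {
	enum kp_status status;
	uint64_t xol_insn_addr;	/* address of the out-of-line (XOL) copy */
};

/*
 * Return true only when the fault was raised by the single-stepped XOL
 * instruction itself. Any other fault taken while the status is HIT_SS or
 * REENTER (for example a perf user-stack copy) is not a kprobe fault and
 * is left to the normal fault handling path.
 */
static bool fault_belongs_to_kprobe(const struct probe_ctx *ctx,
				    uint64_t fault_pc)
{
	switch (ctx->status) {
	case KP_HIT_SS:
	case KP_REENTER:
		return fault_pc == ctx->xol_insn_addr;
	default:
		return false;
	}
}
```

In this model, a fault whose PC is anywhere other than `xol_insn_addr` returns false even in `KP_HIT_SS`, which is the case the crash in this report appears to hit.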
Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
recoverable one-level reentry. Only a hit while already in
KPROBE_REENTER remains unrecoverable.
This follows the same logic as the existing x86 fixes:
6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")
Reproducer:
simpleperf record -p <pid> -f 10000 \
-e preemptirq:preempt_disable \
-e preemptirq:preempt_enable \
--duration 9 --call-graph dwarf \
-o /data/local/tmp/perf.data
Before this series, the crash reproduced frequently. With both patches
applied, it was no longer reproduced in our testing.
hupu (2):
arm64: kprobes: Do not handle non-XOL faults as kprobe faults
arm64: kprobes: Allow reentering kprobes while single-stepping
arch/arm64/kernel/probes/kprobes.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
--
2.43.0
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Pu Hu @ 2026-07-01 12:30 UTC
To: catalin.marinas@arm.com, will@kernel.org, naveen@kernel.org,
davem@davemloft.net, mhiramat@kernel.org,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Dear Maintainers,
I would like to provide some additional background for this patchset.
We observed a crash that reproduces with high probability on an Android
device running a 6.1.145-based kernel when recording preemptirq
tracepoints for a user space process with dwarf callchains enabled.
The command used to reproduce the issue is:
simpleperf record -p <PID> -f 10000 \
-e preemptirq:preempt_disable \
-e preemptirq:preempt_enable \
--duration 9 --call-graph dwarf \
-o /data/local/tmp/perf.data
Here <PID> is the PID of a user space process, for example a foreground
application UI thread or RenderThread.
One important observation is that the crash does not reproduce if
"--call-graph dwarf" is removed.
The crash log shows a data abort on a user virtual address while the PC
is at a probed kernel instruction:
[ 297.177775] Unable to handle kernel paging request at virtual address 0000007ff042e000
[ 297.177792] Mem abort info:
[ 297.177795] ESR = 0x0000000096000007
[ 297.177799] EC = 0x25: DABT (current EL), IL = 32 bits
[ 297.177803] SET = 0, FnV = 0
[ 297.177806] EA = 0, S1PTW = 0
[ 297.177808] FSC = 0x07: level 3 translation fault
[ 297.177811] Data abort info:
[ 297.177814] ISV = 0, ISS = 0x00000007
[ 297.177817] CM = 0, WnR = 0
[ 297.177820] user pgtable: 4k pages, 39-bit VAs, pgdp=000000098c9f2000
[ 297.177825] [0000007ff042e000] pgd=08000009aaaea003, p4d=08000009aaaea003, pud=08000009aaaea003, pmd=08000000abca0003, pte=0000000000000000
[ 297.177835] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 297.178070] Skip md ftrace buffer dump for: 0x2800d70
...
[ 297.178485] CPU: 6 PID: 10214 Comm: id.article.news Tainted: P S W O 6.1.145-android14-11-maybe-dirty-qki-consolidate #1
[ 297.178489] Hardware name: Qualcomm Technologies, Inc. Volcano QRD,x6878 (DT)
[ 297.178491] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 297.178493] pc : folio_wait_bit_common+0x0/0x408
[ 297.178499] lr : perf_output_sample+0x57c/0xacc
[ 297.178502] sp : ffffffc0366c2f90
[ 297.178503] x29: ffffffc0366c2fb0 x28: 0000000000001000 x27: 0000007ff042d5f8
[ 297.178507] x26: 00000000000035e7 x25: 0000000000000000 x24: ffffff892cec3000
[ 297.178510] x23: 0000000000001000 x22: 0000000000009370 x21: ffffffc0366c3140
[ 297.178512] x20: ffffff888aa1a180 x19: ffffffc0366c3020 x18: ffffffe01103b340
[ 297.178515] x17: 00000000ad6b63b6 x16: 00000000ad6b63b6 x15: 0000007ff042d5f8
[ 297.178518] x14: 0000000000000000 x13: 003436737365636f x12: 72705f7070612f6e
[ 297.178520] x11: 69622f6d65747379 x10: 732f0030333d7972 x9 : 616d6972705f6c6f
[ 297.178523] x8 : 6f705f706173755f x7 : 54454b434f535f44 x6 : ffffff892cec39d8
[ 297.178526] x5 : ffffff892cec4000 x4 : 0000000000000008 x3 : 6e6f6973736e6172
[ 297.178528] x2 : 00000000000005b8 x1 : 0000007ff042e000 x0 : ffffff892cec3000
[ 297.178531] Call trace:
[ 297.178532] folio_wait_bit_common+0x0/0x408
[ 297.178535] perf_event_output_forward+0x90/0xdc
[ 297.178537] __perf_event_overflow+0x128/0x1e8
[ 297.178540] perf_swevent_event+0x94/0x1a0
[ 297.178543] perf_tp_event+0x140/0x270
[ 297.178545] perf_trace_run_bpf_submit+0x84/0xe0
[ 297.178547] perf_trace_preemptirq_template+0xe8/0x124
[ 297.178553] trace_preempt_on+0xec/0x150
[ 297.178555] preempt_count_sub+0xa8/0x12c
[ 297.178562] do_debug_exception+0xd0/0x148
[ 297.178568] el1_dbg+0x64/0x80
[ 297.178575] el1h_64_sync_handler+0x3c/0x90
[ 297.178577] el1h_64_sync+0x68/0x6c
[ 297.178579] folio_wait_bit_common+0x0/0x408
[ 297.178582] __get_node_page+0xdc/0x49c
[ 297.178587] f2fs_get_dnode_of_data+0x404/0x950
[ 297.178589] f2fs_map_blocks+0x1e0/0xdf8
[ 297.178591] f2fs_mpage_readpages+0x1f0/0x8d0
[ 297.178594] f2fs_readahead+0x84/0x10c
[ 297.178596] read_pages+0xb8/0x434
[ 297.178603] page_cache_ra_unbounded+0x9c/0x2f0
[ 297.178605] page_cache_ra_order+0x2b0/0x348
[ 297.178608] do_sync_mmap_readahead+0xd0/0x228
[ 297.178612] filemap_fault+0x158/0x46c
[ 297.178615] f2fs_filemap_fault+0x28/0x114
[ 297.178617] handle_mm_fault+0x4f8/0x1468
[ 297.178620] do_page_fault+0x208/0x4b8
[ 297.178622] do_translation_fault+0x38/0x54
[ 297.178624] do_mem_abort+0x58/0x118
[ 297.178626] el0_da+0x48/0xb8
[ 297.178629] el0t_64_sync_handler+0x98/0xb4
[ 297.178632] el0t_64_sync+0x1a4/0x1a8
[ 297.178634] Code: 94000004 a8c17bfd d50323bf d65f03c0 (d4200080)
[ 297.178639] ---[ end trace 0000000000000000 ]---
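For readers who want to cross-check the "Mem abort info" lines in the log above, the fields can be extracted from the raw ESR value with a few shifts and masks. The sketch below uses the architectural ESR_ELx bit positions for a data abort; `struct esr_fields` and `decode_esr()` are hypothetical helper names and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical container for the ESR_ELx fields printed in the oops. */
struct esr_fields {
	unsigned int ec;	/* exception class, ESR bits [31:26] */
	unsigned int il;	/* instruction length bit, ESR bit 25 */
	unsigned int wnr;	/* write-not-read, ISS bit 6 (data aborts) */
	unsigned int fsc;	/* fault status code, ISS bits [5:0] */
};

/* Split a raw ESR value into the fields listed above. */
static struct esr_fields decode_esr(uint64_t esr)
{
	struct esr_fields f;

	f.ec  = (unsigned int)((esr >> 26) & 0x3f);
	f.il  = (unsigned int)((esr >> 25) & 0x1);
	f.wnr = (unsigned int)((esr >> 6) & 0x1);
	f.fsc = (unsigned int)(esr & 0x3f);
	return f;
}
```

Calling `decode_esr(0x96000007)` gives EC 0x25 (data abort taken at the current EL), IL set, WnR clear (a read) and FSC 0x07 (level 3 translation fault), which agrees with the decoded lines in the log.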
The instruction d4200080 is the kprobe BRK instruction. The stack also
shows that the fault happens while handling a kprobe debug exception,
and the perf/trace path is entered from that window.
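The statement that d4200080 is a BRK can be checked from the A64 encoding, where BRK is 0xd4200000 with a 16-bit immediate in bits [20:5]. The helper names and macros below are hypothetical and exist only for this illustration. The decoded immediate is 0x4, which should correspond to the immediate arm64 kprobes use for their breakpoint; treat that mapping as an assumption to verify against the kernel's brk-imm.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 BRK: fixed bits [31:21] and [4:0], imm16 in bits [20:5]. */
#define A64_BRK_OPCODE	0xd4200000u
#define A64_BRK_MASK	0xffe0001fu

/* True if the 32-bit instruction word encodes BRK #imm16. */
static bool a64_is_brk(uint32_t insn)
{
	return (insn & A64_BRK_MASK) == A64_BRK_OPCODE;
}

/* Extract the 16-bit immediate of a BRK instruction. */
static unsigned int a64_brk_imm(uint32_t insn)
{
	return (insn >> 5) & 0xffffu;
}
```

Applied to the last word of the "Code:" line, `a64_is_brk(0xd4200080)` is true and `a64_brk_imm(0xd4200080)` is 4, while the preceding word d65f03c0 (a RET) is not a BRK.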
From analysis of the full memory dump, the issue appears to be related to
the arm64 kprobe single-step/reentry handling. While a kprobe is preparing or
executing its XOL single-step instruction, perf/trace code may run in
the same window. With dwarf callchains enabled, this path may also
access user memory and take a data abort. In addition, another kprobe
may be hit while the first kprobe is still in KPROBE_HIT_SS state.
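The register dump is consistent with a user-stack copy running off the end of a mapped page. The faulting address 0000007ff042e000 (also in x1) is page aligned, and x27 (0000007ff042d5f8) lies in the page just below it. Reading x27 as the start of the user stack region that perf was copying is an assumption made here, not something the log states. The arithmetic is sketched below with hypothetical helper names and the 4k page size reported in the log.

```c
#include <stdint.h>

/* Page size taken from the "4k pages" line in the oops. */
#define PAGE_SIZE_4K	4096ull

/* First address of the page following the one that contains addr. */
static uint64_t next_page_boundary(uint64_t addr)
{
	return (addr & ~(PAGE_SIZE_4K - 1)) + PAGE_SIZE_4K;
}

/* Number of bytes from addr up to the end of its page. */
static uint64_t bytes_to_page_end(uint64_t addr)
{
	return next_page_boundary(addr) - addr;
}
```

With x27 as input, `next_page_boundary(0x7ff042d5f8)` is 0x7ff042e000, the exact faulting address, and `bytes_to_page_end()` is 0xa08. Under the stated assumption, the copy would have succeeded for the rest of the first page and faulted on the first byte of the next one, whose pte is zero in the log.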
This matches the type of issue that was fixed on x86 by the following
commits:
6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on
single-stepping")
This patchset applies the same idea to arm64:
- Patch 1 makes the arm64 kprobe fault handler handle a fault in
KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC is the current
kprobe's XOL instruction. Otherwise, the fault is left to the normal
fault handling path.
- Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
recoverable one-level reentry. The unrecoverable case remains a hit
while already in KPROBE_REENTER.
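The reentry rule in the second item can be summarised as a small decision function. This is a userspace sketch under stated assumptions, not the patch: the enum and `reenter_is_recoverable()` are hypothetical names, and the real code also bumps the missed-hit count and sets up single-stepping for the nested probe.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's kprobe status values. */
enum kp_status {
	KP_HIT_ACTIVE,
	KP_HIT_SS,
	KP_HIT_SSDONE,
	KP_REENTER,
};

/*
 * Decide whether a probe hit that arrives while another probe is still
 * being processed can be handled as a one-level reentry.
 */
static bool reenter_is_recoverable(enum kp_status cur)
{
	switch (cur) {
	case KP_HIT_ACTIVE:
	case KP_HIT_SSDONE:
	case KP_HIT_SS:		/* newly recoverable with patch 2 */
		return true;
	case KP_REENTER:	/* already nested once: still unrecoverable */
	default:
		return false;
	}
}
```

The only behavioural change modelled here is that `KP_HIT_SS` moves from the unrecoverable group to the recoverable one; a hit in `KP_REENTER` is still treated as fatal.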
With both patches applied, we kept the same stress test running for
three days and the crash did not reproduce.
I still have the full dmesg and the full memory dump from the crashing
device. Please let me know if any additional information would be useful.
Thanks,
hupu
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Masami Hiramatsu @ 2026-07-01 13:43 UTC
To: Pu Hu
Cc: Hongyan Xia, Jiazi Li, catalin.marinas@arm.com, naveen@kernel.org,
linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
will@kernel.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org
On Wed, 1 Jul 2026 12:14:54 +0000
Pu Hu <hupu@transsion.com> wrote:
> From: hupu <hupu@transsion.com>
>
> This series fixes two arm64 kprobes issues observed when running
> simpleperf with preemptirq tracepoints and dwarf callchains while a
> kprobe is active on a frequently executed kernel function.
>
> The crash happens in the kprobe debug exception path. While a kprobe is
> preparing or executing its XOL single-step instruction, perf/trace code
> can run in the same window. That code may either take a fault of its own
> or hit another kprobe.
>
> Patch 1 makes kprobe_fault_handler() handle a fault in
> KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
> current kprobe's XOL instruction. Otherwise the fault is left to the
> normal fault handling path.
>
> Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
> recoverable one-level reentry. Only a hit while already in
> KPROBE_REENTER remains unrecoverable.
>
> This follows the same logic as the existing x86 fixes:
> 6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
> 6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")
Good catch!!
The series looks good to me.
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
But it should be reviewed by arm64 maintainers too.
BTW, if you are "Pu Hu", the Signed-off-by tag should be
"Pu Hu <...>" instead of "hupu <...>".
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Pu Hu @ 2026-07-01 13:56 UTC
To: Masami Hiramatsu (Google)
Cc: Hongyan Xia, Jiazi Li, catalin.marinas@arm.com, naveen@kernel.org,
linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
will@kernel.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org
On 7/1/2026 9:43 PM, Masami Hiramatsu wrote:
> Good catch!!
> The series looks good to me.
>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> But it should be reviewed by arm64 maintainers too.
>
> BTW, if you are "Pu Hu", the Signed-off-by tag should be
> "Pu Hu <...>" instead of "hupu <...>".
>
Hi Masami,
Thank you for your reply and Acked-by.
Yes, thanks for pointing this out. I will fix the author name and the
Signed-off-by tags to use a consistent name in the next version of the
patchset.
Thanks,
hupu
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Pu Hu @ 2026-07-02 10:07 UTC
To: Masami Hiramatsu (Google), catalin.marinas@arm.com,
will@kernel.org, naveen@kernel.org, davem@davemloft.net,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Hi maintainers,
I have reproduced the same issue on the latest mainline kernel available
today. The commit I tested is 665159e24674.
Below are the steps I used to reproduce the issue. I hope this can help
with further debugging. The complete test case used in these steps will
be provided in a follow-up email.
Reproduction steps:
1. Build the test case
Please use the test case that I will send in the next email. Depending
on your local environment, the following variables in the Makefile may
need to be adjusted:
CROSS_COMPILE ?= aarch64-dumpstack-linux-gnu-
KERN_DIR ?= $(PWD)/../../output/build-mainline
DEST_PATH ?= $(PWD)/../../output
Then run:
make all
This builds the userspace test program:
fault_stress
and the kprobe module:
kp_folio.ko
2. Boot QEMU
To increase memory pressure, I used only two CPUs and 512 MB of memory
in the QEMU guest:
SMP="-smp 2"
qemu-system-aarch64 -m 512 -cpu cortex-a53 \
-M virt,gic-version=3,its=on,iommu=smmuv3 \
-nographic $SMP -kernel $KERNEL_IMAGE \
-append "nokaslr noinitrd sched_debug root=/dev/vda rootfstype=ext4 rw crashkernel=256M loglevel=8" \
-drive if=none,file=$ROOTFS_IMAGE,id=hd0,format=raw \
-device virtio-blk-device,drive=hd0 \
--fsdev local,id=kmod_dev,path=./output/,security_model=none \
-device virtio-9p-pci,fsdev=kmod_dev,mount_tag=kmod_mount \
-net nic -net tap,ifname=tap0,script=no,downscript=no \
$GDB_DEBUG
3. Run the test in the guest
After the guest has booted, run the following commands.
Allow kernel symbols to be shown:
echo 0 > /proc/sys/kernel/kptr_restrict
Load the kprobe module:
insmod kp_folio.ko
Start the fault stress program:
./fault_stress &
Start stress-ng to add memory pressure:
./stress-ng --vm 2 --vm-bytes 70% --page-in &
Run perf against the fault_stress process. In the command below, 171 is
the PID of fault_stress in my test environment:
./perf record -p 171 -c 1 \
-e preemptirq:preempt_disable \
-e preemptirq:preempt_enable \
--call-graph dwarf \
-o /tmp/perf.data \
-- sleep 5
With the steps above, I can reproduce the crash reliably in my local
QEMU setup. After applying my previously submitted fix, I can no longer
reproduce the issue with the same test.
The crash log is shown below:
[ 173.383321] kp_folio: hit=1564 comm=fault_stress tgid=171 tid=173
[ 173.402940] kp_folio: hit=1565 comm=fault_stress tgid=171 tid=179
[ 173.528342] kp_folio: hit=1566 comm=fault_stress tgid=171 tid=175
[ 173.846895] kp_folio: hit=1567 comm=fault_stress tgid=171 tid=172
[ 174.223031] kp_folio: hit=1568 comm=fault_stress tgid=171 tid=179
[ 174.224419] kp_folio: hit=1569 comm=fault_stress tgid=171 tid=174
[ 174.928471] kp_folio: hit=1570 comm=fault_stress tgid=171 tid=175
[ 174.930916] Unable to handle kernel paging request at virtual address 0000ffffa3592000
[ 174.931068] Mem abort info:
[ 174.931116] ESR = 0x0000000096000007
[ 174.931180] EC = 0x25: DABT (current EL), IL = 32 bits
[ 174.931240] SET = 0, FnV = 0
[ 174.931368] EA = 0, S1PTW = 0
[ 174.931430] FSC = 0x07: level 3 translation fault
[ 174.931490] Data abort info:
[ 174.931540] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 174.931593] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 174.931669] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 174.931762] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000049bf8000
[ 174.931829] [0000ffffa3592000] pgd=0800000049a99403, p4d=0800000049a99403, pud=0800000049ac0403, pmd=0800000049bed403, pte=00000000000047c0
[ 174.932328] Internal error: Oops: 0000000096000007 [#1] SMP
[ 174.939042] Modules linked in: kp_folio(O)
[ 174.942114] CPU: 1 UID: 0 PID: 175 Comm: fault_stress Tainted: G O 7.2.0-rc1-00010-g7679152d724a-dirty #2 PREEMPT
[ 174.945427] Tainted: [O]=OOT_MODULE
[ 174.946006] Hardware name: linux,dummy-virt (DT)
[ 174.947011] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 174.948582] pc : folio_wait_bit_common+0x0/0x320
[ 174.949626] lr : perf_output_sample+0x708/0x968
[ 174.950041] sp : ffff800084b13540
[ 174.950511] x29: ffff800084b13570 x28: ffff000006704260 x27: 0000ffffa3591d08
[ 174.953274] x26: ffff000009a19a80 x25: 0000000000000000 x24: ffff800084b13780
[ 174.953601] x23: 0000000000000ee8 x22: 000000000000b5ef x21: 0000000000001000
[ 174.954003] x20: 0000000000000ee8 x19: ffff800084b135e0 x18: 000000000000000a
[ 174.954262] x17: ffff8000803d1af4 x16: ffff80008036d01c x15: 0000ffffa3591d08
[ 174.954549] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 174.954863] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 174.955315] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000069ce2c8
[ 174.955592] x5 : ffff0000069ceee8 x4 : 0000000000000008 x3 : 0000000000000000
[ 174.956083] x2 : 0000000000000be0 x1 : 0000ffffa3592000 x0 : ffff0000069ce000
[ 174.956838] Call trace:
[ 174.958282] folio_wait_bit_common+0x0/0x320 (P)
[ 174.958618] perf_event_output_forward+0xc0/0x1a8
[ 174.958811] __perf_event_overflow+0x108/0x518
[ 174.959066] perf_swevent_event+0x238/0x260
[ 174.959295] perf_tp_event+0x34c/0x6a0
[ 174.959667] perf_trace_run_bpf_submit+0x8c/0xd0
[ 174.962331] perf_trace_preemptirq_template+0xc4/0x130
[ 174.962644] trace_preempt_on+0x114/0x1e8
[ 174.963019] preempt_count_sub+0x78/0xe0
[ 174.963402] el1_brk64+0x40/0x60
[ 174.963617] el1h_64_sync_handler+0x68/0xb0
[ 174.963817] el1h_64_sync+0x6c/0x70
[ 174.964239] 0xffff8000846c5000 (P)
[ 174.964938] __do_fault+0x44/0x288
[ 174.965452] __handle_mm_fault+0xaf8/0x1a40
[ 174.965815] handle_mm_fault+0xb4/0x420
[ 174.966527] do_page_fault+0x140/0x7b0
[ 174.967398] do_translation_fault+0x4c/0x70
[ 174.968057] do_mem_abort+0x48/0xa0
[ 174.969705] el0_da+0x64/0x290
[ 174.969984] el0t_64_sync_handler+0xd0/0xe8
[ 174.970324] el0t_64_sync+0x198/0x1a0
[ 174.970713] Code: d50323bf d65f03c0 12800140 17fffffc (d4200080)
[ 174.971338] kp_folio: hit=1571 comm=fault_stress tgid=171 tid=174
[ 174.972266] ---[ end trace 0000000000000000 ]---
I will send the complete test case in a follow-up email.
Thanks,
hupu
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Pu Hu @ 2026-07-02 10:09 UTC
To: Masami Hiramatsu (Google), catalin.marinas@arm.com,
will@kernel.org, naveen@kernel.org, davem@davemloft.net,
yang@os.amperecomputing.com, Hongyan Xia, Jiazi Li,
ada.coupriediaz@arm.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Hi maintainers,
As mentioned in my previous email, below is the complete test case I
used to reproduce the arm64 kprobe crash on mainline.
It contains:
- a small kprobe module that probes folio_wait_bit_common()
- a userspace program that repeatedly triggers file-backed page faults
- a Makefile to build both parts
Depending on the local build environment, the following variables in the
Makefile may need to be adjusted:
CROSS_COMPILE
KERN_DIR
DEST_PATH
Thanks,
Pu Hu
---
diff --git a/misc/kprobe/Makefile b/misc/kprobe/Makefile
new file mode 100755
index 0000000..14c00c0
--- /dev/null
+++ b/misc/kprobe/Makefile
@@ -0,0 +1,36 @@
+PWD := $(shell pwd)
+ARCH ?= arm64
+CROSS_COMPILE ?= aarch64-dumpstack-linux-gnu-
+KERN_DIR ?= $(PWD)/../../output/build-mainline
+DEST_PATH ?= $(PWD)/../../output
+Q := @
+
+UNIT_TEST := fault_stress
+UNIT_TEST_SRC := fault_stress.c
+
+KP_MOD := kp_folio
+obj-m := $(KP_MOD).o
+
+USER_CFLAGS := -static -g -O0 -fno-omit-frame-pointer -fasynchronous-unwind-tables
+USER_LIBS := -lm -lpthread
+EXTRA_CFLAGS += -I$(KERN_DIR)
+
+.PHONY: all modules user install clean
+
+all: modules user install
+
+modules:
+ $(Q)$(MAKE) -C $(KERN_DIR) M=$(PWD) EXTRA_CFLAGS="$(EXTRA_CFLAGS)" ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) modules
+
+user:
+ $(Q)$(CROSS_COMPILE)gcc $(USER_CFLAGS) $(UNIT_TEST_SRC) -o $(UNIT_TEST) $(USER_LIBS)
+
+install:
+ $(Q)mkdir -p $(DEST_PATH)
+ $(Q)cp -f *.ko $(DEST_PATH)/
+ $(Q)cp -f $(UNIT_TEST) $(DEST_PATH)/
+
+clean:
+ $(Q)$(MAKE) -C $(KERN_DIR) M=$(PWD) clean
+ $(Q)rm -f $(UNIT_TEST)
+ $(Q)rm -f $(DEST_PATH)/$(UNIT_TEST) $(DEST_PATH)/*.ko
diff --git a/misc/kprobe/fault_stress.c b/misc/kprobe/fault_stress.c
new file mode 100755
index 0000000..10150ff
--- /dev/null
+++ b/misc/kprobe/fault_stress.c
@@ -0,0 +1,96 @@
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#define FILE_SIZE (256UL * 1024 * 1024)
+#define NR_THREADS 8
+
+static void deep_call(int n)
+{
+ volatile char buf[4096];
+
+ memset((void *)buf, n, sizeof(buf));
+
+ if (n > 0)
+ deep_call(n - 1);
+ else
+ sched_yield();
+}
+
+static void *worker(void *arg)
+{
+ const char *path = arg;
+ int fd;
+ char *map;
+ unsigned long i;
+ volatile unsigned long sum = 0;
+
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ perror("open");
+ return NULL;
+ }
+
+ map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
+ if (map == MAP_FAILED) {
+ perror("mmap");
+ close(fd);
+ return NULL;
+ }
+
+ for (;;) {
+ /*
+ * Drop the pages backing this mapping from the current process.
+ * Subsequent accesses are more likely to trigger file-backed
+ * page faults again.
+ */
+ madvise(map, FILE_SIZE, MADV_DONTNEED);
+
+ for (i = 0; i < FILE_SIZE; i += 4096 * 17) {
+ sum += map[i];
+ deep_call(64);
+ }
+ }
+
+ munmap(map, FILE_SIZE);
+ close(fd);
+ return NULL;
+}
+
+int main(void)
+{
+ pthread_t th[NR_THREADS];
+ const char *path = "/tmp/fault_stress_file";
+ int fd;
+ int i;
+
+ fd = open(path, O_CREAT | O_RDWR, 0644);
+ if (fd < 0) {
+ perror("open file");
+ return 1;
+ }
+
+ if (ftruncate(fd, FILE_SIZE) < 0) {
+ perror("ftruncate");
+ return 1;
+ }
+
+ close(fd);
+
+ for (i = 0; i < NR_THREADS; i++)
+ pthread_create(&th[i], NULL, worker, (void *)path);
+
+ for (i = 0; i < NR_THREADS; i++)
+ pthread_join(th[i], NULL);
+
+ return 0;
+}
+
diff --git a/misc/kprobe/kp_folio.c b/misc/kprobe/kp_folio.c
new file mode 100755
index 0000000..c8f3e1d
--- /dev/null
+++ b/misc/kprobe/kp_folio.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/sched.h>
+#include <linux/atomic.h>
+#include <linux/ratelimit.h>
+
+static atomic64_t kp_hit_count = ATOMIC64_INIT(0);
+
+static int folio_wait_bit_common_handler(
+ struct kprobe *p,
+ struct pt_regs *regs)
+{
+ unsigned long hit;
+
+ hit = atomic64_inc_return(&kp_hit_count);
+
+ pr_info("kp_folio: hit=%lu comm=%s tgid=%d tid=%d\n",
+ hit, current->comm, current->tgid, current->pid);
+
+ return 0;
+}
+
+static struct kprobe kp_folio_wait_bit_common = {
+ .symbol_name = "folio_wait_bit_common",
+ .pre_handler = folio_wait_bit_common_handler,
+};
+
+static int __init kp_folio_init(void)
+{
+ int ret;
+
+ ret = register_kprobe(&kp_folio_wait_bit_common);
+ if (ret < 0) {
+ pr_err("kp_folio: register_kprobe failed, ret=%d\n", ret);
+ return ret;
+ }
+
+ pr_info("kp_folio: kprobe registered at %pS, addr=%px\n",
+ kp_folio_wait_bit_common.addr,
+ kp_folio_wait_bit_common.addr);
+
+ return 0;
+}
+
+static void __exit kp_folio_exit(void)
+{
+ unregister_kprobe(&kp_folio_wait_bit_common);
+
+ pr_info("kp_folio: kprobe unregistered, total hits=%lld\n",
+ atomic64_read(&kp_hit_count));
+}
+
+module_init(kp_folio_init);
+module_exit(kp_folio_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("hupu <hupu@transsion.com>");
+MODULE_DESCRIPTION("simple kprobe reproducer for folio_wait_bit_common");
* Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling
From: Masami Hiramatsu @ 2026-07-04 14:47 UTC
To: Pu Hu
Cc: Hongyan Xia, Jiazi Li, catalin.marinas@arm.com, naveen@kernel.org,
linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
will@kernel.org, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org,
linux-trace-kernel@vger.kernel.org
Hi Pu Hu,
Could you rebase this series on top of the arm64 tree
(git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git)
and update your Signed-off-by tag to use your name?
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>