* [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry @ 2025-11-06 10:49 Khaja Hussain Shaik Khaji 2025-11-11 10:26 ` Mark Rutland 2026-02-17 13:38 ` [PATCH v2 0/2] arm64: kprobes: fix XOL preemption window Khaja Hussain Shaik Khaji 0 siblings, 2 replies; 18+ messages in thread From: Khaja Hussain Shaik Khaji @ 2025-11-06 10:49 UTC (permalink / raw) To: linux-arm-kernel Cc: kprobes, linux-kernel, will, catalin.marinas, masami.hiramatsu, khaja.khaji On arm64 with branch protection, functions typically begin with a BTI (Branch Target Identification) landing pad. Today the decoder treats BTI as requiring out-of-line single-step (XOL), allocating a slot and placing an SS-BRK. Under SMP this leaves a small window before DAIF is masked where an asynchronous exception or nested probe can interleave and clear current_kprobe, resulting in an SS-BRK panic. Handle BTI like NOP in the decoder and simulate it (advance PC by one instruction). This avoids XOL/SS-BRK at these sites and removes the single-step window, while preserving correctness for kprobes since BTI’s branch-target enforcement has no program-visible effect in this EL1 exception context. In practice BTI is most commonly observed at function entry, so the main effect of this change is to eliminate entry-site single-stepping. Other instructions and non-entry sites are unaffected. 
Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com> --- arch/arm64/include/asm/insn.h | 5 ----- arch/arm64/kernel/probes/decode-insn.c | 9 ++++++--- arch/arm64/kernel/probes/simulate-insn.c | 1 + 3 files changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h index 18c7811774d3..7e80cc1f0c3d 100644 --- a/arch/arm64/include/asm/insn.h +++ b/arch/arm64/include/asm/insn.h @@ -452,11 +452,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn) case AARCH64_INSN_HINT_PACIASP: case AARCH64_INSN_HINT_PACIBZ: case AARCH64_INSN_HINT_PACIBSP: - case AARCH64_INSN_HINT_BTI: - case AARCH64_INSN_HINT_BTIC: - case AARCH64_INSN_HINT_BTIJ: - case AARCH64_INSN_HINT_BTIJC: - case AARCH64_INSN_HINT_NOP: return true; default: return false; diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c index 6438bf62e753..7ce2cf5e21d3 100644 --- a/arch/arm64/kernel/probes/decode-insn.c +++ b/arch/arm64/kernel/probes/decode-insn.c @@ -79,10 +79,13 @@ enum probe_insn __kprobes arm_probe_decode_insn(u32 insn, struct arch_probe_insn *api) { /* - * While 'nop' instruction can execute in the out-of-line slot, - * simulating them in breakpoint handling offers better performance. + * NOP and BTI (Branch Target Identification) have no program-visible side + * effects for kprobes purposes. Simulate them to avoid XOL/SS-BRK and the + * small single-step window. BTI's branch-target enforcement semantics are + * irrelevant in this EL1 kprobe context, so advancing PC by one insn is + * sufficient here. 
*/ - if (aarch64_insn_is_nop(insn)) { + if (aarch64_insn_is_nop(insn) || aarch64_insn_is_bti(insn)) { api->handler = simulate_nop; return INSN_GOOD_NO_SLOT; } diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c index 4c6d2d712fbd..b83312cb70ba 100644 --- a/arch/arm64/kernel/probes/simulate-insn.c +++ b/arch/arm64/kernel/probes/simulate-insn.c @@ -200,5 +200,6 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs) void __kprobes simulate_nop(u32 opcode, long addr, struct pt_regs *regs) { + /* Also used as BTI simulator: both just advance PC by one insn. */ arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
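[Editor's aside, not part of the patch or thread: the instructions this diff routes to simulate_nop all live in the A64 HINT space, and the distinction the decoder must make can be checked mechanically. A minimal userspace sketch, assuming the standard A64 encoding HINT = 0xd503201f | (imm7 << 5), with NOP at imm7 = 0 and the BTI variants at imm7 = 32/34/36/38; the helper names here are illustrative, not the kernel's.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A64 HINT space: bits[31:12] = 0xd5032, bits[4:0] = 0b11111, imm7 in bits[11:5]. */
#define HINT_MASK 0xFFFFF01FU
#define HINT_VAL  0xD503201FU

static bool is_hint(uint32_t insn)
{
	return (insn & HINT_MASK) == HINT_VAL;
}

static unsigned int hint_imm7(uint32_t insn)
{
	return (insn >> 5) & 0x7F;
}

/* BTI variants occupy imm7 = 32 (BTI), 34 (BTI c), 36 (BTI j), 38 (BTI jc). */
static bool is_bti(uint32_t insn)
{
	if (!is_hint(insn))
		return false;
	switch (hint_imm7(insn)) {
	case 32: /* BTI    */
	case 34: /* BTI c  */
	case 36: /* BTI j  */
	case 38: /* BTI jc */
		return true;
	default:
		return false;
	}
}
```

So a NOP (0xd503201f) and a BTI (0xd503241f) differ only in the immediate field, which is why the patch can point both at the same one-instruction-skip handler; the review discussion below turns on whether discarding that immediate's semantics is acceptable.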
* Re: [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry 2025-11-06 10:49 [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry Khaja Hussain Shaik Khaji @ 2025-11-11 10:26 ` Mark Rutland 2025-11-12 12:17 ` Mark Rutland 2026-02-17 13:38 ` [PATCH v2 0/2] arm64: kprobes: fix XOL preemption window Khaja Hussain Shaik Khaji 1 sibling, 1 reply; 18+ messages in thread From: Mark Rutland @ 2025-11-11 10:26 UTC (permalink / raw) To: Khaja Hussain Shaik Khaji Cc: linux-arm-kernel, kprobes, linux-kernel, will, catalin.marinas, masami.hiramatsu On Thu, Nov 06, 2025 at 04:19:55PM +0530, Khaja Hussain Shaik Khaji wrote: > On arm64 with branch protection, functions typically begin with a BTI > (Branch Target Identification) landing pad. Today the decoder treats BTI > as requiring out-of-line single-step (XOL), allocating a slot and placing > an SS-BRK. Under SMP this leaves a small window before DAIF is masked > where an asynchronous exception or nested probe can interleave and clear > current_kprobe, resulting in an SS-BRK panic. If you can take an exception here, and current_kprobe gets cleared, then XOL stepping is broken in general, but just for BTI. > Handle BTI like NOP in the decoder and simulate it (advance PC by one > instruction). This avoids XOL/SS-BRK at these sites and removes the > single-step window, while preserving correctness for kprobes since BTI’s > branch-target enforcement has no program-visible effect in this EL1 > exception context. One of the reasons for doing this out-of-line is that we should be able to mark the XOL slot as a guarded page, and get the correct BTI behaviour. It looks like we don't currently do that, which is a bug. Just skipping the BTI isn't right; that throws away the BTI target check. > In practice BTI is most commonly observed at function entry, so the main > effect of this change is to eliminate entry-site single-stepping. 
Other > instructions and non-entry sites are unaffected. > > Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com> > --- > arch/arm64/include/asm/insn.h | 5 ----- > arch/arm64/kernel/probes/decode-insn.c | 9 ++++++--- > arch/arm64/kernel/probes/simulate-insn.c | 1 + > 3 files changed, 7 insertions(+), 8 deletions(-) > > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h > index 18c7811774d3..7e80cc1f0c3d 100644 > --- a/arch/arm64/include/asm/insn.h > +++ b/arch/arm64/include/asm/insn.h > @@ -452,11 +452,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn) > case AARCH64_INSN_HINT_PACIASP: > case AARCH64_INSN_HINT_PACIBZ: > case AARCH64_INSN_HINT_PACIBSP: > - case AARCH64_INSN_HINT_BTI: > - case AARCH64_INSN_HINT_BTIC: > - case AARCH64_INSN_HINT_BTIJ: > - case AARCH64_INSN_HINT_BTIJC: > - case AARCH64_INSN_HINT_NOP: > return true; > default: > return false; > diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c > index 6438bf62e753..7ce2cf5e21d3 100644 > --- a/arch/arm64/kernel/probes/decode-insn.c > +++ b/arch/arm64/kernel/probes/decode-insn.c > @@ -79,10 +79,13 @@ enum probe_insn __kprobes > arm_probe_decode_insn(u32 insn, struct arch_probe_insn *api) > { > /* > - * While 'nop' instruction can execute in the out-of-line slot, > - * simulating them in breakpoint handling offers better performance. > + * NOP and BTI (Branch Target Identification) have no program‑visible side > + * effects for kprobes purposes. Simulate them to avoid XOL/SS‑BRK and the > + * small single‑step window. BTI’s branch‑target enforcement semantics are > + * irrelevant in this EL1 kprobe context, so advancing PC by one insn is > + * sufficient here. 
> */ > - if (aarch64_insn_is_nop(insn)) { > + if (aarch64_insn_is_nop(insn) || aarch64_insn_is_bti(insn)) { > api->handler = simulate_nop; > return INSN_GOOD_NO_SLOT; > } I'm not necessarily opposed to emulating the BTI, but: (a) The BTI should not be emulated as a NOP. I am not keen on simulating the BTI exception in software, and would strongly prefer that's handled by HW (e.g. in the XOL slot). (b) As above, it sounds like this is bodging around a more general problem. We must solve that more general problem. > diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c > index 4c6d2d712fbd..b83312cb70ba 100644 > --- a/arch/arm64/kernel/probes/simulate-insn.c > +++ b/arch/arm64/kernel/probes/simulate-insn.c > @@ -200,5 +200,6 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs) > void __kprobes > simulate_nop(u32 opcode, long addr, struct pt_regs *regs) > { > + /* Also used as BTI simulator: both just advance PC by one insn. */ > arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE); > } This comment should go. Mark. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry 2025-11-11 10:26 ` Mark Rutland @ 2025-11-12 12:17 ` Mark Rutland 0 siblings, 0 replies; 18+ messages in thread From: Mark Rutland @ 2025-11-12 12:17 UTC (permalink / raw) To: Khaja Hussain Shaik Khaji Cc: linux-arm-kernel, kprobes, linux-kernel, will, catalin.marinas, masami.hiramatsu On Tue, Nov 11, 2025 at 10:26:44AM +0000, Mark Rutland wrote: > On Thu, Nov 06, 2025 at 04:19:55PM +0530, Khaja Hussain Shaik Khaji wrote: > > On arm64 with branch protection, functions typically begin with a BTI > > (Branch Target Identification) landing pad. Today the decoder treats BTI > > as requiring out-of-line single-step (XOL), allocating a slot and placing > > an SS-BRK. Under SMP this leaves a small window before DAIF is masked > > where an asynchronous exception or nested probe can interleave and clear > > current_kprobe, resulting in an SS-BRK panic. > > If you can take an exception here, and current_kprobe gets cleared, then > XOL stepping is broken in general, but just for BTI. Sorry, I typo'd the above. That should say: If you can take an exception here, and current_kprobe gets cleared, then XOL stepping is broken in general, *not* just for BTI. I took a look at the exception entry code, and AFAICT DAIF is not relevant. Upon exception entry, HW will mask all DAIF exceptions, and we don't unmask any of those while handling an EL1 BRK. Given that, IIUC the only way this can happen is if we can place a kprobe on something used during kprobe handling (since BRK exceptions aren't masked by DAIF). I am certain this is possible, and that kprobes isn't generally safe; the existing __kprobes annotations are inadequate and I don't think we can make kprobes generally sound without a significant rework (e.g. to make it noinstr-safe). Can you share any details on how you triggered this? e.g. what functions you had kprobes on, whether you used any specific tooling? Mark. 
> > Handle BTI like NOP in the decoder and simulate it (advance PC by one > > instruction). This avoids XOL/SS-BRK at these sites and removes the > > single-step window, while preserving correctness for kprobes since BTI’s > > branch-target enforcement has no program-visible effect in this EL1 > > exception context. > > One of the reasons for doing this out-of-line is that we should be able > to mark the XOL slot as a guarded page, and get the correct BTI > behaviour. It looks like we don't currently do that, which is a bug. > > Just skipping the BTI isn't right; that throws away the BTI target > check. > > > In practice BTI is most commonly observed at function entry, so the main > > effect of this change is to eliminate entry-site single-stepping. Other > > instructions and non-entry sites are unaffected. > > > > Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com> > > --- > > arch/arm64/include/asm/insn.h | 5 ----- > > arch/arm64/kernel/probes/decode-insn.c | 9 ++++++--- > > arch/arm64/kernel/probes/simulate-insn.c | 1 + > > 3 files changed, 7 insertions(+), 8 deletions(-) > > > > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h > > index 18c7811774d3..7e80cc1f0c3d 100644 > > --- a/arch/arm64/include/asm/insn.h > > +++ b/arch/arm64/include/asm/insn.h > > @@ -452,11 +452,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn) > > case AARCH64_INSN_HINT_PACIASP: > > case AARCH64_INSN_HINT_PACIBZ: > > case AARCH64_INSN_HINT_PACIBSP: > > - case AARCH64_INSN_HINT_BTI: > > - case AARCH64_INSN_HINT_BTIC: > > - case AARCH64_INSN_HINT_BTIJ: > > - case AARCH64_INSN_HINT_BTIJC: > > - case AARCH64_INSN_HINT_NOP: > > return true; > > default: > > return false; > > diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c > > index 6438bf62e753..7ce2cf5e21d3 100644 > > --- a/arch/arm64/kernel/probes/decode-insn.c > > +++ b/arch/arm64/kernel/probes/decode-insn.c > > @@ -79,10 
+79,13 @@ enum probe_insn __kprobes > > arm_probe_decode_insn(u32 insn, struct arch_probe_insn *api) > > { > > /* > > - * While 'nop' instruction can execute in the out-of-line slot, > > - * simulating them in breakpoint handling offers better performance. > > + * NOP and BTI (Branch Target Identification) have no program‑visible side > > + * effects for kprobes purposes. Simulate them to avoid XOL/SS‑BRK and the > > + * small single‑step window. BTI’s branch‑target enforcement semantics are > > + * irrelevant in this EL1 kprobe context, so advancing PC by one insn is > > + * sufficient here. > > */ > > - if (aarch64_insn_is_nop(insn)) { > > + if (aarch64_insn_is_nop(insn) || aarch64_insn_is_bti(insn)) { > > api->handler = simulate_nop; > > return INSN_GOOD_NO_SLOT; > > } > > I'm not necessarily opposed to emulating the BTI, but: > > (a) The BTI should not be emulated as a NOP. I am not keen on simulating > the BTI exception in software, and would strongly prefer that's > handled by HW (e.g. in the XOL slot). > > (b) As above, it sounds like this is bodging around a more general > problem. We must solve that more general problem. > > > diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c > > index 4c6d2d712fbd..b83312cb70ba 100644 > > --- a/arch/arm64/kernel/probes/simulate-insn.c > > +++ b/arch/arm64/kernel/probes/simulate-insn.c > > @@ -200,5 +200,6 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs) > > void __kprobes > > simulate_nop(u32 opcode, long addr, struct pt_regs *regs) > > { > > + /* Also used as BTI simulator: both just advance PC by one insn. */ > > arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE); > > } > > This comment should go. > > Mark. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 0/2] arm64: kprobes: fix XOL preemption window 2025-11-06 10:49 [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry Khaja Hussain Shaik Khaji 2025-11-11 10:26 ` Mark Rutland @ 2026-02-17 13:38 ` Khaja Hussain Shaik Khaji 2026-02-17 13:38 ` [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step Khaja Hussain Shaik Khaji 2026-02-17 13:38 ` [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list Khaja Hussain Shaik Khaji 1 sibling, 2 replies; 18+ messages in thread From: Khaja Hussain Shaik Khaji @ 2026-02-17 13:38 UTC (permalink / raw) To: linux-arm-kernel Cc: mark.rutland, catalin.marinas, dev.jain, linux-kernel, yang, linux-arm-msm, will, mhiramat Hi Mark, Thanks for the detailed analysis. You're right that this is not BTI-specific. The underlying issue is that XOL execution assumes per-CPU kprobe state remains intact across exception return, which can be violated if execution is preempted or migrated during the XOL window. This v2 series addresses the root cause of kprobe crashes that the previous BTI workaround addressed only indirectly: disable preemption across the XOL instruction and re-enable it in the SS-BRK handler. This ensures the XOL/SS-BRK pair executes on the same CPU and avoids corruption of per-CPU kprobe state. Regarding triggering: this was observed with kretprobes during long stability runs (800+ hours on dwc3 paths), where XOL execution may be preempted or migrated before the SS-BRK is handled, resulting in incorrect per-CPU kprobe state. This series leaves BTI handling unchanged and avoids emulating BTI as NOP. Khaja Hussain Shaik Khaji (2): arm64: kprobes: disable preemption across XOL single-step arm64: insn: drop NOP from steppable hint list arch/arm64/include/asm/insn.h | 1 - arch/arm64/kernel/probes/kprobes.c | 13 +++++++++++++ 2 files changed, 13 insertions(+), 1 deletion(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step 2026-02-17 13:38 ` [PATCH v2 0/2] arm64: kprobes: fix XOL preemption window Khaja Hussain Shaik Khaji @ 2026-02-17 13:38 ` Khaja Hussain Shaik Khaji 2026-02-17 16:55 ` Mark Rutland 2026-02-17 13:38 ` [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list Khaja Hussain Shaik Khaji 1 sibling, 1 reply; 18+ messages in thread From: Khaja Hussain Shaik Khaji @ 2026-02-17 13:38 UTC (permalink / raw) To: linux-arm-kernel Cc: mark.rutland, catalin.marinas, dev.jain, linux-kernel, yang, linux-arm-msm, will, mhiramat On arm64, non-emulatable kprobes instructions execute out-of-line (XOL) after returning from the initial debug exception. The XOL instruction runs in normal kernel context, while kprobe state is maintained per-CPU. If the task is preempted or migrates during the XOL window, the subsequent SS-BRK exception may be handled on a different CPU, corrupting per-CPU kprobe state and preventing correct recovery. Disable preemption across the XOL instruction and re-enable it in the SS-BRK handler to prevent migration until control returns to the kprobe handler. Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com> --- arch/arm64/kernel/probes/kprobes.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index 43a0361a8bf0..d8a70c456543 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -227,6 +227,14 @@ static void __kprobes setup_singlestep(struct kprobe *p, kprobes_save_local_irqflag(kcb, regs); instruction_pointer_set(regs, slot); + + /* + * Disable preemption across the out-of-line (XOL) instruction. + * The XOL instruction executes in normal kernel context and + * kprobe state is per-CPU. 
+ */ + preempt_disable(); + } else { /* insn simulation */ arch_simulate_insn(p, regs); @@ -363,6 +371,11 @@ kprobe_ss_brk_handler(struct pt_regs *regs, unsigned long esr) kprobes_restore_local_irqflag(kcb, regs); post_kprobe_handler(cur, kcb, regs); + /* + * Re-enable preemption after completing the XOL instruction. + */ + preempt_enable_no_resched(); + return DBG_HOOK_HANDLED; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
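[Editor's aside, not part of the thread: the hazard this v2 commit message asserts — that the SS-BRK could be handled on a different CPU from the one that armed the probe — can be modelled in a few lines of userspace C. All names here (NR_CPUS, brk_handler, ss_brk_handler, the array standing in for per-CPU current_kprobe) are illustrative stand-ins, not the kernel's symbols; the review that follows questions whether this migration can actually occur.]

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

struct kprobe { const char *name; };

/* Stand-in for per-CPU current_kprobe (this_cpu accessors in the kernel). */
static struct kprobe *per_cpu_current_kprobe[NR_CPUS];

static struct kprobe demo_probe = { "demo" };

/* Initial BRK handler: record the active probe on the CPU that took the trap. */
static void brk_handler(int cpu)
{
	per_cpu_current_kprobe[cpu] = &demo_probe;
}

/*
 * SS-BRK handler: needs the state recorded by brk_handler() on the SAME
 * CPU. Returns 0 on success, -1 for the "missing per-CPU state" failure
 * mode the commit message describes.
 */
static int ss_brk_handler(int cpu)
{
	if (!per_cpu_current_kprobe[cpu])
		return -1;
	per_cpu_current_kprobe[cpu] = NULL;
	return 0;
}
```

If the two handlers run on the same CPU the pairing works; if the task could migrate between them, the second handler would find no recorded probe. The preempt_disable()/preempt_enable_no_resched() pair in the patch is meant to pin that pairing to one CPU.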
* Re: [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step 2026-02-17 13:38 ` [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step Khaja Hussain Shaik Khaji @ 2026-02-17 16:55 ` Mark Rutland 2026-02-23 16:07 ` Masami Hiramatsu ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Mark Rutland @ 2026-02-17 16:55 UTC (permalink / raw) To: Khaja Hussain Shaik Khaji Cc: catalin.marinas, dev.jain, linux-kernel, yang, linux-arm-msm, will, linux-arm-kernel, mhiramat On Tue, Feb 17, 2026 at 07:08:54PM +0530, Khaja Hussain Shaik Khaji wrote: > On arm64, non-emulatable kprobes instructions execute out-of-line (XOL) > after returning from the initial debug exception. The XOL instruction > runs in normal kernel context, while kprobe state is maintained per-CPU. The XOL instruction runs in a context with all DAIF bits set (see kprobes_save_local_irqflag() and kprobes_restore_local_irqflag()), so not quite a regular kernel context. > If the task is preempted or migrates during the XOL window, the subsequent > SS-BRK exception may be handled on a different CPU, corrupting per-CPU > kprobe state and preventing correct recovery. I think we need a better explanation of this. Since DAIF is masked, we won't take an IRQ to preempt during the actual XOL execution. AFAICT we *could* explicitly preempt/schedule in C code around the XOL execution. However, AFAICT that'd equally apply to other architectures, and on x86 they *removed* the preempt count manipulation in commit: 2bbda764d720aaca ("kprobes/x86: Do not disable preempt on int3 path") ... so it looks like there's a wider potential problem here. Can you please share an example failure that you have seen? ... and how you triggered it (e.g. is this a plain kprobe, something with bpf, etc). I reckon you could hack a warning into schedule() (or cond_resched(), etc) that detects when there's an active XOL slot, so that we can get the full backtrace. 
> Disable preemption across the XOL instruction and re-enable it in the > SS-BRK handler to prevent migration until control returns to the kprobe > handler. This might work, but without some more detail I'm not certain this is sufficient, and I believe other architectures are likely affected by the same problem. Thanks, Mark. > > Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com> > --- > arch/arm64/kernel/probes/kprobes.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c > index 43a0361a8bf0..d8a70c456543 100644 > --- a/arch/arm64/kernel/probes/kprobes.c > +++ b/arch/arm64/kernel/probes/kprobes.c > @@ -227,6 +227,14 @@ static void __kprobes setup_singlestep(struct kprobe *p, > > kprobes_save_local_irqflag(kcb, regs); > instruction_pointer_set(regs, slot); > + > + /* > + * Disable preemption across the out-of-line (XOL) instruction. > + * The XOL instruction executes in normal kernel context and > + * kprobe state is per-CPU. > + */ > + preempt_disable(); > + > } else { > /* insn simulation */ > arch_simulate_insn(p, regs); > @@ -363,6 +371,11 @@ kprobe_ss_brk_handler(struct pt_regs *regs, unsigned long esr) > kprobes_restore_local_irqflag(kcb, regs); > post_kprobe_handler(cur, kcb, regs); > > + /* > + * Re-enable preemption after completing the XOL instruction. > + */ > + preempt_enable_no_resched(); > + > return DBG_HOOK_HANDLED; > } > > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 18+ messages in thread
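[Editor's aside, not part of the thread: the diagnostic suggested above — warn from the scheduler when an XOL slot is still pending — reduces to a flag set when single-stepping is armed and cleared in the SS-BRK handler. A trivial userspace model, with all names (xol_pending, the *_model functions) hypothetical rather than kernel symbols:]

```c
#include <assert.h>
#include <stdbool.h>

static bool xol_pending;   /* set between arming the XOL step and its SS-BRK */
static int  xol_warnings;  /* counts "scheduled mid-XOL" events */

static void setup_singlestep_model(void) { xol_pending = true; }
static void ss_brk_handler_model(void)   { xol_pending = false; }

/* The suggested check, as it might sit in schedule()/cond_resched(). */
static void schedule_model(void)
{
	if (xol_pending)
		xol_warnings++;   /* in the kernel this would be a WARN with backtrace */
}
```

The point of the suggestion is evidentiary: if the warning ever fires, its backtrace shows exactly which path scheduled inside the XOL window, which is the detail the reviewers keep asking for.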
* Re: [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step 2026-02-17 16:55 ` Mark Rutland @ 2026-02-23 16:07 ` Masami Hiramatsu 2026-03-02 10:19 ` Khaja Hussain Shaik Khaji 2026-03-02 10:53 ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Khaja Hussain Shaik Khaji 2 siblings, 0 replies; 18+ messages in thread From: Masami Hiramatsu @ 2026-02-23 16:07 UTC (permalink / raw) To: Mark Rutland Cc: catalin.marinas, dev.jain, linux-kernel, mhiramat, linux-arm-msm, yang, will, linux-arm-kernel, Khaja Hussain Shaik Khaji On Tue, 17 Feb 2026 16:55:44 +0000 Mark Rutland <mark.rutland@arm.com> wrote: > On Tue, Feb 17, 2026 at 07:08:54PM +0530, Khaja Hussain Shaik Khaji wrote: > > On arm64, non-emulatable kprobes instructions execute out-of-line (XOL) > > after returning from the initial debug exception. The XOL instruction > > runs in normal kernel context, while kprobe state is maintained per-CPU. > > The XOL instruction runs in a context with all DAIF bits set (see > kprobes_save_local_irqflag() and kprobes_restore_local_irqflag()), so > not quite a regular kernel context. > > > If the task is preempted or migrates during the XOL window, the subsequent > > SS-BRK exception may be handled on a different CPU, corrupting per-CPU > > kprobe state and preventing correct recovery. > > I think we need a better explanation of this. > > Since DAIF is masked, we won't take an IRQ to preempt during the actual > XOL execution. > > AFAICT we *could* explicitly preempt/schedule in C code around the XOL > execution. However, AFAICT that'd equally apply to other architectures, > and on x86 they *removed* the preempt count manipulation in commit: > > 2bbda764d720aaca ("kprobes/x86: Do not disable preempt on int3 path") > > ... so it looks like there's a wider potential problem here. > > Can you please share an example failure that you have seen? .. and how > you triggered it (e.g. is this a plain kprobe, something with bpf, etc). 
Yeah, this is important to know. Did it really happen on the single stepping? or in user's handler function? > > I reckon you could hack a warning something into schedule() (or > cond_resched(), etc) that detects when there's an active XOL slot, so > that we can get the full backtrace. Sounds good way to show it. Thank you, > > > Disable preemption across the XOL instruction and re-enable it in the > > SS-BRK handler to prevent migration until control returns to the kprobe > > handler. > > This might work, but without some more detail I'm not certain this is > sufficient, and I believe other architectures are likely affected by the > same problem. > > Thanks, > Mark. > > > > > Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com> > > --- > > arch/arm64/kernel/probes/kprobes.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c > > index 43a0361a8bf0..d8a70c456543 100644 > > --- a/arch/arm64/kernel/probes/kprobes.c > > +++ b/arch/arm64/kernel/probes/kprobes.c > > @@ -227,6 +227,14 @@ static void __kprobes setup_singlestep(struct kprobe *p, > > > > kprobes_save_local_irqflag(kcb, regs); > > instruction_pointer_set(regs, slot); > > + > > + /* > > + * Disable preemption across the out-of-line (XOL) instruction. > > + * The XOL instruction executes in normal kernel context and > > + * kprobe state is per-CPU. > > + */ > > + preempt_disable(); > > + > > } else { > > /* insn simulation */ > > arch_simulate_insn(p, regs); > > @@ -363,6 +371,11 @@ kprobe_ss_brk_handler(struct pt_regs *regs, unsigned long esr) > > kprobes_restore_local_irqflag(kcb, regs); > > post_kprobe_handler(cur, kcb, regs); > > > > + /* > > + * Re-enable preemption after completing the XOL instruction. 
> > + */ > > + preempt_enable_no_resched(); > > + > > return DBG_HOOK_HANDLED; > > } > > > > -- > > 2.34.1 > > > -- Masami Hiramatsu (Google) <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step 2026-02-17 16:55 ` Mark Rutland 2026-02-23 16:07 ` Masami Hiramatsu @ 2026-03-02 10:19 ` Khaja Hussain Shaik Khaji 2026-03-02 10:23 ` Mark Rutland 2026-03-02 10:53 ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Khaja Hussain Shaik Khaji 2 siblings, 1 reply; 18+ messages in thread From: Khaja Hussain Shaik Khaji @ 2026-03-02 10:19 UTC (permalink / raw) To: mark.rutland Cc: linux-arm-msm, dev.jain, linux-kernel, mhiramat, catalin.marinas, will, linux-arm-kernel, yang On Tue, Feb 17, 2026 at 04:55:44PM +0000, Mark Rutland wrote: > Since DAIF is masked, we won't take an IRQ to preempt during XOL. > Can you please share an example failure that you have seen? > I believe other architectures are likely affected by the same problem. Thank you for the review. You were correct on all counts. I confirmed the issue is not related to scheduling or preemption, and the v1/v2 approach was based on an incorrect assumption. I’m dropping that line of reasoning. I’ve since identified the actual root cause and have a new fix ready, which I’ll send shortly as v3. Thanks, Khaja ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step 2026-03-02 10:19 ` Khaja Hussain Shaik Khaji @ 2026-03-02 10:23 ` Mark Rutland 0 siblings, 0 replies; 18+ messages in thread From: Mark Rutland @ 2026-03-02 10:23 UTC (permalink / raw) To: Khaja Hussain Shaik Khaji Cc: linux-arm-msm, dev.jain, linux-kernel, mhiramat, catalin.marinas, will, linux-arm-kernel, yang On Mon, Mar 02, 2026 at 03:49:05PM +0530, Khaja Hussain Shaik Khaji wrote: > On Tue, Feb 17, 2026 at 04:55:44PM +0000, Mark Rutland wrote: > > Since DAIF is masked, we won't take an IRQ to preempt during XOL. > > Can you please share an example failure that you have seen? > > I believe other architectures are likely affected by the same problem. > > Thank you for the review. You were correct on all counts. > > I confirmed the issue is not related to scheduling or preemption, and the > v1/v2 approach was based on an incorrect assumption. I’m dropping that > line of reasoning. > > I’ve since identified the actual root cause and have a new fix ready, > which I’ll send shortly as v3. Ok. As above, *please* include an observed failure in the commit message. It will be the first thing we ask for otherwise. Mark. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during 2026-02-17 16:55 ` Mark Rutland 2026-02-23 16:07 ` Masami Hiramatsu 2026-03-02 10:19 ` Khaja Hussain Shaik Khaji @ 2026-03-02 10:53 ` Khaja Hussain Shaik Khaji 2026-03-02 10:53 ` [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji 2026-03-02 11:23 ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Mark Rutland 2 siblings, 2 replies; 18+ messages in thread From: Khaja Hussain Shaik Khaji @ 2026-03-02 10:53 UTC (permalink / raw) To: mark.rutland Cc: linux-arm-msm, dev.jain, linux-kernel, mhiramat, catalin.marinas, will, linux-arm-kernel, yang This patch fixes a kprobes failure caused by a lost current_kprobe on arm64 during kretprobe entry handling under interrupt load. v1 attempted to address this by simulating BTI instructions as NOPs; v2 by disabling preemption across the out-of-line (XOL) execution window. Further analysis showed that both hypotheses were incorrect: the failure is not caused by scheduling or preemption during XOL. The actual root cause is re-entrant invocation of kprobe_busy_begin() from an active kprobe context. On arm64, IRQs are re-enabled before invoking kprobe handlers, allowing an interrupt during kretprobe entry_handler to trigger kprobe_flush_task(), which calls kprobe_busy_begin/end and corrupts current_kprobe and kprobe_status. 
[ 2280.630526] Call trace:
[ 2280.633044]  dump_backtrace+0x104/0x14c
[ 2280.636985]  show_stack+0x20/0x30
[ 2280.640390]  dump_stack_lvl+0x58/0x74
[ 2280.644154]  dump_stack+0x20/0x30
[ 2280.647562]  kprobe_busy_begin+0xec/0xf0
[ 2280.651593]  kprobe_flush_task+0x2c/0x60
[ 2280.655624]  delayed_put_task_struct+0x2c/0x124
[ 2280.660282]  rcu_core+0x56c/0x984
[ 2280.663695]  rcu_core_si+0x18/0x28
[ 2280.667189]  handle_softirqs+0x160/0x30c
[ 2280.671220]  __do_softirq+0x1c/0x2c
[ 2280.674807]  ____do_softirq+0x18/0x28
[ 2280.678569]  call_on_irq_stack+0x48/0x88
[ 2280.682599]  do_softirq_own_stack+0x24/0x34
[ 2280.686900]  irq_exit_rcu+0x5c/0xbc
[ 2280.690489]  el1_interrupt+0x40/0x60
[ 2280.694167]  el1h_64_irq_handler+0x20/0x30
[ 2280.698372]  el1h_64_irq+0x64/0x68
[ 2280.701872]  _raw_spin_unlock_irq+0x14/0x54
[ 2280.706173]  dwc3_msm_notify_event+0x6e8/0xbe8
[ 2280.710743]  entry_dwc3_gadget_pullup+0x3c/0x6c
[ 2280.715393]  pre_handler_kretprobe+0x1cc/0x304
[ 2280.719956]  kprobe_breakpoint_handler+0x1b0/0x388
[ 2280.724878]  brk_handler+0x8c/0x128
[ 2280.728464]  do_debug_exception+0x94/0x120
[ 2280.732670]  el1_dbg+0x60/0x7c
[ 2280.735815]  el1h_64_sync_handler+0x48/0xb8
[ 2280.740114]  el1h_64_sync+0x64/0x68
[ 2280.743701]  dwc3_gadget_pullup+0x0/0x124
[ 2280.747827]  soft_connect_store+0xb4/0x15c
[ 2280.752031]  dev_attr_store+0x20/0x38
[ 2280.755798]  sysfs_kf_write+0x44/0x5c
[ 2280.759564]  kernfs_fop_write_iter+0xf4/0x198
[ 2280.764033]  vfs_write+0x1d0/0x2b0
[ 2280.767529]  ksys_write+0x80/0xf0
[ 2280.770940]  __arm64_sys_write+0x24/0x34
[ 2280.774974]  invoke_syscall+0x54/0x118
[ 2280.778822]  el0_svc_common+0xb4/0xe8
[ 2280.782587]  do_el0_svc+0x24/0x34
[ 2280.785999]  el0_svc+0x40/0xa4
[ 2280.789140]  el0t_64_sync_handler+0x8c/0x108
[ 2280.793526]  el0t_64_sync+0x198/0x19c

This v3 patch makes kprobe_busy_begin/end re-entrant safe by preserving the active kprobe state using a per-CPU depth counter and saved state. 
The detailed failure analysis and justification are included in the
commit message.

Changes since v2:
- Dropped the scheduling/preemption-based approach.
- Identified the re-entrant kprobe_busy_begin() root cause.
- Fixed kprobe_busy_begin/end to preserve active kprobe state.
- Link to v2: https://lore.kernel.org/all/20260217133855.3142192-2-khaja.khaji@oss.qualcomm.com/

Khaja Hussain Shaik Khaji (1):
  kernel: kprobes: fix cur_kprobe corruption during re-entrant
    kprobe_busy_begin() calls

 kernel/kprobes.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
  2026-03-02 10:53 ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Khaja Hussain Shaik Khaji
@ 2026-03-02 10:53   ` Khaja Hussain Shaik Khaji
  2026-03-02 13:38     ` Mark Rutland
  2026-03-02 11:23   ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Mark Rutland
  1 sibling, 1 reply; 18+ messages in thread
From: Khaja Hussain Shaik Khaji @ 2026-03-02 10:53 UTC (permalink / raw)
  To: mark.rutland
  Cc: linux-arm-msm, dev.jain, linux-kernel, mhiramat, catalin.marinas,
	will, linux-arm-kernel, yang

Fix cur_kprobe corruption that occurs when kprobe_busy_begin() is called
re-entrantly during an active kprobe handler.

Previously, kprobe_busy_begin() unconditionally overwrites current_kprobe
with &kprobe_busy, and kprobe_busy_end() writes NULL. This approach works
correctly when no kprobe is active but fails during re-entrant calls.

On arm64, arm64_enter_el1_dbg() re-enables IRQs before invoking kprobe
handlers. This allows an IRQ during kretprobe entry_handler to trigger
kprobe_flush_task() via softirq, which calls kprobe_busy_begin/end and
corrupts cur_kprobe.

Problem flow: kretprobe entry_handler -> IRQ -> softirq ->
kprobe_flush_task -> kprobe_busy_begin/end -> cur_kprobe corruption.

This corruption causes two issues:
1. NULL cur_kprobe in setup_singlestep leading to panic in single-step
   handler
2. kprobe_status overwritten with HIT_ACTIVE during execute-out-of-line
   window

Implement a per-CPU re-entrancy tracking mechanism with:
- A depth counter to track nested calls
- Saved state for current_kprobe and kprobe_status
- Save state on first entry, restore on final exit
- Increment depth counter for nested calls only

This approach maintains compatibility with existing callers as
save/restore of NULL is a no-op.
Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
---
 kernel/kprobes.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index e2cd01cf5968..47a4ae50ee6c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -70,6 +70,15 @@ static bool kprobes_all_disarmed;
 static DEFINE_MUTEX(kprobe_mutex);
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
 
+/* Per-CPU re-entrancy state for kprobe_busy_begin/end.
+ * kprobe_busy_begin() may be called while a kprobe handler
+ * is active - e.g. kprobe_flush_task() via softirq during
+ * kretprobe entry_handler on arm64 where IRQs are re-enabled.
+ */
+static DEFINE_PER_CPU(int, kprobe_busy_depth);
+static DEFINE_PER_CPU(struct kprobe *, kprobe_busy_saved_current);
+static DEFINE_PER_CPU(unsigned long, kprobe_busy_saved_status);
+
 kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
 					    unsigned int __unused)
 {
@@ -1307,14 +1316,31 @@ void kprobe_busy_begin(void)
 	struct kprobe_ctlblk *kcb;
 
 	preempt_disable();
-	__this_cpu_write(current_kprobe, &kprobe_busy);
-	kcb = get_kprobe_ctlblk();
-	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+	if (__this_cpu_read(kprobe_busy_depth) == 0) {
+		kcb = get_kprobe_ctlblk();
+		__this_cpu_write(kprobe_busy_saved_current,
+				 __this_cpu_read(current_kprobe));
+		__this_cpu_write(kprobe_busy_saved_status,
+				 kcb->kprobe_status);
+		__this_cpu_write(current_kprobe, &kprobe_busy);
+		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+	}
+	__this_cpu_inc(kprobe_busy_depth);
 }
 
 void kprobe_busy_end(void)
 {
-	__this_cpu_write(current_kprobe, NULL);
+	struct kprobe_ctlblk *kcb;
+
+	__this_cpu_dec(kprobe_busy_depth);
+
+	if (__this_cpu_read(kprobe_busy_depth) == 0) {
+		kcb = get_kprobe_ctlblk();
+		__this_cpu_write(current_kprobe,
+				 __this_cpu_read(kprobe_busy_saved_current));
+		kcb->kprobe_status =
+			__this_cpu_read(kprobe_busy_saved_status);
+	}
 	preempt_enable();
 }

-- 
2.34.1

^ permalink raw reply related
[flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
  2026-03-02 10:53   ` [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji
@ 2026-03-02 13:38     ` Mark Rutland
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2026-03-02 13:38 UTC (permalink / raw)
  To: Khaja Hussain Shaik Khaji
  Cc: linux-arm-msm, dev.jain, linux-kernel, mhiramat, catalin.marinas,
	will, linux-arm-kernel, yang

On Mon, Mar 02, 2026 at 04:23:47PM +0530, Khaja Hussain Shaik Khaji wrote:
> Fix cur_kprobe corruption that occurs when kprobe_busy_begin() is called
> re-entrantly during an active kprobe handler.
> 
> Previously, kprobe_busy_begin() unconditionally overwrites current_kprobe
> with &kprobe_busy, and kprobe_busy_end() writes NULL. This approach works
> correctly when no kprobe is active but fails during re-entrant calls.

The structure of kprobe_busy_begin() and kprobe_busy_end() implies that
re-entrancy is unexpected, and something that should be avoided somehow.
Is that the case, or are kprobe_busy_begin() and kprobe_busy_end()
generally buggy?

> On arm64, arm64_enter_el1_dbg() re-enables IRQs before invoking kprobe
> handlers.

No, arm64_enter_el1_dbg() does not re-enable IRQs. It only manages state
tracking. I don't know if you meant to say a different function here,
but this statement is clearly wrong.

> This allows an IRQ during kretprobe
> entry_handler to trigger kprobe_flush_task() via softirq, which calls
> kprobe_busy_begin/end and corrupts cur_kprobe.

This would be easier to follow if the backtrace were included in the
commit message, rather than in the cover letter, such that it could be
referred to easily.

> Problem flow: kretprobe entry_handler -> IRQ -> softirq ->
> kprobe_flush_task -> kprobe_busy_begin/end -> cur_kprobe corruption.

We shouldn't take the IRQ in the first place here.
AFAICT, nothing unmasks IRQs prior to the entry handler. That suggests
that something is going wrong *within* your entry handler that causes
IRQs to be unmasked unexpectedly.

Please can we find out *exactly* where IRQs get unmasked for the first
time?

Mark.

> 
> This corruption causes two issues:
> 1. NULL cur_kprobe in setup_singlestep leading to panic in single-step
>    handler
> 2. kprobe_status overwritten with HIT_ACTIVE during execute-out-of-line
>    window
> 
> Implement a per-CPU re-entrancy tracking mechanism with:
> - A depth counter to track nested calls
> - Saved state for current_kprobe and kprobe_status
> - Save state on first entry, restore on final exit
> - Increment depth counter for nested calls only
> 
> This approach maintains compatibility with existing callers as
> save/restore of NULL is a no-op.
> 
> Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
> ---
>  kernel/kprobes.c | 34 ++++++++++++++++++++++++++++++----
>  1 file changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index e2cd01cf5968..47a4ae50ee6c 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -70,6 +70,15 @@ static bool kprobes_all_disarmed;
>  static DEFINE_MUTEX(kprobe_mutex);
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
> 
> +/* Per-CPU re-entrancy state for kprobe_busy_begin/end.
> + * kprobe_busy_begin() may be called while a kprobe handler
> + * is active - e.g. kprobe_flush_task() via softirq during
> + * kretprobe entry_handler on arm64 where IRQs are re-enabled.
> + */
> +static DEFINE_PER_CPU(int, kprobe_busy_depth);
> +static DEFINE_PER_CPU(struct kprobe *, kprobe_busy_saved_current);
> +static DEFINE_PER_CPU(unsigned long, kprobe_busy_saved_status);
> +
>  kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> 					    unsigned int __unused)
>  {
> @@ -1307,14 +1316,31 @@ void kprobe_busy_begin(void)
>  	struct kprobe_ctlblk *kcb;
> 
>  	preempt_disable();
> -	__this_cpu_write(current_kprobe, &kprobe_busy);
> -	kcb = get_kprobe_ctlblk();
> -	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	if (__this_cpu_read(kprobe_busy_depth) == 0) {
> +		kcb = get_kprobe_ctlblk();
> +		__this_cpu_write(kprobe_busy_saved_current,
> +				 __this_cpu_read(current_kprobe));
> +		__this_cpu_write(kprobe_busy_saved_status,
> +				 kcb->kprobe_status);
> +		__this_cpu_write(current_kprobe, &kprobe_busy);
> +		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	}
> +	__this_cpu_inc(kprobe_busy_depth);
>  }
> 
>  void kprobe_busy_end(void)
>  {
> -	__this_cpu_write(current_kprobe, NULL);
> +	struct kprobe_ctlblk *kcb;
> +
> +	__this_cpu_dec(kprobe_busy_depth);
> +
> +	if (__this_cpu_read(kprobe_busy_depth) == 0) {
> +		kcb = get_kprobe_ctlblk();
> +		__this_cpu_write(current_kprobe,
> +				 __this_cpu_read(kprobe_busy_saved_current));
> +		kcb->kprobe_status =
> +			__this_cpu_read(kprobe_busy_saved_status);
> +	}
>  	preempt_enable();
>  }
> 
> --
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during
  2026-03-02 10:53 ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Khaja Hussain Shaik Khaji
  2026-03-02 10:53   ` [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji
@ 2026-03-02 11:23   ` Mark Rutland
  2026-03-02 12:23     ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Rutland @ 2026-03-02 11:23 UTC (permalink / raw)
  To: Khaja Hussain Shaik Khaji
  Cc: linux-arm-msm, dev.jain, linux-kernel, mhiramat, catalin.marinas,
	will, linux-arm-kernel, yang

On Mon, Mar 02, 2026 at 04:23:46PM +0530, Khaja Hussain Shaik Khaji wrote:
> This patch fixes a kprobes failure observed due to lost current_kprobe
> on arm64 during kretprobe entry handling under interrupt load.
> 
> v1 attempted to address this by simulating BTI instructions as NOPs and
> v2 attempted to address this by disabling preemption across the
> out-of-line (XOL) execution window. Further analysis showed that this
> hypothesis was incorrect: the failure is not caused by scheduling or
> preemption during XOL.
> 
> The actual root cause is re-entrant invocation of kprobe_busy_begin()
> from an active kprobe context. On arm64, IRQs are re-enabled before
> invoking kprobe handlers, allowing an interrupt during kretprobe
> entry_handler to trigger kprobe_flush_task(), which calls
> kprobe_busy_begin/end and corrupts current_kprobe and kprobe_status.
> 
> [ 2280.630526] Call trace:
> [ 2280.633044]  dump_backtrace+0x104/0x14c
> [ 2280.636985]  show_stack+0x20/0x30
> [ 2280.640390]  dump_stack_lvl+0x58/0x74
> [ 2280.644154]  dump_stack+0x20/0x30
> [ 2280.647562]  kprobe_busy_begin+0xec/0xf0
> [ 2280.651593]  kprobe_flush_task+0x2c/0x60
> [ 2280.655624]  delayed_put_task_struct+0x2c/0x124
> [ 2280.660282]  rcu_core+0x56c/0x984
> [ 2280.663695]  rcu_core_si+0x18/0x28
> [ 2280.667189]  handle_softirqs+0x160/0x30c
> [ 2280.671220]  __do_softirq+0x1c/0x2c
> [ 2280.674807]  ____do_softirq+0x18/0x28
> [ 2280.678569]  call_on_irq_stack+0x48/0x88
> [ 2280.682599]  do_softirq_own_stack+0x24/0x34
> [ 2280.686900]  irq_exit_rcu+0x5c/0xbc
> [ 2280.690489]  el1_interrupt+0x40/0x60
> [ 2280.694167]  el1h_64_irq_handler+0x20/0x30
> [ 2280.698372]  el1h_64_irq+0x64/0x68
> [ 2280.701872]  _raw_spin_unlock_irq+0x14/0x54
> [ 2280.706173]  dwc3_msm_notify_event+0x6e8/0xbe8
> [ 2280.710743]  entry_dwc3_gadget_pullup+0x3c/0x6c
> [ 2280.715393]  pre_handler_kretprobe+0x1cc/0x304
> [ 2280.719956]  kprobe_breakpoint_handler+0x1b0/0x388
> [ 2280.724878]  brk_handler+0x8c/0x128
> [ 2280.728464]  do_debug_exception+0x94/0x120
> [ 2280.732670]  el1_dbg+0x60/0x7c

The el1_dbg() function was removed in commit:

  31575e11ecf7 ("arm64: debug: split brk64 exception entry")

... which was merged in v6.17.

Are you able to reproduce the issue with v6.17 or later?

Which specific kernel version did you see this with? The arm64 entry
code has changed substantially in recent months (fixing a bunch of
latent issues), and we need to know which specific version you're
looking at. It's possible that your issue has already been fixed.

Mark.
> [ 2280.735815]  el1h_64_sync_handler+0x48/0xb8
> [ 2280.740114]  el1h_64_sync+0x64/0x68
> [ 2280.743701]  dwc3_gadget_pullup+0x0/0x124
> [ 2280.747827]  soft_connect_store+0xb4/0x15c
> [ 2280.752031]  dev_attr_store+0x20/0x38
> [ 2280.755798]  sysfs_kf_write+0x44/0x5c
> [ 2280.759564]  kernfs_fop_write_iter+0xf4/0x198
> [ 2280.764033]  vfs_write+0x1d0/0x2b0
> [ 2280.767529]  ksys_write+0x80/0xf0
> [ 2280.770940]  __arm64_sys_write+0x24/0x34
> [ 2280.774974]  invoke_syscall+0x54/0x118
> [ 2280.778822]  el0_svc_common+0xb4/0xe8
> [ 2280.782587]  do_el0_svc+0x24/0x34
> [ 2280.785999]  el0_svc+0x40/0xa4
> [ 2280.789140]  el0t_64_sync_handler+0x8c/0x108
> [ 2280.793526]  el0t_64_sync+0x198/0x19c

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
  2026-03-02 11:23   ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Mark Rutland
@ 2026-03-02 12:23     ` Khaja Hussain Shaik Khaji
  2026-03-02 13:43       ` Mark Rutland
  0 siblings, 1 reply; 18+ messages in thread
From: Khaja Hussain Shaik Khaji @ 2026-03-02 12:23 UTC (permalink / raw)
  To: mark.rutland
  Cc: catalin.marinas, dev.jain, linux-arm-kernel, linux-arm-msm,
	linux-kernel, mhiramat, will, yang

On Mon, Mar 02, 2026 at 04:23:46PM +0530, Mark Rutland wrote:
> The el1_dbg() function was removed in commit:
> 
>   31575e11ecf7 ("arm64: debug: split brk64 exception entry")
> 
> ... which was merged in v6.17.
> 
> Are you able to reproduce the issue with v6.17 or later?
> 
> Which specific kernel version did you see this with?

The call trace was captured on v6.9-rc1. I have not yet tested on v6.17
or later. I will test and report back.

That said, the fix is in kernel/kprobes.c and addresses a generic
re-entrancy issue in kprobe_busy_begin/end that is not specific to the
arm64 entry path. The race -- where kprobe_busy_begin() is called
re-entrantly from within an active kprobe context (e.g. via softirq
during kretprobe entry_handler) -- can occur on any architecture where
IRQs are re-enabled before invoking kprobe handlers.

I will verify whether the issue is still reproducible on v6.17+ and
report back.

Thanks,
Khaja

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
  2026-03-02 12:23     ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji
@ 2026-03-02 13:43       ` Mark Rutland
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2026-03-02 13:43 UTC (permalink / raw)
  To: Khaja Hussain Shaik Khaji
  Cc: catalin.marinas, dev.jain, linux-arm-kernel, linux-arm-msm,
	linux-kernel, mhiramat, will, yang

On Mon, Mar 02, 2026 at 05:53:38PM +0530, Khaja Hussain Shaik Khaji wrote:
> On Mon, Mar 02, 2026 at 04:23:46PM +0530, Mark Rutland wrote:
> > The el1_dbg() function was removed in commit:
> > 
> >   31575e11ecf7 ("arm64: debug: split brk64 exception entry")
> > 
> > ... which was merged in v6.17.
> > 
> > Are you able to reproduce the issue with v6.17 or later?
> > 
> > Which specific kernel version did you see this with?
> 
> The call trace was captured on v6.9-rc1.

Why are you using an -rc1 release from almost two years ago?

> I have not yet tested on v6.17 or later. I will test and report back.
> 
> That said, the fix is in kernel/kprobes.c and addresses a generic
> re-entrancy issue in kprobe_busy_begin/end that is not specific to the
> arm64 entry path. The race -- where kprobe_busy_begin() is called
> re-entrantly from within an active kprobe context (e.g. via softirq
> during kretprobe entry_handler) -- can occur on any architecture where
> IRQs are re-enabled before invoking kprobe handlers.

AFAICT, re-enabling IRQs in that path would be a bug, and re-entrancy is
simply not expected. Please see my other reply on that front.

> I will verify whether the issue is still reproducible on v6.17+ and
> report back.

Thanks, that would be much appreciated. As would anything you can share
on the specifics of your kretprobe entry_handler.

Mark.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list
  2026-02-17 13:38 ` [PATCH v2 0/2] arm64: kprobes: fix XOL preemption window Khaja Hussain Shaik Khaji
  2026-02-17 13:38   ` [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step Khaja Hussain Shaik Khaji
@ 2026-02-17 13:38   ` Khaja Hussain Shaik Khaji
  2026-02-17 16:57     ` Mark Rutland
  1 sibling, 1 reply; 18+ messages in thread
From: Khaja Hussain Shaik Khaji @ 2026-02-17 13:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: mark.rutland, catalin.marinas, dev.jain, linux-kernel, yang,
	linux-arm-msm, will, mhiramat

NOP is already handled via instruction emulation and does not require
single-stepping. Drop it from aarch64_insn_is_steppable_hint().

Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
---
 arch/arm64/include/asm/insn.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index e1d30ba99d01..9429f76906e0 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -456,7 +456,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn)
 	case AARCH64_INSN_HINT_BTIC:
 	case AARCH64_INSN_HINT_BTIJ:
 	case AARCH64_INSN_HINT_BTIJC:
-	case AARCH64_INSN_HINT_NOP:
 		return true;
 	default:
 		return false;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list
  2026-02-17 13:38   ` [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list Khaja Hussain Shaik Khaji
@ 2026-02-17 16:57     ` Mark Rutland
  2026-02-24  8:23       ` Masami Hiramatsu
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Rutland @ 2026-02-17 16:57 UTC (permalink / raw)
  To: Khaja Hussain Shaik Khaji
  Cc: catalin.marinas, dev.jain, linux-kernel, yang, linux-arm-msm,
	will, linux-arm-kernel, mhiramat

On Tue, Feb 17, 2026 at 07:08:55PM +0530, Khaja Hussain Shaik Khaji wrote:
> NOP is already handled via instruction emulation and does not require
> single-stepping. Drop it from aarch64_insn_is_steppable_hint().
> 
> Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
> ---
>  arch/arm64/include/asm/insn.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index e1d30ba99d01..9429f76906e0 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -456,7 +456,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn)
>  	case AARCH64_INSN_HINT_BTIC:
>  	case AARCH64_INSN_HINT_BTIJ:
>  	case AARCH64_INSN_HINT_BTIJC:
> -	case AARCH64_INSN_HINT_NOP:
>  		return true;
>  	default:
>  		return false;

The intent is that aarch64_insn_is_steppable_hint() says whether an
instruction is safe to step, not whether it *must* be stepped. I think
we can leave NOP here unless this is causing some functional problem?

Mark.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list
  2026-02-17 16:57     ` Mark Rutland
@ 2026-02-24  8:23       ` Masami Hiramatsu
  0 siblings, 0 replies; 18+ messages in thread
From: Masami Hiramatsu @ 2026-02-24  8:23 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, dev.jain, linux-kernel, mhiramat, linux-arm-msm,
	yang, will, linux-arm-kernel, Khaja Hussain Shaik Khaji

On Tue, 17 Feb 2026 16:57:08 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Tue, Feb 17, 2026 at 07:08:55PM +0530, Khaja Hussain Shaik Khaji wrote:
> > NOP is already handled via instruction emulation and does not require
> > single-stepping. Drop it from aarch64_insn_is_steppable_hint().
> > 
> > Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
> > ---
> >  arch/arm64/include/asm/insn.h | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> > index e1d30ba99d01..9429f76906e0 100644
> > --- a/arch/arm64/include/asm/insn.h
> > +++ b/arch/arm64/include/asm/insn.h
> > @@ -456,7 +456,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn)
> >  	case AARCH64_INSN_HINT_BTIC:
> >  	case AARCH64_INSN_HINT_BTIJ:
> >  	case AARCH64_INSN_HINT_BTIJC:
> > -	case AARCH64_INSN_HINT_NOP:
> >  		return true;
> >  	default:
> >  		return false;
> 
> The intent is that aarch64_insn_is_steppable_hint() says whether an
> instruction is safe to step, not whether it *must* be stepped. I think
> we can leave NOP here unless this is causing some functional problem?

Agreed. I think we should keep this as it is.

Thank you,

> 
> Mark.
> 

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-03-02 13:43 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-06 10:49 [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS at function entry Khaja Hussain Shaik Khaji
2025-11-11 10:26 ` Mark Rutland
2025-11-12 12:17   ` Mark Rutland
2026-02-17 13:38 ` [PATCH v2 0/2] arm64: kprobes: fix XOL preemption window Khaja Hussain Shaik Khaji
2026-02-17 13:38   ` [PATCH v2 1/2] arm64: kprobes: disable preemption across XOL single-step Khaja Hussain Shaik Khaji
2026-02-17 16:55     ` Mark Rutland
2026-02-23 16:07     ` Masami Hiramatsu
2026-03-02 10:19       ` Khaja Hussain Shaik Khaji
2026-03-02 10:23         ` Mark Rutland
2026-03-02 10:53 ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Khaja Hussain Shaik Khaji
2026-03-02 10:53   ` [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji
2026-03-02 13:38     ` Mark Rutland
2026-03-02 11:23   ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during Mark Rutland
2026-03-02 12:23     ` [PATCH v3 0/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls Khaja Hussain Shaik Khaji
2026-03-02 13:43       ` Mark Rutland
2026-02-17 13:38 ` [PATCH v2 2/2] arm64: insn: drop NOP from steppable hint list Khaja Hussain Shaik Khaji
2026-02-17 16:57   ` Mark Rutland
2026-02-24  8:23     ` Masami Hiramatsu