From: Mark Rutland <mark.rutland@arm.com>
To: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
Cc: linux-arm-msm@vger.kernel.org, dev.jain@arm.com,
linux-kernel@vger.kernel.org, mhiramat@kernel.org,
catalin.marinas@arm.com, will@kernel.org,
linux-arm-kernel@lists.infradead.org,
yang@os.amperecomputing.com
Subject: Re: [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
Date: Mon, 2 Mar 2026 13:38:35 +0000
Message-ID: <aaWS20g-jGu8mCKH@J2N7QTR9R3>
In-Reply-To: <20260302105347.3602192-2-khaja.khaji@oss.qualcomm.com>
On Mon, Mar 02, 2026 at 04:23:47PM +0530, Khaja Hussain Shaik Khaji wrote:
> Fix cur_kprobe corruption that occurs when kprobe_busy_begin() is called
> re-entrantly during an active kprobe handler.
>
> Previously, kprobe_busy_begin() unconditionally overwrites current_kprobe
> with &kprobe_busy, and kprobe_busy_end() writes NULL. This approach works
> correctly when no kprobe is active but fails during re-entrant calls.
The structure of kprobe_busy_begin() and kprobe_busy_end() implies that
re-entrancy is unexpected, and something that should be avoided somehow.
Is that the case, or are kprobe_busy_begin() and kprobe_busy_end()
generally buggy?
> On arm64, arm64_enter_el1_dbg() re-enables IRQs before invoking kprobe
> handlers.
No, arm64_enter_el1_dbg() does not re-enable IRQs. It only manages state
tracking.
I don't know if you meant to say a different function here, but this
statement is clearly wrong.
> This allows an IRQ during kretprobe
> entry_handler to trigger kprobe_flush_task() via softirq, which calls
> kprobe_busy_begin/end and corrupts cur_kprobe.
This would be easier to follow if the backtrace were included in the
commit message rather than in the cover letter, so that it can be
referred to directly.
> Problem flow: kretprobe entry_handler -> IRQ -> softirq ->
> kprobe_flush_task -> kprobe_busy_begin/end -> cur_kprobe corruption.
We shouldn't take the IRQ in the first place here. AFAICT, nothing
unmasks IRQs prior to the entry handler.
That suggests that something is going wrong *within* your entry handler
that causes IRQs to be unmasked unexpectedly.
Please can we find out *exactly* where IRQs get unmasked for the first
time?
Mark.
>
> This corruption causes two issues:
> 1. NULL cur_kprobe in setup_singlestep leading to panic in single-step
> handler
> 2. kprobe_status overwritten with HIT_ACTIVE during execute-out-of-line
> window
>
> Implement a per-CPU re-entrancy tracking mechanism with:
> - A depth counter to track nested calls
> - Saved state for current_kprobe and kprobe_status
> - Save state on first entry, restore on final exit
> - Increment depth counter for nested calls only
>
> This approach maintains compatibility with existing callers as
> save/restore of NULL is a no-op.
>
> Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@oss.qualcomm.com>
> ---
> kernel/kprobes.c | 34 ++++++++++++++++++++++++++++++----
> 1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index e2cd01cf5968..47a4ae50ee6c 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -70,6 +70,15 @@ static bool kprobes_all_disarmed;
> static DEFINE_MUTEX(kprobe_mutex);
> static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
>
> +/* Per-CPU re-entrancy state for kprobe_busy_begin/end.
> + * kprobe_busy_begin() may be called while a kprobe handler
> + * is active - e.g. kprobe_flush_task() via softirq during
> + * kretprobe entry_handler on arm64 where IRQs are re-enabled.
> + */
> +static DEFINE_PER_CPU(int, kprobe_busy_depth);
> +static DEFINE_PER_CPU(struct kprobe *, kprobe_busy_saved_current);
> +static DEFINE_PER_CPU(unsigned long, kprobe_busy_saved_status);
> +
> kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
> unsigned int __unused)
> {
> @@ -1307,14 +1316,31 @@ void kprobe_busy_begin(void)
>  	struct kprobe_ctlblk *kcb;
>
>  	preempt_disable();
> -	__this_cpu_write(current_kprobe, &kprobe_busy);
> -	kcb = get_kprobe_ctlblk();
> -	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	if (__this_cpu_read(kprobe_busy_depth) == 0) {
> +		kcb = get_kprobe_ctlblk();
> +		__this_cpu_write(kprobe_busy_saved_current,
> +				 __this_cpu_read(current_kprobe));
> +		__this_cpu_write(kprobe_busy_saved_status,
> +				 kcb->kprobe_status);
> +		__this_cpu_write(current_kprobe, &kprobe_busy);
> +		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	}
> +	__this_cpu_inc(kprobe_busy_depth);
>  }
>
>  void kprobe_busy_end(void)
>  {
> -	__this_cpu_write(current_kprobe, NULL);
> +	struct kprobe_ctlblk *kcb;
> +
> +	__this_cpu_dec(kprobe_busy_depth);
> +
> +	if (__this_cpu_read(kprobe_busy_depth) == 0) {
> +		kcb = get_kprobe_ctlblk();
> +		__this_cpu_write(current_kprobe,
> +				 __this_cpu_read(kprobe_busy_saved_current));
> +		kcb->kprobe_status =
> +			__this_cpu_read(kprobe_busy_saved_status);
> +	}
>  	preempt_enable();
>  }
>
> --
> 2.34.1
>