* [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
@ 2026-05-18 5:08 Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18 5:08 UTC (permalink / raw)
To: Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
Hi all,
This patch series fixes some minor preempt_count bookkeeping issues in
arch/powerpc/ found during a preemption leak audit prompted by the
lazy/full preemption model changes. These are get_cpu/put_cpu and
get_cpu_var/put_cpu_var pairing errors that leave preempt_count
incorrectly elevated or underflowed.
Please let me know your comments.
Thanks,
Aboorva
Aboorva Devarajan (3):
powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
powerpc/powernv: fix preempt count leak in
pnv_kexec_wait_secondaries_down
powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
arch/powerpc/kexec/core_64.c | 15 ++++++++-------
arch/powerpc/perf/core-fsl-emb.c | 3 ++-
arch/powerpc/platforms/powernv/setup.c | 2 +-
3 files changed, 11 insertions(+), 9 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
2026-05-18 5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
@ 2026-05-18 5:08 ` Aboorva Devarajan
2026-05-18 6:13 ` Shrikanth Hegde
2026-05-18 5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18 5:08 UTC (permalink / raw)
To: Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
the 'out:' label, but only calls the matching get_cpu_var() after the
'i < 0' early-return check. When event->hw.idx is negative the
function jumps to 'out:' without having taken get_cpu_var(), and the
trailing put_cpu_var() then issues an unmatched preempt_enable(),
underflowing preempt_count.
On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
eventually present as a 'scheduling while atomic' BUG.
Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
correctly bracketed and the 'out:' label only handles perf_pmu_enable.
Fixes: a11106544f33c ("powerpc/perf: e500 support")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
arch/powerpc/perf/core-fsl-emb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 7120ab20cbfec..02b5dd74c187a 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)
cpuhw->n_events--;
+ put_cpu_var(cpu_hw_events);
+
out:
perf_pmu_enable(event->pmu);
- put_cpu_var(cpu_hw_events);
}
static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
--
2.54.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
2026-05-18 5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
@ 2026-05-18 5:08 ` Aboorva Devarajan
2026-05-18 7:56 ` Shrikanth Hegde
2026-05-18 5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
2026-05-18 8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
3 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18 5:08 UTC (permalink / raw)
To: Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
CPU id but never calls the matching put_cpu(), leaking one
preempt_disable() nesting level on every invocation.
In practice the imbalance does not trigger a visible splat because the
kexec teardown path is a one-way trip: IRQs are already disabled, no
schedule() occurs after the leak, and default_machine_kexec() overwrites
preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
which never returns. However the bookkeeping is still wrong.
In the kexec teardown path IRQs are already disabled and the CPU is
pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
without touching preempt_count.
Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
arch/powerpc/platforms/powernv/setup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 4dbb47ddbdcc4..177da0defcb36 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
{
int my_cpu, i, notified = -1;
- my_cpu = get_cpu();
+ my_cpu = raw_smp_processor_id();
for_each_online_cpu(i) {
uint8_t status;
--
2.54.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
2026-05-18 5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
@ 2026-05-18 5:08 ` Aboorva Devarajan
2026-05-18 6:02 ` Shrikanth Hegde
2026-05-18 8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
3 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18 5:08 UTC (permalink / raw)
To: Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
twice -- once for KEXEC_STATE_IRQS_OFF and once for
KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
leaving preempt_count elevated by one extra nesting level.
In practice the imbalance does not trigger a 'scheduling while atomic'
splat because the kexec path is a one-way trip: IRQs are already
disabled, no schedule() occurs after the leak, and
default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
before jumping into kexec_sequence() which never returns. However the
bookkeeping is still wrong.
Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
as a parameter. This keeps preempt_count correctly balanced.
Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
arch/powerpc/kexec/core_64.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 825ab8a88f18e..9d7e5a1e6e5b8 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
/* NOTREACHED */
}
-static void kexec_prepare_cpus_wait(int wait_state)
+static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
{
- int my_cpu, i, notified=-1;
+ int i, notified = -1;
hw_breakpoint_disable();
- my_cpu = get_cpu();
/* Make sure each CPU has at least made it to the state we need.
*
* FIXME: There is a (slim) chance of a problem if not all of the CPUs
@@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
static void kexec_prepare_cpus(void)
{
+ int my_cpu;
+
wake_offline_cpus();
smp_call_function(kexec_smp_down, NULL, /* wait */0);
local_irq_disable();
@@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
mb(); /* make sure IRQs are disabled before we say they are */
get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
- kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
+ my_cpu = get_cpu();
+ kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF, my_cpu);
/* we are sure every CPU has IRQs off at this point */
kexec_all_irq_disabled = 1;
@@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
* Before removing MMU mappings make sure all CPUs have entered real
* mode:
*/
- kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
+ kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
+ put_cpu();
/* after we tell the others to go down */
if (ppc_md.kexec_cpu_down)
ppc_md.kexec_cpu_down(0, 0);
-
- put_cpu();
}
#else /* ! SMP */
--
2.54.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
2026-05-18 5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
@ 2026-05-18 6:02 ` Shrikanth Hegde
2026-06-03 6:14 ` Aboorva Devarajan
0 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18 6:02 UTC (permalink / raw)
To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
Hi Aboorva.
On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
> current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
> twice -- once for KEXEC_STATE_IRQS_OFF and once for
> KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
> leaving preempt_count elevated by one extra nesting level.
>
> In practice the imbalance does not trigger a 'scheduling while atomic'
> splat because the kexec path is a one-way trip: IRQs are already
> disabled, no schedule() occurs after the leak, and
> default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
> before jumping into kexec_sequence() which never returns. However the
> bookkeeping is still wrong.
>
> Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
> called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
> as a parameter. This keeps preempt_count correctly balanced.
>
> Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
> arch/powerpc/kexec/core_64.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 825ab8a88f18e..9d7e5a1e6e5b8 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
> /* NOTREACHED */
> }
>
> -static void kexec_prepare_cpus_wait(int wait_state)
> +static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
> {
> - int my_cpu, i, notified=-1;
> + int i, notified = -1;
>
> hw_breakpoint_disable();
> - my_cpu = get_cpu();
> /* Make sure each CPU has at least made it to the state we need.
> *
> * FIXME: There is a (slim) chance of a problem if not all of the CPUs
> @@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
>
> static void kexec_prepare_cpus(void)
> {
> + int my_cpu;
> +
> wake_offline_cpus();
> smp_call_function(kexec_smp_down, NULL, /* wait */0);
> local_irq_disable();
> @@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
> mb(); /* make sure IRQs are disabled before we say they are */
> get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
>
> - kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
> + my_cpu = get_cpu();
raw_smp_processor_id() is better here. All it needs is get current cpu?
caller does irq_disable above and that renders call for get_cpu un-necessary.
> + kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF, my_cpu);
> /* we are sure every CPU has IRQs off at this point */
> kexec_all_irq_disabled = 1;
>
> @@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
> * Before removing MMU mappings make sure all CPUs have entered real
> * mode:
> */
> - kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
> + kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
> + put_cpu();
>
> /* after we tell the others to go down */
> if (ppc_md.kexec_cpu_down)
> ppc_md.kexec_cpu_down(0, 0);
> -
> - put_cpu();
> }
>
> #else /* ! SMP */
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
2026-05-18 5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
@ 2026-05-18 6:13 ` Shrikanth Hegde
2026-06-03 5:59 ` Aboorva Devarajan
0 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18 6:13 UTC (permalink / raw)
To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
> the 'out:' label, but only calls the matching get_cpu_var() after the
> 'i < 0' early-return check. When event->hw.idx is negative the
> function jumps to 'out:' without having taken get_cpu_var(), and the
> trailing put_cpu_var() then issues an unmatched preempt_enable(),
> underflowing preempt_count.
>
> On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
> eventually present as a 'scheduling while atomic' BUG.
>
> Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
> correctly bracketed and the 'out:' label only handles perf_pmu_enable.
>
> Fixes: a11106544f33c ("powerpc/perf: e500 support")
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
> arch/powerpc/perf/core-fsl-emb.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
> index 7120ab20cbfec..02b5dd74c187a 100644
> --- a/arch/powerpc/perf/core-fsl-emb.c
> +++ b/arch/powerpc/perf/core-fsl-emb.c
> @@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)
>
> cpuhw->n_events--;
>
> + put_cpu_var(cpu_hw_events);
> +
> out:
> perf_pmu_enable(event->pmu);
> - put_cpu_var(cpu_hw_events);
> }
>
> static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
Thanks for fixing this. Looks good to me.
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
2026-05-18 5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
@ 2026-05-18 7:56 ` Shrikanth Hegde
2026-06-03 6:08 ` Aboorva Devarajan
0 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18 7:56 UTC (permalink / raw)
To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
Hi Aboorva.
On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
> CPU id but never calls the matching put_cpu(), leaking one
> preempt_disable() nesting level on every invocation.
>
> In practice the imbalance does not trigger a visible splat because the
> kexec teardown path is a one-way trip: IRQs are already disabled, no
> schedule() occurs after the leak, and default_machine_kexec() overwrites
> preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
> which never returns. However the bookkeeping is still wrong.
>
> In the kexec teardown path IRQs are already disabled and the CPU is
> pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
> Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
> without touching preempt_count.
>
> Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
> arch/powerpc/platforms/powernv/setup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 4dbb47ddbdcc4..177da0defcb36 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
> {
> int my_cpu, i, notified = -1;
>
> - my_cpu = get_cpu();
> + my_cpu = raw_smp_processor_id();
>
Is it always with irq-disabled?
How about !CONFIG_SMP and in kexec_prepare_cpus. I see it disables interrupt later.
(though it is a less common config)
So use smp_processor_id()?? One could compile with CONFIG_DEBUG_PREEMPT=y and
see any reports.
> for_each_online_cpu(i) {
> uint8_t status;
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
2026-05-18 5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
` (2 preceding siblings ...)
2026-05-18 5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
@ 2026-05-18 8:08 ` Shrikanth Hegde
2026-06-03 6:16 ` Aboorva Devarajan
3 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18 8:08 UTC (permalink / raw)
To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
Hi Aboorva,
On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> Hi all,
>
> This patch series fixes some minor preempt_count bookkeeping issues in
> arch/powerpc/ found during a preemption leak audit prompted by the
> lazy/full preemption model changes. These are get_cpu/put_cpu and
> get_cpu_var/put_cpu_var pairing errors that leave preempt_count
> incorrectly elevated or underflowed.
>
Thanks for fixing some of these.
while we do this, Can you fix these mismatch in preempt disable/enable in
below files as well.
1. kernel/kprobes.c - kprobe_handler - Does disable, but doesn't enable in some return paths.
A definite leak.
2. Maybe platforms/pseries/lpar.c and platforms/powernv/opal-tracepoints.c.
In __trace_hcall_entry/exit. It maybe a very corner case,
I don't see a big concern there. But it may be remotely possible.
Need to evaluate whether it should be fixed or not.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
2026-05-18 6:13 ` Shrikanth Hegde
@ 2026-06-03 5:59 ` Aboorva Devarajan
0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03 5:59 UTC (permalink / raw)
To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
On Mon, 2026-05-18 at 11:43 +0530, Shrikanth Hegde wrote:
>
>
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events)
> > at
> > the 'out:' label, but only calls the matching get_cpu_var() after
> > the
> > 'i < 0' early-return check. When event->hw.idx is negative the
> > function jumps to 'out:' without having taken get_cpu_var(), and
> > the
> > trailing put_cpu_var() then issues an unmatched preempt_enable(),
> > underflowing preempt_count.
> >
> > On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
> > eventually present as a 'scheduling while atomic' BUG.
> >
> > Move put_cpu_var() to pair with get_cpu_var() so the percpu access
> > is
> > correctly bracketed and the 'out:' label only handles
> > perf_pmu_enable.
> >
> > Fixes: a11106544f33c ("powerpc/perf: e500 support")
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> > arch/powerpc/perf/core-fsl-emb.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/perf/core-fsl-emb.c
> > b/arch/powerpc/perf/core-fsl-emb.c
> > index 7120ab20cbfec..02b5dd74c187a 100644
> > --- a/arch/powerpc/perf/core-fsl-emb.c
> > +++ b/arch/powerpc/perf/core-fsl-emb.c
> > @@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event
> > *event, int flags)
> >
> > cpuhw->n_events--;
> >
> > + put_cpu_var(cpu_hw_events);
> > +
> > out:
> > perf_pmu_enable(event->pmu);
> > - put_cpu_var(cpu_hw_events);
> > }
> >
> > static void fsl_emb_pmu_start(struct perf_event *event, int
> > ef_flags)
>
> Thanks for fixing this. Looks good to me.
>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Hi Shrikanth,
Thanks for the review.
Regards,
Aboorva
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
2026-05-18 7:56 ` Shrikanth Hegde
@ 2026-06-03 6:08 ` Aboorva Devarajan
0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03 6:08 UTC (permalink / raw)
To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
On Mon, 2026-05-18 at 13:26 +0530, Shrikanth Hegde wrote:
>
> Hi Aboorva.
>
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
> > CPU id but never calls the matching put_cpu(), leaking one
> > preempt_disable() nesting level on every invocation.
> >
> > In practice the imbalance does not trigger a visible splat because the
> > kexec teardown path is a one-way trip: IRQs are already disabled, no
> > schedule() occurs after the leak, and default_machine_kexec() overwrites
> > preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
> > which never returns. However the bookkeeping is still wrong.
> >
> > In the kexec teardown path IRQs are already disabled and the CPU is
> > pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
> > Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
> > without touching preempt_count.
> >
> > Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> > arch/powerpc/platforms/powernv/setup.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> > index 4dbb47ddbdcc4..177da0defcb36 100644
> > --- a/arch/powerpc/platforms/powernv/setup.c
> > +++ b/arch/powerpc/platforms/powernv/setup.c
> > @@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
> > {
> > int my_cpu, i, notified = -1;
> >
> > - my_cpu = get_cpu();
> > + my_cpu = raw_smp_processor_id();
> >
>
> Is it always with irq-disabled?
> How about !CONFIG_SMP and in kexec_prepare_cpus. I see it disables interrupt later.
> (though it is a less common config)
>
IIUC, PPC_POWERNV does 'select FORCE_SMP' (-> selects SMP), so there is no
!CONFIG_SMP powernv build. The !SMP kexec_prepare_cpus() variant in
arch/powerpc/kexec/core_64.c, the one you spotted that calls
ppc_md.kexec_cpu_down() before local_irq_disable() is therefore
never compiled with powernv, so pnv_kexec_cpu_down() ->
pnv_kexec_wait_secondaries_down() can't be reached through it.
so, IRQs are disabled in every case that reaches this function.
> So use smp_processor_id()?? One could compile with CONFIG_DEBUG_PREEMPT=y and
> see any reports.
>
> > for_each_online_cpu(i) {
> > uint8_t status;
sure, I'll switch to smp_processor_id() in v2 rather than raw_smp_processor_id().
It returns the cpu id without touching preempt_count (so the leak is
gone), and unlike the raw variant it keeps the CONFIG_DEBUG_PREEMPT
check which is a no-op here since IRQs are off, but will flag any
future caller that reaches this path while preemptible instead of
silently hiding it.
Thanks,
Aboorva
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
2026-05-18 6:02 ` Shrikanth Hegde
@ 2026-06-03 6:14 ` Aboorva Devarajan
2026-06-03 6:16 ` Shrikanth Hegde
0 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03 6:14 UTC (permalink / raw)
To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
Hi Shrikanth,
On Mon, 2026-05-18 at 11:32 +0530, Shrikanth Hegde wrote:
> Hi Aboorva.
>
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
> > current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
> > twice -- once for KEXEC_STATE_IRQS_OFF and once for
> > KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
> > leaving preempt_count elevated by one extra nesting level.
> >
> > In practice the imbalance does not trigger a 'scheduling while atomic'
> > splat because the kexec path is a one-way trip: IRQs are already
> > disabled, no schedule() occurs after the leak, and
> > default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
> > before jumping into kexec_sequence() which never returns. However the
> > bookkeeping is still wrong.
> >
> > Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
> > called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
> > as a parameter. This keeps preempt_count correctly balanced.
> >
> > Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> > arch/powerpc/kexec/core_64.c | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> > index 825ab8a88f18e..9d7e5a1e6e5b8 100644
> > --- a/arch/powerpc/kexec/core_64.c
> > +++ b/arch/powerpc/kexec/core_64.c
> > @@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
> > /* NOTREACHED */
> > }
> >
> > -static void kexec_prepare_cpus_wait(int wait_state)
> > +static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
> > {
> > - int my_cpu, i, notified=-1;
> > + int i, notified = -1;
> >
> > hw_breakpoint_disable();
> > - my_cpu = get_cpu();
> > /* Make sure each CPU has at least made it to the state we need.
> > *
> > * FIXME: There is a (slim) chance of a problem if not all of the CPUs
> > @@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
> >
> > static void kexec_prepare_cpus(void)
> > {
> > + int my_cpu;
> > +
> > wake_offline_cpus();
> > smp_call_function(kexec_smp_down, NULL, /* wait */0);
> > local_irq_disable();
> > @@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
> > mb(); /* make sure IRQs are disabled before we say they are */
> > get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
> >
> > - kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
> > + my_cpu = get_cpu();
> raw_smp_processor_id() is better here. All it needs is get current cpu?
> caller does irq_disable above and that renders call for get_cpu un-necessary.
Agreed, get_cpu() is not needed here. kexec_prepare_cpus() already does
local_irq_disable()/hard_irq_disable() before calling
kexec_prepare_cpus_wait(), so we only need the current cpu id.
I will go ahead with smp_processor_id() rather than
raw_smp_processor_id() to stay consistent with Patch 2 and to keep the
CONFIG_DEBUG_PREEMPT check.
> >
> > @@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
> > * Before removing MMU mappings make sure all CPUs have entered real
> > * mode:
> > */
> > - kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
> > + kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
> > + put_cpu();
> >
> > /* after we tell the others to go down */
> > if (ppc_md.kexec_cpu_down)
> > ppc_md.kexec_cpu_down(0, 0);
> > -
> > - put_cpu();
> > }
> >
> > #else /* ! SMP */
Regards,
Aboorva
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
2026-06-03 6:14 ` Aboorva Devarajan
@ 2026-06-03 6:16 ` Shrikanth Hegde
0 siblings, 0 replies; 14+ messages in thread
From: Shrikanth Hegde @ 2026-06-03 6:16 UTC (permalink / raw)
To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
On 6/3/26 11:44 AM, Aboorva Devarajan wrote:
> Hi Shrikanth,
>
> On Mon, 2026-05-18 at 11:32 +0530, Shrikanth Hegde wrote:
>> Hi Aboorva.
>>
>> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
>>> kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
>>> current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
>>> twice -- once for KEXEC_STATE_IRQS_OFF and once for
>>> KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
>>> leaving preempt_count elevated by one extra nesting level.
>>>
>>> In practice the imbalance does not trigger a 'scheduling while atomic'
>>> splat because the kexec path is a one-way trip: IRQs are already
>>> disabled, no schedule() occurs after the leak, and
>>> default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
>>> before jumping into kexec_sequence() which never returns. However the
>>> bookkeeping is still wrong.
>>>
>>> Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
>>> called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
>>> as a parameter. This keeps preempt_count correctly balanced.
>>>
>>> Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
>>> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
>>> ---
>>> arch/powerpc/kexec/core_64.c | 15 ++++++++-------
>>> 1 file changed, 8 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
>>> index 825ab8a88f18e..9d7e5a1e6e5b8 100644
>>> --- a/arch/powerpc/kexec/core_64.c
>>> +++ b/arch/powerpc/kexec/core_64.c
>>> @@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
>>> /* NOTREACHED */
>>> }
>>>
>>> -static void kexec_prepare_cpus_wait(int wait_state)
>>> +static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
>>> {
>>> - int my_cpu, i, notified=-1;
>>> + int i, notified = -1;
>>>
>>> hw_breakpoint_disable();
>>> - my_cpu = get_cpu();
>>> /* Make sure each CPU has at least made it to the state we need.
>>> *
>>> * FIXME: There is a (slim) chance of a problem if not all of the CPUs
>>> @@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
>>>
>>> static void kexec_prepare_cpus(void)
>>> {
>>> + int my_cpu;
>>> +
>>> wake_offline_cpus();
>>> smp_call_function(kexec_smp_down, NULL, /* wait */0);
>>> local_irq_disable();
>>> @@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
>>> mb(); /* make sure IRQs are disabled before we say they are */
>>> get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
>>>
>>> - kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
>>> + my_cpu = get_cpu();
>
>> raw_smp_processor_id() is better here. All it needs is get current cpu?
>> caller does irq_disable above and that renders call for get_cpu un-necessary.
>
> Agreed, get_cpu() is not needed here. kexec_prepare_cpus() already does
> local_irq_disable()/hard_irq_disable() before calling
> kexec_prepare_cpus_wait(), so we only need the current cpu id.
>
> I will go ahead with smp_processor_id() rather than
> raw_smp_processor_id() to stay consistent with Patch 2 and to keep the
> CONFIG_DEBUG_PREEMPT check.
If the irq's are disabled then use raw_smp_processor_id() in both the places.
For patch2, just put a comment saying irq's are disabled when its get there.
>
>>>
>>> @@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
>>> * Before removing MMU mappings make sure all CPUs have entered real
>>> * mode:
>>> */
>>> - kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
>>> + kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
>>> + put_cpu();
>>>
>>> /* after we tell the others to go down */
>>> if (ppc_md.kexec_cpu_down)
>>> ppc_md.kexec_cpu_down(0, 0);
>>> -
>>> - put_cpu();
>>> }
>>>
>>> #else /* ! SMP */
>
> Regards,
> Aboorva
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
2026-05-18 8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
@ 2026-06-03 6:16 ` Aboorva Devarajan
0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03 6:16 UTC (permalink / raw)
To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
Ritesh Harjani
On Mon, 2026-05-18 at 13:38 +0530, Shrikanth Hegde wrote:
> Hi Aboorva,
>
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > Hi all,
> >
> > This patch series fixes some minor preempt_count bookkeeping issues in
> > arch/powerpc/ found during a preemption leak audit prompted by the
> > lazy/full preemption model changes. These are get_cpu/put_cpu and
> > get_cpu_var/put_cpu_var pairing errors that leave preempt_count
> > incorrectly elevated or underflowed.
> >
>
> Thanks for fixing some of these.
>
> while we do this, Can you fix these mismatch in preempt disable/enable in
> below files as well.
>
> 1. kernel/kprobes.c - kprobe_handler - Does disable, but doesn't enable in some return paths.
> A definite leak.
>
> 2. Maybe platforms/pseries/lpar.c and platforms/powernv/opal-tracepoints.c.
> In __trace_hcall_entry/exit. It maybe a very corner case,
> I don't see a big concern there. But it may be remotely possible.
> Need to evaluate whether it should be fixed or not.
Thanks for the pointers. I'll go through these and get back.
Regards,
Aboorva
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
2026-06-03 6:27 [PATCH v2 " Aboorva Devarajan
@ 2026-06-03 6:27 ` Aboorva Devarajan
0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03 6:27 UTC (permalink / raw)
To: Madhavan Srinivasan, linuxppc-dev
Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
the 'out:' label, but only calls the matching get_cpu_var() after the
'i < 0' early-return check. When event->hw.idx is negative the
function jumps to 'out:' without having taken get_cpu_var(), and the
trailing put_cpu_var() then issues an unmatched preempt_enable(),
underflowing preempt_count.
On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
eventually present as a 'scheduling while atomic' BUG.
Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
correctly bracketed and the 'out:' label only handles perf_pmu_enable.
Fixes: a11106544f33 ("powerpc/perf: e500 support")
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
arch/powerpc/perf/core-fsl-emb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 7120ab20cbfe..02b5dd74c187 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)
cpuhw->n_events--;
+ put_cpu_var(cpu_hw_events);
+
out:
perf_pmu_enable(event->pmu);
- put_cpu_var(cpu_hw_events);
}
static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
--
2.54.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-06-03 6:28 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
2026-05-18 6:13 ` Shrikanth Hegde
2026-06-03 5:59 ` Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
2026-05-18 7:56 ` Shrikanth Hegde
2026-06-03 6:08 ` Aboorva Devarajan
2026-05-18 5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
2026-05-18 6:02 ` Shrikanth Hegde
2026-06-03 6:14 ` Aboorva Devarajan
2026-06-03 6:16 ` Shrikanth Hegde
2026-05-18 8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
2026-06-03 6:16 ` Aboorva Devarajan
-- strict thread matches above, loose matches on Subject: below --
2026-06-03 6:27 [PATCH v2 " Aboorva Devarajan
2026-06-03 6:27 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox