LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
@ 2026-05-18  5:08 Aboorva Devarajan
  2026-05-18  5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde

Hi all,

This patch series fixes some minor preempt_count bookkeeping issues in
arch/powerpc/ found during a preemption leak audit prompted by the
lazy/full preemption model changes. These are get_cpu/put_cpu and
get_cpu_var/put_cpu_var pairing errors that leave preempt_count
incorrectly elevated or underflowed.

Please let me know your comments.

Thanks,
Aboorva

Aboorva Devarajan (3):
  powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
  powerpc/powernv: fix preempt count leak in
    pnv_kexec_wait_secondaries_down
  powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus

 arch/powerpc/kexec/core_64.c           | 15 ++++++++-------
 arch/powerpc/perf/core-fsl-emb.c       |  3 ++-
 arch/powerpc/platforms/powernv/setup.c |  2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

-- 
2.54.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
  2026-05-18  5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
@ 2026-05-18  5:08 ` Aboorva Devarajan
  2026-05-18  6:13   ` Shrikanth Hegde
  2026-05-18  5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde

fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
the 'out:' label, but only calls the matching get_cpu_var() after the
'i < 0' early-return check. When event->hw.idx is negative the
function jumps to 'out:' without having taken get_cpu_var(), and the
trailing put_cpu_var() then issues an unmatched preempt_enable(),
underflowing preempt_count.

On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
eventually present as a 'scheduling while atomic' BUG.

Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
correctly bracketed and the 'out:' label only handles perf_pmu_enable.

Fixes: a11106544f33c ("powerpc/perf: e500 support")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 arch/powerpc/perf/core-fsl-emb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 7120ab20cbfec..02b5dd74c187a 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)
 
 	cpuhw->n_events--;
 
+	put_cpu_var(cpu_hw_events);
+
  out:
 	perf_pmu_enable(event->pmu);
-	put_cpu_var(cpu_hw_events);
 }
 
 static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
-- 
2.54.0
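
For readers following along, here is a rough userspace sketch of the
imbalance described in the commit message above. It is not kernel code:
a plain integer stands in for preempt_count, and every model_* name is a
hypothetical stand-in invented for this illustration. It only assumes
what the commit message states, namely that the early exit jumps to
'out:' before get_cpu_var() has been taken.

```c
#include <assert.h>

/* Userspace model only: a plain counter stands in for preempt_count. */
static int model_preempt_count;

/* Stand-ins for get_cpu_var()/put_cpu_var(): only the count side effect. */
static void model_get_cpu_var(void) { model_preempt_count++; }
static void model_put_cpu_var(void) { model_preempt_count--; }

/* Old shape: the put sits after 'out:', so the early exit runs it unmatched. */
static int model_del_before(int idx)
{
	model_preempt_count = 0;
	if (idx < 0)
		goto out;
	model_get_cpu_var();
	/* ... per-CPU work would happen here ... */
out:
	model_put_cpu_var();
	return model_preempt_count;
}

/* New shape: the put is paired with the get; 'out:' leaves the count alone. */
static int model_del_after(int idx)
{
	model_preempt_count = 0;
	if (idx < 0)
		goto out;
	model_get_cpu_var();
	/* ... per-CPU work would happen here ... */
	model_put_cpu_var();
out:
	return model_preempt_count;
}

/* Exercises both shapes on the early-exit path and on the normal path. */
static void model_selfcheck(void)
{
	assert(model_del_before(-1) == -1); /* unmatched put: underflow */
	assert(model_del_before(0) == 0);   /* normal path was always balanced */
	assert(model_del_after(-1) == 0);   /* early exit no longer underflows */
	assert(model_del_after(0) == 0);
}
```

Only the negative-index path differs between the two shapes, which is
consistent with the problem being easy to miss in normal use.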




* [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
  2026-05-18  5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
  2026-05-18  5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
@ 2026-05-18  5:08 ` Aboorva Devarajan
  2026-05-18  7:56   ` Shrikanth Hegde
  2026-05-18  5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
  2026-05-18  8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
  3 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde

pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
CPU id but never calls the matching put_cpu(), leaking one
preempt_disable() nesting level on every invocation.

In practice the imbalance does not trigger a visible splat because the
kexec teardown path is a one-way trip: IRQs are already disabled, no
schedule() occurs after the leak, and default_machine_kexec() overwrites
preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
which never returns. However the bookkeeping is still wrong.

In the kexec teardown path IRQs are already disabled and the CPU is
pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
without touching preempt_count.

Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 4dbb47ddbdcc4..177da0defcb36 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 {
 	int my_cpu, i, notified = -1;
 
-	my_cpu = get_cpu();
+	my_cpu = raw_smp_processor_id();
 
 	for_each_online_cpu(i) {
 		uint8_t status;
-- 
2.54.0
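
As an illustration of the leak described above, the following is a small
userspace sketch, not kernel code. A plain integer stands in for
preempt_count, the CPU id is fixed at 0, and all model_* names are
hypothetical. It assumes only what the commit message says: get_cpu()
raises the count and nothing lowers it again, while
raw_smp_processor_id() leaves the count untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: a plain counter stands in for preempt_count. */
static int model_preempt_count;

/* Models get_cpu(): bump the count, then return the CPU id (fixed at 0). */
static int model_get_cpu(void)
{
	model_preempt_count++;
	return 0;
}

/* Models raw_smp_processor_id(): return the id, leave the count alone. */
static int model_raw_cpu_id(void)
{
	return 0;
}

/*
 * Run the hypothetical wait routine 'calls' times from a zeroed count and
 * return the count that is left behind afterwards.
 */
static int model_count_after_calls(int calls, bool use_get_cpu)
{
	assert(calls >= 0);
	model_preempt_count = 0;
	for (int n = 0; n < calls; n++) {
		int my_cpu = use_get_cpu ? model_get_cpu() : model_raw_cpu_id();
		(void)my_cpu; /* the real function would poll other CPUs here */
	}
	return model_preempt_count;
}
```

With use_get_cpu set, the leftover count grows by one per call, matching
the "one nesting level on every invocation" wording above. With the raw
read it stays at zero.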




* [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
  2026-05-18  5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
  2026-05-18  5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
  2026-05-18  5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
@ 2026-05-18  5:08 ` Aboorva Devarajan
  2026-05-18  6:02   ` Shrikanth Hegde
  2026-05-18  8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
  3 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde

kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
twice -- once for KEXEC_STATE_IRQS_OFF and once for
KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
leaving preempt_count elevated by one extra nesting level.

In practice the imbalance does not trigger a 'scheduling while atomic'
splat because the kexec path is a one-way trip: IRQs are already
disabled, no schedule() occurs after the leak, and
default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
before jumping into kexec_sequence() which never returns. However the
bookkeeping is still wrong.

Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
as a parameter. This keeps preempt_count correctly balanced.

Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 arch/powerpc/kexec/core_64.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 825ab8a88f18e..9d7e5a1e6e5b8 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
 	/* NOTREACHED */
 }
 
-static void kexec_prepare_cpus_wait(int wait_state)
+static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
 {
-	int my_cpu, i, notified=-1;
+	int i, notified = -1;
 
 	hw_breakpoint_disable();
-	my_cpu = get_cpu();
 	/* Make sure each CPU has at least made it to the state we need.
 	 *
 	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
@@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
 
 static void kexec_prepare_cpus(void)
 {
+	int my_cpu;
+
 	wake_offline_cpus();
 	smp_call_function(kexec_smp_down, NULL, /* wait */0);
 	local_irq_disable();
@@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
 	mb(); /* make sure IRQs are disabled before we say they are */
 	get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
 
-	kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
+	my_cpu = get_cpu();
+	kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF, my_cpu);
 	/* we are sure every CPU has IRQs off at this point */
 	kexec_all_irq_disabled = 1;
 
@@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
 	 * Before removing MMU mappings make sure all CPUs have entered real
 	 * mode:
 	 */
-	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
+	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
+	put_cpu();
 
 	/* after we tell the others to go down */
 	if (ppc_md.kexec_cpu_down)
 		ppc_md.kexec_cpu_down(0, 0);
-
-	put_cpu();
 }
 
 #else /* ! SMP */
-- 
2.54.0
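
The double-get, single-put pattern from the commit message above can be
sketched in userspace as follows. This is an illustration only: a plain
integer stands in for preempt_count, the wait-state values 1 and 2 are
placeholders, and all model_* names are hypothetical rather than taken
from the kernel.

```c
#include <assert.h>

/* Userspace model only: a plain counter stands in for preempt_count. */
static int model_preempt_count;

static int model_get_cpu(void) { model_preempt_count++; return 0; }
static void model_put_cpu(void) { model_preempt_count--; }

/* Old helper shape: takes its own get_cpu() on every call, never puts. */
static void model_wait_before(int wait_state)
{
	int my_cpu = model_get_cpu();
	(void)my_cpu;
	(void)wait_state;
}

/* New helper shape: the CPU id arrives as a parameter. */
static void model_wait_after(int wait_state, int my_cpu)
{
	(void)wait_state;
	(void)my_cpu;
}

/* Old caller shape: two helper calls (two gets) but only one put. */
static int model_prepare_before(void)
{
	model_preempt_count = 0;
	model_wait_before(1);
	model_wait_before(2);
	model_put_cpu();
	return model_preempt_count;
}

/* New caller shape: one get, two helper calls, one put. */
static int model_prepare_after(void)
{
	int my_cpu;

	model_preempt_count = 0;
	my_cpu = model_get_cpu();
	model_wait_after(1, my_cpu);
	model_wait_after(2, my_cpu);
	model_put_cpu();
	return model_preempt_count;
}

/* Compares the leftover count of both shapes. */
static void model_selfcheck(void)
{
	assert(model_prepare_before() == 1); /* one level left elevated */
	assert(model_prepare_after() == 0);  /* balanced */
}
```

Passing the id down keeps the helper free of any preempt_count side
effect, so the caller alone decides how the count is managed.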




* Re: [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
  2026-05-18  5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
@ 2026-05-18  6:02   ` Shrikanth Hegde
  2026-06-03  6:14     ` Aboorva Devarajan
  0 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18  6:02 UTC (permalink / raw)
  To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani

Hi Aboorva.

On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
> current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
> twice -- once for KEXEC_STATE_IRQS_OFF and once for
> KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
> leaving preempt_count elevated by one extra nesting level.
> 
> In practice the imbalance does not trigger a 'scheduling while atomic'
> splat because the kexec path is a one-way trip: IRQs are already
> disabled, no schedule() occurs after the leak, and
> default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
> before jumping into kexec_sequence() which never returns. However the
> bookkeeping is still wrong.
> 
> Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
> called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
> as a parameter. This keeps preempt_count correctly balanced.
> 
> Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>   arch/powerpc/kexec/core_64.c | 15 ++++++++-------
>   1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 825ab8a88f18e..9d7e5a1e6e5b8 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
>   	/* NOTREACHED */
>   }
>   
> -static void kexec_prepare_cpus_wait(int wait_state)
> +static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
>   {
> -	int my_cpu, i, notified=-1;
> +	int i, notified = -1;
>   
>   	hw_breakpoint_disable();
> -	my_cpu = get_cpu();
>   	/* Make sure each CPU has at least made it to the state we need.
>   	 *
>   	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
> @@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
>   
>   static void kexec_prepare_cpus(void)
>   {
> +	int my_cpu;
> +
>   	wake_offline_cpus();
>   	smp_call_function(kexec_smp_down, NULL, /* wait */0);
>   	local_irq_disable();
> @@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
>   	mb(); /* make sure IRQs are disabled before we say they are */
>   	get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
>   
> -	kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
> +	my_cpu = get_cpu();

raw_smp_processor_id() is better here, since all it needs is the current CPU id.
The caller does local_irq_disable() above, which makes get_cpu() unnecessary.


> +	kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF, my_cpu);
>   	/* we are sure every CPU has IRQs off at this point */
>   	kexec_all_irq_disabled = 1;
>   
> @@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
>   	 * Before removing MMU mappings make sure all CPUs have entered real
>   	 * mode:
>   	 */
> -	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
> +	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
> +	put_cpu();
>   
>   	/* after we tell the others to go down */
>   	if (ppc_md.kexec_cpu_down)
>   		ppc_md.kexec_cpu_down(0, 0);
> -
> -	put_cpu();
>   }
>   
>   #else /* ! SMP */




* Re: [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
  2026-05-18  5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
@ 2026-05-18  6:13   ` Shrikanth Hegde
  2026-06-03  5:59     ` Aboorva Devarajan
  0 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18  6:13 UTC (permalink / raw)
  To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani



On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
> the 'out:' label, but only calls the matching get_cpu_var() after the
> 'i < 0' early-return check. When event->hw.idx is negative the
> function jumps to 'out:' without having taken get_cpu_var(), and the
> trailing put_cpu_var() then issues an unmatched preempt_enable(),
> underflowing preempt_count.
> 
> On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
> eventually present as a 'scheduling while atomic' BUG.
> 
> Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
> correctly bracketed and the 'out:' label only handles perf_pmu_enable.
> 
> Fixes: a11106544f33c ("powerpc/perf: e500 support")
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>   arch/powerpc/perf/core-fsl-emb.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
> index 7120ab20cbfec..02b5dd74c187a 100644
> --- a/arch/powerpc/perf/core-fsl-emb.c
> +++ b/arch/powerpc/perf/core-fsl-emb.c
> @@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)
>   
>   	cpuhw->n_events--;
>   
> +	put_cpu_var(cpu_hw_events);
> +
>    out:
>   	perf_pmu_enable(event->pmu);
> -	put_cpu_var(cpu_hw_events);
>   }
>   
>   static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)

Thanks for fixing this. Looks good to me.

Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>



* Re: [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
  2026-05-18  5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
@ 2026-05-18  7:56   ` Shrikanth Hegde
  2026-06-03  6:08     ` Aboorva Devarajan
  0 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18  7:56 UTC (permalink / raw)
  To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani


Hi Aboorva.

On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
> CPU id but never calls the matching put_cpu(), leaking one
> preempt_disable() nesting level on every invocation.
> 
> In practice the imbalance does not trigger a visible splat because the
> kexec teardown path is a one-way trip: IRQs are already disabled, no
> schedule() occurs after the leak, and default_machine_kexec() overwrites
> preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
> which never returns. However the bookkeeping is still wrong.
> 
> In the kexec teardown path IRQs are already disabled and the CPU is
> pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
> Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
> without touching preempt_count.
> 
> Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>   arch/powerpc/platforms/powernv/setup.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 4dbb47ddbdcc4..177da0defcb36 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
>   {
>   	int my_cpu, i, notified = -1;
>   
> -	my_cpu = get_cpu();
> +	my_cpu = raw_smp_processor_id();
>   

Are IRQs always disabled here?
What about !CONFIG_SMP, in kexec_prepare_cpus()? I see it disables interrupts
only later (though that is a less common config).

So use smp_processor_id() instead? One could build with CONFIG_DEBUG_PREEMPT=y
and check for any reports.

>   	for_each_online_cpu(i) {
>   		uint8_t status;



* Re: [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
  2026-05-18  5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
                   ` (2 preceding siblings ...)
  2026-05-18  5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
@ 2026-05-18  8:08 ` Shrikanth Hegde
  2026-06-03  6:16   ` Aboorva Devarajan
  3 siblings, 1 reply; 14+ messages in thread
From: Shrikanth Hegde @ 2026-05-18  8:08 UTC (permalink / raw)
  To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani

Hi Aboorva,

On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> Hi all,
> 
> This patch series fixes some minor preempt_count bookkeeping issues in
> arch/powerpc/ found during a preemption leak audit prompted by the
> lazy/full preemption model changes. These are get_cpu/put_cpu and
> get_cpu_var/put_cpu_var pairing errors that leave preempt_count
> incorrectly elevated or underflowed.
> 

Thanks for fixing some of these.

While we are at it, can you also fix the preempt disable/enable mismatches in
the files below?

1. kernel/kprobes.c - kprobe_handler() disables preemption but does not
    re-enable it on some return paths. A definite leak.

2. Maybe platforms/pseries/lpar.c and platforms/powernv/opal-tracepoints.c,
    in __trace_hcall_entry/exit. It may be a rare corner case and
    I don't see a big concern there, but it may be remotely possible.
    Need to evaluate whether it should be fixed or not.



* Re: [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
  2026-05-18  6:13   ` Shrikanth Hegde
@ 2026-06-03  5:59     ` Aboorva Devarajan
  0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03  5:59 UTC (permalink / raw)
  To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani

On Mon, 2026-05-18 at 11:43 +0530, Shrikanth Hegde wrote:
> 
> 
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
> > the 'out:' label, but only calls the matching get_cpu_var() after the
> > 'i < 0' early-return check. When event->hw.idx is negative the
> > function jumps to 'out:' without having taken get_cpu_var(), and the
> > trailing put_cpu_var() then issues an unmatched preempt_enable(),
> > underflowing preempt_count.
> > 
> > On a CONFIG_PREEMPT=y kernel preempt_count would underflow and
> > eventually present as a 'scheduling while atomic' BUG.
> > 
> > Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
> > correctly bracketed and the 'out:' label only handles perf_pmu_enable.
> > 
> > Fixes: a11106544f33c ("powerpc/perf: e500 support")
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> >   arch/powerpc/perf/core-fsl-emb.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
> > index 7120ab20cbfec..02b5dd74c187a 100644
> > --- a/arch/powerpc/perf/core-fsl-emb.c
> > +++ b/arch/powerpc/perf/core-fsl-emb.c
> > @@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)
> >   
> >   	cpuhw->n_events--;
> >   
> > +	put_cpu_var(cpu_hw_events);
> > +
> >    out:
> >   	perf_pmu_enable(event->pmu);
> > -	put_cpu_var(cpu_hw_events);
> >   }
> >   
> >   static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
> 
> Thanks for fixing this. Looks good to me.
> 
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>


Hi Shrikanth,

Thanks for the review.

Regards,
Aboorva



* Re: [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
  2026-05-18  7:56   ` Shrikanth Hegde
@ 2026-06-03  6:08     ` Aboorva Devarajan
  0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03  6:08 UTC (permalink / raw)
  To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani

On Mon, 2026-05-18 at 13:26 +0530, Shrikanth Hegde wrote:
> 
> Hi Aboorva.
> 
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
> > CPU id but never calls the matching put_cpu(), leaking one
> > preempt_disable() nesting level on every invocation.
> > 
> > In practice the imbalance does not trigger a visible splat because the
> > kexec teardown path is a one-way trip: IRQs are already disabled, no
> > schedule() occurs after the leak, and default_machine_kexec() overwrites
> > preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
> > which never returns. However the bookkeeping is still wrong.
> > 
> > In the kexec teardown path IRQs are already disabled and the CPU is
> > pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
> > Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
> > without touching preempt_count.
> > 
> > Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> >   arch/powerpc/platforms/powernv/setup.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> > index 4dbb47ddbdcc4..177da0defcb36 100644
> > --- a/arch/powerpc/platforms/powernv/setup.c
> > +++ b/arch/powerpc/platforms/powernv/setup.c
> > @@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
> >   {
> >   	int my_cpu, i, notified = -1;
> >   
> > -	my_cpu = get_cpu();
> > +	my_cpu = raw_smp_processor_id();
> >   
> 
> Are IRQs always disabled here?
> What about !CONFIG_SMP, in kexec_prepare_cpus()? I see it disables interrupts
> only later (though that is a less common config).
> 

IIUC, PPC_POWERNV does 'select FORCE_SMP' (which in turn selects SMP), so
there is no !CONFIG_SMP powernv build. The !SMP kexec_prepare_cpus() variant in
arch/powerpc/kexec/core_64.c, the one you spotted that calls
ppc_md.kexec_cpu_down() before local_irq_disable(), is therefore
never compiled with powernv, so pnv_kexec_cpu_down() ->
pnv_kexec_wait_secondaries_down() can't be reached through it.
So IRQs are disabled in every case that reaches this function.


> So use smp_processor_id() instead? One could build with CONFIG_DEBUG_PREEMPT=y
> and check for any reports.
> 
> >   	for_each_online_cpu(i) {
> >   		uint8_t status;

Sure, I'll switch to smp_processor_id() in v2 rather than raw_smp_processor_id().
It returns the CPU id without touching preempt_count (so the leak is
gone), and unlike the raw variant it keeps the CONFIG_DEBUG_PREEMPT
check. That check stays silent here since IRQs are off, but it will flag any
future caller that reaches this path while preemptible instead of
silently hiding it.
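
To make the difference between the two accessors concrete, here is a
simplified userspace sketch. It is not the kernel's implementation: the
struct, the report counter and all model_* names are hypothetical, and
the check is reduced to the two conditions discussed in this thread
(nonzero preempt_count or IRQs off). The real CONFIG_DEBUG_PREEMPT check
has further exemptions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the calling context. */
struct model_ctx {
	int preempt_count;
	bool irqs_disabled;
	int cpu;
};

/* Number of "called from preemptible code" reports the model has issued. */
static int model_debug_reports;

/* Models raw_smp_processor_id(): no checking at all. */
static int model_raw_smp_processor_id(const struct model_ctx *c)
{
	return c->cpu;
}

/* Models smp_processor_id() with the debug check enabled (simplified). */
static int model_smp_processor_id(const struct model_ctx *c)
{
	if (c->preempt_count == 0 && !c->irqs_disabled)
		model_debug_reports++; /* caller could migrate: report it */
	return c->cpu;
}

/* Return how many reports a single read produces in the given context. */
static int model_reports_for(bool irqs_disabled, bool use_raw)
{
	struct model_ctx c = {
		.preempt_count = 0,
		.irqs_disabled = irqs_disabled,
		.cpu = 2,
	};
	int id;

	model_debug_reports = 0;
	id = use_raw ? model_raw_smp_processor_id(&c)
		     : model_smp_processor_id(&c);
	assert(id == 2); /* both accessors return the same id */
	return model_debug_reports;
}
```

In this model the checked accessor is silent with IRQs off and reports
once when called from a preemptible context, while the raw accessor
never reports.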


Thanks,
Aboorva



* Re: [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
  2026-05-18  6:02   ` Shrikanth Hegde
@ 2026-06-03  6:14     ` Aboorva Devarajan
  2026-06-03  6:16       ` Shrikanth Hegde
  0 siblings, 1 reply; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03  6:14 UTC (permalink / raw)
  To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani

Hi Shrikanth,

On Mon, 2026-05-18 at 11:32 +0530, Shrikanth Hegde wrote:
> Hi Aboorva.
> 
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
> > current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
> > twice -- once for KEXEC_STATE_IRQS_OFF and once for
> > KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
> > leaving preempt_count elevated by one extra nesting level.
> > 
> > In practice the imbalance does not trigger a 'scheduling while atomic'
> > splat because the kexec path is a one-way trip: IRQs are already
> > disabled, no schedule() occurs after the leak, and
> > default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
> > before jumping into kexec_sequence() which never returns. However the
> > bookkeeping is still wrong.
> > 
> > Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
> > called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
> > as a parameter. This keeps preempt_count correctly balanced.
> > 
> > Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
> > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> > ---
> >   arch/powerpc/kexec/core_64.c | 15 ++++++++-------
> >   1 file changed, 8 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> > index 825ab8a88f18e..9d7e5a1e6e5b8 100644
> > --- a/arch/powerpc/kexec/core_64.c
> > +++ b/arch/powerpc/kexec/core_64.c
> > @@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
> >   	/* NOTREACHED */
> >   }
> >   
> > -static void kexec_prepare_cpus_wait(int wait_state)
> > +static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
> >   {
> > -	int my_cpu, i, notified=-1;
> > +	int i, notified = -1;
> >   
> >   	hw_breakpoint_disable();
> > -	my_cpu = get_cpu();
> >   	/* Make sure each CPU has at least made it to the state we need.
> >   	 *
> >   	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
> > @@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
> >   
> >   static void kexec_prepare_cpus(void)
> >   {
> > +	int my_cpu;
> > +
> >   	wake_offline_cpus();
> >   	smp_call_function(kexec_smp_down, NULL, /* wait */0);
> >   	local_irq_disable();
> > @@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
> >   	mb(); /* make sure IRQs are disabled before we say they are */
> >   	get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
> >   
> > -	kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
> > +	my_cpu = get_cpu();

> raw_smp_processor_id() is better here, since all it needs is the current CPU id.
> The caller does local_irq_disable() above, which makes get_cpu() unnecessary.

Agreed, get_cpu() is not needed here. kexec_prepare_cpus() already does
local_irq_disable()/hard_irq_disable() before calling
kexec_prepare_cpus_wait(), so we only need the current cpu id.

I will go ahead with smp_processor_id() rather than
raw_smp_processor_id() to stay consistent with Patch 2 and to keep the
CONFIG_DEBUG_PREEMPT check.

> >   
> > @@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
> >   	 * Before removing MMU mappings make sure all CPUs have entered real
> >   	 * mode:
> >   	 */
> > -	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
> > +	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
> > +	put_cpu();
> >   
> >   	/* after we tell the others to go down */
> >   	if (ppc_md.kexec_cpu_down)
> >   		ppc_md.kexec_cpu_down(0, 0);
> > -
> > -	put_cpu();
> >   }
> >   
> >   #else /* ! SMP */

Regards,
Aboorva



* Re: [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus
  2026-06-03  6:14     ` Aboorva Devarajan
@ 2026-06-03  6:16       ` Shrikanth Hegde
  0 siblings, 0 replies; 14+ messages in thread
From: Shrikanth Hegde @ 2026-06-03  6:16 UTC (permalink / raw)
  To: Aboorva Devarajan, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani



On 6/3/26 11:44 AM, Aboorva Devarajan wrote:
> Hi Shrikanth,
> 
> On Mon, 2026-05-18 at 11:32 +0530, Shrikanth Hegde wrote:
>> Hi Aboorva.
>>
>> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
>>> kexec_prepare_cpus_wait() calls get_cpu() internally to obtain the
>>> current CPU id. kexec_prepare_cpus() calls kexec_prepare_cpus_wait()
>>> twice -- once for KEXEC_STATE_IRQS_OFF and once for
>>> KEXEC_STATE_REAL_MODE -- but only issues a single put_cpu() at the end,
>>> leaving preempt_count elevated by one extra nesting level.
>>>
>>> In practice the imbalance does not trigger a 'scheduling while atomic'
>>> splat because the kexec path is a one-way trip: IRQs are already
>>> disabled, no schedule() occurs after the leak, and
>>> default_machine_kexec() overwrites preempt_count with HARDIRQ_OFFSET
>>> before jumping into kexec_sequence() which never returns. However the
>>> bookkeeping is still wrong.
>>>
>>> Lift the get_cpu()/put_cpu() pair into kexec_prepare_cpus() so it is
>>> called exactly once, and pass the CPU id to kexec_prepare_cpus_wait()
>>> as a parameter. This keeps preempt_count correctly balanced.
>>>
>>> Fixes: 1fc711f7ffb01 ("powerpc/kexec: Fix race in kexec shutdown")
>>> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
>>> ---
>>>    arch/powerpc/kexec/core_64.c | 15 ++++++++-------
>>>    1 file changed, 8 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
>>> index 825ab8a88f18e..9d7e5a1e6e5b8 100644
>>> --- a/arch/powerpc/kexec/core_64.c
>>> +++ b/arch/powerpc/kexec/core_64.c
>>> @@ -164,12 +164,11 @@ static void kexec_smp_down(void *arg)
>>>    	/* NOTREACHED */
>>>    }
>>>    
>>> -static void kexec_prepare_cpus_wait(int wait_state)
>>> +static void kexec_prepare_cpus_wait(int wait_state, int my_cpu)
>>>    {
>>> -	int my_cpu, i, notified=-1;
>>> +	int i, notified = -1;
>>>    
>>>    	hw_breakpoint_disable();
>>> -	my_cpu = get_cpu();
>>>    	/* Make sure each CPU has at least made it to the state we need.
>>>    	 *
>>>    	 * FIXME: There is a (slim) chance of a problem if not all of the CPUs
>>> @@ -246,6 +245,8 @@ static void wake_offline_cpus(void)
>>>    
>>>    static void kexec_prepare_cpus(void)
>>>    {
>>> +	int my_cpu;
>>> +
>>>    	wake_offline_cpus();
>>>    	smp_call_function(kexec_smp_down, NULL, /* wait */0);
>>>    	local_irq_disable();
>>> @@ -254,7 +255,8 @@ static void kexec_prepare_cpus(void)
>>>    	mb(); /* make sure IRQs are disabled before we say they are */
>>>    	get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF;
>>>    
>>> -	kexec_prepare_cpus_wait(KEXEC_STATE_IRQS_OFF);
>>> +	my_cpu = get_cpu();
> 
>> raw_smp_processor_id() is better here. All it needs is the current CPU id, right?
>> The caller disables IRQs above, which makes the get_cpu() call unnecessary.
> 
> Agreed, get_cpu() is not needed here. kexec_prepare_cpus() already does
> local_irq_disable()/hard_irq_disable() before calling
> kexec_prepare_cpus_wait(), so we only need the current cpu id.
> 
> I will go ahead with smp_processor_id() rather than
> raw_smp_processor_id() to stay consistent with Patch 2 and to keep the
> CONFIG_DEBUG_PREEMPT check.


If IRQs are disabled, then use raw_smp_processor_id() in both places.
For patch 2, just add a comment saying that IRQs are disabled by the time it gets there.
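
To make the bookkeeping concrete, here is a minimal userspace sketch of the
three shapes discussed in this thread. It is not kernel code: every model_*
name, the CPU id value, and the *_prepare_cpus() helpers are hypothetical
stand-ins. It only assumes that get_cpu() raises preempt_count by one,
put_cpu() lowers it by one, and raw_smp_processor_id() leaves it alone.

```c
#include <assert.h>

/* Userspace stand-in for preempt_count; hypothetical, not the kernel's. */
static int model_preempt_count;
static int model_cpu_id = 3;	/* arbitrary CPU id for illustration */

/* Models get_cpu(): one more preempt_disable() level, returns the CPU id. */
static int model_get_cpu(void)
{
	model_preempt_count++;
	return model_cpu_id;
}

/* Models put_cpu(): drops one level; asserts instead of underflowing. */
static void model_put_cpu(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models raw_smp_processor_id(): reads the id, preempt_count untouched. */
static int model_raw_smp_processor_id(void)
{
	return model_cpu_id;
}

/* Old shape: the wait helper takes its own get_cpu() on every call. */
static void old_wait(void)
{
	(void)model_get_cpu();
}

/* Two helper calls, one put_cpu(): returns the leftover nesting level. */
static int old_prepare_cpus(void)
{
	model_preempt_count = 0;
	old_wait();
	old_wait();
	model_put_cpu();
	return model_preempt_count;
}

/* New helper shape: the caller passes the CPU id in. */
static void new_wait(int my_cpu)
{
	(void)my_cpu;
}

/* Shape of the posted patch: one get_cpu()/put_cpu() pair in the caller. */
static int patched_prepare_cpus(void)
{
	int my_cpu;

	model_preempt_count = 0;
	my_cpu = model_get_cpu();
	new_wait(my_cpu);
	new_wait(my_cpu);
	model_put_cpu();
	return model_preempt_count;
}

/* Shape suggested in review: only read the id, since IRQs are already off. */
static int suggested_prepare_cpus(void)
{
	int my_cpu;

	model_preempt_count = 0;
	my_cpu = model_raw_smp_processor_id();
	new_wait(my_cpu);
	new_wait(my_cpu);
	return model_preempt_count;
}
```

In this model the old shape ends one level high, while the patched and
suggested shapes both end balanced. The suggested shape simply has no pair
left to get wrong.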

> 
>>>    
>>> @@ -262,13 +264,12 @@ static void kexec_prepare_cpus(void)
>>>    	 * Before removing MMU mappings make sure all CPUs have entered real
>>>    	 * mode:
>>>    	 */
>>> -	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
>>> +	kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE, my_cpu);
>>> +	put_cpu();
>>>    
>>>    	/* after we tell the others to go down */
>>>    	if (ppc_md.kexec_cpu_down)
>>>    		ppc_md.kexec_cpu_down(0, 0);
>>> -
>>> -	put_cpu();
>>>    }
>>>    
>>>    #else /* ! SMP */
> 
> Regards,
> Aboorva




* Re: [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
  2026-05-18  8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
@ 2026-06-03  6:16   ` Aboorva Devarajan
  0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03  6:16 UTC (permalink / raw)
  To: Shrikanth Hegde, Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Christophe Leroy, linux-kernel, Sourabh Jain,
	Ritesh Harjani

On Mon, 2026-05-18 at 13:38 +0530, Shrikanth Hegde wrote:
> Hi Aboorva,
> 
> On 5/18/26 10:38 AM, Aboorva Devarajan wrote:
> > Hi all,
> > 
> > This patch series fixes some minor preempt_count bookkeeping issues in
> > arch/powerpc/ found during a preemption leak audit prompted by the
> > lazy/full preemption model changes. These are get_cpu/put_cpu and
> > get_cpu_var/put_cpu_var pairing errors that leave preempt_count
> > incorrectly elevated or underflowed.
> > 
> 
> Thanks for fixing some of these.
> 
> While we are at it, can you also fix the preempt disable/enable mismatches
> in the files below?
> 
> 1. kernel/kprobes.c - kprobe_handler - Disables preemption, but doesn't
>     re-enable it on some return paths. A definite leak.
> 
> 2. Maybe platforms/pseries/lpar.c and platforms/powernv/opal-tracepoints.c,
>     in __trace_hcall_entry/exit. It may be a rare corner case and I don't
>     see a big concern there, but it may be remotely possible.
>     Need to evaluate whether it should be fixed or not.


Thanks for the pointers. I'll go through these and get back.

Regards,
Aboorva



* [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
  2026-06-03  6:27 [PATCH v2 " Aboorva Devarajan
@ 2026-06-03  6:27 ` Aboorva Devarajan
  0 siblings, 0 replies; 14+ messages in thread
From: Aboorva Devarajan @ 2026-06-03  6:27 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde

pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
CPU id but never calls the matching put_cpu(), leaking one
preempt_disable() nesting level on every invocation.

In practice the imbalance does not trigger a visible splat because the
kexec teardown path is a one-way trip: IRQs are already disabled, no
schedule() occurs after the leak, and default_machine_kexec() overwrites
preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
which never returns. However the bookkeeping is still wrong.

The function only needs the current CPU id, and this path runs with the
CPU pinned and IRQs disabled, so the preempt_disable() side-effect of
get_cpu() is unnecessary. Replace it with smp_processor_id(), which
returns the CPU id without touching preempt_count.

Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 4dbb47ddbdcc..73193264cbe7 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 {
 	int my_cpu, i, notified = -1;
 
-	my_cpu = get_cpu();
+	my_cpu = smp_processor_id();
 
 	for_each_online_cpu(i) {
 		uint8_t status;
-- 
2.54.0




end of thread, other threads:[~2026-06-03  6:28 UTC | newest]

Thread overview: 14+ messages
2026-05-18  5:08 [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Aboorva Devarajan
2026-05-18  5:08 ` [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del Aboorva Devarajan
2026-05-18  6:13   ` Shrikanth Hegde
2026-06-03  5:59     ` Aboorva Devarajan
2026-05-18  5:08 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
2026-05-18  7:56   ` Shrikanth Hegde
2026-06-03  6:08     ` Aboorva Devarajan
2026-05-18  5:08 ` [PATCH 3/3] powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus Aboorva Devarajan
2026-05-18  6:02   ` Shrikanth Hegde
2026-06-03  6:14     ` Aboorva Devarajan
2026-06-03  6:16       ` Shrikanth Hegde
2026-05-18  8:08 ` [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths Shrikanth Hegde
2026-06-03  6:16   ` Aboorva Devarajan
  -- strict thread matches above, loose matches on Subject: below --
2026-06-03  6:27 [PATCH v2 " Aboorva Devarajan
2026-06-03  6:27 ` [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down Aboorva Devarajan
