Linux Perf Users
* [PATCH v2 0/2] perf: Improve refresh limit in overflow handler
@ 2026-05-14 16:21 Leo Yan
  2026-05-14 16:21 ` [PATCH v2 1/2] selftests/perf_events: Add test for refresh limit signals Leo Yan
  2026-05-14 16:21 ` [PATCH v2 2/2] perf/core: Ignore overflows while disable is pending Leo Yan
  0 siblings, 2 replies; 5+ messages in thread
From: Leo Yan @ 2026-05-14 16:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Shuah Khan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Sumanth Korikkar
  Cc: linux-kernel, linux-kselftest, linux-perf-users, Leo Yan

Commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage") added a
direct pmu->stop() call when the refresh limit reaches zero. The change
was based on a test program [1] that reported missing POLL_HUP
notifications.

However, the test program used SIGIO, which is a standard signal and can
be coalesced. As a result, userspace may miss signal delivery even
though the signal was generated by the kernel. This is expected
behaviour for standard signals.

This series adds a selftest for the PERF_EVENT_IOC_REFRESH limit using a
real-time signal, which guarantees queued signal delivery and confirms
that POLL_HUP notifications are delivered reliably on arm64.

The second patch replaces the direct PMU stop with an explicit
pending-disable guard, which avoids a redundant stop in most cases and
makes the logic easier to understand.

[1] https://lore.kernel.org/lkml/aICYAqM5EQUlTqtX@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com/

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
Changes in v2:
- Replaced ASSERT_EQ() with EXPECT_EQ() in the test teardown (Sashiko).
- Handled a race where a high-frequency overflow arrives before the disable irq_work runs (Sashiko).
- Link to v1: https://lore.kernel.org/r/20260512-arm_cs_clean_perf_handle-v1-0-75ff373ecd22@arm.com

---
Leo Yan (2):
      selftests/perf_events: Add test for refresh limit signals
      perf/core: Ignore overflows while disable is pending

 kernel/events/core.c                               |   8 +-
 tools/testing/selftests/perf_events/.gitignore     |   1 +
 tools/testing/selftests/perf_events/Makefile       |   3 +-
 .../testing/selftests/perf_events/refresh_signal.c | 120 +++++++++++++++++++++
 4 files changed, 130 insertions(+), 2 deletions(-)
---
base-commit: e1914add2799225a87502051415fc5c32aeb02ae
change-id: 20260429-arm_cs_clean_perf_handle-763cc339c1f5

Best regards,
-- 
Leo Yan <leo.yan@arm.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] selftests/perf_events: Add test for refresh limit signals
  2026-05-14 16:21 [PATCH v2 0/2] perf: Improve refresh limit in overflow handler Leo Yan
@ 2026-05-14 16:21 ` Leo Yan
  2026-05-14 20:06   ` sashiko-bot
  2026-05-14 16:21 ` [PATCH v2 2/2] perf/core: Ignore overflows while disable is pending Leo Yan
  1 sibling, 1 reply; 5+ messages in thread
From: Leo Yan @ 2026-05-14 16:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Shuah Khan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Sumanth Korikkar
  Cc: linux-kernel, linux-kselftest, linux-perf-users, Leo Yan

perf reports POLL_IN for overflows while an event still has refreshes
left, and POLL_HUP when the refresh count reaches zero and the event is
disabled.

Add a test to verify PERF_EVENT_IOC_REFRESH with a task-clock software
event. Use a real-time signal so notifications are queued instead of
being coalesced.

Each iteration sets the refresh count to 5 and waits for a POLL_HUP
notification after 5 task-clock overflows at a 1 microsecond interval.
The sequence is repeated 100 times, verifying that exactly one POLL_HUP
notification is delivered per iteration. The test is bounded by
the default selftest timeout (30 seconds).

    POLL_IN             POLL_HUP
    e1   e2   e3   e4   e5        |
    |    |    |    |    |         |
  --+----+----+----+----+---------+----------------------
    <-----      Iter 1      -----> <--- Iter 2 ... --->

The POLL_IN count is allowed to be lower than the theoretical maximum because
overflow delivery is deferred through irq_work. If timers expire too
close to each other, pending irq_work may not run before the next timer
expires, so some POLL_IN notifications can be missed.

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/testing/selftests/perf_events/.gitignore     |   1 +
 tools/testing/selftests/perf_events/Makefile       |   3 +-
 .../testing/selftests/perf_events/refresh_signal.c | 120 +++++++++++++++++++++
 3 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/perf_events/.gitignore b/tools/testing/selftests/perf_events/.gitignore
index 4931b3b6bbd3971145b0e09b1fcdaf6cae9eb10e..e1bcf3ab8ab24bdb742a9cd593810c1c566d4b20 100644
--- a/tools/testing/selftests/perf_events/.gitignore
+++ b/tools/testing/selftests/perf_events/.gitignore
@@ -3,3 +3,4 @@ sigtrap_threads
 remove_on_exec
 watermark_signal
 mmap
+refresh_signal
diff --git a/tools/testing/selftests/perf_events/Makefile b/tools/testing/selftests/perf_events/Makefile
index 2e5d85770dfeadd909196dbf980fd334b9580477..e0591b1045f959476a0c5bb57e471a01006b66ee 100644
--- a/tools/testing/selftests/perf_events/Makefile
+++ b/tools/testing/selftests/perf_events/Makefile
@@ -2,5 +2,6 @@
 CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
 LDFLAGS += -lpthread
 
-TEST_GEN_PROGS := sigtrap_threads remove_on_exec watermark_signal mmap
+TEST_GEN_PROGS := sigtrap_threads remove_on_exec watermark_signal mmap \
+		  refresh_signal
 include ../lib.mk
diff --git a/tools/testing/selftests/perf_events/refresh_signal.c b/tools/testing/selftests/perf_events/refresh_signal.c
new file mode 100644
index 0000000000000000000000000000000000000000..5519ea7b56f9dc7aea31e53698bb201bcd76b241
--- /dev/null
+++ b/tools/testing/selftests/perf_events/refresh_signal.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <fcntl.h>
+#include <linux/perf_event.h>
+#include <poll.h>
+#include <signal.h>
+#include <sys/ioctl.h>
+#include <time.h>
+
+#include "../kselftest_harness.h"
+
+#define RT_SIG			(SIGRTMIN + 1)
+
+#define EVENT_LIMIT		5
+#define ITERATIONS		100
+
+static struct {
+	volatile sig_atomic_t in;
+	volatile sig_atomic_t hup;
+} count;
+
+static void sigio_handler(int signo __maybe_unused, siginfo_t *info,
+			  void *ucontext __maybe_unused)
+{
+	switch (info->si_code) {
+	case POLL_IN:
+		count.in++;
+		break;
+	case POLL_HUP:
+		count.hup++;
+		break;
+	}
+}
+
+FIXTURE(refresh_signal)
+{
+	struct sigaction old_sa;
+	int fd;
+};
+
+FIXTURE_SETUP(refresh_signal)
+{
+	struct sigaction sa = { 0 };
+	struct perf_event_attr attr = { 0 };
+
+	sa.sa_sigaction = sigio_handler;
+	sa.sa_flags = SA_SIGINFO;
+	sigemptyset(&sa.sa_mask);
+
+	/* Use a real-time signal so notifications are reliably queued */
+	EXPECT_EQ(sigaction(RT_SIG, &sa, &self->old_sa), 0);
+
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_SOFTWARE;
+	attr.config = PERF_COUNT_SW_TASK_CLOCK;
+	attr.disabled = 1;
+	attr.sample_period = 1000;	/* 1 us */
+	attr.exclude_kernel = 1;
+	attr.exclude_hv = 1;
+
+	self->fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+	ASSERT_NE(self->fd, -1);
+
+	/* Enable async notification */
+	ASSERT_EQ(fcntl(self->fd, F_SETFL, fcntl(self->fd, F_GETFL) | O_ASYNC), 0);
+
+	/* Receive the signal for current process */
+	ASSERT_EQ(fcntl(self->fd, F_SETOWN, getpid()), 0);
+
+	/* Use signo instead of the default SIGIO */
+	ASSERT_EQ(fcntl(self->fd, F_SETSIG, RT_SIG), 0);
+}
+
+FIXTURE_TEARDOWN(refresh_signal)
+{
+	EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_DISABLE, 0), 0);
+
+	close(self->fd);
+	sigaction(RT_SIG, &self->old_sa, NULL);
+}
+
+TEST_F(refresh_signal, refresh_stress)
+{
+	int i;
+
+	ASSERT_EQ(ioctl(self->fd, PERF_EVENT_IOC_RESET, 0), 0);
+
+	for (i = 0; i < ITERATIONS; i++) {
+		sig_atomic_t old_count = count.hup;
+
+	/* Set the event limit; this also enables the event */
+		ASSERT_EQ(ioctl(self->fd, PERF_EVENT_IOC_REFRESH, EVENT_LIMIT), 0);
+
+		/*
+	 * Wait for the new POLL_HUP notification. The test is bounded
+		 * by the default timeout.
+		 */
+		while (old_count == count.hup)
+			;
+	}
+
+	/*
+	 * Events before EVENT_LIMIT are reported as POLL_IN. When the limit
+	 * is reached, the final event is reported as POLL_HUP. The total
+	 * number of events is scaled by the number of iterations.
+	 *
+	 * Events are delivered via irq_work. If timers expire too close to
+	 * each other, irq_work may not run before the next timer fires,
+	 * causing some POLL_IN events to be missed. Therefore, use a
+	 * less-or-equal comparison for POLL_IN.
+	 *
+	 * The last event stops the timer, so the POLL_HUP signal must be
+	 * observed once per iteration when the limit is reached.
+	 */
+	EXPECT_LE(count.in, (EVENT_LIMIT - 1) * ITERATIONS);
+	EXPECT_EQ(count.hup, ITERATIONS);
+}
+
+TEST_HARNESS_MAIN

-- 
2.34.1


* [PATCH v2 2/2] perf/core: Ignore overflows while disable is pending
  2026-05-14 16:21 [PATCH v2 0/2] perf: Improve refresh limit in overflow handler Leo Yan
  2026-05-14 16:21 ` [PATCH v2 1/2] selftests/perf_events: Add test for refresh limit signals Leo Yan
@ 2026-05-14 16:21 ` Leo Yan
  2026-05-14 21:36   ` sashiko-bot
  1 sibling, 1 reply; 5+ messages in thread
From: Leo Yan @ 2026-05-14 16:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Shuah Khan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Sumanth Korikkar
  Cc: linux-kernel, linux-kselftest, linux-perf-users, Leo Yan

Commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage") added a
direct pmu->stop(event, 0) call when the refresh limit reaches zero.

That change was based on a report [1] using SIGIO to receive POLL_HUP.
Since SIGIO is a standard signal, multiple notifications can be
coalesced and userspace may miss a signal even though the perf core
generated. Using a real-time signal avoids that coalescing and shows
that POLL_HUP is delivered reliably without directly stopping the PMU
from the overflow path.

There is still a race to handle. For a high frequency event, another
overflow can arrive after perf_event_disable_inatomic() has queued the
disable irq_work but before that irq_work has run. If overflow
processing continues, pending_kill can be overwritten from POLL_HUP
back to POLL_IN and samples can be recorded after the refresh limit has
been reached.

The direct PMU stop avoids this by stopping the hardware immediately,
but the event still relies on perf_event_disable_inatomic() to complete
the disable state transition. This is redundant, and it can inject
unnecessary stop operations in the middle of the disable sequence. More
importantly, the throttling mechanism already exists to stop
high-frequency overflows.

Make the overflow path explicitly check pending_disable instead of
directly stopping the PMU. Once disable is pending, skip further
overflow processing so the pending POLL_HUP is preserved and no samples
are recorded for an event that is already waiting to be disabled.

[1] https://lore.kernel.org/lkml/aICYAqM5EQUlTqtX@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com/

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 kernel/events/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7935d5663944ee1cbaf38cf8018c3347635e8d31..0fadb53e5d79cab8cb52a08d7656b0064a77ef55 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10745,12 +10745,18 @@ static int __perf_event_overflow(struct perf_event *event,
 	 * events
 	 */
 
+	/*
+	 * Disable is pending, skip further overflow processing so the pending
+	 * POLL_HUP is preserved and no samples are recorded beyond the limit.
+	 */
+	if (event->pending_disable)
+		goto out;
+
 	event->pending_kill = POLL_IN;
 	if (events && atomic_dec_and_test(&event->event_limit)) {
 		ret = 1;
 		event->pending_kill = POLL_HUP;
 		perf_event_disable_inatomic(event);
-		event->pmu->stop(event, 0);
 	}
 
 	if (event->attr.sigtrap) {

-- 
2.34.1


* Re: [PATCH v2 1/2] selftests/perf_events: Add test for refresh limit signals
  2026-05-14 16:21 ` [PATCH v2 1/2] selftests/perf_events: Add test for refresh limit signals Leo Yan
@ 2026-05-14 20:06   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-14 20:06 UTC (permalink / raw)
  To: Leo Yan; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Low] Missing <sys/syscall.h> include for syscall() and __NR_perf_event_open
- [Low] FIXTURE_TEARDOWN unconditionally operates on potentially invalid file descriptor
- [Low] Using EXPECT_EQ for sigaction may cause an ungraceful test crash on setup failure
--

commit 3f198b4c551c8f0caf35b8f6bc25feee2904073b
Author: Leo Yan <leo.yan@arm.com>

selftests/perf_events: Add test for refresh limit signals

This commit adds a test to verify PERF_EVENT_IOC_REFRESH with a
task-clock software event. It uses a real-time signal to queue
notifications and verifies that exactly one POLL_HUP notification is
delivered per iteration when the refresh count reaches zero.

> diff --git a/tools/testing/selftests/perf_events/refresh_signal.c b/tools/testing/selftests/perf_events/refresh_signal.c
> new file mode 100644
> index 0000000000000..5519ea7b56f9d
> --- /dev/null
> +++ b/tools/testing/selftests/perf_events/refresh_signal.c
> @@ -0,0 +1,120 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +
> +#include <fcntl.h>
> +#include <linux/perf_event.h>
> +#include <poll.h>
> +#include <signal.h>
> +#include <sys/ioctl.h>
> +#include <time.h>
> +
> +#include "../kselftest_harness.h"

Is it necessary to explicitly include <sys/syscall.h> and <unistd.h>
here?

The test uses syscall() and __NR_perf_event_open, and relying on
implicit transitive includes from kselftest_harness.h might lead to
compilation failures on some architectures or libc implementations.

[ ... ]

> +FIXTURE_SETUP(refresh_signal)
> +{
> +	struct sigaction sa = { 0 };
> +	struct perf_event_attr attr = { 0 };
> +
> +	sa.sa_sigaction = sigio_handler;
> +	sa.sa_flags = SA_SIGINFO;
> +	sigemptyset(&sa.sa_mask);
> +
> +	/* Use a real-time signal so notifications are reliably queued */
> +	EXPECT_EQ(sigaction(RT_SIG, &sa, &self->old_sa), 0);

If sigaction() fails, EXPECT_EQ logs a failure but allows the setup to
continue.

When the perf event triggers the first RT_SIG, the process will lack a
custom handler and the default action will terminate the test program.

Would it be safer to use ASSERT_EQ here so the test aborts cleanly on
setup failure?

[ ... ]

> +	self->fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> +	ASSERT_NE(self->fd, -1);
> +
> +	/* Enable async notification */
> +	ASSERT_EQ(fcntl(self->fd, F_SETFL, fcntl(self->fd, F_GETFL) | O_ASYNC), 0);
> +
> +	/* Receive the signal for current process */
> +	ASSERT_EQ(fcntl(self->fd, F_SETOWN, getpid()), 0);
> +
> +	/* Use signo instead of the default SIGIO */
> +	ASSERT_EQ(fcntl(self->fd, F_SETSIG, RT_SIG), 0);
> +}
> +
> +FIXTURE_TEARDOWN(refresh_signal)
> +{
> +	EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_DISABLE, 0), 0);
> +
> +	close(self->fd);
> +	sigaction(RT_SIG, &self->old_sa, NULL);
> +}

If syscall(__NR_perf_event_open) fails during FIXTURE_SETUP, self->fd is
assigned -1 and the setup aborts via ASSERT_NE.

However, the kselftest framework still executes FIXTURE_TEARDOWN, which
will unconditionally call ioctl() and close() on -1. Could this generate
additional and confusing test failure logs?

Should we verify if self->fd >= 0 before attempting to operate on it?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514-arm_cs_clean_perf_handle-v2-0-cbb29c3b3661@arm.com?part=1

* Re: [PATCH v2 2/2] perf/core: Ignore overflows while disable is pending
  2026-05-14 16:21 ` [PATCH v2 2/2] perf/core: Ignore overflows while disable is pending Leo Yan
@ 2026-05-14 21:36   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-14 21:36 UTC (permalink / raw)
  To: Leo Yan; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Removing the synchronous PMU stop causes an NMI storm and temporary CPU livelocks when an event reaches its limit.
- [High] Spurious execution of BPF programs and AUX handlers thousands of times beyond the requested event limit.
- [High] A Time-of-Check to Time-of-Use (TOCTOU) race clobbers POLL_HUP and allows sample recording beyond the limit.
--

commit 17c3cee582aa984c752ecbdcaf9da672f9cf95c3
Author: Leo Yan <leo.yan@arm.com>
perf/core: Ignore overflows while disable is pending

This commit removes the direct PMU stop when an event reaches its limit,
relying instead on the irq_work queued by perf_event_disable_inatomic().
It also adds a check for pending_disable to skip further overflow processing.

Link: https://lore.kernel.org/lkml/aICYAqM5EQUlTqtX@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com/

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
[ ... ]
> @@ -10745,12 +10745,18 @@ static int __perf_event_overflow(struct perf_event *event,
>  	 * events
>  	 */
>  
> +	/*
> +	 * Disable is pending, skip further overflow processing so the pending
> +	 * POLL_HUP is preserved and no samples are recorded beyond the limit.
> +	 */
> +	if (event->pending_disable)
> +		goto out;
> +

Looking at __perf_event_overflow(), this pending_disable check is placed after
the executions of perf_event_aux_pause() and bpf_overflow_handler().

Since removing the synchronous PMU stop delays hardware stopping until the
irq_work runs, could this cause the attached BPF programs and AUX handlers to
be spuriously invoked thousands of times beyond the event limit before this
check skips the rest of the function?

>  	event->pending_kill = POLL_IN;
>  	if (events && atomic_dec_and_test(&event->event_limit)) {
>  		ret = 1;
>  		event->pending_kill = POLL_HUP;
>  		perf_event_disable_inatomic(event);
> -		event->pmu->stop(event, 0);

Does removing the direct PMU stop here cause an NMI storm?

Because NMIs have higher priority than standard interrupts used for irq_work,
it appears a high-frequency PMU event could immediately generate another NMI
upon returning from the interrupt handler. Architecture-specific handlers
typically re-arm the counter, potentially creating a tight NMI loop that
starves the irq_work until __perf_event_account_interrupt() forces throttling
by reaching max_samples_per_tick.

Additionally, can a race condition occur when the irq_work eventually runs?

__perf_pending_disable() clears the flag before stopping the hardware:

__perf_pending_disable()
    if (event->pending_disable) {
        event->pending_disable = 0;
        perf_event_disable_local(event);
    }

If an NMI fires in the exact window between clearing the flag and the PMU
being disabled, __perf_event_overflow() would see pending_disable == 0.
It would then bypass the new goto out check, unconditionally set
event->pending_kill = POLL_IN, clobber the previously set POLL_HUP, and
record a sample beyond the requested limit.

>  	}
>  
>  	if (event->attr.sigtrap) {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514-arm_cs_clean_perf_handle-v2-0-cbb29c3b3661@arm.com?part=2
