* [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT
@ 2024-07-24 22:54 Charlie Jenkins
2024-07-24 22:54 ` [PATCH 1/2] perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT Charlie Jenkins
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Charlie Jenkins @ 2024-07-24 22:54 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Atish Patra
Cc: linux-perf-users, linux-kernel, Charlie Jenkins
My motivation here is to have a program that is able to detect when an
arbitrary other program has reviewed some number of PMU overflows.
PERF_EVENT_IOC_REFRESH is not useful in the scenario since that ioctl
enables the event immediately. This new ioctl flag
PERF_EVENT_IOC_SET_EVENT_LIMIT allows a program to setup a event_limit
and then enable the event when ready at some later time.
This solves the first problem. The second problem that I am debating a
solution for is that the refresh SIGIO signals are marked as duplicate
signals and not always delivered to userspace. I will attach a simple
program below that can be ran a handful of times to catch this behavior.
REFRESH is supposed to send a SIGIO for every overflow, with an si_code
of POLL_IN for each signal except the last one which has a si_code of
POLL_HUP. However, __send_signal_locked() in kernel/signal.c will
occasionally catch this POLL_HUP signal before all POLL_IN signals have
been processed, thus treating this signal as a duplicate and dropping
with this logic:
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
/*
* Short-circuit ignored signals and support queuing
* exactly one non-rt signal, so that we can get more
* detailed information about the cause of the signal.
*/
result = TRACE_SIGNAL_ALREADY_PENDING;
if (legacy_queue(pending, sig))
goto ret;
I am sending this patch to begin a conversation on this issue with
REFRESH. One solution could be to only treat a signal as a duplicate in
__send_signal_locked() if the si_code is the same. Another solution is
to only use event_limits with a value of one, as any other number will
run into this race condition. If consistent signals are desired, sigtrap
can be used, but that also doesn't support enable_on_exec.
Here is a minimal test program for dropped signals. Can be ran on a loop until
program halts waiting for signal that will never arrive.
while ./signal 1 1; do sleep .01; done
// signal.c
int fd;
pid_t pid;
struct read_format {
unsigned long long value;
unsigned long long time_enabled;
unsigned long long time_running;
};
int counter;
static inline int sys_perf_event_open(struct perf_event_attr *attr, int pid, int cpu, int group_fd,
unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
static void sig_handler(int signum, siginfo_t *siginfo, void *user_context)
{
struct read_format rd;
counter++;
read(fd, &rd, sizeof(rd));
printf("fd: %d, signum: %d, si_code: %d, counter: %llu, time to overflow: %llu. time enabled: %llu\n", signum, siginfo->si_fd, siginfo->si_code, rd.value, rd.time_running, rd.time_enabled);
if (counter == REFRESH_COUNT) {
printf("Hit refresh count\n");
close(fd);
kill(pid, SIGKILL);
if (siginfo->si_code == POLL_HUP) {
exit(EXIT_SUCCESS);
} else {
printf("INCORRECT COUNT! Expected POLL_HUP on second signal.\n");
exit(EXIT_FAILURE);
}
}
}
int main(int argc, char *argv[])
{
char sbuf[128];
if (argc < 3)
goto invalid_args;
unsigned long long num_instructions = atoll(argv[1]);
unsigned int num_samples = atoi(argv[2]);
unsigned long long sample_periods = num_instructions / num_samples;
pid = fork();
if (pid) {
struct sigaction sa = {
.sa_sigaction = (void *) sig_handler,
.sa_flags = SA_SIGINFO
};
struct perf_event_attr pe = {
.type = PERF_TYPE_HARDWARE,
.size = sizeof(pe),
.config = PERF_COUNT_HW_INSTRUCTIONS,
.sample_period = sample_periods,
.wakeup_events = 1,
.disabled = 1,
.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD,
.read_format = PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_TOTAL_TIME_ENABLED,
.exclude_kernel = 1,
.exclude_idle = 1,
.exclude_hv = 1
};
if (sigaction(SIGIO, &sa, NULL) < 0) {
fprintf(stderr, "FAILED setting up signal handler\n");
exit(EXIT_FAILURE);
}
fd = sys_perf_event_open(&pe, pid, -1, -1, PERF_FLAG_FD_CLOEXEC);
if (fd == -1) {
fprintf(stderr, "FAILED opening perf: %s\n", strerror_r(errno, sbuf, sizeof(sbuf)));
exit(EXIT_FAILURE);
}
ioctl(fd, PERF_EVENT_IOC_RESET, 0);
ioctl(fd, PERF_EVENT_IOC_REFRESH, REFRESH_COUNT);
fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
fcntl(fd, F_SETSIG, SIGIO);
fcntl(fd, F_SETOWN, getpid());
while (1);
} else {
while (1);
}
invalid_args:
printf("FAILED: Please provide at least 2 args: number of instructions, number of samples. eg: %s 10000 5\n", argv[0]);
exit(EXIT_FAILURE);
}
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Charlie Jenkins (2):
perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT
perf: Document PERF_EVENT_IOC_SET_EVENT_LIMIT
include/linux/perf_event.h | 4 ++--
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 17 +++++++++++------
tools/include/uapi/linux/perf_event.h | 1 +
tools/perf/design.txt | 5 +++++
5 files changed, 20 insertions(+), 8 deletions(-)
---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20240724-perf_set_event_limit-079f1b996376
--
- Charlie
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/2] perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT
2024-07-24 22:54 [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
@ 2024-07-24 22:54 ` Charlie Jenkins
2024-07-24 22:54 ` [PATCH 2/2] perf: Document PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2024-07-29 21:19 ` [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2 siblings, 0 replies; 4+ messages in thread
From: Charlie Jenkins @ 2024-07-24 22:54 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Atish Patra
Cc: linux-perf-users, linux-kernel, Charlie Jenkins
PERF_EVENT_IOC_REFRESH immediately enables after incrementing
event_limit. Provide a new ioctl flag that allows programs to increment
event_limit without enabling the event. A usecase for this is to set an
event_limit in combination with enable_on_exec.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
include/linux/perf_event.h | 4 ++--
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 17 +++++++++++------
tools/include/uapi/linux/perf_event.h | 1 +
4 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a5304ae8c654..40025a5eb98a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1103,7 +1103,7 @@ extern int perf_event_task_enable(void);
extern void perf_pmu_resched(struct pmu *pmu);
-extern int perf_event_refresh(struct perf_event *event, int refresh);
+extern int perf_event_refresh(struct perf_event *event, int refresh, bool enable);
extern void perf_event_update_userpage(struct perf_event *event);
extern int perf_event_release_kernel(struct perf_event *event);
extern struct perf_event *
@@ -1770,7 +1770,7 @@ static inline int perf_event_read_local(struct perf_event *event, u64 *value,
static inline void perf_event_print_debug(void) { }
static inline int perf_event_task_disable(void) { return -EINVAL; }
static inline int perf_event_task_enable(void) { return -EINVAL; }
-static inline int perf_event_refresh(struct perf_event *event, int refresh)
+static inline int perf_event_refresh(struct perf_event *event, int refresh, bool enable)
{
return -EINVAL;
}
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 3a64499b0f5d..992f51effb27 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -560,6 +560,7 @@ struct perf_event_query_bpf {
#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
#define PERF_EVENT_IOC_QUERY_BPF _IOWR('$', 10, struct perf_event_query_bpf *)
#define PERF_EVENT_IOC_MODIFY_ATTRIBUTES _IOW('$', 11, struct perf_event_attr *)
+#define PERF_EVENT_IOC_INC_EVENT_LIMIT _IO ('$', 12)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8f908f077935..b9d009733ace 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3142,7 +3142,7 @@ void perf_event_addr_filters_sync(struct perf_event *event)
}
EXPORT_SYMBOL_GPL(perf_event_addr_filters_sync);
-static int _perf_event_refresh(struct perf_event *event, int refresh)
+static int _perf_event_refresh(struct perf_event *event, int refresh, bool enable)
{
/*
* not supported on inherited events
@@ -3151,7 +3151,8 @@ static int _perf_event_refresh(struct perf_event *event, int refresh)
return -EINVAL;
atomic_add(refresh, &event->event_limit);
- _perf_event_enable(event);
+ if (enable)
+ _perf_event_enable(event);
return 0;
}
@@ -3159,13 +3160,13 @@ static int _perf_event_refresh(struct perf_event *event, int refresh)
/*
* See perf_event_disable()
*/
-int perf_event_refresh(struct perf_event *event, int refresh)
+int perf_event_refresh(struct perf_event *event, int refresh, bool enable)
{
struct perf_event_context *ctx;
int ret;
ctx = perf_event_ctx_lock(event);
- ret = _perf_event_refresh(event, refresh);
+ ret = _perf_event_refresh(event, refresh, enable);
perf_event_ctx_unlock(event, ctx);
return ret;
@@ -5920,7 +5921,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
break;
case PERF_EVENT_IOC_REFRESH:
- return _perf_event_refresh(event, arg);
+ return _perf_event_refresh(event, arg, true);
case PERF_EVENT_IOC_PERIOD:
{
@@ -6006,6 +6007,10 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
return perf_event_modify_attr(event, &new_attr);
}
+
+ case PERF_EVENT_IOC_INC_EVENT_LIMIT:
+ return _perf_event_refresh(event, arg, false);
+
default:
return -ENOTTY;
}
@@ -6721,7 +6726,7 @@ void perf_event_wakeup(struct perf_event *event)
ring_buffer_wakeup(event);
if (event->pending_kill) {
- kill_fasync(perf_event_fasync(event), SIGIO, event->pending_kill);
+ kill_fasync(perf_event_fasync(event), SIGTRAP, event->pending_kill);
event->pending_kill = 0;
}
}
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 3a64499b0f5d..992f51effb27 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -560,6 +560,7 @@ struct perf_event_query_bpf {
#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
#define PERF_EVENT_IOC_QUERY_BPF _IOWR('$', 10, struct perf_event_query_bpf *)
#define PERF_EVENT_IOC_MODIFY_ATTRIBUTES _IOW('$', 11, struct perf_event_attr *)
+#define PERF_EVENT_IOC_INC_EVENT_LIMIT _IO ('$', 12)
enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
--
2.44.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] perf: Document PERF_EVENT_IOC_SET_EVENT_LIMIT
2024-07-24 22:54 [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2024-07-24 22:54 ` [PATCH 1/2] perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT Charlie Jenkins
@ 2024-07-24 22:54 ` Charlie Jenkins
2024-07-29 21:19 ` [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2 siblings, 0 replies; 4+ messages in thread
From: Charlie Jenkins @ 2024-07-24 22:54 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Atish Patra
Cc: linux-perf-users, linux-kernel, Charlie Jenkins
Introduce PERF_EVENT_IOC_SET_EVENT_LIMIT and explain the differences
between it and PERF_EVENT_IOC_REFRESH.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
tools/perf/design.txt | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index aa8cfeabb743..1626ae83785a 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -439,6 +439,11 @@ Additionally, non-inherited overflow counters can use
to enable a counter for 'nr' events, after which it gets disabled again.
+PERF_EVENT_IOC_REFRESH will increment the event limit by 'nr' and enable the
+event. To increment the event limit without enabling it, use the following:
+
+ ioctl(fd, PERF_EVENT_IOC_INC_EVENT_LIMIT, nr);
+
A process can enable or disable all the counter groups that are
attached to it, using prctl:
--
2.44.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT
2024-07-24 22:54 [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2024-07-24 22:54 ` [PATCH 1/2] perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT Charlie Jenkins
2024-07-24 22:54 ` [PATCH 2/2] perf: Document PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
@ 2024-07-29 21:19 ` Charlie Jenkins
2 siblings, 0 replies; 4+ messages in thread
From: Charlie Jenkins @ 2024-07-29 21:19 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Atish Patra, Al Viro
Cc: linux-perf-users, linux-kernel
On Wed, Jul 24, 2024 at 03:54:10PM -0700, Charlie Jenkins wrote:
> My motivation here is to have a program that is able to detect when an
> arbitrary other program has reviewed some number of PMU overflows.
> PERF_EVENT_IOC_REFRESH is not useful in the scenario since that ioctl
> enables the event immediately. This new ioctl flag
> PERF_EVENT_IOC_SET_EVENT_LIMIT allows a program to setup a event_limit
> and then enable the event when ready at some later time.
>
> This solves the first problem. The second problem that I am debating a
> solution for is that the refresh SIGIO signals are marked as duplicate
> signals and not always delivered to userspace. I will attach a simple
> program below that can be ran a handful of times to catch this behavior.
> REFRESH is supposed to send a SIGIO for every overflow, with an si_code
> of POLL_IN for each signal except the last one which has a si_code of
> POLL_HUP. However, __send_signal_locked() in kernel/signal.c will
> occasionally catch this POLL_HUP signal before all POLL_IN signals have
> been processed, thus treating this signal as a duplicate and dropping
> with this logic:
>
> pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
> /*
> * Short-circuit ignored signals and support queuing
> * exactly one non-rt signal, so that we can get more
> * detailed information about the cause of the signal.
> */
> result = TRACE_SIGNAL_ALREADY_PENDING;
> if (legacy_queue(pending, sig))
> goto ret;
>
+ Al Viro
Do you have any input into why signals with the same signal number but
different si_code are marked as duplicates of each other? Does it need
to work this way?
- Charlie
> I am sending this patch to begin a conversation on this issue with
> REFRESH. One solution could be to only treat a signal as a duplicate in
> __send_signal_locked() if the si_code is the same. Another solution is
> to only use event_limits with a value of one, as any other number will
> run into this race condition. If consistent signals are desired, sigtrap
> can be used, but that also doesn't support enable_on_exec.
>
> Here is a minimal test program for dropped signals. Can be ran on a loop until
> program halts waiting for signal that will never arrive.
>
> while ./signal 1 1; do sleep .01; done
>
> // signal.c
>
> int fd;
> pid_t pid;
>
> struct read_format {
> unsigned long long value;
> unsigned long long time_enabled;
> unsigned long long time_running;
> };
>
> int counter;
>
> static inline int sys_perf_event_open(struct perf_event_attr *attr, int pid, int cpu, int group_fd,
> unsigned long flags)
> {
> return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
> }
>
> static void sig_handler(int signum, siginfo_t *siginfo, void *user_context)
> {
> struct read_format rd;
> counter++;
> read(fd, &rd, sizeof(rd));
> printf("fd: %d, signum: %d, si_code: %d, counter: %llu, time to overflow: %llu. time enabled: %llu\n", signum, siginfo->si_fd, siginfo->si_code, rd.value, rd.time_running, rd.time_enabled);
> if (counter == REFRESH_COUNT) {
> printf("Hit refresh count\n");
> close(fd);
> kill(pid, SIGKILL);
> if (siginfo->si_code == POLL_HUP) {
> exit(EXIT_SUCCESS);
> } else {
> printf("INCORRECT COUNT! Expected POLL_HUP on second signal.\n");
> exit(EXIT_FAILURE);
> }
> }
> }
>
> int main(int argc, char *argv[])
> {
> char sbuf[128];
>
> if (argc < 3)
> goto invalid_args;
>
> unsigned long long num_instructions = atoll(argv[1]);
> unsigned int num_samples = atoi(argv[2]);
> unsigned long long sample_periods = num_instructions / num_samples;
>
> pid = fork();
>
> if (pid) {
> struct sigaction sa = {
> .sa_sigaction = (void *) sig_handler,
> .sa_flags = SA_SIGINFO
> };
> struct perf_event_attr pe = {
> .type = PERF_TYPE_HARDWARE,
> .size = sizeof(pe),
> .config = PERF_COUNT_HW_INSTRUCTIONS,
> .sample_period = sample_periods,
> .wakeup_events = 1,
> .disabled = 1,
> .sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD,
> .read_format = PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_TOTAL_TIME_ENABLED,
> .exclude_kernel = 1,
> .exclude_idle = 1,
> .exclude_hv = 1
> };
>
> if (sigaction(SIGIO, &sa, NULL) < 0) {
> fprintf(stderr, "FAILED setting up signal handler\n");
> exit(EXIT_FAILURE);
> }
>
> fd = sys_perf_event_open(&pe, pid, -1, -1, PERF_FLAG_FD_CLOEXEC);
>
> if (fd == -1) {
> fprintf(stderr, "FAILED opening perf: %s\n", strerror_r(errno, sbuf, sizeof(sbuf)));
> exit(EXIT_FAILURE);
> }
>
> ioctl(fd, PERF_EVENT_IOC_RESET, 0);
> ioctl(fd, PERF_EVENT_IOC_REFRESH, REFRESH_COUNT);
>
> fcntl(fd, F_SETFL, O_RDWR | O_NONBLOCK | O_ASYNC);
> fcntl(fd, F_SETSIG, SIGIO);
> fcntl(fd, F_SETOWN, getpid());
>
> while (1);
> } else {
> while (1);
> }
>
> invalid_args:
> printf("FAILED: Please provide at least 2 args: number of instructions, number of samples. eg: %s 10000 5\n", argv[0]);
> exit(EXIT_FAILURE);
> }
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
> Charlie Jenkins (2):
> perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT
> perf: Document PERF_EVENT_IOC_SET_EVENT_LIMIT
>
> include/linux/perf_event.h | 4 ++--
> include/uapi/linux/perf_event.h | 1 +
> kernel/events/core.c | 17 +++++++++++------
> tools/include/uapi/linux/perf_event.h | 1 +
> tools/perf/design.txt | 5 +++++
> 5 files changed, 20 insertions(+), 8 deletions(-)
> ---
> base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
> change-id: 20240724-perf_set_event_limit-079f1b996376
> --
> - Charlie
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-07-29 21:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-24 22:54 [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2024-07-24 22:54 ` [PATCH 1/2] perf: Add PERF_EVENT_IOC_INC_EVENT_LIMIT Charlie Jenkins
2024-07-24 22:54 ` [PATCH 2/2] perf: Document PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
2024-07-29 21:19 ` [PATCH 0/2] perf: Add PERF_EVENT_IOC_SET_EVENT_LIMIT Charlie Jenkins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).