* [PATCH 1/5] perf/core: Add PERF_FORMAT_DROPPED
2024-09-05 3:10 [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
@ 2024-09-05 3:10 ` Namhyung Kim
2024-09-05 3:10 ` [PATCH 2/5] perf/core: Export perf_exclude_event() Namhyung Kim
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2024-09-05 3:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria
When a perf_event is dropped due to some kind of (SW-based) filter, it
won't generate sample data. For example, software events drops samples
when it doesn't match to privilege from exclude_{user,kernel}.
In order to account such dropped samples, add a new counter in the
perf_event, and let users can read(2) the number with the new
PERF_FORMAT_DROPPED like the lost sample count.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
include/linux/perf_event.h | 1 +
include/uapi/linux/perf_event.h | 5 ++++-
kernel/events/core.c | 12 ++++++++++++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 701549967c1854e3..955d39543398afb0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,6 +809,7 @@ struct perf_event {
u64 id;
atomic64_t lost_samples;
+ atomic64_t dropped_samples;
u64 (*clock)(void);
perf_overflow_handler_t overflow_handler;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 58daf6156fd0ee66..6f19d4f7482389b7 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -347,6 +347,7 @@ enum {
* { u64 time_running; } && PERF_FORMAT_TOTAL_TIME_RUNNING
* { u64 id; } && PERF_FORMAT_ID
* { u64 lost; } && PERF_FORMAT_LOST
+ * { u64 dropped; } && PERF_FORMAT_DROPPED
* } && !PERF_FORMAT_GROUP
*
* { u64 nr;
@@ -355,6 +356,7 @@ enum {
* { u64 value;
* { u64 id; } && PERF_FORMAT_ID
* { u64 lost; } && PERF_FORMAT_LOST
+ * { u64 dropped; } && PERF_FORMAT_DROPPED
* } cntr[nr];
* } && PERF_FORMAT_GROUP
* };
@@ -365,8 +367,9 @@ enum perf_event_read_format {
PERF_FORMAT_ID = 1U << 2,
PERF_FORMAT_GROUP = 1U << 3,
PERF_FORMAT_LOST = 1U << 4,
+ PERF_FORMAT_DROPPED = 1U << 5,
- PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
+ PERF_FORMAT_MAX = 1U << 6, /* non-ABI */
};
#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6a720f412258079..4d72538628ee62f4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5679,6 +5679,8 @@ static int __perf_read_group_add(struct perf_event *leader,
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&leader->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&leader->dropped_samples);
for_each_sibling_event(sub, leader) {
values[n++] += perf_event_count(sub, false);
@@ -5686,6 +5688,8 @@ static int __perf_read_group_add(struct perf_event *leader,
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&sub->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&sub->dropped_samples);
}
unlock:
@@ -5751,6 +5755,8 @@ static int perf_read_one(struct perf_event *event,
values[n++] = primary_event_id(event);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&event->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&event->dropped_samples);
if (copy_to_user(buf, values, n * sizeof(u64)))
return -EFAULT;
@@ -7348,6 +7354,8 @@ static void perf_output_read_one(struct perf_output_handle *handle,
values[n++] = primary_event_id(event);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&event->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&event->dropped_samples);
__output_copy(handle, values, n * sizeof(u64));
}
@@ -7386,6 +7394,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&leader->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&leader->dropped_samples);
__output_copy(handle, values, n * sizeof(u64));
@@ -7401,6 +7411,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&sub->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&sub->dropped_samples);
__output_copy(handle, values, n * sizeof(u64));
}
--
2.46.0.469.g59c65b2a67-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 2/5] perf/core: Export perf_exclude_event()
2024-09-05 3:10 [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
2024-09-05 3:10 ` [PATCH 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
@ 2024-09-05 3:10 ` Namhyung Kim
2024-09-05 3:10 ` [PATCH 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2024-09-05 3:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Richter, linux-s390
And increase the dropped_sample count when it returns 1. Now it can
track how many samples are dropped due to the privilege filters in
software events.
While at it, rename the same function in s390 cpum_sf PMU and also count
the dropped samples.
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/s390/kernel/perf_cpum_sf.c | 8 +++++---
include/linux/perf_event.h | 6 ++++++
kernel/events/core.c | 11 +++++++----
3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 736c1d9632dd554f..f663530ae1f6ba5d 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1074,7 +1074,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
cpuhw->flags &= ~PMU_F_ENABLED;
}
-/* perf_exclude_event() - Filter event
+/* perf_event_exclude() - Filter event
* @event: The perf event
* @regs: pt_regs structure
* @sde_regs: Sample-data-entry (sde) regs structure
@@ -1083,7 +1083,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
*
* Return non-zero if the event shall be excluded.
*/
-static int perf_exclude_event(struct perf_event *event, struct pt_regs *regs,
+static int perf_event_exclude(struct perf_event *event, struct pt_regs *regs,
struct perf_sf_sde_regs *sde_regs)
{
if (event->attr.exclude_user && user_mode(regs))
@@ -1166,8 +1166,10 @@ static int perf_push_sample(struct perf_event *event,
data.tid_entry.pid = basic->hpp & LPP_PID_MASK;
overflow = 0;
- if (perf_exclude_event(event, ®s, sde_regs))
+ if (perf_event_exclude(event, ®s, sde_regs)) {
+ atomic64_inc(&event->dropped_samples);
goto out;
+ }
if (perf_event_overflow(event, &data, ®s)) {
overflow = 1;
event->pmu->stop(event, 0);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 955d39543398afb0..aaa4bc582d2f1d4e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1634,6 +1634,8 @@ static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
}
+extern int perf_exclude_event(struct perf_event *event, struct pt_regs *regs);
+
extern void perf_event_init(void);
extern void perf_tp_event(u16 event_type, u64 count, void *record,
int entry_size, struct pt_regs *regs,
@@ -1817,6 +1819,10 @@ static inline u64 perf_event_pause(struct perf_event *event, bool reset)
{
return 0;
}
+static inline int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
+{
+ return 0;
+}
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4d72538628ee62f4..8250e76f63358689 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9978,18 +9978,21 @@ static void perf_swevent_event(struct perf_event *event, u64 nr,
perf_swevent_overflow(event, 0, data, regs);
}
-static int perf_exclude_event(struct perf_event *event,
- struct pt_regs *regs)
+int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
{
if (event->hw.state & PERF_HES_STOPPED)
return 1;
if (regs) {
- if (event->attr.exclude_user && user_mode(regs))
+ if (event->attr.exclude_user && user_mode(regs)) {
+ atomic64_inc(&event->dropped_samples);
return 1;
+ }
- if (event->attr.exclude_kernel && !user_mode(regs))
+ if (event->attr.exclude_kernel && !user_mode(regs)) {
+ atomic64_inc(&event->dropped_samples);
return 1;
+ }
}
return 0;
--
2.46.0.469.g59c65b2a67-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 3/5] perf/core: Account dropped samples from BPF
2024-09-05 3:10 [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
2024-09-05 3:10 ` [PATCH 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
2024-09-05 3:10 ` [PATCH 2/5] perf/core: Export perf_exclude_event() Namhyung Kim
@ 2024-09-05 3:10 ` Namhyung Kim
2024-09-05 4:17 ` Kyle Huey
2024-09-05 3:10 ` [PATCH 4/5] perf/powerpc: Count dropped samples in core-book3s PMU Namhyung Kim
` (2 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2024-09-05 3:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, Kyle Huey, bpf
Like in the software events, the BPF overflow handler can drop samples
by returning 0. Let's count the dropped samples here too.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Kyle Huey <me@kylehuey.com>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
kernel/events/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8250e76f63358689..ba1f6b51ea26db5b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9808,8 +9808,10 @@ static int __perf_event_overflow(struct perf_event *event,
ret = __perf_event_account_interrupt(event, throttle);
- if (event->prog && !bpf_overflow_handler(event, data, regs))
+ if (event->prog && !bpf_overflow_handler(event, data, regs)) {
+ atomic64_inc(&event->dropped_samples);
return ret;
+ }
/*
* XXX event_limit might not quite work as expected on inherited
--
2.46.0.469.g59c65b2a67-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH 3/5] perf/core: Account dropped samples from BPF
2024-09-05 3:10 ` [PATCH 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
@ 2024-09-05 4:17 ` Kyle Huey
0 siblings, 0 replies; 13+ messages in thread
From: Kyle Huey @ 2024-09-05 4:17 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Alexei Starovoitov,
Andrii Nakryiko, Song Liu, bpf
On Wed, Sep 4, 2024 at 8:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Like in the software events, the BPF overflow handler can drop samples
> by returning 0. Let's count the dropped samples here too.
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Song Liu <song@kernel.org>
> Cc: Kyle Huey <me@kylehuey.com>
> Cc: bpf@vger.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> kernel/events/core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8250e76f63358689..ba1f6b51ea26db5b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9808,8 +9808,10 @@ static int __perf_event_overflow(struct perf_event *event,
>
> ret = __perf_event_account_interrupt(event, throttle);
>
> - if (event->prog && !bpf_overflow_handler(event, data, regs))
> + if (event->prog && !bpf_overflow_handler(event, data, regs)) {
> + atomic64_inc(&event->dropped_samples);
> return ret;
> + }
>
> /*
> * XXX event_limit might not quite work as expected on inherited
> --
> 2.46.0.469.g59c65b2a67-goog
>
lgtm
- Kyle
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 4/5] perf/powerpc: Count dropped samples in core-book3s PMU
2024-09-05 3:10 [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
` (2 preceding siblings ...)
2024-09-05 3:10 ` [PATCH 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
@ 2024-09-05 3:10 ` Namhyung Kim
2024-09-13 4:46 ` Madhavan Srinivasan
2024-09-05 3:10 ` [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
2024-09-09 18:16 ` [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
5 siblings, 1 reply; 13+ messages in thread
From: Namhyung Kim @ 2024-09-05 3:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Kajol Jain, Athira Rajeev, linuxppc-dev
Due to the hardware limitation, sometimes it could sample kernel address
while attr.exclude_kernel is set. In that case it silently drops the
sample. Let's count that case in the new dropped_samples counter.
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/powerpc/perf/core-book3s.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 42867469752d73cf..553e288b9f113836 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2287,8 +2287,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
*/
if (event->attr.exclude_kernel &&
(event->attr.sample_type & PERF_SAMPLE_IP) &&
- is_kernel_addr(mfspr(SPRN_SIAR)))
+ is_kernel_addr(mfspr(SPRN_SIAR))) {
+ atomic64_inc(&event->dropped_samples);
record = 0;
+ }
/*
* Finally record data if requested.
--
2.46.0.469.g59c65b2a67-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH 4/5] perf/powerpc: Count dropped samples in core-book3s PMU
2024-09-05 3:10 ` [PATCH 4/5] perf/powerpc: Count dropped samples in core-book3s PMU Namhyung Kim
@ 2024-09-13 4:46 ` Madhavan Srinivasan
0 siblings, 0 replies; 13+ messages in thread
From: Madhavan Srinivasan @ 2024-09-13 4:46 UTC (permalink / raw)
To: Namhyung Kim, Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Kajol Jain, Athira Rajeev, linuxppc-dev
On 9/5/24 8:40 AM, Namhyung Kim wrote:
> Due to the hardware limitation, sometimes it could sample kernel address
> while attr.exclude_kernel is set. In that case it silently drops the
> sample. Let's count that case in the new dropped_samples counter.
Nice catch. Thanks for the fix.
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Naveen N Rao <naveen@kernel.org>
> Cc: Kajol Jain <kjain@linux.ibm.com>
> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> arch/powerpc/perf/core-book3s.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 42867469752d73cf..553e288b9f113836 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2287,8 +2287,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
> */
> if (event->attr.exclude_kernel &&
> (event->attr.sample_type & PERF_SAMPLE_IP) &&
> - is_kernel_addr(mfspr(SPRN_SIAR)))
> + is_kernel_addr(mfspr(SPRN_SIAR))) {
> + atomic64_inc(&event->dropped_samples);
> record = 0;
> + }
>
> /*
> * Finally record data if requested.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS
2024-09-05 3:10 [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
` (3 preceding siblings ...)
2024-09-05 3:10 ` [PATCH 4/5] perf/powerpc: Count dropped samples in core-book3s PMU Namhyung Kim
@ 2024-09-05 3:10 ` Namhyung Kim
2024-09-23 10:33 ` Ravi Bangoria
2024-10-15 13:36 ` Ravi Bangoria
2024-09-09 18:16 ` [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
5 siblings, 2 replies; 13+ messages in thread
From: Namhyung Kim @ 2024-09-05 3:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Ananth Narayan, Sandipan Das
While IBS is available for per-thread profiling, still regular users
cannot open an event due to the default paranoid setting (2) which
doesn't allow unprivileged users to get kernel samples. That means
it needs to set exclude_kernel bit in the attribute but IBS driver
would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
we want and I've been getting requests to fix this issue.
This should be done in the hardware, but until we get the HW fix we may
allow exclude_{kernel,user} in the attribute and silently drop the
samples in the PMU IRQ handler. It won't guarantee the sampling
frequency or even it'd miss some with fixed period too. Not ideal,
but that'd still be helpful to regular users.
To minimize the confusion, let's add 'swfilt' bit to attr.config2 which
is exposed in the sysfs format directory so that users can figure out
if the kernel support the privilege filters by software.
$ perf record -e ibs_op/swfilt=1/uh true
This uses perf_exclude_event() which checks regs->cs. But it should be
fine because set_linear_ip() also updates the CS according to the RIP
provided by IBS.
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/x86/events/amd/ibs.c | 50 ++++++++++++++++++++++++++++-----------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d6243e4..7fea9527971a7aae 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -31,6 +31,8 @@ static u32 ibs_caps;
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
+/* attr.config2 */
+#define IBS_SW_FILTER_MASK 1
/*
* IBS states:
@@ -290,6 +292,15 @@ static int perf_ibs_init(struct perf_event *event)
if (has_branch_stack(event))
return -EOPNOTSUPP;
+ /* handle exclude_{user,kernel} in the IRQ handler */
+ if (event->attr.exclude_hv || event->attr.exclude_idle ||
+ event->attr.exclude_host || event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ (event->attr.exclude_kernel || event->attr.exclude_user))
+ return -EINVAL;
+
ret = validate_group(event);
if (ret)
return ret;
@@ -550,24 +561,14 @@ static struct attribute *attrs_empty[] = {
NULL,
};
-static struct attribute_group empty_format_group = {
- .name = "format",
- .attrs = attrs_empty,
-};
-
static struct attribute_group empty_caps_group = {
.name = "caps",
.attrs = attrs_empty,
};
-static const struct attribute_group *empty_attr_groups[] = {
- &empty_format_group,
- &empty_caps_group,
- NULL,
-};
-
PMU_FORMAT_ATTR(rand_en, "config:57");
PMU_FORMAT_ATTR(cnt_ctl, "config:19");
+PMU_FORMAT_ATTR(swfilt, "config2:0");
PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
@@ -583,6 +584,11 @@ static struct attribute *rand_en_attrs[] = {
NULL,
};
+static struct attribute *swfilt_attrs[] = {
+ &format_attr_swfilt.attr,
+ NULL,
+};
+
static struct attribute *fetch_l3missonly_attrs[] = {
&fetch_l3missonly.attr.attr,
NULL,
@@ -598,6 +604,11 @@ static struct attribute_group group_rand_en = {
.attrs = rand_en_attrs,
};
+static struct attribute_group group_swfilt = {
+ .name = "format",
+ .attrs = swfilt_attrs,
+};
+
static struct attribute_group group_fetch_l3missonly = {
.name = "format",
.attrs = fetch_l3missonly_attrs,
@@ -612,6 +623,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
static const struct attribute_group *fetch_attr_groups[] = {
&group_rand_en,
+ &group_swfilt,
&empty_caps_group,
NULL,
};
@@ -650,6 +662,12 @@ static struct attribute_group group_op_l3missonly = {
.is_visible = zen4_ibs_extensions_is_visible,
};
+static const struct attribute_group *op_attr_groups[] = {
+ &group_swfilt,
+ &empty_caps_group,
+ NULL,
+};
+
static const struct attribute_group *op_attr_update[] = {
&group_cnt_ctl,
&group_op_l3missonly,
@@ -667,7 +685,6 @@ static struct perf_ibs perf_ibs_fetch = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSFETCHCTL,
.config_mask = IBS_FETCH_CONFIG_MASK,
@@ -691,7 +708,6 @@ static struct perf_ibs perf_ibs_op = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSOPCTL,
.config_mask = IBS_OP_CONFIG_MASK,
@@ -1111,6 +1127,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
regs.flags |= PERF_EFLAGS_EXACT;
}
+ if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ perf_exclude_event(event, ®s)) {
+ throttle = perf_event_account_interrupt(event);
+ goto out;
+ }
+
if (event->attr.sample_type & PERF_SAMPLE_RAW) {
raw = (struct perf_raw_record){
.frag = {
@@ -1228,7 +1250,7 @@ static __init int perf_ibs_op_init(void)
if (ibs_caps & IBS_CAPS_ZEN4)
perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
- perf_ibs_op.pmu.attr_groups = empty_attr_groups;
+ perf_ibs_op.pmu.attr_groups = op_attr_groups;
perf_ibs_op.pmu.attr_update = op_attr_update;
return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
--
2.46.0.469.g59c65b2a67-goog
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS
2024-09-05 3:10 ` [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
@ 2024-09-23 10:33 ` Ravi Bangoria
2024-10-22 22:24 ` Namhyung Kim
2024-10-15 13:36 ` Ravi Bangoria
1 sibling, 1 reply; 13+ messages in thread
From: Ravi Bangoria @ 2024-09-23 10:33 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ananth Narayan, Sandipan Das, Ravi Bangoria
Hi Namhyung,
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples. That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> we want and I've been getting requests to fix this issue.
I'm working on some IBS improvements that impacts this change as well.
Is it be possible to hold off this patch for some time. I'll try to
post my patches soon.
> @@ -1111,6 +1127,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> regs.flags |= PERF_EFLAGS_EXACT;
> }
>
> + if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
> + perf_exclude_event(event, ®s)) {
> + throttle = perf_event_account_interrupt(event);
> + goto out;
> + }
Throttling can give surprises when the sample period is very small.
For ex,
$ ./perf record -e cycles:uh -c 192 -- make
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 23.016 MB perf.data (705634 samples) ]
$ ./perf record -e ibs_op/swfilt=1/uh -c 192 -- make
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.608 MB perf.data (19 samples) ]
It seems like the IBS event gets throttled (and disabled) before the
cpu get a chance to go back to userspace), hence we end up with very
few samples.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS
2024-09-23 10:33 ` Ravi Bangoria
@ 2024-10-22 22:24 ` Namhyung Kim
0 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2024-10-22 22:24 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ananth Narayan, Sandipan Das
Hello,
On Mon, Sep 23, 2024 at 04:03:47PM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
>
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples. That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> > we want and I've been getting requests to fix this issue.
>
> I'm working on some IBS improvements that impacts this change as well.
> Is it be possible to hold off this patch for some time. I'll try to
> post my patches soon.
>
> > @@ -1111,6 +1127,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> > regs.flags |= PERF_EFLAGS_EXACT;
> > }
> >
> > + if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
> > + perf_exclude_event(event, ®s)) {
> > + throttle = perf_event_account_interrupt(event);
> > + goto out;
> > + }
>
> Throttling can give surprises when the sample period is very small.
> For ex,
>
> $ ./perf record -e cycles:uh -c 192 -- make
> [ perf record: Woken up 52 times to write data ]
> [ perf record: Captured and wrote 23.016 MB perf.data (705634 samples) ]
>
> $ ./perf record -e ibs_op/swfilt=1/uh -c 192 -- make
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.608 MB perf.data (19 samples) ]
>
> It seems like the IBS event gets throttled (and disabled) before the
> cpu get a chance to go back to userspace), hence we end up with very
> few samples.
Thanks for raising this issue. This indeed looks like a surprising
result. Not sure what we can do here other than adding a documentation
to refrain from using such a small period. I don't think we want to
skip the throttling logic for the filtered samples. Otherwise it can be
used for DoS-like attacks IMHO.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS
2024-09-05 3:10 ` [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
2024-09-23 10:33 ` Ravi Bangoria
@ 2024-10-15 13:36 ` Ravi Bangoria
2024-10-22 22:25 ` Namhyung Kim
1 sibling, 1 reply; 13+ messages in thread
From: Ravi Bangoria @ 2024-10-15 13:36 UTC (permalink / raw)
To: Namhyung Kim, Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ananth Narayan,
Sandipan Das, Ravi Bangoria
Hi Namhyung,
On 05-Sep-24 8:40 AM, Namhyung Kim wrote:
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples. That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> we want and I've been getting requests to fix this issue.
>
> This should be done in the hardware, but until we get the HW fix we may
> allow exclude_{kernel,user} in the attribute and silently drop the
> samples in the PMU IRQ handler. It won't guarantee the sampling
> frequency or even it'd miss some with fixed period too. Not ideal,
> but that'd still be helpful to regular users.
>
> To minimize the confusion, let's add 'swfilt' bit to attr.config2 which
> is exposed in the sysfs format directory so that users can figure out
> if the kernel support the privilege filters by software.
>
> $ perf record -e ibs_op/swfilt=1/uh true
Shall we add an example in tools/perf/Documentation/perf-amd-ibs.txt?
> +static struct attribute *swfilt_attrs[] = {
> + &format_attr_swfilt.attr,
> + NULL,
> +};
> +
> static struct attribute *fetch_l3missonly_attrs[] = {
> &fetch_l3missonly.attr.attr,
> NULL,
> @@ -598,6 +604,11 @@ static struct attribute_group group_rand_en = {
> .attrs = rand_en_attrs,
> };
>
> +static struct attribute_group group_swfilt = {
> + .name = "format",
> + .attrs = swfilt_attrs,
> +};
> +
> static struct attribute_group group_fetch_l3missonly = {
> .name = "format",
> .attrs = fetch_l3missonly_attrs,
> @@ -612,6 +623,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
>
> static const struct attribute_group *fetch_attr_groups[] = {
> &group_rand_en,
> + &group_swfilt,
> &empty_caps_group,
> NULL,
> };
Causes:
# dmesg
sysfs: cannot create duplicate filename '/devices/ibs_fetch/format'
Failed to register pmu: ibs_fetch, reason -17
Rename rand_en_attrs[] to fetch_attrs[], add &format_attr_swfilt.attr
to it and remove &group_swfilt from fetch_attr_groups[]. And I guess
it should work.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS
2024-10-15 13:36 ` Ravi Bangoria
@ 2024-10-22 22:25 ` Namhyung Kim
0 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2024-10-22 22:25 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ananth Narayan, Sandipan Das
On Tue, Oct 15, 2024 at 07:06:24PM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
>
> On 05-Sep-24 8:40 AM, Namhyung Kim wrote:
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples. That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> > we want and I've been getting requests to fix this issue.
> >
> > This should be done in the hardware, but until we get the HW fix we may
> > allow exclude_{kernel,user} in the attribute and silently drop the
> > samples in the PMU IRQ handler. It won't guarantee the sampling
> > frequency or even it'd miss some with fixed period too. Not ideal,
> > but that'd still be helpful to regular users.
> >
> > To minimize the confusion, let's add 'swfilt' bit to attr.config2 which
> > is exposed in the sysfs format directory so that users can figure out
> > if the kernel support the privilege filters by software.
> >
> > $ perf record -e ibs_op/swfilt=1/uh true
>
> Shall we add an example in tools/perf/Documentation/perf-amd-ibs.txt?
Yep, I'll update the document when I send the tooling changes.
>
>
> > +static struct attribute *swfilt_attrs[] = {
> > + &format_attr_swfilt.attr,
> > + NULL,
> > +};
> > +
> > static struct attribute *fetch_l3missonly_attrs[] = {
> > &fetch_l3missonly.attr.attr,
> > NULL,
> > @@ -598,6 +604,11 @@ static struct attribute_group group_rand_en = {
> > .attrs = rand_en_attrs,
> > };
> >
> > +static struct attribute_group group_swfilt = {
> > + .name = "format",
> > + .attrs = swfilt_attrs,
> > +};
> > +
> > static struct attribute_group group_fetch_l3missonly = {
> > .name = "format",
> > .attrs = fetch_l3missonly_attrs,
> > @@ -612,6 +623,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
> >
> > static const struct attribute_group *fetch_attr_groups[] = {
> > &group_rand_en,
> > + &group_swfilt,
> > &empty_caps_group,
> > NULL,
> > };
>
> Causes:
>
> # dmesg
> sysfs: cannot create duplicate filename '/devices/ibs_fetch/format'
> Failed to register pmu: ibs_fetch, reason -17
>
> Rename rand_en_attrs[] to fetch_attrs[], add &format_attr_swfilt.attr
> to it and remove &group_swfilt from fetch_attr_groups[]. And I guess
> it should work.
Thanks for the review, I'll update the code as you suggested.
Namhyung
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3)
2024-09-05 3:10 [RFC/PATCH 0/5] perf: Relax privilege restriction on AMD IBS (v3) Namhyung Kim
` (4 preceding siblings ...)
2024-09-05 3:10 ` [PATCH 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
@ 2024-09-09 18:16 ` Namhyung Kim
5 siblings, 0 replies; 13+ messages in thread
From: Namhyung Kim @ 2024-09-09 18:16 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria
On Wed, Sep 04, 2024 at 08:10:22PM -0700, Namhyung Kim wrote:
> Hello,
>
> This is RFC v3 to allow AMD IBS to regular users on the default settings
> where kernel-level profiling is disabled (perf_event_paranoid=2).
> Currently AMD IBS doesn't allow any kind of exclusion in the event
> attribute. But users needs to set attr.exclude_kernel to open an event
> on such an environment.
>
> v3 changes)
>
> * fix build on s390
> * add swfilt format for attr.config2
> * count powerpc core-book3s dropped samples
>
> v2) https://lore.kernel.org/lkml/20240830232910.1839548-1-namhyung@kernel.org/
>
> * add PERF_FORMAT_DROPPED
> * account dropped sw events and from BPF handler
> * use precise RIP from IBS record
>
> v1) https://lore.kernel.org/lkml/20240822230816.564262-1-namhyung@kernel.org/
>
> While IBS doesn't support hardware level privilege filters, the kernel
> can allow the event and drop samples belongs to the kernel like in the
> software events. This is limited but it still contains precise samples
> which is important for various analysis like data type profiling.
>
> This version added format/swfilt file in sysfs to expose the software
> filtering by setting the attribute config2 value. I think it's easier
> to add a new config rather than adding a new PMU in order to handle
> event multiplexing across IBS PMU. Users can use the perf tool to
> enable this feature manually like below. Probably the perf tool can
> handle this automatically in the future.
>
> $ perf record -e ibs_op/swfilt=1/uh $PROG
>
> (Not sure if it's better to accept or ignore exclude_hv so that it can
> use ":u" modifier only.)
>
> In order to count those dropped samples correctly, I'd propose a new
> read format PERF_FORMAT_DROPPED same as we did for the lost samples.
> With this, it can count dropped samples in the software events and
> from the BPF overflow handler as well.
>
> Let me know what you think.
Hi Peter and Ingo,
can I get your opinion on this?
Thanks,
Namhyung
>
> Namhyung Kim (5):
> perf/core: Add PERF_FORMAT_DROPPED
> perf/core: Export perf_exclude_event()
> perf/core: Account dropped samples from BPF
> perf/powerpc: Count dropped samples in core-book3s PMU
> perf/x86: Relax privilege filter restriction on AMD IBS
>
> arch/powerpc/perf/core-book3s.c | 4 ++-
> arch/s390/kernel/perf_cpum_sf.c | 8 ++++--
> arch/x86/events/amd/ibs.c | 50 ++++++++++++++++++++++++---------
> include/linux/perf_event.h | 7 +++++
> include/uapi/linux/perf_event.h | 5 +++-
> kernel/events/core.c | 27 ++++++++++++++----
> 6 files changed, 77 insertions(+), 24 deletions(-)
>
> --
> 2.46.0.469.g59c65b2a67-goog
>
^ permalink raw reply [flat|nested] 13+ messages in thread