* [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS
@ 2024-10-23 0:09 Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 0:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das
Hello,
This is v4 of the series to allow AMD IBS for regular users under the
default settings where kernel-level profiling is disabled
(perf_event_paranoid=2). Currently AMD IBS doesn't allow any kind of
exclusion in the event attribute, but users need to set
attr.exclude_kernel to open an event in such an environment.
v4 changes)
* remove RFC tag
* fix sysfs attribute for ibs_fetch/format (Ravi)
* handle exclude_hv as well, so ":u" modifier would work for IBS
* add Acked-by and Reviewed-by tags from Kyle and Madhavan
v3) https://lore.kernel.org/lkml/20240905031027.2567913-1-namhyung@kernel.org
* fix build on s390
* add swfilt format for attr.config2
* count powerpc core-book3s dropped samples
v2) https://lore.kernel.org/lkml/20240830232910.1839548-1-namhyung@kernel.org/
* add PERF_FORMAT_DROPPED
* account dropped sw events and from BPF handler
* use precise RIP from IBS record
v1) https://lore.kernel.org/lkml/20240822230816.564262-1-namhyung@kernel.org/
While IBS doesn't support hardware-level privilege filters, the kernel
can allow the event and drop samples belonging to the kernel, as is done
for software events. This is limited, but it still produces precise
samples, which is important for various analyses like data type profiling.
This version adds a format/swfilt file in sysfs to expose the software
filtering, enabled by setting a bit in the attribute's config2 value. I
think it's easier to add a new config bit than a new PMU in order to
handle event multiplexing within the IBS PMU. Users can use the perf
tool to enable this feature manually as below. Probably the perf tool
can handle this automatically in the future.
$ perf record -e ibs_op/swfilt=1/u $PROG
In order to count those dropped samples correctly, I'd propose a new
read format, PERF_FORMAT_DROPPED, analogous to what was done for lost
samples. With this, it can also count samples dropped in software
events and by the BPF overflow handler.
Let me know what you think.
Thanks,
Namhyung
Namhyung Kim (5):
perf/core: Add PERF_FORMAT_DROPPED
perf/core: Export perf_exclude_event()
perf/core: Account dropped samples from BPF
perf/powerpc: Count dropped samples in core-book3s PMU
perf/x86: Relax privilege filter restriction on AMD IBS
arch/powerpc/perf/core-book3s.c | 4 ++-
arch/s390/kernel/perf_cpum_sf.c | 8 +++--
arch/x86/events/amd/ibs.c | 59 +++++++++++++++++++++++----------
include/linux/perf_event.h | 7 ++++
include/uapi/linux/perf_event.h | 5 ++-
kernel/events/core.c | 27 ++++++++++++---
6 files changed, 82 insertions(+), 28 deletions(-)
base-commit: de20037e1b3c2f2ca97b8c12b8c7bca8abd509a7
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
@ 2024-10-23 0:09 ` Namhyung Kim
2024-10-23 11:05 ` Michael Ellerman
2024-10-23 0:09 ` [PATCH v4 2/5] perf/core: Export perf_exclude_event() Namhyung Kim
` (4 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 0:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das
When a perf_event sample is dropped due to some kind of (SW-based)
filter, it won't generate sample data. For example, software events
drop samples when they don't match the privilege constraints from
exclude_{user,kernel}.
In order to account for such dropped samples, add a new counter in the
perf_event, and let users read(2) the number with the new
PERF_FORMAT_DROPPED, like the lost sample count.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
include/linux/perf_event.h | 1 +
include/uapi/linux/perf_event.h | 5 ++++-
kernel/events/core.c | 12 ++++++++++++
3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fb908843f209288d..c1e6340e561c400e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -830,6 +830,7 @@ struct perf_event {
u64 id;
atomic64_t lost_samples;
+ atomic64_t dropped_samples;
u64 (*clock)(void);
perf_overflow_handler_t overflow_handler;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4842c36fdf801996..7813e05218657713 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -347,6 +347,7 @@ enum {
* { u64 time_running; } && PERF_FORMAT_TOTAL_TIME_RUNNING
* { u64 id; } && PERF_FORMAT_ID
* { u64 lost; } && PERF_FORMAT_LOST
+ * { u64 dropped; } && PERF_FORMAT_DROPPED
* } && !PERF_FORMAT_GROUP
*
* { u64 nr;
@@ -355,6 +356,7 @@ enum {
* { u64 value;
* { u64 id; } && PERF_FORMAT_ID
* { u64 lost; } && PERF_FORMAT_LOST
+ * { u64 dropped; } && PERF_FORMAT_DROPPED
* } cntr[nr];
* } && PERF_FORMAT_GROUP
* };
@@ -365,8 +367,9 @@ enum perf_event_read_format {
PERF_FORMAT_ID = 1U << 2,
PERF_FORMAT_GROUP = 1U << 3,
PERF_FORMAT_LOST = 1U << 4,
+ PERF_FORMAT_DROPPED = 1U << 5,
- PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
+ PERF_FORMAT_MAX = 1U << 6, /* non-ABI */
};
#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e3589c4287cb458c..7e15fe0a8dee4ee7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5697,6 +5697,8 @@ static int __perf_read_group_add(struct perf_event *leader,
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&leader->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&leader->dropped_samples);
for_each_sibling_event(sub, leader) {
values[n++] += perf_event_count(sub, false);
@@ -5704,6 +5706,8 @@ static int __perf_read_group_add(struct perf_event *leader,
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&sub->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&sub->dropped_samples);
}
unlock:
@@ -5769,6 +5773,8 @@ static int perf_read_one(struct perf_event *event,
values[n++] = primary_event_id(event);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&event->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&event->dropped_samples);
if (copy_to_user(buf, values, n * sizeof(u64)))
return -EFAULT;
@@ -7370,6 +7376,8 @@ static void perf_output_read_one(struct perf_output_handle *handle,
values[n++] = primary_event_id(event);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&event->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&event->dropped_samples);
__output_copy(handle, values, n * sizeof(u64));
}
@@ -7408,6 +7416,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
values[n++] = primary_event_id(leader);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&leader->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&leader->dropped_samples);
__output_copy(handle, values, n * sizeof(u64));
@@ -7423,6 +7433,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
values[n++] = primary_event_id(sub);
if (read_format & PERF_FORMAT_LOST)
values[n++] = atomic64_read(&sub->lost_samples);
+ if (read_format & PERF_FORMAT_DROPPED)
+ values[n++] = atomic64_read(&sub->dropped_samples);
__output_copy(handle, values, n * sizeof(u64));
}
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v4 2/5] perf/core: Export perf_exclude_event()
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
@ 2024-10-23 0:09 ` Namhyung Kim
2024-10-23 7:33 ` Thomas Richter
2024-10-23 0:09 ` [PATCH v4 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
` (3 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 0:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Richter, linux-s390
And increase the dropped_samples count when it returns 1. Now it can
track how many samples are dropped due to the privilege filters in
software events.
While at it, rename the same function in s390 cpum_sf PMU and also count
the dropped samples.
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/s390/kernel/perf_cpum_sf.c | 8 +++++---
include/linux/perf_event.h | 6 ++++++
kernel/events/core.c | 11 +++++++----
3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 5b765e3ccf0cadc8..ff9e694f2be45c6b 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -996,7 +996,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
cpuhw->flags &= ~PMU_F_ENABLED;
}
-/* perf_exclude_event() - Filter event
+/* perf_event_exclude() - Filter event
* @event: The perf event
* @regs: pt_regs structure
* @sde_regs: Sample-data-entry (sde) regs structure
@@ -1005,7 +1005,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
*
* Return non-zero if the event shall be excluded.
*/
-static int perf_exclude_event(struct perf_event *event, struct pt_regs *regs,
+static int perf_event_exclude(struct perf_event *event, struct pt_regs *regs,
struct perf_sf_sde_regs *sde_regs)
{
if (event->attr.exclude_user && user_mode(regs))
@@ -1088,8 +1088,10 @@ static int perf_push_sample(struct perf_event *event,
data.tid_entry.pid = basic->hpp & LPP_PID_MASK;
overflow = 0;
- if (perf_exclude_event(event, &regs, sde_regs))
+ if (perf_event_exclude(event, &regs, sde_regs)) {
+ atomic64_inc(&event->dropped_samples);
goto out;
+ }
if (perf_event_overflow(event, &data, &regs)) {
overflow = 1;
event->pmu->stop(event, 0);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c1e6340e561c400e..6b31958a2b1db8db 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1649,6 +1649,8 @@ static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
}
+extern int perf_exclude_event(struct perf_event *event, struct pt_regs *regs);
+
extern void perf_event_init(void);
extern void perf_tp_event(u16 event_type, u64 count, void *record,
int entry_size, struct pt_regs *regs,
@@ -1832,6 +1834,10 @@ static inline u64 perf_event_pause(struct perf_event *event, bool reset)
{
return 0;
}
+static inline int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
+{
+ return 0;
+}
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7e15fe0a8dee4ee7..5d24597180dec167 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10001,18 +10001,21 @@ static void perf_swevent_event(struct perf_event *event, u64 nr,
perf_swevent_overflow(event, 0, data, regs);
}
-static int perf_exclude_event(struct perf_event *event,
- struct pt_regs *regs)
+int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
{
if (event->hw.state & PERF_HES_STOPPED)
return 1;
if (regs) {
- if (event->attr.exclude_user && user_mode(regs))
+ if (event->attr.exclude_user && user_mode(regs)) {
+ atomic64_inc(&event->dropped_samples);
return 1;
+ }
- if (event->attr.exclude_kernel && !user_mode(regs))
+ if (event->attr.exclude_kernel && !user_mode(regs)) {
+ atomic64_inc(&event->dropped_samples);
return 1;
+ }
}
return 0;
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 2/5] perf/core: Export perf_exclude_event() Namhyung Kim
@ 2024-10-23 0:09 ` Namhyung Kim
2024-10-23 16:12 ` Andrii Nakryiko
2024-10-23 0:09 ` [PATCH v4 4/5] perf/powerpc: Count dropped samples in core-book3s PMU Namhyung Kim
` (2 subsequent siblings)
5 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 0:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Kyle Huey, Alexei Starovoitov, Andrii Nakryiko,
Song Liu, bpf
As with software events, the BPF overflow handler can drop samples
by returning 0. Let's count the dropped samples here too.
Acked-by: Kyle Huey <me@kylehuey.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
kernel/events/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5d24597180dec167..b41c17a0bc19f7c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
ret = __perf_event_account_interrupt(event, throttle);
if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
- !bpf_overflow_handler(event, data, regs))
+ !bpf_overflow_handler(event, data, regs)) {
+ atomic64_inc(&event->dropped_samples);
return ret;
+ }
/*
* XXX event_limit might not quite work as expected on inherited
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v4 4/5] perf/powerpc: Count dropped samples in core-book3s PMU
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
` (2 preceding siblings ...)
2024-10-23 0:09 ` [PATCH v4 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
@ 2024-10-23 0:09 ` Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
2024-10-24 6:05 ` [PATCH v4 0/5] perf: Relax privilege " Ravi Bangoria
5 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 0:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Kajol Jain,
Athira Rajeev, linuxppc-dev
Due to a hardware limitation, the PMU can sometimes sample a kernel
address even while attr.exclude_kernel is set. In that case it silently
drops the sample. Let's count that case in the new dropped_samples counter.
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/powerpc/perf/core-book3s.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 42867469752d73cf..553e288b9f113836 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2287,8 +2287,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
*/
if (event->attr.exclude_kernel &&
(event->attr.sample_type & PERF_SAMPLE_IP) &&
- is_kernel_addr(mfspr(SPRN_SIAR)))
+ is_kernel_addr(mfspr(SPRN_SIAR))) {
+ atomic64_inc(&event->dropped_samples);
record = 0;
+ }
/*
* Finally record data if requested.
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v4 5/5] perf/x86: Relax privilege filter restriction on AMD IBS
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
` (3 preceding siblings ...)
2024-10-23 0:09 ` [PATCH v4 4/5] perf/powerpc: Count dropped samples in core-book3s PMU Namhyung Kim
@ 2024-10-23 0:09 ` Namhyung Kim
2024-10-24 6:05 ` [PATCH v4 0/5] perf: Relax privilege " Ravi Bangoria
5 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 0:09 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Ananth Narayan
While IBS is available for per-thread profiling, regular users still
cannot open an event due to the default paranoid setting (2) which
doesn't allow unprivileged users to get kernel samples. That means
they need to set the exclude_kernel bit in the attribute, but the IBS
driver would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is
not what we want, and I've been getting requests to fix this issue.
This should be done in the hardware, but until we get the HW fix we may
allow exclude_{kernel,user,hv} in the attribute and silently drop the
samples in the PMU IRQ handler. It won't guarantee the sampling
frequency, and it may even miss some samples with a fixed period too.
Not ideal, but that'd still be helpful to regular users.
To minimize the confusion, let's add a 'swfilt' bit to attr.config2
which is exposed in the sysfs format directory so that users can figure
out whether the kernel supports the privilege filters in software.
$ perf record -e ibs_op/swfilt=1/u true
This uses perf_exclude_event() which checks regs->cs. But it should be
fine because set_linear_ip() also updates the CS according to the RIP
provided by IBS.
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/x86/events/amd/ibs.c | 59 +++++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 18 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d6243e4..d89622880a9fbbb9 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -31,6 +31,8 @@ static u32 ibs_caps;
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
+/* attr.config2 */
+#define IBS_SW_FILTER_MASK 1
/*
* IBS states:
@@ -290,6 +292,16 @@ static int perf_ibs_init(struct perf_event *event)
if (has_branch_stack(event))
return -EOPNOTSUPP;
+ /* handle exclude_{user,kernel} in the IRQ handler */
+ if (event->attr.exclude_host || event->attr.exclude_guest ||
+ event->attr.exclude_idle)
+ return -EINVAL;
+
+ if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ (event->attr.exclude_kernel || event->attr.exclude_user ||
+ event->attr.exclude_hv))
+ return -EINVAL;
+
ret = validate_group(event);
if (ret)
return ret;
@@ -550,24 +562,14 @@ static struct attribute *attrs_empty[] = {
NULL,
};
-static struct attribute_group empty_format_group = {
- .name = "format",
- .attrs = attrs_empty,
-};
-
static struct attribute_group empty_caps_group = {
.name = "caps",
.attrs = attrs_empty,
};
-static const struct attribute_group *empty_attr_groups[] = {
- &empty_format_group,
- &empty_caps_group,
- NULL,
-};
-
PMU_FORMAT_ATTR(rand_en, "config:57");
PMU_FORMAT_ATTR(cnt_ctl, "config:19");
+PMU_FORMAT_ATTR(swfilt, "config2:0");
PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
@@ -578,8 +580,9 @@ zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int
return ibs_caps & IBS_CAPS_ZEN4 ? attr->mode : 0;
}
-static struct attribute *rand_en_attrs[] = {
+static struct attribute *fetch_attrs[] = {
&format_attr_rand_en.attr,
+ &format_attr_swfilt.attr,
NULL,
};
@@ -593,9 +596,9 @@ static struct attribute *zen4_ibs_extensions_attrs[] = {
NULL,
};
-static struct attribute_group group_rand_en = {
+static struct attribute_group group_fetch_formats = {
.name = "format",
- .attrs = rand_en_attrs,
+ .attrs = fetch_attrs,
};
static struct attribute_group group_fetch_l3missonly = {
@@ -611,7 +614,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
};
static const struct attribute_group *fetch_attr_groups[] = {
- &group_rand_en,
+ &group_fetch_formats,
&empty_caps_group,
NULL,
};
@@ -628,6 +631,11 @@ cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i)
return ibs_caps & IBS_CAPS_OPCNT ? attr->mode : 0;
}
+static struct attribute *op_attrs[] = {
+ &format_attr_swfilt.attr,
+ NULL,
+};
+
static struct attribute *cnt_ctl_attrs[] = {
&format_attr_cnt_ctl.attr,
NULL,
@@ -638,6 +646,11 @@ static struct attribute *op_l3missonly_attrs[] = {
NULL,
};
+static struct attribute_group group_op_formats = {
+ .name = "format",
+ .attrs = op_attrs,
+};
+
static struct attribute_group group_cnt_ctl = {
.name = "format",
.attrs = cnt_ctl_attrs,
@@ -650,6 +663,12 @@ static struct attribute_group group_op_l3missonly = {
.is_visible = zen4_ibs_extensions_is_visible,
};
+static const struct attribute_group *op_attr_groups[] = {
+ &group_op_formats,
+ &empty_caps_group,
+ NULL,
+};
+
static const struct attribute_group *op_attr_update[] = {
&group_cnt_ctl,
&group_op_l3missonly,
@@ -667,7 +686,6 @@ static struct perf_ibs perf_ibs_fetch = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSFETCHCTL,
.config_mask = IBS_FETCH_CONFIG_MASK,
@@ -691,7 +709,6 @@ static struct perf_ibs perf_ibs_op = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSOPCTL,
.config_mask = IBS_OP_CONFIG_MASK,
@@ -1111,6 +1128,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
regs.flags |= PERF_EFLAGS_EXACT;
}
+ if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ perf_exclude_event(event, &regs)) {
+ throttle = perf_event_account_interrupt(event);
+ goto out;
+ }
+
if (event->attr.sample_type & PERF_SAMPLE_RAW) {
raw = (struct perf_raw_record){
.frag = {
@@ -1228,7 +1251,7 @@ static __init int perf_ibs_op_init(void)
if (ibs_caps & IBS_CAPS_ZEN4)
perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
- perf_ibs_op.pmu.attr_groups = empty_attr_groups;
+ perf_ibs_op.pmu.attr_groups = op_attr_groups;
perf_ibs_op.pmu.attr_update = op_attr_update;
return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v4 2/5] perf/core: Export perf_exclude_event()
2024-10-23 0:09 ` [PATCH v4 2/5] perf/core: Export perf_exclude_event() Namhyung Kim
@ 2024-10-23 7:33 ` Thomas Richter
0 siblings, 0 replies; 18+ messages in thread
From: Thomas Richter @ 2024-10-23 7:33 UTC (permalink / raw)
To: Namhyung Kim, Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, linux-s390
On 10/23/24 02:09, Namhyung Kim wrote:
> And increase the dropped_samples count when it returns 1. Now it can
> track how many samples are dropped due to the privilege filters in
> software events.
>
> While at it, rename the same function in s390 cpum_sf PMU and also count
> the dropped samples.
>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: Thomas Richter <tmricht@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> arch/s390/kernel/perf_cpum_sf.c | 8 +++++---
> include/linux/perf_event.h | 6 ++++++
> kernel/events/core.c | 11 +++++++----
> 3 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
> index 5b765e3ccf0cadc8..ff9e694f2be45c6b 100644
> --- a/arch/s390/kernel/perf_cpum_sf.c
> +++ b/arch/s390/kernel/perf_cpum_sf.c
> @@ -996,7 +996,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
> cpuhw->flags &= ~PMU_F_ENABLED;
> }
>
> -/* perf_exclude_event() - Filter event
> +/* perf_event_exclude() - Filter event
> * @event: The perf event
> * @regs: pt_regs structure
> * @sde_regs: Sample-data-entry (sde) regs structure
> @@ -1005,7 +1005,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
> *
> * Return non-zero if the event shall be excluded.
> */
> -static int perf_exclude_event(struct perf_event *event, struct pt_regs *regs,
> +static int perf_event_exclude(struct perf_event *event, struct pt_regs *regs,
> struct perf_sf_sde_regs *sde_regs)
> {
> if (event->attr.exclude_user && user_mode(regs))
> @@ -1088,8 +1088,10 @@ static int perf_push_sample(struct perf_event *event,
> data.tid_entry.pid = basic->hpp & LPP_PID_MASK;
>
> overflow = 0;
> - if (perf_exclude_event(event, &regs, sde_regs))
> + if (perf_event_exclude(event, &regs, sde_regs)) {
> + atomic64_inc(&event->dropped_samples);
> goto out;
> + }
> if (perf_event_overflow(event, &data, &regs)) {
> overflow = 1;
> event->pmu->stop(event, 0);
For the s390 part:
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED
2024-10-23 0:09 ` [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
@ 2024-10-23 11:05 ` Michael Ellerman
2024-10-23 18:30 ` Namhyung Kim
0 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2024-10-23 11:05 UTC (permalink / raw)
To: Namhyung Kim, Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das
Namhyung Kim <namhyung@kernel.org> writes:
> When a perf_event sample is dropped due to some kind of (SW-based)
> filter, it won't generate sample data. For example, software events
> drop samples when they don't match the privilege constraints from
> exclude_{user,kernel}.
>
> In order to account for such dropped samples, add a new counter in the
> perf_event, and let users read(2) the number with the new
> PERF_FORMAT_DROPPED, like the lost sample count.
Are we sure there's no scenario where exposing the dropped event count
gives an unprivileged user a way to probe what's happening in the
kernel, which is supposed to be prevented by exclude_kernel?
Clearly it provides an attacker with some information, ie. the event
fired in the kernel and was dropped.
For most events that's not very interesting, but for some maybe it could
be a useful signal?
On the other hand most CPU PMUs implement filtering in hardware, which
this won't affect, so maybe I'm being too paranoid.
cheers
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 0:09 ` [PATCH v4 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
@ 2024-10-23 16:12 ` Andrii Nakryiko
2024-10-23 18:47 ` Namhyung Kim
0 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2024-10-23 16:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das, Kyle Huey,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, bpf
On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> As with software events, the BPF overflow handler can drop samples
> by returning 0. Let's count the dropped samples here too.
>
> Acked-by: Kyle Huey <me@kylehuey.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Song Liu <song@kernel.org>
> Cc: bpf@vger.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> kernel/events/core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5d24597180dec167..b41c17a0bc19f7c2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> ret = __perf_event_account_interrupt(event, throttle);
>
> if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> - !bpf_overflow_handler(event, data, regs))
> + !bpf_overflow_handler(event, data, regs)) {
> + atomic64_inc(&event->dropped_samples);
I don't see the full patch set (please cc relevant people and mailing
list on each patch in the patch set), but do we really want to pay the
price of atomic increment on what's the very typical situation of a
BPF program returning 0?
At least from a BPF perspective this is no "dropping sample", it's
just processing it in BPF and not paying the overhead of the perf
subsystem continuing processing it afterwards. So the dropping part is
also misleading, IMO.
> return ret;
> + }
>
> /*
> * XXX event_limit might not quite work as expected on inherited
> --
> 2.47.0.105.g07ac214952-goog
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED
2024-10-23 11:05 ` Michael Ellerman
@ 2024-10-23 18:30 ` Namhyung Kim
2024-10-24 4:43 ` Ravi Bangoria
0 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 18:30 UTC (permalink / raw)
To: Michael Ellerman
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das
Hello,
On Wed, Oct 23, 2024 at 10:05:32PM +1100, Michael Ellerman wrote:
> Namhyung Kim <namhyung@kernel.org> writes:
> > When a perf_event sample is dropped due to some kind of (SW-based)
> > filter, it won't generate sample data. For example, software events
> > drop samples when they don't match the privilege constraints from
> > exclude_{user,kernel}.
> >
> > In order to account for such dropped samples, add a new counter in the
> > perf_event, and let users read(2) the number with the new
> > PERF_FORMAT_DROPPED, like the lost sample count.
>
> Are we sure there's no scenario where exposing the dropped event count
> gives an unprivileged user a way to probe what's happening in the
> kernel, which is supposed to be prevented by exclude_kernel?
>
> Clearly it provides an attacker with some information, ie. the event
> fired in the kernel and was dropped.
>
> For most events that's not very interesting, but for some maybe it could
> be a useful signal?
Hmm.. good point. It'd give some information to users. I'm not sure
how much impact it'd have, but there are some folks who want to know
the exact number of samples, including dropped ones, to reconstruct the
total period for the monitoring session.
>
> On the other hand most CPU PMUs implement filtering in hardware, which
> this won't affect, so maybe I'm being too paranoid.
Right, even without this interface it might be possible to estimate
such numbers by comparing against similar events in the core PMU,
which implements HW filtering, IMHO.
Thanks,
Namhyung
* Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 16:12 ` Andrii Nakryiko
@ 2024-10-23 18:47 ` Namhyung Kim
2024-10-23 19:13 ` Andrii Nakryiko
0 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 18:47 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das, Kyle Huey,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, bpf
Hello,
On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > As with software events, the BPF overflow handler can drop samples
> > by returning 0. Let's count those dropped samples here too.
> >
> > Acked-by: Kyle Huey <me@kylehuey.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Song Liu <song@kernel.org>
> > Cc: bpf@vger.kernel.org
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > kernel/events/core.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > ret = __perf_event_account_interrupt(event, throttle);
> >
> > if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > - !bpf_overflow_handler(event, data, regs))
> > + !bpf_overflow_handler(event, data, regs)) {
> > + atomic64_inc(&event->dropped_samples);
>
> I don't see the full patch set (please cc relevant people and mailing
> list on each patch in the patch set), but do we really want to pay the
Sorry, you can find the whole series here.
https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
I thought it was mostly about the perf part, so I didn't CC the bpf
folks, but I'll do so in the next version.
> price of atomic increment on what's the very typical situation of a
> BPF program returning 0?
Is it typical for BPF_PROG_TYPE_PERF_EVENT? I guess TRACING programs
usually return 0 but PERF_EVENT should care about the return values.
>
> At least from a BPF perspective this is no "dropping sample", it's
> just processing it in BPF and not paying the overhead of the perf
> subsystem continuing processing it afterwards. So the dropping part is
> also misleading, IMO.
In the perf tools, we have filtering logic in BPF that reads sample
values and decides whether to keep each sample. In that case users
would be interested in the exact number of samples.
Thanks,
Namhyung
>
> > return ret;
> > + }
> >
> > /*
> > * XXX event_limit might not quite work as expected on inherited
> > --
> > 2.47.0.105.g07ac214952-goog
> >
* Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 18:47 ` Namhyung Kim
@ 2024-10-23 19:13 ` Andrii Nakryiko
2024-10-23 20:32 ` Namhyung Kim
0 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2024-10-23 19:13 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das, Kyle Huey,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, bpf
On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > As with software events, the BPF overflow handler can drop samples
> > > by returning 0. Let's count those dropped samples here too.
> > >
> > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Song Liu <song@kernel.org>
> > > Cc: bpf@vger.kernel.org
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > > kernel/events/core.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > ret = __perf_event_account_interrupt(event, throttle);
> > >
> > > if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > - !bpf_overflow_handler(event, data, regs))
> > > + !bpf_overflow_handler(event, data, regs)) {
> > > + atomic64_inc(&event->dropped_samples);
> >
> > I don't see the full patch set (please cc relevant people and mailing
> > list on each patch in the patch set), but do we really want to pay the
>
> Sorry, you can find the whole series here.
>
> https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
>
> I thought it was mostly about the perf part, so I didn't CC the bpf
> folks, but I'll do so in the next version.
>
>
> > price of atomic increment on what's the very typical situation of a
> > BPF program returning 0?
>
> Is it typical for BPF_PROG_TYPE_PERF_EVENT? I guess TRACING programs
> usually return 0 but PERF_EVENT should care about the return values.
>
Yeah, it's pretty much always `return 0;` for perf_event-based BPF
profilers. It's rather unusual to return non-zero, actually.
> >
> > At least from a BPF perspective this is no "dropping sample", it's
> > just processing it in BPF and not paying the overhead of the perf
> > subsystem continuing processing it afterwards. So the dropping part is
> > also misleading, IMO.
>
> In the perf tools, we have filtering logic in BPF that reads sample
> values and decides whether to keep each sample. In that case users
> would be interested in the exact number of samples.
>
> Thanks,
> Namhyung
>
> >
> > > return ret;
> > > + }
> > >
> > > /*
> > > * XXX event_limit might not quite work as expected on inherited
> > > --
> > > 2.47.0.105.g07ac214952-goog
> > >
* Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 19:13 ` Andrii Nakryiko
@ 2024-10-23 20:32 ` Namhyung Kim
2024-10-23 21:24 ` Andrii Nakryiko
0 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-10-23 20:32 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das, Kyle Huey,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, bpf
On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > As with software events, the BPF overflow handler can drop samples
> > > > by returning 0. Let's count those dropped samples here too.
> > > >
> > > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > > Cc: Song Liu <song@kernel.org>
> > > > Cc: bpf@vger.kernel.org
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > > kernel/events/core.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > ret = __perf_event_account_interrupt(event, throttle);
> > > >
> > > > if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > - !bpf_overflow_handler(event, data, regs))
> > > > + !bpf_overflow_handler(event, data, regs)) {
> > > > + atomic64_inc(&event->dropped_samples);
> > >
> > > I don't see the full patch set (please cc relevant people and mailing
> > > list on each patch in the patch set), but do we really want to pay the
> >
> > Sorry, you can find the whole series here.
> >
> > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
> >
> > I thought it was mostly about the perf part, so I didn't CC the bpf
> > folks, but I'll do so in the next version.
> >
> >
> > > price of atomic increment on what's the very typical situation of a
> > > BPF program returning 0?
> >
> > Is it typical for BPF_PROG_TYPE_PERF_EVENT? I guess TRACING programs
> > usually return 0 but PERF_EVENT should care about the return values.
> >
>
> Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> profilers. It's rather unusual to return non-zero, actually.
Ok, then it could be a local_t or a plain unsigned long. It would be
updated only on overflow. Reads can be racy, but I think it's ok to
miss some counts. If someone really needs a precise count, they can
read it after disabling the event, IMHO.
What do you think?
Thanks,
Namhyung
* Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 20:32 ` Namhyung Kim
@ 2024-10-23 21:24 ` Andrii Nakryiko
2024-10-28 18:56 ` Namhyung Kim
0 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2024-10-23 21:24 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das, Kyle Huey,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, bpf
On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> > On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > As with software events, the BPF overflow handler can drop samples
> > > > > by returning 0. Let's count those dropped samples here too.
> > > > >
> > > > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > > > Cc: Song Liu <song@kernel.org>
> > > > > Cc: bpf@vger.kernel.org
> > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > ---
> > > > > kernel/events/core.c | 4 +++-
> > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > > --- a/kernel/events/core.c
> > > > > +++ b/kernel/events/core.c
> > > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > > ret = __perf_event_account_interrupt(event, throttle);
> > > > >
> > > > > if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > > - !bpf_overflow_handler(event, data, regs))
> > > > > + !bpf_overflow_handler(event, data, regs)) {
> > > > > + atomic64_inc(&event->dropped_samples);
> > > >
> > > > I don't see the full patch set (please cc relevant people and mailing
> > > > list on each patch in the patch set), but do we really want to pay the
> > >
> > > Sorry, you can find the whole series here.
> > >
> > > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
> > >
> > > I thought it's mostly for the perf part so I didn't CC bpf folks but
> > > I'll do in the next version.
> > >
> > >
> > > > price of atomic increment on what's the very typical situation of a
> > > > BPF program returning 0?
> > >
> > > Is it typical for BPF_PROG_TYPE_PERF_EVENT? I guess TRACING programs
> > > usually return 0 but PERF_EVENT should care about the return values.
> > >
> >
> > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > profilers. It's rather unusual to return non-zero, actually.
>
> Ok, then it could be a local_t or a plain unsigned long. It would be
> updated only on overflow. Reads can be racy, but I think it's ok to
> miss some counts. If someone really needs a precise count, they can
> read it after disabling the event, IMHO.
>
> What do you think?
>
See [0] for an unsynchronized increment absolutely killing performance
due to cache line bouncing between CPUs. If the event is high-frequency
and can be triggered across multiple CPUs in short succession, even an
imprecise counter might be harmful.
In general, I'd say that if BPF attachment is involved, we probably
should avoid maintaining unnecessary statistics. Things like this
event->dropped_samples increment can be done very efficiently and
trivially from inside the BPF program, if at all necessary.
Having said that, if it's unlikely to have perf_event bouncing between
multiple CPUs, it's probably not that big of a deal.
[0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/
> Thanks,
> Namhyung
>
* Re: [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED
2024-10-23 18:30 ` Namhyung Kim
@ 2024-10-24 4:43 ` Ravi Bangoria
2024-10-28 18:53 ` Namhyung Kim
0 siblings, 1 reply; 18+ messages in thread
From: Ravi Bangoria @ 2024-10-24 4:43 UTC (permalink / raw)
To: Namhyung Kim, Michael Ellerman
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Sandipan Das, Ravi Bangoria
>>> When a perf_event is dropped due to some kind of (SW-based) filter, it
>>> won't generate sample data. For example, software events drop samples
>>> when they don't match the privilege level from exclude_{user,kernel}.
>>>
>>> In order to account for such dropped samples, add a new counter in the
>>> perf_event, and let users read(2) the number with the new
>>> PERF_FORMAT_DROPPED, like the lost sample count.
>>
>> Are we sure there's no scenario where exposing the dropped event count
>> gives an unprivileged user a way to probe what's happening in the
>> kernel, which is supposed to be prevented by exclude_kernel?
>>
>> Clearly it provides an attacker with some information, ie. the event
>> fired in the kernel and was dropped.
>>
>> For most events that's not very interesting, but for some maybe it could
>> be a useful signal?
>
> Hmm.. good point. It'd give some information to users. I'm not sure
> how much impact it'd have, but there are some folks who want to know
> the exact number of samples, including dropped ones, to reconstruct the
> total period for the monitoring session.
Can we restrict PERF_FORMAT_DROPPED to perfmon_capable() if it's a
genuine problem? Or would it defeat the whole purpose of _DROPPED
count?
Thanks,
Ravi
* Re: [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
` (4 preceding siblings ...)
2024-10-23 0:09 ` [PATCH v4 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
@ 2024-10-24 6:05 ` Ravi Bangoria
5 siblings, 0 replies; 18+ messages in thread
From: Ravi Bangoria @ 2024-10-24 6:05 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Sandipan Das,
Namhyung Kim, Ravi Bangoria
> This is RFC v3 to allow AMD IBS for regular users with the default
> settings where kernel-level profiling is disabled
> (perf_event_paranoid=2). Currently AMD IBS doesn't allow any kind of
> exclusion in the event attribute, but users need to set
> attr.exclude_kernel to open an event in such an environment.
FWIW, the IBS code had a few issues with sample period computation which
could lead to SW programming unsupported values into the IBS MSRs. Since
this series exposes IBS to unprivileged users, the series below is a
logical dependency:
https://lore.kernel.org/r/20241007034810.754-1-ravi.bangoria@amd.com
Thanks,
Ravi
* Re: [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED
2024-10-24 4:43 ` Ravi Bangoria
@ 2024-10-28 18:53 ` Namhyung Kim
0 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-10-28 18:53 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Michael Ellerman, Peter Zijlstra, Ingo Molnar, Kan Liang,
Mark Rutland, Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Sandipan Das
Hello Ravi,
On Thu, Oct 24, 2024 at 10:13:24AM +0530, Ravi Bangoria wrote:
> >>> When a perf_event is dropped due to some kind of (SW-based) filter, it
> >>> won't generate sample data. For example, software events drop samples
> >>> when they don't match the privilege level from exclude_{user,kernel}.
> >>>
> >>> In order to account for such dropped samples, add a new counter in the
> >>> perf_event, and let users read(2) the number with the new
> >>> PERF_FORMAT_DROPPED, like the lost sample count.
> >>
> >> Are we sure there's no scenario where exposing the dropped event count
> >> gives an unprivileged user a way to probe what's happening in the
> >> kernel, which is supposed to be prevented by exclude_kernel?
> >>
> >> Clearly it provides an attacker with some information, ie. the event
> >> fired in the kernel and was dropped.
> >>
> >> For most events that's not very interesting, but for some maybe it could
> >> be a useful signal?
> >
> > Hmm.. good point. It'd give some information to users. I'm not sure
> > how much impact it'd have, but there are some folks who want to know
> > the exact number of samples, including dropped ones, to reconstruct the
> > total period for the monitoring session.
>
> Can we restrict PERF_FORMAT_DROPPED to perfmon_capable() if it's a
> genuine problem? Or would it defeat the whole purpose of _DROPPED
> count?
Right, that's the purpose of the PERF_FORMAT_DROPPED. But I think
we can discuss this interface later and just focus on the IBS swfilt
first. I'll remove this part now and add it back separately.
Thanks,
Namhyung
* Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
2024-10-23 21:24 ` Andrii Nakryiko
@ 2024-10-28 18:56 ` Namhyung Kim
0 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-10-28 18:56 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
Stephane Eranian, Ravi Bangoria, Sandipan Das, Kyle Huey,
Alexei Starovoitov, Andrii Nakryiko, Song Liu, bpf
On Wed, Oct 23, 2024 at 02:24:13PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > As with software events, the BPF overflow handler can drop samples
> > > > > > by returning 0. Let's count those dropped samples here too.
> > > > > >
> > > > > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > > > > Cc: Song Liu <song@kernel.org>
> > > > > > Cc: bpf@vger.kernel.org
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > > kernel/events/core.c | 4 +++-
> > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > > > --- a/kernel/events/core.c
> > > > > > +++ b/kernel/events/core.c
> > > > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > > > ret = __perf_event_account_interrupt(event, throttle);
> > > > > >
> > > > > > if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > > > - !bpf_overflow_handler(event, data, regs))
> > > > > > + !bpf_overflow_handler(event, data, regs)) {
> > > > > > + atomic64_inc(&event->dropped_samples);
> > > > >
> > > > > I don't see the full patch set (please cc relevant people and mailing
> > > > > list on each patch in the patch set), but do we really want to pay the
> > > >
> > > > Sorry, you can find the whole series here.
> > > >
> > > > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
> > > >
> > > > I thought it was mostly about the perf part, so I didn't CC the bpf
> > > > folks, but I'll do so in the next version.
> > > >
> > > >
> > > > > price of atomic increment on what's the very typical situation of a
> > > > > BPF program returning 0?
> > > >
> > > > Is it typical for BPF_PROG_TYPE_PERF_EVENT? I guess TRACING programs
> > > > usually return 0 but PERF_EVENT should care about the return values.
> > > >
> > >
> > > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > > profilers. It's rather unusual to return non-zero, actually.
> >
> > Ok, then it could be a local_t or a plain unsigned long. It would be
> > updated only on overflow. Reads can be racy, but I think it's ok to
> > miss some counts. If someone really needs a precise count, they can
> > read it after disabling the event, IMHO.
> >
> > What do you think?
> >
>
> See [0] for an unsynchronized increment absolutely killing performance
> due to cache line bouncing between CPUs. If the event is high-frequency
> and can be triggered across multiple CPUs in short succession, even an
> imprecise counter might be harmful.
Ok.
>
> In general, I'd say that if BPF attachment is involved, we probably
> should avoid maintaining unnecessary statistics. Things like this
> event->dropped_samples increment can be done very efficiently and
> trivially from inside the BPF program, if at all necessary.
Right, we can do that in the BPF too.
>
> Having said that, if it's unlikely to have perf_event bouncing between
> multiple CPUs, it's probably not that big of a deal.
Yeah, a perf_event is dedicated to a CPU or a task and the counter is
updated only in the overflow handler, so I don't think it'd cause cache
line bouncing between CPUs.
Thanks,
Namhyung
>
>
> [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/
>
> > Thanks,
> > Namhyung
> >
end of thread (newest: 2024-10-28 18:56 UTC)
Thread overview: 18+ messages
2024-10-23 0:09 [PATCH v4 0/5] perf: Relax privilege restriction on AMD IBS Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 1/5] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
2024-10-23 11:05 ` Michael Ellerman
2024-10-23 18:30 ` Namhyung Kim
2024-10-24 4:43 ` Ravi Bangoria
2024-10-28 18:53 ` Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 2/5] perf/core: Export perf_exclude_event() Namhyung Kim
2024-10-23 7:33 ` Thomas Richter
2024-10-23 0:09 ` [PATCH v4 3/5] perf/core: Account dropped samples from BPF Namhyung Kim
2024-10-23 16:12 ` Andrii Nakryiko
2024-10-23 18:47 ` Namhyung Kim
2024-10-23 19:13 ` Andrii Nakryiko
2024-10-23 20:32 ` Namhyung Kim
2024-10-23 21:24 ` Andrii Nakryiko
2024-10-28 18:56 ` Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 4/5] perf/powerpc: Count dropped samples in core-book3s PMU Namhyung Kim
2024-10-23 0:09 ` [PATCH v4 5/5] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
2024-10-24 6:05 ` [PATCH v4 0/5] perf: Relax privilege " Ravi Bangoria