* [PATCH v5 0/2] perf: Relax privilege restriction on AMD IBS
From: Namhyung Kim @ 2024-10-28 20:01 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das
Hello,
This is v5 of the series that makes AMD IBS available to regular users
under the default setting, where kernel-level profiling is disabled
(perf_event_paranoid=2). Currently AMD IBS doesn't allow any kind of
exclusion in the event attribute, but users need to set
attr.exclude_kernel to open an event in such an environment.
v5 changes)
* drop PERF_FORMAT_DROPPED support for now
* add Acked-by from Thomas for s390
v4) https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
* remove RFC tag
* fix sysfs attribute for ibs_fetch/format (Ravi)
* handle exclude_hv as well, so ":u" modifier would work for IBS
* add Acked and Reviewed-by from Kyle and Madhavan
v3) https://lore.kernel.org/lkml/20240905031027.2567913-1-namhyung@kernel.org
* fix build on s390
* add swfilt format for attr.config2
* count powerpc core-book3s dropped samples
v2) https://lore.kernel.org/lkml/20240830232910.1839548-1-namhyung@kernel.org/
* add PERF_FORMAT_DROPPED
* account dropped sw events and from BPF handler
* use precise RIP from IBS record
v1) https://lore.kernel.org/lkml/20240822230816.564262-1-namhyung@kernel.org/
While IBS doesn't support hardware-level privilege filters, the kernel
can allow the event and drop samples that belong to the kernel, as it
does for software events. This is limited, but the event still yields
precise samples, which are important for analyses such as data type
profiling.
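The software filtering described above can be sketched as a small
user-space model. This is only an illustration, not the kernel code: the
struct and function names are hypothetical, and the real
perf_exclude_event() also checks other state (for example whether the
event is stopped).

```c
#include <stdbool.h>

/* Hypothetical user-space model of the exclusion bits; not kernel code. */
struct filter_attr {
	bool exclude_user;
	bool exclude_kernel;
};

/* Return true if a sample taken at the given privilege level is dropped. */
bool sw_filter_drops(const struct filter_attr *attr, bool sample_in_user)
{
	if (attr->exclude_user && sample_in_user)
		return true;
	if (attr->exclude_kernel && !sample_in_user)
		return true;
	return false;
}

/* Count how many of n samples survive the filter. */
unsigned int count_kept(const struct filter_attr *attr,
			const bool *in_user, unsigned int n)
{
	unsigned int kept = 0;

	for (unsigned int i = 0; i < n; i++)
		if (!sw_filter_drops(attr, in_user[i]))
			kept++;
	return kept;
}
```

Because the hardware still takes every sample and the filter only
discards some of them afterwards, the number of samples delivered is
lower than the number of IBS interrupts.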
This version added format/swfilt file in sysfs to expose the software
filtering by setting the attribute config2 value. I think it's easier
to add a new config rather than adding a new PMU in order to handle
event multiplexing across IBS PMU. Users can use the perf tool to
enable this feature manually like below. Probably the perf tool can
handle this automatically in the future.
$ perf record -e ibs_op/swfilt=1/u $PROG
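For callers that use perf_event_open(2) directly, the command above
corresponds roughly to the attribute setup below. This is a sketch under
assumptions: the helper names are hypothetical, the PMU type must be read
by the caller from /sys/bus/event_source/devices/ibs_op/type, and the
sample_type is just an example. The swfilt bit position follows the
"config2:0" format added by this series, and the ":u" modifier is taken
to mean exclude_kernel plus exclude_hv.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* format/swfilt is "config2:0" in this series. */
#define SWFILT_BIT (1ULL << 0)

/*
 * Fill in an attribute for a user-only IBS op event.
 * pmu_type: value read from /sys/bus/event_source/devices/ibs_op/type.
 * period:   sample period chosen by the caller.
 */
void ibs_op_user_attr(struct perf_event_attr *attr, unsigned int pmu_type,
		      unsigned long long period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = pmu_type;
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr->config2 = SWFILT_BIT;	/* ask for software filtering */
	attr->exclude_kernel = 1;	/* what the ":u" modifier requests */
	attr->exclude_hv = 1;
	attr->disabled = 1;		/* enable later via ioctl() */
}

/* Open the event on the calling thread, on any CPU. Returns an fd or -1. */
long ibs_op_open_self(struct perf_event_attr *attr)
{
	return syscall(SYS_perf_event_open, attr, 0, -1, -1, 0);
}
```

Without the swfilt bit, the same attribute would be rejected by the IBS
driver because of the exclude bits.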
Let me know what you think.
Thanks,
Namhyung
Namhyung Kim (2):
perf/core: Export perf_exclude_event()
perf/x86: Relax privilege filter restriction on AMD IBS
arch/s390/kernel/perf_cpum_sf.c | 6 ++--
arch/x86/events/amd/ibs.c | 59 +++++++++++++++++++++++----------
include/linux/perf_event.h | 6 ++++
kernel/events/core.c | 3 +-
4 files changed, 51 insertions(+), 23 deletions(-)
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH v5 1/2] perf/core: Export perf_exclude_event()
From: Namhyung Kim @ 2024-10-28 20:01 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Thomas Richter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
linux-s390
While at it, rename the same function in s390 cpum_sf PMU.
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/s390/kernel/perf_cpum_sf.c | 6 +++---
include/linux/perf_event.h | 6 ++++++
kernel/events/core.c | 3 +--
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 5b765e3ccf0cadc8..d1398b23113b5b1a 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -996,7 +996,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
cpuhw->flags &= ~PMU_F_ENABLED;
}
-/* perf_exclude_event() - Filter event
+/* perf_event_exclude() - Filter event
* @event: The perf event
* @regs: pt_regs structure
* @sde_regs: Sample-data-entry (sde) regs structure
@@ -1005,7 +1005,7 @@ static void cpumsf_pmu_disable(struct pmu *pmu)
*
* Return non-zero if the event shall be excluded.
*/
-static int perf_exclude_event(struct perf_event *event, struct pt_regs *regs,
+static int perf_event_exclude(struct perf_event *event, struct pt_regs *regs,
struct perf_sf_sde_regs *sde_regs)
{
if (event->attr.exclude_user && user_mode(regs))
@@ -1088,7 +1088,7 @@ static int perf_push_sample(struct perf_event *event,
data.tid_entry.pid = basic->hpp & LPP_PID_MASK;
overflow = 0;
- if (perf_exclude_event(event, &regs, sde_regs))
+ if (perf_event_exclude(event, &regs, sde_regs))
goto out;
if (perf_event_overflow(event, &data, &regs)) {
overflow = 1;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fb908843f209288d..68c5001ea3102581 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1648,6 +1648,8 @@ static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
}
+extern int perf_exclude_event(struct perf_event *event, struct pt_regs *regs);
+
extern void perf_event_init(void);
extern void perf_tp_event(u16 event_type, u64 count, void *record,
int entry_size, struct pt_regs *regs,
@@ -1831,6 +1833,10 @@ static inline u64 perf_event_pause(struct perf_event *event, bool reset)
{
return 0;
}
+static inline int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
+{
+ return 0;
+}
#endif
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e3589c4287cb458c..6960c15f85b1a5ad 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9989,8 +9989,7 @@ static void perf_swevent_event(struct perf_event *event, u64 nr,
perf_swevent_overflow(event, 0, data, regs);
}
-static int perf_exclude_event(struct perf_event *event,
- struct pt_regs *regs)
+int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
{
if (event->hw.state & PERF_HES_STOPPED)
return 1;
--
2.47.0.163.g1226f6d8fa-goog
* [PATCH v5 2/2] perf/x86: Relax privilege filter restriction on AMD IBS
From: Namhyung Kim @ 2024-10-28 20:01 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria,
Sandipan Das, Ananth Narayan
While IBS is available for per-thread profiling, regular users still
cannot open an event due to the default paranoid setting (2), which
doesn't allow unprivileged users to get kernel samples. That means
they need to set the exclude_kernel bit in the attribute, but the IBS
driver would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is
not what we want and I've been getting requests to fix this issue.
This should be done in the hardware, but until we get the HW fix we may
allow exclude_{kernel,user,hv} in the attribute and silently drop the
samples in the PMU IRQ handler. This can't guarantee the sampling
frequency, and some samples will be missed even with a fixed period.
Not ideal, but that'd still be helpful to regular users.
To minimize the confusion, let's add a 'swfilt' bit to attr.config2,
which is exposed in the sysfs format directory so that users can figure
out whether the kernel supports software privilege filtering.
$ perf record -e ibs_op/swfilt=1/u true
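The open-time rules this patch adds to perf_ibs_init() can be modeled in
user space as follows. This is only a sketch: struct ibs_attr_model and
ibs_check_excludes() are hypothetical names that mirror the two checks
in the diff below, not kernel interfaces.

```c
#include <errno.h>
#include <stdbool.h>

/* attr.config2 bit 0, as defined by this patch. */
#define IBS_SW_FILTER_MASK 1ULL

/* Hypothetical stand-in for the relevant perf_event_attr fields. */
struct ibs_attr_model {
	unsigned long long config2;
	bool exclude_user, exclude_kernel, exclude_hv;
	bool exclude_host, exclude_guest, exclude_idle;
};

/* Return 0 if the exclusion bits are acceptable, -EINVAL otherwise. */
int ibs_check_excludes(const struct ibs_attr_model *a)
{
	/* These cannot be emulated in the IRQ handler, so always reject. */
	if (a->exclude_host || a->exclude_guest || a->exclude_idle)
		return -EINVAL;

	/* Privilege filters are accepted only when swfilt is requested. */
	if (!(a->config2 & IBS_SW_FILTER_MASK) &&
	    (a->exclude_kernel || a->exclude_user || a->exclude_hv))
		return -EINVAL;

	return 0;
}
```

So an event with exclude_kernel set but swfilt=0 still fails to open,
which keeps the old behavior for tools that don't know about the new
bit.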
This uses perf_exclude_event() which checks regs->cs. But it should be
fine because set_linear_ip() also updates the CS according to the RIP
provided by IBS.
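Why the regs->cs check works can be illustrated with a user-space model.
This is a sketch under assumptions: it assumes the x86-64 convention that
kernel addresses have the top bit set and that the low two bits of CS
encode the privilege level. The model_* names and the selector values
are illustrative only, not taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative selector values; only the low two bits matter here. */
#define MODEL_KERNEL_CS 0x10u	/* privilege level 0 */
#define MODEL_USER_CS   0x33u	/* privilege level 3 */

struct regs_model {
	uint64_t ip;
	uint64_t cs;
};

/* Assumed convention: kernel addresses live in the upper half. */
bool model_kernel_ip(uint64_t ip)
{
	return (ip >> 63) != 0;
}

/* Model of set_linear_ip(): CS is derived from the sampled address. */
void model_set_linear_ip(struct regs_model *regs, uint64_t ip)
{
	regs->cs = model_kernel_ip(ip) ? MODEL_KERNEL_CS : MODEL_USER_CS;
	regs->ip = ip;
}

/* Model of the privilege check: non-zero low bits mean user mode. */
bool model_user_mode(const struct regs_model *regs)
{
	return (regs->cs & 3) != 0;
}
```

With CS rewritten this way, the filter decision follows the precise RIP
from the IBS record rather than the privilege level at interrupt time.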
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/x86/events/amd/ibs.c | 59 +++++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 18 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d6243e4..d89622880a9fbbb9 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -31,6 +31,8 @@ static u32 ibs_caps;
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
+/* attr.config2 */
+#define IBS_SW_FILTER_MASK 1
/*
* IBS states:
@@ -290,6 +292,16 @@ static int perf_ibs_init(struct perf_event *event)
if (has_branch_stack(event))
return -EOPNOTSUPP;
+ /* handle exclude_{user,kernel} in the IRQ handler */
+ if (event->attr.exclude_host || event->attr.exclude_guest ||
+ event->attr.exclude_idle)
+ return -EINVAL;
+
+ if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ (event->attr.exclude_kernel || event->attr.exclude_user ||
+ event->attr.exclude_hv))
+ return -EINVAL;
+
ret = validate_group(event);
if (ret)
return ret;
@@ -550,24 +562,14 @@ static struct attribute *attrs_empty[] = {
NULL,
};
-static struct attribute_group empty_format_group = {
- .name = "format",
- .attrs = attrs_empty,
-};
-
static struct attribute_group empty_caps_group = {
.name = "caps",
.attrs = attrs_empty,
};
-static const struct attribute_group *empty_attr_groups[] = {
- &empty_format_group,
- &empty_caps_group,
- NULL,
-};
-
PMU_FORMAT_ATTR(rand_en, "config:57");
PMU_FORMAT_ATTR(cnt_ctl, "config:19");
+PMU_FORMAT_ATTR(swfilt, "config2:0");
PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
@@ -578,8 +580,9 @@ zen4_ibs_extensions_is_visible(struct kobject *kobj, struct attribute *attr, int
return ibs_caps & IBS_CAPS_ZEN4 ? attr->mode : 0;
}
-static struct attribute *rand_en_attrs[] = {
+static struct attribute *fetch_attrs[] = {
&format_attr_rand_en.attr,
+ &format_attr_swfilt.attr,
NULL,
};
@@ -593,9 +596,9 @@ static struct attribute *zen4_ibs_extensions_attrs[] = {
NULL,
};
-static struct attribute_group group_rand_en = {
+static struct attribute_group group_fetch_formats = {
.name = "format",
- .attrs = rand_en_attrs,
+ .attrs = fetch_attrs,
};
static struct attribute_group group_fetch_l3missonly = {
@@ -611,7 +614,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
};
static const struct attribute_group *fetch_attr_groups[] = {
- &group_rand_en,
+ &group_fetch_formats,
&empty_caps_group,
NULL,
};
@@ -628,6 +631,11 @@ cnt_ctl_is_visible(struct kobject *kobj, struct attribute *attr, int i)
return ibs_caps & IBS_CAPS_OPCNT ? attr->mode : 0;
}
+static struct attribute *op_attrs[] = {
+ &format_attr_swfilt.attr,
+ NULL,
+};
+
static struct attribute *cnt_ctl_attrs[] = {
&format_attr_cnt_ctl.attr,
NULL,
@@ -638,6 +646,11 @@ static struct attribute *op_l3missonly_attrs[] = {
NULL,
};
+static struct attribute_group group_op_formats = {
+ .name = "format",
+ .attrs = op_attrs,
+};
+
static struct attribute_group group_cnt_ctl = {
.name = "format",
.attrs = cnt_ctl_attrs,
@@ -650,6 +663,12 @@ static struct attribute_group group_op_l3missonly = {
.is_visible = zen4_ibs_extensions_is_visible,
};
+static const struct attribute_group *op_attr_groups[] = {
+ &group_op_formats,
+ &empty_caps_group,
+ NULL,
+};
+
static const struct attribute_group *op_attr_update[] = {
&group_cnt_ctl,
&group_op_l3missonly,
@@ -667,7 +686,6 @@ static struct perf_ibs perf_ibs_fetch = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSFETCHCTL,
.config_mask = IBS_FETCH_CONFIG_MASK,
@@ -691,7 +709,6 @@ static struct perf_ibs perf_ibs_op = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSOPCTL,
.config_mask = IBS_OP_CONFIG_MASK,
@@ -1111,6 +1128,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
regs.flags |= PERF_EFLAGS_EXACT;
}
+ if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ perf_exclude_event(event, &regs)) {
+ throttle = perf_event_account_interrupt(event);
+ goto out;
+ }
+
if (event->attr.sample_type & PERF_SAMPLE_RAW) {
raw = (struct perf_raw_record){
.frag = {
@@ -1228,7 +1251,7 @@ static __init int perf_ibs_op_init(void)
if (ibs_caps & IBS_CAPS_ZEN4)
perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
- perf_ibs_op.pmu.attr_groups = empty_attr_groups;
+ perf_ibs_op.pmu.attr_groups = op_attr_groups;
perf_ibs_op.pmu.attr_update = op_attr_update;
return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
--
2.47.0.163.g1226f6d8fa-goog
* Re: [PATCH v5 0/2] perf: Relax privilege restriction on AMD IBS
From: Ravi Bangoria @ 2024-10-30 9:52 UTC
To: Namhyung Kim, Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Sandipan Das,
Ravi Bangoria
On 29-Oct-24 1:31 AM, Namhyung Kim wrote:
> Hello,
>
> This is v5 to allow AMD IBS to regular users on the default settings
> where kernel-level profiling is disabled (perf_event_paranoid=2).
> Currently AMD IBS doesn't allow any kind of exclusion in the event
> attribute. But users needs to set attr.exclude_kernel to open an
> event on such an environment.
For the series:
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
* Re: [PATCH v5 0/2] perf: Relax privilege restriction on AMD IBS
From: Namhyung Kim @ 2024-11-08 6:47 UTC
To: Ravi Bangoria, Peter Zijlstra, Ingo Molnar
Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Sandipan Das
On Wed, Oct 30, 2024 at 03:22:02PM +0530, Ravi Bangoria wrote:
> On 29-Oct-24 1:31 AM, Namhyung Kim wrote:
> > Hello,
> >
> > This is v5 to allow AMD IBS to regular users on the default settings
> > where kernel-level profiling is disabled (perf_event_paranoid=2).
> > Currently AMD IBS doesn't allow any kind of exclusion in the event
> > attribute. But users needs to set attr.exclude_kernel to open an
> > event on such an environment.
>
> For the series:
>
> Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Thanks for your review!
Peter and Ingo, can you please take a look at this?
Thanks,
Namhyung