public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC/PATCH 0/4] perf: Relax privilege restriction on AMD IBS (v2)
@ 2024-08-30 23:29 Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 1/4] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Namhyung Kim @ 2024-08-30 23:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

Hello,

This is RFC v2 to allow AMD IBS to regular users on the default settings
where kernel-level profiling is disabled.  Currently AMD IBS doesn't
allow any kind of exclusion in the event attribute.  But users needs to
set attr.exclude_kernel to open an event on such an environment.

v2) changes

* add PERF_FORMAT_DROPPED
* account dropped sw events and from BPF handler
* use precise RIP from IBS record

v1) https://lore.kernel.org/lkml/20240822230816.564262-1-namhyung@kernel.org/

While IBS doesn't support hardware level privilege filters, the kernel
can allow the event and drop samples belongs to the kernel like in the
software events.

In order to count those dropped samples correctly, I'd propose a new
read format PERF_FORMAT_DROPPED same as we did for the lost samples.
With this, it can count dropped samples in the software events and
from the BPF overflow handler as well.

Let me know what you think.

Thanks,
Namhyung


Namhyung Kim (4):
  perf/core: Add PERF_FORMAT_DROPPED
  perf/core: Export perf_exclude_event()
  perf/core: Account dropped samples from BPF
  perf/x86: Relax privilege filter restriction on AMD IBS

 arch/x86/events/amd/ibs.c       | 13 +++++++++++--
 include/linux/perf_event.h      |  3 +++
 include/uapi/linux/perf_event.h |  5 ++++-
 kernel/events/core.c            | 27 ++++++++++++++++++++++-----
 4 files changed, 40 insertions(+), 8 deletions(-)

-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC/PATCH 1/4] perf/core: Add PERF_FORMAT_DROPPED
  2024-08-30 23:29 [RFC/PATCH 0/4] perf: Relax privilege restriction on AMD IBS (v2) Namhyung Kim
@ 2024-08-30 23:29 ` Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 2/4] perf/core: Export perf_exclude_event() Namhyung Kim
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2024-08-30 23:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

When a perf_event is dropped due to some kind of (SW-based) filter, it
won't generate sample data.  For example, software events drops samples
when it doesn't match to privilege from exclude_{user,kernel}.

In order to account such dropped samples, add a new counter in the
perf_event, and let users can read(2) the number with the new
PERF_FORMAT_DROPPED like the lost sample count.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h |  5 ++++-
 kernel/events/core.c            | 12 ++++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 701549967c18..955d39543398 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,6 +809,7 @@ struct perf_event {
 	u64				id;
 
 	atomic64_t			lost_samples;
+	atomic64_t			dropped_samples;
 
 	u64				(*clock)(void);
 	perf_overflow_handler_t		overflow_handler;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 58daf6156fd0..6f19d4f74823 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -347,6 +347,7 @@ enum {
  *	  { u64		time_running; } && PERF_FORMAT_TOTAL_TIME_RUNNING
  *	  { u64		id;           } && PERF_FORMAT_ID
  *	  { u64		lost;         } && PERF_FORMAT_LOST
+ *	  { u64		dropped;      } && PERF_FORMAT_DROPPED
  *	} && !PERF_FORMAT_GROUP
  *
  *	{ u64		nr;
@@ -355,6 +356,7 @@ enum {
  *	  { u64		value;
  *	    { u64	id;           } && PERF_FORMAT_ID
  *	    { u64	lost;         } && PERF_FORMAT_LOST
+ *	    { u64	dropped;      } && PERF_FORMAT_DROPPED
  *	  }		cntr[nr];
  *	} && PERF_FORMAT_GROUP
  * };
@@ -365,8 +367,9 @@ enum perf_event_read_format {
 	PERF_FORMAT_ID				= 1U << 2,
 	PERF_FORMAT_GROUP			= 1U << 3,
 	PERF_FORMAT_LOST			= 1U << 4,
+	PERF_FORMAT_DROPPED			= 1U << 5,
 
-	PERF_FORMAT_MAX = 1U << 5,		/* non-ABI */
+	PERF_FORMAT_MAX = 1U << 6,		/* non-ABI */
 };
 
 #define PERF_ATTR_SIZE_VER0	64	/* sizeof first published struct */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c6a720f41225..4d72538628ee 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5679,6 +5679,8 @@ static int __perf_read_group_add(struct perf_event *leader,
 		values[n++] = primary_event_id(leader);
 	if (read_format & PERF_FORMAT_LOST)
 		values[n++] = atomic64_read(&leader->lost_samples);
+	if (read_format & PERF_FORMAT_DROPPED)
+		values[n++] = atomic64_read(&leader->dropped_samples);
 
 	for_each_sibling_event(sub, leader) {
 		values[n++] += perf_event_count(sub, false);
@@ -5686,6 +5688,8 @@ static int __perf_read_group_add(struct perf_event *leader,
 			values[n++] = primary_event_id(sub);
 		if (read_format & PERF_FORMAT_LOST)
 			values[n++] = atomic64_read(&sub->lost_samples);
+		if (read_format & PERF_FORMAT_DROPPED)
+			values[n++] = atomic64_read(&sub->dropped_samples);
 	}
 
 unlock:
@@ -5751,6 +5755,8 @@ static int perf_read_one(struct perf_event *event,
 		values[n++] = primary_event_id(event);
 	if (read_format & PERF_FORMAT_LOST)
 		values[n++] = atomic64_read(&event->lost_samples);
+	if (read_format & PERF_FORMAT_DROPPED)
+		values[n++] = atomic64_read(&event->dropped_samples);
 
 	if (copy_to_user(buf, values, n * sizeof(u64)))
 		return -EFAULT;
@@ -7348,6 +7354,8 @@ static void perf_output_read_one(struct perf_output_handle *handle,
 		values[n++] = primary_event_id(event);
 	if (read_format & PERF_FORMAT_LOST)
 		values[n++] = atomic64_read(&event->lost_samples);
+	if (read_format & PERF_FORMAT_DROPPED)
+		values[n++] = atomic64_read(&event->dropped_samples);
 
 	__output_copy(handle, values, n * sizeof(u64));
 }
@@ -7386,6 +7394,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 		values[n++] = primary_event_id(leader);
 	if (read_format & PERF_FORMAT_LOST)
 		values[n++] = atomic64_read(&leader->lost_samples);
+	if (read_format & PERF_FORMAT_DROPPED)
+		values[n++] = atomic64_read(&leader->dropped_samples);
 
 	__output_copy(handle, values, n * sizeof(u64));
 
@@ -7401,6 +7411,8 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 			values[n++] = primary_event_id(sub);
 		if (read_format & PERF_FORMAT_LOST)
 			values[n++] = atomic64_read(&sub->lost_samples);
+		if (read_format & PERF_FORMAT_DROPPED)
+			values[n++] = atomic64_read(&sub->dropped_samples);
 
 		__output_copy(handle, values, n * sizeof(u64));
 	}
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC/PATCH 2/4] perf/core: Export perf_exclude_event()
  2024-08-30 23:29 [RFC/PATCH 0/4] perf: Relax privilege restriction on AMD IBS (v2) Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 1/4] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
@ 2024-08-30 23:29 ` Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 3/4] perf/core: Account dropped samples from BPF Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
  3 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2024-08-30 23:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

And increase the dropped_sample count when it returns 1.  Now it can
track how many samples are dropped due to the privilege filters in
software events.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/perf_event.h |  2 ++
 kernel/events/core.c       | 11 +++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 955d39543398..c93673a8d66a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1634,6 +1634,8 @@ static inline int perf_allow_tracepoint(struct perf_event_attr *attr)
 	return security_perf_event_open(attr, PERF_SECURITY_TRACEPOINT);
 }
 
+extern int perf_exclude_event(struct perf_event *event, struct pt_regs *regs);
+
 extern void perf_event_init(void);
 extern void perf_tp_event(u16 event_type, u64 count, void *record,
 			  int entry_size, struct pt_regs *regs,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4d72538628ee..8250e76f6335 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9978,18 +9978,21 @@ static void perf_swevent_event(struct perf_event *event, u64 nr,
 	perf_swevent_overflow(event, 0, data, regs);
 }
 
-static int perf_exclude_event(struct perf_event *event,
-			      struct pt_regs *regs)
+int perf_exclude_event(struct perf_event *event, struct pt_regs *regs)
 {
 	if (event->hw.state & PERF_HES_STOPPED)
 		return 1;
 
 	if (regs) {
-		if (event->attr.exclude_user && user_mode(regs))
+		if (event->attr.exclude_user && user_mode(regs)) {
+			atomic64_inc(&event->dropped_samples);
 			return 1;
+		}
 
-		if (event->attr.exclude_kernel && !user_mode(regs))
+		if (event->attr.exclude_kernel && !user_mode(regs)) {
+			atomic64_inc(&event->dropped_samples);
 			return 1;
+		}
 	}
 
 	return 0;
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC/PATCH 3/4] perf/core: Account dropped samples from BPF
  2024-08-30 23:29 [RFC/PATCH 0/4] perf: Relax privilege restriction on AMD IBS (v2) Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 1/4] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 2/4] perf/core: Export perf_exclude_event() Namhyung Kim
@ 2024-08-30 23:29 ` Namhyung Kim
  2024-08-30 23:29 ` [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
  3 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2024-08-30 23:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

Like in the software events, the BPF overflow handler can drop samples
by returning 0.  Let's count the dropped samples here too.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8250e76f6335..ba1f6b51ea26 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9808,8 +9808,10 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	ret = __perf_event_account_interrupt(event, throttle);
 
-	if (event->prog && !bpf_overflow_handler(event, data, regs))
+	if (event->prog && !bpf_overflow_handler(event, data, regs)) {
+		atomic64_inc(&event->dropped_samples);
 		return ret;
+	}
 
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
  2024-08-30 23:29 [RFC/PATCH 0/4] perf: Relax privilege restriction on AMD IBS (v2) Namhyung Kim
                   ` (2 preceding siblings ...)
  2024-08-30 23:29 ` [RFC/PATCH 3/4] perf/core: Account dropped samples from BPF Namhyung Kim
@ 2024-08-30 23:29 ` Namhyung Kim
  2024-09-02  9:12   ` Peter Zijlstra
  3 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2024-08-30 23:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

While IBS is available for per-thread profiling, still regular users
cannot open an event due to the default paranoid setting (2) which
doesn't allow unprivileged users to get kernel samples.  That means
it needs to set exclude_kernel bit in the attribute but IBS driver
would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
we want and I've been getting requests to fix this issue.

This should be done in the hardware, but until we get the HW fix we may
allow exclude_{kernel,user} in the attribute and silently drop the
samples in the PMU IRQ handler.  It won't guarantee the sampling
frequency or even it'd miss some with fixed period too.  Not ideal,
but that'd still be helpful to regular users.

This uses perf_exclude_event() which checks regs->cs.  But it should be
fine because set_linear_ip() also updates the CS according to the RIP
provided by IBS.

Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 arch/x86/events/amd/ibs.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d62..e40e2255239a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
 	if (has_branch_stack(event))
 		return -EOPNOTSUPP;
 
+	/* handle exclude_{user,kernel} in the IRQ handler */
+	if (event->attr.exclude_hv || event->attr.exclude_idle ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
 	ret = validate_group(event);
 	if (ret)
 		return ret;
@@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
 		.start		= perf_ibs_start,
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
-		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 	},
 	.msr			= MSR_AMD64_IBSFETCHCTL,
 	.config_mask		= IBS_FETCH_CONFIG_MASK,
@@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
 		.start		= perf_ibs_start,
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
-		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 	},
 	.msr			= MSR_AMD64_IBSOPCTL,
 	.config_mask		= IBS_OP_CONFIG_MASK,
@@ -1111,6 +1114,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		regs.flags |= PERF_EFLAGS_EXACT;
 	}
 
+	if (perf_exclude_event(event, &regs)) {
+		throttle = perf_event_account_interrupt(event);
+		atomic64_inc(&event->dropped_samples);
+		goto out;
+	}
+
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		raw = (struct perf_raw_record){
 			.frag = {
-- 
2.46.0.469.g59c65b2a67-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
  2024-08-30 23:29 ` [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
@ 2024-09-02  9:12   ` Peter Zijlstra
  2024-09-02 17:30     ` Namhyung Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2024-09-02  9:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples.  That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> we want and I've been getting requests to fix this issue.
> 
> This should be done in the hardware, but until we get the HW fix we may
> allow exclude_{kernel,user} in the attribute and silently drop the
> samples in the PMU IRQ handler.  It won't guarantee the sampling
> frequency or even it'd miss some with fixed period too.  Not ideal,
> but that'd still be helpful to regular users.

Urgh.... this is really rather bad. And I'm sure a bunch of people are
going to be spending a lot of time trying to figure out why their
results don't make sense.

I realize that having entry hooks to disable/enable the counters is also
not going to happen, this has a ton of problems too.

Also, that PMU passthrough patch set has guest hooks, so you can
actually do the exclude_host/guest nonsense with those, right?

> This uses perf_exclude_event() which checks regs->cs.  But it should be
> fine because set_linear_ip() also updates the CS according to the RIP
> provided by IBS.
> 
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  arch/x86/events/amd/ibs.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index e91970b01d62..e40e2255239a 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
>  	if (has_branch_stack(event))
>  		return -EOPNOTSUPP;
>  
> +	/* handle exclude_{user,kernel} in the IRQ handler */
> +	if (event->attr.exclude_hv || event->attr.exclude_idle ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
>  	ret = validate_group(event);
>  	if (ret)
>  		return ret;
> @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
>  		.start		= perf_ibs_start,
>  		.stop		= perf_ibs_stop,
>  		.read		= perf_ibs_read,
> -		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>  	},
>  	.msr			= MSR_AMD64_IBSFETCHCTL,
>  	.config_mask		= IBS_FETCH_CONFIG_MASK,
> @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
>  		.start		= perf_ibs_start,
>  		.stop		= perf_ibs_stop,
>  		.read		= perf_ibs_read,
> -		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>  	},
>  	.msr			= MSR_AMD64_IBSOPCTL,
>  	.config_mask		= IBS_OP_CONFIG_MASK,
> @@ -1111,6 +1114,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  		regs.flags |= PERF_EFLAGS_EXACT;
>  	}
>  
> +	if (perf_exclude_event(event, &regs)) {
> +		throttle = perf_event_account_interrupt(event);
> +		atomic64_inc(&event->dropped_samples);
> +		goto out;
> +	}
> +
>  	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
>  		raw = (struct perf_raw_record){
>  			.frag = {
> -- 
> 2.46.0.469.g59c65b2a67-goog
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
  2024-09-02  9:12   ` Peter Zijlstra
@ 2024-09-02 17:30     ` Namhyung Kim
  2024-09-03  8:54       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2024-09-02 17:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

On Mon, Sep 2, 2024 at 2:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples.  That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> > we want and I've been getting requests to fix this issue.
> >
> > This should be done in the hardware, but until we get the HW fix we may
> > allow exclude_{kernel,user} in the attribute and silently drop the
> > samples in the PMU IRQ handler.  It won't guarantee the sampling
> > frequency or even it'd miss some with fixed period too.  Not ideal,
> > but that'd still be helpful to regular users.
>
> Urgh.... this is really rather bad. And I'm sure a bunch of people are
> going to be spending a lot of time trying to figure out why their
> results don't make sense.

I agree it can be confusing but there are use cases where regular users
want IBS information like memory data source, data address and so on.
Also I realized that software events like cpu-clock use the same logic to
discard samples by privilege mode already.

>
> I realize that having entry hooks to disable/enable the counters is also
> not going to happen, this has a ton of problems too.

Do you mean kernel/user mode change hook?  I guess it'd be too costly.

>
> Also, that PMU passthrough patch set has guest hooks, so you can
> actually do the exclude_host/guest nonsense with those, right?

Oh.. this patch is about exclude_user/kernel not host/guest.  Anyway
it'd be great if IBS could support the guest hooks and allow the exclude
bits.

Thanks,
Namhyung

>
> > This uses perf_exclude_event() which checks regs->cs.  But it should be
> > fine because set_linear_ip() also updates the CS according to the RIP
> > provided by IBS.
> >
> > Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  arch/x86/events/amd/ibs.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> > index e91970b01d62..e40e2255239a 100644
> > --- a/arch/x86/events/amd/ibs.c
> > +++ b/arch/x86/events/amd/ibs.c
> > @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
> >       if (has_branch_stack(event))
> >               return -EOPNOTSUPP;
> >
> > +     /* handle exclude_{user,kernel} in the IRQ handler */
> > +     if (event->attr.exclude_hv || event->attr.exclude_idle ||
> > +         event->attr.exclude_host || event->attr.exclude_guest)
> > +             return -EINVAL;
> > +
> >       ret = validate_group(event);
> >       if (ret)
> >               return ret;
> > @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
> >               .start          = perf_ibs_start,
> >               .stop           = perf_ibs_stop,
> >               .read           = perf_ibs_read,
> > -             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> >       },
> >       .msr                    = MSR_AMD64_IBSFETCHCTL,
> >       .config_mask            = IBS_FETCH_CONFIG_MASK,
> > @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
> >               .start          = perf_ibs_start,
> >               .stop           = perf_ibs_stop,
> >               .read           = perf_ibs_read,
> > -             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> >       },
> >       .msr                    = MSR_AMD64_IBSOPCTL,
> >       .config_mask            = IBS_OP_CONFIG_MASK,
> > @@ -1111,6 +1114,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> >               regs.flags |= PERF_EFLAGS_EXACT;
> >       }
> >
> > +     if (perf_exclude_event(event, &regs)) {
> > +             throttle = perf_event_account_interrupt(event);
> > +             atomic64_inc(&event->dropped_samples);
> > +             goto out;
> > +     }
> > +
> >       if (event->attr.sample_type & PERF_SAMPLE_RAW) {
> >               raw = (struct perf_raw_record){
> >                       .frag = {
> > --
> > 2.46.0.469.g59c65b2a67-goog
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
  2024-09-02 17:30     ` Namhyung Kim
@ 2024-09-03  8:54       ` Peter Zijlstra
       [not found]         ` <CAM9d7ch8fwk-o7W6KrTgtJ5n8-oVMGqzxvW_zd_hrcWFoE2AHg@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2024-09-03  8:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

On Mon, Sep 02, 2024 at 10:30:19AM -0700, Namhyung Kim wrote:
> On Mon, Sep 2, 2024 at 2:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
> > > While IBS is available for per-thread profiling, still regular users
> > > cannot open an event due to the default paranoid setting (2) which
> > > doesn't allow unprivileged users to get kernel samples.  That means
> > > it needs to set exclude_kernel bit in the attribute but IBS driver
> > > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> > > we want and I've been getting requests to fix this issue.
> > >
> > > This should be done in the hardware, but until we get the HW fix we may
> > > allow exclude_{kernel,user} in the attribute and silently drop the
> > > samples in the PMU IRQ handler.  It won't guarantee the sampling
> > > frequency or even it'd miss some with fixed period too.  Not ideal,
> > > but that'd still be helpful to regular users.
> >
> > Urgh.... this is really rather bad. And I'm sure a bunch of people are
> > going to be spending a lot of time trying to figure out why their
> > results don't make sense.
> 
> I agree it can be confusing but there are use cases where regular users
> want IBS information like memory data source, data address and so on.

Sure, but I'm a bit worried about users not being aware of this
trickery. This makes IBS events that have exclude_kernel=1 behave
significantly different from those that don't have it.

At the very least you should kill the IBS forward in amd_pmu_hw_config()
when this is active. But perhaps we should also emit a one time
KERN_INFO message when such an event gets created?

> Also I realized that software events like cpu-clock use the same logic to
> discard samples by privilege mode already.

Right, but everybody expects the software things to suck :-) And they
always suck, unconditionally.

While the IBS thing only sucks when you use exclude_[user,kernel]
things. Stealth suckage is bad and enrages people.

> > I realize that having entry hooks to disable/enable the counters is also
> > not going to happen, this has a ton of problems too.
> 
> Do you mean kernel/user mode change hook?  I guess it'd be too costly.

Yep, insanely expensive :-)

> > Also, that PMU passthrough patch set has guest hooks, so you can
> > actually do the exclude_host/guest nonsense with those, right?
> 
> Oh.. this patch is about exclude_user/kernel not host/guest.  Anyway
> it'd be great if IBS could support the guest hooks and allow the exclude
> bits.

Right, but since your other patchset about disabling exclude_guest
because IBS don't support it I figured I'd mention that it would be
fairly simple to fix.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
       [not found]         ` <CAM9d7ch8fwk-o7W6KrTgtJ5n8-oVMGqzxvW_zd_hrcWFoE2AHg@mail.gmail.com>
@ 2024-09-04  6:53           ` Ravi Bangoria
  2024-09-04 11:39           ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Ravi Bangoria @ 2024-09-04  6:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Kan Liang, Mark Rutland,
	Alexander Shishkin, Arnaldo Carvalho de Melo, LKML,
	Stephane Eranian, Ravi Bangoria, Ananth Narayan, Sandipan Das

Hi Namhyung,

On 03-Sep-24 10:04 PM, Namhyung Kim wrote:
> On Tue, Sep 3, 2024 at 1:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Mon, Sep 02, 2024 at 10:30:19AM -0700, Namhyung Kim wrote:
>>> On Mon, Sep 2, 2024 at 2:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
>>>>> While IBS is available for per-thread profiling, still regular users
>>>>> cannot open an event due to the default paranoid setting (2) which
>>>>> doesn't allow unprivileged users to get kernel samples.  That means
>>>>> it needs to set exclude_kernel bit in the attribute but IBS driver
>>>>> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
>>>>> we want and I've been getting requests to fix this issue.
>>>>>
>>>>> This should be done in the hardware, but until we get the HW fix we may
>>>>> allow exclude_{kernel,user} in the attribute and silently drop the
>>>>> samples in the PMU IRQ handler.  It won't guarantee the sampling
>>>>> frequency or even it'd miss some with fixed period too.  Not ideal,
>>>>> but that'd still be helpful to regular users.
>>>>
>>>> Urgh.... this is really rather bad. And I'm sure a bunch of people are
>>>> going to be spending a lot of time trying to figure out why their
>>>> results don't make sense.
>>>
>>> I agree it can be confusing but there are use cases where regular users
>>> want IBS information like memory data source, data address and so on.
>>
>> Sure, but I'm a bit worried about users not being aware of this
>> trickery. This makes IBS events that have exclude_kernel=1 behave
>> significantly different from those that don't have it.
>>
>> At the very least you should kill the IBS forward in amd_pmu_hw_config()
>> when this is active. But perhaps we should also emit a one time
>> KERN_INFO message when such an event gets created?
> 
> What about adding a pseudo config or format for a special event that
> allows exclude_kernel=1 and drops samples?  Maybe we could use
> attr.config = 0xffffffff or attr.config=1 or something.

attr.config is directly mapped to bitmaps of IBS_OP_CTL. We can't abuse
reserved bits also since they might become valid control bits in future.

Maybe use config2 or config3 which are ignored by IBS driver currently.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
       [not found]         ` <CAM9d7ch8fwk-o7W6KrTgtJ5n8-oVMGqzxvW_zd_hrcWFoE2AHg@mail.gmail.com>
  2024-09-04  6:53           ` Ravi Bangoria
@ 2024-09-04 11:39           ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2024-09-04 11:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Kan Liang, Mark Rutland, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Stephane Eranian, Ravi Bangoria

On Tue, Sep 03, 2024 at 09:34:58AM -0700, Namhyung Kim wrote:

> > While the IBS thing only sucks when you use exclude_[user,kernel]
> > things. Stealth suckage is bad and enrages people.
> 
> Right, let's make it explicit like the above.

Given Ravi's concern, perhaps a whole second PMU instance 'ibs_swfilt'
or so? Might have more problems than its worth though, but perhaps
something worth exploring for a bit.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-09-04 11:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-30 23:29 [RFC/PATCH 0/4] perf: Relax privilege restriction on AMD IBS (v2) Namhyung Kim
2024-08-30 23:29 ` [RFC/PATCH 1/4] perf/core: Add PERF_FORMAT_DROPPED Namhyung Kim
2024-08-30 23:29 ` [RFC/PATCH 2/4] perf/core: Export perf_exclude_event() Namhyung Kim
2024-08-30 23:29 ` [RFC/PATCH 3/4] perf/core: Account dropped samples from BPF Namhyung Kim
2024-08-30 23:29 ` [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS Namhyung Kim
2024-09-02  9:12   ` Peter Zijlstra
2024-09-02 17:30     ` Namhyung Kim
2024-09-03  8:54       ` Peter Zijlstra
     [not found]         ` <CAM9d7ch8fwk-o7W6KrTgtJ5n8-oVMGqzxvW_zd_hrcWFoE2AHg@mail.gmail.com>
2024-09-04  6:53           ` Ravi Bangoria
2024-09-04 11:39           ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox