linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Fix i915 pmu on bind/unbind
@ 2024-07-22 21:06 Lucas De Marchi
  2024-07-22 21:06 ` [PATCH 1/7] perf/core: Add pmu get/put Lucas De Marchi
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

This is intended to fix the pmu integration in i915 when the device
unbinds. I don't have the hardware to test, but I belive a similar issue
occurs with any driver using perf_pmu_unregister() if they can unbind
from the device - in drm subsystem, that would be the amd driver.

Previous attempts I could find:
1) https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/
2) https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/

I think (2) is a no-go as it tries to do everything from the i915 side
by force-closing the fd.

I believe this series is simpler* than (1), but that could could also be
a good alternative if we want to go with that approach.

First patch is to perf. The rest is to i915 that builds on that and
moves the unregister bit by bit to be done later, after the last
reference to i915 is dropped.  Peter/Ingo, could I get your opinion on
this or if (1) would be a good alternative? Thanks.

* simpler, but see downside mentioned in patch 6

Lucas De Marchi (7):
  perf/core: Add pmu get/put
  drm/i915/pmu: Fix crash due to use-after-free
  drm/i915/pmu: Use event_to_pmu()
  drm/i915/pmu: Drop is_igp()
  drm/i915/pmu: Let resource survive unbind
  drm/i915/pmu: Lazy unregister
  drm/i915/pmu: Do not set event_init to NULL

 drivers/gpu/drm/i915/i915_pmu.c | 103 ++++++++++++++++----------------
 include/linux/perf_event.h      |   3 +
 kernel/events/core.c            |  32 ++++++++--
 3 files changed, 81 insertions(+), 57 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/7] perf/core: Add pmu get/put
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-07-23 23:07   ` Ian Rogers
  2024-07-22 21:06 ` [PATCH 2/7] drm/i915/pmu: Fix crash due to use-after-free Lucas De Marchi
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

If a pmu is unregistered while there's an active event, perf will still
access the pmu via event->pmu, even after the event is destroyed. This
makes it difficult for drivers like i915 that take a reference on the
device when the event is created and put it when it's destroyed.
Currently the following use-after-free happens just after destroying the
event:

	BUG: KASAN: use-after-free in exclusive_event_destroy+0xd8/0xf0
	Read of size 4 at addr ffff88816e2bb63c by task perf/7748

Whenever and event is created, get a pmu reference to use in event->pmu
and just before calling module_put(), drop the reference..

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 include/linux/perf_event.h |  3 +++
 kernel/events/core.c       | 32 ++++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a5304ae8c654..7048a505e93c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -540,6 +540,9 @@ struct pmu {
 	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
 	 */
 	int (*check_period)		(struct perf_event *event, u64 value); /* optional */
+
+	struct pmu *(*get)		(struct pmu *pmu); /* optional: get a reference */
+	void (*put)			(struct pmu *pmu); /* optional: put a reference */
 };
 
 enum perf_addr_filter_action_t {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1b6f5dc7ed32..cc7541b644b0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5208,6 +5208,8 @@ static void perf_addr_filters_splice(struct perf_event *event,
 
 static void _free_event(struct perf_event *event)
 {
+	struct module *module;
+
 	irq_work_sync(&event->pending_irq);
 
 	unaccount_event(event);
@@ -5259,7 +5261,13 @@ static void _free_event(struct perf_event *event)
 		put_ctx(event->ctx);
 
 	exclusive_event_destroy(event);
-	module_put(event->pmu->module);
+
+	module = event->pmu->module;
+	event->pmu->put(event->pmu);
+	/* can't touch pmu anymore */
+	event->pmu = NULL;
+
+	module_put(module);
 
 	call_rcu(&event->rcu_head, free_event_rcu);
 }
@@ -11331,6 +11339,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
 	return 0;
 }
 
+static struct pmu *perf_pmu_nop_pmu(struct pmu *pmu)
+{
+	return pmu;
+}
+
 static int perf_event_nop_int(struct perf_event *event, u64 value)
 {
 	return 0;
@@ -11617,6 +11630,12 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 	if (!pmu->event_idx)
 		pmu->event_idx = perf_event_idx_default;
 
+	if (!pmu->get)
+		pmu->get = perf_pmu_nop_pmu;
+
+	if (!pmu->put)
+		pmu->put = perf_pmu_nop_void;
+
 	list_add_rcu(&pmu->entry, &pmus);
 	atomic_set(&pmu->exclusive_cnt, 0);
 	ret = 0;
@@ -11695,7 +11714,8 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 		BUG_ON(!ctx);
 	}
 
-	event->pmu = pmu;
+	event->pmu = pmu->get(pmu);
+
 	ret = pmu->event_init(event);
 
 	if (ctx)
@@ -11714,8 +11734,12 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 			event->destroy(event);
 	}
 
-	if (ret)
-		module_put(pmu->module);
+	if (ret) {
+		struct module *module = pmu->module;
+
+		pmu->put(pmu);
+		module_put(module);
+	}
 
 	return ret;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/7] drm/i915/pmu: Fix crash due to use-after-free
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
  2024-07-22 21:06 ` [PATCH 1/7] perf/core: Add pmu get/put Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-07-22 21:06 ` [PATCH 3/7] drm/i915/pmu: Use event_to_pmu() Lucas De Marchi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi, stable

When an i915 PMU counter is enabled and the driver is then unbound, the
PMU will be unregistered via perf_pmu_unregister(), however the event
will still be alive. i915 currently tries to deal with this situation
by:

	a) Marking the pmu as "closed" and shortcut the calls from perf
	b) Taking a reference from i915, that is put back when the event
	   is destroyed.
	c) Setting event_init to NULL to avoid any further event

(a) is ugly, but may be left as is since it protects not trying to
access the HW that is now gone. Unless a pmu driver can call
perf_pmu_unregister() and not receive any more calls, it's a necessary
ugliness.

(b) doesn't really work: when the event is destroyed and the i915 ref is
put it may free the i915 object, that contains the pmu, not only the
event. After event->destroy() callback, perf still expects the pmu
object to be alive.

Instead of pigging back on the event->destroy() to take and put the
device reference, implement the new get()/put() on the pmu object for
that purpose.

(c) is not entirely correct as from the perf POV it's not an optional
call: perf would just dereference the NULL pointer. However this also
protects other entrypoints in i915_pmu. A new event creation from perf
after the pmu has been unregistered should not be possible anyway:
perf_init_event() bails out when not finding the pmu. This may be
cleaned up later.

Cc: <stable@vger.kernel.org> # 5.11+
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 34 +++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 21eb0c5b320d..cb5f6471ec6e 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -514,15 +514,6 @@ static enum hrtimer_restart i915_sample(struct hrtimer *hrtimer)
 	return HRTIMER_RESTART;
 }
 
-static void i915_pmu_event_destroy(struct perf_event *event)
-{
-	struct i915_pmu *pmu = event_to_pmu(event);
-	struct drm_i915_private *i915 = pmu_to_i915(pmu);
-
-	drm_WARN_ON(&i915->drm, event->parent);
-
-	drm_dev_put(&i915->drm);
-}
 
 static int
 engine_event_status(struct intel_engine_cs *engine,
@@ -628,11 +619,6 @@ static int i915_pmu_event_init(struct perf_event *event)
 	if (ret)
 		return ret;
 
-	if (!event->parent) {
-		drm_dev_get(&i915->drm);
-		event->destroy = i915_pmu_event_destroy;
-	}
-
 	return 0;
 }
 
@@ -872,6 +858,24 @@ static int i915_pmu_event_event_idx(struct perf_event *event)
 	return 0;
 }
 
+static struct pmu *i915_pmu_get(struct pmu *base)
+{
+	struct i915_pmu *pmu = container_of(base, struct i915_pmu, base);
+	struct drm_i915_private *i915 = pmu_to_i915(pmu);
+
+	drm_dev_get(&i915->drm);
+
+	return base;
+}
+
+static void i915_pmu_put(struct pmu *base)
+{
+	struct i915_pmu *pmu = container_of(base, struct i915_pmu, base);
+	struct drm_i915_private *i915 = pmu_to_i915(pmu);
+
+	drm_dev_put(&i915->drm);
+}
+
 struct i915_str_attribute {
 	struct device_attribute attr;
 	const char *str;
@@ -1299,6 +1303,8 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	pmu->base.stop		= i915_pmu_event_stop;
 	pmu->base.read		= i915_pmu_event_read;
 	pmu->base.event_idx	= i915_pmu_event_event_idx;
+	pmu->base.get		= i915_pmu_get;
+	pmu->base.put		= i915_pmu_put;
 
 	ret = perf_pmu_register(&pmu->base, pmu->name, -1);
 	if (ret)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/7] drm/i915/pmu: Use event_to_pmu()
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
  2024-07-22 21:06 ` [PATCH 1/7] perf/core: Add pmu get/put Lucas De Marchi
  2024-07-22 21:06 ` [PATCH 2/7] drm/i915/pmu: Fix crash due to use-after-free Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-07-23  4:35   ` Dixit, Ashutosh
  2024-07-22 21:06 ` [PATCH 4/7] drm/i915/pmu: Drop is_igp() Lucas De Marchi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

i915 pointer is not needed in this function and all the others simply
calculate the i915_pmu container based on the event->pmu. Follow the
same logic as in other functions.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index cb5f6471ec6e..3a8bd11b87e7 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -820,15 +820,14 @@ static void i915_pmu_event_start(struct perf_event *event, int flags)
 
 static void i915_pmu_event_stop(struct perf_event *event, int flags)
 {
-	struct drm_i915_private *i915 =
-		container_of(event->pmu, typeof(*i915), pmu.base);
-	struct i915_pmu *pmu = &i915->pmu;
+	struct i915_pmu *pmu = event_to_pmu(event);
 
 	if (pmu->closed)
 		goto out;
 
 	if (flags & PERF_EF_UPDATE)
 		i915_pmu_event_read(event);
+
 	i915_pmu_disable(event);
 
 out:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/7] drm/i915/pmu: Drop is_igp()
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
                   ` (2 preceding siblings ...)
  2024-07-22 21:06 ` [PATCH 3/7] drm/i915/pmu: Use event_to_pmu() Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-07-22 23:25   ` Dixit, Ashutosh
  2024-07-23  7:52   ` Tvrtko Ursulin
  2024-07-22 21:06 ` [PATCH 5/7] drm/i915/pmu: Let resource survive unbind Lucas De Marchi
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

There's no reason to hardcode checking for integrated graphics on a
specific pci slot. That information is already available per platform an
can be checked with IS_DGFX().

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 3a8bd11b87e7..b5d14dd318e4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1235,17 +1235,6 @@ static void i915_pmu_unregister_cpuhp_state(struct i915_pmu *pmu)
 	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
 }
 
-static bool is_igp(struct drm_i915_private *i915)
-{
-	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-
-	/* IGP is 0000:00:02.0 */
-	return pci_domain_nr(pdev->bus) == 0 &&
-	       pdev->bus->number == 0 &&
-	       PCI_SLOT(pdev->devfn) == 2 &&
-	       PCI_FUNC(pdev->devfn) == 0;
-}
-
 void i915_pmu_register(struct drm_i915_private *i915)
 {
 	struct i915_pmu *pmu = &i915->pmu;
@@ -1269,7 +1258,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	pmu->cpuhp.cpu = -1;
 	init_rc6(pmu);
 
-	if (!is_igp(i915)) {
+	if (IS_DGFX(i915)) {
 		pmu->name = kasprintf(GFP_KERNEL,
 				      "i915_%s",
 				      dev_name(i915->drm.dev));
@@ -1323,7 +1312,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	pmu->base.event_init = NULL;
 	free_event_attributes(pmu);
 err_name:
-	if (!is_igp(i915))
+	if (IS_DGFX(i915))
 		kfree(pmu->name);
 err:
 	drm_notice(&i915->drm, "Failed to register PMU!\n");
@@ -1351,7 +1340,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 	perf_pmu_unregister(&pmu->base);
 	pmu->base.event_init = NULL;
 	kfree(pmu->base.attr_groups);
-	if (!is_igp(i915))
+	if (IS_DGFX(i915))
 		kfree(pmu->name);
 	free_event_attributes(pmu);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/7] drm/i915/pmu: Let resource survive unbind
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
                   ` (3 preceding siblings ...)
  2024-07-22 21:06 ` [PATCH 4/7] drm/i915/pmu: Drop is_igp() Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-07-23  7:58   ` Tvrtko Ursulin
  2024-07-22 21:06 ` [PATCH 6/7] drm/i915/pmu: Lazy unregister Lucas De Marchi
  2024-07-22 21:06 ` [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL Lucas De Marchi
  6 siblings, 1 reply; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

There's no need to free the resources during unbind. Since perf events
may still access them due to open events, it's safer to free them when
dropping the last i915 reference.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b5d14dd318e4..8708f905f4f4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/pm_runtime.h>
+#include <drm/drm_managed.h>
 
 #include "gt/intel_engine.h"
 #include "gt/intel_engine_pm.h"
@@ -1152,6 +1153,17 @@ static void free_event_attributes(struct i915_pmu *pmu)
 	pmu->pmu_attr = NULL;
 }
 
+static void free_pmu(struct drm_device *dev, void *res)
+{
+	struct i915_pmu *pmu = res;
+	struct drm_i915_private *i915 = pmu_to_i915(pmu);
+
+	free_event_attributes(pmu);
+	kfree(pmu->base.attr_groups);
+	if (IS_DGFX(i915))
+		kfree(pmu->name);
+}
+
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
 	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
@@ -1302,6 +1314,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
 	if (ret)
 		goto err_unreg;
 
+	if (drmm_add_action_or_reset(&i915->drm, free_pmu, pmu))
+		goto err_unreg;
+
 	return;
 
 err_unreg:
@@ -1336,11 +1351,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 	hrtimer_cancel(&pmu->timer);
 
 	i915_pmu_unregister_cpuhp_state(pmu);
-
 	perf_pmu_unregister(&pmu->base);
+
 	pmu->base.event_init = NULL;
-	kfree(pmu->base.attr_groups);
-	if (IS_DGFX(i915))
-		kfree(pmu->name);
-	free_event_attributes(pmu);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
                   ` (4 preceding siblings ...)
  2024-07-22 21:06 ` [PATCH 5/7] drm/i915/pmu: Let resource survive unbind Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-07-23  8:03   ` Tvrtko Ursulin
  2024-07-22 21:06 ` [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL Lucas De Marchi
  6 siblings, 1 reply; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

Instead of calling perf_pmu_unregister() when unbinding, defer that to
the destruction of i915 object. Since perf itself holds a reference in
the event, this only happens when all events are gone, which guarantees
i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after
~2 tries:

	1) bind device to i915
	2) wait events to show up on sysfs
	3) start perf  stat -I 1000 -e i915/rcs0-busy/
	4) unbind driver
	5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the
percpu pmu_disable_count. This happens because perf_pmu_unregister()
destroys it with free_percpu(pmu->pmu_disable_count).

With a lazy unbind, the pmu is only unregistered after (5) as opposed to
after (4). The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully). This seems better
than completely crashing the system.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
 	struct i915_pmu *pmu = res;
 	struct drm_i915_private *i915 = pmu_to_i915(pmu);
 
+	perf_pmu_unregister(&pmu->base);
 	free_event_attributes(pmu);
 	kfree(pmu->base.attr_groups);
 	if (IS_DGFX(i915))
 		kfree(pmu->name);
+
+	/*
+	 * Make sure all currently running (but shortcut on pmu->closed) are
+	 * gone before proceeding with free'ing the pmu object embedded in i915.
+	 */
+	synchronize_rcu();
 }
 
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-	GEM_BUG_ON(!pmu->base.event_init);
-
 	/* Select the first online CPU as a designated reader. */
 	if (cpumask_empty(&i915_pmu_cpumask))
 		cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
 	unsigned int target = i915_pmu_target_cpu;
 
-	GEM_BUG_ON(!pmu->base.event_init);
-
 	/*
 	 * Unregistering an instance generates a CPU offline event which we must
 	 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 {
 	struct i915_pmu *pmu = &i915->pmu;
 
-	if (!pmu->base.event_init)
-		return;
-
 	/*
-	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
-	 * ensures all currently executing ones will have exited before we
-	 * proceed with unregistration.
+	 * "Disconnect" the PMU callbacks - unregistering the pmu will be done
+	 * later when all currently open events are gone
 	 */
 	pmu->closed = true;
-	synchronize_rcu();
 
 	hrtimer_cancel(&pmu->timer);
-
 	i915_pmu_unregister_cpuhp_state(pmu);
-	perf_pmu_unregister(&pmu->base);
 
 	pmu->base.event_init = NULL;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL
  2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
                   ` (5 preceding siblings ...)
  2024-07-22 21:06 ` [PATCH 6/7] drm/i915/pmu: Lazy unregister Lucas De Marchi
@ 2024-07-22 21:06 ` Lucas De Marchi
  2024-08-05  6:55   ` kernel test robot
  6 siblings, 1 reply; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-22 21:06 UTC (permalink / raw)
  To: intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Lucas De Marchi

event_init is not an optional function pointer from perf events. Now
that pmu unregister happens only when freeing i915, setting it to NULL
only protects other functions in i915. Replace that by checking
pmu->closed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/i915_pmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index df53a8fe53ec..c5738035bc2f 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -303,7 +303,7 @@ void i915_pmu_gt_parked(struct intel_gt *gt)
 {
 	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	if (!pmu->base.event_init)
+	if (pmu->closed)
 		return;
 
 	spin_lock_irq(&pmu->lock);
@@ -325,7 +325,7 @@ void i915_pmu_gt_unparked(struct intel_gt *gt)
 {
 	struct i915_pmu *pmu = &gt->i915->pmu;
 
-	if (!pmu->base.event_init)
+	if (pmu->closed)
 		return;
 
 	spin_lock_irq(&pmu->lock);
@@ -1325,12 +1325,12 @@ void i915_pmu_register(struct drm_i915_private *i915)
 err_groups:
 	kfree(pmu->base.attr_groups);
 err_attr:
-	pmu->base.event_init = NULL;
 	free_event_attributes(pmu);
 err_name:
 	if (IS_DGFX(i915))
 		kfree(pmu->name);
 err:
+	pmu->closed = true;
 	drm_notice(&i915->drm, "Failed to register PMU!\n");
 }
 
@@ -1346,6 +1346,4 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 
 	hrtimer_cancel(&pmu->timer);
 	i915_pmu_unregister_cpuhp_state(pmu);
-
-	pmu->base.event_init = NULL;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/7] drm/i915/pmu: Drop is_igp()
  2024-07-22 21:06 ` [PATCH 4/7] drm/i915/pmu: Drop is_igp() Lucas De Marchi
@ 2024-07-22 23:25   ` Dixit, Ashutosh
  2024-07-23  7:52   ` Tvrtko Ursulin
  1 sibling, 0 replies; 20+ messages in thread
From: Dixit, Ashutosh @ 2024-07-22 23:25 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-gfx, linux-perf-users, Tvrtko Ursulin, dri-devel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel

On Mon, 22 Jul 2024 14:06:45 -0700, Lucas De Marchi wrote:
>
> There's no reason to hardcode checking for integrated graphics on a
> specific pci slot. That information is already available per platform an
> can be checked with IS_DGFX().

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 17 +++--------------
>  1 file changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 3a8bd11b87e7..b5d14dd318e4 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -1235,17 +1235,6 @@ static void i915_pmu_unregister_cpuhp_state(struct i915_pmu *pmu)
>	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
>  }
>
> -static bool is_igp(struct drm_i915_private *i915)
> -{
> -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> -
> -	/* IGP is 0000:00:02.0 */
> -	return pci_domain_nr(pdev->bus) == 0 &&
> -	       pdev->bus->number == 0 &&
> -	       PCI_SLOT(pdev->devfn) == 2 &&
> -	       PCI_FUNC(pdev->devfn) == 0;
> -}
> -
>  void i915_pmu_register(struct drm_i915_private *i915)
>  {
>	struct i915_pmu *pmu = &i915->pmu;
> @@ -1269,7 +1258,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
>	pmu->cpuhp.cpu = -1;
>	init_rc6(pmu);
>
> -	if (!is_igp(i915)) {
> +	if (IS_DGFX(i915)) {
>		pmu->name = kasprintf(GFP_KERNEL,
>				      "i915_%s",
>				      dev_name(i915->drm.dev));
> @@ -1323,7 +1312,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
>	pmu->base.event_init = NULL;
>	free_event_attributes(pmu);
>  err_name:
> -	if (!is_igp(i915))
> +	if (IS_DGFX(i915))
>		kfree(pmu->name);
>  err:
>	drm_notice(&i915->drm, "Failed to register PMU!\n");
> @@ -1351,7 +1340,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
>	perf_pmu_unregister(&pmu->base);
>	pmu->base.event_init = NULL;
>	kfree(pmu->base.attr_groups);
> -	if (!is_igp(i915))
> +	if (IS_DGFX(i915))
>		kfree(pmu->name);
>	free_event_attributes(pmu);
>  }
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/7] drm/i915/pmu: Use event_to_pmu()
  2024-07-22 21:06 ` [PATCH 3/7] drm/i915/pmu: Use event_to_pmu() Lucas De Marchi
@ 2024-07-23  4:35   ` Dixit, Ashutosh
  0 siblings, 0 replies; 20+ messages in thread
From: Dixit, Ashutosh @ 2024-07-23  4:35 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-gfx, linux-perf-users, Tvrtko Ursulin, dri-devel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel

On Mon, 22 Jul 2024 14:06:44 -0700, Lucas De Marchi wrote:
>
> i915 pointer is not needed in this function and all the others simply
> calculate the i915_pmu container based on the event->pmu. Follow the
> same logic as in other functions.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_pmu.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index cb5f6471ec6e..3a8bd11b87e7 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -820,15 +820,14 @@ static void i915_pmu_event_start(struct perf_event *event, int flags)
>
>  static void i915_pmu_event_stop(struct perf_event *event, int flags)
>  {
> -	struct drm_i915_private *i915 =
> -		container_of(event->pmu, typeof(*i915), pmu.base);
> -	struct i915_pmu *pmu = &i915->pmu;
> +	struct i915_pmu *pmu = event_to_pmu(event);
>
>	if (pmu->closed)
>		goto out;
>
>	if (flags & PERF_EF_UPDATE)
>		i915_pmu_event_read(event);
> +
>	i915_pmu_disable(event);
>
>  out:
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/7] drm/i915/pmu: Drop is_igp()
  2024-07-22 21:06 ` [PATCH 4/7] drm/i915/pmu: Drop is_igp() Lucas De Marchi
  2024-07-22 23:25   ` Dixit, Ashutosh
@ 2024-07-23  7:52   ` Tvrtko Ursulin
  1 sibling, 0 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2024-07-23  7:52 UTC (permalink / raw)
  To: Lucas De Marchi, intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel


On 22/07/2024 22:06, Lucas De Marchi wrote:
> There's no reason to hardcode checking for integrated graphics on a
> specific pci slot. That information is already available per platform an
> can be checked with IS_DGFX().

Hmm probably reason was this, added is_igp:

commit 05488673a4d41383f9dd537f298e525e6b00fb93
Author:     Tvrtko Ursulin <tursulin@ursulin.net>
AuthorDate: Wed Oct 16 10:38:02 2019 +0100
Commit:     Tvrtko Ursulin <tursulin@ursulin.net>
CommitDate: Thu Oct 17 10:50:47 2019 +0100

     drm/i915/pmu: Support multiple GPUs

Added IS_DGFX:

commit dc90fe3fd219c7693617ba09a9467e4aadc2e039
Author:     José Roberto de Souza <jose.souza@intel.com>
AuthorDate: Thu Oct 24 12:51:19 2019 -0700
Commit:     Lucas De Marchi <lucas.demarchi@intel.com>
CommitDate: Fri Oct 25 13:53:51 2019 -0700

     drm/i915: Add is_dgfx to device info

So innocently arrived just a bit before.

Regards,

Tvrtko

> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 17 +++--------------
>   1 file changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 3a8bd11b87e7..b5d14dd318e4 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -1235,17 +1235,6 @@ static void i915_pmu_unregister_cpuhp_state(struct i915_pmu *pmu)
>   	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
>   }
>   
> -static bool is_igp(struct drm_i915_private *i915)
> -{
> -	struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> -
> -	/* IGP is 0000:00:02.0 */
> -	return pci_domain_nr(pdev->bus) == 0 &&
> -	       pdev->bus->number == 0 &&
> -	       PCI_SLOT(pdev->devfn) == 2 &&
> -	       PCI_FUNC(pdev->devfn) == 0;
> -}
> -
>   void i915_pmu_register(struct drm_i915_private *i915)
>   {
>   	struct i915_pmu *pmu = &i915->pmu;
> @@ -1269,7 +1258,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
>   	pmu->cpuhp.cpu = -1;
>   	init_rc6(pmu);
>   
> -	if (!is_igp(i915)) {
> +	if (IS_DGFX(i915)) {
>   		pmu->name = kasprintf(GFP_KERNEL,
>   				      "i915_%s",
>   				      dev_name(i915->drm.dev));
> @@ -1323,7 +1312,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
>   	pmu->base.event_init = NULL;
>   	free_event_attributes(pmu);
>   err_name:
> -	if (!is_igp(i915))
> +	if (IS_DGFX(i915))
>   		kfree(pmu->name);
>   err:
>   	drm_notice(&i915->drm, "Failed to register PMU!\n");
> @@ -1351,7 +1340,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
>   	perf_pmu_unregister(&pmu->base);
>   	pmu->base.event_init = NULL;
>   	kfree(pmu->base.attr_groups);
> -	if (!is_igp(i915))
> +	if (IS_DGFX(i915))
>   		kfree(pmu->name);
>   	free_event_attributes(pmu);
>   }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/7] drm/i915/pmu: Let resource survive unbind
  2024-07-22 21:06 ` [PATCH 5/7] drm/i915/pmu: Let resource survive unbind Lucas De Marchi
@ 2024-07-23  7:58   ` Tvrtko Ursulin
  0 siblings, 0 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2024-07-23  7:58 UTC (permalink / raw)
  To: Lucas De Marchi, intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel


On 22/07/2024 22:06, Lucas De Marchi wrote:
> There's no need to free the resources during unbind. Since perf events
> may still access them due to open events, it's safer to free them when
> dropping the last i915 reference.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index b5d14dd318e4..8708f905f4f4 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -5,6 +5,7 @@
>    */
>   
>   #include <linux/pm_runtime.h>
> +#include <drm/drm_managed.h>
>   
>   #include "gt/intel_engine.h"
>   #include "gt/intel_engine_pm.h"
> @@ -1152,6 +1153,17 @@ static void free_event_attributes(struct i915_pmu *pmu)
>   	pmu->pmu_attr = NULL;
>   }
>   
> +static void free_pmu(struct drm_device *dev, void *res)
> +{
> +	struct i915_pmu *pmu = res;
> +	struct drm_i915_private *i915 = pmu_to_i915(pmu);
> +
> +	free_event_attributes(pmu);
> +	kfree(pmu->base.attr_groups);
> +	if (IS_DGFX(i915))
> +		kfree(pmu->name);
> +}
> +
>   static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>   {
>   	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> @@ -1302,6 +1314,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
>   	if (ret)
>   		goto err_unreg;
>   
> +	if (drmm_add_action_or_reset(&i915->drm, free_pmu, pmu))
> +		goto err_unreg;

Is i915_pmu_unregister_cpuhp_state missing on this error path?

Regards,

Tvrtko

> +
>   	return;
>   
>   err_unreg:
> @@ -1336,11 +1351,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
>   	hrtimer_cancel(&pmu->timer);
>   
>   	i915_pmu_unregister_cpuhp_state(pmu);
> -
>   	perf_pmu_unregister(&pmu->base);
> +
>   	pmu->base.event_init = NULL;
> -	kfree(pmu->base.attr_groups);
> -	if (IS_DGFX(i915))
> -		kfree(pmu->name);
> -	free_event_attributes(pmu);
>   }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-22 21:06 ` [PATCH 6/7] drm/i915/pmu: Lazy unregister Lucas De Marchi
@ 2024-07-23  8:03   ` Tvrtko Ursulin
  2024-07-23 15:30     ` Lucas De Marchi
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2024-07-23  8:03 UTC (permalink / raw)
  To: Lucas De Marchi, intel-gfx, linux-perf-users
  Cc: Tvrtko Ursulin, dri-devel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel


On 22/07/2024 22:06, Lucas De Marchi wrote:
> Instead of calling perf_pmu_unregister() when unbinding, defer that to
> the destruction of i915 object. Since perf itself holds a reference in
> the event, this only happens when all events are gone, which guarantees
> i915 is not unregistering the pmu with live events.
> 
> Previously, running the following sequence would crash the system after
> ~2 tries:
> 
> 	1) bind device to i915
> 	2) wait events to show up on sysfs
> 	3) start perf  stat -I 1000 -e i915/rcs0-busy/
> 	4) unbind driver
> 	5) kill perf
> 
> Most of the time this crashes in perf_pmu_disable() while accessing the
> percpu pmu_disable_count. This happens because perf_pmu_unregister()
> destroys it with free_percpu(pmu->pmu_disable_count).
> 
> With a lazy unbind, the pmu is only unregistered after (5) as opposed to
> after (4). The downside is that if a new bind operation is attempted for
> the same device/driver without killing the perf process, i915 will fail
> to register the pmu (but still load successfully). This seems better
> than completely crashing the system.

So effectively allows unbind to succeed without fully unbinding the 
driver from the device? That sounds like a significant drawback and if 
so, I wonder if a more complicated solution wouldn't be better after 
all. Or is there precedence for allowing userspace keeping their paws on 
unbound devices in this way?

Regards,

Tvrtko

> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 24 +++++++++---------------
>   1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 8708f905f4f4..df53a8fe53ec 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
>   	struct i915_pmu *pmu = res;
>   	struct drm_i915_private *i915 = pmu_to_i915(pmu);
>   
> +	perf_pmu_unregister(&pmu->base);
>   	free_event_attributes(pmu);
>   	kfree(pmu->base.attr_groups);
>   	if (IS_DGFX(i915))
>   		kfree(pmu->name);
> +
> +	/*
> +	 * Make sure all currently running (but shortcut on pmu->closed) are
> +	 * gone before proceeding with free'ing the pmu object embedded in i915.
> +	 */
> +	synchronize_rcu();
>   }
>   
>   static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>   {
> -	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> -
> -	GEM_BUG_ON(!pmu->base.event_init);
> -
>   	/* Select the first online CPU as a designated reader. */
>   	if (cpumask_empty(&i915_pmu_cpumask))
>   		cpumask_set_cpu(cpu, &i915_pmu_cpumask);
> @@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>   	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>   	unsigned int target = i915_pmu_target_cpu;
>   
> -	GEM_BUG_ON(!pmu->base.event_init);
> -
>   	/*
>   	 * Unregistering an instance generates a CPU offline event which we must
>   	 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
> @@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
>   {
>   	struct i915_pmu *pmu = &i915->pmu;
>   
> -	if (!pmu->base.event_init)
> -		return;
> -
>   	/*
> -	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
> -	 * ensures all currently executing ones will have exited before we
> -	 * proceed with unregistration.
> +	 * "Disconnect" the PMU callbacks - unregistering the pmu will be done
> +	 * later when all currently open events are gone
>   	 */
>   	pmu->closed = true;
> -	synchronize_rcu();
>   
>   	hrtimer_cancel(&pmu->timer);
> -
>   	i915_pmu_unregister_cpuhp_state(pmu);
> -	perf_pmu_unregister(&pmu->base);
>   
>   	pmu->base.event_init = NULL;
>   }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-23  8:03   ` Tvrtko Ursulin
@ 2024-07-23 15:30     ` Lucas De Marchi
  2024-07-24  7:48       ` Tvrtko Ursulin
  2024-07-24 12:41       ` Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-23 15:30 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: intel-gfx, linux-perf-users, Tvrtko Ursulin, dri-devel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel

On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
>
>On 22/07/2024 22:06, Lucas De Marchi wrote:
>>Instead of calling perf_pmu_unregister() when unbinding, defer that to
>>the destruction of i915 object. Since perf itself holds a reference in
>>the event, this only happens when all events are gone, which guarantees
>>i915 is not unregistering the pmu with live events.
>>
>>Previously, running the following sequence would crash the system after
>>~2 tries:
>>
>>	1) bind device to i915
>>	2) wait events to show up on sysfs
>>	3) start perf  stat -I 1000 -e i915/rcs0-busy/
>>	4) unbind driver
>>	5) kill perf
>>
>>Most of the time this crashes in perf_pmu_disable() while accessing the
>>percpu pmu_disable_count. This happens because perf_pmu_unregister()
>>destroys it with free_percpu(pmu->pmu_disable_count).
>>
>>With a lazy unbind, the pmu is only unregistered after (5) as opposed to
>>after (4). The downside is that if a new bind operation is attempted for
>>the same device/driver without killing the perf process, i915 will fail
>>to register the pmu (but still load successfully). This seems better
>>than completely crashing the system.
>
>So effectively allows unbind to succeed without fully unbinding the 
>driver from the device? That sounds like a significant drawback and if 
>so, I wonder if a more complicated solution wouldn't be better after 
>all. Or is there precedence for allowing userspace keeping their paws 
>on unbound devices in this way?

keeping the resources alive but "unplunged" while the hardware
disappeared is a common thing to do... it's the whole point of the
drmm-managed resource for example. If you bind the driver and then
unbind it while userspace is holding a ref, next time you try to bind it
will come up with a different card number. A similar thing that could be
done is to adjust the name of the event - currently we add the mangled
pci slot.

That said, I agree a better approach would be to allow
perf_pmu_unregister() to do its job even when there are open events. On
top of that (or as a way to help achieve that), make perf core replace
the callbacks with stubs when pmu is unregistered - that would even kill
the need for i915's checks on pmu->closed (and fix the lack thereof in
other drivers).

It can be a can of worms though and may be pushed back by perf core
maintainers, so it'd be good have their feedback.

thanks
Lucas De Marchi

>
>Regards,
>
>Tvrtko
>
>>
>>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>>---
>>  drivers/gpu/drm/i915/i915_pmu.c | 24 +++++++++---------------
>>  1 file changed, 9 insertions(+), 15 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>>index 8708f905f4f4..df53a8fe53ec 100644
>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
>>  	struct i915_pmu *pmu = res;
>>  	struct drm_i915_private *i915 = pmu_to_i915(pmu);
>>+	perf_pmu_unregister(&pmu->base);
>>  	free_event_attributes(pmu);
>>  	kfree(pmu->base.attr_groups);
>>  	if (IS_DGFX(i915))
>>  		kfree(pmu->name);
>>+
>>+	/*
>>+	 * Make sure all currently running (but shortcut on pmu->closed) are
>>+	 * gone before proceeding with free'ing the pmu object embedded in i915.
>>+	 */
>>+	synchronize_rcu();
>>  }
>>  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
>>  {
>>-	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>>-
>>-	GEM_BUG_ON(!pmu->base.event_init);
>>-
>>  	/* Select the first online CPU as a designated reader. */
>>  	if (cpumask_empty(&i915_pmu_cpumask))
>>  		cpumask_set_cpu(cpu, &i915_pmu_cpumask);
>>@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
>>  	struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
>>  	unsigned int target = i915_pmu_target_cpu;
>>-	GEM_BUG_ON(!pmu->base.event_init);
>>-
>>  	/*
>>  	 * Unregistering an instance generates a CPU offline event which we must
>>  	 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
>>@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
>>  {
>>  	struct i915_pmu *pmu = &i915->pmu;
>>-	if (!pmu->base.event_init)
>>-		return;
>>-
>>  	/*
>>-	 * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
>>-	 * ensures all currently executing ones will have exited before we
>>-	 * proceed with unregistration.
>>+	 * "Disconnect" the PMU callbacks - unregistering the pmu will be done
>>+	 * later when all currently open events are gone
>>  	 */
>>  	pmu->closed = true;
>>-	synchronize_rcu();
>>  	hrtimer_cancel(&pmu->timer);
>>-
>>  	i915_pmu_unregister_cpuhp_state(pmu);
>>-	perf_pmu_unregister(&pmu->base);
>>  	pmu->base.event_init = NULL;
>>  }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/7] perf/core: Add pmu get/put
  2024-07-22 21:06 ` [PATCH 1/7] perf/core: Add pmu get/put Lucas De Marchi
@ 2024-07-23 23:07   ` Ian Rogers
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Rogers @ 2024-07-23 23:07 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-gfx, linux-perf-users, Tvrtko Ursulin, dri-devel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel

On Mon, Jul 22, 2024 at 2:07 PM Lucas De Marchi
<lucas.demarchi@intel.com> wrote:
>
> If a pmu is unregistered while there's an active event, perf will still
> access the pmu via event->pmu, even after the event is destroyed. This
> makes it difficult for drivers like i915 that take a reference on the
> device when the event is created and put it when it's destroyed.
> Currently the following use-after-free happens just after destroying the
> event:
>
>         BUG: KASAN: use-after-free in exclusive_event_destroy+0xd8/0xf0
>         Read of size 4 at addr ffff88816e2bb63c by task perf/7748
>
> Whenever and event is created, get a pmu reference to use in event->pmu
> and just before calling module_put(), drop the reference..
>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  include/linux/perf_event.h |  3 +++
>  kernel/events/core.c       | 32 ++++++++++++++++++++++++++++----
>  2 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index a5304ae8c654..7048a505e93c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -540,6 +540,9 @@ struct pmu {
>          * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
>          */
>         int (*check_period)             (struct perf_event *event, u64 value); /* optional */
> +
> +       struct pmu *(*get)              (struct pmu *pmu); /* optional: get a reference */
> +       void (*put)                     (struct pmu *pmu); /* optional: put a reference */
>  };
>
>  enum perf_addr_filter_action_t {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1b6f5dc7ed32..cc7541b644b0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5208,6 +5208,8 @@ static void perf_addr_filters_splice(struct perf_event *event,
>
>  static void _free_event(struct perf_event *event)
>  {
> +       struct module *module;
> +
>         irq_work_sync(&event->pending_irq);
>
>         unaccount_event(event);
> @@ -5259,7 +5261,13 @@ static void _free_event(struct perf_event *event)
>                 put_ctx(event->ctx);
>
>         exclusive_event_destroy(event);
> -       module_put(event->pmu->module);
> +
> +       module = event->pmu->module;
> +       event->pmu->put(event->pmu);
> +       /* can't touch pmu anymore */
> +       event->pmu = NULL;
> +
> +       module_put(module);
>
>         call_rcu(&event->rcu_head, free_event_rcu);
>  }
> @@ -11331,6 +11339,11 @@ static int perf_pmu_nop_int(struct pmu *pmu)
>         return 0;
>  }
>
> +static struct pmu *perf_pmu_nop_pmu(struct pmu *pmu)
> +{
> +       return pmu;
> +}
> +
>  static int perf_event_nop_int(struct perf_event *event, u64 value)
>  {
>         return 0;
> @@ -11617,6 +11630,12 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>         if (!pmu->event_idx)
>                 pmu->event_idx = perf_event_idx_default;
>
> +       if (!pmu->get)
> +               pmu->get = perf_pmu_nop_pmu;
> +
> +       if (!pmu->put)
> +               pmu->put = perf_pmu_nop_void;
> +
>         list_add_rcu(&pmu->entry, &pmus);
>         atomic_set(&pmu->exclusive_cnt, 0);
>         ret = 0;
> @@ -11695,7 +11714,8 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
>                 BUG_ON(!ctx);
>         }
>
> -       event->pmu = pmu;
> +       event->pmu = pmu->get(pmu);
> +
>         ret = pmu->event_init(event);
>
>         if (ctx)
> @@ -11714,8 +11734,12 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
>                         event->destroy(event);
>         }
>
> -       if (ret)
> -               module_put(pmu->module);
> +       if (ret) {
> +               struct module *module = pmu->module;
> +
> +               pmu->put(pmu);

I think this is a great fix, a nit here, wouldn't it be good to do:

event->pmu = NULL;

Thanks,
Ian

> +               module_put(module);
> +       }
>
>         return ret;
>  }
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-23 15:30     ` Lucas De Marchi
@ 2024-07-24  7:48       ` Tvrtko Ursulin
  2024-07-24 12:41       ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2024-07-24  7:48 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-gfx, linux-perf-users, Tvrtko Ursulin, dri-devel,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel


On 23/07/2024 16:30, Lucas De Marchi wrote:
> On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
>>
>> On 22/07/2024 22:06, Lucas De Marchi wrote:
>>> Instead of calling perf_pmu_unregister() when unbinding, defer that to
>>> the destruction of i915 object. Since perf itself holds a reference in
>>> the event, this only happens when all events are gone, which guarantees
>>> i915 is not unregistering the pmu with live events.
>>>
>>> Previously, running the following sequence would crash the system after
>>> ~2 tries:
>>>
>>>     1) bind device to i915
>>>     2) wait events to show up on sysfs
>>>     3) start perf  stat -I 1000 -e i915/rcs0-busy/
>>>     4) unbind driver
>>>     5) kill perf
>>>
>>> Most of the time this crashes in perf_pmu_disable() while accessing the
>>> percpu pmu_disable_count. This happens because perf_pmu_unregister()
>>> destroys it with free_percpu(pmu->pmu_disable_count).
>>>
>>> With a lazy unbind, the pmu is only unregistered after (5) as opposed to
>>> after (4). The downside is that if a new bind operation is attempted for
>>> the same device/driver without killing the perf process, i915 will fail
>>> to register the pmu (but still load successfully). This seems better
>>> than completely crashing the system.
>>
>> So effectively allows unbind to succeed without fully unbinding the 
>> driver from the device? That sounds like a significant drawback and if 
>> so, I wonder if a more complicated solution wouldn't be better after 
>> all. Or is there precedence for allowing userspace keeping their paws 
>> on unbound devices in this way?
> 
> keeping the resources alive but "unplunged" while the hardware
> disappeared is a common thing to do... it's the whole point of the
> drmm-managed resource for example. If you bind the driver and then
> unbind it while userspace is holding a ref, next time you try to bind it
> will come up with a different card number. A similar thing that could be
> done is to adjust the name of the event - currently we add the mangled
> pci slot.

Yes.. but what my point was this from your commit message:

"""
The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully).
"""

So the subsequent bind does not "come up with a different card number". 
Statement is it will come up with an error if we look at the PMU subset 
of functionality. I was wondering if there was precedent for that kind 
of situation.

Mangling the PMU driver name probably also wouldn't be great.

> That said, I agree a better approach would be to allow
> perf_pmu_unregister() to do its job even when there are open events. On
> top of that (or as a way to help achieve that), make perf core replace
> the callbacks with stubs when pmu is unregistered - that would even kill
> the need for i915's checks on pmu->closed (and fix the lack thereof in
> other drivers).
> 
> It can be a can of worms though and may be pushed back by perf core
> maintainers, so it'd be good have their feedback.

Yeah definitely would be essential.

Regards,

Tvrtko

>>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>>> ---
>>>  drivers/gpu/drm/i915/i915_pmu.c | 24 +++++++++---------------
>>>  1 file changed, 9 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
>>> b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 8708f905f4f4..df53a8fe53ec 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, 
>>> void *res)
>>>      struct i915_pmu *pmu = res;
>>>      struct drm_i915_private *i915 = pmu_to_i915(pmu);
>>> +    perf_pmu_unregister(&pmu->base);
>>>      free_event_attributes(pmu);
>>>      kfree(pmu->base.attr_groups);
>>>      if (IS_DGFX(i915))
>>>          kfree(pmu->name);
>>> +
>>> +    /*
>>> +     * Make sure all currently running (but shortcut on pmu->closed) 
>>> are
>>> +     * gone before proceeding with free'ing the pmu object embedded 
>>> in i915.
>>> +     */
>>> +    synchronize_rcu();
>>>  }
>>>  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node 
>>> *node)
>>>  {
>>> -    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), 
>>> cpuhp.node);
>>> -
>>> -    GEM_BUG_ON(!pmu->base.event_init);
>>> -
>>>      /* Select the first online CPU as a designated reader. */
>>>      if (cpumask_empty(&i915_pmu_cpumask))
>>>          cpumask_set_cpu(cpu, &i915_pmu_cpumask);
>>> @@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int 
>>> cpu, struct hlist_node *node)
>>>      struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), 
>>> cpuhp.node);
>>>      unsigned int target = i915_pmu_target_cpu;
>>> -    GEM_BUG_ON(!pmu->base.event_init);
>>> -
>>>      /*
>>>       * Unregistering an instance generates a CPU offline event which 
>>> we must
>>>       * ignore to avoid incorrectly modifying the shared 
>>> i915_pmu_cpumask.
>>> @@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct 
>>> drm_i915_private *i915)
>>>  {
>>>      struct i915_pmu *pmu = &i915->pmu;
>>> -    if (!pmu->base.event_init)
>>> -        return;
>>> -
>>>      /*
>>> -     * "Disconnect" the PMU callbacks - since all are atomic 
>>> synchronize_rcu
>>> -     * ensures all currently executing ones will have exited before we
>>> -     * proceed with unregistration.
>>> +     * "Disconnect" the PMU callbacks - unregistering the pmu will 
>>> be done
>>> +     * later when all currently open events are gone
>>>       */
>>>      pmu->closed = true;
>>> -    synchronize_rcu();
>>>      hrtimer_cancel(&pmu->timer);
>>> -
>>>      i915_pmu_unregister_cpuhp_state(pmu);
>>> -    perf_pmu_unregister(&pmu->base);
>>>      pmu->base.event_init = NULL;
>>>  }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-23 15:30     ` Lucas De Marchi
  2024-07-24  7:48       ` Tvrtko Ursulin
@ 2024-07-24 12:41       ` Peter Zijlstra
  2024-07-24 15:39         ` Lucas De Marchi
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2024-07-24 12:41 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: Tvrtko Ursulin, intel-gfx, linux-perf-users, Tvrtko Ursulin,
	dri-devel, Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel

On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
> On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
> > 
> > On 22/07/2024 22:06, Lucas De Marchi wrote:
> > > Instead of calling perf_pmu_unregister() when unbinding, defer that to
> > > the destruction of i915 object. Since perf itself holds a reference in
> > > the event, this only happens when all events are gone, which guarantees
> > > i915 is not unregistering the pmu with live events.
> > > 
> > > Previously, running the following sequence would crash the system after
> > > ~2 tries:
> > > 
> > > 	1) bind device to i915
> > > 	2) wait events to show up on sysfs
> > > 	3) start perf  stat -I 1000 -e i915/rcs0-busy/
> > > 	4) unbind driver
> > > 	5) kill perf
> > > 
> > > Most of the time this crashes in perf_pmu_disable() while accessing the
> > > percpu pmu_disable_count. This happens because perf_pmu_unregister()
> > > destroys it with free_percpu(pmu->pmu_disable_count).
> > > 
> > > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
> > > after (4). The downside is that if a new bind operation is attempted for
> > > the same device/driver without killing the perf process, i915 will fail
> > > to register the pmu (but still load successfully). This seems better
> > > than completely crashing the system.
> > 
> > So effectively allows unbind to succeed without fully unbinding the
> > driver from the device? That sounds like a significant drawback and if
> > so, I wonder if a more complicated solution wouldn't be better after
> > all. Or is there precedence for allowing userspace keeping their paws on
> > unbound devices in this way?
> 
> keeping the resources alive but "unplunged" while the hardware
> disappeared is a common thing to do... it's the whole point of the
> drmm-managed resource for example. If you bind the driver and then
> unbind it while userspace is holding a ref, next time you try to bind it
> will come up with a different card number. A similar thing that could be
> done is to adjust the name of the event - currently we add the mangled
> pci slot.
> 
> That said, I agree a better approach would be to allow
> perf_pmu_unregister() to do its job even when there are open events. On
> top of that (or as a way to help achieve that), make perf core replace
> the callbacks with stubs when pmu is unregistered - that would even kill
> the need for i915's checks on pmu->closed (and fix the lack thereof in
> other drivers).
> 
> It can be a can of worms though and may be pushed back by perf core
> maintainers, so it'd be good have their feedback.

I don't think I understand the problem. I also don't understand drivers
much -- so that might be the problem.

So the problem appears to be that the device just disappears without
warning? How can a GPU go away like that?

Since you have a notion of this device, can't you do this stubbing you
talk about? That is, if your internal device reference becomes NULL, let
the PMU methods preserve the state like no-ops.

And then when the last event goes away, tear down the whole thing.

Again, I'm not sure I'm following.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-24 12:41       ` Peter Zijlstra
@ 2024-07-24 15:39         ` Lucas De Marchi
  2024-09-09 21:03           ` Lucas De Marchi
  0 siblings, 1 reply; 20+ messages in thread
From: Lucas De Marchi @ 2024-07-24 15:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tvrtko Ursulin, intel-gfx, linux-perf-users, Tvrtko Ursulin,
	dri-devel, Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel

On Wed, Jul 24, 2024 at 02:41:05PM GMT, Peter Zijlstra wrote:
>On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
>> On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
>> >
>> > On 22/07/2024 22:06, Lucas De Marchi wrote:
>> > > Instead of calling perf_pmu_unregister() when unbinding, defer that to
>> > > the destruction of i915 object. Since perf itself holds a reference in
>> > > the event, this only happens when all events are gone, which guarantees
>> > > i915 is not unregistering the pmu with live events.
>> > >
>> > > Previously, running the following sequence would crash the system after
>> > > ~2 tries:
>> > >
>> > > 	1) bind device to i915
>> > > 	2) wait events to show up on sysfs
>> > > 	3) start perf  stat -I 1000 -e i915/rcs0-busy/
>> > > 	4) unbind driver
>> > > 	5) kill perf
>> > >
>> > > Most of the time this crashes in perf_pmu_disable() while accessing the
>> > > percpu pmu_disable_count. This happens because perf_pmu_unregister()
>> > > destroys it with free_percpu(pmu->pmu_disable_count).
>> > >
>> > > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
>> > > after (4). The downside is that if a new bind operation is attempted for
>> > > the same device/driver without killing the perf process, i915 will fail
>> > > to register the pmu (but still load successfully). This seems better
>> > > than completely crashing the system.
>> >
>> > So effectively allows unbind to succeed without fully unbinding the
>> > driver from the device? That sounds like a significant drawback and if
>> > so, I wonder if a more complicated solution wouldn't be better after
>> > all. Or is there precedence for allowing userspace keeping their paws on
>> > unbound devices in this way?
>>
>> keeping the resources alive but "unplunged" while the hardware
>> disappeared is a common thing to do... it's the whole point of the
>> drmm-managed resource for example. If you bind the driver and then
>> unbind it while userspace is holding a ref, next time you try to bind it
>> will come up with a different card number. A similar thing that could be
>> done is to adjust the name of the event - currently we add the mangled
>> pci slot.
>>
>> That said, I agree a better approach would be to allow
>> perf_pmu_unregister() to do its job even when there are open events. On
>> top of that (or as a way to help achieve that), make perf core replace
>> the callbacks with stubs when pmu is unregistered - that would even kill
>> the need for i915's checks on pmu->closed (and fix the lack thereof in
>> other drivers).
>>
>> It can be a can of worms though and may be pushed back by perf core
>> maintainers, so it'd be good have their feedback.
>
>I don't think I understand the problem. I also don't understand drivers
>much -- so that might be the problem.

We can bind/unbind the driver to/from the pci device. Example:

	echo -n "0000:00:02.0" > /sys/bus/pci/drivers/i915/unbind

This is essentially unplugging the HW from the kernel.  This will
trigger the driver to deinitialize and free up all resources use by that
device.

So when the driver is binding it does:

	perf_pmu_register();

When it's unbinding:

	perf_pmu_unregister();
	
Reasons to unbind include:

	- driver testing so then we can unload the module and load it
	  again
	- device is toast - doing an FLR and rebinding may
	  fix/workaround it
	- For SR-IOV, in which we are exposing multiple instances of the
	  same underlying PCI device, we may need to bind/unbind
	  on-demand  (it's not yet clear if perf_pmu_register() would be
	  called on the VF instances, but listed here just to explain
	  the bind/unbind)
	- Hotplug

>
>So the problem appears to be that the device just disappears without
>warning? How can a GPU go away like that?
>
>Since you have a notion of this device, can't you do this stubbing you
>talk about? That is, if your internal device reference becomes NULL, let
>the PMU methods preserve the state like no-ops.

It's not clear if you are suggesting these stubs to be in each driver or
to be assigned by perf in perf_pmu_unregister(). Some downsides
of doing it in the driver:

	- you can't remove the module as perf will still call module
	  code
	- need to replicate the stubs in every driver (or the
	  equivalent of pmu->closed checks in i915_pmu.c)

I *think* the stubs would be quiet the same for every device, so could
be feasible to share them inside perf. Alternatively it could simply
shortcut the call, without stubs, by looking at event->hw.state and
a new pmu->state. I can give this a try.

thanks
Lucas De Marchi

>
>And then when the last event goes away, tear down the whole thing.
>
>Again, I'm not sure I'm following.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL
  2024-07-22 21:06 ` [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL Lucas De Marchi
@ 2024-08-05  6:55   ` kernel test robot
  0 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2024-08-05  6:55 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: oe-lkp, lkp, intel-gfx, linux-perf-users, Tvrtko Ursulin,
	dri-devel, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, Lucas De Marchi, oliver.sang



Hello,

kernel test robot noticed "INFO:trying_to_register_non-static_key" on:

commit: a236f4d92e214998f7606430282249580f709423 ("[PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL")
url: https://github.com/intel-lab-lkp/linux/commits/Lucas-De-Marchi/perf-core-Add-pmu-get-put/20240723-051755
base: https://git.kernel.org/cgit/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link: https://lore.kernel.org/all/20240722210648.80892-8-lucas.demarchi@intel.com/
patch subject: [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL

in testcase: kernel-selftests
version: kernel-selftests-x86_64-977d51cf-1_20240508
with following parameters:

	group: locking



compiler: gcc-13
test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408051423.839b16eb-oliver.sang@intel.com


[   45.439542][  T194] INFO: trying to register non-static key.
[   45.445246][  T194] The code is fine but needs lockdep annotation, or maybe
[   45.452244][  T194] you didn't initialize this object before use?
[   45.458370][  T194] turning off the locking correctness validator.
[   45.464586][  T194] CPU: 3 PID: 194 Comm: (udev-worker) Tainted: G S                 6.10.0-rc3-00133-ga236f4d92e21 #1
[   45.475322][  T194] Hardware name: HP HP Z238 Microtower Workstation/8183, BIOS N51 Ver. 01.63 10/05/2017
[   45.484923][  T194] Call Trace:
[   45.488094][  T194]  <TASK>
[ 45.490908][ T194] dump_stack_lvl (lib/dump_stack.c:118) 
[ 45.495297][ T194] register_lock_class (kernel/locking/lockdep.c:1290) 
[ 45.500377][ T194] ? lock_acquire (kernel/locking/lockdep.c:467 (discriminator 4) kernel/locking/lockdep.c:5756 (discriminator 4) kernel/locking/lockdep.c:5719 (discriminator 4)) 
[ 45.504938][ T194] ? intel_display_power_get (drivers/gpu/drm/i915/display/intel_display_power.c:504 drivers/gpu/drm/i915/display/intel_display_power.c:532) i915
[ 45.511333][ T194] ? __pfx_register_lock_class (kernel/locking/lockdep.c:1276) 
[ 45.516848][ T194] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722) 
[ 45.521748][ T194] ? lock_is_held_type (kernel/locking/lockdep.c:5495 (discriminator 1) kernel/locking/lockdep.c:5825 (discriminator 1)) 
[ 45.526654][ T194] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1)) 
[ 45.531306][ T194] __lock_acquire (kernel/locking/lockdep.c:5015) 
[ 45.535870][ T194] ? trace_contention_end (include/trace/events/lock.h:122 (discriminator 2)) 
[ 45.541042][ T194] ? __mutex_lock (arch/x86/include/asm/preempt.h:94 (discriminator 1) kernel/locking/mutex.c:618 (discriminator 1) kernel/locking/mutex.c:752 (discriminator 1)) 
[ 45.545688][ T194] lock_acquire (kernel/locking/lockdep.c:467 (discriminator 4) kernel/locking/lockdep.c:5756 (discriminator 4) kernel/locking/lockdep.c:5719 (discriminator 4)) 
[ 45.550069][ T194] ? i915_pmu_gt_unparked (drivers/gpu/drm/i915/i915_pmu.c:336) i915
[ 45.556233][ T194] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722) 
[ 45.561162][ T194] ? __pfx___mutex_unlock_slowpath (kernel/locking/mutex.c:907) 
[ 45.567023][ T194] ? __pfx_intel_display_power_grab_async_put_ref (drivers/gpu/drm/i915/display/intel_display_power.c:471) i915
[ 45.575181][ T194] _raw_spin_lock_irq (include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) 
[ 45.579937][ T194] ? i915_pmu_gt_unparked (drivers/gpu/drm/i915/i915_pmu.c:336) i915
[ 45.586041][ T194] i915_pmu_gt_unparked (drivers/gpu/drm/i915/i915_pmu.c:336) i915
[ 45.591960][ T194] __gt_unpark (drivers/gpu/drm/i915/gt/intel_gt_pm.c:93) i915
[ 45.597075][ T194] __intel_wakeref_get_first (drivers/gpu/drm/i915/intel_wakeref.c:32) i915
[ 45.603498][ T194] intel_gt_resume (drivers/gpu/drm/i915/gt/intel_gt_pm.c:259) i915
[ 45.609021][ T194] ? __pfx_intel_execlists_submission_setup (drivers/gpu/drm/i915/gt/intel_execlists_submission.c:3541) i915
[ 45.616560][ T194] intel_gt_init (drivers/gpu/drm/i915/gt/intel_gt.c:740) i915
[ 45.621888][ T194] i915_gem_init (drivers/gpu/drm/i915/i915_gem.c:1193) i915
[ 45.627261][ T194] i915_driver_probe (drivers/gpu/drm/i915/i915_driver.c:798) i915
[ 45.632922][ T194] ? __pfx_i915_driver_probe (drivers/gpu/drm/i915/i915_driver.c:750) i915
[ 45.639091][ T194] ? drm_privacy_screen_get (drivers/gpu/drm/drm_privacy_screen.c:168) drm
[ 45.645110][ T194] ? intel_display_driver_probe_defer (drivers/gpu/drm/i915/display/intel_display_driver.c:81) i915
[ 45.652258][ T194] ? i915_pci_probe (drivers/gpu/drm/i915/i915_pci.c:998 (discriminator 1)) i915
[ 45.657875][ T194] ? __pfx_i915_pci_probe (drivers/gpu/drm/i915/i915_pci.c:959) i915
[ 45.663785][ T194] local_pci_probe (drivers/pci/pci-driver.c:324) 
[ 45.668340][ T194] pci_call_probe (drivers/pci/pci-driver.c:392 (discriminator 1)) 
[ 45.672893][ T194] ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114) 
[ 45.678141][ T194] ? __pfx_pci_call_probe (drivers/pci/pci-driver.c:352) 
[ 45.683232][ T194] ? pci_match_device (drivers/pci/pci-driver.c:159) 
[ 45.688169][ T194] ? pci_match_device (drivers/pci/pci-driver.c:159 (discriminator 1)) 
[ 45.693077][ T194] ? kernfs_put (arch/x86/include/asm/atomic.h:67 (discriminator 1) include/linux/atomic/atomic-arch-fallback.h:2278 (discriminator 1) include/linux/atomic/atomic-instrumented.h:1384 (discriminator 1) fs/kernfs/dir.c:557 (discriminator 1)) 
[ 45.697294][ T194] pci_device_probe (drivers/pci/pci-driver.c:452) 
[ 45.701942][ T194] ? pci_dma_configure (drivers/pci/pci-driver.c:1656) 
[ 45.706941][ T194] really_probe (drivers/base/dd.c:578 drivers/base/dd.c:656) 
[ 45.711328][ T194] __driver_probe_device (drivers/base/dd.c:798) 
[ 45.716492][ T194] driver_probe_device (drivers/base/dd.c:828) 
[ 45.721401][ T194] __driver_attach (drivers/base/dd.c:1215) 
[ 45.726051][ T194] ? __pfx___driver_attach (drivers/base/dd.c:1155) 
[ 45.731235][ T194] bus_for_each_dev (drivers/base/bus.c:367) 
[ 45.735991][ T194] ? lockdep_init_map_type (kernel/locking/lockdep.c:4892 (discriminator 1)) 
[ 45.741332][ T194] ? __pfx_bus_for_each_dev (drivers/base/bus.c:356) 
[ 45.746592][ T194] ? bus_add_driver (drivers/base/bus.c:672) 
[ 45.751332][ T194] bus_add_driver (drivers/base/bus.c:673) 
[ 45.755895][ T194] driver_register (drivers/base/driver.c:246) 
[ 45.760539][ T194] i915_init (drivers/gpu/drm/i915/i915_driver.c:1436) i915
[ 45.765494][ T194] ? __pfx_i915_init (drivers/gpu/drm/i915/i915_config.c:13) i915
[ 45.770985][ T194] do_one_initcall (init/main.c:1267) 
[ 45.775534][ T194] ? asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702) 
[ 45.781568][ T194] ? __pfx_do_one_initcall (init/main.c:1258) 
[ 45.786741][ T194] ? kasan_unpoison (mm/kasan/shadow.c:156 mm/kasan/shadow.c:182) 
[ 45.791304][ T194] do_init_module (kernel/module/main.c:2541) 
[ 45.795868][ T194] init_module_from_file (kernel/module/main.c:3173) 
[ 45.800951][ T194] ? __pfx_init_module_from_file (kernel/module/main.c:3149) 
[ 45.806647][ T194] ? __lock_release+0x103/0x440 
[ 45.811991][ T194] ? idempotent_init_module (kernel/module/main.c:3119 kernel/module/main.c:3184) 
[ 45.817423][ T194] ? idempotent_init_module (kernel/module/main.c:3119 kernel/module/main.c:3184) 
[ 45.822857][ T194] ? do_raw_spin_unlock (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/asm-generic/qspinlock.h:57 kernel/locking/spinlock_debug.c:101 kernel/locking/spinlock_debug.c:141) 
[ 45.827853][ T194] idempotent_init_module (kernel/module/main.c:3190) 
[ 45.833104][ T194] ? __pfx_idempotent_init_module (kernel/module/main.c:3177) 
[ 45.838879][ T194] ? __seccomp_filter (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/seccomp.c:359 kernel/seccomp.c:386 kernel/seccomp.c:418 kernel/seccomp.c:1222) 
[ 45.843791][ T194] ? security_capable (security/security.c:1036 (discriminator 13)) 
[ 45.848531][ T194] __x64_sys_finit_module (include/linux/file.h:47 kernel/module/main.c:3212 kernel/module/main.c:3194 kernel/module/main.c:3194) 
[ 45.853704][ T194] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) 
[ 45.858092][ T194] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
[   45.863871][  T194] RIP: 0033:0x7f5179545719
[ 45.868164][ T194] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
All code
========
   0:	08 89 e8 5b 5d c3    	or     %cl,-0x3ca2a418(%rcx)
   6:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
   d:	00 00 00 
  10:	90                   	nop
  11:	48 89 f8             	mov    %rdi,%rax
  14:	48 89 f7             	mov    %rsi,%rdi
  17:	48 89 d6             	mov    %rdx,%rsi
  1a:	48 89 ca             	mov    %rcx,%rdx
  1d:	4d 89 c2             	mov    %r8,%r10
  20:	4d 89 c8             	mov    %r9,%r8
  23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
  28:	0f 05                	syscall 
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	retq   
  33:	48 8b 0d b7 06 0d 00 	mov    0xd06b7(%rip),%rcx        # 0xd06f1
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	retq   
   9:	48 8b 0d b7 06 0d 00 	mov    0xd06b7(%rip),%rcx        # 0xd06c7
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240805/202408051423.839b16eb-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister
  2024-07-24 15:39         ` Lucas De Marchi
@ 2024-09-09 21:03           ` Lucas De Marchi
  0 siblings, 0 replies; 20+ messages in thread
From: Lucas De Marchi @ 2024-09-09 21:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tvrtko Ursulin, intel-gfx, linux-perf-users, Tvrtko Ursulin,
	dri-devel, Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel

Hi Peter,

On Wed, Jul 24, 2024 at 10:39:57AM GMT, Lucas De Marchi wrote:
>On Wed, Jul 24, 2024 at 02:41:05PM GMT, Peter Zijlstra wrote:
>>On Tue, Jul 23, 2024 at 10:30:08AM -0500, Lucas De Marchi wrote:
>>>On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:
>>>>
>>>> On 22/07/2024 22:06, Lucas De Marchi wrote:
>>>> > Instead of calling perf_pmu_unregister() when unbinding, defer that to
>>>> > the destruction of i915 object. Since perf itself holds a reference in
>>>> > the event, this only happens when all events are gone, which guarantees
>>>> > i915 is not unregistering the pmu with live events.
>>>> >
>>>> > Previously, running the following sequence would crash the system after
>>>> > ~2 tries:
>>>> >
>>>> > 	1) bind device to i915
>>>> > 	2) wait events to show up on sysfs
>>>> > 	3) start perf  stat -I 1000 -e i915/rcs0-busy/
>>>> > 	4) unbind driver
>>>> > 	5) kill perf
>>>> >
>>>> > Most of the time this crashes in perf_pmu_disable() while accessing the
>>>> > percpu pmu_disable_count. This happens because perf_pmu_unregister()
>>>> > destroys it with free_percpu(pmu->pmu_disable_count).
>>>> >
>>>> > With a lazy unbind, the pmu is only unregistered after (5) as opposed to
>>>> > after (4). The downside is that if a new bind operation is attempted for
>>>> > the same device/driver without killing the perf process, i915 will fail
>>>> > to register the pmu (but still load successfully). This seems better
>>>> > than completely crashing the system.
>>>>
>>>> So effectively allows unbind to succeed without fully unbinding the
>>>> driver from the device? That sounds like a significant drawback and if
>>>> so, I wonder if a more complicated solution wouldn't be better after
>>>> all. Or is there precedence for allowing userspace keeping their paws on
>>>> unbound devices in this way?
>>>
>>>keeping the resources alive but "unplunged" while the hardware
>>>disappeared is a common thing to do... it's the whole point of the
>>>drmm-managed resource for example. If you bind the driver and then
>>>unbind it while userspace is holding a ref, next time you try to bind it
>>>will come up with a different card number. A similar thing that could be
>>>done is to adjust the name of the event - currently we add the mangled
>>>pci slot.
>>>
>>>That said, I agree a better approach would be to allow
>>>perf_pmu_unregister() to do its job even when there are open events. On
>>>top of that (or as a way to help achieve that), make perf core replace
>>>the callbacks with stubs when pmu is unregistered - that would even kill
>>>the need for i915's checks on pmu->closed (and fix the lack thereof in
>>>other drivers).
>>>
>>>It can be a can of worms though and may be pushed back by perf core
>>>maintainers, so it'd be good have their feedback.
>>
>>I don't think I understand the problem. I also don't understand drivers
>>much -- so that might be the problem.
>
>We can bind/unbind the driver to/from the pci device. Example:
>
>	echo -n "0000:00:02.0" > /sys/bus/pci/drivers/i915/unbind
>
>This is essentially unplugging the HW from the kernel.  This will
>trigger the driver to deinitialize and free up all resources use by that
>device.
>
>So when the driver is binding it does:
>
>	perf_pmu_register();
>
>When it's unbinding:
>
>	perf_pmu_unregister();
>	
>Reasons to unbind include:
>
>	- driver testing so then we can unload the module and load it
>	  again
>	- device is toast - doing an FLR and rebinding may
>	  fix/workaround it
>	- For SR-IOV, in which we are exposing multiple instances of the
>	  same underlying PCI device, we may need to bind/unbind
>	  on-demand  (it's not yet clear if perf_pmu_register() would be
>	  called on the VF instances, but listed here just to explain
>	  the bind/unbind)
>	- Hotplug
>
>>
>>So the problem appears to be that the device just disappears without
>>warning? How can a GPU go away like that?
>>
>>Since you have a notion of this device, can't you do this stubbing you
>>talk about? That is, if your internal device reference becomes NULL, let
>>the PMU methods preserve the state like no-ops.
>
>It's not clear if you are suggesting these stubs to be in each driver or
>to be assigned by perf in perf_pmu_unregister(). Some downsides
>of doing it in the driver:
>
>	- you can't remove the module as perf will still call module
>	  code
>	- need to replicate the stubs in every driver (or the
>	  equivalent of pmu->closed checks in i915_pmu.c)
>
>I *think* the stubs would be quiet the same for every device, so could
>be feasible to share them inside perf. Alternatively it could simply
>shortcut the call, without stubs, by looking at event->hw.state and
>a new pmu->state. I can give this a try.

trying to revive these patches to get this fixed. Are you ok with one of
those approaches, i.e. a) either add the stub in the perf side or b)
shortcut the calls after perf_pmu_unregister() is called ?


Lucas De Marchi

>
>thanks
>Lucas De Marchi
>
>>
>>And then when the last event goes away, tear down the whole thing.
>>
>>Again, I'm not sure I'm following.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-09-09 21:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 21:06 [PATCH 0/7] Fix i915 pmu on bind/unbind Lucas De Marchi
2024-07-22 21:06 ` [PATCH 1/7] perf/core: Add pmu get/put Lucas De Marchi
2024-07-23 23:07   ` Ian Rogers
2024-07-22 21:06 ` [PATCH 2/7] drm/i915/pmu: Fix crash due to use-after-free Lucas De Marchi
2024-07-22 21:06 ` [PATCH 3/7] drm/i915/pmu: Use event_to_pmu() Lucas De Marchi
2024-07-23  4:35   ` Dixit, Ashutosh
2024-07-22 21:06 ` [PATCH 4/7] drm/i915/pmu: Drop is_igp() Lucas De Marchi
2024-07-22 23:25   ` Dixit, Ashutosh
2024-07-23  7:52   ` Tvrtko Ursulin
2024-07-22 21:06 ` [PATCH 5/7] drm/i915/pmu: Let resource survive unbind Lucas De Marchi
2024-07-23  7:58   ` Tvrtko Ursulin
2024-07-22 21:06 ` [PATCH 6/7] drm/i915/pmu: Lazy unregister Lucas De Marchi
2024-07-23  8:03   ` Tvrtko Ursulin
2024-07-23 15:30     ` Lucas De Marchi
2024-07-24  7:48       ` Tvrtko Ursulin
2024-07-24 12:41       ` Peter Zijlstra
2024-07-24 15:39         ` Lucas De Marchi
2024-09-09 21:03           ` Lucas De Marchi
2024-07-22 21:06 ` [PATCH 7/7] drm/i915/pmu: Do not set event_init to NULL Lucas De Marchi
2024-08-05  6:55   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).