Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/20] core_hotunplug improvements
@ 2024-05-10 18:12 Matthew Auld
  2024-05-10 18:12 ` [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC Matthew Auld
                   ` (22 more replies)
  0 siblings, 23 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe

Currently core_hotunplug or in general trying to hotunplug the device is broken in a number of
ways.

-- 
2.45.0


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:06   ` Andrzej Hajda
  2024-05-13 15:38   ` Rodrigo Vivi
  2024-05-10 18:12 ` [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata Matthew Auld
                   ` (21 subsequent siblings)
  22 siblings, 2 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Daniel Vetter, dri-devel

Hopefully make it clearer when to use devm vs drmm.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/drm_managed.c | 42 +++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c
index 7646f67bda4e..20d705bbc0a3 100644
--- a/drivers/gpu/drm/drm_managed.c
+++ b/drivers/gpu/drm/drm_managed.c
@@ -34,6 +34,48 @@
  * during the lifetime of the driver, all the functions are fully concurrent
  * safe. But it is recommended to use managed resources only for resources that
  * change rarely, if ever, during the lifetime of the &drm_device instance.
+ *
+ * Note that the distinction between devm and drmm is important to get right.
+ * Consider some hotunplug scenarios, where it is valid for there to be multiple
+ * unplugged struct &drm_device instances each being kept alive by an open
+ * driver fd. The driver needs a clean separation between what needs to happen
+ * when the struct &device is removed and what needs to happen when a given
+ * struct &drm_device instance is released, as well as in some cases a more
+ * finer grained marking of critical sections that require hardware interaction.
+ * See below.
+ *
+ * devm
+ * ~~~~
+ * In general use devm for cleaning up anything hardware related. So removing
+ * pci mmaps, releasing interrupt handlers, basically anything hw related.  The
+ * devm release actions are called when the struct &device is removed, shortly
+ * after calling into the drivers struct &pci_driver.remove() callback, if this
+ * is a pci device.
+ *
+ * devm can be thought of as an alternative to putting all the hw related
+ * cleanup directly in the struct &pci_driver.remove() callback, where the
+ * correct ordering of the unwind steps needs to be manually done in the error
+ * path of the struct &pci_driver.probe() and again on the remove side.  With
+ * devm this is all done automatically.
+ *
+ * drmm
+ * ~~~~
+ * In general use this for cleaning up anything software related. So data
+ * structures and the like which are tied to the lifetime of a particular struct
+ * &drm_device instance.
+ *
+ * drmm can be thought of as an alternative to putting all the software related
+ * cleanup directly in the struct &drm_driver.release() callback, where again
+ * the correct ordering of the unwind steps needs to be done manually. As with
+ * devm this is instead done automatically.
+ *
+ * Sometimes there is no clean separation between software and hardware, which
+ * is where drm_dev_enter() comes in. For example, a driver might have some
+ * state tied to a struct &drm_device instance, for which the same cleanup path
+ * is called for both a plugged and unplugged device, and the cleanup itself
+ * might require talking to the device if it's still attached to this particular
+ * struct &drm_device. For that we instead mark the device sections.  See
+ * drm_dev_enter(), drm_dev_exit() and drm_dev_unplug().
  */
 
 struct drmres_node {
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
  2024-05-10 18:12 ` [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13 11:54   ` Christian König
  2024-05-10 18:12 ` [PATCH 03/20] drm/xe/pci: remove broken driver_release Matthew Auld
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Christian König, Daniel Vetter, amd-gfx

The driver release callback is called when a particular drm_device goes
away, just like with drmm, so here we should never nuke the pdev drvdata
pointer, since that could already be pointing to a new drvdata.
For example something hotunplugs the device, for which we have an open
driver fd, keeping the drm_device alive, and in the meantime the same
physical device is re-attached to a new drm_device therefore setting
drvdata again. Once the original drm_device goes away, we might then
call the release which then incorrectly tramples the drvdata.

The driver core will already nuke the pointer for us when the pci device
is removed, so should be safe to simply drop. Alternative would be to
move to the driver pci remove callback.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a0ea6fe8d060..d5fed007c698 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1450,7 +1450,6 @@ void amdgpu_driver_release_kms(struct drm_device *dev)
 	struct amdgpu_device *adev = drm_to_adev(dev);
 
 	amdgpu_device_fini_sw(adev);
-	pci_set_drvdata(adev->pdev, NULL);
 }
 
 /*
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 03/20] drm/xe/pci: remove broken driver_release
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
  2024-05-10 18:12 ` [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC Matthew Auld
  2024-05-10 18:12 ` [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:14   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 04/20] drm/xe: covert sysfs over to devm Matthew Auld
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

This is quite broken since we are nuking the pdev link to the private
driver struct, but note here that driver_release is called when the
drm_device is released (poor mans drmm), which can be long after the
device has been removed. So here what we are actually doing is nuking
the pdev link for what is potentially bound to a different drm_device.
If that happens before our pci remove callback is triggered (for the new
drm_device) we silently exit and skip some important cleanup steps,
resulting in hilarity.

There should be no reason to implement driver_release, when we already
have nicer stuff like drmm, so just remove completely. The actual pdev
link is already nuked when removing the device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index f89c781986a5..d6f9880a15fd 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -188,13 +188,6 @@ static const struct file_operations xe_driver_fops = {
 #endif
 };
 
-static void xe_driver_release(struct drm_device *dev)
-{
-	struct xe_device *xe = to_xe_device(dev);
-
-	pci_set_drvdata(to_pci_dev(xe->drm.dev), NULL);
-}
-
 static struct drm_driver driver = {
 	/* Don't use MTRRs here; the Xserver or userspace app should
 	 * deal with them for Intel hardware.
@@ -213,8 +206,6 @@ static struct drm_driver driver = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo = xe_drm_client_fdinfo,
 #endif
-	.release = &xe_driver_release,
-
 	.ioctls = xe_ioctls,
 	.num_ioctls = ARRAY_SIZE(xe_ioctls),
 	.fops = &xe_driver_fops,
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 04/20] drm/xe: covert sysfs over to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (2 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 03/20] drm/xe/pci: remove broken driver_release Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-10 18:12 ` [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section Matthew Auld
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi, Andrzej Hajda

Hotunplugging the device seems to result in stuff like:

kobject_add_internal failed for tile0 with -EEXIST, don't try to
register things with the same name in the same directory.

We only remove the sysfs as part of drmm, however that is tied to the
lifetime of the driver instance and not the device underneath. Attempt
to fix by using devm for all of the remaining sysfs stuff related to the
device.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1667
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1432
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_device_sysfs.c          |  4 ++--
 drivers/gpu/drm/xe/xe_gt_ccs_mode.c           |  4 ++--
 drivers/gpu/drm/xe/xe_gt_freq.c               |  4 ++--
 drivers/gpu/drm/xe/xe_gt_idle.c               |  4 ++--
 drivers/gpu/drm/xe/xe_gt_sysfs.c              |  4 ++--
 drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c     |  4 ++--
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c | 12 ++++++------
 drivers/gpu/drm/xe/xe_tile_sysfs.c            |  4 ++--
 drivers/gpu/drm/xe/xe_vram_freq.c             |  4 ++--
 9 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c
index 21677b8cd977..7375937934fa 100644
--- a/drivers/gpu/drm/xe/xe_device_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
@@ -69,7 +69,7 @@ vram_d3cold_threshold_store(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR_RW(vram_d3cold_threshold);
 
-static void xe_device_sysfs_fini(struct drm_device *drm, void *arg)
+static void xe_device_sysfs_fini(void *arg)
 {
 	struct xe_device *xe = arg;
 
@@ -85,5 +85,5 @@ int xe_device_sysfs_init(struct xe_device *xe)
 	if (ret)
 		return ret;
 
-	return drmm_add_action_or_reset(&xe->drm, xe_device_sysfs_fini, xe);
+	return devm_add_action_or_reset(dev, xe_device_sysfs_fini, xe);
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
index a34c9a24dafc..5d6ab24348a7 100644
--- a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
+++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
@@ -150,7 +150,7 @@ static const struct attribute *gt_ccs_mode_attrs[] = {
 	NULL,
 };
 
-static void xe_gt_ccs_mode_sysfs_fini(struct drm_device *drm, void *arg)
+static void xe_gt_ccs_mode_sysfs_fini(void *arg)
 {
 	struct xe_gt *gt = arg;
 
@@ -182,5 +182,5 @@ int xe_gt_ccs_mode_sysfs_init(struct xe_gt *gt)
 	if (err)
 		return err;
 
-	return drmm_add_action_or_reset(&xe->drm, xe_gt_ccs_mode_sysfs_fini, gt);
+	return devm_add_action_or_reset(xe->drm.dev, xe_gt_ccs_mode_sysfs_fini, gt);
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_freq.c b/drivers/gpu/drm/xe/xe_gt_freq.c
index 855de40e40ea..e0305942644c 100644
--- a/drivers/gpu/drm/xe/xe_gt_freq.c
+++ b/drivers/gpu/drm/xe/xe_gt_freq.c
@@ -209,7 +209,7 @@ static const struct attribute *freq_attrs[] = {
 	NULL
 };
 
-static void freq_fini(struct drm_device *drm, void *arg)
+static void freq_fini(void *arg)
 {
 	struct kobject *kobj = arg;
 
@@ -237,7 +237,7 @@ int xe_gt_freq_init(struct xe_gt *gt)
 	if (!gt->freq)
 		return -ENOMEM;
 
-	err = drmm_add_action_or_reset(&xe->drm, freq_fini, gt->freq);
+	err = devm_add_action(xe->drm.dev, freq_fini, gt->freq);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c
index a4f6f0a96d05..749b7f4b688f 100644
--- a/drivers/gpu/drm/xe/xe_gt_idle.c
+++ b/drivers/gpu/drm/xe/xe_gt_idle.c
@@ -145,7 +145,7 @@ static const struct attribute *gt_idle_attrs[] = {
 	NULL,
 };
 
-static void gt_idle_sysfs_fini(struct drm_device *drm, void *arg)
+static void gt_idle_sysfs_fini(void *arg)
 {
 	struct kobject *kobj = arg;
 
@@ -182,7 +182,7 @@ int xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle)
 		return err;
 	}
 
-	return drmm_add_action_or_reset(&xe->drm, gt_idle_sysfs_fini, kobj);
+	return devm_add_action_or_reset(xe->drm.dev, gt_idle_sysfs_fini, kobj);
 }
 
 void xe_gt_idle_enable_c6(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_gt_sysfs.c b/drivers/gpu/drm/xe/xe_gt_sysfs.c
index 1e5971072bc8..a05c3699e8b9 100644
--- a/drivers/gpu/drm/xe/xe_gt_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_sysfs.c
@@ -22,7 +22,7 @@ static const struct kobj_type xe_gt_sysfs_kobj_type = {
 	.sysfs_ops = &kobj_sysfs_ops,
 };
 
-static void gt_sysfs_fini(struct drm_device *drm, void *arg)
+static void gt_sysfs_fini(void *arg)
 {
 	struct xe_gt *gt = arg;
 
@@ -51,5 +51,5 @@ int xe_gt_sysfs_init(struct xe_gt *gt)
 
 	gt->sysfs = &kg->base;
 
-	return drmm_add_action_or_reset(&xe->drm, gt_sysfs_fini, gt);
+	return devm_add_action(xe->drm.dev, gt_sysfs_fini, gt);
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c b/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c
index fbe21a8599ca..c9e04151286d 100644
--- a/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c
@@ -229,7 +229,7 @@ static const struct attribute_group throttle_group_attrs = {
 	.attrs = throttle_attrs,
 };
 
-static void gt_throttle_sysfs_fini(struct drm_device *drm, void *arg)
+static void gt_throttle_sysfs_fini(void *arg)
 {
 	struct xe_gt *gt = arg;
 
@@ -245,5 +245,5 @@ int xe_gt_throttle_sysfs_init(struct xe_gt *gt)
 	if (err)
 		return err;
 
-	return drmm_add_action_or_reset(&xe->drm, gt_throttle_sysfs_fini, gt);
+	return devm_add_action_or_reset(xe->drm.dev, gt_throttle_sysfs_fini, gt);
 }
diff --git a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
index 844ec68cbbb8..258078a6b461 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
@@ -492,7 +492,7 @@ static const struct attribute * const files[] = {
 	NULL
 };
 
-static void kobj_xe_hw_engine_class_fini(struct drm_device *drm, void *arg)
+static void kobj_xe_hw_engine_class_fini(void *arg)
 {
 	struct kobject *kobj = arg;
 
@@ -517,7 +517,7 @@ kobj_xe_hw_engine_class(struct xe_device *xe, struct kobject *parent, const char
 	}
 	keclass->xe = xe;
 
-	err = drmm_add_action_or_reset(&xe->drm, kobj_xe_hw_engine_class_fini,
+	err = devm_add_action_or_reset(xe->drm.dev, kobj_xe_hw_engine_class_fini,
 				       &keclass->base);
 	if (err)
 		return NULL;
@@ -525,7 +525,7 @@ kobj_xe_hw_engine_class(struct xe_device *xe, struct kobject *parent, const char
 	return keclass;
 }
 
-static void hw_engine_class_defaults_fini(struct drm_device *drm, void *arg)
+static void hw_engine_class_defaults_fini(void *arg)
 {
 	struct kobject *kobj = arg;
 
@@ -552,7 +552,7 @@ static int xe_add_hw_engine_class_defaults(struct xe_device *xe,
 	if (err)
 		goto err_object;
 
-	return drmm_add_action_or_reset(&xe->drm, hw_engine_class_defaults_fini, kobj);
+	return devm_add_action_or_reset(xe->drm.dev, hw_engine_class_defaults_fini, kobj);
 
 err_object:
 	kobject_put(kobj);
@@ -611,7 +611,7 @@ static const struct kobj_type xe_hw_engine_sysfs_kobj_type = {
 	.sysfs_ops = &xe_hw_engine_class_sysfs_ops,
 };
 
-static void hw_engine_class_sysfs_fini(struct drm_device *drm, void *arg)
+static void hw_engine_class_sysfs_fini(void *arg)
 {
 	struct kobject *kobj = arg;
 
@@ -698,7 +698,7 @@ int xe_hw_engine_class_sysfs_init(struct xe_gt *gt)
 			goto err_object;
 	}
 
-	return drmm_add_action_or_reset(&xe->drm, hw_engine_class_sysfs_fini, kobj);
+	return devm_add_action_or_reset(xe->drm.dev, hw_engine_class_sysfs_fini, kobj);
 
 err_object:
 	kobject_put(kobj);
diff --git a/drivers/gpu/drm/xe/xe_tile_sysfs.c b/drivers/gpu/drm/xe/xe_tile_sysfs.c
index 64661403afcd..b804234a6551 100644
--- a/drivers/gpu/drm/xe/xe_tile_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_tile_sysfs.c
@@ -22,7 +22,7 @@ static const struct kobj_type xe_tile_sysfs_kobj_type = {
 	.sysfs_ops = &kobj_sysfs_ops,
 };
 
-static void tile_sysfs_fini(struct drm_device *drm, void *arg)
+static void tile_sysfs_fini(void *arg)
 {
 	struct xe_tile *tile = arg;
 
@@ -55,5 +55,5 @@ int xe_tile_sysfs_init(struct xe_tile *tile)
 	if (err)
 		return err;
 
-	return drmm_add_action_or_reset(&xe->drm, tile_sysfs_fini, tile);
+	return devm_add_action_or_reset(xe->drm.dev, tile_sysfs_fini, tile);
 }
diff --git a/drivers/gpu/drm/xe/xe_vram_freq.c b/drivers/gpu/drm/xe/xe_vram_freq.c
index 3e21ddc6e60c..99ff95e408e0 100644
--- a/drivers/gpu/drm/xe/xe_vram_freq.c
+++ b/drivers/gpu/drm/xe/xe_vram_freq.c
@@ -87,7 +87,7 @@ static const struct attribute_group freq_group_attrs = {
 	.attrs = freq_attrs,
 };
 
-static void vram_freq_sysfs_fini(struct drm_device *drm, void *arg)
+static void vram_freq_sysfs_fini(void *arg)
 {
 	struct kobject *kobj = arg;
 
@@ -122,5 +122,5 @@ int xe_vram_freq_sysfs_init(struct xe_tile *tile)
 		return err;
 	}
 
-	return drmm_add_action_or_reset(&xe->drm, vram_freq_sysfs_fini, kobj);
+	return devm_add_action_or_reset(xe->drm.dev, vram_freq_sysfs_fini, kobj);
 }
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (3 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 04/20] drm/xe: covert sysfs over to devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:16   ` Andrzej Hajda
  2024-05-13 19:34   ` Randhawa, Jagmeet
  2024-05-10 18:12 ` [PATCH 06/20] drm/xe/guc: move guc_fini over to devm Matthew Auld
                   ` (17 subsequent siblings)
  22 siblings, 2 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Device can be hotunplugged before we start destroying gem objects. In
such a case don't touch the GGTT entries, trigger any invalidations or
mess around with rpm.  This should already be taken care of when
removing the device, we just need to take care of dealing with the
software state, like removing the mm node.

v2: (Andrzej)
  - Avoid some duplication by tracking the bound status and checking
    that instead.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ggtt.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 0d541f55b4fc..17e5066763db 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -8,6 +8,7 @@
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/sizes.h>
 
+#include <drm/drm_drv.h>
 #include <drm/drm_managed.h>
 #include <drm/i915_drm.h>
 
@@ -433,18 +434,29 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
 void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
 			 bool invalidate)
 {
-	xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile));
+	struct xe_device *xe = tile_to_xe(ggtt->tile);
+	bool bound;
+	int idx;
+
+	bound = drm_dev_enter(&xe->drm, &idx);
+	if (bound)
+		xe_pm_runtime_get_noresume(xe);
 
 	mutex_lock(&ggtt->lock);
-	xe_ggtt_clear(ggtt, node->start, node->size);
+	if (bound)
+		xe_ggtt_clear(ggtt, node->start, node->size);
 	drm_mm_remove_node(node);
 	node->size = 0;
 	mutex_unlock(&ggtt->lock);
 
+	if (!bound)
+		return;
+
 	if (invalidate)
 		xe_ggtt_invalidate(ggtt);
 
-	xe_pm_runtime_put(tile_to_xe(ggtt->tile));
+	xe_pm_runtime_put(xe);
+	drm_dev_exit(idx);
 }
 
 void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/20] drm/xe/guc: move guc_fini over to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (4 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-10 18:12 ` [PATCH 07/20] drm/xe/guc: s/guc_fini/guc_fini_hw/ Matthew Auld
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi, Andrzej Hajda

Make sure to actually call this when the device is removed. Currently we
only trigger it when the driver instance goes away, but that doesn't
work too well with hotunplug, since device can be removed and re-probed
with a new driver instance, where the guc_fini() is called too late.
Move the fini over to devm to ensure this is called when device is
removed.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_guc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 0c9938e0ab8c..97e2b62df486 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -239,7 +239,7 @@ static void guc_write_params(struct xe_guc *guc)
 		xe_mmio_write32(gt, SOFT_SCRATCH(1 + i), guc->params[i]);
 }
 
-static void guc_fini(struct drm_device *drm, void *arg)
+static void guc_fini(void *arg)
 {
 	struct xe_guc *guc = arg;
 	struct xe_gt *gt = guc_to_gt(guc);
@@ -323,7 +323,7 @@ int xe_guc_init(struct xe_guc *guc)
 	if (ret)
 		goto out;
 
-	ret = drmm_add_action_or_reset(&xe->drm, guc_fini, guc);
+	ret = devm_add_action_or_reset(xe->drm.dev, guc_fini, guc);
 	if (ret)
 		goto out;
 
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/20] drm/xe/guc: s/guc_fini/guc_fini_hw/
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (5 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 06/20] drm/xe/guc: move guc_fini over to devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:30   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 08/20] drm/xe/guc_pc: move pc_fini to devm Matthew Auld
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_guc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 97e2b62df486..90d994e85e43 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -239,7 +239,7 @@ static void guc_write_params(struct xe_guc *guc)
 		xe_mmio_write32(gt, SOFT_SCRATCH(1 + i), guc->params[i]);
 }
 
-static void guc_fini(void *arg)
+static void guc_fini_hw(void *arg)
 {
 	struct xe_guc *guc = arg;
 	struct xe_gt *gt = guc_to_gt(guc);
@@ -323,7 +323,7 @@ int xe_guc_init(struct xe_guc *guc)
 	if (ret)
 		goto out;
 
-	ret = devm_add_action_or_reset(xe->drm.dev, guc_fini, guc);
+	ret = devm_add_action_or_reset(xe->drm.dev, guc_fini_hw, guc);
 	if (ret)
 		goto out;
 
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/20] drm/xe/guc_pc: move pc_fini to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (6 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 07/20] drm/xe/guc: s/guc_fini/guc_fini_hw/ Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-10 18:12 ` [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Matthew Auld
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi, Andrzej Hajda

Here we are touching the HW/GuC and presumably this should happen when
the device is removed. Currently if you hotunplug the device this is
skipped if there is already open driver instance.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index d10aab29651e..4164dab1ab66 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -893,7 +893,7 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc)
  * @drm: DRM device
  * @arg: opaque pointer that should point to Xe_GuC_PC instance
  */
-static void xe_guc_pc_fini(struct drm_device *drm, void *arg)
+static void xe_guc_pc_fini(void *arg)
 {
 	struct xe_guc_pc *pc = arg;
 	struct xe_device *xe = pc_to_xe(pc);
@@ -941,5 +941,5 @@ int xe_guc_pc_init(struct xe_guc_pc *pc)
 
 	pc->bo = bo;
 
-	return drmm_add_action_or_reset(&xe->drm, xe_guc_pc_fini, pc);
+	return devm_add_action_or_reset(xe->drm.dev, xe_guc_pc_fini, pc);
 }
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (7 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 08/20] drm/xe/guc_pc: move pc_fini to devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-11  9:19   ` kernel test robot
  2024-05-13  7:37   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 10/20] drm/xe/irq: move irq_uninstall over to devm Matthew Auld
                   ` (13 subsequent siblings)
  22 siblings, 2 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_pc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index 4164dab1ab66..aabe241be42b 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -889,11 +889,11 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc)
 }
 
 /**
- * xe_guc_pc_fini - Finalize GuC's Power Conservation component
+ * xe_guc_pc_fini_hw - Finalize GuC's Power Conservation component
  * @drm: DRM device
  * @arg: opaque pointer that should point to Xe_GuC_PC instance
  */
-static void xe_guc_pc_fini(void *arg)
+static void xe_guc_pc_fini_hw(void *arg)
 {
 	struct xe_guc_pc *pc = arg;
 	struct xe_device *xe = pc_to_xe(pc);
@@ -941,5 +941,5 @@ int xe_guc_pc_init(struct xe_guc_pc *pc)
 
 	pc->bo = bo;
 
-	return devm_add_action_or_reset(xe->drm.dev, xe_guc_pc_fini, pc);
+	return devm_add_action_or_reset(xe->drm.dev, xe_guc_pc_fini_hw, pc);
 }
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/20] drm/xe/irq: move irq_uninstall over to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (8 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-10 18:12 ` [PATCH 11/20] drm/xe/device: move flr " Matthew Auld
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi, Andrzej Hajda

Makes sense to trigger this when the device is removed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_irq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
index 996806353171..8ee3c300c5e4 100644
--- a/drivers/gpu/drm/xe/xe_irq.c
+++ b/drivers/gpu/drm/xe/xe_irq.c
@@ -663,7 +663,7 @@ static irq_handler_t xe_irq_handler(struct xe_device *xe)
 		return xelp_irq_handler;
 }
 
-static void irq_uninstall(struct drm_device *drm, void *arg)
+static void irq_uninstall(void *arg)
 {
 	struct xe_device *xe = arg;
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
@@ -723,7 +723,7 @@ int xe_irq_install(struct xe_device *xe)
 
 	xe_irq_postinstall(xe);
 
-	err = drmm_add_action_or_reset(&xe->drm, irq_uninstall, xe);
+	err = devm_add_action_or_reset(xe->drm.dev, irq_uninstall, xe);
 	if (err)
 		goto free_irq_handler;
 
@@ -737,7 +737,7 @@ int xe_irq_install(struct xe_device *xe)
 
 void xe_irq_shutdown(struct xe_device *xe)
 {
-	irq_uninstall(&xe->drm, xe);
+	irq_uninstall(xe);
 }
 
 void xe_irq_suspend(struct xe_device *xe)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 11/20] drm/xe/device: move flr to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (9 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 10/20] drm/xe/irq: move irq_uninstall over to devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-10 18:12 ` [PATCH 12/20] drm/xe/device: move xe_device_sanitize over " Matthew Auld
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi, Andrzej Hajda

Should be called when driver is removed, not when this particular driver
instance is destroyed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index d6f9880a15fd..35d11be56840 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -380,7 +380,7 @@ static void xe_driver_flr(struct xe_device *xe)
 	xe_mmio_write32(gt, GU_DEBUG, DRIVERFLR_STATUS);
 }
 
-static void xe_driver_flr_fini(struct drm_device *drm, void *arg)
+static void xe_driver_flr_fini(void *arg)
 {
 	struct xe_device *xe = arg;
 
@@ -582,7 +582,7 @@ int xe_device_probe(struct xe_device *xe)
 	err = xe_devcoredump_init(xe);
 	if (err)
 		return err;
-	err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe);
+	err = devm_add_action_or_reset(xe->drm.dev, xe_driver_flr_fini, xe);
 	if (err)
 		return err;
 
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 12/20] drm/xe/device: move xe_device_sanitize over to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (10 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 11/20] drm/xe/device: move flr " Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-10 18:12 ` [PATCH 13/20] drm/xe/coredump: move " Matthew Auld
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi, Andrzej Hajda

Disable GuC submission when removing the device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 35d11be56840..e4c30030fc40 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -388,7 +388,7 @@ static void xe_driver_flr_fini(void *arg)
 		xe_driver_flr(xe);
 }
 
-static void xe_device_sanitize(struct drm_device *drm, void *arg)
+static void xe_device_sanitize(void *arg)
 {
 	struct xe_device *xe = arg;
 	struct xe_gt *gt;
@@ -654,7 +654,7 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_hwmon_register(xe);
 
-	return drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe);
+	return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
 
 err_fini_display:
 	xe_display_driver_remove(xe);
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/20] drm/xe/coredump: move over to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (11 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 12/20] drm/xe/device: move xe_device_sanitize over " Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:38   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state Matthew Auld
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Here we are using drmm to ensure we release the coredump when unloading
the module, however the coredump is very much tied to the struct device
underneath. We can see this when we hotunplug the device, for which we
have already got a coredump attached. In such a case the coredump still
remains and adding another is not possible. However we still register
the release action via xe_driver_devcoredump_fini(), so in effect two or
more releases for one dump.  The other consideration is that the
coredump state is embedded in the xe_driver instance, so technically
once the drmm release action fires we might free the coredumpe state
from a different driver instance, assuming we have two release actions
and they can race. Rather use devm here to remove the coredump when the
device is released.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_devcoredump.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 3d7980232be1..e70aef797193 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -238,13 +238,15 @@ void xe_devcoredump(struct xe_sched_job *job)
 		      xe_devcoredump_read, xe_devcoredump_free);
 }
 
-static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg)
+static void xe_driver_devcoredump_fini(void *arg)
 {
+	struct drm_device *drm = arg;
+
 	dev_coredump_put(drm->dev);
 }
 
 int xe_devcoredump_init(struct xe_device *xe)
 {
-	return drmm_add_action_or_reset(&xe->drm, xe_driver_devcoredump_fini, xe);
+	return devm_add_action_or_reset(xe->drm.dev, xe_driver_devcoredump_fini, &xe->drm);
 }
 #endif
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (12 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 13/20] drm/xe/coredump: move " Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:50   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 15/20] drm/xe: make gt_remove use devm Matthew Auld
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Have a cleaner separation between hw vs sw.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 36c7b1631fa6..3f1826b783ad 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -57,9 +57,17 @@
 #include "xe_wa.h"
 #include "xe_wopcm.h"
 
+static void gt_fini(struct drm_device *drm, void *arg)
+{
+	struct xe_gt *gt = arg;
+
+	destroy_workqueue(gt->ordered_wq);
+}
+
 struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
 {
 	struct xe_gt *gt;
+	int err;
 
 	gt = drmm_kzalloc(&tile_to_xe(tile)->drm, sizeof(*gt), GFP_KERNEL);
 	if (!gt)
@@ -68,6 +76,10 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
 	gt->tile = tile;
 	gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
 
+	err = drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
+	if (err)
+		return ERR_PTR(err);
+
 	return gt;
 }
 
@@ -90,15 +102,9 @@ void xe_gt_sanitize(struct xe_gt *gt)
  */
 void xe_gt_remove(struct xe_gt *gt)
 {
-	xe_uc_remove(&gt->uc);
-}
-
-static void gt_fini(struct drm_device *drm, void *arg)
-{
-	struct xe_gt *gt = arg;
 	int i;
 
-	destroy_workqueue(gt->ordered_wq);
+	xe_uc_remove(&gt->uc);
 
 	for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
 		xe_hw_fence_irq_finish(&gt->fence_irq[i]);
@@ -567,7 +573,6 @@ int xe_gt_init(struct xe_gt *gt)
 	if (err)
 		return err;
 
-	return drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
 }
 
 static int do_gt_reset(struct xe_gt *gt)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 15/20] drm/xe: make gt_remove use devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (13 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  7:58   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 16/20] drm/xe/mmio: move mmio_fini over to devm Matthew Auld
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

No need to hand roll the onion unwind here, just move gt_remove over to
devm which will already have the correct ordering.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 22 ++--------------------
 drivers/gpu/drm/xe/xe_gt.c     |  9 ++++-----
 drivers/gpu/drm/xe/xe_gt.h     |  1 -
 3 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index e4c30030fc40..40983ad66e3c 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -537,7 +537,6 @@ int xe_device_probe(struct xe_device *xe)
 	struct xe_tile *tile;
 	struct xe_gt *gt;
 	int err;
-	u8 last_gt;
 	u8 id;
 
 	xe_pat_init_early(xe);
@@ -631,18 +630,16 @@ int xe_device_probe(struct xe_device *xe)
 		goto err_irq_shutdown;
 
 	for_each_gt(gt, xe, id) {
-		last_gt = id;
-
 		err = xe_gt_init(gt);
 		if (err)
-			goto err_fini_gt;
+			goto err_irq_shutdown;
 	}
 
 	xe_heci_gsc_init(xe);
 
 	err = xe_display_init(xe);
 	if (err)
-		goto err_fini_gt;
+		goto err_irq_shutdown;
 
 	err = drm_dev_register(&xe->drm, 0);
 	if (err)
@@ -658,15 +655,6 @@ int xe_device_probe(struct xe_device *xe)
 
 err_fini_display:
 	xe_display_driver_remove(xe);
-
-err_fini_gt:
-	for_each_gt(gt, xe, id) {
-		if (id < last_gt)
-			xe_gt_remove(gt);
-		else
-			break;
-	}
-
 err_irq_shutdown:
 	xe_irq_shutdown(xe);
 err:
@@ -684,18 +672,12 @@ static void xe_device_remove_display(struct xe_device *xe)
 
 void xe_device_remove(struct xe_device *xe)
 {
-	struct xe_gt *gt;
-	u8 id;
-
 	xe_device_remove_display(xe);
 
 	xe_display_fini(xe);
 
 	xe_heci_gsc_fini(xe);
 
-	for_each_gt(gt, xe, id)
-		xe_gt_remove(gt);
-
 	xe_irq_shutdown(xe);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 3f1826b783ad..00dbb84828ff 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -92,16 +92,14 @@ void xe_gt_sanitize(struct xe_gt *gt)
 	gt->uc.guc.submission_state.enabled = false;
 }
 
-/**
- * xe_gt_remove() - Clean up the GT structures before driver removal
- * @gt: the GT object
- *
+/*
  * This function should only act on objects/structures that must be cleaned
  * before the driver removal callback is complete and therefore can't be
  * deferred to a drmm action.
  */
-void xe_gt_remove(struct xe_gt *gt)
+static void gt_remove(void *arg)
 {
+	struct xe_gt *gt = arg;
 	int i;
 
 	xe_uc_remove(&gt->uc);
@@ -573,6 +571,7 @@ int xe_gt_init(struct xe_gt *gt)
 	if (err)
 		return err;
 
+	return devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, gt_remove, gt);
 }
 
 static int do_gt_reset(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 8474c50b1b30..866a49037d58 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -43,7 +43,6 @@ int xe_gt_suspend(struct xe_gt *gt);
 int xe_gt_resume(struct xe_gt *gt);
 void xe_gt_reset_async(struct xe_gt *gt);
 void xe_gt_sanitize(struct xe_gt *gt);
-void xe_gt_remove(struct xe_gt *gt);
 
 /**
  * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 16/20] drm/xe/mmio: move mmio_fini over to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (14 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 15/20] drm/xe: make gt_remove use devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  8:00   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 17/20] drm/xe: reset mmio mappings with devm Matthew Auld
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Not valid to touch mmio once the device is removed, so make sure we
unmap on removal and not just when driver instance goes away. Also set
the mmio pointers to NULL to hopefully catch such issues more easily.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_mmio.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index 05edab0e085d..a3094e741db8 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -387,13 +387,16 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
 	}
 }
 
-static void mmio_fini(struct drm_device *drm, void *arg)
+static void mmio_fini(void *arg)
 {
 	struct xe_device *xe = arg;
 
 	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
 	if (xe->mem.vram.mapping)
 		iounmap(xe->mem.vram.mapping);
+
+	xe->mem.vram.mapping = NULL;
+	xe->mmio.regs = NULL;
 }
 
 int xe_mmio_init(struct xe_device *xe)
@@ -418,7 +421,7 @@ int xe_mmio_init(struct xe_device *xe)
 	root_tile->mmio.size = SZ_16M;
 	root_tile->mmio.regs = xe->mmio.regs;
 
-	return drmm_add_action_or_reset(&xe->drm, mmio_fini, xe);
+	return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe);
 }
 
 u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 17/20] drm/xe: reset mmio mappings with devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (15 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 16/20] drm/xe/mmio: move mmio_fini over to devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  8:12   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 18/20] drm/xe/display: move display fini stuff to devm Matthew Auld
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Set our various mmio mappings to NULL. This should make it easier to
catch something rogue trying to mess with mmio after device removal. For
example, we might unmap everything and then start hitting some mmio
address which has already been unmamped by us and then remapped by
something else, causing all kinds of carnage.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c |  4 +++-
 drivers/gpu/drm/xe/xe_mmio.c   | 35 ++++++++++++++++++++++++++++------
 drivers/gpu/drm/xe/xe_mmio.h   |  2 +-
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 40983ad66e3c..ce8bccbb06ba 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -554,7 +554,9 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		return err;
 
-	xe_mmio_probe_tiles(xe);
+	err = xe_mmio_probe_tiles(xe);
+	if (err)
+		return err;
 
 	xe_ttm_sys_mgr_init(xe);
 
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index a3094e741db8..0ad840bab8f3 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -255,6 +255,21 @@ static int xe_mmio_tile_vram_size(struct xe_tile *tile, u64 *vram_size,
 	return xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 }
 
+static void vram_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+	struct xe_tile *tile;
+	int id;
+
+	if (xe->mem.vram.mapping)
+		iounmap(xe->mem.vram.mapping);
+
+	xe->mem.vram.mapping = NULL;
+
+	for_each_tile(tile, xe, id)
+		tile->mem.vram.mapping = NULL;
+}
+
 int xe_mmio_probe_vram(struct xe_device *xe)
 {
 	struct xe_tile *tile;
@@ -331,10 +346,20 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	drm_info(&xe->drm, "Available VRAM: %pa, %pa\n", &xe->mem.vram.io_start,
 		 &available_size);
 
-	return 0;
+	return devm_add_action_or_reset(xe->drm.dev, vram_fini, xe);
 }
 
-void xe_mmio_probe_tiles(struct xe_device *xe)
+static void tiles_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+	struct xe_tile *tile;
+	int id;
+
+	for_each_tile(tile, xe, id)
+		tile->mmio.regs = NULL;
+}
+
+int xe_mmio_probe_tiles(struct xe_device *xe)
 {
 	size_t tile_mmio_size = SZ_16M, tile_mmio_ext_size = xe->info.tile_mmio_ext_size;
 	u8 id, tile_count = xe->info.tile_count;
@@ -385,6 +410,8 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
 			regs += tile_mmio_ext_size;
 		}
 	}
+
+	return devm_add_action_or_reset(xe->drm.dev, tiles_fini, xe);
 }
 
 static void mmio_fini(void *arg)
@@ -392,10 +419,6 @@ static void mmio_fini(void *arg)
 	struct xe_device *xe = arg;
 
 	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
-	if (xe->mem.vram.mapping)
-		iounmap(xe->mem.vram.mapping);
-
-	xe->mem.vram.mapping = NULL;
 	xe->mmio.regs = NULL;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
index 445ec6a0753e..6488705c1ffe 100644
--- a/drivers/gpu/drm/xe/xe_mmio.h
+++ b/drivers/gpu/drm/xe/xe_mmio.h
@@ -21,7 +21,7 @@ struct xe_device;
 #define LMEM_BAR		2
 
 int xe_mmio_init(struct xe_device *xe);
-void xe_mmio_probe_tiles(struct xe_device *xe);
+int xe_mmio_probe_tiles(struct xe_device *xe);
 
 u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg);
 u16 xe_mmio_read16(struct xe_gt *gt, struct xe_reg reg);
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 18/20] drm/xe/display: move display fini stuff to devm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (16 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 17/20] drm/xe: reset mmio mappings with devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  8:13   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice Matthew Auld
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Match the i915 display handling here with calling both no_irq and
noaccel when removing the device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/display/xe_display.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 0de0566e5b39..6d74c9f55b0a 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -126,9 +126,9 @@ int xe_display_init_nommio(struct xe_device *xe)
 	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
 }
 
-static void xe_display_fini_noirq(struct drm_device *dev, void *dummy)
+static void xe_display_fini_noirq(void *arg)
 {
-	struct xe_device *xe = to_xe_device(dev);
+	struct xe_device *xe = arg;
 
 	if (!xe->info.enable_display)
 		return;
@@ -163,12 +163,12 @@ int xe_display_init_noirq(struct xe_device *xe)
 	if (err)
 		return err;
 
-	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noirq, NULL);
+	return devm_add_action_or_reset(xe->drm.dev, xe_display_fini_noirq, xe);
 }
 
-static void xe_display_fini_noaccel(struct drm_device *dev, void *dummy)
+static void xe_display_fini_noaccel(void *arg)
 {
-	struct xe_device *xe = to_xe_device(dev);
+	struct xe_device *xe = arg;
 
 	if (!xe->info.enable_display)
 		return;
@@ -187,7 +187,7 @@ int xe_display_init_noaccel(struct xe_device *xe)
 	if (err)
 		return err;
 
-	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noaccel, NULL);
+	return devm_add_action_or_reset(xe->drm.dev, xe_display_fini_noaccel, xe);
 }
 
 int xe_display_init(struct xe_device *xe)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (17 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 18/20] drm/xe/display: move display fini stuff to devm Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-13  8:19   ` Andrzej Hajda
  2024-05-10 18:12 ` [PATCH 20/20] drm/xe/display: move device_remove over to drmm Matthew Auld
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

Unclear why we call this twice.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/display/xe_display.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 6d74c9f55b0a..734bddbb8932 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -134,7 +134,6 @@ static void xe_display_fini_noirq(void *arg)
 		return;
 
 	intel_display_driver_remove_noirq(xe);
-	intel_power_domains_driver_remove(xe);
 }
 
 int xe_display_init_noirq(struct xe_device *xe)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 20/20] drm/xe/display: move device_remove over to drmm
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (18 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice Matthew Auld
@ 2024-05-10 18:12 ` Matthew Auld
  2024-05-11 10:33   ` kernel test robot
                     ` (2 more replies)
  2024-05-10 18:17 ` ✓ CI.Patch_applied: success for core_hotunplug improvements Patchwork
                   ` (2 subsequent siblings)
  22 siblings, 3 replies; 47+ messages in thread
From: Matthew Auld @ 2024-05-10 18:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Andrzej Hajda, Rodrigo Vivi

i915 display calls this when releasing the drm_device, match this also
in xe by using drmm. intel_display_device_remove() is freeing purely
software state for the drm_device.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/display/xe_display.c | 20 ++++++++++++++++----
 drivers/gpu/drm/xe/display/xe_display.h |  4 ++--
 drivers/gpu/drm/xe/xe_pci.c             |  4 +++-
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 734bddbb8932..ac674a08664d 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -234,8 +234,6 @@ void xe_display_driver_remove(struct xe_device *xe)
 		return;
 
 	intel_display_driver_remove(xe);
-
-	intel_display_device_remove(xe);
 }
 
 /* IRQ-related functions */
@@ -377,17 +375,31 @@ void xe_display_pm_resume(struct xe_device *xe)
 	intel_power_domains_enable(xe);
 }
 
-void xe_display_probe(struct xe_device *xe)
+static void display_device_remove(struct drm_device *dev, void *arg)
+{
+	struct xe_device *xe = arg;
+
+	intel_display_device_remove(xe);
+}
+
+int xe_display_probe(struct xe_device *xe)
 {
+	int err;
+
 	if (!xe->info.enable_display)
 		goto no_display;
 
 	intel_display_device_probe(xe);
 
+	err = drmm_add_action_or_reset(&xe->drm, display_device_remove, xe);
+	if (err)
+		return err;
+
 	if (has_display(xe))
-		return;
+		return 0;
 
 no_display:
 	xe->info.enable_display = false;
 	unset_display_features(xe);
+	return 0;
 }
diff --git a/drivers/gpu/drm/xe/display/xe_display.h b/drivers/gpu/drm/xe/display/xe_display.h
index 710e56180b52..148bf7d744e6 100644
--- a/drivers/gpu/drm/xe/display/xe_display.h
+++ b/drivers/gpu/drm/xe/display/xe_display.h
@@ -18,7 +18,7 @@ void xe_display_driver_remove(struct xe_device *xe);
 
 int xe_display_create(struct xe_device *xe);
 
-void xe_display_probe(struct xe_device *xe);
+int xe_display_probe(struct xe_device *xe);
 
 int xe_display_init_nommio(struct xe_device *xe);
 int xe_display_init_noirq(struct xe_device *xe);
@@ -47,7 +47,7 @@ static inline void xe_display_driver_remove(struct xe_device *xe) {}
 
 static inline int xe_display_create(struct xe_device *xe) { return 0; }
 
-static inline void xe_display_probe(struct xe_device *xe) { }
+static inline int xe_display_probe(struct xe_device *xe) { }
 
 static inline int xe_display_init_nommio(struct xe_device *xe) { return 0; }
 static inline int xe_display_init_noirq(struct xe_device *xe) { return 0; }
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 841b02ac6ba1..b366cbceb476 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -768,7 +768,9 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		return err;
 
-	xe_display_probe(xe);
+	err = xe_display_probe(xe);
+	if (err)
+		return err;
 
 	drm_dbg(&xe->drm, "%s %s %04x:%04x dgfx:%d gfx:%s (%d.%02d) media:%s (%d.%02d) display:%s dma_m_s:%d tc:%d gscfi:%d",
 		desc->platform_name,
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* ✓ CI.Patch_applied: success for core_hotunplug improvements
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (19 preceding siblings ...)
  2024-05-10 18:12 ` [PATCH 20/20] drm/xe/display: move device_remove over to drmm Matthew Auld
@ 2024-05-10 18:17 ` Patchwork
  2024-05-10 18:18 ` ✗ CI.checkpatch: warning " Patchwork
  2024-05-10 18:18 ` ✗ CI.KUnit: failure " Patchwork
  22 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2024-05-10 18:17 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

== Series Details ==

Series: core_hotunplug improvements
URL   : https://patchwork.freedesktop.org/series/133462/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: afd593c3e9f9 drm-tip: 2024y-05m-10d-17h-57m-52s UTC integration manifest
=== git am output follows ===
Applying: drm/drm_managed: try to improve the drmm DOC
Applying: drm/amdgpu: don't trample pdev drvdata
Applying: drm/xe/pci: remove broken driver_release
Applying: drm/xe: covert sysfs over to devm
Applying: drm/xe/ggtt: use drm_dev_enter to mark device section
Applying: drm/xe/guc: move guc_fini over to devm
Applying: drm/xe/guc: s/guc_fini/guc_fini_hw/
Applying: drm/xe/guc_pc: move pc_fini to devm
Applying: drm/xe/guc_pc: s/pc_fini/pc_fini_hw/
Applying: drm/xe/irq: move irq_uninstall over to devm
Applying: drm/xe/device: move flr to devm
Applying: drm/xe/device: move xe_device_sanitize over to devm
Applying: drm/xe/coredump: move over to devm
Applying: drm/xe/gt: break out gt_fini into sw vs hw state
Applying: drm/xe: make gt_remove use devm
Applying: drm/xe/mmio: move mmio_fini over to devm
Applying: drm/xe: reset mmio mappings with devm
Applying: drm/xe/display: move display fini stuff to devm
Applying: drm/xe/display: stop calling domains_driver_remove twice
Applying: drm/xe/display: move device_remove over to drmm



^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ CI.checkpatch: warning for core_hotunplug improvements
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (20 preceding siblings ...)
  2024-05-10 18:17 ` ✓ CI.Patch_applied: success for core_hotunplug improvements Patchwork
@ 2024-05-10 18:18 ` Patchwork
  2024-05-10 18:18 ` ✗ CI.KUnit: failure " Patchwork
  22 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2024-05-10 18:18 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

== Series Details ==

Series: core_hotunplug improvements
URL   : https://patchwork.freedesktop.org/series/133462/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
57b97a66dd129aea93991dc66cd10477f7a05cf8
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 6c043bff1967dd164a2eaa1cfc21a93f6db0d048
Author: Matthew Auld <matthew.auld@intel.com>
Date:   Fri May 10 19:12:33 2024 +0100

    drm/xe/display: move device_remove over to drmm
    
    i915 display calls this when releasing the drm_device, match this also
    in xe by using drmm. intel_display_device_remove() is freeing purely
    software state for the drm_device.
    
    Signed-off-by: Matthew Auld <matthew.auld@intel.com>
    Cc: Andrzej Hajda <andrzej.hajda@intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
+ /mt/dim checkpatch afd593c3e9f9218ececa93e30c33182346295ac9 drm-intel
f2cc8205fe5b drm/drm_managed: try to improve the drmm DOC
5f92a217a75e drm/amdgpu: don't trample pdev drvdata
0891c1b63924 drm/xe/pci: remove broken driver_release
1760609797fb drm/xe: covert sysfs over to devm
cdf82178ac6b drm/xe/ggtt: use drm_dev_enter to mark device section
-:16: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' or 'Closes:' instead
#16: 
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717

total: 0 errors, 1 warnings, 0 checks, 39 lines checked
26c68e93195c drm/xe/guc: move guc_fini over to devm
-:13: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' or 'Closes:' instead
#13: 
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717

total: 0 errors, 1 warnings, 0 checks, 16 lines checked
a61bdf6e5af0 drm/xe/guc: s/guc_fini/guc_fini_hw/
bbefb452533f drm/xe/guc_pc: move pc_fini to devm
6eff907b65c4 drm/xe/guc_pc: s/pc_fini/pc_fini_hw/
c280057f5c85 drm/xe/irq: move irq_uninstall over to devm
e3fc860f512d drm/xe/device: move flr to devm
c1fdc710349f drm/xe/device: move xe_device_sanitize over to devm
3467cf60a470 drm/xe/coredump: move over to devm
-:19: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' or 'Closes:' instead
#19: 
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679

total: 0 errors, 1 warnings, 0 checks, 17 lines checked
36c887aba920 drm/xe/gt: break out gt_fini into sw vs hw state
ed6fc1a6bccf drm/xe: make gt_remove use devm
b23795ca6416 drm/xe/mmio: move mmio_fini over to devm
4c4b9e75a48c drm/xe: reset mmio mappings with devm
7389922c5733 drm/xe/display: move display fini stuff to devm
d013035c43ba drm/xe/display: stop calling domains_driver_remove twice
6c043bff1967 drm/xe/display: move device_remove over to drmm



^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ CI.KUnit: failure for core_hotunplug improvements
  2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
                   ` (21 preceding siblings ...)
  2024-05-10 18:18 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-05-10 18:18 ` Patchwork
  22 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2024-05-10 18:18 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe

== Series Details ==

Series: core_hotunplug improvements
URL   : https://patchwork.freedesktop.org/series/133462/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../arch/x86/um/user-offsets.c:17:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
   17 | void foo(void)
      |      ^~~
In file included from ../arch/um/kernel/asm-offsets.c:1:
../arch/x86/um/shared/sysdep/kernel-offsets.h:9:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
    9 | void foo(void)
      |      ^~~
../arch/x86/um/bugs_64.c:9:6: warning: no previous prototype for ‘arch_check_bugs’ [-Wmissing-prototypes]
    9 | void arch_check_bugs(void)
      |      ^~~~~~~~~~~~~~~
../arch/x86/um/bugs_64.c:13:6: warning: no previous prototype for ‘arch_examine_signal’ [-Wmissing-prototypes]
   13 | void arch_examine_signal(int sig, struct uml_pt_regs *regs)
      |      ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/fault.c:18:5: warning: no previous prototype for ‘arch_fixup’ [-Wmissing-prototypes]
   18 | int arch_fixup(unsigned long address, struct uml_pt_regs *regs)
      |     ^~~~~~~~~~
../arch/x86/um/os-Linux/mcontext.c:7:6: warning: no previous prototype for ‘get_regs_from_mc’ [-Wmissing-prototypes]
    7 | void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
      |      ^~~~~~~~~~~~~~~~
../arch/x86/um/os-Linux/registers.c:146:15: warning: no previous prototype for ‘get_thread_reg’ [-Wmissing-prototypes]
  146 | unsigned long get_thread_reg(int reg, jmp_buf *buf)
      |               ^~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:16:5: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
   16 | int __vdso_clock_gettime(clockid_t clock, struct __kernel_old_timespec *ts)
      |     ^~~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:30:5: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
   30 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:44:21: warning: no previous prototype for ‘__vdso_time’ [-Wmissing-prototypes]
   44 | __kernel_old_time_t __vdso_time(__kernel_old_time_t *t)
      |                     ^~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:57:1: warning: no previous prototype for ‘__vdso_getcpu’ [-Wmissing-prototypes]
   57 | __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
      | ^~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:107:6: warning: no previous prototype for ‘wait_stub_done’ [-Wmissing-prototypes]
  107 | void wait_stub_done(int pid)
      |      ^~~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:683:6: warning: no previous prototype for ‘__switch_mm’ [-Wmissing-prototypes]
  683 | void __switch_mm(struct mm_id *mm_idp)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/main.c:187:7: warning: no previous prototype for ‘__wrap_malloc’ [-Wmissing-prototypes]
  187 | void *__wrap_malloc(int size)
      |       ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:208:7: warning: no previous prototype for ‘__wrap_calloc’ [-Wmissing-prototypes]
  208 | void *__wrap_calloc(int n, int size)
      |       ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:222:6: warning: no previous prototype for ‘__wrap_free’ [-Wmissing-prototypes]
  222 | void __wrap_free(void *ptr)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/mem.c:28:6: warning: no previous prototype for ‘kasan_map_memory’ [-Wmissing-prototypes]
   28 | void kasan_map_memory(void *start, size_t len)
      |      ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/mem.c:212:13: warning: no previous prototype for ‘check_tmpexec’ [-Wmissing-prototypes]
  212 | void __init check_tmpexec(void)
      |             ^~~~~~~~~~~~~
../arch/um/kernel/skas/process.c:36:12: warning: no previous prototype for ‘start_uml’ [-Wmissing-prototypes]
   36 | int __init start_uml(void)
      |            ^~~~~~~~~
../arch/x86/um/ptrace_64.c:111:5: warning: no previous prototype for ‘poke_user’ [-Wmissing-prototypes]
  111 | int poke_user(struct task_struct *child, long addr, long data)
      |     ^~~~~~~~~
../arch/x86/um/ptrace_64.c:171:5: warning: no previous prototype for ‘peek_user’ [-Wmissing-prototypes]
  171 | int peek_user(struct task_struct *child, long addr, long data)
      |     ^~~~~~~~~
../arch/x86/um/signal.c:560:6: warning: no previous prototype for ‘sys_rt_sigreturn’ [-Wmissing-prototypes]
  560 | long sys_rt_sigreturn(void)
      |      ^~~~~~~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:17:5: warning: no previous prototype for ‘init_new_context’ [-Wmissing-prototypes]
   17 | int init_new_context(struct task_struct *task, struct mm_struct *mm)
      |     ^~~~~~~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:60:6: warning: no previous prototype for ‘destroy_context’ [-Wmissing-prototypes]
   60 | void destroy_context(struct mm_struct *mm)
      |      ^~~~~~~~~~~~~~~
../arch/um/os-Linux/signal.c:75:6: warning: no previous prototype for ‘sig_handler’ [-Wmissing-prototypes]
   75 | void sig_handler(int sig, struct siginfo *si, mcontext_t *mc)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/signal.c:111:6: warning: no previous prototype for ‘timer_alarm_handler’ [-Wmissing-prototypes]
  111 | void timer_alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
      |      ^~~~~~~~~~~~~~~~~~~
../arch/um/os-Linux/start_up.c:301:12: warning: no previous prototype for ‘parse_iomem’ [-Wmissing-prototypes]
  301 | int __init parse_iomem(char *str, int *add)
      |            ^~~~~~~~~~~
../arch/um/kernel/mem.c:202:8: warning: no previous prototype for ‘pgd_alloc’ [-Wmissing-prototypes]
  202 | pgd_t *pgd_alloc(struct mm_struct *mm)
      |        ^~~~~~~~~
../arch/um/kernel/mem.c:215:7: warning: no previous prototype for ‘uml_kmalloc’ [-Wmissing-prototypes]
  215 | void *uml_kmalloc(int size, int flags)
      |       ^~~~~~~~~~~
../arch/um/kernel/process.c:51:5: warning: no previous prototype for ‘pid_to_processor_id’ [-Wmissing-prototypes]
   51 | int pid_to_processor_id(int pid)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:87:7: warning: no previous prototype for ‘__switch_to’ [-Wmissing-prototypes]
   87 | void *__switch_to(struct task_struct *from, struct task_struct *to)
      |       ^~~~~~~~~~~
../arch/um/kernel/process.c:140:6: warning: no previous prototype for ‘fork_handler’ [-Wmissing-prototypes]
  140 | void fork_handler(void)
      |      ^~~~~~~~~~~~
../arch/um/kernel/process.c:217:6: warning: no previous prototype for ‘arch_cpu_idle’ [-Wmissing-prototypes]
  217 | void arch_cpu_idle(void)
      |      ^~~~~~~~~~~~~
../arch/um/kernel/process.c:253:5: warning: no previous prototype for ‘copy_to_user_proc’ [-Wmissing-prototypes]
  253 | int copy_to_user_proc(void __user *to, void *from, int size)
      |     ^~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:263:5: warning: no previous prototype for ‘clear_user_proc’ [-Wmissing-prototypes]
  263 | int clear_user_proc(void __user *buf, int size)
      |     ^~~~~~~~~~~~~~~
../arch/um/kernel/process.c:271:6: warning: no previous prototype for ‘set_using_sysemu’ [-Wmissing-prototypes]
  271 | void set_using_sysemu(int value)
      |      ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:278:5: warning: no previous prototype for ‘get_using_sysemu’ [-Wmissing-prototypes]
  278 | int get_using_sysemu(void)
      |     ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:316:12: warning: no previous prototype for ‘make_proc_sysemu’ [-Wmissing-prototypes]
  316 | int __init make_proc_sysemu(void)
      |            ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:348:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
  348 | unsigned long arch_align_stack(unsigned long sp)
      |               ^~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:45:6: warning: no previous prototype for ‘machine_restart’ [-Wmissing-prototypes]
   45 | void machine_restart(char * __unused)
      |      ^~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:51:6: warning: no previous prototype for ‘machine_power_off’ [-Wmissing-prototypes]
   51 | void machine_power_off(void)
      |      ^~~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:57:6: warning: no previous prototype for ‘machine_halt’ [-Wmissing-prototypes]
   57 | void machine_halt(void)
      |      ^~~~~~~~~~~~
../arch/x86/um/syscalls_64.c:48:6: warning: no previous prototype for ‘arch_switch_to’ [-Wmissing-prototypes]
   48 | void arch_switch_to(struct task_struct *to)
      |      ^~~~~~~~~~~~~~
../arch/um/kernel/tlb.c:579:6: warning: no previous prototype for ‘flush_tlb_mm_range’ [-Wmissing-prototypes]
  579 | void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
      |      ^~~~~~~~~~~~~~~~~~
../arch/um/kernel/tlb.c:594:6: warning: no previous prototype for ‘force_flush_all’ [-Wmissing-prototypes]
  594 | void force_flush_all(void)
      |      ^~~~~~~~~~~~~~~
../arch/um/kernel/um_arch.c:408:19: warning: no previous prototype for ‘read_initrd’ [-Wmissing-prototypes]
  408 | int __init __weak read_initrd(void)
      |                   ^~~~~~~~~~~
../arch/um/kernel/um_arch.c:461:7: warning: no previous prototype for ‘text_poke’ [-Wmissing-prototypes]
  461 | void *text_poke(void *addr, const void *opcode, size_t len)
      |       ^~~~~~~~~
../arch/um/kernel/um_arch.c:473:6: warning: no previous prototype for ‘text_poke_sync’ [-Wmissing-prototypes]
  473 | void text_poke_sync(void)
      |      ^~~~~~~~~~~~~~
../arch/um/kernel/kmsg_dump.c:60:12: warning: no previous prototype for ‘kmsg_dumper_stdout_init’ [-Wmissing-prototypes]
   60 | int __init kmsg_dumper_stdout_init(void)
      |            ^~~~~~~~~~~~~~~~~~~~~~~
In file included from ../drivers/gpu/drm/xe/xe_device.c:19:
../drivers/gpu/drm/xe/display/xe_display.h: In function ‘xe_display_probe’:
../drivers/gpu/drm/xe/display/xe_display.h:50:43: error: no return statement in function returning non-void [-Werror=return-type]
   50 | static inline int xe_display_probe(struct xe_device *xe) { }
      |                                           ^~~~~~~~~
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/xe/xe_device.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:485: drivers/gpu/drm/xe] Error 2
make[6]: *** Waiting for unfinished jobs....
make[5]: *** [../scripts/Makefile.build:485: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:485: drivers/gpu] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:485: drivers] Error 2
make[3]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
make[2]: *** [/kernel/Makefile:1919: .] Error 2
make[1]: *** [/kernel/Makefile:240: __sub-make] Error 2
make: *** [Makefile:240: __sub-make] Error 2

[18:18:06] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:18:11] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/
  2024-05-10 18:12 ` [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Matthew Auld
@ 2024-05-11  9:19   ` kernel test robot
  2024-05-13  7:37   ` Andrzej Hajda
  1 sibling, 0 replies; 47+ messages in thread
From: kernel test robot @ 2024-05-11  9:19 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: oe-kbuild-all, Andrzej Hajda, Rodrigo Vivi

Hi Matthew,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-misc/drm-misc-next next-20240510]
[cannot apply to linus/master v6.9-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Auld/drm-drm_managed-try-to-improve-the-drmm-DOC/20240511-061452
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20240510181212.264622-31-matthew.auld%40intel.com
patch subject: [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20240511/202405111645.xTy2cQ1s-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240511/202405111645.xTy2cQ1s-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405111645.xTy2cQ1s-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/xe/xe_guc_pc.c:897: warning: Excess function parameter 'drm' description in 'xe_guc_pc_fini_hw'


vim +897 drivers/gpu/drm/xe/xe_guc_pc.c

dd08ebf6c3525a Matthew Brost          2023-03-30  890  
bef52b5c7a1904 Rodrigo Vivi           2023-12-08  891  /**
d4543ae6de1c7e Matthew Auld           2024-05-10  892   * xe_guc_pc_fini_hw - Finalize GuC's Power Conservation component
8a4587ef9f9521 Michał Winiarski       2024-02-19  893   * @drm: DRM device
8a4587ef9f9521 Michał Winiarski       2024-02-19  894   * @arg: opaque pointer that should point to Xe_GuC_PC instance
bef52b5c7a1904 Rodrigo Vivi           2023-12-08  895   */
d4543ae6de1c7e Matthew Auld           2024-05-10  896  static void xe_guc_pc_fini_hw(void *arg)
dd08ebf6c3525a Matthew Brost          2023-03-30 @897  {
8a4587ef9f9521 Michał Winiarski       2024-02-19  898  	struct xe_guc_pc *pc = arg;
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  899  	struct xe_device *xe = pc_to_xe(pc);
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  900  
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  901  	if (xe->info.skip_guc_pc) {
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  902  		xe_gt_idle_disable_c6(pc_to_gt(pc));
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  903  		return;
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  904  	}
975e4a3795d4f1 Vinay Belgaumkar       2023-11-17  905  
fb74b205cdd263 Rodrigo Vivi           2024-04-23  906  	if (xe_device_wedged(xe))
fb74b205cdd263 Rodrigo Vivi           2024-04-23  907  		return;
fb74b205cdd263 Rodrigo Vivi           2024-04-23  908  
649a125a88da64 Daniele Ceraolo Spurio 2024-03-18  909  	XE_WARN_ON(xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL));
1737785ae5313e Riana Tauro            2023-07-17  910  	XE_WARN_ON(xe_guc_pc_gucrc_disable(pc));
dd08ebf6c3525a Matthew Brost          2023-03-30  911  	XE_WARN_ON(xe_guc_pc_stop(pc));
8a4587ef9f9521 Michał Winiarski       2024-02-19  912  	xe_force_wake_put(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL);
dd08ebf6c3525a Matthew Brost          2023-03-30  913  }
dd08ebf6c3525a Matthew Brost          2023-03-30  914  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 20/20] drm/xe/display: move device_remove over to drmm
  2024-05-10 18:12 ` [PATCH 20/20] drm/xe/display: move device_remove over to drmm Matthew Auld
@ 2024-05-11 10:33   ` kernel test robot
  2024-05-11 10:43   ` kernel test robot
  2024-05-13  8:22   ` Andrzej Hajda
  2 siblings, 0 replies; 47+ messages in thread
From: kernel test robot @ 2024-05-11 10:33 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: llvm, oe-kbuild-all, Andrzej Hajda, Rodrigo Vivi

Hi Matthew,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-misc/drm-misc-next next-20240510]
[cannot apply to linus/master v6.9-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Auld/drm-drm_managed-try-to-improve-the-drmm-DOC/20240511-061452
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20240510181212.264622-42-matthew.auld%40intel.com
patch subject: [PATCH 20/20] drm/xe/display: move device_remove over to drmm
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240511/202405111828.vE3DTOeD-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240511/202405111828.vE3DTOeD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405111828.vE3DTOeD-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/xe/xe_device.c:18:
>> drivers/gpu/drm/xe/display/xe_display.h:50:60: warning: non-void function does not return a value [-Wreturn-type]
      50 | static inline int xe_display_probe(struct xe_device *xe) { }
         |                                                            ^
   1 warning generated.


vim +50 drivers/gpu/drm/xe/display/xe_display.h

    49	
  > 50	static inline int xe_display_probe(struct xe_device *xe) { }
    51	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 20/20] drm/xe/display: move device_remove over to drmm
  2024-05-10 18:12 ` [PATCH 20/20] drm/xe/display: move device_remove over to drmm Matthew Auld
  2024-05-11 10:33   ` kernel test robot
@ 2024-05-11 10:43   ` kernel test robot
  2024-05-13  8:22   ` Andrzej Hajda
  2 siblings, 0 replies; 47+ messages in thread
From: kernel test robot @ 2024-05-11 10:43 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: oe-kbuild-all, Andrzej Hajda, Rodrigo Vivi

Hi Matthew,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-misc/drm-misc-next next-20240510]
[cannot apply to linus/master v6.9-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Auld/drm-drm_managed-try-to-improve-the-drmm-DOC/20240511-061452
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20240510181212.264622-42-matthew.auld%40intel.com
patch subject: [PATCH 20/20] drm/xe/display: move device_remove over to drmm
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20240511/202405111815.TdKOTxKQ-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240511/202405111815.TdKOTxKQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405111815.TdKOTxKQ-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/xe/xe_device.c:18:
   drivers/gpu/drm/xe/display/xe_display.h: In function 'xe_display_probe':
>> drivers/gpu/drm/xe/display/xe_display.h:50:43: warning: no return statement in function returning non-void [-Wreturn-type]
      50 | static inline int xe_display_probe(struct xe_device *xe) { }
         |                                           ^~~~~~~~~


vim +50 drivers/gpu/drm/xe/display/xe_display.h

    49	
  > 50	static inline int xe_display_probe(struct xe_device *xe) { }
    51	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC
  2024-05-10 18:12 ` [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC Matthew Auld
@ 2024-05-13  7:06   ` Andrzej Hajda
  2024-05-13 15:38   ` Rodrigo Vivi
  1 sibling, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:06 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Daniel Vetter, dri-devel

On 10.05.2024 20:12, Matthew Auld wrote:
> Hopefully make it clearer when to use devm vs drmm.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> ---
>   drivers/gpu/drm/drm_managed.c | 42 +++++++++++++++++++++++++++++++++++
>   1 file changed, 42 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c
> index 7646f67bda4e..20d705bbc0a3 100644
> --- a/drivers/gpu/drm/drm_managed.c
> +++ b/drivers/gpu/drm/drm_managed.c
> @@ -34,6 +34,48 @@
>    * during the lifetime of the driver, all the functions are fully concurrent
>    * safe. But it is recommended to use managed resources only for resources that
>    * change rarely, if ever, during the lifetime of the &drm_device instance.
> + *
> + * Note that the distinction between devm and drmm is important to get right.
> + * Consider some hotunplug scenarios, where it is valid for there to be multiple
> + * unplugged struct &drm_device instances each being kept alive by an open
> + * driver fd. The driver needs a clean separation between what needs to happen
> + * when the struct &device is removed and what needs to happen when a given
> + * struct &drm_device instance is released, as well as in some cases a more
> + * finer grained marking of critical sections that require hardware interaction.
> + * See below.
> + *
> + * devm
> + * ~~~~
> + * In general use devm for cleaning up anything hardware related. So removing
> + * pci mmaps, releasing interrupt handlers, basically anything hw related.  The
> + * devm release actions are called when the struct &device is removed, shortly
> + * after calling into the drivers struct &pci_driver.remove() callback, if this
> + * is a pci device.
> + *
> + * devm can be thought of as an alternative to putting all the hw related
> + * cleanup directly in the struct &pci_driver.remove() callback, where the
> + * correct ordering of the unwind steps needs to be manually done in the error
> + * path of the struct &pci_driver.probe() and again on the remove side.  With
> + * devm this is all done automatically.
> + *
> + * drmm
> + * ~~~~
> + * In general use this for cleaning up anything software related. So data
> + * structures and the like which are tied to the lifetime of a particular struct
> + * &drm_device instance.
> + *
> + * drmm can be thought of as an alternative to putting all the software related
> + * cleanup directly in the struct &drm_driver.release() callback, where again
> + * the correct ordering of the unwind steps needs to be done manually. As with
> + * devm this is instead done automatically.
> + *
> + * Sometimes there is no clean separation between software and hardware, which
> + * is where drm_dev_enter() comes in. For example, a driver might have some
> + * state tied to a struct &drm_device instance, for which the same cleanup path
> + * is called for both a plugged and unplugged device, and the cleanup itself
> + * might require talking to the device if it's still attached to this particular
> + * struct &drm_device. For that we instead mark the device sections.  See
> + * drm_dev_enter(), drm_dev_exit() and drm_dev_unplug().

I would emphasize somewhere that after device unbind any interaction 
with physical device is forbidden (are some exceptions for this?).
Anyway nice stuff.

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

>    */
>   
>   struct drmres_node {


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/20] drm/xe/pci: remove broken driver_release
  2024-05-10 18:12 ` [PATCH 03/20] drm/xe/pci: remove broken driver_release Matthew Auld
@ 2024-05-13  7:14   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:14 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> This is quite broken since we are nuking the pdev link to the private
> driver struct, but note here that driver_release is called when the
> drm_device is released (poor mans drmm), which can be long after the
> device has been removed. So here what we are actually doing is nuking
> the pdev link for what is potentially bound to a different drm_device.
> If that happens before our pci remove callback is triggered (for the new
> drm_device) we silently exit and skip some important cleanup steps,
> resulting in hilarity.
> 
> There should be no reason to implement driver_release, when we already
> have nicer stuff like drmm, so just remove completely. The actual pdev
> link is already nuked when removing the device.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/xe_device.c | 9 ---------
>   1 file changed, 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index f89c781986a5..d6f9880a15fd 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -188,13 +188,6 @@ static const struct file_operations xe_driver_fops = {
>   #endif
>   };
>   
> -static void xe_driver_release(struct drm_device *dev)
> -{
> -	struct xe_device *xe = to_xe_device(dev);
> -
> -	pci_set_drvdata(to_pci_dev(xe->drm.dev), NULL);
> -}
> -
>   static struct drm_driver driver = {
>   	/* Don't use MTRRs here; the Xserver or userspace app should
>   	 * deal with them for Intel hardware.
> @@ -213,8 +206,6 @@ static struct drm_driver driver = {
>   #ifdef CONFIG_PROC_FS
>   	.show_fdinfo = xe_drm_client_fdinfo,
>   #endif
> -	.release = &xe_driver_release,
> -
>   	.ioctls = xe_ioctls,
>   	.num_ioctls = ARRAY_SIZE(xe_ioctls),
>   	.fops = &xe_driver_fops,


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section
  2024-05-10 18:12 ` [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section Matthew Auld
@ 2024-05-13  7:16   ` Andrzej Hajda
  2024-05-13 19:34   ` Randhawa, Jagmeet
  1 sibling, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:16 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Device can be hotunplugged before we start destroying gem objects. In
> such a case don't touch the GGTT entries, trigger any invalidations or
> mess around with rpm.  This should already be taken care of when
> removing the device, we just need to take care of dealing with the
> software state, like removing the mm node.
> 
> v2: (Andrzej)
>    - Avoid some duplication by tracking the bound status and checking
>      that instead.
> 
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

> ---
>   drivers/gpu/drm/xe/xe_ggtt.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index 0d541f55b4fc..17e5066763db 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -8,6 +8,7 @@
>   #include <linux/io-64-nonatomic-lo-hi.h>
>   #include <linux/sizes.h>
>   
> +#include <drm/drm_drv.h>
>   #include <drm/drm_managed.h>
>   #include <drm/i915_drm.h>
>   
> @@ -433,18 +434,29 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
>   void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
>   			 bool invalidate)
>   {
> -	xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile));
> +	struct xe_device *xe = tile_to_xe(ggtt->tile);
> +	bool bound;
> +	int idx;
> +
> +	bound = drm_dev_enter(&xe->drm, &idx);
> +	if (bound)
> +		xe_pm_runtime_get_noresume(xe);
>   
>   	mutex_lock(&ggtt->lock);
> -	xe_ggtt_clear(ggtt, node->start, node->size);
> +	if (bound)
> +		xe_ggtt_clear(ggtt, node->start, node->size);
>   	drm_mm_remove_node(node);
>   	node->size = 0;
>   	mutex_unlock(&ggtt->lock);
>   
> +	if (!bound)
> +		return;
> +
>   	if (invalidate)
>   		xe_ggtt_invalidate(ggtt);
>   
> -	xe_pm_runtime_put(tile_to_xe(ggtt->tile));
> +	xe_pm_runtime_put(xe);
> +	drm_dev_exit(idx);
>   }
>   
>   void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/20] drm/xe/guc: s/guc_fini/guc_fini_hw/
  2024-05-10 18:12 ` [PATCH 07/20] drm/xe/guc: s/guc_fini/guc_fini_hw/ Matthew Auld
@ 2024-05-13  7:30   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:30 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Make it clear that is about cleaning up the HW/FW side, and not software
> state.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/xe_guc.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 97e2b62df486..90d994e85e43 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -239,7 +239,7 @@ static void guc_write_params(struct xe_guc *guc)
>   		xe_mmio_write32(gt, SOFT_SCRATCH(1 + i), guc->params[i]);
>   }
>   
> -static void guc_fini(void *arg)
> +static void guc_fini_hw(void *arg)
>   {
>   	struct xe_guc *guc = arg;
>   	struct xe_gt *gt = guc_to_gt(guc);
> @@ -323,7 +323,7 @@ int xe_guc_init(struct xe_guc *guc)
>   	if (ret)
>   		goto out;
>   
> -	ret = devm_add_action_or_reset(xe->drm.dev, guc_fini, guc);
> +	ret = devm_add_action_or_reset(xe->drm.dev, guc_fini_hw, guc);
>   	if (ret)
>   		goto out;
>   


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/
  2024-05-10 18:12 ` [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Matthew Auld
  2024-05-11  9:19   ` kernel test robot
@ 2024-05-13  7:37   ` Andrzej Hajda
  1 sibling, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:37 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Make it clear that is about cleaning up the HW/FW side, and not software
> state.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_guc_pc.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
> index 4164dab1ab66..aabe241be42b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_pc.c
> +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
> @@ -889,11 +889,11 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc)
>   }
>   
>   /**
> - * xe_guc_pc_fini - Finalize GuC's Power Conservation component
> + * xe_guc_pc_fini_hw - Finalize GuC's Power Conservation component
>    * @drm: DRM device

With removal of @drm description
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

>    * @arg: opaque pointer that should point to Xe_GuC_PC instance
>    */
> -static void xe_guc_pc_fini(void *arg)
> +static void xe_guc_pc_fini_hw(void *arg)
>   {
>   	struct xe_guc_pc *pc = arg;
>   	struct xe_device *xe = pc_to_xe(pc);
> @@ -941,5 +941,5 @@ int xe_guc_pc_init(struct xe_guc_pc *pc)
>   
>   	pc->bo = bo;
>   
> -	return devm_add_action_or_reset(xe->drm.dev, xe_guc_pc_fini, pc);
> +	return devm_add_action_or_reset(xe->drm.dev, xe_guc_pc_fini_hw, pc);
>   }


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 13/20] drm/xe/coredump: move over to devm
  2024-05-10 18:12 ` [PATCH 13/20] drm/xe/coredump: move " Matthew Auld
@ 2024-05-13  7:38   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:38 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Here we are using drmm to ensure we release the coredump when unloading
> the module, however the coredump is very much tied to the struct device
> underneath. We can see this when we hotunplug the device, for which we
> have already got a coredump attached. In such a case the coredump still
> remains and adding another is not possible. However we still register
> the release action via xe_driver_devcoredump_fini(), so in effect two or
> more releases for one dump.  The other consideration is that the
> coredump state is embedded in the xe_driver instance, so technically
> once the drmm release action fires we might free the coredumpe state
> from a different driver instance, assuming we have two release actions
> and they can race. Rather use devm here to remove the coredump when the
> device is released.
> 
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/xe_devcoredump.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> index 3d7980232be1..e70aef797193 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> @@ -238,13 +238,15 @@ void xe_devcoredump(struct xe_sched_job *job)
>   		      xe_devcoredump_read, xe_devcoredump_free);
>   }
>   
> -static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg)
> +static void xe_driver_devcoredump_fini(void *arg)
>   {
> +	struct drm_device *drm = arg;
> +
>   	dev_coredump_put(drm->dev);
>   }
>   
>   int xe_devcoredump_init(struct xe_device *xe)
>   {
> -	return drmm_add_action_or_reset(&xe->drm, xe_driver_devcoredump_fini, xe);
> +	return devm_add_action_or_reset(xe->drm.dev, xe_driver_devcoredump_fini, &xe->drm);
>   }
>   #endif


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state
  2024-05-10 18:12 ` [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state Matthew Auld
@ 2024-05-13  7:50   ` Andrzej Hajda
  2024-05-13  8:37     ` Matthew Auld
  0 siblings, 1 reply; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:50 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Have a cleaner separation between hw vs sw.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_gt.c | 21 +++++++++++++--------
>   1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 36c7b1631fa6..3f1826b783ad 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -57,9 +57,17 @@
>   #include "xe_wa.h"
>   #include "xe_wopcm.h"
>   
> +static void gt_fini(struct drm_device *drm, void *arg)
> +{
> +	struct xe_gt *gt = arg;
> +
> +	destroy_workqueue(gt->ordered_wq);

Do we need this wq after unbind? It seems to be used by hw stuff:
- gt_reset
- xe_guc_exec_queue_trigger_cleanup

Regards
Andrzej

> +}
> +
>   struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>   {
>   	struct xe_gt *gt;
> +	int err;
>   
>   	gt = drmm_kzalloc(&tile_to_xe(tile)->drm, sizeof(*gt), GFP_KERNEL);
>   	if (!gt)
> @@ -68,6 +76,10 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>   	gt->tile = tile;
>   	gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
>   
> +	err = drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
> +	if (err)
> +		return ERR_PTR(err);
> +
>   	return gt;
>   }
>   
> @@ -90,15 +102,9 @@ void xe_gt_sanitize(struct xe_gt *gt)
>    */
>   void xe_gt_remove(struct xe_gt *gt)
>   {
> -	xe_uc_remove(&gt->uc);
> -}
> -
> -static void gt_fini(struct drm_device *drm, void *arg)
> -{
> -	struct xe_gt *gt = arg;
>   	int i;
>   
> -	destroy_workqueue(gt->ordered_wq);
> +	xe_uc_remove(&gt->uc);
>   
>   	for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
>   		xe_hw_fence_irq_finish(&gt->fence_irq[i]);
> @@ -567,7 +573,6 @@ int xe_gt_init(struct xe_gt *gt)
>   	if (err)
>   		return err;
>   
> -	return drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
>   }
>   
>   static int do_gt_reset(struct xe_gt *gt)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/20] drm/xe: make gt_remove use devm
  2024-05-10 18:12 ` [PATCH 15/20] drm/xe: make gt_remove use devm Matthew Auld
@ 2024-05-13  7:58   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  7:58 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> No need to hand roll the onion unwind here, just move gt_remove over to
> devm which will already have the correct ordering.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/xe_device.c | 22 ++--------------------
>   drivers/gpu/drm/xe/xe_gt.c     |  9 ++++-----
>   drivers/gpu/drm/xe/xe_gt.h     |  1 -
>   3 files changed, 6 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index e4c30030fc40..40983ad66e3c 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -537,7 +537,6 @@ int xe_device_probe(struct xe_device *xe)
>   	struct xe_tile *tile;
>   	struct xe_gt *gt;
>   	int err;
> -	u8 last_gt;
>   	u8 id;
>   
>   	xe_pat_init_early(xe);
> @@ -631,18 +630,16 @@ int xe_device_probe(struct xe_device *xe)
>   		goto err_irq_shutdown;
>   
>   	for_each_gt(gt, xe, id) {
> -		last_gt = id;
> -
>   		err = xe_gt_init(gt);
>   		if (err)
> -			goto err_fini_gt;
> +			goto err_irq_shutdown;
>   	}
>   
>   	xe_heci_gsc_init(xe);
>   
>   	err = xe_display_init(xe);
>   	if (err)
> -		goto err_fini_gt;
> +		goto err_irq_shutdown;
>   
>   	err = drm_dev_register(&xe->drm, 0);
>   	if (err)
> @@ -658,15 +655,6 @@ int xe_device_probe(struct xe_device *xe)
>   
>   err_fini_display:
>   	xe_display_driver_remove(xe);
> -
> -err_fini_gt:
> -	for_each_gt(gt, xe, id) {
> -		if (id < last_gt)
> -			xe_gt_remove(gt);
> -		else
> -			break;
> -	}
> -
>   err_irq_shutdown:
>   	xe_irq_shutdown(xe);
>   err:
> @@ -684,18 +672,12 @@ static void xe_device_remove_display(struct xe_device *xe)
>   
>   void xe_device_remove(struct xe_device *xe)
>   {
> -	struct xe_gt *gt;
> -	u8 id;
> -
>   	xe_device_remove_display(xe);
>   
>   	xe_display_fini(xe);
>   
>   	xe_heci_gsc_fini(xe);
>   
> -	for_each_gt(gt, xe, id)
> -		xe_gt_remove(gt);
> -
>   	xe_irq_shutdown(xe);
>   }
>   
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 3f1826b783ad..00dbb84828ff 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -92,16 +92,14 @@ void xe_gt_sanitize(struct xe_gt *gt)
>   	gt->uc.guc.submission_state.enabled = false;
>   }
>   
> -/**
> - * xe_gt_remove() - Clean up the GT structures before driver removal
> - * @gt: the GT object
> - *
> +/*
>    * This function should only act on objects/structures that must be cleaned
>    * before the driver removal callback is complete and therefore can't be
>    * deferred to a drmm action.
>    */
> -void xe_gt_remove(struct xe_gt *gt)
> +static void gt_remove(void *arg)
>   {
> +	struct xe_gt *gt = arg;
>   	int i;
>   
>   	xe_uc_remove(&gt->uc);
> @@ -573,6 +571,7 @@ int xe_gt_init(struct xe_gt *gt)
>   	if (err)
>   		return err;
>   
> +	return devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, gt_remove, gt);
>   }
>   
>   static int do_gt_reset(struct xe_gt *gt)
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index 8474c50b1b30..866a49037d58 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -43,7 +43,6 @@ int xe_gt_suspend(struct xe_gt *gt);
>   int xe_gt_resume(struct xe_gt *gt);
>   void xe_gt_reset_async(struct xe_gt *gt);
>   void xe_gt_sanitize(struct xe_gt *gt);
> -void xe_gt_remove(struct xe_gt *gt);
>   
>   /**
>    * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 16/20] drm/xe/mmio: move mmio_fini over to devm
  2024-05-10 18:12 ` [PATCH 16/20] drm/xe/mmio: move mmio_fini over to devm Matthew Auld
@ 2024-05-13  8:00   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  8:00 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Not valid to touch mmio once the device is removed, so make sure we
> unmap on removal and not just when driver instance goes away. Also set
> the mmio pointers to NULL to hopefully catch such issues more easily.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/xe_mmio.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> index 05edab0e085d..a3094e741db8 100644
> --- a/drivers/gpu/drm/xe/xe_mmio.c
> +++ b/drivers/gpu/drm/xe/xe_mmio.c
> @@ -387,13 +387,16 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
>   	}
>   }
>   
> -static void mmio_fini(struct drm_device *drm, void *arg)
> +static void mmio_fini(void *arg)
>   {
>   	struct xe_device *xe = arg;
>   
>   	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
>   	if (xe->mem.vram.mapping)
>   		iounmap(xe->mem.vram.mapping);
> +
> +	xe->mem.vram.mapping = NULL;
> +	xe->mmio.regs = NULL;
>   }
>   
>   int xe_mmio_init(struct xe_device *xe)
> @@ -418,7 +421,7 @@ int xe_mmio_init(struct xe_device *xe)
>   	root_tile->mmio.size = SZ_16M;
>   	root_tile->mmio.regs = xe->mmio.regs;
>   
> -	return drmm_add_action_or_reset(&xe->drm, mmio_fini, xe);
> +	return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe);
>   }
>   
>   u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 17/20] drm/xe: reset mmio mappings with devm
  2024-05-10 18:12 ` [PATCH 17/20] drm/xe: reset mmio mappings with devm Matthew Auld
@ 2024-05-13  8:12   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  8:12 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Set our various mmio mappings to NULL. This should make it easier to
> catch something rogue trying to mess with mmio after device removal. For
> example, we might unmap everything and then start hitting some mmio
> address which has already been unmamped by us and then remapped by
> something else, causing all kinds of carnage.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/xe_device.c |  4 +++-
>   drivers/gpu/drm/xe/xe_mmio.c   | 35 ++++++++++++++++++++++++++++------
>   drivers/gpu/drm/xe/xe_mmio.h   |  2 +-
>   3 files changed, 33 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 40983ad66e3c..ce8bccbb06ba 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -554,7 +554,9 @@ int xe_device_probe(struct xe_device *xe)
>   	if (err)
>   		return err;
>   
> -	xe_mmio_probe_tiles(xe);
> +	err = xe_mmio_probe_tiles(xe);
> +	if (err)
> +		return err;
>   
>   	xe_ttm_sys_mgr_init(xe);
>   
> diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
> index a3094e741db8..0ad840bab8f3 100644
> --- a/drivers/gpu/drm/xe/xe_mmio.c
> +++ b/drivers/gpu/drm/xe/xe_mmio.c
> @@ -255,6 +255,21 @@ static int xe_mmio_tile_vram_size(struct xe_tile *tile, u64 *vram_size,
>   	return xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>   }
>   
> +static void vram_fini(void *arg)
> +{
> +	struct xe_device *xe = arg;
> +	struct xe_tile *tile;
> +	int id;
> +
> +	if (xe->mem.vram.mapping)
> +		iounmap(xe->mem.vram.mapping);
> +
> +	xe->mem.vram.mapping = NULL;
> +
> +	for_each_tile(tile, xe, id)
> +		tile->mem.vram.mapping = NULL;
> +}
> +
>   int xe_mmio_probe_vram(struct xe_device *xe)
>   {
>   	struct xe_tile *tile;
> @@ -331,10 +346,20 @@ int xe_mmio_probe_vram(struct xe_device *xe)
>   	drm_info(&xe->drm, "Available VRAM: %pa, %pa\n", &xe->mem.vram.io_start,
>   		 &available_size);
>   
> -	return 0;
> +	return devm_add_action_or_reset(xe->drm.dev, vram_fini, xe);
>   }
>   
> -void xe_mmio_probe_tiles(struct xe_device *xe)
> +static void tiles_fini(void *arg)
> +{
> +	struct xe_device *xe = arg;
> +	struct xe_tile *tile;
> +	int id;
> +
> +	for_each_tile(tile, xe, id)
> +		tile->mmio.regs = NULL;
> +}
> +
> +int xe_mmio_probe_tiles(struct xe_device *xe)
>   {
>   	size_t tile_mmio_size = SZ_16M, tile_mmio_ext_size = xe->info.tile_mmio_ext_size;
>   	u8 id, tile_count = xe->info.tile_count;
> @@ -385,6 +410,8 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
>   			regs += tile_mmio_ext_size;
>   		}
>   	}
> +
> +	return devm_add_action_or_reset(xe->drm.dev, tiles_fini, xe);
>   }
>   
>   static void mmio_fini(void *arg)
> @@ -392,10 +419,6 @@ static void mmio_fini(void *arg)
>   	struct xe_device *xe = arg;
>   
>   	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
> -	if (xe->mem.vram.mapping)
> -		iounmap(xe->mem.vram.mapping);
> -
> -	xe->mem.vram.mapping = NULL;
>   	xe->mmio.regs = NULL;
>   }
>   
> diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
> index 445ec6a0753e..6488705c1ffe 100644
> --- a/drivers/gpu/drm/xe/xe_mmio.h
> +++ b/drivers/gpu/drm/xe/xe_mmio.h
> @@ -21,7 +21,7 @@ struct xe_device;
>   #define LMEM_BAR		2
>   
>   int xe_mmio_init(struct xe_device *xe);
> -void xe_mmio_probe_tiles(struct xe_device *xe);
> +int xe_mmio_probe_tiles(struct xe_device *xe);
>   
>   u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg);
>   u16 xe_mmio_read16(struct xe_gt *gt, struct xe_reg reg);


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 18/20] drm/xe/display: move display fini stuff to devm
  2024-05-10 18:12 ` [PATCH 18/20] drm/xe/display: move display fini stuff to devm Matthew Auld
@ 2024-05-13  8:13   ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  8:13 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Match the i915 display handling here with calling both no_irq and
> noaccel when removing the device.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej
> ---
>   drivers/gpu/drm/xe/display/xe_display.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
> index 0de0566e5b39..6d74c9f55b0a 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.c
> +++ b/drivers/gpu/drm/xe/display/xe_display.c
> @@ -126,9 +126,9 @@ int xe_display_init_nommio(struct xe_device *xe)
>   	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
>   }
>   
> -static void xe_display_fini_noirq(struct drm_device *dev, void *dummy)
> +static void xe_display_fini_noirq(void *arg)
>   {
> -	struct xe_device *xe = to_xe_device(dev);
> +	struct xe_device *xe = arg;
>   
>   	if (!xe->info.enable_display)
>   		return;
> @@ -163,12 +163,12 @@ int xe_display_init_noirq(struct xe_device *xe)
>   	if (err)
>   		return err;
>   
> -	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noirq, NULL);
> +	return devm_add_action_or_reset(xe->drm.dev, xe_display_fini_noirq, xe);
>   }
>   
> -static void xe_display_fini_noaccel(struct drm_device *dev, void *dummy)
> +static void xe_display_fini_noaccel(void *arg)
>   {
> -	struct xe_device *xe = to_xe_device(dev);
> +	struct xe_device *xe = arg;
>   
>   	if (!xe->info.enable_display)
>   		return;
> @@ -187,7 +187,7 @@ int xe_display_init_noaccel(struct xe_device *xe)
>   	if (err)
>   		return err;
>   
> -	return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noaccel, NULL);
> +	return devm_add_action_or_reset(xe->drm.dev, xe_display_fini_noaccel, xe);
>   }
>   
>   int xe_display_init(struct xe_device *xe)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice
  2024-05-10 18:12 ` [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice Matthew Auld
@ 2024-05-13  8:19   ` Andrzej Hajda
  2024-05-13  8:27     ` Matthew Auld
  0 siblings, 1 reply; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  8:19 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> Unclear why we call this twice.

Hmm, where is another call, I couldn't find one.

Andrzej


> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/display/xe_display.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
> index 6d74c9f55b0a..734bddbb8932 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.c
> +++ b/drivers/gpu/drm/xe/display/xe_display.c
> @@ -134,7 +134,6 @@ static void xe_display_fini_noirq(void *arg)
>   		return;
>   
>   	intel_display_driver_remove_noirq(xe);
> -	intel_power_domains_driver_remove(xe);


>   }
>   
>   int xe_display_init_noirq(struct xe_device *xe)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 20/20] drm/xe/display: move device_remove over to drmm
  2024-05-10 18:12 ` [PATCH 20/20] drm/xe/display: move device_remove over to drmm Matthew Auld
  2024-05-11 10:33   ` kernel test robot
  2024-05-11 10:43   ` kernel test robot
@ 2024-05-13  8:22   ` Andrzej Hajda
  2 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  8:22 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi

On 10.05.2024 20:12, Matthew Auld wrote:
> i915 display calls this when releasing the drm_device, match this also
> in xe by using drmm. intel_display_device_remove() is freeing purely
> software state for the drm_device.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/display/xe_display.c | 20 ++++++++++++++++----
>   drivers/gpu/drm/xe/display/xe_display.h |  4 ++--
>   drivers/gpu/drm/xe/xe_pci.c             |  4 +++-
>   3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
> index 734bddbb8932..ac674a08664d 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.c
> +++ b/drivers/gpu/drm/xe/display/xe_display.c
> @@ -234,8 +234,6 @@ void xe_display_driver_remove(struct xe_device *xe)
>   		return;
>   
>   	intel_display_driver_remove(xe);
> -
> -	intel_display_device_remove(xe);
>   }
>   
>   /* IRQ-related functions */
> @@ -377,17 +375,31 @@ void xe_display_pm_resume(struct xe_device *xe)
>   	intel_power_domains_enable(xe);
>   }
>   
> -void xe_display_probe(struct xe_device *xe)
> +static void display_device_remove(struct drm_device *dev, void *arg)
> +{
> +	struct xe_device *xe = arg;
> +
> +	intel_display_device_remove(xe);
> +}
> +
> +int xe_display_probe(struct xe_device *xe)
>   {
> +	int err;
> +
>   	if (!xe->info.enable_display)
>   		goto no_display;
>   
>   	intel_display_device_probe(xe);
>   
> +	err = drmm_add_action_or_reset(&xe->drm, display_device_remove, xe);
> +	if (err)
> +		return err;
> +
>   	if (has_display(xe))
> -		return;
> +		return 0;
>   
>   no_display:
>   	xe->info.enable_display = false;
>   	unset_display_features(xe);
> +	return 0;
>   }
> diff --git a/drivers/gpu/drm/xe/display/xe_display.h b/drivers/gpu/drm/xe/display/xe_display.h
> index 710e56180b52..148bf7d744e6 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.h
> +++ b/drivers/gpu/drm/xe/display/xe_display.h
> @@ -18,7 +18,7 @@ void xe_display_driver_remove(struct xe_device *xe);
>   
>   int xe_display_create(struct xe_device *xe);
>   
> -void xe_display_probe(struct xe_device *xe);
> +int xe_display_probe(struct xe_device *xe);
>   
>   int xe_display_init_nommio(struct xe_device *xe);
>   int xe_display_init_noirq(struct xe_device *xe);
> @@ -47,7 +47,7 @@ static inline void xe_display_driver_remove(struct xe_device *xe) {}
>   
>   static inline int xe_display_create(struct xe_device *xe) { return 0; }
>   
> -static inline void xe_display_probe(struct xe_device *xe) { }
> +static inline int xe_display_probe(struct xe_device *xe) { }

With added "return 0"
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

>   
>   static inline int xe_display_init_nommio(struct xe_device *xe) { return 0; }
>   static inline int xe_display_init_noirq(struct xe_device *xe) { return 0; }
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 841b02ac6ba1..b366cbceb476 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -768,7 +768,9 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	if (err)
>   		return err;
>   
> -	xe_display_probe(xe);
> +	err = xe_display_probe(xe);
> +	if (err)
> +		return err;
>   
>   	drm_dbg(&xe->drm, "%s %s %04x:%04x dgfx:%d gfx:%s (%d.%02d) media:%s (%d.%02d) display:%s dma_m_s:%d tc:%d gscfi:%d",
>   		desc->platform_name,


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice
  2024-05-13  8:19   ` Andrzej Hajda
@ 2024-05-13  8:27     ` Matthew Auld
  2024-05-13  9:17       ` Andrzej Hajda
  0 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-13  8:27 UTC (permalink / raw)
  To: Andrzej Hajda, intel-xe; +Cc: Rodrigo Vivi

On 13/05/2024 09:19, Andrzej Hajda wrote:
> On 10.05.2024 20:12, Matthew Auld wrote:
>> Unclear why we call this twice.
> 
> Hmm, where is another call, I couldn't find one.

In intel_display_driver_remove_nogem it is already called.

> 
> Andrzej
> 
> 
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   drivers/gpu/drm/xe/display/xe_display.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/display/xe_display.c 
>> b/drivers/gpu/drm/xe/display/xe_display.c
>> index 6d74c9f55b0a..734bddbb8932 100644
>> --- a/drivers/gpu/drm/xe/display/xe_display.c
>> +++ b/drivers/gpu/drm/xe/display/xe_display.c
>> @@ -134,7 +134,6 @@ static void xe_display_fini_noirq(void *arg)
>>           return;
>>       intel_display_driver_remove_noirq(xe);
>> -    intel_power_domains_driver_remove(xe);
> 
> 
>>   }
>>   int xe_display_init_noirq(struct xe_device *xe)
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state
  2024-05-13  7:50   ` Andrzej Hajda
@ 2024-05-13  8:37     ` Matthew Auld
  2024-05-13  9:19       ` Andrzej Hajda
  0 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2024-05-13  8:37 UTC (permalink / raw)
  To: Andrzej Hajda, intel-xe; +Cc: Rodrigo Vivi

On 13/05/2024 08:50, Andrzej Hajda wrote:
> On 10.05.2024 20:12, Matthew Auld wrote:
>> Have a cleaner separation between hw vs sw.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt.c | 21 +++++++++++++--------
>>   1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> index 36c7b1631fa6..3f1826b783ad 100644
>> --- a/drivers/gpu/drm/xe/xe_gt.c
>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>> @@ -57,9 +57,17 @@
>>   #include "xe_wa.h"
>>   #include "xe_wopcm.h"
>> +static void gt_fini(struct drm_device *drm, void *arg)
>> +{
>> +    struct xe_gt *gt = arg;
>> +
>> +    destroy_workqueue(gt->ordered_wq);
> 
> Do we need this wq after unbind? It seems to be used by hw stuff:
> - gt_reset
> - xe_guc_exec_queue_trigger_cleanup

Yeah, my thinking is that for stuff like exec queue we might need to 
clean it up after unplug? Plus it's just sw state. I can drop for now if 
you prefer?

> 
> Regards
> Andrzej
> 
>> +}
>> +
>>   struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>>   {
>>       struct xe_gt *gt;
>> +    int err;
>>       gt = drmm_kzalloc(&tile_to_xe(tile)->drm, sizeof(*gt), GFP_KERNEL);
>>       if (!gt)
>> @@ -68,6 +76,10 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>>       gt->tile = tile;
>>       gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
>> +    err = drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
>> +    if (err)
>> +        return ERR_PTR(err);
>> +
>>       return gt;
>>   }
>> @@ -90,15 +102,9 @@ void xe_gt_sanitize(struct xe_gt *gt)
>>    */
>>   void xe_gt_remove(struct xe_gt *gt)
>>   {
>> -    xe_uc_remove(&gt->uc);
>> -}
>> -
>> -static void gt_fini(struct drm_device *drm, void *arg)
>> -{
>> -    struct xe_gt *gt = arg;
>>       int i;
>> -    destroy_workqueue(gt->ordered_wq);
>> +    xe_uc_remove(&gt->uc);
>>       for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
>>           xe_hw_fence_irq_finish(&gt->fence_irq[i]);
>> @@ -567,7 +573,6 @@ int xe_gt_init(struct xe_gt *gt)
>>       if (err)
>>           return err;
>> -    return drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
>>   }
>>   static int do_gt_reset(struct xe_gt *gt)
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice
  2024-05-13  8:27     ` Matthew Auld
@ 2024-05-13  9:17       ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  9:17 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi



On 13.05.2024 10:27, Matthew Auld wrote:
> On 13/05/2024 09:19, Andrzej Hajda wrote:
>> On 10.05.2024 20:12, Matthew Auld wrote:
>>> Unclear why we call this twice.
>>
>> Hmm, where is another call, I couldn't find one.
>
> In intel_display_driver_remove_nogem it is already called.

Apparently I have some outdated sources.


>
>>
>> Andrzej
>>
>>
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

>>> ---
>>>   drivers/gpu/drm/xe/display/xe_display.c | 1 -
>>>   1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/display/xe_display.c 
>>> b/drivers/gpu/drm/xe/display/xe_display.c
>>> index 6d74c9f55b0a..734bddbb8932 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_display.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_display.c
>>> @@ -134,7 +134,6 @@ static void xe_display_fini_noirq(void *arg)
>>>           return;
>>>       intel_display_driver_remove_noirq(xe);
>>> -    intel_power_domains_driver_remove(xe);
>>
>>
>>>   }
>>>   int xe_display_init_noirq(struct xe_device *xe)
>>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state
  2024-05-13  8:37     ` Matthew Auld
@ 2024-05-13  9:19       ` Andrzej Hajda
  0 siblings, 0 replies; 47+ messages in thread
From: Andrzej Hajda @ 2024-05-13  9:19 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Rodrigo Vivi



On 13.05.2024 10:37, Matthew Auld wrote:
> On 13/05/2024 08:50, Andrzej Hajda wrote:
>> On 10.05.2024 20:12, Matthew Auld wrote:
>>> Have a cleaner separation between hw vs sw.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_gt.c | 21 +++++++++++++--------
>>>   1 file changed, 13 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>> index 36c7b1631fa6..3f1826b783ad 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>>> @@ -57,9 +57,17 @@
>>>   #include "xe_wa.h"
>>>   #include "xe_wopcm.h"
>>> +static void gt_fini(struct drm_device *drm, void *arg)
>>> +{
>>> +    struct xe_gt *gt = arg;
>>> +
>>> +    destroy_workqueue(gt->ordered_wq);
>>
>> Do we need this wq after unbind? It seems to be used by hw stuff:
>> - gt_reset
>> - xe_guc_exec_queue_trigger_cleanup
>
> Yeah, my thinking is that for stuff like exec queue we might need to 
> clean it up after unplug? Plus it's just sw state. I can drop for now 
> if you prefer?

OK, no strong feelings.

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

>
>>
>> Regards
>> Andrzej
>>
>>> +}
>>> +
>>>   struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>>>   {
>>>       struct xe_gt *gt;
>>> +    int err;
>>>       gt = drmm_kzalloc(&tile_to_xe(tile)->drm, sizeof(*gt), 
>>> GFP_KERNEL);
>>>       if (!gt)
>>> @@ -68,6 +76,10 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
>>>       gt->tile = tile;
>>>       gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
>>> +    err = drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
>>> +    if (err)
>>> +        return ERR_PTR(err);
>>> +
>>>       return gt;
>>>   }
>>> @@ -90,15 +102,9 @@ void xe_gt_sanitize(struct xe_gt *gt)
>>>    */
>>>   void xe_gt_remove(struct xe_gt *gt)
>>>   {
>>> -    xe_uc_remove(&gt->uc);
>>> -}
>>> -
>>> -static void gt_fini(struct drm_device *drm, void *arg)
>>> -{
>>> -    struct xe_gt *gt = arg;
>>>       int i;
>>> -    destroy_workqueue(gt->ordered_wq);
>>> +    xe_uc_remove(&gt->uc);
>>>       for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
>>>           xe_hw_fence_irq_finish(&gt->fence_irq[i]);
>>> @@ -567,7 +573,6 @@ int xe_gt_init(struct xe_gt *gt)
>>>       if (err)
>>>           return err;
>>> -    return drmm_add_action_or_reset(&gt_to_xe(gt)->drm, gt_fini, gt);
>>>   }
>>>   static int do_gt_reset(struct xe_gt *gt)
>>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata
  2024-05-10 18:12 ` [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata Matthew Auld
@ 2024-05-13 11:54   ` Christian König
  0 siblings, 0 replies; 47+ messages in thread
From: Christian König @ 2024-05-13 11:54 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Christian König, Daniel Vetter, amd-gfx

Am 10.05.24 um 20:12 schrieb Matthew Auld:
> The driver release callback is called when a particular drm_device goes
> away, just like with drmm, so here we should never nuke the pdev drvdata
> pointer, since that could already be pointing to a new drvdata.
> For example something hotunplugs the device, for which we have an open
> driver fd, keeping the drm_device alive, and in the meantime the same
> physical device is re-attached to a new drm_device therefore setting
> drvdata again. Once the original drm_device goes away, we might then
> call the release which then incorrectly tramples the drvdata.
>
> The driver core will already nuke the pointer for us when the pci device
> is removed, so should be safe to simply drop. Alternative would be to
> move to the driver pci remove callback.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: amd-gfx@lists.freedesktop.org

Oh! Very good catch! That might become important for a feature we 
current discuss internally.

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index a0ea6fe8d060..d5fed007c698 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1450,7 +1450,6 @@ void amdgpu_driver_release_kms(struct drm_device *dev)
>   	struct amdgpu_device *adev = drm_to_adev(dev);
>   
>   	amdgpu_device_fini_sw(adev);
> -	pci_set_drvdata(adev->pdev, NULL);
>   }
>   
>   /*


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC
  2024-05-10 18:12 ` [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC Matthew Auld
  2024-05-13  7:06   ` Andrzej Hajda
@ 2024-05-13 15:38   ` Rodrigo Vivi
  1 sibling, 0 replies; 47+ messages in thread
From: Rodrigo Vivi @ 2024-05-13 15:38 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe, Daniel Vetter, dri-devel

On Fri, May 10, 2024 at 07:12:14PM +0100, Matthew Auld wrote:
> Hopefully make it clearer when to use devm vs drmm.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org
> ---
>  drivers/gpu/drm/drm_managed.c | 42 +++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c
> index 7646f67bda4e..20d705bbc0a3 100644
> --- a/drivers/gpu/drm/drm_managed.c
> +++ b/drivers/gpu/drm/drm_managed.c
> @@ -34,6 +34,48 @@
>   * during the lifetime of the driver, all the functions are fully concurrent
>   * safe. But it is recommended to use managed resources only for resources that
>   * change rarely, if ever, during the lifetime of the &drm_device instance.
> + *
> + * Note that the distinction between devm and drmm is important to get right.
> + * Consider some hotunplug scenarios, where it is valid for there to be multiple
> + * unplugged struct &drm_device instances each being kept alive by an open
> + * driver fd. The driver needs a clean separation between what needs to happen
> + * when the struct &device is removed and what needs to happen when a given
> + * struct &drm_device instance is released, as well as in some cases a more
> + * finer grained marking of critical sections that require hardware interaction.
> + * See below.
> + *
> + * devm
> + * ~~~~
> + * In general use devm for cleaning up anything hardware related. So removing
> + * pci mmaps, releasing interrupt handlers, basically anything hw related.  The
                                                                              ^
Extra space.

> + * devm release actions are called when the struct &device is removed, shortly
> + * after calling into the drivers struct &pci_driver.remove() callback, if this
> + * is a pci device.
> + *
> + * devm can be thought of as an alternative to putting all the hw related

nit: perhaps s/thought/seen ?

> + * cleanup directly in the struct &pci_driver.remove() callback, where the
> + * correct ordering of the unwind steps needs to be manually done in the error
> + * path of the struct &pci_driver.probe() and again on the remove side.  With
> + * devm this is all done automatically.
> + *
> + * drmm
> + * ~~~~
> + * In general use this for cleaning up anything software related. So data
> + * structures and the like which are tied to the lifetime of a particular struct
> + * &drm_device instance.
> + *
> + * drmm can be thought of as an alternative to putting all the software related

nit: perhaps s/thought/seen ?

> + * cleanup directly in the struct &drm_driver.release() callback, where again
> + * the correct ordering of the unwind steps needs to be done manually. As with
> + * devm this is instead done automatically.
> + *
> + * Sometimes there is no clean separation between software and hardware, which
> + * is where drm_dev_enter() comes in. For example, a driver might have some
> + * state tied to a struct &drm_device instance, for which the same cleanup path
> + * is called for both a plugged and unplugged device, and the cleanup itself
> + * might require talking to the device if it's still attached to this particular
> + * struct &drm_device. For that we instead mark the device sections.  See
> + * drm_dev_enter(), drm_dev_exit() and drm_dev_unplug().

perhaps open up a bit more here?

anyway, everything looks good to me.

Sima, thoughts?

>   */
>  
>  struct drmres_node {
> -- 
> 2.45.0
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section
  2024-05-10 18:12 ` [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section Matthew Auld
  2024-05-13  7:16   ` Andrzej Hajda
@ 2024-05-13 19:34   ` Randhawa, Jagmeet
  1 sibling, 0 replies; 47+ messages in thread
From: Randhawa, Jagmeet @ 2024-05-13 19:34 UTC (permalink / raw)
  To: Matthew Auld, intel-xe


On 5/10/2024 11:12 AM, Matthew Auld wrote:
> Device can be hotunplugged before we start destroying gem objects. In
> such a case don't touch the GGTT entries, trigger any invalidations or
> mess around with rpm.  This should already be taken care of when
> removing the device, we just need to take care of dealing with the
> software state, like removing the mm node.
>
> v2: (Andrzej)
>    - Avoid some duplication by tracking the bound status and checking
>      that instead.
>
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ggtt.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index 0d541f55b4fc..17e5066763db 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -8,6 +8,7 @@
>   #include <linux/io-64-nonatomic-lo-hi.h>
>   #include <linux/sizes.h>
>   
> +#include <drm/drm_drv.h>
>   #include <drm/drm_managed.h>
>   #include <drm/i915_drm.h>
>   
> @@ -433,18 +434,29 @@ int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
>   void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
>   			 bool invalidate)
>   {
> -	xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile));
> +	struct xe_device *xe = tile_to_xe(ggtt->tile);
> +	bool bound;
> +	int idx;
> +
> +	bound = drm_dev_enter(&xe->drm, &idx);
> +	if (bound)
> +		xe_pm_runtime_get_noresume(xe);
>   
>   	mutex_lock(&ggtt->lock);
> -	xe_ggtt_clear(ggtt, node->start, node->size);
> +	if (bound)
> +		xe_ggtt_clear(ggtt, node->start, node->size);
>   	drm_mm_remove_node(node);
>   	node->size = 0;
>   	mutex_unlock(&ggtt->lock);
>   
> +	if (!bound)
> +		return;
> +
>   	if (invalidate)
>   		xe_ggtt_invalidate(ggtt);
>   
> -	xe_pm_runtime_put(tile_to_xe(ggtt->tile));
> +	xe_pm_runtime_put(xe);
> +	drm_dev_exit(idx);
>   }
>   
>   void xe_ggtt_remove_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)

Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2024-05-13 19:34 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-10 18:12 [PATCH 00/20] core_hotunplug improvements Matthew Auld
2024-05-10 18:12 ` [PATCH 01/20] drm/drm_managed: try to improve the drmm DOC Matthew Auld
2024-05-13  7:06   ` Andrzej Hajda
2024-05-13 15:38   ` Rodrigo Vivi
2024-05-10 18:12 ` [PATCH 02/20] drm/amdgpu: don't trample pdev drvdata Matthew Auld
2024-05-13 11:54   ` Christian König
2024-05-10 18:12 ` [PATCH 03/20] drm/xe/pci: remove broken driver_release Matthew Auld
2024-05-13  7:14   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 04/20] drm/xe: covert sysfs over to devm Matthew Auld
2024-05-10 18:12 ` [PATCH 05/20] drm/xe/ggtt: use drm_dev_enter to mark device section Matthew Auld
2024-05-13  7:16   ` Andrzej Hajda
2024-05-13 19:34   ` Randhawa, Jagmeet
2024-05-10 18:12 ` [PATCH 06/20] drm/xe/guc: move guc_fini over to devm Matthew Auld
2024-05-10 18:12 ` [PATCH 07/20] drm/xe/guc: s/guc_fini/guc_fini_hw/ Matthew Auld
2024-05-13  7:30   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 08/20] drm/xe/guc_pc: move pc_fini to devm Matthew Auld
2024-05-10 18:12 ` [PATCH 09/20] drm/xe/guc_pc: s/pc_fini/pc_fini_hw/ Matthew Auld
2024-05-11  9:19   ` kernel test robot
2024-05-13  7:37   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 10/20] drm/xe/irq: move irq_uninstall over to devm Matthew Auld
2024-05-10 18:12 ` [PATCH 11/20] drm/xe/device: move flr " Matthew Auld
2024-05-10 18:12 ` [PATCH 12/20] drm/xe/device: move xe_device_sanitize over " Matthew Auld
2024-05-10 18:12 ` [PATCH 13/20] drm/xe/coredump: move " Matthew Auld
2024-05-13  7:38   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 14/20] drm/xe/gt: break out gt_fini into sw vs hw state Matthew Auld
2024-05-13  7:50   ` Andrzej Hajda
2024-05-13  8:37     ` Matthew Auld
2024-05-13  9:19       ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 15/20] drm/xe: make gt_remove use devm Matthew Auld
2024-05-13  7:58   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 16/20] drm/xe/mmio: move mmio_fini over to devm Matthew Auld
2024-05-13  8:00   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 17/20] drm/xe: reset mmio mappings with devm Matthew Auld
2024-05-13  8:12   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 18/20] drm/xe/display: move display fini stuff to devm Matthew Auld
2024-05-13  8:13   ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 19/20] drm/xe/display: stop calling domains_driver_remove twice Matthew Auld
2024-05-13  8:19   ` Andrzej Hajda
2024-05-13  8:27     ` Matthew Auld
2024-05-13  9:17       ` Andrzej Hajda
2024-05-10 18:12 ` [PATCH 20/20] drm/xe/display: move device_remove over to drmm Matthew Auld
2024-05-11 10:33   ` kernel test robot
2024-05-11 10:43   ` kernel test robot
2024-05-13  8:22   ` Andrzej Hajda
2024-05-10 18:17 ` ✓ CI.Patch_applied: success for core_hotunplug improvements Patchwork
2024-05-10 18:18 ` ✗ CI.checkpatch: warning " Patchwork
2024-05-10 18:18 ` ✗ CI.KUnit: failure " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox