Intel-XE Archive on lore.kernel.org
* [PATCH 00/11] Fix error paths in driver load
@ 2024-08-10  1:55 Matthew Brost
  2024-08-10  1:55 ` [PATCH 01/11] drm/xe: use devm instead of drmm for managed bo Matthew Brost
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

Combine a few series [1] [2] [3] plus some new patches to get the
driver load error paths into a working state.

The below test case passed on TGL:
for i in {1..19}; do echo "Run $i"; modprobe xe inject_driver_load_error=$i; rmmod xe; done

More error injection points should be added, likely hundreds more. This
is posted in a working state so that the work can likely be handed off
to a team member.

Matt

[1] https://patchwork.freedesktop.org/series/137113/
[2] https://patchwork.freedesktop.org/series/137112/
[3] https://patchwork.freedesktop.org/series/137111/

Daniele Ceraolo Spurio (3):
  drm/xe: use devm instead of drmm for managed bo
  drm/xe/uc: Use managed bo for HuC and GSC objects
  drm/xe/uc: Use devm to register cleanup that includes exec_queues

Matthew Brost (8):
  drm/xe: Fix tile fini sequence
  drm/xe: Add driver load error injection
  drm/xe: Move ggtt_fini to devm managed
  drm/xe: Set firmware state to loadable before registering guc_fini_hw
  drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini
  drm/xe: Move hw_engine_fini to devm managed
  drm/xe: Move HuC init before GuC init
  drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map

 drivers/gpu/drm/xe/xe_bo.c           |  6 ++---
 drivers/gpu/drm/xe/xe_device.c       | 31 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_device.h       | 15 ++++++++++++
 drivers/gpu/drm/xe/xe_device_types.h |  4 ++++
 drivers/gpu/drm/xe/xe_ggtt.c         |  4 ++--
 drivers/gpu/drm/xe/xe_gsc.c          | 16 ++++---------
 drivers/gpu/drm/xe/xe_gsc_proxy.c    | 36 +++++-----------------------
 drivers/gpu/drm/xe/xe_gt.c           |  5 ++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf.c  |  4 ++++
 drivers/gpu/drm/xe/xe_guc.c          | 12 ++++++++--
 drivers/gpu/drm/xe/xe_guc_ads.c      |  5 ++++
 drivers/gpu/drm/xe/xe_guc_ct.c       |  4 ++++
 drivers/gpu/drm/xe/xe_guc_log.c      |  5 ++++
 drivers/gpu/drm/xe/xe_guc_pc.c       |  2 +-
 drivers/gpu/drm/xe/xe_guc_submit.c   |  4 ++--
 drivers/gpu/drm/xe/xe_huc.c          | 19 ++++-----------
 drivers/gpu/drm/xe/xe_hw_engine.c    |  4 ++--
 drivers/gpu/drm/xe/xe_mmio.c         | 10 +++++++-
 drivers/gpu/drm/xe/xe_module.c       |  5 ++++
 drivers/gpu/drm/xe/xe_module.h       |  3 +++
 drivers/gpu/drm/xe/xe_pci.c          |  9 +++++++
 drivers/gpu/drm/xe/xe_pm.c           |  8 +++++++
 drivers/gpu/drm/xe/xe_sa.c           | 13 +++++-----
 drivers/gpu/drm/xe/xe_sa_types.h     |  1 +
 drivers/gpu/drm/xe/xe_sriov.c        |  8 ++++++-
 drivers/gpu/drm/xe/xe_tile.c         |  4 ++++
 drivers/gpu/drm/xe/xe_uc.c           | 12 ++++++----
 drivers/gpu/drm/xe/xe_wa.c           |  5 ++++
 drivers/gpu/drm/xe/xe_wopcm.c        |  4 ++++
 29 files changed, 178 insertions(+), 80 deletions(-)

-- 
2.34.1



* [PATCH 01/11] drm/xe: use devm instead of drmm for managed bo
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 02/11] drm/xe/uc: Use managed bo for HuC and GSC objects Matthew Brost
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

The BO cleanup touches the GGTT and therefore requires the HW to be
available, so we need to use devm instead of drmm.
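
To illustrate the lifetime difference this relies on, here is a minimal
userspace sketch. It is not the kernel's devres or drmm code; every name
in it (model_device, add_action, model_unbind, model_final_put) is
hypothetical. It assumes devm actions run at driver unbind while the HW
is still reachable, and that drmm actions run when the last drm_device
reference is dropped, which may be later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*action_fn)(void *arg);

/* Hypothetical model of one managed-action list (not the kernel's devres). */
struct action_list {
	action_fn fn[MAX_ACTIONS];
	void *arg[MAX_ACTIONS];
	size_t count;
};

/* Hypothetical device: one list released at unbind, one at the final put. */
struct model_device {
	bool hw_available;
	struct action_list devm;	/* released while the HW is reachable */
	struct action_list drmm;	/* released when the last reference drops */
};

struct model_bo {
	struct model_device *dev;
	bool cleaned;
	bool cleaned_with_hw;	/* did cleanup run while the HW was available? */
};

static int add_action(struct action_list *list, action_fn fn, void *arg)
{
	if (list->count == MAX_ACTIONS)
		return -1;
	list->fn[list->count] = fn;
	list->arg[list->count] = arg;
	list->count++;
	return 0;
}

/* Run the registered actions in reverse order of registration. */
static void run_actions(struct action_list *list)
{
	while (list->count > 0) {
		list->count--;
		list->fn[list->count](list->arg[list->count]);
	}
}

/* Stand-in for a BO cleanup that needs the HW (e.g. to touch the GGTT). */
static void bo_cleanup(void *arg)
{
	struct model_bo *bo = arg;

	assert(bo != NULL);
	bo->cleaned = true;
	bo->cleaned_with_hw = bo->dev->hw_available;
}

/* Driver unbind: device-managed actions run, then the HW goes away. */
static void model_unbind(struct model_device *dev)
{
	run_actions(&dev->devm);
	dev->hw_available = false;
}

/* Last reference dropped, possibly long after unbind. */
static void model_final_put(struct model_device *dev)
{
	run_actions(&dev->drmm);
}
```

In this model a cleanup registered on the devm list sees hw_available
set, while the same cleanup registered on the drmm list runs only after
the HW is gone.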

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1160
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 3295bc92d7aa..45652d7e6fa6 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1576,7 +1576,7 @@ struct xe_bo *xe_bo_create_from_data(struct xe_device *xe, struct xe_tile *tile,
 	return bo;
 }
 
-static void __xe_bo_unpin_map_no_vm(struct drm_device *drm, void *arg)
+static void __xe_bo_unpin_map_no_vm(void *arg)
 {
 	xe_bo_unpin_map_no_vm(arg);
 }
@@ -1591,7 +1591,7 @@ struct xe_bo *xe_managed_bo_create_pin_map(struct xe_device *xe, struct xe_tile
 	if (IS_ERR(bo))
 		return bo;
 
-	ret = drmm_add_action_or_reset(&xe->drm, __xe_bo_unpin_map_no_vm, bo);
+	ret = devm_add_action_or_reset(xe->drm.dev, __xe_bo_unpin_map_no_vm, bo);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -1639,7 +1639,7 @@ int xe_managed_bo_reinit_in_vram(struct xe_device *xe, struct xe_tile *tile, str
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
-	drmm_release_action(&xe->drm, __xe_bo_unpin_map_no_vm, *src);
+	devm_release_action(xe->drm.dev, __xe_bo_unpin_map_no_vm, *src);
 	*src = bo;
 
 	return 0;
-- 
2.34.1



* [PATCH 02/11] drm/xe/uc: Use managed bo for HuC and GSC objects
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
  2024-08-10  1:55 ` [PATCH 01/11] drm/xe: use devm instead of drmm for managed bo Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 03/11] drm/xe/uc: Use devm to register cleanup that includes exec_queues Matthew Brost
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Drmm actions are not the right ones to clean up BOs and we should use
devm instead. However, we can instead just allocate the objects using
the managed_bo function, which internally registers the correct cleanup
call and therefore allows us to simplify the code.

While at it, switch to drmm_kzalloc for the GSC proxy allocation to
further simplify the cleanup.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/xe/xe_gsc.c       | 12 +++--------
 drivers/gpu/drm/xe/xe_gsc_proxy.c | 36 ++++++-------------------------
 drivers/gpu/drm/xe/xe_huc.c       | 19 +++++-----------
 3 files changed, 14 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index 77ce44e845c5..8a9b3c50a588 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -450,11 +450,6 @@ static void free_resources(struct drm_device *drm, void *arg)
 		xe_exec_queue_put(gsc->q);
 		gsc->q = NULL;
 	}
-
-	if (gsc->private) {
-		xe_bo_unpin_map_no_vm(gsc->private);
-		gsc->private = NULL;
-	}
 }
 
 int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc)
@@ -474,10 +469,9 @@ int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc)
 	if (!hwe)
 		return -ENODEV;
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL, SZ_4M,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_STOLEN |
-				  XE_BO_FLAG_GGTT);
+	bo = xe_managed_bo_create_pin_map(xe, tile, SZ_4M,
+					  XE_BO_FLAG_STOLEN |
+					  XE_BO_FLAG_GGTT);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c
index aa812a2bc3ed..8f880c44211d 100644
--- a/drivers/gpu/drm/xe/xe_gsc_proxy.c
+++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c
@@ -376,27 +376,6 @@ static const struct component_ops xe_gsc_proxy_component_ops = {
 	.unbind = xe_gsc_proxy_component_unbind,
 };
 
-static void proxy_channel_free(struct drm_device *drm, void *arg)
-{
-	struct xe_gsc *gsc = arg;
-
-	if (!gsc->proxy.bo)
-		return;
-
-	if (gsc->proxy.to_csme) {
-		kfree(gsc->proxy.to_csme);
-		gsc->proxy.to_csme = NULL;
-		gsc->proxy.from_csme = NULL;
-	}
-
-	if (gsc->proxy.bo) {
-		iosys_map_clear(&gsc->proxy.to_gsc);
-		iosys_map_clear(&gsc->proxy.from_gsc);
-		xe_bo_unpin_map_no_vm(gsc->proxy.bo);
-		gsc->proxy.bo = NULL;
-	}
-}
-
 static int proxy_channel_alloc(struct xe_gsc *gsc)
 {
 	struct xe_gt *gt = gsc_to_gt(gsc);
@@ -405,18 +384,15 @@ static int proxy_channel_alloc(struct xe_gsc *gsc)
 	struct xe_bo *bo;
 	void *csme;
 
-	csme = kzalloc(GSC_PROXY_CHANNEL_SIZE, GFP_KERNEL);
+	csme = drmm_kzalloc(&xe->drm, GSC_PROXY_CHANNEL_SIZE, GFP_KERNEL);
 	if (!csme)
 		return -ENOMEM;
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL, GSC_PROXY_CHANNEL_SIZE,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT);
-	if (IS_ERR(bo)) {
-		kfree(csme);
+	bo = xe_managed_bo_create_pin_map(xe, tile, GSC_PROXY_CHANNEL_SIZE,
+					  XE_BO_FLAG_SYSTEM |
+					  XE_BO_FLAG_GGTT);
+	if (IS_ERR(bo))
 		return PTR_ERR(bo);
-	}
 
 	gsc->proxy.bo = bo;
 	gsc->proxy.to_gsc = IOSYS_MAP_INIT_OFFSET(&bo->vmap, 0);
@@ -424,7 +400,7 @@ static int proxy_channel_alloc(struct xe_gsc *gsc)
 	gsc->proxy.to_csme = csme;
 	gsc->proxy.from_csme = csme + GSC_PROXY_BUFFER_SIZE;
 
-	return drmm_add_action_or_reset(&xe->drm, proxy_channel_free, gsc);
+	return 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_huc.c b/drivers/gpu/drm/xe/xe_huc.c
index bec4366e5513..f5459f97af23 100644
--- a/drivers/gpu/drm/xe/xe_huc.c
+++ b/drivers/gpu/drm/xe/xe_huc.c
@@ -43,14 +43,6 @@ huc_to_guc(struct xe_huc *huc)
 	return &container_of(huc, struct xe_uc, huc)->guc;
 }
 
-static void free_gsc_pkt(struct drm_device *drm, void *arg)
-{
-	struct xe_huc *huc = arg;
-
-	xe_bo_unpin_map_no_vm(huc->gsc_pkt);
-	huc->gsc_pkt = NULL;
-}
-
 #define PXP43_HUC_AUTH_INOUT_SIZE SZ_4K
 static int huc_alloc_gsc_pkt(struct xe_huc *huc)
 {
@@ -59,17 +51,16 @@ static int huc_alloc_gsc_pkt(struct xe_huc *huc)
 	struct xe_bo *bo;
 
 	/* we use a single object for both input and output */
-	bo = xe_bo_create_pin_map(xe, gt_to_tile(gt), NULL,
-				  PXP43_HUC_AUTH_INOUT_SIZE * 2,
-				  ttm_bo_type_kernel,
-				  XE_BO_FLAG_SYSTEM |
-				  XE_BO_FLAG_GGTT);
+	bo = xe_managed_bo_create_pin_map(xe, gt_to_tile(gt),
+					  PXP43_HUC_AUTH_INOUT_SIZE * 2,
+					  XE_BO_FLAG_SYSTEM |
+					  XE_BO_FLAG_GGTT);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
 	huc->gsc_pkt = bo;
 
-	return drmm_add_action_or_reset(&xe->drm, free_gsc_pkt, huc);
+	return 0;
 }
 
 int xe_huc_init(struct xe_huc *huc)
-- 
2.34.1



* [PATCH 03/11] drm/xe/uc: Use devm to register cleanup that includes exec_queues
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
  2024-08-10  1:55 ` [PATCH 01/11] drm/xe: use devm instead of drmm for managed bo Matthew Brost
  2024-08-10  1:55 ` [PATCH 02/11] drm/xe/uc: Use managed bo for HuC and GSC objects Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 04/11] drm/xe: Fix tile fini sequence Matthew Brost
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Exec_queue cleanup requires HW access, so we need to use devm instead of
drmm for it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/xe/xe_gsc.c        | 4 ++--
 drivers/gpu/drm/xe/xe_guc_submit.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index 8a9b3c50a588..8a137cb83318 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -437,7 +437,7 @@ int xe_gsc_init(struct xe_gsc *gsc)
 	return ret;
 }
 
-static void free_resources(struct drm_device *drm, void *arg)
+static void free_resources(void *arg)
 {
 	struct xe_gsc *gsc = arg;
 
@@ -495,7 +495,7 @@ int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc)
 	gsc->q = q;
 	gsc->wq = wq;
 
-	err = drmm_add_action_or_reset(&xe->drm, free_resources, gsc);
+	err = devm_add_action_or_reset(xe->drm.dev, free_resources, gsc);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 460808507947..2adf551500cb 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -284,7 +284,7 @@ static void guc_submit_fini(struct drm_device *drm, void *arg)
 	free_submit_wq(guc);
 }
 
-static void guc_submit_wedged_fini(struct drm_device *drm, void *arg)
+static void guc_submit_wedged_fini(void *arg)
 {
 	struct xe_guc *guc = arg;
 	struct xe_exec_queue *q;
@@ -877,7 +877,7 @@ void xe_guc_submit_wedge(struct xe_guc *guc)
 
 	xe_gt_assert(guc_to_gt(guc), guc_to_xe(guc)->wedged.mode);
 
-	err = drmm_add_action_or_reset(&guc_to_xe(guc)->drm,
+	err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev,
 				       guc_submit_wedged_fini, guc);
 	if (err) {
 		drm_err(&xe->drm, "Failed to register xe_guc_submit clean-up on wedged.mode=2. Although device is wedged.\n");
-- 
2.34.1



* [PATCH 04/11] drm/xe: Fix tile fini sequence
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (2 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 03/11] drm/xe/uc: Use devm to register cleanup that includes exec_queues Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 05/11] drm/xe: Add driver load error injection Matthew Brost
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

Only set tile->mmio.regs to NULL in tiles_fini if the tile is not the
root tile. The root tile's mmio regs are set up earlier, in MMIO init,
so they should be set to NULL in mmio_fini.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_mmio.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index bdcc7282385c..f5bdb540e823 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -30,7 +30,8 @@ static void tiles_fini(void *arg)
 	int id;
 
 	for_each_tile(tile, xe, id)
-		tile->mmio.regs = NULL;
+		if (tile != xe_device_get_root_tile(xe))
+			tile->mmio.regs = NULL;
 }
 
 /*
@@ -146,9 +147,11 @@ int xe_mmio_probe_tiles(struct xe_device *xe)
 static void mmio_fini(void *arg)
 {
 	struct xe_device *xe = arg;
+	struct xe_tile *root_tile = xe_device_get_root_tile(xe);
 
 	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
 	xe->mmio.regs = NULL;
+	root_tile->mmio.regs = NULL;
 }
 
 int xe_mmio_init(struct xe_device *xe)
-- 
2.34.1



* [PATCH 05/11] drm/xe: Add driver load error injection
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (3 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 04/11] drm/xe: Fix tile fini sequence Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 06/11] drm/xe: Move ggtt_fini to devm managed Matthew Brost
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

Port over i915 driver load error injection.
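
The checkpoint counting added in xe_device.c can be sketched in
userspace as below. This is only a model of the logic in this patch:
model_modparam, model_device, model_inject and model_inject_here are
hypothetical stand-ins for xe_modparam, struct xe_device and the
xe_device_inject_driver_load_error() macro, and printf() stands in for
drm_info().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for xe_modparam and struct xe_device. */
struct model_modparam {
	int inject_driver_load_error;	/* checkpoint to fail at; 0 disables */
};

struct model_device {
	int inject_driver_load_error;	/* checkpoints counted so far */
};

static struct model_modparam model_modparam;

static int model_inject(struct model_device *xe, int err,
			const char *func, int line)
{
	assert(xe != NULL);

	/* Disabled, or the requested checkpoint has already fired. */
	if (xe->inject_driver_load_error >= model_modparam.inject_driver_load_error)
		return 0;

	/* Count this checkpoint; keep going until the requested one. */
	if (++xe->inject_driver_load_error < model_modparam.inject_driver_load_error)
		return 0;

	printf("Injecting failure %d at checkpoint %d [%s:%d]\n",
	       err, xe->inject_driver_load_error, func, line);

	/* One-shot: clear the parameter so later checkpoints pass. */
	model_modparam.inject_driver_load_error = 0;
	return err;
}

/* Mirrors the shape of the macro: always injects -ENODEV. */
#define model_inject_here(xe) \
	model_inject((xe), -ENODEV, __func__, __LINE__)
```

With the parameter set to N, the first N-1 checkpoints return 0, the
Nth returns -ENODEV, and every later checkpoint returns 0 because the
parameter has been cleared.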

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c       | 31 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_device.h       | 15 ++++++++++++++
 drivers/gpu/drm/xe/xe_device_types.h |  4 ++++
 drivers/gpu/drm/xe/xe_gt.c           |  5 +++++
 drivers/gpu/drm/xe/xe_gt_sriov_pf.c  |  4 ++++
 drivers/gpu/drm/xe/xe_guc.c          |  8 +++++++
 drivers/gpu/drm/xe/xe_guc_ads.c      |  5 +++++
 drivers/gpu/drm/xe/xe_guc_ct.c       |  4 ++++
 drivers/gpu/drm/xe/xe_guc_log.c      |  5 +++++
 drivers/gpu/drm/xe/xe_mmio.c         |  5 +++++
 drivers/gpu/drm/xe/xe_module.c       |  5 +++++
 drivers/gpu/drm/xe/xe_module.h       |  3 +++
 drivers/gpu/drm/xe/xe_pci.c          |  9 ++++++++
 drivers/gpu/drm/xe/xe_pm.c           |  8 +++++++
 drivers/gpu/drm/xe/xe_sriov.c        |  8 ++++++-
 drivers/gpu/drm/xe/xe_tile.c         |  4 ++++
 drivers/gpu/drm/xe/xe_uc.c           |  4 ++++
 drivers/gpu/drm/xe/xe_wa.c           |  5 +++++
 drivers/gpu/drm/xe/xe_wopcm.c        |  4 ++++
 19 files changed, 135 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 1aba6f9eaa19..f6cd13ed6d20 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -374,6 +374,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (WARN_ON(err))
 		goto err;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		goto err;
+
 	return xe;
 
 err:
@@ -477,6 +481,10 @@ static int xe_set_dma_info(struct xe_device *xe)
 	if (err)
 		goto mask_err;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		goto mask_err;
+
 	return 0;
 
 mask_err:
@@ -580,6 +588,10 @@ int xe_device_probe_early(struct xe_device *xe)
 	if (err)
 		return err;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	xe->wedged.mode = xe_modparam.wedged_mode;
 
 	return 0;
@@ -995,3 +1007,22 @@ void xe_device_declare_wedged(struct xe_device *xe)
 	for_each_gt(gt, xe, id)
 		xe_gt_declare_wedged(gt);
 }
+
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+int __xe_device_inject_driver_load_error(struct xe_device *xe, int err,
+					 const char *func, int line)
+{
+	if (xe->inject_driver_load_error >= xe_modparam.inject_driver_load_error)
+		return 0;
+
+	if (++xe->inject_driver_load_error < xe_modparam.inject_driver_load_error)
+		return 0;
+
+	drm_info(&xe->drm, "Injecting failure %d at checkpoint %u [%s:%d]\n",
+		 err, xe->inject_driver_load_error, func, line);
+
+	xe_modparam.inject_driver_load_error = 0;
+	return err;
+
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index db6cc8d0d6b8..4f7e9cdac9fe 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -179,4 +179,19 @@ void xe_device_declare_wedged(struct xe_device *xe);
 struct xe_file *xe_file_get(struct xe_file *xef);
 void xe_file_put(struct xe_file *xef);
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+
+int __xe_device_inject_driver_load_error(struct xe_device *xe, int err,
+					 const char *func, int line);
+
+#define xe_device_inject_driver_load_error(__xe) \
+	__xe_device_inject_driver_load_error(__xe, -ENODEV, __func__, __LINE__)
+
+#else
+
+#define xe_device_inject_driver_load_error(__xe) \
+	({ BUILD_BUG_ON_INVALID(__xe); 0; })
+
+#endif
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 5b7292a9a66d..3e620314eec2 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -484,6 +484,10 @@ struct xe_device {
 		int mode;
 	} wedged;
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+	int inject_driver_load_error;
+#endif
+
 #ifdef TEST_VM_OPS_ERROR
 	/**
 	 * @vm_inject_error_position: inject errors at different places in VM
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 58895ed22f6e..8209079c0334 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -389,6 +389,10 @@ int xe_gt_init_early(struct xe_gt *gt)
 	xe_pcode_init(gt);
 	spin_lock_init(&gt->global_invl_lock);
 
+	err = xe_device_inject_driver_load_error(gt_to_xe(gt));
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -570,6 +574,7 @@ int xe_gt_init_hwconfig(struct xe_gt *gt)
 	xe_gt_topology_init(gt);
 	xe_gt_mcr_init(gt);
 
+	err = xe_device_inject_driver_load_error(gt_to_xe(gt));
 out_fw:
 	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 out:
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
index ef239440963c..897815ddf954 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_pf.c
@@ -57,6 +57,10 @@ int xe_gt_sriov_pf_init_early(struct xe_gt *gt)
 	if (err)
 		return err;
 
+	err = xe_device_inject_driver_load_error(gt_to_xe(gt));
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index de0fe9e65746..980691c178c4 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -354,6 +354,10 @@ int xe_guc_init(struct xe_guc *guc)
 	if (ret)
 		goto out;
 
+	ret = xe_device_inject_driver_load_error(guc_to_xe(guc));
+	if (ret)
+		goto out;
+
 	guc_init_params(guc);
 
 	xe_guc_comm_init_early(guc);
@@ -411,6 +415,10 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc)
 	if (ret)
 		return ret;
 
+	ret = xe_device_inject_driver_load_error(guc_to_xe(guc));
+	if (ret)
+		return ret;
+
 	return xe_guc_ads_init_post_hwconfig(&guc->ads);
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c
index d1902a8581ca..1944912ef9b8 100644
--- a/drivers/gpu/drm/xe/xe_guc_ads.c
+++ b/drivers/gpu/drm/xe/xe_guc_ads.c
@@ -402,6 +402,7 @@ int xe_guc_ads_init(struct xe_guc_ads *ads)
 	struct xe_gt *gt = ads_to_gt(ads);
 	struct xe_tile *tile = gt_to_tile(gt);
 	struct xe_bo *bo;
+	int err;
 
 	ads->golden_lrc_size = calculate_golden_lrc_size(ads);
 	ads->regset_size = calculate_regset_size(gt);
@@ -416,6 +417,10 @@ int xe_guc_ads_init(struct xe_guc_ads *ads)
 
 	ads->bo = bo;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index beeeb120d1fc..76a26aaabb13 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -197,6 +197,10 @@ int xe_guc_ct_init(struct xe_guc_ct *ct)
 	if (err)
 		return err;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	xe_gt_assert(gt, ct->state == XE_GUC_CT_STATE_NOT_INITIALIZED);
 	ct->state = XE_GUC_CT_STATE_DISABLED;
 	return 0;
diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
index a37ee3419428..f26c37e3ee3a 100644
--- a/drivers/gpu/drm/xe/xe_guc_log.c
+++ b/drivers/gpu/drm/xe/xe_guc_log.c
@@ -82,6 +82,7 @@ int xe_guc_log_init(struct xe_guc_log *log)
 	struct xe_device *xe = log_to_xe(log);
 	struct xe_tile *tile = gt_to_tile(log_to_gt(log));
 	struct xe_bo *bo;
+	int err;
 
 	bo = xe_managed_bo_create_pin_map(xe, tile, guc_log_size(),
 					  XE_BO_FLAG_SYSTEM |
@@ -94,5 +95,9 @@ int xe_guc_log_init(struct xe_guc_log *log)
 	log->bo = bo;
 	log->level = xe_modparam.guc_log_level;
 
+	err = xe_device_inject_driver_load_error(log_to_xe(log));
+	if (err)
+		return err;
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index f5bdb540e823..12ad2f73e8a4 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -137,6 +137,11 @@ int xe_mmio_probe_tiles(struct xe_device *xe)
 {
 	size_t tile_mmio_size = SZ_16M;
 	size_t tile_mmio_ext_size = xe->info.tile_mmio_ext_size;
+	int err;
+
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
 
 	mmio_multi_tile_setup(xe, tile_mmio_size);
 	mmio_extension_setup(xe, tile_mmio_size, tile_mmio_ext_size);
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 7bb99e451fcc..972b64a9f514 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -53,6 +53,11 @@ module_param_named_unsafe(force_probe, xe_modparam.force_probe, charp, 0400);
 MODULE_PARM_DESC(force_probe,
 		 "Force probe options for specified devices. See CONFIG_DRM_XE_FORCE_PROBE for details.");
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+module_param_named_unsafe(inject_driver_load_error, xe_modparam.inject_driver_load_error, int, 0600);
+MODULE_PARM_DESC(inject_driver_load_error, "Inject driver load error");
+#endif
+
 #ifdef CONFIG_PCI_IOV
 module_param_named(max_vfs, xe_modparam.max_vfs, uint, 0400);
 MODULE_PARM_DESC(max_vfs,
diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
index 61a0d28a28c8..409ea10be942 100644
--- a/drivers/gpu/drm/xe/xe_module.h
+++ b/drivers/gpu/drm/xe/xe_module.h
@@ -20,6 +20,9 @@ struct xe_modparam {
 	char *force_probe;
 #ifdef CONFIG_PCI_IOV
 	unsigned int max_vfs;
+#endif
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+	int inject_driver_load_error;
 #endif
 	int wedged_mode;
 };
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index f818aa69f3ca..8b278c83128a 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -629,6 +629,10 @@ static int xe_info_init_early(struct xe_device *xe,
 	if (err)
 		return err;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -645,6 +649,7 @@ static int xe_info_init(struct xe_device *xe,
 	u32 graphics_gmdid_revid = 0, media_gmdid_revid = 0;
 	struct xe_tile *tile;
 	struct xe_gt *gt;
+	int err;
 	u8 id;
 
 	/*
@@ -745,6 +750,10 @@ static int xe_info_init(struct xe_device *xe,
 		gt->info.id = xe->info.gt_count++;
 	}
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 9f3c14fd9f33..64d992c12364 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -231,6 +231,10 @@ int xe_pm_init_early(struct xe_device *xe)
 	if (err)
 		return err;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -264,6 +268,10 @@ int xe_pm_init(struct xe_device *xe)
 
 	xe_pm_runtime_init(xe);
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c
index 5a1d65e4f19f..1e738f1d80df 100644
--- a/drivers/gpu/drm/xe/xe_sriov.c
+++ b/drivers/gpu/drm/xe/xe_sriov.c
@@ -102,11 +102,17 @@ static void fini_sriov(struct drm_device *drm, void *arg)
  */
 int xe_sriov_init(struct xe_device *xe)
 {
+	int err;
+
 	if (!IS_SRIOV(xe))
 		return 0;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	if (IS_SRIOV_PF(xe)) {
-		int err = xe_sriov_pf_init_early(xe);
+		err = xe_sriov_pf_init_early(xe);
 
 		if (err)
 			return err;
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index 15ea0a942f67..2d25c7b59b0d 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -124,6 +124,10 @@ int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id)
 	if (IS_ERR(tile->primary_gt))
 		return PTR_ERR(tile->primary_gt);
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
index 0d073a9987c2..a3786020838b 100644
--- a/drivers/gpu/drm/xe/xe_uc.c
+++ b/drivers/gpu/drm/xe/xe_uc.c
@@ -62,6 +62,10 @@ int xe_uc_init(struct xe_uc *uc)
 	if (ret)
 		goto err;
 
+	ret = xe_device_inject_driver_load_error(uc_to_xe(uc));
+	if (ret)
+		goto err;
+
 	return 0;
 
 err:
diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
index 564e32e44e3b..e558715d8027 100644
--- a/drivers/gpu/drm/xe/xe_wa.c
+++ b/drivers/gpu/drm/xe/xe_wa.c
@@ -821,6 +821,7 @@ int xe_wa_init(struct xe_gt *gt)
 	struct xe_device *xe = gt_to_xe(gt);
 	size_t n_oob, n_lrc, n_engine, n_gt, total;
 	unsigned long *p;
+	int err;
 
 	n_gt = BITS_TO_LONGS(ARRAY_SIZE(gt_was));
 	n_engine = BITS_TO_LONGS(ARRAY_SIZE(engine_was));
@@ -840,6 +841,10 @@ int xe_wa_init(struct xe_gt *gt)
 	p += n_lrc;
 	gt->wa_active.oob = p;
 
+	err = xe_device_inject_driver_load_error(xe);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_wopcm.c b/drivers/gpu/drm/xe/xe_wopcm.c
index d3a99157e523..edaad1c93e58 100644
--- a/drivers/gpu/drm/xe/xe_wopcm.c
+++ b/drivers/gpu/drm/xe/xe_wopcm.c
@@ -263,6 +263,10 @@ int xe_wopcm_init(struct xe_wopcm *wopcm)
 		return -E2BIG;
 	}
 
+	ret = xe_device_inject_driver_load_error(xe);
+	if (ret)
+		return ret;
+
 	if (!locked)
 		ret = __wopcm_init_regs(xe, gt, wopcm);
 
-- 
2.34.1



* [PATCH 06/11] drm/xe: Move ggtt_fini to devm managed
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (4 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 05/11] drm/xe: Add driver load error injection Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 07/11] drm/xe: Set firmware state to loadable before registering guc_fini_hw Matthew Brost
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

ggtt->scratch is destroyed via devm, and ggtt_fini sets ggtt->scratch to
NULL. ggtt->scratch is used in GGTT clears, so move ggtt_fini to devm to
ensure ggtt->scratch is set to NULL before the BO is destroyed.
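
The ordering argument can be sketched with a small userspace model. It
assumes that managed actions on the same list run in reverse order of
registration, as devres does; all names here (action_stack, push_action,
release_all, model_ggtt, model_bo) are hypothetical and are not xe code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 4

typedef void (*action_fn)(void *arg);

/* Hypothetical LIFO action stack standing in for one devres list. */
struct action_stack {
	action_fn fn[MAX_ACTIONS];
	void *arg[MAX_ACTIONS];
	size_t count;
};

struct model_bo {
	bool alive;
};

struct model_ggtt {
	struct model_bo *scratch;
	bool scratch_alive_at_fini;	/* was the BO still valid at fini? */
};

static int push_action(struct action_stack *s, action_fn fn, void *arg)
{
	if (s->count == MAX_ACTIONS)
		return -1;
	s->fn[s->count] = fn;
	s->arg[s->count] = arg;
	s->count++;
	return 0;
}

/* Release in reverse order: the last registered action runs first. */
static void release_all(struct action_stack *s)
{
	while (s->count > 0) {
		s->count--;
		s->fn[s->count](s->arg[s->count]);
	}
}

/* Stand-in for the managed BO cleanup. */
static void scratch_bo_destroy(void *arg)
{
	struct model_bo *bo = arg;

	assert(bo != NULL);
	bo->alive = false;
}

/* Stand-in for ggtt_fini: drop the pointer so clears stop using it. */
static void model_ggtt_fini(void *arg)
{
	struct model_ggtt *ggtt = arg;

	assert(ggtt != NULL);
	ggtt->scratch_alive_at_fini = ggtt->scratch && ggtt->scratch->alive;
	ggtt->scratch = NULL;
}
```

Because the scratch BO cleanup is registered first and the fini action
afterwards, the fini action runs first on release and clears the
pointer while the BO is still alive.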

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_ggtt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 0cdbc1296e88..21cc9ffcef1c 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -132,7 +132,7 @@ static void ggtt_fini_early(struct drm_device *drm, void *arg)
 	drm_mm_takedown(&ggtt->mm);
 }
 
-static void ggtt_fini(struct drm_device *drm, void *arg)
+static void ggtt_fini(void *arg)
 {
 	struct xe_ggtt *ggtt = arg;
 
@@ -289,7 +289,7 @@ int xe_ggtt_init(struct xe_ggtt *ggtt)
 
 	xe_ggtt_initial_clear(ggtt);
 
-	return drmm_add_action_or_reset(&xe->drm, ggtt_fini, ggtt);
+	return devm_add_action_or_reset(xe->drm.dev, ggtt_fini, ggtt);
 err:
 	ggtt->scratch = NULL;
 	return err;
-- 
2.34.1



* [PATCH 07/11] drm/xe: Set firmware state to loadable before registering guc_fini_hw
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (5 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 06/11] drm/xe: Move ggtt_fini to devm managed Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 08/11] drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini Matthew Brost
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

The registered guc_fini_hw action calls __xe_uc_fw_status, which is only
expected to be called after the fw state has been initialized. Move the
xe_uc_fw_change_status call before registering guc_fini_hw.
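
A rough userspace sketch of why the order matters follows. It assumes a
probe step can fail after the cleanup action has been registered (for
example an injected load error), at which point the cleanup runs and
reads the firmware status. All names (model_guc, model_registry,
model_guc_init, model_release) are hypothetical and only model the
ordering, not the real GuC code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*action_fn)(void *arg);

enum model_fw_status {
	MODEL_FW_UNINITIALIZED = 0,
	MODEL_FW_LOADABLE,
};

struct model_guc {
	enum model_fw_status fw_status;
	bool fini_saw_uninitialized;	/* cleanup read an unset status */
};

/* Hypothetical single-slot cleanup registry. */
struct model_registry {
	action_fn fn;
	void *arg;
};

/* Stand-in for guc_fini_hw: it reads the firmware status. */
static void model_guc_fini_hw(void *arg)
{
	struct model_guc *guc = arg;

	assert(guc != NULL);
	if (guc->fw_status == MODEL_FW_UNINITIALIZED)
		guc->fini_saw_uninitialized = true;
}

/*
 * status_first selects the order proposed by this patch.
 * fail_after_register simulates a later init step failing.
 */
static int model_guc_init(struct model_guc *guc, struct model_registry *reg,
			  bool status_first, bool fail_after_register)
{
	if (status_first)
		guc->fw_status = MODEL_FW_LOADABLE;

	reg->fn = model_guc_fini_hw;
	reg->arg = guc;

	if (fail_after_register)
		return -ENODEV;

	if (!status_first)
		guc->fw_status = MODEL_FW_LOADABLE;

	return 0;
}

/* Unwind after a failed probe: run the registered cleanup, if any. */
static void model_release(struct model_registry *reg)
{
	if (reg->fn)
		reg->fn(reg->arg);
	reg->fn = NULL;
	reg->arg = NULL;
}
```

With the old order, a failure right after registration leaves the
cleanup reading an uninitialized status; with the status set first, the
cleanup always sees a valid state.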

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 980691c178c4..9f2a7de47b9b 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -350,6 +350,8 @@ int xe_guc_init(struct xe_guc *guc)
 	if (ret)
 		goto out;
 
+	xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_LOADABLE);
+
 	ret = devm_add_action_or_reset(xe->drm.dev, guc_fini_hw, guc);
 	if (ret)
 		goto out;
@@ -362,8 +364,6 @@ int xe_guc_init(struct xe_guc *guc)
 
 	xe_guc_comm_init_early(guc);
 
-	xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_LOADABLE);
-
 	return 0;
 
 out:
-- 
2.34.1
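
The ordering issue addressed by this patch can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the driver code: enum fw_status, struct fw, fw_fini_hw, register_or_reset and fw_init are hypothetical names, and the `fail` flag stands in for any path on which the cleanup action runs before init has finished (for example, registration failing and the "_or_reset" helper invoking the action immediately).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical firmware states; zero means "never initialized". */
enum fw_status { FW_UNINITIALIZED = 0, FW_LOADABLE, FW_RUNNING };

struct fw {
	enum fw_status status;
	bool fini_ran;
	bool fini_saw_valid_state;
};

/* Cleanup action that inspects the firmware status, as guc_fini_hw does. */
void fw_fini_hw(void *arg)
{
	struct fw *fw = arg;

	fw->fini_ran = true;
	fw->fini_saw_valid_state = (fw->status != FW_UNINITIALIZED);
}

/* Stand-in for a managed-action helper; `fail` forces the reset path. */
int register_or_reset(void (*fn)(void *), void *arg, bool fail)
{
	if (fail) {
		fn(arg);
		return -1;
	}
	return 0;
}

/*
 * status_first == true models the patched order (set status, then register);
 * status_first == false models the old order (register, then set status).
 */
int fw_init(struct fw *fw, bool status_first, bool fail_registration)
{
	int ret;

	assert(fw != NULL);

	if (status_first)
		fw->status = FW_LOADABLE;

	ret = register_or_reset(fw_fini_hw, fw, fail_registration);
	if (ret)
		return ret;

	if (!status_first)
		fw->status = FW_LOADABLE;

	return 0;
}
```

With the old order, a failing registration runs fw_fini_hw while the status is still FW_UNINITIALIZED. With the patched order the action always sees a valid state.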



* [PATCH 08/11] drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (6 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 07/11] drm/xe: Set firmware state to loadable before registering guc_fini_hw Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 09/11] drm/xe: Move hw_engine_fini to devm managed Matthew Brost
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

It is not a big deal if the CT is down while the driver is unloading, so
there is no need to warn.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_pc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
index 32e93a8127d4..def503abeed5 100644
--- a/drivers/gpu/drm/xe/xe_guc_pc.c
+++ b/drivers/gpu/drm/xe/xe_guc_pc.c
@@ -1042,7 +1042,7 @@ static void xe_guc_pc_fini_hw(void *arg)
 		return;
 
 	XE_WARN_ON(xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL));
-	XE_WARN_ON(xe_guc_pc_gucrc_disable(pc));
+	xe_guc_pc_gucrc_disable(pc);
 	XE_WARN_ON(xe_guc_pc_stop(pc));
 
 	/* Bind requested freq to mert_freq_cap before unload */
-- 
2.34.1



* [PATCH 09/11] drm/xe: Move hw_engine_fini to devm managed
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (7 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 08/11] drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 10/11] drm/xe: Move HuC init before GuC init Matthew Brost
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

hw_engine_fini destroys kernel BOs that have GGTT mappings. This is a
hardware interaction, so use devm.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_hw_engine.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 402dfa748e16..50c192293e7e 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -266,7 +266,7 @@ static const struct engine_info engine_infos[] = {
 	},
 };
 
-static void hw_engine_fini(struct drm_device *drm, void *arg)
+static void hw_engine_fini(void *arg)
 {
 	struct xe_hw_engine *hwe = arg;
 
@@ -584,7 +584,7 @@ static int hw_engine_init(struct xe_gt *gt, struct xe_hw_engine *hwe,
 	if (xe->info.has_usm && hwe->class == XE_ENGINE_CLASS_COPY)
 		gt->usm.reserved_bcs_instance = hwe->instance;
 
-	return drmm_add_action_or_reset(&xe->drm, hw_engine_fini, hwe);
+	return devm_add_action_or_reset(xe->drm.dev, hw_engine_fini, hwe);
 
 err_kernel_lrc:
 	xe_lrc_put(hwe->kernel_lrc);
-- 
2.34.1



* [PATCH 10/11] drm/xe: Move HuC init before GuC init
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (8 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 09/11] drm/xe: Move hw_engine_fini to devm managed Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  1:55 ` [PATCH 11/11] drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map Matthew Brost
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

The GuC fini also finalizes the HuC, so move HuC init before GuC init.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_uc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
index a3786020838b..732d57875701 100644
--- a/drivers/gpu/drm/xe/xe_uc.c
+++ b/drivers/gpu/drm/xe/xe_uc.c
@@ -36,6 +36,10 @@ int xe_uc_init(struct xe_uc *uc)
 {
 	int ret;
 
+	ret = xe_huc_init(&uc->huc);
+	if (ret)
+		goto err;
+
 	/*
 	 * We call the GuC/HuC/GSC init functions even if GuC submission is off
 	 * to correctly move our tracking of the FW state to "disabled".
@@ -44,10 +48,6 @@ int xe_uc_init(struct xe_uc *uc)
 	if (ret)
 		goto err;
 
-	ret = xe_huc_init(&uc->huc);
-	if (ret)
-		goto err;
-
 	ret = xe_gsc_init(&uc->gsc);
 	if (ret)
 		goto err;
-- 
2.34.1



* [PATCH 11/11] drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (9 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 10/11] drm/xe: Move HuC init before GuC init Matthew Brost
@ 2024-08-10  1:55 ` Matthew Brost
  2024-08-10  2:01 ` ✓ CI.Patch_applied: success for Fix error paths in driver load Patchwork
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2024-08-10  1:55 UTC (permalink / raw)
  To: intel-xe

The preferred way to create kernel BOs is xe_managed_bo_create_pin_map,
so use it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_sa.c       | 13 ++++++-------
 drivers/gpu/drm/xe/xe_sa_types.h |  1 +
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_sa.c b/drivers/gpu/drm/xe/xe_sa.c
index f3060979e63f..fe2cb2a96f78 100644
--- a/drivers/gpu/drm/xe/xe_sa.c
+++ b/drivers/gpu/drm/xe/xe_sa.c
@@ -25,10 +25,9 @@ static void xe_sa_bo_manager_fini(struct drm_device *drm, void *arg)
 
 	drm_suballoc_manager_fini(&sa_manager->base);
 
-	if (bo->vmap.is_iomem)
+	if (sa_manager->is_iomem)
 		kvfree(sa_manager->cpu_ptr);
 
-	xe_bo_unpin_map_no_vm(bo);
 	sa_manager->bo = NULL;
 }
 
@@ -47,16 +46,17 @@ struct xe_sa_manager *xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, u32
 
 	sa_manager->bo = NULL;
 
-	bo = xe_bo_create_pin_map(xe, tile, NULL, size, ttm_bo_type_kernel,
-				  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
-				  XE_BO_FLAG_GGTT |
-				  XE_BO_FLAG_GGTT_INVALIDATE);
+	bo = xe_managed_bo_create_pin_map(xe, tile, size,
+					  XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+					  XE_BO_FLAG_GGTT |
+					  XE_BO_FLAG_GGTT_INVALIDATE);
 	if (IS_ERR(bo)) {
 		drm_err(&xe->drm, "failed to allocate bo for sa manager: %ld\n",
 			PTR_ERR(bo));
 		return (struct xe_sa_manager *)bo;
 	}
 	sa_manager->bo = bo;
+	sa_manager->is_iomem = bo->vmap.is_iomem;
 
 	drm_suballoc_manager_init(&sa_manager->base, managed_size, align);
 	sa_manager->gpu_addr = xe_bo_ggtt_addr(bo);
@@ -64,7 +64,6 @@ struct xe_sa_manager *xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, u32
 	if (bo->vmap.is_iomem) {
 		sa_manager->cpu_ptr = kvzalloc(managed_size, GFP_KERNEL);
 		if (!sa_manager->cpu_ptr) {
-			xe_bo_unpin_map_no_vm(sa_manager->bo);
 			sa_manager->bo = NULL;
 			return ERR_PTR(-ENOMEM);
 		}
diff --git a/drivers/gpu/drm/xe/xe_sa_types.h b/drivers/gpu/drm/xe/xe_sa_types.h
index 2ef896aeca1d..2b070ff1292e 100644
--- a/drivers/gpu/drm/xe/xe_sa_types.h
+++ b/drivers/gpu/drm/xe/xe_sa_types.h
@@ -14,6 +14,7 @@ struct xe_sa_manager {
 	struct xe_bo *bo;
 	u64 gpu_addr;
 	void *cpu_ptr;
+	bool is_iomem;
 };
 
 #endif
-- 
2.34.1
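
The diff above caches is_iomem in the manager so that the fini path no longer reads bo->vmap.is_iomem. A plausible reading, offered here as an assumption and not something the commit message states, is that the managed BO may already have been released by its own cleanup action by the time the manager's fini runs. The following userspace sketch shows the pattern; struct buffer, struct manager, manager_init and manager_fini are hypothetical names and do not correspond to the driver's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object owned by someone else. */
struct buffer {
	bool is_iomem;
};

struct manager {
	struct buffer *bo;	/* borrowed; released by a different owner */
	void *cpu_ptr;		/* shadow copy, allocated only when is_iomem */
	bool is_iomem;		/* cached at init so fini never reads bo */
};

/* Cache everything fini will need while the buffer is known to be alive. */
int manager_init(struct manager *m, struct buffer *bo, size_t size)
{
	assert(bo != NULL);

	m->bo = bo;
	m->is_iomem = bo->is_iomem;
	m->cpu_ptr = NULL;

	if (m->is_iomem) {
		m->cpu_ptr = calloc(1, size);
		if (!m->cpu_ptr) {
			m->bo = NULL;
			return -1;
		}
	}
	return 0;
}

/* Uses only cached state, so it is safe even if the buffer is already gone. */
void manager_fini(struct manager *m)
{
	if (m->is_iomem) {
		free(m->cpu_ptr);
		m->cpu_ptr = NULL;
	}
	m->bo = NULL;
}
```

In this model the buffer's owner can free it before manager_fini() runs, and the manager still releases its shadow copy without touching freed memory.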



* ✓ CI.Patch_applied: success for Fix error paths in driver load
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (10 preceding siblings ...)
  2024-08-10  1:55 ` [PATCH 11/11] drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map Matthew Brost
@ 2024-08-10  2:01 ` Patchwork
  2024-08-10  2:01 ` ✗ CI.checkpatch: warning " Patchwork
  2024-08-10  2:01 ` ✗ CI.KUnit: failure " Patchwork
  13 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2024-08-10  2:01 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix error paths in driver load
URL   : https://patchwork.freedesktop.org/series/137114/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 99ad64bdf35c drm-tip: 2024y-08m-09d-23h-41m-10s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: use devm instead of drmm for managed bo
Applying: drm/xe/uc: Use managed bo for HuC and GSC objects
Applying: drm/xe/uc: Use devm to register cleanup that includes exec_queues
Applying: drm/xe: Fix tile fini sequence
Applying: drm/xe: Add driver load error injection
Applying: drm/xe: Move ggtt_fini to devm managed
Applying: drm/xe: Set firmware state to loadable before registering guc_fini_hw
Applying: drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini
Applying: drm/xe: Move hw_engine_fini to devm managed
Applying: drm/xe: Move HuC init before GuC init
Applying: drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map




* ✗ CI.checkpatch: warning for Fix error paths in driver load
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (11 preceding siblings ...)
  2024-08-10  2:01 ` ✓ CI.Patch_applied: success for Fix error paths in driver load Patchwork
@ 2024-08-10  2:01 ` Patchwork
  2024-08-10  2:01 ` ✗ CI.KUnit: failure " Patchwork
  13 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2024-08-10  2:01 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix error paths in driver load
URL   : https://patchwork.freedesktop.org/series/137114/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
dc547930fbb1350eaf6bde84653b9ac973a411db
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit e4276e3678496a951e5af3ab44f52b0851521335
Author: Matthew Brost <matthew.brost@intel.com>
Date:   Fri Aug 9 18:55:44 2024 -0700

    drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map
    
    Preferred way to create kernel BOs is xe_managed_bo_create_pin_map, use
    it.
    
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
+ /mt/dim checkpatch 99ad64bdf35cabb4af86749665508067895eb1d6 drm-intel
d3ddb2e2ad0c drm/xe: use devm instead of drmm for managed bo
40c6bc8b053c drm/xe/uc: Use managed bo for HuC and GSC objects
0467d7d6e063 drm/xe/uc: Use devm to register cleanup that includes exec_queues
2b4b8233a316 drm/xe: Fix tile fini sequence
3bd3c9f644d5 drm/xe: Add driver load error injection
-:56: ERROR:CODE_INDENT: code indent should use tabs where possible
#56: FILE: drivers/gpu/drm/xe/xe_device.c:1015:
+        if (xe->inject_driver_load_error >= xe_modparam.inject_driver_load_error)$

-:56: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#56: FILE: drivers/gpu/drm/xe/xe_device.c:1015:
+        if (xe->inject_driver_load_error >= xe_modparam.inject_driver_load_error)$

-:57: ERROR:CODE_INDENT: code indent should use tabs where possible
#57: FILE: drivers/gpu/drm/xe/xe_device.c:1016:
+                return 0;$

-:57: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#57: FILE: drivers/gpu/drm/xe/xe_device.c:1016:
+                return 0;$

-:59: ERROR:CODE_INDENT: code indent should use tabs where possible
#59: FILE: drivers/gpu/drm/xe/xe_device.c:1018:
+        if (++xe->inject_driver_load_error < xe_modparam.inject_driver_load_error)$

-:59: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#59: FILE: drivers/gpu/drm/xe/xe_device.c:1018:
+        if (++xe->inject_driver_load_error < xe_modparam.inject_driver_load_error)$

-:60: ERROR:CODE_INDENT: code indent should use tabs where possible
#60: FILE: drivers/gpu/drm/xe/xe_device.c:1019:
+                return 0;$

-:60: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#60: FILE: drivers/gpu/drm/xe/xe_device.c:1019:
+                return 0;$

-:62: ERROR:CODE_INDENT: code indent should use tabs where possible
#62: FILE: drivers/gpu/drm/xe/xe_device.c:1021:
+        drm_info(&xe->drm, "Injecting failure %d at checkpoint %u [%s:%d]\n",$

-:62: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#62: FILE: drivers/gpu/drm/xe/xe_device.c:1021:
+        drm_info(&xe->drm, "Injecting failure %d at checkpoint %u [%s:%d]\n",$

-:63: ERROR:CODE_INDENT: code indent should use tabs where possible
#63: FILE: drivers/gpu/drm/xe/xe_device.c:1022:
+                 err, xe->inject_driver_load_error, func, line);$

-:63: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#63: FILE: drivers/gpu/drm/xe/xe_device.c:1022:
+                 err, xe->inject_driver_load_error, func, line);$

-:65: ERROR:CODE_INDENT: code indent should use tabs where possible
#65: FILE: drivers/gpu/drm/xe/xe_device.c:1024:
+        xe_modparam.inject_driver_load_error = 0;$

-:65: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#65: FILE: drivers/gpu/drm/xe/xe_device.c:1024:
+        xe_modparam.inject_driver_load_error = 0;$

-:66: ERROR:CODE_INDENT: code indent should use tabs where possible
#66: FILE: drivers/gpu/drm/xe/xe_device.c:1025:
+        return err;$

-:66: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#66: FILE: drivers/gpu/drm/xe/xe_device.c:1025:
+        return err;$

-:68: CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
#68: FILE: drivers/gpu/drm/xe/xe_device.c:1027:
+
+}

-:258: WARNING:LONG_LINE: line length of 101 exceeds 100 columns
#258: FILE: drivers/gpu/drm/xe/xe_module.c:57:
+module_param_named_unsafe(inject_driver_load_error, xe_modparam.inject_driver_load_error, int, 0600);

total: 8 errors, 9 warnings, 1 checks, 314 lines checked
80825dbef175 drm/xe: Move ggtt_fini to devm managed
574908a5a73f drm/xe: Set firmware state to loadable before registering guc_fini_hw
97d37d80565f drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini
99df6493572d drm/xe: Move hw_engine_fini to devm managed
ca5c6e4675e6 drm/xe: Move HuC init before GuC init
e4276e367849 drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map
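
The lines flagged by checkpatch above show the core of the load error injection logic: a per-device counter is compared against the module parameter, and the failure fires at exactly the Nth checkpoint. The following is a userspace sketch reconstructed only from those quoted lines. struct device_model, inject_load_error and the pointer-based parameter are hypothetical stand-ins, and the real function's signature is not visible in this part of the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device checkpoint counter. */
struct device_model {
	int checkpoints_seen;
};

/*
 * Return err at the checkpoint whose ordinal equals *param, and 0 at every
 * other checkpoint. *param models the module parameter and is cleared once
 * the failure has been injected, so later checkpoints never fire.
 */
int inject_load_error(struct device_model *dev, int *param, int err,
		      const char *func, int line)
{
	assert(func != NULL);

	/* Injection disabled (param == 0) or target already passed. */
	if (dev->checkpoints_seen >= *param)
		return 0;

	/* Count this checkpoint; keep going until the target is reached. */
	if (++dev->checkpoints_seen < *param)
		return 0;

	printf("Injecting failure %d at checkpoint %d [%s:%d]\n",
	       err, dev->checkpoints_seen, func, line);

	*param = 0;
	return err;
}
```

With the parameter set to 3, the first two calls return 0, the third returns the error and clears the parameter, and every later call returns 0 again. This matches the test loop in the cover letter, which reloads the module with a different checkpoint number on each iteration.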




* ✗ CI.KUnit: failure for Fix error paths in driver load
  2024-08-10  1:55 [PATCH 00/11] Fix error paths in driver load Matthew Brost
                   ` (12 preceding siblings ...)
  2024-08-10  2:01 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-08-10  2:01 ` Patchwork
  13 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2024-08-10  2:01 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix error paths in driver load
URL   : https://patchwork.freedesktop.org/series/137114/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../drivers/gpu/drm/xe/xe_gt_sriov_pf.c: In function ‘xe_gt_sriov_pf_init_early’:
../drivers/gpu/drm/xe/xe_gt_sriov_pf.c:60:8: error: implicit declaration of function ‘xe_device_inject_driver_load_error’ [-Werror=implicit-function-declaration]
   60 |  err = xe_device_inject_driver_load_error(gt_to_xe(gt));
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:244: drivers/gpu/drm/xe/xe_gt_sriov_pf.o] Error 1
make[7]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
make[6]: *** [../scripts/Makefile.build:485: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:485: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:485: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:485: drivers] Error 2
make[2]: *** [/kernel/Makefile:1925: .] Error 2
make[1]: *** [/kernel/Makefile:224: __sub-make] Error 2
make: *** [Makefile:224: __sub-make] Error 2

[02:01:31] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[02:01:35] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



