* [PATCH v4 00/13] Cleanup error handling on probe
@ 2025-02-12 19:35 Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 01/13] drm/xe: Add callback support for driver remove Lucas De Marchi
` (15 more replies)
0 siblings, 16 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi
Mixing style with goto and devm_add_action_or_reset() is very confusing
and error prone. Clean that up. The only missing one is one of the
display calls, but that can be done in parallel to the review of these
first patches.
Tested on LNL, BMG and ADL-P with driver bind/unbind.
I have other changes on top of these that will make devm compatible.
That will need some drivers/base/ changes though, so it's probably good
to do it in parallel.
v2:
- Add callback-based removed to overcome issue with gsc/mei
integration: it can't be removed with devm.
- More cleanups, adding drm_dev_unplug in the right place, fixing
hwmon, pmu and oa
v3:
- Drop the hwmon patch changing the mutex destroy: we can use drmm_
only if we also migrate the hwmon allocation to drmm_, but
there isn't much reason to do so
v4:
- Move the callback infro to be the first patch and also use it
for xe_display_fini()
- Make our internal api similar to the devm API for easy conversion
Lucas De Marchi (13):
drm/xe: Add callback support for driver remove
drm/xe: Fix xe_display_fini() calls
drm/xe: Fix error handling in xe_irq_install()
drm/xe: Fix xe_tile_init_noalloc() error propagation
drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init()
drm/xe: Remove leftover pxp comment
drm/xe: Cleanup unwind of gt initialization
drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish()
drm/xe/oa: Move fini to xe_oa
drm/xe: Move drm_dev_unplug() out of display function
drm/xe/oa: Handle errors in xe_oa_register()
drm/xe/pmu: Fail probe if xe_pmu_register() fails
drm/xe/hwmon: Stop ignoring errors on probe
drivers/gpu/drm/xe/display/xe_display.c | 22 ++--
drivers/gpu/drm/xe/display/xe_display.h | 2 -
drivers/gpu/drm/xe/xe_device.c | 151 +++++++++++++++++-------
drivers/gpu/drm/xe/xe_device.h | 4 +
drivers/gpu/drm/xe/xe_device_types.h | 17 +++
drivers/gpu/drm/xe/xe_gsc.c | 9 --
drivers/gpu/drm/xe/xe_gsc.h | 1 -
drivers/gpu/drm/xe/xe_gsc_proxy.c | 63 +++++-----
drivers/gpu/drm/xe/xe_gsc_proxy.h | 1 -
drivers/gpu/drm/xe/xe_gsc_types.h | 1 +
drivers/gpu/drm/xe/xe_gt.c | 50 +++-----
drivers/gpu/drm/xe/xe_gt.h | 1 -
drivers/gpu/drm/xe/xe_hwmon.c | 31 ++---
drivers/gpu/drm/xe/xe_hwmon.h | 4 +-
drivers/gpu/drm/xe/xe_irq.c | 14 +--
drivers/gpu/drm/xe/xe_oa.c | 78 ++++++------
drivers/gpu/drm/xe/xe_oa.h | 4 +-
drivers/gpu/drm/xe/xe_pci.c | 4 +-
drivers/gpu/drm/xe/xe_tile.c | 4 +-
drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 17 +--
drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h | 2 +-
drivers/gpu/drm/xe/xe_uc.c | 13 --
drivers/gpu/drm/xe/xe_uc.h | 1 -
23 files changed, 261 insertions(+), 233 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 20:01 ` Rodrigo Vivi
2025-02-12 21:19 ` Raag Jadav
2025-02-12 19:35 ` [PATCH v4 02/13] drm/xe: Fix xe_display_fini() calls Lucas De Marchi
` (14 subsequent siblings)
15 siblings, 2 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Thomas Hellström
xe device probe uses devm cleanup in most places. However there are a
few cases where this is not possible: when the driver interacts with
component add/del. In that case, the resource group would be cleanup
while the entire device resources are in the process of cleanup. One
example is the xe_gsc_proxy and display using that to interact with mei
and audio.
Add a callback-based remove so the exception doesn't make the probe
use multiple error handling styles.
v2: Change internal API to mimic the devm API. This will make it easier
to migrate in future when devm can be used.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 79 ++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_device.h | 4 ++
drivers/gpu/drm/xe/xe_device_types.h | 17 ++++++
drivers/gpu/drm/xe/xe_pci.c | 4 +-
4 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 36d7ffb3b4d90..69bde506ee87e 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -65,6 +65,12 @@
#include <generated/xe_wa_oob.h>
+struct xe_device_remove_action {
+ struct list_head node;
+ xe_device_remove_action_t remove;
+ void *data;
+};
+
static int xe_file_open(struct drm_device *dev, struct drm_file *file)
{
struct xe_device *xe = to_xe_device(dev);
@@ -746,6 +752,9 @@ int xe_device_probe(struct xe_device *xe)
u8 last_gt;
u8 id;
+ xe->probing = true;
+ INIT_LIST_HEAD(&xe->remove_action_list);
+
xe_pat_init_early(xe);
err = xe_sriov_init(xe);
@@ -886,6 +895,8 @@ int xe_device_probe(struct xe_device *xe)
xe_vsec_init(xe);
+ xe->probing = false;
+
return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
err_fini_display:
@@ -907,6 +918,72 @@ int xe_device_probe(struct xe_device *xe)
return err;
}
+/**
+ * xe_device_call_remove_actions - Call the remove actions
+ * @xe: xe device instance
+ *
+ * This is only to be used by xe_pci and xe_device to call the remove actions
+ * while removing the driver or handling probe failures.
+ */
+void xe_device_call_remove_actions(struct xe_device *xe)
+{
+ struct xe_device_remove_action *ra, *tmp;
+
+ list_for_each_entry_safe(ra, tmp, &xe->remove_action_list, node) {
+ ra->remove(xe, ra->data);
+ list_del(&ra->node);
+ kfree(ra);
+ }
+
+ xe->probing = false;
+}
+
+/**
+ * xe_device_add_action_or_reset - Add an action to run on driver removal
+ * @xe: xe device instance
+ * @ra: pointer to the object embedded into the object to cleanup
+ * @remove: function to execute. The @ra is passed as argument
+ *
+ * Example:
+ *
+ * .. code-block:: c
+ *
+ * static void foo_remove(struct xe_device_remove_action *ra)
+ * {
+ * struct xe_foo *foo = container_of(ra, struct xe_foo, remove_action);
+ * ...
+ * }
+ *
+ * int xe_foo_init(struct xe_foo *foo)
+ * {
+ * ...
+ * xe_device_add_remove_action(xe, &foo->remove_action, foo_remove);
+ * ...
+ * return 0;
+ * };
+ */
+int xe_device_add_action_or_reset(struct xe_device *xe,
+ xe_device_remove_action_t action,
+ void *data)
+{
+ struct xe_device_remove_action *ra;
+
+ drm_WARN_ON(&xe->drm, !xe->probing);
+
+ ra = kmalloc(sizeof(*ra), GFP_KERNEL);
+ if (!ra) {
+ action(xe, data);
+ return -ENOMEM;
+ }
+
+ INIT_LIST_HEAD(&ra->node);
+ ra->remove = action;
+ ra->data = data;
+ list_add(&ra->node, &xe->remove_action_list);
+
+ return 0;
+}
+
static void xe_device_remove_display(struct xe_device *xe)
{
xe_display_unregister(xe);
@@ -932,6 +1009,8 @@ void xe_device_remove(struct xe_device *xe)
for_each_gt(gt, xe, id)
xe_gt_remove(gt);
+
+ xe_device_call_remove_actions(xe);
}
void xe_device_shutdown(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 0bc3bc8e68030..a6fedf1ef3c7b 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -45,6 +45,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
const struct pci_device_id *ent);
int xe_device_probe_early(struct xe_device *xe);
int xe_device_probe(struct xe_device *xe);
+int xe_device_add_action_or_reset(struct xe_device *xe,
+ xe_device_remove_action_t action,
+ void *data);
+void xe_device_call_remove_actions(struct xe_device *xe);
void xe_device_remove(struct xe_device *xe);
void xe_device_shutdown(struct xe_device *xe);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 833c29fed3a37..b322d49c83c77 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -35,6 +35,7 @@
#include "intel_display_device.h"
#endif
+struct xe_device;
struct xe_ggtt;
struct xe_pat_ops;
struct xe_pxp;
@@ -70,6 +71,8 @@ struct xe_pxp;
const struct xe_tile * : (const struct xe_device *)((tile__)->xe), \
struct xe_tile * : (tile__)->xe)
+typedef void (*xe_device_remove_action_t)(struct xe_device *xe, void *data);
+
/**
* struct xe_vram_region - memory region structure
* This is used to describe a memory region in xe
@@ -428,6 +431,20 @@ struct xe_device {
/** @tiles: device tiles */
struct xe_tile tiles[XE_MAX_TILES_PER_DEVICE];
+ /**
+ * @remove_action_list: list of actions to execute on device remove.
+ * Use xe_device_add_remove_action() for that. Actions can only be added
+ * during probe and are executed during the call from PCI subsystem to
+ * remove the driver from the device.
+ */
+ struct list_head remove_action_list;
+
+ /**
+ * @probing: cover the section in which @remove_action_list can be used
+ * to post cleaning actions
+ */
+ bool probing;
+
/**
* @mem_access: keep track of memory access in the device, possibly
* triggering additional actions when they occur.
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 6a8e82aff3853..70b697fde5b96 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -905,8 +905,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return err;
err = xe_device_probe(xe);
- if (err)
+ if (err) {
+ xe_device_call_remove_actions(xe);
return err;
+ }
err = xe_pm_init(xe);
if (err)
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 02/13] drm/xe: Fix xe_display_fini() calls
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 01/13] drm/xe: Add callback support for driver remove Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 03/13] drm/xe: Fix error handling in xe_irq_install() Lucas De Marchi
` (13 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Maarten Lankhorst,
Himal Prasad Ghimiray
xe_display_fini() undoes things from xe_display_init() (technically from
intel_display_driver_probe()). Those `goto err` in xe_device_probe()
were wrong and being accumulated over time.
Commit 65e366ace5ee ("drm/xe/display: Use a single early init call for
display") made it easier to fix now that we don't have xe_display_* init
calls spread on xe_device_probe(). Change xe_display_init() to use
devm_add_action_or_reset() that will finalize display in the right
order.
While at it, also add a newline and comment about calling
xe_driver_flr_fini.
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/display/xe_display.c | 22 ++++++++++++----------
drivers/gpu/drm/xe/display/xe_display.h | 2 --
drivers/gpu/drm/xe/xe_device.c | 21 +++++++++++----------
3 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index 651799c946ace..bdcc56e51433f 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -161,27 +161,29 @@ int xe_display_init_early(struct xe_device *xe)
return err;
}
-int xe_display_init(struct xe_device *xe)
+static void xe_display_fini(struct xe_device *__xe, void *arg)
{
+ struct xe_device *xe = arg;
struct intel_display *display = &xe->display;
- if (!xe->info.probe_display)
- return 0;
-
- return intel_display_driver_probe(display);
+ intel_hpd_poll_fini(xe);
+ intel_hdcp_component_fini(display);
+ intel_audio_deinit(display);
}
-void xe_display_fini(struct xe_device *xe)
+int xe_display_init(struct xe_device *xe)
{
struct intel_display *display = &xe->display;
+ int err;
if (!xe->info.probe_display)
- return;
+ return 0;
- intel_hpd_poll_fini(xe);
+ err = intel_display_driver_probe(display);
+ if (err)
+ return err;
- intel_hdcp_component_fini(display);
- intel_audio_deinit(display);
+ return xe_device_add_action_or_reset(xe, xe_display_fini, xe);
}
void xe_display_register(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/display/xe_display.h b/drivers/gpu/drm/xe/display/xe_display.h
index e2a99624f7064..685dc74402fb8 100644
--- a/drivers/gpu/drm/xe/display/xe_display.h
+++ b/drivers/gpu/drm/xe/display/xe_display.h
@@ -22,7 +22,6 @@ int xe_display_probe(struct xe_device *xe);
int xe_display_init_early(struct xe_device *xe);
int xe_display_init(struct xe_device *xe);
-void xe_display_fini(struct xe_device *xe);
void xe_display_register(struct xe_device *xe);
void xe_display_unregister(struct xe_device *xe);
@@ -54,7 +53,6 @@ static inline int xe_display_probe(struct xe_device *xe) { return 0; }
static inline int xe_display_init_early(struct xe_device *xe) { return 0; }
static inline int xe_display_init(struct xe_device *xe) { return 0; }
-static inline void xe_display_fini(struct xe_device *xe) {}
static inline void xe_display_register(struct xe_device *xe) {}
static inline void xe_display_unregister(struct xe_device *xe) {}
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 69bde506ee87e..d05171aac1eef 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -741,6 +741,7 @@ static int probe_has_flat_ccs(struct xe_device *xe)
"Flat CCS has been disabled in bios, May lead to performance impact");
xe_force_wake_put(gt_to_fw(gt), fw_ref);
+
return 0;
}
@@ -812,22 +813,26 @@ int xe_device_probe(struct xe_device *xe)
err = xe_devcoredump_init(xe);
if (err)
return err;
+
+ /*
+ * From here on, if a step fails, make sure a Driver-FLR is triggereed
+ */
err = devm_add_action_or_reset(xe->drm.dev, xe_driver_flr_fini, xe);
if (err)
return err;
err = probe_has_flat_ccs(xe);
if (err)
- goto err;
+ return err;
err = xe_vram_probe(xe);
if (err)
- goto err;
+ return err;
for_each_tile(tile, xe, id) {
err = xe_tile_init_noalloc(tile);
if (err)
- goto err;
+ return err;
}
/* Allocate and map stolen after potential VRAM resize */
@@ -841,17 +846,17 @@ int xe_device_probe(struct xe_device *xe)
*/
err = xe_display_init_early(xe);
if (err)
- goto err;
+ return err;
for_each_tile(tile, xe, id) {
err = xe_tile_init(tile);
if (err)
- goto err;
+ return err;
}
err = xe_irq_install(xe);
if (err)
- goto err;
+ return err;
for_each_gt(gt, xe, id) {
last_gt = id;
@@ -913,8 +918,6 @@ int xe_device_probe(struct xe_device *xe)
break;
}
-err:
- xe_display_fini(xe);
return err;
}
@@ -1001,8 +1004,6 @@ void xe_device_remove(struct xe_device *xe)
xe_device_remove_display(xe);
- xe_display_fini(xe);
-
xe_oa_fini(xe);
xe_heci_gsc_fini(xe);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 03/13] drm/xe: Fix error handling in xe_irq_install()
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 01/13] drm/xe: Add callback support for driver remove Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 02/13] drm/xe: Fix xe_display_fini() calls Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 04/13] drm/xe: Fix xe_tile_init_noalloc() error propagation Lucas De Marchi
` (12 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Himal Prasad Ghimiray
When devm_add_action_or_reset() fails, it already calls the function
passed as parameter and that function is already free'ing the irqs.
Drop the goto and just return.
The caller, xe_device_probe(), should also do the same thing instead of
wrongly doing `goto err` and calling the unrelated xe_display_fini()
function.
Fixes: 14d25d8d684d ("drm/xe: change old msi irq api to a new one")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_irq.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_irq.c b/drivers/gpu/drm/xe/xe_irq.c
index bf092e6391c7d..5362d3174b060 100644
--- a/drivers/gpu/drm/xe/xe_irq.c
+++ b/drivers/gpu/drm/xe/xe_irq.c
@@ -775,19 +775,7 @@ int xe_irq_install(struct xe_device *xe)
xe_irq_postinstall(xe);
- err = devm_add_action_or_reset(xe->drm.dev, irq_uninstall, xe);
- if (err)
- goto free_irq_handler;
-
- return 0;
-
-free_irq_handler:
- if (xe_device_has_msix(xe))
- xe_irq_msix_free(xe);
- else
- xe_irq_msi_free(xe);
-
- return err;
+ return devm_add_action_or_reset(xe->drm.dev, irq_uninstall, xe);
}
static void xe_irq_msi_synchronize_irq(struct xe_device *xe)
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 04/13] drm/xe: Fix xe_tile_init_noalloc() error propagation
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (2 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 03/13] drm/xe: Fix error handling in xe_irq_install() Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 05/13] drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init() Lucas De Marchi
` (11 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Himal Prasad Ghimiray
Propagate the error to the caller so initialization properly stops if
sysfs creation fails.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_tile.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index d9a7a04ff652f..d29658ff4dd41 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -168,9 +168,7 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
xe_wa_apply_tile_workarounds(tile);
- err = xe_tile_sysfs_init(tile);
-
- return 0;
+ return xe_tile_sysfs_init(tile);
}
int xe_tile_init(struct xe_tile *tile)
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 05/13] drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init()
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (3 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 04/13] drm/xe: Fix xe_tile_init_noalloc() error propagation Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 06/13] drm/xe: Remove leftover pxp comment Lucas De Marchi
` (10 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Himal Prasad Ghimiray
Make sure to differentiate normal behavior, e.g. there's no stolen, from
allocation errors or failure to initialize lower layers.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 4 +++-
drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 17 +++++++++--------
drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h | 2 +-
3 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index d05171aac1eef..c8d0a100200c9 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -836,7 +836,9 @@ int xe_device_probe(struct xe_device *xe)
}
/* Allocate and map stolen after potential VRAM resize */
- xe_ttm_stolen_mgr_init(xe);
+ err = xe_ttm_stolen_mgr_init(xe);
+ if (err)
+ return err;
/*
* Now that GT is initialized (TTM in particular),
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index d414421f8c131..d9c9d2547aadf 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -207,17 +207,16 @@ static u64 detect_stolen(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
#endif
}
-void xe_ttm_stolen_mgr_init(struct xe_device *xe)
+int xe_ttm_stolen_mgr_init(struct xe_device *xe)
{
- struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL);
struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
+ struct xe_ttm_stolen_mgr *mgr;
u64 stolen_size, io_size;
int err;
- if (!mgr) {
- drm_dbg_kms(&xe->drm, "Stolen mgr init failed\n");
- return;
- }
+ mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL);
+ if (!mgr)
+ return -ENOMEM;
if (IS_SRIOV_VF(xe))
stolen_size = 0;
@@ -230,7 +229,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
if (!stolen_size) {
drm_dbg_kms(&xe->drm, "No stolen memory support\n");
- return;
+ return 0;
}
/*
@@ -246,7 +245,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
io_size, PAGE_SIZE);
if (err) {
drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
- return;
+ return err;
}
drm_dbg_kms(&xe->drm, "Initialized stolen memory support with %llu bytes\n",
@@ -254,6 +253,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
if (io_size)
mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, io_size);
+
+ return 0;
}
u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
index 1777245ff8101..8e877d1e839bd 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
@@ -12,7 +12,7 @@ struct ttm_resource;
struct xe_bo;
struct xe_device;
-void xe_ttm_stolen_mgr_init(struct xe_device *xe);
+int xe_ttm_stolen_mgr_init(struct xe_device *xe);
int xe_ttm_stolen_io_mem_reserve(struct xe_device *xe, struct ttm_resource *mem);
bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe);
u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 06/13] drm/xe: Remove leftover pxp comment
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (4 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 05/13] drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init() Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 07/13] drm/xe: Cleanup unwind of gt initialization Lucas De Marchi
` (9 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi
Not being able to initialize pxp is fatal if the platform is expected to
have it. Update comment after commit 9c9dc9ba4a00 ("drm/xe/pxp: Fail the
load if PXP fails to initialize").
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c8d0a100200c9..37bac39d74ba6 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -878,7 +878,6 @@ int xe_device_probe(struct xe_device *xe)
if (err)
goto err_fini_oa;
- /* A PXP init failure is not fatal */
err = xe_pxp_init(xe);
if (err)
goto err_fini_display;
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 07/13] drm/xe: Cleanup unwind of gt initialization
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (5 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 06/13] drm/xe: Remove leftover pxp comment Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 08/13] drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish() Lucas De Marchi
` (8 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Thomas Hellström,
Himal Prasad Ghimiray
The only thing in xe_gt_remove() that really needs to happen on the
device remove callback is the xe_uc_remove(). That's because of the
following call chain:
xe_gt_remove()
xe_uc_remove()
xe_gsc_remove()
xe_gsc_proxy_remove()
Move xe_gsc_proxy_remove() to be handled as a xe_device_remove_action,
so it's recorded when it should run during device removal. The rest can
be handled normally by devm infra.
Besides removing the deep call chain above, xe_device_probe() doesn't
have to unwind the gt loop and it's also more in line with the
xe_device_probe() style.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 21 +----------
drivers/gpu/drm/xe/xe_gsc.c | 9 -----
drivers/gpu/drm/xe/xe_gsc.h | 1 -
drivers/gpu/drm/xe/xe_gsc_proxy.c | 63 ++++++++++++++-----------------
drivers/gpu/drm/xe/xe_gsc_proxy.h | 1 -
drivers/gpu/drm/xe/xe_gsc_types.h | 1 +
drivers/gpu/drm/xe/xe_gt.c | 35 ++++++++---------
drivers/gpu/drm/xe/xe_gt.h | 1 -
drivers/gpu/drm/xe/xe_uc.c | 13 -------
drivers/gpu/drm/xe/xe_uc.h | 1 -
10 files changed, 47 insertions(+), 99 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 37bac39d74ba6..08dd8e9f8f6de 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -750,7 +750,6 @@ int xe_device_probe(struct xe_device *xe)
struct xe_tile *tile;
struct xe_gt *gt;
int err;
- u8 last_gt;
u8 id;
xe->probing = true;
@@ -861,18 +860,16 @@ int xe_device_probe(struct xe_device *xe)
return err;
for_each_gt(gt, xe, id) {
- last_gt = id;
-
err = xe_gt_init(gt);
if (err)
- goto err_fini_gt;
+ return err;
}
xe_heci_gsc_init(xe);
err = xe_oa_init(xe);
if (err)
- goto err_fini_gt;
+ return err;
err = xe_display_init(xe);
if (err)
@@ -911,14 +908,6 @@ int xe_device_probe(struct xe_device *xe)
err_fini_oa:
xe_oa_fini(xe);
-err_fini_gt:
- for_each_gt(gt, xe, id) {
- if (id < last_gt)
- xe_gt_remove(gt);
- else
- break;
- }
-
return err;
}
@@ -998,9 +987,6 @@ static void xe_device_remove_display(struct xe_device *xe)
void xe_device_remove(struct xe_device *xe)
{
- struct xe_gt *gt;
- u8 id;
-
xe_oa_unregister(xe);
xe_device_remove_display(xe);
@@ -1009,9 +995,6 @@ void xe_device_remove(struct xe_device *xe)
xe_heci_gsc_fini(xe);
- for_each_gt(gt, xe, id)
- xe_gt_remove(gt);
-
xe_device_call_remove_actions(xe);
}
diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index 1eb791ddc375c..fd41113f85725 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -555,15 +555,6 @@ void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc)
flush_work(&gsc->work);
}
-/**
- * xe_gsc_remove() - Clean up the GSC structures before driver removal
- * @gsc: the GSC uC
- */
-void xe_gsc_remove(struct xe_gsc *gsc)
-{
- xe_gsc_proxy_remove(gsc);
-}
-
/*
* wa_14015076503: if the GSC FW is loaded, we need to alert it before doing a
* GSC engine reset by writing a notification bit in the GS1 register and then
diff --git a/drivers/gpu/drm/xe/xe_gsc.h b/drivers/gpu/drm/xe/xe_gsc.h
index e282b9ef6ec4d..d99f66c38075c 100644
--- a/drivers/gpu/drm/xe/xe_gsc.h
+++ b/drivers/gpu/drm/xe/xe_gsc.h
@@ -17,7 +17,6 @@ int xe_gsc_init(struct xe_gsc *gsc);
int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc);
void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc);
void xe_gsc_load_start(struct xe_gsc *gsc);
-void xe_gsc_remove(struct xe_gsc *gsc);
void xe_gsc_hwe_irq_handler(struct xe_hw_engine *hwe, u16 intr_vec);
void xe_gsc_wa_14015076503(struct xe_gt *gt, bool prep);
diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c
index 24cc6a4f9a96a..6aa76a7843cfa 100644
--- a/drivers/gpu/drm/xe/xe_gsc_proxy.c
+++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c
@@ -423,6 +423,34 @@ static int proxy_channel_alloc(struct xe_gsc *gsc)
return 0;
}
+static void xe_gsc_proxy_remove(struct xe_device *__xe, void *arg)
+{
+ struct xe_gsc *gsc = arg;
+ struct xe_gt *gt = gsc_to_gt(gsc);
+ struct xe_device *xe = gt_to_xe(gt);
+ unsigned int fw_ref = 0;
+
+ if (!gsc->proxy.component_added)
+ return;
+
+ /* disable HECI2 IRQs */
+ xe_pm_runtime_get(xe);
+ fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC);
+ if (!fw_ref)
+ xe_gt_err(gt, "failed to get forcewake to disable GSC interrupts\n");
+
+ /* try do disable irq even if forcewake failed */
+ gsc_proxy_irq_toggle(gsc, false);
+
+ xe_force_wake_put(gt_to_fw(gt), fw_ref);
+ xe_pm_runtime_put(xe);
+
+ xe_gsc_wait_for_worker_completion(gsc);
+
+ component_del(xe->drm.dev, &xe_gsc_proxy_component_ops);
+ gsc->proxy.component_added = false;
+}
+
/**
* xe_gsc_proxy_init() - init objects and MEI component required by GSC proxy
* @gsc: the GSC uC
@@ -462,40 +490,7 @@ int xe_gsc_proxy_init(struct xe_gsc *gsc)
gsc->proxy.component_added = true;
- /* the component must be removed before unload, so can't use drmm for cleanup */
-
- return 0;
-}
-
-/**
- * xe_gsc_proxy_remove() - remove the GSC proxy MEI component
- * @gsc: the GSC uC
- */
-void xe_gsc_proxy_remove(struct xe_gsc *gsc)
-{
- struct xe_gt *gt = gsc_to_gt(gsc);
- struct xe_device *xe = gt_to_xe(gt);
- unsigned int fw_ref = 0;
-
- if (!gsc->proxy.component_added)
- return;
-
- /* disable HECI2 IRQs */
- xe_pm_runtime_get(xe);
- fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC);
- if (!fw_ref)
- xe_gt_err(gt, "failed to get forcewake to disable GSC interrupts\n");
-
- /* try do disable irq even if forcewake failed */
- gsc_proxy_irq_toggle(gsc, false);
-
- xe_force_wake_put(gt_to_fw(gt), fw_ref);
- xe_pm_runtime_put(xe);
-
- xe_gsc_wait_for_worker_completion(gsc);
-
- component_del(xe->drm.dev, &xe_gsc_proxy_component_ops);
- gsc->proxy.component_added = false;
+ return xe_device_add_action_or_reset(xe, xe_gsc_proxy_remove, gsc);
}
/**
diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.h b/drivers/gpu/drm/xe/xe_gsc_proxy.h
index c511ade6b8637..fdef56995cd43 100644
--- a/drivers/gpu/drm/xe/xe_gsc_proxy.h
+++ b/drivers/gpu/drm/xe/xe_gsc_proxy.h
@@ -12,7 +12,6 @@ struct xe_gsc;
int xe_gsc_proxy_init(struct xe_gsc *gsc);
bool xe_gsc_proxy_init_done(struct xe_gsc *gsc);
-void xe_gsc_proxy_remove(struct xe_gsc *gsc);
int xe_gsc_proxy_start(struct xe_gsc *gsc);
int xe_gsc_proxy_request_handler(struct xe_gsc *gsc);
diff --git a/drivers/gpu/drm/xe/xe_gsc_types.h b/drivers/gpu/drm/xe/xe_gsc_types.h
index 5926de20214c8..97c056656df05 100644
--- a/drivers/gpu/drm/xe/xe_gsc_types.h
+++ b/drivers/gpu/drm/xe/xe_gsc_types.h
@@ -13,6 +13,7 @@
#include <linux/workqueue.h>
#include "xe_uc_fw_types.h"
+#include "xe_device_types.h"
struct xe_bo;
struct xe_exec_queue;
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 9fb8f1e678dc8..c33040278e1aa 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -141,26 +141,6 @@ static void xe_gt_disable_host_l2_vram(struct xe_gt *gt)
xe_force_wake_put(gt_to_fw(gt), fw_ref);
}
-/**
- * xe_gt_remove() - Clean up the GT structures before driver removal
- * @gt: the GT object
- *
- * This function should only act on objects/structures that must be cleaned
- * before the driver removal callback is complete and therefore can't be
- * deferred to a drmm action.
- */
-void xe_gt_remove(struct xe_gt *gt)
-{
- int i;
-
- xe_uc_remove(>->uc);
-
- for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
- xe_hw_fence_irq_finish(>->fence_irq[i]);
-
- xe_gt_disable_host_l2_vram(gt);
-}
-
static void gt_reset_worker(struct work_struct *w);
static int emit_nop_job(struct xe_gt *gt, struct xe_exec_queue *q)
@@ -583,6 +563,17 @@ int xe_gt_init_hwconfig(struct xe_gt *gt)
return err;
}
+static void xe_gt_fini(void *arg)
+{
+ struct xe_gt *gt = arg;
+ int i;
+
+ for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
+ xe_hw_fence_irq_finish(>->fence_irq[i]);
+
+ xe_gt_disable_host_l2_vram(gt);
+}
+
int xe_gt_init(struct xe_gt *gt)
{
int err;
@@ -595,6 +586,10 @@ int xe_gt_init(struct xe_gt *gt)
xe_hw_fence_irq_init(>->fence_irq[i]);
}
+ err = devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, xe_gt_fini, gt);
+ if (err)
+ return err;
+
err = xe_gt_pagefault_init(gt);
if (err)
return err;
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index e504cc33ade4f..187fa6490eafc 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -54,7 +54,6 @@ int xe_gt_resume(struct xe_gt *gt);
void xe_gt_reset_async(struct xe_gt *gt);
void xe_gt_sanitize(struct xe_gt *gt);
int xe_gt_sanitize_freq(struct xe_gt *gt);
-void xe_gt_remove(struct xe_gt *gt);
/**
* xe_gt_wait_for_reset - wait for gt's async reset to finalize.
diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
index 0d073a9987c2e..d8167e818280b 100644
--- a/drivers/gpu/drm/xe/xe_uc.c
+++ b/drivers/gpu/drm/xe/xe_uc.c
@@ -288,19 +288,6 @@ int xe_uc_suspend(struct xe_uc *uc)
return xe_guc_suspend(&uc->guc);
}
-/**
- * xe_uc_remove() - Clean up the UC structures before driver removal
- * @uc: the UC object
- *
- * This function should only act on objects/structures that must be cleaned
- * before the driver removal callback is complete and therefore can't be
- * deferred to a drmm action.
- */
-void xe_uc_remove(struct xe_uc *uc)
-{
- xe_gsc_remove(&uc->gsc);
-}
-
/**
* xe_uc_declare_wedged() - Declare UC wedged
* @uc: the UC object
diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
index 506517c113339..3813c1ede450e 100644
--- a/drivers/gpu/drm/xe/xe_uc.h
+++ b/drivers/gpu/drm/xe/xe_uc.h
@@ -20,7 +20,6 @@ void xe_uc_stop(struct xe_uc *uc);
int xe_uc_start(struct xe_uc *uc);
int xe_uc_suspend(struct xe_uc *uc);
int xe_uc_sanitize_reset(struct xe_uc *uc);
-void xe_uc_remove(struct xe_uc *uc);
void xe_uc_declare_wedged(struct xe_uc *uc);
#endif
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 08/13] drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish()
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (6 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 07/13] drm/xe: Cleanup unwind of gt initialization Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 09/13] drm/xe/oa: Move fini to xe_oa Lucas De Marchi
` (7 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Himal Prasad Ghimiray
Now that xe_gt_remove is handled entirely by xe_gt, it's clear there are
some extra calls to xe_hw_fence_irq_finish() that aren't necessary.
Neither all_fw_domain_init() or gt_fw_domain_init() need to do that
since it's handled by the caller on any error.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_gt.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index c33040278e1aa..bd16ca070dd20 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -389,13 +389,11 @@ static void dump_pat_on_error(struct xe_gt *gt)
static int gt_fw_domain_init(struct xe_gt *gt)
{
unsigned int fw_ref;
- int err, i;
+ int err;
fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
- if (!fw_ref) {
- err = -ETIMEDOUT;
- goto err_hw_fence_irq;
- }
+ if (!fw_ref)
+ return -ETIMEDOUT;
if (!xe_gt_is_media_type(gt)) {
err = xe_ggtt_init(gt_to_tile(gt)->mem.ggtt);
@@ -436,9 +434,6 @@ static int gt_fw_domain_init(struct xe_gt *gt)
err_force_wake:
dump_pat_on_error(gt);
xe_force_wake_put(gt_to_fw(gt), fw_ref);
-err_hw_fence_irq:
- for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
- xe_hw_fence_irq_finish(>->fence_irq[i]);
return err;
}
@@ -446,7 +441,7 @@ static int gt_fw_domain_init(struct xe_gt *gt)
static int all_fw_domain_init(struct xe_gt *gt)
{
unsigned int fw_ref;
- int err, i;
+ int err;
fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) {
@@ -524,8 +519,6 @@ static int all_fw_domain_init(struct xe_gt *gt)
err_force_wake:
xe_force_wake_put(gt_to_fw(gt), fw_ref);
- for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
- xe_hw_fence_irq_finish(>->fence_irq[i]);
return err;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 09/13] drm/xe/oa: Move fini to xe_oa
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (7 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 08/13] drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish() Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 10/13] drm/xe: Move drm_dev_unplug() out of display function Lucas De Marchi
` (6 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Ashutosh Dixit
Like done with other functions, cleanup the error handling in
xe_device_probe() by moving the OA fini to be handled by xe_oa
itself, which relies on devm to call the cleanup function.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 7 +----
drivers/gpu/drm/xe/xe_oa.c | 48 +++++++++++++++++-----------------
drivers/gpu/drm/xe/xe_oa.h | 1 -
3 files changed, 25 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 08dd8e9f8f6de..1a3563dbfe36b 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -873,7 +873,7 @@ int xe_device_probe(struct xe_device *xe)
err = xe_display_init(xe);
if (err)
- goto err_fini_oa;
+ return err;
err = xe_pxp_init(xe);
if (err)
@@ -905,9 +905,6 @@ int xe_device_probe(struct xe_device *xe)
err_fini_display:
xe_display_driver_remove(xe);
-err_fini_oa:
- xe_oa_fini(xe);
-
return err;
}
@@ -991,8 +988,6 @@ void xe_device_remove(struct xe_device *xe)
xe_device_remove_display(xe);
- xe_oa_fini(xe);
-
xe_heci_gsc_fini(xe);
xe_device_call_remove_actions(xe);
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index fa873f3d0a9d1..2c640185bdeca 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -2641,6 +2641,27 @@ static void xe_oa_init_supported_formats(struct xe_oa *oa)
}
}
+static int destroy_config(int id, void *p, void *data)
+{
+ xe_oa_config_put(p);
+
+ return 0;
+}
+
+static void xe_oa_fini(void *arg)
+{
+ struct xe_device *xe = arg;
+ struct xe_oa *oa = &xe->oa;
+
+ if (!oa->xe)
+ return;
+
+ idr_for_each(&oa->metrics_idr, destroy_config, oa);
+ idr_destroy(&oa->metrics_idr);
+
+ oa->xe = NULL;
+}
+
/**
* xe_oa_init - OA initialization during device probe
* @xe: @xe_device
@@ -2672,31 +2693,10 @@ int xe_oa_init(struct xe_device *xe)
}
xe_oa_init_supported_formats(oa);
- return 0;
-exit:
- oa->xe = NULL;
- return ret;
-}
-static int destroy_config(int id, void *p, void *data)
-{
- xe_oa_config_put(p);
- return 0;
-}
-
-/**
- * xe_oa_fini - OA de-initialization during device remove
- * @xe: @xe_device
- */
-void xe_oa_fini(struct xe_device *xe)
-{
- struct xe_oa *oa = &xe->oa;
-
- if (!oa->xe)
- return;
-
- idr_for_each(&oa->metrics_idr, destroy_config, oa);
- idr_destroy(&oa->metrics_idr);
+ return devm_add_action_or_reset(xe->drm.dev, xe_oa_fini, xe);
+exit:
oa->xe = NULL;
+ return ret;
}
diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
index 87a38820c317d..eb36ce250c615 100644
--- a/drivers/gpu/drm/xe/xe_oa.h
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -15,7 +15,6 @@ struct xe_gt;
struct xe_hw_engine;
int xe_oa_init(struct xe_device *xe);
-void xe_oa_fini(struct xe_device *xe);
void xe_oa_register(struct xe_device *xe);
void xe_oa_unregister(struct xe_device *xe);
int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *file);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 10/13] drm/xe: Move drm_dev_unplug() out of display function
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (8 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 09/13] drm/xe/oa: Move fini to xe_oa Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 11/13] drm/xe/oa: Handle errors in xe_oa_register() Lucas De Marchi
` (5 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Jani Nikula,
Tejas Upadhyay
This is not really display-related and needed for any sequence on driver
removal that has to interact with drm_dev_enter()/drm_dev_exit().
Just remove xe_device_remove_display() and inline it in the single
caller to make clear this is not done only for display.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 1a3563dbfe36b..2e1b7be7f60ab 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -974,20 +974,16 @@ int xe_device_add_action_or_reset(struct xe_device *xe,
return 0;
}
-static void xe_device_remove_display(struct xe_device *xe)
+void xe_device_remove(struct xe_device *xe)
{
xe_display_unregister(xe);
drm_dev_unplug(&xe->drm);
+
xe_display_driver_remove(xe);
-}
-void xe_device_remove(struct xe_device *xe)
-{
xe_oa_unregister(xe);
- xe_device_remove_display(xe);
-
xe_heci_gsc_fini(xe);
xe_device_call_remove_actions(xe);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 11/13] drm/xe/oa: Handle errors in xe_oa_register()
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (9 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 10/13] drm/xe: Move drm_dev_unplug() out of display function Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 12/13] drm/xe/pmu: Fail probe if xe_pmu_register() fails Lucas De Marchi
` (4 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Ashutosh Dixit
Let xe_oa_unregister() be handled by devm infra since it's only putting
the kobject. Also, since kobject_create_and_add may fail, handle the
error accordingly.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 14 ++++++++------
drivers/gpu/drm/xe/xe_oa.c | 30 +++++++++++++++---------------
drivers/gpu/drm/xe/xe_oa.h | 3 +--
3 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 2e1b7be7f60ab..22c9de08b927c 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -877,15 +877,17 @@ int xe_device_probe(struct xe_device *xe)
err = xe_pxp_init(xe);
if (err)
- goto err_fini_display;
+ goto err_remove_display;
err = drm_dev_register(&xe->drm, 0);
if (err)
- goto err_fini_display;
+ goto err_remove_display;
xe_display_register(xe);
- xe_oa_register(xe);
+ err = xe_oa_register(xe);
+ if (err)
+ goto err_unregister_display;
xe_pmu_register(&xe->pmu);
@@ -902,7 +904,9 @@ int xe_device_probe(struct xe_device *xe)
return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
-err_fini_display:
+err_unregister_display:
+ xe_display_unregister(xe);
+err_remove_display:
xe_display_driver_remove(xe);
return err;
@@ -982,8 +986,6 @@ void xe_device_remove(struct xe_device *xe)
xe_display_driver_remove(xe);
- xe_oa_unregister(xe);
-
xe_heci_gsc_fini(xe);
xe_device_call_remove_actions(xe);
diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
index 2c640185bdeca..d89e6cabf5a56 100644
--- a/drivers/gpu/drm/xe/xe_oa.c
+++ b/drivers/gpu/drm/xe/xe_oa.c
@@ -2423,36 +2423,36 @@ int xe_oa_remove_config_ioctl(struct drm_device *dev, u64 data, struct drm_file
return ret;
}
+static void xe_oa_unregister(void *arg)
+{
+ struct xe_oa *oa = arg;
+
+ if (!oa->metrics_kobj)
+ return;
+
+ kobject_put(oa->metrics_kobj);
+ oa->metrics_kobj = NULL;
+}
+
/**
* xe_oa_register - Xe OA registration
* @xe: @xe_device
*
* Exposes the metrics sysfs directory upon completion of module initialization
*/
-void xe_oa_register(struct xe_device *xe)
+int xe_oa_register(struct xe_device *xe)
{
struct xe_oa *oa = &xe->oa;
if (!oa->xe)
- return;
+ return 0;
oa->metrics_kobj = kobject_create_and_add("metrics",
&xe->drm.primary->kdev->kobj);
-}
-
-/**
- * xe_oa_unregister - Xe OA de-registration
- * @xe: @xe_device
- */
-void xe_oa_unregister(struct xe_device *xe)
-{
- struct xe_oa *oa = &xe->oa;
-
if (!oa->metrics_kobj)
- return;
+ return -ENOMEM;
- kobject_put(oa->metrics_kobj);
- oa->metrics_kobj = NULL;
+ return devm_add_action_or_reset(xe->drm.dev, xe_oa_unregister, oa);
}
static u32 num_oa_units_per_gt(struct xe_gt *gt)
diff --git a/drivers/gpu/drm/xe/xe_oa.h b/drivers/gpu/drm/xe/xe_oa.h
index eb36ce250c615..e510826f9efc6 100644
--- a/drivers/gpu/drm/xe/xe_oa.h
+++ b/drivers/gpu/drm/xe/xe_oa.h
@@ -15,8 +15,7 @@ struct xe_gt;
struct xe_hw_engine;
int xe_oa_init(struct xe_device *xe);
-void xe_oa_register(struct xe_device *xe);
-void xe_oa_unregister(struct xe_device *xe);
+int xe_oa_register(struct xe_device *xe);
int xe_oa_stream_open_ioctl(struct drm_device *dev, u64 data, struct drm_file *file);
int xe_oa_add_config_ioctl(struct drm_device *dev, u64 data, struct drm_file *file);
int xe_oa_remove_config_ioctl(struct drm_device *dev, u64 data, struct drm_file *file);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 12/13] drm/xe/pmu: Fail probe if xe_pmu_register() fails
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (10 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 11/13] drm/xe/oa: Handle errors in xe_oa_register() Lucas De Marchi
@ 2025-02-12 19:35 ` Lucas De Marchi
2025-02-12 19:36 ` [PATCH v4 13/13] drm/xe/hwmon: Stop ignoring errors on probe Lucas De Marchi
` (3 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:35 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Riana Tauro,
Vinay Belgaumkar, Tejas Upadhyay, Himal Prasad Ghimiray
Now that previous callers in xe_device_probe() are handling the errors,
that can be done for xe_pmu_register() as well.
Cc: Riana Tauro <riana.tauro@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 22c9de08b927c..7469bbcdc65f2 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -889,7 +889,9 @@ int xe_device_probe(struct xe_device *xe)
if (err)
goto err_unregister_display;
- xe_pmu_register(&xe->pmu);
+ err = xe_pmu_register(&xe->pmu);
+ if (err)
+ goto err_unregister_display;
xe_debugfs_register(xe);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v4 13/13] drm/xe/hwmon: Stop ignoring errors on probe
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (11 preceding siblings ...)
2025-02-12 19:35 ` [PATCH v4 12/13] drm/xe/pmu: Fail probe if xe_pmu_register() fails Lucas De Marchi
@ 2025-02-12 19:36 ` Lucas De Marchi
2025-02-12 23:04 ` ✓ CI.Patch_applied: success for Cleanup error handling on probe (rev4) Patchwork
` (2 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 19:36 UTC (permalink / raw)
To: intel-xe
Cc: Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Lucas De Marchi, Badal Nilawar,
Karthik Poosa, Raag Jadav
Not registering hwmon because it's not available (SRIOV_VF and DGFX) is
different from failing the initialization. Handle the errors
appropriately.
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 4 +++-
drivers/gpu/drm/xe/xe_hwmon.c | 31 ++++++++++++++++---------------
drivers/gpu/drm/xe/xe_hwmon.h | 4 ++--
3 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 7469bbcdc65f2..4b4039cf29fd4 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -895,7 +895,9 @@ int xe_device_probe(struct xe_device *xe)
xe_debugfs_register(xe);
- xe_hwmon_register(xe);
+ err = xe_hwmon_register(xe);
+ if (err)
+ goto err_unregister_display;
for_each_gt(gt, xe, id)
xe_gt_sanitize_freq(gt);
diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
index 7f327e3342123..48d80ffdf7bb9 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.c
+++ b/drivers/gpu/drm/xe/xe_hwmon.c
@@ -839,10 +839,9 @@ static const struct hwmon_chip_info hwmon_chip_info = {
};
static void
-xe_hwmon_get_preregistration_info(struct xe_device *xe)
+xe_hwmon_get_preregistration_info(struct xe_hwmon *hwmon)
{
- struct xe_mmio *mmio = xe_root_tile_mmio(xe);
- struct xe_hwmon *hwmon = xe->hwmon;
+ struct xe_mmio *mmio = xe_root_tile_mmio(hwmon->xe);
long energy;
u64 val_sku_unit = 0;
int channel;
@@ -876,33 +875,34 @@ static void xe_hwmon_mutex_destroy(void *arg)
mutex_destroy(&hwmon->hwmon_lock);
}
-void xe_hwmon_register(struct xe_device *xe)
+int xe_hwmon_register(struct xe_device *xe)
{
struct device *dev = xe->drm.dev;
struct xe_hwmon *hwmon;
+ int ret;
/* hwmon is available only for dGfx */
if (!IS_DGFX(xe))
- return;
+ return 0;
/* hwmon is not available on VFs */
if (IS_SRIOV_VF(xe))
- return;
+ return 0;
hwmon = devm_kzalloc(dev, sizeof(*hwmon), GFP_KERNEL);
if (!hwmon)
- return;
-
- xe->hwmon = hwmon;
+ return -ENOMEM;
mutex_init(&hwmon->hwmon_lock);
- if (devm_add_action_or_reset(dev, xe_hwmon_mutex_destroy, hwmon))
- return;
+ ret = devm_add_action_or_reset(dev, xe_hwmon_mutex_destroy, hwmon);
+ if (ret)
+ return ret;
/* There's only one instance of hwmon per device */
hwmon->xe = xe;
+ xe->hwmon = hwmon;
- xe_hwmon_get_preregistration_info(xe);
+ xe_hwmon_get_preregistration_info(hwmon);
drm_dbg(&xe->drm, "Register xe hwmon interface\n");
@@ -910,11 +910,12 @@ void xe_hwmon_register(struct xe_device *xe)
hwmon->hwmon_dev = devm_hwmon_device_register_with_info(dev, "xe", hwmon,
&hwmon_chip_info,
hwmon_groups);
-
if (IS_ERR(hwmon->hwmon_dev)) {
- drm_warn(&xe->drm, "Failed to register xe hwmon (%pe)\n", hwmon->hwmon_dev);
+ drm_err(&xe->drm, "Failed to register xe hwmon (%pe)\n", hwmon->hwmon_dev);
xe->hwmon = NULL;
- return;
+ return PTR_ERR(hwmon->hwmon_dev);
}
+
+ return 0;
}
diff --git a/drivers/gpu/drm/xe/xe_hwmon.h b/drivers/gpu/drm/xe/xe_hwmon.h
index c42a1de2cd7a2..d02c1bfe8c0a0 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.h
+++ b/drivers/gpu/drm/xe/xe_hwmon.h
@@ -11,9 +11,9 @@
struct xe_device;
#if IS_REACHABLE(CONFIG_HWMON)
-void xe_hwmon_register(struct xe_device *xe);
+int xe_hwmon_register(struct xe_device *xe);
#else
-static inline void xe_hwmon_register(struct xe_device *xe) { };
+static inline int xe_hwmon_register(struct xe_device *xe) { return 0; };
#endif
#endif /* _XE_HWMON_H_ */
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 19:35 ` [PATCH v4 01/13] drm/xe: Add callback support for driver remove Lucas De Marchi
@ 2025-02-12 20:01 ` Rodrigo Vivi
2025-02-12 20:11 ` Lucas De Marchi
2025-02-12 21:19 ` Raag Jadav
1 sibling, 1 reply; 23+ messages in thread
From: Rodrigo Vivi @ 2025-02-12 20:01 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Francois Dugast, Matthew Auld, Daniele Ceraolo Spurio,
Thomas Hellström
On Wed, Feb 12, 2025 at 11:35:48AM -0800, Lucas De Marchi wrote:
> xe device probe uses devm cleanup in most places. However there are a
> few cases where this is not possible: when the driver interacts with
> component add/del. In that case, the resource group would be cleanup
> while the entire device resources are in the process of cleanup. One
> example is the xe_gsc_proxy and display using that to interact with mei
> and audio.
>
> Add a callback-based remove so the exception doesn't make the probe
> use multiple error handling styles.
>
> v2: Change internal API to mimic the devm API. This will make it easier
> to migrate in future when devm can be used.
>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 79 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_device.h | 4 ++
> drivers/gpu/drm/xe/xe_device_types.h | 17 ++++++
> drivers/gpu/drm/xe/xe_pci.c | 4 +-
> 4 files changed, 103 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 36d7ffb3b4d90..69bde506ee87e 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -65,6 +65,12 @@
>
> #include <generated/xe_wa_oob.h>
>
> +struct xe_device_remove_action {
> + struct list_head node;
> + xe_device_remove_action_t remove;
> + void *data;
> +};
> +
> static int xe_file_open(struct drm_device *dev, struct drm_file *file)
> {
> struct xe_device *xe = to_xe_device(dev);
> @@ -746,6 +752,9 @@ int xe_device_probe(struct xe_device *xe)
> u8 last_gt;
> u8 id;
>
> + xe->probing = true;
> + INIT_LIST_HEAD(&xe->remove_action_list);
> +
> xe_pat_init_early(xe);
>
> err = xe_sriov_init(xe);
> @@ -886,6 +895,8 @@ int xe_device_probe(struct xe_device *xe)
>
> xe_vsec_init(xe);
>
> + xe->probing = false;
> +
> return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
>
> err_fini_display:
> @@ -907,6 +918,72 @@ int xe_device_probe(struct xe_device *xe)
> return err;
> }
>
> +/**
> + * xe_device_call_remove_actions - Call the remove actions
> + * @xe: xe device instance
> + *
> + * This is only to be used by xe_pci and xe_device to call the remove actions
> + * while removing the driver or handling probe failures.
> + */
> +void xe_device_call_remove_actions(struct xe_device *xe)
> +{
> + struct xe_device_remove_action *ra, *tmp;
> +
> + list_for_each_entry_safe(ra, tmp, &xe->remove_action_list, node) {
> + ra->remove(xe, ra->data);
> + list_del(&ra->node);
> + kfree(ra);
> + }
> +
> + xe->probing = false;
> +}
> +
> +/**
> + * xe_device_add_action_or_reset - Add an action to run on driver removal
> + * @xe: xe device instance
> + * @ra: pointer to the object embedded into the object to cleanup
> + * @remove: function to execute. The @ra is passed as argument
> + *
> + * Example:
> + *
> + * .. code-block:: c
> + *
> + * static void foo_remove(struct xe_device_remove_action *ra)
> + * {
> + * struct xe_foo *foo = container_of(ra, struct xe_foo, remove_action);
> + * ...
> + * }
> + *
> + * int xe_foo_init(struct xe_foo *foo)
> + * {
> + * ...
> + * xe_device_add_remove_action(xe, &foo->remove_action, foo_remove);
> + * ...
> + * return 0;
> + * };
I still believe we should add here a note here to highlight this is the
exception and that devm should be preferred. But up to you, the
explanation in the commit message makes more sense now and the patch
is right. I hope we can get some devm solution to handle this component
case. But let's move on:
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> + */
> +int xe_device_add_action_or_reset(struct xe_device *xe,
> + xe_device_remove_action_t action,
> + void *data)
> +{
> + struct xe_device_remove_action *ra;
> +
> + drm_WARN_ON(&xe->drm, !xe->probing);
> +
> + ra = kmalloc(sizeof(*ra), GFP_KERNEL);
> + if (!ra) {
> + action(xe, data);
> + return -ENOMEM;
> + }
> +
> + INIT_LIST_HEAD(&ra->node);
> + ra->remove = action;
> + ra->data = data;
> + list_add(&ra->node, &xe->remove_action_list);
> +
> + return 0;
> +}
> +
> static void xe_device_remove_display(struct xe_device *xe)
> {
> xe_display_unregister(xe);
> @@ -932,6 +1009,8 @@ void xe_device_remove(struct xe_device *xe)
>
> for_each_gt(gt, xe, id)
> xe_gt_remove(gt);
> +
> + xe_device_call_remove_actions(xe);
> }
>
> void xe_device_shutdown(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index 0bc3bc8e68030..a6fedf1ef3c7b 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -45,6 +45,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> const struct pci_device_id *ent);
> int xe_device_probe_early(struct xe_device *xe);
> int xe_device_probe(struct xe_device *xe);
> +int xe_device_add_action_or_reset(struct xe_device *xe,
> + xe_device_remove_action_t action,
> + void *data);
> +void xe_device_call_remove_actions(struct xe_device *xe);
> void xe_device_remove(struct xe_device *xe);
> void xe_device_shutdown(struct xe_device *xe);
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 833c29fed3a37..b322d49c83c77 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -35,6 +35,7 @@
> #include "intel_display_device.h"
> #endif
>
> +struct xe_device;
> struct xe_ggtt;
> struct xe_pat_ops;
> struct xe_pxp;
> @@ -70,6 +71,8 @@ struct xe_pxp;
> const struct xe_tile * : (const struct xe_device *)((tile__)->xe), \
> struct xe_tile * : (tile__)->xe)
>
> +typedef void (*xe_device_remove_action_t)(struct xe_device *xe, void *data);
> +
> /**
> * struct xe_vram_region - memory region structure
> * This is used to describe a memory region in xe
> @@ -428,6 +431,20 @@ struct xe_device {
> /** @tiles: device tiles */
> struct xe_tile tiles[XE_MAX_TILES_PER_DEVICE];
>
> + /**
> + * @remove_action_list: list of actions to execute on device remove.
> + * Use xe_device_add_remove_action() for that. Actions can only be added
> + * during probe and are executed during the call from PCI subsystem to
> + * remove the driver from the device.
> + */
> + struct list_head remove_action_list;
> +
> + /**
> + * @probing: cover the section in which @remove_action_list can be used
> + * to post cleaning actions
> + */
> + bool probing;
> +
> /**
> * @mem_access: keep track of memory access in the device, possibly
> * triggering additional actions when they occur.
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 6a8e82aff3853..70b697fde5b96 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -905,8 +905,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> return err;
>
> err = xe_device_probe(xe);
> - if (err)
> + if (err) {
> + xe_device_call_remove_actions(xe);
> return err;
> + }
>
> err = xe_pm_init(xe);
> if (err)
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 20:01 ` Rodrigo Vivi
@ 2025-02-12 20:11 ` Lucas De Marchi
2025-02-14 20:55 ` Lucas De Marchi
0 siblings, 1 reply; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 20:11 UTC (permalink / raw)
To: Rodrigo Vivi
Cc: intel-xe, Francois Dugast, Matthew Auld, Daniele Ceraolo Spurio,
Thomas Hellström
On Wed, Feb 12, 2025 at 03:01:12PM -0500, Rodrigo Vivi wrote:
>On Wed, Feb 12, 2025 at 11:35:48AM -0800, Lucas De Marchi wrote:
>> xe device probe uses devm cleanup in most places. However there are a
>> few cases where this is not possible: when the driver interacts with
>> component add/del. In that case, the resource group would be cleanup
>> while the entire device resources are in the process of cleanup. One
>> example is the xe_gsc_proxy and display using that to interact with mei
>> and audio.
>>
>> Add a callback-based remove so the exception doesn't make the probe
>> use multiple error handling styles.
>>
>> v2: Change internal API to mimic the devm API. This will make it easier
>> to migrate in future when devm can be used.
>>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_device.c | 79 ++++++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_device.h | 4 ++
>> drivers/gpu/drm/xe/xe_device_types.h | 17 ++++++
>> drivers/gpu/drm/xe/xe_pci.c | 4 +-
>> 4 files changed, 103 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 36d7ffb3b4d90..69bde506ee87e 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -65,6 +65,12 @@
>>
>> #include <generated/xe_wa_oob.h>
>>
>> +struct xe_device_remove_action {
>> + struct list_head node;
>> + xe_device_remove_action_t remove;
>> + void *data;
>> +};
>> +
>> static int xe_file_open(struct drm_device *dev, struct drm_file *file)
>> {
>> struct xe_device *xe = to_xe_device(dev);
>> @@ -746,6 +752,9 @@ int xe_device_probe(struct xe_device *xe)
>> u8 last_gt;
>> u8 id;
>>
>> + xe->probing = true;
>> + INIT_LIST_HEAD(&xe->remove_action_list);
>> +
>> xe_pat_init_early(xe);
>>
>> err = xe_sriov_init(xe);
>> @@ -886,6 +895,8 @@ int xe_device_probe(struct xe_device *xe)
>>
>> xe_vsec_init(xe);
>>
>> + xe->probing = false;
>> +
>> return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
>>
>> err_fini_display:
>> @@ -907,6 +918,72 @@ int xe_device_probe(struct xe_device *xe)
>> return err;
>> }
>>
>> +/**
>> + * xe_device_call_remove_actions - Call the remove actions
>> + * @xe: xe device instance
>> + *
>> + * This is only to be used by xe_pci and xe_device to call the remove actions
>> + * while removing the driver or handling probe failures.
>> + */
>> +void xe_device_call_remove_actions(struct xe_device *xe)
>> +{
>> + struct xe_device_remove_action *ra, *tmp;
>> +
>> + list_for_each_entry_safe(ra, tmp, &xe->remove_action_list, node) {
>> + ra->remove(xe, ra->data);
>> + list_del(&ra->node);
>> + kfree(ra);
>> + }
>> +
>> + xe->probing = false;
>> +}
>> +
>> +/**
>> + * xe_device_add_action_or_reset - Add an action to run on driver removal
>> + * @xe: xe device instance
>> + * @ra: pointer to the object embedded into the object to cleanup
>> + * @remove: function to execute. The @ra is passed as argument
>> + *
>> + * Example:
>> + *
>> + * .. code-block:: c
>> + *
>> + * static void foo_remove(struct xe_device_remove_action *ra)
>> + * {
>> + * struct xe_foo *foo = container_of(ra, struct xe_foo, remove_action);
>> + * ...
>> + * }
>> + *
>> + * int xe_foo_init(struct xe_foo *foo)
>> + * {
>> + * ...
>> + * xe_device_add_remove_action(xe, &foo->remove_action, foo_remove);
>> + * ...
>> + * return 0;
>> + * };
>
>I still believe we should add here a note here to highlight this is the
>exception and that devm should be preferred. But up to you, the
>explanation in the commit message makes more sense now and the patch
>is right. I hope we can get some devm solution to handle this component
arrgh, sorry. I was too entertained with the devm solution that I forgot
the update in this doc that you requested in previous version. My bad.
The devm part is here:
https://lore.kernel.org/dri-devel/20250212200542.515493-1-lucas.demarchi@intel.com/
In the end I could simplify it much more than I thought and what is
required on devres side is just one patch, 2 lines diff. Hopefully I
didn't simplify it too much. Let's see. If that one is accepted through
intel-xe tree, then we may even drop this first patch and go straight
with that solution.
Lucas De Marchi
>case. But let's move on:
>
>Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
I will wait some feedback on that other series to see how we are going
to proceed. If we keep this patch, then I will update this doc.
thanks
Lucas De Marchi
>
>> + */
>> +int xe_device_add_action_or_reset(struct xe_device *xe,
>> + xe_device_remove_action_t action,
>> + void *data)
>> +{
>> + struct xe_device_remove_action *ra;
>> +
>> + drm_WARN_ON(&xe->drm, !xe->probing);
>> +
>> + ra = kmalloc(sizeof(*ra), GFP_KERNEL);
>> + if (!ra) {
>> + action(xe, data);
>> + return -ENOMEM;
>> + }
>> +
>> + INIT_LIST_HEAD(&ra->node);
>> + ra->remove = action;
>> + ra->data = data;
>> + list_add(&ra->node, &xe->remove_action_list);
>> +
>> + return 0;
>> +}
>> +
>> static void xe_device_remove_display(struct xe_device *xe)
>> {
>> xe_display_unregister(xe);
>> @@ -932,6 +1009,8 @@ void xe_device_remove(struct xe_device *xe)
>>
>> for_each_gt(gt, xe, id)
>> xe_gt_remove(gt);
>> +
>> + xe_device_call_remove_actions(xe);
>> }
>>
>> void xe_device_shutdown(struct xe_device *xe)
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 0bc3bc8e68030..a6fedf1ef3c7b 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -45,6 +45,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>> const struct pci_device_id *ent);
>> int xe_device_probe_early(struct xe_device *xe);
>> int xe_device_probe(struct xe_device *xe);
>> +int xe_device_add_action_or_reset(struct xe_device *xe,
>> + xe_device_remove_action_t action,
>> + void *data);
>> +void xe_device_call_remove_actions(struct xe_device *xe);
>> void xe_device_remove(struct xe_device *xe);
>> void xe_device_shutdown(struct xe_device *xe);
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 833c29fed3a37..b322d49c83c77 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -35,6 +35,7 @@
>> #include "intel_display_device.h"
>> #endif
>>
>> +struct xe_device;
>> struct xe_ggtt;
>> struct xe_pat_ops;
>> struct xe_pxp;
>> @@ -70,6 +71,8 @@ struct xe_pxp;
>> const struct xe_tile * : (const struct xe_device *)((tile__)->xe), \
>> struct xe_tile * : (tile__)->xe)
>>
>> +typedef void (*xe_device_remove_action_t)(struct xe_device *xe, void *data);
>> +
>> /**
>> * struct xe_vram_region - memory region structure
>> * This is used to describe a memory region in xe
>> @@ -428,6 +431,20 @@ struct xe_device {
>> /** @tiles: device tiles */
>> struct xe_tile tiles[XE_MAX_TILES_PER_DEVICE];
>>
>> + /**
>> + * @remove_action_list: list of actions to execute on device remove.
>> + * Use xe_device_add_remove_action() for that. Actions can only be added
>> + * during probe and are executed during the call from PCI subsystem to
>> + * remove the driver from the device.
>> + */
>> + struct list_head remove_action_list;
>> +
>> + /**
>> + * @probing: cover the section in which @remove_action_list can be used
>> + * to post cleaning actions
>> + */
>> + bool probing;
>> +
>> /**
>> * @mem_access: keep track of memory access in the device, possibly
>> * triggering additional actions when they occur.
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index 6a8e82aff3853..70b697fde5b96 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -905,8 +905,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>> return err;
>>
>> err = xe_device_probe(xe);
>> - if (err)
>> + if (err) {
>> + xe_device_call_remove_actions(xe);
>> return err;
>> + }
>>
>> err = xe_pm_init(xe);
>> if (err)
>> --
>> 2.48.1
>>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 19:35 ` [PATCH v4 01/13] drm/xe: Add callback support for driver remove Lucas De Marchi
2025-02-12 20:01 ` Rodrigo Vivi
@ 2025-02-12 21:19 ` Raag Jadav
2025-02-12 21:28 ` Lucas De Marchi
1 sibling, 1 reply; 23+ messages in thread
From: Raag Jadav @ 2025-02-12 21:19 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Thomas Hellström
On Wed, Feb 12, 2025 at 11:35:48AM -0800, Lucas De Marchi wrote:
> xe device probe uses devm cleanup in most places. However there are a
> few cases where this is not possible: when the driver interacts with
> component add/del. In that case, the resource group would be cleanup
> while the entire device resources are in the process of cleanup. One
> example is the xe_gsc_proxy and display using that to interact with mei
> and audio.
>
> Add a callback-based remove so the exception doesn't make the probe
> use multiple error handling styles.
>
> v2: Change internal API to mimic the devm API. This will make it easier
> to migrate in future when devm can be used.
Which means we'd like to go back to devm action someday, or is that even
possible? Assuming it is, and still worth it, why not try to do that
instead?
Raag
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 21:19 ` Raag Jadav
@ 2025-02-12 21:28 ` Lucas De Marchi
2025-02-13 4:41 ` Raag Jadav
0 siblings, 1 reply; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-12 21:28 UTC (permalink / raw)
To: Raag Jadav
Cc: intel-xe, Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Thomas Hellström
On Wed, Feb 12, 2025 at 11:19:22PM +0200, Raag Jadav wrote:
>On Wed, Feb 12, 2025 at 11:35:48AM -0800, Lucas De Marchi wrote:
>> xe device probe uses devm cleanup in most places. However there are a
>> few cases where this is not possible: when the driver interacts with
>> component add/del. In that case, the resource group would be cleanup
>> while the entire device resources are in the process of cleanup. One
>> example is the xe_gsc_proxy and display using that to interact with mei
>> and audio.
>>
>> Add a callback-based remove so the exception doesn't make the probe
>> use multiple error handling styles.
>>
>> v2: Change internal API to mimic the devm API. This will make it easier
>> to migrate in future when devm can be used.
>
>Which means we'd like to go back to devm action someday, or is that even
>possible? Assuming it is, and still worth it, why not try to do that
>instead?
From the cover letter:
I have other changes on top of these that will make devm compatible.
That will need some drivers/base/ changes though, so it's probably good
to do it in parallel.
From my reply to Rodrigo:
The devm part is here:
https://lore.kernel.org/dri-devel/20250212200542.515493-1-lucas.demarchi@intel.com/
In the end I could simplify it much more than I thought and what is
required on devres side is just one patch, 2 lines diff. Hopefully I
didn't simplify it too much. Let's see. If that one is accepted through
intel-xe tree, then we may even drop this first patch and go straight
with that solution.
...
I will wait some feedback on that other series to see how we are going
to proceed. If we keep this patch, then I will update this doc.
I hope this clarifies,
thanks
Lucas De Marchi
>
>Raag
^ permalink raw reply [flat|nested] 23+ messages in thread
* ✓ CI.Patch_applied: success for Cleanup error handling on probe (rev4)
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (12 preceding siblings ...)
2025-02-12 19:36 ` [PATCH v4 13/13] drm/xe/hwmon: Stop ignoring errors on probe Lucas De Marchi
@ 2025-02-12 23:04 ` Patchwork
2025-02-12 23:05 ` ✓ CI.checkpatch: " Patchwork
2025-02-12 23:07 ` ✗ CI.KUnit: failure " Patchwork
15 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2025-02-12 23:04 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: intel-xe
== Series Details ==
Series: Cleanup error handling on probe (rev4)
URL : https://patchwork.freedesktop.org/series/144211/
State : success
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: fd03884f35d1 drm-tip: 2025y-02m-12d-22h-21m-23s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: Add callback support for driver remove
Applying: drm/xe: Fix xe_display_fini() calls
Applying: drm/xe: Fix error handling in xe_irq_install()
Applying: drm/xe: Fix xe_tile_init_noalloc() error propagation
Applying: drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init()
Applying: drm/xe: Remove leftover pxp comment
Applying: drm/xe: Cleanup unwind of gt initialization
Applying: drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish()
Applying: drm/xe/oa: Move fini to xe_oa
Applying: drm/xe: Move drm_dev_unplug() out of display function
Applying: drm/xe/oa: Handle errors in xe_oa_register()
Applying: drm/xe/pmu: Fail probe if xe_pmu_register() fails
Applying: drm/xe/hwmon: Stop ignoring errors on probe
^ permalink raw reply [flat|nested] 23+ messages in thread
* ✓ CI.checkpatch: success for Cleanup error handling on probe (rev4)
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (13 preceding siblings ...)
2025-02-12 23:04 ` ✓ CI.Patch_applied: success for Cleanup error handling on probe (rev4) Patchwork
@ 2025-02-12 23:05 ` Patchwork
2025-02-12 23:07 ` ✗ CI.KUnit: failure " Patchwork
15 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2025-02-12 23:05 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: intel-xe
== Series Details ==
Series: Cleanup error handling on probe (rev4)
URL : https://patchwork.freedesktop.org/series/144211/
State : success
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
22f9cda3436b4fe965b5c5f31d2f2c1bcb483189
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit df1816ccafadfcd630b2c498e2ed59ebef1d0d71
Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date: Wed Feb 12 11:36:00 2025 -0800
drm/xe/hwmon: Stop ignoring errors on probe
Not registering hwmon because it's not available (SRIOV_VF and DGFX) is
different from failing the initialization. Handle the errors
appropriately.
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
+ /mt/dim checkpatch fd03884f35d18372b0f7abb2fd33c55cee599bc3 drm-intel
6239c132d025 drm/xe: Add callback support for driver remove
d78c0d0ecbb4 drm/xe: Fix xe_display_fini() calls
e275c3a80220 drm/xe: Fix error handling in xe_irq_install()
46735dbcc491 drm/xe: Fix xe_tile_init_noalloc() error propagation
833aba3ef616 drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init()
23858c9bedd9 drm/xe: Remove leftover pxp comment
cb8f5cad56a2 drm/xe: Cleanup unwind of gt initialization
2578ea8beb9c drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish()
a52859de774e drm/xe/oa: Move fini to xe_oa
652528572ca1 drm/xe: Move drm_dev_unplug() out of display function
e7ec773eea6b drm/xe/oa: Handle errors in xe_oa_register()
c8fd7c166cda drm/xe/pmu: Fail probe if xe_pmu_register() fails
df1816ccafad drm/xe/hwmon: Stop ignoring errors on probe
^ permalink raw reply [flat|nested] 23+ messages in thread
* ✗ CI.KUnit: failure for Cleanup error handling on probe (rev4)
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
` (14 preceding siblings ...)
2025-02-12 23:05 ` ✓ CI.checkpatch: " Patchwork
@ 2025-02-12 23:07 ` Patchwork
15 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2025-02-12 23:07 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: intel-xe
== Series Details ==
Series: Cleanup error handling on probe (rev4)
URL : https://patchwork.freedesktop.org/series/144211/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../scripts/Makefile.build:41: ../drivers/gpu/drm/i2c/Makefile: No such file or directory
make[7]: *** No rule to make target '../drivers/gpu/drm/i2c/Makefile'. Stop.
make[6]: *** [../scripts/Makefile.build:465: drivers/gpu/drm/i2c] Error 2
make[6]: *** Waiting for unfinished jobs....
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
156 | u64 ioread64_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
163 | u64 ioread64_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
170 | u64 ioread64be_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
178 | u64 ioread64be_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
make[5]: *** [../scripts/Makefile.build:465: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:465: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:465: drivers] Error 2
make[2]: *** [/kernel/Makefile:1994: .] Error 2
make[1]: *** [/kernel/Makefile:251: __sub-make] Error 2
make: *** [Makefile:251: __sub-make] Error 2
[23:05:33] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[23:05:37] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 21:28 ` Lucas De Marchi
@ 2025-02-13 4:41 ` Raag Jadav
0 siblings, 0 replies; 23+ messages in thread
From: Raag Jadav @ 2025-02-13 4:41 UTC (permalink / raw)
To: Lucas De Marchi
Cc: intel-xe, Rodrigo Vivi, Francois Dugast, Matthew Auld,
Daniele Ceraolo Spurio, Thomas Hellström
On Wed, Feb 12, 2025 at 03:28:15PM -0600, Lucas De Marchi wrote:
> On Wed, Feb 12, 2025 at 11:19:22PM +0200, Raag Jadav wrote:
> > On Wed, Feb 12, 2025 at 11:35:48AM -0800, Lucas De Marchi wrote:
> > > xe device probe uses devm cleanup in most places. However there are a
> > > few cases where this is not possible: when the driver interacts with
> > > component add/del. In that case, the resource group would be cleanup
> > > while the entire device resources are in the process of cleanup. One
> > > example is the xe_gsc_proxy and display using that to interact with mei
> > > and audio.
> > >
> > > Add a callback-based remove so the exception doesn't make the probe
> > > use multiple error handling styles.
> > >
> > > v2: Change internal API to mimic the devm API. This will make it easier
> > > to migrate in future when devm can be used.
> >
> > Which means we'd like to go back to devm action someday, or is that even
> > possible? Assuming it is, and still worth it, why not try to do that
> > instead?
>
> From the cover letter:
> I have other changes on top of these that will make devm compatible.
> That will need some drivers/base/ changes though, so it's probably good
> to do it in parallel.
>
> From my reply to Rodrigo:
> The devm part is here:
> https://lore.kernel.org/dri-devel/20250212200542.515493-1-lucas.demarchi@intel.com/
>
> In the end I could simplify it much more than I thought and what is
> required on devres side is just one patch, 2 lines diff. Hopefully I
> didn't simplify it too much. Let's see. If that one is accepted through
> intel-xe tree, then we may even drop this first patch and go straight
> with that solution.
>
> ...
>
> I will wait some feedback on that other series to see how we are going
> to proceed. If we keep this patch, then I will update this doc.
>
> I hope this clarifies,
Yep, just trying to weigh in.
Side note: There's a devres.h split in progress[1]. In case you'll be touching
these in future versions, please let me know. We'll do it in a cleaner way that
doesn't conflict.
[1] https://lore.kernel.org/r/20250212062513.2254767-1-raag.jadav@intel.com/
Raag
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 01/13] drm/xe: Add callback support for driver remove
2025-02-12 20:11 ` Lucas De Marchi
@ 2025-02-14 20:55 ` Lucas De Marchi
0 siblings, 0 replies; 23+ messages in thread
From: Lucas De Marchi @ 2025-02-14 20:55 UTC (permalink / raw)
To: Rodrigo Vivi
Cc: intel-xe, Francois Dugast, Matthew Auld, Daniele Ceraolo Spurio,
Thomas Hellström
On Wed, Feb 12, 2025 at 02:11:42PM -0600, Lucas De Marchi wrote:
>On Wed, Feb 12, 2025 at 03:01:12PM -0500, Rodrigo Vivi wrote:
>>On Wed, Feb 12, 2025 at 11:35:48AM -0800, Lucas De Marchi wrote:
>>>xe device probe uses devm cleanup in most places. However there are a
>>>few cases where this is not possible: when the driver interacts with
>>>component add/del. In that case, the resource group would be cleanup
>>>while the entire device resources are in the process of cleanup. One
>>>example is the xe_gsc_proxy and display using that to interact with mei
>>>and audio.
>>>
>>>Add a callback-based remove so the exception doesn't make the probe
>>>use multiple error handling styles.
>>>
>>>v2: Change internal API to mimic the devm API. This will make it easier
>>> to migrate in future when devm can be used.
>>>
>>>Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>>>---
>>> drivers/gpu/drm/xe/xe_device.c | 79 ++++++++++++++++++++++++++++
>>> drivers/gpu/drm/xe/xe_device.h | 4 ++
>>> drivers/gpu/drm/xe/xe_device_types.h | 17 ++++++
>>> drivers/gpu/drm/xe/xe_pci.c | 4 +-
>>> 4 files changed, 103 insertions(+), 1 deletion(-)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>>index 36d7ffb3b4d90..69bde506ee87e 100644
>>>--- a/drivers/gpu/drm/xe/xe_device.c
>>>+++ b/drivers/gpu/drm/xe/xe_device.c
>>>@@ -65,6 +65,12 @@
>>>
>>> #include <generated/xe_wa_oob.h>
>>>
>>>+struct xe_device_remove_action {
>>>+ struct list_head node;
>>>+ xe_device_remove_action_t remove;
>>>+ void *data;
>>>+};
>>>+
>>> static int xe_file_open(struct drm_device *dev, struct drm_file *file)
>>> {
>>> struct xe_device *xe = to_xe_device(dev);
>>>@@ -746,6 +752,9 @@ int xe_device_probe(struct xe_device *xe)
>>> u8 last_gt;
>>> u8 id;
>>>
>>>+ xe->probing = true;
>>>+ INIT_LIST_HEAD(&xe->remove_action_list);
>>>+
>>> xe_pat_init_early(xe);
>>>
>>> err = xe_sriov_init(xe);
>>>@@ -886,6 +895,8 @@ int xe_device_probe(struct xe_device *xe)
>>>
>>> xe_vsec_init(xe);
>>>
>>>+ xe->probing = false;
>>>+
>>> return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
>>>
>>> err_fini_display:
>>>@@ -907,6 +918,72 @@ int xe_device_probe(struct xe_device *xe)
>>> return err;
>>> }
>>>
>>>+/**
>>>+ * xe_device_call_remove_actions - Call the remove actions
>>>+ * @xe: xe device instance
>>>+ *
>>>+ * This is only to be used by xe_pci and xe_device to call the remove actions
>>>+ * while removing the driver or handling probe failures.
>>>+ */
>>>+void xe_device_call_remove_actions(struct xe_device *xe)
>>>+{
>>>+ struct xe_device_remove_action *ra, *tmp;
>>>+
>>>+ list_for_each_entry_safe(ra, tmp, &xe->remove_action_list, node) {
>>>+ ra->remove(xe, ra->data);
>>>+ list_del(&ra->node);
>>>+ kfree(ra);
>>>+ }
>>>+
>>>+ xe->probing = false;
>>>+}
>>>+
>>>+/**
>>>+ * xe_device_add_action_or_reset - Add an action to run on driver removal
>>>+ * @xe: xe device instance
>>>+ * @ra: pointer to the object embedded into the object to cleanup
>>>+ * @remove: function to execute. The @ra is passed as argument
>>>+ *
>>>+ * Example:
>>>+ *
>>>+ * .. code-block:: c
>>>+ *
>>>+ * static void foo_remove(struct xe_device_remove_action *ra)
>>>+ * {
>>>+ * struct xe_foo *foo = container_of(ra, struct xe_foo, remove_action);
>>>+ * ...
>>>+ * }
>>>+ *
>>>+ * int xe_foo_init(struct xe_foo *foo)
>>>+ * {
>>>+ * ...
>>>+ * xe_device_add_remove_action(xe, &foo->remove_action, foo_remove);
>>>+ * ...
>>>+ * return 0;
>>>+ * };
>>
>>I still believe we should add here a note here to highlight this is the
>>exception and that devm should be preferred. But up to you, the
>>explanation in the commit message makes more sense now and the patch
>>is right. I hope we can get some devm solution to handle this component
>
>arrgh, sorry. I was too entertained with the devm solution that I forgot
>the update in this doc that you requested in previous version. My bad.
>
>The devm part is here:
>https://lore.kernel.org/dri-devel/20250212200542.515493-1-lucas.demarchi@intel.com/
>
>In the end I could simplify it much more than I thought and what is
>required on devres side is just one patch, 2 lines diff. Hopefully I
>didn't simplify it too much. Let's see. If that one is accepted through
>intel-xe tree, then we may even drop this first patch and go straight
>with that solution.
let's not rush that other series and continue improving xe - there still
some work to do on the xe side that would otherwise be blocked.
Thank you all for reviews. Series pushed to drm-xe-next.
Lucas De Marchi
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2025-02-14 20:56 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-12 19:35 [PATCH v4 00/13] Cleanup error handling on probe Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 01/13] drm/xe: Add callback support for driver remove Lucas De Marchi
2025-02-12 20:01 ` Rodrigo Vivi
2025-02-12 20:11 ` Lucas De Marchi
2025-02-14 20:55 ` Lucas De Marchi
2025-02-12 21:19 ` Raag Jadav
2025-02-12 21:28 ` Lucas De Marchi
2025-02-13 4:41 ` Raag Jadav
2025-02-12 19:35 ` [PATCH v4 02/13] drm/xe: Fix xe_display_fini() calls Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 03/13] drm/xe: Fix error handling in xe_irq_install() Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 04/13] drm/xe: Fix xe_tile_init_noalloc() error propagation Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 05/13] drm/xe: Stop ignoring errors from xe_ttm_stolen_mgr_init() Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 06/13] drm/xe: Remove leftover pxp comment Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 07/13] drm/xe: Cleanup unwind of gt initialization Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 08/13] drm/xe: Cleanup extra calls to xe_hw_fence_irq_finish() Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 09/13] drm/xe/oa: Move fini to xe_oa Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 10/13] drm/xe: Move drm_dev_unplug() out of display function Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 11/13] drm/xe/oa: Handle errors in xe_oa_register() Lucas De Marchi
2025-02-12 19:35 ` [PATCH v4 12/13] drm/xe/pmu: Fail probe if xe_pmu_register() fails Lucas De Marchi
2025-02-12 19:36 ` [PATCH v4 13/13] drm/xe/hwmon: Stop ignoring errors on probe Lucas De Marchi
2025-02-12 23:04 ` ✓ CI.Patch_applied: success for Cleanup error handling on probe (rev4) Patchwork
2025-02-12 23:05 ` ✓ CI.checkpatch: " Patchwork
2025-02-12 23:07 ` ✗ CI.KUnit: failure " Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.