From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: intel-xe@lists.freedesktop.org,
"Francois Dugast" <francois.dugast@intel.com>,
"Matthew Auld" <matthew.auld@intel.com>,
"Daniele Ceraolo Spurio" <daniele.ceraolospurio@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH v3 06/13] drm/xe: Add callback support for driver remove
Date: Mon, 10 Feb 2025 12:41:46 -0500
Message-ID: <Z6o6WpGqz_B0bj3W@intel.com>
In-Reply-To: <20250207221945.2878241-7-lucas.demarchi@intel.com>
On Fri, Feb 07, 2025 at 02:19:38PM -0800, Lucas De Marchi wrote:
> xe device probe uses devm cleanup in most places. However there are a
> few that are not possible: when the driver interacts with other

Should this read "a few cases where this is not possible"?

> subsystems that require the cleanup to happen before the device being
> removed from the bus. One example is the component_* APIs used by
> xe_gsc_proxy and display.
>
> Add a callback-based remove so the exception don't make the probe
> use multiple error handling styles.
>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 59 ++++++++++++++++++++
> drivers/gpu/drm/xe/xe_device.h | 4 ++
> drivers/gpu/drm/xe/xe_device_remove_action.h | 24 ++++++++
> drivers/gpu/drm/xe/xe_device_types.h | 15 +++++
> drivers/gpu/drm/xe/xe_pci.c | 4 +-
> 5 files changed, 105 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/xe/xe_device_remove_action.h
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 90275531653fe..5fc4e696262f9 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -747,6 +747,9 @@ int xe_device_probe(struct xe_device *xe)
> u8 last_gt;
> u8 id;
>
> + xe->probing = true;
> + INIT_LIST_HEAD(&xe->remove_action_list);
> +
> xe_pat_init_early(xe);
>
> err = xe_sriov_init(xe);
> @@ -892,6 +895,8 @@ int xe_device_probe(struct xe_device *xe)
>
> xe_vsec_init(xe);
>
> + xe->probing = false;
> +
> return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);
>
> err_fini_display:
> @@ -911,6 +916,58 @@ int xe_device_probe(struct xe_device *xe)
> return err;
> }
>
> +/**
> + * xe_device_call_remove_actions - Call the remove actions
> + * @xe: xe device instance
> + *
> + * This is only to be used by xe_pci and xe_device to call the remove actions
> + * while removing the driver or handling probe failures.
> + */
> +void xe_device_call_remove_actions(struct xe_device *xe)
> +{
> + struct xe_device_remove_action *ra;
> +
> + list_for_each_entry(ra, &xe->remove_action_list, node)
> + ra->remove(ra);
> +
> + xe->probing = false;
> +}
> +
> +/**
> + * xe_device_add_remove_action - Add an action to run on driver removal
> + * @xe: xe device instance
> + * @ra: pointer to the object embedded into the object to cleanup
> + * @remove: function to execute. The @ra is passed as argument
> + *
> + * Example:
> + *
> + * .. code-block:: c
> + *
> + * static void foo_remove(struct xe_device_remove_action *ra)
> + * {
> + * struct xe_foo *foo = container_of(ra, struct xe_foo, remove_action);
> + * ...
> + * }
> + *
> + * int xe_foo_init(struct xe_foo *foo)
> + * {
> + * ...
> + * xe_device_add_remove_action(xe, &foo->remove_action, foo_remove);
> + * ...
> + * return 0;
> + * };

Although the cover letter mentions that this should be the exception, the
documentation here doesn't make that clear.

I believe we should be clearer about which cases this structure is aimed at,
and give some basic rules on when to use it instead of devm or drmm.
And we should probably also keep that comment where it is used with the GSC
code.

Other than that, the code and the approach look good to me.

> + */
> +void xe_device_add_remove_action(struct xe_device *xe,
> + struct xe_device_remove_action *ra,
> + void (*remove)(struct xe_device_remove_action *ra))
> +{
> + drm_WARN_ON(&xe->drm, !xe->probing);
> +
> + INIT_LIST_HEAD(&ra->node);
> + ra->remove = remove;
> + list_add(&ra->node, &xe->remove_action_list);
> +}
> +
> static void xe_device_remove_display(struct xe_device *xe)
> {
> xe_display_unregister(xe);
> @@ -934,6 +991,8 @@ void xe_device_remove(struct xe_device *xe)
>
> for_each_gt(gt, xe, id)
> xe_gt_remove(gt);
> +
> + xe_device_call_remove_actions(xe);
> }
>
> void xe_device_shutdown(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index fc3c2af3fb7fd..3fecf865957b0 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -45,6 +45,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
> const struct pci_device_id *ent);
> int xe_device_probe_early(struct xe_device *xe);
> int xe_device_probe(struct xe_device *xe);
> +void xe_device_add_remove_action(struct xe_device *xe,
> + struct xe_device_remove_action *ra,
> + void (*remove)(struct xe_device_remove_action *ra));
> +void xe_device_call_remove_actions(struct xe_device *xe);
> void xe_device_remove(struct xe_device *xe);
> void xe_device_shutdown(struct xe_device *xe);
>
> diff --git a/drivers/gpu/drm/xe/xe_device_remove_action.h b/drivers/gpu/drm/xe/xe_device_remove_action.h
> new file mode 100644
> index 0000000000000..e0322c4660dda
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_device_remove_action.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2025 Intel Corporation
> + */
> +
> +#ifndef _XE_DEVICE_REMOVE_ACTION_H_
> +#define _XE_DEVICE_REMOVE_ACTION_H_
> +
> +#include <linux/list.h>
> +
> +/**
> + * struct xe_device_remove_action - Action item to run on driver removal
> + *
> + * This should be used like a list_head, embeding it into structures of the
> + * individual parts being initialized. Once the remove action is ready to be
> + * added, call xe_device_add_remove_action() to initialize and use this struct.
> + */
> +struct xe_device_remove_action {
> + /* private: */
> + struct list_head node;
> + void (*remove)(struct xe_device_remove_action *ra);
> +};
> +
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index c0e886bac1831..4c902e0cb4ba9 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -13,6 +13,7 @@
> #include <drm/ttm/ttm_device.h>
>
> #include "xe_devcoredump_types.h"
> +#include "xe_device_remove_action.h"
> #include "xe_heci_gsc.h"
> #include "xe_lmtt_types.h"
> #include "xe_memirq_types.h"
> @@ -428,6 +429,20 @@ struct xe_device {
> /** @tiles: device tiles */
> struct xe_tile tiles[XE_MAX_TILES_PER_DEVICE];
>
> + /**
> + * @remove_action_list: list of actions to execute on device remove.
> + * Use xe_device_add_remove_action() for that. Actions can only be added
> + * during probe and are executed during the call from PCI subsystem to
> + * remove the driver from the device.
> + */
> + struct list_head remove_action_list;
> +
> + /**
> + * @probing: cover the section in which @remove_action_list can be used
> + * to post cleaning actions
> + */
> + bool probing;
> +
> /**
> * @mem_access: keep track of memory access in the device, possibly
> * triggering additional actions when they occur.
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 6a8e82aff3853..70b697fde5b96 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -905,8 +905,10 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> return err;
>
> err = xe_device_probe(xe);
> - if (err)
> + if (err) {
> + xe_device_call_remove_actions(xe);
> return err;
> + }
>
> err = xe_pm_init(xe);
> if (err)
> --
> 2.48.1
>
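
For anyone following along, the ordering of the remove actions can be checked
outside the kernel. What follows is only a minimal userspace sketch under
stated assumptions, not the driver code: struct fake_device, struct foo,
list_add_head() and the removal_order[] log are hypothetical stand-ins, and
assert() takes the place of drm_WARN_ON(). It mirrors the add/call logic quoted
above: entries are inserted at the head of the list and walked forward, so the
callbacks run in reverse order of registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's list_head and container_of(). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert right after the head, which is what the kernel's list_add() does. */
static void list_add_head(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Same shape as struct xe_device_remove_action in the patch. */
struct remove_action {
	struct list_head node;
	void (*remove)(struct remove_action *ra);
};

/* Hypothetical device holding only the two fields the patch adds. */
struct fake_device {
	struct list_head remove_action_list;
	bool probing;
};

static void fake_device_start_probe(struct fake_device *dev)
{
	dev->probing = true;
	init_list_head(&dev->remove_action_list);
}

static void fake_device_add_remove_action(struct fake_device *dev,
					  struct remove_action *ra,
					  void (*remove)(struct remove_action *ra))
{
	assert(dev->probing);	/* stand-in for drm_WARN_ON(!xe->probing) */

	init_list_head(&ra->node);
	ra->remove = remove;
	list_add_head(&ra->node, &dev->remove_action_list);
}

static void fake_device_call_remove_actions(struct fake_device *dev)
{
	struct list_head *pos;

	/* Forward walk over a head-inserted list: last added runs first. */
	for (pos = dev->remove_action_list.next;
	     pos != &dev->remove_action_list; pos = pos->next) {
		struct remove_action *ra =
			container_of(pos, struct remove_action, node);
		ra->remove(ra);
	}

	dev->probing = false;
}

/* Hypothetical component embedding the action, as in the kernel-doc example. */
struct foo {
	int id;
	struct remove_action remove_action;
};

/* Log of the ids whose remove callback ran, in call order. */
static int removal_order[4];
static int removal_count;

static void foo_remove(struct remove_action *ra)
{
	struct foo *foo = container_of(ra, struct foo, remove_action);

	removal_order[removal_count++] = foo->id;
}
```

Registering two struct foo instances in sequence and then calling
fake_device_call_remove_actions() runs foo_remove() for the second one first.
That is the unwind order one would expect from devm-style cleanup.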