Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: <intel-xe@lists.freedesktop.org>
Cc: Matthew Auld <matthew.auld@intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [PATCH] Revert "drm/xe: make gt_remove use devm"
Date: Tue, 28 May 2024 11:27:32 -0700	[thread overview]
Message-ID: <7a949a77-88c2-47e5-aa19-773965181bc0@intel.com> (raw)
In-Reply-To: <20240528182354.1200424-1-daniele.ceraolospurio@intel.com>



On 5/28/2024 11:23 AM, Daniele Ceraolo Spurio wrote:
> The gt_remove function was explictly added as part of the remove flow
> instead of using drmm/devm automatic cleanup due to it being illegal
> to remove a component after the driver has been detached from the pci
> device; the GSC proxy component is removed as part of gt_remove, so we
> need to do it in the pci cleanup flow. The function already has a
> comment above it to explain this.
>
> Note that the change to use the devm also caused an invalid pointer
> deref in the gsc_proxy unbind function, but I didn't bother to debug
> which pointer was bad since we shouldn't be calling the unbind that
> late anyway and this revert fixes it.

Here is the bad pointer deref log in case anyone wants to have a better 
look:

https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-134099v1/shard-lnl-8/igt@xe_module_load@reload.html

Daniele

>
> Both issue were not seen in CI because the GSC loading is temporarily
> disabled due to a critical bug, which means we're not binding the
> component.
>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_device.c | 22 ++++++++++++++++++++--
>   drivers/gpu/drm/xe/xe_gt.c     | 16 +++++++++-------
>   drivers/gpu/drm/xe/xe_gt.h     |  1 +
>   3 files changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 85c1b0c406a6..4c44f23e58ea 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -549,6 +549,7 @@ int xe_device_probe(struct xe_device *xe)
>   	struct xe_tile *tile;
>   	struct xe_gt *gt;
>   	int err;
> +	u8 last_gt;
>   	u8 id;
>   
>   	xe_pat_init_early(xe);
> @@ -647,16 +648,18 @@ int xe_device_probe(struct xe_device *xe)
>   		goto err_irq_shutdown;
>   
>   	for_each_gt(gt, xe, id) {
> +		last_gt = id;
> +
>   		err = xe_gt_init(gt);
>   		if (err)
> -			goto err_irq_shutdown;
> +			goto err_fini_gt;
>   	}
>   
>   	xe_heci_gsc_init(xe);
>   
>   	err = xe_display_init(xe);
>   	if (err)
> -		goto err_irq_shutdown;
> +		goto err_fini_gt;
>   
>   	err = drm_dev_register(&xe->drm, 0);
>   	if (err)
> @@ -672,6 +675,15 @@ int xe_device_probe(struct xe_device *xe)
>   
>   err_fini_display:
>   	xe_display_driver_remove(xe);
> +
> +err_fini_gt:
> +	for_each_gt(gt, xe, id) {
> +		if (id < last_gt)
> +			xe_gt_remove(gt);
> +		else
> +			break;
> +	}
> +
>   err_irq_shutdown:
>   	xe_irq_shutdown(xe);
>   err:
> @@ -689,12 +701,18 @@ static void xe_device_remove_display(struct xe_device *xe)
>   
>   void xe_device_remove(struct xe_device *xe)
>   {
> +	struct xe_gt *gt;
> +	u8 id;
> +
>   	xe_device_remove_display(xe);
>   
>   	xe_display_fini(xe);
>   
>   	xe_heci_gsc_fini(xe);
>   
> +	for_each_gt(gt, xe, id)
> +		xe_gt_remove(gt);
> +
>   	xe_irq_shutdown(xe);
>   }
>   
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 6f4b59a6e710..98c2228b51d0 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -93,14 +93,16 @@ void xe_gt_sanitize(struct xe_gt *gt)
>   	gt->uc.guc.submission_state.enabled = false;
>   }
>   
> -/*
> - * Clean up the GT structures before driver removal. This function should only
> - * act on objects/structures that must be cleaned before the driver removal
> - * callback is complete and therefore can't be deferred to a drmm action.
> +/**
> + * xe_gt_remove() - Clean up the GT structures before driver removal
> + * @gt: the GT object
> + *
> + * This function should only act on objects/structures that must be cleaned
> + * before the driver removal callback is complete and therefore can't be
> + * deferred to a drmm action.
>    */
> -static void gt_remove(void *arg)
> +void xe_gt_remove(struct xe_gt *gt)
>   {
> -	struct xe_gt *gt = arg;
>   	int i;
>   
>   	xe_uc_remove(&gt->uc);
> @@ -566,7 +568,7 @@ int xe_gt_init(struct xe_gt *gt)
>   
>   	xe_gt_record_user_engines(gt);
>   
> -	return devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, gt_remove, gt);
> +	return 0;
>   }
>   
>   void xe_gt_record_user_engines(struct xe_gt *gt)
> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> index d0747edfe020..9073ac68a777 100644
> --- a/drivers/gpu/drm/xe/xe_gt.h
> +++ b/drivers/gpu/drm/xe/xe_gt.h
> @@ -56,6 +56,7 @@ int xe_gt_suspend(struct xe_gt *gt);
>   int xe_gt_resume(struct xe_gt *gt);
>   void xe_gt_reset_async(struct xe_gt *gt);
>   void xe_gt_sanitize(struct xe_gt *gt);
> +void xe_gt_remove(struct xe_gt *gt);
>   
>   /**
>    * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the


  reply	other threads:[~2024-05-28 18:31 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-28 18:23 [PATCH] Revert "drm/xe: make gt_remove use devm" Daniele Ceraolo Spurio
2024-05-28 18:27 ` Daniele Ceraolo Spurio [this message]
2024-05-29  8:26   ` Matthew Auld
2024-05-29 16:38     ` Daniele Ceraolo Spurio
2024-05-28 18:29 ` ✓ CI.Patch_applied: success for " Patchwork
2024-05-28 18:30 ` ✗ CI.checkpatch: warning " Patchwork
2024-05-28 18:31 ` ✗ CI.KUnit: failure " Patchwork
2024-05-29 16:07 ` ✓ CI.Patch_applied: success for Revert "drm/xe: make gt_remove use devm" (rev2) Patchwork
2024-05-29 16:07 ` ✗ CI.checkpatch: warning " Patchwork
2024-05-29 16:08 ` ✓ CI.KUnit: success " Patchwork
2024-05-29 16:20 ` ✓ CI.Build: " Patchwork
2024-05-29 16:21 ` ✗ CI.Hooks: failure " Patchwork
2024-05-29 16:22 ` ✓ CI.checksparse: success " Patchwork
2024-05-29 16:47 ` ✓ CI.BAT: " Patchwork
2024-05-29 18:17 ` ✗ CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7a949a77-88c2-47e5-aa19-773965181bc0@intel.com \
    --to=daniele.ceraolospurio@intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox