Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Matthew Auld <matthew.auld@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [PATCH] Revert "drm/xe: make gt_remove use devm"
Date: Wed, 29 May 2024 09:38:24 -0700	[thread overview]
Message-ID: <eec3dfb5-8eb6-4de4-9562-cd2e593a53aa@intel.com> (raw)
In-Reply-To: <001d2e4b-e9d2-422f-9f4d-dfe1c7b8bbaa@intel.com>



On 5/29/2024 1:26 AM, Matthew Auld wrote:
> Hi,
>
> On 28/05/2024 19:27, Daniele Ceraolo Spurio wrote:
>>
>>
>> On 5/28/2024 11:23 AM, Daniele Ceraolo Spurio wrote:
>>> The gt_remove function was explictly added as part of the remove flow
>>> instead of using drmm/devm automatic cleanup due to it being illegal
>>> to remove a component after the driver has been detached from the pci
>>> device; the GSC proxy component is removed as part of gt_remove, so we
>>> need to do it in the pci cleanup flow. The function already has a
>>> comment above it to explain this.
>>>
>>> Note that the change to use the devm also caused an invalid pointer
>>> deref in the gsc_proxy unbind function, but I didn't bother to debug
>>> which pointer was bad since we shouldn't be calling the unbind that
>>> late anyway and this revert fixes it.
>>
>> Here is the bad pointer deref log in case anyone wants to have a 
>> better look:
>>
>> https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-134099v1/shard-lnl-8/igt@xe_module_load@reload.html 
>>
>
> I also ran into this. If we remove the explicit pci_set_drvdata(pdev, 
> NULL) in our pci remove callback, and rather let the core driver do it 
> for us, it seems to fix the above. But then we trigger:
>
> <4> [69.885925] WARNING: CPU: 2 PID: 1441 at drivers/base/devres.c:690 
> devres_release_group+0x103/0x120
>
> Which I am not too sure about.

This error is why I was saying that we can't delete the component after 
the pci_remove has completed. This is my understanding of what's 
happening (which might be wrong):
the component resources are auto-cleared when the driver is detached 
from the PCI device. Since the component subsystem has no guarantee that 
the driver is still present after that point, they can't call the unbind 
function while doing their cleanup. If we call the component_del after 
that point, the resources are not there anymore, which result in the 
above warning.

>
> Anyway, if it is indeed a no-go to use devm here,
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>

Thanks!
Daniele

>
>>
>> Daniele
>>
>>>
>>> Both issue were not seen in CI because the GSC loading is temporarily
>>> disabled due to a critical bug, which means we're not binding the
>>> component.
>>>
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_device.c | 22 ++++++++++++++++++++--
>>>   drivers/gpu/drm/xe/xe_gt.c     | 16 +++++++++-------
>>>   drivers/gpu/drm/xe/xe_gt.h     |  1 +
>>>   3 files changed, 30 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c 
>>> b/drivers/gpu/drm/xe/xe_device.c
>>> index 85c1b0c406a6..4c44f23e58ea 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -549,6 +549,7 @@ int xe_device_probe(struct xe_device *xe)
>>>       struct xe_tile *tile;
>>>       struct xe_gt *gt;
>>>       int err;
>>> +    u8 last_gt;
>>>       u8 id;
>>>       xe_pat_init_early(xe);
>>> @@ -647,16 +648,18 @@ int xe_device_probe(struct xe_device *xe)
>>>           goto err_irq_shutdown;
>>>       for_each_gt(gt, xe, id) {
>>> +        last_gt = id;
>>> +
>>>           err = xe_gt_init(gt);
>>>           if (err)
>>> -            goto err_irq_shutdown;
>>> +            goto err_fini_gt;
>>>       }
>>>       xe_heci_gsc_init(xe);
>>>       err = xe_display_init(xe);
>>>       if (err)
>>> -        goto err_irq_shutdown;
>>> +        goto err_fini_gt;
>>>       err = drm_dev_register(&xe->drm, 0);
>>>       if (err)
>>> @@ -672,6 +675,15 @@ int xe_device_probe(struct xe_device *xe)
>>>   err_fini_display:
>>>       xe_display_driver_remove(xe);
>>> +
>>> +err_fini_gt:
>>> +    for_each_gt(gt, xe, id) {
>>> +        if (id < last_gt)
>>> +            xe_gt_remove(gt);
>>> +        else
>>> +            break;
>>> +    }
>>> +
>>>   err_irq_shutdown:
>>>       xe_irq_shutdown(xe);
>>>   err:
>>> @@ -689,12 +701,18 @@ static void xe_device_remove_display(struct 
>>> xe_device *xe)
>>>   void xe_device_remove(struct xe_device *xe)
>>>   {
>>> +    struct xe_gt *gt;
>>> +    u8 id;
>>> +
>>>       xe_device_remove_display(xe);
>>>       xe_display_fini(xe);
>>>       xe_heci_gsc_fini(xe);
>>> +    for_each_gt(gt, xe, id)
>>> +        xe_gt_remove(gt);
>>> +
>>>       xe_irq_shutdown(xe);
>>>   }
>>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>> index 6f4b59a6e710..98c2228b51d0 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt.c
>>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>>> @@ -93,14 +93,16 @@ void xe_gt_sanitize(struct xe_gt *gt)
>>>       gt->uc.guc.submission_state.enabled = false;
>>>   }
>>> -/*
>>> - * Clean up the GT structures before driver removal. This function 
>>> should only
>>> - * act on objects/structures that must be cleaned before the driver 
>>> removal
>>> - * callback is complete and therefore can't be deferred to a drmm 
>>> action.
>>> +/**
>>> + * xe_gt_remove() - Clean up the GT structures before driver removal
>>> + * @gt: the GT object
>>> + *
>>> + * This function should only act on objects/structures that must be 
>>> cleaned
>>> + * before the driver removal callback is complete and therefore 
>>> can't be
>>> + * deferred to a drmm action.
>>>    */
>>> -static void gt_remove(void *arg)
>>> +void xe_gt_remove(struct xe_gt *gt)
>>>   {
>>> -    struct xe_gt *gt = arg;
>>>       int i;
>>>       xe_uc_remove(&gt->uc);
>>> @@ -566,7 +568,7 @@ int xe_gt_init(struct xe_gt *gt)
>>>       xe_gt_record_user_engines(gt);
>>> -    return devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, 
>>> gt_remove, gt);
>>> +    return 0;
>>>   }
>>>   void xe_gt_record_user_engines(struct xe_gt *gt)
>>> diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
>>> index d0747edfe020..9073ac68a777 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt.h
>>> @@ -56,6 +56,7 @@ int xe_gt_suspend(struct xe_gt *gt);
>>>   int xe_gt_resume(struct xe_gt *gt);
>>>   void xe_gt_reset_async(struct xe_gt *gt);
>>>   void xe_gt_sanitize(struct xe_gt *gt);
>>> +void xe_gt_remove(struct xe_gt *gt);
>>>   /**
>>>    * xe_gt_any_hw_engine_by_reset_domain - scan the list of engines 
>>> and return the
>>


  reply	other threads:[~2024-05-29 16:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-28 18:23 [PATCH] Revert "drm/xe: make gt_remove use devm" Daniele Ceraolo Spurio
2024-05-28 18:27 ` Daniele Ceraolo Spurio
2024-05-29  8:26   ` Matthew Auld
2024-05-29 16:38     ` Daniele Ceraolo Spurio [this message]
2024-05-28 18:29 ` ✓ CI.Patch_applied: success for " Patchwork
2024-05-28 18:30 ` ✗ CI.checkpatch: warning " Patchwork
2024-05-28 18:31 ` ✗ CI.KUnit: failure " Patchwork
2024-05-29 16:07 ` ✓ CI.Patch_applied: success for Revert "drm/xe: make gt_remove use devm" (rev2) Patchwork
2024-05-29 16:07 ` ✗ CI.checkpatch: warning " Patchwork
2024-05-29 16:08 ` ✓ CI.KUnit: success " Patchwork
2024-05-29 16:20 ` ✓ CI.Build: " Patchwork
2024-05-29 16:21 ` ✗ CI.Hooks: failure " Patchwork
2024-05-29 16:22 ` ✓ CI.checksparse: success " Patchwork
2024-05-29 16:47 ` ✓ CI.BAT: " Patchwork
2024-05-29 18:17 ` ✗ CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eec3dfb5-8eb6-4de4-9562-cd2e593a53aa@intel.com \
    --to=daniele.ceraolospurio@intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox