Intel-XE Archive on lore.kernel.org
From: "Dong, Zhanjun" <zhanjun.dong@intel.com>
To: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v3] drm/xe/uc: Add stop on hardware initialization error
Date: Tue, 4 Nov 2025 11:33:19 -0500
Message-ID: <84fa5b89-61e7-4aec-ab17-5057f9c52d74@intel.com>
In-Reply-To: <55e77810-bf9a-4914-9eec-8984d29684da@intel.com>



On 2025-10-28 6:36 p.m., Dong, Zhanjun wrote:
> 
> 
> On 2025-10-28 3:57 p.m., Matthew Brost wrote:
>> On Tue, Oct 28, 2025 at 11:38:20AM -0400, Zhanjun Dong wrote:
>>> On hardware init failure, the hardware might no longer respond. Add a GuC
>>> stop to clean up exec_queue items.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5466
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5530
>>> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
>>> ---
>>> v3: Switch to xe_guc_stop
>>> v2: Switch to xe_guc_ct_stop
>>> ---
>>>   drivers/gpu/drm/xe/xe_uc.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
>>> index 465bda355443..00ca5883e006 100644
>>> --- a/drivers/gpu/drm/xe/xe_uc.c
>>> +++ b/drivers/gpu/drm/xe/xe_uc.c
>>> @@ -173,6 +173,7 @@ static int vf_uc_load_hw(struct xe_uc *uc)
>>>       return 0;
>>>   err_out:
>>> +    xe_guc_stop(&uc->guc);
>>
>> If exec queues are destroyed later—after the submission backend has been
>> stopped—the final put on the queue may be lost, leading to dangling
>> memory when aborting the driver load or unloading it.
>>
>> I think you'll need to call xe_guc_submit_pause_abort somewhere to
>> ensure the final put cleanup messages are processed by the queues. Maybe
>> we add this call in guc_submit_fini before wait_event_timeout?
>>
>> Matt
> Thanks for the review.
> My original thought was that the cleanup would happen through
> xe_guc_stop/xe_guc_submit_stop/guc_exec_queue_stop, but that might not
> cover all conditions. Let me try.
I tested calling xe_guc_submit_pause_abort in guc_submit_fini before
wait_event_timeout. It works in some conditions, but one case might not be
covered: LR queues are not cleaned up. So I'm thinking of:

@@ -2375,7 +2382,9 @@ void xe_guc_submit_pause_abort(struct xe_guc *guc)
                         continue;

                 xe_sched_submission_start(sched);
-               if (exec_queue_killed_or_banned_or_wedged(q))
+               if (exec_queue_killed_or_banned_or_wedged(q) ||
+                   exec_queue_registered(q))
                         xe_guc_exec_queue_trigger_cleanup(q);
         }
         mutex_unlock(&guc->submission_state.lock);

@Matthew Brost <matthew.brost@intel.com>, do you think this change has
side effects on the migration worker? If so, I can make it a separate function.
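
A toy model of the condition change, outside the kernel tree, may make the effect easier to discuss. This is only a sketch: toy_queue, old_needs_cleanup, new_needs_cleanup and toy_pause_abort are hypothetical stand-ins, not the real xe exec_queue state or API, and it assumes an LR queue is still registered but not killed, banned or wedged when the abort path runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these flags are hypothetical stand-ins for the real
 * exec_queue state bits, not the xe driver's data structures. */
struct toy_queue {
	bool killed_or_banned_or_wedged;
	bool registered;
	bool cleanup_triggered;
};

/* Condition before the change: only killed/banned/wedged queues. */
static bool old_needs_cleanup(const struct toy_queue *q)
{
	return q->killed_or_banned_or_wedged;
}

/* Condition after the change: also queues that are still registered. */
static bool new_needs_cleanup(const struct toy_queue *q)
{
	return q->killed_or_banned_or_wedged || q->registered;
}

/* Walk all queues, mark the ones the predicate selects, and return
 * how many were selected for cleanup. */
static size_t toy_pause_abort(struct toy_queue *qs, size_t n,
			      bool (*needs_cleanup)(const struct toy_queue *))
{
	size_t triggered = 0;

	for (size_t i = 0; i < n; i++) {
		qs[i].cleanup_triggered = needs_cleanup(&qs[i]);
		if (qs[i].cleanup_triggered)
			triggered++;
	}
	return triggered;
}
```

With one killed queue, one registered-only queue and one idle queue, the old predicate selects only the first, while the new one also selects the second. Whether selecting every registered queue is safe for the migration worker is the open question, and the sketch does not answer it.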

Regards,
Zhanjun Dong

> 
> Regards,
> Zhanjun Dong
> 
>>
>>>       xe_guc_sanitize(&uc->guc);
>>>       return err;
>>>   }
>>> @@ -228,6 +229,7 @@ int xe_uc_load_hw(struct xe_uc *uc)
>>>       return 0;
>>>   err_out:
>>> +    xe_guc_stop(&uc->guc);
>>>       xe_guc_sanitize(&uc->guc);
>>>       return ret;
>>>   }
>>> -- 
>>> 2.34.1
>>>
> 


Thread overview: 9+ messages
2025-10-28 15:38 [PATCH v3] drm/xe/uc: Add stop on hardware initialization error Zhanjun Dong
2025-10-28 17:29 ` ✓ CI.KUnit: success for drm/xe/uc: Add stop on hardware initialization error (rev2) Patchwork
2025-10-28 18:23 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-28 19:57 ` [PATCH v3] drm/xe/uc: Add stop on hardware initialization error Matthew Brost
2025-10-28 22:36   ` Dong, Zhanjun
2025-11-04 16:33     ` Dong, Zhanjun [this message]
2025-11-19  3:17       ` Matthew Brost
2025-11-20 17:05         ` Dong, Zhanjun
2025-10-29  3:43 ` ✗ Xe.CI.Full: failure for drm/xe/uc: Add stop on hardware initialization error (rev2) Patchwork
