From: "Dong, Zhanjun" <zhanjun.dong@intel.com>
To: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v3] drm/xe/uc: Add stop on hardware initialization error
Date: Tue, 4 Nov 2025 11:33:19 -0500
Message-ID: <84fa5b89-61e7-4aec-ab17-5057f9c52d74@intel.com>
In-Reply-To: <55e77810-bf9a-4914-9eec-8984d29684da@intel.com>
On 2025-10-28 6:36 p.m., Dong, Zhanjun wrote:
>
>
> On 2025-10-28 3:57 p.m., Matthew Brost wrote:
>> On Tue, Oct 28, 2025 at 11:38:20AM -0400, Zhanjun Dong wrote:
>>> On hardware init failure, the hardware might no longer respond. Add a
>>> GuC stop to clean up exec_queue items.
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5466
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5530
>>> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
>>> ---
>>> v3: Switch to xe_guc_stop
>>> v2: Switch to xe_guc_ct_stop
>>> ---
>>> drivers/gpu/drm/xe/xe_uc.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
>>> index 465bda355443..00ca5883e006 100644
>>> --- a/drivers/gpu/drm/xe/xe_uc.c
>>> +++ b/drivers/gpu/drm/xe/xe_uc.c
>>> @@ -173,6 +173,7 @@ static int vf_uc_load_hw(struct xe_uc *uc)
>>> return 0;
>>> err_out:
>>> + xe_guc_stop(&uc->guc);
>>
>> If exec queues are destroyed later—after the submission backend has been
>> stopped—the final put on the queue may be lost, leading to dangling
>> memory when aborting the driver load or unloading it.
>>
>> I think you'll need to call xe_guc_submit_pause_abort somewhere to
>> ensure the final put cleanup messages are processed by the queues. Maybe
>> we add this call in guc_submit_fini before wait_event_timeout?
>>
>> Matt
> Thanks for the review.
> My original thought was that the cleanup would be done through
> xe_guc_stop/xe_guc_submit_stop/guc_exec_queue_stop, but that might not
> cover all conditions. Let me try that.
I tested calling xe_guc_submit_pause_abort in guc_submit_fini before
wait_event_timeout. It works in some conditions, but one condition
might not be covered: LR queues are not cleaned up. So I'm thinking of:
@@ -2375,7 +2382,9 @@ void xe_guc_submit_pause_abort(struct xe_guc *guc)
 			continue;
 		xe_sched_submission_start(sched);
-		if (exec_queue_killed_or_banned_or_wedged(q))
+		if (exec_queue_killed_or_banned_or_wedged(q) ||
+		    exec_queue_registered(q))
 			xe_guc_exec_queue_trigger_cleanup(q);
 	}
 	mutex_unlock(&guc->submission_state.lock);
@Matthew Brost <matthew.brost@intel.com>, do you think this change has
side effects on the migration worker? If so, I can make it a separate
function.
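
To make the intent of the extra condition concrete, here is a minimal
standalone model. It is not driver code: the state bits, struct
model_queue, and the needs_cleanup_*() helpers are hypothetical names
made up for illustration, and it assumes an LR queue at this point is
registered but not killed, banned, or wedged.

```c
#include <stdbool.h>

/* Hypothetical state bits; the real xe driver tracks queue state differently. */
enum {
	Q_REGISTERED = 1u << 0,
	Q_KILLED     = 1u << 1,
	Q_BANNED     = 1u << 2,
	Q_WEDGED     = 1u << 3,
};

/* Hypothetical stand-in for an exec queue; only the state matters here. */
struct model_queue {
	unsigned int state;
};

static bool killed_or_banned_or_wedged(const struct model_queue *q)
{
	return (q->state & (Q_KILLED | Q_BANNED | Q_WEDGED)) != 0;
}

static bool registered(const struct model_queue *q)
{
	return (q->state & Q_REGISTERED) != 0;
}

/* Models the current check: only dead queues are cleaned up. */
static bool needs_cleanup_old(const struct model_queue *q)
{
	return killed_or_banned_or_wedged(q);
}

/* Models the proposed check: any still-registered queue is cleaned up too. */
static bool needs_cleanup_new(const struct model_queue *q)
{
	return killed_or_banned_or_wedged(q) || registered(q);
}
```

Under these assumptions, a queue with only Q_REGISTERED set is skipped
by needs_cleanup_old() and picked up by needs_cleanup_new(), which is
the LR case described above. The same widening would also apply to any
other registered queue, which is why the migration worker question
matters.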
Regards,
Zhanjun Dong
>
> Regards,
> Zhanjun Dong
>
>>
>>> xe_guc_sanitize(&uc->guc);
>>> return err;
>>> }
>>> @@ -228,6 +229,7 @@ int xe_uc_load_hw(struct xe_uc *uc)
>>> return 0;
>>> err_out:
>>> + xe_guc_stop(&uc->guc);
>>> xe_guc_sanitize(&uc->guc);
>>> return ret;
>>> }
>>> --
>>> 2.34.1
>>>
>
Thread overview: 9+ messages
2025-10-28 15:38 [PATCH v3] drm/xe/uc: Add stop on hardware initialization error Zhanjun Dong
2025-10-28 17:29 ` ✓ CI.KUnit: success for drm/xe/uc: Add stop on hardware initialization error (rev2) Patchwork
2025-10-28 18:23 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-28 19:57 ` [PATCH v3] drm/xe/uc: Add stop on hardware initialization error Matthew Brost
2025-10-28 22:36 ` Dong, Zhanjun
2025-11-04 16:33 ` Dong, Zhanjun [this message]
2025-11-19 3:17 ` Matthew Brost
2025-11-20 17:05 ` Dong, Zhanjun
2025-10-29 3:43 ` ✗ Xe.CI.Full: failure for drm/xe/uc: Add stop on hardware initialization error (rev2) Patchwork