From: "Nilawar, Badal" <badal.nilawar@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<john.c.harrison@intel.com>, <rodrigo.vivi@intel.com>,
<matthew.auld@intel.com>
Subject: Re: [PATCH 1/3] drm/xe/guc/ct: Improve g2h request handling during async gt reset
Date: Mon, 14 Oct 2024 17:40:16 +0530
Message-ID: <9eae52ab-fe51-487b-9db3-6c05c4a58d20@intel.com>
In-Reply-To: <Zwhc5V5aQDSbhBWN@DUT025-TGLU.fm.intel.com>
Hi Matt,
Thanks for the review comments.
On 11-10-2024 04:31, Matthew Brost wrote:
> On Wed, Oct 09, 2024 at 04:26:43PM +0530, Badal Nilawar wrote:
>> It is possible that a g2h request may be cancelled while waiting for a
>> response due to an asynchronous gt reset. This commit ensures that in
>> such cases, caller will be notified by returning -ECANCELED.
>>
>> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
>> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: John Harrison <John.C.Harrison@Intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_guc_ct.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index c7673f56d413..b93b2821e4e8 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -512,6 +512,9 @@ void xe_guc_ct_stop(struct xe_guc_ct *ct)
>> {
>> xe_guc_ct_set_state(ct, XE_GUC_CT_STATE_STOPPED);
>> stop_g2h_handler(ct);
>> +
>> + /* Notify callers that CT stopped and G2H requests are cancelled */
>> + wake_up_all(&ct->g2h_fence_wq);
>> }
>>
>> static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len)
>> @@ -1018,6 +1021,19 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>>
>> ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
>
> Better would be to abort the wait here if a GT reset is queued or in
> progress. We do this a lot in xe_guc_submit.c - see any of the
> wait_event functions in that file. We likely should normalize this a bit
> with proper layering but basically the flow should be:
>
> - Any wait_event_* are OR'd with a queued or in progress GT reset
In xe_guc_submit.c, we check whether a reset is queued or in progress by
checking whether GuC submission is stopped, via xe_guc_read_stopped(). Are
you suggesting that we use xe_guc_read_stopped() instead of checking
ct->state?
Or should we do it like this?
ret = wait_event_timeout(ct->g2h_fence_wq,
			 g2h_fence.done || ct->state == XE_GUC_CT_STATE_STOPPED,
			 HZ);
>
> - After wait_event_* signals check for OR condition, handle gracefully
> via an error code kicking it to upper layers
Agree.
>
> - All upper layers need to cope with H2G failing or use *_no_fail
> versions of the H2G functions. The *_no_fail versions are untested as I
> coded those 2.5 years ago in Xe and don't have a user of those functions
Ok.
>
> - Queuing a GT reset wakes up all waiters
How should we do this? After a GT reset is queued, or while one is in
progress, CT communication will still be in use. In particular, during GT
start we call guc_pc_start, which uses xe_guc_send_recv for the SLPC check.
>
> - Upon completion of GT reset the OR condition is cleared
Ok. The condition will be cleared once CT is enabled.
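To check my understanding of the overall flow, here is a rough user-space
sketch. It is only an analogy, not xe code: pthreads stand in for the kernel
waitqueue, and fake_ct, fake_fence, fake_ct_init, fake_send_recv and
fake_ct_stop are made-up names. It assumes the "stopped" flag plays the role
of the queued/in-progress reset condition, and it uses ETIMEDOUT purely for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for struct xe_guc_ct (illustration only). */
struct fake_ct {
	pthread_mutex_t lock;
	pthread_cond_t wq;	/* plays the role of g2h_fence_wq */
	bool stopped;		/* plays the role of "reset queued/in progress" */
};

/* Hypothetical stand-in for the on-stack g2h fence. */
struct fake_fence {
	bool done;
};

static void fake_ct_init(struct fake_ct *ct)
{
	pthread_mutex_init(&ct->lock, NULL);
	pthread_cond_init(&ct->wq, NULL);
	ct->stopped = false;
}

/* Wait for the response, a stop, or the timeout, whichever comes first. */
static int fake_send_recv(struct fake_ct *ct, struct fake_fence *fence,
			  int timeout_sec)
{
	struct timespec deadline;
	int ret = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&ct->lock);
	/* The wait condition is OR'd with the stop flag. */
	while (!fence->done && !ct->stopped && ret == 0)
		ret = pthread_cond_timedwait(&ct->wq, &ct->lock, &deadline);

	/* After waking, check which condition fired and map it to a code. */
	if (fence->done)
		ret = 0;
	else if (ct->stopped)
		ret = -ECANCELED;
	else
		ret = -ETIMEDOUT;
	pthread_mutex_unlock(&ct->lock);

	return ret;
}

/* Stop path: set the flag first, then wake every waiter. */
static void fake_ct_stop(struct fake_ct *ct)
{
	pthread_mutex_lock(&ct->lock);
	ct->stopped = true;
	pthread_cond_broadcast(&ct->wq);
	pthread_mutex_unlock(&ct->lock);
}

static void *reset_thread(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

	nanosleep(&delay, NULL);	/* give the waiter time to block */
	fake_ct_stop(arg);
	return NULL;
}

/* Demo: an "async reset" cancels a wait that would otherwise time out. */
static int demo_cancelled_wait(void)
{
	struct fake_ct ct;
	struct fake_fence fence = { .done = false };
	pthread_t tid;
	int ret;

	fake_ct_init(&ct);
	ret = pthread_create(&tid, NULL, reset_thread, &ct);
	assert(ret == 0);
	ret = fake_send_recv(&ct, &fence, 5);
	pthread_join(tid, NULL);
	return ret;
}
```

Built with -pthread, demo_cancelled_wait() returns -ECANCELED as soon as the
helper thread stops the fake CT, instead of sitting out the 5 second timeout.
Because the flag is set and checked under the same lock, the waiter sees it
even if the stop happens before the waiter starts sleeping.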
Regards,
Badal
>
> Matt
>
>>
>> + /*
>> + * It is possible that the g2h request may be cancelled while waiting for a response due
>> + * to an asynchronous gt reset. In such cases, return -ECANCELED.
>> + */
>> + mutex_lock(&ct->lock);
>> + if (ct->state == XE_GUC_CT_STATE_STOPPED) {
>> + xe_gt_dbg(gt, "H2G action %#x canceled as GT reset is in progress\n",
>> + action[0]);
>> + mutex_unlock(&ct->lock);
>> + return -ECANCELED;
>> + }
>> + mutex_unlock(&ct->lock);
>> +
>> /*
>> * Ensure we serialize with completion side to prevent UAF with fence going out of scope on
>> * the stack, since we have no clue if it will fire after the timeout before we can erase
>> --
>> 2.34.1
>>