Intel-XE Archive on lore.kernel.org
From: Matthew Auld <matthew.auld@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH v2 2/2] drm/xe/guc_submit: fixup deregister in job timeout
Date: Fri, 4 Aug 2023 09:48:30 +0100
Message-ID: <61193ecb-9f6a-500c-d084-cb9df4ddd4db@intel.com>
In-Reply-To: <ZMvytzGNetQxODNL@DUT025-TGLU.fm.intel.com>

On 03/08/2023 19:32, Matthew Brost wrote:
> On Thu, Aug 03, 2023 at 06:38:51PM +0100, Matthew Auld wrote:
>> Rather check if the engine is still registered before proceeding with
>> deregister steps. Also the engine being marked as disabled doesn't mean
>> the engine has been disabled or deregistered from GuC pov, and here we
>> are signalling fences so we need to be sure GuC is not still using this
>> context.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_guc_submit.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index b88bfe7d8470..e499e6540ca5 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -881,15 +881,17 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>>   	}
>>   
>>   	/* Engine state now stable, disable scheduling if needed */
>> -	if (exec_queue_enabled(q)) {
>> +	if (exec_queue_registered(q)) {
>>   		struct xe_guc *guc = exec_queue_to_guc(q);
>>   		int ret;
>>   
>>   		if (exec_queue_reset(q))
>>   			err = -EIO;
>>   		set_exec_queue_banned(q);
>> -		xe_exec_queue_get(q);
>> -		disable_scheduling_deregister(guc, q);
>> +		if (!exec_queue_destroyed(q)) {
>> +			xe_exec_queue_get(q);
>> +			disable_scheduling_deregister(guc, q);
> 
> You could include the wait under this if statement too, but either way works.

Do you mean move the pending_disable wait under the if? My worry is that
multiple timeout jobs could be queued and then run back to back. If the
first disable_scheduling_deregister() goes bad, the wait times out and a
GT reset is queued. The GT reset looks to use the same ordered wq as the
timeout jobs, so another timeout job might have been queued ahead of the
reset job (for example during the ~5 second wait). In that case the
second timeout job would see that exec_queue_destroyed is already set,
incorrectly skip the wait for the pending_disable state change, and then
start signalling fences even though the GuC might still be using the
context. Do you know if that is possible?
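
To make the ordering concrete, here is a small userspace model of the
scenario. It is only a sketch, not driver code: struct model_queue,
model_timeout_job(), model_gt_reset() and simulate() are hypothetical
names, and it assumes that the GuC never replies (so every wait times
out), that a timed-out wait returns without signalling fences, and that
the deregister step is what sets the destroyed and pending_disable
state.

```c
#include <stdbool.h>

/* Hypothetical model of the state bits discussed above; not driver code. */
struct model_queue {
	bool registered;      /* GuC still knows about the context */
	bool destroyed;       /* deregister has already been requested */
	bool pending_disable; /* schedule-disable sent, no reply yet */
};

/*
 * One timeout job. Returns true if it signalled fences while
 * pending_disable was still set, i.e. the unsafe outcome.
 * Assumption: the GuC never replies, so every wait times out, and a
 * timed-out wait bails out early without signalling fences.
 */
static bool model_timeout_job(struct model_queue *q, bool wait_under_if)
{
	if (q->registered) {
		/* With the wait outside the inner if, we always wait. */
		bool do_wait = !wait_under_if;

		if (!q->destroyed) {
			/* Stands in for disable_scheduling_deregister(). */
			q->destroyed = true;
			q->pending_disable = true;
			do_wait = true;
		}

		if (do_wait && q->pending_disable)
			return false; /* wait timed out: GT reset queued, bail */
	}

	/* Fences would be signalled here. */
	return q->pending_disable;
}

/* The GT reset is modelled as clearing all GuC-side state. */
static void model_gt_reset(struct model_queue *q)
{
	q->pending_disable = false;
	q->registered = false;
}

/*
 * Run the work items in the order an ordered wq would run them if the
 * second timeout job was queued ahead of the reset: timeout, timeout,
 * reset. Returns how many timeout jobs signalled fences unsafely.
 */
int simulate(bool wait_under_if)
{
	struct model_queue q = { .registered = true };
	int unsafe = 0;

	unsafe += model_timeout_job(&q, wait_under_if);
	unsafe += model_timeout_job(&q, wait_under_if);
	model_gt_reset(&q);

	return unsafe;
}
```

In this model, simulate(true) (wait moved under the if) has the second
timeout job skip the wait and signal fences while pending_disable is
still set, whereas simulate(false) (wait left where it is) makes the
second job wait again and bail out. Whether the real workqueue can
actually produce that ordering is exactly the open question.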

> 
> With that:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
>> +		}
>>   
>>   		/*
>>   		 * Must wait for scheduling to be disabled before signalling
>> -- 
>> 2.41.0
>>
