Re: [PATCH] drm/xe: Suppress reset log for destroyed queues

public inbox for intel-xe@lists.freedesktop.org
 help / color / mirror / Atom feed

From: Matthew Brost <matthew.brost@intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe: Suppress reset log for destroyed queues
Date: Thu, 2 Apr 2026 17:45:20 -0700	[thread overview]
Message-ID: <ac8NoEXghE1XbfbE@gsse-cloud1.jf.intel.com> (raw)
In-Reply-To: <38565bd9-47e9-4727-aa00-f49962250fee@intel.com>

On Thu, Apr 02, 2026 at 03:51:36PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 4/2/2026 2:37 PM, Matthew Brost wrote:
> > On Thu, Apr 02, 2026 at 02:30:30PM -0700, Daniele Ceraolo Spurio wrote:
> > > When a queue is destroyed while still active on the HW (for example
> > > because the app owning it is exiting abruptly), the driver tells the GuC
> > > to preempt it off the HW immediately and to reset it if it doesn't
> > > preempt. This can cause a reset log to be printed to dmesg, which can
> > > be confusing to users as resets are commonly tied to errors, while in
> > > this case the reset is just done to speed up the cleanup.
> > > Given that a queue is only destroyed once all refs on it have been
> > > released (i.e., no one cares about it anymore), the log of it being
> > > reset is not useful and therefore we can simply suppress it to avoid
> > > confusion.
> > > 
> > > Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_guc_submit.c | 7 ++++---
> > >   1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 10556156eaad..e6702bd99309 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -2968,9 +2968,10 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > >   	if (unlikely(!q))
> > >   		return -EPROTO;
> > > -	xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d, state=0x%0x",
> > > -		   xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id,
> > > -		   atomic_read(&q->guc->state));
> > > +	if (!exec_queue_destroyed(q))
> > I think you want killed here—right? Destroyed is tied to the refcount,
> 
> I actually wanted destroyed here. I was thinking that a queue can be killed
> outside of the user control while the user is still using it (e.g. PXP
> queues are killed when the PXP session is invalidated), in which case it
> might be useful to keep the log.
> I have been pinged multiple times about engine reset logs on destroyed
> queues and I kind of got tired of having to explain that it was irrelevant,
> which is why I'd like to just suppress the log in that scenario. Maybe I can
> rework the commit message to be clearer on where I am coming from?

This was my mistake—I didn’t read your commit message closely enough.

However, destroyed can only be set once the queue is off the hardware
due to the reference counting between jobs and the queue (job hold a ref
to the queue). Note that disable_scheduling_deregister, which sets
set_exec_queue_destroyed, is called from __guc_exec_queue_process_msg_cleanup,
which only happens when the queue refcount reaches zero.

We can only reach a refcount of zero once all jobs have naturally
completed, or after we time out jobs, disable scheduling (a one-way
transition), and then signal the job fences.

So we should likely assert !destroyed in
xe_guc_exec_queue_reset_handler, since reaching that state afterward
really shouldn’t be possible.

Compare this to a user pressing Ctrl-C on an app (which sets kill). We
can trivially trigger this message while the queue is still on the GPU
because the TDR (immediated kicked) sets the preemption timeout to the
minimum value, disables scheduling, making an engine reset highly
likely. I think this is the case you’re trying to fix, unless I’m
misunderstanding the problem.

I haven't looked at the PXP cases so can't comment but will look in bit
if we to discuss further.

Matt

> 
> > whereas killed is tied to either a user closing an exec_queue or the DRM
> > render FD being closed. There may also be a multi-queue case that needs
> > slightly different logic. I forget the exact multi-queue teardown flow,
> > but IIRC it is slightly different. Niranjana would likely know that
> > offhand; otherwise, I’d have to reverse-engineer those flows again.
> 
> I'll check that out.
> 
> Thanks,
> Daniele
> 
> > 
> > Matt
> > 
> > > +		xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d, state=0x%0x",
> > > +			   xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id,
> > > +			   atomic_read(&q->guc->state));
> > >   	trace_xe_exec_queue_reset(q);
> > > -- 
> > > 2.43.0
> > > 
>

next prev parent reply	other threads:[~2026-04-03  0:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-02 21:30 [PATCH] drm/xe: Suppress reset log for destroyed queues Daniele Ceraolo Spurio
2026-04-02 21:37 ` Matthew Brost
2026-04-02 22:51   ` Daniele Ceraolo Spurio
2026-04-03  0:45     ` Matthew Brost [this message]
2026-04-03 17:33       ` Daniele Ceraolo Spurio
2026-04-06 20:08         ` Matthew Brost
2026-04-02 21:37 ` ✓ CI.KUnit: success for " Patchwork
2026-04-02 22:14 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-03 11:29 ` ✗ Xe.CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac8NoEXghE1XbfbE@gsse-cloud1.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=daniele.ceraolospurio@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox