Re: [PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind
Date: Tue, 24 Jun 2025 17:30:19 -0400	[thread overview]
Message-ID: <aFsY65GdKwZA1HZU@intel.com> (raw)
In-Reply-To: <aFrFDV6ghPUHfWLI@lstrano-desk.jf.intel.com>

On Tue, Jun 24, 2025 at 08:32:29AM -0700, Matthew Brost wrote:
> On Mon, Jun 23, 2025 at 05:29:21PM -0400, Rodrigo Vivi wrote:
> > On Fri, Jun 13, 2025 at 12:09:37AM +0200, Michal Wajdeczko wrote:
> > > During driver probe we might be briefly using CT safe mode, which
> > > is based on a delayed work, but usually we are able to stop this
> > > once we have IRQ fully operational.  However, if we abort the probe
> > > quite early then during unwind we might try to destroy the workqueue
> > > while there is still a pending delayed work that attempts to restart
> > > itself which triggers a WARN.
> > > 
> > > This was recently observed during unsuccessful VF initialization:
> > > 
> > >  [ ] xe 0000:00:02.1: probe with driver xe failed with error -62
> > >  [ ] ------------[ cut here ]------------
> > >  [ ] workqueue: cannot queue safe_mode_worker_func [xe] on wq xe-g2h-wq
> > >  [ ] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:2257 __queue_work+0x287/0x710
> > >  [ ] RIP: 0010:__queue_work+0x287/0x710
> > >  [ ] Call Trace:
> > >  [ ]  delayed_work_timer_fn+0x19/0x30
> > >  [ ]  call_timer_fn+0xa1/0x2a0
> > > 
> > > Exit the CT safe mode on unwind to avoid that warning.
> > > 
> > > Fixes: 09b286950f29 ("drm/xe/guc: Allow CTB G2H processing without G2H IRQ")
> > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_guc_ct.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index 822f4c33f730..6e353757e204 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -35,6 +35,11 @@
> > >  #include "xe_pm.h"
> > >  #include "xe_trace_guc.h"
> > >  
> > > +static void receive_g2h(struct xe_guc_ct *ct);
> > > +static void g2h_worker_func(struct work_struct *w);
> > > +static void safe_mode_worker_func(struct work_struct *w);
> > > +static void ct_exit_safe_mode(struct xe_guc_ct *ct);
> > > +
> > >  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
> > >  enum {
> > >  	/* Internal states, not error conditions */
> > > @@ -189,14 +194,11 @@ static void guc_ct_fini(struct drm_device *drm, void *arg)
> > >  {
> > >  	struct xe_guc_ct *ct = arg;
> > >  
> > > +	ct_exit_safe_mode(ct);
> > 
> > I was going to ask if we also don't need to disable more of the ct
> > at this point, but I believe it is indeed not relevant at this point
> > of the code.
> > 
> > But it is any possibility of double call of this function in some of
> > the regular removal paths?
> > 
> 
> ct_exit_safe_mode is safe to be called many times - it just cancels a
> delayed worker.
> 
> The code looks correct to me.
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>

I was going to push this right now, but it looks it needs a rebase now...

> 
> > >  	destroy_workqueue(ct->g2h_wq);
> > >  	xa_destroy(&ct->fence_lookup);
> > >  }
> > >  
> > > -static void receive_g2h(struct xe_guc_ct *ct);
> > > -static void g2h_worker_func(struct work_struct *w);
> > > -static void safe_mode_worker_func(struct work_struct *w);
> > > -
> > >  static void primelockdep(struct xe_guc_ct *ct)
> > >  {
> > >  	if (!IS_ENABLED(CONFIG_LOCKDEP))
> > > -- 
> > > 2.47.1
> > >

next prev parent reply	other threads:[~2025-06-24 21:30 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-12 22:09 [PATCH 0/2] Improve handling of aborted probe Michal Wajdeczko
2025-06-12 22:09 ` [PATCH 1/2] drm/xe: Process deferred GGTT node removals on device unwind Michal Wajdeczko
2025-06-23 21:22   ` Rodrigo Vivi
2025-06-25 14:20   ` Maarten Lankhorst
2025-06-12 22:09 ` [PATCH 2/2] drm/xe/guc: Explicitly exit CT safe mode on unwind Michal Wajdeczko
2025-06-23 21:29   ` Rodrigo Vivi
2025-06-24 15:32     ` Matthew Brost
2025-06-24 21:30       ` Rodrigo Vivi [this message]
2025-06-25  7:51         ` Michal Wajdeczko
2025-06-13  5:23 ` ✗ CI.checkpatch: warning for Improve handling of aborted probe Patchwork
2025-06-13  5:24 ` ✓ CI.KUnit: success " Patchwork
2025-06-13  6:51 ` ✓ Xe.CI.BAT: " Patchwork
2025-06-14 10:47 ` ✓ Xe.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aFsY65GdKwZA1HZU@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=michal.wajdeczko@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.