Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: "Cavitt, Jonathan" <jonathan.cavitt@intel.com>
Cc: "Summers, Stuart" <stuart.summers@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Harrison,  John C" <john.c.harrison@intel.com>,
	"Gupta, saurabhg" <saurabhg.gupta@intel.com>,
	"De Marchi, Lucas" <lucas.demarchi@intel.com>
Subject: Re: [PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned
Date: Fri, 12 Apr 2024 23:45:32 +0000	[thread overview]
Message-ID: <ZhnHnMO8gsppdwg4@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <BL1PR11MB544510DC6F6F4F2644E2DA0DE5042@BL1PR11MB5445.namprd11.prod.outlook.com>

On Fri, Apr 12, 2024 at 02:58:21PM -0600, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Summers, Stuart <stuart.summers@intel.com> 
> Sent: Friday, April 12, 2024 1:49 PM
> To: intel-xe@lists.freedesktop.org; Cavitt, Jonathan <jonathan.cavitt@intel.com>
> Cc: Brost, Matthew <matthew.brost@intel.com>; Harrison, John C <john.c.harrison@intel.com>; Gupta, saurabhg <saurabhg.gupta@intel.com>; De Marchi, Lucas <lucas.demarchi@intel.com>
> Subject: Re: [PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned
> > 
> > On Fri, 2024-04-05 at 10:55 -0700, Jonathan Cavitt wrote:
> > > LR queues currently don't get banned during a GT/GuC reset because
> > > they
> > > lack a job.  Though they don't have a job to detect the reset status
> > > of,
> > > it's still possible to tell when they should be banned by looking at
> > > the
> > > LRC: if the LRC head and tail don't match, then the exec queue should
> > > be
> > > banned and cleaned up.
> > > 
> > > This also requires swapping the usage of xe_sched_tdr_queue_imm with
> > > xe_guc_exec_queue_trigger_cleanup, as the former is specific to non-
> > > lr
> > > exec queues.
> > > 
> > > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > > Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > > Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > > 
> > > v2:
> > > - Fix Subject line
> > > - Modify change slightly to remove need for "ban" boolean
> > > 
> > > v3: Revert change involving "ban" boolean to version 1
> > > 
> > > v4: Add missing semicolon and remove whitespace
> > > 
> > >  drivers/gpu/drm/xe/xe_guc_submit.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 1a6abb10a960e..e72f2a6cad60a 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -1424,15 +1424,23 @@ static void guc_exec_queue_stop(struct xe_guc
> > > *guc, struct xe_exec_queue *q)
> > >          */
> > >         if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL |
> > > EXEC_QUEUE_FLAG_VM))) {
> > >                 struct xe_sched_job *job =
> > > xe_sched_first_pending_job(sched);
> > > +               bool ban = false;
> > >  
> > >                 if (job) {
> > >                         if ((xe_sched_job_started(job) &&
> > >                             !xe_sched_job_completed(job)) ||
> > >                             xe_sched_invalidate_job(job, 2)) {
> > >                                 trace_xe_sched_job_ban(job);
> > > -                               set_exec_queue_banned(q);
> > > -                               xe_sched_tdr_queue_imm(&q->guc-
> > > >sched);
> > > +                               ban = true;
> > >                         }
> > > +               } else if (xe_exec_queue_is_lr(q) &&
> > > +                          (xe_lrc_ring_head(q->lrc) != q->lrc-
> > > >ring.tail)) {
> > 
> > Why do you read the head out of the lrc but the tail from our internal
> > variable? Isn't there a small chance here that you could get something
> > ready to send but not quite submitted to GuC and in which case these
> > would not be equal but not necessarily need the ban? I guess the flip
> > side is maybe that doesn't actually have any real functional impact...
> 

LR jobs don't have in deps, i.e. they run immediately. I guess there is
very small window between exec IOCTL moving the tail and actually
hitting the hardware. 

> 
> IIRC it's because no equivalent xe_lrc_ring_head function exists to
> read the lrc ring tail.  It's likely that I just missed the function to read
> the tail, so if you know what the function is, I'll replace the internal
> call with the proper functional call for the next revision.
> 
> Alternatively, should I use the internal variable for both halves of
> the comparison?

No, if anything check both values in memory.

That being said, I think this is fine as is due the nature of LRC
jobs but I guess it would not hurt to read tail from memory.

If Stuart insists, we change this.

Matt

> -Jonathan Cavitt
> 
> 
> > 
> > Thanks,
> > Stuart
> > 
> > > +                       ban = true;
> > > +               }
> > > +
> > > +               if (ban) {
> > > +                       set_exec_queue_banned(q);
> > > +                       xe_guc_exec_queue_trigger_cleanup(q);
> > >                 }
> > >         }
> > >  }
> > 
> > 

  reply	other threads:[~2024-04-12 23:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 17:55 [PATCH v4 1/3] drm/xe/xe_guc_submit: Fix exec queue stop race condition Jonathan Cavitt
2024-04-05 17:55 ` [PATCH v4 2/3] drm/xe/xe_guc_submit: Allow lr exec queues to be banned Jonathan Cavitt
2024-04-12 20:48   ` Summers, Stuart
2024-04-12 20:58     ` Cavitt, Jonathan
2024-04-12 23:45       ` Matthew Brost [this message]
2024-04-15 18:20         ` Summers, Stuart
2024-04-05 17:55 ` [PATCH v4 3/3] drm/xe/xe_guc_submit: Declare reset if banned or killed Jonathan Cavitt
2024-04-09 22:56   ` Matthew Brost
2024-04-12 20:50   ` Summers, Stuart
2024-04-05 19:20 ` ✓ CI.Patch_applied: success for series starting with [v4,1/3] drm/xe/xe_guc_submit: Fix exec queue stop race condition Patchwork
2024-04-05 19:21 ` ✓ CI.checkpatch: " Patchwork
2024-04-05 19:21 ` ✓ CI.KUnit: " Patchwork
2024-04-05 19:33 ` ✓ CI.Build: " Patchwork
2024-04-05 19:35 ` ✓ CI.Hooks: " Patchwork
2024-04-05 19:37 ` ✓ CI.checksparse: " Patchwork
2024-04-05 20:07 ` ✓ CI.BAT: " Patchwork
2024-04-12 20:43 ` [PATCH v4 1/3] " Summers, Stuart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhnHnMO8gsppdwg4@DUT025-TGLU.fm.intel.com \
    --to=matthew.brost@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=john.c.harrison@intel.com \
    --cc=jonathan.cavitt@intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=saurabhg.gupta@intel.com \
    --cc=stuart.summers@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox