Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, Matt Roper <matthew.d.roper@intel.com>
Subject: Re: [PATCH v6 27/30] drm/xe/vf: Use primary GT ordered work queue on media GT on PTL VF
Date: Tue, 7 Oct 2025 10:22:39 -0700	[thread overview]
Message-ID: <aOVMXzdQCB1e7LoV@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <h4mwspqv5chueicgkrvd4grucstgev3vfkuavaeqbweqecvznc@wjpe5lk7tkr3>

On Tue, Oct 07, 2025 at 12:00:32PM -0500, Lucas De Marchi wrote:
> On Mon, Oct 06, 2025 at 03:51:12PM -0700, Matthew Brost wrote:
> > On Mon, Oct 06, 2025 at 05:24:55PM -0500, Lucas De Marchi wrote:
> > > On Mon, Oct 06, 2025 at 04:10:35AM -0700, Matthew Brost wrote:
> > > > VF CCS restore is a primary GT operation on which the media GT depends.
> > > > Therefore, it doesn't make much sense to run these operations in
> > > 
> > > I'd need to double check the previous patches to see the entire
> > > picture, but this seems weird at a first glance. The VF CCS restore is
> > > not the single work we queue in gt->ordered_wq.  To me it seems more
> > > like "in what ordered queue we are going to queue the VF CCS restore. If
> > > it's global per device, why are we not using the device wq rather than
> > > making all the GT wq point to the same thing?
> > > 
> > 
> > This is where things get convoluted. Four mechanisms manipulate the
> > scheduling state by starting or stopping the DRM scheduler:
> > 
> > - Job timeouts
> > - GT resets
> > - VF restore
> > - PM resume
> > 
> > All of these paths require mutual exclusion, or the scheduler design
> > breaks.
> > 
> > The first three ensure ordering by scheduling on the same ordered work
> > queue (GT-ordered WQ). The last one is guaranteed by holding PM
> > references in all the right places.
> > 
> > Another issue is that the first three items are all in the reclaim path
> > — the GT-ordered work queue is designed to handle this.
> > 
> > This patch [1] explains the entire scheduler design in detail.
> > 
> > Only PTL has the cross-VF/GT restore ordering requirement, so I figured
> > the path of least resistance is to just point all GTs to the primary
> > work queue.
> > 
> > [1] https://patchwork.freedesktop.org/patch/677980/?series=154627&rev=4
> > 
> > > > parallel. To address this, point the media GT's ordered work queue to
> > > > the primary GT's ordered work queue on platforms that require (PTL VFs)
> > > > CCS restore as part of VF post-migration recovery.
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_device_types.h | 2 ++
> > > > drivers/gpu/drm/xe/xe_gt.c           | 7 +++++--
> > > > drivers/gpu/drm/xe/xe_gt.h           | 2 +-
> > > > drivers/gpu/drm/xe/xe_pci.c          | 6 +++++-
> > > > drivers/gpu/drm/xe/xe_pci_types.h    | 1 +
> > > > drivers/gpu/drm/xe/xe_tile.c         | 2 +-
> > > > 6 files changed, 15 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > > index c66523bf4bf0..02c04ad7296e 100644
> > > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > > @@ -334,6 +334,8 @@ struct xe_device {
> > > > 		u8 skip_mtcfg:1;
> > > > 		/** @info.skip_pcode: skip access to PCODE uC */
> > > > 		u8 skip_pcode:1;
> > > > +		/** @info.needs_shared_vf_gt_wq: needs shared GT WQ on VF */
> > > > +		u8 needs_shared_vf_gt_wq:1;
> > > > 	} info;
> > > >
> > > > 	/** @wa_active: keep track of active workarounds */
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > index cf484a2da35e..05465f358c96 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > @@ -65,7 +65,7 @@
> > > > #include "xe_wa.h"
> > > > #include "xe_wopcm.h"
> > > >
> > > > -struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
> > > > +struct xe_gt *xe_gt_alloc(struct xe_tile *tile, bool use_primary_wq)
> > > 
> > > If using the device wq is not an option (possibly because it would queue
> > > with other undesired work going on there), then I'd rather drop this
> > > bool passing here and make the decision inside this function:
> > > 
> > > > {
> > > > 	struct drm_device *drm = &tile_to_xe(tile)->drm;
> > > > 	struct xe_gt *gt;
> > > > @@ -75,7 +75,10 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile)
> > > > 		return ERR_PTR(-ENOMEM);
> > > >
> > > > 	gt->tile = tile;
> > > 
> > > 	if (!xe->info.needs_shared_gt_wq || !tile->primary_gt->ordered_wq)
> > > 		ordered_wq = drmm_alloc_ordered_workqueue(drm, "gt-ordered-wq", WQ_MEM_RECLAIM);
> > > 	else
> > > 		ordered_wq = tile->primary_gt->ordered_wq;
> > > 	if (IS_ERR_OR_NUL(ordered_wq))
> > >  		return ordered_wq ? ERR_CAST(gt->ordered_wq) : ERR_PTR(-EINVAL);
> > > 
> > > 	gt->ordered_wq = ordered_wq;
> > > 	
> > > ... or something like that so you use the xe info to decide it here
> > > rather than passing it down as a function arg.
> > > 
> > 
> > Sure can refactor this as you suggest.
> 
> 
> another option that would avoid a bool arg would be to actually pass the
> wq from the caller.
> 
> 	wq = NULL;
> 
> 	if (xe->info.needs_shared_gt_wq)
> 		wq = tile->primary_gt->ordered_wq;
> 
> 	xe_gt_alloc(tile, wq);
> 

I posted a v7/v8 which did it like you suggest.

> > 
> > > 
> > > > -	gt->ordered_wq = drmm_alloc_ordered_workqueue(drm, "gt-ordered-wq", WQ_MEM_RECLAIM);
> > > > +	if (use_primary_wq)
> > > > +		gt->ordered_wq = tile->primary_gt->ordered_wq;
> > > > +	else
> > > > +		gt->ordered_wq = drmm_alloc_ordered_workqueue(drm, "gt-ordered-wq", WQ_MEM_RECLAIM);
> > > > 	if (IS_ERR(gt->ordered_wq))
> > > > 		return ERR_CAST(gt->ordered_wq);
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
> > > > index 5df2ffe3ff83..9545c0c93ab6 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt.h
> > > > +++ b/drivers/gpu/drm/xe/xe_gt.h
> > > > @@ -28,7 +28,7 @@ static inline bool xe_fault_inject_gt_reset(void)
> > > > 	return IS_ENABLED(CONFIG_DEBUG_FS) && should_fail(&gt_reset_failure, 1);
> > > > }
> > > >
> > > > -struct xe_gt *xe_gt_alloc(struct xe_tile *tile);
> > > > +struct xe_gt *xe_gt_alloc(struct xe_tile *tile, bool use_primary_wq);
> > > > int xe_gt_init_early(struct xe_gt *gt);
> > > > int xe_gt_init(struct xe_gt *gt);
> > > > void xe_gt_mmio_init(struct xe_gt *gt);
> > > > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > > > index 3f42b91efa28..25a1d96a68e7 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > > @@ -347,6 +347,7 @@ static const struct xe_device_desc ptl_desc = {
> > > > 	.has_sriov = true,
> > > > 	.max_gt_per_tile = 2,
> > > > 	.needs_scratch = true,
> > > > +	.needs_shared_vf_gt_wq = true,
> > > 
> > > as per above... I think this needs to be detached from vf. There may be
> > > other reasons the wq needs to be shared.
> > > 
> > 
> > I’d prefer to share the work queue only when absolutely necessary — such
> > as when PTL is on a VF and migration is supported.
> 
> 
> I missed that you left the VF condition out as an && for the allocation:
> 

Yes.

> > > > +                                       xe->info.needs_shared_vf_gt_wq &&
> > > > +                                       IS_SRIOV_VF(xe));
> 
> IMO a better way would be to call it `needs_shared_gt_wq` and then
> override the condition in the vf-specific function:
> 
> sriov_update_device_info()
> 

I'm trying to follow this one.

This a platform (static) and VF (dynamic) condition combination.

Are you suggesting removing platform information from xe_pci.c and just
have info bit in the device which the sriov code sets? I think we'd need
a platform check then in the sriov code which I thought in general we
want to avoid inline platform checks.

Maybe I'm misunderstanding here.

Matt 

> Lucas De Marchi
> 
> > 
> > > If we just make them point to a device wq as suggested above, then
> > > there's no extra issue with the ongoing work to disable GTs that Matt
> > > Roper is doing (https://patchwork.freedesktop.org/series/154739/).
> > > Otherwise we will need to think on how to reconciliate them.
> > > 
> > 
> > I don’t see how it would ever be possible to disable the primary GT and
> > still do anything meaningful. You can’t allocate memory (e.g., for
> > clears), perform VM binds, or handle page faults without a copy engine.
> > 
> > Matt
> > 
> > > Lucas De Marchi
> > > 
> > > > };
> > > >
> > > > #undef PLATFORM
> > > > @@ -598,6 +599,7 @@ static int xe_info_init_early(struct xe_device *xe,
> > > > 	xe->info.skip_mtcfg = desc->skip_mtcfg;
> > > > 	xe->info.skip_pcode = desc->skip_pcode;
> > > > 	xe->info.needs_scratch = desc->needs_scratch;
> > > > +	xe->info.needs_shared_vf_gt_wq = desc->needs_shared_vf_gt_wq;
> > > >
> > > > 	xe->info.probe_display = IS_ENABLED(CONFIG_DRM_XE_DISPLAY) &&
> > > > 				 xe_modparam.probe_display &&
> > > > @@ -766,7 +768,9 @@ static int xe_info_init(struct xe_device *xe,
> > > > 		 * Allocate and setup media GT for platforms with standalone
> > > > 		 * media.
> > > > 		 */
> > > > -		tile->media_gt = xe_gt_alloc(tile);
> > > > +		tile->media_gt = xe_gt_alloc(tile,
> > > > +					     xe->info.needs_shared_vf_gt_wq &&
> > > > +					     IS_SRIOV_VF(xe));
> > > > 		if (IS_ERR(tile->media_gt))
> > > > 			return PTR_ERR(tile->media_gt);
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
> > > > index 9b9766a3baa3..b11bf6abda5b 100644
> > > > --- a/drivers/gpu/drm/xe/xe_pci_types.h
> > > > +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> > > > @@ -48,6 +48,7 @@ struct xe_device_desc {
> > > > 	u8 skip_guc_pc:1;
> > > > 	u8 skip_mtcfg:1;
> > > > 	u8 skip_pcode:1;
> > > > +	u8 needs_shared_vf_gt_wq:1;
> > > > };
> > > >
> > > > struct xe_graphics_desc {
> > > > diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
> > > > index 6edb5062c1da..e9bcff2de563 100644
> > > > --- a/drivers/gpu/drm/xe/xe_tile.c
> > > > +++ b/drivers/gpu/drm/xe/xe_tile.c
> > > > @@ -157,7 +157,7 @@ int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id)
> > > > 	if (err)
> > > > 		return err;
> > > >
> > > > -	tile->primary_gt = xe_gt_alloc(tile);
> > > > +	tile->primary_gt = xe_gt_alloc(tile, false);
> > > > 	if (IS_ERR(tile->primary_gt))
> > > > 		return PTR_ERR(tile->primary_gt);
> > > >
> > > > --
> > > > 2.34.1
> > > >

  reply	other threads:[~2025-10-07 17:22 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-06 11:10 [PATCH v6 00/30] VF migration redesign Matthew Brost
2025-10-06 11:10 ` [PATCH v6 01/30] drm/xe: Add NULL checks to scratch LRC allocation Matthew Brost
2025-10-06 21:51   ` Lis, Tomasz
2025-10-06 11:10 ` [PATCH v6 02/30] drm/xe: Save off position in ring in which a job was programmed Matthew Brost
2025-10-06 11:10 ` [PATCH v6 03/30] drm/xe/guc: Track pending-enable source in submission state Matthew Brost
2025-10-06 11:10 ` [PATCH v6 04/30] drm/xe: Track LR jobs in DRM scheduler pending list Matthew Brost
2025-10-06 11:10 ` [PATCH v6 05/30] drm/xe: Don't change LRC ring head on job resubmission Matthew Brost
2025-10-06 11:10 ` [PATCH v6 06/30] drm/xe: Make LRC W/A scratch buffer usage consistent Matthew Brost
2025-10-06 11:10 ` [PATCH v6 07/30] drm/xe/vf: Add xe_gt_recovery_pending helper Matthew Brost
2025-10-06 13:10   ` Michal Wajdeczko
2025-10-06 11:10 ` [PATCH v6 08/30] drm/xe/vf: Make VF recovery run on per-GT worker Matthew Brost
2025-10-06 11:10 ` [PATCH v6 09/30] drm/xe/vf: Abort H2G sends during VF post-migration recovery Matthew Brost
2025-10-06 11:10 ` [PATCH v6 10/30] drm/xe/vf: Remove memory allocations from VF post migration recovery Matthew Brost
2025-10-06 11:10 ` [PATCH v6 11/30] drm/xe/vf: Close multi-GT GGTT shift race Matthew Brost
2025-10-06 14:27   ` Michal Wajdeczko
2025-10-06 14:56     ` Matthew Brost
2025-10-06 11:10 ` [PATCH v6 12/30] drm/xe/vf: Teardown VF post migration worker on driver unload Matthew Brost
2025-10-06 11:10 ` [PATCH v6 13/30] drm/xe/vf: Don't allow GT reset to be queued during VF post migration recovery Matthew Brost
2025-10-06 11:10 ` [PATCH v6 14/30] drm/xe/vf: Wakeup in GuC backend on " Matthew Brost
2025-10-06 14:35   ` Michal Wajdeczko
2025-10-06 15:54     ` Matthew Brost
2025-10-06 22:27   ` Lis, Tomasz
2025-10-06 23:07     ` Matthew Brost
2025-10-06 11:10 ` [PATCH v6 15/30] drm/xe/vf: Avoid indefinite blocking in preempt rebind worker for VFs supporting migration Matthew Brost
2025-10-06 11:10 ` [PATCH v6 16/30] drm/xe/vf: Use GUC_HXG_TYPE_EVENT for GuC context register Matthew Brost
2025-10-06 14:51   ` Michal Wajdeczko
2025-10-06 16:02     ` Matthew Brost
2025-10-06 22:21   ` Lis, Tomasz
2025-10-06 22:57     ` Matthew Brost
2025-10-06 11:10 ` [PATCH v6 17/30] drm/xe/vf: Flush and stop CTs in VF post migration recovery Matthew Brost
2025-10-06 11:10 ` [PATCH v6 18/30] drm/xe/vf: Reset TLB invalidations during " Matthew Brost
2025-10-06 11:10 ` [PATCH v6 19/30] drm/xe/vf: Kickstart after resfix in " Matthew Brost
2025-10-06 11:10 ` [PATCH v6 20/30] drm/xe/vf: Start CTs before resfix " Matthew Brost
2025-10-06 21:50   ` Lis, Tomasz
2025-10-06 11:10 ` [PATCH v6 21/30] drm/xe/vf: Abort VF post migration recovery on failure Matthew Brost
2025-10-06 11:10 ` [PATCH v6 22/30] drm/xe/vf: Replay GuC submission state on pause / unpause Matthew Brost
2025-10-06 11:10 ` [PATCH v6 23/30] drm/xe: Move queue init before LRC creation Matthew Brost
2025-10-06 15:22   ` Michal Wajdeczko
2025-10-06 21:33   ` Lis, Tomasz
2025-10-06 11:10 ` [PATCH v6 24/30] drm/xe/vf: Add debug prints for GuC replaying state during VF recovery Matthew Brost
2025-10-06 11:10 ` [PATCH v6 25/30] drm/xe/vf: Workaround for race condition in GuC firmware during VF pause Matthew Brost
2025-10-06 11:10 ` [PATCH v6 26/30] drm/xe: Use PPGTT addresses for TLB invalidation to avoid GGTT fixups Matthew Brost
2025-10-06 11:10 ` [PATCH v6 27/30] drm/xe/vf: Use primary GT ordered work queue on media GT on PTL VF Matthew Brost
2025-10-06 22:24   ` Lucas De Marchi
2025-10-06 22:51     ` Matthew Brost
2025-10-07 17:00       ` Lucas De Marchi
2025-10-07 17:22         ` Matthew Brost [this message]
2025-10-07 20:36           ` Lucas De Marchi
2025-10-07 21:18             ` Matthew Brost
2025-10-06 11:10 ` [PATCH v6 28/30] drm/xe/vf: Ensure media GT VF recovery runs after primary GT on PTL Matthew Brost
2025-10-06 11:10 ` [PATCH v6 29/30] drm/xe/vf: Rebase CCS save/restore BB GGTT addresses Matthew Brost
2025-10-06 11:10 ` [PATCH v6 30/30] drm/xe/guc: Increase wait timeout to 2sec after BUSY reply from GuC Matthew Brost
2025-10-06 11:17 ` ✗ CI.checkpatch: warning for VF migration redesign (rev6) Patchwork
2025-10-06 11:18 ` ✓ CI.KUnit: success " Patchwork
2025-10-06 12:24 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-06 14:28 ` ✗ Xe.CI.Full: " Patchwork
2025-10-07  0:20 ` [PATCH v6 00/30] VF migration redesign Niranjana Vishwanathapura
2025-10-07  1:11   ` Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aOVMXzdQCB1e7LoV@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.d.roper@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox