From: Matthew Brost <matthew.brost@intel.com>
To: <fei.yang@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
Stuart Summers <stuart.summers@intel.com>,
Roper Matthew D <matthew.d.roper@intel.com>
Subject: Re: [PATCH] drm/xe: Wait for HW clearance before issuing the next TLB inval.
Date: Tue, 21 Apr 2026 19:49:53 -0700
Message-ID: <aeg3UcVPkPuzJ8Wj@gsse-cloud1.jf.intel.com>
In-Reply-To: <aeg1NgXPg+y8E7Ho@gsse-cloud1.jf.intel.com>
On Tue, Apr 21, 2026 at 07:40:54PM -0700, Matthew Brost wrote:
> On Fri, Apr 17, 2026 at 05:03:49PM -0700, fei.yang@intel.com wrote:
> > From: Fei Yang <fei.yang@intel.com>
> >
> > Hardware requires the software to poll the valid bit and make sure
> > it's cleared before issuing a new TLB invalidation request.
> > We also need to avoid racing against GuC on TLB invalidations. In
> > order to achieve that, add a mutex to serialize TLB invalidation
> > request, and whenever KMD initiates TLB invalidation, make sure
> > we poll for the clearance of the valid bit before and after issuing
> > TLB invalidation request.
> >
> > Signed-off-by: Fei Yang <fei.yang@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Stuart Summers <stuart.summers@intel.com>
> > Cc: Roper Matthew D <matthew.d.roper@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt.c | 8 +++-
> > drivers/gpu/drm/xe/xe_gt_types.h | 7 +++
> > drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 63 +++++++++++++++++++++++++--
> > 3 files changed, 73 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index 8a31c963c372..186b1c10334b 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -486,15 +486,16 @@ static void wa_14026539277(struct xe_gt *gt)
> >
> > int xe_gt_init_early(struct xe_gt *gt)
> > {
> > + struct xe_device *xe = gt_to_xe(gt);
> > int err;
> >
> > - if (IS_SRIOV_PF(gt_to_xe(gt))) {
> > + if (IS_SRIOV_PF(xe)) {
> > err = xe_gt_sriov_pf_init_early(gt);
> > if (err)
> > return err;
> > }
> >
> > - if (IS_SRIOV_VF(gt_to_xe(gt))) {
> > + if (IS_SRIOV_VF(xe)) {
> > err = xe_gt_sriov_vf_init_early(gt);
> > if (err)
> > return err;
> > @@ -514,6 +515,9 @@ int xe_gt_init_early(struct xe_gt *gt)
> >
> > xe_force_wake_init_gt(gt, gt_to_fw(gt));
> > spin_lock_init(&gt->global_invl_lock);
> > + err = drmm_mutex_init(&xe->drm, &gt->ggtt_tlb_invl_lock);
> > + if (err)
> > + return err;
> >
> > err = xe_gt_tlb_inval_init_early(gt);
> > if (err)
> > diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> > index 7351aadd238e..3dd07d75d195 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> > @@ -324,6 +324,13 @@ struct xe_gt {
> > */
> > spinlock_t global_invl_lock;
> >
> > + /**
> > + * @ggtt_tlb_invl_lock: prevents back to back TLB invalidation
> > + * by serializing TLB invalidation requests with polling for
> > + * the valid bit enforced in between
> > + */
> > + struct mutex ggtt_tlb_invl_lock;
> > +
> > /** @wa_active: keep track of active workarounds */
> > struct {
> > /** @wa_active.gt: bitmap with active GT workarounds */
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > index ced58f46f846..2062f990e9de 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > @@ -63,7 +63,9 @@ static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
> > struct xe_guc *guc = tlb_inval->private;
> > struct xe_gt *gt = guc_to_gt(guc);
> > struct xe_device *xe = guc_to_xe(guc);
> > + int ret = -ECANCELED;
> >
> > + mutex_lock(&gt->ggtt_tlb_invl_lock);
>
> We already have a ggtt mutex and TLB invalidation mutex held here, this
> is not the part of the code which needs extra protection.
>
> > /*
> > * Returning -ECANCELED in this function is squashed at the caller and
> > * signals waiters.
> > @@ -76,26 +78,81 @@ static int send_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval, u32 seqno)
> > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > };
> >
> > - return send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > + ret = send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > + goto out;
> > } else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
> > struct xe_mmio *mmio = &gt->mmio;
> >
> > if (IS_SRIOV_VF(xe))
> > - return -ECANCELED;
> > + goto out;
> > +
> > + /*
> > + * If there are pending GuC TLB invalidation requests
> > + * KMD requests should be avoided
> > + */
> > + if (!list_empty(&gt->tlb_inval.pending_fences))
> > + goto out;
>
> The above is not safe without tlb_inval->pending_lock. This should also
> be moved into xe_tlb_inval layer function.
>
This is also functionally incorrect. The check here should be
!list_is_singular, as the current invalidation is already on
tlb_inval.pending_fences. We also happen to have a helper for this
already... xe_tlb_inval_idle()
So...
if (!xe_tlb_inval_idle(&gt->tlb_inval))
	goto out;
Matt
> >
> > CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
> > if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
>
> The MMIO access does need protection. In practice, we already have
> locking via the GGTT lock, and this is currently the only place where we
> program these MMIO registers in the KMD. However, what happens if we
> need to program the GAM port somewhere else in the future?
>
> This is why I think the MMIO code should be broken out into a gam_port
> component with its own locking around MMIO access. i.e., A much-simplified
> version of xe_gam_port.c in [1]. It may be a bit overkill, but it would
> provide a clean, reusable layer, and with clear ownership of the MMIO
> port.
>
> Matt
>
> [1] https://patchwork.freedesktop.org/patch/707237/?series=162171&rev=1
>
> > + /*
> > + * In case of any failure causing CT to be disabled,
> > + * KMD needs to make sure there is no pending TLB
> > + * invalidation issued by GuC before sending more TLB
> > + * request through mmio. Wait 1-second for the valid
> > + * bit to be cleared, otherwise cancel the request.
> > + */
> > + ret = xe_mmio_wait32(mmio, PVC_GUC_TLB_INV_DESC0,
> > + PVC_GUC_TLB_INV_DESC0_VALID,
> > + 0, 1000 * USEC_PER_MSEC, NULL, true);
> > + if (ret) {
> > + ret = -ECANCELED;
> > + drm_dbg(&xe->drm, "Pending TLB INV not completed\n");
> > + goto out;
> > + }
> > +
> > xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > PVC_GUC_TLB_INV_DESC0_VALID);
> > +
> > + /*
> > + * In case the CT is recovered, make sure there is no
> > + * pending TLB invalidation request before GuC takes over
> > + */
> > + ret = xe_mmio_wait32(mmio, PVC_GUC_TLB_INV_DESC0,
> > + PVC_GUC_TLB_INV_DESC0_VALID,
> > + 0, 1000 * USEC_PER_MSEC, NULL, true);
> > + if (ret)
> > + drm_dbg(&xe->drm, "TLB INV not completed\n");
> > + ret = -ECANCELED;
> > } else {
> > + /* See comments in the if clause above */
> > + ret = xe_mmio_wait32(mmio, GUC_TLB_INV_CR,
> > + GUC_TLB_INV_CR_INVALIDATE,
> > + 0, 1000 * USEC_PER_MSEC, NULL, true);
> > + if (ret) {
> > + ret = -ECANCELED;
> > + drm_dbg(&xe->drm, "Pending TLB INV not completed\n");
> > + goto out;
> > + }
> > +
> > xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > GUC_TLB_INV_CR_INVALIDATE);
> > +
> > + /* See comments in the if clause above */
> > + ret = xe_mmio_wait32(mmio, GUC_TLB_INV_CR,
> > + GUC_TLB_INV_CR_INVALIDATE,
> > + 0, 1000 * USEC_PER_MSEC, NULL, true);
> > + if (ret)
> > + drm_dbg(&xe->drm, "TLB INV not completed\n");
> > + ret = -ECANCELED;
> > }
> > }
> >
> > - return -ECANCELED;
> > +out:
> > + mutex_unlock(&gt->ggtt_tlb_invl_lock);
> > + return ret;
> > }
> >
> > static int send_page_reclaim(struct xe_guc *guc, u32 seqno,
> > --
> > 2.43.0
> >