From: "Summers, Stuart" <stuart.summers@intel.com>
To: "Brost, Matthew" <matthew.brost@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>,
	"Cavitt, Jonathan" <jonathan.cavitt@intel.com>
Subject: Re: [PATCH] drm/xe: Add min and max context TLB invalidation sizes
Date: Mon, 23 Mar 2026 19:18:54 +0000	[thread overview]
Message-ID: <8e8d66b91bcb33072e27537ec4b856f68247fa5b.camel@intel.com> (raw)
In-Reply-To: <acF3Gp800SnB/g5D@lstrano-desk.jf.intel.com>

On Mon, 2026-03-23 at 10:23 -0700, Matthew Brost wrote:
> On Fri, Mar 20, 2026 at 08:46:30PM +0000, Stuart Summers wrote:
> > Allow platform-defined TLB invalidation min and max lengths.
> > 
> > This gives finer-grained control over which invalidations we
> > decide to send to GuC. The min size is essentially a round-up.
> > The max allows us to switch to a full invalidation.
> > 
> > The expectation here is that GuC will translate the full
> > invalidation in this instance into a series of per-context
> > invalidations. These are then issued with no H2G or G2H
> > messages and therefore should be quicker than splitting
> > the invalidations from the KMD into max-size chunks and
> > sending them separately.
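(As an aside, the round-up/fallback behavior described above can be modeled in plain userspace C. This is only a sketch of the idea, not the kernel code; the helper names are made up for illustration.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model (illustrative only): lengths below the platform
 * minimum are rounded up, and a normalized range larger than the
 * platform maximum falls back to a full invalidation. */

static uint64_t roundup_pow2_u64(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Returns true when a full invalidation should be sent instead of a
 * ranged one; otherwise writes the aligned range back through the
 * pointers. */
static bool clamp_inval_range(uint64_t min_size, uint64_t max_size,
			      uint64_t *start, uint64_t *end)
{
	uint64_t length = *end - *start;
	uint64_t align;

	if (length < min_size)
		length = min_size;

	align = roundup_pow2_u64(length);
	*start &= ~(align - 1);		/* ALIGN_DOWN to the range size */
	*end = *start + align;

	return align > max_size;
}
```

With a 4K minimum and 1G maximum, a 2K request at 0x1000 becomes an aligned 4K range, while anything normalizing past 1G reports that a full invalidation is needed.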
> > 
> > v2: Add proper defaults for min/max if not set in the device
> >     structures
> > v3: Add coverage for pow-of-2 out of bounds cases
> > 
> > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_device_types.h  |  4 +++
> >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 39 +++++++++++++++++----------
> >  drivers/gpu/drm/xe/xe_pci.c           |  3 +++
> >  drivers/gpu/drm/xe/xe_pci_types.h     |  2 ++
> >  4 files changed, 34 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 615218d775b1..0c4168fe2ffb 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -137,6 +137,10 @@ struct xe_device {
> >                 u8 vm_max_level;
> >                 /** @info.va_bits: Maximum bits of a virtual address */
> >                 u8 va_bits;
> > +               /** @info.min_tlb_inval_size: Minimum size of context based TLB invalidations */
> > +               u64 min_tlb_inval_size;
> > +               /** @info.max_tlb_inval_size: Maximum size of context based TLB invalidations */
> > +               u64 max_tlb_inval_size;
> >  
> >                 /*
> >                  * Keep all flags below alphabetically sorted
> > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > index ced58f46f846..e9e0be94ceef 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > @@ -115,14 +115,23 @@ static int send_page_reclaim(struct xe_guc *guc, u32 seqno,
> >                               G2H_LEN_DW_PAGE_RECLAMATION, 1);
> >  }
> >  
> > +/*
> > + * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > + * Note that roundup_pow_of_two() operates on unsigned long,
> > + * not on u64.
> > + */
> > +#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> > +
> >  static u64 normalize_invalidation_range(struct xe_gt *gt, u64 *start, u64 *end)
> >  {
> > +       struct xe_device *xe = gt_to_xe(gt);
> >         u64 orig_start = *start;
> >         u64 length = *end - *start;
> >         u64 align;
> >  
> > -       if (length < SZ_4K)
> > -               length = SZ_4K;
> > +       xe_gt_assert(gt, length <= MAX_RANGE_TLB_INVALIDATION_LENGTH);
> > +
> > +       length = max_t(u64, xe->info.min_tlb_inval_size, length);
> >  
> >         align = roundup_pow_of_two(length);
> >         *start = ALIGN_DOWN(*start, align);
> > @@ -147,13 +156,6 @@ static u64 normalize_invalidation_range(struct xe_gt *gt, u64 *start, u64 *end)
> >         return length;
> >  }
> >  
> > -/*
> > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > - * Note that roundup_pow_of_two() operates on unsigned long,
> > - * not on u64.
> > - */
> > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX))
> > -
> >  static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
> >                                 u64 end, u32 id, u32 type,
> >                                 struct drm_suballoc *prl_sa)
> > @@ -162,8 +164,20 @@ static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
> >         struct xe_gt *gt = guc_to_gt(guc);
> >         struct xe_device *xe = guc_to_xe(guc);
> >         u32 action[MAX_TLB_INVALIDATION_LEN];
> > -       u64 length = end - start;
> > +       u64 normalize_len, length = end - start;
> >         int len = 0, err;
> > +       bool do_full_inval = false;
> > +
> > +       if (!xe->info.has_range_tlb_inval ||
> > +           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > +               do_full_inval = true;
> > +       } else {
> > +               normalize_len = normalize_invalidation_range(gt, &start,
> > +                                                            &end);
> > +
> > +               if (normalize_len > xe->info.max_tlb_inval_size)
> > +                       do_full_inval = true;
> > +       }
> 
> I suggested this in the last rev: can this logic be moved to
> send_tlb_inval_asid_ppgtt / send_tlb_inval_ctx_ppgtt?
> 
> For send_tlb_inval_asid_ppgtt it doesn't really matter as
> send_tlb_inval_ppgtt is called once.
> 
> But consider send_tlb_inval_ctx_ppgtt, where send_tlb_inval_ppgtt is
> called multiple times and each call fails the
> normalize_invalidation_range step (i.e., we set do_full_inval). We
> only need to issue one full invalidation, not multiple.

Yeah, it's a good suggestion. I'll split this out in the next rev.

And I'll respond to those other comments in the earlier rev.

Thanks,
Stuart
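
(For illustration, here is a tiny userspace model of the hoisting Matt describes below: decide once, before looping over contexts, whether the range forces a full invalidation, so only one message goes out instead of one per context. The function names are made up for the sketch and are not the kernel code.)

```c
#include <stdint.h>

static int msgs_sent;	/* counts "H2G messages" sent in this model */

static void send_full_inval(void)
{
	msgs_sent++;	/* one message covers every context */
}

static void send_ranged_inval(int ctx)
{
	(void)ctx;
	msgs_sent++;	/* one message per context */
}

/* Hoisted check: fall back to a single full invalidation up front
 * instead of letting each per-context call fall back independently. */
static void inval_contexts(int nr_ctx, uint64_t length, uint64_t max_size)
{
	int i;

	if (length > max_size) {
		send_full_inval();
		return;
	}

	for (i = 0; i < nr_ctx; i++)
		send_ranged_inval(i);
}
```

In this model, an oversized range across 8 contexts sends one message rather than eight, which is the saving the review comment is after.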

> 
> So you likely want to hook in early, in the existing if statement in
> send_tlb_inval_ctx_ppgtt.
> 
> 244 #define EXEC_QUEUE_COUNT_FULL_THRESHOLD 8
> 245         if (vm->exec_queues.count[id] >= EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> 246                 u32 action[] = {
> 247                         XE_GUC_ACTION_TLB_INVALIDATION,
> 248                         seqno,
> 249                         MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> 250                 };
> 251
> 252                 err = send_tlb_inval(guc, action, ARRAY_SIZE(action));
> 253                 goto err_unlock;
> 254         }
> 255 #undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> 
> Matt
> 
> >  
> >         xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE &&
> >                           !xe->info.has_ctx_tlb_inval) ||
> > @@ -172,12 +186,9 @@ static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
> >  
> >         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> >         action[len++] = !prl_sa ? seqno : TLB_INVALIDATION_SEQNO_INVALID;
> > -       if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > +       if (do_full_inval) {
> >                 action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> >         } else {
> > -               u64 normalize_len = normalize_invalidation_range(gt, &start,
> > -                                                                &end);
> >                 bool need_flush = !prl_sa &&
> >                         seqno != TLB_INVALIDATION_SEQNO_INVALID;
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > index 189e2a1c29f9..5e02f9ab625b 100644
> > --- a/drivers/gpu/drm/xe/xe_pci.c
> > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > @@ -743,6 +743,9 @@ static int xe_info_init_early(struct xe_device *xe,
> >         xe->info.vm_max_level = desc->vm_max_level;
> >         xe->info.vram_flags = desc->vram_flags;
> >  
> > +       xe->info.min_tlb_inval_size = desc->min_tlb_inval_size ?: SZ_4K;
> > +       xe->info.max_tlb_inval_size = desc->max_tlb_inval_size ?: SZ_1G;
> > +
> >         xe->info.is_dgfx = desc->is_dgfx;
> >         xe->info.has_cached_pt = desc->has_cached_pt;
> >         xe->info.has_fan_control = desc->has_fan_control;
> > diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
> > index 8eee4fb1c57c..cd9d3ad96fe0 100644
> > --- a/drivers/gpu/drm/xe/xe_pci_types.h
> > +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> > @@ -34,6 +34,8 @@ struct xe_device_desc {
> >         u8 va_bits;
> >         u8 vm_max_level;
> >         u8 vram_flags;
> > +       u64 min_tlb_inval_size;
> > +       u64 max_tlb_inval_size;
> >  
> >         u8 require_force_probe:1;
> >         u8 is_dgfx:1;
> > -- 
> > 2.43.0
> > 


