public inbox for intel-xe@lists.freedesktop.org
 help / color / mirror / Atom feed
From: "Summers, Stuart" <stuart.summers@intel.com>
To: "Cavitt, Jonathan" <jonathan.cavitt@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Brost,  Matthew" <matthew.brost@intel.com>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>
Subject: Re: [PATCH] drm/xe: Add min and max context TLB invalidation sizes
Date: Thu, 19 Mar 2026 21:51:01 +0000	[thread overview]
Message-ID: <5ba63a84ccfb6e271b434106495b2822c6c45c49.camel@intel.com> (raw)
In-Reply-To: <CH0PR11MB5444C674AF8A8CD19A915045E54FA@CH0PR11MB5444.namprd11.prod.outlook.com>

On Thu, 2026-03-19 at 21:36 +0000, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
> Summers, Stuart
> Sent: Thursday, March 19, 2026 2:12 PM
> To: Summers, Stuart <stuart.summers@intel.com>
> Cc: intel-xe@lists.freedesktop.org; Brost, Matthew
> <matthew.brost@intel.com>; Vishwanathapura, Niranjana
> <niranjana.vishwanathapura@intel.com>
> Subject: Re: [PATCH] drm/xe: Add min and max context TLB invalidation
> sizes
> > 
> > On Thu, 2026-03-19 at 21:05 +0000, Stuart Summers wrote:
> > > Allow platform-defined TLB invalidation min and max lengths.
> > > 
> > > This gives finer granular control to which invalidations we
> > > decide to send to GuC. The min size is essentially a round
> > > up. The max allows us to switch to a full invalidation.
> > > 
> > > The expectation here is that GuC will translate the full
> > > invalidation in this instance into a series of per context
> > > invalidaitons. These are then issued with no H2G or G2H
> > > messages and therefore should be quicker than splitting
> > > the invalidations from the KMD in max size chunks and sending
> > > separately.
> > > 
> > > v2: Add proper defaults for min/max if not set in the device
> > >     structures
> > > 
> > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_device_types.h  |  4 ++++
> > >  drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 20 +++++++-------------
> > >  drivers/gpu/drm/xe/xe_pci.c           |  3 +++
> > >  drivers/gpu/drm/xe/xe_pci_types.h     |  2 ++
> > >  4 files changed, 16 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > > b/drivers/gpu/drm/xe/xe_device_types.h
> > > index 615218d775b1..0c4168fe2ffb 100644
> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > @@ -137,6 +137,10 @@ struct xe_device {
> > >                 u8 vm_max_level;
> > >                 /** @info.va_bits: Maximum bits of a virtual
> > > address
> > > */
> > >                 u8 va_bits;
> > > +               /** @info.min_tlb_inval_size: Minimum size of
> > > context
> > > based TLB invalidations */
> > > +               u64 min_tlb_inval_size;
> > > +               /** @info.max_tlb_inval_size: Maximum size of
> > > context
> > > based TLB invalidations */
> > > +               u64 max_tlb_inval_size;
> > >  
> > >                 /*
> > >                  * Keep all flags below alphabetically sorted
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > index eb40528976ca..7512f889a97a 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > @@ -133,12 +133,12 @@ static int send_page_reclaim(struct xe_guc
> > > *guc, u32 seqno,
> > >  
> > >  static u64 normalize_invalidation_range(struct xe_gt *gt, u64
> > > *start, u64 *end)
> > >  {
> > > +       struct xe_device *xe = gt_to_xe(gt);
> > >         u64 orig_start = *start;
> > >         u64 length = *end - *start;
> > >         u64 align;
> > >  
> > > -       if (length < SZ_4K)
> > > -               length = SZ_4K;
> > > +       length = max_t(u64, xe->info.min_tlb_inval_size, length);
> > >  
> > >         align = roundup_pow_of_two(length);
> > >         *start = ALIGN_DOWN(*start, align);
> > > @@ -163,13 +163,6 @@ static u64
> > > normalize_invalidation_range(struct
> > > xe_gt *gt, u64 *start, u64 *end)
> > >         return length;
> > >  }
> > >  
> > > -/*
> > > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > - * Note that roundup_pow_of_two() operates on unsigned long,
> > > - * not on u64.
> > > - */
> > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > (rounddown_pow_of_two(ULONG_MAX))
> > > -
> > >  static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno,
> > > u64
> > > start,
> > >                                 u64 end, u32 id, u32 type,
> > >                                 struct drm_suballoc *prl_sa)
> > > @@ -178,9 +171,12 @@ static int send_tlb_inval_ppgtt(struct
> > > xe_guc
> > > *guc, u32 seqno, u64 start,
> > >         struct xe_gt *gt = guc_to_gt(guc);
> > >         struct xe_device *xe = guc_to_xe(guc);
> > >         u32 action[MAX_TLB_INVALIDATION_LEN];
> > > -       u64 length = end - start;
> > > +       u64 normalize_len;
> > >         int len = 0, err;
> > >  
> > > +       normalize_len = normalize_invalidation_range(gt, &start,
> > > +                                                    &end);
> > > +
> > >         xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE
> > > &&
> > >                           !xe->info.has_ctx_tlb_inval) ||
> > >                      (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX
> > > &&
> > > @@ -189,11 +185,9 @@ static int send_tlb_inval_ppgtt(struct
> > > xe_guc
> > > *guc, u32 seqno, u64 start,
> > >         action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > >         action[len++] = !prl_sa ? seqno :
> > > TLB_INVALIDATION_SEQNO_INVALID;
> > >         if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > -           length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > +           normalize_len > xe->info.max_tlb_inval_size) {
> > >                 action[len++] =
> > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > >         } else {
> > > -               u64 normalize_len =
> > > normalize_invalidation_range(gt,
> > > &start,
> > > -                                                               
> > > &end);
> > >                 bool need_flush = !prl_sa &&
> > >                         seqno != TLB_INVALIDATION_SEQNO_INVALID;
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_pci.c
> > > b/drivers/gpu/drm/xe/xe_pci.c
> > > index 189e2a1c29f9..5e02f9ab625b 100644
> > > --- a/drivers/gpu/drm/xe/xe_pci.c
> > > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > > @@ -743,6 +743,9 @@ static int xe_info_init_early(struct
> > > xe_device
> > > *xe,
> > >         xe->info.vm_max_level = desc->vm_max_level;
> > >         xe->info.vram_flags = desc->vram_flags;
> > >  
> > > +       xe->info.min_tlb_inval_size = desc->min_tlb_inval_size ?:
> > > SZ_4K;
> > > +       xe->info.max_tlb_inval_size = desc->max_tlb_inval_size ?:
> > > SZ_1G;
> > 
> > Basically I decided to get rid of the pathological case for now. I
> > figured 1G is a good happy medium for larger VMAs. Let me know if
> > you
> > have any concerns there though.
> 
> I was going to comment on this once I saw it, but you already
> considered
> this issue.  But yeah, SZ_1G seems to be only about 1/2 (or maybe 1/4
> depending on how rounddown_pow_of_two operates on ULONG_MAX)
> the size of the original MAX_RANGE_TLB_INVALIDATION_LENGTH value,
> so I'd say it's probably fine.
> 
> This patch doesn't have any other major mechanical differences
> compared
> to before its application, though there are some minor concerns
> regarding
> normalize_invalidation_range:
> 
> 1. There's an xe_gt_assert(gt, length >= SZ_4K); that was left
> unchanged in
> this patch.  We might want to target xe->info.min_tlb_inval_size
> instead?

Thanks for the review Jonathan!

So my thought here is that from the spec, we need to have 4K aligned
pages here. The interface in xe_pci.c isn't really doing any validation
of the values we are providing through the various per platform
structures, so this is kind of a guarantee that we are at least
following that requirement. The max_t() call is what is being used to
make sure the value is aligned to the min value - which per the assert
should be at least 4K.

> 2. Using normalize_invalidation_range immediately in
> send_tlb_inval_ppgtt
> instead of initially operating on 'end - start' like before might
> result in more full
> invalidation cases compared to before, as the function seems to
> return a value
> that is more often than not larger than the raw 'end - start' value.

Yeah that's true. I think there's going to be some tradeoff here of
doing a full invalidation vs a series of smaller range invalidations if
the size is large enough. From the hardware perspective, we might get
more performance out of just invalidating everything. We already have
code to do this in the context invalidation sequence, and this takes a
similar approach.

> 
> As long as you find these tradeoffs acceptable, I don't see any
> reason to block
> on these concerns.  Regardless of if you make changes based on these
> notes or not:
> 
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>

Thanks!
Stuart

> -Jonathan Cavitt
> 
> > 
> > Thanks,
> > Stuart
> > 
> > > +
> > >         xe->info.is_dgfx = desc->is_dgfx;
> > >         xe->info.has_cached_pt = desc->has_cached_pt;
> > >         xe->info.has_fan_control = desc->has_fan_control;
> > > diff --git a/drivers/gpu/drm/xe/xe_pci_types.h
> > > b/drivers/gpu/drm/xe/xe_pci_types.h
> > > index 8eee4fb1c57c..cd9d3ad96fe0 100644
> > > --- a/drivers/gpu/drm/xe/xe_pci_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> > > @@ -34,6 +34,8 @@ struct xe_device_desc {
> > >         u8 va_bits;
> > >         u8 vm_max_level;
> > >         u8 vram_flags;
> > > +       u64 min_tlb_inval_size;
> > > +       u64 max_tlb_inval_size;
> > >  
> > >         u8 require_force_probe:1;
> > >         u8 is_dgfx:1;
> > 
> > 


  reply	other threads:[~2026-03-19 21:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 21:05 [PATCH] drm/xe: Add min and max context TLB invalidation sizes Stuart Summers
2026-03-19 21:11 ` Summers, Stuart
2026-03-19 21:36   ` Cavitt, Jonathan
2026-03-19 21:51     ` Summers, Stuart [this message]
2026-03-19 21:12 ` ✓ CI.KUnit: success for drm/xe: Add min and max context TLB invalidation sizes (rev2) Patchwork
2026-03-19 22:18 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-20 20:05 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-03-22  5:56 ` [PATCH] drm/xe: Add min and max context TLB invalidation sizes Matthew Brost
2026-03-23 19:22   ` Summers, Stuart
  -- strict thread matches above, loose matches on Subject: below --
2026-03-20 20:46 Stuart Summers
2026-03-20 20:49 ` Summers, Stuart
2026-03-20 20:59   ` Cavitt, Jonathan
2026-03-23 17:23 ` Matthew Brost
2026-03-23 19:18   ` Summers, Stuart
2026-03-17 19:50 Stuart Summers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ba63a84ccfb6e271b434106495b2822c6c45c49.camel@intel.com \
    --to=stuart.summers@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jonathan.cavitt@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=niranjana.vishwanathapura@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox