Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually

Intel-XE Archive on lore.kernel.org

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Upadhyay, Tejas" <tejas.upadhyay@intel.com>,
	"Auld, Matthew" <matthew.auld@intel.com>,
	"Roper, Matthew D" <matthew.d.roper@intel.com>,
	 "Souza, Jose" <jose.souza@intel.com>
Cc: "Mrozek, Michal" <michal.mrozek@intel.com>,
	 "intel-xe@lists.freedesktop.org"	
	<intel-xe@lists.freedesktop.org>,
	"Brost, Matthew" <matthew.brost@intel.com>
Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually
Date: Tue, 17 Feb 2026 10:53:19 +0100
Message-ID: <edc9d10bc6cbcf22a1661b535d144a74dd7f1002.camel@linux.intel.com>
In-Reply-To: <SJ1PR11MB620450500088AA17377A9B18816DA@SJ1PR11MB6204.namprd11.prod.outlook.com>

On Tue, 2026-02-17 at 06:19 +0000, Upadhyay, Tejas wrote:
> 
> 
> > -----Original Message-----
> > From: Auld, Matthew <matthew.auld@intel.com>
> > Sent: 16 February 2026 22:12
> > To: Thomas Hellström <thomas.hellstrom@linux.intel.com>; Roper,
> > Matthew
> > D <matthew.d.roper@intel.com>; Souza, Jose <jose.souza@intel.com>
> > Cc: Upadhyay, Tejas <tejas.upadhyay@intel.com>; Mrozek, Michal
> > <michal.mrozek@intel.com>; intel-xe@lists.freedesktop.org; Brost,
> > Matthew
> > <matthew.brost@intel.com>
> > Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo
> > cachelines manually
> > 
> > On 16/02/2026 15:38, Thomas Hellström wrote:
> > > On Mon, 2026-02-16 at 14:55 +0000, Matthew Auld wrote:
> > > > On 16/02/2026 12:07, Thomas Hellström wrote:
> > > > > On Mon, 2026-02-16 at 10:58 +0000, Matthew Auld wrote:
> > > > > > On 16/02/2026 10:23, Thomas Hellström wrote:
> > > > > > > On Fri, 2026-02-13 at 17:31 +0000, Matthew Auld wrote:
> > > > > > > > On 13/02/2026 17:16, Matt Roper wrote:
> > > > > > > > > On Fri, Feb 13, 2026 at 04:48:39PM +0000, Souza, Jose
> > > > > > > > > wrote:
> > > > > > > > > > On Fri, 2026-02-13 at 16:23 +0000, Upadhyay, Tejas
> > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > From: Roper, Matthew D
> > > > > > > > > > > > <matthew.d.roper@intel.com>
> > > > > > > > > > > > Sent: 12 February 2026 02:41
> > > > > > > > > > > > To: Upadhyay, Tejas <tejas.upadhyay@intel.com>
> > > > > > > > > > > > Cc: Brost, Matthew <matthew.brost@intel.com>;
> > > > > > > > > > > > intel-
> > > > > > > > > > > > xe@lists.freedesktop.org; Auld, Matthew
> > > > > > > > > > > > <matthew.auld@intel.com>;
> > > > > > > > > > > > thomas.hellstrom@linux.intel.com
> > > > > > > > > > > > Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush
> > > > > > > > > > > > userptr/shrinker bo cachelines manually
> > > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Feb 11, 2026 at 07:06:05PM +0000,
> > > > > > > > > > > > Upadhyay, Tejas
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > From: Brost, Matthew
> > > > > > > > > > > > > > <matthew.brost@intel.com>
> > > > > > > > > > > > > > Sent: 11 February 2026 05:32
> > > > > > > > > > > > > > To: Roper, Matthew D
> > > > > > > > > > > > > > <matthew.d.roper@intel.com>
> > > > > > > > > > > > > > Cc: Upadhyay, Tejas
> > > > > > > > > > > > > > <tejas.upadhyay@intel.com>;
> > > > > > > > > > > > > > intel-
> > > > > > > > > > > > > > xe@lists.freedesktop.org; Auld, Matthew
> > > > > > > > > > > > > > <matthew.auld@intel.com>;
> > > > > > > > > > > > > > thomas.hellstrom@linux.intel.com
> > > > > > > > > > > > > > Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg:
> > > > > > > > > > > > > > flush
> > > > > > > > > > > > > > userptr/shrinker bo cachelines manually
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Tue, Feb 10, 2026 at 01:05:25PM -0800,
> > > > > > > > > > > > > > Matt Roper
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > On Tue, Feb 10, 2026 at 06:21:22PM +0530,
> > > > > > > > > > > > > > > Tejas Upadhyay
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > "eXtended Architecture" (XA) tagged
> > > > > > > > > > > > > > > > memory (memory shared between the
> > > > > > > > > > > > > > > > CPU and GPU)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I'm pretty sure this expansion of "XA" is
> > > > > > > > > > > > > > > wrong; where are
> > > > > > > > > > > > > > > you seeing this definition?  Everything
> > > > > > > > > > > > > > > in the bspec
> > > > > > > > > > > > > > > indicates that XA means "wb -
> > > > > > > > > > > > > > > transient app" (similar to how "XD" is
> > > > > > > > > > > > > > > "wb - transient display").
> > > > > > > > > > > > > > > I'm not sure why exactly they picked "X"
> > > > > > > > > > > > > > > to refer to
> > > > > > > > > > > > > > > transient in both of these cases, but
> > > > > > > > > > > > > > > I've never seen any
> > > > > > > > > > > > > > > documentation that refers to it as
> > > > > > > > > > > > > > > "extended."
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > is treated differently from other GPU
> > > > > > > > > > > > > > > > memory when the Media engine is
> > > > > > > > > > > > > > > > power-gated.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > XA is *always* flushed, like at the
> > > > > > > > > > > > > > > > end-of-submission (and maybe other
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I assume you're referring to the fact
> > > > > > > > > > > > > > > that the driver
> > > > > > > > > > > > > > > performs flushes at the end of submission
> > > > > > > > > > > > > > > (via
> > > > > > > > > > > > > > > PIPE_CONTROL or MI_FLUSH_DW), and that
> > > > > > > > > > > > > > > depending on other
> > > > > > > > > > > > > > > state/optimizations in the system, those
> > > > > > > > > > > > > > > flushes may flush
> > > > > > > > > > > > > > > the entire device cache, or may only
> > > > > > > > > > > > > > > flush the subset of
> > > > > > > > > > > > > > > cache data that is not marked as
> > > > > > > > > > > > > > > transient.  The way you
> > > > > > > > > > > > > > > worded this was confusing since it makes
> > > > > > > > > > > > > > > it sound like
> > > > > > > > > > > > > > > cache flushes happen automatically
> > > > > > > > > > > > > > > somewhere in hardware/firmware.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > places), just that internally as an
> > > > > > > > > > > > > > > > optimisation hw
> > > > > > > > > > > > > > > > doesn't need to make that a full flush
> > > > > > > > > > > > > > > > (which will also
> > > > > > > > > > > > > > > > include
> > > > > > > > > > > > > > > > XA) when
> > > > > > > > > > > > > > > > Media is off/powergated, since it
> > > > > > > > > > > > > > > > doesn't need to worry
> > > > > > > > > > > > > > > > about GT caches vs Media coherency, and
> > > > > > > > > > > > > > > > only CPU vs GPU
> > > > > > > > > > > > > > > > coherency, so can make that flush a
> > > > > > > > > > > > > > > > targeted XA flush,
> > > > > > > > > > > > > > > > since stuff tagged with XA now means
> > > > > > > > > > > > > > > > it's shared with the
> > > > > > > > > > > > > > > > CPU. The main implication is that we
> > > > > > > > > > > > > > > > now need to somehow
> > > > > > > > > > > > > > > > flush non-XA before freeing system
> > > > > > > > > > > > > > > > memory pages,
> > > > > > > > > > > > > > > > otherwise dirty cachelines could be
> > > > > > > > > > > > > > > > flushed after the
> > > > > > > > > > > > > > > > free (like if Media suddenly turns on
> > > > > > > > > > > > > > > > and does a full
> > > > > > > > > > > > > > > > flush)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > This description seems really confusing. 
> > > > > > > > > > > > > > > My understanding
> > > > > > > > > > > > > > > is that marking something as wb-
> > > > > > > > > > > > > > > transient-app indicates
> > > > > > > > > > > > > > > that it might be accessed by something
> > > > > > > > > > > > > > > other than our
> > > > > > > > > > > > > > > graphics/media IP (i.e., accessed from
> > > > > > > > > > > > > > > the CPU, exported
> > > > > > > > > > > > > > > to another device, etc.), so transient
> > > > > > > > > > > > > > > data truly does
> > > > > > > > > > > > > > > need to be flushed at the points in the
> > > > > > > > > > > > > > > driver where a
> > > > > > > > > > > > > > > flush typically happens.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > However when something is _not_
> > > > > > > > > > > > > > > transient, then
> > > > > > > > > > > > > > > either:
> > > > > > > > > > > > > > >      - it's "private" to the GPU and only
> > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > graphics/media IP will be
> > > > > > > > > > > > > > >        accessing it
> > > > > > > > > > > > > > >      - it's bound with a coherent PAT
> > > > > > > > > > > > > > > index so that
> > > > > > > > > > > > > > > outside observers like
> > > > > > > > > > > > > > >        the CPU can snoop the device
> > > > > > > > > > > > > > > cache, even when the
> > > > > > > > > > > > > > > cache hasn't been
> > > > > > > > > > > > > > >        flushed
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > If media is not active, then there's
> > > > > > > > > > > > > > > really no need to include non-transient
> > > > > > > > > > > > > > > data when a device cache flush happens
> > > > > > > > > > > > > > > since there's no real need for the data
> > > > > > > > > > > > > > > to get to RAM.  So that enables an
> > > > > > > > > > > > > > > optimization (which comes in your next
> > > > > > > > > > > > > > > patch), that allows flushes to only
> > > > > > > > > > > > > > > operate on the subset of the device
> > > > > > > > > > > > > > > cache tagged as "transient" if media is
> > > > > > > > > > > > > > > idle.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > But what if we have stale non-XA marked
> > > > > > > > > > > > > pages for userptr, and that object moves
> > > > > > > > > > > > > out at the same time media comes back?
> > > > > > > > > > > > > That will end up in a full flush, which
> > > > > > > > > > > > > flushes the stale entry to RAM.
> > > > > > > > > > > > 
> > > > > > > > > > > > What makes userptr special here?  During
> > > > > > > > > > > > general, active
> > > > > > > > > > > > usage, userptr would be data that's accessible
> > > > > > > > > > > > by the CPU, so
> > > > > > > > > > > > it needs to either be transient (so CPU can see
> > > > > > > > > > > > the data in
> > > > > > > > > > > > RAM after explicit flushes) or it needs to be
> > > > > > > > > > > > using a
> > > > > > > > > > > > coherent PAT (so that the CPU can just snoop
> > > > > > > > > > > > the GPU cache).
> > > > > > > > > > > > If you marked userptr as both non-XA and
> > > > > > > > > > > > non-coherent, then that sounds likely to
> > > > > > > > > > > > be a userspace bug (and probably
> > > > > > > > > > > > something we can catch and reject as an
> > > > > > > > > > > > invalid case on any Xe3p or later
> > > > > > > > > > > > platforms that support this) since the
> > > > > > > > > > > > CPU wouldn't have any reliable way of
> > > > > > > > > > > > seeing GPU updates.
> > > > > > > > > > > 
> > > > > > > > > > > Right. FYI @Mrozek, Michal @Souza, Jose For
> > > > > > > > > > > userptr, as
> > > > > > > > > > > explained above, it needs to be either coherent
> > > > > > > > > > > or XA pat
> > > > > > > > > > > index, or else KMD will reject as invalid case.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > The coherency restriction is already in the uAPI:
> > > > > > > > > > 
> > > > > > > > > > "Note: For userptr and externally imported dma-buf
> > > > > > > > > > the kernel
> > > > > > > > > > expects either 1WAY or 2WAY for the @pat_index."
> > > > > > > > > > 
> > > > > > > > > > Using 1 way is enough as Xe KMD does a PIPE_CONTROL
> > > > > > > > > > flushing
> > > > > > > > > > GPU caches at the end of batch buffers.
> > > > > > > > > 
> > > > > > > > > But isn't that what we're discussing here?  1-way
> > > > > > > > > *won't*
> > > > > > > > > necessarily be enough anymore because PIPE_CONTROL
> > > > > > > > > instructions
> > > > > > > > > don't flush the entire cache anymore.  Whenever the
> > > > > > > > > GuC
> > > > > > > > > determines that media is inactive and activates the
> > > > > > > > > optimization, PIPE_CONTROL, MI_FLUSH_DW, etc.
> > > > > > > > > change
> > > > > > > > > behavior to only flush out the subset of data that
> > > > > > > > > was marked as
> > > > > > > > > app-transient; anything not marked that way doesn't
> > > > > > > > > get flushed
> > > > > > > > > now.  So there's a new requirement here that you
> > > > > > > > > ensure you're
> > > > > > > > > using an XA PAT index, or you switch to use 2-way
> > > > > > > > > coherency
> > > > > > > > > which will allow the CPU to snoop the GPU's caches.
> > > > > > > > 
> > > > > > > > That exactly matches my understanding also.
> > > > > > > 
> > > > > > > This only ever affects IGFX, right? Since AFAIU we don't
> > > > > > > have
> > > > > > > 2-way coherency with DGFX?
> > > > > > 
> > > > > > Yeah, this should be igpu only. I seem to also recall that
> > > > > > on dgpu,
> > > > > > Media is coherent with l2/l3, but also I don't think system
> > > > > > memory
> > > > > > can be cached in l2/l3 (only VRAM), which I assume is why
> > > > > > there is
> > > > > > the special SMRO (system-memory-read-only) cache only on
> > > > > > dgpu,
> > > > > > which is flushed when the fence signals, unlike the l2/l3.
> > > > > 
> > > > > Yes that sounds reasonable.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > It sounds like the same PAT restriction is needed also
> > > > > > > for
> > > > > > > imported dma-buf, right?
> > > > > > 
> > > > > > Good point. Looks like we are missing that still. Otherwise
> > > > > > we can
> > > > > > run into the same issues with stale l2/l3/ppc.
> > > > > 
> > > > > So if this affects only system memory could we instead of
> > > > > relying on
> > > > > 2- way coherency or XA, just flush at dma unmap time, because
> > > > > that's
> > > > > typically just before releasing the pages.
> > > > 
> > > > Yeah, I think we could make it work, from security pov, similar
> > > > to
> > > > userptr, with the right manual flushes in KMD. Maybe just a
> > > > question
> > > > if userspace wants such a model? Anything cached in l2/l3 might
> > > > require manual flushing by userspace (if that is even
> > > > possible)?
> > > 
> > > So that would mean if user-space wants gpu-cpu coherency at fence
> > > synchronization points, they'd have to use either 2-way or XA pat
> > > indices, but not enforced by KMD.
> > 
> > Yeah, looking at BSpec 74635 (Media off case), I'm only really
> > seeing
> > MEM_SET which userspace could potentially use by itself? But then
> > it's unclear
> > if they mean to actually clear-the-memory (which is not what we
> > want) or using the special evict mode, but that seems to be talking
> > more about
> > flushing to local memory, so not completely sure what that does on
> > igpu. If it's the evict mode then it should in theory be possible
> > for userspace to do a manual flush, but that would have to be done
> > per-bo/vma?
> 
> MEM_SET says, range needs to be specified as part of command with
> evict mode.
> 
> > 
> > > 
> > > For imported dma-buf kernel requires 2-way or XA for security due
> > > to
> > > the relaxed dma-buf unmap.
> > > 
> > > For SVM/System allocator we'd require 2-way or XA.
> > > 
> > > Otherwise KMD security is enforced by flush at dma-unmap time?
> > 
> > Yeah, that is my understanding. Otherwise I don't currently see
> > what prevents
> > the dirty non-XA cache lines being flushed at some random point
> > later, after
> > we have already freed the corresponding system memory, potentially
> > nuking
> > the next user who allocates those pages.
> 
> Hmm, so it means we can drop this patch completely and do something
> like below:
> 
> In xe_migrate_dma_unmap():
> 
> dma_unmap_page()
> if (pat_index != 18 && pat_index != 19 && coh_mode != 2_way)
> 	/* manual_flush */

I think for userptr you'd want to add this in
__vma_userptr_invalidate(), just before drm_gpusvm_unmap_pages().

For bos you'd want to add it in xe_tt_unmap_sg() just before
dma_unmap_sgtable(). But if you want to do it conditionally, you would
need a flag in struct xe_ttm_tt that is set whenever the owning bo is
mapped in such a way that its content may not have been flushed.
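
As a rough illustration only, here is a small userspace model of the
conditional variant, not driver code. All names are hypothetical
stand-ins: struct model_tt reduces struct xe_ttm_tt to the suggested
flag, model_tt_note_bind() models "the owning bo got mapped", and the
logged events replace the real manual flush and dma_unmap_sgtable()
calls. The "not XA and not 2-way" condition is taken from the
discussion above; which PAT indices count as XA is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coherency modes; not the driver's real definitions. */
enum model_coh_mode { MODEL_COH_NONE, MODEL_COH_1WAY, MODEL_COH_2WAY };

/* Hypothetical stand-in for struct xe_ttm_tt, reduced to the flag. */
struct model_tt {
	bool mapped;      /* true while the pages are dma-mapped */
	bool needs_flush; /* set when a mapping may leave unflushed lines */
};

/* Event log so that the flush-before-unmap ordering can be checked. */
enum model_event { EV_FLUSH = 1, EV_UNMAP };
struct model_log { enum model_event ev[4]; int n; };

static void model_log_add(struct model_log *log, enum model_event ev)
{
	assert(log->n < 4);
	log->ev[log->n++] = ev;
}

/* Called whenever the owning bo is mapped: remember risky mappings. */
static void model_tt_note_bind(struct model_tt *tt, bool pat_is_xa,
			       enum model_coh_mode coh)
{
	if (!pat_is_xa && coh != MODEL_COH_2WAY)
		tt->needs_flush = true; /* sticky until the next unmap */
}

/* Models xe_tt_unmap_sg(): flush (if flagged) just before unmapping. */
static void model_tt_unmap(struct model_tt *tt, struct model_log *log)
{
	if (!tt->mapped)
		return;
	if (tt->needs_flush) {
		model_log_add(log, EV_FLUSH); /* placeholder: manual flush */
		tt->needs_flush = false;
	}
	model_log_add(log, EV_UNMAP); /* placeholder: dma_unmap_sgtable() */
	tt->mapped = false;
}
```

The flag is sticky on purpose: a single non-XA, non-2-way mapping is
enough to require the flush, regardless of later XA or 2-way mappings
of the same bo.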

/Thomas

> 
> Tejas
> 
> > 
> > > 
> > > /Thomas
> > > 
> > > > 
> > > > > 
> > > > > The exception, though, is dma-buf where the exporter can
> > > > > actually
> > > > > release memory before all importers have given up their dma-
> > > > > mappings.
> > > > > 
> > > > > /Thomas
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > /Thomas
> > > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Matt
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > If something happens that changes the GTT
> > > > > > > > > > > > mapping of an object, then doesn't that
> > > > > > > > > > > > already trigger a TLB invalidation when
> > > > > > > > > > > > necessary in the driver today?  It was my
> > > > > > > > > > > > understanding that "heavy" TLB
> > > > > > > > > > > > invalidations wait for data values to be
> > > > > > > > > > > > globally observable before starting, so I
> > > > > > > > > > > > think that would ensure that any non-XA
> > > > > > > > > > > > data makes it to RAM before any binding
> > > > > > > > > > > > changes, object destruction, etc.?  Is
> > > > > > > > > > > > there something special about userptr
> > > > > > > > > > > > that makes that case more of a problem?
> > > > > > > > > > > > 
> > > > > > > > > > > > I just found bspec page 74635 which gives
> > > > > > > > > > > > an overview of the various flush and
> > > > > > > > > > > > invalidate cases, and I don't see
> > > > > > > > > > > > anything there that makes it obvious to
> > > > > > > > > > > > me that userptr would be special.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > As you said, we eventually do want to
> > > > > > > > > > > > > > > force a flush of the non-transient data
> > > > > > > > > > > > > > > as well once we're freeing the
> > > > > > > > > > > > > > > underlying pages.  So how do we do that?
> > > > > > > > > > > > > > > It's not clear to me how the changes
> > > > > > > > > > > > > > > below are accomplishing that.  Is there
> > > > > > > > > > > > > > > a way to explicitly request a full
> > > > > > > > > > > > > > > device cache flush (ignoring the
> > > > > > > > > > > > > > > transient vs non-transient tagging)?
> > > > > > > > > > > > > > > Since the GuC handles the optimization
> > > > > > > > > > > > > > > in the next patch (toggling whether
> > > > > > > > > > > > > > > flushes are full flushes vs
> > > > > > > > > > > > > > > non-transient flushes depending on
> > > > > > > > > > > > > > > whether media is active), I thought
> > > > > > > > > > > > > > > there might be some kind of GuC
> > > > > > > > > > > > > > > interface to request "please do one full
> > > > > > > > > > > > > > > flush now, even if media is idle."
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I’m not an expert here by any means, but
> > > > > > > > > > > > > > everything above from Matt seems like
> > > > > > > > > > > > > > valid concerns. Thomas also raised some
> > > > > > > > > > > > > > concerns in the two previous revisions;
> > > > > > > > > > > > > > again I’m not an expert, but reading
> > > > > > > > > > > > > > through those, it doesn’t really seem
> > > > > > > > > > > > > > like he received proper answers to his
> > > > > > > > > > > > > > questions.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It's forcing a flush via the TLB
> > > > > > > > > > > > > invalidation PPC flag under
> > > > > > > > > > > > > xe_invalidate_vma().
> > > > > > > > > > > > 
> > > > > > > > > > > > By the way, what is "PPC?"  It seems like it's
> > > > > > > > > > > > another new synonym for the device cache?
> > > > > > > > > > > > It's already really confusing that some of
> > > > > > > > > > > > our hardware docs use a mix of both "L2" and
> > > > > > > > > > > > "L3" to refer to the same device cache for
> > > > > > > > > > > > historical reasons...
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Matt
> > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > A couple of comments below.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Matt
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > V2(MattA): Expand commit description
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Signed-off-by: Tejas Upadhyay
> > > > > > > > > > > > > > > > <tejas.upadhyay@intel.com>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  drivers/gpu/drm/xe/xe_bo.c      |  3 ++-
> > > > > > > > > > > > > > > >  drivers/gpu/drm/xe/xe_device.c  | 23 +++++++++++++++++++++++
> > > > > > > > > > > > > > > >  drivers/gpu/drm/xe/xe_device.h  |  1 +
> > > > > > > > > > > > > > > >  drivers/gpu/drm/xe/xe_userptr.c |  3 ++-
> > > > > > > > > > > > > > > >  4 files changed, 28 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > > > > > > > > > index e9180b01a4e4..4455886b211e 100644
> > > > > > > > > > > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > > > > > > > > > @@ -689,7 +689,8 @@ static int xe_bo_trigger_rebind(struct xe_device *xe, struct xe_bo *bo,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  		if (!xe_vm_in_fault_mode(vm)) {
> > > > > > > > > > > > > > > >  			drm_gpuvm_bo_evict(vm_bo, true);
> > > > > > > > > > > > > > > > -			continue;
> > > > > > > > > > > > > > > > +			if (!xe_device_needs_cache_flush(xe))
> > > > > > > > > > > > > > > > +				continue;
> > > > > > > > > > > 
> > > > > > > > > > > Matt R,
> > > > > > > > > > > This flush will still be needed, as there
> > > > > > > > > > > can be non-XA buffers which are evicted
> > > > > > > > > > > while media is off, and their stale entries
> > > > > > > > > > > can be flushed when media comes back on.
> > > > > > > > > > > That was not the case earlier, since a full
> > > > > > > > > > > flush was happening at regular sync points,
> > > > > > > > > > > and that is where this feature brings the
> > > > > > > > > > > optimization now.
> > > > > > > > > > > 
> > > > > > > > > > > Tejas
> > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This will trigger a TLB invalidation (and I
> > > > > > > > > > > > > > assume a cache flush) every time we move or
> > > > > > > > > > > > > > free memory in the 3D stack if it has a
> > > > > > > > > > > > > > binding. It also performs a synchronous
> > > > > > > > > > > > > > wait on the BO being idle. Both of these
> > > > > > > > > > > > > > are very expensive operations. I can’t
> > > > > > > > > > > > > > imagine the granularity we want here is to
> > > > > > > > > > > > > > do this on every move/free with bindings.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also, for LR compute with preempt fences,
> > > > > > > > > > > > > > we would trigger the preempt fences during
> > > > > > > > > > > > > > the wait, so a TLB invalidation after this
> > > > > > > > > > > > > > seems unnecessary, though perhaps the cache
> > > > > > > > > > > > > > flush is still required?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think this needs a bit more explanation,
> > > > > > > > > > > > > > because without knowing a lot about the
> > > > > > > > > > > > > > exact requirements, the implementation does
> > > > > > > > > > > > > > not look correct.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The thing is that we are trying to solve a
> > > > > > > > > > > > > problem with userptr with a non-XA PAT:
> > > > > > > > > > > > > consider that BO being moved while media is
> > > > > > > > > > > > > not active. As soon as media becomes active
> > > > > > > > > > > > > again, stale cached entries of that object
> > > > > > > > > > > > > will be flushed as part of a full flush,
> > > > > > > > > > > > > which may corrupt things.
> > > > > > > > > > > > > The thinking was that this patch would at
> > > > > > > > > > > > > least solve the corruption problem, and that
> > > > > > > > > > > > > the page reclamation feature, once it
> > > > > > > > > > > > > landed, would help with performance as well.
> > > > > > > > > > > > > But now that page reclamation has been
> > > > > > > > > > > > > merged earlier and is tightly coupled with
> > > > > > > > > > > > > bind/unbind, some cases like the ones
> > > > > > > > > > > > > discussed above (which do not unbind
> > > > > > > > > > > > > immediately on move/free) are missed by
> > > > > > > > > > > > > reclamation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So the thought was to let this solution go
> > > > > > > > > > > > > in with a small perf hit and discuss with
> > > > > > > > > > > > > the page reclamation owner to come up with a
> > > > > > > > > > > > > cleaner solution together.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Tejas
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > >  		}
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  		if (!idle) {
> > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > > > > > > > > > > > > > > > index 743c18e0c580..da2abed94bc0 100644
> > > > > > > > > > > > > > > > --- a/drivers/gpu/drm/xe/xe_device.c
> > > > > > > > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_device.c
> > > > > > > > > > > > > > > > @@ -1097,6 +1097,29 @@ static void tdf_request_sync(struct xe_device *xe)
> > > > > > > > > > > > > > > >  	}
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > +/**
> > > > > > > > > > > > > > > > > + * xe_device_needs_cache_flush - Whether the cache needs to be flushed
> > > > > > > > > > > > > > > > > + * @xe: The device to check.
> > > > > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > > > > + * Return: true if the device needs cache flush, false otherwise.
> > > > > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > > > > +bool xe_device_needs_cache_flush(struct xe_device *xe)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +	/* XA is *always* flushed, like at the end-of-submission (and maybe other
> > > > > > > > > > > > > > > > > +	 * places), just that internally as an optimisation hw doesn't need to make
> > > > > > > > > > > > > > > > > +	 * that a full flush (which will also include XA) when Media is
> > > > > > > > > > > > > > > > > +	 * off/powergated, since it doesn't need to worry about GT caches vs Media
> > > > > > > > > > > > > > > > > +	 * coherency, and only CPU vs GPU coherency, so can make that flush a
> > > > > > > > > > > > > > > > > +	 * targeted XA flush, since stuff tagged with XA now means it's shared with
> > > > > > > > > > > > > > > > > +	 * the CPU. The main implication is that we now need to somehow flush non-XA before
> > > > > > > > > > > > > > > > > +	 * freeing system memory pages, otherwise dirty cachelines could be flushed after the free
> > > > > > > > > > > > > > > > > +	 * (like if Media suddenly turns on and does a full flush)
> > > > > > > > > > > > > > > > > +	 */
> > > > > > > > > > > > > > > > > +	if (GRAPHICS_VER(xe) >= 35 && !IS_DGFX(xe))
> > > > > > > > > > > > > > > > > +		return true;
> > > > > > > > > > > > > > > > > +	return false;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >      void xe_device_l2_flush(struct xe_device *xe)
> > > > > > > > > > > > > > > > >      {
> > > > > > > > > > > > > > > > >      	struct xe_gt *gt;
> > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > > > > > > > > > > > > > > > > index 39464650533b..baf386e0e037 100644
> > > > > > > > > > > > > > > > > --- a/drivers/gpu/drm/xe/xe_device.h
> > > > > > > > > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_device.h
> > > > > > > > > > > > > > > > > @@ -184,6 +184,7 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
> > > > > > > > > > > > > > > > >      u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
> > > > > > > > > > > > > > > > >      u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > +bool xe_device_needs_cache_flush(struct xe_device *xe);
> > > > > > > > > > > > > > > > >      void xe_device_td_flush(struct xe_device *xe);
> > > > > > > > > > > > > > > > >      void xe_device_l2_flush(struct xe_device *xe);
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userptr.c
> > > > > > > > > > > > > > > > > index e120323c43bc..b435ea7f9b66 100644
> > > > > > > > > > > > > > > > > --- a/drivers/gpu/drm/xe/xe_userptr.c
> > > > > > > > > > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_userptr.c
> > > > > > > > > > > > > > > > > @@ -114,7 +114,8 @@ static void __vma_userptr_invalidate(struct xe_vm *vm, struct xe_userptr_vma *uv
> > > > > > > > > > > > > > > > >      				    false, MAX_SCHEDULE_TIMEOUT);
> > > > > > > > > > > > > > > > >      	XE_WARN_ON(err <= 0);
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > -	if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> > > > > > > > > > > > > > > > > +	if ((xe_vm_in_fault_mode(vm) || xe_device_needs_cache_flush(vm->xe)) &&
> > > > > > > > > > > > > > > > > +	    userptr->initial_bind) {
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Same concern with the LR preempt fence as above: the hardware will be
> > > > > > > > > > > > > > > interrupted via preempt fences, so it doesn’t seem necessary to
> > > > > > > > > > > > > > > invalidate the TLBs, but perhaps we need a cflush and TLB invalidation
> > > > > > > > > > > > > > > is the mechanism for that too?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Matt
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > >      		err = xe_vm_invalidate_vma(vma);
> > > > > > > > > > > > > > > >      		XE_WARN_ON(err);
> > > > > > > > > > > > > > > >      	}
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.52.0
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Matt Roper
> > > > > > > > > > > > > > > Graphics Software Engineer
> > > > > > > > > > > > > > > Linux GPU Platform Enablement
> > > > > > > > > > > > > > > Intel Corporation
> > > > > > > > > > > > 
> > > > > > > > > > > > --
> > > > > > > > > > > > Matt Roper
> > > > > > > > > > > > Graphics Software Engineer
> > > > > > > > > > > > Linux GPU Platform Enablement
> > > > > > > > > > > > Intel Corporation
> > > > > > > > >

Thread overview: 42+ messages
2026-02-10 12:51 [PATCH 0/3] drm/xe/xe3p_lpg: L2 flush optimization Tejas Upadhyay
2026-02-10 12:51 ` [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually Tejas Upadhyay
2026-02-10 21:05   ` Matt Roper
2026-02-11  0:02     ` Matthew Brost
2026-02-11 19:06       ` Upadhyay, Tejas
2026-02-11 21:11         ` Matt Roper
2026-02-12  9:53           ` Matthew Auld
2026-02-13 11:17             ` Upadhyay, Tejas
2026-02-13 13:27               ` Matthew Auld
2026-02-13 13:30                 ` Souza, Jose
2026-02-13 16:23           ` Upadhyay, Tejas
2026-02-13 16:48             ` Souza, Jose
2026-02-13 17:16               ` Matt Roper
2026-02-13 17:31                 ` Souza, Jose
2026-02-13 17:31                 ` Matthew Auld
2026-02-16 10:23                   ` Thomas Hellström
2026-02-16 10:58                     ` Matthew Auld
2026-02-16 12:07                       ` Thomas Hellström
2026-02-16 14:55                         ` Matthew Auld
2026-02-16 15:38                           ` Thomas Hellström
2026-02-16 16:41                             ` Matthew Auld
2026-02-17  6:19                               ` Upadhyay, Tejas
2026-02-17  9:53                                 ` Thomas Hellström [this message]
2026-02-17 17:04                               ` Thomas Hellström
2026-02-17 18:41                                 ` Matthew Auld
2026-02-16 10:56             ` Thomas Hellström
2026-02-16 11:26               ` Upadhyay, Tejas
2026-02-13 17:29           ` Matthew Auld
2026-02-10 12:51 ` [PATCH 2/3] drm/xe/xe3p_lpg: Enable L2 flush optimization feature Tejas Upadhyay
2026-02-10 12:51 ` [PATCH 3/3] drm/xe/xe3p: Skip TD flush Tejas Upadhyay
2026-02-10 21:22   ` Matt Roper
2026-02-13 11:06     ` Upadhyay, Tejas
2026-02-10 13:35 ` ✗ CI.checkpatch: warning for drm/xe/xe3p_lpg: L2 flush optimization (rev2) Patchwork
2026-02-10 13:36 ` ✓ CI.KUnit: success " Patchwork
2026-02-10 14:33 ` ✗ Xe.CI.BAT: failure " Patchwork
2026-02-10 18:06 ` ✗ Xe.CI.FULL: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2025-11-25  9:43 [PATCH 0/3] drm/xe/xe3p_lpg: L2 flush optimization Tejas Upadhyay
2025-11-25  9:43 ` [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually Tejas Upadhyay
2025-11-25 10:17   ` Matthew Auld
2025-11-25 13:39     ` Souza, Jose
2025-11-25 15:06   ` Thomas Hellström
2025-11-25 15:31     ` Upadhyay, Tejas
2025-11-26 10:26       ` Thomas Hellström
