From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 680C9E9A036 for ; Tue, 17 Feb 2026 18:41:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 126F010E52E; Tue, 17 Feb 2026 18:41:13 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="R7Hszp8W"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1C2D810E52C for ; Tue, 17 Feb 2026 18:41:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1771353671; x=1802889671; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=Bl7DJWntrqLiVmGyKQHSqYTYifq3BxlJUrDxOGJp9v8=; b=R7Hszp8W7f3nKOsH93Hq3vYqhVLgvCXIOkQ17I/7dYWr0aF3m4rh9qEP Gk8sGFxtUadzEFZAiTradmRIC+yF4xIOOgDOiuic1yMzEhoALqnWaVQtJ sVU8NjUuzYWlPn0tOvMWwP+CI6GvoslpW4B45R2Pzev4SwGNBA7vB4Lz0 yev2jS9rpf/nRwCQS5bqF2XYxDHYp7MsHCG7q6wfiGFw6VRqX7WNauig3 z+p30xHeUFiEJPV3FAmmIherLmu24CbBCay3Vs2fVCJ/Xpj4KfHEBgpbo UyiKyiWLet3WpGkmcp1iCJeGB1T/Cwnjx5uU10d5ctSxsV4WCXzazUQq6 A==; X-CSE-ConnectionGUID: U2GIwHW/S3OZb/kFQuDZmg== X-CSE-MsgGUID: vzu5cFifTa+zucjJebsm4g== X-IronPort-AV: E=McAfee;i="6800,10657,11704"; a="75029824" X-IronPort-AV: E=Sophos;i="6.21,296,1763452800"; d="scan'208";a="75029824" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Feb 2026 10:41:11 -0800 X-CSE-ConnectionGUID: cDvwFmelTpird5r2b3wKgQ== X-CSE-MsgGUID: a7A7a6LGQIGO/+aHIm52Gw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.21,296,1763452800"; d="scan'208";a="236966671" Received: from klitkey1-mobl1.ger.corp.intel.com (HELO [10.245.245.125]) ([10.245.245.125]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Feb 2026 10:41:08 -0800 Message-ID: Date: Tue, 17 Feb 2026 18:41:06 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush userptr/shrinker bo cachelines manually To: =?UTF-8?Q?Thomas_Hellstr=C3=B6m?= , Matt Roper , "Souza, Jose" Cc: "Upadhyay, Tejas" , "Mrozek, Michal" , "intel-xe@lists.freedesktop.org" , "Brost, Matthew" References: <20260210125120.1329411-5-tejas.upadhyay@intel.com> <20260210210525.GC4694@mdroper-desk1.amr.corp.intel.com> <20260211211125.GL4694@mdroper-desk1.amr.corp.intel.com> <20260213171638.GC52346@mdroper-desk1.amr.corp.intel.com> <75dcc80b39ed33a7abc620b2614b0e81586a6299.camel@linux.intel.com> <8ce35a23-b639-4c4f-acfd-993c4f9d5008@intel.com> <9ce0acf02e8ef63bf81cfbb9e1053bfa1437362f.camel@linux.intel.com> <3da2dfce-9bbf-42cd-a178-24a6c44b18c3@intel.com> <0d069a41599695c4c8fa0093fb1a4edb4017e600.camel@linux.intel.com> <9bf23ecf67067f363ab0115b6a18614a8f5258f7.camel@linux.intel.com> Content-Language: en-GB From: Matthew Auld In-Reply-To: <9bf23ecf67067f363ab0115b6a18614a8f5258f7.camel@linux.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On 17/02/2026 17:04, Thomas Hellström wrote: > On Mon, 2026-02-16 at 16:41 +0000, Matthew Auld wrote: >> On 16/02/2026 15:38, Thomas Hellström wrote: >>> On Mon, 2026-02-16 at 14:55 +0000, Matthew Auld wrote: >>>> On 16/02/2026 12:07, Thomas Hellström wrote: >>>>> On Mon, 2026-02-16 at 10:58 +0000, Matthew Auld wrote: >>>>>> On 16/02/2026 10:23, Thomas Hellström wrote: >>>>>>> On Fri, 2026-02-13 at 17:31 +0000, Matthew Auld wrote: >>>>>>>> On 13/02/2026 17:16, Matt Roper wrote: >>>>>>>>> On Fri, Feb 13, 2026 at 04:48:39PM +0000, Souza, Jose >>>>>>>>> wrote: >>>>>>>>>> On Fri, 2026-02-13 at 16:23 +0000, Upadhyay, Tejas >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: Roper, Matthew D >>>>>>>>>>>> >>>>>>>>>>>> Sent: 12 February 2026 02:41 >>>>>>>>>>>> To: Upadhyay, Tejas >>>>>>>>>>>> Cc: Brost, Matthew ; >>>>>>>>>>>> intel- >>>>>>>>>>>> xe@lists.freedesktop.org; Auld, Matthew >>>>>>>>>>>> ; >>>>>>>>>>>> thomas.hellstrom@linux.intel.com >>>>>>>>>>>> Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: flush >>>>>>>>>>>> userptr/shrinker bo >>>>>>>>>>>> cachelines manually >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Feb 11, 2026 at 07:06:05PM +0000, >>>>>>>>>>>> Upadhyay, >>>>>>>>>>>> Tejas >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>> From: Brost, Matthew >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sent: 11 February 2026 05:32 >>>>>>>>>>>>>> To: Roper, Matthew D >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cc: Upadhyay, Tejas >>>>>>>>>>>>>> ; >>>>>>>>>>>>>> intel- >>>>>>>>>>>>>> xe@lists.freedesktop.org; Auld, Matthew >>>>>>>>>>>>>> ; >>>>>>>>>>>>>> thomas.hellstrom@linux.intel.com >>>>>>>>>>>>>> Subject: Re: [PATCH 1/3] drm/xe/xe3p_lpg: >>>>>>>>>>>>>> flush >>>>>>>>>>>>>> userptr/shrinker bo >>>>>>>>>>>>>> cachelines manually >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Feb 10, 2026 at 01:05:25PM -0800, >>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> Roper >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Tue, Feb 10, 2026 at 06:21:22PM +0530, >>>>>>>>>>>>>>> Tejas >>>>>>>>>>>>>>> Upadhyay >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> "eXtended Architecture" (XA) tagged >>>>>>>>>>>>>>>> memory—memory >>>>>>>>>>>>>>>> shared >>>>>>>>>>>> between >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> CPU and GPU >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm pretty sure this expansion of "XA" is >>>>>>>>>>>>>>> wrong; >>>>>>>>>>>>>>> where >>>>>>>>>>>>>>> are >>>>>>>>>>>>>>> you >>>>>>>>>>>>>>> seeing this definition?  Everything in the >>>>>>>>>>>>>>> bspec >>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>> that XA >>>>>>>>>>>>>>> means "wb >>>>>>>>>>>>>>> - transient app" (similar to how "XD" is >>>>>>>>>>>>>>> 'wb - >>>>>>>>>>>>>>> transient >>>>>>>>>>>>>>> display"). >>>>>>>>>>>>>>> I'm not sure why exactly they picked "X" to >>>>>>>>>>>>>>> refer >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> transient in >>>>>>>>>>>>>>> both of these cases, but I've never seen >>>>>>>>>>>>>>> any >>>>>>>>>>>>>>> documentation >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> refers to it as "extended." >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is treated differently from other GPU >>>>>>>>>>>>>>>> memory >>>>>>>>>>>>>>>> when >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> Media >>>>>>>>>>>>>>>> engine is >>>>>>>>>>>>>> power-gated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> XA is *always* flushed, like at the end- >>>>>>>>>>>>>>>> of- >>>>>>>>>>>>>>>> submssion >>>>>>>>>>>>>>>> (and >>>>>>>>>>>>>>>> maybe >>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I assume you're referring to the fact that >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> driver >>>>>>>>>>>>>>> performs >>>>>>>>>>>>>>> flushes at the end of submission (via >>>>>>>>>>>>>>> PIPE_CONTROL >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> MI_FLUSH_DW), and that depending on other >>>>>>>>>>>>>>> state/optimizations >>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> the system, those flushes may flush the >>>>>>>>>>>>>>> entire >>>>>>>>>>>>>>> device >>>>>>>>>>>>>>> cache, >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> may only flush the subset of cache data >>>>>>>>>>>>>>> that is >>>>>>>>>>>>>>> not >>>>>>>>>>>>>>> marked as >>>>>>>>>>>>>>> transient.  The way you worded this was >>>>>>>>>>>>>>> confusing >>>>>>>>>>>>>>> since >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> makes >>>>>>>>>>>>>>> it sound like cache flushes happen >>>>>>>>>>>>>>> automatically >>>>>>>>>>>>>>> somewhere in >>>>>>>>>>>> hardware/firmware. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> places), just that internally as an >>>>>>>>>>>>>>>> optimisation >>>>>>>>>>>>>>>> hw >>>>>>>>>>>>>>>> doesn't >>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>> to make that a full flush (which will >>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>> include >>>>>>>>>>>>>>>> XA) when >>>>>>>>>>>>>>>> Media is off/powergated, since it doesn't >>>>>>>>>>>>>>>> need to >>>>>>>>>>>>>>>> worry >>>>>>>>>>>>>>>> about GT >>>>>>>>>>>>>>>> caches vs Media coherency, and only CPU >>>>>>>>>>>>>>>> vs >>>>>>>>>>>>>>>> GPU >>>>>>>>>>>>>>>> coherency, >>>>>>>>>>>>>>>> so can >>>>>>>>>>>>>>>> make that flush a targeted XA flush, >>>>>>>>>>>>>>>> since >>>>>>>>>>>>>>>> stuff >>>>>>>>>>>>>>>> tagged >>>>>>>>>>>>>>>> with XA >>>>>>>>>>>>>>>> now means it's shared with the CPU. The >>>>>>>>>>>>>>>> main >>>>>>>>>>>>>>>> implication is >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> we now need to somehow flush non-XA >>>>>>>>>>>>>>>> before >>>>>>>>>>>>>>>> freeing >>>>>>>>>>>>>>>> system >>>>>>>>>>>>>>>> memory >>>>>>>>>>>>>>>> pages, otherwise dirty cachelines could >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> flushed >>>>>>>>>>>>>>>> after >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> free (like if Media suddenly turns on and >>>>>>>>>>>>>>>> does a >>>>>>>>>>>>>>>> full >>>>>>>>>>>>>>>> flush) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This description seems really confusing. >>>>>>>>>>>>>>> My >>>>>>>>>>>>>>> understanding is >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> marking something as wb-transient-app >>>>>>>>>>>>>>> indicates >>>>>>>>>>>>>>> that it >>>>>>>>>>>>>>> might >>>>>>>>>>>>>>> be >>>>>>>>>>>>>>> accessed by something other than our >>>>>>>>>>>>>>> graphics/media >>>>>>>>>>>>>>> IP >>>>>>>>>>>>>>> (i.e., >>>>>>>>>>>>>>> accessed from the CPU, exported to another >>>>>>>>>>>>>>> device, >>>>>>>>>>>>>>> etc.), so >>>>>>>>>>>>>>> transient data truly does need to be >>>>>>>>>>>>>>> flushed at >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> points in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> driver where a flush typically happens. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> However when something is _not_ transient, >>>>>>>>>>>>>>> then >>>>>>>>>>>>>>> either: >>>>>>>>>>>>>>>      - it's "private" to the GPU and only >>>>>>>>>>>>>>> our >>>>>>>>>>>>>>> graphics/media IP >>>>>>>>>>>>>>> will be >>>>>>>>>>>>>>>        accessing it >>>>>>>>>>>>>>>      - it's bound with a coherent PAT index >>>>>>>>>>>>>>> so >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> outside >>>>>>>>>>>>>>> observers like >>>>>>>>>>>>>>>        the CPU can snoop the device cache, >>>>>>>>>>>>>>> even >>>>>>>>>>>>>>> when >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>> hasn't been >>>>>>>>>>>>>>>        flushed >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If media is not active, then there's really >>>>>>>>>>>>>>> no >>>>>>>>>>>>>>> need >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> include >>>>>>>>>>>>>>> non-transient data when an device cache >>>>>>>>>>>>>>> flush >>>>>>>>>>>>>>> happens >>>>>>>>>>>>>>> since >>>>>>>>>>>>>>> there's no real need for the data to get to >>>>>>>>>>>>>>> RAM. >>>>>>>>>>>>>>> So >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> enables >>>>>>>>>>>>>>> an optimization (which comes in your next >>>>>>>>>>>>>>> patch), >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> allows >>>>>>>>>>>>>>> flushes to only operate on the subset of >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> device >>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>> tagged as >>>>>>>>>>>> "transient" if media is idle. >>>>>>>>>>>>> >>>>>>>>>>>>> But what If we have stale non-XA marked pages >>>>>>>>>>>>> for >>>>>>>>>>>>> userptr, >>>>>>>>>>>>> and >>>>>>>>>>>>> that >>>>>>>>>>>>> object moves out and at the same time media >>>>>>>>>>>>> comes >>>>>>>>>>>>> back, >>>>>>>>>>>>> will end >>>>>>>>>>>>> up in >>>>>>>>>>>>> full flush and flush the stale entry to RAM. >>>>>>>>>>>> >>>>>>>>>>>> What makes userptr special here?  During general, >>>>>>>>>>>> active >>>>>>>>>>>> usage, >>>>>>>>>>>> userptr would >>>>>>>>>>>> be data that's accessible by the CPU, so it needs >>>>>>>>>>>> to >>>>>>>>>>>> either >>>>>>>>>>>> be >>>>>>>>>>>> transient (so CPU >>>>>>>>>>>> can see the data in RAM after explicit flushes) >>>>>>>>>>>> or it >>>>>>>>>>>> needs >>>>>>>>>>>> to be >>>>>>>>>>>> using a >>>>>>>>>>>> coherent PAT (so that the CPU can just snoop the >>>>>>>>>>>> GPU >>>>>>>>>>>> cache). >>>>>>>>>>>> If >>>>>>>>>>>> you marked >>>>>>>>>>>> userptr as both non-XA and non-coherent, then >>>>>>>>>>>> that >>>>>>>>>>>> sounds >>>>>>>>>>>> likely to >>>>>>>>>>>> be a >>>>>>>>>>>> userspace bug (and probably something we can >>>>>>>>>>>> catch >>>>>>>>>>>> and >>>>>>>>>>>> reject >>>>>>>>>>>> as an >>>>>>>>>>>> invalid >>>>>>>>>>>> case on any Xe3p or later platforms that support >>>>>>>>>>>> this) >>>>>>>>>>>> since >>>>>>>>>>>> the >>>>>>>>>>>> CPU wouldn't >>>>>>>>>>>> have any reliable way of seeing GPU updates. >>>>>>>>>>> >>>>>>>>>>> Right. FYI @Mrozek, Michal @Souza, Jose >>>>>>>>>>> For userptr, as explained above, it needs to be >>>>>>>>>>> either >>>>>>>>>>> coherent >>>>>>>>>>> or XA >>>>>>>>>>> pat index, or else KMD will reject as invalid case. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> The coherency restriction is already in the uAPI: >>>>>>>>>> >>>>>>>>>> "Note: For userptr and externally imported dma-buf >>>>>>>>>> the >>>>>>>>>> kernel >>>>>>>>>> expects >>>>>>>>>> either 1WAY or 2WAY for the @pat_index." >>>>>>>>>> >>>>>>>>>> Using 1 way is enough as Xe KMD does a PIPE_CONTROL >>>>>>>>>> flushing >>>>>>>>>> GPU >>>>>>>>>> caches >>>>>>>>>> at the end of batch buffers. >>>>>>>>> >>>>>>>>> But isn't that what we're discussing here?  1-way >>>>>>>>> *won't* >>>>>>>>> necessarily be >>>>>>>>> enough anymore because PIPE_CONTROL instructions don't >>>>>>>>> flush >>>>>>>>> the >>>>>>>>> entire >>>>>>>>> cache anymore.  Whenever the GuC determines that media >>>>>>>>> is >>>>>>>>> inactive >>>>>>>>> and >>>>>>>>> activates the optimization, PIPE_CONTROL, MI_FLUSH_DW, >>>>>>>>> etc. >>>>>>>>> change >>>>>>>>> behavior to only flush out the subset of data that was >>>>>>>>> marked >>>>>>>>> as >>>>>>>>> app-transient; anything not marked that way doesn't get >>>>>>>>> flushed >>>>>>>>> now.  So >>>>>>>>> there's a new requirement here that you ensure you're >>>>>>>>> using >>>>>>>>> an >>>>>>>>> XA >>>>>>>>> PAT >>>>>>>>> index, or you switch to use 2-way coherency which will >>>>>>>>> allow >>>>>>>>> the >>>>>>>>> CPU to >>>>>>>>> snoop the GPU's caches. >>>>>>>> >>>>>>>> That exactly matches my understanding also. >>>>>>> >>>>>>> This only ever affects IGFX, right? Since AFAIU we don't >>>>>>> have >>>>>>> 2-way >>>>>>> coherency with DGFX? >>>>>> >>>>>> Yeah, this should be igpu only. I seem to also recall that on >>>>>> dgpu, >>>>>> Media is coherent with l2/l3, but also I don't think system >>>>>> memory >>>>>> can >>>>>> be cached in l2/l3 (only VRAM), which I assume is why there >>>>>> is >>>>>> the >>>>>> special SMRO (system-memory-read-only) cache only on dgpu, >>>>>> which >>>>>> is >>>>>> flushed when the fence signals, unlike the l2/l3. >>>>> >>>>> Yes that sounds reasonable. >>>>> >>>>>> >>>>>>> >>>>>>> It sounds like the same PAT restriction is needed also for >>>>>>> imported >>>>>>> dma-buf, right? >>>>>> >>>>>> Good point. Looks like we are missing that still. Otherwise >>>>>> we >>>>>> can >>>>>> run >>>>>> into the same issues with stale l2/l3/ppc. >>>>> >>>>> So if this affects only system memory could we instead of >>>>> relying >>>>> on 2- >>>>> way coherency or XA, just flush at dma unmap time, because >>>>> that's >>>>> typically just before releasing the pages. >>>> >>>> Yeah, I think we could make it work, from security pov, similar >>>> to >>>> userptr, with the right manual flushes in KMD. Maybe just a >>>> question >>>> if >>>> userspace wants such a model? Anything cached in l2/l3 might >>>> require >>>> manual flushing by userspace (if that is even possible)? >>> >>> So that would mean if user-space wants gpu-cpu coherency at fence >>> synchronization points, they'd have to use either 2-way or XA pat >>> indices, but not enforced by KMD. >> >> Yeah, looking at BSpec 74635 (Media off case), I'm only really seeing >> MEM_SET which userspace could potentially use by itself? But then >> it's >> unclear if they mean to actually clear-the-memory (which is not what >> we >> want) or using the special evict mode, but that seems to be talking >> more >> about flushing to local memory, so not completely sure what that does >> on >> igpu. If it's the evict mode then should in theory be possible for >> userpace to do a manual flush, but that would have to be done per- >> bo/vma? >> >>> >>> For imported dma-buf kernel requires 2-way or XA for security due >>> to >>> the relaxed dma-buf unmap. >>> >>> For SVM/System allocator we'd require 2-way or XA. >>> >>> Otherwise KMD security is enforced by flush at dma-unmap time? >> >> Yeah, that is my understanding. Otherwise I don't currently see what >> prevents the dirty non-XA cache lines being flushed at some random >> point >> later, after we have already freed the corresponding system memory, >> potentially nuking the next user who allocates those pages. > > So I've discussed a bit more with Tejas and since the virtual addresses > are needed for the flush, flushing at dma-unmap time doesn't really > work. And since this is IGFX only, where we sync on moves, a flush in > xe_bo_trigger_rebind() should be completely ok, at least until affected > DGFX occurs, where we might want to look at async TLB flushes. > > And for simplicitly then go for the PAT restriction also for userptr, > svm and imported dma-buf. > > Thoughts? Yeah agreed, I think that should be good enough and will hopefully cover all the missing cases. Otherwise maybe we could somehow use a known dummy address range from the migrate vm or something, and use that to flush the PPC from unmap? I assume the address range doesn't actually matter if we just care about flushing entire PPC? There is also the xe_page_reclaim stuff for a targeted flush instead of nuking entire PPC, which doesn't seem to need ppgtt virtual address, just a list of physical page addresses, which will be a lot better for the smaller BOs. But maybe this is more follow up stuff. > > Thomas > >> >>> >>> /Thomas >>> >>>> >>>>> >>>>> The exception, though, is dma-buf where the exporter can >>>>> actually >>>>> release memory before all importers have given up their dma- >>>>> mappings. >>>>> >>>>> /Thomas >>>>> >>>>>> >>>>>>> >>>>>>> /Thomas >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> If something happens that changes the GTT mapping >>>>>>>>>>>> of >>>>>>>>>>>> an >>>>>>>>>>>> object, >>>>>>>>>>>> then >>>>>>>>>>>> doesn't that already trigger a TLB invalidation >>>>>>>>>>>> when >>>>>>>>>>>> necessary in >>>>>>>>>>>> the driver >>>>>>>>>>>> today?  It was my understanding that "heavy" TLB >>>>>>>>>>>> invalidations wait >>>>>>>>>>>> for data >>>>>>>>>>>> values to be globally observable before starting, >>>>>>>>>>>> so >>>>>>>>>>>> I >>>>>>>>>>>> think >>>>>>>>>>>> that >>>>>>>>>>>> would ensure >>>>>>>>>>>> that any non-XA data makes it to RAM before any >>>>>>>>>>>> binding >>>>>>>>>>>> changes, >>>>>>>>>>>> object, >>>>>>>>>>>> destruction, etc.?  Is there something special >>>>>>>>>>>> about >>>>>>>>>>>> userptr >>>>>>>>>>>> that >>>>>>>>>>>> makes that >>>>>>>>>>>> case more of a problem? >>>>>>>>>>>> >>>>>>>>>>>> I just found bspec page 74635 which gives an >>>>>>>>>>>> overview >>>>>>>>>>>> of >>>>>>>>>>>> the >>>>>>>>>>>> various flush >>>>>>>>>>>> and invalidate cases, and I don't see anything >>>>>>>>>>>> there >>>>>>>>>>>> that >>>>>>>>>>>> makes it >>>>>>>>>>>> obvious to >>>>>>>>>>>> me that userptr would be special. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As you said, we eventually do want to force >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>> flush >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> non-transient data as well once we're >>>>>>>>>>>>>>> freeing >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> underlying >>>>>>>>>>>>>>> pages. >>>>>>>>>>>>>>> So how do we do that?  It's not clear to me >>>>>>>>>>>>>>> how >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> changes >>>>>>>>>>>>>>> below >>>>>>>>>>>>>>> are accomplishing that.  Is there a way to >>>>>>>>>>>>>>> explicitly >>>>>>>>>>>>>>> request >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>> full device cache flush (ignoring the >>>>>>>>>>>>>>> transient >>>>>>>>>>>>>>> vs >>>>>>>>>>>>>>> non- >>>>>>>>>>>>>>> transient tagging)? >>>>>>>>>>>>>>> Since the GuC handles the optimization in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> next >>>>>>>>>>>>>>> patch >>>>>>>>>>>>>>> (toggling >>>>>>>>>>>>>>> whether flushes are full flushes vs non- >>>>>>>>>>>>>>> transient >>>>>>>>>>>>>>> flushes >>>>>>>>>>>>>>> depending on whether media is active), I >>>>>>>>>>>>>>> thought >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> might >>>>>>>>>>>>>>> be >>>>>>>>>>>>>>> some kind of GuC interface to request >>>>>>>>>>>>>>> "please >>>>>>>>>>>>>>> do >>>>>>>>>>>>>>> one >>>>>>>>>>>>>>> full >>>>>>>>>>>>>>> flush now, even >>>>>>>>>>>> if media is idle." >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I’m not an expert here by any means, but >>>>>>>>>>>>>> everything >>>>>>>>>>>>>> above >>>>>>>>>>>>>> from >>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> seems like valid concerns. Thomas also raised >>>>>>>>>>>>>> some >>>>>>>>>>>>>> concerns in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> two previous revisions; again I’m not an >>>>>>>>>>>>>> expert, >>>>>>>>>>>>>> but >>>>>>>>>>>>>> reading >>>>>>>>>>>>>> through >>>>>>>>>>>>>> those, it doesn’t really seem like he >>>>>>>>>>>>>> received >>>>>>>>>>>>>> proper >>>>>>>>>>>>>> answers >>>>>>>>>>>>>> to his >>>>>>>>>>>> questions. >>>>>>>>>>>>> >>>>>>>>>>>>> Its forcing flush via tlb invalidation PPC flag >>>>>>>>>>>>> under >>>>>>>>>>>>> xe_invalidate_vma( ). >>>>>>>>>>>> >>>>>>>>>>>> By the way, what is "PPC?"  It seems like it's >>>>>>>>>>>> another >>>>>>>>>>>> new >>>>>>>>>>>> synonym >>>>>>>>>>>> for the >>>>>>>>>>>> device cache?  It's already really confusing that >>>>>>>>>>>> some of >>>>>>>>>>>> our >>>>>>>>>>>> hardware docs use >>>>>>>>>>>> a mix of both "L2" and "L3" to refer to the same >>>>>>>>>>>> device >>>>>>>>>>>> cache >>>>>>>>>>>> for >>>>>>>>>>>> historical >>>>>>>>>>>> reasons... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> A couple of comments below. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Matt >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> V2(MattA): Expand commit description >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Signed-off-by: Tejas Upadhyay >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>>      drivers/gpu/drm/xe/xe_bo.c      |  3 >>>>>>>>>>>>>>>> ++- >>>>>>>>>>>>>>>>      drivers/gpu/drm/xe/xe_device.c  | 23 >>>>>>>>>>>>>>>> +++++++++++++++++++++++ >>>>>>>>>>>>>>>> drivers/gpu/drm/xe/xe_device.h  |  1 + >>>>>>>>>>>>>>>> drivers/gpu/drm/xe/xe_userptr.c |  3 ++- >>>>>>>>>>>>>>>>      4 files changed, 28 insertions(+), 2 >>>>>>>>>>>>>>>> deletions(-) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c >>>>>>>>>>>>>>>> b/drivers/gpu/drm/xe/xe_bo.c index >>>>>>>>>>>>>>>> e9180b01a4e4..4455886b211e >>>>>>>>>>>>>>>> 100644 >>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_bo.c >>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c >>>>>>>>>>>>>>>> @@ -689,7 +689,8 @@ static int >>>>>>>>>>>>>>>> xe_bo_trigger_rebind(struct >>>>>>>>>>>>>>>> xe_device *xe, struct xe_bo *bo, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>       if >>>>>>>>>>>>>>>> (!xe_vm_in_fault_mode(vm)) { >>>>>>>>>>>>>>>>       drm_gpuvm_bo_evi >>>>>>>>>>>>>>>> ct(v >>>>>>>>>>>>>>>> m_bo >>>>>>>>>>>>>>>> , >>>>>>>>>>>>>>>> true); >>>>>>>>>>>>>>>> - continue; >>>>>>>>>>>>>>>> + if >>>>>>>>>>>>>>>> (!xe_device_needs_cache_flush(xe)) >>>>>>>>>>>>>>>> + continue >>>>>>>>>>>>>>>> ; >>>>>>>>>>> >>>>>>>>>>> Matt R, >>>>>>>>>>> This flush will be still needed as there can be >>>>>>>>>>> non-xa >>>>>>>>>>> buffers >>>>>>>>>>> which >>>>>>>>>>> can be evicted while media was off and stale >>>>>>>>>>> entries >>>>>>>>>>> can be >>>>>>>>>>> flushed >>>>>>>>>>> when media comes back on. Which was not case >>>>>>>>>>> earlier as >>>>>>>>>>> full >>>>>>>>>>> flush >>>>>>>>>>> was happening at regular sync points and that’s >>>>>>>>>>> where >>>>>>>>>>> this >>>>>>>>>>> feature is >>>>>>>>>>> bringing optimization now. >>>>>>>>>>> >>>>>>>>>>> Tejas >>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This will trigger a TLB invalidation (and I >>>>>>>>>>>>>> assume a >>>>>>>>>>>>>> cache >>>>>>>>>>>>>> flush) >>>>>>>>>>>>>> every time we move or free memory in the 3D >>>>>>>>>>>>>> stack >>>>>>>>>>>>>> if >>>>>>>>>>>>>> it >>>>>>>>>>>>>> has a >>>>>>>>>>>>>> binding. It also performs a synchronous wait >>>>>>>>>>>>>> on >>>>>>>>>>>>>> the >>>>>>>>>>>>>> BO >>>>>>>>>>>>>> being >>>>>>>>>>>>>> idle. >>>>>>>>>>>>>> Both of these are very expensive operations. >>>>>>>>>>>>>> I >>>>>>>>>>>>>> can’t >>>>>>>>>>>>>> imagine >>>>>>>>>>>>>> the >>>>>>>>>>>>>> granularity we want here is to do this on >>>>>>>>>>>>>> every >>>>>>>>>>>>>> move/free >>>>>>>>>>>>>> with >>>>>>>>>>>>>> bindings. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also, for LR compute with preempt fences, we >>>>>>>>>>>>>> would >>>>>>>>>>>>>> trigger the >>>>>>>>>>>>>> preempt fences during the wait, so a TLB >>>>>>>>>>>>>> invalidation >>>>>>>>>>>>>> after >>>>>>>>>>>>>> this >>>>>>>>>>>>>> seems unnecessary, though perhaps the cache >>>>>>>>>>>>>> flush >>>>>>>>>>>>>> is >>>>>>>>>>>>>> still >>>>>>>>>>>>>> required? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this needs a bit more explanation, >>>>>>>>>>>>>> because >>>>>>>>>>>>>> without >>>>>>>>>>>>>> knowing a >>>>>>>>>>>>>> lot about the exact requirements, the >>>>>>>>>>>>>> implementation >>>>>>>>>>>>>> does >>>>>>>>>>>>>> not >>>>>>>>>>>>>> look >>>>>>>>>>>> correct. >>>>>>>>>>>>> >>>>>>>>>>>>> The thing is that we are trying to solve >>>>>>>>>>>>> problem >>>>>>>>>>>>> with >>>>>>>>>>>>> userptr >>>>>>>>>>>>> with non-XA >>>>>>>>>>>> pat, consider if that BO got moved while media is >>>>>>>>>>>> not >>>>>>>>>>>> active. >>>>>>>>>>>> As >>>>>>>>>>>> soon as media >>>>>>>>>>>> will come back active, stale cached entries of >>>>>>>>>>>> that >>>>>>>>>>>> object >>>>>>>>>>>> will be >>>>>>>>>>>> flushed as part >>>>>>>>>>>> of full flush , which may corrupt things. >>>>>>>>>>>>> There was thinking that with this patch we >>>>>>>>>>>>> would at >>>>>>>>>>>>> least >>>>>>>>>>>>> solve >>>>>>>>>>>>> the problem >>>>>>>>>>>> of corruption and later when page_reclamation >>>>>>>>>>>> feature >>>>>>>>>>>> comes >>>>>>>>>>>> in will >>>>>>>>>>>> help in >>>>>>>>>>>> performance as well. But now when page >>>>>>>>>>>> reclamation >>>>>>>>>>>> feature is >>>>>>>>>>>> merged earlier >>>>>>>>>>>> and it tightly coupled with bind/unbind some >>>>>>>>>>>> cases >>>>>>>>>>>> like >>>>>>>>>>>> discussed >>>>>>>>>>>> above >>>>>>>>>>>> (which are not doing unbind immediately on >>>>>>>>>>>> move/free) >>>>>>>>>>>> are >>>>>>>>>>>> missed in >>>>>>>>>>>> reclamation. >>>>>>>>>>>>> >>>>>>>>>>>>> So thought was to let this solution go in with >>>>>>>>>>>>> little >>>>>>>>>>>>> perf >>>>>>>>>>>>> hit >>>>>>>>>>>>> and discuss with >>>>>>>>>>>> page reclamation owner to come with cleaner >>>>>>>>>>>> solution >>>>>>>>>>>> together. >>>>>>>>>>>>> >>>>>>>>>>>>> Tejas >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>       } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>       if (!idle) { >>>>>>>>>>>>>>>> diff --git >>>>>>>>>>>>>>>> a/drivers/gpu/drm/xe/xe_device.c >>>>>>>>>>>>>>>> b/drivers/gpu/drm/xe/xe_device.c index >>>>>>>>>>>>>>>> 743c18e0c580..da2abed94bc0 >>>>>>>>>>>>>>>> 100644 >>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_device.c >>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_device.c >>>>>>>>>>>>>>>> @@ -1097,6 +1097,29 @@ static void >>>>>>>>>>>>>>>> tdf_request_sync(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>> *xe) >>>>>>>>>>>>>>>>       } >>>>>>>>>>>>>>>>      } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +/** >>>>>>>>>>>>>>>> + * xe_device_needs_cache_flush - Whether >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>> needs >>>>>>>>>>>>>>>> to be >>>>>>>>>>>>>>>> +flushed >>>>>>>>>>>>>>>> + * @xe: The device to check. >>>>>>>>>>>>>>>> + * >>>>>>>>>>>>>>>> + * Return: true if the device needs >>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>> flush, >>>>>>>>>>>>>>>> false >>>>>>>>>>>>>>>> otherwise. >>>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>>> +bool xe_device_needs_cache_flush(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>>>> *xe) { >>>>>>>>>>>>>>>> + /* XA is *always* flushed, like >>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> end- >>>>>>>>>>>>>>>> of- >>>>>>>>>>>>>>>> submssion (and >>>>>>>>>>>>>>>> +maybe >>>>>>>>>>>>>> other >>>>>>>>>>>>>>>> + * places), just that internally >>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>> optimisation hw doesn't >>>>>>>>>>>>>>>> +need to >>>>>>>>>>>>>> make >>>>>>>>>>>>>>>> + * that a full flush (which will >>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>> include XA) >>>>>>>>>>>>>>>> when Media is >>>>>>>>>>>>>>>> + * off/powergated, since it >>>>>>>>>>>>>>>> doesn't >>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> worry >>>>>>>>>>>>>>>> about GT >>>>>>>>>>>>>>>> +caches vs >>>>>>>>>>>>>> Media >>>>>>>>>>>>>>>> + * coherency, and only CPU vs >>>>>>>>>>>>>>>> GPU >>>>>>>>>>>>>>>> coherency, >>>>>>>>>>>>>>>> so >>>>>>>>>>>>>>>> can make >>>>>>>>>>>> that >>>>>>>>>>>>>>>> +flush >>>>>>>>>>>>>> a >>>>>>>>>>>>>>>> + * targeted XA flush, since >>>>>>>>>>>>>>>> stuff >>>>>>>>>>>>>>>> tagged >>>>>>>>>>>>>>>> with XA >>>>>>>>>>>>>>>> now means >>>>>>>>>>>>>>>> +it's shared >>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> + * the CPU. The main implication >>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> now >>>>>>>>>>>>>>>> need to >>>>>>>>>>>>>>>> +somehow >>>>>>>>>>>>>> flush non-XA before >>>>>>>>>>>>>>>> + * freeing system memory pages, >>>>>>>>>>>>>>>> otherwise >>>>>>>>>>>>>>>> dirty >>>>>>>>>>>>>>>> cachelines >>>>>>>>>>>>>>>> +could be >>>>>>>>>>>>>> flushed after the free >>>>>>>>>>>>>>>> + * (like if Media suddenly turns >>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>> full flush) >>>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>>> + if (GRAPHICS_VER(xe) >= 35 && >>>>>>>>>>>>>>>> !IS_DGFX(xe)) >>>>>>>>>>>>>>>> + return true; >>>>>>>>>>>>>>>> + return false; >>>>>>>>>>>>>>>> +} >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>      void xe_device_l2_flush(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>>>> *xe) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>       struct xe_gt *gt; >>>>>>>>>>>>>>>> diff --git >>>>>>>>>>>>>>>> a/drivers/gpu/drm/xe/xe_device.h >>>>>>>>>>>>>>>> b/drivers/gpu/drm/xe/xe_device.h index >>>>>>>>>>>>>>>> 39464650533b..baf386e0e037 >>>>>>>>>>>>>>>> 100644 >>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_device.h >>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_device.h >>>>>>>>>>>>>>>> @@ -184,6 +184,7 @@ void >>>>>>>>>>>>>>>> xe_device_snapshot_print(struct >>>>>>>>>>>>>>>> xe_device *xe, struct drm_printer *p); >>>>>>>>>>>>>>>>      u64 >>>>>>>>>>>>>>>> xe_device_canonicalize_addr(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>>>> *xe, u64 >>>>>>>>>>>>>>>> address); >>>>>>>>>>>>>>>>      u64 >>>>>>>>>>>>>>>> xe_device_uncanonicalize_addr(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>>>> *xe, >>>>>>>>>>>>>>>> u64 >>>>>>>>>>>>>>>> address); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +bool xe_device_needs_cache_flush(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>>>> *xe); >>>>>>>>>>>>>>>>      void xe_device_td_flush(struct >>>>>>>>>>>>>>>> xe_device >>>>>>>>>>>>>>>> *xe); >>>>>>>>>>>>>>>> void >>>>>>>>>>>>>>>> xe_device_l2_flush(struct xe_device *xe); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> diff --git >>>>>>>>>>>>>>>> a/drivers/gpu/drm/xe/xe_userptr.c >>>>>>>>>>>>>>>> b/drivers/gpu/drm/xe/xe_userptr.c index >>>>>>>>>>>>>>>> e120323c43bc..b435ea7f9b66 >>>>>>>>>>>>>>>> 100644 >>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/xe/xe_userptr.c >>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/xe/xe_userptr.c >>>>>>>>>>>>>>>> @@ -114,7 +114,8 @@ static void >>>>>>>>>>>>>>>> __vma_userptr_invalidate(struct >>>>>>>>>>>>>>>> xe_vm >>>>>>>>>>>>>> *vm, struct xe_userptr_vma *uv >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> false, >>>>>>>>>>>>>>>> MAX_SCHEDULE_TIMEOUT); >>>>>>>>>>>>>>>>       XE_WARN_ON(err <= 0); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - if (xe_vm_in_fault_mode(vm) && >>>>>>>>>>>>>>>> userptr- >>>>>>>>>>>>>>>>> initial_bind) { >>>>>>>>>>>>>>>> + if ((xe_vm_in_fault_mode(vm) || >>>>>>>>>>>>>>>> +xe_device_needs_cache_flush(vm- >>>>>>>>>>>>>>> xe)) && >>>>>>>>>>>>>>>> +     userptr->initial_bind) { >>>>>>>>>>>>>> >>>>>>>>>>>>>> Same concern with the LR preempt fence as >>>>>>>>>>>>>> above — >>>>>>>>>>>>>> the >>>>>>>>>>>>>> hardware >>>>>>>>>>>>>> will >>>>>>>>>>>>>> be interrupted via preempt fences, so it >>>>>>>>>>>>>> doesn’t >>>>>>>>>>>>>> seem >>>>>>>>>>>>>> necessary >>>>>>>>>>>>>> to >>>>>>>>>>>>>> invalidate the TLBs but perhaps we need a >>>>>>>>>>>>>> cflush >>>>>>>>>>>>>> and >>>>>>>>>>>>>> TLB >>>>>>>>>>>>>> invalidation is the mechanism for that too? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Matt >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>       err = >>>>>>>>>>>>>>>> xe_vm_invalidate_vma(vma); >>>>>>>>>>>>>>>>       XE_WARN_ON(err); >>>>>>>>>>>>>>>>       } >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> 2.52.0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Matt Roper >>>>>>>>>>>>>>> Graphics Software Engineer >>>>>>>>>>>>>>> Linux GPU Platform Enablement >>>>>>>>>>>>>>> Intel Corporation >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Matt Roper >>>>>>>>>>>> Graphics Software Engineer >>>>>>>>>>>> Linux GPU Platform Enablement >>>>>>>>>>>> Intel Corporation >>>>>>>>>