From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org
Cc: airlied@gmail.com, matthew.brost@intel.com
Subject: Re: [PATCH] drm/ttm: WIP limit the TTM pool to 32bit CPUs
Date: Thu, 07 Aug 2025 18:47:49 +0200	[thread overview]
Message-ID: <be9d451d511f8bc4652d835a2c28fa823aaf05f1.camel@linux.intel.com> (raw)
In-Reply-To: <d6830af2-52aa-4ca6-85c5-2a4635ce6c7d@amd.com>

On Thu, 2025-08-07 at 11:53 +0200, Christian König wrote:
> On 06.08.25 19:43, Thomas Hellström wrote:
> > Hi, Christian
> > 
> > On Wed, 2025-08-06 at 15:28 +0200, Christian König wrote:
> > > On some old x86 systems we had the problem that changing the
> > > caching flags of system memory required changing the global
> > > MTRR/PAT tables.
> > > 
> > > But on any modern x86 system (CPUs introduced roughly after 2004)
> > > we actually don't need that any more and can update the caching
> > > flags directly in the PTEs of the CPU mapping. It was just never
> > > disabled because of the fear of regressions.
> > > 
> > > We already use the PTE flags for encryption on x86 64bit for
> > > quite a while, and all other supported platforms (Sparc, PowerPC,
> > > ARM, MIPS, LoongArch) have never done anything different either.
> > 
> > IIRC from my VMware days, changing the SEV encryption mode of a
> > page still requires changing all mappings, including kernel maps?
> > See __set_memory_enc_pgtable().
> 
> IIRC both Intel and AMD sacrifice a bit in the page address for that,
> e.g. for encryption the most significant bit is used to indicate if a
> page is encrypted or not.
> 
> I'm not aware that we need to change all kernel mappings for
> encryption, but could be that the hypervisor somehow depends on that.
> 
> > > 
> > > So disable the page pool completely for 64bit systems and just
> > > insert a clflush to be on the safe side so that we never return
> > > memory with dirty cache lines.
> > > 
> > > Testing on a Ryzen 5 and 7 shows that this has absolutely no
> > > performance impact, and offhand the AMD CI can't find a problem
> > > either.
> > > 
> > > Let's see what the i915 and XE CI systems say to that.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > 
> > I don't think we can do this. First, Lunar Lake can in some
> > situations, just like the old Athlons, write back clean cache
> > lines, which means writebacks of speculative prefetches may
> > overwrite GPU data.
> 
> So a speculative prefetch caused by an access to an adjacent page
> could cause the cache line to be fetched and then written back
> without any change to it?

Exactly. 

> 
> Well, it's surprising that even modern CPUs do stuff like that. That
> could explain some of the problems we had with uncached mappings on
> ARM and RISC-V.

Yeah, I agree. I need to double-check with HW people whether that is
gone with Panther Lake. No confirmation on that yet.

> 
> > LNL makes heavy use of non-coherent GPU mappings for performance.
> 
> That is even more surprising. At least on modern Ryzens that doesn't
> seem to have much performance impact any more at all.
> 
> I mean, non-cached mappings were originally introduced to avoid the
> extra overhead of going over the front side bus, but that design is
> long gone.

With LNL it's possible to set up GPU mappings that make the accesses
coherent with the CPU, but that's quite slow. A tradeoff in the HW
design.

If it wasn't for the writeback of speculative prefetches, we could've
settled for having TTM map only the user-space mappings without
changing the kernel map, just like the i915 driver does for older
GPUs.

> 
> > Second, IIRC vm_insert_pfn_prot() on x86 will override the given
> > caching mode with the last caching mode set for the kernel linear
> > map, so if you try to set up a write-combined GPU mapping without
> > a previous call to set_pages_xxxxx it will actually end up cached.
> > See track_pfn_insert().
> 
> That is exactly the same incorrect assumption I made as well.
> 
> It's not the linear mapping where that comes from but a separate page
> attribute table, see /sys/kernel/debug/x86/pat_memtype_list.
> 
> Question is why the heck should we do this? I mean we keep an extra
> rb tree around to overwrite something the driver knows in the first
> place?
> 
> That is basically just tons of extra overhead for nothing as far as I
> can see.

IIRC it was the PAT people enforcing the x86 documentation rule that
aliased mappings with conflicting caching attributes are not allowed.
But it has proven to work, at least on those CPUs not suffering from
the clean cache-line writeback mentioned above.

FWIW, if I understand the code correctly, i915 bypasses this by
setting up user-space mappings not with vm_insert_pfn_prot() but with
apply_to_page_range(), mapping the whole bo.

/Thomas


> 
> Thanks for taking a look,
> Christian.
> 
> > 
> > /Thomas
> > 
> > 
> > > ---
> > >  drivers/gpu/drm/ttm/ttm_pool.c | 16 +++++++++++-----
> > >  1 file changed, 11 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index baf27c70a419..7487eac29398 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -38,7 +38,7 @@
> > >  #include <linux/highmem.h>
> > >  #include <linux/sched/mm.h>
> > >  
> > > -#ifdef CONFIG_X86
> > > +#ifdef CONFIG_X86_32
> > >  #include <asm/set_memory.h>
> > >  #endif
> > >  
> > > @@ -46,6 +46,7 @@
> > >  #include <drm/ttm/ttm_pool.h>
> > >  #include <drm/ttm/ttm_tt.h>
> > >  #include <drm/ttm/ttm_bo.h>
> > > +#include <drm/drm_cache.h>
> > >  
> > >  #include "ttm_module.h"
> > >  
> > > @@ -192,7 +193,7 @@ static void ttm_pool_free_page(struct
> > > ttm_pool
> > > *pool, enum ttm_caching caching,
> > >  	struct ttm_pool_dma *dma;
> > >  	void *vaddr;
> > >  
> > > -#ifdef CONFIG_X86
> > > +#ifdef CONFIG_X86_32
> > >  	/* We don't care that set_pages_wb is inefficient here.
> > > This
> > > is only
> > >  	 * used when we have to shrink and CPU overhead is
> > > irrelevant then.
> > >  	 */
> > > @@ -218,7 +219,7 @@ static void ttm_pool_free_page(struct
> > > ttm_pool
> > > *pool, enum ttm_caching caching,
> > >  /* Apply any cpu-caching deferred during page allocation */
> > >  static int ttm_pool_apply_caching(struct ttm_pool_alloc_state
> > > *alloc)
> > >  {
> > > -#ifdef CONFIG_X86
> > > +#ifdef CONFIG_X86_32
> > >  	unsigned int num_pages = alloc->pages - alloc->caching_divide;
> > >  
> > >  	if (!num_pages)
> > > @@ -232,6 +233,11 @@ static int ttm_pool_apply_caching(struct
> > > ttm_pool_alloc_state *alloc)
> > >  	case ttm_uncached:
> > >  		return set_pages_array_uc(alloc->caching_divide,
> > > num_pages);
> > >  	}
> > > +
> > > +#elif defined(CONFIG_X86_64)
> > > +	unsigned int num_pages = alloc->pages - alloc->caching_divide;
> > > +
> > > +	drm_clflush_pages(alloc->caching_divide, num_pages);
> > >  #endif
> > >  	alloc->caching_divide = alloc->pages;
> > >  	return 0;
> > > @@ -342,7 +348,7 @@ static struct ttm_pool_type
> > > *ttm_pool_select_type(struct ttm_pool *pool,
> > >  	if (pool->use_dma_alloc)
> > >  		return &pool->caching[caching].orders[order];
> > >  
> > > -#ifdef CONFIG_X86
> > > +#ifdef CONFIG_X86_32
> > >  	switch (caching) {
> > >  	case ttm_write_combined:
> > >  		if (pool->nid != NUMA_NO_NODE)
> > > @@ -980,7 +986,7 @@ long ttm_pool_backup(struct ttm_pool *pool,
> > > struct ttm_tt *tt,
> > >  	    pool->use_dma_alloc || ttm_tt_is_backed_up(tt))
> > >  		return -EBUSY;
> > >  
> > > -#ifdef CONFIG_X86
> > > +#ifdef CONFIG_X86_32
> > >  	/* Anything returned to the system needs to be cached.
> > > */
> > >  	if (tt->caching != ttm_cached)
> > >  		set_pages_array_wb(tt->pages, tt->num_pages);
> > 
> 


Thread overview: 10+ messages
2025-08-06 13:28 [PATCH] drm/ttm: WIP limit the TTM pool to 32bit CPUs Christian König
2025-08-06 17:43 ` Thomas Hellström
2025-08-07  9:53   ` Christian König
2025-08-07 16:47     ` Thomas Hellström [this message]
2025-08-11 11:51       ` Christian König
2025-08-11 15:16         ` Thomas Hellström
2025-08-12  9:17           ` Christian König
2025-08-12 10:19             ` Thomas Hellström
2025-08-09  4:07 ` kernel test robot
2025-08-12 10:36 ` Matthew Auld
