From: David Hildenbrand <david@redhat.com>
To: "Christian König" <christian.koenig@amd.com>,
	intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	x86@kernel.org
Cc: airlied@gmail.com, thomas.hellstrom@linux.intel.com,
	matthew.brost@intel.com, dave.hansen@linux.intel.com,
	luto@kernel.org, peterz@infradead.org,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re:
Date: Tue, 26 Aug 2025 10:46:09 +0200	[thread overview]
Message-ID: <6591105b-969d-44d6-80e1-233c1b84b121@redhat.com> (raw)
In-Reply-To: <a1b95d23-1908-42c1-8ff6-da051fc140aa@amd.com>

On 26.08.25 10:38, Christian König wrote:
> On 25.08.25 21:10, David Hildenbrand wrote:
>> On 21.08.25 10:10, Christian König wrote:
>>> On 20.08.25 17:23, David Hildenbrand wrote:
>>>> CCing Lorenzo
>>>>
>>>> On 20.08.25 16:33, Christian König wrote:
>>>>> Hi everyone,
>>>>>
>>>>> sorry for CCing so many people, but that rabbit hole turned out to be
>>>>> deeper than originally thought.
>>>>>
>>>>> TTM always had problems with UC/WC mappings on 32bit systems, and drivers
>>>>> often had to resort to hacks like using GFP_DMA32 to get things working,
>>>>> while having no rational explanation for why that helped (see the TTM AGP,
>>>>> radeon and nouveau driver code for that).
>>>>>
>>>>> It turned out that the PAT implementation we use on x86 not only enforces
>>>>> consistent caching attributes for pages in the linear kernel mapping, but
>>>>> also tracks them for highmem pages through a separate R/B tree.
>>>>>
>>>>> That was unexpected, and TTM never updated that R/B tree for highmem
>>>>> pages, so the function pgprot_set_cachemode() just overwrote the caching
>>>>> attributes drivers passed in to vmf_insert_pfn_prot(), which essentially
>>>>> caused all kinds of random trouble.
>>>>>
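For readers who haven't looked at that path: the problematic pattern is,
AFAIU, roughly the following (hand-written sketch, not actual TTM code;
my_drv_fault()/my_drv_pfn_for() are made up):

#include <linux/mm.h>

/*
 * Hypothetical fault handler: the driver asks for WC, but on x86 the
 * insertion path looks up the PFN in PAT and pgprot_set_cachemode() then
 * rewrites the caching bits to whatever memtype PAT has on record for
 * that PFN -- or the default if nothing was ever reserved.
 */
static vm_fault_t my_drv_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        unsigned long pfn = my_drv_pfn_for(vmf);        /* made-up helper */
        pgprot_t prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));

        /* prot says WC here, but the PTE may still end up with other bits. */
        return vmf_insert_pfn_prot(vma, vmf->address, pfn, prot);
}
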
>>>>> An R/B tree is potentially not a good data structure to hold separate
>>>>> attributes for thousands if not millions of individual pages, so updating
>>>>> that is probably not the way to solve this issue.
>>>>>
>>>>> Thomas pointed out that the i915 driver is using apply_page_range()
>>>>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and
>>>>> just fill in the page tables with what the driver thinks is the right
>>>>> caching attribute.
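For the archives: that pattern looks roughly like the below (simplified
from my reading of i915's remap_io_mapping(); the names remap_ctx,
remap_one_pte and remap_range are made up). The point is that the
callback writes the PTE with whatever pgprot the driver hands in, with
no PAT lookup in between:

#include <linux/mm.h>

struct remap_ctx {
        struct mm_struct *mm;
        unsigned long pfn;
        pgprot_t prot;
};

static int remap_one_pte(pte_t *pte, unsigned long addr, void *data)
{
        struct remap_ctx *r = data;

        /*
         * Write the PTE directly with the driver-chosen prot; nothing in
         * here consults PAT, so the caching bits from r->prot end up in
         * the page table as-is.
         */
        set_pte_at(r->mm, addr, pte, pte_mkspecial(pfn_pte(r->pfn, r->prot)));
        r->pfn++;
        return 0;
}

static int remap_range(struct vm_area_struct *vma, unsigned long addr,
                       unsigned long pfn, unsigned long size, pgprot_t prot)
{
        struct remap_ctx r = { .mm = vma->vm_mm, .pfn = pfn, .prot = prot };

        return apply_to_page_range(vma->vm_mm, addr, size, remap_one_pte, &r);
}
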
>>>>
>>>> I assume you mean apply_to_page_range() -- same issue in patch subjects.
>>>
>>> Oh yes, of course. Sorry.
>>>
>>>> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :(
>>>
>>> Yeah, I was also a bit hesitant to use that, but the performance advantage is so high that we probably can't avoid the general approach.
>>>
>>>> Honestly, apply_to_pte_range() is just the entry point for doing all kinds of weird crap to page tables because "you know better".
>>>
>>> Exactly, that's the problem I'm pointing out: drivers *do* know better. The core memory management applied incorrect values, which caused all kinds of trouble.
>>>
>>> The problem is not a bug in PAT or in TTM/drivers, but rather in how they interact with each other.
>>>
>>> What I don't understand is why we have the PAT in the first place. No other architecture does it this way.
>>
>> Probably because no other architecture has these weird glitches, I assume ... skimming over memtype_reserve() and friends, there are quite a few corner cases the code is handling (BIOS, ACPI, low ISA, system RAM, ...)
>>
>>
>> I did a lot of work on the higher PAT level functions, but I am no expert on the lower level management functions, and in particular all the special cases with different memory types.
>>
>> IIRC, the goal of the PAT subsystem is to make sure that no two page tables map the same PFN with different caching attributes.
> 
> Yeah, that actually makes sense. Thomas from Intel recently explained the technical background to me:
> 
> Some x86 CPUs write back cache lines even if they aren't dirty. What can happen is that, because of the linear mapping, the CPU speculatively loads a cache line for memory that is mapped uncached elsewhere.
> 
> So the end result is that the write-back of non-dirty cache lines potentially corrupts the data in the otherwise uncached system memory.
> 
> But that a) only applies to memory in the linear mapping and b) only to a handful of x86 CPU types (e.g. recently Intel's Lunar Lake, AMD Athlons produced before 2004, maybe others).
> 
>> It usually treats ordinary system RAM (IORESOURCE_SYSTEM_RAM) in a special way: no special caching mode.
>>
>> For everything else, it expects that someone first reserves a memory range for a specific caching mode.
>>
>> For example, remap_pfn_range()...->pfnmap_track()->memtype_reserve() will make sure that there are no conflicts, and then call memtype_kernel_map_sync() to make sure the identity mapping is updated to the new type.
>>
>> In case someone ends up calling pfnmap_setup_cachemode(), the expectation is that there was a previous call to memtype_reserve_io() or similar, such that pfnmap_setup_cachemode() will find that caching mode.
>>
>>
>> So my assumption would be that that is missing for the drivers here?
> 
> Well yes and no.
> 
> See, the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually from get_free_page or similar) and want to apply a certain caching attribute to them.
> 
> So what would happen is that we completely clutter the R/B tree used by the PAT with thousands if not millions of entries.
> 

Hm, above you're saying that there is no direct map, but now you are 
saying that the pages were obtained through get_free_page()?

I agree that what you describe here sounds suboptimal. But if the pages
were obtained from the buddy, there surely is a direct map -- unless we
explicitly remove it :(
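
For pages out of the buddy, what I would expect is that the linear-map
alias gets switched along with the userspace mapping -- IIRC that is
what the TTM pool does on x86. Roughly (x86-only sketch, helper names
made up, assuming the pages really do have a direct map):

#ifdef CONFIG_X86
#include <linux/mm.h>
#include <asm/set_memory.h>

/*
 * Switch the linear-map aliases of buddy-allocated pages to WC before
 * handing them out WC, and back to WB before returning them to the page
 * allocator. IIUC set_pages_array_*() also records the memtype with PAT,
 * so a later lookup for these PFNs should agree with the PTEs we insert.
 */
static int my_pages_make_wc(struct page **pages, int num_pages)
{
        return set_pages_array_wc(pages, num_pages);
}

static void my_pages_make_wb(struct page **pages, int num_pages)
{
        WARN_ON(set_pages_array_wb(pages, num_pages));
}
#endif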

If we're talking about individual pages without a direct map, I would
wonder if they are actually part of a bigger memory region that can just
be reserved in one go (similar to how remap_pfn_range() would handle it).
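
If they are, reserving the whole aperture/BAR once at init time should
keep the PAT tree tiny -- IIRC that is what radeon/amdgpu already do for
the VRAM aperture. Something like (sketch, my_drv_* names made up):

#include <linux/io.h>

/*
 * Reserve WC for the whole aperture in one go: a single entry in PAT's
 * tracking instead of one per page; later per-PFN lookups should then
 * find WC.
 */
static int my_drv_reserve_aperture(resource_size_t base, resource_size_t size)
{
        return arch_io_reserve_memtype_wc(base, size);
}

static void my_drv_release_aperture(resource_size_t base, resource_size_t size)
{
        arch_io_free_memtype_wc(base, size);
}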

Can you briefly describe how your use case obtains these PFNs, and how
scattered they + their caching attributes might be?

-- 
Cheers

David / dhildenb


Thread overview: 31+ messages
     [not found] <20250820143739.3422-1-christian.koenig@amd.com>
2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
2025-08-20 14:33 ` [PATCH 2/3] drm/ttm: reapply "increase ttm pre-fault value to PMD size" Christian König
2025-08-20 14:33 ` [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2 Christian König
2025-08-20 15:12   ` Borislav Petkov
2025-08-20 15:23 ` David Hildenbrand
2025-08-21  8:10   ` Re: Christian König
2025-08-25 19:10     ` Re: David Hildenbrand
2025-08-26  8:38       ` Re: Christian König
2025-08-26  8:46         ` David Hildenbrand [this message]
2025-08-26  9:00           ` Re: Christian König
2025-08-26  9:17             ` Re: David Hildenbrand
2025-08-26  9:56               ` Re: Christian König
2025-08-26 12:07                 ` Re: David Hildenbrand
2025-08-26 16:09                   ` Re: Christian König
2025-08-27  9:13                     ` [PATCH 0/3] drm/ttm: Michel Dänzer
2025-08-28 21:18                     ` stupid and complicated PAT :) David Hildenbrand
2025-08-28 21:28                       ` David Hildenbrand
2025-08-28 21:32                         ` David Hildenbrand
2025-08-29 10:50                           ` Christian König
2025-08-29 19:52                             ` David Hildenbrand
2025-08-29 19:58                               ` David Hildenbrand
2025-08-26 14:27                 ` Thomas Hellström
2025-08-28 21:01                   ` stupid PAT :) David Hildenbrand
2025-08-26 12:37         ` David Hildenbrand
2025-08-21  9:16   ` your mail Lorenzo Stoakes
2025-08-21  9:30     ` David Hildenbrand
2025-08-21 10:05       ` Lorenzo Stoakes
2025-08-21 10:16         ` David Hildenbrand
2025-08-25 18:35         ` Christian König
2025-08-25 19:20           ` David Hildenbrand
2025-08-21  8:19 ` ✗ i915.CI.BAT: failure for series starting with [1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Patchwork
