From: David Hildenbrand <david@redhat.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-trace-kernel@vger.kernel.org,
Dave Hansen <dave.hansen@linux.intel.com>,
Andy Lutomirski <luto@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Jani Nikula <jani.nikula@linux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Tvrtko Ursulin <tursulin@ursulin.net>,
David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
Andrew Morton <akpm@linux-foundation.org>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH v2 00/11] mm: rewrite pfnmap tracking and remove VM_PAT
Date: Tue, 13 May 2025 19:17:10 +0200 [thread overview]
Message-ID: <9fc4ad12-869c-492d-898c-9368505199d0@redhat.com> (raw)
In-Reply-To: <2smxfvgmkrac4uzjwti5sgv2ubzsfgt24e6yinwtl7iuchxjt3@c2ejownpooio>
On 13.05.25 17:53, Liam R. Howlett wrote:
> * David Hildenbrand <david@redhat.com> [250512 08:34]:
>> On top of mm-unstable.
>>
>> VM_PAT annoyed me too much and wasted too much of my time, let's clean
>> PAT handling up and remove VM_PAT.
>>
>> This should sort out various issues with VM_PAT we discovered recently,
>> and will hopefully make the whole code more stable and easier to maintain.
>>
>> In essence: we stop letting PAT mode mess with VMAs and instead lift
>> what to track/untrack to the MM core. We remember per VMA which pfn range
>> we tracked in a new struct we attach to a VMA (we have space without
>> exceeding 192 bytes), use a kref to share it among VMAs during
>> split/mremap/fork, and automatically untrack once the kref drops to 0.
>
> What you do here seems to be decouple the vma start/end addresses by
> abstracting them into another allocated ref counted struct. This is
> close to what we do with the anon vma name..
Yes, inspired by that.
>
> It took a while to understand the underlying interval tree tracking of
> this change, but I think it's as good as it was. IIRC, there was a
> shrinking and matching to the end address in the interval tree, but I
> failed to find that commit and code - maybe it never made it upstream.
> I was able to find a thread about splitting [1], so maybe I'm mistaken.
There was hidden code that kept a memremap() shrinking working
(adjusting the tracked range).
The leftovers are removed in patch #8.
See below.
>
>>
>> This implies that we'll keep tracking a full pfn range even after partially
>> unmapping it, until fully unmapping it; but as that case was mostly broken
>> before, this at least makes it work in a way that is least intrusive to
>> VMA handling.
>>
>> Shrinking with mremap() used to work in a hacky way, now we'll similarly
>> keep the original pfn range tacked even after this form of partial unmap.
>> Does anybody care about that? Unlikely. If we run into issues, we could
>> likely handled that (adjust the tracking) when our kref drops to 1 while
>> freeing a VMA. But it adds more complexity, so avoid that for now.
>
> The decoupling of the vma and ref counted range means that we could beef
> up the backend to support actually tracking the correct range, which
> would be nice..
Right, in patch #4 I have
"
This change implies that we'll keep tracking the original PFN range even
after splitting + partially unmapping it: not too bad, because it was
not working reliably before. The only thing that kind-of worked before
was shrinking such a mapping using mremap(): we managed to adjust the
reservation in a hacky way, now we won't adjust the reservation but
leave it around until all involved VMAs are gone.
If that ever turns out to be an issue, we could hook into VM splitting
code and split the tracking; however, that adds complexity that might
not be required, so we'll keep it simple for now.
"
Duplicating/moving/forking VMAs is now definitely better than before.
Splitting is also arguably better than before -- even a simple partial
munmap() [1] is currently problematic, unless we're munmapping the last
part of a VMA (-> shrinking).
Implementing splitting properly is a bit complicated if the pnfmap ctx
has more than one ref, but it could be added if ever really required.
[1] https://lkml.kernel.org/r/20250509153033.952746-1-david@redhat.com
> but I have very little desire to work on that.
Jap :)
>
>
> [1] https://lore.kernel.org/all/5jrd43vusvcchpk2x6mouighkfhamjpaya5fu2cvikzaieg5pq@wqccwmjs4ian/
>
>>
>> Briefly tested with the new pfnmap selftests [1].
>>
>> [1] https://lkml.kernel.org/r/20250509153033.952746-1-david@redhat.com
>
> oh yes, that's still a pr_info() log. I think that should be a pr_err()
> at least?
I was wondering if that is actually a WARN_ON_ONCE(). Now, it should be
much harder to actually trigger.
Thanks!
--
Cheers,
David / dhildenb
prev parent reply other threads:[~2025-05-13 17:17 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-12 12:34 [PATCH v2 00/11] mm: rewrite pfnmap tracking and remove VM_PAT David Hildenbrand
2025-05-12 12:34 ` [PATCH v2 01/11] x86/mm/pat: factor out setting cachemode into pgprot_set_cachemode() David Hildenbrand
2025-05-13 17:29 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 02/11] mm: convert track_pfn_insert() to pfnmap_setup_cachemode*() David Hildenbrand
2025-05-12 15:43 ` Lorenzo Stoakes
2025-05-13 9:06 ` David Hildenbrand
2025-05-13 17:29 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 03/11] mm: introduce pfnmap_track() and pfnmap_untrack() and use them for memremap David Hildenbrand
2025-05-13 17:40 ` Liam R. Howlett
2025-05-14 17:57 ` David Hildenbrand
2025-05-12 12:34 ` [PATCH v2 04/11] mm: convert VM_PFNMAP tracking to pfnmap_track() + pfnmap_untrack() David Hildenbrand
2025-05-12 16:42 ` Lorenzo Stoakes
2025-05-13 9:10 ` David Hildenbrand
2025-05-13 10:16 ` Lorenzo Stoakes
2025-05-13 10:22 ` David Hildenbrand
2025-05-13 17:42 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 05/11] x86/mm/pat: remove old pfnmap tracking interface David Hildenbrand
2025-05-13 17:42 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 06/11] mm: remove VM_PAT David Hildenbrand
2025-05-13 17:42 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 07/11] x86/mm/pat: remove strict_prot parameter from reserve_pfn_range() David Hildenbrand
2025-05-13 17:43 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 08/11] x86/mm/pat: remove MEMTYPE_*_MATCH David Hildenbrand
2025-05-13 17:48 ` Liam R. Howlett
2025-05-14 17:53 ` David Hildenbrand
2025-05-15 14:10 ` David Hildenbrand
2025-05-12 12:34 ` [PATCH v2 09/11] x86/mm/pat: inline memtype_match() into memtype_erase() David Hildenbrand
2025-05-12 16:49 ` Lorenzo Stoakes
2025-05-13 9:11 ` David Hildenbrand
2025-05-13 17:49 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 10/11] drm/i915: track_pfn() -> "pfnmap tracking" David Hildenbrand
2025-05-13 17:50 ` Liam R. Howlett
2025-05-12 12:34 ` [PATCH v2 11/11] mm/io-mapping: " David Hildenbrand
2025-05-13 17:50 ` Liam R. Howlett
2025-05-13 15:53 ` [PATCH v2 00/11] mm: rewrite pfnmap tracking and remove VM_PAT Liam R. Howlett
2025-05-13 17:17 ` David Hildenbrand [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9fc4ad12-869c-492d-898c-9368505199d0@redhat.com \
--to=david@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=airlied@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hpa@zytor.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=jani.nikula@linux.intel.com \
--cc=jannh@google.com \
--cc=joonas.lahtinen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=luto@kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=pfalcato@suse.de \
--cc=rodrigo.vivi@intel.com \
--cc=rostedt@goodmis.org \
--cc=simona@ffwll.ch \
--cc=tglx@linutronix.de \
--cc=tursulin@ursulin.net \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).