* [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
@ 2026-06-15 15:52 Alexandru Elisei
2026-06-15 19:07 ` David Hildenbrand
0 siblings, 1 reply; 5+ messages in thread
From: Alexandru Elisei @ 2026-06-15 15:52 UTC
To: pbonzini, kvm, linux-kernel, maz, oupton, suzuki.poulose, kvmarm,
linux-arm-kernel, seanjc, david.hildenbrand, mark.rutland
For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
memory provider for the virtual machine is the guest_memfd file, not the
userspace mapping. Faults are resolved using the guest_memfd page cache,
and the permissions for the secondary MMU mapping depend exclusively on
the memslot (i.e., if the memslot is read-only). How userspace happens to
have the memory mmapped at fault time, or even if the memory is mapped at
all into userspace, is not taken into consideration.
guest_memfd memory is not evictable, is not movable and there's no backing
storage. Once memory is allocated for an offset in guest_memfd file, the
offset will not change, and that memory is not freed unless userspace
explicitly punches a hole in the file. As a result, memory reclaim, page
migration, page aging and dirty page tracking for the userspace mapping
serve little purpose.
Despite this, KVM's MMU notifiers still modify the secondary MMU page
tables, similar to ordinary memslots, only for the same memory to be
remapped next time a guest accesses it. Make the disconnect between the
user mapping and the secondary MMU page tables explicit by ignoring the MMU
notifiers for guest_memfd-only memslots.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
The only theoretical instance where the MMU notifiers are invoked for the
userspace mapping of a guest_memfd-only memslot that I was able to find was
automatic NUMA balancing with a non-NULL NUMA policy for the guest_memfd
file. I wasn't able to test it in practice. Also my knowledge of MM is very
limited, so there might be other cases where it happens, or I might be
wrong and today the MMU notifiers are never invoked.
Either way, when and if it happens, having memory unmapped from the
secondary MMU in the case of a guest_memfd-only memslot is at most a
performance issue (it causes unnecessary guest faults), but I wanted to
start a conversation about this because having memory that stays mapped at
stage 2 (unless userspace explicitly unmaps it from the VM) is needed for an
Arm feature (called SPE, Statistical Profiling Extension) that I'm working
to upstream. This patch aims to provide the guarantee that memory won't be
unmapped from the secondary MMU behind the VMM's back, which is what happens
for non-guest_memfd memslots.
 virt/kvm/kvm_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 881f92d7a469..8c4158996928 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			unsigned long hva_start, hva_end;
 
 			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
+
+			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
+				continue;
+
 			hva_start = max_t(unsigned long, range->start, slot->userspace_addr);
 			hva_end = min_t(unsigned long, range->end,
 					slot->userspace_addr + (slot->npages << PAGE_SHIFT));
base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
--
2.54.0
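
As a rough illustration of what the hunk above changes, here is a minimal
userspace model in C of the per-slot loop: each slot's hva range is clamped
against the notifier range, and slots flagged as guest_memfd-only are skipped
before any clamping happens. This is a sketch under stated assumptions, not
the kernel code: every model_* name is hypothetical, the slots live in a flat
array rather than in the hva interval tree that the real loop walks, and the
"handled" counter stands in for whatever the notifier handler would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct kvm_memory_slot; not the kernel layout. */
struct model_slot {
	unsigned long userspace_addr;	/* start of the slot's hva range */
	unsigned long npages;		/* slot size in pages */
	bool gmem_only;			/* models kvm_memslot_is_gmem_only() */
};

/* Notifier range in host virtual addresses: [start, end). */
struct model_range {
	unsigned long start;
	unsigned long end;
};

/* Return how many slots an MMU notifier event over 'range' would act on. */
static size_t model_handle_hva_range(const struct model_slot *slots, size_t nr,
				     const struct model_range *range)
{
	size_t handled = 0;

	assert(range->start < range->end);

	for (size_t i = 0; i < nr; i++) {
		const struct model_slot *slot = &slots[i];
		unsigned long slot_end = slot->userspace_addr +
					 (slot->npages << MODEL_PAGE_SHIFT);
		unsigned long hva_start, hva_end;

		/* The check added by the patch: ignore guest_memfd-only slots. */
		if (slot->gmem_only)
			continue;

		/* Clamp the notifier range to the slot, as max_t()/min_t() do. */
		hva_start = range->start > slot->userspace_addr ?
			    range->start : slot->userspace_addr;
		hva_end = range->end < slot_end ? range->end : slot_end;

		/* The flat array has no overlap filter, so check it here. */
		if (hva_start >= hva_end)
			continue;

		handled++;
	}

	return handled;
}
```

With one ordinary slot and one guest_memfd-only slot, a notifier range that
covers both acts only on the ordinary slot, and a range that covers only the
guest_memfd-only slot acts on nothing.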
* Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
2026-06-15 15:52 [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots Alexandru Elisei
@ 2026-06-15 19:07 ` David Hildenbrand
2026-06-17 13:23 ` Alexandru Elisei
0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2026-06-15 19:07 UTC
To: Alexandru Elisei, pbonzini, kvm, linux-kernel, maz, oupton,
suzuki.poulose, kvmarm, linux-arm-kernel, seanjc, mark.rutland
On 6/15/26 17:52, Alexandru Elisei wrote:
> For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
> memory provider for the virtual machine is the guest_memfd file, not the
> userspace mapping. Faults are resolved using the guest_memfd page cache,
> and the permissions for the secondary MMU mapping depend exclusively on
> the memslot (i.e., if the memslot is read-only). How userspace happens to
> have the memory mmapped at fault time, or even if the memory is mapped at
> all into userspace, is not taken into consideration.
>
> guest_memfd memory is not evictable, is not movable and there's no backing
> storage. Once memory is allocated for an offset in guest_memfd file, the
> offset will not change, and that memory is not freed unless userspace
> explicitly punches a hole in the file. As a result, memory reclaim, page
> migration, page aging and dirty page tracking for the userspace mapping
> serve little purpose.
I don't think any of that is relevant for the patch at hand?
The thing is: invalidation (truncation, later migration, for any other reason)
is driven through guest_memfd notifications, not through unrelated page tables.
If we don't look up pages for the KVM MMU through the page table, then there is
also no need for MMU notifiers. It's all guest_memfd only.
Or am I missing something?
--
Cheers,
David
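
The point above, that invalidation is driven through guest_memfd rather than
through the page tables, can be illustrated with a small userspace model in
C: a range of file offsets (for example, a punched hole) is translated to a
guest frame range using only the slot's binding to the file, and no host
virtual address is involved. This is a sketch under assumptions, not the
kernel's implementation: the struct and function names are hypothetical, and
all offsets are in units of pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a memslot bound to a guest_memfd file. */
struct model_gmem_slot {
	unsigned long base_gfn;	/* first guest frame number of the slot */
	unsigned long npages;	/* slot size in pages */
	unsigned long pgoff;	/* page offset of the slot in the file */
};

/* Guest frame range [start, end) that must be unmapped. */
struct model_gfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Translate an invalidated file range [start, end), in pages, into the
 * guest frame range covered by 'slot'. Returns false when the file range
 * does not overlap the slot's binding.
 */
static bool model_gmem_invalidate(const struct model_gmem_slot *slot,
				  unsigned long start, unsigned long end,
				  struct model_gfn_range *out)
{
	unsigned long slot_end = slot->pgoff + slot->npages;
	unsigned long lo = start > slot->pgoff ? start : slot->pgoff;
	unsigned long hi = end < slot_end ? end : slot_end;

	assert(start < end);

	if (lo >= hi)
		return false;

	/* File offset to gfn: subtract the binding offset, add the base. */
	out->start = slot->base_gfn + (lo - slot->pgoff);
	out->end = slot->base_gfn + (hi - slot->pgoff);
	return true;
}
```

The lookup key here is the file offset. How, or whether, the same file range
is mapped into userspace never enters the calculation.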
* Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
2026-06-15 19:07 ` David Hildenbrand
@ 2026-06-17 13:23 ` Alexandru Elisei
2026-06-17 13:41 ` David Hildenbrand
0 siblings, 1 reply; 5+ messages in thread
From: Alexandru Elisei @ 2026-06-17 13:23 UTC
To: David Hildenbrand
Cc: pbonzini, kvm, linux-kernel, maz, oupton, suzuki.poulose, kvmarm,
linux-arm-kernel, seanjc, mark.rutland
Hi David,
On Mon, Jun 15, 2026 at 09:07:50PM +0200, David Hildenbrand wrote:
> On 6/15/26 17:52, Alexandru Elisei wrote:
> > For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
> > memory provider for the virtual machine is the guest_memfd file, not the
> > userspace mapping. Faults are resolved using the guest_memfd page cache,
> > and the permissions for the secondary MMU mapping depend exclusively on
> > the memslot (i.e., if the memslot is read-only). How userspace happens to
> > have the memory mmapped at fault time, or even if the memory is mapped at
> > all into userspace, is not taken into consideration.
> >
> > guest_memfd memory is not evictable, is not movable and there's no backing
> > storage. Once memory is allocated for an offset in guest_memfd file, the
> > offset will not change, and that memory is not freed unless userspace
> > explicitly punches a hole in the file. As a result, memory reclaim, page
> > migration, page aging and dirty page tracking for the userspace mapping
> > serve little purpose.
>
> I don't think any of that is relevant for the patch at hand?
>
> The thing is: invalidation (truncation, later migration, for any other reason)
> is driven through guest_memfd notifications, not through unrelated page tables.
>
> If we don't look up pages for the KVM MMU through the page table, then there is
> also no need for MMU notifiers. It's all guest_memfd only.
>
> Or am I missing something?
My thinking was that, because guest_memfd is not evictable, there is no need to
do page ageing, which would require that secondary MMU mappings be made old.
The invalidate callbacks are also used when userspace memory is marked read-only
for dirty state tracking. I was trying to explain that, since there is no
backing for the guest_memfd file, the host doesn't need to keep track of dirty state
for the memory, and ignoring the invalidate callbacks is correct for all cases.
I can drop the paragraph entirely, if you think that would make the commit
message clearer.
Thanks,
Alex
* Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
2026-06-17 13:23 ` Alexandru Elisei
@ 2026-06-17 13:41 ` David Hildenbrand
2026-06-17 13:50 ` Alexandru Elisei
0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2026-06-17 13:41 UTC
To: Alexandru Elisei
Cc: pbonzini, kvm, linux-kernel, maz, oupton, suzuki.poulose, kvmarm,
linux-arm-kernel, seanjc, mark.rutland
On 6/17/26 15:23, Alexandru Elisei wrote:
> Hi David,
>
> On Mon, Jun 15, 2026 at 09:07:50PM +0200, David Hildenbrand wrote:
>> On 6/15/26 17:52, Alexandru Elisei wrote:
>>> For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
>>> memory provider for the virtual machine is the guest_memfd file, not the
>>> userspace mapping. Faults are resolved using the guest_memfd page cache,
>>> and the permissions for the secondary MMU mapping depend exclusively on
>>> the memslot (i.e., if the memslot is read-only). How userspace happens to
>>> have the memory mmapped at fault time, or even if the memory is mapped at
>>> all into userspace, is not taken into consideration.
>>>
>>> guest_memfd memory is not evictable, is not movable and there's no backing
>>> storage. Once memory is allocated for an offset in guest_memfd file, the
>>> offset will not change, and that memory is not freed unless userspace
>>> explicitly punches a hole in the file. As a result, memory reclaim, page
>>> migration, page aging and dirty page tracking for the userspace mapping
>>> serve little purpose.
>>
>> I don't think any of that is relevant for the patch at hand?
>>
>> The thing is: invalidation (truncation, later migration, for any other reason)
>> is driven through guest_memfd notifications, not through unrelated page tables.
>>
>> If we don't look up pages for the KVM MMU through the page table, then there is
>> also no need for MMU notifiers. It's all guest_memfd only.
>>
>> Or am I missing something?
>
> My thinking was that, because guest_memfd is not evictable, there is no need to
> do page ageing, which would require that secondary MMU mappings be made old.
Not really.
The KVM MMU did not obtain the folios through the page tables, but directly
through guest_memfd. Any aging would, therefore, have to be done through
guest_memfd.
Which we don't support and don't want to support :)
That we happen to have a matching user space range that maps the guest_memfd is
just coincidence from a KVM MMU point of view.
>
> The invalidate callbacks are also used when userspace memory is marked read-only
> for dirty state tracking. I was trying to explain that, since there is no
> backing for the guest_memfd file, the host doesn't need to keep track of dirty state
> for the memory, and ignoring the invalidate callbacks is correct for all cases.
>
> I can drop the paragraph entirely, if you think that would make the commit
> message clearer.
I think the real motivation is:
"Mappings in the secondary MMU were established by obtaining folios from
guest_memfd directly, not by looking the folios up through the page tables
through GUP. Consequently, there is no relationship between the page tables and
the secondary MMU: MMU notifiers do not apply."
--
Cheers,
David
* Re: [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
2026-06-17 13:41 ` David Hildenbrand
@ 2026-06-17 13:50 ` Alexandru Elisei
0 siblings, 0 replies; 5+ messages in thread
From: Alexandru Elisei @ 2026-06-17 13:50 UTC
To: David Hildenbrand
Cc: pbonzini, kvm, linux-kernel, maz, oupton, suzuki.poulose, kvmarm,
linux-arm-kernel, seanjc, mark.rutland
Hi David,
On Wed, Jun 17, 2026 at 03:41:41PM +0200, David Hildenbrand wrote:
> On 6/17/26 15:23, Alexandru Elisei wrote:
> > Hi David,
> >
> > On Mon, Jun 15, 2026 at 09:07:50PM +0200, David Hildenbrand wrote:
> >> On 6/15/26 17:52, Alexandru Elisei wrote:
> >>> For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
> >>> memory provider for the virtual machine is the guest_memfd file, not the
> >>> userspace mapping. Faults are resolved using the guest_memfd page cache,
> >>> and the permissions for the secondary MMU mapping depend exclusively on
> >>> the memslot (i.e., if the memslot is read-only). How userspace happens to
> >>> have the memory mmapped at fault time, or even if the memory is mapped at
> >>> all into userspace, is not taken into consideration.
> >>>
> >>> guest_memfd memory is not evictable, is not movable and there's no backing
> >>> storage. Once memory is allocated for an offset in guest_memfd file, the
> >>> offset will not change, and that memory is not freed unless userspace
> >>> explicitly punches a hole in the file. As a result, memory reclaim, page
> >>> migration, page aging and dirty page tracking for the userspace mapping
> >>> serve little purpose.
> >>
> >> I don't think any of that is relevant for the patch at hand?
> >>
> >> The thing is: invalidation (truncation, later migration, for any other reason)
> >> is driven through guest_memfd notifications, not through unrelated page tables.
> >>
> >> If we don't look up pages for the KVM MMU through the page table, then there is
> >> also no need for MMU notifiers. It's all guest_memfd only.
> >>
> >> Or am I missing something?
> >
> > My thinking was that, because guest_memfd is not evictable, there is no need to
> > do page ageing, which would require that secondary MMU mappings be made old.
>
> Not really.
>
> The KVM MMU did not obtain the folios through the page tables, but directly
> through guest_memfd. Any aging would, therefore, have to be done through
> guest_memfd.
>
> Which we don't support and don't want to support :)
>
> That we happen to have a matching user space range that maps the guest_memfd is
> just coincidence from a KVM MMU point of view.
>
> >
> > The invalidate callbacks are also used when userspace memory is marked read-only
> > for dirty state tracking. I was trying to explain that, since there is no
> > backing for the guest_memfd file, the host doesn't need to keep track of dirty state
> > for the memory, and ignoring the invalidate callbacks is correct for all cases.
> >
> > I can drop the paragraph entirely, if you think that would make the commit
> > message clearer.
>
> I think the real motivation is:
>
> "Mappings in the secondary MMU were established by obtaining folios from
> guest_memfd directly, not by looking the folios up through the page tables
> through GUP. Consequently, there is no relationship between the page tables and
> the secondary MMU: MMU notifiers do not apply."
That's much better than my version, thanks!
Alex