From: Takahiro Itazuri <itazur@amazon.com>
To: <kvm@vger.kernel.org>, Sean Christopherson <seanjc@google.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>,
Fuad Tabba <tabba@google.com>,
Brendan Jackman <jackmanb@google.com>,
David Hildenbrand <david@kernel.org>,
David Woodhouse <dwmw2@infradead.org>,
Paul Durrant <pdurrant@amazon.com>,
Nikita Kalyazin <kalyazin@amazon.com>,
Patrick Roy <patrick.roy@campus.lmu.de>,
Takahiro Itazuri <zulinx86@gmail.com>
Subject: [RFC PATCH v3 0/6] KVM: pfncache: Add guest_memfd support to pfncache
Date: Tue, 10 Mar 2026 06:36:35 +0000 [thread overview]
Message-ID: <20260310063647.15665-1-itazur@amazon.com> (raw)
[ based on v6.18 with [1] ]
This patch series is another follow-up to RFC v1, carrying minor fixes
over RFC v2. (It is still labelled RFC since its dependency [1] has not
yet been merged.) The change was tested with guest_memfd created with
GUEST_MEMFD_FLAG_MMAP and GUEST_MEMFD_FLAG_NO_DIRECT_MAP on the feature
branch of Firecracker [2].
=== Problem Statement ===
gfn_to_pfn_cache (a.k.a. pfncache) does not work with guest_memfd. As
of today, pfncaches resolve PFNs via hva_to_pfn(), which requires a
userspace mapping and relies on GUP. This fails for guest_memfd in two
ways:
* guest_memfd created with GUEST_MEMFD_FLAG_MMAP does not have a
userspace mapping due to the nature of private memory.
* guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP uses an
AS_NO_DIRECT_MAP mapping, which is rejected by GUP.
In addition, pfncaches map RAM pages via kmap(), which typically returns
an address derived from the direct map, so kmap() cannot be used for
NO_DIRECT_MAP guest_memfd. pfncaches require fault-free KHVAs since
they can be used from atomic context; thus they cannot fall back to
access via a userspace mapping, as KVM does for other accesses to
NO_DIRECT_MAP guest_memfd.
The introduction of guest_memfd support necessitates invalidation paths
in addition to the existing MMU notifier path: one from guest_memfd
invalidation and another from memory attribute updates.
=== Core Approach ===
The core part keeps the original approach in RFC v1:
* Resolve PFNs for guest_memfd-backed GPAs via kvm_gmem_get_pfn()
* Obtain a fault-free KHVA for NO_DIRECT_MAP pages via vmap()
=== Main Change since RFC v1 ===
* Hook pfncache invalidation into guest_memfd invalidation (punch hole
/ release / error handling) as well as into memory attribute updates
(switching between shared and private memory).
=== Design Considerations (Feedback Appreciated) ===
To implement the above change, this series tries to reuse as much of the
existing invalidation and retry infrastructure as possible. The
following points are potential design trade-offs where feedback is
especially welcome:
* Generalize and reuse the existing mn_active_invalidate_count
(renamed to active_invalidate_count). This allows reusing the
existing pfncache retry logic as-is and enables invalidating
pfncaches without holding mmu_lock from guest_memfd invalidation
context. As a side effect, swapping the active memslots is blocked while
active_invalidate_count > 0. To avoid this block, it would be
possible to introduce a dedicated counter like
gmem_active_invalidate_count in struct kvm instead.
* Although both guest_memfd invalidation and memory attribute update
are driven by GFN ranges, pfncache invalidation is performed using
HVA ranges, reusing the existing HVA-based invalidation function. This is because
GPA-based pfncaches translate GPA->UHVA->PFN and therefore have
memslot/GPA info, whereas HVA-based pfncaches resolve PFN directly
from UHVA and do not store memslot/GPA info. Using GFN-based
invalidation would therefore miss HVA-based pfncaches. Technically,
it would be possible to refactor HVA-based pfncaches to search for
and retain the corresponding memslot/GPA at activation / refresh
time instead of at invalidation time.
* pfncaches are not dynamically allocated but are statically allocated
on a per-VM and per-vCPU basis. For a normal VM (i.e. non-Xen),
there is one pfncache per vCPU. For a Xen VM, there is one per-VM
pfncache and five per-vCPU pfncaches. Given the maximum of 1024
vCPUs, a normal VM can have up to 1024 pfncaches, consuming 4 MB of
virtual address space. A Xen VM can have up to 5121 pfncaches,
consuming approximately 20 MB of virtual address space. Although the
vmalloc area is limited on 32-bit systems, it is typically tens of TB
on 64-bit systems (e.g. 32 TB for 4-level paging and 12800 TB for
5-level paging on x86_64), which should be large enough. If virtual
address space exhaustion became a concern, migration to an mm-local
region (forthcoming mermap?) could be considered in the future. Note
that vmap() only creates virtual mappings to existing pages; it does
not allocate new physical pages.
* With this patch series, HVA-based pfncaches always resolve PFNs
via hva_to_pfn(), and thus activation for NO_DIRECT_MAP guest_memfd
fails. It is technically possible to support this scenario, but it
would require searching the corresponding memslot and GPA from the
given UHVA in order to determine whether it is backed by
guest_memfd. Doing so would add overhead to the HVA-based pfncache
activation / refresh paths regardless of whether the memory is
guest_memfd-backed. At the time of writing,
only Xen uses HVA-based pfncaches.
=== Changelog ===
Changes since RFC v2:
- Drop avoidance of silent kvm-clock activation failure.
- Fix a compile error for kvm_for_each_memslot().
Changes since RFC v1:
- Prevent kvm-clock activation from failing silently.
- Generalize serialization mechanism for invalidation.
- Hook pfncache invalidation into guest_memfd invalidation and memory
attribute updates.
RFC v2: https://lore.kernel.org/all/20260226135309.29493-1-itazur@amazon.com/
RFC v1: https://lore.kernel.org/all/20251203144159.6131-1-itazur@amazon.com/
[1]: https://lore.kernel.org/all/20260126164445.11867-1-kalyazin@amazon.com/
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
Takahiro Itazuri (6):
KVM: pfncache: Resolve PFNs via kvm_gmem_get_pfn() for gmem-backed GPAs
KVM: pfncache: Obtain KHVA via vmap() for gmem with NO_DIRECT_MAP
KVM: Rename invalidate_begin to invalidate_start for consistency
KVM: pfncache: Rename invalidate_start() helper
KVM: Rename mn_* invalidate-related fields to generic ones
KVM: pfncache: Invalidate on gmem invalidation and memattr updates
Documentation/virt/kvm/locking.rst | 8 +-
arch/x86/kvm/mmu/mmu.c | 2 +-
include/linux/kvm_host.h | 13 ++--
include/linux/mmu_notifier.h | 4 +-
virt/kvm/guest_memfd.c | 64 ++++++++++++++--
virt/kvm/kvm_main.c | 101 ++++++++++++++++++-------
virt/kvm/kvm_mm.h | 12 +--
virt/kvm/pfncache.c | 114 ++++++++++++++++++++---------
8 files changed, 229 insertions(+), 89 deletions(-)
--
2.50.1