From: Brendan Jackman <jackmanb@google.com>
To: David Woodhouse <dwmw2@infradead.org>,
Brendan Jackman <jackmanb@google.com>,
Takahiro Itazuri <itazur@amazon.com>, <kvm@vger.kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Fuad Tabba <tabba@google.com>,
David Hildenbrand <david@kernel.org>,
Paul Durrant <pdurrant@amazon.com>,
Nikita Kalyazin <kalyazin@amazon.com>,
Patrick Roy <patrick.roy@campus.lmu.de>,
Takahiro Itazuri <zulinx86@gmail.com>
Subject: Re: [RFC PATCH 0/2] KVM: pfncache: Support guest_memfd without direct map
Date: Wed, 03 Dec 2025 17:06:28 +0000 [thread overview]
Message-ID: <DEOQV1GRUTUX.1KJUWG1JTF1JJ@google.com> (raw)
In-Reply-To: <a07a6edf549cfed840c9ead3db61978c951b15e4.camel@infradead.org>

On Wed Dec 3, 2025 at 4:35 PM UTC, David Woodhouse wrote:
> On Wed, 2025-12-03 at 16:01 +0000, Brendan Jackman wrote:
>> On Wed Dec 3, 2025 at 2:41 PM UTC, Takahiro Itazuri wrote:
>> > [ based on kvm/next with [1] ]
>> >
>> > Recent work on guest_memfd [1] is introducing support for removing guest
>> > memory from the kernel direct map (Note that this work has not yet been
>> > merged, which is why this patch series is labelled RFC). The feature is
>> > useful for non-CoCo VMs to prevent the host kernel from accidentally or
>> > speculatively accessing guest memory as a general safety improvement.
>> > Pages for guest_memfd created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP have
>> > their direct-map PTEs explicitly disabled, and thus cannot rely on the
>> > direct map.
>> >
>> > This breaks the features that use gfn_to_pfn_cache, including kvm-clock.
>> > gfn_to_pfn_cache caches the pfn and kernel host virtual address (khva)
>> > for a given gfn so that KVM can repeatedly access the corresponding
>> > guest page. The cached khva may later be dereferenced from atomic
>> > contexts in some cases. Such contexts cannot tolerate sleep or page
>> > faults, and therefore cannot use the userspace mapping (uhva), as those
>> > mappings may fault at any time. As a result, gfn_to_pfn_cache requires
>> > a stable, fault-free kernel virtual address for the backing pages,
>> > independent of the userspace mapping.
>> >
>> > This small patch series enables gfn_to_pfn_cache to work correctly when
>> > a memslot is backed by guest_memfd with GUEST_MEMFD_FLAG_NO_DIRECT_MAP.
>> > The first patch teaches gfn_to_pfn_cache to obtain pfn for guest_memfd-
>> > backed memslots via kvm_gmem_get_pfn() instead of GUP (hva_to_pfn()).
>> > The second patch makes gfn_to_pfn_cache use vmap()/vunmap() to create a
>> > fault-free kernel address for such pages. We believe that establishing
>> > such a mapping for paravirtual guest/host communication is acceptable,
>> > as such pages do not contain sensitive data.
>> >
>> > Another considered idea was to use memremap() instead of vmap(), since
>> > gpc_map() already falls back to memremap() if pfn_valid() is false.
>> > However, vmap() was chosen for the following reason. memremap() with
>> > MEMREMAP_WB first attempts to use the direct map via try_ram_remap(),
>> > and then falls back to arch_memremap_wb(), which explicitly refuses to
>> > map system RAM. It would be possible to relax this restriction, but the
>> > side effects are unclear because memremap() is widely used throughout
>> > the kernel. Changing memremap() to support system RAM without the
>> > direct map solely for gfn_to_pfn_cache feels disproportionate. If
>> > additional users appear that need to map system RAM without the direct
>> > map, revisiting and generalizing memremap() might make sense. For now,
>> > vmap()/vunmap() provides a contained and predictable solution.
>> >
>> > A possible approach in the future is to use the "ephmap" (or proclocal)
>> > proposed in [2], but it is not yet clear when that work will be merged.
>>
>> (Nobody knows how to pronounce "ephmap" aloud, and once you do know
>> how to say it, it sounds like you are saying "fmap", which is very
>> confusing. So next time I post it I plan to call it "mermap" instead:
>> EPHemeral -> epheMERal).
>>
>> Apologies for my ignorance of the context here, I may be missing
>> insights that are obvious, but with that caveat...
>>
>> The point of the mermap (formerly "ephmap") is to be able to efficiently
>> map on demand then immediately unmap without the cost of a TLB
>> shootdown. Is there any reason we'd need to do that here? If we can get
>> away with a stable vmapping then that seems superior to the mermap
>> anyway.
>>
>> Putting it in an mm-local region would be nice (you say there shouldn't
>> be sensitive data in there, but I guess there's still some potential for
>> risk? Bounding that to the VMM process seems like a good idea to me)
>> but that seems nonblocking, could easily be added later. Also note it
>> doesn't depend on mermap; we could just have an mm-local region of the
>> vmalloc area. Mermap requires mm-local, but not the other way around.
>
> Right. It's really the mm-local part which we might want to support in
> the gfn_to_pfn_cache, not ephmap/mermap per se.
>
> As things stand, we're taking guest pages which were taken out of the
> global directmap for a *reason*... and mapping them right back in
> globally. Making the new mapping of those pages mm-local where possible
> is going to be very desirable.

Makes sense. I didn't properly explore whether there are any challenges
in making vmalloc aware of it, but assuming there are no issues there, I
don't think setting up an mm-local region is very challenging [1]. I
have the impression the main reason there isn't already an mm-local
region is just that the right use case hasn't come along yet? So maybe
that could just be included in this series (assuming the mermap doesn't
get merged first).

Aside from vmalloc integration, the topic I just ignored when
prototyping this [0] is that it obviously has some per-arch element. So
for users of it, I guess we need to consider whether we're OK gating the
dependent feature on arch support.
[0] https://github.com/torvalds/linux/commit/4290b4ffb35bc73ce0ac9ae590f3e9d4d27b6397
[1] https://xcancel.com/pinboard/status/761656824202276864
Thread overview: 9+ messages
2025-12-03 14:41 [RFC PATCH 0/2] KVM: pfncache: Support guest_memfd without direct map Takahiro Itazuri
2025-12-03 14:41 ` [RFC PATCH 1/2] KVM: pfncache: Use kvm_gmem_get_pfn() for guest_memfd-backed memslots Takahiro Itazuri
2026-01-19 12:34 ` David Hildenbrand (Red Hat)
2025-12-03 14:41 ` [RFC PATCH 2/2] KVM: pfncache: Use vmap() for guest_memfd pages without direct map Takahiro Itazuri
2025-12-03 16:01 ` [RFC PATCH 0/2] KVM: pfncache: Support guest_memfd " Brendan Jackman
2025-12-03 16:35 ` David Woodhouse
2025-12-03 17:06 ` Brendan Jackman [this message]
2025-12-04 22:31 ` David Woodhouse
2025-12-05 7:15 ` David Hildenbrand (Red Hat)