From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
Fuad Tabba <tabba@google.com>,
kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
mpe@ellerman.id.au, anup@brainfault.org,
paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
brauner@kernel.org, willy@infradead.org,
akpm@linux-foundation.org, xiaoyao.li@intel.com,
yilun.xu@intel.com, chao.p.peng@linux.intel.com,
jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
vannapurve@google.com, mail@maciej.szmigiero.name,
michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
will@kernel.org, qperret@google.com, keirf@google.com,
roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
fvdl@google.com, hughd@google.com, jthoughton@google.com,
peterx@redhat.com, pankaj.gupta@amd.com
Subject: Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
Date: Mon, 5 May 2025 15:57:12 -0700
Message-ID: <aBlCSGB86cp3B3zn@google.com>
In-Reply-To: <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>
On Mon, May 05, 2025, David Hildenbrand wrote:
> On 03.05.25 00:00, Ackerley Tng wrote:
> > Sean Christopherson <seanjc@google.com> writes:
> >
> > > On Fri, May 02, 2025, David Hildenbrand wrote:
> > > > On 30.04.25 20:58, Ackerley Tng wrote:
> > > > > > -	if (is_private)
> > > > > > +	if (is_gmem)
> > > > > >  		return max_level;
> > > > >
> > > > > I think this renaming isn't quite accurate.
> > > >
> > > > After our discussion yesterday, does that still hold true?
> > >
> > > No.
> > >
> > > > > IIUC in __kvm_mmu_max_mapping_level(), we skip considering
> > > > > host_pfn_mapping_level() if the gfn is private because private memory
> > > > > will not be mapped to userspace, so there's no need to query userspace
> > > > > page tables in host_pfn_mapping_level().
> > > >
> > > > I think the reason was that for private memory we won't be walking the
> > > > user space page tables.
> > > >
> > > > Once guest_memfd is also responsible for the shared part, why should this
> > > > here still be private-only, and why should we consider querying a user space
> > > > mapping that might not even exist?
> > >
> > > +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
> > > guest-first memory. It is very explicitly an intended feature that the guest
> > > mappings KVM creates can be a superset of the host userspace mappings. E.g. the
> > > guest can use larger page sizes, have RW while the host has RO, etc.
> >
> > Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
> > the parameter renaming from is_private to is_gmem, do something like
> >
> > 	if (is_gmem)
> > 		return kvm_gmem_get_max_mapping_level(slot, gfn);
No, kvm_gmem_get_pfn() already provides the maximum allowed order; we "just" need
to update that to also constrain the max order based on shared vs. private. E.g.
this snippet from the original guest_memfd hugepage support[*] (which never
landed) takes care of the pgoff not being properly aligned to the memslot:
+	/*
+	 * The folio can be mapped with a hugepage if and only if the folio is
+	 * fully contained by the range the memslot is bound to.  Note, the
+	 * caller is responsible for handling gfn alignment, this only deals
+	 * with the file binding.
+	 */
+	huge_index = ALIGN(index, 1ull << *max_order);
+	if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+	    huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
+		*max_order = 0;
[*] https://lore.kernel.org/all/20231027182217.3615211-18-seanjc@google.com
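
Something similar could clamp the order at a shared/private boundary. A rough
sketch, assuming a hypothetical kvm_gmem_range_is_uniform() helper (no such
guest_memfd API exists today):

	/*
	 * Hypothetical: shrink the order until the aligned range is uniformly
	 * shared or uniformly private, so a hugepage never straddles a
	 * conversion boundary.
	 */
	while (*max_order > 0 &&
	       !kvm_gmem_range_is_uniform(gmem, ALIGN_DOWN(index, 1ull << *max_order),
					  1ull << *max_order))
		(*max_order)--;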
> I assume you mean, not looking at lpage_info at all?
>
> I have limited understanding of what lpage_info is or what it does. I believe
> all it adds is a mechanism to *disable* large page mappings.
Correct. It's a bit of a catch-all that's used by a variety of KVM x86 features
to disable hugepages.
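
For reference, the consumer side is simple; __kvm_mmu_max_mapping_level()
essentially walks down from the requested level until it finds one that isn't
disallowed (paraphrased, not a verbatim copy):

	for ( ; max_level > PG_LEVEL_4K; max_level--) {
		struct kvm_lpage_info *linfo = lpage_info_slot(gfn, slot, max_level);

		if (!linfo->disallow_lpage)
			break;
	}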
> We want to disable large pages if (using 2M region as example)
>
> (a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
> of that region are shared vs. private (mixed memory attributes ->
> KVM_LPAGE_MIXED_FLAG)
>
> -> With gmem-shared we could have mixed memory attributes, but not PFN
> fracturing (PFNs don't depend on memory attributes)
>
> (b) page track: intercepting (mostly write) access to GFNs
It's also used to handle misaligned memslots (or sizes), e.g. if a 1GiB memory
region spans 1GiB+4KiB => 2GiB+4KiB, KVM will disallow 1GiB hugepages, and 2MiB
hugepages for the head and tail. Or if the host virtual address isn't aligned
with the guest physical address (see above for guest_memfd's role when there is
no hva).
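
For context, that handling happens at memslot creation time;
kvm_alloc_memslot_metadata() marks the head and tail entries for each hugepage
level, roughly (paraphrased from arch/x86/kvm/x86.c):

	if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
		linfo[0].disallow_lpage = 1;
	if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
		linfo[lpages - 1].disallow_lpage = 1;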
> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so].
Ya, we do.
> Regarding (a) I am not sure it's still needed once memory attributes are
> handled by gmem in the gmem-shared case. IIRC, with AMD SEV we might still
> have to honor it? But gmem itself could handle that.
>
> What we could definitely do here for now is:
>
> 	if (is_gmem)
> 		/* gmem only supports 4k pages for now. */
> 		return PG_LEVEL_4K;
>
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.
I don't want to completely punt on this, because if it gets messy, then I want
to know now and have a solution in hand, not find out N months from now.
That said, I don't expect it to be difficult. What we could punt on is
performance of the lookups, which is the real reason KVM maintains the rather
expensive disallow_lpage array.
And that said, memslots can only bind to one guest_memfd instance, so I don't
immediately see any reason why the guest_memfd ioctl() couldn't process the
slots that are bound to it. I.e. why not update KVM_LPAGE_MIXED_FLAG from the
guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
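
guest_memfd already walks its bindings when invalidating, so the attribute
update could reuse the same pattern. A rough sketch, where
kvm_update_lpage_mixed() is a hypothetical stand-in for the logic that
KVM_SET_MEMORY_ATTRIBUTES uses today:

	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
		/*
		 * Hypothetical helper: recompute KVM_LPAGE_MIXED_FLAG for the
		 * gfn range of this slot affected by the conversion.
		 */
		kvm_update_lpage_mixed(gmem->kvm, slot, start, end);
	}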