From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ackerley Tng <ackerleytng@google.com>,
	Fuad Tabba <tabba@google.com>,
	kvm@vger.kernel.org,  linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, pbonzini@redhat.com,  chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org,
	 paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu,  viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org,
	 akpm@linux-foundation.org, xiaoyao.li@intel.com,
	yilun.xu@intel.com,  chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com,  dmatlack@google.com,
	isaku.yamahata@intel.com, mic@digikod.net,  vbabka@suse.cz,
	vannapurve@google.com, mail@maciej.szmigiero.name,
	 michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com,  isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com,  suzuki.poulose@arm.com,
	steven.price@arm.com, quic_eberman@quicinc.com,
	 quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com,  quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	 catalin.marinas@arm.com, james.morse@arm.com,
	yuzenghui@huawei.com,  oliver.upton@linux.dev, maz@kernel.org,
	will@kernel.org, qperret@google.com,  keirf@google.com,
	roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
	 jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
	fvdl@google.com,  hughd@google.com, jthoughton@google.com,
	peterx@redhat.com,  pankaj.gupta@amd.com
Subject: Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
Date: Mon, 5 May 2025 15:57:12 -0700
Message-ID: <aBlCSGB86cp3B3zn@google.com>
In-Reply-To: <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>

On Mon, May 05, 2025, David Hildenbrand wrote:
> On 03.05.25 00:00, Ackerley Tng wrote:
> > Sean Christopherson <seanjc@google.com> writes:
> > 
> > > On Fri, May 02, 2025, David Hildenbrand wrote:
> > > > On 30.04.25 20:58, Ackerley Tng wrote:
> > > > > > -	if (is_private)
> > > > > > +	if (is_gmem)
> > > > > >    		return max_level;
> > > > > 
> > > > > I think this renaming isn't quite accurate.
> > > > 
> > > > After our discussion yesterday, does that still hold true?
> > > 
> > > No.
> > > 
> > > > > IIUC in __kvm_mmu_max_mapping_level(), we skip considering
> > > > > host_pfn_mapping_level() if the gfn is private because private memory
> > > > > will not be mapped to userspace, so there's no need to query userspace
> > > > > page tables in host_pfn_mapping_level().
> > > > 
> > > > I think the reason was that: for private we won't be walking the user space
> > > > page tables.
> > > > 
> > > > Once guest_memfd is also responsible for the shared part, why should this
> > > > here still be private-only, and why should we consider querying a user space
> > > > mapping that might not even exist?
> > > 
> > > +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
> > > guest-first memory.  It is very explicitly an intended feature that the guest
> > > mappings KVM creates can be a superset of the host userspace mappings.  E.g. the
> > > guest can use larger page sizes, have RW while the host has RO, etc.
> > 
> > Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
> > the parameter renaming from is_private to is_gmem, do something like
> > 
> > if (is_gmem)
> > 	return kvm_gmem_get_max_mapping_level(slot, gfn);

No, kvm_gmem_get_pfn() already provides the maximum allowed order; we "just" need
to update that to constrain the max order based on shared vs. private.  E.g. the
original guest_memfd hugepage support[*] (which never landed) constrained the
order like so, to take care of the pgoff not being properly aligned to the memslot:

+	/*
+	 * The folio can be mapped with a hugepage if and only if the folio is
+	 * fully contained by the range the memslot is bound to.  Note, the
+	 * caller is responsible for handling gfn alignment, this only deals
+	 * with the file binding.
+	 */
+	huge_index = ALIGN(index, 1ull << *max_order);
+	if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+	    huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
 		*max_order = 0;

[*] https://lore.kernel.org/all/20231027182217.3615211-18-seanjc@google.com
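
To make the "fully contained by the range the memslot is bound to" condition
concrete, here is a minimal userspace sketch.  It is not the hunk above and not
kernel code: struct gmem_binding, align_down(), hugepage_fits_binding() and
clamp_max_order() are hypothetical names, and the sketch rounds the index
*down* to find the start of the candidate hugepage, which is just one way to
express the containment check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the file range a memslot is bound to. */
struct gmem_binding {
	uint64_t pgoff;		/* first file page index bound to the memslot */
	uint64_t npages;	/* number of pages in the binding */
};

/* Round @x down to a multiple of @a, where @a is a power of two. */
static inline uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * True if the naturally aligned block of (1 << order) pages that contains
 * @index lies entirely within [pgoff, pgoff + npages).
 */
static inline bool hugepage_fits_binding(const struct gmem_binding *b,
					 uint64_t index, unsigned int order)
{
	uint64_t nr = 1ull << order;
	uint64_t start = align_down(index, nr);

	return start >= b->pgoff && start + nr <= b->pgoff + b->npages;
}

/* Fall back to order 0 (a single 4KiB page) if the block does not fit. */
static inline unsigned int clamp_max_order(const struct gmem_binding *b,
					   uint64_t index,
					   unsigned int max_order)
{
	return hugepage_fits_binding(b, index, max_order) ? max_order : 0;
}
```

E.g. with pgoff = 1 and npages = 1024, index 5 falls in the block [0, 512),
which starts before the binding, so order 9 is clamped to 0, whereas index 600
falls in [512, 1024), which fits.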

> I assume you mean, not looking at lpage_info at all?
> 
> I have limited understanding what lpage_info is or what it does. I believe
> all it adds is a mechanism to *disable* large page mappings.

Correct.  It's a bit of a catch-all that's used by a variety of KVM x86 features
to disable hugepages.

> We want to disable large pages if (using 2M region as example)
> 
> (a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
>     of that region are shared vs. private (mixed memory attributes ->
>     KVM_LPAGE_MIXED_FLAG)
> 
>  -> With gmem-shared we could have mixed memory attributes, not a PFN
>     fracturing. (PFNs don't depend on memory attributes)
> 
> (b) page track: intercepting (mostly write) access to GFNs

It's also used to handle misaligned memslots (or sizes), e.g. if a 1GiB memory
region spans 1GiB+4KiB => 2GiB+4KiB, KVM will disallow 1GiB hugepages, and 2MiB
hugepages for the head and tail.  Or if the host virtual address isn't aligned
with the guest physical address (see above for guest_memfd's role when there is
no hva).
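
The arithmetic for that example can be worked through with a small standalone
sketch.  This is an illustration only, not how KVM stores or looks up
lpage_info: the level numbering and the helper names (pages_per_level(),
max_level_in_slot()) are hypothetical, and only the gfn alignment of the slot
is modeled (not hva alignment, page tracking, or mixed attributes).

```c
#include <stdint.h>

/* Hypothetical level numbering: 1 = 4KiB, 2 = 2MiB, 3 = 1GiB. */
enum { TOY_4K = 1, TOY_2M = 2, TOY_1G = 3 };

/* Number of 4KiB pages covered by one mapping at @level (1, 512, 262144). */
static inline uint64_t pages_per_level(int level)
{
	return 1ull << (9 * (level - 1));
}

/*
 * Largest level at which the naturally aligned hugepage containing @gfn
 * lies entirely inside the slot [base_gfn, base_gfn + npages).
 */
static inline int max_level_in_slot(uint64_t base_gfn, uint64_t npages,
				    uint64_t gfn)
{
	for (int level = TOY_1G; level > TOY_4K; level--) {
		uint64_t nr = pages_per_level(level);
		uint64_t start = gfn & ~(nr - 1);

		if (start >= base_gfn && start + nr <= base_gfn + npages)
			return level;
	}
	return TOY_4K;
}
```

For the 1GiB+4KiB => 2GiB+4KiB slot, base_gfn is 262145 and npages is 262144.
The first gfn (262145) and the last gfn (524288) each sit in a 2MiB region that
pokes outside the slot, so they are limited to 4KiB; gfns in between can use
2MiB; and no gfn can use 1GiB because neither 1GiB-aligned region that overlaps
the slot is fully inside it.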

> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so].

Ya, we do.

> Regarding (a) I am not sure: once memory attributes are handled by gmem in
> the gmem-shared case. IIRC, with AMD SEV we might still have to honor it? But
> gmem itself could handle that.
> 
> What we could definitely do here for now is:
> 
> if (is_gmem)
> 	/* gmem only supports 4k pages for now. */
> 	return PG_LEVEL_4K;
> 
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.

I don't want to completely punt on this, because if it gets messy, then I want
to know now and have a solution in hand, not find out N months from now.

That said, I don't expect it to be difficult.  What we could punt on is
performance of the lookups, which is the real reason KVM maintains the rather
expensive disallow_lpage array.

And that said, memslots can only bind to one guest_memfd instance, so I don't
immediately see any reason why the guest_memfd ioctl() couldn't process the
slots that are bound to it.  I.e. why not update KVM_LPAGE_MIXED_FLAG from the
guest_memfd ioctl() instead of from KVM_SET_MEMORY_ATTRIBUTES?
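
A rough userspace model of that idea, purely to show the shape of the walk: the
guest_memfd instance iterates over the slots bound to it and recomputes a
"mixed" flag for each 2MiB region touched by an attribute change.  Everything
here is hypothetical (struct toy_slot, struct toy_gmem, toy_gmem_update_mixed(),
TOY_MIXED_FLAG standing in for KVM_LPAGE_MIXED_FLAG); there is no locking, no
1GiB level, and a plain per-page bool array in place of real attribute storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGES_PER_2M	512ull
#define TOY_MIXED_FLAG		(1u << 31)	/* stand-in for KVM_LPAGE_MIXED_FLAG */

/* Hypothetical: one memslot bound to a range of the guest_memfd file. */
struct toy_slot {
	uint64_t base_gfn;
	uint64_t pgoff;		/* file page index backing base_gfn */
	uint64_t npages;
	uint32_t *disallow_2m;	/* one entry per 2MiB gfn region touching the slot */
};

/* Hypothetical: the guest_memfd instance and the slots bound to it. */
struct toy_gmem {
	const bool *is_private;	/* per file page, indexed by file page index */
	struct toy_slot *slots;
	size_t nr_slots;
};

/* Recompute the mixed flag after attributes changed for file pages [start, end). */
static inline void toy_gmem_update_mixed(struct toy_gmem *g,
					 uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < g->nr_slots; i++) {
		struct toy_slot *s = &g->slots[i];
		uint64_t slot_end = s->pgoff + s->npages;
		uint64_t lo = start > s->pgoff ? start : s->pgoff;
		uint64_t hi = end < slot_end ? end : slot_end;

		if (lo >= hi)
			continue;	/* change does not overlap this slot */

		uint64_t gfn_hi = s->base_gfn + (hi - s->pgoff);
		uint64_t r = (s->base_gfn + (lo - s->pgoff)) & ~(TOY_PAGES_PER_2M - 1);

		for (; r < gfn_hi; r += TOY_PAGES_PER_2M) {
			/* Regions not fully inside the slot can't be huge anyway. */
			if (r < s->base_gfn ||
			    r + TOY_PAGES_PER_2M > s->base_gfn + s->npages)
				continue;

			uint64_t idx = (r >> 9) - (s->base_gfn >> 9);
			uint64_t pg = s->pgoff + (r - s->base_gfn);
			bool mixed = false;

			for (uint64_t p = 1; p < TOY_PAGES_PER_2M && !mixed; p++)
				mixed = g->is_private[pg + p] != g->is_private[pg];

			if (mixed)
				s->disallow_2m[idx] |= TOY_MIXED_FLAG;
			else
				s->disallow_2m[idx] &= ~TOY_MIXED_FLAG;
		}
	}
}
```

The point of the sketch is only that the file-offset to gfn translation is
available per slot, so the update can be driven from the guest_memfd side; the
cost of scanning every page in a region is exactly the kind of lookup
performance that could be punted on.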
