Re: [RFC PATCH v2 00/21] QEMU gmem implemention

From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Xiaoyao Li" <xiaoyao.li@intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	"Michael Roth" <michael.roth@amd.com>,
	isaku.yamahata@gmail.com, "Claudio Fontana" <cfontana@suse.de>
Subject: Re: [RFC PATCH v2 00/21] QEMU gmem implemention
Date: Thu, 14 Sep 2023 18:10:16 -0700
Message-ID: <ZQOu+OE8LWtLTyno@google.com>
In-Reply-To: <fe9f3d19-df01-01e6-a253-f7fe5bdea41f@redhat.com>

On Thu, Sep 14, 2023, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
> > It's the v2 RFC of enabling KVM gmem[1] as the backend for private
> > memory.
> > 
> > For confidential computing, KVM provides gmem/guest_mem interfaces for
> > userspace, like QEMU, to allocate user-inaccessible private memory. This
> > series aims to add gmem support in QEMU's RAMBlock so that each RAM can
> > have both hva-based shared memory and gmem_fd-based private memory. QEMU
> > does the shared-private conversion on KVM_EXIT_MEMORY_FAULT and discards
> > the memory.
> > 
> > It chooses the design that adds a "private" property to the hostmem
> > backend. If the "private" property is set, QEMU will allocate/create KVM
> > gmem when initializing the RAMBlock of the memory backend.
> > 
> > This series also introduces the first user of KVM gmem,
> > KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
> > can be created with
> > 
> >    $qemu -object sw-protected-vm,id=sp-vm0 \
> > 	-object memory-backend-ram,id=mem0,size=1G,private=on \
> > 	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
> > 	...
> > 
> > Unfortunately this patch series fails to boot OVMF at a very early
> > stage due to a triple fault, because KVM doesn't support emulating string
> > IO to private memory.
> 
> Is support being added? Or have we figured out what it would take to make it
> work?

Hrm, this isn't something I've thought deeply about.  The issue is that anything
that reaches any form of copy_{from,to}_user() will go kablooie because KVM will
always try to read/write the shared mappings.  The best case scenario is that the
shared mapping is invalid and the uaccess faults.  The worst case scenario is
that KVM reads/writes the wrong memory and sends the guest into the weeds.  Eww.

And we (well, at least I) definitely want to support this so that gmem can be
used for "regular" VMs, i.e. for VMs where userspace is in the TCB, but for which
userspace doesn't have access to guest memory by default.

It shouldn't be too hard to support.  It's easy enough to wire up the hook
(thankfully there aren't _that_ many sites), and gmem only supports struct page at
the moment so we go straight to kmap.  E.g. something like this

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54480655bcce..b500b0ce5ce3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3291,12 +3291,15 @@ static int next_segment(unsigned long len, int offset)
                return len;
 }
 
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-                                void *data, int offset, int len)
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+                                gfn_t gfn, void *data, int offset, int len)
 {
        int r;
        unsigned long addr;
 
+       if (kvm_mem_is_private(kvm, gfn))
+               return kvm_gmem_read(slot, gfn, data, offset, len);
+
        addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
        if (kvm_is_error_hva(addr))
                return -EFAULT;
@@ -3309,9 +3312,8 @@ static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
                        int len)
 {
-       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
-
-       return __kvm_read_guest_page(slot, gfn, data, offset, len);
+       return __kvm_read_guest_page(kvm, gfn_to_memslot(kvm, gfn), gfn, data,
+                                    offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3320,7 +3322,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
        struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-       return __kvm_read_guest_page(slot, gfn, data, offset, len);
+       return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
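
To make the routing in the hunk above concrete, here is a minimal userspace toy
model, not kernel code.  Every toy_* name is hypothetical and exists only for
this illustration: toy_mem_is_private() stands in for kvm_mem_is_private(), and
the gmem[] array stands in for whatever kvm_gmem_read() would end up reading
via kmap.  The only point is that the per-gfn private attribute selects which
backing store a read touches.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NR_PAGES  4

typedef uint64_t gfn_t;

/* Hypothetical stand-in for a VM: one attribute bit and two backings per gfn. */
struct toy_vm {
	bool    private_attr[TOY_NR_PAGES];          /* models the memory attributes */
	uint8_t shared[TOY_NR_PAGES][TOY_PAGE_SIZE]; /* models the hva-based mapping */
	uint8_t gmem[TOY_NR_PAGES][TOY_PAGE_SIZE];   /* models the gmem-backed pages */
};

/* Models kvm_mem_is_private(): is this gfn currently marked private? */
static bool toy_mem_is_private(const struct toy_vm *vm, gfn_t gfn)
{
	return gfn < TOY_NR_PAGES && vm->private_attr[gfn];
}

/*
 * Models the patched __kvm_read_guest_page(): check the attribute first and
 * read from the private backing if it is set, otherwise fall through to the
 * shared mapping.  Returns 0 on success, -EFAULT on an out-of-range access.
 */
static int toy_read_guest_page(const struct toy_vm *vm, gfn_t gfn,
			       void *data, int offset, int len)
{
	const uint8_t *page;

	if (gfn >= TOY_NR_PAGES || offset < 0 || len < 0 ||
	    offset > TOY_PAGE_SIZE - len)
		return -EFAULT;

	page = toy_mem_is_private(vm, gfn) ? vm->gmem[gfn] : vm->shared[gfn];
	memcpy(data, page + offset, (size_t)len);
	return 0;
}
```

Without the attribute check, a read of a private gfn in this model would return
the stale contents of shared[], which is the "wrong memory" failure mode
described earlier.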
 
> > 2. hugepage support.
> > 
> >     KVM gmem can be allocated from hugetlbfs. How does QEMU determine

Not yet it can't.  gmem only supports THP, hugetlbfs is a future thing, if it's
ever supported.  I wouldn't be at all surprised if we end up going down a slightly
different route and don't use hugetlbfs directly.

> >     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? The
> >     easiest solution is to create KVM gmem with
> >     KVM_GUEST_MEMFD_ALLOW_HUGEPAGE only when the memory backend is a
> >     HostMemoryBackendFile on hugetlbfs.
> 
> Good question.
> 
> Probably "if the memory backend uses huge pages, also use huge pages for the
> private gmem" makes sense.
> 
> ... but it becomes a mess with preallocation ... which is what people should
> actually be using with hugetlb. And eventual double memory consumption ...
> but maybe that's all been taken care of already?
> 
> Probably it's best to leave hugetlb support as future work and start with
> something minimal.
> 
> > 
> > 3. What is KVM_X86_SW_PROTECTED_VM going to look like? And do we need it?
> > 
> 
> Why implement it when you have to ask others for a motivation? ;)
> 
> Personally, I'm not sure if it is really useful, especially in this state.

Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development vehicle,
e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging gmem guests
isn't brutally painful, etc.

Longer term, I have aspirations of being able to back most VMs with gmem, but
that's going to require quite a bit more work, e.g. gmem needs to be mappable
(when hardware allows it) so that gmem doesn't all but require double mapping,
KVM obviously needs to be able to read/write gmem, etc.

The value proposition is that having a guest-first memory type will allow KVM to
optimize and harden gmem in ways that wouldn't be feasible for a more generic
memory implementation.  E.g. memory isn't mapped into host userspace by default
(makes it harder to accidentally corrupt the guest), the guest can have *larger*
mappings than host userspace, guest memory can be served from a dedicated pool
(similar-ish to hugetlb), the pool can be omitted from the direct map, etc.

> >     This series implements KVM_X86_SW_PROTECTED_VM because it was
> >     introduced together with gmem on the KVM side and is supposed to be
> >     the first user that requires KVM gmem. However, the implementation is
> >     incomplete and a definition of how KVM_X86_SW_PROTECTED_VM works is
> >     lacking.
> 
> Then it should not be included in this series such that you can make
> progress with the gmem implementation for TDX guests instead?
