public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org,
	 Lai Jiangshan <jiangshan.ljs@antgroup.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	kvm@vger.kernel.org, x86@kernel.org,
	 Kees Cook <keescook@chromium.org>,
	Juergen Gross <jgross@suse.com>,
	 Hou Wenlong <houwenlong.hwl@antgroup.com>
Subject: Re: [RFC PATCH 00/73] KVM: x86/PVM: Introduce a new hypervisor
Date: Tue, 27 Feb 2024 09:27:33 -0800	[thread overview]
Message-ID: <Zd4bhQPwZDvyrF44@google.com> (raw)
In-Reply-To: <CABgObfaSGOt4AKRF5WEJt2fGMj_hLXd7J2x2etce2ymvT4HkpA@mail.gmail.com>

On Mon, Feb 26, 2024, Paolo Bonzini wrote:
> On Mon, Feb 26, 2024 at 3:34 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> > - Full control: In XENPV/Lguest, the host Linux (dom0) entry code is
> >   subordinate to the hypervisor/switcher, and the host Linux kernel
> >   loses control over the entry code. This can cause inconvenience if
> >   there is a need to update something when there is a bug in the
> >   switcher or hardware.  Integral entry gives the control back to the
> >   host kernel.
> >
> > - Zero overhead incurred: The integrated entry code doesn't cause any
> >   overhead in host Linux entry path, thanks to the discreet design with
> >   PVM code in the switcher, where the PVM path is bypassed on host events.
> >   While in XENPV/Lguest, host events must be handled by the
> >   hypervisor/switcher before being processed.
> 
> Lguest... Now that's a name I haven't heard in a long time. :)  To be
> honest, it's a bit weird to see yet another PV hypervisor. I think
> what really killed Xen PV was the impossibility to protect from
> various speculation side channel attacks, and I would like to
> understand how PVM fares here.
> 
> You obviously did a great job in implementing this within the KVM
> framework; the changes in arch/x86/ are impressively small. On the
> other hand this means it's also not really my call to decide whether
> this is suitable for merging upstream. The bulk of the changes are
> really in arch/x86/kernel/ and arch/x86/entry/, and those are well
> outside my maintenance area.

The bulk of changes in _this_ patchset are outside of arch/x86/kvm, but there are
more changes on the horizon:

 : To mitigate the performance problem, we designed several optimizations
 : for the shadow MMU (not included in the patchset) and also planning to
 : build a shadow EPT in L0 for L2 PVM guests.

 : - Parallel Page fault for SPT and Paravirtualized MMU Optimization.

And even absent _new_ shadow paging functionality, merging PVM would effectively
shatter any hopes of ever removing KVM's existing, complex shadow paging code.

Specifically, unsync 4KiB PTE support in KVM provides almost no benefit for nested
TDP.  So if we can ever drop support for legacy shadow paging, which is a big if,
but not completely impossible, then we could greatly simplify KVM's shadow MMU.

Which is a good segue into my main question: was there any one thing that was
_the_ motivating factor for taking on the cost+complexity of shadow paging?  And
as alluded to be Paolo, taking on the downsides of reduced isolation?

It doesn't seem like avoiding L0 changes was the driving decision, since IIUC
you have plans to make changes there as well.

 : To mitigate the performance problem, we designed several optimizations
 : for the shadow MMU (not included in the patchset) and also planning to
 : build a shadow EPT in L0 for L2 PVM guests.

Performance I can kinda sorta understand, but my gut feeling is that the problems
with nested virtualization are solvable by adding nested paravirtualization between
L0<=>L1, with likely lower overall cost+complexity than paravirtualizing L1<=>L2.

The bulk of the pain with nested hardware virtualization lies in having to emulate
VMX/SVM, and shadow L1's TDP page tables.  Hyper-V's eVMCS takes some of the sting
off nVMX in particular, but eVMCS is still hobbled by its desire to be almost
drop-in compatible with VMX.

If we're willing to define a fully PV interface between L0 and L1 hypervisors, I
suspect we provide performance far, far better than nVMX/nSVM.  E.g. if L0 provides
a hypercall to map an L2=>L1 GPA, then L0 doesn't need to shadow L1 TDP, and L1
doesn't even need to maintain hardware-defined page tables, it can use whatever
software-defined data structure best fits it needs.

And if we limit support to 64-bit L2 kernels and drop support for unnecessary cruft,
the L1<=>L2 entry/exit paths could be drastically simplified and streamlined.  And
it should be very doable to concoct an ABI between L0 and L2 that allows L0 to
directly emulate "hot" instructions from L2, e.g. CPUID, common MSRs, etc.  I/O
would likely be solvable too, e.g. maybe with a mediated device type solution that
allows L0 to handle the data path for L2?

The one thing that I don't see line of sight to supporting is taking L0 out of the
TCB, i.e. running L2 VMs inside TDX/SNP guests.  But for me at least, that alone
isn't sufficient justification for adding a PV flavor of KVM.

  reply	other threads:[~2024-02-27 17:27 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-26 14:35 [RFC PATCH 00/73] KVM: x86/PVM: Introduce a new hypervisor Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 01/73] KVM: Documentation: Add the specification for PVM Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 02/73] x86/ABI/PVM: Add PVM-specific ABI header file Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 03/73] x86/entry: Implement switcher for PVM VM enter/exit Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 04/73] x86/entry: Implement direct switching for the switcher Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 05/73] KVM: x86: Set 'vcpu->arch.exception.injected' as true before vendor callback Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 06/73] KVM: x86: Move VMX interrupt/nmi handling into kvm.ko Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 07/73] KVM: x86/mmu: Adapt shadow MMU for PVM Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 08/73] KVM: x86: Allow hypercall handling to not skip the instruction Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 09/73] KVM: x86: Add PVM virtual MSRs into emulated_msrs_all[] Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 10/73] KVM: x86: Introduce vendor feature to expose vendor-specific CPUID Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 11/73] KVM: x86: Implement gpc refresh for guest usage Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 12/73] KVM: x86: Add NR_VCPU_SREG in SREG enum Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 13/73] KVM: x86/emulator: Reinject #GP if instruction emulation failed for PVM Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 14/73] KVM: x86: Create stubs for PVM module as a new vendor Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 15/73] mm/vmalloc: Add a helper to reserve a contiguous and aligned kernel virtual area Lai Jiangshan
2024-02-27 14:56   ` Christoph Hellwig
2024-02-27 17:07     ` Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 16/73] KVM: x86/PVM: Implement host mmu initialization Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 17/73] KVM: x86/PVM: Implement module initialization related callbacks Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 18/73] KVM: x86/PVM: Implement VM/VCPU " Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 19/73] x86/entry: Export 32-bit ignore syscall entry and __ia32_enabled variable Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 20/73] KVM: x86/PVM: Implement vcpu_load()/vcpu_put() related callbacks Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 21/73] KVM: x86/PVM: Implement vcpu_run() callbacks Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 22/73] KVM: x86/PVM: Handle some VM exits before enable interrupts Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 23/73] KVM: x86/PVM: Handle event handling related MSR read/write operation Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 24/73] KVM: x86/PVM: Introduce PVM mode switching Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 25/73] KVM: x86/PVM: Implement APIC emulation related callbacks Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 26/73] KVM: x86/PVM: Implement event delivery flags " Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 27/73] KVM: x86/PVM: Implement event injection " Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 28/73] KVM: x86/PVM: Handle syscall from user mode Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 29/73] KVM: x86/PVM: Implement allowed range checking for #PF Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 30/73] KVM: x86/PVM: Implement segment related callbacks Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 31/73] KVM: x86/PVM: Implement instruction emulation for #UD and #GP Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 32/73] KVM: x86/PVM: Enable guest debugging functions Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 33/73] KVM: x86/PVM: Handle VM-exit due to hardware exceptions Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 34/73] KVM: x86/PVM: Handle ERETU/ERETS synthetic instruction Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 35/73] KVM: x86/PVM: Handle PVM_SYNTHETIC_CPUID " Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 36/73] KVM: x86/PVM: Handle KVM hypercall Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 37/73] KVM: x86/PVM: Use host PCID to reduce guest TLB flushing Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 38/73] KVM: x86/PVM: Handle hypercalls for privilege instruction emulation Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 39/73] KVM: x86/PVM: Handle hypercall for CR3 switching Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 40/73] KVM: x86/PVM: Handle hypercall for loading GS selector Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 41/73] KVM: x86/PVM: Allow to load guest TLS in host GDT Lai Jiangshan
2024-02-26 14:35 ` [RFC PATCH 42/73] KVM: x86/PVM: Support for kvm_exit() tracepoint Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 43/73] KVM: x86/PVM: Enable direct switching Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 44/73] KVM: x86/PVM: Implement TSC related callbacks Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 45/73] KVM: x86/PVM: Add dummy PMU " Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 46/73] KVM: x86/PVM: Support for CPUID faulting Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 47/73] KVM: x86/PVM: Handle the left supported MSRs in msrs_to_save_base[] Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 48/73] KVM: x86/PVM: Implement system registers setting callbacks Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 49/73] KVM: x86/PVM: Implement emulation for non-PVM mode Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 50/73] x86/tools/relocs: Cleanup cmdline options Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 51/73] x86/tools/relocs: Append relocations into input file Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 52/73] x86/boot: Allow to do relocation for uncompressed kernel Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 53/73] x86/pvm: Add Kconfig option and the CPU feature bit for PVM guest Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 54/73] x86/pvm: Detect PVM hypervisor support Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 55/73] x86/pvm: Relocate kernel image to specific virtual address range Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 56/73] x86/pvm: Relocate kernel image early in PVH entry Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 57/73] x86/pvm: Make cpu entry area and vmalloc area variable Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 58/73] x86/pvm: Relocate kernel address space layout Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 59/73] x86/pti: Force enabling KPTI for PVM guest Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 60/73] x86/pvm: Add event entry/exit and dispatch code Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 61/73] x86/pvm: Allow to install a system interrupt handler Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 62/73] x86/pvm: Add early kernel event entry and dispatch code Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 63/73] x86/pvm: Add hypercall support Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 64/73] x86/pvm: Enable PVM event delivery Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 65/73] x86/kvm: Patch KVM hypercall as PVM hypercall Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 66/73] x86/pvm: Use new cpu feature to describe XENPV and PVM Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 67/73] x86/pvm: Implement cpu related PVOPS Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 68/73] x86/pvm: Implement irq " Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 69/73] x86/pvm: Implement mmu " Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 70/73] x86/pvm: Don't use SWAPGS for gsbase read/write Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 71/73] x86/pvm: Adapt pushf/popf in this_cpu_cmpxchg16b_emu() Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 72/73] x86/pvm: Use RDTSCP as default in vdso_read_cpunode() Lai Jiangshan
2024-02-26 14:36 ` [RFC PATCH 73/73] x86/pvm: Disable some unsupported syscalls and features Lai Jiangshan
2024-02-26 14:49 ` [RFC PATCH 00/73] KVM: x86/PVM: Introduce a new hypervisor Paolo Bonzini
2024-02-27 17:27   ` Sean Christopherson [this message]
2024-02-29  9:33     ` David Woodhouse
2024-03-01 14:00     ` Lai Jiangshan
2024-02-29 14:55   ` Lai Jiangshan
2024-03-06 11:05 ` Like Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zd4bhQPwZDvyrF44@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=houwenlong.hwl@antgroup.com \
    --cc=jgross@suse.com \
    --cc=jiangshan.ljs@antgroup.com \
    --cc=jiangshanlai@gmail.com \
    --cc=keescook@chromium.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox