From: Kashyap Chamarthy <kchamart@redhat.com>
To: kvm@vger.kernel.org
Cc: pbonzini@redhat.com, dgilbert@redhat.com, vkuznets@redhat.com,
Kashyap Chamarthy <kchamart@redhat.com>
Subject: [PATCH] docs/virt/kvm: Document running nested guests
Date: Fri, 7 Feb 2020 16:30:02 +0100
Message-ID: <20200207153002.16081-1-kchamart@redhat.com>
This is a rewrite of the Wiki page:
https://www.linux-kvm.org/page/Nested_Guests
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
---
Question: is the live migration of L1-with-L2-running-in-it fixed for
*all* architectures, including s390x?
---
.../virt/kvm/running-nested-guests.rst | 171 ++++++++++++++++++
1 file changed, 171 insertions(+)
create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
--- /dev/null
+++ b/Documentation/virt/kvm/running-nested-guests.rst
@@ -0,0 +1,171 @@
+Running nested guests with KVM
+==============================
+
+A nested guest is a KVM guest that in turn runs on a KVM guest::
+
+ .----------------. .----------------.
+ | | | |
+ | L2 | | L2 |
+ | (Nested Guest) | | (Nested Guest) |
+ | | | |
+ |----------------'--'----------------|
+ | |
+ | L1 (Guest Hypervisor) |
+ | KVM (/dev/kvm) |
+ | |
+ .------------------------------------------------------.
+ | L0 (Host Hypervisor) |
+ | KVM (/dev/kvm) |
+ |------------------------------------------------------|
+ | x86 Hardware (VMX) |
+ '------------------------------------------------------'
+
+
+Terminology:
+
+ - L0 – level-0; the bare metal host, running KVM
+
+ - L1 – level-1 guest; a VM running on L0; also called the "guest
+ hypervisor", as it itself is capable of running KVM.
+
+ - L2 – level-2 guest; a VM running on L1; this is the "nested guest".
+
+
+Use Cases
+---------
+
+An additional layer of virtualization can sometimes be useful. You
+might have access to a large virtual machine in a cloud environment that
+you want to compartmentalize into multiple workloads. You might be
+running a lab environment in a training session.
+
+There are several scenarios where nested KVM can be useful:
+
+ - As a developer, you want to test your software on different OSes.
+ Instead of renting multiple VMs from a Cloud Provider, using nested
+ KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
+ This in turn allows you to create multiple nested guests (level-2
+ guests), running different OSes, on which you can develop and test
+ your software.
+
+ - Live migration of "guest hypervisors" and their nested guests, for
+ load balancing, disaster recovery, etc.
+
+ - Using VMs for isolation (as in Kata Containers, and before it Clear
+ Containers https://lwn.net/Articles/644675/) if you're running on a
+ cloud provider that is already using virtual machines.
+
+
+Procedure to enable nesting on the bare metal host
+--------------------------------------------------
+
+The KVM kernel modules do not enable nesting by default (though your
+distribution may override this default). To enable nesting, set the
+``nested`` module parameter to ``Y`` or ``1``. You may set this
+parameter persistently in a file in ``/etc/modprobe.d`` in the L0 host:
+
+1. On the bare metal host (L0), list the kernel modules, and ensure that
+ the KVM modules are loaded::
+
+ $ lsmod | grep -i kvm
+ kvm_intel 133627 0
+ kvm 435079 1 kvm_intel
+
+2. Show information for the ``kvm_intel`` module::
+
+ $ modinfo kvm_intel | grep -i nested
+ parm: nested:bool
+
+3. To make nested KVM configuration persistent across reboots, place the
+ below entry in a config file::
+
+ $ cat /etc/modprobe.d/kvm_intel.conf
+ options kvm-intel nested=y
+
+4. Unload and re-load the KVM Intel module::
+
+ $ sudo rmmod kvm-intel
+ $ sudo modprobe kvm-intel
+
+5. Verify if the ``nested`` parameter for KVM is enabled::
+
+ $ cat /sys/module/kvm_intel/parameters/nested
+ Y
+
+For AMD hosts, the process is the same as above, except that the module
+name is ``kvm-amd``.
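+
+As a minimal sketch, assuming the same layout as the Intel example above
+(the file name ``kvm_amd.conf`` is an assumption; any file under
+``/etc/modprobe.d`` should be read), the persistent entry on an AMD host
+could look like this::
+
+    $ cat /etc/modprobe.d/kvm_amd.conf
+    options kvm-amd nested=1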
+
+Once your bare metal host (L0) is configured for nesting, you should be
+able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
+through the host CPU's capabilities as-is to the guest). For better
+live migration compatibility, use a named CPU model supported by QEMU,
+e.g. ``-cpu Haswell-noTSX-IBRS,vmx=on``. Either way, the guest will then
+be capable of running an L2 guest with accelerated KVM.
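+
+The two options above could be spelled out roughly as follows. This is
+only a sketch: the memory size, vCPU count and disk image path are
+placeholders, not recommendations::
+
+    $ qemu-kvm -enable-kvm -cpu host -m 4096 -smp 2 \
+        -drive file=/path/to/l1-guest.qcow2,format=qcow2
+
+    $ qemu-kvm -enable-kvm -cpu Haswell-noTSX-IBRS,vmx=on -m 4096 -smp 2 \
+        -drive file=/path/to/l1-guest.qcow2,format=qcow2
+
+Inside the L1 guest, one way to check that the virtualization extension
+was passed through (``vmx`` on Intel) is to look for the CPU flag; a
+non-zero count suggests that L1 can load KVM itself::
+
+    $ grep -c -w vmx /proc/cpuinfo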
+
+Additional nested-related kernel parameters
+-------------------------------------------
+
+If your hardware is sufficiently advanced (Intel Haswell processor or
+above, which has newer hardware virt extensions), you might want to
+enable additional features on your bare metal host (L0), such as
+"Shadow VMCS (Virtual Machine Control Structure)" and APIC
+virtualization. Parameters for Intel hosts::
+
+ $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
+ Y
+
+ $ cat /sys/module/kvm_intel/parameters/enable_apicv
+ N
+
+ $ cat /sys/module/kvm_intel/parameters/ept
+ Y
+
+Again, to persist the above values across reboot, append them to
+``/etc/modprobe.d/kvm_intel.conf``::
+
+ options kvm-intel nested=y
+ options kvm-intel enable_shadow_vmcs=y
+ options kvm-intel enable_apicv=y
+ options kvm-intel ept=y
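+
+After reloading the module, the effective values can be read back in one
+go. The loop below is a small convenience sketch that only uses the
+``/sys/module`` paths shown above::
+
+    $ for p in nested enable_shadow_vmcs enable_apicv ept; do
+    >     printf '%s=' "$p"; cat /sys/module/kvm_intel/parameters/$p
+    > done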
+
+
+Live migration with nested KVM
+------------------------------
+
+The below live migration scenarios should work as of Linux kernel 5.3
+and QEMU 4.2.0. In all the below cases, L1 exposes ``/dev/kvm``,
+i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
+emulated guest" (as done by QEMU's TCG).
+
+- Migrating a nested guest (L2) to another L1 guest on the *same* bare
+ metal host.
+
+- Migrating a nested guest (L2) to another L1 guest on a *different*
+ bare metal host.
+
+- Migrating an L1 guest, with an *offline* nested guest in it, to
+ another bare metal host.
+
+- Migrating an L1 guest, with a *live* nested guest in it, to another
+ bare metal host.
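+
+As a rough illustration of the last two scenarios, an L1 guest can be
+migrated with QEMU's built-in migration support. The host name
+``dest-host`` and port ``4444`` are placeholders. Start QEMU on the
+destination L0 with the same guest configuration plus ``-incoming``,
+then issue ``migrate`` in the source QEMU monitor (HMP)::
+
+    (destination L0) $ qemu-kvm [same options as on the source] \
+                         -incoming tcp:0:4444
+
+    (source L0, QEMU monitor)
+    (qemu) migrate -d tcp:dest-host:4444
+    (qemu) info migrate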
+
+
+Limitations on Linux kernel versions older than 5.3
+---------------------------------------------------
+
+On Linux kernel versions older than 5.3, once an L1 guest has started an
+L2 guest, the L1 guest is no longer capable of being migrated, saved,
+or loaded (refer to QEMU documentation on "save"/"load") until the L2
+guest shuts down. [FIXME: Is this limitation fixed for *all*
+architectures, including s390x?]
+
+Attempting to migrate or save & load an L1 guest while an L2 guest is
+running will result in undefined behavior. You might see a ``kernel
+BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
+Such a migrated or loaded L1 guest can no longer be considered stable or
+secure, and must be restarted.
+
+Migrating an L1 guest merely configured to support nesting, while not
+actually running L2 guests, is expected to function normally.
+Live-migrating an L2 guest from one L1 guest to another is also expected
+to succeed.
--
2.21.0