From: Cornelia Huck <cohuck@redhat.com>
To: Kashyap Chamarthy <kchamart@redhat.com>,
David Hildenbrand <david@redhat.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, dgilbert@redhat.com,
vkuznets@redhat.com
Subject: Re: [PATCH] docs/virt/kvm: Document running nested guests
Date: Fri, 7 Feb 2020 16:46:53 +0100 [thread overview]
Message-ID: <20200207164653.28849ef0.cohuck@redhat.com> (raw)
In-Reply-To: <20200207153002.16081-1-kchamart@redhat.com>
On Fri, 7 Feb 2020 16:30:02 +0100
Kashyap Chamarthy <kchamart@redhat.com> wrote:
> This is a rewrite of the Wiki page:
>
> https://www.linux-kvm.org/page/Nested_Guests
Thanks for doing that!
>
> Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
> ---
> Question: is the live migration of L1-with-L2-running-in-it fixed for
> *all* architectures, including s390x?
> ---
> .../virt/kvm/running-nested-guests.rst | 171 ++++++++++++++++++
> 1 file changed, 171 insertions(+)
> create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
FWIW, there's currently a series converting this subdirectory to rst
on-list.
>
> diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> --- /dev/null
> +++ b/Documentation/virt/kvm/running-nested-guests.rst
> @@ -0,0 +1,171 @@
> +Running nested guests with KVM
> +==============================
I think the common style is to also have a "===..." line on top.
> +
> +A nested guest is a KVM guest that in turn runs on a KVM guest::
> +
> + .----------------. .----------------.
> + | | | |
> + | L2 | | L2 |
> + | (Nested Guest) | | (Nested Guest) |
> + | | | |
> + |----------------'--'----------------|
> + | |
> + | L1 (Guest Hypervisor) |
> + | KVM (/dev/kvm) |
> + | |
> + .------------------------------------------------------.
> + | L0 (Host Hypervisor) |
> + | KVM (/dev/kvm) |
> + |------------------------------------------------------|
> + | x86 Hardware (VMX) |
Just 'Hardware'? I don't think you want to make this x86-specific?
> + '------------------------------------------------------'
> +
> +
> +Terminology:
> +
> + - L0 – level-0; the bare metal host, running KVM
> +
> + - L1 – level-1 guest; a VM running on L0; also called the "guest
> + hypervisor", as it itself is capable of running KVM.
> +
> + - L2 – level-2 guest; a VM running on L1; this is the "nested guest"
> +
> +
> +Use Cases
> +---------
> +
> +An additional layer of virtualization sometimes can . You
Something seems to be missing here?
> +might have access to a large virtual machine in a cloud environment that
> +you want to compartmentalize into multiple workloads. You might be
> +running a lab environment in a training session.
> +
> +There are several scenarios where nested KVM can be Useful:
s/Useful/useful/
> +
> + - As a developer, you want to test your software on different OSes.
> + Instead of renting multiple VMs from a Cloud Provider, using nested
> + KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
> + This in turn allows you to create multiple nested guests (level-2
> + guests), running different OSes, on which you can develop and test
> + your software.
> +
> + - Live migration of "guest hypervisors" and their nested guests, for
> + load balancing, disaster recovery, etc.
> +
> + - Using VMs for isolation (as in Kata Containers, and before it Clear
> + Containers https://lwn.net/Articles/644675/) if you're running on a
> + cloud provider that is already using virtual machines.
> +
> +
> +Procedure to enable nesting on the bare metal host
> +--------------------------------------------------
> +
> +The KVM kernel modules do not enable nesting by default (though your
> +distribution may override this default). To enable nesting, set the
> +``nested`` module parameter to ``Y`` or ``1``. You may set this
> +parameter persistently in a file in ``/etc/modprobe.d`` in the L0 host:
> +
> +1. On the bare metal host (L0), list the kernel modules, and ensure that
> +   the KVM modules are loaded::
> +
> + $ lsmod | grep -i kvm
> + kvm_intel 133627 0
> + kvm 435079 1 kvm_intel
> +
> +2. Show information for ``kvm_intel`` module::
> +
> + $ modinfo kvm_intel | grep -i nested
> + parm:           nested:bool
> +
> +3. To make the nested KVM configuration persistent across reboots, place
> +   the entry below in a config file under ``/etc/modprobe.d``::
> +
> + $ cat /etc/modprobe.d/kvm_intel.conf
> + options kvm-intel nested=y
> +
> +4. Unload and re-load the KVM Intel module::
> +
> + $ sudo rmmod kvm-intel
> + $ sudo modprobe kvm-intel
> +
> +5. Verify that the ``nested`` parameter for KVM is enabled::
> +
> + $ cat /sys/module/kvm_intel/parameters/nested
> + Y
> +
> +For AMD hosts, the process is the same as above, except that the module
> +name is ``kvm-amd``.
This looks x86-specific. Don't know about others, but s390 has one
module, also a 'nested' parameter, which is mutually exclusive with a
'hpage' parameter.
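For comparison, a quick sketch of what checking that on an s390 host
might look like (the paths assume the single 'kvm' module layout
described above; verify on your own system):

```shell
# s390 has a single kvm module; check whether nesting is enabled:
cat /sys/module/kvm/parameters/nested
# 'nested' is mutually exclusive with 'hpage', so 'hpage' should be off:
cat /sys/module/kvm/parameters/hpage
```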
> +
> +Once your bare metal host (L0) is configured for nesting, you should be
> +able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
> +through the host CPU's capabilities as-is to the guest); or for better
> +live migration compatibility, use a named CPU model supported by QEMU,
> +e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
> +be capable of running an L2 guest with accelerated KVM.
That's probably more something that should go into a section that gives
an example how to start a nested guest with QEMU? Cpu models also look
different between architectures.
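E.g. something like the following (a hypothetical invocation; the disk
image path and the memory/vCPU sizing are placeholders, and the named
CPU model is x86-only):

```shell
# Hypothetical x86 example: start an L1 guest with a named CPU model
# that exposes VMX, so it can in turn run KVM-accelerated L2 guests.
# The disk image path and sizing below are placeholders.
qemu-system-x86_64 -enable-kvm \
    -cpu Haswell-noTSX-IBRS,vmx=on \
    -m 4096 -smp 4 \
    -drive file=/path/to/l1-guest.qcow2,if=virtio
```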
> +
> +Additional nested-related kernel parameters
> +-------------------------------------------
> +
> +If your hardware is sufficiently advanced (an Intel Haswell processor or
> +later, which has newer hardware virtualization extensions), you might
> +want to enable additional features, such as "Shadow VMCS (Virtual
> +Machine Control Structure)" and APIC Virtualization, on your bare metal
> +host (L0).
> +Parameters for Intel hosts::
> +
> + $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
> + Y
> +
> + $ cat /sys/module/kvm_intel/parameters/enable_apicv
> + N
> +
> + $ cat /sys/module/kvm_intel/parameters/ept
> + Y
> +
> +Again, to persist the above values across reboot, append them to
> +``/etc/modprobe.d/kvm_intel.conf``::
> +
> + options kvm-intel nested=y
> + options kvm-intel enable_shadow_vmcs=y
> + options kvm-intel enable_apicv=y
> + options kvm-intel ept=y
x86 specific -- maybe reorganize this document by starting with a
general setup section and then giving some architecture-specific
information?
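Something like this, as a rough sketch of a possible outline:

```rst
Running nested guests with KVM
==============================

General setup
-------------
(enabling nesting via module parameters, terminology, use cases)

Architecture-specific notes
---------------------------

x86
~~~
(kvm_intel/kvm_amd parameters, CPU models)

s390
~~~~
('nested' vs. 'hpage', single kvm module)
```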
> +
> +
> +Live migration with nested KVM
> +------------------------------
> +
> +The below live migration scenarios should work as of Linux kernel 5.3
> +and QEMU 4.2.0. In all the below cases, L1 exposes ``/dev/kvm`` in
> +it, i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
> +emulated guest" (as done by QEMU's TCG).
> +
> +- Migrating a nested guest (L2) to another L1 guest on the *same* bare
> + metal host.
> +
> +- Migrating a nested guest (L2) to another L1 guest on a *different*
> + bare metal host.
> +
> +- Migrating an L1 guest, with an *offline* nested guest in it, to
> + another bare metal host.
> +
> +- Migrating an L1 guest, with a *live* nested guest in it, to another
> + bare metal host.
> +
> +
> +Limitations on Linux kernel versions older than 5.3
> +---------------------------------------------------
> +
> +On Linux kernel versions older than 5.3, once an L1 guest has started an
> +L2 guest, the L1 guest would no longer be capable of being migrated, saved,
> +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> +guest shuts down. [FIXME: Is this limitation fixed for *all*
> +architectures, including s390x?]
I don't think we ever had that limitation on s390x, since the whole way
control blocks etc. are handled is different there. David (H), do you
remember?
> +
> +Attempting to migrate or save & load an L1 guest while an L2 guest is
> +running will result in undefined behavior. You might see a ``kernel
> +BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
> +Such a migrated or loaded L1 guest can no longer be considered stable or
> +secure, and must be restarted.
> +
> +Migrating an L1 guest merely configured to support nesting, while not
> +actually running L2 guests, is expected to function normally.
> +Live-migrating an L2 guest from one L1 guest to another is also expected
> +to succeed.
Thread overview: 5+ messages
2020-02-07 15:30 [PATCH] docs/virt/kvm: Document running nested guests Kashyap Chamarthy
2020-02-07 15:46 ` Cornelia Huck [this message]
2020-02-07 16:26 ` Kashyap Chamarthy
2020-02-07 16:01 ` Dr. David Alan Gilbert
2020-02-07 16:40 ` Kashyap Chamarthy