From: Cornelia Huck <cohuck@redhat.com>
To: Kashyap Chamarthy <kchamart@redhat.com>,
David Hildenbrand <david@redhat.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, dgilbert@redhat.com,
vkuznets@redhat.com
Subject: Re: [PATCH] docs/virt/kvm: Document running nested guests
Date: Fri, 7 Feb 2020 16:46:53 +0100
Message-ID: <20200207164653.28849ef0.cohuck@redhat.com>
In-Reply-To: <20200207153002.16081-1-kchamart@redhat.com>
On Fri, 7 Feb 2020 16:30:02 +0100
Kashyap Chamarthy <kchamart@redhat.com> wrote:
> This is a rewrite of the Wiki page:
>
> https://www.linux-kvm.org/page/Nested_Guests
Thanks for doing that!
>
> Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
> ---
> Question: is the live migration of L1-with-L2-running-in-it fixed for
> *all* architectures, including s390x?
> ---
> .../virt/kvm/running-nested-guests.rst | 171 ++++++++++++++++++
> 1 file changed, 171 insertions(+)
> create mode 100644 Documentation/virt/kvm/running-nested-guests.rst
FWIW, there's currently a series converting this subdirectory to rst
on-list.
>
> diff --git a/Documentation/virt/kvm/running-nested-guests.rst b/Documentation/virt/kvm/running-nested-guests.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..e94ab665c71a36b7718aebae902af16b792f6dd3
> --- /dev/null
> +++ b/Documentation/virt/kvm/running-nested-guests.rst
> @@ -0,0 +1,171 @@
> +Running nested guests with KVM
> +==============================
I think the common style is to also have a "===..." line on top.
> +
> +A nested guest is a KVM guest that in turn runs on a KVM guest::
> +
> + .----------------. .----------------.
> + | | | |
> + | L2 | | L2 |
> + | (Nested Guest) | | (Nested Guest) |
> + | | | |
> + |----------------'--'----------------|
> + | |
> + | L1 (Guest Hypervisor) |
> + | KVM (/dev/kvm) |
> + | |
> + .------------------------------------------------------.
> + | L0 (Host Hypervisor) |
> + | KVM (/dev/kvm) |
> + |------------------------------------------------------|
> + | x86 Hardware (VMX) |
Just 'Hardware'? I don't think you want to make this x86-specific?
> + '------------------------------------------------------'
> +
> +
> +Terminology:
> +
> + - L0 – level-0; the bare metal host, running KVM
> +
> + - L1 – level-1 guest; a VM running on L0; also called the "guest
> + hypervisor", as it itself is capable of running KVM.
> +
> + - L2 – level-2 guest; a VM running on L1, this is the "nested guest"
> +
> +
> +Use Cases
> +---------
> +
> +An additional layer of virtualization sometimes can . You
Something seems to be missing here?
> +might have access to a large virtual machine in a cloud environment that
> +you want to compartmentalize into multiple workloads. You might be
> +running a lab environment in a training session.
> +
> +There are several scenarios where nested KVM can be Useful:
s/Useful/useful/
> +
> + - As a developer, you want to test your software on different OSes.
> + Instead of renting multiple VMs from a Cloud Provider, using nested
> + KVM lets you rent a large enough "guest hypervisor" (level-1 guest).
> + This in turn allows you to create multiple nested guests (level-2
> + guests), running different OSes, on which you can develop and test
> + your software.
> +
> + - Live migration of "guest hypervisors" and their nested guests, for
> + load balancing, disaster recovery, etc.
> +
> + - Using VMs for isolation (as in Kata Containers, and before it Clear
> + Containers https://lwn.net/Articles/644675/) if you're running on a
> + cloud provider that is already using virtual machines
> +
> +
> +Procedure to enable nesting on the bare metal host
> +--------------------------------------------------
> +
> +The KVM kernel modules do not enable nesting by default (though your
> +distribution may override this default). To enable nesting, set the
> +``nested`` module parameter to ``Y`` or ``1``. You may set this
> +parameter persistently in a file in ``/etc/modprobe.d`` in the L0 host:
> +
> +1. On the bare metal host (L0), list the kernel modules, and ensure that
> + the KVM modules are loaded::
> +
> + $ lsmod | grep -i kvm
> + kvm_intel 133627 0
> + kvm 435079 1 kvm_intel
> +
> +2. Show information for ``kvm_intel`` module::
> +
> + $ modinfo kvm_intel | grep -i nested
> + parm: nested:bool
> +
> +3. To make nested KVM configuration persistent across reboots, place the
> + below entry in a config file::
> +
> + $ cat /etc/modprobe.d/kvm_intel.conf
> + options kvm-intel nested=y
> +
> +4. Unload and re-load the KVM Intel module::
> +
> + $ sudo rmmod kvm-intel
> + $ sudo modprobe kvm-intel
> +
> +5. Verify if the ``nested`` parameter for KVM is enabled::
> +
> + $ cat /sys/module/kvm_intel/parameters/nested
> + Y
> +
> +For AMD hosts, the process is the same as above, except that the module
> +name is ``kvm-amd``.
This looks x86-specific. Don't know about others, but s390 has one
module, also a 'nested' parameter, which is mutually exclusive with a
'hpage' parameter.
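To illustrate what an arch-neutral check might look like, here is a rough
sketch. The helper name kvm_nested_status and its optional sysfs-root
argument are made up for this example; the module names are only the ones
I know of (kvm_intel, kvm_amd, and plain kvm on s390), and other
architectures may well differ.

```shell
#!/bin/sh
# Sketch: report the 'nested' parameter of whichever KVM module is loaded.
# kvm_nested_status is a hypothetical helper, not an existing tool.
# $1: sysfs module directory (defaults to /sys/module); overridable for tests.
kvm_nested_status() {
    root="${1:-/sys/module}"
    found=1
    # Assumed module names: kvm_intel / kvm_amd on x86, plain kvm on s390.
    for mod in kvm_intel kvm_amd kvm; do
        param="$root/$mod/parameters/nested"
        if [ -r "$param" ]; then
            printf '%s: nested=%s\n' "$mod" "$(cat "$param")"
            found=0
        fi
    done
    # Non-zero exit status if no module exposes a 'nested' parameter.
    return "$found"
}
```

Calling kvm_nested_status without an argument reads /sys/module. Depending
on the module, the value may show up as Y/N or as 1/0.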
> +
> +Once your bare metal host (L0) is configured for nesting, you should be
> +able to start an L1 guest with ``qemu-kvm -cpu host`` (which passes
> +through the host CPU's capabilities as-is to the guest); or for better
> +live migration compatibility, use a named CPU model supported by QEMU,
> +e.g.: ``-cpu Haswell-noTSX-IBRS,vmx=on`` and the guest will subsequently
> +be capable of running an L2 guest with accelerated KVM.
That's probably more something that should go into a section that gives
an example of how to start a nested guest with QEMU? CPU models also look
different between architectures.
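Such a section could also show how to verify, from inside L1, that the
virtualization feature actually reached the guest. A rough sketch follows;
has_virt_flag is a made-up helper name, and the flag names are the ones I
would expect in /proc/cpuinfo (vmx or svm on x86, sie on s390x), so treat
them as assumptions for any other architecture.

```shell
#!/bin/sh
# Sketch: inside the L1 guest, check whether a virtualization CPU flag
# is visible. has_virt_flag is a hypothetical helper.
# $1: cpuinfo file (defaults to /proc/cpuinfo); overridable for tests.
has_virt_flag() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # x86 lists 'vmx' (Intel) or 'svm' (AMD) on the "flags" line;
    # s390x lists 'sie' on the "features" line.
    grep -E -q '^(flags|features)[[:space:]]*:.*[[:space:]](vmx|svm|sie)([[:space:]]|$)' "$cpuinfo"
}
```

The function only exits with status 0 or 1, so it can be used as
"has_virt_flag && echo 'L1 can run KVM guests'".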
> +
> +Additional nested-related kernel parameters
> +-------------------------------------------
> +
> +If your hardware is sufficiently advanced (Intel Haswell processor or
> +above which has newer hardware virt extensions), you might want to
> +enable additional features: "Shadow VMCS (Virtual Machine Control
> +Structure)", APIC Virtualization on your bare metal host (L0).
> +Parameters for Intel hosts::
> +
> + $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
> + Y
> +
> + $ cat /sys/module/kvm_intel/parameters/enable_apicv
> + N
> +
> + $ cat /sys/module/kvm_intel/parameters/ept
> + Y
> +
> +Again, to persist the above values across reboot, append them to
> +``/etc/modprobe.d/kvm_intel.conf``::
> +
> + options kvm-intel nested=y
> + options kvm-intel enable_shadow_vmcs=y
> + options kvm-intel enable_apicv=y
> + options kvm-intel ept=y
x86 specific -- maybe reorganize this document by starting with a
general setup section and then giving some architecture-specific
information?
> +
> +
> +Live migration with nested KVM
> +------------------------------
> +
> +The below live migration scenarios should work as of Linux kernel 5.3
> +and QEMU 4.2.0. In all the below cases, L1 exposes ``/dev/kvm`` in
> +it, i.e. the L2 guest is a "KVM-accelerated guest", not a "plain
> +emulated guest" (as done by QEMU's TCG).
> +
> +- Migrating a nested guest (L2) to another L1 guest on the *same* bare
> + metal host.
> +
> +- Migrating a nested guest (L2) to another L1 guest on a *different*
> + bare metal host.
> +
> +- Migrating an L1 guest, with an *offline* nested guest in it, to
> + another bare metal host.
> +
> +- Migrating an L1 guest, with a *live* nested guest in it, to another
> + bare metal host.
> +
> +
> +Limitations on Linux kernel versions older than 5.3
> +---------------------------------------------------
> +
> +On Linux kernel versions older than 5.3, once an L1 guest has started an
> +L2 guest, the L1 guest would no longer be capable of being migrated, saved,
> +or loaded (refer to QEMU documentation on "save"/"load") until the L2
> +guest shuts down. [FIXME: Is this limitation fixed for *all*
> +architectures, including s390x?]
I don't think we ever had that limitation on s390x, since the whole way
control blocks etc. are handled is different there. David (H), do you
remember?
> +
> +Attempting to migrate or save & load an L1 guest while an L2 guest is
> +running will result in undefined behavior. You might see a ``kernel
> +BUG!`` entry in ``dmesg``, a kernel 'oops', or an outright kernel panic.
> +Such a migrated or loaded L1 guest can no longer be considered stable or
> +secure, and must be restarted.
> +
> +Migrating an L1 guest merely configured to support nesting, while not
> +actually running L2 guests, is expected to function normally.
> +Live-migrating an L2 guest from one L1 guest to another is also expected
> +to succeed.
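As a side note, if the 5.3 threshold stays in the document, readers might
appreciate a quick way to check their host against it. A rough sketch,
assuming GNU sort with -V is available; kernel_at_least is a made-up
helper name, and the comparison ignores distro suffixes and -rc tags:

```shell
#!/bin/sh
# Sketch: succeed if a kernel release is at least a given version.
# kernel_at_least is a hypothetical helper; needs 'sort -V' (GNU coreutils).
# $1: minimum version, e.g. 5.3
# $2: release string (defaults to the output of 'uname -r')
kernel_at_least() {
    min="$1"
    cur="${2:-$(uname -r)}"
    # Drop everything after the first '-': 5.4.0-42-generic -> 5.4.0
    cur="${cur%%-*}"
    # The smaller of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}
```

For example, "kernel_at_least 5.3 || echo 'migrating L1 with a running L2
is not safe here'" would print the warning on an older host.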